Static Analysis for PHP, from PHPDay Italy 2012

Techniques for static analysis of PHP, including Facebook's HipHop for PHP (HpHp) and ClamAV. First presented at PHPDay 2012, Verona, Italy, May 19, 2012. Companion source code is available at https://github.com/client9/hphp-tools


    1. Static Analysis for PHP. PHPDay Verona, Italy 2012. Nick Galbreath, @ngalbreath, nickg@etsy.com
    2. http://slidesha.re/KzTfLy and github.com/client9/hphp-tools
    3. Static Analysis
       • Typically analyzes source code “at rest” for bugs, security problems, leaks, and threading problems.
       • We’ll cover simple checks and HpHp.
       • Some commercial tools exist too. Veracode runs off of PHP byte code: http://www.veracode.com/products/static
    4. Dynamic Analysis: analysis of code while it is running
       • valgrind http://valgrind.org/
       • xdebug http://xdebug.org/
       • xhprof http://pecl.php.net/package/xhprof
       Great tools, but not for this talk.
    5. Simple Static Analysis
    6. The Littlest Static Analysis: php -l
       • Syntax errors should never be committed.
       • Syntax errors should never go to prod!
       • Make sure dev and prod versions of PHP are identical.
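A pre-commit hook built on `php -l` could look roughly like the sketch below. The helper names `lintFile()` and `lintFiles()` are hypothetical (they are not from the talk), and the sketch assumes PHP 7.1+ with the CLI binary that runs the hook being the same version as production.

```php
<?php
// Sketch of a pre-commit syntax check built on `php -l`.
// lintFile() and lintFiles() are hypothetical helper names.

/**
 * Run `php -l` on one file. Returns null when the syntax is fine,
 * or the linter's output when it is not.
 */
function lintFile(string $path): ?string
{
    $cmd = escapeshellarg(PHP_BINARY) . ' -l ' . escapeshellarg($path) . ' 2>&1';
    exec($cmd, $output, $exitCode);
    return $exitCode === 0 ? null : implode("\n", $output);
}

/** Lint many files and return [path => error text] for the failures only. */
function lintFiles(array $paths): array
{
    $errors = [];
    foreach ($paths as $path) {
        $error = lintFile($path);
        if ($error !== null) {
            $errors[$path] = $error;
        }
    }
    return $errors;
}
```

A hook would pass the list of changed files to `lintFiles()` and reject the commit when the returned array is not empty.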
    7. PHP Leading Whitespace
       Pre-commit check that every file starts with exactly #! or <?php.
    8. PHP Trailing Whitespace
       Check that the file ends exactly with ?> or make sure it doesn’t have a closing tag.
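Both whitespace checks matter because anything outside the PHP tags is sent to the client as output. One possible sketch follows; the function names are hypothetical, and the trailing check is a strict reading of "ends exactly with ?>", so it also rejects a single newline after the closing tag.

```php
<?php
// Sketch of the leading/trailing checks; function names are hypothetical.

/** True when the content begins exactly with "#!" or "<?php". */
function startsCleanly(string $source): bool
{
    return strncmp($source, '#!', 2) === 0 || strncmp($source, '<?php', 5) === 0;
}

/**
 * True when the content ends in PHP mode (no final closing tag) or the
 * closing tag is the very last thing in the file.
 */
function endsCleanly(string $source): bool
{
    $tokens = token_get_all($source);
    $last = end($tokens);
    if (!is_array($last)) {
        return true; // empty input, or PHP code ending on ";" or "}"
    }
    if ($last[0] === T_INLINE_HTML) {
        return false; // something follows the closing tag and will be output
    }
    if ($last[0] === T_CLOSE_TAG) {
        return $last[1] === '?>'; // the token text includes any trailing newline
    }
    return true; // still in PHP mode, e.g. trailing whitespace or a comment
}
```

Using the tokenizer for the trailing check avoids being fooled by a closing tag that only appears inside a string literal.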
    9. Anti-Virus On Source Code
       • It’s static analysis too!
       • Not so concerned with PHP, but do you have JavaScript, Flash, Word, PowerPoint, PDFs, or ZIPs in your source tree?
    10. ClamAV
       • http://www.clamav.net/
       • Free anti-virus.
       • Available on every OS.
    11. ClamAV Performance: 1G of source code / minute. Why not do it?
    12. Advanced Static Analysis
    13. Why not use... AST? http://docs.php.net/manual/en/function.token-get-all.php
       • token_get_all($source) takes PHP source text and returns a flat array of lexer tokens in PHP; any syntax tree has to be built on top of it.
       • Orders of magnitude slower: can’t use it for a pre-commit check on large code bases.
       • Too low level: need to turn it into an intermediate representation.
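To see how low level the tokenizer output is, a small sketch helps. `tokenNames()` is a hypothetical helper written for this illustration; `token_get_all()` and `token_name()` are the standard tokenizer functions.

```php
<?php
// Sketch: what token_get_all() hands back for a one-line script.
// tokenNames() is a hypothetical helper for this illustration.

/** Return the token names for a snippet, skipping whitespace tokens. */
function tokenNames(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_WHITESPACE) {
                continue;
            }
            $names[] = token_name($token[0]);
        } else {
            $names[] = $token; // single-character tokens come back as plain strings
        }
    }
    return $names;
}
```

For `<?php $a = foo(1);` this yields a flat sequence (open tag, variable, `=`, name, `(`, number, `)`, `;`) with no nesting, so a check such as "is this function called with too few arguments" has to rebuild the structure itself.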
    14. Why not use... CodeSniffer? http://pear.php.net/package/PHP_CodeSniffer
       • Excellent tool, but...
       • Based on token_get_all
       • SAX-style API
       • Too slow for pre/post commit hooks
    15. Why not use...
       • php-SAT: orphaned 2009
       • php-AST: orphaned 2008
       • phc: active but doesn’t support... OBJECTS
       • Every other PHP to Java translator or converter is orphaned or has other problems
    16. Facebook’s HpHp
       • A full re-implementation of Apache+PHP
       • Compiles PHP to its own byte code format and executes it in its own runtime.
       • May also translate to C++ for other compilation or use JIT
       • Does type-inference for speed-ups
       • Also includes an HTTP web server
    17. Bad News #1: No action since 2011-12-06
       Facebook appears to use “code drops” instead of a true “streaming” open source model. BOO.
    18. Bad News #2: Missing Many Common Modules
       • Has: apc, array, bcmath, bzip2, ctype, curl, iconv, gd, imap, ipc, json, ldap, math, mb, mcrypt, memcache, mysql, network, openssl, pdo, posix, preg, process, session, simplexml, soap, socket, sqlite3, stream, string, thread, thrift, url, xml*, zlib.
       • That’s it! (No filter_var, no ftp, no ..)
    19. Bad News #3: Doesn’t Track PHP 5.3. PHP 5.4? No way!
       • Some function signatures aren’t quite right, e.g. debug_print_backtrace:
         • HpHp: 2 arguments
         • PHP 5.3.6: 3 arguments
         • PHP 5.4: 4 arguments
       • (End up needing to whitelist this to ignore false positives)
    20. Bad News #4: Seriously #*$%&!# annoying to build
       • My crappy CentOS build script: https://github.com/client9/hphp-tools
       • Ubuntu users are slightly better off (see HpHp wiki)
       • Takes hours.
    21. Bad News #5: Won’t help with Dynamic Evaluation
           $fn = "foo";
           $fn(1,2,3);          // function not found
           eval("foo(1,2,3);"); // no
       • This is more for runtime dynamic analysis.
       • Try to avoid this anyways.
    22. Conclusion
       • You aren’t going to run your application under HpHp (at least not as is)
       • But it has a great static analyzer that works and finds real bugs really fast.
       • Scans thousands of files in a few seconds
    23. Using HpHp
    24. Step 1: Make a constants file
       • HpHp doesn’t know about hardwired constants
       • Nifty script generates the constants
       • May need to hand edit
    25. Step 2: Make a stubs file
       • HpHp doesn’t have many binary extensions
       • But... the analyzer doesn’t care. Just make a stub function.
           // http://php.net/manual/en/function.filter-var.php
           function filter_var($var, $filter=0, $options=NULL) {
               return $var;
           }
       You can make stub classes as well.
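A stub class follows the same idea as the stub function: it only has to declare the names and argument lists the analyzer will look up. In the sketch below `GeoLookup` is a hypothetical extension class invented for illustration; `ftp_connect` is stubbed because the talk lists ftp among the missing extensions. The `class_exists`/`function_exists` guards are an assumption added so the sketch also loads under stock PHP; a stubs file fed only to the analyzer would probably declare everything unconditionally.

```php
<?php
// Stubs exist only so the analyzer can resolve names and count arguments.
// "GeoLookup" is a hypothetical extension class used for illustration.

if (!class_exists('GeoLookup')) {
    class GeoLookup
    {
        const MODE_FAST = 1;

        public function __construct($databasePath, $mode = self::MODE_FAST)
        {
        }

        public function countryCode($ipAddress)
        {
            return '';
        }
    }
}

// Function stub mirroring the documented ftp_connect() parameter list.
if (!function_exists('ftp_connect')) {
    function ftp_connect($host, $port = 21, $timeout = 90)
    {
        return false;
    }
}
```

The bodies are deliberately empty or trivial: only the signatures matter for checks such as TooFewArgument or UnknownClass.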
    26. Step 3: Create the file list
       • Create a list of all PHP files to be analyzed and include your constants and stubs files.
       • Ignore phpunit and other tests
       • HpHp implements much of PHP base functionality as PHP code (e.g. the Exception class is written in PHP). You need to add these system PHP files as well.
    27. Correction:
           grep -v helper.idl.php | grep -v constants.php >> $JUNK
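The file list can also be built in PHP rather than with find and grep. This is a minimal sketch: `buildFileList()` and the default skip fragments (`/tests/`, `/phpunit/`) are assumptions for illustration, and the constants file, stubs file, and HpHp system files would still need to be appended to the result.

```php
<?php
// Sketch: collect *.php files under a root, skipping test directories.
// buildFileList() and the default skip fragments are hypothetical.

function buildFileList(string $root, array $skipFragments = ['/tests/', '/phpunit/']): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $fileInfo) {
        if ($fileInfo->getExtension() !== 'php') {
            continue;
        }
        $path = str_replace('\\', '/', $fileInfo->getPathname());
        foreach ($skipFragments as $fragment) {
            if (strpos($path, $fragment) !== false) {
                continue 2; // path is inside an ignored directory
            }
        }
        $files[] = $path;
    }
    sort($files);
    return $files;
}
```

Writing the result out one path per line, for example with `file_put_contents($listFile, implode("\n", $files))`, gives a list in the same spirit as the grep pipeline on the slide.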
    28. Step 4: Do it
       Include paths are a bit mysterious. You’ll have to play around to get it right.
    29. Step 5: Analyze it
       • /tmp/hphp/Stats.js contains some... statistics in JSON format.
       • /tmp/hphp/CodeError.js is where the good stuff is.
       • JSON format, includes: error type, file, line number, code snippet
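Post-processing the report might start like the sketch below. The talk only says the JSON includes error type, file, line number and code snippet, so the exact layout used here (an object keyed by error type, with `file`, `line` and `snippet` fields) is an assumption; inspect a real CodeError.js and adapt the keys. The function names are hypothetical too.

```php
<?php
// Sketch only: the JSON layout is an ASSUMPTION for illustration, not
// HpHp's documented format. Adapt the keys to a real CodeError.js.

/**
 * Flatten the decoded report into rows of type/file/line/snippet.
 * Assumed input: { "ErrorType": [ {"file": ..., "line": ..., "snippet": ...}, ... ] }
 */
function flattenCodeErrors(string $json): array
{
    $report = json_decode($json, true);
    if (!is_array($report)) {
        throw new InvalidArgumentException('CodeError report is not valid JSON');
    }
    $rows = [];
    foreach ($report as $type => $entries) {
        foreach ($entries as $entry) {
            $rows[] = [
                'type'    => $type,
                'file'    => $entry['file'],
                'line'    => (int) $entry['line'],
                'snippet' => $entry['snippet'],
            ];
        }
    }
    return $rows;
}

/** Count findings per error type, most frequent first. */
function countByType(array $rows): array
{
    $counts = array_count_values(array_column($rows, 'type'));
    arsort($counts);
    return $counts;
}
```

A per-type count is a quick way to see which checks dominate a first run before deciding what to fix and what to ignore.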
    30. UseUndeclaredVariable
       • #1 bug.
       • Typically typos, scoping or cut-n-paste errors
       • Found frequently in error handling cases
           if (!$ok) {
               error_log("$user_id has a problem");
           }
    31. TooManyArgument, TooFewArgument
       Too Many Arguments typically indicates the caller is confused and has logic errors (bug). Too Few Arguments is frequently a serious bug, as PHP silently fails and defaults to null.
           hash_hmac('sha1', 'foo'); // oops, no key
    32. BadDefine, UsesEvaluation
           define($k, $v);
           eval("1+1;");
       • “Bad” since HpHp can’t compile it, but likely legal PHP.
       • Avoid dynamic constant generation. Use a configuration file instead.
    33. UseUndeclaredGlobalVariable
       • HpHp only defines certain globals.
       • Used only by Smarty?
         • $GLOBALS[HTTP_SERVER_VARS]
         • $GLOBALS[HTTP_SESSION_VARS]
         • $GLOBALS[HTTP_COOKIE_VARS]
    34. UseVoidReturn
       Some function returns “nothing” but the value is used.
           function foo() {
               if (time() % 60 == 0) { return true; }
               // oops void
           }
           $now = foo(); // error
    35. RequiredAfterOptionalParam
           function foo($first, $second=2, $third) {
       • IMHO should be a PHP syntax error
       • Confusing
       • (Oops, I haven’t investigated behavior)
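The behavior left uninvestigated above can be probed with reflection. This is a sketch rather than a statement about any particular PHP version's diagnostics: `requiredArgumentCount()` is a hypothetical helper, and on PHP 8.0 and later simply declaring `foo()` this way also emits a deprecation notice when the file is loaded.

```php
<?php
// Sketch: ask PHP how it treats a required parameter after an optional one.
// On PHP 8.0+ this declaration also triggers a deprecation notice.

function foo($first, $second = 2, $third)
{
    return [$first, $second, $third];
}

/** Number of arguments a caller must pass, according to reflection. */
function requiredArgumentCount(string $function): int
{
    return (new ReflectionFunction($function))->getNumberOfRequiredParameters();
}
```

`requiredArgumentCount('foo')` returns 3: the default on `$second` can never take effect for positional calls because `$third` must always be supplied, which is why the check is worth keeping enabled.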
    36. DeclaredConstantTwice
       • Probably not invalid PHP, but HpHp analyzes all files at once.
       • Best to have one file that defines constants or just not use them.
    37. UnknownFunction, UnknownObjectMethod, UnknownClass, UnknownBaseClass
       • Is your file list complete?
       • Do you need to make stubs?
    38. BadPHPIncludeFile
       Likely a PHP file trying to include/require itself, an invalid file name, or an ambiguous autoloader.
    39. PHPIncludeFileNotFound
       • Really common
       • Probably unique to your autoloader.
       • Not sure I quite understand how HpHp computes file names and loads includes, requires, require_once... yet
    40. HpHp at Etsy
    41. Every Commit
       • Every commit gets checked in real time
       • “try-server” also allows developers to test before committing.
       • Finds and prevents bugs before they go live, every day.
       • Almost no false positives (!!)
       • Developers love it (especially the Java groups)
    42. Analysis
       • CodeError.js is processed through a custom script.
       • Has a large blacklist of checks or files we don’t care about (3rd party, known bad, etc).
       • File and line info pass through to git blame to find author and date/time.
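The blacklist and blame steps can be sketched as follows. This is not Etsy's actual script: the function names and the row layout (`type`, `file`, `line`) are hypothetical, and the blacklist is modeled simply as a list of ignored error types plus a list of ignored path prefixes.

```php
<?php
// Sketch of the post-processing step; all names and the row layout
// (type/file/line) are hypothetical, not Etsy's actual script.

/**
 * Drop findings whose error type is blacklisted or whose file starts
 * with an ignored path prefix (third party, known bad, ...).
 */
function filterFindings(array $rows, array $ignoredTypes, array $ignoredPrefixes): array
{
    $keep = function (array $row) use ($ignoredTypes, $ignoredPrefixes) {
        if (in_array($row['type'], $ignoredTypes, true)) {
            return false;
        }
        foreach ($ignoredPrefixes as $prefix) {
            if (strncmp($row['file'], $prefix, strlen($prefix)) === 0) {
                return false;
            }
        }
        return true;
    };
    return array_values(array_filter($rows, $keep));
}

/** Build the `git blame` command that reports who last touched one line. */
function blameCommand(string $file, int $line): string
{
    return sprintf('git blame -L %d,%d --porcelain -- %s', $line, $line, escapeshellarg($file));
}
```

Running the command from `blameCommand()` for each surviving finding gives the author and timestamp of the offending line, which is enough to route the report to the right developer.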
    43. hphp-try runs in Jenkins. Oops: Console Output gives details.
    44. Work in Progress
       • It took a lot of work to get the code base in shape so we could add a pre-commit hook.
       • Over 200 real problems were identified at first.
       • We still have some checks blacklisted since we are still cleaning up legacy code (and figuring out how HpHp works)
    45. Can We Do Better?
    46. Checks aren’t that complicated
       • HpHp’s runtime type-inference isn’t used for static analysis (good, since type-inference is hard)
       • All checks are fairly simple book-keeping.
       • All could be done in CodeSniffer/AST, but too slow
    47. Slice off HpHp?
       • The HpHp runtime is nice, but really complicated and a moving target.
       • Can we slice out the analysis part of HpHp?
       • Much simpler to build, easier to hack on.
    48. Or Build New?
       • Can this run off “byte code” or hook into the parsing step of PHP?
       • Exec a snippet of PHP for loading the script files?
       • Seems feasible
    49. Acknowledgments and References
    50. Thanks
       • The Facebook Team!
       • Sebastian Bergmann, who first blogged about using HpHp for static analysis
       • Rasmus, who first hacked up a version of HpHp in house at Etsy
       • The QA and DevTools teams at Etsy
       • All the Etsy developers who had some painful weeks getting the code in shape!
    51. Facebook References
       • https://github.com/facebook/hiphop-php Main source repo + wiki
       • http://developers.facebook.com/blog/post/2010/02/02/hiphop-for-php--move-fast/ Main announcement, 2010-02-02
       • https://www.facebook.com/note.php?note_id=416880943919 Update 2012-08-13
    52. Notes from Sebastian Bergmann
       • http://sebastian-bergmann.de/archives/894-Using-HipHop-for-Static-Analysis.html Static Analysis Intro, 2010-07-27
       • http://sebastian-bergmann.de/archives/918-Static-Analysis-with-HipHop-for-PHP.html Tool to help process output, 2012-01-27
    53. Misc References
       • http://arstechnica.com/business/2011/12/facebook-looks-to-fix-php-performance-with-hiphop-virtual-machine/ ArsTechnica overview, 2011-12-13
       • http://www.serversidemagazine.com/news/10-questions-with-facebook-research-engineer-andrei-alexandrescu/ Lots of good stuff in here, 2012-01-29
    54. This Talk
       • These slides are posted at http://slidesha.re/KzTfLy
       • Tools for building on CentOS: https://github.com/client9/hphp-tools
       • More about Nick Galbreath: http://client9.com/
    55. Nick Galbreath, nickg@etsy.com, @ngalbreath. PHPDay Verona, Italy, May 19, 2012. http://2012.phpday.it/