Static Analysis for PHP, from PHPDay Italy 2012

Techniques for static analysis of PHP, including Facebook's HipHop for PHP (HpHp) and ClamAV. First presented at PHPDay 2012, Verona, Italy, May 19, 2012. Companion source code is available at https://github.com/client9/hphp-tools

  1. Static Analysis for PHP. PHPDay Verona, Italy 2012. Nick Galbreath @ngalbreath nickg@etsy.com
  2. Slides: http://slidesha.re/KzTfLy; code: github.com/client9/hphp-tools
  3. Static Analysis
     • Typically analyzes source code “at rest” for bugs, security problems, leaks, and threading problems.
     • We’ll cover simple checks and HpHp.
     • Some commercial tools exist too. Veracode runs off of PHP byte code: http://www.veracode.com/products/static
  4. Dynamic Analysis: analysis of code while it is running
     • valgrind http://valgrind.org/
     • xdebug http://xdebug.org/
     • xhprof http://pecl.php.net/package/xhprof
     Great tools, but not for this talk.
  5. Simple Static Analysis
  6. The Littlest Static Analysis: php -l
     • Syntax errors should never be committed.
     • Syntax errors should never go to prod!
     • Make sure dev and prod versions of PHP are identical.
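The `php -l` check can be wrapped in a small script for use in a commit hook. This is a minimal sketch, not the talk's actual tooling: the function names `lint_file` and `count_lint_failures` are made up for illustration, and it assumes the interpreter running the script (`PHP_BINARY`) is the same version as production.

```php
<?php
// Hypothetical helper: run `php -l` on one file and report whether it parsed.
function lint_file(string $path): bool
{
    // PHP_BINARY is the interpreter running this script (assumed to match prod).
    $cmd = escapeshellarg(PHP_BINARY) . ' -l ' . escapeshellarg($path) . ' 2>&1';
    exec($cmd, $output, $exitCode);

    // `php -l` exits with 0 when the file has no syntax errors.
    return $exitCode === 0;
}

// Hypothetical helper: count how many of the given files fail the lint.
function count_lint_failures(array $paths): int
{
    $failures = 0;
    foreach ($paths as $path) {
        if (!lint_file($path)) {
            fwrite(STDERR, "Syntax error in $path\n");
            $failures++;
        }
    }
    return $failures;
}
```

A hook would call `count_lint_failures()` with the changed files and reject the commit when the result is greater than zero.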
  7. PHP Leading Whitespace
     Pre-commit check that every file starts with exactly either #! or <?php.
  8. PHP Trailing Whitespace
     Check that the file ends exactly with ?> or make sure it doesn’t have a closing tag.
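Both whitespace checks come down to looking at the first and last bytes of each file. The sketch below is one possible reading of them; the function names are hypothetical, and the end-of-file check is a plain text heuristic rather than a real parse.

```php
<?php
// Hypothetical pre-commit helpers; names are illustrative only.

// True when the source begins exactly with "#!" or the PHP open tag,
// so no byte-order mark, space, or blank line can leak into the output.
function has_clean_start(string $source): bool
{
    return strncmp($source, '#!', 2) === 0
        || strncmp($source, '<?php', 5) === 0;
}

// True when the file either has no closing tag at its end,
// or ends exactly with the closing tag and nothing after it.
function has_clean_end(string $source): bool
{
    $trimmed = rtrim($source);
    if (substr($trimmed, -2) !== '?>') {
        // No closing tag at the end: trailing whitespace is still inside PHP mode.
        return true;
    }
    // Closing tag present: anything after it would be sent as output.
    return $trimmed === $source;
}
```

Stock PHP swallows a single newline directly after the closing tag, so this check is deliberately stricter than the interpreter, matching the slide's "exactly".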
  9. Anti-Virus On Source Code
     • It’s static analysis too!
     • Not so concerned with PHP, but do you have JavaScript, Flash, Word, PowerPoint, PDFs, or ZIPs in your source tree?
  10. ClamAV
      • http://www.clamav.net/
      • Free anti-virus.
      • Available on every OS.
  11. ClamAV Performance: 1G of source code per minute. Why not do it?
  12. Advanced Static Analysis
  13. Why not use... AST? http://docs.php.net/manual/en/function.token-get-all.php
      • token_get_all($source) takes PHP source code as a string and returns a flat array of tokens, the raw material for building an Abstract Syntax Tree in PHP.
      • Orders of magnitude slower -- can’t use for pre-commit check on large code bases
      • Too low level -- need to turn it into an intermediate representation.
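To see why the tokenizer is "too low level", it helps to look at what it returns. The sketch below uses only the built-in `token_get_all()` and `token_name()`; the wrapper name `describe_tokens` is made up for illustration.

```php
<?php
// Hypothetical helper: turn source text into a list of [token name, token text].
function describe_tokens(string $source): array
{
    $described = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens are [token id, text, line number].
            $described[] = [token_name($token[0]), $token[1]];
        } else {
            // Single-character tokens such as "+" come back as plain strings.
            $described[] = ['char', $token];
        }
    }
    return $described;
}

// Usage: print one token per line for a tiny script.
foreach (describe_tokens('<?php echo $total + 1;') as [$name, $text]) {
    printf("%-14s %s\n", $name, json_encode($text));
}
```

The result is a flat stream that still includes whitespace tokens. Nesting, scopes, and call arguments all have to be reconstructed by the caller before any of the checks described later become possible.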
  14. Why not use... CodeSniffer? http://pear.php.net/package/PHP_CodeSniffer
      • Excellent tool, but...
      • Based on token_get_all
      • SAX-style API
      • Too slow for pre/post commit hooks
  15. Why not use...
      • php-SAT: Orphaned 2009
      • php-AST: Orphaned 2008
      • phc: active but doesn’t support... OBJECTS
      • Every other PHP to Java translator or converter is orphaned or has other problems
  16. Facebook’s HpHp
      • A full re-implementation of Apache+PHP
      • Compiles PHP to its own byte code format and executes it in its own runtime.
      • May also translate to C++ for other compilation or use JIT
      • Does type-inference for speed-ups
      • Also includes an HTTP web server
  17. Bad News #1: No action since 2011-12-06
      Facebook appears to use “code drops” instead of a true “streaming” open source model. BOO.
  18. Bad News #2: Missing Many Common Modules
      • Has: apc, array, bcmath, bzip2, ctype, curl, iconv, gd, imap, ipc, json, ldap, math, mb, mcrypt, memcache, mysql, network, openssl, pdo, posix, preg, process, session, simplexml, soap, socket, sqlite3, stream, string, thread, thrift, url, xml*, zlib.
      • That’s it! (No filter_var, no ftp, no ..)
  19. Bad News #3: Doesn’t Track PHP 5.3. PHP 5.4? No way!
      • Some function signatures aren’t quite right, e.g. debug_print_backtrace:
        • HpHp: 2 arguments
        • PHP 5.3.6: 3 arguments
        • PHP 5.4: 4 arguments
      • (You end up needing to whitelist this to ignore false positives)
  20. Bad News #4: Seriously #*$%&!# annoying to build
      • My crappy CentOS build script: https://github.com/client9/hphp-tools
      • Ubuntu users are slightly better off (see HpHp wiki)
      • Takes hours.
  21. Bad News #5: Won’t help with Dynamic Evaluation
        $fn = "foo";
        $fn(1,2,3);            // function not found
        eval("foo(1,2,3);");   // no
      • This is more for runtime dynamic analysis.
      • Try to avoid this anyway.
  22. Conclusion
      • You aren’t going to run your application under HpHp (at least not as is)
      • But it has a great static analyzer that works and finds real bugs really fast.
      • Scans thousands of files in a few seconds
  23. Using HpHp
  24. Step 1: Make a constants file
      • HpHp doesn’t know about hardwired constants
      • Nifty script generates the constants
      • May need to hand edit
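The slides do not show the generator script (the real one lives in the hphp-tools repository). As a rough sketch of the idea, and assuming "hardwired constants" means the ones built into PHP and its extensions, the built-in `get_defined_constants()` can dump them as `define()` lines. The function name `render_constants_file` and the output layout are assumptions, not the talk's format.

```php
<?php
// Hypothetical generator: emit one define() line per built-in constant.
function render_constants_file(): string
{
    $lines = ['<?php'];
    // Passing true groups the constants by the extension that defines them.
    foreach (get_defined_constants(true) as $category => $constants) {
        if ($category === 'user') {
            continue; // skip constants defined by our own code
        }
        foreach ($constants as $name => $value) {
            if (!is_scalar($value) && $value !== null) {
                continue; // resources such as STDIN cannot be written as literals
            }
            $lines[] = sprintf(
                'define(%s, %s);',
                var_export($name, true),
                var_export($value, true)
            );
        }
    }
    return implode("\n", $lines) . "\n";
}
```

Writing the result to a file, for example with `file_put_contents('constants.php', render_constants_file())`, gives a starting point that, as the slide says, may still need hand editing.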
  25. Step 2: Make a stubs file
      • HpHp doesn’t have many binary extensions
      • But... the analyzer doesn’t care. Just make a stub function.
        // http://php.net/manual/en/function.filter-var.php
        function filter_var($var, $filter=0, $options=NULL) { return $var; }
      You can make stub classes as well.
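A stub class follows the same pattern as a stub function: declare the names and signatures the analyzer needs and give them trivial bodies. In the sketch below, `ExampleExtensionClient` is a made-up class name standing in for whatever extension class is missing, and the `function_exists` / `class_exists` guards are only there so the file can also be loaded by stock PHP. Whether the analyzer accepts conditional declarations is not covered by the slides, so the guards may need to be dropped.

```php
<?php
// Stub for a function from an extension the slides list as missing (ftp).
if (!function_exists('ftp_connect')) {
    function ftp_connect($host, $port = 21, $timeout = 90)
    {
        return false; // the body is irrelevant to the analyzer
    }
}

// Hypothetical class standing in for an extension-provided class.
if (!class_exists('ExampleExtensionClient')) {
    class ExampleExtensionClient
    {
        public function __construct($dsn = '')
        {
        }

        public function query($sql)
        {
            return array(); // only the method name and arity matter
        }
    }
}
```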
  26. Step 3: Create the file list
      • Create a list of all PHP files to be analyzed and include your constants and stubs files.
      • Ignore phpunit and other tests
      • HpHp implements much of PHP’s base functionality as PHP code (e.g. the Exception class is written in PHP). You need to add these system PHP files as well.
  27. correction: grep -v helper.idl.php | grep -v constants.php >> $JUNK
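The slides build the file list with shell tools. The same idea can be sketched in PHP with the SPL directory iterators. The function name `build_file_list` and the directory names it treats as tests (`test`, `tests`, `phpunit`) are assumptions for illustration.

```php
<?php
// Hypothetical helper: list .php files under $root, skip test directories,
// then append extra files such as the constants and stubs files.
function build_file_list(string $root, array $extraFiles = []): array
{
    $root = rtrim($root, '/');
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        $path = $file->getPathname();
        if ($file->getExtension() !== 'php') {
            continue;
        }
        // Match against the path relative to $root so the root's own
        // location never causes a false match.
        $relative = substr($path, strlen($root));
        if (preg_match('#/(tests?|phpunit)/#i', $relative)) {
            continue;
        }
        $files[] = $path;
    }
    sort($files);
    return array_merge($files, $extraFiles);
}
```

The system PHP files that ship with HpHp would be passed in through `$extraFiles`, alongside the constants and stubs files.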
  28. Step 4: Do it
      Include paths are a bit mysterious. You’ll have to play around to get it right.
  29. Step 5: Analyze it
      • /tmp/hphp/Stats.js contains some... statistics in JSON format.
      • /tmp/hphp/CodeError.js is where the good stuff is.
      • JSON format, includes: error type, file, line number, code snippet
  30. UseUndeclaredVariable
      • #1 bug.
      • Typically typos, scoping or cut-n-paste errors
      • Found frequently in error handling cases
        if (!$ok) { error_log("$user_id has a problem"); }
  31. TooManyArgument / TooFewArgument
      Too Many Arguments typically indicates the caller is confused and has logic errors (bug). Too Few Arguments is frequently a serious bug as PHP silently fails and defaults to null.
        hash_hmac('sha1', 'foo'); // ooops no key
  32. BadDefine / UsesEvaluation
        define($k, $v);
        eval("1+1;");
      • “Bad” since HpHp can’t compile it, but likely legal PHP.
      • Avoid using dynamic constant generation. Use a configuration file instead.
  33. UseUndeclaredGlobalVariable
      • HpHp only defines certain globals.
      • Used only by Smarty?
        • $GLOBALS[HTTP_SERVER_VARS]
        • $GLOBALS[HTTP_SESSION_VARS]
        • $GLOBALS[HTTP_COOKIE_VARS]
  34. UseVoidReturn
      Some function returns “nothing” but the value is used:
        function foo() {
            if (time() % 60 == 0) { return true; }
            // oops void
        }
        $now = foo(); // error
  35. RequiredAfterOptionalParam
        function foo($first, $second=2, $third) {
      • IMHO should be a PHP syntax error
      • Confusing
      • (Oops, I haven’t investigated behavior)
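One way to investigate the behavior is to ask PHP's reflection API how it treats the declaration. This is a small experiment, not part of the talk. Newer PHP versions may print a deprecation notice when compiling the function, which is itself a hint that the default is pointless.

```php
<?php
// The declaration from the slide, with a body added so it can run.
function foo($first, $second = 2, $third)
{
    return [$first, $second, $third];
}

$reflection = new ReflectionFunction('foo');

// Required parameters are counted up to the last one without a default,
// so a default placed before a required parameter cannot be skipped.
$requiredCount = $reflection->getNumberOfRequiredParameters();
$secondIsOptional = $reflection->getParameters()[1]->isOptional();

echo "required parameters: $requiredCount\n";
echo 'second is optional: ' . var_export($secondIsOptional, true) . "\n";
```

In other words, every caller still has to pass `$second` explicitly to reach `$third`, which supports the slide's "confusing" verdict.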
  36. DeclaredConstantTwice
      • Probably not invalid PHP, but HpHp analyzes all files at once.
      • Best to have one file that defines constants or just not use them.
  37. UnknownFunction / UnknownObjectMethod / UnknownClass / UnknownBaseClass
      • Is your file list complete?
      • Do you need to make stubs?
  38. BadPHPIncludeFile
      Likely a PHP file trying to include/require itself, an invalid file name, or an ambiguous autoloader.
  39. PHPIncludeFileNotFound
      • Really common
      • Probably unique to your autoloader.
      • Not sure I quite understand how HpHp computes file names and loads includes, requires, require_once... yet
  40. HpHp at Etsy
  41. Every Commit
      • Every commit gets checked in real-time
      • “try-server” also allows developers to test before committing.
      • Every day it finds and prevents bugs before they go live.
      • Almost no false positives (!!)
      • Developers love it (especially the Java groups)
  42. Analysis
      • CodeError.js is processed through a custom script.
      • The script has a large blacklist of checks or files we don’t care about (3rd party, known bad, etc).
      • File and line info is passed through to git blame to find the author and date/time.
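The custom script is not shown in the slides, so the following is only a sketch of the filtering and blame steps. It assumes the findings have already been normalized into arrays with `type`, `file`, and `line` keys, which is a hypothetical shape and not the real CodeError.js layout. The function names are made up as well.

```php
<?php
// Hypothetical filter: drop findings for ignored checks or ignored path prefixes.
function filter_findings(array $findings, array $ignoredChecks, array $ignoredPrefixes): array
{
    $kept = array_filter(
        $findings,
        function (array $finding) use ($ignoredChecks, $ignoredPrefixes) {
            if (in_array($finding['type'], $ignoredChecks, true)) {
                return false;
            }
            foreach ($ignoredPrefixes as $prefix) {
                if (strncmp($finding['file'], $prefix, strlen($prefix)) === 0) {
                    return false;
                }
            }
            return true;
        }
    );
    return array_values($kept);
}

// Build a `git blame` command for one finding. -L limits the output to a
// single line and --porcelain makes it machine-readable.
function blame_command(array $finding): string
{
    $line = (int) $finding['line'];
    return sprintf(
        'git blame --porcelain -L %d,%d -- %s',
        $line,
        $line,
        escapeshellarg($finding['file'])
    );
}
```

Running the command from the repository root and reading the `author` and `author-time` fields of the porcelain output would give the author and timestamp the slide mentions.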
  43. hphp-try runs in Jenkins
      Oops: the Console Output gives details.
  44. Work in Progress
      • It took a lot of work to get the code base in shape so we could add a pre-commit hook.
      • Over 200 real problems were identified at first.
      • We have still blacklisted some checks since we are still cleaning up legacy code (and figuring out how HpHp works)
  45. Can We Do Better?
  46. Checks aren’t that complicated
      • HpHp’s runtime type-inference isn’t used for static analysis (good since type-inference is hard)
      • All checks are fairly simple book-keeping.
      • All could be done in CodeSniffer/AST, but that would be too slow
  47. Slice off HpHp?
      • The HpHp Runtime is nice, but really complicated and a moving target.
      • Can we slice out the analysis part of HpHp?
      • Much simpler to build, easier to hack on.
  48. Or Build New?
      • Can this run off “byte code” or hook into the parsing step of PHP?
      • Exec a snippet of PHP for loading the script files?
      • Seems feasible
  49. Acknowledgments and References
  50. Thanks
      • The Facebook Team!
      • Sebastian Bergmann, who first blogged about using HpHp for static analysis
      • Rasmus, who first hacked up a version of HpHp in house at Etsy
      • The QA and DevTools teams at Etsy
      • All the Etsy developers who had some painful weeks getting the code in shape!
  51. Facebook References
      • https://github.com/facebook/hiphop-php Main source repo + wiki
      • http://developers.facebook.com/blog/post/2010/02/02/hiphop-for-php--move-fast/ Main announcement, 2010-02-02
      • https://www.facebook.com/note.php?note_id=416880943919 Update 2012-08-13
  52. Notes from Sebastian Bergmann
      • http://sebastian-bergmann.de/archives/894-Using-HipHop-for-Static-Analysis.html Static Analysis Intro, 2010-07-27
      • http://sebastian-bergmann.de/archives/918-Static-Analysis-with-HipHop-for-PHP.html Tool to help process output, 2012-01-27
  53. Misc References
      • http://arstechnica.com/business/2011/12/facebook-looks-to-fix-php-performance-with-hiphop-virtual-machine/ ArsTechnica overview, 2011-12-13
      • http://www.serversidemagazine.com/news/10-questions-with-facebook-research-engineer-andrei-alexandrescu/ Lots of good stuff in here, 2012-01-29
  54. This Talk
      • These slides are posted at http://slidesha.re/KzTfLy
      • Tools for building on CentOS: https://github.com/client9/hphp-tools
      • More about Nick Galbreath: http://client9.com/
  55. Nick Galbreath nickg@etsy.com @ngalbreath. PHPDay Verona, Italy, May 19, 2012. http://2012.phpday.it/
