
Just another bughunt

It's not the bugs you know about that kill a website. It's the ones lurking just out of sight that get you. Learn how Lafayette College identified the Lovecraftian code horrors beneath its feet with tools like Splunk (server log analysis), OSSEC (server-side bad behavior monitor) and Siteimprove (web page auditing tool), and then surgically eliminated the problems. Examples include PHP scripts spewing error notices into logs, undiscovered CAS authentication failures, and thumbnail generation scripts that choke on large files.

Published in: Technology

  1. Just another bughunt? Tools to improve your site without nuking it from orbit Ken Newquist (@knewquist) | Charles Fulton (@mackensen) #DPA11
  2. Who we are ● Ken Newquist, Director, Web Applications Development, Lafayette College ● Charles Fulton, Senior Web Applications Developer, Lafayette College
  3. Rebuild or Fix? ● Your website’s problems may seem intractable ● The temptation to nuke the bugs and start fresh is strong ● We’ve found tools that identify the problems so we can surgically eliminate them ○ (and find a few issues we didn’t know about in the process)
  4. Tools
  5. Siteimprove ● Crawls web presence ● Reports broken links and common misspellings ● Shows changes over time ● Pretty graphs!
  6. Pretty graph!
  7. Splunk ● Log aggregation ● Real-time monitoring ● Rich analysis ● More pretty graphs!
  8. Another pretty graph!
  9. Nagios ● Real-time monitoring ● Defines a baseline of system performance ● Does not detect presence of dinosaurs
  10. Dinosaurs!
  11. OSSEC ● Log-based intrusion detection system ● Define states of acceptable behavior ● No pretty graphs
  12. Not a pretty graph :/
  13. Discovering your web presence ● Define expected behavior with OSSEC & Nagios ● Test expectations with Siteimprove & Splunk ● Here be monsters
  14. Investigations
  15. The Lost Thumbnails ● Site: Moodle ● Tools: Splunk, OSSEC ● Outcome: Improved Apache configuration
  16. Sky falling! ● Splunk reported ~400 HTTP 500 (Internal Server Error) responses within a few minutes ● It also showed concentrated bursts of 404 errors when users viewed resources ● Concern within the department that the sky was falling
  17. Sky not falling! ● The system ran out of memory generating thumbnails from massive images and threw 500s ● Previews of the missing images generated the 404s
  18. Outcomes ● Memory limits were not reasonable ● Users do not report catastrophic errors
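The burst of 500s in "The Lost Thumbnails" was found with Splunk, but the same signal can be approximated by bucketing access-log entries per minute. What follows is a minimal sketch, assuming Apache common or combined log format. The function name `error_bursts`, the `threshold` value and every sample line (addresses, paths, dates) are made up for illustration and do not come from the talk.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp, request line and status code of an Apache common/combined log entry.
LOG_LINE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) ')

def error_bursts(lines, status="500", threshold=3):
    """Return {minute: count} for minutes with at least `threshold` hits of `status`."""
    per_minute = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or m.group("status") != status:
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        # Bucket by the minute as logged (server wall-clock time).
        per_minute[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return {minute: n for minute, n in per_minute.items() if n >= threshold}

# Hypothetical sample lines; none of these values appear in the talk.
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2015:14:02:01 -0500] "GET /file.php/12/huge.jpg HTTP/1.1" 500 531',
    '192.0.2.11 - - [01/Jan/2015:14:02:17 -0500] "GET /file.php/12/huge.jpg HTTP/1.1" 500 531',
    '192.0.2.10 - - [01/Jan/2015:14:02:40 -0500] "GET /file.php/12/huge.jpg HTTP/1.1" 500 531',
    '192.0.2.12 - - [01/Jan/2015:14:02:41 -0500] "GET /file.php/12/thumb.jpg HTTP/1.1" 404 209',
    '192.0.2.10 - - [01/Jan/2015:14:05:03 -0500] "GET /file.php/12/huge.jpg HTTP/1.1" 500 531',
]

if __name__ == "__main__":
    for minute, count in sorted(error_bursts(SAMPLE).items()):
        print(minute, count)
```

Calling `error_bursts(lines, status="404")` on the same data would surface the accompanying 404 bursts. A real threshold would need tuning against normal traffic.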
  19. Comments ● Site: WordPress ● Tools: Splunk, OSSEC ● Outcome: WordPress core fixes
  20. What Lies Beneath ● 500 errors are reserved for server issues ● WordPress has notions of its own ○ Double-submitted comment? 500 error ○ Missing a required field? 500 error ○ Blank comment? 500 error ● OSSEC would ban all of these for bad behavior
  21. https://github.com/bigcompany/know-your-http
  22. Outcomes ● Learned reasonable mistakes can yield unreasonable error codes ● Hacked core to return 200s and 400s instead ● Core is discussing what to do ○ https://core.trac.wordpress.org/ticket/11286
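The idea behind the comment fix is that mistakes made by the client should produce 4xx responses, leaving 5xx for genuine server faults. The sketch below illustrates that idea in Python rather than WordPress's PHP, and it is not the patch applied at Lafayette. The function name `comment_status`, the validation rules and the choice of 409 for duplicates are assumptions; the slides say only that core was changed to return "200s and 400s".

```python
from http import HTTPStatus

def comment_status(author, body, seen_bodies):
    """Pick an HTTP status for a submitted comment (hypothetical rules)."""
    if not author or not author.strip():
        return HTTPStatus.BAD_REQUEST   # missing required field: client error
    if not body or not body.strip():
        return HTTPStatus.BAD_REQUEST   # blank comment: client error
    if body.strip() in seen_bodies:
        return HTTPStatus.CONFLICT      # double submission: still not a server fault
    return HTTPStatus.OK

if __name__ == "__main__":
    already_posted = {"Nice slides"}
    print(comment_status("ann", "Nice slides", already_posted))  # duplicate
    print(comment_status("ann", "   ", already_posted))          # blank
    print(comment_status("ann", "Thanks!", already_posted))      # accepted
```

With responses like these, a log monitor that bans clients for triggering server errors no longer punishes people for an honest double click.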
  23. Revenge of the Base Theme ● Site: WordPress ● Tools: Siteimprove ● Outcome: WordPress theme fix; Apache configuration change
  24. March 10: the day the links broke
  25. Nothing to see here … oh wait-- ● Developer dismissed initial reports of login issues as user error ● Then Siteimprove said we had 1,800 new broken links ● A two-character change in RHEL defaults for httpd.conf broke WordPress
  26. Lessons ● Small changes have vast consequences ● Documentation is doubleplusgood
  27. The Incredible Shrinking Provost ● Site: Drupal ● Tools: Splunk ● Outcome: Cleaned data in ERP system
  28. Who’s the fairest of them all? ● The directory passes the search query via a GET parameter ● Splunk told us our associate provost, “Jane Doe”, was the most-searched person by an order of magnitude
  29. ...we searched for “Jane Doe”... ...and the search returned... ...NOTHING!
  30. Lessons ● “Jane A. B. Doe !== Jane Doe” ● Data lies
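Because the directory puts the search term in a GET parameter, the most-searched names can be recovered from the access log alone. This is a minimal sketch of that kind of report, not the Splunk query used in the talk. The parameter name `q`, the `/directory` path and the sample lines are assumptions. The second helper, `first_last`, shows only why an exact string comparison fails for "Jane A. B. Doe"; the talk's actual fix was cleaning the data in the ERP system.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Request target of a GET/HEAD entry in an Apache access log.
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<url>\S+) HTTP/[\d.]+"')

def top_searches(lines, param="q", n=3):
    """Count search terms passed in `param`, ignoring case and extra spaces."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if not m:
            continue
        query = parse_qs(urlsplit(m.group("url")).query)
        for term in query.get(param, []):
            counts[" ".join(term.lower().split())] += 1
    return counts.most_common(n)

def first_last(name):
    """Reduce a display name to (first, last), dropping middle names and initials."""
    parts = name.replace(".", " ").lower().split()
    return (parts[0], parts[-1]) if parts else ()

# Hypothetical sample lines; the path and parameter name are assumptions.
SAMPLE = [
    '192.0.2.20 - - [01/Jan/2015:09:00:00 -0500] "GET /directory?q=Jane+Doe HTTP/1.1" 200 5120',
    '192.0.2.21 - - [01/Jan/2015:09:00:05 -0500] "GET /directory?q=jane%20doe HTTP/1.1" 200 5120',
    '192.0.2.22 - - [01/Jan/2015:09:00:09 -0500] "GET /directory?q=Jane++Doe HTTP/1.1" 200 5120',
    '192.0.2.23 - - [01/Jan/2015:09:01:00 -0500] "GET /directory?q=John+Smith HTTP/1.1" 200 5120',
    '192.0.2.24 - - [01/Jan/2015:09:02:00 -0500] "GET /about HTTP/1.1" 200 2048',
]

if __name__ == "__main__":
    print(top_searches(SAMPLE))
    print("Jane A. B. Doe" == "Jane Doe")
    print(first_last("Jane A. B. Doe") == first_last("Jane Doe"))
```

A term that dominates the counts yet returns no results, as in the slides, is a hint that the data behind the search is wrong, not the users.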
  31. Dumpster fire
  32. The Virtual Tour ● Site: Custom app ● Tools: Splunk ● Outcome: Fixed PHP bugs
  33. Pretty graphs! ● 238,908 errors...in three days ● (We didn’t expect that)
  34. Fixed it!
  35. Outcomes ● No one cares that we fixed the Virtual Tour ○ (we feel better though)
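When a log holds hundreds of thousands of errors, grouping them by message and location usually shows that a handful of lines of code produce most of the noise. The sketch below assumes entries in PHP's own error-log format. The slides do not say which errors the Virtual Tour produced, so the sample messages, file paths and the function name `noisiest_errors` are invented for illustration.

```python
import re
from collections import Counter

# Severity, message, file and line number of a PHP error-log entry.
PHP_ERROR = re.compile(
    r'PHP (?P<level>Notice|Warning|Deprecated|Fatal error|Parse error):\s+'
    r'(?P<message>.*?) in (?P<file>\S+) on line (?P<line>\d+)'
)

def noisiest_errors(lines, n=3):
    """Return the `n` most frequent (level, message, file, line) combinations."""
    counts = Counter()
    for line in lines:
        m = PHP_ERROR.search(line)
        if m:
            key = (m.group("level"), m.group("message"),
                   m.group("file"), int(m.group("line")))
            counts[key] += 1
    return counts.most_common(n)

# Hypothetical sample entries; not taken from the talk.
SAMPLE = [
    '[01-Jan-2015 10:00:00 America/New_York] PHP Notice:  Undefined index: stop in /var/www/tour/index.php on line 42',
    '[01-Jan-2015 10:00:01 America/New_York] PHP Notice:  Undefined index: stop in /var/www/tour/index.php on line 42',
    '[01-Jan-2015 10:00:02 America/New_York] PHP Warning:  Division by zero in /var/www/tour/map.php on line 7',
]

if __name__ == "__main__":
    for (level, message, path, lineno), count in noisiest_errors(SAMPLE):
        print(count, level, message, path, lineno)
```

Fixing the top one or two entries in such a report is often enough to make the remaining errors readable.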
  36. Mr. Foo and Mr. Bar ● Site: WordPress ● Tools: Splunk ● Outcome: Disproved long-standing alleged bug
  37. I swear I wasn’t there! ● Various reports over the years alleging that WordPress improperly reported another user was editing a post ● Much speculation and theorizing in absence of facts
  38. Outcomes ● People are wrong on the Internet
  39. The Cache That Wouldn’t Die ● Site: WordPress ● Tools: Nagios ● Outcome: Database size reduced by two-thirds
  40. Doom at 11… ● Nagios had concerns ● MySQL ran out of disk space ● Size of WordPress DB tripled in two weeks
  41. Pretty terminal dumps?
      SELECT option_name FROM wp_190_options WHERE option_name LIKE "displayed_gallery%";
      ...
      | displayed_gallery_rendering_ffffb5e48845fbb7b3347244f8aa06d4 |
      | displayed_gallery_rendering_ffffd6d9f2ab40195295c70f775b0ee8 |
      | displayed_gallery_rendering_ffffe1416b8d969e25ec7a6094282bbe |
      | displayed_gallery_rendering_ffffe8e4a0c399605f434bd51be2d9d7 |
      +--------------------------------------------------------------+
      722141 rows in set (2.28 sec)
  42. …Salvation at Noon ● The Google Mini found something terrible lurking in club websites ● NextGEN Gallery bug caused near-endless crawl by the Mini ● Code bug meant the cache never expired
  43. Outcomes ● NextGEN Gallery has stability issues ● Listen to Nagios ● It’s turtles all the way down
  44. Attack of the Python Script ● Site: WordPress ● Tools: Nagios, Splunk ● Outcome: Quickly identified source of massive load event
  45. Traffic Jam! ● Load on a server spiked to 800% ● Seemed bad ● Nagios had more concerns
  46. Hello there! ● Splunk real-time monitoring revealed top client IPs ● We’re very popular with a misconfigured IIS server in Oregon and its “Python-urllib/3.4” script
  47. Outcomes ● Banned the IP on the proxy ● Began developing rate-limiting rules for OSSEC
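Ranking clients by request count is what exposed the Python script, and the same ranking can be computed from a combined-format access log. The following is a rough sketch of such a report, assuming Apache combined log format. It is neither the Splunk search nor the OSSEC rules mentioned in the slides, and the function name `top_clients`, the addresses and the sample lines are illustrative.

```python
import re
from collections import Counter

# Client address and user agent of an Apache combined log entry.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def top_clients(lines, n=5):
    """Return the `n` busiest (ip, user_agent) pairs with their request counts."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:
            counts[(m.group("ip"), m.group("agent"))] += 1
    return counts.most_common(n)

# Hypothetical sample lines using documentation-only IP ranges.
SAMPLE = [
    '203.0.113.7 - - [01/Jan/2015:10:00:00 -0500] "GET / HTTP/1.1" 200 1024 "-" "Python-urllib/3.4"',
    '203.0.113.7 - - [01/Jan/2015:10:00:00 -0500] "GET /news HTTP/1.1" 200 2048 "-" "Python-urllib/3.4"',
    '198.51.100.4 - - [01/Jan/2015:10:00:01 -0500] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [01/Jan/2015:10:00:01 -0500] "GET /events HTTP/1.1" 200 4096 "-" "Python-urllib/3.4"',
]

if __name__ == "__main__":
    for (ip, agent), count in top_clients(SAMPLE):
        print(count, ip, agent)
```

Grouping by address and user agent together helps separate one runaway script from ordinary visitors who share a proxy address.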
  48. Alternatives
  49. Bughunting on the cheap ● W3C Link Checker ○ Reports on broken links to a specified depth ○ http://validator.w3.org/checklink ● Google Webmaster Tools ○ Details on broken links and server errors ○ https://www.google.com/webmasters/tools/
  50. More options ● Bureau of Internet Accessibility ○ Cheaper than Siteimprove ○ Broken link and accessibility reports ○ http://www.boia.org ● Google Analytics ○ Identify high-traffic broken pages ○ http://google.com/analytics ● vim | grep ○ Eyeballing your logs can’t hurt
  51. Conclusions
  52. Did we really fix all those errors? Or is logging broken?
  53. Takeaways ● Data are free ● Bugs are hard to find ● Reports are expensive ● Good reports make finding bugs easy ● You can improve your site without rebuilding it from scratch ● You will find more bugs than you can fix
  54. #DPA11
  55. Anatomy of a Redirect ● Tool: Splunk ● Forthcoming from Lafayette College ● WordPress tries to be helpful!
  56. Join the discussion at https://core.trac.wordpress.org/ticket/16557!
  57. Questions? ● Ken Newquist ○ newquisk@lafayette.edu ○ @knewquist ● Charles Fulton ○ fultonc@lafayette.edu ○ @mackensen
