Street Fighting Data Science

Practical problem solving with data involves more than just visualization or applying the latest machine learning techniques. Intuition, domain knowledge, and reasonable approximations can mean the difference between a successful model and a catastrophic failure. We’ll dive into some best practices I’ve extracted from solving real-world problems such as computing trending topics, cleaning election data, and ranking experts on social networks.

New analysts and engineers are often lost when textbook approaches fail on real-world data. Drawing inspiration from problem-solving techniques in mathematics and physics, we will walk through examples that illustrate how to come up with creative solutions and solve problems with big data.

  1. Street Fighting Data Science. Pete Skomoroch (@peteskomoroch), O’Reilly Strata Conference, February 28, 2012
  2. To solve hard problems:
  3. Think like a street fighter
  4. Analyze, Improvise, Anticipate, Adapt
  5. How does this apply to Data Science?
  6. Pricing model decreases profit in test stores by 30%
  7. What went wrong?
  8. • Ran complex “black box” model
     • Didn’t analyze the data first
     • Didn’t anticipate elasticity errors
  9. How could this have been avoided?
  10. The Men Who Stare at Charts
  11. Look at your data
  12. Raw Data: FEC Contributions
  13. not employed                            118672
      retired                                  32938
      self employed                            92973
      self-employed                            25454
      information requested                    17627
      information requested per best efforts    1313
      refused                                    728
      homemaker                                 4992
      unemployed                                1493
      the bank of new york                        65
      self-employed                             5919
      john mccain 2008                            57
      university of california                   825
      u.s. government                            121
      microsoft                                  915
      idt corp.                                   54
      university of chicago                      616
      merrill lynch                              273
      harvard university                         848
      blank rome l.l.p.                           51
      google                                     662
      department of defense                      100
      stanford university                        716
      u.s. army                                   90
      university of washington                   614
      us army                                    141
      ibm                                       1016
      none                                       642
      columbia university                        782
      greenberg traurig                          118
      university of michigan                     514
      northrop grumman                           105
      freelance                                  372
      at&t                                       141
      sa                                         150
      citigroup                                  134
      sidley austin llp                          509
      bridgewater associates                      44
      na                                         999
      univision communications inc.               36
  15. Katherine Alexandra
  16. “Don’t indulge in any unnecessary, sophisticated moves. You’ll get clobbered if you do, and in a street fight you’ll have your shirt zipped off you.” - Bruce Lee
  17. Look at your errors
  18. • Sanity check row counts
      • Track errors over time
      • Find patterns in the error data
      • Add missing features to models
      • Replace models entirely
  19. Analyze, Improvise, Anticipate, Adapt
  20. Think like a street fighter
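
The FEC counts in slide 13 show one category spelled several ways (“self employed” vs. “self-employed”, “u.s. army” vs. “us army”) plus several different ways of saying “no answer”. Below is a minimal Python sketch of collapsing such variants, assuming a hand-built alias table. The names `CANONICAL`, `normalize`, and `collapse` are hypothetical, treating “none”, “na”, and the “information requested” responses as missing is an illustrative assumption, and the counts are a small subset copied from the slide.

```python
from collections import Counter

# A subset of (raw value, count) pairs copied from the slide 13 table.
RAW_COUNTS = [
    ("self employed", 92973),
    ("self-employed", 25454),
    ("u.s. army", 90),
    ("us army", 141),
    ("information requested", 17627),
    ("information requested per best efforts", 1313),
    ("none", 642),
    ("na", 999),
    ("google", 662),
]

# Hypothetical alias table: normalized key -> canonical label.
# Deciding which variants are "the same" is a judgment call made by looking at the data.
CANONICAL = {
    "self employed": "self-employed",
    "us army": "u.s. army",
    "information requested": "missing",
    "information requested per best efforts": "missing",
    "none": "missing",
    "na": "missing",
}


def normalize(value: str) -> str:
    """Lowercase, trim, collapse internal whitespace, then apply the alias table."""
    key = " ".join(value.strip().lower().split())
    return CANONICAL.get(key, key)


def collapse(pairs):
    """Sum the counts of raw values that normalize to the same label."""
    totals = Counter()
    for value, count in pairs:
        totals[normalize(value)] += count
    return totals


if __name__ == "__main__":
    for label, total in collapse(RAW_COUNTS).most_common():
        print(f"{label:20s} {total:>8d}")
```

For this subset the two self-employed spellings merge into a single bucket of 118427. Keeping the alias table as plain data, rather than burying rules in code, makes it easy to review against the raw value counts each time new data arrives.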
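
The first three items in slide 18’s checklist can be made concrete with two small checks: comparing a pipeline stage’s row count against an expected value, and grouping absolute errors by time or by a feature to see where a model goes wrong. This is a minimal standard-library Python sketch. The function names, the 1% tolerance, and the toy prediction records are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from statistics import mean


class RowCountError(Exception):
    """Raised when a pipeline stage's row count drifts too far from expectations."""


def check_row_count(stage, actual, expected, tolerance=0.01):
    """Sanity-check a stage's output size against an expected count (relative tolerance)."""
    if expected <= 0:
        raise ValueError(f"{stage}: expected row count must be positive")
    drift = abs(actual - expected) / expected
    if drift > tolerance:
        raise RowCountError(
            f"{stage}: got {actual} rows, expected about {expected} ({drift:.1%} off)"
        )


def mean_abs_error_by(records, key):
    """Mean absolute error per group, where key(record) picks the grouping.

    Grouping by date tracks errors over time; grouping by a feature value
    helps surface patterns in where the model is wrong.
    """
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(abs(r["predicted"] - r["actual"]))
    return {k: mean(errs) for k, errs in groups.items()}


# Toy prediction log (hypothetical values, for illustration only).
SAMPLE_RECORDS = [
    {"day": "mon", "store": "A", "predicted": 10.0, "actual": 12.0},
    {"day": "mon", "store": "B", "predicted": 20.0, "actual": 20.0},
    {"day": "tue", "store": "A", "predicted": 11.0, "actual": 15.0},
    {"day": "tue", "store": "B", "predicted": 19.0, "actual": 20.0},
]

if __name__ == "__main__":
    check_row_count("load", actual=1000, expected=1000)
    print(mean_abs_error_by(SAMPLE_RECORDS, lambda r: r["day"]))    # errors over time
    print(mean_abs_error_by(SAMPLE_RECORDS, lambda r: r["store"]))  # errors by feature
```

In the toy data, store A’s mean error is six times store B’s. A gap like that is the kind of pattern that may point to a missing feature rather than a need for a more sophisticated model.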
