Street Fighting Data Science, Pete Skomoroch (@peteskomoroch), O’Reilly Strata...
Practical problem solving with data involves more than just visualization or applying the latest machine learning techniques. Intuition, domain knowledge, and reasonable approximations can mean the difference between a successful model and a catastrophic failure. We’ll dive into some best practices I’ve extracted from solving real-world problems like computing trending topics, cleaning election data, and ranking experts on social networks.

New analysts or engineers are often lost when textbook approaches fail on real-world data. Drawing inspiration from problem-solving techniques in mathematics and physics, we will walk through examples that illustrate how to come up with creative solutions and solve problems with big data.

Transcript of "Street Fighting Data Science"

  1. Street Fighting Data Science. Pete Skomoroch, @peteskomoroch. O’Reilly Strata Conference, February 28, 2012
  2. To solve hard problems:
  3. Think like a street fighter
  4. Analyze, Improvise, Anticipate, Adapt
  5. How does this apply to Data Science?
  6. Pricing model decreases profit in test stores by 30%
  7. What went wrong?
  8. • Ran complex “black box” model
     • Didn’t analyze the data first
     • Didn’t anticipate elasticity errors
  9. How could this have been avoided?
  10. The Men Who Stare at Charts
  11. Look at your data
  12. Raw Data: FEC Contributions
  13. not employed              118672    retired                                 32938
      self employed              92973    self-employed                           25454
      information requested      17627    information requested per best efforts   1313
      refused                      728    homemaker                                4992
      unemployed                  1493    the bank of new york                       65
      self-employed               5919    john mccain 2008                           57
      university of california     825    u.s. government                           121
      microsoft                    915    idt corp.                                  54
      university of chicago        616    merrill lynch                             273
      harvard university           848    blank rome l.l.p.                          51
      google                       662    department of defense                     100
      stanford university          716    u.s. army                                  90
      university of washington     614    us army                                   141
      ibm                         1016    none                                      642
      columbia university          782    greenberg traurig                         118
      university of michigan       514    northrop grumman                          105
      freelance                    372    at&t                                      141
      sa                           150    citigroup                                 134
      sidley austin llp            509    bridgewater associates                     44
      na                           999    univision communications inc.              36
  14. (Same table as slide 13)
  15. Katherine Alexandra
  16. “Don’t indulge in any unnecessary, sophisticated moves. You’ll get clobbered if you do, and in a street fight you’ll have your shirt zipped off you.” - Bruce Lee
  17. Look at your errors
  18. • Sanity check row counts
     • Track errors over time
     • Find patterns in the error data
     • Add missing features to models
     • Replace models entirely
  19. Analyze, Improvise, Anticipate, Adapt
  20. Think like a street fighter
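
The "Look at your data" and "Sanity check row counts" slides can be illustrated with a small sketch. It is not taken from the talk. It takes a few label/count pairs copied from slide 13, folds spelling variants such as "self employed" / "self-employed" and "u.s. army" / "us army" into one key, and then checks that no rows were lost in the merge. The names `RAW_COUNTS`, `normalize`, and `merge_counts` and the normalization rules themselves are assumptions made for illustration; real cleanup of this data would likely need more rules.

```python
import re
from collections import Counter

# A few (label, count) pairs copied from slide 13. The selection is
# illustrative; it is not the full table.
RAW_COUNTS = [
    ("self employed", 92973),
    ("self-employed", 5919),
    ("not employed", 118672),
    ("unemployed", 1493),
    ("u.s. army", 90),
    ("us army", 141),
]


def normalize(label: str) -> str:
    """Fold trivial spelling variants into one key (assumed rules)."""
    label = label.lower().replace("-", " ")   # "self-employed" -> "self employed"
    label = re.sub(r"[.,]", "", label)        # "u.s. army" -> "us army"
    return re.sub(r"\s+", " ", label).strip() # collapse repeated whitespace


def merge_counts(pairs):
    """Sum the counts of all labels that share a normalized key."""
    merged = Counter()
    for label, count in pairs:
        merged[normalize(label)] += count
    return merged


merged = merge_counts(RAW_COUNTS)

# Sanity check row counts: merging labels must not create or drop rows.
assert sum(merged.values()) == sum(count for _, count in RAW_COUNTS)

for label, count in merged.most_common():
    print(f"{count:>7}  {label}")
```

Note that the rules deliberately leave "not employed" and "unemployed" as separate keys. Whether those should be merged is a judgment call that needs a look at the underlying records, which is the point of the slide.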