- 1. WINNING<br />WITH<br />BIG <br />DATA<br />Secrets of the Successful<br />Data Scientist<br />SDForum BI SIG<br />June 15, 2010<br />Michael Driscoll<br />@dataspora<br />
- 2. WHY DATA<br />MATTERS<br />NOW<br />
- 3. THE INDUSTRIAL<br />AGE <br />OF <br />DATA<br />
- 4. WHAT IS <br />BIG DATA?<br />Data that is distributed.<br />
- 5. WHAT IS<br />DATA <br />SCIENCE?<br />
- 6. WHY DATA SCIENCE<br />IS SEXY<br />
- 7. “The sexy job in the next ten years will be statisticians…”<br />- Hal Varian<br />
- 8.
- 9. data<br />model<br />1000 bytes<br />2 bytes<br />
- 10. 9 WAYS TO WIN<br />WITH DATA<br />
- 11. 1. CHOOSE THE<br />RIGHT TOOL<br />You don’t need a chainsaw to cut butter.<br />
- 12. 2. COMPRESS EVERYTHING<br />mysqldump -u myuser -pmypass sourceDB | <br />gzip | ssh mike@dataspora.com "cat - | <br />gunzip | mysql -u myuser -pmypass targetDB"<br />The world is IO-bound.<br />
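
The point of the pipeline on this slide is that compressing before the network hop moves far fewer bytes. A minimal Python sketch of the same idea, assuming a made-up, highly repetitive SQL dump as the payload (the `compressed_ratio` helper and the sample data are illustrative, not from the talk):

```python
import gzip

def compressed_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(payload)) / len(payload)

# Hypothetical payload: SQL dumps tend to be repetitive, so they compress well.
dump = b"INSERT INTO calls VALUES (1, '2010-06-15', 42);\n" * 1000

ratio = compressed_ratio(dump)

# Round trip: what gunzip does on the receiving side of the pipe.
restored = gzip.decompress(gzip.compress(dump))
```

Real dumps are less repetitive than this toy payload, so the ratio seen in practice will be less dramatic.
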
- 13. 3. SPLIT UP<br />YOUR DATA<br />Split, apply, combine.<br />
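
"Split, apply, combine" can be sketched in a few lines of plain Python: split rows into groups by a key, apply a function to each group, and combine the results into one structure. The function name, the field names, and the sample rows below are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def split_apply_combine(rows, key, value, apply=mean):
    """Group rows by `key`, apply `apply` to each group's values, return a dict."""
    groups = defaultdict(list)
    for row in rows:                      # split
        groups[row[key]].append(row[value])
    # apply the function per group, then combine into a single dict
    return {k: apply(v) for k, v in groups.items()}

# Hypothetical call records
calls = [
    {"region": "west", "minutes": 10},
    {"region": "west", "minutes": 20},
    {"region": "east", "minutes": 6},
]

avg_minutes = split_apply_combine(calls, "region", "minutes")
```

Because each group is processed independently, the apply step is the part that can be spread across machines when the data is distributed.
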
- 14. 4. WORK <br />WITH SAMPLES<br />perl -ne "print if (rand() < 0.01)" <br /> data.csv > sample.csv<br />Big Data is heavy, <br />samples are light.<br />
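
For readers who prefer Python, the Perl one-liner on this slide can be approximated as below: each line is kept independently with probability `rate`. This is a sketch, not the author's code; `sample_lines` and its `seed` parameter are assumed names, and like the one-liner it gives a CSV header row no special treatment.

```python
import random

def sample_lines(lines, rate=0.01, seed=None):
    """Yield each line independently with probability `rate`."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    for line in lines:
        if rng.random() < rate:
            yield line

# Streaming use on a file (paths taken from the slide):
# with open("data.csv") as src, open("sample.csv", "w") as dst:
#     dst.writelines(sample_lines(src))

# Small in-memory demonstration
rows = [f"row{i}\n" for i in range(100_000)]
sample = list(sample_lines(rows, rate=0.01, seed=42))
```

The sample size is only approximately 1% of the input, since every line is an independent coin flip.
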
- 15. 5. USE<br />STATISTICS<br />
- 16. 6. COPY<br />FROM OTHERS<br />git clone git://github.com/kevinweil/hadoop-lzo<br />Use open source.<br />
- 17. 7. ESCHEW CHART TYPOLOGIES<br />Charts are compositions,<br />not containers.<br />
- 18. 8. COLOR WITH CARE<br />Color can enhance <br />or insult.<br />
- 19. 9. TELL A STORY<br />People are listening.<br />
- 20. ONE <br />SUCCESS<br />STORY<br />
- 21. WHY DO TELCO CUSTOMERS LEAVE?<br />Sign up<br />Leave<br />Goal: “less churn.”<br />
- 22. DATA:<br />BILLIONS<br />OF CALLS<br />… and millions of callers.<br />
- 23. DOES CALL <br />QUALITY<br />MATTER?<br />… a difference,<br />but not significant.<br />
- 24. WHAT ABOUT<br />SOCIAL<br />NETWORKS?<br />Hmmm...<br />
- 25. BUILD THE <br />CALL GRAPH<br />… but is it predictive?<br />
- 26. EVOLUTION OF A CALL GRAPH<br />April<br />
- 27. EVOLUTION OF A CALL GRAPH<br />May<br />
- 28. EVOLUTION OF A CALL GRAPH<br />June<br />
- 29. EVOLUTION OF A CALL GRAPH<br />July<br />
- 30. 700% INCREASE<br />IN CHURN<br />when a cancellation<br />occurs in a call network.<br />
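
One way a comparison like this could be computed, as a rough sketch rather than the analysis behind the slide: build neighbor sets from the call graph, mark subscribers who called someone that cancelled in an earlier period, and compare later churn rates between exposed and unexposed subscribers. The function name, its inputs, and the toy data are all hypothetical.

```python
from collections import defaultdict

def churn_rates(edges, cancelled_earlier, churned_later):
    """Return (exposed_rate, unexposed_rate) of later churn.

    A subscriber is "exposed" if at least one call-graph neighbor
    cancelled in the earlier period.
    """
    neighbors = defaultdict(set)
    for a, b in edges:                    # undirected call graph
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Only subscribers still active after the earlier period can churn later.
    active = set(neighbors) - cancelled_earlier
    exposed = {s for s in active if neighbors[s] & cancelled_earlier}
    unexposed = active - exposed

    def rate(group):
        return len(group & churned_later) / len(group) if group else 0.0

    return rate(exposed), rate(unexposed)

# Toy data, made up for illustration only
edges = [("a", "b"), ("a", "c"), ("d", "e"), ("f", "g")]
exposed_rate, unexposed_rate = churn_rates(
    edges, cancelled_earlier={"a"}, churned_later={"b", "d"}
)
relative_increase = exposed_rate / unexposed_rate - 1
```

In the toy data the exposed group churns at twice the unexposed rate, a 100% increase; the figure on the slide comes from the real call records, not from this sketch.
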
- 31. FINAL <br />THOUGHTS<br />
- 32. THE BIG DATA STACK<br />Actions<br />Data Products<br />(Content Filters, Rec Engines)<br />Analytics<br />(R, SPSS, SAS, SAP)<br />Insights<br />Big Data<br />Dedicated RDBMS <br />Data<br />
- 33. THANKS!<br />QUESTIONS?<br />Michael Driscoll<br />med@dataspora.com<br />@dataspora on Twitter<br />http://www.dataspora.com/blog<br />SDForum BI SIG<br />June 15, 2010<br />

