Driscoll bi sig_15_jun2010

  1. WINNING WITH BIG DATA: Secrets of the Successful Data Scientist. SDForum BI SIG, June 15, 2010. Michael Driscoll (@dataspora).
  2. WHY DATA MATTERS NOW
  3. THE INDUSTRIAL AGE OF DATA
  4. WHAT IS BIG DATA? Data that is distributed.
  5. WHAT IS DATA SCIENCE?
  6. WHY DATA SCIENCE IS SEXY
  7. “The sexy job in the next ten years will be statisticians…” - Hal Varian
  8. (Image-only slide.)
  9. Data: 1000 bytes. Model: 2 bytes.
  10. 9 WAYS TO WIN WITH DATA
  11. 1. CHOOSE THE RIGHT TOOL. You don’t need a chainsaw to cut butter.
  12. 2. COMPRESS EVERYTHING
      mysqldump -u myuser -pmypass sourceDB | gzip | ssh mike@dataspora.com "gunzip | mysql -u myuser -pmypass targetDB"
      The world is IO-bound.
  13. 3. SPLIT UP YOUR DATA. Split, apply, combine.
  14. 4. WORK WITH SAMPLES
      perl -ne "print if (rand() < 0.01)" data.csv > sample.csv
      Big Data is heavy, samples are light.
  15. 5. USE STATISTICS
  16. 6. COPY FROM OTHERS
      git clone git://github.com/kevinweil/hadoop-lzo
      Use open source.
  17. 7. ESCHEW CHART TYPOLOGIES. Charts are compositions, not containers.
  18. 8. COLOR WITH CARE. Color can enhance or insult.
  19. 9. TELL A STORY. People are listening.
  20. ONE SUCCESS STORY
  21. WHY DO TELCO CUSTOMERS LEAVE? (Diagram: Sign up → Leave.) Goal: “less churn.”
  22. DATA: BILLIONS OF CALLS … and millions of callers.
  23. DOES CALL QUALITY MATTER? … a difference, but not significant.
  24. WHAT ABOUT SOCIAL NETWORKS? Hmmm...
  25. BUILD THE CALL GRAPH … but is it predictive?
  26. EVOLUTION OF A CALL GRAPH: April
  27. EVOLUTION OF A CALL GRAPH: May
  28. EVOLUTION OF A CALL GRAPH: June
  29. EVOLUTION OF A CALL GRAPH: July
  30. 700% INCREASE IN CHURN when a cancellation occurs in a call network.
  31. FINAL THOUGHTS
  32. THE BIG DATA STACK: Actions / Data Products (Content Filters, Rec Engines) / Analytics (R, SPSS, SAS, SAP) / Insights / Big Data / Dedicated RDBMS / Data
  33. THANKS! QUESTIONS? Michael Driscoll, med@dataspora.com, @dataspora on Twitter, http://www.dataspora.com/blog. SDForum BI SIG, June 15, 2010.
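
The "data / model" slide (1000 bytes versus 2 bytes) suggests that a model is a compressed summary of its data. The sketch below illustrates that idea with an ordinary least-squares line, which reduces any number of (x, y) points to two numbers: an intercept and a slope. This is our own illustration, not the slide's actual figure. The function name `fit_line` and the synthetic data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b).

    However many points go in, only two numbers come out: the model is a
    (lossy) summary of the data.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical data: 1,000 noisy points scattered around y = 2 + 0.5 * x.
    rng = random.Random(0)
    xs = [float(i) for i in range(1000)]
    ys = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    a, b = fit_line(xs, ys)
    print(f"{len(xs)} points summarised as intercept={a:.2f}, slope={b:.3f}")
```

The fit discards the noise, so it is lossy. That loss is the point: the two parameters carry the structure a decision-maker needs.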
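
The "COMPRESS EVERYTHING" pipeline gzips a database dump on the fly, so fewer bytes cross the network. The minimal Python sketch below shows the same streaming idea for readers working outside a shell. It uses only the standard-library `gzip`, `io`, and `shutil` modules, and an in-memory buffer stands in for the ssh connection. The function names and sample rows are hypothetical.

```python
import gzip
import io
import shutil


def gzip_stream(src, dst, chunk_size=1 << 16):
    """Compress everything read from binary stream `src` into `dst`.

    Data is copied in fixed-size chunks, so memory use stays flat no matter
    how large the input is (the same property the shell pipeline relies on).
    Closing the GzipFile writes the gzip trailer but leaves `dst` open.
    """
    with gzip.GzipFile(fileobj=dst, mode="wb") as gz:
        shutil.copyfileobj(src, gz, chunk_size)


def gunzip_stream(src, dst, chunk_size=1 << 16):
    """Inverse of gzip_stream: decompress `src` into `dst` chunk by chunk."""
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        shutil.copyfileobj(gz, dst, chunk_size)


if __name__ == "__main__":
    # Hypothetical table dump: repetitive rows like these compress very well.
    rows = "".join(f"{i},caller_{i % 100},OK\n" for i in range(100_000)).encode()

    packed = io.BytesIO()
    gzip_stream(io.BytesIO(rows), packed)
    print(f"raw: {len(rows)} bytes, gzipped: {packed.tell()} bytes")

    # Round trip: what arrives on the other end matches what was sent.
    packed.seek(0)
    restored = io.BytesIO()
    gunzip_stream(packed, restored)
    assert restored.getvalue() == rows
```

When a job is IO-bound, the CPU time spent compressing is usually cheaper than the bytes it saves on the wire.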
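
"Split, apply, combine" names a general pattern:

- break the data into groups by some key;
- process each group independently;
- merge the per-group results.

The minimal Python sketch below shows the pattern. The helper `split_apply_combine` and the call records are hypothetical. Libraries such as pandas (`groupby`) and R's plyr package provide full-featured versions of the same idea.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(records, key, apply):
    """Split `records` into groups by `key`, run `apply` on each group,
    and combine the per-group results into one dict."""
    groups = defaultdict(list)
    for rec in records:                               # split
        groups[key(rec)].append(rec)
    return {k: apply(g) for k, g in groups.items()}   # apply + combine


if __name__ == "__main__":
    # Hypothetical call records: (caller_id, region, minutes)
    calls = [
        ("a", "west", 3.0),
        ("b", "east", 10.0),
        ("c", "west", 5.0),
        ("d", "east", 2.0),
    ]
    avg_minutes = split_apply_combine(
        calls,
        key=lambda r: r[1],
        apply=lambda g: mean(r[2] for r in g),
    )
    print(avg_minutes)  # {'west': 4.0, 'east': 6.0}
```

Each group is processed independently, so the apply step can be spread across processes or machines. That is what makes the pattern useful once the data no longer fits on one box.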
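
The Perl one-liner on the "WORK WITH SAMPLES" slide keeps each line independently with probability 0.01 (Bernoulli sampling). A Python equivalent is sketched below under two assumptions of our own:

- the input's first line is a CSV header worth keeping, which the one-liner would almost always drop;
- a fixed seed is wanted so the sample can be reproduced.

The function name `bernoulli_sample` is hypothetical.

```python
import random


def bernoulli_sample(lines, rate=0.01, seed=None, keep_header=True):
    """Yield each line independently with probability `rate`.

    Like perl -ne "print if (rand() < 0.01)": one pass, constant memory,
    roughly rate * N lines out. Optionally passes the first (header) line
    through untouched, and accepts a seed for reproducible samples.
    """
    rng = random.Random(seed)
    it = iter(lines)
    if keep_header:
        header = next(it, None)
        if header is not None:
            yield header
    for line in it:
        if rng.random() < rate:
            yield line
```

To use it as a filter like the one-liner, pass standard input through it, e.g. `sys.stdout.writelines(bernoulli_sample(sys.stdin, rate=0.01, seed=42))`. Because lines are streamed, memory use stays constant however large the file is. The output size is random (about rate × N lines) rather than exact.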
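
The "DOES CALL QUALITY MATTER?" slide reports a difference that was not statistically significant. The slides do not say which test was used. One common choice for comparing two churn rates is the two-proportion z-test, sketched below. The counts in the example are invented purely for illustration.

```python
import math


def two_proportion_z_test(churned_a, n_a, churned_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Uses the pooled-proportion standard error (a normal approximation, so it
    assumes reasonably large counts). Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = churned_a / n_a
    rate_b = churned_b / n_b
    pooled = (churned_a + churned_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation: every customer churned or none did")
    z = (rate_a - rate_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: customers with many dropped calls vs. everyone else.
    rate_bad, rate_ok, z, p = two_proportion_z_test(52, 1_000, 480, 10_000)
    print(f"churn {rate_bad:.1%} vs {rate_ok:.1%}, z={z:.2f}, p={p:.2f}")
```

With these made-up numbers, the poorly served group churns slightly more, yet p is far above 0.05: "a difference, but not significant."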
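
The slides show the call graph and the headline churn result, but not the computation behind them. The sketch below is one way to set it up, under assumptions of our own:

- calls arrive as (caller, callee) pairs;
- the graph is undirected;
- a customer counts as "exposed" if at least one contact cancelled in the previous period.

All names and data are hypothetical. This is not the analysis actually used in the talk.

```python
import math
from collections import defaultdict


def build_call_graph(calls):
    """Undirected call graph from (caller, callee) pairs: customer -> set of contacts."""
    graph = defaultdict(set)
    for caller, callee in calls:
        if caller != callee:  # ignore self-calls
            graph[caller].add(callee)
            graph[callee].add(caller)
    return graph


def churn_lift(graph, churned_before, churned_after):
    """Compare later churn for customers with and without a contact who already left.

    churned_before: customers who cancelled in an earlier period.
    churned_after:  customers who cancelled in the following period.
    Only customers present in the graph are counted.
    Returns (rate_exposed, rate_unexposed, relative_increase), where
    relative_increase = rate_exposed / rate_unexposed - 1 (7.0 would mean +700%).
    """
    before, after = set(churned_before), set(churned_after)
    counts = {True: [0, 0], False: [0, 0]}  # exposed? -> [customers, churners]
    for customer, contacts in graph.items():
        if customer in before:
            continue  # already gone, so not at risk in the later period
        exposed = bool(contacts & before)
        counts[exposed][0] += 1
        counts[exposed][1] += customer in after
    rates = {k: (c / n if n else math.nan) for k, (n, c) in counts.items()}
    base = rates[False]
    lift = rates[True] / base - 1 if base else math.nan
    return rates[True], base, lift


if __name__ == "__main__":
    # Hypothetical toy network: "x" cancelled first; "a" and "c" cancel later.
    calls = [("x", "a"), ("x", "b"), ("c", "d"), ("e", "f"), ("g", "h")]
    graph = build_call_graph(calls)
    exposed, unexposed, lift = churn_lift(graph, {"x"}, {"a", "c"})
    print(f"exposed {exposed:.0%}, unexposed {unexposed:.0%}, increase {lift:+.0%}")
```

The returned `relative_increase` follows the usual convention that a 700% increase means a rate eight times the baseline. A result stated as "7× as likely" would instead correspond to the plain ratio of the two rates.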
