Presented at the June 2010 gathering of the Bay Area's Business Intelligence Special Interest Group.

- 1. WINNING WITH BIG DATA: Secrets of the Successful Data Scientist. SDForum BI SIG, June 15, 2010. Michael Driscoll, @dataspora
- 2. WHY DATA MATTERS NOW
- 3. THE INDUSTRIAL AGE OF DATA
- 4. WHAT IS BIG DATA? Data that is distributed.
- 5. WHAT IS DATA SCIENCE?
- 6. WHY DATA SCIENCE IS SEXY
- 7. “The sexy job in the next ten years will be statisticians…” - Hal Varian
- 8.
- 9. data: 1000 bytes; model: 2 bytes
- 10. 9 WAYS TO WIN WITH DATA
- 11. 1. CHOOSE THE RIGHT TOOL: You don’t need a chainsaw to cut butter.
- 12. 2. COMPRESS EVERYTHING: `mysqldump -u myuser -pmypass sourceDB | gzip | ssh mike@dataspora.com "cat - | gunzip | mysql -u myuser -pmypass targetDB"` The world is IO-bound.
- 13. 3. SPLIT UP YOUR DATA: Split, apply, combine.
- 14. 4. WORK WITH SAMPLES: `perl -ne "print if (rand() < 0.01)" data.csv > sample.csv` Big Data is heavy, samples are light.
- 15. 5. USE STATISTICS
- 16. 6. COPY FROM OTHERS: `git clone git://github.com/kevinweil/hadoop-lzo` Use open source.
- 17. 7. ESCHEW CHART TYPOLOGIES: Charts are compositions, not containers.
- 18. 8. COLOR WITH CARE: Color can enhance or insult.
- 19. 9. TELL A STORY: People are listening.
- 20. ONE SUCCESS STORY
- 21. WHY DO TELCO CUSTOMERS LEAVE? Sign up. Leave. Goal: “less churn.”
- 22. DATA: BILLIONS OF CALLS … and millions of callers.
- 23. DOES CALL QUALITY MATTER? … a difference, but not significant.
- 24. WHAT ABOUT SOCIAL NETWORKS? Hmmm...
- 25. BUILD THE CALL GRAPH … but is it predictive?
- 26. EVOLUTION OF A CALL GRAPH: April
- 27. EVOLUTION OF A CALL GRAPH: May
- 28. EVOLUTION OF A CALL GRAPH: June
- 29. EVOLUTION OF A CALL GRAPH: July
- 30. 700% INCREASE IN CHURN when a cancellation occurs in a call network.
- 31. FINAL THOUGHTS
- 32. THE BIG DATA STACK: Actions. Data Products (Content Filters, Rec Engines). Analytics (R, SPSS, SAS, SAP). Insights. Big Data. Dedicated RDBMS. Data.
- 33. THANKS! QUESTIONS? Michael Driscoll, med@dataspora.com, @dataspora on Twitter, http://www.dataspora.com/blog. SDForum BI SIG, June 15, 2010
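
Slides 13 and 14 name two techniques without showing them in a general-purpose language: "split, apply, combine" and working with a 1% random sample. The following is a minimal Python sketch of both, not the presenter's own code. The function names `sample_lines` and `split_apply_combine`, and the toy `calls` records, are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def sample_lines(lines, rate=0.01, seed=None):
    """Keep each line independently with probability `rate`.

    This mirrors the slide's perl one-liner: every line is kept
    when a uniform random draw falls below the rate.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < rate]


def split_apply_combine(records, key, apply):
    """Group records by key(record), run `apply` on each group,
    and combine the per-group results into one dict."""
    groups = defaultdict(list)
    for record in records:                      # split
        groups[key(record)].append(record)
    return {k: apply(g) for k, g in groups.items()}  # apply + combine


# Hypothetical call records: (region, call_minutes).
calls = [("west", 3.0), ("east", 5.0), ("west", 7.0), ("east", 1.0)]

# Mean call length per region.
mean_minutes = split_apply_combine(
    calls,
    key=lambda r: r[0],
    apply=lambda g: sum(minutes for _, minutes in g) / len(g),
)
```

Because each line is kept independently, the sample size is only approximately `rate` times the input size. Passing a `seed` makes a sample reproducible.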
