Big data meetup

Published in: Technology, Education

Transcript

  • 1. Data Science Data Meetup Jan. 12
  • 2. What is data science? Besides a reason to have beer and pizza…
  • 3. What does the literature say?
  • 4. Hacking “Good data scientists understand, in a deep way, that the heavy lifting of cleanup and preparation isn’t something that gets in the way of solving the problem… it is the problem” DJ Patil bash/awk/sed
  • 5. Statistics What’s the probability that 2 people in the front 2 rows share a birthday? 1. ~10% 2. ~20% 3. ~50% 4. ~90% What’s the probability that a 99% accurate test diagnosed a 1/1000 disease? 1. ~10% 2. ~50% 3. ~90% 4. ~99%
  • 6. Domain Expertise
  • 7. Intelligence Cookbook Just follow the steps
  • 8. The Recipe First, make it valuable. Then, make it possible. Then, make it beautiful. Then, make it smart.
  • 9. Example E-Commerce website
  • 10. Make it valuable Find a KPI that is correlated to bottom-line revenue, e.g. number of products the visitor browses through
  • 11. Make it possible Develop the simplest heuristic, e.g. show the visitor one of the top 10 selling products
  • 12. Make it beautiful Create a method to quickly test new algorithms against old ones, e.g. create a framework that split tests two models and reports which one is better
  • 13. Make it smart Figure out in what field your problem is and choose an off-the-shelf algorithm, e.g. recognize that the problem is product recommendation and use collaborative filtering
  • 14. Common ML problems
      • Supervised learning: classification, regression, anomaly detection
      • Unsupervised learning: clustering, separation
      • Recommendation: feature-based recommendation, collaborative filtering
      • Search: indexing, ranking
  • 15. To sum it all up Real data science is hard but … Real data science is the last step in data science, not the first and besides … The most important thing in data science is the business, not the science
  • 16. Questions? email: vitalyp@liveperson.com Twitter: @bigdatasc
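
The two quiz questions on slide 5 can be checked numerically. The sketch below is not from the talk. It assumes birthdays are uniform and independent over 365 days, and it reads "99% accurate" as 99% sensitivity and 99% specificity with a prevalence of 1 in 1000. The function names are hypothetical, and the head counts are placeholders because the transcript does not say how many people sat in the front two rows.

```python
from math import prod


def shared_birthday_probability(n_people: int, days: int = 365) -> float:
    """P(at least two of n_people share a birthday).

    Assumes birthdays are uniform and independent over `days` days.
    """
    if n_people > days:
        return 1.0  # pigeonhole: a shared birthday is certain
    p_all_distinct = prod((days - i) / days for i in range(n_people))
    return 1.0 - p_all_distinct


def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """P(disease | positive test), computed with Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Hypothetical head counts; the slide does not give one.
    for n in (10, 23, 40):
        print(n, round(shared_birthday_probability(n), 3))
    # Assumption: "99% accurate" means sensitivity = specificity = 0.99.
    print(round(positive_predictive_value(0.001, 0.99, 0.99), 3))
```

Under these assumptions a group of 23 people already has a slightly better than even chance of a shared birthday, and a positive result from the test indicates the disease only about 9% of the time, because false positives from the healthy majority outnumber true positives.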

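Slides 11 and 12 describe a top-seller heuristic and a split-testing framework. A minimal sketch of both follows, assuming sales arrive as a flat list of product IDs and the KPI is a per-visitor number such as products browsed. All names here (`top_sellers`, `assign_variant`, `compare_variants`) are hypothetical and not from the talk, and the comparison reports only the higher mean KPI without a significance test, which a real framework would likely need.

```python
import hashlib
import random
from collections import Counter
from statistics import mean


def top_sellers(sales, k=10):
    """Return the k best-selling product IDs from an iterable of sold IDs."""
    return [product for product, _ in Counter(sales).most_common(k)]


def recommend_top_seller(sales, rng=random):
    """Slide 11 heuristic: show the visitor one of the top 10 sellers."""
    return rng.choice(top_sellers(sales))


def assign_variant(visitor_id: str, variants=("A", "B")) -> str:
    """Bucket a visitor by hashing their ID, so repeat visits see the same model."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def compare_variants(observations):
    """Summarise a split test.

    observations: iterable of (variant, kpi_value) pairs.
    Returns (mean KPI per variant, name of the variant with the highest mean).
    """
    by_variant = {}
    for variant, kpi in observations:
        by_variant.setdefault(variant, []).append(kpi)
    means = {variant: mean(values) for variant, values in by_variant.items()}
    return means, max(means, key=means.get)
```

Hashing the visitor ID rather than drawing a random number per request keeps the assignment stable without storing any state, which is one simple way to keep the two models' audiences separate.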
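
Slide 13 names collaborative filtering as the off-the-shelf choice for product recommendation. The sketch below shows one common variant, item-based filtering with cosine similarity over binary "visitor viewed product" data. It is an illustration under that assumption, not the speaker's implementation; the data layout and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(baskets):
    """Cosine similarity between products, based on who viewed them.

    baskets: dict mapping visitor ID -> set of product IDs viewed.
    Returns a nested dict sims[a][b] for every pair with shared viewers.
    """
    viewers = defaultdict(set)
    for visitor, items in baskets.items():
        for item in items:
            viewers[item].add(visitor)

    sims = defaultdict(dict)
    items = list(viewers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(viewers[a] & viewers[b])
            if overlap:
                score = overlap / sqrt(len(viewers[a]) * len(viewers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(visitor, baskets, sims, k=3):
    """Rank unseen products by summed similarity to what the visitor viewed."""
    seen = baskets[visitor]
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]
```

A model like this would be one of the two candidates in a split test against the top-seller heuristic, which matches the talk's ordering: the smart algorithm comes last, after the KPI and the testing method exist.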