Your SlideShare is downloading. ×
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
How to do Data Science Without the Scientist
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

How to do Data Science Without the Scientist

455

Published on

http://www.datameer.com With data scientists in short supply, it’s surprising that much of their precious time is spent doing ”data plumbing”—preparing data or servicing business users rather than …

http://www.datameer.com With data scientists in short supply, it’s surprising that much of their precious time is spent doing ”data plumbing”—preparing data or servicing business users rather than doing actual data science.

Hadoop and self-service Smart Analytics is changing this reality. Join Matt Schumpert, Director of Solution Engineering at Datameer, as he walks through a real-world use case and discusses the evolution toward self-service data science, freeing data scientists to the hard work we so badly need.

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
455
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Easier, Faster, Smarter
  • 2. Data Science without the Scientist Matt Schumpert 10.30.13 © 2013 Datameer, Inc. All rights reserved.
  • 3. Agenda Background First principles Mind-blowing fun fact Current state & challenges Suggestions for making life easier Demo! © 2013 Datameer, Inc. All rights reserved.
  • 4. Me Enterprise infrastructure software guy Focused on abstraction and customers Likes simplicity © 2013 Datameer, Inc. All rights reserved.
  • 5. A favorite example... Buffered Web Services: “When a buffered operation is invoked by a client, the method operation goes on a JMS queue and WebLogic Server deals with it asynchronously by transparently creating a Message Driven Bean to consume the message. As with Web Service reliable messaging, if WebLogic Server goes down while the method invocation is still in the queue, it will be dealt with as soon as WebLogic Server is restarted. When a client invokes the buffered Web Service, the client does not wait for a response from the invoke, and the execution of the client can continue” © 2013 Datameer, Inc. All rights reserved.
  • 6. 1. First Principles
  • 7. First Principles from an Expert Instrument everything Invest in infrastructure Put all your data in one place Data first, questions later Keep raw data forever Let everyone party on the data Produce tools to support the whole lifecycle - Jeff Hammerbacher © 2013 Datameer, Inc. All rights reserved.
  • 8. 2. Mind-boggling fun fact
  • 9. 190,000 unfilled data scientist jobs by 2018 -McKinsey
  • 10. Signal-to-Noise Ratio is Dropping!
  • 11. 3. Current state + challenges
  • 12. Hallmarks of Traditional Analytics Esoteric skills Long cycle times Low transparency Data & application silos Mired in data prep Sampling (guesstimation) Expensive! Extremely valuable work products © 2013 Datameer, Inc. All rights reserved.
  • 13. Current Recipe: Pull historical data Sample Cleanse / Pre-process Design / implement model Train Hand-code / Integrate Deploy Fine-Tune, rinse and repeat © 2013 Datameer, Inc. All rights reserved.
  • 14. Science != Everyday Decisions
  • 15. There must be a better way!
  • 16. Apply traditional tools to big data? SAS R Mahout Expensive Not Scalable Silo’ed Requires Coding Retraining Clunky Architecture Coding Required Immature Limited Support © 2013 Datameer, Inc. All rights reserved.
  • 17. And what about the rest of the (big data) story?
  • 18. Big Data Analytics is NOT (just): A sexy new visualization tool Machine learning / Predictive analytics Data science Hadoop The data warehousing movie replayed © 2013 Datameer, Inc. All rights reserved.
  • 19. Big Data Analytics IS: A granular, complete and current understanding of your operations and customers Answering questions at the speed of business Relevancy in all customer interactions Closed-loop decisioning that’s data-driven Managing data through a lifecycle © 2013 Datameer, Inc. All rights reserved.
  • 20. The Big Data Analytics Lifecycle Prepare and Analyze Analyze Create your Integrate hypothesis Visualize Visualize Act on insight and measure ROI Deploy © 2013 Datameer, Inc. All rights reserved.
  • 21. A lesson from data warehousing / BI traditional / schema-on-write: slow static complex agile / schema-on-read: fast dynamic simple Source: TDWI © 2013 Datameer, Inc. All rights reserved.
  • 22. Don’t rebuild Rome... again!! © 2013 Datameer, Inc. All rights reserved.
  • 23. There must be a better way!
  • 24. 4. Making life easier
  • 25. How (without army): Speak the language of the business Generate (don’t write) code Simplify data integration and preparation Move the computation (analytics) to the data © 2013 Datameer, Inc. All rights reserved.
  • 26. Esoteric Language == Obscurity K-Means CART Mutual Information Matrix Factorization Random Forest? Logistical Regression Support Vector Machine?? © 2013 Datameer, Inc. All rights reserved.
  • 27. Algorithms can be straightforward! © 2013 Datameer, Inc. All rights reserved.
  • 28. Clustering © 2013 Datameer, Inc. All rights reserved.
  • 29. Column Dependencies © 2013 Datameer, Inc. All rights reserved.
  • 30. Decision Trees © 2013 Datameer, Inc. All rights reserved.
  • 31. Recommendations © 2013 Datameer, Inc. All rights reserved.
  • 32. Example: Fraud Investigation Sales Conversion
  • 33. DEMO
  • 34. Data Wrangling
  • 35. DEMO
  • 36. © 2013 Datameer, Inc. All rights reserved.
  • 37. @Datameer

×