How to do Data Science Without the Scientist

http://www.datameer.com With data scientists in short supply, it’s surprising that much of their precious time is spent doing “data plumbing”: preparing data or servicing business users rather than doing actual data science.

Hadoop and self-service Smart Analytics are changing this reality. Join Matt Schumpert, Director of Solution Engineering at Datameer, as he walks through a real-world use case and discusses the evolution toward self-service data science, freeing data scientists to do the hard work we so badly need.

Published in: Technology, Education

  1. Easier, Faster, Smarter
  2. Data Science without the Scientist. Matt Schumpert, 10.30.13. © 2013 Datameer, Inc. All rights reserved.
  3. Agenda: Background; First principles; Mind-blowing fun fact; Current state & challenges; Suggestions for making life easier; Demo!
  4. Me: Enterprise infrastructure software guy; Focused on abstraction and customers; Likes simplicity
  5. A favorite example... Buffered Web Services: “When a buffered operation is invoked by a client, the method operation goes on a JMS queue and WebLogic Server deals with it asynchronously by transparently creating a Message Driven Bean to consume the message. As with Web Service reliable messaging, if WebLogic Server goes down while the method invocation is still in the queue, it will be dealt with as soon as WebLogic Server is restarted. When a client invokes the buffered Web Service, the client does not wait for a response from the invoke, and the execution of the client can continue”
  6. 1. First Principles
  7. First Principles from an Expert: Instrument everything; Invest in infrastructure; Put all your data in one place; Data first, questions later; Keep raw data forever; Let everyone party on the data; Produce tools to support the whole lifecycle - Jeff Hammerbacher
  8. 2. Mind-boggling fun fact
  9. 190,000 unfilled data scientist jobs by 2018 - McKinsey
  10. Signal-to-Noise Ratio is Dropping!
  11. 3. Current state + challenges
  12. Hallmarks of Traditional Analytics: Esoteric skills; Long cycle times; Low transparency; Data & application silos; Mired in data prep; Sampling (guesstimation); Expensive!; Extremely valuable work products
  13. Current Recipe: Pull historical data; Sample; Cleanse / Pre-process; Design / implement model; Train; Hand-code / Integrate; Deploy; Fine-Tune, rinse and repeat
  14. Science != Everyday Decisions
  15. There must be a better way!
  16. Apply traditional tools to big data? SAS: Expensive, Not Scalable, Silo’ed. R: Requires Coding, Retraining, Clunky Architecture. Mahout: Coding Required, Immature, Limited Support.
  17. And what about the rest of the (big data) story?
  18. Big Data Analytics is NOT (just): A sexy new visualization tool; Machine learning / Predictive analytics; Data science; Hadoop; The data warehousing movie replayed
  19. Big Data Analytics IS: A granular, complete and current understanding of your operations and customers; Answering questions at the speed of business; Relevancy in all customer interactions; Closed-loop decisioning that’s data-driven; Managing data through a lifecycle
  20. The Big Data Analytics Lifecycle (diagram; stage labels: Integrate, Prepare and Analyze, Visualize, Deploy; Create your hypothesis, Analyze, Visualize, Act on insight and measure ROI)
  21. A lesson from data warehousing / BI: traditional / schema-on-write: slow, static, complex; agile / schema-on-read: fast, dynamic, simple. Source: TDWI
  22. Don’t rebuild Rome... again!!
  23. There must be a better way!
  24. 4. Making life easier
  25. How (without an army): Speak the language of the business; Generate (don’t write) code; Simplify data integration and preparation; Move the computation (analytics) to the data
  26. Esoteric Language == Obscurity: K-Means; CART; Mutual Information; Matrix Factorization; Random Forest?; Logistic Regression; Support Vector Machine??
  27. Algorithms can be straightforward!
  28. Clustering
  29. Column Dependencies
  30. Decision Trees
  31. Recommendations
  32. Example: Fraud Investigation, Sales Conversion
  33. DEMO
  34. Data Wrangling
  35. DEMO
  36. © 2013 Datameer, Inc. All rights reserved.
  37. @Datameer
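
The contrast on slide 21 between schema-on-write and schema-on-read can be illustrated with a small sketch. This is not Datameer's implementation; the sample records, the `read_with_schema` helper, and both schemas are hypothetical names invented for illustration. The idea shown is only that raw data is stored untouched and a schema is applied when a question is asked, so two different questions can reuse the same raw records.

```python
import json

# Hypothetical raw events, kept exactly as they arrived.
# The records do not share one fixed shape.
RAW_EVENTS = [
    '{"user": "a", "amount": "19.99", "channel": "web"}',
    '{"user": "b", "amount": "5.00"}',
    '{"user": "a", "amount": "12.50", "coupon": "FALL"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields and cast them.

    Fields missing from a record come back as None instead of failing.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


# Two different questions, two different schemas, same raw data.
spend_schema = {"user": str, "amount": float}
channel_schema = {"user": str, "channel": str}

# Question 1: total spend per user.
spend_by_user = {}
for row in read_with_schema(RAW_EVENTS, spend_schema):
    spend_by_user[row["user"]] = spend_by_user.get(row["user"], 0.0) + row["amount"]

# Question 2: which channel did each event come from (None if not recorded)?
channels = [row["channel"] for row in read_with_schema(RAW_EVENTS, channel_schema)]
```

With schema-on-write, the `coupon` and `channel` fields would have needed a place in the table design before loading; here a later question about coupons only needs a new schema dictionary.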