What is Data Science?
Daniel D. Gutierrez, Data Scientist
AMULET Analytics
February 2014
What is Data Science talk for General Assembly by Daniel D Gutierrez - February 5, 2014

Published in: Technology, Education
  1. What is Data Science?
     Daniel D. Gutierrez, Data Scientist
     AMULET Analytics
     February 2014
  3. A Life in Data Science
     AMULET Analytics
       – My personal consultancy doing work in data science – computational marketing
       – Doing data analysis, machine learning and visualization for enterprises
       – Wide breadth of industries: startups, manufacturing, non-profit, fashion, ecommerce, market research, etc.
     Big Data Journalist
       – Managing Editor – insideBIGDATA.com
       – Blogger – Big Data Republic (bigdatarepublic.com)
       – Blogger – All Analytics (allanalytics.com)
     Teaching
       – Community TA – Coursera
       – UCLA Extension
     Writing a book: “Introduction to Machine Learning with R”
  4. Data Science in Perspective
     Facilitates a cascade of technologies
       – Big Data is facilitated by data science
       – Data science is facilitated by machine learning
       – Machine learning is a confluence of technologies and disciplines: computer science, mathematical statistics, probability theory, visualization
     Data science is nothing new!
       – Components have been around for decades
       – “Data science” is just a new name for something old and proven (I do love it!)
       – “Machine learning” used to be “data mining” or KDD
     Much hype recently
       – Harvard Business Review proclaimed it the “sexiest job of the 21st century.” I’ll take it!
       – Now with “big data” it’s a force barely contained
  8. Who Does Data Science? Unicorns!
     Controversy in hiring data scientists
       – Some companies post job ads for unicorns, mythical creatures having no basis in reality
       – Hire a data science TEAM!
       – Don’t expect a single individual to be both a “theorist” and an “experimentalist”
       – Consultant vs. full-time hire
  9. What is Big Data?
     Big Data
       – “large data sets so big that commonly-used software tools are unable to capture, curate, manage, and process the data within a tolerable elapsed time.”
     Hadoop (Apache Hadoop) dominates the Big Data market
       – Used widely by some of the world's largest websites, such as Facebook, eBay, Amazon and Yahoo
       – Moving into the enterprise
       – Invented by developers at Yahoo!
  10. Applications for Big Data
     Smarter Healthcare, Multi-channel sales, Finance, Log Analysis, Homeland Security, Traffic Control, Telecom, Search Quality, Manufacturing, Trading Analytics, Fraud and Risk, Retail: Churn
     “Big Data is the definitive source of competitive advantage across all industries. For those organizations that understand and embrace the new reality of Big Data, the possibilities for new innovation, improved agility, and increased profitability are nearly endless.”
     Source: Wikibon 2012
  11. The Minnesota Dad
     A father and daughter walk into a Target store to speak with the manager:
       – He wants to know why the store is bombarding his teenage daughter with ads for baby strollers, diapers and other baby goods. “Are you trying to encourage her to get pregnant?”
       – The befuddled manager apologizes and responds that he has no idea why the company is sending her such items
     The father later phones the store to apologize: it turns out his daughter was expecting
     How?
       – Target used Big Data to predict pregnancy. When a woman begins buying vitamins, increases her purchases of lotion, and buys an oversized purse or bag, the odds are very high she is expecting
       – Target knew the daughter was pregnant before the family did
  13. Machine Learning Overview
     What is Machine Learning?
       – Components have been around for decades
       – “Data science” is just a new name for something old and proven (I do love it!)
       – “Machine learning” used to be “data mining” or KDD
     Supervised learning
       – Prediction and classification
       – Linear regression, logistic regression, classification trees, SVM, neural nets
       – Train the algorithm on known labelled data to be able to predict new data
     Unsupervised learning
       – Hierarchical clustering
       – K-means clustering
       – Principal component analysis (PCA)
       – Dimensionality reduction to address “the curse of dimensionality”
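
The two families on the Machine Learning Overview slide can be illustrated with a small Python sketch. This is not from the talk: the data, the function names and the naive k-means initialisation are all assumptions made for the example. The first function fits a one-variable linear regression on labelled pairs so it can predict a new value (supervised); the second groups unlabelled 1-D points with k-means (unsupervised).

```python
from statistics import mean


def fit_linear_regression(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def k_means_1d(points, k, iterations=20):
    """Lloyd's k-means on 1-D points; returns the centroids in sorted order.

    Assumes len(points) >= k. Initial centroids are evenly spaced picks from
    the sorted data, a simplification chosen for this sketch.
    """
    ordered = sorted(points)
    step = max(1, len(ordered) // k)
    centroids = [ordered[i * step] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return sorted(centroids)


# Supervised: train on known labelled data (made-up values), then predict.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]
slope, intercept = fit_linear_regression(train_x, train_y)
prediction = slope * 5.0 + intercept
print("predicted y for x = 5:", round(prediction, 2))

# Unsupervised: no labels, just look for structure in the points.
centroids = k_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.5], k=2)
print("cluster centres:", [round(c, 2) for c in centroids])
```

In practice a library such as scikit-learn in Python, or the equivalent R packages, would replace both hand-written functions; the point here is only the contrast between learning from labelled pairs and finding groups without labels.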
  14. Sentiment Analysis (figure)
  15. R vs. Python Wars
     R
       – Very good for data acquisition, cleaning, munging, exploratory analysis, model selection, machine learning algorithm development and training, model performance evaluation
       – One of the best visualization tools bar none
       – Has over 4,000 packages
     Python
       – Good choice for production deployment
       – Rapidly catching up with R in terms of data science capabilities
  16. Visualization is Critical (figure)
  17. Learning More About Data Science
     Doing Data Science, by Cathy O’Neil & Rachel Schutt (O’Reilly Media)
  18. Data Science in Action (figure)
  19. Summary – Data Science is Here to Stay
     Integral part of Big Data
       – Data science and machine learning fuel big data ✔
     The shortage of data scientists is real
       – Big data is expected to be a $53.4 billion industry by 2016 ✔
       – Job postings for “data scientist” increased 15,000% between 2011 and 2012 ✔
       – Job market currently has 140,000 – 190,000 open positions ✔
       – Projected growth of 18.7% between 2010 and 2020 ✔
     Companies of all sizes need to plan out their data science strategy
       – Increase value of enterprise data assets ✔
     2014 should be a wild year!
       – Conference circuit is exploding ✔
       – New books, news sources, press coverage abound ✔
  20. Thank you!
     Follow me: @AMULETAnalytics
     Contact me: dan@amuletanalytics.com
     www.amuletanalytics.com
