What is Data Science? Daniel D Gutierrez

What is Data Science talk for General Assembly by Daniel D Gutierrez - February 5, 2014

Presentation Transcript

  • What is Data Science? Daniel D. Gutierrez, Data Scientist, AMULET Analytics, February 2014
  • A Life in Data Science
      – AMULET Analytics: my personal consultancy doing work in data science and computational marketing
          – Data analysis, machine learning and visualization for enterprises
          – Wide breadth of industries: startups, manufacturing, non-profit, fashion, e-commerce, market research, etc.
      – Big Data journalist
          – Managing Editor, insideBIGDATA.com
          – Blogger, Big Data Republic (bigdatarepublic.com)
          – Blogger, All Analytics (allanalytics.com)
      – Teaching
          – Community TA, Coursera
          – UCLA Extension
      – Writing a book: “Introduction to Machine Learning with R”
  • Data Science in Perspective
      – Facilitates a cascade of technologies
          – Big Data is facilitated by data science
          – Data science is facilitated by machine learning
          – Machine learning is a confluence of technologies and disciplines: computer science, mathematical statistics, probability theory, visualization
      – Data science is nothing new!
          – Components have been around for decades
          – “Data science” is just a new name for something old and proven (I do love it!)
          – “Machine learning” used to be “data mining” or KDD (knowledge discovery in databases)
      – Much hype recently
          – Harvard Business Review proclaimed it the “sexiest job of the 21st century.” I’ll take it!
          – Now with “big data” it’s a force barely contained
  • Who Does Data Science?
      – Unicorns!
      – Controversy in hiring data scientists
          – Some companies post job ads for unicorns, mythical creatures having no basis in reality
          – Hire a data science TEAM!
          – Don’t expect a single individual to be both a “theorist” and an “experimentalist”
      – Consultant vs. full-time hire
  • What is Big Data?
      – Big Data: “large data sets so big that commonly-used software tools are unable to capture, curate, manage, and process the data within a tolerable elapsed time.”
      – Apache Hadoop dominates the Big Data market
          – Used widely by some of the world's largest websites, such as Facebook, eBay, Amazon and Yahoo
          – Moving into the enterprise
          – Invented by developers at Yahoo!
  • Applications for Big Data
      – Smarter Healthcare, Multi-channel Sales, Finance, Log Analysis, Homeland Security, Traffic Control, Telecom, Search Quality, Manufacturing, Trading Analytics, Fraud and Risk, Retail: Churn
      – “Big Data is the definitive source of competitive advantage across all industries. For those organizations that understand and embrace the new reality of Big Data, the possibilities for new innovation, improved agility, and increased profitability are nearly endless.”
      – Source: Wikibon 2012
  • The Minnesota Dad
      – A father and his daughter walk into a Target store to speak with the manager:
          – He wants to know why the store is bombarding his teenage daughter with ads for baby strollers, diapers and other baby goods. “Are you trying to encourage her to get pregnant?”
          – The befuddled manager apologizes and responds that he has no idea why the company is sending her such items
      – The father later phones the store to apologize: it turns out his daughter was expecting
      – How?
          – Target used Big Data to predict pregnancy. When a woman begins buying vitamins, increases her purchases of lotion, and buys an oversized purse or bag, the odds are very high she is expecting
          – Target knew the daughter was pregnant before the family did
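The talk does not describe Target's actual model, so the following is only a toy sketch of how the purchase signals named on this slide could be combined into a single score. The weights, the bias, and the names `WEIGHTS`, `BIAS`, and `pregnancy_score` are all invented for illustration; a real system would learn such values from labelled purchase histories.

```python
import math

# Hypothetical weights for the three purchase signals on the slide.
# These numbers are made up for illustration; they are not from the talk
# and not from Target.
WEIGHTS = {"vitamins": 1.2, "lotion": 0.9, "large_bag": 0.7}
BIAS = -2.0


def pregnancy_score(basket):
    """Combine binary purchase signals into a probability-like score in (0, 1)."""
    z = BIAS + sum(weight for item, weight in WEIGHTS.items() if item in basket)
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


# A shopper with none of the signals scores low; one with all three scores higher.
no_signals = pregnancy_score(set())
all_signals = pregnancy_score({"vitamins", "lotion", "large_bag"})
```

Each extra signal in the basket raises the score, which is the intuition behind "the odds are very high she is expecting" once several signals appear together.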
  • Machine Learning Overview
      – What is Machine Learning?
          – Components have been around for decades
          – “Data science” is just a new name for something old and proven (I do love it!)
          – “Machine learning” used to be “data mining” or KDD
      – Supervised learning
          – Prediction and classification
          – Linear regression, logistic regression, classification trees, SVM, neural nets
          – Train the algorithm on known labelled data to be able to predict new data
      – Unsupervised learning
          – Hierarchical clustering
          – K-means clustering
          – Principal component analysis (PCA)
          – Dimensionality reduction to address “the curse of dimensionality”
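To make the supervised/unsupervised distinction concrete, here is a minimal pure-Python sketch: a least-squares line fitted to labelled pairs and then used to predict a new value, followed by k-means on unlabelled 1-D points. The function names (`fit_line`, `kmeans_1d`) and the tiny data sets are assumptions chosen for illustration, not material from the talk; in practice one would likely reach for R or a Python library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x on labelled pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D points, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (unchanged if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


# Supervised: train on known labelled data, then predict a new input.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]  # toy labels generated from y = 1 + 2x
intercept, slope = fit_line(train_x, train_y)
prediction = intercept + slope * 5.0

# Unsupervised: no labels, the algorithm only groups similar points.
centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], centers=[0.0, 10.0])
```

The regression half needs the `train_y` labels to learn anything, while the clustering half discovers the two groups from the points alone.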
  • Sentiment Analysis
  • R vs. Python Wars
      – R
          – Very good for data acquisition, cleaning, munging, exploratory analysis, model selection, machine learning algorithm development and training, model performance evaluation
          – One of the best visualization tools bar none
          – Has over 4,000 packages
      – Python
          – Good choice for production deployment
          – Rapidly catching up with R in terms of data science capabilities
  • Visualization is Critical
  • Learning More About Data Science
      – “Doing Data Science” by Cathy O’Neil & Rachel Schutt, O’Reilly Media
  • Data Science in Action
  • Summary – Data Science is Here to Stay
      – Integral part of Big Data
          – Data science and machine learning fuel big data
      – The shortage of data scientists is real
          – Big data is expected to be a $53.4 billion industry by 2016
          – Job postings for “data scientist” increased 15,000% between 2011 and 2012
          – Job market currently has 140,000 to 190,000 open positions
          – Projected growth of 18.7% between 2010 and 2020
      – Companies of all sizes need to plan out their data science strategy
          – Increase value of enterprise data assets
      – 2014 should be a wild year!
          – Conference circuit is exploding
          – New books, news sources, press coverage abound
  • Thank you!
      – Follow me: @AMULETAnalytics
      – Contact me: dan@amuletanalytics.com
      – www.amuletanalytics.com