What is Data Science?
Daniel D. Gutierrez, Data Scientist
AMULET Analytics
February 2014
/ page 2
A Life in Data Science
AMULET Analytics
My personal consultancy doing work in data science – computational marketing
Doing data analysis, machine learning and visualization for enterprises
Wide breadth of industries: startups, manufacturing, non-profit, fashion, ecommerce, market research, etc.
Big Data Journalist
Managing Editor – insideBIGDATA.com
Blogger – Big Data Republic (bigdatarepublic.com)
Blogger – All Analytics (allanalytics.com)
Teaching
Community TA – Coursera
UCLA Extension
Writing a book: “Introduction to Machine Learning with R”
/ page 3
Data Science in Perspective
Facilitates a cascade of technologies
Big Data is facilitated by data science
Data science is facilitated by machine learning
Machine learning is a confluence of technologies and disciplines
– Computer science, mathematical statistics, probability theory, visualization

Data science is nothing new!
Components have been around for decades
“Data science” is just a new name for something old and proven (I do love it!)

“Machine learning” used to be “data mining” or KDD.

Much hype recently
Harvard Business Review proclaimed data scientist the “sexiest job of the 21st century.” I’ll take it!
Now with “big data” it’s a force barely contained

/ page 4
/ page 5
/ page 6
/ page 7
Who Does Data Science? Unicorns!
Controversy in hiring data scientists
Some companies post job ads for unicorns, mythical creatures having no basis in reality
Hire a data science TEAM!
Don’t expect a single individual to be both a “theorist” and an “experimentalist”
Consultant vs. full-time hire

/ page 8
What is Big Data?
Big Data
– “large data sets so big that commonly-used software tools are unable to capture, curate, manage, and process the data within a tolerable elapsed time.”

Hadoop Dominates Big Data market
– Used widely by some of the world's largest websites,
such as Facebook, eBay, Amazon and Yahoo
– Moving into the enterprise
– Created by Doug Cutting and Mike Cafarella, with much of its early development at Yahoo!

Apache Hadoop

/ page 9
Applications for Big Data
Smarter Healthcare
Multi-channel sales
Finance
Log Analysis
Homeland Security
Traffic Control
Telecom
Search Quality
Manufacturing
Trading Analytics
Fraud and Risk
Retail: Churn

“Big Data is the definitive source of competitive advantage across all industries. For those organizations that understand and embrace the new reality of Big Data, the possibilities for new innovation, improved agility, and increased profitability are nearly endless.”
Source: Wikibon 2012

/ page 10
The Minnesota Dad
Father and daughter walk into a Target store to speak with the manager:
– He wants to know why the store is bombarding his teenage daughter with ads for baby strollers, diapers and other baby goods. “Are you trying to encourage her to get pregnant?”
– The befuddled manager apologizes and responds that he has no idea why the company is sending her such items

Father later phones the store to apologize - turns out his daughter was expecting
How?
– Target used Big Data to predict pregnancy. When a woman begins buying vitamins, increases her purchases of lotion, and buys an oversized purse or bag, the odds are very high she is expecting

– Target knew the daughter was pregnant before the family did

/ page 11
/ page 12
Machine Learning Overview
What is Machine Learning?
Components have been around for decades
“Data science” is just a new name for something old and proven (I do love it!)
“Machine learning” used to be “data mining” or KDD.

Supervised learning
Prediction and classification
Linear regression, logistic regression, classification trees, SVM, neural nets

Train the algorithm on known labelled data to be able to predict new data
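To make "train on known labelled data, then predict new data" concrete, here is a minimal sketch of linear regression, one of the supervised methods listed above. It is not from the talk: the helper names `fit_line` and `predict` and the toy data are assumptions made for illustration, and it uses the ordinary least-squares formula for a single input variable in plain Python.

```python
# Minimal sketch of supervised learning: fit a line to labelled examples,
# then predict the label of an unseen input. Names and data are illustrative.

def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Known labelled data (toy values that follow y = 2x + 1).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)   # training step
prediction = predict(model, 5.0)     # prediction on new data
print(model, prediction)
```

In practice a library such as R's `lm` or a Python machine learning package would replace the hand-written fit; the sketch only shows the train-then-predict pattern.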

Unsupervised learning
Hierarchical clustering
K-means clustering
Principal component analysis (PCA)
Dimensionality reduction to address “the curse of dimensionality”
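As an illustration of unsupervised learning, the following is a minimal sketch of K-means clustering in plain Python. It is not from the talk: the function name `kmeans`, the toy points, and the choice of starting centroids are assumptions for illustration. No labels are supplied; the algorithm groups points purely by distance.

```python
# Minimal sketch of K-means: alternate between assigning points to the
# nearest centroid and moving each centroid to the mean of its cluster.
# Function name, data, and starting centroids are illustrative assumptions.

def kmeans(points, centroids, iterations=10):
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        centroids = new_centroids
    return centroids, clusters

# Unlabelled toy data forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Start from one point in each region; real implementations usually
# choose starting centroids randomly or with a seeding heuristic.
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

The fixed iteration count keeps the sketch short; a fuller version would stop once the centroids no longer move.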
/ page 13
Sentiment Analysis

/ page 14
R vs. Python Wars
R
– Very good for data acquisition, cleaning, munging, exploratory analysis, model selection, machine learning algorithm development and training, model performance evaluation
– One of the best visualization tools bar none
– Has over 4,000 packages

Python
– Good choice for production deployment
– Rapidly catching up with R in terms of data science capabilities

/ page 15
Visualization is Critical

/ page 16
Learning More About Data Science
Doing Data Science
Cathy O’Neil & Rachel Schutt
O’Reilly Media

/ page 17
Data Science in Action

/ page 18
Summary – Data Science is Here to Stay
Integral part of Big Data

– Data science and machine learning fuel big data ✔

The shortage of data scientists is real
– Big data is expected to be a $53.4 billion industry by 2016 ✔
– Job postings for “data scientist” increased 15,000% between 2011 and 2012 ✔
– Job market currently has 140,000 – 190,000 open positions ✔
– Projected growth of 18.7% between 2010 and 2020 ✔

Companies of all sizes need to plan out their data science strategy
– Increase value of enterprise data assets ✔

2014 should be a wild year!

– Conference circuit is exploding ✔
– New books, news sources, press coverage abound ✔

/ page 19
Thank you!
Follow me: @AMULETAnalytics
Contact me: dan@amuletanalytics.com
www.amuletanalytics.com
