Big Data and the Art of Data Science
Andrew B. Gardner, PhD
www.linkedin.com/in/andywocky/
agardner@momentics.com
www.momentics.com
Big Data is Not New
Big Data Challenge
1880 census – 50M people
The First Big Data Solution
• Hollerith Tabulating System
• Punched cards – 80 variables
• Used for 1890 census
• 6 weeks instead of 7+ years
Hollerith Tabulation System
{age, number of insane, …} 7 years → 6 weeks
Image Credit – http://en.wikipedia.org/wiki/File:1880_census_Edison.gif
Image Credit – http://en.wikipedia.org/wiki/File:Hollerith_Punched_Card.jpg
Image Credit – http://en.wikipedia.org/wiki/File:HollerithMachine.CHM.jpg
Big Data Is More Than 3 Vs*
Volume, Variety, Velocity
*2001 (META Group) / 2012 (Gartner) definition of Big Data
Volume
• IDC Report 2011
• 8 billion TB in 2015
• 40 billion TB in 2020
• 90% of all data is less than 2 years old
• storage, transport, processing
Variety
• relational, graph, time series, sensor, audio, video, text, geo, scientific, …
• 80% unstructured
Velocity
• Facebook: 500 TB/day
• Large Hadron Collider: 35 GB/sec
• Twitter: 300K tweets/min
• real time, stream
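To compare the velocity figures above on one scale, a small sketch can convert them to per-second rates. It assumes decimal units (1 TB = 1000 GB), which the slide does not specify, and the function names are illustrative only.

```python
# Convert the slide's throughput figures to a common per-second basis.
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_TB = 1000  # assumption: decimal units; the slide does not say


def tb_per_day_to_gb_per_sec(tb_per_day: float) -> float:
    """Express a daily volume in TB as a sustained rate in GB/sec."""
    return tb_per_day * GB_PER_TB / SECONDS_PER_DAY


def per_minute_to_per_second(count_per_min: float) -> float:
    """Express a per-minute event count as events per second."""
    return count_per_min / 60


facebook_gb_s = tb_per_day_to_gb_per_sec(500)     # 500 TB/day from the slide
tweets_per_s = per_minute_to_per_second(300_000)  # 300K tweets/min from the slide

print(f"Facebook: {facebook_gb_s:.2f} GB/sec")
print(f"Twitter:  {tweets_per_s:.0f} tweets/sec")
```

On these assumptions, the Facebook figure averages out to a few GB/sec, the same order of magnitude as the 35 GB/sec quoted for the Large Hadron Collider.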
Big Data Opportunities
“… big data market will grow from $3.2B (2010) to $16.9B (2015)…”
“… gains of 5-6% productivity and profitability …”
“… business volume will double every 1.2 years …”
“… required for companies to stay innovative and competitive …”
“… retail 60% increase in net margin attainable …”
“… manufacturing production costs decrease 50% …”
“… $300B annual savings in healthcare …”
IBM | The Economist | McKinsey & Company | PWC | KPMG | Accenture
Big Data Successes
Walmart
• 10-15% online sales lift
• $1B incremental revenue
• Recommendations
• Engineered content
• 2012 Presidential Election
• Fleet telematics save fuel
What’s Going On?
1: Growth of Data
Amount of data in the world…
2005
100 EB
2012
2800 EB
2013
8000 EB
1 EB = 1 Exabyte = 1 billion GB
… doubles every 2 years
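The doubling rule can be checked against the slide's own figures. The sketch below assumes constant exponential growth between two data points. The helper names are hypothetical, and the inputs are the 2005 and 2012 values quoted above.

```python
import math


def doubling_time_years(v0: float, v1: float, years: float) -> float:
    """Doubling time implied by growth from v0 to v1 over `years`,
    assuming constant exponential growth."""
    return years * math.log(2) / math.log(v1 / v0)


def project(v0: float, years: float, doubling_years: float) -> float:
    """Value after `years` if v0 doubles every `doubling_years`."""
    return v0 * 2 ** (years / doubling_years)


# Figures from the slide, in exabytes (EB).
implied = doubling_time_years(100, 2800, 2012 - 2005)
projected_2012 = project(100, 2012 - 2005, doubling_years=2)

print(f"Implied doubling time 2005-2012: {implied:.2f} years")
print(f"2012 volume if doubling every 2 years from 2005: {projected_2012:.0f} EB")
```

The quoted figures imply a doubling time shorter than two years over 2005-2012, so "doubles every 2 years" is best read as a rule of thumb rather than an exact fit.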
2: Connectedness & Sources
More non-human nodes online than people
50B+ non-human nodes online
The Internet of Things (IoT)
Source: Swan, M. Sensor Mania! The Internet of Things, Objective Metrics, and the Quantified Self 2.0. J Sens Actuator Netw (2012) 1(3), 217-253.
social
mobile
web
enriched data
science
IoT
Data Sources
3: Demand
Increasing dependence on data.
4: Economics
Attention economy not information economy!
• Data is bountiful
• Storage is cheap
• Computing is cheap
• Analysis is cheap
• Talent is expensive
• Time is expensive
Big Data Disruption
OLD: (few) well-calculated questions first
• define schema
• pour in data
• analyze
NEW: data first, then exploratory decision making
• collect data
• explore
• schema as needed
unknown unknowns = insight gold
Better Cycle Times and Better Questions Win!
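The NEW flow (collect data, explore, schema as needed) can be sketched in a few lines. This is a minimal illustration, not the author's method: the raw events, field names, and the `project` helper are all made up for the example.

```python
import json
from collections import Counter

# Hypothetical raw events, collected first with no schema enforced.
raw_lines = [
    '{"user": "a", "action": "click", "ms": 120}',
    '{"user": "b", "action": "view"}',
    '{"user": "a", "action": "click", "ms": 95, "ref": "email"}',
]

# Collect: keep every record exactly as it arrived.
records = [json.loads(line) for line in raw_lines]

# Explore: which fields actually occur, and how often?
field_counts = Counter(key for rec in records for key in rec)


# Schema as needed: project only the fields the current question requires.
def project(rec: dict, fields: tuple) -> dict:
    """Return a fixed-shape view of a record; missing fields become None."""
    return {f: rec.get(f) for f in fields}


clicks = [project(r, ("user", "ms")) for r in records if r["action"] == "click"]

print(field_counts)
print(clicks)
```

In the OLD flow, the optional `ref` field would have had to be anticipated in the schema before any data was loaded. Here it simply shows up during exploration.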
Rumsfeld Analytics
Things we know we know: Facts – could be wrong.
Things we know we don’t know: Questions – do reporting.
Things we don’t know we know: Intuition – quantify to improve.
Things we don’t know we don’t know: Exploration – unfair advantages.
Goal: data discoveries = insights = game changers = unknown unknowns.
Data Alone is Just An Asset
• Depreciating
• Liability
• Useful lifetime
• Expense
Finished goods create value
from raw materials
data
$$ data product $$
Enter the Data Scientist
• mathematical
• developer
• data talented
• problem solver
• insight whisperer
• product savvy
Source: FICO Infographic
data + data scientist
$$ data product $$
A Brief History of Data Science
BC - The Greeks
1974 Peter Naur @ UoC
2001 William S. Cleveland @ CSU
2003 Journal of Data Science
2009 Jeff Hammerbacher @Facebook
2010 Hilary Mason & Chris Wiggins @ Dataists
2010 Mike Loukides @ O'Reilly
2011 DJ Patil @ LinkedIn
Famous Definitions – New Blend
Conway’s “Data Science” Venn Diagram (2010)
Image credit: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
new skill blend:
one stop rock star
Famous Definitions – Skeptic
[… with a great salary]
Famous Definitions – Comparison
Many Flavors of Data Scientist
Alternatively, Data Roles × Skill Sets
… from research
to development
to business-focused
Source / Image Credit: H. Harris, S. Murphy, M. Vaisman. “Analyzing the Analyzers.” O’Reilly Media, Jun 2013.
role
skill
2012-3 Survey
Universal Agreement: Scarcity
By 2018 (McKinsey prediction):
Huge shortage of analytic talent (140K+).
Gap of 1.5M managers who can make decisions based on data analysis.
• Talent is the biggest resource
• There is a raging talent war
Source: J. Manyika et al., “Big data: The next frontier for innovation, competition, and productivity.” McKinsey Global Institute (2011).
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
The Data Scientist’s Craft
• Discover unknown unknowns in data
• Obtain predictive, actionable insight
• Communicate business data stories
• Build business decision confidence
• Create valuable Data Products
Valuable & Reusable Data Products
Image credit: Harlan Harris
Building Data Products
Objectives: What outcome am I trying to achieve?
Levers: What inputs can we control?
Data: What data can we collect?
Models: How do the levers impact the data?
Source / Adapted From: J. Howard. “Designing Great Data Products.” O’Reilly Media, Mar 2012.
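The four questions can be walked through with a toy example. This sketch is an illustration under stated assumptions, not the method from the cited article: the objective (profit), the lever (price), the historical data, the unit cost, and the linear demand model are all invented.

```python
# Objective: maximize profit. Lever: price. Data: past (price, units sold).
# Model: a toy linear demand curve fitted by least squares (an assumption).

history = [(8.0, 120), (10.0, 100), (12.0, 80), (14.0, 60)]  # made-up data
UNIT_COST = 5.0  # hypothetical cost per unit


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(history)


def predicted_profit(price: float) -> float:
    """Model: how the lever (price) is expected to move the objective."""
    units = max(0.0, intercept + slope * price)
    return (price - UNIT_COST) * units


# Search the lever over candidate prices from 5.0 to 20.0 in steps of 0.5.
candidate_prices = [p / 2 for p in range(10, 41)]
best_price = max(candidate_prices, key=predicted_profit)

print(f"Fitted demand: units = {intercept:.0f} + ({slope:.0f}) * price")
print(f"Best price under this model: {best_price}")
```

The point of the sketch is the order of work: start from the outcome, identify what can be controlled, and only then fit a model to the data.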
Data Product Aims
provide
increase
open
new
improve
data
Some Data Products
fitbit
flu tracker
amazon
traffic ads
SIRI
How Do Data Scientists Do It?
• Tools
• Workflow
• Creativity
Data Science Tools
• Java, R, Python
• Hadoop, HDFS, MapReduce, Spark, Storm
• HBase, Pig, Hive, Shark, Impala
• ETL, Webscrapers, Flume, Sqoop
• SQL, RDBMS, DW, OLAP
• Weka, RapidMiner, numpy, scipy, pandas
• D3.js, ggplot2, Wakari, Tableau, Flare, Shiny
• SPSS, Matlab, SAS
• NoSQL, MongoDB, Redis, ..
• MS-Excel
• Machine Learning
• ...
Data Science Workflow
Source: Josh Wills, Senior Director of Data Science, Cloudera. “From the Lab to the Factory: Building a Production Machine Learning Infrastructure.”
+ creative exploration
Data Science Creativity
TECHNOLOGY
(feasibility)
BUSINESS
(viability)
HUMAN VALUES
(usability, desirability)
1. Design thinking
2. Scientific method
3. Lots of ideas
4. Inspiration
5. Perspiration
Challenges for Data Scientists
• Stakeholder naïveté
– 2-3 days, right?
• Red tape
– No access allowed
• Terminology
– What’s a wonkulator?
• Real world data
– Messy, noisy, missing, …
• Unknown need
– What’s the business goal?
• Stakeholder alignment
– CMO, CIO, Prod, DevOps
• Analysis distrust
– … but I don’t like that result
Some Practical Tips
Rapid Iteration
Implement Implement
Feedback
Visualize, Draw, Sketch, Share
Start Simple, Start Small
Goal, But Not Perfection
Big Data Science & Sensemaking
Source: HP “Monetizing Big Data” Perspective.
A Final Word of Caution
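[Figure: hype cycle of expectations over time, with phases labeled hype, hope, and happy; big data and cloud computing are marked on the curve, with 2013 and 2018-2023 on the time axis.]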
Adapted from: Gartner’s 2013 Hype Cycle Special Report (Jul 2013).
Notable Quotes
Simple models and a lot of data trump more elaborate models based on less data.
- Peter Norvig
In God we trust, all others bring data.
- W.E. Deming
Big data is not about the data! The value in big data [is in] the analytics.
- Harvard Prof. Gary King
Conclusion
• Data is an asset, talent is a more valuable asset.
• Big data represents a disruptive shift.
• Data science is the magic enabler via Data Products.
• Better + faster explorations & questions win.
Andrew B. Gardner, PhD
http://linkd.in/1byADxC
agardner@momentics.com
www.momentics.com

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

Big Data and the Art of Data Science

  • 1. Big Data and the Art of Data Science Andrew B. Gardner, PhD www.linkedin.com/in/andywocky/ agardner@momentics.com www.momentics.com
  • 2. Big Data is Not New Big Data Challenge: 1880 census – 50M people The First Big Data Solution • Hollerith Tabulating System • Punched cards – 80 variables • Used for 1890 census • 6 weeks instead of 7+ years Hollerith Tabulation System {age, number of insanes, …} 7 years → 6 weeks Image Credit – http://en.wikipedia.org/wiki/File:1880_census_Edison.gif Image Credit – http://en.wikipedia.org/wiki/File:Hollerith_Punched_Card.jpg Image Credit – http://en.wikipedia.org/wiki/File:HollerithMachine.CHM.jpg
  • 3. Big Data Is More Than 3 Vs* Volume: IDC Report 2011 • 8 billion TB in 2015 • 40 billion TB in 2020 • 90% of all data < 2 years • storage → transport processing Variety: relational, graph, time series, sensor, audio, video, text, geo, scientific, … • 80% unstructured Velocity: facebook 500 TB/day • Large Hadron 35 GB/sec • twitter 300K tweets/min • real time → stream *2001 (Meta) / 2012 (Gartner) Definition of Big Data
  • 4. Big Data Opportunities “… big data market will grow from $3.2B (2010) to $16.9B (2015)…” “… gains of 5-6% productivity and profitability …” “… business volume will double every 1.2 years …” “… required for companies to stay innovative and competitive …” “… retail 60% increase in net margin attainable …” “… manufacturing production costs decrease 50% …” “… $300B annual savings in healthcare …” IBM | The Economist | McKinsey & Company | PWC | KPMG | Accenture
  • 5. Big Data Successes Walmart • 10-15% online sales lift • $1B incremental revenue • Recommendations • Engineered content • 2012 Presidential Election • Fleet telematics save fuel
  • 7. 1: Growth of Data Amount of data in the world… 2005 100 EB 2012 2800 EB 2013 8000 EB 1 EB = 1 Exabyte = 1 billion GB … doubles every 2 years
  • 8. 2: Connectedness & Sources More non-human nodes online than people 50B+ non-human nodes online The Internet of Things (IoT) Source: Swan, M. Sensor Mania! The Internet of Things, Objective Metrics, and the Quantified Self 2.0. J Sens Actuator Netw (2012) 1(3), 217-253. social mobile web enriched data science IoT Data Sources
  • 10. 4: Economics Attention economy not information economy! • Data is bountiful • Storage is cheap • Computing is cheap • Analysis is cheap • Talent is expensive • Time is expensive
  • 11. Big Data Disruption OLD: • define schema • pour in data • analyze → (few) well calculated questions first NEW: • collect data • explore • schema as needed → data first, then exploratory decision making; unknown unknowns = insight gold Better Cycle Times and Better Questions Win!
  • 12. Rumsfeld Analytics Things we know we know: Facts – could be wrong. Things we know we don’t know: Questions – do reporting. Things we don’t know we know: Intuition – quantify to improve. Things we don’t know we don’t know: Exploration – unfair advantages. Goal: data discoveries = insights = game changers = unknown unknowns.
  • 13. Data Alone is Just An Asset • Depreciating • Liability • Useful lifetime • Expense Finished goods create value from raw materials data $$ data product $$
  • 14. Enter the Data Scientist • mathematical • developer • data talented • problem solver • insight whisperer • product savvy Source: FICO Infographic data + data scientist $$ data product $$
  • 15. A Brief History of Data Science BC - The Greeks 1974 Peter Naur @ UoC 2001 William S. Cleveland @ CSU 2003 Journal of Data Science 2009 Jeff Hammerbacher @ Facebook 2010 Hilary Mason & Chris Wiggins @ Dataists 2010 Mike Loukides @ O'Reilly 2011 DJ Patil @ LinkedIn
  • 16. Famous Definitions – New Blend Conway’s “Data Science” Venn Diagram (2010) Image credit: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram new skill blend: one stop rock star
  • 17. Famous Definitions – Skeptic [… with a great salary]
  • 19. Many Flavors of Data Scientist Alternatively, Data Roles × Skill Sets (2012-3 survey) … from research to development to business-focused Source / Image Credit: H. Harris, S. Murphy, M. Vaisman. “Analyzing the Analyzers.” O’Reilly Media, Jun 2013.
  • 20. Universal Agreement: Scarcity McKinsey prediction: in 2018, a huge shortage of analytic talent (140K+) and a gap of 1.5M managers who can make decisions based on data analysis. • Talent is the biggest resource • There is a raging talent war Source: J. Manyika et al., “Big data: The next frontier for innovation, competition, and productivity.” McKinsey Global Institute (2011). http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
  • 21. The Data Scientist’s Craft • Discover unknown unknowns in data • Obtain predictive, actionable insight • Communicate business data stories • Build business decision confidence • Create valuable Data Products
  • 22. Valuable & Reusable Data Products Image credit: Harlan Harris
  • 23. Building Data Products Objectives: What outcome am I trying to achieve? Levers: What inputs can we control? Data: What data can we collect? Models: How do the levers impact the data? Source / Adapted From: J. Howard. “Designing Great Data Products.” O’Reilly Media, Mar 2012.
  • 25. Some Data Products fitbit flu tracker amazon traffic ads SIRI
  • 26. How Do Data Scientists Do It? • Tools • Workflow • Creativity
  • 27. Data Science Tools • Java, R, Python • Hadoop, HDFS, MapReduce, Spark, Storm • HBase, Pig, Hive, Shark, Impala • ETL, Webscrapers, Flume, Sqoop • SQL, RDBMS, DW, OLAP • Weka, RapidMiner, numpy, scipy, pandas • D3.js, ggplot2, Wakari, Tableau, Flare, Shiny • SPSS, Matlab, SAS • NoSQL, MongoDB, Redis, .. • MS-Excel • Machine Learning • ...
  • 28. Data Science Workflow Source: Josh Wills, Senior Director of Data Science, Cloudera. “From the Lab to the Factory: Building a Production Machine Learning Infrastructure.” + creative exploration
  • 29. Data Science Creativity TECHNOLOGY (feasibility) BUSINESS (viability) HUMAN VALUES (usability, desirability) 1. Design thinking 2. Scientific method 3. Lots of ideas 4. Inspiration 5. Perspiration
  • 30. Challenges for Data Scientists • Stakeholder naiveté – 2-3 days, right? • Red tape – No access allowed • Terminology – What’s a wonkulator? • Real world data – Messy, noisy, missing, … • Unknown need – What’s the business goal? • Stakeholder alignment – CMO, CIO, Prod, DevOps • Analysis distrust – … but I don’t like that result
  • 31. Some Practical Tips Rapid Iteration Implement Implement Feedback Visualize, Draw, Sketch, Share Start Simple, Start Small Goal, But Not Perfection
  • 32. Big Data Science & Sensemaking Source: HP “Monetizing Big Data” Perspective.
  • 33. A Final Word of Caution [Hype cycle figure: expectations over time, with phases labeled hype, hope, happy; plotted items: big data, cloud computing; year labels: 2013, 2018-2023.] Adapted from: Gartner’s 2013 Hype Cycle Special Report (Jul 2013).
  • 34. Notable Quotes “Simple models and a lot of data trump more elaborate models based on less data.” - Peter Norvig “In God we trust, all others bring data.” - W.E. Deming “Big data is not about the data! The value in big data [is in] the analytics.” - Harvard Prof. Gary King
  • 35. Conclusion • Data is an asset, talent is a more valuable asset. • Big data represents a disruptive shift. • Data science is the magic enabler via Data Products. • Better + faster explorations & questions win. Andrew B. Gardner, PhD http://linkd.in/1byADxC agardner@momentics.com www.momentics.com
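
The growth figures on slide 7 can be sanity-checked with a little arithmetic. The sketch below assumes smooth exponential growth between two data points, which the slide does not claim; the function name `doubling_time_years` is illustrative only. Under that assumption, going from 100 EB (2005) to 2800 EB (2012) implies a doubling time of roughly 1.5 years, somewhat faster than the slide's "doubles every 2 years" rule of thumb.

```python
import math


def doubling_time_years(start_size: float, end_size: float, years: float) -> float:
    """Doubling time implied by growth from start_size to end_size over `years`.

    Assumes smooth exponential growth between the two points (an assumption
    made for this sketch, not something stated on the slide).
    """
    doublings = math.log2(end_size / start_size)
    return years / doublings


# Figures from slide 7, in exabytes (EB).
implied = doubling_time_years(start_size=100, end_size=2800, years=2012 - 2005)
print(f"Implied doubling time 2005-2012: {implied:.2f} years")
```

The same function can be pointed at any other pair of figures from the slide to see how consistent they are with each other.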

Editor's Notes

  1. Herman Hollerith. Obsolete. 1880 – 50,189,209; 1890 – 62,947,714
  2. ~ 15 mins via 10Gbps LAN to transfer 1 TB; ~ 220 hrs for 1 PB => move the servers?
  3. Harlan Harris
  4. Data is the new currency of business. Understand customer use, behavior, and interests. Targeted products and marketing offers. Understand customer experience across network, services, and social conversation. Network optimization. Connect with OTT players, advertisers, and verticals. New business models.
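
The transfer times in editor's note 2 can be reproduced with a short calculation. This is a rough sketch under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a 10 Gbps link running at full line rate with no protocol overhead. The helper name `transfer_time_seconds` is hypothetical. The result is about 13 minutes for 1 TB and about 222 hours for 1 PB, in line with the note's rounded "~15 mins" and "~220 hrs".

```python
def transfer_time_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time: payload size in bits divided by link speed.

    Ignores protocol overhead, latency, and contention (assumptions of
    this sketch), so real transfers would take longer.
    """
    return size_bytes * 8 / link_bits_per_second


TB = 10**12          # bytes, decimal terabyte (assumed)
PB = 10**15          # bytes, decimal petabyte (assumed)
LINK = 10 * 10**9    # 10 Gbps in bits per second

tb_minutes = transfer_time_seconds(TB, LINK) / 60
pb_hours = transfer_time_seconds(PB, LINK) / 3600

print(f"1 TB over 10 Gbps: {tb_minutes:.1f} minutes")
print(f"1 PB over 10 Gbps: {pb_hours:.0f} hours")
```

At more than nine days for a single petabyte, the note's question about physically moving the servers is a reasonable one.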