THE ART OF DATA SCIENCE
Josh Wills
OPEN DATA SCIENCE CONFERENCE
BOSTON 2015
@opendatasci
© Cloudera, Inc. All rights reserved.
Or, A Less Pretentious Title
Josh Wills | Senior Director of Data Science
The Art of Data Science
The Data Science State of the Union
Data Scientists At Work
Data Scientists at Home
Data Scientists…Everywhere?
Creating Some Definition
The Mismeasure of Data Scientists
The Tremendous Promise of Big Data
The Unfortunate Reality of Big Data
Like Urban Planning, but for Data
Data Modeling for Data Science
Event Series Analytics
A Simple Star Schema for Spell Correction
A Supernova Schema for Search
Spell Correction in SQL
The Operational/Analytical Impedance Mismatch
Exhibit: http://github.com/jwills/exhibit
Pushing Beyond The Limits of Our Tools
Thanks!
jwills@cloudera.com

Editor's Notes

  1. Data cleansing, preparation, and feature engineering: the dirty work we do all day at the insight factory.
  2. Does everyone have to be a data scientist? Are there tools that can make anyone into a data scientist?
  3. http://www.quora.com/What-is-the-difference-between-a-data-scientist-and-a-statistician
  4. Data scientists can do two things better than data analysts: ask great questions and answer them faster than other people would think possible.
  5. Everyone gets a Ferrari!
  6. Oh no! Everyone has a Ferrari! Induced demand: as you increase the supply of something, the demand for it increases as well.
  7. We need the equivalent of public transit infrastructure for analytic queries: low marginal cost for asking one more question, goes the places most people need to go, removes load from the roadways.
  8. The spell correction example as a model for what the public transit infrastructure should look like.
  9. Can we create a data model that makes this kind of powerful analysis available to people who only know SQL?
  10. Even better, can a common data model let us move models seamlessly from the offline, analytical world to the online, operational world? The supernova data model suggests it can, because it is essentially the HBase/Cassandra/Mongo/etc. data model.
  11. http://github.com/jwills/exhibit
  12. No tool can make you a data scientist, because it’s the ability to push beyond the limits of your tools that makes you a data scientist.
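The notes hold up spell correction as the model for shared analytic infrastructure and equate the supernova data model with the HBase/Cassandra/Mongo model, but neither the schema nor the SQL from the slides survives here. The following is a minimal sketch that assumes a hypothetical schema where each row is one search session with all of its events nested inside it, which is one plausible reading of "supernova." It mines candidate spelling corrections from reformulations: a query with no clicks, followed shortly by a similar-looking query that did get a click. `SearchEvent`, `Session`, `correction_candidates`, `mine`, the 60-second window, and the two-edit threshold are all illustrative assumptions, not the talk's actual method.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class SearchEvent:
    ts: int        # seconds since the session started (assumed field)
    query: str
    clicked: bool  # True if the user clicked any result for this query


@dataclass
class Session:
    """Hypothetical 'supernova' row: one session with all of its events nested inside."""
    session_id: str
    events: List[SearchEvent] = field(default_factory=list)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def correction_candidates(session: Session, max_gap: int = 60,
                          max_edits: int = 2) -> List[Tuple[str, str]]:
    """(original, rewrite) pairs: an unclicked query quickly followed by a
    similar-looking query that did get a click."""
    events = sorted(session.events, key=lambda e: e.ts)
    pairs = []
    for before, after in zip(events, events[1:]):
        if (not before.clicked and after.clicked
                and before.query != after.query
                and after.ts - before.ts <= max_gap
                and edit_distance(before.query, after.query) <= max_edits):
            pairs.append((before.query, after.query))
    return pairs


def mine(sessions: Iterable[Session]) -> Counter:
    """Count candidate corrections across all sessions (the offline, batch view)."""
    counts: Counter = Counter()
    for s in sessions:
        counts.update(correction_candidates(s))
    return counts


# Keyed by session id, the same records could also serve point lookups online.
store: Dict[str, Session] = {
    "s1": Session("s1", [SearchEvent(0, "britny spears", False),
                         SearchEvent(5, "britney spears", True)]),
    "s2": Session("s2", [SearchEvent(0, "britny spears", False),
                         SearchEvent(8, "britney spears", True)]),
    "s3": Session("s3", [SearchEvent(0, "weather", False),
                         SearchEvent(10, "cloudera", True)]),
}

print(mine(store.values()).most_common(1))
# [(('britny spears', 'britney spears'), 2)]
```

Because each session is one self-contained record, the dictionary keyed by session id stands in for a row-keyed online store, while `mine` is an offline scan over the very same records. Session `s3` shows why the edit-distance filter matters: "weather" to "cloudera" is a topic change, not a spelling fix. A production system would likely need far more signal than this toy filter.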