Data Science
  Data Meetup Jan. 12
What is data science?
Besides a reason to have beer and pizza…
What does the literature say?
Hacking
“Good data scientists understand, in a
deep way, that the heavy lifting of
cleanup and preparation isn’t
something that gets in the way of solving
the problem… it is the problem”
                                   DJ Patil



 bash/awk/sed
Statistics
What’s the probability that 2 people in
the front 2 rows share a birthday?
1. ~10%
2. ~20%
3. ~50%
4. ~90%
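
The birthday question can be checked by computing it directly. Below is a minimal sketch, assuming 365 equally likely birthdays and no leap years. The group sizes in the loop are hypothetical, since the slide does not say how many people sat in the front two rows.

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


# Hypothetical group sizes; plug in the real head count of the two rows.
for n in (10, 23, 40):
    print(n, round(shared_birthday_probability(n), 3))
```

Under these assumptions the probability passes 50% at 23 people, which is far fewer than most people guess.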

What’s the probability that someone who tests
positive on a 99% accurate test actually has
a disease that affects 1 in 1,000 people?
1. ~10%
2. ~50%
3. ~90%
4. ~99%
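
The diagnostic-test question is an application of Bayes' rule. Below is a minimal sketch, assuming "99% accurate" means both the sensitivity and the specificity are 99%. That reading is an assumption, as the slide does not define accuracy.

```python
def posterior_given_positive(prevalence: float,
                             sensitivity: float,
                             specificity: float) -> float:
    """P(disease | positive test) via Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumed reading of the slide: 1/1000 prevalence, 99% sensitivity and specificity.
p = posterior_given_positive(prevalence=1 / 1000,
                             sensitivity=0.99,
                             specificity=0.99)
print(f"{p:.1%}")
```

Under these assumptions the answer is about 9%: healthy people outnumber sick people so heavily that the 1% of false positives swamps the true positives.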
Domain Expertise
Intelligence Cookbook
      Just follow the steps
The Recipe

First, make it valuable.
Then, make it possible.
Then, make it beautiful.
 Then, make it smart.
Example

E-Commerce website
Make it valuable

Find a KPI that is correlated
   with bottom-line revenue


e.g. number of products the
  visitor browses through
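
One way to check whether a candidate KPI tracks revenue is to compute the Pearson correlation between the two. Below is a minimal sketch. The function name and the per-visitor numbers are made up for illustration and are not from the talk.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up per-visitor numbers, for illustration only.
products_browsed = [1, 3, 4, 7, 9]
revenue = [0.0, 5.0, 4.0, 20.0, 25.0]

r = pearson(products_browsed, revenue)
print(round(r, 2))
```

A value near 1 suggests the KPI moves with revenue. Correlation alone does not show that raising the KPI will raise revenue.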
Make it possible

Develop the simplest heuristic



e.g. show the visitor one of the
     top 10 selling products
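
The top-sellers heuristic fits in a few lines. Below is a minimal sketch, assuming order history is available as a flat list of product IDs with one entry per unit sold. The function names and the data shape are hypothetical.

```python
import random
from collections import Counter


def top_sellers(orders, n=10):
    """Return the n most frequently sold product IDs, best seller first."""
    counts = Counter(orders)
    return [product for product, _ in counts.most_common(n)]


def recommend_top_seller(orders, rng=random):
    """Show the visitor one product picked at random from the top 10 sellers."""
    return rng.choice(top_sellers(orders))


# Hypothetical order log: one product ID per unit sold.
orders = ["shoes"] * 5 + ["socks"] * 3 + ["hat"]
print(top_sellers(orders, n=2))
print(recommend_top_seller(orders))
```

The heuristic ignores who the visitor is. That is the point at this stage: it gives a working baseline that later models can be compared against.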
Make it beautiful

Create a method to quickly test new
    algorithms against old ones


 e.g. create a framework that split-tests
   two models and reports
         which one is better
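
A split-testing harness can start very small. Below is a minimal sketch, assuming each model is a function from a visitor ID to a recommendation, and that a `measure_kpi` callback reports the KPI observed for that visitor. All names here are hypothetical and not from the talk.

```python
import hashlib
from statistics import mean


def assign_variant(visitor_id: str) -> str:
    """Deterministically place a visitor in bucket A or B by hashing the ID."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def run_split_test(visitor_ids, model_a, model_b, measure_kpi):
    """Serve each visitor with one of two models and average the KPI per bucket."""
    models = {"A": model_a, "B": model_b}
    outcomes = {"A": [], "B": []}
    for visitor_id in visitor_ids:
        variant = assign_variant(visitor_id)
        recommendation = models[variant](visitor_id)
        outcomes[variant].append(measure_kpi(visitor_id, recommendation))
    # Skip a bucket that received no visitors so mean() never sees an empty list.
    return {v: mean(vals) for v, vals in outcomes.items() if vals}


def better_variant(report):
    """Name of the bucket with the higher average KPI."""
    return max(report, key=report.get)
```

Hashing the visitor ID keeps a returning visitor in the same bucket. A difference in averages is not by itself evidence that one model is better, so a real framework would also run a significance test before declaring a winner.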
Make it smart

Figure out which field your problem belongs to
 and choose an off-the-shelf algorithm


    e.g. recognize that the problem
   is product recommendation and
       use collaborative filtering
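
Collaborative filtering comes in several flavours. Below is a minimal sketch of one of them, item-based filtering with cosine similarity over binary "visitor bought product" data. The talk does not say which variant was meant, and the function names and toy baskets are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(baskets):
    """Cosine similarity between products, from who bought what.

    baskets maps a visitor ID to the set of product IDs they bought.
    """
    buyers = defaultdict(set)
    for visitor, items in baskets.items():
        for item in items:
            buyers[item].add(visitor)

    sims = defaultdict(dict)
    items = list(buyers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                # Cosine similarity for binary vectors.
                score = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend_for(visitor, baskets, sims, n=3):
    """Rank unseen products by summed similarity to the visitor's purchases."""
    seen = baskets[visitor]
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy purchase history, for illustration only.
baskets = {
    "ann": {"shoes", "socks"},
    "bob": {"shoes", "socks", "laces"},
    "cat": {"socks"},
    "dan": {"hat", "scarf"},
}
sims = item_similarities(baskets)
print(recommend_for("cat", baskets, sims))
```

Here "cat" has bought only socks, so products that other sock buyers also bought rank highest. Nothing in "dan"'s basket is suggested, because no sock buyer bought it.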
Common ML problems
•   Supervised learning
    •   Classification
    •   Regression
    •   Anomaly detection
•   Unsupervised learning
    •   Clustering
    •   Separation
•   Recommendation
    •   Feature-based recommendation
    •   Collaborative filtering
•   Search
    •   Indexing
    •   Ranking
To sum it all up
Real data science is hard

but …

Real data science is the last step in data
science, not the first

and besides …

The most important thing in data science is
the business, not the science
Questions?

email: vitalyp@liveperson.com

     Twitter: @bigdatasc
