SlideShare a Scribd company logo
1 of 36
Recruiting Solutions 
1 
My Three Ex’s: 
A Data Science Approach for 
Applied Machine Learning
Watch the video with slide 
synchronization on InfoQ.com! 
http://www.infoq.com/presentations 
/data-science-machine-learning 
InfoQ.com: News & Community Site 
• 750,000 unique visitors/month 
• Published in 4 languages (English, Chinese, Japanese and Brazilian 
Portuguese) 
• Post content from our QCon conferences 
• News 15-20 / week 
• Articles 3-4 / week 
• Presentations (videos) 12-15 / week 
• Interviews 2-3 / week 
• Books 1 / month
Presented at QCon San Francisco 
www.qconsf.com 
Purpose of QCon 
- to empower software development by facilitating the spread of 
knowledge and innovation 
Strategy 
- practitioner-driven conference designed for YOU: influencers of 
change and innovation in your teams 
- speakers and topics driving the evolution and innovation 
- connecting and catalyzing the influencers and innovators 
Highlights 
- attended by more than 12,000 delegates since 2007 
- held in 9 cities worldwide
Dedicated to 3 of my favorite ex co-workers.
First, a disclosure. 
This isn’t a talk about machine learning. 
It’s a talk about applying machine learning. 
What’s the difference? 
3
Let’s talk about something else for a moment. 
Hash Tables 
4
What you (need to) know about hash tables. 
Theory Application 
5 
Class HashMap<K,V> 
java.lang.Object 
java.util.AbstractMap<K,V> 
java.util.HashMap<K,V> 
Type Parameters: 
K - the type of keys maintained by this map 
V - the type of mapped values 
All Implemented Interfaces: 
Serializable, Cloneable, Map<K,V>
Now let’s get back to machine learning! 
6
Please allow me to introduce my three ex’s. 
Express. 
Explain. 
Experiment. 
7
Embrace the data science mindset. 
Express 
Understand your utility and inputs. 
Explain 
Understand your models and metrics. 
Experiment 
Optimize for the speed of learning. 
8
Express. 
9
How to train your machine learning model. 
1. Define your objective function. 
2. Collect training data. 
3. Build models. 
4. Profit! 
10
You can only improve what you measure. 
11 
Clicks? 
Actions? 
Outcomes?
Be careful how you define precision… 
12
Account for non-uniform inputs and costs. 
13
Stratified sampling is your friend. 
14
An example of segmenting models. 
15 
Searcher: Recruiter 
Query: Person Name 
Searcher: Job Seeker 
Query: Person Name 
Searcher: Recruiter 
Query: Job Title 
Searcher: Job Seeker 
Query: Job Title
Express yourself in your feature vectors. 
16
Express: Summary. 
 Choose an objective function that models utility. 
 Be careful how you define precision. 
 Account for non-uniform inputs and costs. 
 Stratified sampling is your friend. 
 Express yourself in your feature vectors. 
17
Explain. 
18
With apologies to the little prince. 
19
Everyone is talking about Deep Learning. 
20
But accuracy isn’t everything. 
21
Explainable models, explainable features. 
 Less is more when it comes to explainability. 
 Algorithms can protect you from overfitting, but they can’t 
protect you from the biases you introduce. 
 Introspection into your models and features makes it 
easier for you and others to debug them. 
 Especially if you don’t completely trust your objective 
function or the representativeness of your training data. 
22
Linear regression? Decision trees? 
 Linear regression and decision trees favor explainability 
over accuracy, compared to more sophisticated models. 
 But size matters. If you have too many features or too 
deep a decision tree, you lose explainability. 
 You can always upgrade to a more sophisticated model 
when you trust your objective function and training data. 
 Build a machine learning model is an iterative process. 
Optimize for the speed of your own learning. 
23
Explain: Summary. 
 Accuracy isn’t everything. 
 Less is more when it comes to explainability. 
 Don’t knock linear models and decision trees! 
 Start with simple models, then upgrade. 
24
Experiment. 
25
Why experiments matter. 
“You have to kiss a lot of frogs to find one prince. 
So how can you find your prince faster? 
By finding more frogs and kissing them faster and 
faster.” 
-- Mike Moran 
26
Life in the age of big data. 
Yesterday Today 
27 
Experiments are expensive, 
choose hypotheses wisely. 
Experiments are cheap, 
do as many as you can!
So should we just test everything? 
28
Optimize for the speed of learning. 
29 
vs
Be disciplined: test one variable at a time. 
• Autocomplete 
• Entity Tagging 
• Vertical Intent 
• # of Suggestions 
• Suggestion Order 
• Language 
• Query Construction 
• Ranking Model 
30
Experiment: Summary. 
 Kiss lots of frogs: experiments are cheap. 
 But test in good faith – don’t just flip coins. 
 Optimize for the speed of learning. 
 Be disciplined: test one variable at a time. 
31
Bringing it all together. 
Express 
Understand your utility and inputs. 
Explain 
Understand your models and metrics. 
Experiment 
Optimize for the speed of learning. 
32
33 
Daniel Tunkelang 
dtunkelang@linkedin.com 
https://linkedin.com/in/dtunkelang
Watch the video with slide synchronization on 
InfoQ.com! 
http://www.infoq.com/presentations/data-science- 
machine-learning

More Related Content

More from C4Media

Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 

More from C4Media (20)

Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 

Recently uploaded

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Recently uploaded (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

My Three Ex’s: A Data Science Approach for Applied Machine Learning

  • 1. Recruiting Solutions 1 My Three Ex’s: A Data Science Approach for Applied Machine Learning
  • 2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /data-science-machine-learning InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 3. Presented at QCon San Francisco www.qconsf.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Dedicated to 3 of my favorite ex co-workers.
  • 5. First, a disclosure. This isn’t a talk about machine learning. It’s a talk about applying machine learning. What’s the difference? 3
  • 6. Let’s talk about something else for a moment. Hash Tables 4
  • 7. What you (need to) know about hash tables. Theory Application 5 Class HashMap<K,V> java.lang.Object java.util.AbstractMap<K,V> java.util.HashMap<K,V> Type Parameters: K - the type of keys maintained by this map V - the type of mapped values All Implemented Interfaces: Serializable, Cloneable, Map<K,V>
  • 8. Now let’s get back to machine learning! 6
  • 9. Please allow me to introduce my three ex’s. Express. Explain. Experiment. 7
  • 10. Embrace the data science mindset. Express Understand your utility and inputs. Explain Understand your models and metrics. Experiment Optimize for the speed of learning. 8
  • 12. How to train your machine learning model. 1. Define your objective function. 2. Collect training data. 3. Build models. 4. Profit! 10
  • 13. You can only improve what you measure. 11 Clicks? Actions? Outcomes?
  • 14. Be careful how you define precision… 12
  • 15. Account for non-uniform inputs and costs. 13
  • 16. Stratified sampling is your friend. 14
  • 17. An example of segmenting models. 15 Searcher: Recruiter Query: Person Name Searcher: Job Seeker Query: Person Name Searcher: Recruiter Query: Job Title Searcher: Job Seeker Query: Job Title
  • 18. Express yourself in your feature vectors. 16
  • 19. Express: Summary.  Choose an objective function that models utility.  Be careful how you define precision.  Account for non-uniform inputs and costs.  Stratified sampling is your friend.  Express yourself in your feature vectors. 17
  • 21. With apologies to the little prince. 19
  • 22. Everyone is talking about Deep Learning. 20
  • 23. But accuracy isn’t everything. 21
  • 24. Explainable models, explainable features.  Less is more when it comes to explainability.  Algorithms can protect you from overfitting, but they can’t protect you from the biases you introduce.  Introspection into your models and features makes it easier for you and others to debug them.  Especially if you don’t completely trust your objective function or the representativeness of your training data. 22
  • 25. Linear regression? Decision trees?  Linear regression and decision trees favor explainability over accuracy, compared to more sophisticated models.  But size matters. If you have too many features or too deep a decision tree, you lose explainability.  You can always upgrade to a more sophisticated model when you trust your objective function and training data.  Build a machine learning model is an iterative process. Optimize for the speed of your own learning. 23
  • 26. Explain: Summary.  Accuracy isn’t everything.  Less is more when it comes to explainability.  Don’t knock linear models and decision trees!  Start with simple models, then upgrade. 24
  • 28. Why experiments matter. “You have to kiss a lot of frogs to find one prince. So how can you find your prince faster? By finding more frogs and kissing them faster and faster.” -- Mike Moran 26
  • 29. Life in the age of big data. Yesterday Today 27 Experiments are expensive, choose hypotheses wisely. Experiments are cheap, do as many as you can!
  • 30. So should we just test everything? 28
  • 31. Optimize for the speed of learning. 29 vs
  • 32. Be disciplined: test one variable at a time. • Autocomplete • Entity Tagging • Vertical Intent • # of Suggestions • Suggestion Order • Language • Query Construction • Ranking Model 30
  • 33. Experiment: Summary.  Kiss lots of frogs: experiments are cheap.  But test in good faith – don’t just flip coins.  Optimize for the speed of learning.  Be disciplined: test one variable at a time. 31
  • 34. Bringing it all together. Express Understand your utility and inputs. Explain Understand your models and metrics. Experiment Optimize for the speed of learning. 32
  • 35. 33 Daniel Tunkelang dtunkelang@linkedin.com https://linkedin.com/in/dtunkelang
  • 36. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/data-science- machine-learning