SlideShare a Scribd company logo
1 of 30
LEVERAGING
Dylan Hogg - Head of Data Science
MACHINE LEARNING
FOR COMPETITIVE
February 2016
ADVANTAGE
OUTLINE
• About me and Search Party
• Scalable Data Platform: Hadoop and Spark
• Machine Learning: What and Why?
• Two Different Applications of Machine Learning
2
ABOUT ME
• Head of Data Science at Search Party (3 years)
• Software Engineering, Database and BI experience
(15 years)
• Postgrad study in Machine Learning from UNSW (2012)
& B.Comm from Auckland University (1999)
3
ABOUT SEARCH PARTY
4
Recruitment Global
Marketplace
• A marketplace that connects
employers, recruiters and
candidates
• Offices in Australia, UK
and Canada
• Employers search millions
of anonymised candidates
and engage with selected
recruiters to fill a vacancy
SEARCH PARTY MISSION
Leverage cutting edge
technology to evolve recruitment
into an efficient digital process
that provides a better outcome
for all three parties involved -
recruiters, employers and job
seekers.
5
SCALABLE
DATA
PLATFORM
6
6
SEARCH PARTY DATA PLATFORM
7
15 million candidate
resumes
From over 800
recruiters
With over 80 million
role descriptions
Job titles
Job descriptions
Skills
Company names
Universities
Executive summaries
Qualifications
Industries
SCALABLE DATA PLATFORM
8
• Cloudera distribution of Hadoop hosted with Rackspace
• Adopted Spark in 2014 for fast data processing
• One platform for prototype and production algorithms
• Spark has Scala, Java, Python and R language support
MACHINE
LEARNING
9
9
(a branch of artificial intelligence)
WHAT IS MACHINE LEARNING?
10
• Origins in computer science
and statistics
• Algorithms that can learn a
specific task from data
without being explicitly
programmed
• Supervised learning to make
predictions
• Unsupervised learning to
discover patterns
WHY MACHINE LEARNING?
11
• Machine learning provides a
set of tools to extract value
and insights from data
• Creates a barrier to entry with
data driven products and
services
• With more data, machine
learning models can become
more accurate
12
MACHINE
LEARNING
Algorithm 1: Detecting
Duplicate candidates
DETECTING DUPLICATE CANDIDATES
– THE PROBLEM
13
• Why does Search Party need to detect duplicate
candidates?
– Candidates may be represented by multiple recruiters
– We resolve duplicate candidate resumes to a single
candidate in our marketplace
Joe Bloggs resume from Recruiter 1
Joe Bloggs resume from Recruiter 2
Joe Bloggs in the marketplace
DETECTING DUPLICATE CANDIDATES
– APPROACH
14
• The naïve implementation scales
poorly with candidate number
• Several versions of “smart”
implementations
• Final version: custom clustering
algorithm in Spark
– De-duplicate 15M candidates in 9 hours on 8 node cluster
– Algorithm considers full name, email, phone, skills and list of
employers
DETECTING DUPLICATE CANDIDATES
– EXAMPLE
15
Email Full Name Phone Skills Employers
j.blogs@gmail.com Joe Bloggs 0409 123 456 Budgeting
External Audit
Forecasting
General Ledger
Intuit Quicken
Wrigley Company Pty Ltd
PepsiCo Australia Holdings
Austbrokers Holdings Pty
jblogs@hotmail.com Joseph
Bloggs
0409 555 555 Accruals
External Audit
General Ledger
PepsiCo Australia Holdings
Austbrokers Holdings Pty
jblogs@hotmail.com Joe Bloggs +61
0409555555
Accruals
General Ledger
AUSTBROKERS
HOLDINGS PTY LTD
DETECTING DUPLICATE CANDIDATES
– ALGORTIHM
16
1. Tokenise candidate fields e.g. “Dylan” -> dy, yl, la, an
2. Compute vectors from tokens for each candidate field
(using TD-IDF)
3. Fast Canopy Clustering1 on candidate full name only
4. Slower Correlation Clustering2 on each canopy using all
candidate fields
5. Main platform uses results to resolve duplicates
1 McCallum et al (2000) "Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching”
2 Elsner et al (2009) "Bounding and Comparing Methods for Correlation Clustering Beyond ILP"
17
 Solves an important business problem of grouping
duplicate candidates in marketplace search results
 This ability is a barrier to entry for potential competitors
 Improves candidate reporting and analysis
 Coming soon: Allowing recruiters to de-duplicate their
CRM database
HOW DOES CANDIDATE “DEDUPE” GIVE
SEARCH PARTY A COMPETIVE ADVANTAGE?
DEEP
LEARNING
18
18
(a branch of machine learning)
WHAT IS DEEP LEARNING?
19
• Neural networks with many hidden
layers
• Use of GPU processors for
performance
• Requires lots of training data
• Slow to train, fast to make predictions
DEEP
LEARNING
Algorithm 2: Job Title
Resolution
UI
designer
UX
designer
User
Interface
designer
95%
80%
80%
20
21
JOB TITLE RESOLUTION – THE PROBLEM
• 6 million distinct job titles extracted
from resumes
• Most common 25 job titles cover
10% of records
• Want to resolve distinct titles that
make up the long tail
• E.g. “Chief Data Officer - Europe”
with “Interim CDO”
with “CDO - Financial Services”
22
JOB TITLE RESOLUTION – TRAINING DATA
• Training data from a partner who has millions of public
vacancies
• Training example: Raw text vacancy description with
corresponding job title
Vacancy Description
We are seeking a Chief
Data Officer who can
manage standard data
operating procedures,
data quality standards
as well as being able
to understand how to
combine different data
sources within the
organisation with each
other.
Chief Data Officer
23
JOB TITLE RESOLUTION – PREDICTION
Prediction Output: Probabilities for a list of job titles
Chief Data Officer - Asia
Probability Suggested Job Title
80% Chief Data Officer
75% CDO
60% Data Officer
50% Data Quality Officer
40% Senior Data Analyst
24
JOB TITLE RESOLUTION
– PREPARING INPUTS
• Neutral network input must be numerical vectors
representing words
• Create word vectors using Google’s word2vec1
algorithm
• Enables vector arithmetic on words:
1 Mikolov et al (2013) "Efficient Estimation of Word Representations in Vector Space”
King
Queen
+ Woman
- Man
Paris
Warsaw
+ Poland
- France
25
JOB TITLE RESOLUTION
– DEEP LEARNING NETWORK
• Training the network learns parameters that allows it to
to correctly make predictions
• Training uses a vacancy description and correct job title
as well as an incorrect job title
• Adapted into a ranking problem to try and rank the
correct job title above the incorrect one
26
JOB TITLE RESOLUTION
– EXAMPLES
Resolution handles abbreviations, re-orderings, synonyms
and misspellings.
Chief Information Officer - Europe
Chief Information Officer
Software engineer - contract
Software engineer
CIO
Interim Chief
Information Officer
Software Engineer
.NET
Software Engineer II
27
HOW DOES JOB TITLE RESOLUTION GIVE
SEARCH PARTY A COMPETITIVE ADVANTAGE?
 Improvements to our candidate search engine via
improved understanding of user search queries
 Increased chance of a successful candidate placement
for a vacancy
 Data cleansing enables linking records with external
data sources
SUMMARY
• Machine learning algorithms give Search Party the ability to
extract value from our data and build great data products
• These products provide a better outcome for all parties
involved - recruiters, employers and job seekers
• Our scalable machine learning algorithms increase the barrier
to entry for any potential competitors
• As our data grows, supervised machine learning algorithms
improve and this directly benefits our platform
28
KEY TAKEAWAY
Algorithms are as important as Data
and are a key competitive advantage for
your organisation
29
Questions?
Connect with me…
www.linkedin.com/in/dylanhogg
@dylanhogg
www.thesearchparty.com

More Related Content

Similar to Leveraging Machine Learning for Competitive Advantage at Search Party

LMFAO Leveraging Machines for Awesome Outreach
LMFAO  Leveraging Machines for Awesome OutreachLMFAO  Leveraging Machines for Awesome Outreach
LMFAO Leveraging Machines for Awesome OutreachGareth Simpson
 
Building Scalable ML Products by TripAdvisor PM & Data Scientist
Building Scalable ML Products by TripAdvisor PM & Data ScientistBuilding Scalable ML Products by TripAdvisor PM & Data Scientist
Building Scalable ML Products by TripAdvisor PM & Data ScientistProduct School
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and MLQuantUniversity
 
Hrm activities in a company
Hrm activities in a companyHrm activities in a company
Hrm activities in a companyrinuemma
 
SA 2014 - Integrating the heterogeneous enterprise
SA 2014 - Integrating the heterogeneous enterpriseSA 2014 - Integrating the heterogeneous enterprise
SA 2014 - Integrating the heterogeneous enterpriseDavid Graham
 
Delivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PMDelivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PMProduct School
 
Digicrome Data Science & AI 11 Month Course PDF.pdf
Digicrome Data Science & AI 11 Month Course PDF.pdfDigicrome Data Science & AI 11 Month Course PDF.pdf
Digicrome Data Science & AI 11 Month Course PDF.pdfitsmeankitkhan
 
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Benjamin Le
 
Leveraging The Machine: The Future of People Data Is Now
Leveraging The Machine: The Future of People Data Is Now Leveraging The Machine: The Future of People Data Is Now
Leveraging The Machine: The Future of People Data Is Now Cornerstone OnDemand
 
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...AgileNetwork
 
Automated Placement System
Automated Placement SystemAutomated Placement System
Automated Placement SystemIRJET Journal
 
DutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorDutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorBigML, Inc
 
Everyday Data Science
Everyday Data ScienceEveryday Data Science
Everyday Data SciencePaul Laughlin
 
Muhammad Shafique CV for .NET Job
Muhammad Shafique CV for .NET JobMuhammad Shafique CV for .NET Job
Muhammad Shafique CV for .NET JobMuhammad Shafique
 
Lead confluent HQ Dec 2019
Lead   confluent HQ Dec 2019Lead   confluent HQ Dec 2019
Lead confluent HQ Dec 2019Sabri Skhiri
 
Google Cloud Machine Learning
 Google Cloud Machine Learning  Google Cloud Machine Learning
Google Cloud Machine Learning India Quotient
 
Lab 64 - Python Sktime for time series analysis in python with visualization ...
Lab 64 - Python Sktime for time series analysis in python with visualization ...Lab 64 - Python Sktime for time series analysis in python with visualization ...
Lab 64 - Python Sktime for time series analysis in python with visualization ...finalyearproject61
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab CreateTuri, Inc.
 

Similar to Leveraging Machine Learning for Competitive Advantage at Search Party (20)

LMFAO Leveraging Machines for Awesome Outreach
LMFAO  Leveraging Machines for Awesome OutreachLMFAO  Leveraging Machines for Awesome Outreach
LMFAO Leveraging Machines for Awesome Outreach
 
Building Scalable ML Products by TripAdvisor PM & Data Scientist
Building Scalable ML Products by TripAdvisor PM & Data ScientistBuilding Scalable ML Products by TripAdvisor PM & Data Scientist
Building Scalable ML Products by TripAdvisor PM & Data Scientist
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
 
Ds for finance day 4
Ds for finance day 4Ds for finance day 4
Ds for finance day 4
 
Hrm activities in a company
Hrm activities in a companyHrm activities in a company
Hrm activities in a company
 
SA 2014 - Integrating the heterogeneous enterprise
SA 2014 - Integrating the heterogeneous enterpriseSA 2014 - Integrating the heterogeneous enterprise
SA 2014 - Integrating the heterogeneous enterprise
 
Wims2012
Wims2012Wims2012
Wims2012
 
Delivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PMDelivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PM
 
Digicrome Data Science & AI 11 Month Course PDF.pdf
Digicrome Data Science & AI 11 Month Course PDF.pdfDigicrome Data Science & AI 11 Month Course PDF.pdf
Digicrome Data Science & AI 11 Month Course PDF.pdf
 
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
 
Leveraging The Machine: The Future of People Data Is Now
Leveraging The Machine: The Future of People Data Is Now Leveraging The Machine: The Future of People Data Is Now
Leveraging The Machine: The Future of People Data Is Now
 
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
 
Automated Placement System
Automated Placement SystemAutomated Placement System
Automated Placement System
 
DutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorDutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive Sector
 
Everyday Data Science
Everyday Data ScienceEveryday Data Science
Everyday Data Science
 
Muhammad Shafique CV for .NET Job
Muhammad Shafique CV for .NET JobMuhammad Shafique CV for .NET Job
Muhammad Shafique CV for .NET Job
 
Lead confluent HQ Dec 2019
Lead   confluent HQ Dec 2019Lead   confluent HQ Dec 2019
Lead confluent HQ Dec 2019
 
Google Cloud Machine Learning
 Google Cloud Machine Learning  Google Cloud Machine Learning
Google Cloud Machine Learning
 
Lab 64 - Python Sktime for time series analysis in python with visualization ...
Lab 64 - Python Sktime for time series analysis in python with visualization ...Lab 64 - Python Sktime for time series analysis in python with visualization ...
Lab 64 - Python Sktime for time series analysis in python with visualization ...
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab Create
 

Recently uploaded

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Recently uploaded (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Leveraging Machine Learning for Competitive Advantage at Search Party

  • 1. LEVERAGING Dylan Hogg - Head of Data Science MACHINE LEARNING FOR COMPETITIVE February 2016 ADVANTAGE
  • 2. OUTLINE • About me and Search Party • Scalable Data Platform: Hadoop and Spark • Machine Learning: What and Why? • Two Different Applications of Machine Learning 2
  • 3. ABOUT ME • Head of Data Science at Search Party (3 years) • Software Engineering, Database and BI experience (15 years) • Postgrad study in Machine Learning from UNSW (2012) & B.Comm from Auckland University (1999) 3
  • 4. ABOUT SEARCH PARTY 4 Recruitment Global Marketplace • A marketplace that connects employers, recruiters and candidates • Offices in Australia, UK and Canada • Employers search millions of anonymised candidates and engage with selected recruiters to fill a vacancy
  • 5. SEARCH PARTY MISSION Leverage cutting edge technology to evolve recruitment into an efficient digital process that provides a better outcome for all three parties involved - recruiters, employers and job seekers. 5
  • 7. SEARCH PARTY DATA PLATFORM 7 15 million candidate resumes From over 800 recruiters With over 80 million role descriptions Job titles Job descriptions Skills Company names Universities Executive summaries Qualifications Industries
  • 8. SCALABLE DATA PLATFORM 8 • Cloudera distribution of Hadoop hosted with Rackspace • Adopted Spark in 2014 for fast data processing • One platform for prototype and production algorithms • Spark has Scala, Java, Python and R language support
  • 9. MACHINE LEARNING 9 9 (a branch of artificial intelligence)
  • 10. WHAT IS MACHINE LEARNING? 10 • Origins in computer science and statistics • Algorithms that can learn a specific task from data without being explicitly programmed • Supervised learning to make predictions • Unsupervised learning to discover patterns
  • 11. WHY MACHINE LEARNING? 11 • Machine learning provides a set of tools to extract value and insights from data • Creates a barrier to entry with data driven products and services • With more data, machine learning models can become more accurate
  • 13. DETECTING DUPLICATE CANDIDATES – THE PROBLEM 13 • Why does Search Party need to detect duplicate candidates? – Candidates may be represented by multiple recruiters – We resolve duplicate candidate resumes to a single candidate in our marketplace Joe Bloggs resume from Recruiter 1 Joe Bloggs resume from Recruiter 2 Joe Bloggs in the marketplace
  • 14. DETECTING DUPLICATE CANDIDATES – APPROACH 14 • The naïve implementation scales poorly with candidate number • Several versions of “smart” implementations • Final version: custom clustering algorithm in Spark – De-duplicate 15M candidates in 9 hours on 8 node cluster – Algorithm considers full name, email, phone, skills and list of employers
  • 15. DETECTING DUPLICATE CANDIDATES – EXAMPLE 15 Email Full Name Phone Skills Employers j.blogs@gmail.com Joe Bloggs 0409 123 456 Budgeting External Audit Forecasting General Ledger Intuit Quicken Wrigley Company Pty Ltd PepsiCo Australia Holdings Austbrokers Holdings Pty jblogs@hotmail.com Joseph Bloggs 0409 555 555 Accruals External Audit General Ledger PepsiCo Australia Holdings Austbrokers Holdings Pty jblogs@hotmail.com Joe Bloggs +61 0409555555 Accruals General Ledger AUSTBROKERS HOLDINGS PTY LTD
  • 16. DETECTING DUPLICATE CANDIDATES – ALGORTIHM 16 1. Tokenise candidate fields e.g. “Dylan” -> dy, yl, la, an 2. Compute vectors from tokens for each candidate field (using TD-IDF) 3. Fast Canopy Clustering1 on candidate full name only 4. Slower Correlation Clustering2 on each canopy using all candidate fields 5. Main platform uses results to resolve duplicates 1 McCallum et al (2000) "Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching” 2 Elsner et al (2009) "Bounding and Comparing Methods for Correlation Clustering Beyond ILP"
  • 17. 17  Solves an important business problem of grouping duplicate candidates in marketplace search results  This ability is a barrier to entry for potential competitors  Improves candidate reporting and analysis  Coming soon: Allowing recruiters to de-duplicate their CRM database HOW DOES CANDIDATE “DEDUPE” GIVE SEARCH PARTY A COMPETIVE ADVANTAGE?
  • 18. DEEP LEARNING 18 18 (a branch of machine learning)
  • 19. WHAT IS DEEP LEARNING? 19 • Neural networks with many hidden layers • Use of GPU processors for performance • Requires lots of training data • Slow to train, fast to make predictions
  • 20. DEEP LEARNING Algorithm 2: Job Title Resolution UI designer UX designer User Interface designer 95% 80% 80% 20
  • 21. 21 JOB TITLE RESOLUTION – THE PROBLEM • 6 million distinct job titles extracted from resumes • Most common 25 job titles cover 10% of records • Want to resolve distinct titles that make up the long tail • E.g. “Chief Data Officer - Europe” with “Interim CDO” with “CDO - Financial Services”
  • 22. 22 JOB TITLE RESOLUTION – TRAINING DATA • Training data from a partner who has millions of public vacancies • Training example: Raw text vacancy description with corresponding job title Vacancy Description We are seeking a Chief Data Officer who can manage standard data operating procedures, data quality standards as well as being able to understand how to combine different data sources within the organisation with each other. Chief Data Officer
  • 23. 23 JOB TITLE RESOLUTION – PREDICTION Prediction Output: Probabilities for a list of job titles Chief Data Officer - Asia Probability Suggested Job Title 80% Chief Data Officer 75% CDO 60% Data Officer 50% Data Quality Officer 40% Senior Data Analyst
  • 24. 24 JOB TITLE RESOLUTION – PREPARING INPUTS • Neutral network input must be numerical vectors representing words • Create word vectors using Google’s word2vec1 algorithm • Enables vector arithmetic on words: 1 Mikolov et al (2013) "Efficient Estimation of Word Representations in Vector Space” King Queen + Woman - Man Paris Warsaw + Poland - France
  • 25. 25 JOB TITLE RESOLUTION – DEEP LEARNING NETWORK • Training the network learns parameters that allows it to to correctly make predictions • Training uses a vacancy description and correct job title as well as an incorrect job title • Adapted into a ranking problem to try and rank the correct job title above the incorrect one
  • 26. 26 JOB TITLE RESOLUTION – EXAMPLES Resolution handles abbreviations, re-orderings, synonyms and misspellings. Chief Information Officer - Europe Chief Information Officer Software engineer - contract Software engineer CIO Interim Chief Information Officer Software Engineer .NET Software Engineer II
  • 27. 27 HOW DOES JOB TITLE RESOLUTION GIVE SEARCH PARTY A COMPETITIVE ADVANTAGE?  Improvements to our candidate search engine via improved understanding of user search queries  Increased chance of a successful candidate placement for a vacancy  Data cleansing enables linking records with external data sources
  • 28. SUMMARY • Machine learning algorithms give Search Party the ability to extract value from our data and build great data products • These products provide a better outcome for all parties involved - recruiters, employers and job seekers • Our scalable machine learning algorithms increase the barrier to entry for any potential competitors • As our data grows, supervised machine learning algorithms improve and this directly benefits our platform 28
  • 29. KEY TAKEAWAY Algorithms are as important as Data and are a key competitive advantage for your organisation 29

Editor's Notes

  1. Final v2 10th Feb.
  2. Talk about team and Dr Jan Luts
  3. Airbnb, Uber
  4. Just think about what we do… we allow employers to…….
  5. We apply it in multipl …….extracted two simple aplications …to give examples of
  6. Why important…anecdote…
  7. Cf dupe customers in crm’s
  8. “Easy for brain hard for If then”