This is the slide deck for the Women in Big Data Meetup, hosted by LinkedIn in Jan 2020.
The meetup focuses on how LinkedIn powers its job recommendations, with sub-talks addressing different aspects of it, including AI, Fairness, Infra, and Data Science.
The event page is here: http://wibd2020.splashthat.com/ref
The meetup recording is available on YouTube and can be found here: https://www.youtube.com/watch?v=M6Qs5A4fkfc&t
Women In Big Data Meetup 2020, Hosted by LinkedIn
2. Women in Big Data Meetup
Wednesday, January 22, 2020
3. Agenda
Ensuring Fairness in Recruiter Search
Keynote
AI for Job Recommendations and Recruiter Search
Fun Facts!
Data and Machine Learning Infrastructure
Measuring impact through Data Science
Q&A + Raffle Prizes
Women in Big Data - Charter
5. To champion the success
of women in big data
careers.
Our Mission
Inspire: influence with key partners
Connect: embrace and cultivate a
community of diverse women in big data and
analytics
Grow: develop, educate, support women
today, for the future
Elevate: provide opportunities for
leadership and a forum to celebrate their
successes
8. Pay Disparity
But the EQUITY GAP is even larger
Women own only 39 cents
to every $1 that men own*
Women earn an average of 80 cents to every
dollar a man earns for the same work. Women of
color and transgender women earn even less
than the average.
Did You Know?
9. Be part of the solution!
Join…Volunteer…Partner…Sponsor
Find a local chapter and MeetUp at womeninbigdata.org
Join our LinkedIn group “Women in Big Data Forum”
Follow us @DataWomen on Twitter and
@womeninbigdataglobal on Instagram
Watch our video:
https://youtu.be/6nvst1zaYLU
Join us
18. "I will feel equality has arrived when we can elect
to office women who are as incompetent as some
of the men who are already there."
-Maureen Reagan
19. Research shows that
in order to apply for
a job women feel
they need to meet
100% of the criteria
while men usually
apply after meeting
about 60%.
LinkedIn Gender Insights Report, 2019
20. Companies place
tremendous value on
employee referrals
and recruiters report
that they are the top
source of quality
hires.
LinkedIn Gender Insights Report, 2019
21. If women only apply
when they feel
extremely qualified, it
makes sense that
they'd have a higher
success rate — but
this could also
indicate they are not
pursuing stretch
opportunities.
LinkedIn Gender Insights Report, 2019
26. Members
• Current Role
• Location
• Interests
• Preferences
• Skills
• Education
• Other Details
Recommendations
Job Matching
Retrieval
& Ranking
Jobs
• Job Title
• Company
• Skills
• Location
• Job Description
• Other Details
Jobs You May Be Interested In (JYMBII)
27. Retrieval
Subset Selection
• Only a subset of jobs are relevant
Fast Retrieval
Recall
• Focus on retrieving ALL relevant jobs
Precision
• Focus on retrieving ONLY relevant jobs
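The recall and precision goals on this slide can be made concrete with a small calculation. The following is a minimal sketch; the job ids and the `precision_recall` helper are hypothetical and only illustrate the two definitions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved job ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of retrieved jobs that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant jobs that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical job ids, for illustration only.
relevant_jobs = {"j1", "j2", "j3", "j4"}
retrieved_jobs = {"j1", "j2", "j9"}
precision, recall = precision_recall(retrieved_jobs, relevant_jobs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Retrieving more jobs tends to raise recall and lower precision, which is why the retrieval stage is followed by a ranking stage.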
28. Subset Selection & Retrieval: Generating a Search Query
Combine
Profile-Job Matching Clauses
User title ~ Job title
User skills ~ Job skills
…
Preference Matching Clauses
Title pref ~ Job title
Location pref ~ Job location
…
Other Matching Clauses
…
Search
Index
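The query generation step above can be sketched roughly as follows. This is an illustrative toy, not LinkedIn's implementation: the `Member` fields, the in-memory list standing in for the search index, and the OR semantics across clauses are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Member:
    title: str
    skills: list
    title_pref: str = ""
    location_pref: str = ""


def build_query(member):
    """Combine matching clauses into one list of (job_field, value) pairs."""
    clauses = [("title", member.title)]                  # user title ~ job title
    clauses += [("skills", s) for s in member.skills]    # user skills ~ job skills
    if member.title_pref:
        clauses.append(("title", member.title_pref))     # title pref ~ job title
    if member.location_pref:
        clauses.append(("location", member.location_pref))  # location pref ~ job location
    return clauses


def matches(job, clause):
    """Case-insensitive exact match of one clause against one job field."""
    field_name, value = clause
    job_value = job[field_name]
    values = job_value if isinstance(job_value, list) else [job_value]
    return value.lower() in (v.lower() for v in values)


def search(index, clauses):
    """Return ids of jobs matching at least one clause (OR semantics, assumed)."""
    return [job["id"] for job in index if any(matches(job, c) for c in clauses)]


# Hypothetical index contents.
index = [
    {"id": 1, "title": "ML Engineer", "skills": ["Spark"], "location": "Dublin"},
    {"id": 2, "title": "Chef", "skills": ["Cooking"], "location": "Paris"},
]
member = Member(title="ML Engineer", skills=["Spark"], location_pref="Dublin")
result_ids = search(index, build_query(member))
print(result_ids)
```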
29. Ranking
Surface best matches first
• Optimizing for a specific goal
Complex models for scoring
• e.g. GLMix
Ranking and filtering
• Respond to user feedback
30. GLMix: Generalized Linear Mixed Models
• Mixture of linear models into an additive model
• Fixed Effect – Population Average Model
• Random Effects – Entity Specific Models
Response Prediction
Entity 1
Random Effect Model
Entity 2
Random Effect Model
Personalization
Job 2
Random Effect Model
Job 1
Random Effect Model
Collaboration
Global Fixed Effect Model
Content-Based Similarity
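The additive structure of GLMix can be sketched as a global fixed-effect term plus per-member and per-job random-effect terms, passed through a logistic link. The function below is a minimal sketch under those assumptions; the argument names, the feature layout, and the choice of a logistic link are illustrative, not taken from the slides.

```python
import math


def dot(weights, features):
    """Dot product; zip() stops at the shorter input, so empty weights give 0."""
    return sum(w * x for w, x in zip(weights, features))


def glmix_score(global_w, member_w, job_w, member_id, job_id,
                pair_x, job_x, member_x):
    """Score one (member, job) pair as fixed effect + random effects."""
    # Fixed effect: one population-wide model.
    logit = dot(global_w, pair_x)
    # Per-member random effect: this member's own coefficients on job features.
    logit += dot(member_w.get(member_id, []), job_x)
    # Per-job random effect: this job's own coefficients on member features.
    logit += dot(job_w.get(job_id, []), member_x)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients: member "m1" has a learned random effect, others do not.
global_w = [0.0]
member_w = {"m1": [2.0]}
job_w = {}
cold = glmix_score(global_w, member_w, job_w, "new_member", "j1", [1.0], [1.0], [1.0])
warm = glmix_score(global_w, member_w, job_w, "m1", "j1", [1.0], [1.0], [1.0])
print(cold, warm)
```

An entity without a random-effect model contributes nothing extra, so its score falls back to the global fixed-effect model.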
31. Personalized Job Recommendations on LinkedIn
Scalability Relevance Experience
Scalability
AI Infrastructure
• Members x Jobs
• Up-to-date
Computation
• Efficiency
• Accuracy
Relevance
Knowledge Representation
• Profiles
• Preferences
• Interests
• Job Descriptions
Matching algorithms
• Retrieval
• Ranking
Experience
Power the job seeking
experience on LinkedIn
• Jobs Tab
• Email
• Home Page
• Company Pages
Measure impact
• Views, Applies
• Member Feedback
38. 1. Multi-objective:
• Recruiters to find relevant candidates.
• Candidates to be interested in the job
opening that the recruiter is sourcing for.
Key
Challenges
40. 2. Showing few relevant candidates
• Fulfill product requirement of showing only relevant
candidates from a matched set of millions.
Several
Millions
Results
Only
Relevant
Results
Ranking Layers
Key
Challenges
41. Recruiter Search has layered ranking architecture
Merger Re-ranker
In-memory
Key/Value
Store
Application
Host 1
ML Model
Index
Host 2
ML Model
Index
Host 3
ML Model
Index
L1 Ranker L2 Ranker
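A layered ranking architecture of this kind can be sketched as follows: each host ranks its own index shard with a lightweight model (L1), a merger combines the per-host results, and a heavier model re-ranks the merged set (L2). The function names, the scoring callables, and the cut-off parameters are assumptions made for illustration.

```python
import heapq


def l1_rank(shard, cheap_score, k):
    """Each host scores only its own index shard and returns its top k."""
    return heapq.nlargest(k, shard, key=cheap_score)


def layered_search(shards, cheap_score, expensive_score, k_per_host, final_k):
    # L1: every host ranks its shard independently with a lightweight model.
    per_host = [l1_rank(shard, cheap_score, k_per_host) for shard in shards]
    # Merger: combine the per-host candidates into one list.
    merged = [cand for host_results in per_host for cand in host_results]
    # L2: re-rank the much smaller merged set with a heavier model.
    return heapq.nlargest(final_k, merged, key=expensive_score)


# Hypothetical candidates: "a" stands in for the cheap score, "b" for the expensive one.
shards = [
    [{"id": 1, "a": 0.9, "b": 0.1}, {"id": 2, "a": 0.2, "b": 0.9}],
    [{"id": 3, "a": 0.8, "b": 0.8}],
]
top = layered_search(shards, lambda c: c["a"], lambda c: c["b"],
                     k_per_host=1, final_k=2)
print([c["id"] for c in top])
```

In this toy run candidate 2 is dropped at L1 even though the heavier model would have preferred it, which illustrates the recall cost of an aggressive first-layer cut-off.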
42. Software Engineer
ML Engineer
Big Data Engineer
Physiotherapist
Gynecologist
Chiropractor
Hadoop
Spark
Deep Learning
Keras
TensorFlow
Biosciences
Medicine
3. Semantically Matching
Millions of Entities
Key
Challenges
47. AI can amplify inherent biases in society
Image results:
"Unprofessional
hair for work"
Image results:
"Professional hair
for work"
48. Ensuring economic opportunity for every
member of the global workforce
Proactively counter biases in models
In Recruiter Search, our goal is to have the top search results be
representative of the broader qualified candidate set.
Fairness aware, or representative, ranking.
50. Fairness
Aware
Ranking
Re-rank the set of
candidates from Recruiter
Search AI
Rank the candidates in each gender bucket according to
the scores assigned by the machine-learned model.
51. Fairness
Aware
Ranking
Re-rank the set of
candidates from Recruiter
Search AI
Merge the gender buckets, while obeying representation
constraints based on the gender proportions computed
from the set of qualified candidates.
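The two steps above (rank within each bucket, then merge under representation constraints) can be sketched with a simple greedy merge. This is a simplified illustration, not the production algorithm: the minimum-count rule `floor(proportion * position)` and the tuple layout of candidates are assumptions.

```python
import math
from collections import defaultdict


def fair_rerank(candidates, top_k):
    """candidates: list of (candidate_id, group, score) for the qualified set."""
    # Step 1: bucket candidates by group and sort each bucket by model score.
    buckets = defaultdict(list)
    for cand in candidates:
        buckets[cand[1]].append(cand)
    for group in buckets:
        buckets[group].sort(key=lambda c: c[2], reverse=True)

    # Target proportions come from the qualified candidate set itself.
    total = len(candidates)
    target = {g: len(b) / total for g, b in buckets.items()}

    # Step 2: merge buckets position by position.
    ranked = []
    taken = defaultdict(int)  # how many candidates were used from each bucket
    while len(ranked) < min(top_k, total):
        position = len(ranked) + 1
        available = [g for g in buckets if taken[g] < len(buckets[g])]
        # Groups that would fall below their minimum count at this position.
        below_min = [g for g in available
                     if taken[g] < math.floor(target[g] * position)]
        pool = below_min or available
        # Among eligible groups, take the one whose next candidate scores highest.
        best = max(pool, key=lambda g: buckets[g][taken[g]][2])
        ranked.append(buckets[best][taken[best]])
        taken[best] += 1
    return ranked


# Hypothetical scores: group "x" outscores group "y" across the board.
candidates = [
    ("a1", "x", 0.9), ("a2", "x", 0.8), ("a3", "x", 0.7),
    ("b1", "y", 0.6), ("b2", "y", 0.5), ("b3", "y", 0.4),
]
reranked = fair_rerank(candidates, top_k=4)
print([c[0] for c in reranked])
```

Because the merge only reads scores and group labels, it works as a post-processing step regardless of which model produced the scores.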
52. Advantages of
this approach
1. Agnostic to the specifics of each model -
scalable across different model choices.
2. Easier to incorporate as part of existing
systems - stand-alone service/component
for post-processing
3. Ensures that the search results presented to
the users of LinkedIn Recruiter are
representative of the underlying talent pool.
53. Ramped to all Recruiter users
A/B tests show no significant drop in
business metrics when fairness aware rankings
were ramped.
68. Offline Infrastructure Overview
Data Sources Data Ingestion Data Storage Data Management
Oracle DB
Espresso
Venice
Kafka
3rd Party
Service
Gobblin
HDFS
Dali
Datasets
70. Case Study: Join Algorithm in Fairness
The Traditional Approach
Scores every joined record in
a ML model
Member
Feature 1
Member
Feature 2
...
Member
Feature n
Entity
Features
[Left Table]
Connection
Feature 1
Connection
Feature 2
Connection
Feature n
Pair
Features
[Pair Table]
Member
Feature 1
Member
Feature 2
...
Member
Feature n
Entity
Features
[Right Table]
Intermediate
Joined
Table
Final
Joined
Table
71. Warning: Exploding intermediate data!
The job couldn't finish
Can we perform a 3-way join and score in a single step?
Case Study: Join Algorithm in Fairness (Cont'd)
The Traditional Approach
72. M1
M2
M3
N1 N2 N3
M1N1 M1N2 M1N3
M2N1 M2N2 M2N3
M3N1 M3N2 M3N3
Left
table
Right
table
Pair table
For each pair partition,
join with the corresponding left partition
and the right partition.
Case Study: Join in Fairness (Cont'd)
The Innovative Approach: 2D Partition Join
Mission Impossible -> Efficient Algorithm!
For each joined record,
apply the scoring function
& output the scorables for future use
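The 2D partition join described above can be sketched in a single process. The pair table is split into an M x N grid of partitions, and each grid cell is joined with exactly one left partition and one right partition, then scored in the same pass, so no intermediate joined table is written. The hash partitioning, the dictionary layout of the tables, and the `score_fn` signature are assumptions for illustration.

```python
from collections import defaultdict


def partition(table, num_parts):
    """Hash-partition a {key: features} table into num_parts dicts."""
    parts = [dict() for _ in range(num_parts)]
    for key, features in table.items():
        parts[hash(key) % num_parts][key] = features
    return parts


def two_d_partition_join(left, right, pairs, score_fn, m_parts, n_parts):
    left_parts = partition(left, m_parts)
    right_parts = partition(right, n_parts)

    # Assign every pair record to the grid cell (left partition, right partition).
    pair_parts = defaultdict(list)
    for (l_key, r_key), pair_features in pairs.items():
        cell = (hash(l_key) % m_parts, hash(r_key) % n_parts)
        pair_parts[cell].append((l_key, r_key, pair_features))

    scores = {}
    for (i, j), records in pair_parts.items():
        # Only these two partitions are needed to process this cell.
        left_part, right_part = left_parts[i], right_parts[j]
        for l_key, r_key, pair_features in records:
            # Join and score in one step; nothing intermediate is materialized.
            scores[(l_key, r_key)] = score_fn(
                left_part[l_key], pair_features, right_part[r_key])
    return scores


# Hypothetical tables and a toy scoring function (sum of all features).
left = {"m1": [1.0], "m2": [2.0]}
right = {"n1": [10.0]}
pairs = {("m1", "n1"): [0.5], ("m2", "n1"): [0.25]}
scores = two_d_partition_join(
    left, right, pairs,
    score_fn=lambda l, p, r: sum(l) + sum(p) + sum(r),
    m_parts=2, n_parts=2)
print(scores)
```

In a distributed setting each grid cell would be an independent task, so memory per task is bounded by one left partition, one right partition, and one pair partition.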
76. Model training &
Evaluation
Steps -> Pipelines
Powerful, Flexible and
Pluggable
Infrastructure
Feature
Provider
Frame
Feature
Provider
Feature
Transformer
Trainer
TensorFlow
trainer
GLMix
trainer
Data
Analyzer
Model
Analyzer
Quasar
Model
Analyzer
Model
Rewriter
...
Feature Provider Data Analyzer
Feature
Transformer
Trainer
Model Rewriter Model Analyzer
Pipeline
Blueprint
Steps and their flavors
(implementation)
Systematic compatible data
77. Flavored
trainer:
GLMix
More experiments in GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction
GLMix (Open-Sourced!)
• A fixed effect component + multiple random
effects that power personalization for
recommendations and search results
• Online A/B testing compared with the normal
Linear Regression model:
• 20%-40% lift in job application clicks
• consistent 10%-20% lift in job detail views
79. Feature Provider
Flavor:
Frame
Provides name-
based feature
accessor (a virtual
feature store)
1) Take care of join-and-compute logic in training
2) Take care of fetch-and-compute logic for inference
3) Provide easy feature access by name
4) Feature can be shared across applications by name
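The idea of a name-based feature accessor can be sketched with a small registry. This is a hypothetical illustration only and does not reflect Frame's actual API: the `FeatureRegistry` class, its methods, and the feature names are all invented for the example.

```python
class FeatureRegistry:
    """Hypothetical name-based feature accessor; not Frame's actual API."""

    def __init__(self):
        self._extractors = {}

    def register(self, name, extractor):
        """Define a feature once, under a shared name."""
        self._extractors[name] = extractor

    def get(self, names, entity):
        """Compute the requested features for one entity, keyed by name."""
        return {name: self._extractors[name](entity) for name in names}


registry = FeatureRegistry()
registry.register("member.num_skills", lambda m: len(m["skills"]))
registry.register("member.title_lower", lambda m: m["title"].lower())

# The same names serve a training job and an online service.
member = {"title": "ML Engineer", "skills": ["Spark", "Keras"]}
training_row = registry.get(["member.num_skills"], member)
serving_row = registry.get(["member.num_skills", "member.title_lower"], member)
print(training_row, serving_row)
```

Callers ask for features by name and never see how they are computed, which is what lets one definition be shared across applications.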
85. We Cover Product Analytics Lifecycle from End-to-End
Ask the Right
Question
• How do we define
success?
Specify Tracking
Needs
• What user behaviors to
track?
Metrics and
Dashboards
• A unified metrics
platform for reporting
and A/B testing
Generate Insights
• A/B Testing, modeling,
deep dive analyses
87. What do members think of JYMBII?
How Do We Measure Success for JYMBII?
Job Freshness: % of jobs posted X days ago
Job Liquidity: % of results with no jobs
In-product Rating: % of ratings with thumbs up
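The three success metrics above could be computed along these lines. This is a minimal sketch: the input shapes are hypothetical, and reading "posted X days ago" as "posted within the last X days" is an assumption.

```python
from datetime import date


def job_freshness(post_dates, today, max_age_days):
    """Share of recommended jobs posted within the last max_age_days days."""
    fresh = sum((today - d).days <= max_age_days for d in post_dates)
    return fresh / len(post_dates)


def job_liquidity(result_sizes):
    """Share of recommendation requests that returned no jobs."""
    return sum(size == 0 for size in result_sizes) / len(result_sizes)


def thumbs_up_rate(ratings):
    """Share of in-product ratings that are thumbs up (True)."""
    return sum(ratings) / len(ratings)


# Hypothetical data for illustration.
today = date(2020, 1, 22)
freshness = job_freshness([date(2020, 1, 21), date(2020, 1, 1)], today, max_age_days=7)
liquidity = job_liquidity([0, 3, 5, 0])
rating = thumbs_up_rate([True, True, False, True])
print(freshness, liquidity, rating)
```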
88. Make Data Driven Decisions Through Experimentation
Our approach to A/B testing is guided by three key philosophies:
1. Member First
2. Business Strategy
3. R&D Ownership
89. Automate A/B Testing via XLNT Platform
1. Automation: metrics impact, p-value, and error margin for free
2. Flexibility: easy customization of segmentation, experiment unit, and timeframe
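The three numbers the slide mentions (metric impact, p-value, and error margin) can be illustrated for a conversion-rate metric with a two-proportion z-test. This is a generic statistical sketch, not the XLNT platform's method; the function name, the 95% confidence level, and the example counts are assumptions.

```python
import math


def ab_test(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Compare conversion rates of control (a) and treatment (b).

    Returns (absolute lift, two-sided p-value, error margin of the lift).
    Assumes large samples and a pooled rate strictly between 0 and 1.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    lift = p_b - p_a
    # Pooled standard error under the null hypothesis of equal rates.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = lift / se_pool
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Unpooled standard error for the confidence interval around the lift.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return lift, p_value, z_crit * se


# Hypothetical counts: 10% vs 15% apply rate on 1,000 members each.
lift, p_value, margin = ab_test(100, 1000, 150, 1000)
print(f"lift={lift:.3f} p={p_value:.4f} margin=+/-{margin:.3f}")
```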
90. Use Causal Inference to Identify Important Job Attributes
How do we compare two groups when A/B testing does not work?