Women In Big Data Meetup 2020, Hosted by LinkedIn

Women in Big Data Meetup
Wednesday, January 22, 2020

Agenda
Ensuring Fairness in Recruiter Search
Keynote
AI for Job Recommendations and Recruiter Search
Fun Facts!
Data and Machine Learning Infrastructure
Measuring impact through Data Science
Q&A + Raffle Prizes
Women in Big Data - Charter

Suja Viswesan
West Coast Networking Chapter Lead, WiBD
Director, LinkedIn

To champion the success
of women in big data
careers.
Our Mission
Inspire: influence with key partners
Connect: embrace and cultivate a
community of diverse women in big data and
analytics
Grow: develop, educate, support women
today, for the future
Elevate: provide opportunities for
leadership and a forum to celebrate their
successes

Did You Know?
Pay Disparity
Women earn an average of 80 cents to every
dollar a man earns for the same work. Women of
color and transgender women earn even less
than the average.
But the EQUITY GAP is even larger
Women own only 39 cents
to every $1 that men own*

Be part of the solution!
Join…Volunteer…Partner…Sponsor
Find a local chapter and MeetUp at womeninbigdata.org
Join our LinkedIn group "Women in Big Data Forum"
Follow us @DataWomen on Twitter and
@womeninbigdataglobal on Instagram
Watch our video:
https://youtu.be/6nvst1zaYLU
Join us

Keynote
Suju Rajan
Sr. Director, AI for Enterprise Solutions

Our Vision
Create economic
opportunity for every
member of the global
workforce.

LinkedIn Emerging Jobs Report, Dec 2019

Network Gap
70% get hired at a company with a
connection.
Drivers of connections:
ZipCode >100K median income: 3X
Top School: 2X
Top Company: 2X

[Diagram: Members, Recruiters, Jobs, Events; Online Infrastructure, Offline Infrastructure, Machine Learning Infrastructure, Data Science]

"I will feel equality has arrived when we can elect
to office women who are as incompetent as some
of the men who are already there."
-Maureen Reagan

Research shows that
in order to apply for
a job women feel
they need to meet
100% of the criteria
while men usually
apply after meeting
about 60%.
LinkedIn Gender Insights Report, 2019

Companies place
tremendous value on
employee referrals
and recruiters report
that they are the top
source of quality
hires.

If women only apply
when they feel
extremely qualified, it
makes sense that
they'd have a higher
success rate — but
this could also
indicate they are not
pursuing stretch
opportunities.

JYMBII
Personalized
Job Recommendations
Suman Sundaresh
Engineering Manager - Relevance Engineering

[Diagram: Members, Recruiters, Jobs, Online Events Detection; Online Infrastructure, Offline Infrastructure, Machine Learning Infrastructure, Data Science; JYMBII]

JYMBII: Jobs You May Be Interested In

Members
• Current Role
• Location
• Interests
• Preferences
• Skills
• Education
• Other Details
Recommendations
Job Matching
Retrieval
& Ranking
Jobs
• Job Title
• Company
• Skills
• Location
• Job Description
• Other Details
Jobs You May Be Interested In (JYMBII)

Retrieval
Subset Selection
• Only a subset of jobs are relevant
Fast Retrieval
Recall
• Focus on retrieving ALL relevant jobs
Precision
• Focus on retrieving ONLY relevant jobs

Subset Selection & Retrieval: Generating a Search Query
Combine
Profile-Job Matching Clauses
User title ~ Job title
User skills ~ Job skills
…
Preference Matching Clauses
Title pref ~ Job title
Location pref ~ Job location
…
Other Matching Clauses
…
Search
Index
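
One way to picture the query generation step is sketched below. It is an illustration only: the field names, the OR-combination of clauses, and the exact-match stand-in for the fuzzy "~" match are all assumptions, and the real search index is replaced by a plain list of jobs.

```python
def build_query(profile, preferences):
    """Combine matching clauses into a list of (job_field, accepted_values)."""
    clauses = []
    # Profile-job matching clauses: user title ~ job title, user skills ~ job skills.
    if profile.get("title"):
        clauses.append(("title", {profile["title"].lower()}))
    if profile.get("skills"):
        clauses.append(("skills", {s.lower() for s in profile["skills"]}))
    # Preference matching clauses: title pref ~ job title, location pref ~ job location.
    if preferences.get("title"):
        clauses.append(("title", {preferences["title"].lower()}))
    if preferences.get("location"):
        clauses.append(("location", {preferences["location"].lower()}))
    return clauses


def matches(job, clauses):
    """Assumption: a job is retrieved if ANY clause matches (favours recall)."""
    for field, accepted in clauses:
        value = job.get(field, [])
        values = value if isinstance(value, (list, set, tuple)) else [value]
        if any(str(v).lower() in accepted for v in values):
            return True
    return False


def retrieve(jobs, clauses):
    """Stand-in for the search index lookup: scan the jobs and keep the matches."""
    return [job["id"] for job in jobs if matches(job, clauses)]
```

A production system would send the combined clauses to an inverted index rather than scanning every job, which is what makes the retrieval fast.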

Ranking
Surface best matches first
• Optimizing for a specific goal
Complex models for scoring
• e.g. GLMix
Ranking and filtering
• Respond to user feedback

GLMix: Generalized Linear Mixed Models
• Mixture of linear models into an additive model
• Fixed Effect – Population Average Model
• Random Effects – Entity Specific Models
Response Prediction
Entity 1
Random Effect Model
Entity 2
Random Effect Model
Personalization
Job 2
Random Effect Model
Job 1
Random Effect Model
Collaboration
Global Fixed Effect Model
Content-Based Similarity
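
The additive structure above can be sketched in a few lines. This is a simplified illustration, not the open-sourced GLMix code: it assumes a logistic link, sparse feature dictionaries, and made-up feature names. The per-member model is applied to job features and the per-job model to member features, and an entity without its own model falls back to the global fixed effect.

```python
import math


def dot(weights, features):
    """Sparse dot product over feature names; missing weights count as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def glmix_score(global_w, member_models, job_models,
                member_id, job_id, pair_feats, member_feats, job_feats):
    """Probability of a positive response for one (member, job) pair."""
    # Global fixed effect: population-average model (content-based similarity).
    logit = dot(global_w, pair_feats)
    # Per-member random effect: this member's own weights on job features.
    logit += dot(member_models.get(member_id, {}), job_feats)
    # Per-job random effect: this job's own weights on member features.
    logit += dot(job_models.get(job_id, {}), member_feats)
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the three parts are simply added, each random effect model can be trained and updated per entity while the fixed effect is shared by everyone.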

Personalized Job Recommendations on LinkedIn
Scalability Relevance Experience
Scalability
AI Infrastructure
● Members x Jobs
● Up-to-date
Computation
● Efficiency
● Accuracy
Relevance
Knowledge Representation
● Profiles
● Preferences
● Interests
● Job Descriptions
Matching algorithms
● Retrieval
● Ranking
Experience
Power the job seeking
experience on LinkedIn
● Jobs Tab
● Email
● Home Page
● Company Pages
Measure impact
● Views, Applies
● Member Feedback

Recruiter Search &
Recommendations AI
Tanvi Motwani
Manager, Software Engineering, Machine Learning

[Diagram: Members, Recruiters, Jobs, Online Events Detection; Online Infrastructure, Offline Infrastructure, Machine Learning Infrastructure, Data Science; Recruiter Search & Recommendations]

Recruiter
• Hiring Company
• Job Skills
• Basic Qualifications
• Preferred Qualifications
• InMails
• Recruiter Search Queries
Member
• Work Experience
• Current Role
• Location
• Interests
• Preferences
• Skills
• Education
• Other Details
Candidates You May Be Interested In
Candidate Retrieval &
Ranking Algorithms
Recruiter Search

1. Multi-objective:
• Recruiters to find relevant candidates.
• Candidates to be interested in the job
opening that the recruiter is sourcing for.
Key
Challenges

Recruiter/Member Actions
Recruiter: Search, Sends InMail
Candidate: Accepts

2. Showing few relevant candidates
• Fulfill product requirement of showing only relevant
candidates from a matched set of millions.
[Diagram: Several Millions Results -> Ranking Layers -> Only Relevant Results]
Key
Challenges

Recruiter Search has layered ranking architecture
[Diagram: Application (Merger, Re-ranker, In-memory Key/Value Store); Host 1, Host 2, Host 3, each with ML Model and Index; L1 Ranker, L2 Ranker]
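
A layered ranker of this kind can be sketched as follows. The sketch assumes that a cheap L1 model runs on every host over that host's index shard, that the merger collects each shard's top candidates, and that a more expensive L2 model re-ranks the merged list. The function names and scoring functions are hypothetical.

```python
import heapq


def l1_rank(shard, cheap_score, k):
    """L1: keep only the top-k candidates of one index shard, using a cheap model."""
    return heapq.nlargest(k, shard, key=cheap_score)


def layered_search(shards, cheap_score, expensive_score, l1_k, final_k):
    """Merge the per-shard L1 results, then re-rank them with the L2 model."""
    merged = []
    for shard in shards:  # in a real system each shard lives on its own host
        merged.extend(l1_rank(shard, cheap_score, l1_k))
    # L2: the expensive model only sees the small merged set, not every match.
    return heapq.nlargest(final_k, merged, key=expensive_score)
```

The design choice is a cost trade-off: the expensive model is applied to at most `l1_k` candidates per shard instead of to millions of matches, at the risk that L1 drops a candidate L2 would have ranked highly.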

3. Semantically Matching
Millions of Entities
Example entities: Software Engineer, ML Engineer, Big Data Engineer, Physiotherapist, Gynecologist, Chiropractor, Hadoop, Spark, Deep Learning, Keras, TensorFlow, Biosciences, Medicine
Key
Challenges

Timeline
Extensive Feature Engineering

Ensuring Fairness in
Recruiter Search
Jenelle Bray
Sr. Manager, Software Engineering, Machine Learning

[Diagram: Members, Recruiters, Jobs, Online Events Detection; Online Infrastructure, Offline Infrastructure, Machine Learning Infrastructure, Data Science; Fairness]

AI can amplify inherent biases in society
Image results:
"Unprofessional
hair for work"
Image results:
"Professional hair
for work"

Ensuring economic opportunity for every
member of the global workforce
Proactively counter biases in models
In Recruiter Search, our goal is to have the top search results be
representative of the broader qualified candidate set.
Fairness aware, or representative, ranking.

Fairness
Aware
Ranking
Re-rank the set of
candidates from Recruiter
Search AI
Partition the set of potential candidates into different
gender buckets.

Fairness
Aware
Ranking
Re-rank the set of
candidates from Recruiter
Search AI
Rank the candidates in each gender bucket according to
the scores assigned by the machine-learned model.

Fairness
Aware
Ranking
Re-rank the set of
candidates from Recruiter
Search AI
Merge the gender buckets, while obeying representation
constraints based on the gender proportions computed
from the set of qualified candidates.
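
The three steps (partition, rank within buckets, merge under representation constraints) can be sketched as below. This is a simplified greedy illustration and not necessarily the algorithm LinkedIn runs in production. The assumed constraint is that, at every position k, each group holds at least floor(p * k) of the first k slots, where p is that group's share of the qualified candidate set.

```python
import math
from collections import defaultdict


def fairness_aware_rerank(candidates, top_k):
    """candidates: list of (candidate_id, group, score) for the qualified set."""
    # Step 1: partition the candidates into one bucket per group.
    buckets = defaultdict(list)
    for candidate in candidates:
        buckets[candidate[1]].append(candidate)

    # Step 2: rank each bucket by the model score, best first.
    for bucket in buckets.values():
        bucket.sort(key=lambda c: c[2], reverse=True)

    # Target proportions are computed from the qualified candidate set itself.
    total = len(candidates)
    target = {group: len(bucket) / total for group, bucket in buckets.items()}

    # Step 3: merge the buckets position by position.
    taken = {group: 0 for group in buckets}
    ranked = []
    for position in range(1, min(top_k, total) + 1):
        available = [g for g in buckets if taken[g] < len(buckets[g])]
        # Groups that would fall below their minimum share at this position.
        below_minimum = [g for g in available
                         if taken[g] < math.floor(target[g] * position)]
        pool = below_minimum or available
        # Within the pool, take the group whose next candidate scores highest.
        best = max(pool, key=lambda g: buckets[g][taken[g]][2])
        ranked.append(buckets[best][taken[best]])
        taken[best] += 1
    return ranked
```

Because the function only consumes scores, it can run as a post-processing step after any ranking model.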

Advantages of
this approach
1. Agnostic to the specifics of each model -
scalable across different model choices.
2. Easier to incorporate as part of existing
systems - stand-alone service/component
for post-processing
3. Ensures that the search results presented to
the users of LinkedIn Recruiter are
representative of the underlying talent pool.

Ramped to all Recruiter users
A/B tests show no significant drop in
business metrics when fairness aware rankings
were ramped.

Equality of opportunity can come without
any business costs!

Deepthi Sridharan, Sr. Software Engineer
The online
Infrastructure
that powers JYMBII

[Diagram: Members, Recruiters, Jobs, Events; Online Infrastructure, Offline Infrastructure, Machine Learning Infrastructure, Data Science]

How many members does
LinkedIn have globally?

How many open job listings
are on LinkedIn?

How many projects has
LinkedIn Open Sourced?

How many nodes are in
our largest Hadoop
cluster?

Number of Kafka
events per day?

Number of bytes
processed offline per
day?

Data Lifecycle
Creation -> Ingestion -> Processing -> Access
Creation
• Generated across globe
• Scalable
• Multiple sources
Ingestion
• Reliable
• High scale
• Low latency
• High throughput
• Offline to online service transport
Processing
• Reliable
• Fault tolerant
• Offline batch and nearline stream processing
Access
• Durable persistence
• High availability
• Easily searchable

[Diagram: Source, Ingest, Process, Access stages; components: Messaging, Change-Capture, Ingestion, ETL, Processed Data, Graph Infra, Galene Search Infra, Embedded library]
The offline
Infrastructure
that powers JYMBII
Zoe Lin, Systems and Infrastructure Engineer

[Diagram: Members, Recruiters, Jobs, Events; Online Infrastructure, Offline Infrastructure, Machine Learning Infrastructure, Data Science]

Offline Infrastructure Overview
Data Sources: Oracle DB, Espresso, Venice, Kafka, 3rd Party Service
Data Ingestion: Gobblin
Data Storage: HDFS
Data Management: Dali Datasets

Offline Infrastructure Overview (Cont'd)
Cluster Management: YARN
Compute Engines: [entries not recoverable]
Workflow Orchestration: Azkaban
Use cases: A/B Testing, Machine Learning, Analytics, Reporting

Case Study: Join Algorithm in Fairness
The Traditional Approach
Scores every joined record in an ML model
[Left Table] Entity Features: Member Feature 1, Member Feature 2, ..., Member Feature n
[Pair Table] Pair Features: Connection Feature 1, Connection Feature 2, ..., Connection Feature n
[Right Table] Entity Features: Member Feature 1, Member Feature 2, ..., Member Feature n
Intermediate Joined Table -> Final Joined Table

Case Study: Join Algorithm in Fairness (Cont'd)
The Traditional Approach
Warning: Exploding intermediate data!
The job couldn't finish
Can we perform a 3-way join and score in a single step?

Case Study: Join in Fairness (Cont'd)
The Innovative Approach: 2D Partition Join
Mission Impossible -> Efficient Algorithm!
Left table partitions: M1, M2, M3
Right table partitions: N1, N2, N3
Pair table partitions:
M1N1 M1N2 M1N3
M2N1 M2N2 M2N3
M3N1 M3N2 M3N3
For each pair partition,
join with the corresponding left partition
and the right partition.
For each joined record,
apply the scoring function
& output the scorables for future use
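
The idea of joining and scoring in a single step can be sketched in memory as follows. This is only an illustration of the partitioning scheme, not the production job: hash partitioning, the dictionary layout, and all names are assumptions, and a real implementation would run each pair partition as a separate distributed task.

```python
from collections import defaultdict


def partition_id(key, num_partitions):
    """Assumed partitioner: hash the join key into a fixed number of partitions."""
    return hash(key) % num_partitions


def two_d_partition_join(left, right, pairs, score_fn, m=3, n=3):
    """left/right: {entity_id: features}; pairs: [(left_id, right_id, pair_features)]."""
    left_parts = defaultdict(dict)
    for key, feats in left.items():
        left_parts[partition_id(key, m)][key] = feats
    right_parts = defaultdict(dict)
    for key, feats in right.items():
        right_parts[partition_id(key, n)][key] = feats

    # The pair table is partitioned on both keys, giving an m x n grid (M_i N_j).
    pair_parts = defaultdict(list)
    for left_id, right_id, pair_feats in pairs:
        cell = (partition_id(left_id, m), partition_id(right_id, n))
        pair_parts[cell].append((left_id, right_id, pair_feats))

    scored = []
    for (i, j), records in pair_parts.items():
        # Each cell only needs left partition i and right partition j.
        left_part, right_part = left_parts[i], right_parts[j]
        for left_id, right_id, pair_feats in records:
            if left_id in left_part and right_id in right_part:
                # Join and score in one step; no intermediate joined table is stored.
                score = score_fn(left_part[left_id], pair_feats, right_part[right_id])
                scored.append((left_id, right_id, score))
    return scored
```

Only the scores are emitted, so the wide joined records never have to be written out.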

Machine Learning
Infrastructure at LinkedIn
Ann Yan
Machine Learning Infrastructure

[Diagram: Members, Recruiters, Jobs, Events; Online Infrastructure, Offline Infrastructure, Machine Learning Infrastructure, Data Science]

Lifecycle of a
ML project
Problem statement

Model training &
Evaluation
Steps -> Pipelines
Powerful, Flexible and
Pluggable
Infrastructure
Feature
Provider
Frame
Feature
Provider
Feature
Transformer
Trainer
TensorFlow
trainer
GLMIX
trainer
Data
Analyzer
Model
Analyzer
Quasar
Model
Analyzer
Model
Rewriter
...
Feature Provider Data Analyzer
Feature
Transformer
Trainer
Model Rewriter Model Analyzer
Pipeline
Blueprint
Steps and their flavors
(implementation)
Systematic compatible data

Flavored
trainer:
GLMix
More experiments in GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction
GLMix (Open-Sourced!)
• A fixed effect component + multiple random
effects that power personalization for
recommendations and search results
• Online A/B testing compared with the normal
Linear Regression model:
• 20%-40% lift in job application clicks
• consistent 10%-20% lift in job detail views

Feature Provider
Flavor:
Frame
Provides name-
based feature
accessor (a virtual
feature store)
1) Take care of join-and-compute logic in training
2) Take care of fetch-and-compute logic for inference
3) Provide easy feature access by name
4) Feature can be shared across applications by name
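
The idea of a name-based feature accessor can be illustrated with a toy registry. This is not Frame's API: the class, its methods, and the feature names are hypothetical, and the join and fetch logic a real feature store handles is reduced to plain Python functions.

```python
class FeatureRegistry:
    """Toy virtual feature store: features are defined once and accessed by name."""

    def __init__(self):
        self._features = {}

    def register(self, name, compute_fn):
        """Define a feature once so every application can share it by name."""
        if name in self._features:
            raise ValueError(f"feature already registered: {name}")
        self._features[name] = compute_fn

    def get(self, name, entity):
        """The same call serves training and inference; callers never see the logic."""
        return self._features[name](entity)

    def get_many(self, names, entity):
        return {name: self.get(name, entity) for name in names}


registry = FeatureRegistry()
registry.register("member_skill_count", lambda member: len(member["skills"]))
registry.register("member_has_title", lambda member: bool(member.get("title")))
features = registry.get_many(
    ["member_skill_count", "member_has_title"],
    {"skills": ["Spark", "Kafka"], "title": "Data Engineer"},
)
```

Keeping the computation behind a name is what lets the same definition be reused across training, inference, and other applications.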

Model Flavor:
Quasar
Provides capabilities
of performant feature
transformation,
scoring and ranking
Inference example with TensorFlow

Runs
Executions of
the pipeline
Trackable,
reproducible

How Data Science Helps
JYMBII Make Decisions
Fiona Li, Sr. Data Scientist
Xiaonan Duan, Data Scientist

[Diagram: Members, Recruiters, Jobs, Events; Online Infrastructure, Offline Infrastructure, Machine Learning Infrastructure, Data Science]

How Does Data Science Drive Value at LinkedIn?

We Cover the Product Analytics Lifecycle End-to-End
Ask the Right
Question
• How do we define
success?
Specify Tracking
Needs
• What user behaviors to
track?
Metrics and
Dashboards
• A unified metrics
platform for reporting
and A/B testing
Generate Insights
• A/B Testing, modeling,
deep dive analyses

We Track The Entire Job Seeking Funnel

What do members think of JYMBII?
How Do We Measure Success for JYMBII?
Job Freshness: % of Jobs Posted X Days Ago
Job Liquidity: % of results with no jobs
In-product Rating: % of ratings with thumbs up

Make Data Driven Decisions Through Experimentation
Our approach to A/B testing is guided by three key philosophies:
1. Member First
2. Business Strategy
3. R&D Ownership

Automate A/B Testing via XLNT Platform
1. Automation:
Metrics impact, p-value, and
error margin for free
2. Flexibility:
Easy customization on
segmentation, experiment
unit, and timeframe
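
The three numbers named above (metrics impact, p-value, error margin) can be computed by hand for a simple conversion metric. The sketch below uses a textbook two-proportion z-test and is not a description of how XLNT computes them; the function name and inputs are illustrative.

```python
import math


def ab_test_summary(control_conversions, control_n,
                    treatment_conversions, treatment_n, z_critical=1.96):
    """Summarize an A/B test on a conversion rate (e.g. applies per view)."""
    p_control = control_conversions / control_n
    p_treatment = treatment_conversions / treatment_n
    absolute_diff = p_treatment - p_control

    # p-value: two-sided z-test with a pooled standard error.
    pooled = (control_conversions + treatment_conversions) / (control_n + treatment_n)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treatment_n))
    z = absolute_diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Error margin: half-width of the ~95% confidence interval on the difference.
    se_unpooled = math.sqrt(p_control * (1 - p_control) / control_n
                            + p_treatment * (1 - p_treatment) / treatment_n)
    return {
        "relative_lift": absolute_diff / p_control,  # metrics impact
        "absolute_diff": absolute_diff,
        "p_value": p_value,
        "margin_of_error": z_critical * se_unpooled,
    }
```

For example, `ab_test_summary(100, 1000, 150, 1000)` compares a 10% control rate with a 15% treatment rate.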

Use Causal Inference to Identify Important Job Attributes
How do we compare two groups when A/B testing does not work?

Our 2020 journey has just begun...
Learn. Connect. Be Inspired.
