ML Platform Q1 Meetup: An introduction to LinkedIn's Ranking and Federation Libraries

Quasar and ReMix
An introduction to LinkedIn's Ranking and Federation libraries
Andris Birkmanis & Lance Wall

Relevance: Verticals & Infrastructure
[Figure: Relevance verticals progress from isolated ML models to integrated ML models to deployed ML services. The matching relevance infrastructure layers are ML algorithms, scoring and ranking tools (Quasar), and the relevance service platform (ReMix).]

Quasar
Quick Scoring and Ranking
Our mission is making efficient feature transformation, scoring, and ranking simple.

Scoring
• Scoring
• Scorables
• Features
• Feature Transformation
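The slide names the scoring concepts without showing how they fit together. As a minimal sketch, assuming a scorable is simply a document carrying a map of named numeric features, a feature transformation derives a new feature from existing ones and a linear model scores the result. `Scorable`, `ScoringSketch`, and the feature names are hypothetical, not Quasar's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical scorable: a document identified by id, carrying named numeric features. */
final class Scorable {
    final String id;
    final Map<String, Double> features = new HashMap<>();

    Scorable(String id) {
        this.id = id;
    }
}

final class ScoringSketch {
    // Feature transformation: derive a new named feature from the existing ones.
    static void transform(Scorable doc, String outputName, Function<Map<String, Double>, Double> fn) {
        doc.features.put(outputName, fn.apply(doc.features));
    }

    // Scoring: linear model over named features; absent features contribute 0.
    static double score(Scorable doc, Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            total += w.getValue() * doc.features.getOrDefault(w.getKey(), 0.0);
        }
        return total;
    }

    public static void main(String[] args) {
        Scorable doc = new Scorable("doc-1");
        doc.features.put("ageMinutes", 90.0);
        // Recency decays from 1.0 (brand new) toward 0 as the document ages.
        transform(doc, "recency", f -> 1.0 / (1.0 + f.get("ageMinutes") / 60.0));
        System.out.println(doc.id + " score=" + score(doc, Map.of("recency", 2.0)));
    }
}
```

Keeping transformations as separate functions over the feature map is what lets the same scorer be reused with different feature pipelines.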

Ranking
• Sorting
• TopK
• Filtering
• Group by
• Distinct
• Union
• Custom
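Most of these operators map directly onto Java streams. The sketch below assumes each document carries an id, a category, and a precomputed score; `Doc` and `RankingOps` are hypothetical names, and a custom operator can be modeled as any function from a document list to a document list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical scored document. */
record Doc(String id, String category, double score) {}

final class RankingOps {
    // Sorting: highest score first.
    static List<Doc> sortDesc(List<Doc> docs) {
        return docs.stream()
                .sorted(Comparator.comparingDouble(Doc::score).reversed())
                .collect(Collectors.toList());
    }

    // TopK: the k best documents by score.
    static List<Doc> topK(List<Doc> docs, int k) {
        return sortDesc(docs).stream().limit(k).collect(Collectors.toList());
    }

    // Filtering: keep documents matching a predicate.
    static List<Doc> filter(List<Doc> docs, Predicate<Doc> keep) {
        return docs.stream().filter(keep).collect(Collectors.toList());
    }

    // Group by: bucket documents by category.
    static Map<String, List<Doc>> groupByCategory(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(Doc::category));
    }

    // Distinct: keep the first occurrence of each id, preserving order.
    static List<Doc> distinctById(List<Doc> docs) {
        Map<String, Doc> seen = new LinkedHashMap<>();
        for (Doc d : docs) seen.putIfAbsent(d.id(), d);
        return new ArrayList<>(seen.values());
    }

    // Union: concatenate two candidate lists, then de-duplicate by id.
    static List<Doc> union(List<Doc> a, List<Doc> b) {
        List<Doc> all = new ArrayList<>(a);
        all.addAll(b);
        return distinctById(all);
    }
}
```

Because every operator takes and returns a `List<Doc>`, they compose, for example `topK(filter(docs, d -> d.score() > 0.3), 10)`.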

Relevance Models: DAGs of Computations
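[Figure: A relevance model expressed as a DAG. All documents (10,000) → Filter BY interest-match > 0.5 → Filter BY skill-match > 0.7 → TOP 50 BY content-match (50). The two filter stages are each labeled 500. Feature inputs shown are member, interest, category, news feed, skill, and content, which feed the interest-match, skill-match, and content-match scores.]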
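The DAG can be expressed as a chain of ranking operators. This sketch assumes the three match scores are already computed for each candidate and that the two filters run one after the other before the top-50 cut; `Candidate` and `RelevanceDag` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

/** Hypothetical candidate carrying the three precomputed match scores from the figure. */
record Candidate(String id, double interestMatch, double skillMatch, double contentMatch) {}

final class RelevanceDag {
    static List<Candidate> rank(List<Candidate> all) {
        return all.stream()
                .filter(c -> c.interestMatch() > 0.5)   // Filter BY interest-match > 0.5
                .filter(c -> c.skillMatch() > 0.7)      // Filter BY skill-match > 0.7
                .sorted(Comparator.comparingDouble(Candidate::contentMatch).reversed())
                .limit(50)                              // TOP 50 BY content-match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Synthetic scores for 10,000 documents, matching the figure's input size.
        Random rnd = new Random(42);
        List<Candidate> all = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            all.add(new Candidate("doc-" + i, rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble()));
        }
        System.out.println("kept " + rank(all).size() + " of " + all.size());
    }
}
```

Because the scores are precomputed here, the sketch does not show the main payoff of the DAG form: an engine can compute each match score only for documents that survive the previous stage.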

ML and Training
• Tracking training dependencies between ML models
• Integrating with training engines via Training API
• Automatic type conversion for features and model parameters
• Reuse of feature transformations between training and prediction
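One way to guarantee that training and prediction see identical features is to define each transformation once and call it from both paths. The sketch assumes raw attributes arrive as a `Map<String, Object>` whose numbers may be `Integer`, `Long`, or `Double` (a small taste of why automatic type conversion helps); `SharedTransforms`, the attribute names, and the weights are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

final class SharedTransforms {
    // Hypothetical transformation shared by training and serving:
    // raw attributes -> [isFresh, interestMatch].
    static final Function<Map<String, Object>, double[]> TRANSFORM = raw -> new double[] {
        toDouble(raw.get("ageMinutes")) <= 60 ? 1.0 : 0.0,
        toDouble(raw.get("interestMatch"))
    };

    // Minimal type conversion: accept any boxed numeric type.
    static double toDouble(Object value) {
        return ((Number) value).doubleValue();
    }

    // Offline path: (features, label) pair handed to a training engine.
    static Map.Entry<double[], Double> trainingExample(Map<String, Object> raw, double label) {
        return Map.entry(TRANSFORM.apply(raw), label);
    }

    // Online path: score with learned weights using the same transformation.
    static double predict(Map<String, Object> raw, double[] weights) {
        double[] x = TRANSFORM.apply(raw);
        double score = 0.0;
        for (int i = 0; i < x.length; i++) score += weights[i] * x[i];
        return score;
    }
}
```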

Quasar Components
• Domain Specific Language (DSL)
▪ Oriented towards scoring and ranking concepts
▪ Supports various machine learning models
▪ Supports various ranking operators
▪ Supports pluggable feature transformers
▪ Supports arithmetic and logical expressions
• Library
▪ Includes out-of-the-box feature transformers tuned for performance (dense/sparse vectors, bags of words, etc.)
▪ Extensible with custom transformers and ranking operators
• Execution engine
▪ Supports multiple evaluation strategies for different objectives (lazy/eager/batching/etc.)
▪ Debuggability, logging, and other cross-cutting concerns
▪ APIs for scoring, ranking, read/write access to features, and training
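Lazy evaluation pays off because many features are expensive and only matter for documents that survive earlier filters. A minimal sketch of a lazy, memoized feature, assuming a feature is just a `Supplier`; `LazyFeature` and `EvaluationDemo` are hypothetical, and a real engine would also offer the eager and batching strategies listed above.

```java
import java.util.function.Supplier;

/** Hypothetical lazy feature: computed on first read, then cached. */
final class LazyFeature<T> implements Supplier<T> {
    private final Supplier<T> compute;
    private T value;
    private boolean computed;

    LazyFeature(Supplier<T> compute) {
        this.compute = compute;
    }

    @Override
    public synchronized T get() {
        if (!computed) {
            value = compute.get();
            computed = true;
        }
        return value;
    }
}

final class EvaluationDemo {
    static int expensiveCalls = 0;

    // Stand-in for an expensive feature such as a similarity over large vectors.
    static double expensiveSimilarity() {
        expensiveCalls++;
        return 0.9;
    }

    public static void main(String[] args) {
        double cheapScore = 0.1;
        LazyFeature<Double> similarity = new LazyFeature<>(EvaluationDemo::expensiveSimilarity);
        // The cheap check fails first, so && short-circuits and the expensive
        // feature is never computed; eager evaluation would pay for it anyway.
        boolean keep = cheapScore > 0.5 && similarity.get() > 0.5;
        System.out.println("keep=" + keep + ", expensiveCalls=" + expensiveCalls);
    }
}
```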

Project Status
• LinkedIn Relevance Products
▪ Feed, Recommendation, Search
• Adoption
▪ 1000+ Quasar models

Future directions
• Better training support for external models (XGBoost, TensorFlow)
• Making feature transformers and operators more reusable
• Better type information
• Standardized storage formats for features and model parameters
• See the upcoming LinkedIn engineering blog for technical details

Example relevance workflows at LinkedIn
• Job recommendations: Member ID → Fetch Member Profile → Compute Job Recommendations → Format Results
• People recommendations: Member ID → Fetch Member Profile → Compute People Recommendations → Format Results

Motivation
• Multitenant relevance workflow services with tens of engineers on multiple teams contributing
• Each relevance workflow service has different APIs and conventions
• Lack of abstraction of system-level concerns from application logic
• Diminished productivity, operability, and leverage

ReMix’s Mission
Provide an easy-to-use platform for building relevance services, with a focus on optimizing leverage and automating common operability concerns.

Design Goals
• Consolidation of various relevance service stacks
• Ease of support
• Ease of development
• Ease of operation

Features of ReMix
• Leverages ParSeq for easy asynchronous I/O
• Exposes declarative API for composing workflows
• Provides automated monitoring instrumentation and tooling
• Provides robust, extensible solutions for common workflow functionality
• Provides isolation from, and robustness against, downstream instability
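ReMix builds on ParSeq; to keep the example dependency-free, this sketch swaps in the JDK's `CompletableFuture` to show the same idea of isolating a workflow from a slow or failing downstream call: the call gets a timeout, and any timeout or error degrades to a fallback value. `DownstreamIsolation`, `callWithFallback`, and the fallback policy are hypothetical, not ReMix's API.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class DownstreamIsolation {
    /** Runs a downstream call asynchronously; a timeout or failure yields the fallback instead. */
    static <T> CompletableFuture<T> callWithFallback(
            Callable<T> downstream, long timeoutMs, T fallback, Executor executor) {
        return CompletableFuture.supplyAsync(() -> {
                    try {
                        return downstream.call();
                    } catch (Exception e) {
                        throw new CompletionException(e);
                    }
                }, executor)
                .completeOnTimeout(fallback, timeoutMs, TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<String> fallback = List.of("cached-result");
            CompletableFuture<List<String>> slow = callWithFallback(
                    () -> { Thread.sleep(1_000); return List.of("fresh-result"); }, 100, fallback, pool);
            CompletableFuture<List<String>> failing = callWithFallback(
                    () -> { throw new IllegalStateException("downstream unavailable"); }, 100, fallback, pool);
            System.out.println(slow.join() + " " + failing.join());
        } finally {
            pool.shutdownNow();
        }
    }
}
```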

How does ReMix work?
An Operator is assembled into a Workflow, which is submitted to the WorkflowEngine.
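The Operator → Workflow → WorkflowEngine split can be sketched with hypothetical types of the same names. This is not ReMix's API: it assumes an operator is an asynchronous function, uses `CompletableFuture` in place of ParSeq tasks, and shows deferred execution by declaring the job recommendation workflow from the earlier slide, which does nothing until the engine executes it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical Operator: one asynchronous step from I to O. */
interface Operator<I, O> extends Function<I, CompletableFuture<O>> {}

/** Hypothetical Workflow: a deferred, declarative chain of operators. Building it runs nothing. */
final class Workflow<I, O> {
    final Operator<I, O> body;

    private Workflow(Operator<I, O> body) {
        this.body = body;
    }

    static <I, O> Workflow<I, O> of(Operator<I, O> first) {
        return new Workflow<I, O>(first);
    }

    // Declares that `next` consumes this workflow's output once it is available.
    <R> Workflow<I, R> then(Operator<O, R> next) {
        Operator<I, O> current = body;
        return new Workflow<I, R>(input -> current.apply(input).thenCompose(next));
    }
}

/** Hypothetical WorkflowEngine: the only place where a Workflow is actually executed. */
final class WorkflowEngine {
    <I, O> CompletableFuture<O> execute(Workflow<I, O> workflow, I input) {
        return workflow.body.apply(input);
    }
}

final class JobRecsExample {
    public static void main(String[] args) {
        Operator<Long, String> fetchProfile =
                memberId -> CompletableFuture.supplyAsync(() -> "profile-" + memberId);
        Operator<String, List<String>> computeJobRecs =
                profile -> CompletableFuture.completedFuture(List.of("job-1 for " + profile, "job-2 for " + profile));
        Operator<List<String>, String> formatResults =
                recs -> CompletableFuture.completedFuture(String.join("\n", recs));

        // Declaration only: no operator has run yet.
        Workflow<Long, String> jobRecs = Workflow.of(fetchProfile).then(computeJobRecs).then(formatResults);
        System.out.println(new WorkflowEngine().execute(jobRecs, 42L).join());
    }
}
```

Swapping `computeJobRecs` for a people-recommendation operator reuses the other two steps unchanged, which is the kind of leverage the declarative form aims for.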

Operator
• Modular functional component of a Workflow
• ReMix provides Operators for common functionality
• ReMix provides decorator interfaces for common optimizations
• ReMix provides generic support for asynchronous execution

Workflow
• Declaration of deferred execution
• Easy to understand declarative language
• Leverages ParSeq and exposes a simpler API
• Abstraction of execution behavior & optimizations
• Independent of environment or service (i.e., portable)

WorkflowEngine
• Executor of Workflows
• Translates Workflows to ParSeq Tasks
• Provides special considerations for async/RPC operations
• Provides common operability functionality
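Common operability functionality such as call counts, latency, and failure metrics can be applied by the engine to every operator rather than hand-written in each one. A minimal sketch, assuming an operator is a `Function<I, CompletableFuture<O>>` and metrics live in in-memory maps; `Instrumentation` and the metric names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class Instrumentation {
    // Hypothetical in-memory metrics keyed by operator name.
    static final Map<String, Long> CALLS = new ConcurrentHashMap<>();
    static final Map<String, Long> FAILURES = new ConcurrentHashMap<>();
    static final Map<String, Long> LAST_LATENCY_NANOS = new ConcurrentHashMap<>();

    /** Wraps an asynchronous operator so every call records count, latency, and failures. */
    static <I, O> Function<I, CompletableFuture<O>> instrument(
            String name, Function<I, CompletableFuture<O>> operator) {
        return input -> {
            long start = System.nanoTime();
            CALLS.merge(name, 1L, Long::sum);
            return operator.apply(input).whenComplete((result, error) -> {
                LAST_LATENCY_NANOS.put(name, System.nanoTime() - start);
                if (error != null) {
                    FAILURES.merge(name, 1L, Long::sum);
                }
            });
        };
    }

    public static void main(String[] args) {
        Function<Integer, CompletableFuture<Integer>> doubler =
                instrument("doubler", x -> CompletableFuture.completedFuture(x * 2));
        System.out.println(doubler.apply(21).join() + " calls=" + CALLS.get("doubler"));
    }
}
```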

Project Status & Planned Work
• ReMix adopters include job recommendations and blended search
• Working on integration with Quasar
▪ Complete solution for model serving from offline to online
• ReMix Cloud
▪ Simple toolkit/UI for creating a Workflow and deploying it to production
▪ Hosts Workflows in a managed service, with little to no operational cost to Workflow developers
▪ Increased leverage due to reuse of common components in multitenant platform

Quasar Model
MODELID "feed_quasar";
DOCPARAM com.linkedin.feed.FeedItem feedItem;
REQUEST PARAM Profile member;
REQUEST FEATURE VECTOR interests = GetInterests(member);
DOCUMENT FEATURE VECTOR categories = GetCategories(feedItem);
DOCUMENT FEATURE LONG publishedTime = GetPublishedTime(feedItem);
MODEL PARAM timeBuckets = { "1hr" : 60, "3hr" : 180 };
DOCUMENT FEATURE VECTOR normalizedTime = Bucketize(diffTime, timeBuckets);
DOCUMENT FEATURE VECTOR interestMatch = Similarity(interests, categories);
MODEL PARAM MAP<STRING, OBJECT> modelWeights = {
"normalizedTime": { "1hr": 0.234, "3hr": 0.456, "Other": 0.21 }, "interestMatch": 0.823 };
DOCUMENT FEATURE FLOAT score = LinearScore(modelWeights, "sigmoid");
DOCUMENT FEATURE BOOLEAN aboveThreshold = score > 0.5;
filteredFeed = FILTER DOCUMENTS BY aboveThreshold;
rerankedFeed = ORDER filteredFeed BY score WITH DESC;
RETURN rerankedFeed;
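To make the model's semantics concrete, here is a plain-Java approximation. It is a sketch under stated assumptions, not Quasar's generated code: `Bucketize` is assumed to one-hot the item's age in minutes into "1hr" (at most 60), "3hr" (at most 180), or "Other"; `Similarity` is assumed to be cosine similarity; and `LinearScore(..., "sigmoid")` is assumed to apply a sigmoid to the weighted sum. The item's age plays the role of the model's `diffTime`, and `FeedModelSketch` and `FeedItem` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class FeedModelSketch {
    /** Hypothetical feed item: category weights and age in minutes. */
    record FeedItem(String id, Map<String, Double> categories, long ageMinutes) {}

    // Model parameters copied from the DSL example.
    static final Map<String, Double> TIME_WEIGHTS = Map.of("1hr", 0.234, "3hr", 0.456, "Other", 0.21);
    static final double INTEREST_WEIGHT = 0.823;

    // Assumed Bucketize semantics: bucket values are upper bounds in minutes.
    static String bucket(long ageMinutes) {
        if (ageMinutes <= 60) return "1hr";
        if (ageMinutes <= 180) return "3hr";
        return "Other";
    }

    // Assumed Similarity: cosine similarity of two sparse vectors.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // LinearScore(modelWeights, "sigmoid"): weighted sum passed through a sigmoid.
    static double score(Map<String, Double> interests, FeedItem item) {
        double z = TIME_WEIGHTS.get(bucket(item.ageMinutes()))
                + INTEREST_WEIGHT * similarity(interests, item.categories());
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // FILTER DOCUMENTS BY aboveThreshold, then ORDER BY score WITH DESC.
    static List<String> rank(Map<String, Double> interests, List<FeedItem> items) {
        return items.stream()
                .map(item -> Map.entry(item.id(), score(interests, item)))
                .filter(e -> e.getValue() > 0.5)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With only positive weights and no bias term, the sigmoid output in this sketch is always above 0.5, so the threshold filter keeps every item; a trained model would typically include a bias that makes the filter meaningful.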

[Figure: The multipass ensemble model at runtime. For each request, getInterests runs once, while getCategories, getPublishedTime, getSimilarity, Bucketize, and LinearScore run for every document in the candidate list. Pass 1 filters the documents (documents 1, 3, and 4 remain) and orders them (3, 1, 4). Pass 2 rescores the remaining documents with Decision Tree and LinearScore models, using additional features such as getViewTimes and Bucketize.]
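A multipass ensemble spends cheap computation on every candidate and expensive computation only on the survivors. This sketch assumes a pass-1 linear model used for filtering and ordering, and a pass-2 ensemble (a one-split decision tree on view time plus a linear term) applied to the top `keep` documents; `MultiPass`, all weights, and the thresholds are made up for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

final class MultiPass {
    /** Hypothetical candidate with one cheap feature and one pass-2 feature (view time). */
    record Doc(String id, double cheapFeature, double viewTimeSeconds) {}

    // Pass 1: cheap linear model over every candidate.
    static double pass1(Doc d) {
        return 0.8 * d.cheapFeature();
    }

    // Pass 2: ensemble of a one-split decision tree on bucketized view time and a linear term.
    static double pass2(Doc d) {
        double tree = d.viewTimeSeconds() > 30 ? 0.6 : 0.1;
        double linear = 0.4 * d.cheapFeature();
        return tree + linear;
    }

    static List<Doc> orderBy(List<Doc> docs, ToDoubleFunction<Doc> model) {
        return docs.stream()
                .sorted(Comparator.comparingDouble(model).reversed())
                .collect(Collectors.toList());
    }

    /** Pass 1 filters and orders all candidates; pass 2 rescores only the top `keep`. */
    static List<Doc> rank(List<Doc> candidates, double minPass1, int keep) {
        List<Doc> survivors = candidates.stream()
                .filter(d -> pass1(d) > minPass1)
                .collect(Collectors.toList());
        List<Doc> ordered = orderBy(survivors, MultiPass::pass1);
        List<Doc> top = ordered.subList(0, Math.min(keep, ordered.size()));
        return orderBy(top, MultiPass::pass2);
    }
}
```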

Vector Math and Expression Support
• Vector as a first-class citizen in the DSL
• State-of-the-art Java Vector implementation
▪ Compact and efficient data structure
▪ Efficient Vector math computation
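[Figure: Member/Job Similarity Score = member.skill · job.required_skill, the dot product of two sparse skill vectors. Skills shown include C++, Java, Network, and Linux on the member side and Hadoop, Scala, and Gradle on the job side, with weights from 1.0 to 3.0.]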
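A compact sparse vector can store only its nonzero entries as sorted feature indices with parallel values, which turns the dot product into a single merge over both index arrays. This sketch assumes skills have already been mapped to integer indices; `SparseVector` and the index assignments are hypothetical, not Quasar's vector class, and the weights are illustrative rather than the figure's values.

```java
/** Hypothetical compact sparse vector: strictly increasing indices with parallel values. */
final class SparseVector {
    private final int[] indices;
    private final double[] values;

    SparseVector(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values differ in length");
        }
        for (int i = 1; i < indices.length; i++) {
            if (indices[i] <= indices[i - 1]) {
                throw new IllegalArgumentException("indices must be strictly increasing");
            }
        }
        this.indices = indices.clone();
        this.values = values.clone();
    }

    /** Dot product by merging the two sorted index arrays in O(nnz(a) + nnz(b)) time. */
    double dot(SparseVector other) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < indices.length && j < other.indices.length) {
            if (indices[i] == other.indices[j]) {
                sum += values[i] * other.values[j];
                i++;
                j++;
            } else if (indices[i] < other.indices[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical skill vocabulary: 0=C++, 1=Java, 2=Linux, 3=Hadoop, 4=Scala.
        SparseVector memberSkill = new SparseVector(new int[] {0, 1, 2}, new double[] {1.0, 1.0, 1.0});
        SparseVector jobRequiredSkill = new SparseVector(new int[] {1, 3, 4}, new double[] {2.0, 2.0, 1.0});
        System.out.println("Member/Job similarity = " + memberSkill.dot(jobRequiredSkill));
    }
}
```

Primitive arrays avoid per-entry boxing and hash lookups, which is one plausible reading of "compact and efficient" for this kind of structure.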
