Pinterest - Big Data Machine Learning Platform at Pinterest
This was presented by Yongsheng Wu, head of big data and ML platform at Pinterest, at the Alluxio Bay Area meetup.
Yongsheng shares Pinterest's journey to build a fast and scalable big data and ML platform in AWS for Pinterest to handle the requests and complexity in data at scale. In this talk, he will cover different aspects from the requirements of the platform, the challenges encountered, the technologies chosen, and the tradeoffs that were made.
3.
Mission
Help people discover and do
what they love.
4.
Scale@Pinterest
Service Scale
• 300M+ MAUs
• 120B+ Pins
• 3B+ Boards
Big Data Scale
• 300+ PB on S3
• 6000+ Hive/Hadoop nodes
• 400+ Presto nodes
• 1000+ Spark nodes
5.
Mission & Vision
Principles
Current Status
Key Technologies
Future Plan
6.
Mission
Provide a highly scalable, reliable, secure, performant, efficient and
delightful-to-use big data and machine learning platform to enable rapid
product innovation and help make Pinterest a thriving business.
Vision
A big data and machine learning platform at scale enables every single
engineer at Pinterest to derive trustworthy, actionable insights and
apply ML to solve complex problems with ease and confidence.
8.
Principles
● Put engineers first - make the platform delightful-to-use for all
engineers at Pinterest
● Keep it simple, get it right - build a simple yet sufficient
platform
● Enable speed and quality - enable all engineers at Pinterest to
move fast with scalable, reliable, secure, performant and efficient
solutions made easy by the platform
● Build with reusability and for reusability - embrace open
source technology, build with lego blocks and provide lego blocks to
all engineers at Pinterest
10.
Big Data Platform
[Diagram: platform stack with three layers: Big Data Platform, Feature Platform, ML Platform]
12.
Feature Platform
[Diagram: platform stack with three layers: Big Data Platform, Feature Platform, ML Platform]
13.
Feature Platform - Today
Pinterest’s data graph: Pin/Image/Board/User...
[Diagram: xJoin joins many signals onto each pin. Signals shown: pin’s text, image info, video info, texts, text languages, text scores, SEO signal, link language, link country, link perf, link scores, safe search, spam, visual signal, catvec_v0 (pin’s catvec_v0), catvec_v1 (pin’s catvec_v1), topicvec_v4 (pin’s topicvec_v4), country vecs, text tokens, landing page, annot_embedding v3, annotation_v2, annotation_v3, annotation_v4]
14.
Galaxy: next-gen feature platform
Metadata-driven framework & dev API
* incremental dataflow execution engine
* signal data store (“column”-partitioned) and metadata repo (registry, stats)
* dependency management
* governance: enforcement & tracking
Signal Access & Serving: retrieval API, serving, acl, ...
● offline consumers (ML model training)
● online consumers (ML model serving)
[Diagram: three developer-contributed modules, each labeled with code, spec and metadata, feed into Galaxy; ML Platform and BDP are shown as the adjacent layers]
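The "dependency management" and "incremental dataflow execution" items on the Galaxy slide can be pictured with a small sketch. This is not Galaxy's API, which the slides do not show. `SignalSpec`, `execution_order` and `signals_to_recompute` are hypothetical names, and the dependencies between the signals are invented for illustration (only the signal names are borrowed from the data graph slide). The assumption is that each signal declares its upstream signals in a spec, so the platform can order the work and recompute only what a change affects.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class SignalSpec:
    """Hypothetical signal spec: a name plus the signals it is derived from."""
    name: str
    depends_on: tuple = ()


def execution_order(specs):
    """Return signal names ordered so every signal follows its dependencies."""
    graph = {spec.name: set(spec.depends_on) for spec in specs}
    return list(TopologicalSorter(graph).static_order())


def signals_to_recompute(specs, changed):
    """Return the changed signal plus everything downstream, in execution order."""
    # Invert the spec edges: upstream signal -> signals derived from it.
    dependents = {}
    for spec in specs:
        for dep in spec.depends_on:
            dependents.setdefault(dep, set()).add(spec.name)
    # Walk downstream from the changed signal and mark everything reached.
    dirty, stack = {changed}, [changed]
    while stack:
        for child in dependents.get(stack.pop(), ()):
            if child not in dirty:
                dirty.add(child)
                stack.append(child)
    return [name for name in execution_order(specs) if name in dirty]


# Invented dependency edges, for illustration only.
SPECS = [
    SignalSpec("pin_text"),
    SignalSpec("text_tokens", depends_on=("pin_text",)),
    SignalSpec("text_languages", depends_on=("pin_text",)),
    SignalSpec("annotation_v4", depends_on=("text_tokens", "text_languages")),
    SignalSpec("image_info"),
]

print(signals_to_recompute(SPECS, "text_tokens"))
```

In this sketch a change to `text_tokens` touches only `text_tokens` and `annotation_v4`, while `image_info` is left alone. That is the kind of saving an incremental engine aims for.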
15.
ML Platform
[Diagram: platform stack with three layers: Big Data Platform, Feature Platform, ML Platform]
18.
Response Prediction Use Cases at Pinterest
● Discovery
○ Home Feed: time-ordered following feed to ML based recommendation feed
○ Related Pins, Search: heuristic to ML ranking
● Ads
○ gCTR, CPI, CVR
● Growth
○ Notifications, NUX topics
● Content
○ Content comprehension
● Shopping
○ CTR prediction
● Protect
○ Spam & Porn, ATO
● … ...
19.
Response prediction ML at Pinterest
Surfaces
● 2014: Home feed ranking; Ads ranking
● 2015: Related Pins ranking
● 2016: Search ranking; Notifications ranking
● 2017: Spam detection
● 2018: NUX topics; Ads retrieval
Scale
● From: < 10 serving hosts; training on laptop
● To: 2500+ serving hosts; training on clusters
20.
[Figure: the components that surround the ML Code in a production ML system: Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analytics Tools, Machine Resource Management, Serving Infrastructure, Monitoring & Alerting]
Source: Hidden Technical Debt in Machine Learning Systems, David Sculley et al., Google, NIPS 2015
21.
Much more complex in practice
[Diagram: three separate ML stacks, for Related Pins, Ads and Home Feed]
● Stack 1: Learner 1, Parameter Autotuning, Serving & Logging, Automation, Feature Extraction 1
● Stack 2: Learner 2, Data Monitoring, Serving & Logging, Automation, Feature Extraction 2
● Stack 3: Learner 3, Data Monitoring, Serving & Logging, Automation, Feature Extraction 3
● Distributed Training appears in only two of the stacks
Similar components, no sharing!
Incomplete stacks
22.
Unified ML Platform
[Diagram: one shared stack (Learner, Parameter Autotuning, Serving & Logging, Automation, Feature Extraction, Data Monitoring, Distributed Training) serving Related Pins, Ads and Home Feed, plus new use cases: Search, NUX Topic Picker, Notifications]
● Client teams focus on business problems, not infra problems.
● Platform team specializes in infra problems.
● Quick to build new ML applications.
23.
Unified Big Data ML Platform
● Speed & quality
● Single Use Case
○ 0 -> 1 made fast, easy and robust - create an ML model to solve a complex problem
○ 1 -> N made automated - such an ML model is continuously trained, improved, and deployed
● Many Use Cases on the Platform
○ N -> N^2 - most ML models are trained and served by the platform
25.
Scorpion Training & Catwalk
● Scorpion Training: abstracts the user from the specific trainer package used.
○ Tensorflow, XGBoost
○ future: other packages
● Catwalk: enables running training jobs on a distributed cluster
● Mesos: cluster resource management (CPUs, RAM, GPUs)
● Kubernetes: to replace Mesos in 2018
[Diagram: the layers above are stacked and connected by a "runs on" arrow]
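The idea that Scorpion Training "abstracts user from specific trainer package used" can be illustrated with a small interface sketch. The slides do not show Scorpion's real API, so `Trainer`, `MeanBaselineTrainer`, `TRAINERS` and `train` are all hypothetical names. The stand-in backend below avoids importing TensorFlow or XGBoost so that the example stays self-contained.

```python
from abc import ABC, abstractmethod


class Trainer(ABC):
    """Hypothetical common interface that hides the underlying trainer package."""

    @abstractmethod
    def fit(self, rows, labels):
        """Train on feature rows and labels; return self."""

    @abstractmethod
    def predict(self, rows):
        """Return one prediction per input row."""


class MeanBaselineTrainer(Trainer):
    """Stand-in backend that predicts the mean label.

    A real backend would wrap a package such as TensorFlow or XGBoost
    behind the same two methods.
    """

    def fit(self, rows, labels):
        self.mean = sum(labels) / len(labels)
        return self

    def predict(self, rows):
        return [self.mean for _ in rows]


# Hypothetical registry: the job config names a backend, user code never imports it.
TRAINERS = {"mean_baseline": MeanBaselineTrainer}


def train(config, rows, labels):
    """Look up the backend named in the config and fit it."""
    trainer_cls = TRAINERS[config["trainer"]]
    return trainer_cls().fit(rows, labels)


model = train({"trainer": "mean_baseline"}, [[1.0], [2.0]], [0.0, 1.0])
print(model.predict([[3.0], [4.0]]))
```

Under this design, swapping the trainer package is a config change (`"trainer": ...`) and not a code change, which is one plausible reading of the slide's "future: other packages".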
28.
Linchpin - Easy Feature Definition
Declarative language for using common feature extraction logic.
● Single implementation for both serving & training.
● Heavily optimized.
[Diagram: Interest Match and Annotation Match each reuse the generic "Match" implementation]
pin <- source(TAG="pin", OUTPUTS="p", TYPE="PinJoinRawData")
user <- source(TAG="user", OUTPUTS="u", TYPE="UserJoinRawData")
cat_match <- match(INPUTS=[user.u.categoryVec, pin.p.categoryVec],
MATCH_TYPE="COSINE_SIM")
topic_match <- match(INPUTS=[user.u.topicVec, pin.p.topicVec], ...)
features <- union(INPUTS=[cat_match, topic_match, ...])
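To make the DSL snippet above concrete, here is a minimal Python sketch of what such a pipeline might compute. It does not show how Linchpin is implemented. `cosine_sim` and `extract_features` are hypothetical names, the input records are toy data, and using cosine similarity for `topic_match` is an assumption, because the slide elides that match type.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extract_features(user, pin):
    """Mirror the DSL: two match features, then a union into one feature record."""
    cat_match = {"cat_match": cosine_sim(user["categoryVec"], pin["categoryVec"])}
    # Assumption: the slide elides MATCH_TYPE for topic_match; cosine is used here.
    topic_match = {"topic_match": cosine_sim(user["topicVec"], pin["topicVec"])}
    return {**cat_match, **topic_match}


# Toy inputs standing in for the UserJoinRawData and PinJoinRawData sources.
user = {"categoryVec": [1.0, 0.0], "topicVec": [1.0, 1.0]}
pin = {"categoryVec": [0.0, 1.0], "topicVec": [2.0, 2.0]}

features = extract_features(user, pin)
print(features)
```

Calling the same `extract_features` function from both the training pipeline and the serving path is one simple way to get the "single implementation for both serving & training" property the slide mentions.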
29.
Muse
[Diagram of the Muse architecture. Components shown: Root, Query understanding, Planner, Mixer, Merger, Reranker, Cache (x2), Leaf (x3), Feature log, models, model training pipeline, Corpus, Indexing pipeline, Searchable doc, index builder, index, Fresh corpus, streaming pipeline, index builder (fresh), fresh index, Fresh index dispatcher, Perdoc data dispatcher]
30.
Pixie: Graph walks
● The greatest asset of Pinterest is our pin-to-board graph
○ It captures relationships between pins (how objects are organized into collections)
○ Can be used to capture multiple different interactions: pins to boards, clicks by user,...
● We use Pixie for candidate generation: How to quickly go from 2B pins to 1k
pins so that ML models can then score each pin separately
● Represent a user as a (set of) pin(s) Q and do a random walk from Q:
○ Bias the walk towards fresh pins, Pins in the local user’s language, Pins that males/females like
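The candidate-generation idea on this slide can be sketched as a biased random walk over a toy pin-to-board graph. This simplified illustration does not reproduce Pixie's production algorithm. `pixie_style_walk`, its parameters, the restart step and the toy graph are all assumptions made for the example. The walk hops pin -> board -> pin, counts visits, and returns the most visited pins as candidates. `pin_weight` is where a bias (for example, towards fresh pins) would plug in.

```python
import random
from collections import Counter


def pixie_style_walk(pin_to_boards, board_to_pins, query_pins, steps, top_k,
                     pin_weight=None, restart_prob=0.5, seed=0):
    """Return up to top_k pins most visited by a random walk started from query_pins.

    pin_weight maps a pin to a positive number; larger values make the walk
    more likely to step onto that pin (the "bias" on the slide).
    """
    rng = random.Random(seed)
    weight = pin_weight or (lambda pin: 1.0)
    visits = Counter()
    current = rng.choice(query_pins)
    for _ in range(steps):
        # Assumption: restart at a query pin now and then to stay near Q.
        if rng.random() < restart_prob:
            current = rng.choice(query_pins)
        boards = pin_to_boards.get(current)
        if not boards:  # dead end: jump back to the query set
            current = rng.choice(query_pins)
            continue
        board = rng.choice(boards)
        candidates = board_to_pins[board]
        # Biased step: sample the next pin in proportion to its weight.
        current = rng.choices(candidates, weights=[weight(p) for p in candidates], k=1)[0]
        visits[current] += 1
    for q in query_pins:  # the query pins themselves are not candidates
        visits.pop(q, None)
    return [pin for pin, _ in visits.most_common(top_k)]


# Toy graph: pin "q" is the query; boards b1 and b2 link it to other pins.
PIN_TO_BOARDS = {"q": ["b1"], "a": ["b1"], "b": ["b1", "b2"], "c": ["b2"]}
BOARD_TO_PINS = {"b1": ["q", "a", "b"], "b2": ["b", "c"]}

print(pixie_style_walk(PIN_TO_BOARDS, BOARD_TO_PINS, ["q"], steps=2000, top_k=2))
```

The output is a short candidate list that a separate ML model can then score pin by pin, matching the slide's "from 2B pins to 1k pins" framing at toy scale.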
33.
Big Data Platform
● [Product Enablement] Streaming engines
○ Spark Structured Streaming
○ Flink
○ … ...
● [Scalability] Spinner - next gen workflow engine
● [Performance] Hive on Tez
● [Efficiency] Hadoop auto-scaling
● [Future Proofing] Spark on Kubernetes
● [Future Proofing] Hadoop 3.0
34.
Galaxy: next-gen feature platform
Metadata-driven framework & dev API
* incremental dataflow execution engine
* signal data store (“column”-partitioned) and metadata repo (registry, stats)
* dependency management
* governance: enforcement & tracking
Signal Access & Serving: retrieval API, serving, acl, ...
● offline consumers (ML model training)
● online consumers (ML model serving)
[Diagram: three developer-contributed modules, each labeled with code, spec and metadata, feed into Galaxy; ML Platform and BDP are shown as the adjacent layers]
35.
ML Platform
[Diagram: ML platform components, with a team key: ML Serving Systems, ML Training Platform]
● Developer Frontend
● Learner; off-the-shelf solutions: Tensorflow ...
● Scorpion Training; Scorpion Serving
● Incremental & Real-Time Training Automation
● Parameter Autotuning; Model Eval & Comparison; Feature Analysis; Data Monitoring
● Model Deploy; Model Version Management; Model Runtime Validation
● Model Serving; Logging
● Linchpin DSL; Feature Extraction; Real-time Feature Sources; Counting Service
37.
Key Learnings
● Unified big data ML platform greatly accelerates
product innovations
● Data lineage, quality and democracy are vital to
organization scalability
● Speed, quality & delightful-to-use