Big Data ML Platform at Pinterest
Yongsheng Wu
Pinterest: pinterest.com/yswu
LinkedIn: linkedin.com/in/yongshengwu
Twitter: @yswu
06/17/2019
Pinterest: The World’s Catalog of Ideas
Mission
Help people discover and do
what they love.
Scale@Pinterest
Service Scale
• 300M+ MAUs
• 120B+ Pins
• 3B+ Boards
Big Data Scale
• 300+ PB on S3
• 6000+ Hive/Hadoop nodes
• 400+ Presto nodes
• 1000+ Spark nodes
Mission & Vision
Principles
Current Status
Key Technologies
Future Plan
Mission
Provide a highly scalable, reliable, secure, performant, efficient and
delightful-to-use big data and machine learning platform to enable rapid
product innovation and help make Pinterest a thriving business.
Vision
A big data and machine learning platform at scale enables every single
engineer at Pinterest to derive trustworthy, actionable insights and
apply ML to solve complex problems with ease and confidence.
Principles
● Put engineers first - make the platform delightful-to-use for all
engineers at Pinterest
● Keep it simple, get it right - build a simple yet sufficient
platform
● Enable speed and quality - enable all engineers at Pinterest to
move fast with scalable, reliable, secure, performant and efficient
solutions made easy by the platform
● Build with reusability and for reusability - embrace open
source technology, build with lego blocks and provide lego blocks to
all engineers at Pinterest
Current Status
Big Data Platform
Platform stack (diagram): Big Data Platform, Feature Platform, ML Platform
Feature Platform
Pinterest’s data graph: Pin/Image/Board/User...
xJoin signals (diagram labels): pin’s text, image info, video info, texts, text languages, text scores, SEO signal, link language, link country, link perf, link scores, safe search, spam, visual signal, catvec_v0, pin’s catvec_v0, catvec_v1, pin’s catvec_v1, topicvec_v4, pin’s topicvec_v4, country vecs, text tokens, landing page, annot_embedding v3, annotation_v2, annotation_v3, annotation_v4
Feature Platform - Today
Signal Access & Serving: retrieval API, serving, acl, ...
● offline consumers (ML model training)
● online consumers (ML model serving)
Diagram: three developers, each with a code module, its spec and its metadata
Galaxy: next-gen feature platform
* incremental dataflow execution engine
* signal data store (“column”-partitioned) and metadata repo (registry, stats)
* dependency management
* governance: enforcement & tracking
Metadata-driven framework & dev API
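The slides do not show how Galaxy handles dependency management or incremental execution. As a rough illustration only, the sketch below orders signals so that each one is computed after its inputs, and finds which signals need recomputing when one changes. The signal names are borrowed from the data-graph slide, but the dependencies between them, `SIGNAL_DEPS`, `execution_order` and `affected_by` are all assumptions made for this example, not Galaxy's API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical signal specs: signal name -> signals it is derived from.
# The edges are invented for illustration; only the names come from the slides.
SIGNAL_DEPS = {
    "text_tokens": set(),
    "text_languages": {"text_tokens"},
    "annotation_v4": {"text_tokens", "text_languages"},
    "topicvec_v4": {"annotation_v4"},
}


def execution_order(deps):
    """Return signals in an order where every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


def affected_by(changed, deps):
    """Return the changed signal plus every signal downstream of it.

    An incremental engine would only need to recompute this subset.
    """
    dirty = {changed}
    for name in execution_order(deps):
        if deps.get(name, set()) & dirty:
            dirty.add(name)
    return dirty
```

With these made-up dependencies, a change to `text_languages` marks `annotation_v4` and `topicvec_v4` for recomputation and leaves `text_tokens` alone.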
ML Platform
Platform stack (diagram): Big Data Platform, Feature Platform, ML Platform
Response prediction ML
Diagram labels: Serving; Training; Profiles (Users, Pins, Boards); Logs; events; content
Visual ML
Response Prediction Use Cases at Pinterest
● Discovery
○ Home Feed: time-ordered following feed to ML based recommendation feed
○ Related Pins, Search: heuristic to ML ranking
● Ads
○ gCTR, CPI, CVR
● Growth
○ Notifications, NUX topics
● Content
○ Content comprehension
● Shopping
○ CTR prediction
● Protect
○ Spam & Porn, ATO
● … ...
Response prediction ML at Pinterest
Surfaces
● 2014: Home feed ranking; Ads ranking
● 2015: Related Pins ranking
● 2016: Search ranking; Notifications ranking
● 2017: Spam detection
● 2018: NUX topics; Ads retrieval
Scale
● Start of the timeline: < 10 serving hosts; training on laptop
● End of the timeline: 2500+ serving hosts; training on clusters
Components of an ML system (figure from the paper cited below): Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, ML Code, Analytics Tools, Machine Resource Management, Serving Infrastructure, Monitoring & Alerting
Hidden Technical Debt in Machine Learning Systems
David Sculley et al., Google, NIPS 2015
Much more complex in practice
Three separate stacks, one each for Related Pins, Ads and Home Feed:
● Stack 1: Learner 1, Parameter Autotuning, Serving & Logging Automation, Feature Extraction 1
● Stack 2: Learner 2, Data Monitoring, Serving & Logging Automation, Feature Extraction 2
● Stack 3: Learner 3, Data Monitoring, Serving & Logging Automation, Feature Extraction 3
● Distributed Training appears in two of the stacks
Similar components, no sharing!
Incomplete stacks
Unified ML Platform
Shared components: Learner, Parameter Autotuning, Serving & Logging Automation, Feature Extraction, Data Monitoring, Distributed Training
Existing use cases: Related Pins, Ads, Home Feed
New use cases: Search, NUX Topic Picker, Notifications
● Client teams focus on business problems, not infra problems.
● Platform team specializes in infra problems.
● Quick to build new ML applications.
Unified Big Data ML Platform
● Speed & quality
● Single Use Case
○ 0 -> 1 made fast, easy and robust - create an ML model to solve a complex problem
○ 1 -> N made automated - that ML model is continuously trained, improved, and deployed
● Many Use Cases on the Platform
○ N -> N² - most ML models are trained and served by the platform
Key Technologies
Scorpion Training & Catwalk
Scorpion Training
● Abstracts user from specific trainer package used.
● Packages shown: TensorFlow, XGBoost; future: other packages
● Runs on Catwalk
Catwalk: enables running training jobs on a distributed cluster
● Mesos: cluster resource management (CPUs, RAM, GPUs)
● Kubernetes: to replace Mesos in 2018
Diagram labels: Mesos Master; Mesos Agents; TFMesos; TFMesosServer (Param Server); TFMesosServer (Worker) x2; Update gradients; Chronos/Aurora; PinBall; Caffe GPU; SciPy; MXNet; Keras; Caffe; TensorFlow; Torch; Legend
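The slides say Scorpion Training abstracts the user from the specific trainer package but do not show its interface. One common way to get that effect is a small shared interface plus a registry keyed by a config value, sketched below. `Trainer`, `MeanTrainer`, `TRAINERS` and `train` are hypothetical names for this illustration, and the toy trainer stands in for a real package such as TensorFlow or XGBoost.

```python
from abc import ABC, abstractmethod


class Trainer(ABC):
    """Hypothetical common interface that hides the trainer package."""

    @abstractmethod
    def fit(self, rows, labels):
        """Train on feature rows and labels, then return self."""

    @abstractmethod
    def predict(self, row):
        """Return a score for one feature row."""


class MeanTrainer(Trainer):
    """Toy stand-in for a real package: always predicts the mean label."""

    def fit(self, rows, labels):
        self.mean = sum(labels) / len(labels)
        return self

    def predict(self, row):
        return self.mean


# Registry mapping a config value to a trainer class (assumed design).
TRAINERS = {"mean": MeanTrainer}


def train(config, rows, labels):
    """Pick the trainer named in the config and fit it."""
    trainer_cls = TRAINERS[config["trainer"]]
    return trainer_cls().fit(rows, labels)
```

Under this design, switching packages is a one-line config change (`{"trainer": "mean"}` to some other registered key) and the calling code stays the same.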
Scorpion Serving
Linchpin - Easy Feature Definition
Declarative language for using common feature extraction logic.
● Single implementation for both serving & training.
● Heavily optimized.
● Interest Match and Annotation Match reuse the generic "Match" implementation.
pin <- source(TAG="pin", OUTPUTS="p", TYPE="PinJoinRawData")
user <- source(TAG="user", OUTPUTS="u", TYPE="UserJoinRawData")
cat_match <- match(INPUTS=[user.u.categoryVec, pin.p.categoryVec],
MATCH_TYPE="COSINE_SIM")
topic_match <- match(INPUTS=[user.u.topicVec, pin.p.topicVec], ...)
features <- union(INPUTS=[cat_match, topic_match, ...])
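To make the Linchpin snippet above concrete, here is a rough Python equivalent of what its `match(..., MATCH_TYPE="COSINE_SIM")` and `union` steps appear to compute. This is a reading of the DSL, not Linchpin's implementation: the function names, the dict-based user and pin records, and the zero-vector fallback are assumptions.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(user, pin, field, match_type="COSINE_SIM"):
    """Generic match: compare the same vector field on the user and the pin."""
    if match_type != "COSINE_SIM":
        raise ValueError(f"unsupported match type: {match_type}")
    return cosine_sim(user[field], pin[field])


def extract_features(user, pin):
    """Union of the individual match features, keyed by feature name."""
    return {
        "cat_match": match(user, pin, "categoryVec"),
        "topic_match": match(user, pin, "topicVec"),
    }
```

Because the same `extract_features` function can be called from both a training job and a serving path, the two cannot drift apart, which is the point of the "single implementation" bullet.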
Muse
Diagram labels: Corpus; Root; Query understanding; Leaf x3; Searchable doc; index builder; index; Indexing pipeline; model training pipeline; models; Cache; Mixer; Reranker; Feature log; Merger; corpus; Fresh corpus; streaming pipeline; fresh index; Fresh index dispatcher; Perdoc data dispatcher; Planner
Pixie: Graph walks
● The greatest asset of Pinterest is our pin-to-board graph
○ It captures relationships between pins (how objects are organized into collections)
○ Can be used to capture multiple different interactions: pins to boards, clicks by user, ...
● We use Pixie for candidate generation: how to quickly go from 2B pins to 1k pins so that ML models can then score each pin separately
● Represent the user as a (set of) pin(s) Q and do a random walk from Q:
○ Bias the walk towards fresh pins, pins in the local user’s language, pins that males/females like
Pixie Architecture Diagram
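The random walk described above can be sketched as a walk with restarts on a pin-to-board bipartite graph, counting how often each pin is visited and keeping the most visited ones as candidates. This is a minimal, unbiased version under stated assumptions: the adjacency dicts, the restart probability, the step count and the function name are all hypothetical, and the biasing towards fresh or language-matched pins mentioned in the slide is left out.

```python
import random
from collections import Counter


def random_walk_candidates(pin_to_boards, board_to_pins, query_pins,
                           steps=10000, restart_prob=0.5, top_k=3, seed=0):
    """Return the top_k most-visited pins on walks started from query_pins."""
    rng = random.Random(seed)
    query_pins = list(query_pins)
    query_set = set(query_pins)
    visits = Counter()
    current = rng.choice(query_pins)
    for _ in range(steps):
        # Occasionally jump back to a query pin so the walk stays local.
        if rng.random() < restart_prob:
            current = rng.choice(query_pins)
        boards = pin_to_boards.get(current)
        if not boards:
            # Dead end: restart from a query pin.
            current = rng.choice(query_pins)
            continue
        # One hop pin -> board, one hop board -> pin.
        board = rng.choice(boards)
        current = rng.choice(board_to_pins[board])
        if current not in query_set:
            visits[current] += 1
    return [pin for pin, _ in visits.most_common(top_k)]
```

The returned pins would then be handed to the ML ranking models for scoring, as the candidate-generation bullet describes.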
Future Plan
Big Data Platform
● [Product Enablement] Streaming engines
○ Spark Structured Streaming
○ Flink
○ … ...
● [Scalability] Spinner - next gen workflow engine
● [Performance] Hive on Tez
● [Efficiency] Hadoop auto-scaling
● [Future Proofing] Spark on Kubernetes
● [Future Proofing] Hadoop 3.0
Signal Access & Serving: retrieval API, serving, acl, ...
● offline consumers (ML model training)
● online consumers (ML model serving)
Diagram: three developers, each with a code module, its spec and its metadata
Galaxy: next-gen feature platform
* incremental dataflow execution engine
* signal data store (“column”-partitioned) and metadata repo (registry, stats)
* dependency management
* governance: enforcement & tracking
Metadata-driven framework & dev API
ML Platform
Diagram labels: Developer Frontend; Learner; Model Eval & Comparison; Data Monitoring; Feature Analysis; Parameter Autotuning; Model Serving; Logging; off-the-shelf solutions: Tensorflow ...; Scorpion Serving; Scorpion Training; Incremental & Real-Time Training Automation; Model Deploy; Linchpin DSL; Model Version Management; Feature Extraction; Real-time Feature Sources; Counting Service; Model Runtime Validation
Team key: ML Serving Systems, ML Training Platform
Key Learnings
● Unified big data ML platform greatly accelerates
product innovations
● Data lineage, quality and democracy are vital to
organization scalability
● Speed, quality & delightful-to-use
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 

Recently uploaded

KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
The Third Creative Media
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
ShulagnaSarkar2
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
gapen1
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
ToXSL Technologies
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
kalichargn70th171
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
Kubernetes at Scale: Going Multi-Cluster with Istio
Kubernetes at Scale:  Going Multi-Cluster  with IstioKubernetes at Scale:  Going Multi-Cluster  with Istio
Kubernetes at Scale: Going Multi-Cluster with Istio
Severalnines
 
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSISDECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
Tier1 app
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfTop Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
VALiNTRY360
 
ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.
Maitrey Patel
 
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
kalichargn70th171
 

Recently uploaded (20)

KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
Kubernetes at Scale: Going Multi-Cluster with Istio
Kubernetes at Scale:  Going Multi-Cluster  with IstioKubernetes at Scale:  Going Multi-Cluster  with Istio
Kubernetes at Scale: Going Multi-Cluster with Istio
 
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSISDECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfTop Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
 
ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.
 
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
 

Pinterest - Big Data Machine Learning Platform at Pinterest

  • 1. Big Data ML Platform at Pinterest Yongsheng Wu Pinterest: pinterest.com/yswu LinkedIn: linkedin.com/in/yongshengwu Twitter: @yswu 06/17/2019
  • 2. Pinterest : The World’s Catalog of Ideas
  • 3. Mission Help people discover and do what they love.
  • 4. Scale@Pinterest Service Scale • 300M+ MAUs • 120B+ Pins • 3B+ Boards Big Data Scale • 300+ PB on S3 • 6000+ Hive/Hadoop nodes • 400+ Presto nodes • 1000+ Spark nodes
  • 5. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 6. Mission Provide a highly scalable, reliable, secure, performant, efficient and delightful-to-use big data and machine learning platform to enable rapid product innovation and help make Pinterest a thriving business. Vision A big data and machine learning platform at scale enables every single engineer at Pinterest to derive trustworthy, actionable insights and apply ML to solve complex problems with ease and confidence.
  • 7. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 8. Principles ● Put engineers first - make the platform delightful-to-use for all engineers at Pinterest ● Keep it simple, get it right - build a simple yet sufficient platform ● Enable speed and quality - enable all engineers at Pinterest to move fast with scalable, reliable, secure, performant and efficient solutions made easy by the platform ● Build with reusability and for reusability - embrace open source technology, build with lego blocks and provide lego blocks to all engineers at Pinterest
  • 9. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 10. Big Data Platform (platform stack: Big Data Platform, Feature Platform, ML Platform)
  • 12. Feature Platform (platform stack: Big Data Platform, Feature Platform, ML Platform)
  • 13. Pinterest’s data graph: Pin/Image/Board/User... xJoin pin’s text image info video info texts text languages text scores SEO signal link language link country link perf link scores safe search spam visual signal catvec_v0 pin’s catvec_v0 catvec_v1 pin’s catvec_v1 topicvec_v4 pin’s topicvec_v4 country vecs text tokens landing page annot_embedding v3 annotation_v2 annotation_v3 annotation_v4 Feature Platform - Today
  • 14. Galaxy: next-gen feature platform * incremental dataflow execution engine * signal data store (“column”-partitioned) and metadata repo (registry, stats) * dependency management * governance: enforcement & tracking. Metadata-driven framework & dev API (developer, code module, spec metadata). Signal Access & Serving: retrieval API, serving, acl, ...; offline consumers (ML model training); online consumers (ML model serving). ML Platform; BDP
  • 15. ML Platform (platform stack: Big Data Platform, Feature Platform, ML Platform)
  • 16. Response prediction ML Serving Training Profiles Users, Pins, Boards Logs events content
  • 18. Response Prediction Use Cases at Pinterest ● Discovery ○ Home Feed: time-ordered following feed to ML based recommendation feed ○ Related Pins, Search: heuristic to ML ranking ● Ads ○ gCTR, CPI, CVR ● Growth ○ Notifications, NUX topics ● Content ○ Content comprehension ● Shopping ○ CTR prediction ● Protect ○ Spam & Porn, ATO ● …
  • 19. Response prediction ML at Pinterest. Surfaces: 2014: Home feed ranking; Ads ranking. 2015: Related Pins ranking. 2016: Search ranking; Notifications ranking. 2017: Spam detection. 2018: NUX topics; Ads retrieval. Scale: from < 10 serving hosts and training on a laptop to 2500+ serving hosts and training on clusters.
  • 20. Configuration Data Verification Feature Extraction Process Management Tools Data Collection ML Code Analytics Tools Machine Resource Management Serving Infrastructure Monitoring & Alerting Hidden Technical Debt in Machine Learning Systems David Sculley et al., Google, NIPS 2015
  • 21. Much more complex in practice Learner 1 Parameter Autotuning Serving & Logging Automation Feature Extraction 1 Related Pins Ads Home Feed Learner 2 Data Monitoring Serving & Logging Automation Feature Extraction 2 Learner 3 Data Monitoring Serving & Logging Automation Feature Extraction 3 Distributed Training Distributed Training Similar components, no sharing! Incomplete stacks
  • 22. Unified ML Platform Learner Parameter Autotuning Serving & Logging Automation Feature Extraction Related Pins Ads Home Feed Data Monitoring Distributed Training Client teams focus on business problems, not infra problems. Search NUX Topic Picker Notifications New use cases Platform team specializes in infra problems. Quick to build new ML applications.
  • 23. Unified Big Data ML Platform ● Speed & quality ● Single Use Case ○ 0 -> 1 made fast, easy and robust - create an ML model to solve a complex problem ○ 1 -> N made automated - such an ML model is continuously trained, improved, and deployed ● Many Use Cases on the Platform ○ N -> N² - most ML models trained and served by the platform
  • 24. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 25. Scorpion Training & Catwalk. Catwalk: enables running training jobs on a distributed cluster. Tensorflow, XGBoost; future: other packages. Mesos: cluster resource management (CPUs, RAM, GPUs). Kubernetes: to replace Mesos in 2018. Scorpion Training: abstracts the user from the specific trainer package used. Diagram label: runs on
  • 28. Linchpin - Easy Feature Definition. Declarative language for using common feature extraction logic. ● Single implementation for both serving & training. ● Heavily optimized. Interest Match and Annotation Match reuse the generic "Match" implementation.
        pin <- source(TAG="pin", OUTPUTS="p", TYPE="PinJoinRawData")
        user <- source(TAG="user", OUTPUTS="u", TYPE="UserJoinRawData")
        cat_match <- match(INPUTS=[user.u.categoryVec, pin.p.categoryVec], MATCH_TYPE="COSINE_SIM")
        topic_match <- match(INPUTS=[user.u.topicVec, pin.p.topicVec], ...)
        features <- union(INPUTS=[cat_match, topic_match, ...])
  • 29. Confidential Corpus Root Query understanding Leaf Leaf Leaf Searchable doc index builder index Indexing pipeline model training pipeline models Cache Mixer Cache Reranker Feature log Merger corpus Fresh corpus streaming pipeline index builder fresh index Fresh index dispatcher Perdoc data dispatcher Searchable doc Planner Muse
  • 30. Pixie: Graph walks ● The greatest asset of Pinterest is our pin-to-board graph ○ It captures relationships between pins (how objects are organized into collections) ○ Can be used to capture multiple different interactions: pins to boards, clicks by user, ... ● We use Pixie for candidate generation: how to quickly go from 2B pins to 1k pins so that ML models can then score each pin separately ● Represent the user as a (set of) pin(s) Q and do a random walk from Q: ○ Bias the walk towards fresh pins, pins in the local user’s language, pins that males/females like
  • 32. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 33. Big Data Platform ● [Product Enablement] Streaming engines ○ Spark Structured Streaming ○ Flink ○ … ● [Scalability] Spinner - next gen workflow engine ● [Performance] Hive on Tez ● [Efficiency] Hadoop auto-scaling ● [Future Proofing] Spark on Kubernetes ● [Future Proofing] Hadoop 3.0
  • 34. Galaxy: next-gen feature platform * incremental dataflow execution engine * signal data store (“column”-partitioned) and metadata repo (registry, stats) * dependency management * governance: enforcement & tracking. Metadata-driven framework & dev API (developer, code module, spec metadata). Signal Access & Serving: retrieval API, serving, acl, ...; offline consumers (ML model training); online consumers (ML model serving). ML Platform; BDP
  • 35. ML Platform Learner Model Eval & Comparison Data Monitoring Feature Analysis Parameter Autotuning Model Serving Logging Developer Frontend off-the-shelf solutions: Tensorflow ... Scorpion Serving Scorpion Training Incremental & Real-Time Training Automation Model Deploy Linchpin DSL Model Version Management Feature Extraction Real-time Feature Sources Counting Service ML Serving Systems ML Training Platform Team key: Model Runtime Validation
  • 36. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 37. Key Learnings ● Unified big data ML platform greatly accelerates product innovations ● Data lineage, quality and democracy are vital to organization scalability ● Speed, quality & delightful-to-use
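
The Linchpin definition on slide 28 declares a `match` between user and pin vectors with `MATCH_TYPE="COSINE_SIM"`. As a rough illustration only (this is not Linchpin itself, whose implementation the slides do not show), the same computation can be sketched in plain Python. The function names `cosine_sim` and `extract_features` and the dictionary layout are hypothetical, and using cosine similarity for `topic_match` is an assumption, since the slide elides that match type.

```python
import math
from typing import Dict, Sequence


def cosine_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def extract_features(user: Dict[str, Sequence[float]],
                     pin: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical stand-in for the union of the two match features.

    Assumes cosine similarity for both matches; the slide only states it
    for cat_match.
    """
    return {
        "cat_match": cosine_sim(user["categoryVec"], pin["categoryVec"]),
        "topic_match": cosine_sim(user["topicVec"], pin["topicVec"]),
    }
```

Calling the same `extract_features` function from both the training pipeline and the serving path is one simple way to get the "single implementation for both serving & training" property the slide describes.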
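
The Pixie walk on slide 30 (start from the user's query pins Q, walk the pin-to-board graph at random, and keep the most-visited pins as candidates) can be sketched as below. This is a minimal sketch under stated assumptions, not Pinterest's implementation: the function name `pixie_walk`, the restart probability, the step count, and the `pin_weight` hook used to bias the walk are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def pixie_walk(pin_to_boards, board_to_pins, query_pins,
               steps=10_000, restart_prob=0.5, top_k=10,
               pin_weight=None, seed=0):
    """Random walk with restart on a bipartite pin/board graph.

    pin_to_boards / board_to_pins: dicts mapping an id to a list of neighbours.
    pin_weight: optional function pin -> non-negative weight, used to bias the
    walk (for example towards fresh pins). All parameters are illustrative.
    """
    rng = random.Random(seed)
    starts = [p for p in query_pins if pin_to_boards.get(p)]
    if not starts:
        return []
    visits = Counter()
    current = rng.choice(starts)
    for _ in range(steps):
        boards = pin_to_boards.get(current)
        # Restart at a query pin with some probability, or when stuck.
        if not boards or rng.random() < restart_prob:
            current = rng.choice(starts)
            boards = pin_to_boards[current]
        board = rng.choice(boards)            # step pin -> board
        candidates = board_to_pins.get(board, [])
        if not candidates:
            continue
        if pin_weight is None:
            current = rng.choice(candidates)  # step board -> pin, uniform
        else:
            weights = [pin_weight(p) for p in candidates]
            if sum(weights) > 0:
                current = rng.choices(candidates, weights=weights, k=1)[0]
            else:
                current = rng.choice(candidates)
        visits[current] += 1
    # The query pins themselves are not useful as candidates.
    for q in query_pins:
        visits.pop(q, None)
    return [pin for pin, _ in visits.most_common(top_k)]
```

The returned list is the small candidate set that downstream ML models would then score one pin at a time. Passing a `pin_weight` function is one way to express the biases the slide mentions, such as preferring fresh pins or pins in the user's language.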