Benoit Rostykus
Machine Learning Researcher
Oct. 10, 2017 - ML Platform Meetup
Scope
Lines of code*, one figure per library compared (library labels not recoverable): 1,888k / 252k / 2,322k / 110k / 6k
*: git ls-files | xargs cat | wc -l
● 0.05 dev (I spend 5% of my time on it)
● offers a minimal DAG with backprop for feed-forward nets
● sparse data as first class citizen
● arbitrary loss function
● extremely fast on CPU
○ 0 memory allocation
○ lock-free inter-core parallelism
○ LLVM intrinsics for dense ops SIMD vectorization
Performance
● Currently in A/B test, as one of the many sub-algorithms used to construct Netflix homepage recommendations
● Training set
○ 33M rows / ~510 nonzeros per row / total dimensionality 7.3k / sparsity = 7%
○ 8 bytes per entry = (index, value) = (uint, float)
○ 16.8B entries, 125GB total
4.2 sec per SGD pass (proximal AdaGrad) over 16 cores (r4.8xlarge EC2 instance)
● 1.9GB / 491k rows / 250M entries, per sec, per core!
● 33 GB / sec, 1 GFLOPS / core
● 75% of memory bandwidth (DDR4 SDRAM r4.8xlarge max read throughput is ~44GB/s)
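A quick cross-check of the figures on this slide, as a sketch in Python rather than anything taken from vectorflow itself. The inputs are the numbers quoted above; the function names are made up for illustration, and reading "GB" as 2^30 bytes is an assumption (it is the reading under which 16.8B entries at 8 bytes each come to about 125).

```python
# Cross-check of the slide's figures. Inputs are the numbers quoted on the slide.
ROWS = 33_000_000          # rows in the training set
NNZ_PER_ROW = 510          # approximate nonzeros per row
DIM = 7_300                # total dimensionality
BYTES_PER_ENTRY = 8        # (uint index, float value)
PASS_SECONDS = 4.2         # one SGD pass
CORES = 16
GIB = 2 ** 30              # assumption: "GB" on the slide means 2^30 bytes


def dataset_stats():
    """Density, entry count and in-memory size of the training set."""
    entries = ROWS * NNZ_PER_ROW
    total_bytes = entries * BYTES_PER_ENTRY
    return {
        "density": NNZ_PER_ROW / DIM,
        "entries": entries,
        "gib": total_bytes / GIB,
    }


def per_core_rates():
    """Rows, entries and GiB processed per second by each core."""
    stats = dataset_stats()
    core_seconds = PASS_SECONDS * CORES
    return {
        "rows_per_sec": ROWS / core_seconds,
        "entries_per_sec": stats["entries"] / core_seconds,
        "gib_per_sec": stats["gib"] / core_seconds,
    }


if __name__ == "__main__":
    for name, value in {**dataset_stats(), **per_core_rates()}.items():
        print(f"{name}: {value:,.3f}")
```

Per core this gives roughly 491k rows, 250M entries and 1.9GB per second, in line with the slide. The aggregate rate it implies (about 30 to 32 GB/sec depending on the unit) is slightly below the quoted 33 GB/sec, which may simply reflect rounding of the 4.2 sec figure.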
Real-world job: sparse logistic regression with positivity constraint on weights
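To make the real-world job concrete, here is a minimal sketch of sparse logistic regression trained with an AdaGrad step followed by a projection onto non-negative weights. It is not vectorflow code: the function names, learning rate and data layout are assumptions. Treating the proximal step as a coordinate-wise clip at zero is the usual form for a diagonal AdaGrad metric with a positivity constraint, not something the slides spell out.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_nonneg_logreg(rows, dim, lr=0.5, eps=1e-8, passes=20):
    """rows: list of (label in {0, 1}, [(index, value), ...])."""
    w = [0.0] * dim          # weights, constrained to stay >= 0
    g2 = [0.0] * dim         # AdaGrad accumulator of squared gradients
    for _ in range(passes):
        for label, feats in rows:
            z = sum(w[i] * v for i, v in feats)
            err = sigmoid(z) - label      # d(log-loss)/dz
            for i, v in feats:            # touch only the nonzero coordinates
                g = err * v
                g2[i] += g * g
                step = lr / (math.sqrt(g2[i]) + eps)
                # Proximal step for the constraint w >= 0: clip at zero.
                w[i] = max(0.0, w[i] - step * g)
    return w
```

A feature that only ever appears with label 0 would want a negative weight, so under the constraint it stays pinned at exactly zero, while features associated with label 1 grow as usual.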
Trade-offs
“All non-trivial abstractions, to some degree, are leaky.” - Joel Spolsky
genericity vs. performance
● tensorflow/core/kernels
○ adjust_hue_op.cc
○ sparse_xent_op.cc
○ word2vec_ops.cc
REGISTER_OP("Skipgram")
    .Deprecated(19,
                "Moving word2vec into tensorflow_models/tutorials and "
                "deprecating its ops here as a result")
● RNN unrolling
Design choice: D
● Fact 1: Python is awesome but slow. Fact 2: scientists can’t code in C++.
○ Mainstream solution: a Python frontend over an efficient C++ backend
○ Problem: scientists have outsourced technological leverage to C++ coders
○ Scientists might think they need a cluster of GPUs instead of a single box
○ Creates a “division of labor” which hampers innovation at interface
● vectorflow is written in D: a modern systems language
○ Python-like experience for beginners, 100x faster runtime
○ C++ done right for experienced users
○ code / compile / run / debug loop almost as fast as Python's
○ statically typed with great type-inference, best-in-class templates
○ amazing LLVM compiler LDC
○ low-level control if needed
■ compile-time evaluation, inline asm
■ manual mem management
● Single language benefits
○ you don’t have to switch language to have efficient code
○ fewer abstractions, less impedance mismatch, fewer bugs
○ faster dev time
D C++
Design choice: optimize for latency
● Most DL libraries optimize for throughput, not latency - assume memory move is cheap
○ mini-batch API
○ pass-by-copy by default, gather when sparse
■ computation is assumed to outweigh memory transport cost
● RAM -> GPU memory -> computation -> RAM
■ makes sense for compute heavy, dense problems
● images: convolutions are expensive
● Instead, vectorflow optimizes for low latency - assumes memory move is expensive
○ row-based API : fast query time
○ everything is pre-allocated when the graph is built
○ no memory allocation/copy during forward-prop or backward-prop (RAM is slow)
○ great for low latency problems / sparse or shallow nets: real-time bidding, trading etc.
● Most DL libraries are optimized for: deep => compute bound => GPU
● vectorflow is optimized for: shallow => IO bound => CPU
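The pre-allocation and row-based API described on this slide can be illustrated with a small sketch. The class and method names are hypothetical, not vectorflow's API. Python also cannot guarantee zero allocation the way D can (floats are still boxed), so this only shows the pattern: buffers are created once when the layer is built, and each query writes into them in place.

```python
from array import array


class SparseLinear:
    """Hypothetical layer: dense weights, sparse input row, reusable output buffer."""

    def __init__(self, dim_in, dim_out):
        # All storage is allocated once, when the layer is built.
        self.dim_out = dim_out
        self.weights = array("f", [0.0] * (dim_in * dim_out))
        self.out = array("f", [0.0] * dim_out)

    def forward(self, row):
        """row: iterable of (index, value) pairs; returns the shared buffer."""
        out, w, n = self.out, self.weights, self.dim_out
        for k in range(n):
            out[k] = 0.0                  # reset in place, no new buffer
        for index, value in row:          # cost scales with nonzeros only
            base = index * n
            for k in range(n):
                out[k] += w[base + k] * value
        return out
```

Because forward returns the same buffer on every call, a caller who needs to keep a result across queries has to copy it explicitly. That is the trade-off of this design: the common path stays free of copies.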
Design choice: leveraging templates
● Data
○ Format agnostic: “bring your own data”
○ Move the code to the data, not the opposite
○ Loose requirement on schema
○ Library just expects an iterator
■ in-memory or out-of-core learning possible
○ Compile-time mapping of data fields to DAG roots to avoid runtime copy
○ Netflix internal data-adapter example: stream parquet-encoded s3-backed Hive tables
● Loss callback
○ Easily implement arbitrary loss functions
○ Compile-time specialization of learning logic based on callback signature
○ Gradient buffer reference to avoid allocation
■ Can be dense or sparse!
example: sparse auto-encoder (sparse cross-entropy)
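The two ideas on this slide, a trainer that only expects an iterator and a loss callback that writes into a gradient buffer, can be sketched as below. Everything here is illustrative: the function names and signatures are assumptions, the model is a plain linear one, and the dispatch happens at runtime, whereas the slide describes D templates specializing the logic at compile time.

```python
def train(rows, dim, loss_grad, lr=0.1, passes=1):
    """rows: any iterable of (features, target); features = [(index, value), ...].

    loss_grad(pred, target, grad) must write d(loss)/d(pred) into grad[0].
    With passes > 1, rows must be re-iterable (e.g. a list, not a generator).
    """
    w = [0.0] * dim
    grad = [0.0]                      # gradient buffer, allocated once and reused
    for _ in range(passes):
        for feats, target in rows:    # the trainer only needs an iterator
            pred = sum(w[i] * v for i, v in feats)
            loss_grad(pred, target, grad)
            for i, v in feats:
                w[i] -= lr * grad[0] * v
    return w


def squared_loss(pred, target, grad):
    # Gradient of 0.5 * (pred - target)^2 with respect to pred.
    grad[0] = pred - target


def quantile_loss(tau):
    # Pinball-loss subgradient for quantile tau (quantile regression is one of
    # the use cases the deck lists).
    def callback(pred, target, grad):
        grad[0] = (1.0 - tau) if pred > target else -tau
    return callback
```

Swapping the objective is then a matter of passing a different callback, for example train(rows, dim, quantile_loss(0.9)), without touching the training loop or the data source.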
Design choice: parallelism
● Distributed learning...
○ … is hard to implement & debug
○ … trades convergence speed for lower communication cost
■ meta-algorithms such as CoCoA (Berkeley), AIDE (CMU) help
● Don’t distribute over multiple machines unless you need it
● Intra-core parallelism: SIMD for all dense ops
● Inter-core parallelism: Hogwild! - asynchronous SGD
○ Data parallelism: each core iterates over a data chunk
○ Lock-free strategy, pretends each core is alone - race conditions will happen
○ Avoids the need for a meta-algorithm
○ Works great as long as read/write patterns are sparse enough
■ More likely to be true in the sparse bottom layer
○ Works surprisingly well on dense problems too
○ Free: only cost is CPU cache line thrashing
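The Hogwild! pattern above can be sketched in a few lines: every worker walks its own chunk of the data and updates one shared weight vector with no locking. This is an illustration under assumptions, not vectorflow's implementation: the function name, the squared loss and the chunking are made up, and under CPython's global interpreter lock the threads do not actually run in parallel, so it shows the update pattern rather than the speedup.

```python
import threading


def hogwild_sgd(rows, dim, n_threads=4, lr=0.1, passes=50):
    """rows: list of (features, target), features = [(index, value), ...]."""
    w = [0.0] * dim  # shared weights, updated by every thread without locks

    def worker(chunk):
        for _ in range(passes):
            for feats, target in chunk:
                pred = sum(w[i] * v for i, v in feats)
                err = pred - target          # squared-loss gradient
                for i, v in feats:
                    w[i] -= lr * err * v     # unsynchronized read-modify-write

    # Data parallelism: one contiguous chunk of rows per worker.
    size = max(1, (len(rows) + n_threads - 1) // n_threads)
    chunks = [rows[k:k + size] for k in range(0, len(rows), size)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

Updates from different workers can overwrite each other, which is the race the slide accepts. When each row touches only a few coordinates, such collisions are rare and the occasional lost update is absorbed by later steps.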
small > big
simple > complex
● Distributed as source-code, not pre-compiled library
○ Compiling arch = running arch: always optimized
■ leverages LLVM as much as possible, no handwritten-SIMD
● No third party dependencies
○ No-brainer to install, just need a D compiler
○ Works everywhere
● Small code base, easy to understand and hack
● Polar bear friendly
Some Netflix use-cases:
● Survival regression
● Quantile regression
● Binary/multiclass classification
● Causal inference
● Auto-encoder
● ...
Roadmap:
● more complex nodes and deeper sparsity support
● algebraic API (mix of pytorch / tf through operator overloading)
● RNN, more optimizers (SVRG etc.)
● keep it simple & small - not meant to be an ML kitchen sink
demo
Thank you!
links:

Netflix VectorFlow at ML Platform Meetup Oct 2017
