9. Systems: elastic, scalable. People: data scientists.
Challenge today: Path from inspiration to production
Inspiration → Prototyping → Production → Scale
Sophisticated ML is impractical:
• Hard to match algorithm to application
• Algorithms trapped in papers
Scaling is costly:
• Rewrite the algorithm from scratch
• Expensive infrastructure
Deployment: even more costly infrastructure & time:
• Build custom services & APIs
• Model quality deteriorates
A slow & expensive process.
11. ML development today
Inspiration for an intelligent application, plus data.
A top-down solution would be easiest, but it's not possible:
the application is innovative → no black-box solution available.
So we're forced to go bottom-up:
read data → extract text → create features → choose model → tune parameters.
Try again. And again.
A fine approach if it's 2013 & I'm obsessed with
"my curve is better than your curve"
(i.e., yet another solution for the same old problem),
or not primarily focused on
accelerating the creation of intelligent applications.
12. Inspiration for an intelligent application, plus data.
If in 5 years all applications are intelligent, ML needs to be done differently:
Start from relevant, high-level, sophisticated ML building blocks.
Don't waste time on boring stuff, like parameter search,
or worry about specialized ML knowledge, like SGD.
Quickly write code: combine, blend, understand, adapt, improve, optimize.
(Instead of the bottom-up grind: read data → extract text → create features →
choose model → tune parameters → try again, and again.)
Let's see how…
14. High-level ML toolkits
get started with 4 lines of code,
then modify, blend, add yours…
Recommender, image search, sentiment analysis, data matching, auto tagging,
churn predictor, object detector, product sentiment, click prediction,
fraud detection, user segmentation, data completion, anomaly detection,
document clustering, forecasting, search ranking, summarization, …
import graphlab as gl
data = gl.SFrame.read_csv('my_data.csv')
model = gl.recommender.create(data,
                              user_id='user',
                              item_id='movie',
                              target='rating')
recommendations = model.recommend(k=5)
15. Sophisticated machine learning made easy
Create intelligence accelerants:
High-level ML toolkits + AutoML (tune parameters, model selection, …)
→ so you can focus on the creative parts
Reusable features (transferable feature engineering)
→ accuracy with less data & less effort
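What the AutoML accelerant automates can be sketched as a plain grid search over models and parameters. Everything here is illustrative: `score` is a hypothetical stand-in for "train a model with this configuration and measure validation accuracy".

```python
from itertools import product

# Sketch of AutoML's parameter search + model selection: enumerate
# configurations, score each one, keep the best. score() is a toy
# stand-in for training and evaluating on held-out data.
def score(model, depth, rate):
    base = {"tree": 0.7, "linear": 0.5}[model]
    return base + 0.01 * depth - abs(rate - 0.1)  # toy objective

def automl_search(models, depths, rates):
    """Exhaustive search; returns the best (model, depth, rate) triple."""
    return max(product(models, depths, rates), key=lambda cfg: score(*cfg))

best = automl_search(["tree", "linear"], [2, 4, 8], [0.01, 0.1, 1.0])
```

Real AutoML systems replace exhaustive enumeration with smarter strategies (random or Bayesian search), but the interface is the same: configurations in, best validated model out.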
16. What makes ML hard? Very hard!
• Modeling challenge: understand & scale complex models
• Representation challenge: feature engineering
• Data challenge: need for lots of labeled data
Usually: simple models & lots of feature engineering.
Krishna's talk tomorrow @9:10am: auto feature engineering.
Next: transfer learning can provide complex models with less work & less data.
18. Image features
• Features = local detectors
o Combined to make a prediction
o (in reality, features are more low-level)
Eye + eye + nose + mouth → Face!
19. Many hand-created features exist… Computer vision features:
SIFT, Spin image, HoG, RIFT, Textons, GLOH
(Slide credit: Honglak Lee)
20. Standard image classification approach
Input → extract features (computer vision features such as SIFT, Spin image,
HoG, RIFT, Textons, GLOH; slide credit: Honglak Lee)
→ use a simple classifier (e.g., logistic regression, SVMs) → Car?
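The two-stage pipeline above can be sketched in a few lines. Both stages here are toy stand-ins: `extract_features` plays the role of a hand-crafted extractor like SIFT or HoG, and `predict_car` plays the role of the simple linear classifier on top.

```python
# Stage 1: fixed, hand-crafted features (stand-in for SIFT/HoG).
def extract_features(image):
    """Toy features: mean intensity and a crude edge count."""
    flat = [px for row in image for px in row]
    mean = sum(flat) / len(flat)
    edges = sum(1
                for row in image
                for a, b in zip(row, row[1:])
                if abs(a - b) > 50)
    return [mean, edges]

# Stage 2: simple linear classifier (stand-in for logistic regression / SVM).
def predict_car(features, weights=(0.0, 1.0), bias=-0.5):
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return score > 0

image = [[0, 255, 0], [0, 255, 0], [0, 255, 0]]  # toy 3x3 "image"
is_car = predict_car(extract_features(image))
```

The key property, and the pain point the deck goes on to address, is that stage 1 is fixed by hand: only the classifier weights are learned from data.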
21. Many hand-created features exist (SIFT, Spin image, HoG, RIFT, Textons,
GLOH; slide credit: Honglak Lee)… but they are very painful to design.
22. Deep neural networks implicitly learn features
Each layer learns features at a different level of abstraction:
low-level features → mid-level features → high-level features → trainable classifier.
"Deep Learning = Learning Hierarchical Representations.
It's deep if it has more than one stage of non-linear feature transformation."
(Y. LeCun & M. A. Ranzato)
Feature visualization of a convolutional net trained on ImageNet
[Zeiler & Fergus 2013]: color & edge detectors → geometric detectors →
car-specific detectors.
23. Deep learning has yielded exciting accuracy: e.g., Krizhevsky et al.
won the 2012 ImageNet competition impressively, with a huge gain.
28. Change the image classification approach?
Input → extract features (SIFT, HoG, etc.) → use a simple classifier
(e.g., logistic regression, SVMs) → Car?
Can we learn features from data, even when we don't have much data or time?
29. Transfer learning:
use data from one domain to help learn on another.
Lots of data: learn a neural net → great accuracy on cat vs. dog.
Some data: neural net as a feature extractor + a simple classifier
→ great accuracy on 101 categories.
An old idea, explored for deep learning by Donahue et al. '14.
30. What's learned in a neural net
In a neural net trained for Task 1 (cat vs. dog):
the later layers are very specific to Task 1 and should be ignored for other
tasks; the earlier layers are more generic and can be used as a feature extractor.
31. Transfer learning in more detail…
Take the neural net trained for Task 1 (cat vs. dog).
The end layers are very specific to Task 1 and should be ignored for other tasks.
The earlier, more generic layers can be used as a feature extractor:
keep their weights fixed!
For Task 2 (predicting 101 categories), learn only the end part:
a simple classifier, e.g., logistic regression or SVMs → Class?
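The "keep weights fixed, learn only the end part" idea can be sketched with a toy stand-in for the network: `frozen_features` plays the role of the pre-trained early layers (never updated), and only a small linear head is trained for the new task, here with a simple perceptron-style update rather than any real deep-learning machinery.

```python
# Early layers from Task 1: weights fixed, used only as a feature extractor.
def frozen_features(x):
    return [x[0] + x[1], x[0] - x[1]]   # toy "learned" representation

# Train only the final linear layer (the "end part") for Task 2.
def train_head(data, labels, lr=0.1, epochs=50):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            f = frozen_features(x)       # frozen layers: forward pass only
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
            err = y - pred               # perceptron-style update on the head
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b

data = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]                    # toy "Task 2": logical AND
w, b = train_head(data, labels)
```

Only `w` and `b` ever change; the representation stays exactly as Task 1 learned it, which is why so little Task 2 data is needed.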
32. Transfer learning with deep features
Take some labeled data and extract features with a neural net trained on a
different task. Split it 80% / 20% into a training set and a validation set,
learn a simple model on the training set, validate, then deploy in production.
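The workflow on this slide, as a minimal end-to-end sketch. The two interesting stages are hypothetical stand-ins: `extract_deep_features` plays the pre-trained net, and `fit_simple_model` plays the simple classifier; the real content is the 80/20 split and the validation gate before deploying.

```python
import random

def extract_deep_features(x):
    return [x, x * x]                    # stand-in for a pre-trained net

def fit_simple_model(feats, labels):
    # toy "model": always predict the majority training label
    majority = max(set(labels), key=labels.count)
    return lambda f: majority

def run_pipeline(examples, labels, seed=0):
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))            # 80% train / 20% validation
    train, val = idx[:cut], idx[cut:]
    feats = [extract_deep_features(examples[i]) for i in train]
    model = fit_simple_model(feats, [labels[i] for i in train])
    correct = sum(model(extract_deep_features(examples[i])) == labels[i]
                  for i in val)
    return correct / len(val)            # validation accuracy: deploy if good
```

Whatever replaces the stand-ins, the shape stays the same: features from a frozen net, a cheap model on top, and a held-out score that decides whether the model ships.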
Deep learning tutorial tomorrow, 4pm!
33. Demo:
The power of deep features, a.k.a., transfer learning
(Shoes, please)
34. How general are deep features?
Talk by founder, Jason Gates, tomorrow 9:40am
35. GraphLab Create includes easy-to-use deep learning on multi-GPUs
Deep learning tutorial tomorrow, 4pm!
graphlab.deeplearning.create(data, target='label')
Deep learning in 1 line of code. You can also open the box and add your own layers:
Average Pooling Layer, Convolution Layer, Dropout Layer, Flatten Layer,
Full Connection Layer, Max Pooling Layer, Rectified Linear Layer, Sigmoid Layer,
SoftMax Layer, SoftPlus Layer, Sum Pooling Layer, Tanh Layer.
37. GraphLab Create for intelligent applications
High-level ML toolkits (4 lines of code gets you started):
deep learning, recommender, product reviews, data matching, sentiment,
image search, churn, click prediction, customer segmentation, fraud detection, …
Auto feature engineering (automate, achieve high accuracy):
. deep & reusable features
. data transformation pipelines
. kernels & hashing, encodings
AutoML (automate to focus on creativity):
. parameter search
. model selection
. algorithm selection
. distributed
Tables, graphs, text, images.
Scalable visualization for TBs of data, including Matplotlib at scale.
41. The data:
• 400k raw HTML pages containing:
o text, images, links, and, well, everything web pages have
The task:
• predict which pages are organic and which are sponsored advertising
When:
• starts August 1!
The prize:
• Fame!!!
• Knowledge!!!
• $10,000
42. A lot of effort in Kaggle competitions involves running many experiments…
…which can get slow. :(
43. SFrame & SGraph ❤️ all ML tools
Sophisticated machine learning made scalable:
data structures to create intelligence.
44. Data frames
(user, movie, rating)
When you choose a data frame, have your application in mind.
SFrame is optimized for ML: ML has specific data access patterns,
and we make them fast, really fast
(columnar transformations, creating new features, iterations, …).
45. SFrame: a scalable data frame optimized for ML
Same code, same (user, movie, rating) schema.
Never run out of memory: sharded, compressed, out-of-core, columnar.
Arbitrary lambda transformations, joins, … from Python.
Talk tomorrow with details: Yucheng @11am.
Large data on one machine? Limited RAM → must use disk (out-of-core computation).
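The out-of-core idea can be sketched with a streaming aggregation: read one column in bounded-size chunks so memory use depends on the chunk size, not the file size. This is only an illustration of the access pattern; SFrame itself additionally shards, compresses, and stores data in columnar form on disk.

```python
import csv
import os
import tempfile

def streaming_column_mean(path, column, chunk_size=1000):
    """Mean of one CSV column, never holding more than chunk_size values."""
    total, count, chunk = 0.0, 0, []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            chunk.append(float(row[column]))
            if len(chunk) >= chunk_size:   # flush a full chunk, then drop it
                total += sum(chunk)
                count += len(chunk)
                chunk = []
    total += sum(chunk)                    # flush the final partial chunk
    count += len(chunk)
    return total / count

# Tiny usage example on a temporary file.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["user", "movie", "rating"])
    w.writerows([[i, i, r] for i, r in enumerate([1, 2, 3, 4, 5])])
mean_rating = streaming_column_mean(path, "rating", chunk_size=2)
os.remove(path)
```

Note that the scan is purely sequential, which is exactly the access pattern the next slide argues disks are good at.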
46. Opportunity for out-of-core ML
Storage hierarchy (capacity vs. throughput):
0.1 TB at 1 GB/s (fast, but significantly limits data size);
1 TB at 0.5 GB/s and 10 TB at 0.1 GB/s: an opportunity for big data on 1 machine,
but for sequential reads only! Random access is very slow.
The out-of-core ML opportunity is huge.
The usual design → lots of random access → slow.
Instead, design to maximize sequential access for ML algorithm patterns.
GraphChi was an early example; SFrame is a data frame for ML.
53. ML pipelines combine multiple data types
Raw Wikipedia XML → parsed table (title, body) and hyperlink graph.
Hyperlinks → PageRank → top-20 pages table (title, PR).
Text → term-document graph → topic model (LDA) → word topics table (word, topic).
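The PageRank stage of the pipeline above can be sketched as a plain power iteration over an adjacency list (the hyperlink graph extracted from the XML); the tiny three-page graph at the end is purely illustrative.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration on an adjacency-list graph {page: [linked pages]}."""
    nodes = list(links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in links.items():
            if outs:                     # spread rank along outgoing links
                share = damping * pr[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:                        # dangling page: spread uniformly
                for u in nodes:
                    new[u] += damping * pr[v] / n
        pr = new
    return pr

links = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
ranks = pagerank(links)
top = sorted(ranks, key=ranks.get, reverse=True)   # "top pages" table
```

In the real pipeline this runs over the SGraph with hundreds of billions of edges, but the per-iteration access pattern is the same: one sequential sweep over the edge list.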
58. SFrame & SGraph: high performance
Optimized out-of-core computation for ML.
1 machine can handle TBs of data and 100s of billions of edges.
Optimized for ML:
. columnar transformations
. creating features
. iterators
. filter, join, group-by, aggregate
. user-defined functions
. easily extended through the SDK
Tables, graphs, text, images.
Open-source ❤️ BSD license (August).
61. Criteo terabyte click prediction
4.4 billion rows, 13 features, ½ TB of data.
Chart (runtime vs. #machines, 0–16): runtime drops from 3630s to 225s as
machines are added.
62. Same code, distributed ML

import graphlab as gl
data = gl.SFrame.read_csv('s3://…')
model = gl.classifier.create(data,
                             target='click')

Single-machine ML code. To distribute it, load a cluster and set it as the
execution environment:

c = gl.deploy.ec2_cluster.load('s3://…')
c = gl.deploy.hadoop_cluster.load('hdfs://…')
c = gl.deploy.spark_cluster.load('hdfs://…')
gl.set_distributed_execution_environment(c)
63. Dato machine learning platform
Inspiration → Sophisticated ML → Scale
Optimized for ML performance, for any data size, on any infrastructure.
GraphLab Create: ML toolkits, AutoML, reusable features, Canvas.
Dato Distributed: job management, distributed engine, distributed ML.
Create engine: SFrame, distributed SGraph.
GraphLab Create: machine learning in production.
65. Deploying ML models
Data scientist: "Exciting new deep learning model. It's accurate!"
Deployment engineer: "How long is this going to take?!"
With Dato Predictive Services: "REST API! I will be done today."
66. Machine learning in production: choosing between deployed models
• Deployment: easily serve live predictions
• Evaluation: measuring the quality of deployed models
• Monitoring: tracking model operations
• Management
Talk tomorrow with details: Alice & Rajat @1:45pm
68. Dato machine learning platform
Inspiration → Sophisticated ML → Scale → Production (deploy as a service)
Optimized for ML performance, for any data size, on any infrastructure.
GraphLab Create: ML toolkits, AutoML, reusable features, Canvas.
Dato Predictive Services: robust, elastic; REST client, model management, direct.
Dato Distributed: job management, distributed engine, distributed ML.
Create engine: SFrame, distributed SGraph.
Create intelligent applications faster & cheaper.
69. From "my curve is better than your curve"
to INTELLIGENT APPLICATIONS that are disrupting markets:
a phase transition of machine learning. Accelerate this process.
> pip install graphlab-create
jobs@dato.com   @guestrin
Editor's Notes
and if you talked to me in 2013, this is how I thought machine learning worked...
But, I didn't get into machine learning to write papers; I got into it because, as a kid, I read a lot of scifi
and I wanted to build intelligent robots
applications that really demonstrate intelligence
I'm excited that today, these fantasies are coming to reality...
we are seeing industry after industry being disrupted by companies that build intelligent applications
amazon
netflix
pandora
adsense
uber
and these intelligent applications use machine learning at their core
so, revisiting that childhood dream, I can say...
by making sophisticated ML easy for my people, the developers and data scientists...
Since last year, a lot has happened...
And, it is with great enthusiasm, that I can share that we, Dato, are the emerging machine learning company with the most paying customers...
And, the vision we share is that building intelligent applications is the key differentiator that they can provide for their users
My sister Julia is a successful fashion designer...
Let's see what it takes today to build such intelligent applications
start with inspiration
to understand why building sophisticated ML applications is impractical at a huge scale, let's look at the ML journey
but, I predict that in 5 years, every disruptive application will be differentiated by machine learning
for this to come true...
MNIST is, I think, just 60K images at 28x28, 10 classes; 4x GRID K520 GPUs on EC2.
http://h2o.ai/blog/2015/02/deep-learning-performance/
Native Advertising is paid content that matches a publication’s editorial standards while meeting the audience’s expectations.
Demian Farnworth - http://www.copyblogger.com/examples-of-native-ads
When trying to get to performance and scalability on a single machine, the most important thing for any programmer to understand is the storage hierarchy.
r3.8xlarge
If you were to try to represent this in memory, it is a minimum of a TB of memory or so, excluding overheads.
Simplify the process of moving models to production
Show that will support serving sci-kit-learn, R, mlLib ..
Manage multiple models in production
Connects to user services (high-availability and low latency)
stepping back...
we are lucky enough to be at a time in the development of technology when we are witnessing a phase transition
actually, you are making this machine learning phase transition happen
from the 2013 perspective, when I was only focused on papers and my curve is better than your curve
to a time when disruptive applications are differentiated by machine learning
we hope we can help accelerate this transition