9. Systems: elastic, scalable. People: data scientists.
Challenge today: Path from inspiration to production
Inspiration → Prototyping → Production → Scale
Sophisticated ML is impractical:
• Hard to match algorithm to application
• Algorithms trapped in papers
Scaling is costly:
• Rewrite the algorithm from scratch
• Expensive infrastructure
Deployment: even more costly infrastructure & time:
• Build custom services & APIs
• Model quality deteriorates
A slow & expensive process.
11. ML development today
Inspiration for an intelligent application, plus data.
A top-down solution would be easiest, but it's not possible:
the application is innovative → no black-box solution available.
So we're forced to go bottom-up:
read data → extract text → create features → choose model → tune parameters.
Try again. And again.
A fine approach if it's 2013 & I'm obsessed with
"my curve is better than your curve"
(i.e., yet another solution for the same old problem),
or not primarily focused on
accelerating the creation of intelligent applications.
12. Inspiration for an intelligent application, plus data.
If in 5 years all applications are intelligent, ML needs to be done differently:
Start from relevant, high-level, sophisticated ML building blocks.
Don't waste time on boring stuff, like parameter search,
or worry about specialized ML knowledge, like SGD.
Quickly write code: combine, blend, understand, adapt, improve, optimize.
(Instead of the bottom-up grind: read data → extract text → create features →
choose model → tune parameters → try again, and again.)
Let's see how…
14. High-level ML toolkits
get started with 4 lines of code,
then modify, blend, add yours…
Recommender, image search, sentiment analysis, data matching, auto tagging,
churn predictor, object detector, product sentiment, click prediction,
fraud detection, user segmentation, data completion, anomaly detection,
document clustering, forecasting, search ranking, summarization, …
import graphlab as gl
data = gl.SFrame.read_csv('my_data.csv')
model = gl.recommender.create(data,
                              user_id='user',
                              item_id='movie',
                              target='rating')
recommendations = model.recommend(k=5)
15. Sophisticated machine learning made easy
Create intelligence accelerants:
High-level ML toolkits + AutoML (tune parameters, model selection, …)
→ so you can focus on the creative parts
Reusable features (transferable feature engineering)
→ accuracy with less data & less effort
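What the AutoML accelerant automates can be sketched as a plain grid search over models and parameters. Everything here is illustrative: `score` is a hypothetical stand-in for "train a model with this configuration and measure validation accuracy".

```python
from itertools import product

# Sketch of AutoML's parameter search + model selection: enumerate
# configurations, score each one, keep the best. score() is a toy
# stand-in for training and evaluating on held-out data.
def score(model, depth, rate):
    base = {"tree": 0.7, "linear": 0.5}[model]
    return base + 0.01 * depth - abs(rate - 0.1)  # toy objective

def automl_search(models, depths, rates):
    """Exhaustive search; returns the best (model, depth, rate) triple."""
    return max(product(models, depths, rates), key=lambda cfg: score(*cfg))

best = automl_search(["tree", "linear"], [2, 4, 8], [0.01, 0.1, 1.0])
```

Real AutoML systems replace exhaustive enumeration with smarter strategies (random or Bayesian search), but the interface is the same: configurations in, best validated model out.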
16. What makes ML hard? Very hard!
• Modeling challenge: understand & scale complex models
• Representation challenge: feature engineering
• Data challenge: need for lots of labeled data
Usually: simple models & lots of feature engineering.
Krishna's talk tomorrow @9:10am: auto feature engineering.
Next: transfer learning can provide complex models with less work & less data.
18. Image features
• Features = local detectors
o Combined to make a prediction
o (in reality, features are more low-level)
Eye + eye + nose + mouth → Face!
19. Many hand-created features exist… Computer vision features:
SIFT, Spin image, HoG, RIFT, Textons, GLOH
(Slide credit: Honglak Lee)
20. Standard image classification approach
Input → extract features (computer vision features such as SIFT, Spin image,
HoG, RIFT, Textons, GLOH; slide credit: Honglak Lee)
→ use a simple classifier (e.g., logistic regression, SVMs) → Car?
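The two-stage pipeline above can be sketched in a few lines. Both stages here are toy stand-ins: `extract_features` plays the role of a hand-crafted extractor like SIFT or HoG, and `predict_car` plays the role of the simple linear classifier on top.

```python
# Stage 1: fixed, hand-crafted features (stand-in for SIFT/HoG).
def extract_features(image):
    """Toy features: mean intensity and a crude edge count."""
    flat = [px for row in image for px in row]
    mean = sum(flat) / len(flat)
    edges = sum(1
                for row in image
                for a, b in zip(row, row[1:])
                if abs(a - b) > 50)
    return [mean, edges]

# Stage 2: simple linear classifier (stand-in for logistic regression / SVM).
def predict_car(features, weights=(0.0, 1.0), bias=-0.5):
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return score > 0

image = [[0, 255, 0], [0, 255, 0], [0, 255, 0]]  # toy 3x3 "image"
is_car = predict_car(extract_features(image))
```

The key property, and the pain point the deck goes on to address, is that stage 1 is fixed by hand: only the classifier weights are learned from data.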
21. Many hand-created features exist (SIFT, Spin image, HoG, RIFT, Textons,
GLOH; slide credit: Honglak Lee)… but they are very painful to design.
22. Deep neural networks implicitly learn features
Each layer learns features at a different level of abstraction:
low-level features → mid-level features → high-level features → trainable classifier.
"Deep Learning = Learning Hierarchical Representations.
It's deep if it has more than one stage of non-linear feature transformation."
(Y. LeCun & M. A. Ranzato)
Feature visualization of a convolutional net trained on ImageNet
[Zeiler & Fergus 2013]: color & edge detectors → geometric detectors →
car-specific detectors.
23. Deep learning has yielded exciting accuracy: e.g., Krizhevsky et al.
won the 2012 ImageNet competition impressively, with a huge gain.
28. Change the image classification approach?
Input → extract features (SIFT, HoG, etc.) → use a simple classifier
(e.g., logistic regression, SVMs) → Car?
Can we learn features from data, even when we don't have much data or time?
29. Transfer learning:
use data from one domain to help learn on another.
Lots of data: learn a neural net → great accuracy on cat vs. dog.
Some data: neural net as a feature extractor + a simple classifier
→ great accuracy on 101 categories.
An old idea, explored for deep learning by Donahue et al. '14.
30. What's learned in a neural net
In a neural net trained for Task 1 (cat vs. dog):
the later layers are very specific to Task 1 and should be ignored for other
tasks; the earlier layers are more generic and can be used as a feature extractor.
31. Transfer learning in more detail…
Take the neural net trained for Task 1 (cat vs. dog).
The end layers are very specific to Task 1 and should be ignored for other tasks.
The earlier, more generic layers can be used as a feature extractor:
keep their weights fixed!
For Task 2 (predicting 101 categories), learn only the end part:
a simple classifier, e.g., logistic regression or SVMs → Class?
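The "keep weights fixed, learn only the end part" idea can be sketched with a toy stand-in for the network: `frozen_features` plays the role of the pre-trained early layers (never updated), and only a small linear head is trained for the new task, here with a simple perceptron-style update rather than any real deep-learning machinery.

```python
# Early layers from Task 1: weights fixed, used only as a feature extractor.
def frozen_features(x):
    return [x[0] + x[1], x[0] - x[1]]   # toy "learned" representation

# Train only the final linear layer (the "end part") for Task 2.
def train_head(data, labels, lr=0.1, epochs=50):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            f = frozen_features(x)       # frozen layers: forward pass only
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
            err = y - pred               # perceptron-style update on the head
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b

data = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]                    # toy "Task 2": logical AND
w, b = train_head(data, labels)
```

Only `w` and `b` ever change; the representation stays exactly as Task 1 learned it, which is why so little Task 2 data is needed.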
32. Transfer learning with deep features
Take some labeled data and extract features with a neural net trained on a
different task. Split it 80% / 20% into a training set and a validation set,
learn a simple model on the training set, validate, then deploy in production.
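The workflow on this slide, as a minimal end-to-end sketch. The two interesting stages are hypothetical stand-ins: `extract_deep_features` plays the pre-trained net, and `fit_simple_model` plays the simple classifier; the real content is the 80/20 split and the validation gate before deploying.

```python
import random

def extract_deep_features(x):
    return [x, x * x]                    # stand-in for a pre-trained net

def fit_simple_model(feats, labels):
    # toy "model": always predict the majority training label
    majority = max(set(labels), key=labels.count)
    return lambda f: majority

def run_pipeline(examples, labels, seed=0):
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))            # 80% train / 20% validation
    train, val = idx[:cut], idx[cut:]
    feats = [extract_deep_features(examples[i]) for i in train]
    model = fit_simple_model(feats, [labels[i] for i in train])
    correct = sum(model(extract_deep_features(examples[i])) == labels[i]
                  for i in val)
    return correct / len(val)            # validation accuracy: deploy if good
```

Whatever replaces the stand-ins, the shape stays the same: features from a frozen net, a cheap model on top, and a held-out score that decides whether the model ships.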
Deep learning tutorial tomorrow, 4pm!
33. Demo:
The power of deep features, a.k.a., transfer learning
(Shoes, please)
34. How general are deep features?
Talk by founder, Jason Gates, tomorrow 9:40am
35. GraphLab Create includes easy-to-use deep learning on multi-GPUs
Deep learning tutorial tomorrow, 4pm!
graphlab.deeplearning.create(data, target='label')
Deep learning in 1 line of code. You can also open the box and add your own layers:
Average Pooling Layer, Convolution Layer, Dropout Layer, Flatten Layer,
Full Connection Layer, Max Pooling Layer, Rectified Linear Layer, Sigmoid Layer,
SoftMax Layer, SoftPlus Layer, Sum Pooling Layer, Tanh Layer.
37. GraphLab Create for intelligent applications
High-level ML toolkits (4 lines of code gets you started):
deep learning, recommender, product reviews, data matching, sentiment,
image search, churn, click prediction, customer segmentation, fraud detection, …
Auto feature engineering (automate, achieve high accuracy):
. deep & reusable features
. data transformation pipelines
. kernels & hashing, encodings
AutoML (automate to focus on creativity):
. parameter search
. model selection
. algorithm selection
. distributed
Tables, graphs, text, images.
Scalable visualization for TBs of data, including Matplotlib at scale.
41. The data:
• 400k raw HTML pages containing:
o text, images, links, and, well, everything web pages have
The task:
• predict which pages are organic and which are sponsored advertising
When:
• starts August 1!
The prize:
• Fame!!!
• Knowledge!!!
• $10,000
42. A lot of effort in Kaggle competitions involves running many experiments…
…which can get slow. :(
43. SFrame & SGraph ❤️ all ML tools
Sophisticated machine learning made scalable:
data structures to create intelligence.
44. Data frames
(user, movie, rating)
When you choose a data frame, have your application in mind.
SFrame is optimized for ML: ML has specific data access patterns,
and we make them fast, really fast
(columnar transformations, creating new features, iterations, …).
45. SFrame: a scalable data frame optimized for ML
Same code, same (user, movie, rating) schema.
Never run out of memory: sharded, compressed, out-of-core, columnar.
Arbitrary lambda transformations, joins, … from Python.
Talk tomorrow with details: Yucheng @11am.
Large data on one machine? Limited RAM → must use disk (out-of-core computation).
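The out-of-core idea can be sketched with a streaming aggregation: read one column in bounded-size chunks so memory use depends on the chunk size, not the file size. This is only an illustration of the access pattern; SFrame itself additionally shards, compresses, and stores data in columnar form on disk.

```python
import csv
import os
import tempfile

def streaming_column_mean(path, column, chunk_size=1000):
    """Mean of one CSV column, never holding more than chunk_size values."""
    total, count, chunk = 0.0, 0, []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            chunk.append(float(row[column]))
            if len(chunk) >= chunk_size:   # flush a full chunk, then drop it
                total += sum(chunk)
                count += len(chunk)
                chunk = []
    total += sum(chunk)                    # flush the final partial chunk
    count += len(chunk)
    return total / count

# Tiny usage example on a temporary file.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["user", "movie", "rating"])
    w.writerows([[i, i, r] for i, r in enumerate([1, 2, 3, 4, 5])])
mean_rating = streaming_column_mean(path, "rating", chunk_size=2)
os.remove(path)
```

Note that the scan is purely sequential, which is exactly the access pattern the next slide argues disks are good at.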
46. Opportunity for out-of-core ML
Storage hierarchy (capacity vs. throughput):
0.1 TB at 1 GB/s (fast, but significantly limits data size);
1 TB at 0.5 GB/s and 10 TB at 0.1 GB/s: an opportunity for big data on 1 machine,
but for sequential reads only! Random access is very slow.
The out-of-core ML opportunity is huge.
The usual design → lots of random access → slow.
Instead, design to maximize sequential access for ML algorithm patterns.
GraphChi was an early example; SFrame is a data frame for ML.
53. ML pipelines combine multiple data types
Raw Wikipedia XML → parsed table (title, body) and hyperlink graph.
Hyperlinks → PageRank → top-20 pages table (title, PR).
Text → term-document graph → topic model (LDA) → word topics table (word, topic).
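The PageRank stage of the pipeline above can be sketched as a plain power iteration over an adjacency list (the hyperlink graph extracted from the XML); the tiny three-page graph at the end is purely illustrative.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration on an adjacency-list graph {page: [linked pages]}."""
    nodes = list(links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in links.items():
            if outs:                     # spread rank along outgoing links
                share = damping * pr[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:                        # dangling page: spread uniformly
                for u in nodes:
                    new[u] += damping * pr[v] / n
        pr = new
    return pr

links = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
ranks = pagerank(links)
top = sorted(ranks, key=ranks.get, reverse=True)   # "top pages" table
```

In the real pipeline this runs over the SGraph with hundreds of billions of edges, but the per-iteration access pattern is the same: one sequential sweep over the edge list.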
58. SFrame & SGraph: high performance
Optimized out-of-core computation for ML.
1 machine can handle TBs of data and 100s of billions of edges.
Optimized for ML:
. columnar transformations
. creating features
. iterators
. filter, join, group-by, aggregate
. user-defined functions
. easily extended through the SDK
Tables, graphs, text, images.
Open-source ❤️ BSD license (August).
61. Criteo terabyte click prediction
4.4 billion rows, 13 features, ½ TB of data.
Chart (runtime vs. #machines, 0–16): runtime drops from 3630s to 225s as
machines are added.
62. Same code, distributed ML

import graphlab as gl
data = gl.SFrame.read_csv('s3://…')
model = gl.classifier.create(data,
                             target='click')

Single-machine ML code. To distribute it, load a cluster and set it as the
execution environment:

c = gl.deploy.ec2_cluster.load('s3://…')
c = gl.deploy.hadoop_cluster.load('hdfs://…')
c = gl.deploy.spark_cluster.load('hdfs://…')
gl.set_distributed_execution_environment(c)
63. Dato machine learning platform
Inspiration → Sophisticated ML → Scale
Optimized for ML performance, for any data size, on any infrastructure.
GraphLab Create: ML toolkits, AutoML, reusable features, Canvas.
Dato Distributed: job management, distributed engine, distributed ML.
Create engine: SFrame, distributed SGraph.
GraphLab Create: machine learning in production.
65. Deploying ML models
Data scientist: "Exciting new deep learning model. It's accurate!"
Deployment engineer: "How long is this going to take?!"
With Dato Predictive Services: "REST API! I will be done today."
66. Machine learning in production: choosing between deployed models
• Deployment: easily serve live predictions
• Evaluation: measuring the quality of deployed models
• Monitoring: tracking model operations
• Management
Talk tomorrow with details: Alice & Rajat @1:45pm
68. Dato machine learning platform
Inspiration → Sophisticated ML → Scale → Production (deploy as a service)
Optimized for ML performance, for any data size, on any infrastructure.
GraphLab Create: ML toolkits, AutoML, reusable features, Canvas.
Dato Predictive Services: robust, elastic; REST client, model management, direct.
Dato Distributed: job management, distributed engine, distributed ML.
Create engine: SFrame, distributed SGraph.
Create intelligent applications faster & cheaper.
69. From "my curve is better than your curve"
to INTELLIGENT APPLICATIONS that are disrupting markets:
a phase transition of machine learning. Accelerate this process.
> pip install graphlab-create
jobs@dato.com   @guestrin
Editor's Notes
and if you talked to me in 2013, this is how I thought machine learning worked...
But, I didn't get into machine learning to write papers; I got into it because, as a kid, I read a lot of scifi
and I wanted to build intelligent robots
applications that really demonstrate intelligence
I'm excited that today, these fantasies are coming to reality...
we are seeing industry after industry being disrupted by companies that build intelligent applications
amazon
netflix
pandora
adsense
uber
and these intelligent applications use machine learning at their core
so, revisiting that childhood dream, I can say...
by making sophisticated ML easy for my people, the developers and data scientists...
Since last year, a lot has happened...
And, it is with great enthusiasm, that I can share that we, Dato, are the emerging machine learning company with the most paying customers...
And, the vision we share is that building intelligent applications is the key differentiator that they can provide for their users
My sister Julia is a successful fashion designer...
Let's see what it takes today to build such intelligent applications
start with inspiration
to understand why building sophisticated ML applications is impractical at a huge scale, let's look at the ML journey
but, I predict that in 5 years, every disruptive application will be differentiated by machine learning
for this to come true...
MNIST is, I think, just 60K images at 28x28, 10 classes; 4x GRID K520 GPUs on EC2.
http://h2o.ai/blog/2015/02/deep-learning-performance/
Native Advertising is paid content that matches a publication’s editorial standards while meeting the audience’s expectations.
Demian Farnworth - http://www.copyblogger.com/examples-of-native-ads
When trying to get to performance and scalability on a single machine, the most important thing for any programmer to understand is the storage hierarchy.
r3.8xlarge
If you were to try to represent this in memory, it is a minimum of a TB of memory or so, excluding overheads.
Simplify the process of moving models to production
Show that will support serving sci-kit-learn, R, mlLib ..
Manage multiple models in production
Connects to user services (high-availability and low latency)
stepping back...
we are lucky enough to be at a time in the development of technology when we are witnessing a phase transition
actually, you are making this machine learning phase transition happen
from the 2013 perspective, when I was only focused on papers and my curve is better than your curve
to a time when disruptive applications are differentiated by machine learning
we hope we can help accelerate this transition