Machine Learning on Big Data
Lessons Learned from Google Projects
Max Lin, Software Engineer | Google Research
Massively Parallel Computing | Harvard CS 264, Guest Lecture | March 29th, 2011
Outline
• Machine Learning intro
• Scaling machine learning algorithms up
• Design choices of large scale ML systems
“Machine Learning is a study of computer algorithms that improve automatically through experience.”
Training
  Input X → Output Y
  The quick brown fox jumped over the lazy dog. → English
  To err is human, but to really foul things up you need a computer. → English
  No hay mal que por bien no venga. → Spanish
  La tercera es la vencida. → Spanish
  Model: f(x)
Testing: predict y’ = f(x’)
  To be or not to be -- that is the question → ?
  La fe mueve montañas. → ?
Linear Classifier
The quick brown fox jumped over the lazy dog.
      ‘a’  ...  ‘aardvark’  ...  ‘dog’  ...  ‘the’  ...  ‘montañas’  ...
x = [ 0,   ...  0,          ...  1,     ...  1,     ...  0,          ... ]
w = [ 0.1, ...  132,        ...  150,   ...  200,   ...  -153,       ... ]

  f(x) = w · x = Σ_{p=1}^{P} w_p · x_p
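The feature vector and dot product above can be sketched directly. This is a minimal illustration: the vocabulary is the five words shown on the slide, and the weights are the slide's illustrative numbers, not a trained model.

```python
# Toy vocabulary and weights standing in for the slide's x and w vectors.
VOCAB = ['a', 'aardvark', 'dog', 'the', 'montañas']
WEIGHTS = [0.1, 132.0, 150.0, 200.0, -153.0]

def featurize(text):
    """Binary bag of words: x_p = 1 if vocabulary word p appears in the text."""
    tokens = set(text.lower().replace('.', '').split())
    return [1.0 if word in tokens else 0.0 for word in VOCAB]

def score(text):
    """f(x) = w . x = sum over p of w_p * x_p."""
    return sum(w_p * x_p for w_p, x_p in zip(WEIGHTS, featurize(text)))

# 'dog' and 'the' are present, so f(x) = 150 + 200 = 350.
print(score("The quick brown fox jumped over the lazy dog."))
```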
Training Data: the input X is an N × P matrix (N examples, P features); the output Y is an N × 1 vector of labels.
Typical machine learning data at Google
N: 100 billion / 1 billion
P: 1 billion / 10 million
(mean / median)
http://www.flickr.com/photos/mr_t_in_dc/5469563053
Classifier Training
• Training: given {(x_i, y_i)} and f, minimize the following objective function

  arg min_w Σ_{i=1}^{N} L(y_i, f(x_i; w)) + R(w)
Use Newton’s method?

  w^{t+1} ← w^t − H(w^t)^{−1} ∇J(w^t)

http://www.flickr.com/photos/visitfinland/5424369765/
Outline
• Machine Learning intro
• Scaling machine learning algorithms up
• Design choices of large scale ML systems
Scaling Up
• Why big data?
• Parallelize machine learning algorithms
  • Embarrassingly parallel
  • Parallelize sub-routines
  • Distributed learning
Subsampling
Big Data → Shard 1 | Shard 2 | Shard 3 | ... | Shard M
Reduce N: keep only Shard 1 → one Machine trains → Model
Why not Small Data?                [Banko and Brill, 2001]
Scaling Up
• Why big data?
• Parallelize machine learning algorithms
  • Embarrassingly parallel
  • Parallelize sub-routines
  • Distributed learning
Parallelize Estimates
• Naive Bayes Classifier

  arg min_w − Π_{i=1}^{N} Π_{p=1}^{P} P(x_p^i | y_i; w) P(y_i; w)

• Maximum Likelihood Estimates

  w_{the|EN} = Σ_{i=1}^{N} 1_{EN,the}(x^i) / Σ_{i=1}^{N} 1_{EN}(x^i)
Word Counting
Map: X: “The quick brown fox ...”, Y: EN → emit (‘the|EN’, 1), (‘quick|EN’, 1), (‘brown|EN’, 1), ...
Reduce: [ (‘the|EN’, 1), (‘the|EN’, 1), (‘the|EN’, 1) ] → C(‘the’|EN) = SUM of values = 3

  w_{the|EN} = C(‘the’|EN) / C(EN)
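The Map and Reduce phases can be simulated in one process. The two labeled sentences below are toy stand-ins for the sharded corpus:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit ('word|label', 1) for every token of every labeled example."""
    for text, label in records:
        for word in text.lower().replace('.', '').split():
            yield ('%s|%s' % (word, label), 1)

def reduce_phase(pairs):
    """Reduce: C(word|label) = SUM of the values emitted for that key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

records = [("The quick brown fox jumped over the lazy dog.", "EN"),
           ("To err is human.", "EN")]
counts = reduce_phase(map_phase(records))
c_en = sum(v for k, v in counts.items() if k.endswith('|EN'))  # C(EN): 13 tokens
w_the_en = counts['the|EN'] / c_en                             # C('the'|EN) / C(EN)
```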
Word Counting at scale:
Big Data → Shard 1 | Shard 2 | Shard 3 | ... | Shard M
Map: Mapper m reads Shard m and emits (‘the|EN’, 1) pairs; Reduce: sums the counts for each key.
Parallelize Optimization
• Maximum Entropy Classifiers

  arg min_w − Π_{i=1}^{N} exp(Σ_{p=1}^{P} w_p x_p^i)^{y_i} / (1 + exp(Σ_{p=1}^{P} w_p x_p^i))

• Concave in w; no closed-form solution
Gradient Descent        http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture7.pdf
Gradient Descent
• w is initialized as zero
• for t in 1 to T
  • Calculate gradients ∇J(w)
  • w^{t+1} ← w^t − η ∇J(w)
• Training CPU: O(TPN)
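A runnable sketch of this loop for the logistic-regression objective from the previous slides. The toy dataset, η, and T are illustrative choices, not from the deck:

```python
import math

def train(X, Y, T=100, eta=0.1):
    """Batch gradient descent: w^{t+1} <- w^t - eta * grad J(w^t)."""
    P = len(X[0])
    w = [0.0] * P                            # w is initialized as zero
    for _ in range(T):                       # for t in 1 to T
        grad = [0.0] * P
        for x, y in zip(X, Y):
            z = sum(wp * xp for wp, xp in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))   # model's P(y = 1 | x)
            for j in range(P):
                grad[j] += (p - y) * x[j]    # gradient of the negative log-likelihood
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w

X = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
Y = [1, 1, 0, 0]          # the label simply follows the first feature
w = train(X, Y)           # w[0] ends up clearly positive
```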
Distribute Gradient
• w is initialized as zero
• for t in 1 to T
  • Calculate gradients in parallel: each of M machines computes the gradient over its own shard, and the partial gradients are summed
  • w^{t+1} ← w^t − η ∇J(w)
• Training CPU: O(TPN) → O(TPN / M), plus communication
Big Data → Shard 1 | ... | Shard M → Machine 1 ... Machine M compute partial gradients → combine → Model
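Splitting the same computation across shards: each machine computes a partial gradient over its shard, the partials are summed, and one global update is applied per iteration. The machines are simulated sequentially here; because the partial gradients sum to the full-batch gradient, the iterates match single-machine batch gradient descent exactly.

```python
import math

def partial_grad(w, shard):
    """The gradient of the logistic loss restricted to one shard's examples."""
    grad = [0.0] * len(w)
    for x, y in shard:
        z = sum(wp * xp for wp, xp in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j in range(len(w)):
            grad[j] += (p - y) * x[j]
    return grad

def distributed_train(shards, P, T=100, eta=0.1):
    w = [0.0] * P
    for _ in range(T):
        # Map: each machine computes the gradient over its own shard.
        partials = [partial_grad(w, shard) for shard in shards]
        # Reduce: sum partial gradients, then take one global step.
        grad = [sum(column) for column in zip(*partials)]
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w

shards = [[([1.0, 0.0], 1), ([0.0, 1.0], 0)],
          [([1.0, 1.0], 1), ([0.0, 0.0], 0)]]
w = distributed_train(shards, P=2)
```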
Scaling Up
• Why big data?
• Parallelize machine learning algorithms
  • Embarrassingly parallel
  • Parallelize sub-routines
  • Distributed learning
Parallelize Subroutines
• Support Vector Machines

  arg min_{w,b,ζ} ½ ‖w‖²₂ + C Σ_{i=1}^{n} ζ_i
  s.t. 1 − y_i(w · φ(x_i) + b) ≤ ζ_i,  ζ_i ≥ 0

• Dual:

  arg min_α ½ αᵀQα − αᵀ1
The computational cost of the Primal-Dual Interior Point Method is O(n³) in time and O(n²) in memory.
Parallel SVM [Chang et al, 2007]   √N
• Parallel, row-wise incomplete Cholesky Factorization for Q
• Parallel ...
Parallel ICF
• Distribute Q by row into M machines (e.g. Machine 1: row 1, row 3, ...; Machine 2: ...; Machine 3: ...)
Scaling Up
• Why big data?
• Parallelize machine learning algorithms
  • Embarrassingly parallel
  • Parallelize sub-routines
  • Distributed learning
Majority Vote
Big Data → Shard 1 | ... | Shard M; Map: Machine m trains its own classifier on Shard m
• Train individual classifiers independently
• Predict by taking majority votes
• Training CPU: O(TPN) → O(TPN / M)
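A sketch of the scheme: each shard trains its own base classifier (a one-pass perceptron here, purely as a hypothetical stand-in for any learner), and predictions are made by majority vote over the shard models:

```python
from collections import Counter

def train_on_shard(shard):
    """One perceptron pass over a shard -- a stand-in for any base learner."""
    w = [0.0] * len(shard[0][0])
    for x, y in shard:
        pred = 1 if sum(wp * xp for wp, xp in zip(w, x)) > 0 else 0
        if pred != y:
            sign = 1 if y == 1 else -1
            w = [wp + sign * xp for wp, xp in zip(w, x)]
    return w

def majority_vote(models, x):
    """Each shard's model votes; the most common prediction wins."""
    votes = [1 if sum(wp * xp for wp, xp in zip(w, x)) > 0 else 0
             for w in models]
    return Counter(votes).most_common(1)[0][0]

shards = [[([1.0, 0.0], 1), ([0.0, 1.0], 0)],
          [([1.0, 1.0], 1), ([0.0, 0.0], 0)],
          [([1.0, 0.0], 1), ([0.0, 1.0], 0)]]
models = [train_on_shard(s) for s in shards]   # trained independently
```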
Parameter Mixture [Mann et al, 2009]
Big Data → Shard 1 | ... | Shard M; Map: Machine m trains weights w_m on Shard m; the final model mixes the per-shard parameters
• Much less network usage than ...
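The mixing step itself is just a uniform average of per-shard weight vectors. The three vectors below are hypothetical outputs of per-shard training, used only to show the arithmetic:

```python
def parameter_mixture(models):
    """w = (1/M) * sum over m of w_m: one averaging step after training ends."""
    M = len(models)
    return [sum(component) / M for component in zip(*models)]

# Hypothetical weight vectors learned independently on three shards:
shard_weights = [[2.0, -1.0], [4.0, 1.0], [3.0, 0.0]]
w = parameter_mixture(shard_weights)
print(w)   # [3.0, 0.0]
```

Only the M weight vectors cross the network, once, which is where the savings over per-iteration gradient exchange come from.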
Iterative Param Mixture [McDonald et al., 2010]
Big Data → Shard 1 | ... | Shard M
Each machine trains on its shard starting from the current averaged weights; after every pass the per-shard weights are averaged again and redistributed.
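A sketch of the iterative variant: each round, every shard trains locally starting from the current average, then the per-shard weights are re-averaged and redistributed. Local SGD on a logistic loss stands in here for the paper's base learner, and all constants are illustrative:

```python
import math

def local_train(w, shard, eta=0.1, epochs=5):
    """A few local SGD steps on the logistic loss, starting from w."""
    w = list(w)
    for _ in range(epochs):
        for x, y in shard:
            z = sum(wp * xp for wp, xp in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wp - eta * (p - y) * xp for wp, xp in zip(w, x)]
    return w

def iterative_parameter_mixture(shards, P, rounds=10):
    w = [0.0] * P
    for _ in range(rounds):
        # Each machine trains from the current mixture on its own shard ...
        local = [local_train(w, shard) for shard in shards]
        # ... then the per-shard weights are averaged and redistributed.
        w = [sum(component) / len(local) for component in zip(*local)]
    return w

shards = [[([1.0, 0.0], 1), ([0.0, 1.0], 0)],
          [([1.0, 1.0], 1), ([0.0, 0.0], 0)]]
w = iterative_parameter_mixture(shards, P=2)
```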
Outline
• Machine Learning intro
• Scaling machine learning algorithms up
• Design choices of large scale ML systems
Scalable           http://www.flickr.com/photos/mr_t_in_dc/5469563053
Parallelhttp://www.flickr.com/photos/aloshbennett/3209564747/
Accuracyhttp://www.flickr.com/photos/wanderlinse/4367261825/
Binary                                                     Classificationhttp://www.flickr.com/photos/brenderous/4532934181/
Automatic FeatureDiscovery   http://www.flickr.com/photos/mararie/2340572508/
Fast                                              Responsehttp://www.flickr.com/photos/prunejuice/3687192643/
Memory is the new hard disk.
http://www.flickr.com/photos/jepoirrier/840415676/
Algorithm +                                                Infrastructurehttp://www.flickr.com/photos/neubie/854242030/
Design forMulticores             http://www.flickr.com/photos/geektechnique/2344029370/
Combiner
Multi-shard Combiner[Chandra et al., 2010]
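A combiner pre-aggregates map output on the map machine, so the reducer receives one partial sum per key per shard instead of one pair per token; the multi-shard combiner of [Chandra et al., 2010] extends this idea across shards. A toy sketch of the local pre-aggregation:

```python
from collections import defaultdict

def mapper(tokens, label):
    """Emit one ('word|label', 1) pair per token, as in the word-count example."""
    for word in tokens:
        yield ('%s|%s' % (word, label), 1)

def combine(pairs):
    """Runs locally on the map machine: partial sums before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())

raw = list(mapper(['the', 'quick', 'the', 'the'], 'EN'))  # 4 pairs emitted
shipped = combine(raw)                                    # only 2 pairs shipped
```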
Machine Learning on Big Data
Parallelize ML Algorithms
• Embarrassingly parallel
• Parallelize sub-routines
• Distributed learning
Parallel · Accuracy · Fast Response
Google APIs
• Prediction API
  • machine learning service on the cloud
  • http://code.google.com/apis/predict
• B...
Machine Learning on Big Data

Max Lin gave a guest lecture about machine learning on big data in the Harvard CS 264 class on March 29, 2011.
Notes:
  • \arg \min_{\mathbf{w}} \sum_{i = 1}^N L(y_i, f(x_i; \mathbf{w})) + R(\mathbf{w})
  • \arg \min_{\mathbf{w}} -\prod_{i=1}^N \prod_{p=1}^P P(x^i_p | y_i; \mathbf{w}) P(y_i; \mathbf{w}); \quad f(x) = \sum_{p=1}^P \log \frac{\mathbf{1}(x_p) \, w_{p|EN}}{\mathbf{1}(x_p) \, w_{p|ES}} + \log \frac{w_{EN}}{w_{ES}}; \quad w_{the|EN} = \frac{\sum_{i=1}^N \mathbf{1}_{EN,the}(x^i)}{\sum_{i=1}^N \mathbf{1}_{EN}(x^i)}
  • Concave in w; no closed form. \arg \min_{\mathbf{w}} \prod_{i=1}^N \frac{\exp(\sum_{p=1}^P w_p x^i_p)^{y_i}}{1 + \exp(\sum_{p=1}^P w_p x^i_p)}
  • \mathbf{w}^{t+1} \leftarrow \mathbf{w}^{t} - \eta \nabla J(\mathbf{w}); \quad \frac{\partial}{\partial w_p} J(\mathbf{w}) = \sum_{i=1}^N \frac{\partial}{\partial w_p} \left[ y^i \sum_p w_p x^i_p - \ln \left( 1 + \exp \left( \sum_p w_p x^i_p \right) \right) \right]
  • SVM primal: \arg \min_{\mathbf{w},b,\mathbf{\zeta}} \frac{1}{2} \|\mathbf{w}\|_2^2 + C \sum_{i=1}^n \zeta_i \ \text{s.t.} \ 1 - y_i(\mathbf{w} \cdot \phi(x_i) + b) \le \zeta_i, \ \zeta_i \ge 0; dual: \arg \min_{\mathbf{\alpha}} \frac{1}{2} \mathbf{\alpha}^T \mathbf{Q} \mathbf{\alpha} - \mathbf{\alpha}^T \mathbf{1}
  • Amdahl’s law; communication cost informs the choice of M
  • Sub-sampling gives inferior performance. Parameter mixture improves on it, but is not as good as using all the data. Iterative parameter mixture performs as well as using all the data. Distributed algorithms return better classifiers more quickly.
  • Billions of instances, millions of features, within reasonable resources
  • Guarantees, state-of-the-art accuracy
  • Easy to use: adoption, setup; easy to maintain: reliable, in production, works with batch systems
  • New features every day; feedback changes
  • Iterative; fast retrieval for data and model parameters
  • Fault-tolerant pieces: MapReduce (scalable), multi-cores, GFS for data
  • Transcript of "Machine Learning on Big Data"

    1. 1. Machine Learning on Big DataLessons Learned from Google ProjectsMax LinSoftware Engineer | Google ResearchMassively Parallel Computing | Harvard CS 264Guest Lecture | March 29th, 2011
    2. 2. Outline• Machine Learning intro• Scaling machine learning algorithms up• Design choices of large scale ML systems
    3. 3. Outline• Machine Learning intro• Scaling machine learning algorithms up• Design choices of large scale ML systems
    4. 4. “Machine Learning is a studyof computer algorithms that improve automatically through experience.”
    5. 5. The quick brown foxjumped over the lazy dog.
    6. 6. The quick brown fox Englishjumped over the lazy dog.
    7. 7. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up youneed a computer.
    8. 8. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up you Englishneed a computer.
    9. 9. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up you Englishneed a computer.No hay mal que por bienno venga.
    10. 10. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up you Englishneed a computer.No hay mal que por bien Spanishno venga.
    11. 11. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up you Englishneed a computer.No hay mal que por bien Spanishno venga.La tercera es la vencida.
    12. 12. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up you Englishneed a computer.No hay mal que por bien Spanishno venga.La tercera es la vencida. Spanish
    13. 13. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up you Englishneed a computer.No hay mal que por bien Spanishno venga.La tercera es la vencida. SpanishTo be or not to be -- thatis the question
    14. 14. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up you Englishneed a computer.No hay mal que por bien Spanishno venga.La tercera es la vencida. SpanishTo be or not to be -- that ?is the question
    15. 15. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up you Englishneed a computer.No hay mal que por bien Spanishno venga.La tercera es la vencida. SpanishTo be or not to be -- that ?is the questionLa fe mueve montañas.
    16. 16. The quick brown fox Englishjumped over the lazy dog.To err is human, but toreally foul things up you Englishneed a computer.No hay mal que por bien Spanishno venga.La tercera es la vencida. SpanishTo be or not to be -- that ?is the questionLa fe mueve montañas. ?
    17. 17. The quick brown fox English jumped over the lazy dog. To err is human, but to really foul things up you EnglishTraining need a computer. No hay mal que por bien Spanish no venga. La tercera es la vencida. Spanish To be or not to be -- that ? is the question La fe mueve montañas. ?
    18. 18. The quick brown fox English jumped over the lazy dog. To err is human, but to really foul things up you EnglishTraining Input X need a computer. No hay mal que por bien Spanish no venga. La tercera es la vencida. Spanish To be or not to be -- that ? is the question La fe mueve montañas. ?
    19. 19. The quick brown fox English jumped over the lazy dog. To err is human, but to really foul things up you EnglishTraining Input X need a computer. Output Y No hay mal que por bien Spanish no venga. La tercera es la vencida. Spanish To be or not to be -- that ? is the question La fe mueve montañas. ?
    20. 20. The quick brown fox English jumped over the lazy dog. To err is human, but to really foul things up you EnglishTraining Input X need a computer. Output Y No hay mal que por bien Spanish no venga. Model f(x) La tercera es la vencida. Spanish To be or not to be -- that ? is the question La fe mueve montañas. ?
    21. 21. The quick brown fox English jumped over the lazy dog. To err is human, but to really foul things up you EnglishTraining Input X need a computer. Output Y No hay mal que por bien Spanish no venga. Model f(x) La tercera es la vencida. Spanish To be or not to be -- that ?Testing is the question La fe mueve montañas. ?
    22. 22. The quick brown fox English jumped over the lazy dog. To err is human, but to really foul things up you EnglishTraining Input X need a computer. Output Y No hay mal que por bien Spanish no venga. Model f(x) La tercera es la vencida. Spanish To be or not to be -- that ?Testing f(x’) is the question La fe mueve montañas. ?
    23. 23. The quick brown fox English jumped over the lazy dog. To err is human, but to really foul things up you EnglishTraining Input X need a computer. Output Y No hay mal que por bien Spanish no venga. Model f(x) La tercera es la vencida. Spanish To be or not to be -- that ?Testing f(x’) is the question = y’ La fe mueve montañas. ?
    24. 24. Linear Classifier
    25. 25. Linear ClassifierThe quick brown fox jumped over the lazy dog.
    26. 26. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’
    27. 27. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ...
    28. 28. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’
    29. 29. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ...
    30. 30. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’
    31. 31. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ...
    32. 32. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’
    33. 33. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ...
    34. 34. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’
    35. 35. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...
    36. 36. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...0,
    37. 37. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... 0, ...
    38. 38. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... 0, ... 0,
    39. 39. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... 0, ... 0, ...
    40. 40. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... 0, ... 0, ... 1,
    41. 41. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... 0, ... 0, ... 1, ...
    42. 42. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... 0, ... 0, ... 1, ... 1,
    43. 43. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... 0, ... 0, ... 1, ... 1, ...
    44. 44. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... 0, ... 0, ... 1, ... 1, ... 0,
    45. 45. Linear Classifier The quick brown fox jumped over the lazy dog.‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ... 0, ... 0, ... 1, ... 1, ... 0, ...
    46. 46. Linear Classifier The quick brown fox jumped over the lazy dog. ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...[ 0, ... 0, ... 1, ... 1, ... 0, ...
    47. 47. Linear Classifier The quick brown fox jumped over the lazy dog. ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...[ 0, ... 0, ... 1, ... 1, ... 0, ... ]
    48. 48. Linear Classifier The quick brown fox jumped over the lazy dog. ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...x [ 0, ... 0, ... 1, ... 1, ... 0, ... ]
    49. 49. Linear Classifier The quick brown fox jumped over the lazy dog. ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...x [ 0, ... 0, ... 1, ... 1, ... 0, ... ] [ 0.1, ... 132, ... 150, ... 200, ... -153, ... ]
    50. 50. Linear Classifier The quick brown fox jumped over the lazy dog. ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...x [ 0, ... 0, ... 1, ... 1, ... 0, ... ]w [ 0.1, ... 132, ... 150, ... 200, ... -153, ... ]
    51. 51. Linear Classifier The quick brown fox jumped over the lazy dog. ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...x [ 0, ... 0, ... 1, ... 1, ... 0, ... ]w [ 0.1, ... 132, ... 150, ... 200, ... -153, ... ] P f (x) = w · x = w p ∗ xp p=1
    52. 52. Training Data Input X Ouput Y P ... ... ...N ... ... ... ... ... ... ...
    53. 53. Typical machine learningdata at GoogleN: 100 billions / 1 billionP: 1 billion / 10 million(mean / median) http://www.flickr.com/photos/mr_t_in_dc/5469563053
    54. 54. Classifier Training• Training: Given {(x, y)} and f, minimize the following objective function N arg min L(yi , f (xi ; w)) + R(w) w n=1
    55. 55. Use Newton’s method? t +1 t t −1 tw ← w − H(w ) J(w ) http://www.flickr.com/photos/visitfinland/5424369765/
    56. 56. Outline• Machine Learning intro• Scaling machine learning algorithms up• Design choices of large scale ML systems
    57. 57. Scaling Up
    58. 58. Scaling Up• Why big data?
    59. 59. Scaling Up• Why big data?• Parallelize machine learning algorithms
    60. 60. Scaling Up• Why big data?• Parallelize machine learning algorithms • Embarrassingly parallel
    61. 61. Scaling Up• Why big data?• Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines
    62. 62. Scaling Up• Why big data?• Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
    69. Subsampling
    Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
    Reduce N: keep a single shard (Shard 1), train it on one machine → Model
    70. Why not Small Data? [Banko and Brill, 2001]
    71. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
    72. Parallelize Estimates
    • Naive Bayes Classifier:
      arg min_w − Π_{i=1}^{N} Π_{p=1}^{P} P(x_p^i | y_i; w) P(y_i; w)
    • Maximum Likelihood Estimates:
      w_{the|EN} = Σ_{i=1}^{N} 1_{EN,the}(x^i) / Σ_{i=1}^{N} 1_{EN}(x^i)
    82. Word Counting
    Map: X: “The quick brown fox ...”, Y: EN →
      (‘the|EN’, 1), (‘quick|EN’, 1), (‘brown|EN’, 1), ...
    Reduce: [ (‘the|EN’, 1), (‘the|EN’, 1), (‘the|EN’, 1) ]
      C(‘the’|EN) = sum of values = 3
      w_{the|EN} = C(‘the’|EN) / C(EN)
    88. Word Counting
    Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
    Map: Mapper 1 ... Mapper M emit (‘the’|EN, 1), (‘fox’|EN, 1), ..., (‘montañas’|ES, 1)
    Reduce: the Reducer tallies counts and updates w → Model
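The word-counting MapReduce above can be sketched in-process (the real system shards data across machines; here the mapper and reducer are plain functions and the shards are lists):

```python
from collections import defaultdict

def mapper(doc, label):
    # Emit ('word|label', 1) for every word in the document.
    for word in doc.lower().split():
        yield (word + '|' + label, 1)

def reducer(pairs):
    # Sum the values for each key, i.e. compute C(word|label).
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

shards = [("the quick brown fox", "EN"), ("the lazy dog", "EN")]
pairs = [kv for doc, label in shards for kv in mapper(doc, label)]
counts = reducer(pairs)
print(counts['the|EN'])  # -> 2
```

Dividing these counts by the per-language totals gives the Naive Bayes estimates w_{word|label} from the previous slide.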
    95. Parallelize Optimization
    • Maximum Entropy Classifiers:
      arg min_w − Π_{i=1}^{N} exp(y_i Σ_{p=1}^{P} w_p x_p^i) / (1 + exp(Σ_{p=1}^{P} w_p x_p^i))
    • Good: J(w) is concave
    • Bad: no closed-form solution like NB
    • Ugly: large N
    101. Gradient Descent
    • w is initialized as zero
    • for t in 1 to T
      • calculate the gradient ∇J(w)
      • w^{t+1} ← w^t − η ∇J(w^t)
    • ∇J(w) = Σ_{i=1}^{N} ∇P(w, x_i, y_i)
    http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture7.pdf
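The loop above can be sketched as batch gradient descent on logistic log-loss, where the gradient is a sum of per-example terms (a minimal numpy sketch, not the talk's implementation; T and η are illustrative):

```python
import numpy as np

def train(X, y, T=100, eta=0.1):
    w = np.zeros(X.shape[1])              # w is initialized as zero
    for _ in range(T):                    # for t in 1 to T
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y)              # gradient = sum of per-example terms
        w = w - eta * grad                # w_{t+1} <- w_t - eta * grad J(w_t)
    return w

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = train(X, y)
```

Because each iteration sums over all N examples, a single machine pays O(PN) per step; that per-step sum is exactly what the next slides distribute.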
    103. Distribute Gradient
    • w is initialized as zero
    • for t in 1 to T
      • calculate gradients in parallel
      • w^{t+1} ← w^t − η ∇J(w^t)
    • Training CPU: O(TPN) to O(TPN / M)
    109. Distribute Gradient
    Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
    Map: Machine 1 ... Machine M each emit (dummy key, partial gradient sum)
    Reduce: sum the partial gradients and update w
    Repeat Map/Reduce until convergence → Model
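One Map/Reduce round of the scheme above can be sketched as: each "machine" computes a partial gradient sum over its shard, and the reducer sums the partials and updates w (a single-process numpy sketch; shard contents are illustrative):

```python
import numpy as np

def partial_gradient(w, shard_X, shard_y):
    # Map phase on one shard: partial sum of per-example gradients,
    # emitted under a dummy key so all partials reach one reducer.
    p = 1.0 / (1.0 + np.exp(-shard_X @ w))
    return shard_X.T @ (p - shard_y)

def distributed_step(w, shards, eta=0.1):
    grads = [partial_gradient(w, X, y) for X, y in shards]  # map
    total = np.sum(grads, axis=0)                           # reduce: sum partials
    return w - eta * total                                  # update w

shards = [
    (np.array([[1.0, 0.0]]), np.array([0.0])),
    (np.array([[0.0, 1.0]]), np.array([1.0])),
]
w = distributed_step(np.zeros(2), shards)
```

Since the gradient is a sum over examples, the distributed update is exactly equal to the single-machine one; only the per-iteration wall-clock cost drops by roughly M.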
    110. 110. Scaling Up• Why big data?• Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
    111. Parallelize Subroutines
    • Support Vector Machines:
      arg min_{w,b,ζ} (1/2)‖w‖² + C Σ_{i=1}^{n} ζ_i
      s.t. 1 − y_i(w · φ(x_i) + b) ≤ ζ_i, ζ_i ≥ 0
    • Solve the dual problem:
      arg min_α (1/2) αᵀQα − αᵀ1
      s.t. 0 ≤ α ≤ C, yᵀα = 0
    112. The computational cost of the primal-dual interior point method is O(n³) in time and O(n²) in memory.
    http://www.flickr.com/photos/sea-turtle/198445204/
    116. Parallel SVM [Chang et al., 2007]
    • Parallel, row-wise incomplete Cholesky factorization (ICF) for Q
    • Parallel interior point method
      • Time: O(n³) becomes O(n²/M)
      • Memory: O(n²) becomes O(n√N/M)
    • Parallel Support Vector Machines (psvm) http://code.google.com/p/psvm/
      • Implemented in MPI
    117. Parallel ICF
    • Distribute Q by row into M machines:
      Machine 1: rows 1, 2; Machine 2: rows 3, 4; Machine 3: rows 5, 6; ...
    • For each dimension n up to √N:
      • workers send their local pivots to the master
      • the master selects the largest of the local pivots and broadcasts the global pivot to the workers
    118. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
    123. Majority Vote
    Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
    Map: Machine 1 ... Machine M each train on their own shard → Model 1, Model 2, Model 3, ..., Model M
    124. Majority Vote
    • Train individual classifiers independently
    • Predict by taking a majority vote
    • Training CPU: O(TPN) to O(TPN / M)
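Prediction under this scheme can be sketched as follows (the per-shard "models" here are stubbed as hypothetical rule functions, purely for illustration):

```python
from collections import Counter

def majority_vote(models, x):
    """Return the label predicted by the most models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical per-shard models mapping a document to a language label.
models = [
    lambda x: 'EN' if 'the' in x else 'ES',
    lambda x: 'EN' if 'fox' in x else 'ES',
    lambda x: 'ES',
]
print(majority_vote(models, 'the quick brown fox'))  # -> 'EN'
```

Training is embarrassingly parallel since the M models never communicate, but every model must be consulted at prediction time.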
    130. Parameter Mixture [Mann et al., 2009]
    Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
    Map: Machine 1 ... Machine M emit (dummy key, w1), (dummy key, w2), ...
    Reduce: average the w’s → Model
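The reduce step above can be sketched directly: each machine trains a weight vector w_k on its own shard, and all of them reach a single reducer (via the dummy key) that averages them into one model (a numpy sketch with illustrative weight vectors):

```python
import numpy as np

def parameter_mixture(shard_weights):
    # Reducer: average the (dummy key, w_k) values into one weight vector.
    return np.mean(shard_weights, axis=0)

# Hypothetical weights trained independently on two shards.
w = parameter_mixture([np.array([1.0, 0.0]), np.array([0.0, 2.0])])
print(w)  # -> [0.5 1. ]
```

Unlike majority vote, only one averaged model is kept, so prediction cost does not grow with M.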
    131. Much less network usage than distributed gradient descent: O(MN) vs. O(MNT)
    http://www.flickr.com/photos/annamatic3000/127945652/
    137. Iterative Param Mixture [McDonald et al., 2010]
    Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
    Map: Machine 1 ... Machine M emit (dummy key, w1), (dummy key, w2), ...
    Reduce (after each epoch): average the w’s → Model
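Iterative parameter mixing can be sketched as: after every epoch the per-shard weights are averaged and broadcast back as the starting point for the next epoch. In this sketch `train_epoch` is a stand-in for one pass of any online learner (here, perceptron-style updates); the shards, learning rate, and epoch count are illustrative:

```python
import numpy as np

def train_epoch(w, X, y, eta=0.1):
    # One sequential pass of perceptron-style updates over a shard.
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:       # misclassified (labels are +1/-1)
            w = w + eta * yi * xi
    return w

def iterative_mixture(shards, epochs=5, dim=2):
    w = np.zeros(dim)
    for _ in range(epochs):
        ws = [train_epoch(w.copy(), X, y) for X, y in shards]  # map
        w = np.mean(ws, axis=0)                                # reduce: average
    return w                                                   # broadcast to next epoch

shards = [
    (np.array([[1.0, 0.0]]), np.array([1.0])),
    (np.array([[0.0, 1.0]]), np.array([-1.0])),
]
w = iterative_mixture(shards)
```

Averaging once per epoch rather than once per gradient step keeps the network cost closer to the one-shot mixture while letting the shards influence each other across epochs.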
    138. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
    139. Scalable http://www.flickr.com/photos/mr_t_in_dc/5469563053
    140. Parallel http://www.flickr.com/photos/aloshbennett/3209564747/
    141. Accuracy http://www.flickr.com/photos/wanderlinse/4367261825/
    143. Binary Classification http://www.flickr.com/photos/brenderous/4532934181/
    144. Automatic Feature Discovery http://www.flickr.com/photos/mararie/2340572508/
    145. Fast Response http://www.flickr.com/photos/prunejuice/3687192643/
    146. Memory is the new hard disk. http://www.flickr.com/photos/jepoirrier/840415676/
    147. Algorithm + Infrastructure http://www.flickr.com/photos/neubie/854242030/
    148. Design for Multicores http://www.flickr.com/photos/geektechnique/2344029370/
    149. Combiner
    150. Multi-shard Combiner [Chandra et al., 2010]
    151. Machine Learning on Big Data
    155. Parallelize ML Algorithms
    • Embarrassingly parallel
    • Parallelize sub-routines
    • Distributed learning
    159. Parallel · Accuracy · Fast Response
    162. Google APIs
    • Prediction API
      • machine learning service on the cloud
      • http://code.google.com/apis/predict
    • BigQuery
      • interactive analysis of massive data on the cloud
      • http://code.google.com/apis/bigquery