Machine Learning on Big Data

Max Lin gave a guest lecture about machine learning on big data in the Harvard CS 264 class on March 29, 2011.

Machine Learning on Big Data

  1. 1. Machine Learning on Big Data Lessons Learned from Google Projects Max Lin Software Engineer | Google Research Massively Parallel Computing | Harvard CS 264 Guest Lecture | March 29th, 2011
  2. 2. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  4. 4. “Machine Learning is a study of computer algorithms that improve automatically through experience.”
  23. 23. Training: Input X → Output Y, used to fit the Model f(x)
       • “The quick brown fox jumped over the lazy dog.” → English
       • “To err is human, but to really foul things up you need a computer.” → English
       • “No hay mal que por bien no venga.” → Spanish
       • “La tercera es la vencida.” → Spanish
      Testing: f(x’) = y’
       • “To be or not to be -- that is the question” → ?
       • “La fe mueve montañas.” → ?
  51. 51. Linear Classifier
       • Input: “The quick brown fox jumped over the lazy dog.”
       • Features: [ ‘a’, ..., ‘aardvark’, ..., ‘dog’, ..., ‘the’, ..., ‘montañas’, ... ]
       • x = [ 0, ..., 0, ..., 1, ..., 1, ..., 0, ... ]
       • w = [ 0.1, ..., 132, ..., 150, ..., 200, ..., -153, ... ]
       • $f(x) = w \cdot x = \sum_{p=1}^{P} w_p \ast x_p$
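
The linear classifier slide can be sketched in a few lines of Python. This is only an illustration: the five-word vocabulary and the weights are the visible entries from the slide, truncated to a toy size, and the helper names `featurize` and `score` are hypothetical. Reading a positive score as "English" is also an assumption; the slide does not state a decision rule.

```python
import re

# Toy vocabulary and weights taken from the visible entries on the slide.
# The real feature space has P entries; five are kept here for illustration.
VOCAB = ["a", "aardvark", "dog", "the", "montañas"]
WEIGHTS = [0.1, 132.0, 150.0, 200.0, -153.0]

def featurize(text, vocab=VOCAB):
    """Binary bag-of-words: x[p] is 1 if vocab[p] occurs in the text, else 0."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return [1 if word in tokens else 0 for word in vocab]

def score(x, w=WEIGHTS):
    """f(x) = w . x = sum over p of w_p * x_p."""
    return sum(wp * xp for wp, xp in zip(w, x))

x = featurize("The quick brown fox jumped over the lazy dog.")
print(x, score(x))
```

Only the entries for 'dog' and 'the' are set for this sentence, so the score is the sum of those two weights.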
  52. 52. Training Data • Input X: N rows (examples) by P columns (features) • Output Y: one label per row
  53. 53. Typical machine learning data at Google (mean / median) • N: 100 billion / 1 billion • P: 1 billion / 10 million http://www.flickr.com/photos/mr_t_in_dc/5469563053
  54. 54. Classifier Training • Training: Given {(x, y)} and f, minimize the following objective function $\arg\min_{w} \sum_{i=1}^{N} L(y_i, f(x_i; w)) + R(w)$
  55. 55. Use Newton’s method? $w^{t+1} \leftarrow w^{t} - H(w^{t})^{-1} \nabla J(w^{t})$ http://www.flickr.com/photos/visitfinland/5424369765/
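
The slide poses Newton's method as a question and does not spell out the answer. One likely obstacle is the size of the Hessian H(w), which has P × P entries. A rough estimate, assuming a dense Hessian stored at 8 bytes per entry (both assumptions, not stated in the deck) and the mean and median P from the "typical data" slide:

```latex
\begin{aligned}
P = 10^{9} &\;\Rightarrow\; P^{2} = 10^{18} \text{ entries in } H(w), \\
10^{18} \times 8 \text{ bytes} &= 8 \times 10^{18} \text{ bytes} \approx 8 \text{ exabytes}, \\
P = 10^{7} &\;\Rightarrow\; P^{2} = 10^{14} \text{ entries}, \quad
10^{14} \times 8 \text{ bytes} \approx 800 \text{ terabytes}.
\end{aligned}
```

Under these assumptions, even the median case is far too large to store and invert on one machine, which is consistent with the deck moving on to first-order methods.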
  56. 56. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  62. 62. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
  69. 69. Subsampling • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Reduce N: a single Machine trains the Model on Shard 1 only
  70. 70. Why not Small Data? [Banko and Brill, 2001]
  71. 71. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
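  72. 72. Parallelize Estimates • Naive Bayes Classifier $\arg\min_{w} -\prod_{i=1}^{N} \prod_{p=1}^{P} P(x_p^i \mid y_i; w)\, P(y_i; w)$ • Maximum Likelihood Estimates $w_{\text{the}\mid\text{EN}} = \frac{\sum_{i=1}^{N} 1_{\text{EN},\text{the}}(x^i)}{\sum_{i=1}^{N} 1_{\text{EN}}(x^i)}$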
  82. 82. Word Counting • Input X: “The quick brown fox ...”, Y: EN • Map emits (‘the|EN’, 1), (‘quick|EN’, 1), (‘brown|EN’, 1) • Reduce receives [ (‘the|EN’, 1), (‘the|EN’, 1), (‘the|EN’, 1) ] and computes C(‘the’|EN) = SUM of values = 3 • $w_{\text{the}\mid\text{EN}} = \frac{C(\text{the}\mid\text{EN})}{C(\text{EN})}$
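
The word-counting job can be sketched in plain Python, with a `shuffle` helper standing in for the grouping a MapReduce framework performs between the two phases. All function names and the toy corpus are hypothetical. Two assumptions are made so the result matches the indicator-based maximum likelihood estimate earlier in the deck: each document emits a word at most once, and C(EN) is taken to be the number of EN documents (the slide does not define it).

```python
from collections import defaultdict

def map_fn(text, label):
    """Map: emit ('word|label', 1) once per distinct word, plus (label, 1) per document."""
    words = {token.strip(".,").lower() for token in text.split()}
    for word in sorted(words):
        yield (f"{word}|{label}", 1)
    yield (label, 1)  # assumption: C(label) counts documents with that label

def shuffle(pairs):
    """Group values by key, as the framework would between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Reduce: C(key) = SUM of values."""
    return key, sum(values)

docs = [  # hypothetical toy corpus
    ("The quick brown fox", "EN"),
    ("The lazy dog", "EN"),
    ("Over the moon", "EN"),
    ("Quick brown fox", "EN"),
    ("La fe mueve montañas", "ES"),
]
pairs = [pair for text, label in docs for pair in map_fn(text, label)]
counts = dict(reduce_fn(key, values) for key, values in shuffle(pairs).items())
w_the_en = counts["the|EN"] / counts["EN"]  # w_{the|EN} = C(the|EN) / C(EN)
print(counts["the|EN"], counts["EN"], w_the_en)
```

Because the estimate is a ratio of counts, each mapper can work on its own shard without seeing any other shard, which is what makes this case embarrassingly parallel.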
  88. 88. Word Counting • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Mapper 1, Mapper 2, Mapper 3, ..., Mapper M each process one shard and emit pairs such as (‘the’ | EN, 1), (‘fox’ | EN, 1), ..., (‘montañas’ | ES, 1) • Reduce: the Reducer tallies counts and updates w • Output: Model
  95. 95. Parallelize Optimization • Maximum Entropy Classifiers $\arg\min_{w} \prod_{i=1}^{N} \frac{\exp\left(\sum_{p=1}^{P} w_p \ast x_p^i\right)^{y_i}}{1 + \exp\left(\sum_{p=1}^{P} w_p \ast x_p^i\right)}$ • Good: J(w) is concave • Bad: no closed-form solution like NB • Ugly: Large N
  96. 96. Gradient Descent http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture7.pdf
  101. 101. Gradient Descent • w is initialized as zero • for t in 1 to T • Calculate gradients $\nabla J(w)$ • $w_{t+1} \leftarrow w_{t} - \eta \nabla J(w)$, where $\nabla J(w) = \sum_{i=1}^{N} P(w, x_i, y_i)$
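
The loop on the gradient descent slide can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used at Google. It assumes J(w) is the negative log-likelihood of a binary logistic (maximum entropy) model with labels in {0, 1}, so that the minus-sign update on the slide decreases J; the function names, the step size `eta`, and the toy data are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def example_gradient(w, x, y):
    """Per-example term of the gradient: (sigmoid(w . x) - y) * x, for y in {0, 1}."""
    error = sigmoid(sum(wp * xp for wp, xp in zip(w, x))) - y
    return [error * xp for xp in x]

def gradient_descent(data, num_features, eta=0.1, T=2000):
    w = [0.0] * num_features                      # w is initialized as zero
    for _ in range(T):                            # for t in 1 to T
        grad = [0.0] * num_features
        for x, y in data:                         # sum over all N examples
            for p, g in enumerate(example_gradient(w, x, y)):
                grad[p] += g
        w = [wp - eta * gp for wp, gp in zip(w, grad)]   # w_{t+1} <- w_t - eta * grad
    return w

# Hypothetical toy data: each x is [bias, feature]; small feature values have label 0.
data = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 2.0], 1), ([1.0, 3.0], 1)]
w = gradient_descent(data, num_features=2)
print([round(sigmoid(sum(wp * xp for wp, xp in zip(w, x))), 3) for x, _ in data])
```

Each iteration touches every example and every feature, which is where the O(TPN) training cost quoted in the deck comes from.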
  103. 103. Distribute Gradient • w is initialized as zero • for t in 1 to T • Calculate gradients in parallel • $w_{t+1} \leftarrow w_{t} - \eta \nabla J(w)$ • Training CPU: O(TPN) to O(TPN / M)
  109. 109. Distribute Gradient • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each emit (dummy key, partial gradient sum) for their shard • Reduce: sum and update w • Repeat Map/Reduce until convergence • Output: Model
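
The Map/Reduce round for distributed gradient descent can be sketched as below. This is a single-process simulation under stated assumptions: the per-example gradient is that of a binary logistic model with labels in {0, 1} (the deck does not fix the model here), the list comprehension over shards stands in for M machines running in parallel, and all names are hypothetical.

```python
import math

def partial_gradient_sum(w, shard):
    """Map: one machine sums the per-example gradients over its own shard."""
    total = [0.0] * len(w)
    for x, y in shard:
        z = sum(wp * xp for wp, xp in zip(w, x))
        error = 1.0 / (1.0 + math.exp(-z)) - y
        for p, xp in enumerate(x):
            total[p] += error * xp
    return ("dummy key", total)

def reduce_and_update(w, emitted, eta):
    """Reduce: add up the partial sums, then take one gradient step on w."""
    grad = [sum(partial[p] for _, partial in emitted) for p in range(len(w))]
    return [wp - eta * gp for wp, gp in zip(w, grad)]

def train(shards, num_features, eta=0.1, T=2000):
    w = [0.0] * num_features
    for _ in range(T):  # one Map/Reduce round per iteration
        emitted = [partial_gradient_sum(w, shard) for shard in shards]  # parallel on a cluster
        w = reduce_and_update(w, emitted, eta)
    return w

# Hypothetical toy data split into M = 2 shards; each x is [bias, feature].
shards = [
    [([1.0, 0.0], 0), ([1.0, 1.0], 0)],
    [([1.0, 2.0], 1), ([1.0, 3.0], 1)],
]
w = train(shards, num_features=2)
print(w)
```

Because the gradient is a sum over examples, the sum of the partial sums equals the full-batch gradient, so the result matches single-machine gradient descent while the per-iteration work is spread across the shards.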
  110. 110. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
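  111. 111. Parallelize Subroutines • Support Vector Machines $\arg\min_{w,b,\zeta} \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \zeta_i$ s.t. $1 - y_i (w \cdot \phi(x_i) + b) \le \zeta_i$, $\zeta_i \ge 0$ • Solve the dual problem $\arg\min_{\alpha} \frac{1}{2}\alpha^{T} Q \alpha - \alpha^{T} 1$ s.t. $0 \le \alpha \le C$, $y^{T}\alpha = 0$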
  112. 112. The computational cost for the Primal-Dual Interior Point Method is O(n^3) in time and O(n^2) in memory http://www.flickr.com/photos/sea-turtle/198445204/
  116. 116. Parallel SVM [Chang et al, 2007] • Parallel, row-wise incomplete Cholesky Factorization for Q • Parallel interior point method • Time O(n^3) becomes O(n^2 / M) • Memory O(n^2) becomes O(n√N / M) • Parallel Support Vector Machines (psvm) http://code.google.com/p/psvm/ • Implemented in MPI
  117. 117. Parallel ICF • Distribute Q by row into M machines: Machine 1 holds row 1 and row 2, Machine 2 holds row 3 and row 4, Machine 3 holds row 5 and row 6, ... • For each dimension n < √N • Send local pivots to master • Master selects the largest local pivot and broadcasts the global pivot to workers
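
The pivot exchange on the Parallel ICF slide can be illustrated with a serial simulation of a pivoted incomplete Cholesky factorization, Q ≈ G Gᵀ. This is a sketch under assumptions, not the psvm MPI code: the function name `parallel_icf` is hypothetical, the "machines" are just contiguous row ranges processed in a loop, and the `rank` argument plays the role of the √N bound on the slide.

```python
import math

def parallel_icf(Q, rank, num_machines):
    """Serial simulation of row-distributed pivoted incomplete Cholesky: Q ~= G G^T."""
    n = len(Q)
    block = math.ceil(n / num_machines)
    # Machine m owns a contiguous block of rows, as in the slide's picture.
    owned = [range(m * block, min((m + 1) * block, n)) for m in range(num_machines)]
    G = [[0.0] * rank for _ in range(n)]
    diag = [Q[i][i] for i in range(n)]  # residual diagonal, one entry per row
    for k in range(rank):
        # Each worker sends its largest local pivot candidate to the master.
        local = [max(((diag[i], i) for i in rows), default=(0.0, -1)) for rows in owned]
        # The master selects the largest local pivot and broadcasts it.
        value, pivot = max(local)
        if value <= 1e-12:
            break  # residual is numerically zero; a lower rank already suffices
        root = math.sqrt(value)
        pivot_row = G[pivot][:k]  # broadcast together with the pivot index
        # Each worker updates only the rows it owns.
        for rows in owned:
            for i in rows:
                dot = sum(G[i][j] * pivot_row[j] for j in range(k))
                G[i][k] = (Q[i][pivot] - dot) / root
                diag[i] -= G[i][k] ** 2
    return G

# Hypothetical rank-2 kernel matrix Q = A A^T, distributed over 2 machines.
A = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0], [2.0, 1.0]]
Q = [[sum(a * b for a, b in zip(r, s)) for s in A] for r in A]
G = parallel_icf(Q, rank=2, num_machines=2)
print(G)
```

Each machine only ever stores its own rows of G, which has `rank` columns instead of n; that is the source of the memory reduction quoted on the Parallel SVM slide.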
  118. 118. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
  123. 123. Majority Vote • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train on one shard, producing Model 1, Model 2, Model 3, Model 4
  124. 124. Majority Vote • Train individual classifiers independently • Predict by taking majority votes • Training CPU: O(TPN) to O(TPN / M)
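
Prediction by majority vote can be sketched as below. The three lambda "models" are hypothetical stand-ins for classifiers trained independently on separate shards; a real system would plug in whatever per-shard learner it uses.

```python
from collections import Counter

def majority_vote(models, x):
    """Ask every per-shard model for a label and return the most common one."""
    votes = Counter(model(x) for model in models)
    # On a tie, Counter.most_common keeps the label that was seen first.
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for classifiers trained independently on separate shards.
models = [
    lambda text: "EN" if "the" in text.lower().split() else "ES",
    lambda text: "ES" if "la" in text.lower().split() else "EN",
    lambda text: "ES" if "ñ" in text else "EN",
]
print(majority_vote(models, "To be or not to be -- that is the question"))
print(majority_vote(models, "La fe mueve montañas."))
```

Training needs no communication between machines, but every model has to be kept and evaluated at prediction time.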
  130. 130. Parameter Mixture [Mann et al, 2009] • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train on one shard and emit (dummy key, w1), (dummy key, w2), ... • Reduce: average w • Output: Model
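
The one-shot parameter mixture can be sketched as a single Map/Reduce round. The names are hypothetical, `train_on_shard` is a placeholder for any learner that returns a weight vector, and a uniform average is assumed because the slide only says "Average w".

```python
def average_weights(emitted):
    """Reduce: uniform average of the weight vectors w_1 ... w_M."""
    vectors = [w for _, w in emitted]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def parameter_mixture(shards, train_on_shard):
    """Map: train one model per shard to completion. Reduce: average the weights once."""
    emitted = [("dummy key", train_on_shard(shard)) for shard in shards]
    return average_weights(emitted)

# Hypothetical weight vectors, as if emitted by two machines.
print(average_weights([("dummy key", [1.0, 2.0]), ("dummy key", [3.0, 6.0])]))
```

Unlike majority vote, the result is a single model, and each machine communicates with the reducer only once, at the end of training.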
  131. 131. Much less network usage than distributed gradient descent: O(MN) vs. O(MNT) http://www.flickr.com/photos/annamatic3000/127945652/
  137. 137. Iterative Param Mixture [McDonald et al., 2010] • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train on one shard and emit (dummy key, w1), (dummy key, w2), ... • Reduce: average w after each epoch • Output: Model
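
Iterative parameter mixing differs from the one-shot mixture in that the averaged w is sent back to every machine as the starting point for the next epoch. The sketch below assumes a plain binary perceptron with labels in {+1, -1} as the per-shard learner; that choice, the function names, and the toy data are assumptions made for illustration.

```python
def perceptron_epoch(w, shard):
    """One perceptron pass over a shard, starting from the shared w."""
    w = list(w)
    for x, y in shard:
        if y * sum(wp * xp for wp, xp in zip(w, x)) <= 0:   # mistake: update
            w = [wp + y * xp for wp, xp in zip(w, x)]
    return w

def iterative_parameter_mixture(shards, num_features, epochs=5):
    w = [0.0] * num_features
    for _ in range(epochs):
        # Map: every machine runs one epoch on its shard from the same w.
        emitted = [("dummy key", perceptron_epoch(w, shard)) for shard in shards]
        # Reduce: average w after each epoch and reuse it in the next round.
        w = [sum(column) / len(emitted) for column in zip(*(v for _, v in emitted))]
    return w

# Hypothetical toy data: x = [English-word count, Spanish-word count], y = +1 for EN.
shards = [
    [([1.0, 0.0], 1), ([0.0, 1.0], -1)],
    [([2.0, 0.0], 1), ([0.0, 2.0], -1)],
]
w = iterative_parameter_mixture(shards, num_features=2)
print(w)
```

The extra averaging rounds cost one exchange of w per epoch, which sits between the one-shot mixture and the per-iteration traffic of distributed gradient descent.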
  138. 138. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  139. 139. Scalable http://www.flickr.com/photos/mr_t_in_dc/5469563053
  140. 140. Parallel http://www.flickr.com/photos/aloshbennett/3209564747/
  141. 141. Accuracy http://www.flickr.com/photos/wanderlinse/4367261825/
  142. 142. http://www.flickr.com/photos/imagelink/4006753760/
  143. 143. Binary Classification http://www.flickr.com/photos/brenderous/4532934181/
  144. 144. Automatic Feature Discovery http://www.flickr.com/photos/mararie/2340572508/
  145. 145. Fast Response http://www.flickr.com/photos/prunejuice/3687192643/
  146. 146. Memory is the new hard disk. http://www.flickr.com/photos/jepoirrier/840415676/
  147. 147. Algorithm + Infrastructure http://www.flickr.com/photos/neubie/854242030/
  148. 148. Design for Multicores http://www.flickr.com/photos/geektechnique/2344029370/
  149. 149. Combiner
  150. 150. Multi-shard Combiner [Chandra et al., 2010]
  151. 151. Machine Learning on Big Data
  155. 155. Parallelize ML Algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
  159. 159. Parallel • Accuracy • Fast Response
  162. 162. Google APIs • Prediction API • machine learning service on the cloud • http://code.google.com/apis/predict • BigQuery • interactive analysis of massive data on the cloud • http://code.google.com/apis/bigquery