Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

3,116 views

Published on

No Downloads

Total views

3,116

On SlideShare

0

From Embeds

0

Number of Embeds

57

Shares

0

Downloads

50

Comments

0

Likes

5

No embeds

No notes for slide

- 1. Bringing Algebraic Semantics to Apache Mahout Sebastian Schelter Infolunch @ Hasso Plattner Institut, Potsdam 05/07/2014
- 2. Mahout: Past & Future
- 3. Apache Mahout: History • library for scalable machine learning (ML) • started six years ago as ML on MapReduce • focus on popular ML problems and algorithms – Collaborative Filtering • (Item-based, User-based, Matrix Factorization) – Classification • (Naive Bayes, Logistic Regression, Random Forest) – Clustering • (k-Means, Streaming k-Means) – Dimensionality Reduction • (Lanczos, Stochastic SVD) + in-memory math library (fork of Colt) • large userbase (e.g. Adobe, AOL, Accenture, Foursquare, Mendeley, Researchgate, Twitter)
- 4. Apache Mahout: Problems • Organisational – Onerous to provide support and documentation for scalable ML (need to convey technical and theoretical background at the same time) – partly failed to compensate for developers that left the project – barrier for contributions too low • Technical – MapReduce not well suited for ML • slow execution, especially for iterations • constrained programming model makes code hard to write, read and adjust – different quality of implementations – lack of unified preprocessing machinery
- 5. Apache Mahout: Future • Narrowing of focus – removed a large number of rarely used or barely maintained algorithm implementations • Abandonment of MapReduce – will reject new MapReduce implementations – widely used „legacy“ implementations will be maintained • „Reboot“ – usage of modern parallel processing systems which offer richer progamming model & more efficient execution → Apache Spark, possibly Stratosphere in the future – DSL for linear algebraic operations as foundation for new implemenations – automatic optimization and parallelization of programs written in this DSL
- 6. Scala & Spark-Bindings
- 7. Requirements for an ideal ML environment 1. R/Matlab-like semantics – type system that covers (a) linear algebra, (b) statistics and (c) data frames 2. Modern programming language qualities – functional programming – object oriented programming – sensible byte code Performance – scriptable and interactive 3. Scalability – automatic distribution and parallelization with sensible performance 4. Collection of off-the-shelf building blocks and algorithms 5. Visualization
- 8. Requirements for an ideal ML environment 1. R/Matlab-like semantics – type system that covers (a) linear algebra, (b) statistics and (c) data frames 2. Modern programming language qualities – functional programming – object oriented programming – sensible byte code Performance – scriptable and Interactive 3. Scalability – automatic distribution and parallelization with sensible performance 4. Collection of off-the-shelf building blocks and algorithms 5. Visualization → Mahout‘s new Scala & Spark-bindings address requirements 1(a), 2, 3 and 4
- 9. Scala & Spark Bindings • Scala as programming/scripting environment • R-like DSL : val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q) • Algebraic expression optimizer for distributed linear algebra – provides a translation layer to distributed engines – currently supports Apache Spark only – might support Stratosphere in the future q T q TTT ssCCBBG
- 10. Data Types • Scalar real values • In-memory vectors – dense – 2 types of sparse • In-memory matrices – sparse and dense – a number of specialized matrices • Distributed Row Matrices (DRM) – huge matrix, partitioned by rows – lives in the main memory of the cluster – provides small set of parallelized operations – lazily evaluated operation execution val x = 2.367 val v = dvec(1, 0, 5) val w = svec((0 -> 1)::(2 -> 5)::Nil) val A = dense((1, 0, 5), (2, 1, 4), (4, 3, 1)) val drmA = drmFromHDFS(...)
- 11. Features (1) • Matrix, vector, scalar operators: in-memory, out-of-core • Slicing operators • Assignments (in-memory only) • Vector-specific • Summaries drmA %*% drmB A %*% x A.t %*% drmB A * B A(5 until 20, 3 until 40) A(5, ::); A(5, 5) x(a to b) A(5, ::) := x A *= B A -=: B; 1 /:= x x dot y; x cross y A.nrow; x.length; A.colSums; B.rowMeans x.sum; A.norm
- 12. Features (2) • solving linear systems • in-memory decompositions • out-of-core decompositions • caching of DRMs val x = solve(A, b) val (inMemQ, inMemR) = qr(inMemM) val ch = chol(inMemM) val (inMemV, d) = eigen(inMemM) val (inMemU, inMemV, s) = svd(inMemM) val (drmQ, inMemR) = thinQR(drmA) val (drmU, drmV, s) = dssvd(drmA, k = 50, q = 1) val drmA_cached = drmA.checkpoint() drmA_cached.uncache()
- 13. Example
- 14. Cereals Name protein fat carbo sugars rating Apple Cinnamon Cheerios 2 2 10.5 10 29.509541 Cap‘n‘Crunch 1 2 12 12 18.042851 Cocoa Puffs 1 1 12 13 22.736446 Froot Loops 2 1 11 13 32.207582 Honey Graham Ohs 1 2 12 11 21.871292 Wheaties Honey Gold 2 1 16 8 36.187559 Cheerios 6 2 17 1 50.764999 Clusters 3 2 13 7 40.400208 Great Grains Pecan 3 3 13 4 45.811716 http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html
- 15. Linear Regression • Assumption: target variable y generated by linear combination of feature matrix X with parameter vector β, plus noise ε • Goal: find estimate of the parameter vector β that explains the data well • Cereals example X = weights of ingredients y = customer rating Xy
- 16. Data Ingestion • Usually: load dataset as DRM from a distributed filesystem: val drmData = drmFromHdfs(...) • ‚Mimick‘ a large dataset for our example: val drmData = drmParallelize(dense( (2, 2, 10.5, 10, 29.509541), // Apple Cinnamon Cheerios (1, 2, 12, 12, 18.042851), // Cap'n'Crunch (1, 1, 12, 13, 22.736446), // Cocoa Puffs (2, 1, 11, 13, 32.207582), // Froot Loops (1, 2, 12, 11, 21.871292), // Honey Graham Ohs (2, 1, 16, 8, 36.187559), // Wheaties Honey Gold (6, 2, 17, 1, 50.764999), // Cheerios (3, 2, 13, 7, 40.400208), // Clusters (3, 3, 13, 4, 45.811716)), // Great Grains Pecan numPartitions = 2)
- 17. Data Preparation • Cereals example: target variable y is customer rating, weights of ingredients are features X • extract X as DRM by slicing, fetch y as in-core vector val drmX = drmData(::, 0 until 4) val y = drmData.collect(::, 4) 8117164541333 4002084071323 7649995011726 1875593681612 87129221111221 20758232131112 73644622131211 04285118121221 509541291051022 . . . . . . . . .. drmX y
- 18. Estimating β • Ordinary Least Squares: minimizes the sum of residual squares between true target variable and prediction of target variable • Closed-form expression for estimation of ß as • Computing XTX and XTy is as simple as typing the formulas: val drmXtX = drmX.t %*% drmX val drmXty = drmX %*% y yXXX TT 1 )(ˆ
- 19. Estimating β • Solve the following linear system to get least-squares estimate of ß • Fetch XTX andXTy onto the driver and use an in-core solver – assumes XTX fits into memory – uses analogon to R’s solve() function val XtX = drmXtX.collect val Xty = drmXty.collect(::, 0) val betaHat = solve(XtX, Xty) yXXX TT ˆ
- 20. Estimating β • Solve the following linear system to get least-squares estimate of ß • Fetch XTX andXTy onto the driver and use an in-memory solver – assumes XTX fits into memory – uses analogon to R’s solve() function val XtX = drmXtX.collect val Xty = drmXty.collect(::, 0) val betaHat = solve(XtX, Xty) yXXX TT ˆ → We have implemented distributed linear regression!
- 21. Goodness of fit • Prediction of the target variable is simple matrix-vector multiplication • Check L2 norm of the difference between true target variable and our prediction val yHat = (drmX %*% betaHat).collect(::, 0) (y - yHat).norm(2) ˆˆ Xy
- 22. Adding a bias term • Bias term left out so far – constant factor added to the model, “shifts the line vertically” • Common trick is to add a column of ones to the feature matrix – bias term will be learned automatically 141333 171323 111726 181612 1111221 1131112 1131211 1121221 11051022 . 41333 71323 11726 81612 111221 131112 131211 121221 105.1022
- 23. Adding a bias term • How do we add a new column to a DRM? → mapBlock() allows custom modifications to the matrix val drmXwithBiasColumn = drmX.mapBlock(ncol = drmX.ncol + 1) { case(keys, block) => // create a new block with an additional column val blockWithBiasCol = block.like(block.nrow, block.ncol+1) // copy data from current block into the new block blockWithBiasCol(::, 0 until block.ncol) := block // last column consists of ones blockWithBiasColumn(::, block.ncol) := 1 keys -> blockWithBiasColumn }
- 24. Under the covers
- 25. Underlying systems • currently: prototype on Apache Spark – fast and expressive cluster computing system – general computation graphs, in-memory primitives, rich API, interactive shell • future: add Stratosphere – research system developed by TU Berlin, HU Berlin, HPI – accepted into Apache Incubator recently – functionality similar to Apache Spark, adds data flow optimization and efficient out-of-core execution
- 26. Optimization • Execution is defered, user composes logical operators • Computational actions implicitly trigger optimization (= selection of physical plan) and execution • Optimization factors: size of operands, orientation of operands, partitioning, sharing of computational paths • e. g.: matrix multiplication: – 5 physical operators for drmA %*% drmB – 2 operators for drmA %*% inMemA – 1 operator for drm A %*% x – 1 operator for x %*% drmA val drmA = drmB.t %*% drmC drmA.writeDrm(path); val inMemA = (drmB.t %*% drmB).collect
- 27. Optimization Example (1) • Computation of XTX in the linear regression example • Logical optimization: logical plan gets rewritten to use a special logical operator for Transpose-Times-Self multiplication via a rule in the optimizer val drmXtX = drmX.t %*% drmX val XtX = drmXtX.collect OpAt OpAB drmX drmX drmXtX OpAtA drmX drmXtX
- 28. Optimization Example (2) • Mahout computes XTX via row-outer-product formulation of matrix multiplication that can be efficiently executed in a single pass over the row-partitioned X • optimizer chooses between to physical operators for OpAtA – standard implementation: • Map operator builds partial outer products from individual blocks based on an estimate of a good partitioning • Reduce operator adds them up using a Reduce operator – AtA_slim for tall but skinny matrices when an intermediate symmetric matrix requiring (X.ncol * X.ncol) / 2 entries fits into memory m i T ii T xxXX 0
- 29. Thank you. Questions? Credits: Design & implementation of the Scala & Spark Bindings and parts of this slideset by Dmitriy Lyubimov dlyubimov@apache.org

No public clipboards found for this slide

Be the first to comment