
- 1. Machine Learning and Apache Mahout. Varad Meru, Software Development Engineer, Orzota, Inc. about.me/vrdmr © Varad Meru, 2013
- 2. Who Am I. Orzota, Inc.: Making BigData Easy; designing a cloud-based platform for ETL and analytics. Past work experience: Persistent Systems Ltd., recommendation engines and user behavior analytics. Areas of interest: machine learning, distributed systems, recommendation engines.
- 3. Outline: Introduction; Machine Learning; Apache Mahout; Introduction and History; Types of Learning Algorithms; Applications; What’s New; History; Architecture; Applications and Examples; Conclusion.
- 4. Machine Learning: Rise of the Machine-Era
- 5. Introduction. “Machine Learning is programming computers to optimize a performance criterion using example data or past experience.” Term coined by Arthur Samuel: “Field of study that gives computers the ability to learn without being explicitly programmed.” Branch of Artificial Intelligence and Statistics. Focuses on prediction based on known properties. Used as a sub-process in Data Mining; Data Mining focuses on discovering new, unknown properties.
- 6. Learning Algorithms. Supervised Learning: labelled input data; creating classifiers to predict unseen inputs. Unsupervised Learning: unlabelled input data; creating a function to predict the relation and output. Semi-Supervised Learning: combines supervised and unsupervised learning methodology. Reinforcement Learning: reward-punishment based agent.
- 7. Supervised Learning. Introduction: learn from the data; data is already labelled; expert, crowd-sourced or case-based labelling of data. Applications: handwriting recognition; spam detection; information retrieval; personalisation based on ranks; speech recognition.
- 8. Supervised Learning Algorithms: Decision Trees; k-Nearest Neighbours; Naive Bayes; Logistic Regression; Perceptron and Multi-layer Perceptrons; Neural Networks; SVM and Kernel estimation.
- 9. Supervised Learning Example: Naive Bayes Classifier. President Obama’s speech’s word map.
- 10. Supervised Learning Example: Naive Bayes Classifier. A spam document’s word map.
- 11. Supervised Learning Example: Naive Bayes Classifier. Running a test on the classifier: “Order a trial Adobe chicken daily EABList new summer savings, welcome!” → Classifier → Spam Bin
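
The Naive Bayes example on slides 9 to 11 can be illustrated with a minimal sketch. It assumes a multinomial model with add-one (Laplace) smoothing; the four training sentences and the `spam` / `not_spam` labels are made up for illustration and are not taken from the slides. Only the test message comes from slide 11. This is plain Python, not Mahout's implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(labelled_docs):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing avoids log(0) for words unseen under this label.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data, for illustration only.
training_docs = [
    ("Order a free trial today and enjoy new summer savings", "spam"),
    ("Daily savings on Adobe software, order now", "spam"),
    ("We will rebuild the economy and create new jobs for the nation", "not_spam"),
    ("The people of this nation deserve affordable health care", "not_spam"),
]
model = train(training_docs)

# Test message from slide 11.
message = "Order a trial Adobe chicken daily EABList new summer savings, welcome!"
print(classify(message, model))
```

With this toy training set the slide's message lands in the spam class, because most of its known words occur only in the spam examples.
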
- 12. Unsupervised Learning. Introduction: finding hidden structure in data; unlabelled data; SMEs are needed in post-processing to verify, validate and use the output; used in exploratory analysis rather than predictive analytics. Applications: pattern recognition; groupings based on a distance measure (groups of people, objects, ...).
- 13. Unsupervised Learning Algorithms: Clustering (k-Means, MinHash, Hierarchical Clustering); Hidden Markov Models; Feature Extraction methods; Self-organizing Maps (Neural Nets).
- 14. Unsupervised Learning Example: K-Means. Source: http://apandre.wordpress.com/visible-data/cluster-analysis/
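
Slide 14 shows K-Means only as a figure. A minimal sketch of the usual algorithm (Lloyd's iteration) might look like the following. The sample points are invented, and seeding the centroids with the first `k` points is a simplifying assumption made so the run is deterministic; real implementations typically use random or smarter seeding. `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, k, iterations=10):
    """Cluster points by alternating assignment and centroid update steps."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: naive seeding
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two visually separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```
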
- 15. Learning Problem: the Cat and Dog Problem. Humans can easily tell which is a cat and which is a dog, but how can a computer do that? Some attempts used clustering mechanisms to solve it: co-occurrence clustering, deep learning.
- 16. Apache Mahout: Scalable Machine Learning Library
- 17. History and Etymology. Inspired by “MapReduce for Machine Learning on Multicore”, Ng et al. Written in Java. Apache License. Founders: Mahout: Isabel Drost, Grant Ingersoll, Karl Wettin; Taste: Sean Owen. Mahout: keeper/driver of elephants. Current release: 0.8 (stable).
- 18. Need: BigData; ever-growing data; yesterday’s methods to process tomorrow’s data; cheap storage; scalable from the ground up; should be built on top of any existing distributed systems framework; should contain distributed versions of ML algorithms. Size, classification and tools:
  - Lines (Sample Data): Analysis and Visualisation: Whiteboard, Bash, ...
  - KBs – low MBs (Prototype Data): Analysis and Visualisation: Matlab, Octave, R, Processing, Bash, ...
  - MBs – low GBs (Online Data): Storage: MySQL (DBs), ...; Analysis: NumPy, SciPy, Pandas, Weka, ...; Visualisation: Flare, AmCharts, Raphael
  - GBs – TBs – PBs: Storage: HDFS, HBase, Cassandra, ...; Analysis: Hive, Giraph, Hama, Mahout
- 19. Mahout Modules (diagram): Applications; Evolutionary Algorithms; Classification; Clustering; Recommenders; Regression; FPM; Dimension Reduction; Utilities (Lucene/Vectorizer); Math (Vectors/Matrices/SVD); Collections (Primitives); Hadoop.
- 20. Recommender Systems
- 21. Recommender Systems. Introduction. Types of Recommender Systems: Content-Based Recommendations; Collaborative Filtering Recommendations (User-User Recommendations, Item-Item Recommendations); Dimensionality Reduction (SVD) Recommendations. Applications: products you would like to buy; people you might want to connect with; potential life-partners; recommending songs you might like; ...
- 22. Recommender Systems: Collaborative Filtering in Action. Assuming people have seen at least one movie. Cold start? (1: seen, 0: not seen)
- 23. Collaborative Filtering in Action: Tanimoto Coefficient. $T(a, b) = \frac{N_C}{N_A + N_B - N_C}$, where $N_A$ is the number of customers who bought A, $N_B$ is the number of customers who bought B, and $N_C$ is the number of customers who bought both A and B.
- 24. Collaborative Filtering in Action: Cosine Coefficient. $C(a, b) = \frac{N_C}{\sqrt{N_A \cdot N_B}}$, where $N_A$ is the number of customers who bought A, $N_B$ is the number of customers who bought B, and $N_C$ is the number of customers who bought both A and B.
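
Both coefficients from slides 23 and 24 depend only on the three counts N_A, N_B and N_C. The sketch below takes the intended definitions to be the standard forms for binary purchase data, T(a, b) = N_C / (N_A + N_B − N_C) and C(a, b) = N_C / √(N_A · N_B). The `baskets` data and all function names are hypothetical.

```python
import math

def co_occurrence_counts(baskets, item_a, item_b):
    """Return (N_A, N_B, N_C) for two items over all customers' baskets."""
    n_a = sum(1 for items in baskets.values() if item_a in items)
    n_b = sum(1 for items in baskets.values() if item_b in items)
    n_c = sum(1 for items in baskets.values() if item_a in items and item_b in items)
    return n_a, n_b, n_c

def tanimoto(n_a, n_b, n_c):
    """Customers who bought both, divided by customers who bought either."""
    union = n_a + n_b - n_c
    return n_c / union if union else 0.0

def cosine(n_a, n_b, n_c):
    """Cosine of the angle between the two binary purchase vectors."""
    denominator = math.sqrt(n_a * n_b)
    return n_c / denominator if denominator else 0.0

# Hypothetical purchase data: customer -> set of items bought.
baskets = {
    "u1": {"A", "B"},
    "u2": {"A"},
    "u3": {"A", "B"},
    "u4": {"B"},
    "u5": {"A"},
}
n_a, n_b, n_c = co_occurrence_counts(baskets, "A", "B")  # 4, 3, 2
print(tanimoto(n_a, n_b, n_c))  # 2 / (4 + 3 - 2) = 0.4
print(cosine(n_a, n_b, n_c))    # 2 / sqrt(12), roughly 0.577
```
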
- 25. Apache Mahout Recommender System Architecture. Two modes: stand-alone, non-distributed (“Taste”); scalable, distributed algorithmic version for collaborative filtering. Top-level packages: Data Model; User Similarity; Item Similarity; User Neighbourhood; Recommender.
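
To show how the components named on slide 25 fit together, here is a small user-based collaborative filtering sketch. The function names only mirror the package names on the slide; they are hypothetical and are not Mahout's Java API. The sketch assumes binary "seen" data as on slide 22, Tanimoto similarity between users, and a fixed-size neighbourhood. The movie data is invented.

```python
def user_similarity(data_model, user_a, user_b):
    """Tanimoto similarity between the item sets of two users."""
    a, b = data_model[user_a], data_model[user_b]
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def user_neighbourhood(data_model, user, n):
    """The n users most similar to `user` (ties broken by name)."""
    others = [u for u in data_model if u != user]
    others.sort(key=lambda u: (-user_similarity(data_model, user, u), u))
    return others[:n]

def recommend(data_model, user, n_neighbours=2, how_many=3):
    """Rank unseen items by the summed similarity of neighbours who saw them."""
    scores = {}
    for neighbour in user_neighbourhood(data_model, user, n_neighbours):
        weight = user_similarity(data_model, user, neighbour)
        for item in data_model[neighbour] - data_model[user]:
            scores[item] = scores.get(item, 0.0) + weight
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:how_many]

# Hypothetical data model: user -> set of movies seen.
data_model = {
    "alice": {"Alien", "Blade Runner"},
    "bob": {"Alien", "Blade Runner", "Casablanca"},
    "carol": {"Alien", "Dune"},
    "dave": {"Casablanca", "Emma"},
}
print(user_neighbourhood(data_model, "alice", 2))
print(recommend(data_model, "alice"))
```

A user with no overlap with anyone gets zero-weight neighbours, which is one face of the cold-start question raised on slide 22.
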
- 26. Naive Bayes Classifier: “Order a trial Adobe chicken daily EABList new summer savings, welcome!” → Classifier
- 27. Naive Bayes Classifier. Naive Bayes is a fairly complex process in Mahout: training the classifier requires four separate Hadoop jobs. Training: calculate per-document statistics; normalize across categories; read the features; calculate the normalizing factor of each label. Testing: classification (a fifth job, explicitly invoked).
- 28. K-Means Clustering: Iterations
- 29. K-Means Clustering: MapReduce Version
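
Slide 29 only names the MapReduce version. One common way to express a single K-Means iteration in map/reduce terms is sketched below: the mapper emits each point keyed by its nearest centroid, and the reducer averages the points that share a key. This is an in-memory illustration under that assumption, with a dictionary standing in for the shuffle phase; it is not Mahout's Hadoop code, and all names are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import defaultdict

def map_point(point, centroids):
    """Mapper: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return nearest, point

def reduce_cluster(points):
    """Reducer: the new centroid is the mean of the points assigned to one key."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans_iteration(points, centroids):
    """Run one map, shuffle and reduce round and return the updated centroids."""
    grouped = defaultdict(list)  # stands in for the shuffle/sort phase
    for point in points:
        key, value = map_point(point, centroids)
        grouped[key].append(value)
    return [
        reduce_cluster(grouped[i]) if i in grouped else centroids[i]
        for i in range(len(centroids))
    ]

# Hypothetical data and starting centroids.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
print(kmeans_iteration(points, centroids))  # [(0.0, 1.0), (10.0, 11.0)]
```

A driver would call `kmeans_iteration` repeatedly, feeding each result back in as the next set of centroids until they stop moving.
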
- 30. Summary. Machine Learning: learning algorithms; varied applications. Mahout: scaling to giga/tera/peta scale; free and open source.
- 31. More Info. 1. “Scalable Similarity-Based Neighborhood Methods with MapReduce” by Sebastian Schelter, Christoph Boden and Volker Markl, RecSys 2012. 2. “Case Study Evaluation of Mahout as a Recommender Platform” by Carlos E. Seminario and David C. Wilson, Workshop on Recommendation Utility Evaluation: Beyond RMSE (RUE 2012). 3. http://mahout.apache.org/ (Apache Mahout project page). 4. http://www.ibm.com/developerworks/java/library/j-mahout/ (Introducing Apache Mahout). 5. [VIDEO] “Collaborative filtering at scale” by Sean Owen. 6. [BOOK] “Mahout in Action” by Owen et al., Manning Pub.
- 32. Questions?
- 33. Thank You. Go BigData!!!
