Ofer Vugman
 May 2012
Agenda and such…


   What is ML (Machine Learning)
   ML Common Use Cases
   Mahout Overview
   Algorithms in Mahout
   Mahout Commercial Use
   Mahout Summary
What is ML



       “Machine Learning is programming
      computers to optimize a performance
       criterion using example data or past
                    experience”


 “Introduction to Machine Learning” by E. Alpaydin
ML Common Use Cases


 Recommendation
ML Common Use Cases


 Classification
ML Common Use Cases


 Clustering
ML Common Libraries
Mahout Overview – What ?


A mahout is a person who keeps and drives
  an elephant
Mahout Overview – What ?


 A scalable machine learning library
Mahout Overview – What ?


 Began life in 2008 as a subproject of
  Apache’s Lucene project
 In 2010 Mahout became a top-level
  Apache project in its own right
 Implemented in Java
 Built upon Apache’s Hadoop (Look! An
  elephant!)
Mahout Overview – Why ?


 Many open source ML libraries:
   Lack community
   Lack documentation and examples
   Lack scalability
   Lack the Apache license
   Are research oriented
   Are not well tested
   Are not built over existing production quality
    libraries
Mahout Overview – Why ?


 Scalability
   Scalable to reasonably large datasets (core
    algorithms implemented in Map/Reduce,
    runnable on Hadoop)
   Scalable to support your business case
    (Apache License)
   Scalable community
Mahout Overview – Why ?


 Built over existing production quality
  libraries
Mahout Overview – Use Cases


 Mahout currently supports four main
  use cases:
  1. Recommendation
  2. Clustering
  3. Classification
  4. Frequent Itemset Mining
Mahout Overview - Technical


 System Requirements
     Linux (or Cygwin on Windows)
     Java 1.6.x or greater
     Maven 2.0.11 or greater to build the source
      code
     Hadoop 0.2 or greater*


* Not all algorithms are implemented to work on Hadoop clusters
Algorithms in Mahout


 We’ll focus on one example:
   Collaborative Filtering (Recommenders)



 There are many (many!) more; you
  can find them all at
  https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
Algorithms Examples –
Recommendation

 Help users find items they might like
  based on historical preferences




 Based on example by Sebastian Schelter in “Distributed Itembased
  Collaborative Filtering with Apache Mahout”
Algorithms Examples –
Recommendation




              Matrix   Alien   Inception
      Alice     5        1        4
      Bob       ?        2        5
      Peter     4        3        2
Algorithms Examples –
Recommendation

 Algorithm
   Neighborhood-based approach
   Works by finding similarly rated items in the
    user-item matrix, using a similarity measure
    (e.g. cosine, Pearson correlation, Tanimoto
    coefficient)
   Estimates a user's preference towards an
    item by looking at his/her preferences
    towards similar items
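
Two of the similarity measures named above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not Mahout's implementation: the function names `cosine` and `tanimoto` are hypothetical, cosine is assumed to operate on two equal-length rating vectors, and Tanimoto on the sets of users who rated each item.

```python
import math


def cosine(xs, ys):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0


def tanimoto(users_a, users_b):
    """Tanimoto (Jaccard) coefficient of two sets of users."""
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0


# Users who rated each item in the example matrix.
matrix_raters = {"Alice", "Peter"}
alien_raters = {"Alice", "Bob", "Peter"}
overlap = tanimoto(matrix_raters, alien_raters)
```

Tanimoto ignores the rating values and only looks at who rated what, which is why it is often used when preferences are boolean.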
Algorithms Examples –
Recommendation

 Prediction: Estimate Bob's preference
  towards “The Matrix”
  1. Look at all items that
        a) are similar to “The Matrix”
        b) have been rated by Bob
           => “Alien”, “Inception”
  2. Estimate the unknown preference with a
     weighted sum
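
The weighted sum in step 2 can be sketched as follows. The deck does not spell out the normalisation, so this sketch assumes one common form: each of Bob's ratings is weighted by the item's similarity to “The Matrix”, and the total is divided by the sum of the absolute similarities. The similarity values (-0.47 for Alien, 0.47 for Inception) are the ones reported on the deck's “Compute similarities” slide; the function name `predict` is hypothetical.

```python
def predict(similarities, ratings):
    """Weighted sum of a user's ratings, normalised by total absolute similarity."""
    num = sum(similarities[item] * rating for item, rating in ratings.items())
    den = sum(abs(similarities[item]) for item in ratings)
    return num / den if den else None


# Similarity of each item to "The Matrix", as given in the deck.
sim_to_matrix = {"Alien": -0.47, "Inception": 0.47}
# Bob's known ratings.
bob = {"Alien": 2, "Inception": 5}

prediction = predict(sim_to_matrix, bob)
```

Under these assumptions the estimate works out to (-0.47·2 + 0.47·5) / (0.47 + 0.47) = 1.5, which agrees with the value filled in for Bob on the final matrix slide.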
Algorithms Examples –
Recommendation

 MapReduce phase 1
   Map – Make user the key
    (Alice, Matrix, 5)      →  Alice (Matrix, 5)
    (Alice, Alien, 1)       →  Alice (Alien, 1)
    (Alice, Inception, 4)   →  Alice (Inception, 4)
    (Bob, Alien, 2)         →  Bob (Alien, 2)
    (Bob, Inception, 5)     →  Bob (Inception, 5)
    (Peter, Matrix, 4)      →  Peter (Matrix, 4)
    (Peter, Alien, 3)       →  Peter (Alien, 3)
    (Peter, Inception, 2)   →  Peter (Inception, 2)
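
The map step above can be imitated in plain Python, without Hadoop. This is a minimal in-memory sketch; the function name `map_user_as_key` is hypothetical and each input record is assumed to be a `(user, item, rating)` tuple.

```python
ratings = [
    ("Alice", "Matrix", 5),
    ("Alice", "Alien", 1),
    ("Alice", "Inception", 4),
    ("Bob", "Alien", 2),
    ("Bob", "Inception", 5),
    ("Peter", "Matrix", 4),
    ("Peter", "Alien", 3),
    ("Peter", "Inception", 2),
]


def map_user_as_key(record):
    """Emit (key, value) with the user as key and (item, rating) as value."""
    user, item, rating = record
    return user, (item, rating)


mapped = [map_user_as_key(record) for record in ratings]
```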
Algorithms Examples –
Recommendation

 MapReduce phase 1
   Reduce – Create inverted index
   Input:
    Alice (Matrix, 5)
    Alice (Alien, 1)
    Alice (Inception, 4)
    Bob (Alien, 2)
    Bob (Inception, 5)
    Peter (Matrix, 4)
    Peter (Alien, 3)
    Peter (Inception, 2)
   Output:
    Alice (Matrix, 5) (Alien, 1) (Inception, 4)
    Bob (Alien, 2) (Inception, 5)
    Peter (Matrix, 4) (Alien, 3) (Inception, 2)
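
The reduce step collects every value that shares a key. A minimal in-memory sketch, assuming the mapper output is a list of `(user, (item, rating))` pairs; the function name `reduce_by_user` is hypothetical.

```python
from collections import defaultdict

mapped = [
    ("Alice", ("Matrix", 5)),
    ("Alice", ("Alien", 1)),
    ("Alice", ("Inception", 4)),
    ("Bob", ("Alien", 2)),
    ("Bob", ("Inception", 5)),
    ("Peter", ("Matrix", 4)),
    ("Peter", ("Alien", 3)),
    ("Peter", ("Inception", 2)),
]


def reduce_by_user(pairs):
    """Group all (item, rating) values under their user key."""
    grouped = defaultdict(list)
    for user, item_rating in pairs:
        grouped[user].append(item_rating)
    return dict(grouped)


by_user = reduce_by_user(mapped)
```

On a real cluster the framework performs the grouping during the shuffle, and the reducer only sees one key with its values at a time.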
Algorithms Examples –
Recommendation

 MapReduce phase 2
    Map – Isolate all co-occurred ratings (all
      cases where a user rated both items)
   Input:
    Alice (Matrix, 5) (Alien, 1) (Inception, 4)
    Bob (Alien, 2) (Inception, 5)
    Peter (Matrix, 4) (Alien, 3) (Inception, 2)
   Output:
    Matrix, Alien (5,1)
    Matrix, Alien (4,3)
    Alien, Inception (1,4)
    Alien, Inception (2,5)
    Alien, Inception (3,2)
    Matrix, Inception (4,2)
    Matrix, Inception (5,4)
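
The co-occurrence map can be sketched with `itertools.combinations`, which yields every unordered pair of items a single user rated. This is an in-memory illustration; the function name `map_cooccurrences` is hypothetical, and it assumes each user's items are listed in the same order (Matrix, Alien, Inception) so that the pair keys come out consistently.

```python
from itertools import combinations

by_user = {
    "Alice": [("Matrix", 5), ("Alien", 1), ("Inception", 4)],
    "Bob": [("Alien", 2), ("Inception", 5)],
    "Peter": [("Matrix", 4), ("Alien", 3), ("Inception", 2)],
}


def map_cooccurrences(item_ratings):
    """Emit ((item_a, item_b), (rating_a, rating_b)) for each pair one user rated."""
    for (item_a, r_a), (item_b, r_b) in combinations(item_ratings, 2):
        yield (item_a, item_b), (r_a, r_b)


pairs = [
    key_value
    for item_ratings in by_user.values()
    for key_value in map_cooccurrences(item_ratings)
]
```

If the item order could differ between users, sorting each user's list by item name before pairing would keep the keys canonical.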
Algorithms Examples –
Recommendation

 MapReduce phase 2
   Reduce – Compute similarities

   Input:
    Matrix, Alien (5,1)
    Matrix, Alien (4,3)
    Alien, Inception (1,4)
    Alien, Inception (2,5)
    Alien, Inception (3,2)
    Matrix, Inception (4,2)
    Matrix, Inception (5,4)
   Output:
    Matrix, Alien (-0.47)
    Matrix, Inception (0.47)
    Alien, Inception (-0.63)
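
The shape of this reduce step can be sketched as: group the co-occurred rating pairs by item pair, then apply a similarity function to each group. The sketch below assumes a plain Pearson correlation over the co-rated pairs only; the function names are hypothetical. The figures on the slide come from the referenced example, whose exact normalisation is not given here, so this sketch is not expected to reproduce them.

```python
import math
from collections import defaultdict

pairs = [
    (("Matrix", "Alien"), (5, 1)),
    (("Matrix", "Alien"), (4, 3)),
    (("Alien", "Inception"), (1, 4)),
    (("Alien", "Inception"), (2, 5)),
    (("Alien", "Inception"), (3, 2)),
    (("Matrix", "Inception"), (4, 2)),
    (("Matrix", "Inception"), (5, 4)),
]


def pearson(rating_pairs):
    """Pearson correlation of the two rating columns (assumed similarity measure)."""
    n = len(rating_pairs)
    mean_x = sum(x for x, _ in rating_pairs) / n
    mean_y = sum(y for _, y in rating_pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rating_pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in rating_pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in rating_pairs)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


def reduce_similarities(key_values):
    """Group rating pairs by item pair and compute one similarity per group."""
    grouped = defaultdict(list)
    for key, value in key_values:
        grouped[key].append(value)
    return {key: pearson(values) for key, values in grouped.items()}


similarities = reduce_similarities(pairs)
```

Any of the measures mentioned earlier (cosine, Tanimoto) could be swapped in for `pearson` without changing the grouping logic.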
Algorithms Examples –
Recommendation




              Matrix   Alien   Inception
      Alice     5        1        4
      Bob       1.5      2        5
      Peter     4        3        2
Mahout Commercial Use


 Commercial use
Mahout Resources

 Mahout website - http://mahout.apache.org/
 Introducing Apache Mahout –
  http://www.ibm.com/developerworks/java/library/j-mahout/
 “Mahout In Action” by Sean Owen and Robin
  Anil
Mahout Summary


 ML is all over the web today
 Mahout is about scalable machine
  learning
 Mahout has functionality for many of
  today’s common machine learning tasks
 MapReduce magic in
  action
Mahout Summary




     Thank you and good night

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

Intro to Mahout

  • 2. Agenda and such…
      • What is ML (Machine Learning)
      • ML Common Use Cases
      • Mahout Overview
      • Algorithms in Mahout
      • Mahout Commercial Use
      • Mahout Summary
  • 3. What is ML
      “Machine Learning is programming computers to optimize a performance criterion using example data or past experience”
      Intro. To Machine Learning by E. Alpaydin
  • 4. ML Common Use Cases
      • Recommendation
  • 5. ML Common Use Cases
      • Classification
  • 6. ML Common Use Cases
      • Clustering
  • 8. Mahout Overview – What ?
      A mahout is a person who keeps and drives an elephant
  • 9. Mahout Overview – What ?
      • A scalable machine learning library
  • 10. Mahout Overview – What ?
      • Began life in 2008 as a subproject of Apache’s Lucene project
      • In 2010 Mahout became a top-level Apache project in its own right
      • Implemented in Java
      • Built upon Apache’s Hadoop (Look ! An Elephant !)
  • 11. Mahout Overview – Why ?
      • Many open source ML libraries either:
          • Lack community
          • Lack documentation and examples
          • Lack scalability
          • Lack the Apache license
          • Are research oriented
          • Not well tested
          • Not built over existing production quality libraries
  • 12. Mahout Overview – Why ?
      • Scalability
          • Scalable to reasonably large datasets (core algorithms implemented in Map/Reduce, runnable on Hadoop)
          • Scalable to support your business case (Apache License)
          • Scalable community
  • 13. Mahout Overview – Why ?
      • Built over existing production quality libraries
  • 14. Mahout Overview – Use Cases
      • Mahout currently supports mainly four use cases:
          1. Recommendation
          2. Clustering
          3. Classification
          4. Frequent Itemset Mining
  • 15. Mahout Overview - Technical
      • System Requirements
          • Linux (or Cygwin on Windows)
          • Java 1.6.x or greater
          • Maven 2.0.11 or greater to build the source code
          • Hadoop 0.2 or greater*
      * Not all algorithms are implemented to work on Hadoop clusters
  • 16. Algorithms in Mahout
      • We’ll focus on one example:
          • Collaborative Filtering (Recommenders)
      • Yet there are many (many !!) more, you can find them all on https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
  • 17. Algorithms Examples – Recommendation
      • Help users find items they might like based on historical preferences
      • Based on example by Sebastian Schelter in “Distributed Itembased Collaborative Filtering with Apache Mahout”
  • 18. Algorithms Examples – Recommendation
                Matrix   Alien   Inception
      Alice       5        1        4
      Bob         ?        2        5
      Peter       4        3        2
  • 19. Algorithms Examples – Recommendation
      • Algorithm
          • Neighborhood-based approach
          • Works by finding similarly rated items in the user-item matrix (e.g. cosine, Pearson correlation, Tanimoto coefficient)
          • Estimates a user's preference towards an item by looking at his/her preferences towards similar items
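The three similarity measures named on this slide can be sketched in a few lines of Python. This is a minimal illustration and not Mahout's implementation: the function names are made up for the example, the rating vectors are the Alien and Inception columns of the matrix on slide 18, and the Tanimoto variant shown treats each item as the set of users who rated it, which is one common reading of that coefficient.

```python
from math import sqrt

def cosine(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def pearson(x, y):
    """Pearson correlation: cosine of the two mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

def tanimoto(users_a, users_b):
    """Tanimoto (Jaccard) coefficient of the sets of users who rated each item."""
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

# Ratings by Alice, Bob and Peter, in that order (slide 18).
alien = [1, 2, 3]
inception = [4, 5, 2]

print(round(cosine(alien, inception), 2))
print(round(pearson(alien, inception), 2))
# Matrix was rated by Alice and Peter, Alien by all three users.
print(round(tanimoto({"Alice", "Peter"}, {"Alice", "Bob", "Peter"}), 2))
```

Cosine and Pearson use the rating values, while Tanimoto only looks at who rated what, so it also works for data without explicit ratings.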
  • 20. Algorithms Examples – Recommendation
      • Prediction: Estimate Bob's preference towards “The Matrix”
          1. Look at all items that
              a) are similar to “The Matrix”
              b) have been rated by Bob
             => “Alien”, “Inception”
          2. Estimate the unknown preference with a weighted sum
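The weighted sum in step 2 can be sketched as follows. The slide does not spell out the normalization, so dividing by the sum of absolute similarities is an assumption here; it is chosen because it reproduces the value 1.5 shown for Bob on slide 25. The similarity values are the ones reported on slide 24 and are taken as given, and the `predict` helper is a hypothetical name.

```python
def predict(similarities, user_ratings):
    """Similarity-weighted sum of the user's ratings, normalized by total |similarity|."""
    rated = [item for item in user_ratings if item in similarities]
    numerator = sum(similarities[item] * user_ratings[item] for item in rated)
    denominator = sum(abs(similarities[item]) for item in rated)
    return numerator / denominator if denominator else None

# Similarity of "The Matrix" to the items Bob has rated (values from slide 24).
similar_to_matrix = {"Alien": -0.47, "Inception": 0.47}
bob = {"Alien": 2, "Inception": 5}

print(round(predict(similar_to_matrix, bob), 2))  # prints 1.5
```

Items the user has not rated, or for which no similarity is known, simply drop out of both sums.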
  • 21. Algorithms Examples – Recommendation
      • MapReduce phase 1
      • Map – Make user the key
          (Alice, Matrix, 5)      →  Alice (Matrix, 5)
          (Alice, Alien, 1)       →  Alice (Alien, 1)
          (Alice, Inception, 4)   →  Alice (Inception, 4)
          (Bob, Alien, 2)         →  Bob (Alien, 2)
          (Bob, Inception, 5)     →  Bob (Inception, 5)
          (Peter, Matrix, 4)      →  Peter (Matrix, 4)
          (Peter, Alien, 3)       →  Peter (Alien, 3)
          (Peter, Inception, 2)   →  Peter (Inception, 2)
  • 22. Algorithms Examples – Recommendation
      • MapReduce phase 1
      • Reduce – Create inverted index
          Alice (Matrix, 5); Alice (Alien, 1); Alice (Inception, 4)   →  Alice (Matrix, 5) (Alien, 1) (Inception, 4)
          Bob (Alien, 2); Bob (Inception, 5)                          →  Bob (Alien, 2) (Inception, 5)
          Peter (Matrix, 4); Peter (Alien, 3); Peter (Inception, 2)   →  Peter (Matrix, 4) (Alien, 3) (Inception, 2)
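Phase 1 can be imitated in plain Python to make the data flow concrete. This is a single-process sketch rather than Hadoop or Mahout code: the dictionary used for grouping stands in for the shuffle step, and `map_phase1`, `reduce_phase1` and `run_phase1` are hypothetical names.

```python
from collections import defaultdict

# Input records (user, item, rating) as listed on slide 21.
ratings = [
    ("Alice", "Matrix", 5), ("Alice", "Alien", 1), ("Alice", "Inception", 4),
    ("Bob", "Alien", 2), ("Bob", "Inception", 5),
    ("Peter", "Matrix", 4), ("Peter", "Alien", 3), ("Peter", "Inception", 2),
]

def map_phase1(record):
    """Map: make the user the key."""
    user, item, rating = record
    yield user, (item, rating)

def reduce_phase1(user, values):
    """Reduce: collect every (item, rating) pair emitted for one user."""
    return user, list(values)

def run_phase1(records):
    grouped = defaultdict(list)  # stands in for the shuffle between map and reduce
    for record in records:
        for key, value in map_phase1(record):
            grouped[key].append(value)
    return dict(reduce_phase1(user, values) for user, values in grouped.items())

user_vectors = run_phase1(ratings)
for user, item_ratings in user_vectors.items():
    print(user, item_ratings)
```

Each user ends up with one list of (item, rating) pairs, which is the input the second phase works on.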
  • 23. Algorithms Examples – Recommendation
      • MapReduce phase 2
      • Map – Isolate all co-occurred ratings (all cases where a user rated both items)
          Alice (Matrix, 5) (Alien, 1) (Inception, 4)   →  Matrix, Alien (5,1); Alien, Inception (1,4); Matrix, Inception (5,4)
          Bob (Alien, 2) (Inception, 5)                 →  Alien, Inception (2,5)
          Peter (Matrix, 4) (Alien, 3) (Inception, 2)   →  Matrix, Alien (4,3); Alien, Inception (3,2); Matrix, Inception (4,2)
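The map step of phase 2 amounts to emitting every pair of items that one user rated. A minimal Python sketch, assuming the per-user lists from phase 1 as input; `ITEM_ORDER` and `map_phase2` are hypothetical names, and the fixed item order is only there so that a pair always gets the same key, as on the slide.

```python
from itertools import combinations

# Output of phase 1 (slide 22): one rating list per user.
user_vectors = {
    "Alice": [("Matrix", 5), ("Alien", 1), ("Inception", 4)],
    "Bob": [("Alien", 2), ("Inception", 5)],
    "Peter": [("Matrix", 4), ("Alien", 3), ("Inception", 2)],
}

# Assumed canonical item order, so (Matrix, Alien) never shows up as (Alien, Matrix).
ITEM_ORDER = ["Matrix", "Alien", "Inception"]

def map_phase2(user, item_ratings):
    """Map: emit one (item pair, rating pair) record per pair of items the user rated."""
    ordered = sorted(item_ratings, key=lambda ir: ITEM_ORDER.index(ir[0]))
    for (item_a, rating_a), (item_b, rating_b) in combinations(ordered, 2):
        yield (item_a, item_b), (rating_a, rating_b)

cooccurrences = [
    pair
    for user, item_ratings in user_vectors.items()
    for pair in map_phase2(user, item_ratings)
]
for key, value in cooccurrences:
    print(key, value)
```

A user with n rated items emits n(n-1)/2 records, so Alice and Peter contribute three pairs each and Bob one, which gives the seven records shown on the slide.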
  • 24. Algorithms Examples – Recommendation
      • MapReduce phase 2
      • Reduce – Compute similarities
          Matrix, Alien (5,1); Matrix, Alien (4,3)                                 →  Matrix, Alien (-0.47)
          Matrix, Inception (5,4); Matrix, Inception (4,2)                         →  Matrix, Inception (0.47)
          Alien, Inception (1,4); Alien, Inception (2,5); Alien, Inception (3,2)   →  Alien, Inception (-0.63)
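The shape of this reduce step can be sketched in Python: all rating pairs for one item pair arrive together and are collapsed into a single similarity value. The slide does not say which measure produced -0.47, 0.47 and -0.63, so the sketch plugs in plain cosine similarity as a stand-in and its numbers differ from the slide's. `reduce_phase2` and `cosine` are hypothetical names.

```python
from collections import defaultdict
from math import sqrt

# Records emitted by the phase 2 map step (slide 23).
cooccurrences = [
    (("Matrix", "Alien"), (5, 1)), (("Matrix", "Alien"), (4, 3)),
    (("Matrix", "Inception"), (5, 4)), (("Matrix", "Inception"), (4, 2)),
    (("Alien", "Inception"), (1, 4)), (("Alien", "Inception"), (2, 5)),
    (("Alien", "Inception"), (3, 2)),
]

def cosine(rating_pairs):
    """Stand-in measure: cosine similarity over the co-rated values of two items."""
    dot = sum(a * b for a, b in rating_pairs)
    norm_a = sqrt(sum(a * a for a, _ in rating_pairs))
    norm_b = sqrt(sum(b * b for _, b in rating_pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def reduce_phase2(item_pair, rating_pairs, similarity=cosine):
    """Reduce: turn all co-occurring ratings of one item pair into one similarity."""
    return item_pair, similarity(rating_pairs)

grouped = defaultdict(list)  # stands in for the shuffle between map and reduce
for key, value in cooccurrences:
    grouped[key].append(value)

similarities = dict(reduce_phase2(pair, values) for pair, values in grouped.items())
for pair, value in similarities.items():
    print(pair, round(value, 2))
```

Because the measure is passed in as a parameter, swapping cosine for a Pearson-style or Tanimoto-style function only changes the `similarity` argument, not the reduce logic.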
  • 25. Algorithms Examples – Recommendation
                Matrix   Alien   Inception
      Alice       5        1        4
      Bob        1.5       2        5
      Peter       4        3        2
  • 26. Mahout Commercial Use
      • Commercial use
  • 27. Mahout Resources
      • Mahout website - http://mahout.apache.org/
      • Introducing Apache Mahout – http://www.ibm.com/developerworks/java/library/j-mahout/
      • “Mahout In Action” by Sean Owen and Robin Anil
  • 28. Mahout Summary
      • ML is all over the web today
      • Mahout is about scalable machine learning
      • Mahout has functionality for many of today’s common machine learning tasks
      • MapReduce magic in action
  • 29. Mahout Summary
      Thank you and good night

Editor's Notes

  1. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers (2008). Apache Lucene(TM) is a high-performance, full-featured text search engine library (2005).