Intro to Apache Mahout

Grant Ingersoll
Grant IngersollCTO at Lucidworks
Machine Learning with Apache Mahout ,[object Object],Grant Ingersoll,[object Object],July 8, 2010,[object Object]
Introduction,[object Object],You,[object Object],Machine learning experience?,[object Object],Business Intelligence?,[object Object],Natural Lang. Processing?,[object Object],ApacheHadoop?,[object Object],Me,[object Object],Co-founder Apache Mahout,[object Object],Apache Lucene/Solr committer,[object Object],Co-founder Lucid Imagination,[object Object]
Topics,[object Object],What is Machine Learning?,[object Object],ML Use Cases,[object Object],What is Mahout?,[object Object],What can I do with it right now?,[object Object],Where’s Mahout headed?,[object Object]
Amazon.com,[object Object],What is Machine Learning?,[object Object],Google News,[object Object]
Really it’s…,[object Object],“Machine Learning is programming computers to optimize a performance criterion using example data or past experience”,[object Object],Intro. To Machine Learning by E. Alpaydin,[object Object],Subset of Artificial Intelligence,[object Object],Lots of related fields:,[object Object],Information Retrieval,[object Object],Stats,[object Object],Biology,[object Object],Linear algebra,[object Object],Many more,[object Object]
Common Use Cases,[object Object],Recommend friends/dates/products,[object Object],Classify content into predefined groups,[object Object],Find similar content based on object properties,[object Object],Find associations/patterns in actions/behaviors,[object Object],Identify key topics in large collections of text,[object Object],Detect anomalies in machine output,[object Object],Ranking search results,[object Object],Others?,[object Object]
Useful Terminology,[object Object],Vectors/Matrices,[object Object],Weights,[object Object],Sparse,[object Object],Dense,[object Object],Norms,[object Object],Features,[object Object],Feature reduction,[object Object],Occurrences and Co-occurrences,[object Object]
Getting Started with ML,[object Object],Get your data,[object Object],Decide on your features per your algorithm,[object Object],Prep the data,[object Object],Different approaches for different algorithms,[object Object],Run your algorithm(s),[object Object],Lather, rinse, repeat,[object Object],Validate your results,[object Object],Smell test, A/B testing, more formal methods,[object Object]
Apache Mahout,[object Object],http://dictionary.reference.com/browse/mahout,[object Object],An Apache Software Foundation project to create scalable machine learning libraries under the Apache Software License,[object Object],http://mahout.apache.org,[object Object],Why Mahout?,[object Object],Many Open Source ML libraries either:,[object Object],Lack Community,[object Object],Lack Documentation and Examples,[object Object],Lack Scalability,[object Object],Lack the Apache License,[object Object],Or are research-oriented,[object Object]
Focus: Machine Learning,[object Object],Applications,[object Object],Examples,[object Object],Recommenders,[object Object],Clustering,[object Object],Classification,[object Object],Freq. Pattern,[object Object],Mining,[object Object],Genetic,[object Object],Math,[object Object],Vectors/Matrices/SVD,[object Object],Utilities,[object Object],Lucene/Vectorizer,[object Object],Collections (primitives),[object Object],Apache Hadoop,[object Object],See http://cwiki.apache.org/confluence/display/MAHOUT/Algorithms,[object Object]
Focus: Scalable,[object Object],Goal: Be as fast and efficient as possible given the intrinsic design of the algorithm,[object Object],Some algorithms won’t scale to massive machine clusters,[object Object],Others fit logically on a Map Reduce framework like Apache Hadoop,[object Object],Still others will needalternative distributed programming models,[object Object],Be pragmatic,[object Object],Most Mahout implementations are Map Reduce enabled,[object Object],(Always a) Work in Progress,[object Object]
Prepare Data from Raw content,[object Object],Data Sources:,[object Object],Lucene integration,[object Object],bin/mahout lucenevector …,[object Object],Document Vectorizer,[object Object],bin/mahout seqdirectory …,[object Object],bin/mahout seq2sparse …,[object Object],Programmatically,[object Object],See the Utils module in Mahout,[object Object],Database,[object Object],File system,[object Object]
Recommendations,[object Object],Extensive framework for collaborative filtering,[object Object],Recommenders,[object Object],User based,[object Object],Item based,[object Object],Online and Offline support,[object Object],Offline can utilize Hadoop,[object Object],Many different Similarity measures,[object Object],Cosine, LLR, Tanimoto, Pearson, others,[object Object]
Clustering,[object Object],Document level,[object Object],Group documents based on a notion of similarity,[object Object],K-Means, Fuzzy K-Means, Dirichlet, Canopy, Mean-Shift,[object Object],Distance Measures,[object Object],Manhattan, Euclidean, other,[object Object],Topic Modeling ,[object Object],Cluster words across documents to identify topics,[object Object],Latent Dirichlet Allocation,[object Object]
Categorization,[object Object],Place new items into predefined categories:,[object Object],Sports, politics, entertainment,[object Object],Mahout has several implementations,[object Object],Naïve Bayes,[object Object],Complementary Naïve Bayes,[object Object],Decision Forests,[object Object],Logistic Regression (Almost done),[object Object]
Freq. Pattern Mining,[object Object],Identify frequently co-occurrent items,[object Object],Useful for:,[object Object],Query Recommendations,[object Object],Apple -> iPhone, orange, OS X,[object Object],Related product placement,[object Object],“Beer and Diapers”,[object Object],Spam Detection,[object Object],Yahoo: http://www.slideshare.net/hadoopusergroup/mail-antispam,[object Object],http://www.amazon.com,[object Object]
Evolutionary,[object Object],Map-Reduce ready fitness functions for genetic programming,[object Object],Integration with Watchmaker,[object Object],http://watchmaker.uncommons.org/index.php,[object Object],Problems solved:,[object Object],Traveling salesman,[object Object],Class discovery,[object Object],Many others,[object Object]
Singular Value Decomposition,[object Object],Reduces a big matrix into a much smaller matrix by amplifying the important parts while removing/reducing the less important parts,[object Object],Mahout has fully distributed Lanczos implementation,[object Object],<MAHOUT_HOME>/bin/mahout svd -Dmapred.input.dir=path/to/corpus --tempDir path/for/svd-output --rank 300 --numColumns <numcols> --numRows <num rows in the input>,[object Object],<MAHOUT_HOME>/bin/mahout cleansvd --eigenInput path/for/svd-output --corpusInput path/to/corpus --output path/for/cleanOutput --maxError 0.1 --minEigenvalue 10.0 ,[object Object],https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction,[object Object]
How To: Recommenders,[object Object],Data: ,[object Object],Users (abstract),[object Object],Items (abstract),[object Object],Ratings (optional),[object Object],Load the data model,[object Object],Ask for Recommendations:,[object Object],User-User,[object Object],Item-Item,[object Object]
Ugly Demo I,[object Object],Group Lens Data: http://www.grouplens.org,[object Object],http://lucene.apache.org/mahout/taste.html#demo,[object Object],http://localhost:8080/RecommenderServlet?userID=1&debug=true,[object Object],In other words:  the reason why I work on servers, not UIs!,[object Object]
How to: Command Line,[object Object],Most algorithms have a Driver program,[object Object],Shell script in $MAHOUT_HOME/bin helps with most tasks,[object Object],Prepare the Data,[object Object],Different algorithms require different setup,[object Object],Run the algorithm,[object Object],Single Node,[object Object],Hadoop,[object Object],Print out the results,[object Object],Several helper classes: ,[object Object],LDAPrintTopics, ClusterDumper, etc.,[object Object]
Ugly Demo II - Prep,[object Object],Data Set: Reuters,[object Object],http://www.daviddlewis.com/resources/testcollections/reuters21578/,[object Object],Convert to Text via http://www.lucenebootcamp.com/lucene-boot-camp-preclass-training/,[object Object],Convert to Sequence File:,[object Object],bin/mahout seqdirectory –input <PATH> --output <PATH> --charset UTF-8,[object Object],Convert to Sparse Vector:,[object Object],bin/mahout seq2sparse --input <PATH>/content/reuters/seqfiles/ --norm 2 --weight TF --output <PATH>/content/reuters/seqfiles-TF/ --minDF 5 --maxDFPercent 90,[object Object]
Ugly Demo II: Topic Modeling,[object Object],Latent Dirichlet Allocation,[object Object],./mahout lda --input  <PATH>/content/reuters/seqfiles-TF/vectors/ --output  <PATH>/content/reuters/seqfiles-TF/lda-output --numWords 34000 –numTopics 10,[object Object],./mahout org.apache.mahout.clustering.lda.LDAPrintTopics --input <PATH>/content/reuters/seqfiles-TF/lda-output/state-19 --dict <PATH>/content/reuters/seqfiles-TF/dictionary.file-0 --words 10 --output <PATH>/content/reuters/seqfiles-TF/lda-output/topics --dictionaryTypesequencefile,[object Object],Good feature reduction (stopword removal) required,[object Object]
Ugly Demo III: Clustering,[object Object],K-Means,[object Object],Same Prep as UD II, except use TFIDF weight,[object Object],./mahout kmeans --input <PATH>/content/reuters/seqfiles-TFIDF/vectors/part-00000 --k 15 --output <PATH>/content/reuters/seqfiles-TFIDF/output-kmeans --clusters <PATH>/content/reuters/seqfiles-TFIDF/output-kmeans/clusters,[object Object],Print out the clusters: ./mahout clusterdump --seqFileDir <PATH>/content/reuters/seqfiles-TFIDF/output-kmeans/clusters-15/ --pointsDir <PATH>/content/reuters/seqfiles-TFIDF/output-kmeans/points/ --dictionary <PATH>/content/reuters/seqfiles-TFIDF/dictionary.file-0 --dictionaryTypesequencefile --substring 20,[object Object]
Ugly Demo IV: Frequent Pattern Mining,[object Object],Data: http://fimi.cs.helsinki.fi/data/,[object Object],./mahout fpg -i <PATH>/content/freqitemset/accidents.dat -o patterns -k 50 -method mapreduce -g 10 -regex [],[object Object], ./mahout seqdump --seqFile patterns/fpgrowth/part-r-00000 ,[object Object]
What’s Next?,[object Object],Google Summer of Code (GSOC),[object Object],Restricted Boltzmann Machines,[object Object],SVM,[object Object],EigenCuts (spectral clustering),[object Object],Neural Networks,[object Object],SVD-based Recommender,[object Object],0.4 release in Fall (after GSOC),[object Object],Stabilize API’s for 1.0 release,[object Object],Benchmarking,[object Object],http://cwiki.apache.org/MAHOUT/howtocontribute.html,[object Object]
Resources,[object Object],Slides will be posted at http://lucene.grantingersoll.com,[object Object],Examples at:,[object Object],http://lucene.grantingersoll.com/2010/02/13/intro-to-mahout-slides-and-demo-examples/ ,[object Object],More Examples in Mahout SVN in the examples directory,[object Object]
Resources,[object Object],http://mahout.apache.org,[object Object],http://cwiki.apache.org/MAHOUT,[object Object],{user|dev}@mahout.apache.org,[object Object],http://svn.apache.org/repos/asf/mahout/trunk,[object Object],http://hadoop.apache.org,[object Object]
Resources,[object Object],“Mahout in Action” by Owen and Anil,[object Object],http://lucene.li/E,[object Object],“Taming Text” by Ingersoll, Morton and Farris,[object Object],http://lucene.li/D,[object Object],“Introducing Apache Mahout” ,[object Object],http://lucene.li/F,[object Object],“Programming Collective Intelligence” by Toby Segaran,[object Object],http://lucene.li/G,[object Object],“Data Mining - Practical Machine Learning Tools and Techniques” by Ian H. Witten and Eibe Frank,[object Object],http://lucene.li/H,[object Object]
References,[object Object],HAL: http://en.wikipedia.org/wiki/File:Hal-9000.jpg,[object Object],Terminator: http://en.wikipedia.org/wiki/File:Terminator1984movieposter.jpg,[object Object],Matrix: http://en.wikipedia.org/wiki/File:The_Matrix_Poster.jpg,[object Object],Google News: http://news.google.com,[object Object],Amazon.com: http://www.amazon.com,[object Object],Facebook: http://www.facebook.com,[object Object],Mahout: http://lucene.apache.org/mahout,[object Object],Beer and Diapers: http://www.flickr.com/photos/baubcat/2484459070/,[object Object],http://www.theregister.co.uk/2006/08/15/beer_diapers/,[object Object],DMOZ: http://www.dmoz.org,[object Object]
1 of 30

Recommended

Apache Mahout by
Apache MahoutApache Mahout
Apache MahoutAjit Koti
6.9K views51 slides
Apache mahout by
Apache mahoutApache mahout
Apache mahoutPuneet Gupta
614 views11 slides
Introduction to Collaborative Filtering with Apache Mahout by
Introduction to Collaborative Filtering with Apache MahoutIntroduction to Collaborative Filtering with Apache Mahout
Introduction to Collaborative Filtering with Apache Mahoutsscdotopen
9.6K views13 slides
Introduction to Apache Mahout by
Introduction to Apache MahoutIntroduction to Apache Mahout
Introduction to Apache MahoutAman Adhikari
2K views13 slides
Intro to Mahout -- DC Hadoop by
Intro to Mahout -- DC HadoopIntro to Mahout -- DC Hadoop
Intro to Mahout -- DC HadoopGrant Ingersoll
4.1K views27 slides
mahout introduction by
mahout  introductionmahout  introduction
mahout introductionchanggeng Zhang
3.6K views29 slides

More Related Content

What's hot

Tutorial Mahout - Recommendation by
Tutorial Mahout - RecommendationTutorial Mahout - Recommendation
Tutorial Mahout - RecommendationCataldo Musto
38.9K views68 slides
Mahout Tutorial and Hands-on (version 2015) by
Mahout Tutorial and Hands-on (version 2015)Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Cataldo Musto
5.3K views117 slides
Apache Mahout by
Apache MahoutApache Mahout
Apache MahoutSave Manos
2.9K views19 slides
Machine Learning with Apache Mahout by
Machine Learning with Apache MahoutMachine Learning with Apache Mahout
Machine Learning with Apache MahoutDaniel Glauser
8.9K views102 slides
Apache Mahout 於電子商務的應用 by
Apache Mahout 於電子商務的應用Apache Mahout 於電子商務的應用
Apache Mahout 於電子商務的應用James Chen
8.7K views55 slides
SDEC2011 Mahout - the what, the how and the why by
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
2.2K views30 slides

What's hot(20)

Tutorial Mahout - Recommendation by Cataldo Musto
Tutorial Mahout - RecommendationTutorial Mahout - Recommendation
Tutorial Mahout - Recommendation
Cataldo Musto38.9K views
Mahout Tutorial and Hands-on (version 2015) by Cataldo Musto
Mahout Tutorial and Hands-on (version 2015)Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)
Cataldo Musto5.3K views
Apache Mahout by Save Manos
Apache MahoutApache Mahout
Apache Mahout
Save Manos2.9K views
Machine Learning with Apache Mahout by Daniel Glauser
Machine Learning with Apache MahoutMachine Learning with Apache Mahout
Machine Learning with Apache Mahout
Daniel Glauser8.9K views
Apache Mahout 於電子商務的應用 by James Chen
Apache Mahout 於電子商務的應用Apache Mahout 於電子商務的應用
Apache Mahout 於電子商務的應用
James Chen8.7K views
SDEC2011 Mahout - the what, the how and the why by Korea Sdec
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
Korea Sdec2.2K views
An Introduction to Apache Hadoop, Mahout and HBase by Lukas Vlcek
An Introduction to Apache Hadoop, Mahout and HBaseAn Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBase
Lukas Vlcek5.4K views
Apache Mahout Tutorial - Recommendation - 2013/2014 by Cataldo Musto
Apache Mahout Tutorial - Recommendation - 2013/2014 Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014
Cataldo Musto37.8K views
Whats Right and Wrong with Apache Mahout by Ted Dunning
Whats Right and Wrong with Apache MahoutWhats Right and Wrong with Apache Mahout
Whats Right and Wrong with Apache Mahout
Ted Dunning6.1K views
Orchestrating the Intelligent Web with Apache Mahout by aneeshabakharia
Orchestrating the Intelligent Web with Apache MahoutOrchestrating the Intelligent Web with Apache Mahout
Orchestrating the Intelligent Web with Apache Mahout
aneeshabakharia2.6K views
Hands on Mahout! by OSCON Byrum
Hands on Mahout!Hands on Mahout!
Hands on Mahout!
OSCON Byrum18.1K views
Big Data Analytics using Mahout by IMC Institute
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using Mahout
IMC Institute5.4K views
Apache Mahout: Driving the Yellow Elephant by Grant Ingersoll
Apache Mahout: Driving the Yellow ElephantApache Mahout: Driving the Yellow Elephant
Apache Mahout: Driving the Yellow Elephant
Grant Ingersoll2.4K views
Next directions in Mahout's recommenders by sscdotopen
Next directions in Mahout's recommendersNext directions in Mahout's recommenders
Next directions in Mahout's recommenders
sscdotopen9.4K views
Mahout Introduction BarCampDC by Drew Farris
Mahout Introduction BarCampDCMahout Introduction BarCampDC
Mahout Introduction BarCampDC
Drew Farris1.5K views
Mahout classification presentation by Naoki Nakatani
Mahout classification presentationMahout classification presentation
Mahout classification presentation
Naoki Nakatani2.8K views
Intro to Mahout by Uri Lavi
Intro to MahoutIntro to Mahout
Intro to Mahout
Uri Lavi3K views
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms by DataStax Academy
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
DataStax Academy4.1K views

Similar to Intro to Apache Mahout

Mahout and Distributed Machine Learning 101 by
Mahout and Distributed Machine Learning 101Mahout and Distributed Machine Learning 101
Mahout and Distributed Machine Learning 101John Ternent
669 views29 slides
OSCON: Apache Mahout - Mammoth Scale Machine Learning by
OSCON: Apache Mahout - Mammoth Scale Machine LearningOSCON: Apache Mahout - Mammoth Scale Machine Learning
OSCON: Apache Mahout - Mammoth Scale Machine LearningRobin Anil
3.3K views42 slides
Machine Learning Hadoop by
Machine Learning HadoopMachine Learning Hadoop
Machine Learning HadoopAletheLabs
352 views2 slides
Classification with Naive Bayes by
Classification with Naive BayesClassification with Naive Bayes
Classification with Naive BayesJosh Patterson
19K views23 slides
Hadoop Based Data Discovery by
Hadoop Based Data DiscoveryHadoop Based Data Discovery
Hadoop Based Data DiscoveryBenjamin Ashkar
721 views19 slides
Machine Learning and Hadoop by
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and HadoopJosh Patterson
6.8K views35 slides

Similar to Intro to Apache Mahout(20)

Mahout and Distributed Machine Learning 101 by John Ternent
Mahout and Distributed Machine Learning 101Mahout and Distributed Machine Learning 101
Mahout and Distributed Machine Learning 101
John Ternent669 views
OSCON: Apache Mahout - Mammoth Scale Machine Learning by Robin Anil
OSCON: Apache Mahout - Mammoth Scale Machine LearningOSCON: Apache Mahout - Mammoth Scale Machine Learning
OSCON: Apache Mahout - Mammoth Scale Machine Learning
Robin Anil3.3K views
Machine Learning Hadoop by AletheLabs
Machine Learning HadoopMachine Learning Hadoop
Machine Learning Hadoop
AletheLabs352 views
Classification with Naive Bayes by Josh Patterson
Classification with Naive BayesClassification with Naive Bayes
Classification with Naive Bayes
Josh Patterson19K views
Machine Learning and Hadoop by Josh Patterson
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and Hadoop
Josh Patterson6.8K views
Populate your Search index, NEST 2016-01 by David Smiley
Populate your Search index, NEST 2016-01Populate your Search index, NEST 2016-01
Populate your Search index, NEST 2016-01
David Smiley814 views
Recommender.system.presentation.pjug.05.20.2014 by rpbrehm
Recommender.system.presentation.pjug.05.20.2014Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014
rpbrehm418 views
Vital AI: Big Data Modeling by Vital.AI
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data Modeling
Vital.AI17.8K views
Hive @ Hadoop day seattle_2010 by nzhang
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
nzhang3K views
A Survey on Approaches for Frequent Item Set Mining on Apache Hadoop by IJTET Journal
A Survey on Approaches for Frequent Item Set Mining on Apache HadoopA Survey on Approaches for Frequent Item Set Mining on Apache Hadoop
A Survey on Approaches for Frequent Item Set Mining on Apache Hadoop
IJTET Journal403 views
Big data: Descoberta de conhecimento em ambientes de big data e computação na... by Rio Info
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Rio Info528 views
Data science technology overview by Soojung Hong
Data science technology overviewData science technology overview
Data science technology overview
Soojung Hong346 views
Vipul divyanshu mahout_documentation by Vipul Divyanshu
Vipul divyanshu mahout_documentationVipul divyanshu mahout_documentation
Vipul divyanshu mahout_documentation
Vipul Divyanshu717 views
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac... by Cloudera, Inc.
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Cloudera, Inc.6.8K views
Big Data and Hadoop by Flavio Vit
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
Flavio Vit2.6K views
scale_perf_best_practices by webuploader
scale_perf_best_practicesscale_perf_best_practices
scale_perf_best_practices
webuploader724 views

More from Grant Ingersoll

Solr for Data Science by
Solr for Data ScienceSolr for Data Science
Solr for Data ScienceGrant Ingersoll
4.2K views22 slides
This Ain't Your Parent's Search Engine by
This Ain't Your Parent's Search EngineThis Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineGrant Ingersoll
1.6K views18 slides
Data IO: Next Generation Search with Lucene and Solr 4 by
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Grant Ingersoll
1.9K views22 slides
Intro to Search by
Intro to SearchIntro to Search
Intro to SearchGrant Ingersoll
1.9K views20 slides
Open Source Search FTW by
Open Source Search FTWOpen Source Search FTW
Open Source Search FTWGrant Ingersoll
2.5K views24 slides
Crowd Sourced Reflected Intelligence for Solr and Hadoop by
Crowd Sourced Reflected Intelligence for Solr and HadoopCrowd Sourced Reflected Intelligence for Solr and Hadoop
Crowd Sourced Reflected Intelligence for Solr and HadoopGrant Ingersoll
1.8K views25 slides

More from Grant Ingersoll(20)

This Ain't Your Parent's Search Engine by Grant Ingersoll
This Ain't Your Parent's Search EngineThis Ain't Your Parent's Search Engine
This Ain't Your Parent's Search Engine
Grant Ingersoll1.6K views
Data IO: Next Generation Search with Lucene and Solr 4 by Grant Ingersoll
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4
Grant Ingersoll1.9K views
Crowd Sourced Reflected Intelligence for Solr and Hadoop by Grant Ingersoll
Crowd Sourced Reflected Intelligence for Solr and HadoopCrowd Sourced Reflected Intelligence for Solr and Hadoop
Crowd Sourced Reflected Intelligence for Solr and Hadoop
Grant Ingersoll1.8K views
What's new in Lucene and Solr 4.x by Grant Ingersoll
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.x
Grant Ingersoll2.6K views
Scalable Machine Learning with Hadoop by Grant Ingersoll
Scalable Machine Learning with HadoopScalable Machine Learning with Hadoop
Scalable Machine Learning with Hadoop
Grant Ingersoll2.9K views
Large Scale Search, Discovery and Analytics in Action by Grant Ingersoll
Large Scale Search, Discovery and Analytics in ActionLarge Scale Search, Discovery and Analytics in Action
Large Scale Search, Discovery and Analytics in Action
Grant Ingersoll1.1K views
OpenSearchLab and the Lucene Ecosystem by Grant Ingersoll
OpenSearchLab and the Lucene EcosystemOpenSearchLab and the Lucene Ecosystem
OpenSearchLab and the Lucene Ecosystem
Grant Ingersoll3.7K views
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr by Grant Ingersoll
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Grant Ingersoll846 views
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr by Grant Ingersoll
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Grant Ingersoll770 views
Bet you didn't know Lucene can... by Grant Ingersoll
Bet you didn't know Lucene can...Bet you didn't know Lucene can...
Bet you didn't know Lucene can...
Grant Ingersoll4.9K views
Starfish: A Self-tuning System for Big Data Analytics by Grant Ingersoll
Starfish: A Self-tuning System for Big Data AnalyticsStarfish: A Self-tuning System for Big Data Analytics
Starfish: A Self-tuning System for Big Data Analytics
Grant Ingersoll1.9K views
Intro to Apache Lucene and Solr by Grant Ingersoll
Intro to Apache Lucene and SolrIntro to Apache Lucene and Solr
Intro to Apache Lucene and Solr
Grant Ingersoll5.2K views
Intelligent Apps with Apache Lucene, Mahout and Friends by Grant Ingersoll
Intelligent Apps with Apache Lucene, Mahout and FriendsIntelligent Apps with Apache Lucene, Mahout and Friends
Intelligent Apps with Apache Lucene, Mahout and Friends
Grant Ingersoll3.6K views

Recently uploaded

Confidence in CloudStack - Aron Wagner, Nathan Gleason - Americ by
Confidence in CloudStack - Aron Wagner, Nathan Gleason - AmericConfidence in CloudStack - Aron Wagner, Nathan Gleason - Americ
Confidence in CloudStack - Aron Wagner, Nathan Gleason - AmericShapeBlue
130 views9 slides
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P... by
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...ShapeBlue
194 views62 slides
The Power of Heat Decarbonisation Plans in the Built Environment by
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built EnvironmentIES VE
79 views20 slides
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... by
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...ShapeBlue
145 views17 slides
Why and How CloudStack at weSystems - Stephan Bienek - weSystems by
Why and How CloudStack at weSystems - Stephan Bienek - weSystemsWhy and How CloudStack at weSystems - Stephan Bienek - weSystems
Why and How CloudStack at weSystems - Stephan Bienek - weSystemsShapeBlue
238 views13 slides
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ... by
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...ShapeBlue
186 views15 slides

Recently uploaded(20)

Confidence in CloudStack - Aron Wagner, Nathan Gleason - Americ by ShapeBlue
Confidence in CloudStack - Aron Wagner, Nathan Gleason - AmericConfidence in CloudStack - Aron Wagner, Nathan Gleason - Americ
Confidence in CloudStack - Aron Wagner, Nathan Gleason - Americ
ShapeBlue130 views
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P... by ShapeBlue
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
ShapeBlue194 views
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE79 views
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... by ShapeBlue
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
ShapeBlue145 views
Why and How CloudStack at weSystems - Stephan Bienek - weSystems by ShapeBlue
Why and How CloudStack at weSystems - Stephan Bienek - weSystemsWhy and How CloudStack at weSystems - Stephan Bienek - weSystems
Why and How CloudStack at weSystems - Stephan Bienek - weSystems
ShapeBlue238 views
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ... by ShapeBlue
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
ShapeBlue186 views
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue by ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueCloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
ShapeBlue135 views
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by TrustArc
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc170 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker54 views
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha... by ShapeBlue
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
ShapeBlue180 views
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T by ShapeBlue
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TCloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
ShapeBlue152 views
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ... by ShapeBlue
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
ShapeBlue184 views
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ... by ShapeBlue
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
ShapeBlue126 views
Digital Personal Data Protection (DPDP) Practical Approach For CISOs by Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash158 views
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit... by ShapeBlue
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
ShapeBlue159 views
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue by ShapeBlue
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlueMigrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue
ShapeBlue218 views

Intro to Apache Mahout

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.

Editor's Notes

  1. Hadoop experience? ML experience?
  2. A few things come to mind
  3. Think about data differently than traditional DB
  4. Helps address the dimensionality problem often associated with machine learning problems