Artificial Intelligence Layer
Mahout, MLLib, & other projects
Víctor Sánchez Anguix
Universitat Politècnica de València
MSc. in Artificial Intelligence, Pattern Recognition, and Digital Image
Course 2014/2015
Artificial Intelligence Layer: Mahout, MLLib & other projects. MSc. in Artificial Intelligence, Pattern Recognition and
Digital Image
So far...
➢ Core technologies like DFS and MR (i.e., Hadoop)
➢ ETL for transforming data (i.e., Pig)
➢ Alternative core/ETL technology (i.e., Spark)
➢ Now we can build AI tools from scratch
Can I save some work with existing code?
Actually, we can...
➢ Write some UDF wrappers for Weka in Pig/Spark
➢ Use connectors to R and Python
➢ Parallelize execution of multiple non-distributed algorithms
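The idea of running several non-distributed algorithms side by side can be sketched without a cluster. The following is a minimal illustration, not the course's code: it fits the same single-machine algorithm (a closed-form one-dimensional ridge regression, chosen only because it is short) for several regularization values at once, with plain Scala futures standing in for Pig or Spark tasks. The names `ParallelFits`, `fitRidge`, and `fitAll` are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical example object; plain Scala futures stand in for cluster tasks
object ParallelFits {
  // Non-distributed algorithm: closed-form 1-D ridge regression without intercept,
  // w = sum(x * y) / (sum(x * x) + lambda)
  def fitRidge(xs: Seq[Double], ys: Seq[Double], lambda: Double): Double = {
    val sxy = xs.zip(ys).map { case (x, y) => x * y }.sum
    val sxx = xs.map(x => x * x).sum
    sxy / (sxx + lambda)
  }

  // Launch one independent fit per lambda and collect lambda -> fitted weight
  def fitAll(xs: Seq[Double], ys: Seq[Double], lambdas: Seq[Double]): Map[Double, Double] = {
    val tasks = lambdas.map(l => Future(l -> fitRidge(xs, ys, l)))
    Await.result(Future.sequence(tasks), 30.seconds).toMap
  }
}
```

For example, `ParallelFits.fitAll(Seq(1.0, 2.0, 3.0), Seq(2.0, 4.0, 6.0), Seq(0.0, 14.0))` runs two fits concurrently. This only speeds up many small runs: each individual fit still has to fit on a single machine.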
But we can do better!
➢ The previous approaches are still problematic if algorithm instances are very big
➢ The wrapped algorithms are not really parallel algorithms
➢ Use parallel algorithms to tackle big problems:
○ Apache Mahout
○ Apache Spark
Apache Mahout
➢ Collection of parallel AI & ML algorithms
➢ MapReduce algorithms, being migrated to Spark
➢ Latest major release: Mahout 0.9 (February 2014)
http://mahout.apache.org/
Apache Mahout: Algorithms
➢ Clustering algorithms:
○ K-means (parallel)
○ Fuzzy K-means (parallel)
○ Spectral K-means (parallel)
➢ Classification algorithms:
○ Logistic regression (non-parallel)
○ Naive Bayes (parallel)
○ Random Forest (parallel)
○ Multilayer perceptron (non-parallel)
Apache Mahout: Algorithms
➢ Dimensionality reduction:
○ Singular Value Decomposition (parallel)
○ PCA (parallel)
○ Lanczos decomposition (parallel)
○ QR decomposition (parallel)
➢ Text algorithms:
○ TF-IDF (parallel)
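As a reminder of what TF-IDF computes, here is a small single-machine sketch of one common variant (raw term count times the natural log of N over document frequency). It is only an illustration: Mahout's parallel implementation may differ in weighting and normalization details, and the names `TfIdfSketch` and `tfIdf` are hypothetical.

```scala
// Hypothetical example object; single-machine illustration only
object TfIdfSketch {
  // docs: each document is a sequence of tokens.
  // Returns, per document, a map term -> tf * idf, where
  // tf = count of the term in the document and
  // idf = ln(N / number of documents containing the term)
  def tfIdf(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
    val n = docs.size.toDouble
    // Document frequency: in how many documents each term appears
    val docFreq: Map[String, Int] =
      docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    docs.map { doc =>
      doc.groupBy(identity).map { case (t, occ) =>
        t -> occ.size * math.log(n / docFreq(t))
      }
    }
  }
}
```

A term that occurs in every document gets weight 0 under this variant, which is why very common words contribute nothing to the resulting vectors.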
Mahout from shell
➢ Just type mahout in the shell
➢ A list of available algorithms will be printed
➢ Typing mahout algorithm_name prints the help for that specific algorithm
➢ Executing distributed algorithms requires Hadoop and a DFS
Mahout example: Item-based Collaborative filtering
➢ mahout recommenditembased:
○ --input: file with user_id item_id rows representing purchases
○ --output: where Mahout should store the results
○ --usersFile: the users we want recommendations for
○ --itemsFile: the items that can be recommended
○ -b (--booleanData): true (in our case, binary data)
○ --similarityClassname: SIMILARITY_LOGLIKELIHOOD or SIMILARITY_TANIMOTO_COEFFICIENT (in our case, binary data)
Mahout example: Item-based Collaborative filtering
➢ Execute:
mahout recommenditembased \
  --input data/purchases_mahout.tsv \
  --output mahout_cf \
  --usersFile data/users_mahout.tsv \
  --itemsFile data/valid_products_mahout.tsv \
  --booleanData \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD
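To make the similarity options more concrete, the sketch below computes the Tanimoto coefficient between two items from binary purchase data: the number of users who bought both items divided by the number who bought either. It is a single-machine illustration of the formula only, not Mahout's distributed implementation, and the names `TanimotoSketch`, `buyersByItem`, and `tanimoto` are hypothetical.

```scala
// Hypothetical example object; illustrates the formula only
object TanimotoSketch {
  // purchases: (user_id, item_id) rows, mirroring the layout of the --input file.
  // Returns the set of buyers of each item.
  def buyersByItem(purchases: Seq[(String, String)]): Map[String, Set[String]] =
    purchases.groupBy(_._2).map { case (item, rows) => item -> rows.map(_._1).toSet }

  // Tanimoto (Jaccard) coefficient between two buyer sets: |A ∩ B| / |A ∪ B|
  def tanimoto(a: Set[String], b: Set[String]): Double = {
    val unionSize = (a | b).size
    if (unionSize == 0) 0.0 else (a & b).size.toDouble / unionSize
  }
}
```

With buyers {u1, u2, u3} for one item and {u2, u3, u4} for another, two users are shared out of four in total, so the coefficient is 0.5. An item-based recommender then suggests items that are most similar to the ones a user already bought.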
MLLib
➢ Machine learning library inside Spark
➢ Completely distributed
➢ It is bundled with Spark!
MLLib: Algorithms
➢ Classification & Regression:
○ Support Vector Machines
○ Logistic Regression
○ Linear Regression
○ Random Forests
➢ Clustering:
○ K-means
MLLib: Algorithms
➢ Dimensionality reduction:
○ Singular Value Decomposition
○ PCA
➢ Collaborative filtering:
○ ALS (alternating least squares) recommender
MLLib: K-Means example
➢ Let us apply K-means on the iris data set
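import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Run inside spark-shell, where sc (the SparkContext) is already defined
val numClusters = 3
val numIteration = 20

// Read the CSV file and split each line into its fields
val data_iris = sc.textFile("hdfs:///user/sanguix/data/iris.csv").map(l => l.split(",", -1))

// Keep the first four fields of each row as a dense numeric vector
val parsedData = data_iris.map(r => Vectors.dense(
  Array(r(0).toDouble, r(1).toDouble, r(2).toDouble, r(3).toDouble))).cache()

// Train K-means with 3 clusters and at most 20 iterations
val clusters = KMeans.train(parsedData, numClusters, numIteration)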
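For intuition about what `KMeans.train` computes, here is a single-machine sketch of Lloyd's iterations in plain Scala. It is an assumption-laden illustration, not MLlib's code: MLlib distributes the work over the cluster and by default picks the initial centers with k-means||, whereas this sketch takes the initial centers from the caller. The names `KMeansSketch`, `sqDist`, `closest`, `mean`, and `train` are hypothetical.

```scala
// Hypothetical example object; not MLlib's implementation
object KMeansSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center closest to p (the lowest index wins ties)
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.reduceLeft { (i, j) =>
      if (sqDist(p, centers(j)) < sqDist(p, centers(i))) j else i
    }

  // Coordinate-wise mean of a non-empty group of points
  def mean(ps: Seq[Point]): Point =
    Array.tabulate(ps.head.length)(d => ps.map(_(d)).sum / ps.size)

  // Lloyd's algorithm with caller-supplied initial centers
  def train(points: Seq[Point], initial: Seq[Point], iterations: Int): Seq[Point] =
    (1 to iterations).foldLeft(initial) { (centers, _) =>
      // Assignment step: group points by their nearest center
      val groups = points.groupBy(p => closest(p, centers))
      // Update step: move each center to the mean of its group;
      // a center that attracted no points keeps its position
      centers.indices.map(i => groups.get(i).map(mean).getOrElse(centers(i)))
    }
}
```

Each iteration alternates an assignment step and an update step. In a distributed setting the assignment step is the part that parallelizes naturally, since every point can be assigned independently.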
Other projects: GraphX
➢ Spark built-in library for graphs
➢ Algorithms:
○ PageRank
○ (Strongly) connected components
○ Label propagation
○ Other basic graph operations
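As a reminder of what PageRank does, the sketch below runs the classic power iteration on a small edge list in plain Scala. It is a single-machine illustration under simplifying assumptions (every vertex has at least one outgoing edge, ranks sum to 1); GraphX's distributed implementation may normalize ranks differently. The names `PageRankSketch` and `pageRank`, and the default damping of 0.85, are choices made for this example.

```scala
// Hypothetical example object; not GraphX's implementation
object PageRankSketch {
  // edges: directed (source, target) pairs. Assumes every vertex has at least
  // one outgoing edge, so there are no dangling vertices to handle.
  def pageRank(edges: Seq[(String, String)], iterations: Int,
               damping: Double = 0.85): Map[String, Double] = {
    val vertices = edges.flatMap { case (s, t) => Seq(s, t) }.distinct
    val n = vertices.size
    val outDegree = edges.groupBy(_._1).map { case (s, es) => s -> es.size }
    // Start from a uniform distribution over the vertices
    val start = vertices.map(v => v -> 1.0 / n).toMap
    (1 to iterations).foldLeft(start) { (ranks, _) =>
      // Each vertex splits its current rank evenly among its out-neighbours
      val received = edges
        .map { case (s, t) => t -> ranks(s) / outDegree(s) }
        .groupBy(_._1)
        .map { case (t, cs) => t -> cs.map(_._2).sum }
      // Combine the random-jump term with the rank received over edges
      vertices.map(v => v -> ((1 - damping) / n + damping * received.getOrElse(v, 0.0))).toMap
    }
  }
}
```

On a directed cycle such as a → b → c → a, every vertex ends up with the same rank, as expected by symmetry.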
Other projects: Giraph
➢ Graph framework over Hadoop
➢ Specialized for building algorithms for graphs
➢ Latest major release: Giraph 1.1.0 (Nov. 2014)
http://giraph.apache.org/
Other projects: GraphLab
➢ Distributed framework for machine learning
➢ Originally created at Carnegie Mellon
➢ Algorithms:
○ Collaborative filtering
○ Text analysis
○ PageRank
○ Deep learning
➢ Latest release: GraphLab 2.2 (July 2013)
https://github.com/graphlab-code/graphlab
Other projects: H2O
➢ ML library on top of Hadoop/Spark
➢ Algorithms:
○ Random Forests
○ Generalized Linear Model
○ Deep learning
○ K-Means
➢ Latest release: H2O 2.8.4.4 (February 2015)
https://github.com/h2oai/h2o-dev
Other projects: Apache Flink
➢ Large-scale data processing engine in Java/Scala
➢ In-memory collections
➢ Latest release: Flink 0.8.0 (January 2015)
http://flink.apache.org/
Extra information
➢ Mahout in Action. Sean Owen. Ed. Manning Publications (2011)
➢ Apache Mahout Cookbook. Piero Giacomelli. Ed. Packt Publishing (2013)
➢ StackOverflow

Artificial Intelligence Layer: Mahout, MLLib, and other projects

  • 1. Artificial Intelligence Layer Mahout, MLLib, & other projects Víctor Sánchez Anguix Universitat Politècnica de València MSc. In Artificial Intelligence, Pattern Recognition, and Digital Image Course 2014/2015
  • 2. Artificial Intelligence Layer: Mahout, MLLib & other projects. MSc. in Artificial Intelligence, Pattern Recognition and Digital Image ➢ Core technologies like DFS and MR (i.e., Hadoop) ➢ ETL for transforming data (i.e., Pig) ➢ Alternative core/ETL technology (i.e., Spark) ➢ Now we can build AI tools from scratch So far...
  • 3. Artificial Intelligence Layer: Mahout, MLLib & other projects. MSc. in Artificial Intelligence, Pattern Recognition and Digital Image Can I save some work with existing code?
  • 4. Artificial Intelligence Layer: Mahout, MLLib & other projects. MSc. in Artificial Intelligence, Pattern Recognition and Digital Image ➢ Write some UDF wrappers for Weka in Pig/Spark ➢ Use connectors to R and Python ➢ Parallelize execution of multiple non- distributed algorithms Actually, we can...
  • 5. But we can do better!
      ➢ Still problematic if problem instances are very big
      ➢ They are not really parallel algorithms
      ➢ Use parallel algorithms to tackle big problems:
          ○ Apache Mahout
          ○ Apache Spark
  • 6. Apache Mahout
      ➢ Collection of parallel AI & ML algorithms
      ➢ MapReduce algorithms → Spark
      ➢ Latest major release: Mahout 0.9 (February 2014)
      http://mahout.apache.org/
  • 7. Apache Mahout: Algorithms
      ➢ Clustering algorithms:
          ○ K-means (parallel)
          ○ Fuzzy K-means (parallel)
          ○ Spectral K-means (parallel)
      ➢ Classification algorithms:
          ○ Logistic regression (non parallel)
          ○ Naive Bayes (parallel)
          ○ Random Forest (parallel)
          ○ Multilayer perceptron (non parallel)
  • 8. Apache Mahout: Algorithms
      ➢ Dimensionality reduction:
          ○ Singular Value Decomposition (parallel)
          ○ PCA (parallel)
          ○ Lanczos decomposition (parallel)
          ○ QR decomposition (parallel)
      ➢ Text algorithms:
          ○ TF-IDF (parallel)
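To make the TF-IDF entry concrete, here is a minimal single-machine sketch of a textbook variant: raw term count times ln(N / df). It is an assumption that this variant is good enough for illustration; Mahout's own weighting may differ, and `TfIdfSketch` and `tfIdf` are hypothetical names.

```scala
object TfIdfSketch {
  // docs: each document is a sequence of tokens.
  // Returns, per document, a map term -> tf * ln(N / df), where tf is the raw
  // count in the document and df the number of documents containing the term.
  def tfIdf(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
    val n = docs.size.toDouble
    // Document frequency of every term.
    val df: Map[String, Int] =
      docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    docs.map { doc =>
      doc.groupBy(identity).map { case (t, occ) =>
        t -> (occ.size * math.log(n / df(t)))
      }
    }
  }
}
```

A term that appears in every document gets weight 0, which is the point of the IDF factor: common terms carry little information.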
  • 9. Mahout from shell
      ➢ Just type mahout in the shell
      ➢ A list of available algorithms will be printed
      ➢ Typing mahout algorithm_name will print the help for the specific algorithm
      ➢ Executing distributed algorithms requires Hadoop and a DFS
  • 10. Mahout example: Item-based Collaborative filtering
      ➢ mahout recommenditembased:
          ○ --input: file with user_id item_id rows to represent purchases
          ○ --output: where mahout should store results
          ○ --usersFile: users we should compute recommendations for
          ○ --itemsFile: items we are allowed to recommend
          ○ -b (--booleanData): true (in our case, binary data)
          ○ --similarityClassname: SIMILARITY_LOGLIKELIHOOD or SIMILARITY_TANIMOTO_COEFFICIENT (in our case, binary data)
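For binary purchase data, the Tanimoto coefficient between two items is the size of the intersection of their buyer sets divided by the size of the union. The sketch below is a local illustration of that formula, not Mahout's distributed implementation; `TanimotoSketch` and its parameter names are hypothetical.

```scala
object TanimotoSketch {
  // Tanimoto (Jaccard) coefficient between the sets of users who bought
  // each item: |A ∩ B| / |A ∪ B|. Defined here as 0.0 when both sets are empty.
  def tanimoto(buyersA: Set[Int], buyersB: Set[Int]): Double = {
    val unionSize = (buyersA | buyersB).size
    if (unionSize == 0) 0.0
    else (buyersA & buyersB).size.toDouble / unionSize
  }
}
```

For example, items bought by users {1, 2, 3} and {2, 3, 4} share two of four distinct buyers, so their similarity is 0.5.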
  • 11. Mahout example: Item-based Collaborative filtering
      ➢ Execute:
          mahout recommenditembased \
            --input data/purchases_mahout.tsv \
            --output mahout_cf \
            --usersFile data/users_mahout.tsv \
            --itemsFile data/valid_products_mahout.tsv \
            --booleanData \
            --similarityClassname SIMILARITY_LOGLIKELIHOOD
  • 12. MLLib
      ➢ Machine learning library inside Spark
      ➢ Completely distributed
      ➢ It is bundled with Spark!
  • 13. MLLib: Algorithms
      ➢ Classification & Regression:
          ○ Support Vector Machines
          ○ Logistic Regression
          ○ Linear Regression
          ○ Random Forests
      ➢ Clustering:
          ○ K-means
  • 14. MLLib: Algorithms
      ➢ Dimensionality reduction:
          ○ Singular Value Decomposition
          ○ PCA
      ➢ Clustering:
          ○ K-means
      ➢ Collaborative filtering:
          ○ ALS item-based recommender
  • 15. MLLib: K-Means example
      ➢ Let us apply K-means on the iris data set

          import org.apache.spark.mllib.clustering.KMeans
          import org.apache.spark.mllib.linalg.Vectors

          val numClusters = 3
          val numIteration = 20
          // sc is the SparkContext provided by the Spark shell.
          // Read the CSV from HDFS and split every line into its fields.
          val data_iris = sc.textFile("hdfs:///user/sanguix/data/iris.csv").map(l => l.split(",", -1))
          // Keep the four numeric features of every row as a dense vector.
          val parsedData = data_iris.map(r =>
            Vectors.dense(Array(r(0).toDouble, r(1).toDouble, r(2).toDouble, r(3).toDouble))
          ).cache()
          // Train K-means with 3 clusters and at most 20 iterations.
          val clusters = KMeans.train(parsedData, numClusters, numIteration)
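What each K-means iteration does can be sketched locally in plain Scala: assign every point to its nearest center, then move each center to the mean of its points. This is a sketch of one standard Lloyd step on in-memory data, not MLLib's distributed implementation (which may also initialize the centers differently); `LloydSketch` and its function names are hypothetical.

```scala
object LloydSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center nearest to p.
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => sqDist(p, centers(i)))

  // One iteration: assign points to centers, then recompute each center as
  // the mean of its assigned points. A center with no points is left unchanged.
  def step(points: Seq[Point], centers: Seq[Point]): Seq[Point] = {
    val groups = points.groupBy(p => closest(p, centers))
    centers.indices.map { i =>
      groups.get(i) match {
        case Some(ps) =>
          Vector.tabulate(centers(i).length)(d => ps.map(_(d)).sum / ps.size)
        case None => centers(i)
      }
    }
  }
}
```

Repeating `step` until the centers stop moving (or a maximum number of iterations is reached, as with `numIteration` above) gives the basic algorithm.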
  • 16. Other projects: GraphX
      ➢ Spark built-in library for graphs
      ➢ Algorithms:
          ○ PageRank
          ○ (Strongly) connected components
          ○ Label propagation
          ○ Other basic graph operations
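As an illustration of the first algorithm in the list, here is a small in-memory PageRank written in plain Scala. It is the normalized textbook formulation and only a sketch: it assumes every vertex has at least one outgoing edge, the damping factor 0.85 is a conventional choice rather than something the slides specify, and it does not use or reproduce the GraphX API.

```scala
object PageRankSketch {
  // edges: (source, destination) pairs.
  // Assumption: every vertex has at least one outgoing edge.
  def pageRank(edges: Seq[(Int, Int)], iterations: Int, damping: Double = 0.85): Map[Int, Double] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    val n = vertices.size
    val outDegree = edges.groupBy(_._1).map { case (s, es) => s -> es.size }
    // Start from a uniform distribution over the vertices.
    var ranks = vertices.map(v => v -> (1.0 / n)).toMap
    for (_ <- 1 to iterations) {
      // Each vertex sends rank / outDegree along each of its outgoing edges.
      val contribs = edges
        .map { case (s, d) => d -> (ranks(s) / outDegree(s)) }
        .groupBy(_._1)
        .map { case (d, cs) => d -> cs.map(_._2).sum }
      ranks = vertices.map { v =>
        v -> ((1 - damping) / n + damping * contribs.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

On a directed cycle every vertex keeps the same rank, which is a quick sanity check for the update rule.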
  • 17. Other projects: Giraph
      ➢ Graph framework over Hadoop
      ➢ Specialized for building algorithms for graphs
      ➢ Latest major release: Giraph 1.1.0 (Nov. 2014)
      http://giraph.apache.org/
  • 18. Other projects: GraphLab
      ➢ Distributed framework for machine learning
      ➢ Originally created at Carnegie Mellon
      ➢ Algorithms:
          ○ Collaborative filtering
          ○ Text analysis
          ○ PageRank
          ○ Deep learning
      ➢ Latest release: GraphLab 2.2 (July 2013)
      https://github.com/graphlab-code/graphlab
  • 19. Other projects: H2O
      ➢ ML library on top of Hadoop/Spark
      ➢ Algorithms:
          ○ Random Forests
          ○ Generalized Linear Model
          ○ Deep learning
          ○ K-Means
      ➢ Latest release: H2O 2.8.4.4 (February 2015)
      https://github.com/h2oai/h2o-dev
  • 20. Other projects: Apache Flink
      ➢ Large scale data processing engine in Java/Scala
      ➢ In-memory collections
      ➢ Latest release: Flink 0.8.0 (January 2015)
      http://flink.apache.org/
  • 21. Extra information
      ➢ Mahout in Action. Sean Owen et al. Ed. Manning Publications (2011)
      ➢ Apache Mahout Cookbook. Piero Giacomelli. Ed. Packt Publishing (2013)
      ➢ StackOverflow