SlideShare a Scribd company logo
1 of 34
Machine Learning
con Apache Mahout
  Domingo Suarez Torres
Machine Learning (ML)
        Introduction
Definition

     • Machine learning, a branch of artificial
        intelligence, is a scientific discipline
        concerned with the design and
        development of algorithms that allow
        computers to evolve behaviors based on
        empirical data (1)


1http://en.wikipedia.org/wiki/Machine_learning
• “Machine Learning is programming
  computers to optimize a performance
  criterion using example data or past
  experience”
 • Intro. To Machine Learning by E. Alpaydin
Applications
•   Recommend friends/dates/        •   Detect anomalies in machine
    products                            output

•   Classify content into           •   Ranking search results
    predefined groups
                                    •   Fraud detection
•   Find similar content based
    on object properties            •   Spam detection

•   Find associations/patterns in   •   Medical diagnostics
    actions/behaviors
                                    •   Translators
•   Identify key topics in large
    collections of text             •   Much more¡
Math

• Stadistics
• Discrete Math
• Linear algebra
• Probability
Starting with ML
•   Get your data
•   Decide on your features per your algorithm
•   Prep the data
    •   Different approaches for different algorithms
•   Run your algorithm(s)
    •   Lather, rinse, repeat
•   Validate your results
    •   Smell test, A/B testing
Apache Mahout

• Machine Learning library. Platform?
• Extensible, we can use our own algorithm.
• Hadoop support
• 2005. Taste Framework
• 2008. Included in Lucene
Scalability
•   Huge amount of data, growing every second¡
•   Be as fast and efficient as possible given the intrinsic design of
    the algorithm
    •   Some algorithms won’t scale to massive machine clusters
    •   Others fit logically on a Map Reduce framework like
        Apache Hadoop
    •   Still others will need alternative distributed programming
        models
    •   Be pragmatic
•   Most Mahout implementations are Map Reduce enabled
Who uses Mahout?
Components

• Recommender Engines (collaborative
  filtering, content-based)
• Clustering
• Classification
When to use?
• Recommendation
 • Rank large datasets
• Clustering
 • Group your data
• Classification
 • Train me to think like you
Recommenders
•   Given a data set. Make a recomendation.
    •   Item recomendation (Book, Movie, etc)
•   Ranking based
•   Recomendations
    •   User based
    •   Item based
•   knowledge of user’s relationships to items (user
    preferences)
Colaborative filtering
• User based
• Item based
• Both techniques require no knowledge of
  the properties of the items themselves.
• Item Type is irrelevant. Apache Mahout is
  happy
17
Content based
• Domain-specific approaches
• Hard to meaningfully codify into a
  framework
• We are responsables of choosing which
  item's attributes to use.
• Apache Mahout can’t handle this out-of-
  the-box, but can built on top.
Making recommendations

 • What we need?
  • Input data
  • Neighborhood
  • Similarity
Input Data
•   In Mahout terms: Preferences
•   A preference contains:
    •   User ID
    •   Item ID
    •   Preference value
    •   Example:
        •   1,101,5.0
        •   USER ID: 1, ITEM ID: 101, PrefValue: 5.0
21
Neighborhood
Nearest N Users    Threshold
Similarity
Clustering

• Surface naturally occurring groups of data
• A notion of similarity (and dissimilarity)
• Algorithms do not require training
• Stopping condition - iterate until close
  enough
Clustering
•   Document level
    •   Group documents based on a notion of similarity
    •   K-Means, Fuzzy K-Means, Dirichlet, Canopy, Mean-Shift
    •   Distance Measures
    •   Manhattan, Euclidean, other
•   Topic Modeling
    •   Cluster words across documents to identify topics
    •   Latent Dirichlet Allocation
Classification

• Require training (supervised)
• Make a single decision with a very limited
  set of outcomes
• Typical answers naturally fit into categories
Classification samples

• Credit card fraud prediction
• Customer attrition
• Diabetes detector
• Search Engine
Mahout/Hadoop
• For large data sets
• Online
• Offline (Hadoop prefered)
• You can build your solution with Mahout
• Take a look into Weka
 • http://www.cs.waikato.ac.nz/ml/weka/
Resources
Resources
Resources
Join us¡
• GIAMA.
 • Agustin Ramos iniciative

More Related Content

What's hot

Cloud Computing Business Models
Cloud Computing Business ModelsCloud Computing Business Models
Cloud Computing Business ModelsKarri Huhtanen
 
Virtualization and cloud Computing
Virtualization and cloud ComputingVirtualization and cloud Computing
Virtualization and cloud ComputingRishikese MR
 
SDPM - Lecture 2 -The STEP WISE Approach to Project Planning
SDPM - Lecture 2 -The STEP WISE Approach to Project PlanningSDPM - Lecture 2 -The STEP WISE Approach to Project Planning
SDPM - Lecture 2 -The STEP WISE Approach to Project PlanningOpenLearningLab
 
Cloud computing writeup
Cloud computing writeupCloud computing writeup
Cloud computing writeupselvavijay1987
 
Software Project Management( lecture 1)
Software Project Management( lecture 1)Software Project Management( lecture 1)
Software Project Management( lecture 1)Syed Muhammad Hammad
 
Cloud Analytics - Using cloud based services to analyse big data
Cloud Analytics - Using cloud based services to analyse big dataCloud Analytics - Using cloud based services to analyse big data
Cloud Analytics - Using cloud based services to analyse big dataDavid Parsons
 
Project Planning in Software Engineering
Project Planning in Software EngineeringProject Planning in Software Engineering
Project Planning in Software EngineeringFáber D. Giraldo
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computinghuda2018
 
JBoss EAP / WildFly, State of the Union
JBoss EAP / WildFly, State of the UnionJBoss EAP / WildFly, State of the Union
JBoss EAP / WildFly, State of the UnionDimitris Andreadis
 
Cloud computing Risk management
Cloud computing Risk management  Cloud computing Risk management
Cloud computing Risk management Padma Jella
 
Google app engine
Google app engineGoogle app engine
Google app engineSuraj Mehta
 
Software Project Management ppt
Software Project Management pptSoftware Project Management ppt
Software Project Management pptAndreea Usatenco
 

What's hot (20)

Cloud Computing Business Models
Cloud Computing Business ModelsCloud Computing Business Models
Cloud Computing Business Models
 
Trabalho de Pesquisa Service Desk
Trabalho de Pesquisa Service DeskTrabalho de Pesquisa Service Desk
Trabalho de Pesquisa Service Desk
 
Cloud Computing Tools
Cloud Computing ToolsCloud Computing Tools
Cloud Computing Tools
 
Virtualization and cloud Computing
Virtualization and cloud ComputingVirtualization and cloud Computing
Virtualization and cloud Computing
 
SDPM - Lecture 2 -The STEP WISE Approach to Project Planning
SDPM - Lecture 2 -The STEP WISE Approach to Project PlanningSDPM - Lecture 2 -The STEP WISE Approach to Project Planning
SDPM - Lecture 2 -The STEP WISE Approach to Project Planning
 
Cloud computing writeup
Cloud computing writeupCloud computing writeup
Cloud computing writeup
 
Software Project Management( lecture 1)
Software Project Management( lecture 1)Software Project Management( lecture 1)
Software Project Management( lecture 1)
 
Cloud Analytics - Using cloud based services to analyse big data
Cloud Analytics - Using cloud based services to analyse big dataCloud Analytics - Using cloud based services to analyse big data
Cloud Analytics - Using cloud based services to analyse big data
 
Open Source Cloud
Open Source CloudOpen Source Cloud
Open Source Cloud
 
Project Planning in Software Engineering
Project Planning in Software EngineeringProject Planning in Software Engineering
Project Planning in Software Engineering
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
JBoss EAP / WildFly, State of the Union
JBoss EAP / WildFly, State of the UnionJBoss EAP / WildFly, State of the Union
JBoss EAP / WildFly, State of the Union
 
Cloud computing Risk management
Cloud computing Risk management  Cloud computing Risk management
Cloud computing Risk management
 
Case tools
Case tools Case tools
Case tools
 
Cluster and Grid Computing
Cluster and Grid ComputingCluster and Grid Computing
Cluster and Grid Computing
 
Google app engine
Google app engineGoogle app engine
Google app engine
 
Public cloud
Public cloudPublic cloud
Public cloud
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
 
Software Project Management ppt
Software Project Management pptSoftware Project Management ppt
Software Project Management ppt
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
 

Viewers also liked

Viewers also liked (6)

SGCE 2015 REST APIs
SGCE 2015 REST APIsSGCE 2015 REST APIs
SGCE 2015 REST APIs
 
Serling dev team, development process
Serling dev team, development processSerling dev team, development process
Serling dev team, development process
 
SGCE 2012 Lightning Talk-Single Page Interface
SGCE 2012 Lightning Talk-Single Page InterfaceSGCE 2012 Lightning Talk-Single Page Interface
SGCE 2012 Lightning Talk-Single Page Interface
 
SGNext Elasticsearch
SGNext ElasticsearchSGNext Elasticsearch
SGNext Elasticsearch
 
JVM Reactive Programming
JVM Reactive ProgrammingJVM Reactive Programming
JVM Reactive Programming
 
SGCE 2014 micro services
SGCE 2014 micro servicesSGCE 2014 micro services
SGCE 2014 micro services
 

Similar to Machine Learning & Apache Mahout

Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Cloudera, Inc.
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningVarad Meru
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Groupbotsplash.com
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist ToolboxAndrei Savu
 
Tutorial Mahout - Recommendation
Tutorial Mahout - RecommendationTutorial Mahout - Recommendation
Tutorial Mahout - RecommendationCataldo Musto
 
Download Materials
Download MaterialsDownload Materials
Download Materialsbutest
 
Building NLP solutions using Python
Building NLP solutions using PythonBuilding NLP solutions using Python
Building NLP solutions using Pythonbotsplash.com
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabSri Ambati
 
Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Cataldo Musto
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutKorea Sdec
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Lucidworks
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningJoaquin Delgado PhD.
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningS. Diana Hu
 
The Art of Intelligence – Introduction Machine Learning for Java professional...
The Art of Intelligence – Introduction Machine Learning for Java professional...The Art of Intelligence – Introduction Machine Learning for Java professional...
The Art of Intelligence – Introduction Machine Learning for Java professional...Lucas Jellema
 
Workshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital HumanitiesWorkshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital HumanitiesHelen Bailey
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Sonya Liberman
 

Similar to Machine Learning & Apache Mahout (20)

Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Group
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Tutorial Mahout - Recommendation
Tutorial Mahout - RecommendationTutorial Mahout - Recommendation
Tutorial Mahout - Recommendation
 
Download Materials
Download MaterialsDownload Materials
Download Materials
 
Building NLP solutions using Python
Building NLP solutions using PythonBuilding NLP solutions using Python
Building NLP solutions using Python
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)Mahout Tutorial and Hands-on (version 2015)
Mahout Tutorial and Hands-on (version 2015)
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of Mahout
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
 
Apache Mahout
Apache MahoutApache Mahout
Apache Mahout
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
machine learning
machine learningmachine learning
machine learning
 
The Art of Intelligence – Introduction Machine Learning for Java professional...
The Art of Intelligence – Introduction Machine Learning for Java professional...The Art of Intelligence – Introduction Machine Learning for Java professional...
The Art of Intelligence – Introduction Machine Learning for Java professional...
 
Workshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital HumanitiesWorkshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital Humanities
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 

More from Domingo Suarez Torres

Cloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
Cloud Native MX Meetup - Asegurando tu Cluster de KubernetesCloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
Cloud Native MX Meetup - Asegurando tu Cluster de KubernetesDomingo Suarez Torres
 
Java Dev Day 2019 No kuberneteen por convivir
Java Dev Day 2019  No kuberneteen por convivirJava Dev Day 2019  No kuberneteen por convivir
Java Dev Day 2019 No kuberneteen por convivirDomingo Suarez Torres
 
Retos en la arquitectura de Microservicios
Retos en la arquitectura de MicroserviciosRetos en la arquitectura de Microservicios
Retos en la arquitectura de MicroserviciosDomingo Suarez Torres
 
DevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
DevFest Lima Corriendo cargas e trabajo seguras en GKE con IstioDevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
DevFest Lima Corriendo cargas e trabajo seguras en GKE con IstioDomingo Suarez Torres
 
Cloud Native Mexico - Introducción a Kubernetes
Cloud Native Mexico - Introducción a KubernetesCloud Native Mexico - Introducción a Kubernetes
Cloud Native Mexico - Introducción a KubernetesDomingo Suarez Torres
 
Meetup DigitalOcean Cloud Native architecture
Meetup DigitalOcean Cloud Native architectureMeetup DigitalOcean Cloud Native architecture
Meetup DigitalOcean Cloud Native architectureDomingo Suarez Torres
 
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y EnvoyCloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y EnvoyDomingo Suarez Torres
 
Cloud Native Mexico Meetup enero 2018 Observability
Cloud Native Mexico Meetup enero 2018 ObservabilityCloud Native Mexico Meetup enero 2018 Observability
Cloud Native Mexico Meetup enero 2018 ObservabilityDomingo Suarez Torres
 
Orquestación de contenedores con Kubernetes SGNext
Orquestación de contenedores con Kubernetes SGNextOrquestación de contenedores con Kubernetes SGNext
Orquestación de contenedores con Kubernetes SGNextDomingo Suarez Torres
 
Webinar Arquitectura de Microservicios
Webinar Arquitectura de MicroserviciosWebinar Arquitectura de Microservicios
Webinar Arquitectura de MicroserviciosDomingo Suarez Torres
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Domingo Suarez Torres
 

More from Domingo Suarez Torres (20)

Cloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
Cloud Native MX Meetup - Asegurando tu Cluster de KubernetesCloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
Cloud Native MX Meetup - Asegurando tu Cluster de Kubernetes
 
Java Dev Day 2019 No kuberneteen por convivir
Java Dev Day 2019  No kuberneteen por convivirJava Dev Day 2019  No kuberneteen por convivir
Java Dev Day 2019 No kuberneteen por convivir
 
Contenedores 101 Digital Ocean CDMX
Contenedores 101 Digital Ocean CDMXContenedores 101 Digital Ocean CDMX
Contenedores 101 Digital Ocean CDMX
 
Retos en la arquitectura de Microservicios
Retos en la arquitectura de MicroserviciosRetos en la arquitectura de Microservicios
Retos en la arquitectura de Microservicios
 
Java Cloud Native Hack Nights GDL
Java Cloud Native Hack Nights GDLJava Cloud Native Hack Nights GDL
Java Cloud Native Hack Nights GDL
 
meetup digital ocean kubernetes
meetup digital ocean kubernetesmeetup digital ocean kubernetes
meetup digital ocean kubernetes
 
Peru JUG Micronaut & GraalVM
Peru JUG Micronaut & GraalVMPeru JUG Micronaut & GraalVM
Peru JUG Micronaut & GraalVM
 
DevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
DevFest Lima Corriendo cargas e trabajo seguras en GKE con IstioDevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
DevFest Lima Corriendo cargas e trabajo seguras en GKE con Istio
 
Cloud Native Development in the JVM
Cloud Native Development in the JVMCloud Native Development in the JVM
Cloud Native Development in the JVM
 
Cloud Native Mexico - Introducción a Kubernetes
Cloud Native Mexico - Introducción a KubernetesCloud Native Mexico - Introducción a Kubernetes
Cloud Native Mexico - Introducción a Kubernetes
 
Meetup DigitalOcean Cloud Native architecture
Meetup DigitalOcean Cloud Native architectureMeetup DigitalOcean Cloud Native architecture
Meetup DigitalOcean Cloud Native architecture
 
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y EnvoyCloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
Cloud Native Mexico Meetup de Marzo 2018 Service Mesh con Istio y Envoy
 
Cloud Native Mexico Meetup enero 2018 Observability
Cloud Native Mexico Meetup enero 2018 ObservabilityCloud Native Mexico Meetup enero 2018 Observability
Cloud Native Mexico Meetup enero 2018 Observability
 
Cloud Native Mexico Presentacion
Cloud Native Mexico PresentacionCloud Native Mexico Presentacion
Cloud Native Mexico Presentacion
 
gRPC: Beyond REST
gRPC: Beyond RESTgRPC: Beyond REST
gRPC: Beyond REST
 
Devops Landscape
Devops LandscapeDevops Landscape
Devops Landscape
 
Orquestación de contenedores con Kubernetes SGNext
Orquestación de contenedores con Kubernetes SGNextOrquestación de contenedores con Kubernetes SGNext
Orquestación de contenedores con Kubernetes SGNext
 
Webinar Arquitectura de Microservicios
Webinar Arquitectura de MicroserviciosWebinar Arquitectura de Microservicios
Webinar Arquitectura de Microservicios
 
Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016Elasticsearch JVM-MX Meetup April 2016
Elasticsearch JVM-MX Meetup April 2016
 
Ratpack JVM_MX Meetup February 2016
Ratpack JVM_MX Meetup February 2016Ratpack JVM_MX Meetup February 2016
Ratpack JVM_MX Meetup February 2016
 

Recently uploaded

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 

Recently uploaded (20)

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

Machine Learning & Apache Mahout

  • 1. Machine Learning con Apache Mahout Domingo Suarez Torres
  • 2. Machine Learning (ML) Introduction
  • 3. Definition • Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data (1) 1http://en.wikipedia.org/wiki/Machine_learning
  • 4. • “Machine Learning is programming computers to optimize a performance criterion using example data or past experience” • Intro. To Machine Learning by E. Alpaydin
  • 5. Applications • Recommend friends/dates/ • Detect anomalies in machine products output • Classify content into • Ranking search results predefined groups • Fraud detection • Find similar content based on object properties • Spam detection • Find associations/patterns in • Medical diagnostics actions/behaviors • Translators • Identify key topics in large collections of text • Much more¡
  • 6. Math • Stadistics • Discrete Math • Linear algebra • Probability
  • 7.
  • 8. Starting with ML • Get your data • Decide on your features per your algorithm • Prep the data • Different approaches for different algorithms • Run your algorithm(s) • Lather, rinse, repeat • Validate your results • Smell test, A/B testing
  • 9. Apache Mahout • Machine Learning library. Platform? • Extensible, we can use our own algorithm. • Hadoop support • 2005. Taste Framework • 2008. Included in Lucene
  • 10. Scalability • Huge amount of data, growing every second¡ • Be as fast and efficient as possible given the intrinsic design of the algorithm • Some algorithms won’t scale to massive machine clusters • Others fit logically on a Map Reduce framework like Apache Hadoop • Still others will need alternative distributed programming models • Be pragmatic • Most Mahout implementations are Map Reduce enabled
  • 12. Components • Recommender Engines (collaborative filtering, content-based) • Clustering • Classification
  • 13. When to use? • Recommendation • Rank large datasets • Clustering • Group your data • Classification • Train me to think like you
  • 14. Recommenders • Given a data set. Make a recomendation. • Item recomendation (Book, Movie, etc) • Ranking based • Recomendations • User based • Item based • knowledge of user’s relationships to items (user preferences)
  • 15.
  • 16. Colaborative filtering • User based • Item based • Both techniques require no knowledge of the properties of the items themselves. • Item Type is irrelevant. Apache Mahout is happy
  • 17. 17
  • 18. Content based • Domain-specific approaches • Hard to meaningfully codify into a framework • We are responsables of choosing which item's attributes to use. • Apache Mahout can’t handle this out-of- the-box, but can built on top.
  • 19. Making recommendations • What we need? • Input data • Neighborhood • Similarity
  • 20. Input Data • In Mahout terms: Preferences • A preference contains: • User ID • Item ID • Preference value • Example: • 1,101,5.0 • USER ID: 1, ITEM ID: 101, PrefValue: 5.0
  • 21. 21
  • 22.
  • 25. Clustering • Surface naturally occurring groups of data • A notion of similarity (and dissimilarity) • Algorithms do not require training • Stopping condition - iterate until close enough
  • 26. Clustering • Document level • Group documents based on a notion of similarity • K-Means, Fuzzy K-Means, Dirichlet, Canopy, Mean-Shift • Distance Measures • Manhattan, Euclidean, other • Topic Modeling • Cluster words across documents to identify topics • Latent Dirichlet Allocation
  • 27. Classification • Require training (supervised) • Make a single decision with a very limited set of outcomes • Typical answers naturally fit into categories
  • 28. Classification samples • Credit card fraud prediction • Customer attrition • Diabetes detector • Search Engine
  • 29. Mahout/Hadoop • For large data sets • Online • Offline (Hadoop prefered) • You can build your solution with Mahout • Take a look into Weka • http://www.cs.waikato.ac.nz/ml/weka/
  • 33.
  • 34. Join us¡ • GIAMA. • Agustin Ramos iniciative

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n
  32. \n
  33. \n
  34. \n