Machine Learning
with Apache Mahout
  Domingo Suarez Torres
Machine Learning (ML)
        Introduction
Definition

     • Machine learning, a branch of artificial
        intelligence, is a scientific discipline
        concerned with the design and
        development of algorithms that allow
        computers to evolve behaviors based on
        empirical data (1)


(1) http://en.wikipedia.org/wiki/Machine_learning
• “Machine Learning is programming
  computers to optimize a performance
  criterion using example data or past
  experience”
 • Introduction to Machine Learning, by E. Alpaydin
Applications
•   Recommend friends/dates/products
•   Classify content into predefined groups
•   Find similar content based on object properties
•   Find associations/patterns in actions/behaviors
•   Identify key topics in large collections of text
•   Detect anomalies in machine output
•   Ranking search results
•   Fraud detection
•   Spam detection
•   Medical diagnostics
•   Translators
•   Much more!
Math

• Statistics
• Discrete Math
• Linear algebra
• Probability
Starting with ML
•   Get your data
•   Decide on your features per your algorithm
•   Prep the data
    •   Different approaches for different algorithms
•   Run your algorithm(s)
    •   Lather, rinse, repeat
•   Validate your results
    •   Smell test, A/B testing
Apache Mahout

• Machine Learning library. Platform?
• Extensible: we can plug in our own algorithms.
• Hadoop support
• 2005: Taste framework (collaborative filtering)
• 2008: Mahout starts as an Apache Lucene subproject
Scalability
•   Huge amounts of data, growing every second!
•   Be as fast and efficient as possible given the intrinsic design of
    the algorithm
    •   Some algorithms won’t scale to massive machine clusters
    •   Others fit logically on a MapReduce framework like
        Apache Hadoop
    •   Still others will need alternative distributed programming
        models
    •   Be pragmatic
•   Most Mahout implementations are MapReduce enabled
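
To make the MapReduce fit concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop or Mahout code. It assumes records of the form `userID,itemID,value` and computes the mean preference per item in three steps: map, group by key, reduce. The function names and the sample records are hypothetical.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (item_id, value) for one 'userID,itemID,value' record."""
    _user_id, item_id, value = record.split(",")
    yield int(item_id), float(value)

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(item_id, values):
    """Collapse all values for one item into its mean preference."""
    return item_id, sum(values) / len(values)

# Hypothetical sample records.
records = ["1,101,5.0", "2,101,3.0", "2,102,4.0"]
pairs = [pair for record in records for pair in map_phase(record)]
averages = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(averages)
```

Each map call looks at one record and each reduce call looks at one key, so both phases can be spread across machines. Algorithms that need global state on every step are the ones that fit this model poorly.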
Who uses Mahout?
Components

• Recommender Engines (collaborative
  filtering, content-based)
• Clustering
• Classification
When to use?
• Recommendation
 • Rank large datasets
• Clustering
 • Group your data
• Classification
 • Train me to think like you
Recommenders
•   Given a data set, make a recommendation.
    •   Item recommendation (book, movie, etc.)
•   Ranking based
•   Recommendations
    •   User based
    •   Item based
•   Knowledge of users’ relationships to items (user
    preferences)
Collaborative filtering
• User based
• Item based
• Both techniques require no knowledge of
  the properties of the items themselves.
• Item Type is irrelevant. Apache Mahout is
  happy
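
The point that item properties are never needed can be seen in a small user-based sketch. This is plain Python under stated assumptions, not the Mahout API: `recommend`, the neighbour list, and all sample values are hypothetical, and the similarity scores are taken as given. Items appear only as opaque IDs.

```python
def recommend(target_prefs, neighbours, top_n=3):
    """Rank items the target user has not rated.

    neighbours: list of (similarity, prefs) pairs for similar users.
    An unseen item's score is the similarity-weighted mean of the
    neighbours' preference values for it.
    """
    weighted_sum = {}
    weight_total = {}
    for similarity, prefs in neighbours:
        if similarity <= 0:
            continue  # ignore dissimilar users
        for item_id, value in prefs.items():
            if item_id in target_prefs:
                continue  # already rated by the target user
            weighted_sum[item_id] = weighted_sum.get(item_id, 0.0) + similarity * value
            weight_total[item_id] = weight_total.get(item_id, 0.0) + similarity
    scores = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical data: item IDs could be books, movies, or anything else.
target = {101: 5.0, 102: 3.0}
neighbours = [
    (0.9, {101: 4.0, 103: 5.0, 104: 2.0}),
    (0.5, {102: 3.0, 103: 3.0}),
]
result = recommend(target, neighbours)
print(result)
```

Item 103 ranks first because both neighbours rated it and the more similar one rated it highly.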
Content based
• Domain-specific approaches
• Hard to meaningfully codify into a
  framework
• We are responsible for choosing which
  item attributes to use.
• Apache Mahout can’t handle this out of
  the box, but you can build on top of it.
Making recommendations

 • What do we need?
  • Input data
  • Neighborhood
  • Similarity
Input Data
•   In Mahout terms: Preferences
•   A preference contains:
    •   User ID
    •   Item ID
    •   Preference value
    •   Example:
        •   1,101,5.0
        •   USER ID: 1, ITEM ID: 101, PrefValue: 5.0
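
A minimal sketch of reading preferences in the `userID,itemID,value` form shown above, in plain Python rather than Mahout's own data model classes. The function name `parse_preferences` is hypothetical, and every sample row except `1,101,5.0` is made up for illustration.

```python
from collections import defaultdict

def parse_preferences(lines):
    """Parse 'userID,itemID,value' lines into {user_id: {item_id: value}}."""
    prefs = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, item_id, value = line.split(",")
        prefs[int(user_id)][int(item_id)] = float(value)
    return dict(prefs)

# First row is the example from the slide; the others are hypothetical.
sample = ["1,101,5.0", "1,102,3.0", "2,101,2.0"]
prefs = parse_preferences(sample)
print(prefs[1][101])
```

Keying by user first makes it cheap to fetch everything one user has rated, which is what a user-based similarity needs.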
Neighborhood
• Nearest N users
• Threshold
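
The two neighborhood strategies can be sketched as follows, assuming similarity scores to the target user are already available. This is illustrative Python, not Mahout's implementation. The function names and the scores are hypothetical.

```python
def nearest_n_users(similarities, n):
    """Return the n user IDs with the highest similarity score."""
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:n]

def threshold_users(similarities, threshold):
    """Return every user ID whose similarity is at least the threshold."""
    return [user for user, score in similarities.items() if score >= threshold]

# Hypothetical similarity scores between a target user and four others.
similarities = {2: 0.9, 3: 0.4, 4: 0.75, 5: -0.2}
print(nearest_n_users(similarities, 2))
print(threshold_users(similarities, 0.7))
```

Nearest N always returns a fixed-size neighborhood, even if some members are only weakly similar. A threshold guarantees quality but may return very few users, or none.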
Similarity
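
One common similarity measure for preference data is the Pearson correlation over the items two users have both rated. The sketch below is a plain-Python illustration of that measure, chosen as an example; the slide does not say which measure was presented. Returning 0.0 for too little overlap is an assumption of this sketch, and the sample users are hypothetical.

```python
from math import sqrt

def pearson_similarity(prefs_a, prefs_b):
    """Pearson correlation over the items both users have rated.

    Returns 0.0 when there are fewer than two shared items or when
    either user's shared ratings have zero variance.
    """
    shared = sorted(set(prefs_a) & set(prefs_b))
    if len(shared) < 2:
        return 0.0
    a = [prefs_a[i] for i in shared]
    b = [prefs_b[i] for i in shared]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

# Hypothetical users mapping item ID to preference value.
alice = {101: 5.0, 102: 3.0, 103: 1.0}
bob = {101: 4.0, 102: 2.0, 103: 0.5, 104: 5.0}
carol = {101: 1.0, 102: 3.0, 103: 5.0}
print(pearson_similarity(alice, bob), pearson_similarity(alice, carol))
```

Alice and Bob rate the shared items in the same order, so their score is close to 1. Alice and Carol rate them in opposite order, so their score is -1.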
Clustering

• Surface naturally occurring groups of data
• A notion of similarity (and dissimilarity)
• Algorithms do not require training
• Stopping condition - iterate until close
  enough
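
The "iterate until close enough" stopping condition can be illustrated with a small k-means sketch. This is plain Python (3.8+ for `math.dist`), not Mahout's implementation. The `tolerance` and `max_iterations` parameters, the starting centroids, and the sample points are assumptions made for this example.

```python
from math import dist

def k_means(points, centroids, tolerance=1e-4, max_iterations=100):
    """Assign points, recompute centroids, repeat.

    Stops when no centroid moves more than `tolerance`, or after
    `max_iterations` passes (which must be at least 1).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tolerance:  # close enough: stop iterating
            break
    return centroids, clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

No labels are supplied: the groups come only from the distance between points.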
Clustering
•   Document level
    •   Group documents based on a notion of similarity
    •   K-Means, Fuzzy K-Means, Dirichlet, Canopy, Mean-Shift
    •   Distance measures: Manhattan, Euclidean, others
•   Topic Modeling
    •   Cluster words across documents to identify topics
    •   Latent Dirichlet Allocation
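
The Manhattan and Euclidean distance measures named above can be written in a few lines. In this sketch the vectors are hypothetical term counts for two documents over the same three-word vocabulary. The function names are illustrative and this is not Mahout code.

```python
from math import sqrt

def manhattan(a, b):
    """Sum of absolute differences per dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance: square root of summed squared differences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical term-count vectors for two documents.
doc_a = [3, 0, 4]
doc_b = [0, 0, 0]
print(manhattan(doc_a, doc_b), euclidean(doc_a, doc_b))
```

The choice matters: Euclidean distance penalizes one large difference more than several small ones, while Manhattan distance treats them the same.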
Classification

• Requires training (supervised)
• Make a single decision with a very limited
  set of outcomes
• Typical answers naturally fit into categories
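
As an illustration of supervised training with a small set of outcomes, here is a tiny Naive Bayes text classifier in plain Python. Naive Bayes is picked here as a simple example and is not taken from the slides. The training messages, the labels `spam` and `ham`, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log

def train(examples):
    """examples: list of (text, label). Returns the counts the model needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary

def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = log(label_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled training data.
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(classify(model, "cheap money"), classify(model, "monday meeting"))
```

The labelled examples are the training step. After that, every answer is one of the labels seen during training.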
Classification samples

• Credit card fraud prediction
• Customer attrition
• Diabetes detector
• Search Engine
Mahout/Hadoop
• For large data sets
• Online
• Offline (Hadoop preferred)
• You can build your solution with Mahout
• Take a look at Weka
 • http://www.cs.waikato.ac.nz/ml/weka/
Resources
Join us!
• GIAMA.
 • An initiative by Agustin Ramos

Scale your database traffic with Read & Write split using MySQL Router
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Machine Learning & Apache Mahout

  • 1. Machine Learning with Apache Mahout • Domingo Suarez Torres
  • 2. Machine Learning (ML) Introduction
  • 3. Definition • Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data (1) • (1) http://en.wikipedia.org/wiki/Machine_learning
  • 4. “Machine Learning is programming computers to optimize a performance criterion using example data or past experience” (Introduction to Machine Learning, E. Alpaydin)
  • 5. Applications • Recommend friends/dates/products • Classify content into predefined groups • Find similar content based on object properties • Find associations/patterns in actions/behaviors • Identify key topics in large collections of text • Detect anomalies in machine output • Rank search results • Fraud detection • Spam detection • Medical diagnostics • Translators • Much more!
  • 6. Math • Statistics • Discrete math • Linear algebra • Probability
  • 8. Starting with ML • Get your data • Decide on your features per your algorithm • Prep the data • Different approaches for different algorithms • Run your algorithm(s) • Lather, rinse, repeat • Validate your results • Smell test, A/B testing
  • 9. Apache Mahout • Machine learning library. Platform? • Extensible: we can plug in our own algorithms • Hadoop support • 2005: Taste framework • 2008: included in Lucene
  • 10. Scalability • Huge amounts of data, growing every second! • Be as fast and efficient as possible given the intrinsic design of the algorithm • Some algorithms won’t scale to massive machine clusters • Others fit logically on a MapReduce framework like Apache Hadoop • Still others will need alternative distributed programming models • Be pragmatic • Most Mahout implementations are MapReduce enabled
  • 12. Components • Recommender Engines (collaborative filtering, content-based) • Clustering • Classification
  • 13. When to use? • Recommendation: rank large datasets • Clustering: group your data • Classification: train me to think like you
  • 14. Recommenders • Given a data set, make a recommendation • Item recommendation (book, movie, etc.) • Ranking based • Recommendations: user based, item based • Based on knowledge of users’ relationships to items (user preferences)
  • 16. Collaborative filtering • User based • Item based • Both techniques require no knowledge of the properties of the items themselves • Item type is irrelevant; Apache Mahout is happy
  • 18. Content based • Domain-specific approaches • Hard to meaningfully codify into a framework • We are responsible for choosing which item attributes to use • Apache Mahout can’t handle this out of the box, but it can be built on top
  • 19. Making recommendations • What do we need? • Input data • Neighborhood • Similarity
  • 20. Input Data • In Mahout terms: Preferences • A preference contains: • User ID • Item ID • Preference value • Example: • 1,101,5.0 • USER ID: 1, ITEM ID: 101, PrefValue: 5.0
  • 25. Clustering • Surface naturally occurring groups of data • A notion of similarity (and dissimilarity) • Algorithms do not require training • Stopping condition - iterate until close enough
  • 26. Clustering • Document level • Group documents based on a notion of similarity • K-Means, Fuzzy K-Means, Dirichlet, Canopy, Mean-Shift • Distance Measures • Manhattan, Euclidean, other • Topic Modeling • Cluster words across documents to identify topics • Latent Dirichlet Allocation
  • 27. Classification • Require training (supervised) • Make a single decision with a very limited set of outcomes • Typical answers naturally fit into categories
  • 28. Classification samples • Credit card fraud prediction • Customer attrition • Diabetes detector • Search Engine
  • 29. Mahout/Hadoop • For large data sets • Online • Offline (Hadoop preferred) • You can build your solution with Mahout • Take a look at Weka • http://www.cs.waikato.ac.nz/ml/weka/
  • 34. Join us! • GIAMA • An Agustin Ramos initiative
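
The pieces that slides 19 and 20 list (input data, similarity, neighborhood) can be sketched in a few lines of plain Python. This is not Mahout's Java API: the sample preferences, the function names, and the choice of Pearson correlation as the similarity measure are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sample data in the "userID,itemID,preference" form from slide 20.
RAW_PREFERENCES = """\
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,5.0
3,102,3.0
3,104,4.5
"""


def parse_preferences(text):
    """Turn 'userID,itemID,value' lines into {user: {item: value}}."""
    prefs = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        user, item, value = line.split(",")
        prefs[int(user)][int(item)] = float(value)
    return dict(prefs)


def pearson(a, b):
    """Pearson correlation over the items that both users have rated."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    xs = [a[i] for i in shared]
    ys = [b[i] for i in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def recommend(prefs, user, neighbors=2, how_many=3):
    """User-based recommendation: score unseen items by the
    similarity-weighted ratings of the most similar users."""
    sims = [(pearson(prefs[user], prefs[other]), other)
            for other in prefs if other != user]
    # Neighborhood: the closest users with a positive correlation.
    nearest = sorted((s for s in sims if s[0] > 0), reverse=True)[:neighbors]
    totals, weights = defaultdict(float), defaultdict(float)
    for sim, other in nearest:
        for item, value in prefs[other].items():
            if item not in prefs[user]:
                totals[item] += sim * value
                weights[item] += sim
    scored = [(totals[i] / weights[i], i) for i in totals]
    return [(item, score) for score, item in sorted(scored, reverse=True)[:how_many]]
```

With this sample data, `recommend(parse_preferences(RAW_PREFERENCES), 1)` returns `[(104, 4.5)]`: user 3 rates items 101 and 102 exactly as user 1 does, so user 3's rating of item 104 carries over, while user 2 correlates negatively and is left out of the neighborhood.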
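
Slides 25 and 26 describe clustering as grouping by a notion of distance and iterating until "close enough". A minimal k-means sketch in plain Python illustrates that loop, with the Euclidean and Manhattan measures from slide 26 as interchangeable functions. It is not Mahout's implementation: the function names, the naive seeding with the first k points, and the tolerance value are assumptions.

```python
from math import sqrt


def euclidean(p, q):
    """Straight-line distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_means(points, k, distance=euclidean, max_iterations=100, tolerance=1e-9):
    """Return (centroids, clusters) after iterating until the centroids
    move by no more than `tolerance` or `max_iterations` is reached."""
    centroids = [tuple(p) for p in points[:k]]  # naive seeding (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])
        shift = max(distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        # Stopping condition: "iterate until close enough".
        if shift <= tolerance:
            break
    return centroids, clusters
```

For example, `k_means([(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)], 2)` separates the three points near the origin from the three points near (8.5, 9). Passing `distance=manhattan` swaps the measure without touching the loop.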
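
Slide 27 says classification needs training and picks one answer from a small set of outcomes. A tiny naive Bayes text classifier in plain Python shows both halves, using spam detection (one of the applications on slide 5) as the task. The class name, the training sentences, and the add-one smoothing are assumptions for illustration; this is not Mahout's classifier API.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def train(self, labeled_texts):
        """Supervised step: learn from (text, label) pairs."""
        for text, label in labeled_texts:
            words = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in sorted(self.label_counts):
            score = log(self.label_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training set with two outcomes: "spam" and "ham".
TRAINING = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

model = NaiveBayes()
model.train(TRAINING)
```

After training, `model.classify("cheap money")` returns `"spam"` and `model.classify("monday meeting")` returns `"ham"`. The set of possible answers is fixed by the labels seen during training, which is the "very limited set of outcomes" the slide refers to.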
