Machine Learning & Apache Mahout
 


Machine learning is a branch of artificial intelligence that lets us use algorithms that operate on data to determine behaviors, patterns, preferences, and so on.

Apache Mahout is an open-source library that implements a variety of machine learning algorithms, which can be used, for example, to build a recommendation engine that drives purchases.

Usage Rights

CC Attribution-NonCommercial-ShareAlike License


Machine Learning & Apache Mahout: Presentation Transcript

  • Machine Learning with Apache Mahout, by Domingo Suarez Torres
  • Machine Learning (ML) Introduction
  • Definition
    • Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data (1).
    (1) http://en.wikipedia.org/wiki/Machine_learning
  • “Machine learning is programming computers to optimize a performance criterion using example data or past experience.” (E. Alpaydin, Introduction to Machine Learning)
  • Applications
    • Recommend friends/dates/products
    • Classify content into predefined groups
    • Find similar content based on object properties
    • Find associations/patterns in actions/behaviors
    • Identify key topics in large collections of text
    • Detect anomalies in machine output
    • Ranking search results
    • Fraud detection
    • Spam detection
    • Medical diagnostics
    • Translators
    • Much more!
  • Math
    • Statistics
    • Discrete math
    • Linear algebra
    • Probability
  • Starting with ML
    • Get your data
    • Decide on your features per your algorithm
    • Prep the data
      • Different approaches for different algorithms
    • Run your algorithm(s)
      • Lather, rinse, repeat
    • Validate your results
      • Smell test, A/B testing
  • Apache Mahout
    • Machine learning library. A platform?
    • Extensible: we can plug in our own algorithms
    • Hadoop support
    • 2005: Taste framework
    • 2008: became part of Apache Lucene
  • Scalability
    • Huge amounts of data, growing every second!
    • Be as fast and efficient as possible given the intrinsic design of the algorithm
      • Some algorithms won't scale to massive machine clusters
      • Others fit logically on a MapReduce framework like Apache Hadoop
      • Still others will need alternative distributed programming models
    • Be pragmatic
    • Most Mahout implementations are MapReduce enabled
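The sketch below shows the MapReduce pattern in miniature. It is plain Python running in a single process, not the Hadoop or Mahout API. It counts how many preferences each item received and assumes input lines in the `userID,itemID,value` form that appears later in this deck. The names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical. On a real cluster the framework performs the shuffle itself and runs mappers and reducers on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[int, int]]:
    """Mapper: emit (item_id, 1) for one 'user,item,value' record."""
    _user, item, _value = line.split(",")
    yield int(item), 1


def shuffle(pairs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group intermediate values by key (the framework's job between phases)."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: int, values: List[int]) -> Tuple[int, int]:
    """Reducer: sum the partial counts for one item."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[int, int]:
    """Chain map -> shuffle -> reduce over all input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    data = ["1,101,5.0", "1,102,3.0", "2,101,2.0", "3,101,2.5", "3,104,4.0"]
    print(run_job(data))  # {101: 3, 102: 1, 104: 1}
```

Each mapper call depends on only one line, and each reducer call depends on only one key. That independence is what lets a framework like Hadoop spread the work across machines.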
  • Who uses Mahout?
  • Components
    • Recommender engines (collaborative filtering, content-based)
    • Clustering
    • Classification
  • When to use each?
    • Recommendation: rank large datasets
    • Clustering: group your data
    • Classification: train me to think like you
  • Recommenders
    • Given a data set, make a recommendation
      • Item recommendation (book, movie, etc.)
    • Ranking based
    • Recommendations
      • User based
      • Item based
    • Based on knowledge of users' relationships to items (user preferences)
  • Collaborative filtering
    • User based
    • Item based
    • Neither technique requires knowledge of the properties of the items themselves
    • The item type is irrelevant; Apache Mahout is happy with any kind of item
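Here is a minimal sketch of item-based collaborative filtering. It rests on three assumptions:

- Preferences are held in memory as `{user_id: {item_id: value}}`.
- Item columns are compared with cosine similarity, with missing preferences treated as zero.
- A predicted preference is the similarity-weighted average of the user's own ratings.

The code only ever looks at IDs and preference values, never at item properties. This is plain Python for illustration. Mahout's own recommenders are Java classes with their own similarity and weighting choices, and the function names here are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# user_id -> {item_id: preference value}
Prefs = Dict[int, Dict[int, float]]


def item_vectors(prefs: Prefs) -> Dict[int, Dict[int, float]]:
    """Invert user -> items into item -> {user_id: preference}."""
    items: Dict[int, Dict[int, float]] = {}
    for user, ratings in prefs.items():
        for item, value in ratings.items():
            items.setdefault(item, {})[user] = value
    return items


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by user ID (missing = 0)."""
    common = a.keys() & b.keys()
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if not common or norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    return dot / (norm_a * norm_b)


def recommend(prefs: Prefs, user: int, how_many: int = 3) -> List[Tuple[int, float]]:
    """Score each item the user has not rated by a similarity-weighted
    average of the user's ratings for the items similar to it."""
    items = item_vectors(prefs)
    seen = prefs[user]
    scores: List[Tuple[int, float]] = []
    for candidate, vector in items.items():
        if candidate in seen:
            continue
        weighted_sum = total_weight = 0.0
        for rated_item, value in seen.items():
            sim = cosine(vector, items[rated_item])
            if sim > 0:
                weighted_sum += sim * value
                total_weight += sim
        if total_weight > 0:
            scores.append((candidate, weighted_sum / total_weight))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:how_many]


if __name__ == "__main__":
    prefs = {
        1: {101: 5.0, 102: 3.0},
        2: {101: 2.0, 102: 2.5, 103: 5.0},
        3: {101: 2.5, 104: 4.0},
    }
    print(recommend(prefs, user=1))
```

With this sample data, user 1 gets item 104 ranked ahead of item 103. Item 104 is similar only to item 101, which user 1 rated highest.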
  • Content based
    • Domain-specific approaches
    • Hard to meaningfully codify into a framework
    • We are responsible for choosing which item attributes to use
    • Apache Mahout can't handle this out of the box, but it can be built on top of Mahout
  • Making recommendations
    • What do we need?
      • Input data
      • Neighborhood
      • Similarity
  • Input data
    • In Mahout terms: preferences
    • A preference contains:
      • User ID
      • Item ID
      • Preference value
    • Example:
      • 1,101,5.0
      • User ID: 1, item ID: 101, preference value: 5.0
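This sketch reads preference lines like `1,101,5.0` into (user ID, item ID, value) triples and groups them by user. The comma-separated layout is the same one that Mahout's `FileDataModel` reads. The `Preference` class and both functions are hypothetical Python stand-ins, not Mahout types.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Preference:
    """One (user ID, item ID, preference value) triple."""
    user_id: int
    item_id: int
    value: float


def read_preferences(lines: Iterable[str]) -> List[Preference]:
    """Parse 'userID,itemID,value' lines, skipping blank lines."""
    prefs: List[Preference] = []
    for row in csv.reader(lines):
        if not row:
            continue
        user, item, value = row
        prefs.append(Preference(int(user), int(item), float(value)))
    return prefs


def by_user(prefs: Iterable[Preference]) -> Dict[int, Dict[int, float]]:
    """Group preferences as user_id -> {item_id: value}."""
    grouped: Dict[int, Dict[int, float]] = {}
    for p in prefs:
        grouped.setdefault(p.user_id, {})[p.item_id] = p.value
    return grouped


if __name__ == "__main__":
    sample = io.StringIO("1,101,5.0\n1,102,3.0\n2,101,2.0\n")
    prefs = read_preferences(sample)
    print(prefs[0])       # Preference(user_id=1, item_id=101, value=5.0)
    print(by_user(prefs))  # {1: {101: 5.0, 102: 3.0}, 2: {101: 2.0}}
```

The user-grouped dictionary is the in-memory shape that neighborhood and similarity computations typically start from.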
  • Neighborhood
    • Nearest N users
    • Threshold
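The two neighborhood policies can be sketched as follows. Nearest N keeps a fixed number of the most similar users. Threshold keeps every user whose similarity reaches a cutoff. Mahout exposes these ideas as `NearestNUserNeighborhood` and `ThresholdUserNeighborhood`, but the plain-Python functions here are hypothetical illustrations. The similarity metric is an assumption: 1 / (1 + Euclidean distance) over co-rated items, or 0 when two users share no items.

```python
import math
from typing import Dict, List

# user_id -> {item_id: preference value}
Prefs = Dict[int, Dict[int, float]]


def similarity(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Assumed metric: 1 / (1 + Euclidean distance) over co-rated items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def nearest_n(prefs: Prefs, user: int, n: int) -> List[int]:
    """Nearest-N neighborhood: the n other users most similar to `user`."""
    others = [u for u in prefs if u != user]
    others.sort(key=lambda u: similarity(prefs[user], prefs[u]), reverse=True)
    return others[:n]


def threshold(prefs: Prefs, user: int, min_similarity: float) -> List[int]:
    """Threshold neighborhood: every other user with similarity >= min_similarity."""
    return [
        u for u in prefs
        if u != user and similarity(prefs[user], prefs[u]) >= min_similarity
    ]


if __name__ == "__main__":
    prefs = {
        1: {101: 5.0, 102: 3.0, 103: 2.5},
        2: {101: 2.0, 102: 2.5, 103: 5.0},
        3: {101: 2.5, 104: 4.0},
        4: {101: 5.0, 103: 3.0},
    }
    print(nearest_n(prefs, 1, 2))    # [4, 3]
    print(threshold(prefs, 1, 0.5))  # [4]
```

Nearest N always returns N users, even if some are poor matches. Threshold can return many users or none, depending on how the cutoff relates to the data.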
  • Similarity
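Two common similarity measures are sketched below, computed only over the items two users have in common:

- Pearson correlation for rated preferences: r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²).
- The Tanimoto (Jaccard) coefficient for yes/no preferences: |A ∩ B| / |A ∪ B|.

Mahout ships similarity classes for both, such as `PearsonCorrelationSimilarity` and `TanimotoCoefficientSimilarity`. These Python functions are independent illustrations, not ports of those classes. Returning 0.0 when Pearson is undefined is an assumption made here for simplicity.

```python
import math
from typing import Dict


def pearson(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Pearson correlation of two users' ratings over the items both rated.
    Assumption: returns 0.0 when it is undefined (fewer than two co-rated
    items, or zero variance for either user)."""
    common = sorted(a.keys() & b.keys())
    n = len(common)
    if n < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def tanimoto(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Tanimoto coefficient: shared items / all items; values are ignored."""
    items_a, items_b = set(a), set(b)
    union = items_a | items_b
    if not union:
        return 0.0
    return len(items_a & items_b) / len(union)


if __name__ == "__main__":
    print(pearson({1: 1.0, 2: 2.0, 3: 3.0}, {1: 2.0, 2: 4.0, 3: 6.0}))  # 1.0
    print(tanimoto({1: 1.0, 2: 1.0, 3: 1.0}, {2: 1.0, 3: 1.0, 4: 1.0}))  # 0.5
```

Pearson ignores each user's rating scale. Someone who always rates one point higher still correlates perfectly, which is why it suits explicit ratings. Tanimoto suits data that only records whether a user interacted with an item.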
  • Clustering
    • Surface naturally occurring groups of data
    • A notion of similarity (and dissimilarity)
    • Algorithms do not require training
    • Stopping condition: iterate until close enough
  • Clustering
    • Document level
      • Group documents based on a notion of similarity
      • K-Means, Fuzzy K-Means, Dirichlet, Canopy, Mean-Shift
      • Distance measures: Manhattan, Euclidean, others
    • Topic modeling
      • Cluster words across documents to identify topics
      • Latent Dirichlet Allocation
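Here is a minimal k-means sketch with a pluggable distance measure (Euclidean or Manhattan). It includes the "iterate until close enough" stopping condition: the loop stops once no centroid moves more than `epsilon`. This is an in-memory Python illustration, not Mahout's distributed implementation. The initialization (the first k points), the `epsilon` default and the iteration cap are assumptions chosen to keep the example deterministic.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: List[Point], k: int, distance: Distance = euclidean,
           epsilon: float = 1e-6, max_iterations: int = 100
           ) -> Tuple[List[Point], List[int]]:
    """Plain k-means; returns (centroids, cluster index for each point)."""
    centroids: List[Point] = list(points[:k])  # assumed initialization
    assignment = [0] * len(points)
    for _ in range(max_iterations):
        # Assign every point to its closest centroid under `distance`.
        assignment = [min(range(k), key=lambda c: distance(p, centroids[c]))
                      for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            new_centroids.append(mean(members) if members else centroids[c])
        shift = max(distance(old, new) for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= epsilon:  # "close enough": stop iterating
            break
    return centroids, assignment


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
    centers, labels = kmeans(pts, 2)
    print(labels)  # [0, 0, 1, 1, 0, 1]
```

Swapping `distance=manhattan` changes how points are assigned and how convergence is measured. The centroid update stays a plain mean in this sketch.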
  • Classification
    • Requires training (supervised)
    • Makes a single decision with a very limited set of outcomes
    • Typical answers naturally fit into categories
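To show what "requires training" means in practice, here is a minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It learns from labeled examples and then picks one label from a small fixed set. Naive Bayes is an assumption chosen as one common supervised technique, since the slide names no algorithm. The `NaiveBayes` class and its methods are hypothetical Python, not a Mahout API, and tokenizing by lowercase whitespace splitting is a deliberate simplification.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()  # label -> number of training docs
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # label -> word freqs
        self.vocabulary: Set[str] = set()

    def train(self, examples: List[Tuple[str, str]]) -> None:
        """Learn from (text, label) pairs: the supervised step."""
        for text, label in examples:
            words = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def classify(self, text: str) -> Optional[str]:
        """Return the label with the highest log posterior (None if untrained)."""
        total_docs = sum(self.doc_counts.values())
        words = text.lower().split()
        best_label: Optional[str] = None
        best_score = -math.inf
        for label, docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(docs / total_docs)  # log prior
            for w in words:
                # Smoothed log likelihood of each word given the label.
                score += math.log((counts[w] + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train([
        ("win money now", "spam"),
        ("cheap money offer", "spam"),
        ("meeting agenda for monday", "ham"),
        ("lunch on monday", "ham"),
    ])
    print(nb.classify("money offer"))     # spam
    print(nb.classify("monday meeting"))  # ham
```

The output is always one of the labels seen in training, which matches the slide's point about a very limited set of outcomes.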
  • Classification examples
    • Credit card fraud prediction
    • Customer attrition
    • Diabetes detection
    • Search engines
  • Mahout/Hadoop
    • For large data sets
    • Online
    • Offline (Hadoop preferred)
    • You can build your solution with Mahout
    • Take a look at Weka
      • http://www.cs.waikato.ac.nz/ml/weka/
  • Resources
  • Join us!
    • GIAMA
    • An initiative by Agustin Ramos