Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Machine Learning with Apache Spark


Published on

A background on machine learning before we dive into the source code and examples of machine learning with Spark MLlib.

This is an excerpt for the Apache Spark with Scala training available at

Published in: Data & Analytics
  • Be the first to comment

Machine Learning with Apache Spark

  1. 1. Machine Learning with Spark
  2. 2. Background • What is machine learning? • Simply put, machine learning is creating and using models that are learned from data. • In other contexts it might be called predictive modeling or data mining.
  3. 3. More Background • Examples • Predicting whether an email message is spam or not • Predicting whether a credit card transaction is fraudulent • Predicting which advertisement a shopper is most likely to click on • Predicting which product a shopper is likely to purchase; recommendation engines
  4. 4. Types of Machine Learning Models • Supervised • Unsupervised
  5. 5. Types of Machine Learning Models • Supervised • Supervised models contains a set of data labeled with the correct answers to learn from • Unsupervised • Unsupervised models do not contain such labels.
  6. 6. Supervised Machine Learning Models Examples • k-Nearest Neighbors • Imagine trying to predict how a person will vote in the next presidential election. If you know nothing about the person and you are trying to predict their vote, one sensible approach is to look at how their neighbors are planning to vote (if you have the data)
  7. 7. Supervised Machine Learning Models Examples • Naive Bayes • A common use of the Naive Bayes machine learning model is spam detection. • A key to Naive Bayes is making the (big) assumption that the presences (or absences) of a word are independent of one another, conditional on a message being spam or not
  8. 8. Supervised Machine Learning Models Examples • Regression Models • Simple Linear Regression - used to prove correlation between two variables • Multiple Regression - used to prove correlation with more than two variables
  9. 9. Supervised Machine Learning Models Examples • Decision Trees • A decision tree uses a structure to represent a number of possible decision paths and an outcome for each path. • Decision trees are often divided into classification trees (which produce categorical outputs) and regression trees (which produce numeric outputs).
  10. 10. Supervised Machine Learning Models Examples • Neural Networks • Solve problems like handwriting recognition and face detection
  11. 11. Unsupervised Machine Learning Background • Clustering is an example of unsupervised learning, in which we work with completely unlabeled data. Clustering contrasts with supervised learning which uses labeled data as basis for making predictions • Clusters won’t label themselves. The labels may be determined via unsupervised learning approaches which look at the data underlying each label. • For example, a data set showing where millionaires live probably has clusters in places like Beverly Hills and Manhattan
  12. 12. Unsupervised Machine Learning Examples • k-means • Number of clusters k is chosen in advance • Example: Clustering may be used when determine or limiting a photograph to only five colors. For example, you may have a photograph which you want to use to print stickers. However, the printer only supports up to 5 colors per sticker.
  13. 13. Unsupervised Machine Learning Examples • Latent Dirichlet Analysis (LDA) • Natural Language Processing • Commonly used to identify common topics in a set of documents
  14. 14. Unsupervised Machine Learning Examples • Network Analysis • Recommender Systems
  15. 15. For more, visit