Fikrimuhal TRHUG 2016 Machine Learning

Predictive Analytics is the next step after batch and stream processing.

  1. 1. Large-Scale Machine Learning Şükrü Hasdemir, Fikrimuhal TRHUG 2016
  2. 2. Contents Batch, Stream, Predictive Analytics Machine Learning Apache Spark ML Example: Recommender Systems Machine Learning Performance
  3. 3. Machine Learning • Batch processing shows what happened in the past • Stream processing shows what’s happening now • Machine Learning predicts the future
  4. 4. Machine Learning Is Used Everywhere
  5. 5. Apache Spark • Fast and general framework for Big Data analytics • Most active project in open source Big Data • Faster than Hadoop MapReduce due to “in-memory” computation • Can be used with Java, Scala, Python, R, interactive REPL, notebooks
  6. 6. Apache Spark • Spark includes rich libraries for a variety of purposes.
  7. 7. Apache Spark • Compatible with the open source Big Data ecosystem • Cluster managers: Hadoop YARN, Mesos, “Standalone” • Cloud: AWS EMR, Azure HDInsight, Google Cloud Dataproc
  8. 8. Personalized Recommendation Systems • Take personal preferences into account instead of offering the most popular items to all users • Applications: e-commerce, video, music, news… • Increase customer engagement and revenue • Amazon attributes 25% of its revenue to its recommendation system • Netflix Prize: $1M for a 10% increase in recommender performance • Require collection and analysis of user-item interaction data • Machine Learning, business rules
  9. 9. Recommendation Algorithms • Content-Based Filtering: uses product features • Collaborative Filtering: uses actions of other users; explicit/implicit feedback; neighborhood models (user/item based); latent factor models
  10. 10. Matrix Factorization Model • Source: https://databricks-training.s3.amazonaws.com/img/matrix_factorization.png
  11. 11. Real World: Performance • Cross-validation, hyperparameter optimization • Better metrics: ranking performance metrics such as MAP, NDCG, precisionAt(k), … (see “IR evaluation methods for retrieving highly relevant documents”, K. Järvelin and J. Kekäläinen) • Online tests • Ensemble (hybrid) models
  12. 12. Thank you!
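
The matrix factorization model on slide 10 can be illustrated with a small, dependency-free sketch. This is not the talk's code and not Spark's ALS implementation: it swaps in plain stochastic gradient descent (SGD) to learn the latent factors, and the toy ratings, the rank, and all hyperparameters (`epochs`, `lr`, `reg`) are assumptions chosen for illustration only.

```python
import random


def predict(user_vec, item_vec):
    """Predicted rating is the dot product of the two latent-factor vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed (user, item, rating) triples."""
    squared = [(r - predict(P[u], Q[i])) ** 2 for u, i, r in ratings]
    return (sum(squared) / len(squared)) ** 0.5


def init_factors(n_rows, rank, rng):
    """Small random starting values; all-zero factors would never move."""
    return [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]


def train(ratings, P, Q, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Run SGD in place on P (user factors) and Q (item factors)."""
    rng = random.Random(seed)
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(P[u], Q[i])
            for k in range(len(P[u])):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)


# Hypothetical toy data: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
# One cell per user is left unobserved, as in a real sparse rating matrix.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 1), (3, 2, 4), (3, 3, 5),
]

rng = random.Random(42)
P = init_factors(4, 2, rng)  # 4 users, rank-2 latent space
Q = init_factors(4, 2, rng)  # 4 items, rank-2 latent space

initial_rmse = rmse(ratings, P, Q)
train(ratings, P, Q)
final_rmse = rmse(ratings, P, Q)

print(f"training RMSE before: {initial_rmse:.3f}, after: {final_rmse:.3f}")
print(f"predicted rating for the unobserved pair (user 0, item 3): {predict(P[0], Q[3]):.2f}")
```

The unobserved cells are filled in by the same dot product, which is what makes a latent factor model usable as a recommender. On a cluster, Spark ML's ALS estimator would play the role of `train` here.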

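
The ranking metrics named on slide 11 (precisionAt(k), MAP, NDCG) can be sketched in a few lines of plain Python. The version below assumes binary relevance (an item is either relevant or not) and uses hypothetical item IDs; the function names are illustrative and are not taken from the talk or from any library.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k


def average_precision(recommended, relevant):
    """Mean of precision values taken at each rank where a relevant item appears,
    normalized by the number of relevant items. MAP is this value averaged over users."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Discounted cumulative gain of the top-k list divided by the gain of an
    ideal ordering, with a log2 position discount and binary gains."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: one user's ranked recommendations and held-out relevant items.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}

p3 = precision_at_k(recommended, relevant, 3)
ap = average_precision(recommended, relevant)
ndcg3 = ndcg_at_k(recommended, relevant, 3)

print(f"precision@3 = {p3:.3f}, AP = {ap:.3f}, NDCG@3 = {ndcg3:.3f}")
```

Unlike RMSE on predicted ratings, these metrics reward placing relevant items near the top of the list, which is closer to what a user of a recommender actually sees.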