3. Machine Learning
• Batch processing shows what happened in the past
• Stream processing shows what’s happening now
• Machine Learning predicts the future
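The three bullets above can be illustrated with a small sketch in plain Python. The `history` values and all function names are hypothetical and invented for illustration. The "prediction" is a simple least-squares trend line, chosen only as a minimal stand-in for a Machine Learning model.

```python
from statistics import mean

# Hypothetical hourly order counts, invented for illustration.
history = [10, 12, 14, 16, 18]


def batch_average(values):
    """Batch: summarize a complete, already-stored dataset (the past)."""
    return mean(values)


def running_totals(events):
    """Stream: update the result as each new event arrives (the present)."""
    total = 0
    for value in events:
        total += value
        yield total


def predict_next(values):
    """Prediction: fit a least-squares line to the data and extrapolate
    one step ahead (the future)."""
    xs = range(len(values))
    x_mean = mean(xs)
    y_mean = mean(values)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    intercept = y_mean - slope * x_mean
    return slope * len(values) + intercept


print(batch_average(history))         # summary of what already happened
print(list(running_totals(history)))  # result after each incoming event
print(predict_next(history))          # estimate for the next, unseen hour
```

In a real system the batch and stream steps would run on a distributed engine and the prediction would come from a trained model; the sketch only shows how the three kinds of question differ.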
5. Apache Spark
Fast and general framework for Big Data analytics
Most active project in open source Big Data
Faster than Hadoop MapReduce due to “in-memory” computation
Can be used with Java, Scala, Python, R, interactive REPL, notebooks
7. Apache Spark
Compatible with the open source Big Data ecosystem
Hadoop YARN
Mesos
“Standalone”
Cloud:
AWS EMR
Azure HDInsight
Google Cloud Dataproc
8. Personalized Recommendation Systems
Taking into account personal preferences instead of offering the most popular
items to all users.
Applications: E-commerce, video, music, news…
Increases customer engagement and revenue
Amazon attributes 25% of its revenue to its recommendation system
Netflix Prize: $1M for a 10% improvement in recommender performance
Requires collection and analysis of user-item interaction data, combined with
Machine Learning and business rules.
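The difference between "most popular for everyone" and a personalized recommendation can be sketched in a few lines of Python. The interaction data, user names, and function names below are hypothetical. Item co-occurrence counting is used here only as one simple personalization technique, not as the method any particular company uses.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical (user, item) interactions, invented for illustration.
interactions = [
    ("ann", "jazz"), ("ann", "blues"),
    ("bob", "jazz"), ("bob", "blues"), ("bob", "pop"),
    ("cat", "pop"), ("cat", "rock"),
    ("dan", "pop"), ("dan", "rock"),
    ("eve", "jazz"),
]


def items_by_user(pairs):
    """Group the interaction log into a set of items per user."""
    seen = defaultdict(set)
    for user, item in pairs:
        seen[user].add(item)
    return seen


def most_popular(pairs, user, k=1):
    """Baseline: globally most frequent items the user has not seen yet."""
    seen = items_by_user(pairs)
    counts = Counter(item for _, item in pairs)
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return [i for i in ranked if i not in seen[user]][:k]


def personalized(pairs, user, k=1):
    """Score unseen items by how often they co-occur with the user's items."""
    seen = items_by_user(pairs)
    co_counts = defaultdict(Counter)
    for items in seen.values():
        for a, b in permutations(items, 2):
            co_counts[a][b] += 1
    scores = Counter()
    for item in seen[user]:
        for other, n in co_counts[item].items():
            if other not in seen[user]:
                scores[other] += n
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:k]


print(most_popular(interactions, "eve"))  # → ['pop']
print(personalized(interactions, "eve"))  # → ['blues']
```

Here "eve" has only listened to jazz. The popularity baseline suggests pop because it is frequent overall, while the co-occurrence approach suggests blues because other jazz listeners in the data also chose it.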