13. Why Personalization?
“...it works well the advertisements are annoying though I am not a fan of
mainstream music so hearing about pop bands is also driving me crazy”
“Great way to listen to whatever music you want. The ads can be really
annoying though since they don't seem to be targeted. I HATE rap music, yet I
seem to get a lot of ads for it.”
30. Kafka
● Kafka is a distributed, partitioned, replicated commit log service.
● Guarantees
● Kafka provides a total order over messages within a partition
● Fault tolerance: with replication factor N, tolerates up to N-1 server failures without losing committed messages.
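To make the two guarantees concrete, here is a minimal in-memory sketch. It is not the Kafka client API: the `ToyPartitionedLog` class, the CRC32 key hashing, and the sample events are all assumptions made for illustration. It shows that messages sharing a key land in one partition and keep their append order there, while no order is implied across partitions.

```python
import zlib


class ToyPartitionedLog:
    """In-memory stand-in for a partitioned commit log (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Hash the key so that the same key always maps to the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, from_offset=0):
        # Reads return messages in append order for that partition only.
        return self.partitions[partition][from_offset:]


def tolerated_failures(replication_factor):
    # With N replicas, up to N-1 can fail while one copy of the data survives.
    return replication_factor - 1


log = ToyPartitionedLog(num_partitions=3)
events = [("user-1", "play"), ("user-2", "play"),
          ("user-1", "skip"), ("user-1", "ad_click")]
placements = [log.append(key, value) for key, value in events]

user1_partition = placements[0][0]
user1_values = [v for k, v in log.read(user1_partition) if k == "user-1"]
```

Keying by user ID is one plausible choice when a consumer needs each user's events in order; a different key trades that ordering for a different distribution of load.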
31. Ad Targeting Architecture V1.0
[Architecture diagram. Components: COTS Data Infrastructure, Real-time Targeting, Spotify Backend Infrastructure]
32. Storm
● Real time stream processing
● Like Hadoop without HDFS
● Like Map/Reduce with many reducer steps
● Fault tolerant and guaranteed message processing
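The "Map/Reduce with many reducer steps" idea can be sketched with plain Python generators. This is only an analogy, not the Storm API: the function names `spout`, `parse_bolt`, and `count_bolt` and the sample events are hypothetical, and the sketch leaves out the fault tolerance and message acknowledgement that Storm provides.

```python
from collections import Counter


def spout(events):
    """Emit records one at a time, standing in for an unbounded source."""
    for event in events:
        yield event


def parse_bolt(stream):
    """First processing step: turn raw 'user,genre' strings into tuples."""
    for line in stream:
        user, genre = line.split(",")
        yield user, genre


def count_bolt(stream):
    """Second processing step: keep a running count per genre.

    A snapshot is emitted after every tuple, so results are available
    continuously instead of only at the end of a batch job.
    """
    counts = Counter()
    for _user, genre in stream:
        counts[genre] += 1
        yield dict(counts)


events = ["u1,rock", "u2,rap", "u1,rock", "u3,jazz"]
snapshots = list(count_bolt(parse_bolt(spout(events))))
```

Each step consumes the previous step's output as it arrives, which is the property that distinguishes stream processing from a batch job that waits for the full input.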
36. Apache Crunch
● Framework for writing, testing, and running MapReduce pipelines
● Pipelines are composed of user-defined functions and higher-level
abstractions of common MR tasks (filter, join, etc.)
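The composition style described above can be illustrated with a small toy class. Crunch itself is a Java library; the `ToyCollection` class below is a hypothetical Python stand-in that runs in memory, and its method names are only loosely modeled on the idea of mixing user-defined functions with built-in operations.

```python
from collections import defaultdict


class ToyCollection:
    """In-memory stand-in for a pipeline collection (hypothetical, not Crunch's API)."""

    def __init__(self, items):
        self.items = list(items)

    def parallel_do(self, fn):
        # Apply a user-defined function that returns zero or more outputs per input.
        return ToyCollection(out for item in self.items for out in fn(item))

    def filter(self, predicate):
        # Higher-level operation: keep only the items matching the predicate.
        return ToyCollection(item for item in self.items if predicate(item))

    def count(self):
        # Higher-level operation: count occurrences of each distinct item.
        counts = defaultdict(int)
        for item in self.items:
            counts[item] += 1
        return dict(counts)


def split_words(line):
    """User-defined function: one input line becomes many words."""
    return line.lower().split()


lines = ToyCollection(["Rock and pop", "pop and jazz"])
word_counts = (
    lines.parallel_do(split_words)
    .filter(lambda word: word != "and")
    .count()
)
```

Because `split_words` is an ordinary function, it can be unit tested without running any pipeline at all.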
38. Apache Crunch
What’s wrong with plain Python Streaming MapReduce?
● Testability
● Optimization
● Performance
● IDE support
● Type Safety
● Lack of higher-level operations (filter/join/aggregate)
From Spotify Presentation: Scalding the Crunchy Pig for Cascading into the Hive
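For contrast, a word count in the streaming style looks roughly like the sketch below. It assumes the usual Hadoop Streaming convention of tab-separated key/value text lines, with the framework sorting mapper output by key before the reducer sees it; the `run_locally` helper is a hypothetical stand-in for that sort step, not part of Hadoop.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' text record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    keyed = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        # Every value arrives as a string and has to be parsed by hand.
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    # Stand-in for the framework's shuffle: sort mapper output, then reduce.
    return list(reducer(sorted(mapper(lines))))


result = run_locally(["pop rock", "pop"])
```

Every record is an untyped string, so a malformed line only fails when `int(count)` runs, and a filter or join would have to be written by hand in the same style. That is one way to read the type safety and higher-level operations items on the list.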
39. Apache Crunch
● About a 5x performance improvement over Python streaming MapReduce
● Readable functional-style API in plain Java
● Great local testing support
● First-class support for Avro records.
From Spotify Presentation: Scalding the Crunchy Pig for Cascading into the Hive