Generating Real-time,
Streaming Recommendations
[NiFi + Kafka + Spark ML]
Kafka Summit SF
April 26, 2016

Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi, Kafka, and Spark - Chris Fregly

  1. Generating Real-time, Streaming Recommendations [NiFi + Kafka + Spark ML]: Kafka Summit SF, April 26, 2016
  2. Who am I? Chris Fregly, Principal Data Solutions Engineer @ IBM Spark Technology Center; Previously, Data Engineer @ Netflix and Databricks; Contributor @ Apache Spark, Committer @ Netflix OSS; Founder @ Advanced Spark and TensorFlow; Author @ Advanced Spark (advancedspark.com)
  3. Relevant Spark Contribution: SPARK-1981, Add Kinesis support for Spark Streaming (Me)
  4. Fun Meetup!
  5. Fun Workshop!! San Jose: May 14th (full details @ advancedspark.com)
  6. Fun Github Repo!!!
  7. Agenda: Live, Interactive Demo!; NiFi; Spark Streaming; Streaming Recommendations; Netflix Pipeline (Bonus!)
  8. Live, Interactive Demo: http://demo2.advancedspark.com
  9. Agenda: Live, Interactive Demo!; NiFi; Spark Streaming; Streaming Recommendations; Netflix Recommendations (Bonus!)
  10. NiFi: NiFi = “Niagara Files”; Maintainers @ Hortonworks since 2015; Developed @ NSA over last 8+ years; Integrates with EVERYTHING!; Provides Data Provenance, Data Flow Management; Me, Normal Guy; Joe Witt, NiFi Co-Creator; Buffalo Wild Wings Hat
  11. NiFi + Kafka
  12. NiFi Routing: Http Request
  13. NiFi Geo-Enrichment
  14. NiFi Extract Kafka Topic
  15. NiFi Kafka PUT (Finally!)
  16. NiFi Post-Kafka HttpResponse
  17. NiFi Data Provenance
  18. NiFi Provenance Event Types: ATTRIBUTES_MODIFIED (e.g. Extract Topic Name); CONTENT_MODIFIED (e.g. Enrich with Geo); RECEIVE (e.g. Handle Http Request); ROUTE (e.g. Check Http Method); SEND (e.g. PutKafka); DROP (e.g. Handle Http Response)
  19. NiFi Search Data Provenance
  20. NiFi Kafka Provenance Event
  21. NiFi Kafka Provenance Event
  22. NiFi Kafka Provenance Event
  23. NiFi Provenance Lineage
  24. Agenda: Live, Interactive Demo!; NiFi; Spark Streaming; Streaming Recommendations; Netflix Recommendations (Bonus!)
  25. Spark Streaming: Submits Time-Based Micro Batches of Data as Spark Jobs; Supports Kinesis, Flume, MQTT, ZeroMQ, Sockets, KAFKA!; Framework for Custom Streaming Receivers; Flexible Window Operations, Optimized State Management; Basic Back Pressure and Throttling Support; At Least Once Guarantees through Write Ahead Log (WAL)
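
The "time-based micro batches submitted as jobs" idea on this slide can be illustrated without Spark at all. The following is a minimal plain-Python sketch, not Spark Streaming code: the names `micro_batches` and `run_job`, the sample events, and the 1-second interval are all assumptions made for illustration.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Every event falls into the bucket that starts at or before its timestamp.
        bucket_start = int(timestamp // batch_interval) * batch_interval
        buckets[bucket_start].append(value)
    return [(start, buckets[start]) for start in sorted(buckets)]


def run_job(batch):
    """Hypothetical per-batch job: count how often each value occurs in the batch."""
    counts = defaultdict(int)
    for value in batch:
        counts[value] += 1
    return dict(counts)


# Hypothetical event stream: (seconds since start, item id).
events = [(0.2, "item-1"), (0.9, "item-2"), (1.1, "item-1"),
          (2.5, "item-3"), (2.7, "item-3")]

# Each 1-second bucket is handed to the job as one unit of work.
results = [(start, run_job(batch))
           for start, batch in micro_batches(events, batch_interval=1)]
```

Each tuple in `results` pairs a batch start time with that batch's output, which is roughly the shape of work a micro-batch engine schedules per interval.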
  26. Original Kafka Receiver
  27. Newer Kafka “Direct” Receiver
  28. Spark Streaming KafkaRDD: Kafka “Direct” Streaming Implementation (Spark 1.4+); Recover/Replay from Kafka using File System-like Offsets; Removes need for Write Ahead Log (WAL); Uses Kafka, itself, as the WAL!
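
Why offsets remove the need for a separate WAL can be shown with a toy model. This is a sketch under stated assumptions, not the KafkaRDD implementation: `PartitionLog` is a hypothetical in-memory stand-in for a single Kafka partition, and the crash is only simulated by not committing the offset.

```python
class PartitionLog:
    """Toy stand-in for one Kafka partition: an append-only list addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def end_offset(self):
        return len(self._records)

    def read(self, from_offset, until_offset):
        # Half-open range [from_offset, until_offset), like reading a byte range of a file.
        return self._records[from_offset:until_offset]


def process(log, offset_range):
    """A batch is fully described by its offset range, so it can be recomputed at any time."""
    from_offset, until_offset = offset_range
    return [record.upper() for record in log.read(from_offset, until_offset)]


log = PartitionLog()
for record in ["click-a", "click-b", "click-c", "click-d"]:
    log.append(record)

committed_offset = 0

# Batch 1 succeeds, so its end offset is committed.
batch_1 = (committed_offset, 2)
output_1 = process(log, batch_1)
committed_offset = batch_1[1]

# Batch 2 is computed, but assume the job crashes before the commit happens.
batch_2 = (committed_offset, log.end_offset())
lost_output = process(log, batch_2)

# On restart the same range is re-read from the log itself; no second copy of the data is needed.
replayed_output = process(log, batch_2)
committed_offset = batch_2[1]
```

Because the log is retained and addressable by offset, replaying `batch_2` yields the same records as the lost attempt, which is the sense in which Kafka itself acts as the WAL.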
  29. Agenda: Live, Interactive Demo!; NiFi; Spark Streaming; Streaming Recommendations; Netflix Recommendations (Bonus!)
  30. Streaming Recommendations: Incremental Matrix Factorization!! (Based on github.com/brkyvz/streaming-matrix-factorization)
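
One common way to make matrix factorization incremental is to apply a single stochastic gradient step per incoming rating. The sketch below shows that generic technique in plain Python; it is not the code from the repository named on the slide, and the class name `IncrementalMF`, the hyperparameters, and the sample ratings are all assumptions.

```python
import random


class IncrementalMF:
    """Online matrix factorization: one SGD step per observed (user, item, rating)."""

    def __init__(self, rank=2, learning_rate=0.05, regularization=0.01, seed=42):
        self.rank = rank
        self.lr = learning_rate
        self.reg = regularization
        self.rng = random.Random(seed)
        self.user_factors = {}
        self.item_factors = {}

    def _factors(self, table, key):
        # Unseen users and items get small random factors on first sight.
        if key not in table:
            table[key] = [self.rng.uniform(0.0, 0.1) for _ in range(self.rank)]
        return table[key]

    def predict(self, user, item):
        u = self._factors(self.user_factors, user)
        v = self._factors(self.item_factors, item)
        return sum(a * b for a, b in zip(u, v))

    def update(self, user, item, rating):
        """Nudge both factor vectors toward the observed rating; return the prior error."""
        u = self._factors(self.user_factors, user)
        v = self._factors(self.item_factors, item)
        error = rating - sum(a * b for a, b in zip(u, v))
        for k in range(self.rank):
            u_k, v_k = u[k], v[k]
            u[k] += self.lr * (error * v_k - self.reg * u_k)
            v[k] += self.lr * (error * u_k - self.reg * v_k)
        return error


# Hypothetical rating stream, replayed to imitate a long-running feed.
stream = [("user-1", "item-1", 5.0), ("user-1", "item-2", 1.0), ("user-2", "item-1", 4.0)]
model = IncrementalMF()
errors = [model.update(user, item, rating) for _ in range(300) for user, item, rating in stream]
```

The model never sees the full ratings matrix at once; each event only touches one user vector and one item vector, which is what makes the approach fit a streaming pipeline.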
  31. Recommendation Serving Layer: Use Case: Recommendation Service Depends on Redis Cache; Problem: Redis Cache Goes Down!?; Answer: github.com/Netflix/Hystrix Circuit Breaker!; Circuit States: Closed: Service OK, Open: Service DOWN; Fallback to Non-Personalized Recommendations from Disk
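
The two circuit states and the fallback listed on this slide can be sketched in a few lines. This is plain Python, not the Hystrix API (Hystrix is a Java library); the `CircuitBreaker` class, the consecutive-failure threshold, and both recommendation functions are hypothetical. The sketch models only the Closed and Open states named on the slide and never closes the circuit again.

```python
class CircuitBreaker:
    """Minimal two-state breaker: CLOSED (service OK) and OPEN (service DOWN)."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.state = "CLOSED"

    def call(self, primary, fallback):
        if self.state == "OPEN":
            # Circuit is open: skip the failing dependency entirely.
            return fallback()
        try:
            result = primary()
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.state = "OPEN"
            return fallback()
        self.consecutive_failures = 0
        return result


cache_calls = {"count": 0}


def personalized_from_cache():
    # Hypothetical cache lookup that always fails, simulating the cache outage.
    cache_calls["count"] += 1
    raise ConnectionError("cache unavailable")


def non_personalized_from_disk():
    # Hypothetical precomputed fallback list.
    return ["top-item-1", "top-item-2"]


breaker = CircuitBreaker(failure_threshold=3)
responses = [breaker.call(personalized_from_cache, non_personalized_from_disk)
             for _ in range(5)]
```

After the third failure the breaker opens, so the last two requests are served from the fallback without touching the cache at all.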
  32. Agenda: Live, Interactive Demo!; NiFi; Spark Streaming; Streaming Recommendations; Netflix Recommendations (Bonus!)
  33. Netflix Data Pipeline: 9 million events, 22 GB per second @ peak!; EC2 D2XL (Disk: 6 TB, 475 MB/s; RAM: 30 GB; Network: 700 Mbps); Auto-scaling, Fault tolerance; A/B Tests, Trending Now; SAMZA Splits high and normal priority
  34. Recommendations Pipeline: Batch Matrix Factorization; Keep Batch Video (V) Matrix; Calculate Newer User (U) Matrix; Compute U x V Dot Product; Save Model to Disk and EVCache (https://github.com/Netflix/EVCache); Throw away batch user factors (U); Keep video factors (V)
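
The "keep V, recompute U, take the U x V dot product" steps on this slide can be sketched as a fold-in: hold the video factors fixed and fit only a fresh user vector to that user's recent ratings. This is a plain-Python illustration under assumptions, not Netflix's implementation; the factor values, the ratings, and the gradient-descent fit are all made up for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fold_in_user(video_factors, ratings, rank,
                 learning_rate=0.05, regularization=0.01, epochs=500):
    """Fit a new user vector against fixed video factors (V is never modified)."""
    u = [0.1] * rank
    for _ in range(epochs):
        for video, rating in ratings.items():
            v = video_factors[video]
            error = rating - dot(u, v)
            u = [u_k + learning_rate * (error * v_k - regularization * u_k)
                 for u_k, v_k in zip(u, v)]
    return u


# Hypothetical video factors kept from the batch job (rank 2).
video_factors = {
    "video-a": [1.0, 0.0],
    "video-b": [0.0, 1.0],
    "video-c": [1.0, 1.0],
}

# Hypothetical recent ratings from one user.
recent_ratings = {"video-a": 5.0, "video-b": 1.0}

user = fold_in_user(video_factors, recent_ratings, rank=2)

# Score every video with the U x V dot product and rank by score.
scores = {video: dot(user, v) for video, v in video_factors.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Only the small user vector is recomputed, so the expensive video factorization from the batch job can be reused until the next batch run.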
  35. Thank You, Kafka Summit SF! Chris Fregly, @cfregly; All Source Code, Demos, and Docker Images Available @ advancedspark.com, github.com/fluxcapacitor/pipeline; Join the Global Meetup for Slides, Videos, Book @ advancedspark.com
