Kafka, the "DialTone for Data": Building a self-service, scalable, streaming analytics system @ HomeAway, Rene Parra
3. Agenda
© Copyright 2016 HomeAway, Inc.
• Overview
• The Problem
• The Experiment
• Results: Use Cases
• Lessons Learned
• Next Steps
6. In the old days: “Dial Tone” looked like this
ATDT
7. Today: Kafka is the modern “Dial Tone” for Data
Producer
Consumer
10. Our original problem/motivation
search head
indexer
indexer
app server forwarder
app server forwarder
1 TB/day ingress and growing!
40,000 calls/sec
12. Fill the Lake! Alternatives
?
Problem: Fill Hadoop!
Problem Data Lake
13. What we wanted… the Big Idea
If you can log it… … you can analyze it!
14. How to build self-service?
15. Hypothesis: Use Kafka!
• 2 ms median latency
• 2 million events/sec (3 cheap machines): “Benchmarking Apache Kafka”, http://goo.gl/pv5GoL
• The log: http://bit.ly/jay_on_logs
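The "log" idea behind this hypothesis can be illustrated with a minimal in-memory sketch. It is not Kafka and not HomeAway's code; the `Log` and `Consumer` classes and the event names are hypothetical, chosen only to show that producers append to an ordered sequence while each consumer keeps its own offset.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """Append-only sequence of records; a stand-in for one topic partition."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Producer side: add a record and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Consumer side: read up to max_records starting at offset."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer tracks its own position, independent of other readers."""
    log: Log
    offset: int = 0

    def poll(self, max_records: int = 10) -> List[Any]:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only past what was actually read
        return batch


# Hypothetical events appended by a producer.
log = Log()
for event in ("search", "click", "booking"):
    log.append(event)

# Two independent consumers read the same log at their own pace.
analytics = Consumer(log)
archiver = Consumer(log)
first = analytics.poll(max_records=2)
everything = archiver.poll()
```

Because the offset lives with the consumer rather than the log, adding a new reader does not require any change on the producer side.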
23. Use Cases: Search + ClickStream
User Behavior
Search Requests
A/B Test Readouts
Proctor
EDAP
27. Lesson #1: The Schema [registry] is Everything!
Data Lake
Schema Registry
• Decouples producers from consumers
• Enforces backwards compatibility
• Enables self-service / democratization
• Source of truth (SOT) for schemas in the pipe
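The backwards-compatibility point can be made concrete with a small sketch. This is a simplified model, not the Schema Registry's actual implementation: schemas are plain dicts, type promotion is ignored, and the function name `is_backward_compatible` and the example fields are assumptions for illustration. The rule shown is the usual one: a reader on the new schema can still decode old records if every added field carries a default and no shared field changes type.

```python
from typing import Any, Dict

# Hypothetical schema shape: field name -> {"type": ..., optional "default": ...}
Schema = Dict[str, Dict[str, Any]]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Return True if a reader using `new` can decode records written with `old`."""
    for name, spec in new.items():
        if name in old:
            # A shared field must keep its type (no promotion in this sketch).
            if old[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # An added field needs a default, since old records lack it.
            return False
    # Fields removed in `new` are fine: the reader simply ignores them.
    return True


v1 = {"user_id": {"type": "string"}, "query": {"type": "string"}}
v2_ok = {**v1, "locale": {"type": "string", "default": "en_US"}}
v2_bad = {**v1, "locale": {"type": "string"}}
```

Here `v2_ok` would pass the check and `v2_bad` would be rejected, which is the kind of gate that lets producers evolve a schema without breaking existing consumers.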
28. Lesson #2: A Kafka/SR governance module is helpful
Data Lake
• TURN OFF Auto Topic Creation!
• Need a place for developers to request topics:
  • Retention Policy
  • Expected Load
  • Compaction
  • Partition Size / Partition Key
  • Owner
  • LTS Date
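A topic request along these lines might be captured as a small validated record. The sketch below is hypothetical: the `TopicRequest` class, its field names, and the validation rules are assumptions, not HomeAway's tooling, and "LTS Date" is left out because the slide does not define it. On the broker side, automatic topic creation is controlled by the `auto.create.topics.enable` setting, which would be set to `false`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TopicRequest:
    """Hypothetical request form mirroring the attributes listed on the slide."""
    name: str
    owner: str
    retention_hours: int
    expected_events_per_sec: int
    compacted: bool
    partitions: int
    partition_key: str


def validate(req: TopicRequest) -> List[str]:
    """Return a list of problems; an empty list means the request can proceed."""
    problems = []
    if not req.owner:
        problems.append("owner is required")
    if req.retention_hours <= 0:
        problems.append("retention must be positive")
    if req.expected_events_per_sec <= 0:
        problems.append("expected load must be positive")
    if req.partitions < 1:
        problems.append("at least one partition is required")
    if req.compacted and not req.partition_key:
        # Log compaction keeps the latest record per key, so a key is needed.
        problems.append("compacted topics need a partition key")
    return problems


# Hypothetical requests: one complete, one missing an owner and a key.
good = TopicRequest("search-requests", "search-team", 168, 5000, False, 12, "session_id")
bad = TopicRequest("user-profile", "", 168, 200, True, 6, "")
```

Collecting these attributes up front gives operators what they need to size and configure the topic before any producer writes to it.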
29. Lesson #3: Make it easy to do stream processing
Schema Registry
• samza-archetype
• samza-job-deployer
• Will evaluate Kafka Streams (k-streams): http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
31. Consistency: 3 types of Data
Event
Document
Transactional
37. Don’t be a dinosaur…
ATDT
Editor's Notes
Is often the difference between dinosaurs and unicorns.
“We are living at a dawn of a new age, where how we listen to data…
Today, we @ HomeAway believe that Kafka is the “Dial Tone” for data, enabling present businesses to turn themselves into rainbows and unicorns.