3. Goals of Lyft’s Streaming Platform
• Make it easy to build real-time, event-driven, stateful microservices
• Solve the hard parts of stream processing ONCE for the entire company
• Be a force multiplier for other teams within Lyft
• Three components: Pub/Sub, Streaming Compute, Stream Registry
4. Streaming Platform Overview
[Diagram: platform overview. Labels: Streaming Service One, Streaming Service Two, Streaming Service Three; Stream Compute; Pub/Sub; Stream / Schema Registry; Deployment Tooling; Metrics & Dashboards; Alerts; Logging; Amazon EC2; Amazon S3; Wavefront; Salt (Config / Orca); Docker]
5. Lyft Streaming Platform - Streaming Compute Criteria
Operational Considerations
● Stateful Computation and Exactly-once Processing Semantics
● Robust State Management
● Data Reprocessing (backfill)
● Asynchronous Checkpoints
● Back-pressure
● High throughput and low-latency
● Deployment Architecture
API Considerations
● Functional / Fluent API
● Flexible Windowing API
● Event Time Support
● Apache Beam Support
● Stream SQL
● Powerful Direct API
● Late Data Handling
The contenders: Apache Flink, Apache Spark Streaming, Apache Kafka Streams
6. Why Flink? API Considerations
• Functional / Fluent API
• Flexible Windowing API
• Event Time Support
• Apache Beam Support
• Stream SQL
• Powerful Direct API
• Late Data Handling
7. Why Flink? Operational Considerations
• Stateful Computation and Exactly-once Processing Semantics
• Robust State Management
• Stateful Data Reprocessing (backfill)
• Asynchronous Checkpoints
• Back-pressure
• High throughput and low-latency
• Deployment Architecture
9. Why Kafka?
Pros
• Durability & Write Latency
• Read Latency & Consumer Fanout
• Transactions & Idempotent Writes
• Operational Concerns & Vendor Support
Cons
• No ordering by key, only by partition
• Long-term data storage is still an issue
• Auto-scaling is still an issue
10. Open Problems
• Rescaling Kafka while preserving per-key ordering
• Efficient Dynamic Computations over streams
• Long-term storage for events: real-time and historical reads
• Zero-downtime deployments for streaming services
11. Rescaling Kafka
• Rescaling Kafka while preserving per-key ordering
• Kafka only provides partition ordering guarantees!
• We want per-key ordering guarantees
• Guarantees should hold across re-partitioning events
• Basic approach: read old partitions completely before reading new ones
• Achieve this using something akin to Flink’s checkpoint barriers to mark re-partitioning events
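A minimal sketch of the "read old partitions completely before reading new" idea, as a pure in-memory simulation rather than real Kafka or Flink code. Everything named here is an assumption made for illustration: the `RepartitionMarker` barrier, the toy `partition_for` hash, and the list-of-lists partition layout. The producer is assumed to write one marker to the end of every old partition at rescale time, and the consumer refuses to touch the new layout until it has seen that marker on each old partition.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Record:
    key: str
    seq: int  # per-key sequence number, used here only to check ordering


@dataclass(frozen=True)
class RepartitionMarker:
    """Hypothetical barrier written to every old partition at rescale time."""


Entry = Union[Record, RepartitionMarker]


def partition_for(key: str, num_partitions: int) -> int:
    # Toy deterministic partitioner; a stand-in for real key hashing.
    return sum(key.encode()) % num_partitions


def write(layout: List[List[Entry]], record: Record) -> None:
    layout[partition_for(record.key, len(layout))].append(record)


def read_old_then_new(old: List[List[Entry]], new: List[List[Entry]]) -> List[Record]:
    out: List[Record] = []
    # Phase 1: drain each old partition up to its marker.
    for part in old:
        for entry in part:
            if isinstance(entry, RepartitionMarker):
                break
            out.append(entry)
        else:
            # No marker seen: the old partition may still receive writes.
            raise RuntimeError("old partition not yet sealed by a marker")
    # Phase 2: only now is it safe to read the new layout.
    for part in new:
        out.extend(e for e in part if isinstance(e, Record))
    return out


def ordered_per_key(records: List[Record]) -> bool:
    last: Dict[str, int] = {}
    for r in records:
        if r.seq <= last.get(r.key, -1):
            return False
        last[r.key] = r.seq
    return True


def build_layouts():
    old: List[List[Entry]] = [[] for _ in range(2)]  # before rescaling
    new: List[List[Entry]] = [[] for _ in range(3)]  # after rescaling
    for seq in range(3):
        for key in ("a", "b", "c"):
            write(old, Record(key, seq))
    for part in old:
        part.append(RepartitionMarker())  # seal the old layout
    for seq in range(3, 6):
        for key in ("a", "b", "c"):
            write(new, Record(key, seq))
    return old, new


if __name__ == "__main__":
    old, new = build_layouts()
    print(ordered_per_key(read_old_then_new(old, new)))
```

In this toy setup some keys land on a different partition once the count goes from 2 to 3, which is why a consumer that reads the new partitions first can see a key's later events before its earlier ones. A real system would also need to handle concurrent consumers and in-flight writes, which this sketch ignores.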
14. Efficient Dynamic Computation Over Streams
• Enable many users to dynamically submit small streaming computations
• Share bandwidth amongst multiple computations
• Share computed sub-results amongst multiple computations
• Correctly handle bootstrapping of computations which depend on historical data
• Basic approach: Map any computation into a fixed/general data flow “shape”
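One way to picture the fixed data flow "shape" is sketched below. The slides do not say what the shape actually is, so the filter → key → sum pipeline here is purely an assumption, as are the `Spec` type, the `run_shared` function, and the event fields. The point it illustrates is that each user computation becomes data (a spec) fed into one shared pipeline, so the stream is read once no matter how many computations are registered.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]


@dataclass(frozen=True)
class Spec:
    """A user computation expressed as parameters of the fixed shape."""
    name: str
    predicate: Callable[[Event], bool]  # filter stage
    key: Callable[[Event], str]         # key-by stage
    value: Callable[[Event], float]     # what the sum stage accumulates


def run_shared(events: Iterable[Event], specs: List[Spec]) -> Dict[Tuple[str, str], float]:
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for event in events:  # a single pass over the stream, shared by all specs
        for spec in specs:
            if spec.predicate(event):
                totals[(spec.name, spec.key(event))] += spec.value(event)
    return dict(totals)


def example() -> Dict[Tuple[str, str], float]:
    events = [
        {"type": "ride", "city": "SF", "fare": 10.0},
        {"type": "ride", "city": "NYC", "fare": 20.0},
        {"type": "ride", "city": "SF", "fare": 5.0},
        {"type": "cancel", "city": "SF", "fare": 0.0},
    ]
    specs = [
        Spec("rides_per_city", lambda e: e["type"] == "ride",
             lambda e: e["city"], lambda e: 1.0),
        Spec("fare_per_city", lambda e: e["type"] == "ride",
             lambda e: e["city"], lambda e: e["fare"]),
    ]
    return run_shared(events, specs)


if __name__ == "__main__":
    for name_and_key, total in sorted(example().items()):
        print(name_and_key, total)
```

Adding a computation here means appending a `Spec`, not deploying a new job. Sharing computed sub-results and bootstrapping from historical data, both listed above, are not addressed by this sketch.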
19. Summary
• Lyft is building a next generation streaming platform based on Apache Flink and Apache Kafka
• Stateful stream processing is not a “solved problem”
• There are many hard / open problems left to solve
• If these sorts of problems interest you, please come join us!
We’re Hiring!
Really excited about the great team we are building
Looking forward to the next quarter
Unfortunately no time for Q&A today (Eng all hands taking our room) but always available to chat or answer questions, just ping me