Twitter generates billions of events per day. Analyzing these events in real time presents a massive challenge. To meet this challenge, Twitter designed an end-to-end real-time stack consisting of DistributedLog, a distributed and replicated messaging system, and Heron, a streaming system for real-time computation. DistributedLog is a replicated log service built on top of Apache BookKeeper, providing infinite, ordered, append-only streams that can be used for building robust real-time systems. It is the foundation of Twitter’s publish-subscribe system. Twitter Heron is the next-generation streaming system, built from the ground up to address our scalability and reliability needs. Both systems have been in production for nearly two years and are widely used at Twitter in a diverse range of applications such as the search ingestion pipeline, ad analytics, image classification, and more. These slides describe Heron and DistributedLog in detail, cover a few use cases in depth, and share the operating experiences and challenges of running real-time systems at scale.