With the advent of reliable streaming technologies, real-time data pipelines have become a crucial component of any robust data initiative. Compared to a traditional Hadoop-centric data hub, these real-time stacks provide high levels of system availability and data integrity coupled with very low-latency queries, without incurring the overhead of inflexible schemas and batch-analysis lag.

Alex Silva demonstrates how to use Kafka, Spark Streaming, Akka, and Hadoop to orchestrate a real-time stack and explains how data flows through the system. This real-time data platform combines a mix of open source technologies and home-grown services to provide a full end-to-end solution, from flexible data-ingestion protocols to fast data analysis and queries.

Topics include:

- External message providers, which connect to the platform through a data-ingestion service modeled as a robust actor system using Akka and Scala
- Routing to different backend systems, including Kafka and Druid
- Spark Streaming, which is used to perform complex real-time analytical and scientific processing on the data
- Exporting data into Hadoop for future processing
- Querying and visualization
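The routing step in the list above can be sketched as a decision over event types. This is a simplified, hypothetical illustration: the actual ingestion service is an Akka actor system, and every name here (`Event`, `Sink`, `Router`) is invented for the sketch, not the platform's API.

```scala
// Minimal sketch of routing ingested events to backend systems.
// All names are illustrative; the real service wraps this decision
// in Akka actors with supervision and back-pressure.
object Router {
  sealed trait Event { def payload: String }
  final case class MetricEvent(payload: String) extends Event // time-series data
  final case class RawEvent(payload: String) extends Event    // raw event stream

  trait Sink { def write(e: Event): String }
  case object KafkaSink extends Sink { def write(e: Event): String = s"kafka:${e.payload}" }
  case object DruidSink extends Sink { def write(e: Event): String = s"druid:${e.payload}" }

  // Metrics go to Druid for low-latency analytical queries; raw events go
  // to Kafka, feeding Spark Streaming jobs and eventual export into Hadoop.
  def route(e: Event): Sink = e match {
    case _: MetricEvent => DruidSink
    case _: RawEvent    => KafkaSink
  }
}
```

For example, `Router.route(Router.MetricEvent("cpu=0.9"))` selects `DruidSink`, while raw click events would be routed to `KafkaSink` for stream processing.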