SlideShare a Scribd company logo
Engineering Real Time
Event Driven Processing
Karthik Ramasamy, Co-founder
Streamlio
Information Age
!2
Ká !
Event data is key
Increasingly Connected World
!3
Internet of Things
30 B connected devices by 2020
Health Care
153 Exabytes (2013) -> 2314 Exabytes
(2020)
Machine Data
40% of digital universe by 2020
Connected Vehicles
Data transferred per vehicle per month
4 MB -> 5 GB
Digital Assistants (Predictive Analytics)
$2B (2012) -> $6.5B (2019) [1]
Siri/Cortana/Google Now
Augmented/Virtual Reality
$150B by 2020 [2]
Oculus/HoloLens/Magic Leap
Ñ
+
>
Use Cases
Observations
• Fight spammy content, engagements, and behaviors in Twitter
• Spam campaign comes in large batch
• Despite randomized tweaks, enough similarity among spammy entities are
preserved
Requirement
• Real time - a competition with spammers (i.e) “detect” vs “mutate”
• Generic - need to support all common feature representations
Product Safety
!5
Product Safety - System Overview
!6
KV store for
clustering
Messaging
System (Event
Bus)
Similarity Clustering (Heron)
Real Time Ads
!7
KV store
Messaging
System (Event
Bus)
Ads Serving (Heron)
Ads Prediction (Heron)
Impressions
Spend
Ads Analytics (Heron)
Engagements
Spend
Ads Requests
Ads Responses
Impressions
Spend
Connected Cars
!8
KV store for
clustering
Messaging
System
Traffic Patterns
Messaging
System
Data Capture/Filter
Fuel Efficiency
Architectural Pattern
Recurring Pattern
!10
ProcessMessaging
Storage
Data Ingestion Data Processing
Results StorageData Storage
Data
Serving
State of the World
!11
Aggregation
Systems
Messaging
Systems
Result
Engine
HDFS
Queryable
Engines
Towards Unification and Simplification
!12
Interactive
Querying
Storm API Streamlets SQL
Application
Builder
Pulsar
API
BK/
HDFS
API
Metadata
Management
Operational
Monitoring
Chargeback
Security
Authentication
Quota
Management
Kafka
API
Apache Pulsar
Apache Pulsar highlights
!14
Stream-Native Functions
*NEW*
Apply processing functions on
data, fully managed by Pulsar
Multi-tenancy
A single cluster can support
many tenants and use cases
High throughput
Millions of messages/s in a
single partition
Durability
Data replicated and synced
to disk
Geo-replication
Out of box support for
geographically distributed
applications
Unified messaging model
Support both Topic & Queue
semantic in a single model
Delivery Guarantees
At least once, at most once
and effectively once
Low Latency
Low publish latency of 5ms
at 99pct
Scalability
Supports millions of topics in
a single cluster
Pulsar Architecture
!15
Serving
Brokers can be added independently
Traffic can be shifted quickly across brokers
Storage
Bookies can be added independently
New bookies will ramp up traffic quickly
Segment Centric Architecture
!16
Pulsar multi-datacenter replication
!17
Geo-replication
Asynchronous replication
Integrated in the broker message flow
Simple configuration to add/remove regions
Topic (T1) Topic (T1)
Topic (T1)
Subscription
(S1)
Subscription
(S1)
Producer
(P1)
Consumer
Producer
(P3)
Producer
(P2)
Consumer
Data Center A Data Center B
Data Center C
Apache Heron
Apache Heron design goals
!19
Efficiency
Reduce resource
consumption
Support for diverse
workloads
Throughput vs latency
sensitive
Support for multiple
semantics
At most once, At least once,
Effectively once
Native Multi-Language
Support
C++, Java, Python
Task Isolation
Ease of debug-ability/isolation/
profiling
Support for back pressure
Topologies should be self
adjusting
Use of containers
Runs in schedulers - Kubernetes &
DCOS & many more
Multi-level APIs
Procedural, Functional and Declarative
for diverse applications
Diverse deployment models
Run as a service or pure library
Heron data model
!20
%
%
%
%
%
Spout 1
Spout 2
Bolt 1
Bolt 2
Bolt 3
Bolt 4
Bolt 5
Writing Heron topologies
!21
Procedural - Low Level API
Directly write your spouts and bolts
Functional - Mid Level API
Use of maps, flat maps, transform, windows
Declarative - SQL (in the works)
Use of declarative language - specify what you
want, system will figure it out.
,
%
Topology execution
!22
Topology Master
ZK
Cluster
Stream
Manager
I1 I2 I3 I4
Stream
Manager
I1 I2 I3 I4
Logical Plan,
Physical Plan and
Execution State
Sync Physical Plan
DATA CONTAINER DATA CONTAINER
Metrics
Manager
Metrics
Manager
Metrics
Manager
Health
Manager
MASTER
CONTAINER
Heron topology examples
!23
Heron topology scale
!24
CONTAINERS - 1 TO 600
INSTANCES - 10 TO 6000
• No more pages during midnight for Heron team
• Very rare incidents for Heron customer teams
• Easy to debug during incident for quick turn around
• Reduced resource utilization saving cost
Heron impact at Twitter
!25
Apache BookKeeper
Computation across batch/streaming is similar
• Expressed as DAGS
• Run in parallel on the cluster
• Intermediate results need not be materialized
• Functional/Declarative APIs
Storage is the key
• Messaging/Storage are two faces of the same coin
• They serve the same data
Observations
!27
• Be able to write and read streams of records with low latency,
storage durability
• Data storage should be durable, consistent and fault tolerant
• Enable clients to stream or tail ledgers to propagate data as
they’re written
• Store and provide access to both historic and real-time data
Storage Requirements
!28
Using BookKeeper
!29
Companies using the technology
!30
Event Data Processing with Streamlio
Event Data Processing with Streamlio

More Related Content

What's hot

Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Provectus
 
Akka at Enterprise Scale: Performance Tuning Distributed Applications
Akka at Enterprise Scale: Performance Tuning Distributed ApplicationsAkka at Enterprise Scale: Performance Tuning Distributed Applications
Akka at Enterprise Scale: Performance Tuning Distributed Applications
Lightbend
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
datamantra
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
confluent
 
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian LitaKafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
confluent
 
Declarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsDeclarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data models
Monal Daxini
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
confluent
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Big Data Spain
 
Reliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at AirbnbReliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at Airbnb
DataWorks Summit/Hadoop Summit
 
Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)
Sangjin Lee
 
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Dan Halperin
 
Food Processing is Stream Processing (Stefan Freshe, Nordischer Maschinenbau...
Food Processing is Stream Processing (Stefan Freshe,  Nordischer Maschinenbau...Food Processing is Stream Processing (Stefan Freshe,  Nordischer Maschinenbau...
Food Processing is Stream Processing (Stefan Freshe, Nordischer Maschinenbau...
confluent
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
confluent
 
Real-world Streaming Architectures
Real-world Streaming ArchitecturesReal-world Streaming Architectures
Real-world Streaming Architectures
confluent
 
Shared time-series-analysis-using-an-event-streaming-platform -_v2
Shared   time-series-analysis-using-an-event-streaming-platform -_v2Shared   time-series-analysis-using-an-event-streaming-platform -_v2
Shared time-series-analysis-using-an-event-streaming-platform -_v2
confluent
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Flink Forward
 
What's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkWhat's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talk
confluent
 
Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor API
confluent
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
HostedbyConfluent
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processing
confluent
 

What's hot (20)

Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
 
Akka at Enterprise Scale: Performance Tuning Distributed Applications
Akka at Enterprise Scale: Performance Tuning Distributed ApplicationsAkka at Enterprise Scale: Performance Tuning Distributed Applications
Akka at Enterprise Scale: Performance Tuning Distributed Applications
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
 
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian LitaKafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
 
Declarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsDeclarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data models
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
Reliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at AirbnbReliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at Airbnb
 
Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)
 
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
 
Food Processing is Stream Processing (Stefan Freshe, Nordischer Maschinenbau...
Food Processing is Stream Processing (Stefan Freshe,  Nordischer Maschinenbau...Food Processing is Stream Processing (Stefan Freshe,  Nordischer Maschinenbau...
Food Processing is Stream Processing (Stefan Freshe, Nordischer Maschinenbau...
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Real-world Streaming Architectures
Real-world Streaming ArchitecturesReal-world Streaming Architectures
Real-world Streaming Architectures
 
Shared time-series-analysis-using-an-event-streaming-platform -_v2
Shared   time-series-analysis-using-an-event-streaming-platform -_v2Shared   time-series-analysis-using-an-event-streaming-platform -_v2
Shared time-series-analysis-using-an-event-streaming-platform -_v2
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 
What's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkWhat's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talk
 
Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor API
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processing
 

Similar to Event Data Processing with Streamlio

Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session
Sri Ambati
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
HostedbyConfluent
 
TechHub pitch
TechHub pitchTechHub pitch
TechHub pitch
Serge Kotlyarov
 
Real Time Processing Using Twitter Heron by Karthik Ramasamy
Real Time Processing Using Twitter Heron by Karthik RamasamyReal Time Processing Using Twitter Heron by Karthik Ramasamy
Real Time Processing Using Twitter Heron by Karthik Ramasamy
Data Con LA
 
Azure iot suite
Azure iot suiteAzure iot suite
Azure iot suite
Mostafa Ramezani
 
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
confluent
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
confluent
 
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
SeokJin Han
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and Analytics
Data Driven Innovation
 
How to Build Streaming Apps with Confluent II
How to Build Streaming Apps with Confluent IIHow to Build Streaming Apps with Confluent II
How to Build Streaming Apps with Confluent II
confluent
 
Arya.ai artificial intelligence platform vinay kumar
Arya.ai   artificial intelligence platform  vinay kumarArya.ai   artificial intelligence platform  vinay kumar
Arya.ai artificial intelligence platform vinay kumar
TechXpla
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Kai Wähner
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
HostedbyConfluent
 
Conversational AI and Chatbot Integrations
Conversational AI and Chatbot IntegrationsConversational AI and Chatbot Integrations
Conversational AI and Chatbot Integrations
Cristina Vidu
 
A Look Under the Hood of H2O Driverless AI, Arno Candel - H2O World San Franc...
A Look Under the Hood of H2O Driverless AI, Arno Candel - H2O World San Franc...A Look Under the Hood of H2O Driverless AI, Arno Candel - H2O World San Franc...
A Look Under the Hood of H2O Driverless AI, Arno Candel - H2O World San Franc...
Sri Ambati
 
Bay Area Azure Meetup - Ignite update session
Bay Area Azure Meetup - Ignite update sessionBay Area Azure Meetup - Ignite update session
Bay Area Azure Meetup - Ignite update session
Nills Franssens
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Webcast: API-Centric Architecture for Building Context-Aware Apps
Webcast: API-Centric Architecture for Building Context-Aware AppsWebcast: API-Centric Architecture for Building Context-Aware Apps
Webcast: API-Centric Architecture for Building Context-Aware Apps
Apigee | Google Cloud
 
Introduction to FIWARE Open Ecosystem
Introduction to FIWARE Open EcosystemIntroduction to FIWARE Open Ecosystem
Introduction to FIWARE Open Ecosystem
Fernando Lopez Aguilar
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache Flink
Aljoscha Krettek
 

Similar to Event Data Processing with Streamlio (20)

Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
TechHub pitch
TechHub pitchTechHub pitch
TechHub pitch
 
Real Time Processing Using Twitter Heron by Karthik Ramasamy
Real Time Processing Using Twitter Heron by Karthik RamasamyReal Time Processing Using Twitter Heron by Karthik Ramasamy
Real Time Processing Using Twitter Heron by Karthik Ramasamy
 
Azure iot suite
Azure iot suiteAzure iot suite
Azure iot suite
 
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
 
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and Analytics
 
How to Build Streaming Apps with Confluent II
How to Build Streaming Apps with Confluent IIHow to Build Streaming Apps with Confluent II
How to Build Streaming Apps with Confluent II
 
Arya.ai artificial intelligence platform vinay kumar
Arya.ai   artificial intelligence platform  vinay kumarArya.ai   artificial intelligence platform  vinay kumar
Arya.ai artificial intelligence platform vinay kumar
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
Conversational AI and Chatbot Integrations
Conversational AI and Chatbot IntegrationsConversational AI and Chatbot Integrations
Conversational AI and Chatbot Integrations
 
A Look Under the Hood of H2O Driverless AI, Arno Candel - H2O World San Franc...
A Look Under the Hood of H2O Driverless AI, Arno Candel - H2O World San Franc...A Look Under the Hood of H2O Driverless AI, Arno Candel - H2O World San Franc...
A Look Under the Hood of H2O Driverless AI, Arno Candel - H2O World San Franc...
 
Bay Area Azure Meetup - Ignite update session
Bay Area Azure Meetup - Ignite update sessionBay Area Azure Meetup - Ignite update session
Bay Area Azure Meetup - Ignite update session
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Webcast: API-Centric Architecture for Building Context-Aware Apps
Webcast: API-Centric Architecture for Building Context-Aware AppsWebcast: API-Centric Architecture for Building Context-Aware Apps
Webcast: API-Centric Architecture for Building Context-Aware Apps
 
Introduction to FIWARE Open Ecosystem
Introduction to FIWARE Open EcosystemIntroduction to FIWARE Open Ecosystem
Introduction to FIWARE Open Ecosystem
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache Flink
 

More from Streamlio

Infinite Topic Backlogs with Apache Pulsar
Infinite Topic Backlogs with Apache PulsarInfinite Topic Backlogs with Apache Pulsar
Infinite Topic Backlogs with Apache Pulsar
Streamlio
 
Strata London 2018: Multi-everything with Apache Pulsar
Strata London 2018:  Multi-everything with Apache PulsarStrata London 2018:  Multi-everything with Apache Pulsar
Strata London 2018: Multi-everything with Apache Pulsar
Streamlio
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed Storage
Streamlio
 
Stream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar FunctionsStream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar Functions
Streamlio
 
Building data-driven microservices
Building data-driven microservicesBuilding data-driven microservices
Building data-driven microservices
Streamlio
 
Distributed Crypto-Currency Trading with Apache Pulsar
Distributed Crypto-Currency Trading with Apache PulsarDistributed Crypto-Currency Trading with Apache Pulsar
Distributed Crypto-Currency Trading with Apache Pulsar
Streamlio
 
Evaluating Streaming Data Solutions
Evaluating Streaming Data SolutionsEvaluating Streaming Data Solutions
Evaluating Streaming Data Solutions
Streamlio
 
Autopiloting Realtime Processing in Heron
Autopiloting Realtime Processing in HeronAutopiloting Realtime Processing in Heron
Autopiloting Realtime Processing in Heron
Streamlio
 
Introduction to Apache Heron
Introduction to Apache HeronIntroduction to Apache Heron
Introduction to Apache Heron
Streamlio
 
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
Streamlio
 

More from Streamlio (10)

Infinite Topic Backlogs with Apache Pulsar
Infinite Topic Backlogs with Apache PulsarInfinite Topic Backlogs with Apache Pulsar
Infinite Topic Backlogs with Apache Pulsar
 
Strata London 2018: Multi-everything with Apache Pulsar
Strata London 2018:  Multi-everything with Apache PulsarStrata London 2018:  Multi-everything with Apache Pulsar
Strata London 2018: Multi-everything with Apache Pulsar
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed Storage
 
Stream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar FunctionsStream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar Functions
 
Building data-driven microservices
Building data-driven microservicesBuilding data-driven microservices
Building data-driven microservices
 
Distributed Crypto-Currency Trading with Apache Pulsar
Distributed Crypto-Currency Trading with Apache PulsarDistributed Crypto-Currency Trading with Apache Pulsar
Distributed Crypto-Currency Trading with Apache Pulsar
 
Evaluating Streaming Data Solutions
Evaluating Streaming Data SolutionsEvaluating Streaming Data Solutions
Evaluating Streaming Data Solutions
 
Autopiloting Realtime Processing in Heron
Autopiloting Realtime Processing in HeronAutopiloting Realtime Processing in Heron
Autopiloting Realtime Processing in Heron
 
Introduction to Apache Heron
Introduction to Apache HeronIntroduction to Apache Heron
Introduction to Apache Heron
 
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
 

Recently uploaded

AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Undress Baby
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
Gerardo Pardo-Castellote
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 

Recently uploaded (20)

AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 

Event Data Processing with Streamlio

  • 1. Engineering Real Time Event Driven Processing Karthik Ramasamy, Co-founder Streamlio
  • 3. Increasingly Connected World !3 Internet of Things 30 B connected devices by 2020 Health Care 153 Exabytes (2013) -> 2314 Exabytes (2020) Machine Data 40% of digital universe by 2020 Connected Vehicles Data transferred per vehicle per month 4 MB -> 5 GB Digital Assistants (Predictive Analytics) $2B (2012) -> $6.5B (2019) [1] Siri/Cortana/Google Now Augmented/Virtual Reality $150B by 2020 [2] Oculus/HoloLens/Magic Leap Ñ + >
  • 5. Observations • Fight spammy content, engagements, and behaviors in Twitter • Spam campaign comes in large batch • Despite randomized tweaks, enough similarity among spammy entities are preserved Requirement • Real time - a competition with spammers (i.e) “detect” vs “mutate” • Generic - need to support all common feature representations Product Safety !5
  • 6. Product Safety - System Overview !6 KV store for clustering Messaging System (Event Bus) Similarity Clustering (Heron)
  • 7. Real Time Ads !7 KV store Messaging System (Event Bus) Ads Serving (Heron) Ads Prediction (Heron) Impressions Spend Ads Analytics (Heron) Engagements Spend Ads Requests Ads Responses Impressions Spend
  • 8. Connected Cars !8 KV store for clustering Messaging System Traffic Patterns Messaging System Data Capture/Filter Fuel Efficiency
  • 10. Recurring Pattern !10 ProcessMessaging Storage Data Ingestion Data Processing Results StorageData Storage Data Serving
  • 11. State of the World !11 Aggregation Systems Messaging Systems Result Engine HDFS Queryable Engines
  • 12. Towards Unification and Simplification !12 Interactive Querying Storm API Streamlets SQL Application Builder Pulsar API BK/ HDFS API Metadata Management Operational Monitoring Chargeback Security Authentication Quota Management Kafka API
  • 14. Apache Pulsar highlights !14 Stream-Native Functions *NEW* Apply processing functions on data, fully managed by Pulsar Multi-tenancy A single cluster can support many tenants and use cases High throughput Millions of messages/s in a single partition Durability Data replicated and synced to disk Geo-replication Out of box support for geographically distributed applications Unified messaging model Support both Topic & Queue semantic in a single model Delivery Guarantees At least once, at most once and effectively once Low Latency Low publish latency of 5ms at 99pct Scalability Supports millions of topics in a single cluster
  • 15. Pulsar Architecture !15 Serving Brokers can be added independently Traffic can be shifted quickly across brokers Storage Bookies can be added independently New bookies will ramp up traffic quickly
  • 17. Pulsar multi-datacenter replication !17 Geo-replication Asynchronous replication Integrated in the broker message flow Simple configuration to add/remove regions Topic (T1) Topic (T1) Topic (T1) Subscription (S1) Subscription (S1) Producer (P1) Consumer Producer (P3) Producer (P2) Consumer Data Center A Data Center B Data Center C
  • 19. Apache Heron design goals !19 Efficiency Reduce resource consumption Support for diverse workloads Throughput vs latency sensitive Support for multiple semantics At most once, At least once, Effectively once Native Multi-Language Support C++, Java, Python Task Isolation Ease of debug-ability/isolation/ profiling Support for back pressure Topologies should be self adjusting Use of containers Runs in schedulers - Kubernetes & DCOS & many more Multi-level APIs Procedural, Functional and Declarative for diverse applications Diverse deployment models Run as a service or pure library
  • 20. Heron data model !20 % % % % % Spout 1 Spout 2 Bolt 1 Bolt 2 Bolt 3 Bolt 4 Bolt 5
  • 21. Writing Heron topologies !21 Procedural - Low Level API Directly write your spouts and bolts Functional - Mid Level API Use of maps, flat maps, transform, windows Declarative - SQL (in the works) Use of declarative language - specify what you want, system will figure it out. , %
  • 22. Topology execution !22 Topology Master ZK Cluster Stream Manager I1 I2 I3 I4 Stream Manager I1 I2 I3 I4 Logical Plan, Physical Plan and Execution State Sync Physical Plan DATA CONTAINER DATA CONTAINER Metrics Manager Metrics Manager Metrics Manager Health Manager MASTER CONTAINER
  • 24. Heron topology scale !24 CONTAINERS - 1 TO 600 INSTANCES - 10 TO 6000
  • 25. • No more pages during midnight for Heron team • Very rare incidents for Heron customer teams • Easy to debug during incident for quick turn around • Reduced resource utilization saving cost Heron impact at Twitter !25
  • 27. Computation across batch/streaming is similar • Expressed as DAGS • Run in parallel on the cluster • Intermediate results need not be materialized • Functional/Declarative APIs Storage is the key • Messaging/Storage are two faces of the same coin • They serve the same data Observations !27
  • 28. • Be able to write and read streams of records with low latency, storage durability • Data storage should be durable, consistent and fault tolerant • Enable clients to stream or tail ledgers to propagate data as they’re written • Store and provide access to both historic and real-time data Storage Requirements !28
  • 30. Companies using the technology !30