High-throughput data
streaming in Azure
Alexander Laysha
Solution Architect at EPAM Systems & Microsoft
Azure MVP
2
A few words about myself…
I’m Alexander Laysha
• Solution Architect from EPAM Systems & Microsoft
Azure MVP
• Focused on backend, high-load and cloud solutions
• Leader of Belarus Azure Community
• Speaker at local and external meetups and
conferences
My contacts
• Email: layshaalex@gmail.com
• Twitter: @layshaalexander
• Facebook: alexander.laysha
3
• Business needs for real-time analytics
• Use-cases & architecture approaches
• Basics of real-time data streaming platforms
• Azure Event Hub capabilities & constraints
• Pricing calculations for multiple data ingestion scenarios based on Event Hub
• Summary
Which topics will we cover?
4
In the past
• Capture data for later analysis
• Reports and analytics with X days latency
Today
• Dealing with tons of data
• Offline reports and analysis are no longer enough (but still important)
• Businesses want immediate insights from captured data with X seconds/minutes latency
Business needs for data analytics
5
IoT – device operational intelligence and proactive alerts
Gaming industry – real-time leaderboards with game leaders and scores
E-commerce – online recommendation engines and proactive care
Operations – analyze real-time data to respond to dynamic environments and take immediate action
Financial – monitor financial transactions in real time to detect fraudulent activity
Just a few use cases…
6
Collection – captured data from
multiple sources
Streaming - high-throughput data
pipeline systems like Kafka, Kinesis,
Event Hub
Processing – stream processing
platforms that perform a certain
task to produce output
Serving – app for stream processing
output consumption – UI, posts, DB,
report viewers, APIs
High-level real-time streaming architecture
OUR FOCUS
7
Persistence and batch - data is stored in
a persistence layer from which it is
ingested and processed by the batch
layer periodically (may include stream
processing for on-the-fly ETL)
Speed layer - handles the portion of the
data that has not yet been processed by
the batch layer (includes stream
processing and storage)
Serving layer - consolidates both by
merging the output of the batch and the
speed layer
High-level view of Lambda Architecture
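To make the three Lambda layers concrete, here is a minimal sketch in Python. It assumes a simple page-view counting workload; the event shape and the function names (`batch_view`, `speed_view`, `serve`) are hypothetical and chosen only for illustration.

```python
from collections import Counter

def batch_view(persisted_events):
    """Batch layer: periodically recompute totals over all persisted events."""
    return Counter(event["page"] for event in persisted_events)

def speed_view(recent_events):
    """Speed layer: incrementally count events the batch layer has not processed yet."""
    view = Counter()
    for event in recent_events:
        view[event["page"]] += 1
    return view

def serve(batch, speed):
    """Serving layer: merge the batch output and the speed output into one answer."""
    return dict(batch + speed)

# Hypothetical data: three events already covered by the last batch run,
# two newer events that only the speed layer has seen so far.
persisted = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
recent = [{"page": "home"}, {"page": "checkout"}]
merged = serve(batch_view(persisted), speed_view(recent))
print(merged)
```

Once the next batch run has absorbed the recent events, the speed view for that period can be discarded, which is what keeps the speed layer small.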
8
High-level view of Kappa Architecture
Persistence – stores initial raw data for
historical purposes and can be used to
replay computations from initial data
stream
Speed Layer - the basic idea is to avoid
periodically recomputing all data in a
batch layer: all computation happens in
the stream processing system alone, and
recomputation is performed only when the
business logic changes, by replaying
historical data
9
Popular platforms for data streaming
Kinesis
Event Hub
10
Idea of Kafka/EventHub/Kinesis
11
Common terminology
Concept         Kafka           Kinesis                      Event Hub
Producer        Producer        Producer                     Publisher
Consumer        Consumer        Kinesis Stream Applications  Consumer
Stream          Topic           Stream                       Event Hub
Partition       Partition       Shard                        Partition
Index           Offset          Sequence Number              Offset
Consumer Group  Consumer Group  Application                  Consumer Group
12
• Designed to handle very large quantities of small messages
• Horizontally scalable by using partitions and consumer groups
• Reliable and fault-tolerant
• Configurable data replication
• Configurable message TTL (stream level)
• Supports at-least-once delivery
• Logical data organization using partitions
• Separate data view for each consumer by using consumer groups and indexes
• Ability to replay messages
• Messages with the same key are sent to the same partition
• Guarantee of message order in scope of partition
• Integrated with modern stream processing platforms (Stream Analytics,
Storm, Spark, etc.)
Common characteristics
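The "same key, same partition" and "order within a partition" properties above can be sketched in a few lines of Python. This is an illustration only: the SHA-256 based hash is an assumption made for the example and is not the hash function that Kafka, Kinesis, or Event Hub actually use, and `partition_for` and `route` are hypothetical names.

```python
import hashlib

def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition deterministically (illustrative hash, not a platform's real one)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

def route(events, partition_count):
    """Append each (key, payload) event to the partition chosen by its key."""
    partitions = [[] for _ in range(partition_count)]
    for key, payload in events:
        partitions[partition_for(key, partition_count)].append((key, payload))
    return partitions

events = [("device-1", "a"), ("device-2", "b"), ("device-1", "c"), ("device-2", "d")]
partitions = route(events, partition_count=4)

# All events of one key land in one partition, in the order they were sent.
target = partitions[partition_for("device-1", 4)]
print([payload for key, payload in target if key == "device-1"])
```

Because ordering is only guaranteed inside a partition, events that must be processed in sequence (for example, all readings of one device) should share a key.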
13
Let’s take a closer look at Azure Event Hub
[Diagram: event producers publish into an Event Hub inside a namespace, either directly to a partition or through a PartitionKey hash]
• > 1M producers
• > 1 GB/sec aggregate throughput
Throughput Units:
• 1 ≤ TUs ≤ Partition Count
• TU: 1 MB/s writes, 2 MB/s reads
14
Ways to publish - individual event or batch:
• Round Robin
• Partition Id
• Partition Key
Supported Protocols:
• HTTPS – short-lived connections (low throughput)
• AMQP 1.0 – long-lived connections (high throughput)
Publisher Policy - a run-time feature designed to support large numbers of
independent event publishers, each with a unique identifier and its own virtual endpoint:
//[my namespace].servicebus.windows.net/[event hub name]/publishers/[my publisher name]
Event Hub Publishers
15
Event listening - a consumer connects to a partition using the AMQP 1.0 protocol
and listens for incoming events
Consumer Groups - a consumer group is a view (state, position, or offset) of an
entire event hub. Consumer groups enable multiple consuming applications to each
have a separate view of the event stream, and to read the stream independently at
their own pace and with their own offsets
Event Hub Consumers
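The idea of a consumer group as an independent view over the same partition can be modelled with a small in-memory sketch. This is a toy model under stated assumptions, not the Event Hub client API: `PartitionLog` and its methods are hypothetical names, and a real service stores events durably and tracks offsets through checkpoints.

```python
class PartitionLog:
    """Toy append-only partition with one read position per consumer group."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer group name -> next offset to read

    def append(self, event):
        """Store an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, group, max_events=10):
        """Return the next events for this group and advance only this group's offset."""
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's position, for example back to 0 to replay the stream."""
        self._offsets[group] = offset

log = PartitionLog()
for event in ["e0", "e1", "e2"]:
    log.append(event)

print(log.read("dashboard", max_events=2))  # the dashboard group reads at its own pace
print(log.read("archiver"))                 # the archiver group is unaffected by the dashboard
log.seek("dashboard", 0)                    # replay from the beginning
print(log.read("dashboard"))
```

Reading never removes events from the log, which is why several groups can consume the same stream and why replay is possible.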
16
• Security model is based on Shared Access Signature (SAS) tokens
• Shared access policy (key) supports the following claims: Send, Listen, Manage
• Shared access policy (key):
• can be created on namespace or event hub level
• includes Primary and Secondary keys
• Primary and Secondary keys can be revoked
• SAS tokens can be created on namespace, event hub or publisher level
• Granular control over event publishers through publisher policies (the publisher
name must match the partition key, and the SAS token must be issued for the
publisher endpoint)
• Individual event publishers can be revoked when publisher-specific SAS tokens
are used
Event Hub Security
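As a sketch of how a publisher-level SAS token can be produced, the following Python code signs a publisher endpoint URI with HMAC-SHA256 and formats the result as a `SharedAccessSignature` string. The namespace, event hub, publisher, policy name, and key are hypothetical placeholders, and the `now` parameter exists only to make the example reproducible.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def publisher_uri(namespace, event_hub, publisher):
    """Build the virtual publisher endpoint for one publisher."""
    return f"https://{namespace}.servicebus.windows.net/{event_hub}/publishers/{publisher}"

def generate_sas_token(resource_uri, key_name, key, ttl_seconds=3600, now=None):
    """Sign '<url-encoded uri>\\n<expiry>' with the policy key and format the token."""
    issued_at = time.time() if now is None else now
    expiry = int(issued_at + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry}&skn={key_name}"

# Hypothetical names and key, for illustration only.
uri = publisher_uri("my-namespace", "my-hub", "device-42")
token = generate_sas_token(uri, "send-policy", "not-a-real-key", ttl_seconds=3600)
print(token)
```

A token signed for the publisher endpoint is scoped to that one publisher, which is what lets a single misbehaving device be cut off without rotating the shared key.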
17
• Automatic persistence of ingested events from Event Hub in Apache Avro format
• Supported storage targets:
• Azure Storage
• Azure Data Lake
• Configurable size & time windows per partition
Event Hub Capturing
18
Monitoring
• Integrated with Azure Monitor
• Types of diagnostics data: archive logs, operational logs, auto-scale logs, all
metrics
• Diagnostic logs can be sent to: storage account, event hub, Log Analytics
Availability & Disaster Recovery
• 99.9% SLA for operations on Event Hub
• HA is guaranteed by replication and availability sets
• If one partition fails, the other partitions remain available
• No built-in options for disaster recovery of Event Hub between regions (custom
solution: use event capturing with geo-redundant storage and custom code to
populate an Event Hub in another region)
Event Hub Monitoring & Disaster Recovery
19
• Throughput Unit (TU) – the unit of scalability, shared across all event hubs in a
namespace
• TUs are set manually or programmatically
• 1 TU = 1 MB/sec or 1000 events/sec on ingress and 2 MB/sec on egress; max 100
TUs for the Standard tier (contact the support team)
• Dedicated Event Hub: 1 CU = ~200 TU, max 8 CU
• Auto-Inflate scales TUs up automatically, up to a limit you specify
• Partition count: 2-32. The count cannot be changed later and must be specified
during creation (a higher count is available by contacting Microsoft)
• Consumer group count: up to 20 per Event Hub
• Max 5 concurrent readers on a partition per consumer group (one active receiver
per partition per consumer group is recommended)
Event Hub Scalability
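The TU limits above translate into a simple sizing rule: take the largest of the ingress bandwidth, ingress event rate, and egress bandwidth requirements. The sketch below assumes 1 MB = 1000 KB, matching the round numbers used on the slides; the function names are hypothetical.

```python
import math

STANDARD_TIER_MAX_TUS = 100  # limit quoted on the slide

def required_throughput_units(events_per_sec, avg_event_kb, egress_mb_per_sec=0.0):
    """Estimate TUs from the per-TU limits: 1 MB/s or 1000 events/s in, 2 MB/s out."""
    ingress_mb_per_sec = events_per_sec * avg_event_kb / 1000  # assumes 1 MB = 1000 KB
    by_ingress_bandwidth = math.ceil(ingress_mb_per_sec / 1.0)
    by_ingress_events = math.ceil(events_per_sec / 1000)
    by_egress_bandwidth = math.ceil(egress_mb_per_sec / 2.0)
    return max(1, by_ingress_bandwidth, by_ingress_events, by_egress_bandwidth)

def fits_standard_tier(tus):
    """True when the estimate stays within the Standard tier maximum."""
    return tus <= STANDARD_TIER_MAX_TUS

for rate in (1_000, 100_000, 1_000_000):
    tus = required_throughput_units(rate, avg_event_kb=1)
    print(rate, "msg/sec ->", tus, "TU, fits Standard tier:", fits_standard_tier(tus))
```

Note that many small events can be bound by the event-rate limit rather than by bandwidth: 5000 events/sec of 0.1 KB each is only 0.5 MB/sec but still needs 5 TUs.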
20
• Single tenant hosting with no noise from other tenants, available to
customers with an enterprise agreement
• Repeatable performance every time
• No additional charge for incoming messages
• Message size increases to 1 MB as compared to 256 KB for Standard and Basic
• Scalable between 1 and 8 capacity units (CU) – providing up to 2 million
ingress events per second
• CUs manage the scale for Event Hubs Dedicated, 1 CU = ~200 TU, max 8 CU
• Zero maintenance: management of load balancing, OS updates, security
patches, and partitioning
• Fixed pricing: ~$720 per day for 1 CU (pricing & CU size will change
starting from October 2017: ¼ CU for $5,000 per month)
Dedicated Event Hub
21
• Max 10 Event Hubs per namespace
• Throughput per partition is limited to 1 TU
• Number of AMQP connections per namespace: 5000
• Azure deployment only, no Azure Stack support yet
• No SAS on consumer group level, no built-in encryption or compression of
event body
• No functionality to drain an Event Hub (need to create a custom drainer)
• No local emulator
Other Event Hub constraints not covered so far
22
!!! COSTS COVER INCOMING TRAFFIC ONLY, WITHOUT STORAGE COSTS
1,000 msg/sec, 1 KB in size, 1 MB/sec, 24/7 = 1 TU, Standard pricing tier:
1 * $22.32 + (1,000 * 60 sec * 60 min * 24 hrs * 31 d) / 1,000,000 * $0.028 = $22.32 + $75 ≈ $97.3 per month - GOOD
100,000 msg/sec, 1 KB in size, 100 MB/sec, 24/7 = 100 TU (MAX), Standard pricing tier:
100 * $22.32 + (100,000 * 60 sec * 60 min * 24 hrs * 31 d) / 1,000,000 * $0.028 = $2,232 + $7,500 ≈ $9,730 per month - GOOD
1,000,000 msg/sec, 1 KB in size, 1 GB/sec, 24/7 = 4 CU, Dedicated pricing tier:
4 * $720 * 31 d = $89,280 per month - TOO MUCH!
Event Hub pricing for ingestion
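The arithmetic behind the three scenarios can be reproduced with a short Python sketch. The prices are the ones quoted on the slide (per TU per 31-day month, per million ingress events, per CU per day) and are assumptions about pricing at the time of the talk, not current list prices; the function names are hypothetical.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 31   # the slide assumes a 31-day month, 24/7 load
TU_PRICE_PER_MONTH = 22.32              # Standard tier, per throughput unit (from the slide)
PRICE_PER_MILLION_EVENTS = 0.028        # Standard tier, ingress events (from the slide)
CU_PRICE_PER_DAY = 720.0                # Dedicated tier, per capacity unit (from the slide)

def standard_monthly_cost(events_per_sec, throughput_units):
    """TU charge plus the per-million ingress event charge for one month."""
    millions_of_events = events_per_sec * SECONDS_PER_MONTH / 1_000_000
    return throughput_units * TU_PRICE_PER_MONTH + millions_of_events * PRICE_PER_MILLION_EVENTS

def dedicated_monthly_cost(capacity_units, days=31):
    """Dedicated tier: fixed price per CU per day, no per-event charge."""
    return capacity_units * CU_PRICE_PER_DAY * days

print(round(standard_monthly_cost(1_000, 1), 2))      # 1 TU scenario
print(round(standard_monthly_cost(100_000, 100), 2))  # 100 TU scenario
print(round(dedicated_monthly_cost(4), 2))            # 4 CU scenario
```

In the Standard tier the per-event charge dominates: at 100 TUs it is roughly three times the TU charge itself.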
23
• Azure Event Hub can handle mid-load scenarios (100,000 msg/sec or
100 MB/sec) in a cost-effective manner and provides good feature parity
• For high-load scenarios (1,000,000+ msg/sec or 1+ GB/sec) or big-data
scenarios it seems too expensive (an Apache Kafka cluster is cheaper but
requires investment in tuning & maintenance)
• Always consider the quality attribute requirements for your system before
moving forward with technology decisions. PaaS is not always the right choice
for high-load scenarios
Summary
24
Q & A
THANK YOU!

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

High throughput data streaming in Azure

  • 1. High-throughput data streaming in Azure Alexander Laysha Solution Architect at EPAM Systems & Microsoft Azure MVP
  • 2. A few words about myself… I’m Alexander Laysha • Solution Architect at EPAM Systems & Microsoft Azure MVP • Focused on backend, high-load and cloud solutions • Leader of the Belarus Azure Community • Speaker at local and external meetups and conferences My contacts • Email: layshaalex@gmail.com • Twitter: @layshaalexander • Facebook: alexander.laysha
  • 3. Which topics will we cover? • Business needs for real-time analytics • Use cases & architecture approaches • Basics of real-time data streaming platforms • Azure Event Hub capabilities & constraints • Pricing calculations for multiple data ingestion scenarios based on Event Hub • Summary
  • 4. Business needs for data analytics. The past: • Capture data for later analysis • Reports and analytics with X days of latency. Today: • Dealing with tons of data • Offline reporting and analysis is no longer enough (but still important) • Businesses want immediate insights from captured data, with X seconds/minutes of latency
  • 5. Just a few use cases… • IoT – device operational intelligence and proactive alerts • Gaming industry – real-time board with game leaders and scores • E-commerce – online recommendation engines and proactive care • Operations – analyze real-time data to respond to dynamic environments and take immediate action • Financial – monitor financial transactions in real time to detect fraudulent activity
  • 6. High-level real-time streaming architecture • Collection – data captured from multiple sources • Streaming (our focus) – high-throughput data pipeline systems like Kafka, Kinesis, Event Hub • Processing – stream processing platforms that perform a certain task to produce output • Serving – apps that consume the stream processing output: UI, posts, DB, report viewers, APIs
  • 7. High-level view of Lambda Architecture • Persistence and batch – data is stored in a persistence layer, from which it is periodically ingested and processed by the batch layer (may include stream processing for on-the-fly ETL) • Speed layer – handles the portion of the data that has not yet been processed by the batch layer (includes stream processing and storage) • Serving layer – consolidates both by merging the output of the batch layer and the speed layer
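The serving-layer merge described on slide 7 can be sketched roughly as follows. This is a minimal illustration, assuming both layers expose simple per-key counters; the function name `merge_views` and the sample data are hypothetical and do not come from the slides.

```python
from typing import Dict


def merge_views(batch_view: Dict[str, int], speed_view: Dict[str, int]) -> Dict[str, int]:
    """Combine the batch layer's precomputed counts with the speed layer's recent counts."""
    merged = dict(batch_view)  # copy so the batch view itself is not mutated
    for key, value in speed_view.items():
        # Keys seen only by the speed layer start from zero.
        merged[key] = merged.get(key, 0) + value
    return merged


if __name__ == "__main__":
    batch = {"player-1": 120, "player-2": 95}  # computed by the last batch run
    speed = {"player-2": 7, "player-3": 3}     # events not yet covered by the batch layer
    print(merge_views(batch, speed))
```

In a Kappa-style setup (slide 8) there would be no batch view to merge; the stream processor alone would maintain the result and rebuild it by replaying the persisted stream.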
  • 8. High-level view of Kappa Architecture • Persistence – stores the initial raw data for historical purposes; it can be used to replay computations from the initial data stream • Speed layer – the basic idea is not to periodically recompute all data in a batch layer, but to do all computation in the stream processing system alone and to recompute only when the business logic changes, by replaying historical data
  • 9. Popular platforms for data streaming: Kinesis, Event Hub
  • 11. Common terminology (common term: Kafka / Kinesis / Event Hub) • Producer: Producer / Producer / Publisher • Consumer: Consumer / Kinesis Stream Applications / Consumer • Stream: Topic / Stream / Event Hub • Partition: Partition / Shard / Partition • Index: Offset / Sequence Number / Offset • Consumer Group: Consumer Group / Application / Consumer Group
  • 12. Common characteristics • Designed to handle very large quantities of small messages • Horizontally scalable by using partitions and consumer groups • Reliable and fault-tolerant • Configurable data replication • Configurable message TTL (stream level) • Supports at-least-once delivery • Logical data organization using partitions • Separate data view for each consumer by using consumer groups and indexes • Ability to replay messages • Messages with the same key are sent to the same partition • Message order is guaranteed within a partition • Integrated with modern stream processing platforms (Stream Analytics, Storm, Spark, etc.)
  • 13. Let’s take a closer look at Azure Event Hub • Event producers: > 1M producers, > 1 GB/sec aggregate throughput • Publishing to partitions: direct or by PartitionKey hash • Throughput Units: 1 ≤ TUs ≤ partition count; 1 TU = 1 MB/s writes, 2 MB/s reads • Namespace
  • 14. Event Hub publishers (event producers) • Ways to publish an individual event or a batch: round robin, partition ID, partition key • Supported protocols: HTTPS – short-lived (low throughput); AMQP 1.0 – long-lived (high throughput) • Publisher policy – a run-time feature designed to facilitate large numbers of independent event publishers by using a unique identifier and a virtual endpoint: //[my namespace].servicebus.windows.net/[event hub name]/publishers/[my publisher name]
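The three publishing modes on slide 14 can be illustrated with a small routing sketch. It is a simplified model, not the Event Hub implementation: the class name `PartitionRouter` is hypothetical, and the SHA-256-modulo hash is an assumption standing in for the service's internal partition key hashing.

```python
import hashlib
from itertools import count
from typing import Optional


class PartitionRouter:
    """Toy model of round-robin, partition-ID and partition-key routing."""

    def __init__(self, partition_count: int) -> None:
        if partition_count <= 0:
            raise ValueError("partition_count must be positive")
        self.partition_count = partition_count
        self._counter = count()  # drives round-robin assignment

    def route(self, partition_id: Optional[int] = None,
              partition_key: Optional[str] = None) -> int:
        if partition_id is not None and partition_key is not None:
            raise ValueError("specify partition_id or partition_key, not both")
        if partition_id is not None:
            # Direct addressing: the publisher picks the partition itself.
            if not 0 <= partition_id < self.partition_count:
                raise ValueError("partition_id out of range")
            return partition_id
        if partition_key is not None:
            # Illustrative hash: the same key always maps to the same partition.
            digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
            return int.from_bytes(digest[:8], "big") % self.partition_count
        # No hint given: spread events evenly across partitions.
        return next(self._counter) % self.partition_count


if __name__ == "__main__":
    router = PartitionRouter(4)
    print([router.route() for _ in range(5)])    # round robin
    print(router.route(partition_id=2))          # direct
    print(router.route(partition_key="dev-42"))  # key hash, stable per key
```

Key-based routing is what preserves per-key ordering, since ordering is only guaranteed within a single partition.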
  • 15. Event Hub consumers • Event listening – a consumer connects to a partition over the AMQP 1.0 protocol and listens for incoming events • Consumer groups – a consumer group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple consuming applications to each have a separate view of the event stream and to read the stream independently, at their own pace and with their own offsets
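The idea of consumer groups as independent views of the same stream can be sketched with an in-memory model. This is a simplification for illustration only: the class `PartitionLog` is hypothetical, and keeping the offsets inside the log object is an assumption made to keep the example short.

```python
from typing import Dict, List


class PartitionLog:
    """Append-only log for one partition with a separate read offset per consumer group."""

    def __init__(self) -> None:
        self._events: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: str) -> int:
        """Store an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, consumer_group: str, max_events: int = 10) -> List[str]:
        """Return the next events for this group and advance only that group's offset."""
        start = self._offsets.get(consumer_group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer_group] = start + len(batch)
        return batch

    def rewind(self, consumer_group: str, offset: int = 0) -> None:
        """Move a group's offset back so that earlier events can be replayed."""
        if not 0 <= offset <= len(self._events):
            raise ValueError("offset out of range")
        self._offsets[consumer_group] = offset


if __name__ == "__main__":
    log = PartitionLog()
    for event in ["e1", "e2", "e3"]:
        log.append(event)
    print(log.read("dashboard", max_events=2))  # the dashboard reads at its own pace
    print(log.read("archiver"))                 # the archiver is unaffected by the dashboard
    log.rewind("dashboard")
    print(log.read("dashboard"))                # replay from the beginning
```

Reading never removes events from the log, which is what makes replay and multiple independent readers possible.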
  • 16. Event Hub security • The security model is based on Shared Access Signature (SAS) tokens • A shared access policy (key) supports the following claims: Send, Listen, Manage • A shared access policy (key) can be created at the namespace or event hub level and includes a primary and a secondary key; both keys can be revoked • SAS tokens can be created at the namespace, event hub or publisher level • Granular control over event publishers through publisher policies (the publisher name should be the same as the partition key, and the SAS token should be issued for the publisher endpoint) • Individual event publishers can be revoked when publisher-specific SAS tokens are used
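A SAS token of the kind mentioned on slide 16 can be sketched with the standard library only. The sketch follows the commonly documented scheme (HMAC-SHA256 over the URL-encoded resource URI and the expiry time); the namespace, hub, publisher and policy names are hypothetical placeholders, and the `https` scheme is an assumption added to the endpoint pattern from slide 14.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def publisher_endpoint(namespace: str, event_hub: str, publisher: str) -> str:
    """Build the per-publisher endpoint following the pattern shown on slide 14."""
    return "https://{}.servicebus.windows.net/{}/publishers/{}".format(
        namespace, event_hub, urllib.parse.quote(publisher, safe=""))


def generate_sas_token(resource_uri: str, key_name: str, key: str,
                       ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Sign '<url-encoded uri>\\n<expiry>' with the policy key and format the token."""
    expiry = int((time.time() if now is None else now) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = "{}\n{}".format(encoded_uri, expiry).encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode("ascii"), safe="")
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        encoded_uri, signature, expiry, key_name)


if __name__ == "__main__":
    # All names and the key below are made-up placeholders.
    uri = publisher_endpoint("mynamespace", "telemetry", "device-42")
    print(generate_sas_token(uri, "send-policy", "not-a-real-key"))
```

Issuing the token for the publisher endpoint rather than for the whole event hub is what allows a single publisher to be revoked without rotating the shared key.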
  • 17. Event Hub capturing • Automatic persistence of ingested events from Event Hub in Apache Avro format • Supported storage: Azure Storage, Azure Data Lake • Configurable size & time windows per partition
  • 18. Event Hub monitoring & disaster recovery. Monitoring: • Integrated with Azure Monitor • Types of diagnostics data: archive logs, operational logs, auto-scale logs, all metrics • Diagnostic logs can be sent to a storage account, an event hub, or Log Analytics. Availability & disaster recovery: • 99.9% SLA for operations on Event Hub • HA is provided by replication and availability sets • If one partition fails, the other partitions remain available • No built-in options for disaster recovery of Event Hub between regions (custom solution: use event capturing with geo-redundant storage and custom code to populate an Event Hub in another region)
  • 19. Event Hub scalability • Throughput Unit (TU) – the unit of scalability, shared across all event hubs in a namespace • TUs are set manually or programmatically • 1 TU = 1 MB/sec or 1,000 events/sec on ingress, 2 MB/sec on egress; max 100 TU for the Standard tier (contact the support team) • Dedicated Event Hub: 1 CU = ~200 TU, max 8 CU • Enable Auto-Inflate to scale TUs up automatically, with the ability to specify limits • Partition count: 2-32. The count cannot be changed and must be specified during creation (it can be increased by contacting Microsoft) • Consumer group count: up to 20 per Event Hub • Max 5 concurrent readers on a partition per consumer group (one active receiver on a partition per consumer group is recommended)
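The ingress limits on slide 19 translate into a simple sizing rule: the required number of TUs is driven by whichever limit is hit first, bandwidth or event rate. The sketch below uses only the figures from the slide; the function names are hypothetical, and treating 1 MB as 1,000 KB follows the rounding used in the slides' own pricing scenarios.

```python
import math

INGRESS_MB_PER_TU = 1.0       # 1 TU = 1 MB/sec on ingress (slide 19)
INGRESS_EVENTS_PER_TU = 1000  # or 1,000 events/sec on ingress (slide 19)
MAX_STANDARD_TUS = 100        # Standard tier ceiling quoted on slide 19


def required_throughput_units(events_per_sec: float, event_size_kb: float) -> int:
    """Return the TUs needed for ingress, based on the stricter of the two limits."""
    mb_per_sec = events_per_sec * event_size_kb / 1000  # assumption: 1 MB = 1,000 KB
    by_bandwidth = math.ceil(mb_per_sec / INGRESS_MB_PER_TU)
    by_event_rate = math.ceil(events_per_sec / INGRESS_EVENTS_PER_TU)
    return max(1, by_bandwidth, by_event_rate)


def fits_standard_tier(events_per_sec: float, event_size_kb: float) -> bool:
    """Check whether the load stays within the Standard tier maximum from the slide."""
    return required_throughput_units(events_per_sec, event_size_kb) <= MAX_STANDARD_TUS


if __name__ == "__main__":
    print(required_throughput_units(1_000, 1))    # small scenario
    print(required_throughput_units(100_000, 1))  # mid-load scenario
    print(fits_standard_tier(1_000_000, 1))       # high-load scenario
```

Many small events can exhaust the event-rate limit long before the bandwidth limit, which is why both checks are needed.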
  • 20. Dedicated Event Hub • Single-tenant hosting with no noise from other tenants, available to customers with an enterprise agreement • Repeatable performance every time • No additional charge for incoming messages • Message size increases to 1 MB, compared to 256 KB for Standard and Basic • Scalable between 1 and 8 capacity units (CU), providing up to 2 million ingress events per second • CUs manage the scale for Event Hubs Dedicated: 1 CU = ~200 TU, max 8 CU • Zero maintenance: load balancing, OS updates, security patches, and partitioning are managed for you • Fixed monthly pricing: ~$720 per day for 1 CU (pricing & CU size will change starting from October 2017: ¼ CU for $5,000 per month)
  • 21. Other Event Hub constraints not covered so far • Max 10 Event Hubs per namespace • Throughput per partition is limited to 1 TU • Number of AMQP connections per namespace: 5,000 • Azure-only deployment, no Azure Stack support yet • No SAS at the consumer group level, no built-in encryption or compression of the event body • No functionality to drain an Event Hub (a custom drainer is needed) • No local emulator
  • 22. Event Hub pricing for ingestion (costs for incoming traffic only, without storage costs) • 1,000 msg/sec, 1 KB each, 1 MB/sec, 24/7 = 1 TU, Standard tier: 1 × $22.32 + (1,000 × 60 sec × 60 min × 24 hrs × 31 d) / 1,000,000 × $0.028 ≈ $22.32 + $75 ≈ $97.3 per month – GOOD • 100,000 msg/sec, 1 KB each, 100 MB/sec, 24/7 = 100 TU (max), Standard tier: 100 × $22.32 + (100,000 × 60 sec × 60 min × 24 hrs × 31 d) / 1,000,000 × $0.028 ≈ $2,232 + $7,500 ≈ $9,730 per month – GOOD • 1,000,000 msg/sec, 1 KB each, 1 GB/sec, 24/7 = 4 CU, Dedicated tier: 4 × $720 × 31 d = $89,280 per month – TOO MUCH!
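The arithmetic on slide 22 can be reproduced with a short script. The prices are the ones quoted in the slides ($22.32 per TU per month, $0.028 per million ingress events, ~$720 per CU per day) and are treated here as fixed assumptions; the function names are hypothetical.

```python
TU_MONTHLY_USD = 22.32           # per Throughput Unit per month (slide 22)
INGRESS_USD_PER_MILLION = 0.028  # per million ingress events (slide 22)
CU_DAILY_USD = 720.0             # per Capacity Unit per day (slides 20 and 22)
SECONDS_PER_DAY = 60 * 60 * 24


def standard_monthly_cost(throughput_units: int, msgs_per_sec: float, days: int = 31) -> float:
    """Standard tier: TU charge plus a per-million charge for ingested events."""
    messages = msgs_per_sec * SECONDS_PER_DAY * days
    return throughput_units * TU_MONTHLY_USD + messages / 1_000_000 * INGRESS_USD_PER_MILLION


def dedicated_monthly_cost(capacity_units: int, days: int = 31) -> float:
    """Dedicated tier: flat daily price per CU, no per-message charge."""
    return capacity_units * CU_DAILY_USD * days


if __name__ == "__main__":
    print(round(standard_monthly_cost(1, 1_000), 2))      # 1,000 msg/sec on 1 TU
    print(round(standard_monthly_cost(100, 100_000), 2))  # 100,000 msg/sec on 100 TU
    print(round(dedicated_monthly_cost(4), 2))            # 1,000,000 msg/sec on 4 CU
```

Without rounding the intermediate terms, the three scenarios come to about $97.32, $9,731.52 and $89,280 per month, which matches the rounded figures on the slide.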
  • 23. Summary • Azure Event Hub can handle mid-load scenarios (100,000 msg/sec or 100 MB/sec) in a cost-effective manner and provides good feature parity • For high-load scenarios (1,000,000+ msg/sec or 1+ GB/sec) or big-data scenarios it seems too expensive (an Apache Kafka cluster is cheaper but requires investment in tuning and maintenance) • Always consider the quality attribute requirements of your system before moving forward with technology decisions. PaaS is not always the right choice for high-load scenarios

Editor's Notes

  1. https://www.slideshare.net/hadooparchbook/streaming-architecture-patterns
  2. http://container-solutions.com/introduction-stream-processing-systems-kafka-aws-kinesis-azure-event-hubs/
  3. https://blogs.msdn.microsoft.com/opensourcemsft/2015/08/08/choosing-between-azure-event-hub-and-kafka-what-you-need-to-know/ http://container-solutions.com/introduction-stream-processing-systems-kafka-aws-kinesis-azure-event-hubs/