Azure Event Hubs – Behind the scenes
Stream data using open standards and Apache Kafka®
Kasun Indrasiri
Sr. Product Manager
Azure Messaging @Microsoft
Business cases for Event Streaming
• High volume of events produced continuously from a
wide array of sources at a rapid rate.
• Web clickstream
• Anomaly and fraud detection
• Application logs
• IoT sensor data
• Real-time ETL
• Change data capture
• Respond faster to customer needs.
Event Streaming Platform
• Ingest, store, enrich and analyze millions of events in such event
streams.
• Trigger
• Event Data source
• Capture and publish data to stream ingestion layer.
• Ingest and store
• Ingest and store event streams
• Wide array of APIs and input sources
• Delivery semantics – at-least-once, consumer rewind and replay.
• Scale and distribute events into storage.
• Process/Analyze
• Consumes from event ingestion and storage layer.
• Stream Processing: Ability to react in real-time, filtering, aggregating and
prepping for analytics.
• Model and serve
• Serving queries
Trigger → Ingest and store → Process/Analyze → Model and serve
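The delivery semantics listed for the ingest-and-store stage (at-least-once, consumer rewind and replay) can be illustrated with a small in-memory model. This is only a sketch: the `EventLog` class and its method names are hypothetical and are not part of any Event Hubs SDK. It shows why committing an offset after processing leads to redelivery, and how moving the offset back replays retained events.

```python
from collections import defaultdict


class EventLog:
    """Toy append-only event stream with per-consumer offsets (hypothetical)."""

    def __init__(self):
        self._events = []                  # retained events, in arrival order
        self._offsets = defaultdict(int)   # consumer name -> next offset to read

    def publish(self, event):
        self._events.append(event)
        return len(self._events) - 1       # offset assigned to the event

    def poll(self, consumer, max_events=10):
        """Return events from the committed offset without advancing it."""
        start = self._offsets[consumer]
        return self._events[start:start + max_events]

    def commit(self, consumer, count):
        """Advance the offset only after the consumer finished processing."""
        self._offsets[consumer] += count

    def rewind(self, consumer, offset=0):
        """Move the consumer back so that retained events are delivered again."""
        self._offsets[consumer] = offset


log = EventLog()
for reading in ("t=20.1", "t=20.4", "t=35.9"):
    log.publish(reading)

batch = log.poll("dashboard")
# A crash before commit() means the same batch is polled again: at-least-once.
redelivered = log.poll("dashboard")
log.commit("dashboard", len(redelivered))
log.rewind("dashboard", offset=1)          # replay from the second event
replayed = log.poll("dashboard")
```

Because the offset moves only on `commit`, a consumer may see an event more than once but never skips one, which is why at-least-once consumers are usually written to be idempotent.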
What is Azure Event Hubs?
• Platform-as-a-Service real-time event service
• Multi-protocol (AMQP, Kafka, HTTPS) low-latency
event streaming.
• Seamlessly run Apache Kafka® workloads with far
lower cost and better performance.
• Fully managed: You use the features, Azure deals
with everything else
• Polyglot Azure SDK and cross-platform client
support
• Industry-leading reliability and availability
• Best-in-class performance.
Coordinate ownership of partitions
across multiple receivers
Clients can use any
native protocol.
Partitions are like lanes
on a freeway. More
lanes, more throughput.
Entity/Topic
AMQP 1.0
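The freeway analogy can be made concrete with a sketch of how events that share a partition key end up in the same lane. The hash used here (SHA-256) is a stand-in chosen for the example; the slides do not say which hash function Event Hubs uses, and the helper names are hypothetical.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition index with a stable hash.

    Assumption: SHA-256 is illustrative only, not the service's hash.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def assign(keys, partition_count):
    """Group keys by the partition (lane) they hash to."""
    lanes = {p: [] for p in range(partition_count)}
    for key in keys:
        lanes[partition_for(key, partition_count)].append(key)
    return lanes


devices = ["device-1", "device-2", "device-3", "device-4", "device-5"]
lanes = assign(devices, partition_count=4)
```

Events with the same key always land in the same partition, so per-key ordering is preserved while different keys spread across the available lanes.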
What is Event Hubs for Kafka?
• Event Hubs does not run/host Kafka.
• Implements a Kafka protocol head.
• Single broker supports AMQP and Kafka.
• Provides versioning and compatibility
• Supports Kafka version 1.0 and above
• No code changes to existing applications.
• Single stable endpoint.
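In practice, "no code changes" means an existing Kafka application is repointed through configuration alone. The sketch below builds the client settings as a plain dictionary, using the property names of librdkafka-style clients; Java clients express the same credentials through `sasl.jaas.config` instead. The function name is hypothetical and the connection string is a placeholder.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings that point at an Event Hubs namespace."""
    return {
        # One endpoint for the whole namespace; the Kafka listener uses port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The literal user name "$ConnectionString" signals connection-string auth.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


# Placeholder values for illustration only.
config = event_hubs_kafka_config(
    "contoso",
    "Endpoint=sb://contoso.servicebus.windows.net/;"
    "SharedAccessKeyName=<name>;SharedAccessKey=<key>",
)
```

There is a single bootstrap address and no per-broker endpoints, which matches the "single stable endpoint" point above.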
Why choose Event Hubs for Kafka?
Cost Efficient
• Far lower cost compared to running Kafka on-prem or using managed Kafka services
• No initial/recurring licensing fees
• Fully managed, no hardware.
• Availability Zones at no additional cost.
Better Performance and Reliability
• End-to-end latency of < 10 ms
• Minimal latency jitter
• Ability to choose the SKU based on performance needs.
• Triple replicated, AZs and 99.99% availability SLA (Premium/Dedicated)
Simplify Kafka
• Just a quick network hop away from your existing workloads
• Zero code setup, seamless migration
• Single stable endpoint (no broker endpoints)
• Fully managed
• Easy to scale (1 MBps to >5 GBps)
Azure Essentials by default
• Security, Compliance and Availability
• AAD/OAuth
• VNet/BYOK/IP filtering
• Zone Redundancy/Geo-DR
Multi-protocol Event Streaming
• Event Hubs supports multiple native event streaming protocols such as AMQP and Kafka
• Kafka is an RPC protocol, but AMQP can be more suitable for data streaming
• AMQP can offer
• Better performance for certain streaming workloads.
• Better idle connection handling with heartbeats.
• Less resource utilization (avoiding redundant fetch calls)
• Ability to mix and match different protocols.
• Downstream Azure services use AMQP to stream data to and from Event Hubs.
[Diagram: Kafka, AMQP and HTTPS clients connected to Azure Event Hubs]
Azure Event Hubs Hosting Models
Standard
• Ingress: 1 MB/s – 40 MB/s
• Multi-tenant
• Reserved bandwidth + pay as you go.
• Throttled beyond reserved capacity
• Charged for capture and ingress events
• Throughput Units (TU)
• SLA: 99.95%
Premium
• Ingress: 10 MB/s (1 PU) – 160 MB/s (16 PU)
• Multi-tenant, minimal cross-tenant interference.
• Reserved compute and memory capacity.
• Low latency with predictability.
• No throttling limits on data ingress/egress, extended limits and quotas
• Capture and ingress events are included, Premium features.
• Processing Units (PU)
• SLA: 99.99%
Dedicated
• Ingress: 50 MB/s to GB/s
• Single tenant (you own it)
• Reserved compute and memory capacity.
• Low latency with predictability.
• No throttling limits on data ingress/egress, extended limits and quotas
• Capture and ingress events are included, Premium features.
• Capacity Units (CU)
• SLA: 99.99%
Azure Stack Hub
• Owner operated
• Same limits as Dedicated
• Connected and disconnected hosting
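The ingress figures for the hosting models can be turned into a rough first-pass sizing helper. This sketch rests on two assumptions that are not stated outright on the slide: that Standard scales at 1 MB/s of ingress per TU up to 40 TU, and that Premium scales linearly at about 10 MB/s per PU in sizes of 1, 2, 4, 8 and 16. The function name is hypothetical, and real sizing should be validated against measured workloads.

```python
import math

STANDARD_MAX_TU = 40                 # slide: 1 MB/s to 40 MB/s ingress
PREMIUM_PU_SIZES = (1, 2, 4, 8, 16)
PREMIUM_MB_PER_PU = 10               # assumption: linear reading of 10 MB/s (1 PU) to 160 MB/s (16 PU)


def suggest_capacity(ingress_mb_per_s: float) -> dict:
    """Return the smallest Standard and Premium allocation covering the ingress rate."""
    if ingress_mb_per_s <= 0:
        raise ValueError("ingress rate must be positive")
    tus = math.ceil(ingress_mb_per_s)    # assumed 1 TU per 1 MB/s of ingress
    pus_needed = math.ceil(ingress_mb_per_s / PREMIUM_MB_PER_PU)
    pus = next((size for size in PREMIUM_PU_SIZES if size >= pus_needed), None)
    return {
        "standard_tu": tus if tus <= STANDARD_MAX_TU else None,
        "premium_pu": pus,
        "needs_dedicated": pus is None,
    }


small = suggest_capacity(25)
large = suggest_capacity(120)
huge = suggest_capacity(500)
```

A value of `None` means the tier cannot cover the requested rate under these assumptions; when both are `None`, the helper points to Dedicated.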
Event Hubs is fast!
• End-to-end latency: the time for a message to traverse the event streaming
engine from the producer, through the system, to the consumer.
• Event Hubs Premium – end-to-end latency is < 10 ms for both Kafka
and AMQP workloads.
• Predictable low latency for high volume workloads.
• Faster than native Kafka brokers and managed Kafka offerings.
• More details at: Benchmarking Azure Event Hubs Premium for Kafka
and AMQP workloads - Microsoft Tech Community
[Chart: Event Hubs E2E Latency (4 PU, 1 MB/s - 10 MB/s). X-axis: Number of Partitions (0 to 120). Y-axis: E2E Latency (ms) (0 to 12). Series: Event Hubs - Native, Event Hubs - Kafka]
High Availability
• Replicas/Fault Domain Placement
• Events are replicated across the cluster while maintaining low end-to-end
latency.
• Every topic partition is replicated three times
• One replica is designated as the primary/leader.
• Cluster VMs are spread across at least 3 fault domains such that the loss of a
rack or network poses no availability risk. Recovery from a fault domain
failure is fully automated and the system maintains its SLA.
• Availability Zones (AZs)
• Each cluster spans three availability zones and maintains its SLA, with no
tolerance for data loss, when one or two zones fail.
• Data is replicated to more than one of the AZ instances before the producer is
acknowledged.
• Azure AZ support in Event Hubs is offered at no additional cost.
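The acknowledgement rule above (data reaches more than one AZ instance before the producer is acknowledged) can be sketched as a toy replicated write. The `Replica` class and `replicated_write` function are hypothetical and greatly simplified: there is no leader election, no rollback of partial writes and no real persistence.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    zone: str
    healthy: bool = True
    log: list = field(default_factory=list)

    def append(self, event) -> bool:
        """Persist the event; an unhealthy replica cannot confirm the write."""
        if not self.healthy:
            return False
        self.log.append(event)
        return True


def replicated_write(replicas, event, required_acks=2) -> bool:
    """Acknowledge the producer only once enough replicas confirmed the write."""
    acks = sum(1 for replica in replicas if replica.append(event))
    return acks >= required_acks


replicas = [Replica("AZ1"), Replica("AZ2"), Replica("AZ3")]
ok_all_zones = replicated_write(replicas, "e1")
replicas[2].healthy = False               # one zone fails
ok_two_zones = replicated_write(replicas, "e2")
replicas[1].healthy = False               # a second zone fails
ok_one_zone = replicated_write(replicas, "e3")
```

With `required_acks=2`, a write is confirmed while at least two replicas respond and refused when only one remains, so an acknowledged event never exists in a single zone only.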
Event Hubs under the hood
Logical Architecture
3-tier architecture: Networking, Messaging, Storage
Gateway
• Connection Management
• IP Filtering, VNET/PEP
• Transport Level Security (TLS)
• Authorization Handling
• Entity Management
• HTTPS / WebSocket Protocol
• AMQP 1.0 Protocol
• Apache Kafka Protocol
Backend (Broker)
• Partition Placement
• Volatile State Replication
• Sequencing & Timestamping
• Journal Cache
• Indexing
• Pending request tracking
• Checkpoint handling
• At-Rest Encryption (CMK)
• EH Capture
Storage
• Binary Log Data Store
• Index Store
• At-Rest Data Replication
[Diagram labels: Azure Resource Management API and Azure Portal over HTTPS (RPC); Azure Active Directory; client protocols HTTPS / WebSockets, AMQP 1.0 and Apache Kafka RPC; AMQP; callout "Premium Units (PU) buy isolated capacity at this layer"]
Backend and Gateway Clusters
Logical Architecture meets Placement
• Three availability zones (Availability Zone 1, 2 and 3), each with Fault Domain 1, Fault Domain 2 and Fault Domain 3.
• Azure Virtual Machine Scale Sets (VMSS) – VM Placement/Deployment & Networking (SLB) for VM1 through VM18.
• Azure Service Fabric (SF) – Leader election (who owns what?), leader lookup, process placement and activation of the process containers.
Gateway and Backend
Workload placement
• Gateway: Service Fabric Placement – Stateless Gateway Processes, on VM1, VM3, VM5, VM7, VM9, VM11, VM13, VM15, VM17.
• Backend: Service Fabric Placement – Stateful Backend Processes, on VM2, VM4, VM6, VM8, VM10, VM12, VM14, VM16, VM18.
• The gateway looks up the backend owner of an entity from SF for routing.
• Entities are managed by the gateway layer.
• AuthZ is delegated to AAD.
• Apache Kafka clients only see one broker that owns all partitions. Partition ownership is abstracted.
[Diagram labels: Azure Resource Management API, Azure Portal, Azure Active Directory; client protocols HTTPS / WebSockets, AMQP 1.0 and Apache Kafka; AMQP; partitions (P)]
Backend: Event Hubs Premium
Workload placement
• Backend: Service Fabric Placement – Stateful Backend Processes, on VM2, VM4, VM6, VM8, VM10, VM12, VM14, VM16, VM18.
• Designated secondaries in different fault domains and AZs are reserved for failover.
• Namespace PUs are split across processes:
1 PU = 2 Proc (8GB Mem), 1 Core/Proc (2C)
2 PU = 2 Proc, 1 Core/Proc + 1 Core (3C)
4 PU = 2 Proc, 2 Core/Proc + 1 Core (5C)
8 PU = 4 Proc, 2 Core/Proc + 1 Core (9C)
16 PU = 4 Proc, 4 Core/Proc + 1 Core (17C)
• Cores are exclusively mapped to a broker process.
• >=2 PU: 1 extra core for utility tasks.
• Partition ownership is dynamically mapped to the process(es) associated with a namespace via SF.
• Isolated VMs / L88is
[Diagram: Host 1 and Host 2, each labeled VM6 (L-Series) with CPU cores 1 and 2 and an 8GB memory allocation, running Namespace Broker Process 1 (4PU) and Namespace Broker Process 2 (4PU); each process owns eight partitions (P)]
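The PU-to-core figures above follow a simple rule: processes times cores per process, plus one utility core from 2 PU upward. The sketch below copies the per-size layout from the slide and recomputes the totals; the names `PU_LAYOUT` and `total_cores` are hypothetical.

```python
# PU size -> (broker processes, cores per process), copied from the slide.
PU_LAYOUT = {
    1: (2, 1),
    2: (2, 1),
    4: (2, 2),
    8: (4, 2),
    16: (4, 4),
}


def total_cores(pu: int) -> int:
    """Cores reserved for a namespace: process cores plus one utility core from 2 PU up."""
    processes, cores_per_process = PU_LAYOUT[pu]
    utility_core = 1 if pu >= 2 else 0
    return processes * cores_per_process + utility_core


cores_by_pu = {pu: total_cores(pu) for pu in PU_LAYOUT}
```

The computed totals (2, 3, 5, 9 and 17 cores) match the values in parentheses on the slide.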
Storage Layer – Event Hubs Premium
• Backend: Service Fabric Placement – Stateful Backend Processes, on VM2, VM4, VM6, VM8, VM10, VM12, VM14, VM16, VM18, each with local NVMe storage.
• As with EH Std, partitions are mapped to storage accounts, but only complete extents are written to storage at once.
• Block store is a fast NVMe-based append log store. Native code, Service Fabric replication.
• Synchronous 3x availability zone replication with flush to disk for each write: consistently under 3 ms.
[Diagram: processing stages Normalizing / Indexing, Batching, Block Store Provider, Index, At-Rest Encryption (CMK) and Local Block Store; labels B K E T V A; blocks (B) replicated between backend processes over rpc; Partition X Log with three extents, one taking active writes and two sealed; Azure Storage with Storage Account 1 (ZRS), Stg Acc 2 through Stg Acc N]
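The extent behaviour described for the storage layer (writes go to an active extent, and only complete extents are written to storage) can be sketched with a small in-memory model. The `PartitionLog` class is hypothetical, and sealing by block count is an assumption made for the example; the slides do not state what triggers sealing.

```python
class PartitionLog:
    """Toy partition log: append to an active extent, seal it when full."""

    def __init__(self, extent_capacity: int):
        self.extent_capacity = extent_capacity   # assumption: seal after N blocks
        self.active = []          # extent currently receiving writes
        self.sealed = []          # complete extents kept in the local block store
        self.uploaded = []        # complete extents copied to remote storage

    def append(self, block) -> None:
        self.active.append(block)
        if len(self.active) == self.extent_capacity:
            self.sealed.append(self.active)   # seal the full extent
            self.active = []                  # start a new active extent

    def upload_sealed(self) -> int:
        """Copy sealed extents to storage as whole units; return how many were copied."""
        pending = self.sealed[len(self.uploaded):]
        self.uploaded.extend(pending)
        return len(pending)


log = PartitionLog(extent_capacity=3)
for i in range(7):
    log.append(f"block-{i}")

first_upload = log.upload_sealed()
second_upload = log.upload_sealed()
```

After seven appends there are two sealed extents and one block in the active extent; only the sealed extents are copied, and a second upload call finds nothing new.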
Networking Features
Firewall, Virtual Network Integration with Private Endpoints
• Gateway protocols: HTTPS / WebSockets, AMQP 1.0 w/ TLS, Apache Kafka RPC. Gateway VMs: VM1, VM3, VM5, VM7, VM9, VM11, VM13, VM15, VM17.
• Each cluster has a single public load-balancer IPv4 address. The address is generally stable and will very rarely change. But: use DNS firewall rules on your namespace.
• Namespace names alias the cluster DNS name (example: contoso.servicebus.windows.net is a CNAME for ns-eh1-prod-am3-403.cloudapp.net, 52.236.186.64). EH relies on that hostname to identify the namespace tenant, and it therefore cannot be further aliased.
• TLS 1.2 is the default. All current, supported clients use TLS 1.2 and all traffic generally uses TLS. Legacy clients are still permitted to use TLS <1.2, customer controlled.
• TLS is terminated at the gateway VMs.
• Common, namespace-level IP filter and VNet/PEP firewall policy enforcement on each VM.
• Each cluster has an Azure-private "IPv6 Service Endpoint" address for private endpoints.
• Client-side firewalls and proxies: WebSockets AMQP tunneling allows port 443 firewall traversal.
[Diagram: Customer Virtual Network 10.0.0.0/8; Subnet 1 10.1.1.0/24; VM 10.1.1.28; Private Endpoint 10.1.1.42; IPv6 SE]
Event Streaming
Pipelines on Azure
Event Streaming with Azure
[Diagram: Event Hubs connected to Stream Analytics Jobs, Azure Data Explorer, Data Lake Storage Gen2, Storage blob, Azure Synapse Analytics, Function Apps and Kubernetes Services, for real-time stream processing, streaming ETL, big data analytics and event streaming apps]
Azure Schema Registry
• Event streaming often requires structured
data.
• New consumers need to understand the
format of the messages.
• Validate event stream data and manage the
evolution of event data.
• Producers and consumers interact
without directly sharing schemas.
• Included with Azure Event Hubs at no
additional cost.
Real-time event stream processing with Azure Stream Analytics
• Process large volumes of streaming data with sub-millisecond latencies
with Azure Stream Analytics.
• Create streaming pipelines using an intuitive graphical drag-and-drop tool
that is built into Event Hubs and runs on Azure Stream Analytics.
Capturing Event Streams
• You can capture event streams to data lakes and warehouses using the built-in Capture
feature or using Azure Stream Analytics jobs.
Data loading to Azure Data Explorer from Event Hubs
• Azure Data Explorer offers ingestion (data loading) from Event Hubs for near-real-time
analysis on large volumes of streaming data.
Event Hubs Resources
Learn more at
https://aka.ms/eventhubs
Check out our blogs for updates and more
https://blogs.msdn.microsoft.com/eventhubs/
Contact us
askeventhubs@microsoft.com
Find us on GitHub
https://github.com/Azure/azure-event-hubs
Learn about Event Hubs on Azure Stack
https://aka.ms/eventhubsonstack
Learn about Dedicated Event Hubs Clusters
https://aka.ms/eventhubsclusterquickstart
Thank
You.
Event Hubs – High Level Architecture
Coordinate ownership of partitions
across multiple receivers
Clients can use any
native protocol.
Partitions are like lanes
on a freeway. More
lanes, more throughput.
Entity/Topic
AMQP 1.0
Azure Event Hubs vs Apache Kafka®
Similar yet very different
User Model
• Azure Event Hubs: Partitioned event stream broker with high-availability replication
• Apache Kafka: Partitioned event stream broker with high-availability replication
Architecture
• Azure Event Hubs: Multi-tenant, 3-tier Gateway/Broker/Storage cluster model, with tenant isolation, all tiers independently scalable
• Apache Kafka: Single-tenant monolith. Need to increase broker instances in a cluster to scale any dimension.
Implementation Language
• Azure Event Hubs: C# and Native (C/C++)
• Apache Kafka: Java
Cluster Manager
• Azure Event Hubs: Azure Service Fabric (inline)
• Apache Kafka: Apache Zookeeper (external); KRaft (inline, experimental)
Partition Mapping
• Azure Event Hubs: Key hashing, client or server-side mapping of events
• Apache Kafka: Key hashing, client-side mapping of events
Consumer Partition Ownership Coordination
• Azure Event Hubs: Server-coordinated partition ownership (Kafka), client-coordinated ownership with external leader election. Parallel, direct partition reads.
• Apache Kafka: Server-coordinated partition ownership
Server Workload Balancing
• Azure Event Hubs: Dynamic and fully automated (100% hands-off). Broker resource allocation independent of partition count or ownership, flexible scaling.
• Apache Kafka: Static assignment of partitions to broker instances requiring operator intervention for rebalancing.
Storage Model
• Azure Event Hubs: Replicated log store, synchronous per-message flush-to-disk on all replicas
• Apache Kafka: Replicated log store, asynchronous flush-to-disk controlled by host file system write cache settings.
Networking
• Azure Event Hubs: Single endpoint access to all partitions, Public IP/DNS or Virtual Networking, Firewall.
• Apache Kafka: Endpoint per broker instance. Multiple IPs required. Complex network management required.
Access Control
• Azure Event Hubs: Token-based access policy model with unlimited publisher policies, Azure Active Directory role-based access control
• Apache Kafka: Local accounts, federation extensibility.
Protocols
• Azure Event Hubs: AMQP 1.0 (optional: WebSockets), HTTPS 1.1, Apache Kafka RPC
• Apache Kafka: Apache Kafka RPC
Batching / Archives
• Azure Event Hubs: Avro-packaged batch-packaging and archival to blob store
Schema Registry
• Azure Event Hubs: Schema Registry based on open CNCF Schema Registry API
• Apache Kafka: (Proprietary from commercial vendors)

Azure Event Hubs - Behind the Scenes With Kasun Indrasiri | Current 2022
