Hi all … and thanks for coming to this session for the Global Azure Bootcamp in California
In this session we’ll have an overview of a new service in the Azure Service Bus family: Event Hubs
We’ll see how it provides great support and features for Internet of Things solutions in the telemetry and data ingestion space
My name is Paolo Patierno and I’ll introduce myself in the next slide
Let me introduce myself … who am I ?
My name is Paolo Patierno, I’m from Naples (in Italy) and I’m a Senior Software Engineer at a company named Leonardo Ricerche that builds embedded products connected to the Cloud and IoT solutions, so in my job I’m constantly moving between devices and the cloud. I develop everything from low-level applications to high-level applications with a UI, and Cloud solutions for the Internet of Things. I’m a Microsoft MVP on Windows Embedded but focused on Internet of Things, so all the related protocols and cloud platforms. I’m a member of some regional communities like DotNetCampania and TinyCLR.it, an Italian community with a focus on the .Net Micro Framework.
Of course I’m a board of directors member of the Embedded101 community, thanks to Sam.
To contact me with any questions you have a lot of ways (social networks and email) … every day I do my best to reply, and you can be sure that I reply to everyone.
What will we see during this session?
The session is focused on Event Hubs, one of the new services added to Azure Service Bus, and it’s related to one of the main communication patterns in the IoT … Telemetry. Speaking about telemetry, we’re speaking about a lot of data that a lot of devices send to the cloud (for example sensors, connected cars, smart homes, smart buildings, apps and so on). We need to “ingest” all the data and process or store them to extract useful information and figure out what we can do with them (for example analytics, machine learning and so on). The first step could be to use Queues and/or Topics provided by Azure Service Bus, and we’ll have a quick introduction about them.
The BIG problem is still telemetry but … at scale, when we have millions or billions of devices, and in this case a very good solution is the new Event Hubs service inside Azure Service Bus. We’ll see why to use it, its architecture and features, and its differences from Queues and Topics.
Finally, I hope to have time to show you a quick demo about it.
As I said, one of the main IoT communication patterns is the Telemetry pattern.
Every device sends data to another system (for example in the Cloud) to convey information about the status of the device itself and the environment around it, using for example one or more sensors (temperature, pressure, light and so on). The communication is unidirectional: the system in the cloud doesn’t send any data or related information back to the device (there are other IoT patterns for that, like Notify, Command and Inquiry). Of course, the data frequency depends on the application and the conditions around the device. We can have devices that send data every few minutes or every few seconds.
The first solution provided by Microsoft for the Telemetry problem is based on Queues, even if we can use Queues for all the other IoT patterns too.
A queue provides a way to decouple a sender and a receiver so that the sender can send messages asynchronously and the receiver can get messages at its own pace. The messages are stored inside the queue. So we can have a sender who sends messages to the queue and a receiver who gets messages from the queue. Of course we can have more senders (we call it “fan in”) and more receivers (we call it “fan out”).
However with queues we apply the Competing Consumers pattern. Each message can be consumed by a single consumer and then deleted from the queue.
Multiple consumers “fight” over the same queue but only one is able to get the first message available in the queue. The consumer can get the message in two ways :
Receive & Delete : the consumer gets the message from the queue and the system deletes it immediately. In this way, if the consumer crashes during message processing, the message is lost.
Peek Lock : the consumer acquires a lock on a message and tries to process it. During the lock, the message is still in the queue but it isn’t visible to the other consumers. At the end, the consumer can notify the system to delete the message because it has been processed, or to make the message visible to other consumers again because it wasn’t able to process it.
You can think about a kind of cursor on the queue that provides the next message to consumers. When a message is consumed, the cursor moves on and gives the next message to a reading consumer.
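To make the Competing Consumers pattern concrete, here is a minimal sketch in Python (an illustration only, not the Service Bus API): several consumer threads pull from the same queue, and each message is delivered to exactly one of them, as in the Receive & Delete mode described above.

```python
import queue
import threading

def competing_consumers(messages, num_consumers):
    """Deliver each message to exactly one of several competing consumers."""
    q = queue.Queue()
    for m in messages:
        q.put(m)
    consumed = [[] for _ in range(num_consumers)]

    def worker(idx):
        while True:
            try:
                # Receive & Delete: getting the message removes it from the queue
                msg = q.get_nowait()
            except queue.Empty:
                return
            consumed[idx].append(msg)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed

# three consumers "fight" over ten messages; each message is consumed once
results = competing_consumers(list(range(10)), 3)
```

Which consumer wins each message is nondeterministic, but the union of what they consume is always the full set of messages, each exactly once.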
If you need more receivers that get the same messages, possibly filtered on some criteria, we can use topics and subscriptions.
On the sender side, a topic acts like a queue. On the receiver side, each receiver has a subscription related to the topic : each message inside the topic is replicated to all subscriptions for that topic. We can apply a filter on the subscriptions based on message properties so that we can differentiate the destination subscriptions (for example, we can have a receiver for logging interested in all “error” messages, a receiver for storing data interested in all “info” messages, and so on).
In this case the topics provide the Publish/Subscribe pattern (which is different from the Competing Consumers pattern we saw with queues).
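The topic/subscription behavior above can be sketched in a few lines of Python (again, only an illustration of the pattern, not the real API): each published message is copied into every subscription whose filter matches, so different receivers see their own view of the message flow.

```python
class Topic:
    """Publish/Subscribe sketch: one topic, many filtered subscriptions."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name, predicate=lambda msg: True):
        # each subscription has a filter and its own inbox (its copy of messages)
        self.subscriptions[name] = (predicate, [])

    def publish(self, msg):
        # a message is replicated to every subscription whose filter matches
        for predicate, inbox in self.subscriptions.values():
            if predicate(msg):
                inbox.append(msg)

    def receive(self, name):
        _, inbox = self.subscriptions[name]
        return inbox.pop(0) if inbox else None

topic = Topic()
topic.subscribe("logging", lambda m: m["level"] == "error")  # only errors
topic.subscribe("storage")                                   # everything
topic.publish({"level": "error", "body": "disk full"})
topic.publish({"level": "info", "body": "temp=21"})
```

Here the “storage” subscription receives both messages while “logging” receives only the error, mirroring the filtered subscriptions described above.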
So what are the main features of Queues and Topics?
The messages are stored but with a TTL (Time To Live). When it expires, the message is deleted and it’s no longer available in the queue. Expired messages can go into a special queue, the so-called “dead-letter” queue. The same goes for so-called “poisoned” messages, which are malformed messages that the receivers can’t process. We can have a “special” consumer on the “dead-letter” queue that is able to understand why some messages are malformed.
A receiver can get a message from the queue and the Service Bus deletes it (Receive & Delete mode), or the receiver can lock the next message, process it and then confirm to the system to delete it. In the first case, if the receiver crashes the message is lost (already deleted). In the second case, if the receiver crashes the message returns to the queue and is available for another receiver.
We can have the Request/Reply pattern (as in the HTTP protocol) with a request sent to a queue and the response received from another queue. There is a way to correlate the request message ID with the response message ID : the sender sets an ID on the request message and sets the name of a queue (the ReplyTo queue) for receiving response messages.
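A minimal sketch of that correlation mechanism (plain Python queues standing in for the two Service Bus queues; the field names mirror the MessageId/CorrelationId/ReplyTo idea but the code is illustrative only):

```python
import queue
import uuid

request_q, reply_q = queue.Queue(), queue.Queue()

def send_request(body):
    # the sender stamps a message id and names the queue where it wants the reply
    msg_id = str(uuid.uuid4())
    request_q.put({"message_id": msg_id, "reply_to": "reply-queue", "body": body})
    return msg_id

def serve_one():
    # the responder copies the request's message id into the reply's correlation id
    req = request_q.get()
    reply_q.put({"correlation_id": req["message_id"], "body": req["body"].upper()})

def receive_reply(expected_id):
    # the sender matches the reply back to its request via the correlation id
    reply = reply_q.get()
    return reply["body"] if reply["correlation_id"] == expected_id else None

req_id = send_request("ping")
serve_one()
answer = receive_reply(req_id)  # correlated reply: "PING"
```

The key point is that the responder never invents an ID of its own for correlation: it echoes the request’s ID, which is what lets a sender with many requests in flight match each reply to the right one.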
Advanced features are sessions (for grouping messages) and transactions to have send/receive operations completed in batch (like transactions on databases).
When the number of devices increases we have the BIG problem … Telemetry at scale
We’re speaking about one hundred devices, then ten thousand, then one million devices … and all of them send data in parallel to the Cloud
The millions or billions of devices we are talking about at scale could be devices in our smart homes or connected cars that send data with different frequencies.
We can have applications on PCs, embedded systems, smartphones or tablets.
We can have games … and a great example is Halo that uses Service Bus as ingestion system.
Here is the ingestion and processing scenario …
Starting from the left we have different devices :
IP-capable devices that connect directly to the ingestion system. They have enough resources, like CPU and memory, for a full TCP/IP stack and SSL/TLS protocol support.
Low-power devices based on an RTOS (Real Time Operating System) that aren’t capable of connecting to the Cloud (they don’t have a TCP/IP stack) but use other protocol stacks (e.g. BLE, ZigBee, Z-Wave, …) to connect to a field gateway, which connects to the cloud. The field gateway is like a bridge with full capabilities for cloud connection (for example a Raspberry Pi, a Windows Embedded Compact based device, or the future Windows 10 for IoT devices).
Legacy IoT devices that use a custom/proprietary protocol but are able to connect to the ingestion system through a gateway in the Cloud. This cloud gateway translates the custom/proprietary protocol messages into messages that the ingestion system can understand. We can think about IoT protocols like MQTT and CoAP.
We can also have applications running on PC or smartphones that send data for monitoring or logging.
After collecting messages, all data go into the ingestion system based on Event Hubs.
All ingested data are processed in different ways, from stream processing to long-term storage, and finally the extracted information is shown to the user with the right presentation layer. In this scenario we can have web, desktop or mobile applications, or powerful tools like Power BI
Event Hubs is different from Queues and Topics because it provides the Partitioned Consumers pattern
Each producer sends data to a partition directly or with a partition key on the message. In the second case, the system computes a hash from the partition key and obtains the related partition. Of course, the same partition key is always related to the same partition. Each receiver consumes data from a partition, but receivers are grouped inside consumer groups. With consumer groups we can “simulate” a publish/subscribe pattern : we can have different receivers that consume data from the same partition but are in different consumer groups. A consumer group is like a “view” on the data stream, and each consumer group can process data for different purposes like logging, error processing, long-term storage, stream analysis and so on
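The partition-key-to-partition mapping can be sketched like this (the real service uses its own internal hash; MD5 here is purely for illustration of the property that matters — the same key always lands on the same partition):

```python
import hashlib

def partition_for_key(partition_key, partition_count=16):
    """Map a partition key to a partition id deterministically.

    Illustrative only: the hash function is not the one Event Hubs uses,
    but the guarantee is the same — equal keys give equal partitions.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count

# a given device id is always routed to the same partition, which is
# what preserves per-device ordering of its events
p1 = partition_for_key("device-42")
p2 = partition_for_key("device-42")
```

Because the mapping is deterministic, a producer that always uses its device ID as the partition key gets all of its events into one partition, in order.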
When a receiver gets data from the event hub, the data itself isn’t removed from the stream. This is a big difference from queues and topics. The cursor used to get data from the stream isn’t in the central system (as with queues and topics) but on the receiver side : in this way a receiver can “rewind” and re-read all the data in the stream to process it in the same or a different way. It’s up to the receiver to save a “checkpoint” and remember the last data it read, to restart from there if it crashes. This checkpoint could be based on the message offset inside the stream or the related timestamp.
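A tiny sketch of this client-side cursor idea (a Python list standing in for one partition’s append-only stream; names and storage are invented for illustration — real clients persist the checkpoint externally, e.g. in blob storage):

```python
class PartitionReader:
    """Client-side cursor over an append-only event stream.

    Reading never deletes events; the reader owns its own offset and
    is responsible for checkpointing it.
    """

    def __init__(self, stream):
        self.stream = stream      # shared, append-only list of events
        self.offset = 0           # the cursor lives on the client
        self.checkpoints = {}     # stand-in for external checkpoint storage

    def read(self):
        batch = self.stream[self.offset:]
        self.offset = len(self.stream)
        return batch

    def checkpoint(self, name="default"):
        # persist the cursor so a restarted reader can resume from here
        self.checkpoints[name] = self.offset

    def restore(self, name="default"):
        # after a crash, restart from the last saved offset
        self.offset = self.checkpoints.get(name, 0)

stream = ["e1", "e2", "e3"]
reader = PartitionReader(stream)
first = reader.read()            # all three events, none deleted
reader.checkpoint()
stream.append("e4")
reader.restore()                 # simulate a restart after the checkpoint
after_restart = reader.read()    # only the event produced after the checkpoint
reader.offset = 0                # rewind: the old events are still there
replayed = reader.read()
```

Note the contrast with the queue cursor: here nothing is deleted on read, so rewinding to offset 0 replays the whole retained stream.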
The main Event Hubs processing unit is called the throughput unit, which we can consider like a virtual machine. Its role is to process data from one or more partitions inside one or more Event Hubs.
All the communication is based on the standard AMQP (Advanced Message Queuing Protocol) protocol or the HTTP protocol, and the channel is encrypted using SSL/TLS.
As I said, producers and receivers send and receive data to and from partitions in parallel.
Using partitions allows the system to “scale out”, I mean to scale horizontally.
The default number of partitions is 16, with a minimum of 8 and a maximum of 32. You can have up to 1024 after a request to Azure support, and only for special conditions.
A producer can send data to partitions in the following ways :
First, it doesn’t specify a partition, and in this case the system uses a round robin distribution. If we suppose an event hub with 8 partitions, it means that the first received message goes into partition one, then partition two, and so on until partition eight; the next message goes into partition one again. In this case you can’t have the FIFO (First In First Out) feature for events, because messages are spread across different partitions and more receivers read them in parallel.
In the second way, it can specify the partition id directly.
In the third way, it can set a partition key inside the message. Based on this partition key, the system computes a hash that is related to a specific partition. The system guarantees that for the same partition key we have the same hash, so the same destination partition.
In the last two ways, you send messages to a specific partition (directly or not) so you can have the FIFO feature for the messages sent by a specific device, if it always sends data to the same partition. The partition key is the most used way because you can group data based on it.
As a last option, it can use a publisher policy. On the system there is a so-called “publisher” endpoint for each specific producer (it’s like a virtual endpoint in the Cloud for a device). Related to this “publisher” there is a policy based on a token, so that we can grant or revoke access to the event hub with granularity on a single device (remember that the base access to Azure Service Bus is provided using SAS, Shared Access Signature).
The supported protocols are HTTP and AMQP. HTTP suits a device that has a short-lived connection to the system and sends data with low frequency. It is better to use the AMQP protocol for a long-lived connection with a device that sends data with high frequency. In this case, there isn’t the high cost of opening and closing a connection for every communication as with HTTP (based on a request/response pattern). Another AMQP advantage is that the SSL/TLS handshake is executed only once, on connection.
Receivers are grouped in consumer groups and we have only one receiver per partition. We can have more receivers on the same partition, but across consumer groups.
We can consider consumer groups like “views” on the stream, so they provide the Publish/Subscribe pattern like topics and subscriptions.
There is at least one consumer group (the default) and up to 20.
To simplify the interaction with the event hub for receiving messages and handling failures, checkpoints and so on, we have a specific NuGet package with the Event Processor that provides a .Net API abstraction for receivers.
It provides the IEventProcessor interface that we can implement to handle messages in batches. Each processor instance is a receiver for a specific partition and it is registered inside a specified consumer group. All processors are managed by an Event Processor Host.
When a processor is registered for a partition, it has a lease that is used by the Event Processor Host to handle failover and scaling for us. It means that if a processor crashes, another processor acquires the lease and we are still able to receive from that partition. If we start more processors for the same event hub, the Event Processor Host handles load balancing by assigning partitions across all available processors. For example, we start with a single worker role that uses an Event Processor Host providing eight processors to get data from all eight partitions. If we need to scale out, we can create another worker role and the two Event Processor Hosts will each have four processors to get data from four partitions in the Event Hub.
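The lease-based leveling described above can be sketched as a simple round-robin assignment of partition leases across hosts (an illustration of the balancing outcome only, not of how Event Processor Host actually acquires leases):

```python
def balance_leases(partitions, hosts):
    """Spread partition leases evenly across the available hosts.

    Illustrative sketch: real lease management is dynamic (hosts steal
    leases until the load levels out), but the steady state looks like this.
    """
    assignment = {h: [] for h in hosts}
    for i, p in enumerate(partitions):
        assignment[hosts[i % len(hosts)]].append(p)
    return assignment

# one worker role owns all eight partitions ...
one = balance_leases(list(range(8)), ["worker-1"])
# ... start a second worker role and each ends up with four
two = balance_leases(list(range(8)), ["worker-1", "worker-2"])
```

This matches the scale-out example above: with one worker role, its host reads all eight partitions; add a second and the leases settle at four partitions each.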
The last but not least feature is the checkpoint. We can use an API to invoke a checkpoint so that the processor saves its offset and timestamp in the stream inside blob storage. It’s used to recover the offset information if the processor goes offline and then restarts.
This package saves a lot of development time compared to manually creating receivers on partitions and handling scaling and failover.
The throughput unit is the processing unit (like a virtual machine) for event hubs and it’s the main billing unit, even if we have to add the number of messages transmitted (we’ll see pricing details later in the presentation).
A TU provides 1 MB/sec ingress and 2 MB/sec egress, with data retention of 84 GB/day.
One TU can handle more partitions, but in this case we have lower performance (at lower cost). For better performance we need one TU per partition, but at a higher cost.
TUs are shared across the event hubs inside the same namespace. It means that a TU can process more partitions across more event hubs inside the same namespace.
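Given the per-TU figures above (1 MB/sec ingress, 2 MB/sec egress), a back-of-the-envelope TU sizing looks like this (a sketch of the arithmetic only; real capacity planning also considers events/sec and partition count):

```python
import math

def required_throughput_units(ingress_mb_per_sec, egress_mb_per_sec,
                              tu_ingress=1.0, tu_egress=2.0):
    """Smallest TU count covering both the ingress and egress demand.

    Each TU buys 1 MB/s ingress and 2 MB/s egress, so we need enough
    TUs to cover the larger of the two ratios (and at least one TU).
    """
    return max(math.ceil(ingress_mb_per_sec / tu_ingress),
               math.ceil(egress_mb_per_sec / tu_egress),
               1)
```

For example, 3 MB/sec ingress with 2 MB/sec egress is ingress-bound and needs 3 TUs, while 1 MB/sec ingress with 6 MB/sec egress (three consumer groups each re-reading the stream) is egress-bound and also needs 3 TUs.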
So … in summary, what are the main differences between Event Hubs and Queues/Topics?
First of all, we spoke about different patterns (Competing Consumers for Queues, Publish/Subscribe for Topics and Partitioned Consumers for Event Hubs). It means that Queues and Topics are useful for telemetry but also for command messages, when you need to send a command to a device from the system in the cloud. For example, you can do it by sending a message to a queue so that the device gets the command from it. In this scenario, the device can reply with the command result on another queue, so we can implement the request/reply pattern. Event Hubs allows only a unidirectional flow of data, so-called events, from devices to the cloud.
Speaking about the receiver side, with Queues and Topics we have a cursor on the server (on the queue and on the subscription), so that when a message is consumed it is deleted and the cursor moves to the next available message. With Event Hubs the cursor is on the client, which needs to save a checkpoint to remember the offset or the timestamp in the event stream, so that it restarts from the last read event when it shuts down or crashes and restarts.
Regarding message retention, there is a TTL (Time To Live) on Queues and Topics. It can be defined at the entity level or the message level, and the message is deleted when the TTL expires. On Event Hubs there isn’t a TTL, but the messages are retained from 1 day to a maximum of 7 days.
The main topic is security and authentication …
All entities in Azure Service Bus are accessed using SAS (Shared Access Signature). With a SAS policy we can define the authorizations on an entity (queue, topic or event hub) for sending, receiving or managing. When a device connects to Azure Service Bus it has to use the access key name and the related secret key for the SAS policy. This is valid both for Queues/Topics and Event Hubs. Regarding the underlying protocol, we have secure connections based on SSL/TLS, so the application protocol over TCP/IP can be HTTPS or AMQPS. The number of SAS policies is limited to 5 per namespace. It means that we have to use one SAS policy to group more than one device for their authorizations.
So how can we solve this problem at scale, with millions of devices?
With Event Hubs we have a more fine-grained policy : the publisher policy, based on a SAS token. Using only one SAS policy to access Azure Service Bus, a device can request a SAS token that will be valid for a limited period of time and only for the specific device with the specified name. This name corresponds to the “virtual” publisher we saw in the previous slides. The device can use this token to send data to the Event Hub until the token expires or is revoked. For example, we can block a “hacked” device that is trying to produce a data flood (we can think about a Denial of Service attack).
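For the curious, here is the general shape of a Service Bus SAS token (HMAC-SHA256 over the URL-encoded resource URI and an expiry timestamp), sketched with the Python standard library. The namespace, policy name and key are made up for the example; in practice the token is scoped to a per-device publisher path so it can be revoked for that device alone.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def generate_sas_token(resource_uri, key_name, key, ttl_seconds=3600):
    """Build a Shared Access Signature token for a Service Bus resource.

    The signature covers the encoded resource URI plus the expiry, so the
    token is useless for any other resource and after the expiry passes.
    """
    expiry = str(int(time.time()) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest())
    return ("SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        encoded_uri,
        urllib.parse.quote_plus(signature.decode("utf-8")),
        expiry,
        key_name))

# hypothetical namespace/event hub/publisher path and policy, for illustration
token = generate_sas_token(
    "https://mynamespace.servicebus.windows.net/myeventhub/publishers/device-42",
    "SendPolicy",
    "not-a-real-key")
```

Handing each device a token scoped to its own publisher path is what makes the “revoke a single hacked device” scenario above possible without touching the shared policy key.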
Finally, Event Hubs doesn’t have all the other features of Queues and Topics, like the dead-letter queue and transactions, because the main purpose of Event Hubs is higher throughput, and all the above features add complexity to the system and slow down its performance.
So far it’s all beautiful … great features, but what about the pricing?
As I said, the main billing unit is the throughput unit that we pay per hour. More TUs mean more performance for processing partitions, so a higher cost.
Then we have to pay for the events ingested and the established connections.
The demo consists of an IoT gateway based on a FEZ Spider board from GHI Electronics.
We have a BLE module (from the Italian company Innovactive) connected to this board and we get data (temperature and humidity) from a Texas Instruments SensorTag. The data are transferred to the Event Hub using the AMQP protocol and they are shown on a dashboard developed for the PC.
Another interesting solution is to send data to the ConnectTheDots dashboard, which is web based. ConnectTheDots is a project developed by Microsoft Open Technologies that shows how we can connect different boards (Arduino, Raspberry Pi, .Net Gadgeteer based, …) to the Cloud using Event Hubs and Azure Stream Analytics for processing the data. You can find more information about this project in the last references slide.
Here you can see the picture with the FEZ Spider board with Ethernet connection and BLE module (from Innovactive).
In red there is the Texas Instruments SensorTag that has a bunch of built-in sensors like temperature, humidity, accelerometer and so on.
On the right we can see the monitor application with the two charts related to temperature and humidity values
Event Hubs : million events per second to the Cloud
Senior Software Engineer
The Microsoft Azure hyper scale ingestion
WHO AM I ? CONTACTS
• Senior Software Engineer (Leonardo Ricerche S.r.l)
• Microsoft MVP for Windows Embedded
”... constantly moving between the devices and the cloud ...”
• «DotNetCampania» member
• «TinyCLR.it» member
• «Embedded101» board of director member
• [twitter] @ppatierno
• [email] email@example.com
• [skype] paolopat80
• Telemetry … the problem
• Microsoft Azure Service Bus …
• Messaging … Queues & Topics … the offer
• Telemetry at scale … the BIG problem
• Microsoft Azure Service Bus … again …
• Event Hubs … the solution
• Event Hubs :
• Why ?
• Against Queues & Topics
TELEMETRY ... THE PROBLEM
• Information flowing from a device to other systems for conveying status of the device and the environment around it
• Data frequency can be different based on the application and conditions around the device
SERVICE BUS … QUEUES
• Competing Consumers pattern
• all consumers read from same stream (queue)
• Message consumed by a single consumer
SERVICE BUS … TOPICS
• Publish/Subscribe pattern
• each consumer reads from its subscription (a copy of message
on related topic)
• Message consumed by more subscribers
• It’s possible to use filters
SERVICE BUS: QUEUES & TOPICS
• Messages are durably stored but with TTL
• Receive & Delete or Peek Lock
• Sessions (for FIFO feature)
• Request/Reply pattern (based on correlation)
• Transaction for batch send/receive
• Dead-letter queue (TTL or “poisoned” messages)
AT SCALE … THE BIG PROBLEM
• Hyper Scale
• Million clients
100 10,000 1,000,000
TELEMETRY … EXAMPLES
• Device Telemetry
• Houses send telemetry
every 10-15 minutes
• Cars send telemetry every
• Application Telemetry
• Performance counters are
measured every second
• Mobile applications capture
• Gaming online
• Halo …1,000,000
SERVICE BUS ... EVENT HUBS
• Partitioned Consumers pattern
• Event stream partitioned for scale out
• Consumers pull out events in parallel
• Producers send events in parallel
• How producers use/address partitions :
• Directly with partition Id
• Hash based using partition
key or publisher identity
• Automatic round robin
• Default 16 partitions, min 8, max 32
• Azure Support can enable up to 1024 (it is a very special condition !)
• Publish in many ways …
• No partition info (round robin)
• Partition Id (directly)
• Partition key hashed to
select related partition
• Publisher policy
• Protocols supported
• Short lived, low-frequency → HTTP
• Long lived, high-frequency → AMQP
• Receivers are part of a consumer group
• In general, a receiver per partition
• Consumer Groups are views on the stream
• Similar to topic subscriptions
• $Default consumer group
• Up to 20 named consumer groups
• .Net API abstraction for
receivers (Nuget package)
• Interface to handle messages
• Registration with a consumer group
• Each processor acquires a lease on a partition for failover and load balancing
• Store (Azure Storage Blobs) a checkpoint with offset inside the stream
• Throughput Unit (TU)
• Ingress : 1 MB/sec (or 1000 events/sec)
• Egress : 2 MB/sec
• Retention : 84 GB/day
• Billing : hourly
• Number of partitions ≥ Throughput Units
• One TU can handle more partitions
• One TU per partition, better performance, high cost :-)
• Throughput Unit works at namespace level
• It can handle more event hubs
EVENT HUBS VS QUEUES&TOPICS
• Q&T : useful for Command Message and
Request/Reply Message (response queue)
• EH : useful for Event Messages
• Q&T : on server side. Message consumed and deleted
from queue, cursor to next available message
• EH : on client side. Client can rewind on the stream and
re-read same events (during their retention). Access
partition by offset or timestamp
• Q&T : TTL at queue/topic level or message level
• EH : max 7 days
EVENT HUBS VS QUEUES&TOPICS
• Security & Authentication
• Q&T and EH
• SSL/TLS via HTTP(S) or AMQP(S)
• SAS (Shared Access Signature) for sending/receiving
• Publisher policy (SAS Token)
• Fine grained per device
• Revoke/Restore publisher
• EH doesn't have dead lettering, transactions, ... to have higher throughput
EVENT HUBS : PRICING
Basic : up to 100 connections, no extension
Standard : 1,000 connections included
Throughput Unit Hour : 0.015 / 0.03 per TU per hour (Basic / Standard)
Ingress Events : 0.028 per 1,000,000 events
Brokered Connections (0-1k) : included (Basic: 100, Standard: 1,000)
Brokered Connections (1k-100k) : 0.00004 per connection/hour
Brokered Connections (100k-500k) : 0.00003 per connection/hour
Brokered Connections (500k+) : 0.00002 per connection/hour
Storage Overage (> TUs × 84 GB) : billed as locally-redundant Azure Storage