@jimmydahlqvist
Challenges in IoT Solutions
• Scalability – Scale to a massive number of devices
• Monitoring – Monitor devices for connectivity
• Latency – Low latency around the globe
• Security – Ensure devices are secure and safe
• Data – Formats and volume
• Cost – Operate at a low per device cost
Event-Driven & Serverless
Serverless
• Scaling based on demand
• Less operational overhead
• Pay-per-use pricing model
Event-Driven
• Loose coupling of services
• Enable real-time decisions
• Asynchronous processing
Architecture Problems - Lessons Learned
Diagram (AWS Cloud): User Client, Amazon API Gateway, Application Load Balancer, AWS Fargate, AWS Lambda, AWS IoT Core, Amazon DynamoDB, Amazon Aurora, Amazon S3
• IoT Rules as router
• Small objects in S3
• Hard to change data format
• Debugging was a nightmare
• Possible lost data
• Service coupling
Mitigating the problems
• Prevent data loss due to failing Lambda functions
• Replace IoT Core as event-router
• Buffer data for efficient query
• Break out functions into domains and services
• Enhance debug and analytics capabilities
Design Choices
Diagram (AWS Cloud): AWS IoT Core, Amazon Kinesis, Amazon EventBridge, AWS Step Functions, Amazon Firehose, Amazon OpenSearch, Amazon DynamoDB, Amazon Aurora, Amazon S3, Application Load Balancer, Amazon API Gateway, AWS Fargate
• Remove IoT Core as event router
• Storage First – Send all events to a Kinesis stream
• EventBridge as new event router
• Firehose for buffering – OpenSearch for debug
• Containers for API and data processing
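The buffering idea in the Firehose bullet can be sketched in a few lines. This is a minimal in-memory stand-in, not Firehose itself: the `EventBuffer` class, its limits, and the event fields are assumptions made for illustration. It shows how grouping events before writing turns many tiny objects into a few larger ones.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EventBuffer:
    """Collects events and flushes them as one object when a limit is hit.

    max_events and max_bytes are illustrative limits, not Firehose defaults.
    """
    max_events: int = 500
    max_bytes: int = 1_000_000
    _events: list = field(default_factory=list)
    _size: int = 0
    flushed: list = field(default_factory=list)  # stands in for objects written to S3

    def add(self, event: dict) -> None:
        line = json.dumps(event)
        self._events.append(line)
        self._size += len(line.encode("utf-8")) + 1  # +1 for the newline
        if len(self._events) >= self.max_events or self._size >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if not self._events:
            return
        # One newline-delimited JSON object instead of one object per event.
        self.flushed.append("\n".join(self._events) + "\n")
        self._events = []
        self._size = 0


buffer = EventBuffer(max_events=100)
for i in range(1000):
    buffer.add({"device_id": f"device-{i % 10}", "temperature": 20 + i % 5})
buffer.flush()  # flush whatever is left at shutdown
print(len(buffer.flushed))  # 10 objects instead of 1000
```

A real delivery stream would also flush on a time interval so that quiet periods do not hold data back; that is left out here to keep the sketch deterministic.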
Service Decoupling
Diagram (AWS Cloud): AWS IoT Core, Amazon Kinesis, Amazon EventBridge, Data Transformation, Data Analytics, Service A, Service B, Partner. Services shown: AWS Step Functions, Amazon Firehose, AWS Fargate, Amazon DynamoDB, Amazon Aurora, Amazon EventBridge
Storage First
Diagram (AWS Cloud): AWS IoT Core, Amazon Kinesis, Amazon EventBridge, each with an Amazon SQS dead-letter queue (DLQ)
Benefits
• Durability and availability
• Scalable System Design
• Asynchronous processing
Considerations
• Architectural complexity
• Eventual consistency
• Design for idempotency
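The idempotency consideration can be illustrated with a small sketch. It assumes every event carries a unique `event_id` (a hypothetical field, not something the slides specify) and uses an in-memory set where a real system would need a durable store. The point is that a redelivered event must not change the result.

```python
class IdempotentConsumer:
    """Applies each event at most once, keyed on a unique event id."""

    def __init__(self):
        # Assumption: an in-memory set stands in for a durable record of
        # processed ids (for example a table written with a conditional put).
        self._processed_ids = set()
        self.message_counts = {}

    def handle(self, event):
        """Return True if the event was applied, False if it was a duplicate."""
        if event["event_id"] in self._processed_ids:
            return False
        device_id = event["device_id"]
        self.message_counts[device_id] = self.message_counts.get(device_id, 0) + 1
        self._processed_ids.add(event["event_id"])
        return True


consumer = IdempotentConsumer()
event = {"event_id": "evt-1", "device_id": "device-1"}
results = [consumer.handle(event), consumer.handle(event)]  # delivered twice
print(results, consumer.message_counts)
```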
Data Ingestion: Watch Out for the Poison Pill
Diagram (AWS Cloud): AWS IoT Core, Amazon Kinesis, Amazon EventBridge, each with an Amazon SQS dead-letter queue (DLQ)
• Understand how AWS services work
• If one message fails, the entire batch fails
• Bisect on failure
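Bisecting on failure can be sketched as a recursive split: when a batch fails, halve it and retry each half until the bad record is isolated. The function names and the record shape below are assumptions for illustration; the slides do not say which consumer is used. Lambda event source mappings for Kinesis expose the same idea through the `BisectBatchOnFunctionError` setting.

```python
def process_batch(batch, handler):
    """Process a batch as a unit: one failing record fails the whole batch."""
    for record in batch:
        handler(record)


def process_with_bisect(batch, handler):
    """Return (processed, poisoned) by splitting failing batches in half."""
    if not batch:
        return [], []
    try:
        process_batch(batch, handler)
        return list(batch), []
    except Exception:  # a sketch; real code would narrow this
        if len(batch) == 1:
            # The poison pill is isolated; it would go to a dead-letter queue.
            return [], list(batch)
        mid = len(batch) // 2
        ok_left, bad_left = process_with_bisect(batch[:mid], handler)
        ok_right, bad_right = process_with_bisect(batch[mid:], handler)
        return ok_left + ok_right, bad_left + bad_right


def handler(record):
    if "temperature" not in record:
        raise ValueError(f"malformed record: {record}")


batch = [{"seq": i, "temperature": 20 + i} for i in range(8)]
batch[5] = {"seq": 5}  # the poison pill
processed, poisoned = process_with_bisect(batch, handler)
print(len(processed), poisoned)
```

Records that come before the poison pill are handled more than once while the batch is being split, which is one reason the consumers need to be idempotent.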
Data Transform
Diagram (AWS Cloud): AWS IoT Core, Amazon Kinesis, Amazon EventBridge, AWS Step Functions (Transform data format), Amazon DynamoDB, Service A (AWS Step Functions), Amazon SQS, AWS Fargate
• Data transform pattern
• Decouple data format from device
• Internal data format optimized for services
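A minimal sketch of the data transform pattern follows. The two firmware payload shapes, the `fw` version field, and the `Telemetry` type are hypothetical; the talk does not describe the real formats. The sketch shows each device format being mapped once, at the edge, into a single internal format that the services consume.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Telemetry:
    """Internal format shared by all services (hypothetical fields)."""
    device_id: str
    temperature_c: float
    recorded_at: datetime


def _from_v1(payload):
    # Hypothetical firmware v1: Fahrenheit and epoch seconds.
    return Telemetry(
        device_id=payload["id"],
        temperature_c=round((payload["temp_f"] - 32) * 5 / 9, 2),
        recorded_at=datetime.fromtimestamp(payload["ts"], tz=timezone.utc),
    )


def _from_v2(payload):
    # Hypothetical firmware v2: Celsius and an ISO 8601 timestamp.
    return Telemetry(
        device_id=payload["deviceId"],
        temperature_c=payload["temperature"]["celsius"],
        recorded_at=datetime.fromisoformat(payload["timestamp"]),
    )


TRANSFORMERS = {1: _from_v1, 2: _from_v2}


def to_internal(payload):
    """Pick a transformer by firmware version and return the internal format."""
    version = payload.get("fw", 1)
    if version not in TRANSFORMERS:
        raise ValueError(f"unsupported firmware version: {version}")
    return TRANSFORMERS[version](payload)


v1 = {"fw": 1, "id": "device-1", "temp_f": 68.0, "ts": 0}
v2 = {
    "fw": 2,
    "deviceId": "device-1",
    "temperature": {"celsius": 20.0},
    "timestamp": "1970-01-01T00:00:00+00:00",
}
print(to_internal(v1) == to_internal(v2))  # both map to the same internal record
```

Supporting a new firmware format then means adding one transformer, without touching the services downstream.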
Data Commands & Egestion
Diagram (AWS Cloud): AWS IoT Core, Amazon EventBridge, AWS Step Functions, Amazon DynamoDB, Alarm. Flow labels: Command, Command Ack, Process Command, Store Command, Set Command Timer, Remove Command Timer, Command Timer Expired
• Commands processed asynchronously
• Certain commands require an ack with a configurable timeout
• Notify and retry using CloudWatch alarms and Step Functions' built-in support
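The ack-with-timeout flow can be modeled as a small state tracker. This is an in-memory sketch under stated assumptions: `CommandTracker`, its method names, and the retry limit are hypothetical, and it stands in for the Step Functions and alarm setup on the slide rather than reproducing it. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class PendingCommand:
    command_id: str
    device_id: str
    deadline: float
    attempts: int = 1


class CommandTracker:
    """Tracks commands that wait for an ack within a configurable timeout."""

    def __init__(self, ack_timeout, max_attempts=3):
        self.ack_timeout = ack_timeout
        self.max_attempts = max_attempts
        self.pending = {}
        self.failed = []

    def send(self, command_id, device_id, now):
        # Store the command and set its timer.
        self.pending[command_id] = PendingCommand(
            command_id, device_id, now + self.ack_timeout
        )

    def ack(self, command_id):
        # An ack removes the command and, with it, the timer.
        return self.pending.pop(command_id, None) is not None

    def tick(self, now):
        """Handle expired timers; return the ids of commands that were retried."""
        retried = []
        for command in list(self.pending.values()):
            if now < command.deadline:
                continue
            if command.attempts >= self.max_attempts:
                del self.pending[command.command_id]
                self.failed.append(command.command_id)  # notify an operator here
            else:
                command.attempts += 1
                command.deadline = now + self.ack_timeout
                retried.append(command.command_id)
        return retried


tracker = CommandTracker(ack_timeout=30, max_attempts=2)
tracker.send("cmd-1", "device-1", now=0)
tracker.send("cmd-2", "device-2", now=0)
acked = tracker.ack("cmd-1")           # device-1 answers in time
first_pass = tracker.tick(now=31)      # cmd-2 expired once and is retried
second_pass = tracker.tick(now=62)     # cmd-2 expired again and is given up
print(acked, first_pass, second_pass, tracker.failed)
```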
Data Processing
Diagram (AWS Cloud): Amazon EventBridge, Service A, Service B, Amazon SQS, AWS Fargate, AWS Step Functions, Amazon DynamoDB, Amazon Aurora, Application Load Balancer, Amazon API Gateway
• Each service processes and stores data individually
• Each service only listens for specific data updates
• Some services expose data through an API
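How a service can listen only for specific data updates is sketched below with a simplified matcher. It implements a small subset of EventBridge-style event patterns (lists of accepted values and nested objects) in plain Python; the rule names, event types, and fields are hypothetical and the real service supports many more operators.

```python
def matches(pattern, event):
    """Return True if the event satisfies every field in the pattern.

    A pattern value is either a list of accepted values or a nested pattern.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical rules: each service subscribes only to what it cares about.
RULES = {
    "service-a": {"detail-type": ["TelemetryReceived"]},
    "service-b": {
        "detail-type": ["DeviceStateChanged"],
        "detail": {"state": ["OFFLINE"]},
    },
}


def route(event):
    """Return the names of the services whose rule matches the event."""
    return [name for name, pattern in RULES.items() if matches(pattern, event)]


offline = {
    "source": "iot.ingest",
    "detail-type": "DeviceStateChanged",
    "detail": {"device_id": "device-1", "state": "OFFLINE"},
}
online = {**offline, "detail": {"device_id": "device-1", "state": "ONLINE"}}
telemetry = {
    "source": "iot.ingest",
    "detail-type": "TelemetryReceived",
    "detail": {"device_id": "device-1", "temperature": 21.5},
}
print(route(offline), route(online), route(telemetry))
```

Adding a new service becomes a matter of adding a rule, with no change to the producers or to the other consumers.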
Data - Debug, Analytics & ML
Diagram (AWS Cloud): AWS IoT Core, Amazon Kinesis, Amazon EventBridge, Amazon Firehose, Amazon OpenSearch, Amazon S3, AWS Glue, Amazon Athena, Amazon SageMaker
• Debug of raw events and data using OpenSearch
• Transform data to Parquet for faster access
• Athena and SageMaker for queries and ML
API
Diagram (AWS Cloud): Client, Amazon API Gateway, Application Load Balancer, AWS Fargate, AWS Lambda, Amazon DynamoDB, Amazon Aurora
• Containers used for the API, which sees steady use
• API Gateway for authorization and throttling
• Reuse processing and business logic
I would say so!
• Several production environments
• Thousands to millions of devices
• Thousands of messages per second
• Vast amount of data analysed
#3 Listen to that: more than 15 billion devices. That is how many IoT-enabled devices connect to the internet every day right now.
IoT devices power our smart cities, our connected homes, connected factories and manufacturing, logistics, health care, and much, much more.
Today I will show how we solved our IoT challenges using event-driven and serverless architectures. I will talk about the common challenges and our journey from one architecture to a more refined one.
We will deep dive into some parts of this solution and talk about the challenges and how serverless helped us overcome them.
#4 Hi! I'm Jimmy!
I have worked with AWS and serverless since 2015, almost a decade now, and I have seen all kinds of strange things.
I’m a true serverless enthusiast. The very first solution I built on AWS was serverless, and I have not looked back since.
I have built serverless solutions for a variety of companies, from startups to large enterprises.
I'm the founder of serverless-handbook.com, where you can find all kinds of serverless things that I have built, ranging from workshops to small architecture patterns.
And I have my blog on Jimmydqv.com
As for my day job, and yes, I do have a day job, I know people have been questioning that:
I work as Head of AWS at Sigma Technology Cloud. We are an advanced services partner with AWS and build all kinds of fun solutions.
If you would like to know more about us, visit our booth outside....
I’m an AWS Serverless Hero, an AWS Ambassador, and one of the user group leaders for the Scania user group.
#7 The IoT market is growing rapidly, creating a need for our solutions to handle the scale and growth. One estimate is that by 2030 the number of connected devices will have doubled, to around 40 billion.
These can be household products: smart heating, speakers, light bulbs, robot vacuums, washing machines, coffee makers.
How many in here use these types of connected devices? I do too. We love our gadgets, and I recently bought a new Philips Airfryer, and of course it was connected so I could control it remotely.
But this is the home side of IoT.
The other side is the industry side of things; this could be in health care, manufacturing,
#8 I have already touched on this but IoT is all around us.
From our connected homes, to logistics with connected delivery trucks, and even food orders and Uber.
In healthcare we have our connected watches and things, and our cars keep sending information all the time.
IoT is a huge part of manufacturing, where it can be used to optimize production but also to make sure factory conditions meet regulations for temperature, humidity, and so on.
#9 Some of the key characteristics of an IoT system would be:
Large-scale connectivity: often we need to design our solutions to handle thousands to millions of devices. Some of the clients I have worked with in this space talked about selling 250k devices every year that were supposed to connect.
Our devices never sleep, or almost never sleep, they send data and telemetry in continuous streams.
And in some systems, we need to make real time decisions. The same client, that wanted to sell 250k devices, had exactly this requirement.
#10 - Scalability: Handling millions of devices and events
- Resilience: Ensuring uptime and reliability despite failures
- Monitoring: Monitor and manage devices
- Latency: Processing data quickly for real-time use cases
- Security: Managing device authentication, data encryption, SaaS, and multi-tenancy
- Data Storage & Consistency: Handling vast amounts of telemetry data from devices with different firmware versions and different data formats
- Cost Optimization: Avoiding over-provisioned infrastructure
#11 Reasons to build a serverless and event-driven architecture:
Loosely coupled services
Scale and fail independently
Cost effective – pay for what you use
Extensibility – easy and fast to extend
HA – built in
#13 Now let’s dive into a well-tested architecture that we built to meet the demands of tens of thousands of devices communicating 24 hours a day.
The communication was, however, extremely unpredictable, with high spikes and long periods without any data.
#14 In the first version of this system, we opted to use IoT Core, with rules routing our data to different parts of the system. Some rules wrote directly to S3, others to DynamoDB, while some invoked Lambda functions.
We needed different data stores for different use cases and services: DynamoDB for our real-time and time-series data, S3 for analytics, and Aurora PostgreSQL for state and device information.
Our services communicated with our clients over an API with API Gateway, ALB and containers in Fargate.
Even though this architecture worked OK, it did come with several problems.
#15 These are some of the problems we had, and the lessons we learned, from running this solution for a while.
#16 Using IoT Core as our event router. Even though we were technically running an event-driven architecture, extending it was extremely hard.
IoT Rules can be clunky to work with, and the filtering is often not that good. Writing and managing the rules was a pain, which made the system hard to extend.
#17 One of our rules sent data directly into S3 for analytics. This led to a lot of small objects in S3, since every event was written as a single object.
Glue and Athena did not really appreciate this, and queries and indexing were slow. Our S3 cost was also high, since we ended up with so many PUT operations.
#18 Sending data as-is to our storage not only led to high cost in S3; it also meant that the IoT device dictated the data format we had in our system.
Changing it was extremely hard, and upgrading our devices with new data and new data formats was almost impossible.
#19 Don’t even get me started on the problems we had with debugging. Our primary source for time series data was in DynamoDB, and we often got questions if a certain device had sent a certain message.
Or why a certain device indicated an incorrect state. Now, everyone who has used DynamoDB knows that it’s a great key-value store and that fetching data is lightning fast, for the use cases you have thought of.
Often we needed to perform scans to fetch the information we needed, and the number of indexes exploded as we tried to meet future queries.
#20 We also had some possible data loss. Our device state was updated by a Lambda function that was invoked by an IoT Rule. If this function failed, the state could be wrong…
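One common way to avoid silently losing such an event is to retry and then park it in a dead-letter queue. The sketch below is an in-memory illustration under stated assumptions: `deliver`, the handlers, and the plain list used as a queue are hypothetical and are not the talk's implementation.

```python
def deliver(event, handler, dead_letter_queue, max_attempts=3):
    """Try the handler up to max_attempts times; park the event on failure."""
    last_error = None
    for _ in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception as error:  # a sketch; real code would narrow this
            last_error = error
    # Keep the event and the reason so it can be inspected and replayed.
    dead_letter_queue.append({"event": event, "error": str(last_error)})
    return False


device_state = {}
dead_letter_queue = []
calls = {"count": 0}


def flaky_update(event):
    # Fails twice, then succeeds, like a store that is briefly unavailable.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("state store unavailable")
    device_state[event["device_id"]] = event["state"]


def broken_update(event):
    raise RuntimeError("state store unavailable")


deliver({"device_id": "device-1", "state": "ONLINE"}, flaky_update, dead_letter_queue)
deliver({"device_id": "device-2", "state": "OFFLINE"}, broken_update, dead_letter_queue)
print(device_state, len(dead_letter_queue))
```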
#21 And last, we had some hard service couplings. Our services had a hard time operating on their own, and adding new services was really hard.
#23 Looking at the mitigated architecture, we made some major changes.
#50 - Automatic scalability: IoT devices often send data in unpredictable traffic patterns
- Event-driven by nature: IoT devices send telemetry, or events, making it event-driven by nature
- Built-in high availability: Services are highly available out of the box
- Loose coupling and flexibility: Flexibility and extensibility in processing of IoT telemetry
#53 Not to forget!!
It powers the most amazing system in the whole world!!