Lessons Learnt building a
Distributed Linked List on S3
@theManikJindal | Adobe
I am HE
@theManikJindal
Computer Scientist, Adobe
P.S. You get a chocolate if you
ask a question.
What is a Distributed Linked List?
• Linked List: An ordered set of nodes, each node containing a link to the next one.
• Distributed Linked List:
• Data (nodes) is stored on multiple machines for durability and to allow
reading at scale.
• Multiple compute instances write concurrently for resiliency and to write
at scale.
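To make the structure concrete, here is a minimal, illustrative model of a node (the field names are mine, not from the talk): a node carries a batch of data plus the location of the next node.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    """One node of the distributed linked list."""
    events: List[dict]            # the batch of payloads stored in this node
    next_location: Optional[str]  # where the next node lives (e.g. an object-store key); None at the tail
```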
What’s the problem we’re trying to solve?
• Adobe I/O Events enables our customers to create event subscriptions for events
generated by products and services across Adobe.
• Our customers can consume events via Webhook PUSH or PULL them from the
Journaling API.
• A Journal is an ordered list of events - much like a ledger or a log where new entries (events) are added to the end and the ledger keeps growing.
What’s the problem we’re trying to solve?
• Each event subscription has a corresponding Journal – and every event that we
receive needs to be written to the correct Journal.
• The order of events once written into the Journal cannot be changed. Newer events
can only be added to the end. A client application reads the events in this order.
• We have millions of events per hour (and counting) and over a few thousand
subscriptions – we also need to operate in near real time.
Approach-1: We stored everything in a Database
• We stored all the events in a document database (Azure Cosmos DB).
• We ordered the events by an auto-increment id.
• We partitioned the events’ data in the database by the subscription id.
I slept at night because we did not order the events by timestamps.
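As an illustration of that layout (field names are hypothetical, not the actual Adobe schema), each event became one document in the subscription's partition, ordered by an auto-increment id:

```python
# Hypothetical shape of one event document in Approach-1.
event_document = {
    "id": 42,                     # auto-increment id: defines the read order within a journal
    "subscriptionId": "sub-123",  # partition key: every client read hits this one logical partition
    "payload": {                  # event body, anywhere from a few KB to hundreds of KB
        "type": "asset.created",
        "data": "...",
    },
}
```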
Approach-1: FAILED
• Reading the events was very expensive. It required database lookups, ordering and
data transfer of a few hundred KBs per request.
• Our clients could not read events at scale - we were not able to handle more than
a dozen clients reading concurrently.
• Incident: A single client brought us down in a day by publishing large events
frequently enough. (We hit Cosmos DB’s 10 GB partition limit.)
Approach-1: Lessons Learnt
• Partition Policy: Do not partition data by something that a client has control over.
(Example: user_id, client_id, etc.)
• Databases were not the right fit for storing messages ranging from a few KBs to
hundreds of KBs.
• RTFM: Cosmos DB's pricing is based on reserved capacity in terms of a strange unit - "RU/s". There was no good way to tell how much we'd need.
Approach-2: We used a mix of Database & Blob Storage
• We moved just the event payloads out to blob storage (AWS S3) to quickly rectify the partition limit problem.
• We also batched the event writes to reduce our storage costs.
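A rough sketch of what the Approach-2 write path looks like under these assumptions (bucket and key names are illustrative): the payload batch goes to S3 as one object, and the database keeps only a small ordering record pointing at it.

```python
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "event-payloads"  # illustrative bucket name

def write_batch(subscription_id: str, events: list) -> str:
    """Store a batch of payloads as one S3 object and return its key.
    The database (still the system of record for ordering, as in Approach-1)
    then stores only this key plus a little ordering metadata, not the payloads."""
    key = f"{subscription_id}/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(events))
    return key
```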
Approach-2: Failed
• Reading events was still very expensive. It required database lookups, ordering and
then up to a hundred S3 GETs per request.
• Due to batching, writing events was actually cheaper than reading them.
• Our clients still could not read events at scale. Even after storing the event payloads separately, the database continued to be the chokepoint.
Approach-2: Lessons Learnt
• Databases were not the right fit - we never updated any records and only deleted
them based on a TTL policy.
• Because we used a managed database service, we were able to scale event reads
for the time being by throwing money at the problem.
• Batching needed to be optimized for reading and not writing. The system writes
once, clients read at least once.
Searching for a Solution: What do we know?
• A database should not be involved in the critical path of reading events, because a single partition will not scale and there is no partitioning policy that can distribute the load effectively.
• Blob storage (AWS S3) is ideal for storing event payloads, but it is costly. So, we will
have to batch the events to save on costs, but we should batch them in a way that
we optimize reading costs.
Searching for a Solution: Distributed Linked List
• Let’s build a linked list of S3 objects. Each S3 object not only contains a batch of
events but also contains information to find the next S3 object on the list.
• Because each S3 object knows where the next one is, we eliminate the database from the critical path of reading events, and reading can then potentially scale as much as S3 itself (see the read-loop sketch below).
• Because we will batch to optimize reading costs, the cost of reading events could be
as low as S3 GET requests ($4 per 10 million requests).
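Reading then becomes a chain of S3 GETs with no database in the path. A minimal sketch of such a read loop, assuming each object stores its batch under an "events" field and the next object's key under "next" (these field names are assumptions, not the production format):

```python
import json
from typing import Iterator, Optional

import boto3

s3 = boto3.client("s3")

def read_events(bucket: str, start_key: str, max_objects: int = 10) -> Iterator[dict]:
    """Follow the linked list of S3 objects, yielding events batch by batch."""
    key: Optional[str] = start_key
    for _ in range(max_objects):          # cap the number of GETs made per call
        if key is None:                   # reached the current tail of the journal
            return
        obj = s3.get_object(Bucket=bucket, Key=key)
        node = json.loads(obj["Body"].read())
        yield from node["events"]         # the batch stored in this node
        key = node.get("next")            # location of the next node, written in advance
```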
Distributed Linked List – The Hard Problem
• From a system architecture standpoint, we needed multiple compute instances to
process the incoming stream of events - for both resiliency and to write events at
scale.
• However, we had a hard problem on our hands: we needed to order the writes coming from multiple compute instances – otherwise we'd risk corrupting the data.
Well, I thought, how hard could it be? We could always acquire a lock...
Distributed Linked List – Distributed Locking
• To order the writes from multiple compute instances, we used a distributed lock on
Redis. A compute instance would acquire a lock to insert into the list.
• It turned out that distributed locks could not always guarantee mutual exclusion. If the lock failed, the results would have been catastrophic, and hence we could not take that chance, however small.
• To rectify this, we explored ZooKeeper and sought help to find a lock that could work for us – in vain.
A good read on distributed locks by Martin Kleppmann - https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
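For context, the lock we relied on was along the lines of the standard single-node Redis pattern sketched below (illustrative, not our production code). Kleppmann's post linked above explains why even this, with an expiry, cannot guarantee mutual exclusion once process pauses, delays, and failovers enter the picture.

```python
import uuid

import redis

r = redis.Redis()

# Compare-and-delete so we never release a lock that expired and was re-acquired by someone else.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def try_acquire(lock_name: str, ttl_ms: int = 5000):
    """Best-effort lock: SET key token NX PX ttl. Returns the token if acquired, else None."""
    token = str(uuid.uuid4())
    return token if r.set(lock_name, token, nx=True, px=ttl_ms) else None

def release(lock_name: str, token: str) -> None:
    r.eval(RELEASE_SCRIPT, 1, lock_name, token)
```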
Distributed Linked List – ZooKeeper
No major cloud provider offers a managed ZooKeeper service. If you want us to run ZooKeeper ourselves, we will need another full-time DevOps engineer.
Sathyajith Bhat
(Senior DevOps Engineer)
Distributed Linked List – Say No to Distributed Locks
The difference between ‘a distributed lock’ and ‘a distributed lock that
works in practice and is on the critical path for high throughput writes
and magically is not the bottleneck’ is ~50 person years. [sic]
Michael Marth
(Director, Engineering)
Distributed Linked List – A Lock Free Solution
• In the first step, we batch and store the event payloads in blob storage (S3) and then order the different batches using an auto-increment id in a database (MySQL). This step can be performed independently by each compute instance.
• In the second step, we transfer the database order to blob storage. The algorithm to do so is idempotent and hence works lock-free with multiple compute instances - we even proved the algorithm's correctness mathematically. (A sketch of the overall shape follows below.)
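The following sketch shows only the shape of those two steps, not the proven production algorithm; the table, bucket, and key names are assumptions. The important property is that step two is fully determined by the database rows, so running it repeatedly, or concurrently from several instances, writes identical objects.

```python
import json
import uuid

import boto3
import pymysql  # assumed MySQL driver; any DB-API driver works the same way

s3 = boto3.client("s3")
db = pymysql.connect(host="mysql-host", user="app", password="...", database="journal")  # placeholders
BUCKET = "journal"  # illustrative bucket name

def pointer_key(batch_id: int) -> str:
    """Deterministic key for the node object that carries batch_id's next-pointer."""
    return f"nodes/{batch_id:012d}.json"

def step_one_store_batch(events: list) -> int:
    """Run independently by every compute instance: store the payload batch in S3,
    then claim a position in the global order via MySQL's auto-increment id."""
    payload_key = f"payloads/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=payload_key, Body=json.dumps(events))
    with db.cursor() as cur:
        cur.execute("INSERT INTO batches (s3_key) VALUES (%s)", (payload_key,))
        db.commit()
        return cur.lastrowid

def step_two_transfer_order() -> None:
    """Idempotent: any instance may run this concurrently. It maps the database order
    onto next-pointers in S3; overwriting an object with identical content is harmless,
    which is what makes the step safe without locks."""
    with db.cursor() as cur:
        cur.execute("SELECT id, s3_key FROM batches ORDER BY id")
        rows = cur.fetchall()
    for (prev_id, prev_key), (next_id, _) in zip(rows, rows[1:]):
        node = {"batch": prev_key, "next": pointer_key(next_id)}
        s3.put_object(Bucket=BUCKET, Key=pointer_key(prev_id), Body=json.dumps(node))
```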
Inspiring Architecture Design – AWS SQS
• SQS ensures that all messages are processed by means of client acknowledgement. This allowed us to stop worrying about losing data, and we based our internal event stream on an SQS-like queue.
• SQS also helps distribute load amongst the workers, which came in really handy for scaling in and out depending on the traffic (a consume-and-acknowledge sketch follows below).
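A minimal sketch of that consume-and-acknowledge pattern with SQS (the queue URL and handler are placeholders): a message is deleted only after it has been processed successfully, so a crashed worker simply lets the message become visible again for another worker.

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/events"  # placeholder

def consume_forever(handle_message) -> None:
    """Long-poll, process, then acknowledge by deleting the message."""
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            handle_message(msg["Body"])
            # Acknowledge: if we crash before this line, the message reappears for another worker.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```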
Inspiring Algorithm Design – AWS RDS MySQL
• We love managed services and RDS was a perfect choice for us – guaranteeing both
durability and performance.
• We use an auto-increment id to order writes. Even though MySQL's auto-increment id is very performant, we still reduced the number of times we used it - by inserting one row per batch of events rather than one row per event.
• We also made sure to delete older rows that have already been processed, so that any internal locking MySQL performs has fewer rows to contend over (see the SQL sketch below).
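In SQL terms, the idea is roughly the following (table and column names are illustrative, not Adobe's schema): one auto-increment row per batch, and periodic deletion of rows that have already been processed.

```python
# Illustrative DDL/DML used as strings from the application code.
CREATE_TABLE = """
CREATE TABLE batches (
    id      BIGINT AUTO_INCREMENT PRIMARY KEY,  -- one id per *batch*, not per event
    s3_key  VARCHAR(255) NOT NULL
)
"""

INSERT_BATCH = "INSERT INTO batches (s3_key) VALUES (%s)"   # one row per batch of events

# Keep the table small so MySQL's internal locking has fewer rows to contend over.
DELETE_PROCESSED = "DELETE FROM batches WHERE id <= %s"     # %s = highest id already transferred
```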
Inspiring Algorithm Design – AWS S3
• An S3 object cannot be partially updated or appended to.
• Hence, we could not construct a linked list in the traditional sense, i.e., update the
pointer to the next node.
• Instead, we had to know the location of the next S3 object in advance.
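Since a finished S3 object can never be patched with a pointer afterwards, the next node's location has to be decided before the current node is written. A tiny sketch of one way to satisfy that constraint (field and key names are assumptions):

```python
import json
import uuid

import boto3

s3 = boto3.client("s3")

def write_node(bucket: str, key: str, events: list) -> str:
    """Write one immutable node whose 'next' key is chosen up front, and return that key
    so the following node can later be written at exactly that location."""
    next_key = f"nodes/{uuid.uuid4()}.json"   # decided now, before this object is sealed
    body = json.dumps({"events": events, "next": next_key})
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return next_key
```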
Inspiring API Design – AWS S3
• Not only does AWS S3 charge by the number of requests, but the performance limits on a partition are also defined in terms of requests per second.
• Originally, a client could specify in an API call the exact number of events it wanted to consume. However, this simple feature took away our control over the number of S3 GET requests we in turn had to make.
• We changed the API definition so that a client can now only specify an upper limit on the number of events, and the API is allowed to return fewer events than that limit (see the sketch below).
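A sketch of what that contract change looks like in a handler (the journal helper and names are illustrative assumptions): the client's number is treated as an upper bound, and the response carries however many whole batches a bounded number of S3 GETs produced, plus a cursor for the rest.

```python
def get_events(journal, cursor: str, limit: int) -> dict:
    """Return at most `limit` events while never exceeding a fixed S3 GET budget.
    `journal.read_objects` is an assumed helper yielding (batch, next_cursor) per S3 object."""
    MAX_GETS = 10                       # cost/performance budget per API call
    events, next_cursor = [], cursor
    for batch, after in journal.read_objects(cursor, max_objects=MAX_GETS):
        if len(events) + len(batch) > limit:
            break                       # returning fewer events than `limit` is allowed
        events.extend(batch)
        next_cursor = after
    return {"events": events, "next": next_cursor}
```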
Inspiring Architecture Design – AWS S3
• AWS S3 partitions data using object key prefixes, and the performance limits on S3 are also defined per partition.
• Because we wanted to maximize performance, we spread out our object keys by using completely random GUIDs as the keys.
• For the same reason, we also added the capability to add more S3 buckets at runtime so the system can start distributing load across them if needed.
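A small sketch of how such a key scheme can look (the bucket list and helper are illustrative): fully random GUID keys spread objects evenly across S3's key-space partitions, and the set of buckets can be extended at runtime.

```python
import random
import uuid

# Buckets can be added at runtime (e.g. via configuration) to spread load further.
BUCKETS = ["journal-a", "journal-b"]   # illustrative names

def new_object_location() -> tuple[str, str]:
    """Pick a bucket and a completely random key; with no meaningful prefix, writes and
    reads spread evenly across S3's internal partitions."""
    return random.choice(BUCKETS), str(uuid.uuid4())
```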
Results – Performance
Average time taken to write a batch of events: ~2 seconds
(Batching 60%, Time Delay 30%, S3 Writes 8%, Database 2%)
Average time taken to read a batch of events: ~0.06 seconds
(S3 Reads 67%, Auth 33%)
Results – Costs
11.11
4.50
10.80
26.41
0.72
7.20
0.29
8.21
0.00
5.00
10.00
15.00
20.00
25.00
30.00
DB + Redis S3 Writes S3 Reads Total
Cost of ingesting 1M events/min for 100 clients. ($/hour)
Approach-2 Now
Conclusion: What did we learn?
• Always understand the problem you’re trying to solve. Find out how the system is
intended to be used and design accordingly. Also, find out the aspects in which the
system is allowed to be less than perfect – for us it was being near real time.
• Do not think of a system architecture first and then try to fit various services into it later; rather, design the architecture around the strengths and limits of the underlying components you use.
• Listen to experienced folks to cut losses early. (Distributed Locks, ZooKeeper)
Find me everywhere as
@theManikJindal
Please spare two minutes for feedback
https://tinyurl.com/tmj-dll
