The topics we will be covering in this talk:
1. Introduction - Brief business context to appreciate the need to solve this problem, and the challenges involved.
2. The factors driving the decision to choose Cosmos DB as our backend store.
3. Key insights into what drives the cost of the store, and various gotchas involved when designing such a system.
4. How to optimize the cost and bring intelligence to enable auto-scalability.
5. The need for building multi-version concurrency control, and how to achieve it to enable parallel writes with multiple schema versions for the same record.
6. The tradeoff between readability and storage cost, and how to get the best of both worlds by using in-flight abbreviated compression.
8. On Prem vs Cloud Native
● Theoretical caps on scale vs. no such caps
● Scalable in steps vs. granular scalability
● Cost of maintenance and upgrades (man hours and infra) vs. amortized cost
● In-house skill set vs. vendor customer care
● Custom buildable vs. not as much flexibility
9. Rationale
● Cloud native
○ To mitigate the limits of scaling business
● InMobi - Microsoft Partnership
● Top contenders
○ Cosmos DB, Aerospike
10. Cosmos DB
● Cloud native alternative to Cassandra/Aerospike/MongoDB
● Azure equivalent of AWS DynamoDB, with a few extra bells and whistles
● Document store supporting point lookups and queries (sketched below)
○ Fetch a document given its unique ID
○ SQL-type queries spanning across documents
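For concreteness, the two access patterns look roughly like this with the azure-cosmos Python SDK; the account, database, and container names below are placeholders, not our production setup:

```python
from azure.cosmos import CosmosClient

# Placeholder endpoint and key.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("profiles").get_container_client("users")

# Point lookup: fetch a document given its unique ID (and partition key).
doc = container.read_item(item="user-123", partition_key="user-123")

# SQL-type query spanning documents in the container.
results = list(container.query_items(
    query="SELECT * FROM c WHERE c.country = @country",
    parameters=[{"name": "@country", "value": "IN"}],
    enable_cross_partition_query=True,
))
```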
17. Data Splits
● Extending the observations on the cost of reads and writes, data can be split in unique ways to enable various use cases, based on access patterns
● User timelines
○ Date boundaries
○ Container Splitting
● User profiles
○ Apps owned
18. Data modelling
● The InMobi team determines the primary and secondary key, based on the read-write pattern and distribution of records.
● Apart from the read-write pattern, we also consider
○ immutability of records,
○ record size,
○ item-level TTL.
19. Data modelling
Useful models at InMobi:
● User level aggregates - for aggregate profiles
● Time based partitioning - for immutable timelines
● Top-level keys per feature in the document, which give flexibility for design changes (see the sketch below)
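A sketch of what such a document might look like. The field names are illustrative; "ttl" is Cosmos DB's item-level TTL property (in seconds), which takes effect only when TTL is enabled on the container:

```python
# Illustrative user-level aggregate document with top-level keys per feature.
profile = {
    "id": "user-123",
    "partitionKey": "user-123",
    # Separate top-level keys per feature leave room for later design changes:
    "appsOwned": ["com.example.game", "com.example.news"],
    "aggregates": {"clicks7d": 42, "impressions7d": 810},
    "ttl": 30 * 24 * 3600,  # item-level TTL: expire this record after 30 days
}
```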
25. RU = f(docSize, partitions)
● Read RUs are directly proportional to document size, regardless of increase in partition count and collection size
● RUs consumed for fetching a non-existent document are constant, regardless of increase in partition count and collection size
● As the collection size grows, though query costs remain constant, the minimum provisioning keeps growing
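These costs can be observed empirically: Cosmos DB returns the request charge on every response. A minimal sketch with the Python SDK, continuing the placeholder `container` from the earlier snippet and assuming the SDK's `last_response_headers` pattern:

```python
# Read a document, then inspect the RU charge the service reported for it.
container.read_item(item="user-123", partition_key="user-123")
charge = container.client_connection.last_response_headers["x-ms-request-charge"]
print(f"Point read cost: {charge} RU")  # grows with doc size, not collection size
```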
29. Cost of Degradation
“HTTP Status Code 429: The user has sent too many requests in a given amount of time (rate limiting).”
30. Cost of Degradation
If provisioned for 100 read calls/sec (assume 100 RU) and bombarded with, say,
1000 read calls/sec, we will encounter HTTP 429 errors.
The observed behaviour is that fewer than 100 calls succeed, because the failed
calls also consume resources.
31. Cost of Degradation
● Recommendation: honor the backoff (see the sketch below)
● Corner case: in serving systems, such a backoff cannot be honored, leaving scaling out as the only solution.
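Where the backoff can be honored, the 429 response tells clients how long to wait. A minimal sketch of an explicit retry loop with the Python SDK (the SDK can also retry 429s automatically; this shows the mechanics):

```python
import time

from azure.cosmos import exceptions

def read_with_backoff(container, doc_id, pk, max_attempts=5):
    """Point read that honors the server's backoff hint on HTTP 429."""
    for attempt in range(max_attempts):
        try:
            return container.read_item(item=doc_id, partition_key=pk)
        except exceptions.CosmosHttpResponseError as e:
            if e.status_code != 429 or attempt == max_attempts - 1:
                raise
            # The 429 response carries the suggested wait before retrying.
            wait_ms = int(e.headers.get("x-ms-retry-after-ms", 1000))
            time.sleep(wait_ms / 1000.0)
```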
42. Summary
● On Prem vs Cloud Native
● Levers to optimize cost and performance
○ Size of documents
○ Data models to enable document splitting
○ Autoscaler
○ Data compression
Future looking
● Partial document updates
● AI to tune the autoscaler and handle burst traffic
● Enable multi-region writes
43. We would love to learn about your typical use cases of Cosmos DB.
How do you approach costs in your system?
The major business for InMobi is mobile advertising: showing the right advertisement to the right user at the right opportunity.
The flow of information is from right to left, and the flow of money is from left to right.
Publishers are the apps a user is engaging with.
An SSP (supply-side platform) aggregates ad requests from various apps.
Exchanges aggregate requests from various SSPs.
A DSP (demand-side platform) listens to multiple exchanges and, using intelligence powered by its DMP (data management platform), responds with the best possible ad for that user at that point in time.
The overall latency of this system is typically within 50ms.
Consequently, the latency budget for the DMP is under 10ms.
The rate of ad requests received directly impacts the money spent and varies hugely based on the time of day, the day of the week, and various other seasonal and geographical factors, e.g. the Cricket World Cup causes an increase in content consumption by users.
The number of ads served drives the revenue generated and is dependent on the advertising funds available and not on the number of requests received.
InMobi operates at a global scale, so it needs to serve ads across various geographies while adhering to latency requirements. A huge global presence implies that traffic patterns vary widely across geographies, e.g. the US and India differ hugely in peak traffic times, seasonal behaviours like Diwali or Christmas, types of apps used, etc.
An ad request from a user can land on any of InMobi's geo-redundant serving systems. We need the user's information to be consistently available across all serving regions, to serve ads uniformly. Global consistency also enables easy fallbacks and load balancing, and accounts for the fact that users travel. This could be achieved with either a master-master or a master-slave setup; either way, a globally consistent view of a user's information is critical.
Upgrades - security, performance, features, etc.
A common scenario: someone sets up a system and is no longer working on it six months down the line, when it breaks and adds to the pain of maintenance.
Why Cosmos DB?
The story behind this choice:
Redis and Cassandra vs Aerospike - Aerospike won on reliability, scalability, etc.
In-house expertise with Aerospike.
Cosmos DB vs DynamoDB - due to the partnership, it makes more financial sense to choose Cosmos DB.
Note that this is not an exhaustive list of Cosmos DB's capabilities, but the major ones for our use cases.
The first question we ask ourselves is where and how the data is stored.
The logical next step is: what is the cost of this system, and how is it measured?
Introduce Request Units (RUs): a single number denoting the CPU/memory/IO utilization on Cosmos DB.
Measured per read/write/query, etc.
Now that we know about documents and RUs, how do they relate?
Did you observe the drastic changes in RU utilization at 16, 32, and 64 KB?
A 32 KB document split into two 16 KB documents - if every write on the big document translates to two small writes, the total cost does not change.
A 24 KB document split into a 16 KB + 8 KB pair - if every write on the big document translates to two writes on the smaller documents, the total cost after the split is higher, since the cost of a 24 KB write equals the cost of a 16 KB write.
If each write on a big document translates to only one write on a smaller document, then splitting a 32 KB document halves your write cost straight away.
Whereas for a 24 KB document, the cost benefit is not as significant even when doing a single write on one of the smaller documents.
In contrast, a case where we see significant benefit is when working with large documents: reading multiple split documents is cheaper than reading one big document under certain conditions. If a 40 KB document is split into 4 x 10 KB documents, all with the same partition key, the total read and write RU falls significantly.
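The arithmetic behind the split decision can be sketched under the stepped-cost behaviour observed above; the RU numbers below are illustrative placeholders, not published Cosmos DB prices:

```python
# Hypothetical RU step function: the write charge is flat within a size band
# and jumps at the 16/32/64 KB boundaries, as observed above.
def write_cost_ru(size_kb):
    if size_kb < 16:
        return 5
    if size_kb < 32:
        return 10
    if size_kb < 64:
        return 20
    return 40

# 32 KB split into 2 x 16 KB: unchanged if every write touches both halves,
# but a write touching only one half costs half as much.
assert write_cost_ru(32) == 2 * write_cost_ru(16)
# 24 KB split into 16 KB + 8 KB: cost(24) == cost(16), so writing both
# fragments costs more than the single unsplit write.
assert write_cost_ru(24) == write_cost_ru(16)
assert write_cost_ru(16) + write_cost_ru(8) > write_cost_ru(24)
```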
So the main takeaway is: keep document sizes small.
This may result in bigger collections, which can in turn be split into smaller collections.
Indexing is required to enable _ttl and queries.
With full indexing enabled, it is possible to run Spark jobs using Cosmos DB as the backend DB.
Indexing increases write costs, while read costs remain stable.
Avoid indexing indiscriminately, unless a cost-benefit analysis has been done on running queries against Cosmos DB.
Alternatives include running batch jobs on dumps of the data.
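A sketch of an opt-in indexing policy along these lines: index only the paths that known queries filter on, exclude the rest to keep write costs down, and enable item-level TTL. The account, container, and path names are illustrative:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("profiles")

indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/country/?"}],  # the one field our queries filter on
    "excludedPaths": [{"path": "/*"}],          # skip indexing everything else
}
database.create_container(
    id="users",
    partition_key=PartitionKey(path="/partitionKey"),
    indexing_policy=indexing_policy,
    default_ttl=-1,  # enable per-item TTL without a container-wide default
)
```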
Understanding costs is nice, but how does my system scale? What are the building blocks?
Horizontal scalability is achieved using partitioning and replication.
Given what we know about documents, cost, and scaling factors, how do these interplay?
The expected behaviour might be that, within a second, the first 100 read calls succeed and the remaining 900 fail. This would be the case if those 900 failed calls consumed no resources. But with "Too many requests" (HTTP error code 429), the failed read calls also consume a constant amount of resources, which eats into your 100 successful calls. So the behaviour is not 100 successes and 900 failures: all 1000 calls consume resources, a few succeed and the rest fail, implying fewer than 100 successful responses.
[Image: skewed degradation]
Cosmos DB supports scaling of provisioned RUs through the API and also via the portal.
Automating this scaling is a critical feature for optimizing cost.
Depending on the variation in the usage pattern of Cosmos DB RUs, the auto-scaling feature can make or break the whole system being built.
At InMobi, the traffic observed across days depends on various factors, as explored earlier; if we provisioned our systems for the maximum required, the Cosmos DB cost would become too large to build a sustainable business model.
A decrease in RU consumption on throttling - consumption drops because clients honour the backoff retry logic, yet this is exactly when we have to increase the RUs.
Handling hot partitions on throttling - contrary to the previous point, some hot partitions getting throttled might push the auto-scaler to increase RUs more than needed.
Interestingly, even though the overall provisioning is sufficient, we will see a few 429s in the system. This might seem similar to situation #2 at first glance, but simply increasing the RUs in such a scenario massively overprovisions the system, bloating the cost significantly.
Before increasing RUs, the algorithm needs to be cognizant of request skewness.
Cost model of Cosmos DB for RU consumption: as mentioned earlier, RUs are charged per hour at the maximum RU allocated during that hour. Do we really need to decrease the RUs if we are anyway paying for the maximum allocated during the hourly interval?
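Putting these caveats together, the autoscaler's decision step might look like the following sketch. The inputs, thresholds, and scaling factors are hypothetical, and the resulting value would be applied via the scaling API:

```python
MIN_RUS = 400  # Cosmos DB's floor for provisioned throughput

def next_provisioned_rus(current_rus, consumed_rus, saw_429s, skew_detected):
    """One decision step of a hypothetical RU autoscaler."""
    if saw_429s:
        if skew_detected:
            # Hot partition: raising total RUs over-provisions everything else;
            # the fix is in the partition key, not the throughput.
            return current_rus
        # Backoff suppresses measured consumption during throttling, so scale
        # up on the 429 signal itself rather than on consumed RUs.
        return int(current_rus * 1.5)
    if consumed_rus < 0.5 * current_rus:
        # Billing is hourly at the maximum allocated, so a decrease only pays
        # off from the next hour onward; it is still worth requesting.
        return max(int(consumed_rus * 1.2), MIN_RUS)
    return current_rus
```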
When building a user profile, disparate information is ingested from various sources.
A summarization of this information is stored in Cosmos DB.
Information coming from various sources will not have the same schema.
User profiles also evolve over time.
Concurrent modification of records is prone to data loss due to race conditions.
We perform partial serialization and deserialization at the client end to handle corner cases.
Records are serialized in Cosmos DB as JSON objects.
During deserialization, the respective schema of the event is fetched from a schema registry.
The registry book-keeps the schema mapping (Avro schemas in our use case) for every user feature, and the respective record fragments (the fragment of the record with the user feature we want to update) are deserialized.
Any operations applicable to the fragment are performed before serializing back to JSON and writing to Cosmos DB.
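One way to guard this read-modify-write path against the race condition mentioned above is optimistic concurrency on the document's _etag. A minimal sketch with the Python SDK, where fetch_schema and apply_update are hypothetical stand-ins for the schema-registry lookup and the fragment merge, not necessarily InMobi's exact mechanism:

```python
from azure.core import MatchConditions

# Hypothetical stand-ins for the registry lookup and feature-level merge;
# the real versions resolve an Avro schema per user feature.
def fetch_schema(feature, version):
    return {"feature": feature, "version": version}

def apply_update(schema, fragment, event):
    merged = dict(fragment)
    merged.update(event)
    return merged

def update_fragment(container, doc_id, partition_key, feature, event):
    # Read the full JSON document, but deserialize/modify only one fragment.
    doc = container.read_item(item=doc_id, partition_key=partition_key)
    schema = fetch_schema(feature, doc[feature].get("schemaVersion"))
    doc[feature] = apply_update(schema, doc[feature], event)
    # Conditional replace guarded by the document's _etag: if a concurrent
    # writer changed the record, this fails (HTTP 412) instead of losing data.
    container.replace_item(
        item=doc["id"],
        body=doc,
        etag=doc["_etag"],
        match_condition=MatchConditions.IfNotModified,
    )
```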