Ambry
LinkedIn’s Scalable Geo-Distributed Object Store
Sivabalan Narayanan
LinkedIn
Agenda
• Media’s significance
• Nature of media infrastructure at LinkedIn before Ambry
• Motivation and design goals
• API
• Architecture and selected internals
• Evaluation
• How to get started
Text files, pictures, videos, virtual reality
Content ecosystem @ LinkedIn before Ambry
[Diagram: Clients 1 through N connect to several separate backends: Filers (media server), Google App Engine, Voldemort, Espresso, and a cache, with multiple media processing (resize) libraries (Lib 1, Lib 2, ...) in use.]
A fragmented ecosystem calls for a unified solution.
Why not File Systems, Key Value Stores?
• File systems (HDFS, Ceph, etc.)
  - Have extra capabilities that are not required for an object store
  - Overhead due to metadata lookups
• Key-value stores (Cassandra, DynamoDB, etc.)
  - Not built to support large objects; might copy data multiple times
  - Lack streaming and chunking
Ambry: A Scalable Geo-Distributed Object Store
Design principles
• Low latency and high throughput for a variety of immutable objects
• Unstructured content
• Geo-distribution
• Highly available
• Horizontally scalable
• Low operational overhead
• Active-active setup
• Simple design and ease of use
• Cost effective
Use cases
• Store a variety of objects (documents, slides, etc.)
• Store any kind of media file (pictures, sounds, videos)
• Store backups
• Any other use case that needs to store large content
• Range query support for storing videos
API
• POST
  - Uploads content to Ambry
  - Returns a handle (AmbryID) to the uploaded blob/content
• GET
  - Blob: fetches the content associated with the AmbryID
  - BlobInfo: fetches the properties and user metadata of the blob associated with the AmbryID
• DELETE
  - Deletes the content associated with the AmbryID
ARCHITECTURE
Architecture
[Diagram. Labels: two datacenters, DC1 and DC2, each showing Client, CDN, Frontend layer (frontend nodes), Data layer (data nodes), and ClusterManager; connections into the frontend layer are labeled http; a Replication label appears alongside the two datacenters.]
Frontend
• HTTP
• Stateless
• Pluggable validation and authentication service
• Pluggable change capture
• Router to route requests
• Supports streaming for large blobs
• Service or embedded library
Router Library
• Proxy requests
  - For stronger consistency
• Zero-cost failure detection
  - Avoid down resources
Data Layer
• Partitions, replicas
• Log structured
• Asynchronous replication for remote replicas

Log (0 to 100 GB), entries in append order:
  offset 640: blob id 50
  offset 700: blob id 30 (start offset in current index segment)
  offset 770: blob id 70
  offset 850: blob id 40
  offset 900: log end offset

Index segment (sorted on blob id):
  blob id   offset   flags   TTL
  id 30     700      -       ∞
  id 40     850      -       1/1/17
  id 70     770      -       ∞
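The log-plus-index layout above can be sketched in a few lines of Python. This is a minimal illustration, not Ambry's implementation: the names `BlobStore`, `IndexEntry`, and `segment_view` are hypothetical, the `deleted` field stands in for the slide's "flags" column, and marking a delete only in the index is an assumption made to keep the sketch short.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexEntry:
    offset: int                    # where the blob starts in the log
    size: int                      # blob length in bytes
    expires_at: float = math.inf   # TTL column; inf means "never expires"
    deleted: bool = False          # hypothetical stand-in for the "flags" column


class BlobStore:
    """Toy append-only log with an index keyed by blob id."""

    def __init__(self):
        self.log = bytearray()     # blobs are only ever appended here
        self.index = {}            # blob id -> IndexEntry

    def put(self, blob_id, data, expires_at=math.inf):
        offset = len(self.log)     # a write always lands at the log end offset
        self.log += data
        self.index[blob_id] = IndexEntry(offset, len(data), expires_at)
        return offset

    def get(self, blob_id, now):
        entry = self.index.get(blob_id)
        if entry is None or entry.deleted or now >= entry.expires_at:
            return None
        return bytes(self.log[entry.offset:entry.offset + entry.size])

    def delete(self, blob_id):
        # Assumption: only the index entry is flagged; blob bytes are not moved.
        self.index[blob_id].deleted = True

    def segment_view(self):
        # Entries sorted on blob id, as in the index segment on the slide.
        return sorted(self.index.items(), key=lambda kv: kv[0])


store = BlobStore()
store.put(30, b"a" * 70)
store.put(70, b"b" * 80)
store.put(40, b"c" * 50, expires_at=100.0)
```

A write touches only the end of the log and one index entry, which is the property the "O(1) I/O for writes" bullet refers to; reads go through the index to a single offset.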
Data Layer: Key Optimizations
• O(1) I/O for writes
• Avoid unnecessary movement of actual data
• Bloom filter for index segments
• Rely on OS page cache
• Zero copy for gets
EVALUATION
Evaluation - setup
• Small cluster
  - A single beefy Datanode
    • 24-core CPU, 64 GB of RAM, 14 × 1 TB HDDs, and a full-duplex 1 Gb/s Ethernet network
  - Workload: read-only, 50-50 read-write, and write-only
    • Fixed-size objects in each test
    • Objects read at random (worse than real-world workloads, which have a skewed distribution)
Throughput
Large objects:
• All cases saturate the network.
• Read-write saturates both the inbound and outbound links (reaching 2x the network bandwidth).
Small objects:
• Write saturates the network.
• Throughput in read and read-write drops linearly because of frequent disk seeks.
• For 50 KB objects, more than 94% of latency is due to disk seeks.
Latency
Large objects:
• Latency scales proportionally with object size.
Small objects:
• Write latency scales proportionally with object size.
• Read latency is dominated by disk seeks (almost constant for all sizes).
Road Map
• Quota management
• Authentication and authorization
• Erasure coding
• De-duplication
How to get started!
• https://github.com/linkedin/ambry
• Quick start
• Mail ambrydev@googlegroups.com
• LinkedIn Blog
• SIGMOD 2016 paper
• Apache 2.0 licensed
• Contributions are welcome and encouraged!
Questions?
https://github.com/linkedin/ambry
Backup slides
Challenges
• Huge diversity: 10s of KBs to a few GBs
• Fast, durable, and highly available
• Geo-replication with low latency
• Ever-growing data and requests
  - > 800 M req/day (~120 TB)
  - Rate doubled in 12 months
• Uploaded once, never modified, rarely deleted
• Most recent uploads are accessed often
API
• BlobProperties
  - Size, TTL, creation time, content type
• UserMetadata
  - List of <attribute, value>
• AmbryBlobOutput
  - InputStream, size, last modified time
API
• POST /
  Request Header         Type     Description
  x-ambry-blob-size      Long     The size of the blob
  x-ambry-service-id     String   The ID of the service that is uploading the blob
  x-ambry-content-type   String   The type of content in the blob
  x-ambry-ttl            Long     The time in seconds for which the blob is valid. Defaults to -1
  x-ambry-owner-id       String   The owner of the blob
  x-ambry-um-            String   User metadata headers prefix
Returns the handle (blob ID) to the object uploaded.
API
• GET /<ambry-id>/<sub-resource>
• Sub-resources: BlobInfo, UserMetadata
  Request Header   Type     Description
  ambry-id         String   The ID of the blob whose content is requested
  sub-resource     String   One of the listed sub-resources
Returns the content of the blob or the requested sub-resource.
API
• DELETE /<ambry-id>
  Request Header   Type     Description
  ambry-id         String   The ID of the blob to be deleted
Returns a successful response on deletion.
• HEAD /<ambry-id>
  Request Header   Type     Description
  ambry-id         String   The ID of the blob whose properties are requested
Returns the blob properties of the blob as response headers.
API - Usage
• POST
curl -i -H "x-ambry-blob-size: 1000" \
     -H "x-ambry-service-id: CurlUpload" \
     -H "x-ambry-content-type: image/gif" \
     -H "x-ambry-um-description: Demonstration Image" \
     http://localhost:1174/ --data-binary @demo.gif
HTTP/1.1 201 Created
Location: AmbryID
Content-Length: 0
API - Usage
• GET - BlobInfo
curl -i http://localhost:1174/AmbryID/BlobInfo
HTTP/1.1 200 OK
x-ambry-blob-size: {Blob size}
x-ambry-service-id: CurlUpload
x-ambry-creation-time: {Creation time}
x-ambry-content-type: image/gif
x-ambry-um-description: Demonstration Image
Content-Length: 1000
API - Usage
• GET - Blob
curl http://localhost:1174/AmbryID > OutImage.gif
• DELETE
curl -i -X DELETE http://localhost:1174/AmbryID
HTTP/1.1 202 Accepted
Content-Length: 0
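The curl calls above can also be expressed with Python's standard library. The sketch below only builds the requests; it does not contact a server. The header names and the `http://localhost:1174` address come from the slides, while the helper names (`build_post`, `build_get`, `build_delete`) are hypothetical and exist only for this illustration.

```python
import urllib.request

BASE_URL = "http://localhost:1174"  # frontend address used in the curl examples


def build_post(data, service_id, content_type, description):
    """Build the upload request; headers mirror the POST table in the slides."""
    headers = {
        "x-ambry-blob-size": str(len(data)),
        "x-ambry-service-id": service_id,
        "x-ambry-content-type": content_type,
        "x-ambry-um-description": description,  # user metadata uses the x-ambry-um- prefix
    }
    return urllib.request.Request(BASE_URL + "/", data=data, headers=headers, method="POST")


def build_get(ambry_id, sub_resource=None):
    """Build a GET for the blob itself or for a sub-resource such as BlobInfo."""
    path = "/" + ambry_id
    if sub_resource:
        path += "/" + sub_resource
    return urllib.request.Request(BASE_URL + path, method="GET")


def build_delete(ambry_id):
    """Build the DELETE request for a blob."""
    return urllib.request.Request(BASE_URL + "/" + ambry_id, method="DELETE")


post_req = build_post(b"\x00" * 1000, "CurlUpload", "image/gif", "Demonstration Image")
info_req = build_get("AmbryID", "BlobInfo")
delete_req = build_delete("AmbryID")
```

Against a running frontend, each request could be sent with `urllib.request.urlopen(req)`; per the POST example above, the AmbryID would then be read from the `Location` header of the 201 response.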
Cluster Manager

Hardware Layout
  Node          Disk     Size   State
  DC1: Node_1   Disk_0   4 TB   Up
                Disk_1   4 TB   Up
                …        …      …
                Disk_n   4 TB   Up
  DC1: Node_2   Disk_0   4 TB   Up
                Disk_1   4 TB   Down
                …        …      …
                Disk_p   4 TB   Up
  …             …        …      …
  DC2: Node_i   Disk_0   4 TB   Up
                Disk_1   4 TB   Up
                …        …      …
                Disk_m   4 TB   Down

Partition Layout
  Partition_id   State        Replicas
  Partition_0    Read-Write   DC1:Node1:Disk_0, DC1:Node4:Disk_2, …, DC3:Node2:Disk_1
  Partition_1    Read-Only    DC2:Node0:Disk_0, DC1:Node4:Disk_3, …, DC2:Node2:Disk_2
  …              …            …
  Partition_k    Read-Write   DC1:Node1:Disk_1, DC2:Node1:Disk_2, …, DC3:Node0:Disk_0
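To make the two layouts concrete, the following Python sketch models them as plain data and answers two questions a router might ask: which partitions accept writes, and which replicas of a partition sit on disks that are up. All class and function names are hypothetical. Choosing a write target at random among Read-Write partitions is an assumption of this sketch, not something the slides state, and the disk marked Down is illustrative only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    datacenter: str
    node: str
    disk: str


@dataclass
class Partition:
    partition_id: int
    state: str        # "Read-Write" or "Read-Only", as in the partition layout
    replicas: tuple   # Replica placements for this partition


def writable(partitions):
    """Partitions that can accept new blobs."""
    return [p for p in partitions if p.state == "Read-Write"]


def up_replicas(partition, disk_state):
    """Replicas whose disk is reported Up in the hardware layout."""
    return [r for r in partition.replicas if disk_state.get(r) == "Up"]


# Placements taken from the partition layout table (elided rows omitted).
partitions = [
    Partition(0, "Read-Write", (Replica("DC1", "Node1", "Disk_0"),
                                Replica("DC1", "Node4", "Disk_2"),
                                Replica("DC3", "Node2", "Disk_1"))),
    Partition(1, "Read-Only", (Replica("DC2", "Node0", "Disk_0"),
                               Replica("DC1", "Node4", "Disk_3"),
                               Replica("DC2", "Node2", "Disk_2"))),
]

# Illustrative disk states: everything Up except one disk chosen for the example.
disk_state = {r: "Up" for p in partitions for r in p.replicas}
disk_state[Replica("DC1", "Node4", "Disk_2")] = "Down"

# Assumption: a put targets a randomly chosen Read-Write partition.
target = random.Random(0).choice(writable(partitions))
reachable = up_replicas(target, disk_state)
```

Keeping hardware state and partition placement as separate tables means a disk failure only changes one `disk_state` entry; the partition-to-replica mapping stays untouched.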
Zero-cost Failure Recovery
[Timeline diagram. State labels: available, temp down, wait period, temp available.]
• No extra messages: leverages regular request messages
• Effective, simple, and consumes very little bandwidth
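One way to read the state labels above is as a small state machine driven purely by the outcomes of ordinary requests. The sketch below is an interpretation under stated assumptions, not Ambry's code: the class name `FailureDetector`, the failure threshold, the wait period length, and the rule that a single failure while "temp available" sends the resource back to "temp down" are all hypothetical.

```python
class FailureDetector:
    """Tracks one resource (node or disk) using only request outcomes."""

    def __init__(self, max_failures=3, wait_period=30.0):
        self.max_failures = max_failures  # hypothetical threshold
        self.wait_period = wait_period    # hypothetical wait, in seconds
        self.failures = 0                 # consecutive failed requests
        self.down_since = None            # time the resource was marked temp down

    def state(self, now):
        if self.down_since is None:
            return "available"
        if now - self.down_since < self.wait_period:
            return "temp down"
        return "temp available"           # wait period over: allow a trial request

    def can_send(self, now):
        return self.state(now) != "temp down"

    def record(self, success, now):
        """Feed in the outcome of a regular request; no probe messages are sent."""
        current = self.state(now)
        if success:
            self.failures = 0
            self.down_since = None        # back to available
        elif current == "temp available":
            self.down_since = now         # trial failed: start a new wait period
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.down_since = now


detector = FailureDetector(max_failures=2, wait_period=10.0)
detector.record(False, now=1.0)
detector.record(False, now=2.0)           # second consecutive failure
state_after_failures = detector.state(now=2.0)
state_after_wait = detector.state(now=12.0)
```

Because the detector is fed by requests the router was going to send anyway, it adds no network traffic of its own, which matches the "no extra messages" point above.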
Replication Protocol

Replica R1 of Partition P1 (journal): 640 b1, 700 b5, 770 b6, 850 b4, 920 b3
Replica R2 of Partition P1 (journal): 600 b5, 690 b1, 750 b2, 810 b4, 880 b7

1. R1 to R2: Get blob IDs since 690 (last known offset of R2 = 690)
2. R2 to R1: Blob info {b1, b2, b4, b7} (last known offset of R2 becomes 880)
3. R1: Filter missing blob IDs
4. R1 to R2: Get blobs {b2, b7}
5. R2 to R1: Blob content {b2, b7}
6. R1: Append blobs to local replica (1050 b2, 1110 b7)
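The six steps of the replication protocol can be replayed with the journal values from the slide. This is a minimal sketch, not Ambry's implementation: the `Replica` class and `pull_once` function are hypothetical names, blob contents are placeholder bytes, "since 690" is treated as inclusive because the slide's response includes b1 at offset 690, and b2 is given a length of 60 bytes so that the appended offsets match the slide.

```python
class Replica:
    def __init__(self, journal, blobs, end_offset):
        self.journal = list(journal)   # (offset, blob_id) pairs in append order
        self.blobs = dict(blobs)       # blob_id -> content
        self.end_offset = end_offset   # where the next append will land

    def blob_ids_since(self, offset):
        # Steps 1-2: journal entries at or after the requested offset.
        return [(off, bid) for off, bid in self.journal if off >= offset]

    def get_blobs(self, blob_ids):
        # Steps 4-5: return content for the requested blobs.
        return {bid: self.blobs[bid] for bid in blob_ids}

    def append(self, blob_id, data):
        # Step 6: append to the local log and journal.
        self.journal.append((self.end_offset, blob_id))
        self.blobs[blob_id] = data
        self.end_offset += len(data)


def pull_once(local, remote, last_known_offset):
    """One pull-based replication round; returns the new last known offset."""
    entries = remote.blob_ids_since(last_known_offset)
    missing = [bid for _, bid in entries if bid not in local.blobs]  # step 3
    fetched = remote.get_blobs(missing)
    for bid in missing:
        local.append(bid, fetched[bid])
    return max((off for off, _ in entries), default=last_known_offset)


# Journals from the slide; contents are placeholders.
r1 = Replica([(640, "b1"), (700, "b5"), (770, "b6"), (850, "b4"), (920, "b3")],
             {bid: b"" for bid in ("b1", "b5", "b6", "b4", "b3")},
             end_offset=1050)
r2 = Replica([(600, "b5"), (690, "b1"), (750, "b2"), (810, "b4"), (880, "b7")],
             {"b5": b"", "b1": b"", "b2": bytes(60), "b4": b"", "b7": bytes(40)},
             end_offset=920)

new_offset = pull_once(r1, r2, last_known_offset=690)
```

Only blob IDs cross the network in the first exchange, so blobs that both replicas already hold (b1 and b4 here) are never transferred again.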
Replication
• Multi-master replication
• Asynchronous
• Pull based
• Optimizations
  - Batching requests
  - Inter- and intra-colo thread pools
  - Prioritization for lagging replicas (disk repairs)
  - In-memory journaling: Map<offset, blob_id>
Ambry: LinkedIn's Scalable Geo-Distributed Object Store

  • 1. Ambry: LinkedIn's Scalable Geo-Distributed Object Store. Sivabalan Narayanan, LinkedIn
  • 2. Agenda: media's significance; nature of media infrastructure at LinkedIn before Ambry; motivation and design goals; API; architecture and selected internals; evaluation; how to get started
  • 3. Text files, pictures, videos, virtual reality
  • 4. Content ecosystem @ LinkedIn before Ambry [diagram: Client 1, Client 2, Client 3, ..., Client N; Filers (Media Server), Google App Engine, Voldemort, Espresso; Media Processing Lib 1, Media Processing Lib 2, ...; Cache]. Fragmented ecosystem
  • 5. Content ecosystem @ LinkedIn before Ambry [diagram: clients; Filers, Media Server, Google App Engine, Voldemort, Espresso; Resize lib 1, Resize lib 2, ...; Cache]. Fragmented ecosystem calls for a unified solution
  • 6. Why not file systems or key-value stores? File systems (HDFS, Ceph, etc.): have extra capabilities that are not required for an object store; overhead due to metadata lookups. Key-value stores (Cassandra, DynamoDB, etc.): not built to support large objects and might copy data multiple times; lack streaming and chunking
  • 7. Ambry: a scalable geo-distributed object store. Design principles: low latency and high throughput for a variety of immutable objects; unstructured content; geo-distribution; highly available
  • 8. Design principles: horizontally scalable; low operational overhead; active-active setup; simple design and ease of use; cost effective
  • 9. Use cases: store a variety of objects (documents, slides, etc.); store any kind of media files (pictures, sounds, videos); store backups; any other use case that needs to store content of larger size; range query support for storing videos
  • 11. API. POST: uploads content to Ambry and returns a handle (AmbryID) to the uploaded blob. GET: Blob fetches the content associated with the AmbryID; BlobInfo fetches the properties and user metadata of the blob with that AmbryID. DELETE: deletes the content associated with the AmbryID
  • 13. Replication Architecture [diagram: two datacenters, DC1 and DC2, each with a Client, a CDN, a frontend layer of frontend nodes, a data layer of data nodes, and a ClusterManager; links labeled http]
  • 14. Frontend: HTTP; stateless; pluggable validation and authentication service; pluggable change capture; router to route requests
  • 15. Router library: supports streaming for large blobs; service or embedded library; proxy requests (for stronger consistency); zero-cost failure detection (avoid down resources)
  • 16. Data Layer: partitions, replicas; log structured; asynchronous replication for remote replicas. [diagram: a log spanning offsets 0 to 100 GB that holds blob id 50 at offset 640, blob id 30 at 700, blob id 70 at 770, and blob id 40 at 850, with the log end offset at 900 and a marker for the start offset in the current index segment; an index segment sorted on blob id with columns blob id, offset, flags, TTL: (id 30, 700, -, ∞), (id 40, 850, -, 1/1/17), (id 70, 770, -, ∞)]
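
The log and index segment on this slide can be sketched as follows. This is a minimal illustration, not Ambry's implementation: an in-memory byte buffer stands in for the on-disk log, a delete only sets a flag in the index, and the names `Replica`, `IndexEntry`, and `INFINITE_TTL` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

INFINITE_TTL = -1  # assumption: -1 means "never expires", like the x-ambry-ttl default


@dataclass
class IndexEntry:
    offset: int             # where the blob starts in the log
    size: int               # blob length in bytes
    deleted: bool = False   # stands in for the "flags" column
    ttl: int = INFINITE_TTL


class Replica:
    """Toy append-only log plus an index keyed by blob id."""

    def __init__(self) -> None:
        self.log = bytearray()
        self.index: Dict[str, IndexEntry] = {}

    def put(self, blob_id: str, data: bytes, ttl: int = INFINITE_TTL) -> int:
        offset = len(self.log)   # writes always land at the log end offset
        self.log += data         # append only; existing bytes never move
        self.index[blob_id] = IndexEntry(offset, len(data), ttl=ttl)
        return offset

    def delete(self, blob_id: str) -> None:
        self.index[blob_id].deleted = True   # flag only; the data stays in the log

    def get(self, blob_id: str) -> Optional[bytes]:
        entry = self.index.get(blob_id)
        if entry is None or entry.deleted:
            return None
        return bytes(self.log[entry.offset:entry.offset + entry.size])

    def sorted_index(self) -> List[Tuple[str, IndexEntry]]:
        """Entries sorted on blob id, as in the index segment on the slide."""
        return sorted(self.index.items(), key=lambda kv: kv[0])
```

Because a put only appends to the log and adds one index entry, the cost of a write does not depend on how much data the replica already holds.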
  • 17. Data layer key optimizations: O(1) I/O for writes; avoid unnecessary movement of actual data; Bloom filter for index segments; rely on OS page cache; zero copy for gets
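
To illustrate the Bloom filter optimization: a filter kept per index segment lets a lookup skip segments that definitely do not contain a blob id. The sketch below is an assumption-laden illustration, not Ambry's code; the sizing parameters (`num_bits`, `num_hashes`), the SHA-256-based hashing, and the `find_offset` helper are all hypothetical.

```python
import hashlib
from typing import Dict, Iterator, List, Optional, Tuple


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


Segment = Tuple[BloomFilter, Dict[str, int]]  # (filter, blob id -> log offset)


def find_offset(segments: List[Segment], blob_id: str) -> Optional[int]:
    """Search index segments in order, skipping those the filter rules out."""
    for bloom, entries in segments:
        if not bloom.might_contain(blob_id):
            continue                 # definitely absent: skip the segment
        if blob_id in entries:       # filters can give false positives, so check
            return entries[blob_id]
    return None
```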
  • 19. Evaluation setup. Small cluster: a single beefy Datanode (24-core CPU, 64 GB of RAM, 14 1 TB HDDs, and a full-duplex 1 Gb/s Ethernet network). Workload: read-only, 50-50 read-write, and write-only; fixed-size objects in each test; objects read at random (worse than a real-world skewed distribution)
  • 20. Throughput. Large objects: all cases saturate the network; read-write saturates both the inbound and outbound links (reaching 2x network bandwidth). Small objects: write saturates the network; throughput in read and read-write drops linearly because of frequent disk seeks; for 50 KB objects, more than 94% of latency is due to disk seeks
  • 21. Latency. Large objects: latency scales proportionally to the object size. Small objects: write latency scales proportionally; read latency is dominated by disk seeks (almost constant for all sizes)
  • 22. Road map: quota management; authentication and authorization; erasure coding; de-duplication
  • 23. How to get started: https://github.com/linkedin/ambry; quick start; mail ambrydev@googlegroups.com; LinkedIn blog; SIGMOD 2016 paper; Apache 2.0 licensed; contributions are welcome and encouraged!
  • 26. Challenges: huge diversity (10s of KBs to a few GBs); fast, durable, and highly available (geo-replication with low latency); ever-growing data and requests (> 800 M req/day, ~120 TB; rate doubled in 12 months); uploaded once, never modified, rarely deleted; most recent uploads are accessed often
  • 27. API: BlobProperties (size, TTL, creation time, content type); UserMetadata (list of <attribute, value>); AmbryBlobOutput (InputStream, size, last modified time)
  • 28. API: POST /
      Request header         Type    Description
      x-ambry-blob-size      Long    The size of the blob
      x-ambry-service-id     String  The ID of the service that is uploading the blob
      x-ambry-content-type   String  The type of content in the blob
  • 29. API: POST /
      Request header         Type    Description
      x-ambry-ttl            Long    The time in seconds for which the blob is valid. Defaults to -1
      x-ambry-owner-id       String  The owner of the blob
      x-ambry-um-            String  User metadata headers prefix
      Returns the handle (blob ID) to the object uploaded
  • 30. API: GET /<ambry-id>/<sub-resource>
      Sub-resources: BlobInfo, UserMetadata
      Request header         Type    Description
      ambry-id               String  The ID of the blob whose content is requested
      sub-resource           String  One of the listed sub-resources
      Returns the content of the blob or the requested sub-resource
  • 31. API: DELETE /<ambry-id>
      Request header         Type    Description
      ambry-id               String  The ID of the blob to be deleted
      Returns a successful response on deletion
    API: HEAD /<ambry-id>
      Request header         Type    Description
      ambry-id               String  The ID of the blob whose properties are requested
      Returns the blob properties of the blob as response headers
  • 32. API usage: POST
      curl -i -H "x-ambry-blob-size: 1000" -H "x-ambry-service-id: CurlUpload" -H "x-ambry-content-type: image/gif" -H "x-ambry-um-description: Demonstration Image" http://localhost:1174/ --data-binary @demo.gif
      HTTP/1.1 201 Created
      Location: AmbryID
      Content-Length: 0
  • 33. API usage: GET BlobInfo
      curl -i http://localhost:1174/AmbryID/BlobInfo
      HTTP/1.1 200 OK
      x-ambry-blob-size: {Blob size}
      x-ambry-service-id: CurlUpload
      x-ambry-creation-time: {Creation time}
      x-ambry-content-type: image/gif
      x-ambry-um-description: Demonstration Image
      Content-Length: 1000
  • 34. API usage: GET Blob
      curl http://localhost:1174/AmbryID > OutImage.gif
    API usage: DELETE
      curl -i -X DELETE http://localhost:1174/AmbryID
      HTTP/1.1 202 Accepted
      Content-Length: 0
  • 35. Cluster Manager
      Hardware layout
      Node         Disks                          Size                    State
      DC1: Node_1  Disk_0, Disk_1, …, Disk_n      4 TB, 4 TB, …, 4 TB     Up, Up, …, Up
      DC1: Node_2  Disk_0, Disk_1, …, Disk_p      4 TB, 4 TB, …, 4 TB     Up, Down, …, Up
      …            …                              …                       …
      DC2: Node_i  Disk_0, Disk_1, …, Disk_m      4 TB, 4 TB, …, 4 TB     Up, Up, …, Down
      Partition layout
      Partition_id  State       Replicas
      Partition_0   Read-Write  DC1:Node1:Disk_0, DC1:Node4:Disk_2, …, DC3:Node2:Disk_1
      Partition_1   Read-Only   DC2:Node0:Disk_0, DC1:Node4:Disk_3, …, DC2:Node2:Disk_2
      …             …           …
      Partition_k   Read-Write  DC1:Node1:Disk_1, DC2:Node1:Disk_2, …, DC3:Node0:Disk_0
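
The two layouts on this slide map naturally onto plain lookup tables. The sketch below shows one way a router could use them to find writable partitions and replicas on healthy disks. It is an illustration only: the entries are made-up sample values, and `writable_partitions` and `live_replicas` are hypothetical helpers, not Ambry's ClusterManager API.

```python
from typing import Dict, List, Optional, Tuple

DiskKey = Tuple[str, str, str]  # (datacenter, node, disk)

# Hardware layout: state of every disk (illustrative values).
hardware_layout: Dict[DiskKey, str] = {
    ("DC1", "Node1", "Disk_0"): "Up",
    ("DC1", "Node1", "Disk_1"): "Up",
    ("DC1", "Node2", "Disk_1"): "Down",
    ("DC2", "Node1", "Disk_2"): "Up",
}

# Partition layout: state and replica placement per partition (illustrative values).
partition_layout = {
    "Partition_0": {"state": "Read-Write",
                    "replicas": [("DC1", "Node1", "Disk_0"), ("DC1", "Node2", "Disk_1")]},
    "Partition_1": {"state": "Read-Only",
                    "replicas": [("DC2", "Node1", "Disk_2")]},
    "Partition_k": {"state": "Read-Write",
                    "replicas": [("DC1", "Node1", "Disk_1"), ("DC2", "Node1", "Disk_2")]},
}


def writable_partitions() -> List[str]:
    """Partitions that may accept new blobs."""
    return sorted(p for p, info in partition_layout.items()
                  if info["state"] == "Read-Write")


def live_replicas(partition_id: str, datacenter: Optional[str] = None) -> List[DiskKey]:
    """Replicas of a partition whose disk is up, optionally limited to one datacenter."""
    replicas = partition_layout[partition_id]["replicas"]
    return [r for r in replicas
            if hardware_layout.get(r) == "Up"
            and (datacenter is None or r[0] == datacenter)]
```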
  • 36. Zero-cost Failure Recovery [diagram: a replica's state over time, cycling through available, temp down, wait period, and temp available]. No extra messages: leverages request messages. Effective, simple, and consumes very little bandwidth
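
One way to read the state cycle on this slide is as a small state machine driven only by the outcomes of real requests. The sketch below is an interpretation under stated assumptions: the failure threshold, the wait period, and the class name `ReplicaHealth` are hypothetical, and the slide does not specify the exact transition rules.

```python
class ReplicaHealth:
    """Tracks one replica's state using only the results of ordinary requests."""

    def __init__(self, max_failures: int = 3, wait_period: float = 30.0) -> None:
        self.max_failures = max_failures   # hypothetical threshold
        self.wait_period = wait_period     # hypothetical back-off, in seconds
        self.failures = 0
        self.state = "available"
        self.down_since = 0.0

    def can_send(self, now: float) -> bool:
        # After the wait period, let a real request through as a trial.
        if self.state == "temp down" and now - self.down_since >= self.wait_period:
            self.state = "temp available"
        return self.state != "temp down"

    def on_response(self, ok: bool, now: float) -> None:
        if ok:
            self.failures = 0
            self.state = "available"
            return
        self.failures += 1
        # A failed trial request sends the replica straight back to temp down.
        if self.state == "temp available" or self.failures >= self.max_failures:
            self.state = "temp down"
            self.down_since = now
```

No heartbeat or ping appears anywhere in the sketch: the only inputs are the success or failure of requests the router was going to send anyway.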
  • 37. Replication Protocol [diagram: Replica R1 of Partition P1 (journal: 640 b1, 700 b5, 770 b6, 850 b4, 920 b3) pulling from Replica R2 of Partition P1 (journal: 600 b5, 690 b1, 750 b2, 810 b4, 880 b7)]. Steps: 1. Get blob IDs since 690 (last known offset of R2 = 690). 2. Blob info {b1, b2, b4, b7} (last known offset of R2 = 880). 3. Filter missing blob IDs. 4. Get blobs {b2, b7}. 5. Blob content {b2, b7}. 6. Append blobs to local replica (1050 b2, 1110 b7)
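
The six steps on this slide can be sketched as a single pull round. This is a simplified illustration, not Ambry's replication code: journals are plain lists, blob contents live in dictionaries, and the function names and the `log_end` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Journal = List[Tuple[int, str]]  # (offset, blob id), ordered by offset


def blob_ids_since(journal: Journal, offset: int) -> Journal:
    """Steps 1-2: the remote replica lists entries at or after the given offset."""
    return [(off, bid) for off, bid in journal if off >= offset]


def replicate(local: Journal, local_blobs: Dict[str, bytes],
              remote: Journal, remote_blobs: Dict[str, bytes],
              last_known_offset: int, log_end: int) -> int:
    """Pull missing blobs from remote into local; return the new last known offset."""
    entries = blob_ids_since(remote, last_known_offset)
    missing = [bid for _, bid in entries if bid not in local_blobs]   # step 3
    fetched = {bid: remote_blobs[bid] for bid in missing}             # steps 4-5
    for bid in missing:                                               # step 6
        local.append((log_end, bid))
        local_blobs[bid] = fetched[bid]
        log_end += len(fetched[bid])
    return entries[-1][0] if entries else last_known_offset


# The slide's example: R1 pulls from R2, starting at last known offset 690.
r1 = [(640, "b1"), (700, "b5"), (770, "b6"), (850, "b4"), (920, "b3")]
r2 = [(600, "b5"), (690, "b1"), (750, "b2"), (810, "b4"), (880, "b7")]
r1_blobs = {bid: b"" for _, bid in r1}           # contents are irrelevant here
r2_blobs = {bid: b"x" * 60 for _, bid in r2}     # 60-byte blobs, an assumption
new_offset = replicate(r1, r1_blobs, r2, r2_blobs, last_known_offset=690, log_end=1050)
```

With 60-byte blobs and R1's log ending at 1050, the round appends b2 at 1050 and b7 at 1110 and advances the last known offset of R2 to 880, matching the slide.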
  • 38. Replication: multi-master replication; asynchronous; pull based. Optimizations: batching requests; inter- and intra-colo thread pools; prioritization for lagging replicas (disk repairs); in-memory journaling (Map<offset, blob_id>)

Editor's Notes

  1. Before proceeding, a few words on the importance of media (15 seconds). Media includes images, videos, documents and, possibly in the future, virtual reality. Media content is the biggest influencer of user engagement and virality. This trend will continue to grow, not just on LinkedIn but across the web. To build great media products, we need world-class infrastructure to support them. That is the vision and focus of this group.
  2. File systems have extra capabilities that a blob store does not need. A blob store does not need the file system's arbitrarily nested (hierarchical) directory structure, and it does not need rich per-object metadata such as ACLs and access times. File systems are also not the best fit for storing a large number of small objects, because of the metadata that has to be kept (NameNode issues). KV stores are more general purpose: they can be used for storing objects but are not optimized for this purpose, and extra burden comes with them. They must handle key collisions, data is mutable, and there is additional overhead to maintain a consistency model.
  3. When we started designing Ambry, we had a few design principles in mind. First and foremost, it should be a low-latency, high-throughput system. Since media objects vary from small to very large, we can give up neither throughput nor latency, and smaller blobs in particular should be served with very low latency. Next is geo-distribution, which is essential when dealing with media: whenever you upload a picture and share it via your social network, very likely someone on the other side of the globe will try to see it. So our object store should have a global presence. Highly available: neither downstream applications nor users can tolerate unavailability. Consider video ads and the monetization that depends on them. So availability is pretty much a must-have feature in any data system.
  4. Next is scalability. If we don't build a system to be scalable right from the beginning, either we pay a hefty price later when we revisit the design to make it scalable, or it may not be feasible to scale at all. Scaling Ambry is just a matter of adding new nodes with new partitions. Low operational overhead: you can't claim to have designed a good system if it is tough to operationalize; no one is going to use a system that has operational issues. Ambry has been built from the ground up with this in mind, and we have a lot of tooling built around Ambry to make it easy to operate. Active-active setup: a master-slave setup adds overhead in making sure the slaves are synced with the master at all times and that the master is up, and the master can sometimes become a bottleneck. So ours is an active-active setup, in which any replica can take puts and gets. This avoids the bottleneck and helps spread requests across the different replicas of the same partition. Simple design and ease of use: we will talk about Ambry's design in the forthcoming slides, but we wanted to ensure that the design is simple enough to understand and easy for developers to use. Cheap: media is rarely deleted. Older data becomes cold over time and has very low read QPS. Also, objects are usually large and take up a lot of space. The design should enable JBOD, support hard disks, and keep space amplification to a minimum.
  5. Ambry can be used as a source of truth for all your immutable-object needs, with high availability and scalability.
  6. The frontend understands HTTP. Having a frontend also helps us coexist with the legacy system (requests can be routed to Ambry or to the media server). The frontend is also where we can plug in other features such as virus scanning, or pushing to a change capture system like Kafka for tracking puts and deletes. The frontend has a router library that contains all the core logic for working with the Ambry backend (the datanodes). The router handles how an operation should be performed and has configurable policies for each operation.
  7. Storage is divided into partitions. A partition has a set of replicas. Any given blob goes into one partition. A server node hosts replicas of several partitions. A replica has a set of disk and memory data structures.
  9. We deployed Ambry with a single Datanode. The Datanode was running on a 24-core CPU with 64 GB of RAM, 14 1 TB HDDs, and a full-duplex 1 Gb/s Ethernet network. 4 GB of the RAM was set aside for the Datanode's internal use and the rest was left to be used as the Linux cache.
  10. For Write, the maximum throughput (in MB/s) stays constant and close to the maximum network bandwidth across all blob sizes. Similarly, throughput in terms of requests/s scales proportionally. However, for Read and Read-Write, the read throughput in terms of MB/s drops linearly for smaller sizes. This drop is because our micro-benchmark reads blobs randomly, incurring frequent disk seeks. The effect of disk seeks is amplified for smaller blobs. By further profiling the disk using Bonnie++ [1] (an IO benchmark for measuring disk performance), we confirmed that disk seeks are the dominant source of latency for small blobs. For example, when reading a 50 KB blob, more than 94% of latency is due to disk seek (6.49 ms for disk seek, and 0.4 ms for reading the data).
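
The 94% figure follows directly from the two measurements quoted above. A short check, using only those numbers (the variable names are illustrative):

```python
seek_ms = 6.49   # disk seek time when reading a 50 KB blob, from the note above
read_ms = 0.4    # time spent reading the data itself

seek_share = seek_ms / (seek_ms + read_ms)
print(f"seek share of latency: {seek_share:.1%}")  # prints: seek share of latency: 94.2%
```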
  11. Handling blobs poses a number of unique challenges. First, due to diversity in media types, blob sizes vary significantly, from tens of KBs (e.g., profile pictures) to a few GBs (e.g., videos). The system needs to store both massive blobs and a large number of small blobs efficiently. Second, users expect the uploading process to be fast, durable, and highly available. When a user uploads a blob, all of their friends around the globe should be able to see the blob with very low latency, even if parts of the internal infrastructure fail. To provide these properties, data has to be reliably replicated across the globe in multiple datacenters, while maintaining low latency for each request. Third, there is an ever-growing number of blobs that need to be stored and served. Currently, LinkedIn serves more than 800 million put and get operations per day (over 120 TB in size). In the past 12 months, the request rate has almost doubled, from 5k requests/s to 9.5k requests/s. This rapid growth in requests magnifies the necessity for a linearly scalable system (with low overhead). Finally, the variability in workload and cluster expansions can create unbalanced load, degrading the latency and throughput of the system.
  12. Talk about the REST APIs and the Router APIs. Also talk about ease of use (run Ambry either as a service, or embed it as a library and save one hop).
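
As a sanity check on the figures in this note, the daily totals can be converted to an average request rate and an average request size. The sketch assumes 1 TB = 10^12 bytes and treats the "more than" and "over" figures as exact, so the results are lower bounds rather than measured values.

```python
requests_per_day = 800e6   # "more than 800 million put and get operations per day"
bytes_per_day = 120e12     # "over 120 TB", assuming 1 TB = 10**12 bytes

avg_rate = requests_per_day / 86_400                  # 86,400 seconds in a day
avg_size_kb = bytes_per_day / requests_per_day / 1e3  # average KB per request

print(f"{avg_rate:,.0f} requests/s on average, {avg_size_kb:.0f} KB per request")
# prints: 9,259 requests/s on average, 150 KB per request
```

An average of roughly 9.3k requests/s is consistent with the 9.5k requests/s quoted for the current rate.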
  19. The frontend understands HTTP. The frontend is also where we can plug in other features such as virus scanning, or pushing to a change capture system like Kafka for tracking puts and deletes. The frontend has a router library that contains all the core logic for working with the Ambry backend (the datanodes). The router handles how an operation should be performed and has configurable policies for each operation.
  20. Ambry employs a zero-cost failure detection mechanism that involves no extra messages, such as heartbeats and pings, by leveraging request messages. In practice, we found our failure detection mechanism to be effective and simple, and it consumes very little bandwidth.
  21. As we saw earlier, PUT requests use quorum-based writes, so there could be some replicas where certain blobs are left out. We rely on replication to copy such blobs to all replicas. Also, our operation policy for PUT is designed so that we write to local replicas and rely on replication to copy the blob to remote replicas. Thus replication plays an important role in Ambry. Multiple masters can take writes, for high availability. The replication protocol is completely asynchronous and pull based.
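
A quorum-based PUT against local replicas, as described in this note, might look like the sketch below. It is a simplified illustration: the replicas are modeled as plain callables, writes are issued sequentially, and the function name `put_with_quorum` and the quorum size are assumptions, not Ambry's router API or its actual policy values.

```python
from typing import Callable, Dict

Writer = Callable[[str, bytes], bool]  # returns True when the replica acknowledges


def put_with_quorum(blob_id: str, data: bytes,
                    local_replicas: Dict[str, Writer], quorum: int) -> bool:
    """Write to every local replica; report success if at least `quorum` acknowledge."""
    acks = 0
    for write in local_replicas.values():
        try:
            if write(blob_id, data):
                acks += 1
        except OSError:
            pass  # this replica missed the blob; replication catches it up later
    return acks >= quorum


# Usage: two healthy replicas and one that fails.
store_a: Dict[str, bytes] = {}
store_b: Dict[str, bytes] = {}


def write_a(blob_id: str, data: bytes) -> bool:
    store_a[blob_id] = data
    return True


def write_b(blob_id: str, data: bytes) -> bool:
    store_b[blob_id] = data
    return True


def write_c(blob_id: str, data: bytes) -> bool:
    raise OSError("disk unavailable")


replicas = {"r1": write_a, "r2": write_b, "r3": write_c}
ok = put_with_quorum("b1", b"payload", replicas, quorum=2)
```

The PUT succeeds even though one replica failed, which is exactly the situation the note describes: the third replica is left without the blob until replication copies it over.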