Ambry is an open-source object store responsible for storing all media content at LinkedIn. This talk covers the development of Ambry at LinkedIn and goes into its architecture in some detail.
2. Agenda
Media’s significance
Nature of media infrastructure at LinkedIn before Ambry
Motivation and design goals
API
Architecture and selected internals
Evaluation
How to get started
4. Content ecosystem @ LinkedIn before Ambry
[Diagram: clients 1..N talk to a fragmented ecosystem of media backends: filers (media server), Google App Engine, Voldemort, Espresso, a cache, and multiple media-processing libraries (Lib 1, Lib 2, …).]
5. Content ecosystem @ LinkedIn before Ambry
[Diagram: the same fragmented ecosystem (filers, media server, Google App Engine, Voldemort, Espresso, cache, multiple resize libraries).]
Calls for a unified solution
6. Why not File Systems or Key-Value Stores?
File systems (HDFS, Ceph, etc.)
Have extra capabilities that are not required for an object store
Overhead due to metadata lookups
Key-value stores (Cassandra, DynamoDB, etc.)
Not built to support large objects; might copy data multiple times
Lack streaming and chunking
7. Ambry: A Scalable Geo-distributed Object Store
Design principles
Low latency and high throughput for a variety of immutable objects
Unstructured content
Geo-distribution
Highly available
8. Design principles
Horizontally Scalable
Low operational overhead
Active-active setup
Simple design and ease of use
Cost Effective
9. Use cases
Store a variety of objects (documents, slides, etc.)
Store any kind of media file (pictures, sounds, videos)
Store backups
Any other use case that needs to store content of larger size
Range query support for storing videos
11. API
POST
Upload content to Ambry
Returns a handle (AmbryID) to the uploaded blob/content
GET
Blob: fetches the content associated with the AmbryID
BlobInfo: fetches the properties and user metadata associated with the blob for the AmbryID
DELETE
Deletes the content associated with the AmbryID
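The three calls above can be sketched as a tiny in-memory mock that mirrors their semantics (POST returns a handle, GET fetches by handle, DELETE tombstones it). `AmbryStore` and its method names are illustrative, not the real client API:

```python
import uuid

class AmbryStore:
    """In-memory sketch of the POST/GET/DELETE semantics above (illustrative)."""

    def __init__(self):
        self._blobs = {}

    def post(self, content: bytes, properties: dict) -> str:
        """Upload content; return a handle (the 'AmbryID')."""
        ambry_id = uuid.uuid4().hex
        self._blobs[ambry_id] = {
            "content": content, "properties": properties, "deleted": False,
        }
        return ambry_id

    def get_blob(self, ambry_id: str) -> bytes:
        """GET Blob: content associated with the AmbryID."""
        record = self._blobs[ambry_id]
        if record["deleted"]:
            raise KeyError(f"{ambry_id} is deleted")
        return record["content"]

    def get_blob_info(self, ambry_id: str) -> dict:
        """GET BlobInfo: properties/user metadata for the AmbryID."""
        record = self._blobs[ambry_id]
        if record["deleted"]:
            raise KeyError(f"{ambry_id} is deleted")
        return record["properties"]

    def delete(self, ambry_id: str) -> None:
        """DELETE: mark the blob as deleted (blobs are immutable, never edited)."""
        self._blobs[ambry_id]["deleted"] = True

store = AmbryStore()
handle = store.post(b"profile picture bytes", {"content-type": "image/jpeg"})
assert store.get_blob(handle) == b"profile picture bytes"
assert store.get_blob_info(handle)["content-type"] == "image/jpeg"
store.delete(handle)
```

Note that there is no PUT/update call: objects in Ambry are immutable, so the whole API surface is upload, fetch, and delete.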
14. Frontend
HTTP
Stateless
Pluggable validation and authentication service
Pluggable change capture
Router to route requests
15. Router Library
Supports streaming for large blobs
Service or embedded library
Proxies requests for stronger consistency
Zero-cost failure detection
Avoids down resources
16. Data Layer
Partitions, replicas
Log structured
Asynchronous replication for remote replicas
[Diagram: an append-only log (0 to 100 GB) holding blob id 50 at offset 640, blob id 30 at 700, blob id 70 at 770, and blob id 40 at 850, with the log end offset at 900. An index segment, sorted on blob id, maps each blob id to <offset, flags, TTL>: id 30 → 700, -, ∞; id 40 → 850, -, 1/1/17; id 70 → 770, -, ∞. The start offset of the current index segment is tracked alongside the log end offset.]
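The log-plus-index-segment layout above can be sketched as follows. This is a minimal illustration, not Ambry's on-disk format: the index here maps blob id → <offset, size, TTL> (size stands in for the flags field in the diagram), and writes always append to the log end:

```python
class LogStore:
    """Sketch of a log-structured store: an append-only log plus an index
    segment mapping blob id -> (offset, size, ttl), as in the slide's diagram.
    Names and layout are illustrative, not Ambry's actual format."""

    def __init__(self):
        self.log = bytearray()   # the append-only log
        self.index = {}          # blob_id -> (offset, size, ttl)

    def put(self, blob_id: str, data: bytes, ttl=float("inf")) -> int:
        offset = len(self.log)   # O(1): a write always goes to the log end
        self.log.extend(data)
        self.index[blob_id] = (offset, len(data), ttl)
        return offset

    def get(self, blob_id: str) -> bytes:
        # One index lookup gives the exact log range; no directory walk needed.
        offset, size, ttl = self.index[blob_id]
        return bytes(self.log[offset:offset + size])

store = LogStore()
off_50 = store.put("blob_50", b"x" * 60)
off_30 = store.put("blob_30", b"y" * 70)
```

Keeping the index sorted on blob id (as the slide shows) is what makes binary search and Bloom-filter checks per index segment cheap.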
17. Data Layer
Key optimizations
O(1) disk I/O for writes
Avoid unnecessary movement of actual data
Bloom filters for index segments
Rely on the OS page cache
Zero-copy for gets
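Of the optimizations above, the Bloom filter is the easiest to illustrate: before touching an index segment on disk, a reader consults a small in-memory filter that can say "definitely not here" with no false negatives. A minimal sketch (parameters and hashing scheme are assumptions, not Ambry's implementation):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter sketch: lets a reader skip index segments that
    definitely do not contain a blob id. May report false positives,
    never false negatives."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0            # bit array packed into one int

    def _positions(self, key: str):
        # Derive `hashes` bit positions by salting the key (illustrative scheme).
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all(self.bits >> pos & 1 for pos in self._positions(key))

bf = BloomFilter()
for blob_id in ["id_30", "id_40", "id_70"]:
    bf.add(blob_id)
```

A negative answer saves a disk read of the whole index segment, which matters most for small-object workloads where seeks dominate.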
19. Evaluation - setup
Small cluster
– A beefy single Datanode
• 24-core CPU, 64 GB of RAM, 14 1 TB HDD disks, and a full-duplex 1 Gb/s Ethernet network
– Workloads: read-only, 50-50 read-write, and write-only
• Fixed-size objects in each test
• Objects read at random (worse than real-world workloads with a skewed distribution)
20. Throughput
Large objects:
• All cases saturate the network.
• Read-write saturates both the inbound and outbound links (reaching 2x network bandwidth).
Small objects:
• Write saturates the network.
• Throughput in read and read-write drops linearly because of frequent disk seeks.
• For 50 KB objects, more than 94% of read latency is disk seek.
21. Latency
Large objects:
• Latency scales proportionally to object size.
Small objects:
• Write latency scales proportionally.
• Read latency is dominated by disk seek (almost constant across sizes).
23. How to get started!
https://github.com/linkedin/ambry
Quick start guide
Mailing list: ambrydev@googlegroups.com
LinkedIn blog post
SIGMOD 2016 paper
Apache 2.0 licensed
Contributions are welcome and encouraged!
26. Challenges
Huge diversity: tens of KBs to a few GBs
Fast, durable, and highly available
Geo-replication with low latency
Ever-growing data and requests
> 800 M req/day (~120 TB)
Rate doubled in 12 months
Uploaded once, never modified, rarely deleted
Most recent uploads are accessed most often
27. API
• BlobProperties
• Size, TTL, Creation time, Content type
• UserMetadata
• List of <attribute, value>
• AmbryBlobOutput
• InputStream, Size, Last Modified Time
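The three API types above can be rendered roughly as follows. These are illustrative Python dataclasses; the real Ambry classes are Java and may differ in fields and naming:

```python
from dataclasses import dataclass, field
from typing import BinaryIO, Dict

@dataclass
class BlobProperties:
    """Size, TTL, creation time, content type (per the slide)."""
    size: int
    ttl_seconds: int        # -1 in the REST API means the default TTL
    creation_time_ms: int
    content_type: str

@dataclass
class UserMetadata:
    """A list of <attribute, value> pairs, modeled here as a dict."""
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class AmbryBlobOutput:
    """InputStream, size, last modified time (stream type is illustrative)."""
    stream: BinaryIO
    size: int
    last_modified_ms: int

props = BlobProperties(size=1024, ttl_seconds=-1,
                       creation_time_ms=0, content_type="image/jpeg")
meta = UserMetadata({"album": "vacation"})
```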
28. API
• POST /

Request Header        Type    Description
x-ambry-blob-size     Long    The size of the blob
x-ambry-service-id    String  The ID of the service that is uploading the blob
x-ambry-content-type  String  The type of content in the blob
29. API
• POST / (continued)

Request Header     Type    Description
x-ambry-ttl        Long    The time in seconds for which the blob is valid. Defaults to -1
x-ambry-owner-id   String  The owner of the blob
x-ambry-um-        String  Prefix for user-metadata headers

Returns the handle (blob id) to the object uploaded
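Assembling a POST request from the two header tables above might look like this. The helper function and its defaults are a sketch; only the header names come from the tables:

```python
def post_headers(blob_size: int, service_id: str, content_type: str,
                 ttl: int = -1, owner_id: str = None,
                 user_metadata: dict = None) -> dict:
    """Build the request headers for POST / per the tables above.
    User-metadata keys get the x-ambry-um- prefix."""
    headers = {
        "x-ambry-blob-size": str(blob_size),
        "x-ambry-service-id": service_id,
        "x-ambry-content-type": content_type,
        "x-ambry-ttl": str(ttl),   # -1 is the documented default
    }
    if owner_id is not None:
        headers["x-ambry-owner-id"] = owner_id
    for key, value in (user_metadata or {}).items():
        headers[f"x-ambry-um-{key}"] = value
    return headers
```

The returned dict would be sent along with the blob body; the frontend replies with the blob id handle.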
30. API
• GET /<ambry-id>/<sub-resource>
• Sub-resources: BlobInfo, UserMetadata

Request Header  Type    Description
ambry-id        String  The ID of the blob whose content is requested
sub-resource    String  One of the listed sub-resources

Returns the content of the blob or the requested sub-resource
31. API
DELETE /<ambry-id>

Request Header  Type    Description
ambry-id        String  The ID of the blob to be deleted

Returns a successful response on deletion

HEAD /<ambry-id>

Request Header  Type    Description
ambry-id        String  The ID of the blob whose properties are requested

Returns the blob properties of the blob as response headers
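The read-side calls (GET, DELETE, HEAD) all address a blob by its id in the path. A small sketch of building these request lines, with the sub-resource rule from the GET slide enforced; the function itself is illustrative:

```python
def build_request(method: str, ambry_id: str, sub_resource: str = None):
    """Return (method, path) for the GET/DELETE/HEAD calls above.
    Sub-resources (BlobInfo, UserMetadata) apply only to GET."""
    if sub_resource is not None:
        if method != "GET" or sub_resource not in ("BlobInfo", "UserMetadata"):
            raise ValueError("sub-resources are GET-only: BlobInfo or UserMetadata")
        return method, f"/{ambry_id}/{sub_resource}"
    if method not in ("GET", "DELETE", "HEAD"):
        raise ValueError(f"unsupported method: {method}")
    return method, f"/{ambry_id}"
```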
35. Cluster Manager

Hardware Layout

Node         Disk     Size   State
DC1:Node_1   Disk_0   4 TB   Up
             Disk_1   4 TB   Up
             …        …      …
             Disk_n   4 TB   Up
DC1:Node_2   Disk_0   4 TB   Up
             Disk_1   4 TB   Down
             …        …      …
             Disk_p   4 TB   Up
…            …        …      …
DC2:Node_i   Disk_0   4 TB   Up
             Disk_1   4 TB   Up
             …        …      …
             Disk_m   4 TB   Down

Partition Layout

Partition_id   State        Replicas
Partition_0    Read-Write   DC1:Node1:Disk_0, DC1:Node4:Disk_2, …, DC3:Node2:Disk_1
Partition_1    Read-Only    DC2:Node0:Disk_0, DC1:Node4:Disk_3, …, DC2:Node2:Disk_2
…              …            …
Partition_k    Read-Write   DC1:Node1:Disk_1, DC2:Node1:Disk_2, …, DC3:Node0:Disk_0
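The two layouts above can be sketched as plain data structures plus a lookup: new blobs may only go to Read-Write partitions, while Read-Only partitions still serve gets. The structures and partition contents below are illustrative samples from the tables, not a real cluster map:

```python
import random

# Partition layout sample (illustrative): partition -> state + replica placement.
partition_layout = {
    "Partition_0": {"state": "Read-Write",
                    "replicas": ["DC1:Node1:Disk_0", "DC1:Node4:Disk_2",
                                 "DC3:Node2:Disk_1"]},
    "Partition_1": {"state": "Read-Only",
                    "replicas": ["DC2:Node0:Disk_0", "DC1:Node4:Disk_3",
                                 "DC2:Node2:Disk_2"]},
}

def writable_partitions(layout: dict) -> list:
    """Partitions eligible for new puts (Read-Write state only)."""
    return [pid for pid, info in layout.items()
            if info["state"] == "Read-Write"]

def choose_partition_for_put(layout: dict) -> str:
    """Pick a writable partition for a new blob (random choice here;
    the real placement policy is an implementation detail)."""
    return random.choice(writable_partitions(layout))

def replicas_for(layout: dict, partition_id: str) -> list:
    """Replica placement for a partition, spanning disks across datacenters."""
    return layout[partition_id]["replicas"]
```

Marking a partition Read-Only as it fills up is how the cluster ages out write traffic while keeping old blobs readable.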
36. Zero-cost Failure Recovery
[Diagram: a resource's state over time: available → (failures observed on requests) → temp down → wait period → temp available → (success) → available.]
• No extra messages; leverages ordinary request messages
• Effective, simple, and consumes very little bandwidth
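The state cycle above can be sketched as a small detector driven purely by ordinary request outcomes (no heartbeats or pings). The threshold, wait period, and integer clock are assumptions for illustration, not Ambry's actual tuning:

```python
class FailureDetector:
    """Zero-cost failure detection sketch: availability is inferred from
    request outcomes alone. After `threshold` consecutive failures a resource
    is marked temp-down and skipped until `wait_period` ticks pass, then it
    becomes eligible for a retry (temp available)."""

    def __init__(self, threshold: int = 3, wait_period: int = 10):
        self.threshold = threshold
        self.wait_period = wait_period
        self.failures = 0
        self.down_until = None    # tick at which we allow a retry

    def available(self, now: int) -> bool:
        """Should we route a request to this resource right now?"""
        return self.down_until is None or now >= self.down_until

    def on_response(self, now: int, ok: bool) -> None:
        """Fold an ordinary request outcome into the state; no extra messages."""
        if ok:
            self.failures, self.down_until = 0, None      # back to available
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.down_until = now + self.wait_period  # temp down

fd = FailureDetector()
for t in range(3):
    fd.on_response(t, ok=False)
assert not fd.available(now=3)    # temp down after 3 straight failures
assert fd.available(now=13)       # wait period over: eligible for retry
```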
37. Replication Protocol

Replica R1 of Partition P1 (journal): 640 → b1, 700 → b5, 770 → b6, 850 → b4, 920 → b3
Replica R2 of Partition P1 (journal): 600 → b5, 690 → b1, 750 → b2, 810 → b4, 880 → b7

R1 pulls from R2, with last known offset of R2 = 690:
1. Get blob ids since offset 690
2. R2 returns blob info {b1, b2, b4, b7}; R1's last known offset of R2 advances to 880
3. R1 filters out the blob ids it already has, leaving the missing ones
4. Get blobs {b2, b7}
5. R2 returns blob content {b2, b7}
6. R1 appends the blobs to its local replica (journal entries 1050 → b2, 1110 → b7)
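The six steps above can be sketched over two in-memory journals, each a list of (offset, blob_id) pairs. The function name, the 100-byte toy offsets on append, and the list representation are assumptions for illustration, not Ambry's wire protocol:

```python
def replicate(local: list, remote: list, last_known_offset: int) -> int:
    """One round of the pull-based protocol: local pulls what it is missing
    from remote. Returns the new last-known offset of the remote replica."""
    # Step 1-2: ask remote for blob ids written at/after last_known_offset.
    newer = [(off, bid) for off, bid in remote if off >= last_known_offset]
    new_offset = max((off for off, _ in newer), default=last_known_offset)
    # Step 3: filter out blob ids the local replica already has.
    have = {bid for _, bid in local}
    missing = [bid for _, bid in newer if bid not in have]
    # Steps 4-6: fetch the missing blobs and append them to the local log.
    next_off = max((off for off, _ in local), default=0) + 100  # toy offsets
    for bid in missing:
        local.append((next_off, bid))
        next_off += 100
    return new_offset

r1 = [(640, "b1"), (700, "b5"), (770, "b6"), (850, "b4"), (920, "b3")]
r2 = [(600, "b5"), (690, "b1"), (750, "b2"), (810, "b4"), (880, "b7")]
assert replicate(r1, r2, last_known_offset=690) == 880
assert [bid for _, bid in r1[-2:]] == ["b2", "b7"]   # b2 and b7 were missing
```

Because the exchange is keyed on offsets into the remote journal, a repeat round with the updated offset transfers nothing new, making the protocol naturally incremental.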
38. Replication
Multi-master replication
Asynchronous
Pull based
Optimizations
Batching requests
Inter and intra-colo thread pools
Prioritization for lagging replicas
- Disk repairs
In-Memory Journaling
- Map<offset, blob_id>
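The in-memory journal above (Map<offset, blob_id>) exists so a replication peer's "everything after offset X" query can be answered without scanning index segments. A minimal sketch using parallel sorted lists; the class and method names are illustrative:

```python
import bisect

class Journal:
    """In-memory journal sketch: an offset-ordered map of offset -> blob_id,
    supporting the 'entries since offset' query used by replication."""

    def __init__(self):
        self.offsets = []    # log offsets, kept sorted
        self.blob_ids = []   # blob id written at the matching offset

    def record(self, offset: int, blob_id: str) -> None:
        # Writes arrive in log order, so plain appends keep offsets sorted.
        self.offsets.append(offset)
        self.blob_ids.append(blob_id)

    def entries_since(self, offset: int) -> list:
        """All (offset, blob_id) entries at offsets >= the given offset,
        found via binary search."""
        i = bisect.bisect_left(self.offsets, offset)
        return list(zip(self.offsets[i:], self.blob_ids[i:]))

journal = Journal()
for off, bid in [(600, "b5"), (690, "b1"), (750, "b2"), (810, "b4"), (880, "b7")]:
    journal.record(off, bid)
assert [bid for _, bid in journal.entries_since(690)] == ["b1", "b2", "b4", "b7"]
```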
Editor's Notes
Before proceeding, a few words on importance of media. 15 seconds
Media includes images, videos, documents and possibly in the future Virtual Reality. Media content are the biggest influencers to user engagement and virality. This trend will continue to grow not just on LinkedIn but across the web. To build great media products, we need to have a world class infrastructure to support it. That is the vision and focus of this group.
File systems have extra capabilities that are not needed for a blob store. A blob store does not need the FS's arbitrarily nested (hierarchical) directory structure, nor rich per-object metadata like ACLs and access times.
Not the best fit for storing a large number of small objects, due to the per-object metadata that has to be kept
Name Node issues
KV stores are more general purpose; they can be used for storing objects but are not optimized for this purpose, and extra burden comes with them.
Handle key collisions
Data is mutable: additional overhead to maintain the consistency model
So, when we started designing Ambry, we had a few design principles in mind.
First and foremost, it should be a low-latency and high-throughput system. Since media objects tend to vary from small to very large, we can't give up on either throughput or latency. And small blobs in particular should be served with very low latency.
Next is geo-distribution. Geo-distribution is very much necessary when dealing with media. Whenever you upload a picture and share it via your social network, very likely someone on the other end of the globe will try to see it. So our object store should have a global presence.
Highly available: neither downstream applications nor users can tolerate unavailability. Consider video ads and the monetization tied to them. So availability is pretty much a must-have feature in any data system.
Next is scalability. If we don't build the system to be scalable right from the beginning, we either pay a hefty price later when we revisit the design to make it scalable, or it may not be feasible to scale at all. Scaling Ambry is just a matter of adding new nodes with new partitions.
You can't claim to have designed a good system if it is tough to operationalize. No one will use a system with operational issues. So Ambry has been built from the ground up with this in mind, and we have a lot of tooling around Ambry for ease of operation.
Active-active setup:
A master-slave setup adds overhead in making sure slaves are synced with the master all the time, and the master can become a bottleneck. Ours is an active-active setup, where any replica can take puts and gets. This avoids bottlenecks and helps spread requests across the replicas of a partition.
Simple design and ease of use:
We will talk about Ambry's design in the forthcoming slides. But again, we wanted to ensure that our design is simple to understand and easy for developers to use.
Cheap:
Media objects are rarely deleted. Older data becomes cold over time and has very low read QPS. Also, objects are usually large and take up a lot of space. The design should enable JBOD, support hard disks, and keep space amplification to a minimum.
Ambry can be used as a source of truth for all your immutable-content needs, with high availability and scalability.
The frontend understands HTTP.
Having a frontend also helps us coexist with the legacy system (requests can be routed to Ambry or the media server).
The frontend is also where we can plug in other features, like virus scanning, or pushing to a change-capture system like Kafka for tracking puts and deletes.
The frontend has a router library which has all the core logic to work with the Ambry backend/datanode.
The router handles how an operation should be performed and has configurable policies for it.
Storage is divided into partitions
A partition has a set of replicas
Any given blob goes into one partition
A server node consists of replicas of several partitions
A replica has a set of disk/memory data structures
We deployed Ambry with a single Datanode. The Datanode was running on a 24-core CPU with 64 GB of RAM, 14 1 TB HDD disks, and a full-duplex 1 Gb/s Ethernet network. 4 GB of the RAM was set aside for the Datanode's internal use and the rest was left to be used as Linux cache.
The maximum throughput (in MB/s) stays constant and close to the maximum network bandwidth across all blob sizes. Similarly, throughput in terms of requests/s scales proportionally.
However, for Read and Read-Write, the read throughput in terms of MB/s drops linearly for smaller sizes. This drop is because our micro-benchmark reads blobs randomly, incurring frequent disk seeks. The effect of disk seeks is amplified for smaller blobs. By further profiling the disk using Bonnie++ [1] (an IO benchmark for measuring disk performance), we confirmed that disk seeks are the dominant source of latency for small blobs. For example, when reading a 50 KB blob, more than 94% of latency is due to disk seek (6.49 ms for disk seek, and 0.4 ms for reading the data).
Handling blobs poses a number of unique challenges. First, due to diversity in media types, blob sizes vary significantly from tens of KBs (e.g., profile pictures) to a few GBs (e.g., videos). The system needs to store both massive blobs and a large number of small blobs efficiently.
Second, users expect the uploading process to be fast, durable, and highly available. When a user uploads a blob, all his/her friends from all around the globe should be able to see the blob with very low latency, even if parts of the internal infrastructure fail. To provide these properties, data has to be reliably replicated across the globe in multiple datacenters, while maintaining low latency for each request.
Finally, there is an ever-growing number of blobs that need to be stored and served. Currently, LinkedIn serves more than 800 million put and get operations per day (over 120 TB in size). In the past 12 months, the request rate has almost doubled, from 5k requests/s to 9.5k requests/s. This rapid growth in requests magnifies the necessity for a linearly scalable system (with low overhead).
Third, the variability in workload and cluster expansions can create unbalanced load, degrading the latency and throughput of the system
Talk about REST APIs & Router APIs
Also talk about ease of usability (either as a service or embed as lib and reduce one hop)
Ambry employs a zero-cost failure detection mechanism involving no extra messages, such as heartbeats and pings, by leveraging request messages. In practice, we found our failure detection mechanism is effective, simple, and consumes very little bandwidth.
As we saw earlier, we have quorum-based writes for PUT requests, so there could be some replicas where certain blobs are missing; we rely on replication to bring such blobs to all replicas. Also, our operation policy for PUT writes to local replicas and relies on replication to copy the data to remote replicas. Thus replication plays an important role in Ambry. Multiple masters can take writes for high availability. Replication is completely asynchronous and pull-based.