CASSANDRA @ INSTAGRAM 2016
Dikang Gu
Software Engineer @ Facebook
ABOUT ME
• @dikanggu
• Software Engineer
• Instagram core infra, 2014 — present
• Facebook data infra, 2012 — 2014
AGENDA
1 Overview
2 Improvements
3 Challenges
OVERVIEW

OVERVIEW
Cluster Deployment
• Cassandra nodes: 1,000+
• Data size: 100s of terabytes
• Ops/sec: in the millions
• Largest cluster: 100+
• Regions: multiple
OVERVIEW
• Clients: Python/C++/Java/PHP
• Protocol: mostly Thrift, some CQL
• Versions: 2.0.x - 2.2.x
• Use LCS (leveled compaction) for most tables
TEAM
USE CASE 1
Feed
PUSH
When posting, we push the media information to the followers' feed stores.
When reading, we fetch the feed IDs from the viewer's feed store.
USE CASE 1
Feed
• Write QPS: 1M+
• Avg/P99 read latency: 20ms/100ms
• Data model (sketch below):
  user_id —> List(media_id)
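The deck does not show a schema, but as a rough, hypothetical sketch of this push model (keyspace, table, and column names are invented here, and it uses CQL with the DataStax Python driver rather than the Thrift path the clients mostly used), the feed maps naturally onto one wide partition per user:

    # Hypothetical sketch only, not Instagram's actual schema.
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS feed WITH replication = "
                    "{'class': 'SimpleStrategy', 'replication_factor': 3}")
    session.execute("""
        CREATE TABLE IF NOT EXISTS feed.user_feed (
            user_id  bigint,
            media_id bigint,
            PRIMARY KEY (user_id, media_id)
        ) WITH CLUSTERING ORDER BY (media_id DESC)
    """)

    # Push: on post, fan the new media id out to every follower's partition.
    insert = session.prepare(
        "INSERT INTO feed.user_feed (user_id, media_id) VALUES (?, ?)")
    new_media_id, follower_ids = 9001, [1, 2, 3]        # example values
    for follower_id in follower_ids:
        session.execute(insert, (follower_id, new_media_id))

    # Read: fetch the newest feed ids from the viewer's own partition.
    rows = session.execute(
        "SELECT media_id FROM feed.user_feed WHERE user_id = %s LIMIT 50", (1,))
    print([row.media_id for row in rows])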
USE CASE 2
Metadata store
Applications use C* as a key-value store: they store a list of blobs associated with a key, and do point queries or range queries at read time.
USE CASE 2
Metadata store
• Read/Write QPS: 100K+
• Avg read size: 50KB
• Avg/P99 read latency: 7ms/50ms
• Data model (sketch below):
  user_id —> List(Blob)
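As with the feed, a hypothetical sketch of this blob-list model (names invented here, not from the deck); the same partition serves both point and range reads:

    # Hypothetical sketch only.
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS metadata WITH replication = "
                    "{'class': 'SimpleStrategy', 'replication_factor': 3}")
    session.execute("""
        CREATE TABLE IF NOT EXISTS metadata.kv_store (
            user_id bigint,
            item_id bigint,
            value   blob,
            PRIMARY KEY (user_id, item_id)
        )
    """)

    # Point query: a single blob under the key.
    session.execute(
        "SELECT value FROM metadata.kv_store "
        "WHERE user_id = %s AND item_id = %s", (42, 7))

    # Range query: a contiguous slice of the blobs under the same key.
    session.execute(
        "SELECT item_id, value FROM metadata.kv_store "
        "WHERE user_id = %s AND item_id >= %s AND item_id < %s", (42, 0, 100))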
USE CASE 3
Counter
Applications issue bump/get counter operations for each user request.
USE CASE 3
Counter
• Read/Write QPS: 50K+
• Avg/P99 read latency: 3ms/50ms
• C* 2.2
• Data model (sketch below):
  some_id —> Counter
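A minimal sketch of the bump/get operations against a CQL counter table (names are hypothetical, not from the deck):

    # Hypothetical sketch only.
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS counters WITH replication = "
                    "{'class': 'SimpleStrategy', 'replication_factor': 3}")
    session.execute("""
        CREATE TABLE IF NOT EXISTS counters.user_counters (
            some_id bigint PRIMARY KEY,
            hits    counter
        )
    """)

    # Bump: counter columns only support increments and decrements.
    session.execute(
        "UPDATE counters.user_counters SET hits = hits + 1 WHERE some_id = %s",
        (42,))

    # Get: read back the current value.
    row = session.execute(
        "SELECT hits FROM counters.user_counters WHERE some_id = %s", (42,)).one()
    print(row.hits if row else 0)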
IMPROVEMENTS
1. PROXY NODES
PROXY NODE
Problem
• Thrift clients are NOT token aware
• A data node coordinates the requests
• High latency and timeouts when a data node is hot
PROXY NODE
Solution
• join_ring: false
• Acts as a coordinator
• Does not store data locally
• Clients only talk to proxy nodes (see the sketch below)
• 2X latency drop
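The clients in the deck are mostly Thrift, but as an illustration of the client side of this setup (addresses, driver, and policy choice are assumptions here, not from the deck), a CQL Python client can be pinned to coordinator-only proxy nodes like this:

    # Illustrative only: route every request through dedicated proxy nodes.
    # The proxy nodes themselves would be started with ring joining disabled
    # (e.g. the cassandra.join_ring=false startup property), so they coordinate
    # requests but own no data.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import WhiteListRoundRobinPolicy

    PROXY_NODES = ["10.0.0.10", "10.0.0.11"]   # hypothetical proxy addresses

    profile = ExecutionProfile(
        load_balancing_policy=WhiteListRoundRobinPolicy(PROXY_NODES))
    cluster = Cluster(PROXY_NODES,
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect()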
2. PENDING RANGES
(CASSANDRA-9258)
PENDING RANGES
Problem
• CPU usage +30% when bootstrapping new nodes
• Client request latency jumps and timeouts
• Multimap<Range<Token>, InetAddress> PendingRange
• Inefficient O(n) pendingRanges lookup per request
PENDING RANGES
Solution
• CASSANDRA-9258
• Use two NavigableMaps to implement the pending ranges (see the sketch below)
• We can expand or shrink the cluster without affecting requests
• Thanks to Branimir Lambov for the patch review and feedback
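The actual patch is Java (two NavigableMaps inside Cassandra); the sketch below is only a Python stand-in for the idea, replacing a scan over every pending range with a sorted lookup keyed by the range's end token:

    # Python stand-in for the idea behind CASSANDRA-9258, not the actual code.
    import bisect

    class PendingRangeMap:
        """Maps (start, end] token ranges to endpoints, sorted by range end."""

        def __init__(self, ranges):
            # ranges: [((start_token, end_token), [endpoints]), ...]
            # Assumed non-wrapping and non-overlapping to keep the sketch short.
            self._ranges = sorted(ranges, key=lambda r: r[0][1])
            self._ends = [end for (_, end), _ in self._ranges]

        def pending_endpoints_for(self, token):
            # O(log n): jump straight to the first range whose end >= token,
            # instead of scanning every pending range on each request (O(n)).
            i = bisect.bisect_left(self._ends, token)
            if i < len(self._ranges):
                (start, end), endpoints = self._ranges[i]
                if start < token <= end:
                    return endpoints
            return []

    pending = PendingRangeMap([((0, 100), ["10.0.0.5"]),
                               ((100, 200), ["10.0.0.6"])])
    print(pending.pending_endpoints_for(150))   # -> ['10.0.0.6']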
3. DYNAMIC SNITCH
(CASSANDRA-6908)
DYNAMIC SNITCH
• High read latency during peak time
• Unnecessary cross-region requests
• dynamic_snitch_badness_threshold: 50
• 10X P99 latency drop
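This is a cassandra.yaml knob, shown below as a sketch. The stock default is 0.1, so a very large value like 50 means the preferred (typically local-region) replica has to score enormously worse than the alternatives before the dynamic snitch reroutes reads away from it:

    # cassandra.yaml (sketch)
    # Default is 0.1 (reroute if the preferred replica scores ~10% worse);
    # 50 keeps reads pinned to the preferred replica in almost all cases.
    dynamic_snitch_badness_threshold: 50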
4. COMPACTION
COMPACTION IMPROVEMENTS
• Track the write amplification. (CASSANDRA-11420)
• Optimize the overlapping lookup. (CASSANDRA-11571)
• Optimize the isEOF() checking. (CASSANDRA-12013)
• Avoid searching for column index. (CASSANDRA-11450)
• Persist last compacted key per level. (CASSANDRA-6216)
• Compact tables before making available in L0. (CASSANDRA-10862)
5. BIG HEAP SIZE
BIG HEAP SIZE
• 64G max heap size
• 16G new gen size
• -XX:MaxTenuringThreshold=6
• Young GC every 10 seconds
• Avoid full GC
• 2X P99 latency drop
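A sketch of the corresponding JVM sizing as it would appear in cassandra-env.sh; only the values named on the slide are shown, the remaining GC flags are not in the deck:

    # cassandra-env.sh (sketch)
    MAX_HEAP_SIZE="64G"     # total heap (-Xmx, and typically -Xms as well)
    HEAP_NEWSIZE="16G"      # young generation (-Xmn)
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=6"   # promote after 6 young GCs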
6. NODETOOL REBUILD RANGE
(CASSANDRA-10406)
NODETOOL REBUILD
• rebuild may fail for nodes with TBs of data
• CASSANDRA-10406
• Support rebuilding only the failed token ranges
• Thanks to Yuki Morishita for reviewing
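For context, the stock command streams everything; the exact option the patch adds is not spelled out in the deck, so it is only described here:

    # Stock form: stream all of this node's ranges from another datacenter.
    nodetool rebuild <source-datacenter>
    # CASSANDRA-10406 adds support for rebuilding only specific (failed) token
    # ranges, so a multi-terabyte rebuild that dies partway can be resumed
    # instead of restarted from scratch.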
CHALLENGES
PERFORMANCE
P99 read latency
Latency is high on the C* nodes, and even higher on the client side.
PERFORMANCE
Compaction has difficulty keeping up, which impacts read latency.
PERFORMANCE
Compaction uses too much CPU (40%+).
PERFORMANCE
Tombstones
SCALABILITY
Gossip: nodes see an inconsistent ring
(CASSANDRA-11709, CASSANDRA-11740)
FEATURES
Counter: problems with repair
(CASSANDRA-11432, CASSANDRA-10862)
SSTables in each level: [966/4, 20/10, 152/100, 33, 0, 0, 0, 0, 0]
CLIENT
Accessing C* from different languages
[Diagram: multiple services talking to the Cassandra cluster]
OPERATIONS
Cluster expansion takes a long time: 15 days to bootstrap 30 nodes.
RECAP
Improvements:
• Proxy node
• Pending ranges
• Dynamic snitch
• Compaction
• Big heap size
• Nodetool rebuild range
Challenges:
• P99 read latency
• Compaction
• Tombstone
• Gossip
• Counter
• Client
• Cluster expansion
QUESTIONS?
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016

At Instagram, our mission is to capture and share the world's moments. Our app is used by over 400M people monthly; this creates a lot of challenging data needs. We use Cassandra heavily, as a general key-value store. In this presentation, I will talk about how we use Cassandra to serve our critical use cases, the improvements and patches we made to make sure Cassandra can meet our low-latency, high-scalability requirements, and some pain points we have.

About the Speaker
Dikang Gu, Software Engineer, Facebook

I'm a software engineer on the Instagram core infra team, working on scaling Instagram's infrastructure, especially on building a generic key-value store on top of Cassandra. Prior to this, I worked on the development of HDFS at Facebook. I received my master's degree in Computer Science from Shanghai Jiao Tong University in China.