CASSANDRA @ INSTAGRAM 2016
Dikang Gu
Software Engineer @ Facebook
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016


At Instagram, our mission is to capture and share the world's moments. Our app is used by over 400M people monthly, which creates a lot of challenging data needs. We use Cassandra heavily as a general key-value store. In this presentation, I will talk about how we use Cassandra to serve our critical use cases; the improvements and patches we made to ensure Cassandra meets our low-latency, high-scalability requirements; and some pain points we still have.

About the Speaker
Dikang Gu, Software Engineer, Facebook

I'm a software engineer on the Instagram core infra team, working on scaling Instagram infrastructure, especially on building a generic key-value store based on Cassandra. Prior to this, I worked on the development of HDFS at Facebook. I received my master's degree in Computer Science from Shanghai Jiao Tong University in China.

Published in: Software


  1. CASSANDRA @ INSTAGRAM 2016
     Dikang Gu, Software Engineer @ Facebook
  2. ABOUT ME
     • @dikanggu
     • Software Engineer
     • Instagram core infra, 2014 to present
     • Facebook data infra, 2012 to 2014
  3. AGENDA
     1. Overview
     2. Improvements
     3. Challenges
  4. OVERVIEW
  5. OVERVIEW: Cluster Deployment
     • Cassandra Nodes: 1,000+
     • Data Size: 100s of terabytes
     • Ops/sec: in the millions
     • Largest Cluster: 100+
     • Regions: multiple
  6. OVERVIEW
     • Client: Python/C++/Java/PHP
     • Protocol: mostly Thrift, some CQL
     • Versions: 2.0.x - 2.2.x
     • Use LCS for most tables.
  7. TEAM
  8. USE CASE 1: Feed
     PUSH: When posting, we push the media information to the followers' feed store. When reading, we fetch the feed IDs from the viewer's feed store.
  9. USE CASE 1: Feed
     • Write QPS: 1M+
     • Avg/P99 Read Latency: 20ms/100ms
     • Data Model: user_id -> List(media_id)
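
The push model described in the feed slides can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Instagram's implementation: the class name `FeedStore`, the `follow` helper, and the `max_feed_len` cap are all assumptions made for the example. Only the `user_id -> List(media_id)` shape and the push-on-write, read-own-feed flow come from the slides.

```python
from collections import defaultdict


class FeedStore:
    """Toy fan-out-on-write feed: user_id -> list of media_id, newest first."""

    def __init__(self, max_feed_len=500):
        # max_feed_len is a hypothetical cap, not a number from the talk.
        self.followers = defaultdict(set)   # author_id -> follower ids
        self.feeds = defaultdict(list)      # user_id -> [media_id, ...]
        self.max_feed_len = max_feed_len

    def follow(self, follower_id, followee_id):
        self.followers[followee_id].add(follower_id)

    def post(self, author_id, media_id):
        # Write path: push the new media id into every follower's feed.
        for follower_id in self.followers[author_id]:
            feed = self.feeds[follower_id]
            feed.insert(0, media_id)
            del feed[self.max_feed_len:]    # trim the oldest entries

    def read_feed(self, viewer_id, limit=10):
        # Read path: a single lookup on the viewer's own feed.
        return self.feeds[viewer_id][:limit]
```

The trade-off this shows is that one post turns into one write per follower, which is consistent with the write-heavy numbers on the slide, while a read touches only the viewer's own row.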
  10. USE CASE 2: Metadata store
     Applications use C* as a key-value store: they store a list of blobs associated with a key, and issue point queries or range queries at read time.
  11. USE CASE 2: Metadata store
     • Read/Write QPS: 100K+
     • Avg read size: 50KB
     • Avg/P99 Read Latency: 7ms/50ms
     • Data Model: user_id -> List(Blob)
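
As a rough illustration of the point and range queries mentioned for the metadata store, the sketch below models each key as a sorted set of named blobs. The slides only say `user_id -> List(Blob)`; the idea that each blob is addressed by a sortable `column` name is an assumption for this example, as are the class and method names.

```python
import bisect


class MetadataStore:
    """Toy key -> sorted blobs store supporting point and range reads."""

    def __init__(self):
        # key -> (sorted list of column names, {column name: blob})
        self._rows = {}

    def put(self, key, column, blob):
        names, blobs = self._rows.setdefault(key, ([], {}))
        if column not in blobs:
            bisect.insort(names, column)    # keep column names sorted
        blobs[column] = blob

    def get(self, key, column):
        # Point query: one blob under one key, or None if absent.
        _, blobs = self._rows.get(key, ([], {}))
        return blobs.get(column)

    def get_range(self, key, start, end):
        # Range query: all blobs whose column is in [start, end].
        names, blobs = self._rows.get(key, ([], {}))
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, end)
        return [(name, blobs[name]) for name in names[lo:hi]]
```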
  12. USE CASE 3: Counter
     Applications issue bump/get counter operations for each user request.
  13. USE CASE 3: Counter
     • Read/Write QPS: 50K+
     • Avg/P99 Read Latency: 3ms/50ms
     • C* 2.2
     • Data Model: some_id -> Counter
  14. IMPROVEMENTS
  15. 1. PROXY NODES
  16. PROXY NODE: Problem
     • Thrift client, NOT token aware
     • Data node coordinates the requests
     • High latency and timeouts when a data node is hot.
  17. PROXY NODE: Solution
     • join_ring: false
     • Act as coordinator
     • Do not store data locally
     • Client only talks to proxy node
     • 2X latency drop
  18. 2. PENDING RANGES (CASSANDRA-9258)
  19. PENDING RANGES: Problem
     • CPU usage +30% when bootstrapping new nodes.
     • Client request latency jumps, with timeouts
     • Multimap<Range<Token>, InetAddress> PendingRange
     • Inefficient O(n) pendingRanges lookup for each request
  20. PENDING RANGES: Solution
     • CASSANDRA-9258
     • Use two NavigableMaps to implement the pending ranges
     • We can expand or shrink the cluster without affecting requests
     • Thanks to Branimir Lambov for patch review and feedback.
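
To make the "two sorted maps" idea concrete, here is a simplified Python sketch. It is not the CASSANDRA-9258 patch: it uses two sorted lists with `bisect` in place of Java NavigableMaps, handles only non-wrapping ranges `(left, right]`, and the class and method names are invented for the example. The point it illustrates is that sorting ranges once by left bound and once by right bound lets a lookup narrow the candidates by binary search instead of scanning every pending range.

```python
import bisect


class PendingRanges:
    """Pending ranges (left, right] -> endpoint, indexed by both bounds."""

    def __init__(self):
        self._by_left = []    # sorted (left, right, endpoint)
        self._by_right = []   # sorted (right, left, endpoint)

    def add(self, left, right, endpoint):
        if not left < right:
            raise ValueError("this sketch handles only non-wrapping ranges")
        bisect.insort(self._by_left, (left, right, endpoint))
        bisect.insort(self._by_right, (right, left, endpoint))

    def endpoints_for(self, token):
        # Ranges with left < token form a prefix of _by_left.
        n_left = bisect.bisect_left(self._by_left, (token,))
        # Ranges with right >= token form a suffix of _by_right.
        start_right = bisect.bisect_left(self._by_right, (token,))
        n_right = len(self._by_right) - start_right
        # Scan whichever candidate set is smaller and apply the other bound.
        if n_left <= n_right:
            return {ep for left, right, ep in self._by_left[:n_left]
                    if right >= token}
        return {ep for right, left, ep in self._by_right[start_right:]
                if left < token}
```

In the common case where few ranges are pending near a token, the smaller candidate set is short, so the per-request cost stays low even while many nodes are bootstrapping. The worst case of this sketch is still linear in the number of candidates.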
  21. 3. DYNAMIC SNITCH (CASSANDRA-6908)
  22. DYNAMIC SNITCH
     • High read latency during peak time.
     • Unnecessary cross-region requests.
     • dynamic_snitch_badness_threshold: 50
     • 10X P99 latency drop
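
The effect of a large badness threshold can be illustrated with a simplified model of how a dynamic snitch chooses between the static replica order and a latency-score order. The function below is a sketch under stated assumptions, not Cassandra's code: the function name, the replica names, and the scores are made up, and the rule (switch to score order only when a statically preferred replica scores worse than its score-ordered counterpart by more than `1 + threshold` times) is my reading of the behavior rather than something stated on the slide.

```python
def order_replicas(static_order, scores, badness_threshold):
    """Return the replica order a coordinator would try, closest first.

    static_order: replicas as ranked by the underlying snitch.
    scores: replica -> latency score, lower is better (hypothetical values).
    """
    # sorted() is stable, so equal scores keep their static order.
    dynamic_order = sorted(static_order, key=lambda r: scores[r])
    for static_replica, dynamic_replica in zip(static_order, dynamic_order):
        worst_allowed = scores[dynamic_replica] * (1 + badness_threshold)
        if scores[static_replica] > worst_allowed:
            # A preferred replica is "bad enough": fall back to score order.
            return dynamic_order
    return list(static_order)
```

With a small threshold, a slightly faster remote replica can displace a local one and cause a cross-region read. With a threshold of 50, a local replica would have to score roughly 51 times worse before that happens in this model.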
  23. 4. COMPACTION
  24. COMPACTION IMPROVEMENTS
     • Track the write amplification. (CASSANDRA-11420)
     • Optimize the overlapping lookup. (CASSANDRA-11571)
     • Optimize the isEOF() checking. (CASSANDRA-12013)
     • Avoid searching for column index. (CASSANDRA-11450)
     • Persist last compacted key per level. (CASSANDRA-6216)
     • Compact tables before making available in L0. (CASSANDRA-10862)
  25. 5. BIG HEAP SIZE
  26. BIG HEAP SIZE
     • 64G max heap size
     • 16G new gen size
     • -XX:MaxTenuringThreshold=6
     • Young GC every 10 seconds
     • Avoid full GC
     • 2X P99 latency drop
  27. 6. NODETOOL REBUILD RANGE (CASSANDRA-10406)
  28. NODETOOL REBUILD
     • Rebuild may fail for nodes with TBs of data
     • CASSANDRA-10406
     • Support for rebuilding the failed token ranges
     • Thanks to Yuki Morishita for reviewing
  29. CHALLENGES
  30. PERFORMANCE: P99 Read Latency
     Latency is high on the C* nodes, and even higher on the client side.
  31. PERFORMANCE
     Compaction has difficulty catching up, which impacts read latency.
  32. PERFORMANCE
     Compaction uses too much CPU (40%+)
  33. PERFORMANCE
     Tombstone
  34. SCALABILITY
     Gossip: nodes see an inconsistent ring (CASSANDRA-11709, CASSANDRA-11740)
  35. FEATURES
     Counter: problem with repair (CASSANDRA-11432, CASSANDRA-10862)
     SSTables in each level: [966/4, 20/10, 152/100, 33, 0, 0, 0, 0, 0]
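
The "SSTables in each level" line is easier to interpret with a small parser. The sketch below assumes that an entry written as `count/target` marks a level holding more SSTables than its target, and that a bare number is a level within its target. Both function names are invented for the example.

```python
def parse_levels(text):
    """Parse '[966/4, 20/10, 33, 0]' into [(count, target_or_None), ...]."""
    levels = []
    for item in text.strip().strip("[]").split(","):
        item = item.strip()
        if "/" in item:
            count, target = item.split("/")
            levels.append((int(count), int(target)))
        else:
            levels.append((int(item), None))
    return levels


def levels_over_target(levels):
    """Return the indexes of levels whose SSTable count exceeds the target."""
    return [i for i, (count, target) in enumerate(levels)
            if target is not None and count > target]
```

Applied to the line on the slide, this flags levels 0, 1, and 2, with level 0 holding 966 SSTables against a target of 4. That matches the earlier point that compaction has difficulty catching up.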
  36. CLIENT
     Access C* from different languages
     [Diagram labels: Cassandra Cluster, Service, Service, Service]
  37. OPERATIONS
     Cluster expansion takes a long time: 15 days to bootstrap 30 nodes
  38. RECAP
     Improvements:
     • Proxy Node
     • Pending Ranges
     • Dynamic Snitch
     • Compaction
     • Big heap size
     • Nodetool rebuild range token
     Challenges:
     • P99 Read latency
     • Compaction
     • Tombstone
     • Gossip
     • Counter
     • Client
     • Cluster expansion
  39. QUESTIONS?
