CASSANDRA DAY SILICON VALLEY 2014 – APRIL 7TH – MATT JURIK
SCALING VIDEO PROGRESS TRACKING
MATT JURIK
SOFTWARE DEVELOPER
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Apache Cassandra

At Hulu, we deal with scaling our web services to meet the demands of an ever-growing number of users. During this talk, we will discuss our initial use case for Cassandra at Hulu: the video progress tracking service known as hugetop. While Cassandra provides a fantastic platform on which to build scalable applications, there are some dark corners of which to be cautious. We will provide a walkthrough of hugetop and some design decisions that went into the hugetop keyspace, our hardware choices, and our experiences operating Cassandra in a high-traffic environment.

Published in: Technology
  1. CASSANDRA DAY SILICON VALLEY 2014 – APRIL 7TH – MATT JURIK
      SCALING VIDEO PROGRESS TRACKING
  2. MATT JURIK
      SOFTWARE DEVELOPER
  3. WHAT IS HULU?
  4. HULU’S MISSION
      Help people find and enjoy the world’s premium content when, where and how they want it.
  5.
      •  Service Oriented Architecture
      •  Follow the Unix Philosophy
      •  Small services with specialized scopes
      •  Small teams focusing on specific areas
      •  Right tool for the job
      •  Many languages, frameworks, formats
      •  Cross team development encouraged
      •  If something you depend on needs fixing, feel free to fix it
  6. VIDEO PROGRESS TRACKING
      CODENAME: HUGETOP
  8. AGENDA
      •  Old architecture
      •  New architecture
      •  Keyspace design
      •  Migrating to Cassandra
      •  Operations
  9. OLD ARCHITECTURE (MYSQL)
      Diagram labels: HUGETOP (PYTHON); OTHER SERVICES, DEVICES, HULU.COM; 64 Redis Shards (Persistence-enabled); API (PYTHON); 8 MySQL Shards
  10. NEW ARCHITECTURE (C*)
      Diagram labels: HUGETOP (PYTHON); OTHER SERVICES, DEVICES, HULU.COM; 64 Redis Shards (Cache-only); CRAPI (JAVA); 8 Cassandra Nodes
  11. WHY SWITCH?
      The dilemma
      •  Unbounded data growth
      •  MySQL very stable, but servers running out of space
      •  “Manually resharding is fun!” – No one, ever
      Why Cassandra?
      •  Our data fits Cassandra’s data model well.
      •  Cassandra promises (and delivers) great scalability
      •  Highly available
      •  Multi-DC
  12. INTERACTION BETWEEN REDIS + CASSANDRA
      Diagram labels: HUGETOP; 64 Redis Shards (Cache-only); CRAPI; 8 Cassandra Nodes
      Video position updates
      1.  Write position info to Cassandra
      2.  Update Redis
      Video position requests
      Check Redis:
        If data is loaded in Redis, return it.
        Else:
          Fetch user’s history from Cassandra,
          Queue job to update Redis,
          Return data fetched from Cassandra.
      Redis
      •  Maintains complex indices
      •  Enriches data by simulating joins with Lua
      Cassandra
      •  Provides durability
      •  Replenishes Redis as necessary
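The update and request flows on slide 12 amount to a cache-aside pattern with Cassandra as the durable store. A minimal sketch follows, assuming plain dictionaries in place of Redis and Cassandra and a hypothetical `ProgressStore` class; none of these names come from the talk. It also assumes that Redis is updated on write only when the user's history is already cached, which the slide does not state.

```python
from collections import deque


class ProgressStore:
    """Toy model of slide 12; dicts stand in for Redis and Cassandra."""

    def __init__(self):
        self.cassandra = {}            # user_id -> {video_id: position}; durable
        self.redis = {}                # user_id -> {video_id: position}; cache only
        self.refresh_queue = deque()   # user_ids whose cache entry needs rebuilding

    def update_position(self, user_id, video_id, position):
        # 1. Write position info to Cassandra first, for durability.
        self.cassandra.setdefault(user_id, {})[video_id] = position
        # 2. Update Redis. Assumption (not on the slide): only touch the cache
        #    when the user's history is already loaded, so a partial history
        #    is never mistaken for a complete one.
        if user_id in self.redis:
            self.redis[user_id][video_id] = position

    def get_positions(self, user_id):
        cached = self.redis.get(user_id)
        if cached is not None:
            return dict(cached)                    # cache hit
        history = dict(self.cassandra.get(user_id, {}))
        self.refresh_queue.append(user_id)         # queue job to update Redis
        return history                             # return Cassandra's data now

    def run_refresh_jobs(self):
        # Background worker: replenish Redis from Cassandra.
        while self.refresh_queue:
            user_id = self.refresh_queue.popleft()
            self.redis[user_id] = dict(self.cassandra.get(user_id, {}))
```

Because the cache holds nothing that Cassandra does not, a Redis shard can be emptied and rebuilt lazily, which is consistent with the shards being labelled cache-only in the new architecture.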
  13. HARDWARE CONSIDERATIONS
      Take one
      •  Hadoop-class machines
      •  Physical boxes (i.e., no VMs)
      •  6 standard 7200 rpm drives
      •  32 GB RAM
      •  Leveled compaction + JBOD
      •  Write throughput :)
      •  Read latency :(
      Take two
      •  SSD-based machines
      •  Physical boxes (c-states disabled)
      •  550 GB RAID5
      •  48 GB RAM
      •  Leveled compaction
      •  Write throughput :)
      •  Read latency :)
      •  16 nodes split between 2 DCs
  14. KEYSPACE DESIGN
      •  Query last position for user=X, video=Y
      •  Query last position for user=X, video=*
      •  Daily log of all views needed by other services
      •  Two tables: one for updates; one for deletes.
      •  Shard data across rows
      •  TTL’d
      Copy 1
          CREATE TABLE views (
              u int,        -- User ID
              v int,        -- Video ID
              c boolean,    -- Is completed?
              p float,      -- Video position
              t timestamp,  -- Last viewed at
              -- ... other fields
              PRIMARY KEY (u, v)
          );
      Copy 2
          CREATE TABLE daily_user_views (
              s int,        -- Partition key
              u int,        -- User ID
              v int,        -- Video ID
              -- ... other fields
              PRIMARY KEY (s, u, v)
          );
  15. SHARDING!?
      •  Single row containing one day’s worth of data = too BIG + causes hotspots
      •  Fetching single row in parallel is slow
      •  Solution: shard each day across 128 rows
         => Spreads data across multiple nodes
         => Query multiple nodes in parallel
      Partition key
          userID % 128 + daysBetween(EPOCH, viewDate) * 128
      April 7th, 2014 (daysBetween(EPOCH, "April 7th, 2014") = 16167):
          for (int i = 0; i < 128; i++) {
              // i is the shard index; 16167 * 128 is the first key of that day
              int k = i % 128 + 16167 * 128;
              execute("SELECT * FROM daily_user_views WHERE s = " + k);
          }
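The partition-key arithmetic on slide 15 can be checked with a short sketch. It assumes `EPOCH` is the Unix epoch (1970-01-01), which is consistent with the slide's value of 16167 for April 7th, 2014; the function names are illustrative, not taken from hugetop.

```python
from datetime import date

SHARDS_PER_DAY = 128
EPOCH = date(1970, 1, 1)  # assumption: Unix epoch, which matches 16167 on the slide


def days_between(start, end):
    """Whole days from start to end."""
    return (end - start).days


def partition_key(user_id, view_date):
    """Key of the row that stores this user's views for the given day."""
    return user_id % SHARDS_PER_DAY + days_between(EPOCH, view_date) * SHARDS_PER_DAY


def partition_keys_for_day(view_date):
    """All 128 keys that together hold one day's views."""
    base = days_between(EPOCH, view_date) * SHARDS_PER_DAY
    return [base + shard for shard in range(SHARDS_PER_DAY)]
```

Each day owns a contiguous block of 128 keys, so keys from different days never collide, and a reader that wants a full day can issue the 128 single-partition queries in parallel.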
  16. MIGRATING FROM MYSQL → CASSANDRA
      Diagram labels: HUGETOP; MySQL; Cassandra
      1.  Read/write to MySQL
      2.  Duplicate writes+deletes to Cassandra
          - column timestamps = last_played_at date ← Critical for next step
          - apply deletions, but also temporarily store them in deletion_ledger
  17. MIGRATING FROM MYSQL → CASSANDRA
      Diagram labels: HUGETOP; MySQL; Cassandra
      3.  Backfill old data
          Again, write to Cassandra with column timestamp = last_viewed_at date
          (prevents old position from overwriting new position)
      4.  Replay deletions stored in deletion_ledger
          Just like inserts, you can specify a timestamp for deletions.
          column timestamp = time at which original deletion occurred
          (prevents deleting new data)
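The migration steps on slides 16 and 17 rely on Cassandra's last-write-wins rule: each write or delete carries a timestamp (in CQL, `USING TIMESTAMP`), and the newest one decides what a read returns. The sketch below is a simplified in-memory model of that rule, not Cassandra itself; the `Cell` class and the integer timestamps are assumptions made for illustration.

```python
class Cell:
    """One (user, video) position under a simplified last-write-wins model."""

    def __init__(self):
        self.value = None
        self.write_ts = -1    # timestamp of the newest write seen so far
        self.delete_ts = -1   # timestamp of the newest deletion seen so far

    def write(self, value, ts):
        # An older write never replaces a newer one, whatever the arrival order.
        if ts > self.write_ts:
            self.value, self.write_ts = value, ts

    def delete(self, ts):
        self.delete_ts = max(self.delete_ts, ts)

    def read(self):
        # Data is visible only if it was written after the newest deletion.
        return self.value if self.write_ts > self.delete_ts else None


# Step 2 then step 3: a live write (ts=200) lands before the backfilled row (ts=100).
position = Cell()
position.write(95.0, ts=200)   # duplicated live write
position.write(10.0, ts=100)   # backfill of the old MySQL row arrives later
backfill_result = position.read()

# Step 4: replaying a deletion at its original time (ts=150) removes only older data.
deleted = Cell()
deleted.write(10.0, ts=100)    # backfilled row for a position the user deleted
deleted.delete(ts=150)         # replayed from deletion_ledger
rewatched = Cell()
rewatched.write(10.0, ts=100)
rewatched.write(30.0, ts=200)  # user watched again after deleting
rewatched.delete(ts=150)
```

Because every operation carries the time of the original event, the backfill and the deletion replay can run in any order relative to live traffic and still converge to the same state.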
  18. OPERATIONS
      •  Use internal tool for automating repairs, backups, etc.
      •  Metrics
      •  Dump metrics to graphite via custom -javaagent which hooks into yammer metrics
      •  Implement a MetricPredicate to filter boring metrics
      •  High level monitoring (something is usually wrong if):
      •  d(hint count)/dt > 0
      •  Large number of old gen collections
      •  Lots of SSTables in L0 (and not importing data, bootstrapping, etc)
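The "d(hint count)/dt > 0" check on slide 18 can be approximated from two or more samples of the hint count. This is a rough sketch under the assumption that samples arrive as `(unix_seconds, hint_count)` pairs from whatever metrics pipeline is in use; the function names are hypothetical.

```python
def hint_growth_rate(samples):
    """Average change in hint count per second over the sampled window.

    samples: list of (unix_seconds, hint_count) pairs, oldest first.
    """
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (c1 - c0) / (t1 - t0)


def hints_accumulating(samples):
    """True when stored hints are growing, i.e. d(hint count)/dt > 0."""
    return hint_growth_rate(samples) > 0
```

A sustained positive rate suggests that some node is not accepting writes and its peers are buffering hints for it.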
  19. OPERATIONS
      •  SSTable Corruption
      •  nodetool scrub
      •  sstablescrub – if things are really bad
      •  Things to watch:
      •  Snapshots awesome, but can quickly burn disk space
      •  Keep nodes under 50% disk utilization, even if using Leveled Compaction.
  21. THANK YOU
  22. QUESTIONS?
