C* Summit EU 2013: Playlists at Spotify - Using Cassandra to Store Version Controlled Objects
 

Speaker: Jimmy Mårdell, Senior Software Engineer at Spotify
Video: http://www.youtube.com/watch?v=NQXxKzfv7Zo&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=17
All systems at Spotify have to deal with huge amounts of data. Playlists in particular are a unique challenge. We need to store more than one billion playlists and make them accessible not only to the playlist owner but also to subscribers. Furthermore, we need to handle concurrent changes to collaborative playlists and offline scenarios. The devised solution treats every playlist as a versioned object. We use Cassandra to store these objects efficiently, allowing fast read and write queries. The road there was not pain-free, however. I will talk about the data model we ended up using and the lessons learned along the way.

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Playlists at Spotify: Using Cassandra to store version controlled objects at large scale. Jimmy Mårdell <yarin@spotify.com>, #CassandraEU, October 18, 2013
    • Intro: About me • Jimmy Mårdell • Software Engineer • 3 years at Spotify
    • Intro: About Spotify • 24 million active users – 6 million paying subscribers • 4 000 servers in 4 data centers • Over 1 billion playlists created
    • Intro: Contents • Why version control? • Playlists at Spotify • Cassandra data model • Lessons learned
    • Why version control? What is version control? • “Version control is the management of changes to documents” (Wikipedia) • Stand-alone (most common) – Git, Subversion etc. • Embedded – Google Docs
    • Why version control? Embedded usage • Collaborative editing • Undo functionality • Performance • Business logic depends on document history
    • Playlists at Spotify: Playlists
    • Playlists at Spotify
    • Playlists at Spotify: Playlist challenges • More than 1 billion playlists • >40 000 requests/second at peak • Offline mode • Concurrent changes
    • Playlists at Spotify: Playlist client-server • Every playlist is a version controlled object • All playlists are synced on login – Fetch all new changes
    • Playlists at Spotify: Playlist client-server • Local queue of playlist modifications – Clients optimistically accept changes - fast UI • Queue flushed to server when possible – Offline changes – Fault tolerant
    • Playlists at Spotify: Playlist version control. Representation of a playlist in the backend (newest revision first, each followed by the resulting track list):
        – 3,038f...: REM(from=2, len=1) → A C
        – 2,19ca...: MOV(from=2, to=1, len=1) → A C B
        – 1,4ed2...: ADD(ix=0, track=A,B,C) → A B C
        – 0,ROOT
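
The revision chain on this slide can be replayed to recover each track list. The following is a minimal Python sketch, not Spotify's implementation: the operation names and arguments are taken from the slide, while the tuple encoding, the function names, and the reading of MOV's `to` as an index into the list after the moved tracks have been taken out are assumptions made for illustration.

```python
def apply_op(tracks, op):
    """Return a new track list with one ADD/MOV/REM operation applied."""
    kind, args = op
    result = list(tracks)
    if kind == "ADD":
        # Insert the given tracks at index ix.
        result[args["ix"]:args["ix"]] = args["track"]
    elif kind == "MOV":
        start, length = args["from"], args["len"]
        moved = result[start:start + length]
        del result[start:start + length]
        # Assumption: "to" indexes the list after the moved tracks were removed.
        result[args["to"]:args["to"]] = moved
    elif kind == "REM":
        del result[args["from"]:args["from"] + args["len"]]
    else:
        raise ValueError(f"unknown operation: {kind}")
    return result


def replay(ops):
    """Apply the operations oldest-first, starting from the empty ROOT playlist."""
    tracks = []
    for op in ops:
        tracks = apply_op(tracks, op)
    return tracks


# The three changes from the slide, oldest first.
HISTORY = [
    ("ADD", {"ix": 0, "track": ["A", "B", "C"]}),
    ("MOV", {"from": 2, "to": 1, "len": 1}),
    ("REM", {"from": 2, "len": 1}),
]

for n in range(1, len(HISTORY) + 1):
    print(n, replay(HISTORY[:n]))
```

Replaying one, two, and three changes gives `A B C`, `A C B`, and `A C`, matching the track lists shown next to each revision.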
    • Playlists at Spotify: Playlist branching • Concurrent changes – Offline • (diagram: branches A and B)
    • Playlists at Spotify: Playlist branching, merge • Concurrent changes – Offline • Conflict resolution – Operational Transformation • Clients oblivious of branches • (diagram: A and B merged via B’ and A’)
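
To give a feel for Operational Transformation, here is a minimal Python sketch covering only one case: two concurrent ADD operations on the same base playlist. It is not Spotify's algorithm, which the slides do not describe. The `(index, tracks)` encoding, the function names, and the tie-breaking flag are all assumptions for illustration.

```python
def apply_add(playlist, op):
    """Apply an ADD encoded as (index, tracks) and return the new list."""
    ix, tracks = op
    return playlist[:ix] + tracks + playlist[ix:]


def transform_add(op, other, op_wins_ties):
    """Rewrite ADD `op` so that it can be applied after the concurrent ADD `other`.

    If `other` inserted before op's position, op's index shifts right.
    When both target the same index, exactly one side must yield,
    which is what the (hypothetical) op_wins_ties flag decides.
    """
    ix, tracks = op
    other_ix, other_tracks = other
    if other_ix < ix or (other_ix == ix and not op_wins_ties):
        ix += len(other_tracks)
    return (ix, tracks)


base = ["A", "B", "C"]
a = (1, ["X"])        # branch A: insert X at index 1
b = (1, ["Y", "Z"])   # branch B: concurrently insert Y, Z at index 1

a_prime = transform_add(a, b, op_wins_ties=True)
b_prime = transform_add(b, a, op_wins_ties=False)

via_a = apply_add(apply_add(base, a), b_prime)   # A, then B'
via_b = apply_add(apply_add(base, b), a_prime)   # B, then A'
print(via_a == via_b, via_a)
```

Both orders end in the same playlist, which is the property a merge needs. A full implementation would also have to transform MOV and REM against each other.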
    • Cassandra data model
    • Cassandra data model: Cassandra at Spotify • Playlist was the first system to use Cassandra – Now we use it a lot... • Started with Cassandra 0.7 • Using a limited set of Cassandra features – No super columns – No CQL
    • Cassandra data model: Planning a data model • Start with the queries! • Three common playlist queries – SYNC: Get all changes since a particular revision – GET: Get the most recent snapshot – APPEND: Add/move/delete tracks
    • Cassandra data model: Playlist data model
        CF playlist_change
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – 1,4ed2...: parent=0,ROOT, op=ADD(ix=0, track=A,B,C)
            – 2,19ca...: parent=1,4ed2..., op=MOV(from=2, to=1, len=1)
            – 3,038f...: parent=2,19ca, op=REM(from=2, len=1)
    • Cassandra data model: Playlist data model
        CF playlist_change
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – 1,4ed2...: parent=0,ROOT, op=ADD(ix=0, track=A,B,C)
            – 2,19ca...: parent=1,4ed2..., op=MOV(from=2, to=1, len=1)
            – 3,038f...: parent=2,19ca, op=REM(from=2, len=1)
        – Row key spotify:user:yarin:playlist:4Pj4dCOEEYWDixfYyJwxEf
            – 1,8a20...: prnt=0,ROOT, op=...
            – 2,b783...: prnt=1,8a20..., op=...
            – 2,dd07...: prnt=1,8a20..., op=...
            – 3,39ef...: prnt=2,dd07..., op=...
            – 3,5a9c...: prnt=2,b783..., op=...
            – 4,03fc...: prnt=2,39ef..., prnt=3,5a9c...
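
Because the columns of a row are kept sorted by name, a SYNC request ("all changes since a particular revision") can be served as a slice of a single row. The Python sketch below models one playlist_change row as a sorted list. It assumes a linear history and ignores branches. The tuple column names reuse the truncated hashes exactly as they appear on the slide, and the `sync` function is a hypothetical illustration, not the real service code.

```python
import bisect

# One playlist_change row, modelled as columns sorted by name.
# Column name = (revision number, hash prefix as truncated on the slide).
ROW = [
    ((1, "4ed2"), {"parent": (0, "ROOT"), "op": "ADD(ix=0, track=A,B,C)"}),
    ((2, "19ca"), {"parent": (1, "4ed2"), "op": "MOV(from=2, to=1, len=1)"}),
    ((3, "038f"), {"parent": (2, "19ca"), "op": "REM(from=2, len=1)"}),
]


def sync(row, client_revision):
    """Return the columns stored after client_revision (a column slice)."""
    names = [name for name, _ in row]
    # Binary search works because the column names are sorted.
    start = bisect.bisect_right(names, client_revision)
    return row[start:]


for name, value in sync(ROW, (1, "4ed2")):
    print(name, value["op"])
```

A client that is at revision `1,4ed2...` receives only the MOV and REM changes, and a client at `0,ROOT` receives the whole row.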
    • Cassandra data model: Playlists in Cassandra • Which revision is the latest? – Changes with no children • Multiple heads possible! – Heads may appear anywhere within the row
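
"Changes with no children" can be computed directly from the parent pointers. A minimal Python sketch follows. The dictionary layout and the `find_heads` name are assumptions, and the revision ids are the truncated ones from the slides: the three changes of the first playlist, and the first three changes of the second.

```python
def find_heads(changes):
    """Return the revisions that no other change names as a parent.

    `changes` maps a revision id to the list of its parent revision ids.
    """
    referenced = {parent for parents in changes.values() for parent in parents}
    return sorted(rev for rev in changes if rev not in referenced)


LINEAR = {
    "1,4ed2": ["0,ROOT"],
    "2,19ca": ["1,4ed2"],
    "3,038f": ["2,19ca"],
}

BRANCHED = {
    "1,8a20": ["0,ROOT"],
    "2,b783": ["1,8a20"],
    "2,dd07": ["1,8a20"],
}

print(find_heads(LINEAR))    # a single head
print(find_heads(BRANCHED))  # two heads: both branches are unmerged
```

Finding heads this way requires reading every column in the row, which is costly for long histories.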
    • Cassandra data model: Playlist data model
        CF playlist_change
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – 1,4ed2...: prnt=0,ROOT, op=...
            – 2,19ca...: prnt=1,4ed2..., op=...
            – 3,038f...: prnt=2,19ca, op=...
        CF playlist_head
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – 3,038f...
    • Cassandra data model: Playlist data model
        CF playlist_change
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – 1,4ed2...: prnt=0,ROOT, op=...
            – 2,19ca...: prnt=1,4ed2..., op=...
            – 3,038f...: prnt=2,19ca, op=...
        – Row key spotify:user:yarin:playlist:4Pj4dCOEEYWDixfYyJwxEf
            – 1,8a20...: prnt=0,ROOT, op=...
            – 2,b783...: prnt=1,8a20..., op=...
            – 2,dd07...: prnt=1,8a20..., op=...
        CF playlist_head
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – 3,038f...
        – Row key spotify:user:yarin:playlist:4Pj4dCOEEYWDixfYyJwxEf
            – 2,b783...
            – 2,dd07...
    • Cassandra data model: Playlist data model
        CF playlist_change
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – 1,4ed2...: prnt=0,ROOT, op=...
            – 2,19ca...: prnt=1,4ed2..., op=...
            – 3,038f...: prnt=2,19ca, op=...
        – Row key spotify:user:yarin:playlist:4Pj4dCOEEYWDixfYyJwxEf
            – 1,8a20...: prnt=0,ROOT, op=...
            – 2,b783...: prnt=1,8a20, op=...
            – 2,dd07...: prnt=1,8a20, op=...
            – 3,39ef...: prnt=2,dd07, op=...
            – 3,5a9c...: prnt=2,b783, op=...
            – 4,03fc...: prnt=2,39ef, prnt=3,5a9c
        CF playlist_head
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – 3,038f...
        – Row key spotify:user:yarin:playlist:4Pj4dCOEEYWDixfYyJwxEf
            – 4,03fc...
    • Cassandra data model: Playlist heads • playlist_head is a small CF – Fits in RAM • 95% of playlist requests only read from playlist_head – Most playlists are already up-to-date
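
The fast path described here can be sketched as follows: compare the client's revision with the head first, and touch the change history only when they differ. This is an illustrative Python model, not Spotify's code. The `PlaylistStore` class, its attributes, and the single-head assumption are hypothetical. The read counter exists only to show which requests reach playlist_change.

```python
class PlaylistStore:
    """Toy stand-in for one playlist's rows in playlist_head and playlist_change."""

    def __init__(self, head, changes):
        self.head = head            # stands in for the playlist_head row (one head assumed)
        self.changes = changes      # stands in for the playlist_change row, oldest first
        self.change_reads = 0       # how often playlist_change had to be read

    def sync(self, client_revision):
        """Return the changes the client is missing."""
        if client_revision == self.head:
            # Up to date: playlist_head was the only read needed.
            return []
        self.change_reads += 1
        revisions = [rev for rev, _ in self.changes]
        if client_revision in revisions:
            start = revisions.index(client_revision) + 1
        else:
            start = 0               # unknown revision: send the full history
        return self.changes[start:]


store = PlaylistStore(
    head="3,038f",
    changes=[("1,4ed2", "ADD"), ("2,19ca", "MOV"), ("3,038f", "REM")],
)
print(store.sync("3,038f"), store.change_reads)
print(store.sync("1,4ed2"), store.change_reads)
```

Only the second request, from a client that is behind, reads the change history.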
    • Cassandra data model: Playlist snapshots • playlist_change works well when syncing playlists • Not so well for fetching new playlists – Snapshot cache
    • Cassandra data model: Playlist data model
        CF playlist_change
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – 1,4ed2...: prnt=0,ROOT, op=...
            – 2,19ca...: prnt=1,4ed2..., op=...
            – 3,038f...: prnt=2,19ca, op=...
        – Row key spotify:user:yarin:playlist:4Pj4dCOEEYWDixfYyJwxEf
            – 1,8a20...: prnt=0,ROOT, op=...
            – 2,b783...: prnt=1,8a20..., op=...
            – 2,dd07...: prnt=1,8a20..., op=...
        CF playlist_snapshot
        – Row key spotify:user:spotify:playlist:3ZgmfR6lsnCwdffZUan8EA
            – cache: version=3,038f..., contents=A,C
        – Row key spotify:user:yarin:playlist:4Pj4dCOEEYWDixfYyJwxEf
            – cache: version=2,b783..., contents=...
    • Cassandra data model: Updating playlists • Validate change – Locate snapshot – Client may append to old version • Update all tables – playlist_head last
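
The write order on this slide can be sketched as below. The three dictionary keys mirror the column family names from the slides, but the `append` function, its parameters, and the in-memory dictionaries are hypothetical. The rationale in the comments (readers consult playlist_head first, so it should never point at a change that is not stored yet) is an inference, not something the slides state.

```python
def append(store, parent, revision, op, contents):
    """Validate a change, then update all tables with playlist_head last."""
    # Validate: the parent must be ROOT or a change we already have.
    if parent != "0,ROOT" and parent not in store["playlist_change"]:
        raise ValueError(f"unknown parent revision: {parent}")

    # 1. Record the change itself.
    store["playlist_change"][revision] = {"parent": parent, "op": op}
    # 2. Refresh the snapshot cache (assumption: it always tracks the latest write).
    store["playlist_snapshot"] = {"version": revision, "contents": contents}
    # 3. Update the head last: the parent stops being a head, the new change becomes one.
    store["playlist_head"].discard(parent)
    store["playlist_head"].add(revision)


store = {"playlist_change": {}, "playlist_snapshot": None, "playlist_head": set()}
append(store, "0,ROOT", "1,4ed2", "ADD(ix=0, track=A,B,C)", ["A", "B", "C"])
append(store, "1,4ed2", "2,19ca", "MOV(from=2, to=1, len=1)", ["A", "C", "B"])
append(store, "2,19ca", "3,038f", "REM(from=2, len=1)", ["A", "C"])
print(store["playlist_head"], store["playlist_snapshot"])
```

Step 3 removes one value from playlist_head for every change applied.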
    • Cassandra data model: Cassandra consistency levels • Replication factor 3 • All writes using CL_QUORUM • Reads from playlist_head – CL_QUORUM • Reads from playlist_change and playlist_snapshot – CL_ONE but may fall back to CL_QUORUM
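
The arithmetic behind these choices: a quorum is a majority of the replicas, and a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor. A small Python sketch, with function names chosen for illustration:

```python
def quorum(replication_factor):
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(write_replicas, read_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    return write_replicas + read_replicas > replication_factor


RF = 3
Q = quorum(RF)

print("quorum size:", Q)
print("QUORUM write + QUORUM read:", read_sees_latest_write(Q, Q, RF))
print("QUORUM write + ONE read:   ", read_sees_latest_write(Q, 1, RF))
```

With a replication factor of 3, a quorum is 2 replicas. Quorum reads of playlist_head therefore always overlap the quorum writes, while a CL_ONE read of playlist_change or playlist_snapshot may hit a replica that has not received the latest write yet, which is consistent with the fallback to CL_QUORUM.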
    • Lessons learned
    • Lessons learned: Optimizations • Leveled compaction – Improved performance a lot • Compression – Not as impressive – CRC checks
    • Lessons learned: Optimizations • Trusted the Linux page cache to keep playlist_head in RAM – Didn’t work • Tried the Cassandra row cache – NO! • mlock to the rescue
    • Lessons learned: An enterprise ready solution • bash# while true; do vmtouch -m 10000000000 -l *head* & sleep 10m; kill %vmtouch; done
    • Lessons learned: No moving parts • Flash disks are awesome • Reduced size of cluster from 60 to 30 nodes – Thanks FusionIO! • IOPS no longer the bottleneck
    • Lessons learned: Tombstone hell • Noticed requests to playlist_head took several seconds – Huh? • Every change causes a value to be deleted in playlist_head • playlist_head is essentially a queue – Well-known anti-pattern
    • Lessons learned: Tombstone hell • We had rows with >500,000 tombstones • Solution: major compaction – Relatively fast since playlist_head is in RAM
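
Why a queue-like row gets slow can be shown with a toy model: a delete does not remove a column immediately but leaves a tombstone, and a read has to step over every tombstone until compaction purges them. The Python sketch below is a deliberate simplification and not how Cassandra is implemented. It ignores SSTables and the grace period that real tombstones must outlive before they can be purged, and the `HeadRow` class and its counter-style revision names are hypothetical.

```python
class HeadRow:
    """Toy model of one playlist_head row that keeps tombstones until compaction."""

    def __init__(self):
        self.cells = []  # list of (column name, is_tombstone)

    def set_head(self, old, new):
        """Replace head `old` with `new`: the delete leaves a tombstone behind."""
        if old is not None:
            self.cells = [(n, True) if n == old else (n, t) for n, t in self.cells]
        self.cells.append((new, False))

    def read_heads(self):
        """Return (live heads, number of cells scanned to find them)."""
        live = [name for name, is_tombstone in self.cells if not is_tombstone]
        return live, len(self.cells)

    def compact(self):
        """Major compaction in this toy model: drop every tombstone."""
        self.cells = [(n, t) for n, t in self.cells if not t]


row = HeadRow()
previous = None
for revision in range(1, 1001):      # 1000 consecutive changes to one playlist
    row.set_head(previous, str(revision))
    previous = str(revision)

print(row.read_heads())   # one live head, but 1000 cells scanned
row.compact()
print(row.read_heads())   # one live head, one cell scanned
```

After 1000 changes the row holds a single live head and 999 tombstones, so every read scans 1000 cells until compaction runs.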
    • Lessons learned: And more... • Large rows in playlist_change – Modify version graph • Reduce number of requests – Group playlists by owner • Sounds interesting? We’re hiring!
    • Questions?