MongoDB to Cassandra

An overview of the MetaBroadcast team's experience of moving from MongoDB to Cassandra.

Published in: Technology
Transcript of "MongoDB to Cassandra"

  1. MongoDB to Cassandra: The Atlas Odyssey
      • Fred van den Driessche, Engineer, @fredvdd
      • Tom McAdam, CTO, @tfm
      • Adam Horwich, Systems Engineer, @Mmmkayness
  2. http://flickr.com/photos/dhammza/88644497/
  3. Our platform - late 2012
      [Diagram: MetaBroadcast platform]
      • Video and audio metadata from 20+ sources
      • Profiles and activity from video and audio products, social networks
      • Analytic requests and groupings
      • tbc
      • tbc
  4. ?
  5. Main clients, Main Partners, Data Partners
  6. What is Atlas?
      [Diagram]
      • sources: BBC, PA, C4, etc...
      • ATLAS: DB, interlinking
      • outputs: /content, /schedules, /topics, sitemaps, radioplayer
  7. DEMO
  8. Atlas Data Model
      [Diagram: brand, item, series, version, broadcast, location]
  9. MongoDB
      • flexible
      • features
      • really simple
      • shell
  10. Where MongoDB falls short
      • too simple
      • lack of control
      • sharding
      • embedding
  11. Where to?
  12. Where to?
      • add a cache?
  13. Atlas API
      • content
          • http://atlas.metabroadcast.com/3.0/content.json?uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations&apiKey=6ed2a984627daff816198acde82
          • http://atlas.metabroadcast.com/3.0/content.json?apiKey=aaaa&uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations
      • schedules
          • http://atlas.metabroadcast.com/3.0/schedule.json?from=now&to=now.plus.3h&channel=bbcone&publisher=bbc.co.uk
          • http://atlas.metabroadcast.com/3.0/schedule.json?from=1948-12-24&to=1948-12-25&channel=radio4&publisher=bbc.co.uk
      • api explorer: http://atlas.metabroadcast.com/#apiExplorer
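The request URLs on the Atlas API slide follow a regular pattern, so they can be assembled programmatically. The following is a minimal sketch, not MetaBroadcast's client code: the class and method names (`AtlasUrls`, `content`, `schedule`) are hypothetical, and percent-encoding the `uri` parameter is an assumption (the slide passes it unencoded).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds Atlas 3.0 query URLs like those on the slide.
final class AtlasUrls {
    static final String BASE = "http://atlas.metabroadcast.com/3.0";

    // Percent-encodes a single query parameter value.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Content lookup by canonical URI, asking only for the listed annotations.
    static String content(String uri, List<String> annotations, String apiKey) {
        return BASE + "/content.json?uri=" + enc(uri)
                + "&annotations=" + String.join(",", annotations)
                + "&apiKey=" + enc(apiKey);
    }

    // Schedule lookup for one channel and publisher over a time window.
    static String schedule(String from, String to, String channel, String publisher) {
        return BASE + "/schedule.json?from=" + enc(from)
                + "&to=" + enc(to)
                + "&channel=" + enc(channel)
                + "&publisher=" + enc(publisher);
    }

    public static void main(String[] args) {
        System.out.println(schedule("now", "now.plus.3h", "bbcone", "bbc.co.uk"));
        System.out.println(content("http://www.bbc.co.uk/programmes/b0074g7p",
                Arrays.asList("description", "brand_summary", "locations"), "aaaa"));
    }
}
```

The `annotations` parameter is what lets a client pick which parts of a piece of content come back in the response.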
  15. Why Cassandra?
      • scalability/performance
      • row caches
      • consistency control
      • column-based model matches our use case
  16. And?
      • ElasticSearch
      • messaging
      • tooling: bootstraps
  17. What is Atlas?
      [Diagram]
      • BBC, PA, C4, etc...
      • Data ingest server
      • DB
      • Update bus
      • HTTP server
      • ES
  18. Data model
      • columns to model annotations
      • secondary indexes
          index.direct(keyspace, SEGMENT_URI_INDEX_CF, ConsistencyLevel.CL_QUORUM)
              .from(segment.getCanonicalUri())
              .to(segment.getIdentifier())
              .index()
              .execute(requestTimeout, TimeUnit.MILLISECONDS);
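The index call on the data model slide writes an entry from a segment's canonical URI to its identifier, so the segment can later be found by URI. As a rough illustration of that idea only, here is an in-memory sketch; it does not use Cassandra or Astyanax, the class `UriIndexSketch` is hypothetical, and treating the identifier as a `String` is an assumption.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for an index column family:
// row key = canonical URI, value = the identifier it points to.
final class UriIndexSketch {
    private final Map<String, String> uriToId = new ConcurrentHashMap<>();

    // Mirrors from(uri).to(id).index(): record where the URI points.
    void index(String canonicalUri, String identifier) {
        uriToId.put(canonicalUri, identifier);
    }

    // Resolve a canonical URI back to its identifier, if it was indexed.
    Optional<String> lookup(String canonicalUri) {
        return Optional.ofNullable(uriToId.get(canonicalUri));
    }
}
```

In a real cluster the write would also carry a consistency level, as the `CL_QUORUM` argument on the slide does.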
  19. ID generation
      • give external data our own ID on ingest
      • needs to be user-friendly: http://www.radiotimes.com/programme/cf2/eastenders
      • mongo: findAndModify()
      • solution: uses Astyanax client with its distributed locking
      • more details: http://metabroadcast.com/blog/let-cassandra-identify-your-data
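A short ID such as the `cf2` in the Radio Times URL can be produced by encoding a numeric counter in a compact alphabet. The sketch below shows only that encoding step; the alphabet (digits plus lowercase consonants) and the class name `IdCodecSketch` are assumptions, not Atlas's actual scheme. Allocating the counter value safely is the separate problem the slide addresses with findAndModify() in MongoDB and distributed locking in Cassandra.

```java
// Hypothetical codec turning a non-negative counter into a short, URL-friendly ID.
final class IdCodecSketch {
    // Assumed alphabet: digits and lowercase consonants, to make accidental words unlikely.
    static final String ALPHABET = "0123456789bcdfghjkmnpqrstvwxyz";
    static final int BASE = ALPHABET.length();

    static String encode(long id) {
        if (id < 0) throw new IllegalArgumentException("id must be non-negative");
        if (id == 0) return String.valueOf(ALPHABET.charAt(0));
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % BASE))); // least significant digit first
            id /= BASE;
        }
        return sb.reverse().toString();
    }

    static long decode(String encoded) {
        long id = 0;
        for (char c : encoded.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) throw new IllegalArgumentException("invalid character: " + c);
            id = id * BASE + digit;
        }
        return id;
    }
}
```

Because encoding is a pure function of the counter, IDs stay stable and can be decoded back to the stored numeric key.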
  20. Where we’re at
      • already live with some data
      • alpha release of schedule endpoint coming soon
      • later: roll out across other endpoints
  21. Ops
  22. Ops in Cassandra
      • we love Puppet
          • it’s great for automation and deployment
          • MongoDB: 1 file
          • Cassandra: 2 files!
      • oh... tokens
  23. Cassandra Tokens
      • define where data is written to in a cluster
      • therefore balanced tokens = balanced cluster
      • tokens should be rack aware
      • tools available to provide appropriate tokens for you
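To make "balanced tokens" concrete: with evenly spaced tokens, each node owns an equal slice of the ring. The sketch below assumes the RandomPartitioner (token range 0 to 2^127 - 1), a single datacentre, and one token per node; the slides do not say which partitioner Atlas uses, and the class name `BalancedTokens` is hypothetical. It also ignores rack placement.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Evenly spaced initial tokens, assuming RandomPartitioner's 0 .. 2^127 - 1 ring.
final class BalancedTokens {
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Token for node i of n is i * 2^127 / n.
    static List<BigInteger> forNodes(int nodeCount) {
        if (nodeCount < 1) throw new IllegalArgumentException("nodeCount must be >= 1");
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(RING_SIZE.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print one token per node for a 4 node cluster.
        for (BigInteger token : forNodes(4)) {
            System.out.println(token);
        }
    }
}
```

Adding or removing a node changes every evenly spaced position, which is why rebalancing means moving tokens.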
  24. Cassandra plays nicely with AWS
      • datacentre / rack aware
          • AWS Region = Datacentre
          • AWS Availability Zone = Rack
          • only recently introduced in MongoDB but simple to implement in Cassandra
      • horizontally (and vertically) scalable
  25. Monitoring
      • Nagios is a little threadbare for Cassandra
          • basic TCP service check
          • stats from API not very helpful
      • nodetool and CLI tools useful
          • manual effort to integrate them
      • if only there was some useful service...
  26. OpsCenter
      • wonderful for an overview
          • not so much for alerting ;)
      • ohai API
          • can integrate metrics into Nagios
  27. Disaster Recovery
      • we operate a 4 node cluster presently
      • replication factor of 3 with quorum read/writes
      • DR complicated by tokens
      • cluster should be balanced
      • snapshot + S3 Backups
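The replication figures on the disaster recovery slide can be checked with a little arithmetic. A quorum is a majority of replicas, floor(RF / 2) + 1, and a read is guaranteed to overlap the latest write when the read and write replica counts sum to more than RF. The sketch below is illustrative only; the class name `QuorumMath` is hypothetical.

```java
// Quorum arithmetic for a given replication factor (RF).
final class QuorumMath {
    // A quorum is a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Reads overlap the most recent write when R + W > RF.
    static boolean readsSeeLatestWrite(int replicationFactor, int readReplicas, int writeReplicas) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    // Replicas of a row that can be down while quorum operations still succeed.
    static int tolerableReplicaFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum = " + q);
        System.out.println("consistent = " + readsSeeLatestWrite(rf, q, q));
        System.out.println("tolerable failures = " + tolerableReplicaFailures(rf));
    }
}
```

With RF = 3, quorum reads and writes each touch 2 replicas, so one replica of any row can be unavailable without failing requests.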
  28. Cluster Happiness and Headaches
      • little maintenance overhead
      • cluster rebalancing
          • uncommon maintenance procedure
      • schema changes are cumbersome
          • little scope for rollback, can put cluster in unrecoverable state
  29. Summary
      • Mongo is good, Atlas has outgrown it
      • Cassandra isn’t a drop-in replacement
      • Ops more complex but so far so good
  30. Questions?