MongoDB to Cassandra

An overview of the MetaBroadcast team's experience of moving from MongoDB to Cassandra.

Published in: Technology

  1. MongoDB to Cassandra: The Atlas Odyssey
     • Fred van den Driessche, Engineer, @fredvdd
     • Tom McAdam, CTO, @tfm
     • Adam Horwich, Systems Engineer, @Mmmkayness
  2. http://flickr.com/photos/dhammza/88644497/
  3. Our platform - late 2012 (diagram)
     • MetaBroadcast platform
     • Video and audio metadata from 20+ sources
     • Profiles and activity from video and audio products, social networks
     • Analytic requests and groupings
     • tbc, tbc
  4. ?
  5. Main clients, main partners, data partners
  6. What is Atlas? (diagram)
     • BBC, PA, C4, etc...
     • ATLAS: DB, interlinking
     • /content, /schedules, /topics, sitemaps, radioplayer
  7. DEMO
  8. Atlas Data Model (diagram)
     • brand, item, series, version, broadcast, location
  9. MongoDB
     • flexible
     • features
     • really simple
     • shell
  10. Where MongoDB falls short
      • too simple
      • lack of control
      • sharding
      • embedding
  11. Where to?
  12. Where to?
      • add a cache?
  13. Atlas API
      • content
        • http://atlas.metabroadcast.com/3.0/content.json?uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations&apiKey=6ed2a984627daff816198acde82
        • http://atlas.metabroadcast.com/3.0/content.json?apiKey=aaaa&uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations
      • schedules
        • http://atlas.metabroadcast.com/3.0/schedule.json?from=now&to=now.plus.3h&channel=bbcone&publisher=bbc.co.uk
        • http://atlas.metabroadcast.com/3.0/schedule.json?from=1948-12-24&to=1948-12-25&channel=radio4&publisher=bbc.co.uk
      • api explorer: http://atlas.metabroadcast.com/#apiExplorer
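The content URLs on the Atlas API slide share one shape: an endpoint name followed by `uri`, `annotations` and `apiKey` parameters. A minimal sketch of assembling such a URL, assuming Java 11 or later. The `AtlasUrls` class and its `build` method are hypothetical names for illustration, and percent-encoding the values is an assumption: the slide shows them unencoded and does not say what the service requires.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds Atlas 3.0 query URLs shaped like those on the slide. */
class AtlasUrls {
    static final String BASE = "http://atlas.metabroadcast.com/3.0/";

    /** Joins form-encoded key=value pairs onto an endpoint such as "content.json". */
    static String build(String endpoint, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            // Assumption: values are encoded; the slide's URLs leave them raw.
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return BASE + endpoint + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("uri", "http://www.bbc.co.uk/programmes/b0074g7p");
        params.put("annotations", "description,brand_summary,locations");
        params.put("apiKey", "aaaa"); // placeholder key, as in the slide's second example
        System.out.println(build("content.json", params));
    }
}
```

A `LinkedHashMap` keeps the parameters in insertion order, so the generated URL reads the same way as the examples on the slide.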
  15. Why Cassandra?
      • scalability/performance
      • row caches
      • consistency control
      • column-based model matches our use case
  16. And?
      • ElasticSearch
      • messaging
      • tooling: bootstraps
  17. What is Atlas? (diagram)
      • BBC, PA, C4, etc...
      • Data ingest server
      • DB
      • Update bus
      • HTTP server
      • ES
  18. Data model
      • columns to model annotations
      • secondary indexes
        • index.direct(keyspace, SEGMENT_URI_INDEX_CF, ConsistencyLevel.CL_QUORUM).from(segment.getCanonicalUri()).to(segment.getIdentifier()).index().execute(requestTimeout, TimeUnit.MILLISECONDS);
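The two ideas on the data model slide, one column per annotation and a separate index row mapping a canonical URI to an identifier, can be illustrated without a cluster. The sketch below is an in-memory stand-in built on plain Java maps (Java 9 or later). It is not the Atlas code and does not use the Astyanax API, and the class, method and column names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for two column families; names here are illustrative only. */
class ContentStoreSketch {
    // "Content" column family: row key = content ID, one column per annotation.
    private final Map<String, Map<String, String>> contentRows = new HashMap<>();
    // "Index" column family: row key = canonical URI, value = content ID.
    private final Map<String, String> uriIndex = new HashMap<>();

    /** Writes the annotation columns and the index entry for one piece of content. */
    void write(String id, String canonicalUri, Map<String, String> annotationColumns) {
        contentRows.computeIfAbsent(id, k -> new HashMap<>()).putAll(annotationColumns);
        uriIndex.put(canonicalUri, id);
    }

    /** Resolves the URI through the index, then reads only the requested columns. */
    Map<String, String> readByUri(String canonicalUri, List<String> annotations) {
        Map<String, String> result = new LinkedHashMap<>();
        String id = uriIndex.get(canonicalUri);
        if (id == null) {
            return result; // unknown URI: nothing to read
        }
        Map<String, String> row = contentRows.get(id);
        for (String annotation : annotations) {
            if (row.containsKey(annotation)) {
                result.put(annotation, row.get(annotation));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ContentStoreSketch store = new ContentStoreSketch();
        store.write("cf2", "http://www.bbc.co.uk/programmes/b0074g7p",
                Map.of("description", "example description", "locations", "example location"));
        System.out.println(store.readByUri("http://www.bbc.co.uk/programmes/b0074g7p",
                List.of("description")));
    }
}
```

Splitting annotations into columns means a request for `description` alone touches one column rather than a whole document, which is the fit with the API's `annotations` parameter that the slide hints at.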
  19. ID generation
      • give external data our own ID on ingest
      • needs to be user-friendly: http://www.radiotimes.com/programme/cf2/eastenders
      • mongo: findAndModify()
      • solution: uses Astyanax client with its distributed locking
      • more details: http://metabroadcast.com/blog/let-cassandra-identify-your-data
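A short ID such as `cf2` is what a numeric counter looks like when written in a compact alphabet. The sketch below assumes base 36 (digits plus lowercase letters), which is a guess for illustration: the slide does not state the encoding Atlas uses. The `AtomicLong` is a single-process stand-in for the counter that the slide says is guarded by Astyanax's distributed locking; no Astyanax calls are shown.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a counter rendered as a short, URL-friendly string. */
class IdGeneratorSketch {
    // Stand-in for a counter protected by a distributed lock in a real cluster.
    private final AtomicLong counter;

    IdGeneratorSketch(long start) {
        this.counter = new AtomicLong(start);
    }

    /** Encodes a non-negative number in base 36 (assumed alphabet: 0-9, a-z). */
    static String encode(long value) {
        return Long.toString(value, 36);
    }

    /** Decodes a base-36 string back to its number. */
    static long decode(String id) {
        return Long.parseLong(id, 36);
    }

    /** Takes the next counter value atomically and returns it encoded. */
    String nextId() {
        return encode(counter.getAndIncrement());
    }

    public static void main(String[] args) {
        IdGeneratorSketch ids = new IdGeneratorSketch(16094);
        System.out.println(ids.nextId()); // prints "cf2"
        System.out.println(ids.nextId()); // prints "cf3"
    }
}
```

In MongoDB the same step is a single `findAndModify()` that increments and returns the counter. Cassandra has no equivalent single call in this design, which is why the slide reaches for a lock around the read-increment-write cycle.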
  20. Where we’re at
      • already live with some data
      • alpha release of schedule endpoint coming soon
      • later: roll out across other endpoints
  21. Ops
  22. Ops in Cassandra
      • we love Puppet
      • it’s great for automation and deployment
      • MongoDB: 1 file
      • Cassandra: 2 files!
      • oh... tokens
  23. Cassandra Tokens
      • define where data is written to in a cluster
      • therefore balanced tokens = balanced cluster
      • tokens should be rack aware
      • tools available to provide appropriate tokens for you
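"Balanced tokens" has a simple arithmetic meaning: the ring is divided into equal slices, one per node. The sketch below assumes the RandomPartitioner, whose token space runs from 0 to 2^127, so node i of N gets token i * 2^127 / N. The slides do not say which partitioner the cluster uses (Murmur3Partitioner has a different range), and the class name is hypothetical. Rack placement is not covered here.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Sketch: evenly spaced initial tokens, assuming RandomPartitioner's 0..2^127 range. */
class BalancedTokens {
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /** Returns one token per node: i * 2^127 / nodeCount for i = 0..nodeCount-1. */
    static List<BigInteger> forNodes(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        List<BigInteger> tokens = new ArrayList<>();
        BigInteger n = BigInteger.valueOf(nodeCount);
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(RING_SIZE.multiply(BigInteger.valueOf(i)).divide(n));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Tokens for a 4-node cluster, the size mentioned later in the talk.
        for (BigInteger token : forNodes(4)) {
            System.out.println(token);
        }
    }
}
```

Each value would go into that node's `initial_token` setting. This per-node value is the kind of detail that makes templated configuration harder for Cassandra than for a single shared file.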
  24. Cassandra plays nicely with AWS
      • datacentre / rack aware
      • AWS Region = Datacentre
      • AWS Availability Zone = Rack
      • only recently introduced in MongoDB but simple to implement in Cassandra
      • horizontally (and vertically) scalable
  25. Monitoring
      • Nagios is a little threadbare for Cassandra
      • basic TCP service check
      • stats from API not very helpful
      • nodetool and CLI tools useful
      • manual effort to integrate them
      • if only there was some useful service...
  26. OpsCenter
      • wonderful for an overview
      • not so much for alerting ;)
      • ohai API
      • can integrate metrics into Nagios
  27. Disaster Recovery
      • we operate a 4-node cluster presently
        • replication factor of 3 with quorum reads/writes
      • DR complicated by tokens
      • cluster should be balanced
      • snapshot + S3 backups
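The numbers on the disaster recovery slide follow from the usual quorum arithmetic: a quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the latest write when the read and write replica counts sum to more than RF. A small sketch of that arithmetic, with a hypothetical class name:

```java
/** Sketch of quorum arithmetic for a given replication factor (RF). */
class QuorumMath {
    /** Replicas that must respond for a QUORUM operation: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Replicas that can be unavailable while QUORUM operations still succeed. */
    static int toleratedFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    /** True when every read set must overlap every write set: R + W > RF. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println(q);                              // prints 2
        System.out.println(toleratedFailures(rf));          // prints 1
        System.out.println(readsSeeLatestWrite(q, q, rf));  // prints true
    }
}
```

With RF 3, quorum reads and writes each need two replicas, so the loss of one replica of any row leaves both reads and writes working.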
  28. Cluster Happiness and Headaches
      • little maintenance overhead
      • cluster rebalancing
        • uncommon maintenance procedure
      • schema changes are cumbersome
        • little scope for rollback, can put cluster in unrecoverable state
  29. Summary
      • Mongo is good, Atlas has outgrown it
      • Cassandra isn’t a drop-in replacement
      • Ops more complex but so far so good
  30. Questions?
