MongoDB to Cassandra
The Atlas Odyssey

Fred van den Driessche, Engineer (@fredvdd)
Tom McAdam, CTO (@tfm)
Adam Horwich, Systems Engineer (@Mmmkayness)
http://flickr.com/photos/dhammza/88644497/
Our platform - late 2012

[Diagram: the MetaBroadcast platform, drawing on video and audio metadata from 20+ sources; profiles and activity from video and audio products and social networks; and analytic requests and groupings. Two further components are marked "tbc".]
?
[Slide: main clients, main partners and data partners]
What is Atlas?

[Diagram: data sources (BBC, PA, C4, etc.) feed ATLAS and its DB, which power /content, /schedules, /topics, sitemaps, radioplayer and interlinking.]
DEMO
Atlas Data Model

[Diagram: entities brand, series, item, version, broadcast and location]
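The slide shows only the entity names. As a rough illustration, the sketch below models them as nested Java records, assuming the containment typical of broadcast metadata: a brand groups series, a series groups items (episodes), an item has one or more versions, and each version has broadcasts (scheduled transmissions) and locations (places it can be played). All field names are hypothetical and are not Atlas's actual classes.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical, simplified model of the six entities on the slide.
record Location(String uri) {}                      // where a version can be played
record Broadcast(String channel, Instant start) {}  // a scheduled transmission
record Version(List<Broadcast> broadcasts, List<Location> locations) {}
record Item(String uri, String title, List<Version> versions) {}
record Series(String uri, String title, List<Item> items) {}
record Brand(String uri, String title, List<Series> series) {}

class AtlasModelDemo {
    // Count every broadcast beneath a brand by walking the whole hierarchy.
    static long broadcastCount(Brand brand) {
        return brand.series().stream()
                .flatMap(s -> s.items().stream())
                .flatMap(i -> i.versions().stream())
                .mapToLong(v -> v.broadcasts().size())
                .sum();
    }

    public static void main(String[] args) {
        Version v = new Version(
                List.of(new Broadcast("bbcone", Instant.parse("2012-12-25T20:00:00Z"))),
                List.of(new Location("http://example.com/play/1")));
        Item episode = new Item("http://example.com/item/1", "Episode 1", List.of(v));
        Series series = new Series("http://example.com/series/1", "Series 1", List.of(episode));
        Brand brand = new Brand("http://example.com/brand/1", "Example Brand", List.of(series));
        System.out.println(broadcastCount(brand)); // prints 1
    }
}
```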
MongoDB


• flexible

• features

• really simple

• shell
Where MongoDB falls short


• too simple

• lack of control

• sharding

• embedding
Where to?




•   add a cache?
Atlas API
•   content

    •   http://atlas.metabroadcast.com/3.0/content.json?uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations&apiKey=6ed2a984627daff816198acde82

    •   http://atlas.metabroadcast.com/3.0/content.json?apiKey=aaaa&uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations

•   schedules

    •   http://atlas.metabroadcast.com/3.0/schedule.json?from=now&to=now.plus.3h&channel=bbcone&publisher=bbc.co.uk

    •   http://atlas.metabroadcast.com/3.0/schedule.json?from=1948-12-24&to=1948-12-25&channel=radio4&publisher=bbc.co.uk

•   API explorer: http://atlas.metabroadcast.com/#apiExplorer
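The example URLs above share one shape: a versioned base path, an endpoint such as `content.json` or `schedule.json`, and query parameters. A minimal sketch that assembles such a URL with percent-encoded values; the base URL and parameter names come from the examples, while the `AtlasUrls` helper is hypothetical and `YOUR_KEY` is a placeholder. The examples pass `uri` unencoded; encoding it (it contains `:` and `/`) is the safer choice, assuming the server decodes query strings in the standard way.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class AtlasUrls {
    static final String BASE = "http://atlas.metabroadcast.com/3.0/";

    // Build BASE + endpoint + "?k1=v1&k2=v2...", percent-encoding every value.
    static String build(String endpoint, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return BASE + endpoint + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();  // keeps parameters in insertion order
        p.put("uri", "http://www.bbc.co.uk/programmes/b0074g7p");
        p.put("annotations", "description,brand_summary,locations");
        p.put("apiKey", "YOUR_KEY");                     // placeholder, not a real key
        System.out.println(build("content.json", p));
    }
}
```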
Why Cassandra?


• scalability/performance

• row caches

• consistency control

• column-based model matches our use case
And?



• ElasticSearch

• messaging

• tooling: bootstraps
What is Atlas?

[Diagram: data sources (BBC, PA, C4, etc.); components: data ingest server, DB, update bus, HTTP server, ES (Elasticsearch)]
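The diagram names the components but not how they talk. One way to picture the update bus is as a queue of change notifications between ingest and the ES indexer. The in-process sketch below assumes the ingest server publishes the ID of each item it writes and an indexer re-reads and indexes it; the maps standing in for the DB and ES, the queue standing in for the bus, and all names are hypothetical. A real deployment would use a message broker and run the indexer on its own thread or process.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class UpdateBusSketch {
    final Map<String, String> db = new ConcurrentHashMap<>();          // stands in for Cassandra
    final Map<String, String> searchIndex = new ConcurrentHashMap<>(); // stands in for ES
    final BlockingQueue<String> bus = new LinkedBlockingQueue<>();     // stands in for the update bus

    // Data ingest: write to the store first, then announce the change on the bus.
    void ingest(String id, String title) {
        db.put(id, title);
        bus.add(id); // unbounded queue, so add() never blocks or fails
    }

    // Indexer: drain pending notifications and index the current stored value of each item.
    int drainAndIndex() {
        int indexed = 0;
        String id;
        while ((id = bus.poll()) != null) {
            searchIndex.put(id, db.get(id).toLowerCase()); // crude normalisation for search
            indexed++;
        }
        return indexed;
    }

    public static void main(String[] args) {
        UpdateBusSketch s = new UpdateBusSketch();
        s.ingest("cf2", "EastEnders");
        System.out.println(s.drainAndIndex() + " " + s.searchIndex); // prints 1 {cf2=eastenders}
    }
}
```

Publishing only the ID, rather than the whole record, keeps the store as the single source of truth: the indexer always reads the latest value, even if several updates to the same item queue up.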
Data model
•   columns to model annotations




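•   secondary indexes, e.g. mapping a segment's canonical URI to its ID:

        index.direct(keyspace, SEGMENT_URI_INDEX_CF, ConsistencyLevel.CL_QUORUM)
             .from(segment.getCanonicalUri())   // index key: the external canonical URI
             .to(segment.getIdentifier())       // indexed value: our own ID
             .index()
             .execute(requestTimeout, TimeUnit.MILLISECONDS);  // write at QUORUM, bounded by a timeout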
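An in-memory illustration of the two ideas on this slide, assuming one row per piece of content with one column per annotation (annotation names taken from the API examples) and a separate index mapping canonical URI to ID, which is what the `index.direct(...)` call above appears to write. Sorted maps stand in for column families; none of this is Astyanax or Atlas code, and `ColumnModelSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class ColumnModelSketch {
    // "content" column family: row key (ID) -> (column name -> serialized annotation)
    final Map<String, SortedMap<String, String>> content = new TreeMap<>();
    // index column family: canonical URI -> ID
    final Map<String, String> uriIndex = new TreeMap<>();

    // Write the annotation columns and keep the URI index in step with the row.
    void write(String id, String canonicalUri, Map<String, String> annotations) {
        content.computeIfAbsent(id, k -> new TreeMap<>()).putAll(annotations);
        uriIndex.put(canonicalUri, id);
    }

    // Resolve a URI via the index, then read only the requested annotation columns.
    Map<String, String> read(String canonicalUri, List<String> annotations) {
        String id = uriIndex.get(canonicalUri);
        if (id == null) return Map.of();
        SortedMap<String, String> row = content.get(id);
        SortedMap<String, String> slice = new TreeMap<>();
        for (String a : annotations) {
            if (row.containsKey(a)) slice.put(a, row.get(a));
        }
        return slice;
    }

    public static void main(String[] args) {
        ColumnModelSketch s = new ColumnModelSketch();
        s.write("id1", "http://www.bbc.co.uk/programmes/b0074g7p",
                Map.of("description", "desc-json", "brand_summary", "brand-json", "locations", "loc-json"));
        System.out.println(s.read("http://www.bbc.co.uk/programmes/b0074g7p",
                List.of("description", "locations")));
    }
}
```

Storing each annotation as its own column means a request for `annotations=description,locations` touches only those columns, rather than loading and trimming a whole document.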
ID generation
• give external data our own ID on ingest

• needs to be user-friendly:
  http://www.radiotimes.com/programme/cf2/eastenders

• mongo: findAndModify()

• solution: use the Astyanax client's distributed locking

• more details: http://metabroadcast.com/blog/let-cassandra-identify-your-data
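The slide gives `cf2` as an example of a short, user-friendly ID. A minimal sketch of one general approach, assuming IDs come from an incrementing counter encoded into a compact alphabet. The alphabet is hypothetical (lowercase consonants plus digits; dropping vowels is one plausible way to stop IDs spelling words), and a local `ReentrantLock` with an in-memory counter stands in for Astyanax's distributed lock and a counter stored in Cassandra, which together make the increment safe across several ingest servers.

```java
import java.util.concurrent.locks.ReentrantLock;

class IdGenerator {
    // Hypothetical alphabet: 21 lowercase consonants + 10 digits = base 31.
    static final String ALPHABET = "bcdfghjklmnpqrstvwxyz0123456789";

    private final ReentrantLock lock = new ReentrantLock(); // stand-in for a distributed lock
    private long counter;                                    // stand-in for a counter stored in Cassandra

    IdGenerator(long start) { this.counter = start; }

    // Take the lock, read-and-increment the counter, release, then encode outside the lock.
    String next() {
        long id;
        lock.lock();
        try {
            id = counter++;
        } finally {
            lock.unlock();
        }
        return encode(id);
    }

    // Base-N encoding of a non-negative number using ALPHABET, most significant digit first.
    static String encode(long n) {
        if (n < 0) throw new IllegalArgumentException("negative id: " + n);
        int base = ALPHABET.length();
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (n % base)));
            n /= base;
        } while (n > 0);
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        IdGenerator gen = new IdGenerator(1000);
        System.out.println(gen.next() + " " + gen.next());
    }
}
```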
Where we’re at



• already live with some data

• alpha release of schedule endpoint coming soon

• later: roll out across other endpoints
Ops
Ops in Cassandra

•   we love Puppet
    •   it’s great for automation and deployment

    •   MongoDB: 1 file

    •   Cassandra: 2 files!

•   oh... tokens
Cassandra Tokens

•   tokens define where in the cluster data is written

•   therefore balanced tokens = balanced cluster

•   tokens should be rack aware
    •   tools are available to calculate appropriate tokens for you
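For a cluster of N nodes using Cassandra's RandomPartitioner (token range 0 to 2^127; which partitioner this deployment used is an assumption), evenly spaced initial tokens are token_i = i × 2^127 / N. The sketch computes them with `BigInteger` and, as one common way to make the ring rack aware, interleaves nodes from each rack so that neighbouring tokens fall in different racks. Node and rack names are hypothetical; the tools mentioned above do this for you.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TokenCalculator {
    // RandomPartitioner tokens range over [0, 2^127).
    static final BigInteger RING = BigInteger.TWO.pow(127);

    // Evenly spaced tokens: token_i = i * 2^127 / n.
    static List<BigInteger> balancedTokens(int n) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            tokens.add(RING.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
        }
        return tokens;
    }

    // Take one node from each rack in turn so adjacent ring positions alternate racks.
    static List<String> rackAwareOrder(Map<String, List<String>> nodesByRack) {
        List<String> order = new ArrayList<>();
        int max = nodesByRack.values().stream().mapToInt(List::size).max().orElse(0);
        for (int i = 0; i < max; i++) {
            for (List<String> nodes : nodesByRack.values()) {
                if (i < nodes.size()) order.add(nodes.get(i));
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> racks = new LinkedHashMap<>(); // hypothetical racks and nodes
        racks.put("eu-west-1a", List.of("node1", "node2"));
        racks.put("eu-west-1b", List.of("node3", "node4"));
        List<String> nodes = rackAwareOrder(racks);
        List<BigInteger> tokens = balancedTokens(nodes.size());
        for (int i = 0; i < nodes.size(); i++) {
            System.out.println(nodes.get(i) + " initial_token: " + tokens.get(i));
        }
    }
}
```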
Cassandra plays nicely with AWS

•   datacentre / rack aware
    •   AWS Region = Datacentre

    •   AWS Availability Zone = Rack

•   only recently introduced in MongoDB, but simple to implement in Cassandra

•   horizontally (and vertically) scalable
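A small illustration of the mapping above: derive a datacentre name from an EC2 availability zone by dropping the trailing zone letter (giving the region), and use the zone itself as the rack. This is a simplified, hypothetical mapping; Cassandra's `Ec2Snitch` performs this derivation itself with its own naming rules, so in practice you configure the snitch rather than write code like this.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class Ec2Topology {
    // Simplified: "eu-west-1a" -> region "eu-west-1", used as the datacentre name.
    static String datacentre(String az) {
        requireZone(az);
        return az.substring(0, az.length() - 1);
    }

    // Simplified: the availability zone itself is used as the rack name.
    static String rack(String az) {
        requireZone(az);
        return az;
    }

    // Accept names shaped like "eu-west-1a" or "ap-southeast-2b".
    private static void requireZone(String az) {
        if (!az.matches("[a-z]+(-[a-z]+)+-\\d+[a-z]")) {
            throw new IllegalArgumentException("not an availability zone: " + az);
        }
    }

    // Group the zones of all nodes into datacentre -> set of racks.
    static Map<String, Set<String>> topology(Iterable<String> nodeZones) {
        Map<String, Set<String>> dcs = new TreeMap<>();
        for (String az : nodeZones) {
            dcs.computeIfAbsent(datacentre(az), k -> new TreeSet<>()).add(rack(az));
        }
        return dcs;
    }

    public static void main(String[] args) {
        System.out.println(topology(java.util.List.of("eu-west-1a", "eu-west-1b", "eu-west-1a", "us-east-1c")));
        // prints {eu-west-1=[eu-west-1a, eu-west-1b], us-east-1=[us-east-1c]}
    }
}
```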
Monitoring

•   Nagios is a little threadbare for Cassandra
    •   basic TCP service check

    •   stats from the API are not very helpful

•   nodetool and CLI tools are useful
    •   but take manual effort to integrate

•   if only there was some useful service...
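The "basic TCP service check" amounts to asking whether a node accepts connections on its client port. A minimal sketch written as a Nagios-style plugin, which reports through its exit status (0 OK, 2 CRITICAL, 3 UNKNOWN, per the Nagios plugin convention). The default port 9160 (Cassandra's Thrift client port in releases of that era), the 2-second timeout and the class name are assumptions. As the slide says, a check like this tells you little about whether the node is actually healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class CheckCassandraTcp {
    static final int OK = 0, CRITICAL = 2, UNKNOWN = 3; // Nagios plugin exit codes

    // OK if a TCP connection to host:port succeeds within timeoutMs, otherwise CRITICAL.
    static int check(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return OK;
        } catch (IOException e) { // refused, timed out, or unknown host
            return CRITICAL;
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("UNKNOWN - usage: CheckCassandraTcp <host> [port]");
            System.exit(UNKNOWN);
        }
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9160; // assumed Thrift port
        int status = check(args[0], port, 2000);
        System.out.println((status == OK ? "OK" : "CRITICAL") + " - " + args[0] + ":" + port);
        System.exit(status);
    }
}
```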
OpsCenter

•   wonderful for an overview
    •   not so much for alerting ;)

•   ohai API
    •   can integrate metrics into Nagios
Disaster Recovery

•   we currently operate a 4-node cluster
    •   replication factor of 3 with quorum reads/writes

•   DR is complicated by tokens

•   cluster should be balanced

•   snapshots + S3 backups
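The arithmetic behind "replication factor of 3 with quorum reads/writes": a quorum is floor(RF/2) + 1 replicas, so with RF = 3 both reads and writes need 2 replicas. Because R + W = 4 > RF = 3, every read quorum overlaps every write quorum, so a quorum read sees the latest quorum write, and each operation still succeeds with one replica down. A small sketch of that calculation (the class and method names are illustrative):

```java
class QuorumMath {
    // QUORUM: a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Read and write replica sets must intersect when R + W > RF.
    static boolean readsSeeWrites(int r, int w, int replicationFactor) {
        return r + w > replicationFactor;
    }

    // Replicas that may be unavailable while an operation needing `required` replicas succeeds.
    static int toleratedFailures(int required, int replicationFactor) {
        return replicationFactor - required;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum=" + q
                + " overlap=" + readsSeeWrites(q, q, rf)
                + " tolerated=" + toleratedFailures(q, rf)); // quorum=2 overlap=true tolerated=1
    }
}
```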
Cluster Happiness and Headaches

•   little maintenance overhead

•   cluster rebalancing
    •   uncommon maintenance procedure

•   schema changes are cumbersome
    •   little scope for rollback; can put the cluster in an unrecoverable state
Summary



• Mongo is good, Atlas has outgrown it

• Cassandra isn’t a drop-in replacement

• Ops more complex but so far so good
Questions?
