How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Invent 2018

GumGum recently moved to Amazon DynamoDB from Apache Cassandra. In this session, we discuss the architecture and design decisions made in the process, including comparisons of different NoSQL database options. We also share the justifications and steps taken to plan and complete the migration. Finally, we cover the benefits and outcome of the migration, including a performance boost, cost savings, and reduced maintenance.

  1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How GumGum Migrated from Cassandra to Amazon DynamoDB. Anirban Roy, Lead Engineer, GumGum. DAT345
  2. Agenda • Introduction • Background • Alternatives and comparison • About the data • Migration strategy • Observations and benefits • Q&A
  6. Introduction: Background • High traffic with surges: 90% of our traffic involves our programmatic partners • Low response time: maintaining low latency is key to revenue
  7. Introduction: The problem • Apache Cassandra: we used to run 106 nodes of i3.2xlarge instances on AWS • Scaling: required adding nodes to the cluster manually
  8. Introduction: The problem • Data center outages • Revenue loss • Engineering fatigue
  9. Alternatives
  10. Alternatives: more than 225 available (source: nosql-database.org)
  11. Alternatives: GumGum's blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra
  12. Benchmarking: DynamoDB • Benchmarked with YCSB • Loaded ~20 million items (~22 GB). Apache Cassandra • Achieved ~125,000 reads per second and ~40,000 writes per second • ~3-5ms read latency. Sources: GumGum's blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra; YCSB: https://github.com/brianfrankcooper/YCSB
  14. Behavioral targeting data • DMP partners • DSP partners • Cookie syncing • 30-day TTL
  15. Contextual targeting data • Image URL: 30 days to one year TTL for images • Page URL: seven days to one year TTL for pages • GumGum Metadata Store (replicated across all four GumGum data centers) • Diagram labels: GumGum TaPas (NLP), GumGum Vertex (CV), ECS spot instances, Vertex spot nodes, images_metadata, pages_metadata
  17. Behavioral targeting data migration. The migration involved the following: • Data volume is considerably bigger • No ETL operation required for the migration • WRITE -> WAIT -> READ approach • Exploit the fact that the TTL is short (30 days: the WAIT phase) • Diagram labels: ad servers, Visitors keyspace, visitors
  18. Contextual targeting data migration: Extract data -> Transform data -> Load data • Source: Cassandra keyspace (images_metadata, pages_metadata) • Target: images_metadata, pages_metadata
  20. Caching: DAX or Memcached • When using DAX (only with DynamoDB): GumGum ad servers -> AWS DAX nodes • When using Memcached: GumGum ad servers -> Memcached nodes -> NoSQL store
  22. Data replication requirements • Behavioral targeting: data must be replicated between the US East and US West data centers; global replication is not required • Contextual targeting: data must be replicated across all four GumGum data centers; Global Tables was used to achieve this • During development for the behavioral targeting data, this replication was not yet supported by DynamoDB
  23. Data replication architecture: Master-Master • We modified dynamodb-cross-region-library to perform master-master replication; the changes can be found at https://github.com/awslabs/dynamodb-cross-region-library/pull/53 • Diagram: AWS Cloud with a VPC in AWS Region US East 1 and a VPC in AWS Region US West 2, each running an Auto Scaling group of replicators
  25. Benefits: Performance • 4-5ms read latency • No throttles • Zero outages so far • Fewer timeouts than Cassandra
  26. Benefits: Cost • Cassandra hosting cost: 80 i3.2xlarge instances; total annual hosting cost = $0.624/hour x 24 x 365 x 80 = $437,299.20 USD • DynamoDB running cost: per month = ~$450 x 30 days = ~$13,500 USD; estimated annual cost = $13,500 x 12 = $162,000 USD • % saving = {(437,299.2 - 162,000) x 100} / 437,299.2 = 62.95% • 65-70%
  27. Operational stats • 2 TB of data • 16.2 billion items • ~8 million reads per minute • All at <3ms read latency
  29. But wait, there's more about DynamoDB: a list of all DynamoDB sessions, workshops, and chalk talks • Migrating Apache Cassandra to DynamoDB • What's new with DynamoDB • Purpose-built databases in AWS • DynamoDB service level agreement • Adaptive capacity • Point-in-time recovery (PITR) • Global tables
  30. Thank you! Anirban Roy, LinkedIn: anirban51roy

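Slide 14 lists a 30-day TTL for behavioral targeting data. In DynamoDB, TTL is enabled per table on a Number attribute that holds an expiry time in Unix epoch seconds, and expired items are deleted in the background rather than at the exact second. The sketch below stamps such an attribute onto a record; the attribute name `expires_at`, the `with_ttl` helper, and the record fields are hypothetical, not GumGum's schema.

```python
import time
from typing import Optional

# Hypothetical attribute name: DynamoDB TTL can be enabled on any Number
# attribute that holds a Unix epoch timestamp in seconds.
TTL_ATTRIBUTE = "expires_at"
BEHAVIORAL_TTL_DAYS = 30  # the 30-day TTL from slide 14
SECONDS_PER_DAY = 24 * 60 * 60


def with_ttl(item: dict, ttl_days: int, now: Optional[float] = None) -> dict:
    """Return a copy of `item` with an epoch-seconds expiry attribute added."""
    if now is None:
        now = time.time()
    return {**item, TTL_ATTRIBUTE: int(now) + ttl_days * SECONDS_PER_DAY}


# Hypothetical behavioral-targeting record keyed by a visitor/cookie ID.
visit = with_ttl({"visitor_id": "abc123", "segments": ["auto", "travel"]},
                 BEHAVIORAL_TTL_DAYS, now=1_700_000_000)
print(visit[TTL_ATTRIBUTE])
```

Passing `now` explicitly keeps the helper deterministic for testing; in normal use it defaults to the current time.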
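Slide 15 gives TTL ranges for contextual data: 30 days to one year for images and seven days to one year for pages. A hypothetical helper that keeps a requested TTL inside those ranges could look like this; treating "one year" as 365 days and the `clamp_ttl` name are assumptions.

```python
from datetime import timedelta

# TTL ranges from slide 15; "one year" is taken as 365 days (an assumption).
TTL_BOUNDS = {
    "image": (timedelta(days=30), timedelta(days=365)),
    "page": (timedelta(days=7), timedelta(days=365)),
}


def clamp_ttl(kind: str, requested: timedelta) -> timedelta:
    """Hypothetical helper: force a requested TTL into the allowed range for `kind`."""
    low, high = TTL_BOUNDS[kind]
    return max(low, min(requested, high))


print(clamp_ttl("image", timedelta(days=3)).days)
print(clamp_ttl("page", timedelta(days=3)).days)
print(clamp_ttl("page", timedelta(days=900)).days)
```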
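The WRITE -> WAIT -> READ approach on slide 17 avoids an ETL step: new writes go to both stores, reads stay on Cassandra until one full TTL (30 days) has passed, and after that every record written before dual writes began has expired, so the new store holds all live data. The class below is a minimal in-memory sketch of that idea; dicts stand in for Cassandra and DynamoDB, and the class name, method names, and dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional

BEHAVIORAL_TTL = timedelta(days=30)  # the 30-day TTL from slides 14 and 17


class DualWriteMigration:
    """Hypothetical WRITE -> WAIT -> READ helper; dicts stand in for both stores."""

    def __init__(self, write_start: datetime, ttl: timedelta = BEHAVIORAL_TTL):
        self.old_store: Dict[str, Any] = {}  # stands in for the Cassandra keyspace
        self.new_store: Dict[str, Any] = {}  # stands in for the DynamoDB table
        # After one full TTL, anything written before dual writes began has
        # expired, so the new store alone holds every live record.
        self.cutover = write_start + ttl

    def write(self, key: str, value: Any) -> None:
        # WRITE: from write_start onward, every write goes to both stores.
        self.old_store[key] = value
        self.new_store[key] = value

    def read(self, key: str, now: datetime) -> Optional[Any]:
        # WAIT: keep reading the old store until the cutover time.
        # READ: from the cutover onward, read only the new store.
        store = self.new_store if now >= self.cutover else self.old_store
        return store.get(key)


start = datetime(2018, 1, 1)  # hypothetical date when dual writes began
migration = DualWriteMigration(start)
migration.old_store["legacy"] = "written before dual writes"
migration.write("v1", ["auto", "travel"])
print(migration.read("legacy", start + timedelta(days=1)))
print(migration.read("v1", start + timedelta(days=31)))
```

The sketch does not expire anything itself; it relies on the TTL guarantee that a pre-migration record would be gone by the cutover, which is why a read of `"legacy"` after day 30 simply finds nothing in the new store.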
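Slide 18 migrates contextual data with extract, transform, and load rather than the WRITE -> WAIT -> READ approach, plausibly because those TTLs run up to a year. One detail a transform step has to handle is carrying each row's remaining lifetime over to DynamoDB's absolute epoch-seconds TTL. The sketch assumes the extract query selected CQL's `TTL(metadata)` (seconds left, or null when no TTL is set) as `ttl_remaining`; the column names and helpers are hypothetical, and a real load step might batch the resulting items with boto3's `Table.batch_writer()`.

```python
from typing import Iterable, Iterator, Optional


def transform(row: dict, now_epoch: int) -> Optional[dict]:
    """Map one extracted Cassandra row to a DynamoDB-style item (hypothetical schema).

    `ttl_remaining` is assumed to come from `SELECT ..., TTL(metadata) AS ttl_remaining`,
    i.e. seconds until Cassandra would expire the cell, or None if it has no TTL.
    """
    remaining = row.get("ttl_remaining")
    if remaining is not None and remaining <= 0:
        return None  # defensive: nothing left to migrate
    item = {"url": row["url"], "metadata": row["metadata"]}
    if remaining is not None:
        # DynamoDB TTL expects an absolute expiry time in epoch seconds.
        item["expires_at"] = now_epoch + remaining
    return item


def extract_transform(rows: Iterable[dict], now_epoch: int) -> Iterator[dict]:
    """Yield loadable items, skipping rows that have already expired."""
    for row in rows:
        item = transform(row, now_epoch)
        if item is not None:
            yield item


rows = [
    {"url": "https://example.com/a.jpg", "metadata": '{"labels": ["car"]}', "ttl_remaining": 86_400},
    {"url": "https://example.com/b.html", "metadata": '{"topics": ["travel"]}', "ttl_remaining": None},
    {"url": "https://example.com/c.jpg", "metadata": "{}", "ttl_remaining": 0},
]
items = list(extract_transform(rows, now_epoch=1_700_000_000))
```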
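Slide 20 contrasts two caching layouts. DAX is API-compatible with DynamoDB and caches items transparently, whereas with Memcached the ad servers manage the cache themselves, typically with a cache-aside pattern: check the cache, fall back to the NoSQL store on a miss, then populate the cache. The class below is a minimal in-process sketch of cache-aside; a dict stands in for Memcached, and the TTL, names, and injectable clock are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Cache-aside reads in front of a slower store (a dict stands in for Memcached)."""

    def __init__(self, load: Callable[[str], Optional[dict]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load            # fetch from the backing NoSQL store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[dict]]] = {}
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]         # fresh hit: no store access
        self.misses += 1
        value = self._load(key)      # miss or stale entry: go to the store
        # Misses (None) are cached too, so absent keys don't hammer the store.
        self._entries[key] = (now + self._ttl, value)
        return value


fake_now = [0.0]
backing_store = {"page:1": {"topic": "sports"}}
cache = CacheAside(backing_store.get, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("page:1")
cache.get("page:1")
fake_now[0] = 61.0
cache.get("page:1")
```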
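Slide 23 describes master-master replication built on a modified dynamodb-cross-region-library. The core problem in any two-way replication is that a replicated write shows up in the target table's stream and would be copied back again. The sketch below shows one generic way to break that loop and resolve conflicts (an origin-region tag plus last-writer-wins on a timestamp); it is not the approach from the linked pull request, and the `origin_region` and `updated_at` attributes are hypothetical. Deletes and tombstones are omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StreamRecord:
    key: str
    new_image: dict       # item attributes after the write (deletes omitted)
    origin_region: str    # hypothetical: region where the write first happened
    updated_at: float     # hypothetical: writer-supplied timestamp


class Replicator:
    """Applies change records from one region's stream to another region's table."""

    def __init__(self, target_region: str, target_table: Dict[str, dict]):
        self.target_region = target_region
        self.table = target_table  # a dict stands in for the DynamoDB table

    def apply(self, rec: StreamRecord) -> bool:
        # A record that originated in the target region is the echo of a write
        # already present there; copying it back would ping-pong forever.
        if rec.origin_region == self.target_region:
            return False
        current = self.table.get(rec.key)
        # Last-writer-wins: drop updates not newer than the stored version.
        if current is not None and current["updated_at"] >= rec.updated_at:
            return False
        # Keep the original origin tag so the echo check works downstream.
        self.table[rec.key] = {**rec.new_image,
                               "origin_region": rec.origin_region,
                               "updated_at": rec.updated_at}
        return True


west_table: Dict[str, dict] = {}
to_west = Replicator("us-west-2", west_table)
to_west.apply(StreamRecord("v1", {"segments": ["auto"]}, "us-east-1", 10.0))
```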
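The cost comparison on slide 26 can be reproduced in a few lines. This assumes the ~450 figure is a daily DynamoDB cost (so ~450 x 30 is the monthly cost) and, like the slide, counts only the on-demand hourly price of 80 i3.2xlarge instances for Cassandra.

```python
HOURS_PER_YEAR = 24 * 365

# Figures taken from slide 26.
I3_2XLARGE_HOURLY_USD = 0.624
CASSANDRA_NODES = 80
DYNAMODB_DAILY_USD = 450  # assumption: the slide's "~450" is a per-day cost

cassandra_annual = I3_2XLARGE_HOURLY_USD * HOURS_PER_YEAR * CASSANDRA_NODES
dynamodb_monthly = DYNAMODB_DAILY_USD * 30
dynamodb_annual = dynamodb_monthly * 12
saving_pct = (cassandra_annual - dynamodb_annual) * 100 / cassandra_annual

print(f"Cassandra: ${cassandra_annual:,.2f}/year")
print(f"DynamoDB:  ${dynamodb_annual:,}/year")
print(f"Saving:    {saving_pct:.2f}%")
```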
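To put slide 27's ~8 million reads per minute into DynamoDB's provisioned-capacity terms: one read capacity unit (RCU) covers one strongly consistent read per second, or two eventually consistent reads per second, of an item up to 4 KB, with larger items rounded up in 4 KB steps. The sketch below does that conversion; the 1 KB item size and the choice of eventually consistent reads are assumptions, and reads served from a cache such as DAX would not consume table RCUs at all.

```python
import math

READS_PER_MINUTE = 8_000_000  # from slide 27


def read_capacity_units(reads_per_second: float, item_size_bytes: int,
                        strongly_consistent: bool) -> float:
    """Estimate provisioned RCUs for a steady read rate of same-sized items."""
    # Each read is billed in 4 KB steps, rounded up.
    units_per_read = math.ceil(item_size_bytes / 4096)
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        units_per_read /= 2
    return reads_per_second * units_per_read


rps = READS_PER_MINUTE / 60
print(round(rps))
print(round(read_capacity_units(rps, 1024, strongly_consistent=False)))
```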