
Large Scale NoSql DB Migration Under Fire - Ido Barkan - DevOpsDays Tel Aviv 2017

DevOpsDays Tel Aviv 2017

Published in: Technology

  1. 1. Large Scale NoSQL DB Migration Under Fire - Ido Barkan
  2. 2. Agenda ● What is AppsFlyer? ● Motivation and limitations ● War plan ● What have we learned?
  3. 3. What is AppsFlyer? ~25B events per day
  4. 4. Installs DB: zoom in ● App installs writer: 2K write IOPS ● App events enrichments: 180K read IOPS
  5. 5. We need to replace Couchbase ● Scale-out is out of control (45 r3.4xl instances) ● XDCR can't keep up ● Daily backups take more than a day ● Paid support is not good enough ● Chosen solution: Aerospike
  6. 6. Rules of the game: starting point ● # of records: ~2,000,000,000 (repl. factor 3) ● Read intensive: 180K read + 2K write IOPS ● # of machines: 45 r3.4xlarge ● Cost: ~$30,000 per month ● Time frame: a few months ● Mission critical
  7. 7. Why is data migration harder than in a relational DB?
                          Relational DB   NoSQL DB
        Data types        conventional    blob, JSON, specialized bins
        Vendors           a few           100+
        Query language    SQL             N1QL, SQL, JavaScript, HTTP/REST, creating views/MapReduce style
        Schema            Yes             Maybe
        Migration tools   Yes             DIY
  8. 8. Rules of the game ● No downtime ● No data loss
  9. 9. War plan: easy part ● Write new installs/updates to BOTH DBs ● Continue reading from the OLD DB [diagram label: Application]
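The first stage on slide 9 (write to both, read from the old DB) can be sketched roughly as below. This is a minimal illustration, not the code used in the talk: plain Python dicts stand in for the Couchbase and Aerospike clients, and the `DualWriteStore` class and its method names are hypothetical.

```python
class DualWriteStore:
    """Stage 1 sketch: every write goes to both DBs, reads use only the old DB."""

    def __init__(self, old_db, new_db):
        # old_db / new_db are dict stand-ins for real DB clients (assumption).
        self.old_db = old_db
        self.new_db = new_db

    def write(self, key, record):
        # New installs/updates are written to BOTH stores.
        self.old_db[key] = record
        self.new_db[key] = record

    def read(self, key):
        # Reads still come from the OLD store, which holds the full history.
        return self.old_db.get(key)


old_db = {"install:legacy": {"app": "old-app"}}
new_db = {}
store = DualWriteStore(old_db, new_db)
store.write("install:1", {"app": "demo"})
```

At this point the new DB only receives fresh traffic, so it can be sized and monitored under real write load while the application still depends entirely on the old one.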
  10. 10. Breathing (testing) time: test, test, test ● Sizing RAM/disk ● Expected load ● Metrics ● Alerting ● Logs ● Backups
  11. 11. Easy part (Cont.) ● Write new installs/updates to NEW DB ● Read from BOTH DBs ● A fraction of records (not VIP) ● Do NOT delete yet! ● Have metrics on misses
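One possible reading of slide 11 is sketched below: for a non-VIP fraction of traffic, writes go only to the new DB and reads try the new DB first, falling back to the old one and counting the miss. The names `MigratingStore`, `rollout_percent`, and `vip_apps`, the hash-based sampling, and the choice to keep dual-writing for traffic outside the sample are all assumptions made for illustration; dicts again stand in for the real clients.

```python
import zlib
from collections import Counter


class MigratingStore:
    """Stage 2 sketch: new-DB writes, reads from both, with miss metrics."""

    def __init__(self, old_db, new_db, rollout_percent, vip_apps=frozenset()):
        self.old_db = old_db
        self.new_db = new_db
        self.rollout_percent = rollout_percent  # 0..100, hypothetical knob
        self.vip_apps = vip_apps                # apps kept out of the sample
        self.metrics = Counter()

    def in_rollout(self, app_id):
        # VIP apps stay on the stage-1 path; others are sampled by a stable hash.
        if app_id in self.vip_apps:
            return False
        return zlib.crc32(app_id.encode("utf-8")) % 100 < self.rollout_percent

    def write(self, app_id, key, record):
        self.new_db[key] = record
        if not self.in_rollout(app_id):
            # Outside the sample we keep dual-writing, as in stage 1.
            self.old_db[key] = record

    def read(self, app_id, key):
        if not self.in_rollout(app_id):
            return self.old_db.get(key)
        record = self.new_db.get(key)
        if record is not None:
            self.metrics["new_db_hit"] += 1
            return record
        # Miss: the record has not been migrated yet, so fall back to the old DB.
        self.metrics["new_db_miss"] += 1
        return self.old_db.get(key)


store = MigratingStore({"k_old": {"v": 1}}, {}, rollout_percent=100)
store.write("some-app", "k_new", {"v": 2})
store.read("some-app", "k_old")  # served by the old DB, counted as a miss
```

Nothing is deleted from the old DB in this sketch, and the miss counter gives a rough measure of how much of the long tail still lives only there.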
  12. 12. Breathing time: test, test, test ● Sizing (again) ● Expected load (with reads) ● Metrics (again)
  13. 13. Is that it? If your data is short-lived, you are mostly done. Otherwise...
  14. 14. `Manual` part: migrating the long tail ● Dumper: view (query), dump keys/records ● Kafka as a buffer ● Loader: write (do not overwrite) values
  15. 15. Detour: DB views ● Map function (JS): `function (doc, meta) { emit(meta.expiration, doc.id); }` ● First argument to emit: key (sorting) ● Second argument to emit: value ● Record itself is NOT a part of the view
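To make the view idea on slide 15 concrete, here is a small Python emulation of what the map function produces. It is not the Couchbase API; `build_view` and the sample documents are hypothetical, and the only behavior borrowed from the slide is that each row holds an emitted key and value, sorted by key, without the record itself.

```python
def build_view(docs):
    """Emulate a view built from emit(meta.expiration, doc.id).

    docs is an iterable of (meta, doc) pairs, each a plain dict.
    """
    rows = []
    for meta, doc in docs:
        # Key = expiration (used for sorting), value = document id.
        rows.append((meta["expiration"], doc["id"]))
    rows.sort(key=lambda row: row[0])
    return rows


docs = [
    ({"expiration": 300}, {"id": "install:c", "payload": "large record body"}),
    ({"expiration": 100}, {"id": "install:a", "payload": "large record body"}),
    ({"expiration": 200}, {"id": "install:b", "payload": "large record body"}),
]
view = build_view(docs)
```

Because the rows carry only ids, a dumper that walks such a view would still need to fetch each record by id before handing it on.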
  16. 16. `Manual` part: migrating the long tail ● Dumper: view (query), dump keys/records ● Kafka as a buffer ● Loader: write (do not overwrite) values
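The dumper, buffer, and loader flow for the long tail can be sketched as follows. This is a simplified stand-in under stated assumptions: an in-process `queue.Queue` replaces Kafka, dicts replace both DBs, and the function names `dump` and `load` are hypothetical. The part taken from the slide is the loader rule: write a value only if the key is not already present in the new DB, presumably because anything already there came from live traffic and is fresher.

```python
import queue

SENTINEL = None  # marks the end of the dump in this single-process sketch


def dump(old_db, buffer):
    """Dumper: walk the old DB and push (key, record) pairs into the buffer."""
    for key, record in old_db.items():
        buffer.put((key, record))
    buffer.put(SENTINEL)


def load(buffer, new_db):
    """Loader: write buffered records into the new DB without overwriting."""
    written = skipped = 0
    while True:
        item = buffer.get()
        if item is SENTINEL:
            break
        key, record = item
        if key in new_db:
            # The new DB already has this key from live writes: keep it.
            skipped += 1
        else:
            new_db[key] = record
            written += 1
    return written, skipped


old_db = {"install:1": {"v": "old"}, "install:2": {"v": "old"}}
new_db = {"install:2": {"v": "fresh"}}
buffer = queue.Queue()
dump(old_db, buffer)
written, skipped = load(buffer, new_db)
```

The buffer decouples the two sides, so the dumper can run at whatever pace the old cluster tolerates while the loader is throttled separately against the new one.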
  17. 17. New cluster ● # of records: ~2,000,000,000 ● 2 AZs, repl. factor 2 in each one ● IOPS: same and growing ● # of machines: 2 x 5 i3.4xlarge ● Cost: ~$10,000 per month (⅓ of the old cost)
  18. 18. What have we learned? 1. Plan, plan, plan. 2. Go easy: test a sample before you start (1 app / API / customer). 3. Breathe (read: test) between stages. 4. Do NOT delete until you have to. 5. Slower -> safer (-> $$$).
  19. 19. Thank you! ido@appsflyer.com