Couchbase_TLV_2014_Couchbase_at_Viber

  • Viber was founded a little over 3 years ago. It started as a free app for iPhones providing free VoIP calls. After a few months an Android version was released and text messaging was introduced. Since then many new features have been added, and today Viber is a social communications platform available for almost all mobile phones, tablets and desktop OSes.
  • Viber provides reliable text messaging, giving you indications when a message was sent, delivered to the recipient and even when it was read. Groups of up to 100 users are possible, supporting all media options such as photos, videos, stickers, doodles and sharing your location. Recently we added a new Push To Talk feature which sends your voice as you are talking, without waiting for the recording to finish. In a group conversation with PTT, you can broadcast your voice instantly to up to 100 people.
  • During the last few months we have started to monetize the Viber service. Viber Out – VoIP calls from Viber to non-Viber phone numbers (landlines & mobile numbers) at very low rates. Stickers – both free and premium stickers that can be purchased. In addition to branded content such as Smurfs and Garfield, we have created Viber characters such as Violet, Eve, Freddy, Blu, Zoe & more that you can see in the pictures here.
  • Viber is very easy to use. It uses your mobile number as your registration ID and detects which of your friends have Viber from your address book. In order to provide the best user experience, Viber clients are always on and connected to our servers, allowing for sub-second updates. We were able to provide this level of service without sacrificing battery life. Multiple devices: Viber is primarily a mobile application, but lately we have added support for both desktops and tablets. All your devices are registered under the same phone number and are fully synced between them. All messages & calls are received by all devices, and if you read a message on one device it is automatically shown as read on the others. Messages sent from one device appear on all other devices instantly. Calls can be seamlessly transferred between devices without the other side even noticing.
  • Viber actually handles several times more talking minutes per day than all of Israel’s cellular providers put together.
  • Next I would like to talk about what runs the Viber service: the back-end which allows sending billions of messages and talking minutes, with sub-second latencies, to hundreds of millions of users.
  • At first Viber was a much smaller service, and for the first few months Viber used an in-house in-memory database solution. As Viber usage grew exponentially, we had to move to a more scalable solution. We decided to use a sharded NoSQL database to get a fast implementation and very easy scaling. In early 2011 this was not just cutting-edge technology but bleeding-edge technology: we initially ran on the beta of the very first MongoDB version that supported sharding. We were one of the first big MongoDB deployments back then (if not the biggest).
  • All Viber servers run on AWS. The Redis Sharder was developed in house by Viber because Redis does not support sharding. Redis runs in a master/slave configuration. MongoDB runs with 2 additional replicas for each node; MongoDB uses SSD-based instances for the active node and 1st replica, and EBS for the 2nd replica. Redis is used both as a cache for MongoDB and as a stand-alone DB, either for high-throughput activity or for very large data sets (billions of keys).
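To make the cache/DB split concrete, here is a minimal cache-aside read path in Python, assuming the standard redis-py and pymongo client libraries; the host names, database/collection names and TTL are hypothetical illustrations, not Viber's actual configuration.

```python
import json

import redis                      # redis-py client
from pymongo import MongoClient   # MongoDB client

# Hypothetical endpoints; in the 2nd-generation setup these would sit behind
# the in-house Redis Sharder and a sharded MongoDB cluster.
cache = redis.StrictRedis(host="redis-cache.internal", port=6379)
mongo = MongoClient("mongodb://mongo-router.internal:27017")
users = mongo["viber"]["users"]   # hypothetical database/collection names

CACHE_TTL = 3600  # seconds; illustrative value


def get_user(user_id):
    """Cache-aside read: try the Redis cache first, fall back to MongoDB."""
    cached = cache.get("user:" + user_id)
    if cached is not None:
        return json.loads(cached)

    doc = users.find_one({"_id": user_id})
    if doc is not None:
        # Populate the cache so subsequent reads stay in memory.
        cache.set("user:" + user_id, json.dumps(doc, default=str), ex=CACHE_TTL)
    return doc
```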
  • Got us this far – 3 years of extreme growth. Never lost data from MongoDB – even though we had many server failures, which even caused a few downtimes, we were always able to access the data at the end of the day. Redis performance – Redis is a very fast DB and was able to give us the speed we needed.
  • MongoDB performance: it only provided tens of thousands of ops whereas we needed hundreds of thousands of ops, and the performance of databases with billions of keys dropped significantly. MongoDB scale: each application server had many worker threads, all of which would connect to a single MongoDB cluster. MongoDB would manage each connection with a separate thread and stack, wasting a lot of memory and CPU; when we reached hundreds of application servers this started to become a serious problem. Redis Sharder: built in-house and not a commercial-grade solution. It has VERY limited manageability and is not robust enough; scalability is limited and must be done in powers of 2. The client implementation supports most of the Redis commands, but not bulk commands, hindering performance. Redis in-memory DB: Redis is an in-memory DB with limited persistence to disk, but because MongoDB could not perform fast enough, we used Redis for most of our DB operations, without MongoDB at all.
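The "powers of 2" restriction is typical of simple hash-modulo sharding: when the shard count doubles, every key either stays on its old shard or moves to exactly one predictable new shard, so each old shard splits cleanly into two. A small illustrative sketch of that property (not Viber's actual Redis Sharder code):

```python
import zlib


def shard_index(key, num_shards):
    """Map a key to a shard with a stable hash; num_shards must be a power of two."""
    assert num_shards & (num_shards - 1) == 0, "shard count must be a power of two"
    return zlib.crc32(key.encode("utf-8")) & (num_shards - 1)


# Doubling from N to 2N shards: a key on shard i either stays on i
# or moves to i + N, so rebalancing only ever splits shards in half.
for key in ("user:123", "user:456", "msg:789"):
    old = shard_index(key, 8)
    new = shard_index(key, 16)
    assert new in (old, old + 8)
    print(key, "->", old, "then", new)
```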
  • When looking for a 3rd generation DB architecture, we were not looking to replace a standard RDBMS-based system with a NoSQL system like most companies; we were already using a NoSQL solution from one of the market leaders, and it was simply not working well enough. High performance – hundreds of thousands of ops at consistently low latencies. Large data sets – billions of keys. Scalable – easy to add server nodes without interrupting production. Robust – the solution should withstand node failures without any downtime, and be persisted to disk with a varying number of replicas and backups for different data (each bucket/cluster will have different robustness settings). Backed up – daily/weekly backups that can be used to perform a full recovery in case of failure. Always on – no downtime, including during SW/HW upgrades, backups, etc. Easy to monitor – a good monitoring solution showing both live and historical statistics, with a graphical UI but also an external interface we can connect to our monitoring/alert systems. Prefer a single DB solution (instead of cache + persistent DB).
  • Several Couchbase clusters (up to 50 nodes each). Each cluster has different access patterns (mainly read, mainly write/delete, large data sets, heavy disk usage), so we pick different EC2 instance types according to the access patterns. Each cluster contains nodes with different amounts of memory, disk and CPU; most are based on SSD drives for very fast access. Different replica settings for each bucket, depending on the data's requirements – from memcached buckets to Couchbase buckets with 0-2 replicas. We are currently using CB v2.1.1 & v2.2. Backup Couchbase cluster: synced using XDCR for specific buckets; this cluster contains views for real-time data analytics. We perform daily or weekly backups from the CB clusters. Backup is done to a local EBS drive and every week copied to a non-Amazon location.
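As a rough sketch of the daily backup step: the Couchbase 2.x tool suite ships a cbbackup command-line utility that dumps a cluster to a local directory. The wrapper below uses hypothetical paths and credentials and assumes the basic cbbackup invocation (source URL, target directory, -u/-p); verify the exact flags against the installed Couchbase version.

```python
import datetime
import subprocess

# Hypothetical values: cluster address, admin credentials and EBS mount point.
CLUSTER = "http://cb-node1.internal:8091"
BACKUP_ROOT = "/mnt/ebs-backups"
ADMIN_USER = "Administrator"
ADMIN_PASS = "secret"


def run_daily_backup():
    """Dump the cluster to a dated directory on the local EBS volume."""
    target = "%s/%s" % (BACKUP_ROOT, datetime.date.today().isoformat())
    # Assumed usage: cbbackup <cluster-url> <backup-dir> -u <user> -p <password>
    subprocess.check_call(
        ["cbbackup", CLUSTER, target, "-u", ADMIN_USER, "-p", ADMIN_PASS]
    )
    return target


if __name__ == "__main__":
    print("backup written to", run_daily_backup())
```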
  • Migrate a live system – we need to migrate the back-end databases while the system is receiving millions of new users and hundreds of thousands of requests per second. Zero downtime – the system must continue running throughout the whole migration process without even a minute of downtime. We must make sure no data is lost during the migration process. As data is constantly being updated, we must make sure we are migrating the most up-to-date data, and that data can be modified multiple times during the migration process. As we have hundreds of database servers, all in AWS, we need to take into consideration that several machines will probably fail during the migration, and this should not affect data migration and consistency. Because of the complexity of this process, it was probably the most time-consuming and delicate part of moving to Couchbase.
  • As CB was divided into several clusters, we only introduced 1-2 new clusters at a time. Stage 1 – we need this stage to maintain data consistency, because we have hundreds of application servers and upgrading them can take a few hours. When we move from stage 1 to stage 2, we need to make sure that if a server in stage 2 writes a key and then a server in stage 1 deletes it, it will not appear in CB. Stage 2 – this stage makes sure that ongoing changes are written to CB. Stage 3 – we exported all data from MongoDB after all servers had been upgraded to stage 2; a background process reads all this data and inserts it into Couchbase only if the key does not already exist (if it exists, it is always newer). Stage 4 – after background data migration is complete, both databases should be identical. To validate this we log all data import transactions and live updates; if there are any errors during the import we can always re-import the data. We also compare the list of keys from the MongoDB export to the logs of the actual keys inserted to make sure we didn't miss anything, and we do a random check on a few tens of thousands of keys, comparing the data between MongoDB and CB to check for inconsistencies. If there are any problems we can always start the migration process again. Stage 5 – this stage is necessary just to maintain data consistency during the server upgrade (stage 4 servers are still reading from MongoDB).
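A minimal sketch of how stage-aware routing in the application servers could look, assuming thin wrapper clients named mongo_store and cb_store; these names and the class itself are hypothetical, not Viber's code, but the routing rules follow the stages described above.

```python
# Migration stages as rolled out to the application servers, one stage at a time.
STAGE_DELETE_ONLY = 1   # read/write MongoDB only; deletes also go to CB
STAGE_DUAL_WRITE = 2    # read from MongoDB; write/delete to both
STAGE_READ_CB = 5       # read from CB; write/delete still to both
STAGE_CB_ONLY = 6       # MongoDB removed


class MigratingStore(object):
    """Routes reads, writes and deletes according to the current migration stage."""

    def __init__(self, stage, mongo_store, cb_store):
        self.stage = stage
        self.mongo = mongo_store
        self.cb = cb_store

    def get(self, key):
        if self.stage >= STAGE_READ_CB:
            return self.cb.get(key)
        return self.mongo.get(key)

    def set(self, key, value):
        if self.stage < STAGE_CB_ONLY:
            self.mongo.set(key, value)
        if self.stage >= STAGE_DUAL_WRITE:
            self.cb.set(key, value)

    def delete(self, key):
        if self.stage < STAGE_CB_ONLY:
            self.mongo.delete(key)
        # Deletes go to CB from stage 1 on, so a key written by a stage-2
        # server and then deleted by a stage-1 server does not survive in CB.
        self.cb.delete(key)
```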
  • MongoDB supports updating documents, but since CB is so fast we are able to retrieve the document, update it and set it back (using CAS to verify it wasn't changed) much faster than server-side updating. Redis supports server-side data structures at the key level; to achieve similar functionality with CB we used several solutions. To simulate sets and lists we used the append function, which is an atomic operation and much faster than retrieving, updating and setting the key. The problem is that appending is not possible on valid JSON documents, so we append valid JSON objects with a delimiter between them, and to remove an object we append a minus and specify only the object's key. To simulate large maps where we want to retrieve a single object fast – but without putting every object in a separate key, because that would create very large metadata – the solution was to break a single map key down into several keys, using our own hashing algorithm to know which key a specific object is located in. So instead of having 1 large value we have 10 values, which provides a good trade-off between speed and metadata size. Initially we processed the daily backups, which are stored in sqlite3 format, to serve large range queries – such as a list of all Viber phone numbers in a certain country or with certain data in their JSON object. We currently create views only on our backup cluster so as not to impede performance; we plan to move toward using views more, so we can work with live data.
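A rough illustration of the CAS read-modify-write loop and the append-based pseudo-set, written against the Python Couchbase SDK 2.x-style API (Bucket.get / replace / append); the bucket address, key naming and retry policy are illustrative assumptions rather than Viber's production code.

```python
import json

from couchbase import FMT_UTF8
from couchbase.bucket import Bucket
from couchbase.exceptions import KeyExistsError

cb = Bucket("couchbase://cb-node1.internal/users")  # hypothetical bucket


def update_document(key, mutate, retries=10):
    """Client-side update: get, modify, replace with CAS; retry on concurrent changes."""
    for _ in range(retries):
        rv = cb.get(key)
        new_value = mutate(rv.value)
        try:
            cb.replace(key, new_value, cas=rv.cas)
            return new_value
        except KeyExistsError:
            continue  # someone else changed the document; re-read and retry
    raise RuntimeError("CAS update failed after %d retries" % retries)


def append_member(set_key, obj):
    """Add an element to a pseudo-set stored as delimited JSON fragments."""
    cb.append(set_key, json.dumps(obj) + ";", format=FMT_UTF8)


def remove_member(set_key, member_key):
    """Mark an element as removed: append a tombstone carrying only its key."""
    cb.append(set_key, "-" + json.dumps({"key": member_key}) + ";", format=FMT_UTF8)
```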
  • Daily oscillation between 75K and 200K ops. Over 1.6 billion keys using 2 replicas. Active data is almost 100% in memory and replica data is about 50% in memory, for an average of 70% memory usage (since there are 2 replicas). This cluster is replicated to the backup cluster using XDCR; you can see that over 4.5 TB of data has been replicated already, with over 8 billion key mutations.
    1. Couchbase@Viber, February 2014. Amir Ish-Shalom, System Architect
    2. About Viber
    3. The Viber service • Free, cross platform text messaging • Free, cross platform VoIP calls (voice and video) • Photo, video and location sharing • Stickers and Emoticons • Group communication platform (up to 100 participants) • Push To Talk
    4. Monetization: Viber Out, Sticker market
    5. Simplicity and User Experience • No registration needed • User ID = your mobile number • Automatic friends detection (no add a friend) • Always on. No battery impact • 32 languages • Multiple devices experience: Mobile, Tablets and Desktop
    6. Viber in numbers
    7. Viber in numbers • Hundreds of millions of user accounts • Almost 1 million added daily • Billions of messages every month • Billions of talking minutes every month
    8. Viber Growth (monthly growth chart, Feb 2011 – Dec 2013) • In 2013 vs. 2012 there was: • Over 3x growth in talking minutes • Over 5x growth in messages • Over 12x growth in group messages
    9. Viber DB Architecture
    10. Viber DB Architecture – 1st Generation (diagram): Viber Clients → Application Servers → In-house in-memory DB
    11. Viber DB Architecture – 2nd Generation (diagram): Viber Clients → Application Servers → Redis Sharder → Redis Cluster; Application Servers → Redis Sharder → Redis Cache → MongoDB Cluster
    12. 2nd generation DB architecture advantages • Got us this far • Never lost data from MongoDB • Redis performance
    13. 2nd generation DB architecture problems • MongoDB performance • MongoDB does not scale well with many application servers • Redis – In-memory database with no sharding • Redis Sharder – Not manageable and robust enough
    14. 3rd generation DB architecture requirements • High performance • Large data sets • Scalable • Robust • Backed-up • Always on • Easy to monitor • Prefer single DB solution • Solution: Couchbase
    15. Viber DB Architecture – 3rd Generation (diagram): Viber Clients → Application Servers → Couchbase Clusters → XDCR → Couchbase Backup Cluster
    16. Back-end servers – Increased performance using less than ½ of the DB servers! • Over 300 application servers • 2nd generation DB architecture: MongoDB – 150 servers (master + 2 slaves); Redis – over 100 servers (master + 1 slave) • 3rd generation DB architecture: 6 Couchbase clusters (up to 50 nodes each); 0 – 2 replicas, XDCR & external backup; total of 100-120 Couchbase servers
    17. Migrating from 2nd to 3rd generation DB’s • Migrate a live system • Zero downtime • No data loss • Consistent data
    18. How did we migrate?
        • Stage 1: Add new CB cluster in parallel to existing cluster; only delete keys from CB; read only from MongoDB
        • Stage 2: Write/Delete to both CB & MongoDB
        • Stage 3: Background process that copies all data from MongoDB to CB (if it doesn’t exist)
        • Stage 4: Validate data (both DB’s should be identical)
        • Stage 5: Read only from CB; Write/Delete to both CB & MongoDB
        • Stage 6: Remove MongoDB and use only CB
    19. Where are we now? • Completed migration of 3 clusters to Couchbase • Currently migrating 3 additional clusters to Couchbase
    20. Obstacles we had to overcome with Couchbase
        1. No server-side document updates → Solution: CAS
        2. No internal data structures (maps, sets, lists, etc.) → Solution 1: "Appendable" JSON's, e.g. {"key":"123","data":"abc"}; {"key":"456","data":"efg"};-{"key":"123"} → Solution 2: Break-up key to several keys
        3. No secondary indexes / range queries → Solution 1: Process daily backups (sqlite3 files) → Solution 2: Use views
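For the second solution on this slide (breaking one large logical map into several physical keys), a small sketch of the idea; the sub-key count and hashing scheme are illustrative assumptions, not the hashing algorithm Viber actually used.

```python
import zlib

NUM_SUBKEYS = 10  # trade-off between per-value size and per-key metadata overhead


def subkey_for(map_name, field):
    """Route a field of a large logical map to one of NUM_SUBKEYS physical keys."""
    bucket_index = zlib.crc32(field.encode("utf-8")) % NUM_SUBKEYS
    return "%s:%d" % (map_name, bucket_index)


# Example: one user's contacts map is spread over 10 Couchbase keys.
# A reader fetches only the sub-key that can contain the field, then looks
# the field up inside that (much smaller) JSON document.
print(subkey_for("contacts:+15551234567", "+15557654321"))
```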
    21. Couchbase Cluster with 10 nodes
    22. Future improvements • Complete database migration to use only Couchbase • Upgrade to CB v2.5 to distribute replicas across EC2 AZ’s • More extensive use of views • Integration with our Big Data analytics Hadoop architecture • Integration with Elastic Search using XDCR
    23. Questions?
    24. Thank you
