Cassandra at Bazaarvoice - EmoDB
Introducing Bazaarvoice datastore (EmoDB)

EmoDB is a RESTful HTTP server used by Bazaarvoice for storing JSON objects and for watching for changes to those objects. It also provides a blob store, a queueing service, and a data bus for tracking change events.

It is designed to span multiple data centers, using eventual consistency (AP) and multi-master conflict resolution. It relies on Apache Cassandra for persistence and cross-data center replication.

About Bazaarvoice

Bazaarvoice is based on a simple truth: when people talk to each other, they buy things they are happy with, because they trust the opinions of others. We see a day when all voices are connected and, together, help the marketplace function better. We’ve built a network that connects businesses to amplify the authentic voices of people wherever they shop: online, in-store and mobile. Our mission, just like our name, is to be the "voice of the marketplace", one authentic conversation at a time.

Each month, more than 450 million people view and share opinions, questions and experiences on more than 20 million products in the Bazaarvoice network. Our technology platform channels these voices into places that help consumers make purchasing decisions. Our engineers have the opportunity to work on many areas of computer science, including distributed computing, natural language processing, big data analytics, large-scale system design, and user interface design, just to name a few. We use the latest and greatest technologies in cloud, NoSQL, advanced client-side JavaScript, and more to solve problems at a scale few companies can offer.

About Fahd Siddiqui:
Fahd Siddiqui is a Senior Software Engineer at Bazaarvoice on the data infrastructure team. His interests include highly scalable distributed data systems. He holds a Master's degree in Computer Engineering from the University of Texas at Austin.


Cassandra at Bazaarvoice - EmoDB Presentation Transcript

  • 1. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. EmoDB Store your feelings here www.bazaarvoice.com
  • 2. SaaS serving software that collects and displays user-generated content, crunches analytics, and extracts insights.
    - Thousands of clients
    - Hundreds of millions of pieces of content
    - Hundreds of millions of unique visitors per month
    - Tens of billions of pageviews per month
    - Austin-based company founded in 2005
    - Engineering offices: Austin, San Francisco, New York
  • 3. $ whoami
    - Fahd Siddiqui, Senior Software Engineer, Data Infrastructure, Bazaarvoice
    - linkedin.com/in/fahdsiddiqui
    - fahd.siddiqui@bazaarvoice.com
  • 4. Global Monthly Unique Visitors (chart)
  • 5. Monthly stats as of July 2013
    - Figures: 16B, 1B, 480M, 250M, 118M, 3M, 4,000, 2,500
    - Metrics: review impressions, pageviews (37k rps), unique users, products in catalog, total reviews, monthly new reviews, customer implementations, servers
    - 95 engineers
  • 6–10. Data Infrastructure
  • 11–16. Goals for EmoDB
    - Store just about anything, flexibly
    - Support a “Universal Content Type”: store any content type without re-architecting
    - Watch for data change events
    - Expose a RESTful API
    - Multi-master, multi-datacenter, fault tolerant, and horizontally scalable for reads and writes
  • 17. EmoDB Overview
    - System of Record (SoR), Databus, Queue Service, Blob Store, …
    - All backed by Cassandra
  • 18. SoR - tables
    - What is an Emo table? A bucket that contains JSON documents.
    - Creating one is cheap, and you may create as many as you want, e.g., review:testcustomer
    - Supports fetching any particular row by its ID
    - Supports complete table scans, using splits for parallel scans
  • 19. SoR - tables
    - Create a table:
        $ curl -s -XPUT -H "Content-Type: application/json" \
            "http://localhost:8080/sor/1/_table/review:testcustomer?options=placement:'ugc_global:ugc'&audit=comment:'initial+provisioning',host:aws-tools-02" \
            --data-binary '{"type":"review","client":"TestCustomer"}' | jsonpp
        { "success": true }
    - Store a document:
        $ curl -s -XPUT -H "Content-Type: application/json" \
            "http://localhost:8080/sor/1/review:testcustomer/demo1?audit=comment:'initial+submission',host:aws-submit-09" \
            --data-binary '{"author":"Bob","title":"Best Ever!","rating":5}' | jsonpp
        { "success": true }
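The "store a document" call on slide 19 can also be built from Java with the JDK's own `java.net.http` client (Java 11+). The sketch below is an illustration, not an official EmoDB client. The host, port, path layout (`/sor/1/{table}/{key}`) and audit string are copied from the curl example. The class and method names are hypothetical. Sending the request assumes an EmoDB server is actually listening on localhost:8080.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper mirroring the curl "store a document" example; not part of any EmoDB API.
final class StoreDocumentSketch {
    // Base URL and path layout copied from the curl examples on the slide.
    static final String BASE = "http://localhost:8080/sor/1/";

    /** Builds (but does not send) a PUT of a JSON document to /sor/1/{table}/{key}?audit=... */
    static HttpRequest storeRequest(String table, String key, String audit, String json) {
        URI uri = URI.create(BASE + table + "/" + key + "?audit=" + audit);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = storeRequest(
                "review:testcustomer", "demo1",
                "comment:'initial+submission',host:aws-submit-09",
                "{\"author\":\"Bob\",\"title\":\"Best Ever!\",\"rating\":5}");
        // Requires a running EmoDB server on localhost:8080.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The request is built separately from sending it, so the URL and headers can be checked without a server.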
  • 20. SoR – rows
    - A row is composed of deltas
    - Writers append deltas, and readers resolve the deltas to produce a resolved object
    - Compaction occurs once data has been replicated to all data centers
    - Because of this, EmoDB is not a good fit for systems with a high update-to-create ratio
  • 21. Data Access in EmoDB
    - 3 ways to read data out of EmoDB:
        - Lookup by primary key
        - Bulk extract (scan)
        - Change feed (using the EmoDB databus)
    - What’s missing? Where, join, group by: anything other than primary-key lookup
    - Use another indexing mechanism for complex queries (such as Elasticsearch, Solr, etc.)
  • 22. SoR – Data storage challenges
    - Problem 1: Need a way to cheaply create tens of thousands of “tables”
    - As of Cassandra 1.1, each column family (CF) needs at least 1 MB of memory on every node
    - Far too much overhead to dedicate a CF to each user-defined table
    - Hint: we use only one column family to store all tables
  • 23. SoR – Data storage challenges
    - Problem 2 (once Problem 1 is solved): Need to scan an entire table so it can be indexed by Polloi (Elasticsearch)
    - Requires a way to split tables into shards that enable sequential scans
    - Each table's shards should be fully distributed over the Cassandra cluster
  • 24. SoR – Data storage
    - Solution to both Problem 1 and Problem 2: the row key byte buffer begins with a 9-byte “table prefix”:
        - Byte 0: 8-bit shard identifier
        - Bytes 1–8: 64-bit table UUID
        - Remaining N bytes: UTF-8-encoded content key
    - The shard identifier is the bottom 8 bits of the 32-bit Murmur3 hash of (table UUID | content key)
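The row-key layout on slide 24 can be sketched in Java. The slide does not say exactly which bytes are hashed, so this sketch assumes that "(table UUID | content key)" means the 8-byte big-endian UUID followed by the UTF-8 key bytes, hashed with the standard MurmurHash3 x86_32 algorithm and seed 0. The class name and that byte order are assumptions, not EmoDB's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the slide-24 row key: [1-byte shard][8-byte table UUID][N-byte UTF-8 key].
final class RowKeySketch {
    /** Standard 32-bit MurmurHash3 (x86_32 variant). */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & ~3;                      // body: 4-byte little-endian blocks
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        int k = 0;                                              // tail: the 0-3 leftover bytes
        switch (data.length & 3) {
            case 3: k = (data[roundedEnd + 2] & 0xff) << 16;    // fall through
            case 2: k |= (data[roundedEnd + 1] & 0xff) << 8;    // fall through
            case 1: k |= data[roundedEnd] & 0xff;
                k *= c1;
                k = Integer.rotateLeft(k, 15);
                k *= c2;
                h1 ^= k;
                break;
            default: break;
        }
        h1 ^= data.length;                                      // finalization mix
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Shard id = bottom 8 bits of Murmur3 over (UUID bytes, then UTF-8 content key). Byte order is assumed. */
    static int shardId(long tableUuid, String contentKey) {
        byte[] key = contentKey.getBytes(StandardCharsets.UTF_8);
        byte[] input = ByteBuffer.allocate(8 + key.length).putLong(tableUuid).put(key).array();
        return murmur3_32(input, 0) & 0xff;
    }

    /** Builds the full row key: 9-byte table prefix followed by the content key. */
    static byte[] rowKey(long tableUuid, String contentKey) {
        byte[] key = contentKey.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(9 + key.length)
                .put((byte) shardId(tableUuid, contentKey))     // byte 0: shard identifier
                .putLong(tableUuid)                             // bytes 1-8: table UUID (big-endian)
                .put(key)                                       // bytes 9..: content key
                .array();
    }
}
```

Because the shard byte comes first, rows of one table scatter across 256 regions of the key space, while rows within a shard still sort together by table UUID.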
  • 25. SoR – Scanning Table
    - The shard identifier spreads a table's content across the cluster to avoid hotspots (the ByteOrderedPartitioner is used)
    - All content for a table can be fetched in parallel using 2^8 = 256 range queries
    - There you have it: a single CF offering range scans over segments (tables) that are fully distributed over the cluster!
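To see why 256 range queries cover a whole table (slide 25), note that under a byte-ordered partitioner every row of table T in shard s shares the 9-byte prefix [s][T]. That prefix therefore maps to one contiguous key range. The sketch below computes those ranges as half-open intervals [prefix, successor of prefix). The class and method names are hypothetical, and the sketch models only the key arithmetic, not actual Cassandra queries.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the 256 per-shard key ranges that together cover one table.
final class ShardRangesSketch {
    /** Half-open range [start, endExclusive); endExclusive == null means "no upper bound". */
    static final class KeyRange {
        final byte[] start;
        final byte[] endExclusive;

        KeyRange(byte[] start, byte[] endExclusive) {
            this.start = start;
            this.endExclusive = endExclusive;
        }
    }

    /** The 9-byte prefix shared by every row of the table in one shard. */
    static byte[] prefix(int shard, long tableUuid) {
        return ByteBuffer.allocate(9).put((byte) shard).putLong(tableUuid).array();
    }

    /** Smallest byte string greater than every string starting with p, or null if none exists. */
    static byte[] prefixSuccessor(byte[] p) {
        byte[] next = Arrays.copyOf(p, p.length);
        for (int i = next.length - 1; i >= 0; i--) {
            if (next[i] != (byte) 0xff) {
                next[i]++;                        // bump the last non-0xFF byte and drop the rest
                return Arrays.copyOf(next, i + 1);
            }
        }
        return null;                              // prefix is all 0xFF bytes
    }

    /** One range per shard: 2^8 = 256 ranges that can be scanned in parallel. */
    static List<KeyRange> tableRanges(long tableUuid) {
        List<KeyRange> ranges = new ArrayList<>(256);
        for (int shard = 0; shard < 256; shard++) {
            byte[] start = prefix(shard, tableUuid);
            ranges.add(new KeyRange(start, prefixSuccessor(start)));
        }
        return ranges;
    }

    /** Unsigned lexicographic order, as used by a byte-ordered partitioner. */
    static boolean contains(KeyRange r, byte[] key) {
        return Arrays.compareUnsigned(key, r.start) >= 0
                && (r.endExclusive == null || Arrays.compareUnsigned(key, r.endExclusive) < 0);
    }
}
```

Each range can be handed to its own worker, and no range ever includes rows from a different table UUID.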
  • 26. SoR – Scanning Table
    - Table UUIDs solved another problem for us too
    - Multiple tables can be stored in the same CF
    - Because tables are identified by UUID, we can DROP a table and CREATE a new one with the same name
    - DROPped tables are deleted lazily, which is especially important in an eventually consistent world
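The DROP/CREATE point on slide 26 comes down to a level of indirection: a table name maps to its current UUID, and row keys only ever contain the UUID. The in-memory catalog below illustrates that idea. It is hypothetical. Random 64-bit ids and a simple purge queue stand in for however EmoDB actually stores table metadata and performs lazy deletion.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical table catalog: names map to UUIDs; dropped UUIDs are purged lazily later.
final class TableCatalogSketch {
    private final Map<String, Long> liveTables = new HashMap<>();
    private final Deque<Long> pendingPurge = new ArrayDeque<>();
    private final Random random = new Random();

    /** CREATE assigns a fresh 64-bit UUID, so a re-created table never sees the old table's rows. */
    long create(String name) {
        if (liveTables.containsKey(name)) throw new IllegalStateException("table exists: " + name);
        long uuid;
        do {
            uuid = random.nextLong();
        } while (liveTables.containsValue(uuid) || pendingPurge.contains(uuid));
        liveTables.put(name, uuid);
        return uuid;
    }

    /** DROP only unmaps the name; the old UUID's rows are queued for background deletion. */
    void drop(String name) {
        Long uuid = liveTables.remove(name);
        if (uuid == null) throw new IllegalStateException("no such table: " + name);
        pendingPurge.add(uuid);
    }

    Long uuidOf(String name) {
        return liveTables.get(name);
    }

    boolean isPendingPurge(long uuid) {
        return pendingPurge.contains(uuid);
    }
}
```

Because reads of the new table only look at keys carrying the new UUID, stale rows left over from the dropped table are invisible even before they are deleted.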
  • 27–28. SoR – Parallel Scan
    - Call getSplits() to get a list of split identifiers
    - Then, in parallel, scan the data in each split by calling getSplit()
    - Java:
        Collection<String> getSplits(String table, int desiredRecordsPerSplit);
        Iterator<Map<String, Object>> getSplit(String table, String split, @Nullable String fromKeyExclusive,
                                               long limit, ReadConsistency consistency);
  • 29. SoR – Parallel Scan: Java code sample
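The Java code sample on slide 29 is not reproduced in this transcript. Below is a minimal sketch of the getSplits()/getSplit() pattern from slide 28. `SplitSource` and `ReadConsistency` are local stand-ins that copy the slide's signatures (with `@Nullable` dropped to stay dependency-free), and the enum values are assumed. `InMemorySplitSource` is a hypothetical fake so the example runs without a cluster. The split size of 1000 is arbitrary.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Local stand-in for the consistency enum named on the slide (values assumed). */
enum ReadConsistency { STRONG, WEAK }

/** Local interface copying the two signatures from slide 28. */
interface SplitSource {
    Collection<String> getSplits(String table, int desiredRecordsPerSplit);

    Iterator<Map<String, Object>> getSplit(String table, String split, String fromKeyExclusive,
                                           long limit, ReadConsistency consistency);
}

/** Hypothetical fake: split "i" holds documents [i * splitSize, (i + 1) * splitSize). */
final class InMemorySplitSource implements SplitSource {
    private final List<Map<String, Object>> docs;
    private final int splitSize;

    InMemorySplitSource(List<Map<String, Object>> docs, int splitSize) {
        this.docs = docs;
        this.splitSize = splitSize;
    }

    @Override
    public Collection<String> getSplits(String table, int desiredRecordsPerSplit) {
        List<String> splits = new ArrayList<>();   // the fake ignores desiredRecordsPerSplit
        for (int i = 0; i * splitSize < docs.size(); i++) splits.add(Integer.toString(i));
        return splits;
    }

    @Override
    public Iterator<Map<String, Object>> getSplit(String table, String split, String fromKeyExclusive,
                                                  long limit, ReadConsistency consistency) {
        int from = Integer.parseInt(split) * splitSize;   // the fake ignores fromKeyExclusive and limit
        return docs.subList(from, Math.min(from + splitSize, docs.size())).iterator();
    }
}

final class ParallelScanSketch {
    /** Scans every split of a table on a thread pool and returns the number of documents seen. */
    static long scanTable(SplitSource store, String table, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String split : store.getSplits(table, 1000)) {
                futures.add(pool.submit(() -> {
                    long count = 0;
                    Iterator<Map<String, Object>> it =
                            store.getSplit(table, split, null, Long.MAX_VALUE, ReadConsistency.STRONG);
                    while (it.hasNext()) {
                        it.next();      // a real job would index or transform each document here
                        count++;
                    }
                    return count;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("scan interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("split scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With a real client, a long-running split scan would typically page through results by passing the last key seen as `fromKeyExclusive` together with a bounded `limit`. The fake skips that for brevity.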
  • 30. SoR - Deltas
    - Documents are stored as a sequence of deltas
    - Readers evaluate the deltas in order to produce the document
    - Documents are created, updated, and deleted by writing deltas
    - Weak consistency: no document-level locking
  • 31. SoR - Deltas
  • 32. SoR - Deltas
    - Normally, the concurrent writes at t2 and t3 would cause a replication conflict
    - But since each delta specifies only the fields it modifies, the deltas merge cleanly and produce the desired result
    - No synchronous cross-data center communication is required for concurrent modification
  • 33. SoR - Deltas
    - Recursive, pattern-matching approach
    - Operations are available for:
        - Setting a value
        - Deleting a value
        - Updating the value for a key in a map
    - There is no operation for modifying a list
        - Model a list as a map instead
        - Time UUIDs are good candidates for list keys
  • 34. SoR - Deltas
    - Literal (“smash” operation)
    - Delete
    - Map
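The delta kinds on slides 33–34 (literal, delete, map) and the clean merge described on slide 32 can be illustrated with a small recursive resolver. This is a hypothetical model, not EmoDB's delta classes or syntax. Each delta is a function from the current value to the new one. A map delta applies nested deltas only to the keys it names, which is why two writers touching different fields do not clobber each other.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the slide's delta kinds: literal ("smash"), delete, and map. */
@FunctionalInterface
interface Delta {
    /** Returns the new value given the current one (null means "absent"). */
    Object apply(Object current);

    /** Literal: replace ("smash") whatever was there. */
    static Delta literal(Object value) {
        return current -> value;
    }

    /** Delete: the value becomes absent. */
    static Delta delete() {
        return current -> null;
    }

    /** Map: apply a nested delta to each named key and leave every other key untouched (recursive). */
    static Delta map(Map<String, Delta> fieldDeltas) {
        return current -> {
            Map<String, Object> result = new TreeMap<>();
            if (current instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) current).entrySet()) {
                    result.put((String) e.getKey(), e.getValue());
                }
            }
            for (Map.Entry<String, Delta> e : fieldDeltas.entrySet()) {
                Object updated = e.getValue().apply(result.get(e.getKey()));   // recurse into the field
                if (updated == null) {
                    result.remove(e.getKey());
                } else {
                    result.put(e.getKey(), updated);
                }
            }
            return result;
        };
    }

    /** Readers resolve a document by applying its deltas in timestamp order, starting from "absent". */
    static Object resolve(List<Delta> deltasInTimestampOrder) {
        Object doc = null;
        for (Delta d : deltasInTimestampOrder) doc = d.apply(doc);
        return doc;
    }
}
```

As a usage example, suppose a literal at t1 creates the review, a map delta written in data center A at t2 changes only `rating`, and a map delta written in data center B at t3 changes only `title`. Resolving all three in timestamp order keeps both edits.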
  • 35. SoR - Deltas
    - Conditional: perform a delta only if a condition holds
    - Designed to help resolve the most common concurrent-write conflicts
    - Simple and reliable
    - E.g., mark a review “approved” only if moderation hasn't begun
  • 36. SoR – Deltas
    - Other types of conditions: Equal, Intrinsic, Is, Map, And, Or, Not, Constant
    - E.g., {..,"type":or("product","category"),"client":"TestCustomer"}
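The conditions on slides 35–36 can be modeled as composable predicates. The sketch covers Equal, Constant, Not, Or, And and Map. Intrinsic and Is are not modeled. It also shows a conditional delta, using the slide's "approve only if moderation hasn't begun" example. Everything here is hypothetical, including the `moderationStatus` and `status` field names. The slide's `or("product","category")` shorthand is written out as `or(equal(...), equal(...))`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

/** Hypothetical model of a few condition types from the slide. */
@FunctionalInterface
interface Condition {
    boolean test(Object value);

    static Condition equal(Object expected) { return v -> Objects.equals(v, expected); }

    static Condition constant(boolean result) { return v -> result; }

    static Condition not(Condition c) { return v -> !c.test(v); }

    static Condition or(Condition... cs) { return v -> Arrays.stream(cs).anyMatch(c -> c.test(v)); }

    static Condition and(Condition... cs) { return v -> Arrays.stream(cs).allMatch(c -> c.test(v)); }

    /** Map condition: the value is a map and each named key satisfies its own condition. */
    static Condition map(Map<String, Condition> fields) {
        return v -> v instanceof Map && fields.entrySet().stream()
                .allMatch(e -> e.getValue().test(((Map<?, ?>) v).get(e.getKey())));
    }
}

final class ConditionalSketch {
    /** Conditional delta: apply `then` if the condition holds on the current document, else `otherwise`. */
    static Object conditional(Object current, Condition condition,
                              UnaryOperator<Object> then, UnaryOperator<Object> otherwise) {
        return condition.test(current) ? then.apply(current) : otherwise.apply(current);
    }

    /** Helper: a copy of the document with one field set. */
    static UnaryOperator<Object> set(String key, Object value) {
        return doc -> {
            Map<String, Object> copy = new HashMap<>();
            if (doc instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) doc).entrySet()) copy.put((String) e.getKey(), e.getValue());
            }
            copy.put(key, value);
            return copy;
        };
    }
}
```

A condition like `map({"moderationStatus": equal(null)})` holds only while the field is absent. Used as the test of a conditional delta, it approves a fresh review but leaves one that is already in moderation untouched.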
  • 37. SoR - Deltas
    - Read-modify-write:
        - Read the original state
        - Compute the new version
        - Either the write succeeds, or it eventually conflicts, in which case the databus fires an event so the application can detect the conflict and retry the write
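One way to picture the read-modify-write loop on slide 37 is a write that is conditional on the version that was read. If another writer slips in first, the condition fails when the document is resolved, and the application re-reads and retries. The sketch below is hypothetical. The document is an in-memory list of deltas, and "version" here just counts the deltas that actually applied. A direct re-read stands in for the databus event the slide describes. The `race` hook exists only to simulate a concurrent writer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory document: an append-only list of deltas that is resolved on every read. */
final class RmwSketch {
    /** A delta that applies only if the document's version equals expectedVersion (-1 = unconditional). */
    static final class Delta {
        final long expectedVersion;
        final UnaryOperator<Map<String, Object>> change;

        Delta(long expectedVersion, UnaryOperator<Map<String, Object>> change) {
            this.expectedVersion = expectedVersion;
            this.change = change;
        }
    }

    /** Resolved state: document content plus its version (number of deltas that applied). */
    static final class Snapshot {
        final Map<String, Object> doc;
        final long version;

        Snapshot(Map<String, Object> doc, long version) {
            this.doc = doc;
            this.version = version;
        }
    }

    private final List<Delta> deltas = new ArrayList<>();
    private final Set<Delta> applied = Collections.newSetFromMap(new IdentityHashMap<>());

    synchronized void append(Delta d) {
        deltas.add(d);
    }

    /** Replays all deltas in order; a conditional delta whose version no longer matches is skipped. */
    synchronized Snapshot read() {
        Map<String, Object> doc = new HashMap<>();
        long version = 0;
        applied.clear();
        for (Delta d : deltas) {
            if (d.expectedVersion < 0 || d.expectedVersion == version) {
                doc = new HashMap<>(d.change.apply(doc));
                version++;
                applied.add(d);
            }
        }
        return new Snapshot(doc, version);
    }

    synchronized boolean wasApplied(Delta d) {
        read();
        return applied.contains(d);
    }

    /** Helper: a change that copies the document and sets one field. */
    static UnaryOperator<Map<String, Object>> set(String key, Object value) {
        return doc -> {
            Map<String, Object> copy = new HashMap<>(doc);
            copy.put(key, value);
            return copy;
        };
    }

    /** Read-modify-write with retry; `race` (may be null) simulates a writer landing between read and write. */
    static int readModifyWrite(RmwSketch store, UnaryOperator<Map<String, Object>> compute,
                               Runnable race, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Snapshot original = store.read();                           // read the original state
            Map<String, Object> updated = compute.apply(original.doc);  // compute the new version
            if (attempt == 1 && race != null) race.run();
            Delta write = new Delta(original.version, doc -> updated);  // write, conditional on the version read
            store.append(write);
            if (store.wasApplied(write)) return attempt;               // the write succeeded
            // Conflict: EmoDB would surface this via the databus; this sketch just re-reads and retries.
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```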
  • 38. SoR - Deltas (diagram: data center A holds T1 and a conditional T3; data center B holds T1 and T2)
  • 39. SoR - Deltas
    - Compaction: for efficiency, older deltas are compacted and replaced by a single delta, a “compaction” record
    - Ensures intrinsics such as ~version, ~firstUpdateAt, etc. are maintained
    - Compaction happens opportunistically, whenever documents are read
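Slide 39 (with slide 20) describes compaction as folding old deltas into a single compaction record during reads, while preserving intrinsics such as ~version and ~firstUpdateAt. The model below is hypothetical. The "fully replicated before" cutoff is passed in directly, whereas in EmoDB it would come from knowing what every data center has received. Each write is a simple "set field" change.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical model of opportunistic compaction during reads. */
final class CompactionSketch {
    /** A stored record: an ordinary delta (deltaCount = 1) or a compaction record folding several deltas. */
    static final class Rec {
        final long timestamp;
        final UnaryOperator<Map<String, Object>> change;
        final long deltaCount;
        final long firstUpdateAt;

        Rec(long timestamp, UnaryOperator<Map<String, Object>> change, long deltaCount, long firstUpdateAt) {
            this.timestamp = timestamp;
            this.change = change;
            this.deltaCount = deltaCount;
            this.firstUpdateAt = firstUpdateAt;
        }
    }

    private final List<Rec> records = new ArrayList<>();   // kept sorted by timestamp

    void write(long timestamp, UnaryOperator<Map<String, Object>> change) {
        records.add(new Rec(timestamp, change, 1, timestamp));
        records.sort((a, b) -> Long.compare(a.timestamp, b.timestamp));
    }

    int recordCount() {
        return records.size();
    }

    /** Resolves the document, then compacts records older than the fully-replicated cutoff. */
    Map<String, Object> read(long fullyReplicatedBefore) {
        Map<String, Object> doc = new HashMap<>();
        long version = 0;
        long firstUpdateAt = records.isEmpty() ? 0 : records.get(0).firstUpdateAt;
        int foldable = 0;
        for (Rec r : records) {
            doc = new HashMap<>(r.change.apply(doc));
            version += r.deltaCount;                      // compaction records keep the folded count
            if (r.timestamp < fullyReplicatedBefore) foldable++;
        }
        if (foldable > 1) compact(foldable);              // opportunistic: piggybacks on the read
        doc.put("~version", version);
        doc.put("~firstUpdateAt", firstUpdateAt);
        return doc;
    }

    /** Replaces the first n records with one compaction record holding their resolved content. */
    private void compact(int n) {
        List<Rec> old = records.subList(0, n);
        Map<String, Object> content = new HashMap<>();
        long count = 0;
        for (Rec r : old) {
            content = new HashMap<>(r.change.apply(content));
            count += r.deltaCount;
        }
        Map<String, Object> frozen = Collections.unmodifiableMap(content);
        Rec compaction = new Rec(old.get(n - 1).timestamp, ignored -> new HashMap<>(frozen),
                count, old.get(0).firstUpdateAt);
        old.clear();                                      // removes the folded records from `records`
        records.add(0, compaction);
    }

    static UnaryOperator<Map<String, Object>> set(String key, Object value) {
        return doc -> {
            Map<String, Object> copy = new HashMap<>(doc);
            copy.put(key, value);
            return copy;
        };
    }
}
```

The key property is that a read returns the same document and intrinsics before and after compaction. Only the number of stored records shrinks.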
  • 40. Databus
    - Allows applications to be notified of updates to the SoR
    - Requires creating a persistent subscription to a table or multiple tables (selected by attribute values)
    - The SoR “DVRs” (records) updates for all subscriptions
    - Supports multiple concurrent writers and readers (readers poll and ack)
    - No ordering guarantees; to help, the SoR provides ~version and ~signature
    - Exposes a RESTful API
  • 41. Databus – Subscription Management
    - Subscribe to changes to a set of tables in the System of Record
    - Table filters use the same conditions as deltas
    - Events are followed on all tables for which the condition evaluates to true
    - To subscribe to all tables in the SoR, omit the condition or pass ‘alwaysTrue’
  • 42. Databus
    - Subscribe to multiple tables
    - Count events
  • 43. Databus
    - Poll for events: checks for unclaimed, unacknowledged events
    - Events that are not acked are returned by another poll after the claim period expires
    - Renew claims
    - Acknowledge claims
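The subscription, poll, claim, renew and ack cycle from slides 40–43 can be modeled in a single process. The sketch below is hypothetical. It is not the EmoDB databus API, and a Java `Predicate` over table attributes stands in for the delta-style table condition. Timestamps are passed in explicitly so the claim-expiry behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical single-process model of one databus subscription (poll, claim, renew, ack). */
final class DatabusSketch {
    private final Predicate<Map<String, Object>> tableFilter;   // stands in for the table condition
    private final long claimTtlMillis;
    private final Map<String, Long> claimExpiry = new LinkedHashMap<>();  // event id -> expiry (0 = unclaimed)
    private long sequence;

    DatabusSketch(Predicate<Map<String, Object>> tableFilter, long claimTtlMillis) {
        this.tableFilter = tableFilter;
        this.claimTtlMillis = claimTtlMillis;
    }

    /** Called for every SoR update; events are recorded only for tables whose attributes match. */
    void onUpdate(Map<String, Object> tableAttributes, String table, String key) {
        if (tableFilter.test(tableAttributes)) {
            claimExpiry.put(table + "/" + key + "#" + (++sequence), 0L);
        }
    }

    /** Number of events not yet acknowledged, claimed or not. */
    int size() {
        return claimExpiry.size();
    }

    /** Returns up to `limit` events that are unclaimed or whose claim expired, and claims them. */
    List<String> poll(long nowMillis, int limit) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, Long> e : claimExpiry.entrySet()) {
            if (claimed.size() == limit) break;
            if (e.getValue() <= nowMillis) {
                e.setValue(nowMillis + claimTtlMillis);
                claimed.add(e.getKey());
            }
        }
        return claimed;
    }

    /** Extends the claim on an event that is still being processed. */
    void renew(String eventId, long nowMillis) {
        claimExpiry.computeIfPresent(eventId, (id, expiry) -> nowMillis + claimTtlMillis);
    }

    /** Acknowledging an event removes it, so it is never redelivered. */
    void ack(String eventId) {
        claimExpiry.remove(eventId);
    }
}
```

An event that is polled but never acked becomes visible again once its claim expires. This is what lets a crashed consumer's work be picked up by another reader, and it is also why consumers must tolerate duplicates.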
  • 44. Blob Store
    - REST storage service for photos
    - No single point of failure (data is lost only after 3 servers fail)
    - Sweet spot is blobs of a few MB, not GB (not designed for video)
    - Data replicates to all data centers, except where replication is restricted for legal reasons
    - Why not Amazon S3? Lower latency: reads and writes are always served out of the local data center
    - If you don't read across data centers, or you don't mind writing to buckets in multiple regions, use S3 or S3 + CloudFront
  • 45. Highly Scalable Architecture
    - We serve traffic out of three AWS regions simultaneously
    - DNS Global Traffic Management sends user requests to the fastest region
    - Application services are all auto-scaled and self-healing
    - Our Cassandra-based EmoDB operates across multiple Availability Zones, so an AZ failure doesn't cause downtime
    - Cassandra replicates across all three regions
  • 46. Emo/Polloi Contributors: Aaron Dixon, Ahaduzzaman Munna, Dave Barcelo, Fahd Siddiqui, John Roesler, Mark Brandt, Matt Bogner, Nate Bauernfiend, Shawn Smith, Steven Grotten
  • 47. Learn more
    - @Bazaarvoice, @BazaarvoiceDev
    - http://www.bazaarvoice.com/
    - http://blog.developer.bazaarvoice.com/