Cassandra at Bazaarvoice - EmoDB

Introducing Bazaarvoice datastore (EmoDB)

EmoDB is a RESTful HTTP server used by Bazaarvoice for storing JSON objects and for watching for changes to those objects. It also supports a blob store, a queueing service, and a data bus to track events.

It is designed to span multiple data centers, using eventual consistency (AP) and multi-master conflict resolution. It relies on Apache Cassandra for persistence and cross-data center replication.

About Bazaarvoice

Bazaarvoice is based on a simple truth - when people talk to each other, people buy stuff they are happy about because they trust the opinions of others. We see a day when all voices are connected and, together, help the marketplace function better. We’ve built a network that connects businesses together to amplify the authentic voices of people wherever they shop – online, in-store and mobile. Our mission, just like our name, is to be the "voice of the marketplace", one authentic conversation at a time.

Each month, more than 450 million people view and share opinions, questions and experiences on more than 20 million products in the Bazaarvoice network. Our technology platform channels these voices into places that help consumers make purchasing decisions. Our engineers have the opportunity to work on many areas of computer science, including distributed computing, natural language processing, big data analytics, large-scale system design, and user interface design, to name a few. We use the latest technologies in cloud, NoSQL, and advanced client-side JavaScript to solve problems at a scale that few companies can offer.

About Fahd Siddiqui:
Fahd Siddiqui is a Senior Software Engineer on the data infrastructure team at Bazaarvoice. His interests include highly scalable, distributed data systems. He holds a Master's degree in Computer Engineering from the University of Texas at Austin.

Published in: Technology

Cassandra at Bazaarvoice - EmoDB

  1. EmoDB: Store your feelings here. www.bazaarvoice.com (Confidential and Proprietary. © 2013 Bazaarvoice, Inc.)
  2. SaaS serving software that collects and displays user generated content, crunches analytics, and extracts insights.
     • Thousands of clients
     • Hundreds of millions of pieces of content
     • Hundreds of millions of unique visitors per month
     • Tens of billions of pageviews per month
     • Austin-based company founded in 2005
     • Engineering offices: Austin, San Francisco, New York
  3. $ whoami
     • Fahd Siddiqui, Senior Software Engineer, Data Infrastructure, Bazaarvoice
     • linkedin.com/in/fahdsiddiqui
     • fahd.siddiqui@bazaarvoice.com
  4. Global Monthly Unique Visitors (chart; values shown: 1B, 1B, 500M, 1B, 400M, 200M, 250M, 450M, 1B, 600M)
  5. Monthly stats as of July 2013
     • Figures, in slide order: 16B, 1B, 480M, 250M, 118M, 3M, 4000, 2500
     • Labels, in slide order: Review impressions, Pageviews (37k rps), Unique users, Products in catalog, Total reviews, Monthly new reviews, Customer implementations, Servers
     • 95 Engineers
  6. Data Infrastructure
  16. Goals for EmoDB
      • Store data about anything in a flexible way
      • Support “Universal Content Type”: store any content type without any re-architecture
      • Watch for changes to data events
      • Expose a RESTful API
      • Multi-master, multi-datacenter, fault tolerant, horizontally scalable for reads and writes
  17. EmoDB Overview
      • System of Record
      • Databus
      • Queue Service
      • Blob Store
      • …
      • Backed by Cassandra
  18. SoR - tables: What is an Emo table?
      • A bucket that contains JSON documents
      • Creating one is cheap, and you may create as many as you want, e.g., review:testcustomer
      • Offers a way to fetch any particular row by ID
      • Offers a complete table scan, which uses splits for parallel scans
  19. SoR - tables
      • Create a table
          $ curl -s -XPUT -H "Content-Type: application/json" \
              "http://localhost:8080/sor/1/_table/review:testcustomer?options=placement:'ugc_global:ugc'&audit=comment:'initial+provisioning',host:aws-tools-02" \
              --data-binary '{"type":"review","client":"TestCustomer"}' | jsonpp
          { "success": true }
      • Store a document (the URL is quoted so the shell passes the single quotes and `?` through unchanged)
          $ curl -s -XPUT -H "Content-Type: application/json" \
              "http://localhost:8080/sor/1/review:testcustomer/demo1?audit=comment:'initial+submission',host:aws-submit-09" \
              --data-binary '{"author":"Bob","title":"Best Ever!","rating":5}' | jsonpp
          { "success": true }
  20. SoR - rows
      • A row is composed of deltas
      • Writers append deltas, and readers resolve deltas to produce a resolved object
      • Compaction occurs when data has been replicated to all data centers
      • Because of this, EmoDB is not a good fit for systems with a high update-to-create ratio
  21. Data Access in EmoDB
      • 3 ways to read data out of EmoDB:
          Lookup by primary key
          Bulk extract (scan)
          Change feed (using the EmoDB databus)
      • What's missing? Where, join, group by, anything other than primary key lookup
      • Use another indexing mechanism for complex queries (such as Elasticsearch, Solr, etc.)
  22. SoR - Data storage challenges: Problem 1
      • Need a way to cheaply create tens of thousands of “tables”
      • As of Cassandra 1.1, each column family (CF) needs at least 1 MB of memory on every node
      • Way too much overhead to dedicate a CF to each user-defined table
      • Hint: we'll use only one column family to store all tables
  23. SoR - Data storage challenges: Problem 2 (once Problem 1 is solved)
      • Need to scan an entire table so it can be indexed by Polloi (Elasticsearch)
      • Require a way to split tables into shards that enable sequential scans
      • Shards for each table should be fully distributed over the Cassandra cluster
  24. SoR - Data storage: solution to both Problem 1 and 2
      • The row key byte buffer contains a 9-byte “table prefix” followed by the content key:
          byte 0: 8-bit shard identifier
          bytes 1-8: 64-bit table UUID
          remaining N bytes: UTF-8-encoded content key
      • The shard identifier is the bottom 8 bits of the 32-bit Murmur3 hash of (table UUID | content key)
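
The row key layout on slide 24 can be sketched in Java as below. This is a minimal illustration, not EmoDB's actual code: the class and method names are hypothetical, and the Murmur3 seed (0) and the exact hash input (the 8-byte big-endian table UUID followed by the UTF-8 content key) are assumptions, since the slide only states that the hash covers (table UUID | content key).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeys {
    /** 32-bit MurmurHash3 (x86 variant), little-endian block reads. */
    public static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h = seed;
        int nblocks = data.length / 4;
        for (int i = 0; i < nblocks; i++) {
            int k = (data[4 * i] & 0xff) | ((data[4 * i + 1] & 0xff) << 8)
                  | ((data[4 * i + 2] & 0xff) << 16) | ((data[4 * i + 3] & 0xff) << 24);
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
        }
        int k = 0, tail = nblocks * 4;
        switch (data.length & 3) {           // intentional fall-through
            case 3: k ^= (data[tail + 2] & 0xff) << 16;
            case 2: k ^= (data[tail + 1] & 0xff) << 8;
            case 1: k ^= (data[tail] & 0xff);
                    k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h ^= k;
        }
        h ^= data.length;                     // finalization mix
        h ^= h >>> 16; h *= 0x85ebca6b; h ^= h >>> 13; h *= 0xc2b2ae35; h ^= h >>> 16;
        return h;
    }

    /** Bottom 8 bits of the hash over (table UUID | content key). Seed 0 is an assumption. */
    public static int shardOf(long tableUuid, String contentKey) {
        byte[] key = contentKey.getBytes(StandardCharsets.UTF_8);
        ByteBuffer input = ByteBuffer.allocate(8 + key.length);
        input.putLong(tableUuid).put(key);
        return murmur3_32(input.array(), 0) & 0xff;
    }

    /** Layout: [1-byte shard][8-byte table UUID][UTF-8 content key]. */
    public static byte[] rowKey(long tableUuid, String contentKey) {
        byte[] key = contentKey.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(9 + key.length);
        out.put((byte) shardOf(tableUuid, contentKey));
        out.putLong(tableUuid);
        out.put(key);
        return out.array();
    }

    public static void main(String[] args) {
        byte[] k = rowKey(0x1122334455667788L, "demo1");
        System.out.println("row key length = " + k.length + ", shard = " + (k[0] & 0xff));
    }
}
```

Because the shard byte comes first, keys for one table fall into 256 separate, contiguous key ranges, which is what the scanning slides rely on.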
  25. SoR - Scanning Table
      • The shard identifier serves to spread content for a given table to avoid hotspots (using ByteOrderedPartitioner)
      • All content for a table can be fetched in parallel using 2^8 = 256 range queries
      • There you have it: a single CF offering range scans for segments (tables) that are fully distributed over the cluster!
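
The 256 range queries can be derived directly from the 9-byte table prefix. The sketch below assumes the prefix layout from slide 24 and that a range is expressed as [start, end) over raw key bytes; the class and method names are hypothetical and no Cassandra client API is used.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ScanRanges {
    /** 9-byte prefix: [1-byte shard][8-byte table UUID]. */
    public static byte[] prefix(int shard, long tableUuid) {
        return ByteBuffer.allocate(9).put((byte) shard).putLong(tableUuid).array();
    }

    /**
     * Smallest byte string greater than every key starting with the prefix:
     * the prefix incremented as a big-endian unsigned number.
     * Returns null if the prefix is all 0xff (no upper bound).
     */
    public static byte[] endExclusive(byte[] prefix) {
        byte[] end = prefix.clone();
        for (int i = end.length - 1; i >= 0; i--) {
            if (++end[i] != 0) {
                return end;       // no carry out of this byte
            }
        }
        return null;
    }

    /** One [start, end) pair per shard; each pair can be scanned independently. */
    public static List<byte[][]> ranges(long tableUuid) {
        List<byte[][]> out = new ArrayList<>();
        for (int shard = 0; shard < 256; shard++) {
            byte[] start = prefix(shard, tableUuid);
            out.add(new byte[][] { start, endExclusive(start) });
        }
        return out;
    }
}
```

With a byte-ordered partitioner, every row of the table whose shard byte matches lies between `start` and `end`, so each pair maps to one sequential range scan.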
  26. SoR - Scanning Table
      • Table UUIDs also solved another problem for us
      • Multiple tables can now be stored in the same CF
      • Because we use a UUID, we can DROP a table and CREATE a new one with the same name
      • Dropped tables are deleted lazily, which is especially important in an eventually consistent world
  27. SoR - Parallel Scan
  28. SoR - Parallel Scan
      • Call the getSplits() method to get a list of split identifiers
      • Then, in parallel, scan the data in each split by calling the getSplit() method
      • Java:
          Collection<String> getSplits(String table, int desiredRecordsPerSplit);
          Iterator<Map<String, Object>> getSplit(String table, String split, @Nullable String fromKeyExclusive, long limit, ReadConsistency consistency);
  29. SoR - Parallel Scan: Java code sample
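
The code sample from slide 29 did not survive in this transcript. The following is not that sample but a minimal sketch of the getSplits()/getSplit() pattern from slide 28. The `TableScanner` interface, the `ReadConsistency` values, the thread-pool size, and the in-memory stand-in are all assumptions made so the example runs without an EmoDB server.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScan {
    public enum ReadConsistency { WEAK, STRONG }   // hypothetical values

    /** Hypothetical interface mirroring the two signatures on slide 28. */
    public interface TableScanner {
        Collection<String> getSplits(String table, int desiredRecordsPerSplit);
        Iterator<Map<String, Object>> getSplit(String table, String split,
                String fromKeyExclusive, long limit, ReadConsistency consistency);
    }

    /** Scans every split on its own worker and returns the total row count. */
    public static long countRows(TableScanner store, String table, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (String split : store.getSplits(table, 10_000)) {
                Callable<Long> task = () -> {
                    long n = 0;
                    Iterator<Map<String, Object>> rows = store.getSplit(
                            table, split, null, Long.MAX_VALUE, ReadConsistency.STRONG);
                    while (rows.hasNext()) { rows.next(); n++; }
                    return n;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("scan failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** In-memory stand-in for a real store: split id -> rows in that split. */
    public static TableScanner inMemory(Map<String, List<Map<String, Object>>> splits) {
        return new TableScanner() {
            public Collection<String> getSplits(String table, int desired) {
                return splits.keySet();
            }
            public Iterator<Map<String, Object>> getSplit(String table, String split,
                    String fromKeyExclusive, long limit, ReadConsistency consistency) {
                return splits.get(split).stream().limit(limit).iterator();
            }
        };
    }
}
```

A real consumer would replace the counting loop with its own per-row work, such as indexing each document.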
  30. SoR - Deltas
      • Documents are stored as a sequence of deltas
      • Readers evaluate deltas in order to produce the document
      • Create, update, and delete documents by creating deltas
      • Weak consistency: no document-level locking
  31. SoR - Deltas
  32. SoR - Deltas
      • Typically there would be a replication conflict between t2 and t3
      • But since each delta specifies only the fields it modifies, the deltas merge together cleanly and produce the desired result
      • No synchronous cross-data center communication is required for concurrent modification
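
A small sketch of why field-level deltas merge cleanly: if each delta carries only the fields it changes and readers apply deltas in timestamp order, the resolved document does not depend on the order in which the deltas arrived. The class name is hypothetical, and plain `long` timestamps stand in for whatever ordering key the real system uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class DeltaMerge {
    /** Applies field-level deltas in timestamp order, whatever order they arrived in. */
    public static Map<String, Object> resolve(Map<Long, Map<String, Object>> deltasByTime) {
        SortedMap<Long, Map<String, Object>> ordered = new TreeMap<>(deltasByTime);
        Map<String, Object> doc = new TreeMap<>();
        for (Map<String, Object> delta : ordered.values()) {
            doc.putAll(delta);   // each delta overwrites only the fields it names
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<Long, Map<String, Object>> deltas = new HashMap<>();
        deltas.put(1L, Map.<String, Object>of("author", "Bob", "rating", 5)); // t1: create
        deltas.put(3L, Map.<String, Object>of("status", "APPROVED"));         // t3: one data center
        deltas.put(2L, Map.<String, Object>of("title", "Best Ever!"));        // t2: another data center
        System.out.println(resolve(deltas));
    }
}
```

The t2 and t3 deltas touch different fields, so both survive in the resolved document even though neither writer saw the other's change.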
  33. SoR - Deltas
      • Recursive, pattern-matching approach
      • Operations are available for:
          Setting a value
          Deleting a value
          Updating a value for a key in a map
      • There is no operation for modifying a list
      • Model a list using a map; a time UUID is a good candidate for list keys
  34. SoR - Deltas
      • Literal: the “smash” operation
      • Delete
      • Map
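
The three operation types can be modeled as a small recursive interpreter. This is an illustrative sketch only: the `Delta` interface, the `UNDEFINED` marker, and the factory method names are hypothetical and are not EmoDB's delta syntax or API.

```java
import java.util.Map;
import java.util.TreeMap;

public class Deltas {
    /** Marker returned by a delete so the enclosing map drops the key. */
    public static final Object UNDEFINED = new Object();

    public interface Delta {
        Object apply(Object current);
    }

    /** Literal: replaces ("smashes") whatever value was there. */
    public static Delta literal(Object value) {
        return current -> value;
    }

    /** Delete: removes the value. */
    public static Delta delete() {
        return current -> UNDEFINED;
    }

    /** Map: applies a nested delta per key and leaves all other keys untouched. */
    public static Delta map(Map<String, Delta> perKey) {
        return current -> {
            Map<String, Object> result = new TreeMap<>();
            if (current instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) current).entrySet()) {
                    result.put(String.valueOf(e.getKey()), e.getValue());
                }
            }
            for (Map.Entry<String, Delta> e : perKey.entrySet()) {
                Object old = result.containsKey(e.getKey()) ? result.get(e.getKey()) : UNDEFINED;
                Object updated = e.getValue().apply(old);   // recursion happens here
                if (updated == UNDEFINED) {
                    result.remove(e.getKey());
                } else {
                    result.put(e.getKey(), updated);
                }
            }
            return result;
        };
    }
}
```

Because a map delta can contain further map deltas, a single write can update one nested field without restating the rest of the document.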
  35. SoR - Deltas: Conditional
      • Perform a delta conditionally
      • Designed to help resolve the most common concurrent write conflict situations
      • Simple and reliable
      • E.g., mark a review “approved” only if moderation hasn't begun
  36. SoR - Deltas: Other types of conditions
      • Equal, Intrinsic, Is, Map, And, Or, Not, Constant
      • E.g., {..,"type":or("product","category"),"client":"TestCustomer"}
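
A conditional delta can be pictured as a predicate over the current document plus the change to apply when the predicate holds. The sketch below uses `java.util.function.Predicate` as a stand-in for EmoDB's condition types; the class, method, and field names (`moderation`, `status`) are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.Predicate;

public class ConditionalDelta {
    /** Applies the changes only if the condition holds for the current document. */
    public static Map<String, Object> apply(Map<String, Object> doc,
                                            Predicate<Map<String, Object>> condition,
                                            Map<String, Object> changes) {
        Map<String, Object> result = new TreeMap<>(doc);
        if (condition.test(doc)) {
            result.putAll(changes);
        }
        return result;                 // unchanged copy when the condition fails
    }

    /** "Equal"-style condition on one field. */
    public static Predicate<Map<String, Object>> fieldIs(String key, Object expected) {
        return doc -> Objects.equals(doc.get(key), expected);
    }

    /** "Or"-style condition: the field matches any of the given values. */
    public static Predicate<Map<String, Object>> fieldIsAnyOf(String key, Object... allowed) {
        return doc -> Arrays.asList(allowed).contains(doc.get(key));
    }
}
```

For the slide's example, `apply(doc, fieldIs("moderation", "NOT_STARTED"), Map.of("status", "APPROVED"))` approves the review only while moderation has not begun, and `fieldIsAnyOf("type", "product", "category")` mirrors the `or(...)` condition.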
  37. SoR - Deltas: Read-Modify-Write
      • Read the original state
      • Compute the new version
      • Either the write succeeds, or eventually the write conflicts and the databus fires an event so that the application can detect the conflict and retry the write
  38. SoR - Deltas
      • Data center A: T1, Conditional T3
      • Data center B: T1, T2
  39. SoR - Deltas: Compaction
      • For efficiency, older deltas get compacted and replaced by a single delta, a “compaction” record
      • Ensures intrinsics like ~version, ~firstUpdateAt, etc. are maintained
      • Compaction happens opportunistically, whenever documents are read
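
Compaction can be sketched as folding the oldest deltas into one snapshot while carrying forward the counters that the intrinsics depend on. This is an assumption-laden illustration: the `Record` class, the timestamp cutoff (standing in for "replicated to all data centers"), and the way `~version` is counted here are hypothetical, not EmoDB's implementation.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class Compaction {
    /** Snapshot that replaces every delta up to a cutoff. */
    public static final class Record {
        public final Map<String, Object> content = new TreeMap<>();
        public int version;              // number of deltas folded in
        public long firstUpdateAt = -1;  // timestamp of the oldest folded delta
    }

    /** Folds deltas with timestamp <= cutoff into one record and removes them from the map. */
    public static Record compact(SortedMap<Long, Map<String, Object>> deltas, long cutoffInclusive) {
        Record rec = new Record();
        Iterator<Map.Entry<Long, Map<String, Object>>> it = deltas.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Map<String, Object>> e = it.next();
            if (e.getKey() > cutoffInclusive) break;
            if (rec.version == 0) rec.firstUpdateAt = e.getKey();
            rec.content.putAll(e.getValue());
            rec.version++;
            it.remove();                 // the record now stands in for this delta
        }
        return rec;
    }

    /** Resolves the document from the record plus any newer deltas, keeping intrinsics intact. */
    public static Map<String, Object> resolve(Record rec, SortedMap<Long, Map<String, Object>> remaining) {
        Map<String, Object> doc = new TreeMap<>(rec.content);
        for (Map<String, Object> delta : remaining.values()) doc.putAll(delta);
        doc.put("~version", rec.version + remaining.size());
        doc.put("~firstUpdateAt", rec.firstUpdateAt);
        return doc;
    }
}
```

Readers see the same resolved document before and after compaction; only the number of stored deltas changes.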
  40. Databus
      • Allows applications to get notified of updates to the SoR
      • Must create a persistent subscription to a table or to multiple tables (based on the value of table attributes)
      • The SoR “DVR”s updates for all subscriptions
      • Supports multiple concurrent writers and readers (polls and acks)
      • No guarantees on order; to help, the SoR provides ~version and ~signature
      • Exposes a RESTful API
  41. Databus - Subscription Management
      • Subscribe to changes to a set of tables in the System of Record
      • Table filters are the same as conditions for deltas
      • Follow events on all tables for which the condition evaluates to true
      • To subscribe to all tables in the SoR, omit the condition or pass ‘alwaysTrue’
  42. Databus
      • Subscribe for multiple tables
      • Count events
  43. Databus
      • Poll for events: check for unclaimed, unacknowledged events
      • If events are not ack'd, they will be returned by another poll after the claim period expires
      • Renew claims
      • Acknowledge claims
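
The poll, claim, renew, and acknowledge cycle on slide 43 can be illustrated with a small in-memory model. This sketch is not the databus REST API: the class and method names are hypothetical, and the caller passes the current time explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClaimQueue {
    // event id -> time at which its current claim expires (0 = never claimed)
    private final Map<String, Long> claimedUntil = new LinkedHashMap<>();

    public void publish(String eventId) {
        claimedUntil.putIfAbsent(eventId, 0L);
    }

    /** Returns up to limit events that are unclaimed or whose claim has expired, and claims them. */
    public List<String> poll(long now, long claimTtl, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : claimedUntil.entrySet()) {
            if (out.size() == limit) break;
            if (e.getValue() <= now) {
                e.setValue(now + claimTtl);
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Extends a claim so a slow consumer does not see the event handed out again. */
    public void renew(String eventId, long now, long claimTtl) {
        claimedUntil.computeIfPresent(eventId, (id, until) -> now + claimTtl);
    }

    /** Acknowledged events are gone for good. */
    public void acknowledge(String eventId) {
        claimedUntil.remove(eventId);
    }

    /** Number of events not yet acknowledged. */
    public int count() {
        return claimedUntil.size();
    }
}
```

An event that is polled but never acknowledged becomes visible again once its claim expires, which is what gives consumers at-least-once delivery in this model.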
  44. Blob Store
      • REST storage service for photos
      • No single point of failure (data loss only after 3 servers fail)
      • Sweet spot is blobs of a few MB, not GB (not designed for video)
      • Data replicates to all data centers, except where replication is restricted for legal reasons
      • Why not Amazon S3? Lower latency: reads and writes are always served out of the local data center
      • If you don't read across data centers, or you don't mind writing to buckets in multiple regions, use S3 or S3+CloudFront
  45. Highly Scalable Architecture
      • We serve traffic out of three AWS regions simultaneously
      • DNS Global Traffic Management sends user requests to the fastest region
      • Application services are all auto-scaled and self-healing
      • Our Cassandra-based EmoDB operates out of multiple Availability Zones, so that an AZ failure doesn't result in downtime
      • Cassandra replicates across all three regions
  46. Emo/Polloi Contributors
      • Aaron Dixon, Ahaduzzaman Munna, Dave Barcelo, Fahd Siddiqui, John Roesler, Mark Brandt, Matt Bogner, Nate Bauernfiend, Shawn Smith, Steven Grotten
  47. Learn more
      • @Bazaarvoice
      • @BazaarvoiceDev
      • http://www.bazaarvoice.com/
      • http://blog.developer.bazaarvoice.com/