C* path

Library for decomposing your structured data and storing it in Cassandra. Same simple API implemented for both Thrift and CQL.

  1. C* Path: Denormalize your data. Eric Zoerner, Software Developer, eBuddy BV. Cassandra Summit Europe 2013, London, Tuesday 22 October 2013. #CASSANDRAEU
  2. Topics
      • About eBuddy
      • Introducing C* Path
      • How does it work?
      • Design and Challenges
      • Cassandra Data Model
      • Futures
  3. About eBuddy
  4. XMS
  9. Cassandra in eBuddy Messaging Platform
      • User Data Service
      • User Discovery Service
      • Persistent Session Store
      • Message History
      • Location-based Discovery
  10. Some Statistics
      • Current size of data: 1.4 TB total (3x replication); 467 GB actual data
      • 16 million sessions (11 million users plus groups)
      • Almost a billion rows in one column family (inverse social graph)
  11. C* Path
  12. The Problem (a “classic”)
      Complex Object → ? → Key-Value Store (RDB table, NoSQL, etc.)
      Complex Object:
        Person (name: String, birthdate: Date, nickname: String)
        Person 1 → * Address (street: String, city: String, province: String, postalCode: String, countryCode: String)
        Person 1 → * Phone (name: String, number: String)
  15. Some Strategies
      Serialization!
      Normalization!
        Person
          id   name  birthdate   nickname
          110  John  1985-04-06  Jack
          111  Mary  1979-11-30  Mary
        Address
          person_id  address_id  street        city
          110        001         123 Main St   New York
          110        002         456 Singel    Amsterdam
          111        003         78 Hoofd Str  London
        Phone
          person_id  name    phone
          110        mobile  +15551234
          111        home    +44884800
          111        mobile  +44030393
      Decomposition!
          path                 value
          name/                John
          addresses/@0/street  123 Main St.
          phones/@0/number     +31123456789
          ...                  ...
  16. Strategies Comparison
                          Serialization  Normalization  Decomposition
      Single Write        ✔              ✘              ✔
      Single Read         ✔              ✘              ✔
      Consistent Updates  ✔              ✔              not enforced
      Structural Access   ✘              ✔              ✔
      Cycles              ✔              ✔              ✘
  17. C* Path
      Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra.
      https://github.com/ebuddy/c-star-path
      Artifacts available at Maven Central.
  20. C* Path: Decomposition
      • Easy to Use
      • Simple API
      • Good for Cassandra because:
          – Structural Access: Write parts of objects without reading first
          – Good for denormalizing data: can read or write large complex objects with one read or write operation
  21. How does it work?
  24. API Example - Write to a Path
        StructuredDataSupport<UUID> dao = … ;
        UUID rowKey = … ;
        Pojo pojo = … ;
        Path path = dao.createPath("some", "path", "to", "my", "pojo");
        dao.writeToPath(rowKey, path, pojo);
  26. API Example - Read from a Path
        Path path = dao.createPath("some", "path", "to", "my", "pojo");
        Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });
  27. API Example - Delete
        dao.deletePath(rowKey, path);
  28. API Example - Batch Operations
        BatchContext batch = dao.beginBatch();
        dao.writeToPath(rowKey1, path, pojo1, batch);
        dao.writeToPath(rowKey2, path, pojo2, batch);
        dao.deletePath(rowKey3, path, batch);
        dao.applyBatch(batch);
  30. Read or write at any level of a path
        Person person = …;
        Path path = dao.createPath("x");
        dao.writeToPath(rowKey, path, person);
        Path pathToName = path.withElements("name");
        String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);
  33. Write Implementation: Decomposition
      • Step 1: Convert the domain object into a basic structure of Maps, Lists, and simple values. This uses the Jackson (FasterXML) library and honors the Jackson annotations.
      • Step 2: Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean), done by Decomposer.
      • Step 3: Write this map as key-value pairs in the database.
  34. Example Decomposition - step 1
      Simplify structure into regular Maps, Lists, and simple values. Starting point:
        Person (name: String, birthdate: Date, nickname: String)
        Person 1 → * Address (street: String, city: String, province: String, postalCode: String, countryCode: String)
        Person 1 → * Phone (name: String, number: String)
  35. Example Decomposition - step 1
      Simplify structure into regular Maps, Lists, and simple values. Result:
        Map
          name = "John"
          birthdate = "-39080932298"
          nickname = "Jack"
          addresses = <List>
            [0] = <Map>
              street = "123 Main"
              place = "New York"
            [1] = <Map>
              street = "Singel 45"
              place = "Amsterdam"
          phones = <List>
            [0] = <Map>
              number = "+31651234567"
              name = "mobile"
  36. Example Decomposition - step 2
        path                 value
        name/                "John"
        birthdate/           "-39080932298"
        nickname/            "Jack"
        addresses/@0/street  "123 Main St."
        addresses/@0/place   "New York"
        addresses/@1/street  "Singel 45"
        addresses/@1/place   "Amsterdam"
        phones/@0/name       "mobile"
        phones/@0/number     "+31651234567"
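Step 2 can be sketched in plain Java as a recursive walk over the Maps and Lists produced by step 1. This is only an illustration under stated assumptions, not the library's Decomposer: the class name `DecomposeSketch` is hypothetical, every path is given a trailing slash, and URL encoding of path elements and list terminators are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of step 2; not the library's Decomposer class. */
final class DecomposeSketch {

    /** Flattens nested Maps and Lists into a sorted map of path -> simple value. */
    static Map<String, Object> decompose(Object structure) {
        Map<String, Object> out = new TreeMap<>();
        walk("", structure, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Map keys become path elements.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List) {
            // List positions become @0, @1, ... path elements.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Simple value (String, Number, Boolean): store it under the path so far.
            out.put(prefix, node);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "Singel 45");
        address.put("place", "Amsterdam");
        List<Object> addresses = new ArrayList<>();
        addresses.add(address);
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("addresses", addresses);
        for (Map.Entry<String, Object> e : decompose(person).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A sorted map is used here because the paths end up as sorted column names or clustering keys in Cassandra, so the printed order resembles the stored order.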
  39. Read implementation: Composition
      • Step 1: Read path-value pairs from the database.
      • Step 2: “Merge” path-value maps back into a basic structure (Maps, Lists, simple values), done by Composer.
      • Step 3: Use Jackson to convert the basic structure back into a domain object using a TypeReference.
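The merge in step 2 could look roughly like the sketch below. It is a hypothetical illustration, not the library's Composer: the class name `ComposeSketch` is made up, paths are assumed to be slash-separated with `@n` list indices, and a conflicting simple value on the way down is simply overwritten. For step 3, Jackson's `ObjectMapper.convertValue(Object, TypeReference)` is one way to turn such a structure back into a typed object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of step 2 of the read side; not the library's Composer class. */
final class ComposeSketch {

    /** Rebuilds nested Maps and Lists from path -> value pairs. */
    static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new TreeMap<>();
        for (Map.Entry<String, Object> e : pathValues.entrySet()) {
            String[] elements = e.getKey().split("/"); // a trailing slash yields no empty element
            Map<String, Object> node = root;
            for (int i = 0; i < elements.length - 1; i++) {
                node = child(node, elements[i]);
            }
            node.put(elements[elements.length - 1], e.getValue());
        }
        return listify(root);
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> child(Map<String, Object> parent, String name) {
        Object existing = parent.get(name);
        if (!(existing instanceof Map)) {
            // Assumption of this sketch: a simple value in the way is replaced.
            existing = new TreeMap<String, Object>();
            parent.put(name, existing);
        }
        return (Map<String, Object>) existing;
    }

    /** Turns every map whose keys are all @n indices into a List ordered by n. */
    private static Object listify(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<?, ?> map = (Map<?, ?>) node;
        boolean isList = !map.isEmpty();
        for (Object key : map.keySet()) {
            if (!((String) key).startsWith("@")) {
                isList = false;
            }
        }
        if (isList) {
            TreeMap<Integer, Object> byIndex = new TreeMap<>(); // numeric order, so @10 follows @9
            for (Map.Entry<?, ?> e : map.entrySet()) {
                byIndex.put(Integer.parseInt(((String) e.getKey()).substring(1)), listify(e.getValue()));
            }
            return new ArrayList<Object>(byIndex.values());
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : map.entrySet()) {
            result.put((String) e.getKey(), listify(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new TreeMap<>();
        stored.put("name/", "John");
        stored.put("addresses/@0/street/", "123 Main St.");
        stored.put("addresses/@1/street/", "Singel 45");
        System.out.println(compose(stored));
    }
}
```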
  40. Design & Challenges
  41. Path Encoding
      • Paths stored as strings
      • Forward slashes in paths (but hidden by Path API)
      • Path elements are internally URL encoded, allowing use of special characters in the implementation
      • Special characters: @ for list indices (@0, @1, @2, ...)
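A small sketch shows why URL encoding the elements frees up `/` and `@` for internal use. The class `PathEncodingSketch` and its methods are hypothetical and only use the standard `java.net.URLEncoder` and `URLDecoder`; the exact encoding rules of the library are not shown on the slides and may differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical sketch of slash-delimited paths with URL-encoded elements. */
final class PathEncodingSketch {

    /** Encodes a user-supplied element so it can never contain a raw '/' or '@'. */
    static String encodeElement(String element) {
        try {
            return URLEncoder.encode(element, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Reverses encodeElement. */
    static String decodeElement(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    /** List indices are written unencoded, so the raw '@' marks them as special. */
    static String listIndex(int index) {
        return "@" + index;
    }

    /** A stored element is a list index only if it starts with a raw '@'. */
    static boolean isListIndex(String storedElement) {
        return storedElement.startsWith("@");
    }

    /** Joins already-encoded elements, each followed by a slash. */
    static String join(String... storedElements) {
        StringBuilder path = new StringBuilder();
        for (String element : storedElements) {
            path.append(element).append('/');
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // A user element that contains '/' and one that starts with '@' stay unambiguous.
        System.out.println(join(encodeElement("a/b"), listIndex(0), encodeElement("@home")));
    }
}
```

Because a user-supplied `@home` is stored as `%40home`, it cannot be mistaken for a list index, and an element containing `/` cannot split a path.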
  44. Challenge: “Shrinking Lists”
      ➀ Write a list.
          dao.writeToPath(key, "x", {"1","2"});
            x/@0/  "1"
            x/@1/  "2"
      ➁ Write a shorter list.
          dao.writeToPath(key, "x", {"3"});
            x/@0/  "3"
            x/@1/  "2"
      ➂ Read the list.
          dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
            {"3","2"}  ✘
  45. Challenge: “Shrinking Lists”
      ✔ Solution: Implementation writes a list terminator value.
          dao.writeToPath(key, "x", {"1","2"});
            x/@0/  "1"
            x/@1/  "2"
            x/@2/  0xFFFFFFFF
          dao.writeToPath(key, "x", {"3"});
            x/@0/  "3"
            x/@1/  0xFFFFFFFF
            x/@2/  0xFFFFFFFF
          dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
            {"3"}  ✔
  46. Challenge: “Shrinking Lists”
      ✔ Solution: Implementation writes a list terminator value.
      Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path. This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.
      Conclusion: The user must know what they are doing and understand the implementation.
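The shrinking-list problem and the terminator fix can be reproduced with an in-memory map standing in for one Cassandra row. Everything here is a hypothetical sketch: `ShrinkingListSketch`, `writeList`, and `readList` are not library methods, and a sentinel object stands in for the 0xFFFFFFFF terminator shown on the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical simulation of list writes that never delete old columns. */
final class ShrinkingListSketch {

    /** Stand-in for the terminator value (the slides show 0xFFFFFFFF). */
    static final Object TERMINATOR = new Object();

    /** Writes each element at path/@i/ and optionally a terminator after the last one. */
    static void writeList(Map<String, Object> row, String path, List<String> values, boolean terminate) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "/@" + i + "/", values.get(i));
        }
        if (terminate) {
            row.put(path + "/@" + values.size() + "/", TERMINATOR);
        }
    }

    /** Reads elements in index order until a column is missing or holds the terminator. */
    static List<String> readList(Map<String, Object> row, String path) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            Object value = row.get(path + "/@" + i + "/");
            if (value == null || value == TERMINATOR) {
                return result;
            }
            result.add((String) value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new TreeMap<>();
        writeList(row, "x", Arrays.asList("1", "2", "3"), true);
        writeList(row, "x", Arrays.asList("9"), true);
        System.out.println(readList(row, "x")); // the terminator at x/@1/ ends the read
        System.out.println(row.get("x/@2/"));   // a positional read still finds the stale "3"
    }
}
```

The second line printed by `main` illustrates the caveat on slide 46: without a delete before the write, the old element at `x/@2/` is still stored and reachable by index.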
  48. Challenge: Inconsistent Updates
      Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.
          Path path = dao.createPath("x");
          dao.writeToPath(key, path, person1);
            x/address/street/  "Singel 45"
            x/name/            "John"
          path = dao.createPath("x", "name");
          dao.writeToPath(key, path, person1);  ✘
            x/address/street/       "Singel 45"
            x/name/                 "John"
            x/name/address/street/  "Singel 45"
            x/name/name/            "John"
  49. Challenge: Inconsistent Updates
      ✔ Solution: Don’t do that!
      If it does happen: the implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert to a now incompatible POJO will fail.
      Conclusion: The user must know what they are doing and understand the implementation.
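One way to spot this kind of corruption in a stored row is to look for a path that holds a simple value and is also a prefix of a longer path. The sketch below simulates the two writes with a sorted in-memory map; `InconsistentUpdateSketch` and its methods are hypothetical and not part of the library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical simulation of a write at the wrong path level. */
final class InconsistentUpdateSketch {

    /** Writes already-decomposed path-value pairs under a prefix, deleting nothing first. */
    static void writeAt(TreeMap<String, String> row, String prefix, Map<String, String> decomposed) {
        for (Map.Entry<String, String> e : decomposed.entrySet()) {
            row.put(prefix + e.getKey(), e.getValue());
        }
    }

    /** Returns paths that hold a simple value and are also a prefix of a longer path. */
    static List<String> conflictingPaths(TreeMap<String, String> row) {
        List<String> conflicts = new ArrayList<>();
        for (String path : row.keySet()) {
            // In sorted order, any key that extends 'path' directly follows it.
            String next = row.higherKey(path);
            if (next != null && next.startsWith(path)) {
                conflicts.add(path);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        Map<String, String> person1 = new LinkedHashMap<>();
        person1.put("address/street/", "Singel 45");
        person1.put("name/", "John");
        TreeMap<String, String> row = new TreeMap<>();
        writeAt(row, "x/", person1);      // intended write
        writeAt(row, "x/name/", person1); // accidental write one level too deep
        System.out.println(row.keySet());
        System.out.println(conflictingPaths(row));
    }
}
```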
  52. Issue: Sorting
      Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?
      Instead of storing paths as strings, the implementation could have used DynamicComposite. We tried it.
  53. Issue: Sorting
      Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?
      It can work. CQL supports it as a user-defined type. Unfortunately it causes cqlsh to crash, making it difficult to “browse” the data.
  54. Issue: Sorting
      Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?
      Using DynamicComposite for paths is still under consideration for a future version.
  55. Cassandra Data Model
  56. Thrift
      Column family:
        row key  column name        column value
        <UUID>   x/address/street/  “Singel 45”
                 x/name             “John”
                 …                  …
      - OR -
      Super column family (coming soon):
        row key  super column name  column name      column value
        <UUID>   x                  address/street/  “Singel 45”
                                    name             “John”
                                    …                …
  57. Thrift
      Thrift implementation relies on the Hector client.
        ColumnFamilyOperations<K,String,Object> operations =
            new ColumnFamilyTemplate<K,String,Object>(
                keyspace, KeySerializer, StringSerializer, StructureSerializer);
        StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);
  58. CQL
        CREATE TABLE person (
          key text,
          path text,
          value text,
          PRIMARY KEY (key, path)
        )
      • Cannot use the path itself as a column name because it is “dynamic”
      • Dynamic column family
  59. CQL: Data Model Constraints
        CREATE TABLE person (
          key text,
          path text,
          value text,
          PRIMARY KEY (key, path)
        )
      • Need to do a range (“slice”) query on the path, so path must be a clustering key.
      • Also, the path must be the first clustering key, since otherwise a query would have to provide an equals condition on the previous clustering keys.
      • One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work, since Cassandra indexes only work with equals conditions:
          Bad Request: No indexed columns present in by-columns clause with Equal operator
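The range query on the path can be expressed as a prefix slice: everything from the prefix itself up to, but excluding, the prefix with its last character incremented. The sketch below is an assumption about how such bounds might be computed, not the library's code. `PathSliceSketch` is a hypothetical name, the CQL string targets the `person` table above, and a `TreeMap` stands in for one partition's sorted clustering keys (ASCII paths only, non-empty prefix).

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch of a prefix slice over sorted paths. */
final class PathSliceSketch {

    /** CQL for the slice; bind the row key, the prefix, and upperBound(prefix). */
    static final String SLICE_QUERY =
            "SELECT path, value FROM person WHERE key = ? AND path >= ? AND path < ?";

    /** Exclusive upper bound: the prefix with its last character incremented ("x/" becomes "x0"). */
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    /** Simulates the slice on an in-memory sorted map standing in for one partition. */
    static SortedMap<String, String> slice(TreeMap<String, String> partition, String prefix) {
        return partition.subMap(prefix, upperBound(prefix));
    }

    public static void main(String[] args) {
        TreeMap<String, String> partition = new TreeMap<>();
        partition.put("w/name/", "Mary");
        partition.put("x/address/street/", "Singel 45");
        partition.put("x/name/", "John");
        partition.put("xy/name/", "Jack");
        // Only the two paths below x/ fall inside the slice.
        System.out.println(slice(partition, "x/"));
    }
}
```

Because the trailing slash is part of the prefix, a sibling element such as `xy` is excluded even though it starts with the same letter.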
  60. CQL
      CQL implementation relies on the DataStax Java driver.
        StructuredDataSupport<K> dao = new CqlStructuredDataSupport<K>(
            String tableName,
            String partitionKeyColumnName,
            String pathColumnName,
            String valueColumnName,
            Session session);
  61. And the rest…
  62. Planned Features
      • Sets with simple values: element values stored in path
      • DynamicComposites?
      • Multiple row reads and writes
      • Slice queries on path ranges
  63. Credits and Acknowledgements
      • Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback
      • Jackson JSON Processor, which is core to the C* Path implementation: http://wiki.fasterxml.com/JacksonHome
      • Image credits:
          Slide            Image name  Author      Link
          Some Strategies  binary      noegranado  http://www.flickr.com/photos/43360884@N04/6949896929/
