Real Data Models of Silicon Valley

A lot has changed since I last gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real-life data models and the new features taking developer productivity to an all-new high: User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!

Published in: Data & Analytics, Technology

  1. 1. Real Data Models of Silicon Valley. Patrick McFadin, Chief Evangelist for Apache Cassandra, @PatrickMcFadin
  2. 2. It's been an epic year
  3. 3. I've had a ton of fun! • Traveling the world talking to people like you! Stockholm, Warsaw, Melbourne, New York, Vancouver, Dublin
  4. 4. What's new? • 2.1 is out! • Amazing changes for performance and stability
  5. 5. Where are we going? • 3.0 is next. Just hold on…
  6. 6. KillrVideo.com • 2012 Summit • Complete example for data modeling • www.killrvideos.com [Mockup of a video page: video title, description, comments, tags, rating, username, upload button, recommended videos, ads] *Cat drawing by goodrob13 on Flickr
  7. 7. It’s alive!!! • Hosted on Azure • Code on GitHub
  8. 8. Data Model - Revisited • Add in some 2.1 data models • Replace (or remove) some app code • Become a part of Cassandra OSS download
  9. 9. User Defined Types • Complex data in one place • No multi-gets (multi-partitions) • Nesting! CREATE TYPE address ( street text, city text, zip_code int, country text, cross_streets set<text> );
  10. 10. Before CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE video_metadata ( video_id uuid PRIMARY KEY, height int, width int, video_bit_rate set<text>, encoding text ); SELECT * FROM videos WHERE videoid = ?; SELECT * FROM video_metadata WHERE video_id = ?; Title: Introduction to Apache Cassandra. Description: A one-hour talk on everything you need to know about a totally amazing database. Playback rate: 480 720. In-application join: the page needs one read from each table.
  11. 11. After • Now video_metadata is embedded in videos CREATE TYPE video_metadata ( height int, width int, video_bit_rate set<text>, encoding text ); CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, metadata set <frozen<video_metadata>>, added_date timestamp, PRIMARY KEY (videoid) );
  12. 12. Wait! Frozen?? • Staying out of technical debt • 3.0 UDTs will not have to be frozen • Applicable to User Defined Types and Tuples (wait for it…) Do you want to build a schema? Do you want to store some JSON?
  13. 13. Let’s store some JSON { "productId": 2, "name": "Kitchen Table", "price": 249.99, "description": "Rectangular table with oak finish", "dimensions": { "units": "inches", "length": 50.0, "width": 66.0, "height": 32 }, "categories": [ { "category": "Home Furnishings", "catalogPage": 45, "url": "/home/furnishings" }, { "category": "Kitchen Furnishings", "catalogPage": 108, "url": "/kitchen/furnishings" } ] }
  14. 14. Let’s store some JSON (same document as slide 13) CREATE TYPE dimensions ( units text, length float, width float, height float );
  15. 15. Let’s store some JSON (same document as slide 13) CREATE TYPE category ( catalogPage int, url text );
  16. 16. Let’s store some JSON (same document as slide 13) CREATE TABLE product ( productId int, name text, price float, description text, dimensions frozen <dimensions>, categories map <text, frozen <category>>, PRIMARY KEY (productId) );
  17. 17. Let’s store some JSON INSERT INTO product (productId, name, price, description, dimensions, categories) VALUES (2, 'Kitchen Table', 249.99, 'Rectangular table with oak finish', { units: 'inches', length: 50.0, width: 66.0, height: 32 }, { 'Home Furnishings': { catalogPage: 45, url: '/home/furnishings' }, 'Kitchen Furnishings': { catalogPage: 108, url: '/kitchen/furnishings' } } ); Column types used: dimensions frozen <dimensions>; categories map <text, frozen <category>>
  18. 18. Retrieving fields
  19. 19. Counters pt. Deux • Since 0.8 • Commit log replay would change counters • Repair could change counters • Performance was inconsistent. Lots of GC
  20. 20. The good • Stable under load • No commit log replay issues • No repair weirdness
  21. 21. The bad • Still can’t delete/reset counters • Still needs to do a read before write.
  22. 22. Usage Wait for it… It’s the same! Carry on…
  23. 23. Static Fields • New as of 2.0.6 • VERY specific, but useful • Thrift people will like this CREATE TABLE t ( k text, s text STATIC, i int, PRIMARY KEY (k, i) );
  24. 24. Why? Without a static column: CREATE TABLE weather ( id int, time timestamp, weatherstation_name text, temperature float, PRIMARY KEY (id, time) ); Partition key (storage row key) ID = 1 repeats the station name in every row. Partition row 1: 2014-09-08 12:00:00 name SFO, temp 63.4. Partition row 2: 2014-09-08 12:01:00 name SFO, temp 63.9. Partition row 3: 2014-09-08 12:02:00 name SFO, temp 64.0. With a static column: CREATE TABLE weather ( id int, time timestamp, weatherstation_name text static, temperature float, PRIMARY KEY (id, time) ); Partition key (storage row key) ID = 1 stores name SFO once for the whole partition. Partition row 1: 2014-09-08 12:00:00 temp 63.4. Partition row 2: 2014-09-08 12:01:00 temp 63.9. Partition row 3: 2014-09-08 12:02:00 temp 64.0.
  25. 25. Usage • Put a static at the end of the declaration • Can’t be a part of primary key CREATE TABLE video_event ( videoid uuid, userid uuid, preview_image_location text static, event varchar, event_timestamp timeuuid, video_timestamp bigint, PRIMARY KEY ((videoid,userid),event_timestamp,event) ) WITH CLUSTERING ORDER BY (event_timestamp DESC,event ASC);
  26. 26. Tuples CREATE TABLE tuple_table ( id int PRIMARY KEY, three_tuple frozen <tuple<int, text, float>>, four_tuple frozen <tuple<int, text, float, inet>>, five_tuple frozen <tuple<int, text, float, inet, ascii>> ); • A type that represents a group • Up to 256 different elements
  27. 27. Example Usage • Track a drone’s position • x, y, z in 3D Cartesian coordinates CREATE TABLE drone_position ( droneId int, time timestamp, position frozen <tuple<float, float, float>>, PRIMARY KEY (droneId, time) );
  28. 28. What about partition size? • A CQL partition is a logical projection of a storage row • Storage rows can have up to 2 billion cells • Each cell can hold up to 2 GB of data
  29. 29. How much is too much? • How many cells before performance degrades? • How many bytes per partition before it’s unmanageable? • What is “practical”?
  30. 30. Old answer • 2011: Pre-Cassandra 1.2 (actually tested on 0.8) • Aaron Morton, Cassandra MVP and Founder of The Last Pickle
  31. 31. Conclusion • Keep partition (storage row) length < 10k cells • Total size in bytes below 64 MB (multi-pass compaction) • Multiple hits to the 64 KB page size will start to hurt • TL;DR: it’s a performance tunable
  32. 32. The tests revisited • Attempted to reproduce the same tests using CQL • Cassandra 2.1, 2.0 and 1.2 • Tested partition sizes: 100; 2,114; 5,000; 10,000; 100,000; 1,000,000; 10,000,000; 100,000,000; 1,000,000,000
  33. 33. Results [Chart: mSec plotted against cells per partition]
  34. 34. The new answer • 100s of thousands of cells is not a problem • 100s of MB per partition is best operationally • The issue to manage is operations
  35. 35. Thank You! Follow me on Twitter for more: @PatrickMcFadin
  36. 36. CASSANDRASUMMIT2014 September 10 - 11 | #CassandraSummit
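
Slides 10 and 11 contrast an in-application join with a single read of an embedded type. The Python sketch below imitates that difference with plain dictionaries standing in for tables. Everything in it is an assumption made for illustration: the dictionary names, the `read_before` and `read_after` helpers, and the metadata values are not from the talk, and no Cassandra driver is involved.

```python
from uuid import uuid4

# Hypothetical in-memory stand-ins for Cassandra tables, keyed by partition key.
VIDEO_ID = uuid4()
METADATA = {"height": 480, "width": 720, "encoding": "h264"}  # illustrative values

# "Before" (slide 10): two tables, so the application reads twice and joins.
videos_before = {VIDEO_ID: {"name": "Introduction to Apache Cassandra"}}
video_metadata = {VIDEO_ID: METADATA}

# "After" (slide 11): the metadata travels inside the videos row.
# A list stands in for the set<frozen<video_metadata>> column.
videos_after = {
    VIDEO_ID: {"name": "Introduction to Apache Cassandra", "metadata": [METADATA]}
}


def read_before(videoid):
    """Return (row, number_of_reads) using an in-application join."""
    row = dict(videos_before[videoid])           # read 1: videos
    row["metadata"] = [video_metadata[videoid]]  # read 2: video_metadata
    return row, 2


def read_after(videoid):
    """Return (row, number_of_reads) when the metadata is embedded in the row."""
    return dict(videos_after[videoid]), 1


if __name__ == "__main__":
    before_row, before_reads = read_before(VIDEO_ID)
    after_row, after_reads = read_after(VIDEO_ID)
    print(before_row == after_row, before_reads, after_reads)
```

Both helpers return the same row; the only difference is how many lookups the application has to make and stitch together.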
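
The weather example on slide 24 can be pictured with a toy model of one partition. This is only a sketch of the idea that a static column is stored once per partition and shared by every row. The `Partition` class and its methods are hypothetical names introduced here, and the model does not reflect Cassandra's actual storage format. The readings are the ones shown on the slide.

```python
class Partition:
    """Toy model of one partition of the weather table."""

    def __init__(self, static_columns=()):
        self.static_columns = set(static_columns)
        self.static = {}  # static column name -> value, stored once per partition
        self.rows = {}    # clustering key (time) -> regular column values

    def insert(self, time, **columns):
        for name, value in columns.items():
            if name in self.static_columns:
                self.static[name] = value  # overwrites the single shared value
            else:
                self.rows.setdefault(time, {})[name] = value

    def cell_count(self):
        return len(self.static) + sum(len(row) for row in self.rows.values())

    def select_all(self):
        # Every returned row carries the shared static values.
        return [{"time": t, **self.static, **self.rows[t]} for t in sorted(self.rows)]


READINGS = [
    ("2014-09-08 12:00:00", 63.4),
    ("2014-09-08 12:01:00", 63.9),
    ("2014-09-08 12:02:00", 64.0),
]


def load(partition):
    for time, temp in READINGS:
        partition.insert(time, weatherstation_name="SFO", temperature=temp)
    return partition


plain = load(Partition())
with_static = load(Partition(static_columns=["weatherstation_name"]))

if __name__ == "__main__":
    print(plain.cell_count(), with_static.cell_count())
    print(plain.select_all() == with_static.select_all())
```

In this model the query results are identical, but the static version keeps one copy of the station name instead of one per reading.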
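
The partition-size numbers on slides 28 and 31 lend themselves to a quick back-of-the-envelope check. The sketch below is a rough estimate under loud assumptions: it counts one cell per regular column per row, ignores all storage overhead, reads "64M" as 64 MiB, and uses an assumed 8 bytes per cell. The function names are hypothetical. The newer guidance on slide 34 is stated too loosely to encode as a constant, so it is left out.

```python
MAX_CELLS_PER_STORAGE_ROW = 2_000_000_000  # hard limit quoted on slide 28
GUIDELINE_2011_CELLS = 10_000              # slide 31: keep below 10k cells
GUIDELINE_2011_BYTES = 64 * 1024 * 1024    # slide 31: "64M", read here as 64 MiB


def estimate_partition(rows, regular_columns, avg_cell_bytes):
    """Rough estimate: one cell per regular column per row, overheads ignored."""
    cells = rows * regular_columns
    return cells, cells * avg_cell_bytes


def check_partition(rows, regular_columns, avg_cell_bytes):
    """Compare the estimate with the limits quoted in the talk."""
    cells, size_bytes = estimate_partition(rows, regular_columns, avg_cell_bytes)
    return {
        "cells": cells,
        "bytes": size_bytes,
        "within_hard_limit": cells <= MAX_CELLS_PER_STORAGE_ROW,
        "within_2011_guideline": cells < GUIDELINE_2011_CELLS
        and size_bytes < GUIDELINE_2011_BYTES,
    }


if __name__ == "__main__":
    # One temperature reading per minute per station, 8 bytes per cell (assumed).
    print(check_partition(rows=24 * 60, regular_columns=1, avg_cell_bytes=8))
    print(check_partition(rows=365 * 24 * 60, regular_columns=1, avg_cell_bytes=8))
```

A day of per-minute readings stays inside the 2011 guideline, while a full year in one partition lands in the hundreds of thousands of cells that slide 34 describes as no longer a problem.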
