The data model is dead, long live the data model

Published in: Technology

Transcript

  • 1. The data model is dead, long live the data model!!
      Patrick McFadin, Senior Solutions Architect, DataStax
      Thursday, May 2, 2013
  • 3. Bridging the divide
      • The era of relational everything is over
      • The era of Polyglot Persistence* has begun
      * http://www.martinfowler.com/bliki/PolyglotPersistence.html
  • 4. Coming from a relational world
      Tradeoffs are hard
      Feature comparison, RDBMS vs. Cassandra:
      • Single Point of Failure
      • Cross Datacenter
      • Linear Scaling
      • Data modeling
  • 5. Background - The data model
      Wait? I thought NoSQL meant no model?
      • The data model is alive and well
      • Models define the business requirements
      • Models define the structure of your data
      • Relational is just one type (Network model anyone?)
  • 8. Background - ACID vs CAP
      ACID
      • Atomic - All or none
      • Consistency - Only valid data is written
      • Isolation - One operation at a time
      • Durability - Once committed, it stays that way
      CAP - Pick two
      • Consistency - All data on cluster
      • Availability - Cluster always accepts writes
      • Partition tolerance - Nodes in cluster can’t talk to each other
      Cassandra lets you tune this
  • 9. Relational Background - Normal forms
      • This IS the relational model
      • 5 normal forms
      • Need foreign keys
      • Need joins
      Employees
        id | First   | Last
        1  | Edgar   | Codd
        2  | Raymond | Boyce
      Department
        id | Dept
        1  | Engineering
        2  | Math
  • 10. Background - How Cassandra Stores Data
      • Model brought from Bigtable*
      • Row Key and a lot of columns
      • Column names sorted (UTF8, Int, Timestamp, etc)
      Row Key, followed by columns 1 ... 2 Billion, each holding:
        Column Name
        Column Value
        Timestamp
        TTL
      * http://research.google.com/archive/bigtable.html
  • 11. Background - How Cassandra Stores Data
      • Rows belong to a node and are replicated
      • Row lookups are fast
      • Randomly distributed in cluster
      Diagram: RowKey1 through RowKey12 spread across the cluster; Lookup 5 goes to RowKey5
  • 12. Relational Concept - Sequences
      • Handy feature for auto-creation of Ids
      • Guaranteed unique
      • Depends on a single source of truth (one server)
        INSERT INTO user (id, firstName, LastName)
        VALUES (seq.nextVal(), 'Ted', 'Codd')
  • 13. Cassandra Concept - No sequences
      • Difficult in a distributed system
      • Requires a lock (perf killer)
      • What to do?
          - Use part of the data to create a unique index, or...
          - UUID to the rescue!
  • 14. Concept - UUID
      • Universally Unique Identifier
      • 128 bit number represented in character form
      • Easily generated on the client
      • Same as GUID for the MS folks
      99051fe9-6a9c-46c2-b949-38ef78858dd0
      RFC 4122 if you want a reference
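As a rough illustration of "easily generated on the client", the sketch below uses Python's standard `uuid` module. The helper name `new_video_id` is hypothetical, and the choice of a random (version 4) UUID is an assumption based on the example value on the slide, which parses as version 4.

```python
import uuid


def new_video_id() -> uuid.UUID:
    """Return a random (version 4) UUID without contacting any server."""
    return uuid.uuid4()


# The example value shown on the slide, parsed into its 128-bit form.
slide_example = uuid.UUID("99051fe9-6a9c-46c2-b949-38ef78858dd0")

video_id = new_video_id()
print(video_id)               # canonical text form, different on every run
print(slide_example.version)  # version field encoded in the slide's example
```

No lock or central sequence is involved, which is why this approach suits a distributed store.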
  • 15. Cassandra Concept - Entity model
      • User table (!!)
      • Username is the unique key
      • Static but can be changed dynamically without downtime
        CREATE TABLE users (
          username varchar,
          firstname varchar,
          lastname varchar,
          email varchar,
          password varchar,
          created_date timestamp,
          PRIMARY KEY (username)
        );
        ALTER TABLE users ADD city text;
  • 16. Relational Concept - De-normalization
      • To combine relations into a single row
      • Used in relational modeling to avoid complex joins
      Employees
        id | First   | Last
        1  | Edgar   | Codd
        2  | Raymond | Boyce
      Department
        id | Dept
        1  | Engineering
        2  | Math
        SELECT e.First, e.Last, d.Dept
        FROM Department d, Employees e
        WHERE 1 = e.id
        AND e.id = d.id
      Take this and then...
  • 17. Relational Concept - De-normalization
      • Combine table columns into a single view
      • No joins
      • All in how you set the data for fast reads
      Employees
        id | First   | Last  | Dept
        1  | Edgar   | Codd  | Engineering
        2  | Raymond | Boyce | Math
        SELECT First, Last, Dept
        FROM employees
        WHERE id = '1'
  • 18. Cassandra Concept - One-to-Many
      • Relationship without being relational
      • Users have many videos
      • Wait? Where is the foreign key?
      Users
        username | firstname | lastname | email
        tcodd    | Edgar     | Codd     | tcodd@relational.com
        rboyce   | Raymond   | Boyce    | rboyce@relational.com
      Videos
        videoid  | videoname    | username | description            | tags
        99051fe9 | My funny cat | tcodd    | My cat plays the piano | cats,piano,lol
        b3a76c6b | Math         | tcodd    | Now my dog plays       | dogs,piano,lol
  • 19. Cassandra Concept - One-to-many
      • Static table to store videos
      • UUID for unique video id
      • Add username to denormalize
        CREATE TABLE videos (
          videoid uuid,
          videoname varchar,
          username varchar,
          description varchar,
          tags varchar,
          upload_date timestamp,
          PRIMARY KEY (videoid)
        );
  • 20. Cassandra Concept - One-to-Many
      • Lookup video by username
      • Write in two tables at once for fast lookups
        CREATE TABLE username_video_index (
          username varchar,
          videoid uuid,
          upload_date timestamp,
          video_name varchar,
          PRIMARY KEY (username, videoid)
        );
        SELECT video_name
        FROM username_video_index
        WHERE username = 'tcodd'
        AND videoid = '99051fe9'
      Creates a wide row!
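The "write in two tables at once" idea can be sketched with plain Python dictionaries standing in for the two tables. This is only an in-memory illustration, not a Cassandra client: the function names `add_video` and `video_names_for` are hypothetical, and a real cluster would order the entries within a wide row by `videoid` rather than by insertion.

```python
import uuid
from datetime import datetime, timezone

# In-memory stand-ins for the two tables (assumption: not a real driver).
videos = {}                # videoid -> full video row
username_video_index = {}  # username -> {videoid -> index row}, one wide row per user


def add_video(username, videoname, description):
    """Store one upload in both tables so each lookup is a single read."""
    videoid = uuid.uuid4()
    upload_date = datetime.now(timezone.utc)
    videos[videoid] = {
        "videoname": videoname,
        "username": username,
        "description": description,
        "upload_date": upload_date,
    }
    # Denormalized copy keyed by username: the wide row grows with each video.
    username_video_index.setdefault(username, {})[videoid] = {
        "upload_date": upload_date,
        "video_name": videoname,
    }
    return videoid


def video_names_for(username):
    """Read one user's wide row; no join against the videos table is needed."""
    return [row["video_name"] for row in username_video_index.get(username, {}).values()]


cat_id = add_video("tcodd", "My funny cat", "My cat plays the piano")
add_video("tcodd", "Math", "Now my dog plays")
print(sorted(video_names_for("tcodd")))
```

The cost is a second write per upload; the benefit is that listing a user's videos never has to scan or join.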
  • 21. Cassandra concept - Many-to-many
      • Users and videos have many comments
      Users
        username | firstname | lastname | email
        tcodd    | Edgar     | Codd     | tcodd@relational.com
        rboyce   | Raymond   | Boyce    | rboyce@relational.com
      Videos
        videoid  | videoname    | username | description            | tags
        99051fe9 | My funny cat | tcodd    | My cat plays the piano | cats,piano,lol
        b3a76c6b | Math         | tcodd    | Now my dog plays       | dogs,piano,lol
      Comments
        username | videoid  | comment
        tcodd    | 99051fe9 | Sweet!
        rboyce   | b3a76c6b | Boring :(
  • 23. Cassandra concept - Many-to-many
      • Model both sides of the view
      • Insert both when comment is created
      • View from either side
        CREATE TABLE comments_by_video (
          videoid uuid,
          username varchar,
          comment_ts timestamp,
          comment varchar,
          PRIMARY KEY (videoid, username)
        );
        CREATE TABLE comments_by_user (
          username varchar,
          videoid uuid,
          comment_ts timestamp,
          comment varchar,
          PRIMARY KEY (username, videoid)
        );
      Don’t be afraid of writes. Bring it!
  • 24. Relational Concept - Transactions
      • Built in and easy to use
      • Can be slow and heavy so don’t use them all the time
      • Normal forms force ACID writes into many tables
      lock
        - change table one
        - change table two
        - change table three
      commit
      - or -
      lock
        - change table one
        - change table two
        - change table three
      rollback
  • 25. Crazy Concept - Do you need a transaction?
      • Since they were easy in RDBMS, was it just default?
      • Read this article: http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
      • In a nutshell, an asynchronous transaction:
          - Cashier takes your money
          - Barista makes your coffee
          - Error? Barista deals with it
  • 26. Cassandra Concept - Transaction quality
      • Requires a lock, which is costly in distributed systems
      • Cassandra features can be used to advantage
          - Row level isolation
          - Atomic batches
  • 27. Cassandra Concept - Transaction
      • Track that something happened
      • Use time stamps to preserve order
      • Rectify when any doubt (just like banks do)
      Create this table:
        CREATE TABLE credit_transaction (
          username varchar,
          type varchar,
          datetime timestamp,
          credits int,
          PRIMARY KEY (username, datetime, type)
        ) WITH CLUSTERING ORDER BY (datetime DESC, type ASC);
      Sort the columns in reverse order so last action is first on the list
  • 28. Cassandra Concept - Transaction
      • All transactions are stored
      • Think RPN calculator, latest first
      tcodd
        ADD:2013-04-25 21:10:32.745    | $20
        REMOVE:2013-04-25 15:45:22.813 | $5
        ADD:2013-04-25 07:15:12.542    | $100
      Rectify account:
        + $100
        - $5
        + $20
        ---------
        = $115 Current balance
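The rectify step amounts to replaying every stored transaction in time order. A minimal sketch, assuming the rows are held as `(type, datetime, credits)` tuples that mirror the `credit_transaction` columns; the function name `rectify` is hypothetical and the values are the ones from the slide.

```python
from datetime import datetime

# Rows for user tcodd, latest first, as the clustering order would return them.
transactions = [
    ("ADD", datetime(2013, 4, 25, 21, 10, 32, 745000), 20),
    ("REMOVE", datetime(2013, 4, 25, 15, 45, 22, 813000), 5),
    ("ADD", datetime(2013, 4, 25, 7, 15, 12, 542000), 100),
]


def rectify(rows):
    """Replay all transactions oldest-first and return the resulting balance."""
    balance = 0
    for kind, _timestamp, credits in sorted(rows, key=lambda row: row[1]):
        if kind == "ADD":
            balance += credits
        elif kind == "REMOVE":
            balance -= credits
        else:
            raise ValueError(f"unknown transaction type: {kind}")
    return balance


print(rectify(transactions))  # 100 - 5 + 20 = 115
```

Because nothing is ever overwritten, the balance can always be recomputed from the log when a cached total is in doubt.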
  • 29. Cassandra Concept - Transaction
      Add credit
        1. Create credit_transaction record with ADD + Timestamp
        2. Read user record total_credits and credit_timestamp
        3. user credit_timestamp < credit_transaction timestamp?
        4. Set back in user record credit_timestamp and incremented total_credits
      Remove credit
        1. Create credit_transaction record with REMOVE + Timestamp
        2. Read user record total_credits and credit_timestamp
        3. user credit_timestamp < credit_transaction timestamp?
        4. Set back in user record credit_timestamp and decremented total_credits
      Outcomes: Success, or Fail transaction and rectify
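One way to read the flow above is sketched below with in-memory objects. It assumes that the timestamp comparison succeeding leads to the update and "Success", and that any other case leads to "Fail transaction and rectify"; the transcript does not label the branches, so that reading is an assumption. `UserRecord`, `Account`, `apply_credit` and `rectify` are hypothetical names, and timestamps are plain numbers for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    total_credits: int = 0
    credit_timestamp: float = 0.0


@dataclass
class Account:
    user: UserRecord = field(default_factory=UserRecord)
    log: list = field(default_factory=list)  # stand-in for credit_transaction rows


def rectify(account):
    """Rebuild the cached total from the full transaction log."""
    ordered = sorted(account.log, key=lambda row: row[1])
    account.user.total_credits = sum(c if k == "ADD" else -c for k, _ts, c in ordered)
    if ordered:
        account.user.credit_timestamp = ordered[-1][1]


def apply_credit(account, kind, credits, timestamp):
    """Return True on the success path, False if the total had to be rectified."""
    if kind not in ("ADD", "REMOVE"):
        raise ValueError(f"unknown transaction type: {kind}")
    account.log.append((kind, timestamp, credits))  # always record the event first
    user = account.user                             # read the cached user record
    if user.credit_timestamp < timestamp:           # cached total is older than this event
        user.total_credits += credits if kind == "ADD" else -credits
        user.credit_timestamp = timestamp
        return True
    rectify(account)                                # out-of-order event: recompute from the log
    return False


acct = Account()
first = apply_credit(acct, "ADD", 100, 1.0)
second = apply_credit(acct, "ADD", 20, 3.0)
late = apply_credit(acct, "REMOVE", 5, 2.0)  # arrives after a newer event was applied
print(first, second, late, acct.user.total_credits)
```

The late REMOVE is still recorded, so the recomputed total reflects it even though the fast path was skipped.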
  • 30. And if that doesn’t work...
      • Lightweight transactions coming soon.
      • Cassandra 2.0
      • See CASSANDRA-5062
  • 31. But wait, there is more!!
      • The next in this series: May 16th
          Become a super modeler
      • Final will be at the Cassandra Summit: June 11th
          The world’s next top data model
  • 32. Be there!!!
      Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don’t miss it.
  • 33. Thank You
      Q&A