The data model is dead, long live the data model

Transcript

  • 1. The data model is dead, long live the data model!!
      Patrick McFadin, Senior Solutions Architect, DataStax
      Thursday, May 2, 13
  • 3. Bridging the divide
      • The era of relational everything is over
      • The era of Polyglot Persistence* has begun
      * http://www.martinfowler.com/bliki/PolyglotPersistence.html
  • 4. Coming from a relational world
      Tradeoffs are hard
      [Table: Feature | RDBMS | Cassandra, with rows Single Point of Failure, Cross Datacenter, Linear Scaling, Data modeling]
  • 5. Background - The data model
      • The data model is alive and well
      • Models define the business requirements
      • Models define the structure of your data
      • Relational is just one type (Network model, anyone?)
      Wait, I thought NoSQL meant no model?
  • 8. Background - ACID vs CAP
      ACID
      • Atomic - All or none
      • Consistency - Only valid data is written
      • Isolation - One operation at a time
      • Durability - Once committed, it stays that way
      CAP - Pick two
      • Consistency - All nodes in the cluster see the same data
      • Availability - Cluster always accepts writes
      • Partition tolerance - Cluster keeps working when nodes can't talk to each other
      Cassandra lets you tune this
  • 9. Relational Background - Normal forms
      • This IS the relational model
      • 5 normal forms
      • Need foreign keys
      • Need joins
      Employees
          id  First    Last
          1   Edgar    Codd
          2   Raymond  Boyce
      Department
          id  Dept
          1   Engineering
          2   Math
  • 10. Background - How Cassandra Stores Data
      • Model brought from Bigtable*
      • Row key and a lot of columns
      • Column names sorted (UTF8, Int, Timestamp, etc.)
      [Diagram: one row key followed by columns 1 ... 2 billion; each column holds a column name, column value, timestamp, and TTL]
      * http://research.google.com/archive/bigtable.html
  • 11. Background - How Cassandra Stores Data
      • Rows belong to a node and are replicated
      • Row lookups are fast
      • Rows are randomly distributed in the cluster
      [Diagram: RowKey1 through RowKey12 placed around the cluster; a lookup for 5 returns RowKey5]
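A minimal sketch of why row lookups are fast: the row key is hashed, and the hash decides which node owns the row. This is a simplification, not Cassandra's actual partitioner or token ring. The node names, the MD5 hash, the modulo placement, and the `node_for` and `replicas_for` helpers are all assumptions made for illustration.

```python
import hashlib

# Hypothetical node names; a real cluster assigns token ranges to nodes.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def node_for(row_key, nodes=NODES):
    """Pick the owning node by hashing the row key (hash modulo node count)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big")
    return nodes[token % len(nodes)]


def replicas_for(row_key, nodes=NODES, replication_factor=3):
    """Owner first, then the next nodes in list order as extra replicas."""
    start = nodes.index(node_for(row_key, nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


# Place the twelve row keys from the diagram; the same key always maps
# to the same node, so a lookup never has to search the whole cluster.
placement = {f"RowKey{i}": node_for(f"RowKey{i}") for i in range(1, 13)}
```

Because placement depends only on the key, a client that knows the key can go straight to a replica.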
  • 12. Relational Concept - Sequences
      • Handy feature for auto-creation of IDs
      • Guaranteed unique
      • Depends on a single source of truth (one server)
          INSERT INTO user (id, firstName, LastName)
          VALUES (seq.nextVal(), 'Ted', 'Codd')
  • 13. Cassandra Concept - No sequences
      • Difficult in a distributed system
      • Requires a lock (perf killer)
      • What to do?
          - Use part of the data to create a unique index, or...
          - UUID to the rescue!
  • 14. Concept - UUID
      • Universally Unique Identifier
      • 128-bit number represented in character form
      • Easily generated on the client
      • Same as GUID for the MS folks
          99051fe9-6a9c-46c2-b949-38ef78858dd0
      RFC 4122 if you want a reference
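A short sketch of client-side UUID generation using Python's standard `uuid` module. The variable names are illustrative, and the choice of a random (version 4) UUID is an assumption; the slide does not say which version to use.

```python
import uuid

# Generate a random (version 4) UUID on the client; no server round trip
# and no cluster-wide lock is needed.
video_id = uuid.uuid4()

# The usual character form: 32 hex digits in five hyphen-separated groups.
video_id_text = str(video_id)

# Parse the example value from the slide back into a 128-bit number.
parsed = uuid.UUID("99051fe9-6a9c-46c2-b949-38ef78858dd0")
size_in_bits = len(parsed.bytes) * 8
```

The parsed example reports `version == 4`, so it looks like a randomly generated UUID.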
  • 15. Cassandra Concept - Entity model
      • User table (!!)
      • Username is the unique key
      • Static but can be changed dynamically without downtime
          CREATE TABLE users (
              username varchar,
              firstname varchar,
              lastname varchar,
              email varchar,
              password varchar,
              created_date timestamp,
              PRIMARY KEY (username)
          );
          ALTER TABLE users ADD city text;
  • 16. Relational Concept - De-normalization
      • To combine relations into a single row
      • Used in relational modeling to avoid complex joins
      Employees
          id  First    Last
          1   Edgar    Codd
          2   Raymond  Boyce
      Department
          id  Dept
          1   Engineering
          2   Math
          SELECT e.First, e.Last, d.Dept
          FROM Department d, Employees e
          WHERE 1 = e.id
          AND e.id = d.id
      Take this and then...
  • 17. Relational Concept - De-normalization
      • Combine table columns into a single view
      • No joins
      • It is all in how you lay out the data for fast reads
          SELECT First, Last, Dept
          FROM employees
          WHERE id = 1
      Employees
          id  First    Last   Dept
          1   Edgar    Codd   Engineering
          2   Raymond  Boyce  Math
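The two de-normalization slides can be tried out with an in-memory SQLite database. This is only a sketch: SQLite stands in for a generic RDBMS, and the `EmployeesDenorm` table name is an assumption used to keep the combined table apart from the original one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employees (id INTEGER PRIMARY KEY, First TEXT, Last TEXT);
    CREATE TABLE Department (id INTEGER PRIMARY KEY, Dept TEXT);
    INSERT INTO Employees VALUES (1, 'Edgar', 'Codd'), (2, 'Raymond', 'Boyce');
    INSERT INTO Department VALUES (1, 'Engineering'), (2, 'Math');

    -- Pay the join cost once, at write time, by storing the combined row.
    CREATE TABLE EmployeesDenorm AS
        SELECT e.id, e.First, e.Last, d.Dept
        FROM Employees e JOIN Department d ON e.id = d.id;
    """
)

# Normalized read: needs a join across both tables.
joined = conn.execute(
    "SELECT e.First, e.Last, d.Dept "
    "FROM Department d, Employees e "
    "WHERE e.id = 1 AND e.id = d.id"
).fetchone()

# De-normalized read: a single-table lookup, no join.
flat = conn.execute(
    "SELECT First, Last, Dept FROM EmployeesDenorm WHERE id = 1"
).fetchone()
```

Both queries return the same row; the second one simply moves the work from read time to write time.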
  • 18. Cassandra Concept - One-to-Many
      • Relationship without being relational
      • Users have many videos
      • Wait? Where is the foreign key?
      Users
          username  firstname  lastname  email
          tcodd     Edgar      Codd      tcodd@relational.com
          rboyce    Raymond    Boyce     rboyce@relational.com
      Videos
          videoid   videoname     username  description             tags
          99051fe9  My funny cat  tcodd     My cat plays the piano  cats,piano,lol
          b3a76c6b  Math          tcodd     Now my dog plays        dogs,piano,lol
  • 19. Cassandra Concept - One-to-many
      • Static table to store videos
      • UUID for unique video id
      • Add username to denormalize
          CREATE TABLE videos (
              videoid uuid,
              videoname varchar,
              username varchar,
              description varchar,
              tags varchar,
              upload_date timestamp,
              PRIMARY KEY (videoid)
          );
  • 20. Cassandra Concept - One-to-Many
      • Lookup video by username
      • Write in two tables at once for fast lookups
          CREATE TABLE username_video_index (
              username varchar,
              videoid uuid,
              upload_date timestamp,
              video_name varchar,
              PRIMARY KEY (username, videoid)
          );
      Creates a wide row!
          SELECT video_name
          FROM username_video_index
          WHERE username = 'tcodd'
          AND videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
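One way to picture the wide row is as a nested map: the first part of the primary key (username) selects the row, and the second part (videoid) names an entry inside it. The sketch below models that with plain dictionaries. It is an illustration of the layout, not of Cassandra's storage engine, and the helper names `index_video` and `videos_for` are hypothetical.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone

# username -> videoid -> remaining columns; one entry per user is one wide row.
username_video_index = defaultdict(dict)


def index_video(username, videoid, video_name, upload_date):
    """Add one video to the user's wide row (same key overwrites)."""
    username_video_index[username][videoid] = {
        "video_name": video_name,
        "upload_date": upload_date,
    }


def videos_for(username):
    """Return (videoid, video_name) pairs for one user, ordered by videoid."""
    row = username_video_index.get(username, {})
    return [
        (videoid, columns["video_name"])
        for videoid, columns in sorted(row.items(), key=lambda item: item[0])
    ]


cat_video = uuid.UUID("99051fe9-6a9c-46c2-b949-38ef78858dd0")
dog_video = uuid.uuid4()  # generated here; the slide shows only a short id
now = datetime.now(timezone.utc)
index_video("tcodd", cat_video, "My funny cat", now)
index_video("tcodd", dog_video, "Math", now)
```

Every video a user uploads lands in that user's row, so listing one user's videos is a single row read.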
  • 21. Cassandra concept - Many-to-many
      • Users and videos have many comments
      Users
          username  firstname  lastname  email
          tcodd     Edgar      Codd      tcodd@relational.com
          rboyce    Raymond    Boyce     rboyce@relational.com
      Videos
          videoid   videoname     username  description             tags
          99051fe9  My funny cat  tcodd     My cat plays the piano  cats,piano,lol
          b3a76c6b  Math          tcodd     Now my dog plays        dogs,piano,lol
      Comments
          username  videoid   comment
          tcodd     99051fe9  Sweet!
          rboyce    b3a76c6b  Boring :(
  • 23. Cassandra concept - Many-to-many
      • Model both sides of the view
      • Insert both when comment is created
      • View from either side
          CREATE TABLE comments_by_video (
              videoid uuid,
              username varchar,
              comment_ts timestamp,
              comment varchar,
              PRIMARY KEY (videoid, username)
          );
          CREATE TABLE comments_by_user (
              username varchar,
              videoid uuid,
              comment_ts timestamp,
              comment varchar,
              PRIMARY KEY (username, videoid)
          );
      Don't be afraid of writes. Bring it!
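"Insert both when comment is created" can be sketched as a single function that writes the same comment into both views. Dictionaries stand in for the two tables here, the short video ids are the ones shown on the slides, and `add_comment` is a hypothetical helper rather than a driver API.

```python
from collections import defaultdict
from datetime import datetime, timezone

comments_by_video = defaultdict(dict)  # videoid -> username -> columns
comments_by_user = defaultdict(dict)   # username -> videoid -> columns


def add_comment(username, videoid, comment, comment_ts=None):
    """Write one comment to both views so either side can be read directly."""
    ts = comment_ts or datetime.now(timezone.utc)
    record = {"comment_ts": ts, "comment": comment}
    comments_by_video[videoid][username] = dict(record)
    comments_by_user[username][videoid] = dict(record)


add_comment("tcodd", "99051fe9", "Sweet!")
add_comment("rboyce", "b3a76c6b", "Boring :(")

# Read from either side without a join.
on_cat_video = comments_by_video["99051fe9"]
by_rboyce = comments_by_user["rboyce"]
```

Note that with these primary keys a user has one comment per video: writing again with the same (videoid, username) pair replaces the earlier comment, which mirrors how a CQL insert on an existing primary key overwrites the row.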
  • 24. Relational Concept - Transactions
      • Built in and easy to use
      • Can be slow and heavy, so don't use them all the time
      • Normal forms force ACID writes into many tables
          lock
            - change table one
            - change table two
            - change table three
          commit
        or
          lock
            - change table one
            - change table two
            - change table three
          rollback
  • 25. Crazy Concept - Do you need a transaction?
      • Since they were easy in RDBMS, were they just the default?
      • Read this article: http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
      • In a nutshell, an asynchronous transaction:
          - Cashier takes your money
          - Barista makes your coffee
          - Error? Barista deals with it
  • 26. Cassandra Concept - Transaction quality
      • Transactions require a lock, which is costly in distributed systems
      • Cassandra features can be used to advantage
          - Row level isolation
          - Atomic batches
  • 27. Cassandra Concept - Transaction
      • Track that something happened
      • Use timestamps to preserve order
      • Rectify when in any doubt (just like banks do)
      Create this table:
          CREATE TABLE credit_transaction (
              username varchar,
              type varchar,
              datetime timestamp,
              credits int,
              PRIMARY KEY (username, datetime, type)
          ) WITH CLUSTERING ORDER BY (datetime DESC, type ASC);
      Sort the columns in reverse order so the last action is first on the list
  • 28. Cassandra Concept - Transaction
      • All transactions are stored
      • Think RPN calculator, latest first
      Row for tcodd:
          ADD:2013-04-25 21:10:32.745     $20
          REMOVE:2013-04-25 15:45:22.813  $5
          ADD:2013-04-25 07:15:12.542     $100
      Rectify account:
            + $100
            - $5
            + $20
          ---------
          = $115 Current balance
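The rectify step amounts to replaying the stored transactions in time order. A minimal sketch using the three entries from the slide follows; the tuple layout and the `rectify` function name are assumptions made for illustration.

```python
from datetime import datetime

# (type, timestamp, credits), stored latest first as on the slide.
ledger = [
    ("ADD", datetime(2013, 4, 25, 21, 10, 32, 745000), 20),
    ("REMOVE", datetime(2013, 4, 25, 15, 45, 22, 813000), 5),
    ("ADD", datetime(2013, 4, 25, 7, 15, 12, 542000), 100),
]


def rectify(entries):
    """Replay every stored transaction, oldest first, to rebuild the balance."""
    balance = 0
    for kind, _timestamp, credits in sorted(entries, key=lambda e: e[1]):
        if kind == "ADD":
            balance += credits
        elif kind == "REMOVE":
            balance -= credits
        else:
            raise ValueError(f"unknown transaction type: {kind}")
    return balance


current_balance = rectify(ledger)  # 100 - 5 + 20 = 115
```

Since every transaction is kept, the balance can always be recomputed from scratch when a stored total is in doubt.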
  • 29. Cassandra Concept - Transaction
      Add credit
          1. Create credit_transaction record with ADD + timestamp
          2. Read user record total_credits and credit_timestamp
          3. Is user credit_timestamp < credit_transaction timestamp?
              - Yes: set back in user record credit_timestamp and incremented total_credits (success)
              - No: fail transaction and rectify
      Remove credit
          1. Create credit_transaction record with REMOVE + timestamp
          2. Read user record total_credits and credit_timestamp
          3. Is user credit_timestamp < credit_transaction timestamp?
              - Yes: set back in user record credit_timestamp and decremented total_credits (success)
              - No: fail transaction and rectify
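The flow above can be sketched in a few lines. This is one reading of the flowchart, not the author's implementation: the `User` class, the `apply_credit` and `rectify_user` names, and the choice to rebuild the total from the stored transactions on failure are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class User:
    total_credits: int = 0
    credit_timestamp: datetime = datetime.min
    transactions: list = field(default_factory=list)  # (type, ts, credits)


def rectify_user(user):
    """Rebuild total_credits by replaying every stored transaction."""
    ordered = sorted(user.transactions, key=lambda t: t[1])
    user.total_credits = sum(c if k == "ADD" else -c for k, _, c in ordered)
    if ordered:
        user.credit_timestamp = ordered[-1][1]


def apply_credit(user, kind, credits, ts):
    """Return True on the success path, False if the write had to be rectified."""
    # Step 1: always record that something happened.
    user.transactions.append((kind, ts, credits))
    # Steps 2 and 3: read the user record and compare timestamps.
    if user.credit_timestamp < ts:
        user.total_credits += credits if kind == "ADD" else -credits
        user.credit_timestamp = ts
        return True
    # A newer change already landed: fail this update and rectify.
    rectify_user(user)
    return False


tcodd = User()
first_ok = apply_credit(tcodd, "ADD", 100, datetime(2013, 4, 25, 7, 15, 12))
second_ok = apply_credit(tcodd, "REMOVE", 5, datetime(2013, 4, 25, 15, 45, 22))
# An ADD that arrives late, carrying an older timestamp, takes the fail path.
late_ok = apply_credit(tcodd, "ADD", 20, datetime(2013, 4, 25, 6, 0, 0))
```

The late write is never lost, because the transaction record is created before the timestamp check.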
  • 30. And if that doesn't work...
      • Lightweight transactions coming soon
      • Cassandra 2.0
      • See CASSANDRA-5062
  • 31. But wait, there is more!!
      • The next in this series: May 16th
          Become a super modeler
      • Final will be at the Cassandra Summit: June 11th
          The world's next top data model
  • 32. Be there!!!
      Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don't miss it.
  • 33. Thank You
      Q&A