Data Modeling for NoSQL

1,732 views
1,572 views

Published on

Video and slides synchronized, mp3 and slide download available at http://bit.ly/16I9vsI.

Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB. Filmed at qconsf.com.

Tony Tam is the CEO and technical co-founder of Wordnik, where he leads development efforts of the site's innovative word navigation system. He is a member of the MongoDB Masters group and has lead the Swagger API Framework open-source initiative. He was a founder and SVP of Engineering at Think Passenger, Lead Engineer at Composite Software and a software development lead at Galileo Labs.

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,732
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
0
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Data Modeling for NoSQL

  1. 1. Data Modeling for NoSQLTony Tam@fehguy
  2. 2. InfoQ.com: News & Community Site• 750,000 unique visitors/month• Published in 4 languages (English, Chinese, Japanese and BrazilianPortuguese)• Post content from our QCon conferences• News 15-20 / week• Articles 3-4 / week• Presentations (videos) 12-15 / week• Interviews 2-3 / week• Books 1 / monthWatch the video with slidesynchronization on InfoQ.com!http://www.infoq.com/presentations/data-modeling-mongodb
  3. 3. Presented at QCon San Franciscowww.qconsf.comPurpose of QCon- to empower software development by facilitating the spread ofknowledge and innovationStrategy- practitioner-driven conference designed for YOU: influencers ofchange and innovation in your teams- speakers and topics driving the evolution and innovation- connecting and catalyzing the influencers and innovatorsHighlights- attended by more than 12,000 delegates since 2007- held in 9 cities worldwide
  4. 4. Data Modeling?!SmartModelingmakes NoSQL
  5. 5. Why Modeling Matters!• NoSQL => no joins!• What replaces joins?!•  Hierarchy!•  Duplication of data!•  Different models for querying, indexing!• Your optimal data model is (probably) verydifferent than with relational!•  Simpler!•  More like you develop!
  6. 6. Stop Thinking Like This!!endless layersof abstraction (and misery)
  7. 7. Hierarchy before NoSQL!• Simple User Model!
  8. 8. Hierarchy before NoSQL!• Tuned Queries!•  Write some brittle SQL:!•  “select user.id, … inner join settings on …!•  Pick out the fields and construct object hierarchy(this gets nasty, fast)!•  (outer joins for optional values?)!• Object fetching!•  Queries follow object graph, PK/FK!•  5 queries to fetch object in this example!
  9. 9. Hierarchy before NoSQL!
  10. 10. Hierarchy with NoSQL!• JSON structure mapped to objects!•  Fetch json from MongoDB**!•  Unmarshall into objects/tuples!•  Use it!Using JSON4S
  11. 11. Hierarchy with NoSQL!Focus on yourSoftware, notDB layer!
  12. 12. Hierarchy with NoSQL!• Write operations!•  Atomic upsert (create, update or fail)!!•  Saves all levels of object atomically!•  Reduces need for transactions!
  13. 13. Hierarchy with NoSQL!• Write operations!•  Atomic upsert (create, update or fail)!!•  Saves all levels of object atomically!•  Reduces need for transactions!All orConveniencnot magic
  14. 14. Unique Identifiers in your Data!• Relational design => PK/FK!•  Often not “meaningful” identifiers for data!• User Data Model!
  15. 15. Unique Identifiers in your Data!• Relational design => PK/FK!•  Often not “meaningful” identifiers for data!• User Data Model!Unique byusername
  16. 16. Unique Identifiers in your Data!• Words! Ensured tobe constant
  17. 17. Data Duplication !• Without Joins, what about SQL lookuptables?!•  Duplication of data in NoSQL is required!• Trade storage for speed!
  18. 18. Data Duplication !• Without Joins, what about SQL lookuptables?!•  Duplication of data in NoSQL is required!• Trade storage for speed!…Can movelogic to app
  19. 19. Data Duplication!• Many fields don’t change, ever!• But… many do!•  New decisions for the developer!!•  Often background updates!
  20. 20. Data Duplication!• Many fields don’t change, ever!• But… many do!•  New decisions for the developer!!•  Often background updates!How oftendoes thischange?
  21. 21. Data Duplication!
  22. 22. Reaching into Objects!• Incredible feature of MongoDB!•  Dot syntax safely** traverses the object graph!
  23. 23. Inner Indexes!• Convenience at a cost!•  No index => table scan!•  No value? => table scan!•  No child value? => table scan!• Table scan with big collection?!• Can’t index everything!!96GB ofIndexes?
  24. 24. Inner Indexes!• This will should drive your Data Model!• Sparse Data test!Even with only2000 non-emptyvalues!
  25. 25. Adding & Modifying!• Append in mongo is blazing fast!•  “tail” of data is always in memory!•  Pre-allocated data files!• Main expense is “index maintenance”!•  Some marshalling/unmarshalling cost**!• Modifying? Object growth!•  Pre-allocation of space built in collection design!
  26. 26. Adding & Modifying!• Each object has allocated space!•  Exceed that space, need to relocate object!•  Leaves “hole” in collection!• Large increases to documents hurts youroverall performance!• Your data model should strive for equally-sized objects as much as possible!
  27. 27. Retrieval!• Many same rules apply as relational!• Indexes !•  complex/inner or not!•  Indexes in RAM? Yes!•  Cardinality matters!• New(ish) considerations!•  Complex hierarchy not free!•  Marshalling ó unmarshalling!
  28. 28. Marshalling & Unmarshalling!Objectcomplexity
  29. 29. Marshalling & Unmarshalling!• All you can eat from your Data Model?!• Techniques have tremendous impact!•  Development ease until it matters!•  50% speed bump with manual mapping!Only demandwhat you canconsume!
  30. 30. Making the most of _id!• Indexes matter!• Tailor your _id to be meaningful by accesspattern!•  It’s your first defense when auto-sharding!• Date-driven data?!•  Monotonically _id value!•  Ensures recent data is “hot”!
  31. 31. Making the most of _id!• Other time-based data techniques!• Flexibility in querying!
  32. 32. Making the most of _id!• Other time-based data techniques!• Flexibility in querying!Case-sensitiveREGEX is
  33. 33. Making the most of _id!• Hot indexes are happy indexes!•  Access should strive for right bias!• Random access with large indexes hit disk17 15
  34. 34. Your Data Model!• NoSQL gets you started faster!• Many relational pain points are gone!• New considerations (easier?)!• Migration should be real effort!• Designed by access patterns over objectstructure!• Don’t prematurely optimize, but knowwhere the knobs are!
  35. 35. More Reading!• http://tech.wordnik.com!• http://github.com/wordnik/wordnik-oss!• http://developer.wordnik.com!• http://slideshare.net/fehguy!

×