Data Modeling for NoSQL

29,344 views

Published on

Slides from QConSF 2012 about data modeling with NoSQL, specifically mongodb

Published in: Technology
3 Comments
44 Likes
Statistics
Notes
No Downloads
Views
Total views
29,344
On SlideShare
0
From Embeds
0
Number of Embeds
2,548
Actions
Shares
0
Downloads
540
Comments
3
Likes
44
Embeds 0
No embeds

No notes for slide
  • Go ½ way, won’t get the promised results. You’ll look silly
  • Order history won’t change, but customer will
  • Order history won’t change, but customer will
  • 150mb index is not efficient
  • Data Modeling for NoSQL

    1. 1. Data Modeling for NoSQL Tony Tam @fehguy
    2. 2. Data Modeling? Smart Modeling makesNoSQL work
    3. 3. Why Modeling Matters• NoSQL => no joins• What replaces joins? • Hierarchy • Duplication of data • Different models for querying, indexing• Your optimal data model is (probably) very different than with relational • Simpler • More like you develop
    4. 4. Stop Thinking Like This! endless layers of abstraction(and misery)
    5. 5. Hierarchy before NoSQL• Simple User Model
    6. 6. Hierarchy before NoSQL• Tuned Queries • Write some brittle SQL: • “select user.id, … inner join settings on … • Pick out the fields and construct object hierarchy (this gets nasty, fast) • (outer joins for optional values?)• Object fetching • Queries follow object graph, PK/FK • 5 queries to fetch object in this example
    7. 7. Hierarchy before NoSQL
    8. 8. Hierarchy with NoSQL• JSON structure mapped to objects • Fetch json from MongoDB** • Unmarshall into objects/tuples • Use it Using JSON4S
    9. 9. Hierarchy with NoSQL Focus on your Software, n ot DB layer!
    10. 10. Hierarchy with NoSQL• Write operations • Atomic upsert (create, update or fail) • Saves all levels of object atomically • Reduces need for transactions
    11. 11. Hierarchy with NoSQL • Write operations • Atomic upsert (create, update or fail) • Saves all levels of object atomically • Reduces need for transactions Convenienc e not magic All ornothing
    12. 12. Unique Identifiers in your Data• Relational design => PK/FK • Often not “meaningful” identifiers for data• User Data Model
    13. 13. Unique Identifiers in your Data• Relational design => PK/FK • Often not “meaningful” identifiers for data Unique by• User Data Model username
    14. 14. Unique Identifiers in your Data• Words Ensured to be constant
    15. 15. Data Duplication• Without Joins, what about SQL lookup tables? • Duplication of data in NoSQL is required• Trade storage for speed
    16. 16. Data Duplication …Can• Without Joins, what about SQL lookup move logic tables? to app • Duplication of data in NoSQL is required• Trade storage for speed
    17. 17. Data Duplication• Many fields don’t change, ever• But… many do • New decisions for the developer! • Often background updates
    18. 18. Data Duplication How often• Many fields don’t change, ever does this• But… many do change? • New decisions for the developer! • Often background updates
    19. 19. Data Duplication
    20. 20. Reaching into Objects• Incredible feature of MongoDB • Dot syntax safely** traverses the object graph
    21. 21. Inner Indexes• Convenience at a cost • No index => table scan • No value? => table scan • No child value? => table scan• Table scan with big collection?• Can’t index everything! 96GB of Indexes?
    22. 22. Inner Indexes• This will should drive your Data Model• Sparse Data test Even with only 2000 non- empty values!
    23. 23. Adding & Modifying• Append in mongo is blazing fast • “tail” of data is always in memory • Pre-allocated data files• Main expense is “index maintenance” • Some marshalling/unmarshalling cost**• Modifying? Object growth • Pre-allocation of space built in collection design
    24. 24. Adding & Modifying• Each object has allocated space • Exceed that space, need to relocate object • Leaves “hole” in collection• Large increases to documents hurts your overall performance• Your data model should strive for equally- sized objects as much as possible
    25. 25. Retrieval• Many same rules apply as relational• Indexes • complex/inner or not • Indexes in RAM? Yes • Cardinality matters• New(ish) considerations • Complex hierarchy not free • Marshalling  unmarshalling
    26. 26. Marshalling & Unmarshalling Object complexit yRecords/sec
    27. 27. Marshalling & Unmarshalling• All you can eat from your Data Model?• Techniques have tremendous impact • Development ease until it matters • 50% speed bump with manual mapping Only demand what you can consume!
    28. 28. Making the most of _id• Indexes matter• Tailor your _id to be meaningful by access pattern • It’s your first defense when auto-sharding• Date-driven data? • Monotonically _id value • Ensures recent data is “hot”
    29. 29. Making the most of _id• Other time-based data techniques• Flexibility in querying
    30. 30. Making the most of _id• Other time-based data techniques• Flexibility in querying Case- sensitive REGEX is your pal
    31. 31. Making the most of _id• Hot indexes are happy indexes • Access should strive for right bias• Random access with large indexes hit disk 1 7 1 2 5 7
    32. 32. Your Data Model• NoSQL gets you started faster• Many relational pain points are gone• New considerations (easier?)• Migration should be real effort• Designed by access patterns over object structure• Don’t prematurely optimize, but know where the knobs are
    33. 33. More Reading• http://tech.wordnik.com• http://github.com/wordnik/wordnik-oss• http://developer.wordnik.com• http://slideshare.net/fehguy

    ×