Schema Design for Riak (Take 2)

A discussion of strategies for designing application schemas that use the Riak distributed key-value store.

Video available here: http://vimeo.com/17604126

Schema Design for Riak (Take 2): Presentation Transcript

  • Schema Design for Riak. Sean Cribbs, Developer Advocate, Basho
  • Thinking Non-Relationally
  • “There is no spoon schema.”
  • There is an application.
  • What you lose
      • Tables
      • Foreign keys and constraints
      • ACID
      • Sophisticated query planners
      • Declarative query language (SQL)
  • What you gain
      • More flexible, fluid designs
      • More natural data representations
      • Scaling without pain
      • Reduced operational complexity
  • Walking without Relational crutches
      • Sparse data (optional/multi-value fields)
      • Richer data structures
      • Meaningful identifiers
      • Innovative access patterns
  • Know your data
  • Make your Top 10 list
      • Frequently requested pages/screens
      • Slow queries
      • Secondary indexes
      • Complicated joins or aggregations
  • Analyze
      • Interdependencies, coupling
      • Cardinalities of relationships
      • Access pattern
      • Shoehorned structures
  • Example
  • Radiant CMS (radiantcms.org)
  • The “Styled Blog” Template
      • Date-organized pages
      • Static sidebar content
  • Administration UI
      • Hierarchical site structure
      • Add new pages by parent
      • Special “virtual” pages
  • Screenshot callouts: slug, breadcrumb; content blocks; layout
  • Screenshot callouts: Content-Type; template tags
  • Relational Schema
      • Tables: pages, page_parts, layouts, snippets, users
      • Relationship labels: parent, created/modified by
  • Converting to Riak
  • Layouts and Snippets
      • Layouts and Snippets are accessed by name
      • SQL: WHERE name=?
      • Simple access by key, e.g. layouts/Main, snippets/top-nav
      • Simple value structure (content + metadata)
  • Users
      • Normally accessed by login or email: WHERE login=? OR email=?
      • #1: Login or Email key
          • Easy lookup on one, manual index for the other
      • #2: Arbitrary key
          • Independent of email/login changes
          • Manually index both
      • #3: Users-list object
          • Fits “small teams” philosophy
          • Quick lookup (one fetch)
          • Bad for large list, still need unique IDs
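Option #2 above (arbitrary key plus two manual indexes) can be sketched as follows. This is only an illustration: plain Python dicts stand in for Riak buckets, and the names `users`, `users_by_login`, `users_by_email`, `create_user`, `find_user`, and `change_email` are hypothetical, not part of any Riak client.

```python
import uuid

# Assumption: plain dicts stand in for three Riak buckets.
users = {}            # arbitrary key -> user object
users_by_login = {}   # manual index: login -> user key
users_by_email = {}   # manual index: email -> user key

def create_user(login, email):
    """Store the user under an arbitrary key and write both index entries."""
    key = uuid.uuid4().hex
    users[key] = {"login": login, "email": email}
    users_by_login[login] = key
    users_by_email[email] = key
    return key

def find_user(login_or_email):
    """Equivalent of WHERE login=? OR email=?: check each index, then fetch."""
    key = users_by_login.get(login_or_email) or users_by_email.get(login_or_email)
    return users.get(key) if key else None

def change_email(key, new_email):
    """The user key never changes; only the email index entry is rewritten."""
    user = users[key]
    del users_by_email[user["email"]]
    user["email"] = new_email
    users_by_email[new_email] = key
```

The cost of this design is visible in `create_user`: one logical write becomes three, and the application is responsible for keeping the index entries consistent.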
  • The Bigger Problems
  • Page Rendering
      • Load and render layout in page context
      • Render parts, snippets via template tags
      • Iterate over sections of page hierarchy
  • Rendering Page Parts
      • Parts are dependent on the page
      • Compose / Denormalize (part of page)
      • Pros: reduces lookups, conceptual unity, no index needed
      • Cons: larger objects, no lazy fetching
  • Denormalized Page (dependent objects inline)
        {
          title: "Home Page",
          parts: [
            {name: "body", content: "..."},
            {name: "sidebar", content: "..."}
          ],
          // ...
        }
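To make the trade-off concrete, here is a minimal sketch of reading a part from a denormalized page. A dict stands in for the "pages" bucket, and `store_page` and `render_part` are hypothetical helper names chosen for this example.

```python
import json

# Assumption: a dict stands in for the "pages" bucket (key -> JSON string).
pages = {}

def store_page(key, title, parts):
    """Write the page and all of its parts as one JSON value."""
    doc = {"title": title,
           "parts": [{"name": name, "content": content} for name, content in parts]}
    pages[key] = json.dumps(doc)

def render_part(key, part_name):
    """One fetch returns every part; there is no separate page_parts lookup."""
    page = json.loads(pages[key])
    for part in page["parts"]:
        if part["name"] == part_name:
            return part["content"]
    return ""  # missing parts render as empty content in this sketch
```

Note that `render_part` always loads the whole page object, which is the "larger objects, no lazy fetching" cost mentioned in the slides.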
  • Site Hierarchy ...or, how to find a page (currently)
      • Start at the root: WHERE parent_id IS NULL
      • Check if the current page matches URL
      • Check children for matching path segment, recurse
      • Return “not found” page or 404
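The relational lookup described above can be sketched like this. The `PAGES` rows and slugs are made-up sample data, and the list comprehension in `children_of` stands in for a `WHERE parent_id = ?` query; the function also returns how many such queries it issued, so the per-level cost is visible.

```python
# Assumption: a list of dicts stands in for the relational "pages" table.
PAGES = [
    {"id": 1, "parent_id": None, "slug": "/"},
    {"id": 2, "parent_id": 1, "slug": "articles"},
    {"id": 3, "parent_id": 1, "slug": "about"},
    {"id": 4, "parent_id": 2, "slug": "first-post"},
]

def find_by_url(url):
    """Walk from the root, one child query per path segment.

    Returns (page_or_None, number_of_queries_issued).
    """
    queries = 0

    def children_of(parent_id):
        # One "query": WHERE parent_id = ? (IS NULL when parent_id is None).
        nonlocal queries
        queries += 1
        return [p for p in PAGES if p["parent_id"] == parent_id]

    page = children_of(None)[0]  # start at the root
    for segment in (s for s in url.split("/") if s):
        matches = [c for c in children_of(page["id"]) if c["slug"] == segment]
        if not matches:
            return None, queries  # caller renders the "not found" page or a 404
        page = matches[0]
    return page, queries
```

In this sketch, finding a page two levels below the root takes three queries: one for the root and one per path segment.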
  • Site Hierarchy ...or, how to find a page (analysis)
      • O(log N) queries to find requested page
      • Index on parent_id speeds retrieval
      • Traversing/iterating for content generation
      • Generating URL paths
  • Site Hierarchy ...or, how to find a page (in Riak)
      • Blog post: http://ow.ly/3jDAO
      • #1 Parent/child links
      • #2 Material path key
      • #3 Tree object
  • #1: Parent/Child Links: A Natural Tree (diagram: tree of pages A, B, C, D, E, F)
      • Easy to understand & implement
      • Doubly-linked for easy traversal in either direction, e.g. </riak/pages/C>; riaktag="child" and </riak/pages/A>; riaktag="parent"
      • Easy to move entire subtrees
  • #1: Parent/Child Links: A Natural Tree
      • Child order is arbitrary
      • Still O(log N) traversal
      • Two writes to add or update
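A rough model of the doubly-linked approach, assuming each page object carries a list of `(bucket, key, tag)` links. A dict stands in for the "pages" bucket, and `add_child`, `follow`, and `path_to_root` are hypothetical helpers, not Riak link-walking calls.

```python
# Assumption: a dict stands in for the "pages" bucket; each object holds
# its links as (bucket, key, tag) tuples, mirroring Riak's link triples.
pages = {"A": {"links": []}}

def add_child(parent_key, child_key):
    """Adding a page takes two writes: the child and its parent."""
    # Write 1: the new child, pointing back at its parent.
    pages[child_key] = {"links": [("pages", parent_key, "parent")]}
    # Write 2: the parent, updated with a link to the new child.
    pages[parent_key]["links"].append(("pages", child_key, "child"))

def follow(key, tag):
    """Return the keys linked from `key` under the given tag."""
    return [k for (_bucket, k, t) in pages[key]["links"] if t == tag]

def path_to_root(key):
    """Walk "parent" links upward; one fetch per level of the tree."""
    path = [key]
    while follow(key, "parent"):
        key = follow(key, "parent")[0]
        path.append(key)
    return path
```

For example, after `add_child("A", "B")` and `add_child("B", "D")`, calling `path_to_root("D")` walks D, B, A, one level at a time.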
  • #2: Material Path Key: “Best Guess” discovery (diagram: “pages” bucket with keys _root_ (A), B, C, B/D, C/E)
      • Use path-to-page as key
      • Best case: one lookup to find requested page
      • Ancestor pages listed inside object, e.g. { title: "D", ancestors: ["B", "_root_"] }
  • #2: Material Path Key: “Best Guess” discovery
      • Large update cost for internal nodes
      • Dynamic URLs hard
      • Key-filters (0.14) needed for efficient child lists
      • Need fallback for “misses”
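One possible shape for the "best guess" lookup with a fallback for misses is sketched below. The fallback policy (retry with progressively shorter paths until an ancestor is found) is an assumption made for illustration; the slides only say that some fallback is needed. A dict stands in for the "pages" bucket and `find_page` is a hypothetical name.

```python
# Assumption: a dict stands in for the "pages" bucket, keyed by path.
pages = {
    "_root_": {"title": "A", "ancestors": []},
    "B": {"title": "B", "ancestors": ["_root_"]},
    "C": {"title": "C", "ancestors": ["_root_"]},
    "B/D": {"title": "D", "ancestors": ["B", "_root_"]},
    "C/E": {"title": "E", "ancestors": ["C", "_root_"]},
}

def find_page(url):
    """Try the full path as the key; on a miss, fall back to shorter paths.

    Returns (key, exact), where exact is False when an ancestor was
    returned instead of the requested page.
    """
    segments = [s for s in url.split("/") if s]
    wanted = len(segments)
    while segments:
        key = "/".join(segments)
        if key in pages:  # one lookup in the best case
            return key, len(segments) == wanted
        segments.pop()    # miss: retry with the parent path
    return "_root_", wanted == 0
```

An inexact result could be handed to the nearest ancestor to render a dynamic URL, or treated as "not found"; which one applies is an application decision.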
  • #3: Tree Object: “Branches and Leaves”
        {
          key: "A",
          children: [
            {key: "B", children: ["D"]},
            {key: "C", children: ["E", "F"]}
          ]
        }
      • One request to get site structure
      • Separates content from organization
      • Intrinsic ordering
      • Massive structure changes are quick
  • #3: Tree Object: “Branches and Leaves”
      • Large site = large tree object (expensive to transfer)
      • Some metadata (slug?) will need to be in-tree for efficient traversal
      • Multiple writers problematic
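The tree object lends itself to in-memory traversal once fetched. The sketch below uses the same A to F tree, but as an assumption normalizes leaves into `{"key": ..., "children": []}` nodes so every node has the same shape; `find`, `path_to`, `detach`, and `move` are hypothetical helpers.

```python
# Assumption: the tree object has already been fetched and decoded, and
# leaves are normalized to the same node shape as branches.
tree = {"key": "A", "children": [
    {"key": "B", "children": [{"key": "D", "children": []}]},
    {"key": "C", "children": [{"key": "E", "children": []},
                              {"key": "F", "children": []}]},
]}

def find(node, key):
    """Return the node with the given key, or None."""
    if node["key"] == key:
        return node
    for child in node["children"]:
        hit = find(child, key)
        if hit:
            return hit
    return None

def path_to(node, key, trail=()):
    """Return the list of keys from the root down to `key`, or None."""
    trail = trail + (node["key"],)
    if node["key"] == key:
        return list(trail)
    for child in node["children"]:
        found = path_to(child, key, trail)
        if found:
            return found
    return None

def detach(node, key):
    """Remove and return the subtree rooted at `key` (not the root itself)."""
    for i, child in enumerate(node["children"]):
        if child["key"] == key:
            return node["children"].pop(i)
        found = detach(child, key)
        if found:
            return found
    return None

def move(root, key, new_parent_key):
    """Restructure in memory; persisting it would be a single write."""
    subtree = detach(root, key)
    find(root, new_parent_key)["children"].append(subtree)
```

Because the whole structure lives in one object, `move` touches no page content, which is why large restructurings are quick. It is also why concurrent writers are a problem: two clients editing the same tree object would conflict.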
  • Hybrid Solutions (TIMTOWTDI)
      • Material paths for quick lookups, links for relative traversal
      • Tree object for relative lookups, secondary index for material paths
  • Key Takeaways
      • Design for most common access pattern: the key is your index
      • Denormalize dependent data types
      • Build richer (or simpler) data structures
      • Use links to connect normalized or independent types
  • Useful Tips
      • Use query or identity caches to reduce duplicate fetches
      • Store data in JSON, XML, or Erlang terms for MapReduce
      • Use Riak Search where appropriate to reduce complexity
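As one reading of the identity-cache tip, the sketch below memoizes fetches by `(bucket, key)` for the lifetime of a single request. `IdentityCache` and the `fetch` callable are hypothetical; `fetch` stands in for whatever function actually reads from Riak.

```python
class IdentityCache:
    """Per-request cache: each (bucket, key) is fetched at most once."""

    def __init__(self, fetch):
        # `fetch` is an assumed callable (bucket, key) -> object.
        self._fetch = fetch
        self._seen = {}

    def get(self, bucket, key):
        ident = (bucket, key)
        if ident not in self._seen:
            self._seen[ident] = self._fetch(bucket, key)
        return self._seen[ident]
```

Creating a fresh cache per request keeps it simple: there is no invalidation logic, and a layout or snippet referenced by several template tags on one page is only read once.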
  • Review
      • Analyze your relational model
      • Identify pain points, take stats
      • Design some alternatives
      • Test, Measure, Repeat!
  • Plug: Interested in learning about support, consulting, or Enterprise features? Email info@basho.com or go to http://www.basho.com/contact.html to talk with us. www.basho.com, sean@basho.com, @seancribbs