Going Schema-Free

Uploaded as Apple Keynote
Usage rights: CC Attribution License

Speaker Notes
  • What’s our course? Where are we starting, and where are we going? (image: somewhere on Flickr)
  • Where do we start? I’m approaching this as someone whose primary (or only) database experience has been with relational databases, using SQL.
  • What are we looking for? An understanding of what document-oriented databases are, how they differ from relational databases, why you’d use them instead of relational databases, and what some of the options are.
  • What pitfalls might we encounter? Matt is not an expert, so I’ll probably miss things and may not argue for document-oriented databases very eloquently; hopefully I won’t totally mislead anybody.
  • Back to the title: are we really going to talk about, and seriously consider, schema-free databases? What’s the point of that? The short answer is yes. Hopefully this presentation will show why schema-free databases are sometimes very useful.
  • Quick review: relational databases are made up of relations. Roughly, attributes are columns and tuples are rows; a relation is a collection of tuples with the same set of attributes, i.e. a table. Nice, structured data. (image: http://en.wikipedia.org/wiki/File:Relational_database_terms.svg)
  • You could say that a relational database is defined by its structure; it’s even in the name of its query language, SQL (Structured Query Language). For this presentation, it’s analogous to statically typed programming languages (like C, C++, C#, Java). So, what are some of the challenges?
  • Ironically, some structured data can be difficult or tedious to implement. For example, parent-child relationships can be hard to represent and/or query (e.g., select all work items whose area path is under “Top-Level Component”).
  • Relational databases typically weren’t designed for replication and scale-out from the beginning. As we all know, neglecting something like this up front makes it harder to add later. The same goes for merging in source control (git vs. svn): a tool designed to support it from the start does better than one that adds it as a feature later.
  • One reason replication or distribution is difficult is that conflicts are sure to arise: two edits could conflict, or two servers could autogenerate identical ids. The application can solve these problems, but the database won’t provide much help out of the box.
  • Relational databases do solve problems for us, and they’re a powerful tool. I don’t want to discount that.
  • Document-oriented databases. Can anyone tell me what document-oriented databases are made up of? (image: http://www.flickr.com/photos/janodecesare/2978128591/sizes/o/)
  • We’re not doing waterfall here.
  • Attributes.
  • Notice the differences: there’s no schema to follow! Flexibility.
  • CouchDB is a very popular open-source document-oriented database.
  • JavaScript Object Notation (JSON).
  • CAP theorem.
    Consistency: all reads return the same, “right” result; reads from two servers return the same result. This ends up being a challenge for lots of big Web 2.0 properties; I’ve read about how Flickr and Facebook deal with it.
    Availability: every request gets a response, so data is returned when requested (in CouchDB, for example, writes don’t block reads).
    Partition tolerance: the system keeps working even when the network splits the servers into groups that can’t talk to each other.
    Choose two.
    “Eventual consistency.”
    As you can see, one of the differences between CouchDB and a relational database is the consistency/availability trade-off. CouchDB is written in Erlang, so some of its features have an Erlang-ish feel: existing data is never modified in place (each revision is immutable and stays around until the database is compacted), and new versions get layered on top.
  • Can someone describe map-reduce?
  • It enables parallelization: map runs independently on each document, and reduce can combine partial results.
  • Views in CouchDB are defined with map/reduce.
  • stale=ok means the view won’t be updated before the query runs: CouchDB returns whatever is already in the view index, even if documents have changed since it was last built. reduce=false skips the reduce function, if one was supplied.
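
The note on parent-child relationships mentions querying all work items under "Top-Level Component". One common document-database approach is to store each item's full area path as an array. The view's map function then emits every prefix of that path, so a single exact-key lookup finds all descendants. This is a minimal sketch under assumptions. The `area_path` field, the sample work items, and the `runView` helper are all hypothetical. `emit` is passed in as a parameter only so the sketch runs outside CouchDB; inside CouchDB it is a global.

```javascript
// Hypothetical work items; each stores its full area path as an array.
const workItems = [
  { _id: "wi-1", title: "Fix crash", area_path: ["Top-Level Component", "Parser"] },
  { _id: "wi-2", title: "Add docs", area_path: ["Top-Level Component"] },
  { _id: "wi-3", title: "Speed up UI", area_path: ["Other Component", "UI"] },
];

// CouchDB-style map function: emit one row per ancestor path, so each item
// is indexed under every component that contains it.
function mapByAreaPrefix(doc, emit) {
  for (let i = 1; i <= doc.area_path.length; i++) {
    emit(doc.area_path.slice(0, i), doc.title);
  }
}

// Hypothetical helper: run a map function over the documents, collect the
// emitted rows, and keep only rows whose key equals the requested key.
function runView(docs, mapFn, key) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (k, v) => rows.push({ id: doc._id, key: k, value: v }));
  }
  const wanted = JSON.stringify(key);
  return rows.filter((row) => JSON.stringify(row.key) === wanted);
}

// All work items anywhere under "Top-Level Component".
const underTop = runView(workItems, mapByAreaPrefix, ["Top-Level Component"]);
console.log(underTop.map((row) => row.id)); // logs the ids wi-1 and wi-2
```

CouchDB views can also answer this with a `startkey`/`endkey` range over array keys while emitting the path only once. Emitting every prefix trades a bigger index for simpler lookups.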

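The notes on conflicts and on CouchDB's immutable revisions describe optimistic concurrency. Each update names the revision it was based on. An update based on an out-of-date revision is rejected instead of silently overwriting someone else's edit. The toy in-memory store below is a hypothetical sketch of that idea: `ToyStore` and its methods are made up for illustration. Its revision ids are plain counters rather than CouchDB's real `N-<hash>` revision strings.

```javascript
// Hypothetical toy store illustrating revision-based (optimistic) concurrency.
// Revision ids here are plain counters ("1", "2", ...), not CouchDB's "N-<hash>".
class ToyStore {
  constructor() {
    // id -> list of saved revisions, oldest first; saved entries are frozen.
    this.docs = new Map();
  }

  // Latest revision of a document, or undefined if it does not exist.
  get(id) {
    const history = this.docs.get(id);
    return history ? history[history.length - 1] : undefined;
  }

  // Save a document. A new document must not carry a _rev; an update must
  // carry the _rev of the latest revision, or it is rejected as a conflict.
  put(doc) {
    const current = this.get(doc._id);
    const expectedRev = current ? current._rev : undefined;
    if (doc._rev !== expectedRev) {
      return { ok: false, error: "conflict" };
    }
    const nextRev = String(current ? Number(current._rev) + 1 : 1);
    // The new revision is layered on top; earlier revisions stay untouched.
    const saved = Object.freeze({ ...doc, _rev: nextRev });
    const history = this.docs.get(doc._id) || [];
    history.push(saved);
    this.docs.set(doc._id, history);
    return { ok: true, id: doc._id, rev: nextRev };
  }
}

// Two clients read revision "1" and both try to update it.
const store = new ToyStore();
const created = store.put({ _id: "doc1", name: "A Document" });
const aliceCopy = { ...store.get("doc1"), name: "Alice's edit" };
const bobCopy = { ...store.get("doc1"), name: "Bob's edit" };
const aliceResult = store.put(aliceCopy); // accepted: based on the latest revision
const bobResult = store.put(bobCopy); // rejected: based on a stale revision
console.log(created, aliceResult, bobResult);
```

In CouchDB itself, the rejected write comes back as an HTTP 409 Conflict, and the application decides how to merge. CouchDB also discards old revision bodies when a database is compacted, so revisions are not a substitute for version history.
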
Going Schema-Free Presentation Transcript

  • Going Schema-Free: Document-Oriented Databases
  • Schema-free?
  • Structure :)
  • Structures :(
  • Replication
  • Conflicts
  • Documents!
  • (not the bad kind)
  • name: A Document
    abstract: A Document for the purpose of demonstration
    attachments: realdoc.doc
  • name: Another Document
    author: Biz Stone
    attachments: fulldoc.txt
  • JSON documents
  • map-reduce
  • views
  • "map": function (doc)
    {
      emit("id", "value");
    }
  • "reduce": function (keys, values, rereduce)
    {
      return {"result": true};
    }
  • View options: key/keys • startkey/endkey • startkey_docid/endkey_docid • limit • stale • descending • skip • group • group_level • reduce • include_docs
  • (demo)
  • More information: http://couchdb.apache.org/ • http://books.couchdb.org/
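
The map and reduce slides show the shape of a CouchDB view. `map(doc)` calls `emit(key, value)`, and `reduce(keys, values, rereduce)` combines values. Reduce can also be applied to its own earlier outputs (`rereduce = true`). That means rows can be split into chunks, reduced independently, and then combined, which is why map-reduce enables parallelization. The sketch below counts attachments over the two sample documents from the slides plus one more. The third document, the `runMapReduce` driver, and the chunk size are hypothetical. `emit` is passed as a parameter only so the code runs outside CouchDB, whose actual scheduling of reduce calls over its index is more involved.

```javascript
// Sample schema-free documents: the first two mirror the slides, the third
// is hypothetical. Note that they do not share one set of fields.
const sampleDocs = [
  {
    _id: "a",
    name: "A Document",
    abstract: "A Document for the purpose of demonstration",
    attachments: ["realdoc.doc"],
  },
  { _id: "b", name: "Another Document", author: "Biz Stone", attachments: ["fulldoc.txt"] },
  { _id: "c", name: "Third Document", attachments: ["notes.txt", "slides.key"] },
];

// Map: one row per attachment. In CouchDB, emit() is a global; here it is
// passed in as a parameter so the sketch runs outside CouchDB.
function mapAttachments(doc, emit) {
  for (const file of doc.attachments || []) {
    emit(file, doc._id);
  }
}

// Reduce: count rows. On the first pass, keys are [key, docid] pairs and
// values are the emitted row values; on a rereduce pass, keys is null and
// values are counts returned by earlier reduce calls.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((sum, n) => sum + n, 0);
  }
  return values.length;
}

// Hypothetical driver: map every document, reduce each chunk of rows
// independently (these calls could run in parallel), then rereduce the partials.
function runMapReduce(documents, mapFn, reduceFn, chunkSize) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ key, id: doc._id, value }));
  }
  const partials = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    const chunk = rows.slice(i, i + chunkSize);
    const keys = chunk.map((row) => [row.key, row.id]);
    const values = chunk.map((row) => row.value);
    partials.push(reduceFn(keys, values, false));
  }
  return reduceFn(null, partials, true);
}

const attachmentCount = runMapReduce(sampleDocs, mapAttachments, countReduce, 3);
console.log(attachmentCount); // 4
```

Changing the chunk size does not change the result. That is the property a reduce function needs: however the rows are grouped before being rereduced, the answer must come out the same.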
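
The view-options slide and the note on `stale=ok` and `reduce=false` refer to query-string parameters on a view request. Here is a minimal sketch of assembling such a URL with the standard `URL` class. It assumes a CouchDB server at `http://localhost:5984` and a database `docs` with a design document `example` containing a view `by_author`; all of these are hypothetical names. It also assumes, as CouchDB expects, that `key`, `startkey` and `endkey` values are JSON, so a string key is sent with its quotes; other options are sent as plain text. `stale=ok` is the option from this era of CouchDB; later releases deprecate it in favor of `stable` and `update`.

```javascript
// Options whose values CouchDB parses as JSON (so strings keep their quotes).
const JSON_OPTIONS = new Set(["key", "startkey", "endkey"]);

// Build a view query URL. Database, design document and view names are
// assumed to be URL-safe; all names used below are hypothetical.
function viewUrl(base, db, ddoc, view, options) {
  const url = new URL(`/${db}/_design/${ddoc}/_view/${view}`, base);
  for (const [name, value] of Object.entries(options)) {
    const encoded = JSON_OPTIONS.has(name) ? JSON.stringify(value) : String(value);
    url.searchParams.set(name, encoded); // URLSearchParams percent-encodes the value
  }
  return url.toString();
}

// Rows with keys from "B" up to "C", skipping reduce, including each row's
// document, and not waiting for the index to be brought up to date.
const queryUrl = viewUrl("http://localhost:5984", "docs", "example", "by_author", {
  startkey: "B",
  endkey: "C",
  reduce: false,
  include_docs: true,
  stale: "ok",
  limit: 10,
});
console.log(queryUrl);
```

On a view that has a reduce function, `reduce=false` is also what allows `include_docs=true`, because reduced rows don't correspond to single documents.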