Going Schema-Free


    Notes on slide 1



    what’s our course? where are we starting, where are we going?
    (image: somewhere on flickr)

    Where do we start?
    I’m approaching this as someone whose primary/only database use has been with relational databases, using SQL.

    What are we looking for?
    An understanding of what document-oriented databases are, how they differ from relational databases, why you’d use them over relational databases, and what some of the options are.

    What pitfalls might we encounter?
    I’m not an expert, so I will probably miss things and might not argue for document-oriented databases very eloquently. Hopefully I won’t totally mislead anybody.

    back to the title. are we really going to talk about and seriously consider schema-free databases? what’s the point of that?
    the short answer is yes. hopefully this presentation will show why schema-free databases are sometimes very useful.

    quick review: relational databases are made up of relations.
    roughly, attributes are columns, tuples are rows. relations are collections of tuples with the same set of attributes, so tables.
    nice, structured data.
    (image: http://en.wikipedia.org/wiki/File:Relational_database_terms.svg)

    You could say that a relational database is defined by its structure.
    Structured Query Language
    For this presentation, it’s analogous to statically typed programming languages (like C, C++, C#, Java).



    so, what are some of the challenges?

    ironically, some structured data can be difficult or tedious to implement.
    For example, parent-child relationships can be difficult to represent and/or query on (select all work items where area path is in “Top-Level Component”)
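    To make the area-path example concrete, here is a minimal sketch in JavaScript. It assumes a hypothetical work-item table in which each row only knows its own parentId; the rows, field names, and the descendantsOf helper are all invented for illustration. Each pass of the loop stands in for one more self-join (or one more round trip to the database), which is roughly where the tedium comes from.

```javascript
// Hypothetical work-item rows, as they might sit in a relational table
// with a self-referencing parentId column.
const rows = [
  { id: 1, parentId: null, name: "Top-Level Component" },
  { id: 2, parentId: 1, name: "UI" },
  { id: 3, parentId: 2, name: "Login dialog" },
  { id: 4, parentId: null, name: "Other Component" },
  { id: 5, parentId: 4, name: "Importer" },
];

// Collect every row under the named root, one level of the tree per pass.
function descendantsOf(rootName, allRows) {
  const root = allRows.find((row) => row.name === rootName);
  if (!root) return [];
  const found = [];
  let frontier = [root.id];
  while (frontier.length > 0) {
    // Everything whose parent was found in the previous pass.
    const children = allRows.filter((row) => frontier.includes(row.parentId));
    found.push(...children);
    frontier = children.map((row) => row.id);
  }
  return found;
}

const underTopLevel = descendantsOf("Top-Level Component", rows).map((row) => row.name);
console.log(underTopLevel);
```

    The number of passes depends on how deep the tree is, which the query author usually doesn’t know in advance.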

    Relational databases typically aren’t designed for replication and scale-out from the beginning. As we all know, neglecting to consider something like this will make it harder to do later. Even something like merging in a source control tool (git vs. svn)... if you start out trying to support it, you’ll do better than if you add it as a feature later.

    one of the reasons that replication or distribution is difficult is that conflicts are sure to arise. two edits could conflict. two identical ids could be autogenerated... the application can solve these things, but the database isn’t going to provide too much out of the box.
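    The identical-id case can be sketched in a few lines. This assumes two hypothetical replicas that each run their own auto-increment counter; makeReplica and mergeInto are illustrative helpers, not part of any real database.

```javascript
// Hypothetical replica with its own auto-increment counter.
function makeReplica(name) {
  let nextId = 1;
  const rows = new Map();
  return {
    name,
    rows,
    insert(value) {
      const id = nextId++;
      rows.set(id, value);
      return id;
    },
  };
}

// Copy rows from source into target, reporting ids that both sides
// assigned to different values instead of overwriting them.
function mergeInto(target, source) {
  const conflicts = [];
  for (const [id, value] of source.rows) {
    if (target.rows.has(id) && target.rows.get(id) !== value) {
      conflicts.push({ id, ours: target.rows.get(id), theirs: value });
    } else {
      target.rows.set(id, value);
    }
  }
  return conflicts;
}

const east = makeReplica("east");
const west = makeReplica("west");
east.insert("order from Alice"); // gets id 1 on east
west.insert("order from Bob"); // also gets id 1 on west

const conflicts = mergeInto(east, west);
console.log(conflicts);
```

    Both inserts were valid locally; the conflict only shows up when the replicas try to merge, and something has to decide what to do with it.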

    Relational databases do solve problems for us, and they’re a powerful tool. I don’t want to discount that.

    document-oriented databases.
    can anyone tell me what document-oriented databases are made up of?



    http://www.flickr.com/photos/janodecesare/2978128591/sizes/o/



    we’re not doing waterfall, here

    attributes.

    notice the differences. there’s no schema to follow!
    flexibility

    couchdb is a very popular open-source document-oriented database.

    JavaScript Object Notation
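    As a rough illustration, the two example documents from the slides could be written as JSON like this. The field names come from the slides; treating attachments as an array of file names is an assumption made for the sketch and is not how CouchDB itself stores attachments.

```javascript
// The two slide documents as plain objects. They share some fields
// ("name", "attachments") but not others ("abstract" vs. "author").
const docs = [
  {
    name: "A Document",
    abstract: "A Document for the purpose of demonstration",
    attachments: ["realdoc.doc"], // assumed shape, for illustration only
  },
  {
    name: "Another Document",
    author: "Biz Stone",
    attachments: ["fulldoc.txt"],
  },
];

// Each document serializes to JSON on its own; nothing forces the
// two to have the same set of fields.
const serialized = docs.map((doc) => JSON.stringify(doc));
const fieldSets = docs.map((doc) => Object.keys(doc).sort());

console.log(serialized);
console.log(fieldSets);
```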

    CAP theorem.
    consistency: every read returns the same, “right” result, so reads from two different servers agree. This ends up being a challenge for lots of big web 2.0 properties; I’ve read about how flickr and facebook deal with this.
    availability: data is returned when requested, e.g. writes don’t block reads.
    partition tolerance: the database keeps working when it is split across servers that can’t always reach each other.
    choose two.



    “eventual consistency”



    as you can see, one of the differences between couchdb and a relational database is the consistency/availability tradeoff. couchdb is written in erlang, so some of its features have an erlangish feel to them: data is always there (old revisions always exist, are immutable), and new versions get layered on top.
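    The “old revisions are immutable, new versions get layered on top” idea can be sketched with a toy in-memory store. This is not CouchDB’s implementation: createStore, put, and get are hypothetical names, and the integer revision numbers are a simplification chosen for the example.

```javascript
// Toy in-memory store: every write appends a new immutable revision.
function createStore() {
  const history = new Map(); // id -> array of frozen revisions

  return {
    // Write a new revision. The caller must name the revision it is
    // updating (0 for a brand-new document); otherwise it is a conflict.
    put(id, body, expectedRev = 0) {
      const revisions = history.get(id) || [];
      if (revisions.length !== expectedRev) {
        return { ok: false, error: "conflict" };
      }
      const rev = revisions.length + 1;
      history.set(id, [...revisions, Object.freeze({ ...body, rev })]);
      return { ok: true, rev };
    },
    // Read the latest revision, or a specific older one.
    get(id, rev) {
      const revisions = history.get(id) || [];
      return rev ? revisions[rev - 1] : revisions[revisions.length - 1];
    },
  };
}

const store = createStore();
const first = store.put("doc1", { title: "draft" });
const second = store.put("doc1", { title: "final" }, first.rev);
// This writer still thinks revision 1 is current, so the write is rejected.
const stale = store.put("doc1", { title: "late edit" }, first.rev);

console.log(store.get("doc1").title, store.get("doc1", 1).title, stale);
```

    Readers are never blocked by a writer here: they see whichever revision was latest when they asked, and the older revision stays readable.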

    can someone describe map-reduce?

    enables parallelization.

    views in couch are map/reduce
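    For anyone who hasn’t seen map/reduce before, here is a minimal sketch of the idea in plain JavaScript. It is not how CouchDB runs views: the runView driver and the sample documents are hypothetical, and emit is passed in as a parameter here so the example runs on its own.

```javascript
// Hypothetical sample documents, loosely based on the slides.
const sampleDocs = [
  { name: "A Document", attachments: ["realdoc.doc"] },
  { name: "Another Document", author: "Biz Stone", attachments: ["fulldoc.txt"] },
  { name: "Third Document", author: "Biz Stone", attachments: [] },
];

// Map step: called once per document, emits zero or more key/value rows.
// It only looks at one document, so calls can run in any order or in parallel.
function map(doc, emit) {
  emit(doc.author || "(unknown)", 1);
}

// Reduce step: collapses the values emitted under one key into one result.
function reduce(keys, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Tiny driver: run map over every document, group rows by key, then reduce.
function runView(allDocs) {
  const rowsByKey = new Map();
  for (const doc of allDocs) {
    map(doc, (key, value) => {
      if (!rowsByKey.has(key)) rowsByKey.set(key, []);
      rowsByKey.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of rowsByKey) {
    result[key] = reduce([key], values);
  }
  return result;
}

const countsByAuthor = runView(sampleDocs);
console.log(countsByAuthor);
```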





    stale=ok means the view is answered from the existing index as-is, without first checking whether it needs to be updated for recent changes.
    reduce=false skips the reduce function, if it was supplied.
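    These options are passed as query-string parameters when a view is requested over HTTP. A small sketch of building such a URL follows; the database name, design document name, and view name are made up, and viewUrl is a hypothetical helper rather than part of any client library.

```javascript
// Build the URL for querying a view, with options as query parameters.
function viewUrl(base, db, designDoc, view, options = {}) {
  const path = [db, "_design", designDoc, "_view", view]
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  const query = new URLSearchParams(options).toString();
  return query ? `${base}/${path}?${query}` : `${base}/${path}`;
}

// Hypothetical names: database "library", design doc "docs", view "by_author".
const url = viewUrl("http://localhost:5984", "library", "docs", "by_author", {
  stale: "ok",
  reduce: "false",
});

console.log(url);
```

    Requesting that URL would ask for the raw map rows (no reduce), served from whatever index already exists.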






    Going Schema-Free - Presentation Transcript

    1. Going Schema-free Document-Oriented Databases
    2. Schema-free?
    3. Structure :)
    4. Structures :(
    5. Replication
    6. Conflicts
    7. Documents!
    8. (not the bad kind)
    9. • name: A Document • abstract: A Document for the purpose of demonstration • attachments: realdoc.doc
    10. • name: Another Document • author: Biz Stone • attachments: fulldoc.txt
    11. json documents
    12. map-reduce
    13. views
    14. "map": function (doc) { emit("id", "value"); }
    15. "reduce": function (keys, values, rereduce) { return {"result": true}; }
    16. View options • key/keys • startkey/endkey • startkey_docid/endkey_docid • limit • stale • descending • skip • group • group_level • reduce • include_docs
    17. (demo)
    18. More Information •http://couchdb.apache.org/ •http://books.couchdb.org/

    spraints, 6 months ago


    CC Attribution License
