Document databases
The mystery revealed

Contents
 NoSQL
 Culture shock
 Document databases
 Concepts
 Benefits
 Schema design
 MongoDB
 Internals
 Use in .NET

NoSQL
 Collective term for a range of databases
 Non-relational
 Key/value pairs
 key = field name

Comparison
 Relational
Article
- id
- authorid
- title
- content
Comment
- id
- articleid
- message
Author
- id
- name
- email
 Document db
Article
- _id
- title
- content
- author
  - _id
  - name
  - email
- comments[]
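
The two shapes compared above can be sketched in plain JavaScript. This is a minimal illustration only: the sample values and the helper loadArticleRelational are hypothetical and not part of the original slides.

```javascript
// Hypothetical sample data; all names and values are illustrative only.
const authors = [{ id: 1, name: "Ann", email: "ann@example.com" }];
const articles = [{ id: 10, authorid: 1, title: "Hello", content: "..." }];
const comments = [
  { id: 100, articleid: 10, message: "First!" },
  { id: 101, articleid: 10, message: "Nice read" },
];

// Relational style: the article is assembled at query time from three tables.
function loadArticleRelational(articleId) {
  const article = articles.find((a) => a.id === articleId);
  return {
    ...article,
    author: authors.find((u) => u.id === article.authorid),
    comments: comments.filter((c) => c.articleid === articleId),
  };
}

// Document style: the same information is stored as one self-contained document.
const articleDocument = {
  _id: 10,
  title: "Hello",
  content: "...",
  author: { _id: 1, name: "Ann", email: "ann@example.com" },
  comments: [{ message: "First!" }, { message: "Nice read" }],
};
```

In the document shape, a single read returns everything the article page needs.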

Terminology
 In parallel with SQL:
Relational    Document db
Table         Collection
Row           Document
Column        Field
Index         Index
Join          Embedding & linking
Schema        N/A

Data integrity
 Shift of responsibilities to the app
 Manage data integrity and validity yourself
 Database more efficient and more scalable
(Diagram: data integrity & validity checks move from the DB to the APPLICATION)
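
As a rough sketch of what "manage data integrity and validity yourself" can mean, the application might run a check like the one below before every write. The rules and the function name validateArticle are assumptions made up for this example.

```javascript
// Hypothetical rules for an article document; the database itself would accept anything.
function validateArticle(doc) {
  const errors = [];
  if (typeof doc.title !== "string" || doc.title.trim() === "") {
    errors.push("title is required");
  }
  if (!doc.author || typeof doc.author.email !== "string" || !doc.author.email.includes("@")) {
    errors.push("author.email must look like an e-mail address");
  }
  if (doc.comments !== undefined && !Array.isArray(doc.comments)) {
    errors.push("comments must be an array");
  }
  return errors; // an empty array means the document may be saved
}
```

The app would refuse to save any document for which the returned array is not empty.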

Concepts
 Joins
 No joins
 Joins at "design time", not at "query time"
 Thanks to embedded docs and arrays, fewer joins are needed
 Constraints
 No foreign key constraints
 Unique indexes
 Transactions
 No commit/rollback
 Atomic operations
 Multiple actions inside the same document
 Incl. embedded documents

Dynamic schema
 No schema
 Implied: definition in the app, not the db
 A field can exist in certain docs and not in others
 When indexed, a missing field is stored as null
 Sparse index: excludes docs without that field
 Writing to a non-existent collection or database
 Lazy creation
 Reading from a non-existent collection
 Empty value returned
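
The behaviour listed above can be imitated with a toy in-memory store. ToyDb and its methods are hypothetical and are not a MongoDB API; the sketch only mirrors lazy creation, empty reads, and the difference between a regular and a sparse index.

```javascript
// Toy in-memory store (hypothetical) that mimics the behaviour described above.
class ToyDb {
  constructor() {
    this.collections = new Map();
  }

  // Writing creates the collection lazily if it does not exist yet.
  insert(name, doc) {
    if (!this.collections.has(name)) this.collections.set(name, []);
    this.collections.get(name).push(doc);
  }

  // Reading a non-existent collection yields an empty result instead of an error.
  find(name) {
    return this.collections.get(name) ?? [];
  }

  // Regular index: docs without the field are indexed under null.
  // Sparse index: docs without the field are left out.
  buildIndex(name, field, { sparse = false } = {}) {
    const index = [];
    for (const doc of this.find(name)) {
      const hasField = Object.prototype.hasOwnProperty.call(doc, field);
      if (!hasField && sparse) continue;
      index.push({ key: hasField ? doc[field] : null, doc });
    }
    return index;
  }
}
```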

Relations
 Embedded fields
 Can be queried, the parent doc is returned
 Can be indexed
 Can’t be used for ordering
 Linking
 Fetch the second doc yourself in the app via a reference
 Avoid where possible
 Use for:
 Many-to-many relations
 Subdocs that often need to be modified
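
A small sketch of linking, assuming two collections modelled as Maps keyed by _id. The field name author_id and the helper loadArticleWithAuthor are illustrative assumptions.

```javascript
// Hypothetical collections, modelled as Maps keyed by _id.
const authorsById = new Map([["a1", { _id: "a1", name: "Ann" }]]);
const articlesById = new Map([
  ["p1", { _id: "p1", title: "Hello", author_id: "a1" }],
  ["p2", { _id: "p2", title: "Orphan", author_id: "gone" }],
]);

// Linking: the article only stores a reference, so the app issues a second lookup.
function loadArticleWithAuthor(articleId) {
  const article = articlesById.get(articleId);
  if (!article) return null;
  // No foreign key constraint exists, so the reference may point nowhere.
  const author = authorsById.get(article.author_id) ?? null;
  return { ...article, author };
}
```

The second lookup and the handling of dangling references are the app's job, which is one reason to avoid linking where embedding works.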

Benefits
 Scalable: good for a lot of data / traffic
 Horizontal scaling: to more nodes
 Good for web-apps
 Performance
 No joins and constraints
 Dev/user friendly
 Data is modeled the way the app is going to use it
 No object-to-relational mapping needed
 No static schema = agile

Drawbacks
 More mistake-prone
 No data integrity checks
 Database is app-specific
 Less flexibility for shared usage
 Data aggregation is harder
 Less suitable for reporting

Schema design
 Start from application-specific queries
 “What questions do I have?” vs “What answers do I have?”
 “Data like the application wants it”
 Base parent documents on:
 The most common usage
 What do I want returned?

Schema design
 Hybrid embed / link
 Changing the author name rarely happens
 First update author.name
 Then update the articles async
Article
- _id
- author
  - _id
  - name
  - email
- content
Author
- _id
- name
- email
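
The two-step rename can be sketched as follows. The in-memory arrays and the job queue (pendingJobs, runPendingJobs) are assumptions standing in for the real collections and for whatever background mechanism the app uses.

```javascript
// Hypothetical in-memory stand-ins for the Author and Article collections.
const authorCollection = [{ _id: "a1", name: "Ann", email: "ann@example.com" }];
const articleCollection = [
  { _id: "p1", content: "...", author: { _id: "a1", name: "Ann", email: "ann@example.com" } },
  { _id: "p2", content: "...", author: { _id: "a1", name: "Ann", email: "ann@example.com" } },
];

// Deferred work; a real app might use a background worker or message queue instead.
const pendingJobs = [];

function renameAuthor(authorId, newName) {
  const author = authorCollection.find((a) => a._id === authorId);
  if (!author) throw new Error(`unknown author ${authorId}`);

  // Step 1: update the authoritative Author document right away.
  author.name = newName;

  // Step 2: queue the refresh of the embedded copies in the articles.
  pendingJobs.push(() => {
    for (const article of articleCollection) {
      if (article.author._id === authorId) article.author.name = newName;
    }
  });
}

// Runs all queued jobs in order; called later, outside the request that renamed the author.
function runPendingJobs() {
  while (pendingJobs.length > 0) pendingJobs.shift()();
}
```

Until the queued job runs, articles still show the old name, which is the trade-off this hybrid accepts.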

Schema design
 Data duplication & denormalisation
 Pro
 simplicity
 optimisation (fewer I/O operations)
 query processing
 Con
 more disk usage
 data integrity
 Embedded docs
 Recommended < 250 kB

Single collection inheritance
Product
- _id
- price
Book
- author
- title
Album
- artist
- title
Jeans
- size
- color
 Stored in the Product collection:
Book document
- _type: Book
- _id
- price
- author
- title
Jeans document
- _type: Jeans
- _id
- price
- size
- color
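
A minimal sketch of how the _type field could drive (de)serialization of the hierarchy into one collection. The class definitions and the helpers toDocument and fromDocument are hypothetical; a real driver would normally handle this mapping.

```javascript
// Hypothetical classes mirroring the Product hierarchy.
class Product {
  constructor({ _id, price }) {
    this._id = _id;
    this.price = price;
  }
}
class Book extends Product {
  constructor({ author, title, ...base }) {
    super(base);
    this.author = author;
    this.title = title;
  }
}
class Jeans extends Product {
  constructor({ size, color, ...base }) {
    super(base);
    this.size = size;
    this.color = color;
  }
}

// Registry used to map the _type discriminator back to a class.
const productTypes = { Book, Jeans };

// Store the class name next to the data so all subtypes fit in one collection.
function toDocument(product) {
  return { _type: product.constructor.name, ...product };
}

// Pick the class from _type and rebuild the object from the remaining fields.
function fromDocument({ _type, ...fields }) {
  const Type = productTypes[_type];
  if (!Type) throw new Error(`unknown _type: ${_type}`);
  return new Type(fields);
}
```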

One-to-many
 Embedded array / array keys
 Some queries get harder
 You can index arrays!
 Normalized approach
 More flexibility
 Much lower performance
Article
- _id
- content
- tags: ["foo", "bar"]
- comments: ["id1", "id2"]
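
Indexing an array means every element gets its own index entry. The sketch below builds such an index by hand; it is an illustration of the idea under made-up sample data, not how the database implements it.

```javascript
// Hypothetical articles with an embedded array of tags.
const taggedArticles = [
  { _id: "p1", content: "...", tags: ["foo", "bar"] },
  { _id: "p2", content: "...", tags: ["bar"] },
  { _id: "p3", content: "..." }, // no tags field at all
];

// Every array element becomes an index entry pointing back to its document.
function buildTagIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const tag of doc.tags ?? []) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(doc._id);
    }
  }
  return index;
}
```

A lookup such as buildTagIndex(taggedArticles).get("bar") then finds all matching articles without scanning every document.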

Many-to-many
 Using array keys
 No join table
 References on both sides
 Advantage: simple queries
articles.Where(p => p.CategoryIds.Contains(categoryId))
categories.Where(c => c.ArticleIds.Contains(articleId))
 Disadvantage: duplication, update two docs
Article
- _id
- content
- category_ids: ["id1", "id2"]
Category
- _id
- name
- article_ids: ["id7", "id8"]
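
With references on both sides, linking means writing two documents, and nothing in the database keeps them in sync. A sketch under that assumption, with hypothetical sample documents and helper names:

```javascript
// Hypothetical documents with references on both sides.
const linkedArticle = { _id: "p1", content: "...", category_ids: [] };
const linkedCategory = { _id: "c1", name: "Databases", article_ids: [] };

// Linking touches two documents; the app must update both.
function linkArticleToCategory(article, category) {
  if (!article.category_ids.includes(category._id)) {
    article.category_ids.push(category._id);
  }
  if (!category.article_ids.includes(article._id)) {
    category.article_ids.push(article._id);
  }
}

// Each direction is a single, simple query.
const articlesIn = (articles, categoryId) =>
  articles.filter((a) => a.category_ids.includes(categoryId));
const categoriesOf = (categories, articleId) =>
  categories.filter((c) => c.article_ids.includes(articleId));
```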

Many-to-many
 References on one side
 Advantage: data in one place
 Disadvantage: 2 queries
// one query: articles in a category
articles.Where(p => p.CategoryIds.Contains(categoryId))
// two queries: categories of an article
var article = articles.Single(p => p.Id == articleId);
categories.Where(c => c.Id.In(article.CategoryIds))
Article
- _id
- content
- category_ids: ["id1", "id2"]
Category
- _id
- name
To sum up
 A new mindset
 Serialize complex .NET objects directly to the db
 Data duplication and denormalisation are key
 Big shift of responsibilities to the app
 No built-in data integrity checks
 Database has a single responsibility: storing data
 Quicker and easier to scale

MongoDB
 Why MongoDB?
 Largest user base, mature
 Platform independent
 Open source, free
Source: Google Trends

MongoDB: internals
 Durability
 By default through replication
 Single server durability: less performance
 Eventual consistency
 Configure fsync: sync between memory and disk
 by default every 60 sec.
 Configure replicate before return

MongoDB: internals
 Safe mode
 Turn off eventual consistency
 sync directly to the disk
 sufficiently replicate data, in replica sets
 Calls GetLastError to determine whether the action was successful
 Applies to actions without a return value
 On connection or action level

MongoDB: internals
 Replica sets
 Nodes that are copies of each other
 Set-up of master and slave nodes
 If the master goes down, the slave automatically takes over and promotes itself to master

MongoDB: internals
 Sharding
 Scale out
 Clusters of replica sets
 Connected to
 a central proxy used by clients
 config servers that contain meta-data
 Write to multiple nodes

MongoDB: internals
 Sharding
 Based on a shard key (= field)
 Commands are sent to the shard that includes the relevant range of the data
 Data is evenly distributed across the shards
 Automatic reallocation of data when adding or removing servers
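
Range-based routing on a shard key can be sketched like this. The chunk table, the shard names, and the function routeToShard are made up for illustration; in a real cluster this metadata lives on the config servers.

```javascript
// Hypothetical chunk table: each shard owns a half-open range [min, max) of the shard key.
const chunkTable = [
  { shard: "shard-a", min: "", max: "h" },
  { shard: "shard-b", min: "h", max: "q" },
  { shard: "shard-c", min: "q", max: "\uffff" },
];

// Send a command to the shard whose range contains the shard key value.
function routeToShard(shardKeyValue) {
  const chunk = chunkTable.find(
    (c) => shardKeyValue >= c.min && shardKeyValue < c.max
  );
  if (!chunk) throw new Error(`no shard covers key ${shardKeyValue}`);
  return chunk.shard;
}
```

Adding or removing a server would amount to splitting or merging ranges in the table and moving the affected documents.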

MongoDB: internals
 BSON
 Data storage and network transfer format
 Binary serialized JSON
 System collections
 db.system.namespaces
 db.system.indexes
 Geospatial indexing
 Find results closest to coordinate
 db.places.find({ loc: {$near: [50, 4], $maxDistance: 5} })
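
What such a query returns can be approximated with a brute-force scan. This sketch assumes flat (planar) coordinates and ignores the index entirely; findNear and the sample places are hypothetical.

```javascript
// Brute-force stand-in for a $near query: filter by distance, closest first.
function findNear(places, [x, y], maxDistance) {
  return places
    .map((place) => ({
      place,
      distance: Math.hypot(place.loc[0] - x, place.loc[1] - y),
    }))
    .filter((entry) => entry.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance)
    .map((entry) => entry.place);
}

// Hypothetical sample data in the same shape as the places collection.
const samplePlaces = [
  { name: "far", loc: [60, 4] },
  { name: "near", loc: [51, 4] },
  { name: "mid", loc: [50, 7] },
];
```

A geospatial index exists precisely to avoid this full scan on large collections.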

Links
 http://www.mongodb.org/display/DOCS/CSharp+Language+Center
 Quick-start
 Documentation
 LINQ
 Serialization
 http://mongly.com/
 Free eBook
 Interactive tutorial
