DMDW Extra Lesson - NoSQL and MongoDB
Presentation Transcript

  • STUDY
    AND TAKE OFF.
    Author: Dipl.-Inf. (FH) Johannes Hoppe
    Date: 06.05.2011
  • NoSQL and MongoDB
  • 01
    Not only SQL
  • Trends
    Data
    Facebook had 60k servers in 2010
    Google had 450k servers in 2006 (speculated)
    Microsoft: between 100k and 500k servers (since Azure)
    Amazon: likely has similar numbers, too (S3)
    Facebook Server Footprint
  • Trends
    Trend 1: increasing data sizes
    Trend 2: more connectedness (“web 2.0”)
    Trend 3: more individualization (less structure)
  • NoSQL
    Database paradigms
    Relational (RDBMS)
    NoSQL
    Key-Value stores
    Document databases
    Wide column stores (BigTable and clones)
    Graph databases
    Other
  • NoSQL
    Some NoSQL use cases
    1. Massive data volumes
    Massively distributed architecture required to store the data
    Google, Amazon, Yahoo, Facebook…
    2. Extreme query workload
    Impossible to efficiently do joins at that scale with an RDBMS
    3. Schema evolution
    Schema flexibility (migration) is not trivial at large scale
    Schema changes can be gradually introduced with NoSQL
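The gradual schema change from use case 3 might look like the following in plain JavaScript. This is only a sketch: the fields `name`, `firstName` and `lastName`, the sample values, and the helper `displayName` are hypothetical and not taken from the slides.

```javascript
// Hypothetical documents: older ones store a single `name`,
// newer ones store `firstName` and `lastName`.
var customers = [
  { _id: 1, name: "Norbert Meier" },
  { _id: 2, firstName: "Hans", lastName: "Schmidt" }
];

// Readers accept both shapes, so old documents need no up-front migration.
function displayName(doc) {
  if (doc.firstName !== undefined && doc.lastName !== undefined) {
    return doc.firstName + " " + doc.lastName;
  }
  return doc.name;
}

var names = customers.map(displayName);
console.log(names);
```

Old documents can then be rewritten to the new shape lazily, for example whenever they are next saved.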
  • NoSQL - CAP theorem
    Requirements for distributed systems:
    Consistency
    Availability
    Partition tolerance
  • NoSQL - CAP theorem
    Consistency
    The system is in a consistent state after an operation
    All clients see the same data
    Strong consistency (ACID) vs. eventual consistency (BASE)
    ACID: Atomicity, Consistency, Isolation and Durability
    BASE: Basically Available, Soft state, Eventually consistent
  • NoSQL - CAP theorem
    Availability
    The system is “always on”, no downtime
    Node failure tolerance: all clients can find some available replica
    Software/hardware upgrade tolerance
  • NoSQL - CAP theorem
    Partition tolerance
    The system continues to function even when split into disconnected subsets (by a network disruption)
    This applies not only to reads, but to writes as well!
  • NoSQL
    CAP Theorem
    E. Brewer, N. Lynch
    You can satisfy at most 2 out of the 3 requirements
  • NoSQL
    CAP Theorem → CA
    Single site clusters (easier to ensure all nodes are always in contact)
    When a partition occurs, the system blocks
    e.g. usable for two-phase commits (2PC) which already require/use blocks
    Obviously, any horizontal scaling strategy is based on data partitioning; therefore, designers are forced to decide between consistency and availability.
  • NoSQL
    CAP Theorem → CP
    Some data may be inaccessible (availability sacrificed), but the rest is still consistent/accurate
    e.g. sharded database
  • NoSQL
    CAP Theorem → AP
    System is still available under partitioning, but some of the data returned may be inaccurate
    Need some conflict resolution strategy
    e.g. Master/Slave replication
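One simple conflict resolution strategy for an AP system is "last write wins". The sketch below is only an illustration under stated assumptions: the `updatedAt` timestamp field, the sample values, and the function `lastWriteWins` are hypothetical, and real systems often use more careful schemes.

```javascript
// Two replicas accepted different writes for the same key during a partition.
// Each version carries a timestamp (hypothetical field `updatedAt`).
function lastWriteWins(a, b) {
  // Keep the version with the newer timestamp; ties favor the first argument.
  return a.updatedAt >= b.updatedAt ? a : b;
}

var replicaA = { key: "customer_22", name: "Norbert", updatedAt: 1000 };
var replicaB = { key: "customer_22", name: "Norbert M.", updatedAt: 1005 };

var merged = lastWriteWins(replicaA, replicaB);
console.log(merged.name);
```

Note the trade-off: the losing write is silently discarded, which is exactly the kind of inaccuracy an AP system accepts.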
  • NoSQL
    RDBMS
    Guarantee ACID by CA (two-phase commits)
    SQL
    Mature:
  • NoSQL
    NoSQL DBMS
    No relational tables
    No fixed table schemas
    No joins
    No risk, no fun!
    CP and AP
    (and sometimes even AP on top of CP → MongoDB*)
    * This is damn cool!
  • NoSQL
    Key-value
    One key → one value, very fast
    Key: Hash (no duplicates)
    Value: binary object ("BLOB")
    (DB does not understand your content)
    Players: Amazon Dynamo, Memcached…
  • NoSQL
    key: customer_22
    value: ?=PQ)“§VN? =§(Q$U%V§W=(BN W§(=BU&W§$()= W§$(=%
    GIVE ME A MEANING!
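The key-value idea can be sketched in a few lines of Node.js. This is a toy model, not how Dynamo or Memcached are implemented: a `Map` stands in for the store, and the helpers `put` and `get` as well as the field names in the payload are assumptions made for illustration.

```javascript
// A Map stands in for the store; values are opaque bytes.
var store = new Map();

function put(key, bytes) { store.set(key, bytes); }
function get(key) { return store.get(key); }

// The application serializes the value; the store never looks inside.
var payload = Buffer.from(JSON.stringify({ name: "Norbert", invoiced: 2222 }));
put("customer_22", payload);

// Only lookup by key is possible. Giving the bytes a meaning is the client's job.
var customer = JSON.parse(get("customer_22").toString());
console.log(customer.name);
```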
  • NoSQL
    Document databases
    Key-value store, too
    Value is "understood" by the DB
    Querying the data is possible (not just retrieving the key's content)
    Players: Amazon SimpleDB, CouchDB, MongoDB …
  • NoSQL
    key: customer_22
    value:
    {
      Type: "Customer",
      Name: "Norbert",
      Invoiced: 2222
    }
  • NoSQL
    key: customer_22
    value / documents:
    {
      Type: "Customer",
      Name: "Norbert",
      Invoiced: 2222,
      Messages: [
        { Title: "Hello",
          Text: "World" },
        { Title: "Second",
          Text: "message" }
      ]
    }
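Because a document database understands the value, it can filter on fields inside it, including fields of nested documents. The following plain JavaScript sketch imitates that over an in-memory object. The second customer (`customer_23`) and the helper `findByMessageTitle` are hypothetical additions for illustration.

```javascript
var customers = {
  customer_22: {
    Type: "Customer", Name: "Norbert", Invoiced: 2222,
    Messages: [
      { Title: "Hello", Text: "World" },
      { Title: "Second", Text: "message" }
    ]
  },
  // Hypothetical second document so that the filter has something to exclude.
  customer_23: { Type: "Customer", Name: "Hans", Invoiced: 50, Messages: [] }
};

// Return the keys of all documents that contain a message with the given title.
function findByMessageTitle(docs, title) {
  return Object.keys(docs).filter(function (key) {
    return docs[key].Messages.some(function (m) { return m.Title === title; });
  });
}

var hits = findByMessageTitle(customers, "Hello");
console.log(hits);
```

With a pure key-value store the same question would require fetching and decoding every value on the client.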
  • NoSQL
    (Wide) column stores
    Often referred to as "BigTable clones"
    Each key is associated with many attributes (columns)
    NoSQL column stores are actually hybrid row/column stores
    Different from "pure" relational column stores!
    Players: Google BigTable, Cassandra (Facebook), HBase…
  • NoSQL
    Won't be stored as:    It will be stored as:
    22;Norbert;22222       22;23;24
    23;Hans;50000          Norbert;Hans;Franz
    24;Franz;44000         22222;50000;44000
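The difference between the two layouts above is essentially a transpose. The sketch below shows this in plain JavaScript with the three records from the slide. The helper `toColumns` is a hypothetical name, and the code illustrates only the purely columnar picture, not the hybrid layout real wide column stores use.

```javascript
// Row layout: one array per record (id, name, amount).
var rows = [
  [22, "Norbert", 22222],
  [23, "Hans", 50000],
  [24, "Franz", 44000]
];

// Column layout: one array per attribute instead of one array per record.
function toColumns(rows) {
  return rows[0].map(function (_, col) {
    return rows.map(function (row) { return row[col]; });
  });
}

var columns = toColumns(rows);
var serialized = columns.map(function (c) { return c.join(";"); });
console.log(serialized);

// An aggregate over one attribute only has to touch a single column.
var total = columns[2].reduce(function (sum, v) { return sum + v; }, 0);
console.log(total);
```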
  • NoSQL
    Graph databases
    Multi-relational graphs
    SPARQL query language (W3C Recommendation!)
    Players: Neo4j, InfoGrid …
    (note: graph DBs are special and somehow the "black sheep" in the NoSQL world; the following PROs/CONs don't apply very well)
  • NoSQL
    PROs (& Promises)
    Schema-free / semi-structured data
    Massive data stores
    Scaling is easy
    Very, very high availability
    Often simpler to implement
    (and OR Mappers aren't required)
    "Web 2.0 ready"
  • NoSQL
    CONs
    NoSQL implementations are often "alpha", no standards
    Weaker data consistency, no transactions
    Insufficient access control
    SQL: strong for dynamic, cross-table queries (JOIN)
    Relationships aren't enforced
    (conventions over constraints, except for graph DBs of course)
    Premature optimization: Scalability
    (Don't build for scalability if you never need it!)
  • 02
    MongoDB
  • NoSQL
    Let's rock!
    MongoDB Quick Reference Cards
    http://www.10gen.com/reference
  • Basic Deployment
    Create the default data directory in c:\data\db
    Start mongod.exe
    Optionally: mongod.exe --dbpath c:\data\db --port 27017 --logpath c:\data\mongodb.log
    Start the shell: mongo.exe
  • Data Import
    cd c:\dba-training-data\data
    mongoimport -d twitter -c tweets twitter.json
    cd c:\dba-training-data\data\dump\training
    mongorestore -d training -c scores scores.bson
    cd c:\dba-training-data\data\dump
    mongorestore -d digg digg
  • MongoDB Documents
    (in the shell)
    use digg
    db.stories.findOne();
  • JSON → BSON
    All JSON documents are stored in a binary format called BSON. BSON supports a richer set of types than JSON.
    http://bsonspec.org
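A small plain JavaScript experiment shows why a richer type set matters. JSON has no date type, so a `Date` turns into a string on a JSON round trip, whereas BSON defines a dedicated UTC datetime type. The variable names below are chosen for illustration only.

```javascript
var original = { created: new Date(0), count: 1 };

// Round trip through JSON text.
var copy = JSON.parse(JSON.stringify(original));

console.log(original.created instanceof Date); // true
console.log(typeof copy.created);              // "string"
```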
  • CRUD – Create
    (in the shell)
    db.people.save({name: 'Smith', age: 30});
    See how the save command works:
    db.foo.save
  • CRUD – Create
    How training.scores was created:
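    for (var i = 0; i < 1000; i++) {
      ['quiz', 'essay', 'exam'].forEach(function(name) {
        // Random integer score between 50 and 99
        var score = Math.floor(Math.random() * 50) + 50;
        db.scores.save({student: i, name: name, score: score});
      });
    }
    // 1000 students x 3 tests = 3000 documents, if the collection was empty before
    db.scores.count();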
  • CRUD – Read
    Queries are specified using a document-style syntax!
    use training
    db.scores.find({score: 50});
    db.scores.find({score: {"$gte": 70}});
    find() returns a cursor!
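To get a feeling for the document-style query syntax, here is a tiny matcher in plain JavaScript that runs without a database. It is a sketch, not MongoDB's implementation: the helper `matches` is hypothetical and supports only exact matches plus the `$gte` and `$lt` operators, and the sample scores are made up.

```javascript
// Hypothetical helper: supports exact match plus $gte and $lt only.
function matches(doc, query) {
  return Object.keys(query).every(function (field) {
    var cond = query[field];
    if (cond !== null && typeof cond === "object") {
      // Operator document such as {$gte: 70}: every operator must hold.
      return Object.keys(cond).every(function (op) {
        if (op === "$gte") return doc[field] >= cond[op];
        if (op === "$lt") return doc[field] < cond[op];
        throw new Error("unsupported operator: " + op);
      });
    }
    // Plain value: exact match.
    return doc[field] === cond;
  });
}

var scores = [
  { student: 0, name: "quiz", score: 50 },
  { student: 0, name: "essay", score: 83 },
  { student: 1, name: "exam", score: 70 }
];

var exact = scores.filter(function (d) { return matches(d, { score: 50 }); });
var atLeast70 = scores.filter(function (d) {
  return matches(d, { score: { $gte: 70 } });
});
console.log(exact.length, atLeast70.length);
```

The point of the design is that a query is itself a document, so it can be built, stored and passed around like any other data.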
  • Exercises
    Find all scores less than 65.
    Find the lowest quiz score. Find the highest quiz score.
    Write a query to find all digg stories where the view count is greater than 1000.
    Query for all digg stories whose media type is either 'news' or 'images' and where the topic name is 'Comedy'. (For extra practice, construct two queries using different sets of operators to do this.)
    Find all digg stories where the topic name is 'Television' or the media type is 'videos'. Skip the first 5 results, and limit the result set to 10.
  • CRUD – Update
    use digg;
    db.people.update({name: 'Smith'}, {'$set': {interests: []}});
    db.people.update({name: 'Smith'}, {'$push': {interests: 'chess'}});
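The effect of the `$set` and `$push` update operators can be imitated on an in-memory object. The helper `applyUpdate` below is hypothetical and covers only these two operators. It is meant as a mental model, not as MongoDB's actual update logic.

```javascript
// Hypothetical helper: applies $set and $push to one in-memory document.
function applyUpdate(doc, update) {
  Object.keys(update.$set || {}).forEach(function (field) {
    doc[field] = update.$set[field];
  });
  Object.keys(update.$push || {}).forEach(function (field) {
    // Create the array if the field does not exist yet.
    if (doc[field] === undefined) doc[field] = [];
    // The pushed value is appended as ONE element, whatever its type.
    doc[field].push(update.$push[field]);
  });
  return doc;
}

var smith = { name: "Smith", age: 30 };
applyUpdate(smith, { $set: { interests: [] } });
applyUpdate(smith, { $push: { interests: "chess" } });
console.log(JSON.stringify(smith));

// Pushing an array appends the whole array as a single nested element.
var nested = applyUpdate({ interests: [] }, { $push: { interests: ["chess"] } });
console.log(JSON.stringify(nested));
```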
  • Exercises
    Set the proper 'grade' attribute for all scores. For example, users with scores greater than 90 get an 'A'. Set the grade to 'B' for scores falling between 80 and 90.
    You're being nice, so you decide to add 10 points to every score on every “final” exam whose score is lower than 60. How do you do this update?
  • CRUD – Delete
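    db.dropDatabase();  // drops the current database
    db.foo.drop();      // drops the collection foo
    db.foo.remove();    // removes all documents from foo, but keeps the collection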
  • “Map Reduce is the Uzi of aggregation tools. Everything described with count, distinct and group can be done with MapReduce, and more.”
    Kristina Chodorow, Michael Dirolf in MongoDB – The Definitive Guide
  • MapReduce
    To use map-reduce, you first write a map function.
    map = function() {
      // Emit one value per story: the story's diggs and a post count of 1
      emit(this.user.name, {diggs: this.diggs, posts: 1});
    }
  • MapReduce
    The reduce function then aggregates those docs by key.
    reduce = function(key, values) {
      var diggs = 0;
      var posts = 0;
      values.forEach(function(doc) {
        diggs += doc.diggs;
        // Sum the counts so that re-reducing partial results stays correct
        posts += doc.posts;
      });
      return {diggs: diggs, posts: posts};
    }
  • MapReduce
    Now both are used to perform custom aggregation.
    db.stories.mapReduce(map, reduce, {out: 'digg_users'});
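What the server does with the two functions can be imitated in plain JavaScript without a database. The sketch below is a simplified, single-pass model. The `mapReduce` helper, the sample stories, and passing `emit` as a parameter (in the shell it is a global) are assumptions for illustration. This sketch emits `posts: 1` per story so that a user with a single story still counts one post, because reduce is skipped for keys with only one value.

```javascript
var stories = [
  { user: { name: "alice" }, diggs: 10 },
  { user: { name: "bob" }, diggs: 3 },
  { user: { name: "alice" }, diggs: 5 }
];

// map runs once per document with `this` bound to that document.
function map(emit) {
  emit(this.user.name, { diggs: this.diggs, posts: 1 });
}

// reduce returns a value of the same shape as the emitted values.
function reduce(key, values) {
  var out = { diggs: 0, posts: 0 };
  values.forEach(function (v) {
    out.diggs += v.diggs;
    out.posts += v.posts;
  });
  return out;
}

// Simplified driver: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  var groups = {};
  docs.forEach(function (doc) {
    map.call(doc, function emit(key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).forEach(function (key) {
    var values = groups[key];
    // A key with a single value is passed through without calling reduce.
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  });
  return result;
}

var diggUsers = mapReduce(stories, map, reduce);
console.log(diggUsers);
```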
  • THANK YOU
    FOR YOUR ATTENTION