Distilled mongo db by Boris Trofimov
Transcript of "Distilled mongo db by Boris Trofimov"

  1. MongoDB distilled. Boris Trofimov, Team Lead @ Sigma Ukraine, @b0ris_1, btrofimoff@gmail.com
  2. Agenda
     ● Part 1. Why NoSQL
       – SQL benefits and criticism
       – The NoSQL challenge
     ● Part 2. MongoDB
       – Overview
       – Console and query examples
       – Java integration
       – Data consistency
       – Scaling
       – Tips
  3. Part 1. Why NoSQL
  4. Relational DBMS Benefits
  5. SQL
     ● Simplicity
     ● Uniform representation
     ● Runtime schema modifications

         SELECT DISTINCT p.LastName, p.FirstName
         FROM Person.Person AS p
         JOIN HumanResources.Employee AS e
           ON e.BusinessEntityID = p.BusinessEntityID
         WHERE 5000.00 IN (SELECT Bonus
                           FROM Sales.SalesPerson AS sp
                           WHERE e.BusinessEntityID = sp.BusinessEntityID);
  6. Strong schema definition
  7. Strong consistency
     ● SQL features such as foreign and primary keys and unique fields
     ● ACID (atomicity, consistency, isolation, durability) transactions
     ● Business transactions ~ system transactions
  8. RDBMS Criticism
  9. Big gap between domain and relational model
  10. Performance issues
      ● JOIN minimization
      ● Query optimization
      ● Choosing the right transaction strategy
      ● Consistency costs too much
      ● Normalization impact
  11. Schema migration issues
      ● Consistency issues
      ● Reinventing the wheel
      ● Involving external tools like DBDeploy
      Scaling options
      ● Consistency issues
      ● Poor scaling options
  12. SQL Opposition
      ● Object databases by OMG
      ● ORM
      ● ?
  13. No SQL? Yes
      ● No transactions in the usual sense
      ● Schemaless, no migration
      ● Closer to the domain
      ● Focused on aggregates
      ● Truly scalable
  14. NoSQL Umbrella
  15. Key-Value Databases
  16. Column-Family Databases
  17. Document-oriented Databases
  18. Graph-oriented Databases
  19. Aggregate-oriented Databases
      ● Document databases implement the idea of an aggregate-oriented database.
      ● An aggregate is the atomic unit of storage.
      ● Aggregate-oriented databases are closer to the application domain.
      ● Operations on a single aggregate are atomic.
      ● An aggregate can be replicated or sharded efficiently.
      ● Major question: to embed or not to embed
  20. Relations vs Aggregates
  21. Relational model vs. document model

          // in customers
          {
            "id": 1,
            "name": "Medvedev",
            "billingAddress": [{"city": "Moscow"}]
          }

          // in orders
          {
            "id": 99,
            "customerId": 1,
            "orderItems": [
              {"productId": 47, "price": 444.45, "productName": "iPhone 5"}
            ],
            "shippingAddress": [{"city": "Moscow"}],
            "orderPayment": [
              {
                "ccinfo": "1000-1000-1000-1000",
                "txnId": "abelif879rft",
                "billingAddress": {"city": "Moscow"}
              }
            ]
          }
  22. Part 2. MongoDB
  23. MongoDB Basics
      ● MongoDB is a document-oriented DBMS
      ● MongoDB is a client-server DBMS
      ● JSON/JavaScript is the main language for accessing it
      ● MongoDB = collections + indexes
  24. Collections
      ● Name
      ● Documents
      ● Indexes
      Two documents from the same collection might be completely different.
      A collection is created automatically on the first insert.
  25. Document
      ● Identifier (_id)
      ● Body is JSON (internally BSON)

          {
            "fullName": "Fedor Buhankin",
            "course": 5,
            "university": "ONPU",
            "faculty": "IKS",
            "_id": {"$oid": "5071c043cc93742e0d0e9cc7"},
            "homeAddress": "Ukraine, Odessa 23/34",
            "averageAssessment": 5,
            "subjects": ["math", "literature", "drawing", "psychology"]
          }

      ● Major building blocks: scalar value, map and list
      ● Any part of the document can be indexed
      ● Max document size is 16 MB
  26. MongoDB Console
  27. Query Examples
  28. Simple Select (sample data on slides 28 to 35: the customers and orders documents from slide 21)

          SELECT * FROM ORDERS;

          db.orders.find()

  29. Simple Condition

          SELECT * FROM ORDERS WHERE customerId = 1;

          db.orders.find( {"customerId": 1} )

  30. Simple Comparison

          SELECT * FROM orders WHERE customerId > 1

          db.orders.find( { "customerId": { $gt: 1 } } );

  31. AND Condition

          SELECT * FROM orders
          WHERE customerId = 1 AND orderDate IS NOT NULL

          db.orders.find( { customerId: 1, orderDate: { $exists: true } } );
  32. OR Condition

          SELECT * FROM orders
          WHERE customerId = 100 OR orderDate IS NULL

          db.orders.find( { $or: [ {customerId: 100}, {orderDate: { $exists: false }} ] } );
  33. Select fields

          SELECT orderId, orderDate FROM orders WHERE customerId = 1

          db.orders.find( {customerId: 1}, {orderId: 1, orderDate: 1} )

  34. Inner select

          SELECT * FROM Orders
          WHERE Orders.id IN (
            SELECT id FROM orderItem
            WHERE productName LIKE '%iPhone%'
          )

          db.orders.find( {"orderItems.productName": /.*iPhone.*/} )

  35. NULL checks

          SELECT * FROM orders WHERE orderDate IS NULL

          db.orders.find( { orderDate: { $exists: false } } );
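To make the operator semantics in the query slides concrete, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB code: `getPath`, `matches` and `find` are hypothetical helpers that imitate only equality, `$gt`, `$exists`, `$or`, regular expressions and dotted paths into arrays, and the second order (including its `orderDate`) is made-up sample data.

```javascript
// Hypothetical helper: collect the values found at a dotted path,
// descending into arrays the way "orderItems.productName" does.
function getPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  return current;
}

// Hypothetical matcher covering only the operators used on the slides.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (field === "$or") {
      return condition.some((sub) => matches(doc, sub));
    }
    const values = getPath(doc, field);
    if (condition instanceof RegExp) {
      return values.some((v) => typeof v === "string" && condition.test(v));
    }
    if (condition !== null && typeof condition === "object") {
      return Object.entries(condition).every(([op, arg]) => {
        if (op === "$gt") return values.some((v) => v > arg);
        if (op === "$exists") return (values.length > 0) === arg;
        throw new Error("operator not covered by this sketch: " + op);
      });
    }
    return values.some((v) => v === condition);
  });
}

// Sample data: the first order follows slide 21, the second is invented.
const orders = [
  { id: 99, customerId: 1, orderItems: [{ productId: 47, price: 444.45, productName: "iPhone 5" }] },
  { id: 100, customerId: 2, orderDate: "2012-10-07", orderItems: [] },
];

const find = (query) => orders.filter((doc) => matches(doc, query));

const gtOne = find({ customerId: { $gt: 1 } });
const noDate = find({ orderDate: { $exists: false } });
const either = find({ $or: [{ customerId: 100 }, { orderDate: { $exists: false } }] });
const iphones = find({ "orderItems.productName": /.*iPhone.*/ });
console.log(gtOne.length, noDate.length, either.length, iphones.length);
```

Note that `$exists` only tests whether the field is present, so in a real database a field stored with an explicit null value would still count as existing.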
  36. More examples
      • db.orders.find().sort({ id: 1 }).skip(20).limit(10)
      • db.orders.count({ "orderItems.price": { $gt: 444 } })
      • db.orders.find( { orderItems: { "productId": 47, "price": 444.45, "productName": "iPhone 5" } } );
      • db.orders.find()._addSpecial( "$comment", "this is tagged query" )
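The sort, skip and limit combination in the first example is ordinary pagination. A rough in-memory equivalent, assuming documents are ordered by an `id` field (`page` is a hypothetical helper and the 45 orders are generated sample data):

```javascript
// Hypothetical helper imitating sort by id, then skip(n), then limit(m) on an array.
function page(docs, skip, limit) {
  return [...docs]                    // copy so the input is not reordered
    .sort((a, b) => a.id - b.id)      // ascending by id
    .slice(skip, skip + limit);       // skip first, then take `limit` documents
}

// 45 generated orders with ids 45 down to 1.
const allOrders = Array.from({ length: 45 }, (_, i) => ({ id: 45 - i }));

const thirdPage = page(allOrders, 20, 10);
console.log(thirdPage.map((o) => o.id));
```

With a page size of 10, skipping 20 documents yields the third page.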
  37. Queries between collections
      ● Remember, MongoDB = no JOINs
      ● Approach 1: perform multiple queries (lazy loading)
      ● Approach 2: use the MapReduce framework
      ● Approach 3: use the Aggregation Framework
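The first approach, multiple queries, amounts to doing the join in application code. A minimal sketch with in-memory arrays standing in for the two collections; `findCustomer`, `findOrdersOf` and `customerWithOrders` are hypothetical names, and each stand-in represents one separate query to the database:

```javascript
const customers = [
  { id: 1, name: "Medvedev" },
  { id: 2, name: "Kaktus" },
];
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
];

// Hypothetical stand-ins for two separate queries, one per collection.
const findCustomer = (id) => customers.find((c) => c.id === id);
const findOrdersOf = (customerId) => orders.filter((o) => o.customerId === customerId);

// "Join" in the application: run the first query, then a second one driven by its result.
function customerWithOrders(id) {
  const customer = findCustomer(id);
  if (!customer) return null;
  return { ...customer, orders: findOrdersOf(customer.id) };
}

const joined = customerWithOrders(1);
console.log(joined.name, joined.orders.map((o) => o.id));
```

The second query can be deferred until the orders are actually needed, which is the lazy-loading variant mentioned on the slide.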
  38. Map Reduce Framework
      ● Used to perform complex grouping over the documents of a collection
      ● Can work across multiple collections
      ● Uses the MapReduce pattern
      ● Uses JavaScript
      ● Supports sharded environments
      ● The result is similar to a materialized view
  39. Map Reduce Concept
      [Diagram: map is launched for every element, turning a1 ... an into b1 ... bn; reduce is then launched to merge b1 ... bn into a single result c.]
      f_map : A → B
      f_reduce : B[] → C
  40. How it works
      ● Input: collection X
      ● Implement the MAP function
      ● Implement the REDUCE function
      ● MAP: execute the MAP function, marking each document with a specific color
      ● REDUCE: execute the REDUCE function, merging each colored set into a single element
      ● Output
  41. Take the number of orders for each customer

          db.customers_orders.remove({});

          // MAP: emit one counter per order, keyed by customer
          mapUsers = function() {
            emit(this.customerId, {count: 1, customerId: this.customerId});
          };

          // REDUCE: sum the counters emitted for one customer
          reduce = function(key, values) {
            var result = {count: 0, customerId: key};
            values.forEach(function(value) {
              result.count += value.count;
            });
            return result;
          };

          // run over orders, since customerId is a field of the order documents
          db.orders.mapReduce(mapUsers, reduce, {"out": {"replace": "customers_orders"}});

      Output (values only): [ {count: 123, customerId: 1}, {count: 33, customerId: 2} ]
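The map and reduce flow can be tried without a database. The sketch below is an in-memory imitation, not the MongoDB implementation: `mapReduce` is a hypothetical helper, `emit` is passed to the map function as a parameter (in the mongo shell it is a global), and the three orders are sample data.

```javascript
// In-memory imitation of the map/reduce flow; `mapReduce` is a hypothetical helper.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  // MAP phase: each document emits (key, value) pairs, grouped by key.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit); // `this` is the current document, as in the shell
  }
  // REDUCE phase: merge all values emitted under the same key.
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduce(key, values) });
  }
  return results;
}

const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
];

const mapOrders = function (emit) {
  emit(this.customerId, { count: 1, customerId: this.customerId });
};

const reduceCounts = function (key, values) {
  const result = { count: 0, customerId: key };
  values.forEach((value) => { result.count += value.count; });
  return result;
};

const counts = mapReduce(orders, mapOrders, reduceCounts);
console.log(counts);
```

One simplification to keep in mind: this sketch calls reduce once per key with all values, whereas a real engine may call it several times on partial results, which is why the reduce function returns a value of the same shape that map emits.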
  42. Aggregation and Aggregation Framework
      ● Simplifies the most common MapReduce operations, such as grouping by criteria
      ● Pipeline output is limited to 16 MB
      ● Supports sharded environments (Aggregation Framework only)
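As an illustration of a group-by-criteria operation, counting orders per customer can be written as a single `$group` stage. The sketch below only builds that stage as data and evaluates it with a hypothetical in-memory interpreter, `runGroupCount`, that understands nothing beyond `$group` with `{ $sum: 1 }`; the orders are sample data.

```javascript
// The stage as it would be handed to the Aggregation Framework (data only, not executed by MongoDB here).
const pipeline = [{ $group: { _id: "$customerId", count: { $sum: 1 } } }];

// Hypothetical in-memory interpreter covering only $group with { $sum: 1 }.
function runGroupCount(docs, stage) {
  const keyField = stage.$group._id.slice(1); // "$customerId" -> "customerId"
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
];

const grouped = runGroupCount(orders, pipeline[0]);
console.log(grouped);
```

Compared with the MapReduce version of the same count, the grouping is declared rather than programmed, which is the simplification the slide refers to.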
  43. Indexes
      ● Any field can be indexed
      ● Indexes improve query performance
      ● The implementation uses B-trees
  44. Access via API
      Use the official MongoDB Java driver (just include mongo.jar).

          import com.mongodb.*;
          import java.util.*;

          Mongo m = new Mongo();
          // or
          Mongo m = new Mongo("localhost");
          // or
          Mongo m = new Mongo("localhost", 27017);
          // or, to connect to a replica set, supply a seed list of members
          Mongo m = new Mongo(Arrays.asList(
              new ServerAddress("localhost", 27017),
              new ServerAddress("localhost", 27018),
              new ServerAddress("localhost", 27019)));

          DB db = m.getDB("mydb");
          DBCollection coll = db.getCollection("customers");

          // embedded list of addresses
          List<BasicDBObject> list = new ArrayList<BasicDBObject>();
          list.add(new BasicDBObject("city", "Odessa"));

          BasicDBObject doc = new BasicDBObject();
          doc.put("name", "Kaktus");
          doc.put("billingAddress", list);
          coll.insert(doc);
  45. Closer to Domain model
      ● Morphia http://code.google.com/p/morphia/
      ● Spring Data for MongoDB http://www.springsource.org/spring-data/mongodb
      Major features:
      ● Type-safe, POJO-centric model
      ● Annotation-based mapping
      ● Good performance
      ● DAO templates
      ● Simple criteria
  46. Example with Morphia

          @Entity("Customers")
          class Customer {
            @Id ObjectId id;              // auto-generated if not set (see ObjectId)
            @Indexed String name;         // value types are automatically persisted
            List<Address> billingAddress; // by default fields are @Embedded
            Key<Customer> bestFriend;     // reference to an external document
            @Reference List<Customer> partners = new ArrayList<Customer>(); // refs are stored and loaded automatically
            // ... getters and setters

            // Lifecycle methods -- Pre/PostLoad, Pre/PostPersist...
            @PostLoad void postLoad(DBObject dbObj) { ... }
          }

          Morphia morphia = new Morphia();
          morphia.map(Customer.class);
          Datastore ds = morphia.createDatastore(new Mongo(), "tempDB");
          Key<Customer> newCustomer = ds.save(new Customer("Kaktus", ...));
          Customer customer = ds.find(Customer.class).field("name").equal("Medvedev").get();
  47. To embed or not to embed
      ● Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents.
      ● Embedded documents are good when you want the entire document and the document size is predictable. Embedding gives the best read performance, because the whole aggregate comes back in one query.
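The trade-off can be seen by writing the same customer both ways. This is a plain JavaScript sketch with in-memory data; the field name `billingAddressIds`, the `addresses` array and the `resolveAddresses` helper are assumptions made for illustration only.

```javascript
// Embedded: the address lives inside the customer aggregate, so one read returns everything.
const embeddedCustomer = {
  id: 1,
  name: "Medvedev",
  billingAddress: [{ city: "Moscow" }],
};

// Referenced: addresses are kept in their own collection and linked by id.
const referencedCustomer = { id: 1, name: "Medvedev", billingAddressIds: [10] };
const addresses = [{ id: 10, city: "Moscow" }];

// Hypothetical helper: resolving references costs an extra lookup per customer.
function resolveAddresses(customer, addressCollection) {
  return customer.billingAddressIds.map((id) =>
    addressCollection.find((a) => a.id === id)
  );
}

const citiesEmbedded = embeddedCustomer.billingAddress.map((a) => a.city);
const citiesReferenced = resolveAddresses(referencedCustomer, addresses).map((a) => a.city);
console.log(citiesEmbedded, citiesReferenced);
```

Both shapes answer the same question, but the referenced form lets an address be queried and updated on its own, at the price of the second lookup.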
  48. Schema migration
      ● Schemaless
      ● The main question is how the application behaves once a new field has been added
      ● Incremental migration technique (version field)
      Use cases:
      – removing a field
      – renaming fields
      – refactoring an aggregate
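One way to read the incremental migration technique is that each document carries a version field and is upgraded lazily when the application reads it. The sketch below assumes that reading; the `version` field name, the `upgrades` table, and the `fullName` to `name` rename and `fax` removal are all invented for illustration.

```javascript
const CURRENT_VERSION = 2;

// Hypothetical upgrade steps, one per schema version.
const upgrades = {
  // v0 -> v1: rename "fullName" to "name"
  0: (doc) => {
    const { fullName, ...rest } = doc;
    return { ...rest, name: fullName, version: 1 };
  },
  // v1 -> v2: remove the obsolete "fax" field
  1: (doc) => {
    const { fax, ...rest } = doc;
    return { ...rest, version: 2 };
  },
};

// Upgrade a document step by step until it reaches the current version.
function upgrade(doc) {
  let current = { version: 0, ...doc }; // documents without a version field count as v0
  while (current.version < CURRENT_VERSION) {
    current = upgrades[current.version](current);
  }
  return current;
}

const oldDoc = { _id: 1, fullName: "Fedor Buhankin", fax: "n/a" };
const migrated = upgrade(oldDoc);
console.log(migrated);
```

Documents that are already current pass through unchanged, so old and new documents can coexist in one collection while the migration proceeds.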
  49. Data Consistency
      ● Transactional consistency
        – domain design should take aggregate atomicity into account
      ● Replication consistency
        – take the inconsistency window into account (sticky sessions)
      ● Eventual consistency
      ● Accept the CAP theorem
        – it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability and partition tolerance.
  50. Scaling
  51. Scaling options
      ● Autosharding
      ● Master-slave replication
      ● Replica set clustering
      ● Sharding + replica set
  52. Sharding
      ● MongoDB supports autosharding
      ● Just specify the shard key and pattern
      ● Sharding increases write throughput
      ● The main way to scale the system
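The role of the shard key can be pictured as a routing table from key ranges to shards. The following is a simplified sketch, not MongoDB's implementation: the chunk boundaries, the shard names and the `routeByShardKey` helper are made up, and chunk splitting and balancing are left out.

```javascript
// Hypothetical chunk table: each shard owns a range of shard-key values [min, max).
const chunks = [
  { shard: "shard0", min: -Infinity, max: 100 },
  { shard: "shard1", min: 100, max: 200 },
  { shard: "shard2", min: 200, max: Infinity },
];

// Route a document to the shard whose range contains its shard-key value.
function routeByShardKey(doc, shardKey) {
  const value = doc[shardKey];
  return chunks.find((c) => value >= c.min && value < c.max).shard;
}

const writes = [{ customerId: 5 }, { customerId: 150 }, { customerId: 999 }];
const targets = writes.map((doc) => routeByShardKey(doc, "customerId"));
console.log(targets);
```

Because the three writes land on three different shards, they can proceed in parallel, which is where the extra write throughput comes from.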
  53. Master-Slave replication
      ● One master, many slaves
      ● Slaves might be hidden or can be used for reads
      ● Master-slave replication increases read throughput and provides reliability
  54. Replica Set clustering
      ● The replica set automatically elects a primary (master)
      ● The primary replicates its state to all replicas
      ● Limit: 12 nodes per replica set
      ● WriteConcern option
      ● Benefits:
        – Failover and reliability
        – Distributing read load
        – Maintenance without downtime
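The WriteConcern option controls how many replica set members must acknowledge a write before it is reported as successful. A toy model of that rule, assuming a numeric `w` or the value "majority"; `isAcknowledged` is a hypothetical helper, not a driver call:

```javascript
// Hypothetical model: a write succeeds once enough members have acknowledged it.
function isAcknowledged(acks, w, memberCount) {
  // "majority" means more than half of the members; a number means exactly that many.
  const required = w === "majority" ? Math.floor(memberCount / 2) + 1 : w;
  return acks >= required;
}

const twoOfThree = isAcknowledged(2, "majority", 3);
const oneOfThree = isAcknowledged(1, "majority", 3);
const primaryOnly = isAcknowledged(1, 1, 3);
console.log(twoOfThree, oneOfThree, primaryOnly);
```

A higher requirement makes an acknowledged write more likely to survive a failover, at the cost of waiting for replication.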
  55. Sharding + ReplicaSet
      ● Makes it possible to build a huge, scalable database with failover
  56. MongoDB Criticism
      ● Data loss reports on heavy-write configurations
      ● No atomic operations over multiple documents
      When not to use
      ● Heavy cross-document atomic operations
      ● Queries against varying aggregate structure
  57. Tips
      ● Do not use auto-increment ids
      ● Short field names are preferred
      ● By default DAO methods are asynchronous
      ● Think twice about collection design
      ● Use atomic modifications for a document
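The tip about short names follows from the fact that field names are stored inside every document. A rough way to see the effect, using the JSON text length as a stand-in for the stored size (BSON sizes differ, and both sample documents are invented):

```javascript
// The same data with long and with short field names.
const verbose = { customerIdentifier: 1, shippingAddressCity: "Moscow" };
const compact = { cid: 1, city: "Moscow" };

// JSON length is only a proxy for the stored size; BSON encodes values differently.
const sizeOf = (doc) => JSON.stringify(doc).length;

const savedPerDocument = sizeOf(verbose) - sizeOf(compact);
console.log(sizeOf(verbose), sizeOf(compact), savedPerDocument);
```

The saving is paid back on every document in the collection, so it matters most for collections with many small documents.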
  58. Out of scope
      ● MapReduce options
      ● Indexes
      ● Capped collections
  59. Further reading
      ● http://www.mongodb.org
      ● Martin Fowler, NoSQL Distilled
      ● Kyle Banker, MongoDB in Action
  60. Thank you!