MongoDB Distilled

Prepared for Odessa Java User Group (October 2012)

  1. MongoDB Distilled. Boris Trofimov, Team Lead @ Sigma Ukraine, @b0ris_1, btrofimoff@gmail.com
  2. Agenda
     ● Part 1. Why NoSQL
       – SQL benefits and criticism
       – NoSQL challenge
     ● Part 2. MongoDB
       – Overview
       – Console and query examples
       – Java integration
       – Data consistency
       – Scaling
       – Tips
  3. Part 1. Why NoSQL
  4. Relational DBMS Benefits
  5. SQL
     ● Simplicity
     ● Uniform representation
     ● Runtime schema modifications
         SELECT DISTINCT p.LastName, p.FirstName
         FROM Person.Person AS p
         JOIN HumanResources.Employee AS e
           ON e.BusinessEntityID = p.BusinessEntityID
         WHERE 5000.00 IN
           (SELECT Bonus
            FROM Sales.SalesPerson AS sp
            WHERE e.BusinessEntityID = sp.BusinessEntityID);
  6. Strong schema definition
  7. Strong consistency
     ● SQL features like foreign and primary keys, unique fields
     ● ACID (atomicity, consistency, isolation, durability) transactions
     ● Business transactions ~ system transactions
  8. RDBMS Criticism
  9. Big gap between the domain model and the relational model
  10. Performance issues
      ● JOIN minimization
      ● Choosing the right transaction strategy
      ● Query optimization
      ● Normalization impact
      ● Consistency costs too much
  11. Schema migration issues
      ● Consistency issues
      ● Reinventing the wheel
      ● Involving external tools like DBDeploy
      Scaling options
      ● Consistency issues
      ● Poor scaling options
  12. SQL Opposition
      ● Object Databases by OMG
      ● ORM
      ● ?
  13. No SQL Yes
      ● No transactions in the usual sense
      ● Schemaless, no migration
      ● Closer to the domain
      ● Focused on aggregates
      ● Truly scalable
  14. NoSQL Umbrella
  15. Key-Value Databases
  16. Column-Family Databases
  17. Document-oriented Databases
  18. Graph-oriented Databases
  19. Aggregate-oriented Databases
      ● Document databases implement the idea of an aggregate-oriented database.
      ● An aggregate is the atomic unit of storage.
      ● Aggregate-oriented databases are closer to the application domain.
      ● Operations on a single aggregate are atomic.
      ● An aggregate can be replicated or sharded efficiently.
      ● Major question: to embed or not to embed
  20. Relations vs Aggregates
  21. Relational Model vs. Document Model. The document model:
        // in customers
        {
          "id": 1,
          "name": "Medvedev",
          "billingAddress": [{"city": "Moscow"}]
        }
        // in orders
        {
          "id": 99,
          "customerId": 1,
          "orderItems": [
            {"productId": 47, "price": 444.45, "productName": "iPhone 5"}
          ],
          "shippingAddress": [{"city": "Moscow"}],
          "orderPayment": [
            {"ccinfo": "1000-1000-1000-1000", "txnId": "abelif879rft",
             "billingAddress": {"city": "Moscow"}}
          ]
        }
  22. Part 2. MongoDB
  23. MongoDB Basics
      ● MongoDB is a document-oriented DBMS
      ● MongoDB is a client-server DBMS
      ● MongoDB = Collections + Indexes
      ● JSON/JavaScript is the main language for accessing it
  24. Collections
      ● Easy to create (a collection is created on the first insert).
      ● Two documents from the same collection might be completely different.
      ● A collection consists of a name, documents, and indexes.
  25. Document
        {
          "fullName" : "Fedor Buhankin",
          "course" : 5,
          "univercity" : "ONPU",
          "faculty" : "IKS",
          "_id" : { "$oid" : "5071c043cc93742e0d0e9cc7" },
          "homeAddress" : "Ukraine, Odessa 23/34",
          "averageAssessment" : 5,
          "subjects" : ["math", "literature", "drawing", "psychology"]
        }
      Parts: identifier (_id) and body in JSON (internally BSON)
      ● Any part of the document can be indexed
      ● Max document size is 16 MB
      ● Main building blocks: scalar value, map, and list
  26. MongoDB Console
  27. Query Examples
  28. Simple Select (all query examples use the customers and orders documents from slide 21)
        SQL:     SELECT * FROM orders;
        MongoDB: db.orders.find()
  29. Simple Condition
        SQL:     SELECT * FROM orders WHERE customerId = 1;
        MongoDB: db.orders.find( { "customerId" : 1 } )
  30. Simple Comparison
        SQL:     SELECT * FROM orders WHERE customerId > 1
        MongoDB: db.orders.find( { "customerId" : { $gt : 1 } } );
  31. AND Condition
        SQL:     SELECT * FROM orders WHERE customerId = 1 AND orderDate IS NOT NULL
        MongoDB: db.orders.find( { customerId : 1, orderDate : { $exists : true } } );
  32. OR Condition
        SQL:     SELECT * FROM orders WHERE customerId = 100 OR orderDate IS NOT NULL
        MongoDB: db.orders.find( { $or : [ { customerId : 100 }, { orderDate : { $exists : true } } ] } );
  33. Select fields
        SQL:     SELECT orderId, orderDate FROM orders WHERE customerId = 1
        MongoDB: db.orders.find( { customerId : 1 }, { orderId : 1, orderDate : 1 } )
  34. Inner select
        SQL:     SELECT * FROM Orders WHERE Orders.id IN
                   (SELECT id FROM orderItem WHERE productName LIKE '%iPhone%')
        MongoDB: db.orders.find( { "orderItems.productName" : /.*iPhone.*/ } )
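The dotted path "orderItems.productName" in the inner-select example reaches into every element of the embedded orderItems array. The following is a minimal in-memory sketch of that matching rule in plain JavaScript. It is not MongoDB's implementation, and `valuesAtPath` and `matchesRegex` are hypothetical helper names introduced only for illustration.

```javascript
// Collect every value reachable through a dotted path, descending into arrays.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      // An array on the path contributes each of its elements.
      const candidates = Array.isArray(node) ? node : [node];
      for (const candidate of candidates) {
        if (candidate !== null && typeof candidate === "object" && key in candidate) {
          next.push(candidate[key]);
        }
      }
    }
    current = next;
  }
  // Flatten one level so a final array value is tested element by element.
  return current.flat();
}

// True if any string found at the path matches the regular expression.
function matchesRegex(doc, path, regex) {
  return valuesAtPath(doc, path).some(
    (value) => typeof value === "string" && regex.test(value)
  );
}

// Sample data modeled on the orders document from the slides (second order is made up).
const orders = [
  { id: 99, orderItems: [{ productId: 47, productName: "iPhone 5" }] },
  { id: 100, orderItems: [{ productId: 48, productName: "Galaxy S3" }] },
];

const hits = orders.filter((o) => matchesRegex(o, "orderItems.productName", /iPhone/));
console.log(hits.map((o) => o.id)); // [ 99 ]
```

Note that /iPhone/ and /.*iPhone.*/ select the same documents, since an unanchored pattern already matches anywhere in the string.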
  35. NULL checks
        SQL:     SELECT * FROM orders WHERE orderDate IS NULL
        MongoDB: db.orders.find( { orderDate : { $exists : false } } );
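The mapping between SQL NULL and `$exists` is only approximate, because a document field can be missing or can be present with a null value. The sketch below models the two documented filter semantics in plain JavaScript: `{ orderDate : { $exists : false } }` selects only documents without the field, while `{ orderDate : null }` selects documents where the field is missing or null. The helper names and sample data are assumptions for illustration.

```javascript
// Three cases: a real value, an explicit null, and a missing field.
const orders = [
  { id: 1, orderDate: "2012-10-01" },
  { id: 2, orderDate: null },
  { id: 3 },
];

// Models { field: { $exists: false } }: the field is absent.
const existsFalse = (doc, field) => !(field in doc);

// Models { field: null }: the field is absent or holds null.
const equalsNull = (doc, field) => !(field in doc) || doc[field] === null;

const missingIds = orders.filter((o) => existsFalse(o, "orderDate")).map((o) => o.id);
const nullIds = orders.filter((o) => equalsNull(o, "orderDate")).map((o) => o.id);

console.log(missingIds); // [ 3 ]
console.log(nullIds);    // [ 2, 3 ]
```

If the application stores explicit nulls, the second form is the closer analogue of IS NULL.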
  36. More examples
      • db.orders.find().sort( { orderDate : 1 } ).skip(20).limit(10)
      • db.orders.count( { "orderItems.price" : { $gt : 444 } } )
      • db.orders.find( { orderItems : { "productId" : 47, "price" : 444.45, "productName" : "iPhone 5" } } );
      • db.orders.find()._addSpecial( "$comment" , "this is tagged query" )
  37. Queries between collections
      ● Remember, MongoDB = no JOINs
      ● Approach 1: perform multiple queries (lazy loading)
      ● Approach 2: use the MapReduce framework
      ● Approach 3: use the Aggregation Framework
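The first approach, multiple queries with lazy loading, can be sketched as follows. In-memory arrays stand in for the customers and orders collections, and `findCustomer`, `findOrdersFor`, and `loadCustomer` are hypothetical names. With a real driver each of the two lookup functions would be a separate find() call.

```javascript
// In-memory stand-ins for two collections.
const customers = [{ id: 1, name: "Medvedev" }];
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
];

// Counts simulated round trips so the lazy behaviour is observable.
let queryCount = 0;

function findCustomer(id) {
  queryCount++;
  return customers.find((c) => c.id === id) || null;
}

function findOrdersFor(customerId) {
  queryCount++;
  return orders.filter((o) => o.customerId === customerId);
}

// Loads the customer now and defers the orders query until first access.
function loadCustomer(id) {
  const customer = findCustomer(id);
  if (customer === null) return null;
  let cachedOrders = null;
  return {
    ...customer,
    getOrders() {
      if (cachedOrders === null) cachedOrders = findOrdersFor(customer.id);
      return cachedOrders;
    },
  };
}

const medvedev = loadCustomer(1);
console.log(queryCount);                  // 1
console.log(medvedev.getOrders().length); // 2
console.log(queryCount);                  // 2
```

The join happens in application code, so the two reads are not atomic with respect to each other.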
  38. Map Reduce Framework
      ● Is used to perform complex grouping over the documents of a collection
      ● Can operate over multiple collections
      ● Uses the MapReduce pattern
      ● Uses the JavaScript language
      ● Supports sharded environments
      ● The result is similar to a materialized view
  39. Map Reduce Concept
      Launch map for every element: a1, a2, ..., an are mapped to b1, b2, ..., bn. Then launch reduce over the mapped values to produce c.
        f_map : A → B
        f_reduce : B[] → C
  40. How it works
      ● Implement the MAP function
      ● Implement the REDUCE function
      ● Execute the MAP function: mark each document with a specific color
      ● Execute the REDUCE function: merge each colored set into a single element
      Flow: Input → MAP → REDUCE → Output (Collection X)
  41. Count the orders for each customer
        db.customers_orders.remove({});
        // Emit one counter per order, keyed by the customer it belongs to.
        mapUsers = function() {
          emit(this.customerId, {count: 1, customerId: this.customerId});
        };
        // Sum the counters emitted for one customer.
        reduce = function(key, values) {
          var result = {count: 0, customerId: key};
          values.forEach(function(value) {
            result.count += value.count;
          });
          return result;
        };
        // Run over orders, because customerId is a field of order documents.
        db.orders.mapReduce(mapUsers, reduce, {out: {replace: "customers_orders"}});
      Output values: [ {count:123, customerId:1}, {count:33, customerId:2} ]
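To make the emit, group, and reduce steps concrete without a running server, here is a small in-memory sketch of the same order count. It is an illustration of the pattern only, not how MongoDB executes mapReduce: the `mapReduce` helper below is hypothetical and passes `emit` as an argument, whereas the shell provides it as a global. Like MongoDB, the sketch skips the reduce call for keys that received a single value.

```javascript
// Minimal in-memory map-reduce: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in the shell, the current document is available as `this` inside map.
  for (const doc of docs) map.call(doc, emit);
  return Array.from(groups, ([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Emit one counter per order, keyed by customer.
function mapOrders(emit) {
  emit(this.customerId, { count: 1, customerId: this.customerId });
}

// Sum the counters of one customer.
function reduceCounts(key, values) {
  const result = { count: 0, customerId: key };
  values.forEach((value) => {
    result.count += value.count;
  });
  return result;
}

const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
];

const counts = mapReduce(orders, mapOrders, reduceCounts);
console.log(JSON.stringify(counts));
```

Because reduce may be skipped or applied repeatedly, its return value has the same shape as the values emitted by map.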
  42. Aggregation and the Aggregation Framework
      ● Simplify the most common MapReduce operations, such as grouping by criteria
      ● Restriction: the pipeline result size is limited to 16 MB
      ● Support sharded environments (Aggregation Framework only)
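As a rough comparison with the MapReduce version, the per-customer order count reduces to a single `$group` stage. The sketch below shows that stage as a plain object (in the mongo shell it would be passed to `db.orders.aggregate(pipeline)`) next to an in-memory JavaScript equivalent. The `countBy` helper and the sample data are assumptions for illustration, and the pipeline itself is not executed here.

```javascript
// Pipeline in MongoDB syntax, shown for comparison only; it is not executed here.
const pipeline = [{ $group: { _id: "$customerId", count: { $sum: 1 } } }];

// In-memory equivalent of the $group stage above: count documents per field value.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return Array.from(counts, ([key, count]) => ({ _id: key, count }));
}

const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
];

// "$customerId" -> "customerId"
const groupField = pipeline[0].$group._id.slice(1);
const grouped = countBy(orders, groupField);
console.log(JSON.stringify(grouped));
```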
  43. Indexes
      ● Anything might be indexed
      ● Indexes improve performance
      ● Implementation uses B-trees
  44. Access via API: use the official MongoDB Java driver (just include mongo.jar)
        import com.mongodb.*;
        import java.util.*;

        Mongo m = new Mongo();
        // or
        Mongo m = new Mongo("localhost");
        // or
        Mongo m = new Mongo("localhost", 27017);
        // or, to connect to a replica set, supply a seed list of members
        Mongo m = new Mongo(Arrays.asList(
            new ServerAddress("localhost", 27017),
            new ServerAddress("localhost", 27018),
            new ServerAddress("localhost", 27019)));

        DB db = m.getDB("mydb");
        DBCollection coll = db.getCollection("customers");

        // Build the embedded billingAddress list, then the customer document.
        List<BasicDBObject> list = new ArrayList<BasicDBObject>();
        list.add(new BasicDBObject("city", "Odessa"));
        BasicDBObject doc = new BasicDBObject();
        doc.put("name", "Kaktus");
        doc.put("billingAddress", list);
        coll.insert(doc);
  45. Closer to Domain model
      ● Morphia http://code.google.com/p/morphia/
      ● Spring Data for MongoDB http://www.springsource.org/spring-data/mongodb
      Major features:
      ● Type-safe, POJO-centric model
      ● Annotation-based mapping behavior
      ● Good performance
      ● DAO templates
      ● Simple criteria
  46. Example with Morphia
        @Entity("Customers")
        class Customer {
          @Id ObjectId id;                 // auto-generated, if not set (see ObjectId)
          @Indexed String name;            // value types are automatically persisted
          List<Address> billingAddress;    // by default fields are @Embedded
          Key<Customer> bestFriend;        // reference to external document
          @Reference List<Customer> partners = new ArrayList<Customer>(); // refs are stored and loaded automatically
          // ... getters and setters
          // Lifecycle methods -- Pre/PostLoad, Pre/PostPersist...
          @PostLoad void postLoad(DBObject dbObj) { ... }
        }

        Morphia morphia = new Morphia();
        morphia.map(Customer.class);
        Datastore ds = morphia.createDatastore(new Mongo(), "tempDB");
        Key<Customer> newCustomer = ds.save(new Customer("Kaktus",...));
        Customer customer = ds.find(Customer.class).field("name").equal("Medvedev").get();
  47. To embed or not to embed
      ● Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents.
      ● Embedded documents are good when you want the entire document and the size of the document is predictable. Embedded documents are read in a single query, which gives the best read performance.
  48. Schema migration
      ● Schemaless
      ● The main concern is how the application behaves when a new field has been added
      ● Incremental migration technique (version field)
      Use cases:
      – removing a field
      – renaming fields
      – refactoring an aggregate
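The incremental migration technique with a version field can be sketched as below: each document carries a version number and is upgraded step by step when the application reads it, so old and new shapes can coexist in one collection. The field names, the migration steps, and the `upgrade` helper are hypothetical and chosen to mirror the renaming and removing use cases.

```javascript
const CURRENT_VERSION = 2;

// One migration step per source version; each step returns a new document.
const migrations = {
  // v0 -> v1: rename `name` to `fullName`
  0: (doc) => {
    const { name, ...rest } = doc;
    return { ...rest, fullName: name, version: 1 };
  },
  // v1 -> v2: remove the obsolete `fax` field
  1: (doc) => {
    const { fax, ...rest } = doc;
    return { ...rest, version: 2 };
  },
};

// Documents without a version field are treated as version 0.
function upgrade(doc) {
  let current = { ...doc, version: doc.version || 0 };
  while (current.version < CURRENT_VERSION) {
    current = migrations[current.version](current);
  }
  return current;
}

const legacy = { name: "Kaktus", fax: "123" };
const upgraded = upgrade(legacy);
console.log(upgraded.fullName, upgraded.version); // Kaktus 2
```

A typical design choice is to write the upgraded document back on the next save, so the collection converges to the current version without a bulk migration.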
  49. Data Consistency
      ● Transactional consistency
        – domain design should take aggregate atomicity into account
      ● Replication consistency
        – take the inconsistency window into account (sticky sessions)
      ● Eventual consistency
      ● Accept the CAP theorem
        – it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability, and partition tolerance.
  50. Scaling
  51. Scaling options
      ● Autosharding
      ● Master-Slave replication
      ● Replica Set clustering
      ● Sharding + Replica Set
  52. Sharding
      ● MongoDB supports autosharding
      ● Just specify the shard key and pattern
      ● Sharding increases write throughput
      ● The main way to scale the system
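How a shard key decides where a document lives can be illustrated with a simplified range-based routing sketch: the key space is split into contiguous ranges, and each range is assigned to a shard. The chunk boundaries, the shard names, and the `shardFor` helper are hypothetical. In MongoDB the ranges are created and rebalanced automatically.

```javascript
// Each chunk covers the half-open key range [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

// Route a document to the shard whose chunk contains its shard key value.
function shardFor(doc, shardKey) {
  const value = doc[shardKey];
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  if (!chunk) throw new Error("no chunk covers shard key value " + value);
  return chunk.shard;
}

console.log(shardFor({ id: 99, customerId: 1 }, "customerId"));    // shard0
console.log(shardFor({ id: 100, customerId: 150 }, "customerId")); // shard1
```

Writes for different key ranges land on different servers, which is why sharding raises write throughput. A key that always grows, such as an autoincrement id, would send every new document to the last range.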
  53. Master-Slave replication
      ● One master, many slaves
      ● Slaves might be hidden or can be used for reads
      ● Master-Slave replication increases read throughput and provides reliability
  54. Replica Set clustering
      ● The replica set automatically elects a primary (master)
      ● The primary shares the same state with all replicas
      ● Limit: 12 nodes
      ● WriteConcern option
      ● Benefits:
        – Failover and reliability
        – Distributing read load
        – Maintenance without downtime
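The WriteConcern option controls how many replica set members have to confirm a write before it is reported as successful. The sketch below models only that counting rule, under the simplifying assumption that every member can acknowledge writes. The function names are hypothetical, and `w` is either a number of required acknowledgements or the string "majority".

```javascript
// Smallest number of members that forms a majority of the set.
function majority(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// True once enough members have acknowledged the write for the given w.
function isAcknowledged(acks, w, memberCount) {
  const required = w === "majority" ? majority(memberCount) : w;
  return acks >= required;
}

console.log(majority(3));                       // 2
console.log(isAcknowledged(1, "majority", 3));  // false
console.log(isAcknowledged(2, "majority", 3));  // true
```

A higher `w` narrows the window in which an acknowledged write exists on the primary only, at the cost of higher write latency.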
  55. Sharding + Replica Set
      ● Makes it possible to build a huge, scalable database with failover
  56. MongoDB Criticism
      ● Data loss reports on heavy-write configurations
      ● No atomic operations over multiple documents
      When not to use
      ● Heavy cross-document atomic operations
      ● Queries against varying aggregate structure
  57. Tips
      ● Do not use autoincrement ids
      ● Short names are preferred
      ● By default DAO methods are async
      ● Think twice about collection design
      ● Use atomic modifications for a document
  58. Out of scope
      ● MapReduce options
      ● Indexes
      ● Capped collections
  59. Further reading
      ● http://www.mongodb.org
      ● Kyle Banker, MongoDB in Action
      ● Martin Fowler, NoSQL Distilled
  60. Thank you!
