Distilled mongo db by Boris Trofimov
Transcript

  • 1. MongoDB distilled. Boris Trofimov, Team Lead @ Sigma Ukraine, @b0ris_1, btrofimoff@gmail.com
  • 2. Agenda ● Part 1. Why NoSQL – SQL benefits and criticism – NoSQL challenge ● Part 2. MongoDB – Overview – Console and query examples – Java integration – Data consistency – Scaling – Tips
  • 3. Part 1. Why NoSQL
  • 4. Relational DBMS Benefits
  • 5. SQL ● Simplicity ● Uniform representation ● Runtime schema modifications
      SELECT DISTINCT p.LastName, p.FirstName
      FROM Person.Person AS p
      JOIN HumanResources.Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
      WHERE 5000.00 IN (SELECT Bonus FROM Sales.SalesPerson AS sp
                        WHERE e.BusinessEntityID = sp.BusinessEntityID);
  • 6. Strong schema definition
  • 7. Strong consistency ● SQL features such as foreign and primary keys, unique fields ● ACID (atomicity, consistency, isolation, durability) transactions ● Business transactions ~ system transactions
  • 8. RDBMS Criticism
  • 9. Big gap between domain and relational model
  • 10. Performance issues ● JOIN minimization ● Query optimization ● Choosing the right transaction strategy ● Consistency costs too much ● Normalization impact
  • 11. Schema migration issues ● Consistency issues ● Reinventing the wheel ● Involving external tools like DBDeploy. Scaling options ● Consistency issues ● Poor scaling options
  • 12. SQL Opposition ● Object Databases by OMG ● ORM ● ?
  • 13. NoSQL: yes ● Transactionless in the usual sense ● Schemaless, no migration ● Closer to domain ● Focused on aggregates ● Truly scalable
  • 14. NoSQL Umbrella
  • 15. Key-Value Databases
  • 16. Column-Family Databases
  • 17. Document-oriented Databases
  • 18. Graph-oriented Databases
  • 19. Aggregate-oriented databases ● Document databases implement the idea of an aggregate-oriented database ● An aggregate is the atomic unit of storage ● Aggregate-oriented databases are closer to the application domain ● Operations on a single aggregate are atomic ● Aggregates can be replicated or sharded efficiently ● Major question: to embed or not to embed
  • 20. Relations vs Aggregates
  • 21. Relational model vs. document model
      // in customers
      { "id": 1, "name": "Medvedev", "billingAddress": [{"city": "Moscow"}] }
      // in orders
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          { "productId": 47, "price": 444.45, "productName": "iPhone 5" }
        ],
        "shippingAddress": [{"city": "Moscow"}],
        "orderPayment": [
          { "ccinfo": "1000-1000-1000-1000", "txnId": "abelif879rft", "billingAddress": {"city": "Moscow"} }
        ]
      }
  • 22. Part 2. MongoDB
  • 23. MongoDB Basics ● MongoDB is a document-oriented DBMS ● MongoDB is a client-server DBMS ● JSON/JavaScript is the main access language ● MongoDB = Collections + Indexes
  • 24. Collections ● Name ● Documents ● Indexes ● Two documents from the same collection might be completely different ● Simple creation (on first insert)
  • 25. Document ● Identifier (_id) ● Body in JSON (internally BSON)
      {
        "fullName" : "Fedor Buhankin",
        "course" : 5,
        "univercity" : "ONPU",
        "faculty" : "IKS",
        "_id" : { "$oid" : "5071c043cc93742e0d0e9cc7" },
        "homeAddress" : "Ukraine, Odessa 23/34",
        "averageAssessment" : 5,
        "subjects" : [ "math", "literature", "drawing", "psychology" ]
      }
      ● Major building blocks: scalar value, map and list ● Any part of the document can be indexed ● Max document size is 16 MB
  • 26. MongoDB Console
  • 27. Query Examples
  • 28. Simple Select (sample documents as in slide 21)
      SQL:     SELECT * FROM ORDERS;
      MongoDB: db.orders.find()
  • 29. Simple Condition (sample documents as in slide 21)
      SQL:     SELECT * FROM ORDERS WHERE customerId = 1;
      MongoDB: db.orders.find( {"customerId":1} )
  • 30. Simple Comparison (sample documents as in slide 21)
      SQL:     SELECT * FROM orders WHERE customerId > 1
      MongoDB: db.orders.find({ "customerId" : { $gt: 1 } } );
  • 31. AND Condition (sample documents as in slide 21)
      SQL:     SELECT * FROM orders WHERE customerId = 1 AND orderDate is not NULL
      MongoDB: db.orders.find( { customerId:1, orderDate : { $exists : true } } );
  • 32. OR Condition (sample documents as in slide 21)
      SQL:     SELECT * FROM orders WHERE customerId = 100 OR orderDate is not NULL
      MongoDB: db.orders.find( { $or:[ {customerId:100}, {orderDate : { $exists : true }} ] } );
  • 33. Select fields (sample documents as in slide 21)
      SQL:     SELECT orderId, orderDate FROM orders WHERE customerId = 1
      MongoDB: db.orders.find({customerId:1}, {orderId:1,orderDate:1})
  • 34. Inner select (sample documents as in slide 21)
      SQL:     SELECT * FROM Orders WHERE Orders.id IN (
                 SELECT id FROM orderItem WHERE productName LIKE '%iPhone%' )
      MongoDB: db.orders.find( {"orderItems.productName":/.*iPhone.*/} )
  • 35. NULL checks (sample documents as in slide 21)
      SQL:     SELECT * FROM orders WHERE orderDate is NULL
      MongoDB: db.orders.find( { orderDate : { $exists : false } } );
      Note: $exists : false matches only documents where the field is absent; { orderDate : null } also matches an explicit null value.
  • 36. More examples
      • db.orders.find().sort({ _id: 1 }).skip(20).limit(10)
      • db.orders.count({ "orderItems.price" : { $gt: 444 } })
      • db.orders.find( { orderItems: { "productId":47, "price": 444.45, "productName": "iPhone 5" } } );
      • db.orders.find()._addSpecial( "$comment" , "this is tagged query" )
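The query operators from slides 29 to 36 can be tried without a server. What follows is a toy in-memory matcher in plain JavaScript, a sketch only: `getValues`, `matches` and `find` are hypothetical helper names, the second order (id 100) and its `orderDate` are made-up sample data, and only scalar equality, regular expressions, `$gt`, `$exists`, `$or` and dotted paths into arrays are covered. It is not how MongoDB evaluates queries internally.

```javascript
// Toy matcher: NOT MongoDB's implementation, only the operators shown in the slides.

// Collect the values found at a dotted path, descending into arrays on the way.
function getValues(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      const candidates = Array.isArray(item) ? item : [item];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) next.push(c[key]);
      }
    }
    current = next;
  }
  return current;
}

// Check one field condition against the values found for that field.
function matchesCondition(values, cond) {
  if (cond instanceof RegExp) {
    return values.some(v => typeof v === "string" && cond.test(v));
  }
  const isOperatorDoc = cond !== null && typeof cond === "object" &&
    Object.keys(cond).some(k => k.startsWith("$"));
  if (!isOperatorDoc) return values.some(v => v === cond); // scalar equality only
  return Object.entries(cond).every(([op, arg]) => {
    if (op === "$gt") return values.some(v => v > arg);
    if (op === "$exists") return (values.length > 0) === arg;
    throw new Error("unsupported operator: " + op);
  });
}

// All top-level conditions must hold (implicit AND); $or needs one matching branch.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (field === "$or") return cond.some(sub => matches(doc, sub));
    return matchesCondition(getValues(doc, field), cond);
  });
}

// Order 99 follows slide 21; order 100 is invented for contrast.
const orders = [
  { id: 99, customerId: 1,
    orderItems: [{ productId: 47, price: 444.45, productName: "iPhone 5" }] },
  { id: 100, customerId: 2, orderDate: "2012-10-01",
    orderItems: [{ productId: 48, price: 10, productName: "Cable" }] },
];
const find = query => orders.filter(d => matches(d, query)).map(d => d.id);

console.log(find({ customerId: 1 }));                       // [ 99 ]
console.log(find({ customerId: { $gt: 1 } }));              // [ 100 ]
console.log(find({ orderDate: { $exists: false } }));       // [ 99 ]
console.log(find({ $or: [{ customerId: 100 }, { orderDate: { $exists: true } }] })); // [ 100 ]
console.log(find({ "orderItems.productName": /iPhone/ }));  // [ 99 ]
```

The dotted path `orderItems.productName` reaches into every element of the `orderItems` array, which is why slide 34 needs no subquery.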
  • 37. Queries between collections ● Remember: MongoDB has no JOINs ● Approach 1: perform multiple queries (lazy loading) ● Approach 2: use the MapReduce framework ● Approach 3: use the Aggregation Framework
  • 38. Map Reduce Framework ● Used to perform complex grouping over collection documents ● Can process multiple collections ● Uses the MapReduce pattern ● Uses JavaScript ● Supports sharded environments ● The result is similar to a materialized view
  • 39. Map Reduce Concept ● Launch map for every element: a1 → b1, a2 → b2, ..., an → bn ● Launch reduce over all b1 ... bn to produce c ● f_map : A → B, f_reduce : B[] → C
  • 40. How it works ● Input: collection X; implement the MAP and REDUCE functions ● MAP: execute the MAP function, marking each document with a specific color ● REDUCE: execute the REDUCE function, merging each colored set into a single element ● Output
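Approach 1 (multiple queries) amounts to an application-side join. A minimal sketch in plain JavaScript follows, with arrays standing in for the two collections: `findCustomer` and `ordersWithCustomer` are hypothetical helpers, and order 100 with its dangling `customerId` is invented to show what a missing reference looks like. Against a real server each lookup would be a separate query.

```javascript
// In-memory stand-ins for the customers and orders collections.
const customers = [{ id: 1, name: "Medvedev" }];
const orders = [{ id: 99, customerId: 1 }, { id: 100, customerId: 7 }];

// Hypothetical helper playing the role of a second query against customers.
function findCustomer(id) {
  return customers.find(c => c.id === id) || null;
}

// The first query fetches the orders; the second runs once per distinct
// customerId, so a repeated customer is only looked up once.
function ordersWithCustomer(orderList) {
  const cache = new Map();
  return orderList.map(order => {
    if (!cache.has(order.customerId)) {
      cache.set(order.customerId, findCustomer(order.customerId));
    }
    return { ...order, customer: cache.get(order.customerId) };
  });
}

const joined = ordersWithCustomer(orders);
console.log(joined);
```

Nothing enforces the reference, so the application has to decide what a `null` customer means; that is the price of having no JOINs and no foreign keys.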
  • 41. Count the orders for each customer
      db.customers_orders.remove({});
      mapUsers = function() {
        emit( this.customerId, {count: 1, customerId: this.customerId} );
      };
      reduce = function(key, values) {
        var result = {count: 0, customerId: key};
        values.forEach(function(value) { result.count += value.count; });
        return result;
      };
      db.orders.mapReduce(mapUsers, reduce, {"out": {"replace": "customers_orders"}});
      Output: [ {count:123, customerId:1}, {count:33, customerId:2} ]
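The map/reduce flow from slides 39 to 41 can be tried without a server. The sketch below is plain JavaScript under stated assumptions: `mapReduce` is a hypothetical local function (not the shell method), `emit` is passed in as an argument instead of being a global, and the three sample orders are made up. As in MongoDB, reduce is skipped for keys with a single value, which is why the emitted value and the reduced value must have the same shape.

```javascript
// Local stand-in for the flow: map emits (key, value) pairs, values are
// grouped by key, then reduce folds each group into a single value.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit); // `this` is the document, as in the slide
  return [...groups].map(([key, values]) => ({
    _id: key,
    // A key with one value is passed through without calling reduce.
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

const orders = [{ customerId: 1 }, { customerId: 1 }, { customerId: 2 }];

const result = mapReduce(
  orders,
  function (emit) {
    emit(this.customerId, { count: 1, customerId: this.customerId });
  },
  function (key, values) {
    const total = { count: 0, customerId: key };
    values.forEach(v => { total.count += v.count; });
    return total;
  }
);

console.log(result);
```

Each result entry pairs the key (`_id`) with the reduced `value`; the output list on the slide shows only the values.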
  • 42. Aggregation and Aggregation Framework ● Simplify the most common MapReduce operations, such as grouping by criteria ● The pipeline result is limited to 16 MB ● Support sharded environments (Aggregation Framework only)
  • 43. Indexes ● Any field can be indexed ● Indexes improve performance ● Implemented with B-trees
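For comparison, the same per-customer count can be written as an Aggregation Framework pipeline. The pipeline itself is plain data; in the shell it would be passed to `db.orders.aggregate(pipeline)`. To keep the sketch runnable without a server, `runGroup` is a hypothetical toy interpreter that understands only this one stage shape, and the sample orders are made up.

```javascript
// A single $group stage: group by customerId and count the documents per group.
const pipeline = [
  { $group: { _id: "$customerId", count: { $sum: 1 } } },
];

// Toy interpreter for exactly this shape: a field-path _id plus { $sum: <number> }.
function runGroup(docs, stage) {
  const field = stage.$group._id.slice(1); // "$customerId" -> "customerId"
  const increment = stage.$group.count.$sum;
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + increment);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const orders = [{ customerId: 1 }, { customerId: 1 }, { customerId: 2 }];
const grouped = runGroup(orders, pipeline[0]);
console.log(grouped);
```

The declarative stage replaces both the map and the reduce function, which is the simplification the slide refers to.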
  • 44. Access via API ● Use the official MongoDB Java Driver (just include mongo.jar)
      import com.mongodb.*;
      import java.util.*;
      Mongo m = new Mongo();
      // or
      Mongo m = new Mongo( "localhost" );
      // or
      Mongo m = new Mongo( "localhost" , 27017 );
      // or, to connect to a replica set, supply a seed list of members
      Mongo m = new Mongo(Arrays.asList(new ServerAddress("localhost", 27017),
                                        new ServerAddress("localhost", 27018),
                                        new ServerAddress("localhost", 27019)));
      DB db = m.getDB( "mydb" );
      DBCollection coll = db.getCollection("customers");
      // billingAddress is a list of embedded documents
      List<BasicDBObject> list = new ArrayList<BasicDBObject>();
      list.add(new BasicDBObject("city", "Odessa"));
      BasicDBObject doc = new BasicDBObject();
      doc.put("name", "Kaktus");
      doc.put("billingAddress", list);
      coll.insert(doc);
  • 45. Closer to Domain model ● Morphia http://code.google.com/p/morphia/ ● Spring Data for MongoDB http://www.springsource.org/spring-data/mongodb Major features: ● Type-safe POJO-centric model ● Annotation-based mapping behavior ● Good performance ● DAO templates ● Simple criteria
  • 46. Example with Morphia
      @Entity("Customers")
      class Customer {
        @Id ObjectId id; // auto-generated, if not set (see ObjectId)
        @Indexed String name; // value types are automatically persisted
        List<Address> billingAddress; // by default fields are @Embedded
        Key<Customer> bestFriend; // reference to external document
        @Reference List<Customer> partners = new ArrayList<Customer>(); // refs are stored and loaded automatically
        // ... getters and setters
        // Lifecycle methods -- Pre/PostLoad, Pre/PostPersist...
        @PostLoad void postLoad(DBObject dbObj) { ... }
      }
      Morphia morphia = new Morphia();
      morphia.map(Customer.class);
      Datastore ds = morphia.createDatastore(new Mongo(), "tempDB");
      Key<Customer> newCustomer = ds.save(new Customer("Kaktus",...));
      Customer customer = ds.find(Customer.class).field("name").equal("Medvedev").get();
  • 47. To embed or not to embed ● Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents. ● Embedded documents are good when you want the entire document and the document size is predictable. Embedded documents give the best read performance, since one query returns the whole aggregate.
  • 48. Schema migration ● Schemaless ● Main focus is how the application will behave when a new field has been added ● Incremental migration technique (version field) Use cases: – removing a field – renaming fields – refactoring an aggregate
  • 49. Data Consistency ● Transactional consistency – domain design should take aggregate atomicity into account ● Replication consistency – take the inconsistency window into account (sticky sessions) ● Eventual consistency ● Accept the CAP theorem – it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability and partition tolerance.
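The incremental migration technique can be sketched as a chain of upgrade steps applied lazily when a document is read. This is one possible reading of the slide, not the author's implementation: the field names `schemaVersion`, `fullName`, `name` and `fax` are illustrative assumptions, and `upgrade` is a hypothetical helper written in plain JavaScript.

```javascript
// One upgrade step per schema version; each takes a document at version N
// and returns a new document at version N + 1.
const upgrades = {
  1: doc => {                 // v1 -> v2: rename "fullName" to "name"
    const { fullName, ...rest } = doc;
    return { ...rest, name: fullName, schemaVersion: 2 };
  },
  2: doc => {                 // v2 -> v3: remove the obsolete "fax" field
    const { fax, ...rest } = doc;
    return { ...rest, schemaVersion: 3 };
  },
};
const CURRENT_VERSION = 3;

// Documents without a version field are treated as version 1.
function upgrade(doc) {
  let current = doc;
  while ((current.schemaVersion || 1) < CURRENT_VERSION) {
    current = upgrades[current.schemaVersion || 1](current);
  }
  return current;
}

const migrated = upgrade({ fullName: "Kaktus", fax: "n/a" });
console.log(migrated); // { name: 'Kaktus', schemaVersion: 3 }
```

Old and new documents can coexist in one collection; a document is brought up to date the next time the application loads it and can be saved back in its new shape.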
  • 50. Scaling
  • 51. Scaling options ● Autosharding ● Master-Slave replication ● Replica Set clusterization ● Sharding + Replica Set
  • 52. Sharding ● MongoDB supports autosharding ● Just specify the shard key and pattern ● Sharding increases write throughput ● The major way to scale the system
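How a shard key decides where a document lives can be illustrated with a toy range-based router. This is a simplified sketch, not the actual routing code: the chunk boundaries, the shard names and the helpers `shardFor` and `shardsForRange` are all made up for illustration.

```javascript
// Each chunk owns a half-open range [min, max) of shard-key values.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// A write or an exact-match query on the shard key goes to exactly one shard.
function shardFor(shardKeyValue) {
  const chunk = chunks.find(c => shardKeyValue >= c.min && shardKeyValue < c.max);
  return chunk.shard;
}

// A range query [low, high) touches only the shards whose chunks overlap it.
function shardsForRange(low, high) {
  const hit = chunks.filter(c => low < c.max && high > c.min).map(c => c.shard);
  return [...new Set(hit)];
}

console.log(shardFor(42));              // shard0
console.log(shardFor(1000));            // shard1
console.log(shardsForRange(900, 1200)); // [ 'shard0', 'shard1' ]
```

Writes spread across shards only if the shard key values spread across chunks, so the choice of key matters more than the number of shards.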
  • 53. Master-Slave replication ● One master, many slaves ● Slaves might be hidden or can be used for reads ● Master-Slave replication increases read capacity and provides reliability
  • 54. Replica Set clusterization ● The replica set automatically elects a primary (master) ● The primary replicates its state to all replicas ● Limitation: at most 12 nodes ● WriteConcern option ● Benefits: – Failover and reliability – Distributing read load – Maintenance without downtime
  • 55. Sharding + ReplicaSet ● Makes it possible to build a huge, scalable database with failover
  • 56. MongoDB Criticism ● Data loss reports on write-heavy configurations ● No atomic operations over multiple documents. When not to use ● Heavy cross-document atomic operations ● Queries against varying aggregate structure
  • 57. Tips ● Do not use autoincrement ids ● Short field names are preferred ● By default DAO methods are async ● Think twice about collection design ● Use atomic modifications for a document
  • 58. Out of scope ● MapReduce options ● Indexes ● Capped collections
  • 59. Further reading ● http://www.mongodb.org ● Martin Fowler, NoSQL Distilled ● Kyle Banker, MongoDB in Action
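The last tip refers to update modifiers such as `$set` and `$inc`, which change a document in place instead of replacing it. The sketch below only mimics their effect on top-level fields in plain JavaScript: `applyUpdate` is a hypothetical helper and `orderCount` is an invented field. On a real server the modifier document would be passed to an update call, and the server applies it atomically to each document.

```javascript
// Toy version of two update modifiers, limited to top-level fields.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;                         // $set overwrites or creates
  }
  for (const [field, delta] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + delta;  // $inc treats a missing field as 0
  }
  return result;
}

// In the shell the same modifier document would be sent as
// db.customers.update({ id: 1 }, update).
const update = { $set: { name: "Medvedev D." }, $inc: { orderCount: 1 } };
const updated = applyUpdate({ id: 1, name: "Medvedev", orderCount: 2 }, update);
console.log(updated); // { id: 1, name: 'Medvedev D.', orderCount: 3 }
```

Sending the modifier rather than a whole rewritten document avoids the read-modify-write race between two clients changing the same document.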
  • 60. Thank you!
