Webinar: Dramatically Reducing Development Time With MongoDB

Modern-day application development demands persistence of complex and dynamic shapes of data to match the highly flexible and powerful languages used in today's software landscape. Traditional approaches to solution development with an RDBMS increasingly expose the gap between the ease of use of modern development languages and the relational data model. Development time is wasted as the bulk of the work shifts from adding business features to struggling with the RDBMS. MongoDB, the leading NoSQL database, offers a flexible and scalable alternative.

In this webinar, we will provide a medium-to-deep exploration of the MongoDB programming model and APIs and how they transform the way developers interact with a database, leading to:

  • Faster time to market for both initial deployment and subsequent change
  • Lower development costs
  • More choices in coupling features of a language to the database

We will also review the advantages of MongoDB in the rapid application development (RAD) space for popular scripting languages such as JavaScript, Python, Perl, and Ruby.


Speaker Notes
  • Hello, this is Buzz Moschetti; welcome to the webinar “Dramatically Reducing Development Time With MongoDB.” Today I’m going to highlight the progressive and powerful programming model in mongoDB and how it not only reduces time-to-market but also increases flexibility and capability. The content applies to any industry, but if you have questions about specific use cases in financial services, please feel free to reach out to me at buzz.moschetti@mongodb.com. Some quick logistics: the presentation audio and slides will be recorded and made available to you in about 24 hours. We have an hour set up, but I’ll use about 40 minutes of that for the presentation, leaving some time for questions. You can use the WebEx Q&A box to ask questions at any time; if a significant number of similar questions show up in the middle, I will answer them then, and otherwise I’ll try to answer as many as possible at the end. If you have technical issues, please send a WebEx message to the participant identified as “mongoDB webinar team”; otherwise, keep your questions focused on the content.
  • The way we describe concepts like trades, products, scenarios, workflows, locations, and all the ways these things can work together is difficult to translate into software. It would be nice if we could record a use case, turn that into an MP3, and pass it to a runtime engine that would literally do what it is told, but we’re not quite there yet.
  • For several decades, database innovation has lagged behind other areas. Data creation and consumption goals, and release schedules, are VERY different now. App/code environment choices are significantly broader, having moved from compile-time, vendor-oriented tooling to dynamic runtime open standards. Why do databases lag? It’s hard to build a good, robust database. Until recently, we did not have the power and flexibility of platform (including cloud), plus new channels and interactive scale like smart mobile devices, to push us over the hurdle into a new database model that satisfies these needs.
  • How does it do this? In a number of ways, but today I won’t get into the infrastructure side of things: low-cost horizontal scalability, no-downtime version upgrades, multi-site DR, and indeed a coherent distributed scaling/HA/DR strategy. That’s all great. Today I want to talk about the topside experience, around these three points. Third: there is symmetry between read and write operations on collections; this will become important as the complexity of shapes increases.
  • To summarize (and we can’t resist putting in a huge ER diagram): rectangles, and the technology to support them, are circa 1974. Management of rich shapes is now.
  • The role of the data access layer (DAL) is to hide implementation-specific details of raw data access from the consumer. The implementation inside the DAL can be as DB-specific as necessary to maximize performance or other I/O goals. The DAL contains as many functions as appropriate to vend data to applications; there could be dozens. The topside of the DAL exposes only data to a consumer, not necessarily bespoke objects, because some data operations are insufficient or inappropriate for populating a true object; logic on top of the DAL is required to take the raw data and construct the appropriate object. Even in the mongoDB rich-data world, data does not necessarily equal object! ORMs (notably Hibernate) and annotation-based frameworks (like Morphia) have a different set of dependency and design considerations. From a practical standpoint, it may be necessary to perform high-performance, data-only operations independent of objects, and the DAL permits this without exposing the entire implementation of persistence. The code that follows is “nearly compilable.” (A sketch of such a data-only interface appears after these notes.)
  • Why a Map? Why not? There are no compile-time dependencies; we are restricting the types at the data access layer to pure data, not complex objects (e.g., no m.put(“contact”, someContactObject)). The response can carry additional information. Maps are very easy to work with and have a lot of tooling around them, especially if you constrain your types. Brace yourself.
  • Remember: This is happening in the logic of the DAL, not the application!
  • Consequences: code/schema coupling. This took only a bit of time, yes, but it is pure overhead: no business-feature value, and the beginning of “technical debt,” here manifest as a disconnect between the schema in the DB, the logic in the DAL, and the Map. Some SQL nuances: positional parameters are used here for convenience but don’t really change things, because: (A) in a ResultSet, “select column AS foo” changes the result-set column name to foo instead of the real column name, so what you see in the DB vs. the code can differ. Interesting fact: “select fld1 as foo, fld2 as foo” is LEGAL! ResultSet.findColumn() and getString(String name) return the first one they find (see the alias sketch after these notes). (B) The column names are still in a different semantic domain (e.g., case insensitivity) than our Map names, so you have to provide a “column”-to-“mapKey” mapping anyway (no direct storage like mongoDB). (C) PreparedStatements can’t use column names for substitution anyway; they can ONLY use positional parameters, more evidence of input/output mismatch. (D) This is not going to be the dominant sweat/stability issue; that will be the relational composition and decomposition, which we’ll see later.
  • No change! Values we set into the map are preserved, period.
  • Day 3 adds phone numbers. There could be several; we don’t know how many. There could also be other attributes, like do-not-call flags and smartphone type, but we’ll leave those for later.
  • Attempting to duck the “listiness” of multiple phones has yielded a bad information architecture practically from the start of development. This is just plain bad, and it sets the stage for pain when more phones are added in the future. More technical debt!
  • This is actually a “friendly” case: phones uses the same ID as contact; in other cases, a new foreign key would have to be managed. We are sidestepping cross-table integrity; other functions besides save() would come into play. The JOIN to fetch all the information produces a ResultSet unwind issue: the id, name, title, and hiredate are repeated in the cartesian product and have to be managed via logic. The problem is magnified when more than one id is requested, because the ordering is not guaranteed unless ORDER BY is specified in the query, which impacts performance. This took real time and money! All previous consequences (especially code/schema release coupling) are still present, and save vs. fetch logic is clearly starting to become asymmetric. And the extra work is only beginning, because in a few days…
  • The zombies emerge. They’re prevalent in pop culture… and in traditional RDBMS programming. Between day 3 and day 5 we loaded some test data into the DB. This exposes a common challenge in SQL/RDBMS: as the data model grows, and particularly as “always set / sometimes set” dynamics come into play, one must be increasingly careful about query construction. Good thing we have a DAL, though: imagine if a bunch of those older queries had escaped into the application space!
  • No drama: we simply saved the list of phone numbers (or none). No zombies.
  • anotherMapOfData might even come from a different place altogether…
  • But more likely: because we have lived through at least one schema upgrade, we are gun-shy about list-y data, or we are under pressure, so we create a semicolon-delimited string of app names and store it as a single string. The anotherMapOfData content may or may not be all the fields; we store what we deem appropriate, even if that structure changes over time.
  • The tail does not wag the dog. Often, the time, effort, and coordination involved in proper modeling in the RDBMS world incentivize developers to take shortcuts.
  • So we have saved and fetched a single item. What about real queries? This is where the mongoDB programming model starts to really shine. Number one: operators, not grammar. For simple queries it is slightly more “involved” than SQL, but how many users type raw SQL into a screen for execution? Do you really want to do that? For complex queries, it ends up being no more difficult. The same way you build and manipulate data can be applied to manipulating queries. And while we’re at it, it is the same paradigm for consuming responses from the server, both data (in a cursor) and diagnostic and other operation results. Results can be processed, logged, visualized, and formatted the same way for all operations, without parsing or losing fidelity.
  • We often start with the CLI to show how things are done, but here we show the actual map-of-map setup in code. Below, we’ve generalized fetch() into fetchGeneral(); instead of taking a String id, we now take a Map of expressions. This is the REALLY general form of fetch; more specialized versions might take Map fragments or scalar values that are inserted into pre-defined map-of-map structures. You don’t have to worry about “parsable syntax”; it is operators and operands that cooperate in a very strongly but easily defined way.
  • Recall that on slide 22 we said the data and query paradigms are the same; note that myData and expr are built the same way! The same tools, tricks, and techniques can be applied to both (see the query-as-data sketch after these notes). Very powerful but compact and clear scripts can easily be written that leverage investments made in modules for that particular language.
  • Polymorphism: a field can have different types (object shapes) from doc to doc within the same collection; see the K8 and K9 examples. It is very easy to craft a system where the software relies on a few “well-known” fields, like name and id in this example, to manage information in the large, yet still saves and extracts high-fidelity custom data for the parts of the software stack that understand it, WITHOUT the persistor getting in the way (see the polymorphic-read sketch after these notes).
  • You’ll have to recode the split-up logic in python unless you use Jython. But even then, there’s no easy solution for polymorphic data; you’d have to develop your own rich-data store/query/filter/fetch subsystem.
  • In short, there was no malloc() in the old days. mongoDB takes advantage of the higher type fidelity of today’s popular and powerful languages.
  • Traditional RDBMS CLIs (psql, isql) are interpreters for that particular flavor of SQL plus some extra commands; SQL is not a general-purpose procedural language. The mongoDB shell, however, can be viewed that way: it is a javascript interpreter that happens to load some mongoDB interface libraries. Party trick: no DDL or special setup is needed, and results can be stored back immediately into the DB! All of this (rich shapes of data, dynamic use of types, and symmetry of operation semantics) leads to faster, easier development.
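
A minimal sketch of the kind of data-only DAL topside described in the notes above; the interface name and method set are hypothetical, not from the deck:

    import java.util.List;
    import java.util.Map;

    // Hypothetical data-only DAL interface: consumers see only Maps of pure
    // data, never driver types or bespoke domain objects.
    public interface ContactDAL {
        void save(Map<String, Object> contact);                           // persist one shape
        Map<String, Object> fetch(String id);                             // fetch by primary key
        List<Map<String, Object>> fetchGeneral(Map<String, Object> expr); // general query
    }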
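
A small illustration of the column-alias nuance from the notes above, assuming a hypothetical contact table; JDBC name-based ResultSet access silently binds to the first matching alias:

    import java.sql.*;

    class AliasDemo {
        // "select fld1 as foo, fld2 as foo" is legal SQL; findColumn() and
        // getString(String) resolve to the FIRST "foo" they find.
        static void demo(Connection connection) throws SQLException {
            ResultSet rs = connection.createStatement().executeQuery(
                "select id as foo, name as foo from contact");
            while (rs.next()) {
                String v = rs.getString("foo"); // always the id column; the name
                                                // column is unreachable by name
            }
        }
    }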
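
A minimal sketch of the data/query symmetry noted above (the SeqNum field name is borrowed from the CLI slide later in the deck): the same HashMap calls that build a document also build a query expression:

    import java.util.HashMap;
    import java.util.Map;

    class QueryAsData {
        // Build {"SeqNum": {"$gt": n}} with ordinary Map calls; no grammar,
        // no string parsing, no quote escaping.
        static Map<String, Object> seqNumAbove(int n) {
            Map<String, Object> gt = new HashMap<>();
            gt.put("$gt", n);
            Map<String, Object> expr = new HashMap<>();
            expr.put("SeqNum", gt);
            return expr; // pass to fetchGeneral(expr) in the DAL
        }
    }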
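
In Java terms, a minimal sketch of the polymorphism pattern above, reusing the hypothetical ContactDAL from the first sketch:

    import java.util.Map;

    class PolymorphicRead {
        static void inspect(ContactDAL dal) {
            Map<String, Object> item = dal.fetch("K8");
            String name = (String) item.get("name"); // well-known field, always present
            // personalData varies doc-to-doc; hand the raw Map to whatever
            // layer understands that particular shape.
            Object pd = item.get("personalData");
            if (pd instanceof Map) {
                Map<?, ?> custom = (Map<?, ?>) pd;
                // dispatch on custom.keySet() as appropriate
            }
        }
    }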

Presentation Transcript

  • #MongoDB Dramatically Reducing Development Time With MongoDB Buzz Moschetti buzz.moschetti@mongodb.com Solutions Architect, MongoDB
  • Who is your Presenter? • Yes, I use “Buzz” on my business cards • Former Investment Bank Chief Architect at JPMorganChase, and Bear Stearns before that • Over 25 years of designing and building systems: big and small; super-specialized to broadly useful in any vertical; “traditional” to completely disruptive • Advocate of language leverage and strong factoring • Still programming, using emacs, of course
  • What Are Your Developers Doing All Day? Adding and testing business features, OR “integrating with other components, tools, and systems”: • Database(s) • ETL and other data transfer operations • Messaging • Services (web & other) • Other open source frameworks
  • Why Can’t We Just Save and Fetch Data? Because the way we think about data at the business use case level… …is different than the way it is implemented at the application/code level… …which traditionally is VERY different than the way it is implemented at the database level
  • This Problem Isn’t New… …but for the past 40 years, innovation at the business & application layers has outpaced innovation at the database layer. Comparing 1974 to 2014:
    Business Data Goals (1974): Capture my company’s transactions daily at 5:30PM EST, add them up on a nightly basis, and print a big stack of paper. (2014): Capture my company’s global transactions in realtime plus everything that is happening in the world (customers, competitors, business/regulatory, weather), producing any number of computed results, and passing this all in realtime to predictive analytics with model feedback; results in realtime to 10000s of mobile devices, multiple GUIs, and b2b and b2c channels.
    Release Schedule (1974): Quarterly. (2014): Yesterday.
    Application/Code (1974): COBOL, Fortran, Algol, PL/1, assembler, proprietary tools. (2014): COBOL, Fortran, C, C++, VB, C#, Java, javascript, groovy, ruby, perl, python, Obj-C, SmallTalk, Clojure, ActionScript, Flex, DSLs, spring, AOP, CORBA, ORM, third-party software ecosystem, open source movement.
    Database (1974): I/VSAM, early RDBMS. (2014): Mature RDBMS, legacy I/VSAM, column & key/value stores, and… mongoDB.
  • Exactly How Does mongoDB Change Things? • mongoDB is designed from the ground up to address rich structure (maps of maps of lists of…), not rectangles • Standard RDBMS interfaces (e.g., JDBC) do not exploit features of contemporary languages • Rapid Application Development (RAD) and scripting in Javascript, Python, Perl, Ruby, and Scala is impedance-matched to mongoDB • In mongoDB, the data is the schema • Shapes of data go in the same way they come out
  • Rectangles are 1974. Maps and Lists are 2014 { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { type : “work”, number: “1-800-555-1212” }, { type : “home”, number: “1-800-555-1313”, DNC: true }, { type : “home”, number: “1-800-555-1414”, DNC: true } ] }
  • An Actual Code Example (Finally!) Let’s compare and contrast RDBMS/SQL to mongoDB development using Java over the course of a few weeks. Some ground rules: 1. Observe rules of Software Engineering 101: Assume separation of application, Data Access Layer, and persistor implementation 2. Data Access Layer must be able to a. Expose simple, functional, data-only interfaces to the application • No ORM, frameworks, compile-time bindings, special tools b. Exploit high performance features of persistor 3. Focus on core data handling code and avoid distractions that require the same amount of work in both technologies a. No exception or error handling b. Leave out DB connection and other setup resources 4. Day counts are a proxy for progress, not actual time to complete indicated task
  • The Task: Saving and Fetching Contact data. Start with this simple, flat shape in the Data Access Layer:
    Map m = new HashMap();
    m.put(“name”, “buzz”);
    m.put(“id”, “K1”);
    And assume we save it in this way: save(Map m)
    And assume we fetch one by primary key in this way: Map m = fetch(String id)
    Brace yourself…..
  • Day 1: Initial efforts for both technologies. SQL DDL: create table contact ( … ) mongoDB DDL: none
    SQL:
    init() {
      contactInsertStmt = connection.prepareStatement(“insert into contact ( id, name ) values ( ?,? )”);
      fetchStmt = connection.prepareStatement(“select id, name from contact where id = ?”);
    }
    save(Map m) {
      contactInsertStmt.setString(1, m.get(“id”));
      contactInsertStmt.setString(2, m.get(“name”));
      contactInsertStmt.execute();
    }
    Map fetch(String id) {
      Map m = null;
      fetchStmt.setString(1, id);
      rs = fetchStmt.execute();
      if(rs.next()) {
        m = new HashMap();
        m.put(“id”, rs.getString(1));
        m.put(“name”, rs.getString(2));
      }
      return m;
    }
    mongoDB:
    save(Map m) {
      collection.insert(m);
    }
    Map fetch(String id) {
      Map m = null;
      DBObject dbo = new BasicDBObject();
      dbo.put(“id”, id);
      c = collection.find(dbo);
      if(c.hasNext()) {
        m = (Map) c.next();
      }
      return m;
    }
    Let’s assume for argument’s sake that both approaches take the same amount of time.
  • Day 2: Add simple fields m.put(“name”, “buzz”); m.put(“id”, “K1”); m.put(“title”, “Mr.”); m.put(“hireDate”, new Date(2011, 11, 1)); • Capturing title and hireDate is part of adding a new business feature • It was pretty easy to add two fields to the structure • …but now we have to change our persistence code Brace yourself (again) …..
  • SQL Day 2 (changes in bold) DDL: alter table contact add title varchar(8); alter table contact add hireDate date; init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )”); fetchStmt = connection.prepareStatement (“select id, name, title, hiredate from contact where id = ?”); } save(Map m) { contactInsertStmt.setString(1, m.get(“id”)); contactInsertStmt.setString(2, m.get(“name”)); contactInsertStmt.setString(3, m.get(“title”)); contactInsertStmt.setDate(4, m.get(“hireDate”)); contactInsertStmt.execute(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.execute(); if(rs.next()) { m = new HashMap(); m.put(“id”, rs.getString(1)); m.put(“name”, rs.getString(2)); m.put(“title”, rs.getString(3)); m.put(“hireDate”, rs.getDate(4)); } return m; } Consequences: 1. Code release schedule linked to database upgrade (new code cannot run on old schema) 2. Issues with case sensitivity starting to creep in (many RDBMS are case insensitive for column names, but code is case sensitive) 3. Changes require careful mods in 4 places 4. Beginning of technical debt
  • mongoDB Day 2 save(Map m) { collection.insert(m); } Map fetch(String id) { Map m = null; DBObject dbo = new BasicDBObject(); dbo.put(“id”, id); c = collection.find(dbo); if(c.hasNext()) { m = (Map) c.next(); } return m; } ✔ NO CHANGE Advantages: 1. Zero time and money spent on overhead code 2. Code and database not physically linked 3. New material with more fields can be added into existing collections; backfill is optional 4. Names of fields in the database precisely match key names in the code layer, matching directly on name rather than indirectly via positional offset 5. No technical debt is created
  • Day 3: Add list of phone numbers m.put(“name”, “buzz”); m.put(“id”, “K1”); m.put(“title”, “Mr.”); m.put(“hireDate”, new Date(2011, 11, 1)); n1.put(“type”, “work”); n1.put(“number”, “1-800-555-1212”); list.add(n1); n2.put(“type”, “home”); n2.put(“number”, “1-866-444-3131”); list.add(n2); m.put(“phones”, list); • It was still pretty easy to add this data to the structure • .. but meanwhile, in the persistence code … REALLY brace yourself…
  • SQL Day 3 changes: Option 1: Assume just 1 work and 1 home phone number DDL: alter table contact add work_phone varchar(16); alter table contact add home_phone varchar(16); init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name, title, hiredate, work_phone, home_phone ) values ( ?,?,?,?,?,? )”); fetchStmt = connection.prepareStatement (“select id, name, title, hiredate, work_phone, home_phone from contact where id = ?”); } save(Map m) { contactInsertStmt.setString(1, m.get(“id”)); contactInsertStmt.setString(2, m.get(“name”)); contactInsertStmt.setString(3, m.get(“title”)); contactInsertStmt.setDate(4, m.get(“hireDate”)); for(Map onePhone : m.get(“phones”)) { String t = onePhone.get(“type”); String n = onePhone.get(“number”); if(t.equals(“work”)) { contactInsertStmt.setString(5, n); } else if(t.equals(“home”)) { contactInsertStmt.setString(6, n); } } contactInsertStmt.execute(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.execute(); if(rs.next()) { m = new HashMap(); m.put(“id”, rs.getString(1)); m.put(“name”, rs.getString(2)); m.put(“title”, rs.getString(3)); m.put(“hireDate”, rs.getDate(4)); Map onePhone; onePhone = new HashMap(); onePhone.put(“type”, “work”); onePhone.put(“number”, rs.getString(5)); list.add(onePhone); onePhone = new HashMap(); onePhone.put(“type”, “home”); onePhone.put(“number”, rs.getString(6)); list.add(onePhone); m.put(“phones”, list); } return m; } This is just plain bad….
  • SQL Day 3 changes: Option 2: Proper approach with multiple phone numbers DDL: create table phones ( … ) init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )”); c2stmt = connection.prepareStatement(“insert into phones (id, type, number) values (?, ?, ?)”); fetchStmt = connection.prepareStatement (“select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?”); } save(Map m) { startTrans(); contactInsertStmt.setString(1, m.get(“id”)); contactInsertStmt.setString(2, m.get(“name”)); contactInsertStmt.setString(3, m.get(“title”)); contactInsertStmt.setDate(4, m.get(“hireDate”)); for(Map onePhone : m.get(“phones”)) { c2stmt.setString(1, m.get(“id”)); c2stmt.setString(2, onePhone.get(“type”)); c2stmt.setString(3, onePhone.get(“number”)); c2stmt.execute(); } contactInsertStmt.execute(); endTrans(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.execute(); int i = 0; List list = new ArrayList(); while (rs.next()) { if(i == 0) { m = new HashMap(); m.put(“id”, rs.getString(1)); m.put(“name”, rs.getString(2)); m.put(“title”, rs.getString(3)); m.put(“hireDate”, rs.getDate(4)); m.put(“phones”, list); } Map onePhone = new HashMap(); onePhone.put(“type”, rs.getString(5)); onePhone.put(“number”, rs.getString(6)); list.add(onePhone); i++; } return m; } This took time and money
  • SQL Day 5: Zombies! init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )”); c2stmt = connection.prepareStatement(“insert into phones (id, type, number) values (?, ?, ?)”); fetchStmt = connection.prepareStatement (“select A.id, A.name, A.title, A.hiredate, B.type, B.number from contact A left outer join phones B on (A.id = B.id) where A.id = ?”); } while (rs.next()) { if(i == 0) { // … } String s = rs.getString(5); if(s != null) { Map onePhone = new HashMap(); onePhone.put(“type”, s); onePhone.put(“number”, rs.getString(6)); list.add(onePhone); } } Whoops! And it’s also wrong! We did not design the query accounting for contacts that have no phone number, so we have to change the join to an outer join (zero or more phones per contact). But this ALSO means we have to change the unwind logic. This took more time and money! …but at least we have a DAL… right?
  • mongoDB Day 3 save(Map m) { collection.insert(m); } Map fetch(String id) { Map m = null; DBObject dbo = new BasicDBObject(); dbo.put(“id”, id); c = collection.find(dbo); if(c.hasNext()) { m = (Map) c.next(); } return m; } ✔ NO CHANGE Advantages: 1. Zero time and money spent on overhead code 2. No need to fear fields that are “naturally occurring” lists containing data specific to the parent structure and thus do not benefit from normalization and referential integrity
  • By Day 14, our structure looks like this: m.put(“name”, “buzz”); m.put(“id”, “K1”); //… n4.put(“startupApps”, new String[] { “app1”, “app2”, “app3” } ); n4.put(“geo”, “US-EAST”); list2.add(n4); n5.put(“startupApps”, new String[] { “app6” } ); n5.put(“geo”, “EMEA”); n5.put(“useLocalNumberFormats”, false); list2.add(n5); m.put(“preferences”, list2); n6.put(“optOut”, true); n6.put(“assertDate”, someDate); seclist.add(n6); m.put(“attestations”, seclist); m.put(“security”, anotherMapOfData); • It was still pretty easy to add this data to the structure • Want to guess what the SQL persistence code looks like? • How about the mongoDB persistence code?
  • SQL Day 14 Error: Could not fit all the code into this space. …actually, I didn’t want to spend 2 hours putting the code together… But very likely, among other things: • n4.put(“startupApps”, new String[]{“app1”,“app2”,“app3”}); was implemented as a single semicolon-delimited string (see the flattening sketch after the transcript) • m.put(“security”, anotherMapOfData); was implemented by flattening it out and storing a subset of fields
  • mongoDB Day 14 – and every other day save(Map m) { collection.insert(m); } Map fetch(String id) { Map m = null; DBObject dbo = new BasicDBObject(); dbo.put(“id”, id); c = collection.find(dbo); if(c.hasNext()) { m = (Map) c.next(); } return m; } ✔ NO CHANGE Advantages: 1. Zero time and money spent on overhead code 2. Persistence is so easy, flexible, and backward compatible that the persistor does not upward-influence the shapes we want to persist, i.e. the tail does not wag the dog
  • But what about “real” queries? • mongoDB query language is a physical map-of-map based structure, not a String • Operators (e.g. AND, OR, GT, EQ, etc.) and arguments are keys and values in a cascade of Maps • No grammar to parse, no templates to fill in, no whitespace, no escaping quotes, no parentheses, no punctuation • Same paradigm to manipulate data is used to manipulate query expressions • …which is also, by the way, the same paradigm for working with mongoDB metadata and explain()
  • mongoDB Query Examples Objective Code CLI Find all contacts with at least one mobile phone Map expr = new HashMap(); expr.put(“phones.type”, “mobile”); db.contact.find({"phones.type": "mobile"}); Find contacts with NO phones Map expr = new HashMap(); Map q1 = new HashMap(); q1.put(“$exists”, false); expr.put(“phones”, q1); db.contact.find({"phones": {"$exists": false}}); List fetchGeneral(Map expr) { List l = new ArrayList(); DBObject dbo = new BasicDBObject(expr); DBCursor c = collection.find(dbo); while (c.hasNext()) { l.add((Map)c.next()); } return l; } Advantages: 1. Far less time required to set up complex parameterized filters 2. No need for SQL rewrite logic or creating new PreparedStatements 3. Map-of-Maps query structure is easily walked and processed without parsing
  • …and before you ask… Yes, mongoDB query expressions support 1. Sorting 2. Cursor size limit 3. Aggregation functions 4. Projection (asking for only parts of the rich shape to be returned) (a sketch of sorting, limits, and projection follows the transcript)
  • Day 30: RAD on mongoDB with Python import pymongo def save(data): coll.insert(data) def fetch(id): return coll.find_one({"id": id}) myData = { "name": "jane", "id": "K2", # no title? No problem "hireDate": datetime.date(2011, 11, 1), "phones": [ { "type": "work", "number": "1-800-555-1212" }, { "type": "home", "number": "1-866-444-3131" } ] } save(myData) print fetch("K2") expr = { "$or": [ {"phones": { "$exists": False }}, {"name": "jane"}]} for c in coll.find(expr): print [ k.upper() for k in sorted(c.keys()) ] Advantages: 1. Far easier and faster to create scripts due to “fidelity-parity” of mongoDB map data and python (and perl, ruby, and javascript) structures 2. Data types and structure in scripts are exactly the same as those read and written in Java and C++
  • Day 30: Polymorphic RAD on mongoDB with Python import pymongo item = fetch("K8") # item is: { “name”: “bob”, “id”: “K8”, "personalData": { "preferedAirports": [ "LGA", "JFK" ], "travelTimeThreshold": { "value": 3, "units": “HRS”} } } item = fetch("K9") # item is: { “name”: “steve”, “id”: “K9”, "personalData": { "lastAccountVisited": { "name": "mongoDB", "when": datetime.date(2013,11,4) }, "favoriteNumber": 3.14159 } } Advantages: 1. Scripting languages easily digest shapes with common fields and dissimilar fields 2. Easy to create an information architecture where placeholder fields like personalData are “known” in the software logic to be dynamic
  • Day 30: (Not) RAD on top of SQL with Python init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )”); c2stmt = connection.prepareStatement(“insert into phones (id, type, number) values (?, ?, ?)”); fetchStmt = connection.prepareStatement (“select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?”); } save(Map m) { startTrans(); contactInsertStmt.setString(1, m.get(“id”)); contactInsertStmt.setString(2, m.get(“name”)); contactInsertStmt.setString(3, m.get(“title”)); contactInsertStmt.setDate(4, m.get(“hireDate”)); for(Map onePhone : m.get(“phones”)) { c2stmt.setString(1, m.get(“id”)); c2stmt.setString(2, onePhone.get(“type”)); c2stmt.setString(3, onePhone.get(“number”)); c2stmt.execute(); } contactInsertStmt.execute(); endTrans(); } Consequences: 1. All logic coded in the Java interface layer (splitting up contact, phones, preferences, etc.) needs to be rewritten in python (unless Jython is used) … AND/or perl, C++, Scala, etc. 2. No robust way to handle polymorphic data other than BLOBing it 3. …and that will take real time and money!
  • The Fundamental Change with mongoDB RDBMS were designed in an era when: • CPU and disk were slow & expensive • Memory was VERY expensive • Network? What network? • Languages had limited means to dynamically reflect on their types • Languages had poor support for richly structured types Thus, the database had to: • Act as combiner-coordinator of simpler types • Define a rigid schema • (Together with the code) optimize at compile-time, not run-time In mongoDB, the data is the schema! (See the sketch after the transcript.)
  • mongoDB and the Rich Map Ecosystem Generic comparison of two records Map expr = new HashMap(); expr.put("myKey", "K1"); DBObject a = collection.findOne(expr); expr.put("myKey", "K2"); DBObject b = collection.findOne(expr); List<MapDiff.Difference> d = MapDiff.diff((Map)a, (Map)b); Getting default values for a thing on a certain date and then overlaying user preferences (like for a calculation run) Map expr = new HashMap(); expr.put("myKey", "DEFAULT"); expr.put("createDate", new Date(2013, 11, 1)); DBObject a = collection.findOne(expr); expr.clear(); expr.put("myKey", "user1"); DBObject b = otherCollectionPerhaps.findOne(expr); MapStack s = new MapStack(); s.push((Map)a); s.push((Map)b); Map merged = s.project(); Runtime reflection of Maps and Lists enables generic powerful utilities (MapDiff, MapStack) to be created once and used for all kinds of shapes, saving time and money
  • Lastly: A CLI with teeth > db.contact.find({"SeqNum": {"$gt": 10000}}).explain(); { "cursor" : "BasicCursor", "n" : 200000, //... "millis" : 223 } Try a query and show the diagnostics > for(v=[],i=0;i<3;i++) { … n = i*50000; … expr = {"SeqNum": {"$gt": n}}; … v.push([n, db.contact.find(expr).explain().millis]); } Run it 3 times with smaller and smaller chunks and create a vector of timing result pairs (size, time) > v [ [ 0, 225 ], [ 50000, 222 ], [ 100000, 220 ] ] Let’s see that vector > load(“jStat.js”) > jStat.stdev(v.map(function(p){return p[1];})) 2.0548046676563256 Use any other javascript you want inside the shell > for(i=0;i<3;i++) { … expr = {"SeqNum": {"$gt": i*1000}}; … db.foo.insert(db.contact.find(expr).explain()); } Party trick: save the explain() output back into a collection!
  • Webex Q&A
  • #MongoDB Thank You Buzz Moschetti buzz.moschetti@mongodb.com Solutions Architect, MongoDB
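
The flattening shortcut described on the SQL Day 14 slide, as a minimal hypothetical sketch; the list becomes an opaque delimited string, and splitting it back out is now application logic:

    import java.util.Arrays;
    import java.util.List;

    class FlattenHack {
        // ["app1","app2","app3"] -> "app1;app2;app3" for a single varchar column.
        static String flatten(List<String> apps) {
            return String.join(";", apps);
        }
        // Every consumer must know to split it back out (and hope no app name
        // ever contains a semicolon).
        static List<String> unflatten(String s) {
            return Arrays.asList(s.split(";"));
        }
    }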
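
A minimal sketch of sorting, cursor limits, and projection, referenced from the “…and before you ask…” slide; the DBCollection/DBCursor types are from the 2.x-era Java driver style used in the deck, and the field names come from the running contact example (aggregation is omitted here):

    import com.mongodb.*;

    class QueryExtras {
        static void demo(DBCollection collection) {
            DBObject query = new BasicDBObject("phones.type", "mobile");
            // Projection: return only name and phones (plus _id by default).
            DBObject fields = new BasicDBObject("name", 1).append("phones", 1);
            DBCursor c = collection.find(query, fields)
                                   .sort(new BasicDBObject("name", 1)) // sorting
                                   .limit(10);                         // cursor size limit
            while (c.hasNext()) {
                DBObject doc = c.next();
            }
        }
    }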
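
To make “the data is the schema” concrete, a minimal sketch in the same old-driver style (treat it as illustrative, not the deck’s own code): two differently shaped documents go into the same collection with no DDL and no ALTER TABLE:

    import com.mongodb.*;
    import java.util.Arrays;

    class DataIsSchema {
        static void demo(DBCollection collection) {
            // Flat shape; no table definition was ever declared.
            collection.insert(new BasicDBObject("id", "K1").append("name", "buzz"));
            // Richer shape, same collection; the document carries its own structure.
            collection.insert(new BasicDBObject("id", "K2").append("name", "jane")
                .append("phones", Arrays.asList(
                    new BasicDBObject("type", "work").append("number", "1-800-555-1212"))));
        }
    }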