Thank you all for coming. My name is Marcio Garcia. I’m a software engineer; my core language is Java, but I also develop in Python, Ruby, Groovy, and shell script. I’m interested in programming languages, both static and dynamic, and in infrastructure and automation. I started at R/GA last January in the SP office. By the way, I’d like to thank Edson, Will Turnage, and Cristhian Rauh for the opportunity.
This is the agenda for today. I’ll attempt to answer three basic questions:
WHAT: an introduction to MongoDB. I expect to spend 6 minutes on it.
WHY: some motivations for picking Mongo for a web project, also 6 minutes.
HOW: how to integrate it, plus some nice stuff about Java and Mongo. This will take approximately 12 minutes.
NoSQL DB, document based: different (neither better nor worse), just different, for different tasks and different approaches.
Cross platform: Linux, Windows, and Mac OS are welcome, 32 or 64 bits as well.
Written in C++.
Structured in BSON, a JSON-like protocol and structure.
Under the GNU AGPL license.
Joinless, for higher performance: joins are a problem on regular SQL databases. Most of the time, working with Oracle, we have to use hints and rebuild indexes to get a satisfactory query execution time. Being joinless is actually one of the characteristics that make a DB be considered NoSQL. You can use joins, but it is highly discouraged. If you’d like to keep things the way you’re used to, you shouldn’t move to a NoSQL DB. Quite obvious, isn’t it?
Master-slave failover.
Sharding. I’ll talk a little bit more about this later.
I’d like to talk in more depth about three aspects:
NoSQL document-based database
Structure
Sharding
This is the white belt skill you need if you’d like to start developing with Mongo: the differences between a regular SQL DB and MongoDB.
This is a comparison between what we’re used to in a regular RDBMS, like Oracle, PostgreSQL, or MySQL, and Mongo.
First, Database: a database in an RDBMS is much the same as in Mongo, where it is represented as a file (or a bunch of them) on your hard drive.
Second, Tablespace: a tablespace in an RDBMS is a Collection on the Mongo side, that is, a bunch of documents.
Third, Tables: tables in an RDBMS map to Documents in Mongo. Both come down to the same thing: a bunch of fields together.
Fields are almost the same in both worlds. They can have types like String, Integer, Float, Timestamp, Binary, and Array, plus a special one: Document.
This is the blue belt skill.
You manipulate a regular RDBMS using SQL statements. Creating, updating, removing, and retrieving records, and all DML and DDL, are SQL based.
In the MongoDB world, on the other hand, you don’t have SQL statements. You use BSON/JSON to retrieve, update, and create documents. You don’t have DDL, the definition commands, in Mongo, at least not for creating collections and documents. Mongo is schemaless, so if you’d like to create a document, simply create the JSON and fire the save command. If a document with those specific fields already exists, fine; if not, it will be created.
Black belt skill: sharding.
From the MongoDB page: sharding is the ability to distribute a piece of data across many MongoDB instances.
So, to understand how it works, here is a simple example.
Take this document (click) with two fields, first and last name.
Suppose I have such a high volume of queries being fired against the DB, reading, updating, and creating documents, that I choose to distribute the load across different Mongo instances based on a function. (click)
My function here is: documents are distributed to different MongoDB instances according to the first letter of the first name. (click)
This is what my infrastructure should look like.
** Present the sharding (blue boxes: MongoDB instances; yellow boxes: Mongo instances acting as configuration servers; green box: Mongo router; in gray, a client firing requests to retrieve, create, update, or delete documents).
The workload coming from the client requests is distributed to different Mongo shards, based on the sharding function.
This is just one example of a sharding approach.
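The first-letter function from this example can be sketched in plain Java. This is only an illustration of the routing idea, not how MongoDB itself assigns documents to shards; the class name `ShardRouter`, the method `shardFor`, and the choice to split A-Z into equal contiguous ranges are all assumptions made for the sketch.

```java
import java.util.Locale;

/** Toy shard router: picks a shard from the first letter of the first name. */
final class ShardRouter {

    /**
     * Maps a first name to a shard index in [0, shardCount).
     * Letters A-Z are split into contiguous, roughly equal ranges.
     * Names that are empty or do not start with A-Z go to the last shard
     * (an arbitrary choice for this sketch).
     */
    static int shardFor(String firstName, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        if (firstName == null || firstName.isEmpty()) {
            return shardCount - 1;
        }
        char first = firstName.toUpperCase(Locale.ROOT).charAt(0);
        if (first < 'A' || first > 'Z') {
            return shardCount - 1;
        }
        int letterIndex = first - 'A';         // 0..25
        return letterIndex * shardCount / 26;  // contiguous letter ranges
    }

    public static void main(String[] args) {
        String[] names = {"Ana", "Marcio", "Will"};
        for (String name : names) {
            System.out.println(name + " -> shard " + shardFor(name, 3));
        }
    }
}
```

With three shards this sends A-I to shard 0, J-R to shard 1, and S-Z to shard 2. A split by letter is easy to explain on a slide, but names are not evenly spread over the alphabet, so a real deployment would pick its sharding function with the actual data distribution in mind.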
Query based DB.
All purpose DB.
ACID: this is an important point. Mongo is not ACID across multiple documents, but it is ACID within a single document. If you have a document inside another, that is ACID.
Fixed schema: the lack of one doesn’t mean that you can do whatever you want without a penalty. The penalty is paid in disk space and memory usage.
Unlimited storage database: it is limited by your OS and disk space. On 32-bit Linux the file size is 16GB. On 64 bits, the disk is your limit. I have a 130GB database that takes milliseconds to store or retrieve data from a collection. Without any joins, of course.
Last week we ran a test, and these are the numbers we got.
It was around 60 million records (3 fields each).
Loaded using JSON format.
It took around 2 hours to load on my machine.
It generated 15GB of data locally.
It took around 40 minutes to create an index on two fields.
Finding records took less than 18 milliseconds after the index; before the index it took around 1’16’’.
The performance here is nothing extraordinary; any well configured and tuned MySQL could do the same.
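To put these numbers in perspective, here is a small back-of-the-envelope calculation in Java. It uses only the rounded figures quoted above and assumes decimal gigabytes and exactly 18 ms for the indexed lookup, so the results are rough orders of magnitude; the class and method names are made up for the sketch.

```java
/** Back-of-the-envelope figures derived from the load test numbers. */
final class LoadTestMath {
    static final double RECORDS = 60_000_000;        // around 60 million records
    static final double LOAD_SECONDS = 2 * 60 * 60;  // around 2 hours
    static final double DATA_BYTES = 15e9;           // around 15GB, assuming decimal gigabytes
    static final double SCAN_SECONDS = 76;           // 1'16'' before the index
    static final double INDEXED_SECONDS = 0.018;     // upper bound: under 18 ms after the index

    /** Average insert rate during the load. */
    static double recordsPerSecond() {
        return RECORDS / LOAD_SECONDS;
    }

    /** Average on-disk size per record, including any storage overhead. */
    static double bytesPerRecord() {
        return DATA_BYTES / RECORDS;
    }

    /** How many times faster the lookup became once the index existed. */
    static double lookupSpeedup() {
        return SCAN_SECONDS / INDEXED_SECONDS;
    }

    public static void main(String[] args) {
        System.out.printf("load rate: %.0f records/s%n", recordsPerSecond());
        System.out.printf("size:      %.0f bytes/record%n", bytesPerRecord());
        System.out.printf("speedup:   %.0fx%n", lookupSpeedup());
    }
}
```

That works out to roughly 8,300 records per second during the load, about 250 bytes on disk per record, and a lookup that is at least some 4,200 times faster with the index than without it.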
You want fast responses for your queries without caring about extra commands and query tuning.
It’s your first DB (in direct contact with the client).
Storing temp data. It’s not a cache; you have better options for that, like Redis and memcached.
Sharing data between apps of different flavors (Java, shell, JS).
DW cubes.
File storage (for instance, using an Nginx plugin).
Horizontal scaling, making use of sharding.
If we are talking about an app with web app characteristics, like a high volume of data (retrieving, storing, and updating), temporary data, and asynchronous calls, this should fit you.
Portal home page: storing data from the backend database to be displayed on a high-traffic page. You could use Redis for that; the problem is the query criteria. (Nike+ and NikeFuel)
App on Facebook: for instance, sharing scores and number of victories in a social game.
Volatile data in the app, like last comments, a ranking page, a recipes page.
Delivering content to different clients: delivering content to a backend app is more of a database job, but the power of the JSON structure and the easy way to connect let Mongo share data not only with a backend app but also with a front-end one, using Node.js or just a single jQuery plugin.
Delivering content to a web server: NGINX and lighttpd have plugins that deliver content from Mongo instead of the file system. You can take advantage of sharding acting as a load balancer for the content. Advantages of this approach? You could use it instead of expensive EMC storage devices.
OK guys, not yet convinced after the tech approach? These are some companies that are using Mongo on at least one project.
To connect a Java app to MongoDB, you can go the prosaic way: open a socket directly to the Mongo database, create your own JSON, send it to the DB, read the response, parse it, provide a good message to the client... After 5 days of this approach, write on your blog how bad your experience with Java and MongoDB was. Or... you can use a driver. Some players have created their own: Spring created Spring Data for MongoDB, and DataNucleus also has one. Both look nice and robust, but also full of features that you probably will not use. I picked a very simple one, named Morphia.
Annotation-based driver.
You can use JSR-303 for validation.
It’s type safe.
DAO access abstraction using generics.
Easy to include in your project; it doesn’t need a lot of dependencies (just one jar file).
It’s fast: it uses reflection the first time and then caches the whole object structure.
Lightweight.
The source is easy to understand.
Two basic ways: download the jar file and put it in your classpath, or use Maven and download the internet to your machine. Both ways are pretty straightforward.
This is the code to connect to a Mongo DB.
First you need an instance of a Mongo object (done on line 17). With that you can create a datastore connected to your database (line 18).
In this case, my database name is “TEST”.
To me it looks like a workaround when you want to put business rules in Pre or Post methods.
They look like a place for technical stuff that has to run before or after a business rule, for instance setting basic content for fields like updated_date or created_date.
But of course we should look at each specific case.
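The idea of a "Pre" method that fills in created_date and updated_date can be sketched in plain Java. To keep the sketch runnable without any driver on the classpath, it defines its own stand-in annotation, `@BeforeSave`, and a toy `save` method that calls it through reflection. Both are hypothetical names invented for this illustration, not the driver’s API, and the `City` entity is just an example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Date;

final class LifecycleSketch {

    /** Hypothetical stand-in for a driver's pre-save lifecycle annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface BeforeSave {}

    /** Example entity; the field names are illustrative. */
    static final class City {
        String name;
        Date createdDate;
        Date updatedDate;

        City(String name) {
            this.name = name;
        }

        /** Technical housekeeping only: no business rule lives here. */
        @BeforeSave
        void touchDates() {
            Date now = new Date();
            if (createdDate == null) {
                createdDate = now;  // set once, on the first save
            }
            updatedDate = now;      // refreshed on every save
        }
    }

    /** Toy "save": runs every @BeforeSave method before the (omitted) write. */
    static void save(Object entity) {
        for (Method method : entity.getClass().getDeclaredMethods()) {
            if (method.isAnnotationPresent(BeforeSave.class)) {
                try {
                    method.setAccessible(true);
                    method.invoke(entity);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("lifecycle callback failed", e);
                }
            }
        }
        // A real driver would now serialize the entity and send it to the database.
    }

    public static void main(String[] args) {
        City city = new City("Sao Paulo");
        save(city);
        System.out.println("dates set: " + (city.createdDate != null && city.updatedDate != null));
    }
}
```

Keeping the callback limited to timestamps matches the point above: the hook handles technical bookkeeping, while the business rules stay in the code that calls save.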
Warning! By using @Reference you are creating an FK, and a NoSQL DB doesn’t care if your records end up orphaned.
This FK is not controlled by the DB. It is controlled by the driver, so it’s not a good approach.
One way to avoid creating this sort of thing is to create temporary collections, loaded from other collections, joining the fields.
1 – Create the Java object
2 – Fire the save method from the Datastore
This is what the record will look like in Mongo.
Removing a record is quite simple as well. You need the record from the DB loaded into an object, and then you fire the delete method from the datastore.
The point here is that the delete method accepts two types of parameter: an object or a Query.
Updating follows the same approach as delete: you need that record loaded into an object.
The second way is to use the Query and UpdateOperations that I’ll show here. You can use the same approach to delete records, using just the Query.
1 – Find the objects to be updated, using the Query
2 – Apply the update rule; in this case I’m changing the name of the city to “São Paulo”
3 – Then execute the update by firing Datastore.update, passing the query and the update operations as parameters