MongoDB in the context of the Argentinean Census 2010


  1. MongoDB in the context of the Argentinean Census 2010
       Mongo France 2011
       Victorio J. BENTIVOGLI, Villmond Luxembourg
  2. Who are we?
       • Villmond Luxembourg is an Enterprise Content Management (ECM) and integration services provider established in 2005
       • Hosted at the Technoport, the first technology-oriented business incubator in Luxembourg, part of the Public Research Center Henri Tudor
       • Delivers strategic content management, collaboration and integration solutions based on Open Source and proprietary software to some of the most demanding organisations
  3. The Argentinean Census 2010 – The context
       • Around 43,000,000 inhabitants
       • 200,000,000 images to process
       • 4 months to complete the processing of every single form
       • 15 days to produce a working mockup of a QA system
  4. The Argentinean Census 2010 – Partners
  5. The Argentinean Census 2010 – The QA system
       • Aimed at controlling the quality of processed booklets
       • Shows images and metadata of scanned booklets
       • Operated 24x7 and used by 120 concurrent users
  6. The Argentinean Census 2010 – The challenge
       • We needed a rapid development cycle because the specifications were a moving target, and
       • … a really scalable Content Management System that could cope with 200,000,000 documents/images
       • … not only scalable but also fast for insertions, with 14 scanners working simultaneously 24x7 and importing around 2,000,000 images daily
       • … with enterprise-class security and content transformation capabilities, and reliable!
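The volumes quoted on this slide and on the context slide imply a sustained insert rate that can be checked with some quick arithmetic. The sketch below uses only numbers taken from the slides (200,000,000 images, around 2,000,000 images imported daily, 14 scanners) and assumes the load is spread evenly over the day and across the scanners, which the slides do not state.

```python
# Back-of-the-envelope check of the ingest figures quoted in the slides.
# Assumption (not stated in the slides): imports are spread evenly over
# 24 hours and evenly across the scanners.

TOTAL_IMAGES = 200_000_000    # images to process (from the slides)
IMAGES_PER_DAY = 2_000_000    # approximate daily import volume (from the slides)
SCANNERS = 14                 # scanners working simultaneously (from the slides)
SECONDS_PER_DAY = 24 * 60 * 60

days_needed = TOTAL_IMAGES / IMAGES_PER_DAY
images_per_second = IMAGES_PER_DAY / SECONDS_PER_DAY
images_per_scanner_per_day = IMAGES_PER_DAY / SCANNERS

print(f"days of scanning needed: {days_needed:.0f}")
print(f"sustained insert rate:   {images_per_second:.1f} images/s")
print(f"per scanner:             {images_per_scanner_per_day:,.0f} images/day")
```

Under these assumptions the full volume takes about 100 days of continuous importing, which fits inside the four-month window with some margin, at roughly 23 inserts per second around the clock.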
  7. The Argentinean Census 2010 – Our proposal
       • For the backend: use MongoDB as the underlying database, coupled with our Villmond Content Integration Framework to complete the solution
       • For the frontend: develop an Adobe Flex based client, encapsulated in an Adobe AIR container
  8. MongoDB
       • Document-oriented storage
       • Full index support
       • Replication & high availability
       • Querying
       • Map/Reduce
       • GridFS
       • Commercial support
  9. Villmond Content Integration Framework
       The Framework helps organisations to build robust applications that support critical content-centric processes. It includes:
       • Support for multiple Content Management platforms, including commercial products like EMC Documentum and Open Source offerings like Alfresco and MongoDB. This allows reusability, maximising the return on investment (ROI) and avoiding vendor lock-in
       • A ready-to-use, carefully crafted set of services that supports the entire lifecycle of critical content, shortening development time and improving the overall quality of applications
       • A companion module that facilitates the migration of entire repositories between different platforms
  10. Adobe Flex
       • A front runner in RIA technology; it is cross-platform and cross-browser. It was conceived as a Domain Specific Language for rich UI development
       • Uses a combination of MXML and ActionScript, and integrates with backend services written in Java or .NET
       • The Adobe Flex SDK is Open Source. Adobe provides a comprehensive set of tools for development (Catalyst, Flash Builder, …)
       • Requires an Adobe Flash Player or Adobe AIR runtime, both available as free downloads
       • Has wide industry adoption
  11. SV at a glance – Backend features
       • Web services exposed as REST or Java method calls
       • Distributed sessions and master/slave configuration using Hazelcast (slave nodes can be added transparently)
       • Communication using JSON
       • The project is wired using the Spring Framework
       • Enterprise-class security managed with Spring Security (formerly Acegi Security)
       • LDAP access for user/role-based authentication
       • The classes that manage the access to MongoDB are decoupled and can be replicated (LUNA, LUNB, …)
  12. SV at a glance – Backend services
       [Architecture diagram. Component labels: ECM-Core, ECM-Mongo, SV Core and Service Implementation, AuthenticationService, DataService, ImportService, UnitManagementService, LDAP, MongoDB LUNA, MongoDB LUNB, Filesystem LUNA, Filesystem LUNB]
  13. SV at a glance – Backend services (cont.)
       • Authentication Service:
           • Provides the connection to LDAP for authentication and authorization credentials
           • Adds the configuration that controls which methods may be executed, depending on the roles granted
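The idea of allowing or refusing a method call depending on the caller's roles can be illustrated with a small sketch. It is written in Python rather than in the project's Java and Spring Security stack, and every name in it (`User`, `requires_role`, `purge_rejected_booklets`, the role names) is hypothetical; it only shows the general pattern of checking roles before a service method runs.

```python
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class User:
    """Hypothetical user record; in the real system roles come from LDAP."""
    name: str
    roles: set = field(default_factory=set)


class AccessDenied(Exception):
    """Raised when the caller lacks the role a method requires."""


def requires_role(role):
    """Decorator that runs the wrapped function only if the user has `role`."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise AccessDenied(f"{user.name} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires_role("supervisor")  # hypothetical role name
def purge_rejected_booklets(user, lot_id):
    # Placeholder body: a real service would delete data here.
    return f"lot {lot_id} purged by {user.name}"


alice = User("alice", {"supervisor"})
bob = User("bob", {"operator"})

print(purge_rejected_booklets(alice, "L-001"))
try:
    purge_rejected_booklets(bob, "L-001")
except AccessDenied as err:
    print("denied:", err)
```

Keeping the role check in a wrapper keeps the service body free of security code, which is broadly the same separation that declarative method security aims for.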
  14. SV at a glance – Backend services (cont.)
       • Data Service:
           • Retrieves imported booklets
           • Retrieves composed images
  15. SV at a glance – Backend services (cont.)
       • Import Service:
           • Validates the import of a lot without altering the database
           • Imports lots of booklets and control units
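Validating a lot before importing it, without touching the database, can be sketched as a pure function that only reports problems, followed by an import step that writes only when that report is empty. This is a minimal Python sketch under stated assumptions: the lot layout, the field names and the validation rules are all hypothetical, and a plain dict stands in for the database.

```python
import copy


def validate_lot(lot, existing_ids):
    """Return a list of problems; never modifies `lot` or `existing_ids`."""
    problems = []
    seen = set()
    for booklet in lot["booklets"]:
        booklet_id = booklet.get("id")
        if not booklet_id:
            problems.append("booklet without id")
            continue
        if booklet_id in existing_ids or booklet_id in seen:
            problems.append(f"duplicate booklet {booklet_id}")
        seen.add(booklet_id)
        if not booklet.get("pages"):
            problems.append(f"booklet {booklet_id} has no pages")
    return problems


def import_lot(lot, store):
    """Import the lot only if validation passes; `store` stands in for the database."""
    problems = validate_lot(lot, store.keys())
    if problems:
        return problems  # nothing was written
    for booklet in lot["booklets"]:
        store[booklet["id"]] = copy.deepcopy(booklet)
    return []


store = {"B-1": {"id": "B-1", "pages": ["p1.tif"]}}
bad_lot = {"booklets": [{"id": "B-1", "pages": ["p1.tif"]},
                        {"id": "B-2", "pages": []}]}
good_lot = {"booklets": [{"id": "B-3", "pages": ["p1.tif", "p2.tif"]}]}

print(import_lot(bad_lot, store))   # problems are reported, store is unchanged
print(import_lot(good_lot, store))  # no problems, booklet B-3 is stored
print(sorted(store))
```

Because `validate_lot` has no side effects, it can be called on its own as a dry run before an operator commits to the import.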
  16. SV at a glance – Backend services (cont.)
       • Unit Management Service:
           • Promotes control units through their different states
           • Manages purging of rejected and erroneous booklets
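State promotion and purging can be pictured as a small state machine plus a filter. The slides do not list the actual states or transitions, so the ones below (`imported`, `in_review`, `approved`, `rejected`) are purely hypothetical, as are the field names; the sketch only shows how disallowed promotions might be refused.

```python
# Hypothetical states and allowed transitions; the slides do not name the real ones.
ALLOWED = {
    "imported": {"in_review"},
    "in_review": {"approved", "rejected"},
    "approved": set(),
    "rejected": set(),
}


def promote(unit, new_state):
    """Return a copy of the control unit in `new_state`, or raise if not allowed."""
    current = unit["state"]
    if new_state not in ALLOWED[current]:
        raise ValueError(
            f"cannot move unit {unit['id']} from {current} to {new_state}"
        )
    return {**unit, "state": new_state}


def purge(booklets):
    """Keep only booklets that are neither rejected nor erroneous."""
    return [b for b in booklets if b["status"] not in {"rejected", "erroneous"}]


unit = {"id": "CU-7", "state": "imported"}
unit = promote(unit, "in_review")
unit = promote(unit, "approved")
print(unit)

booklets = [
    {"id": "B-1", "status": "ok"},
    {"id": "B-2", "status": "rejected"},
    {"id": "B-3", "status": "erroneous"},
]
print(purge(booklets))
```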
  17. Design issues concerning MongoDB
       • Booklet metadata is stored in MongoDB
       • Two MongoDB databases, one for LUNA and one for LUNB, which could be configured as replica sets
       • Booklets and booklet pages are stored as composed records in a collection, so they can be found quickly during a search
       • Deletions and state promotions are performed in the background
       • MongoDB slaves can potentially be accessed concurrently from many Tomcat instances. The synchronization is accomplished using Hazelcast
       • Booklet importing is executed on the master node/primary MongoDB instance
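A composed record, where a booklet carries its pages inside the same document, might look like the sketch below. The field names and values are hypothetical, since the slides do not show the real schema. The search itself is done in plain Python so the example runs without a database; the `mongo_filter` dictionary shows the dot-notation form that MongoDB accepts for matching a field inside an array of embedded documents.

```python
# Hypothetical shape of a booklet stored as one composed document:
# the pages are embedded, so one lookup returns the booklet and its pages.
booklet = {
    "_id": "B-000123",
    "lot": "L-0042",
    "state": "imported",
    "pages": [
        {"number": 1, "image": "B-000123_p1.tif", "status": "ok"},
        {"number": 2, "image": "B-000123_p2.tif", "status": "unreadable"},
    ],
}

# Equivalent MongoDB query filter (dot notation into the embedded pages).
mongo_filter = {"pages.status": "unreadable"}


def booklets_with_page_status(booklets, status):
    """In-memory stand-in for the query: booklets having a page with `status`."""
    return [
        b for b in booklets
        if any(page["status"] == status for page in b["pages"])
    ]


matches = booklets_with_page_status([booklet], "unreadable")
print([b["_id"] for b in matches])
```

Embedding the pages trades some update flexibility for read speed: a reviewer opening a booklet gets all page metadata in a single read.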
  18. Afterthoughts / lessons learnt
       • MongoDB and Adobe Flex are a great set of tools for rich content applications
       • The data model is essential
       • Content could be stored in the database as well, to facilitate enforcement of the appropriate lifecycle
       • The Java driver is great and easy to use
       • Currently, we are using our own mapping mechanism for the DTOs (Data Transfer Objects), but we may evaluate Morphia in the future
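A hand-written mapping between DTOs and database documents usually comes down to two small functions, one per direction. The sketch below illustrates that pattern in Python with a dataclass; it is not the project's Java mapping code, and `BookletDTO`, its fields and the `_id` renaming rule are all hypothetical.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class BookletDTO:
    """Hypothetical data transfer object for a booklet."""
    booklet_id: str
    lot: str
    page_count: int


def to_document(dto):
    """Map a DTO to a document dict, using the DTO id as the document `_id`."""
    doc = asdict(dto)
    doc["_id"] = doc.pop("booklet_id")
    return doc


def from_document(doc):
    """Map a document dict back to a DTO, ignoring fields the DTO does not know."""
    data = dict(doc)  # copy so the caller's document is left untouched
    data["booklet_id"] = data.pop("_id")
    known = {f.name for f in fields(BookletDTO)}
    return BookletDTO(**{k: v for k, v in data.items() if k in known})


dto = BookletDTO(booklet_id="B-000123", lot="L-0042", page_count=8)
doc = to_document(dto)
print(doc)
print(from_document(doc) == dto)
```

An object-document mapper such as Morphia automates this kind of translation, typically at the cost of less direct control over the stored document shape.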
  19. Thank you!
