MongoDB in the context of the Argentinean Census 2010
Mongo France 2011
Victorio J. BENTIVOGLI
Villmond Luxembourg
Who are we?
• Villmond Luxembourg is an Enterprise Content Management (ECM) and integration services provider established in 2005
• Hosted at the Technoport, the first technology-oriented business incubator in Luxembourg, part of the Public Research Center Henri Tudor
• Delivers strategic content management, collaboration and integration solutions based on open source and proprietary software to some of the most demanding organisations
The Argentinean Census 2010 – The context
• Around 43,000,000 inhabitants
• 200,000,000 images to process
• 4 months to complete the processing of every single form
• 15 days to produce a working mockup of a QA system
The Argentinean Census 2010 –  Partners
The Argentinean Census 2010 – The QA system
• Aimed at controlling the quality of processed booklets
• Shows images and metadata of scanned booklets
• Operated 24x7
• Used by 120 concurrent users
The Argentinean Census 2010 – The challenge
We needed a rapid development cycle, because the specifications were a moving target, and a Content Management System that was:
• scalable enough to cope with 200,000,000 documents/images
• fast for insertions, with 14 scanners working simultaneously 24x7 and importing around 2,000,000 images daily
• equipped with enterprise-class security and content transformation capabilities
• reliable
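The figures quoted for the census (200 million images, about 2 million imported per day, 14 scanners, a 4-month window) imply a sustained insert load that is easy to work out. The following is a back-of-the-envelope sketch only: reading "4 months" as 120 days is an assumption, and the class and method names are invented for illustration.

```java
/** Back-of-the-envelope ingest arithmetic based on the figures quoted on the slides. */
final class IngestArithmetic {
    static final long TOTAL_IMAGES = 200_000_000L;  // from the slides
    static final long IMAGES_PER_DAY = 2_000_000L;  // from the slides
    static final int SCANNERS = 14;                 // from the slides
    static final int PROCESSING_DAYS = 120;         // assumption: "4 months" read as 120 days

    /** Days needed to import every image at the stated daily rate. */
    static long daysAtStatedRate() {
        return TOTAL_IMAGES / IMAGES_PER_DAY;
    }

    /** Minimum daily rate (rounded up) needed to finish inside the processing window. */
    static long requiredImagesPerDay() {
        return (TOTAL_IMAGES + PROCESSING_DAYS - 1) / PROCESSING_DAYS;
    }

    /** Average inserts per second over a 24-hour day at the stated daily rate. */
    static double imagesPerSecond() {
        return IMAGES_PER_DAY / 86_400.0;
    }

    /** Average daily load per scanner at the stated daily rate (rounded down). */
    static long imagesPerScannerPerDay() {
        return IMAGES_PER_DAY / SCANNERS;
    }

    public static void main(String[] args) {
        System.out.println("Days at stated rate:       " + daysAtStatedRate());
        System.out.println("Required images per day:   " + requiredImagesPerDay());
        System.out.println("Images per second:         " + imagesPerSecond());
        System.out.println("Images per scanner per day: " + imagesPerScannerPerDay());
    }
}
```

Under these assumptions the stated rate clears the full volume in about 100 days, which leaves some slack inside a 120-day window, and the average load is roughly 23 inserts per second.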
The Argentinean Census 2010 – Our proposal
• For the backend: use MongoDB as the underlying database, coupled with our Villmond Content Integration Framework to complete the solution
• For the frontend: develop an Adobe Flex based client, encapsulated in an Adobe AIR container
MongoDB
• Document-oriented storage
• Full index support
• Replication & high availability
• Querying
• Map/Reduce
• GridFS
• Commercial support
Villmond Content Integration Framework
The Framework helps organisations to build robust applications that support critical content-centric processes. It includes:
• Support for multiple Content Management platforms, including commercial products like EMC Documentum and Open Source offerings like Alfresco and MongoDB. This allows reusability, maximising the return on investment (ROI) and avoiding vendor lock-in
• A ready-to-use, carefully crafted set of services that supports the entire lifecycle of critical content, shortening development time and improving the overall quality of applications
• A companion module that facilitates the migration of entire repositories between different platforms
Adobe Flex
• A front runner in RIA technology; it is cross-platform and cross-browser. It was conceived as a Domain Specific Language for rich UI development
• Uses a combination of MXML and ActionScript, and integrates with backend services written in Java or .NET
• The Adobe Flex SDK is Open Source. Adobe provides a comprehensive set of tools for development (Catalyst, Flash Builder, …)
• Requires the Adobe Flash Player or Adobe AIR runtime, both available as free downloads
• Has wide industry adoption
SV at a glance – Backend features
• Web services exposed as REST or Java method calls
• Distributed sessions and master/slaves configuration using Hazelcast (slave nodes can be added transparently)
• Communication using JSON
• The project is wired using the Spring Framework
• Enterprise-class security managed with Spring Security (formerly ACEGI security)
• LDAP access for user/role based authentication
• The classes that manage the access to MongoDB are decoupled and can be replicated (LUNA, LUNB, …)
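One way to read "decoupled and can be replicated (LUNA, LUNB, …)" is that the services talk to a storage contract, and one implementation is registered per storage unit. The sketch below illustrates that idea under stated assumptions: `BookletStore`, `InMemoryBookletStore` and `StoreRegistry` are hypothetical names, and an in-memory map stands in for the MongoDB-backed classes so the example runs without a database.

```java
import java.util.HashMap;
import java.util.Map;

/** Routes calls to the store registered for a storage unit such as "LUNA" or "LUNB". */
final class StoreRegistry {
    private final Map<String, BookletStore> stores = new HashMap<>();

    void register(String unitName, BookletStore store) {
        stores.put(unitName, store);
    }

    BookletStore forUnit(String unitName) {
        BookletStore store = stores.get(unitName);
        if (store == null) {
            throw new IllegalArgumentException("Unknown storage unit: " + unitName);
        }
        return store;
    }

    public static void main(String[] args) {
        StoreRegistry registry = new StoreRegistry();
        registry.register("LUNA", new InMemoryBookletStore());
        registry.register("LUNB", new InMemoryBookletStore());
        registry.forUnit("LUNA").save("b-1", "{\"pages\":4}");
        System.out.println(registry.forUnit("LUNA").find("b-1"));
    }
}

/** Hypothetical storage contract; a MongoDB-backed class would implement it in a real system. */
interface BookletStore {
    void save(String bookletId, String metadataJson);

    /** Returns the stored metadata, or null when the booklet is unknown. */
    String find(String bookletId);
}

/** In-memory stand-in (assumption) used only to keep the sketch self-contained. */
final class InMemoryBookletStore implements BookletStore {
    private final Map<String, String> data = new HashMap<>();

    public void save(String bookletId, String metadataJson) {
        data.put(bookletId, metadataJson);
    }

    public String find(String bookletId) {
        return data.get(bookletId);
    }
}
```

Adding another unit is then a matter of registering one more store; the calling services do not change.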
SV at a glance – Backend services
[Architecture diagram. Components shown: SV Core and Service Implementation (AuthenticationService, DataService, ImportService, UnitManagementService); ECM-Core; ECM-Mongo; MongoDB LUNA; MongoDB LUNB; Filesystem LUNA; Filesystem LUNB; LDAP]
SV at a glance – Backend services (cont.)
Authentication Service:
• Provides the connection to LDAP for authentication and authorization credentials
• Adds the configuration to manage execution of methods depending on the given roles
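Managing "execution of methods depending on the given roles" is typically done declaratively; Spring Security offers annotations such as `@Secured` for this. The sketch below is not the project's configuration. It is a plain-Java illustration of the same idea, with a hypothetical `@RequiresRole` annotation, a hypothetical `UnitOperations` service and invented role names, so that it runs without Spring or LDAP.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Checks the caller's roles before invoking a method (illustration only). */
final class RoleGuard {
    /** Invokes a public no-argument method by name if the caller holds the required role. */
    static Object invoke(Object target, String methodName, Set<String> callerRoles) {
        try {
            Method method = target.getClass().getMethod(methodName);
            RequiresRole required = method.getAnnotation(RequiresRole.class);
            if (required != null && !callerRoles.contains(required.value())) {
                throw new SecurityException("Role " + required.value() + " required for " + methodName);
            }
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot invoke " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Set<String> operator = new HashSet<>(Arrays.asList("OPERATOR"));
        UnitOperations ops = new UnitOperations();
        System.out.println(invoke(ops, "listUnits", operator));
        try {
            invoke(ops, "purgeRejected", operator);
        } catch (SecurityException e) {
            System.out.println("Denied: " + e.getMessage());
        }
    }
}

/** Hypothetical marker naming the role a caller needs to run the annotated method. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RequiresRole {
    String value();
}

/** Hypothetical service with one protected and one unprotected operation. */
class UnitOperations {
    @RequiresRole("SUPERVISOR")
    public String purgeRejected() {
        return "purged";
    }

    public String listUnits() {
        return "units";
    }
}
```

In a real deployment the role set would come from the authenticated LDAP user rather than being passed in by hand.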
SV at a glance – Backend services (cont.)
Data Service:
• Retrieves imported booklets
• Retrieves composed images
SV at a glance – Backend services (cont.)
Import Service:
• Validates lot importing without altering the database
• Imports lots of booklets and control units
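The two Import Service responsibilities suggest a "validate first, then import" pattern, where validation reads the store but never writes to it. The following is a minimal sketch of that pattern, not the actual service: the `LotImporter` class, its validation rules (empty ids, duplicates within the lot, ids already imported) and the in-memory set standing in for the database are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of "validate first, then import"; validation never modifies the stored data. */
final class LotImporter {
    private final Set<String> storedBookletIds = new HashSet<>(); // stand-in for the database

    /** Returns the problems found in a lot; reads the store but does not change it. */
    List<String> validate(List<String> bookletIds) {
        List<String> errors = new ArrayList<>();
        Set<String> seenInLot = new HashSet<>();
        for (String id : bookletIds) {
            if (id == null || id.trim().isEmpty()) {
                errors.add("Empty booklet id");
            } else if (!seenInLot.add(id)) {
                errors.add("Duplicate in lot: " + id);
            } else if (storedBookletIds.contains(id)) {
                errors.add("Already imported: " + id);
            }
        }
        return errors;
    }

    /** Imports the lot only when validation reports no problems; returns true on success. */
    boolean importLot(List<String> bookletIds) {
        if (!validate(bookletIds).isEmpty()) {
            return false; // nothing is written for an invalid lot
        }
        storedBookletIds.addAll(bookletIds);
        return true;
    }

    int storedCount() {
        return storedBookletIds.size();
    }

    public static void main(String[] args) {
        LotImporter importer = new LotImporter();
        System.out.println(importer.importLot(Arrays.asList("b-1", "b-2")));
        System.out.println(importer.validate(Arrays.asList("b-2", "b-3")));
    }
}
```

Because `importLot` reuses `validate`, a dry run and a real import apply the same rules, and a rejected lot leaves the store untouched.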
SV at a glance – Backend services (cont.)
Unit Management Service:
• Promotes the control units to the different states
• Manages purging of rejected and erroneous booklets
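Promoting control units through states and purging rejected ones can be pictured as a small state machine. The slides do not name the actual states, so the life cycle below (`IMPORTED`, `IN_REVIEW`, `APPROVED`, `REJECTED`) is purely hypothetical, as are the class names; the sketch only illustrates guarded transitions followed by a purge.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Removes rejected units from a working list (illustration only). */
final class UnitPurger {
    /** Deletes rejected units from the list and returns how many were purged. */
    static int purgeRejected(List<ControlUnit> units) {
        int purged = 0;
        for (Iterator<ControlUnit> it = units.iterator(); it.hasNext();) {
            if (it.next().state() == UnitState.REJECTED) {
                it.remove();
                purged++;
            }
        }
        return purged;
    }

    public static void main(String[] args) {
        ControlUnit unit = new ControlUnit("u-1");
        unit.promoteTo(UnitState.IN_REVIEW);
        unit.promoteTo(UnitState.REJECTED);
        List<ControlUnit> units = new ArrayList<>();
        units.add(unit);
        System.out.println("Purged: " + purgeRejected(units));
    }
}

/** Hypothetical life cycle; the real state names are not given on the slides. */
enum UnitState {
    IMPORTED, IN_REVIEW, APPROVED, REJECTED;

    /** True when a unit may move from this state to the target state. */
    boolean canPromoteTo(UnitState target) {
        switch (this) {
            case IMPORTED:
                return target == IN_REVIEW;
            case IN_REVIEW:
                return target == APPROVED || target == REJECTED;
            default:
                return false; // APPROVED and REJECTED are final in this sketch
        }
    }
}

/** A control unit that only accepts transitions allowed by UnitState. */
final class ControlUnit {
    final String id;
    private UnitState state = UnitState.IMPORTED;

    ControlUnit(String id) {
        this.id = id;
    }

    UnitState state() {
        return state;
    }

    void promoteTo(UnitState target) {
        if (!state.canPromoteTo(target)) {
            throw new IllegalStateException(id + ": cannot go from " + state + " to " + target);
        }
        state = target;
    }
}
```

Keeping the allowed transitions in one place makes it harder for a unit to skip review or to leave a final state by accident.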
Design issues concerning MongoDB
• Booklet metadata is stored in MongoDB
• Two MongoDB databases, one for LUNA and one for LUNB, which could be configured as replica sets
• Booklets and booklet pages are composed records in a collection, so they can be found quickly during a search
• Deletions and state promotions are performed in the background
• MongoDB slaves can potentially be accessed concurrently from many Tomcats. The synchronization is accomplished using Hazelcast
• Booklet importing is executed on the master node/primary MongoDB instance
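The "composed records" point can be illustrated with a booklet that embeds its pages in a single document, so a search returns the booklet and its pages together. The sketch below uses plain Java maps to mimic the document shape rather than the MongoDB driver. Apart from `_id`, which is MongoDB's standard identifier field, the field names (`state`, `pages`, `number`, `image`) and the class name are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Builds a booklet as one composed record with its pages embedded, using plain maps. */
final class BookletDocuments {

    /** Creates the booklet document; pages are numbered from 1 in the order given. */
    static Map<String, Object> booklet(String id, String state, String... pageImages) {
        List<Map<String, Object>> pages = new ArrayList<>();
        for (int i = 0; i < pageImages.length; i++) {
            Map<String, Object> page = new LinkedHashMap<>();
            page.put("number", i + 1);
            page.put("image", pageImages[i]); // hypothetical reference to the scanned image
            pages.add(page);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("state", state);
        doc.put("pages", pages);
        return doc;
    }

    /** Returns the image reference of a page, or null if the booklet has no such page. */
    @SuppressWarnings("unchecked")
    static String imageOfPage(Map<String, Object> bookletDoc, int pageNumber) {
        List<Map<String, Object>> pages = (List<Map<String, Object>>) bookletDoc.get("pages");
        for (Map<String, Object> page : pages) {
            if (((Integer) page.get("number")).intValue() == pageNumber) {
                return (String) page.get("image");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = booklet("b-1", "IMPORTED", "p1.tif", "p2.tif");
        System.out.println(doc);
        System.out.println(imageOfPage(doc, 2));
    }
}
```

Since the pages travel with the booklet, looking up a page needs no second query or join, which is the usual motivation for embedding in a document database.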
Afterthoughts / lessons learnt
• MongoDB and Adobe Flex are a great set of tools for rich content applications
• The data model is essential
• Content might also be stored in the database, to make it easier to enforce the appropriate lifecycle
• The Java driver is great and easy to use
• Currently, we are using our own mapping mechanism for the DTOs (Data Transfer Objects), but we would evaluate Morphia in the future
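A hand-written mapping mechanism for DTOs usually amounts to a pair of conversion methods per class. The sketch below shows what such a mapper might look like; it is not the project's code. `BookletDto`, `BookletMapper` and the field names are hypothetical, and a plain map stands in for the driver's document type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hand-written mapping between a DTO and a document-shaped map (illustration only). */
final class BookletMapper {
    static Map<String, Object> toDocument(BookletDto dto) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", dto.id);
        doc.put("lot", dto.lot);
        doc.put("pageCount", dto.pageCount);
        return doc;
    }

    static BookletDto fromDocument(Map<String, Object> doc) {
        BookletDto dto = new BookletDto();
        dto.id = (String) doc.get("_id");
        dto.lot = (String) doc.get("lot");
        Object pages = doc.get("pageCount");
        // Missing counts default to 0; any numeric type is accepted.
        dto.pageCount = pages == null ? 0 : ((Number) pages).intValue();
        return dto;
    }

    public static void main(String[] args) {
        BookletDto dto = new BookletDto();
        dto.id = "b-7";
        dto.lot = "lot-3";
        dto.pageCount = 12;
        System.out.println(toDocument(dto));
    }
}

/** Hypothetical DTO; the project's real DTO classes are not shown on the slides. */
final class BookletDto {
    String id;
    String lot;
    int pageCount;
}
```

An object-document mapper such as Morphia replaces this kind of per-class boilerplate with annotations on the DTO, which is the trade-off the last bullet alludes to.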
Thank you!
