MongoDB in the context of the Argentinean Census 2010

  • 01/12/11

Transcript

  • 1. MongoDB in the context of the Argentinean Census 2010. Mongo France 2011. Victorio J. BENTIVOGLI, Villmond Luxembourg
  • 2. Who are we?
    • Villmond Luxembourg is an Enterprise Content Management (ECM) and Integration services provider established in 2005
    • Hosted at the Technoport, the first technology-oriented business incubator in Luxembourg, part of the Public Research Center Henri Tudor
    • Delivers strategic Content Management, Collaboration and Integration solutions based on Open Source and proprietary software to some of the most demanding organisations
  • 3. The Argentinean Census 2010 – The context
    • Around 43.000.000 inhabitants
    • 200.000.000 images to process
    • 4 months to complete the processing of every single form
    • 15 days to produce a working mockup of a QA system
  • 4. The Argentinean Census 2010 – Partners
  • 5. The Argentinean Census 2010 – The QA system
    • Aimed at controlling the quality of processed booklets
    • Shows images and metadata of scanned booklets
    • Operated 24x7 and used by 120 concurrent users
  • 6. The Argentinean Census 2010 – The challenge
    • We needed a rapid development cycle because the specifications were a moving target, and
    • … a really scalable Content Management System that could cope with 200.000.000 documents/images
    • … not only scalable but also fast at insertions, with 14 scanners working simultaneously 24x7 and importing around 2.000.000 images daily
    • … reliable, with enterprise-class security and content transformation capabilities
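
The figures on slides 3 and 6 can be cross-checked with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not part of the project; it assumes that "4 months" means 120 days of round-the-clock operation, and the class and method names are made up for illustration.

```java
// Back-of-the-envelope sizing from the figures quoted on slides 3 and 6.
// Assumption: "4 months" is taken as 120 days of continuous operation.
class CensusThroughput {
    static final long TOTAL_IMAGES = 200_000_000L;
    static final int PROCESSING_DAYS = 120;      // assumed: 4 months of 30 days
    static final long DAILY_IMPORT = 2_000_000L; // quoted daily import volume
    static final int SCANNERS = 14;

    // Average number of images that must be processed per day to finish on time.
    static double requiredPerDay() {
        return (double) TOTAL_IMAGES / PROCESSING_DAYS;
    }

    // Sustained insert rate implied by the quoted daily import volume.
    static double insertsPerSecond() {
        return DAILY_IMPORT / (24.0 * 60 * 60);
    }

    // Share of the daily volume carried by each scanner.
    static double perScannerPerDay() {
        return (double) DAILY_IMPORT / SCANNERS;
    }

    public static void main(String[] args) {
        System.out.println("required per day:    " + Math.round(requiredPerDay()));
        System.out.println("inserts per second:  " + Math.round(insertsPerSecond()));
        System.out.println("per scanner per day: " + Math.round(perScannerPerDay()));
    }
}
```

Under that assumption the deadline needs roughly 1.67 million images per day, so the quoted 2.000.000 daily imports leave some headroom; they correspond to about 23 inserts per second, or about 143.000 images per scanner per day.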
  • 7. The Argentinean Census 2010 – Our proposal
    • For the backend: use MongoDB as the underlying database, coupled with our Villmond Content Integration Framework to complete the solution
    • For the frontend: develop an Adobe Flex based client, encapsulated in an Adobe AIR container
  • 8. MongoDB
    • Document-oriented storage
    • Full Index Support
    • Replication & High Availability
    • Querying
    • Map/Reduce
    • GridFS
    • Commercial Support
  • 9. Villmond Content Integration Framework
    • The Framework helps organisations to build robust applications that support critical content centric processes. It includes:
    • Support for multiple Content Management platforms, including commercial products like EMC Documentum and Open Source offerings like Alfresco and MongoDB. This allows reusability, maximising the return on investment (ROI) and avoiding vendor lock-in
    • A ready-to-use, carefully crafted set of services that supports the entire lifecycle of critical content, shortening development time and improving the overall quality of applications
    • A companion module that facilitates the migration of entire repositories between different platforms
  • 10. Adobe Flex
    • Front runner in RIA technology, it is cross platform, cross browser. It was conceived as a Domain Specific Language for rich UI development
    • Uses a combination of MXML and ActionScript; and integrates with backend services written in Java or .Net
    • Adobe Flex SDK is Open Source. Adobe provides a comprehensive set of tools for development (Catalyst, Flash Builder, …)
    • Requires the Adobe Flash Player or Adobe AIR runtime, both available as free downloads
    • Widely adopted in the industry
  • 11. SV at a glance – Backend features
    • Web services exposed as REST or Java method calls
    • Distributed sessions and master/slave configuration using Hazelcast (slave nodes can be added transparently)
    • Communication using JSON
    • The project is wired using the Spring Framework
    • Enterprise-class security managed with Spring Security (formerly Acegi Security)
    • LDAP access for user- and role-based authentication
    • The classes that manage the access to MongoDB are decoupled and can be replicated (LUNA, LUNB, …)
  • 12. SV at a glance – Backend services
    [Architecture diagram. Components: ECM-Core, ECM-Mongo, SV Core and Service Implementation. Services: AuthenticationService, DataService, ImportService, UnitManagementService. Back ends: LDAP, MongoDB LUNA, MongoDB LUNB, Filesystem LUNA, Filesystem LUNB]
  • 13. SV at a glance – Backend services (cont.)
    • Authentication Service:
      • Provides the connection to LDAP for authentication and authorization credentials
      • Adds the configuration to manage execution of methods depending on the given roles
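
The idea of allowing or refusing a method call depending on the caller's roles can be sketched in plain Java. This is only an illustration: the role names, operation names and the `RoleGuard` class are hypothetical, and the slides do not say how the rules were configured. With Spring Security the same effect is usually obtained declaratively, for example with annotations such as `@Secured` or `@PreAuthorize`.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: role and operation names are illustrative, not from the slides.
class RoleGuard {
    // Assumed mapping from a service operation to the roles allowed to call it.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
        "importLot", Set.of("ROLE_IMPORTER"),
        "promoteUnit", Set.of("ROLE_SUPERVISOR"),
        "viewBooklet", Set.of("ROLE_OPERATOR", "ROLE_SUPERVISOR"));

    // Returns true when at least one of the user's roles is allowed to run the operation.
    // Unknown operations are refused for everybody.
    static boolean mayExecute(String operation, Set<String> userRoles) {
        Set<String> allowed = ALLOWED.getOrDefault(operation, Set.of());
        return userRoles.stream().anyMatch(allowed::contains);
    }

    public static void main(String[] args) {
        Set<String> operator = Set.of("ROLE_OPERATOR");
        System.out.println("viewBooklet allowed: " + mayExecute("viewBooklet", operator));
        System.out.println("importLot allowed:   " + mayExecute("importLot", operator));
    }
}
```

In a real deployment the user's roles would come from LDAP rather than from a literal set.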
  • 14. SV at a glance – Backend services (cont.)
    • Data Service:
      • Retrieves imported booklets
      • Retrieves composed images
  • 15. SV at a glance – Backend services (cont.)
    • Import Service:
      • Validates lot importing without altering the database
      • Imports lots of booklets and control units
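
The split between a validation pass that leaves the database untouched and the actual import can be sketched as follows. This is a minimal illustration under stated assumptions: an in-memory map stands in for the booklet collection, the validation rules (non-empty identifier, not already imported) are invented, and `LotImporter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory map stands in for the booklet collection,
// and the validation rules are assumptions made for illustration.
class LotImporter {
    private final Map<String, String> store = new HashMap<>(); // bookletId -> lotId

    // Dry run: reports problems but never modifies the store.
    List<String> validate(List<String> bookletIds) {
        List<String> errors = new ArrayList<>();
        for (String id : bookletIds) {
            if (id == null || id.isEmpty()) {
                errors.add("empty booklet id");
            } else if (store.containsKey(id)) {
                errors.add("already imported: " + id);
            }
        }
        return errors;
    }

    // Writes the lot only if validation found no problem; returns the validation errors.
    List<String> importLot(String lotId, List<String> bookletIds) {
        List<String> errors = validate(bookletIds);
        if (errors.isEmpty()) {
            for (String id : bookletIds) {
                store.put(id, lotId);
            }
        }
        return errors;
    }

    // Number of booklets currently stored.
    int size() {
        return store.size();
    }

    public static void main(String[] args) {
        LotImporter importer = new LotImporter();
        System.out.println("errors: " + importer.importLot("lot-1", List.of("b1", "b2")));
        System.out.println("stored: " + importer.size());
    }
}
```

Because `importLot` reuses `validate`, a lot that fails the check is rejected as a whole and nothing from it is written.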
  • 16. SV at a glance – Backend services (cont.)
    • Unit Management Service:
      • Promotes control units through the different states
      • Manages purging of rejected and erroneous booklets
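
Promoting a control unit from one state to the next is essentially a small state machine. The slides do not name the states, so the four used below are placeholders, and `ControlUnit` and `UnitState` are hypothetical names; the sketch only shows how illegal promotions can be refused.

```java
// Hypothetical sketch: the slides do not name the states; these four are placeholders.
class ControlUnit {
    private UnitState state = UnitState.IMPORTED;

    UnitState state() {
        return state;
    }

    // Applies the transition, or fails loudly if it is not allowed.
    void promote(UnitState next) {
        if (!state.canPromoteTo(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }

    public static void main(String[] args) {
        ControlUnit unit = new ControlUnit();
        unit.promote(UnitState.IN_REVIEW);
        unit.promote(UnitState.APPROVED);
        System.out.println("final state: " + unit.state());
    }
}

enum UnitState {
    IMPORTED, IN_REVIEW, APPROVED, REJECTED;

    // A unit only moves forward: IMPORTED -> IN_REVIEW -> APPROVED or REJECTED.
    boolean canPromoteTo(UnitState next) {
        switch (this) {
            case IMPORTED:
                return next == IN_REVIEW;
            case IN_REVIEW:
                return next == APPROVED || next == REJECTED;
            default:
                return false; // APPROVED and REJECTED are terminal in this sketch
        }
    }
}
```

In this sketch a unit that ends up `REJECTED` would be a candidate for the purging mentioned above.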
  • 17. Design issues concerning MongoDB
    • Booklet metadata is stored in MongoDB
    • Two MongoDB databases, one for LUNA and one for LUNB; they could be configured as replica sets
    • Booklets and booklet pages are stored as composed records in a collection, so they can be found quickly during a search
    • Deletions and state promotions are performed in the background
    • MongoDB slaves can potentially be accessed concurrently from many Tomcat instances. Synchronization is handled by Hazelcast
    • Booklet importing is executed on the master node/primary MongoDB instance
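
A composed record, with the pages embedded in the booklet, can be pictured with plain nested maps, which is close to how a document looks to the MongoDB Java driver (its `BasicDBObject` is itself a map). The field names below are illustrative and not taken from the project, and the sketch assumes that only metadata goes into the database while each page points to an image file on the filesystem.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: field names are illustrative, not taken from the project.
class BookletDocument {
    // Builds one booklet record with its pages embedded, mirroring a single document.
    static Map<String, Object> booklet(String id, String lot, String state,
                                       List<String> imagePaths) {
        List<Map<String, Object>> pages = new ArrayList<>();
        for (int i = 0; i < imagePaths.size(); i++) {
            Map<String, Object> page = new LinkedHashMap<>();
            page.put("number", i + 1);            // 1-based page number
            page.put("image", imagePaths.get(i)); // assumed: path to the scanned image file
            pages.add(page);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("lot", lot);
        doc.put("state", state);
        doc.put("pages", pages);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = booklet("booklet-1", "lot-7", "IMPORTED",
                List.of("/scans/booklet-1/p1.tif", "/scans/booklet-1/p2.tif"));
        System.out.println(doc);
    }
}
```

Keeping the pages inside the booklet means a single lookup returns everything the QA client needs to display a booklet.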
  • 18. Afterthoughts / lessons learnt
    • MongoDB and Adobe Flex are a great set of tools for rich content applications
    • The data model is essential
    • Content could also be stored in the database, to make it easier to enforce the appropriate lifecycle
    • The Java driver is great / easy to use
    • Currently, we are using our own mapping mechanism for the DTOs (Data Transfer Objects), but we may evaluate Morphia in the future
  • 19.
    • Thank you!