Good morning, and thank you for having me here. It is my pleasure to speak about our work.
Relatively high-level architecture. Challenges and solutions for migrating large and complex relational data to MongoDB, and where MongoDB fits.
1 min. ~20 datasets. More than 11 million records: digitized records and original paper records.
1 min. This is work in progress. Started in April 2010; beta released in April 2011. Features: basic search, advanced search, browsing the hierarchy, and showing search results on a map.
2 min. Clean UI. Unified access to all the different data sources.
2 min. Browsing: navigating the catalogue hierarchy.
1 min. Procedia Computer Science, 2010. CMS: Compact Muon Solenoid. The Data Aggregation System (DAS) provides the ability to search and aggregate information across different data services. This was our first encounter with MongoDB.
1 min. TNA databases: more than 2,000 tables and more than 15,000 attributes. Relational databases with complex SQL scripts, stored procedures, and views. Joins between tables and across databases. Front-end and back-end systems. Data sets of different shapes and sizes. Performance does not meet requirements.
3 min. InformationAsset is the most important business object in the entire architecture. It models the information asset object (also known as Record, Resource, etc.). DetailedAssetView is a detailed view of the InformationAsset. AssetView provides a lightweight view of the InformationAsset: the information asset identity and its date range.
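To make the three objects concrete, here is a minimal Python sketch (the real system is .NET; Python is used only for illustration). All field names (`asset_id`, `title`, `start`, `end`, `properties`) and the `ASSET_VIEW_PROJECTION` constant are hypothetical. It shows AssetView as a projection of InformationAsset down to identity plus date range, and DetailedAssetView as the full set of fields.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


@dataclass(frozen=True)
class AssetView:
    """Lightweight view: identity plus date range only."""
    asset_id: str
    start: date
    end: date


@dataclass
class InformationAsset:
    """Core business object (a.k.a. Record or Resource); field names are hypothetical."""
    asset_id: str
    title: str
    start: date
    end: date
    properties: Dict[str, str] = field(default_factory=dict)

    def asset_view(self) -> AssetView:
        """Project the asset down to its identity and date range."""
        return AssetView(self.asset_id, self.start, self.end)

    def detailed_view(self) -> dict:
        """Full view: every field, including the open-ended property bag."""
        return {
            "asset_id": self.asset_id,
            "title": self.title,
            "start": self.start.isoformat(),
            "end": self.end.isoformat(),
            "properties": dict(self.properties),
        }


# In MongoDB the lightweight view maps naturally onto a query projection
# that returns only the identity and date-range fields.
ASSET_VIEW_PROJECTION = {"asset_id": 1, "start": 1, "end": 1, "_id": 0}
```

Keeping the property bag open-ended mirrors the point of the architecture: the heavy object can carry whatever metadata a given dataset supplies, while the lightweight view stays small and uniform.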
1 min. How do we do this in practice? The final destination for the data is MongoDB. Step 1: a SQL database, with relations, views, stored procedures, functions, and jobs to support business processes.
3 min. How does it work? Left table: Information Asset properties. Right table: integer pointers. ETL scripts extract, transform, and load the EAV database. In the EAV model, an Information Asset is represented as a (SourceItemId, SourceFieldId, SourceTableId, SourceDataBaseId) structure. The EAV database tables hold only integers and need no joins, which makes metadata extraction extremely fast. The ETL scripts implement no business rules, just simple mappings. Why EAV? However many different data sets feed it, the output is always the same. It allows legacy systems to be decommissioned safely. EAV supports delta updates of the Catalogue (~6,000 a day). 0.5 billion rows; 11 million Information Assets.
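The integer-pointer idea can be sketched as follows. This is an illustration under assumptions, not the TNA schema: the `asset_id` and `property_id` columns, the `PROPERTIES` table contents, and the nested `LEGACY` dictionary standing in for the source databases are all hypothetical. Each EAV row holds only integers; resolving it means following the pointer back into the source, and a delta update replaces only the rows that changed.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class EavRow(NamedTuple):
    """One EAV row: integers only, readable without joins."""
    asset_id: int            # hypothetical Information Asset key
    property_id: int         # hypothetical key into the properties ("left") table
    source_item_id: int      # SourceItemId: row in the legacy table
    source_field_id: int     # SourceFieldId: column in the legacy table
    source_table_id: int     # SourceTableId
    source_database_id: int  # SourceDataBaseId


# "Left" table: Information Asset properties (hypothetical names).
PROPERTIES = {1: "Title", 2: "CoveringDates"}

# Stand-in for the legacy systems: database -> table -> item -> field -> value.
LEGACY = {10: {100: {5001: {1: "Example title", 2: "1900-1950"}}}}


def resolve(row: EavRow) -> str:
    """Follow the integer pointer back to the value held in the legacy source."""
    return LEGACY[row.source_database_id][row.source_table_id][row.source_item_id][row.source_field_id]


def assemble(rows: List[EavRow]) -> Dict[int, Dict[str, str]]:
    """Group EAV rows into one {property name: value} dict per Information Asset."""
    assets: Dict[int, Dict[str, str]] = defaultdict(dict)
    for row in rows:
        assets[row.asset_id][PROPERTIES[row.property_id]] = resolve(row)
    return dict(assets)


def apply_delta(rows: List[EavRow], changed: List[EavRow]) -> List[EavRow]:
    """Delta update: replace rows whose (asset, property) pair changed; keep the rest."""
    keys = {(r.asset_id, r.property_id) for r in changed}
    return [r for r in rows if (r.asset_id, r.property_id) not in keys] + list(changed)
```

Because the mapping is purely structural, adding a new legacy dataset only means emitting new integer rows; the downstream `assemble` step, and everything after it, stays the same.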
3 min. The next step is to create the Information Asset object. At this stage the business rules are applied and all related metadata is extracted from the Data Warehouse. Once the Information Asset object is fully created, it is stored in the Object Data Store with a Date/Time stamp. Creating an Information Asset object can take a long time, but the architecture allows Information Asset objects to be created in parallel. When metadata in the Data Warehouse database changes, the process runs again and a new version of the Information Asset is created; this new version is saved in the Object Data Store with a new Date/Time stamp.
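A minimal sketch of the versioned Object Data Store described above, assuming an in-memory store and a single hypothetical business rule (`apply_business_rules`). Every build run saves new versions under one run timestamp, assets are built concurrently, and readers take the version with the latest Date/Time stamp.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from typing import Dict, List, Tuple


class ObjectDataStore:
    """In-memory stand-in: each save appends a new (timestamp, object) version."""

    def __init__(self) -> None:
        self._versions: Dict[int, List[Tuple[datetime, dict]]] = {}
        self._lock = threading.Lock()

    def save(self, asset_id: int, asset: dict, stamp: datetime) -> None:
        with self._lock:  # saves arrive from several worker threads
            self._versions.setdefault(asset_id, []).append((stamp, asset))

    def versions(self, asset_id: int) -> List[Tuple[datetime, dict]]:
        with self._lock:
            return list(self._versions.get(asset_id, []))

    def latest(self, asset_id: int) -> dict:
        """Return the version with the most recent Date/Time stamp."""
        return max(self.versions(asset_id), key=lambda version: version[0])[1]


def apply_business_rules(asset_id: int, metadata: dict) -> dict:
    """Hypothetical business rule: tidy the title and attach the asset id."""
    return {"id": asset_id, "title": metadata["title"].strip()}


def build_and_store(store: ObjectDataStore, asset_id: int, metadata: dict, stamp: datetime) -> None:
    store.save(asset_id, apply_business_rules(asset_id, metadata), stamp)


def build_all(store: ObjectDataStore, warehouse: Dict[int, dict], run_stamp: datetime) -> None:
    """Build every asset concurrently; all versions from one run share run_stamp."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(build_and_store, store, asset_id, metadata, run_stamp)
            for asset_id, metadata in warehouse.items()
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker


store = ObjectDataStore()
build_all(store, {1: {"title": " First asset "}, 2: {"title": "Second asset"}},
          datetime(2011, 9, 1, tzinfo=timezone.utc))
# Metadata for asset 1 changes in the warehouse: only that asset is rebuilt.
build_all(store, {1: {"title": "First asset (revised)"}},
          datetime(2011, 9, 2, tzinfo=timezone.utc))
```

Old versions are kept rather than overwritten, so the store doubles as a history. Threads are used here only to show that each asset is built independently; a CPU-heavy build would more likely run in separate processes or service instances.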
Geo Information Asset
1 min. Technology: Service Oriented Architecture built on Microsoft .NET 4, WCF, and ESB Neuron.
1 min. Read capacity scales horizontally by adding new MongoDB servers to the replica set. GridFS is used to store digital images, i.e. digitized copies of the Information Assets.
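GridFS stores a large binary file as one document in `fs.files` plus a series of fixed-size documents in `fs.chunks`, each carrying `files_id`, a sequence number `n`, and a slice of the bytes. The sketch below reproduces that layout without a server so the chunking logic is visible. It assumes the 255 KiB default chunk size used by current drivers, substitutes a UUID for the ObjectId a driver would generate, and omits fields such as `uploadDate`. Against a live server, PyMongo's `gridfs.GridFS(db).put(data, filename=...)` does this work for you.

```python
import uuid
from typing import Dict, List, Tuple

DEFAULT_CHUNK_SIZE = 255 * 1024  # default chunk size in current GridFS drivers


def to_gridfs_documents(data: bytes, filename: str,
                        chunk_size: int = DEFAULT_CHUNK_SIZE) -> Tuple[Dict, List[Dict]]:
    """Split a binary file into one fs.files document and its fs.chunks documents."""
    file_id = uuid.uuid4().hex  # a driver would use an ObjectId here
    files_doc = {"_id": file_id, "filename": filename,
                 "length": len(data), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(files_doc: Dict, chunks: List[Dict]) -> bytes:
    """Rebuild the file by concatenating its own chunks in n order."""
    own = [c for c in chunks if c["files_id"] == files_doc["_id"]]
    data = b"".join(c["data"] for c in sorted(own, key=lambda c: c["n"]))
    if len(data) != files_doc["length"]:
        raise ValueError("missing or truncated chunks")
    return data
```

Because every chunk is an ordinary document, image data is replicated across the replica set along with the rest of the catalogue data.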
2 min. Searching vs browsing: searching is served by Autonomy, browsing by MongoDB. The Index Business Rules service uses an Autonomy schema to create the Index Data Set, which allows multiple schemas and business rules to be used without changing the whole system. The Indexer service creates XML files with the Autonomy schema applied, ready for populating the indexes. The first indexing run processes all data stored in the Object Data Store; every subsequent index update processes only the Information Assets with the latest Date/Time stamp.
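The full-then-incremental indexing rule can be sketched in a few lines. This is not the Autonomy format: the element names (`documents`, `document`), the `stamp` field, and the idea of a schema as a simple list of field names are hypothetical. The sketch only shows how the Date/Time stamp selects what to re-index and how a schema decides which fields reach the XML.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Iterable, List, Optional


def assets_to_index(assets: Iterable[dict], last_run: Optional[datetime]) -> List[dict]:
    """First run (last_run is None) takes everything; later runs take only newer stamps."""
    return [a for a in assets if last_run is None or a["stamp"] > last_run]


def to_index_xml(assets: Iterable[dict], schema_fields: List[str]) -> str:
    """Serialise assets to XML, keeping only the fields the (hypothetical) schema selects."""
    root = ET.Element("documents")
    for asset in assets:
        doc = ET.SubElement(root, "document", id=str(asset["id"]))
        for name in schema_fields:
            ET.SubElement(doc, name).text = str(asset[name])
    return ET.tostring(root, encoding="unicode")
```

Swapping `schema_fields` changes the index output without touching the selection logic, which is the flexibility the Index Business Rules service is meant to provide.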
2 min. Using GridFS: 6 million objects, some with hundreds of images. Geospatial data: georeferencing of assets. User tagging: a tags collection.
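As a sketch of what georeferencing and user tagging can look like in MongoDB, the helpers below build the documents and query shapes. The assumptions are significant. Field names (`location`, `asset_id`, `user`, `tag`) are hypothetical. The GeoJSON/`2dsphere` form and the aggregation pipeline come from MongoDB releases later than this 2011 system, which would have used the older `2d` index. With PyMongo these dictionaries would be passed to `create_index([("location", "2dsphere")])`, `find(...)`, `insert_one(...)`, and `aggregate(...)`.

```python
from typing import Dict, List


def geo_asset(asset_id: str, title: str, lon: float, lat: float) -> Dict:
    """Asset document with a GeoJSON point; MongoDB expects [longitude, latitude] order."""
    return {"_id": asset_id, "title": title,
            "location": {"type": "Point", "coordinates": [lon, lat]}}


def near_filter(lon: float, lat: float, max_metres: float) -> Dict:
    """Filter for assets within max_metres of a point; needs a 2dsphere index on location."""
    return {"location": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [lon, lat]},
        "$maxDistance": max_metres,
    }}}


def tag_document(asset_id: str, user: str, tag: str) -> Dict:
    """One document in a separate tags collection."""
    return {"asset_id": asset_id, "user": user, "tag": tag}


def tag_counts_pipeline(asset_id: str) -> List[Dict]:
    """Aggregation pipeline: how often each tag was applied to one asset, most used first."""
    return [
        {"$match": {"asset_id": asset_id}},
        {"$group": {"_id": "$tag", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]
```

Keeping tags in their own collection means user activity never rewrites the large asset documents, and tags can be indexed and counted independently.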
Thank you to the feature team and to 10gen for their support.
From SQL Server to MongoDB v1.0
From SQL Server to MongoDB<br />Aleks Drozdov<br />Enterprise Architect<br />19 September 2011<br />
Outline<br />About The National Archives<br />TNA datasets<br />Information architecture and Discovery service<br />Integration and data migration<br />MongoDB implementation<br />
About The National Archives<br />The National Archives is a department of the U.K. government and an executive agency of the Ministry of Justice. It is the official archive of the United Kingdom and cares for, makes available and ‘brings alive’ a vast collection of more than 1,000 years of historical records, including the treasured Domesday Book.<br />The National Archives is one of the world’s largest records repositories, holding more than 11 million records, spanning the Magna Carta to modern government papers. The organization not only keeps its collection secure and available to the public but also conducts significant research to ensure that government records remain accessible for decades to come.<br />The National Archives safeguards historical information and manages current digital information, devising new technological solutions for keeping government records readable now and in the future. As a leading advocate for the archive sector, The National Archives provides world class research facilities and expert advice. It also publishes all U.K. legislation and official publications.<br />
The catalogue<br />The National Archives launched an online catalogue of its collection in 1998. Since then, the catalogue has more than doubled in size and the organization has designed and implemented a number of home-grown systems to improve the accessibility and maintenance of its growing collection.<br /><ul><li>The Catalogue
SOA framework: services and objects<br />In 2010, The National Archives decided to move to a standardized Service Oriented Architecture framework to reduce maintenance costs and provide the flexibility to add new services in the future.<br />
Creating information asset: EAV<br />The entity-attribute-value (EAV) model is a data model for describing entities where the number of attributes (properties, parameters) that could be used to describe them is potentially vast, but the number that actually applies to any given entity is relatively modest.<br />
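The definition above can be illustrated with a generic round trip between entities and EAV rows. This is a textbook sketch using (entity, attribute, value) string triples, not the integer-pointer layout used at TNA, and the example entities are hypothetical. Only the attributes an entity actually has produce rows, so sparse data stays small no matter how many attributes exist overall.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, value)


def to_eav(entities: Dict[str, Dict[str, str]]) -> List[Triple]:
    """Emit one row per attribute an entity actually has; absent attributes cost nothing."""
    return [(entity, attr, value)
            for entity, attrs in entities.items()
            for attr, value in attrs.items()]


def from_eav(rows: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Pivot EAV rows back into one attribute dict per entity."""
    entities: Dict[str, Dict[str, str]] = {}
    for entity, attr, value in rows:
        entities.setdefault(entity, {})[attr] = value
    return entities


entities = {
    "map-1": {"scale": "1:2500"},
    "letter-7": {"sender": "X", "recipient": "Y"},
}
rows = to_eav(entities)
```

A conventional wide table for these two entities would need a column for every attribute (`scale`, `sender`, `recipient`) and would be mostly empty; the EAV form needs exactly three rows.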
Populating information assets in MongoDB<br />
Creating information asset: MongoDB<br />