From SQL Server to MongoDB v1.0
  • Good morning and thank you for having me here. It is my pleasure to speak about our work.
  • Relatively high-level architecture. Challenges and solutions for migrating large and complex relational data to MongoDB, and where MongoDB fits.
  • 1 min.
  • 1 min. ~20 datasets. More than 11 million records. Digitized records. Original paper records.
  • 1 min.
  • 1 min.
  • 1 min. This is work in progress. Started in April 2010. Released beta in April 2011. Basic search, advanced search, browse hierarchy, show search results on map.
  • 2 min. Clean UI. Unified access to all the different data sources.
  • 2 min. Browsing: navigating the catalogue hierarchy.
  • 1 min. Procedia Computer Science, 2010. CMS: Compact Muon Solenoid. The Data Aggregation System (DAS) provides the ability to search and aggregate information across different data services. First mention of MongoDB.
  • 1 min. TNA databases: >2,000 tables, >15,000 attributes. Relational databases with complex SQL scripts, stored procedures, and views. Joins between tables and databases. Front-end and back-end systems. Data sets of different shapes and sizes. Performance does not satisfy requirements.
  • 3 min. InformationAsset is the most important business object in the entire architecture. It is designed to model the information asset object (also known as Record, Resource, etc.). DetailedAssetView provides a detailed view of the InformationAsset. AssetView provides a lightweight view of the InformationAsset; this includes the information asset's identity and date range.
  • 1 min. How to do this in practice? The final destination for the data is MongoDB. First: the SQL database. Relations, views, stored procedures, functions, and jobs to support business processes.
  • 3 min. How does it work? Left table: Information Asset properties. Right table: integer pointers. ETL scripts are used to extract, transform, and load the EAV database. In the EAV, an Information Asset is modelled as a SourceItemId – SourceFieldId – SourceTableId – SourceDataBaseId structure. The EAV database table holds only integers and has no joins, which makes the metadata extraction process extremely fast. ETL scripts implement no business rules, just simple mappings. Why EAV? There are a number of different data sets, but the output is always the same. Safe decommissioning of legacy systems. EAV supports delta updates of the Catalogue (~6,000 a day). 0.5 billion rows. 11 million Information Assets.
  • 3 min. The next step is to create the Information Asset object. At this stage, business rules are implemented and all related metadata are extracted from the Data Warehouse. Once the Information Asset object is fully created, it is stored in the Object Data Store with a date/time stamp. Creating an Information Asset object can be lengthy, but the architecture allows Information Asset objects to be created in parallel. When metadata in the Data Warehouse database changes, the process runs again and a new version of the Information Asset is created. This new version is saved in the Object Data Store with a new date/time stamp.
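The note above describes building Information Asset objects in parallel and saving each version with a date/time stamp. A minimal sketch of that idea follows, assuming an in-memory stand-in for the Object Data Store. The class, the `build_asset` rule, the sample titles, and the dates are all hypothetical and are not taken from the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


class ObjectDataStore:
    """In-memory stand-in for the Object Data Store; keeps every version."""

    def __init__(self):
        self._versions = {}  # asset_id -> list of (timestamp, document)

    def save(self, asset_id, document, timestamp=None):
        # Each save appends a new version instead of overwriting the old one.
        stamp = timestamp or datetime.now(timezone.utc)
        self._versions.setdefault(asset_id, []).append((stamp, document))
        return stamp

    def latest(self, asset_id):
        # The current version is the one with the newest date/time stamp.
        return max(self._versions[asset_id], key=lambda v: v[0])[1]

    def version_count(self, asset_id):
        return len(self._versions[asset_id])


def build_asset(asset_id, metadata):
    # Hypothetical business rule: normalise the title.
    return {"id": asset_id, "title": metadata["title"].strip()}


store = ObjectDataStore()
# Hypothetical metadata as it might be read from the data warehouse.
warehouse = {1: {"title": " Cabinet minutes "}, 2: {"title": "Crew list"}}

# Build the objects in parallel, then store them with a date/time stamp.
with ThreadPoolExecutor(max_workers=2) as pool:
    for doc in pool.map(lambda item: build_asset(*item), warehouse.items()):
        store.save(doc["id"], doc, datetime(2011, 9, 1, tzinfo=timezone.utc))

# Metadata changes: the process runs again and saves a new version.
warehouse[1]["title"] = "Cabinet minutes, 1916"
store.save(1, build_asset(1, warehouse[1]), datetime(2011, 9, 2, tzinfo=timezone.utc))
```

Keeping old versions rather than updating in place is what lets later steps pick out only the assets that changed.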
  • Information Asset
  • Geo Information Asset
  • GridFS
  • 1 min. Technology: Service Oriented Architecture built on Microsoft .NET 4, WCF, and ESB Neuron.
  • 1 min. Scales horizontally by adding new mongo servers to the replica set. Uses GridFS to store digital images (replicas of the Information Assets).
  • 2 min. Searching vs. browsing. Searching: Autonomy; browsing: MongoDB. The Index Business Rules service uses the Autonomy schema to create the Index Data Set. This allows multiple schemas and business rules to be used without changing the whole system. The Indexer service creates XML files with the Autonomy schema applied, ready for populating indexes. The first time, the indexing process runs for all data stored in the Object Data Store. All subsequent index updates run only for Information Assets with the latest date/time stamp.
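The indexing rule in the note above (everything on the first run, only newly stamped Information Assets afterwards) could be sketched roughly as below. This assumes that "latest date/time stamp" means "stamped after the previous indexing run"; the function name, field names, and dates are hypothetical.

```python
from datetime import datetime


def select_for_indexing(assets, last_indexed_at=None):
    """Return the assets to index: all on the first run, else only newer ones."""
    if last_indexed_at is None:
        # First run: index everything in the object data store.
        return list(assets)
    # Later runs: only assets stamped after the previous indexing run.
    return [a for a in assets if a["stamp"] > last_indexed_at]


# Hypothetical latest versions held in the object data store.
assets = [
    {"id": 1, "stamp": datetime(2011, 9, 2)},
    {"id": 2, "stamp": datetime(2011, 9, 1)},
    {"id": 3, "stamp": datetime(2011, 9, 3)},
]

first_run = select_for_indexing(assets)
delta_run = select_for_indexing(assets, last_indexed_at=datetime(2011, 9, 1))
delta_ids = [a["id"] for a in delta_run]
```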
  • 2 min.
  • 1 min.
  • 2 min. Using GridFS: 6 million objects, some with hundreds of images. Geospatial data, asset geo-referencing. User tagging: tags collection.
  • Thank you to the feature team and 10gen for their support.

From SQL Server to MongoDB v1.0: Presentation Transcript

  • 1.
  • 2. From SQL Server to MongoDB
    Aleks Drozdov
    Enterprise Architect
    19 September 2011
  • 3. Outline
    About The National Archives
    TNA datasets
    Information architecture and Discovery service
    Integration and data migration
    MongoDB implementation
  • 4. About The National Archives
    The National Archives is a department of the U.K. government and an executive agency of the Ministry of Justice. It’s the official archives of the United Kingdom and cares for, makes available and ‘brings alive’ a vast collection of more than 1,000 years of historical records, including the treasured Domesday Book.
    The National Archives is one of the world’s largest records repositories, holding more than 11 million records, spanning the Magna Carta to modern government papers. The organization not only keeps its collection secure and available to the public, it also conducts significant research ensuring government records remain accessible for decades to come.
    The National Archives safeguards historical information and manages current digital information, devising new technological solutions for keeping government records readable now and in the future. As a leading advocate for the archive sector, The National Archives provides world class research facilities and expert advice. It also publishes all U.K. legislation and official publications.
  • 5. The catalogue
    The National Archives launched an online catalogue of its collection in 1998. Since then, the catalogue has more than doubled in size and the organization has designed and implemented a number of home-grown systems to improve the accessibility and maintenance of its growing collection.
    • The Catalogue
    • Cabinet Papers
    • DocumentsOnline
    • ERO
    • Library Catalogue
    • Taxation Records
    • Trafalgar Ancestors database
    • UK Government Web Archive
    • Census records
    • Merchant seamen registers
    • More…
  • The catalogue
  • 16. The catalogue
  • 17. Discovery
    http://discovery.nationalarchives.gov.uk
  • 18. Discovery
    http://discovery.nationalarchives.gov.uk
  • 19. Discovery: browse hierarchy
  • 20. The CMS Data Aggregation System
  • 21. Relational model
  • 22. SOA framework: services and objects
    In 2010, The National Archives decided to move to a standardized Service Oriented Architecture framework to reduce maintenance costs and provide the flexibility to add new services in the future.
  • 23. SQL Server databases
  • 24. Creating information asset: EAV
    Entity-attribute-value model (EAV) is a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest.
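To illustrate the definition above, here is a minimal sketch of EAV rows being grouped into one record per entity, so that each record carries only the attributes that actually apply to it. The row layout, attribute names, and values are invented for illustration and do not come from the TNA databases.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity_id, attribute, value).
EAV_ROWS = [
    (1, "title", "Cabinet minutes"),
    (1, "covering_from", 1916),
    (2, "title", "Crew list"),
    (2, "ship", "Victory"),
    (2, "covering_from", 1805),
]


def pivot_eav(rows):
    """Group EAV rows into one dict per entity, keeping only attributes present."""
    entities = defaultdict(dict)
    for entity_id, attribute, value in rows:
        entities[entity_id][attribute] = value
    return dict(entities)


documents = pivot_eav(EAV_ROWS)
```

Entity 1 ends up with no `ship` key at all, which is the sparse shape that maps naturally onto a document store.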
  • 25. Populating information assets in MongoDB
  • 26. Creating information asset: MongoDB
  • 27. Creating information asset: MongoDB
  • 28. Creating information asset: MongoDB
  • 29. Discovery architecture
  • 30. Using MongoDB: architecture (replica set)
    Replica set diagram: MONGO_SRV1, MONGO_SRV2, and ARBITER
    FILER1: mongo_db1 (250GB), mongo_logs1 (75GB)
    FILER2: mongo_db2 (250GB), mongo_logs2 (75GB)
    Storage: NetApp FAS3140 HA
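A layout like the one on this slide (two data-bearing servers plus an arbiter) is usually described to MongoDB as a replica set configuration document. The sketch below builds such a document in the shape the shell's `rs.initiate()` accepts. The host names come from the slide; the replica set name and the port are assumptions, and this is not the configuration actually used by the project.

```python
REPLICA_SET_NAME = "discovery_rs"  # hypothetical name, not from the slides
PORT = 27017  # MongoDB's default port, assumed here


def replica_set_config(name, data_hosts, arbiter_host, port=PORT):
    """Build a replica set config with data-bearing members and one arbiter."""
    members = [
        {"_id": i, "host": f"{host}:{port}"} for i, host in enumerate(data_hosts)
    ]
    # The arbiter votes in elections but holds no data.
    members.append(
        {"_id": len(members), "host": f"{arbiter_host}:{port}", "arbiterOnly": True}
    )
    return {"_id": name, "members": members}


config = replica_set_config(REPLICA_SET_NAME, ["MONGO_SRV1", "MONGO_SRV2"], "ARBITER")
```

With two data-bearing members, the arbiter supplies the third vote needed to elect a primary if one server is lost.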
  • 31. Discovery: search information asset
  • 32. Discovery: browse information assets hierarchy
  • 33. Browse from information asset details
  • 34. Discovery: using the system - API
    discovery@nationalarchives.gov.uk
  • 35. Thank you!
    http://discovery.nationalarchives.gov.uk
    http://discovery.nationalarchives.gov.uk/api.htm
    adrozdov@nationalarchives.gov.uk