Complex Legacy System Archiving/Data Retention with MongoDB and XQuery

Many organizations today, due to regulatory compliance or other needs, are finding it necessary to archive large volumes of data into long-term storage. Learn how MongoDB provides flexible, efficient, scalable long-term document storage that can adapt to your organization's changing needs over time. Includes a case study from a US federal government agency with 130 legacy applications that needed to be archived and integrated into a federated view of archive and real-time operational data. Regulations in many industries (e.g., HIPAA, SOX, Basel III, FATCA) are driving the need for data retention and for query processing across archives and operational data.

Published in: Technology

  1. Legacy System Archiving With XML, XQuery and MongoDB
     Dave Watson, SVP, iWay Software
     @watsondaveny, watson.dave@gmail.com
  2. Agenda
     • XML Archive Overview and Business Use Cases
     • XML Archive Technical Discussion
     Copyright 2009, Information Builders. Slide 2
  3. iWay Archive
     What is XML Archive
     • An extension of ESB for archiving data
     • Leverage ESB process-oriented integration and data federation capabilities
     • Long term data retention
     • Large repository, large index (Big Data)
     • Search and retrieve capabilities (High performance)
     • Business use examples
       • Satisfy regulatory requirement
       • e-Discovery (e.g. research, forensic)
       • Business analytics
  4. Archive – Solving Business Needs
     Examples of Business Requirements:
     Regulations / Requirements | Example | Data Retention
     Federal Record Retention Requirement | Patient health records | 75 years (after last episode of care)
     FDA 21 CFR Part 11 | Clinical trials and FDA approval | 35 years
     HIPAA (Healthcare) | Pediatric medical records | 21 years
     Sarbanes-Oxley (public companies) | Audit | 7 years
     SEC 17a-4 (Financial services) | Account records | 6 years
     SEC 17a-4 (Financial services) | Corporate documentation | Life of the enterprise
     Research | Life science | Long-term
     Analytics | Financial / Legal | Long-term
  5. Archive – Types of Data
     Can handle all types of data, for example:
     • Electronic Documents: Word, Excel, EDI, HL7, XML, …
     • Applications: ERPs, CRMs, SAP, SFDC, …
     • Database Data: IMS, DB2, Oracle, Sybase, SQL Server, MUMPS, …
     • Electronic Files: VSAM, Unix, Logs, …
     • Email: Outlook, Lotus Notes
     • Others: Multimedia files, Paper, Blueprints, Forms, Claims, …
     ESB adapter components can be used to connect to the different types of data.
  6. Archive – Archiving Needs
     Examples of Archiving Requirements:
     • Policy Based – Logical selection of DB records/transactions to be archived
     • Store very large amounts of data in archive
     • Keep data for very long periods of time
     • Become independent from Applications/DBMS/Systems – future proof
     • Protect authenticity of data – regulation and compliance
     • Access archived data when needed / as needed
     • Quickly search huge numbers of archived documents
     • Discard data after retention period – regulation and compliance
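
The "discard data after retention period" requirement can be illustrated with a small check. This is a minimal Python sketch, not the iWay implementation: the record fields (`record_type`, `last_activity`) and the dictionary keys are hypothetical, and the retention periods are the ones listed in the business requirements table on slide 4.

```python
from dataclasses import dataclass
from datetime import date

# Retention periods in years, taken from the table on slide 4.
# The keys are hypothetical labels chosen for this sketch.
RETENTION_YEARS = {
    "patient_health_record": 75,
    "clinical_trial": 35,
    "pediatric_medical_record": 21,
    "audit": 7,
    "account_record": 6,
}


@dataclass
class ArchivedRecord:
    record_id: str
    record_type: str
    last_activity: date  # e.g. the last episode of care


def retention_expiry(record: ArchivedRecord) -> date:
    """Return the first day on which the record may be discarded."""
    years = RETENTION_YEARS[record.record_type]
    start = record.last_activity
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Feb 29 with a non-leap target year: roll forward to Mar 1.
        return date(start.year + years, 3, 1)


def may_discard(record: ArchivedRecord, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today >= retention_expiry(record)
```

A scheduled ESB flow could run a check like `may_discard` over archive metadata and delete only the records for which it returns true.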
  7. Archive – Example Business Use Case
     • Store 75 years' worth of patient data
       • Diverse Sources
         • XML
         • MUMPS
         • Oracle
         • HL7
     • Support archive, query and integration scenarios
       • XML to remain unchanged and exist outside the data store
       • Ability to query documents
       • Ability to retrieve original XML or part of XML using XQuery
       • Ability to integrate XML archived data in federated services with operational sources (e.g. MUMPS, HL7, Oracle)
  8. Archive – Example Business Requirements
     • Highly scalable, high performance document management database
     • Easily integrates into an ESB architecture
       • Multi-threaded parallel processing
       • Distributed processing
       • Just another data source along with, e.g., Oracle and MUMPS databases
       • Leverage ESB Tools for process orchestration, process monitoring, data mapping/transformation, security and data aggregation capabilities.
     • Implementation and vendor neutral – archived data (e.g. XML) stored in the operating system's native file system
  9. XML Archive Technical Discussion
  10. Overview
      Highly configurable ESB Java application that can be customized to specific needs.
      • Load Channel: Reads XML documents and loads them into the document repository.
      • Query Channel: Handles query requests and responses against the document repository.
      • Test Channel: Simple visual interface displaying functionality and usage of the Query API.
  11. Technology Involved
      • ESB
        • iWay Service Manager (commercial)
        • IBM WebSphere ESB (commercial)
        • Oracle Service Bus (commercial)
        • WSO2 ESB (open source)
      • mongoDB - http://www.mongodb.org/
      • JSON - JavaScript Object Notation
      • XQuery - XML query language
  12. mongoDB
      “Humongous”
      • Scalable, high-performance, document-oriented database.
      • JSON-style documents.
      • Mirror capable.
      • Auto-Sharding (clustering), horizontal scaling, automatic failover, no single point of failure.
      • MapReduce support for complex processing.
        • Work is distributed among the cluster.
      • GridFS support.
        • A distributed file system.
      • Commercial support from 10gen (OEM by iWay Software)
  13. XQuery
      • A query and functional programming language for XML documents.
        • Is to XML documents what SQL is to databases.
      • “FLWOR” expressions.
        • FOR, LET, WHERE, ORDER BY, RETURN
        • Example:
            for $x in /FEDREG/CNTNTS/AGCY
            where $x/EAR = 'Agricultural'
            order by $x ascending
            return $x
      • Supports syntax for constructing new documents.
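
To make the FLWOR example concrete, here is a rough Python equivalent using the standard library's `xml.etree.ElementTree`. The sample document is invented for illustration: only the element names FEDREG, CNTNTS, AGCY and EAR come from the slide, and `HD` is a hypothetical child element. Sorting by the concatenated text of each element is an assumption about what `order by $x` compares.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; not taken from the presentation.
SAMPLE = """
<FEDREG>
  <CNTNTS>
    <AGCY><EAR>Agricultural</EAR><HD>Forest Service</HD></AGCY>
    <AGCY><EAR>Commerce</EAR><HD>Census Bureau</HD></AGCY>
    <AGCY><EAR>Agricultural</EAR><HD>Animal Health</HD></AGCY>
  </CNTNTS>
</FEDREG>
"""


def agricultural_agencies(xml_text):
    root = ET.fromstring(xml_text)  # root is the <FEDREG> element
    # for $x in /FEDREG/CNTNTS/AGCY
    candidates = root.findall("./CNTNTS/AGCY")
    # where $x/EAR = 'Agricultural' (checks the first EAR child only)
    matches = [x for x in candidates if x.findtext("EAR") == "Agricultural"]
    # order by $x ascending (approximated by each element's text content)
    matches.sort(key=lambda x: "".join(x.itertext()))
    # return $x
    return matches


for agency in agricultural_agencies(SAMPLE):
    print(agency.findtext("HD"))
```

Each FLWOR clause maps onto one step: iteration, filter, sort, return. A real XQuery engine would evaluate the same expression directly against the stored XML.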
  14. JSON – JavaScript Object Notation
      The new data-interchange language of the web. www.json.org
  15. Base Loading Architecture
      [Diagram: ESB flow with a Listener and the steps "XML to JSON", "Store JSON" and "Store XML"; targets are mongoDB and GridFS Binary Storage.]
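
The slide does not detail the "XML to JSON" step. One possible mapping is sketched below in Python with the standard library only. The conventions used (an `@` prefix for attributes, `#text` for text content, lists for repeated children) are assumptions for illustration, not the iWay mapping, and the sample document is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one XML element into a JSON-style dictionary.

    Assumed conventions: attributes become "@name" keys, text becomes
    "#text", repeated child tags become lists. Text that follows a child
    element (mixed content) is ignored in this sketch.
    """
    node = {"@" + key: value for key, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Second or later occurrence of this tag: collect into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


def xml_to_document(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_dict(root)}


# Hypothetical input document.
doc = xml_to_document('<a name="bob"><b>hello</b><c>1</c><c>2</c></a>')
print(json.dumps(doc, indent=2))

# With a running server, the two store steps could use pymongo, e.g.
# collection.insert_one(doc) for the JSON and gridfs.GridFS(db).put(xml_bytes)
# for the original XML. That part is omitted here.
```

Keeping the original XML in binary storage alongside the JSON means the searchable form can be regenerated later if the mapping changes.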
  16. Base Query Architecture
      [Diagram: an HTTP Requester calls an ESB flow with a Listener and the steps "Query DB" and "(Optional) Get XML"; sources are mongoDB and GridFS Binary Storage.]
  17. Loading Modification: External Storage
      [Diagram: ESB flow with a Listener and the steps "XML to JSON", "Store JSON" and "Store XML"; targets are mongoDB and the File System.]
  18. Loading Modification: SAP Loading Architecture
      [Diagram: an SAP System feeds an ESB flow with an RFC Server and steps labeled "IDOC to XML", "XML to JSON", "Store XML", "Store JSON" and "Store IDOC"; targets are mongoDB and GridFS Binary Storage.]
  19. Loading Modification: Change Data Capture Loading Architecture
      [Diagram: an RDBMS feeds an ESB flow with a CDC Listener and the steps "XML to JSON", "Store JSON" and "Store XML"; targets are mongoDB and GridFS Binary Storage.]
  20. Loading Modification: Salesforce.com Loading Architecture
      [Diagram: the Salesforce System feeds an ESB flow with a SOAP Listener and the steps "XML to JSON", "Store JSON" and "Store XML"; targets are mongoDB and GridFS Binary Storage.]
  21. Loading Modification: FTP Loading Architecture
      [Diagram: a File System feeds an ESB flow with an FTP Server and the steps "XML to JSON", "Store JSON" and "Store XML"; targets are mongoDB and GridFS Binary Storage.]
  22. Query Modification: Web Service SOAP Query Architecture
      [Diagram: a Web Service Client calls an ESB flow with a SOAP Listener and the steps "Query DB" and "(Optional) Get XML/IDOC"; sources are mongoDB and GridFS Binary Storage.]
  23. The Test Client
      Note: The archive is designed to be called from other flows or programs.
      • A simple AJAX based human interface for querying the XML Archive.
      • Provides examples of the HTTP query interface provided by the base XML Archive.
      • Installed with the base implementation of the XML Archive.
  24. Simple Example
      Loaded this simple XML Doc:
      [Screenshot of the XML document; not captured in the transcript.]
  25. Displaying the Document
      XML Link:
      JSON Link:
  26. Basic Query
      Return all documents whose <a> element has a name attribute equal to “bob”.
  27. Advanced Queries
      Query handler is a wrapper around the mongoDB query language.
      Support for:
      • And
      • Or
      • Regular Expressions
      • Ranges
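
Since the query handler wraps the mongoDB query language, the four listed features presumably correspond to the standard mongoDB operators `$and`, `$or`, `$regex` and `$gte`/`$lte`. The Python sketch below builds such filter documents as plain dictionaries. The helper names and the field names (`a.@name`, `a.year`) are hypothetical, and the handler's own HTTP syntax is not shown in the transcript.

```python
def q_and(*clauses):
    """All clauses must match."""
    return {"$and": list(clauses)}


def q_or(*clauses):
    """At least one clause must match."""
    return {"$or": list(clauses)}


def q_regex(field, pattern, ignore_case=False):
    """Field must match a regular expression."""
    condition = {"$regex": pattern}
    if ignore_case:
        condition["$options"] = "i"
    return {field: condition}


def q_range(field, low=None, high=None):
    """Field must lie in the inclusive range [low, high]."""
    condition = {}
    if low is not None:
        condition["$gte"] = low
    if high is not None:
        condition["$lte"] = high
    return {field: condition}


# Hypothetical query: name is "bob" or starts with "rob" (any case),
# and the year lies between 2005 and 2009.
query = q_and(
    q_or({"a.@name": "bob"}, q_regex("a.@name", "^rob", ignore_case=True)),
    q_range("a.year", low=2005, high=2009),
)
print(query)
```

With pymongo, a dictionary like `query` can be passed directly to `collection.find(query)`.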
  28. Basic XQuery
      Return only the <b> element from the document.
      Formatted Result:
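
The query and its result appear only as screenshots on the slide. As a stand-in, the Python sketch below extracts a `<b>` element from a hypothetical document with the standard library; in XQuery, a path expression such as `/a/b` would presumably serve the same purpose. The document content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical archived document; the slide's real document is not available.
ARCHIVED_XML = '<a name="bob"><b>hello</b><c>world</c></a>'


def extract_element(xml_text, tag):
    """Return the first child element named `tag`, serialized as XML text.

    Returns None when the root has no such child.
    """
    root = ET.fromstring(xml_text)
    element = root.find(tag)
    if element is None:
        return None
    # strip() drops any trailing whitespace that followed the element.
    return ET.tostring(element, encoding="unicode").strip()


print(extract_element(ARCHIVED_XML, "b"))
```

Returning part of a document this way matches the earlier requirement to retrieve "original XML or part of XML" from the archive.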
