Clare Somerville Trish O’Kane Data in Databases



Data in Databases
It's not what you think
Clare Somerville and Trish O’Kane

Published in: Technology, Business
  • Adam Brown, Statistics NZ: Generally my feedback on ISO 15489 would be: why can't/shouldn't it be applied to data? At the end of the day data is just another record so it really shouldn't be an issue. Having said that, I'm not sure that it would particularly add any value to it either. One of the main issues is how to define a record, in terms of data. Is it the individual data item or is it a whole dataset? This is certainly the most tricky issue because you generally maintain metadata at the dataset level but potentially slice and dice at lower levels. The other key addition to it for data would have to be a greater focus on usability (7.2.5). As we know, with data this isn't a given to the same extent as it can be for a document. Significantly more information is required to be able to do anything with it - essentially there is a different relationship between data and its metadata than documents and their metadata. In summary, the principles fit for applying it to data (and should be applied!) but as it is it wouldn't add much value.
  • Adam Brown – Stats NZ
  • CS: We used to believe that there was 1 byte of metadata for every 10 bytes of data.
  • CS: But those numbers are changing with metadata now exceeding data.
  • Data is a set of values, in this case in a comma delimited file. I need more information in order to know how to read this.
  • From FMG
  • Volume, velocity, variety, complexity! Things like smart grids are causing significant rises in data volume. Velocity – the speed at which data is produced, received and processed. Variety – structured databases, emails, metering, video, image, financial transactions etc. Much of it is unstructured – content analytics, taxonomy, ontology. Non-traditional BI tools.
  • We’ve built skills in DI and DQ in BI and analytics. We have core data management skills to support BI. Other data requires the same DM practices.
  • This slide provides some background on what data lifecycle management is, and generic reasons as to why it is important.
  • This slide provides a brief high-level overview of the future state of DLM at HNZC. Accurate, relevant, timely delivery implies: data will be managed through its lifecycle in a way that ensures there is a single source of truth, meeting the needs of users, and that it can be accessed in a timely manner. “Everyone who needs it” will include mobile workers; this is also why format is an important consideration. This is just a take on “the right information to the right person at the right time (and in the right format)”. Finding old or new information quickly implies: information is described by metadata so that it has context and meaning, and systems use that metadata to locate relevant content and deliver information to users. Clear guidelines implies: principles will be agreed on, which will decompose into business rules for systems. For example: that X information should be kept for Y years, then disposed of by following Z procedures. Appropriate levels of management implies: the value of data is understood within the context of business need, risk and legislation, and that processes are in place to manage how business need is determined, and how risk and legislative obligations are managed.

    1. 1. Data in databases “It’s not what you think” Clare Somerville, Trish O’Kane
    2. 2. Our point Long term preservation of data requires understanding how data is created and managed We have to work out: ◦ What data the business needs to keep ◦ What records the business needs to create and keep And….. how ◦ What data must be unchanged ◦ What we mean by usable and retrievable
    3. 3. The problem, as we see it We will cover: ◦ What is a record and its attributes ◦ What is a database and how they are built and maintained ◦ How can we use data sets to create records? ◦ What is a data warehouse and how they are built and maintained ◦ How can we ensure that useful data sets are available over time
    4. 4. Agenda The problem Definitions Delivering data & records from data ◦ Data warehousing ◦ Data “lifecycle” management Conclusion
    5. 5. The problem Databases have replaced many semi-structured records ◦ Register of Births, Deaths and Marriages (and Divorces!) ◦ EQC claims data But - we want some of that information available long term in a usable format Records managers are unfamiliar with the world of structured data ◦ Disposal outcome in a draft disposal authority: “When database decommissioned, transfer to Archives NZ” ◦ Transfer what?
    6. 6. Who wants what? What have we got? ◦ Data in databases What do we want? ◦ Records When do we want them? ◦ Now, and for the long term But….what is a record in the context of data? ◦ The individual data item? ◦ A whole dataset?
    7. 7. What have we got 1. Customers ◦ Customers for data ◦ Customers for records 2. Information assets ◦ Records ◦ Transactional data in databases ◦ Datasets ◦ Data marts and data warehouses 3. What do we have to do? ◦ Principles from data warehousing ◦ Data life cycle management
    8. 8. Definitions Records, metadata, data, source systems, database, data warehouse
    9. 9. Records Recordkeeping definition (Public Records Act 2005): A record or class of records in any form in whole or part, created or received by a public office in the conduct of its affairs In structured world: A record is a line of data in a table in a database
    10. 10. Attributes of a record Recordkeeping perspective: ◦ Documents the carrying out of the organisation’s business objectives, core business functions, services and deliverables, and/or ◦ Provides evidence of compliance with any current jurisdictional standards, and/or ◦ Documents the value of the resources of the organisation and how risks to the business are managed, and/or ◦ Supports the long-term viability of the organisation Data management perspective: ◦ Field types (Numeric, Character, Date/time) ◦ Composite, derived ◦ Values
    11. 11. Data and metadata Documents and metadata “Essentially there is a different relationship between data and its metadata than documents and their metadata”
    12. 12. Is it data or is it metadata? It depends, doesn’t it? It’s about the level at which it is used/applied. E.g. date created
        Customer ID | Date created | Customer name | Customer Type
        123 | 2008-10-20 | Bloggs, Joe | Retailer
        124 | 2008-10-23 | Mouse, Minnie | Distributor
        125 | 2008-10-26 | Max, Direct
        (slide label: Metadata)
    13. 13. Metadata in the data warehouse Business metadata: Link between database and users – road map for access ◦ Business users ◦ Analysts ◦ Less technical Technical metadata: What data, from where, how, when etc ◦ Developers ◦ Technical users ◦ Maintenance and growth ◦ On-going development
    14. 14. Metadata in the data warehouse Business metadata: ◦ Structure of data ◦ Table names ◦ Attribute names ◦ Location ◦ Access ◦ Reliability ◦ Summarisations ◦ Business rules Technical metadata: ◦ Table names ◦ Keys ◦ Indexes ◦ Program names ◦ Job dependencies ◦ Transformation ◦ Execution time ◦ Audit, security controls
    15. 15. Metadata Data: 10 bytes Metadata: 1 byte
    16. 16. Metadata Data Metadata Heaps!
    17. 17. Data – comma delimited
        0349,000,A," ","CHANGE ADD ON MED CERT "," "," "," "," ","S","GASUP"," ",00000,71909,00000,0,71909,10393470,00000.00,00000.00,00000.00,00000.0 0,00000.00,000000,71937,72266,0,139,600,4,72266,471,360480713,000000000 ,1,00090.00,00037.00,000031543560
        ",00000.00,00000.00,+000000.00,0000000,0000,000,00,000000000,00000,0000 0,000000.00,000000.00,000000000,009,72266,00000,72268,16414213,00000000 1,000000000,244,0114340511,04,01,+000000.00,+000000.00,00000,000000,+000000.00,610,0,00146.13,000000.00,000,000,610,0,290763901,290763901,0000 00000,000000000,000069987378
        0174,000,D,"C","N","N","Y","Y","Y","N","N","3349533755","Y","T REED ","Y","DSWSINVE106 ","BELOQ","Y","NAWEK","TANIA ","REED ","C","N",02651,009,0000,72273,72268,16405202,0114340511,03,72245,0000, 003,011434,000000228855
        0174,000,A,"C","N","N","Y","Y","Y","N","N","3349533755","Y","T REED ","Y","DSWSINVE106 ","REED ","BELOQ","Y","NAWEK","TANIA ","C","N",02651,009,0000,72273,72268,16405202,0114340511,04,72245,0000, 003,011434,000000228855
        0161,000,D,"A",126,72263,00000.00,600,5,360480713,000007282728
        0161,000,A,"A",126,72263,00000.00,600,5,360480713,000007282728
        0057,000,D," "," "," "," "," ","A"," "," ","AHMEV","VOKOG",000000003,0814409,2500,001,25,00,00000.00,000000,00,1 32,00000,0,+00063.00,72266,14133031,00000,00000.00,2,+00063.00,01,00000 .00,000000,2,0,0,00000.00,607,1,471,362400470,000000000,000409413299
        0057,000,A," ","MANOP "," "," "," ","A"," "," ","AHMEV","VOKOG",000000003,0814409,2500,001,25,00,00000.00,000000,00,1 32,72269,0,+00063.00,72266,14133031,00000,00000.00,2,+00063.00,01,00000 .00,000000,2,0,0,00000.00,607,1,471,362400470,000000000,000409413299
        0270,000,A," "," "," ","N","N","G",128,72266,72268,16414261,01,00000.000,00000.000,0,139,000 00.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00 000.00,600,5,471,000,000,000,000,000,000,000,000,0001,360480713,00000.0 0,000602537445
        0062,000,A,"YYYYYYYYYY ","AUTH 01532600063000000000131197N014101 0000000 ","VA","SATRA","DSWSAUCK119 "," ","MANOP"," ",003,132,72268,16414266,0000000000,0,607,362400470,000000000,000084800 530
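The comma-delimited extract on slide 17 only becomes readable once a field layout (metadata) is supplied. The sketch below shows one way that could look in Python, using one of the short records from the slide. The field names and types in `LAYOUT` are invented for illustration; the slides do not say what any of the fields mean.

```python
import csv
import io

# Hypothetical layout: the slide does not document these fields, so every
# name and type below is an assumption made purely for illustration.
LAYOUT = [
    ("record_type", str),
    ("sequence", str),
    ("action", str),
    ("status", str),
    ("code", int),
    ("day_number", int),
    ("amount", float),
    ("office", int),
    ("category", int),
    ("client_id", str),
    ("row_id", str),
]


def read_with_layout(line, layout):
    """Parse one comma-delimited line and label each value using the layout."""
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(layout):
        raise ValueError(
            f"expected {len(layout)} fields, found {len(values)}"
        )
    return {name: cast(value) for (name, cast), value in zip(layout, values)}


# One of the short records shown on the slide.
raw = '0161,000,A,"A",126,72263,00000.00,600,5,360480713,000007282728'
record = read_with_layout(raw, LAYOUT)
```

Without `LAYOUT` the line is only a list of eleven strings; with it, each value has a name and a type. The layout is therefore part of what has to be preserved alongside the data.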
    18. 18. Data – in a table
    19. 19. Database
    20. 20. Database: 3 layers • User interface • Rules and algorithms • Data
    21. 21. Application layer Provides views, creates reports Turns data into information Adds, overwrites, deletes data Runs rules and processes Data layer Data in tables Acted on by application layer
    22. 22. Can data fit the PRA definition? • We are “format neutral” in the management of records, so…. • Data can be records! – Births Deaths and Marriages Register – EQC claims data • Test questions – If we exclude data what have we lost? – What is the impact of losing data? • On the business • For the future
    23. 23. Source solution is not a recordkeeping system (diagram: Application layer, Data layer) The Solution System is not a recordkeeping system because it… • Holds transactional data, not evidence of transactions in context (records) • Isn’t tamper proof – Difficult to know exactly what the application layer is doing – Different tables and rows may be managed differently – Hard to roll back to a point in time • Must overwrite ‘redundant’ data to run efficiently – Compromise of history vs speed – Business use is the priority • The data layer is not usable without the application layer
    24. 24. Inside a database Here today - gone tomorrow Transaction metadata ◦ Example: An activity about a customer is a record Is there a Unique ID  For the transaction?  For the customer? Where and when are/were components located?  Multiple data tables in one database  Multiple data tables across multiple databases Table names and column names Standard names for elements across tables
    25. 25. Source / business databases Data stored in tables Normalised structure Lots of data Large number of users Lots of very quick transactions Varying history retained Mostly data is overwritten
    26. 26. Data warehouse
    27. 27. Data warehouse Storing and accessing large amounts of data Central repository for all or significant parts of the data that an enterprise’s various business systems collect
    28. 28. Data warehouse ◦ Multiple source systems ◦ Designed for reporting and analysis ◦ Historical data ◦ Lots of data! ◦ Large queries ◦ Transaction level data ◦ Multiple table joins ◦ Corporate effort ◦ Centrally owned ◦ Unpredictable use ◦ Corporate needs ◦ Pressure on resources
    29. 29. What is the simplest/most robust approach to deliver data and records from databases?
    30. 30. Elegant solutions needed
    31. 31. 1 Create policy to document: ◦ What authoritative records must be retained and what metadata must be retained ◦ What formats are acceptable ◦ Which (if any) records and metadata are considered transient artefacts, and why (e.g. format shifting duplicates, quality checking etc) ◦ Get approval for destruction of transient artefacts as part of the normal functioning of the systems that dispose of them
    32. 32. Approach: create and export records from solution system 1. Identify what data tables/records are needed and that can be produced 2. Map identified records to disposal authorities ◦ Which records must be kept beyond system decommission ◦ Identify the business need for retention 3. Use the application layer to create and export those records in a suitable format 4. Store in recordkeeping system e.g. data warehouse or EDRMS 5. Retain records needed for the business post-decommission
    33. 33. 2 Persistently associate metadata Appropriate metadata associated and retained with authoritative records ◦ Identify data linkages between systems ◦ Retain those linkages or ◦ Consolidate metadata and associated record objects into one system, and ensure they are persistently associated Ensure migrated data/metadata/objects retain their context (e.g. date created, author etc)
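One way to read "persistently associate metadata" is to export each record as a single package that carries its content, its contextual metadata and a checksum together, so they cannot drift apart when moved between systems. The following is a minimal Python sketch under that assumption. The envelope structure, the function names and the metadata fields (`source_system`, `author`) are hypothetical and are not taken from the presentation.

```python
import hashlib
import json


def _content_digest(content):
    """SHA-256 over a canonical (sorted-key) JSON rendering of the content."""
    body = json.dumps(content, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def package_record(content, metadata):
    """Bundle content, metadata and a content checksum into one JSON document."""
    envelope = {
        "metadata": metadata,
        "content": content,
        "sha256": _content_digest(content),
    }
    return json.dumps(envelope, sort_keys=True, indent=2)


def verify_package(text):
    """Return True if the content still matches the checksum stored with it."""
    envelope = json.loads(text)
    return _content_digest(envelope["content"]) == envelope["sha256"]


# Illustrative record based on the customer row from slide 12.
package = package_record(
    content={"customer_id": 123, "customer_name": "Bloggs, Joe", "customer_type": "Retailer"},
    metadata={"date_created": "2008-10-20", "source_system": "customer-mgmt", "author": "unknown"},
)
```

Because the metadata travels inside the same document as the content, a migration that copies the package also copies its context, and `verify_package` gives a simple check that the content was not altered on the way.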
    34. 34. Future state: BAU transfers to recordkeeping systems ◦ Structured data to data warehouse ◦ Create key records and send to EDRMS (diagram: Customer mgmt system, Case mgmt system, EDRMS)
    35. 35. Data warehouses as an example of good practice
    36. 36. Managing data
    37. 37. Data feeds - principles Direct data feeds from source systems Not changed in any way No intervening processes All changes to the data Fully auditable Reconcile to source system
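The "reconcile to source system" principle can be illustrated with a simple check that the rows loaded into the warehouse are exactly the rows the source system supplied. This is a minimal Python sketch, assuming rows arrive as tuples; the function names and the use of row counts plus an order-independent hash are illustrative choices, not the method described by the presenters.

```python
import hashlib


def fingerprint(rows):
    """Return (row count, order-independent SHA-256 digest) for a set of rows."""
    row_digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256("".join(row_digests).encode("ascii")).hexdigest()
    return len(row_digests), combined


def reconcile(source_rows, warehouse_rows):
    """Compare a warehouse load against the source feed it was taken from."""
    source_count, source_digest = fingerprint(source_rows)
    warehouse_count, warehouse_digest = fingerprint(warehouse_rows)
    return {
        "source_count": source_count,
        "warehouse_count": warehouse_count,
        "reconciled": (source_count == warehouse_count
                       and source_digest == warehouse_digest),
    }


# Illustrative feed: same rows, loaded in a different order.
source = [(123, "Bloggs, Joe"), (124, "Mouse, Minnie")]
loaded = [(124, "Mouse, Minnie"), (123, "Bloggs, Joe")]
result = reconcile(source, loaded)
```

Storing the result of each reconciliation alongside the load gives an audit trail showing that the feed was not changed on its way into the warehouse.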
    38. 38. For example: one table… Before: 29 months data, 162 tapes, 400 million records, 88 GB After: 29 months data, 4 physical files, 27 million records, 6 GB (diagram: Month1, Month2, Month3, … Monthn → Compare → Differences1, Differences2, … → Consolidated file)
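The reduction on slide 38 comes from comparing successive monthly copies of a table and keeping only the differences. The Python sketch below shows the general idea on a toy table, assuming each monthly snapshot is a dictionary keyed by a unique ID. The function names, the change labels and the sample rows are assumptions for illustration; the actual process used by the presenters is not described in the slides.

```python
def diff_snapshots(previous, current):
    """List the changes needed to turn the previous snapshot into the current one."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


def consolidate(monthly_snapshots):
    """Turn (month, snapshot) pairs, in date order, into one list of changes."""
    consolidated = []
    previous = {}
    for month, snapshot in monthly_snapshots:
        for change_type, key, row in diff_snapshots(previous, snapshot):
            consolidated.append((month, change_type, key, row))
        previous = snapshot
    return consolidated


def rebuild(consolidated, as_at_month):
    """Replay the consolidated changes to recover the table as at a given month."""
    state = {}
    for month, change_type, key, row in consolidated:
        if month > as_at_month:
            break
        if change_type == "delete":
            state.pop(key, None)
        else:
            state[key] = row
    return state


# Three illustrative monthly copies of a two-row customer table.
october = {123: ("Bloggs, Joe", "Retailer"), 124: ("Mouse, Minnie", "Distributor")}
november = dict(october)  # nothing changed this month
december = {123: ("Bloggs, Joe", "Distributor"), 124: ("Mouse, Minnie", "Distributor")}
snapshots = [("2008-10", october), ("2008-11", november), ("2008-12", december)]

changes = consolidate(snapshots)
```

Here six stored rows across three snapshots become three change entries, and `rebuild` can still return the table as it stood in any month, so history is kept while the volume shrinks.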
    39. 39. Subsets Frequently used data At a point in time Smaller, quicker Easier to use Daily, weekly, monthly
    40. 40. Summary data Summary layer Analysts access the summary layer Smaller, easier Data Marts
    41. 41. Benefits of data warehouse Accessible Stored online Quick and easy to access Multiple sources of data Updated daily Full history – track everything Can do more – freedom to explore Tuned environment One version of the truth
    42. 42. Data management Data does not manage itself! Difficult, unruly Standards, processes Roles and responsibilities Data warehouse team Skills ◦ Data warehousing, Data management, Software, Hardware, Metadata, Architecture, Analysis, Performance, tuning Coordination, communication, marketing
    43. 43. Best practice Data warehousing around for years Proven architectures, technologies, methodologies Good infrastructure  … but will it last?
    44. 44. Challenges – big data ◦ 33% - data growth contributes to performance issues “most of the time” ◦ Managing storage may cost 3-10 times cost of procurement ◦ Average company keeps 20-40 duplicates of its data
    45. 45. Helping IT and the business to collaborate in managing data ◦ It’s not just about BI ◦ Business and IT must work together
    46. 46. Data “lifecycle” management
    47. 47. Decommission = risk (diagram: Old case mgmt system, Old EDRMS, New case mgmt system, New EDRMS, Data warehouse; partial exports)
    48. 48. Data lifecycle management Data lifecycle management (DLM) Managing the flow of data, information and associated metadata through information systems and repositories, from creation and storage through to when it can be discarded. Recognises that the importance and business value of data does not rely on its age, or how often it is used.
    49. 49. Why DLM Data and information has value for ◦ strategic and operational business needs ◦ managing risk ◦ meeting legislative obligations Value of information decays over time Some information can be archived, some discarded Occasionally, sometimes unexpectedly, older data may need to be accessed again, quickly, completely and accurately
    50. 50. DLM Components (diagram) Stages: ◦ Create or Modify ◦ Maintain (Organise, Describe, Manage) ◦ Use (Access, Share, Find) ◦ Retain or Dispose (Archive, Transfer, Destroy) Data shown: Property, Customer, Tenancy Requirements listed on the diagram: Standards; Formats; Core process artefacts; Connected systems; Automated capture; Retrieval; data validation; Disposal Authorities; Risk identification; Business requirements; Lifecycle policies; Disposal planning; Metadata schema; Tiered Storage; Business classification linked to business process; Single source of truth
    51. 51. Conclusion
    52. 52. Create and maintain Principle 1: Recordkeeping Must be Planned and Implemented 1. Responsibility assigned CEO down 2. Policy 3. Procedures 4. Responsibilities defined, resourced 5. Recordkeeping programme & monitoring
    53. 53. Principle 2: Full & accurate records of business activity must be made Requirement / Database / Data Warehouse 1. Functions and business activities identified and documented  2. Records of business decisions and transactions must be created  3. All records of business activity captured routinely into an organisation-wide recordkeeping framework   4. Training provided  
    54. 54. Principle 3: records must provide authoritative and reliable evidence of business activity Requirement / Database / Data Warehouse 10. Authentic: accurately documented creation, receipt, & transmission   11. Reliability & integrity, maintained unaltered   12. Useable, retrievable, accessible   13. Complete, with content & contextual information   14. Comprehensive, provide authoritative evidence of all business activities  
    55. 55. Principle 4: records must be managed systematically Requirement / Database / Data Warehouse 15. Identified & captured in recordkeeping framework   16. Organised according to a business classification scheme   17. Reliably maintained over time in recordkeeping framework   18. Useable, accessible & retrievable for the entire period of their retention   19. Contextual and structural integrity maintained over time   20. Retention & disposal actions systematic  
    56. 56. RK capability of system(s) A system that holds authoritative records ◦ Must be capable of recordkeeping, or ◦ Made capable, or ◦ Must transfer records to a recordkeeping system Who makes that decision? ◦ Should be business owner ◦ (with advice from IT) Data warehouses show us ◦ what can be done ◦ how to do it
    57. 57. Developing an Enterprise Information Management Framework (diagram) Components shown: ◦ Governance ◦ Information Asset Architecture: a blueprint for the semantic and physical integration of enterprise information assets, technology and the business processes ◦ Business Intelligence and Data Warehousing ◦ Reference and Master Data Management ◦ Structured and Unstructured Information ◦ Metadata Management: the connecting foundation for EIM, used to describe, organise, integrate, share, and govern enterprise information assets ◦ Security and Control: policies, rules and tools that ensure the proper control, protection and privacy of information ◦ Information Culture: the behaviours, values and norms of the enterprise within the context of information use ◦ Information Stewardship: oversight of the content, description, quality, and accuracy of enterprise information throughout its lifecycle Content types shown: Social, Emails, Audio, Mobile, IT/OT, Transactional Data, Documents, Images, Text, Movies, Search
    58. 58. Future state of data Accurate, relevant, timely delivery of data and information ◦ Trustworthy information ◦ Where it is needed ◦ Formats most appropriate to business need and future Information found quickly, whether it’s old or new Clear guidelines for systems and processes ◦ Keep what’s needed for only as long as it’s needed ◦ In the right format Data has recognisable value and appropriate levels of management ◦ Business need: we know what’s important, and when it’s important ◦ Risk: we’re clear about what to manage, and how ◦ Regulatory framework: we meet legislative obligations
    59. 59. Our point Long term preservation of data requires understanding how data is created and managed We have to work out: ◦ What data the business needs to keep ◦ What records the business needs to create and keep And….. how ◦ What data must be unchanged ◦ What we mean by usable and retrievable
    60. 60. Data in databases “It’s not what you think” Clare Somerville, Trish O’Kane