
525 ibm optim



  1. 1. Best Practices in Database Archiving and Information Lifecycle An InformationWeek Webcast Sponsored by
  2. 2. Webcast Logistics
  3. 3. Today’s Presenter: Carl Olofson, Research Vice President, Application Development and Deployment, IDC
  4. 4. Best Practices in Database Archiving and Information Lifecycle Management: How ILM Saves Money, Reduces Risk. Carl Olofson, Research Vice President, IDC. May 2011. Copyright IDC. Reproduction is forbidden unless authorized. All rights reserved.
  5. 5. Agenda
     The Problem
       • Unchecked database growth
       • Hidden costs of large databases
       • Security and privacy in test data
     Information Lifecycle Management
       • What is ILM?
       • Database archiving
         – Requirements of database archiving
         – Benefits of database archiving
       • Test data masking
         – How data is masked
         – Benefits of data masking
     Conclusions / Recommendations
     © IDC Visit us at and follow us on Twitter: @IDC May-11
  6. 6. Unchecked Database Growth
     As a database grows…
       • It requires larger indices
       • It consumes more storage
       • It requires specialized administration to tune
       • It needs more processor power to execute queries and updates
     The hidden costs include
       • More storage administration
       • More downtime for reorgs
       • Larger batch windows for backups
  7. 7. Polling Question #1
     How rapidly is your main production database growing?
       • Under 10% per year
       • 10% per year
       • 25% per year
       • Over 25% per year
  8. 8. Elements of Test Data Management
     Selecting the data
       • Must be a referentially complete subset of the database
       • Must reflect realistic patterns of data to ensure valid testing
     Protecting sensitive data
       • Sensitive data must be masked to prevent unauthorized viewing
       • Masked data needs to make sense to the test system
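
The "referentially complete subset" requirement can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool: it assumes a hypothetical two-table SQLite schema (`customer`, `sales_order`), and every table, column, and function name below is invented for the example.

```python
import sqlite3

# Hypothetical schema: every sales_order row references a customer row.
SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sales_order (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    total REAL NOT NULL
);
"""


def make_demo_source() -> sqlite3.Connection:
    """Build a tiny stand-in for the production database."""
    src = sqlite3.connect(":memory:")
    src.executescript(SCHEMA)
    src.executemany("INSERT INTO customer VALUES (?, ?)",
                    [(1, "Acme"), (2, "Globex"), (3, "Initech")])
    src.executemany("INSERT INTO sales_order VALUES (?, ?, ?)",
                    [(10, 1, 99.0), (11, 2, 15.5), (12, 1, 42.0), (13, 3, 7.25)])
    src.commit()
    return src


def build_subset(src: sqlite3.Connection, order_ids: list) -> sqlite3.Connection:
    """Copy the chosen orders plus every customer they reference."""
    dst = sqlite3.connect(":memory:")
    dst.execute("PRAGMA foreign_keys = ON")  # make the target enforce the FK
    dst.executescript(SCHEMA)
    marks = ",".join("?" * len(order_ids))
    orders = src.execute(
        f"SELECT id, customer_id, total FROM sales_order WHERE id IN ({marks})",
        list(order_ids)).fetchall()
    # Follow the foreign key so no copied order is left without its parent row.
    customer_ids = sorted({row[1] for row in orders})
    marks = ",".join("?" * len(customer_ids))
    customers = src.execute(
        f"SELECT id, name FROM customer WHERE id IN ({marks})",
        customer_ids).fetchall()
    dst.executemany("INSERT INTO customer VALUES (?, ?)", customers)  # parents first
    dst.executemany("INSERT INTO sales_order VALUES (?, ?, ?)", orders)
    dst.commit()
    return dst


subset = build_subset(make_demo_source(), [10, 13])
```

Inserting parents before children, with foreign-key enforcement switched on in the target, means an incomplete subset fails loudly instead of producing orphaned test rows. A real schema would need the same walk repeated across every relationship, which is the part that becomes hard to maintain by hand.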
  9. 9. Security and Privacy in Test Data
     Normal Security Is Often Suspended for Test Data
       • Confidential data could be compromised
       • Privacy requirements could be breached
       • Corporate policies may be violated
       • Contractual requirements and government regulations could lead to legal culpability
     In-House Masking Is Inadequate
       • Simplistic results create unrealistic test data
       • Code must be changed as the database changes, an unreasonable burden on in-house IT
  10. 10. Polling Question #2
      In what role is the person in your organization primarily responsible for refreshing test data?
        • DBA
        • Development Manager
        • Project Leader
        • Developer
        • Other
  11. 11. Information Lifecycle Management (ILM): Define, Archive, Manage, Protect, Test
  12. 12. The Basic Elements of ILM
      Definition
        • Policies governing data creation, management, removal
      Security
        • Encryption and access control at a granular level
      Protection
        • Blocking access to sensitive data, including test data
        • Test data protection is done through data masking
      Archiving
        • Removal of inactive data from the live database
        • Storage in a compressed, read-only datastore
  13. 13. The Data Masking Challenge
      Application testing requirements
        • Using simple XXXX or #### or “Ipsum lorem” is usually not adequate for robust application testing.
        • Data must be representative of actual data in value range and distribution.
        • Masked data must “make sense”; zip codes correlate to city and state, for instance.
        • Secured information, such as personal identification, should not be inferable from the masked data.
        • The fake data should be consistent.
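
Two of the requirements above, consistency and values that "make sense" together, can be illustrated with keyed hashing. The sketch below is an assumption-laden example rather than a description of any product: the lookup lists, the field names, and the `mask_customer` function are hypothetical, and the tiny lookup tables make no attempt to preserve the real value distribution.

```python
import hashlib
import hmac

# Hypothetical replacement values; a real tool would use far larger reference sets.
FIRST_NAMES = ["Alex", "Blair", "Casey", "Devon", "Emery", "Finley"]
# ZIP, city, and state are stored together so a masked address stays coherent.
LOCATIONS = [
    ("10001", "New York", "NY"),
    ("60601", "Chicago", "IL"),
    ("94105", "San Francisco", "CA"),
    ("78701", "Austin", "TX"),
]


def _pick(key: bytes, field: str, value: str, choices: list):
    """Choose a replacement deterministically from a keyed hash of the input."""
    digest = hmac.new(key, f"{field}:{value}".encode("utf-8"), hashlib.sha256).digest()
    return choices[int.from_bytes(digest[:8], "big") % len(choices)]


def mask_customer(row: dict, key: bytes) -> dict:
    """Return a masked copy of a customer row; the input row is not modified."""
    masked = dict(row)
    masked["first_name"] = _pick(key, "first_name", row["first_name"], FIRST_NAMES)
    # One draw yields all three location fields, so they always agree.
    zip_code, city, state = _pick(key, "zip", row["zip"], LOCATIONS)
    masked.update(zip=zip_code, city=city, state=state)
    return masked


example = mask_customer(
    {"first_name": "Maria", "zip": "02139", "city": "Cambridge", "state": "MA"},
    key=b"demo-key",
)
```

Because the replacement depends only on the secret key and the original value, the same customer masks to the same fake values in every table and on every refresh, while someone without the key cannot recompute the mapping. Non-sensitive columns pass through unchanged.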
  14. 14. Archiving: Types of Data
      Reference
        • Created in response to a stand-alone event
        • Randomly retrieved without requiring context
        • Active until a special event
        • Examples: customer, patient, product
      Transactional
        • Created at the start of a business process
        • Retrieved in the context of a transaction
        • Deactivated at the end of a business process
        • Examples: sales order, treatment, shipment
      Streaming
        • Created at reception of a streamed item
        • Inactive immediately (cannot be updated)
  15. 15. Classes of Data
      Active
        • Data that is still being updated
        • Includes reference and transactional data
      Inactive
        • Data no longer active, but retained for query and reporting
        • Includes historical and streamed data
        • Historical data is inactive transaction data
          – Sales order completed, revenue recognized
          – Inventory item sold and picked up
          – Patient treatment completed, patient discharged
  16. 16. Buildup of Inactive Data
      Hypothetical Example
        • Suppose we have a sales order table
        • We start the year with 10,000 orders per month
        • Orders grow at 1% per month
        • Each order takes 60 days to complete (recognize revenue)
        • Orders in process are active data
        • Completed orders are inactive data
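
The hypothetical example can be worked through numerically. The sketch below makes several assumptions the slide does not state: the table starts the year empty, growth compounds monthly, and 60 days is treated as two whole months, so only the two most recent months of orders count as active at each month end.

```python
def simulate(months: int = 12, first_month_orders: int = 10_000,
             growth: float = 0.01, months_to_complete: int = 2) -> list:
    """Return (month, total_rows, active_rows, inactive_rows) per month end."""
    created = []  # orders created in each month so far
    rows = []
    for m in range(months):
        created.append(round(first_month_orders * (1 + growth) ** m))
        # Orders younger than the completion time are still in process.
        active = sum(created[-months_to_complete:])
        total = sum(created)
        rows.append((m + 1, total, active, total - active))
    return rows


for month, total, active, inactive in simulate():
    print(f"month {month:2d}: total={total:7d} active={active:6d} "
          f"inactive={inactive:7d} ({inactive / total:.0%} inactive)")
```

Under these assumptions the active row count stays at roughly two months of orders, while inactive rows keep accumulating and make up a little over 80% of the table by December.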
  17. 17. Buildup of Inactive Transaction Data
      [Chart: Sales Order Table, rows per month from Jan to Dec (axis 0 to 160,000), split into Active and Inactive rows, with Inactive % overlaid]
  18. 18. Inactive Data Clogs the Database
      DBMS Overhead
        • Big indexes
        • Storage demand
        • Slower queries
        • Slower transaction processing
      Operational Overhead
        • DBA tuning
        • Disruption for unload/reload and reorg
        • Longer backup batch windows
  19. 19. Polling Question #3
      Think of transaction data that you retain. What is your required retention period?
        • 3-5 years
        • 6-10 years
        • Over 10 years
        • We don’t have a retention policy
  20. 20. Approaches to “Aging Out” Data
      Partitioning
        • Move data to a low-frequency partition on 2nd or 3rd tier storage
        • Use local partition indexes to avoid growth of global table indexes
        • Perform maintenance operations by physical partition
        • Problem: this approach impacts the whole table, and creates a complex operational and management challenge that extends across the database
      Archiving
        • Select referentially complete subsets of inactive data
        • Move the inactive data to an archiving system outside the database
        • Ensure that the archive can support SQL and that queries can, if necessary, be executed in an integrated manner with those of the live database
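
The archiving steps above (select a referentially complete set of inactive rows, then move it out of the live database) might look like the following. This is a sketch under stated assumptions: two in-memory SQLite databases stand in for the live database and the archive, and the `sales_order` and `order_line` tables, the `status` column, and both function names are hypothetical.

```python
import sqlite3

# Hypothetical schema shared by the live database and the archive.
SCHEMA = """
CREATE TABLE sales_order (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE order_line (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES sales_order(id),
    sku TEXT NOT NULL,
    qty INTEGER NOT NULL
);
"""


def make_demo_databases():
    """Create a small live database and an empty archive with the same schema."""
    live, archive = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    for conn in (live, archive):
        conn.executescript(SCHEMA)
    live.executemany("INSERT INTO sales_order VALUES (?, ?)",
                     [(1, "completed"), (2, "open"), (3, "completed")])
    live.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                     [(1, 1, "A", 2), (2, 1, "B", 1), (3, 2, "A", 5), (4, 3, "C", 1)])
    live.commit()
    return live, archive


def archive_completed(live: sqlite3.Connection, archive: sqlite3.Connection) -> int:
    """Move completed orders and all of their lines from live to archive."""
    orders = live.execute(
        "SELECT id, status FROM sales_order WHERE status = 'completed'").fetchall()
    ids = [(order_id,) for order_id, _ in orders]
    lines = []
    for key in ids:  # pull every child row so the archived set is complete
        lines += live.execute(
            "SELECT id, order_id, sku, qty FROM order_line WHERE order_id = ?",
            key).fetchall()
    with archive:  # commit the copy before anything is deleted
        archive.executemany("INSERT INTO sales_order VALUES (?, ?)", orders)
        archive.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)", lines)
    with live:  # children first, then parents
        live.executemany("DELETE FROM order_line WHERE order_id = ?", ids)
        live.executemany("DELETE FROM sales_order WHERE id = ?", ids)
    return len(orders)


live_db, archive_db = make_demo_databases()
moved = archive_completed(live_db, archive_db)
```

Committing the archive copy before deleting from the live database means a failure between the two steps leaves duplicate rows rather than lost ones. The archive here remains queryable with ordinary SQL, which is the property the last bullet asks for; integrated queries across both stores are not shown.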
  21. 21. Benefits of Archiving
      Database benefits
        • Faster queries
        • Less index maintenance overhead
        • Smaller dataspaces and simpler schema than the partitioning option
        • Requires less CPU; license/maintenance savings for DB and applications
      Operational benefits
        • Less schema maintenance than the partitioning option
        • Stable backup windows
        • Much less data reorganization
  22. 22. Application Retirement
      Inactive Applications
        • Applications become inactive when they are no longer used, and their functions have been migrated elsewhere.
        • They commonly still have data that must be retained for corporate policy or legal reasons.
        • For this reason, enterprises keep running them, maintaining them, and paying fees for them even though they are inactive.
      Retiring Inactive Applications
        • All their data is inactive, so it may be archived altogether.
        • The archiving system must retain the ability to report on the data.
        • The savings in servers, storage, software, and operations costs can be very significant.
  23. 23. Critical Requirements of Database Archiving
      DBMS Support
        • Must support ongoing versions of major RDBMS including DB2, Informix, Oracle, Sybase ASE, Microsoft SQL Server, and MySQL
        • Must record schema and schema changes to support data retrieval even after data definitions have changed
        • Must support SQL and ODBC/JDBC used by applications
      Technical requirements
        • Random data retrieval
        • Compressed, optimized based on read-only access
        • Reasonable performance on 2nd and 3rd tier storage
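
The requirement to record the schema alongside the data, in a compressed read-only form, can be illustrated with a toy archive file. This sketch assumes SQLite as the source and a gzip-compressed JSON file as the archive format; the file layout and the `write_archive`, `read_archive`, and `demo` functions are invented for the example and do not offer the SQL access or random retrieval a real archive would need.

```python
import gzip
import json
import os
import sqlite3
import tempfile


def write_archive(conn: sqlite3.Connection, table: str, where: str, path: str) -> int:
    """Write matching rows plus a snapshot of their schema to a gzip file.

    `table` and `where` are interpolated into SQL, so this sketch assumes
    trusted, hard-coded arguments only.
    """
    cur = conn.execute(f"SELECT * FROM {table} WHERE {where}")
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    ddl = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,)).fetchone()[0]
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump({"table": table, "ddl": ddl, "columns": columns, "rows": rows}, fh)
    return len(rows)


def read_archive(path: str) -> list:
    """Decode rows using the column names stored with them, not the live schema."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        payload = json.load(fh)
    return [dict(zip(payload["columns"], row)) for row in payload["rows"]]


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_order (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_order VALUES (?, ?, ?)",
                     [(1, "completed", 99.0), (2, "open", 15.5)])
    conn.commit()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sales_order_archive.json.gz")
        write_archive(conn, "sales_order", "status = 'completed'", path)
        # The live definition changes after the rows were archived...
        conn.execute("ALTER TABLE sales_order ADD COLUMN currency TEXT")
        # ...but the archive still decodes with the columns it was written with.
        return read_archive(path)
```

Storing the column list and the original DDL inside each archive file is what lets old rows be interpreted after the live table has been altered.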
  24. 24. Data Governance
      Purpose is to ensure that data is trustworthy
        • Data is well defined, and maintenance is rational
        • Original source is known
        • Sequence and agents of update are known (provenance)
        • Data is valid and consistent
        • No unauthorized access has happened
        • No sensitive data is visible to unauthorized personnel
        • Data is retained as required without compromising performance
      Business Benefits
        • Database development and management addresses known business needs
        • Trade secrets are not exposed and confidences are not compromised
        • Ensures compliance with contractual and legal requirements
        • Reduces risk of actual or opportunity cost due to data-driven application error
  25. 25. ILM and Data Governance
      [Diagram: Data Governance (Uniform Data Definition & Policy Management) spanning Information Lifecycle Management, Trust Management, and data security and monitoring; component boxes include Database Subsetting, Database Archiving, Test Data Masking, Data Cleansing, Provenance Tracking, Access Log Analysis, Quality and Profiling, and Access Control and Encryption]
  26. 26. ILM and Database Development and Management Tools
      Database Development and Management Tools (DDMT)
        • Software used by DBAs and data managers to manage the size, performance, and reliability/recoverability of databases
        • Includes DBA tools, database replication software, development and optimization software, and database archiving / ILM
      The ILM Segment of the DDMT Market
        • Just 4.6% in 2009, but the fastest growing segment; the only segment to show positive growth in that tough economic year
        • Projected to show the greatest growth of all DDMT segments to 2014, with a forecast CAGR of 9.9% from $90M to $188M
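
The relationship between a CAGR and its endpoints is end = start × (1 + rate)^years. The helper functions below are a small sketch for checking such figures; the slide does not state how many compounding periods its forecast covers, so the five-year span (2009 to 2014) used in the example calls is an assumption.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by two endpoints."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: float) -> float:
    """Value reached from `start` after compounding `rate` for `years`."""
    return start * (1 + rate) ** years


# Assumed five-year span; figures in $M as given on the slide.
implied_rate = cagr(90, 188, 5)
value_at_stated_rate = project(90, 0.099, 5)
print(f"implied CAGR: {implied_rate:.1%}, $90M at 9.9% for 5 years: ${value_at_stated_rate:.0f}M")
```

Under the five-year assumption the stated endpoints correspond to a rate closer to 16% than to 9.9%, so the period or base figure behind the quoted CAGR presumably differs from the one assumed here.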
  27. 27. What’s IBM’s Share in the ILM Market Segment?
      Share of segment revenue (Total = $89.9 Million; Source: IDC, 2010)
        • IBM: 56%
        • Informatica: 13%
        • Other: 12%
        • HP: 11%
        • Solix: 4%
        • CA: 4%
  28. 28. Conclusions and Recommendations
      Conclusions
        • Data governance is critical because the utility and trustworthiness of enterprise data cannot be left to chance.
        • ILM addresses the key dimension of data size management in relation to data retention, and test data management.
        • These functions cannot be developed and maintained in-house.
      Recommendations
        • Users should carefully review their data access and retention policies and ensure that those policies are carried out.
        • In most cases, the best approach to ensuring data retention without bloating the databases is to employ database archiving.
        • Test data management is not trivial; find professionally developed data masking and subsetting tools.
        • IBM’s InfoSphere Optim leads the market in addressing these key ILM requirements.
  30. 30. Information Management: IBM InfoSphere Optim solutions
      Managing data throughout its lifecycle in heterogeneous environments
      Discover
        • Speed understanding and project time through relationship discovery within and across data sources
        • Understand sensitive data to protect and secure it
      Test Data Management
        • Easily refresh & maintain right-sized non-production environments, while reducing storage costs
        • Improve application quality and deploy new functionality more quickly
      Data Masking
        • Protect sensitive information from misuse & fraud
        • Prevent data breaches and associated fines
      Data Growth Management
        • Reduce hardware, storage & maintenance costs
        • Streamline application upgrades and improve application performance
      Application Retirement
        • Safely retire legacy & redundant applications while retaining the data
        • Ensure application-independent access to archive data
      [Diagram labels: Discover, Understand, Classify, Subset, Mask, Archive, Retire; environments: Production, Development, Test, Training]
      © 2011 IBM Corporation
  31. 31. Managing Data Across its Lifecycle
      Discover & Define
        • Discover where data resides
        • Classify & define data and relationships
        • Define policies
      Develop & Test
        • Develop database structures & code
        • Create & refresh test data
        • Validate test results
      Optimize, Archive & Access
        • Enhance performance
        • Manage data growth
        • Report & retrieve archived data
      Consolidate & Retire
        • Rationalize application portfolio
        • Enable compliance with retention & e-discovery
      Information Governance: Quality Management – Lifecycle – Security & Privacy
  32. 32. You can’t govern what you don’t understand (Discover & Define)
        • Define business objects for archival and test data applications
          – Automation of manual activities accelerates time to value
        • Discover data transformation rules and heterogeneous relationships
          – Business insight into data relationships reduces project risk
        • Identify hidden sensitive data for privacy
          – Provides consistency across information agenda projects
      [Diagram: Distributed Data Landscape]
  33. 33. Employ effective test data management practices (Develop & Test)
        • Create targeted, right-sized test environments
        • Substitute sensitive data with fictionalized yet contextually accurate data
        • Easily refresh, reset and maintain test environments
        • Compare data to pinpoint and resolve application defects faster
        • Accelerate release schedules
      [Diagram: a 2 TB production database or production clone is subset and masked into smaller Development, Unit Test, Training, and Integration Test environments; sizes of 25 GB, 25 GB, 50 GB, and 100 GB are shown]
  34. 34. Archive historical data for data growth management (Optimize, Archive & Access)
      Data archiving is an intelligent process for moving inactive or infrequently accessed data that still has value, while providing the ability to search and retrieve the data.
      [Diagram: current production data is archived into archives holding historical and reference data; archived records can be selectively restored or retrieved; universal access to application data through the application, ODBC / JDBC, XML, report writer, and Mashup Center]
  35. 35. Retire redundant and legacy applications (Consolidate & Retire)
        • Preserve application data in its business context
          – Capture all related data, including transaction details, reference data & associated metadata
          – Capture any related reference data that may reside in other application databases
        • Retire out-of-date packaged applications as well as legacy custom applications
          – Leverage out-of-box support of packaged applications to quickly identify & extract the complete business object
        • Shut down legacy systems without a replacement
          – Provide fast and easy retrieval of data for research and reporting, as well as audits and e-discovery requests
      [Diagram: infrastructure before retirement (each user, application, database, and data stack kept running) versus archived data after consolidation (users reach archive data through a single archive engine)]
  36. 36. Resources to Learn More!
        • InfoSphere Optim Solutions page
        • IDC Worldwide Database Development and Management Tools 2009 Vendor and Segment Analysis Report
        • Whitepaper: Control Application Data Growth Before It Controls Your Business
        • Whitepaper: Enterprise Strategies to Improve Application Testing
        • InfoSphere Optim Solutions for Custom and Packaged Applications Solution Brief
  37. 37. Q&A Session: Please Submit Your Questions Now