Optim test data management for IMS 2011
  • This presentation covers the Essentials of Test Data Management, part of the InfoSphere Information Lifecycle Management solutions.
  • We are going to cover the following:
    • Information Governance: Review
    • What is Test Data Management
    • Role of Test Data Management in the Testing Discipline
    • Risks and Challenges of Poor Test Data Management
    • Best Practices in Test Data Management
    • Data Privacy Concerns with Test Data
    • IBM InfoSphere Optim Test Data Management Solution
    • Conclusion
  • You have seen this slide in the Information Lifecycle Management presentation. There are typically hundreds or even thousands of different systems throughout an organization. Information can come in from many places (transaction systems, operational systems, document repositories, external information sources) and in many formats (data, content, streaming). Wherever it comes from, there are often meaningful relationships between the various sources of data. We manage all this information in our systems, integrate it to build warehouses, master the data to get single views, and analyze it to make business decisions. This is a supply chain of information, flowing throughout the organization. Integrating information, ensuring its quality, and interpreting it correctly are crucial to using the information to make better decisions. Information must be turned into a trusted asset and governed to maintain its quality over its lifecycle.
  • We went through the requirements for information lifecycle management. We are focusing on Develop & Test: specifically, efficiently creating test and development environments (and protecting the sensitive data within them), effectively validating test results, and quickly and securely deploying the application.
  • How are enterprises creating test data today? Typically manually, or by simply cloning their entire production database. The downside of cloning all of production is that you now have a data growth problem and consume significant storage. In addition, you have a privacy issue, because using production data for testing exposes sensitive data to developers and testers.
  • The business benefits of test data management:
    • More time for testing: in many organizations, 30-40% of test script execution is spent on manufacturing new test data, and much of this is done manually today. Automating test data management reduces the time spent creating new data, allowing more tests to be executed.
    • Reduce cost: maximize allocated disk space, and catch errors earlier in the testing cycle because you now have realistic test data to test with, shifting errors from production to test.
    • Increase data quality.
    • Enforce data ownership: test data management offers role-driven security to support segmentation of the development and testing teams.
    • Reduce data dependencies across test sets: multiple test sets often use the same data, but different tests can negatively impact other tests using the same data. Test data management allows for the creation of an unlimited number of test data sets and can create unique IDs each time to ensure clean data is used when testing.
  • Why is it important to mask sensitive information? Some examples:
    • Hackers obtained personal information on 70 million subscribers to Sony PlayStation. See article: http://online.wsj.com/article/SB10001424052748704587004576245131531712342.html
    • The 'LizaMoon' mass SQL injection attack escalated out of control. See article: http://www.eweek.com/c/a/Security/LizaMoon-Mass-SQL-Injection-Attack-Escalates-Out-of-Control-378108/
    • The Federal Aviation Administration exposed unprotected test data to a third party: http://fcw.com/articles/2009/02/10/faa-data-breach.aspx
    • The release of thousands of classified documents by WikiLeaks founder Julian Assange jeopardized U.S. national security; the US Army launched an investigation: http://www.mcclatchydc.com/2010/12/23/105763/army-wikileaks-probe-could-lead.html
  • Ever since the inception of Information Technology (a.k.a. Electronic Data Processing), it has been commonly accepted to allow a certain percentage of IT staff to have access to the production environment. These "trusted employees" were carefully screened and usually in close proximity to executive management due to the confidentiality of critical, sensitive corporate data. Originally this was a practical matter, voluntarily implemented by the enterprise. Over the years, the onslaught of international data privacy legislation has made it a compliance matter as well. Today's large, multi-national enterprise faces numerous cross-border data privacy exposures, and with the deployment of third-party contractors there is further separation from the traditional "trusted employee". Data masking provides development teams with meaningful test data without exposing sensitive private information. Static data masking is the most common and most traditional approach: it extracts rows from production databases and conceals data values that are ultimately stored in the columns of the test databases. The concealed values are physically stored in the target databases. Dynamic data masking (a term coined by Gartner) is an emerging technology that performs data obfuscation at the presentation layer in real time. Implemented at the SQL protocol layer, operating as a database listener, it inspects in-bound SQL from any application and dynamically rewrites it to include the appropriate masking function. The result is data masking at the presentation layer without having to change the underlying database or the application source code.
  • We went through the requirements for information lifecycle management. We are focusing on Develop & Test: specifically, efficiently creating test and development environments (and protecting the sensitive data within them), effectively validating test results, and quickly and securely deploying the application.
  • Most companies are still struggling with the first step: understanding their complex, heterogeneous data landscapes for test data management, with a resulting impact on the overall quality of applications. Some of the challenges are knowing what data is needed for test cases, a lack of understanding of where data is located and how it is related, and a limited understanding of the confidential data elements. Manual analysis and hand coding are cost prohibitive.
  • Test Data Management allows development teams to accelerate testing activities on a project. It exploits production data while ensuring the security of confidential data. Providing testers and developers with access to test data can improve operational efficiency and optimize resources on a project. A comprehensive Test Data Management solution is needed to minimize cost and shorten development cycles.
  • You want to point customers to the InfoSphere Optim ibm.com page, solution sheet, whitepaper and case study on test data management.
  • Thank you!

Optim test data management for IMS 2011: Presentation Transcript

  • InfoSphere Optim Test Data Management Solution – IMS Focus. Peter Costigan, Product Line Manager, Optim Solutions. 9/28/2011
  • Agenda
    • Information Governance Overview
    • Risks and Challenges of Poor Test Data Management
    • Best Practices in Test Data Management
    • InfoSphere Optim Test Data Management
    • Data Privacy Concerns with Non-Production Data
    • IMS and z/OS Considerations
    • Other InfoSphere Optim Solutions: Discovery, Archiving, Application Retirement
    • Conclusion
  • Mastering information across the Information Supply Chain
    [Diagram: transactional & collaborative applications, business analytics applications, and external information sources feed an information supply chain: Manage (master data, content, data, streaming information), Integrate & Cleanse, and Analyze (data warehouses, cubes, big data, streams, content analytics), all under Information Governance (quality, security & privacy, lifecycle, standards) so that information is trusted, relevant, and governed.]
  • Requirements to manage data across its lifecycle (Information Governance core disciplines: lifecycle management)
    • Discover & Define: discover where data resides; classify & define data and relationships; define policies
    • Develop & Test: develop database structures & code; create & refresh test data; validate test results
    • Optimize, Archive & Access: enhance performance; manage data growth; report & retrieve archived data
    • Consolidate & Retire: move only the needed information; integrate into single data source; enable compliance with retention & e-discovery
  • How test data creation is often accomplished: clone the production database into development and test databases
    Positives:
    • Simple to do
    • Requires little knowledge of the data model or infrastructure
    • Creates an exact duplicate of production
    Negatives:
    • Uses more storage than needed, multiple times
    • Production data is a privacy risk
    • Data model changes are expected in Dev/Test, but require significant manual rework
    • Takes much time to create and refresh
    • No way to compare to the original after a test is complete
    • Cannot span multiple data sources/applications
    • Developer/Tester downtime when sharing data access
  • Test Data Management Best Practices
    • TDM refers to the need to manage data used in testing and other non-production environments
    • Extract related subsets of production data that are targeted to the functionality under test (see the sketch after this list)
    • De-identify / mask related test data to protect privacy
    • Quickly and easily refresh test environments
    • Edit data to create error and boundary conditions
    • Compare “before” and “after” images of test data
    Benefits: Improving application quality & customer satisfaction
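To make the "related subset" idea concrete, here is a minimal Python sketch against a hypothetical two-table customers/orders schema. The schema, table names, and sqlite3 usage are illustrative assumptions, not Optim's extract engine, which is driven by stored relationship definitions:

```python
import sqlite3

# Minimal sketch of a "related subset" extract: take a slice of parent rows,
# then follow the foreign key so only their child rows come along, keeping
# the subset referentially intact. All names here are hypothetical.
def extract_subset(src, dst, customer_limit=100):
    src.row_factory = sqlite3.Row
    customers = src.execute(
        "SELECT * FROM customers LIMIT ?", (customer_limit,)).fetchall()
    ids = [row["id"] for row in customers]
    marks = ",".join("?" * len(ids))
    orders = src.execute(
        f"SELECT * FROM orders WHERE customer_id IN ({marks})", ids).fetchall()
    dst.executemany("INSERT INTO customers VALUES (?, ?)",
                    [tuple(r) for r in customers])
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [tuple(r) for r in orders])
    dst.commit()

schema = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
"""
src = sqlite3.connect(":memory:")
src.executescript(schema)
src.executescript("""
INSERT INTO customers VALUES (1, 'Bennett'), (2, 'Flynn');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 99.0);
""")
dst = sqlite3.connect(":memory:")
dst.executescript(schema)
extract_subset(src, dst, customer_limit=1)
print(dst.execute("SELECT * FROM orders").fetchall())  # only customer 1's orders
```

The point is the shape of the operation: start from a targeted slice of parent data and let the relationships pull in exactly the dependent rows, rather than cloning everything.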
  • Optim captures complete business objects: business data is related across a wide variety of data sources
  • InfoSphere Optim Test Data Management Solution
    • Create targeted, right-sized test environments
    • Automate support for Data Model changes
    • Replace sensitive data with masked data
    • Refresh, reset and maintain test environments
    • Compare and resolve application defects
    • Accelerate release schedules
    [Diagram: a 2 TB production database (or production clone) feeds right-sized 100 GB, 50 GB, and 25 GB environments for development, unit test, training, and integration test, via Extract related subsets, Mask / Remap, Insert / Update / Load, and Compare.]
  • Business benefits of Test Data Management
    • More time for testing
      • In many organizations, 30-40% of test script execution is spent on manufacturing new test data. Test Data Management will reduce the amount of time spent creating new data, thereby allowing for the execution of more tests
    • Reduce cost
      • Maximize allocated disk space
      • Catch errors earlier in the testing cycle
      • Shift errors from production to test
    • Increase data quality
      • Refreshing test data from a baseline will minimize the amount of manual intervention currently required when creating new test data, reducing triage efforts and increasing test repeatability
    • Enforce data ownership
      • Often the “honor system” and spreadsheets are used to control test data ownership. Test Data Management offers role-driven security to support segmentation of the development and testing teams
    • Reduce data dependencies across test sets
      • Multiple test sets often use the same data, but different tests can negatively impact other tests using the same data. Test Data Management allows for the creation of an unlimited number of test data sets and can create unique IDs each time to ensure clean data is used when testing
  • TDM Business Value Assessment: Detailed Financial Analysis
  • Sensitive production data: what’s the risk?
    • December 2010: Hundreds of thousands of secret reports on the US wars in Iraq and Afghanistan were published on WikiLeaks after a private in the US military downloaded top secret documents and passed them to journalists, putting US national security and the lives of those named in the reports at risk.
    • February 2009: Unprotected test data was sent to and used by test/development teams as well as third-party consultants. An FAA server used for application development and testing was breached, exposing the personally identifiable information of 45,000+ employees.
    • April 2011: SQL injection is fast becoming one of the biggest and most high-profile web security threats. A mass SQL injection attack known as LizaMoon, which initially compromised 28,000 websites, shows no sign of slowing down; the malicious code is after anything stored in a database.
    • April 2011: Malicious outsiders stole names, addresses (city, state, zip), country, email address, birth date, PlayStation Network/Qriocity password and login, handle/PSN online ID, and possibly credit card numbers from 70 million Sony PlayStation subscribers.
  • What is data masking?
    • Definition: a method for creating a structurally similar but inauthentic version of an organization's data. The purpose is to protect the actual data while having a functional substitute for occasions when the real data is not required.
    • Requirement: effective data masking requires that the data be altered in a way that the actual values cannot be determined or reverse-engineered, while functional appearance is maintained.
    • Other terms used: obfuscation, scrambling, data de-identification
    • Commonly masked data types: name, address, telephone, SSN/national identity number, credit card number
    • Methods
      • Static Masking: extracts rows from production databases, obfuscating data values that ultimately get stored in the columns of the test databases (sketched in code below)
      • Dynamic Masking: Masks specific data elements on the fly without touching applications or physical production data store
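To illustrate the static approach, here is a small Python sketch (an illustrative assumption, not Optim's masking engine) that deterministically replaces the digits of a value using a keyed hash while preserving its format, so the same input always masks to the same output:

```python
import hashlib
import hmac

SECRET = b"masking-key"  # illustrative only; real deployments manage keys securely

def mask_digits(value: str, secret: bytes = SECRET) -> str:
    """Deterministically replace each digit, keeping non-digit characters.

    Because the mapping is keyed and repeatable, the same SSN masks to the
    same value everywhere it appears, so masked data stays consistent
    across tables and test refreshes.
    """
    digest = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    stream = (int(c, 16) % 10 for c in digest)
    return "".join(str(next(stream)) if ch.isdigit() else ch for ch in value)

print(mask_digits("123-45-6789"))  # format preserved, digits replaced
print(mask_digits("123-45-6789"))  # identical output: masking is repeatable
```

A format-preserving, repeatable transform like this is what lets masked values still look and behave like real data during test runs.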
  • InfoSphere Optim Data Masking Solution / Option
    Example 1: data is masked with contextually correct data to preserve the integrity of the test data.
      Personal Info Table, before (PersNbr, FirstName, LastName): 08054 Alice Bennett / 19101 Carl Davis / 27645 Elliot Flynn
      Personal Info Table, after: 10000 Jeanne Renoir / 10001 Claude Monet / 10002 Pablo Picasso
    Example 2: referential integrity is maintained with key propagation (see the code sketch after this slide).
      Event Table, before (PersNbr, FstNEvtOwn, LstNEvtOwn): 27645 Elliot Flynn / 27645 Elliot Flynn
      Event Table, after: 10002 Pablo Picasso / 10002 Pablo Picasso
    Data masking techniques include:
    • Lookup values
    • Generic mask
    • Arithmetic expressions
    • Concatenated expressions
    • Date aging
    • String literal values
    • Character substrings
    • Random or sequential numbers
    Sample Patient Information table:
      Patient No.  SSN          Name            Address            City    State  Zip
      112233       123-45-6789  Amanda Winters  40 Bayberry Drive  Elgin   IL     60123
      123456       333-22-4444  Erica Schafer   12 Murray Court    Austin  TX     78704
    • Maintain value of test data
      • Reduce risk of data breaches
    • Satisfy Privacy regulations
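Example 2 above (key propagation) can be sketched in a few lines of Python. The table and column names follow the slide; the sequential surrogate keys and the code itself are illustrative assumptions, not Optim's implementation:

```python
# Sketch of key propagation: when a primary key is masked, the same
# substitution is applied to every foreign key that references it, so
# parent and child rows still join after masking.
def propagate_keys(personal_info, events, start=10000):
    key_map = {row["PersNbr"]: start + i
               for i, row in enumerate(personal_info)}
    for row in personal_info:
        row["PersNbr"] = key_map[row["PersNbr"]]
    for row in events:
        row["PersNbr"] = key_map[row["PersNbr"]]  # same mapping: joins survive

personal_info = [{"PersNbr": 8054, "FirstName": "Alice", "LastName": "Bennett"},
                 {"PersNbr": 19101, "FirstName": "Carl", "LastName": "Davis"},
                 {"PersNbr": 27645, "FirstName": "Elliot", "LastName": "Flynn"}]
events = [{"PersNbr": 27645, "FstNEvtOwn": "Elliot", "LstNEvtOwn": "Flynn"},
          {"PersNbr": 27645, "FstNEvtOwn": "Elliot", "LstNEvtOwn": "Flynn"}]
propagate_keys(personal_info, events)
print(events)  # both event rows now carry 27645's new key, 10002
```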
  • What is IMS Data to InfoSphere Optim?
    • IMS = Hierarchical Database
      • Database consists of segments
      • Segments are related (physically)
    • Optim uses a relational model of tables, rows and columns
    • Optim Distributed uses middleware to access IMS, so it is more closely tied to the relational model.
    • Optim z/OS uses native (DL/I) access to IMS data.
    [Diagram: a sample IMS hierarchy of DEPARTMENT, EMPLOYEE, and JOB segments.]
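To give a feel for the hierarchical-to-relational mapping, here is a conceptual Python sketch. The segment names follow the diagram, but the data structure and traversal are assumptions for illustration; Optim's actual mapping is driven by segment definitions, not code like this:

```python
# Walk an IMS-style segment tree depth-first, emitting one relational-style
# row per segment occurrence and carrying the parent's key down as a
# foreign key, so each segment type can be viewed as a table.
def flatten(segment, parent_key=None, rows=None):
    if rows is None:
        rows = {}
    rows.setdefault(segment["type"], []).append(
        {**segment["fields"], "parent_key": parent_key})
    for child in segment.get("children", []):
        flatten(child, parent_key=segment["fields"]["key"], rows=rows)
    return rows

dept = {"type": "DEPARTMENT", "fields": {"key": "D01", "name": "Claims"},
        "children": [
            {"type": "EMPLOYEE", "fields": {"key": "E17", "name": "Flynn"},
             "children": [{"type": "JOB",
                           "fields": {"key": "J03", "title": "Adjuster"}}]}]}
for table, table_rows in flatten(dept).items():
    print(table, table_rows)  # DEPARTMENT, EMPLOYEE, JOB as flat "tables"
```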
  • InfoSphere Optim z/OS IMS Definitions
    • Describes physical layout of segment
    • Create from COBOL or PL/I copybook
      • Associated with IMS segment
      • Definition stored in the Optim Directory
    • Relate to other tables (DB2 or Legacy) via Optim Relationship
    • Segment treated as virtual DB2 table by any Optim process
    [Diagram: COBOL/PL/I copybooks define Legacy Table Definitions for IMS segments (e.g., EMPLOYEE in OPT.PROD.PSTDEPDB) and VSAM files (e.g., VENDITEM in OPT.PROD.VENDITEM); the IMS definitions, maps, legacy tables, and relationship definitions are stored in the Optim Directory.]
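As a rough illustration of what a copybook-driven table definition provides, the Python sketch below unpacks a fixed-length record into named columns. The EMPLOYEE layout, field names, and ASCII encoding are invented for this example; real mainframe segments are typically EBCDIC and may use packed or zoned decimal fields:

```python
import struct

# Hypothetical layout, standing in for a copybook:
#   EMP-ID PIC X(6), EMP-NAME PIC X(20), EMP-SALARY PIC 9(7)
EMPLOYEE_LAYOUT = struct.Struct("6s20s7s")
COLUMNS = ("EMP_ID", "EMP_NAME", "EMP_SALARY")

def segment_to_row(raw: bytes) -> dict:
    """Unpack raw segment bytes into a column-name -> value mapping,
    which is what lets a segment be treated like a table row."""
    values = EMPLOYEE_LAYOUT.unpack(raw)
    return {col: val.decode("ascii").rstrip()
            for col, val in zip(COLUMNS, values)}

raw = b"E00017Flynn, Elliot       0065000"  # 6 + 20 + 7 = 33 bytes
print(segment_to_row(raw))
```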
  • InfoSphere Optim z/OS Platform: access to data sources (DB2, IMS, VSAM / Sequential)
    • Native client access: InfoSphere Optim & DB2 for z/OS
    • Excluded for IMS/VSAM/Seq:
    • TDM Compare
    • TDM Edit
    • Archive
    • Application Retirement
  • InfoSphere Optim Distributed Platform: access to data sources. Data sources / tables are exposed as nicknames via Classic Federation, with ODBC client access.
    • Native Client Access
    • Leverage Middleware
  • InfoSphere Optim z/OS Requirements for IMS / VSAM / Sequential
    • Available:
      • IMS V12 Support (Optim z/OS V6 and V7)
      • Support for masking data in fixed length arrays (OCCURS)
      • IMS Sequential Dependent (SDEP) Segment Support
      • Support multiple record layouts for an IMS segment
      • Batch IMS/VSAM/Seq Table definition utility
      • Date/Time/Timestamp data types in IMS/VSAM/Seq Table Definitions
      • IMS Compression Exit
    • High Priority:
      • VSAM, Sequential and IMS Related Compare
      • Support for masking data in variable length arrays (ODO)
      • More flexible Optim relationship support
      • Tester productivity enhancements via Self-Service
      • Improvements in unkeyed segment support (over time)
      • Improvements in IMS access path selection (over time)
      • Extract IMS data during IMS Unload
      • Archive IMS, VSAM and Sequential natively on z/OS
      • Common Eclipse-based UI (Optim Designer and Manager)
  • Requirements to manage data across its lifecycle (Information Governance core disciplines: lifecycle management)
    • Discover & Define: discover where data resides; classify & define data and relationships; define policies
    • Develop & Test: develop database structures & code; create & refresh test data; validate test results
    • Optimize, Archive & Access: enhance performance; manage data growth; report & retrieve archived data
    • Consolidate & Retire: move only the needed information; integrate into single data source; enable compliance with retention & e-discovery
  • Discovery: You can’t manage what you don’t understand
    • Challenges:
      • How do I know what data is needed for test cases?
      • Lack of understanding of where data is located and how the data is related
      • Limited understanding of confidential data elements
      • Cost prohibitive to conduct manual analysis and hand coding
    • Result:
      • Lack of agility in testing
      • Poor data governance
      • Bad data = Bad business decisions
      • Inadvertent exposure of sensitive information
  • InfoSphere Discovery speeds understanding of data. The Discovery engine analyzes data values to automatically discover the columns that relate rows across data sources, and the columns which contain sensitive data.
    [Diagram: IBM InfoSphere Discovery matches a table of Member, SS#, Age, Phone, and Sex columns against an ID/Demo1 table across Table 1 through Table 25, reporting a 98% hit rate on the discovered key columns.]
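The idea of value-based discovery can be sketched as follows. This is a simplified illustration of the concept (overlapping column value sets propose join keys; pattern matching flags likely sensitive columns), not InfoSphere Discovery's actual algorithm:

```python
import re

SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def candidate_keys(cols_a, cols_b, threshold=0.9):
    """Yield (col_a, col_b, hit_rate) pairs whose distinct values overlap
    enough to suggest the columns relate rows across the two tables."""
    for name_a, vals_a in cols_a.items():
        for name_b, vals_b in cols_b.items():
            a, b = set(vals_a), set(vals_b)
            if a and b:
                hit_rate = len(a & b) / min(len(a), len(b))
                if hit_rate >= threshold:
                    yield name_a, name_b, hit_rate

def sensitive_columns(cols, sample_size=100):
    """Flag columns whose sampled values all look like SSNs."""
    return [name for name, vals in cols.items()
            if vals and all(SSN_RE.match(str(v)) for v in vals[:sample_size])]

table1 = {"Member": [595846226, 567472596], "SS#": ["123-45-6789", "138-27-1604"]}
table25 = {"ID": [595846226, 567472596, 540450091]}
print(list(candidate_keys(table1, table25)))  # Member <-> ID, hit rate 1.0
print(sensitive_columns(table1))              # ['SS#']
```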
  • InfoSphere Optim Data Growth Solution
    • Business Value:
    • Saves Production storage costs
    • Improves Production performance
    • Manage Archive Files through their lifecycle: retention policy compliance
    • Mitigates risks of removing data from Prod.
    [Diagram: universal access across archive tiers. Current data (1-2 years) stays in production; active/historical data (2-4 years) moves to compressed online archives managed by InfoSphere Optim; 4-6 year data sits in on/near-line archives on non-DBMS retention platforms (ATA file servers, EMC Centera™, DR550, etc.) with native access plus ODBC/JDBC, XML, SQL, Excel, and Access options; 6+ year data moves to off-line retention platforms (CD, tape, optical, WORM, IBM TSM, NetApp NearStore® SnapLock™, IBM Total Storage® solutions including the DR550, EMC Centera™). Archive and Restore operations move data between tiers.]
  • InfoSphere Optim Application Retirement
    • Preserve application data in its business context
    • Retire out-of-date packaged applications as well as legacy custom applications
    • Shut down legacy system without a replacement
    [Diagram: before retirement, each application has its own database and user population; after consolidation, an archive engine holds the archived data and users access it directly, without the original applications.]
  • Conclusion
    • Test Data Management allows development teams to accelerate testing activities on a project
    • Test Data Management exploits production data while ensuring security of confidential data
    • Providing testers and developers with access to test data can improve operational efficiency and optimize resources on a project
    • A comprehensive Test Data Management solution is needed to minimize cost and shorten development cycles
  • Learn more
    • Product Family Webpage
    • Solution Sheet: InfoSphere Test Data Management Solution brief
    • Whitepaper: Integrated Strategies to Improve Application Testing
    • Case Study: InfoSphere Test Data Management