Enterprise Data Management, or EDM, refers to the processes and technology an organization uses to manage mission-critical application data – in ERP, CRM, or even custom applications – throughout its lifecycle. EDM is a critical part of a company’s business strategy because of the value of the data that flows through these applications – data that represents the cumulative history of the business and provides an essential foundation for making new decisions on an ongoing basis. Princeton Softech’s clients apply an EDM approach to solve key business problems: managing data growth, data privacy, retention and discovery, application retirement, upgrades and migrations, and test data management.
Note to Presenter: Keep focused on the big benefit customers receive by Archiving. Archiving is an intelligent process for moving inactive or infrequently-accessed data while keeping the ability to retrieve the data. Ideally, archiving is done transparently, with no impact to users or applications, and with data remaining online for fast, easy retrieval when needed.
Here is how Optim works: This is a typical example of a Production environment prior to archiving. Both active and inactive data are stored in the Production environment, taking up most of the space on the production server. <Click> Optim safely moves the inactive or historical data to an archive, which can be stored in a variety of environments. <Click> Optim provides access to this data through several different methods, including report writers such as Cognos and Crystal Reports, XML, ODBC/JDBC, application-based access (Oracle, Siebel, etc.), or even MS Excel. <Click> Finally, data can be easily retrieved to an application environment when additional business processing is required. As you can see, Optim’s broad capabilities for enterprise data management help give CIOs a comprehensive solution for dealing with data growth.
The relational databases powering today’s enterprise applications typically rely on complex data models. This graphic depicts a data model for an enterprise application. It is difficult to define the desired set of tables and selection criteria from which to archive, to traverse the data model correctly, and to return the correct result. <Click> Retaining the referential integrity and business context of your data is essential for archive and restore processing. Anything less than 100% accuracy is unacceptable, especially for data retention compliance and satisfying audit requirements. <Click> You need to archive the entire business object – as in the case of a closed accounts payable transaction – so that the object is easily retrievable in the unlikely case of an audit or discovery request.
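As an illustration of what "archiving the complete business object" means in practice, the sketch below extracts a parent row together with all of its dependent rows by following the foreign-key relationship. The schema and table names are hypothetical; this is not Optim's actual implementation.

```python
import sqlite3

# Hypothetical schema: an accounts-payable transaction (invoice) with
# dependent line items. Archiving the "complete business object" means
# capturing the parent row AND every related child row together, so the
# object's referential integrity and business context survive the move.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, vendor TEXT, status TEXT);
    CREATE TABLE invoice_line (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER REFERENCES invoice(id),
        amount REAL
    );
    INSERT INTO invoice VALUES (1, 'Acme', 'CLOSED'), (2, 'Globex', 'OPEN');
    INSERT INTO invoice_line VALUES (10, 1, 99.5), (11, 1, 0.5), (12, 2, 42.0);
""")

def archive_business_object(conn, invoice_id):
    """Return the parent row plus all dependent rows as one unit,
    so the archived object can be restored or produced intact."""
    parent = conn.execute(
        "SELECT * FROM invoice WHERE id = ?", (invoice_id,)).fetchone()
    children = conn.execute(
        "SELECT * FROM invoice_line WHERE invoice_id = ?",
        (invoice_id,)).fetchall()
    return {"invoice": parent, "lines": children}

# Archive only the closed transaction, with its full business context.
obj = archive_business_object(conn, 1)
print(obj["invoice"])      # (1, 'Acme', 'CLOSED')
print(len(obj["lines"]))   # 2
```

A real enterprise data model would require traversing many such relationships recursively, which is exactly why the slide calls this difficult to do by hand.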
A recent Computerworld survey found that 42% of IT managers and staffers did not know whether any preparation had been done for the new e-discovery rules, and 32% said their company was not at all prepared. Databases have now become fair game in legal proceedings, and companies are expected to have the ability to retain, locate, and produce data upon request.
Production support is a very different environment because its various operators must have access to multiple layers of data to accomplish their tasks: adding functionality, ensuring data integrity, certifying product quality, and training users. Because they can access raw data that spans multiple “production roles,” the “sensitivity” of the data must be removed in order to protect personal privacy and enterprise trade secrets.
This brings us to the answer to the question: what is the easiest way to expose private data? The answer is the testing environment. Some of you may be scratching your heads – that is the reaction I see almost half the time I speak to people in person about this issue. Others may be nodding in agreement. But, if you are like me, you want to hear some backup for this assertion.

First, according to Richard Mogul, Senior Analyst with Gartner, in a report entitled “Danger Within – Protecting Your Company from Internal Security Attacks,” 70% of data breaches occur internally. That is a staggering number, considering that this statistic points to the loyalty of your own employees.

So we need to consider how it is so easy for employees to get identifiable data on your customers. That speaks to how test databases are created and used. A test environment may be created to test a new version of an ERP application, for example. Most often, companies simply clone the production database to create the test environment. By definition, this means personally identifying information about your clients is propagated from a secure production environment to a more vulnerable test environment. Application data needs to be tested outside the production environment so that, if any errors occur during testing, the live production system is not affected; this sometimes requires multiple testing environments.

You should now be asking yourself: do these developers need identifiable test data? The answer is no. What happens if an employee decides to take some of the Social Security numbers from this test data for his own purposes? Your company has just become the victim of a data breach. What if a developer placed a copy of that test database on his laptop, and then had that laptop stolen while he was at lunch at a fast-food restaurant? Your company, as well as its clients and even its employees, have just become victims of a data breach.
And how about that test database that was sent offshore for test activities but got into the wrong hands? Your company has just become the victim of a data breach. I think it is now perfectly clear why the test database has become the easiest way to expose private data. Finally, it is important to note that the PCI standard states that production data (real credit card numbers) cannot be used for testing or development. Failure to abide by this requirement can result in a $500,000 fine from Visa/MasterCard.
Strategies for Successful Data Governance
Eileen Killeen
IBM Software Group
No part of this presentation may be reproduced or transmitted in any form by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of IBM
IBM customers are responsible for ensuring their own compliance with legal requirements. It is the customer's sole responsibility to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer's business and any actions the customer may need to take to comply with such laws.
IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.
The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information provided, it is provided “as is” without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
What is Data Governance? (Strategic View)
Data Governance is the political process of changing organizational behaviour to enhance and protect data as a strategic enterprise asset. Implementing Data Governance can be a fundamental change to the methods and rigor both Business and Information Technology use to define, manage, and use data.
The core objectives of a governance program are:
Guide information management decision-making
Ensure information is consistently defined and well understood
Increase the use and trust of data as an enterprise asset
Improve consistency of projects across an enterprise
State sues global management consulting company over a stolen backup tape: the unencrypted tape contained personal information on 58 taxpayers and nearly 460 state bank accounts.
Over 45 million credit and debit card numbers stolen from a large retailer: estimated costs of $1 billion over five years (not including lawsuits), with $117 million in costs in 2Q ’07 alone.
What data should I be saving, for how long and for what reasons?
What data should I be deleting?
How am I going to find the data when I need it?
What do I do with the data when I no longer need it?
What is the most appropriate solution to meet my archiving needs?
What is the cost/benefit analysis to support an archiving solution acquisition?
What makes retaining/retrieving data stored in databases so difficult?
A singular piece of data is not enough ... you need to store and produce the complete business object.
How do you classify?
How long do you retain?
Can you delete? If so, when?
I need the records ... how do I retrieve them?
Who defines retention policies?
How does Archiving Work? [Diagram: the production application keeps current data with open access; historical data moves to archives (reporting data, reference data, historical data), which remain accessible via report writers, XML, and ODBC/JDBC, and can be retrieved back to the application environment when needed.]
Production level data is typically one of the most expensive storage platforms
Migrate and store data according to its evolving business value (ILM)
Use tiered storage strategies to your advantage to maximize cost efficiencies
Utilize the storage you already have (including tape!)
The Data Multiplier Effect: a 2 TB production database is typically cloned for backup, disaster recovery, development, test, and training – six copies of 2 TB each, or 12 TB in total. After archiving reduces each copy to 1 TB, the total falls to 6 TB: 6 TB reclaimed!
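The arithmetic behind the slide's figures can be checked with a quick sketch (sizes in TB; the environment names are the ones shown on the slide):

```python
# Each non-production clone mirrors the size of production, so every
# terabyte archived out of production is reclaimed once per copy.
environments = ["Production", "Backup", "Disaster Recovery",
                "Development", "Test", "Training"]

size_before_tb = 2  # each copy is 2 TB before archiving
size_after_tb = 1   # archiving halves each copy to 1 TB

total_before = size_before_tb * len(environments)  # 12 TB
total_after = size_after_tb * len(environments)    # 6 TB
reclaimed = total_before - total_after             # 6 TB reclaimed

print(total_before, total_after, reclaimed)  # 12 6 6
```

The same multiplier works in reverse, of course: every terabyte of growth in production is paid for once per clone.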
More MIPS/processors required to process large data repositories
Example: a 1 TB database that supports 500 concurrent users might require an eight-processor server with 4 GB of memory to achieve optimal performance. The same application running a database half that size might require only six processors and 2 GB of memory.
Noel Yuhanna, Forrester Research, Database Archiving Remains An Important Part Of Enterprise DBMS Strategy, 8/13/07
“Given the impact that regulatory compliance is having and the increased role electronic records play in corporate litigation cases, deleting records without ensuring future access or considering usage requirements puts organizations at considerable risk.”
Because you need the data…
Source: Enterprise Storage Group
Electronic discovery (also called e-discovery or ediscovery) refers to any process in which electronic data is sought, located, secured and searched with the intent of using it as evidence in a civil or criminal legal case.
In the process of electronic discovery, data of all types can serve as evidence. This can include text, images, calendar files, databases, spreadsheets, audio files, animation, Web sites and computer programs.
Lawyers ... Ya Gotta Love ’em – Legal Costs of E-Discovery:
Identify Appropriate Data: $200/hour
Preserve the Data: $100–$300/hour
Collect the Data: $200–$300/hour
Review the Data: $120–$350/hour
Produce the Data: $1,000–$2,100/hour
Source: Debra Logan, “Mapping Technology and Vendors to the Electronic Discovery Reference Model,” Gartner Research, ID Number G00153110, November 9, 2007.
The released “Personal Data” lifecycle – representative asking prices found recently on cybercrime forums (Source: USA TODAY research, 10/06):
PayPal account log-on and password: $7
Credit card number with security code and expiration date: $25
Social Security card: $100
Birth certificate: $150
Driver’s license: $150
Credit card number with PIN: $500
How personal data is released: Source, InformationWeek, 3/2006.
Enterprise cost per compromised record: $197 (Source: Ponemon Institute).
Replication of production safeguards not sufficient
Need “realistic” data to test accurately
Why are Production and Production-Support different?
“Now, here’s a dirty little secret … Unless you have formal consent from the people whose data you are using, then simply sampling the production system for test data is illegal: you are using the data for purposes for which it was not provided.” – Bloor Research, 14th March 2006
[Diagram: layers of protection around production data – Physical Security (server accessibility, theft of disk), Operational Security (access control, hacking, monitoring), Business Security (roles, processes, compliance) – contrasted with production-support roles (developer, QA, outsourcer, training) that should receive desensitized data rather than raw production data.]
AKA data masking, depersonalization, desensitization, obfuscation or data scrubbing
Technology that helps conceal real data
Scrambles data to create new, legible data
Retains the data's properties, such as its width, type, and format
Common data masking algorithms include random, substring, concatenation, date aging
Used in Non-Production environments as a Best Practice to protect sensitive data
Masking is transparent to the outside world. [Example: a credit card for cardholder SANFORD P. BRIGGS, number 4536 6382 9896 5200, GOOD THRU 12/09, is masked to EUGENE V. WHEATLEY, 4212 5454 6565 7780, GOOD THRU 12/09 – the cardholder and card number have been masked while the card’s format is preserved.]
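A minimal sketch of two of the masking techniques named above – random substitution that retains width and format, and date aging. This is illustrative only, not Optim's masking engine, and the helper names are our own.

```python
import random

def mask_card_number(card: str) -> str:
    """Replace the digits of a card number with random digits while
    retaining the original width, grouping, and format."""
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch
                   for ch in card)

def age_date(mm_yy: str, years: int) -> str:
    """Date aging: shift an MM/YY expiry date by a number of years."""
    mm, yy = mm_yy.split("/")
    return f"{mm}/{(int(yy) + years) % 100:02d}"

masked = mask_card_number("4536 6382 9896 5200")
print(masked)                 # same "#### #### #### ####" shape, new digits
print(age_date("12/09", 2))   # 12/11
```

A production-grade masking tool would also preserve semantic properties this sketch ignores, such as a valid Luhn check digit and a real issuer prefix, so that masked data still passes application validation.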