Although there is disagreement about the exact definition of data governance, the consequences of ineffective data governance are well known: a lack of control over one of your organization's most critical assets, its data, which ultimately leads to increased risk, cost inefficiencies, regulatory noncompliance, and potentially costly data breaches. One example is the data breach that occurred at the UK's Her Majesty's Revenue and Customs (HMRC) agency in October 2007. Two computer discs owned by HMRC containing data relating to child benefits went missing. The two discs contained the personal details of all families in the United Kingdom claiming child benefit, thought to be approximately 25 million people (nearly half of the country's population). The discs had been sent by junior staff as unrecorded internal mail. After the discs failed to arrive at their destination, and could not be found in an extensive search, HMRC announced the loss to the public on November 20, 2007, under that country's disclosure laws for lost data. The personal data on the missing discs included names, addresses, and dates of birth of children, together with the National Insurance numbers and bank details of many of their parents. Unfortunately, the HMRC breach is just one of many such occurrences around the world. In fact, it is estimated that over 245 million customer and employee records have been leaked since 2005 in the US alone [www.privacyrights.org]. This is only one example, but it clearly highlights: 1) how easily such a disaster can occur, even unintentionally; 2) the stark consequences of such mistakes; and 3) the importance of effective data governance. References: http://www.guardian.co.uk/politics/2007/nov/20/economy.personalfinancenews http://en.wikipedia.org/wiki/2007_UK_Child_Benefit_data_scandal
Optim Integrated Data Management represents an integrated approach to managing data across an organization. IDM is made up of products and capabilities from Princeton Softech’s Optim products, Data Studio, and the DB2 and IMS Tools portfolio. The focus is on data assets from IBM, Oracle, Microsoft, packaged applications, and more. Its goals:
- Manage data across its lifespan, from design to deletion
- Manage data across complex IT environments: multiple, interrelated databases, applications, and platforms
- Facilitate cross-functional collaboration: within IT, among line-of-business and compliance functions, and across disparate skill sets
- Optimize business value: respond quickly to emerging opportunities, improve quality of service, reduce cost of ownership, mitigate risk
Rational Data Architect is more than a data modeling tool. It is also a:
- Documentation tool: helps you create diagrams of existing database structures
- Information integration tool: helps define federation concepts
- XML mapping tool: maps database schemas to SOA structures
- Code development tool: creates valid DB2 SQL code (IBM Data Studio is the product that does all this outside of RDA)
- Traceability tool: know why, what, and when for every change
The new release features integrations with IBM Rational Software Architect, Eclipse 3.2, and IBM Information Server, plus additional mappings and expanded support for XML, DB2 V9, Sybase, Informix, and MySQL.
Manage static SQL deployment: this release builds out additional capabilities to enhance developer and DBA collaboration and manage static SQL execution.
- Empower developers or DBAs to customize captured SQL before binding: select which SQL statements are bound; delete SQL statements from packages; replace existing SQL with equivalent, and potentially more optimal, SQL without modifying the program source code
- Enable developers to give DBAs deployment-ready files for package binding
- Improve feedback on bind errors, including which SQL statements within the package caused the bind to fail
- Simplify bind file development (automate SQL file discovery within a single project for bindprops) and manage binds across jar, war, and ear files used for deployment
- Avoid unnecessary binds when redeploying a jar when only a subset of contained applications has changed
Integrated Data Management Vision and Roadmap Curt Cotner IBM Fellow Vice President and CTO for IBM Database Servers [email_address]
What do Businesses Have? A Collection of Disparate, Single-Purpose Products
- Design: CA ERwin, IBM InfoSphere Data Architect, Embarcadero ER/Studio, Sybase PowerDesigner
- Develop: Quest TOAD, IBM Data Studio Developer, Oracle JDeveloper, Embarcadero Rapid SQL
- Deploy: IBM Comparison Tool for DB2 z/OS, Embarcadero Change Manager, Data Studio Administrator, Oracle Change Management Pack
- Operate: IBM DB2 tools, BMC Patrol, Quest Central, Oracle Diagnostic Pack
- Optimize: Oracle Tuning Pack, Solix EDMS, IBM Optim Data Growth Solution, Quest Spotlight
- Govern: Quest InTrust, Guardium, IBM Optim, Oracle Vault
Average customer churn rate up 2.5% after a breach
Loss of revenue
$197 USD per customer record leaked
Average cost was ~ $6.3 million / breach in this study
Average cost for financial services organizations was 17% higher than average
Fines, penalties or inability to conduct business based on non-compliance
Data Breach Disclosure Laws
Source: “2007 Annual Study: Cost of a Data Breach” , The Ponemon Institute
Driven by the increasing number of physical systems, system management has become the main component of IT costs and is growing rapidly. Many servers, much capacity, low utilization = $140B in unutilized server assets.
What do Businesses Need? An integrated environment to span today’s flexible roles
Manage data throughout its lifecycle
From design to sunset
Manage data across complex IT environments
Multiple interrelated databases, applications and platforms
Deliver increasing value across the lifecycle, from requirements to retirement
Facilitate collaboration and efficiency across roles, via shared artifacts, automation, and consistent interfaces
Increase ability to meet service level agreements by improving problem isolation, performance optimization, capacity planning, and workload and impact analysis
Comply with data security, privacy, and retention policies by leveraging a shared policy, services, and reporting infrastructure
Lifecycle phases (Design, Develop, Deploy, Operate, Optimize, Govern) connected by shared Models, Policies, and Metadata
The broadest range of capabilities for managing the value of your data throughout its lifetime:
- InfoSphere Data Architect
- Data Studio Developer
- Optim Test Data Management
- Optim Data Growth Solutions
- Optim Data Privacy Solutions
- DB2 Performance Expert and Extended Insight Feature
- Data Studio pureQuery Runtime
- DB2 Audit Management Expert
- Database Encryption Expert
- Data Studio Administrator
- DB2 Optim Query Tuner (a.k.a. Optimization Expert)
(Spanning the Design, Develop, Deploy, Operate, Optimize, and Govern phases, connected by shared models, policies, and metadata.)
Our Design tool has been extended to include application context information about the customer’s data. For example:
semantic meaning (SSN, home phone number, medical privacy data, credit card number, PIN code, etc.)
masking algorithm that should be used to present the data in reports
For example, for a column CCN that contains credit card numbers:
- Data Architect specifies that column CCN contains a credit card number, along with the data masking algorithm to apply.
- Data Studio Administrator automatically checks that encryption is used for the table containing CCN, as required by PCI DSS rules, and creates fine-grained access control rules to prevent DBAs or other unauthorized people from viewing CCN values.
- Data Studio Developer prevents copying rows containing CCN values from PROD to TEST, per PCI DSS rules, unless the Optim product is used to anonymize the data.
- Data Architect emits runtime metadata for Optim so that it knows which columns to anonymize.
Design with Data Architect: discover, import, model, relate, standardize.
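The scenario above hinges on attaching semantic metadata (such as "credit card number") to columns at design time and letting downstream tools pick the right masking algorithm at runtime. As a minimal conceptual sketch of that idea in Python — not the Optim or Data Architect implementation, and with semantic type names and masking rules invented for the example:

```python
# Illustrative sketch: choose a masking algorithm from design-time
# column metadata. The semantic type names and masking rules are
# assumptions for this example, not the actual Optim metadata model.

def mask_ccn(value: str) -> str:
    """Keep only the last four digits of a credit card number."""
    digits = [c for c in value if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])

def mask_ssn(value: str) -> str:
    """Hide all but the last four digits of a US SSN."""
    return "***-**-" + value[-4:]

MASKERS = {
    "credit_card_number": mask_ccn,
    "ssn": mask_ssn,
}

def mask_row(row: dict, column_semantics: dict) -> dict:
    """Apply the masking algorithm declared for each column, if any;
    columns without semantic metadata pass through unchanged."""
    return {
        col: MASKERS.get(column_semantics.get(col), lambda v: v)(val)
        for col, val in row.items()
    }

semantics = {"CCN": "credit_card_number", "SSN": "ssn"}  # design-time metadata
row = {"NAME": "Ann", "CCN": "4111 1111 1111 1234", "SSN": "123-45-6789"}
masked = mask_row(row, semantics)
# CCN and SSN are masked down to their last four digits; NAME is untouched.
```

The point of the design is that the masking decision lives with the data model, not scattered through application code, which is what lets tools like report generators apply it consistently.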
InfoSphere Data Architect is a collaborative data design solution used to discover, model, relate, and standardize diverse data assets.
Create logical and physical data models
Discover, explore, and visualize the structure of data sources
Discover or identify relationships between disparate data sources
Compare and synchronize the structure of two data sources
Analyze and enforce compliance to enterprise standards
Support across heterogeneous databases
Integration with the Rational Software Delivery Platform, Optim, IBM Information Server, and IBM Industry Models
Data Governance
- Protect Privacy: de-identify data (Optim Data Privacy Solution), encrypt data (Database Encryption Expert)
- Secure Data: prevent and restrict access (Label Based Access Control, Trusted Context, Data Studio Developer and pureQuery Runtime)
- Audit Data: monitor and audit access, privileges, and users (DB2 Audit Management Expert, Tivoli Security Information and Event Manager)
- Retain Data: data archival, data retention, data retirement (Optim Data Growth Solution)
- Manage Lifecycle: model policies, integrate tools (InfoSphere Data Architect, Optim Test Data Management)
A utility for unloading data at very high speed (minimum wall-clock time). It can also extract individual tables from DB2 backups. While unloading, it can repartition the data for even faster, parallel reloading on a different system whose partitioning layout differs from the source system's.
What’s its value to customers?
Reduces costs by speeding up operations that require unloading large amounts of DB2 data.
Has been used in a number of disaster-recovery situations to extract individual tables from DB2 backups.
Speeds up the process of migrating a DB2 server to new hardware.
New features and functions:
System migration performed entirely by HPU: unloading and repartitioning the data, sending it across the network, and loading it with the DB2 LOAD command are all handled by HPU.
Today, you have to build complicated scripts for this process.
Improved autonomics. One memory tuning parameter instead of several. Tell HPU how much memory it can use, and HPU will figure out the best way to use it.
Simplified syntax: some keywords for specifying certain HPU functions are eliminated through the use of “templates” that define output file names.
Existing syntax also supported for backward compatibility
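For flavor, a control card for a single-table unload might look roughly like the following. This is a shape sketch only: the exact keywords, options, and template variables vary by HPU release and platform, and the specifics below are assumptions, not working syntax.

```
-- Illustrative only: unload one table to a delimited file.
GLOBAL CONNECT TO SAMPLE;
UNLOAD TABLESPACE
  SELECT * FROM MYSCHEMA.ORDERS;
  OUTPUT("/unload/&TABLE..del")   -- hypothetical template variable naming the file
  FORMAT DEL;
```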
Optimizing Your WebSphere Applications with Data Studio
What’s so Great About DB2 Accounting for CICS Apps?
[Diagram: a z/OS LPAR with CICS regions AOR1–AOR3 running transactions against DB2PROD (e.g., Txn1 executing Pgm1 and Pgm2, TxnA executing PgmX and PgmY); the accounting table reports CPU by application and PLAN, e.g., Txn1 = 2.1 under plan TN1PLN and TxnA = 8.3 under plan TNAPLN.]
DB2 Accounting for CICS apps allows you to study performance data from many perspectives:
By transaction (PLAN name)
By program (package level accounting)
By address space (AOR name)
By end user ID (CICS thread reuse)
This flexibility makes it very easy to isolate performance problems, perform capacity planning exercises, analyze program changes for performance regression, compare one user's resource usage to another's, and so on.
JDBC Performance Reporting and Problem Determination – Before pureQuery
[Diagram: applications A1–A6 in the application server (business logic, data access logic, persistence layer, EJB query language, DB2 Java driver) all reach DB2 or IDS through a single pooled userid, USER1; accounting shows only USER1 CPU under a generic JDBC package.]
What is visible to the DBA?
- IP address of the WAS app server
- Connection pooling userid for WAS
- The app is running JDBC or CLI
What is not known by the DBA?
- Which app is running?
- Which developer wrote the app?
- What other SQL does this app issue?
- When was the app last changed?
- How has CPU changed over time?
What’s so Great About Data Studio pureQuery Accounting for WebSphere Applications?
[Diagram: on the z/OS LPAR, CICS transaction TxnA (PLANA) running PgmX and PgmY is accounted per transaction, e.g., TxnA = 2.1 and TxnB = 8.3; a WebSphere application server on Unix or Windows gets the same per-transaction breakout by issuing Set Client App=TxnA for ClassX and ClassY.]
Data Studio and pureQuery provide the same granularity for reporting WebSphere’s DB2 resources that we have with CICS:
By transaction (Set Client Application name )
By class name (program - package level accounting)
By address space (IP address)
By end user ID (DB2 trusted context and DB2 Roles)
This flexibility makes it very easy to isolate performance problems, perform capacity planning exercises, analyze program changes for performance regression, compare one user's resource usage to another's, and so on.
Using pureQuery to Foster Collaboration and Produce Enterprise-ready Apps
[Diagram: application metadata and catalog data for SQL flow between the development system and production (DB2 or IDS), with a performance data warehouse shared by the application developer and the database administrator.]
- Quickly compare unit test performance results to production
- Use pureQuery app metadata as a way to communicate in terms familiar to both DBA and developer
Data Studio Developer -- pureQuery Outline Speed up problem isolation for developers – even when using frameworks
Capture application-SQL-data object correlation (with or without the source code)
Trace SQL statements to the code that issues them for faster problem isolation
Enhance impact analysis by identifying application code affected by database changes
Answer “Where used” questions like “Where is this column used within the application?”
Use with modern Java frameworks e.g. Hibernate, Spring, iBatis, OpenJPA
Java Persistence Technologies with pureQuery
[Diagram: a stack of Java persistence options (JPA API, pureQuery API, Spring, iBatis, JDBC, SQLJ) running over the JPA and pureQuery runtimes and a high-speed API to an IBM database, with pureQuery contributing metadata and manageability across the stack.]
Client Optimization Improve Java data access performance for DB2 – without changing a line of code
Captures SQL for Java applications
Custom-developed, framework-based, or packaged applications
Bind the SQL for static execution without changing a line of code
New bind tooling included
Delivers static SQL execution value to existing DB2 applications
Making response time predictable and stable by locking in the SQL access path pre-execution, rather than re-computing at access time
Limiting user access to tables by granting execute privileges on the query packages rather than access privileges on the table
Aiding forecasting accuracy and capacity planning by capturing additional workload information based on package statistics
Drive down CPU cycles to increase overall capability
Choose between dynamic or static execution at deployment time, rather than development time
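The capture-then-bind workflow above can be imitated in miniature: run the application once while recording every SQL statement it issues, then switch to a mode that permits only the recorded statements, analogous to granting execute privileges on bound packages instead of table access. This is a conceptual Python/sqlite3 sketch of the workflow, not the pureQuery implementation; all class and variable names are invented for the example.

```python
import sqlite3

class CapturingConnection:
    """Phase 1: record every SQL statement the application issues
    (a stand-in for pureQuery's SQL capture step)."""
    def __init__(self, conn):
        self.conn = conn
        self.captured = set()
    def execute(self, sql, params=()):
        self.captured.add(sql)
        return self.conn.execute(sql, params)

class StaticConnection:
    """Phase 2: allow only previously captured statements
    (a stand-in for running bound, static SQL)."""
    def __init__(self, conn, captured):
        self.conn = conn
        self.allowed = frozenset(captured)
    def execute(self, sql, params=()):
        if sql not in self.allowed:
            raise PermissionError("statement was not captured/bound: " + sql)
        return self.conn.execute(sql, params)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE emp (id INTEGER, name TEXT)")

cap = CapturingConnection(db)
cap.execute("INSERT INTO emp VALUES (?, ?)", (1, "Ann"))   # captured
cap.execute("SELECT name FROM emp WHERE id = ?", (1,))     # captured

static = StaticConnection(db, cap.captured)
static.execute("SELECT name FROM emp WHERE id = ?", (2,))  # allowed: captured shape
# static.execute("DELETE FROM emp") would raise PermissionError:
# only the captured statement set can run, mirroring execute-only
# privileges on query packages.
```

The deployment-time choice in the last bullet maps to choosing which wrapper the application is handed, with no change to the application code itself.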
Toughest issue for Web applications – Problem diagnosis and resolution
[Diagram: the request path from Web browser users through the Web server and application server (business logic, data access logic, persistence layer, EJB query language, DB2 Java driver) to the JDBC package on the DB2 server.]
Customer Job Roles – A Barrier to a “Holistic View”
[Diagram: the same request path, with the WebSphere connection pool between the application server and the DB server, annotated with the roles responsible for each layer (1–5): application developer, system programmer, DBA, and network admin.]
Scenario: It seems that the first application server has a problem. Double-click to drill down. In this situation, all applications are equally affected, and the problem does not appear to be in the data server.
Scenario, continued: Double-click to drill down and display detailed information. Most of the time is spent in “WAS connection pool wait” time.
Scenario, continued: A 5-second wait time indicates that the maximum number of allowed connections is not sufficient, which also becomes evident when comparing this client's parameters and metrics with those of other clients.
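The diagnosis in this scenario, wait time caused by an undersized connection pool, can be reproduced in a toy model: each client must acquire one of N pool slots before doing its "database work," and once concurrent clients outnumber slots, queueing time appears. A hypothetical Python sketch (not WebSphere's pool implementation; all names are invented for the example):

```python
import threading
import time

def run_workload(pool_size: int, clients: int, hold_time: float = 0.05) -> float:
    """Toy model of an app-server connection pool: each client must
    acquire one of `pool_size` slots before doing `hold_time` seconds
    of simulated database work. Returns the longest wait observed."""
    pool = threading.Semaphore(pool_size)
    waits = []
    lock = threading.Lock()

    def client():
        t0 = time.perf_counter()
        with pool:  # blocks here while all connections are checked out
            waited = time.perf_counter() - t0
            with lock:
                waits.append(waited)
            time.sleep(hold_time)  # pretend to use the connection

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(waits)

# 10 concurrent clients against a 2-slot pool: the last clients queue
# for several multiples of hold_time; with a 10-slot pool the wait all
# but disappears. Raising the pool maximum is the fix the scenario
# points to.
```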
Future enhancements to Data Studio and pureQuery
DB2 Performance Expert futures -- Associate SQL with Java Source
[Mockup: a performance dashboard with a heat chart, alerts, SLAs, and in-flight analysis; a “TOP 3 currently running SQL statements” view listing statement text with schema, end-to-end elapsed time, occurrences, sort time, and physical I/O; and a statement detail pane that ties the SQL back to its Java source (application name and contact, package, class, method, source line) alongside client identification (user ID, client IP address/hostname, workstation name, application name), resource usage, query cost estimates, buffer pool hit ratios, and statement elapsed-time history.]
OpenJPA and Hibernate -- SQL Query Generation
[Diagram: a JPA query selecting emp and dept objects is transformed by the JPA runtime into multiple SQL statements, e.g., SELECT * FROM EMP WHERE … and SELECT * FROM DEPT WHERE ….]
Hibernate and OpenJPA often rewrite queries
No database statistics are used – entirely heuristic!!!
Can often result in poorly performing queries
pureQuery -- More Visibility, Productivity, and Control of Application SQL
Share, review, and optimize SQL
Revise/optimize SQL and validate equivalency without changing the application
Bind for static execution to lock in service level or run dynamically
Restrict SQL to eliminate SQL injection
Workflow: Capture → Review → Optimize → Revise → Restrict
- Visualize execution metrics
- Execute, tune, share, trace, and explore SQL
- Replace SQL without changing the application
- Position in Database Explorer
- Visualize application SQL
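One reason restricting applications to known, parameterized statements eliminates SQL injection: with parameter markers, user input is bound as data and is never parsed as SQL. A small self-contained demonstration using Python's sqlite3 module, illustrative of the principle rather than of pureQuery itself:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (user TEXT, balance REAL)")
db.executemany("INSERT INTO accounts VALUES (?, ?)",
               [("alice", 100.0), ("bob", 50.0)])

malicious = "alice' OR '1'='1"   # classic injection payload

# Unsafe: string concatenation lets the payload rewrite the WHERE
# clause, so the query returns every row in the table.
unsafe = db.execute(
    "SELECT user FROM accounts WHERE user = '" + malicious + "'"
).fetchall()

# Safe: a parameter marker binds the payload as a plain string value,
# which matches no account name.
safe = db.execute(
    "SELECT user FROM accounts WHERE user = ?", (malicious,)
).fetchall()

print(unsafe)  # [('alice',), ('bob',)] -- every account leaked
print(safe)    # [] -- payload treated as literal data
```

Locking an application down to a reviewed set of parameterized statements closes off the concatenation path entirely, which is the "Restrict" step in the workflow above.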
OpenJPA, Hibernate, iBatis -- Batch Queries
[Diagram: a JPA query creating interleaved dept and emp objects (new dept_obj, new emp_obj, …) is rewritten into batched SQL INSERTs into DEPT and EMP.]
OpenJPA, Hibernate, and iBatis “batch” queries to reduce network traffic
Batches must contain executions of a single prepared statement
Referential integrity constraints can change batch size:
2 network trips without RI (one for EMP, one for DEPT)
4 network trips if RI disables batching
pureQuery can convert the above example to a single network trip, regardless of whether RI is used or not…
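The batching constraint above, one prepared statement per batch, plus the RI ordering requirement, can be illustrated with a conceptual Python/sqlite3 sketch: interleaved parent/child inserts are regrouped into per-statement batches with parents first. This is not the pureQuery heterogeneous-batching implementation, merely the grouping idea; table and variable names are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("""CREATE TABLE emp (id INTEGER PRIMARY KEY,
              dept_id INTEGER REFERENCES dept(id))""")

# The application produces interleaved parent/child inserts, as an
# ORM flushing new objects would: dept, emp, dept, emp, ...
work = [
    ("dept", (1, "Sales")), ("emp", (10, 1)),
    ("dept", (2, "Dev")),   ("emp", (20, 2)),
]

# Group into one batch per statement, keeping parents (dept) ahead of
# children (emp) so the RI constraint is satisfied; each executemany()
# call stands in for one batched network trip.
depts = [row for tbl, row in work if tbl == "dept"]
emps  = [row for tbl, row in work if tbl == "emp"]
db.executemany("INSERT INTO dept VALUES (?, ?)", depts)  # batch 1: parents
db.executemany("INSERT INTO emp VALUES (?, ?)", emps)    # batch 2: children

# Two batches instead of four single-row statements; pureQuery's
# heterogeneous batching takes the same idea down to a single trip.
```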
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
IBM, the IBM logo, ibm.com, DB2, and WebSphere are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.