Why innovation matters for IBM


Slide notes
  • Although there is disagreement about the exact definition of data governance, the consequences of ineffective data governance are well known: loss of control over one of your organization's most critical assets, its data, which ultimately leads to increased risk, cost inefficiencies, regulatory noncompliance, and potentially costly data breaches. One example is the breach at Her Majesty's Revenue and Customs (HMRC) in the UK in October 2007. Two computer discs owned by HMRC, containing data relating to child benefits, went missing. The discs held the personal details of every family in the United Kingdom claiming child benefit, thought to be approximately 25 million people (nearly half of the country's population), and had been sent by junior staff as unrecorded internal mail. After the discs failed to arrive at their destination, and could not be found after an extensive search, HMRC announced the loss to the public on November 20, 2007, as required by the country's disclosure laws for lost data. The personal data on the missing discs included the names, addresses, and dates of birth of children, together with the National Insurance numbers and bank details of many of their parents. Unfortunately, the HMRC breach is only one of many such incidents worldwide; it is estimated that over 245 million customer and employee records have been leaked since 2005 in the US alone. This one example clearly highlights: 1) how easily such a disaster can occur, even unintentionally; 2) the stark consequences of such mistakes; and 3) the importance of effective data governance.
  • Optim Integrated Data Management represents an integrated approach to managing data across an organization. IDM is made up of products and capabilities from Princeton Softech's Optim products, Data Studio, and the DB2 and IMS Tools portfolio. The focus is on data assets from IBM, Oracle, Microsoft, packaged applications, and more: manage data across its lifespan (from design to deletion); manage data across complex IT environments (multiple, interrelated databases, applications, and platforms); facilitate cross-functional collaboration (within IT, among line-of-business and compliance functions, and across disparate skill sets); and optimize business value (respond quickly to emerging opportunities, improve quality of service, reduce cost of ownership, mitigate risk).
  • Rational Data Architect is more than a data modeling tool. It is also a documentation tool, helping you create diagrams of existing database structures; an information integration tool, helping to define federation concepts; an XML mapping tool, mapping database schemas to SOA structures; a code development tool, creating valid DB2 SQL code (IBM Data Studio is the product that does all this outside of RDA); and a traceability tool, recording the why, what, and when of every change. The new release features integrations with IBM Rational Software Architect, Eclipse 3.2, and IBM Information Server; additional mappings; and expanded support for XML, DB2 V9, Sybase, Informix, and MySQL.
  • From:
  • Top-level overview
  • Manage static SQL deployment: this release builds out additional capabilities to enhance developer and DBA collaboration and manage static SQL execution. Empower developers or DBAs to customize captured SQL before binding: select which SQL statements are bound; delete SQL statements from packages; replace existing SQL with an equivalent, and potentially more optimal, SQL without modifying the program source code. Enable developers to give DBAs deployment-ready files for package binding. Improve feedback on bind errors, including which SQL statements within the package caused the bind to fail. Simplify bind file development (automate SQL file discovery within a single project for bindprops) and manage binds across jar, war, and ear files used for deployment. Avoid unnecessary binds when redeploying a jar when only a subset of contained applications has changed.
  • 4-TuneSQL10 The same as that on the picture
  • Transcript

    • 1. Integrated Data Management Vision and Roadmap Curt Cotner IBM Fellow Vice President and CTO for IBM Database Servers [email_address]
    • 2. What do Businesses Have? A Collection of Disparate, Single-Purpose Products
      • Design: CA ERwin, IBM InfoSphere Data Architect, Embarcadero ER/Studio, Sybase PowerDesigner
      • Develop: Quest TOAD, IBM Data Studio Developer, Oracle JDeveloper, Embarcadero Rapid SQL
      • Deploy: IBM Comparison Tool for DB2 z/OS, Embarcadero Change Manager, Data Studio Administrator, Oracle Change Management Pack
      • Operate: IBM DB2 tools, BMC Patrol, Quest Central, Oracle Diagnostic Pack
      • Optimize: Oracle Tuning Pack, Solix EDMS, IBM Optim Data Growth Solution, Quest Spotlight
      • Govern: Quest InTrust, Guardium, IBM Optim, Oracle Vault
    • 3. The gaps create risk …
      • Loss of customers
        • Average customer churn rate up 2.5% after a breach
      • Loss of revenue
        • $197 USD per customer record leaked
        • Average cost was ~ $6.3 million / breach in this study
        • Average cost for financial services organizations was 17% higher than average
      • Fines, penalties or inability to conduct business based on non-compliance
        • PCI
        • Sarbanes-Oxley (SOX)
        • HIPAA
        • Data Breach Disclosure Laws
        • Gramm-Leach-Bliley Act
        • Basel II
      Source: “2007 Annual Study: Cost of a Data Breach” , The Ponemon Institute
    • 4. Driven by the increasing number of physical systems, system management has become the main component of IT costs and is growing rapidly. Many servers, much capacity, low utilization = $140B in unutilized server assets.
    • 5. What do Businesses Need? An integrated environment to span today’s flexible roles
      • Manage data throughout its lifecycle
        • From design to sunset
      • Manage data across complex IT environments
        • Multiple interrelated databases, applications and platforms
      • Facilitate cross-functional collaboration
        • Within IT
        • Among Line of Business, Compliance functions
        • Across disparate skill sets
      • Optimize business value
        • Respond quickly to emerging opportunities
        • Improve quality of service
        • Reduce cost of ownership
        • Mitigate risk
    • 6. Introducing Integrated Data Management
      • Enabling organizations to more efficiently and effectively
      • Respond to emergent, data-intensive business opportunities
      • Meet service level agreements for data-driven applications
      • Comply with data privacy and data retention regulations
      • Grow the business while driving down total cost of ownership
      An integrated, modular environment to design, develop, deploy, operate, optimize and govern enterprise data throughout its lifecycle on the System z platform
    • 7. Integrated Data Management
      • Deliver increasing value across the lifecycle , from requirements to retirement
      • Facilitate collaboration and efficiency across roles, via shared artifacts, automation, and consistent interfaces
      • Increase ability to meet service level agreements, improving problem isolation, performance optimization, capacity planning, and workload and impact analysis
      • Comply with data security, privacy, and retention policies leveraging shared policy, services, and reporting infrastructure
      (Diagram: Develop, Design, Deploy, Optimize, Operate, and Govern arranged around shared Models, Policies, and Metadata.)
    • 8. The broadest range of capabilities for managing the value of your data throughout its lifetime: InfoSphere Data Architect, Data Studio Developer, Optim Test Data Management, Optim Data Growth Solutions, Optim Data Privacy Solutions, DB2 Performance Expert and Extended Insight Feature, Data Studio pureQuery Runtime, DB2 Audit Management Expert, Database Encryption Expert, Data Studio Administrator, DB2 Optim Query Tuner (a.k.a. Optimization Expert)
    • 9. Model-driven Governance – Automating Governance Policies
      • Our Design tool has been extended to include application context information about the customer’s data. For example:
        • semantic meaning (SSN, home phone number, medical privacy data, credit card number, PIN code, etc.)
        • masking algorithm that should be used to present the data in reports
      Data Architect specifies that column CCN contains a credit card number, along with the data masking algorithm, and emits runtime metadata for Optim so that it knows which columns to anonymize. Data Studio Administrator automatically checks that encryption is used for the table containing CCN, as required by PCI DSS rules, and creates fine-grained access control rules to prevent DBAs or other unauthorized people from viewing CCN values. Data Studio Developer prevents copying rows containing CCN column values from PROD to TEST under PCI DSS rules, unless the Optim product is used to anonymize the data.
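The masking idea above can be sketched in a few lines. This is an illustrative sketch only, assuming a simple policy map from semantic tag to masking routine; the function and table names are invented for the example and are not part of any IBM product API.

```python
# Hypothetical sketch: a masking routine of the kind a governance policy
# might attach to a column tagged "credit card number".

def mask_ccn(value: str) -> str:
    """Mask all but the last four digits of a card number."""
    digits = [c for c in value if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])

# A policy map from semantic tag to masking routine (invented for the sketch)
POLICIES = {"credit_card_number": mask_ccn}

def apply_policy(tag, value):
    # Unknown tags pass through unmasked
    return POLICIES.get(tag, lambda v: v)(value)

print(apply_policy("credit_card_number", "4111-1111-1111-1111"))
# -> ************1111
```

In a model-driven setup, the tag would come from the design model rather than being hard-coded at the call site.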
    • 10. InfoSphere Data Architect
      • InfoSphere Data Architect is a collaborative, data design solution to discover, model, relate, and standardize diverse data assets.
      • Key Features
      • Create logical and physical data models
      • Discover, explore, and visualize the structure of data sources
      • Discover or identify relationships between disparate data sources
      • Compare and synchronize the structure of two data sources
      • Analyze and enforce compliance to enterprise standards
      • Support across heterogeneous databases
      • Integration with the Rational Software Delivery Platform, Optim, IBM Information Server, and IBM Industry Models
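As a rough illustration of what "compare and synchronize the structure of two data sources" involves, here is a minimal schema diff over two SQLite databases. This is a sketch of the concept, not the Data Architect implementation; SQLite stands in for the heterogeneous data sources.

```python
import sqlite3

def schema(conn):
    """Map each table name to its set of column names."""
    out = {}
    for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"):
        out[name] = {row[1] for row in conn.execute(f"PRAGMA table_info({name})")}
    return out

def diff(a, b):
    """Report tables/columns present in source `a` but not target `b` (and vice versa)."""
    sa, sb = schema(a), schema(b)
    report = []
    for t in sorted(set(sa) | set(sb)):
        if t not in sb:
            report.append(f"table {t} missing in target")
        elif t not in sa:
            report.append(f"table {t} missing in source")
        else:
            for c in sorted(sa[t] - sb[t]):
                report.append(f"column {t}.{c} missing in target")
    return report

src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
src.execute("CREATE TABLE emp (id INTEGER, name TEXT, dept INTEGER)")
tgt.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
print(diff(src, tgt))   # ['column emp.dept missing in target']
```

A synchronize step would turn each report line into the corresponding ALTER/CREATE DDL.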
    • 11. Data Governance: protect privacy (de-identify data, encrypt data); secure data (prevent, restrict, and monitor access); audit (data, access, privileges, users); retain data (data archival, data retention, data retirement); manage the lifecycle (model policies, integrate tools). Supporting products: Optim Data Privacy Solution, Database Encryption Expert, Label Based Access Control, Trusted Context, Data Studio Developer and pureQuery Runtime, DB2 Audit Management Expert, Tivoli Security Information and Event Manager, Optim Data Growth Solution, InfoSphere Data Architect, Optim Test Data Management
    • 12. Data Studio Administrator
      • GA July 2008 for DB2 LUW servers
        • Compare, Sync and Alter
        • DDL roundtrip support
        • Extended Alter
        • Impact Analysis
        • Change model
        • Physical modeling
        • Unified Change Project
        • Advanced Data Movement (HPU)
        • Scheduling & Enhanced Advanced Deployment
    • 13. High Performance Unload
      • What is it?
        • A utility for unloading data at very high speed (minimum wall-clock time). It can also extract individual tables from DB2 backups. While unloading, it can repartition the data for even faster, parallel reloading on a different system whose partitioning layout differs from the source system's.
      • What’s its value to customers?
        • Reduces costs by speeding up operations that require unloading large amounts of DB2 data.
        • Has been used in a number of disaster recovery situations, extracting individual tables from DB2 backups.
        • Speeds up the process of migrating a DB2 server to new hardware.
      • New features and functions:
        • System migration performed entirely by HPU: the unloading and repartitioning of the data, sending it across the network, and loading it with the DB2 LOAD command are all handled by HPU.
          • Today, you have to build complicated scripts to do this process
        • Improved autonomics: one memory tuning parameter instead of several. Tell HPU how much memory it can use, and it will figure out the best way to use it.
        • Simplified syntax: some keywords for specifying certain HPU functions are eliminated through the use of “templates” that define the output file names.
          • Existing syntax also supported for backward compatibility
    • 14. Optimizing Your WebSphere Applications with Data Studio
    • 15. What’s so Great About DB2 Accounting for CICS Apps? (Diagram: a z/OS LPAR with three CICS AORs running transactions Txn1 and TxnA against DB2PROD, with per-application CPU and PLAN accounting: Txn1 at 2.1 CPU under plan TN1PLN, TxnA at 8.3 CPU under plan TNAPLN.)
      • DB2 Accounting for CICS apps allows you to study performance data from many perspectives:
      • By transaction (PLAN name)
      • By program (package level accounting)
      • By address space (AOR name)
      • By end user ID (CICS thread reuse)
      This flexibility makes it very easy to isolate performance problems, perform capacity planning exercises, analyze program changes for performance regression, compare one user’s resource usage to another’s, etc.
    • 16. JDBC Performance Reporting and Problem Determination – Before pureQuery (Diagram: applications A1–A6 in an application server all reach DB2 or IDS as USER1 through the generic JDBC package.) What is visible to the DBA? The IP address of the WAS application server, the connection-pooling userid for WAS, and the fact that the app is running JDBC or CLI. What is not known by the DBA? Which app is running, which developer wrote it, what other SQL the app issues, when the app was last changed, how its CPU has changed over time, and so on.
    • 17. What’s so Great About Data Studio pureQuery Accounting for WebSphere Applications? (Diagram: a z/OS LPAR with CICS AOR2 running TxnA (PLANA) over PgmX and PgmY, with per-application CPU accounting: TxnA 2.1, TxnB 8.3.)
      • Data Studio and pureQuery provide the same granularity for reporting WebSphere’s DB2 resources that we have with CICS:
      • By transaction (Set Client Application name )
      • By class name (program - package level accounting)
      • By address space (IP address)
      • By end user ID (DB2 trusted context and DB2 Roles)
      This flexibility makes it very easy to isolate performance problems, perform capacity planning exercises, analyze program changes for performance regression, compare one user’s resource usage to another’s, etc.
      (Diagram: on Unix or Windows, WAS runs TxnA (Set Client App=TxnA) over ClassX and ClassY, mirroring the CICS structure.)
    • 18. Simplifying Problem Determination Scenario
      • Application Developer
      • Available for each db access
        • SQL text generated
        • Access path
        • Cost estimates
        • Estimated response time
        • Elapsed & CPU time
        • Data transfer (getpages)
        • Tuning advice
      • Database Administrator
      • Available for each SQL
        • Application name
        • Java class name
        • Java method name
        • Java object name
        • Source code line number
        • Source code context
        • J-LinQ transaction name
        • Last compile timestamp
      Java Profiling, pureQuery, DRDA Extensions
    • 19. Using pureQuery to Foster Collaboration and Produce Enterprise-ready Apps (Diagram: a development system and a production system, each running applications A1–A6 against DB2 or IDS, sharing pureQuery application metadata and a performance data warehouse between the application developer and the database administrator.) Quickly compare unit-test performance results to production. Use pureQuery application metadata as a way to communicate in terms familiar to both DBA and developer.
    • 20. Data Studio Developer – pureQuery Outline: speed up problem isolation for developers, even when using frameworks
      • Capture application-SQL-data object correlation (with or without the source code)
      • Trace SQL statements to the code that uses them for faster problem isolation
      • Enhance impact analysis by identifying application code impacted by database changes
      • Answer “Where used” questions like “Where is this column used within the application?”
      • Use with modern Java frameworks e.g. Hibernate, Spring, iBatis, OpenJPA
    • 21. Java Persistence Technologies with pureQuery (Diagram: the JPA API and the pureQuery API layered over the JPA Runtime, pureQuery Runtime, and JDBC with pureQuery against an IBM database; pureQuery adds metadata and manageability to Spring, iBatis, JDBC, SQLJ, and its high-speed API.)
    • 22. Client Optimization Improve Java data access performance for DB2 – without changing a line of code
      • Captures SQL for Java applications
        • Custom-developed, framework-based, or packaged applications
      • Bind the SQL for static execution without changing a line of code
        • New bind tooling included
      • Delivers static SQL execution value to existing DB2 applications
        • Making response time predictable and stable by locking in the SQL access path pre-execution, rather than re-computing at access time
        • Limiting user access to tables by granting execute privileges on the query packages rather than access privileges on the table
        • Aiding forecasting accuracy and capacity planning by capturing additional workload information based on package statistics
        • Drive down CPU cycles to increase overall capability
      • Choose between dynamic or static execution at deployment time, rather than development time
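The "capture" step described above can be illustrated with a toy wrapper that records every SQL statement together with the code location that issued it. This is purely a sketch of the idea, with invented names; pureQuery's actual capture happens inside the JDBC driver and produces bindable packages, which this Python stand-in does not attempt.

```python
import sqlite3
import traceback

# Hypothetical capture log: (sql, "caller:line") pairs
captured = []

class CapturingConnection:
    """Wraps a DB connection and records every statement plus its call site."""
    def __init__(self, conn):
        self._conn = conn
    def execute(self, sql, params=()):
        caller = traceback.extract_stack()[-2]   # the frame that issued the SQL
        captured.append((sql, f"{caller.name}:{caller.lineno}"))
        return self._conn.execute(sql, params)

conn = CapturingConnection(sqlite3.connect(":memory:"))
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (?)", (1,))
print([sql for sql, _ in captured])
```

A later "bind" step would take the captured statement texts and lock them in as static packages.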
    • 23. Data Studio pureQuery Runtime for z/OS
      • In-house testing shows double-digit reduction in CPU costs over dynamic JDBC
      • IRWW – an OLTP workload, Type 4 driver
      • Cache hit ratio between 70 and 85%
      • 15% - 25% reduction on CPU per txn over dynamic JDBC
    • 24. Have You Heard of SQL Injection?
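A minimal demonstration of the problem the slide title alludes to, with SQLite standing in for the database: concatenating user input into the statement text lets an attacker rewrite the query, while a parameter marker (or a statically bound statement, as pureQuery provides) keeps the input a value rather than SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
conn.execute("INSERT INTO users VALUES ('bob', 'hunter2')")

evil = "x' OR '1'='1"   # attacker-supplied "name"

# Vulnerable: the input becomes part of the statement text
rows = conn.execute(
    "SELECT secret FROM users WHERE name = '" + evil + "'").fetchall()
print(len(rows))   # 2 -- every row leaks

# Safe: the input stays a bound value, never statement text
rows = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (evil,)).fetchall()
print(len(rows))   # 0
```

Restricting a user to pre-bound packages, as slide 34 describes, removes even the ability to submit new statement text.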
    • 25. Toughest issue for Web applications – problem diagnosis and resolution (Diagram: Web browser users reach a Web server, then an application server containing business logic, data access logic, a persistence layer with EJB Query Language, and the DB2 Java driver with its JDBC package, and finally the DB2 server.)
    • 26. Customer Job Roles – A Barrier to a “Holistic View” (Diagram: the same stack, with each layer owned by a different role: application developer, system programmer, DBA, and network admin.)
    • 27. How do we plan to help?
    • 28. Scenario It seems that the first application server has a problem. Double-click to drill-down. In this situation, all applications are equally affected, and the problem seems not to be in the data server.
    • 29. Scenario - continued Double-click to drill-down and display detail information. Most of the time is spent in “WAS connection pool wait” time.
    • 30. Scenario – continued A 5-second wait time indicates that the maximum number of allowed connections is not sufficient… …which also becomes evident when comparing the parameters and metrics of this client with other clients.
    • 31. Future enhancements to Data Studio and pureQuery
    • 32. DB2 Performance Expert futures – Associate SQL with Java Source (Screenshot: a heat-chart dashboard with alerts, SLAs, and in-flight analysis; a “top SQL” accounting view ranking statements by elapsed time, CPU time, physical I/O, and sort time; and a statement detail pane linking each SQL statement to its client user ID, IP address, workstation, application name and contact, Java package, class, method, and source line, together with query cost estimates, buffer pool hit ratios, and elapsed-time history.)
    • 33. OpenJPA and Hibernate – SQL Query Generation (Diagram: a JPA query selecting employee and department objects is transformed into separate SQL statements, “SELECT * FROM EMP WHERE …” and “SELECT * FROM DEPT WHERE …”.)
      • Hibernate and OpenJPA often rewrite queries
      • No database statistics are used – the rewrite is entirely heuristic
      • Can often result in poorly performing queries
    • 34. pureQuery -- More Visibility, Productivity, and Control of Application SQL
      • Capture SQL
      • Share, review, and optimize SQL
      • Revise/optimize SQL and validate equivalency without changing the application
      • Bind for static execution to lock in service level or run dynamically
      • Restrict SQL to eliminate SQL injection
      Capture Review Optimize Revise Restrict
    • 35. (Screenshot captions: visualize execution metrics; execute, tune, share, trace, and explore SQL; replace SQL without changing the application; position in Database Explorer; visualize application SQL.)
    • 36. OpenJPA, Hibernate, iBatis – Batch Queries (Diagram: a JPA query creating interleaved dept and emp objects is rewritten into alternating SQL statements: INSERT INTO DEPT …, INSERT INTO EMP …, repeated.)
      • OpenJPA, Hibernate, and iBatis “batch” queries to reduce network traffic
      • Batches must contain executions of a single prepared statement
      • Referential integrity constraints can change batch size:
        • 2 network trips without RI (one for EMP, one for DEPT)
        • 4 network trips if RI disables batching
      • pureQuery can convert the above example to a single network trip, regardless of whether RI is used or not…
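The batching constraint described above can be seen in miniature, with SQLite standing in for DB2: a batch covers executions of a single prepared statement, so interleaved parent and child inserts must be split into per-statement batches, with parents first so referential integrity holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enforce RI, as in the slide's example
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY,"
             " dept INTEGER REFERENCES dept(id))")

# One batch per prepared statement, parents first so the FK constraint holds:
conn.executemany("INSERT INTO dept VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(10, 1), (11, 2)])

print(conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0])   # 2
```

Collapsing both batches into one network trip, as pureQuery does, requires driver-level support rather than anything visible at this API level.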
    • 37. Support for Oracle in Data Studio Developer
    • 38. Oracle Object Management support in Data Studio
      • Create/Alter/Drop for all objects
      • Tables, Synonyms, Sequences
      • Functions
      • Views/materialized views
      Physical objects Performance objects Integrity objects Procedures and Functions User Defined Types Events Space
      • Create/Alter/Drop
      • Tablespace, Extents, Free Lists, logging
      • LOB Attributes
      • Buffer pools
      • Create/Alter/Drop
      • Partitions (Range, Hash, List)
      • Indexes
      • Create/Alter/Drop
      • Constraints (primary, unique, check, foreign)
      • Triggers
      • Before/After/Foreach types
      • Trigger events
      • Create/Alter/Drop
      • Table Types
      • Object Types
      • Array Types
      Strengthening Oracle Support
    • 39. Design Lifecycle
      • Logical Modeling
        • Capture business requirements
        • Represent an organization’s Data
        • Abstract complex heterogeneous environments
        • Often associated with a Domain Model
          • Dedicated vocabulary
      • Physical Modeling
        • Platform specific implementation
          • Tables, Constraints, Data Types
          • Disk and Security requirements
          • Caching and Fast access strategies
        • Leverage and Validate against platform key features and constraints
    • 40. Advanced heterogeneous support
      • Oracle Support
      • Visualize
      • Design Privileges
      • Storage and Data Partition
      • Advanced Code Generation
      • Analyze Impact
      • Validate
      (Diagram: IBM Data Studio providing logical and physical data modeling across IBM and Oracle data sources.)
    • 41. Visualize Oracle Data Sources
      • High fidelity display of the Catalog Information
      • Load on Demand technology
        • Instantaneous connection
        • Fast retrievals
      • Enable Physical Data Model transformation
    • 42. Managing Oracle Tables (Screenshot: tree-based representation, object properties editor, SQL and results of the execution, and context-sensitive actions.)
    • 43. Oracle Privileges Support
      • Physical Model enables Design capability
        • Grant appropriate privileges and roles to users
      • More detailed display allows finer-grained control
    • 44. Oracle Storage
      • Storage properties display
      • Ability to design Table Spaces
    • 45. Data Partition
      • Table and Materialized View support
        • Range partition
        • List partition
        • Hash partition
        • Composite partition
    • 46. PL/SQL Development
      • Integrated Query Editor support
        • Content Assist
        • Parser support (2009) with Error reporting
    • 47. PL/SQL Deployment
      • Execution Configuration
        • Separation of concerns
          • Better decouple configuration from definition and implementation
      • Deployment and Debugging
        • Runtime Target initialization
        • Authorization configuration
    • 48. Optimization Expert
    • 49. Understanding Query with Query Annotation (Screenshot: the original and transformed query, formatted and reorganized query text, and annotations with catalog statistics and cost estimates.)
    • 51. Provide hints for AccessType
    • 52. ACCESS PLAN GRAPH (APG) The Access Plan Graph screen displays the access path on the right, and relevant statistics on the left.
    • 53. Query Advisor Predicate that should be considered for re-write to get better performance Re-write advice and details
    • 54. Access Path Advisor
    • 56. Index Advisor Index Recommendations DDL to create the new index statement
    • 57. Common Connection Repository
      • Enhances the value proposition for team support
        • Centralized connection properties, shared between DBAs and developers
        • Improved usability and faster up-and-running scenarios
        • Gives DBAs control over connection property settings
        • Eliminates the need to configure each database server on each client desktop
        • “Push down” of client properties allows DBAs to control and override application behaviors
      • Key Features
        • Integrated solution to Eclipse Data Source Explorer
        • Integration with upcoming Web DBA tooling
        • Create or connect to Connection Repository
        • Connect to database using existing definitions
        • Create new definition
        • Logical grouping of connection definitions
    • 58. Data & Object Movement
      • Value Proposition –
        • Provide for the copying of database objects and data between homogeneous and heterogeneous databases within Data Studio
      • Key Features
        • Copy objects at various levels – from complete databases down to a fixed number of rows from a single table
        • Action performed in Data Source Explorer – Copy/Paste and Drag/Drop
        • Can automatically copy rows from related tables using:
          • RI in database
          • Data Architect model
          • Optim application models
          • Data Relationship Analyzer
        • Can optionally anonymize the rows using Optim Test Database Manager
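The "copy rows from related tables using RI in the database" idea can be sketched by reading foreign keys from the catalog and following them. SQLite stands in for the real data servers here, and the helper name is invented for the example; the product works at a far larger scale and also supports model-defined relationships.

```python
import sqlite3

def related_rows(conn, table, ids):
    """Return child rows, keyed by table, from every table with an FK into `table`."""
    out = {}
    for (child,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"):
        for fk in conn.execute(f"PRAGMA foreign_key_list({child})"):
            if fk[2] == table:                 # fk[2] = referenced table
                col = fk[3]                    # fk[3] = referencing column in child
                marks = ",".join("?" * len(ids))
                out[child] = conn.execute(
                    f"SELECT * FROM {child} WHERE {col} IN ({marks})",
                    ids).fetchall()
    return out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY,"
             " dept INTEGER REFERENCES dept(id))")
conn.execute("INSERT INTO dept VALUES (1)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(10, 1), (11, 1)])
print(related_rows(conn, "dept", [1]))   # {'emp': [(10, 1), (11, 1)]}
```

A full copy would apply this recursively so grandchild rows follow their parents.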
    • 59. pureQuery Runtime improvements
      • “Client optimization” for non-Java SQL applications (convert dynamic SQL to static)
        • .NET apps
        • CLI apps (including Ruby, PHP, etc.)
    • 60. Where to get IBM Data Studio?
      • IBM Data Studio
          • FAQs / Tutorials
          • Downloads
          • Forum / Blogs
          • Join the IBM Data Studio user community
    • 61. Disclaimer
        • © Copyright IBM Corporation 2009 All rights reserved.
        • U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
      • IBM, the IBM logo, DB2, and WebSphere are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at
      • Other company, product, or service names may be trademarks or service marks of others.