DB2 Tech Talk: Using InfoSphere Information Server with DB2


  1. © 2013 IBM Corporation. InfoSphere Information Server and DB2: Delivering trusted information in a timely fashion. February 27, 2014. Presented by: Sriram Padmanabhan.
  2. Need webcast troubleshooting help? Click Attachments. A few details:
     • The presentation for this Tech Talk: http://bit.ly/ttfilefeb14
     • Next steps and troubleshooting guide: click “Attachments” in this webcast window
     Today’s technical presenters: Sriram Padmanabhan, Chief Architect, InfoSphere Information Server; Rick Swagerman, DB2 Language Architect, DB2 Tech Talk series host and today’s presenter.
  3. Disclaimer: The information contained in this presentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided “as is”, without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other documentation. Nothing contained in this presentation is intended to, or shall have the effect of:
     • creating any warranty or representation from IBM (or its affiliates or its or their suppliers and/or licensors); or
     • altering the terms and conditions of the applicable license agreement governing the use of IBM software.
     Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  4. Agenda
     • Brief introduction to information integration and governance
     • Integration patterns
       – InfoSphere data integration basics
       – DB2 Connector
       – Performance notes
     • DataClick: enabling simpler self-service patterns
     • Governance
     • Summary
  5. IBM Big Data and Analytics Reference Architecture (diagram): an open architecture with multiple product entry points, covering information ingestion and integration, real-time analytics, data exploration, enterprise warehouse, data marts, archive, and information governance, security and business continuity. Products shown: Information Server, DB2.
  6. Information Integration & Governance for Big Data: acting on insight requires confidence in data.
     • Automated Integration: rapid, easy access to big data, wherever it resides
     • Visual Context: easy categorization, indexing, and discovery of big data to optimize its usage
     • Agile Governance: definition and execution of governance appropriate to data value and intended use
     Outcomes: make informed decisions; uncover competitive advantages; identify new opportunities; take bigger, calculated risks.
  7. Delivering Business Value via an Integrated Platform
     • Agile Integration: wherever your information resides, integrate it quickly and flexibly.
     • Business Driven Governance: make decisions with confidence using trusted data at the point of impact.
     • Sustainable Quality: ensure information accuracy and quickly adapt to strategic business changes.
  8. Agile Integration Patterns: InfoSphere DataStage and DB2 (diagram: DataStage transforms moving data between sources, DB2, targets, and files).
  9. Integration: Transform and Deliver to Any System, Improving Time-to-Value. Integrate and transform data on demand across multiple sources and targets:
     • Satisfy the most complex transformation requirements with the most scalable runtime available
     • Transform and aggregate any data volume
     • Benefit from hundreds of built-in transformation functions
     • Leverage metadata-driven productivity and enable collaboration
     • Use a simple, web-based dashboard to manage your runtime environment
     • Manage your requirements for transformation activities to align with the business
  10. InfoSphere DataStage: transform and aggregate any volume of information in batch or real time through visually designed logic. Requirements and benefits:
     • Accelerates development of integration processes
     • Delivers hundreds of prebuilt, reusable transformation components and routines
     • Delivers a scalable platform to meet both batch and real-time demands
     • Collaborative, reusable, and productive metadata-driven development
     • Support for complex transformation across heterogeneous systems
     • Massively scalable architecture and performance
  11. DataStage Features: Runtime. A flexible and scalable runtime, from the connectivity layer through transformation tasks, that scales with massive data volumes. Design logic once; run and scale anywhere.
     • Robust job runtime that captures and logs environment variables, parameter settings, job statistics, error condition details, and debugging information
     • Support for both batch and real-time processing styles
     • Complex heterogeneous data integration tasks built as web services (with Information Services Director)
     • Ability to push processing into source or target databases to support ELT, TEL, TETLT, etc. alongside ETL (with Balanced Optimization)
     • Checkpoint/restart of data integration tasks
     • Runtime column propagation
  12. DB2 Connector in InfoSphere DataStage
     • Supports DB2 UDB for LUW versions 8, 9, 10, and 10.5
     • Validated and tested with DB2 10.5 BLU
     • Available from Information Server 8.1; the current version of Information Server at the time of this talk is 9.1.2
     • Replaces the legacy plug-ins and operators:
       – DB2 UDB API Stage
       – DB2 UDB Enterprise Stage
       – DB2 UDB Load Stage
     • Developed in collaboration with the DB2 team to ensure strong integration
     • Beginning with DB2 9.5:
       – Tolerance testing of the legacy DB2 plug-ins and operators
       – Tolerance testing and exploitation of new functionality in the DB2 Connector
  13. Advantages of DB2 Connector
     • A single stage for all DB2 access
     • Implements DB2 partitioning
     • Uses the DB2 CLI interfaces
     • Connects to multiple DB2 instances
     • Separate connection parameters for the conductor and the players
     • Loads client libraries dynamically, so DB2 8, 9, and 10 clients can be supported concurrently
     • LOB transfer for arbitrarily large LOBs
     • XML support
     • Reconciles the schema defined at design time with the actual schema at runtime
     • Enhanced reject support
     • Metadata import into the shared metadata repository
     • New, improved stage editor that accesses the connector at design time
  14. DB2 Stage Editor & Connection Properties: full control over connection properties, including instance, conductor, and libraries. This permits explicitly referencing a particular client version.
  15. Using the Connector with DB2 DPF
     • Like DB2 DPF, DataStage is a massively parallel processing (MPP) engine.
     • The DB2 Connector makes it possible to maximize parallelism for best performance.
     • For DB2 DPF as a source (reading data):
       – The connector can leverage the nodenumber() function.
       – Caution: use this carefully to avoid redundant or duplicate work (spider queries).
     • For DB2 DPF as a target (writing data):
       – The connector supports parallel loading via CLI and API.
       – Buffered inserts.
       – The nodenumber() clause can also be used, carefully, in Update Subselect or Delete Subselect statements.
  16. Setting up DB2 Connector for Parallel Reads. The connector supports both auto-generated and manually entered SQL statements. If the property Generate SQL is set to Yes, the connector generates a simple SELECT statement based on the table name provided in the property Table name and the columns specified in the columns grid. If Generate SQL is set to No, the SELECT statement must be entered manually. To run the connector in parallel, set the property Enable partitioned reads to Yes, set its child property Partitioned reads method to DB2, and set the child property Table name to the same table as the one used in the statement. When Generate SQL is set to No, another child property, Generate partitioning SQL, is enabled. Setting it to Yes instructs the connector to add a nodenumber clause to the statement, either by adding a WHERE clause or by extending the existing one.
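The idea behind Generate partitioning SQL can be sketched in Python. This is a hypothetical helper, not the connector's code: it assumes the table is distributed on a known key column, uses DB2 for LUW's DBPARTITIONNUM scalar function (NODENUMBER is the older name for it), and only handles simple statements without GROUP BY or ORDER BY. Each player would run its own copy of the statement, filtered to one partition.

```python
import re


def add_partition_predicate(select_sql: str, key_column: str, partition: int) -> str:
    """Return select_sql restricted to rows stored on one DB2 partition.

    Hypothetical illustration only; the SQL the DB2 Connector really
    generates may differ.
    """
    predicate = f"DBPARTITIONNUM({key_column}) = {partition}"
    # Split on the first WHERE (case-insensitive). Subqueries, GROUP BY and
    # ORDER BY are not handled by this naive approach.
    parts = re.split(r"\bWHERE\b", select_sql, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) == 2:
        head, condition = parts
        # Parenthesize the user's condition so an OR inside it cannot
        # escape the partition filter.
        return f"{head.rstrip()} WHERE ({condition.strip()}) AND {predicate}"
    return f"{select_sql.rstrip()} WHERE {predicate}"


if __name__ == "__main__":
    base = "SELECT ACCT_ID, AMOUNT FROM SALES.TRANS WHERE AMOUNT > 0"
    for p in range(4):  # one statement per player / partition
        print(add_partition_predicate(base, "ACCT_ID", p))
```

Wrapping the existing condition in parentheses is the important design choice: appending `AND ...` directly to `a = 1 OR b = 2` would filter only the second disjunct.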
  17. Setting up DB2 Connector for Parallel Reads (Cont'd). The following screen shot shows how to set up the connector to run a manually entered SQL statement in parallel and to add the partitioning clause automatically.
  18. Writing Data in Parallel. The goal is the same as with parallel reads: leverage the parallelism of both DataStage and DB2 to achieve high performance and scalability. There are two ways to run parallel writes:
     • Match the DB2 server parallelism. For example, if the server has 88 partitions, the connector runs 88 parallel writers, and each writer targets a different partition (and only one).
     • Reduced parallelism. For example, if the server has 88 partitions, there could be fewer than 88 parallel writers (e.g., 22).
     To split the data for parallel processing, DataStage uses components called partitioners. DataStage provides a number of generic partitioners (e.g., round robin and hash), and the DB2 Connector provides a special partitioner that uses the same algorithm as the DB2 server.
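The two parallel-write options differ only in how many DB2 partitions each writer covers. The sketch below uses a simple modulo assignment, which is an assumption for illustration rather than the connector's documented behavior: with 88 writers each one gets exactly one partition, and with 22 writers each one covers four.

```python
def assign_partitions(num_partitions: int, num_writers: int) -> dict:
    """Map writer index -> list of DB2 partition numbers it writes to.

    Hypothetical modulo scheme; every partition is owned by exactly one writer.
    """
    if not 1 <= num_writers <= num_partitions:
        raise ValueError("need 1 <= num_writers <= num_partitions")
    assignment = {w: [] for w in range(num_writers)}
    for p in range(num_partitions):
        assignment[p % num_writers].append(p)
    return assignment


if __name__ == "__main__":
    matched = assign_partitions(88, 88)  # one writer per partition
    reduced = assign_partitions(88, 22)  # each writer covers 4 partitions
    print(matched[5], reduced[0])
```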
  19. DB2 Connector as Data Partitioner (diagram). Components shown: a conductor process (DataStage EE, DB2 Connector, PXBridge) together with db2nodes.cfg and default.apt; a partitioner process (DB2 Connector, PXBridge) that uses the table name and job layout to determine the player number for each data row; and player processes #1 and #2 (PXBridge, DB2 Connector) that insert data rows into DB2 partitions #1 and #2. Phases: A, environment reconciliation; B, data row partitioning; C and D, data row insert.
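DB2 DPF places each row by hashing its distribution key to an entry in a distribution map, whose value is the target database partition. Because the connector's partitioner mirrors the server's algorithm, each row can be routed to the player that owns that partition. The sketch below shows only the shape of that routing. `zlib.crc32` is a stand-in hash and `MAP_SIZE` is an assumed value; neither is DB2's actual algorithm or map size, so the assignments will not match a real server.

```python
import zlib
from collections import defaultdict

MAP_SIZE = 4096  # assumption for illustration, not necessarily DB2's map size


def build_distribution_map(partitions, map_size=MAP_SIZE):
    """Fill the map by cycling through the partition numbers."""
    return [partitions[i % len(partitions)] for i in range(map_size)]


def target_partition(key, dist_map):
    """Stand-in hash of the key -> map index -> partition number."""
    index = zlib.crc32(str(key).encode("utf-8")) % len(dist_map)
    return dist_map[index]


def route_rows(rows, key_column, dist_map):
    """Group rows by the partition (and therefore the player) that inserts them."""
    routed = defaultdict(list)
    for row in rows:
        routed[target_partition(row[key_column], dist_map)].append(row)
    return dict(routed)


dist_map = build_distribution_map(list(range(16)))
rows = [{"ACCT_ID": n, "AMOUNT": n * 10} for n in range(100)]
routed = route_rows(rows, "ACCT_ID", dist_map)
```

The property that matters is determinism. The same key always lands on the same partition, so a player that receives only its own rows can write to a single partition without cross-node traffic.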
  20. Bulk Load Capabilities. The DB2 Connector supports two bulk loading technologies when loading to DB2 for LUW: CLI load and API load.
     • CLI load:
       – Slower than API load
       – Supports the XML and LOB data types
     • API load:
       – Faster than CLI load
       – Does not support the XML and LOB data types
       – Preferred, and used by most customers
     • The user selects CLI load or API load via the DB2 Connector’s “Bulk load with LOB or XML column(s)” property:
       – Setting it to Yes activates CLI load. This is the default.
       – Setting it to No activates API load.
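The choice between the two load methods follows directly from the target's column types. The helper below is hypothetical, not part of the connector. It encodes that rule: any BLOB, CLOB, DBCLOB, or XML column requires CLI load (property set to Yes), and otherwise the faster API load can be used (property set to No).

```python
LOB_OR_XML_TYPES = {"BLOB", "CLOB", "DBCLOB", "XML"}


def base_type(sql_type: str) -> str:
    """'CLOB(1M)' -> 'CLOB', 'varchar(20)' -> 'VARCHAR'."""
    return sql_type.split("(")[0].strip().upper()


def choose_bulk_load(column_types):
    """Return (value for 'Bulk load with LOB or XML column(s)', load method)."""
    if any(base_type(t) in LOB_OR_XML_TYPES for t in column_types):
        return "Yes", "CLI load"  # CLI load supports LOB and XML columns
    return "No", "API load"  # API load is faster but rejects LOB/XML


if __name__ == "__main__":
    print(choose_bulk_load(["INTEGER", "VARCHAR(40)", "DECIMAL(14,2)"]))
    print(choose_bulk_load(["INTEGER", "CLOB(1M)"]))
```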
  21. Writing Data in Parallel: Bulk Load Mode (Cont'd). The following screen shot shows the properties for controlling the load parallelism.
  22. Deployment Scenarios: Running on DB2 Nodes. The picture depicts the connector running on the DB2 nodes of a server with 4 physical machines, each with 4 partitions, for a total of 16 partitions. The conductor runs on the ETL node etl_0. Players 0-3 run on db2_0 with partitions 0-3, players 4-7 on db2_1 with partitions 4-7, players 8-11 on db2_2 with partitions 8-11, and players 12-15 on db2_3 with partitions 12-15.
     DataStage configuration file:
       Logical name (node)   Physical name (fastname)   Node pool (pool)
       conductor             etl_0                      ""
       Node0                 db2_0                      "DB2"
       Node1                 db2_1                      "DB2"
       Node2                 db2_2                      "DB2"
       Node3                 db2_3                      "DB2"
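The configuration table for this scenario could be written out as a DataStage parallel configuration (APT) file. The Python sketch below generates one from the slide's node, fastname, and pool values. The resource disk and scratchdisk paths are hypothetical placeholders, and in practice the file would normally be created and checked with the DataStage tooling.

```python
def apt_config(nodes, disk="/opt/ds/data", scratch="/opt/ds/scratch"):
    """Render (node, fastname, pool) tuples as APT configuration text.

    disk/scratch paths are hypothetical placeholders.
    """
    lines = ["{"]
    for name, fastname, pool in nodes:
        lines += [
            f'  node "{name}"',
            "  {",
            f'    fastname "{fastname}"',
            f'    pools "{pool}"',
            # Doubled braces produce literal { } in the f-string output.
            f'    resource disk "{disk}" {{pools ""}}',
            f'    resource scratchdisk "{scratch}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


# Values taken from the slide's configuration table.
NODES = [
    ("conductor", "etl_0", ""),
    ("Node0", "db2_0", "DB2"),
    ("Node1", "db2_1", "DB2"),
    ("Node2", "db2_2", "DB2"),
    ("Node3", "db2_3", "DB2"),
]

if __name__ == "__main__":
    print(apt_config(NODES))
```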
  23. Deployment Scenarios: Running on ETL Nodes. The following picture depicts the connector running on two ETL nodes. Players 0-7 run on etl_0 and players 8-15 run on etl_1, while DB2 partitions 0-15 reside on the DB2 data nodes db2_0 to db2_3.
  24. Special Considerations for HA Systems. In high availability (HA) systems (an ETL cluster, or PDOA, IBM PureData System for Operational Analytics), a failing DB2 node can be replaced with a backup node upon failover, which means that the real host name of a machine can change.
     1. Use virtual host names when configuring DB2 clients, because the actual host names can change when a backup machine is used.
     2. In the APT configuration file, specify the actual (non-virtual) DB2 host names.
     3. Make sure that all nodes, both active and backup, are included in the APT configuration file.
     When a DB2 node fails, the job fails too and must be restarted once the backup nodes take over. When the backup nodes are operational, the DB2 server automatically updates db2nodes.cfg to list the new host names. The connector detects the change and looks for the new nodes in the APT configuration file.
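Point 3 can be checked before a failover happens. Every host that could appear in db2nodes.cfg should also appear as a fastname in the APT configuration file. This sketch assumes the usual db2nodes.cfg layout (partition number, host name, logical port, and so on) and uses a simple regular expression over the APT text. The host names and the APT fragments are hypothetical.

```python
import re


def db2nodes_hosts(db2nodes_text):
    """Host names from db2nodes.cfg (second whitespace-separated field)."""
    hosts = set()
    for line in db2nodes_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            hosts.add(fields[1])
    return hosts


def apt_fastnames(apt_text):
    """All fastname values declared in APT configuration text."""
    return set(re.findall(r'fastname\s+"([^"]+)"', apt_text))


def hosts_missing_from_apt(db2nodes_text, apt_text):
    """Hosts DB2 may use that the connector could not find in the APT file."""
    return sorted(db2nodes_hosts(db2nodes_text) - apt_fastnames(apt_text))


# Hypothetical state after failover: partitions 2-3 moved to backup host db2_b1.
DB2NODES_AFTER_FAILOVER = """\
0 db2_0 0
1 db2_0 1
2 db2_b1 0
3 db2_b1 1
"""
# Hypothetical APT fragments (abbreviated, not complete configuration files).
APT_ACTIVE_ONLY = 'node "Node0" { fastname "db2_0" } node "Node1" { fastname "db2_1" }'
APT_WITH_BACKUP = APT_ACTIVE_ONLY + ' node "Backup1" { fastname "db2_b1" }'

if __name__ == "__main__":
    print(hosts_missing_from_apt(DB2NODES_AFTER_FAILOVER, APT_ACTIVE_ONLY))
```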
  25. Balanced Optimization (aka Push-Down Optimization or ELT)
     • Design jobs using the data flow tooling in DataStage.
     • Invoke the optimization engine from the design canvas.
     • The optimization engine uses simple optimization rules:
       – Push down to the source only, the target only, or both
       – Create an equivalent representation of the job in SQL
     • The optimized job can be saved separately from the original and deployed.
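To make the pushdown idea concrete, here is a toy version of the rewrite. A job made of a source table, a filter, an aggregation, and a target is collapsed into one INSERT ... SELECT that the database can run itself. Balanced Optimization's real rules and generated SQL are more involved; the `SimpleJob` class, its fields, and the table names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SimpleJob:
    """Hypothetical job model: source -> filter -> aggregate -> target."""
    source_table: str
    filter_condition: Optional[str] = None
    group_by: List[str] = field(default_factory=list)
    aggregates: Dict[str, str] = field(default_factory=dict)  # output -> expression
    target_table: Optional[str] = None


def to_pushdown_sql(job: SimpleJob) -> str:
    """Express the whole job as a single SQL statement (push down to both ends)."""
    select_list = job.group_by + [f"{expr} AS {name}" for name, expr in job.aggregates.items()]
    sql = f"SELECT {', '.join(select_list) or '*'} FROM {job.source_table}"
    if job.filter_condition:
        sql += f" WHERE {job.filter_condition}"
    if job.group_by:
        sql += f" GROUP BY {', '.join(job.group_by)}"
    if job.target_table:
        # Target pushdown: the database inserts the result directly.
        sql = f"INSERT INTO {job.target_table} {sql}"
    return sql


if __name__ == "__main__":
    job = SimpleJob("SALES.TRANS", "AMT > 0", ["ACCT_ID"],
                    {"TOTAL_AMT": "SUM(AMT)"}, "SALES.ACCT_TOTALS")
    print(to_pushdown_sql(job))
```

Dropping `target_table` corresponds to "push down to source only": the SELECT runs in the source database and DataStage receives the already-aggregated rows.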
  26. Performance
  27. Data Warehouse Scenario: Main DataStage Jobs
     • Load_Trades job: validates and transforms trade data, then bulk loads it into the target data warehouse. A typical DI job with multiple lookups and complex transformations. CPU intensive.
     • Load_CashBalances job: associates each cash transaction with a trade and an account. A typical DI job with standard processing operations such as filters, joins, lookups, transforms, sorts, grouping operations, and multiple outputs. I/O intensive.
  28. IS Module Performance Results (Throughput) with DB2
       Job                 Execution time (min)   Data volume (GB)   Throughput (GB/hr/core)   Remarks
       Load_Trades         55                     143                19.5                      CPU intensive
       Load_CashBalances   70                     263                28.2                      I/O intensive
       Raw Load            n/a                    n/a                87.8                      I/O intensive
     • Results are based on the Information Server module in the ISAS 7700 appliance.
     • Tests were performed in 2Q 2011 using the ISAS 7700 DB2 warehouse appliance.
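The slide does not say how throughput per core was computed. Assuming it is data volume divided by elapsed hours and by the number of cores, both job rows imply roughly 8 cores, so the table is internally consistent. A quick Python check:

```python
def throughput_gb_per_hr(volume_gb, minutes):
    """Aggregate throughput: data volume divided by elapsed hours."""
    return volume_gb / (minutes / 60)


def implied_cores(volume_gb, minutes, gb_per_hr_per_core):
    """Core count implied by the per-core figure (assumed formula)."""
    return throughput_gb_per_hr(volume_gb, minutes) / gb_per_hr_per_core


RESULTS = [  # (job, minutes, GB, GB/hr/core) from the table above
    ("Load_Trades", 55, 143, 19.5),
    ("Load_CashBalances", 70, 263, 28.2),
]

if __name__ == "__main__":
    for job, minutes, gb, per_core in RESULTS:
        print(f"{job}: {throughput_gb_per_hr(gb, minutes):.1f} GB/hr, "
              f"~{implied_cores(gb, minutes, per_core):.1f} cores")
```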
  29. ISAS 7700 IS Module: Taking Advantage of SSDs on an I/O-Intensive Workload (charts: disk and CPU utilization for each storage configuration; 12 HDD spindles vs. 4 SSDs)
     • With SSDs: I/O was highly utilized but not maxed out; CPU was efficiently utilized; execution time (70 minutes) was much shorter than with regular HDDs.
     • With regular HDDs: I/O was maxed out; CPU was underutilized; execution time (103 minutes) was much longer than with SSDs.
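The SSD comparison can be quantified from the two execution times. This assumes the same I/O-intensive job ran in both cases; the 70-minute figure matches Load_CashBalances in the throughput table. On that basis, SSDs are about 1.47 times as fast and cut elapsed time by about 32%.

```python
def speedup(slow_minutes, fast_minutes):
    """How many times faster the fast configuration is."""
    return slow_minutes / fast_minutes


def time_saved_fraction(slow_minutes, fast_minutes):
    """Fraction of the slow run's elapsed time that is saved."""
    return (slow_minutes - fast_minutes) / slow_minutes


HDD_MINUTES, SSD_MINUTES = 103, 70

if __name__ == "__main__":
    print(f"speedup: {speedup(HDD_MINUTES, SSD_MINUTES):.2f}x")
    print(f"time saved: {time_saved_fraction(HDD_MINUTES, SSD_MINUTES):.0%}")
```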
  30. DataClick: Enabling Self-Service Patterns
  31. InfoSphere Data Click: Self-Service Data Integration on Demand (Agile Integration)
     Key points:
     • Users need quick and easy access to information to support their analytical projects.
     • Traditional data integration tools are too complex to support self-service patterns.
     • Organizations need to avoid data sprawl, so governance best practices must be ensured.
     Capabilities:
     • Universal connectivity, including traditional and big data sources
     • Purposefully designed for an end-to-end experience: a new user home screen, including lightweight monitoring
     • Web-based configuration: browser-based policy setup and configuration for the architect
     • Built-in governance
  32. InfoSphere Data Click: Two Clicks to Data Integration. 1st click, 2nd click, execution.
     • Preselected sources and targets minimize user interaction.
     • Checked tabs indicate that the required configuration is complete.
     • Review the configuration, then execute.
  33. Information Server Provides Governance of DB2 Assets
  34. Streamline Business & IT Understanding. Semantic terminology, policies, and rules are shared across data integration, data quality, master data management, big data analytics, information lifecycle management, and privacy & security.
     • Technical metadata: Database = DB2; Schema = NAACCT; Table = DLYTRANS; Column = TAXVL; Data type = Decimal(14,2); Derivation: SUM(TRNTXAMT); ILM Policy: MSK_RNDM_INT
     • Business metadata: Category: Costs; Term: Tax Expense; Full Name: Tax to be paid on Gross Income: “The expense due to taxes …”; Policy: Tax Policies; Rule: Data must be masked for testing
  35. Cross-Tool Impact Analysis and Traceability
     • Assess the impact of change and mitigate risks
     • Show the impact on downstream applications and BI reports
     • Navigate through impacted areas and drill down
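Impact analysis is, at its core, reachability over a lineage graph: starting from the changed asset, follow every "feeds into" edge until nothing new is found. The breadth-first sketch below illustrates that idea. The asset names and the dictionary-based lineage model are hypothetical, not Information Server's metadata repository API.

```python
from collections import deque


def downstream_impact(lineage, changed_asset):
    """Breadth-first walk over 'feeds into' edges.

    Returns every asset reachable from changed_asset; the visited set
    keeps cycles from looping forever.
    """
    impacted = set()
    queue = deque([changed_asset])
    while queue:
        asset = queue.popleft()
        for consumer in lineage.get(asset, []):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted


# Hypothetical lineage: column -> DataStage job -> warehouse table -> BI reports
LINEAGE = {
    "DB2:NAACCT.DLYTRANS.TAXVL": ["DataStage:Load_Tax_Facts"],
    "DataStage:Load_Tax_Facts": ["DB2:DW.TAX_FACT"],
    "DB2:DW.TAX_FACT": ["BI:Quarterly tax report", "BI:Cost dashboard"],
}

if __name__ == "__main__":
    print(sorted(downstream_impact(LINEAGE, "DB2:NAACCT.DLYTRANS.TAXVL")))
```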
  36. Data Quality Capabilities
     • Understand: investigate data sources; discover hidden relationships and linkages between tables and data sources; profile for anomalies
     • Monitor: measure success with KPIs; dashboards, graphics, and business-level quality reports; trend analysis, snapshots, and user-defined views
     • Cleanse: standardize data formats; match and deduplicate data from all business domains; advanced survivorship and householding of data
     “IBM is helping us create a trusted data layer that feeds all our applications and our analytic processes.” (HealthNow)
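The Cleanse steps (standardize, then match and deduplicate, then pick a survivor) can be illustrated with a deliberately simple Python sketch. It uses exact matching on standardized keys and a "first record wins" survivorship rule. Both are assumptions for illustration and are far simpler than the probabilistic matching the product performs. The field names and records are hypothetical.

```python
import re


def standardize(record):
    """Normalize case and whitespace in the name; keep only digits of the phone."""
    name = " ".join(record["name"].upper().split())
    phone = re.sub(r"\D", "", record["phone"])
    return name, phone


def deduplicate(records):
    """Group records whose standardized keys match exactly; the first record
    in each group survives (simplistic survivorship rule)."""
    survivors = {}
    for record in records:
        survivors.setdefault(standardize(record), record)
    return list(survivors.values())  # dicts keep insertion order


customers = [
    {"name": "Jane  Doe", "phone": "(555) 123-4567"},
    {"name": "jane doe", "phone": "555.123.4567"},
    {"name": "John Roe", "phone": "555-987-6543"},
]

if __name__ == "__main__":
    for record in deduplicate(customers):
        print(record)
```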
  37. Conclusion
     • Information integration and governance are required to enable confidence in your DB2 data.
     • InfoSphere Information Server provides a very powerful DB2 Connector capability:
       – Supports DB2 10.5 BLU as well
       – Leverages DB2 DPF parallelism effectively
       – Provides very high scalability and performance
     • Information Server has governance, Data Click, and data quality features to increase confidence in DB2 data.
  38. InfoSphere Information Integration and Governance Platform
     • Information Integration: extract, transform, load; replicate; federate
     • Data Quality: standardize; validate; verify; enrich; match
     • Master Data Management: master multiple domains; registry or transaction hub; collaboratively author; govern master data
     • Data Lifecycle Management: database archiving; test data management
     • Privacy & Security: activity monitoring; masking; encryption; redaction
     • Metadata, Business Glossary and Policy Management: automated data discovery; enterprise metadata repository; business terminology defined in a business glossary; define, share, and execute information governance policies; information governance project blueprints
     • Entity Analytics: incremental context accumulation
  39. InfoSphere is the Leader in Information Integration & Governance
     • Master Data Management: market share leader (20% share)
     • Information Integration & Data Quality: market share leader (28% share)
     • Privacy & Security: “Market share leader” (IDC)
     • Information Lifecycle Management: market share leader (76% share)
  40. DB2 Tech Talk: Using InfoSphere Information Server with DB2. Next Steps Roadmap
     • Read the Redbooks:
       – IBM Information Server: Integration and Governance for Emerging Data Warehouse Demands: ibm.co/1fV7NEe
       – InfoSphere DataStage Data Flow and Job Design: bm.co/1mI48CC
     • Browse the information center for DB2 Connector:
       – DB2 Connector Info center: ibm.co/1emXPz1
     • Learn the best practices:
       – Information Server best practices web site: ibm.co/1bN7X4
     • Join the community:
       – InfoSphere Information Server Tech Forum: ibm.co/1bN93gB
       – InfoSphere DataStage Tech Forum: ibm.co/1kawgfM
       – DB2 LUW Tech Forum: http://bit.ly/db2forumluw
     Reference:
     • IBM DB2 10.5 product page: ibm.com/db2
     • IBM DB2 10.5 product features: ibm.co/12c1PJz
     • IBM InfoSphere Information Server product page: ibm.co/1bN9kQM
     Call IBM to schedule a demo or learn more:
     • 1 800 966-9875 (U.S.)
     • 1-888-746-7426 (Canada)
     • 1800-425-3333 (India)
     • Or visit http://www.ibm.com/planetwide/ for contact information worldwide
  41. DB2 Tech Talk
  42. Questions. Listening to the replay? Submit questions at www.sqltips4db2.com (click “Submit a question”).
  43. Thanks for attending! Please rate the session. Presentation download: bit.ly/ttfilefeb14, or click Attachments in this webcast environment.