Db2 tech talk real world warehousing for technical professionals



IBM DB2 for Warehousing Professionals

Published in: Technology, Business


  1. Real-World Data Warehousing for Tech Pros: What’s Current? Where’s It Going?
     Presented by: Jessica Rockwood, Warehouse Developer and Performance Expert
     March 28, 2013
     © 2013 IBM Corporation
  2. DB2 Tech Talk: A few details
     Need webcast troubleshooting help? Click Attachments.
     1. Access the presentation for this Tech Talk: http://bit.ly/ttfilemarch13
     2. Next steps and troubleshooting guide: click “Attachments” in this webcast window
     Today’s technical presenters:
     • Rick Swagerman, DB2 Tech Talk series host; Host Language Architect, DB2 for Linux, UNIX, and Windows at IBM. Rick’s blog: www.sqltips4db2.com
     • Jessica Rockwood, today’s presenter; IBM Warehouse Developer and Performance Expert
  3. Disclaimer
     The information contained in this presentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided “as is”, without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other documentation. Nothing contained in this presentation is intended to, or shall have the effect of:
     • Creating any warranty or representation from IBM (or its affiliates or its or their suppliers and/or licensors); or
     • Altering the terms and conditions of the applicable license agreement governing the use of IBM software.
     Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  4. Agenda
     • Why warehousing?
       – Business Intelligence and the data warehouse
     • What constitutes a data warehouse?
       – Terminology overview
       – Component overview
     • How do you design, implement and manage a warehouse?
     • Warehousing with IBM Information Management Products today:
       – InfoSphere Warehouse
       – PureData for Operational Analytics
     • Technology trends: what are the next possible evolutions in warehousing? Why are they needed?
     • Case studies
  5. Why warehousing?
     • To succeed, today’s companies need to:
       – Be more efficient, streamline operations
         • Spot areas for cost savings
       – Quickly identify and respond to business trends
         • Understand customer behavior
         • Understand key business metrics
       – Predict future performance
         • What-if analysis of hypothetical scenarios
         • Look for new opportunities, where to cross-sell or up-sell products
     • What do you need to analyze, predict, and report? Business Intelligence
  6. Business Intelligence & Business Analytics
     • Business intelligence (BI) is a set of … architectures, and technologies that transform raw data into meaningful and useful information … using a consistent set of metrics to measure past performance, gain insight, and guide business planning
       – BI answers: what happened, how many, how often … what actions are needed
     • Business analytics (BA) refers to … continuous iterative exploration and investigation of past business performance to gain insight and drive business planning.
       – BA answers: why is this happening, what if these trends continue, what will happen next, what is the best that can happen
  7. Analytics Create Competitive Advantage … and the gap is widening
     • More organizations see analytics as a competitive advantage
       – Chart: percentage of respondents who cited a competitive advantage: 37% (2010), 58% (2011)
     • … And it is!
       – 220%: analytics-driven organizations outperform their industry peers
     Source: IBM IBV/MIT Sloan Management Review Study 2011
  8. Analytic Innovation Helps Businesses Across all Industries
     • Reducing Energy Dependence
     • Reducing Traffic & Pollution
     • Reducing Customer Churn
     • Fighting Chronic Disease
     • Averting Fraudulent Transactions
     • Preventing Contamination
     • Streamlining Supply Chains
  9. Core element of business intelligence architectures is the data warehouse
     A data warehouse (DW or DWH) or enterprise data warehouse (EDW) is a database used for reporting and data analysis. It is a central repository of data which is created by integrating data from one or more disparate sources.
     Diagram label: Traditional warehouse
  10. Data warehouse component overview
     Diagram labels:
     • OLTP systems: Sales, Marketing, ERP
     • Extract, Transform, Load (ETL)
     • Staging Area
     • Integration Layer
     • Data warehouse
     • Data Marts: subsets of the data warehouse oriented to a specific business line or team
     • Focus area
  11. Evolution of warehousing – What It Was
     • Traditional enterprise data warehouse implies:
       – Incremental updates on a weekly or daily basis
       – Strict isolation between transactional systems and warehouse systems
       – Content of warehouse systems typically historical transaction data
         • Little operational data aligned with business processes
       – Lower availability requirements
         • Typically lowest tier of all databases; outage is not ideal but does not prevent business from occurring
       – Lower performance requirements
         • Speed and efficiency are preferred, but less critical than in transaction systems
  12. Today’s Data Warehouse Requirements – What It Is
     • New demands on the database product
       – Mixed workloads: both traditional complex queries and short OLTP queries
       – Real-time load and updates (think trickle feed)
       – Increased focus on monitoring and workload management
       – Increased storage requirements (tens to hundreds of TBs)
       – Integration of all types of data, structured and unstructured
       – Advancement and increasing importance of the data warehouse appliance
       – Strong focus on ROI for the warehouse
       – Get up and running faster, with fewer iterations to reach an optimal design
       – Real-time operational analytics
  13. Real-Time Operational Analytics
     Requires both analytics and operational data management
     Diagram labels:
     • Business Analysts: BI Reports and Analytics; multiple, concurrent analytic queries against the Data Warehouse
     • Example query: Sales & Profit for Shoes & Belts, Year >= 2005 (chart of SALES by year, 2005 to 2010)
     • Business Users, Call Centers, Online Queries, etc.: extreme concurrent query volumes on real-time information, 100s to 10,000+ read (and update) queries
  14. How to bootstrap your data warehouse implementation
     • Leverage an appliance or integrated system
       – IBM PureData for Operational Analytics
     • Leverage industry models and patterns
       – IBM Industry Models
         • Data models and sample reports for Banking, Banking Process and Service, Financial Markets, Banking and Financial Markets, Insurance, Information, Insurance Process and Service, Health Plan, Retail, and Telecommunications
       – IBM InfoSphere Warehouse Packs
         • Provide physical data models, example data mining algorithms and sample IBM Cognos Business Intelligence reports
         • Derived from IBM Industry Models for specific horizontal business issues (customer insight, market and campaign insight, supply chain insight)
  15. Warehouse Design & Implementation
     • From the “Physical database design for data warehouse environments” best practice paper
     • In preparation for design, need to know:
       – What is the expected query performance?
       – What is the expected data availability and recoverability?
       – How is data loaded into and removed from the warehouse?
       – What volume of data is expected? And can the system architecture and design support it?
     • 2-step process:
       – Design the logical data model
         • Create a logical data model that defines logical entities and the relationships between them
       – Design the physical data model
         • Goal is to speed up performance of database activities, balance data across multiple partitions (if used), and allow for fast recovery
         • Implement initial design & iterate to find optimal design
         • Leverage Optim Performance Manager Extended Insight and Optim Query Workload Tuner to help identify areas of improvement
  16. Logical model terminology – a.k.a. learning the lingo
     • Normalization
       – Organizing fields and tables of a relational database to minimize redundancy
       – Dividing large tables into smaller tables with relationships (keys) defined between them
       – Simplifies data consistency as each attribute is stored in a single place, and all facts reference that single entry
       – Reduces space requirements
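The normalization bullets above can be illustrated with a minimal sketch. The sample rows, the field names, and the `normalize` helper are all invented for illustration and do not come from the presentation. The sketch splits a flat table into a store table plus sales rows that reference it by key.

```python
# Hypothetical flat (denormalized) rows: store attributes repeat on every sale.
flat_sales = [
    {"store_name": "Downtown", "store_city": "Toronto", "item": "Shoes", "amount": 120},
    {"store_name": "Downtown", "store_city": "Toronto", "item": "Belts", "amount": 35},
    {"store_name": "Airport", "store_city": "Ottawa", "item": "Shoes", "amount": 80},
]


def normalize(rows):
    """Split flat rows into a store table and sales rows that reference it by key."""
    store_ids = {}  # (name, city) -> surrogate integer key
    stores = {}     # surrogate key -> store attributes, stored exactly once
    sales = []
    for row in rows:
        natural_key = (row["store_name"], row["store_city"])
        if natural_key not in store_ids:
            store_ids[natural_key] = len(store_ids) + 1
            stores[store_ids[natural_key]] = {
                "name": row["store_name"],
                "city": row["store_city"],
            }
        sales.append({
            "store_id": store_ids[natural_key],
            "item": row["item"],
            "amount": row["amount"],
        })
    return stores, sales


stores, sales = normalize(flat_sales)

# Each store attribute now lives in one place, so a rename touches one entry
# instead of every sale that mentions the store.
stores[1]["name"] = "Downtown Flagship"
```

Because every sale carries only an integer `store_id`, the repeated store text is stored once, which is the space and consistency benefit the slide describes.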
  17. Learning the lingo – part 2
     • Star schema
       – Fact table: SALES
       – Dimension tables: STORE, ITEM, PROMOTION, DATE, CUSTOMER
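As a rough sketch of how a star schema is queried, the following uses Python's built-in `sqlite3` module rather than DB2. The column names, the `date_dim` table name, and the sample data are assumptions made for illustration. Only the SALES fact table and the STORE, ITEM, and DATE dimensions are taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store    (store_id INTEGER PRIMARY KEY, city TEXT NOT NULL);
CREATE TABLE item     (item_id  INTEGER PRIMARY KEY, category TEXT NOT NULL);
CREATE TABLE date_dim (date_id  INTEGER PRIMARY KEY, year INTEGER NOT NULL);
-- The fact table holds measures plus one foreign key per dimension.
CREATE TABLE sales (
    store_id INTEGER NOT NULL REFERENCES store(store_id),
    item_id  INTEGER NOT NULL REFERENCES item(item_id),
    date_id  INTEGER NOT NULL REFERENCES date_dim(date_id),
    amount   INTEGER NOT NULL
);
INSERT INTO store VALUES (1, 'Toronto'), (2, 'Ottawa');
INSERT INTO item VALUES (1, 'Shoes'), (2, 'Belts'), (3, 'Hats');
INSERT INTO date_dim VALUES (1, 2004), (2, 2005), (3, 2006);
INSERT INTO sales VALUES
    (1, 1, 1, 100),
    (1, 1, 2, 120),
    (2, 2, 2, 35),
    (2, 1, 3, 80),
    (1, 3, 3, 50);
""")

# A typical star join: filter on dimension attributes, aggregate the fact measure.
query = """
SELECT d.year, SUM(s.amount)
FROM sales s
JOIN item i     ON i.item_id = s.item_id
JOIN date_dim d ON d.date_id = s.date_id
WHERE i.category IN ('Shoes', 'Belts') AND d.year >= 2005
GROUP BY d.year
ORDER BY d.year
"""
sales_by_year = conn.execute(query).fetchall()
print(sales_by_year)
```

Every dimension is one join away from the fact table, which keeps queries of this shape simple to write and to optimize.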
  18. Learning the lingo – part 3
     • Snowflake schema
       – Fact table: SALES
       – Dimension tables: STORE, ITEM, PROMOTION, DATE, CUSTOMER
       – Additional normalized dimension tables: C_ADDRESS, S_ADDRESS, COUNTRY, DEMOGRAPHIC
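A snowflake schema normalizes the dimensions themselves, so some attributes sit more than one join away from the fact table. The sketch below again uses `sqlite3` as a stand-in. It assumes that STORE references S_ADDRESS and that S_ADDRESS references COUNTRY. The slide names these tables but the transcript does not show how they are linked, so the relationships, columns, and data here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country   (country_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE s_address (address_id INTEGER PRIMARY KEY, city TEXT NOT NULL,
                        country_id INTEGER NOT NULL REFERENCES country(country_id));
CREATE TABLE store     (store_id   INTEGER PRIMARY KEY,
                        address_id INTEGER NOT NULL REFERENCES s_address(address_id));
CREATE TABLE sales     (store_id   INTEGER NOT NULL REFERENCES store(store_id),
                        amount     INTEGER NOT NULL);
INSERT INTO country VALUES (1, 'Canada'), (2, 'Germany');
INSERT INTO s_address VALUES (10, 'Toronto', 1), (11, 'Ottawa', 1), (12, 'Heilbronn', 2);
INSERT INTO store VALUES (1, 10), (2, 11), (3, 12);
INSERT INTO sales VALUES (1, 120), (2, 80), (3, 200), (1, 35);
""")

# Reaching the country name takes three joins instead of one.
sales_by_country = conn.execute("""
SELECT c.name, SUM(f.amount)
FROM sales f
JOIN store s     ON s.store_id = f.store_id
JOIN s_address a ON a.address_id = s.address_id
JOIN country c   ON c.country_id = a.country_id
GROUP BY c.name
ORDER BY c.name
""").fetchall()
print(sales_by_country)
```

The trade-off against the star layout is less redundancy in the dimensions in exchange for longer join chains.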
  19. Optimal attributes of the physical data model
     • Dimension tables with a primary key to guarantee uniqueness
     • Columns defined as NOT NULL if they should have data
     • Standard data types across the warehouse to facilitate joins and the logical model in business intelligence tooling
     • Informational referential constraints between each fact and dimension pair (i.e., a foreign key between a fact table column and the dimension primary key)
       – Aid query optimization without the insert/update/delete overhead of enforced constraints
     • Indexes on foreign key columns
     • Integer data type for key columns, where possible
     • DATE data type for date columns, where possible
     • Leverage materialized query tables (MQTs) for common “slices” of data
       – Pre-compute the aggregation once, and the optimizer will reuse it where possible
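The idea behind pre-computing a common slice can be sketched with an ordinary summary table. This is only a stand-in: SQLite has no materialized query tables and will not reroute queries automatically, so the second query below names the summary table explicitly, whereas the slide describes the DB2 optimizer doing that reuse on its own. Table names, columns, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (store_id INTEGER NOT NULL, year INTEGER NOT NULL, amount INTEGER NOT NULL);
INSERT INTO sales VALUES
    (1, 2005, 120), (1, 2005, 35), (2, 2005, 80), (1, 2006, 50), (2, 2006, 70);
-- Pre-compute the aggregation once (a plain table standing in for an MQT).
CREATE TABLE sales_by_store_year AS
    SELECT store_id, year, SUM(amount) AS total_amount, COUNT(*) AS sale_count
    FROM sales
    GROUP BY store_id, year;
""")

# The same business question answered from the base table and from the summary.
from_base = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
from_summary = conn.execute(
    "SELECT year, SUM(total_amount) FROM sales_by_store_year GROUP BY year ORDER BY year"
).fetchall()
summary_rows = conn.execute("SELECT COUNT(*) FROM sales_by_store_year").fetchone()[0]
```

Both queries return the same totals, but the summary query reads one row per store and year rather than one row per sale, which is where the saving comes from on large fact tables.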
  20. IBM Provides More Than Just a Database
     InfoSphere Warehouse provides extended capabilities and value
     • Traditional warehouse
       – Embedded analytics
       – Multidimensional analysis
       – Data mining and visualization
     • Beyond traditional structured data
       – Generate and leverage knowledge from unstructured information
     • OLTP: benefits of a transactional data server foundation
       – Optimized for real-time access
       – High availability and reliability
       – Scalable, secure and auditable
     • DW: dedicated warehousing
       – Shared-nothing architecture
       – Advanced data partitioning
       – Extreme workload management
     • InfoSphere Warehouse: Best of Both Worlds Architecture
     Diagram axis labels (truncated in the source): DataVolumes, Unstructu, Structur
  21. Predictable Scaling
     • Double the data, double system resources
       – Each partition processes the same amount of data as before
         • Response times and throughput will remain constant
     • Double the system resources, same data
       – Each partition processes ½ the amount of data as before
         • Response times will be 2x faster, and throughput will double
     • Keep system resources constant, double the data
       – Each partition processes double the amount of data as before
         • Response times should double, and throughput will be cut in half
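The three scenarios above follow from a simple linear model in which response time tracks the amount of data each partition has to process. A minimal sketch of that arithmetic follows. The function names and the baseline figures (60 seconds, 100 queries per hour) are invented for illustration, and real systems will deviate from the ideal.

```python
def scaled_response_time(base_seconds, data_factor, resource_factor):
    """Idealized model: response time is proportional to data per partition."""
    return base_seconds * data_factor / resource_factor


def scaled_throughput(base_queries_per_hour, data_factor, resource_factor):
    """Idealized model: throughput is inversely proportional to response time."""
    return base_queries_per_hour * resource_factor / data_factor


BASE_SECONDS = 60.0          # hypothetical baseline response time
BASE_QUERIES_PER_HOUR = 100  # hypothetical baseline throughput

scenarios = {
    "double data, double resources": (2, 2),
    "same data, double resources": (1, 2),
    "double data, same resources": (2, 1),
}

for label, (data_factor, resource_factor) in scenarios.items():
    seconds = scaled_response_time(BASE_SECONDS, data_factor, resource_factor)
    per_hour = scaled_throughput(BASE_QUERIES_PER_HOUR, data_factor, resource_factor)
    print(f"{label}: {seconds:.0f} s per query, {per_hour:.0f} queries/hour")
```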
  22. DB2 Shared Nothing Architecture for Scalability
     Shared Nothing DB2 Database via Hash Partitions
     • Database is divided into multiple partitions
     • Partitions run on different servers
     • Each partition has balanced resources
     • Parallel processing occurs on all partitions and is coordinated by the DBMS
     • Single database system image to user, DBA and application
     Diagram labels: a “select … from table” statement against the database; tables spread over Partition 1, Partition 2, Partition 3, … Partition n, each with its own engine and data+log; partitions connected by the FCM network
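A toy model of the shared-nothing idea: rows are assigned to partitions by hashing a key, each partition aggregates only its own rows, and a coordinator combines the partial results. This is a sketch under stated assumptions, not DB2's implementation. CRC32 stands in for whatever hash function the database uses, the partition count and data are invented, and threads stand in for separate servers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key):
    """Pick a partition from a stable hash of the distribution key (CRC32 as a stand-in)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_PARTITIONS


# Hypothetical table of (customer_id, amount) rows.
rows = [(customer_id, customer_id % 7 + 1) for customer_id in range(1000)]

# Each partition owns a disjoint subset of the rows.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for row in rows:
    partitions[partition_for(row[0])].append(row)


def local_sum(partition_rows):
    """Work done independently on one partition, touching only its own data."""
    return sum(amount for _, amount in partition_rows)


def coordinator_sum():
    """The coordinator fans the query out and combines the partial results."""
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        return sum(pool.map(local_sum, partitions))


total = coordinator_sum()
```

The caller sees a single total, which mirrors the "single database system image" bullet: the partitioning is invisible from the outside.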
  23. Multi-Core Parallelism
     • Challenge
       – New hardware is constantly providing more cores
       – 85% of computing resources often sit idle
     • MCP (a.k.a. SMP)
       – Divide work among subagents to parallelize query execution
       – Leverage the full CPU resources of a multi-core environment
     • Enhanced and more flexible in DB2 10
       – Degree of parallelism controlled through WLM
       – Better built-in runtime decision on the degree of parallelism when ANY is specified
       – New REBAL plan operator rebalances work among subagents
       – Latch contention alleviated or eliminated
       – Parallel scans on partitioned indexes
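The idea of dividing one query's work among subagents can be sketched with a shared work queue: each subagent pulls the next chunk when it is free, so faster subagents naturally take on more chunks. This is an illustration only and not a description of how DB2 or its REBAL operator works. The function name, the `degree` and `chunk_size` parameters, and the data are hypothetical, and CPython threads show the division of work rather than true multi-core speedup.

```python
import queue
import threading


def parallel_count(rows, predicate, degree=4, chunk_size=100):
    """Count matching rows using `degree` subagents that pull chunks from a shared queue."""
    work = queue.Queue()
    for start in range(0, len(rows), chunk_size):
        work.put(rows[start:start + chunk_size])

    partial_counts = []
    lock = threading.Lock()

    def subagent():
        local = 0
        while True:
            try:
                chunk = work.get_nowait()
            except queue.Empty:
                break  # no chunks left, this subagent is done
            local += sum(1 for row in chunk if predicate(row))
        with lock:
            partial_counts.append(local)

    threads = [threading.Thread(target=subagent) for _ in range(degree)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Combine the partial results from every subagent.
    return sum(partial_counts)


rows = list(range(1000))
matches = parallel_count(rows, lambda value: value % 3 == 0)
```

Setting `degree=1` gives the serial result, which makes it easy to check that the parallel plan returns the same answer.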
  24. InfoSphere Warehouse 10
     Improved Cost Efficiencies · Higher Performance · Increased Team Productivity
     • Adaptive Data Compression provides on average 30% improvement (up to 75%) over existing Deep Compression
     • Multi-Temperature Storage enables the optimization of cost-efficient data storage
     • Built-in Time Travel Query enables faster historical and trend analytical queries
     • Row and column access controls support multi-tenant operational warehouses
     • Star schema optimization delivers quicker response times: 3x performance on BI workloads
     • Continuous Ingest optimizes loading of data, leading to faster decisions
     InfoSphere Warehouse 10: Real-time operational analytics empowering organizations to make active, timely and informed decisions as business events occur
     Operational Analytics: Analytics over a large volume of data combined with high-scale operational access to the data and insights, delivering real-time insights to improve each business decision
  25. Multiple Instances of at least 3x Faster Query Performance*
     Increase Ability to Meet SLAs; Postpone Hardware Upgrades
     • Multi-core parallelism enhancements
     • Performance improvements for:
       – Queries over star schemas
       – Queries with joins and sorts
       – Queries with aggregation
       – Hash joins
     • Higher performance
       – Up to 35% faster out-of-the-box performance
       – Multiple instances of at least 3x faster when using new features*
     • Lower costs
       – Postpone hardware upgrades
     * Based on both external tests by partners, as well as internal tests of IBM DB2 9.7 FP3 vs. DB2 10.1 with new compression features on P6-550 systems with comparable specifications using data warehouse / decision support workloads, as of 29 Mar 2012.
     “IBM and Intel® have collaborated over a decade to optimize DB2 performance with Intel® Parallel Studio 2011, software development suite on Intel® Xeon® processors. We are excited to see a ~10x improvement in query processing performance using DB2 10 over the previous DB2 version, running on IBM System x3850 using Intel® Xeon® Processor E7. Customers can now realize dramatically greater performance boost at lower cost per query running IBM DB2 10 on servers powered by Intel® Xeon® processors.” (Pauline Nist, GM Software Strategies, Intel’s Datacenter & Connected Systems Group)
  26. Breakthrough Savings with Adaptive Compression
     Lower Storage Costs; Lower Administration Costs
     Timeline: DB2 9.1 Table Compression; DB2 9.7 Temp Space & Index Compression; DB2 10 Adaptive Compression
     • Adaptively apply both table-level compression and page-level compression
     • Maintain high compression ratios over time without table reorganization
     • Compress archive logs
  27. Adaptive Compression Shrinks your Data Storage Needs
     “Our migration from Oracle Database to DB2 resulted in a 40% storage savings. Upgrading to DB2 9.7 and index compression brought our average savings to 57%. Now adaptive compression brings our average savings to 77%, dramatic savings!” (Andrew Juarez, Lead SAP Basis / DBA, Coca Cola Bottling Company)
     “Page-level dynamic compression is one of the new DB2 features that will reduce planned outages and increase storage savings by up to 2X over DB2 9.7.” (Jessica Tatiana Flores Montiel, DAFROS Multiservicios)
     • Higher performance
       – Faster queries for I/O-bound environments
       – Faster backups
     • Lower costs
       – Postpone upcoming storage purchases
       – Lower ongoing storage needs
       – Easier administration with reduced need for table re-orgs
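To make the quoted savings figures concrete, here is a small worked calculation. It assumes all three percentages (40%, 57%, 77%) are measured against the same original footprint, which the quote does not state explicitly, and it uses a hypothetical 100 TB baseline.

```python
def remaining_storage(original_tb, savings_percent):
    """Storage still needed after a given percentage saving off the original size."""
    return original_tb * (100 - savings_percent) / 100


BASELINE_TB = 100  # hypothetical original footprint

after_migration = remaining_storage(BASELINE_TB, 40)          # 60.0 TB
after_index_compression = remaining_storage(BASELINE_TB, 57)  # 43.0 TB
after_adaptive = remaining_storage(BASELINE_TB, 77)           # 23.0 TB

# Relative reduction contributed by the last step alone, as a percentage.
last_step_reduction = round(
    (after_index_compression - after_adaptive) / after_index_compression * 100, 1
)
print(after_migration, after_index_compression, after_adaptive, last_step_reduction)
```

Under that assumption, moving from 57% to 77% savings shrinks the remaining footprint from 43 TB to 23 TB, a reduction of roughly 46.5% relative to the previous step.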
  28. Workload Manager
     • Identification and control of applications
       – Enabling Enterprise Data Warehouse
     • Direct control of the execution environment
       – Tight integration with AIX WLM
     • Detection and control of “rogue” queries
       – Prevent bad queries from executing
     • Query concurrency
       – Optimize query throughput
     • Advanced monitoring
       – Real-time monitoring of query execution
  29. Workload Manager Example
     Diagram labels:
     • InfoSphere Warehouse: User Requests, System Requests
     • Marketingapps, Marketingmgrs, Default Workload
     • Marketing Managers, Default User Class, Default System Class
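The identification and concurrency-control ideas from the Workload Manager slides can be sketched as a classifier that maps an incoming connection to a service class, plus a per-class concurrency limit. This is a conceptual sketch only and does not use DB2 WLM syntax or APIs. The class names are borrowed from the diagram labels, while the classification rule, the `group` attribute, and the limits are invented.

```python
from dataclasses import dataclass


@dataclass
class ServiceClass:
    name: str
    max_concurrent: int  # hypothetical concurrency threshold
    running: int = 0

    def try_admit(self):
        """Admit a query only if the class is below its concurrency limit."""
        if self.running >= self.max_concurrent:
            return False
        self.running += 1
        return True

    def finish(self):
        """Release a slot when a query completes."""
        self.running -= 1


service_classes = {
    "Marketing Managers": ServiceClass("Marketing Managers", max_concurrent=2),
    "Default User Class": ServiceClass("Default User Class", max_concurrent=5),
}


def classify(connection):
    """Map connection attributes to a service class (rule invented for illustration)."""
    if connection.get("group") == "Marketingmgrs":
        return service_classes["Marketing Managers"]
    return service_classes["Default User Class"]


managers = classify({"user": "alice", "group": "Marketingmgrs"})
admitted = [managers.try_admit() for _ in range(3)]  # third request exceeds the limit
managers.finish()                                    # one query completes
admitted_after_finish = managers.try_admit()
```

In this toy model a rejected request would simply be queued or retried by the caller; what a real workload manager does at a threshold is a configuration choice and is not modeled here.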
  30. Continuous Data Ingest
     • Continuous Data Ingest: new feature enabled in the underlying database
     • Optimized, continuous loading of data, not just periodically
     • Reduced downtime and up-to-date data help feed the EDW
     • Leads to faster, accurate tactical decision making
     “Replacing DB2 Import with Continuous Data Ingest we reduced data ingest time by 94% for a table with 1.7GB of data.” (Chunguang Yuan, China MinSheng Banking Corp.)
  31. Continuous Feeding of Data
     Diagram comparison:
     • Traditional data warehouses: Data Sources → ETL (Extract, Transform, Load) → DW. Frequency: daily loads, slower loads
     • InfoSphere Warehouse 10: Data Sources → ETL + CDI → DW. Frequency: up-to-the-second loads, faster loads
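The contrast between periodic loads and a continuous feed can be sketched as a loop that inserts records as they arrive and commits in small batches, so new rows become visible shortly after they are produced. This uses `sqlite3` as a stand-in and is not the DB2 ingest feature itself. The `continuous_ingest` function, the `commit_every` parameter, the table, and the data are all hypothetical.

```python
import sqlite3


def continuous_ingest(conn, record_stream, commit_every=2):
    """Insert records as they arrive, committing in small batches (a trickle feed)."""
    pending = 0
    for record in record_stream:
        conn.execute("INSERT INTO sales (store_id, amount) VALUES (?, ?)", record)
        pending += 1
        if pending >= commit_every:
            conn.commit()  # rows become visible to other readers here
            pending = 0
    conn.commit()  # flush whatever is left when the stream ends


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER NOT NULL, amount INTEGER NOT NULL)")

# Hypothetical feed of (store_id, amount) records arriving over time.
feed = iter([(1, 120), (2, 80), (1, 35), (2, 70), (1, 50)])
continuous_ingest(conn, feed)

row_count, total = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
```

A smaller `commit_every` lowers data latency at the cost of more commits, which is the basic tuning trade-off in any trickle-feed design.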
  32. Use Case #1 - Fiserv
     • Learn more at http://www-01.ibm.com/software/success/cssdb.nsf/CS/JHUN-8UAPHM?OpenDocument&Site=default&cty=en_us
     • Fiserv offers technology solutions for > 16,000 financial institutions worldwide
       – ~ 20 billion data transactions/year
     • Business need:
       – Understand customers
       – Reduce customer attrition
       – Increase customer adoption of new products
     • Solution: InfoSphere Warehouse with IBM Banking Data Warehouse Model and more...
       – Turn billions of transactions into actionable insights that help banks better target offers and maximize their marketing dollars
       – Use cloud technologies to consolidate and virtualize servers (reduce cost & increase availability)
       – Five-year savings estimated at US$8 million
  33. Use Case #2 - beyerdynamic
     • Learn more at http://www-01.ibm.com/software/success/cssdb.nsf/CS/STRD-933L98?OpenDocument&Site=default&cty=en_us
     • beyerdynamic is a leading producer of professional audio electronics products
     • Business need:
       – Audio electronics market is booming with high demand for new technologies
       – Seek to increase profitability and growth by targeting global sales more precisely
     • Solution: Business Intelligence platform of ISW and Cognos Express
       – Gather, process, and analyze corporate data to provide new insight into sales and marketing strategy with sound evidence
       – Daily insights on market trends and demand levels help expand the strategic vision for sales management
  34. Data Warehouse of the future
     • Big Data, Big Data, Big Data…
     • Do more
       – Increase in volume and velocity of data
       – Increase in complexity of analytics
     • With less
       – Reduce storage requirements for increase in data
       – Reduce system requirements for optimized performance
       – Consolidate workload on fewer servers
     • Faster
       – Need for ‘speed of thought’ analysis, regardless of data volumes
       – True ‘real-time’ access in the warehouse
       – Shorten time to value in deploying new warehouses
  35. IBM Big Data Strategy - Move Analytics Closer to the Data
     InfoSphere Warehouse 10 is a Foundational Element of our Big Data Story
     • New analytic applications drive the requirements for a big data platform
       – Integrate and manage the full variety, velocity and volume of data
       – Apply advanced analytics to information in its native form with the right type of system
     • InfoSphere Warehouse is THE:
       – Platform for custom-built warehouse solutions
       – Platform for real-time operational warehouses
       – Technology under IBM Smart Analytics Systems
     • InfoSphere Warehouse 10
       – Real-time, operational data warehousing
       – Massively Parallel Processing (MPP)
       – Continual Data Ingest for Real-time Analysis
       – Integration with Hadoop systems
       – In-database mining & Hybrid OLAP
       – Native XML & relational data warehouse
     Diagram labels:
     • Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
     • IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance
  36. Gartner Strategic Technology Trends for 2013
     • Taken from http://www.gartner.com/technology/research/top-10-technology-trends/
     • What could affect data warehousing?
       – Strategic Big Data
       – Actionable Analytics
         • Deliver analytics to users at the point of action and in context
       – Mainstream In-Memory Computing (IMC)
         • Optimize performance, near real-time services, leverage the cloud
       – Integrated Ecosystems
         • Appliances, cloud-based solutions
  37. Top 9 Data Warehousing Trends of the Future
     • Taken from: http://www.melissadata.com/enews/dataadvisor/articles/032011/3.htm
     • Optimization and performance
     • Data warehouse appliances
     • The intensive POC
     • Data warehouse mixed workloads
     • Resurgence of data marts
     • Column-store DBMSs
     • In-Memory DBMSs
     • Data warehouse as a service and cloud
     • Using an Open-Source DBMS to deploy the data warehouse
  38. DB2 Tech Talk: Real World Data Warehousing Next Steps Roadmap
     Step labels on the slide: Step One, Step Two, Step Three, Step Four
     • Learn the Best Practices
       – Physical database design for warehouse environments: ibm.co/11E3ace
       – Implementing DB2 Workload Management in a Data Warehouse: ibm.co/ZQRMcW
     • Visit the DB2 10 Information Center
       – Warehousing and analytics chapter: bit.ly/PwPs5D
     • Listen to The Data Warehousing Institute – IBM Webcast
       – Analytic Workloads: Which Data Warehousing Architecture is Right for You? ibm.co/XCjnvA
     • Download the software
       – IBM DB2 10 fully functioning trial: bit.ly/db2dnld
       – InfoSphere Warehouse trial software: ibm.co/XCQC4X
     • Call IBM to schedule a demo or learn more
       – 1 800 966-9875 (U.S)
       – 1-888-746-7426 (Canada)
       – 1800-425-3333 (India)
       – Or visit http://www.ibm.com/planetwide/ for contact information worldwide
     • Reference
       – IBM Data Warehousing Products: Ibm.co/datamgt
       – Tech forum on developerWorks: ibm.co/12hi0G9
       – SQL Tips for DB2 Blog: www.sqltips4db2.com
  39. Upcoming Tech Talks and other Events
     Don’t miss these in-depth DB2 and related product talks! Dates and topics subject to change and modification.
     • Next DB2 Tech Talk
       – Please check www.idug-db2.com on April 4th for registration details
     • IDUG DB2 Tech Conference North America
       – In-person conference sponsored by IDUG
       – Orlando, Florida
       – April 29 – May 3, 2013
       – Agenda and registration: www.idug.org, select events
     • No Charge DB2 for Linux, UNIX and Windows Tech Workshop at IDUG Tech Conference, 8AM to 2PM
       – Presenters: Chris Eaton, Danny Arnold and Eric Alton
       – Yes, you can attend just the workshop
       – Registration: bit.ly/DB2LUW-IDUG13
     How to register: DB2 Tech Talks web site
  41. Questions
     Listening in replay? Questions: www.sqltips4db2.com (click “submit a question”).
  42. Thanks for attending!
     • Please rate the session
     • Presentation download: bitly.com/ttfilenov, or click Attachments in this webcast environment