Informix Warehouse &  Informix Warehouse Accelerator Overview  <ul><li>Scripted for Tech Sales audience </li></ul><ul><li>...
Disclaimer <ul><ul><li>© Copyright IBM Corporation 2011. All rights reserved. </li></ul></ul><ul><ul><li>U.S. Government U...
Agenda <ul><li>Data Warehouse Industry Trends </li></ul><ul><li>Data Warehousing on Informix  </li></ul><ul><ul><li>Histor...
Data Warehousing  Industry Trends
State of Data Warehousing in 2011 <ul><li>DBMS Market in 2011: </li></ul><ul><li>DBMS market at the close of 2009 was appr...
State of Data Warehousing, Cont’d <ul><li>Market Dynamics for 2011 </li></ul><ul><li>Today, smaller data warehouses, those...
State of Data Warehouse, Cont’d  <ul><li>A Glimpse Into the Future </li></ul><ul><li>Vendor solutions began to focus even ...
Data Warehouse Trends for the CIO, 2011-2012 <ul><li>Data Warehouse Appliances: </li></ul><ul><li>DW appliances are not a ...
Informix Warehouse History <ul><li>Informix has 3 Database Products: </li></ul><ul><li>XPS for MPP Data Warehousing </li><...
Existing IDS Warehousing Features <ul><li>Performance & Scalability </li></ul><ul><ul><li>Inherent SMP Multi-threading </l...
Informix Warehousing Moving Forward <ul><li>Goal is to provide a comprehensive warehousing platform that is highly competi...
Informix Warehouse Roadmap  <ul><li>Informix Warehouse Feature </li></ul><ul><ul><li>SQW </li></ul></ul><ul><ul><li>Data M...
Informix Warehouse 11.70 Features
Typical Data Warehouse Architecture
Source: Forrester Query Tools Analytics BPS Apps BI Apps LOB apps Databases Other transactional data sources I/O & data lo...
Informix Warehouse Tooling - SQW Execution DEPLOY Deployment  preparation Deploy RUNTIME HTTP service ( WAS  ) SQW Runtime...
SQW: Design Studio <ul><li>Design Studio </li></ul><ul><ul><li>Eclipse based IDE </li></ul></ul><ul><ul><ul><li>Integrated...
SQW: Data Modeling <ul><li>Physical Data Model </li></ul><ul><ul><li>Visualized data modeling </li></ul></ul><ul><ul><li>I...
SQW: Data Flows <ul><li>Data Flow Operators: </li></ul><ul><li>Source & target operators (table, file) </li></ul><ul><li>S...
SQW: Data Flows A simple flow <ul><li>Generated SQL code </li></ul><ul><li>Optimization across SQL statements.  </li></ul>...
SQW: Control Flows <ul><li>Control flow </li></ul><ul><li>Common utility operators </li></ul><ul><li>Control logic, parall...
SQW Overview Design Studio Eclipse Based Design Environment Admin Console Production Environment in Websphere deploy <ul><...
Admin Console <ul><li>Flex RIA based Warehouse Admin Console </li></ul><ul><li>Admin Console manages common resources (e.g...
Informix 11.70 Feature: Warehouse Time-Cyclic Data Management <ul><li>Time-cyclic data management (roll-on, roll-off) </li...
Interval Fragmentation <ul><li>Fragments data based on an interval value </li></ul><ul><ul><li>E.g. fragment for every mon...
Informix 11.70 Feature:  Multi-Index Scan <ul><li>Make use of all available indices </li></ul><ul><li>Use set operations t...
Multi-Index Scan – An Example <ul><li>Handling common Data Warehouse queries more efficiently </li></ul><ul><li>Large dime...
Multi-Index Scan Example <ul><li>Method #1: </li></ul><ul><ul><li>Evaluates the most selective constraint </li></ul></ul><...
Multi-Index Scan Example <ul><li>Method #2 </li></ul><ul><ul><li>Evaluate each constraint by using a different B-tree inde...
Informix 11.70 Feature: Push Down Hash Join <ul><li>First, a standard Hash Join for typical warehousing queries involving ...
Typical Star Schema: An Example <ul><li>Large Central “Fact” table </li></ul><ul><li>Smaller “Dimension” tables </li></ul>...
Prior to 11.70: Standard Left Deep Tree Solution  Scan D1 1K 1M Problem Join Second Join Build Too Large Scan F Hash Join ...
11.70 Feature: Pushdown Hash-Join Solution Scan F Scan D1 1K Scan D3 Scan D2 1K Join Keys Multi Index Scan of Fact Table u...
Informix Warehouse Accelerator (IWA)
Agenda <ul><li>3 rd  Generation Data Base Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA...
Third Generation of Database Technology <ul><li>According to IDC’s Article (Carl Olofson) – Feb. 2010 </li></ul><ul><li>1 ...
Example of 2nd Generation Database Disk I/O Issue
How Oracle/Exadata Solves That Problem: Add an I/O Layer
Sun Oracle Database Machine Full Rack <ul><li>Each Exadata cell is a self-contained server which houses disk storage and r...
Cost of Oracle/Exadata Solution <ul><li>Database Machine price – Full Rack </li></ul><ul><ul><li>$1,115,000    Hardware (s...
Agenda <ul><li>3 rd  Generation Data Base Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA...
Informix Warehouse Accelerator   3 rd  Generation Database Technology is Here <ul><li>How is it different? </li></ul><ul><...
Breakthrough technologies for performance Row & Columnar Database Row format within IDS for transactional workloads and co...
TCP/IP Informix Warehouse Accelerator Configuration <ul><li>IDS:   </li></ul><ul><li>Routes SQL queries to accelerator </l...
Informix Warehouse Accelerator Overview Coordinator  Process  Orchestrating the distributed tasks like Load or Query execu...
Target Market:  Business Intelligence (BI) <ul><li>Characterized by: </li></ul><ul><ul><li>“ Star” or “snowflake” schema: ...
What IWA is Designed For <ul><li>Selective, fast scans over large (fact) tables </li></ul><ul><li>Joins with smaller Dimen...
Case Study #1: Major U.S. Shoe Retailer  <ul><li>Top 7 time-consuming queries in Retail BI and Warehouse:  (Against 1 Bill...
Case Study #2: Datamart at a Government Agency <ul><li>Microstrategy report was run, which generates  </li></ul><ul><ul><l...
Case Study #3: U.S. Government Agency             15800.89% Fact Table Scan 0:00:41 1:48:58 Summarize all transactions by ...
Agenda <ul><li>3 rd  Generation Data Base Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA...
Row Oriented Data Store Each row stored sequentially   <ul><li>Optimized for record I/O  </li></ul><ul><li>Fetch and decom...
Columnar Data Store  Data is stored sequentially by column If attributes are not required for a specific query execution, ...
Compression: Frequency Partitioning Top 64  traded goods  – 6 bit code Rest Prod Origin Trade Info (volume, product,    or...
Compression Process: Step 1 Male/John Input tuple Column 1 Column 2 Co-code transform Type specific transform Column 1 & 2...
Compression Process: Step 2 First tuple code Tuplecode — Sorted Tuplecodes 1 Previous Tuplecode Delta Huffman Encode Delta...
Data is Processed in Compressed Format <ul><li>Within a  Register – Store , several columns are grouped together. </li></u...
Register Stores Facilitate SIMD Parallelism <ul><li>Access only the banks referenced in the query (like a column store): <...
Simultaneous Evaluation of Equality Predicates State==‘CA’ && Quarter == ‘Q4’ State==01001 && Quarter==1110 Translate valu...
Agenda <ul><li>3 rd  Generation Data Base Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA...
Defining, What Data to Accelerate <ul><li>A MART is a logical collection of tables which are related to each other. For ex...
IWA Design Studio
Distributing data from IDS  (Fact tables) Data Fragment Fact Table UNLOAD UNLOAD UNLOAD UNLOAD IDS Stored Procedures Copy ...
Distributing data from IDS  (Dimension tables) IDS UNLOAD UNLOAD UNLOAD UNLOAD IDS Stored Procedure All dimension tables a...
Mapping Data from IDS to IWA Inside IWA Inside IDS Data Fragment Data Fragment Data Fragment Data Fragment Data Fragment D...
Agenda <ul><li>3 rd  Generation Data Base Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA...
IWA Referenced Hardware Configuration Options: 300 GB SAS hard disk drives each 6 disks 512G Memory  X7560 @ 2.27GH 4 X 8 ...
IWA Software Components <ul><li>Linux on Intel x86_64 (RHEL 5 or SUSE SLES 11) </li></ul><ul><li>IDS 11.70 + IWA code modu...
(Fred Ho – hof@us.ibm.com)
 

Informix warehouse and accelerator overview


Describes recent features in Informix warehouse and the new query acceleration technology.

Published in: Technology

  1. 1. Informix Warehouse & Informix Warehouse Accelerator Overview <ul><li>Scripted for Tech Sales audience </li></ul><ul><li>March 2011 </li></ul>
  2. 2. Disclaimer <ul><ul><li>© Copyright IBM Corporation 2011. All rights reserved. </li></ul></ul><ul><ul><li>U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. </li></ul></ul><ul><ul><li>THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.  WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.  IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE. </li></ul></ul><ul><li>IBM, the IBM logo, ibm.com , Cognos, SPSS and Informix are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml </li></ul><ul><li>Other company, product, or service names may be trademarks or service marks of others. </li></ul>
  3. 3. Agenda <ul><li>Data Warehouse Industry Trends </li></ul><ul><li>Data Warehousing on Informix </li></ul><ul><ul><li>History & Roadmap </li></ul></ul><ul><li>Informix Data Warehouse </li></ul><ul><ul><li>Informix Warehouse Tooling – ETL </li></ul></ul><ul><ul><li>IDS 11.70 Server Features </li></ul></ul><ul><li>Informix Warehouse Accelerator </li></ul><ul><li>Q&A </li></ul>
  4. 4. Data Warehousing Industry Trends
  5. 5. State of Data Warehousing in 2011 <ul><li>DBMS Market in 2011: </li></ul><ul><li>DBMS market at the close of 2009 was approximately $21.2 billion (2010 data not yet available) </li></ul><ul><li>Data Warehouse DBMS market was approximately 35% of the DBMS market or $7.42 billion </li></ul><ul><li>Key Findings: </li></ul><ul><li>Data warehouse DBMSs have evolved to a broader analytics infrastructure supporting operational analytics, corporate performance management and other new applications and uses. </li></ul><ul><li>Cost is driving interest in alternative architectures but performance optimization is driving multi-tiered data architectures and a variety of deployment options - notably a strong interest in in-memory data mart deployments. </li></ul>
  6. 6. State of Data Warehousing, Cont’d <ul><li>Market Dynamics for 2011 </li></ul><ul><li>Today, smaller data warehouses (those with less than 5 TB of source system extracted data, or SSED) are the only &quot;data warehouse&quot; for the entire organization and commonly meet the organization's analytic needs. Gartner estimates that between 70% and 75% of all systems referred to as EDWs actually serve a single business department. </li></ul><ul><li>Analysis: </li></ul><ul><li>Optimization techniques such as summaries, aggregates and indexes are simply the result of performance restrictions inherent to normalized data and the way the RDBMS manages rows and columns. </li></ul>
  7. 7. State of Data Warehouse, Cont’d <ul><li>A Glimpse Into the Future </li></ul><ul><li>Vendor solutions began to focus even more on the ability to isolate and prioritize workload types, including strategies for dual warehouse deployments and mixing OLTP and OLAP on the same platform. </li></ul><ul><li>In-memory DBMS solutions provide a technology which enables OLTP/OLAP combined solutions. Organizations should increase their emphasis on financial viability during 2011 and even into 2012, as well as aligning their analytics strategies with vendor road maps when choosing a solution. </li></ul>
  8. 8. Data Warehouse Trends for the CIO, 2011-2012 <ul><li>Data Warehouse Appliances: </li></ul><ul><li>DW appliances are not a new concept. Most vendors have developed an appliance offering or promote certified configurations. Main reason for consideration is simplicity. </li></ul><ul><li>The Resurgence of Data Marts: </li></ul><ul><li>Data marts can be used to optimize DW by offloading part of the workload, returning greater performance to the warehousing environment </li></ul><ul><li>Column-Store DBMSs </li></ul><ul><li>CIOs should be aware that their current DBMS vendor may offer a column-store solution. Don’t just buy a column-store-only DBMS because a column store was recommended by your team. </li></ul><ul><li>In-Memory DBMSs </li></ul><ul><li>IMDBMS technology also introduces a higher probability that analytics and transactional systems can share the same database. </li></ul>
  9. 9. Informix Warehouse History <ul><li>Informix has 3 Database Products: </li></ul><ul><li>XPS for MPP Data Warehousing </li></ul><ul><li>Red Brick for Star Schema data marts/data warehousing </li></ul><ul><li>Informix Dynamic Server (IDS) for OLTP & (now) Data Warehousing </li></ul>
  10. 10. Existing IDS Warehousing Features <ul><li>Performance & Scalability </li></ul><ul><ul><li>Inherent SMP Multi-threading </li></ul></ul><ul><ul><li>Parallel Data Query (PDQ) </li></ul></ul><ul><ul><li>Light Scan for fast table scans </li></ul></ul><ul><ul><li>Online Index build </li></ul></ul><ul><ul><li>Efficient Hash Joins </li></ul></ul><ul><ul><li>Auto Fragment Elimination </li></ul></ul><ul><ul><li>Memory Grant Manager (MGM) </li></ul></ul><ul><ul><li>High Performance Loader </li></ul></ul><ul><ul><li>Optimistic Concurrency </li></ul></ul><ul><li>Ease of Management </li></ul><ul><ul><li>Time cyclic data management using Range Partitioning </li></ul></ul><ul><ul><li>Sophisticated Query Optimizer for OLTP and Warehousing </li></ul></ul>
  11. 11. Informix Warehousing Moving Forward <ul><li>Goal is to provide a comprehensive warehousing platform that is highly competitive in the marketplace </li></ul><ul><ul><li>Incorporating the best features of XPS and Red Brick into IDS for OLTP/Warehousing and Mixed-Workload </li></ul></ul><ul><ul><li>Using the latest Informix technology in: </li></ul></ul><ul><ul><ul><li>Continuous Availability and Flexible Grid </li></ul></ul></ul><ul><ul><ul><li>Data Warehouse Accelerator using latest industry technology </li></ul></ul></ul><ul><ul><li>Integration of IBM’s BI software stack </li></ul></ul>
  12. 12. Informix Warehouse Roadmap <ul><li>Informix Warehouse Feature </li></ul><ul><ul><li>SQW </li></ul></ul><ul><ul><li>Data Modeling </li></ul></ul><ul><ul><li>ELT/ETL </li></ul></ul>Informix Warehouse with Storage Optimization/Compression <ul><li>Cognos integration </li></ul><ul><ul><li>- Native Content Store on IDS </li></ul></ul><ul><li>SQL Merge </li></ul>External Tables Star Join Optimization Multi-index Scan New Fragmentation Fragment Level Stats Storage Provisioning Warehouse Accelerator
  13. 13. Informix Warehouse 11.70 Features
  14. 14. Typical Data Warehouse Architecture
  15. 15. 11.70 Warehousing Features (architecture diagram; source: Forrester) <ul><li>Diagram layers: Query Tools, Analytics, BPS Apps, BI Apps, LOB apps; Databases, Other transactional data sources; I/O & data loading; Query processing; DBMS & Storage mgmt </li></ul><ul><li>Data Loading: HPL, DB utilities, ON utilities, DataStage, External Tables, Online attach/detach </li></ul><ul><li>Data & Storage Management: Deep Compression, Interval and List Fragmentation, Online attach/detach, Fragment level stats, Storage provisioning, Table defragmenter </li></ul><ul><li>Query Processing: Light Scans, Merge, Hierarchical Queries, Multi-Index Scan, Skip Scan, Bitmap Technology, Star and Snowflake join optimization, Implicit PDQ, Access performance </li></ul>
  16. 16. Informix Warehouse Tooling - SQW Execution DEPLOY Deployment preparation Deploy RUNTIME HTTP service ( WAS ) SQW Runtime Applications Other Servers (DataStage) DB2 Oracle SQL Server Design Studio Admin Console Deploy Data Source Databases Execution Execution Debug SQW Control DB IDS DESIGN Design Center (Eclipse) Data Flows + Control Flows Deployment package Code Units Build Profile User scripts Warehouse DB IDS SQW Execution DB IDS
  17. 17. SQW: Design Studio <ul><li>Design Studio </li></ul><ul><ul><li>Eclipse based IDE </li></ul></ul><ul><ul><ul><li>Integrated tools, shell sharing </li></ul></ul></ul><ul><ul><li>Team development </li></ul></ul><ul><ul><ul><li>CVS or ClearCase for check-in/check-out of projects and flows </li></ul></ul></ul><ul><li>Data Warehousing Project </li></ul><ul><ul><li>Data Models </li></ul></ul><ul><ul><li>Data Flows </li></ul></ul><ul><ul><li>Control Flows </li></ul></ul><ul><ul><li>Warehouse Applications (deployment packages) </li></ul></ul><ul><ul><li>Subflow & Subprocess (reusable flow module) </li></ul></ul><ul><ul><li>Variables </li></ul></ul><ul><li>Data Source Explorer </li></ul><ul><ul><li>Database connections to multiple vendors, e.g. Informix, DB2 LUW, Oracle, SQL Server, MySQL, DB2 z/OS </li></ul></ul><ul><li>DataStage Servers </li></ul><ul><ul><li>Integration with IBM DataStage </li></ul></ul>
  18. 18. SQW: Data Modeling <ul><li>Physical Data Model </li></ul><ul><ul><li>Visualized data modeling </li></ul></ul><ul><ul><li>Impact analysis </li></ul></ul><ul><ul><li>Reverse engineering or new from scratch </li></ul></ul><ul><ul><li>Compare & sync </li></ul></ul><ul><ul><li>Generate DDL </li></ul></ul><ul><ul><li>Overview diagram </li></ul></ul><ul><li>Shell Sharing with Rational Data Architect & other Data Studio products </li></ul>
  19. 19. SQW: Data Flows <ul><li>Data Flow Operators: </li></ul><ul><li>Source & target operators (table, file) </li></ul><ul><li>SQL Transformation operators </li></ul><ul><li>Warehousing operators </li></ul>File source Table source Table join aggregation Table target
  20. 20. SQW: Data Flows A simple flow <ul><li>Generated SQL code </li></ul><ul><li>Optimization across SQL statements. </li></ul><ul><li>Optimized staging strategy </li></ul><ul><li>In-database transformation </li></ul>
  21. 21. SQW: Control Flows <ul><li>Control flow </li></ul><ul><li>Common utility operators </li></ul><ul><li>Control logic, parallel execution, loop iteration </li></ul><ul><li>Error handling </li></ul>
  22. 22. SQW Overview Design Studio Eclipse Based Design Environment Admin Console Production Environment in WebSphere deploy <ul><li>Application package (zip file) </li></ul><ul><li>Deployment profile: database connections, machine resources, variable definitions, DDL files, etc. </li></ul><ul><li>Generated code </li></ul>create <ul><li>Manage warehouse applications </li></ul><ul><li>Schedule </li></ul><ul><li>Monitor </li></ul>manage
  23. 23. Admin Console <ul><li>Flex RIA based Warehouse Admin Console </li></ul><ul><li>Admin Console manages common resources (e.g. databases connections, ftp servers, DataStage servers) </li></ul><ul><li>Schedule & monitor warehouse processes </li></ul>
  24. 24. Informix 11.70 Feature: Warehouse Time-Cyclic Data Management <ul><li>Time-cyclic data management (roll-on, roll-off) </li></ul><ul><li>Attach and detach online without requiring exclusive lock and access to the table </li></ul><ul><li>Automatically kicks off background process to recollect statistics. </li></ul><ul><li>Interval and List Fragmentation </li></ul><ul><li>Auto Fragment level statistics </li></ul>Figure: table fragments by month (Dec 2010; Jan, Feb, Mar, Apr, May 2011); enables storing data over time
  25. 25. Interval Fragmentation <ul><li>Fragments data based on an interval value </li></ul><ul><ul><li>E.g. fragment for every month or every million customer records </li></ul></ul><ul><li>Tables have an initial set of fragments defined by a range expression </li></ul><ul><li>When a row is inserted that does not fit in the initial range fragments, IDS will automatically create a fragment to hold the row (no DBA intervention) </li></ul><ul><li>No exclusive lock is required for fragment addition </li></ul><ul><li>All the benefits of fragment by expression </li></ul>
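The automatic fragment creation and roll-off described in the two slides above can be modeled in a few lines of Python. This is only a toy sketch of the idea, not Informix syntax or internals: the class name `IntervalFragmentedTable`, its methods, and the monthly interval are all assumptions made for illustration.

```python
from datetime import date


class IntervalFragmentedTable:
    """Toy model: one fragment per calendar month, created on first insert."""

    def __init__(self):
        # (year, month) -> list of rows stored in that fragment
        self.fragments = {}

    def insert(self, row_date, payload):
        key = (row_date.year, row_date.month)
        # If no fragment covers this month yet, create one automatically,
        # mirroring the "no DBA intervention" behaviour described above.
        fragment = self.fragments.setdefault(key, [])
        fragment.append((row_date, payload))
        return key

    def detach(self, year, month):
        """Roll-off: remove a whole month's fragment in one step."""
        return self.fragments.pop((year, month), [])


table = IntervalFragmentedTable()
table.insert(date(2010, 12, 15), "order-1")
table.insert(date(2011, 1, 3), "order-2")
table.insert(date(2011, 1, 20), "order-3")

rolled_off = table.detach(2010, 12)   # drop December 2010 as one unit
print(sorted(table.fragments))        # keys of the fragments that remain
```

Because each month lives in its own fragment, removing old data is a single detach of that fragment rather than a row-by-row delete.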
  26. 26. Informix 11.70 Feature: Multi-Index Scan <ul><li>Make use of all available indices </li></ul><ul><li>Use set operations to apply to all rowids </li></ul><ul><li>Use bitmap operations like union and intersection </li></ul><ul><li>Bitmap can also be used for Skip Scan operations </li></ul>
  27. 27. Multi-Index Scan – An Example <ul><li>Handling common Data Warehouse queries more efficiently </li></ul><ul><li>Large dimension tables, e.g. customer table </li></ul><ul><li>Multiple low-selectivity attributes like gender, age group, zip code, etc. </li></ul><ul><li>Example </li></ul><ul><ul><li>SELECT COUNT(customer_id) </li></ul></ul><ul><ul><li>FROM customer_table </li></ul></ul><ul><ul><li>WHERE gender = 'male' </li></ul></ul><ul><ul><ul><li>AND income_category = 'HIGH' </li></ul></ul></ul><ul><ul><ul><li>AND education_level = 'MASTERS' </li></ul></ul></ul><ul><ul><ul><li>AND zip_code = '95032'; </li></ul></ul></ul>
  28. 28. Multi-Index Scan Example <ul><li>Method #1: </li></ul><ul><ul><li>Evaluate the most selective constraint </li></ul></ul><ul><ul><li>Generate a list of rows that qualify </li></ul></ul><ul><ul><li>Evaluate the remaining constraints for each of those rows, which produces the answer to the query </li></ul></ul>This method retrieves rows based on the most selective constraint, using only the index for that column, and then evaluates each of the other constraints sequentially after retrieval.
  29. 29. Multi-Index Scan Example <ul><li>Method #2 </li></ul><ul><ul><li>Evaluate each constraint by using a different B-tree index on each attribute, which results in a list of qualifying rows for each constraint. </li></ul></ul><ul><ul><li>Merge the lists to form one master list that satisfies all the constraints </li></ul></ul><ul><ul><li>Retrieve the qualifying rows to produce the answers </li></ul></ul>Figure: the index scans for Gender=‘m’, Income_Category=“high”, Education_level=“masters” and Zipcode=‘95032’ are combined with AND into sorted RIDs, which drive a sequential skip scan of the records.
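Method #2 can be sketched in Python as a per-column bitmap intersection. This is a minimal illustration under stated assumptions, not the Informix implementation: the five sample rows, the dictionary-of-bitmaps "index", and the function names are hypothetical, and Python integers stand in for the bitmaps.

```python
# Hypothetical customer rows: (gender, income_category, education_level, zip_code)
customers = [
    ("male", "HIGH", "MASTERS", "95032"),    # row id 0
    ("female", "HIGH", "MASTERS", "95032"),  # row id 1
    ("male", "LOW", "MASTERS", "95032"),     # row id 2
    ("male", "HIGH", "MASTERS", "95032"),    # row id 3
    ("male", "HIGH", "BACHELORS", "10001"),  # row id 4
]
COLUMNS = ("gender", "income_category", "education_level", "zip_code")


def build_index(column_position):
    """Map each value to a bitmap (a Python int) with bit i set for row id i."""
    index = {}
    for row_id, row in enumerate(customers):
        value = row[column_position]
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


# One single-column index per attribute (an assumption for this sketch).
indexes = {name: build_index(pos) for pos, name in enumerate(COLUMNS)}


def multi_index_scan(predicates):
    """AND the per-column bitmaps, then list surviving row ids in sorted order."""
    bitmap = (1 << len(customers)) - 1  # start with every row qualifying
    for column, value in predicates.items():
        bitmap &= indexes[column].get(value, 0)
    return [row_id for row_id in range(len(customers)) if (bitmap >> row_id) & 1]


matching = multi_index_scan({
    "gender": "male",
    "income_category": "HIGH",
    "education_level": "MASTERS",
    "zip_code": "95032",
})
print(len(matching))  # the COUNT the example query would return on this data
```

The row ids come out already sorted, which is what lets the final retrieval step read the table in one forward pass and skip non-qualifying rows.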
  30. 30. Informix 11.70 Feature: Push Down Hash Join <ul><li>First, a standard Hash Join for typical warehousing queries involving a “large” Fact table with multiple dimension tables </li></ul><ul><li>Build Hash Table on Left Input </li></ul><ul><li>Probe with Right Input </li></ul><ul><li>Typically, build on smaller input </li></ul><ul><ul><li>avoids hash table overflow to disk </li></ul></ul>Build Scan Hash Join Build Probe Probe Scan
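The build and probe phases of a standard hash join can be sketched as follows. This is a generic textbook-style illustration in Python, not Informix code; the function name `hash_join`, the dictionary row format, and the sample data are assumptions.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Join two lists of dict rows on build_rows[build_key] == probe_rows[probe_key]."""
    # Build phase: hash the (ideally smaller) input on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: stream the other input and look up matches.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}


# Hypothetical dimension (small, build side) and fact (large, probe side) rows.
dims = [{"pk": 1, "region": "West"}, {"pk": 2, "region": "East"}]
facts = [
    {"fk": 1, "revenue": 10},
    {"fk": 2, "revenue": 5},
    {"fk": 1, "revenue": 7},
    {"fk": 3, "revenue": 1},   # no matching dimension row, so it is dropped
]

joined = list(hash_join(dims, facts, "pk", "fk"))
print(len(joined))
```

Building on the smaller input keeps the hash table small, which is the point the slide makes about avoiding overflow to disk.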
  31. 31. Typical Star Schema: An Example <ul><li>Large Central “Fact” table </li></ul><ul><li>Smaller “Dimension” tables </li></ul><ul><li>Restrictions on Dimension tables </li></ul><ul><ul><li>assume independence </li></ul></ul><ul><li>Small fraction of Fact table in result </li></ul>Figure: Fact (F): 1M rows, sel: 1/1000. Dim (D1), Dim (D2), Dim (D3): 10K rows each, sel: 1/10.
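  32. 32. Prior to 11.70: Standard Left Deep Tree Solution. Figure (left-deep hash join plan): Scan D1 (1K) and Scan F (1M) feed the first hash join (100K); Scan D2 (1K) feeds the second hash join (10K); Scan D3 (1K) feeds the third hash join (1K). Problem: the second join's build is too large.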
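The row counts in the left-deep plan follow from the star schema numbers on the previous slide (a 1M-row fact table, 10K-row dimensions, and a 1/10 restriction on each dimension, assumed independent). The short Python calculation below only reproduces that arithmetic; the variable names are chosen here for illustration.

```python
from fractions import Fraction

fact_rows = 1_000_000
dim_rows = 10_000
dim_selectivity = Fraction(1, 10)   # each dimension restriction keeps 1 row in 10

# Rows surviving the restriction on each dimension table.
filtered_dim_rows = int(dim_rows * dim_selectivity)

# Joining the fact table with one filtered dimension after another:
# under the independence assumption, each join keeps 1/10 of the fact rows.
intermediate_sizes = []
rows = fact_rows
for _dimension in ("D1", "D2", "D3"):
    rows = int(rows * dim_selectivity)
    intermediate_sizes.append(rows)

print(filtered_dim_rows, intermediate_sizes)
```

The first join still produces a 100K-row intermediate result, which is the input the slide flags as too large for the second join, even though the final result is only 1/1000 of the fact table.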
  33. 33. 11.70 Feature: Pushdown Hash-Join Solution. Figure: the join keys from Scan D1, Scan D2 and Scan D3 (1K rows each) are pushed down to Scan F, which becomes a multi-index scan of the fact table using the join keys and single-column indexes. Pushing the join keys down reduces the probe size, so each of the three hash joins handles about 1K rows.
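The pushdown idea can be sketched in Python as "filter the fact table by the surviving dimension keys first, then join". This is a simplified illustration under assumptions, not the optimizer's actual algorithm: the function `pushdown_star_join`, the row format, and the sample data are hypothetical, and a plain list filter stands in for the multi-index scan of the fact table.

```python
def pushdown_star_join(fact_rows, filtered_dims):
    """filtered_dims maps a fact foreign-key column to {pk: dimension_row}."""
    # Push the surviving dimension keys down to the fact table scan so that
    # only fact rows matching every dimension reach the joins.
    reduced = [
        row for row in fact_rows
        if all(row[fk] in dim for fk, dim in filtered_dims.items())
    ]
    # Each remaining join is now a lookup against a small hash table.
    joined = []
    for row in reduced:
        out = dict(row)
        for fk, dim in filtered_dims.items():
            out.update(dim[row[fk]])
        joined.append(out)
    return reduced, joined


# Hypothetical data: dimensions already restricted to their qualifying rows.
fact = [
    {"d1": 1, "d2": 10, "amount": 5},
    {"d1": 2, "d2": 10, "amount": 7},   # d1 = 2 was filtered out of D1
    {"d1": 1, "d2": 20, "amount": 3},   # d2 = 20 was filtered out of D2
]
dims = {
    "d1": {1: {"region": "West"}},
    "d2": {10: {"year": 2009}},
}

reduced, joined = pushdown_star_join(fact, dims)
print(len(reduced), joined)
```

Only one fact row reaches the joins here, which mirrors how pushing the keys down keeps every join input small instead of carrying a large intermediate result through the plan.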
  34. 34. Informix Warehouse Accelerator (IWA)
  35. 35. Agenda <ul><li>3rd Generation Database Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA) </li></ul><ul><ul><li>Target Market </li></ul></ul><ul><ul><li>Beta Customer Experience </li></ul></ul><ul><li>IWA vs. Row/Column/Hybrid Stores </li></ul><ul><li>Loading IWA </li></ul><ul><li>Referenced Hardware & Software Configuration </li></ul>
  36. 36. Third Generation of Database Technology <ul><li>According to IDC’s Article (Carl Olofson) – Feb. 2010 </li></ul><ul><li>1st Generation: </li></ul><ul><li>- Vendor-proprietary databases such as IMS, IDMS, Datacom </li></ul><ul><li>2nd Generation: </li></ul><ul><li>- RDBMS for Open Systems; dependent on disk layout, with limitations in scalability and disk I/O </li></ul><ul><li>- Database tuning by updating statistics, creating/dropping indexes, data partitioning, summary tables & cubes, forcing query plans, resource governing </li></ul><ul><li>3rd Generation: IDC predicts that within 5 years: </li></ul><ul><li>Most data warehouses will be stored in a columnar fashion </li></ul><ul><li>Most OLTP databases will either be augmented by an in-memory database (IMDB) or reside entirely in memory </li></ul><ul><li>Most large-scale database servers will achieve horizontal scalability through clustering </li></ul>
  37. 37. Example of 2nd Generation Database Disk I/O Issue
  38. 38. How Oracle/Exadata Solves That Problem: Add an I/O Layer
  39. 39. Sun Oracle Database Machine Full Rack <ul><li>Each Exadata cell is a self-contained server which houses disk storage and runs the Exadata software </li></ul><ul><li>Databases are deployed across multiple Exadata cells </li></ul><ul><li>Database enhanced to work in cooperation with Exadata intelligent storage </li></ul><ul><li>14 Exadata Storage Cells (Storage Server): 8 cores, 24 GB memory, 12 disks (600 GB/2 TB) each; up to 1.5 GB/sec I/O bandwidth per cell => 21 GB/sec per DB machine </li></ul><ul><li>8 Oracle RAC Database Servers: 8 cores, 72 GB memory each </li></ul><ul><li>InfiniBand Switches/Network: InfiniBand, 16 Gigabit per channel </li></ul>
  40. 40. Cost of Oracle/Exadata Solution <ul><li>Database Machine price – Full Rack </li></ul><ul><ul><li>$1,115,000 Hardware (same price for 600GB or 2TB drives) </li></ul></ul><ul><ul><li>$1,680,000 Oracle Exadata Storage Server software </li></ul></ul><ul><ul><li>$1,520,000 Oracle 11gR2 Enterprise Edition </li></ul></ul><ul><ul><li>$736,000 Oracle Real Application Clusters </li></ul></ul><ul><ul><li>$368,000 Oracle Partitioning </li></ul></ul><ul><ul><li>$368,000 Advanced Compression </li></ul></ul><ul><ul><li>$160,000 Enterprise Manager Diagnostic Pack (recommended) </li></ul></ul><ul><ul><li>$160,000 Enterprise Manager Tuning Pack (recommended) </li></ul></ul><ul><ul><li>$1,098,240 1 st year software support and maintenance </li></ul></ul><ul><ul><li>--------------------------------------------------------------------------------------------------------- </li></ul></ul><ul><ul><li>$7,240,240 Total Price </li></ul></ul><ul><li>Excludes OLAP option, Data Mining option, ETL option </li></ul><ul><li>Installation is extra and requires a custom quote </li></ul>
  41. 41. Agenda <ul><li>3rd Generation Database Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA) </li></ul><ul><ul><li>Target Market </li></ul></ul><ul><ul><li>Beta Customer Experience </li></ul></ul><ul><li>IWA vs. Row/Column/Hybrid Stores </li></ul><ul><li>Loading IWA </li></ul><ul><li>Referenced Hardware & Software Configuration </li></ul>
  42. 42. Informix Warehouse Accelerator: 3rd Generation Database Technology is Here <ul><li>How is it different? </li></ul><ul><li>Performance: Unprecedented response times to enable 'train of thought' analysis frequently blocked by poor query performance. </li></ul><ul><li>Integration: Connects to IDS through deep integration providing transparency to all applications. </li></ul><ul><li>Self-managed workloads: queries are executed in the most efficient way </li></ul><ul><li>Transparency: applications connected to IDS are entirely unaware of IWA </li></ul><ul><li>Simplified administration: appliance-like hands-free operations, eliminating many database tuning tasks </li></ul>What is it? The Informix Warehouse Accelerator (IWA) is a workload-optimized, appliance-like add-on that enables the integration of business insights into operational processes to drive winning strategies. It accelerates select queries with unprecedented response times. Breakthrough Technology Enabling New Opportunities
  43. 43. Breakthrough technologies for performance <ul><li>Row & Columnar Database: Row format within IDS for transactional workloads and columnar data access via accelerator for OLAP queries. </li></ul><ul><li>Extreme Compression: Required because RAM is the limiting factor. </li></ul><ul><li>Massive Parallelism: All cores are used for queries. </li></ul><ul><li>Predicate evaluation on compressed data: Often scans without decompression during evaluation. </li></ul><ul><li>Frequency Partitioning: Enabler for the effective parallel access of the compressed data for scanning. Horizontal and vertical partition elimination. </li></ul><ul><li>In Memory Database: 3rd generation database technology avoids I/O. Compression allows huge databases to be completely memory resident. </li></ul><ul><li>Multi-core and Vector Optimized Algorithms: Avoiding locking or synchronization. </li></ul>
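To make "predicate evaluation on compressed data" concrete, here is a minimal Python sketch that uses plain dictionary encoding. It is a deliberately simplified stand-in and an assumption on my part: IWA's actual frequency-partitioning scheme is more elaborate, and the function name and sample values below are hypothetical.

```python
def dictionary_encode(values):
    """Assign a small integer code to each distinct value."""
    codes = {value: code for code, value in enumerate(sorted(set(values)))}
    return codes, [codes[v] for v in values]


states = ["CA", "NY", "CA", "TX", "CA"]
state_codes, encoded_states = dictionary_encode(states)

# Translate the literal in the predicate once, then compare integer codes only.
# The stored column is never decoded back to strings during the scan.
target = state_codes["CA"]
matching_rows = [i for i, code in enumerate(encoded_states) if code == target]
print(matching_rows)
```

The scan compares small integers instead of strings, so the data can stay in its compressed form for the whole evaluation.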
  44. 44. TCP/IP Informix Warehouse Accelerator Configuration <ul><li>IDS: </li></ul><ul><li>Routes SQL queries to accelerator </li></ul><ul><li>User need not change SQL or apps . </li></ul><ul><li>Can always run query in IDS, e.g., if </li></ul><ul><ul><li>too short an est. execution time </li></ul></ul>Bulk Loader SQL Queries (from apps) Informix Warehouse Accelerator Compressed DB partition Query Processor Data Warehouse IDS SQL (via DRDA) Query Router <ul><li>Informix Warehouse Accelerator: </li></ul><ul><li>Connects to IDS via TCP/IP & DRDA </li></ul><ul><li>Analyzes, compresses, and loads </li></ul><ul><ul><li>Copy of (portion of) warehouse </li></ul></ul><ul><li>Processes routed SQL query and </li></ul><ul><ul><li>returns answer to IDS </li></ul></ul>Results
  45. 45. Informix Warehouse Accelerator Overview <ul><li>IDS: query parsing and matching to the Optimizer; routing query blocks </li></ul><ul><li>Coordinator Process: orchestrates the distributed tasks such as load or query execution </li></ul><ul><li>Worker Processes: hold all the data in main memory, spread across all cores; do the compression and query execution </li></ul>
  46. 46. Target Market: Business Intelligence (BI) <ul><li>Characterized by: </li></ul><ul><ul><li>“ Star” or “snowflake” schema: </li></ul></ul><ul><li>Complex, ad hoc queries that typically </li></ul><ul><li>Look for trends, exceptions to make actionable business decisions </li></ul><ul><li>Touch large subset of the database (unlike OLTP) </li></ul><ul><li>Involve aggregation functions (e.g., COUNT, SUM, AVG,…) </li></ul><ul><li>The “Sweet Spot” for the IWA! </li></ul>Dimensions Fact Table City Region Store SALES Product Period Brand Month Quarter Category
  47. 47. What IWA is Designed For <ul><li>Selective, fast scans over large (fact) tables </li></ul><ul><li>Joins with smaller Dimension tables </li></ul><ul><li>OLAP-style queries over large fact tables in relational star schema with grouping and aggregations </li></ul>SELECT PRODUCT_DEPARTMENT, REGION, SUM(REVENUE) FROM FACT_SALES F INNER JOIN DIM_PRODUCT P ON F.FKP = P.PK INNER JOIN DIM_REGION R ON F.FKR = R.PK LEFT OUTER JOIN DIM_TIME T ON F.FKT = T.PK WHERE T.YEAR = 2009 AND R.GEOID = 17 AND P.TYPEID = 3 GROUP BY PRODUCT_DEPARTMENT, REGION
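The query on this slide can be tried on a tiny star schema with Python's built-in `sqlite3` module. This only illustrates the query shape, not IWA or Informix: the column types and the sample rows are assumptions, since the slide shows only table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; only the names come from the slide's query.
conn.executescript("""
CREATE TABLE DIM_PRODUCT (PK INTEGER PRIMARY KEY, PRODUCT_DEPARTMENT TEXT, TYPEID INTEGER);
CREATE TABLE DIM_REGION  (PK INTEGER PRIMARY KEY, REGION TEXT, GEOID INTEGER);
CREATE TABLE DIM_TIME    (PK INTEGER PRIMARY KEY, YEAR INTEGER);
CREATE TABLE FACT_SALES  (FKP INTEGER, FKR INTEGER, FKT INTEGER, REVENUE REAL);
INSERT INTO DIM_PRODUCT VALUES (1, 'Footwear', 3), (2, 'Apparel', 4);
INSERT INTO DIM_REGION  VALUES (1, 'West', 17), (2, 'East', 18);
INSERT INTO DIM_TIME    VALUES (1, 2009), (2, 2010);
INSERT INTO FACT_SALES  VALUES (1, 1, 1, 100.0), (1, 1, 1, 50.0),
                               (1, 1, 2, 999.0), (2, 1, 1, 10.0), (1, 2, 1, 20.0);
""")

# The query from the slide: restrict on three dimensions, then group and sum.
query = """
SELECT PRODUCT_DEPARTMENT, REGION, SUM(REVENUE)
FROM FACT_SALES F
  INNER JOIN DIM_PRODUCT P ON F.FKP = P.PK
  INNER JOIN DIM_REGION R ON F.FKR = R.PK
  LEFT OUTER JOIN DIM_TIME T ON F.FKT = T.PK
WHERE T.YEAR = 2009 AND R.GEOID = 17 AND P.TYPEID = 3
GROUP BY PRODUCT_DEPARTMENT, REGION
"""
result = conn.execute(query).fetchall()
print(result)
```

Only the two fact rows that match all three dimension restrictions contribute to the sum, which is the "selective scan over a large fact table, joined with small dimensions" pattern the slide describes.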
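  48. 48. Case Study #1: Major U.S. Shoe Retailer <ul><li>Top 7 time-consuming queries in Retail BI and Warehouse (against a 1-billion-row fact table), IDS 11.5 vs. IDS 11.7 IWA: </li></ul><ul><ul><li>Query 1: 22 mins vs. 4 secs </li></ul></ul><ul><ul><li>Query 2: 1 min 3 secs vs. 2 secs </li></ul></ul><ul><ul><li>Query 3: 3 mins 40 secs vs. 2 secs </li></ul></ul><ul><ul><li>Query 4: 30 mins & up vs. 4 secs </li></ul></ul><ul><ul><li>Query 5: 2 mins vs. 2 secs </li></ul></ul><ul><ul><li>Query 6: 30 mins vs. 2 secs </li></ul></ul><ul><ul><li>Query 7: 45 mins & up vs. 2 secs </li></ul></ul>Our Retail users will be really happy to see such a huge improvement in the queries processing timings. This IWA extension to IDS will really bring value to the Retail BI environment.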
  49. 49. Case Study #2: Datamart at a Government Agency <ul><li>A MicroStrategy report was run, which generates </li></ul><ul><ul><li>667 SQL statements, of which 537 were SELECT statements </li></ul></ul><ul><li>The datamart for this report has 250 tables and 30 GB of data </li></ul><ul><li>The original report on XPS and a Sun SPARC M9000 took 90 mins </li></ul><ul><li>With IDS 11.7 on a Linux Intel box, it took 40 mins </li></ul><ul><li>With IWA, it took 67 seconds. </li></ul>
  50. 50. Case Study #3: U.S. Government Agency <ul><li>Columns: Query; Description; Informix; Informix w/ IWA; Notes; Improvement </li></ul><ul><ul><li>Query 1: Find Top 100 Entities; 1:28:22; 0:01:28; Fact Table Scan; 6023.23% </li></ul></ul><ul><ul><li>Query 2: Find Top 100 Members; 1:22:32; 0:01:05; Fact Table Scan; 7640.45% </li></ul></ul><ul><ul><li>Query 3: Summarize all transactions by State and County; 1:34:37; 0:00:14; Fact Table Scan; 41708.49% </li></ul></ul><ul><ul><li>Query 4: Detailed Report on Specific Programs in a Date Range; 0:00:06; 0:00:06; Index Read; 108.41% </li></ul></ul><ul><ul><li>Query 5: Summarize all transactions by State, County, City, State, Zip, Program, Program Year, Commodity and Fiscal Year; 1:48:58; 0:00:41; Fact Table Scan; 15800.89% </li></ul></ul>
  51. 51. Agenda <ul><li>3rd Generation Database Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA) </li></ul><ul><ul><li>Target Market </li></ul></ul><ul><ul><li>Beta Customer Experience </li></ul></ul><ul><li>IWA vs. Row/Column/Hybrid Stores </li></ul><ul><li>Loading IWA </li></ul><ul><li>Referenced Hardware & Software Configuration </li></ul>
  52. 52. Row-Oriented Data Store Each row is stored sequentially. <ul><li>Optimized for record I/O </li></ul><ul><li>Fetch and decompress the entire row, every time </li></ul><ul><li>Result: </li></ul><ul><ul><li>Very efficient for transactional workloads </li></ul></ul><ul><ul><li>Not always efficient for analytical workloads </li></ul></ul>If only a few columns are required, the complete row is still fetched and decompressed.
  53. 53. Columnar Data Store Data is stored sequentially by column. If attributes are not required for a specific query execution, they are skipped completely. <ul><li>Data is compressed sequentially by column: </li></ul><ul><ul><li>Aids sequential scan </li></ul></ul><ul><ul><li>Slows random access </li></ul></ul>
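The difference between the two layouts can be illustrated with a minimal Python sketch. The table, its column names, and both helper functions are hypothetical and serve only to show that a columnar scan reads one column and skips the rest; this is not how IDS or IWA store data internally.

```python
# Hypothetical three-column table; names and values are illustrative only.
rows = [
    {"store": "S1", "product": "shoes", "revenue": 120},
    {"store": "S2", "product": "boots", "revenue": 80},
    {"store": "S1", "product": "boots", "revenue": 50},
]


def sum_revenue_row_store(table):
    # Row-oriented layout: every whole row is visited, even though
    # only one of its columns is needed.
    return sum(row["revenue"] for row in table)


# Column-oriented layout: one contiguous sequence per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_revenue_column_store(cols):
    # Only the "revenue" column is read; "store" and "product" are skipped.
    return sum(cols["revenue"])


print(sum_revenue_row_store(rows), sum_revenue_column_store(columns))
```

Both functions return the same total; the columnar version simply never touches the columns the query does not reference.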
  54. 54. Compression: Frequency Partitioning <ul><li>Field lengths vary between cells </li></ul><ul><ul><li>Higher frequencies → shorter codes (approximate Huffman) </li></ul></ul><ul><li>Field lengths fixed within cells </li></ul>[Diagram: a Trade Info table (volume, product, origin country) with a histogram on Origin and a histogram on Product, each plotting number of occurrences from common values to rare values. Column partitions: Origin (China, USA; GER, FRA, …; Rest), Product (top 64 traded goods with a 6-bit code; Rest), Vol. The table is partitioned into cells, Cell 1 to Cell 6.]
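A minimal Python sketch of the idea for a single column follows. It assumes a simplified two-partition split (the `top_k` most frequent values versus the rest); the function name, the `top_k` parameter, and the sample data are hypothetical, and the actual IWA encoder is not shown in the slides. Each partition gets its own fixed-width code, so frequent values receive shorter codes while code length stays fixed within a partition.

```python
from collections import Counter
from math import ceil, log2


def frequency_partition(values, top_k):
    """Split distinct values into a 'common' and a 'rest' partition,
    each with its own fixed-width binary code (illustrative only)."""
    ranked = [v for v, _ in Counter(values).most_common()]
    common, rest = ranked[:top_k], ranked[top_k:]

    def fixed_width_codes(part):
        if not part:
            return 0, {}
        # Width is fixed inside a partition but differs between partitions.
        width = max(1, ceil(log2(len(part))))
        return width, {v: format(i, f"0{width}b") for i, v in enumerate(part)}

    return fixed_width_codes(common), fixed_width_codes(rest)


# Hypothetical "origin" column: two very frequent values, three rare ones.
origins = ["China"] * 5 + ["USA"] * 4 + ["GER", "FRA", "ITA"]
(common_width, common_codes), (rest_width, rest_codes) = frequency_partition(origins, 2)
print(common_width, common_codes)
print(rest_width, rest_codes)
```

With several columns, the cross product of the per-column partitions would give the cells shown on the slide; that step is omitted here to keep the sketch short.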
  55. 55. Compression Process: Step 1 <ul><li>Input tuple: Male, John, 08/10/06, Mango </li></ul><ul><li>Columns 1 & 2 (Male, John): co-code transform </li></ul><ul><li>Column 3 (08/10/06): type-specific transform into Column 3.A and Column 3.B (values shown: Sat, w35, 2006) </li></ul><ul><li>Co-coded values shown: Male/John, Male/John/Sat, w35/Mango </li></ul><ul><li>Each transformed column is Huffman-encoded against a dictionary into a column code, and the column codes form the tuplecode: 101101011 (p = 1/512), 001 (p = 1/8), 01011101 (p = 1/512) </li></ul><ul><li>Dictionary statistics (names): Michael 4.2%, David 3.8%, James 3.6%, Robert 3.5%, John 3.5%, William 2.5%, Mark 2.4%, Richard 2.3%, Thomas 1.9%, Steven 1.5% </li></ul><ul><li>Dictionary statistics (day of week, Mon to Sun): Male 3%, 4%, 10%, 6%, 23%, 42%, 12%; Female 4%, 5%, 9%, 15%, 17%, 28%, 22% </li></ul>
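  56. 56. Compression Process: Step 2 <ul><li>Sort the tuplecodes. </li></ul><ul><li>Keep the first tuplecode as is. </li></ul><ul><li>For each later tuplecode, compute the delta: tuplecode minus previous tuplecode. </li></ul><ul><li>Huffman-encode each delta against a dictionary and append the delta code to the compression block. </li></ul><ul><li>Look Ma, no delimiters! </li></ul>[Diagram values: tuplecodes 101101011100001100, 10110101110001011111, 1011010111000011101, 10110101110001011101, 10110101110001011101; deltas 0000000000000000001, 00000000000000000001, 0000000000000000101; delta codes 000, 010, 1110; compression block 101101011100010111010000101110, i.e. the first tuplecode 10110101110001011101 followed by the delta codes 000, 010, 1110]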
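The sort-and-delta part of this step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: tuplecodes are treated as plain integers, the function names are hypothetical, and the Huffman encoding of the deltas that the slide describes is left out.

```python
def delta_encode(tuplecodes):
    """Sort the tuplecodes, keep the first one, and replace every later
    one by its difference from its predecessor (illustrative only)."""
    ordered = sorted(tuplecodes)
    if not ordered:
        return None, []
    deltas = [cur - prev for prev, cur in zip(ordered, ordered[1:])]
    return ordered[0], deltas


def delta_decode(first, deltas):
    """Rebuild the sorted tuplecodes from the first code and the deltas."""
    if first is None:
        return []
    codes = [first]
    for delta in deltas:
        codes.append(codes[-1] + delta)
    return codes


# Hypothetical 4-bit tuplecodes, not the values from the slide.
sample = [0b1011, 0b1001, 0b1110, 0b1001]
first, deltas = delta_encode(sample)
print(first, deltas)
print(delta_decode(first, deltas))
```

Because the codes are sorted first, the deltas are small non-negative numbers, which is what makes a subsequent entropy coding step worthwhile.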
  57. 57. Data is Processed in Compressed Format <ul><li>Within a Register-Store, several columns are grouped together. </li></ul><ul><li>The sum of the widths of the compressed columns doesn't exceed a register-compatible width. This utilizes the full capabilities of a 64-bit system. It doesn't matter how many columns are placed within the register-wide data element. </li></ul><ul><li>It is beneficial to place commonly used columns within the same register-wide data element. But this requires dynamic knowledge about the executed workload (runtime statistics). </li></ul><ul><li>Having multiple columns within the same register-wide data element prevents ANDing of different results. </li></ul>The Register-Store is an optimization of the Column-Store approach that tries to make the best use of existing hardware. Reshuffling small data elements into a register at runtime is time-consuming and can be avoided. The Register-Store also delivers good vectorization capabilities. Predicate evaluation is done against compressed data!
  58. 58. Register Stores Facilitate SIMD Parallelism <ul><li>Access only the banks referenced in the query (like a column store): </li></ul><ul><ul><li>SELECT SUM ( T.G ) </li></ul></ul><ul><ul><li>FROM T </li></ul></ul><ul><ul><li>WHERE T.A > 5 </li></ul></ul><ul><ul><li>GROUP BY T.D </li></ul></ul><ul><li>Pack multiple rows from the same bank into the 128-bit register </li></ul><ul><li>Enables yet another layer of parallelism: SIMD (Single-Instruction, Multiple-Data)! </li></ul>[Diagram: a cell block of four rows split into Bank β1 (32 bits; columns A, D, G), Bank β2 (32 bits; columns B, E, F), and Bank β3 (16 bits; columns C, H). Four 32-bit operands fill one 128-bit register, and a single vector operation produces Result 1 to Result 4.]
  59. 59. Simultaneous Evaluation of Equality Predicates <ul><li>Translate the value query to a code query: State == 'CA' && Quarter == 'Q4' becomes State == 01001 && Quarter == 1110 </li></ul><ul><li>CPU operates on 128-bit units </li></ul><ul><ul><li>Lots of fields fit in 128 bits </li></ul></ul><ul><li>These fields are at fixed offsets </li></ul><ul><li>Apply predicates to all columns simultaneously! </li></ul>[Diagram: each row is ANDed (&) with a mask (State 11111, Quarter 1111, other fields 0) and compared (==) with the selection value (State 01001, Quarter 1110, other fields 0) to give the selection result.]
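The mask-and-compare step can be sketched in Python with ordinary integers standing in for a register. The field layout below (including the `other_a` and `other_b` filler fields and all widths except the 5-bit State and 4-bit Quarter codes) is an assumption made for illustration; the slide does not give the real layout, and a real implementation would use 128-bit SIMD registers rather than Python integers.

```python
# Hypothetical fixed layout of one packed row, most significant field first:
# state (5 bits) | other_a (3 bits) | quarter (4 bits) | other_b (4 bits)
FIELDS = [("state", 5), ("other_a", 3), ("quarter", 4), ("other_b", 4)]


def pack(values):
    """Pack field codes into one integer at fixed offsets; missing fields are 0."""
    word = 0
    for name, width in FIELDS:
        word = (word << width) | (values.get(name, 0) & ((1 << width) - 1))
    return word


def build_mask_and_target(predicates):
    """Mask has all ones in every predicate field; target holds the wanted codes."""
    all_ones = {name: (1 << width) - 1 for name, width in FIELDS if name in predicates}
    return pack(all_ones), pack(predicates)


def matches(row_word, mask, target):
    # One AND and one comparison test every equality predicate at once.
    return (row_word & mask) == target


# State == 'CA' && Quarter == 'Q4', already translated to the codes on the slide.
mask, target = build_mask_and_target({"state": 0b01001, "quarter": 0b1110})
row_hit = pack({"state": 0b01001, "other_a": 0b101, "quarter": 0b1110, "other_b": 0b0011})
row_miss = pack({"state": 0b01001, "other_a": 0b101, "quarter": 0b0001, "other_b": 0b0011})
print(matches(row_hit, mask, target), matches(row_miss, mask, target))
```

The values of the filler fields never influence the result, because the mask zeroes them before the comparison.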
  60. 60. Agenda <ul><li>3rd Generation Database Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA) </li></ul><ul><ul><li>Target Market </li></ul></ul><ul><ul><li>Beta Customer Experience </li></ul></ul><ul><li>IWA vs. Row/Column/Hybrid Stores </li></ul><ul><li>Loading IWA </li></ul><ul><li>Referenced Hardware & Software Configuration </li></ul>
  61. 61. Defining What Data to Accelerate <ul><li>A MART is a logical collection of related tables. For example, all tables of a single star schema would belong to the same MART. </li></ul><ul><li>The administrator uses a rich client interface to define which tables belong to a MART, together with information about their relationships. </li></ul><ul><li>IDS creates definitions for these MARTs in its own catalog. The related data is read from the IDS tables and transferred to IWA. </li></ul><ul><li>The IWA transforms the data into a highly compressed, scan-optimized format that is kept locally (in memory) on the Accelerator. </li></ul>[Diagram labels: Define; IDS + IWA; Coordinator Process; Worker Processes]
  62. 62. IWA Design Studio
  63. 63. Distributing data from IDS (Fact tables) A copy of the IDS data is transferred to the Worker processes. Each Worker process holds a compressed subset of the data in main memory and can execute queries on that subset. The data is distributed evenly across the CPUs, with no value-based partitioning. [Diagram labels: IDS; Fact Table; Data Fragment (×4); IDS Stored Procedures; UNLOAD (×4); Copy; Coordinator Process; Worker Process (×3); Compressed Data (×6)]
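One simple way to spread rows evenly without looking at their values is round-robin assignment, sketched below in Python. The slide only states that the distribution is even and not value-based; round-robin is an assumption chosen for illustration, and the function name is hypothetical.

```python
def distribute_evenly(rows, num_workers):
    """Assign rows to workers in round-robin order, ignoring row values
    (an assumed strategy; the slide only says 'evenly distributed')."""
    workers = [[] for _ in range(num_workers)]
    for index, row in enumerate(rows):
        workers[index % num_workers].append(row)
    return workers


# Ten hypothetical fact rows spread over three workers.
parts = distribute_evenly(list(range(10)), 3)
print([len(part) for part in parts])
```

Worker sizes differ by at most one row, regardless of how the values in the rows are skewed.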
  64. 64. Distributing data from IDS (Dimension tables) All dimension tables are transferred to the worker process. [Diagram labels: IDS; IDS Stored Procedure; UNLOAD (×4); Coordinator Process; Worker Process (×3); Dimension Table (×16)]
  65. 65. Mapping Data from IDS to IWA [Diagram labels: Inside IDS: Fact Table, Data Fragment (×6), Dimension Table (×4); Inside IWA: Compressed, Fact Table, Data Fragment (×6), Dimension Table (×4)]
  66. 66. Agenda <ul><li>3rd Generation Database Technology </li></ul><ul><li>Overview of the Informix Warehouse Accelerator (IWA) </li></ul><ul><ul><li>Target Market </li></ul></ul><ul><ul><li>Beta Customer Experience </li></ul></ul><ul><li>IWA vs. Row/Column/Hybrid Stores </li></ul><ul><li>Loading IWA </li></ul><ul><li>Referenced Hardware & Software Configuration </li></ul>
  67. 67. IWA Referenced Hardware Configuration <ul><li>Intel(R) Xeon(R) CPU X7560 @ 2.27 GHz, 4 x 8 </li></ul><ul><li>512 GB memory </li></ul><ul><li>6 disks, 300 GB SAS hard disk drives each </li></ul><ul><li>Options: </li></ul><ul><ul><li>4-processor, 4U rack-optimized enterprise server with Intel® Xeon® processors </li></ul></ul><ul><ul><li>8-core, 6-core and 4-core processor options with up to 2.26 GHz (8-core), 2.66 GHz (6-core) and 1.86 GHz (4-core) speeds and up to 16 MB L3 cache </li></ul></ul><ul><ul><li>Scalable from 4 sockets and 64 DIMMs to 8 sockets and 128 DIMMs </li></ul></ul><ul><ul><li>Optional MAX5 32-DIMM memory expansion </li></ul></ul><ul><ul><li>16x 1.8" SAS SSDs with eXFlash or 8x 2.5" SAS HDDs </li></ul></ul>
  68. 68. IWA Software Components <ul><li>Linux on Intel x86_64 (RHEL 5 or SUSE SLES 11) </li></ul><ul><li>IDS 11.70 + IWA code modules including IDS Stored Procedures </li></ul><ul><li>ISAO Studio Plug-in – GUI for Mart definition </li></ul><ul><li>OnIWA – On Utilities for Monitoring IWA </li></ul>
  69. 69. (Fred Ho – hof@us.ibm.com)
