Scaling Data


  • Good Afternoon 1:30 – sorry cannot dim the lights 
  • What is Scaling – computer industry coining its own definition. DATA – becoming more business critical every year.
  • What is Scaling – computer industry coining its own definition. DATA – becoming more business critical every year. “Who would want such a thing?” Who – us; spend the next hour with Andrew and myself as we explore data scaling from both the SAS and Oracle perspective. We will be focusing on: scaling components within SAS V9; scaling components within Oracle; information on how they will work together.
  • Why Scale to Data – have to: business requirement, requirement of software solutions, required to deliver “the answer”. Hardware till the end of the decade (Moore’s Law intact): 40 GHz processors, 1.5 TB disks, 4-12 GB RAM, 4-8 CPUs per chip. Data: late 90s – 4 TB to 300 TB; 300 petabytes by 2012 (1/3 exabyte, 300,000 TB); 4,500 petabytes on the Internet. Available tools – SAS System 9 – features and components. Available procedures and processes – SAS System 9 – Oracle 9i.
  • For the next hour Andrew and I will discuss what scaling means between SAS and Oracle. You will learn about some of the scaling features in SAS 9 and Oracle 9i. From this information you should be able to derive what scaling means to you and to your organization.
  • Turn Silver into Gold
  • SAS System 9 – you are here; a tour through the intelligent architecture; different components and processes. What does the small pyramid represent – OS, Oracle, etc.
  • Some commentary on the product suite – the check* components and the enhancement they involve.
  • Libname engines in SAS V8 and their I/O process
  • Libname engines in SAS System 9 and their use of the threaded process – threaded read, support of threaded procedures and non-threaded procedures. SAS procedures threaded for SAS System 9. Process for making it happen: libname options, SAS options, procedure options.
  • Either fully threaded or partially threaded procedures. Discuss ORDER BY processing using proc sort.
  • Option syntax and examples
  • SAS code examples
  • Controlling the threaded process: libname controls, procedure controls, how to use them.
  • Run on a mid-frame Unix box with 16 GB of RAM and two CPU configurations (mostly 8, but 12 in a few tests). Baseline Speedups: The scalability measures discussed in the previous two sections compare the performance of programs executing in one thread with the corresponding programs executing in multiple threads. Multithreading of SAS procedures requires significant modifications to the existing legacy code. The following sections detail typical speedups that you can obtain with some of the thread-enabled analytic procedures running on multiple-CPU platforms. For most of the tests, this server was configured with eight 750 MHz processors, but where explicitly noted, a configuration with twelve processors was used.
  • Other tests have been run with PROC SORT. This example illustrates the kind of performance increase seen in SORT on an 8-way Unix box. The x-axis is the number of records, ranging from zero up to 10 million. The y-axis is the percent of maximum elapsed time. Note that threaded processing (in green) in this test is approximately 85% faster than single-threaded processing on the same box. These tests were done for small-key and large-key sorts.
  • Methods for processing large data volumes: organization of data, data marts, and databases is key.
  • I am interested in hearing from you. If you would like copies of the presentation and/or sample SAS code drop me an e-mail, or put your request on your business card.

    1. 1. Scaling SAS® Data Access to Oracle® RDBMS Howard Plemmons SAS Institute Inc. Andrew Holdsworth Oracle Corporation
    2. 2. Scaling <ul><li>What is Scaling? </li></ul>
    3. 3. Scaling <ul><li>“To remove the scales of a fish” </li></ul><ul><li>“To climb up by means of a scaling ladder” </li></ul><ul><li>“To reach the highest point” </li></ul><ul><li>Data </li></ul>
    4. 4. Scaling Data <ul><li>Why Scale to Data </li></ul>
    5. 5. Scaling Data <ul><li>SAS tools, SAS/ACCESS ® </li></ul><ul><li>SAS Procedure and Processes </li></ul><ul><li>Oracle tools </li></ul><ul><li>Oracle Procedures and Processes </li></ul>
    6. 6. Intelligence Value Chain
    7. 7. Intelligence Value Chain Silver into Gold
    8. 8. SAS System 9
    9. 9. SAS V8 vs. SAS System 9
       FEATURE              SAS V8   SAS System 9
       Libname Engine       x        x
       Procedure Interface  x        x
       Fast Load            x        x
       Threaded Interface            x
    10. 10. SAS V8 I/O Model
    11. 11. Threaded Interface SAS 9
    12. 12. SAS Procedures <ul><li>proc sort </li></ul><ul><li>proc summary </li></ul><ul><li>proc dmine </li></ul><ul><li>proc reg; proc dmreg </li></ul><ul><li>proc means </li></ul><ul><li>proc loess; proc dmdb </li></ul><ul><li>proc glm </li></ul><ul><li>proc robustreg </li></ul>
    13. 13. SAS/ACCESS® Engines <ul><li>ORACLE </li></ul><ul><li>DB2 </li></ul><ul><li>Informix </li></ul><ul><li>ODBC </li></ul><ul><li>Sybase </li></ul><ul><li>Teradata </li></ul>
    14. 14. Libname and SAS Procedure Controls <ul><li>dbslice ("where","where",…) </li></ul><ul><li>dbsliceparm (ALL,…) </li></ul><ul><li>defaults (THREADED_APPS,2) </li></ul><ul><li>options sastrace=',,t'; </li></ul><ul><li>procedure controls – CPU count </li></ul>
    15. 15. Options In Action - DBSLICEPARM <ul><li>-dbsliceparm none </li></ul><ul><li>option dbsliceparm= </li></ul><ul><li>libname x oracle user=scott pass=tiger </li></ul><ul><li>dbsliceparm=(threaded_apps,2); </li></ul><ul><li>proc print data=x.oratab (dbsliceparm=(all,4)); run; </li></ul>
    16. 16. Options In Action - DBSLICE <ul><li>libname x oracle user=scott pass=tiger; </li></ul><ul><li>proc print data=x.oratab (dbslice= ("where x<100", "where x >= 100") ); </li></ul>
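As a rough illustration of what the DBSLICE option asks for, the following Python sketch (not SAS code) hands one WHERE clause to each thread and merges the results. SQLite stands in for Oracle here, and the table contents, file name, and `read_slice` helper are assumptions made up for the example.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the Oracle table read through the libname engine.
db_path = os.path.join(tempfile.mkdtemp(), "oratab.db")
with sqlite3.connect(db_path) as conn:
    conn.execute("CREATE TABLE oratab (x INTEGER)")
    conn.executemany("INSERT INTO oratab VALUES (?)", [(i,) for i in range(250)])

# One WHERE clause per thread, mirroring the two slices on the slide.
slices = ["x < 100", "x >= 100"]

def read_slice(where):
    # Each thread opens its own connection and reads only its slice.
    thread_conn = sqlite3.connect(db_path)
    try:
        return thread_conn.execute(
            f"SELECT x FROM oratab WHERE {where}").fetchall()
    finally:
        thread_conn.close()

with ThreadPoolExecutor(max_workers=len(slices)) as pool:
    parts = list(pool.map(read_slice, slices))

# Merge the per-thread results into one result set.
rows = [r[0] for part in parts for r in part]
print(len(parts[0]), len(parts[1]), len(rows))
```

The slices only reproduce the full table if together they cover every row exactly once. In SQL a row whose `x` is NULL satisfies neither `x < 100` nor `x >= 100`, so hand-written slices like these would silently skip it.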
    17. 17. Options In Action – CPUCOUNT, THREADS <ul><li>CPUCOUNT= </li></ul><ul><li>THREADS | NOTHREADS </li></ul>
    18. 18. Process <ul><li>Libname controls </li></ul><ul><li>Procedure controls </li></ul><ul><li>Execution </li></ul>
    19. 19. Scalability – SAS 9: Threaded speedup in PROC REG (chart comparing linear scalability with achieved speedup). Run on 12-way Unix box.
    20. 20. Scalability – SAS 9: Threaded speedup in PROC SORT. Run on 8-way Unix box. Tests run in memory cache.
    21. 21. What Does This Mean - access <ul><li>393,000 Rows </li></ul><ul><li>No Threads - baseline </li></ul><ul><li>Two Threads (DBSLICE) – 31% </li></ul><ul><li>Six Threads (DBSLICEPARM) – 54% </li></ul>Run on 10-way Unix box. Tests run in memory cache.
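The slide does not say what the percentages measure. Assuming they are reductions in elapsed time relative to the unthreaded baseline (an assumption, not something the deck states), this short Python sketch converts them into speedup factors and compares each with ideal linear scaling for its thread count.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional cut in elapsed time into a speedup factor."""
    return 1.0 / (1.0 - reduction)

# (label, thread count, reported improvement) taken from the slide.
results = [("DBSLICE", 2, 0.31), ("DBSLICEPARM", 6, 0.54)]

summary = {}
for label, threads, reduction in results:
    speedup = speedup_from_reduction(reduction)
    efficiency = speedup / threads  # fraction of ideal linear scaling
    summary[label] = (speedup, efficiency)
    print(f"{label}: {speedup:.2f}x speedup, {efficiency:.0%} of linear")
```

Under that reading, two threads give roughly a 1.45x speedup and six threads roughly 2.17x, so the extra threads still help but with diminishing returns per thread.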
    22. 22. Scaling Data <ul><li>Data Volumes </li></ul><ul><li>Data ACCESS </li></ul><ul><li>Data Organization </li></ul><ul><li>Scaling using Oracle - Andrew </li></ul>
    23. 23. Scaling with <ul><li>The Star Query </li></ul><ul><li>Use of Parallelism </li></ul><ul><li>Use of the Direct Path </li></ul><ul><li>Use of Specialist Indexes </li></ul><ul><li>Use of Analytical Functions </li></ul><ul><li>Use of Materialized Views </li></ul><ul><li>Use of The Oracle9i Optimizer </li></ul>
    24. 24. The Star Query (diagram: a central Fact table joined to Product, Time, Geography, and Customer dimension tables)
    25. 25. Star Queries <ul><li>The star query is a very common DW technique. It is highly optimized in Oracle and can be tuned depending on the type of queries. In summary, the more that is known about the query composition, the higher the level of optimization possible. </li></ul>
    26. 26. Star Query Optimization <ul><li>The optimization is a three-step process </li></ul><ul><ul><li>Apply query predicates to dimension tables to generate lists of foreign keys into the fact table. </li></ul></ul><ul><ul><li>Query the fact table using a series of single-column bitmapped indexes on the foreign keys. </li></ul></ul><ul><ul><li>Having resolved the query within the fact table, complete the query by joining back to dimension tables where needed and roll the query up. </li></ul></ul>
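The three steps above can be walked through by hand. The Python sketch below does so against a tiny in-memory SQLite schema; the table names, columns, and data are invented for illustration, and SQLite simply filters rows where Oracle would combine bitmap indexes, so this shows the logic of the transformation rather than how Oracle executes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact (product_id INTEGER, time_id INTEGER, amount REAL);
INSERT INTO product VALUES (1, 'tools'), (2, 'toys'), (3, 'tools');
INSERT INTO time_dim VALUES (10, 2002), (11, 2003);
INSERT INTO fact VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 11, 9.0),
                        (3, 11, 4.0), (3, 10, 1.0);
""")

# Step 1: apply the predicates to the dimension tables to get foreign-key lists.
product_keys = [r[0] for r in conn.execute(
    "SELECT product_id FROM product WHERE category = 'tools'")]
time_keys = [r[0] for r in conn.execute(
    "SELECT time_id FROM time_dim WHERE year = 2003")]

def placeholders(keys):
    # Build "?,?,?" for a parameterized IN list.
    return ",".join("?" * len(keys))

# Step 2: restrict the fact table using those key lists.
fact_rows = conn.execute(
    f"SELECT product_id, time_id, amount FROM fact "
    f"WHERE product_id IN ({placeholders(product_keys)}) "
    f"AND time_id IN ({placeholders(time_keys)})",
    product_keys + time_keys).fetchall()

# Step 3: join back to a dimension for its attributes and roll up.
categories = dict(conn.execute("SELECT product_id, category FROM product"))
totals = {}
for product_id, _time_id, amount in fact_rows:
    category = categories[product_id]
    totals[category] = totals.get(category, 0.0) + amount
print(totals)
```

Only the fact rows that survive both key lists are ever joined back, which is why the dimension predicates are resolved first.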
    27. 27. Star Queries <ul><li>To enable star queries the DBA should do the following </li></ul><ul><ul><li>Build single column bitmapped indexes on each foreign key in the fact table </li></ul></ul><ul><ul><li>Build indexes on the dimension tables for query predicates </li></ul></ul><ul><ul><li>Build indexes on the dimension tables to assist in the join back and roll up process </li></ul></ul><ul><ul><li>Generate statistics for the schema </li></ul></ul><ul><ul><li>Set the parameter STAR_TRANSFORMATION_ENABLED=TRUE </li></ul></ul>
    28. 28. Use of Parallelism <ul><li>Multiple CPUs to execute a single query as well as multiple concurrent queries </li></ul><ul><li>Execute Table scans, Index probes and scans in parallel </li></ul><ul><li>Execute Joins and Sorts in parallel </li></ul><ul><li>Execute DML in parallel </li></ul><ul><li>Parallelism can be configured manually or automatically </li></ul>
    29. 29. Use of Partitioning <ul><li>Partitioning was originally designed to allow management of large database objects; however, partitioning data can also yield performance gains through the following </li></ul><ul><ul><li>Partition pruning </li></ul></ul><ul><ul><li>Join optimizations </li></ul></ul><ul><li>Partitioning can be done by the following methods </li></ul><ul><ul><li>Range e.g. Data or key ranges </li></ul></ul><ul><ul><li>List e.g. Discrete values such as State </li></ul></ul><ul><ul><li>Hash to achieve equal size partitions </li></ul></ul><ul><li>Two types of partitioning can be applied </li></ul>
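Partition pruning is easiest to see with range partitions. The following Python sketch is a conceptual model only: the quarterly partition names and the `prune` helper are hypothetical, and it shows just the decision of which partitions a date-range predicate needs to touch.

```python
from datetime import date

# Hypothetical range partitions: (name, inclusive lower bound, exclusive upper bound).
partitions = [
    ("p2003_q1", date(2003, 1, 1), date(2003, 4, 1)),
    ("p2003_q2", date(2003, 4, 1), date(2003, 7, 1)),
    ("p2003_q3", date(2003, 7, 1), date(2003, 10, 1)),
    ("p2003_q4", date(2003, 10, 1), date(2004, 1, 1)),
]

def prune(partitions, start, end):
    """Return the partitions whose range overlaps the half-open interval [start, end)."""
    return [name for name, lo, hi in partitions if lo < end and start < hi]

# A query for mid-May through the end of July only needs two of the four partitions.
needed = prune(partitions, date(2003, 5, 15), date(2003, 8, 1))
print(needed)
```

The remaining partitions are never scanned, which is where the performance gain comes from when queries filter on the partitioning key.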
    30. 30. Use of The Direct Path <ul><li>Bypass the conventional transaction layer to insert and copy data within the database </li></ul><ul><li>SQL*Loader is currently used by SAS </li></ul><ul><li>Other options include </li></ul><ul><ul><li>Insert with /*+ append */ hint </li></ul></ul><ul><ul><li>Create Table as Select with NOLOGGING </li></ul></ul><ul><li>These constructs can be used to transform vast amounts of data rapidly in parallel </li></ul>
    31. 31. Specialist Indexes <ul><li>B-Tree Indexes </li></ul><ul><li>Bit Mapped Indexes including join indexes </li></ul><ul><li>Functional Indexes </li></ul>
    32. 32. Analytical Functions <ul><li>Oracle has embraced the ANSI OLAP extensions to SQL </li></ul><ul><li>These permit faster response times on queries that would require multiple passes of the data with conventional SQL </li></ul><ul><li>This allows grouped results and functionality such as moving averages </li></ul>
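A moving average is a convenient example of a result that an analytic (window) function returns in one pass. The Python sketch below runs such a query through SQLite, which supports window functions from version 3.25 onward; the `sales` table and its values are made up, and SQLite is only standing in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)])

# Three-row moving average: the current row plus the two rows before it.
moving = conn.execute("""
    SELECT day,
           AVG(amount) OVER (ORDER BY day
                             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg3
    FROM sales
    ORDER BY day
""").fetchall()
print(moving)
```

Without the `OVER` clause, the same result would typically need a self-join or a correlated subquery per row, which is the multiple-pass pattern the slide refers to.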
    33. 33. Materialized Views <ul><li>Materialized views allow automatic use of summary tables without a user having to re-write the query </li></ul><ul><li>Well-designed materialized views are small in size and can increase performance by orders of magnitude. </li></ul><ul><li>Materialized views are in fact Oracle tables and can use all other features to improve performance </li></ul>
    34. 34. Oracle9i Optimizer <ul><li>On upgrade of Oracle Releases the Optimizer behavior will change </li></ul><ul><li>The Optimizer is tested with over 400,000 SQL Statements </li></ul><ul><ul><li>Where plans change between releases the actual query is run to test for degradation </li></ul></ul><ul><ul><li>Slower plans are corrected </li></ul></ul><ul><li>It is still important to have good representative Statistics </li></ul><ul><li>DBMS_STATS package allows parallel generation and migration of schema statistics </li></ul>
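The idea behind a summary table can be sketched in a few lines of Python with SQLite. This is only an analogy: SQLite has no materialized views or query rewrite, so the sketch builds the summary with CREATE TABLE AS SELECT and picks it by hand, whereas the slide describes Oracle redirecting the query automatically. The table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 1.0), ("east", 2.0), ("west", 5.0), ("west", 0.5)])

detail_sql = ("SELECT region, SUM(amount) FROM sales "
              "GROUP BY region ORDER BY region")

# Precompute the aggregate once; a materialized view stores a result like this.
conn.execute("CREATE TABLE sales_by_region AS "
             "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

from_detail = conn.execute(detail_sql).fetchall()
from_summary = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(from_detail == from_summary, from_summary)
```

The summary holds one row per region instead of one row per sale, so answering from it avoids rescanning the detail table.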
    35. 35. Oracle9i Optimizer <ul><li>Some common Optimizer problems seen with Oracle9i </li></ul><ul><ul><li>Bad or incomplete statistics </li></ul></ul><ul><ul><li>Init.ora parameters influencing optimizer </li></ul></ul><ul><ul><li>SQL written for RBO </li></ul></ul>
    36. 36. Summary <ul><li>Oracle and SAS provide techniques for scaling to larger databases by optimizing both query performance and fetch performance. </li></ul><ul><li>These techniques are simple to adopt and allow huge productivity improvements </li></ul><ul><li>We have identified some core technologies here; however, this is only a partial picture of the SAS/Oracle ability. </li></ul>
    37. 37. About the Speakers. Howard Plemmons: Senior Software Manager, SAS Institute Inc., SAS Circle, Cary, NC; phone 919-531-7779; e-mail [email_address]. Andrew Holdsworth: Director, Oracle Corp., 500 Oracle Pkwy, Redwood Shores, CA 94065; phone 650-506-2938; e-mail [email_address].
    38. 38. Other SUGI Papers/Presentations <ul><li>PC File Data Objects Directly from UNIX – 8:00am Tuesday </li></ul><ul><li>SAS/ACCESS and use of Metadata – Rm 619 @ 2:30 </li></ul><ul><li>Lessons in Scalability – SAS Presents – 3:20 Tuesday </li></ul><ul><li>Data Warehousing section - performance </li></ul>
    39. 39. Scaling SAS Data ACCESS to ORACLE RDBMS
    40. 40. Copyright © 2003, SAS Institute Inc. All rights reserved.