Copyright © 2015 IBM Corporation
All rights reserved
IBM DB2 Analytics Accelerator
Hands-On Experiences
Netezza In-Database Analytics Functions and
Accelerator Only Table (AoT) Support for QMF 11.2
May 10, 2016
IBM New York City, NY
Dave Trotter
Analytics Technical Sales
North America – Midwest
2. Slide 2 of 39
Copyright © 2014 IBM Corporation
All rights reserved
System z Business Analytics Performance Integration
Please Note:
IBM’s statements regarding its plans, directions, and intent are subject to change or
withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product
direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise,
or legal obligation to deliver any material, code or functionality. Information about potential
future products may not be incorporated into any contract. The development, release, and
timing of any future features or functionality described for our products remains at our sole
discretion.
Performance is based on measurements and projections using standard IBM
benchmarks in a controlled environment. The actual throughput or
performance that any user will experience will vary depending upon many
factors, including considerations such as the amount of multiprogramming in
the user’s job stream, the I/O configuration, the storage configuration, and the
workload processed. Therefore, no assurance can be given that an individual
user will achieve results similar to those stated here.
3. Slide 3 of 39
Acknowledgements and Disclaimers
© Copyright IBM Corporation 2014. All rights reserved.
– U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United
States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a
trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information
was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is
available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be
available in all countries in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own
views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of
being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness
and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind,
express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to,
this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the
effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms
and conditions of the applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products
and the results they may have achieved. Actual environmental costs and performance characteristics may vary
by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying
that any activities undertaken by you will result in any specific sales, revenue growth or other results.
4. Slide 4 of 39
IDAA – INZA Functions and AoTs using QMF 11.2
DB2 Analytics Accelerator
Users’ Group
5. Slide 5 of 39
Agenda
• Brief Overview of IDAA Interfaces
• Netezza In-database Analytics / INZA functions
• SPSS Modeler/Data Studio/Stored Procedure Examples
• Installation Steps and Documentation for INZA Support
• Accelerator Only Tables / QMF 11.2 Support
• Summary / Questions
6. Slide 6 of 39
IBM DB2 Analytics Accelerator
Product components
CLIENT
Data Studio with
DB2 Analytics Accelerator
Studio
Plug-in
z System
DB2 for z/OS enabled for IBM
DB2 Analytics Accelerator
IBM DB2
Analytics
Accelerator
v5.1
Dedicated highly available
network connection
PureData System
for Analytics
(Netezza Technology)
SPSS Modeler 17
or
SPSS Modeler 18
(GA March 2016)
7. Slide 7 of 39
Accelerator-only table type in DB2 for z/OS
Creation (DDL) and access through DB2 for z/OS in all cases
Non-accelerator DB2 table
• Data in DB2 only
Accelerator-shadow table
• Data in DB2 and the Accelerator
Accelerator-archived table / partition
• Empty read-only partition in DB2
• Partition data is in Accelerator only
Accelerator-only table (AOT)
• “Proxy table” in DB2
• Data is in Accelerator only
8. Slide 8 of 39
Netezza In-database Analytics (INZA) functions
Enable acceleration of predictive analytics applications
In-database analytics enables SPSS/Netezza
Analytics (INZA) data mining and in-database
modeling to be processed within IDAA.
Accelerate SPSS/Netezza Analytics data mining and in-database modeling through
SQL Stored Procedure calls.
Allow frequent model refreshes to enable adequate scoring
Reduce need of data movement processes (ETL) to other platforms for predictive
analytics purposes
Support the full lifecycle of a real-time analytics solution on a single, integrated
system, combining transactional data, historical data, and predictive analytics
9. Slide 9 of 39
In-database analytics – Technical basics
Support for INZA function calls on IDAA
Set of stored procedures contained in the IBM Netezza In-Database Analytics
Package (INZA) available to be installed on the Accelerator.
Currently there are 19 functions that support:
Decision Tree
Regression Tree
Naive Bayes
K-means Clustering
TwoStep Clustering
Stored procedures use accelerator-shadow tables or accelerator-only tables as
input and create accelerator-only tables and data models as output
DB2 for z/OS stored procedures invoke the INZA function code on the Accelerator
itself. These procedures can be called from DB2 for z/OS client applications or
from SPSS Modeler 17 and later.
10. Slide 10 of 39
In-database Analytics – Technical basics
Only SPSS Modeler 17 and newer are supported, which take advantage
of the following 19 INZA stored procedures
dectree - Builds a Decision Tree model by growing and pruning a tree
grow_dectree - Builds a Decision Tree model
predict_dectree - Applies a Decision Tree model to generate classification predictions
prune_dectree - Prunes a previously built Decision Tree model
regtree - Builds a Regression Tree model by growing and pruning a tree
grow_regtree - Builds a Regression Tree model
prune_regtree - Prunes a previously built Regression Tree model
predict_regtree - Applies a Regression Tree model to generate regression predictions for
a dataset
naivebayes - Builds a Naive Bayes model
predict_naivebayes - Applies a Naive Bayes model to generate classification predictions
for a dataset
kmeans - Builds a Clustering model that clusters the input data into k centers. The centers
are calculated as the mean value of the nearest input data records
predict_kmeans - Applies a K-means Clustering model to cluster records of a dataset
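The k-means description above (k centers, each recomputed as the mean of its nearest records) can be illustrated with a minimal pure-Python sketch. This is not the INZA implementation, which runs in parallel inside the Accelerator. The function name `kmeans_1d`, the one-dimensional sample data and the fixed starting centers are assumptions made for illustration only.

```python
def kmeans_1d(values, centers, maxiter=5):
    """Toy 1-D k-means: assign each value to its nearest center,
    then move each center to the mean of its assigned values."""
    centers = list(centers)
    for _ in range(maxiter):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        new_centers = [
            sum(c) / len(c) if c else centers[i]  # keep an empty cluster's center
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged before maxiter was reached
            break
        centers = new_centers
    return centers


# Hypothetical data with three obvious groups (k=3, maxiter=5)
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
print(kmeans_1d(data, centers=[0.0, 10.0, 25.0]))  # [2.0, 11.0, 21.0]
```

A real run on the Accelerator would take its starting centers and distance measure from the procedure parameters rather than from hard-coded values.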
11. Slide 11 of 39
In-database Analytics – Technical basics
twostep - Builds a TwoStep Clustering model that first distributes the input data into a
hierarchical tree structure according to the distance between the data records, then
reduces the tree into k clusters. A second pass over the data associates the input data
records with the nearest cluster
predict_twostep - Applies a TwoStep Clustering model to score records of a dataset
split_data - Randomly splits the input data into two separated subsets
pmml_model - Stores the given analytics model as PMML document to a table
export_pmml - Exports the given analytics model as PMML document to a file, or it
exports a model from a PMML table to a file. If no PMML table exists containing the
PMML document for this model, one can be created automatically when requested.
Optionally, instead of writing to a file, the result can be returned by the procedure.
model_exists - Checks if the given model exists. The model can be searched in the current
or in another given database.
drop_model - Drops the given model. All managed tables of this model are also dropped
Many other functions exist in the INZA Analytics Package, but support for
these extended functions in IDAA is still in development.
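As a rough illustration of what split_data does conceptually (randomly dividing input rows into two disjoint subsets, for example for training and testing), here is a small sketch. It is not the INZA procedure and does not use its parameters. The names `fraction` and `seed` are hypothetical.

```python
import random


def split_data(rows, fraction=0.5, seed=42):
    """Randomly split rows into two disjoint subsets.
    `fraction` is the share of rows that goes to the first subset."""
    rng = random.Random(seed)  # seeded so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_data(range(10), fraction=0.7)
print(len(train), len(test))  # 7 3
```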
12. Slide 12 of 39
Netezza Analytics Reference Information
Developer’s Guide: https://ibm.biz/Bd4bRE
Reference Guide: https://ibm.biz/Bd4zdA
These guides are hosted on IBM’s developerWorks Community pages as PDF
downloads. They are also available for download from the IBM Fix Central site, along
with the install packages.
13. Slide 13 of 39
In-Database Analytics
Data Preparation and SPSS modeling in the Accelerator
Advantages:
• Allows fast model refreshes
• Better performance and reduced latency
• Ensures adequate scoring
• Scoring outside accelerator with SPSS Modeler Server Scoring Adapter for DB2 for z/OS
Diagram: Transaction Processing Systems (OLTP) with embedded scoring; data for transactional and analytical processing (Customer Transactions, Customer Data); Customer Txn data prep AOTs; Modeling; Model
14. Slide 14 of 39
IBM SPSS Modeler 17 - Stream Processing
• Simple stream to select some rows from a table
and display the result:
• Enable Stream Properties in "File"->"Stream
Properties"->"Optimization"
• Executed statement on the Accelerator STRIPER:
• SELECT T0."R_REGIONKEY" AS "R_REGIONKEY",
         T0."R_NAME" AS "R_NAME",
         T0."R_COMMENT" AS "R_COMMENT"
  FROM (SELECT * FROM STKNOL.REGION) T0
  WHERE NOT((LOCATE('A', T0."R_NAME", CODEUNITS32) = 1))
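The WHERE clause in the generated statement keeps only rows whose R_NAME does not start with 'A', because LOCATE returns the 1-based position of the first match (0 if there is none). A small Python sketch of that predicate follows. The region names are those of the standard TPC-H REGION table, which is an assumption about the content of STKNOL.REGION. The helper `locate` is illustrative, not a DB2 API.

```python
def locate(search, source):
    """Mimic SQL LOCATE: 1-based position of `search` in `source`, 0 if absent."""
    return source.find(search) + 1


# Assumed content of STKNOL.REGION (standard TPC-H region names)
regions = ["AFRICA", "AMERICA", "ASIA", "EUROPE", "MIDDLE EAST"]

# WHERE NOT (LOCATE('A', R_NAME) = 1) keeps names that do not start with 'A'
kept = [name for name in regions if not (locate("A", name) == 1)]
print(kept)  # ['EUROPE', 'MIDDLE EAST']
```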
15. Slide 15 of 39
IBM SPSS Modeler 17
Stream Processing with caching enabled
• With caching enabled on the SELECT node of the stream, SPSS creates
an accelerator-only table on the Accelerator
• Executed statements on the Accelerator:
• CREATE TABLE "SESSION".CLEMTMP79C49C041 ( "R_REGIONKEY" INTEGER,"R_NAME" VARCHAR(25),"R_COMMENT" VARCHAR(152) )
  IN ACCELERATOR "STRIPER" CCSID UNICODE
• INSERT INTO "SESSION".CLEMTMP79C49C041 ("R_REGIONKEY","R_NAME","R_COMMENT")
  SELECT T0."R_REGIONKEY" AS "R_REGIONKEY",T0."R_NAME" AS "R_NAME",T0."R_COMMENT" AS "R_COMMENT"
  FROM (SELECT * FROM STKNOL.REGION) T0 WHERE NOT((LOCATE('A', T0."R_NAME", CODEUNITS32) = 1))
• SELECT T0."R_REGIONKEY" AS "R_REGIONKEY",T0."R_NAME" AS "R_NAME",T0."R_COMMENT" AS "R_COMMENT"
  FROM "SESSION".CLEMTMP79C49C041 T0
Find more details here:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W494c1ca765dc_4cbe_a8cb_dc15fd30847c/page/Use%20of%20IBM%20DB2%20Analytics%20Accelerator%20in%20SPSS%20Modeler
Will be enhanced with modeling examples for DB2 Analytics Accelerator V5.1
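The caching pattern above has three steps: create a cache table, fill it with a single INSERT ... SELECT, then read from the cache. It can be tried out locally with Python's built-in sqlite3 module as a stand-in. This is only a sketch: SQLite has no IN ACCELERATOR clause, `instr()` replaces LOCATE, and the table name CLEMTMP and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for the source table STKNOL.REGION (sample rows are assumptions)
cur.execute("CREATE TABLE REGION (R_REGIONKEY INTEGER, R_NAME TEXT, R_COMMENT TEXT)")
cur.executemany(
    "INSERT INTO REGION VALUES (?, ?, ?)",
    [(0, "AFRICA", ""), (1, "AMERICA", ""), (3, "EUROPE", "")],
)

# 1. Create the cache table (on DB2 this would carry IN ACCELERATOR "STRIPER")
cur.execute("CREATE TABLE CLEMTMP (R_REGIONKEY INTEGER, R_NAME TEXT, R_COMMENT TEXT)")
# 2. Fill it with a single INSERT ... SELECT; instr() stands in for LOCATE
cur.execute(
    "INSERT INTO CLEMTMP SELECT R_REGIONKEY, R_NAME, R_COMMENT "
    "FROM REGION WHERE NOT (instr(R_NAME, 'A') = 1)"
)
# 3. Downstream nodes read from the cache table instead of re-running the filter
cached = cur.execute("SELECT R_REGIONKEY, R_NAME FROM CLEMTMP").fetchall()
print(cached)  # [(3, 'EUROPE')]
conn.close()
```

On the Accelerator the benefit is that steps 2 and 3 never move the rows off the appliance, which a local SQLite database cannot demonstrate.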
16. Slide 16 of 39
A large petroleum and energy products supplier did a pilot project with
SPSS Modeler.
Working together with IBM, they deployed IBM SPSS Modeler software
with IBM DB2 Analytics Accelerator for z/OS.
Using accelerated DB2 queries, the company now harnesses SPSS
predictive analytics insights to generate personalized sales suggestions
based on customers’ purchasing histories.
Pilot Project with SPSS Modeler
17. Slide 17 of 39
Pilot Project with SPSS Modeler
Data preparation (using AOTs) and SPSS modeling in the Accelerator
Stream: source (select) -> process -> process -> type node -> modeling -> nugget -> score -> output
Phases: Data Source Access and Preparation (source data), Modeling, Scoring (result data)
Without Accelerator: The SPSS source node selects the input data from DB2 for z/OS without any
acceleration, then all data preparation is done in SPSS Modeler, followed by the modeling algorithm in
SPSS Modeler; then the model is scored and the scoring results are inserted into a DB2 for z/OS result table.
With Accelerator:
Step 1 (DB2 Analytics Accelerator query acceleration): Source tables are accelerator-shadow tables and
the accelerator processes the (complex) select statement.
Step 2 (in-database transformation and analytics): Data preparation is pushed down into the accelerator,
producing an accelerator-only table for the type node.
Step 3 (in-database transformation and analytics): The INZA modeling algorithm is called and executed
in the accelerator on an accelerator-only table.
18. Slide 18 of 39
Results of Pilot Project with SPSS Modeler
Stream: source (select) -> process -> process -> type node -> modeling -> nugget -> score -> output
Phases: Data Source Access and Preparation (source data), Modeling, Scoring (result data)
Step 1 (IDAA query acceleration): Source tables are accelerated tables and IDAA processes the (complex) select statement.
• Significant acceleration of select statement in stream II: 10x faster
Step 2 (in-database transformation and analytics): Data preparation is pushed down into IDAA, producing an accelerator-only table for the type node.
• Data preparation in minutes, which was not possible before in stream II
• Significant acceleration of data preparation in streams I + II + III: 3-240x faster
Step 3 (in-database transformation and analytics): The IDAA/INZA modeling algorithm is called and executed in IDAA on an accelerator-only table.
Total acceleration for streams I + II + III: 3-23x faster
• Acceleration of modeling highly depends on data size.
Your mileage can and will vary!
19. Slide 19 of 39
Batch scoring with accelerated in-database predictive modeling
20. Slide 20 of 39
Prereqs for SPSS Support with DB2 for z/OS and IDAA/INZA
• IBM SPSS® Modeler 17.0 (or 18.0 now available) running in local mode or against an
SPSS Modeler Server installation.
• DB2 for z/OS Version 10 or later together with DB2 Analytics Accelerator for z/OS
Version 5.1, PTF 2
• IBM SPSS Data Access Pack V7.1 or other compatible ODBC Drivers
• License for DB2 Connect™ for System z®
• SQL generation and optimization enabled in SPSS Modeler
• IBM SPSS Modeler Scoring Adapter for zEnterprise® V17.0
• SPSS Modeler 17.0 Info:
https://www.ibm.com/support/knowledgecenter/SS3RA7_17.0.0/modeler_mainhelp_client_ddita/clementine/dbmining_zdb2_container.dita?lang=en
21. Slide 21 of 39
IBM SPSS Modeler 17 (and now 18)
Enabling integration with IBM DB2 Analytics Accelerator
1. Configure ODBC connection to DB2 for z/OS from SPSS Modeler (e.g. MySampleDB)
2. Edit the file odbc-db2-accelerator-names.cfg and associate an accelerator name with the configured
ODBC data source in the format:
• "<DSN>","<ACCELNAME>","<ENCODING>"
• Example (with default encoding UNICODE):
• "MySampleDB","STRIPER","UNICODE"
• Default location of the config file:
• Windows: C:\Program Files\IBM\SPSS\Modeler\17\config
• Linux: /opt/ibm/spss/modeler/17.0/config
3. Restart the SPSS Modeler Server and connect to the ODBC data source from the SPSS Modeler Client
• For the MySampleDB data source SPSS will show all accelerator-shadow tables of accelerator
STRIPER
• CURRENT QUERY ACCELERATION register set by SPSS
• If you want to work with tables available in DB2 for z/OS only then create a second ODBC data
source.
• Links:
• https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W494c1ca765dc_4cbe_a8cb_dc15fd30847c/page/Use%20of%20IBM%20DB2%20Analytics%20Accelerator%20in%20SPSS%20Modeler
• http://www-01.ibm.com/support/knowledgecenter/SS3RA7_17.0.0/clementine/dbmining_zdb2_enabling.html
Note: EBCDIC encoding is not supported until PTF 2, which is to be available soon.
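The entry format for odbc-db2-accelerator-names.cfg shown above is a quoted, comma-separated triple. The following sketch reads such entries with Python's csv module. The function name `parse_accelerator_names` is hypothetical, and the sketch assumes nothing about the file beyond the one-entry-per-line format shown on the slide.

```python
import csv


def parse_accelerator_names(text):
    """Parse lines of the form "<DSN>","<ACCELNAME>","<ENCODING>"
    into a dict keyed by ODBC data source name."""
    mapping = {}
    for row in csv.reader(text.splitlines()):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        dsn, accel, encoding = (field.strip() for field in row)
        mapping[dsn] = (accel, encoding)
    return mapping


sample = '"MySampleDB","STRIPER","UNICODE"\n'
print(parse_accelerator_names(sample))
# {'MySampleDB': ('STRIPER', 'UNICODE')}
```

A check like this can catch the kind of quoting mistake that stops SPSS Modeler from associating a data source with its accelerator.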
22. Slide 22 of 39
Use Data Studio to Load Data / Run Queries
23. Slide 23 of 39
Invoke INZA Stored Procedure directly from DataStudio
24. Slide 24 of 39
Example of INZA Stored Procedure Call
Data Studio or an Application with an embedded call to DB2 Stored Procedure:
CALL INZA.KMEANS('MYACCEL', 'model=adult_mdl,
intable=TPCH30M.CUSTOMER,
outtable=IWATEST.adult_out,
id=C_CUSTKEY, target=C_NATIONKEY, transform=S,
distance=euclidean, k=3, maxiter=5', ?, '');
Blue = procedure/algorithm to execute
Red = Accelerator to run the procedure on
Green = Algorithm parameters
Orange = Table information
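The call above has a regular shape: the accelerator name, one string of comma-separated key=value parameters, a parameter marker and an empty string. A small helper can assemble that text from a dictionary, as sketched below. The helper name `build_inza_call` is hypothetical. It only builds the statement text, does no quoting or escaping, and does not connect to DB2.

```python
def build_inza_call(procedure, accelerator, params):
    """Assemble the text of a CALL statement in the shape shown on the slide:
    CALL INZA.<PROC>('<accel>', 'key=value, ...', ?, '').
    Values are inserted as-is (no escaping), so pass trusted input only."""
    param_string = ", ".join(f"{key}={value}" for key, value in params.items())
    return f"CALL INZA.{procedure}('{accelerator}', '{param_string}', ?, '')"


call = build_inza_call(
    "KMEANS",
    "MYACCEL",
    {
        "model": "adult_mdl",
        "intable": "TPCH30M.CUSTOMER",
        "outtable": "IWATEST.adult_out",
        "id": "C_CUSTKEY",
        "target": "C_NATIONKEY",
        "transform": "S",
        "distance": "euclidean",
        "k": 3,
        "maxiter": 5,
    },
)
print(call)
```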
25. Slide 25 of 39
Example of DB2 SQL Stored Procedure wrapper
26. Slide 26 of 39
Installation and setup
Installation package can be downloaded from Fix Central
Described in Knowledge Center:
https://www-01.ibm.com/support/knowledgecenter/SS4LQ8_5.1.0/com.ibm.datatools.aqt.doc/installmanual/concept/c_idaa_inst_analytics.html
27. Slide 27 of 39
Retrieving Analytics installation package from Fix Central
Use the following URL:
http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=ibm~Information%2BManagement&product=ibm/Information+Management/Netezza+Applications&release=ANALYTICS_IDAA_3.2&platform=All&function=fixId&fixids=3.2.1.0-IM-Netezza-ANALYTICS-IDAA-fp105780&includeSupersedes=0
28. Slide 28 of 39
Download of installation files from Fix Central
29. Slide 29 of 39
Additional sources of Information
• IBM DB2 Analytics Accelerator for z/OS V5.1.0 Release Notes:
http://www-01.ibm.com/support/docview.wss?uid=swg27047096&myns=swgimgmt&mynp=OCSS4LQ8
• Known issues with IBM Netezza Analytics 3.2.1 for System z:
http://www.ibm.com/support/docview.wss?uid=swg27047149
30. Slide 30 of 39
Accelerator-only tables
Supporting in-database transformation and multi-step
processing
Introduction of Accelerator-only tables (AOT)
to store intermediate or final results of data
transformation or reporting processes
Accelerate in-database data transformations and data movement processes
Reduced need of data movement processes to other platforms for data
transformation purposes
Enables multi-step reporting on the Accelerator
Saves disk space and CPU cost on z Systems currently used for transformations and
reporting steps
Allow data preparation steps for data mining and other advanced analytics to
execute on the Accelerator
31. Slide 31 of 39
Accelerator-only tables – Technical basics
AOTs are created and dropped using DB2 DDL statements (CREATE; DROP)
• Accelerator must be started
• QUERYACCELERATION behavior may have any value during CREATE/DROP
• Syntax:
CREATE TABLE MYTABLE (...) IN ACCELERATOR <ACCEL1>;
DROP TABLE MYTABLE;
Recommended to create a database in DB2 to be used for the AOTs
• CREATE TABLE MYTABLE (...) IN ACCELERATOR <ACCEL1> IN DATABASE
MYDB;
• Usual authorization necessary to create objects in database
Queries using AOTs can only run on the Accelerator
• QUERYACCELERATION behavior must be set to ENABLE/ELIGIBLE/ALL
AOTs can be subject to INSERT/UPDATE/DELETE operations on other accelerated tables,
archived tables, or AOTs
Dynamic and static SQL can be used with AOTs
32. Slide 32 of 39
• Recommended to create a database in DB2 to be used for the AOTs
– CREATE DATABASE MYDB;
– CREATE TABLE MYTABLE (...) IN ACCELERATOR <ACCEL1> IN
DATABASE MYDB;
– DROP TABLE MYTABLE;
– DROP DATABASE MYDB;
– Authorization necessary to create objects in database
• Each single CREATE/DROP statement must be committed
– No other statements are allowed to run in these transactions
• Multiple I/U/D statements in one transaction are only possible if all statements
target AOTs on the same Accelerator
Accelerator-Only tables – Usage Notes
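The rule that each CREATE/DROP must be committed on its own, with no other statement in the same transaction, can be sketched as a small helper that commits after every DDL statement. The connection below is a recording stand-in rather than a real DB2 driver. The names `run_aot_ddl` and `RecordingConnection`, the accelerator name ACCEL1 and the column C1 are illustrative assumptions.

```python
def run_aot_ddl(conn, statements):
    """Run each CREATE/DROP in its own unit of work: execute, then commit,
    so no other statement shares the transaction."""
    cur = conn.cursor()
    for stmt in statements:
        cur.execute(stmt)
        conn.commit()


class RecordingConnection:
    """Stand-in for a DB-API connection; it only records what it is asked to do."""

    def __init__(self):
        self.log = []

    def cursor(self):
        return self

    def execute(self, stmt):
        self.log.append(stmt)

    def commit(self):
        self.log.append("COMMIT")


conn = RecordingConnection()
run_aot_ddl(conn, [
    "CREATE DATABASE MYDB",
    "CREATE TABLE MYTABLE (C1 INTEGER) IN ACCELERATOR ACCEL1 IN DATABASE MYDB",
])
print(conn.log)
```

With a real driver, the same helper would need autocommit switched off so that the explicit commits define the transaction boundaries.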
33. Slide 33 of 39
AOTs in Data Studio
• How do AOTs appear in Data Studio?
• A different icon in front of the table distinguishes AOTs from accelerated and
archived tables
• Fewer operations are possible for AOTs (Load, Switch Acceleration, Storage
Saver, …)
• The ‘Last Load’ column shows “Operational”
34. Slide 34 of 39
AoT and special register support in QMF 11.2 for z/OS
Save query results to AOTs
• Available in QMF for z/OS V11.2 (GA 4th of September 2015)
• Full support in TSO client
• Current limitations in other clients, full support in plan for a future fix pack
• Value:
• Save intermediate results temporarily on the Accelerator as part of a multi-step
process
• Persist a query result for later accelerated processing
Global variables
• DSQEC_SAV_ALLOWED – Controls whether users can save data to a new table in the database
or in an Accelerator
0 – Disable Save Data
1 – Enable Save Data to database tables only
2 – Enable Save Data to AOT only
3 – Enable Save Data to either database or AOTs (database default)
4 – Enable Save Data to either database or AOTs (accelerator default)
• DSQEC_SAV_ACCELNM – Contains the default name of the Accelerator to be used when
creating AOTs from QMF commands (e.g. SAVE DATA)
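The five DSQEC_SAV_ALLOWED settings amount to a lookup of which save targets are enabled and which one is the default. The sketch below models only that table. How QMF itself resolves a SAVE DATA command is not described on the slide, so the function `save_target` and its behavior are assumptions for illustration.

```python
# Values of DSQEC_SAV_ALLOWED as listed on the slide:
# value -> (targets allowed for Save Data, default target)
SAV_ALLOWED = {
    0: (set(), None),
    1: ({"database"}, "database"),
    2: ({"accelerator"}, "accelerator"),
    3: ({"database", "accelerator"}, "database"),
    4: ({"database", "accelerator"}, "accelerator"),
}


def save_target(sav_allowed, requested=None):
    """Return where a save would land, or raise if the setting forbids it."""
    allowed, default = SAV_ALLOWED[sav_allowed]
    target = requested or default
    if target not in allowed:
        raise PermissionError(f"saving to {target!r} is not enabled")
    return target


print(save_target(4))                 # accelerator
print(save_target(3))                 # database
print(save_target(3, "accelerator"))  # accelerator
```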
35. Slide 35 of 39
QMF V11.2 enhancements to support AoTs (Cont.)
Commands with syntax enhancement, each followed by its description:
Command: SAVE DATA AS tabname (ACCELERATOR accelname
Description: Saves data as accelerator-only table tabname in accelerator accelname
Command: SAVE DATA AS tabname (SPACE name
Description: Saves data as database table tabname in database and table space specified by name
Command: IMPORT TABLE tabname FROM datasetOrFile (ACCELERATOR accelname
Description: Imports table data into accelerator-only table tabname in accelerator accelname
Command: IMPORT TABLE tabname FROM datasetOrFile (SPACE name
Description: Imports table data to database table tabname in database and table space specified by name
Command: RUN QUERY qname (TABLE tabname ACCELERATOR accelname
Description: Runs a query and saves the result directly into accelerator-only table tabname in accelerator accelname
Command: RUN QUERY qname (TABLE tabname SPACE name
Description: Runs a query and saves the result directly into database table tabname in database and table space specified by name
New syntax enhancement supported in TSO client only so far. For other clients, create the AOTs separately
before using the SAVE DATA, RUN QUERY or IMPORT commands.
36. Slide 36 of 39
Sample QMF procedures for multi-step reporting
• Runs in DB2; SAVE DATA creates regular DB2 table
RUN QUERY BMB.QUERY_ACCEL_NONE
RUN QUERY BMB.DEMO1
SAVE DATA AS BMB.PRICE_PUB
RUN QUERY BMB.DEMO1A
SAVE DATA AS BMB.PRICE_NONPUB
RUN QUERY BMB.DEMO2
• Runs on Accelerator; SAVE DATA creates AOT; multiple SQL statements to SELECT data and INSERT data
RUN QUERY BMB.QUERY_ACCEL_ELIGIBLE
SET GLOBAL (DSQEC_SAV_ALLOWED=4
SET GLOBAL (DSQEC_SAV_ACCELNM=DEMOIDAA
RUN QUERY BMB.DEMO1
SAVE DATA AS BMB.PRICE_PUB_AOT
RUN QUERY BMB.DEMO1A
SAVE DATA AS BMB.PRICE_NONPUB_AOT
RUN QUERY BMB.DEMO3
• Runs on Accelerator; RUN QUERY creates AOT; single SQL statement to INSERT FROM SELECT data
RUN QUERY BMB.QUERY_ACCEL_ELIGIBLE
SET GLOBAL (DSQEC_SAV_ALLOWED=4
SET GLOBAL (DSQEC_SAV_ACCELNM=DEMOIDAA
RUN QUERY BMB.DEMO1 (TABLE=BMB.PRICE_PUB_AOT
RUN QUERY BMB.DEMO1A (TABLE=BMB.PRICE_NONPUB_AOT
RUN QUERY BMB.DEMO3
37. Slide 37 of 39
Large Insurance Company
Problem Query - Focus Batch
BEN PERIOD QUERY1
TEST.FOCEXEC.DATA(BEN_Period_Query1)
Additional Notes: Ran off-peak in Job Class R for almost exactly ten hours.
It returned 172,097 rows. 10M+ rows on the largest table referenced.
26.93 CPU minutes, 598.39 clock minutes.
TABLE FILE Example1 /* Part 1 of 3……other parts not shown……. */
SUM
CORP_CD
PROD_CD
TYP_BEN_CD
SPECI_BEN_CD
DEP_AGE_BEN_PRD_CD
BY CORP_CD NOPRINT
BY PROD_CD NOPRINT
BY TYP_BEN_CD NOPRINT
BY SPECI_BEN_CD NOPRINT
WHERE BEN_CAN_DT GT '2016-01-11' AND
(DEP_AGE_BEN_PRD_CD EQ 'C' OR
DEP_AGE_BEN_PRD_CD EQ 'Y')
AND CORP_CD EQ '2'
AND (PROD_CD LT '30' OR PROD_CD GT '60')
ON TABLE HOLD AS TEMP1
END
JOIN CLEAR
JOIN CORP_CD AND PROD_CD AND TYP_BEN_CD AND SPECI_BEN_CD IN TEMP1
TO ALL
CORP_CD AND PROD_CD AND TYP_BEN_CD AND SPECI_BEN_CD IN Example1
END
38. Slide 38 of 39
Focus Re-write using QMF (Run Query) to Save
Results to AoT
The RUN QUERY command saves the query result faster into an accelerator-only
table than the SAVE DATA AS command, because the RUN QUERY command
runs the query and saves the result into a table using a single INSERT INTO
SELECT FROM SQL statement.
TEST.Ben_Period_Query1 /* example part1 */
SELECT SUM(CORP_ID), SUM(PROD_CD), SUM(TYP_BEN_CD), SUM(DEP_AGE_BEN_PRD_CD), 'CORP_CD'
FROM (DB2 table)
WHERE BEN_CAN_DT > '2016-02-10'
AND (DEP_AGE_BEN_PRD_CD = 'C' OR DEP_AGE_BEN_PRD_CD = 'Y')
AND CORP_CD = '2' AND (PROD_CD < '30' OR PROD_CD > '60')
GROUP BY CORP_CD, PROD_CD, TYP_BEN_CD, SPECI_BEN_CD
Resulting QMF Example
RUN QUERY TEST.BeneFit_Period_Query1 (TABLE TEST.Ben_Period_QRY1AGG_AOT
39. Slide 39 of 39
Large Insurance Company
Results from using IDAA AoT Support in QMF
Example #1: QMF Batch
• QMF Baseline (z13 – DB2): 5 minutes elapsed. Results: 260 rows
• IDAA: 1 second elapsed (300x faster)
• This QMF PROC runs 24 queries in succession. We modified the PROC to take advantage of the
Accelerator Only Tables (with Common Table Expressions) feature in IDAA.
Example #2: FOCUS BATCH REWRITE to QMF
• FOCUS BATCH Baseline (z13 – DB2): 167.09 minutes elapsed. Results: 171,670 rows
• IDAA: 10 seconds elapsed (1003x faster)
• Query was rewritten in SQL/QMF (insert sub-select) to take advantage of the Accelerator Only Table
feature in IDAA (1 second). Last step (data grouping/consolidation) was completed in FOCUS (9 seconds).
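The quoted speed-up factors follow directly from the elapsed times: baseline elapsed time divided by accelerated elapsed time, both in seconds. A short check of the arithmetic (the function name `speedup` is illustrative):

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline elapsed time to accelerated elapsed time."""
    return baseline_seconds / accelerated_seconds


example1 = speedup(5 * 60, 1)        # 5 minutes vs 1 second
example2 = speedup(167.09 * 60, 10)  # 167.09 minutes vs 10 seconds
print(round(example1), round(example2))  # 300 1003
```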
40. Slide 40 of 39
Questions?
Editor's Notes
IDAA – An extension of DB2 for z/OS, utilizing a massively parallel processing engine as an add-on appliance.
In simplest terms, it provides three kinds of performance and usability enhancements:
Ability to shadow DB2 for z/OS data, and accelerate query workloads against that data.
Online archiving, by pushing historical/static data to the IDAA appliance, saving DB2 storage.
Creating Accelerator Only Tables (AoTs). This enables multi-step processing on the high-performance engine AND facilitates what we will talk more about in the rest of this discussion, which is the ability to run in-database analytics and data modeling operations (INZA functions).
Interfaces:
Using Data Studio via the IDAA Studio plug-in to access DB2 for z/OS AND IDAA, to work with the INZA functions and Accelerator Only Tables.
Also, SPSS Modeler 17 and, as of March 2016 GA, SPSS Modeler 18. With SPSS Modeler we have the ability to leverage the new support for Netezza In-Database Analytics (INZA) functions to do data modeling right on the IDAA appliance.
SPSS Modeler is a tool that allows users to integrate predictive analytics with decision management. It allows real-time scoring and improves optimization in your organization's processes and operational systems. SPSS Modeler helps your users and systems make the right decision every time.
Brief explanation of stand-alone PDA/Netezza vs. IDAA.
PDA supports a vast collection of (260+) statistics, data mining and data modeling algorithms that can be invoked against the Netezza database.
This built-in library of statistical and mathematical functions supports a breadth of analytic tools and programming languages. These scalable in-database analytic functions execute analytics in parallel, while abstracting away the complexity for developers, users, and DBAs. Also included are in-database geospatial analytics that are compatible with the industry-standard Esri GIS formats which enable easy integration into existing geospatial analytic environments.
The capabilities provided by the INZA package are aimed at users and developers interested in developing and using these analytics algorithms to perform research or other business-related activities. The availability of these functions brings data mining functionality to IDAA, enabling data mining on large data sets, taking advantage of the computational power and parallelization mechanisms provided by the Accelerator.
Most currently available data mining tools suffer significant performance limitations when applied to large data sets. These limitations may be two-fold:
► space: if system memory is used for storing data sets and auxiliary data structures to achieve high performance, the limited memory size and/or address space prevents applying data mining tools to large data sets.
► time: if external storage is used for storing data sets or auxiliary data structures to overcome memory limitations, the resulting performance decline makes application of data mining tools to large data sets impractical.
Overcoming both these limitations, the parallel architecture of the IDAA environment enables high-performance computation on large data sets, making it the ideal platform for large scale data mining applications. Mining large data sets might seem unnecessary, as good data mining models can often be created from data samples. However, the widespread practice of using small data samples when working with large data sets is typically a matter of necessity, not choice. When highly reliable data mining results are required, no substantial data portions should be discarded. For complex data mining tasks, creating data samples of an appropriate size and structure may be a non-trivial task. The IBM Netezza In-Database Analytics package provides the tools necessary for mining the spectrum of data set sizes.
Decision Tree Modeling
Regression Tree Modeling
Naïve Bayes
K-means and TwoStep Clustering
Additional supporting functions
Developer's Guide: A comprehensive guide not only to INZA analytics functions, but to data modeling and data mining in general. 300+ pages.
Models can be pushed back to DB2 on z/OS and used for real-time scoring, etc.
PTF provides support for using EBCDIC data. Prior to PTF2, all of the INZA base functions on the Netezza server use UNICODE parameters and output tables.
Focus - Information Builders
BI Tool / Database Programming Language