David Rice
IzODA Chief Iteration Manager & Technical Lead of Scale Adoption
drice@us.ibm.com
October 2018
IBM Open Data Analytics for z/OS: z Conference
© 2017 IBM Corporation
Trends in the industry: Increasing focus on Real Time
• Pervasiveness of Analytics
• Business growth
• Risk Mitigation
• Need for Real-Time
• Insight at point of impact
Source & Full Forrester paper: https://www-03.ibm.com/systems/z/solutions/real-time-analytics/data-analysis.html
z/OS
• DB2, IMS, VSAM
• Transactional Data from Operational Systems
• History Data
• Warehouses
Mobile, Chat, Call Center, Social / Public
Data Scientist
Distributed
• Warehouses
• ODS
• Client Facing Apps
• Departmental Datamarts
Today’s Typical Current State: migrate all endpoint data to a data ‘lake’, then analyze
• Using an ETL-only approach results in costly side effects: risk, reduced efficiency and missed opportunity
Challenges
• Data / Analytic Currency
• Increased security, governance, privacy risk
• Longer ROI for analytic insights
• Added development costs
• Data coherency of the lake
• Ability to quickly adapt to suit analytical needs (new data sources, schemas, freshness, etc.)
Where do enterprise transactions & data originate?
Data Gravity: Co-locate analytics with data based on value, volume, rate of change, security…
• 92 of world’s top 100 banks
• 10 out of the top 10 insurance organizations
• 87% of all credit card transactions and nearly $8 trillion payments a year
• More than 30 billion transactions a day, more than number of Google searches
• 64% of Fortune 500
• 80% of world’s corporate data
Use Cases Well-Aligned with Analytics on IBM Z
• Predominance of data originates on IBM Z, z/OS (transactions, member info,…)
• Data volume is large, distilling data provides operational efficiencies
• Real-time / near real-time insights are valuable
• Performance matters for variety of data on and off IBM Z
• Core transactional systems of record are on IBM Z
• Data Gravity
• Security / data privacy needs to be preserved
Podcast: http://www.ibmbigdatahub.com/podcast/making-data-simple-what-data-gravity
Cross Industry Use Case: Modernization, Data Exploration, Hybrid Integration
DB2 z/OS, VSAM, IMS, Hadoop
z/OS Result Store:
• Frequent Refresh
• Ease of Integration
• TCO advantage
• Easily blend data from Z and non-Z
• Limit data movement
• Enrich reporting and ad-hoc queries
• Leverage modern, open technologies, skills
Warehouses
Optimized Data Layer
Dashboards, Spreadsheets
Examples: Cognos, Tableau
• More current data leveraged across entire infrastructure
• Reduced raw data movement costs
• Security & data privacy advantages
IBM Open Data Analytics for z/OS
Existing Data Lakes
Business Interfaces
Cloud Platforms
Standard Interfaces
Insurance: Real-Time State of the Business Views
Real-Time Insights
Value: Real-time visualization of the state of the business across clients, industries, geographies, products, etc., to determine profitability, assess risk, and so on. Potential to show the current view alongside 15-, 30-, 60-, and 90-day views for trend analysis.
How: Analyze data in place across various systems, using both internal & external sources
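The 15-30-60-90 day views mentioned above can be sketched as trailing-window totals computed against one "as of" date. This is only an illustration: the record layout (a date plus an amount), the sample values, and the function name `trailing_views` are assumptions made for the example and do not come from the presentation.

```python
from datetime import date, timedelta

def trailing_views(records, as_of, windows=(15, 30, 60, 90)):
    """Sum amounts per trailing window ending at as_of (hypothetical helper)."""
    views = {}
    for days in windows:
        start = as_of - timedelta(days=days)
        # A record counts toward a window if it is newer than the window
        # start and not later than the as-of date.
        views[days] = sum(amount for day, amount in records if start < day <= as_of)
    return views

# Made-up sample data: (days before as_of, amount).
as_of = date(2018, 10, 1)
sample = [(1, 100.0), (20, 50.0), (45, 25.0), (80, 10.0), (120, 5.0)]
records = [(as_of - timedelta(days=n), amount) for n, amount in sample]

views = trailing_views(records, as_of)
for days, total in views.items():
    print(f"last {days:>2} days: {total:.2f}")
```

Each window includes the shorter ones, so the totals can be read side by side to spot a trend.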
Client 1:
• Life insurance coverage
• Accident coverage
Client 2:
• Vision Coverage
• Accident Risk
Client 3:
• Dental Coverage
• Home coverage
Client 4:
• Disability coverage
• Life insurance coverage
Profitability View
Activity View
weather
geopolitical
By Industry, product
Use Case - Banking: Enhanced Card Fraud Detection
Existing Rules Engine
• Apply in-house rules for detection
• Invoke 3rd party scores (FICO)
• Apply custom scoring
• Determine Disposition
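The four rules-engine steps listed above can be sketched as a small pipeline. Everything concrete here is hypothetical: the rules, the thresholds, the scoring formula, and the stubbed third-party score are placeholders chosen for illustration, not the logic of any real card-processing system or scoring vendor.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    country: str
    home_country: str

def in_house_rules(tx):
    """Step 1: apply in-house detection rules (hypothetical rules)."""
    hits = []
    if tx.amount > 5000:
        hits.append("large_amount")
    if tx.country != tx.home_country:
        hits.append("foreign_country")
    return hits

def third_party_score(tx):
    """Step 2: stand-in for an external scoring call; returns a fixed risk in [0, 1]."""
    return 0.2

def custom_score(tx, rule_hits):
    """Step 3: hypothetical custom score built from rule hits and amount."""
    return min(1.0, 0.3 * len(rule_hits) + tx.amount / 50000)

def determine_disposition(tx):
    """Step 4: combine the scores and pick a disposition (thresholds assumed)."""
    hits = in_house_rules(tx)
    risk = max(third_party_score(tx), custom_score(tx, hits))
    if risk >= 0.8:
        return "decline"
    if risk >= 0.4:
        return "review"
    return "approve"

print(determine_disposition(Transaction(40.0, "US", "US")))
print(determine_disposition(Transaction(9000.0, "FR", "US")))
print(determine_disposition(Transaction(20000.0, "FR", "US")))
```

Keeping each step in its own function makes it easier to swap a hand-written score for a refreshed model without touching the disposition logic.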
IBM z/OS
VSAM, DB2, IMS
Core Card Process
• Verify, augment data
• Manage workload
• Ensure scale
• Likely: CICS, IMS
Today: Models refreshed periodically, deployment path requires custom coding
Challenge: Emerging fraud pattern detection delayed, model deployment & refresh not agile
Benefit: Current data for modeling, intra-day model refresh, flexibility to add new data via configuration
Point of sale systems
ETL
Warehouse
Warehouse
DB2, IMS, VSAM
Real Time Analytics: leverage in-place, current access to a variety of data sources
• Create Models
• Apply Data Science
• Refresh Models
• Schedule
Deployment
Coding
Deploy
IBM z/OS
Example: Real-Time ACH Analytics for Banking Clients
ACH Processing:
• ACH Payment origination & receipt
• Interaction with Automated Clearing House verification
• Implementation of NACHA rules
• Defined data formats for exchange of info
IBM z/OS
ACH format
ACH format
ACH format
“All Items”: ACH, POS, WEB, etc.
Batch Posting Process
Future: Real Time Process
Real-Time Insights
Real-Time Analytics
• Real-time payment and ACH analytics on RT payments
• Increased granularity of compliance / risk / fraud analytics
• Integration across ACH and core banking systems
Today: Largely post-processed; multi-day verification of ACH rejects and fraud / risk assessment; delay in insights
Challenge: Same-day payments create a requirement to address rejects and fraud immediately, in real-time scope
Benefit: In-place, real-time analytics of ACH data for compliance / fraud risk to address same-day payments, accessing source data as well as off-platform data via federation
Warehouse
Example: Federated Analytics, Access to Wide Variety of Data: Modernization, Exploration, Integration
DB2 z/OS, IMS, VSAM
z/OS: Optimized Analytics Runtime
Enterprise Data Environments
• Leverage most current data, in place
• Flexible structure, rich analytics runtime co-located with data
• TCO advantages
• Leverage leading open source technologies & skills
• Enable advanced solutions from IBM and partners
• Integrate and differentiate
Apache Spark for z/OS
Python / Anaconda
Open Source stack
Optimized Data Layer
z/OS
Warehouses, Hadoop
Distributed
Solutions:
• IBM Machine Learning for z/OS
• Solutions from SIs & Business Partners
• Other IBM based solutions & Client Solutions
Optimized Data Layer: Integrated access to DB2, IMS, IMS raw read, VSAM, PS, PDSE, ADABAS, IDMS, CICS Queues, Virtual Tape, SMF, Syslog, Oracle Enterprise, Teradata, HDFS… etc.
Abstracted access to z/OS Data
• from VSAM
• from DB2
Modern Analytic Frameworks & Tools
Value: Reduce Risk → Simplified Data Privacy via Configuration

Original Table
Cust_ID     Avg Daily TX  Education  Education Group  Social Security Number  Investment  Avg TX AMT  Churn Label  Age
1009530860  3.9145        2          BS               123-84-9015             114368      2090.32     N            84
1009544000  4.28          2          BS               122-49-3821             90298       2095.04     N            44
1009534260  1.23          2          BS               931-29-0612             94881       1723.59     Y            23
1009574010  0.95          2          BS               491-19-2102             112099      1297.41     Y            24
1009578620  2.73          5          DR               813-90-4183             84638       1333.18     N            67
Column annotations: Features; Features; Not Feature; Not Feature, PII

View of Table Visible to Data Scientists
Cust_ID     Avg Daily TX  Education  Education Group  Investment  Avg TX AMT  Churn Label  Age
1009530860  3.9145        2          BS               114368      2090.32     N            84
1009544000  4.28          2          BS               90298       2095.04     N            44
1009534260  1.23          2          BS               94881       1723.59     Y            23
1009574010  0.95          2          BS               112099      1297.41     Y            24
1009578620  2.73          5          DR               84638       1333.18     N            67

Sensitive Data
– View presented to data science teams can be different than original
– Via UI configuration, obfuscate or remove select columns
– Configure for varying levels of access based on PII designations
– Flexibility for data protection
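The idea of deriving a restricted view from the original table can be sketched in plain Python using the sample rows from this slide. The policy dictionary, the function name `apply_policy`, and the digit-masking rule are assumptions made for illustration; the product applies such rules through UI configuration, which this sketch does not attempt to reproduce.

```python
# Column names and rows copied from the slide's "Original Table".
COLUMNS = ["Cust_ID", "Avg Daily TX", "Education", "Education Group",
           "Social Security Number", "Investment", "Avg TX AMT",
           "Churn Label", "Age"]
ROWS = [
    (1009530860, 3.9145, 2, "BS", "123-84-9015", 114368, 2090.32, "N", 84),
    (1009544000, 4.28, 2, "BS", "122-49-3821", 90298, 2095.04, "N", 44),
    (1009534260, 1.23, 2, "BS", "931-29-0612", 94881, 1723.59, "Y", 23),
    (1009574010, 0.95, 2, "BS", "491-19-2102", 112099, 1297.41, "Y", 24),
    (1009578620, 2.73, 5, "DR", "813-90-4183", 84638, 1333.18, "N", 67),
]

# Hypothetical policy: column name -> "remove" or "obfuscate".
POLICY = {"Social Security Number": "remove"}

def mask_digits(value):
    """Replace every digit with 'X', keeping separators (assumed masking rule)."""
    return "".join("X" if ch.isdigit() else ch for ch in str(value))

def apply_policy(columns, rows, policy):
    """Return (columns, rows) for the restricted view described by policy."""
    keep = [i for i, name in enumerate(columns) if policy.get(name) != "remove"]
    view_cols = [columns[i] for i in keep]
    view = []
    for row in rows:
        view.append(tuple(
            mask_digits(row[i]) if policy.get(columns[i]) == "obfuscate" else row[i]
            for i in keep
        ))
    return view_cols, view

view_columns, view_rows = apply_policy(COLUMNS, ROWS, POLICY)
print(view_columns)
print(view_rows[0])
```

Switching the policy value to "obfuscate" keeps the column but masks its digits, which corresponds to the "obfuscate or remove" choice in the bullets above.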
Apache Spark z/OS: Cost Efficiency & Powerful Data-in-Place Analytics
• Spark on z/OS joins multiple data types for fast, complete analytics, without moving the data
• Test of >350M rows read, parsed, analyzed, and summarized (approx. 60 GB)
• Average Spark processing time of about 3 minutes on a single z13 LPAR with 1 GP, 13 zIIPs and 512 GB memory:
– DB2: 2.35 minutes (4.1 minutes maximum)
– Flat File: 2.95 minutes (3.2 minutes maximum)
– VSAM: 2.80 minutes (3.3 minutes maximum)
DB2
z/OS
Flat file
VSAM
z/OS
JDBC
JDBC
JDBC
88% zIIP offload
97% zIIP offload
97% zIIP offload
Use Case: Large Data Pull: bring back all 350 million rows from each data source, touch each data element and run Spark aggregation across all data
Source: IBM Competitive Project Office
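As a quick sanity check on the figures above, the per-source averages can be turned into a mean elapsed time and an approximate throughput. The row count is taken as exactly 350 million, which is an assumption: the slide says ">350M", so the throughput values are lower bounds.

```python
# Average elapsed minutes per source, as quoted on the slide.
avg_minutes = {"DB2": 2.35, "Flat File": 2.95, "VSAM": 2.80}

# Assumed row count: the slide states ">350M", so this is a lower bound.
ROW_COUNT = 350_000_000

mean_minutes = sum(avg_minutes.values()) / len(avg_minutes)
rows_per_second = {
    source: ROW_COUNT / (minutes * 60)
    for source, minutes in avg_minutes.items()
}

print(f"Mean of the three averages: {mean_minutes:.2f} minutes")
for source, rate in rows_per_second.items():
    print(f"{source}: about {rate / 1e6:.2f} million rows per second")
```

The mean of the three averages is about 2.7 minutes, consistent with the rounded "3 minutes" headline, and each source works out to roughly 2.0 to 2.5 million rows per second under the stated assumption.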
Apache Spark z/OS: Cost Efficiency & Powerful Data-in-Place Analytics
Trade 166GB
Brokerage aggregation query workload across Trades tables from 3 exchanges (over 5 billion trades, 500GB)
67% lower TCA* for systems compared
z13-606 + 11 zIIPs
z13-605 Competitor x86 System
Intel E5-2697 v2 2.7GHz 12co
$2,105,990 (3 yr. TCA)
$697,106 (3 yr. TCA)
* 3-Year TCA includes 3-year US prices for Hardware, Software, Maintenance and Support as of 05/16/2016. Price and performance for x86 environment includes cost of ETL and elapsed time to transfer the data. This is based on an IBM internal study designed to replicate a typical IBM customer workload usage in the marketplace.
Linux
Apache
Spark
Parquet
z/OS
CICS
DB2
z/OS
CICS
DB2
Apache
Spark
ETL
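The "67% lower TCA" figure can be reproduced from the two 3-year totals on this slide. Which configuration each dollar amount belongs to is not recoverable from the flattened text, so the sketch only assumes that the percentage compares the lower total against the higher one; the variable names are illustrative.

```python
# 3-year TCA totals quoted on the slide (US dollars).
tca_higher = 2_105_990
tca_lower = 697_106

savings = tca_higher - tca_lower
reduction = savings / tca_higher  # fraction by which the lower total undercuts the higher

print(f"Absolute difference: ${savings:,}")
print(f"Relative reduction: {reduction:.1%}")
```

The ratio comes out just under 67%, which matches the rounded headline number.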
Minimizing Impact to Production
• Current Challenges:
– Status quo ETL processes consume GP MIPS and often run during batch window cycles, which can cause issues for client batch workloads
– Off-platform analytics that access z/OS data often go through standard subsystem interfaces for DB2 & IMS, interfering with buffer pools and resulting in lower zIIP eligibility
• Analytics on z/OS has unique features to minimize impact to production workloads:
1. Limit analytic workloads’ access to resources by capping zIIPs & memory; leverage WLM classifications
2. Leverage unique “Raw-Read” features: avoid impact to IMS & DB2 subsystems, high zIIP eligibility
3. Leverage unique DataFrame Store: separate well-formed analytics, persist results, enable off-platform ad-hoc analytics against the DataFrame store
4. Analytic workloads are all read-only (no locks held)
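The pattern in item 3 (run a well-formed aggregation once, persist the result, and point ad-hoc queries at the persisted result rather than at the source) can be sketched with Python's built-in sqlite3 module. SQLite here is only a stand-in for both the operational source and the result store; the table names and sample rows are invented for the example, and none of this is the product's actual interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the operational source data (made-up rows).
conn.execute("CREATE TABLE transactions (account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("A", 10.0), ("A", 15.0), ("B", 40.0), ("C", 5.0)],
)

# Step 1: run the well-formed aggregation once and persist the result.
conn.execute(
    """
    CREATE TABLE result_store AS
    SELECT account, COUNT(*) AS tx_count, SUM(amount) AS total
    FROM transactions
    GROUP BY account
    """
)

# Step 2: ad-hoc queries read only the persisted result, not the source.
top_account = conn.execute(
    "SELECT account, total FROM result_store ORDER BY total DESC LIMIT 1"
).fetchone()
account_count = conn.execute("SELECT COUNT(*) FROM result_store").fetchone()[0]

print(top_account, account_count)
```

Because exploratory queries touch only `result_store`, repeated ad-hoc analysis adds no load to the source tables after the initial aggregation.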
Jupyter Demo
Useful Links
• Machine Learning and z Systems: https://www.youtube.com/watch?v=T2HtyNX7aHc
• Machine Learning Launch Event interview: https://www.youtube.com/watch?v=WHenFAa6iPw&feature=youtu.be&list=PLenh213llmca-QogcjfSW9RHPtNye9N_p
• Gaining Agility with Spark Analytics on z Systems: https://www.youtube.com/watch?v=Y7HQbKBR_l4
• YouTube video of IBM Edge Analytics Segment featuring State of California and Jack Henry Associates: https://www.youtube.com/watch?v=ws9rLnXyb3g&feature=youtu.be (Analytics segment starts 26:25 into the video)
• IBM z/OS Platform for Apache Spark: https://www-03.ibm.com/systems/z/os/zos/apache-spark.html
• IBM Knowledge Center: z/OS Platform for Apache Spark: https://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.azk/azk.htm
• IBM Knowledge Center: IBM Machine Learning for z/OS: https://www.ibm.com/support/knowledgecenter/SS9PF4_1.1.0/src/tpc/mlz_home.html
• Redbook: Apache Spark Implementation on IBM z/OS: http://www.redbooks.ibm.com/redbooks/pdfs/sg248325.pdf
• IBM Machine Learning for z/OS Marketplace: https://www.ibm.com/us-en/marketplace/machine-learning-for-zos
Comments & Questions?

IBM Z for the Digital Enterprise - IBM Z Open Data Analytics
