Data and AI Forum 2019
Cloud-based Data Lake
for Analytics and AI
Torsten Steinbach
Cloud Data Services Architect
Evolution of Form Factors For Big Data Analytics
The 1990s: Enterprise Data Warehouses
Tightly integrated and optimized systems
2000: Hadoop
Introduced open data formats & easy scaling on commodity HW
Today: Cloud-Native Serverless Analytics-aaS
• Elasticity
• Pay-per-query
• Data in object store
• Disaggregated architecture
• No more infrastructure headaches
Blog Article: Big Data
The Role of a Data Lake
Data Origination: Applications, IoT Devices, Databases, Archived Data, Telemetry Data
Data Lake (100 % Elasticity): Persist, Organize, Prepare, Optimize, Index, Govern, Analyze
Data Purpose: BI & AI Applications (Reporting, Dashboarding, Model Training, Predicting), Databases, DWH (Promote Data, Interactive Analytics)
This is the “SQL Sandwich”
The SQL Sandwich
Object Storage (Raw Data): Explore, Prepare & Batch Analytics (SQL)
SQL ETL
Data Warehouse (High Quality Data): Interactive Analytics with SLAs (SQL)
SQL ETL
Object Storage (Archived Data): Compliance Reporting (SQL)
SQL Federation
Blog Article: SQL Sandwich
SQL on Object Storage
DM Gartner Hype Cycle 2019
The Layers of IBM Cloud Data Lake
Ingest LogDNA
Event
Streams
Streaming
Analytics
Cloud
Functions
IBM COS, KMS, IAM
SQL-based ETL, ELT & Query
Timeseries &
Spatial Extenders
Automation
(Cloud Functions)
Indexing
Big Data
Metadata
Persist
Manage
ETL
Process
Governance Search
Blog Article:
Cloud Data Lake
IBM Cloud Data Lake & Cloud Pak for Data
IBM Cloud
Data Lake
IBM Cloud
Pak for Data
• Fully Managed
• Serverless Consumption
• Fully Elastic
• Reserved Compute & SLAs
• Enterprise Options: Db2 Warehouse, BigSQL & Db2 Event Store
Collect
Organize
Analyze
Infuse
Ladder to AI
IBM SQL Query
Cloud Data
Data
Transformation
Serverless SQL
Analytics
Object
Storage
Db2
+
Developers
Data
Engineers
Data Analysts
• Perfect for Machine Generated Data
• Ad-hoc Data Exploration
• Operationalizing Data Pipelines
• Big Data Lakes
• Flexible Data Transformation
• Extremely affordable: $5/TB scanned
• 100% API enabled
• Analytics on Object Storage
• Big Data Scale-Out. Running on Spark
• 100% Self service – No Setup
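The pay-per-query pricing in the list above lends itself to a quick back-of-the-envelope estimate. The following is a minimal sketch, assuming the $5/TB scanned rate quoted on this slide and decimal terabytes (10^12 bytes). The function name and the byte convention are illustrative assumptions, not part of the service.

```python
PRICE_PER_TB_USD = 5.0    # rate quoted on the slide
BYTES_PER_TB = 10 ** 12   # assumption: decimal terabytes


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return the estimated cost in USD for scanning the given number of bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Example: a query that scans 200 GB of objects.
cost = estimate_query_cost(200 * 10 ** 9)
print(f"{cost:.2f} USD")
```

Because the charge follows the bytes scanned, anything that reduces scanning (columnar formats, partitioning, data skipping) lowers the estimate directly.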
IBM SQL Query Architecture
1. Submit SQL (Application to SQL Query)
2. Read data (Data Sets in IBM Cloud Object Storage)
3. Write results (Result Set in IBM Cloud Object Storage) or write to table (Db2 on Cloud)
4. Read results (Application)
Data reaches IBM Cloud Object Storage via: Land (IBM Cloud Streaming: IBM Streams, Event Streams), Upload, Archive / Export (IBM Cloud Databases)
Also shown: IBM Cloud Functions, Hive Metastore, Geospatial SQL, Timeseries SQL, Indexes
What supported formats are analytics friendly?
Blog Article:
Data Layout
IBM SQL Query – Access Patterns
SQL REST API
Create
Query
SQL Web Console
Watson
Studio
Notebooks
SQL Cloud Function
Integrate Explore
Deploy
Node SDK
Python SDK
JDBC
Telemetry Data Pipelines for BI & AI
Telemetry Data: Applications, IoT Devices
Data Prep in Data Lake (IBM Cloud Object Storage, SQL Query): Land, Explore, Cleanse, Filter, Merge, Aggregate, Compress, Promote
BI & AI (Db2 Warehouse): Interactive Analytics for BI & AI Applications (Reporting, Dashboarding, Model Training, Predicting)
Promoting Data After Preparation
SELECT …
INTO <COS URI> <format & layout ops> |
<Db2 service CRN> | <Db2 database URI> /<table name>
[CREATE | OVERWRITE | APPEND] [PARALLELISM <num>]
COS URI: e.g. cos://us-south/myBucket/myFolder/myData.parquet
COS Format/Layout: e.g. STORED AS PARQUET PARTITIONED BY (city, date)
Db2 options:
PARALLELISM: Number of parallel threads for writing (default 1)
Examples:
… INTO db2://db2w-dja.us-south.db2w.cloud.ibm.com/MYSCHEMA.MYTABLE PARALLELISM 20
… INTO crn:v1:bluemix:public:dashdb-for-tx:us-south:s/c38…:cf-service-instance:/MYTABLE
* future
Promote
on COS
Promote
to Db2
Blog Article:
Db2 ETL
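The INTO syntax on this slide can be assembled programmatically when pipelines are scripted. The sketch below only concatenates the clause text following the grammar and examples shown above. The helper names `into_cos` and `into_db2` are hypothetical, and nothing here is validated against the service itself.

```python
from typing import Sequence


def into_cos(uri: str, stored_as: str = "", partitioned_by: Sequence[str] = ()) -> str:
    """Build an INTO clause that writes results to a COS location."""
    if not uri.startswith("cos://"):
        raise ValueError("expected a cos:// URI")
    parts = [f"INTO {uri}"]
    if stored_as:
        parts.append(f"STORED AS {stored_as.upper()}")
    if partitioned_by:
        parts.append(f"PARTITIONED BY ({', '.join(partitioned_by)})")
    return " ".join(parts)


def into_db2(target: str, table: str, parallelism: int = 1) -> str:
    """Build an INTO clause for a Db2 target (database URI or service CRN)."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    clause = f"INTO {target}/{table}"
    if parallelism != 1:  # the slide states 1 as the default, so omit it
        clause += f" PARALLELISM {parallelism}"
    return clause


print(into_cos("cos://us-south/myBucket/myFolder/myData.parquet",
               stored_as="parquet", partitioned_by=["city", "date"]))
print(into_db2("db2://db2w-dja.us-south.db2w.cloud.ibm.com",
               "MYSCHEMA.MYTABLE", parallelism=20))
```

The second call reproduces the Db2 example from the slide. The first combines the COS URI and the format/layout example into one clause.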
Backup
Multi-Cloud is here
70% of enterprises will be implementing a multi-cloud strategy by 2019 (Gartner)
However:
• 39% claim too much infrastructure complexity
• 39% of businesses cannot analyze the entire environment
(Aberdeen Group, 2019)
© 2019 IBM Corporation
SQL on Cloud Object Storage in Db2
• “SQL on COS” to be available in all Db2 form factors in 2020, with support for all open source formats (CSV, Parquet, ORC, etc.)
• Decoupled storage opens up multiple “modernization points” for Db2
– Compute Elasticity
– Resilience to Node Failures
– Multiple instances operating against the same data
– Transient/ephemeral instances
Use AWS client to list files in IBM COS bucket named ‘bigsql-secure’
Table ‘sales_fact’ contains Parquet files partitioned by organization_key
Synchronize table ‘sales_fact’ from external metadata catalog
Query table ‘sales_fact’ (Parquet data on COS) from Db2!
Serverless Stack for Analytics
Serverless Storage: Object Storage
Serverless Runtimes: Cloud Functions
Serverless Analytics: SQL Query
IBM Cloud Functions + SQL Query – Use Cases
Unstructured Data Prep
• Cloud Functions: Extract Features (COS to COS)
• SQL Query: Analyze
Automated/Scheduled SQL Execution
• Develop SQL
• Deploy as SQL Cloud Function
• Set up Cloud Function Trigger/Schedule
Shield Data From Direct Access
• Deploy Cloud Function with COS API Key
• Grant Execute on SQL Cloud Function to User
• User Calls Function to Access Data
Configure SQL Pipelines
• User creates function sequence to automate flow of consecutive SQLs
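The SQL pipeline use case above (a function sequence that automates a flow of consecutive SQLs) can be illustrated locally. This is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for the query engine. The table names and the `run_sequence` helper are hypothetical, and a real deployment would chain SQL Cloud Functions instead.

```python
import sqlite3


def run_sequence(conn: sqlite3.Connection, steps: list) -> None:
    """Run SQL statements in order; each step may read what earlier steps wrote."""
    for sql in steps:
        conn.execute(sql)
    conn.commit()


# Stand-in for raw data that has landed in the lake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_logs (level TEXT, msg TEXT)")
conn.executemany(
    "INSERT INTO raw_logs VALUES (?, ?)",
    [("INFO", "start"), ("ERROR", "disk full"), ("ERROR", "timeout")],
)

# Two consecutive SQLs: filter first, then aggregate the filtered result.
steps = [
    "CREATE TABLE errors AS SELECT * FROM raw_logs WHERE level = 'ERROR'",
    "CREATE TABLE error_count AS SELECT COUNT(*) AS n FROM errors",
]
run_sequence(conn, steps)

error_count = conn.execute("SELECT n FROM error_count").fetchone()[0]
print(error_count)
```

The point of the sequence is that the second statement depends only on the output of the first, so each step can be developed and tested on its own.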
Object Storage
IBM Cloud Object Storage
• Buckets of Objects
• Encrypted: At Rest, On the Wire
• User Managed Encryption Keys
• Elastic
• Durable
• Pennies per GB
• Flexible: Resiliency Choices, Storage Classes
• REST, S3 Compatible
• High Speed Data Transfer: Aspera
• SQL Queries
COS Ingest Options
High Customizability
Degree of Serverless-ness
IBM Event Streams
(Kafka aaS)
IBM Cloud Functions
Out-of-the-Box
IBM Streaming Analytics
(IBM Streams aaS)
via Cloud Object Storage API
SQL Query ETL
Cloudant Replication
Blockchain Synch
SQL Query Scale Out Architecture
[Diagram] SQL requests (SQL 1, SQL 2, SQL 3, …) enter a Request Queue and are dispatched via JKG (Web Sockets) to Kernel Pools in a Cluster Pool.
The Cluster Pool spans Data Center 1, Data Center 2 and Data Center 3; each hosts an Analytics Engine Cluster (Node 1, Node 2, Node 3, …) with 20 Kernels.
The clusters work against Cloud Object Storage.
SQL Query Built on Apache Spark
Best of breed Spark SQL Reference
• Complete, intuitive and interactive SQL Reference
• Each sample SQL can immediately be executed as is
https://cloud.ibm.com/docs/services/sql-query/sqlref/sql_reference.html#sql-reference
IBM Spark
SQL
Reference
Analyzing Application Logs
Logs
Your Cloud
Application/Solution
IBM Cloud Object Storage
Query
Transform
Compress
Aggregate
Repartition
Analyze
Anomaly Detection
User Segmentation
Customer Support
Resource Planning
• Build & run data pipelines and analytics of your log message data
• Flexible log data analytics with full power of SQL
• Seamless scalability & elasticity according to your log message volume
IBM SQL Query – Timeseries SQL 1/2
• Intuitive first-of-a-kind SQL extensions for timeseries operations
• Industry leading differentiators, including:
– Timeseries transformation functions: Correlation, Fourier transformation, z-normalization, Granger, interpolation, and distances
– Temporal Joins: SQL support for Left/Right/Full Inner and Outer joins of multiple timeseries
Alignment & Joining:
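To make the temporal join semantics concrete, here is a minimal plain-Python sketch that aligns two timeseries on their timestamps. It illustrates inner, left, right and full joins only. The `temporal_join` function and the sample series are assumptions made for illustration and are not the SQL functions of the service.

```python
def temporal_join(left, right, how="inner"):
    """Join two timeseries given as lists of (timestamp, value) pairs.

    Returns (timestamp, left_value, right_value) tuples sorted by timestamp;
    a side with no observation at a timestamp contributes None.
    """
    l, r = dict(left), dict(right)
    if how == "inner":
        keys = l.keys() & r.keys()
    elif how == "left":
        keys = l.keys()
    elif how == "right":
        keys = r.keys()
    elif how == "full":
        keys = l.keys() | r.keys()
    else:
        raise ValueError(f"unknown join type: {how}")
    return [(t, l.get(t), r.get(t)) for t in sorted(keys)]


temperature = [(1, 20.5), (2, 21.0), (4, 22.3)]
humidity = [(2, 40), (3, 42), (4, 45)]

print(temporal_join(temperature, humidity, "inner"))
print(temporal_join(temperature, humidity, "full"))
```

The inner join keeps only timestamps present in both series, while the full join keeps every timestamp and leaves gaps as None, which is where interpolation would typically come in.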
IBM SQL Query – Timeseries SQL 2/2
• Further industry leading differentiators:
– Numerical and categorical timeseries types
– Timeseries data skipping for fast queries
– Forecasting: ARIMA, BATS, Anomaly detection, etc.
– Subsequence Mining: Train & match models for event sequences
– Segmentation: Time-based, Record-based, Anchor-based, Burst, and Silence
Segmentation:
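Of the segmentation types listed, time-based segmentation is the simplest to show. The sketch below groups observations into consecutive fixed-width time windows in plain Python. The function name and window convention are assumptions for illustration and do not reflect the service's own segmentation functions.

```python
def segment_by_time(series, window):
    """Split (timestamp, value) observations into consecutive windows.

    Each window covers [k * window, (k + 1) * window); empty windows are omitted.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    segments = {}
    for t, v in series:
        start = (t // window) * window  # start of the window containing t
        segments.setdefault(start, []).append((t, v))
    return [segments[s] for s in sorted(segments)]


readings = [(0, 1.0), (3, 2.0), (5, 3.0), (11, 4.0)]
for segment in segment_by_time(readings, window=5):
    print(segment)
```

Record-based segmentation would split on a count of observations instead of elapsed time, and anchor-based segmentation would start a segment whenever a condition on the value holds.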
IBM SQL Query – Spatial SQL
• SQL/MM standard to store & analyze spatial data in RDBMS
• Migration of PostGIS compliant SQL queries
• Aggregation, computation and join via native SQL syntax
• Industry leading differentiators
– Geodetic Full Earth support
– Increased developer productivity
– Avoid piece-wise planar projections
– High precision calculations anywhere on the earth
– Very large polygons (e.g. countries), polar caps, crossing the anti-meridian
– Spatial data skipping for fast queries
– Native and fine-granular geohash support
– Fast spatial aggregation
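Why full-earth support matters is easiest to see at the anti-meridian. The following sketch compares a great-circle distance with a naive planar calculation for two points on either side of longitude 180. It uses the haversine formula on a sphere, which is a simplifying assumption; a geodetic engine would typically work on an ellipsoid, and the function name is illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two points on the equator, one degree apart across the anti-meridian.
great_circle = haversine_km(0.0, 179.5, 0.0, -179.5)

# Naive planar view: treat longitude as a flat x axis.
km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
naive_planar = abs(-179.5 - 179.5) * km_per_degree

print(f"great circle: {great_circle:.1f} km")   # roughly 111 km
print(f"naive planar: {naive_planar:.1f} km")   # roughly 39,900 km
```

The planar calculation goes the long way around the globe, which is the kind of error that piece-wise planar projections have to be engineered around.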
Combining Spatial and Temporal Processing
[Diagram] Mobile, Cars and Devices land Sensor Data in IBM Cloud Object Storage; SQL Query feeds Location Analytics.
• GPS data, SQL/MM: Location Filtering, Spatial Aggregation
• Sensor Metrics over time (t), Timeseries SQL: Timeseries Assembly, Timeseries Join
Scaling COS Big Data Processing: Data Skipping
[Diagram] Index All Objects (Data Skipping Indexing) over the Data Set Objects in IBM Cloud Object Storage; the WHERE Clause of a SQL Query selects the Candidate Objects. Saving Time and $.
• SQL Query learns which objects are not relevant to a query using a data skipping index
• CREATE METAINDEX stores index summary metadata for each object. Much smaller than the data.
• SQLs skip irrelevant objects to significantly reduce I/O
• Independent of data formats
• Index Types: Min/Max, Value List, Bounding Box
E.g.: Get location and time of heat waves (>40 Celsius)
SELECT lat, long, city, temp, date
FROM weather
WHERE temp > 40.0
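The Min/Max index type named above can be sketched in a few lines. This is a simplified in-memory illustration, assuming one min/max summary per object for the `temp` column. The object names, the `ObjectStats` class and both helper functions are hypothetical and do not describe how the service stores its index.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    name: str
    min_temp: float
    max_temp: float


def build_minmax_index(objects):
    """Summarize each object (name -> list of temp values) by its min and max."""
    return [ObjectStats(name, min(vals), max(vals)) for name, vals in objects.items()]


def candidates_greater_than(index, threshold):
    """Return objects that may hold rows with temp > threshold; skip the rest."""
    return [s.name for s in index if s.max_temp > threshold]


# Stand-in for three objects of the weather data set.
weather_objects = {
    "weather/part-0.parquet": [12.0, 18.5, 25.1],
    "weather/part-1.parquet": [30.2, 41.7, 38.0],
    "weather/part-2.parquet": [-3.0, 5.5],
}

index = build_minmax_index(weather_objects)
to_scan = candidates_greater_than(index, 40.0)  # matches WHERE temp > 40.0
print(to_scan)
```

Only objects whose maximum exceeds the threshold can contain matching rows, so the other two objects are never read. The index holds two numbers per object, which is why it stays much smaller than the data.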
Notices and disclaimers
Copyright © 2019 by International Business Machines Corporation (IBM).
No part of this document may be reproduced or transmitted in any form without
written permission from IBM.
U.S. Government Users Restricted Rights — use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that
have not yet been announced by IBM) has been reviewed for accuracy as of the
date of initial publication and could include unintentional technical or
typographical errors. IBM shall have no responsibility to update this information.
This document is distributed “as is” without any warranty, either express or
implied. In no event shall IBM be liable for any damage arising from the use
of this information, including but not limited to, loss of data, business
interruption, loss of profit or loss of opportunity. IBM products and services
are warranted according to the terms and conditions of the agreements under
which they are provided.
IBM products are manufactured from new parts or new and used parts.
In some cases, a product may not be new and may have been previously
installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans
are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in controlled,
isolated environments. Customer examples are presented
as illustrations of how those customers have used IBM products and
the results they may have achieved. Actual performance, cost, savings or other
results in other operating environments may vary.
References in this document to IBM products, programs, or services do not
imply that IBM intends to make such products, programs or services available in
all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by
independent session speakers, and do not necessarily reflect the
views of IBM. All materials and discussions are provided for informational
purposes only, and are neither intended to, nor shall constitute legal or other
guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to ensure its own compliance with legal
requirements and to obtain advice of competent legal counsel as to
the identification and interpretation of any relevant laws and regulatory
requirements that may affect the customer’s business and any actions
the customer may need to take to comply with such laws. IBM does not provide
legal advice or represent or warrant that its services or products will ensure that
the customer is in compliance with any law.
Notices and disclaimers
continued
Information concerning non-IBM products was obtained from the
suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products in
connection with this publication and cannot confirm the accuracy of
performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be
addressed to the suppliers of those products. IBM does not warrant the
quality of any third-party products, or the ability of any such third-party
products to interoperate with IBM’s products. IBM expressly
disclaims all warranties, expressed or implied, including but not
limited to, the implied warranties of merchantability and fitness
for a particular purpose.
The provision of the information contained herein is not intended to,
and does not, grant any right or license under any IBM patents,
copyrights, trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS,
Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document
Management System™, FASP®, FileNet®, Global Business Services®,
Global Technology Services®, IBM ExperienceOne™, IBM
SmartCloud®, IBM Social Business®, Information on Demand, ILOG,
Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON,
OpenPower, PureAnalytics™, PureApplication®, pureCluster™,
PureCoverage®, PureData®, PureExperience®, PureFlex®,
pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®,
Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®,
StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson,
WebSphere®, Worklight®, X-Force® and System z® Z/OS, are
trademarks of International Business Machines Corporation, registered
in many jurisdictions worldwide. Other product and service names
might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark
information" at: www.ibm.com/legal/copytrade.shtml.

