You don't necessarily have to set up a relational database, define tables, and load data in order to use a surprisingly rich set of SQL capabilities on your data in the cloud. IBM SQL Query lets you analyze terabytes of distributed data in heterogeneous formats with a complete ANSI SQL dialect in a completely serverless usage model. It can ETL data between formats and partitioning layouts as needed, and it runs complex time series transformations, analyses, and correlations with advanced built-in time series SQL algorithms that set it apart in the industry. It also supports a complete PostGIS-compliant geospatial SQL function set. Come explore the advanced world of SQL without a database in IBM Cloud.
2. Evolution of Mobility
Your own chauffeur-driven car
Owning and driving a car
Renting a car
Car service
Flexibility
3. Evolution of Form Factors for Big Data Analytics
The 90s: Enterprise Data Warehouses
Tightly integrated and optimized systems
2000: Hadoop
Introduced open data formats & easy scaling on commodity HW
Today: Cloud-Native Serverless Analytics-aaS
• Seamless elasticity
• Pay-per-query consumption
• Analyze data as it sits in an object store
• Disaggregated architecture
• No more infrastructure headaches
6. IBM Cloud SQL Query – Very High Level Architecture (MVP 1Q 2018)
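(1) Submit SQL: the application sends a SQL statement to SQL Query
(2) Read data: SQL Query reads the data sets from IBM Cloud Object Storage
(3) Write results: SQL Query writes the result set to IBM Cloud Object Storage
(4) Read results: the application reads the result set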
IBM Cloud Query – Architecture
[Diagram labels: Upload; Land (IBM Cloud Streaming: IBM Streams, Message Hub; Watson IoT); Archive / Export (IBM Cloud Databases: Db2 on Cloud); Query (Geospatial SQL, Data Skipping, Timeseries SQL)]
15. Data Skipping Saving you Time and $
[Diagram: Index all data set objects in IBM Cloud Object Storage (data skipping indexing); for a SQL query, the WHERE clause is checked against the data skipping index to find candidate objects, saving time and $]
• SQL Query learns which objects are not relevant to a query by using a data skipping index
• CREATE METAINDEX stores index summary metadata for each object, which is much smaller than the data
• SQL queries skip irrelevant objects to significantly reduce I/O
• Independent of data formats
• Index types: Min/Max, Value List, Bounding Box
E.g.: Get location and time of heat waves (> 40 Celsius)
SELECT lat, long, city, temp, date
FROM weather
WHERE temp > 40.0
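The idea behind a Min/Max data skipping index can be sketched in a few lines of Python. This is only an illustration of the principle, not how SQL Query implements it: the `ObjectSummary` class, the object keys, and the temperature ranges are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectSummary:
    """Hypothetical index entry: min and max of the temp column in one object."""
    key: str
    temp_min: float
    temp_max: float


def candidate_objects(index, threshold):
    """Return keys of objects that may contain rows with temp > threshold.

    An object whose maximum is <= threshold cannot contain a match,
    so it is skipped without being read.
    """
    return [s.key for s in index if s.temp_max > threshold]


# Illustrative summaries for three objects of a hypothetical weather data set.
index = [
    ObjectSummary("weather/part-0.parquet", -5.0, 21.5),
    ObjectSummary("weather/part-1.parquet", 18.0, 43.2),
    ObjectSummary("weather/part-2.parquet", 30.1, 39.9),
]

# Only objects that can satisfy "WHERE temp > 40.0" need to be read.
to_read = candidate_objects(index, 40.0)
print(to_read)
```

In this made-up index only one of the three objects can hold a temperature above 40, so the other two would never be fetched from object storage. The summaries are tiny compared to the data they describe, which is where the I/O saving comes from.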
20. IBM Query – Spatial SQL
§ SQL/MM standard to store & analyze spatial data in RDBMS
§ Migration of PostGIS compliant SQL queries
§ Aggregation, computation and join via native SQL syntax
§ Industry leading differentiators
• Geodetic Full Earth support
• Increased developer productivity
• Avoid piece-wise planar projections
• High precision calculations anywhere on the earth
• Support for very large polygons (e.g. countries), polar caps, geometries crossing the anti-meridian
• Spatial data skipping for fast queries
• Native and fine-granular geohash support
• Fast spatial aggregation
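Why geometries crossing the anti-meridian are hard for planar approaches can be shown with a small Python sketch. It is not the SQL Query geospatial implementation: it uses the haversine formula on a sphere with an assumed mean radius, which is a simpler model than full geodetic (ellipsoidal) calculation, and both function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_planar_km(lat1, lon1, lat2, lon2):
    """Treat degrees as flat x/y coordinates; breaks at the anti-meridian."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    return math.hypot(lat2 - lat1, lon2 - lon1) * km_per_degree


# Two points on the equator, one degree apart across the anti-meridian.
west_of_line = (0.0, 179.5)
east_of_line = (0.0, -179.5)

print(great_circle_km(*west_of_line, *east_of_line))
print(naive_planar_km(*west_of_line, *east_of_line))
```

The spherical calculation sees the two points as neighbours, while the flat treatment of longitude measures the long way around the globe. Avoiding this class of error without stitching together local projections is what the slide means by full-earth support.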
22. Sensor Data Analytics with Extended Syntax
[Diagram: sensor data from mobile, cars, and devices lands in IBM Cloud Object Storage. Query applies location analytics to GPS data (location filtering, spatial aggregation, SQL/MM) and Timeseries SQL to sensor metrics (timeseries assembly, timeseries join).]
23. A Stack for Serverless Data & Analytics Solutions
Serverless Storage: Object Storage
Serverless Runtimes: Cloud Functions
Serverless Analytics: Query
24. Use Cases of Cloud Functions Adding Value to SQL
Unstructured Data Prep
• Cloud Functions: extract features (COS to COS)
• SQL Query: analyze
Automated/Scheduled SQL Execution
• Develop SQL
• Deploy as SQL Cloud Function
• Set up Cloud Function trigger/schedule
Shield Data From Direct Access
• Deploy Cloud Function with COS API key
• Grant execute on SQL Cloud Function to user
• User calls function to access data in COS
Configure SQL Pipelines
• User creates function sequence to automate flow of consecutive SQLs
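The "Configure SQL Pipelines" use case, a sequence in which each SQL step consumes the result of the previous one, can be pictured with the following Python sketch. It runs entirely locally and does not call IBM Cloud Functions or SQL Query: `make_sql_step`, `run_sequence`, the `cos://` locations, and the SQL text are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List

# A step takes the location of its input data and returns the location of
# the result it wrote. In a real deployment each step would be a cloud
# function submitting one SQL statement; here they are local stand-ins.
Step = Callable[[str], str]


def make_sql_step(name: str, sql_template: str, log: List[str]) -> Step:
    """Build a hypothetical step that records the SQL it would submit."""
    def step(input_location: str) -> str:
        output_location = f"{input_location}/{name}"
        log.append(sql_template.format(src=input_location, dst=output_location))
        return output_location
    return step


def run_sequence(steps: List[Step], start_location: str) -> str:
    """Run the steps in order, feeding each result location to the next step."""
    location = start_location
    for step in steps:
        location = step(location)
    return location


submitted: List[str] = []
pipeline = [
    # The SQL strings are illustrative text, not exact SQL Query syntax.
    make_sql_step("cleansed", "SELECT * FROM {src} WHERE temp IS NOT NULL INTO {dst}", submitted),
    make_sql_step("daily", "SELECT city, date, MAX(temp) FROM {src} GROUP BY city, date INTO {dst}", submitted),
]

final_location = run_sequence(pipeline, "cos://bucket/raw")
print(final_location)
for sql in submitted:
    print(sql)
```

The point of the sketch is the chaining: the second statement reads from the location the first one wrote to, so the user only configures the order of the steps.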
25. Use for Data Pipelines to Fuel BI
[Diagram: Acquire, Process, Report. Data from applications, IoT devices, streaming, and log messages lands in IBM Cloud Object Storage. Query processes it (cleanse, filter, merge, aggregate, compress) and supports explore and analyze. Results are promoted to data warehouses and databases (Db2 on Cloud) for analysis and BI reporting with Watson Studio, Looker, Cognos, and Tableau.]
27. When Serverless? When RDBMS?
RDBMS Serverless
Cloud-Native Solutions
Reserved Compute
Open Data Formats
Avoid data load
Schema at read
Interactive SQLs
Seamless elasticity
UDFs required
Transactions
Pay per query
JDBC/ODBC
REST API
Highly resilient/available
30. IBM SQL Query – Available Features (Q1 2019)
Available Now:
• Read, write & transform open data in Object Storage
• CSV, JSON, Parquet, ORC, Avro
• Full ANSI SQL & scale-out based on Apache Spark
• Including Authoritative Spark SQL Reference
• Geospatial SQL Support
• Automatic partitioning & schema inference
• Writing results with Hive-style or paginated partitioning
• Reduced I/O by exploiting Hive-style partitioning when reading
• SQL Web UI
• SQL REST API
• Python & Node.js client SDKs
• IBM Cloud Function integration
• SQL Notebook in Watson Studio
Available for Beta By Invitation:
• Data Skipping Indexes
• Native Timeseries SQL Support
• JDBC Driver support
Upcoming:
• Reading from Cloudant
• Reading / Writing Db2 & other RDBMS
• Reading Shapefile data
• Cataloging SQL Assets
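How Hive-style partitioning helps reduce I/O can be sketched in Python. In this layout, partition column values are encoded in the object path as `column=value` segments, so a reader can rule out objects from their names alone. The object keys and the helper names `parse_partitions` and `prune` below are hypothetical, and the sketch does not reflect SQL Query internals.

```python
def parse_partitions(object_key):
    """Extract column=value path segments, e.g. 'year=2019' -> {'year': '2019'}."""
    parts = {}
    for segment in object_key.split("/")[:-1]:  # last segment is the file name
        if "=" in segment:
            name, _, value = segment.partition("=")
            parts[name] = value
    return parts


def prune(object_keys, **wanted):
    """Keep only objects whose partition values equal all wanted values."""
    return [
        key for key in object_keys
        if all(parse_partitions(key).get(col) == str(val)
               for col, val in wanted.items())
    ]


# Illustrative object keys of a data set partitioned by year and month.
keys = [
    "weather/year=2018/month=12/part-0.parquet",
    "weather/year=2019/month=01/part-0.parquet",
    "weather/year=2019/month=02/part-0.parquet",
]

print(prune(keys, year=2019))
print(prune(keys, year=2019, month="02"))
```

A query that filters on `year` or `month` only has to open the objects that survive this pruning step, which is the kind of saving the feature list refers to.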
34. Use IBM SQL Query to learn Spark SQL
• SQL Query UI is basically an interactive Spark SQL UI
Best of breed Spark SQL Reference
• Complete, intuitive and interactive SQL Reference
• Each sample SQL can immediately be executed as is
https://cloud.ibm.com/docs/services/sql-query/sqlref/sql_reference.html#sql-reference
36. Please note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.