SlideShare a Scribd company logo
1 of 41
Download to read offline
IBM Db2 Big SQL and Open Source
Hebert Pereyra Big SQL and Data Virtualization Chief Architect
pereyra@ca.ibm.com
IBM Cloud
Legal Disclaimer
2
Copyright © IBM Corporation 2018 All rights reserved.
U.S. Government Users Restricted Rights - Use, duplication, or disclosure restricted by GSA ADP Schedule Contract with IBM Corporation
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE
EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS
PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS
INFORMATION IS BASED ON CURRENT THINKING REGARDING TRENDS AND DIRECTIONS, WHICH ARE SUBJECT TO CHANGE
BY IBM WITHOUT NOTICE. FUNCTION DESCRIBED HEREIN MY NEVER BE DELIVERED BY I BM. IBM SHALL NOT BE
RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY
OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF,
CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE
TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
IBM, the IBM logo, ibm.com and Db2 are trademarks or registered trademarks of International Business Machines Corporation in
the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this
information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at
the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A
current list of IBM trademarks is available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml
IBM Cloud
IBM and Hortonworks
Focus on extending data science and machine learning to analyze the data in Apache
Hadoop systems. Consumers get the best in class open technology
• #1 Rank by Gartner 2017
Data Science Magic Quadrant
• Leader in SQL technology
for Hadoop (www.tpc.org)
• Leader in data and analytics
solutions for Hybrid Cloud
• Provides Data Science & Machine
Learning
• Leader in Hadoop Open Source
Distribution
• 1000+ customers and
2100+ ecosystem partners
• Hadoop original architects,
developers employed
by Hortonworks
• Provides Open Hadoop Data
Platform
Commitment to progressing advanced analytics through open source
+
2017-18: 100s of deals closed together. Over 40 joint events across 17 countries. Integration with IHAH, Big SQL, DSX, Truata
GDPR offering. Working on embedded offerings for IoT, Watson. Monthly Exec Interlocks. Weekly Sales Exec Meetings. Engineering
& OM Interlocks.
IBM Cloud
IBM and Hortonworks – High Value
IBM Cloud
SQL
Ad-hoc
queries, data
preparation
Federation
Operational
with fast
lookups
High
performance
and scalability
Integrated
Spark and
Machine
Learning
Complex
SQL,
Deep
analytics,
Many users
Application
portability
Db2 Big SQL – For all DWH needs in Hadoop
SQL-based
Application
Common SQL Engine
Client driver
Db2 Big SQL
Data Storage
SQL MPP Run-time
DFS
Hadoop
© 2017 IBM Corporation
DB2 Warehouse (DPF) drives all nodes to read table partitioned across multiple nodes
SELECT SUM (…)
FROM some_table DB2 Coordinator
node 1 node 2 node 3 node 4 node n
Sum(..) Sum(..) Sum(..) Sum(..) Sum(..)
© 2017 IBM Corporation
DB2 Warehouse (DPF) drives all nodes to read table partitioned across multiple nodes
Query Result DB2 Coordinator
node 1 node 2 node 3 node 4 node n
Sum(..) Sum(..) Sum(..) Sum(..) Sum(..)
© 2017 IBM Corporation
Let’s extend the MPP processing concept now to
Hadoop…
Database
Client Big SQL (head)
node 1 node 2 node 3 node 4 node n
HDFS
A A A
B B B
A + BComplete Table =
Big SQL
Scheduler
NameNode
© 2017 IBM Corporation
Big SQL Query Execution
Database
Client Big SQL (head)
node 1 node 2 node 3 node 4 node n
HDFS
A A A
B B B
A + BComplete Table =
Big SQL
Scheduler
NameNode
IBM Cloud
Db2 Big SQL V5.0
CoreCapabilitiesApplications
Core SQL Engine SecurityAdministration
Comprehensive ANSI SQL
coverage
Advanced cost-based optimizer
SQL based RBAC
Ranger
Automatic workload management
WLM
Automatic memory management
Performance
Query rewrite for optimized
execution
Elastic boost – logical worker
nodes
SQL compatibility – Db2, Oracle,
Netezza
Federation
MQTs
Integration
Spark Integration
Batch SQL
(minutes to hours)
Interactive SQL
(seconds to minutes)
Self-service /
Interactive BI
(Sub-second)
Data augmentation
(Spark integration)
Application portability
• ETL
• Reporting
• Data mining
• Deep analytics
• Reporting
• Complex queries
• BI Tools: Cognos, Tableau,
etc
• Ad-hoc, exploratory
• BI tools: Cognos,
Tableau, etc
• Query EDW
• Join data
• Use ML
• Reuse applications
• Reuse skills
Roles
DSM, Ambari
SQL and NoSQL
Structured & Unstructured
www.tpc.org – check out TPC-H and TPC-DS – Big SQL vs Impala vs Hive
Db2 Big SQL 5.0 2X faster than Hive LLAP with Tez (gap increases rapidly with higher concurrency)
Db2 Big SQL 5.0 3X faster than Spark SQL (more optimal resource utilization and execution engine)
Db2 Big SQL 5.0 faster than Cloudera Impala 2.9.0 (scalability hindered by architecture)
IBM Cloud
No Vendor
Lock-in &
Integration
with Hive
• Db2 Big SQL preserves open source foundation
• Separation of compute and storage - Data is part of Hadoop
• Alternate execution engine to MapReduce
• Hive and Db2 Big SQL share common metadata
• Db2 Big SQL is optimized to push projections and predicate filters down to the
storage I/O engine
• Db2 Big SQL can invoke native Hive UDFs efficiently using Parameter Style Hive
• Work on Hive ACID tables from Big SQL
SQL Execution Engines
Open Source Storage Model
CSV Parquet ORC
Others
…
Tab
Delim.
Hive Metastore
(open source)
Db2Big SQL
(IBM)
Hive
(Open Source)
5.0.3
Ecosystem Integration
IBM Cloud
New & Improved Reader/Writer for all File Formats
ORC
HBASE
ANALYZE
CUSTOM SERDES
DATE STORED AS DATE
OBJECT STORE
EVERY OTHER FORMAT
Formats supported by the C++ Reader
and more
COMPLEX TYPES
PARQUET
TEXT
RCFILE
AVRO
SEQUENCE FILE
Db2 Big SQL 5.0.3
Java I/O
▪Improved Performance on all tables / file types
▪Particularly text files
▪Product Stability
▪Reduced Complexity in Db2 Big SQL architecture
▪Reduced Cost
−Personnel and Processes to maintain 2 I/O engines
−Customer PMRs and critical situations
▪Reduced Out-of-Memory and instability issues
▪Reduced Resource allocation
▪Better Interoperability with Open Source
▪Java reader/writer for all file formats
VARBINARY
Enhanced stability and robustness to the product
5.0.3
IBM Cloud
Big SQL High Availability (Head Node)
Scheduler
Big SQL Master
(Primary)
Catalogs Scheduler
Big SQL Master
(Standby)
Catalogs
…
HDFS Data
Worker
Node
Worker
Node
Worker
Node
Database
Logs + Data Shipping
HDFS Data HDFS Data
▪ Big SQL master node high availability
− Scheduler automatically restarted upon failure
− Catalog changes (metadata) replicated real time to “warm” standby instance
− Standby automatically takes over if the primary fails
− Worker nodes automatically detect and re-connect to acting “primary”
− Automatic Client Re-route (ACR): clients automatically re-connect to acting “primary”
IBM Cloud
PERFORMANCE: 6-streams
Db2 Big SQL 2.3X FASTER
HADOOP-DS @ 10TB
85 COMMON QUERIES
WORKING COMPLIANT QUERIES: 6-streams
WORKLOAD
SCALE FACTOR: 10 TB
FILE FORMAT: ORC (ZLIB)
CONCURRENCY: 6 STREAMS
QUERY SUBSET: 85 QUERIES
RESOURCE UTILIZATION:
6-STREAMS
1.5x FEWER CPU CYCLES USED
STACK
HDP 2.6.1
Db2 Big SQL 5.0.1
HIVE 2.1 LLAP ON TEZ
INTERESTING FACTS
FASTEST QUERY
5.4X FASTER (Db2 Big SQL: 1.5 SEC, HIVE: 8.1 SEC)
SLOWEST QUERY (QUERY 67)
1.7X FASTER (Db2 Big SQL: 6827 SEC, HIVE: 11830 SEC)
Db2 Big SQL FASTER FOR 80% OF
QUERIES RUN
PERFORMANCE: 1-stream
Db2 Big SQL 1.8X FASTER
hrs
hrs
Query Performance at a Glance – vs Hive LLAP with Tez
IBM Cloud
Combining Hadoop Technologies
Not Mutually Exclusive.
Hive, Db2 Big SQL &
Spark SQL can co-exist
and complement and
leverage each other in
a cluster
Hive Db2 Big SQL Spark SQL
Geospatial analytics
ACID capabilities
Fast ingest
Federation
Complex Queries
High Concurrency
Enterprise ready
Application portability
All open source files
Machine learning
Data exploration
Simpler SQL
Leveraged for
metastore and UDFs
IBM’s proprietary SQL
engine for Hadoop
IBM’s open source SQL
engine for Hadoop
IBM Cloud
PERFORMANCE
Db2 Big SQL 5.0 is 3.2x faster than Spark SQL 2.1
(4 Concurrent Streams)
SNAPSHOT OF 100TB HADOOP-DS
I/O (vs Spark)
Db2 Big SQL reads 12x less data
Db2 Big SQL writes 30x less data
COMPRESSION
60%
SPACE SAVED
WITH PARQUET
AVERAGE CPU USAGE
76.4%
MAX I/O THROUGHPUT
READ 4.4 GB/SEC
WRITE 2.8 GB/SEC
WORKING QUERIES
Leads performance metrics on high volumes of data and concurrent streams
Blog on benchmark: https://developer.ibm.com/hadoop/2017/02/07/experiences-comparing-big-sql-and-spark-sql-at-100tb/
Query Performance at a Glance – Db2 Big SQL & Spark SQL
IBM Cloud
Materialized Query Tables (MQTs) on Hadoop Tables
User executes
a complex
query
Db2 Big SQL
Query
rewrite
Generate plan
Generate plan
Plans are
compared and
the best one is
picked
Query results
Base tables
MQTs
MQTs
Base
• MQTs have results pre-computed and stored
• Enables sub-second response times for complex queries with aggregates and joins on dimension tables
• Db2 Big SQL optimizer automatically recognizes the MQTS for faster response
5.0.3
IBM Cloud
MQT Performance – Star Schema Benchmark Queries
Quick
metric
queries
Product
insight
queries
Customer
insight
queries
Using Scale Factor 1000, tested 13 queries that join 1 fact with 4 dimension tables
6 Billion Lineitems & 30 Million Customers rows
Response time in secs
Query performance
on non-MQT table
5.0.3
IBM Cloud
MS SQL
Server
Netezza
(PDA) Oracle
PostgreS
QL Teradata
DB2
LUW,
Db2z,
DB2 on i Informix
WebHDFS
Object Store
(S3)
Hive HBase HDFS
Hortonworks Data Platform (HDP)
Db2 Big SQL NoSQL
ML
Model
Federation – Virtualize Heterogeneous Data
Db2 Big SQL queries heterogeneous systems in a single query
Only SQL-on-Hadoop that virtualizes more than 10 different data
sources: RDBMS, NoSQL, HDFS or Object Store
Transparent
▪ Appears to be one source
▪ Programmers don’t need to know how / where data is stored
High Function
▪ Full query support against all data
▪ Capabilities of sources as well
Autonomous
▪ Non-disruptive to data sources, existing applications, systems.
High Performance
▪ Optimization of distributed queries
IBM Cloud
✓ Easily access information on demand
✓ Combine data in Hadoop with disparate
sources to form a data lake
✓ Quickly extend your data warehouse by
enriching it
Connect Query Monitor Data Placement
▪ QuickaccesstoDatavalue
▪ CommonFramework
▪ ODBC/JDBC
▪ Spark integration enables new
data sources
▪ Connect all data sources in
single query
▪ Intelligent Query Routing
▪ Cost-based optimizer
▪ SQL pushdown
▪ Local data caching
▪ ANSI-compliant SQL
▪ Easily define & manage
through a common UI
▪ Simple point & click to
discover and query
▪ Monitor and visualize active
queries
▪ Schema conversion when
moving data
▪ Bulk data copy to Hadoop
▪ Filtered subsets of data
Federation - Rich Capabilities that Brings Data Together
Think 2018 /9071A - Live Data Analytics using Db2 Big SQL and Big Replicate / March 19, 2018 / © 2018 IBM Corporation
IBM Cloud
Access data from new data sources
Extend the federation capabilities to data sources like: MySQL, PostgreSQL, MariaDB, &
MongoDb, with enriched capabilities of query pushdown, best execution plan, secured access
and optimized execution time
Federation – Db2 Big SQL 5.0.3 Enhancements
Function mapping for best query results
When a function in one data source can be mapped to same or similar function in the remote
data source, the results returned from the query pushdown is refined rather than returning
large result sets with no function mapping
Create local cache for federated data
When federated data does not change frequently, it can be locally cached by creating Hadoop
MQTs or regular MQTs to have local access to get best performance when compared to
accessing remote data
Use computational group to improve performance
Computational partition group enables dynamically redistributing nickname data to parallelize
processing in Hadoop. This improves performance especially when there’s large nickname
data or when queries get complex.
5.0.3
IBM Cloud
Offload data
Data warehouse offload to Hadoop is now made easy:
• Write one, run anywhere…
• Easy porting of applications
• Reuse skills of DBAs/ developers who know ANSI SQL
Db2 Big SQL is the best platform
for offloading
Oracle Data Marts and
Warehouses
to Hadoop
Application Portability: Move Applications without Re-tooling
IBM Cloud
Here’s why Db2 Big SQL can get you the best execution for
complex queries and many concurrent users with high
performance
Db2 Big SQL - Query Execution
Self Tuning
Memory Manager
World Class
Cost Based Optimizer Query rewrite
Advanced Statistics
Native Row &
Columnar stores
Elastic Boost
SQL Compatibility
Hardened runtime
Advanced
Workload manager
Materialized Query
Tables
Performance Concurrent users Complex
query
IBM Cloud
For more details check the blog: https://developer.ibm.com/hadoop/2017/11/07/ibm-big-sql-machine-learning-demo/
Operationalize Machine Learning Models using SQL
IBM Cloud
Db2 Big SQL - Security
IBM Cloud
Db2 Big SQL - Security and Governance
Combining Hadoop Technologies
Added to SQL level security for row-level and column-
level access control, Db2 Big SQL integrates with other
components
With Apache Ranger Db2 Big SQL plugin, you can
setup policies for access to Db2 Big SQL tables:
Create, alter, analyze, load, truncate, drop, insert, select,
update, and delete.
Supports Ranger Audit
Big SQL also integrates with Information Governance
Catalog by enabling easy shared imports to InfoSphere
Metadata Asset Manager, which allows:
Analyze assets
Utilize assets in jobs
Designate stewards for the assets
Apache Ranger
InfoSphere Metadata Asset Manager
5.0.3
IBM Cloud
Db2 Big SQL - Advanced Workload Management
Benefits
• Identification and control of applications
• Direct control of the execution environment
• Detection and control of rogue queries –
prevent bad queries from executing
• Query concurrency – optimize query
throughput
• Advanced monitoring
Avoid under-utilizing or over saturating the resources
Resources Assigned:
CPU
I/O
Memory
Service Classes
categorize work and set
goals:
Response time
Velocity
System
Discretionary WLM Checks
every 10 secs
5.0.3
IBM Cloud
Custom WLM
Db2 Big SQL - Advanced Workload Management
5.0.3
IBM Cloud
Query streaming
data in HBase
with data at-rest
• HBase is a columnar NoSQL data store for Hadoop
• HBase offers a flexible schema, low latency key lookups and small key scans.
However, HBase is approximately 3x to 5x slower than Hive for large table
scans
• HBase tables are updateable (ACID), and could be used for versioned
dimensions (though native Db2 tables local to the Head Node are generally a
better choice here)
• Db2 Big SQL supports the full range of SQL operations against HBase tables
• Combine streaming data in HBase with relational data in Hive using Big SQL
Db2 Big SQL
HBaseHive
Analytical
SQL
NZ
RDBMS
Offloaded
Data
High Data
Ingest Rates
from External
Applications
Db2 Big SQL and Hbase Support
IBM Cloud
Db2 Big SQL - Tables over S3 Object Storage
Create Tables over Data residing in Object Store directly (no copy required into Hadoop)
Once configured, Object Store tables work like any other table in Big SQL
Benefits:
No need to copy data into Hadoop first! Query data where it resides.
Partitioning supported!
Tradeoff:
Expect reduced performance relative to HDFS local tables
CREATE HADOOP TABLE staff ( … )
LOCATION
's3a://s3atables/staff';
LOAD FROM
Object Store
also supported!
IBM Cloud
Db2 Big SQL - Tables over WebHDFS
Transparently access data on any platform implementing WebHDFS
Examples: Microsoft Azure Data Lake (ADL) service
Once setup, WebHDFS tables work like any other table in Big SQL
Technical Preview Limitations:
WebHDFS via Knox not supported
Performance not well understood. Reduce performance expected.
Db2 Big SQL
Local Hadoop Cluster
Remote Hadoop Cluster
or
WebHDFS enabled Storage
CREATE HADOOP TABLE staff ( … )
PARTITIONED BY (JOB VARCHAR(5))
LOCATION 'webhdfs://namenode.acme.com:50070/path/to/table/staff';
LOAD FROM
WebHDFS
also supported!
5.0.3
IBM Cloud
Db2 Big SQL – Integration with Yarn and Spark
IBM Cloud
HDFS
Big SQL Head Node
Big SQL
Worker
Big SQL
Worker
Big SQL
Worker
Big SQL
Worker
Spark Exec. Spark Exec. Spark Exec. Spark Exec.
= Fast data transfer over
shared memory
Db2 Big SQL – Deep Integration with Spark
IBM Cloud
Exploit Db2 Big SQL from Spark
Requirements: db2jcc.jar must be added to the classpath of the Spark application (found in /home/bigsql/java/)
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
…
Dataset<Row> tableDf = sqlCtx.read()
.format("jdbc")
.option("driver", "com.ibm.db2.jcc.DB2Driver")
.option("url", "jdbc:db2://server1.foo.bar.com:32051/BIGSQL")
.option("user", "joe")
.option("password", "joespwd")
.option("dbtable", "myshcema.mytable")
.load();
tableDf.createOrReplaceTempView("myTable");
Dataset<Row> queryDF =
spark.sql("SELECT col2, col3 FROM myTable WHERE col1 > 100");
Big SQL secures data for
self-service
data exploration.
Used this way, Spark
users are subject to
Big SQL row/column
security
IBM Cloud
Exploit Spark from Big SQL
Example: Spark Schema Discovery for JSON
Bring the best of Spark into Big SQL!
Machine Learning
Cache remote tables (Spark has rich library of connectors)
Graph Processing
General in memory processing
SELECT doc.*
FROM TABLE(
SYSHADOOP.EXECSPARK( class => 'DataSource',
load => 'hdfs://host.port.com:8020/user/bigsql/demo.json')
) AS doc
WHERE doc.language = 'English';
Structure of JSON
document determined at
run time
© 2018 IBM Corporation36
Big SQL
Worker
Big SQL
Worker
Big SQL
Worker
Big SQL
Worker
Big SQL
Worker
Big SQL
Worker
Big SQL Elastic Boost – new in v5.x (and 4.2.5)
Multiple Logical Workers per Host
Big SQL
Head
HDFS
Container
Big SQL
Components
Users
Big SQL
Worker
Big SQL
Worker
Big SQL
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
© 2018 IBM Corporation37
Big SQL Elastic Boost – new in v5.x (and 4.2.5)
Logical Workers allows multiple Yarn containers
per Host
Big SQL
Head
NM NM NM NM NM NM
HDFS
Slider Client
YARN
Resource
Manager
& Scheduler
Big SQL
AM
Big SQL
Worker
Big SQL
Worker
Big SQL
Worker
Container
YARN
components
Slider
Components
Big SQL
Components
Big SQL Slider package
implements Slider Client APIs
Users
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
© 2018 IBM Corporation38
Elastic Big SQL Capacity
• Remember that with Big SQL, 1 container (1 Worker) can service hundreds of concurrent
SQL jobs. It accomplishes such by being a long running service that stays resident in
memory.
So what does
50% mean..?
© 2018 IBM Corporation39
What does 50% mean? (default post enablement)
▪ YARN (not Big SQL) decides where containers/workers are started. These situations (and
others) are possible.
▪ As capacity target increases above 50%, opportunity of skew is reduced.
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
WorkerWorker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Worker Worker
Worker
Worker
Worker
Worker
Worker
WorkerWorker
Ideal Minor Skew
Significant Skew Significant Skew
IBM Cloud
Db2 Big SQL w/ Elastic Boost - INSERT Performance
For both 1 and 10 TB TPC-DS dataset
2 Workers/Node: 1.6x speedup
4 Workers/Node: 2.2x speedup
In each scenario, the
same TOTAL
CPU/memory is used
INSERT…SELECT
performance with
Elastic Boost
# Workers / Node
IBM Cloud
Db2 Big SQL 5.0 – How it fits with Hortonworks
Big SQL deploys on top of Hortonworks Data Platform(HDP)
Capability
Apache Hadoop and
Apache Spark and Ecosystem
IBM Big SQL
Support
Community Support
Hortonworks Data
Platform for IBM
Support Offering
for HDP
includes support
for Big SQL
BUY
Hortonworks Data Platform

More Related Content

What's hot

Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Amazon Web Services
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep DiveAmazon Web Services
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewMario Cartia
 
A Cloud Journey - Move to the Oracle Cloud
A Cloud Journey - Move to the Oracle CloudA Cloud Journey - Move to the Oracle Cloud
A Cloud Journey - Move to the Oracle CloudMarkus Michalewicz
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflakeSunil Gurav
 
Introduction to AWS Services and Cloud Computing
Introduction to AWS Services and Cloud ComputingIntroduction to AWS Services and Cloud Computing
Introduction to AWS Services and Cloud ComputingAmazon Web Services
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKShu-Jeng Hsieh
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - RangerIsheeta Sanghi
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Amazon Web Services
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Microsoft Azure Cloud Services
Microsoft Azure Cloud ServicesMicrosoft Azure Cloud Services
Microsoft Azure Cloud ServicesDavid J Rosenthal
 
ETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure DatabricksETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure DatabricksDatabricks
 

What's hot (20)

Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
 
A Cloud Journey - Move to the Oracle Cloud
A Cloud Journey - Move to the Oracle CloudA Cloud Journey - Move to the Oracle Cloud
A Cloud Journey - Move to the Oracle Cloud
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflake
 
AWS Big Data Landscape
AWS Big Data LandscapeAWS Big Data Landscape
AWS Big Data Landscape
 
Introduction to AWS Services and Cloud Computing
Introduction to AWS Services and Cloud ComputingIntroduction to AWS Services and Cloud Computing
Introduction to AWS Services and Cloud Computing
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Microsoft Azure Cloud Services
Microsoft Azure Cloud ServicesMicrosoft Azure Cloud Services
Microsoft Azure Cloud Services
 
Ml ops on AWS
Ml ops on AWSMl ops on AWS
Ml ops on AWS
 
Data Sharing with Snowflake
Data Sharing with SnowflakeData Sharing with Snowflake
Data Sharing with Snowflake
 
Intro to AI & ML at Amazon
Intro to AI & ML at AmazonIntro to AI & ML at Amazon
Intro to AI & ML at Amazon
 
Disaster Recovery Synapse
Disaster Recovery SynapseDisaster Recovery Synapse
Disaster Recovery Synapse
 
ETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure DatabricksETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure Databricks
 

Similar to Ibm db2 big sql

TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...tdc-globalcode
 
Still on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youStill on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youModusOptimum
 
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)DataWorks Summit
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AITorsten Steinbach
 
Ibm integrated analytics system
Ibm integrated analytics systemIbm integrated analytics system
Ibm integrated analytics systemModusOptimum
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseRizaldy Ignacio
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project ExperienceEDB
 
xTech2006_DB2onRails
xTech2006_DB2onRailsxTech2006_DB2onRails
xTech2006_DB2onRailswebuploader
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceIBM Cloud Data Services
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015Daniela Zuppini
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningModusOptimum
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deckKeithETD_CTO
 
Breaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over HadoopBreaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over HadoopDataWorks Summit
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceDataWorks Summit
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit MumbaiAnand Haridass
 

Similar to Ibm db2 big sql (20)

TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
 
Still on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youStill on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for you
 
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Ibm db2update2019 icp4 data
Ibm db2update2019   icp4 dataIbm db2update2019   icp4 data
Ibm db2update2019 icp4 data
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
 
Ibm integrated analytics system
Ibm integrated analytics systemIbm integrated analytics system
Ibm integrated analytics system
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project Experience
 
Db2 tools
Db2 toolsDb2 tools
Db2 tools
 
xTech2006_DB2onRails
xTech2006_DB2onRailsxTech2006_DB2onRails
xTech2006_DB2onRails
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine Learning
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deck
 
Breaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over HadoopBreaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over Hadoop
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai
 

More from ModusOptimum

Modernizing your information architecture with ai
Modernizing your information architecture with aiModernizing your information architecture with ai
Modernizing your information architecture with aiModusOptimum
 
Informix 14.1 launch webinar
Informix 14.1 launch webinarInformix 14.1 launch webinar
Informix 14.1 launch webinarModusOptimum
 
Informix 14.1 launch Webinar
Informix 14.1 launch WebinarInformix 14.1 launch Webinar
Informix 14.1 launch WebinarModusOptimum
 
Db2 on cloud overview
Db2 on cloud overviewDb2 on cloud overview
Db2 on cloud overviewModusOptimum
 
Ibm cloud private and icp for data
Ibm cloud private and icp for dataIbm cloud private and icp for data
Ibm cloud private and icp for dataModusOptimum
 
Db2 family and v11.1.4.4
Db2 family and v11.1.4.4Db2 family and v11.1.4.4
Db2 family and v11.1.4.4ModusOptimum
 
Db2 developer ecosystem
Db2 developer ecosystemDb2 developer ecosystem
Db2 developer ecosystemModusOptimum
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...ModusOptimum
 
Infographic-RedmondWCInfluencer-FB-29246
Infographic-RedmondWCInfluencer-FB-29246Infographic-RedmondWCInfluencer-FB-29246
Infographic-RedmondWCInfluencer-FB-29246ModusOptimum
 
Infographic-TechValidate-FB-29328
Infographic-TechValidate-FB-29328Infographic-TechValidate-FB-29328
Infographic-TechValidate-FB-29328ModusOptimum
 
Adult Con Ed-Corp Bro_single pgs
Adult Con Ed-Corp Bro_single pgsAdult Con Ed-Corp Bro_single pgs
Adult Con Ed-Corp Bro_single pgsModusOptimum
 

More from ModusOptimum (12)

Modernizing your information architecture with ai
Modernizing your information architecture with aiModernizing your information architecture with ai
Modernizing your information architecture with ai
 
Informix 14.1 launch webinar
Informix 14.1 launch webinarInformix 14.1 launch webinar
Informix 14.1 launch webinar
 
Informix 14.1 launch Webinar
Informix 14.1 launch WebinarInformix 14.1 launch Webinar
Informix 14.1 launch Webinar
 
Db2 event store
Db2 event storeDb2 event store
Db2 event store
 
Db2 on cloud overview
Db2 on cloud overviewDb2 on cloud overview
Db2 on cloud overview
 
Ibm cloud private and icp for data
Ibm cloud private and icp for dataIbm cloud private and icp for data
Ibm cloud private and icp for data
 
Db2 family and v11.1.4.4
Db2 family and v11.1.4.4Db2 family and v11.1.4.4
Db2 family and v11.1.4.4
 
Db2 developer ecosystem
Db2 developer ecosystemDb2 developer ecosystem
Db2 developer ecosystem
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
 
Infographic-RedmondWCInfluencer-FB-29246
Infographic-RedmondWCInfluencer-FB-29246Infographic-RedmondWCInfluencer-FB-29246
Infographic-RedmondWCInfluencer-FB-29246
 
Infographic-TechValidate-FB-29328
Infographic-TechValidate-FB-29328Infographic-TechValidate-FB-29328
Infographic-TechValidate-FB-29328
 
Adult Con Ed-Corp Bro_single pgs
Adult Con Ed-Corp Bro_single pgsAdult Con Ed-Corp Bro_single pgs
Adult Con Ed-Corp Bro_single pgs
 

Recently uploaded

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Ibm db2 big sql

  • 1. IBM Db2 Big SQL and Open Source Hebert Pereyra Big SQL and Data Virtualization Chief Architect pereyra@ca.ibm.com
  • 2. IBM Cloud Legal Disclaimer 2 Copyright © IBM Corporation 2018 All rights reserved. U.S. Government Users Restricted Rights - Use, duplication, or disclosure restricted by GSA ADP Schedule Contract with IBM Corporation THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON CURRENT THINKING REGARDING TRENDS AND DIRECTIONS, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. FUNCTION DESCRIBED HEREIN MY NEVER BE DELIVERED BY I BM. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE. IBM, the IBM logo, ibm.com and Db2 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
  • 3. IBM Cloud IBM and Hortonworks Focus on extending data science and machine learning to analyze the data in Apache Hadoop systems. Consumers get the best in class open technology • #1 Rank by Gartner 2017 Data Science Magic Quadrant • Leader in SQL technology for Hadoop (www.tpc.org) • Leader in data and analytics solutions for Hybrid Cloud • Provides Data Science & Machine Learning • Leader in Hadoop Open Source Distribution • 1000+ customers and 2100+ ecosystem partners • Hadoop original architects, developers employed by Hortonworks • Provides Open Hadoop Data Platform Commitment to progressing advanced analytics through open source + 2017-18: 100s of deals closed together. Over 40 joint events across 17 countries. Integration with IHAH, Big SQL, DSX, Truata GDPR offering. Working on embedded offerings for IoT, Watson. Monthly Exec Interlocks. Weekly Sales Exec Meetings. Engineering & OM Interlocks.
  • 4. IBM Cloud IBM and Hortonworks – High Value
  • 5. IBM Cloud SQL Ad-hoc queries, data preparation Federation Operational with fast lookups High performance and scalability Integrated Spark and Machine Learning Complex SQL, Deep analytics, Many users Application portability Db2 Big SQL – For all DWH needs in Hadoop SQL-based Application Common SQL Engine Client driver Db2 Big SQL Data Storage SQL MPP Run-time DFS Hadoop
  • 6. © 2017 IBM Corporation DB2 Warehouse (DPF) drives all nodes to read table partitioned across multiple nodes SELECT SUM (…) FROM some_table DB2 Coordinator node 1 node 2 node 3 node 4 node n Sum(..) Sum(..) Sum(..) Sum(..) Sum(..)
  • 7. © 2017 IBM Corporation DB2 Warehouse (DPF) drives all nodes to read table partitioned across multiple nodes Query Result DB2 Coordinator node 1 node 2 node 3 node 4 node n Sum(..) Sum(..) Sum(..) Sum(..) Sum(..)
  • 8. © 2017 IBM Corporation Let’s extend the MPP processing concept now to Hadoop… Database Client Big SQL (head) node 1 node 2 node 3 node 4 node n HDFS A A A B B B A + BComplete Table = Big SQL Scheduler NameNode
  • 9. © 2017 IBM Corporation Big SQL Query Execution Database Client Big SQL (head) node 1 node 2 node 3 node 4 node n HDFS A A A B B B A + BComplete Table = Big SQL Scheduler NameNode
  • 10. IBM Cloud Db2 Big SQL V5.0 CoreCapabilitiesApplications Core SQL Engine SecurityAdministration Comprehensive ANSI SQL coverage Advanced cost-based optimizer SQL based RBAC Ranger Automatic workload management WLM Automatic memory management Performance Query rewrite for optimized execution Elastic boost – logical worker nodes SQL compatibility – Db2, Oracle, Netezza Federation MQTs Integration Spark Integration Batch SQL (minutes to hours) Interactive SQL (seconds to minutes) Self-service / Interactive BI (Sub-second) Data augmentation (Spark integration) Application portability • ETL • Reporting • Data mining • Deep analytics • Reporting • Complex queries • BI Tools: Cognos, Tableau, etc • Ad-hoc, exploratory • BI tools: Cognos, Tableau, etc • Query EDW • Join data • Use ML • Reuse applications • Reuse skills Roles DSM, Ambari SQL and NoSQL Structured & Unstructured www.tpc.org – check out TPC-H and TPC-DS – Big SQL vs Impala vs Hive Db2 Big SQL 5.0 2X faster than Hive LLAP with Tez (gap increases rapidly with higher concurrency) Db2 Big SQL 5.0 3X faster than Spark SQL (more optimal resource utilization and execution engine) Db2 Big SQL 5.0 faster than Cloudera Impala 2.9.0 (scalability hindered by architecture)
  • 11. IBM Cloud No Vendor Lock-in & Integration with Hive • Db2 Big SQL preserves open source foundation • Separation of compute and storage - Data is part of Hadoop • Alternate execution engine to MapReduce • Hive and Db2 Big SQL share common metadata • Db2 Big SQL is optimized to push projections and predicate filters down to the storage I/O engine • Db2 Big SQL can invoke native Hive UDFs efficiently using Parameter Style Hive • Work on Hive ACID tables from Big SQL SQL Execution Engines Open Source Storage Model CSV Parquet ORC Others … Tab Delim. Hive Metastore (open source) Db2Big SQL (IBM) Hive (Open Source) 5.0.3 Ecosystem Integration
  • 12. IBM Cloud New & Improved Reader/Writer for all File Formats ORC HBASE ANALYZE CUSTOM SERDES DATE STORED AS DATE OBJECT STORE EVERY OTHER FORMAT Formats supported by the C++ Reader and more COMPLEX TYPES PARQUET TEXT RCFILE AVRO SEQUENCE FILE Db2 Big SQL 5.0.3 Java I/O ▪Improved Performance on all tables / file types ▪Particularly text files ▪Product Stability ▪Reduced Complexity in Db2 Big SQL architecture ▪Reduced Cost −Personnel and Processes to maintain 2 I/O engines −Customer PMRs and critical situations ▪Reduced Out-of-Memory and instability issues ▪Reduced Resource allocation ▪Better Interoperability with Open Source ▪Java reader/writer for all file formats VARBINARY Enhanced stability and robustness to the product 5.0.3
  • 13. IBM Cloud Big SQL High Availability (Head Node) Scheduler Big SQL Master (Primary) Catalogs Scheduler Big SQL Master (Standby) Catalogs … HDFS Data Worker Node Worker Node Worker Node Database Logs + Data Shipping HDFS Data HDFS Data ▪ Big SQL master node high availability − Scheduler automatically restarted upon failure − Catalog changes (metadata) replicated real time to “warm” standby instance − Standby automatically takes over if the primary fails − Worker nodes automatically detect and re-connect to acting “primary” − Automatic Client Re-route (ACR): clients automatically re-connect to acting “primary”
  • 14. IBM Cloud PERFORMANCE: 6-streams Db2 Big SQL 2.3X FASTER HADOOP-DS @ 10TB 85 COMMON QUERIES WORKING COMPLIANT QUERIES: 6-streams WORKLOAD SCALE FACTOR: 10 TB FILE FORMAT: ORC (ZLIB) CONCURRENCY: 6 STREAMS QUERY SUBSET: 85 QUERIES RESOURCE UTILIZATION: 6-STREAMS 1.5x FEWER CPU CYCLES USED STACK HDP 2.6.1 Db2 Big SQL 5.0.1 HIVE 2.1 LLAP ON TEZ INTERESTING FACTS FASTEST QUERY 5.4X FASTER (Db2 Big SQL: 1.5 SEC, HIVE: 8.1 SEC) SLOWEST QUERY (QUERY 67) 1.7X FASTER (Db2 Big SQL: 6827 SEC, HIVE: 11830 SEC) Db2 Big SQL FASTER FOR 80% OF QUERIES RUN PERFORMANCE: 1-stream Db2 Big SQL 1.8X FASTER hrs hrs Query Performance at a Glance – vs Hive LLAP with Tez
  • 15. IBM Cloud Combining Hadoop Technologies Not Mutually Exclusive. Hive, Db2 Big SQL & Spark SQL can co-exist and complement and leverage each other in a cluster Hive Db2 Big SQL Spark SQL Geospatial analytics ACID capabilities Fast ingest Federation Complex Queries High Concurrency Enterprise ready Application portability All open source files Machine learning Data exploration Simpler SQL Leveraged for metastore and UDFs IBM’s proprietary SQL engine for Hadoop IBM’s open source SQL engine for Hadoop
  • 16. IBM Cloud PERFORMANCE Db2 Big SQL 5.0 is 3.2x faster than Spark SQL 2.1 (4 Concurrent Streams) SNAPSHOT OF 100TB HADOOP-DS I/O (vs Spark) Db2 Big SQL reads 12x less data Db2 Big SQL writes 30x less data COMPRESSION 60% SPACE SAVED WITH PARQUET AVERAGE CPU USAGE 76.4% MAX I/O THROUGHPUT READ 4.4 GB/SEC WRITE 2.8 GB/SEC WORKING QUERIES Leads performance metrics on high volumes of data and concurrent streams Blog on benchmark: https://developer.ibm.com/hadoop/2017/02/07/experiences-comparing-big-sql-and-spark-sql-at-100tb/ Query Performance at a Glance – Db2 Big SQL & Spark SQL
  • 17. IBM Cloud Materialized Query Tables (MQTs) on Hadoop Tables User executes a complex query Db2 Big SQL Query rewrite Generate plan Generate plan Plans are compared and the best one is picked Query results Base tables MQTs MQTs Base • MQTs have results pre-computed and stored • Enables sub-second response times for complex queries with aggregates and joins on dimension tables • Db2 Big SQL optimizer automatically recognizes the MQTS for faster response 5.0.3
  • 18. IBM Cloud MQT Performance – Star Schema Benchmark Queries Quick metric queries Product insight queries Customer insight queries Using Scale Factor 1000, tested 13 queries that join 1 fact with 4 dimension tables 6 Billion Lineitems & 30 Million Customers rows Response time in secs Query performance on non-MQT table 5.0.3
  • 19. IBM Cloud MS SQL Server Netezza (PDA) Oracle PostgreS QL Teradata DB2 LUW, Db2z, DB2 on i Informix WebHDFS Object Store (S3) Hive HBase HDFS Hortonworks Data Platform (HDP) Db2 Big SQL NoSQL ML Model Federation – Virtualize Heterogeneous Data Db2 Big SQL queries heterogeneous systems in a single query Only SQL-on-Hadoop that virtualizes more than 10 different data sources: RDBMS, NoSQL, HDFS or Object Store Transparent ▪ Appears to be one source ▪ Programmers don’t need to know how / where data is stored High Function ▪ Full query support against all data ▪ Capabilities of sources as well Autonomous ▪ Non-disruptive to data sources, existing applications, systems. High Performance ▪ Optimization of distributed queries
  • 20. IBM Cloud ✓ Easily access information on demand ✓ Combine data in Hadoop with disparate sources to form a data lake ✓ Quickly extend your data warehouse by enriching it Connect Query Monitor Data Placement ▪ QuickaccesstoDatavalue ▪ CommonFramework ▪ ODBC/JDBC ▪ Spark integration enables new data sources ▪ Connect all data sources in single query ▪ Intelligent Query Routing ▪ Cost-based optimizer ▪ SQL pushdown ▪ Local data caching ▪ ANSI-compliant SQL ▪ Easily define & manage through a common UI ▪ Simple point & click to discover and query ▪ Monitor and visualize active queries ▪ Schema conversion when moving data ▪ Bulk data copy to Hadoop ▪ Filtered subsets of data Federation - Rich Capabilities that Brings Data Together Think 2018 /9071A - Live Data Analytics using Db2 Big SQL and Big Replicate / March 19, 2018 / © 2018 IBM Corporation
  • 21. IBM Cloud Access data from new data sources Extend the federation capabilities to data sources like: MySQL, PostgreSQL, MariaDB, & MongoDb, with enriched capabilities of query pushdown, best execution plan, secured access and optimized execution time Federation – Db2 Big SQL 5.0.3 Enhancements Function mapping for best query results When a function in one data source can be mapped to same or similar function in the remote data source, the results returned from the query pushdown is refined rather than returning large result sets with no function mapping Create local cache for federated data When federated data does not change frequently, it can be locally cached by creating Hadoop MQTs or regular MQTs to have local access to get best performance when compared to accessing remote data Use computational group to improve performance Computational partition group enables dynamically redistributing nickname data to parallelize processing in Hadoop. This improves performance especially when there’s large nickname data or when queries get complex. 5.0.3
  • 22. IBM Cloud Offload data Data warehouse offload to Hadoop is now made easy: • Write one, run anywhere… • Easy porting of applications • Reuse skills of DBAs/ developers who know ANSI SQL Db2 Big SQL is the best platform for offloading Oracle Data Marts and Warehouses to Hadoop Application Portability: Move Applications without Re-tooling
  • 23. IBM Cloud Here’s why Db2 Big SQL can get you the best execution for complex queries and many concurrent users with high performance Db2 Big SQL - Query Execution Self Tuning Memory Manager World Class Cost Based Optimizer Query rewrite Advanced Statistics Native Row & Columnar stores Elastic Boost SQL Compatibility Hardened runtime Advanced Workload manager Materialized Query Tables Performance Concurrent users Complex query
  • 24. IBM Cloud For more details check the blog: https://developer.ibm.com/hadoop/2017/11/07/ibm-big-sql-machine-learning-demo/ Operationalize Machine Learning Models using SQL
  • 25. IBM Cloud Db2 Big SQL - Security
  • 26. IBM Cloud Db2 Big SQL - Security and Governance Combining Hadoop Technologies Added to SQL level security for row-level and column- level access control, Db2 Big SQL integrates with other components With Apache Ranger Db2 Big SQL plugin, you can setup policies for access to Db2 Big SQL tables: Create, alter, analyze, load, truncate, drop, insert, select, update, and delete. Supports Ranger Audit Big SQL also integrates with Information Governance Catalog by enabling easy shared imports to InfoSphere Metadata Asset Manager, which allows: Analyze assets Utilize assets in jobs Designate stewards for the assets Apache Ranger InfoSphere Metadata Asset Manager 5.0.3
  • 27. IBM Cloud Db2 Big SQL - Advanced Workload Management Benefits • Identification and control of applications • Direct control of the execution environment • Detection and control of rogue queries – prevent bad queries from executing • Query concurrency – optimize query throughput • Advanced monitoring Avoid under-utilizing or over saturating the resources Resources Assigned: CPU I/O Memory Service Classes categorize work and set goals: Response time Velocity System Discretionary WLM Checks every 10 secs 5.0.3
  • 28. IBM Cloud Custom WLM Db2 Big SQL - Advanced Workload Management 5.0.3
  • 29. IBM Cloud Query streaming data in HBase with data at-rest • HBase is a columnar NoSQL data store for Hadoop • HBase offers a flexible schema, low latency key lookups and small key scans. However, HBase is approximately 3x to 5x slower than Hive for large table scans • HBase tables are updateable (ACID), and could be used for versioned dimensions (though native Db2 tables local to the Head Node are generally a better choice here) • Db2 Big SQL supports the full range of SQL operations against HBase tables • Combine streaming data in HBase with relational data in Hive using Big SQL Db2 Big SQL HBaseHive Analytical SQL NZ RDBMS Offloaded Data High Data Ingest Rates from External Applications Db2 Big SQL and Hbase Support
  • 30. IBM Cloud Db2 Big SQL - Tables over S3 Object Storage Create Tables over Data residing in Object Store directly (no copy required into Hadoop) Once configured, Object Store tables work like any other table in Big SQL Benefits: No need to copy data into Hadoop first! Query data where it resides. Partitioning supported! Tradeoff: Expect reduced performance relative to HDFS local tables CREATE HADOOP TABLE staff ( … ) LOCATION 's3a://s3atables/staff'; LOAD FROM Object Store also supported!
  • 31. IBM Cloud Db2 Big SQL - Tables over WebHDFS Transparently access data on any platform implementing WebHDFS Examples: Microsoft Azure Data Lake (ADL) service Once setup, WebHDFS tables work like any other table in Big SQL Technical Preview Limitations: WebHDFS via Knox not supported Performance not well understood. Reduce performance expected. Db2 Big SQL Local Hadoop Cluster Remote Hadoop Cluster or WebHDFS enabled Storage CREATE HADOOP TABLE staff ( … ) PARTITIONED BY (JOB VARCHAR(5)) LOCATION 'webhdfs://namenode.acme.com:50070/path/to/table/staff'; LOAD FROM WebHDFS also supported! 5.0.3
  • 32. IBM Cloud Db2 Big SQL – Integration with Yarn and Spark
  • 33. IBM Cloud HDFS Big SQL Head Node Big SQL Worker Big SQL Worker Big SQL Worker Big SQL Worker Spark Exec. Spark Exec. Spark Exec. Spark Exec. = Fast data transfer over shared memory Db2 Big SQL – Deep Integration with Spark
  • 34. IBM Cloud Exploit Db2 Big SQL from Spark Requirements: db2jcc.jar must be added to the classpath of the Spark application (found in /home/bigsql/java/) import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; … Dataset<Row> tableDf = sqlCtx.read() .format("jdbc") .option("driver", "com.ibm.db2.jcc.DB2Driver") .option("url", "jdbc:db2://server1.foo.bar.com:32051/BIGSQL") .option("user", "joe") .option("password", "joespwd") .option("dbtable", "myshcema.mytable") .load(); tableDf.createOrReplaceTempView("myTable"); Dataset<Row> queryDF = spark.sql("SELECT col2, col3 FROM myTable WHERE col1 > 100"); Big SQL secures data for self-service data exploration. Used this way, Spark users are subject to Big SQL row/column security
  • 35. IBM Cloud Exploit Spark from Big SQL Example: Spark Schema Discovery for JSON Bring the best of Spark into Big SQL! Machine Learning Cache remote tables (Spark has rich library of connectors) Graph Processing General in memory processing SELECT doc.* FROM TABLE( SYSHADOOP.EXECSPARK( class => 'DataSource', load => 'hdfs://host.port.com:8020/user/bigsql/demo.json') ) AS doc WHERE doc.language = 'English'; Structure of JSON document determined at run time
  • 36. © 2018 IBM Corporation36 Big SQL Worker Big SQL Worker Big SQL Worker Big SQL Worker Big SQL Worker Big SQL Worker Big SQL Elastic Boost – new in v5.x (and 4.2.5) Multiple Logical Workers per Host Big SQL Head HDFS Container Big SQL Components Users Big SQL Worker Big SQL Worker Big SQL Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker
  • 37. © 2018 IBM Corporation37 Big SQL Elastic Boost – new in v5.x (and 4.2.5) Logical Workers allows multiple Yarn containers per Host Big SQL Head NM NM NM NM NM NM HDFS Slider Client YARN Resource Manager & Scheduler Big SQL AM Big SQL Worker Big SQL Worker Big SQL Worker Container YARN components Slider Components Big SQL Components Big SQL Slider package implements Slider Client APIs Users Worker Worker Worker Worker Worker Worker Worker Worker Worker
  • 38. © 2018 IBM Corporation38 Elastic Big SQL Capacity • Remember that with Big SQL, 1 container (1 Worker) can service hundreds of concurrent SQL jobs. It accomplishes such by being a long running service that stays resident in memory. So what does 50% mean..?
  • 39. © 2018 IBM Corporation39 What does 50% mean? (default post enablement) ▪ YARN (not Big SQL) decides where containers/workers are started. These situations (and others) are possible. ▪ As capacity target increases above 50%, opportunity of skew is reduced. Worker Worker Worker Worker Worker Worker Worker Worker WorkerWorker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker WorkerWorker Ideal Minor Skew Significant Skew Significant Skew
  • 40. IBM Cloud Db2 Big SQL w/ Elastic Boost - INSERT Performance For both 1 and 10 TB TPC-DS dataset 2 Workers/Node: 1.6x speedup 4 Workers/Node: 2.2x speedup In each scenario, the same TOTAL CPU/memory is used INSERT…SELECT performance with Elastic Boost # Workers / Node
  • 41. IBM Cloud Db2 Big SQL 5.0 – How it fits with Hortonworks Big SQL deploys on top of Hortonworks Data Platform(HDP) Capability Apache Hadoop and Apache Spark and Ecosystem IBM Big SQL Support Community Support Hortonworks Data Platform for IBM Support Offering for HDP includes support for Big SQL BUY Hortonworks Data Platform