SlideShare a Scribd company logo
1 of 35
Download to read offline
© 2014 IBM Corporation 
IBM Big SQL – Vendor Landscape 
<<Speaker Name Here>> 
<<For questions about this presentation contact Glen Sheffield shef@ca.ibm.com
Why SQL for Hadoop? 
 Lower cost DW, exploration and discovery, data lake or reservoir 
 Open up Hadoop data to familiar SQL tools such as Cognos 
“This trend is expected to continue as Hadoop vendors continue to improve so-called SQL-on-Hadoop offerings. These 
technologies from vendors such as Cloudera, Hortonworks and Actian allow data scientists and less sophisticated business 
analysts and business users to query data in Hadoop using ANSI SQL-like tools (some SQL-on-Hadoop offerings have 
more complete SQL functionality than others) and developers to build Hadoop-based data-driven applications” 
2 © 2014 IBM Corporation
What is IBM Big SQL? 
 IBM’s SQL for Hadoop 
– Opens Hadoop data to a wider audience 
– Familiar, widely known syntax 
 Complements the Data Warehouse 
– Exploratory analytics 
– Sandbox 
– Data Lake 
 Included in BigInsights 
 Use familiar SQL tools 
– Like Cognos 
3 © 2014 IBM Corporation
IBM Big SQL is Architected for Performance 
 Architected from the ground up for low latency and high throughput 
 MapReduce replaced with a modern MPP architecture 
– Compiler and runtime are native code (not java) 
– Big SQL worker daemons live directly on cluster 
• Continuously running (no startup latency) 
• Processing happens locally at the data 
– Message passing allows data to flow directly 
between nodes 
 Operations occur in memory with the ability 
to spill to disk 
– Supports joins, aggregations, and sorts larger than 
available RAM 
SQL-based 
Application 
IBM data server 
client 
Big SQL 
Engine 
SQL MPP Run-time 
Data Sources 
CSV 
CSV 
Seq 
Seq 
Parquet 
Parquet 
RC 
RC 
ORC 
ORC 
Avro 
Avro 
Custom 
Custom 
JSON 
JSON 
InfoSphere BigInsights 
4 © 2014 IBM Corporation
IBM Big SQL Embraces Open Source HDFS file formats 
 Big SQL applies SQL to your existing Hadoop data 
– No propriety storage format 
– Only vendor to support Parquet and ORC 
 A table is simply a view on your Hadoop data 
– All data is Hadoop data 
– In files in HDFS 
– SEQ, RC, delimited, Parquet … 
 Table definitions shared with Hive 
– The Hive Metastore catalogs table definitions 
– Reading/writing data logic is shared 
with Hive 
– Definitions can be shared across the 
Hadoop ecosystem 
 Data stored in Hive immediately query-able 
Hive 
Hive 
Metastore 
Hadoop 
Cluster 
Pig 
Hive APIs 
Sqoop 
Big SQL 
Hive APIs 
Hive APIs 
5 © 2014 IBM Corporation
Different Approaches to SQL on Hadoop 
Better 
Worse 
Just the Query Engine 
• MPP SQL query engine 
• Uses Hive metadata 
• Runs on Hadoop cluster 
• Operates directly on HDFS files 
• Best integration with 
Hadoop 
• Native HDFS file 
formats 
Full RDBMS on Hadoop 
• Complete database, including 
storage layer and query engine 
• Uses proprietary metadata 
• Runs on Hadoop cluster 
• Proprietary formats 
• Adds database 
complexities to 
Hadoop 
Submit a remote query 
• RDBMS sends request to Hadoop 
• Result returned to RDBMS 
• Network may impact performance 
• Ability to push down work varies 
• Requires front-end 
database 
• Performance depends 
on network and 
RDBMS load 
6 © 2014 IBM Corporation
What’s the Difference? 
7 
SQL Query Engine 
(Big SQL) 
Hive Metadata Catalog 
(Hadoop) 
Standard HDFS files 
(Hadoop) 
3. Remote Query (e.g. Oracle, Teradata) 
Requires a completely separate front-end 
RDBMS system 
RDBMS 
2. Complete RDBMS on Hadoop 
(e.g. Pivotal HAWQ, Actian, Vertica) 
Query engine, meta data, storage layer 
are all glued together in a 
proprietary bundle running on Hadoop 
© 2014 IBM Corporation 
1. Just the Query Engine 
(IBM Big SQL) 
• Query engine, meta data, storage 
layers are all separate 
• Leverages Hadoop ecosystem 
SQL Query Engine 
Proprietary Database Catalog 
(Meta Data) 
Proprietary Database 
Storage (tables) 
• IBM Big SQL has an architectural 
advantage over other RDBMS vendors 
• Big SQL is a MPP Query engine 
running natively on Hadoop 
• IBM Big SQL uses open source HDFS 
file system, not proprietary RDBMS 
storage layer 
HADOOP 
Hadoop Cluster
SQL for Hadoop Vendor Landscape 
Architecture, 
Performance 
Announced July 15 
2014 for GA 3Q 2014 
Available only with 
Oracle Big Data 
Appliance 
Good, based on 
Oracle 
Remote query from 
Oracle Exadata via 
external tables 
Oracle Big 
Data SQL 
New MPP query engine 
in BigInsights 3.0 
Included with IBM 
InfoSphere Big 
Insights 
Very Good, SQL 
2003/2011 
MPP query engine 
based on DB2 runs 
native on Hadoop 
IBM Big SQL 
Approach SQL 
Support 
Packaging Comments Big SQL 
Advantage 
Cloudera 
(Impala) 
MPP query engine built 
by Cloudera runs 
native on Hadoop 
Poor, subset of 
Hive 0.9 
Included with 
Cloudera Enterprise 
Available for MapR 
and Amazon EMR 
Perception leader SQL support 
Memory usage 
Concurrency 
Hortonworks 
(Hive / Stinger) 
Enhance Hive, replace 
MapReduce with Tez, 
runs native on Hadoop 
Better with Hive 
.13 but sub-query 
restrictions 
Hive .13 Included with 
Hortonworks; Hive .12 
included in Cloudera, 
BigInsights, MapR 
Microsoft, Teradata, 
SAP, HP are partners 
SQL support 
Performance 
Pivotal HD 
(HAWQ) 
Port Greenplum 
database to Hadoop 
including storage layer 
Good, based on 
Greenplum 
Optional feature of 
Pivotal HD 
Funded by EMC and 
GE Capital 
Architecture 
Elastic 
Scalability 
Teradata 
(SQL-H) 
Remote query from 
Teradata by treating 
Hive tables as a view 
Good, based on 
Teradata 
Optional feature of 
Teradata 14.1, 15. 
Included with 
Teradata Distribution 
for Hadoop (TDH) 
QueryGrid coming in 
3Q will provide filtering 
and push-down in 
Hadoop via Hive 13 
Architecture, 
Performance 
Microsoft 
(Polybase) 
Remote query from 
PDW using external 
tables 
Good, based on 
Microsoft 
Available only with 
Microsoft PDW V2 
Limited to Microsoft 
PDW customers 
Architecture, 
Performance 
Presto MPP query engine built 
by Facebook runs 
native on Hadoop 
Good Open source 
download 
New, unproven Commercial 
support 
8 © 2014 IBM Corporation
IBM Big SQL Advantages 
 Native Hadoop architecture 
– Designed for Hadoop Ecosystem 
– Uses native Hadoop data types 
– Elastic scalability 
 Leading performance 
– Better value for money 
 ANSI compliant SQL – broad language support 
– Minimize re-coding retains tools investments 
 Automatic memory management 
– Other vendors may require that queries be “hand-optimized” 
 Security – row, column level, field masking 
 Rich analytic and aggregation functions 
9 © 2014 IBM Corporation
10 © 2014 IBM Corporation
Cloudera Impala Overview 
11 
11 
Interactive SQL for Hadoop 
Responses in seconds 
Nearly ANSI-92 standard SQL with Hive SQL 
Native MPP Query Engine 
Purpose-built for low-latency queries 
Separate runtime from MapReduce 
Designed as part of the Hadoop ecosystem 
Open Source 
Apache-licensed 
11 © 2014 IBM Corporation
Is Cloudera Impala Open Source? 
 Yes 
– Impala code is available for download on Github, and available under Apache license 
– Impala is available for Amazon EMR and MapR Hadoop distributions 
 No 
– Customers assume that the open source community contributes to Impala 
– But Impala is Not an Apache Software Foundation open source project 
– Cloudera is the only contributor (developer) to Impala code 
 IBM has decades of SQL research and development being invested into IBM 
Big SQL 
– Impala has no advantage over IBM Big SQL in terms of code contribution or product 
development 
12 © 2014 IBM Corporation
IBM Big SQL Compared to Cloudera Impala 
 Better SQL Support for end user tools and queries 
– SQL 2003/2011+ (Impala only supports a subset of SQL92) 
– Impala doesn’t support sub-queries in where-clause, nested-subqueries, windowed 
aggregates, common table expressions, rollup, cube, for example 
– Translates to better end-user tool usage and performance 
 Guaranteed execution for complex queries for reliability and ease of use 
– Big SQL has Automatic Memory Management 
– Unlike Impala, Big SQL has no limitation that joined tables have to fit in aggregated 
memory of data nodes which can cause queries to run out of memory and fail 
 Fine grained row and column access control for enhanced security 
– Easy development for multi-tenancy applications 
– Does not require the use of views 
– Impala requires use of views which increases complexity 
 More Features 
– Federation 
– Stored procedures 
– More Scalar functions 
– More Aggregate functions 
13 © 2014 IBM Corporation
14 © 2014 IBM Corporation
Hortonworks Stinger Initiative Improved Apache Hive 
15 © 2014 IBM Corporation
IBM Big SQL Extends Value of Apache Hive 
 Big SQL can immediately query data already stored in Hive 
– Big SQL uses Hive Metadata and extends value of Hive 
– Big Insights includes both Hive and Big SQL 
 Better SQL Support for end user tools and queries 
– IBM Big SQL runs all 99 TPC-DS queries un-modified 
– Hive .12 runs only 43 TPC-DS queries un-modified (Hive .13 testing pending) 
– Hive still has many sub-query restrictions and doesn’t support non equi-joins, etc. 
– Translates to better end-user tool usage and performance 
 Faster Performance 
– Up to 41x faster than Hive .12 on TPC-H like benchmark* 
– Hive .13 benchmarks pending 
 Fine grained row and column access control for enhanced security 
– Easy development for multi-tenancy applications 
– Does not require the use of views 
 More Features 
– Federation 
– Stored procedures 
– More Scalar functions 
– More Aggregate functions 
TPC-DS Benchmark and TPC-H are trademarks of the Transaction Processing Performance Council (TPC). 
16 © 2014 IBM Corporation
Comparing Big SQL and Hive 0.12 for Ad-Hoc Queries 
Big SQL is up 
to 41x faster 
than Hive 0.12 
Big SQL is up 
to 41x faster 
than Hive 0.12 
*Based on IBM internal tests comparing IBM Infosphere Biginsights 3.0 Big SQL with Hive 0.12 executing the 1TB Classic BI 
Workload in a controlled laboratory environment. The 1TB Classic BI Workload is a workload derived from the TPC-H 
Benchmark Standard, running at 1TB scale factor. It is materially equivalent with the exception that no update functions are 
performed. TPC Benchmark and TPC-H are trademarks of the Transaction Processing Performance Council (TPC). 
Configuration: Cluster of 9 System x3650HD servers, each with 64GB RAM and 9x2TB HDDs running Redhat Linux 6.3. 
Results may not be typical and will vary based on actual workload, configuration, applications, queries and other variables in a 
production environment. Results as of April 22, 2014 
17 © 2014 IBM Corporation
How many times Faster is Big SQL than Hive 0.12? 
Max 
Max 
Speedup 
of 74x 
Speedup 
of 74x 
Avg 
Avg 
Speedup 
of 20x 
Speedup 
of 20x 
Queries sorted by speed up ratio (worst to best) 
* Based on IBM internal tests comparing IBM Infosphere Biginsights 3.0 Big SQL with Hive 0.12 executing the 1TB Modern BI Workload 
in a controlled laboratory environment. The 1TB Modern BI Workload is a workload derived from the TPC-DS Benchmark Standard, 
running at 1TB scale factor. It is materially equivalent with the exception that no updates are performed, and only 43 out of 99 queries 
are executed. The test measured sequential query execution of all 43 queries for which Hive syntax was publically available. TPC 
Benchmark and TPC-DS are trademarks of the Transaction Processing Performance Council (TPC). 
Configuration: Cluster of 9 System x3650HD servers, each with 64GB RAM and 9x2TB HDDs running Redhat Linux 6.3. Results may not 
be typical and will vary based on actual workload, configuration, applications, queries and other variables in a production environment. 
Results as of April 22, 2014 
18 © 2014 IBM Corporation
19 © 2014 IBM Corporation
Oracle Big Data SQL Requires Exadata and Big Data Appliance 
 Oracle Big Data SQL will allow an 
Oracle Exadata user to issue a 
remote SQL query to Oracle's Big 
Data Appliance and return the 
result 
 Not Yet Available 
 IBM Big SQL, part of BigInsights, 
gives customers the ability to issue 
SQL queries directly to Hadoop 
 IBM Big SQL is available today 
Oracle Big Data SQL 
Oracle Exadata 
Oracle Big Data Appliance 
IBM Big SQL 
Hadoop 
Not Yet Available 
Available Today 
© 20 2014 IBM Corporation
Oracle Big Data SQL – Exadata to Big Data Appliance 
Oracle Big Data SQL = Remote query from Exadata to Oracle Big Data Appliance 
Oracle Exadata 
(Oracle 12 c Database) 
Oracle Big Data 
Appliance 
(Cloudera Hadoop) 
SQL query issued 
Results returned
Oracle Big Data SQL – Not What it Seems 
 Oracle's solution requires Oracle 
12c Database to submit the query 
to Hadoop 
– Oracle Big Data SQL requires the 
Oracle 12c database on Exadata as the 
front-end system to issue the query to 
Hadoop 
 Oracle's solution requires both 
Oracle Exadata and the Oracle Big 
Data Appliance 
– Likely to be of interest only to 
customers willing to buy into the 
complete Oracle stack 
 IBM Big SQL queries Hadoop 
directly 
– No need for a front-end relational 
database 
 IBM Big SQL runs on any 
commodity x86 or Power Linux 
hardware 
– No need for proprietary, lock-in Big 
Data Appliance 
© 22 2014 IBM Corporation
IBM Big SQL Compared to Oracle Big Data SQL 
 IBM Big SQL offers faster performance 
– Hadoop data does not need to be transferred across a network to an Oracle 
database for additional processing (joins, aggregations, etc) 
 IBM Big SQL offers “Direct Connect” 
– Front-end query tools such as Cognos can connect directly to Hadoop with IBM 
Big SQL, and more efficiently, by not having to connect first to the Oracle 
database 
 IBM Big SQL offers true MPP on Hadoop 
– Big SQL deploys directly on the physical Hadoop cluster 
– Big SQL accesses Hadoop data natively for reading and writing 
 IBM Big SQL offers query federation 
– Big SQL can federate queries across DB2, Netezza, Teradata, and even Oracle 
– Integrating existing relational database data with Hadoop 
 IBM Big SQL is available today 
– Oracle Big Data SQL is not even available in Beta yet 
© 23 2014 IBM Corporation
24 © 2014 IBM Corporation
Pivotal HD HAWQ 
Or is it 
The complexity of Hadoop 
married to the complexity of 
the Greenplum Database? 
25 © 2014 IBM Corporation
Pivotal HD – HAWQ is based on Greenplum Database 
 HAWQ is based on the Greenplum database with modifications that enable 
data to be stored on HDFS 
– Or HDFS compatible file systems (such as EMC ISILON OneFS) 
 The data is still stored in Greenplum relational tables, in a proprietary format 
(.GDB) on HDFS, separate from Hadoop data 
– Readable only through the Greenplum interface* 
 HAWQ SQL access to Hadoop data (including HBase) is done via the 
Greenplum Database External Table feature 
– Part of what is now called PXF – Pivotal Extension Framework. 
 HAWQ uses its own internal proprietary metadata 
– Does not use Apache Hadoop Hive Metadata Catalog (HCatalog) 
*Hawq recently added support for Parquet files 
26 © 2014 IBM Corporation
IBM Big SQL Advantages Compared to Pivotal HAWQ 
 Architecture designed for Hadoop Ecosystem 
– Big SQL uses meta data and standard files on HDFS 
– HAWQ uses its own metadata and database tables 
 Elastic Scalability 
– With Big SQL, nodes can be added or removed from cluster on-line 
– HAWQ requires complex, disruptive, off-line MPP database re-distribution 
 Federation 
– DB2, Netezza, Oracle, Teradata 
 Fine grained row and column access control 
– Easy development for multi-tenancy applications 
– Does not require the use of views 
 Packaging 
– Big SQL included in BigInsights 
– HAWQ is extra cost license for Pivotal HD 
27 © 2014 IBM Corporation
28 © 2014 IBM Corporation
Teradata “Unified Data Architecture” 
 Hadoop purpose is for data 
capture and transformation 
 Aster Data used for discovery 
and exploratory analytics 
 SQL-H is used to query Hadoop 
using SQL from either Aster 
Data or Teradata 
 SQL-H issues a remote query to 
Hadoop and brings the data 
back to the RDBMS for 
processing 
29 © 2014 IBM Corporation
Teradata SQL-H – Remote Query to Hadoop 
30 © 2014 IBM Corporation
IBM Big SQL Compared to Teradata SQL-H 
 Big SQL operates and processes the data directly on Hadoop 
– Does not need to move data to relational database for analytics 
 Big SQL does not require separate licensing or complexity of an MPP 
relational database like Teradata 
– SQL-H requires deployment of the Aster or Teradata relational database 
– Big SQL provides direct SQL access to Hadoop as part of BigInsights 
 Big SQL supports SQL access to HBase 
– SQL-H does not support HBase 
 SQL-H depends on network latency/speed and Teradata system 
configuration for performance 
– Teradata system may already be overburdened 
– Teradata system likely not sized for additional processing of Hadoop data 
 Aster Data and Teradata are complex 
– Relational MPP engine with partitioning, indexing, tuning requirements 
– Aster has operational limitations including having to rebuild indexes on load, vacuum 
operations to reclaim space, managing freespace, etc. 
31 © 2014 IBM Corporation
Summary 
32 © 2014 IBM Corporation
Summary 
 SQL on Hadoop presents huge opportunity 
– Data Lake, Data Warehouse offload, Sandbox and exploration 
 IBM Big SQL Architecture is designed for Hadoop Ecosystem 
– MPP query engine runs natively on Hadoop 
 Big SQL provides many advantages over Cloudera Impala 
– SQL support, memory management, more features 
 Big SQL also has advantages over Hive 
– SQL support, performance 
– But Big SQL works on top of Hive, extending value of Hive 
 Big SQL designed for Hadoop, queries Hadoop data directly 
– Neither Oracle, nor Teradata, nor Microsoft have a similar solution 
– Many SQL Hadoop solutions submit remote queries to Hadoop 
– Other SQL solutions are ports of database, including storage layer 
33 © 2014 IBM Corporation
Backup: Vendor Landscape 
 Cloudera has developed a MPP SQL engine for Hadoop called Impala, which is 
now also offered by MapR and Amazon EMR (Elastic Map Reduce) 
 Hortonworks is enhancing the breadth of SQL coverage, and performance of Hive, 
under the project name Stinger 
 Pivotal HAWQ is based on the Greenplum MPP database, and provides similar 
MPP database capability on Hadoop as part of the Pivotal HD (Hadoop) product 
 Teradata offers SQL-H which enables a Teradata end-user to issue a remote 
query to Hadoop and bring the data back into Teradata for analysis or integration 
with Teradata data. Teradata Query Grid (3Q 2014) will enable SQL-H result sets 
to be filtered on Hadoop prior to being returned to Teradata. 
 Oracle announced Oracle Big Data SQL which enables an Oracle Exadata end 
user to issue remote queries to the Oracle Big Data Appliance and bring back a 
filtered result set to Oracle database for analysis or integration with Oracle data. 
GA 3Q 2014. 
 Microsoft’s Polybase enables their PDW appliance to issue remote SQL queries to 
Hortonworks Hadoop on Windows or Microsoft HDInsight 
 Actian has consolidated ParAccel and Vectorwise technologies, ported to Hadoop 
and released it as Actian Analytics Platform Hadoop SQL Edition. 
 HP Vertica has released Dragline which can run on a MapR cluster and share 
storage with Hadoop 
34 © 2014 IBM Corporation
Backup: Vendor Landcape 
 Facebook has developed Presto, an open source SQL engine for Hadoop 
designed for interactive query analysis on large data sets 
 Apache Drill is an open source project based on Google Dremel to provide a 
distributed SQL execution engine for interactive, low latency queries. Currently in 
Beta 
 Spark SQL is a new SQL engine being developed from the ground up for Spark by 
Databricks, and replaces the previous Shark project. Currently in Alpha. 
 Splice Machine claims to be a full ACID compliant RDBMS on Hadoop, by taking 
the Apache Derby Java relational database and removed its storage layer, 
replacing it with the Apache HBase NoSQL database. Then the company modified 
the planner, optimiser and executor inside Derby to take advantage of HBase's 
distributed architecture. Currently in Beta. 
 Citus DB is built on PostgreSQL and runs on Hadoop nodes 
 JethroData is an index-based SQL engine for Hadoop automatically indexes data 
as it is written into Hadoop 
 InfiniDB has recently announced InfiniDB for Apache Hadoop, which is positioned 
for analytics and claims “if you know MySQL, you know InfiniDB” 
35 © 2014 IBM Corporation

More Related Content

What's hot

Hadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the HerdHadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the HerdIBM Analytics
 
Using your DB2 SQL Skills with Hadoop and Spark
Using your DB2 SQL Skills with Hadoop and SparkUsing your DB2 SQL Skills with Hadoop and Spark
Using your DB2 SQL Skills with Hadoop and SparkCynthia Saracco
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Data Con LA
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
Big Data: Big SQL web tooling (Data Server Manager) self-study lab
Big Data:  Big SQL web tooling (Data Server Manager) self-study labBig Data:  Big SQL web tooling (Data Server Manager) self-study lab
Big Data: Big SQL web tooling (Data Server Manager) self-study labCynthia Saracco
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase Cynthia Saracco
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab Cynthia Saracco
 
Getting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with BluemixGetting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with BluemixNicolas Morales
 
Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoopCraig Jordan
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopCloudera, Inc.
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLAdding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLPiotr Pruski
 
Big Data: Explore Hadoop and BigInsights self-study lab
Big Data:  Explore Hadoop and BigInsights self-study labBig Data:  Explore Hadoop and BigInsights self-study lab
Big Data: Explore Hadoop and BigInsights self-study labCynthia Saracco
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guideCynthia Saracco
 

What's hot (17)

Hadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the HerdHadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the Herd
 
Using your DB2 SQL Skills with Hadoop and Spark
Using your DB2 SQL Skills with Hadoop and SparkUsing your DB2 SQL Skills with Hadoop and Spark
Using your DB2 SQL Skills with Hadoop and Spark
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Big Data: Big SQL web tooling (Data Server Manager) self-study lab
Big Data:  Big SQL web tooling (Data Server Manager) self-study labBig Data:  Big SQL web tooling (Data Server Manager) self-study lab
Big Data: Big SQL web tooling (Data Server Manager) self-study lab
 
Big Data: Big SQL and HBase
Big Data:  Big SQL and HBase Big Data:  Big SQL and HBase
Big Data: Big SQL and HBase
 
Big Data: HBase and Big SQL self-study lab
Big Data:  HBase and Big SQL self-study lab Big Data:  HBase and Big SQL self-study lab
Big Data: HBase and Big SQL self-study lab
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Getting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with BluemixGetting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with Bluemix
 
Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoop
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLAdding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
 
Big Data: Explore Hadoop and BigInsights self-study lab
Big Data:  Explore Hadoop and BigInsights self-study labBig Data:  Explore Hadoop and BigInsights self-study lab
Big Data: Explore Hadoop and BigInsights self-study lab
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guide
 
Oracle Data Warehouse
Oracle Data WarehouseOracle Data Warehouse
Oracle Data Warehouse
 

Viewers also liked

Oracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewOracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewPaulo Fagundes
 
Tajo and SQL-on-Hadoop in Tech Planet 2013
Tajo and SQL-on-Hadoop in Tech Planet 2013Tajo and SQL-on-Hadoop in Tech Planet 2013
Tajo and SQL-on-Hadoop in Tech Planet 2013Gruter
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesOReillyStrata
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDBRadenko Zec
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetupnvvrajesh
 
Impala SQL Support
Impala SQL SupportImpala SQL Support
Impala SQL SupportYue Chen
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Matt Fuller
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupWojciech Biela
 
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?Edureka!
 
AWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWSAWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWSChris Riddell
 
Next Generation Hadoop: High Availability for YARN
Next Generation Hadoop: High Availability for YARN Next Generation Hadoop: High Availability for YARN
Next Generation Hadoop: High Availability for YARN Arinto Murdopo
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARNArinto Murdopo
 
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Matt Fuller
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Cloudera, Inc.
 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRDouglas Bernardini
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringTaro L. Saito
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScyllaDB
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseDataWorks Summit
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark Cynthia Saracco
 

Viewers also liked (20)

Oracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewOracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overview
 
Tajo and SQL-on-Hadoop in Tech Planet 2013
Tajo and SQL-on-Hadoop in Tech Planet 2013Tajo and SQL-on-Hadoop in Tech Planet 2013
Tajo and SQL-on-Hadoop in Tech Planet 2013
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetup
 
Impala SQL Support
Impala SQL SupportImpala SQL Support
Impala SQL Support
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
 
AWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWSAWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWS
 
Next Generation Hadoop: High Availability for YARN
Next Generation Hadoop: High Availability for YARN Next Generation Hadoop: High Availability for YARN
Next Generation Hadoop: High Availability for YARN
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARN
 
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark
 

Similar to Big SQL Competitive Summary - Vendor Landscape

Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014Wilfried Hoge
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...EMC
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopCloudera, Inc.
 
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...tdc-globalcode
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleHarald Erb
 
Big SQL NYC Event December by Virender
Big SQL NYC Event December by VirenderBig SQL NYC Event December by Virender
Big SQL NYC Event December by Virendervithakur
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkJames Chen
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 

Similar to Big SQL Competitive Summary - Vendor Landscape (20)

Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
 
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by Example
 
Big SQL NYC Event December by Virender
Big SQL NYC Event December by VirenderBig SQL NYC Event December by Virender
Big SQL NYC Event December by Virender
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 

More from Nicolas Morales

Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?
Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?
Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?Nicolas Morales
 
InfoSphere BigInsights for Hadoop @ IBM Insight 2014
InfoSphere BigInsights for Hadoop @ IBM Insight 2014InfoSphere BigInsights for Hadoop @ IBM Insight 2014
InfoSphere BigInsights for Hadoop @ IBM Insight 2014Nicolas Morales
 
IBM Big SQL @ Insight 2014
IBM Big SQL @ Insight 2014IBM Big SQL @ Insight 2014
IBM Big SQL @ Insight 2014Nicolas Morales
 
60 minutes in the cloud: Predictive analytics made easy
60 minutes in the cloud: Predictive analytics made easy60 minutes in the cloud: Predictive analytics made easy
60 minutes in the cloud: Predictive analytics made easyNicolas Morales
 
Challenges of Building a First Class SQL-on-Hadoop Engine
Challenges of Building a First Class SQL-on-Hadoop EngineChallenges of Building a First Class SQL-on-Hadoop Engine
Challenges of Building a First Class SQL-on-Hadoop EngineNicolas Morales
 
SQL-on-Hadoop without compromise: Big SQL 3.0
SQL-on-Hadoop without compromise: Big SQL 3.0SQL-on-Hadoop without compromise: Big SQL 3.0
SQL-on-Hadoop without compromise: Big SQL 3.0Nicolas Morales
 
Social Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data TechnologiesSocial Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data TechnologiesNicolas Morales
 
Security and Audit for Big Data
Security and Audit for Big DataSecurity and Audit for Big Data
Security and Audit for Big DataNicolas Morales
 

More from Nicolas Morales (10)

Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?
Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?
Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?
 
InfoSphere BigInsights for Hadoop @ IBM Insight 2014
InfoSphere BigInsights for Hadoop @ IBM Insight 2014InfoSphere BigInsights for Hadoop @ IBM Insight 2014
InfoSphere BigInsights for Hadoop @ IBM Insight 2014
 
IBM Big SQL @ Insight 2014
IBM Big SQL @ Insight 2014IBM Big SQL @ Insight 2014
IBM Big SQL @ Insight 2014
 
60 minutes in the cloud: Predictive analytics made easy
60 minutes in the cloud: Predictive analytics made easy60 minutes in the cloud: Predictive analytics made easy
60 minutes in the cloud: Predictive analytics made easy
 
Challenges of Building a First Class SQL-on-Hadoop Engine
Challenges of Building a First Class SQL-on-Hadoop EngineChallenges of Building a First Class SQL-on-Hadoop Engine
Challenges of Building a First Class SQL-on-Hadoop Engine
 
SQL-on-Hadoop without compromise: Big SQL 3.0
SQL-on-Hadoop without compromise: Big SQL 3.0SQL-on-Hadoop without compromise: Big SQL 3.0
SQL-on-Hadoop without compromise: Big SQL 3.0
 
Text Analytics
Text Analytics Text Analytics
Text Analytics
 
Social Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data TechnologiesSocial Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data Technologies
 
Security and Audit for Big Data
Security and Audit for Big DataSecurity and Audit for Big Data
Security and Audit for Big Data
 
Machine Data Analytics
Machine Data AnalyticsMachine Data Analytics
Machine Data Analytics
 

Recently uploaded

20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 

Recently uploaded (20)

20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 

Big SQL Competitive Summary - Vendor Landscape

  • 1. © 2014 IBM Corporation IBM Big SQL – Vendor Landscape <<Speaker Name Here>> <<For questions about this presentation contact Glen Sheffield shef@ca.ibm.com
  • 2. Why SQL for Hadoop? Lower cost DW, exploration and discovery, data lake or reservoir Open up Hadoop data to familiar SQL tools such as Cognos “This trend is expected to continue as Hadoop vendors continue to improve so-called SQL-on-Hadoop offerings. These technologies from vendors such as Cloudera, Hortonworks and Actian allow data scientists and less sophisticated business analysts and business users to query data in Hadoop using ANSI SQL-like tools (some SQL-on-Hadoop offerings have more complete SQL functionality than others) and developers to build Hadoop-based data-driven applications” 2 © 2014 IBM Corporation
  • 3. What is IBM Big SQL? IBM’s SQL for Hadoop – Opens Hadoop data to a wider audience – Familiar, widely known syntax Complements the Data Warehouse – Exploratory analytics – Sandbox – Data Lake Included in BigInsights Use familiar SQL tools – Like Cognos 3 © 2014 IBM Corporation
  • 4. IBM Big SQL is Architected for Performance Architected from the ground up for low latency and high throughput MapReduce replaced with a modern MPP architecture – Compiler and runtime are native code (not java) – Big SQL worker daemons live directly on cluster • Continuously running (no startup latency) • Processing happens locally at the data – Message passing allows data to flow directly between nodes Operations occur in memory with the ability to spill to disk – Supports joins, aggregations, and sorts larger than available RAM SQL-based Application IBM data server client Big SQL Engine SQL MPP Run-time Data Sources CSV CSV Seq Seq Parquet Parquet RC RC ORC ORC Avro Avro Custom Custom JSON JSON InfoSphere BigInsights 4 © 2014 IBM Corporation
  • 5. IBM Big SQL Embraces Open Source HDFS file formats Big SQL applies SQL to your existing Hadoop data – No propriety storage format – Only vendor to support Parquet and ORC A table is simply a view on your Hadoop data – All data is Hadoop data – In files in HDFS – SEQ, RC, delimited, Parquet … Table definitions shared with Hive – The Hive Metastore catalogs table definitions – Reading/writing data logic is shared with Hive – Definitions can be shared across the Hadoop ecosystem Data stored in Hive immediately query-able Hive Hive Metastore Hadoop Cluster Pig Hive APIs Sqoop Big SQL Hive APIs Hive APIs 5 © 2014 IBM Corporation
  • 6. Different Approaches to SQL on Hadoop Better Worse Just the Query Engine • MPP SQL query engine • Uses Hive metadata • Runs on Hadoop cluster • Operates directly on HDFS files • Best integration with Hadoop • Native HDFS file formats Full RDBMS on Hadoop • Complete database, including storage layer and query engine • Uses proprietary metadata • Runs on Hadoop cluster • Proprietary formats • Adds database complexities to Hadoop Submit a remote query • RDBMS sends request to Hadoop • Result returned to RDBMS • Network may impact performance • Ability to push down work varies • Requires front-end database • Performance depends on network and RDBMS load 6 © 2014 IBM Corporation
  • 7. What’s the Difference? 7 SQL Query Engine (Big SQL) Hive Metadata Catalog (Hadoop) Standard HDFS files (Hadoop) 3. Remote Query (e.g. Oracle, Teradata) Requires a completely separate front-end RDBMS system RDBMS 2. Complete RDBMS on Hadoop (e.g. Pivotal HAWQ, Actian, Vertica) Query engine, meta data, storage layer are all glued together in a proprietary bundle running on Hadoop © 2014 IBM Corporation 1. Just the Query Engine (IBM Big SQL) • Query engine, meta data, storage layers are all separate • Leverages Hadoop ecosystem SQL Query Engine Proprietary Database Catalog (Meta Data) Proprietary Database Storage (tables) • IBM Big SQL has an architectural advantage over other RDBMS vendors • Big SQL is a MPP Query engine running natively on Hadoop • IBM Big SQL uses open source HDFS file system, not proprietary RDBMS storage layer HADOOP Hadoop Cluster
  • 8. SQL for Hadoop Vendor Landscape Architecture, Performance Announced July 15 2014 for GA 3Q 2014 Available only with Oracle Big Data Appliance Good, based on Oracle Remote query from Oracle Exadata via external tables Oracle Big Data SQL New MPP query engine in BigInsights 3.0 Included with IBM InfoSphere Big Insights Very Good, SQL 2003/2011 MPP query engine based on DB2 runs native on Hadoop IBM Big SQL Approach SQL Support Packaging Comments Big SQL Advantage Cloudera (Impala) MPP query engine built by Cloudera runs native on Hadoop Poor, subset of Hive 0.9 Included with Cloudera Enterprise Available for MapR and Amazon EMR Perception leader SQL support Memory usage Concurrency Hortonworks (Hive / Stinger) Enhance Hive, replace MapReduce with Tez, runs native on Hadoop Better with Hive .13 but sub-query restrictions Hive .13 Included with Hortonworks; Hive .12 included in Cloudera, BigInsights, MapR Microsoft, Teradata, SAP, HP are partners SQL support Performance Pivotal HD (HAWQ) Port Greenplum database to Hadoop including storage layer Good, based on Greenplum Optional feature of Pivotal HD Funded by EMC and GE Capital Architecture Elastic Scalability Teradata (SQL-H) Remote query from Teradata by treating Hive tables as a view Good, based on Teradata Optional feature of Teradata 14.1, 15. Included with Teradata Distribution for Hadoop (TDH) QueryGrid coming in 3Q will provide filtering and push-down in Hadoop via Hive 13 Architecture, Performance Microsoft (Polybase) Remote query from PDW using external tables Good, based on Microsoft Available only with Microsoft PDW V2 Limited to Microsoft PDW customers Architecture, Performance Presto MPP query engine built by Facebook runs native on Hadoop Good Open source download New, unproven Commercial support 8 © 2014 IBM Corporation
  • 9. IBM Big SQL Advantages Native Hadoop architecture – Designed for Hadoop Ecosystem – Uses native Hadoop data types – Elastic scalability Leading performance – Better value for money ANSI compliant SQL – broad language support – Minimize re-coding retains tools investments Automatic memory management – Other vendors may require that queries be “hand-optimized” Security – row, column level, field masking Rich analytic and aggregation functions 9 © 2014 IBM Corporation
  • 10. 10 © 2014 IBM Corporation
  • 11. Cloudera Impala Overview 11 11 Interactive SQL for Hadoop Responses in seconds Nearly ANSI-92 standard SQL with Hive SQL Native MPP Query Engine Purpose-built for low-latency queries Separate runtime from MapReduce Designed as part of the Hadoop ecosystem Open Source Apache-licensed 11 © 2014 IBM Corporation
  • 12. Is Cloudera Impala Open Source? Yes – Impala code is available for download on Github, and available under Apache license – Impala is available for Amazon EMR and MapR Hadoop distributions No – Customers assume that the open source community contributes to Impala – But Impala is Not an Apache Software Foundation open source project – Cloudera is the only contributor (developer) to Impala code IBM has decades of SQL research and development being invested into IBM Big SQL – Impala has no advantage over IBM Big SQL in terms of code contribution or product development 12 © 2014 IBM Corporation
  • 13. IBM Big SQL Compared to Cloudera Impala Better SQL Support for end user tools and queries – SQL 2003/2011+ (Impala only supports a subset of SQL92) – Impala doesn’t support sub-queries in where-clause, nested-subqueries, windowed aggregates, common table expressions, rollup, cube, for example – Translates to better end-user tool usage and performance Guaranteed execution for complex queries for reliability and ease of use – Big SQL has Automatic Memory Management – Unlike Impala, Big SQL has no limitation that joined tables have to fit in aggregated memory of data nodes which can cause queries to run out of memory and fail Fine grained row and column access control for enhanced security – Easy development for multi-tenancy applications – Does not require the use of views – Impala requires use of views which increases complexity More Features – Federation – Stored procedures – More Scalar functions – More Aggregate functions 13 © 2014 IBM Corporation
  • 14. 14 © 2014 IBM Corporation
  • 15. Hortonworks Stinger Initiative Improved Apache Hive 15 © 2014 IBM Corporation
  • 16. IBM Big SQL Extends Value of Apache Hive Big SQL can immediately query data already stored in Hive – Big SQL uses Hive Metadata and extends value of Hive – Big Insights includes both Hive and Big SQL Better SQL Support for end user tools and queries – IBM Big SQL runs all 99 TPC-DS queries un-modified – Hive .12 runs only 43 TPC-DS queries un-modified (Hive .13 testing pending) – Hive still has many sub-query restrictions and doesn’t support non equi-joins, etc. – Translates to better end-user tool usage and performance Faster Performance – Up to 41x faster than Hive .12 on TPC-H like benchmark* – Hive .13 benchmarks pending Fine grained row and column access control for enhanced security – Easy development for multi-tenancy applications – Does not require the use of views More Features – Federation – Stored procedures – More Scalar functions – More Aggregate functions TPC-DS Benchmark and TPC-H are trademarks of the Transaction Processing Performance Council (TPC). 16 © 2014 IBM Corporation
  • 17. Comparing Big SQL and Hive 0.12 for Ad-Hoc Queries Big SQL is up to 41x faster than Hive 0.12 Big SQL is up to 41x faster than Hive 0.12 *Based on IBM internal tests comparing IBM Infosphere Biginsights 3.0 Big SQL with Hive 0.12 executing the 1TB Classic BI Workload in a controlled laboratory environment. The 1TB Classic BI Workload is a workload derived from the TPC-H Benchmark Standard, running at 1TB scale factor. It is materially equivalent with the exception that no update functions are performed. TPC Benchmark and TPC-H are trademarks of the Transaction Processing Performance Council (TPC). Configuration: Cluster of 9 System x3650HD servers, each with 64GB RAM and 9x2TB HDDs running Redhat Linux 6.3. Results may not be typical and will vary based on actual workload, configuration, applications, queries and other variables in a production environment. Results as of April 22, 2014 17 © 2014 IBM Corporation
  • 18. How many times Faster is Big SQL than Hive 0.12? Max Max Speedup of 74x Speedup of 74x Avg Avg Speedup of 20x Speedup of 20x Queries sorted by speed up ratio (worst to best) * Based on IBM internal tests comparing IBM Infosphere Biginsights 3.0 Big SQL with Hive 0.12 executing the 1TB Modern BI Workload in a controlled laboratory environment. The 1TB Modern BI Workload is a workload derived from the TPC-DS Benchmark Standard, running at 1TB scale factor. It is materially equivalent with the exception that no updates are performed, and only 43 out of 99 queries are executed. The test measured sequential query execution of all 43 queries for which Hive syntax was publically available. TPC Benchmark and TPC-DS are trademarks of the Transaction Processing Performance Council (TPC). Configuration: Cluster of 9 System x3650HD servers, each with 64GB RAM and 9x2TB HDDs running Redhat Linux 6.3. Results may not be typical and will vary based on actual workload, configuration, applications, queries and other variables in a production environment. Results as of April 22, 2014 18 © 2014 IBM Corporation
  • 19. 19 © 2014 IBM Corporation
  • 20. Oracle Big Data SQL Requires Exadata and Big Data Appliance Oracle Big Data SQL will allow an Oracle Exadata user to issue a remote SQL query to Oracle's Big Data Appliance and return the result Not Yet Available IBM Big SQL, part of BigInsights, gives customers the ability to issue SQL queries directly to Hadoop IBM Big SQL is available today Oracle Big Data SQL Oracle Exadata Oracle Big Data Appliance IBM Big SQL Hadoop Not Yet Available Available Today © 20 2014 IBM Corporation
  • 21. Oracle Big Data SQL – Exadata to Big Data Appliance Oracle Big Data SQL = Remote query from Exadata to Oracle Big Data Appliance Oracle Exadata (Oracle 12 c Database) Oracle Big Data Appliance (Cloudera Hadoop) SQL query issued Results returned
  • 22. Oracle Big Data SQL – Not What it Seems Oracle's solution requires Oracle 12c Database to submit the query to Hadoop – Oracle Big Data SQL requires the Oracle 12c database on Exadata as the front-end system to issue the query to Hadoop Oracle's solution requires both Oracle Exadata and the Oracle Big Data Appliance – Likely to be of interest only to customers willing to buy into the complete Oracle stack IBM Big SQL queries Hadoop directly – No need for a front-end relational database IBM Big SQL runs on any commodity x86 or Power Linux hardware – No need for proprietary, lock-in Big Data Appliance © 22 2014 IBM Corporation
  • 23. IBM Big SQL Compared to Oracle Big Data SQL IBM Big SQL offers faster performance – Hadoop data does not need to be transferred across a network to an Oracle database for additional processing (joins, aggregations, etc) IBM Big SQL offers “Direct Connect” – Front-end query tools such as Cognos can connect directly to Hadoop with IBM Big SQL, and more efficiently, by not having to connect first to the Oracle database IBM Big SQL offers true MPP on Hadoop – Big SQL deploys directly on the physical Hadoop cluster – Big SQL accesses Hadoop data natively for reading and writing IBM Big SQL offers query federation – Big SQL can federate queries across DB2, Netezza, Teradata, and even Oracle – Integrating existing relational database data with Hadoop IBM Big SQL is available today – Oracle Big Data SQL is not even available in Beta yet © 23 2014 IBM Corporation
  • 24. 24 © 2014 IBM Corporation
  • 25. Pivotal HD HAWQ Or is it The complexity of Hadoop married to the complexity of the Greenplum Database? 25 © 2014 IBM Corporation
  • 26. Pivotal HD – HAWQ is based on Greenplum Database HAWQ is based on the Greenplum database with modifications that enable data to be stored on HDFS – Or HDFS compatible file systems (such as EMC ISILON OneFS) The data is still stored in Greenplum relational tables, in a proprietary format (.GDB) on HDFS, separate from Hadoop data – Readable only through the Greenplum interface* HAWQ SQL access to Hadoop data (including HBase) is done via the Greenplum Database External Table feature – Part of what is now called PXF – Pivotal Extension Framework. HAWQ uses its own internal proprietary metadata – Does not use Apache Hadoop Hive Metadata Catalog (HCatalog) *Hawq recently added support for Parquet files 26 © 2014 IBM Corporation
  • 27. IBM Big SQL Advantages Compared to Pivotal HAWQ Architecture designed for Hadoop Ecosystem – Big SQL uses meta data and standard files on HDFS – HAWQ uses its own metadata and database tables Elastic Scalability – With Big SQL, nodes can be added or removed from cluster on-line – HAWQ requires complex, disruptive, off-line MPP database re-distribution Federation – DB2, Netezza, Oracle, Teradata Fine grained row and column access control – Easy development for multi-tenancy applications – Does not require the use of views Packaging – Big SQL included in BigInsights – HAWQ is extra cost license for Pivotal HD 27 © 2014 IBM Corporation
  • 28. 28 © 2014 IBM Corporation
  • 29. Teradata “Unified Data Architecture” Hadoop purpose is for data capture and transformation Aster Data used for discovery and exploratory analytics SQL-H is used to query Hadoop using SQL from either Aster Data or Teradata SQL-H issues a remote query to Hadoop and brings the data back to the RDBMS for processing 29 © 2014 IBM Corporation
  • 30. Teradata SQL-H – Remote Query to Hadoop 30 © 2014 IBM Corporation
  • 31. IBM Big SQL Compared to Teradata SQL-H Big SQL operates and processes the data directly on Hadoop – Does not need to move data to relational database for analytics Big SQL does not require separate licensing or complexity of an MPP relational database like Teradata – SQL-H requires deployment of the Aster or Teradata relational database – Big SQL provides direct SQL access to Hadoop as part of BigInsights Big SQL supports SQL access to HBase – SQL-H does not support HBase SQL-H depends on network latency/speed and Teradata system configuration for performance – Teradata system may already be overburdened – Teradata system likely not sized for additional processing of Hadoop data Aster Data and Teradata are complex – Relational MPP engine with partitioning, indexing, tuning requirements – Aster has operational limitations including having to rebuild indexes on load, vacuum operations to reclaim space, managing freespace, etc. 31 © 2014 IBM Corporation
  • 32. Summary 32 © 2014 IBM Corporation
  • 33. Summary SQL on Hadoop presents huge opportunity – Data Lake, Data Warehouse offload, Sandbox and exploration IBM Big SQL Architecture is designed for Hadoop Ecosystem – MPP query engine runs natively on Hadoop Big SQL provides many advantages over Cloudera Impala – SQL support, memory management, more features Big SQL also has advantages over Hive – SQL support, performance – But Big SQL works on top of Hive, extending value of Hive Big SQL designed for Hadoop, queries Hadoop data directly – Neither Oracle, nor Teradata, nor Microsoft have a similar solution – Many SQL Hadoop solutions submit remote queries to Hadoop – Other SQL solutions are ports of database, including storage layer 33 © 2014 IBM Corporation
  • 34. Backup: Vendor Landscape Cloudera has developed a MPP SQL engine for Hadoop called Impala, which is now also offered by MapR and Amazon EMR (Elastic Map Reduce) Hortonworks is enhancing the breadth of SQL coverage, and performance of Hive, under the project name Stinger Pivotal HAWQ is based on the Greenplum MPP database, and provides similar MPP database capability on Hadoop as part of the Pivotal HD (Hadoop) product Teradata offers SQL-H which enables a Teradata end-user to issue a remote query to Hadoop and bring the data back into Teradata for analysis or integration with Teradata data. Teradata Query Grid (3Q 2014) will enable SQL-H result sets to be filtered on Hadoop prior to being returned to Teradata. Oracle announced Oracle Big Data SQL which enables an Oracle Exadata end user to issue remote queries to the Oracle Big Data Appliance and bring back a filtered result set to Oracle database for analysis or integration with Oracle data. GA 3Q 2014. Microsoft’s Polybase enables their PDW appliance to issue remote SQL queries to Hortonworks Hadoop on Windows or Microsoft HDInsight Actian has consolidated ParAccel and Vectorwise technologies, ported to Hadoop and released it as Actian Analytics Platform Hadoop SQL Edition. HP Vertica has released Dragline which can run on a MapR cluster and share storage with Hadoop 34 © 2014 IBM Corporation
  • 35. Backup: Vendor Landcape Facebook has developed Presto, an open source SQL engine for Hadoop designed for interactive query analysis on large data sets Apache Drill is an open source project based on Google Dremel to provide a distributed SQL execution engine for interactive, low latency queries. Currently in Beta Spark SQL is a new SQL engine being developed from the ground up for Spark by Databricks, and replaces the previous Shark project. Currently in Alpha. Splice Machine claims to be a full ACID compliant RDBMS on Hadoop, by taking the Apache Derby Java relational database and removed its storage layer, replacing it with the Apache HBase NoSQL database. Then the company modified the planner, optimiser and executor inside Derby to take advantage of HBase's distributed architecture. Currently in Beta. Citus DB is built on PostgreSQL and runs on Hadoop nodes JethroData is an index-based SQL engine for Hadoop automatically indexes data as it is written into Hadoop InfiniDB has recently announced InfiniDB for Apache Hadoop, which is positioned for analytics and claims “if you know MySQL, you know InfiniDB” 35 © 2014 IBM Corporation