SlideShare a Scribd company logo
Trafodion 
Transactional SQL-on-HBase 
Trafodion and Hadoop / HBase 
www.trafodion.org 
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Trafodion innovation built upon Hadoop stack 
Leverages Hadoop and 
HBase for core modules 
• Maintains API compatibility 
• Inherited scalability and 
availability 
Differentiation 
• ANSI SQL via ODBC/JDBC 
• Relational schema abstraction 
• Distributed transaction protection 
• Mature SQL technology 
• Automatic parallelism 
Hadoop Trafodion 
Client Application using 
ODBC/JDBC on 
Windows/Linux 
Client Services for ODBC and JDBC 
HBase 
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 2 to change without notice. 
Hive 
HDFS 
Zookeeper 
SQL Compiler / Optimizer / Executor 
Distributed Transaction Manager 
+
HBase vs. Trafodion comparison 
HBase Trafodion + HBase 
Data abstraction Key and value pair Relational schema 
Physical Layout Column family store where 
row data is stored together by 
cells 
Same except there is a single column 
family with space-saving column 
encoding 
Column values Uninterpreted array of bytes Explicitly defined and enforced data 
types 
ACID Guarantee Single row atomicity Multi- SQL statements, tables, and 
rows defined as part of transaction 
Language API Get/put/delete SQL (Trafodion invokes native HBase 
API) 
Row Key Index Single (string) row key Composite (multi-column) row key 
Secondary Indexes Not supported Arbitrary secondary key columns 
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 3 to change without notice.
Salting of row keys 
How it works 
• HBase table gets created, pre-split with one 
region per salt value 
• A hash value column, “_SALT_”, is added as a 
prefix to the row key 
• Salting is transparent to SQL statements 
– Automatically computed during insert/update 
statements 
– Predicates automatically generated where feasible 
– Minimal overhead for direct lookup by key value 
Benefits 
• Even data distributions across HBase regions 
• Avoids region hotspots caused by insertion of 
data in row key order 
INSERT(s) SELECT(s) 
HBase 
Region 
PART 1 PART 2 PART 3 PART 4 
HDFS 
CREATE TABLE t(a integer not null primary key, b 
integer) SALT USING 4 PARTITIONS; 
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 4 to change without notice. 
HBase 
Region 
HDFS 
HBase 
Region 
HDFS 
HBase 
Region 
HDFS
Trafodion and Hadoop – Better Together! 
Leverages and extends Hadoop for transactional SQL workloads 
Complete: Full-function ANSI SQL 
Reuse existing SQL skills and improve developer productivity 
Protected: Distributed ACID transactions 
Guarantees data consistency across multiple rows, tables, SQL statements 
Efficient: Optimized for low-latency read and write 
transactions 
Supports real-time transaction processing applications 
Flexible: Schema flexibility and multi-structured data 
Seamlessly integrates structured, unstructured, and semi-structured data 
Interoperable: Standard ODBC/JDBC access 
Works with existing tools and applications 
Open: Hadoop and Linux distribution neutral 
Easy to add to your existing infrastructure and no vendor lock-in 
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 5 to change without notice. 
Reuse SQL 
skills 
Scale without 
complexity 
Complements 
Hadoop 
Reduce 
Costs 
Real-time 
Performanc 
e 
+
See for yourself… 
Come discover and develop on Trafodion 
www.trafodion.org 
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 6 to change without notice.

More Related Content

What's hot

Ingesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcIngesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache Orc
DataWorks Summit
 
Hortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks and HP Vertica Webinar
Hortonworks and HP Vertica Webinar
Hortonworks
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
DataWorks Summit/Hadoop Summit
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
DataWorks Summit
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
DataWorks Summit
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
Seungdon Choi
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integrated
Douglas Bernardini
 
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ SalesforceHBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
DataWorks Summit
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
DataWorks Summit
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
Cloudera, Inc.
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
DataWorks Summit
 
SAP HORTONWORKS
SAP HORTONWORKSSAP HORTONWORKS
SAP HORTONWORKS
Douglas Bernardini
 
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
Big Data Montreal
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
BigData Research
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
DataWorks Summit
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2
DataWorks Summit
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Hortonworks
 
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Esther Vasiete
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
DataWorks Summit
 

What's hot (20)

Ingesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcIngesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache Orc
 
Hortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks and HP Vertica Webinar
Hortonworks and HP Vertica Webinar
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integrated
 
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ SalesforceHBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
SAP HORTONWORKS
SAP HORTONWORKSSAP HORTONWORKS
SAP HORTONWORKS
 
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
 
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 

Similar to 2 - Trafodion and Hadoop HBase

Hbase mhug 2015
Hbase mhug 2015Hbase mhug 2015
Hbase mhug 2015
Joseph Niemiec
 
Hive and querying data
Hive and querying dataHive and querying data
Hive and querying data
KarthigaGunasekaran1
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
HBaseCon
 
Big data and tools
Big data and tools Big data and tools
Big data and tools
Shivam Shukla
 
Techincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseTechincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql database
Rishabh Dugar
 
Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
Shravan (Sean) Pabba
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
Thanh Nguyen
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
Thanh Nguyen
 
HBase introduction in azure
HBase introduction in azureHBase introduction in azure
HBase introduction in azure
Mostafa
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
Khalid Salama
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
Ravi Veeramachaneni
 
3 - Trafodion Technology Look
3 - Trafodion Technology Look3 - Trafodion Technology Look
3 - Trafodion Technology Look
Rohit Jain
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
Jonathan Bloom
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools
Xplenty
 
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon
 
Conhecendo o Apache HBase
Conhecendo o Apache HBaseConhecendo o Apache HBase
Conhecendo o Apache HBase
Felipe Ferreira
 
Chicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBaseChicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBase
Cloudera, Inc.
 

Similar to 2 - Trafodion and Hadoop HBase (20)

Hbase mhug 2015
Hbase mhug 2015Hbase mhug 2015
Hbase mhug 2015
 
H base
H baseH base
H base
 
Hive and querying data
Hive and querying dataHive and querying data
Hive and querying data
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
 
Big data and tools
Big data and tools Big data and tools
Big data and tools
 
Techincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseTechincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql database
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 
Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
HBase introduction in azure
HBase introduction in azureHBase introduction in azure
HBase introduction in azure
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
 
3 - Trafodion Technology Look
3 - Trafodion Technology Look3 - Trafodion Technology Look
3 - Trafodion Technology Look
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools
 
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
 
Conhecendo o Apache HBase
Conhecendo o Apache HBaseConhecendo o Apache HBase
Conhecendo o Apache HBase
 
Chicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBaseChicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBase
 

Recently uploaded

Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
Google
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 

Recently uploaded (20)

Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 

2 - Trafodion and Hadoop HBase

  • 1. Trafodion Transactional SQL-on-HBase Trafodion and Hadoop / HBase www.trafodion.org HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  • 2. Trafodion innovation built upon Hadoop stack Leverages Hadoop and HBase for core modules • Maintains API compatibility • Inherited scalability and availability Differentiation • ANSI SQL via ODBC/JDBC • Relational schema abstraction • Distributed transaction protection • Mature SQL technology • Automatic parallelism Hadoop Trafodion Client Application using ODBC/JDBC on Windows/Linux Client Services for ODBC and JDBC HBase HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 2 to change without notice. Hive HDFS Zookeeper SQL Compiler / Optimizer / Executor Distributed Transaction Manager +
  • 3. HBase vs. Trafodion comparison HBase Trafodion + HBase Data abstraction Key and value pair Relational schema Physical Layout Column family store where row data is stored together by cells Same except there is a single column family with space-saving column encoding Column values Uninterpreted array of bytes Explicitly defined and enforced data types ACID Guarantee Single row atomicity Multi- SQL statements, tables, and rows defined as part of transaction Language API Get/put/delete SQL (Trafodion invokes native HBase API) Row Key Index Single (string) row key Composite (multi-column) row key Secondary Indexes Not supported Arbitrary secondary key columns HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 3 to change without notice.
  • 4. Salting of row keys How it works • HBase table gets created, pre-split with one region per salt value • A hash value column, “_SALT_”, is added as a prefix to the row key • Salting is transparent to SQL statements – Automatically computed during insert/update statements – Predicates automatically generated where feasible – Minimal overhead for direct lookup by key value Benefits • Even data distributions across HBase regions • Avoids region hotspots caused by insertion of data in row key order INSERT(s) SELECT(s) HBase Region PART 1 PART 2 PART 3 PART 4 HDFS CREATE TABLE t(a integer not null primary key, b integer) SALT USING 4 PARTITIONS; HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 4 to change without notice. HBase Region HDFS HBase Region HDFS HBase Region HDFS
  • 5. Trafodion and Hadoop – Better Together! Leverages and extends Hadoop for transactional SQL workloads Complete: Full-function ANSI SQL Reuse existing SQL skills and improve developer productivity Protected: Distributed ACID transactions Guarantees data consistency across multiple rows, tables, SQL statements Efficient: Optimized for low-latency read and write transactions Supports real-time transaction processing applications Flexible: Schema flexibility and multi-structured data Seamlessly integrates structured, unstructured, and semi-structured data Interoperable: Standard ODBC/JDBC access Works with existing tools and applications Open: Hadoop and Linux distribution neutral Easy to add to your existing infrastructure and no vendor lock-in HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 5 to change without notice. Reuse SQL skills Scale without complexity Complements Hadoop Reduce Costs Real-time Performanc e +
  • 6. See for yourself… Come discover and develop on Trafodion www.trafodion.org HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject 6 to change without notice.

Editor's Notes

  1. Welcome to the Trafodion briefing series. Trafodion is a HP sponsored open-source project to deliver an enterprise-class Transactional SQL-on-HBase DBMS solution. The purpose of this segment is to discuss how Trafodion both leverages and extends elements of the Hadoop/HBase infrastructure to support operational workload requirements.
  2. Trafodion is designed to build upon and leverage Apache Hadoop and HBase core modules and thereby transactional/operational applications using Trafodion transparently gain Hadoop’ s advantages of affordable performance, scalability, elasticity, availability, etc. This graphic depicts a subset of the Hadoop software stack colored in dark grey that are specifically leveraged by Trafodion, namely HBase, HDFS, and Zookeeper. To this stack, Trafodion adds (items colored in blue) the Trafodion ODBC/JDBC drivers, the Trafodion database software, and the DTM for distributed transaction protection. Trafodion interfaces to Hadoop services using their standard APIs. By maintaining API compatibility, Trafodion becomes Hadoop distribution neutral thereby eliminating vendor lock-in by offering customers a choice of distributions to choose from. Trafodion is targeted to deliver differentiation on top of Hadoop in these key areas: A full-featured ANSI SQL implementation whose database services are accessible via a standard ODBC/JDBC connection Provides a SQL relational schema abstraction which makes Trafodion look and feel like any other relational database Distributed ACID transaction protection Mature SQL technology Parallel optimizations for transactional workloads
  3. Although Trafodion stores its database objects in HBase/HDFS storage structures, it differs and brings value-add over vanilla HBase in a multitude of ways as depicted on this slide. Trafodion provides a relational schema abstraction on top of HBase which allows customers to leverage known and well tested relational design methodologies and SQL programming skills. From a physical layout perspective, Trafodion uses standard HBase storage mechanisms (column family store using key-value pairs) to store and access objects. Trafodion currently stores all columns in a single column family to improve access efficiency and speed for operational data. Additionally Trafodion incorporates a column name encoding mechanism to save space on disk and to reduce messaging overhead for the purposes of improving SQL performance. Unlike vanilla HBase that treats stored data as an uninterpreted array of bytes, Trafodion defined columns are assigned specific data types that are enforced by Trafodion when inserting or updating its data contents. This not only greatly improves data quality/integrity, it also eliminates the need to develop application logic to parse and interpret the data contents. Vanilla HBase provides ACID transaction protection only at the row level. Trafodion extends ACID protection to application defined transactions that can span multiple SQL statements, multiple tables, and rows. This greatly improves database integrity by protecting the database against partially completed transactions i.e. ensuring that either the whole transaction is completely materialized in the database or none of it. HBase’s native API is at a very low level and is not a commonly used programming API. In contrast, Trafodion’ s API is ANSI SQL which is a familiar and well known programming interface and allows companies to leverage existing SQL knowledge and skills. Unlike HBase’s key structure that is comprised of a single uninterpreted array of bytes, Trafodion supports the common relational practice of allowing the primary key to be a composite key comprised of multiple columns. Finally unlike vanilla HBase, Trafodion supports the creation of secondary indexes that can be used to speed transaction performance when accessing row data by a column value that is not the row key. In summary, Trafodion incorporates a number of enhancements over vanilla HBase for the purposes of improving transaction performance, data integrity, and DBA/developer productivity (i.e. by reducing application complexity through the use of standard and well known relational practices and APIs).
  4. One known problematic area for HBase is supporting transactional workloads where data is inserted into a table in row key order. When this happens, all of the I/O gets concentrated to a single HBase region which in turn creates a server and disk hotspot and performance bottleneck. To alleviate this problem, Trafodion provides a innovative feature called “salting the row key”. To enable this feature the DBA specifies the number of partitions (i.e. regions) the table is to be split over when creating the table i.e. “SALT USING 4 PARTITIONS”. Trafodion creates the table pre-split with one region per salt value. An internal hash value column, “_SALT_”, is added as a prefix to the row key. Salting is handled automatically by Trafodion and is transparent to application written SQL statements. As data is inserted into the table, Trafodion computes the salt value and directs the insert to the appropriate region. Likewise, Trafodion calculates the salt value when data is retrieved from the table and automatically generates predicates where feasible. This is a very lightweight operation with little overhead or impact to direct key access operations. The benefits of salting are that you get more even data distributions across regions and improved performance via hotspot elimination. Next let’s look at other ways Trafodion brings innovation and value add to vanilla HBase.
  5. Trafodion delivers on the promise of a full featured and optimized transactional SQL-on-HBase DBMS solution with full transactional data protection. This combination of HBase and an enterprise-class transactional SQL engine overcomes Hadoop’s weaknesses in terms of supporting operational workloads. Customers gain the following recognized benefits: Complete: Is a comprehensive and full-functioned ANSI SQL DBMS which allows companies to leverage their in-house SQL learnings and expertise versus having to learn complex map/reduce programming. Protected: Extends Hadoop HBase by adding support for guaranteed transactional consistency across multiple SQL statements, tables, and rows. Efficient: Includes many optimizations for low-latency read and write transactions in support of the fast response time requirements of the operational SQL workloads Trafodion is targeting. Flexible: Trafodion hosted applications gain schema flexibility and seamless integration of data from Trafodion, native HBase, and Hive tables without expensive replication or data movement overhead. Interoperable: Provides investment protection via interoperability with your existing tools and applications using standard ODBC and JDBC access. Open: Is designed to seamlessly fit within customer’s existing IT infrastructure with no vendor lock-in by remaining neutral to the underlying Linux and Hadoop distribution. It complements existing Hadoop investments and benefits. HP open-source sponsorship and investment