HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase

Trafodion, open sourced by HP, reflects 20+ years of investment in a full-fledged RDBMS built on Tandem's OLTP heritage and geared towards a wide set of mixed query workloads. In this talk, we will discuss how HP integrated Trafodion with HBase to take full advantage of the Trafodion database engine and the HBase storage engine, covering 3-tier architecture, storage, salting/partitioning, data movement, and more.

Published in: Software

  1. Trafodion: Integrating Operational SQL into Hadoop. HBaseCon 2015, San Francisco, May 7th. HP © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  2. The most mature SQL open source RDBMS on Hadoop
     20+ years in Tandem / NonStop OLTP + Neoview EDW capabilities on MPP architecture
     Operational Heritage: • Sub-second response times • High concurrency • Full ACID distributed transaction management • Mission critical availability • Unparalleled scale before NoSQL • ANSI SQL support • UDFs
     BI Heritage: • Parallel everything • Sophisticated optimizer • Enterprise level manageability • Multi-temperate data • Materialized Views & query rewrite • OLAP & extensive function support
     Open sourced on HBase: • Transaction mgmt for Traf and HBase tables • Data type and check enforcement • Schema flexibility • Optional row formats • Integration of struct, semi-struct, & unstruct data • Operational, historical, analytical deployments on single platform
  3. Layered Architecture
     [Diagram labels, top to bottom]
     Client: User and ISV Operational Applications; JDBC / ODBC Driver
     Database Connectivity
     SQL: Master, ESP, CMP, DTM, WMS, UDF; Compiler and Optimizer, Workload Management, SQL Parallelism, Distributed Transaction Management, External Communication
     Storage Engines: HBase, HDFS
     Data: Trafodion Tables (Relational Schema); Native HBase Tables (KVS, Columnar); Native Hive Tables; Multi-Structured Data Store Integration
  4. Process architecture
     [Diagram, repeated per node, with links labelled TCP/IP]
     Trafodion Node (DCS, EXE, ESP, CMP, DTM, UDF, WMS)
     HBase APIs to the HBase Region Server; Hive/HDFS APIs
     Hadoop Data Node: Trafodion Metadata, Trafodion Data, Hive Data, HDFS Data
  5. Optimized execution plans based on statistics
     • Rule-driven and cost-based optimizer • Based on Cascades & Large Scope Rules • Parallel and non-parallel plans • Equal-height histogram stats • Join and aggregation variants • Subquery un-nesting • Optimized inner, left, right, outer joins
  6. Efficient data flow SQL execution
     [Diagram: Scan, Scan, Join, Group By]
     Operators: • Nested, nested cache, merge, hybrid hash joins • Eager & full aggregations incl. hash GROUP BYs • Unions, sorts • I/O operations (scan, update, delete, insert)
     In-memory, data flow architecture: • Continuous data flow through in-memory queues • Overflow to disk for hash and sort operations
     Also: Reduced data movement; Scheduler driven; Multi-threaded executor; Adaptive Segmentation; Skew Buster
  7. Degree of parallelism optimization
     DOP features: • Varying degrees of parallelism • Salting of rows for even data distribution
     Expression evaluation: • Evaluated close to data • Fastpaths, prefetch, pcode, LLVM
     Scalability: • Parallel execution • Scales out with Hadoop
     Operator parallelism; Partitioned parallelism; Pipeline parallelism
     [Diagram: Master, Join, Scan, Group by, Scan]
     • Support for co-located joins • Repartitioning when necessary • Inner child and outer child broadcasts
  8. Varying operational workloads
     [Diagram: Client Application; Node 1, Node 2, … Node n connected by Ethernet; each node runs a Master and ESPs (multi-fragment) over HBase, with filters and coprocessors, on HDFS]
     Access optimizations: • Random (keyed), Multi-dimensional (MDAM)
     Secondary index access
     Row format optimizations: • HBase (col per cell), aligned (row per cell)
     Reusable ESPs for parallelism
     Cached SQL plans
     Pushdown (filters + coprocessors)
     Service persistence (via ZooKeeper)
     Automatic query resubmission
  9. Trafodion performance objective: YCSB operation speeds that approach HBase (within 20%)
     Meets current objective! With max variance at 10.8%
     [Chart: YCSB Singleton 50/50 (Workload A). x-axis: Concurrency (Streams), 0 to 1,024; y-axis: Throughput (OPS); series: Traf 1.1, HBase]
  10. Trafodion performance objective: YCSB and Order Entry scale linearly!
      Meets objective!
      [Charts: Transactional Order Entry throughput; YCSB throughput for Selects, Updates, and 50/50]
  11. Trafodion Distributed Transaction Management
      (1) Multiple row inserts, updates, and deletes to a table
      (2) Multiple table and SQL insert, update, and delete statements
      (3) Distributed transaction with inserts, updates, and deletes across multiple HBase regions (2-phase commit)
      (4) Read-only transaction (eliminates commit overhead)
      [Diagram: Trafodion above Table A, Table B, Table C and Region A, Region B, Region C, Region D, annotated with cases 1 to 4]
  12. Distributed Transaction Management: Scalable Architecture, implemented using HBase coprocessors
      [Process diagram, repeated for Node 1, Node 2, … Node n]
      SQL Processes, each with a Transaction Manager Library and a Resource Manager Library
      Transaction Manager; TLOG
      HBase Region Server (HBase trx Region Server) hosting HBase Regions (Trx Region) with an Endpoint coproc
  13. Trafodion transaction performance objective: Minimum distributed transaction management overhead (within 20%)
      Order Entry: multi-statement transactional workload • 5 transaction types (New Orders, Payments, Order Status, Deliver, and Stock Level checks) • On average has about 20 statements per transaction
      Meets current objective! With max variance at 11.3%
      [Chart: OrderEntry. x-axis: Concurrency (Streams), 0 to 1,024; y-axis: Throughput (TPM); series: Traf 1.1, Autocommit]
  14. Trafodion manageability overview
      • Performant capture and publishing of query statistics – Threshold driven – Aggregation • Events logged using log4cpp/log4j • Client access via ODBC/JDBC, REST API, or HPdsm
      [Diagram: Trafodion Instance; Publications from Trafodion Subsystems (Query Statistics, Events); Repository (Session, Query, AGGR Query); Log Files (log4cpp/log4j); Database Administrator access via ODBC/JDBC and REST API]
  15. High availability and data integrity: Features & Testing
      Hadoop, HDFS, HBase: • Name Node Redundancy • HBase Replication (asynchronous) • HDFS Replication (data block copies) • HBase Snapshot • ZooKeeper
      Trafodion: • Persistent connectivity services • Automatic Query Retry • Efficient fully distributed transaction recovery • Backup and Restore utilities • Extensive HBase / Trafodion HA testing
  16. Demo Screenshot: Operational Reporting Queries
      Query: List of all products, some product info, current specials, a summary of their ratings and reviews
      Callouts: Nested Join for keyed lookup into Trafodion; Parallel scan of larger Trafodion tables; Cache of previous lookups into Trafodion
  17. Demo Screenshot: Interoperability (Trafodion & Hive/HDFS)
      Load data from Trafodion tables to Hive table with insert-select statement
      Source data is detailed order information obtained by joining multiple Trafodion tables
      Callouts: Parallel Join, with Trafodion tables acting as source; Parallel insert into Hive, with the Hive table as the target
  18. Demo Screenshot: UDFs: User Defined Functions
  19. Demo Screenshot: Query Monitoring
  20. See for yourself… Come discover and develop on Trafodion: www.trafodion.org
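
Slide 5 lists "Equal-height histogram stats" as an optimizer input. The slides do not show how Trafodion builds or uses these histograms, so the following is only a generic sketch of the idea: bucket boundaries are chosen so that each bucket covers roughly the same number of rows, and a range predicate is estimated by counting the buckets it fully covers. The function names and the sample column are hypothetical.

```python
from bisect import bisect_right


def equal_height_boundaries(values, num_buckets):
    """Return each bucket's upper boundary so buckets hold about the same row count."""
    if num_buckets <= 0 or not values:
        raise ValueError("need at least one value and one bucket")
    ordered = sorted(values)
    n = len(ordered)
    # The b-th boundary is the value at the end of the b-th equal-sized slice.
    return [ordered[max((b * n) // num_buckets - 1, 0)]
            for b in range(1, num_buckets + 1)]


def estimate_fraction_le(boundaries, x):
    """Estimate the fraction of rows with value <= x from fully covered buckets.

    Partially covered buckets are ignored, so this is a lower-bound estimate.
    """
    return bisect_right(boundaries, x) / len(boundaries)


if __name__ == "__main__":
    column = [1, 1, 1, 2, 3, 5, 8, 100]       # hypothetical, skewed column
    bounds = equal_height_boundaries(column, 4)
    print(bounds)                             # [1, 2, 5, 100]
    print(estimate_fraction_le(bounds, 5))    # 0.75
```

Because boundaries follow the data rather than a fixed value width, the outlier 100 takes up one bucket edge instead of stretching every bucket, which is the usual argument for equal-height over equal-width histograms on skewed columns.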
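
Slide 7 mentions "Salting of rows for even data distribution". As a rough illustration of the general technique, and not of Trafodion's internal implementation, the sketch below hashes a row key into a small number of buckets and prefixes the key with that bucket number, so that sequential keys spread across key ranges instead of piling onto one region. The partition count, the one-byte prefix layout, and the use of CRC32 as the hash are all assumptions made for this example.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical; the slides do not give a partition count


def salt_bucket(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a row key to a salt bucket with a stable hash (CRC32 is a stand-in)."""
    return zlib.crc32(key) % num_partitions


def salted_row_key(key: bytes, num_partitions: int = NUM_PARTITIONS) -> bytes:
    """Prefix the key with a one-byte salt so adjacent keys land in different ranges."""
    return bytes([salt_bucket(key, num_partitions)]) + key


def bucket_counts(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many keys fall into each salt bucket."""
    return Counter(salt_bucket(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Monotonically increasing keys are the classic hot-spot case for HBase.
    keys = [f"order-{i:08d}".encode() for i in range(10_000)]
    print(sorted(bucket_counts(keys).items()))
    print(salted_row_key(keys[0]))
```

The trade-off is that a range scan over the original key order now has to visit every salt bucket, which fits the slide's pairing of salting with parallel execution.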

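Slide 11 describes transactions that span multiple HBase regions using 2-phase commit, and notes that read-only transactions skip the commit overhead. The sketch below shows the textbook protocol only: a coordinator asks every participant to prepare, then commits everywhere if all voted yes or aborts otherwise. The `Region` class and `run_transaction` function are hypothetical stand-ins; Trafodion's actual transaction manager, TLOG, coprocessor endpoints, and recovery logic are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy participant: stages writes on prepare, applies them on commit."""
    name: str
    fail_prepare: bool = False            # simulate a participant voting "no"
    data: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, writes: dict) -> bool:
        if self.fail_prepare:
            return False
        self.pending = dict(writes)
        return True

    def commit(self) -> None:
        self.data.update(self.pending)
        self.pending = {}

    def abort(self) -> None:
        self.pending = {}


def run_transaction(writes_by_region) -> bool:
    """Coordinate a list of (Region, writes) pairs; return True if committed."""
    touched = [(region, writes) for region, writes in writes_by_region if writes]
    if not touched:
        return True  # read-only: no prepare or commit round is needed

    # Phase 1: every participant must vote yes.
    prepared = []
    for region, writes in touched:
        if region.prepare(writes):
            prepared.append(region)
        else:
            for done in prepared:
                done.abort()
            return False

    # Phase 2: all voted yes, so apply the staged writes everywhere.
    for region in prepared:
        region.commit()
    return True


if __name__ == "__main__":
    a, b = Region("A"), Region("B")
    print(run_transaction([(a, {"k1": "v1"}), (b, {"k2": "v2"})]))
    print(a.data, b.data)
```

A real coordinator would also log its decision durably before phase 2 so that it can finish the transaction after a crash; that step is omitted here.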