1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
HiveWarehouseConnector (Update)
Eric Wohlstadter – Hortonworks R&D
June 2018
Overview
▪ HDP3 version of Spark-Hive Connector
▪ Features
  – Spark access to ACID tables
  – Other integrations
    • e.g. Spark access to Ranger tables
▪ API and Architecture
  – Reads from Hive to Spark
  – Writes from Spark to Hive
Features: Spark access to ACID tables
▪ Hive supports traditional ACID semantics
  – ORC with delta files to support low-latency writes
  – Compaction to prevent storage fragmentation
  – Custom readers to reconcile deltas on read
▪ ACID tables use extended Metastore format
▪ Spark doesn’t read/write ACID tables
▪ Spark doesn’t use ACID Metastore format
Support Spark reads/writes for ACID tables
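To make "reconcile deltas on read" and "compaction" concrete, here is a minimal sketch in plain Scala. It is a toy model, not Hive's reader: the names `AcidReadSketch`, `Delta`, `Upsert`, `Delete`, `reconcile`, and `compact` are all hypothetical, and real ACID tables store base and delta data as ORC files rather than in-memory maps.

```scala
// Toy model of merge-on-read for an ACID table (not Hive's actual reader).
// All names here are hypothetical and exist only for illustration.
object AcidReadSketch {
  sealed trait Change
  final case class Upsert(rowId: Long, value: String) extends Change
  final case class Delete(rowId: Long) extends Change

  // A delta holds the changes committed by one write, tagged with a write id.
  final case class Delta(writeId: Long, changes: Seq[Change])

  // Apply deltas to the base rows in write-id order; later writes win.
  def reconcile(base: Map[Long, String], deltas: Seq[Delta]): Map[Long, String] =
    deltas.sortBy(_.writeId).foldLeft(base) { (rows, delta) =>
      delta.changes.foldLeft(rows) {
        case (acc, Upsert(id, v)) => acc.updated(id, v)
        case (acc, Delete(id))    => acc - id
      }
    }

  // Compaction folds the deltas into a new base, leaving no deltas to merge on read.
  def compact(base: Map[Long, String], deltas: Seq[Delta]): (Map[Long, String], Seq[Delta]) =
    (reconcile(base, deltas), Seq.empty)
}
```

A reader that skips the delta files sees only the base rows, which gives a sense of why a client unaware of the ACID layout cannot simply read the table's files directly.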
[Diagram: Spark Driver and Executors, HiveServer+Tez, LLAP Daemons, MetaStore (Spark Meta, Hive Meta), and ACID Tables; two links are marked X.]
Spark can’t read/write ACID tables
Spark doesn’t use ACID Metastore format
[Diagram: Spark Driver and Executors, HiveServer+Tez, LLAP Daemons, and MetaStore (Spark Meta, Hive Meta); two links are labeled HWC.]
Isolate Spark and Hive Catalogs/Tables
Leverage connector for Spark <-> Hive
Features: Spark access to Ranger tables
– Column-level access control
– Column masking
  • “show only first four chars of string column”
– Row-level access control
  • “show only rows WHERE …”
Support Spark reads/writes for Ranger tables
[Image: Ranger UI]
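The three policy types above can be sketched as a small filter applied to rows before they reach the client. This is a toy model in plain Scala, not Ranger's API: `RangerPolicySketch`, `Policy`, `maskFirstFour`, and `applyPolicy` are hypothetical names, and the choice of `x` as the mask character is an assumption.

```scala
// Toy model of Ranger-style policies applied before rows reach Spark.
// Not Ranger's API: every name below is hypothetical.
object RangerPolicySketch {
  type Row = Map[String, String]

  final case class Policy(
    hiddenColumns: Set[String], // column-level access control
    maskedColumns: Set[String], // column masking
    rowFilter: Row => Boolean   // row-level access control ("WHERE ...")
  )

  // "Show only first four chars of string column": keep four characters, mask the rest.
  def maskFirstFour(value: String): String =
    value.take(4) + "x" * math.max(0, value.length - 4)

  // Filter rows first, then drop hidden columns and mask the remaining ones.
  def applyPolicy(rows: Seq[Row], policy: Policy): Seq[Row] =
    rows.filter(policy.rowFilter).map { row =>
      (row -- policy.hiddenColumns).map {
        case (col, v) if policy.maskedColumns(col) => col -> maskFirstFour(v)
        case other                                 => other
      }
    }
}
```

Because the policy is applied on the serving side in this sketch, the client never holds the unmasked values.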
Overview
▪ Latest version of Hive Connector library for Spark
▪ Features
  – Spark access to Ranger tables
  – Spark access to ACID tables
▪ API and Architecture
  – Reads from Hive to Spark
  – Writes from Spark to Hive
“JDBC-like” READ API
a) hive = HiveWarehouseBuilder.session(spark).build()
  • Create HiveWarehouseSession
b) hive.execute(sql: String): DataFrame
  • SHOW, DESCRIBE, etc…
c) hive.executeQuery(sql: String): DataFrame
  • SELECT, SELECT CTE, etc…
b) hive.execute("show databases").show()
[Diagram: Spark Driver and Executors, HiveServer+Tez, LLAP Daemons, and MetaStore (Spark Meta, Hive Meta); the link is labeled HWC (Thrift JDBC) and JDBC.]
• Driver submits catalog op to HiveServer
• HWC returns ResultSet as DataFrame
c) hive.executeQuery("SELECT * FROM t").sort("A").show()
[Diagram: Spark Driver and Executors, HiveServer+Tez, LLAP Daemons, MetaStore (Spark Meta, Hive Meta), and ACID Tables; link labeled HWC (JDBC); steps 1 to 3 marked.]
1. Driver submits query to HiveServer
2. HiveServer compiles the query and returns “splits” to the Driver
3. Execute query on LLAP
c) hive.executeQuery("SELECT * FROM t").sort("A").show()
[Diagram: Spark Driver and Executors, HiveServer+Tez, LLAP Daemons, MetaStore (Spark Meta, Hive Meta), and ACID Tables; link labeled HWC (Arrow); steps 4 to 6 marked.]
4. Executor Tasks run for each split
5. Tasks read Arrow data from LLAP
6. HWC returns ArrowColumnVectors to Spark
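The six read steps can be sketched as a split-based scan: the driver obtains splits, one task reads each split, and the batches are handed back together. The code below is a toy model in plain Scala, not the HWC implementation. `SplitReadSketch`, `Split`, `planSplits`, `readSplit`, and `executeQuery` are hypothetical stand-ins for HiveServer, LLAP, and the connector, and an in-memory `Vector[Int]` stands in for the table.

```scala
// Toy model of the read path on the slides (steps 1-6); not the HWC implementation.
// Split, planSplits and readSplit are hypothetical stand-ins for HiveServer and LLAP.
object SplitReadSketch {
  final case class Split(id: Int, from: Int, until: Int)

  // Steps 1-2: the "HiveServer" side compiles the query into splits for the driver.
  // splitSize is assumed to be positive.
  def planSplits(rowCount: Int, splitSize: Int): Seq[Split] =
    (0 until rowCount by splitSize).zipWithIndex.map { case (start, i) =>
      Split(i, start, math.min(start + splitSize, rowCount))
    }

  // Steps 3-5: one task per split reads its batch of rows from the "LLAP" side.
  def readSplit(table: Vector[Int], split: Split): Vector[Int] =
    table.slice(split.from, split.until)

  // Step 6: batches from all tasks are handed back to the caller as one dataset.
  def executeQuery(table: Vector[Int], splitSize: Int): Vector[Int] =
    planSplits(table.length, splitSize).flatMap(readSplit(table, _)).toVector
}
```

In this model the later `.sort("A")` from the slide would run on the returned data, on the Spark side, rather than being part of the scan.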
Other Recent READ improvements
▪ Leveraged Spark 2.3.1 support for Arrow
▪ Implemented SupportsScanColumnarBatch plugin
▪ Added Hive Arrow SerDe
▪ Added Arrow support to LlapOutputFormatService
Overview
▪ Latest version of Hive Connector library for Spark
▪ Features
  – Spark access to Ranger tables
  – Spark access to ACID tables
▪ API and Architecture
  – Reads from Hive to Spark
  – Writes from Spark to Hive
Connector WRITE API
a) hive.executeUpdate(sql: String): Boolean
  • CREATE, UPDATE, ALTER, INSERT, MERGE, DELETE, etc…
b) df.write.format(HIVE_WAREHOUSE_CONNECTOR)
  • Write DataFrame using LOAD DATA INTO TABLE
c) df.writeStream.format(STREAM_TO_STREAM)
  • Write Streaming DataFrame using Hive-Streaming
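The split between `execute`, `executeQuery`, and `executeUpdate` can be summarized as a lookup on the statement's leading keyword. The helper below is a hypothetical sketch in plain Scala and is not part of the connector. The keyword lists are taken only from the slide bullets ("etc…" means they are incomplete), and treating `WITH` as the start of a "SELECT CTE" is an assumption.

```scala
// Hypothetical helper: names the session method the slides associate with a statement.
// The keyword lists come from the slide bullets only and are not exhaustive.
object HwcApiSketch {
  sealed trait Api
  case object Execute extends Api       // hive.execute: SHOW, DESCRIBE, ...
  case object ExecuteQuery extends Api  // hive.executeQuery: SELECT, SELECT CTE, ...
  case object ExecuteUpdate extends Api // hive.executeUpdate: CREATE, UPDATE, ...

  // Returns None for statements the slides do not mention.
  def apiFor(sql: String): Option[Api] = {
    val keyword = sql.trim.takeWhile(!_.isWhitespace).toUpperCase
    keyword match {
      case "SHOW" | "DESCRIBE"                                          => Some(Execute)
      case "SELECT" | "WITH"                                            => Some(ExecuteQuery)
      case "CREATE" | "UPDATE" | "ALTER" | "INSERT" | "MERGE" | "DELETE" => Some(ExecuteUpdate)
      case _                                                            => None
    }
  }
}
```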
a) hive.executeUpdate("INSERT INTO s SELECT * FROM t")
[Diagram: Spark Driver and Executors, HiveServer2+Tez, LLAP Daemons, and MetaStore (Spark Meta, Hive Meta); link labeled HWC (Thrift JDBC); steps 1 to 3 marked.]
1. Driver submits update op to HiveServer2
2. Process update through Tez and/or LLAP
3. HWC returns true on success
Example: LOAD to Hive
// Keep two columns, filter the rows, then write the result to a Hive ACID table.
df.select("ws_sold_time_sk", "ws_ship_date_sk")
  .filter("ws_sold_time_sk > 80000")
  .write.format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "my_acid_table")
  .save()
b) df.write.format(HIVE_WAREHOUSE_CONNECTOR).save()
[Diagram: Spark Driver and Executors, HiveServer+Tez, LLAP Daemons, MetaStore (Spark Meta, Hive Meta), HDFS /tmp, and ACID Tables; steps 1 to 3 marked.]
1. Driver launches DataWriter tasks
2. Tasks write ORC files
3. On commit, Driver executes LOAD DATA INTO TABLE
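The batch write steps follow a stage-then-commit pattern: tasks write files to a staging location, and the driver publishes them with a single statement once all tasks are done. The sketch below models that in plain Scala. It is not the HWC implementation: `StagedWriteSketch`, `StagedFile`, `writeTask`, and `commit` are hypothetical names, and the staging path and `part-N.orc` file naming are made up for illustration. The generated statement uses HiveQL's `LOAD DATA INPATH ... INTO TABLE` form.

```scala
// Toy model of the batch write path (steps 1-3); not the HWC implementation.
// StagedFile, writeTask and commit are hypothetical names.
object StagedWriteSketch {
  final case class StagedFile(path: String, rows: Seq[String])

  // Step 2: each task writes its partition to its own file under a staging directory.
  def writeTask(stagingDir: String, taskId: Int, rows: Seq[String]): StagedFile =
    StagedFile(s"$stagingDir/part-$taskId.orc", rows)

  // Step 3: once every task has reported its file, the driver builds one LOAD DATA
  // statement for the whole staging directory. Nothing is loaded if no files exist.
  def commit(stagingDir: String, table: String, files: Seq[StagedFile]): Option[String] =
    if (files.isEmpty) None
    else Some(s"LOAD DATA INPATH '$stagingDir' INTO TABLE $table")
}
```

Publishing with one statement at commit time means a job that fails halfway leaves only staging files behind, not partial rows in the target table.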
Example: Stream to Hive
// Read a stream from a socket source (options elided on the slide).
val df = spark.readStream.format("socket")
  ...
  .load()
// Continuously write the stream into a Hive ACID table.
df.writeStream.format(STREAM_TO_STREAM)
  .option("table", "my_acid_table")
  .start()
c) df.writeStream.format(STREAM_TO_STREAM).start()
[Diagram: Spark Driver and Executors, HiveServer+Tez, MetaStore (Spark Meta, Hive Meta), and ACID Tables; steps 1 to 3 marked.]
1. Driver launches DataWriter tasks
2. Tasks open transactions (Txns)
3. Write rows to ACID tables in Tx
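The streaming steps differ from the batch path in that each task writes inside its own transaction instead of staging files. A minimal sketch of that idea in plain Scala follows. It is a toy model, not Hive-Streaming: `StreamingTxnSketch`, `Table`, `Txn`, `commit`, and `abort` are hypothetical names.

```scala
// Toy model of transactional streaming writes (steps 2-3); not Hive-Streaming itself.
// Table, Txn, commit and abort are hypothetical names.
object StreamingTxnSketch {
  // Only committed rows are visible to readers.
  final case class Table(committed: Vector[String] = Vector.empty)

  // Step 2: a task opens a transaction and buffers the rows it writes.
  final case class Txn(id: Long, pending: Vector[String] = Vector.empty) {
    def write(row: String): Txn = copy(pending = pending :+ row)
  }

  // Step 3: committing makes all rows of the transaction visible at once.
  def commit(table: Table, txn: Txn): Table =
    Table(table.committed ++ txn.pending)

  // Aborting discards the pending rows and leaves the table unchanged.
  def abort(table: Table, txn: Txn): Table = table
}
```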
Other Recent Write Improvements
▪ Implemented WriteSupport and StreamWriteSupport Spark plugins
▪ Improved Hive LOAD DATA INTO TABLE
  – e.g. Support for bucketing and dynamic partitioning
▪ Improved HiveStreaming
  – e.g. Support for dynamic partitioning
Compatibility Matrix
Connector Branch       Spark   Hive    HDP
master (Summer 2018)   2.3.1   3.1.0   3.0.0 (GA)
branch-2.3             2.3.0   2.1.0   2.6.5 (TP)
branch-2.2             2.2.0   2.1.0   2.6.3~4 (TP)
branch-2.1             2.1.1   2.1.0   2.6.0~2 (TP)
branch-1.6             1.6.3   2.1.0   2.5.x (TP)
https://github.com/hortonworks-spark/spark-llap
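For picking a connector branch programmatically, the matrix above can be encoded as data. The sketch below is plain Scala with the values copied from the slide. `CompatibilitySketch`, `Release`, and `branchForSpark` are hypothetical names, and the exact-match lookup is a simplifying assumption (it does not reason about patch releases).

```scala
// Lookup over the compatibility matrix; the values are copied from the slide.
// Release and branchForSpark are hypothetical names used for illustration.
object CompatibilitySketch {
  final case class Release(branch: String, spark: String, hive: String, hdp: String)

  val matrix: Seq[Release] = Seq(
    Release("master",     "2.3.1", "3.1.0", "3.0.0 (GA)"),
    Release("branch-2.3", "2.3.0", "2.1.0", "2.6.5 (TP)"),
    Release("branch-2.2", "2.2.0", "2.1.0", "2.6.3~4 (TP)"),
    Release("branch-2.1", "2.1.1", "2.1.0", "2.6.0~2 (TP)"),
    Release("branch-1.6", "1.6.3", "2.1.0", "2.5.x (TP)")
  )

  // Returns the connector branch listed for an exact Spark version, if any.
  def branchForSpark(sparkVersion: String): Option[String] =
    matrix.find(_.spark == sparkVersion).map(_.branch)
}
```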
Acknowledgements:
Teddy Choi, Jason Dere, Gunther Hagleitner,
Dongjoon Hyun, Prasanth Jayachandran, Hyukjin Kwon,
Bikas Saha, Jerry Zhao

More Related Content

What's hot

Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016Josh Elser
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureDataWorks Summit
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3Dongjoon Hyun
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceDataWorks Summit/Hadoop Summit
 
Apache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 OverviewApache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 OverviewJosh Elser
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Next Generation Execution for Apache Storm
Next Generation Execution for Apache StormNext Generation Execution for Apache Storm
Next Generation Execution for Apache StormDataWorks Summit
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
Apache Phoenix Query Server
Apache Phoenix Query ServerApache Phoenix Query Server
Apache Phoenix Query ServerJosh Elser
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopHortonworks
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache HiveDataWorks Summit
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_featuresAlberto Romero
 
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerArtem Ervits
 
Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016alanfgates
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?Hortonworks
 

What's hot (20)

Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
Apache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 OverviewApache Accumulo 1.8.0 Overview
Apache Accumulo 1.8.0 Overview
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Next Generation Execution for Apache Storm
Next Generation Execution for Apache StormNext Generation Execution for Apache Storm
Next Generation Execution for Apache Storm
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Apache Phoenix Query Server
Apache Phoenix Query ServerApache Phoenix Query Server
Apache Phoenix Query Server
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache Hadoop
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache Hive
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
 
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
 
Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?
 

Similar to HiveWarehouseConnector

Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?DataWorks Summit
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019alanfgates
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
Intro to Spark with Zeppelin
Intro to Spark with ZeppelinIntro to Spark with Zeppelin
Intro to Spark with ZeppelinHortonworks
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupSteve Loughran
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleHortonworks
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoopGergely Devenyi
 
Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018alanfgates
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsDataWorks Summit
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseSankar H
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingAll Things Open
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinAlex Zeltov
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 

Similar to HiveWarehouseConnector (20)

Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Intro to Spark with Zeppelin
Intro to Spark with ZeppelinIntro to Spark with Zeppelin
Intro to Spark with Zeppelin
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, Scale
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoop
 
Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster Computing
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 

Recently uploaded

How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 

Recently uploaded (20)

How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

HiveWarehouseConnector

  • 1. 1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved HiveWarehouse Connector (Update) Eric Wohlstadter – Hortonworks R&D June 2018
  • 2. 2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Overview  HDP3 version of Spark-Hive Connector  Features – Spark access to ACID tables – Other integrations • e.g. Spark access to Ranger tables  API and Architecture – Reads from Hive to Spark – Writes from Spark to Hive
  • 3. 3 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Features: Spark access to ACID tables  Hive supports traditional ACID semantics – ORC with delta files to support low-latency writes – Compaction to prevent storage fragmentation – Custom readers to reconcile deltas on read  ACID tables use extended Metastore format  Spark doesn’t read/write ACID tables  Spark doesn’t use ACID Metastore format Support Spark reads/writes for ACID tables
  • 4. 4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Driver MetaStore HiveServer+Tez LLAP DaemonsExecutors Spark Meta Hive Meta Executors LLAP Daemons ACID TablesX X Spark can’t read/write ACID tables Spark doesn’t use ACID Metastore format
  • 5. 5 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Driver MetaStore HiveServer+Tez LLAP DaemonsExecutors Spark Meta Hive Meta Executors LLAP Daemons Isolate Spark and Hive Catalogs/Tables Leverage connector for Spark <-> Hive HWC HWC
  • 6. Features: Spark access to Ranger tables
      – Column-level access control
      – Column masking
          • “show only first four chars of string column”
      – Row-level access control
          • “show only rows WHERE …”
      – Support Spark reads/writes for Ranger tables
      [Screenshot: Ranger UI]
  • 7. Overview
      – Latest version of Hive Connector library for Spark
      – Features
          • Spark access to Ranger tables
          • Spark access to ACID tables
      – API and Architecture
          • Reads from Hive to Spark
          • Writes from Spark to Hive
  • 8. “JDBC-like” READ API
      a) hive = HiveWarehouseBuilder.session(spark).build()
          • Create HiveWarehouseSession
      b) hive.execute(sql: String): DataFrame
          • SHOW, DESCRIBE, etc.
      c) hive.executeQuery(sql: String): DataFrame
          • SELECT, SELECT CTE, etc.
  • 9. b) hive.execute("show databases").show()
      [Architecture diagram. Components: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, HWC (Thrift JDBC); label: JDBC]
      – Driver submits catalog op to HiveServer
      – HWC returns ResultSet as DataFrame
  • 10. c) hive.executeQuery("SELECT * FROM t").sort("A").show()
      [Architecture diagram. Components: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, HWC (JDBC), ACID Tables]
      1. Driver submits query to HiveServer
      2. Compile query and return “splits” to Driver
      3. Execute query on LLAP
  • 11. c) hive.executeQuery("SELECT * FROM t").sort("A").show()
      [Architecture diagram. Components: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, HWC (Arrow), ACID Tables]
      4. Executor Tasks run for each split
      5. Tasks read Arrow data from LLAP
      6. HWC returns ArrowColumnVectors to Spark
  • 12. Other Recent READ improvements
      – Leverage Spark 2.3.1 support for Arrow
      – Implemented SupportsColumnBatchScan plugin
      – Add Hive Arrow SerDe
      – Add Arrow support to LlapOutputFormatService
  • 13. Overview
      – Latest version of Hive Connector library for Spark
      – Features
          • Spark access to Ranger tables
          • Spark access to ACID tables
      – API and Architecture
          • Reads from Hive to Spark
          • Writes from Spark to Hive
  • 14. Connector WRITE API
      a) hive.executeUpdate(sql: String): Boolean
          • Create, Update, Alter, Insert, Merge, Delete, etc.
      b) df.write.format(HIVE_WAREHOUSE_CONNECTOR)
          • Write DataFrame using LOAD DATA INTO TABLE
      c) df.writeStream.format(STREAM_TO_STREAM)
          • Write Streaming DataFrame using Hive-Streaming
  • 15. a) hive.executeUpdate("INSERT INTO s SELECT * FROM t")
      [Architecture diagram. Components: Driver, Executors, Spark Meta, MetaStore, HiveServer2+Tez, LLAP Daemons, Hive Meta, HWC (Thrift JDBC)]
      1. Driver submits update op to HiveServer2
      2. Process update through Tez and/or LLAP
      3. HWC returns true on success
  • 16. Example: LOAD to Hive
      df.select("ws_sold_time_sk", "ws_ship_date_sk")
        .filter("ws_sold_time_sk > 80000")
        .write.format(HIVE_WAREHOUSE_CONNECTOR)
        .option("table", "my_acid_table")
        .save()
  • 17. b) df.write.format(HIVE_WAREHOUSE_CONNECTOR).save()
      [Architecture diagram. Components: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, HDFS /tmp, ACID Tables]
      1. Driver launches DataWriter tasks
      2. Tasks write ORC files
      3. On commit, Driver executes LOAD DATA INTO TABLE
  • 18. Example: Stream to Hive
      val df = spark.readStream.format("socket") ... .load()
      df.writeStream.format(STREAM_TO_STREAM)
        .option("table", "my_acid_table")
        .start()
  • 19. c) df.writeStream.format(STREAM_TO_STREAM).start()
      [Architecture diagram. Components: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, Hive Meta, ACID Tables]
      1. Driver launches DataWriter tasks
      2. Tasks open Txns
      3. Write rows to ACID tables in Tx
  • 20. Other Recent Write Improvements
      – Implemented WriteSupport and StreamWriteSupport Spark plugins
      – Improved Hive LOAD DATA INTO TABLE
          • e.g. Support for bucketing and dynamic partitioning
      – Improved HiveStreaming
          • e.g. Support for dynamic partitioning
  • 21. Compatibility Matrix
      Connector Branch        Spark    Hive     HDP
      master (Summer 2018)    2.3.1    3.1.0    3.0.0 (GA)
      branch-2.3              2.3.0    2.1.0    2.6.5 (TP)
      branch-2.2              2.2.0    2.1.0    2.6.3~4 (TP)
      branch-2.1              2.1.1    2.1.0    2.6.0~2 (TP)
      branch-1.6              1.6.3    2.1.0    2.5.x (TP)
      https://github.com/hortonworks-spark/spark-llap
  • 22. Acknowledgements: Teddy Choi, Jason Dere, Gunther Hagleitner, Dongjoon Hyun, Prasanth Jayachandran, Hyukjin Kwon, Bikas Saha, Jerry Zhao
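
The compatibility matrix on slide 21 lends itself to a small lookup table. The sketch below is plain Scala with no Spark dependency; it encodes the five rows as listed on the slide and finds the connector branch for a given Spark version. The names `HwcCompatibility`, `Entry`, and `branchForSpark` are hypothetical, and the exact-match lookup is an assumption, since the slide does not say how versions between the listed ones are handled.

```scala
// Illustrative sketch only: data copied from slide 21, all names hypothetical.
object HwcCompatibility {
  /** One row of the compatibility matrix. */
  final case class Entry(branch: String, spark: String, hive: String, hdp: String)

  val matrix: Seq[Entry] = Seq(
    Entry("master",     "2.3.1", "3.1.0", "3.0.0 (GA)"),
    Entry("branch-2.3", "2.3.0", "2.1.0", "2.6.5 (TP)"),
    Entry("branch-2.2", "2.2.0", "2.1.0", "2.6.3~4 (TP)"),
    Entry("branch-2.1", "2.1.1", "2.1.0", "2.6.0~2 (TP)"),
    Entry("branch-1.6", "1.6.3", "2.1.0", "2.5.x (TP)")
  )

  /** Exact-match lookup by Spark version (assumption: no range matching). */
  def branchForSpark(sparkVersion: String): Option[Entry] =
    matrix.find(_.spark == sparkVersion)
}
```

For example, `HwcCompatibility.branchForSpark("2.3.1").map(_.branch)` yields `Some("master")`, while a version that is not on the slide, such as `"2.4.0"`, yields `None`.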

Editor's Notes

  1. ACID table details: Spark doesn’t support ACID tables.
  2. Isolate catalogs; interoperate through the Connector.
  3. Other interoperability: access to Hive tables mediated by Ranger.
  4. Changes needed to read from Hive: a JDBC-like API.
  5. Bridge Hive catalog operations.
  6. Execute the query in Hive; Spark doesn’t directly access ACID tables.
  7. HWC returns DataFrames that can be transformed by the DataFrame API.
  8. Support for writing SparkSQL Streams to ACID tables.
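
Slides 8 and 14 split SQL statements across three session calls: `execute` for catalog operations, `executeQuery` for SELECTs, and `executeUpdate` for DDL and DML. The plain-Scala sketch below shows that split as a lookup on the statement's first keyword. The call names and keyword lists come from the slides; the object `HwcRouting`, its `route` function, and the first-keyword rule are hypothetical illustrations, not connector behaviour.

```scala
import java.util.Locale

// Illustrative sketch only: routing by first keyword is an assumption.
object HwcRouting {
  sealed trait HwcCall
  case object Execute extends HwcCall        // hive.execute: SHOW, DESCRIBE, ...
  case object ExecuteQuery extends HwcCall   // hive.executeQuery: SELECT, SELECT CTE
  case object ExecuteUpdate extends HwcCall  // hive.executeUpdate: Create, Update, ...

  private val catalogOps = Set("SHOW", "DESCRIBE")
  private val queries    = Set("SELECT", "WITH")
  private val updates    = Set("CREATE", "UPDATE", "ALTER", "INSERT", "MERGE", "DELETE")

  /** Picks a call from the statement's first keyword; None if unrecognised. */
  def route(sql: String): Option[HwcCall] = {
    val keyword = sql.trim.split("\\s+").headOption.getOrElse("").toUpperCase(Locale.ROOT)
    if (catalogOps(keyword)) Some(Execute)
    else if (queries(keyword)) Some(ExecuteQuery)
    else if (updates(keyword)) Some(ExecuteUpdate)
    else None
  }
}
```

With this sketch, `HwcRouting.route("show databases")` gives `Some(Execute)` and `HwcRouting.route("INSERT INTO s SELECT * FROM t")` gives `Some(ExecuteUpdate)`. The rule is deliberately naive: a statement that opens with `WITH` but ends in an insert would be misrouted, so a real caller would pick the method explicitly.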