HiveWarehouseConnector
Eric Wohlstadter
Hive/Spark interoperability for Hive 3+ and Spark 2.3.1+
1.
HiveWarehouse Connector (Update)
Eric Wohlstadter – Hortonworks R&D
June 2018
© Hortonworks Inc. 2011 – 2017. All Rights Reserved
2.
Overview
HDP3 version of Spark-Hive Connector
Features
– Spark access to ACID tables
– Other integrations
  • e.g. Spark access to Ranger tables
API and Architecture
– Reads from Hive to Spark
– Writes from Spark to Hive
3.
Features: Spark access to ACID tables
Hive supports traditional ACID semantics
– ORC with delta files to support low-latency writes
– Compaction to prevent storage fragmentation
– Custom readers to reconcile deltas on read
ACID tables use extended Metastore format
Spark doesn’t read/write ACID tables
Spark doesn’t use ACID Metastore format
Support Spark reads/writes for ACID tables
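The "reconcile deltas on read" idea can be illustrated with a small plain-Scala model. This is only a sketch under simplifying assumptions, not Hive's ORC ACID reader: the names `Row`, `Change`, `Delta`, `reconcile` and `compact` are hypothetical, and real delta files carry transaction and bucket metadata that is left out here. An update can be modeled as a `Delete` followed by an `Insert`.

```scala
// Hypothetical, simplified model of reading an ACID table:
// a compacted base plus delta files, merged at read time.
object AcidReadSketch {
  final case class Row(id: Long, value: String)

  sealed trait Change
  final case class Insert(row: Row) extends Change
  final case class Delete(id: Long) extends Change

  // One delta holds the changes written by one transaction.
  final case class Delta(txnId: Long, changes: Seq[Change])

  // Apply deltas in transaction order on top of the base rows.
  def reconcile(base: Seq[Row], deltas: Seq[Delta]): Seq[Row] = {
    val start: Map[Long, Row] = base.map(r => r.id -> r).toMap
    val merged = deltas.sortBy(_.txnId).foldLeft(start) { (rows, delta) =>
      delta.changes.foldLeft(rows) {
        case (acc, Insert(row)) => acc + (row.id -> row)
        case (acc, Delete(id))  => acc - id
      }
    }
    merged.values.toSeq.sortBy(_.id)
  }

  // Compaction folds the deltas into a new base, so later reads merge less.
  def compact(base: Seq[Row], deltas: Seq[Delta]): Seq[Row] =
    reconcile(base, deltas)
}
```

A reader that ignores the deltas (as a plain ORC reader would) sees only `base`, which is why a dedicated reader is needed for these tables.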
4.
[Architecture diagram. Labels: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, ACID Tables. Two links are marked X.]
Spark can’t read/write ACID tables
Spark doesn’t use ACID Metastore format
5.
[Architecture diagram. Labels: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, HWC.]
Isolate Spark and Hive Catalogs/Tables
Leverage connector for Spark <-> Hive
6.
Features: Spark access to Ranger tables
– Column-level access control
– Column masking
  • “show only first four chars of string column”
– Row-level access control
  • “show only rows WHERE …”
Support Spark reads/writes for Ranger tables
[Screenshot: Ranger UI]
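The three kinds of policy listed for Ranger tables can be mimicked on in-memory records. The sketch below is purely illustrative and does not use any Ranger or Hive API: `Policy`, `maskShowFirst4` and `applyPolicy` are hypothetical names, and replacing hidden characters with `x` is an assumption about what the mask looks like.

```scala
// Hypothetical model of column-level access, column masking and row filtering.
object RangerPolicySketch {
  type Record = Map[String, String]

  final case class Policy(
      allowedColumns: Set[String],   // column-level access control
      maskedColumns: Set[String],    // columns shown only partially
      rowFilter: Record => Boolean)  // row-level access control ("WHERE ...")

  // "Show only first four chars of string column"; the rest becomes 'x' (assumed).
  def maskShowFirst4(s: String): String =
    s.take(4) + "x" * math.max(0, s.length - 4)

  // Filter rows first (the predicate may use hidden columns), then project and mask.
  def applyPolicy(policy: Policy, rows: Seq[Record]): Seq[Record] =
    rows.filter(policy.rowFilter).map { row =>
      row.collect {
        case (col, v) if policy.allowedColumns(col) =>
          col -> (if (policy.maskedColumns(col)) maskShowFirst4(v) else v)
      }
    }
}
```

In the architecture described in this deck, such rules are enforced on the Hive side, so a Spark job reading through the connector only ever receives the filtered and masked result.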
7.
Overview
Latest version of Hive Connector library for Spark
Features
– Spark access to Ranger tables
– Spark access to ACID tables
API and Architecture
– Reads from Hive to Spark
– Writes from Spark to Hive
8.
“JDBC-like” READ API
a) hive = HiveWarehouseBuilder.session(spark).build()
  • Create HiveWarehouseSession
b) hive.execute(sql: String): DataFrame
  • SHOW, DESCRIBE, etc.
c) hive.executeQuery(sql: String): DataFrame
  • SELECT, SELECT CTE, etc.
9.
[Architecture diagram. Labels: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, HWC (Thrift JDBC), JDBC.]
b) hive.execute("show databases").show()
  • Driver submits catalog op to HiveServer
  • HWC returns ResultSet as DataFrame
10.
[Architecture diagram. Labels: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, HWC (JDBC), ACID Tables. Arrows numbered 1 to 3.]
c) hive.executeQuery("SELECT * FROM t").sort("A").show()
  1. Driver submits query to HiveServer
  2. Compile query and return “splits” to Driver
  3. Execute query on LLAP
11.
[Architecture diagram. Labels: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, HWC (Arrow), ACID Tables. Arrows numbered 4 to 6.]
c) hive.executeQuery("SELECT * FROM t").sort("A").show()
  4. Executor tasks run for each split
  5. Tasks read Arrow data from LLAP
  6. HWC returns ArrowColumnVectors to Spark
12.
Other Recent READ improvements
– Leverage Spark 2.3.1 support for Arrow
– Implemented SupportsColumnBatchScan plugin
– Add Hive Arrow SerDe
– Add Arrow support to LlapOutputFormatService
13.
Overview
Latest version of Hive Connector library for Spark
Features
– Spark access to Ranger tables
– Spark access to ACID tables
API and Architecture
– Reads from Hive to Spark
– Writes from Spark to Hive
14.
Connector WRITE API
a) hive.executeUpdate(sql: String): Bool
  • Create, Update, Alter, Insert, Merge, Delete, etc.
b) df.write.format(HIVE_WAREHOUSE_CONNECTOR)
  • Write DataFrame using LOAD DATA INTO TABLE
c) df.writeStream.format(STREAM_TO_STREAM)
  • Write Streaming DataFrame using Hive-Streaming
15.
[Architecture diagram. Labels: Driver, Executors, Spark Meta, MetaStore, HiveServer2+Tez, LLAP Daemons, Hive Meta, HWC (Thrift JDBC). Arrows numbered 1 to 3.]
a) hive.executeUpdate("INSERT INTO s SELECT * FROM t")
  1. Driver submits update op to HiveServer2
  2. Process update through Tez and/or LLAP
  3. HWC returns true on success
16.
Example: LOAD to Hive
  df.select("ws_sold_time_sk", "ws_ship_date_sk")
    .filter("ws_sold_time_sk > 80000")
    .write.format(HIVE_WAREHOUSE_CONNECTOR)
    .option("table", "my_acid_table")
    .save()
17.
[Architecture diagram. Labels: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, LLAP Daemons, Hive Meta, HDFS /tmp, ACID Tables. Arrows numbered 1 to 3.]
b) df.write.format(HIVE_WAREHOUSE_CONNECTOR).save()
  1. Driver launches DataWriter tasks
  2. Tasks write ORC files
  3. On commit, Driver executes LOAD DATA INTO TABLE
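The three write steps form a stage-then-commit protocol: tasks only produce files in a staging directory, and the table changes once, when the driver commits. The plain-Scala sketch below models that control flow without touching Spark or Hive. All names (`TaskCommit`, `taskOutput`, `driverCommit`), the file naming and the staging path are hypothetical; only the `LOAD DATA INPATH ... INTO TABLE ...` statement shape is standard HiveQL.

```scala
// Hypothetical model of the batch write path: stage files, then commit once.
object BatchWriteSketch {
  // What one DataWriter task reports back to the driver.
  final case class TaskCommit(partitionId: Int, file: String)

  // Step 2: each task writes one ORC file under the job's staging directory.
  def taskOutput(stagingDir: String, partitionId: Int): TaskCommit =
    TaskCommit(partitionId, s"$stagingDir/part-$partitionId.orc")

  // Step 3: the statement the driver would submit on commit.
  def commitStatement(stagingDir: String, table: String): String =
    s"LOAD DATA INPATH '$stagingDir' INTO TABLE $table"

  // The driver commits only if every task succeeded; otherwise nothing is loaded.
  def driverCommit(
      stagingDir: String,
      table: String,
      results: Seq[Either[String, TaskCommit]]): Either[String, String] =
    results.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(s"abort: $err")
      case None      => Right(commitStatement(stagingDir, table))
    }
}
```

Because the table is modified by a single statement at the end, a failed task leaves only stray files in the staging directory rather than a partially written table.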
18.
Example: Stream to Hive
  val df = spark.readStream.format("socket")
    ...
    .load()
  df.writeStream.format(STREAM_TO_STREAM)
    .option("table", "my_acid_table")
    .start()
19.
[Architecture diagram. Labels: Driver, Executors, Spark Meta, MetaStore, HiveServer+Tez, Hive Meta, ACID Tables. Arrows numbered 1 to 3.]
c) df.writeStream.format(STREAM_TO_STREAM).start()
  1. Driver launches DataWriter tasks
  2. Tasks open Txns
  3. Write rows to ACID tables in Tx
20.
Other Recent Write Improvements
Implemented WriteSupport and StreamWriteSupport Spark plugins
Improved Hive LOAD DATA INTO TABLE
– e.g. Support for bucketing and dynamic partitioning
Improved HiveStreaming
– e.g. Support for dynamic partitioning
21.
Compatibility Matrix
Connector Branch        Spark    Hive     HDP
master (Summer 2018)    2.3.1    3.1.0    3.0.0 (GA)
branch-2.3              2.3.0    2.1.0    2.6.5 (TP)
branch-2.2              2.2.0    2.1.0    2.6.3~4 (TP)
branch-2.1              2.1.1    2.1.0    2.6.0~2 (TP)
branch-1.6              1.6.3    2.1.0    2.5.x (TP)
https://github.com/hortonworks-spark/spark-llap
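For build scripts it can help to keep the matrix as data. The following is a small sketch that copies the rows from the slide into a Scala value and looks up the connector branch for a Spark version; `Entry`, `matrix` and `branchForSpark` are hypothetical names, and matching on the exact version string is a simplifying assumption.

```scala
// The compatibility matrix from the slide, encoded as data.
object HwcCompatibility {
  final case class Entry(branch: String, spark: String, hive: String, hdp: String)

  val matrix: Seq[Entry] = Seq(
    Entry("master",     "2.3.1", "3.1.0", "3.0.0 (GA)"),
    Entry("branch-2.3", "2.3.0", "2.1.0", "2.6.5 (TP)"),
    Entry("branch-2.2", "2.2.0", "2.1.0", "2.6.3~4 (TP)"),
    Entry("branch-2.1", "2.1.1", "2.1.0", "2.6.0~2 (TP)"),
    Entry("branch-1.6", "1.6.3", "2.1.0", "2.5.x (TP)")
  )

  // Exact-match lookup; versions not listed on the slide yield None.
  def branchForSpark(sparkVersion: String): Option[String] =
    matrix.find(_.spark == sparkVersion).map(_.branch)
}
```

For example, `HwcCompatibility.branchForSpark("2.3.1")` selects the `master` row, the only one paired with Hive 3.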
22.
Acknowledgements: Teddy Choi, Jason Dere, Gunther Hagleitner, Dongjoon Hyun, Prasanth Jayachandran, Hyukjin Kwon, Bikas Saha, Jerry Zhao
Editor's Notes
ACID table details Spark doesn’t support ACID tables
Isolate Catalogs Interoperate with Connector
Other interoperability Access to Hive tables mediated by Ranger
Changes needed to read Hive JDBC like API
Bridge Hive catalog operations
Execute query in Hive Spark doesn’t directly access ACID tables
HWC returns DataFrames that can be transformed by DataFrame API
Support for writing SparkSQL Streams to ACID tables