SlideShare a Scribd company logo
1 of 6
Confidential – Restricted
Tamil Selvan Kandasamy, Senior Customer Operations Engineer
- Configuring Spark Hive Warehouse Connector
- Debugging LLAP Issues
© Cloudera, Inc. All rights reserved. 2
HiveWarehouseConnector
A library to read/write DataFrames and Streaming DataFrames to/from Apache Hive™ using LLAP
You can use the Hive Warehouse Connector (HWC) API to access any type of table in the Hive catalog from
Spark.
When you use SparkSQL, standard Spark APIs access tables in the Spark catalog
© Cloudera, Inc. All rights reserved. 3
HiveWarehouseConnector - Continued
The Hive Warehouse Connector supports the following applications:
● Spark shell
● PySpark
● The spark-submit script
Does Ranger come into picture?
Yes, it does. Both row and column access control when the data in accessed from Spark
© Cloudera, Inc. All rights reserved. 4
HiveWarehouseConnector - Setting things up
You must add several Spark properties through spark-2-defaults in Ambari to use the Hive Warehouse
Connector for accessing data in Hive. Alternatively, configuration can be provided for each job using hive --
conf
Property Description Comments
spark.sql.hive.hiveserver2.jdbc.url URL for HiveServer2 Interactive In Ambari, copy the value from Services > Hive >
Summary > HIVESERVER2 INTERACTIVE JDBC URL
spark.datasource.hive.warehouse.metastoreUri URI for metastore Copy the value from hive.metastore.uris. For
example,thrift://mycluster-1.com:9083
spark.datasource.hive.warehouse.load.staging.dir HDFS temp directory for batch writes to Hive For example, /tmp
spark.hadoop.hive.llap.daemon.service.hosts Application name for LLAP service Copy value from Advanced hive-interactive-site
>hive.llap.daemon.service.hosts
spark.hadoop.hive.zookeeper.quorum Zookeeper hosts used by LLAP Copy value from Advanced hive-
sitehive.zookeeper.quorum
© Cloudera, Inc. All rights reserved. 5
HiveWarehouseConnector - Making a connection
Run following code in scala shell to view the table data
Scala> import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
Confidential – Restricted
THANK YOU

More Related Content

What's hot

The Role of Elastic Load Balancer - Apache Stratos
The Role of Elastic Load Balancer - Apache StratosThe Role of Elastic Load Balancer - Apache Stratos
The Role of Elastic Load Balancer - Apache Stratos
Imesh Gunaratne
 

What's hot (20)

AWS - Beanstalk Fundamentals
AWS - Beanstalk FundamentalsAWS - Beanstalk Fundamentals
AWS - Beanstalk Fundamentals
 
AWS database services
AWS database servicesAWS database services
AWS database services
 
AWS Services - Part 1
AWS Services - Part 1AWS Services - Part 1
AWS Services - Part 1
 
The Role of Elastic Load Balancer - Apache Stratos
The Role of Elastic Load Balancer - Apache StratosThe Role of Elastic Load Balancer - Apache Stratos
The Role of Elastic Load Balancer - Apache Stratos
 
Spring Cloud Config
Spring Cloud ConfigSpring Cloud Config
Spring Cloud Config
 
Shipping logs to splunk from a container in aws howto
Shipping logs to splunk from a container in aws howtoShipping logs to splunk from a container in aws howto
Shipping logs to splunk from a container in aws howto
 
Amazon S3 connector
Amazon S3 connectorAmazon S3 connector
Amazon S3 connector
 
8 Source Code Cloudstack Developer Day
8 Source Code Cloudstack Developer Day8 Source Code Cloudstack Developer Day
8 Source Code Cloudstack Developer Day
 
Agile Deployment using Git and AWS Elastic Beanstalk
Agile Deployment using Git and AWS Elastic BeanstalkAgile Deployment using Git and AWS Elastic Beanstalk
Agile Deployment using Git and AWS Elastic Beanstalk
 
Aws landing zone. journey to the cloud
Aws landing zone. journey to the cloudAws landing zone. journey to the cloud
Aws landing zone. journey to the cloud
 
Amazon ECS Deep Dive
Amazon ECS Deep DiveAmazon ECS Deep Dive
Amazon ECS Deep Dive
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01
 
Hochverfügbarkeit mit MariaDB Enterprise - MariaDB Roadshow Summer 2014 Hambu...
Hochverfügbarkeit mit MariaDB Enterprise - MariaDB Roadshow Summer 2014 Hambu...Hochverfügbarkeit mit MariaDB Enterprise - MariaDB Roadshow Summer 2014 Hambu...
Hochverfügbarkeit mit MariaDB Enterprise - MariaDB Roadshow Summer 2014 Hambu...
 
Ansible Automation for Oracle RMAN / Apex Restores
Ansible Automation for Oracle RMAN / Apex RestoresAnsible Automation for Oracle RMAN / Apex Restores
Ansible Automation for Oracle RMAN / Apex Restores
 
6 Roadmap Cloudstack Developer Day
6 Roadmap Cloudstack Developer Day6 Roadmap Cloudstack Developer Day
6 Roadmap Cloudstack Developer Day
 
Apache Ambari BOF - Overview - Hadoop Summit 2013
Apache Ambari BOF - Overview - Hadoop Summit 2013Apache Ambari BOF - Overview - Hadoop Summit 2013
Apache Ambari BOF - Overview - Hadoop Summit 2013
 
Elastic Load Balancing Deep Dive and Best Practices - Pop-up Loft Tel Aviv
Elastic Load Balancing Deep Dive and Best Practices - Pop-up Loft Tel AvivElastic Load Balancing Deep Dive and Best Practices - Pop-up Loft Tel Aviv
Elastic Load Balancing Deep Dive and Best Practices - Pop-up Loft Tel Aviv
 
Auto Retweets Using AWS Lambda
Auto Retweets Using AWS LambdaAuto Retweets Using AWS Lambda
Auto Retweets Using AWS Lambda
 
Serverless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsServerless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloads
 
Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
 

Similar to Hive warehouse connector

Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Denodo
 
1 Introduction One can argue that the most challenging task .pdf
1 Introduction One can argue that the most challenging task .pdf1 Introduction One can argue that the most challenging task .pdf
1 Introduction One can argue that the most challenging task .pdf
cbholla1
 
Introduction One can argue that the most challenging task in.pdf
Introduction One can argue that the most challenging task in.pdfIntroduction One can argue that the most challenging task in.pdf
Introduction One can argue that the most challenging task in.pdf
adinathknit
 

Similar to Hive warehouse connector (20)

Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache SparkRow/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
HiveWarehouseConnector
HiveWarehouseConnectorHiveWarehouseConnector
HiveWarehouseConnector
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
 
Data Engineer’s Lunch #45: Apache Livy
Data Engineer’s Lunch #45: Apache LivyData Engineer’s Lunch #45: Apache Livy
Data Engineer’s Lunch #45: Apache Livy
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
 
Apache Hive - Introduction
Apache Hive - IntroductionApache Hive - Introduction
Apache Hive - Introduction
 
1 Introduction One can argue that the most challenging task .pdf
1 Introduction One can argue that the most challenging task .pdf1 Introduction One can argue that the most challenging task .pdf
1 Introduction One can argue that the most challenging task .pdf
 
Introduction One can argue that the most challenging task in.pdf
Introduction One can argue that the most challenging task in.pdfIntroduction One can argue that the most challenging task in.pdf
Introduction One can argue that the most challenging task in.pdf
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
 
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
MySQL Fabric
MySQL FabricMySQL Fabric
MySQL Fabric
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
 
Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
 Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S... Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
Walking through the Spring Stack for Apache Kafka with Soby Chacko | Kafka S...
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
 
Tensorflow model using docker and AWS SageMaker
Tensorflow model using docker and AWS SageMakerTensorflow model using docker and AWS SageMaker
Tensorflow model using docker and AWS SageMaker
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Hive warehouse connector

  • 1. Confidential – Restricted Tamil Selvan Kandasamy, Senior Customer Operations Engineer - Configuring Spark Hive Warehouse Connector - Debugging LLAP Issues
  • 2. © Cloudera, Inc. All rights reserved. 2 HiveWarehouseConnector A library to read/write DataFrames and Streaming DataFrames to/from Apache Hive™ using LLAP You can use the Hive Warehouse Connector (HWC) API to access any type of table in the Hive catalog from Spark. When you use SparkSQL, standard Spark APIs access tables in the Spark catalog
  • 3. © Cloudera, Inc. All rights reserved. 3 HiveWarehouseConnector - Continued The Hive Warehouse Connector supports the following applications: ● Spark shell ● PySpark ● The spark-submit script Does Ranger come into picture? Yes, it does. Both row and column access control when the data in accessed from Spark
  • 4. © Cloudera, Inc. All rights reserved. 4 HiveWarehouseConnector - Setting things up You must add several Spark properties through spark-2-defaults in Ambari to use the Hive Warehouse Connector for accessing data in Hive. Alternatively, configuration can be provided for each job using hive -- conf Property Description Comments spark.sql.hive.hiveserver2.jdbc.url URL for HiveServer2 Interactive In Ambari, copy the value from Services > Hive > Summary > HIVESERVER2 INTERACTIVE JDBC URL spark.datasource.hive.warehouse.metastoreUri URI for metastore Copy the value from hive.metastore.uris. For example,thrift://mycluster-1.com:9083 spark.datasource.hive.warehouse.load.staging.dir HDFS temp directory for batch writes to Hive For example, /tmp spark.hadoop.hive.llap.daemon.service.hosts Application name for LLAP service Copy value from Advanced hive-interactive-site >hive.llap.daemon.service.hosts spark.hadoop.hive.zookeeper.quorum Zookeeper hosts used by LLAP Copy value from Advanced hive- sitehive.zookeeper.quorum
  • 5. © Cloudera, Inc. All rights reserved. 5 HiveWarehouseConnector - Making a connection Run following code in scala shell to view the table data Scala> import com.hortonworks.hwc.HiveWarehouseSession val hive = HiveWarehouseSession.session(spark).build() hive.execute("show tables").show hive.executeQuery("select * from employee").show