Hive
● Hive is a data warehouse infrastructure tool.
● It resides on top of Hadoop to summarize Big Data, and makes querying and analysis easy.
● Hive was initially developed by Facebook.
● The Apache Software Foundation later took it up and developed it further as open source software under the name Apache Hive.
Hive is not
● A relational database.
● A design for OnLine Transaction Processing (OLTP).
● A language for real-time queries and row-level updates.
Features of Hive
● It stores the schema in a database and the processed data in HDFS.
● It is designed for OLAP.
● It provides an SQL-type query language called HiveQL or HQL.
● It is familiar, fast, scalable, and extensible.
Architecture of Hive
Unit Name | Operation
User Interface | Hive is data warehouse infrastructure software that lets users interact with HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight.
Metastore | Hive uses a database server of your choice to store the schema (metadata) of tables, databases, and the columns in a table, together with their data types and HDFS mapping.
HiveQL Process Engine | HiveQL is similar to SQL and is used to query the schema information in the Metastore.
Execution Engine | The Hive Execution Engine connects the HiveQL Process Engine to MapReduce. It processes the query and generates the same results that a MapReduce job would.
HDFS or HBase | The Hadoop Distributed File System (HDFS) or HBase is the storage layer that holds the data.
Working of Hive
Hive - Data Types
● All the data types in Hive are classified into four types:
– Column Types
– Literals
– Null Values
– Complex Types
Column Types
● Column types are used as the column data types of Hive.
● Integral Types
Type | Postfix | Example
TINYINT | Y | 10Y
SMALLINT | S | 10S
INT | - | 10
BIGINT | L | 10L
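As a quick check of the postfixes above, the following sketch selects one literal of each integral type. It assumes a reasonably recent Hive version that accepts a SELECT without a FROM clause; the column aliases are made up for illustration.

```sql
-- Each literal takes the type implied by its postfix:
-- Y = TINYINT, S = SMALLINT, no postfix = INT, L = BIGINT.
SELECT 10Y AS tiny_val,
       10S AS small_val,
       10  AS int_val,
       10L AS big_val;
```

On versions that require a FROM clause, the same expressions can be selected from any existing table with LIMIT 1.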
Column Types
● String Types
– Specified using single quotes (' ') or double quotes (" ").
– It contains two data types: VARCHAR and CHAR.
– Hive follows C-style escape characters.
Data Type | Length
VARCHAR | 1 to 65535
CHAR | 255
● Maps
MAP<primitive_type, data_type>
Column Types
Timestamp | Dates | Decimals
YYYY-MM-DD HH:MM:SS.fffffffff | YYYY-MM-DD | DECIMAL(precision, scale)
java.sql.Timestamp | 1982-01-14 | decimal(10,0)
Union Types
Union is a collection of heterogeneous data types.
UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>
{0:1}
{1:2.0}
{2:["three","four"]}
{3:{"a":5,"b":"five"}}
{2:["six","seven"]}
{3:{"a":8,"b":"eight"}}
{0:9}
Floating Point Types
Decimal Type
Null Value – NULL
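The date, timestamp, and decimal formats listed above can be tried out with CAST. This is only a sketch: the literal values and column aliases are invented for illustration, and it assumes a Hive version that supports the DATE type, DECIMAL with precision and scale, and SELECT without FROM.

```sql
-- Strings written in the formats shown above are cast to the matching types.
SELECT CAST('1982-01-14' AS DATE)                 AS birth_date,
       CAST('1982-01-14 10:15:30.5' AS TIMESTAMP) AS event_time,
       CAST('123.456' AS DECIMAL(10,2))           AS amount;
```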
Hive - Create Database
hive> show databases;
OK
default
Time taken: 13.112 seconds
hive> create database retail;
OK
Time taken: 0.113 seconds
hive> show databases;
OK
default
retail
Time taken: 0.058 seconds
Hive - Drop Database
hive> show databases;
OK
default
retail
userdb
Time taken: 0.058 seconds
hive> DROP DATABASE IF EXISTS userdb;
OK
Time taken: 4.841 seconds
hive> show databases;
OK
default
retail
Time taken: 0.07 seconds
hive> DROP DATABASE IF EXISTS financials CASCADE;
CASCADE drops the tables in the database first; without it, Hive refuses to drop a database that still contains tables.
Hive - Create Table & Load Data
SNO | Field Name | Data Type
1 | Eid | int
2 | Name | String
3 | Salary | Float
4 | Designation | String
CREATE TABLE IF NOT EXISTS retail.employee (eid int, name String, salary float, designation String)
COMMENT 'Employee Details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH '/home/hduser/emp.txt'
> OVERWRITE INTO TABLE retail.employee;
Hive - Alter Table
● ALTER TABLE name RENAME TO new_name
● ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
● ALTER TABLE name DROP [COLUMN] column_name
● ALTER TABLE name CHANGE column_name new_name new_type
● ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
Note: Apache Hive has no DROP COLUMN statement of its own; REPLACE COLUMNS with the columns to keep is the usual way to remove one.
hive> show tables;
OK
testtable
Time taken: 0.082 seconds
hive> ALTER TABLE testtable RENAME TO emp;
OK
Time taken: 1.837 seconds
hive> show tables;
OK
emp
Time taken: 0.08 seconds
Hive - Alter Table Example
hive> ALTER TABLE employee CHANGE name ename String;
hive> ALTER TABLE employee CHANGE salary salary Double;
hive> ALTER TABLE employee ADD COLUMNS (dept STRING COMMENT 'Department name');
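The REPLACE COLUMNS form from the syntax list is not exercised in the examples, so here is a minimal sketch of it. It assumes the employee table is in the state left by the statements above (eid, ename, salary, designation, dept); the column comments are invented for illustration.

```sql
-- Keep only the listed columns; the dept column added above is left out.
-- REPLACE COLUMNS rewrites the table metadata only; the data files are not touched.
ALTER TABLE employee REPLACE COLUMNS (
  eid         INT    COMMENT 'Employee id',
  ename       STRING COMMENT 'Employee name',
  salary      DOUBLE,
  designation STRING
);

-- Confirm the new column list.
DESCRIBE employee;
```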
Hive - Drop Table
DROP TABLE IF EXISTS employee;
Create DATABASE
hive> CREATE DATABASE IF NOT EXISTS STUDENTS COMMENT 'STUDENT Details'
> WITH DBPROPERTIES('creator'='PRAKASH');
OK
Time taken: 0.496 seconds
hive> SHOW DATABASES;
OK
default
retail
students
Time taken: 0.086 seconds
Describe
DESCRIBE DATABASE STUDENTS;
OK
students STUDENT Details hdfs://localhost:54310/user/hive/warehouse/students.db
DESCRIBE DATABASE EXTENDED STUDENTS;
OK
students STUDENT Details hdfs://localhost:54310/user/hive/warehouse/students.db {creator=PRAKASH}
Alter and Describe
ALTER DATABASE STUDENTS SET DBPROPERTIES ('edited by' = 'SRINIDHI');
DESCRIBE DATABASE EXTENDED STUDENTS;
OK
students STUDENT Details hdfs://localhost:54310/user/hive/warehouse/students.db {edited by=SRINIDHI, creator=PRAKASH}
Time taken: 0.048 seconds
Tables – Managed Table
● Hive stores managed tables under its warehouse folder.
● The life cycle of the table and its data is managed by Hive.
● When a managed (internal) table is dropped, Hive drops the data as well as the metadata.
CREATE TABLE IF NOT EXISTS STUDENT (rollno INT, name STRING, gpa FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
DESCRIBE STUDENT;
OK
rollno int
name string
gpa float
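To see whether Hive treats a table as managed, and where its data lives, DESCRIBE FORMATTED can be used. A minimal sketch against the STUDENT table created above:

```sql
-- In the output, the "Table Type" row reads MANAGED_TABLE for a managed table,
-- and the "Location" row shows its directory under the warehouse folder.
DESCRIBE FORMATTED STUDENT;
```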
External or Self-Managed Table
● When the table is dropped, it retains the data in the underlying location.
● The EXTERNAL keyword is used.
● A location needs to be specified so that the data set is stored in that particular location.
CREATE EXTERNAL TABLE IF NOT EXISTS EXT_STUDENT(rollno INT, name STRING,
> gpa FLOAT)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LOCATION '/STUDENT_INFO';
LOAD DATA LOCAL INPATH '/home/hduser/stu.tsv'
> OVERWRITE INTO TABLE EXT_STUDENT;
Copying data from file:/home/hduser/stu.tsv
Copying file: file:/home/hduser/stu.tsv
Loading data to table default.ext_student
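The difference from a managed table shows up when the table is dropped. The sketch below drops EXT_STUDENT and then defines it again over the same location; it reuses the table definition from this slide and assumes the files under /STUDENT_INFO were not removed by other means in between.

```sql
-- Dropping an external table removes only the metadata, not the files.
DROP TABLE IF EXISTS EXT_STUDENT;

-- Re-creating the table over the same location makes the existing files queryable again.
CREATE EXTERNAL TABLE IF NOT EXISTS EXT_STUDENT (rollno INT, name STRING, gpa FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/STUDENT_INFO';

SELECT * FROM EXT_STUDENT LIMIT 5;
```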
Work with Collection Data Types
1001, Prakash,BE:ME,FLA!65:CLE!76:DAA!89
1002, Ram,Btech:Mtech,FLA!35:CLE!66:DAA!54
CREATE TABLE STUDENT_INFO(rollno INT, name String,
> qualification ARRAY<STRING>, marks MAP<STRING,INT>)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> COLLECTION ITEMS TERMINATED BY ':'
> MAP KEYS TERMINATED BY '!';
LOAD DATA LOCAL INPATH '/home/hduser/studentinfo.csv'
> INTO TABLE STUDENT_INFO;
Querying Tables
SELECT * FROM EXT_STUDENT;
SELECT NAME, MARKS['FLA'] FROM STUDENT_INFO;
SELECT NAME, QUALIFICATION[0] FROM STUDENT_INFO;
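Beyond indexing into an array or map, Hive's built-in collection functions can be applied to the same columns. The following is a sketch against the STUDENT_INFO table defined earlier; the aliases degree_count, subjects, q, and degree are made up for illustration.

```sql
-- size() counts the elements of an array or map; map_keys() lists a map's keys.
SELECT name,
       size(qualification) AS degree_count,
       map_keys(marks)     AS subjects
FROM STUDENT_INFO;

-- LATERAL VIEW explode() produces one output row per array element,
-- so a student with two qualifications appears twice.
SELECT name, degree
FROM STUDENT_INFO
LATERAL VIEW explode(qualification) q AS degree;
```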
Hive - Partitioning
● Hive organizes tables into partitions. Partitioning divides a table into related parts based on the values of partition columns such as date, city, and department. With partitions, it is easy to query a portion of the data.
● Partitions are fundamentally horizontal slices of data, which allow large sets of data to be segmented into more manageable chunks.
– Assume that you are storing information about people across the entire world, spread over 196+ countries and amounting to around 500 crore (5 billion) entries.
– If you want to query the people of one particular country (Vatican City) without partitioning, you have to scan all 500 crore entries even to fetch the thousand entries of that country.
– If you partition the table by country, the query only has to check the data in that one country's partition.
– Hive creates a separate directory for each value of the partition column(s).
● Static Partition
– Column values are known at compile time.
● Dynamic Partition
– Column values are known at execution time.
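The country example above might look as follows in HiveQL. This is a sketch under stated assumptions: the table people and its columns are hypothetical and do not appear elsewhere in these slides.

```sql
-- Hypothetical table for the example in the text: one partition per country.
CREATE TABLE IF NOT EXISTS people (id BIGINT, name STRING)
PARTITIONED BY (country STRING);

-- The filter on the partition column lets Hive read only the matching
-- partition directory instead of scanning the data of every country.
SELECT id, name
FROM people
WHERE country = 'Vatican City';
```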
Static Partition
CREATE TABLE IF NOT EXISTS STATIC_STUDENT(rollno INT, name STRING)
> PARTITIONED BY (gpa FLOAT)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t';
INSERT OVERWRITE TABLE STATIC_STUDENT PARTITION (gpa = 8.1)
> SELECT ROLLNO, NAME FROM EXT_STUDENT WHERE GPA=8.1;
Dynamic Partition
CREATE TABLE IF NOT EXISTS DYNAMIC_STUDENT(rollno INT, name STRING)
> PARTITIONED BY (gpa FLOAT)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t';
hive> SET hive.exec.dynamic.partition = true;
hive> SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE DYNAMIC_STUDENT PARTITION (gpa)
> SELECT rollno, name, gpa FROM EXT_STUDENT;
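After either kind of insert, the resulting partitions can be inspected. The sketch below also shows two optional limits for dynamic partitioning and one way to sidestep a float comparison pitfall; the limit values are illustrative only, and whether the cast is needed depends on the data.

```sql
-- List the partitions of both tables; each partition maps to a directory
-- named after its value, for example gpa=8.1.
SHOW PARTITIONS STATIC_STUDENT;
SHOW PARTITIONS DYNAMIC_STUDENT;

-- Optional limits on how many partitions a single statement may create
-- (illustrative values, not recommendations).
SET hive.exec.max.dynamic.partitions = 1000;
SET hive.exec.max.dynamic.partitions.pernode = 100;

-- gpa is FLOAT while the literal 8.1 is DOUBLE, so a plain equality test can
-- miss rows; casting the literal is one way to compare like with like.
SELECT rollno, name FROM EXT_STUDENT WHERE gpa = CAST(8.1 AS FLOAT);
```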
Bucketing
● Bucketing is similar to partitioning.
● With partitioning, a partition is created for each unique value of the column, which can lead to thousands of partitions.
● Bucketing limits this number: rows are distributed into a fixed number of buckets.
● Each bucket is a file.
Bucketing
• To create a bucketed table having 3 buckets.
CREATE TABLE IF NOT EXISTS STUDENT_BUCKET (rollno INT, name STRING, gpa FLOAT)
CLUSTERED BY (gpa) INTO 3 BUCKETS;
• Load data to bucketed table.
FROM STUDENT INSERT OVERWRITE TABLE STUDENT_BUCKET
SELECT rollno, name, gpa;
• To display the content of the first bucket.
SELECT DISTINCT gpa FROM STUDENT_BUCKET
TABLESAMPLE(BUCKET 1 OUT OF 3 ON gpa);
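Two additions may help when trying the bucketing steps above. The setting in this sketch is an assumption about the Hive version in use: releases before 2.0 need it so that an insert produces one file per bucket, while later releases always enforce bucketing and no longer have the property, so it should be skipped there.

```sql
-- Assumption: Hive older than 2.0. Run this before the INSERT into STUDENT_BUCKET.
SET hive.enforce.bucketing = true;

-- The remaining buckets can be sampled in the same way as the first.
SELECT DISTINCT gpa FROM STUDENT_BUCKET TABLESAMPLE(BUCKET 2 OUT OF 3 ON gpa);
SELECT DISTINCT gpa FROM STUDENT_BUCKET TABLESAMPLE(BUCKET 3 OUT OF 3 ON gpa);
```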
View
View support is available only from Hive 0.6 onward.
To create a view named “STUDENT_VIEW”:
CREATE VIEW STUDENT_VIEW AS SELECT rollno, name FROM EXT_STUDENT;
Querying the view:
SELECT * FROM STUDENT_VIEW LIMIT 4;
To drop the view:
DROP VIEW STUDENT_VIEW;
Sub Query
CREATE TABLE docs (line STRING);
LOAD DATA LOCAL INPATH '/home/hduser/Desktop/lines.txt'
OVERWRITE INTO TABLE docs;
CREATE TABLE word_count AS
> SELECT word, count(1) AS count FROM
> (SELECT explode(split(line, ' ')) AS word FROM docs) w
> GROUP BY word
> ORDER BY word;
● explode function – takes an array as input and outputs the elements of the array as separate rows.
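To see what explode does on its own, it can be run on a literal string. This is a sketch: the sentence is invented, the first query assumes a Hive version that accepts SELECT without FROM, and the second query is an alternative LATERAL VIEW formulation of the word count, not the form used on the slide.

```sql
-- split() turns the string into an array of six words;
-- explode() then emits one row per element.
SELECT explode(split('to be or not to be', ' ')) AS word;

-- The same word count written with LATERAL VIEW instead of a subquery.
SELECT w.word, count(1) AS cnt
FROM docs
LATERAL VIEW explode(split(line, ' ')) w AS word
GROUP BY w.word;
```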
Joins
Joins in Hive are similar to SQL joins.
To join the Student and Department tables, using RollNo from both tables as the join key:
1. CREATE TABLE IF NOT EXISTS STUDENT(rollno INT, name STRING, gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
2. LOAD DATA LOCAL INPATH '/home/hduser/Desktop/student.tsv' OVERWRITE INTO TABLE STUDENT;
3. CREATE TABLE IF NOT EXISTS DEPARTMENT(rollno INT, deptno INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
4. LOAD DATA LOCAL INPATH '/home/hduser/Desktop/department.tsv' OVERWRITE INTO TABLE DEPARTMENT;
5. SELECT a.rollno, a.name, a.gpa, b.deptno FROM STUDENT a JOIN DEPARTMENT b ON a.rollno = b.rollno;
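The query in step 5 is an inner join, so students without a department row are left out. As a sketch of one alternative, a left outer join over the same two tables keeps them:

```sql
-- LEFT OUTER JOIN keeps every student; deptno is NULL where no
-- DEPARTMENT row has a matching rollno.
SELECT a.rollno, a.name, b.deptno
FROM STUDENT a
LEFT OUTER JOIN DEPARTMENT b ON a.rollno = b.rollno;
```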
Aggregations
Hive supports aggregation functions such as avg and count.
To compute the average GPA and the number of students:
SELECT avg(gpa) FROM STUDENT;
SELECT count(*) FROM STUDENT;
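Aggregations are often combined with GROUP BY. The following sketch reuses the STUDENT and DEPARTMENT tables from the Joins slide; the aliases students, avg_gpa, min_gpa, and max_gpa are made up for illustration.

```sql
-- One output row per department, with the number of students
-- and the average, lowest, and highest GPA in that department.
SELECT b.deptno,
       count(*)   AS students,
       avg(a.gpa) AS avg_gpa,
       min(a.gpa) AS min_gpa,
       max(a.gpa) AS max_gpa
FROM STUDENT a
JOIN DEPARTMENT b ON a.rollno = b.rollno
GROUP BY b.deptno;
```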
Thanks

HivePart1.pptx
