Hive in Practice
Andras_Feher@epam.com
March 13, 2017
AGENDA
• About this presentation
• Refreshing memory
• Homework micro project
• Some basic HiveQL in connection with the homework
• Tips and tricks
• Q and A
REFRESHING THE MEMORY ...
1. Data warehouse infrastructure built on Hadoop
2. Initially developed by Facebook
3. Access to many kinds of data (through SerDes) via an SQL-like interface
4. HiveQL: a subset/extension of standard SQL
5. Converts HiveQL queries into MapReduce, Tez or Spark jobs
6. Not suitable for OLTP
7. Not a relational database
8. Not performant with small amounts of data (job start-up overhead dominates)
9. Internal (managed) vs. external tables
HOMEWORK – THE USE CASE
Diagram components (steps numbered 1 to 3): Mainframe with fund transactions ......; Linux FS; Hadoop cluster running Hive with the Fund transactions external table, the Funds internal table and the partitioned Fund tr. archive internal table; TOP 5 Funds Report ...; Reporting application.
HOMEWORK – REPORT SAMPLE
TOP 5 FUNDS SOLD BY FUND MANAGER
Date: 2017-01-23
AEGON
Fund ID Fund Name Amt(HUF)
----------------------------------------------------------------------------------------
HU0000707401 AEGON Russia Részvény Befektetési Alap 837088447
HU0000710843 AEGON Lengyel Részvény Befektetési Alap B sorozat 724418208
HU0000713144 AEGON Russia Részvény Befektetési Alap PI sorozat 297676842
HU0000710157 AEGON Russia Részvény Befektetési Alap P sorozat 213092436
HU0000709514 AEGON Russia Részvény Befektetési Alap I sorozat 137271573
ERSTE
Fund ID Fund Name Amt(HUF)
----------------------------------------------------------------------------------------
HU0000708656 ERSTE Abszolút Hozamú Eszközallokációs Alapok Alapja 741018249
HU0000708631 ERSTE DPM Globális Részvény Alapok Alapja 734481241
HU0000701537 Erste Nyíltvégű Közép-Európai Részvény Alapok Alapja 512927455
HU0000704200 Erste Stock Hungary Indexkövető Részvény Befektetési Alap 147623040
HU0000712492 Erste Stock Global HUF Alapok Alapja 124933138
OTP
Fund ID Fund Name Amt(HUF)
----------------------------------------------------------------------------------------
HU0000709084 OTP Orosz Részvény Alap B sorozat 964596471
HU0000709092 OTP Orosz Részvény Alap C sorozat 709871151
HU0000704960 OTP Tőzsdén Kereskedett BUX Indexkövető Alap 685225834
HU0000705561 OTP Planéta Feltörekvő Piaci Részvény Alapok Alapja B sorozat 220627446
HU0000709019 OTP Orosz Részvény Alap A sorozat 59372424
....
(Annotations: each block is headed by the fund manager and lists the top 5 funds sold today, sorted by the sum of amounts in descending order.)
HOMEWORK TASKS
1. Create in Hive:
• the database,
• the FUNDS_TRANSACTIONS external table,
• the FUNDS and FUNDS_TR_ARCHIVE internal tables, partitioned by transaction year,
• optionally, index(es),
• optionally, views.
2. Load the internal tables from the homework data files.
3. Create a single query that uses analytic functions, joins the tables and produces the report data.
4. Create the reporting program that produces the report in the format described on the previous slide.
5. Schedule it with cron or Oozie to run at 18:00 every day.
Optionally: send all HiveQL source and report data to me.
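The single query in the homework tasks has to compute, for each fund manager, the sum of today's amounts per fund and keep the five largest. The sketch below shows that logic in plain Python, for example to cross-check the Hive output on a small sample. It is a minimal sketch under several assumptions: the function name and the (manager, fund_id, amount) tuple shape are hypothetical, and the rows are assumed to be filtered to the report date already.

```python
from collections import defaultdict

def top_funds_by_manager(transactions, n=5):
    """Sum amounts per (manager, fund) and keep the n largest totals per manager.

    `transactions` is an iterable of (manager, fund_id, amount) tuples that are
    already filtered to the report date (hypothetical shape, not the homework's
    actual file layout).
    """
    totals = defaultdict(int)
    for manager, fund_id, amount in transactions:
        totals[(manager, fund_id)] += amount  # like SUM(amount) ... GROUP BY manager, fund_id

    per_manager = defaultdict(list)
    for (manager, fund_id), total in totals.items():
        per_manager[manager].append((fund_id, total))

    report = {}
    for manager, funds in per_manager.items():
        # like ORDER BY total DESC within PARTITION BY manager; ties broken by fund_id
        funds.sort(key=lambda f: (-f[1], f[0]))
        report[manager] = funds[:n]  # exactly n rows, like ROW_NUMBER() <= n
    return report
```

The function cuts each group at exactly n rows, which matches `ROW_NUMBER() ... <= 5`. A filter based on `RANK()` could return more than five funds when two totals tie.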
BEELINE
• Beeline – command-line shell:
beeline -u jdbc:hive2://localhost:10000/default -n scott -p tiger --color=true
List the internal commands (e.g. save commands as a script, list the indexes of a table, list all tables, run a script from a file, ...):
0: jdbc:hive2://localhost:10000/default> !help
Idea for the homework:
beeline -u jdbc:hive2://localhost:10000/default -n scott -p tiger -f homework.hql \
  --outputformat=csv2 --showHeader=false --silent=true --showWarnings=false \
  | python report.py > report.txt
OFF TOPIC - PYTHON
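Workaround for left-aligning UTF-8 text in Python 2.x (report.py). On a byte string, len() counts bytes rather than characters, so decode before computing the padding:
...
print ... (90 - len(fields[1].decode('utf8'))) * ' ' ...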
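In Python 3 the decode trick is unnecessary, because `str` is Unicode and format padding counts characters. The sketch below is one possible shape for a Python 3 report.py that consumes the beeline `csv2` output. Several parts are assumptions: the column order (manager, fund_id, fund_name, amount), the 12/70-character column widths, the 88-character separator, and the function names. Rows are also assumed to arrive grouped by manager and already sorted.

```python
import csv

SEPARATOR = "-" * 88  # separator width is an assumption

def format_row(fund_id, fund_name, amount, name_width=70):
    # Python 3 strings are Unicode, so format padding counts characters, not
    # bytes: names such as "Részvény" line up without the 2.x decode() trick.
    return f"{fund_id:<12} {fund_name:<{name_width}} {amount}"

def render_report(lines, report_date):
    """Build the report text from beeline csv2 output (no header row).

    Assumes each row is manager,fund_id,fund_name,amount and that rows arrive
    grouped by manager and sorted by amount descending (hypothetical layout).
    """
    out = ["TOP 5 FUNDS SOLD BY FUND MANAGER", f"Date: {report_date}"]
    current_manager = None
    for manager, fund_id, fund_name, amount in csv.reader(lines):
        if manager != current_manager:  # start a new manager block
            current_manager = manager
            out += [manager, format_row("Fund ID", "Fund Name", "Amt(HUF)"), SEPARATOR]
        out.append(format_row(fund_id, fund_name, amount))
    return "\n".join(out)
```

To connect it to the pipeline from the Beeline slide, the script could end with `print(render_report(sys.stdin, datetime.date.today().isoformat()))`, with `sys` and `datetime` imported.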
CREATION DDL
Creating a database:
• LOCATION hdfs_path
Creating a table (see the official documentation):
• <managed> (default) / EXTERNAL / TEMPORARY
• PARTITIONED BY
• STORED AS: use ORC or PARQUET whenever possible, 3-4x faster than TEXTFILE
• TBLPROPERTIES: "orc.compress"="ZLIB" (or "SNAPPY" / "NONE")
• LOCATION
Note: PK and FK constraints can be defined but are not enforced; they only serve as metadata for the optimizer.
Creating an index (avoid it for file formats that contain their own indexes, e.g. ORC; indexes were removed in Hive 3.0):
• COMPACT / BITMAP
• <automatic>, or WITH DEFERRED REBUILD (populated later by ALTER INDEX ... REBUILD)
Note: an index can be rebuilt partition by partition.
Notes for creating a view:
• no materialized views (in Hive 2.x; they were introduced in Hive 3.0)
• the view's schema is frozen when the view is created
• filters on the view are pushed down into its underlying query
ADDING PARTITION
Creating the partitioned table:
CREATE TABLE usr_part(id int, name string) PARTITIONED BY (entry_year int);
Static mode:
INSERT INTO usr_part PARTITION(entry_year=2016)
SELECT id, name FROM dyn_part_source;
Or
ALTER TABLE usr_part ADD PARTITION (entry_year=1987);
Dynamic mode (does not work with LOAD DATA):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
FROM dyn_part_source dps
INSERT OVERWRITE TABLE usr_part PARTITION(entry_year)
SELECT dps.id, dps.name, dps.entry_year;
Directory structure:
/apps/hive/warehouse/mydb.db/usr_part/
    entry_year=1987/000000_0
    entry_year=2008/000000_0
    entry_year=2009/000000_0
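In a dynamic-partition insert, Hive expects the partition column to be the last column of the SELECT list. It then routes each row into a `key=value` directory like the ones listed above. The toy model below illustrates that routing in Python. It is a conceptual sketch and not Hive's implementation, and the function name is hypothetical.

```python
from collections import defaultdict

def partition_layout(table_dir, partition_col, rows):
    """Conceptual model of a dynamic-partition INSERT (not Hive's code).

    Each row's last field is the partition value, as in the SELECT list of a
    dynamic-partition insert; it becomes part of the directory name
    (key=value) and is not stored in the data files.
    """
    layout = defaultdict(list)
    for *data, part_value in rows:
        layout[f"{table_dir}/{partition_col}={part_value}"].append(tuple(data))
    return dict(sorted(layout.items()))
```

For example, the rows `(1, "a", 2008)` and `(2, "b", 1987)` end up in `.../entry_year=2008` and `.../entry_year=1987`. Each data file then contains only `id` and `name`.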
SKIPPING THE HEADER WHEN LOADING A TABLE
Skipping the header row:
Solution 1: the Hive table skips the first row (not recommended for partitioned tables):
CREATE TABLE ....
TBLPROPERTIES ("skip.header.line.count"="1")
Solution 2: pre-process the file with tail:
tail -n +2 withfirstrow.csv > withoutfirstrow.csv
Solution 3: pre-process the file in place with sed:
sed -i 1d filename.csv
Warning: LOAD DATA INPATH (without LOCAL) moves the source files on HDFS into the table's directory, while LOAD DATA LOCAL INPATH copies them.
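If the files pass through a Python pre-processing step anyway, the same header skipping can be done in a streaming way. The sketch below is equivalent to `tail -n +2` or `sed 1d`. The helper name is hypothetical.

```python
import itertools

def skip_header(lines, count=1):
    """Return an iterator over `lines` without the first `count` lines.

    Same effect as `tail -n +2` (count=1) or `sed 1d`, but streaming, so large
    files are never loaded into memory at once.
    """
    return itertools.islice(lines, count, None)
```

Usage: `with open("withfirstrow.csv", encoding="utf-8") as src, open("withoutfirstrow.csv", "w", encoding="utf-8") as dst: dst.writelines(skip_header(src))`.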
RANK() ANALYTICAL (=WINDOWING) FUNCTION
Analytic functions by Example
Best-earning employees by department
SELECT id, name, deptid, salary, rank() OVER (PARTITION BY deptid ORDER BY salary DESC) AS rank
FROM employee;
+-----+-------+---------+---------+-------+--+
| id | name | deptid | salary | rank |
+-----+-------+---------+---------+-------+--+
| 4 | D | 1 | 4000 | 1 |
| 3 | C | 1 | 3000 | 2 |
| 5 | E | 1 | 2500 | 3 |
| 2 | B | 1 | 2000 | 4 |
| 6 | F | 1 | 1500 | 5 |
| 1 | A | 1 | 1000 | 6 |
| 11 | K | 2 | 5000 | 1 |
| 7 | G | 2 | 2500 | 2 |
| 9 | I | 2 | 2300 | 3 |
| 10 | J | 2 | 1800 | 4 |
| 12 | L | 2 | 1600 | 5 |
| 8 | H | 2 | 1400 | 6 |
+-----+-------+---------+---------+-------+--+
RANK() ANALYTICAL (=WINDOWING) FUNCTION
Top 3 best earning employees by department
SELECT id, name, deptid, salary, rank
FROM (
SELECT id, name, deptid, salary,
rank() OVER (PARTITION BY deptid ORDER BY salary desc) as rank
from employee
) ranked_table
WHERE ranked_table.rank <=3;
+-----+-------+---------+---------+-------+--+
| id | name | deptid | salary | rank |
+-----+-------+---------+---------+-------+--+
| 4 | D | 1 | 4000 | 1 |
| 3 | C | 1 | 3000 | 2 |
| 5 | E | 1 | 2500 | 3 |
| 11 | K | 2 | 5000 | 1 |
| 7 | G | 2 | 2500 | 2 |
| 9 | I | 2 | 2300 | 3 |
+-----+-------+---------+---------+-------+--+
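The `rank <= 3` filter above relies on how RANK() treats ties. Rows with equal salaries share a rank, and the next rank skips ahead. As a result, the filter can return more than three rows per department, whereas ROW_NUMBER() would return exactly three. The in-memory emulation below makes those semantics explicit and reproduces the top-3 result from the employee sample. It is an illustrative sketch with a hypothetical function name, not how Hive evaluates windowing functions.

```python
from itertools import groupby

def rank_over(rows, partition_key, order_key, descending=True):
    """Emulate RANK() OVER (PARTITION BY ... ORDER BY ... [DESC]) in memory.

    Returns (row, rank) pairs. Equal order values share a rank and the next
    rank skips ahead (1, 1, 3), which is what distinguishes RANK() from
    DENSE_RANK() (1, 1, 2) and ROW_NUMBER() (1, 2, 3).
    """
    result = []
    for _, part in groupby(sorted(rows, key=partition_key), key=partition_key):
        ordered = sorted(part, key=order_key, reverse=descending)
        rank, previous = 0, object()  # sentinel never equal to a real value
        for position, row in enumerate(ordered, start=1):
            if order_key(row) != previous:
                rank, previous = position, order_key(row)
            result.append((row, rank))
    return result
```

Called with `partition_key=lambda r: r[2]` (deptid) and `order_key=lambda r: r[3]` (salary) on the twelve employee rows, the call keeps D, C, E in department 1 and K, G, I in department 2 when filtered to `rank <= 3`.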
TIPS AND TRICKS - COMPLEX TYPES
• First: create a dummy table with exactly one row:
CREATE TABLE dual(x int) TBLPROPERTIES("immutable"="true");
INSERT INTO TABLE dual values (1);
• Structs
CREATE TABLE phonebook_struct(
name string,
phones struct<phone_type:string,phone_number:string>
);
INSERT INTO TABLE phonebook_struct
SELECT
'Tercsi',
NAMED_STRUCT('phone_type','home','phone_number','98347598374')
FROM dual;
SELECT phones.phone_number
FROM phonebook_struct
WHERE name='Tercsi' AND phones.phone_type='home';
TIPS AND TRICKS - COMPLEX TYPES
• Maps (key-value pairs)
CREATE TABLE phonebook_map(name string, phones map<string,string>);
INSERT INTO TABLE phonebook_map
SELECT 'Tercsi', str_to_map("home:348756348756") FROM dual;
SELECT phones['home'] from phonebook_map WHERE name='Tercsi';
• Arrays (indexable lists)
CREATE TABLE phonebook_array(name string, phones array<string>);
INSERT INTO phonebook_array
SELECT 'Tercsi', array('12345','678')
FROM dual;
SELECT phones[0]
FROM phonebook_array
WHERE name='Tercsi';
SELECT SORT_ARRAY(work_place)
FROM employee
WHERE ARRAY_CONTAINS(work_place, 'Montreal');
• Union types (support is incomplete; use them only for experimenting)
CREATE TABLE union_test(foo UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>);
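The map example relies on the default delimiters of `str_to_map`: ',' between pairs and ':' between a key and its value. That is why `"home:348756348756"` becomes a one-entry map. The emulation below shows how the string is split. It is a simplified sketch. Its behaviour for a pair without ':' (value NULL) is assumed to mirror Hive, and unlike Hive it treats the delimiters as plain strings rather than regular expressions.

```python
def str_to_map(text, pair_delim=",", kv_delim=":"):
    """Simplified emulation of Hive's str_to_map(text[, delimiter1, delimiter2]).

    The defaults mirror Hive's (',' between pairs, ':' between key and value).
    Unlike Hive, the delimiters are plain strings here, not regular expressions.
    """
    result = {}
    for pair in text.split(pair_delim):
        key, found, value = pair.partition(kv_delim)  # split at the first ':' only
        result[key] = value if found else None  # a pair without ':' maps to NULL
    return result
```

With these rules, `str_to_map("home:1,work:2")` yields two entries, and `phones['work']` would then return `'2'`.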
TIPS AND TRICKS
Merging split files on HDFS into a single local file:
hdfs dfs -getmerge /user/andras/text_import /tmp/test
Writing a query result directly to HDFS:
Create (or overwrite):
hive -e 'select * from andras.titanic' | hdfs dfs -put -f - /user/andras/t
Append:
hive -e 'select * from andras.titanic' | hdfs dfs -appendToFile - /user/andras/t
From Beeline:
insert overwrite directory '/user/hive/t5' select * from titanic;
Backup and restore (with metadata):
export table titanic to '/tmp/backup';
import table titanic_imported from '/tmp/backup';
TIPS AND TRICKS
Current time (FROM employee LIMIT 1 is only a dummy source; Hive 0.13+ also accepts SELECT without FROM):
select from_unixtime(unix_timestamp()) as current_time
FROM employee LIMIT 1;
Difference in days:
select (UNIX_TIMESTAMP('2015-01-21 18:00:00') -
UNIX_TIMESTAMP('2015-01-10 11:00:00'))/60/60/24 as daydiff
FROM employee LIMIT 1;
Converting a timestamp to a date:
select TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())) AS curr_date
FROM employee LIMIT 1;
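The day-difference query returns a fractional number of days. The two timestamps are 11 days and 7 hours apart, so the result is about 11.29 rather than an integer. The Python sketch below reproduces the same arithmetic as a cross-check. The function name is hypothetical, and the comment on time zones is a hedged caveat rather than a statement about every Hive version.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # the 'yyyy-MM-dd HH:mm:ss' layout used in the query

def day_diff(later, earlier):
    """Fractional days between two timestamps, mirroring
    (UNIX_TIMESTAMP(later) - UNIX_TIMESTAMP(earlier))/60/60/24.

    Naive datetimes ignore time zones and DST, whereas UNIX_TIMESTAMP uses the
    default time zone, so results may differ by an hour across a DST switch.
    """
    seconds = (datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)).total_seconds()
    return seconds / 60 / 60 / 24

print(round(day_diff("2015-01-21 18:00:00", "2015-01-10 11:00:00"), 4))  # 11.2917
```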
TIPS AND TRICKS
Workaround for a NullPointerException during index rebuild (run the rebuild on the MapReduce engine):
set hive.execution.engine=mr;
alter index … rebuild;
set hive.execution.engine=tez;
Adding a third-party SerDe for advanced CSV processing:
add jar /home/andras_feher/csv-serde-1.1.2-0.11.0-all.jar;
create table airports( ... )
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
....;

More Related Content

What's hot

Functional programming from its fundamentals
Functional programming from its fundamentalsFunctional programming from its fundamentals
Functional programming from its fundamentalsMauro Palsgraaf
 
12c Mini Lesson - Invisible Columns
12c Mini Lesson - Invisible Columns12c Mini Lesson - Invisible Columns
12c Mini Lesson - Invisible ColumnsConnor McDonald
 
peRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysispeRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysisVyacheslav Arbuzov
 
Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL. Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL. Anastasia Lubennikova
 
Exploiting Memory Overflows
Exploiting Memory OverflowsExploiting Memory Overflows
Exploiting Memory OverflowsAnkur Tyagi
 
Scala. Introduction to FP. Monads
Scala. Introduction to FP. MonadsScala. Introduction to FP. Monads
Scala. Introduction to FP. MonadsKirill Kozlov
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewDan Morrill
 
OREO - Hack.lu CTF 2014
OREO - Hack.lu CTF 2014OREO - Hack.lu CTF 2014
OREO - Hack.lu CTF 2014YOKARO-MON
 
Pandas pythonfordatascience
Pandas pythonfordatasciencePandas pythonfordatascience
Pandas pythonfordatascienceNishant Upadhyay
 
R Programming: Numeric Functions In R
R Programming: Numeric Functions In RR Programming: Numeric Functions In R
R Programming: Numeric Functions In RRsquared Academy
 
What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3MariaDB plc
 
Python 培训讲义
Python 培训讲义Python 培训讲义
Python 培训讲义leejd
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavVyacheslav Arbuzov
 

What's hot (20)

Chapter 1
Chapter 1Chapter 1
Chapter 1
 
Data transformation-cheatsheet
Data transformation-cheatsheetData transformation-cheatsheet
Data transformation-cheatsheet
 
Functional programming from its fundamentals
Functional programming from its fundamentalsFunctional programming from its fundamentals
Functional programming from its fundamentals
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 
12c Mini Lesson - Invisible Columns
12c Mini Lesson - Invisible Columns12c Mini Lesson - Invisible Columns
12c Mini Lesson - Invisible Columns
 
peRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysispeRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysis
 
Lecture 2 f17
Lecture 2 f17Lecture 2 f17
Lecture 2 f17
 
Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL. Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL.
 
Exploiting Memory Overflows
Exploiting Memory OverflowsExploiting Memory Overflows
Exploiting Memory Overflows
 
Windows 7
Windows 7Windows 7
Windows 7
 
Scala. Introduction to FP. Monads
Scala. Introduction to FP. MonadsScala. Introduction to FP. Monads
Scala. Introduction to FP. Monads
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overview
 
OREO - Hack.lu CTF 2014
OREO - Hack.lu CTF 2014OREO - Hack.lu CTF 2014
OREO - Hack.lu CTF 2014
 
Pandas pythonfordatascience
Pandas pythonfordatasciencePandas pythonfordatascience
Pandas pythonfordatascience
 
Haskell 101
Haskell 101Haskell 101
Haskell 101
 
Perm winter school 2014.01.31
Perm winter school 2014.01.31Perm winter school 2014.01.31
Perm winter school 2014.01.31
 
R Programming: Numeric Functions In R
R Programming: Numeric Functions In RR Programming: Numeric Functions In R
R Programming: Numeric Functions In R
 
What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3
 
Python 培训讲义
Python 培训讲义Python 培训讲义
Python 培训讲义
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 

Similar to Hive in Practice

sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.pptssuserd64918
 
Discard inport exchange table & tablespace
Discard inport exchange table & tablespaceDiscard inport exchange table & tablespace
Discard inport exchange table & tablespaceMarco Tusa
 
Postgres can do THAT?
Postgres can do THAT?Postgres can do THAT?
Postgres can do THAT?alexbrasetvik
 
Lecture3 mysql gui by okello erick
Lecture3 mysql gui by okello erickLecture3 mysql gui by okello erick
Lecture3 mysql gui by okello erickokelloerick
 
Compiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual MachinesCompiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual MachinesEelco Visser
 
4. Data Manipulation.ppt
4. Data Manipulation.ppt4. Data Manipulation.ppt
4. Data Manipulation.pptKISHOYIANKISH
 
Web2py Code Lab
Web2py Code LabWeb2py Code Lab
Web2py Code LabColin Su
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using PythonNishantKumar1179
 
Hvordan sette opp en OAI-PMH metadata-innhøster
Hvordan sette opp en OAI-PMH metadata-innhøsterHvordan sette opp en OAI-PMH metadata-innhøster
Hvordan sette opp en OAI-PMH metadata-innhøsterLibriotech
 
PL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme PerformancePL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme PerformanceZohar Elkayam
 
Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019Edwin Jung
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Top 10 Oracle SQL tuning tips
Top 10 Oracle SQL tuning tipsTop 10 Oracle SQL tuning tips
Top 10 Oracle SQL tuning tipsNirav Shah
 
3. writing MySql plugins for the information schema
3. writing MySql plugins for the information schema3. writing MySql plugins for the information schema
3. writing MySql plugins for the information schemaRoland Bouman
 

Similar to Hive in Practice (20)

sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.ppt
 
01 isa
01 isa01 isa
01 isa
 
01_intro-cpp.ppt
01_intro-cpp.ppt01_intro-cpp.ppt
01_intro-cpp.ppt
 
01_intro-cpp.ppt
01_intro-cpp.ppt01_intro-cpp.ppt
01_intro-cpp.ppt
 
Discard inport exchange table & tablespace
Discard inport exchange table & tablespaceDiscard inport exchange table & tablespace
Discard inport exchange table & tablespace
 
Postgres can do THAT?
Postgres can do THAT?Postgres can do THAT?
Postgres can do THAT?
 
Lecture3 mysql gui by okello erick
Lecture3 mysql gui by okello erickLecture3 mysql gui by okello erick
Lecture3 mysql gui by okello erick
 
Compiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual MachinesCompiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual Machines
 
Lecture 9.pptx
Lecture 9.pptxLecture 9.pptx
Lecture 9.pptx
 
python.ppt
python.pptpython.ppt
python.ppt
 
4. Data Manipulation.ppt
4. Data Manipulation.ppt4. Data Manipulation.ppt
4. Data Manipulation.ppt
 
Web2py Code Lab
Web2py Code LabWeb2py Code Lab
Web2py Code Lab
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using Python
 
Hvordan sette opp en OAI-PMH metadata-innhøster
Hvordan sette opp en OAI-PMH metadata-innhøsterHvordan sette opp en OAI-PMH metadata-innhøster
Hvordan sette opp en OAI-PMH metadata-innhøster
 
PL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme PerformancePL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme Performance
 
Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
python.ppt
python.pptpython.ppt
python.ppt
 
Top 10 Oracle SQL tuning tips
Top 10 Oracle SQL tuning tipsTop 10 Oracle SQL tuning tips
Top 10 Oracle SQL tuning tips
 
3. writing MySql plugins for the information schema
3. writing MySql plugins for the information schema3. writing MySql plugins for the information schema
3. writing MySql plugins for the information schema
 

Recently uploaded

BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Recently uploaded (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Hive in Practice

  • 2. AGENDA • About this presentation • Refreshing memory • Homework micro project • Some basic HiveQL in connection with the homework • Tips and trics • Q and A
  • 3. REFRESHING THE MEMORY ... Data warehouse infrastructure built on Hadoop1 Initially developed by Facebook2 Access any type of data through SQL-like interface3 HiveQL: subset/extension of standard SQL4 Converts HiveQL (SQL-like) queries to MapReduce, Tez, Spark jobs5 Not suitable for OLTP6 Not a relational database7 Not performant with small amounts of data8 Internal vs. External tables9
  • 4. HOMEWORK – THE USE CASE Mainframe Fund transactions ...... Hadoop cluster Hive Fund transactions external table Linux FS Funds internal table Fund tr. archive internal table partitionedTOP 5 Funds Report ... 1 2 3 Reporting application
  • 5. HOMEWORK – REPORT SAMPLE TOP 5 FUNDS SOLD BY FUND MANAGER Date: 2017-01-23 AEGON Fund ID Fund Name Amt(HUF) ---------------------------------------------------------------------------------------- HU0000707401 AEGON Russia Részvény Befektetési Alap 837088447 HU0000710843 AEGON Lengyel Részvény Befektetési Alap B sorozat 724418208 HU0000713144 AEGON Russia Részvény Befektetési Alap PI sorozat 297676842 HU0000710157 AEGON Russia Részvény Befektetési Alap P sorozat 213092436 HU0000709514 AEGON Russia Részvény Befektetési Alap I sorozat 137271573 ERSTE Fund ID Fund Name Amt(HUF) ---------------------------------------------------------------------------------------- HU0000708656 ERSTE Abszolút Hozamú Eszközallokációs Alapok Alapja 741018249 HU0000708631 ERSTE DPM Globális Részvény Alapok Alapja 734481241 HU0000701537 Erste Nyíltvégű Közép-Európai Részvény Alapok Alapja 512927455 HU0000704200 Erste Stock Hungary Indexkövető Részvény Befektetési Alap 147623040 HU0000712492 Erste Stock Global HUF Alapok Alapja 124933138 OTP Fund ID Fund Name Amt(HUF) ---------------------------------------------------------------------------------------- HU0000709084 OTP Orosz Részvény Alap B sorozat 964596471 HU0000709092 OTP Orosz Részvény Alap C sorozat 709871151 HU0000704960 OTP Tőzsdén Kereskedett BUX Indexkövető Alap 685225834 HU0000705561 OTP Planéta Feltörekvő Piaci Részvény Alapok Alapja B sorozat 220627446 HU0000709019 OTP Orosz Részvény Alap A sorozat 59372424 .... Fund manager Top 5 funds sold today, sorted by sum of amount desc
  • 6. HOMEWORK TASKS Create the: • database, • FUNDS_TRANSACTIONS external table • FUNDS and FUNDS_TR_ARCHIVE internal tables partitioned by transaction year • optionally index(es), • optionally views in Hive 1 Load the internal tables from the homework data files2 Create a single query, using analytic functions, that joins the tables and produces report data 3 Create the reporting program that produces the report in the format described on the previous slide 4 Schedule the running in chron or Oozie to run at 18:00 every day5 Optionally: Send all HiveQL source and report data to me
  • 7. BEELINE • Beeline – Command Line Shell beeline -u jdbc:hive2://localhost:10000/default -n scott -p tiger --color=true Get list of internal commands like save commands as script, get list of indexes for a table, get list of all tables, run script from file ....: 0: jdbc:hive2://localhost:10000/default> !help beeline -u jdbc:hive2://localhost:10000/default -n scott -p tiger -f homework.hql --outputformat=csv2 --showHeader=false --silent --showWarnings=false | python report.py > report.txt Idea for the homework:
  • 8. OFF TOPIC - PYTHON ... print ... (90 - len(fields[1].decode('utf8'))) * ' ' ... Workaround for left positioning utf-8 text in Python 2.x: report.py
  • 9. CREATION DDL Creating database: • LOCATION hdfs_path Creating table (official documentation): • <managed> / EXTERNAL /TEMPORARY • PARTITIONED BY • STORED AS : use ORC or PARQUET whenever possible, 3-4x faster than TEXT • TBLPROPERTIES : "orc.compress"="ZLIB” (/"SNAPPY "/"NONE " ) • LOCATION Note: PK and FK constants can be defined, but not enforced. Metadata info for optimizers. Creating index (avoid in case of file type containing index e.g. ORC): • COMPACT/BITMAP • <automatic>, WITH DEFERRED REBUILD (ALTER INDEX ... REBUILD) Note: it is possible to rebuild index by partition Notes for creating a view : • no materialized views • schema frozen • filter push down
  • 10. ADDING PARTITION Creating the partitioned table: CREATE TABLE usr_part(id int, name string) PARTITIONED BY (entry_year int); Static mode: INSERT INTO usr_part PARTITION(entry_year=2016) SELECT id, name FROM dyn_part_source; Or ALTER TABLE usr_part ADD PARTITION (entry_year=1987); Dynamic mode (does not work with load): set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; FROM dyn_part_source dps INSERT OVERWRITE TABLE usr_part PARTITION(entry_year) SELECT dps.id, dps.name, dps.entry_year; Directory structure: /apps/hive/warehouse/mydb.db/usr_part /entry_year=1987 000000_0 /entry_year=2008 000000_0 /entry_year=2009 000000_0
  • 11. SKIPPING HEADER WHEN LOADING A TABLE
    Solution 1: the Hive table skips the first row (not recommended for partitioned tables):
      CREATE TABLE .... TBLPROPERTIES ("skip.header.line.count"="1")
    Solution 2: pre-process the file with tail:
      tail -n +2 withfirstrow.csv > withoutfirstrow.csv
    Solution 3: pre-process the file in place with sed:
      sed -i 1d filename.csv
    Warning: LOAD DATA INPATH (without LOCAL) moves the source file from its HDFS location into the table directory; LOAD DATA LOCAL INPATH copies the local file instead.
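The same pre-processing step can also be done in Python when tail or sed is not at hand. This is a minimal sketch with an illustrative function name, strip_header; like tail -n +2 it streams the file, so large files are not loaded into memory.

```python
import shutil


def strip_header(src_path, dst_path, header_lines=1):
    """Copy src_path to dst_path without its first `header_lines` lines,
    like `tail -n +2 src > dst` for header_lines=1."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for _ in range(header_lines):
            src.readline()              # discard one header line
        shutil.copyfileobj(src, dst)    # stream the rest in chunks
```

Usage: `strip_header("withfirstrow.csv", "withoutfirstrow.csv")`. Binary mode keeps the bytes (encoding, line endings) exactly as they were.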
  • 12. RANK() ANALYTICAL (=WINDOWING) FUNCTION
    Analytic functions by example: best-earning employees by department
      SELECT id, name, deptid, salary,
             rank() OVER (PARTITION BY deptid ORDER BY salary DESC) as rank
      FROM employee;
      +-----+-------+---------+---------+-------+--+
      | id  | name  | deptid  | salary  | rank  |
      +-----+-------+---------+---------+-------+--+
      | 4   | D     | 1       | 4000    | 1     |
      | 3   | C     | 1       | 3000    | 2     |
      | 5   | E     | 1       | 2500    | 3     |
      | 2   | B     | 1       | 2000    | 4     |
      | 6   | F     | 1       | 1500    | 5     |
      | 1   | A     | 1       | 1000    | 6     |
      | 11  | K     | 2       | 5000    | 1     |
      | 7   | G     | 2       | 2500    | 2     |
      | 9   | I     | 2       | 2300    | 3     |
      | 10  | J     | 2       | 1800    | 4     |
      | 12  | L     | 2       | 1600    | 5     |
      | 8   | H     | 2       | 1400    | 6     |
      +-----+-------+---------+---------+-------+--+
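To make the semantics of rank() explicit, the Python sketch below emulates RANK() OVER (PARTITION BY ... ORDER BY ... [DESC]) on the employee rows from the slide, in descending salary order as in the result table: rows are grouped by the partition key, sorted within each group, and equal values share a rank while the next rank skips ahead (1, 1, 3). The function name rank_over is illustrative.

```python
from itertools import groupby
from operator import itemgetter

# (id, name, deptid, salary) rows from the employee table on the slide
EMPLOYEE = [
    (1, "A", 1, 1000), (2, "B", 1, 2000), (3, "C", 1, 3000), (4, "D", 1, 4000),
    (5, "E", 1, 2500), (6, "F", 1, 1500), (7, "G", 2, 2500), (8, "H", 2, 1400),
    (9, "I", 2, 2300), (10, "J", 2, 1800), (11, "K", 2, 5000), (12, "L", 2, 1600),
]


def rank_over(rows, part_idx, order_idx, descending=False):
    """Emulate RANK() OVER (PARTITION BY row[part_idx] ORDER BY row[order_idx])."""
    result = []
    by_part = sorted(rows, key=itemgetter(part_idx))
    for _, group in groupby(by_part, key=itemgetter(part_idx)):
        ordered = sorted(group, key=itemgetter(order_idx), reverse=descending)
        rank, previous = 0, None
        for position, row in enumerate(ordered, start=1):
            if position == 1 or row[order_idx] != previous:
                rank = position          # new value: rank jumps to its 1-based position
            previous = row[order_idx]
            result.append(row + (rank,))  # ties keep the rank of the first equal row
    return result


for row in rank_over(EMPLOYEE, part_idx=2, order_idx=3, descending=True):
    print(row)
```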
  • 13. RANK() ANALYTICAL (=WINDOWING) FUNCTION
    Top 3 best-earning employees by department
      SELECT id, name, deptid, salary, rank
      FROM (
        SELECT id, name, deptid, salary,
               rank() OVER (PARTITION BY deptid ORDER BY salary desc) as rank
        FROM employee
      ) ranked_table
      WHERE ranked_table.rank <= 3;
      +-----+-------+---------+---------+-------+--+
      | id  | name  | deptid  | salary  | rank  |
      +-----+-------+---------+---------+-------+--+
      | 4   | D     | 1       | 4000    | 1     |
      | 3   | C     | 1       | 3000    | 2     |
      | 5   | E     | 1       | 2500    | 3     |
      | 11  | K     | 2       | 5000    | 1     |
      | 7   | G     | 2       | 2500    | 2     |
      | 9   | I     | 2       | 2300    | 3     |
      +-----+-------+---------+---------+-------+--+
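One caveat of the rank() <= 3 filter: because ties share a rank, a department can return more than three rows. If exactly N rows per group are needed, row_number() is the usual choice, while dense_rank() keeps the top N distinct values; Hive provides all three functions. The Python sketch below compares their numbering on a single, made-up partition that is already sorted in ORDER BY ... DESC order; the helper names are illustrative.

```python
def ranking_numbers(values, method):
    """Number values already sorted in ORDER BY order, as the SQL function would."""
    if method not in ("row_number", "rank", "dense_rank"):
        raise ValueError(f"unknown method: {method}")
    numbers = []
    for i, value in enumerate(values):
        if i == 0 or method == "row_number":
            numbers.append(i + 1)               # row_number: always unique 1..n
        elif value == values[i - 1]:
            numbers.append(numbers[-1])         # tie: same number as the previous row
        elif method == "rank":
            numbers.append(i + 1)               # rank: gap after a tie
        else:
            numbers.append(numbers[-1] + 1)     # dense_rank: no gap
    return numbers


def top_n(values, n, method):
    """Emulate WHERE <method>() <= n on one partition sorted DESC."""
    return [v for v, num in zip(values, ranking_numbers(values, method)) if num <= n]


salaries = [5000, 2300, 2300, 2300, 1000]   # made-up salaries, sorted DESC
for method in ("row_number", "rank", "dense_rank"):
    print(method, ranking_numbers(salaries, method), top_n(salaries, 3, method))
```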
  • 14. TIPS AND TRICKS - COMPLEX TYPES
    • First, create a dummy table with exactly one row (complex-type values cannot be written with INSERT ... VALUES, so they are inserted with INSERT ... SELECT ... FROM dual):
        CREATE TABLE dual(x int) TBLPROPERTIES("immutable"="true");
        INSERT INTO TABLE dual values (1);
    • Structs
        CREATE TABLE phonebook_struct(
          name string,
          phones struct<phone_type:string,phone_number:string>
        );
        INSERT INTO TABLE phonebook_struct
        SELECT 'Tercsi', NAMED_STRUCT('phone_type','home','phone_number','98347598374') FROM dual;
        SELECT phones.phone_number FROM phonebook_struct
        WHERE name='Tercsi' AND phones.phone_type='home';
  • 15. TIPS AND TRICKS - COMPLEX TYPES
    • Maps (key-value pairs)
        CREATE TABLE phonebook_map(name string, phones map<string,string>);
        INSERT INTO TABLE phonebook_map SELECT 'Tercsi', str_to_map("home:348756348756") FROM dual;
        SELECT phones['home'] from phonebook_map WHERE name='Tercsi';
    • Arrays (indexable lists)
        CREATE TABLE phonebook_array(name string, phones array<string>);
        INSERT INTO phonebook_array SELECT 'Tercsi', array('12345','678') FROM dual;
        SELECT phones[0] FROM phonebook_array WHERE name='Tercsi';
        -- on an employee table with an array<string> column work_place:
        SELECT SORT_ARRAY(work_place) FROM employee WHERE ARRAY_CONTAINS(work_place, 'Montreal');
    • Union (support is incomplete; for reference only)
        CREATE TABLE union_test(foo UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>);
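str_to_map() splits a string into key-value pairs; by default pairs are separated by ',' and each key from its value by ':'. The sketch below is a rough Python model for trying out inputs before loading them. It is an approximation: Hive treats the two delimiters as regular expressions, while this model uses plain string splitting, and mapping a pair without ':' to None is this sketch's own choice.

```python
def str_to_map(text, pair_delim=",", kv_delim=":"):
    """Approximate Python model of Hive's str_to_map() (plain-string delimiters)."""
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)   # split at the first kv_delim only
        result[key] = value if sep else None
    return result


phones = str_to_map("home:348756348756")
print(phones["home"])   # like SELECT phones['home'] ...
```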
  • 16. TIPS AND TRICKS
    Merging split files on HDFS into a single local file:
      hdfs dfs -getmerge /user/andras/text_import /tmp/test
    Writing a query result directly to HDFS:
      Create:  hive -e 'select * from andras.titanic' | hdfs dfs -put -f - /user/andras/t
      Append:  hive -e 'select * from andras.titanic' | hdfs dfs -appendToFile - /user/andras/t
      Beeline: insert overwrite directory '/user/hive/t5' select * from titanic;
    Backup and restore (with metadata):
      export table titanic to '/tmp/backup';
      import table titanic_imported from '/tmp/backup';
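What getmerge does can be pictured with a local-filesystem analogue: the part files of a table or job output directory (000000_0, 000001_0, ...) are concatenated into one file. The Python sketch below is for illustration only; merge_parts is a hypothetical name, and the merge order (by file name) and the skipping of names starting with '_' or '.' (such as _SUCCESS markers or .crc files) are assumptions of this sketch, not a description of the HDFS command's internals.

```python
from pathlib import Path


def merge_parts(src_dir, dst_file):
    """Concatenate the part files of src_dir into dst_file (local analogue of getmerge).

    Assumptions: files are merged in name order, and names starting with
    '_' or '.' (e.g. _SUCCESS, .crc files) are skipped.
    Returns the names of the merged files.
    """
    parts = sorted(p for p in Path(src_dir).iterdir()
                   if p.is_file() and not p.name.startswith(("_", ".")))
    with open(dst_file, "wb") as out:
        for part in parts:
            out.write(part.read_bytes())
    return [p.name for p in parts]
```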
  • 17. TIPS AND TRICKS
    Current time:
      select from_unixtime(unix_timestamp()) as current_time from employee limit 1;
    Difference in days:
      select (UNIX_TIMESTAMP('2015-01-21 18:00:00') - UNIX_TIMESTAMP('2015-01-10 11:00:00'))/60/60/24 as daydiff FROM employee LIMIT 1;
    Converting a timestamp to a date:
      select TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())) AS curr_date FROM employee LIMIT 1;
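The arithmetic behind the day-difference query: the two timestamps are 11 days and 7 hours apart, so the query returns a fractional value, 11 + 7/24 ≈ 11.29, rather than a whole number of days (Hive's datediff() returns whole days and ignores the time of day). The Python sketch below checks both readings; the function names are illustrative, and it uses naive datetimes, whereas UNIX_TIMESTAMP interprets the strings in the session time zone, so results could differ by an hour across a DST change.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def fractional_day_diff(later, earlier):
    """Mirror (UNIX_TIMESTAMP(later) - UNIX_TIMESTAMP(earlier))/60/60/24.
    Naive datetimes: no time zone or DST handling."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds() / 60 / 60 / 24


def whole_day_diff(later, earlier):
    """Calendar-day difference, ignoring the time of day."""
    return (datetime.strptime(later, FMT).date()
            - datetime.strptime(earlier, FMT).date()).days


print(fractional_day_diff("2015-01-21 18:00:00", "2015-01-10 11:00:00"))
print(whole_day_diff("2015-01-21 18:00:00", "2015-01-10 11:00:00"))
```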
  • 18. TIPS AND TRICKS
    Workaround for a "NullPointerException" at index rebuild (switch to the MapReduce engine for the rebuild only):
      set hive.execution.engine=mr;
      alter index … rebuild;
      set hive.execution.engine=tez;
    Adding a third-party SerDe for advanced CSV processing:
      add jar /home/andras_feher/csv-serde-1.1.2-0.11.0-all.jar;
      create table airports( ... )
      row format serde 'com.bizo.hive.serde.csv.CSVSerde' ....;
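Why a CSV SerDe at all: a table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' splits on every comma and does not understand quoting, so a quoted field containing a comma is broken into two columns. The Python sketch below contrasts the two behaviours on a made-up airports line, with Python's csv module standing in for the CSV SerDe; the function names are illustrative.

```python
import csv

# Made-up airports.csv line: the quoted name contains a comma.
LINE = '1,"Budapest Liszt Ferenc, Terminal 2",Budapest,BUD\n'


def delimited_fields(line):
    """Plain delimiter splitting, as with FIELDS TERMINATED BY ','."""
    return line.rstrip("\r\n").split(",")


def csv_fields(line):
    """Quote-aware parsing, as a CSV SerDe does."""
    return next(csv.reader([line.rstrip("\r\n")]))


print(delimited_fields(LINE))  # the quoted name is split in two
print(csv_fields(LINE))
```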

Editor's Notes

  1. Schema frozen: select * from
  2. allow developers to perform tasks in SQL that were previously confined to procedural languages