SlideShare a Scribd company logo
Apache Hive & Hadoop
The de facto standard for SQL queries in Hadoop
Since its incubation in 2008, Apache Hive is considered the defacto standard
for interactive SQL queries over petabytes of data in Hadoop.
With the completion of the Stinger Initiative, and the next phase of
Stinger.next, the Apache community has greatly improved Hive’s speed, scale
and SQL semantics. Hive easily integrates with other critical data center
technologies using a familiar JDBC interface.
What Hive Does
Hadoop was built to organize and store massive amounts of data of all shapes,
sizes and formats. Because of Hadoop’s “schema on read” architecture, a
Hadoop cluster is a perfect reservoir of heterogeneous data—structured and
unstructured—from a multitude of sources.
Data analysts use Hive to explore, structure and analyze that data, then turn it
into actionable business insight.
How Hive Works
The tables in Hive are similar to tables in a relational database, and data units
are organized in a taxonomy from larger to more granular units. Databases are
comprised of tables, which are made up of partitions. Data can be accessed via
a simple query language and Hive supports overwriting or appending data.
Within a particular database, data in the tables is serialized and each table has
a corresponding Hadoop Distributed File System (HDFS) directory. Each table
can be sub-divided into partitions that determine how data is distributed
within sub-directories of the table directory. Data within partitions can be
further broken down into buckets.
Hive supports all the common primitive data formats such as BIGINT, BINARY,
BOOLEAN, CHAR, DECIMAL, DOUBLE, FLOAT, INT, SMALLINT, STRING,
TIMESTAMP, and TINYINT. In addition, analysts can combine primitive data
types to form complex data types, such as structs, maps and arrays.
As part of the exercise in learning about the hive query language we completed
one sample exercise on hive.
The sample exercise has been taken from the following website
http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-
hive/
Hive provides a platform to run SQL queries. This is a more familiar with the
programmers of SQL background. It is a known and learned fact that sql
queries are similar to comprehend and understand
Downloading the data :
The csv file can be downloaded from the following zip file.
http://hortonassets.s3.amazonaws.com/pig/lahman591-csv.zip
Uploading the data:
Although the link from wherewe are solving the programis an easy guide. We
would be using the hue to upload the files for execution of the program.
Steps to upload:
StartVM Box and open SSH Terminal
Log on to the address http://127.0.0.1:8000/
Click on file browser option followed by view then click on hue
Upload 2 files batting.csv and master.csv
Lets look into the programme:
create table temp_batting (col_value STRING);
This would create the table to store the data. Remember we have to save the
queries with a name create and also press execute once it is done
Next we would load the batting.csv into the temp_batting we created. Refer
snapshot below:
Load data Inpath '/user/admin/Batting.csv' OVERWRITE INTO TABLE
temp_batting;
As in sql for looking at the data we would write the select command . Here also
we follow suite
Select * from batting_temp;
The output is given below:
create table batting ( player_id STRING , year INT ,runs INT);
Through regex operation we would copy the data from temp_batting and copy
it into batting. For this we need to create the table batting with 3 fields
player_id , years and run. After this we would group the data by year to
highlight the maximum runs obtained for each year
Select year , max(runs) FROM batting GROUP BY year;
Last step we would find the player id with maximum runs. For this refer the
query below...
Select a. year , a. player_ID , a.runs frombatting a
JOIN(SELECTyear , max(runs) runs FROMbatting GROUP BY year ) b
ON (a.year = b. year AND a.runs = b.runs);
This was a simple yet engaging exercise to learn about the HQL. Interesting
part we found the batting.csv file vanishes from the view of file browser. Refer
Below...

More Related Content

What's hot

Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Takrim Ul Islam Laskar
 
Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010
Yahoo Developer Network
 
Apache Hive
Apache HiveApache Hive
Apache Hive
Abhishek Gautam
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
vishal choudhary
 
Apache hive
Apache hiveApache hive
Apache hive
Vaibhav Kadu
 
Hadoop-BigData
Hadoop-BigDataHadoop-BigData
Hadoop-BigData
Gigin Krishnan
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
Padma shree. T
 
Hive Hadoop
Hive HadoopHive Hadoop
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
Ricardo Varela
 
Apache Hive
Apache HiveApache Hive
Apache Hive
tusharsinghal58
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
Arvind Kumar
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
Abhinav Tyagi
 
Import Database Data using RODBC in R Studio
Import Database Data using RODBC in R StudioImport Database Data using RODBC in R Studio
Import Database Data using RODBC in R Studio
Rupak Roy
 
Import and Export Big Data using R Studio
Import and Export Big Data using R StudioImport and Export Big Data using R Studio
Import and Export Big Data using R Studio
Rupak Roy
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
Apache storm
Apache stormApache storm
Apache storm
Kapil Kumar
 
Big Data & Hadoop Data Analysis
Big Data & Hadoop Data AnalysisBig Data & Hadoop Data Analysis
Big Data & Hadoop Data Analysis
Koushik Mondal
 

What's hot (20)

Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)
 
Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
 
Apache hive
Apache hiveApache hive
Apache hive
 
Hadoop-BigData
Hadoop-BigDataHadoop-BigData
Hadoop-BigData
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
 
Hive Hadoop
Hive HadoopHive Hadoop
Hive Hadoop
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Hive training
Hive trainingHive training
Hive training
 
January 2011 HUG: Howl Presentation
January 2011 HUG: Howl PresentationJanuary 2011 HUG: Howl Presentation
January 2011 HUG: Howl Presentation
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Import Database Data using RODBC in R Studio
Import Database Data using RODBC in R StudioImport Database Data using RODBC in R Studio
Import Database Data using RODBC in R Studio
 
Import and Export Big Data using R Studio
Import and Export Big Data using R StudioImport and Export Big Data using R Studio
Import and Export Big Data using R Studio
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
Apache storm
Apache stormApache storm
Apache storm
 
Big Data & Hadoop Data Analysis
Big Data & Hadoop Data AnalysisBig Data & Hadoop Data Analysis
Big Data & Hadoop Data Analysis
 

Similar to Apache hive

Apache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketingApache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketing
earnwithme2522
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -
Aucfan
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
Abhinav Tyagi
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
nzhang
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
AnandMHadoop
 
Hive presentation
Hive presentationHive presentation
Hive presentation
Hitesh Agrawal
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache Hive
Will Du
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zingzingopen
 
Analysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRAAnalysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRA
Bhadra Gowdra
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Apache Kite
Apache KiteApache Kite
Apache Kite
Alwin James
 
Preparing a Dataset for Processing
Preparing a Dataset for ProcessingPreparing a Dataset for Processing
Preparing a Dataset for Processing
Manish Chopra
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & ZingLong Dao
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.com
Edward D. Kim
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge
Christopher Williams
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
Manish Chopra
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
snehal parikh
 
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICSHive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
RUHULAMINHAZARIKA
 

Similar to Apache hive (20)

Apache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketingApache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketing
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
Hive presentation
Hive presentationHive presentation
Hive presentation
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache Hive
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zing
 
Analysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRAAnalysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRA
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Apache Kite
Apache KiteApache Kite
Apache Kite
 
Preparing a Dataset for Processing
Preparing a Dataset for ProcessingPreparing a Dataset for Processing
Preparing a Dataset for Processing
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & Zing
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.com
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICSHive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
 

Recently uploaded

Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 

Recently uploaded (20)

Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 

Apache hive

  • 1. Apache Hive & Hadoop The de facto standard for SQL queries in Hadoop Since its incubation in 2008, Apache Hive is considered the defacto standard for interactive SQL queries over petabytes of data in Hadoop. With the completion of the Stinger Initiative, and the next phase of Stinger.next, the Apache community has greatly improved Hive’s speed, scale and SQL semantics. Hive easily integrates with other critical data center technologies using a familiar JDBC interface. What Hive Does Hadoop was built to organize and store massive amounts of data of all shapes, sizes and formats. Because of Hadoop’s “schema on read” architecture, a Hadoop cluster is a perfect reservoir of heterogeneous data—structured and unstructured—from a multitude of sources. Data analysts use Hive to explore, structure and analyze that data, then turn it into actionable business insight. How Hive Works The tables in Hive are similar to tables in a relational database, and data units are organized in a taxonomy from larger to more granular units. Databases are comprised of tables, which are made up of partitions. Data can be accessed via a simple query language and Hive supports overwriting or appending data. Within a particular database, data in the tables is serialized and each table has a corresponding Hadoop Distributed File System (HDFS) directory. Each table can be sub-divided into partitions that determine how data is distributed within sub-directories of the table directory. Data within partitions can be further broken down into buckets. Hive supports all the common primitive data formats such as BIGINT, BINARY, BOOLEAN, CHAR, DECIMAL, DOUBLE, FLOAT, INT, SMALLINT, STRING,
  • 2. TIMESTAMP, and TINYINT. In addition, analysts can combine primitive data types to form complex data types, such as structs, maps and arrays. As part of the exercise in learning about the hive query language we completed one sample exercise on hive. The sample exercise has been taken from the following website http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache- hive/ Hive provides a platform to run SQL queries. This is a more familiar with the programmers of SQL background. It is a known and learned fact that sql queries are similar to comprehend and understand Downloading the data : The csv file can be downloaded from the following zip file. http://hortonassets.s3.amazonaws.com/pig/lahman591-csv.zip Uploading the data: Although the link from wherewe are solving the programis an easy guide. We would be using the hue to upload the files for execution of the program. Steps to upload: StartVM Box and open SSH Terminal Log on to the address http://127.0.0.1:8000/ Click on file browser option followed by view then click on hue Upload 2 files batting.csv and master.csv
  • 3. Lets look into the programme: create table temp_batting (col_value STRING); This would create the table to store the data. Remember we have to save the queries with a name create and also press execute once it is done
  • 4. Next we would load the batting.csv into the temp_batting we created. Refer snapshot below: Load data Inpath '/user/admin/Batting.csv' OVERWRITE INTO TABLE temp_batting; As in sql for looking at the data we would write the select command . Here also we follow suite Select * from batting_temp;
  • 5. The output is given below: create table batting ( player_id STRING , year INT ,runs INT); Through regex operation we would copy the data from temp_batting and copy it into batting. For this we need to create the table batting with 3 fields player_id , years and run. After this we would group the data by year to highlight the maximum runs obtained for each year Select year , max(runs) FROM batting GROUP BY year;
  • 6.
  • 7.
  • 8. Last step we would find the player id with maximum runs. For this refer the query below... Select a. year , a. player_ID , a.runs frombatting a JOIN(SELECTyear , max(runs) runs FROMbatting GROUP BY year ) b ON (a.year = b. year AND a.runs = b.runs);
  • 9.
  • 10. This was a simple yet engaging exercise to learn about the HQL. Interesting part we found the batting.csv file vanishes from the view of file browser. Refer Below...