HIVE (APACHE HIVE)
INTRODUCTION
• Hive, originally developed by Facebook and later owned by Apache, is a data storage system that was developed to analyze structured data.
• Hive is a data warehouse and SQL-like querying tool built on the Hadoop ecosystem.
• Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale.
Cont.
• Hive Metastore (HMS) provides a central repository of metadata that can easily be analyzed to make informed, data-driven decisions, which makes it a critical component of many data lake architectures.
• Hive is built on top of Apache Hadoop and supports storage on S3, ADLS, GS, etc. through HDFS. Hive allows users to read, write, and manage petabytes of data using SQL.
Apache HIVE Architecture
• Apache Hive is a very effective tool for big data (descriptive data to be analyzed).
• Data is stored in the Hadoop Distributed File System (HDFS), where it is also processed.
• Apache Hive assists in processing and analyzing data and in producing data-driven patterns and trends.
Cont.
• HiveQL is a SQL-like language that is used in various organizations to interact with the Hive database and analyze the required data in a structured format.
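To give a feel for the kind of SQL-like query HiveQL is meant for, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a local stand-in for Hive (SQLite is not Hive and does not run on Hadoop), and the page_views table and its columns are hypothetical. The SELECT ... GROUP BY statement uses only syntax that standard SQL and HiveQL have in common.

```python
import sqlite3

# Assumption: SQLite is only a local stand-in here; it is NOT Hive.
conn = sqlite3.connect(":memory:")

# Hypothetical table used purely for illustration.
conn.execute(
    "CREATE TABLE page_views (user_id INTEGER, country TEXT, views INTEGER)"
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [(1, "IN", 5), (2, "US", 3), (3, "IN", 7), (4, "US", 1)],
)

# An ad hoc aggregation written in plain SQL-style syntax.
QUERY = """
SELECT country, SUM(views) AS total_views
FROM page_views
GROUP BY country
ORDER BY country
"""


def run_query(connection, sql):
    """Execute a query and return all result rows as a list of tuples."""
    return connection.execute(sql).fetchall()


rows = run_query(conn, QUERY)
for country, total in rows:
    print(country, total)
```

The point of the sketch is only that the analyst writes a declarative query over a structured table; in Hive the same style of statement would be planned and executed by the cluster rather than by a local database file.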
Cont.
Hive chiefly consists of three core parts:
• Hive Clients: Hive offers a variety of drivers designed for communication with different applications. For example, Hive provides Thrift clients for Thrift-based applications.
• Hive Services: Hive services handle client interactions with Hive. For example, if a client wants to perform a query, it must talk to Hive services.
• Hive Storage and Computing: Hive services such as the file system, job client, and metastore then communicate with Hive storage and store things like table metadata and query results.
Need of HIVE
• Hive is a milestone in big data innovation that eventually led to data analysis on a large scale.
• Large organizations need big data systems to record information collected over time.
• To generate data-driven analysis, organizations collect data and use software applications such as Hive to analyze it.
• Apache Hive can be used to read, write, and manage this stored information in an organized way.
• For this, organizations needed larger equipment, and that is probably why the release of software like Apache Hive was needed.
Characteristics of Hive
• SQL-like Interface: Hive's familiar SQL-like interface makes it simple for users to query and analyze big datasets without the need for programming experience.
• Scalability: Hive can handle massive amounts of data stored in HDFS and other data stores compatible with Hadoop.
• Flexibility: Hive supports various data serialization formats such as ORC, making it a versatile tool capable of handling various use cases and data formats.
• Integration: Hive interfaces with other Hadoop ecosystem tools like Pig, Sqoop, and Flume, allowing users to run data analysis jobs and processes.
Cont.
• External tables: Hive supports external tables, which allow users to access data stored in other storage systems such as HBase, Cassandra, and Amazon S3.
• Partitioning: Hive offers partitioning, which allows users to split huge datasets based on parameters such as date, location, or user ID. Restricting the quantity of data that must be scanned improves query performance.
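The effect of partitioning on scanned data can be sketched in a few lines of Python. This is a conceptual illustration only, not how Hive is implemented: the rows, the date column, and the helper names build_partitions and query_partition are all hypothetical, and an in-memory dictionary stands in for one storage directory per partition value.

```python
from collections import defaultdict


def build_partitions(rows, key):
    """Group rows by the partition column (one group per distinct value)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def query_partition(partitions, value):
    """Read only the matching partition; return (rows, number of rows scanned)."""
    matched = partitions.get(value, [])
    return matched, len(matched)


# Hypothetical sample data, partitioned by date.
rows = [
    {"date": "2024-01-01", "user_id": 1},
    {"date": "2024-01-01", "user_id": 2},
    {"date": "2024-01-02", "user_id": 3},
    {"date": "2024-01-03", "user_id": 4},
]

partitions = build_partitions(rows, "date")
matched, scanned = query_partition(partitions, "2024-01-02")
print(f"scanned {scanned} of {len(rows)} rows")
```

Without the partition lookup, a filter on date would have to examine every row; with it, only the rows stored under the requested date are touched.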
Advantages of Hive
• Fast: Hive can quickly process enormous amounts of data.
• Familiar: Hive offers a familiar SQL-like interface.
• Scalable: Hive can handle massive amounts of data stored in HDFS and other compatible data stores.
Hive Optimization Techniques
• Partition your data to reduce read time within your directory; otherwise all the data will be read.
• Use appropriate file formats such as Optimized Row Columnar (ORC) to increase query performance. ORC reduces the original data size by up to 75 percent.
• Divide table sets into more manageable parts by employing bucketing.
• Improve aggregations, filters, scans, and joins by vectorizing your queries. Vectorization performs these functions in batches of 1024 rows at once, rather than one row at a time.
• Create a separate index table that functions as a quick reference for the original table (available in Hive releases before 3.0, where built-in indexing was removed).
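Two of the techniques above, bucketing and batch-at-a-time processing, can be sketched in plain Python. This is a conceptual illustration under stated assumptions: the integer-modulo bucket function is a simplification and not necessarily the hash Hive applies, the helper names are hypothetical, and pure Python loops only show the batching pattern, not the CPU-level gains of real vectorized execution. The batch size of 1024 is the figure given in the slide.

```python
BATCH_SIZE = 1024  # batch size mentioned in the slide


def bucket_for(user_id, num_buckets):
    """Assign a row to a bucket from its key (assumption: integer modulo)."""
    return user_id % num_buckets


def batches(values, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` values."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def batched_sum(values, size=BATCH_SIZE):
    """Aggregate one batch at a time instead of one row at a time."""
    return sum(sum(batch) for batch in batches(values, size))


# Bucketing: spread ten hypothetical user IDs over four buckets.
buckets = {}
for uid in range(10):
    buckets.setdefault(bucket_for(uid, 4), []).append(uid)

# Batching: 2500 hypothetical values are handled as three batches.
amounts = list(range(2500))
batch_sizes = [len(batch) for batch in batches(amounts)]
total = batched_sum(amounts)

print(buckets)
print(batch_sizes, total)
```

Rows with the same key always land in the same bucket, which is what makes bucketed tables easier to sample and join; the batch generator shows why the last batch can be smaller than 1024 rows.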
Applications of Hive
• Data Mining
• Log Processing
• Document Indexing
• Customer-Facing Business Intelligence
• Predictive Modelling
• Hypothesis Testing
EXAMPLES
• “Airbnb connects people with places to stay and things to do around the world, with 2.9 million hosts listed, supporting 800k nightly stays. Airbnb uses Amazon EMR to run Apache Hive on an S3 data lake. Running Hive on the EMR clusters enables Airbnb analysts to perform ad hoc SQL queries on data stored in the S3 data lake. Airbnb also increased the speed of Apache Spark jobs to three times their original speed”.
• “Guardian provides 27 million members with the protection they deserve through insurance and asset management products and services. Guardian uses Amazon EMR to run Apache Hive on an S3 data lake. Apache Hive is used for batch processing. The S3 data lake fuels Guardian Direct, a digital platform that allows consumers to research and purchase both Guardian products and third-party products in the insurance industry”.
Important Points
• Hive is a Hadoop-based data warehouse and SQL-style querying tool.
• It enables users to execute ad hoc searches and analyses on big datasets without learning MapReduce programming or Pig.
• Hive supports external tables, partitioning, and data serialization formats such as Avro and Parquet.
• Hive's architecture comprises four major components: Hive User Interface, Metastore, HiveQL Process Engine, and Execution Engine.
• Hive has several benefits for big data analysis, including ease of use, scalability, flexibility, integration, and cost-effectiveness.
Thank you :)

hive architecture and hive components in detail
