What is Hive Optimization?
Bahaa Al Zubaidi shares his knowledge of Hive optimization and defines it as the
process of optimizing the structure of a table in Hive. Tables are optimized based on
their usage patterns, which means that they are reorganized to enhance performance.
The two main types of Hive optimization are:
Partitioned Tables: These tables are split into multiple sub-tables to improve query
speed and reduce the amount of data read. The large data set is divided into smaller
partitions, and each partition is stored separately. When querying for data, Hive reads
only the partitions the query needs instead of scanning all of them. This improves
performance significantly when a query filters on the partition column, because the
partitions that cannot match are skipped.
Compressed Tables: These tables are compressed with a codec such as Snappy or LZO.
For example, if you have a 100-million-row table with 1 million unique values (index
keys), Hive can compress it using Snappy or LZO. As a result, the table takes less
space, and queries can run faster because less data is read from disk.
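To make the effect of compression on repetitive data concrete, here is a minimal Python sketch. It is only an illustration: the column contents are invented, and zlib from the standard library stands in for Snappy or LZO, which trade compression ratio for speed differently.

```python
import zlib

# Hypothetical column: 100,000 rows drawn from only 1,000 distinct keys.
rows = [f"key_{i % 1000}" for i in range(100_000)]
raw = "\n".join(rows).encode("utf-8")

# zlib is a stand-in codec here; Snappy and LZO are not in the standard library.
compressed = zlib.compress(raw)

ratio = len(raw) / len(compressed)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.1f}x")

# Compression is lossless: decompressing returns the original bytes.
restored = zlib.decompress(compressed)
```

The fewer distinct values a column has relative to its row count, the more redundancy a codec can remove.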
Best Hive Optimization Techniques
Bahaa Al Zubaidi takes a closer look at each of the Hive optimization strategies we can
use to fine-tune query performance in Hive.
Tez Execution Engine in Hive
The goal of using the Tez execution engine is to improve the speed with which our Hive
queries run. Put simply, Tez is an application framework built on Hadoop YARN that
processes general-purpose data processing jobs represented as complex directed acyclic
graphs (DAGs). It can be seen as a successor to the MapReduce framework that is both
more versatile and more powerful.
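The idea of a job expressed as a directed acyclic graph can be sketched in a few lines of Python. The stage names below are hypothetical and the sketch only orders the stages; it is not how Tez itself schedules work. In Hive, the engine is usually selected with SET hive.execution.engine=tez;

```python
from graphlib import TopologicalSorter

# Hypothetical query plan: each key is a stage, each value is the set of
# stages that must finish before it can start.
dag = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "write_result": {"aggregate"},
}

def execution_order(graph):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(graph).static_order())

order = execution_order(dag)
print(order)
```

Both scans have no dependencies, so a DAG engine is free to run them in parallel and feed the join directly, instead of forcing the plan into a fixed chain of map and reduce phases.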
Usage of a Suitable File Format
Choosing a file format that suits the data improves how efficiently Hive runs, and
ORCFILE is a common choice. Query performance can improve dramatically as a result.
The ORC file format is designed for query performance. ORC stands for
“Optimized Row Columnar”. It stores data more efficiently than conventional
row-oriented file formats.
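A rough way to see why a columnar layout helps is to pivot a few rows into columns and answer a query from a single column. This is a simplified sketch with made-up data, not the ORC format itself, which also adds stripes, indexes, and compression. In Hive, a table is typically declared with STORED AS ORC.

```python
# Hypothetical rows as they would sit in a row-oriented file.
rows = [
    {"id": 1, "country": "DE", "amount": 10.0},
    {"id": 2, "country": "FR", "amount": 25.5},
    {"id": 3, "country": "DE", "amount": 7.25},
]

def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# A query like SELECT SUM(amount) reads one list and never touches id or country.
total = sum(columns["amount"])
print(total)
```

With a row layout, the same sum would have to read every field of every row.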
Hive Partitioning
Without partitioning, Hive reads all the files in a table’s directory and only then
applies the query filters. This is a time-consuming and costly process because all the
data must be read.
Filtering the data based on values in particular columns is a common scenario, and
partitioning the table on those columns lets Hive read only the matching partitions.
However, to implement partitioning in Hive, users need to understand the domain of the
data they are analyzing.
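The following Python sketch imitates partition pruning with a dictionary keyed by the partition value. The table, column names, and values are assumptions made for the example. In Hive, the partition columns are declared with the PARTITIONED BY clause.

```python
from collections import defaultdict

# Hypothetical sales rows: (sale_date, amount).
rows = [
    ("2024-01-01", 10), ("2024-01-01", 20),
    ("2024-01-02", 5),
    ("2024-01-03", 7), ("2024-01-03", 3),
]

# Partition the table: one list of rows per sale_date, like one directory per value.
partitions = defaultdict(list)
for sale_date, amount in rows:
    partitions[sale_date].append(amount)

def total_for_date(parts, sale_date):
    """Sum amounts for one date and report how many rows were read."""
    selected = parts.get(sale_date, [])
    return sum(selected), len(selected)

total, rows_read = total_for_date(partitions, "2024-01-03")
print(total, rows_read)
```

Only the rows in the requested partition are read. The other dates are never touched, which is where the saving comes from.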
Bucketing in Hive
Occasionally, the data set is massive. Partitioning on a certain field or fields does
reduce the size of the files in each partition, but a partition can still end up much
larger than is practical. Bucketing divides the data further into a fixed number of
buckets based on the hash of a column.
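Bucketing can be pictured as hashing a column value and taking the result modulo the number of buckets. The sketch below uses CRC32 as a stable stand-in hash and a made-up bucket count; Hive uses its own hash function. In Hive, buckets are declared with CLUSTERED BY (column) INTO n BUCKETS.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count

def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Map a key to a bucket with a stable hash (CRC32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets

# Hypothetical user IDs distributed across the buckets.
user_ids = [f"user_{i}" for i in range(1000)]
buckets = [[] for _ in range(NUM_BUCKETS)]
for user_id in user_ids:
    buckets[bucket_for(user_id)].append(user_id)

print([len(b) for b in buckets])
```

Because the same key always lands in the same bucket, a lookup or join on that column only needs to read the matching bucket.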
Vectorization in Hive
We employ vectorized query execution to boost the speed of operations in Hive, such as
scans, aggregations, filters, and joins. With vectorization, these operations are carried
out in batches of 1024 rows at once rather than on a single row at a time.
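Batch-at-a-time processing can be sketched as follows. The data and the filter threshold are invented for the example, and plain Python does not reproduce the CPU-level gains of real vectorized execution; the sketch only shows the batching pattern. In Hive, the feature is usually enabled with SET hive.vectorized.execution.enabled=true;

```python
BATCH_SIZE = 1024  # batch size quoted in the text

def batches(values, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` values."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def filtered_sum(values, threshold):
    """Sum the values above threshold one batch at a time and count the batches."""
    total = 0
    batch_count = 0
    for batch in batches(values):
        total += sum(v for v in batch if v > threshold)
        batch_count += 1
    return total, batch_count

values = list(range(5000))  # hypothetical column of 5,000 integers
total, batch_count = filtered_sum(values, threshold=4990)
print(total, batch_count)
```

Here 5,000 rows are handled in 5 batches rather than 5,000 single-row steps.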
Cost-Based Optimization
Before submitting a query for final execution, Hive optimizes its logical and physical
execution plan. Earlier, the cost of the query was not used to guide these
optimizations.
The cost-based optimizer (CBO), a more recent feature of Hive, performs further
optimizations based on query cost. These affect decisions such as join order, join type,
and degree of parallelism.
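As an illustration of a cost-based choice of join order, the sketch below tries every left-deep join order for three tables and keeps the cheapest one. The row counts, the fixed selectivity, and the cost formula are all assumptions made for the example, not Hive’s actual cost model. In Hive, the optimizer is controlled by SET hive.cbo.enable=true; and relies on statistics gathered with ANALYZE TABLE ... COMPUTE STATISTICS.

```python
from itertools import permutations

# Hypothetical table statistics: estimated row counts.
row_counts = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Hypothetical fraction of row pairs that survive each join.
JOIN_SELECTIVITY = 0.001

def plan_cost(order):
    """Estimate cost as the sum of intermediate result sizes for a left-deep plan."""
    current = row_counts[order[0]]
    cost = 0.0
    for table in order[1:]:
        current = current * row_counts[table] * JOIN_SELECTIVITY
        cost += current
    return cost

best = min(permutations(row_counts), key=plan_cost)
print(best, plan_cost(best))
```

Under this toy model, the cheapest plans join the two small tables first and bring in the largest table last, which keeps the intermediate results small.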
Hive Indexing
Hive indexing is another Hive optimization method. An index can boost query performance
on the indexed columns. Indexing, in essence, involves the creation of a second,
reference table that is distinct from the original table itself.
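The notion of a separate reference table can be sketched as a lookup from a column value to the positions of the matching rows. The table contents and function names are hypothetical, and the sketch shows the general idea of an index rather than Hive’s own index implementation.

```python
from collections import defaultdict

# Hypothetical base table: (id, city).
table = [(1, "Oslo"), (2, "Cairo"), (3, "Oslo"), (4, "Lima"), (5, "Cairo")]

def build_index(rows, column):
    """Build a separate lookup structure: column value -> list of row positions."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index

def lookup(rows, index, value):
    """Fetch matching rows through the index instead of scanning every row."""
    return [rows[pos] for pos in index.get(value, [])]

city_index = build_index(table, column=1)
matches = lookup(table, city_index, "Cairo")
print(matches)
```

The index costs extra storage and must be rebuilt when the table changes, which is the usual trade-off for faster lookups.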
A Hive table can have many rows and columns. Without an index, queries that filter on
certain columns may take a very long time because the whole table is scanned. Note that
built-in indexing is available only in Hive versions before 3.0, which removed the
feature. Thank you for your interest in Bahaa Al Zubaidi blogs.
Bahaa Al Zubaidi
