Confidential, Copyright © Quanticate
Introduction to
Apache Hive
Muralidharan Deenathayalan
Technical Lead
Muralidharan.deenathayalan@quanticate.com
Apache and Apache Hive project logo are trademarks of The Apache Software Foundation.
All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Agenda
• Who am I?
• What is Apache Hive?
• Apache Hive key features
• Apache Hive architecture
• How does Apache Hive work in the Apache Hadoop ecosystem?
• Where is Apache Hive useful?
• Where is Apache Hive not useful?
• Who uses Apache Hive?
• What is HQL?
• HQL demo
Who Am I?
• 7+ years of experience in Microsoft technologies such as ASP.NET, C#, SQL Server and SharePoint
• 2+ years of experience in open source technologies such as Java, Alfresco and Apache Cassandra
• Primary author of Apache Cassandra Cookbook (in progress)
• Csharpcorner MVP
• Frequent blogger
What is Apache Hive?
• Apache Hive - SQL on top of Hadoop
• A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
Apache Hive key features
• Similar to SQL
• SQL has a huge user base
• SQL is easy to code
• Rich data types (structs, lists and maps)
• Supports SQL filters, joins, GROUP BY and ORDER BY clauses
• Extensibility – custom types, custom functions, etc.
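The rich data types and SQL-style clauses listed above can be illustrated with a small sketch. The table `user_profiles`, its columns and the delimiters are hypothetical and chosen only for illustration; lists are declared as ARRAY in HiveQL.

```sql
-- Hypothetical table illustrating Hive's complex column types.
CREATE TABLE user_profiles (
  name     STRING,
  address  STRUCT<city:STRING, zip:STRING>,
  tags     ARRAY<STRING>,
  settings MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':';

-- Struct fields use dot notation, arrays use a zero-based index,
-- and maps are addressed by key.
SELECT name, address.city, tags[0], settings['theme']
FROM user_profiles
WHERE size(tags) > 0
ORDER BY name;
```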
Apache Hive architecture
Courtesy & ©: http://www.cubrid.org/blog/dev-platform/platforms-for-big-data/
How Apache Hive works in
Apache Hadoop Eco-system
Courtesy & ©: http://yourstory.com/2012/04/introduction-to-big-data-hadoop-ecosystem-part-1/
Where is Apache Hive useful?
It is well suited for batch processing, such as:
• Log processing
• Text mining
• Document indexing
• Customer-facing business intelligence
• Predictive modeling, etc.
Where is Apache Hive not useful?
Hive is not designed for:
• Online transaction processing
• Real-time queries
Who uses Apache Hive?
Apache Hive is used by:
• Bizo - Uses Hive for reporting and ad hoc queries.
• Chitika - Uses Hive for data mining and analysis on its 435M monthly global users.
• CNET - Uses Hive for data mining, internal log analysis and ad hoc queries.
• Digg - Uses Hive for data mining, internal log analysis, R&D, and reporting/analytics.
• HubSpot - Uses Hive as part of a larger Hadoop pipeline to serve near-real-time web analytics.
• Scribd - Uses Hive for machine learning, data mining, ad hoc querying, and both internal and user-facing analytics.
Courtesy & ©: https://cwiki.apache.org/confluence/display/Hive/PoweredBy
What is HQL?
HQL: Hive Query Language
• Does not conform to any ANSI SQL standard
• Very close to the MySQL dialect, but with some differences
• SQL to HQL cheat sheet: http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
• HQL does not support transactions, so do not compare Hive with an RDBMS
HQL – Create table
Syntax:
CREATE TABLE <table_name> (<column_definitions>)
[ROW FORMAT <row_format>]
[STORED AS <file_format>]
Example:
CREATE TABLE posts (user STRING, post STRING, time BIGINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
HQL – Create table Demo
HQL – Describe table
Syntax:
describe <table_name>;
Example:
describe posts;
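For more detail than the column list, HiveQL also offers extended forms of the same statement. The sketch below assumes the `posts` table from the create table example exists.

```sql
-- Column list plus table metadata (storage location, SerDe, and so on).
DESCRIBE EXTENDED posts;

-- The same metadata in a more readable tabular layout.
DESCRIBE FORMATTED posts;
```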
HQL – Describe table demo
HQL – Show all tables
Syntax:
show tables;
show tables [<filter>];
Example:
show tables;
show tables 'table*';
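A few variations on the filter may help. This is a sketch only: the table names `posts` and `myposts` come from the other slides, and the database name `default` is assumed. The pattern accepts `*` as a wildcard and `|` to separate alternatives.

```sql
-- Tables whose names start with "post".
SHOW TABLES 'post*';

-- Tables matching either of two names.
SHOW TABLES 'posts|myposts';

-- Tables in a specific database (here the built-in default database).
SHOW TABLES IN default;
```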
HQL – Show all tables demo
HQL – Alter table
Syntax:
ALTER TABLE <table_name> RENAME TO <new_table_name>;
ALTER TABLE <table_name> CHANGE <old_column_name>
<new_column_name> <new_data_type>;
Example:
-- Rename the table
Alter table posts rename to myposts;
-- Rename a column and change its data type (the table is now called myposts)
Alter table myposts change time time1 string;
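ALTER TABLE can also add columns. The sketch below assumes the renamed table `myposts` from the example above; the column `likes` is hypothetical. Adding a column changes only the table metadata, so existing rows return NULL for it.

```sql
-- Add a new column at the end of the column list.
ALTER TABLE myposts ADD COLUMNS (likes INT COMMENT 'hypothetical column');

-- Confirm the new layout.
DESCRIBE myposts;
```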
HQL – Alter table demo
HQL – How to get records into
Apache Hive tables?
There are two ways to load data into Apache Hive tables:
• Using the INSERT statement
  Loads data from another table using a SELECT statement
• Using the LOAD statement
  Loads data from a file
HQL – Insert records
Syntax:
Insert into table <tablename>
select <select_list> from <another_table>;
Example:
Insert into table posts
select 'user1', 'Demo', 123 from table1;
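A more typical use copies real columns from another table rather than literals. In this sketch the staging table `posts_staging` is hypothetical. The backticks around `user` and `time` are a precaution, since some Hive releases treat such words as reserved. Columns are matched by position, not by name.

```sql
-- Hypothetical staging table with the same layout as posts.
CREATE TABLE posts_staging (`user` STRING, post STRING, `time` BIGINT);

-- Append the selected rows to posts; existing rows are kept.
INSERT INTO TABLE posts
SELECT `user`, post, `time`
FROM posts_staging
WHERE `time` > 0;
```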
HQL – Insert records demo
HQL – Load data
Syntax:
Load data inpath '<filepath>' [overwrite] into table <tablename>;
Example:
Load data inpath '/user/hue/posts.csv' into table posts;
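Two common variations are sketched below. The local path `/tmp/posts.csv` is hypothetical. Without LOCAL, the file is moved from its HDFS location into the table's directory. With LOCAL, it is copied from the local file system. In both cases Hive does not parse or validate the file while loading it.

```sql
-- Copy a file from the local file system and append it to the table.
LOAD DATA LOCAL INPATH '/tmp/posts.csv' INTO TABLE posts;

-- Move a file already in HDFS and replace the table's current contents.
LOAD DATA INPATH '/user/hue/posts.csv' OVERWRITE INTO TABLE posts;
```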
HQL – Load data demo
HQL – Update records
Syntax:
There is no specific syntax for update, but you can use an INSERT statement
with the OVERWRITE option. Note that this replaces the entire contents of the table.
Example:
Insert overwrite table posts
select 'user1', 'Demo', 123 from table1 where id = '123';
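One way to emulate an update of selected rows is to rewrite the whole table and change values with a CASE expression. This is a sketch under assumptions: it uses the `posts` table from the create table example, and the replacement text is hypothetical. Newer Hive releases (0.14 and later) add an UPDATE statement for transactional tables; the sketch keeps to the INSERT OVERWRITE approach described here.

```sql
-- Rewrite posts in full, changing the post text only for user1.
INSERT OVERWRITE TABLE posts
SELECT `user`,
       CASE WHEN `user` = 'user1' THEN 'Edited post' ELSE post END AS post,
       `time`
FROM posts;
```

Every row is read and written again, so this suits occasional batch corrections rather than frequent single-row changes.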
HQL – Update records demo
HQL – Delete records
You cannot delete individual records from Apache Hive tables!
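A common workaround is to overwrite the table with only the rows that should remain. The sketch below assumes the `posts` table from the create table example. Newer Hive releases (0.14 and later) add a DELETE statement for transactional tables, which is outside the approach shown here.

```sql
-- Keep every row except those written by user1.
-- The IS NULL test keeps rows whose user is NULL, which the <> comparison
-- alone would silently drop.
INSERT OVERWRITE TABLE posts
SELECT `user`, post, `time`
FROM posts
WHERE `user` <> 'user1' OR `user` IS NULL;
```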
HQL – Delete records demo
HQL – Drop table
Syntax:
drop table <table_name>;
Example:
drop table posts;
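In scripts that may run more than once, the IF EXISTS form avoids an error when the table is already gone. A minimal sketch, assuming the `posts` table from the earlier examples:

```sql
-- Succeeds whether or not the table exists.
DROP TABLE IF EXISTS posts;
```

For a regular (managed) table this removes both the metadata and the underlying data files.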
HQL – Drop table demo
Summary
• What is Apache Hive?
• Apache Hive key features
• Apache Hive architecture
• How does Apache Hive work in the Apache Hadoop ecosystem?
• Where is Apache Hive useful?
• Where is Apache Hive not useful?
• Who uses Apache Hive?
• Getting started with HQL
Q & A
For the next session!
• Partitioning
• Bucketing
• Union
• Subqueries
• Joins
• Group By
• Order By
• Aggregations
References
https://hive.apache.org/
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
https://cwiki.apache.org/confluence/display/Hive/Home
https://cwiki.apache.org/confluence/display/Hive/PoweredBy
http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
Coding-Freaks.Net
www.codingfreaks.net
Quanticate OPDev Twitter
https://twitter.com/quanticateopdev
Twitter
www.Twitter.com/muralidharand