Introduction to HiveQL
BY KRISTIN FERRIER
About Me – Kristin Ferrier
• 15+ years in IT (software development and BI development)
• 10+ years experience with SQL Server and 5+ years experience with Oracle
• Co-founder of OKCSQL
• Currently Sr. Data Analyst at an energy company
• Social Media
  • Twitter: @SQLenergy
  • Blog: http://www.kristinferrier.com
Agenda
• Hadoop – Very High Level
• Hive and HiveQL – High Level
• Getting started with Hive and HiveQL
• HiveQL examples
• Resources for getting started with HiveQL
Hadoop
• Open source software
• Popular for storing, processing, and analyzing large volumes of data
  • For example, web logs or sensor data
• Main distributions
  • Cloudera
  • Hortonworks
  • MapR (has some proprietary components)
Hadoop 2.0 Main Components
• Hadoop Distributed File System (HDFS)
  • Handles the data storage
• MapReduce
  • Handles the processing
  • Works with key-value pairs
  • Often written in Java
  • Can be written in any scripting language using the Streaming API of Hadoop
Example MapReduce Code
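// Imports needed at the top of the enclosing source file:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Word-count mapper; declared as a nested class inside the job's driver class.
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    // Emits (word, 1) for every whitespace-separated token in the input line.
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}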
Code from Hortonworks tutorial found at http://hortonworks.com/hadoop-tutorial/introducing-apache-hadoop-developers/
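The mapper above only shows the first half of a job. As a rough sketch of what happens to its (word, 1) pairs afterwards, the following plain-Java class imitates the map, shuffle, and reduce phases in memory. It uses no Hadoop classes, and the class and method names (WordCountSketch, shuffle, wordCount) are made up for illustration, not part of the Hortonworks tutorial.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory imitation of a word-count job (illustrative only, no Hadoop involved).
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: collect all values that share a key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    // Runs the three phases over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hive on hadoop", "hadoop and hive")));
    }
}
```

In a real cluster the shuffle is done by the framework between mapper and reducer tasks; here it is just a sorted map so the flow of key-value pairs is easy to follow.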
Getting Started with Hadoop
• What if I don’t know Java?
  • Or one of the scripting languages using the Streaming API of Hadoop
    • Example: Python
• That’s OK. If you know SQL, then Hive and HiveQL may be a great starting point for your Hadoop learning
Hive
Hive essentially allows us to use tables within Hadoop
• Built on top of Apache Hadoop
• Can access files stored in HDFS or HBase
• HCatalog allows you to apply table structures to the data
• HiveQL to query the data
HiveQL
HiveQL is a SQL-like language for querying data from Hive
• Follows some of the ANSI SQL-92 standard
• Offers its own extensions
• Implicitly turned into MapReduce jobs
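To make "implicitly turned into MapReduce jobs" more concrete, consider a query such as SELECT station, MIN(temp), MAX(temp) FROM weather WHERE temp IS NOT NULL GROUP BY station. The column names station and temp are assumptions; the deck only shows that a WEATHER table exists. The plain-Java sketch below shows roughly the shape such a job could take: the WHERE filter runs on the map side, the GROUP BY column becomes the shuffle key, and the aggregates are computed per key on the reduce side. It is a simplified picture, not the plan Hive actually generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a GROUP BY with MIN/MAX expressed as map and reduce steps.
class GroupByAsMapReduce {

    // One row of the hypothetical weather table; temp may be null.
    static final class Reading {
        final String station;
        final Integer temp;

        Reading(String station, Integer temp) {
            this.station = station;
            this.temp = temp;
        }
    }

    // Map + shuffle: apply the WHERE filter, then group temps by the GROUP BY key.
    static Map<String, List<Integer>> mapAndShuffle(List<Reading> rows) {
        Map<String, List<Integer>> byStation = new TreeMap<>();
        for (Reading row : rows) {
            if (row.temp == null) {
                continue; // WHERE temp IS NOT NULL
            }
            byStation.computeIfAbsent(row.station, k -> new ArrayList<>()).add(row.temp);
        }
        return byStation;
    }

    // Reduce: one pass per key computes the aggregates, returned as {min, max}.
    static Map<String, int[]> reduce(Map<String, List<Integer>> byStation) {
        Map<String, int[]> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : byStation.entrySet()) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int temp : entry.getValue()) {
                min = Math.min(min, temp);
                max = Math.max(max, temp);
            }
            result.put(entry.getKey(), new int[] {min, max});
        }
        return result;
    }

    static Map<String, int[]> minMaxByStation(List<Reading> rows) {
        return reduce(mapAndShuffle(rows));
    }

    public static void main(String[] args) {
        List<Reading> rows = Arrays.asList(
                new Reading("OKC", 70), new Reading("OKC", 95),
                new Reading("TUL", 60), new Reading("TUL", null), new Reading("TUL", 88));
        for (Map.Entry<String, int[]> entry : minMaxByStation(rows).entrySet()) {
            System.out.println(entry.getKey() + " " + Arrays.toString(entry.getValue()));
        }
    }
}
```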
HiveQL – Key SQL items it has
• SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• JOIN (some kinds)
HiveQL – Key differences from SQL
• No transactions in the traditional RDBMS sense (no BEGIN, COMMIT, or ROLLBACK)
• No materialized views
• Update and delete available only with Hive 0.14 and later, on tables that support ACID
  • Hive 0.14 was released November 2014
Accessing Hive
• Hue
  • Web interface for Hadoop
• Beeswax
  • Hive UI within Hue
Hue
Beeswax
Getting Data into Hive Tables
• One way is to import a file into Hive
  • Can create the table at this time
  • Can import the data at this time
  • File can even come from a Windows box
Importing a file
• Beeswax → Tables → Create a new table from a file
• Enter Table Name and Description → .. button
• Upload a file → Select your Windows file → Open
• After the file uploads, double-click your file
• Choose a Delimiter
• Select column data types → Create Table
• Table has been created
Query Editor
• Write queries in the Query Editor
Select
SELECT * FROM WEATHER
Where, Group By, Min/Max
Where, Group By, Min/Max - Results
Aliasing, Ordering
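• Standard SQL syntax for aliasing
• Both ORDER BY and SORT BY are available for ordering
  • ORDER BY gives a total ordering of the result, which means all rows pass through a single reducer
  • SORT BY orders rows within each reducer only, so it scales better but the overall output is not guaranteed to be totally ordered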
Aliasing, Ordering - Results
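Hive accepts both ORDER BY and SORT BY, and the two behave differently once more than one reducer is involved. The plain-Java sketch below imitates that difference: SORT BY sorts each reducer's share of the rows independently, while ORDER BY sorts everything in one place. Dealing rows out round-robin is an assumption made purely to keep the example deterministic; it is not how Hive distributes rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: per-reducer ordering (SORT BY) versus total ordering (ORDER BY).
class SortByVersusOrderBy {

    // SORT BY: each reducer sorts only the rows it received.
    static List<List<Integer>> sortBy(List<Integer> rows, int reducers) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            partitions.add(new ArrayList<>());
        }
        // Round-robin distribution, chosen here only for illustration.
        for (int i = 0; i < rows.size(); i++) {
            partitions.get(i % reducers).add(rows.get(i));
        }
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    // ORDER BY: all rows go through one reducer, giving a total order.
    static List<Integer> orderBy(List<Integer> rows) {
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(5, 1, 4, 2, 3);
        System.out.println(sortBy(rows, 2)); // [[3, 4, 5], [1, 2]]
        System.out.println(orderBy(rows));   // [1, 2, 3, 4, 5]
    }
}
```

Reading the two SORT BY partitions back to back gives 3, 4, 5, 1, 2: each piece is ordered, the whole is not.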
Joins
• INNER, LEFT, RIGHT, and FULL OUTER
• Equi joins only: (table1.key = table2.key) is allowed but not (table1.key <> table2.key)
• Extensions exist, like LEFT SEMI JOIN
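One way to see why equality conditions fit MapReduce well: rows from both tables can be routed by join key, so all candidates for a match meet at the same reducer. The plain-Java sketch below imitates such a reduce-side join on in-memory rows of the form {key, value}. It is a simplified picture under that assumption, not Hive's actual join operator, and the class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an inner equi-join done by grouping both inputs on the join key.
class EquiJoinSketch {

    // Shuffle step for one table: group row values by join key.
    static Map<String, List<String>> groupByKey(List<String[]> rows) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return grouped;
    }

    // Reduce step: within each key, pair every left value with every right value.
    static List<String> innerJoin(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> leftByKey = groupByKey(left);
        Map<String, List<String>> rightByKey = groupByKey(right);
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : leftByKey.entrySet()) {
            List<String> matches = rightByKey.get(entry.getKey());
            if (matches == null) {
                continue; // no partner on the right side, so an inner join drops the key
            }
            for (String leftValue : entry.getValue()) {
                for (String rightValue : matches) {
                    joined.add(entry.getKey() + ":" + leftValue + "," + rightValue);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(
                new String[] {"1", "a"}, new String[] {"2", "b"}, new String[] {"3", "c"});
        List<String[]> right = Arrays.asList(
                new String[] {"1", "x"}, new String[] {"1", "y"}, new String[] {"3", "z"});
        System.out.println(innerJoin(left, right)); // [1:a,x, 1:a,y, 3:c,z]
    }
}
```

A condition like table1.key <> table2.key gives no single key to route rows by, which is one intuition for why it does not fit this pattern.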
INNER JOIN
INNER JOIN - Results
LEFT SEMI JOIN
• Left semi joins are less necessary starting with Hive 0.13
• As of Hive 0.13, the IN/NOT IN/EXISTS/NOT EXISTS operators are supported using subqueries
SELECT a.key, a.value
FROM a
WHERE a.key IN
  (SELECT b.key
   FROM b);
can be rewritten to
SELECT a.key, a.value
FROM a LEFT SEMI JOIN b ON (a.key = b.key);
Example from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
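The difference between a left semi join and an inner join is easy to miss: a semi join returns each row of a at most once, however many rows of b match, which is why it can stand in for an IN subquery. The plain-Java sketch below illustrates that on in-memory rows of the form {key, value}; table b is reduced to its key column because a semi join only uses b to test for a match. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: semi-join semantics compared with inner-join semantics.
class LeftSemiJoinSketch {

    // LEFT SEMI JOIN / IN (subquery): keep a row of a if its key occurs in b.
    // Each a row is kept at most once, no matter how many b rows match.
    static List<String[]> leftSemiJoin(List<String[]> a, List<String> bKeys) {
        Set<String> keys = new HashSet<>(bKeys);
        List<String[]> kept = new ArrayList<>();
        for (String[] row : a) {
            if (keys.contains(row[0])) {
                kept.add(row);
            }
        }
        return kept;
    }

    // INNER JOIN for contrast: one output row per matching (a, b) pair.
    static List<String[]> innerJoin(List<String[]> a, List<String> bKeys) {
        List<String[]> joined = new ArrayList<>();
        for (String[] row : a) {
            for (String bKey : bKeys) {
                if (row[0].equals(bKey)) {
                    joined.add(row);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.asList(new String[] {"1", "x"}, new String[] {"2", "y"});
        List<String> bKeys = Arrays.asList("1", "1", "3");
        System.out.println(leftSemiJoin(a, bKeys).size()); // 1
        System.out.println(innerJoin(a, bKeys).size());    // 2
    }
}
```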
Performance
• Queries can take minutes to run. The focus is on analysis of large data sets.
• Relational databases are still a strong solution for providing the faster performance of CRUD (create, read, update, and delete) operations required by OLTP systems.
Summary
• Hive essentially allows us to use tables in Hadoop
• We can query them using HiveQL, which is similar to SQL
• Knowing how to write MapReduce code is not required, as the HiveQL will be turned into MapReduce for us
Getting Started Yourself
• Hortonworks Sandbox
  • Portable Hadoop environment with tutorials
  • Even though the sandbox runs Hadoop on Linux, you can run the sandbox on your Windows machine and access it via a web browser
  • Available at http://hortonworks.com/sandbox
Getting Started Yourself
• Hive DML Reference
  • https://cwiki.apache.org/confluence/display/hive/languageManual+dml
• Apache’s Hive Language Manual
  • https://cwiki.apache.org/confluence/display/Hive/LanguageManual
• Treasure’s HiveQL Reference
  • http://docs.treasuredata.com/articles/hive
• Network World – Comparing the top Hadoop Distros
  • http://www.networkworld.com/article/2369327/software/comparing-the-top-hadoop-distributions.html
Contact Info
• Social Media
  • Twitter: @SQLenergy
  • Blog: http://www.kristinferrier.com
