SlideShare a Scribd company logo
1 of 37
Introduction to
HiveQL
BY KRISTIN FERRIER
About Me – Kristin Ferrier
 15+ Years in IT (Software development and BI development)
 10+ years experience with SQL Server and 5+ years experience with
Oracle
 Co-founder OKCSQL
 Currently Sr. Data Analyst at an energy company
 Social Media
 Twitter: @SQLenergy
 Blog: http://www.kristinferrier.com
Agenda
 Hadoop – Very High Level
 Hive and HiveQL - High Level
 Getting started with Hive and HiveQL
 HiveQL examples
 Resources for getting started with HiveQL
Hadoop
 Open source software
 Popular for storing, processing, and analyzing large volumes of data
 For example, web logs or sensor data
 Main distributions
 Cloudera
 Hortonworks
 MapR (has some proprietary components)
Hadoop 2.0 Main Components
 Hadoop Distributed File System (HDFS)
 Handles the data storage
 MapReduce
 Handles the processing
 Works with key value pairs
 Often written in Java
 Can be written in any scripting language using the Streaming API of
Hadoop
Example MapReduce Code
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
Code from Hortonworks tutorial found at http://hortonworks.com/hadoop-tutorial/introducing-apache-hadoop-developers/
Getting Started with Hadoop
 What if I don’t know Java?
 Or one of the Scripting languages using the Streaming API of Hadoop
 Example: Python
 That’s OK. If you know SQL, then Hive and HiveQL may be a great
starting point for your Hadoop learning
Hive
Hive essentially allows us to use tables
within Hadoop
 Built on top of Apache Hadoop
 Can access files stored in HDFS or HBase
 HCatalog allows you to apply table structures to the data
 HiveQL to query the data
HiveQL
HiveQL is SQL-like language for
querying data from Hive
 Follows some of the ANSI SQL-92 standard
 Offers its own extensions
 Implicitly turned into MapReduce jobs
HiveQL – Key SQL items it has
 SELECT
 FROM
 WHERE
 GROUP BY
 HAVING
 JOINS – Some kinds
HiveQL – Key differences from SQL
 No transactions
 No materialized views
 Update and delete available only with Hive 0.14 and later
 Hive 0.14 was released November 2014
Accessing Hive
 Hue
 Web interface for Hadoop
 Beeswax
 Hive UI within Hue
Hue
Beeswax
Getting Data into Hive Tables
 One way is to import a file into Hive
 Can create the table at this time
 Can import the data at this time
 File can even come from a Windows box
Importing a file
Beeswax  Tables  Create a new table from a file
Importing a file cont.
Enter Table Name and Description  .. button
Importing a file cont.
Upload a file  Select your Windows file
 Open
Importing a file cont.
After file uploads, double-click your file
Importing a file cont.
Choose a Delimiter
Importing a file cont.
Select column data types  Create Table
Importing a file cont.
Table has been created
Query Editor
 Write queries in the Query Editor
Select
SELECT * FROM WEATHER
Where, Group By, Min/Max
Where, Group By, Min/Max - Results
Aliasing, Ordering
 Standard SQL syntax for Aliasing
 SORT BY instead of ORDER BY– For ordering
Aliasing, Ordering - Results
Joins
 INNER, LEFT, RIGHT, and FULL OUTER
 Equi Joins only: (table1.key = table2.key) is allowed but not (table1.key
<> table2.key)
 Extensions exist like LEFT SEMI JOIN
INNER JOIN
INNER JOIN - Results
LEFT SEMI JOIN
 Left Semi Joins are less necessary
starting with Hive 0.13
 As of Hive 0.13 the IN/NOT
IN/EXISTS/NOT EXISTS operators are
supported using subqueries
SELECT a.key, a.value
FROM a
WHERE a.key in
(SELECT b.key
FROM B);
can be rewritten to
SELECT a.key, a.val
FROM a LEFT SEMI JOIN b ON (a.key = b.key)
Example from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Performance
 Queries can take minutes to run. Focus is on analysis of large data
sets.
 Relational databases are still a strong solution for providing the faster
performance of CRUD (create, read, update, and delete)
operations required by OLTP systems.
Summary
 Hive essentially allows us to use tables in Hadoop
 We can query them using HiveQL, which is similar to SQL
 Knowing how to write MapReduce code is not required, as the
HiveQL will be turned into MapReduce for us
Getting Started Yourself
 Hortonworks Sandbox
 Portable Hadoop environment with tutorials
 Even though the sandbox runs Hadoop on Linux, you can run the sandbox
on your Windows machine and access it via a web browser
 Available at http://hortonworks.com/sandbox
Getting Started Yourself
 Hive DML Reference
 https://cwiki.apache.org/confluence/display/hive/languageManual+dml
 Apache’s Hive Language Manual
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual
 Treasure’s HiveQL Reference
 http://docs.treasuredata.com/articles/hive
 Network World – Comparing the top Hadoop Distros
 http://www.networkworld.com/article/2369327/software/comparing-the-
top-hadoop-distributions.html
Contact Info
 Social Media
 Twitter: @SQLenergy
 Blog: http://www.kristinferrier.com

More Related Content

What's hot

Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPTAnand Pandey
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Simplilearn
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveAvkash Chauhan
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examplesAndrea Iacono
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitEric Wendelin
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentationArvind Kumar
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 

What's hot (20)

Apache hive
Apache hiveApache hive
Apache hive
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPT
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Hadoop vs Apache Spark
Hadoop vs Apache SparkHadoop vs Apache Spark
Hadoop vs Apache Spark
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Hive Hadoop
Hive HadoopHive Hadoop
Hive Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnit
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 

Viewers also liked

Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureEduardo Castro
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latinknowbigdata
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questionsKalyan Hadoop
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comknowbigdata
 
Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]knowbigdata
 
Carnatic Music Notations: Alankara
Carnatic Music Notations: AlankaraCarnatic Music Notations: Alankara
Carnatic Music Notations: AlankaraMeera Raghu
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperknowbigdata
 
Guide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music NotationsGuide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music NotationsMeera Raghu
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Orienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshotsOrienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshotsKalyan Hadoop
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsAsad Masood Qazi
 
An introduction to the Recorder
An introduction to the Recorder An introduction to the Recorder
An introduction to the Recorder Sandra Morgan
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialDaniel Abadi
 
Differences between OpenStack and AWS
Differences between OpenStack and AWSDifferences between OpenStack and AWS
Differences between OpenStack and AWSEdureka!
 
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | EdurekaMapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | EdurekaEdureka!
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka Edureka!
 

Viewers also liked (20)

Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage Azure
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latin
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questions
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
 
Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]
 
Carnatic Music Notations: Alankara
Carnatic Music Notations: AlankaraCarnatic Music Notations: Alankara
Carnatic Music Notations: Alankara
 
Jathiswara
JathiswaraJathiswara
Jathiswara
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Guide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music NotationsGuide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music Notations
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Orienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshotsOrienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshots
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questions
 
Recorder lesson
Recorder lessonRecorder lesson
Recorder lesson
 
An introduction to the Recorder
An introduction to the Recorder An introduction to the Recorder
An introduction to the Recorder
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
 
Differences between OpenStack and AWS
Differences between OpenStack and AWSDifferences between OpenStack and AWS
Differences between OpenStack and AWS
 
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | EdurekaMapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
 

Similar to Introduction to HiveQL

ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON Padma shree. T
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive AnalyticsManish Chopra
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive Rupak Roy
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop ToolsXplenty
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Yahoo Developer Network
 
Intro to Hybrid Data Warehouse
Intro to Hybrid Data WarehouseIntro to Hybrid Data Warehouse
Intro to Hybrid Data WarehouseJonathan Bloom
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010BOSC 2010
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big dealeduarderwee
 

Similar to Introduction to HiveQL (20)

ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Apache Hive - Introduction
Apache Hive - IntroductionApache Hive - Introduction
Apache Hive - Introduction
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010
 
Intro to Hybrid Data Warehouse
Intro to Hybrid Data WarehouseIntro to Hybrid Data Warehouse
Intro to Hybrid Data Warehouse
 
Hive_Pig.pptx
Hive_Pig.pptxHive_Pig.pptx
Hive_Pig.pptx
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
 
Apache hive1
Apache hive1Apache hive1
Apache hive1
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big deal
 

More from kristinferrier

So MANY databases, which one do I pick?
So MANY databases, which one do I pick?So MANY databases, which one do I pick?
So MANY databases, which one do I pick?kristinferrier
 
Intro to Firebase Realtime Database and Authentication
Intro to Firebase Realtime Database and AuthenticationIntro to Firebase Realtime Database and Authentication
Intro to Firebase Realtime Database and Authenticationkristinferrier
 
Demystifying JSON in SQL Server
Demystifying JSON in SQL ServerDemystifying JSON in SQL Server
Demystifying JSON in SQL Serverkristinferrier
 
3D Geospatial Visualization Using Power Map
3D Geospatial Visualization Using Power Map3D Geospatial Visualization Using Power Map
3D Geospatial Visualization Using Power Mapkristinferrier
 

More from kristinferrier (6)

So MANY databases, which one do I pick?
So MANY databases, which one do I pick?So MANY databases, which one do I pick?
So MANY databases, which one do I pick?
 
Intro to Firebase Realtime Database and Authentication
Intro to Firebase Realtime Database and AuthenticationIntro to Firebase Realtime Database and Authentication
Intro to Firebase Realtime Database and Authentication
 
Demystifying JSON in SQL Server
Demystifying JSON in SQL ServerDemystifying JSON in SQL Server
Demystifying JSON in SQL Server
 
SQL to JSON
SQL to JSONSQL to JSON
SQL to JSON
 
T-SQL Treats
T-SQL TreatsT-SQL Treats
T-SQL Treats
 
3D Geospatial Visualization Using Power Map
3D Geospatial Visualization Using Power Map3D Geospatial Visualization Using Power Map
3D Geospatial Visualization Using Power Map
 

Recently uploaded

Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Hiroshi SHIBATA
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch TuesdayIvanti
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024Stephen Perrenod
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTopCSSGallery
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?Paolo Missier
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfalexjohnson7307
 

Recently uploaded (20)

Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 

Introduction to HiveQL

  • 2. About Me – Kristin Ferrier  15+ Years in IT (Software development and BI development)  10+ years experience with SQL Server and 5+ years experience with Oracle  Co-founder OKCSQL  Currently Sr. Data Analyst at an energy company  Social Media  Twitter: @SQLenergy  Blog: http://www.kristinferrier.com
  • 3. Agenda  Hadoop – Very High Level  Hive and HiveQL - High Level  Getting started with Hive and HiveQL  HiveQL examples  Resources for getting started with HiveQL
  • 4. Hadoop  Open source software  Popular for storing, processing, and analyzing large volumes of data  For example, web logs or sensor data  Main distributions  Cloudera  Hortonworks  MapR (has some proprietary components)
  • 5. Hadoop 2.0 Main Components  Hadoop Distributed File System (HDFS)  Handles the data storage  MapReduce  Handles the processing  Works with key value pairs  Often written in Java  Can be written in any scripting language using the Streaming API of Hadoop
  • 6. Example MapReduce Code public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } Code from Hortonworks tutorial found at http://hortonworks.com/hadoop-tutorial/introducing-apache-hadoop-developers/
  • 7. Getting Started with Hadoop  What if I don’t know Java?  Or one of the Scripting languages using the Streaming API of Hadoop  Example: Python  That’s OK. If you know SQL, then Hive and HiveQL may be a great starting point for your Hadoop learning
  • 8. Hive Hive essentially allows us to use tables within Hadoop  Built on top of Apache Hadoop  Can access files stored in HDFS or HBase  HCatalog allows you to apply table structures to the data  HiveQL to query the data
  • 9. HiveQL HiveQL is SQL-like language for querying data from Hive  Follows some of the ANSI SQL-92 standard  Offers its own extensions  Implicitly turned into MapReduce jobs
  • 10. HiveQL – Key SQL items it has  SELECT  FROM  WHERE  GROUP BY  HAVING  JOINS – Some kinds
  • 11. HiveQL – Key differences from SQL  No transactions  No materialized views  Update and delete available only with Hive 0.14 and later  Hive 0.14 was released November 2014
  • 12. Accessing Hive  Hue  Web interface for Hadoop  Beeswax  Hive UI within Hue
  • 13. Hue
  • 15. Getting Data into Hive Tables  One way is to import a file into Hive  Can create the table at this time  Can import the data at this time  File can even come from a Windows box
  • 16. Importing a file Beeswax  Tables  Create a new table from a file
  • 17. Importing a file cont. Enter Table Name and Description  .. button
  • 18. Importing a file cont. Upload a file  Select your Windows file  Open
  • 19. Importing a file cont. After file uploads, double-click your file
  • 20. Importing a file cont. Choose a Delimiter
  • 21. Importing a file cont. Select column data types  Create Table
  • 22. Importing a file cont. Table has been created
  • 23. Query Editor  Write queries in the Query Editor
  • 25. Where, Group By, Min/Max
  • 26. Where, Group By, Min/Max - Results
  • 27. Aliasing, Ordering  Standard SQL syntax for Aliasing  SORT BY instead of ORDER BY– For ordering
  • 29. Joins  INNER, LEFT, RIGHT, and FULL OUTER  Equi Joins only: (table1.key = table2.key) is allowed but not (table1.key <> table2.key)  Extensions exist like LEFT SEMI JOIN
  • 31. INNER JOIN - Results
  • 32. LEFT SEMI JOIN  Left Semi Joins are less necessary starting with Hive 0.13  As of Hive 0.13 the IN/NOT IN/EXISTS/NOT EXISTS operators are supported using subqueries SELECT a.key, a.value FROM a WHERE a.key in (SELECT b.key FROM B); can be rewritten to SELECT a.key, a.val FROM a LEFT SEMI JOIN b ON (a.key = b.key) Example from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
  • 33. Performance  Queries can take minutes to run. Focus is on analysis of large data sets.  Relational databases are still a strong solution for providing the faster performance of CRUD (create, read, update, and delete) operations required by OLTP systems.
  • 34. Summary  Hive essentially allows us to use tables in Hadoop  We can query them using HiveQL, which is similar to SQL  Knowing how to write MapReduce code is not required, as the HiveQL will be turned into MapReduce for us
  • 35. Getting Started Yourself  Hortonworks Sandbox  Portable Hadoop environment with tutorials  Even though the sandbox runs Hadoop on Linux, you can run the sandbox on your Windows machine and access it via a web browser  Available at http://hortonworks.com/sandbox
  • 36. Getting Started Yourself  Hive DML Reference  https://cwiki.apache.org/confluence/display/hive/languageManual+dml  Apache’s Hive Language Manual  https://cwiki.apache.org/confluence/display/Hive/LanguageManual  Treasure’s HiveQL Reference  http://docs.treasuredata.com/articles/hive  Network World – Comparing the top Hadoop Distros  http://www.networkworld.com/article/2369327/software/comparing-the- top-hadoop-distributions.html
  • 37. Contact Info  Social Media  Twitter: @SQLenergy  Blog: http://www.kristinferrier.com