Slide 1 © 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Big Data Analytics using Hive
Slide 2
Scope of PPT – BIG Data Analytics via Hive
ᗍ Introduction to Big Data and Hadoop
ᗍ Understanding Hive and its Concepts
ᗍ Hive Architecture, Hive Meta Store and Hive Use-Cases
ᗍ BIG Data Analytics via Hive
ᗍ BIG Data & Hadoop Job Trends
ᗍ Webinar Session by Skillspeed
Get Started with BIG Data & Hadoop
Slide 3
Big Data and its Challenges
Slide 4
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that they become difficult to process with on-hand database management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, from terabytes up to petabytes of information.
Data at this scale is very difficult to manage.
Slide 5
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing unstructured, voluminous data is a major problem.
Slide 6
Hadoop can be used to process and analyze large data sets.
Before going further, let's understand what Hadoop is.
Slide 7
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management technology with scale-out storage and distributed processing.
Hadoop Characteristics
ᗍ Flexible
ᗍ Reliable
ᗍ Economical
ᗍ Scalable
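The "simple programming model" on this slide is commonly understood to be MapReduce, which also appears in the ecosystem diagram. The following is a minimal single-process sketch of the idea in Python, not Hadoop's actual Java API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are assumptions made for illustration, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory input."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input standing in for files stored on HDFS.
counts = word_count(["big data with hive", "hive on hadoop", "big clusters"])
print(counts)
```

The programmer only writes the mapper and the reducer; grouping by key and distributing the work is the framework's job, which is what makes the model simple to use at scale.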
Slide 8
Hadoop Ecosystem
ᗍ Flume and Sqoop: import or export of data (Flume for unstructured or semi-structured data, Sqoop for structured data)
ᗍ Apache Oozie: workflow
ᗍ HDFS (Hadoop Distributed File System)
ᗍ Pig Latin: data analysis
ᗍ Hive: DW system
ᗍ MapReduce Framework
ᗍ HBase
ᗍ Other YARN frameworks (MPI, Giraph)
ᗍ YARN: cluster resource management
Slide 9
Hive Origination
ᗍ Hive originated as an internal project at Facebook
ᗍ It was later adopted by Apache as an open-source project
ᗍ Facebook deals with massive amounts of data (petabyte scale) and needs to run more than 75k ad-hoc queries against it
ᗍ Because the data is collected from multiple servers and is diverse in nature, no RDBMS was a suitable solution
ᗍ MapReduce was a natural choice, but it had its own limitations
Slide 10
What is Hive?
ᗍ It is a query engine wrapper built on top of MapReduce
ᗍ It is treated as the data warehousing tool of the Hadoop ecosystem
ᗍ It is used for data analysis
ᗍ It is primarily targeted at users with an SQL background
ᗍ It provides HiveQL, which is very similar to SQL
ᗍ It is used for managing and querying structured data
ᗍ Hadoop's complexity is hidden from end users
ᗍ Java and Hadoop API knowledge is optional for core users
ᗍ It was developed by Facebook and contributed to the community
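To make the "very similar to SQL" point concrete, here is a small sketch that runs a GROUP BY query using Python's built-in sqlite3 module as a stand-in for Hive. This is an assumption-laden illustration: SQLite is not Hive, the page_views table and its rows are invented, and the SELECT statement is deliberately written in a basic form that HiveQL also accepts. On a Hive installation the same statement would be compiled into MapReduce jobs instead of being executed in-process.

```python
import sqlite3

# A basic aggregate query; this form of SELECT ... GROUP BY is also valid HiveQL.
QUERY = """
SELECT page, COUNT(*) AS hits
FROM page_views
GROUP BY page
"""


def run_demo():
    """Load a few hypothetical rows into an in-memory table and aggregate them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (user_id INTEGER, page TEXT)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [(1, "home"), (2, "home"), (1, "search"), (3, "home"), (2, "cart")],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # Sort in Python so the result order does not depend on the engine.
    return sorted(rows)


rows = run_demo()
print(rows)
```

The user states what result is wanted; how the grouping and counting are distributed over the cluster stays hidden, which is the sense in which Hadoop's complexity is hidden from end users.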
Slide 11
Hive Use Cases
ᗍ Ad-hoc analysis of underlying data
ᗍ Hypothesis testing of the underlying data
ᗍ Big Data testing of huge data sets
ᗍ Analysis of the processed data
Slide 12
Hive Components
ᗍ Driver
ᗍ Shell
ᗍ Metastore
ᗍ Compiler
ᗍ Execution Engine
Slide 13
Hive Architecture
ᗍ Interfaces: JDBC/ODBC; Browse, Query, DDL
ᗍ Metastore; Thrift API
ᗍ HiveQL: Parser, Planner, Optimizer, Execution
ᗍ User-defined MapReduce scripts
ᗍ File formats: TextFile, SequenceFile, RCFile
ᗍ UDF/UDAF: Substr, Sum, Average
ᗍ SerDe: CSV, Thrift, Regex
ᗍ Underlying platform: MapReduce, HDFS
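A SerDe (serializer/deserializer) is the piece that turns raw stored records into rows with named columns, and the Regex entry in the diagram refers to doing this with a regular expression. The sketch below shows the deserializing half of that idea in plain Python. It is not Hive's SerDe interface: the log format, the column names and the choice to return None values for a non-matching line are all assumptions made for this illustration.

```python
import re

# Hypothetical log format: "<ip> <user> <status>", one record per line.
LOG_PATTERN = re.compile(r"(\S+) (\S+) (\d{3})")

# Hypothetical column names, one per capture group in LOG_PATTERN.
COLUMNS = ("ip", "user_name", "status")


def deserialize(line):
    """Turn one raw text line into a row (a dict of column name -> value).

    A line that does not match the pattern becomes a row of None values;
    that behaviour is a choice made for this sketch.
    """
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return dict.fromkeys(COLUMNS)
    return dict(zip(COLUMNS, match.groups()))


good_row = deserialize("10.0.0.1 alice 200")
bad_row = deserialize("not a log line at all")
print(good_row)
print(bad_row)
```

Because the mapping from text to columns lives in the deserializer, the same query layer can sit on top of files in very different formats.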
Slide 14
Hive Meta Store
ᗍ Embedded Metastore: the Driver and the Metastore run inside the Hive Service JVM, backed by Derby
ᗍ Local Metastore: each Hive Service JVM runs its own Driver and Metastore, and the Metastores connect to a MySQL database
ᗍ Remote Metastore: the Drivers in the Hive Service JVMs connect to separate Metastore Server JVMs, which connect to MySQL
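The three metastore modes differ mainly in configuration. The following sketch classifies a configuration dictionary into one of the three modes. It rests on stated assumptions: that a non-empty hive.metastore.uris value selects a remote metastore, that javax.jdo.option.ConnectionURL holds the JDBC URL of the backing database, and that an unset URL falls back to an embedded Derby database. The function name metastore_mode and the host names are hypothetical, and real deployments should be checked against the Hive administration documentation.

```python
# Assumed fallback JDBC URL when nothing is configured (embedded Derby).
DEFAULT_JDBC_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def metastore_mode(conf):
    """Classify a Hive configuration dict as embedded, local or remote.

    This is a simplification for illustration, not Hive's own logic.
    """
    # A configured Thrift URI means the metastore runs as its own server.
    if conf.get("hive.metastore.uris", "").strip():
        return "remote"
    jdbc_url = conf.get("javax.jdo.option.ConnectionURL", DEFAULT_JDBC_URL)
    # An in-process Derby URL (no "//host" part) means an embedded database.
    if jdbc_url.startswith("jdbc:derby:") and not jdbc_url.startswith("jdbc:derby://"):
        return "embedded"
    # Otherwise the metastore runs in the Hive JVM against an external database.
    return "local"


embedded = metastore_mode({})
local = metastore_mode(
    {"javax.jdo.option.ConnectionURL": "jdbc:mysql://db.example.com/metastore"}
)
remote = metastore_mode({"hive.metastore.uris": "thrift://meta.example.com:9083"})
print(embedded, local, remote)
```

The embedded mode suits single-user experiments, while the local and remote modes let several Hive sessions share one metadata database.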
Slide 15
Job Trends – Hadoop
Slide 16
Why SkillSpeed?
ᗍ Course curriculum from industry experts
ᗍ Instructor-led live virtual sessions
ᗍ Lifetime access to course content via LMS
ᗍ 100% placement assistance
ᗍ 24x7 support
Slide 17
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to Map Reduce
Module 4: Advanced Map Reduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
Slide 18
Corporate Partners
Slide 19
Contact Us
Lines open 24/7
To know more about the course, please contact:
IND +91-90660-20904, USA 1866-607-6547 (Toll Free)
Or reach us at sales@skillspeed.com
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture

Editor's Notes

  • #17 SkillSpeed offers virtual instructor-led courses designed to bridge the time-to-competency gap experienced by technology companies. SkillSpeed's USP is the subject matter expert (SME): SMEs are industry experts with a good understanding of the technology and hands-on industry experience with it. The industry expert designs, develops, and delivers the course. SkillSpeed provides: course curriculum from industry experts; instructor-led live virtual sessions; real-life industry case studies; live virtual interaction with industry experts; lifetime access to all course content via the LMS; 24x7 support; 100% placement assistance.