BIG DATA ANALYTICS
Apache Hive: Introduction, Architecture
Unit IV
Understanding HIVE:
 Introducing Hive
 Hive services (Architecture)
 Builtin functions in Hive
 Hive DDL
 Data manipulation in Hive
Introduction to Apache HIVE
 Hive is an open source data warehouse system built on top of
Hadoop used for querying and analyzing large datasets
stored in Hadoop files.
 developed by Facebook.
 runs SQL like queries called HQL (Hive query language) which gets
internally converted to map reduce jobs.
 used to analyze structured data.
 best suited for batch jobs
Introduction to HIVE
 Hive: data warehousing application in Hadoop
Query language is HiveQL, variant of SQL
Tables stored on HDFS as flat files
Developed by Facebook, now open source
Example Pig Latin script, sample_script.pig (Pig is introduced below):
student = LOAD 'student_details.txt' USING PigStorage(',')
as (id:int, fname:chararray, lname:chararray, age:int, mob:chararray, city:chararray);
-- age:int is an assumed field; ORDER ... BY age cannot resolve without it
student_order = ORDER student BY age DESC;
student_limit = LIMIT student_order 4;
Dump student_limit;
./pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig
 Pig: large-scale data processing system
Scripts are written in Pig Latin, a dataflow language
Developed by Yahoo!, now open source
 Common idea:
Provide higher-level language to facilitate large-data processing
Higher-level language “compiles down” to Hadoop jobs
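On the Hive side, the "compiles down" step can be inspected with EXPLAIN, which prints the plan of stages for a query instead of running it. The following is a minimal sketch, assuming the employee_data table (Id, Name, Salary) that is created later in this unit. The plan text itself varies with the Hive version and execution engine, so no output is shown.

```sql
-- Print the execution plan (stages and operators) without running the query.
EXPLAIN
SELECT Name, max(Salary) AS top_salary
FROM employee_data
GROUP BY Name;
```

On a MapReduce-backed installation, the plan would typically list a map/reduce stage for the GROUP BY followed by a fetch stage.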
Applications of HIVE
 Data Mining
 Log Processing
 Document Indexing
 Customer Facing Business Intelligence
 Predictive Modelling
 Hypothesis Testing
HIVE Features
 Hive is fast and scalable.
 It provides SQL-like queries (i.e., HQL) that are implicitly transformed to
MapReduce or Spark jobs.
 It is capable of analyzing large datasets stored in HDFS.
 It allows different storage types such as plain text, RCFile (Record Columnar
File), and HBase.
 It uses indexing to accelerate queries.
 It can operate on compressed data stored in the Hadoop ecosystem.
 It supports user-defined functions (UDFs) where user can provide its functionality.
HIVE Features
 A subset of SQL covering the most common statements
 Agile data types: Array, Map, Struct, and JSON objects
 Builtin functions and User Defined Functions and Aggregates
 Multiple users can query simultaneously
 MapReduce support; JDBC support; External table & ETL support
 Partitions and Buckets (for performance optimization)
 Views and Indexes.
 Hive supports Data Definition Language (DDL), Data
Manipulation Language (DML), and User Defined Functions (UDF).
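Several of the features listed above (complex data types, partitions, buckets) can be seen together in one small sketch. The table name page_views, its columns, and the map key 'browser' are all made up for illustration and do not come from the slides.

```sql
-- Hypothetical table: names and columns are illustrative only.
CREATE TABLE IF NOT EXISTS page_views (
  user_id  BIGINT,
  tags     ARRAY<STRING>,
  props    MAP<STRING, STRING>,
  geo      STRUCT<country:STRING, city:STRING>
)
PARTITIONED BY (view_date STRING)       -- one directory per date value
CLUSTERED BY (user_id) INTO 8 BUCKETS   -- rows hashed on user_id into 8 files
STORED AS ORC;

-- Filtering on the partition column lets Hive read only that partition.
SELECT user_id,
       tags[0]          AS first_tag,   -- array element by index
       props['browser'] AS browser,     -- map value by key
       geo.city         AS city         -- struct field by name
FROM page_views
WHERE view_date = '2020-12-23';
```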
HIVE Architecture
Apache Thrift – A set of protocols that define how connections are
made between clients and servers.
JDBC/ODBC – API standard for the Hive DBMS, enabling
JDBC/ODBC-compliant applications to interact with Hive through a
standard interface.
Hive Web UI and CLI – Provide a user interface for an external user to
interact with Hive.
Hive Server – Allows external clients to interact with Hive over a
network, similar to the JDBC or ODBC protocols.
Driver – Acts like a controller that receives the HiveQL statements.
The driver starts the execution of a statement by creating a session.
Metastore – Stores metadata for each of the tables, such as their
schema and location.
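The metadata held by the Metastore can be inspected from any Hive client. A small sketch, assuming the employee_data table that is created later in this unit:

```sql
-- List the tables registered in the current database.
SHOW TABLES;

-- Column names and types, as recorded in the Metastore.
DESCRIBE employee_data;

-- Adds storage details such as the table's location, owner and file format.
DESCRIBE FORMATTED employee_data;
```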
HIVE Builtin functions
 Mathematical Functions
 Date functions
 Collection functions
 String functions
Mathematical Functions
round(DOUBLE a): Returns the rounded BIGINT value of a.
round(DOUBLE a, INT d): Returns a rounded to d decimal places.
rand(), rand(INT seed): Returns a random number (that changes from row to row) that is distributed uniformly from 0 to 1. Specifying the seed will make sure the generated random number sequence is deterministic.
exp(DOUBLE a): Returns e^a, where e is the base of the natural logarithm.
ln(DOUBLE a): Returns the natural logarithm of the argument a.
log10(DOUBLE a): Returns the base-10 logarithm of the argument a.
log2(DOUBLE a): Returns the base-2 logarithm of the argument a.
pow(DOUBLE a, DOUBLE p): Returns a^p.
sqrt(DOUBLE a): Returns the square root of a.
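The functions in this table are usually applied to numeric columns. The sketch below assumes the employee_data table (Id, Name, Salary) created later in this unit. The column aliases are illustrative, and no outputs are shown because they depend on the data.

```sql
SELECT Id,
       Name,
       round(Salary)    AS salary_rounded,   -- nearest whole number
       round(Salary, 1) AS salary_1dp,       -- one decimal place
       sqrt(Salary)     AS salary_sqrt,
       pow(Salary, 2)   AS salary_squared,
       log10(Salary)    AS salary_log10
FROM employee_data;

-- With a fixed seed the random sequence is deterministic,
-- which helps when a sample has to be repeatable.
SELECT Id, rand(42) AS r
FROM employee_data;
```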
Collection Functions
size(Map<K,V>): Returns the number of elements in the map type.
size(Array<T>): Returns the number of elements in the array type.
map_keys(Map<K,V>): Returns an unordered array containing the keys of the input map.
map_values(Map<K,V>): Returns an unordered array containing the values of the input map.
sort_array(Array<T>): Sorts the input array in ascending order according to the natural ordering of the array elements.
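The collection functions can be tried without any table by building the inputs with the array() and map() constructors. This is a sketch that assumes a Hive version which accepts SELECT without a FROM clause.

```sql
SELECT size(array(10, 20, 30));          -- 3
SELECT size(map('a', 1, 'b', 2));        -- 2
SELECT map_keys(map('a', 1, 'b', 2));    -- the keys a and b (order not guaranteed)
SELECT map_values(map('a', 1, 'b', 2));  -- the values 1 and 2 (order not guaranteed)
SELECT sort_array(array(3, 1, 2));       -- [1,2,3]
```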
Date Functions
unix_timestamp(): Gets the current Unix timestamp in seconds.
unix_timestamp(string date): Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp (in seconds), using the default time zone.
to_date(string timestamp): Returns the date part of a timestamp.
year(string date): Returns the year part of a date.
month(string date): Returns the month part of a date.
day(string date): Returns the day part of a date.
hour(string date): Returns the hour of the timestamp.
minute(string date): Returns the minute of the timestamp.
second(string date): Returns the second of the timestamp.
current_date: Returns the current date at the start of query evaluation.
current_timestamp: Returns the current timestamp at the start of query evaluation.
last_day(string date): Returns the last day of the month to which the date belongs.
String Functions
ascii(string str): Returns the numeric value of the first character of str.
character_length(string str): Returns the number of UTF-8 characters contained in str.
concat(string|binary A, string|binary B...): Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order.
find_in_set(string str, string strList): Returns the position of the first occurrence of str in strList, where strList is a comma-delimited string.
length(string A): Returns the length of the string.
locate(string substr, string str[, int pos]): Returns the position of the first occurrence of substr in str after position pos.
lower(string A): Returns the string resulting from converting all characters to lower case.
ltrim(string A): Returns the string resulting from trimming spaces from the beginning (left-hand side) of A.
hive> select year('2020-12-23 10:20:30') from emp;
output: 2020
hive> select month('2020-12-23 10:20:30') from emp;
output: 12
UNIX_TIMESTAMP() returns the current Unix timestamp, i.e. the number of seconds since 1970-01-01 00:00:00 UTC
UNIX_TIMESTAMP('2000-01-01 00:00:00') returns 946713600 when the default time zone is UTC-8 (946684800 in UTC)
TO_DATE('2020-12-23 10:20:30') returns '2020-12-23'
DAY('2020-12-23 10:30:30') returns 23
HOUR('2020-12-23 11:30:30') returns 11
MINUTE('2020-12-23 11:40:30') returns 40
SECOND('2020-12-23 11:20:50') returns 50
WEEKOFYEAR('2000-03-01 10:20:30') returns 9
DATEDIFF('2000-03-01', '2000-01-10') returns 51
DATE_ADD('2000-03-01', 5) returns '2000-03-06'
HIVE Builtin functions
 Hive provides various in-built functions to perform
mathematical and aggregate type operations.
 Create a hive table using the following command:
 create table employee_data (Id int, Name string , Salary
float) row format delimited fields terminated by ',';
load data local inpath '/home/code/hive/emp_details'
into table employee_data;
Mathematical Functions
hive> select Id, Name, sqrt(Salary) from employee_data;
Aggregate Functions
hive> select min(Salary) from employee_data;
hive> select max(Salary) from employee_data;
Other Builtin Functions
Hive Builtin function examples
select concat("ABC","DEF"); -- Returns ABCDEF
select concat_ws("|","1","2","3"); -- Returns 1|2|3
select format_number(1234567,3); -- Returns 1,234,567.000
select format_number(1234567,0); -- Returns 1,234,567
select format_number(1234567.23456,3); -- Returns 1,234,567.235
select locate("is","usa is a usa is a"); -- Returns 5
select locate("is","usa is a usa is a",6); -- Returns 14
select lower("UNITEDSTATES"); -- Returns unitedstates
select lcase("UNITEDSTATES"); -- Returns unitedstates
select ltrim(" UNITEDSTATES"); -- Returns UNITEDSTATES
select reverse("ABCDEF"); -- Returns FEDCBA
select rpad("UNITED",10,'0'); -- Returns UNITED0000
select rpad("UNITED",10,' '); -- Returns 'UNITED    ' (padded with 4 spaces)
select rpad("UNITEDSTATES",10,'0'); -- Returns UNITEDSTAT
select rpad("UNITEDSTATES",10,null); -- Returns NULL
select space(10); -- Returns a string of 10 spaces
select split("USA IS A PLACE"," "); -- Returns ["USA","IS","A","PLACE"]
select substr("USA IS A PLACE",5,2); -- Returns IS
select substr("USA IS A PLACE",5,100); -- Returns IS A PLACE
select upper("unitedstates"); -- Returns UNITEDSTATES
select initcap("USA IS A PLACE"); -- Returns Usa Is A Place
select concat('computer','science','engg'); -- Returns computerscienceengg
select substr('This is hive demo',9,4); -- Returns hive
select length('hadoop'); -- Returns 6
select lpad('hadoop',8,'H'); -- Returns HHhadoop
select rpad('hadoop',8,'p'); -- Returns hadooppp
select trim(' Hadoop '); -- Returns 'Hadoop'
select ltrim(' Hadoop '); -- Returns 'Hadoop '
select rtrim(' Hadoop '); -- Returns ' Hadoop'
select repeat('Hadoop',2); -- Returns HadoopHadoop
select reverse('Hadoop'); -- Returns poodaH
select split('hadoop~supports~split~function','~'); -- Returns ["hadoop","supports","split","function"]
select max(Salary) from employee_data;
select min(Salary) from employee_data;
select Id, upper(Name) from employee_data;
select Id, lower(Name) from employee_data;
HIVE DDL Commands
DDL Command    Use With
CREATE         Database, Table
SHOW           Databases, Tables, Table Properties, Partitions, Functions, Index
DESCRIBE       Database, Table, View
USE            Database
DROP           Database, Table
ALTER          Database, Table
TRUNCATE       Table (deletes all contents)
create table txnrecords (txnnno INT, txndate
STRING, custno INT, amount DOUBLE, category
STRING, product STRING, city STRING, State
STRING, spendby STRING) row format delimited
fields terminated by ',' stored as textfile;
drop table txnrecords;
ALTER TABLE employee RENAME TO employee2;
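The commands in the DDL table can be strung together into one short session. This is only a sketch: demo_db and demo_records are made-up names, chosen so that the DROP statements at the end cannot touch anything real.

```sql
CREATE DATABASE IF NOT EXISTS demo_db;
SHOW DATABASES;
USE demo_db;

CREATE TABLE IF NOT EXISTS demo_records (year STRING, temperature INT, quantity INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

SHOW TABLES;
DESCRIBE demo_records;                 -- column names and types

ALTER TABLE demo_records RENAME TO demo_records_old;
TRUNCATE TABLE demo_records_old;       -- removes all rows, keeps the table definition
DROP TABLE IF EXISTS demo_records_old; -- removes the table itself
DROP DATABASE IF EXISTS demo_db;       -- succeeds because the database is now empty
```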
hive> create database if not exists financials;
hive> create table records (year string, temperature int, quantity int)
> row format delimited
> fields terminated by '\t';
hive> create table employees (
> name string,
> salary float,
> subordinates array<string>,
> deductions map<string, float>,
> address struct<street:string, city:string, state:string, zip:int>);
hive> create database financials2
> with dbproperties('creator' = 'Sreedhar', 'date' = '2020-12-19');
HiveQL Data Manipulation
 Load
Student_data.txt
LOAD statement in Hive is used to move
data files into the locations corresponding
to Hive tables
LOAD DATA [LOCAL] INPATH 'hdfsfilepath/localfilepath'
[OVERWRITE] INTO TABLE existing_table_name
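Two concrete uses of the syntax above, as a sketch. The first reuses the local path and the employee_data table from the function examples earlier in this unit; the HDFS path in the second is a made-up example.

```sql
-- LOCAL: the file is copied from the local file system and appended to the table.
LOAD DATA LOCAL INPATH '/home/code/hive/emp_details'
INTO TABLE employee_data;

-- No LOCAL: the file is already in HDFS and is moved into the table's location.
-- OVERWRITE replaces the table's current contents instead of appending.
LOAD DATA INPATH '/user/hive/staging/emp_details'
OVERWRITE INTO TABLE employee_data;
```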
Select
 SELECT statement in Hive is similar to the SELECT
statement in SQL used for retrieving data from the
database.
 SELECT col1,col2 FROM tablename;
INSERT Command
 INSERT command in Hive loads the data into a Hive
table.
 INSERT INTO TABLE tablename1 [PARTITION
(partcol1=val1, partcol2=val2 ...)] select_statement1
FROM from_statement;
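A sketch of the syntax above with a static partition. The source is the employee_data table from earlier in this unit. The target table employee_by_band, its partition column band, and the 50000 threshold are made up for illustration.

```sql
-- Hypothetical target table, partitioned by a salary band.
CREATE TABLE IF NOT EXISTS employee_by_band (Id INT, Name STRING, Salary FLOAT)
PARTITIONED BY (band STRING);

-- Static partition: the partition value is fixed in the statement,
-- so the SELECT list supplies only the non-partition columns.
INSERT INTO TABLE employee_by_band PARTITION (band = 'high')
SELECT Id, Name, Salary
FROM employee_data
WHERE Salary >= 50000;
```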
DELETE command
 DELETE statement in Hive deletes the table data. If
the WHERE clause is specified, then it deletes only the
rows that satisfy the condition in the WHERE clause.
DELETE is supported only on tables that support ACID transactions (available from Hive 0.14).
 DELETE FROM tablename [WHERE expression];
 DELETE FROM student WHERE roll_no=104;
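Row-level DELETE works only on tables that support ACID transactions. The sketch below shows one way such a student table might be set up. The name column, the bucket count, and the sample rows are made up, and the two SET lines may be unnecessary if the cluster configuration already enables transactions.

```sql
-- Session settings commonly needed for ACID operations.
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Transactional tables are stored as ORC and flagged via TBLPROPERTIES.
CREATE TABLE IF NOT EXISTS student (roll_no INT, name STRING)
CLUSTERED BY (roll_no) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

INSERT INTO TABLE student VALUES (101, 'Asha'), (104, 'Ravi');

DELETE FROM student WHERE roll_no = 104;

-- Only the row with roll_no 101 should remain.
SELECT * FROM student;
```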
HiveQL Data Manipulation
 Load,
 Insert,
 Export Data and
 Create Table
CREATE TABLE
 Hive> CREATE TABLE Employees AS SELECT
eno, ename, sal, address FROM emp WHERE
country = 'IN';
Load
 Hive>LOAD DATA LOCAL INPATH
'/home/hduser/sampledata/users.txt'
LOCAL indicates that the source data is on the local file system.
Local data will be copied into the final destination (HDFS file system) by Hive.
If LOCAL is not specified, the file is assumed to be on HDFS.
Hive does not do any data transformation while loading the data.
INSERT
 Hive> INSERT OVERWRITE TABLE Employee
PARTITION (country = 'IN', state = 'KA') SELECT * FROM
emp_stage ese WHERE ese.country = 'IN' AND
ese.state = 'KA';
Exporting Data out of Hive
 Hive>INSERT OVERWRITE LOCAL
DIRECTORY '/home/hadoop/data' SELECT name,
age FROM aliens WHERE date_sighted > '2014-09-15';
Unit V
NoSQL Data Management:
 Introducing to NoSQL,
 characteristics of NoSQL
 Types of NoSQL data models
 Schema less databases
 NoSQL database stands for "Not Only SQL" or
"Not SQL."
 NoSQL Database is a non-relational Data
Management System, that does not require a fixed
schema.
 NoSQL is used for Big data and real-time web
apps.
Features of NoSQL
 Non-relational
 NoSQL databases never follow the relational model
 Never provide tables with flat fixed-column records
 Work with self-contained aggregates or BLOBs
 Doesn't require object-relational mapping and data normalization
 No complex features like query languages, query planners, referential integrity joins, ACID
 Schema-free
 NoSQL databases are either schema-free or have relaxed schemas
 Do not require any sort of definition of the schema of the data
 Offers heterogeneous structures of data in the same domain
Advantages of NoSQL
 Can be used as Primary or Analytic Data Source
 Big Data Capability
 No Single Point of Failure
 Easy Replication
 No Need for Separate Caching Layer
 Support Key Developer Languages and
Platforms
 Simple to implement than using RDBMS
 It can serve as the primary data source for
online applications.
 It provides fast performance and horizontal
scalability.
 Can handle structured, semi-structured, and
unstructured data with equal effect
 Object-oriented programming which is easy to
use and flexible
 NoSQL databases don't need a dedicated high-
performance server
 Handles big data which manages data velocity,
variety, volume, and complexity
 Excels at distributed database and multi-data
center operations
 Eliminates the need for a specific caching layer
to store data
 Offers a flexible schema design which can
easily be altered without downtime or service
disruption
Types of NoSQL Databases
 Key-value Pair Based
 Column-oriented
 Graphs based
 Document-oriented
Key Value Pair Based
 Data is stored in key/value pairs. It is designed in
such a way to handle lots of data and heavy load.
 Key-value pair storage databases store data as a
hash table where each key is unique, and the value
can be a JSON, BLOB(Binary Large Objects), string,
etc.
Column-based
 Column-oriented databases work on columns and
are based on the BigTable paper by Google. Every
column is treated separately. Values of a single
column are stored contiguously.
Document-Oriented:
 Document-Oriented NoSQL DB stores and retrieves
data as a key value pair but the value part is
stored as a document. The document is stored in
JSON or XML formats. The value is understood by
the DB and can be queried.
Graph-Based
 A graph type database stores entities as well as the
relations amongst those entities. The entity is stored
as a node with the relationship as edges. An edge
gives a relationship between nodes. Every node and
edge has a unique identifier.
Tools for NoSQL
 Wide column: Accumulo, Cassandra, Scylla, HBase.
 Document: Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase,
Cosmos DB, eXist-db, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx,
RethinkDB
 Key–value: Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase,
Dynamo, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL
Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm, ZooKeeper
 Graph: AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic,
Neo4J, OrientDB, Virtuoso

Programming for Problem SolvingProgramming for Problem Solving
Programming for Problem Solving
 
Python Programming: Lists, Modules, Exceptions
Python Programming: Lists, Modules, ExceptionsPython Programming: Lists, Modules, Exceptions
Python Programming: Lists, Modules, Exceptions
 
Python Programming by Dr. C. Sreedhar.pdf
Python Programming by Dr. C. Sreedhar.pdfPython Programming by Dr. C. Sreedhar.pdf
Python Programming by Dr. C. Sreedhar.pdf
 
Python Programming Strings
Python Programming StringsPython Programming Strings
Python Programming Strings
 
Python Programming
Python Programming Python Programming
Python Programming
 
Python Programming
Python ProgrammingPython Programming
Python Programming
 
C Recursion, Pointers, Dynamic memory management
C Recursion, Pointers, Dynamic memory managementC Recursion, Pointers, Dynamic memory management
C Recursion, Pointers, Dynamic memory management
 
C Programming Storage classes, Recursion
C Programming Storage classes, RecursionC Programming Storage classes, Recursion
C Programming Storage classes, Recursion
 
Programming For Problem Solving Lecture Notes
Programming For Problem Solving Lecture NotesProgramming For Problem Solving Lecture Notes
Programming For Problem Solving Lecture Notes
 
Data Structures Notes 2021
Data Structures Notes 2021Data Structures Notes 2021
Data Structures Notes 2021
 
Computer Networks Lecture Notes 01
Computer Networks Lecture Notes 01Computer Networks Lecture Notes 01
Computer Networks Lecture Notes 01
 

Recently uploaded

power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
SupreethSP4
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 

Recently uploaded (20)

power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 

Big Data Analytics Part2

  • 1. BIG DATA ANALYTICS. Apache Hive: Introduction, Architecture
  • 2. Unit IV. Understanding HIVE: Introducing Hive; Hive services (Architecture); Built-in functions in Hive; Hive DDL; Data manipulation in Hive
  • 3. Introduction to Apache HIVE: Hive is an open-source data warehouse system built on top of Hadoop, used for querying and analyzing large datasets stored in Hadoop files. It was developed by Facebook. It runs SQL-like queries written in HQL (Hive Query Language), which are internally converted to MapReduce jobs. It is used to analyze structured data and is best suited for batch jobs.
  • 4. Introduction to HIVE
        Hive: data warehousing application in Hadoop. The query language is HiveQL, a variant of SQL. Tables are stored on HDFS as flat files. Developed by Facebook, now open source.
        Pig: large-scale data processing system. Scripts are written in Pig Latin, a dataflow language. Developed by Yahoo!, now open source.
        Common idea: provide a higher-level language to facilitate large-data processing; the higher-level language "compiles down" to Hadoop jobs.
        Pig Latin example (age:int added to the schema so that ORDER ... BY age refers to a declared field):
          student = LOAD 'student_details.txt' USING PigStorage(',')
              AS (id:int, fname:chararray, lname:chararray, age:int, mob:chararray, city:chararray);
          student_order = ORDER student BY age DESC;
          student_limit = LIMIT student_order 4;
          DUMP student_limit;
        Running the script in MapReduce mode:
          ./pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig
  • 5. Applications of HIVE: Data Mining; Log Processing; Document Indexing; Customer-Facing Business Intelligence; Predictive Modelling; Hypothesis Testing
  • 6. HIVE Features: Hive is fast and scalable. It provides SQL-like queries (i.e., HQL) that are implicitly transformed to MapReduce or Spark jobs. It is capable of analyzing large datasets stored in HDFS. It allows different storage types such as plain text, RCFile (Record Columnar File), and HBase. It uses indexing to accelerate queries. It can operate on compressed data stored in the Hadoop ecosystem. It supports user-defined functions (UDFs), through which users can supply their own functionality.
  • 7. HIVE Features: A subset of SQL covering the most common statements; Agile data types: Array, Map, Struct, and JSON objects; Built-in functions, User Defined Functions and Aggregates; Multiple users can query simultaneously; MapReduce support, JDBC support, External table and ETL support; Partitions and Buckets (for performance optimization); Views and Indexes; Hive supports Data Definition Language (DDL), Data Manipulation Language (DML), and User Defined Functions (UDF).
  • 8. HIVE Architecture
        Hive Web UI, Server and CLI: provide a user interface for an external user to interact with Hive. The server allows external clients to interact with Hive over a network, similar to the JDBC or ODBC protocols.
        Apache Thrift: a set of protocols that define how connections are made between clients and servers.
        JDBC/ODBC: API standard for the Hive DBMS, enabling JDBC/ODBC-compliant applications to interact with Hive through a standard interface.
        Driver: acts like a controller that receives the HiveQL statements. The driver starts the execution of a statement by creating sessions.
        Metastore: stores metadata for each of the tables, such as their schema and location.
  • 9. HIVE Built-in functions: Mathematical functions; Date functions; Collection functions; String functions
  • 10. Mathematical Functions
        round(DOUBLE a): Returns the rounded BIGINT value of a.
        round(DOUBLE a, INT d): Returns a rounded to d decimal places.
        rand(), rand(INT seed): Returns a random number (that changes from row to row) distributed uniformly from 0 to 1. Specifying the seed makes the generated random number sequence deterministic.
        exp(DOUBLE a): Returns e^a, where e is the base of the natural logarithm.
        ln(DOUBLE a): Returns the natural logarithm of the argument a.
        log10(DOUBLE a): Returns the base-10 logarithm of the argument a.
        log2(DOUBLE a): Returns the base-2 logarithm of the argument a.
        pow(DOUBLE a, DOUBLE p): Returns a^p.
        sqrt(DOUBLE a): Returns the square root of a.
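The mathematical functions listed above can be cross-checked outside Hive. The following is a minimal Python sketch, not Hive code: `hive_round` is a hypothetical helper name, and it assumes that Hive's round() rounds halves away from zero (HALF_UP), which differs from Python's built-in round().

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def hive_round(a: float, d: int = 0) -> float:
    """Round a to d decimal places, halves away from zero (assumed Hive behaviour)."""
    quantum = Decimal(1).scaleb(-d)  # 10**-d, e.g. Decimal('0.01') for d=2
    return float(Decimal(str(a)).quantize(quantum, rounding=ROUND_HALF_UP))


print(hive_round(2.5))         # 3.0 (Python's round(2.5) gives 2)
print(hive_round(3.14159, 2))  # 3.14
print(math.exp(0))             # exp(a)   -> 1.0
print(math.log10(1000))        # log10(a) -> 3.0
print(math.log2(8))            # log2(a)  -> 3.0
print(math.pow(2, 10))         # pow(a,p) -> 1024.0
print(math.sqrt(16))           # sqrt(a)  -> 4.0
```

The Python standard-library functions stand in for the Hive functions of the same name; only the rounding rule needed a custom helper.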
  • 11. Collection Functions
        size(Map<K,V>): Returns the number of elements in the map type.
        size(Array<T>): Returns the number of elements in the array type.
        map_keys(Map<K,V>): Returns an unordered array containing the keys of the input map.
        map_values(Map<K,V>): Returns an unordered array containing the values of the input map.
        sort_array(Array<T>): Sorts the input array in ascending order according to the natural ordering of the array elements.
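As a rough illustration of what the collection functions return, here is a Python sketch that models a Hive map as a dict and a Hive array as a list. The function names mirror the Hive ones, but the sample data and the implementation are assumptions made for illustration only.

```python
def size(collection):
    """Number of elements in a map (dict) or array (list)."""
    return len(collection)


def map_keys(m):
    """Keys of the map as an array; callers should not rely on the order."""
    return list(m.keys())


def map_values(m):
    """Values of the map as an array; callers should not rely on the order."""
    return list(m.values())


def sort_array(a):
    """Array sorted ascending by the natural ordering of its elements."""
    return sorted(a)


deductions = {"tax": 0.2, "insurance": 0.05}   # hypothetical map<string,float>
subordinates = ["Ravi", "Asha", "Meena"]       # hypothetical array<string>

print(size(deductions))          # 2
print(size(subordinates))        # 3
print(sort_array(subordinates))  # ['Asha', 'Meena', 'Ravi']
```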
  • 12. Date Functions
        unix_timestamp(): Gets the current Unix timestamp in seconds.
        unix_timestamp(string date): Converts a time string to a Unix timestamp (in seconds).
        to_date(string timestamp): Returns the date part of a timestamp.
        year(string date): Returns the year part of a date.
        month(string date): Returns the month part of a date.
        day(string date): Returns the day part of a date.
        hour(string date): Returns the hour of the timestamp.
        minute(string date): Returns the minute of the timestamp.
        second(string date): Returns the second of the timestamp.
        current_date: Returns the current date at the start of query evaluation.
        current_timestamp: Returns the current timestamp at the start of query evaluation.
        last_day(string date): Returns the last day of the month to which the date belongs.
  • 13. String Functions
        ascii(string str): Returns the numeric value of the first character of str.
        character_length(string str): Returns the number of UTF-8 characters contained in str.
        concat(string|binary A, string|binary B...): Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters, in order.
        find_in_set(string str, string strList): Returns the first occurrence of str in strList, where strList is a comma-delimited string.
        length(string A): Returns the length of the string.
        locate(string substr, string str[, int pos]): Returns the position of the first occurrence of substr in str after position pos.
        lower(string A): Returns the string resulting from converting all characters to lower case.
        ltrim(string A): Returns the string resulting from trimming spaces from the beginning (left-hand side) of A.
  • 14. hive> select year('2020-12-23 10:20:30') from emp;   -- output: 2020
        hive> select month('2020-12-23 10:20:30') from emp;  -- output: 12
        UNIX_TIMESTAMP() returns the current Unix timestamp, i.e. the number of seconds elapsed since 1970-01-01 00:00:00 UTC.
        UNIX_TIMESTAMP('2000-01-01 00:00:00') returns 946713600 when the default time zone is UTC-8 (946684800 in UTC).
        TO_DATE('2020-12-23 10:20:30') returns '2020-12-23'
        DAY('2020-12-23 10:30:30') returns 23
        HOUR('2020-12-23 11:30:30') returns 11
        MINUTE('2020-12-23 11:40:30') returns 40
        SECOND('2020-12-23 11:20:50') returns 50
        WEEKOFYEAR('2000-03-01 10:20:30') returns 9
        DATEDIFF('2000-03-01', '2000-01-10') returns 51
        DATE_ADD('2000-03-01', 5) returns '2000-03-06'
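The date results quoted above can be reproduced with Python's standard library. This is a sketch for checking the arithmetic, not Hive code: the helper names mirror the Hive functions, and the `utc_offset_hours` parameter is an assumption standing in for the session's default time zone.

```python
import calendar
from datetime import date, datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def unix_timestamp(ts: str, utc_offset_hours: int = 0) -> int:
    """Seconds since 1970-01-01 00:00:00 UTC for a local time string."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime.strptime(ts, FMT).replace(tzinfo=tz).timestamp())


def to_date(ts: str) -> str:
    return datetime.strptime(ts, FMT).date().isoformat()


def weekofyear(ts: str) -> int:
    # ISO 8601 week number (weeks start on Monday)
    return datetime.strptime(ts, FMT).isocalendar()[1]


def datediff(end: str, start: str) -> int:
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def date_add(start: str, days: int) -> str:
    return (date.fromisoformat(start) + timedelta(days=days)).isoformat()


def last_day(day: str) -> str:
    d = date.fromisoformat(day)
    return d.replace(day=calendar.monthrange(d.year, d.month)[1]).isoformat()


print(unix_timestamp("2000-01-01 00:00:00"))      # 946684800 (UTC)
print(unix_timestamp("2000-01-01 00:00:00", -8))  # 946713600 (UTC-8)
print(to_date("2020-12-23 10:20:30"))             # 2020-12-23
print(weekofyear("2000-03-01 10:20:30"))          # 9
print(datediff("2000-03-01", "2000-01-10"))       # 51
print(date_add("2000-03-01", 5))                  # 2000-03-06
print(last_day("2020-02-10"))                     # 2020-02-29
```

The two `unix_timestamp` calls show why the value for a fixed time string depends on the time zone in which it is interpreted.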
  • 15. hive> select Id, Name, sqrt(Salary) from employee_data;
        hive> select min(Salary) from employee_data;
        hive> select max(Salary) from employee_data;
  • 16. Hive Built-in function examples
        select concat("ABC","DEF");                  -- Returns ABCDEF
        select concat_ws("|","1","2","3");           -- Returns 1|2|3
        select format_number(1234567,3);             -- Returns 1,234,567.000
        select format_number(1234567,0);             -- Returns 1,234,567
        select format_number(1234567.23456,3);       -- Returns 1,234,567.235
        select locate("is","usa is a usa is a");     -- Returns 5
        select locate("is","usa is a usa is a",6);   -- Returns 14
        select lower("UNITEDSTATES");                -- Returns unitedstates
        select ltrim(" UNITEDSTATES");               -- Returns UNITEDSTATES
  • 17. Hive Built-in function examples
        select reverse("ABCDEF");                    -- Returns FEDCBA
        select rpad("UNITED",10,'0');                -- Returns UNITED0000
        select rpad("UNITED",10,' ');                -- Returns 'UNITED    ' (padded with four spaces)
        select rpad("UNITEDSTATES",10,'0');          -- Returns UNITEDSTAT
        select rpad("UNITEDSTATES",10,null);         -- Returns NULL
        select space(10);                            -- Returns a string of 10 spaces
        select split("USA IS A PLACE"," ");          -- Returns ["USA","IS","A","PLACE"]
        select substr("USA IS A PLACE",5,2);         -- Returns IS
        select substr("USA IS A PLACE",5,100);       -- Returns IS A PLACE
        select upper("unitedstates");                -- Returns UNITEDSTATES
  • 18. Hive Built-in function examples
        select initcap("USA IS A PLACE");            -- Returns Usa Is A Place
        select concat('computer','science','engg');  -- Returns computerscienceengg
        select substr('This is hive demo',9,4);      -- Returns hive
        select length('hadoop');                     -- Returns 6
        select lpad('hadoop',8,'H');                 -- Returns HHhadoop
        select rpad('hadoop',8,'p');                 -- Returns hadooppp
        select trim(' Hadoop ');                     -- Returns 'Hadoop'
        select ltrim(' Hadoop ');                    -- Returns 'Hadoop '
        select rtrim(' Hadoop ');                    -- Returns ' Hadoop'
        select repeat('Hadoop',2);                   -- Returns HadoopHadoop
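Several of the string examples above depend on 1-based positions and on padding that truncates. The Python sketch below models that behaviour so the quoted results can be checked. The function names mirror the Hive ones, but the implementations are assumptions (for example, `substr` here handles only positive start positions).

```python
def locate(substr_, s, pos=1):
    """1-based position of the first match at or after pos; 0 if absent."""
    if pos < 1:
        return 0
    return s.find(substr_, pos - 1) + 1  # find() returns -1 when absent


def substr(s, start, length=None):
    """Substring starting at 1-based position start (positive start only)."""
    begin = start - 1
    end = None if length is None else begin + length
    return s[begin:end]


def rpad(s, length, pad):
    """Right-pad s with pad to exactly length characters, truncating if longer."""
    if pad is None:
        return None
    return (s + pad * length)[:length]


def lpad(s, length, pad):
    """Left-pad s with pad to exactly length characters, truncating if longer."""
    if pad is None:
        return None
    if len(s) >= length:
        return s[:length]
    return (pad * length)[: length - len(s)] + s


print(locate("is", "usa is a usa is a"))     # 5
print(locate("is", "usa is a usa is a", 6))  # 14
print(substr("USA IS A PLACE", 5, 2))        # IS
print(rpad("UNITEDSTATES", 10, "0"))         # UNITEDSTAT
print(lpad("hadoop", 8, "H"))                # HHhadoop
```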
  • 19. Hive Built-in function examples
        select reverse('Hadoop');                               -- Returns poodaH
        select split('hadoop~supports~split~function','~');     -- Returns ["hadoop","supports","split","function"]
        select max(Salary) from employee_data;
        select min(Salary) from employee_data;
        select Id, upper(Name) from employee_data;
        select Id, lower(Name) from employee_data;
  • 20. HIVE Built-in functions: Hive provides various built-in functions to perform mathematical and aggregate operations. Create a Hive table and load data into it using the following commands:
        create table employee_data (Id int, Name string, Salary float) row format delimited fields terminated by ',';
        load data local inpath '/home/code/hive/emp_details' into table employee_data;
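To make the effect of "row format delimited fields terminated by ','" concrete, the sketch below parses a comma-delimited file in Python and computes the same aggregates as the min, max, and sqrt queries on employee_data. The three sample rows are invented for illustration; the actual contents of emp_details are not shown in the slides.

```python
import csv
import io
import math

# Hypothetical contents of a comma-delimited file with columns Id, Name, Salary
sample = "1,Asha,40000.0\n2,Ravi,62500.0\n3,Meena,90000.0\n"

# Each line becomes one row; each comma-separated field becomes one column
rows = [(int(i), name, float(sal)) for i, name, sal in csv.reader(io.StringIO(sample))]

min_salary = min(sal for _, _, sal in rows)                    # select min(Salary)
max_salary = max(sal for _, _, sal in rows)                    # select max(Salary)
sqrt_rows = [(i, name, math.sqrt(sal)) for i, name, sal in rows]  # select Id, Name, sqrt(Salary)

print(min_salary)  # 40000.0
print(max_salary)  # 90000.0
print(sqrt_rows)   # [(1, 'Asha', 200.0), (2, 'Ravi', 250.0), (3, 'Meena', 300.0)]
```

Note the difference in shape: the aggregates return a single value for the whole table, while sqrt is applied once per row.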
  • 26. Hive Built-in function examples (repeats slide 16, with one additional example)
        select lcase("UNITEDSTATES");                -- Returns unitedstates
  • 31. HIVE DDL Commands
        DDL Command    Use With
        CREATE         Database, Table
        SHOW           Databases, Tables, Table Properties, Partitions, Functions, Index
        DESCRIBE       Database, Table, View
        USE            Database
        DROP           Database, Table
        ALTER          Database, Table
        TRUNCATE       Table (deletes all contents)
  • 32. create table txnrecords (txnnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, city STRING, State STRING, spendby STRING) row format delimited fields terminated by ',' stored as textfile;
        drop table txnrecords;
        alter table employee rename to employee2;
  • 33. hive> create database if not exists financials;
        hive> create table records (year string, temperature int, quantity int)
            > row format delimited
            > fields terminated by '\t';
        hive> create table employees (
            > name string,
            > salary float,
            > subordinates array<string>,
            > deductions map<string, float>,
            > address struct<street:string, city:string, state:string, zip:int>);
        hive> create database financials2
            > with dbproperties('creator' = 'Sreedhar', 'date' = '2020-12-19');
  • 34. HiveQL Data Manipulation: Load (Student_data.txt). The LOAD statement in Hive is used to move data files into the locations corresponding to Hive tables.
        LOAD DATA [LOCAL] INPATH 'hdfsfilepath/localfilepath' [OVERWRITE] INTO TABLE existing_table_name
  • 35. Select: The SELECT statement in Hive is similar to the SELECT statement in SQL and is used for retrieving data from the database.
        SELECT col1, col2 FROM tablename;
  • 36. INSERT Command: The INSERT command in Hive loads data into a Hive table.
        INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
  • 39. DELETE command: The DELETE statement in Hive deletes table data. If the WHERE clause is specified, it deletes only the rows that satisfy the condition in the WHERE clause.
        DELETE FROM tablename [WHERE expression];
        DELETE FROM student WHERE roll_no=104;
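The difference between DELETE with and without a WHERE clause can be tried out with Python's built-in sqlite3 module. This is only a stand-in for the SQL semantics: SQLite is not Hive, the sample rows are invented, and in Hive itself DELETE is available only on tables that support ACID transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(101, "Anu"), (104, "Bala"), (107, "Chitra")],  # hypothetical rows
)

# With WHERE: only the rows matching the condition are removed
conn.execute("DELETE FROM student WHERE roll_no = 104")
remaining = [r[0] for r in conn.execute("SELECT roll_no FROM student ORDER BY roll_no")]
print(remaining)  # [101, 107]

# Without WHERE: every row is removed, but the table itself still exists
conn.execute("DELETE FROM student")
count_after = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(count_after)  # 0
```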
  • 40. HiveQL Data Manipulation: Load, Insert, Export Data, and Create Table
  • 41. CREATE TABLE
        hive> CREATE TABLE Employees AS SELECT eno, ename, sal, address FROM emp WHERE country='IN';
  • 42. Load
        hive> LOAD DATA LOCAL INPATH '/home/hduser/sampledata/users.txt'
        'LOCAL' indicates that the source data is on the local file system.
        Local data will be copied into the final destination (the HDFS file system) by Hive.
        If 'LOCAL' is not specified, the file is assumed to be on HDFS.
        Hive does not do any data transformation while loading the data.
  • 43. INSERT
        hive> INSERT OVERWRITE TABLE Employee PARTITION (country='IN', state='KA') SELECT * FROM emp_stage ese WHERE ese.country='IN' AND ese.state='KA';
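A partitioned insert like the one above writes its rows into a subdirectory of the table's location named after the partition columns (one `col=value` level per partition column). The Python sketch below shows how such a path is formed and which staged rows would qualify. The warehouse path, the helper name `partition_path`, and the sample rows are assumptions made for illustration.

```python
def partition_path(table_location: str, partition_spec: dict) -> str:
    """Build the directory for one partition: <table>/<col1>=<val1>/<col2>=<val2>."""
    parts = [f"{col}={val}" for col, val in partition_spec.items()]
    return "/".join([table_location.rstrip("/")] + parts)


# Hypothetical staging rows, standing in for emp_stage
emp_stage = [
    {"eno": 1, "ename": "Asha", "country": "IN", "state": "KA"},
    {"eno": 2, "ename": "Ravi", "country": "IN", "state": "MH"},
    {"eno": 3, "ename": "John", "country": "US", "state": "CA"},
]

spec = {"country": "IN", "state": "KA"}

# Rows selected by: WHERE country='IN' AND state='KA'
selected = [r for r in emp_stage if all(r[c] == v for c, v in spec.items())]

# Assumed table location; the real one depends on the warehouse configuration
target = partition_path("/user/hive/warehouse/employee", spec)

print(target)                          # /user/hive/warehouse/employee/country=IN/state=KA
print([r["ename"] for r in selected])  # ['Asha']
```

Because each partition lives in its own directory, a query that filters on country and state only needs to read the matching directory, which is why partitions help performance.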
  • 44. Exporting Data out of Hive
        hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/data' SELECT name, age FROM aliens WHERE date_sighted > '2014-09-15';
  • 45. Unit V. NoSQL Data Management: Introduction to NoSQL; Characteristics of NoSQL; Types of NoSQL data models; Schema-less databases
  • 46. NoSQL stands for "Not Only SQL" or "Not SQL." A NoSQL database is a non-relational data management system that does not require a fixed schema. NoSQL is used for big data and real-time web apps.
  • 48. Features of NoSQL
        Non-relational: NoSQL databases never follow the relational model. They never provide tables with flat fixed-column records. They work with self-contained aggregates or BLOBs. They do not require object-relational mapping or data normalization. They have no complex features such as query languages, query planners, referential integrity, joins, or ACID.
        Schema-free: NoSQL databases are either schema-free or have relaxed schemas. They do not require any sort of definition of the schema of the data. They offer heterogeneous structures of data in the same domain.
  • 49. Advantages of NoSQL
        Summary: can be used as a primary or analytic data source; big data capability; no single point of failure; easy replication; no need for a separate caching layer; supports key developer languages and platforms; simpler to implement than an RDBMS.
        In detail: It can serve as the primary data source for online applications. It handles big data, managing data velocity, variety, volume, and complexity. It provides fast performance and horizontal scalability. It can handle structured, semi-structured, and unstructured data with equal effect. It supports object-oriented programming that is easy to use and flexible. It does not need a dedicated high-performance server. It excels at distributed database and multi-data-center operations. It eliminates the need for a specific caching layer to store data. It offers a flexible schema design that can easily be altered without downtime or service disruption.
  • 50. Types of NoSQL Databases: Key-value pair based; Column-oriented; Graph based; Document-oriented
  • 52. Key-Value Pair Based: Data is stored in key/value pairs. It is designed to handle large amounts of data and heavy load. Key-value pair storage databases store data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
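The hash-table model described above can be sketched with a Python dict. This is a toy in-memory illustration, not the API of any particular key-value database; `put`, `get`, and the sample keys are hypothetical names. Values are serialized to JSON strings to emphasise that the store treats them as opaque.

```python
import json

store = {}  # the hash table: each unique key maps to one opaque value


def put(key, value):
    """Store value under key, replacing any previous value for that key."""
    store[key] = json.dumps(value)  # the store only sees a string


def get(key):
    """Fetch the value for key, or None when the key is absent."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


put("user:1", {"name": "Asha", "city": "Pune"})
put("user:1", {"name": "Asha", "city": "Delhi"})  # same key: value is overwritten
put("counter", 42)

print(get("user:1"))   # {'name': 'Asha', 'city': 'Delhi'}
print(get("counter"))  # 42
print(get("missing"))  # None
```

Lookups go through the key only; there is no way to ask the store for "all users in Delhi" without scanning every value, which is the main contrast with document databases.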
  • 53. Column-based: Column-oriented databases work on columns and are based on the BigTable paper by Google. Every column is treated separately. The values of a single column are stored contiguously.
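What "values of a single column are stored contiguously" means can be seen by pivoting a few rows into per-column lists. The Python sketch below uses invented sample data and plain lists as a stand-in for on-disk column storage.

```python
# Row-oriented layout: each record keeps all of its fields together
rows = [
    {"id": 1, "city": "Pune", "amount": 120.0},
    {"id": 2, "city": "Delhi", "amount": 80.5},
    {"id": 3, "city": "Pune", "amount": 42.0},
]

# Column-oriented layout: all values of one column sit next to each other
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column touches only that column's values
total_amount = sum(columns["amount"])

print(columns["city"])  # ['Pune', 'Delhi', 'Pune']
print(total_amount)     # 242.5
```

Summing `amount` never reads `id` or `city`, which is why column layouts suit analytic queries that scan a few columns over many rows.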
  • 54. Document-Oriented: A document-oriented NoSQL database stores and retrieves data as a key-value pair, but the value part is stored as a document. The document is stored in JSON or XML format. The value is understood by the database and can be queried.
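The point that the value "is understood by the DB and can be queried" is what separates a document store from a plain key-value store. A minimal Python sketch, with invented documents and a hypothetical `find` helper rather than any real database's query API:

```python
# Each key maps to a document; documents need not share the same fields
documents = {
    "u1": {"name": "Asha", "city": "Pune", "tags": ["admin"]},
    "u2": {"name": "Ravi", "city": "Delhi"},
    "u3": {"name": "Meena", "city": "Pune", "age": 31},
}


def find(field, value):
    """Return the keys of all documents whose field equals value."""
    return [doc_id for doc_id, doc in documents.items() if doc.get(field) == value]


print(find("city", "Pune"))  # ['u1', 'u3']
print(find("age", 31))       # ['u3']
print(find("city", "Goa"))   # []
```

The query looks inside the stored values, and documents that lack the field (such as u1 and u2 for `age`) are simply skipped, which also illustrates the schema-free property.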
  • 55. Graph-Based: A graph database stores entities as well as the relations among those entities. Each entity is stored as a node, with the relationships as edges. An edge gives a relationship between nodes. Every node and edge has a unique identifier.
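The node-and-edge model above can be sketched with two Python dicts, one for nodes and one for edges, each keyed by a unique identifier. The identifiers, labels, and the `neighbors` helper are invented for illustration and do not correspond to any specific graph database.

```python
# Nodes: unique id -> properties of the entity
nodes = {
    "n1": {"label": "Person", "name": "Asha"},
    "n2": {"label": "Person", "name": "Ravi"},
    "n3": {"label": "Company", "name": "Acme"},
}

# Edges: unique id -> (source node, relationship, destination node)
edges = {
    "e1": ("n1", "KNOWS", "n2"),
    "e2": ("n1", "WORKS_AT", "n3"),
    "e3": ("n2", "WORKS_AT", "n3"),
}


def neighbors(node_id, relation=None):
    """Ids of nodes reachable from node_id over one edge, optionally of one relation type."""
    return [
        dst
        for src, rel, dst in edges.values()
        if src == node_id and (relation is None or rel == relation)
    ]


print(neighbors("n1"))              # ['n2', 'n3']
print(neighbors("n1", "WORKS_AT"))  # ['n3']
print(nodes[neighbors("n2")[0]]["name"])  # Acme
```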
  • 56. Tools for NoSQL
        Wide column: Accumulo, Cassandra, Scylla, HBase.
        Document: Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, eXist-db, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB.
        Key-value: Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase, Dynamo, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm, ZooKeeper.
        Graph: AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso.