Advanced Hive, NoSQL Database (HBase)
HiveQL: Data Manipulation
Loading Data into Managed Tables
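• Hive traditionally has no row-level insert, update, or delete operations (Hive 0.14 and later support UPDATE and DELETE only on transactional ACID tables).
• For ordinary tables, the only way to put data into a table is through "bulk" load operations.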
LOAD DATA LOCAL INPATH '/usr/hive/warehouse/california-employees'
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'CA');
Inserting Data into Tables from Queries
• The INSERT statement loads data into a table from the results of a query.
• With OVERWRITE, any previous contents of the partition (or
whole table if not partitioned) are replaced.
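To make the OVERWRITE semantics concrete, here is a minimal in-memory Java model, not Hive code. The class `PartitionedTable` and its methods are hypothetical: a table is a map from partition spec to rows, `insertOverwrite` replaces only the named partition (other partitions are untouched), and `insertInto` appends, as HiveQL's `INSERT INTO` does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a partitioned table; not Hive's API.
public class PartitionedTable {
    // Partition spec (e.g. "country=US/state=CA") -> rows stored in that partition.
    private final Map<String, List<String>> partitions = new LinkedHashMap<>();

    // Models INSERT OVERWRITE ... PARTITION (...): previous rows of this partition are discarded.
    public void insertOverwrite(String partitionSpec, List<String> queryResult) {
        partitions.put(partitionSpec, new ArrayList<>(queryResult));
    }

    // Models INSERT INTO ... PARTITION (...): rows are appended to whatever is already there.
    public void insertInto(String partitionSpec, List<String> queryResult) {
        partitions.computeIfAbsent(partitionSpec, k -> new ArrayList<>()).addAll(queryResult);
    }

    public List<String> rows(String partitionSpec) {
        return partitions.getOrDefault(partitionSpec, new ArrayList<>());
    }

    public static void main(String[] args) {
        PartitionedTable employees = new PartitionedTable();
        employees.insertInto("country=US/state=CA", Arrays.asList("alice", "bob"));
        employees.insertInto("country=US/state=OR", Arrays.asList("carol"));
        employees.insertOverwrite("country=US/state=CA", Arrays.asList("dave"));
        System.out.println(employees.rows("country=US/state=CA")); // [dave]
        System.out.println(employees.rows("country=US/state=OR")); // [carol]
    }
}
```

For a non-partitioned table, the whole table plays the role of a single partition, so OVERWRITE replaces everything.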
Dynamic Partition Inserts
• With dynamic partitioning, Hive infers which partitions to create from the values
returned by the query, instead of requiring literal values in the PARTITION clause.
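In HiveQL the dynamic partition columns are matched by position to the last columns of the SELECT list (for example, `PARTITION (country, state)` takes its values from the final two selected columns). The Java sketch below models only that routing step. `DynamicPartitioner` and `route` are illustrative names, and Hive itself writes one output directory per distinct partition spec rather than building an in-memory map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of how dynamic partitioning routes rows; not Hive internals.
public class DynamicPartitioner {
    // Groups rows by the values of their trailing columns, matched by position
    // to the partition column names in partitionCols.
    public static Map<String, List<List<String>>> route(List<String[]> rows, String[] partitionCols) {
        Map<String, List<List<String>>> out = new TreeMap<>();
        int k = partitionCols.length;
        for (String[] row : rows) {
            int split = row.length - k;
            StringBuilder spec = new StringBuilder();
            for (int i = 0; i < k; i++) {
                if (i > 0) spec.append('/');
                spec.append(partitionCols[i]).append('=').append(row[split + i]);
            }
            // Non-partition columns are stored; partition values live in the directory name.
            List<String> data = Arrays.asList(Arrays.copyOfRange(row, 0, split));
            out.computeIfAbsent(spec.toString(), s -> new ArrayList<>()).add(data);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> staged = Arrays.asList(
            new String[] {"alice", "US", "CA"},
            new String[] {"bob", "US", "OR"},
            new String[] {"carol", "US", "CA"});
        System.out.println(route(staged, new String[] {"country", "state"}));
    }
}
```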
Creating Tables and Loading Them in One Query
Exporting Data
User Defined Functions
• Hive can use user-defined functions (UDFs) written in Java to perform
computations that would be difficult or impossible with the built-in Hive
functions and HiveQL alone.
• To invoke a UDF from a Hive script, you must:
1. Register the JAR file that contains the UDF class (with ADD JAR), and
2. Define an alias for the function using the CREATE TEMPORARY FUNCTION
command.
Example UDF
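import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import org.apache.hadoop.hive.ql.exec.UDF;

// Zodiac sign for a birthday; Hive picks the evaluate() overload matching the argument types.
public class UDFZodiacSign extends UDF {
  private final SimpleDateFormat df = new SimpleDateFormat("MM-dd-yyyy");

  public String evaluate(Date bday) {
    if (bday == null) return null;
    Calendar cal = Calendar.getInstance();
    cal.setTime(bday);
    // Calendar.MONTH is 0-based; DAY_OF_MONTH is the day (Date.getDay() is the weekday).
    return evaluate(cal.get(Calendar.MONTH) + 1, cal.get(Calendar.DAY_OF_MONTH));
  }

  public String evaluate(String bday) {
    if (bday == null) return null;
    // Unparseable input becomes NULL.
    try { return evaluate(df.parse(bday)); } catch (Exception ex) { return null; }
  }

  public String evaluate(Integer month, Integer day) {
    if (month == null || day == null) return null;
    if (month == 1) return day < 20 ? "Capricorn" : "Aquarius";
    if (month == 2) return day < 19 ? "Aquarius" : "Pisces";
    // Remaining months are omitted on this slide.
    return null;
  }
}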
Custom Map/Reduce in Hive
HBase: Introduction to HBase
• HBase is a distributed column-oriented data store built on top of HDFS.
• HBase is an Apache open source project whose goal is to provide storage for
Hadoop distributed computing.
• Data is logically organized into tables, rows and columns
HBase vs. HDFS
• HDFS is good for batch processing (scans over big files)
• Not good for record lookup
• Not good for incremental addition of small batches
• Not good for updates
• HBase is designed to efficiently address the above points
• Fast record lookup
• Support for record-level insertion
• Support for updates (not in place)
Tables, Rows, Column family
• Table: HBase organizes data into tables. Table names are strings made up of
characters that are safe for use in a file system path.
• Row: Within a table, data is stored according to its row. Rows are identified
uniquely by their row key. Row keys do not have a data type and are always
treated as a byte[] (byte array).
• Column Family: Data within a row is grouped by column family. Column families
also impact the physical arrangement of data stored in HBase. For this reason,
they must be defined up front and are not easily modified. Every row in a table
has the same column families, although a row need not store data in all its
families. Column families are strings made up of characters that are safe for
use in a file system path.
• Column Qualifier: Data within a column family is addressed via its column
qualifier, or simply, column. Column qualifiers need not be specified in advance
and need not be consistent between rows. Like row keys, column
qualifiers do not have a data type and are always treated as a byte[].
• Cell: A combination of row key, column family, and column qualifier uniquely
identifies a cell. The data stored in a cell is referred to as that cell’s value. Values
also do not have a data type and are always treated as a byte[].
• Timestamp: Values within a cell are versioned. Versions are identified by their
version number, which by default is the timestamp of when the cell was written.
If a timestamp is not specified during a write, the current timestamp is used. If
the timestamp is not specified for a read, the latest version is returned. The number
of cell value versions retained by HBase is configured for each column family. The
default was three versions before HBase 0.96; newer releases default to one.
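The versioning rules above can be sketched as a small Java model of a single cell. `VersionedCell` is hypothetical, not the HBase client API, and `maxVersions` stands in for the column family's VERSIONS setting. One simplification: real HBase hides excess versions on read and removes them physically during compactions, whereas this sketch drops them immediately on write.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one cell's version history; not the HBase client API.
public class VersionedCell {
    // Timestamps sorted newest-first, so firstEntry() is the latest version.
    private final NavigableMap<Long, String> versions = new TreeMap<Long, String>(Collections.reverseOrder());
    private final int maxVersions; // stands in for the column family's VERSIONS setting

    public VersionedCell(int maxVersions) { this.maxVersions = maxVersions; }

    // Write with an explicit timestamp, then drop the oldest versions beyond the limit.
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) versions.pollLastEntry();
    }

    // Write without a timestamp: the current time is used.
    public void put(String value) { put(System.currentTimeMillis(), value); }

    // Read without a timestamp: the latest version is returned.
    public String get() {
        Map.Entry<Long, String> e = versions.firstEntry();
        return e == null ? null : e.getValue();
    }

    // Read "as of" a timestamp: with newest-first ordering, ceilingEntry finds
    // the newest version whose timestamp is <= asOf.
    public String get(long asOf) {
        Map.Entry<Long, String> e = versions.ceilingEntry(asOf);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        for (long ts = 1; ts <= 4; ts++) cell.put(ts, "v" + ts);
        System.out.println(cell.get());  // v4
        System.out.println(cell.get(2)); // v2
        System.out.println(cell.get(1)); // null: version 1 exceeded the limit
    }
}
```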
Column, Cell, Timestamp
Pictorial Representation
Representation as a Multi Dimensional Map
SortedMap<RowKey, List<SortedMap<Column, List<Value, Timestamp>>>>
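The nested map above can be modeled directly in Java. This is a simplified, hypothetical sketch (`HTableModel` is not an HBase class): keys are `String`s rather than `byte[]`, and every level is a sorted `TreeMap`, so rows, families, and qualifiers stay ordered and a range scan is simply a sub-map view. As with an HBase `Scan`, the start row is inclusive and the stop row exclusive.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of HBase's logical layout (not the HBase client API):
// row key -> column family -> column qualifier -> timestamp (newest first) -> value.
public class HTableModel {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
        new TreeMap<>();

    public void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(ts, value);
    }

    // Latest value of row/family:qualifier, or null if the cell does not exist.
    public String get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Row keys in [startRow, stopRow): start inclusive, stop exclusive.
    public Set<String> scanKeys(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false).keySet();
    }

    public static void main(String[] args) {
        HTableModel t = new HTableModel();
        t.put("row1", "info", "name", 1L, "Alice");
        t.put("row1", "info", "name", 2L, "Alicia");
        t.put("row2", "info", "city", 1L, "Pune");
        System.out.println(t.get("row1", "info", "name")); // Alicia
        System.out.println(t.scanKeys("row1", "row2"));    // [row1]
    }
}
```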
HBase Table as Key-Value Store
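Because row keys are plain byte arrays, an HBase table behaves like a key-value store sorted by key, with keys compared byte by byte as unsigned values. The Java sketch below reproduces that ordering with a hand-written comparator (`RowKeyOrder` is an illustrative name, not HBase's `Bytes` utility). It shows a common pitfall: numeric suffixes sort as text, so `row10` comes before `row2` unless keys are zero-padded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of lexicographic, unsigned byte ordering of row keys.
public class RowKeyOrder {
    // Compares keys byte by byte, treating each byte as unsigned (0..255);
    // a key that is a prefix of another sorts first.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    // Encodes keys as UTF-8, sorts them by the byte comparator, and decodes them again.
    public static List<String> sortKeys(String... keys) {
        List<byte[]> bytes = new ArrayList<>();
        for (String k : keys) bytes.add(k.getBytes(StandardCharsets.UTF_8));
        bytes.sort(RowKeyOrder::compare);
        List<String> out = new ArrayList<>();
        for (byte[] b : bytes) out.add(new String(b, StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) {
        // "row10" sorts before "row2" because '1' (0x31) < '2' (0x32).
        System.out.println(sortKeys("row2", "row10", "row1")); // [row1, row10, row2]
    }
}
```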
HBase Architecture
Client API: Administrative API
Client API: CRUD Operations put()
Client API: CRUD Operations get()
Client API: CRUD Operations delete()
HBase Clients
• Java Client
• Useful when the interacting application is written in Java.
• REST and Thrift
• HBase ships with REST and Thrift interfaces. These are useful when the
interacting application is written in a language other than Java.
HBase MapReduce Integration
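import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Counts the rows of an HBase table with a map-only MapReduce job.
public class SimpleRowCounter extends Configured implements Tool {

  static class RowCounterMapper extends TableMapper<ImmutableBytesWritable, Result> {
    public static enum Counters { ROWS }

    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context) {
      // One call per row: just increment the job counter.
      context.getCounter(Counters.ROWS).increment(1);
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: SimpleRowCounter <tablename>");
      return -1;
    }
    String tableName = args[0];
    Scan scan = new Scan();
    // Only the first cell of each row is needed to count it.
    scan.setFilter(new FirstKeyOnlyFilter());

    Job job = Job.getInstance(getConf(), getClass().getSimpleName());
    job.setJarByClass(getClass());
    TableMapReduceUtil.initTableMapperJob(tableName, scan,
        RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);                         // map-only job
    job.setOutputFormatClass(NullOutputFormat.class); // the count is in the counter, no output files
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(HBaseConfiguration.create(), new SimpleRowCounter(), args);
    System.exit(exitCode);
  }
}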
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 

Advance Hive, NoSQL Database (HBase) - Module 7

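  • 2. HiveQL: Data Manipulation
    Loading Data into Managed Tables
    • Hive (before the ACID transactional tables introduced in Hive 0.14) has no row-level insert, update, or delete operations.
    • Data can only be loaded into tables through "bulk" operations such as LOAD DATA. With LOCAL, the input path is on the local file system and its files are copied into the table's directory; without LOCAL, the path refers to the distributed file system and the files are moved.
      LOAD DATA LOCAL INPATH '/usr/hive/warehouse/california-employees'
      OVERWRITE INTO TABLE employees
      PARTITION (country = 'US', state = 'CA');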
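A LOAD into a partitioned table puts the files in one subdirectory per partition key, named key=value, below the table's directory. The sketch below derives that target directory, assuming the table lives under Hive's usual default warehouse location `/user/hive/warehouse`; the class `PartitionPaths` and its `partitionDir` helper are hypothetical, and the escaping Hive applies to special characters in partition values is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PartitionPaths {
    // Hypothetical helper: builds <warehouse>/<table>/<key1>=<value1>/<key2>=<value2>...
    // Keys must be given in the table's declared partition order, hence LinkedHashMap
    // (which preserves insertion order). Escaping of special characters is omitted.
    static String partitionDir(String warehouse, String table, LinkedHashMap<String, String> spec) {
        StringBuilder path = new StringBuilder(warehouse).append('/').append(table);
        for (Map.Entry<String, String> e : spec.entrySet()) {
            path.append('/').append(e.getKey()).append('=').append(e.getValue());
        }
        return path.toString();
    }
}
```

For the statement on this slide, the partition spec (country = 'US', state = 'CA') maps to `.../employees/country=US/state=CA`.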
  • 3. HiveQL: Data Manipulation
    Inserting Data into Tables from Queries
    • The INSERT statement loads data into a table from the results of a query.
    • With OVERWRITE, any previous contents of the partition (or of the whole table, if it is not partitioned) are replaced; INSERT INTO appends to them instead.
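To make the OVERWRITE versus INTO distinction concrete, here is a toy in-memory model in which a table is just a map from partition spec to rows. The class `ToyPartitionedTable` is hypothetical and says nothing about how Hive actually stores data (Hive writes files into partition directories); an unpartitioned table behaves like a single partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: partition spec (e.g. "country=US/state=CA") -> rows written to it.
class ToyPartitionedTable {
    private final Map<String, List<String>> partitions = new HashMap<>();

    // INSERT OVERWRITE ... PARTITION (...) SELECT ...: previous rows are replaced.
    void insertOverwrite(String partition, List<String> queryResult) {
        partitions.put(partition, new ArrayList<>(queryResult));
    }

    // INSERT INTO ... PARTITION (...) SELECT ...: rows are appended.
    void insertInto(String partition, List<String> queryResult) {
        partitions.computeIfAbsent(partition, k -> new ArrayList<>()).addAll(queryResult);
    }

    List<String> rows(String partition) {
        return partitions.getOrDefault(partition, List.of());
    }
}
```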
  • 4. HiveQL: Data Manipulation
    Dynamic Partition Inserts
    • With the dynamic partition feature, Hive infers which partitions to create from the values in the query's result rows, instead of requiring every partition to be named in the statement.
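In a dynamic-partition insert, the partition columns must come last in the SELECT list, in the same order as in the PARTITION (...) clause, and each distinct combination of their values becomes one partition. The feature is governed by `hive.exec.dynamic.partition`, and when every partition column is dynamic, `hive.exec.dynamic.partition.mode` must be `nonstrict`. A minimal sketch of the grouping step only, with a hypothetical `DynamicPartitioner` class standing in for what Hive does internally:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DynamicPartitioner {
    // Groups result rows by their trailing partition columns. partitionKeys must be in
    // the same order as those trailing columns; rows are assumed to have enough columns.
    static Map<String, List<List<String>>> group(List<String[]> rows, List<String> partitionKeys) {
        int k = partitionKeys.size();
        Map<String, List<List<String>>> byPartition = new TreeMap<>();
        for (String[] row : rows) {
            int dataCols = row.length - k;
            StringBuilder dir = new StringBuilder();
            for (int i = 0; i < k; i++) {
                if (i > 0) dir.append('/');
                dir.append(partitionKeys.get(i)).append('=').append(row[dataCols + i]);
            }
            // Only the non-partition columns end up in the data files.
            List<String> data = Arrays.asList(Arrays.copyOfRange(row, 0, dataCols));
            byPartition.computeIfAbsent(dir.toString(), d -> new ArrayList<>()).add(data);
        }
        return byPartition;
    }
}
```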
  • 5. HiveQL: Data Manipulation
    Creating Tables and Loading Them in One Query
    Exporting Data
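When query results are exported as text (for example with INSERT OVERWRITE LOCAL DIRECTORY), Hive's default text format separates fields with Ctrl-A (octal \001) and writes NULL as the two characters \N. The sketch below reproduces that line layout for a flat row so an exported file can be read or produced outside Hive; `HiveTextRow` is a hypothetical name, and collection and map delimiters (Ctrl-B, Ctrl-C) are not handled.

```java
import java.util.List;
import java.util.StringJoiner;

class HiveTextRow {
    // Defaults of Hive's text format: Ctrl-A between fields, NULL written as \N.
    static final String FIELD_DELIM = "\u0001";
    static final String NULL_TOKEN = "\\N";

    // Formats one flat row as a single line of an exported text file.
    static String format(List<String> fields) {
        StringJoiner line = new StringJoiner(FIELD_DELIM);
        for (String f : fields) {
            line.add(f == null ? NULL_TOKEN : f);
        }
        return line.toString();
    }
}
```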
  • 6. User Defined Functions
    • Hive can call user-defined functions (UDFs) written in Java to perform computations that would otherwise be difficult (or impossible) with the built-in Hive functions and SQL commands.
    • To invoke a UDF from within a Hive script, you must:
      1. register the JAR file that contains the UDF class (with ADD JAR), and
      2. define an alias for the function with the CREATE TEMPORARY FUNCTION command.
  • 8. Example UDF
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import org.apache.hadoop.hive.ql.exec.UDF;

    public class UDFZodiacSign extends UDF {
        private final SimpleDateFormat df = new SimpleDateFormat("MM-dd-yyyy");

        // java.util.Date.getMonth() is zero-based and getDay() is the day of the week,
        // so use getMonth() + 1 and getDate() (the day of the month).
        public String evaluate(Date bday) {
            return evaluate(bday.getMonth() + 1, bday.getDate());
        }

        public String evaluate(String bday) {
            try {
                return evaluate(df.parse(bday));
            } catch (Exception ex) {
                return null; // unparseable input becomes NULL in Hive
            }
        }

        public String evaluate(Integer month, Integer day) {
            if (month == 1) { return day < 20 ? "Capricorn" : "Aquarius"; }
            if (month == 2) { return day < 19 ? "Aquarius" : "Pisces"; }
            return null; // remaining months are not shown on the slide
        }
    }
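The date handling is the error-prone part of this UDF: java.util.Date's getMonth() is zero-based and getDay() returns the day of the week, not the day of the month. A minimal sketch of the same January/February rules on top of java.time, which has no such pitfalls and can be unit-tested without Hive; `ZodiacSketch` is a hypothetical class, not part of the slide's UDF.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class ZodiacSketch {
    // Same MM-dd-yyyy input format as the UDF.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("MM-dd-yyyy");

    static String sign(String bday) {
        if (bday == null) return null;
        LocalDate date;
        try {
            date = LocalDate.parse(bday, FMT);
        } catch (DateTimeParseException ex) {
            return null; // mirrors the UDF returning NULL for bad input
        }
        // getMonthValue() is 1-12 and getDayOfMonth() is 1-31.
        return sign(date.getMonthValue(), date.getDayOfMonth());
    }

    static String sign(int month, int day) {
        if (month == 1) return day < 20 ? "Capricorn" : "Aquarius";
        if (month == 2) return day < 19 ? "Aquarius" : "Pisces";
        return null; // other months are not covered, as on the slide
    }
}
```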
  • 10. HBase: Introduction to HBase
    • HBase is a distributed, column-oriented data store built on top of HDFS.
    • HBase is an Apache open-source project whose goal is to provide storage for Hadoop distributed computing.
    • Data is logically organized into tables, rows, and columns.
  • 11. HBase vs. HDFS
    • HDFS is good for batch processing (scans over big files), but it is not good for:
      • record lookup
      • incremental addition of small batches
      • updates
    • HBase is designed to address these points efficiently:
      • fast record lookup
      • record-level insertion
      • updates (written as new cell versions rather than in place)
  • 12. Tables, Rows, Column Families
    • Table: HBase organizes data into tables. Table names are strings composed of characters that are safe for use in a file system path.
    • Row: Within a table, data is stored by row. Rows are identified uniquely by their row key. Row keys have no data type and are always treated as a byte[] (byte array).
    • Column family: Data within a row is grouped by column family. Column families also determine how data is physically arranged in HBase, so they must be defined up front and are not easily modified. Every row in a table has the same column families, although a row need not store data in all of them. Column family names are strings composed of characters that are safe for use in a file system path.
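Because row keys are plain byte arrays, HBase keeps rows sorted lexicographically by their unsigned bytes, so "row10" sorts before "row2". The sketch below uses a `TreeMap` with an unsigned byte comparator as a stand-in for a table, plus a range scan with an inclusive start row and exclusive stop row; `RowKeyOrder` is a hypothetical name and this is not the HBase client API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

class RowKeyOrder {
    // Row keys compared byte by byte as unsigned values (Java 9+).
    static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;

    static NavigableMap<byte[], String> newTable() {
        return new TreeMap<>(ROW_ORDER);
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Rows in [start, stop): start row inclusive, stop row exclusive, as in an HBase scan.
    static NavigableMap<byte[], String> scan(NavigableMap<byte[], String> table, byte[] start, byte[] stop) {
        return table.subMap(start, true, stop, false);
    }
}
```

The sorted order is also why a well-designed row key matters: rows that are read together should share a key prefix so that they sit next to each other.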
  • 13. Column, Cell, Timestamp
    • Column qualifier: Data within a column family is addressed by its column qualifier, or simply column. Column qualifiers need not be specified in advance and need not be consistent between rows. Like row keys, column qualifiers have no data type and are always treated as a byte[].
    • Cell: The combination of row key, column family, and column qualifier uniquely identifies a cell. The data stored in a cell is that cell's value. Values also have no data type and are always treated as a byte[].
    • Timestamp: Values within a cell are versioned. Versions are identified by their version number, which by default is the timestamp of when the cell was written. If no timestamp is specified on a write, the current time is used; if none is specified on a read, the latest version is returned. The number of versions HBase retains is configured per column family; the default was three in older releases and has been one since HBase 0.96.
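The versioning rules above can be modeled for a single cell with a map from timestamp to value, ordered newest first and capped at a per-family maximum. This is a simplified, hypothetical `VersionedCell` class: real HBase prunes excess versions lazily during compactions rather than on every write.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class VersionedCell {
    // Versions keyed by timestamp, newest first.
    private final TreeMap<Long, String> versions = new TreeMap<Long, String>(Comparator.reverseOrder());
    private final int maxVersions; // in HBase this is a per-column-family setting

    VersionedCell(int maxVersions) { this.maxVersions = maxVersions; }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // drop the oldest version beyond the limit
        }
    }

    // No timestamp given: use the current time, as HBase does.
    void put(String value) { put(System.currentTimeMillis(), value); }

    String getLatest() {
        Map.Entry<Long, String> e = versions.firstEntry();
        return e == null ? null : e.getValue();
    }

    // Newest version with timestamp <= ts (ceiling in reverse order).
    String getAsOf(long ts) {
        Map.Entry<Long, String> e = versions.ceilingEntry(ts);
        return e == null ? null : e.getValue();
    }
}
```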
  • 15. Representation as a Multidimensional Map
    SortedMap<RowKey, SortedMap<ColumnFamily, SortedMap<ColumnQualifier, SortedMap<Timestamp, Value>>>>
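The nested-map view translates directly into Java collections: row key, then column family, then column qualifier, then timestamp (newest first), then value. A minimal in-memory sketch with a hypothetical `ToyHTable` class, treating row keys, qualifiers, and values as byte[] and family names as strings; it only illustrates the data model, not HBase's storage or client API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

class ToyHTable {
    private static final Comparator<byte[]> BYTES = Arrays::compareUnsigned;
    private static final Comparator<Long> NEWEST_FIRST = Comparator.reverseOrder();

    // row -> column family -> column qualifier -> timestamp (newest first) -> value
    private final NavigableMap<byte[], NavigableMap<String, NavigableMap<byte[], NavigableMap<Long, byte[]>>>> rows =
            new TreeMap<>(BYTES);

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    void put(byte[] row, String family, byte[] qualifier, long ts, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>(BYTES))
            .computeIfAbsent(qualifier, q -> new TreeMap<>(NEWEST_FIRST))
            .put(ts, value);
    }

    // Newest value of one cell, or null if the cell does not exist.
    byte[] get(byte[] row, String family, byte[] qualifier) {
        NavigableMap<String, NavigableMap<byte[], NavigableMap<Long, byte[]>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<byte[], NavigableMap<Long, byte[]>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }
}
```

Because the maps compare byte[] keys by content, a lookup works with a freshly created array for the same key.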
  • 16. HBase Table as Key-Value Store
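Viewed as a key-value store, each cell is one entry whose key is (row, family, qualifier, timestamp). HBase keeps these keys sorted by row, family, and qualifier ascending, then timestamp descending, so the newest version of a column comes first. A sketch of that ordering with a hypothetical `CellKey` class; HBase's real cells also carry a type (for example Put or Delete), which is omitted here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

final class CellKey {
    static final Comparator<byte[]> BYTES = Arrays::compareUnsigned;

    // Row, family, qualifier ascending (unsigned bytes), then timestamp descending.
    static final Comparator<CellKey> ORDER = Comparator
            .comparing((CellKey k) -> k.row, BYTES)
            .thenComparing(k -> k.family, BYTES)
            .thenComparing(k -> k.qualifier, BYTES)
            .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    final byte[] row, family, qualifier;
    final long timestamp;

    CellKey(byte[] row, byte[] family, byte[] qualifier, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }

    static CellKey of(String row, String family, String qualifier, long timestamp) {
        return new CellKey(row.getBytes(StandardCharsets.UTF_8), family.getBytes(StandardCharsets.UTF_8),
                qualifier.getBytes(StandardCharsets.UTF_8), timestamp);
    }
}
```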
  • 19. Client API: CRUD Operations put()
  • 20. Client API: CRUD Operations get()
  • 21. Client API: CRUD Operations delete()
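In the HBase 1.0+ Java client these operations are Table.put(Put), Table.get(Get), and Table.delete(Delete). A delete does not erase data right away: it writes a delete marker (tombstone), reads skip the cells it masks, and the masked data is physically removed at the next major compaction. A simplified, hypothetical single-column model of that behavior (class `TombstoneColumn`), loosely following the "delete all versions up to a timestamp" case:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class TombstoneColumn {
    private static final Comparator<Long> NEWEST_FIRST = Comparator.reverseOrder();
    private final TreeMap<Long, String> versions = new TreeMap<>(NEWEST_FIRST);
    private long deletedUpTo = Long.MIN_VALUE; // no tombstone yet

    void put(long ts, String value) { versions.put(ts, value); }

    // delete(): only records a marker; nothing is removed yet.
    void delete(long ts) { deletedUpTo = Math.max(deletedUpTo, ts); }

    // get(): newest version not masked by the tombstone.
    String get() {
        Map.Entry<Long, String> e = versions.firstEntry();
        return (e == null || e.getKey() <= deletedUpTo) ? null : e.getValue();
    }

    int storedVersions() { return versions.size(); }

    // Major compaction: physically drop versions at or before the tombstone.
    // With newest-first ordering, tailMap holds the numerically smaller timestamps.
    void majorCompact() { versions.tailMap(deletedUpTo, true).clear(); }
}
```

One consequence of this design: a later put whose timestamp is at or before the tombstone stays hidden until the tombstone is compacted away.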
  • 22. HBase Clients
    • Java client: useful when the application interacting with HBase is written in Java.
    • REST and Thrift: HBase ships with REST and Thrift interfaces, which are useful when the application is written in a language other than Java.
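  • 23. HBase MapReduce Integration
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Counts the rows of an HBase table with a map-only job.
    public class SimpleRowCounter extends Configured implements Tool {

        static class RowCounterMapper extends TableMapper<ImmutableBytesWritable, Result> {
            public static enum Counters { ROWS }

            // Called once per row; the mapper only increments a job counter.
            @Override
            public void map(ImmutableBytesWritable row, Result value, Context context) {
                context.getCounter(Counters.ROWS).increment(1);
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 1) {
                System.err.println("Usage: SimpleRowCounter <tablename>");
                return -1;
            }
            String tableName = args[0];
            Scan scan = new Scan();
            // Fetch only the first cell of each row: enough to count it, with far less data.
            scan.setFilter(new FirstKeyOnlyFilter());

            Job job = Job.getInstance(getConf(), getClass().getSimpleName());
            job.setJarByClass(getClass());
            TableMapReduceUtil.initTableMapperJob(tableName, scan,
                    RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
            job.setNumReduceTasks(0);                          // map-only job
            job.setOutputFormatClass(NullOutputFormat.class);  // the result is the ROWS counter
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(HBaseConfiguration.create(), new SimpleRowCounter(), args);
            System.exit(exitCode);
        }
    }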