Dancing With The Elephant
We will discuss
• Introduction to Hadoop
• HBase: Definition, Storage Model, Use Cases
• Basic Data Access from shell
• Hands-on with HBase API
What is Hadoop
• Framework for distributed processing of large datasets (Big Data)
• HDFS + MapReduce
• HDFS (data):
  - Distributed filesystem responsible for storing data across the cluster
  - Provides replication on cheap commodity hardware
  - NameNode and DataNode processes
• MapReduce (processing):
  - May be covered in a future session
HBase: What
• A sparse, distributed, persistent, multidimensional sorted map (as defined in Google's Bigtable paper)
• Distributed NoSQL database built on top of HDFS
RDBMS Woes (with massive data)
• Scaling is hard and expensive
• Relational features and secondary indexes have to be turned off to scale
• Quick reads are hard at larger table sizes (500 GB)
• Single points of failure
• Schema changes
HBase: Why
• Scalable: just add nodes as your data grows
• Distributed: leverages the advantages of Hadoop's HDFS
• Built on top of Hadoop: being part of the ecosystem, it can be integrated with multiple tools
• High performance for reads and writes
  - Short-circuit reads
  - Single reads: 1 to 10 ms; scans: 100s of rows in 10 ms
• Schemaless (only column families are declared up front)
• Production-ready where data is on the order of petabytes
HBase: Storage Model 1
HTable
• Tables are split into regions
• Region: data with a contiguous range of row keys, [start, end), in sorted order
• Regions split as the table grows (region size is configurable)
• The table schema defines column families
• (Table, RowKey, ColumnFamily, ColumnName, Timestamp) → Value
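The half-open [start, end) ranges above can be sketched with a sorted map keyed by region start key. This is only an illustration of the lookup idea, not HBase's actual implementation: the class RegionLookup, the region names and the split points are all hypothetical, and Java String ordering stands in for HBase's byte-wise row-key ordering.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: maps each region's start key to a made-up region name.
final class RegionLookup {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    // Registers a region that begins at startKey (inclusive) and runs
    // until the next region's start key (exclusive).
    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // Returns the region whose [start, end) range contains rowKey,
    // or null if rowKey sorts before the first region.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookup table = new RegionLookup();
        table.addRegion("", "region-1");   // ["", "g")
        table.addRegion("g", "region-2");  // ["g", "p")
        table.addRegion("p", "region-3");  // ["p", end of table)

        System.out.println(table.regionFor("apple"));  // region-1
        System.out.println(table.regionFor("g"));      // region-2 (start key is inclusive)
        System.out.println(table.regionFor("zebra"));  // region-3
    }
}
```

Because only start keys are stored, splitting a region in this sketch amounts to adding one more start key.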
HTable (Data Structure)
• SortedMap(
    RowKey, List(
      SortedMap(
        Column, List(
          Value, Timestamp
        )
      )
    )
  )
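One way to picture the nested map is a toy in-memory model built from java.util.TreeMap. This is a simplified sketch under stated assumptions: the class ToyTable and its method names are hypothetical, the column family and qualifier are folded into one "family:qualifier" string, and nothing here reflects how HBase lays data out on disk.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Toy model: rowKey -> "family:qualifier" -> timestamp (newest first) -> value
final class ToyTable {
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    // Stores one version of a cell; older versions are kept alongside it.
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column,
                k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest value of a cell, or null if the cell is absent.
    String getLatest(String rowKey, String column) {
        TreeMap<Long, String> versions = versionsOf(rowKey, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Returns the newest value written at or before the timestamp, or null.
    String getAsOf(String rowKey, String column, long timestamp) {
        TreeMap<Long, String> versions = versionsOf(rowKey, column);
        if (versions == null) return null;
        // In descending order, ceilingEntry finds the largest timestamp <= the argument.
        Map.Entry<Long, String> entry = versions.ceilingEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    private TreeMap<Long, String> versionsOf(String rowKey, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("row1", "colfam1:qual1", 1L, "old");
        table.put("row1", "colfam1:qual1", 5L, "new");
        System.out.println(table.getLatest("row1", "colfam1:qual1"));   // new
        System.out.println(table.getAsOf("row1", "colfam1:qual1", 3L)); // old
    }
}
```

The model is sparse in the same sense as the slide: a row only holds entries for the columns that were actually written.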
HBase: Data Read/Write
• Get: Random read
• Scan: Sequential read
• Put: Write/Update
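The three operations can be mimicked on a plain sorted map to show why Get is a point lookup and Scan is a range read. This is a rough stand-in rather than the HBase client: the class SortedStore is hypothetical, and String ordering approximates HBase's lexicographic byte ordering of row keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative stand-in for a table with one value per row key.
final class SortedStore {
    private final TreeMap<String, String> store = new TreeMap<>();

    // Put: writes a new row or overwrites an existing one.
    void put(String rowKey, String value) {
        store.put(rowKey, value);
    }

    // Get: random read of a single row key; null if absent.
    String get(String rowKey) {
        return store.get(rowKey);
    }

    // Scan: sequential read of the row keys in [startRow, stopRow).
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(store.subMap(startRow, stopRow).keySet());
    }

    public static void main(String[] args) {
        SortedStore table = new SortedStore();
        table.put("row1", "a");
        table.put("row2", "b");
        table.put("row10", "c");
        table.put("row3", "d");
        table.put("row1", "a2");  // same key again: an update

        System.out.println(table.get("row1"));           // a2
        System.out.println(table.scan("row1", "row3"));  // [row1, row10, row2]
    }
}
```

Note that "row10" sorts before "row2": keys are compared lexicographically, so numeric parts of a row key are usually zero-padded when numeric order matters.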
HBase: Data Access Clients
• Demo of HBase shell
• Java API
HBase: API
• Connection
• DDL
• DML
• Filters
• Hands-On
HBase: API
• Configuration: holds the details of where to find the cluster, plus tunable settings.
• HConnection: represents a connection to the cluster.
• HBaseAdmin: handles DDL operations (create, list, drop, alter).
• HTable (HTableInterface): a handle on a single HBase table. Sends commands to the table (Put, Get, Scan, Delete, Increment).
HBase: API:DDL
Group name: ddl (Data Definition Language)
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
HBase: API:DDL
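import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Load hbase-site.xml from the classpath, then override the master address
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.master", "localhost:60010");
HBaseAdmin hbase = new HBaseAdmin(conf);
// Describe a table with two column families
HTableDescriptor desc = new HTableDescriptor("testtable");
HColumnDescriptor meta = new HColumnDescriptor("colfam1".getBytes());
HColumnDescriptor prefix = new HColumnDescriptor("colfam2".getBytes());
desc.addFamily(meta);
desc.addFamily(prefix);
hbase.createTable(desc);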
HBase: API:DML
Group name: dml (Data Manipulation Language)
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
HBase: API:DML PUT
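import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// conf is the Configuration created in the DDL example
HTable table = new HTable(conf, "testtable");
Put put = new Put(Bytes.toBytes("row1"));
// Two cells in the same row and column family
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val2"));
table.put(put);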
HBase: API:DML GET
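import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
Get get = new Get(Bytes.toBytes("row1"));
// Restrict the read to a single column
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
Result result = table.get(get);
// getValue returns null if the cell does not exist
byte[] val = result.getValue(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
System.out.println("Value: " + Bytes.toString(val));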
HBase: API:DML SCAN
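import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

// An empty Scan reads every row of the table in row-key order
Scan scan1 = new Scan();
ResultScanner scanner1 = table.getScanner(scan1);
for (Result res : scanner1) {
    System.out.println(res);
}
scanner1.close();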
Other Projects around HBase
• SQL Layer: Phoenix, Hive, Impala
• Object Persistence: Lily, Kundera
Follow-Up
• Part 2:
  - Building a key-value data store in HBase
  - Challenges we faced in SMART
• {Rahul, vinay}@briotribes.com
Shoutout To
HBase: Use Case (Facebook)
• Facebook Messaging:
  - Titan
  - 1.5 M ops per second at peak
  - 6B+ messages per day
  - 16 columns per operation across different families
• Facebook Insights:
  - Puma
  - Provides developers and Page owners with metrics about their content
  - > 1 M counter increments per second