2. We will discuss
• Introduction to Hadoop
• HBase: Definition, Storage Model, Use Cases
• Basic Data Access from the Shell
• Hands-on with the HBase API
3. What is Hadoop?
• Framework for distributed processing of large datasets (Big Data)
• HDFS + MapReduce
• HDFS (Data):
Distributed filesystem responsible for storing data across the cluster
Provides replication on cheap commodity hardware
NameNode and DataNode processes
• MapReduce (Processing):
May be a future session
4. HBase: What
• A sparse, distributed, persistent, multidimensional, sorted
map (as defined by Google’s BigTable paper)
• A distributed NoSQL database designed on top of HDFS
5. RDBMS Woes (with massive data)
• Scaling is hard and expensive
• Relational features/secondary indexes must be turned off to scale
• Quick reads are hard at larger table sizes (500 GB+)
• Single points of failure
• Schema changes are painful
6. HBase: Why
• Scalable: just add nodes as your data grows
• Distributed: leverages Hadoop HDFS’s advantages
• Built on top of Hadoop: being part of the
ecosystem, it can be integrated with multiple tools
• High performance for reads/writes
Short-circuit reads
Single reads: 1 to 10 ms; scans: 100s of rows in 10 ms
• Schema-less
• Production-ready where data is on the order of petabytes
8. HTable
• Tables are split into regions
• Region: data with a contiguous range of RowKeys,
[start, end), in sorted order
• Regions split as the table grows (region size can be
configured)
• Table schema defines the column families
• (Table, RowKey, ColumnFamily, ColumnName, Timestamp) → Value
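The map model above can be sketched with plain Java collections (a conceptual illustration only; the class name `SortedMapModel` is made up, and this is not the HBase API):

```java
import java.util.Comparator;
import java.util.TreeMap;

// Conceptual model: RowKey -> (Family:Qualifier -> (Timestamp -> Value)),
// every level kept sorted -- the "sorted map" from the BigTable paper.
public class SortedMapModel {
    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, TreeMap<Long, String>>> table = new TreeMap<>();

        // Write the cell (row1, colfam1:qual1, t=100) = "val1".
        // Timestamps sort newest-first, like HBase versions.
        table.computeIfAbsent("row1", r -> new TreeMap<>())
             .computeIfAbsent("colfam1:qual1",
                     c -> new TreeMap<>(Comparator.reverseOrder()))
             .put(100L, "val1");
        table.computeIfAbsent("row9", r -> new TreeMap<>());

        // Rows come back in sorted RowKey order; a region holds one
        // contiguous slice [start, end) of this key space.
        System.out.println(table.firstKey()); // row1
    }
}
```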
13. HBase: API
• Configuration: holds details of where to find the cluster
and tunable settings.
• HConnection: represents a connection to the cluster.
• HBaseAdmin: handles DDL
operations (create, list, drop, alter).
• HTable (HTableInterface): a handle on a single HBase
table. Sends “commands” to the table (Put, Get, Scan,
Delete, Increment)
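The DDL side can be sketched with the classic (pre-1.0) client API these slides use; the table and family names are just examples, and a running cluster reachable via the default Configuration is assumed:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // The schema only declares column families; column
        // qualifiers inside a family are free-form.
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("testtable"));
        desc.addFamily(new HColumnDescriptor("colfam1"));

        if (!admin.tableExists("testtable")) {
            admin.createTable(desc);                    // DDL: create
        }
        for (HTableDescriptor t : admin.listTables()) { // DDL: list
            System.out.println(t.getNameAsString());
        }
        admin.close();
    }
}
```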
17. HBase: API: DML PUT
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
    Bytes.toBytes("val1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"),
    Bytes.toBytes("val2"));
table.put(put);
18. HBase: API: DML GET
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
Get get = new Get(Bytes.toBytes("row1"));
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
Result result = table.get(get);
byte[] val = result.getValue(Bytes.toBytes("colfam1"),
    Bytes.toBytes("qual1"));
System.out.println("Value: " + Bytes.toString(val));
19. HBase: API: DML SCAN
Scan scan1 = new Scan();
ResultScanner scanner1 = table.getScanner(scan1);
for (Result res : scanner1) {
    System.out.println(res);
}
scanner1.close();
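Delete and Increment are also listed among the table commands; a sketch in the same style, assuming the `conf` and "testtable" setup from the previous slides:

```java
// Assumes conf and "testtable" exist as in the Put/Get examples.
HTable table = new HTable(conf, "testtable");

// Delete: remove all versions of one column; with no column
// named, the whole row is deleted.
Delete del = new Delete(Bytes.toBytes("row1"));
del.deleteColumns(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
table.delete(del);

// Increment: atomic server-side counter update; returns the new value.
long count = table.incrementColumnValue(Bytes.toBytes("row2"),
    Bytes.toBytes("colfam1"), Bytes.toBytes("counter"), 1L);
System.out.println("Counter: " + count);
table.close();
```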
20. Other Projects around HBase
• SQL Layer: Phoenix, Hive, Impala
• Object Persistence: Lily, Kundera
21. Follow-Up
• Part 2:
Building a key-value data store in HBase
Challenges we faced in SMART
• {Rahul, vinay}@briotribes.com
23. HBase: Use Case (Facebook)
• Facebook Messaging:
Titan
1.5 M ops per second at peak
6B+ messages per day
16 columns per operation across different families
• Facebook Insights:
Puma
Provides developers and Page owners with metrics about their
content
> 1 M counter increments per second