3. Motivations
Huge data volumes
• Total data volume: several PB per system
• Daily data volume: several TB per system
• Longer retention periods: several months
• Big potential: 200% increase in some areas
Multiple application areas
• BOSS, BI, NMS, Internet, ...
Data warehouse requirements
• Scalable
• Highly available
• Reliable
• SQL support
• Fast index query
• Affordable
• Multiple application support
• Sensitive data
• CRUD support
• Statistics & reporting
Approach: traditional application model + data integration + application solution
4. Hadoop: Raw Techniques
HDFS: distributed file system with fault tolerance
MapReduce: parallel programming environment over HDFS
Similar to the situation of POSIX API + local FS
High-level toolkits have been initiated:
• Yahoo: Pig / Pig Latin
• Business.com: CloudBase (Hadoop + JDBC)
• China Mobile: BC-PDM
• Facebook: Hive (SQL)
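The MapReduce model that these toolkits compile down to can be illustrated without Hadoop itself: map emits key/value pairs, the framework groups them by key, and reduce folds each group. A minimal plain-Java sketch (names here are illustrative, not Hadoop's API):

```java
import java.util.*;

// Minimal illustration of the MapReduce programming model (no Hadoop
// dependency): map emits (word, 1) pairs, the "shuffle" groups by key,
// and reduce sums each group. Hadoop runs the same phases in parallel
// across HDFS blocks on many nodes.
public class WordCount {
    // Map phase: one input line -> zero or more (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return out;
    }

    // Shuffle + reduce: group pairs by key, then sum each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines)
            for (Map.Entry<String, Integer> kv : map(line))
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int v : g.getValue()) sum += v;
            result.put(g.getKey(), sum);
        }
        return result;
    }
}
```

This is exactly the seam the high-level toolkits exploit: they translate a query language into chains of such map/reduce pairs.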
5. Hive: A Petabyte-Scale Data Warehouse
Features:
• Schema support
• Pluggable storage engine I/F
• SQL-to-MapReduce translation
• xDBC driver
• Tools: HQL console
• Admin: HWI (Hive Web Interface)
Usage scenarios:
• Reporting
• Ad hoc analysis
• Machine learning
• Others
  • Log analysis
  • Trend detection
Facebook runs huge clusters: >1000 nodes
Source: ICDE 2010 / Facebook
6. HBase: Structured Storage of Sparse Data for Hadoop
Features:
• Column families
• ACID
• Optimized R/W
• BigTable I/F + BU
• Tools: HBase shell
• Admin: Jetty-based web UI
Usage scenarios:
• Social services
• MapReduce analysis
• Content repository (wiki, RSS)
• Near-realtime reporting & analytics
• Storing web pages
• … replacing SQL systems
Source: ApacheCon 2009 / HBase
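The "sparse" structured storage HBase provides can be pictured as a nested sorted map: row key → column family → qualifier → value, where absent cells simply take no space. A minimal plain-Java sketch of that data model (illustration only, not the HBase API):

```java
import java.util.*;

// Sketch of the BigTable/HBase data model: a sparse, sorted,
// multi-dimensional map. Only cells actually written consume storage,
// so different rows may carry completely different column sets.
public class SparseTable {
    // rowKey -> columnFamily -> qualifier -> value
    private final NavigableMap<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    public void put(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new HashMap<>())
            .computeIfAbsent(family, f -> new HashMap<>())
            .put(qualifier, value);
    }

    public String get(String row, String family, String qualifier) {
        Map<String, Map<String, String>> r = rows.get(row);
        if (r == null) return null;
        Map<String, String> f = r.get(family);
        return f == null ? null : f.get(qualifier);
    }

    // Rows are kept sorted by key, which is what makes range scans cheap.
    public SortedMap<String, Map<String, Map<String, String>>> scan(String start, String stop) {
        return rows.subMap(start, true, stop, false);
    }
}
```

The sorted outer map is the essential property: row-key ordering is what HBase's region splits and range scans rely on.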
7. HugeTable: Application-Oriented Structured Data Storage System
HugeTable addresses the missing blocks:
• Index store & query optimizer
• Client I/Fs
• Access control list
• HFile w/ index
• Insert, Update and Delete
• CF store
• Tools and web-based administration (data, config, FM, log, perf)
Build solutions for telco applications:
• Network Management System – NMS
• Value-Added System – VAS
• Business Intelligence – BI
• Other areas
8. A Brief History of HugeTable
HT-p1 (2008):
1. HBase-based
2. Partial xDBC/SQL support
3. Integration of HBase with ZK before official release
4. Secondary index
5. Schema support
6. ACL support
7. SQL console
HT-p2 (2009):
1. Connect Hive with HBase
2. Global indexing
3. Support HFile, CF in Hive
4. Secondary index
5. Multiple DB support
6. ACL support
7. MR & Scan I/F
8. Loader tools, HT-Client
9. Admin portal
10. JDBC remote console
HT-p3 (2010):
1. Move to higher versions of Hive, Hadoop and HBase
2. New storage engine
3. Fruitful external I/Fs
4. Application solution
5. Many other improvements
10. HBase as HugeTable Index Store
DDL: Create Index, Drop Index
DML: Select … using index xxx; Select … where idxcol
Flow:
• Meta Data: find the index for a statement
• Query Engine: find index, read index, check index
• Load Service / HT Loader: write the index during load
• The index data itself is stored in HBase
11. Index Store Implementation
• Primary index: index into the data file
• Secondary index: index into the primary index
• Exact match and range scan
• Integrated with Hive QL and other modules
Benchmark (20 nodes, 1 TB/node):

                     Hive                     HT-p1                        HT-p2
Memory consumption   No additional cost       8 GB/node·TB                 2 GB/node·TB
Load speed           20 MB/s·node (no index)  2.5 MB/s·node (primary idx)  >5 MB/s·node (primary idx)
Index query          N/A                      <10 sec                      <10 sec
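The two index levels described above can be sketched as sorted maps: the primary index maps a row key to its position in the data file, and a secondary index maps an attribute value back to row keys, so a secondary lookup resolves through the primary index. Sorted maps give both exact match and range scan. A plain-Java illustration of the scheme (not HugeTable's implementation, which keeps these indexes in HBase):

```java
import java.util.*;

// Sketch of a two-level index: the primary index points into the data
// file, and the secondary index points into the primary index.
public class IndexStore {
    private final List<String> dataFile = new ArrayList<>();                     // records by offset
    private final NavigableMap<String, Integer> primary = new TreeMap<>();       // key -> offset
    private final NavigableMap<String, List<String>> secondary = new TreeMap<>(); // attr -> keys

    public void load(String key, String attr, String record) {
        dataFile.add(record);
        primary.put(key, dataFile.size() - 1);
        secondary.computeIfAbsent(attr, a -> new ArrayList<>()).add(key);
    }

    // Exact match through the primary index.
    public String getByKey(String key) {
        Integer off = primary.get(key);
        return off == null ? null : dataFile.get(off);
    }

    // Exact match through a secondary index: attr -> keys -> primary -> data.
    public List<String> getByAttr(String attr) {
        List<String> out = new ArrayList<>();
        for (String key : secondary.getOrDefault(attr, Collections.<String>emptyList()))
            out.add(getByKey(key));
        return out;
    }

    // Range scan over the sorted primary index.
    public List<String> scan(String fromKey, String toKey) {
        List<String> out = new ArrayList<>();
        for (int off : primary.subMap(fromKey, true, toKey, false).values())
            out.add(dataFile.get(off));
        return out;
    }
}
```

The memory figures in the table make sense in this light: HT-p2's leaner index representation halves the per-TB footprint versus HT-p1 while doubling load speed.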
12. HugeTable IUD Support
Goal: support Insert, Update and Delete on application data.
Flow:
• IUD statement: the Query Engine finds the IUD table via Meta Data and writes the IUD data to an IUD table in HBase
• Select: reads merge the base HT data (HDFS) with the IUD data (HBase)
• Offline Merger: periodically folds the IUD table back into the HDFS data
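The read path implied by this design is merge-on-read: base data stays immutable in HDFS, IUD statements land in a delta table, a select consults the delta first, and the offline merger periodically compacts deltas into a new base. A minimal sketch under those assumptions (class and method names are illustrative):

```java
import java.util.*;

// Sketch of merge-on-read over an immutable base: updates and deletes go
// to a delta ("IUD") table; reads overlay the delta on the base; an
// offline merger compacts the delta back into the base file.
public class IudTable {
    private Map<String, String> base = new HashMap<>();        // stand-in for immutable HDFS data
    private final Map<String, String> delta = new HashMap<>(); // stand-in for the HBase IUD table
    private static final String TOMBSTONE = null;              // delete marker

    public void insertOrUpdate(String key, String value) { delta.put(key, value); }
    public void delete(String key) { delta.put(key, TOMBSTONE); }

    // Select: the delta wins over the base; a tombstone hides the base record.
    public String get(String key) {
        if (delta.containsKey(key)) return delta.get(key);
        return base.get(key);
    }

    // Offline merger: fold the delta into a fresh base and clear it.
    public void merge() {
        Map<String, String> next = new HashMap<>(base);
        for (Map.Entry<String, String> e : delta.entrySet()) {
            if (e.getValue() == TOMBSTONE) next.remove(e.getKey());
            else next.put(e.getKey(), e.getValue());
        }
        base = next;
        delta.clear();
    }

    int deltaSize() { return delta.size(); }
}
```

The design choice is the usual one for append-only file systems: HDFS files cannot be updated in place, so mutations accumulate in a random-access store (HBase) until a batch rewrite is worthwhile.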
13. HugeTable Access Control
Goal: support multiple users from multiple applications, w/o mutual trust.
Database privileges:
1. Meta data: Index, Create, Drop
2. User data: IUD
User access levels:
1. System administrator
2. User manager
3. User
Flow: DDL/DML and Loader/Portal operations pass through Grant/Revoke and privilege checks in the ACL module, which guards the meta data.
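The grant/revoke flow above amounts to an ACL check interposed between the DDL/DML layer and the metadata: every statement asks whether the user holds the needed privilege on the object. A minimal sketch (illustrative names, not HugeTable's code):

```java
import java.util.*;

// Sketch of an ACL module: privileges are granted per (user, object),
// and every DDL/DML call checks here before touching meta or user data.
public class AclModule {
    public enum Privilege { INDEX, CREATE, DROP, INSERT, UPDATE, DELETE }

    // "user@object" -> granted privileges
    private final Map<String, Set<Privilege>> grants = new HashMap<>();

    public void grant(String user, String object, Privilege p) {
        grants.computeIfAbsent(user + "@" + object,
                k -> EnumSet.noneOf(Privilege.class)).add(p);
    }

    public void revoke(String user, String object, Privilege p) {
        Set<Privilege> s = grants.get(user + "@" + object);
        if (s != null) s.remove(p);
    }

    public boolean check(String user, String object, Privilege p) {
        return grants.getOrDefault(user + "@" + object,
                EnumSet.noneOf(Privilege.class)).contains(p);
    }
}
```

Keeping the check in one module matches the slide's "w/o mutual trust" requirement: no application path reaches the data without passing through it.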
14. Administration Portal
Goal: unified HugeTable management point, decreasing management effort.
• Data management: DB/TBL/IDX
• User management: add/delete/modify
• Monitor & FM: log/alert/service
• Configuration: deploy/setup
15. HugeTable Application API
Various kinds of applications can choose among three APIs:
JDBC/SQL API:
• Migration of traditional database applications
• For SQL developers
• Batch processing & interactive
MapReduce API:
• Compatible with Hadoop MR API
• For data analysis, e.g. data mining
• Works with HT records
• Access control
BigTable API:
• BigTable/HBase-style API
• For NoSQL applications, on HFile2 format
• Range scan, key-value access
• Access control

BigTable API scan example:

    Table table = new Table("gdr", "admin", "admin");
    String[] families = new String[] {"default"};
    String[] partitions = new String[] {"dt=20100317"};
    int limit = 10;
    TableScannerInterface tsi = table.getScanner(
        new byte[0], new byte[] {Byte.MAX_VALUE}, families, partitions);
    for (int i = 0; i < limit; ++i) {
        GroupValue gv = tsi.next();
        for (String family : families) {
            System.out.println(family + " = " + Bytes.toString(gv.getByteValue(family)));
        }
    }

MapReduce API signatures:

    public void map(LongWritable key, HugeRecord value,
        OutputCollector<HugeRecordRowKey, HugeRecord> output, Reporter reporter);

    public void reduce(HugeRecordRowKey key, Iterator<HugeRecord> values,
        OutputCollector<HugeRecordRowKey, HugeRecord> output, Reporter reporter);
16. HugeTable-Based Telco Application Solutions
Heavy requirements, e.g.:
• Batch processing
• Complex data analysis
• Interactive query on CDRs
• Statistics and reporting
Architecture:
• Data sources feed a Data Aggregator, which loads into the HugeTable cluster
• HugeTable cluster: mass data store, batch processing, statistics, plus data warehouse and data mining toolkits
• A database handles interactive simple queries; HugeTable handles complex analysis and interactive complex queries
• Telco applications consume reporting on top
17. Future works
Column Storage Engine
File Format
Compression
Local Index
Global Index
Query Optimization
Join Optimization: index
Load Optimization
Parallel Load
Application Solution