HugeTable: Application-Oriented Structure Data Storage System

 China Mobile Research Institute
 HugeTable Project Team
 Qian Ling
Agenda

 Motivations
 Hadoop, Hive & HBase
 HT Design & Development
 HT Applications
 Further Plans
Motivations
 Huge Data Volumes
    Total data volumes: Several PB per system
    Daily data volumes: Several TB per system
    Longer retention period: several months
    Big potential: 200% increase in some area
 Multiple Application Areas
    BOSS, BI, NMS, Internet ...
    Data Integration
 Traditional Application Model
    SQL support
    Fast Index Query
    Multiple Application support
    Sensitive data
    CRUD support
    Statistics & Reporting
 Data Warehouse + App Solution
    Scalable
    Highly Available
    Reliable
    … Affordable
Hadoop: Raw Techniques

 HDFS: distributed file system with fault tolerance
 MapReduce: parallel programming environment over HDFS
 Similar to the situation of POSIX API + Local FS
 High Level Toolkits are initiated
   Yahoo: Pig/Pig Latin
   Business.com: Cloudbase/Hadoop+JDBC
   China Mobile: BC-PDM
   Facebook: Hive/SQL
Hive: A Petabyte-Scale Data Warehouse
 Features
    Schema support
    Pluggable Storage Engine I/F
    SQL-to-MR translation
    xDBC Driver
    Tools: HQL Console
    Admin: HWI
 Usage Scenarios
    Reporting
    Ad hoc Analysis
    Machine Learning
    Others
       Log analysis
       Trend detection
 Facebook has huge clusters: >1000 nodes
 Source: ICDE 2010/Facebook
HBase: structured storage of sparse data for Hadoop
 Features
    ColumnFamilies
    ACID
    Optimized R/W
    BigTable I/F + BU
    Tools: HBase Shell
    Admin: Jetty Based
 Usage Scenarios
    Social Service
    MapReduce Analysis
    Content Repository
    Wiki, RSS
    Near Realtime Reporting & analytics
    Store web pages
    … Replacing SQL Systems
 Source: ApacheCon2009/HBase
HugeTable: Application-Oriented Structure Data Storage System
 Address the missing blocks
    Index store & Query Optimizer
    Access Control List
    Insert, Update and Delete
    Web-based Administration
 Build Solutions for Telco Applications
    Network Management System (NMS)
    Value-added System (VAS)
    Business Intelligence (BI)
    Other areas
 [Diagram: HugeTable components: Tools, Clients, I/F, Admin; HFile w/ CF; Index Store; Data, config, FM, Log, Perf]
A Brief History of HugeTable

 HT-p1 (2008)
    1. HBase-based
    2. Partial xDBC/SQL support
    3. Integration of HBase with ZK before official release
    4. Secondary Index
    5. Schema support
    6. ACL support
    7. SQL console
 HT-p2 (2009)
    1. Connect Hive with HBase
    2. Global Indexing
    3. Support HFile, CF in Hive
    4. Secondary Index
    5. Multiple DB support
    6. ACL support
    7. MR & Scan I/F
    8. Loader Tools, HT-Client
    9. Admin Portal
    10. JDBC remote console
 HT-p3 (2010)
    1. Move to higher versions of Hive, Hadoop and HBase
    2. New Storage Engine
    3. Rich external I/F
    4. Many other improvements
    5. Application Solution
HugeTable Building Blocks
 Layers, top to bottom
    Applications
    HugeTable
    Underlying components
 Underlying components (role: component)
    Storage & Computing: Hadoop Core
    KVStore: Hadoop HBase
    SQL-MR: Hadoop Hive
    Lock: Hadoop Zookeeper
    NMS: Cloud Master
    …
HBase as HugeTable Index Store
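 Index statements
    Create Index
    Drop Index
    Select … using index xxx
    Select … where idxcol
 [Diagram: components are Index Meta Data, Query Engine, Load Service (HT Loader), and Index Data stored in HBase. Labeled flows: Find Index (Query Engine and Load Service to Index Meta Data), Read Index (Query Engine from Index Data), Write Index (Load Service to Index Data), Check Index (HT Loader).]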
Index Store Implementation

  Primary Index: index into data file
  Secondary Index: index into primary index
  Exact match and Range scan
  Integrated with Hive ql and other modules

  Test setup: 20 Nodes, 1TB/Node

                       Hive                      HT-p1                           HT-p2
  Memory Consumption   No additional cost        8GB/Node*TB                     2GB/Node*TB
  Load Speed           20MB/s·Node (No Index)    2.5MB/s·Node (Primary Index)    >5MB/s·Node (Primary Index)
  Index Query          N/A                       <10 sec                         <10 sec
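
The two index levels named on this slide (a primary index into the data file, and a secondary index into the primary index) can be illustrated with a small in-memory sketch. This is only an assumption-laden illustration, not HugeTable's implementation: the class IndexStoreSketch, its method names, and the use of sorted maps in place of HBase tables are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: sorted in-memory maps stand in for HBase index tables.
class IndexStoreSketch {
    // Primary index: row key -> byte offset of the record in the data file.
    private final TreeMap<String, Long> primary = new TreeMap<>();
    // Secondary index: indexed column value -> row keys in the primary index.
    private final TreeMap<String, List<String>> secondary = new TreeMap<>();

    void put(String rowKey, long fileOffset, String indexedValue) {
        primary.put(rowKey, fileOffset);
        secondary.computeIfAbsent(indexedValue, v -> new ArrayList<>()).add(rowKey);
    }

    // Exact match: secondary index -> primary index -> data file offsets.
    List<Long> exactMatch(String indexedValue) {
        List<Long> offsets = new ArrayList<>();
        for (String rowKey : secondary.getOrDefault(indexedValue, Collections.emptyList())) {
            offsets.add(primary.get(rowKey));
        }
        return offsets;
    }

    // Range scan over indexed values in [from, to), visited in sorted value order.
    List<Long> rangeScan(String from, String to) {
        List<Long> offsets = new ArrayList<>();
        for (List<String> rowKeys : secondary.subMap(from, to).values()) {
            for (String rowKey : rowKeys) {
                offsets.add(primary.get(rowKey));
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        IndexStoreSketch idx = new IndexStoreSketch();
        idx.put("row1", 0L, "13800000001");
        idx.put("row2", 128L, "13800000002");
        idx.put("row3", 256L, "13800000001");
        System.out.println(idx.exactMatch("13800000001"));
        System.out.println(idx.rangeScan("13800000001", "13800000003"));
    }
}
```

Keeping the secondary index sorted by value is what lets the same structure serve both exact match and range scan; a key-ordered store such as HBase offers the same property at cluster scale.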
HugeTable IUD Support

Goal: Support Insert, Update and Delete on application data.

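 [Diagram: IUD statements and Select statements enter the Query Engine. The Query Engine finds the IUD table through the Meta Data, writes IUD Data to the IUD Table (HBase), and reads IUD Data back for Select. HT Data resides on HDFS. An Offline Merger sits between the HT Data and the IUD Table.]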
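
One way to read the IUD flow on this slide is as a delta overlay: base data stays immutable, changes accumulate in a separate table, reads consult the changes first, and an offline job periodically folds them into the base. The sketch below illustrates that reading only. It is an assumption about the design, and the class IudSketch, its methods, and the in-memory maps standing in for HDFS and HBase are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch of a delta-overlay design for Insert/Update/Delete.
class IudSketch {
    // Stand-in for immutable HT data files on HDFS: row key -> record.
    private Map<String, String> baseData = new TreeMap<>();
    // Stand-in for the IUD table in HBase: row key -> latest pending change.
    // Optional.empty() acts as a tombstone for a deleted row.
    private final Map<String, Optional<String>> iudTable = new HashMap<>();

    IudSketch(Map<String, String> initial) {
        baseData.putAll(initial);
    }

    // Insert and update both record the new value as a pending change.
    void upsert(String rowKey, String record) {
        iudTable.put(rowKey, Optional.of(record));
    }

    void delete(String rowKey) {
        iudTable.put(rowKey, Optional.empty());
    }

    // Select: a pending change in the IUD table wins over the base data.
    String select(String rowKey) {
        Optional<String> change = iudTable.get(rowKey);
        if (change != null) {
            return change.orElse(null);
        }
        return baseData.get(rowKey);
    }

    // Offline merger: fold pending changes into a new base, then clear the IUD table.
    void offlineMerge() {
        Map<String, String> merged = new TreeMap<>(baseData);
        for (Map.Entry<String, Optional<String>> e : iudTable.entrySet()) {
            if (e.getValue().isPresent()) {
                merged.put(e.getKey(), e.getValue().get());
            } else {
                merged.remove(e.getKey());
            }
        }
        baseData = merged;
        iudTable.clear();
    }

    int pendingChanges() {
        return iudTable.size();
    }
}
```

The merge step keeps the IUD table small, so the extra lookup on the read path stays cheap between merges.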
HugeTable Access Control

Goal: Support Multiple Users from Multiple Applications , w/o mutual trust


 Database privileges
    1. Meta Data: Index, Create, Drop
    2. User Data: IUD
 User Access Level
    1. System Administrator
    2. User Manager
    3. User
 [Diagram: Grant/Revoke statements update the Meta Data. DDL/DML statements and Loader/Portal requests pass through the ACL Module, which checks privileges against the Meta Data.]
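
The privilege check on this slide can be sketched as a default-deny lookup that runs before any DDL/DML statement or loader request. The sketch is illustrative only: the class AclSketch, the per-table granularity, and the method names are assumptions, and only the privilege names (Index, Create, Drop, IUD) come from the slide.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of grant/revoke/check with default deny.
class AclSketch {
    // Privilege names taken from the slide; everything else is assumed.
    enum Privilege { INDEX, CREATE, DROP, IUD }

    // user -> table -> granted privileges (stand-in for ACL entries in Meta Data).
    private final Map<String, Map<String, Set<Privilege>>> grants = new HashMap<>();

    void grant(String user, String table, Privilege p) {
        grants.computeIfAbsent(user, u -> new HashMap<>())
              .computeIfAbsent(table, t -> EnumSet.noneOf(Privilege.class))
              .add(p);
    }

    void revoke(String user, String table, Privilege p) {
        Map<String, Set<Privilege>> byTable = grants.get(user);
        if (byTable != null && byTable.containsKey(table)) {
            byTable.get(table).remove(p);
        }
    }

    // Default deny: anything not explicitly granted is refused.
    boolean check(String user, String table, Privilege p) {
        Map<String, Set<Privilege>> byTable = grants.get(user);
        return byTable != null
            && byTable.getOrDefault(table, Collections.emptySet()).contains(p);
    }

    // Guard to call before executing a statement on behalf of a user.
    void require(String user, String table, Privilege p) {
        if (!check(user, table, p)) {
            throw new SecurityException(user + " lacks " + p + " on " + table);
        }
    }
}
```

Because applications do not trust each other, the check is keyed by user and table, so a grant to one application never leaks to another.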
Administration Portal

Goal: Unified HugeTable management point, decrease management effort


 Data Management: DB/TBL/IDX
 User Management: Add/Delete/Modify
 Monitor & FM: Log/Alert/Service
 Configuration: Deploy/Setup
HugeTable Application API
 Various kinds of Applications

 JDBC/SQL API
    Migration of traditional database applications
    For SQL developers
    Batch processing & interactive
 MapReduce API
    Compatible with Hadoop MR API
    For data analysis, e.g. data mining
    Works with HT records format
    Access control
 BigTable API
    BigTable/HBase style API
    For NoSQL applications, on HFile2
    Range scan, Key-value access
    Access control

 MapReduce API example (method signatures):

    public void map(LongWritable key, HugeRecord value,
           OutputCollector<HugeRecordRowKey, HugeRecord> output,
           Reporter reporter);

    public void reduce(HugeRecordRowKey key,
           Iterator<HugeRecord> values,
           OutputCollector<HugeRecordRowKey, HugeRecord> output,
           Reporter reporter);

 BigTable API example (range scan):

    // Open table "gdr" with user name and password
    Table table = new Table("gdr", "admin", "admin");
    String[] families = new String[] {"default"};
    String[] partitions = new String[] {"dt=20100317"};
    int limit = 10;
    // Scan the key range from the empty key up to {Byte.MAX_VALUE}
    TableScannerInterface tsi = table.getScanner(
            new byte[0], new byte[] {Byte.MAX_VALUE}, families, partitions);
    // Print the first `limit` rows, one line per column family
    for (int i = 0; i < limit; ++i) {
        GroupValue gv = tsi.next();
        for (String family : families) {
            System.out.println(family + " = " + Bytes.toString(gv.getByteValue(family)));
        }
    }
HugeTable based Telco Application Solutions
  Heavy Requirements, e.g.
      Batch processing
      Complex data analysis
      Interactive query on CDR
      Statistics and reporting
  [Diagram: Data Sources feed a Data Aggregator, which feeds the HugeTable Cluster + Data Mining Tool kits (Mass Data Store, Batch processing, Statistics), alongside a Database and a Data warehouse. The Telco App on top provides Reporting, Interactive Simple Query, Interactive Complex Analysis, and Interactive Complex Query.]
Future Work

 Column Storage Engine
   File Format
   Compression
   Local Index
   Global Index
 Query Optimization
   Join Optimization: index
 Load Optimization
   Parallel Load
 Application Solution
Thanks for your time!
   China Mobile Research Institute
