HugeTable: Application-Oriented Structure Data Storage System

      China Mobile Research Institute
         HugeTable Project Team
               Qian Ling
Agenda

 Motivations
 Hadoop, Hive & HBase
 HT Design & Development
 HT Applications
 Further Plans
Motivations
 Huge Data Volumes
    Total data volumes: Several PB per system
    Daily data volumes: Several TB per system
    Longer retention period: several months
    Big potential: 200% increase in some areas
 Multiple Application Areas
    BOSS, BI, NMS, Internet ...
    Data Integration
 Traditional Application Model
    SQL support
    Fast Index Query
    Multiple Application support
    Sensitive data
    CRUD support
    Statistics & Reporting
 Data Warehouse + App Solution
    •Scalable
    •Highly Available
    •Reliable
    … Affordable
Hadoop: Raw Techniques

 HDFS: distributed file system with fault tolerance
 MapReduce: parallel programming environments over HDFS
 Similar to the situation of POSIX API + Local FS
 High Level Toolkits are initiated
   Yahoo: PIG/Latin
   Business.com: Cloudbase/Hadoop+JDBC
   China Mobile: BC-PDM
   Facebook: Hive/SQL
Hive: A Petabyte-Scale Data Warehouse
 Features:
    • Schema support
    • Pluggable Storage Engine I/F
    • SQL → MR translation
    • xDBC Driver
    • Tools: HQL Console
    • Admin: HWI
 Usage Scenarios
    • Reporting
    • Ad hoc Analysis
    • Machine Learning
    • Others
       • Log analysis
       • Trend detection
 Facebook has huge clusters: >1000 nodes
Source: ICDE 2010/Facebook
HBase: structured storage of sparse data for Hadoop
 Features
    • ColumnFamilies
    • ACID
    • Optimized R/W
    • BigTable I/F + BU
    • Tools: HBase Shell
    • Admin: Jetty Based
 Usage Scenarios
    • Social Service
    • MapReduce Analysis
    • Content Repository
    • Wiki, RSS
    • Near Realtime Reporting & analytics
    • Store web pages
    … Replacing SQL Systems
Source: ApacheCon2009/ HBase
HugeTable: Application-Oriented Structure Data Storage System
 Address the missing blocks
    Index store & Query Optimizer
    Access Control List
    Insert, Update and Delete
    Web-based Administration
 [Diagram: HugeTable, made up of Tools, Clients, I/F and Admin on top of HFile w/ CF, Index Store, and Data, config, FM, Log, Perf]

 Build Solutions for Telco Applications
   Network Management System – NMS
   Value-added System – VAS
   Business Intelligence – BI
   Other areas
A Brief History of HugeTable

 HT-p1 (2008)
    1. HBase-based
    2. Partial xDBC/SQL support
    3. Integration of HBase with ZK before official release
    4. Secondary Index
    5. Support Schema
    6. ACL support
    7. SQL console

 HT-p2 (2009)
    1. Connect Hive with HBase
    2. Global Indexing
    3. Support HFile, CF in Hive
    4. Secondary Index
    5. Multiple DB support
    6. ACL support
    7. MR & Scan I/F
    8. Loader Tools, HT-Client
    9. Admin Portal
    10. JDBC remote console

 HT-p3 (2010)
    1. Move to higher version of Hive, Hadoop and HBase
    2. New Storage Engine
    3. Fruitful external I/F
    4. Many other improvements
    5. Application Solution
HugeTable Building Blocks
 [Diagram: layered stack, top to bottom]
    Applications
    HugeTable
    Building blocks (function: component)
       Storage & Computing: Hadoop Core
       KVStore: Hadoop HBase
       SQL-MR: Hadoop Hive
       Lock: Hadoop Zookeeper
       NMS: Cloud Master
       …
HBase as HugeTable Index Store
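 [Diagram: index flow through HugeTable components]
    Statements: Create Index, Drop Index; Select … using index xxx; Select … where idxcol
    Components: Index Meta Data, Query Engine, Load Service, HT Loader, Index Data (HBase)
    Operations: Find Index (on Index Meta Data, from Query Engine and Load Service),
                Read Index (Query Engine on Index Data), Write Index (Load Service on Index Data),
                Check Index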
Index Store Implementation

  Primary Index: index into data file
  Secondary Index: index into primary index
  Exact match and Range scan
  Integrated with Hive ql and other modules

  20 Nodes, 1TB/Node    Hive                  HT-p1                  HT-p2

  Memory Consumption    No additional cost    8GB/Node*TB            2GB/Node*TB

  Load Speed            20MB/s·Node           2.5MB/s·Node           >5MB/s·Node
                        (No Index)            (Primary Index)        (Primary Index)

  Index Query           N/A                   <10 sec                <10 sec
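
The two-level scheme listed above (a primary index pointing into the data file, a secondary index pointing into the primary index, with exact match and range scan) can be illustrated with a small in-memory sketch. This is not HugeTable's implementation: the class `IndexSketch`, its method names, and the use of `TreeMap` as a stand-in for a sorted key-value store such as HBase are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for an index store; TreeMap plays the sorted KV store. */
final class IndexSketch {
    // Primary index: row key -> byte offset of the record in the data file.
    private final NavigableMap<String, Long> primary = new TreeMap<>();
    // Secondary index: indexed column value -> row keys, i.e. entries of the primary index.
    private final NavigableMap<String, List<String>> secondary = new TreeMap<>();

    void put(String rowKey, long offset, String indexedValue) {
        primary.put(rowKey, offset);
        secondary.computeIfAbsent(indexedValue, v -> new ArrayList<>()).add(rowKey);
    }

    /** Exact match: secondary index -> primary index -> data file offsets. */
    List<Long> exact(String value) {
        return resolve(secondary.getOrDefault(value, Collections.emptyList()));
    }

    /** Range scan over [from, to] on the indexed column, in sorted value order. */
    List<Long> range(String from, String to) {
        List<Long> offsets = new ArrayList<>();
        for (List<String> rowKeys : secondary.subMap(from, true, to, true).values()) {
            offsets.addAll(resolve(rowKeys));
        }
        return offsets;
    }

    // Follow row keys through the primary index to data file offsets.
    private List<Long> resolve(List<String> rowKeys) {
        List<Long> offsets = new ArrayList<>();
        for (String rowKey : rowKeys) {
            offsets.add(primary.get(rowKey));
        }
        return offsets;
    }
}
```

Keeping the secondary index sorted by value is what makes the range scan a single contiguous read; the extra hop through the primary index is the price of not storing file offsets twice.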
HugeTable IUD Support

Goal: Support Insert, Update and Delete on application data.

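 [Diagram: IUD flow]
    Inputs to the Query Engine: IUD Statement, Select
    Query Engine: Find IUD table (Meta Data), Write IUD Data and Read IUD Data (IUD Table)
    Stores: HT Data (HDFS), IUD Table (HBase)
    Offline Merger: sits between HT Data and the IUD Table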
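
One way to read the diagram above is as a base-plus-delta design: bulk data stays in files on HDFS, changes are written to an IUD table in HBase, reads consult the IUD table, and an offline job folds the changes back. The sketch below illustrates that reading only. The class `IudSketch`, its methods, and the rule that a delta entry overrides the base record are assumptions, not HugeTable's documented behavior.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch: read-mostly base data plus a mutable delta (IUD) table. */
final class IudSketch {
    // Stand-in for bulk-loaded HT data on HDFS.
    private Map<String, String> base = new TreeMap<>();
    // Stand-in for the IUD table in HBase; an empty Optional marks a delete.
    private final Map<String, Optional<String>> delta = new TreeMap<>();

    void load(String key, String value) { base.put(key, value); }

    // Insert and update are both recorded as "latest value for this key".
    void upsert(String key, String value) { delta.put(key, Optional.of(value)); }

    void delete(String key) { delta.put(key, Optional.empty()); }

    /** Read path: a delta entry, if present, overrides the base record. */
    Optional<String> select(String key) {
        Optional<String> change = delta.get(key);
        if (change != null) {
            return change;
        }
        return Optional.ofNullable(base.get(key));
    }

    /** Offline merge: fold the delta into a new base, then clear the delta. */
    void merge() {
        Map<String, String> merged = new TreeMap<>(base);
        for (Map.Entry<String, Optional<String>> e : delta.entrySet()) {
            if (e.getValue().isPresent()) {
                merged.put(e.getKey(), e.getValue().get());
            } else {
                merged.remove(e.getKey());
            }
        }
        base = merged;
        delta.clear();
    }

    int pendingChanges() { return delta.size(); }
}
```

Under these assumptions a `select` returns the same answer before and after `merge()`; the merge only moves work from query time to an offline job.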
HugeTable Access Control

Goal: Support multiple users from multiple applications, w/o mutual trust

 Database privileges:
    1. Meta Data: Index, Create, Drop
    2. User Data: IUD
 User Access Level:
    1. System Administrator
    2. User Manager
    3. User

 [Diagram: Grant/Revoke updates Meta Data; DDL/DML and Loader/Portal requests pass through the ACL Module, which checks privileges against Meta Data]
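
The privilege check described above can be sketched as a deny-by-default lookup. The privilege names come from the slide; everything else (the class `AclSketch`, the method names, and scoping grants per user and database) is a hypothetical illustration rather than HugeTable's actual ACL module.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical ACL sketch: grants are stored as metadata and checked before each operation. */
final class AclSketch {
    /** Privileges named on the slide: metadata (INDEX, CREATE, DROP) and user data (IUD). */
    enum Privilege { INDEX, CREATE, DROP, IUD }

    // Grants keyed by "user@database"; a missing entry means no privileges at all.
    private final Map<String, Set<Privilege>> grants = new HashMap<>();

    void grant(String user, String db, Privilege p) {
        grants.computeIfAbsent(key(user, db), k -> EnumSet.noneOf(Privilege.class)).add(p);
    }

    void revoke(String user, String db, Privilege p) {
        Set<Privilege> held = grants.get(key(user, db));
        if (held != null) {
            held.remove(p);
        }
    }

    /** Deny by default: only an explicit grant allows the operation. */
    boolean isAllowed(String user, String db, Privilege p) {
        Set<Privilege> held = grants.get(key(user, db));
        return held != null && held.contains(p);
    }

    private static String key(String user, String db) {
        return user + "@" + db;
    }
}
```

Denying by default matches the stated goal of serving applications that do not trust each other: a user from one application sees nothing in another application's database until someone grants it.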
Administration Portal

Goal: Unified HugeTable management point, decrease management effort


Data Management    User Management     Monitor & FM        Configuration
DB/TBL/IDX         Add/Delete/Modify   Log/Alert/Service   Deploy/Setup
HugeTable Application API
 Various kinds of Applications

 JDBC/SQL API
    • Migration of traditional database applications
    • For SQL developers
    • Batch processing & interactive
 MapReduce API
    • Compatible with Hadoop MR API
    • For data analysis, e.g. data mining
    • Work with HT records format
    • Access control
 BigTable API
    • BigTable/HBase style API
    • For NoSQL application, on HFile2
    • Range scan, Key-value access
    • Access Control

 MapReduce API example:

    // Mapper and reducer signatures; values are HugeTable records.
    public void map(LongWritable key, HugeRecord value,
           OutputCollector<HugeRecordRowKey, HugeRecord> output,
           Reporter reporter);

    public void reduce(HugeRecordRowKey key,
           Iterator<HugeRecord> values,
           OutputCollector<HugeRecordRowKey, HugeRecord> output,
           Reporter reporter);

 BigTable API example:

    // Open table "gdr" with user name and password.
    Table table = new Table("gdr", "admin", "admin");
    String[] families = new String[] {"default"};
    String[] partitions = new String[] {"dt=20100317"};
    int limit = 10;
    // Scan the whole key range of the selected families and partitions.
    TableScannerInterface tsi = table.getScanner(
            new byte[0], new byte[] {Byte.MAX_VALUE}, families, partitions);
    // Print the first `limit` rows, one line per column family.
    for (int i = 0; i < limit; ++i) {
        GroupValue gv = tsi.next();
        for (String family : families) {
            System.out.println(family + " = " + Bytes.toString(gv.getByteValue(family)));
        }
    }
HugeTable based Telco Application Solutions
  Heavy Requirements, e.g.
      Batch processing
      Complex data analysis
      Interactive query on CDR
      Statistics and reporting

  [Diagram: data flow from sources to applications]
      Data Sources, feeding a Data Aggregator
      Database; HugeTable Cluster + Data Mining Toolkits; Data warehouse
      Telco App: Reporting, Interactive Simple Query, Complex Analyze, Interactive Complex Query
      Mass Data Store, Batch processing, Statistic
Future works

 Column Storage Engine
   File Format
   Compression
   Local Index
   Global Index
 Query Optimization
   Join Optimization: index
 Load Optimization
   Parallel Load
 Application Solution
Thanks for your time!
   China Mobile Research Institute

Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 

Recently uploaded (20)

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

HugeTable: Application-Oriented Structure Data Storage System

  • 1. HugeTable: Application-Oriented Structure Data Storage System. China Mobile Research Institute, HugeTable Project Team, Qian Ling
  • 2. Agenda: Motivations; Hadoop, Hive & HBase; HT Design & Development; HT Applications; Further Plans
  • 3. Motivations. Huge data volumes: total data volumes of several PB per system; daily data volumes of several TB per system; longer retention period of several months; big potential, with a 200% increase in some areas. Multiple application areas: BOSS, BI, NMS, Internet, ...; data integration. Traditional application model: SQL support; fast index query; multiple application support; sensitive data; CRUD support; statistics & reporting. Data Warehouse (scalable, highly available, reliable) + App Solution … Affordable
  • 4. Hadoop: Raw Techniques. HDFS: distributed file system with fault tolerance. MapReduce: parallel programming environment over HDFS. Similar to the situation of POSIX API + local FS. High-level toolkits are being initiated: Yahoo: PIG/Latin; Business.com: Cloudbase/Hadoop+JDBC; China Mobile: BC-PDM; Facebook: Hive/SQL
  • 5. Hive: A Petabyte-Scale Data Warehouse. Features: schema support; pluggable storage engine I/F; SQL-to-MR translation; xDBC driver; Tools: HQL Console; Admin: HWI. Usage scenarios: reporting; ad hoc analysis; machine learning; others (log analysis, trend detection). Facebook has huge clusters of >1000 nodes. Source: ICDE 2010/Facebook
  • 6. HBase: structured storage of sparse data for Hadoop. Features: ColumnFamilies; ACID; optimized R/W; BigTable I/F + BU; Tools: HBase Shell; Admin: Jetty-based. Usage scenarios: social service; MapReduce analysis; content repository; Wiki, RSS; near-realtime reporting; store web pages; … replacing SQL systems. Source: ApacheCon2009/HBase & analytics
  • 7. HugeTable: Application-Oriented Structure Data Storage System. Address the missing blocks. [Diagram: HugeTable blocks: Index Store & Query Optimizer; Tools; Client I/F; Admin (data, config, FM, log, perf); Access Control List; HFile w/ Index; Insert, Update and Delete; CF Store; Web-based Administration.] Build solutions for Telco applications: Network Management System – NMS; Value-added System – VAS; Business Intelligence – BI; other areas
  • 8. A Brief History of HugeTable HT-p1 HT-p2 HT-p3 1. Connect Hive with 1. Move to higher version 1. HBase-based HBase of Hive, Hadoop and 2. Partial xDBC/SQL HBase support 3. Support HFile, CF in 2. New Storage Engine 3. Integration HBase Hive 3. Fruitful external I/F with ZK before 2. Global Indexing 4. Many other official release 4. Secondary Index improvements 4. Secondary Index 5. Multiple DB support 4. Application Solution 5. Support Schema 6. ACL support 6. ACL support 7. MR & Scan I/F 7. SQL console 8. Loader Tools, HT-Client 9. Admin Portal 10.JDBC remote console 2008 2009 2010
  • 9. HugeTable Building Blocks Applications HugeTable HugeTable … … Storage KVStore SQL-MR Lock NMS Computing Hadoop Hadoop Cloud … Hadoop Hadoop Core HBase Hive Zookeeper Master
  • 10. HBase as HugeTable Index Store Create Index Select … using index xxx Drop Index Select … where idxcol Find Index Index Meta Data Query Engine Find Index Read Index Write Index Index Data Load Service HBase HT Loader Check Index
  • 11. Index Store Implementation. Primary Index: index into data file. Secondary Index: index into primary index. Exact match and range scan. Integrated with Hive QL and other modules. 20 Nodes, 1TB/Node:
      Metric             | Hive                    | HT-p1                           | HT-p2
      Memory consumption | No additional cost      | 8GB/Node*TB                     | 2GB/Node*TB
      Load speed         | 20MB/s·Node (no index)  | 2.5MB/s·Node (primary index)    | >5MB/s·Node (primary index)
      Index query        | N/A                     | <10 sec                         | <10 sec
  • 12. HugeTable IUD Support. Goal: support Insert, Update and Delete on application data. [Diagram labels: IUD Statement; Select; Find IUD table; Meta Data; Query Engine; Write IUD Data; Read IUD Data; HT Data (HDFS); IUD Table (HBase); Offline Merger]
  • 13. HugeTable Access Control. Goal: support multiple users from multiple applications, w/o mutual trust. Database privileges: 1. Meta Data: Index, Create, Drop; 2. User Data: IUD. User access levels: 1. System Administrator; 2. User Manager; 3. User. [Diagram labels: Grant/Revoke; DDL/DML; Loader/Portal; Check Privileges; Meta Data; ACL Module]
  • 14. Administration Portal. Goal: unified HugeTable management point, decrease management effort. Data Management: DB/TBL/IDX. User Management: Add/Delete/Modify. Monitor & FM: Log/Alert/Service. Configuration: Deploy/Setup
  • 15. HugeTable Application API. Various kinds of applications.
      JDBC/SQL API: migration of traditional database applications; for SQL developers; batch processing & interactive.
      MapReduce API: compatible with Hadoop MR API; for data analysis, e.g. data mining; works with HT records format; access control.
          public void map(LongWritable key, HugeRecord value,
                          OutputCollector<HugeRecordRowKey, HugeRecord> output,
                          Reporter reporter);
          public void reduce(HugeRecordRowKey key, Iterator<HugeRecord> values,
                             OutputCollector<HugeRecordRowKey, HugeRecord> output,
                             Reporter reporter);
      BigTable API: BigTable/HBase style API; for NoSQL applications, on HFile2; range scan, key-value access; access control.
          Table table = new Table("gdr", "admin", "admin");
          String[] families = new String[] {"default"};
          String[] partitions = new String[] {"dt=20100317"};
          int limit = 10;
          TableScannerInterface tsi = table.getScanner(
              new byte[0], new byte[] {Byte.MAX_VALUE}, families, partitions);
          for (int i=0; i<limit; ++i) {
              GroupValue gv = tsi.next();
              for (String family : families) {
                  System.out.println(family + " = " + Bytes.toString(gv.getByteValue(family)));
              }
          }
  • 16. HugeTable-based Telco Application Solutions. Heavy requirements, e.g.: batch processing; complex data analysis; interactive query on CDR; statistics and reporting. [Diagram labels: Telco App; Reporting; Interactive Simple Query; Complex Analyze; Interactive Complex Query; Database; Data warehouse; Data Source; Data Aggregator; HugeTable Cluster + Data Mining Tool kits; Mass Data Store; Batch processing; Statistic]
  • 17. Future works: Column Storage Engine; File Format; Compression; Local Index; Global Index; Query Optimization; Join Optimization: index; Load Optimization; Parallel Load; Application Solution
  • 18. Thanks for your time! China Mobile Research Institute
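
Slide 11 describes a primary index that points into the data file and a secondary index that points into the primary index, with exact-match and range-scan lookups. The following is a minimal in-memory sketch of that two-level idea, not HugeTable's actual implementation: the class name `TwoLevelIndexSketch`, its methods, and the use of sorted maps in place of HBase tables are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical model of a two-level index; names are illustrative only. */
public class TwoLevelIndexSketch {
    // Primary index: row key -> byte offset of the record in the data file.
    private final TreeMap<String, Long> primary = new TreeMap<>();
    // Secondary index: indexed column value -> row keys in the primary index.
    private final TreeMap<String, List<String>> secondary = new TreeMap<>();

    /** Registers a record; assumes each row key is added only once. */
    public void put(String rowKey, String indexedValue, long fileOffset) {
        primary.put(rowKey, fileOffset);
        secondary.computeIfAbsent(indexedValue, v -> new ArrayList<>()).add(rowKey);
    }

    /** Exact match: column value -> row keys -> data file offsets. */
    public List<Long> exactMatch(String indexedValue) {
        List<Long> offsets = new ArrayList<>();
        List<String> rowKeys =
            secondary.getOrDefault(indexedValue, Collections.<String>emptyList());
        for (String rowKey : rowKeys) {
            offsets.add(primary.get(rowKey));
        }
        return offsets;
    }

    /** Range scan over column values in [from, to], both ends inclusive. */
    public List<Long> rangeScan(String from, String to) {
        List<Long> offsets = new ArrayList<>();
        for (List<String> rowKeys : secondary.subMap(from, true, to, true).values()) {
            for (String rowKey : rowKeys) {
                offsets.add(primary.get(rowKey));
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        TwoLevelIndexSketch idx = new TwoLevelIndexSketch();
        idx.put("r1", "13800000001", 0L);
        idx.put("r2", "13800000002", 128L);
        idx.put("r3", "13800000001", 256L);
        System.out.println(idx.exactMatch("13800000001"));
        System.out.println(idx.rangeScan("13800000001", "13800000002"));
    }
}
```

Keeping both levels in sorted maps is what makes the range scan cheap in this sketch: a sorted key space allows a contiguous sub-range to be read instead of the whole index.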
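
Slide 12 shows IUD statements being written to a separate IUD table, selects reading both the bulk HT data and the IUD data, and an offline merger. One way to read that diagram is as a delta overlay on top of immutable base data. The sketch below illustrates that reading under stated assumptions: the class `IudOverlaySketch`, its method names, and the in-memory maps standing in for HDFS and HBase are hypothetical and are not taken from the HugeTable code base.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical delta-overlay model of IUD support; names are illustrative only. */
public class IudOverlaySketch {
    // Base data: stands in for bulk-loaded HT data files.
    private final TreeMap<String, String> base = new TreeMap<>();
    // IUD table: latest pending change per key; Optional.empty() marks a delete.
    private final TreeMap<String, Optional<String>> delta = new TreeMap<>();

    /** Bulk load into the base data, bypassing the IUD table. */
    public void loadBase(String key, String value) {
        base.put(key, value);
    }

    /** Insert and update are both recorded as an upsert in the IUD table. */
    public void insertOrUpdate(String key, String value) {
        delta.put(key, Optional.of(value));
    }

    /** Delete is recorded as a tombstone in the IUD table. */
    public void delete(String key) {
        delta.put(key, Optional.empty());
    }

    /** Select: a pending IUD entry overrides the base data; returns null if absent. */
    public String select(String key) {
        if (delta.containsKey(key)) {
            return delta.get(key).orElse(null);
        }
        return base.get(key);
    }

    /** Offline merge: fold pending changes into the base data, then clear them. */
    public void mergeOffline() {
        for (Map.Entry<String, Optional<String>> e : delta.entrySet()) {
            if (e.getValue().isPresent()) {
                base.put(e.getKey(), e.getValue().get());
            } else {
                base.remove(e.getKey());
            }
        }
        delta.clear();
    }

    /** Number of changes not yet merged into the base data. */
    public int pendingChanges() {
        return delta.size();
    }

    public static void main(String[] args) {
        IudOverlaySketch t = new IudOverlaySketch();
        t.loadBase("k1", "a");
        t.insertOrUpdate("k1", "a2");
        System.out.println(t.select("k1"));
        t.mergeOffline();
        System.out.println(t.select("k1") + " pending=" + t.pendingChanges());
    }
}
```

In this model a select returns the same result before and after the merge; the merge only moves data from the small mutable store into the large one.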