hStorage-DB:
    Heterogeneity-aware Data Management to
Exploit the Full Capability of Hybrid Storage Systems

           Tian Luo    Rubao Lee    Xiaodong Zhang   (The Ohio State University)
           Michael Mesnier    Feng Chen              (Intel Labs)
2

 Heterogeneous Storage Resources vs. Diverse QoS
          Requirements of DB Requests
• Storage advancement provides us with
   – High-capacity, low-cost, but slow hard disk drives (HDDs)
   – Fast, low-power, but expensive solid-state drives (SSDs)
   – HDDs and SSDs co-exist due to their unique merits and limits
• DB requests have diverse QoS requirements
   – Different access patterns: bandwidth/latency demands
   – Different priorities of data processing requests
   – Dynamically changing requirements
• Hybrid storage can satisfy the diverse QoS requirements of DB requests
   – Management should be automatic and adaptive, with low overhead
   – But this poses challenges
3

  Challenges for Hybrid Storage Systems to Satisfy
            Diverse QoS Requirements
• DBMS (What I/O services do I need as a storage user?)
   – Classifications of I/O requests into different types
   – hStorage awareness
   – DBMS enhancements to utilize classifications automatically
• hStorage (What can I do for you as a service provider?)
   – Clear definition of supported QoS classifications
   – Hide device details from the DBMS
   – Efficient data management among heterogeneous devices
• Communication between DBMS and hStorage
   – Rich information to deliver, but limited interface capabilities
   – Need a standard, general-purpose protocol
4


        Current interface to access storage



    read/write(int fd, void *buf, size_t count);

          fd: on-disk location   buf: in-memory data   count: request size




This interface cannot convey per-request QoS to the storage system,
            so we must take other approaches.
5

                      DBA-based Approach
• DBAs decide data placement among heterogeneous devices
  based on experience

  [Diagram: the DBMS places indexes on the SSD and other data on the HDD.]

• Limitations:
   – Significant human effort: requires expertise in both databases and storage
   – Coarse granularity, e.g. table- or partition-based data placement
   – Static storage layout:
      • Tuned for the “common” case
      • Cannot respond well to execution dynamics
6
                 Monitoring-based Solutions
• Storage systems automatically make data placement and
  replacement decisions, by monitoring access patterns
   – LRU (a basic structure), LIRS (MySQL), ARC (IBM I/O controller)
   – Examples from industry:
      • Solid State Hybrid Drive (Seagate)
      • Easy Tier (IBM)
• Limitations:
   – Takes time to recognize access patterns
      • Hard to handle dynamics in short periods
   – With concurrency, access patterns cannot be easily detected
   – Certain critical insights are not related to access patterns
      • Domain information (available from DBMS) is not utilized
7
     What information from the DBMS can we use?
• System catalog
  – Data type: index, regular table
  – Ownership of data sets: e.g. VIP user, regular user
• Query optimizer
  – Order of operations and access paths
  – Estimated frequency of accesses to related data
• Query planner
  – Access patterns
• Execution engine
  – Life cycles of data usage

Today, this semantic information about I/O requests is left unorganized.
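To illustrate what "organized" might look like, here is a hedged sketch of a per-request descriptor that bundles these pieces of information; the struct and field names are assumptions for illustration, not the actual hStorage-DB data structure.

/* Hypothetical bundle of DBMS semantic information attached to an I/O request. */
struct request_semantics {
    int    data_type;        /* index, regular table, temporary data, ... (system catalog) */
    int    owner_class;      /* e.g., VIP user vs. regular user (system catalog) */
    double est_access_freq;  /* estimated access frequency (query optimizer) */
    int    access_pattern;   /* sequential / random / repeated scan (query planner) */
    int    lifetime_ended;   /* whether the data's life cycle has ended (execution engine) */
};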
8
              DBMS Knowledge is not Well Utilized


  [Diagram: the query optimizer, system catalog, and execution engine hold rich
   semantic information, but the buffer pool manager and storage manager do not
   consider it for storage management; requests reach storage only through the
   block interface (r/w, LBN, data, size).]
9

Goal: organize/utilize DBMS semantic information

  [Diagram: DBMS activities (checkpoints, vacuum, and other background processes;
   sequential, random, and repeated-scan accesses from the query optimizer;
   User1, User2, ... through the connection pool) operate over system tables,
   indexes, user tables, and temporary data in the buffer pool. A semantic gap
   separates the DBMS from the storage system below it.]


       The mission of hStorage-DB is to fill this gap.
10

             hStorage-DB: DBMS for hStorage
• Objectives:
   – Automatic system management
   – High performance
      • Utilizing available domain knowledge within DBMS for storage I/O
      • Fine-grained data management (block granularity)
      • Respond well to the dynamics of DB requests with different QoS requirements
• System Design Outline
   – An hStorage system specifies a set of QoS policies
   – At runtime, the DBMS selects the appropriate policy for each I/O
     request based on its organized semantic information
   – I/O requests and their QoS policies are passed to the hStorage system
   – The hStorage system takes data placement actions accordingly
11


                 Outline

 • Introduction
 • hStorage-DB
 • Caching priority of each I/O request
 • Evaluation
12

Structure of hStorage-DB

  [Architecture diagram: the query optimizer, query planner, and execution engine
   sit above the buffer pool manager; each request is passed to the storage
   manager together with its semantic information; the storage manager consults
   the policy assignment table (Info 1 ... Info N mapped to a QoS policy) and
   issues each I/O request with its QoS policy to the storage system control
   logic, which manages the HDDs and SSDs.]
13


                  Highlights of hStorage-DB

• Policy assignment table
   – Stores all the rules for assigning a QoS policy to each I/O request
   – Assignments are based on organized DB semantic information


• Communication between the DBMS and hStorage
   – The QoS policy of each I/O request is delivered to the hStorage
     system via the “Differentiated Storage Services” protocol (SOSP’11)
   – The hStorage system acts on it accordingly
14


       The Interface Used in hStorage-DB

/* Open with a classification flag */
fd = open("foo", O_RDWR | O_CLASSIFIED, 0666);

qos = 19;                              /* QoS policy of this request */
myiov[0].iov_base = &qos;
myiov[0].iov_len  = 1;
myiov[1].iov_base = "Hello, world!";   /* payload */
myiov[1].iov_len  = 13;
writev(fd, myiov, 2);

               The QoS policy is delivered with the payload
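For illustration, a minimal sketch of how a storage manager might wrap this call into a reusable helper. The function name classified_write and the one-byte QoS encoding are assumptions based on the snippet above, not part of the published Differentiated Storage Services interface.

#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: deliver one payload together with its QoS class,
 * following the convention above (a one-byte class precedes the data). */
static ssize_t classified_write(int fd, unsigned char qos,
                                const void *buf, size_t len)
{
    struct iovec iov[2];

    iov[0].iov_base = &qos;            /* QoS policy of this request */
    iov[0].iov_len  = 1;
    iov[1].iov_base = (void *)buf;     /* payload */
    iov[1].iov_len  = len;

    return writev(fd, iov, 2);         /* the class travels with the data */
}

/* Usage, mirroring the slide: classified_write(fd, 19, "Hello, world!", 13); */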
15

                         QoS Policies
• QoS policies are high-level abstractions of what an hStorage system provides
   – Hide device complexities
   – Match resource characteristics
• QoS policy examples:
      •   High bandwidth (parallelism in SSD/disk array)
      •   Low latency for random accesses (SSD)
      •   Low latency for large sequential accesses (HDD)
      •   Reliability (data duplications)
• For a caching system
   – caching priorities: Priority 1, Priority 2, …, Bypass
16


                 Outline

 • Introduction
 • Design of hStorage-DB
 • Caching priority for each I/O request
 • Evaluation
17

         Caching Priorities as QoS Policies

• Priorities are enumerated
   – E.g. 1, 2, 3, …, N
   – Priority 1 is the highest priority
      • Data from high-priority requests can evict data cached
        for low-priority requests
• Special “priorities”
   – Bypass
      • Requests with this priority will not affect in-cache data
   – Eviction
      • Data accessed by requests with an eviction “priority” is
        immediately evicted from the cache
   – Write buffer
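To make these classes concrete, here is a hedged sketch of one possible encoding of the priorities and of the eviction rule they imply. The numeric values and the helper name may differ from the system's actual encoding; they are assumptions for illustration only.

/* Assumed encoding of caching-priority QoS classes (illustrative only). */
enum qos_class {
    QOS_PRIORITY_1   = 1,    /* highest caching priority */
    /* ... Priority 2 through Priority N-1 ... */
    QOS_PRIORITY_N   = 16,   /* lowest regular caching priority */
    QOS_BYPASS       = 17,   /* must not affect in-cache data */
    QOS_EVICTION     = 18,   /* evict the accessed data immediately */
    QOS_WRITE_BUFFER = 19    /* buffer the write in the cache device */
};

/* Data from a request may evict cached data only if the request's priority
 * is higher (numerically smaller) than the priority of the cached victim. */
static int may_evict(enum qos_class request, enum qos_class victim)
{
    if (request > QOS_PRIORITY_N)
        return 0;            /* Bypass/Eviction/Write buffer never admit new data */
    return request < victim;
}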
18


    From Semantic Information to Caching Priorities
• Principle:
   1. Possibility of data reuse: no reuse, no caching
   2. Benefit from caching: no benefit, no caching (e.g., a repeated scan)
• Methodology:
   1. Classify requests into different types (focus on OLAP)
      • Sequential accesses
      • Random accesses
      • Temporary data requests
      • Update requests
   2. Associate each type with a caching priority
      • Some types are further divided into subtypes
   3. The hStorage system makes placement decisions accordingly
      upon receiving each I/O request
19


           Policy Assignment Table

   Request type               QoS policy
   -----------------------    ----------------------------
   Sequential accesses        Bypass
   Random accesses            Priority 2, ..., Priority N
   Temporary data accesses    Priority 1
   Temporary data delete      Eviction
   Updates                    Write buffer
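A minimal sketch of how this table might be consulted at runtime. The request-type enum, the function name, and the class values (reusing the assumed encoding sketched earlier) are illustrative; the mapping simply restates the table above.

enum qos_class { QOS_PRIORITY_1 = 1, QOS_PRIORITY_N = 16,
                 QOS_BYPASS = 17, QOS_EVICTION = 18, QOS_WRITE_BUFFER = 19 };

enum request_type {
    REQ_SEQUENTIAL,          /* large scans */
    REQ_RANDOM,              /* e.g., index lookups */
    REQ_TEMP_DATA,           /* temporary data within its lifetime */
    REQ_TEMP_DATA_DELETE,    /* temporary data at the end of its lifetime */
    REQ_UPDATE               /* update requests */
};

/* Policy assignment: request type (plus, for random requests, the priority
 * derived from the query plan tree) mapped to a QoS class. */
static enum qos_class assign_policy(enum request_type type, int plan_priority)
{
    switch (type) {
    case REQ_SEQUENTIAL:       return QOS_BYPASS;
    case REQ_RANDOM:           return (enum qos_class)plan_priority; /* Priority 2..N */
    case REQ_TEMP_DATA:        return QOS_PRIORITY_1;
    case REQ_TEMP_DATA_DELETE: return QOS_EVICTION;
    case REQ_UPDATE:           return QOS_WRITE_BUFFER;
    }
    return QOS_BYPASS;
}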
20


                      Random Requests
• Determined by the operator's position in the query plan tree
• Follows the iterator model
  [Query plan tree example: a tree of joins over index scans on t.a, t.b, and t.c,
   a hash, and a sequential scan on t.b. The labels Priority 2, Priority 4, and
   Bypass annotate scan operators at different positions in the tree.]
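As a hedged sketch, one plausible depth-based rule that is consistent with the example above: scans whose data is read only once (sequential scans, or the outer scan driving the pipeline) are bypassed, while index scans that are re-probed by upper joins under the iterator model get priorities that grow with their depth. The exact rule used by hStorage-DB may differ; the struct, the constants, and the formula are assumptions.

enum { QOS_BYPASS = 17 };    /* assumed value for the Bypass class, as before */

struct scan_op {
    int is_index_scan;       /* 1 for an index scan, 0 for a sequential scan */
    int is_inner;            /* 1 if re-probed by its parent join under the iterator model */
    int depth;               /* 1 for a scan directly under the root join, 2 one level down, ... */
};

/* Illustrative mapping from plan-tree position to caching priority,
 * e.g. depth 1 -> Priority 2, depth 2 -> Priority 4. */
static int random_request_priority(const struct scan_op *op)
{
    if (!op->is_index_scan || !op->is_inner)
        return QOS_BYPASS;   /* read-once scans should not pollute the cache */
    return 2 * op->depth;
}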
21


                      Concurrent Queries

• Concurrent queries may access the same object
  – This causes non-deterministic priorities for random requests,
     • because each query may have a different query plan tree


• Solution
  – A data structure that “aggregates” all concurrent query plan trees
  – The data structure is updated at the start and end of each query
  – Each of the concurrent queries is assigned its QoS policies based on
    an analysis of this aggregated structure (see the sketch below)
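A minimal sketch of such an aggregated structure: for each object, track how many active queries touch it and the best (numerically smallest) priority any of them requests, updated at query start and end. The names, the fixed-size table, and the merge-by-minimum rule are assumptions for illustration.

#include <limits.h>

#define MAX_OBJECTS 1024

struct object_entry {
    unsigned int oid;        /* relation or index identifier */
    int active_queries;      /* concurrent queries currently using the object */
    int best_priority;       /* smallest (highest) priority among those queries */
};

static struct object_entry agg_table[MAX_OBJECTS];
static int n_entries;

static struct object_entry *lookup_or_add(unsigned int oid)
{
    for (int i = 0; i < n_entries; i++)
        if (agg_table[i].oid == oid)
            return &agg_table[i];
    /* no overflow check in this sketch */
    agg_table[n_entries] = (struct object_entry){ oid, 0, INT_MAX };
    return &agg_table[n_entries++];
}

/* Query start: merge the priority this query's plan tree assigns to the object. */
void aggregate_on_query_start(unsigned int oid, int priority)
{
    struct object_entry *e = lookup_or_add(oid);
    e->active_queries++;
    if (priority < e->best_priority)
        e->best_priority = priority;
}

/* Query end: drop the reference; a full implementation would recompute
 * best_priority from the plan trees of the queries that remain active. */
void aggregate_on_query_end(unsigned int oid)
{
    struct object_entry *e = lookup_or_add(oid);
    if (--e->active_queries <= 0)
        e->best_priority = INT_MAX;
}

/* Deterministic priority for random requests against this object. */
int aggregated_priority(unsigned int oid)
{
    return lookup_or_add(oid)->best_priority;
}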
22


                 Outline

 • Introduction
 • Design of hStorage-DB
 • Caching priority for each I/O request
 • Evaluation
23


           Experimental setup
• Dual-machine setup (with 10Gb Ethernet)
  – A DBMS: hStorage-DB, based on PostgreSQL
  – A dedicated storage system with an SSD cache
• Configuration
  – 2-way quad-core Xeon, 2.33GHz, 8GB RAM
  – 2 Seagate 15K RPM HDDs
  – SSD cache: Intel 320 Series, 300GB (32GB used)
• Workload
  – TPC-H at scale factor 30 (46GB with 7 indexes)
24


                 Diverse Request Types in TPC-H
  [Stacked bar chart: fraction of temporary-data (Tmp.), random (Rand.), and
   sequential (Seq.) requests in each of TPC-H queries 1-22.]

• Most queries are dominated by sequential requests
• Queries 2, 8, 9, 20, 21 have a large number of random requests
• Query 18 has a large number of temporary data requests
25


   No overhead for cache-insensitive queries
  [Bar chart: execution time (sec), up to about 400 seconds, of queries 1, 5,
   11, and 19 under HDD-only, LRU, hStorage-DB, and SSD-only configurations.]

• Current SSDs cannot speed up these queries
• Caching may even harm performance (LRU)
• hStorage-DB incurs no overhead for sequential requests
26


   Working Well for Cache-Effective Queries
  Query 9 execution time (sec): HDD-only 35,865; LRU 6,216 (5.77x speedup);
  hStorage-DB 6,120 (5.86x); SSD-only 4,986 (7.19x)

• Random requests benefit from the SSD
• Their high locality can also be captured by traditional LRU
• hStorage-DB achieves high performance without any monitoring effort
27


 Efficiently Handling Temporary Data Requests
  Query 18 execution time (sec): HDD-only 8,950; LRU 8,694 (1.03x speedup);
  hStorage-DB 6,146 (1.46x); SSD-only 5,990 (1.49x)

• hStorage-DB:
  – Temporary data is cached for as long as it lives, and evicted
    immediately at the end of its lifetime
  – Such lifetimes are hard to detect without semantic information
28


                Concurrency (Throughput)

  Execution time (sec), queries 9 and 18, under HDD-only / LRU / hStorage-DB / SSD-only:

    Independent execution:   Query 9:  9,529 / 2,952 / 2,946 / 2,101
                             Query 18: 1,495 / 1,316 / 1,092 / 1,034
    Concurrent execution:    Query 9:  66,385 / 22,542 / 8,039 / 7,701
                             Query 18: 25,973 / 23,152 / 12,525 / 1,184
29


                          Summary
• A DBMS can exploit organized semantic information
• A DBMS should be hStorage-aware (QoS policies)
• A set of rules determines the QoS policy (caching priority)
  for each I/O request
• Experiments on hStorage-DB show its effectiveness
30




Thank you!
Questions?

