An Active and Hybrid Storage System
     for Data-intensive Applications

   Ph.D. Candidate: Zhiyang Ding
   Defense Committee Members:
   Dr. Xiao Qin
   Dr. Kai H. Chang
   Dr. David A. Umphress
   University Reader:
   Prof. Wei Wang,
   Chair of the Art Design Dept.
                    5/7/2012
Cluster Computing
      • Large-scale Data Processing is everywhere.




Motivation
         • Traditional Storage Nodes on the Cluster

[Diagram: clients reach the cluster over the Internet through a head node; a network switch connects the head node to the compute nodes and to the storage node (or Storage Area Network).]
Motivation
         • What’s next?
         • More “Active”.

[Diagram: clients reach the head node over the Internet; a network switch connects it to the compute nodes and to an active storage node. Computation is offloaded to the storage node, which turns raw data into pre-processed data before answering I/O requests.]
About the Active Storage

               McSD:
           A Smart Disk Model


           pp-mpiBlast:
     How to deploy Active Storage?


                 HcDD:
      Hybrid Disk for Active Storage

McSD:
   A Multicore Active Storage Device

• I/O Wall Problem: the CPU-I/O Gap
      – Limited I/O Bandwidth
      – The CPU Waits, Wasting Power
• How to
      – Bridge the CPU-I/O Gap
      – Reduce I/O Traffic


Why McSD?


• “Active”:
      – Leveraging the Processing Power of Storage Devices


• Benefits:
      – Offloading Data-intensive Computation
      – Reducing I/O Traffic
      – Pipeline Parallel Programming


Contributions


• Design a prototype of a multicore active storage

• Design a pre-assembled processing module

• Extend a shared-memory MapReduce system

• Emulate the whole system on a real testbed


Background: Active Disks

• Traditional Smart/Active Disks
      – On-board: Embedding a processor into the hard disk
      – Various Research Models
         • e.g., active disk, smart disk, IDISK, SmartSTOR, etc.

• However, “active disk” has not been adopted by hardware vendors, due to:
      – Improved attachment technologies
      – Cost of the system
      – I/O-bound workloads
      – Reliability
Background: Parallel Processing

• Multi-core Processors or Multi-processors
      – A 45% increase in transistors yields only ~20% more processing power
• MapReduce: a Parallel Programming Model
      – MapReduce by Google
      – Hadoop, Mars, Phoenix, etc.
• Multicore and Shared-memory Parallel
  Processing
Design: System Overview

[Diagram: the McSD design draws together multicore and shared-memory parallel processing, pipeline parallel processing, a communication mechanism, and hybrid storage disks, all feeding into the design of an active storage device.]
Design and Implementation

• Computation Mechanism
      – Pre-assembled Processing Modules
      – smartFAM
• Extend the Shared-Memory MapReduce by
  Partitioning
Pre-assembled Processing Modules


• Pre-assembled Processing Modules
      – Fit the embedded nature of storage devices
      – Reduce Complexity and Cost
      – Provide Services
           • e.g., multi-version antivirus, pre-processing for
             data-intensive apps, de-duplication, etc.
• How to invoke services?
smartFAM

• smartFAM = Smart File Alteration Monitor
      – Invokes the pre-assembled processing modules or
        functions by monitoring changes to the system
        log file.
• Two Components:
      – inotify: a Linux kernel facility that reports file-system events
      – a trigger daemon
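The trigger-daemon logic can be sketched as follows. This is a hedged approximation: it polls the log file instead of using the inotify facility the real smartFAM relies on, and the "<module> <file>" log-entry format is an assumption for illustration, not smartFAM's actual protocol.

```python
import os

class SmartFAM:
    """Minimal stand-in for the smartFAM trigger daemon: watches a log
    file for appended entries and dispatches each one to a registered
    pre-assembled processing module. Polling replaces inotify here so
    the sketch stays portable."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.modules = {}   # module name -> callable
        self.offset = 0     # bytes of the log already consumed

    def register(self, name, func):
        self.modules[name] = func

    def poll_once(self):
        """Consume any new log lines and trigger matching modules."""
        results = []
        if not os.path.exists(self.log_path):
            return results
        with open(self.log_path) as f:
            f.seek(self.offset)
            while True:
                line = f.readline()
                if not line:
                    break
                self.offset = f.tell()
                # Hypothetical log format: "<module> <file>"
                parts = line.split()
                if len(parts) == 2 and parts[0] in self.modules:
                    results.append(self.modules[parts[0]](parts[1]))
        return results
```

Registering a module and appending a matching line to the log is then enough to have the daemon invoke it on the next poll.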
Design and Implementation

[Diagram: the host node runs the main program (general functions plus a data-intensive function); the active node runs the pre-assembled modules; on each side, smartFAM (inotify plus a daemon) watches the shared log files. Flow: (1) the main program logs a request, (2) the active node’s smartFAM triggers a module, (3) the module log and result data return over NFS and the results are merged.]
Extend the Phoenix:
    A Shared-memory MapReduce Model

• Extend the Phoenix MapReduce Programming
  Model by partitioning and merging
      – New API: partition_input
      – New Functions:
           • partition (provided by the new API)
           • merge (developed by the user)

• Example:
      – wordcount [data-file][partition-size][]
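The partition-and-merge idea can be sketched in plain Python. The function bodies are illustrative stand-ins for the Phoenix C API, not its actual implementation; in particular, splitting at word boundaries is an assumption about what partition_input must guarantee for word count to stay correct.

```python
from collections import Counter

def partition_input(text, partition_size):
    """Stand-in for the new partition_input API: split the input at
    word boundaries so no partition exceeds roughly partition_size
    characters and no word is cut in half."""
    words, parts, current, length = text.split(), [], [], 0
    for w in words:
        if current and length + len(w) + 1 > partition_size:
            parts.append(" ".join(current))
            current, length = [], 0
        current.append(w)
        length += len(w) + 1
    if current:
        parts.append(" ".join(current))
    return parts

def wordcount_partition(part):
    # Stand-in for one Phoenix MapReduce pass over a single partition.
    return Counter(part.split())

def merge(partial_counts):
    # The user-developed merge function: combine per-partition results.
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return total
```

Because each partition fits in memory, the shared-memory MapReduce pass never touches the whole input at once; merge stitches the per-partition counts back together.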
Pipeline Processing




Evaluation Environment

• Testbed

• Benchmarks
      – Word Count
      – String Match
      – Matrix Multiplication

• Individual Node Performance
• System Performance
Individual Node Performance


                Word Count (seconds)    String Match (seconds)
                1 GB        1.25 GB     1 GB        1.25 GB
w/ Partition    40.60       50.91       17.76       20.61
w/o Partition   85.74       139.54      17.62       21.00
System Evaluation

                  Matrix-Multiplication and Word-Count (Speedups)
Input Data Size          vs Single Machine          vs Single-core Active   vs McSD w/o Partition

   500 MB                        1.47 X                   2.15 X                   0.99 X

   750 MB                        1.45 X                   2.09 X                   1.04 X

     1 GB                        7.62 X                   2.14 X                   6.07 X

   1.25 GB                      19.01 X                   2.50 X                  15.39 X


            Speedup = T(control sample) / T(McSD)
Summary

• Offloading data-intensive computation can
  improve system performance

• McSD is a promising active storage model with
      – Pre-assembled processing modules
      – Parallel data processing
      – Good evaluation results
About the Active Storage

               McSD:
           A Smart Disk Model


           pp-mpiBlast:
     How to deploy Active Storage?


                 HcDD:
      Hybrid Disk for Active Storage

Apply Active Storage to a Cluster


• So far, we know the potential of Active
  Storage

• Challenge: How to coordinate active storage
  nodes with computing nodes?

• Propose a Pipeline-parallel Processing pattern
Contributions


• Propose a pipeline-parallel processing framework
  to “connect” an Active Storage node with
  computing nodes.
• Evaluate the framework using both an analytic
  model and a real implementation.
• Case Study: Extend an existing bioinformatics
  application based on the framework.
Background: Active Storage

[Diagram: a conventional storage node (mass storage, memory, processor) on one side and an Active Storage Node (SSD buffer disks plus on-node computation) on the other, with the open question of how to bridge the two.]
Background: Bioinformatics App

• BLAST*: Basic Local Alignment Search Tool
      – Comparing primary biological sequence
        information


• mpiBLAST** is a freely available, open-source,
  parallel implementation of NCBI BLAST.
      – Format raw data files
      – Run a parallel BLAST function
                            *http://blast.ncbi.nlm.nih.gov/
                            **http://www.mpiblast.org/
Pipeline-parallel Design


• Offload the raw-data formatting task to where
  the data is stored.
• Intra-application Pipeline-parallel Processing
  by “partition” and “merge”.
• pp-mpiBlast, a case study.
Pipelining Workflow

[Diagram: on the Active Storage Node the raw input file is split into n partitions and FormatDB turns each partition into an intermediate; the computing nodes run mpiBlast on each intermediate to produce sub-outputs 1..n, which are merged into the output file. Partitioning and FormatDB repeat (n-1) times while mpiBlast consumes the earlier intermediates.]
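The overlap between the two stages can be sketched with a queue between two threads. Here format_db and search are hypothetical stand-ins for mpiformatdb on the Active Storage Node and mpiBlast on the computing nodes; while partition i is being searched, partition i+1 is already being formatted.

```python
import threading
import queue

def run_pipeline(partitions, format_db, search):
    """Two-stage pipeline sketch: stage 1 formats each partition and
    hands the intermediate to stage 2, which searches it."""
    q = queue.Queue(maxsize=1)   # hands intermediates between stages
    outputs = []

    def stage1():
        for p in partitions:
            q.put(format_db(p))
        q.put(None)              # end-of-stream marker

    def stage2():
        while True:
            inter = q.get()
            if inter is None:
                break
            outputs.append(search(inter))

    t1 = threading.Thread(target=stage1)
    t2 = threading.Thread(target=stage2)
    t1.start(); t2.start(); t1.join(); t2.join()
    return outputs               # sub-outputs, ready for the merge step
```

The small queue bound keeps memory use proportional to one intermediate at a time, which is the same out-of-core benefit the partitioned design aims at.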
Analytic Model

• Three Critical Measures

  T_response = T_active + T_compute

  Throughput = 1 / max(T_active, T_compute)

  Speedup = T_sequence / T_pipelined
          = n (T_active + T_compute) / (T_active + (n-1) max(T_active, T_compute) + T_compute)
          = n / (1 + (n-1) / (Throughput × T_response))
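A small helper evaluating the three measures numerically confirms the expected behavior of the model: with n = 1 there is no speedup, and with two balanced stages the speedup 2n/(n+1) approaches 2 as n grows.

```python
def pipeline_metrics(t_active, t_compute, n):
    """Evaluate the analytic model for n partitions flowing through
    the two-stage pipeline (times in seconds)."""
    t_response = t_active + t_compute
    throughput = 1.0 / max(t_active, t_compute)
    t_sequence = n * t_response
    t_pipelined = t_active + (n - 1) * max(t_active, t_compute) + t_compute
    speedup = t_sequence / t_pipelined
    return t_response, throughput, speedup
```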
Evaluation Environment

                Computing Nodes Configuration            Active Storage Configuration
    CPU                  Intel XEON X3430                       Intel Core 2 Q9400
 Memory                               2 GB DDR3 (PC3-10600)
     OS                      Ubuntu 9.04 Jaunty Jackalope 32bit Version
   Kernel                                   2.6.28-15-generic
 Network                                         Gigabit LAN

           Our Testbed                              Baseline Testbeds
    “Pipeline-parallel”           “12-node Cluster”               “13-node Cluster”
    12 Computing Nodes           12 Computing Nodes               13 Computing Nodes
   1 Active Storage Node            1 Storage Node                  1 Storage Node



Pipeline-parallel Design

[Figure: results compared with the 12-node system]

[Figure: results compared with the 13-node system]
Speedups Trends: Partition Size




Summary


• We proposed a pipeline-parallel processing
  mechanism for deploying an Active Storage Node.


• As a case study, we extended a classic
  bioinformatics application in the
  pipeline-parallel style.
About the Active Storage

               McSD:
           A Smart Disk Model


           pp-mpiBlast:
     How to deploy Active Storage?


                 HcDD:
      Hybrid Disk for Active Storage

What’s Hybrid?

[Image: a hybrid car combines a gas engine and an electric motor for power efficiency.]
Hybrid Disk Drives

• A Hybrid Combination of Two Types of Storage
  Devices: HDD and SSD
      – HDD: magnetic hard disk drive
      – SSD: solid-state disk, built on NAND flash memory

                                        What are their roles?
Motivation

• In a hybrid storage system, using SSDs as the
  buffer can boost performance.

• However, SSDs suffer reliability issues.

            WordCount on Intel Core2 Duo E8400 (seconds)

  Storage   Buffer      500 MB    750 MB      1 GB    1.25 GB

    HDD      HDD         21.51     38.30    505.25    1294.64

    HDD      SSD         19.89     36.41     85.74     139.54
Limitations Related to SSDs

• Flash Memory:
      – Each block consists of 32, 64, or 128 pages.
      – Each page is typically 512, 2,048, or 4,096 bytes.
• “Erase-before-write” at block level.
• Lifespan is about 10,000 Program/Erase cycles.
      – E.g., *the lifespan of an 80 GB MLC SSD can be as
        short as 106 days if the write rate is 30 MB/s.
• Rethink their roles?
            *Based on the SSD lifespan calculator provided by Virident.com
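The quoted figure can be reproduced with a back-of-the-envelope estimate. The write-amplification factor below is my assumption, not a parameter the slide gives: the raw ratio (capacity × P/E cycles ÷ write rate) yields about 309 days, so the 106-day figure implies the Virident calculator bakes in an amplification factor near 2.9.

```python
def ssd_lifespan_days(capacity_gb, pe_cycles, write_mb_s, write_amp):
    """Rough SSD lifespan: total writable bytes (capacity x P/E
    cycles) divided by the effective write rate, where write_amp
    models write amplification inside the drive."""
    total_bytes = capacity_gb * 1e9 * pe_cycles
    effective_rate = write_mb_s * 1e6 * write_amp   # bytes/s hitting flash
    return total_bytes / effective_rate / 86400
```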
Contributions


• Hybrid Combination of HDD and SSD disks

• De-duplication Service using HDDs as a Write Buffer

• Internal-parallel Processing in SSD

• Simulation of the Whole System For Evaluation



Hybrid Disk Configuration

[Diagram: data from write requests is buffered on the HDD; a dedicated processor de-duplicates and pre-processes it before the deduplicated data moves to the SSD; read requests are served with pre-processed data from the SSD.]
HcDD Architecture




Deduplication Design




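A minimal sketch of the de-duplication idea: write requests land in a buffer (the HDD in HcDD), each chunk is fingerprinted, and only chunks with unseen fingerprints are flushed to the SSD, so duplicate writes never consume flash P/E cycles. The fixed 4 KB chunk size and SHA-1 fingerprints are assumptions for illustration, not HcDD's actual choices.

```python
import hashlib

class DedupBuffer:
    """Write-buffer de-duplication sketch: fingerprint each chunk and
    forward only chunks not already stored on the SSD."""

    def __init__(self):
        self.seen = set()       # fingerprints of chunks already on SSD
        self.ssd = []           # stand-in for the SSD's contents

    def write(self, data, chunk_size=4096):
        written = 0
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            fp = hashlib.sha1(chunk).hexdigest()
            if fp not in self.seen:
                self.seen.add(fp)
                self.ssd.append(chunk)
                written += 1
        return written          # chunks actually written to the SSD
```

The fraction of chunks filtered out is the dedup ratio; the higher it is, the fewer erase cycles the SSD spends.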
Internal Parallel Processing

[Diagram: incoming requests 1-24 in the SDRAM cache are distributed round-robin across eight lists (#0-#7), one per flash package, so all eight packages can be programmed in parallel.]
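The round-robin distribution can be sketched as follows; with 8 packages, requests 1-8 go to lists #0-#7, request 9 wraps back to #0, and so on, which is what lets the packages absorb writes concurrently.

```python
def stripe_requests(requests, n_packages=8):
    """Distribute queued requests round-robin across per-package
    lists so the flash packages can be programmed in parallel."""
    lists = [[] for _ in range(n_packages)]
    for i, req in enumerate(requests):
        lists[i % n_packages].append(req)
    return lists
```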
Evaluation




Internal Parallelism Evaluation:
               Single Node




Single Node: Dedup Ratio




System Performance Evaluation




System Performance Evaluation




Summary




Conclusion

               McSD:
           A Smart Disk Model


           pp-mpiBlast:
     How to deploy Active Storage?


                 HcDD:
      Hybrid Disk for Active Storage

Future Work




Many Thanks!
           And Questions?






Editor's Notes

  • #3 Organization: 1. Motivation in Summary: Active Storage, Parallel Processing, Hybrid Storage2. McSD3. ppmpiBlast4. HcDD5. Summary
  • #7 Aesop’s Fable: The Tortoise and the Hare. Speed gap: the fast runner waits for the slower one. Over the last several decades, CPU performance has increased rapidly, while the performance improvement of I/O has been relatively slow. As a result, the gap between CPU performance and I/O bandwidth has continually grown. Especially for data-intensive computing workloads, I/O bottlenecks often cause low CPU utilization.
  • #28 BLAST is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
  • #29 Further subdividing the pipeline patterns, there are inter- and intra-application pipeline processing. The pp-mpiBlast is intra-application parallel processing, which means that, as the name (‘intra-’) suggests, one native sequential transaction is partitioned into multiple parallel pipelined transactions. The system performance is improved by fully exploiting the parallelism.
  • #30 The pipeline pattern not only improves the performance by exploiting the parallelism, but also can solve the out-of-core processing issue, which means the required amount of data is too large to fit in the ASN’s main memory. In pp-mpiBlast, the partition function is implemented within the mpiformatdb function running on the ASN, and the merge function is a separate one running on the front node of the cluster.
  • #31 Response time, speedup, and throughput are three critical performance measures for the pipelined BLAST. Denoting T1 and T2 as the execution times associated with the first stage and second stage in the pipeline, we can calculate the response time Tresponse for processing each input data set as the sum of T1 and T2.
  • #40 One limitation of flash memory is that although it can be read or programmed a byte or a word at a time in a random access fashion, it can only be erased a "block" at a time. This generally sets all bits in the block to 1. Starting with a freshly erased block, any location within that block can be programmed. However, once a bit has been set to 0, only by erasing the entire block can it be changed back to 1. In other words, flash memory (specifically NOR flash) offers random-access read and programming operations, but cannot offer arbitrary random-access rewrite or erase operations. Based on the SSD lifetime calculator provided by the Virident website [36], the lifetime of a 200 GB MLC-based SSD could be only 160 days if the write rate performed on it is 50 MB/s.
  • #49 The performance depends on the number of writes we removed. In a real-world implementation: (1) conservative comparison: no optimization, consider writes as synchronous; (2) a log file system reduces seek and rotational delays of the HDD; (3) asynchronous writes: from the user perspective, the delay is not obvious (i.e., can be omitted).