An Active and Hybrid Storage System for Data-intensive Applications

Since large-scale and data-intensive applications have been widely deployed, there is a growing demand for high-performance storage systems to support them. Compared with traditional storage systems, next-generation systems will embrace dedicated processors to reduce the computational load of host machines and will have hybrid combinations of different storage devices. We present a new architecture for an active storage system, which leverages the computational power of the dedicated processor, and show how it utilizes the multi-core processor and offloads computation from the host machine. We then address the challenge of making the active storage node cooperate with the other nodes in a cluster environment by designing a pipeline-parallel processing pattern, and we report the effectiveness of the mechanism. To evaluate the design, an open-source bioinformatics application is extended based on the pipeline-parallel mechanism. We also explore the hybrid configuration of storage devices within the active storage. The advent of the flash-memory-based solid state disk has played a critical role in revolutionizing the storage world. However, instead of simply replacing the traditional magnetic hard disk with the solid state disk, researchers believe that finding a complementary approach to incorporate both of them is more challenging and attractive. Thus, we propose a hybrid combination of different types of disk drives for our active storage system. A simulator is designed and implemented to verify the new configuration. In summary, this dissertation explores the idea of active storage, an emerging storage configuration, in terms of its architecture and design, its parallel processing capability, its cooperation with other machines in a cluster computing environment, and a new disk configuration: the hybrid combination of different types of disk drives.

  • Organization: 1. Motivation in summary: Active Storage, Parallel Processing, Hybrid Storage; 2. McSD; 3. pp-mpiBlast; 4. HcDD; 5. Summary
  • Aesop’s fable of the Tortoise and the Hare illustrates the speed gap: the fast runner waits for the slower one. Over the last several decades, CPU performance has increased rapidly, while I/O performance has improved relatively slowly. As a result, the gap between CPU performance and I/O bandwidth has continually grown. For data-intensive computing workloads in particular, I/O bottlenecks often cause low CPU utilization.
  • BLAST is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
  • Pipeline patterns can be further subdivided into inter- and intra-application pipeline processing. pp-mpiBlast uses intra-application pipeline processing: as the prefix ‘intra-’ suggests, one native sequential transaction is partitioned into multiple parallel pipelined transactions. System performance improves by exploiting this parallelism.
  • The pipeline pattern not only improves performance by exploiting parallelism, but also solves the out-of-core processing issue, in which the required data is too large to fit in the main memory of the Active Storage Node (ASN). In pp-mpiBlast, the partition function is implemented within the mpiformatdb function running on the ASN, and the merge function is a separate one running on the front node of the cluster.
  • Response time, speedup, and throughput are three critical performance measures for the pipelined BLAST. Denoting T1 and T2 as the execution times associated with the first stage and second stage in the pipeline, we can calculate the response time Tresponse for processing each input data set as the sum of T1 and T2.
  • One limitation of flash memory is that although it can be read or programmed a byte or a word at a time in a random access fashion, it can only be erased a "block" at a time. This generally sets all bits in the block to 1. Starting with a freshly erased block, any location within that block can be programmed. However, once a bit has been set to 0, only by erasing the entire block can it be changed back to 1. In other words, flash memory (specifically NOR flash) offers random-access read and programming operations, but cannot offer arbitrary random-access rewrite or erase operations. Based on the SSD lifetime calculator provided on the Virident website [36], the lifetime of a 200 GB MLC-based SSD could be only 160 days if the sustained write rate is 50 MB/s.
  • The performance depends on the number of writes we removed. In a real-world implementation: (1) conservative comparison: no optimization, with writes considered synchronous; (2) a log file system reduces the seek and rotational delays of the HDD; (3) asynchronous writes: from the user's perspective, the delay is not obvious (i.e., it can be omitted).
  • An Active and Hybrid Storage System for Data-intensive Applications

    1. An Active and Hybrid Storage System for Data-intensive Applications. Ph.D. Candidate: Zhiyang Ding. Defense Committee Members: Dr. Xiao Qin, Dr. Kai H. Chang, Dr. David A. Umphress. University Reader: Prof. Wei Wang, Chair of the Art Design Dept. 5/7/2012
    2. Cluster Computing • Large-scale data processing is everywhere.
    3. Motivation • Traditional Storage Nodes on the Cluster [Diagram labels: Client, Internet, Head Node, Network switch, Storage Node (or Storage Area Network), Compute Nodes]
    4. Motivation • What’s next? • More “Active”. [Diagram labels: Client, Internet, Head Node, Network switch, Storage Node, Compute Nodes; Computation Offload, I/O Request, Raw Data, Pre-processed Data]
    5. About the Active Storage • McSD: A Smart Disk Model • pp-mpiBlast: How to deploy Active Storage? • HcDD: Hybrid Disk for Active Storage [Diagram label: Storage Node]
    6. McSD: A Multicore Active Storage Device • I/O Wall Problem: CPU-I/O Gap – Limited I/O Bandwidth – CPU Waiting and Dissipating Power • How to – Bridge the CPU-I/O Gap – Reduce I/O Traffic
    7. Why McSD? • “Active”: – Leveraging the Processing Power of Storage Devices • Benefits: – Offloading Data-intensive Computation – Reducing I/O Traffic – Pipeline Parallel Programming
    8. Contributions • Design a prototype of a multicore active storage • Design a pre-assembled processing module • Extend a shared-memory MapReduce system • Emulate the whole system on a real testbed
    9. Background: Active Disks • Traditional Smart/Active Disks – On-board: Embedding a processor into the hard disk – Various Research Models • e.g., active disk, smart disk, IDISK, SmartSTOR, etc. • However, “active disk” has not been adopted by hardware vendors [Diagram labels: Improved attachment technologies, Cost of the System, I/O Bound Workloads, Reliability]
    10. Background: Parallel Processing • Multi-core Processors or Multi-processors – 45% transistor increase → 20% processing power increase • MapReduce: a Parallel Programming Model – MapReduce by Google – Hadoop, Mars, Phoenix, etc. • Multicore and Shared-memory Parallel Processing
    11. Design: System Overview [Diagram labels: Pipeline Parallel Processing, Communication Mechanism, Multicore and Shared-memory Parallel Processing, Hybrid Storage Disks, Design of an Active Storage]
    12. Design and Implementation • Computation Mechanism – Pre-assembled Processing Model – smartFAM • Extend the Shared-Memory MapReduce by Partitioning
    13. Pre-assembled Processing Modules • Pre-assembled Processing Modules – Meet the nature of embedded services – Reduce Complexity and Cost – Provide Services • E.g., multi-version antivirus service, pre-processing for data-intensive apps, de-duplication, etc. • How to invoke services?
    14. smartFAM • smartFAM = Smart File Alteration Monitor – Invokes the pre-assembled processing modules or functions by monitoring changes to the system log file. • Two Components: – an inotify function: a Linux system function – a trigger daemon
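The log-driven invocation described on the smartFAM slide can be sketched as follows. This is a minimal illustration, not the author's implementation: it swaps inotify for simple polling (the Python standard library has no inotify binding), and the `MODULES` registry, the `LogWatcher` class, and the "module-name argument" line format are all hypothetical.

```python
import os
import tempfile

# Hypothetical registry: module name -> callable standing in for a
# pre-assembled processing module on the active storage node.
MODULES = {
    "wordcount": lambda arg: len(arg.split()),
}


class LogWatcher:
    """Polls a log file and dispatches each newly appended request line.

    Assumed line format (not from the slides): "<module-name> <argument>".
    Polling stands in for the inotify notification used by smartFAM.
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self.offset = 0  # byte offset of the first unread byte

    def poll(self):
        """Run the module named on each complete new line; return results."""
        with open(self.log_path, "rb") as log:
            log.seek(self.offset)
            chunk = log.read()
        # Consume complete lines only; a partial trailing line waits for
        # the next poll.
        end = chunk.rfind(b"\n")
        if end < 0:
            return []
        self.offset += end + 1
        results = []
        for raw in chunk[:end].split(b"\n"):
            name, _, arg = raw.decode("utf-8").strip().partition(" ")
            handler = MODULES.get(name)
            if handler is not None:  # unknown or empty requests are skipped
                results.append((name, handler(arg)))
        return results


def demo():
    """Append one request to a temporary log and let the watcher handle it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "module.log")
        open(path, "w").close()
        watcher = LogWatcher(path)
        with open(path, "a", encoding="utf-8") as log:
            log.write("wordcount active storage offloads work\n")
        return watcher.poll()
```

In a real deployment the host would append requests to a log shared with the active node, and the daemon would react to a file-change notification rather than call `poll()` on a timer.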
    15. Design and Implementation [Diagram labels: Active Node: smartFAM Daemon, inotify, Pre-assembled Modules (General functions, Data-intensive function); Host node: Main Program, smartFAM Daemon, inotify, Merge Results; shared over NFS: Module Log, Log files & Result data; steps 1, 2, 3]
    16. Extend the Phoenix: A Shared-memory MapReduce Model • Extend the Phoenix MapReduce Programming Model by partitioning and merging – New API: partition_input – New Functions: • partition (provided by the new API) • merge (developed by the user) • Example: – wordcount [data-file][partition-size][]
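The partition-and-merge idea on the Phoenix slide can be illustrated with a small word-count sketch. It does not use Phoenix or its real API; the function names echo the slide's `partition_input` and `merge`, but their signatures and behavior here are assumptions made for illustration.

```python
from collections import Counter


def partition_input(text, partition_size):
    """Split text into chunks of roughly partition_size characters.

    Each cut is pushed forward to the next whitespace so that no word is
    split across two partitions.
    """
    if partition_size < 1:
        raise ValueError("partition_size must be at least 1")
    chunks, start = [], 0
    while start < len(text):
        end = min(start + partition_size, len(text))
        while end < len(text) and not text[end].isspace():
            end += 1
        chunks.append(text[start:end])
        start = end
    return chunks


def word_count(chunk):
    """Per-partition job: count the words in one chunk."""
    return Counter(chunk.split())


def merge(partials):
    """User-supplied merge: combine the per-partition counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def partitioned_word_count(text, partition_size):
    """Process the input one partition at a time, then merge the results."""
    return merge(word_count(c) for c in partition_input(text, partition_size))
```

Because each partition is counted independently, only one partition needs to be resident at a time, which is the property the slide's partition-size argument is meant to control.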
    17. Pipeline Processing
    18. Evaluation Environment • Testbed • Benchmarks – Word Count – String Match – Matrix Multiplication • Individual Node Performance • System Performance
    19. Individual Node Performance
                           Word Count (seconds)     String Match (seconds)
                           1 GB       1.25 GB       1 GB       1.25 GB
        w/ Partition       40.60      50.91         17.76      20.61
        w/o Partition      85.74      139.54        17.62      21.00
    20. System Evaluation: Matrix-Multiplication and Word-Count (Speedups)
        Input Data Size    vs Single Machine    vs Single-core Active    vs McSD w/o Partition
        500 MB             1.47 X               2.15 X                   0.99 X
        750 MB             1.45 X               2.09 X                   1.04 X
        1 GB               7.62 X               2.14 X                   6.07 X
        1.25 GB            19.01 X              2.50 X                   15.39 X
        $\mathrm{Speedup} = \frac{T_{\mathrm{ConsumptionOfControlSample}}}{T_{\mathrm{ConsumptionOfMcSD}}}$
    21. Summary • It can improve system performance by offloading data-intensive computation • McSD is a promising active storage model with – Pre-assembled processing modules – Parallel data processing – Better Evaluation Performance
    22. About the Active Storage • McSD: A Smart Disk Model • pp-mpiBlast: How to deploy Active Storage? • HcDD: Hybrid Disk for Active Storage [Diagram label: Storage Node]
    23. Apply Active Storages to a Cluster • So far, we know the potential of Active Storages • Challenge: How to coordinate active storage nodes with computing nodes? • Propose a Pipeline-parallel Processing pattern
    24. Contributions • Propose a pipeline-parallel processing framework to “connect” an Active Storage node with computing nodes. • Evaluate the framework using both an analytic model and a real implementation. • Case Study: Extend an existing bioinformatics application based on the framework.
    25. Background: Active Storage [Diagram labels: Processor, Memory, Mass Storage, Bridge?, Active Storage Node, SSD, SSD, Computation, Buff, Disks]
    26. Background: Bioinformatics App • BLAST*: Basic Local Alignment Search Tool – Comparing primary biological sequence information • mpiBLAST** is a freely available, open-source, parallel implementation of NCBI BLAST. – Format raw data files – Run a parallel BLAST function *http://blast.ncbi.nlm.nih.gov/ **http://www.mpiblast.org/
    27. Pipeline-parallel Design • Offload the raw-data formatting task to where the data is stored. • Intra-application Pipeline-parallel Processing by “partition” and “merge”. • pp-mpiBlast, a case study.
    28. Pipelining Workflow [Diagram labels: Active Storage Node: Raw Input File, Partition 1 to Partition n, FormatDB, Intermediates 1 to n; Computing Nodes: FormatDB Output, Sub-output 1 to Sub-output n, Output File; stages: Partition, FormatDB, mpiBlast, Merge; two annotations reading "(n-1) times"]
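The partition, format, search, and merge workflow on this slide can be sketched as a two-stage pipeline in which the storage-side stage and the compute-side stage overlap across partitions. This is an illustrative sketch using threads and a queue, not pp-mpiBlast itself; `run_pipeline` and the stand-in stage functions in the usage note are hypothetical.

```python
import queue
import threading


def run_pipeline(partitions, format_stage, search_stage, merge):
    """Run two overlapping stages over the partitions and merge the results.

    Stage 1 stands in for formatting on the active storage node; stage 2
    stands in for the search on the computing nodes. While stage 2 works
    on partition i, stage 1 can already format partition i + 1.
    """
    handoff = queue.Queue()
    done = object()  # sentinel marking the end of the stream
    sub_outputs = []

    def active_storage_node():
        for part in partitions:
            handoff.put(format_stage(part))
        handoff.put(done)

    def computing_nodes():
        while True:
            item = handoff.get()
            if item is done:
                break
            sub_outputs.append(search_stage(item))

    producer = threading.Thread(target=active_storage_node)
    consumer = threading.Thread(target=computing_nodes)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return merge(sub_outputs)
```

For example, `run_pipeline(["acgt", "ttga", "ccat"], str.upper, lambda s: s.count("T"), sum)` formats each partition to upper case, counts the `T` characters per partition, and sums the sub-outputs.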
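    29. Analytic Model • Three Critical Measures
        $T_{\mathrm{response}} = T_{\mathrm{active}} + T_{\mathrm{compute}}$
        $\mathrm{Throughput} = \frac{1}{\max(T_{\mathrm{active}},\, T_{\mathrm{compute}})}$
        $\mathrm{Speedup} = \frac{T_{\mathrm{sequence}}}{T_{\mathrm{pipelined}}} = \frac{n \times (T_{\mathrm{active}} + T_{\mathrm{compute}})}{T_{\mathrm{active}} + (n-1) \times \max(T_{\mathrm{active}},\, T_{\mathrm{compute}}) + T_{\mathrm{compute}}} = \frac{n}{1 + \frac{n-1}{\mathrm{Throughput} \times T_{\mathrm{response}}}}$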
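The three measures on the Analytic Model slide can be evaluated directly. The sketch below transcribes the response-time, throughput, and speedup expressions into small functions; the stage times used in the usage note are made-up values for illustration, not measurements from the evaluation.

```python
def response_time(t_active, t_compute):
    """Time for one partition to pass through both pipeline stages."""
    return t_active + t_compute


def throughput(t_active, t_compute):
    """Partitions completed per unit time, limited by the slower stage."""
    return 1.0 / max(t_active, t_compute)


def speedup(n, t_active, t_compute):
    """Sequential time over pipelined time for n partitions."""
    t_sequence = n * (t_active + t_compute)
    t_pipelined = t_active + (n - 1) * max(t_active, t_compute) + t_compute
    return t_sequence / t_pipelined


def speedup_closed_form(n, t_active, t_compute):
    """The same speedup written in terms of throughput and response time."""
    product = throughput(t_active, t_compute) * response_time(t_active, t_compute)
    return n / (1 + (n - 1) / product)
```

With hypothetical stage times of 2 s and 3 s and n = 10 partitions, the sequential time is 50 s and the pipelined time is 2 + 9 × 3 + 3 = 32 s, so both functions give a speedup of 1.5625. As n grows, the speedup approaches Throughput × T_response, which cannot exceed 2 for a two-stage pipeline.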
    30. Evaluation Environment
        CPU: Intel XEON X3430 (Computing Nodes Configuration); Intel Core 2 Q9400 (Active Storage Configuration)
        Memory: 2 GB DDR3 (PC3-10600)
        OS: Ubuntu 9.04 Jaunty Jackalope, 32-bit version
        Kernel: 2.6.28-15-generic
        Network: Gigabit LAN
        Our testbed, “Pipeline-parallel”: 12 Computing Nodes, 1 Active Storage Node
        Comparison testbed, “12-node Cluster”: 12 Computing Nodes, 1 Storage Node
        Comparison testbed, “13-node Cluster”: 13 Computing Nodes, 1 Storage Node
    31. Pipeline-parallel Design [Figures: Results: Compared With 12-node System; Results: Compared With 13-node System]
    32. Speedups Trends: Partition Size
    33. Summary • We proposed a pipeline-parallel processing mechanism to apply an Active Storage Node. • As a case study, we extended a classic bioinformatics application based on the pipeline-parallel style.
    34. About the Active Storage • McSD: A Smart Disk Model • pp-mpiBlast: How to deploy Active Storage? • HcDD: Hybrid Disk for Active Storage [Diagram label: Storage Node]
    35. What’s Hybrid? A Hybrid Combination of a Gas Engine and an Electric Engine [Diagram labels: Power, Efficiency]
    36. Hybrid Disk Drives • A Hybrid Combination of Two Types of Storage Devices: HDD and SSD – HDD: Magnetic Hard Disk – SSD: Solid State Disk, built from NAND-based flash memory • What are their roles?
    37. Motivation • In a hybrid storage system, using SSDs as the buffer can boost the performance. • However, SSDs suffer reliability issues.
        WordCount on Intel Core2 Duo E8400 (seconds), by input data size
        Storage    Buffer    500 MB    750 MB    1 GB      1.25 GB
        HDD        HDD       21.51     38.30     505.25    1294.64
        HDD        SSD       19.89     36.41     85.74     139.54
    38. Limitations Related to SSDs • Flash Memory: – Each block consists of 32, 64, or 128 pages. – Each page is typically 512, 2,048, or 4,096 bytes. • “Erase-before-write” at block level. • Lifespan is 10,000 Program/Erase cycles. – E.g., *an 80 GB MLC SSD may last only 106 days if the write rate is 30 MB/s. • Rethink their roles? *Based on the SSD lifespan calculator provided by Virident.com
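A back-of-envelope version of the lifespan arithmetic on this slide can be written as below. This is an idealized sketch, not the Virident calculator, whose inputs are not given in the source: it assumes perfect wear leveling and decimal units (1 GB = 1000 MB), and the `write_amplification` parameter is a hypothetical knob for the extra internal writes a real device performs.

```python
SECONDS_PER_DAY = 86400


def ssd_lifespan_days(capacity_gb, pe_cycles, write_rate_mb_s,
                      write_amplification=1.0):
    """Idealized days until every block has used up its P/E cycles.

    Assumes writes are spread evenly over all blocks (perfect wear
    leveling). write_amplification > 1 models additional internal writes;
    its value here is an assumption, not a figure from the slides.
    """
    total_writable_mb = capacity_gb * 1000 * pe_cycles / write_amplification
    seconds = total_writable_mb / write_rate_mb_s
    return seconds / SECONDS_PER_DAY
```

For the slide's example (80 GB, 10,000 cycles, 30 MB/s) the idealized bound is 80,000 MB × 10,000 / 30 MB/s ≈ 26.7 million seconds, or roughly 309 days. The 106 days quoted on the slide is lower, so the calculator evidently applies more conservative assumptions than this bound.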
    39. Contributions • Hybrid Combination of HDD and SSD disks • De-duplication Service using HDDs as a Write Buffer • Internal-parallel Processing in SSD • Simulation of the Whole System For Evaluation
    40. Hybrid Disk Configuration [Diagram labels: I/O Requests, Dedicated Processor, Write Requests, Data of Write Requests, De-duplication, HDD, Deduplicated data, Read Requests, Pre-processing, Pre-processed Data, Data, SSD]
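The slides do not spell out how the de-duplication service works, so the following is only a generic sketch of content-hash de-duplication in a write buffer, assuming fixed-size blocks identified by a SHA-256 fingerprint. The `DedupWriteBuffer` class and its definition of the dedup ratio (fraction of buffered writes that carried already-seen data) are hypothetical.

```python
import hashlib


class DedupWriteBuffer:
    """Buffers writes and keeps one copy of each distinct block content."""

    def __init__(self):
        self.block_map = {}    # logical block address -> fingerprint
        self.unique = {}       # fingerprint -> block data to forward later
        self.writes_seen = 0

    def write(self, lba, data):
        """Record a write; return True if the data was not seen before."""
        self.writes_seen += 1
        fingerprint = hashlib.sha256(data).hexdigest()
        self.block_map[lba] = fingerprint
        if fingerprint in self.unique:
            return False  # duplicate content: no new block to forward
        self.unique[fingerprint] = data
        return True

    def read(self, lba):
        """Return the latest data written to a logical block address."""
        return self.unique[self.block_map[lba]]

    def dedup_ratio(self):
        """Fraction of buffered writes eliminated as duplicates."""
        if self.writes_seen == 0:
            return 0.0
        return 1.0 - len(self.unique) / self.writes_seen
```

Under this scheme only the entries in `unique` would need to reach the SSD, which is how removing duplicate writes could reduce program/erase wear.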
    41. HcDD Architecture
    42. Deduplication Design
    43. Internal Parallel Processing [Diagram labels: SDRAM Cache; List #0 to List #7; request rows Req 1 to Req 8, Req 9 to Req 16, Req 17 to Req 24 spread across the eight lists; Package #0 to Package #7]
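The figure on this slide appears to spread consecutive requests across eight lists (Req 1 to Req 8 in one row, Req 9 to Req 16 in the next, and so on). Assuming that reading is right, the distribution is a round-robin assignment, sketched below; the function name and the one-list-per-package interpretation are assumptions, not details confirmed by the slides.

```python
def assign_round_robin(request_ids, num_lists=8):
    """Distribute requests over per-package lists in arrival order.

    The request at position i (counting from 0) goes to list i % num_lists,
    so consecutive requests land on different lists and can be served in
    parallel.
    """
    lists = [[] for _ in range(num_lists)]
    for position, request_id in enumerate(request_ids):
        lists[position % num_lists].append(request_id)
    return lists
```

With requests 1 through 24 and eight lists, list #0 receives requests 1, 9, and 17, and list #7 receives 8, 16, and 24.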
    44. Evaluation
    45. Internal Parallelism Evaluation: Single Node
    46. Single Node: Dedup Ratio
    47. System Performance Evaluation
    48. System Performance Evaluation
    49. Summary
    50. Conclusion • McSD: A Smart Disk Model • pp-mpiBlast: How to deploy Active Storage? • HcDD: Hybrid Disk for Active Storage [Diagram label: Storage Node]
    51. Future Work
    52. Many Thanks! And Questions?