An Active and Hybrid Storage System for Data-intensive Applications

Since large-scale and data-intensive applications have been widely deployed, there is a growing demand for high-performance storage systems to support them. Compared with traditional storage systems, next-generation systems will embrace dedicated processors to reduce the computational load of host machines and will use hybrid combinations of different storage devices. We present a new active storage architecture that leverages the computational power of a dedicated processor, and show how it utilizes a multi-core processor to offload computation from the host machine. We then address the challenge of making the active storage node cooperate with the other nodes in a cluster environment by designing a pipeline-parallel processing pattern, and report the effectiveness of the mechanism. To evaluate the design, an open-source bioinformatics application is extended based on the pipeline-parallel mechanism. We also explore the hybrid configuration of storage devices within the active storage. The advent of the flash-memory-based solid state disk has played a critical role in revolutionizing the storage world. However, instead of simply replacing the traditional magnetic hard disk with the solid state disk, researchers believe that finding a complementary approach that incorporates both of them is more challenging and attractive. Thus, we propose a hybrid combination of different types of disk drives for our active storage system. A simulator is designed and implemented to verify the new configuration. In summary, this dissertation explores the idea of active storage, an emerging storage configuration, in terms of architecture and design, parallel processing capability, cooperation with other machines in a cluster computing environment, and a new disk configuration: the hybrid combination of different types of disk drives.

Slide notes

  • Organization: 1. Motivation in summary: active storage, parallel processing, hybrid storage; 2. McSD; 3. pp-mpiBlast; 4. HcDD; 5. Summary.
  • Aesop's fable of the Tortoise and the Hare: a speed gap, where the fast runner waits for the slower one. Over the last several decades, CPU performance has increased rapidly, while the performance improvement of I/O has been relatively slow, so the gap between CPU performance and I/O bandwidth has continually grown. Especially for data-intensive computing workloads, I/O bottlenecks often cause low CPU utilization.
  • BLAST is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
  • Further subdividing the pipeline patterns, there are inter- and intra-application pipeline processing. pp-mpiBlast uses intra-application parallel processing, which means that, as the prefix 'intra-' suggests, one native sequential transaction is partitioned into multiple parallel pipelined transactions. System performance is improved by fully exploiting the parallelism.
  • The pipeline pattern not only improves performance by exploiting parallelism, but also addresses the out-of-core processing issue, in which the required amount of data is too large to fit in the ASN's main memory. In pp-mpiBlast, the partition function is implemented within the mpiformatdb function running on the ASN, and the merge function is a separate function running on the front node of the cluster.
  • Response time, speedup, and throughput are three critical performance measures for the pipelined BLAST. Denoting T1 and T2 as the execution times associated with the first stage and second stage in the pipeline, we can calculate the response time Tresponse for processing each input data set as the sum of T1 and T2.
  • One limitation of flash memory is that although it can be read or programmed a byte or a word at a time in a random-access fashion, it can only be erased a "block" at a time. Erasing generally sets all bits in the block to 1. Starting with a freshly erased block, any location within that block can be programmed. However, once a bit has been set to 0, it can only be changed back to 1 by erasing the entire block. In other words, flash memory (specifically NOR flash) offers random-access read and programming operations, but cannot offer arbitrary random-access rewrite or erase operations. Based on the SSD lifetime calculator provided by the Virident website [36], the lifetime of a 200 GB MLC-based SSD could be only 160 days at a write rate of 50 MB/s; a back-of-the-envelope sketch of this estimate follows below.
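
A back-of-the-envelope version of that estimate, as a C sketch (not the Virident calculator's actual model): the 10,000 P/E-cycle endurance, perfect wear leveling, and a write-amplification factor of about 3 are assumptions, the last chosen because it roughly reproduces the figures quoted in these notes and on slide 38.

```c
/* Rough MLC SSD lifetime estimate (a sketch; assumptions noted above). */
#include <stdio.h>

static double ssd_lifetime_days(double capacity_gb, double write_rate_mb_s,
                                double pe_cycles, double write_amp)
{
    double total_writable_mb = capacity_gb * 1024.0 * pe_cycles; /* raw endurance */
    double effective_rate = write_rate_mb_s * write_amp;         /* host writes amplified by the FTL */
    return total_writable_mb / effective_rate / 86400.0;         /* seconds -> days */
}

int main(void)
{
    printf("%.0f days\n", ssd_lifetime_days(200.0, 50.0, 10000.0, 3.0)); /* ~158 (this note quotes ~160) */
    printf("%.0f days\n", ssd_lifetime_days(80.0, 30.0, 10000.0, 3.0));  /* ~105 (slide 38 quotes ~106) */
    return 0;
}
```
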
  • The performance gain depends on the number of writes we remove. In a real-world implementation: (1) conservative comparison: no optimization, writes are treated as synchronous; (2) a log-structured file system reduces the seek and rotational delays of the HDD; (3) asynchronous writes: from the user's perspective, the delay is not noticeable (i.e., it can be omitted).

Transcript

  • 1. An Active and Hybrid Storage System for Data-intensive Applications Ph.D Candidate: Zhiyang Ding Defense Committee Members: Dr. Xiao Qin Dr. Kai H. Chang Dr. David A. Umphress University Reader: Prof. Wei Wang, Chair of the Art Design Dept. 5/7/2012
  • 2. Cluster Computing • Large-scale Data Processing is everywhere.5/7/2012 2
  • 3. Motivation • Traditional Storage Nodes on the Cluster [diagram: a client reaches the head node over the Internet; a network switch connects the compute nodes and the storage node (or storage area network)] 5/7/2012 3
  • 4. Motivation • What's next? • More "Active". [diagram: the client reaches the head node over the Internet; a network switch connects the compute nodes and the storage node; computation is offloaded to the storage node, which receives I/O requests and raw data and returns pre-processed data] 5/7/2012 4
  • 5. About the Active Storage McSD: A Smart Disk Model pp-mpiBlast: How to deploy Active Storage? Storage Node HcDD: Hybrid Disk for Active Storage5/7/2012 5
  • 6. McSD: A Multicore Active Storage Device• I/O Wall Problem: CPU--I/O Gap – Limited I/O Bandwidth – CPU Waiting and Dissipating the Power• How to – Bridge CPU--I/O Gap – Reduce I/O Traffic5/7/2012 6
  • 7. Why McSD?• “Active”: – Leveraging the Processing Power of Storage Devices• Benefits: – Offloading Data-intensive Computation – Reducing I/O Traffic – Pipeline Parallel Programming5/7/2012 7
  • 8. Contributions• Design a prototype of a multicore active storage• Design a pre-assembled processing module• Extend a shared-memory MapReduce system• Emulate the whole system on a real testbed5/7/2012 8
  • 9. Background: Active Disks• Traditional Smart/Active Disks – On-board: Embedding a processor into the hard disk – Various Research Models • e.g. active disk, smart disk, IDISK, SmartSTOR, etc.• However, "active disk" was not adopted by hardware vendors, for reasons including improved attachment technologies, the cost of the system, I/O-bound workloads, and reliability. 5/7/2012 9
  • 10. Background: Parallel Processing• Multi-core Processors or Multi-processors – a 45% increase in transistors yields roughly a 20% increase in processing power• MapReduce: a Parallel Programming Model – MapReduce by Google – Hadoop, Mars, Phoenix, etc.• Multicore and Shared-memory Parallel Processing 5/7/2012 10
  • 11. Design: System Overview [diagram: the active storage design combines pipeline-parallel processing, a communication mechanism, multicore shared-memory parallel processing, and hybrid storage disks] 5/7/2012 11
  • 12. Design and Implementation• Computation Mechanism – Pre-assembled Processing Model – smartFAM• Extend the Shared-Memory MapReduce by Partitioning5/7/2012 12
  • 13. Pre-assembled Processing Modules• Pre-assembled Processing Modules – Meet the nature of embedded services – Reduce Complexity and Cost – Provide Services • E.g. Multi-version antivirus service, Pre-process of data- intensive apps, De-duplication, and etc.• How to invoke services?5/7/2012 13
  • 14. smartFAM• smartFAM = Smart File Alternation Monitor – Invokes the pre-assembled processing modules or functions by monitoring the changes of the system log file.• Two Components: – an inotify function: a Linux system function – a trigger daemon5/7/2012 14
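
A minimal sketch of such a trigger daemon, using the Linux inotify API named on the slide. The watched path and run_module() are hypothetical placeholders, not the dissertation's actual smartFAM code.

```c
/* smartFAM-style trigger daemon sketch: watch a log directory with inotify
 * and invoke a pre-assembled module whenever a log file changes. */
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

#define BUF_LEN (1024 * (sizeof(struct inotify_event) + 16))

static void run_module(const char *name)
{
    /* Placeholder: the real daemon would launch the matching pre-assembled
     * module, e.g. a data-intensive pre-processing pass. */
    printf("log '%s' changed: invoking pre-assembled module\n", name);
}

int main(void)
{
    int fd = inotify_init();
    if (fd < 0) { perror("inotify_init"); return 1; }

    /* Hypothetical directory where the host node appends its log files. */
    if (inotify_add_watch(fd, "/var/mcsd/logs", IN_MODIFY | IN_CLOSE_WRITE) < 0) {
        perror("inotify_add_watch");
        return 1;
    }

    char buf[BUF_LEN];
    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);   /* blocks until events arrive */
        if (len <= 0) break;
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len > 0)
                run_module(ev->name);
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return 0;
}
```
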
  • 15. Design and Implementation [diagram: the host node's main program writes log files and data over NFS; on the active node, inotify notifies the smartFAM daemon of log-file changes, the daemon invokes the pre-assembled modules (general and data-intensive functions), and the results are merged back] 5/7/2012 15
  • 16. Extend the Phoenix: A Shared-memory MapReduce Model• Extend the Phoenix MapReduce Programming Model by partitioning and merging – New API: partition_input – New Functions: • partition (provided by the new API) • merge (developed by the user)• Example: – wordcount [data-file][partition-size][]5/7/2012 16
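
A rough sketch of the partitioning idea behind this extension: read the input in fixed-size chunks so each MapReduce pass fits in memory, then merge per-chunk results. The names partition_input and merge_counts echo the slide, but the code is illustrative, not the actual McSD/Phoenix extension.

```c
/* Chunked word-count driver (illustrative only). */
#include <stdio.h>
#include <stdlib.h>

static void run_wordcount(const char *chunk, size_t len)
{
    (void)chunk; (void)len;          /* stand-in for the Phoenix map/reduce job */
}

static void merge_counts(void)
{
    /* user-written merge: fold this chunk's counts into a global table */
}

static int partition_input(const char *path, size_t partition_size)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    char *chunk = malloc(partition_size);
    if (!chunk) { fclose(f); return -1; }

    size_t len;
    while ((len = fread(chunk, 1, partition_size, f)) > 0) {
        run_wordcount(chunk, len);   /* one in-memory pass per partition */
        merge_counts();
    }
    free(chunk);
    fclose(f);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: wordcount <data-file> <partition-size-bytes>\n");
        return 1;
    }
    return partition_input(argv[1], (size_t)strtoull(argv[2], NULL, 10)) ? 1 : 0;
}
```

A real partitioner would also back each chunk boundary up to the nearest word delimiter so no token is split across partitions.
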
  • 17. Pipeline Processing5/7/2012 17
  • 18. Evaluation Environment• Testbed• Benchmarks – Word Count – String Match – Matrix Multiplication• Individual Node Performance• System Performance5/7/2012 18
  • 19. Individual Node Performance
                      Word Count (seconds)      String Match (seconds)
                      1 GB       1.25 GB        1 GB       1.25 GB
    w/ Partition      40.60      50.91          17.76      20.61
    w/o Partition     85.74      139.54         17.62      21.00
    5/7/2012 19
  • 20. System Evaluation: Matrix-Multiplication and Word-Count (Speedups)
    Input Data Size    vs Single Machine    vs Single-core Active    vs McSD w/o Partition
    500 MB             1.47 X               2.15 X                   0.99 X
    750 MB             1.45 X               2.09 X                   1.04 X
    1 GB               7.62 X               2.14 X                   6.07 X
    1.25 GB            19.01 X              2.50 X                   15.39 X
    Speedup = T_ConsumptionOfControlSample / T_ConsumptionOfMcSD
    5/7/2012 20
  • 21. Summary• It can improve system performance by offloading data-intensive computation• McSD is a promising active storage model with – Pre-assembled processing modules – Parallel data processing – Better Evaluation Performance5/7/2012 21
  • 22. About the Active Storage McSD: A Smart Disk Model pp-mpiBlast: How to deploy Active Storage? Storage Node HcDD: Hybrid Disk for Active Storage5/7/2012 22
  • 23. Apply Active Storages to a Cluster• So far, we know the potential of Active Storages• Challenge: How to coordinate active storage nodes with computing nodes?• Propose a Pipeline-parallel Processing pattern5/7/2012 23
  • 24. Contributions• Propose a pipeline-parallel processing framework to "connect" an Active Storage node with computing nodes.• Evaluate the framework using both an analytic model and a real implementation.• Case Study: Extend an existing bioinformatics application based on the framework.5/7/2012 24
  • 25. Background: Active Storage [diagram: an active storage node bridges the processor, memory, and mass storage; SSDs serve as a computation buffer in front of the disks] 5/7/2012 25
  • 26. Background: Bioinformatics App• BLAST*: Basic Local Alignment Search Tool – Comparing primary biological sequence information• mpiBLAST** is a freely available, open-source, parallel implementation of NCBI BLAST. – Format raw data files – Run a parallel BLAST function *http://blast.ncbi.nlm.nih.gov/ **http://www.mpiblast.org/5/7/2012 27
  • 27. Pipeline-parallel Design• Offload the raw-data formatting task to where the data is stored.• Intra-application Pipeline-parallel Processing by "partition" and "merge".• pp-mpiBlast, a case study.5/7/2012 28
  • 28. Pipelining Workflow [diagram: on the active storage node, the raw input file is split into n partitions and each partition is run through FormatDB to produce an intermediate; on the computing nodes, mpiBlast processes each intermediate into a sub-output, and the sub-outputs are merged into the output file; the partition/FormatDB stage overlaps with the mpiBlast/merge stage (n-1) times] 5/7/2012 29
  • 29. Analytic Model• Three Critical Measures
    T_response = T_active + T_compute
    Throughput = 1 / max(T_active, T_compute)
    Speedup = T_sequence / T_pipelined
            = n · (T_active + T_compute) / (T_active + (n-1) · max(T_active, T_compute) + T_compute)
            = n / (1 + (n-1) / (Throughput · T_response))
    5/7/2012 30
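
The model above can be sanity-checked with a few lines of C; the stage times and partition count below are made up for illustration.

```c
/* Pipeline model from slide 29: t_active is the formatting stage on the
 * active storage node, t_compute is the mpiBLAST stage on the compute nodes. */
#include <stdio.h>

static double max2(double a, double b) { return a > b ? a : b; }

int main(void)
{
    double t_active = 20.0, t_compute = 60.0;   /* hypothetical stage times (s) */
    int n = 10;                                 /* number of pipelined partitions */

    double response   = t_active + t_compute;
    double throughput = 1.0 / max2(t_active, t_compute);
    double t_seq  = n * (t_active + t_compute);
    double t_pipe = t_active + (n - 1) * max2(t_active, t_compute) + t_compute;
    double speedup = t_seq / t_pipe;            /* equals n / (1 + (n-1)/(throughput*response)) */

    printf("response=%.1f s  throughput=%.4f /s  speedup=%.2fx\n",
           response, throughput, speedup);      /* speedup tends to 80/60 = 1.33x as n grows */
    return 0;
}
```
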
  • 30. Evaluation Environment
    CPU: Intel XEON X3430 (computing nodes); Intel Core 2 Q9400 (active storage node)
    Memory: 2 GB DDR3 (PC3-10600)
    OS: Ubuntu 9.04 Jaunty Jackalope, 32-bit, kernel 2.6.28-15-generic
    Network: Gigabit LAN
    Testbeds: "Pipeline-parallel" (ours): 12 computing nodes + 1 active storage node; "12-node Cluster": 12 computing nodes + 1 storage node; "13-node Cluster": 13 computing nodes + 1 storage node
    5/7/2012 31
  • 31. Pipeline-parallel Design [charts: results compared with the 12-node system and with the 13-node system] 5/7/2012 32
  • 32. Speedups Trends: Partition Size5/7/2012 33
  • 33. Summary• We proposed a pipeline-parallel processing mechanism to apply an Active Storage Node.• As a case study, we extended a classic bioinformatics application based on the pipeline-parallel style.5/7/2012 34
  • 34. About the Active Storage McSD: A Smart Disk Model pp-mpiBlast: How to deploy Active Storage? Storage Node HcDD: Hybrid Disk for Active Storage5/7/2012 35
  • 35. What's Hybrid? A hybrid combination of a gas-powered engine and an electric engine, for efficiency. 5/7/2012 36
  • 36. Hybrid Disk Drives• A Hybrid Combination of Two Types of Storage Devices: HDD and SSD – HDD: Magnetic Hard Disk – Solid State Disk: Built by NAND-based flash memory. What are their roles?5/7/2012 37
  • 37. Motivation• In a hybrid storage system, using SSDs as the buffer can boost performance. However, SSDs suffer reliability issues.
    WordCount on Intel Core2 Duo E8400 (seconds)
    Storage    Buffer    500 MB    750 MB    1 GB      1.25 GB
    HDD        HDD       21.51     38.30     505.25    1294.64
    HDD        SSD       19.89     36.41     85.74     139.54
    5/7/2012 38
  • 38. Limitations Related to SSDs• Flash Memory: – Each block consists of 32, 64, or 128 pages. – Each page is typically 512, 2,048, or 4,096 bytes.• "Erase-before-write" at the block level.• Lifespan is 10,000 Program/Erase cycles. – E.g., *an 80 GB MLC SSD can last only 106 days if the write rate is 30 MB/s.• Rethink their roles? *Based on the SSD lifespan calculator provided by Virident.com 5/7/2012 39
  • 39. Contributions• Hybrid Combination of HDD and SSD disks• De-duplication Service using HDDs as a Write Buffer• Internal-parallel Processing in SSD• Simulation of the Whole System For Evaluation5/7/2012 40
  • 40. Hybrid Disk Configuration [diagram: I/O requests arrive at the dedicated processor; the data of write requests is de-duplicated using the HDD as a write buffer, and the deduplicated data flows to the SSD; read requests are served with pre-processed data from the SSD] 5/7/2012 41
  • 41. HcDD Architecture5/7/2012 42
  • 42. Deduplication Design5/7/2012 43
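
A minimal sketch of the write-path check that slides 40-42 describe: fingerprint each incoming block and skip the SSD write when the fingerprint has already been seen. The hash (FNV-1a), block size, and flat index are placeholders, not HcDD's actual design.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE  4096
#define INDEX_SLOTS 65536

static uint64_t seen[INDEX_SLOTS];               /* fingerprint index; 0 = empty slot */

/* FNV-1a stands in for a real (e.g. cryptographic) fingerprint. */
static uint64_t fnv1a(const uint8_t *p, size_t n)
{
    uint64_t h = 1469598103934665603ULL;
    while (n--) { h ^= *p++; h *= 1099511628211ULL; }
    return h;
}

/* Returns true if the block is a duplicate, so only a reference is stored
 * and the SSD is spared a write (and, eventually, an erase). */
static bool dedup_write(const uint8_t block[BLOCK_SIZE])
{
    uint64_t fp = fnv1a(block, BLOCK_SIZE);
    uint64_t *slot = &seen[fp % INDEX_SLOTS];
    if (*slot == fp) return true;                /* seen before: drop the write */
    *slot = fp;                                  /* new content: index it, write it */
    return false;
}

int main(void)
{
    uint8_t a[BLOCK_SIZE], b[BLOCK_SIZE];
    memset(a, 0xAA, sizeof a);
    memset(b, 0xBB, sizeof b);
    printf("%d %d %d\n", dedup_write(a), dedup_write(b), dedup_write(a)); /* prints: 0 0 1 */
    return 0;
}
```
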
  • 43. Internal Parallel Processing [diagram: incoming requests (Req 1-24) are queued in per-package lists (List #0-#7) in the SDRAM cache and dispatched in parallel to eight flash packages (Package #0-#7)] 5/7/2012 44
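
The striping idea in that figure can be illustrated with a trivial round-robin mapping; the eight-package count matches the figure, but the mapping rule itself is an assumption for illustration.

```c
/* Round-robin dispatch of requests across the SSD's flash packages. */
#include <stdio.h>

#define NUM_PACKAGES 8

int main(void)
{
    for (int req = 1; req <= 24; req++) {
        int pkg = (req - 1) % NUM_PACKAGES;    /* Req 1..8 -> packages #0..#7, then wrap */
        printf("Req %2d -> package #%d\n", req, pkg);
    }
    return 0;
}
```
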
  • 44. Evaluation5/7/2012 45
  • 45. Internal Parallelism Evaluation: Single Node5/7/2012 46
  • 46. Single Node: Dedup Ratio5/7/2012 47
  • 47. System Performance Evaluation5/7/2012 48
  • 48. System Performance Evaluation5/7/2012 49
  • 49. Summary5/7/2012 50
  • 50. Conclusion McSD: A Smart Disk Model pp-mpiBlast: How to deploy Active Storage? Storage Node HcDD: Hybrid Disk for Active Storage5/7/2012 51
  • 51. Future Work5/7/2012 52
  • 52. Many Thanks! And Questions?5/7/2012 53