An Active and Hybrid Storage System
     for Data-intensive Applications

   Ph.D. Candidate: Zhiyang Ding
   Defense Committee Members:
   Dr. Xiao Qin
   Dr. Kai H. Chang
   Dr. David A. Umphress
   University Reader:
   Prof. Wei Wang,
   Chair of the Art Design Dept.
                    5/7/2012
Cluster Computing
      • Large-scale Data Processing is everywhere.




Motivation
         • Traditional Storage Nodes on the Cluster
         [Figure: traditional cluster layout. Labels: Client, Internet, Head Node, Network switch, Compute Nodes, Storage Node (or Storage Area Network).]

Motivation
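         • What’s next?
         • More “active” storage nodes.
         [Figure: cluster with an active storage node. Labels: Client, Internet, Head Node, Network switch, Compute Nodes, Storage Node. Flows between the compute nodes and the storage node: Computation Offload, I/O Request, Raw Data, Pre-processed Data.]

About the Active Storage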

               McSD:
           A Smart Disk Model


           pp-mpiBlast:
     How to deploy Active Storage?


                                           Storage Node
                 HcDD:
      Hybrid Disk for Active Storage

McSD:
   A Multicore Active Storage Device

• I/O Wall Problem: CPU--I/O Gap
      – Limited I/O Bandwidth
      – CPU waits idle while
        dissipating power
• How to
      – Bridge CPU--I/O Gap
      – Reduce I/O Traffic


Why McSD?


• “Active”:
      – Leveraging the Processing Power of Storage Devices


• Benefits:
      – Offloading Data-intensive Computation
      – Reducing I/O Traffic
      – Pipeline Parallel Programming


Contributions


• Design a prototype of a multicore active storage

• Design a pre-assembled processing module

• Extend a shared-memory MapReduce system

• Emulate the whole system on a real testbed


Background: Active Disks

• Traditional Smart/Active Disks
      – On-board: Embedding a processor into the hard disk
      – Various Research Models
         • e.g., active disk, smart disk, IDISK, SmartSTOR, etc.

• However, “active disk” has not been adopted by hardware vendors. Factors involved:
      – Improved attachment technologies
      – Cost of the system
      – I/O-bound workloads
      – Reliability


Background: Parallel Processing

• Multi-core Processors or Multi-processors
      – A 45% increase in transistors yields a 20% increase in processing power
• MapReduce: a Parallel Programming Model
      – MapReduce by Google
      – Hadoop, Mars, Phoenix, etc.
• Multicore and Shared-memory Parallel
  Processing

Design: System Overview

[Figure: design of an active storage. Components: Multicore and Shared-memory Parallel Processing, Pipeline Parallel Processing, Communication Mechanism, Hybrid Storage Disks.]

Design and Implementation

• Computation Mechanism
      – Pre-assembled Processing Model
      – smartFAM
• Extend the Shared-Memory MapReduce by
  Partitioning




Pre-assembled Processing Modules


• Pre-assembled Processing Modules
      – Meet the nature of embedded services
      – Reduce Complexity and Cost
      – Provide Services
           • E.g., multi-version antivirus service, pre-processing for
             data-intensive apps, de-duplication, etc.
• How to invoke services?


smartFAM

• smartFAM = Smart File Alteration Monitor
      – Invokes the pre-assembled processing modules or
        functions by monitoring changes to the system
        log file.
• Two Components:
      – inotify: a Linux kernel facility for file-system event notification
      – a trigger daemon
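The trigger idea can be illustrated with a minimal sketch. It is not the smartFAM implementation: it polls the log file instead of using inotify (the Python standard library has no inotify binding), and the class name `LogTrigger`, the `modules` mapping, and the "module-name argument" log line format are all assumptions made for illustration.

```python
import os


class LogTrigger:
    """Dispatches newly appended log lines to registered modules.

    Hypothetical stand-in for a trigger daemon: it polls the file rather
    than subscribing to inotify events.
    """

    def __init__(self, log_path, modules):
        self.log_path = log_path
        self.modules = modules  # assumed mapping: module name -> callable(argument)
        self.offset = 0         # byte offset just past the last handled line

    def poll_once(self):
        """Handle complete lines appended since the previous poll."""
        results = []
        if not os.path.exists(self.log_path):
            return results
        with open(self.log_path, "rb") as log:
            log.seek(self.offset)
            while True:
                raw = log.readline()
                if not raw.endswith(b"\n"):
                    break  # end of file, or a partially written line: retry later
                self.offset = log.tell()
                # Assumed line format: "<module-name> <argument>"
                name, _, argument = raw.decode().strip().partition(" ")
                handler = self.modules.get(name)
                if handler is not None:
                    results.append(handler(argument))
        return results
```

A daemon would call `poll_once()` in a loop with a short sleep; an inotify-based version would block on a modify event for the log file instead of polling.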


Design and Implementation

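[Figure: smartFAM workflow between the host node and the active node, with three numbered steps. Host node labels: Main Program (General functions, Data-intensive function, Merge Results), smartFAM Daemon, inotify. Active node labels: smartFAM Daemon, inotify, Pre-assembled Modules. Other labels: Log files (Module Log & Result data), NFS.]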




Extend the Phoenix:
    A Shared-memory MapReduce Model

• Extend the Phoenix MapReduce Programming
  Model by partitioning and merging
      – New API: partition_input
      – New Functions:
           • partition (provided by the new API)
           • merge (developed by the user)


• Example:
      – wordcount [data-file][partition-size][]
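The partition-then-merge pattern can be sketched as follows. Phoenix itself is a C library, so this Python version is only an illustration of the idea: the signature of `partition_input` here is an assumption (the slide gives only the name), and `word_count` and `merge` are hypothetical helpers.

```python
from collections import Counter


def partition_input(text, partition_size):
    """Yield chunks of roughly partition_size characters, cut at whitespace."""
    start = 0
    while start < len(text):
        end = min(start + partition_size, len(text))
        # Push the cut forward to the next whitespace so no word is split.
        while end < len(text) and not text[end].isspace():
            end += 1
        yield text[start:end]
        start = end


def word_count(chunk):
    """Stand-in for one MapReduce run over a single partition."""
    return Counter(chunk.split())


def merge(partial_counts):
    """User-supplied merge: sum the per-partition counts."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


text = "a b a c b a"
result = merge(word_count(chunk) for chunk in partition_input(text, 4))
```

Each partition is small enough to process on its own, and the merged result matches a single pass over the whole input.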


Pipeline Processing




Evaluation Environment

• Testbed

• Benchmarks
      – Word Count
      – String Match
      – Matrix Multiplication

• Individual Node Performance
• System Performance

Individual Node Performance


                Word Count (seconds)    String Match (seconds)

                1 GB          1.25 GB   1 GB           1.25 GB

w/ Partition    40.60          50.91    17.76           20.61

w/o Partition   85.74         139.54    17.62           21.00
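
To make the effect of partitioning easier to read, the ratios can be computed directly from the timings in the table. This is plain arithmetic on the reported numbers; the function name `partition_speedup` is introduced here for illustration only.

```python
# Timings in seconds, copied from the table: (with partition, without partition).
word_count = {"1 GB": (40.60, 85.74), "1.25 GB": (50.91, 139.54)}
string_match = {"1 GB": (17.76, 17.62), "1.25 GB": (20.61, 21.00)}


def partition_speedup(with_partition, without_partition):
    """Ratio of the unpartitioned time to the partitioned time."""
    return without_partition / with_partition


def speedups(timings):
    return {size: round(partition_speedup(w, wo), 2)
            for size, (w, wo) in timings.items()}


word_count_speedups = speedups(word_count)      # {'1 GB': 2.11, '1.25 GB': 2.74}
string_match_speedups = speedups(string_match)  # {'1 GB': 0.99, '1.25 GB': 1.02}
```

Word Count runs roughly 2.1 to 2.7 times faster with partitioning, while String Match is essentially unchanged.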




System Evaluation

                  Matrix-Multiplication and Word-Count (Speedups)
Input Data Size          vs Single Machine          vs Single-core Active   vs McSD w/o Partition

   500 MB                        1.47 X                   2.15 X                   0.99 X

   750 MB                        1.45 X                   2.09 X                   1.04 X

     1 GB                        7.62 X                   2.14 X                   6.07 X

   1.25 GB                      19.01 X                   2.50 X                  15.39 X


\[ \text{Speedup} = \frac{T_{\text{ConsumptionOfControlSample}}}{T_{\text{ConsumptionOfMcSD}}} \]

Summary

• Active storage can improve system performance by
  offloading data-intensive computation

• McSD is a promising active storage model with
      – Pre-assembled processing modules
      – Parallel data processing
      – Better performance in the evaluation


About the Active Storage

               McSD:
           A Smart Disk Model


           pp-mpiBlast:
     How to deploy Active Storage?


                                            Storage Node
                 HcDD:
      Hybrid Disk for Active Storage

Applying Active Storage to a Cluster


• So far, we have seen the potential of active
  storage

• Challenge: How to coordinate active storage
  nodes with computing nodes?

• Propose a Pipeline-parallel Processing pattern

Contributions


• Propose a pipeline-parallel processing framework
  to “connect” an Active Storage node with
  computing nodes.
• Evaluate the framework using both an analytic
  model and a real implementation.
• Case Study: Extend an existing bioinformatics
  application based on the framework.

Background: Active Storage


[Figure: an active storage node. Labels: Processor, Memory, Mass Storage, Bridge?, Active Storage Node, SSD, SSD, Buff Disks, Computation.]


Background: Bioinformatics App

• BLAST*: Basic Local Alignment Search Tool
      – Comparing primary biological sequence
        information


• mpiBLAST** is a freely available, open-source,
  parallel implementation of NCBI BLAST.
      – Format raw data files
      – Run a parallel BLAST function
                            *http://blast.ncbi.nlm.nih.gov/
                            **http://www.mpiblast.org/

Pipeline-parallel Design


• Offload the raw-data formatting task to where
    the data is stored.
• Intra-application Pipeline-parallel Processing
    by “partition” and “merge”.
• pp-mpiBlast, a case study.


Pipelining Workflow

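[Figure: pipelining workflow. On the active storage node, the raw input file is split into partitions 1..n and FormatDB turns each partition into an intermediate 1..n. On the computing nodes, mpiBlast turns each intermediate into a sub-output 1..n, and Merge combines the sub-outputs into the output file. Stage sequence: Partition, FormatDB, mpiBlast, Merge; the repeated stages are annotated "(n-1) times".]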
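
The workflow can be sketched as a two-stage producer/consumer pipeline. This is a single-machine illustration using threads, not the pp-mpiBlast code: `run_pipeline` and its four callable parameters are hypothetical names, with `format_db` standing in for the formatting step on the active storage node and `search` for the mpiBlast step on the computing nodes.

```python
import queue
import threading


def run_pipeline(partitions, format_db, search, merge):
    """Overlap formatting of partition i+1 with searching of partition i."""
    handoff = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def active_storage_stage():
        # Stage 1: format each partition and hand the intermediate over.
        for part in partitions:
            handoff.put(format_db(part))
        handoff.put(done)

    producer = threading.Thread(target=active_storage_stage)
    producer.start()

    # Stage 2: search each intermediate as soon as it arrives.
    sub_outputs = []
    while True:
        intermediate = handoff.get()
        if intermediate is done:
            break
        sub_outputs.append(search(intermediate))
    producer.join()
    return merge(sub_outputs)


output = run_pipeline(
    ["b a", "d c"],
    format_db=str.upper,
    search=lambda db: sorted(db.split()),
    merge=lambda outs: [item for out in outs for item in out],
)
```

Because the queue is first-in, first-out with one producer and one consumer, sub-outputs arrive in partition order and the merge step stays simple.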
Analytic Model

• Three Critical Measures
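 \[ T_{\text{response}} = T_{\text{active}} + T_{\text{compute}} \]
 \[ \text{Throughput} = \frac{1}{\max(T_{\text{active}}, T_{\text{compute}})} \]
 \[ \text{Speedup} = \frac{T_{\text{sequence}}}{T_{\text{pipelined}}}
    = \frac{n \times (T_{\text{active}} + T_{\text{compute}})}{T_{\text{active}} + (n-1) \times \max(T_{\text{active}}, T_{\text{compute}}) + T_{\text{compute}}}
    = \frac{n}{1 + (n-1) \times \dfrac{1}{\text{Throughput} \times T_{\text{response}}}} \]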
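
A small numeric check of the three measures, written as a sketch: the function name `pipeline_metrics` and the sample timings are illustrative assumptions, not values from the evaluation.

```python
def pipeline_metrics(n, t_active, t_compute):
    """Evaluate the analytic model for n partitions."""
    t_response = t_active + t_compute
    bottleneck = max(t_active, t_compute)
    throughput = 1.0 / bottleneck
    t_sequence = n * t_response
    # First partition fills the pipeline; the remaining n-1 are paced by the slower stage.
    t_pipelined = t_active + (n - 1) * bottleneck + t_compute
    return {
        "response": t_response,
        "throughput": throughput,
        "speedup": t_sequence / t_pipelined,
    }


# Hypothetical example: 10 partitions, 2 s on the active node, 3 s on the computing nodes.
metrics = pipeline_metrics(10, 2.0, 3.0)
# Sequential: 10 * 5 = 50 s; pipelined: 2 + 9 * 3 + 3 = 32 s; speedup = 1.5625
```

As n grows, the speedup approaches the ratio of the response time to the slower stage's time (5/3 in this example), so balancing the two stages matters more than adding partitions.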

Evaluation Environment

                Computing Nodes Configuration            Active Storage Configuration
    CPU                  Intel XEON X3430                       Intel Core 2 Q9400
 Memory                               2 GB DDR3 (PC3-10600)
     OS                      Ubuntu 9.04 Jaunty Jackalope 32bit Version
   Kernel                                   2.6.28-15-generic
 Network                                         Gigabit LAN

           Our Testbed                              Comparison Testbeds
    “Pipeline-parallel”           “12-node Cluster”               “13-node Cluster”
    12 Computing Nodes           12 Computing Nodes               13 Computing Nodes
   1 Active Storage Node            1 Storage Node                  1 Storage Node



Pipeline-parallel Design




                   Results: Compared With 12-node System




                   Results: Compared With 13-node System

Speedup Trends: Partition Size




Summary


• We proposed a pipeline-parallel processing
    mechanism for deploying an Active Storage Node in a cluster.


• As a case study, we extended a classic
    bioinformatics application based on the
    pipeline-parallel style.

About the Active Storage

               McSD:
           A Smart Disk Model


           pp-mpiBlast:
     How to deploy Active Storage?


                                            Storage Node
                 HcDD:
      Hybrid Disk for Active Storage

What’s Hybrid?

A hybrid combination of a gas engine and an electric motor
[Figure labels: Power, Efficiency]




Hybrid Disk Drives

• A Hybrid Combination of Two Types of Storage
  Devices: HDD and SSD
      – HDD: Magnetic hard disk drive
      – SSD: Solid-state disk, built from NAND-based flash memory


                                        What are their roles?




Motivation


• In a hybrid storage system, using SSDs as the
  buffer can boost the performance.
• However, SSDs suffer from reliability issues.

            WordCount on Intel Core2 Duo E8400 (seconds)

  Storage   Buffer    Input Data Size
                      500 MB   750 MB    1 GB     1.25 GB
  HDD       HDD       21.51    38.30     505.25   1294.64
  HDD       SSD       19.89    36.41     85.74    139.54



Limitations Related to SSDs

• Flash Memory:
      – Each block consists of 32, 64, or 128 pages.
      – Each page is typically 512, 2,048, or 4,096 bytes.
• “Erase-before-write” at block level.
• Lifespan is 10,000 program/erase cycles.
      – E.g., *an 80 GB MLC SSD lasts only 106 days
        if the write rate is 30 MB/s.
• Rethink their roles?
            *Based on the SSD lifespan calculator provided by Virident.com
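
The 106-day figure comes from a vendor calculator whose model is not reproduced in the slides. As a rough cross-check, the sketch below computes only an idealized estimate, assuming perfect wear leveling, decimal units (1 GB = 10^9 bytes), and a `write_amplification` factor that is a parameter introduced here for illustration.

```python
SECONDS_PER_DAY = 86_400


def ssd_lifespan_days(capacity_bytes, pe_cycles, write_rate_bytes_per_s,
                      write_amplification=1.0):
    """Idealized endurance estimate: total writable bytes / effective write rate."""
    total_writable = capacity_bytes * pe_cycles
    effective_rate = write_rate_bytes_per_s * write_amplification
    return total_writable / effective_rate / SECONDS_PER_DAY


# 80 GB drive, 10,000 P/E cycles, 30 MB/s sustained writes, no write amplification.
ideal_days = ssd_lifespan_days(80e9, 10_000, 30e6)
```

With no write amplification the estimate is roughly 309 days, so the vendor figure is the more pessimistic one. A write-amplification factor near 3 would bring this estimate close to 106 days, though whether the calculator models wear that way is not stated.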
Contributions


• Hybrid Combination of HDD and SSD disks

• De-duplication Service using HDDs as a Write Buffer

• Internal-parallel Processing in SSD

• Simulation of the Whole System For Evaluation



Hybrid Disk Configuration


[Figure: hybrid disk configuration. Labels: I/O Requests, Data of Write Requests, HDD, De-duplication, Dedicated Processor, Deduplicated data, Pre-processing, Read Requests, Pre-processed Data, SSD.]
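
The slides do not say how duplicates are detected in the write buffer. One common approach is content hashing, sketched below under that assumption; the function name `deduplicate`, the use of SHA-256, and the block granularity are all illustrative choices, not the HcDD design.

```python
import hashlib


def deduplicate(blocks):
    """Keep one copy of each distinct block plus a recipe to rebuild the stream.

    Returns (store, recipe): store maps digest -> block payload, and recipe
    lists the digests in the original write order.
    """
    store = {}
    recipe = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # first copy wins; later duplicates are dropped
        recipe.append(digest)
    return store, recipe


store, recipe = deduplicate([b"aa", b"bb", b"aa"])
rebuilt = [store[digest] for digest in recipe]
```

Only the unique blocks in `store` would need to reach the SSD, which is the point of buffering writes first: fewer flash writes for the same logical data.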


HcDD Architecture




Deduplication Design




Internal Parallel Processing
[Figure: requests held in the SDRAM cache are queued into eight lists, List #0 to List #7, one above each of Package #0 to Package #7. The rows read Req 1 to Req 8, Req 9 to Req 16, Req 17 to Req 24, and so on across the eight lists.]
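
The request layout in the figure is consistent with a round-robin assignment of requests to lists. The sketch below reproduces that layout; the function names and the assumption that request numbers start at 1 are illustrative, and the actual scheduling policy is not described in the slides.

```python
NUM_LISTS = 8  # one list per flash package in the figure


def list_for_request(req_id, num_lists=NUM_LISTS):
    """Map a 1-based request number to a list index, round-robin."""
    return (req_id - 1) % num_lists


def distribute(request_ids, num_lists=NUM_LISTS):
    """Group request numbers by the list they are queued on."""
    lists = [[] for _ in range(num_lists)]
    for req_id in request_ids:
        lists[list_for_request(req_id, num_lists)].append(req_id)
    return lists


queues = distribute(range(1, 25))
# queues[0] holds requests 1, 9, 17; queues[7] holds requests 8, 16, 24
```

Spreading consecutive requests across all eight lists lets every package work at the same time instead of queuing behind one another.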
Evaluation




Internal Parallelism Evaluation:
               Single Node




Single Node: Dedup Ratio




System Performance Evaluation




System Performance Evaluation




Summary




Conclusion

               McSD:
           A Smart Disk Model


           pp-mpiBlast:
     How to deploy Active Storage?


                                            Storage Node
                 HcDD:
      Hybrid Disk for Active Storage

Future Work




Many Thanks!
           And Questions?





More Related Content

What's hot

Erlang Cache
Erlang CacheErlang Cache
Erlang Cacheice j
 
Parallel Database
Parallel DatabaseParallel Database
Parallel Database
VESIT/University of Mumbai
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010
Gavin Heavyside
 
Faster Than A Speeding Disk
Faster Than A Speeding DiskFaster Than A Speeding Disk
Faster Than A Speeding DiskAndrey Klyachkin
 
Architecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big DataArchitecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big Data
Richard McDougall
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
SAP Technology
 
How an Enterprise Data Fabric (EDF) can improve resiliency and performance
How an Enterprise Data Fabric (EDF) can improve resiliency and performanceHow an Enterprise Data Fabric (EDF) can improve resiliency and performance
How an Enterprise Data Fabric (EDF) can improve resiliency and performance
gojkoadzic
 
数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战
Weiwei Fang
 
The CIOs Guide to NoSQL 2012
The CIOs Guide to NoSQL 2012The CIOs Guide to NoSQL 2012
The CIOs Guide to NoSQL 2012DATAVERSITY
 
Damon2011 preview
Damon2011 previewDamon2011 preview
Damon2011 previewsundarnu
 
Manage rising disk prices with storage virtualization webinar
Manage rising disk prices with storage virtualization webinarManage rising disk prices with storage virtualization webinar
Manage rising disk prices with storage virtualization webinar
Hitachi Vantara
 
Gear6 Web Cache Overview
Gear6 Web Cache OverviewGear6 Web Cache Overview
Gear6 Web Cache Overview
Gear6
 
IBM Systems solution for SAP NetWeaver Business Warehouse Accelerator
IBM Systems solution for SAP NetWeaver Business Warehouse AcceleratorIBM Systems solution for SAP NetWeaver Business Warehouse Accelerator
IBM Systems solution for SAP NetWeaver Business Warehouse Accelerator
IBM India Smarter Computing
 
The unified data center for cloud david yen
The unified data center for cloud david yenThe unified data center for cloud david yen
The unified data center for cloud david yen
deepersnet
 
Good Data: Collaborative Analytics On Demand
Good Data: Collaborative Analytics On DemandGood Data: Collaborative Analytics On Demand
Good Data: Collaborative Analytics On Demand
zsvoboda
 
Nutanix Always On-Solution-Brief
Nutanix Always On-Solution-BriefNutanix Always On-Solution-Brief
Nutanix Always On-Solution-BriefManny Carral
 

What's hot (20)

Erlang Cache
Erlang CacheErlang Cache
Erlang Cache
 
Parallel Database
Parallel DatabaseParallel Database
Parallel Database
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010
 
Faster Than A Speeding Disk
Faster Than A Speeding DiskFaster Than A Speeding Disk
Faster Than A Speeding Disk
 
Architecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big DataArchitecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big Data
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
 
How an Enterprise Data Fabric (EDF) can improve resiliency and performance
How an Enterprise Data Fabric (EDF) can improve resiliency and performanceHow an Enterprise Data Fabric (EDF) can improve resiliency and performance
How an Enterprise Data Fabric (EDF) can improve resiliency and performance
 
数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战
 
The CIOs Guide to NoSQL 2012
The CIOs Guide to NoSQL 2012The CIOs Guide to NoSQL 2012
The CIOs Guide to NoSQL 2012
 
Damon2011 preview
Damon2011 previewDamon2011 preview
Damon2011 preview
 
Osac2012
Osac2012Osac2012
Osac2012
 
Manage rising disk prices with storage virtualization webinar
Manage rising disk prices with storage virtualization webinarManage rising disk prices with storage virtualization webinar
Manage rising disk prices with storage virtualization webinar
 
Gear6 Web Cache Overview
Gear6 Web Cache OverviewGear6 Web Cache Overview
Gear6 Web Cache Overview
 
IBM Systems solution for SAP NetWeaver Business Warehouse Accelerator
IBM Systems solution for SAP NetWeaver Business Warehouse AcceleratorIBM Systems solution for SAP NetWeaver Business Warehouse Accelerator
IBM Systems solution for SAP NetWeaver Business Warehouse Accelerator
 
The unified data center for cloud david yen
The unified data center for cloud david yenThe unified data center for cloud david yen
The unified data center for cloud david yen
 
Hitachi Data Services. Business Continuity
Hitachi Data Services. Business ContinuityHitachi Data Services. Business Continuity
Hitachi Data Services. Business Continuity
 
Good Data: Collaborative Analytics On Demand
Good Data: Collaborative Analytics On DemandGood Data: Collaborative Analytics On Demand
Good Data: Collaborative Analytics On Demand
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 
Ibm 14052012
Ibm 14052012Ibm 14052012
Ibm 14052012
 
Nutanix Always On-Solution-Brief
Nutanix Always On-Solution-BriefNutanix Always On-Solution-Brief
Nutanix Always On-Solution-Brief
 

Viewers also liked

Project 2 - how to compile os161?
Project 2 - how to compile os161?Project 2 - how to compile os161?
Project 2 - how to compile os161?
Xiao Qin
 
IPCCC 2012 Conference Program Overview
IPCCC 2012 Conference Program OverviewIPCCC 2012 Conference Program Overview
IPCCC 2012 Conference Program Overview
Xiao Qin
 
OS/161 Overview
OS/161 OverviewOS/161 Overview
OS/161 Overview
Xiao Qin
 
Project 2 how to modify OS/161
Project 2 how to modify OS/161Project 2 how to modify OS/161
Project 2 how to modify OS/161
Xiao Qin
 
Reliability Analysis for an Energy-Aware RAID System
Reliability Analysis for an Energy-Aware RAID SystemReliability Analysis for an Energy-Aware RAID System
Reliability Analysis for an Energy-Aware RAID System
Xiao Qin
 
Nas'12 overview
Nas'12 overviewNas'12 overview
Nas'12 overview
Xiao Qin
 
Energy Efficient Data Storage Systems
Energy Efficient Data Storage SystemsEnergy Efficient Data Storage Systems
Energy Efficient Data Storage Systems
Xiao Qin
 
COMP2710 Software Construction: header files
COMP2710 Software Construction: header filesCOMP2710 Software Construction: header files
COMP2710 Software Construction: header files
Xiao Qin
 
How to do research?
How to do research?How to do research?
How to do research?
Xiao Qin
 
Thermal modeling and management of cluster storage systems xunfei jiang 2014
Thermal modeling and management of cluster storage systems xunfei jiang 2014Thermal modeling and management of cluster storage systems xunfei jiang 2014
Thermal modeling and management of cluster storage systems xunfei jiang 2014
Xiao Qin
 
Why Major in Computer Science and Software Engineering at Auburn University?
Why Major in Computer Science and Software Engineering at Auburn University?Why Major in Computer Science and Software Engineering at Auburn University?
Why Major in Computer Science and Software Engineering at Auburn University?
Xiao Qin
 
Common grammar mistakes
Common grammar mistakesCommon grammar mistakes
Common grammar mistakesXiao Qin
 
Project 2 How to modify os161: A Manual
Project 2 How to modify os161: A ManualProject 2 How to modify os161: A Manual
Project 2 How to modify os161: A Manual
Xiao Qin
 
Project 2 how to install and compile os161
Project 2 how to install and compile os161Project 2 how to install and compile os161
Project 2 how to install and compile os161
Xiao Qin
 
Surviving a group project
Surviving a group projectSurviving a group project
Surviving a group project
Xiao Qin
 
How to add system calls to OS/161
How to add system calls to OS/161How to add system calls to OS/161
How to add system calls to OS/161
Xiao Qin
 
COMP2710: Software Construction - Linked list exercises
COMP2710: Software Construction - Linked list exercisesCOMP2710: Software Construction - Linked list exercises
COMP2710: Software Construction - Linked list exercises
Xiao Qin
 
Data center specific thermal and energy saving techniques
Data center specific thermal and energy saving techniquesData center specific thermal and energy saving techniques
Data center specific thermal and energy saving techniques
Xiao Qin
 
Understanding what our customer wants-slideshare
Understanding what our customer wants-slideshareUnderstanding what our customer wants-slideshare
Understanding what our customer wants-slideshare
Xiao Qin
 
Performance Evaluation of Traditional Caching Policies on a Large System with...
Performance Evaluation of Traditional Caching Policies on a Large System with...Performance Evaluation of Traditional Caching Policies on a Large System with...
Performance Evaluation of Traditional Caching Policies on a Large System with...
Xiao Qin
 

Viewers also liked (20)

Project 2 - how to compile os161?
Project 2 - how to compile os161?Project 2 - how to compile os161?
Project 2 - how to compile os161?
 
IPCCC 2012 Conference Program Overview
IPCCC 2012 Conference Program OverviewIPCCC 2012 Conference Program Overview
IPCCC 2012 Conference Program Overview
 
OS/161 Overview
OS/161 OverviewOS/161 Overview
OS/161 Overview
 
Project 2 how to modify OS/161
Project 2 how to modify OS/161Project 2 how to modify OS/161
Project 2 how to modify OS/161
 
Reliability Analysis for an Energy-Aware RAID System
Reliability Analysis for an Energy-Aware RAID SystemReliability Analysis for an Energy-Aware RAID System
Reliability Analysis for an Energy-Aware RAID System
 
Nas'12 overview
Nas'12 overviewNas'12 overview
Nas'12 overview
 
Energy Efficient Data Storage Systems
Energy Efficient Data Storage SystemsEnergy Efficient Data Storage Systems
Energy Efficient Data Storage Systems
 
COMP2710 Software Construction: header files
COMP2710 Software Construction: header filesCOMP2710 Software Construction: header files
COMP2710 Software Construction: header files
 
How to do research?
How to do research?How to do research?
How to do research?
 
Thermal modeling and management of cluster storage systems xunfei jiang 2014
Thermal modeling and management of cluster storage systems xunfei jiang 2014Thermal modeling and management of cluster storage systems xunfei jiang 2014
Thermal modeling and management of cluster storage systems xunfei jiang 2014
 
Why Major in Computer Science and Software Engineering at Auburn University?
Why Major in Computer Science and Software Engineering at Auburn University?Why Major in Computer Science and Software Engineering at Auburn University?
Why Major in Computer Science and Software Engineering at Auburn University?
 
Common grammar mistakes
Common grammar mistakesCommon grammar mistakes
Common grammar mistakes
 
Project 2 How to modify os161: A Manual
Project 2 How to modify os161: A ManualProject 2 How to modify os161: A Manual
Project 2 How to modify os161: A Manual
 
Project 2 how to install and compile os161
Project 2 how to install and compile os161Project 2 how to install and compile os161
Project 2 how to install and compile os161
 
Surviving a group project
Surviving a group projectSurviving a group project
Surviving a group project
 
How to add system calls to OS/161
How to add system calls to OS/161How to add system calls to OS/161
How to add system calls to OS/161
 
COMP2710: Software Construction - Linked list exercises
COMP2710: Software Construction - Linked list exercisesCOMP2710: Software Construction - Linked list exercises
COMP2710: Software Construction - Linked list exercises
 
Data center specific thermal and energy saving techniques
Data center specific thermal and energy saving techniquesData center specific thermal and energy saving techniques
Data center specific thermal and energy saving techniques
 
Understanding what our customer wants-slideshare
Understanding what our customer wants-slideshareUnderstanding what our customer wants-slideshare
Understanding what our customer wants-slideshare
 
Performance Evaluation of Traditional Caching Policies on a Large System with...
Performance Evaluation of Traditional Caching Policies on a Large System with...Performance Evaluation of Traditional Caching Policies on a Large System with...
Performance Evaluation of Traditional Caching Policies on a Large System with...
 

An Active and Hybrid Storage System for Data-intensive Applications

  • 1. An Active and Hybrid Storage System for Data-intensive Applications
      Ph.D. Candidate: Zhiyang Ding
      Defense Committee Members: Dr. Xiao Qin, Dr. Kai H. Chang, Dr. David A. Umphress
      University Reader: Prof. Wei Wang, Chair of the Art Design Dept.
      5/7/2012
  • 2. Cluster Computing
      • Large-scale Data Processing is everywhere.
  • 3. Motivation
      • Traditional Storage Nodes on the Cluster
      [Figure labels: Client; Internet; Head Node; Network switch; Compute Nodes; Storage Node (or Storage Area Network)]
  • 4. Motivation
      • What’s next?
      • More “Active”.
      [Figure labels: Client; Internet; Head Node; Network switch; Compute Nodes; Storage Node; Computation Offload; I/O Request; Raw Data; Pre-processed Data]
  • 5. About the Active Storage
      • McSD: A Smart Disk Model
      • pp-mpiBlast: How to deploy Active Storage?
      • HcDD: Hybrid Disk for Active Storage
      [Figure label: Storage Node]
  • 6. McSD: A Multicore Active Storage Device
      • I/O Wall Problem: CPU-I/O Gap
          – Limited I/O Bandwidth
          – CPU Waiting and Dissipating Power
      • How to
          – Bridge the CPU-I/O Gap
          – Reduce I/O Traffic
  • 7. Why McSD?
      • “Active”:
          – Leveraging the Processing Power of Storage Devices
      • Benefits:
          – Offloading Data-intensive Computation
          – Reducing I/O Traffic
          – Pipeline Parallel Programming
  • 8. Contributions
      • Design a prototype of a multicore active storage
      • Design a pre-assembled processing module
      • Extend a shared-memory MapReduce system
      • Emulate the whole system on a real testbed
  • 9. Background: Active Disks
      • Traditional Smart/Active Disks
          – On-board: Embedding a processor into the hard disk
          – Various Research Models
              • e.g. active disk, smart disk, IDISK, SmartSTOR, etc.
      • However, “active disk” has not been adopted by hardware vendors
      [Figure labels: Improved attachment technologies; Cost of the System; I/O Bound Workloads; Reliability]
  • 10. Background: Parallel Processing
      • Multi-core Processors or Multi-processors
          – A 45% increase in transistors yields a 20% increase in processing power
      • MapReduce: a Parallel Programming Model
          – MapReduce by Google
          – Hadoop, Mars, Phoenix, etc.
      • Multicore and Shared-memory Parallel Processing
  • 11. Design: System Overview
      [Figure labels: Pipeline Parallel Processing; Communication Mechanism; Multicore and Shared-memory Parallel Processing; Hybrid Storage Disks; Design of an Active Storage]
  • 12. Design and Implementation
      • Computation Mechanism
          – Pre-assembled Processing Model
          – smartFAM
      • Extend the Shared-Memory MapReduce by Partitioning
  • 13. Pre-assembled Processing Modules
      • Pre-assembled Processing Modules
          – Meet the nature of embedded services
          – Reduce Complexity and Cost
          – Provide Services
              • E.g. multi-version antivirus service, pre-processing for data-intensive apps, de-duplication, etc.
      • How to invoke services?
  • 14. smartFAM
      • smartFAM = Smart File Alternation Monitor
          – Invokes the pre-assembled processing modules or functions by monitoring changes to the system log file.
      • Two Components:
          – an inotify function: a Linux system function
          – a trigger daemon
  • 15. Design and Implementation
      [Figure: smartFAM workflow between the Host node and the Active Node, connected through NFS. Labels: Main Program; smartFAM Daemon; inotify; Module Log; Log files & Result data; Pre-assembled Modules (Data-intensive function, General functions); Merge Results; steps numbered 1 to 3]
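Slides 14 and 15 describe smartFAM as a daemon that watches a log file and invokes a pre-assembled module when the file changes. The following is a minimal sketch of that trigger idea, not the author's implementation. It deliberately swaps in portable polling for inotify, and the log line format (`<module-name> <input-path>`), the `MODULES` registry, and the `LogWatcher` class are all hypothetical names introduced for illustration.

```python
import os
import tempfile


def word_count(path):
    """Stand-in for a pre-assembled data-intensive module."""
    with open(path) as f:
        return len(f.read().split())


# Hypothetical registry: request name written by the host -> module to run.
MODULES = {"wordcount": word_count}


class LogWatcher:
    """Polls a log file and dispatches each newly appended request line.

    Assumption: each request is one line, "<module-name> <input-path>".
    The slides say smartFAM uses inotify; polling is used here only so the
    sketch runs on any platform.
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self.offset = 0  # number of log bytes already processed

    def poll_once(self):
        with open(self.log_path, "rb") as log:
            log.seek(self.offset)
            data = log.read()
        # Only consume complete lines; a partial trailing line waits for later.
        end = data.rfind(b"\n") + 1
        self.offset += end
        results = []
        for line in data[:end].decode().splitlines():
            parts = line.split(maxsplit=1)
            if len(parts) == 2 and parts[0] in MODULES:
                results.append((parts[0], MODULES[parts[0]](parts[1])))
        return results


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "input.txt")
        log_path = os.path.join(tmp, "requests.log")
        with open(data_path, "w") as f:
            f.write("active storage offloads active work")
        open(log_path, "w").close()

        watcher = LogWatcher(log_path)
        first = watcher.poll_once()           # log is still empty
        with open(log_path, "a") as f:        # the "host" appends a request
            f.write(f"wordcount {data_path}\n")
        second = watcher.poll_once()          # request is picked up once
        third = watcher.poll_once()           # nothing new afterwards
        return first, second, third


if __name__ == "__main__":
    print(demo())
```

A real daemon would block on inotify events instead of being polled, and would write results back to a location the host can read, as the figure on slide 15 suggests.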
  • 16. Extend the Phoenix: A Shared-memory MapReduce Model
      • Extend the Phoenix MapReduce Programming Model by partitioning and merging
          – New API: partition_input
          – New Functions:
              • partition (provided by the new API)
              • merge (developed by the user)
      • Example:
          – wordcount [data-file][partition-size][]
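The partition-then-merge extension on slide 16 can be illustrated with a small word-count sketch. This is not the Phoenix API (Phoenix is a C library); the Python signature given to `partition_input` here, and the helpers `count_words`, `merge`, and `wordcount`, are assumptions made for illustration. The idea shown is only that the input is cut into bounded partitions, each partition is counted on its own, and a user-supplied merge combines the partial results.

```python
from collections import Counter


def partition_input(text, partition_size):
    """Yield chunks of about partition_size characters, cut at whitespace."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    start, n = 0, len(text)
    while start < n:
        end = min(start + partition_size, n)
        # Move the cut forward to the next whitespace so no word is split.
        while end < n and not text[end].isspace():
            end += 1
        yield text[start:end]
        start = end


def count_words(chunk):
    """Per-partition work: an in-memory word count of one chunk."""
    return Counter(chunk.split())


def merge(partials):
    """User-defined merge: sum the per-partition counters."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def wordcount(text, partition_size):
    return merge(count_words(c) for c in partition_input(text, partition_size))


if __name__ == "__main__":
    print(wordcount("a bb a ccc bb a", 4))
```

Because only one partition needs to be resident at a time, the same structure lets an input larger than main memory be processed piece by piece.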
  • 18. Evaluation Environment
      • Testbed
      • Benchmarks
          – Word Count
          – String Match
          – Matrix Multiplication
      • Individual Node Performance
      • System Performance
  • 19. Individual Node Performance
                         Word Count (seconds)      String Match (seconds)
                         1 GB       1.25 GB        1 GB       1.25 GB
      w/ Partition       40.60      50.91          17.76      20.61
      w/o Partition      85.74      139.54         17.62      21.00
  • 20. System Evaluation
      Matrix-Multiplication and Word-Count (Speedups)
      Input Data Size    vs Single Machine    vs Single-core Active    vs McSD w/o Partition
      500 MB             1.47 X               2.15 X                   0.99 X
      750 MB             1.45 X               2.09 X                   1.04 X
      1 GB               7.62 X               2.14 X                   6.07 X
      1.25 GB            19.01 X              2.50 X                   15.39 X
      $$\text{Speedup} = \frac{T_{\text{ConsumptionOfControlSample}}}{T_{\text{ConsumptionOfMcSD}}}$$
  • 21. Summary
      • Offloading data-intensive computation can improve system performance
      • McSD is a promising active storage model with
          – Pre-assembled processing modules
          – Parallel data processing
          – Better Evaluation Performance
  • 22. About the Active Storage
      • McSD: A Smart Disk Model
      • pp-mpiBlast: How to deploy Active Storage?
      • HcDD: Hybrid Disk for Active Storage
      [Figure label: Storage Node]
  • 23. Apply Active Storages to a Cluster
      • So far, we know the potential of Active Storages
      • Challenge: How to coordinate active storage nodes with computing nodes?
      • Propose a Pipeline-parallel Processing pattern
  • 24. Contributions
      • Propose a pipeline-parallel processing framework to “connect” an Active Storage node with computing nodes.
      • Evaluate the framework using both an analytic model and a real implementation.
      • Case Study: Extend an existing bioinformatics application based on the framework.
  • 25. Background: Active Storage
      [Figure labels: Processor; Memory; Mass Storage; Bridge?; Active Storage Node; SSD; SSD; Computation; Buff Disks]
  • 26. Background: Bioinformatics App
      • BLAST*: Basic Local Alignment Search Tool
          – Comparing primary biological sequence information
      • mpiBLAST** is a freely available, open-source, parallel implementation of NCBI BLAST.
          – Format raw data files
          – Run a parallel BLAST function
      *http://blast.ncbi.nlm.nih.gov/
      **http://www.mpiblast.org/
  • 27. Pipeline-parallel Design
      • Offload the raw-data formatting task to where the data is stored.
      • Intra-application Pipeline-parallel Processing by “partition” and “merge”.
      • pp-mpiBlast, a case study.
  • 28. Pipelining Workflow
      [Figure: on the Active Storage Node, the raw input file is split into Partitions 1 to n, and FormatDB turns them into Intermediates 1 to n; on the Computing Nodes, mpiBlast produces Sub-outputs 1 to n, which Merge combines into the output file. Stage labels: Partition, FormatDB, mpiBlast, Merge; two stages are annotated “(n-1) times”.]
  • 29. Analytic Model
      • Three Critical Measures
      $$T_{\text{response}} = T_{\text{active}} + T_{\text{compute}}$$
      $$\text{Throughput} = \frac{1}{\max(T_{\text{active}}, T_{\text{compute}})}$$
      $$\text{Speedup} = \frac{T_{\text{sequence}}}{T_{\text{pipelined}}} = \frac{n \times (T_{\text{active}} + T_{\text{compute}})}{T_{\text{active}} + (n-1) \times \max(T_{\text{active}}, T_{\text{compute}}) + T_{\text{compute}}} = \frac{n}{1 + \dfrac{n-1}{\text{Throughput} \times T_{\text{response}}}}$$
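The three measures on slide 29 can be checked numerically. The sketch below is a direct transcription of the formulas under the slide's two-stage assumption (one active-storage stage, one compute stage, n partitions); the function names and the sample stage times are illustrative, not measured values from the evaluation.

```python
def response_time(t_active, t_compute):
    """Time for one partition to pass through both pipeline stages."""
    return t_active + t_compute


def throughput(t_active, t_compute):
    """Partitions completed per unit time once the pipeline is full."""
    return 1.0 / max(t_active, t_compute)


def speedup(n, t_active, t_compute):
    """Sequential time over pipelined time for n partitions."""
    t_sequence = n * (t_active + t_compute)
    t_pipelined = t_active + (n - 1) * max(t_active, t_compute) + t_compute
    return t_sequence / t_pipelined


def speedup_closed_form(n, t_active, t_compute):
    """Same speedup, written in terms of throughput and response time."""
    x = throughput(t_active, t_compute) * response_time(t_active, t_compute)
    return n / (1 + (n - 1) / x)


if __name__ == "__main__":
    # Illustrative stage times only.
    print(speedup(4, 2.0, 3.0), speedup_closed_form(4, 2.0, 3.0))
```

With balanced stages (equal stage times) the speedup approaches 2 as n grows, which is the upper bound for a two-stage pipeline.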
  • 30. Evaluation Environment
                   Computing Nodes Configuration    Active Storage Configuration
      CPU          Intel XEON X3430                 Intel Core 2 Q9400
      Memory       2 GB DDR3 (PC3-10600)
      OS           Ubuntu 9.04 Jaunty Jackalope 32bit Version
      Kernel       2.6.28-15-generic
      Network      Gigabit LAN

      Our Testbed                Opposite Testbeds
      “Pipeline-parallel”        “12-node Cluster”       “13-node Cluster”
      12 Computing Nodes         12 Computing Nodes      13 Computing Nodes
      1 Active Storage Node      1 Storage Node          1 Storage Node
  • 31. Pipeline-parallel Design
      [Figure: Results: Compared With 12-node System]
      [Figure: Results: Compared With 13-node System]
  • 32. Speedup Trends: Partition Size
  • 33. Summary
      • We proposed a pipeline-parallel processing mechanism to apply an Active Storage Node.
      • As a case study, we extended a classic bioinformatics application based on the pipeline-parallel style.
  • 34. About the Active Storage
      • McSD: A Smart Disk Model
      • pp-mpiBlast: How to deploy Active Storage?
      • HcDD: Hybrid Disk for Active Storage
      [Figure label: Storage Node]
  • 35. What’s Hybrid?
      • A Hybrid Combination of a Gas-powered Engine and an Electric Engine
      [Figure label: Efficiency]
  • 36. Hybrid Disk Drives
      • A Hybrid Combination of Two Types of Storage Devices: HDD and SSD
          – HDD: Magnetic Hard Disk
          – SSD: Solid State Disk, built from NAND-based flash memory
      • What are their roles?
  • 37. Motivation
      • In a hybrid storage system, using SSDs as the buffer can boost the performance.
      • However, SSDs suffer reliability issues.
      WordCount on Intel Core2 Duo E8400 (seconds)
                             Input Data Size
      Storage    Buffer      500 MB     750 MB     1 GB       1.25 GB
      HDD        HDD         21.51      38.30      505.25     1294.64
      HDD        SSD         19.89      36.41      85.74      139.54
  • 38. Limitations Related to SSDs
      • Flash Memory:
          – Each block consists of 32, 64, or 128 pages.
          – Each page is typically 512, 2,048, or 4,096 bytes.
      • “Erase-before-write” at block level.
      • Lifespan is 10,000 Program/Erase cycles.
          – E.g., the lifespan of an 80 GB MLC SSD is only 106 days if the write rate is 30 MB/s.*
      • Rethink their roles?
      *Based on the SSD lifespan calculator provided by Virident.com
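To see why sustained writes wear out flash quickly, the endurance figures on slide 38 can be turned into a back-of-the-envelope bound. This is a sketch under stated assumptions, not a reproduction of the Virident calculator: it assumes perfect wear leveling, decimal units (1 GB = 10^9 bytes), and a hypothetical `write_amplification` parameter that defaults to 1. The slide's 106-day figure is shorter than this idealized bound; the calculator's own assumptions are not given in the slides.

```python
SECONDS_PER_DAY = 86_400


def ssd_lifespan_days(capacity_bytes, pe_cycles, write_rate_bytes_per_s,
                      write_amplification=1.0):
    """Idealized days until every block has used up its P/E cycles.

    write_amplification is a hypothetical knob: physical bytes written to
    flash per logical byte written by the host (1.0 means no overhead).
    """
    total_host_bytes = capacity_bytes * pe_cycles / write_amplification
    return total_host_bytes / write_rate_bytes_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # 80 GB drive, 10,000 P/E cycles, 30 MB/s sustained writes.
    print(round(ssd_lifespan_days(80e9, 10_000, 30e6), 2))
```

Any write amplification above 1 shortens the bound proportionally, which is one reason to reduce the volume of data written to the SSD in the first place.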
  • 39. Contributions
      • Hybrid Combination of HDD and SSD disks
      • De-duplication Service using HDDs as a Write Buffer
      • Internal-parallel Processing in SSD
      • Simulation of the Whole System For Evaluation
  • 40. Hybrid Disk Configuration
      [Figure labels: I/O Requests; Dedicated Processor; De-duplication; Data of Write Requests; HDD; Deduplicated data; Read Requests; Pre-processing; Pre-processed Data; Data; SSD]
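Slides 39 and 40 name a de-duplication service that sits between incoming writes and the SSD, but they do not specify the algorithm. The sketch below shows one common approach, content hashing of whole chunks with SHA-256, purely as an assumption. The `DedupWriteBuffer` class, its in-memory lists standing in for the HDD buffer and the SSD, and the `duplicate_fraction` metric are all hypothetical.

```python
import hashlib


class DedupWriteBuffer:
    """Forwards a written chunk to the backing store only if it is new."""

    def __init__(self):
        self.seen = set()     # digests of chunks already forwarded
        self.forwarded = []   # stand-in for data passed on to the SSD
        self.total_writes = 0

    def write(self, chunk: bytes) -> bool:
        """Return True if the chunk was forwarded, False if deduplicated."""
        self.total_writes += 1
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in self.seen:
            return False
        self.seen.add(digest)
        self.forwarded.append(chunk)
        return True

    def duplicate_fraction(self) -> float:
        """Share of writes that never reached the backing store."""
        if self.total_writes == 0:
            return 0.0
        return 1 - len(self.forwarded) / self.total_writes


if __name__ == "__main__":
    buf = DedupWriteBuffer()
    print([buf.write(c) for c in (b"A", b"B", b"A", b"A")])
    print(buf.duplicate_fraction())
```

Every write that is dropped here is a write the flash never has to absorb, which is how de-duplication ties back to the endurance limit on slide 38.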
  • 43. Internal Parallel Processing
      [Figure: requests Req 1 to Req 24 from the SDRAM Cache are queued in eight lists, List #0 to List #7, which feed eight flash packages, Package #0 to Package #7]
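The figure on slide 43 appears to spread requests Req 1 to Req 24 across eight lists, one per flash package, so that up to eight requests can be serviced at once. The sketch below assumes a simple round-robin assignment, which is read off the figure's layout rather than stated in the slides; the actual scheduling policy may differ, and `distribute` and `NUM_PACKAGES` are illustrative names.

```python
NUM_PACKAGES = 8  # List #0 to List #7 in the figure


def distribute(requests, num_lists=NUM_PACKAGES):
    """Assign requests to per-package lists in round-robin order."""
    lists = [[] for _ in range(num_lists)]
    for i, req in enumerate(requests):
        lists[i % num_lists].append(req)
    return lists


def rounds_needed(lists):
    """Service rounds if every package handles one request per round."""
    return max((len(queue) for queue in lists), default=0)


if __name__ == "__main__":
    queues = distribute(range(1, 25))
    print(queues[0], queues[7], rounds_needed(queues))
```

Under this assumption, 24 requests finish in 3 rounds instead of 24 sequential steps.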
  • 45. Internal Parallelism Evaluation: Single Node
  • 46. Single Node: Dedup Ratio
  • 50. Conclusion
      • McSD: A Smart Disk Model
      • pp-mpiBlast: How to deploy Active Storage?
      • HcDD: Hybrid Disk for Active Storage
      [Figure label: Storage Node]
  • 52. Many Thanks! And Questions?

Editor's Notes

  1. Organization: 1. Motivation in Summary: Active Storage, Parallel Processing, Hybrid Storage2. McSD3. ppmpiBlast4. HcDD5. Summary
  2. Organization: 1. Motivation in Summary: Active Storage, Parallel Processing, Hybrid Storage2. McSD3. ppmpiBlast4. HcDD5. Summary
  3. Organization: 1. Motivation in Summary: Active Storage, Parallel Processing, Hybrid Storage2. McSD3. ppmpiBlast4. HcDD5. Summary
  4. Aesop’s Fable: The Tortoise and the Hare. Speed gap. Fast Runner wait for the slower one.Over the last several decades, the performance has increased rapidly. While, the performance improvement of I/O is relatively slow. It cause... the gap between CPU performance and I/O bandwidth has continually grown. Especially, for data-intensive computing workloads, I/O bottlenecks often cause low CPU utilization.
  5. BLAST is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
  6. Further subdividing the pipeline patterns, there are inter- and intra-application pipeline processing. The pp- mpiBlast is intra-application parallel processing, which means that, as the name - ‘intra-’ - suggests, one native sequential transaction is partitioned into multiple parallel pipelined transactions. The system performance is improved by fully exploiting the parallelism.
  7. The pipeline pattern no only improves the performance by exploiting the par- allelism, but also can solve the out-of-core processing issue, which means required amount of data are too large to fit in the ASN’s main memory. In pp-mpiBlast, partition function is implemented within mpiformatdbfucntion running on ASN. And the merge function is a separate one running on the front node of the cluster.
  8. Response time, speedup, and throughput are three critical performance measures for the pipelined BLAST. Denoting T1 and T2 as the execution times associated with the first stage and second stage in the pipeline, we can calculate the response time Tresponse for processing each input data set as the sum of T1 and T2.
  13. One limitation of flash memory is that although it can be read or programmed a byte or a word at a time in a random-access fashion, it can only be erased a "block" at a time. Erasing generally sets all bits in the block to 1. Starting with a freshly erased block, any location within that block can be programmed. However, once a bit has been set to 0, it can be changed back to 1 only by erasing the entire block. In other words, flash memory (specifically NOR flash) offers random-access read and programming operations, but cannot offer arbitrary random-access rewrite or erase operations. Based on the SSD lifetime calculator provided on the Virident website [36], the lifetime of a 200 GB MLC-based SSD could be only 160 days if the sustained write rate is 50 MB/s.
  14. The performance depends on the number of writes removed. For a real-world implementation: (1) conservative comparison: no optimization, with writes treated as synchronous; (2) a log file system reduces the seek and rotational delays of the HDD; (3) asynchronous writes: from the user's perspective, the delay is not noticeable (i.e., it can be omitted).
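
The note on pipelined BLAST performance defines the response time as Tresponse = T1 + T2 and names speedup and throughput as the other two measures without giving formulas for them. The sketch below computes all three for a two-stage pipeline. Only the response-time formula comes from the notes; the throughput and speedup expressions are the textbook model for an ideal two-stage pipeline (the slower stage is the bottleneck, and there is no overhead between stages), and the function name `pipeline_metrics` and the sample timings are hypothetical, not measurements from pp-mpiBlast.

```python
def pipeline_metrics(t1, t2, n_items):
    """Estimate metrics for an ideal two-stage pipeline.

    t1, t2  : per-item execution times of stage 1 and stage 2 (seconds)
    n_items : number of input data sets pushed through the pipeline

    Assumption (not from the notes): the stages overlap perfectly, so in
    steady state one item completes every max(t1, t2) seconds.
    """
    if t1 <= 0 or t2 <= 0 or n_items < 1:
        raise ValueError("stage times must be positive and n_items >= 1")

    # From the notes: response time of one input is the sum of both stages.
    response_time = t1 + t2

    # Assumed model: the slower stage limits the steady-state rate.
    bottleneck = max(t1, t2)
    throughput = 1.0 / bottleneck

    # Sequential run processes each item end to end before the next starts.
    sequential_total = n_items * response_time
    # Pipelined run: the first item takes t1 + t2, each later one adds
    # only the bottleneck time.
    pipelined_total = response_time + (n_items - 1) * bottleneck

    return {
        "response_time": response_time,
        "throughput": throughput,
        "speedup": sequential_total / pipelined_total,
    }


if __name__ == "__main__":
    # Hypothetical stage times, chosen only to illustrate the arithmetic.
    for name, value in pipeline_metrics(4.0, 6.0, 10).items():
        print(f"{name}: {value:.4f}")
```

Under this model the speedup approaches (T1 + T2) / max(T1, T2) as the number of inputs grows, so it is bounded by 2 and reaches that bound only when the two stages are balanced.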