SlideShare a Scribd company logo
1 of 30
Download to read offline
GeneIndex: an open source
parallel program for enumerating
and locating words in a genome

 Huian Li, David Hart, Matthias Mueller, Ulf Markward, Craig Stewart


                            August 3, 2009
                            A    t3
Contents
•   Motivation
•   Serial algorithm
•   Parallel implementation
•   Performance Analysis
•   Conclusion
Motivation

             Question from a Biology professor:
             Given a word length, is th computational
             Gi           dl   th i the         t ti l
             task of scanning a DNA sequence and
             recording th positions of all possible
                  di the      iti     f ll     ibl
             words trivial?

            5    10   15   20   25   30
            *    *    *    *    *    *
   5’   TAGCCGTGGCGGAGCCTCTTGGCTTTGTTTATTC         3’
Serial algorithm
• Straightforward implementation:
     Binary coding for A, C, G, T. For example:
     A:
     A 00
     C: 01
     G: 10
     T: 11

          5         10        15        20
          *         *         *         *
  T A G C C G T G G C G G A G C C T C T T G ...
  110010010110111010011010001001011101111110 ...
Serial algorithm
• Given a sequence and a word length k in order to
                                         k,
  list all possible words, we scan the sequence once
  from left to right
                 g
           5        10        15        20
           *        *         *         *
  T A G C C G T G G C G G A G C C T C T T G       ...
  110010010110111010011010001001011101111110      ...
  T A G C
  11001001
    A G C C
    00100101
      G C C G
      10010110
                                    C T T G       ...
                                    0
                                    011111100     ...
Serial algorithm
           5        10        15        20
           *        *         *         *
  T A G C C G T G G C G G A G C C T C T T G ...
  110010010110111010011010001001011101111110 ...
  T A G C
  11001001
    A G C C
    00100101

  ENCODE( AGCC )
  ENCODE("AGCC") =
  ENCODE("TAGC") & MASK << 2 | ENCODE('C')

  MASK = 4k-1 – 1 = 111111 (in case of k = 4)
Serial algorithm
• This essentially becomes a sorting problem since
                                         problem,
  each word is now converted into an integer.
• Each word is associated with its position
  information: (Encoded Word, Position)
• Sorting has to be stable so that for the same words
                                                  words,
  their positions will be in a certain order.
Serial algorithm
Implementation details:
• Words & positions are stored in a long long
  integer (8 bytes = 64 bits)
• Hash table with a linked list for each entry
• S
  Space required f all words i given sequence i
              i d for ll     d in i             is
  pre-allocated, instead of malloc one by one
• M tl AND OR and SHIFT LEFT operations.
  Mostly AND,         d SHIFT-LEFT           ti
Word frequencies
Word distribution
Motivation for parallel implementation



                  Another question
                  from Biology professor:
                  How about the human genome?



        Fact: Human genome includes
        about 3 billion DNA bases.
Parallel implementation: input
Large dataset input:
• Each process reads its own partition from the input
  file.
  file
• Boundary area between neighboring processes has
  to be considered.
        considered
                agcatgcatgcatcgatcgatcgatgc
                atcgatgcatcgatacgatgcatgcta
                gacgatacgagcatgcatctagcatgc
                      t      t    t t     t
                agtagcatgcatcgatgcattagcatg
                ctagctagcatgctagcatgcatcgat
                g
                gcatgctagcatgctagctagcatgct
                    g   g   g   g     g   g
                atgcatgcatgcatcatgcatcgatcg
                atcgtgcaatgcatgctacgatgcatg
                catcagtcagcatgcatgcatcgatcg
                atgcatcgatcgatgcatgcatgacga
                 t    t   t  t    t     t
                gcaatgatgcagtcatgcatcgacgag
                catcgatcgatgcatgcatgcat
Parallel implementation: load balancing
Computation and load balancing:
1. Each process deals with its own piece of data
2 All processes perform global sorting
2.
        Straightforward implementation: binary tree merge
    X   sorting
       Possible solution but could be problematic
       Ideal solution leading to load balancing
Parallel implementation: load balancing
Straightforward implementation: binary tree merge
  sorting
                                               AAAA:
                                               AAAC:
                                         P0    ...



                                               TTTG:
                                               TTTT:




                      AAAA:
                                                                         AAAA:
                      AAAC:
                                                                         AAAC:
                                                                         AAAC
                P0    ...
                                                                 P2      ...


                      TTTG:
                                                                         TTTG:
                      TTTT:
                                                                         TTTT:




         AAAA: 5, 9                AAAA: 19                 AAAA: 4, 8                AAAA: 35
         AAAC: 22                  AAAC: 12                 AAAC: 67                  AAAC: 46
    P0   ...                  P1   ...                 P2   ...                  P3   ...



         TTTG: 101                 TTTG: 201                TTTG: 88                  TTTG: 40
         TTTT: 80                  TTTT: 26                 TTTT: 53                  TTTT: 30
Parallel implementation: load balancing
Possible solution but could be problematic:
   Straightforward solution: partition word range [0, 4k) equally,
     so each process has
            4 k  i 4 k  i  1 
               ,                  , where i = 0, 1, ..., n-1
             n     n      



    AAAA:               AAAA:         AAAA:             AAAA:
    AAAC:               AAAC:         AAAC:             AAAC:
    CAAC:               CAAC:         CAAC:             CAAC:
    CTTT:               CTTT:         CTTT:             CTTT: `
    GAAA:               GAAA:         GAAA:             GAAA:
    GTTT:               GTTT:         GTTT:             GTTT:
    TTTG:               TTTG:         TTTG:             TTTG:
    TTTT:               TTTT:         TTTT:             TTTT:
Parallel implementation: load balancing
Implementation of the straightforward solution:
• Problem is that some words occur more often than others,
  leading to different memory requests for different p
        g                   y q                      processes



     AAAA: 5, 9     CAAA: 19       GAAA: 4, 8     TAAA: 35, 93
     AAAC: 22, 37   CAAC: 12, 47   GAAC: 67, 72   TAAC: 46
     ...            ...            ...            ...



     ATTG: 101      CTTG: 201      GTTG: 88       TTTG: 40, 87
     ATTT: 80       CTTT: 26       GTTT: 53       TTTT: 15, 30
Parallel implementation: load balancing
Ideal solution leading to load balancing:
•   partition the total number of words L-k+1 equally, so each
    p
    process has (  (L-k+1)/n words, where L is the length of
                          )          ,                  g
    given sequence, k is the given word length.
Implementation:
  p
1. After each process scanned its own piece, we know that:
           4 k 1
                                      L  k 1
           
           x 0
                    f (Wx , Pi ) 
                                         n
                                               , where i = 0 1 ..., n 1
                                                           0, 1,    n-1


                            g [0,4k) into many small divisions
2. We divide the word range [                y
   with total divisions of d (where d>>n):
               4k
                  ( j 1) 1
          d 1 d
                                              L  k 1
          
          j 0
                             f (Wx , Pi ) 
                                                 n
                                                       , where i = 0, 1, ..., n-1
                         4k
                    x      j
                         d
Parallel implementation: load balancing
Implementation:
3. The number of words in each small division as below:
                       4k
                          ( j 1) 1
                       d
         T (i, j )         f (W , P ) x       i       , where i = 0, 1, ..., n-1
                           4k
                         x  j
                           d
                                                            and j = 0, 1, ..., d-1


4. The total number of words in each small division across all
   p
   processes will be:
                       4k
                          ( j 1) 1
                  n 1 d
         T ( j)             f (W , P )   x       i   , where j = 0 1 ..., d-1
                                                                    0, 1,    d 1
                  i 0       4k
                           x  j
                             d
Parallel implementation: load balancing
Implementation:
5. Find an array of boundary B, such that:
           B ( m 1)
                       L  k 1
            mT ( j )  n
           jB( )

    where B(0)= 0,
          B(0<m<n) = any number in (0,d-1),
      and B(n)= d-1

6. Process Pi should have all words in:
      [B(i),
      [B(i) B(i+1)) , where i = 0 1 ..., n-1
                                      0, 1, n 1.
Parallel implementation: load balancing



   AAAA:       AAAA:       AAAA:          AAAA:
   AAAC:       AAAC:       AAAC:          AAAC:
   CAAC:       CAAC:       CAAC:          CAAC:
   CTTT:       CTTT:       CTTT:          CTTT: `
   GAAA:       GAAA:       GAAA:          GAAA:
   GTTT:       GTTT:       GTTT:          GTTT:
   TTTG:       TTTG:       TTTG:          TTTG:
   TTTT:       TTTT:       TTTT:          TTTT:
Parallel implementation: output
Output:
• Each process creates its own output file
• If necessary, all files can concatenate into one
     necessary
  single file, while keeping the order

   AAAA: 5, 9, 10
   AAAC: 22, 37     ...                   ...

   ...                                    TTTG: 15, 30
                                          TTTT: 40, 87
Testbed
Testbed specification
• Consists of 768 IBM JS21 blades
• On each blade:
     • 2 dual core PowerPC CPUs @ 2 5GHz
         dual-core                  2.5GHz
     • 8 GB memory
     • SUSE Linux Enterprise Server 9 (ppc)
• Interconnect: Myrinet
• Parallel environment: MPI
Performance analysis

    Number of    Number of      D. melanogaster       H. sapiens
      nodes      processes         k=6       k=25      k=6       k=25
             1             1        95      23203
             2             2        53       5770
             4             4        29       1500
             8             8        15        386
            16            16          9       107      212     93672
            32            32          6         32     118     24934
            64            64          5         14      73      5998
          128            128          9         11      59      1450
          256            256        11          15      49       558
          512            512        18          22      71       195


  Timings of running against two datasets on BigRed using 1 PPN (SECONDS)
Performance analysis

    Number of    Number of      D. melanogaster       H. sapiens
      nodes      processes         k=6       k=25      k=6       k=25
             1             2        54       5958
             2             4        31       1545
             4             8        17        400
             8            16        11        115
            16            32          8         40     156     25661
            32            64          6         19     101      6233
            64           128          7         12      82      1506
          128            256        25          16      61       576
          256            512        20          27      81       198
          512           1024        34          37     115       170


  Timings of running against two datasets on BigRed using 2 PPN (SECONDS)
Performance analysis

    Number of    Number of      D. melanogaster       H. sapiens
      nodes      processes         k=6       k=25      k=6       k=25
             1             4        37       1738
             2             8        23        453
             4            16        15        130
             8            32        10          46
            16            64          9         23     163      6649
            32           128        11          17     121      1632
            64           256        20          25      96       634
          128            512        39          42     120       233
          256           1024        79          90     180       187
          512           2048       134        131      270       281


  Timings of running against two datasets on BigRed using 4 PPN (SECONDS)
Performance analysis
                             Scalability in terms of node numbers
                40.00

                35.00

                30.00

                25.00
           up
      Speedu



                                                                          1 ppn
                20.00                                                     2 ppn

                15.00                                                     4 ppn
                                                                            pp

                10.00

                 5.00

                 0.00
                        16        32      64     128     256        512
                                       Number of nodes


          Scalability of enumerating 6-mers in H. sapiens
Performance analysis
                          Scalability in terms of node numbers
                600.00


                500.00


                400.00
           up
      Speedu



                                                                       1 ppn
                300.00                                                 2 ppn

                                                                       4 ppn
                                                                         pp
                200.00
                200 00


                100.00


                  0.00
                         16     32      64     128     256       512
                                     Number of nodes


          Scalability of enumerating 25-mers in H. sapiens
Conclusion
• Addressed questions from the biology professor 
• Complicate solution aroused from memory
  restriction.
  restriction
• It can handle words of length up to 30.
• It can find often-repeated words, rarely-occurred , or
          fi d ft         t d      d       l          d
  even non-occurred words.
• It scales relatively well on l
        l      l ti l    ll    large cluster machines.
                                       l t       hi
• We recently developed a Java version for small
  DNA sequences, which was “      “our future work”. It can
                                       f          ”
  zoom in or zoom out to view distribution and
  frequencies interactively.
  f          i i t     ti l
The End




          Thank you

More Related Content

What's hot

Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCSR2011
 
SNM 2007, Washington DC
SNM 2007, Washington DCSNM 2007, Washington DC
SNM 2007, Washington DCgzferl
 
Class3
 Class3 Class3
Class3issbp
 
The H.264 Integer Transform
The H.264 Integer TransformThe H.264 Integer Transform
The H.264 Integer TransformIain Richardson
 
timing-analysis
 timing-analysis timing-analysis
timing-analysisVimal Raj
 
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...SSA KPI
 
Optimum failure truncated testing strategies
Optimum failure truncated testing strategies Optimum failure truncated testing strategies
Optimum failure truncated testing strategies ASQ Reliability Division
 
Class8
 Class8 Class8
Class8issbp
 
Os6 2
Os6 2Os6 2
Os6 2issbp
 
Alba ffs flim 2012
Alba ffs flim 2012Alba ffs flim 2012
Alba ffs flim 2012benbarbieri
 

What's hot (10)

Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyen
 
SNM 2007, Washington DC
SNM 2007, Washington DCSNM 2007, Washington DC
SNM 2007, Washington DC
 
Class3
 Class3 Class3
Class3
 
The H.264 Integer Transform
The H.264 Integer TransformThe H.264 Integer Transform
The H.264 Integer Transform
 
timing-analysis
 timing-analysis timing-analysis
timing-analysis
 
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
 
Optimum failure truncated testing strategies
Optimum failure truncated testing strategies Optimum failure truncated testing strategies
Optimum failure truncated testing strategies
 
Class8
 Class8 Class8
Class8
 
Os6 2
Os6 2Os6 2
Os6 2
 
Alba ffs flim 2012
Alba ffs flim 2012Alba ffs flim 2012
Alba ffs flim 2012
 

Viewers also liked

Appraisers Direct, Inc.
Appraisers Direct, Inc.Appraisers Direct, Inc.
Appraisers Direct, Inc.ClintCornett
 
5 Vampir Configuration At IU
5 Vampir Configuration At IU5 Vampir Configuration At IU
5 Vampir Configuration At IUPTIHPA
 
Air Traffic
Air TrafficAir Traffic
Air Traffichumair73
 
A Common Sense Approach Electronic
A Common Sense Approach   ElectronicA Common Sense Approach   Electronic
A Common Sense Approach ElectronicRegina Henderson
 
Github:fi Presentation
Github:fi PresentationGithub:fi Presentation
Github:fi PresentationPTIHPA
 

Viewers also liked (7)

Appraisers Direct, Inc.
Appraisers Direct, Inc.Appraisers Direct, Inc.
Appraisers Direct, Inc.
 
5 Vampir Configuration At IU
5 Vampir Configuration At IU5 Vampir Configuration At IU
5 Vampir Configuration At IU
 
Air Traffic
Air TrafficAir Traffic
Air Traffic
 
A Common Sense Approach Electronic
A Common Sense Approach   ElectronicA Common Sense Approach   Electronic
A Common Sense Approach Electronic
 
Aca Talent
Aca TalentAca Talent
Aca Talent
 
Ciclismo Neiva
Ciclismo NeivaCiclismo Neiva
Ciclismo Neiva
 
Github:fi Presentation
Github:fi PresentationGithub:fi Presentation
Github:fi Presentation
 

Similar to GeneIndex: an open source parallel program for enumerating and locating words in a genome

Overlap Layout Consensus assembly
Overlap Layout Consensus assemblyOverlap Layout Consensus assembly
Overlap Layout Consensus assemblyZhuyi Xue
 
Horspool Algorithm in Design and Analysis of Algorithms in VTU
Horspool Algorithm in Design and Analysis of Algorithms in VTUHorspool Algorithm in Design and Analysis of Algorithms in VTU
Horspool Algorithm in Design and Analysis of Algorithms in VTUSaMarthHitnalli
 
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Flink Forward
 
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingSelf-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingVasia Kalavri
 
Matlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.pptMatlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.pptIndra Hermawan
 
Algorithim lec1.pptx
Algorithim lec1.pptxAlgorithim lec1.pptx
Algorithim lec1.pptxrediet43
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...Positive Hack Days
 
TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big DataPingCAP
 
G-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionG-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionMengmeng Xu
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Flink Forward
 
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...IOSR Journals
 
Algorithm Analysis
Algorithm AnalysisAlgorithm Analysis
Algorithm AnalysisMegha V
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdfFrangoCamila
 
Asymptotics 140510003721-phpapp02
Asymptotics 140510003721-phpapp02Asymptotics 140510003721-phpapp02
Asymptotics 140510003721-phpapp02mansab MIRZA
 

Similar to GeneIndex: an open source parallel program for enumerating and locating words in a genome (20)

Overlap Layout Consensus assembly
Overlap Layout Consensus assemblyOverlap Layout Consensus assembly
Overlap Layout Consensus assembly
 
Fine Grained Complexity
Fine Grained ComplexityFine Grained Complexity
Fine Grained Complexity
 
discrete-hmm
discrete-hmmdiscrete-hmm
discrete-hmm
 
Horspool Algorithm in Design and Analysis of Algorithms in VTU
Horspool Algorithm in Design and Analysis of Algorithms in VTUHorspool Algorithm in Design and Analysis of Algorithms in VTU
Horspool Algorithm in Design and Analysis of Algorithms in VTU
 
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
 
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingSelf-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processing
 
Matlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.pptMatlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.ppt
 
Algorithim lec1.pptx
Algorithim lec1.pptxAlgorithim lec1.pptx
Algorithim lec1.pptx
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
 
TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big Data
 
G-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionG-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action Detection
 
Biochip
BiochipBiochip
Biochip
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Lec20 dimension1
Lec20 dimension1Lec20 dimension1
Lec20 dimension1
 
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...
 
Lec1
Lec1Lec1
Lec1
 
Algorithm Analysis
Algorithm AnalysisAlgorithm Analysis
Algorithm Analysis
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdf
 
Analysis
AnalysisAnalysis
Analysis
 
Asymptotics 140510003721-phpapp02
Asymptotics 140510003721-phpapp02Asymptotics 140510003721-phpapp02
Asymptotics 140510003721-phpapp02
 

More from PTIHPA

2010 05 hands_on
2010 05 hands_on2010 05 hands_on
2010 05 hands_onPTIHPA
 
Trace Visualization
Trace VisualizationTrace Visualization
Trace VisualizationPTIHPA
 
2010 02 instrumentation_and_runtime_measurement
2010 02 instrumentation_and_runtime_measurement2010 02 instrumentation_and_runtime_measurement
2010 02 instrumentation_and_runtime_measurementPTIHPA
 
2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configuration2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configurationPTIHPA
 
2010 03 papi_indiana
2010 03 papi_indiana2010 03 papi_indiana
2010 03 papi_indianaPTIHPA
 
Overview: Event Based Program Analysis
Overview: Event Based Program AnalysisOverview: Event Based Program Analysis
Overview: Event Based Program AnalysisPTIHPA
 
Switc Hpa
Switc HpaSwitc Hpa
Switc HpaPTIHPA
 
Statewide It Robert Henschel
Statewide It Robert HenschelStatewide It Robert Henschel
Statewide It Robert HenschelPTIHPA
 
3 Vampir Trace In Detail
3 Vampir Trace In Detail3 Vampir Trace In Detail
3 Vampir Trace In DetailPTIHPA
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace VisualizationPTIHPA
 
1 Vampir Overview
1 Vampir Overview1 Vampir Overview
1 Vampir OverviewPTIHPA
 
4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir Usage4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir UsagePTIHPA
 
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorImplementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorPTIHPA
 
Big Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing WorkshopBig Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing WorkshopPTIHPA
 

More from PTIHPA (14)

2010 05 hands_on
2010 05 hands_on2010 05 hands_on
2010 05 hands_on
 
Trace Visualization
Trace VisualizationTrace Visualization
Trace Visualization
 
2010 02 instrumentation_and_runtime_measurement
2010 02 instrumentation_and_runtime_measurement2010 02 instrumentation_and_runtime_measurement
2010 02 instrumentation_and_runtime_measurement
 
2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configuration2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configuration
 
2010 03 papi_indiana
2010 03 papi_indiana2010 03 papi_indiana
2010 03 papi_indiana
 
Overview: Event Based Program Analysis
Overview: Event Based Program AnalysisOverview: Event Based Program Analysis
Overview: Event Based Program Analysis
 
Switc Hpa
Switc HpaSwitc Hpa
Switc Hpa
 
Statewide It Robert Henschel
Statewide It Robert HenschelStatewide It Robert Henschel
Statewide It Robert Henschel
 
3 Vampir Trace In Detail
3 Vampir Trace In Detail3 Vampir Trace In Detail
3 Vampir Trace In Detail
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace Visualization
 
1 Vampir Overview
1 Vampir Overview1 Vampir Overview
1 Vampir Overview
 
4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir Usage4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir Usage
 
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorImplementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
 
Big Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing WorkshopBig Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing Workshop
 

Recently uploaded

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

GeneIndex: an open source parallel program for enumerating and locating words in a genome

  • 1. GeneIndex: an open source parallel program for enumerating and locating words in a genome Huian Li, David Hart, Matthias Mueller, Ulf Markward, Craig Stewart August 3, 2009 A t3
  • 2. Contents • Motivation • Serial algorithm • Parallel implementation • Performance Analysis • Conclusion
  • 3. Motivation Question from a Biology professor: Given a word length, is th computational Gi dl th i the t ti l task of scanning a DNA sequence and recording th positions of all possible di the iti f ll ibl words trivial? 5 10 15 20 25 30 * * * * * * 5’ TAGCCGTGGCGGAGCCTCTTGGCTTTGTTTATTC 3’
  • 4. Serial algorithm • Straightforward implementation: Binary coding for A, C, G, T. For example: A: A 00 C: 01 G: 10 T: 11 5 10 15 20 * * * * T A G C C G T G G C G G A G C C T C T T G ... 110010010110111010011010001001011101111110 ...
  • 5. Serial algorithm • Given a sequence and a word length k in order to k, list all possible words, we scan the sequence once from left to right g 5 10 15 20 * * * * T A G C C G T G G C G G A G C C T C T T G ... 110010010110111010011010001001011101111110 ... T A G C 11001001 A G C C 00100101 G C C G 10010110 C T T G ... 0 011111100 ...
  • 6. Serial algorithm 5 10 15 20 * * * * T A G C C G T G G C G G A G C C T C T T G ... 110010010110111010011010001001011101111110 ... T A G C 11001001 A G C C 00100101 ENCODE( AGCC ) ENCODE("AGCC") = ENCODE("TAGC") & MASK << 2 | ENCODE('C') MASK = 4k-1 – 1 = 111111 (in case of k = 4)
  • 7. Serial algorithm • This essentially becomes a sorting problem since problem, each word is now converted into an integer. • Each word is associated with its position information: (Encoded Word, Position) • Sorting has to be stable so that for the same words words, their positions will be in a certain order.
  • 8. Serial algorithm Implementation details: • Words & positions are stored in a long long integer (8 bytes = 64 bits) • Hash table with a linked list for each entry • S Space required f all words i given sequence i i d for ll d in i is pre-allocated, instead of malloc one by one • M tl AND OR and SHIFT LEFT operations. Mostly AND, d SHIFT-LEFT ti
  • 11. Motivation for parallel implementation Another question from Biology professor: How about the human genome? Fact: Human genome includes about 3 billion DNA bases.
  • 12. Parallel implementation: input Large dataset input: • Each process reads its own partition from the input file. file • Boundary area between neighboring processes has to be considered. considered agcatgcatgcatcgatcgatcgatgc atcgatgcatcgatacgatgcatgcta gacgatacgagcatgcatctagcatgc t t t t t agtagcatgcatcgatgcattagcatg ctagctagcatgctagcatgcatcgat g gcatgctagcatgctagctagcatgct g g g g g g atgcatgcatgcatcatgcatcgatcg atcgtgcaatgcatgctacgatgcatg catcagtcagcatgcatgcatcgatcg atgcatcgatcgatgcatgcatgacga t t t t t t gcaatgatgcagtcatgcatcgacgag catcgatcgatgcatgcatgcat
  • 13. Parallel implementation: load balancing Computation and load balancing: 1. Each process deals with its own piece of data 2 All processes perform global sorting 2. Straightforward implementation: binary tree merge X sorting  Possible solution but could be problematic  Ideal solution leading to load balancing
  • 14. Parallel implementation: load balancing Straightforward implementation: binary tree merge sorting AAAA: AAAC: P0 ... TTTG: TTTT: AAAA: AAAA: AAAC: AAAC: AAAC P0 ... P2 ... TTTG: TTTG: TTTT: TTTT: AAAA: 5, 9 AAAA: 19 AAAA: 4, 8 AAAA: 35 AAAC: 22 AAAC: 12 AAAC: 67 AAAC: 46 P0 ... P1 ... P2 ... P3 ... TTTG: 101 TTTG: 201 TTTG: 88 TTTG: 40 TTTT: 80 TTTT: 26 TTTT: 53 TTTT: 30
  • 15. Parallel implementation: load balancing Possible solution but could be problematic:  Straightforward solution: partition word range [0, 4k) equally, so each process has  4 k  i 4 k  i  1   ,   , where i = 0, 1, ..., n-1  n n  AAAA: AAAA: AAAA: AAAA: AAAC: AAAC: AAAC: AAAC: CAAC: CAAC: CAAC: CAAC: CTTT: CTTT: CTTT: CTTT: ` GAAA: GAAA: GAAA: GAAA: GTTT: GTTT: GTTT: GTTT: TTTG: TTTG: TTTG: TTTG: TTTT: TTTT: TTTT: TTTT:
  • 16. Parallel implementation: load balancing Implementation of the straightforward solution: • Problem is that some words occur more often than others, leading to different memory requests for different p g y q processes AAAA: 5, 9 CAAA: 19 GAAA: 4, 8 TAAA: 35, 93 AAAC: 22, 37 CAAC: 12, 47 GAAC: 67, 72 TAAC: 46 ... ... ... ... ATTG: 101 CTTG: 201 GTTG: 88 TTTG: 40, 87 ATTT: 80 CTTT: 26 GTTT: 53 TTTT: 15, 30
  • 17. Parallel implementation: load balancing Ideal solution leading to load balancing: • partition the total number of words L-k+1 equally, so each p process has ( (L-k+1)/n words, where L is the length of ) , g given sequence, k is the given word length. Implementation: p 1. After each process scanned its own piece, we know that: 4 k 1 L  k 1  x 0 f (Wx , Pi )  n , where i = 0 1 ..., n 1 0, 1, n-1 g [0,4k) into many small divisions 2. We divide the word range [ y with total divisions of d (where d>>n): 4k ( j 1) 1 d 1 d L  k 1   j 0 f (Wx , Pi )  n , where i = 0, 1, ..., n-1 4k x j d
  • 18. Parallel implementation: load balancing Implementation: 3. The number of words in each small division as below: 4k ( j 1) 1 d T (i, j )   f (W , P ) x i , where i = 0, 1, ..., n-1 4k x  j d and j = 0, 1, ..., d-1 4. The total number of words in each small division across all p processes will be: 4k ( j 1) 1 n 1 d T ( j)    f (W , P ) x i , where j = 0 1 ..., d-1 0, 1, d 1 i 0 4k x  j d
  • 19. Parallel implementation: load balancing Implementation: 5. Find an array of boundary B, such that: B ( m 1) L  k 1 mT ( j )  n jB( ) where B(0)= 0, B(0<m<n) = any number in (0,d-1), and B(n)= d-1 6. Process Pi should have all words in: [B(i), [B(i) B(i+1)) , where i = 0 1 ..., n-1 0, 1, n 1.
  • 20. Parallel implementation: load balancing AAAA: AAAA: AAAA: AAAA: AAAC: AAAC: AAAC: AAAC: CAAC: CAAC: CAAC: CAAC: CTTT: CTTT: CTTT: CTTT: ` GAAA: GAAA: GAAA: GAAA: GTTT: GTTT: GTTT: GTTT: TTTG: TTTG: TTTG: TTTG: TTTT: TTTT: TTTT: TTTT:
  • 21. Parallel implementation: output Output: • Each process creates its own output file • If necessary, all files can concatenate into one necessary single file, while keeping the order AAAA: 5, 9, 10 AAAC: 22, 37 ... ... ... TTTG: 15, 30 TTTT: 40, 87
  • 23. Testbed specification • Consists of 768 IBM JS21 blades • On each blade: • 2 dual core PowerPC CPUs @ 2 5GHz dual-core 2.5GHz • 8 GB memory • SUSE Linux Enterprise Server 9 (ppc) • Interconnect: Myrinet • Parallel environment: MPI
  • 24. Performance analysis Number of Number of D. melanogaster H. sapiens nodes processes k=6 k=25 k=6 k=25 1 1 95 23203 2 2 53 5770 4 4 29 1500 8 8 15 386 16 16 9 107 212 93672 32 32 6 32 118 24934 64 64 5 14 73 5998 128 128 9 11 59 1450 256 256 11 15 49 558 512 512 18 22 71 195 Timings of running against two datasets on BigRed using 1 PPN (SECONDS)
  • 25. Performance analysis Number of Number of D. melanogaster H. sapiens nodes processes k=6 k=25 k=6 k=25 1 2 54 5958 2 4 31 1545 4 8 17 400 8 16 11 115 16 32 8 40 156 25661 32 64 6 19 101 6233 64 128 7 12 82 1506 128 256 25 16 61 576 256 512 20 27 81 198 512 1024 34 37 115 170 Timings of running against two datasets on BigRed using 2 PPN (SECONDS)
  • 26. Performance analysis Number of Number of D. melanogaster H. sapiens nodes processes k=6 k=25 k=6 k=25 1 4 37 1738 2 8 23 453 4 16 15 130 8 32 10 46 16 64 9 23 163 6649 32 128 11 17 121 1632 64 256 20 25 96 634 128 512 39 42 120 233 256 1024 79 90 180 187 512 2048 134 131 270 281 Timings of running against two datasets on BigRed using 4 PPN (SECONDS)
  • 27. Performance analysis Scalability in terms of node numbers 40.00 35.00 30.00 25.00 up Speedu 1 ppn 20.00 2 ppn 15.00 4 ppn pp 10.00 5.00 0.00 16 32 64 128 256 512 Number of nodes Scalability of enumerating 6-mers in H. sapiens
  • 28. Performance analysis Scalability in terms of node numbers 600.00 500.00 400.00 up Speedu 1 ppn 300.00 2 ppn 4 ppn pp 200.00 200 00 100.00 0.00 16 32 64 128 256 512 Number of nodes Scalability of enumerating 25-mers in H. sapiens
  • 29. Conclusion • Addressed questions from the biology professor  • Complicate solution aroused from memory restriction. restriction • It can handle words of length up to 30. • It can find often-repeated words, rarely-occurred , or fi d ft t d d l d even non-occurred words. • It scales relatively well on l l l ti l ll large cluster machines. l t hi • We recently developed a Java version for small DNA sequences, which was “ “our future work”. It can f ” zoom in or zoom out to view distribution and frequencies interactively. f i i t ti l
  • 30. The End Thank you