Speaker note: Now, let's address the use of parallel processing in bioinformatics applications.

Trelles_QnormBOSC2009 Presentation Transcript

  • Qnorm: A library of parallel methods for gene-expression Q-normalization. José Manuel Mateos-Duran, Pjotr Prins, Andrés Rodríguez and Oswaldo Trelles. The Bioinformatics Open Source Conference (BOSC) 2009.
  • European Concerted Research Action (COST): Bingos (Bioinformatics new generation open source), improving open source software for high-performance computing in biology. Problem: new high-throughput (HT) technologies in several areas of the life sciences produce enormous amounts of data, creating a bottleneck in our ability to process and analyse them. Solution: increase communication between the Bioinformatics, HPC and OSS communities to adapt and develop capable software tools:
    • High-throughput applications: storage, assembling, …
    • Large-scale analysis: sequence analysis, phylogenetics, gene expression, …
    • Structural analysis (3D modelling)
    • Systems biology (complex interactions)
    • Automatic parallelization (scripting) & benchmarking
  • Quantile normalization: 1) Load the data into memory. 2) Sort each column of R, producing a set of indexes I[g][e] = p (where p is the original position of the value in its column). 3) Obtain A[g], the average value of each row. 4) Assign the average value to all entries: O[g][e] = A[g], for g = 1..G and e = 1..E. 5) Sort each column O[g][e] by the index I[g][e] (restoring the original order).
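    For reference, a minimal sequential sketch of these five steps in C. The column-wise matrix layout, the qsort comparator and the function names are choices made here for illustration, not the Qnorm library's actual code (ties are ignored):

        #include <stdlib.h>

        /* One matrix cell paired with its original row, so a column can be
           sorted while remembering where each value came from (step 2). */
        typedef struct { double value; int row; } Cell;

        static int cmp_cell(const void *a, const void *b) {
            double d = ((const Cell *)a)->value - ((const Cell *)b)->value;
            return (d > 0) - (d < 0);
        }

        /* Quantile-normalize a G x E matrix stored column-wise:
           data[e][g] is the value of gene g in experiment e. */
        void qnorm(double **data, int G, int E) {
            Cell   *col   = malloc(G * sizeof(Cell));
            double *avg   = calloc(G, sizeof(double));
            int   **index = malloc(E * sizeof(int *));

            /* Steps 2-3: sort every column and accumulate the per-rank sums. */
            for (int e = 0; e < E; e++) {
                index[e] = malloc(G * sizeof(int));
                for (int g = 0; g < G; g++) {
                    col[g].value = data[e][g];
                    col[g].row   = g;
                }
                qsort(col, G, sizeof(Cell), cmp_cell);
                for (int g = 0; g < G; g++) {
                    index[e][g] = col[g].row;   /* original position of the g-th smallest */
                    avg[g]     += col[g].value; /* running sum for rank g */
                }
            }
            for (int g = 0; g < G; g++)
                avg[g] /= E;                    /* step 3: average of each rank */

            /* Steps 4-5: write the rank averages back to the original positions. */
            for (int e = 0; e < E; e++) {
                for (int g = 0; g < G; g++)
                    data[e][index[e][g]] = avg[g];
                free(index[e]);
            }
            free(index);
            free(avg);
            free(col);
        }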
  • Code reorganization (parallel prototype):

        nE = LoadProject(fname, fList);
        // [STEP 1] for each experiment: load, sort and accumulate row averages
        for (i = 0; i < nE; i++) {
            LoadFile(fList, i, dataIn);
            Qnorm1(dataIn, dIndex, fList[i].nG);
            PartialRowAccum(AvG, dataIn, nG);
            // manage the index in memory or on disk
        }
        for (i = 0; i < nG; i++)          // global average
            AvG[i].Av /= AvG[i].num;

        // [STEP 2] produce the ORDERED output file
        // prepare the output file and a one-column 'dataOut' array
        for (i = 0; i < nE; i++) {
            // get the column index (from memory or disk)
            for (j = 0; j < nG; j++)      // prepare the OUT array
                dataOut[dIndex[j]] = AvG[j].Av;
            // file positioning and writing of the vector
        }
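    The point of this reorganization is that only one experiment (column) needs to be resident at a time: each sorted column feeds a running per-rank accumulator, and only its sort index has to be kept (in memory or on disk) for step 2. A minimal sketch of such an accumulator in C, mirroring the PartialRowAccum call above; the struct layout and the finalization helper are assumptions for illustration:

        /* Running average of the g-th smallest value over all experiments.
           'num' counts contributions so the mean can be taken at the end. */
        typedef struct { double Av; long num; } RankAccum;

        /* Add one already-sorted (ascending) column into the accumulator;
           dataIn[g] is the g-th smallest value of the current experiment. */
        void PartialRowAccum(RankAccum *AvG, const double *dataIn, int nG) {
            for (int g = 0; g < nG; g++) {
                AvG[g].Av  += dataIn[g];
                AvG[g].num += 1;
            }
        }

        /* After all experiments have been accumulated, turn sums into means
           (the "global average" loop of the prototype above). */
        void FinalizeAverages(RankAccum *AvG, int nG) {
            for (int g = 0; g < nG; g++)
                if (AvG[g].num > 0)
                    AvG[g].Av /= AvG[g].num;
        }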
  • Shared-memory version: the same two-step code as the parallel prototype above, with OpenMP parallel sections (one per step) distributing the per-experiment loops across threads, each thread handling its own range of experiments:

        #pragma omp parallel shared(From, To, Range)   // open general parallel section
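    A minimal sketch of how step 1 could be parallelized with OpenMP in C: each thread sorts and accumulates a share of the experiments into a thread-local accumulator, and the partial sums are merged under a critical section. The declarations mirror the names used in the pseudocode above; the loop scheduling and the merge strategy are assumptions, not the library's actual implementation:

        #include <stdlib.h>
        #include <omp.h>

        /* Types and routines named in the slides, assumed defined elsewhere. */
        typedef struct { double Av; long num; } RankAccum;
        typedef struct { int nG; } FileEntry;
        void LoadFile(FileEntry *fList, int i, double *dataIn);
        void Qnorm1(double *dataIn, int *dIndex, int nG);
        void PartialRowAccum(RankAccum *AvG, const double *dataIn, int nG);

        void qnorm_step1_omp(int nE, int nG, FileEntry *fList, RankAccum *AvG) {
            #pragma omp parallel shared(fList, AvG)   /* open general parallel section */
            {
                double    *dataIn = malloc(nG * sizeof(double));
                int       *dIndex = malloc(nG * sizeof(int));
                RankAccum *local  = calloc(nG, sizeof(RankAccum));

                /* Experiments are distributed across the threads. */
                #pragma omp for schedule(dynamic)
                for (int i = 0; i < nE; i++) {
                    LoadFile(fList, i, dataIn);           /* read experiment i        */
                    Qnorm1(dataIn, dIndex, fList[i].nG);  /* sort it, keep its index  */
                    PartialRowAccum(local, dataIn, nG);   /* thread-local rank sums   */
                    /* the index dIndex would be stored in memory or on disk here */
                }

                /* Merge the thread-local sums into the shared accumulator. */
                #pragma omp critical
                for (int g = 0; g < nG; g++) {
                    AvG[g].Av  += local[g].Av;
                    AvG[g].num += local[g].num;
                }

                free(dataIn); free(dIndex); free(local);
            }
        }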
  • Message-passing version (master/slave scheme):

        Master:
            Get parameters, initialize
            CalculateBlocks(nP, IniBlocks)
            Broadcast(IniBlocks)                  // --> slaves receive their initial block
            while (ThereIsBlocks) {
                Receive(ResultBlock, who)         // <-- partial result from a slave
                AverageBlock(ResultBlock)
                if (!SendAllBlocks) {
                    CalculateNextBlock(NextBlock)
                    Send(who, NextBlock)          // --> that slave receives its next block
                }
            }
            Broadcast(ENDsignal)
            ReportResults

        Slave(s):
            Start with params
            Receive(Block)
            while (!ENDsignal) {
                for each experiment in block {
                    LoadExperiment(Exp)
                    SortExperiment(Exp)
                    AccumulateAverage(Exp);
                }
                AverageBlock(ResultBlock)
                Send(ResultBlock)                 // --> master
                Receive(Block);
            }
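    A compact MPI skeleton of this scheme in C, assuming blocks are described by a pair of experiment indexes and slaves return their partial rank sums as a double array. Tags, block size, message layout and the ProcessBlock helper are all assumptions for illustration, and the end signal is sent point-to-point here rather than broadcast:

        #include <mpi.h>
        #include <stdlib.h>

        #define TAG_WORK   1
        #define TAG_DONE   2
        #define END_SIGNAL -1

        /* Load, sort and accumulate experiments [first, last) into rankSum;
           assumed implemented elsewhere along the lines of the slave loop above. */
        void ProcessBlock(int first, int last, double *rankSum, int nG);

        /* Assumes MPI_Init() has already been called by the caller. */
        void qnorm_mpi(int nE, int nG) {
            int rank, nP, block[2], blockSize = 16;   /* experiments per block (tunable) */
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nP);
            double *sum = calloc(nG, sizeof(double));

            if (rank == 0) {                                    /* ----- master ----- */
                int next = 0, active = 0;
                double *partial = malloc(nG * sizeof(double));

                for (int p = 1; p < nP; p++) {                  /* initial blocks */
                    if (next < nE) {
                        block[0] = next;
                        block[1] = (next + blockSize < nE) ? next + blockSize : nE;
                        next = block[1];
                        active++;
                    } else {
                        block[0] = block[1] = END_SIGNAL;       /* nothing left to do */
                    }
                    MPI_Send(block, 2, MPI_INT, p, TAG_WORK, MPI_COMM_WORLD);
                }
                while (active > 0) {                            /* collect and refeed */
                    MPI_Status st;
                    MPI_Recv(partial, nG, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_DONE,
                             MPI_COMM_WORLD, &st);
                    for (int g = 0; g < nG; g++) sum[g] += partial[g];   /* AverageBlock */
                    if (next < nE) {
                        block[0] = next;
                        block[1] = (next + blockSize < nE) ? next + blockSize : nE;
                        next = block[1];
                    } else {
                        block[0] = block[1] = END_SIGNAL;
                        active--;
                    }
                    MPI_Send(block, 2, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                }
                for (int g = 0; g < nG; g++) sum[g] /= nE;      /* global rank averages */
                free(partial);
                /* step 2 (writing the ordered output) would follow here */
            } else {                                            /* ----- slave(s) ----- */
                MPI_Recv(block, 2, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                while (block[0] != END_SIGNAL) {
                    for (int g = 0; g < nG; g++) sum[g] = 0.0;
                    ProcessBlock(block[0], block[1], sum, nG);  /* load, sort, accumulate */
                    MPI_Send(sum, nG, MPI_DOUBLE, 0, TAG_DONE, MPI_COMM_WORLD);
                    MPI_Recv(block, 2, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                }
            }
            free(sum);
        }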
  • GPU version (NVIDIA CUDA programming model). CPU-side driver:

        nE = LoadProject(fname, fList);
        for (i = 0; i < nE; i++) {            // for each experiment
            LoadFile(fList, i, dataIn);
            CopyToGPU(dataIn);
            <<kernel>> QSortGPU(dataIn, dIndex)
            CopyFromGPU(dIndex);
            WriteToDisk(dIndex);
            <<kernel>> RowAccum(dataIn, AvG)
        }
        <<kernel>> GlobalAvg(AvG, nE)
        CopyFromGPU(AvG);
        // Step 2: produce the output file using the indexes and the global average

    GPU kernels: QSortGPU(dataIn, dIndex), RowAccum(dataIn, AvG), GlobalAvg(AvG, nE)
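    A minimal CUDA sketch of the accumulation kernels in this flow: after a column has been copied to the device and sorted (QSortGPU, not shown), one thread per rank adds its value into the running per-rank sum; GlobalAvg then divides by the number of experiments. Kernel bodies, block size and the host helper are illustrative assumptions, not the actual Qnorm kernels:

        #include <cuda_runtime.h>

        /* Add one sorted column into the per-rank running sums.
           d_col[g] is the g-th smallest value of the current experiment,
           d_sum[g] accumulates that rank over all experiments. */
        __global__ void RowAccum(const float *d_col, float *d_sum, int nG) {
            int g = blockIdx.x * blockDim.x + threadIdx.x;
            if (g < nG)
                d_sum[g] += d_col[g];
        }

        /* Turn the accumulated sums into the global rank averages. */
        __global__ void GlobalAvg(float *d_sum, int nG, int nE) {
            int g = blockIdx.x * blockDim.x + threadIdx.x;
            if (g < nG)
                d_sum[g] /= (float)nE;
        }

        /* Host-side step for one experiment, mirroring the CPU loop above. */
        void accumulate_experiment(const float *h_col, float *d_col, float *d_sum, int nG) {
            int threads = 256;
            int blocks  = (nG + threads - 1) / threads;
            cudaMemcpy(d_col, h_col, nG * sizeof(float), cudaMemcpyHostToDevice);
            /* QSortGPU<<<...>>>(d_col, d_index) would sort the column here (omitted). */
            RowAccum<<<blocks, threads>>>(d_col, d_sum, nG);
            cudaDeviceSynchronize();
        }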
  • Hardware & data. Input: Affymetrix raw CEL files (GPL3718), 6.5M probes x 470 arrays. CEL files converted with Ben Bolstad's Affyio (part of R/Bioconductor and my Biolib).
    Pablo: cluster of up to 256 nodes (IBM JS20), 512 CPUs, 1 TB distributed memory; each node has 2 single-core IBM PowerPC 970FX CPUs (64-bit, 2 GHz), 4 GB RAM and a 40 GB local disk; MERINET interconnection network.
    Picasso: shared-memory HP Superdome of up to 64 nodes, 128 CPUs, 128 GB shared memory; each node has 2 dual-core Intel Itanium-2 CPUs at 1.6 GHz.
    Almeria: CPU: Intel Core 2 Quad Q9450, 2.66 GHz, 1.33 GHz FSB, 12 MB L2 cache. GPU: GeForce 9800 GX2, 600/1500 MHz, 2 x 1 GHz DDR3, 1 GB, 512 bits. Disks: 2 x 72 GB (RAID 0) Western Digital Raptor, 10000 RPM.
  • Benchmarking. Same input as above: Affymetrix raw CEL files (GPL3718), 6.5M probes x 470 arrays, converted with Ben Bolstad's Affyio (part of R/Bioconductor and my Biolib). The distributed-memory, shared-memory and GPU versions were benchmarked; reported speed-ups: 2.9x total, 5.5x processing.
  • Conclusions (Qnorm).
    Background: the application domain is bioinformatics (diverse, dispersed, heterogeneous, huge data, …); applications are I/O- and memory-intensive; a large collection of sequential code cannot cope with current computational demands.
    Aims: characterise the application domain and start up a library of common parallel procedures.
    Benchmarking: performance is strongly related to code dependencies; different parallel models (shared memory, distributed memory, etc.) suit different code structures; shared memory works well but is expensive; GPU-based solutions seem a good alternative for local installations; I/O-bound applications should look for performance gains in the I/O device.