Varre_Biomanycores_BOSC2009
Transcript

  • 1. Biomanycores, a repository of interoperable open-source code for many-cores bioinformatics
    Jean-Stéphane Varré, Stéphane Janot, Mathieu Giraud (contact@biomanycores.org)
    Sequoia Bioinformatics
    LIFL – UMR CNRS 8022 – Université Lille 1, France
    INRIA Lille Nord-Europe, France
    June 2009
    J.-S. Varré, S. Janot, M. Giraud (LIFL), Biomanycores, June 2009, 1 / 20
  • 2. Outline
    High-performance computing
    Graphics processing units and bioinformatics
    biomanycores.org: aim of the project, what has been done, future developments
  • 3. High-Performance Bioinformatics – Manycores
    1970–2002: Moore's law = increasing frequencies
      problems: power consumption, heat dissipation
    From now on: Moore's law continues with multiple cores
      from multicores (dual-cores, quad-cores, octo-cores...)
      to manycores: graphics processing units (GPUs)
      Nvidia GTX 285 ⇒ 30 × 8 cores, 1.2 GHz, 40 (×8) GFlops
    CPU–GPU convergence: Intel Larrabee
  • 4. High-Performance Bioinformatics – Manycores
    GPGPU = General-Purpose computation on GPUs
      until 2007: tweaking graphics primitives
      2007: Nvidia CUDA
      2009: OpenCL (Khronos Group): 1.0 specification in December 2008; beta release of an Nvidia compiler in May 2009; AMD/ATI compiler coming soon
      ⇒ portable many-core applications?
    With GPGPU...
      10× / 100× peak speed-up at low cost ($50–$500)
      even with the losses due to parallelism, a 10× speed-up is possible
      (relatively) easy with CUDA / OpenCL, but requires some learning
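The claim that a 100× peak speed-up can still leave "only" about 10× once the losses due to parallelism are counted can be made concrete with Amdahl's law, a standard model that the slides do not name. The sketch below assumes that a fraction `p` of the runtime is accelerated by a factor `s` while the rest stays serial; the function name is illustrative.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speed-up when a fraction p of the runtime is accelerated
    by a factor s and the remaining (1 - p) stays serial (Amdahl's law)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # A kernel running 100x faster, embedded in programs whose parallel
    # fraction varies.
    for p in (0.90, 0.95, 0.99):
        print(f"p = {p:.2f}: overall speed-up {amdahl_speedup(p, 100.0):.1f}x")
```

With p = 0.90, even a 100× kernel yields only about 9.2× overall, which is the order of magnitude quoted on the slide.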
  • 5. GPU + Bioinformatics Methods
    "Graphical" GPGPU (2005/06), speed-up:
      RAxML: up to 2× (Charalambous et al. 2005)
      ClustalW: up to 7× (Liu et al. 2006)
    CUDA (since 2007), speed-up:
      MUMmerGPU: up to 10× (Schatz et al. 2007)
      Smith-Waterman: up to 15× (Manavski and Valle 2008)
      Neighbor-Joining: up to 26× (Liu et al. 2009)
      RNAfold: up to 17× (Rizk and Lavenier 2009)
    ∼10 papers between 2007 and 2009
  • 6. GPU + Bioinformatics: Specific Bioinformatics HPC Events
    HiComb (IEEE Workshop on High Performance Computational Biology), since 2002, in conjunction with IPDPS [May 2009, Rome]
    PBC (Parallel Bio-Computing Workshop), since 2005, every two years, in conjunction with PPAM [Sept. 2009, Wroclaw]
    HiBi (Workshop on High Performance Computational Systems Biology) [Oct. 2009, Trento]
  • 7. Sequoia Bioinformatics
    LIFL, INRIA, Université Lille 1, France
    H. Touzet's group, 14 people (including 5 PhD students)
    Large-scale sequence analysis
      sequence comparisons, seed-based heuristics
      RNA, transcription factors, NRPS
    High-performance bioinformatics
      SIMD flexible read mapper (L. Noé, M. Gîrdea)
      GPU PWM scan / P-value (22×–77× on a GTX 280)
      GPU ADP (6.1×–22.8× on a GTX 280, with U. Bielefeld)
      GPU & bit-parallelism pattern matching (ongoing)
    Supported by NVIDIA (Professor Partnership, 2009)
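The slide only names "bit-parallelism pattern matching" as ongoing work. As a generic illustration of the technique (not the group's GPU code), here is the classic Shift-And algorithm for exact matching: each text character updates a bit vector in which bit j records whether the first j + 1 pattern characters match the text ending at that position. Python integers are unbounded; a C or GPU version would keep the pattern within a 32- or 64-bit machine word.

```python
from __future__ import annotations


def shift_and(pattern: str, text: str) -> list[int]:
    """Start positions of exact occurrences of pattern in text (Shift-And)."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit j set when pattern[j] == c.
    masks: dict[str, int] = {}
    for j, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << j)
    accept = 1 << (m - 1)  # bit reached when the whole pattern matches
    state = 0  # bit j set <=> pattern[:j + 1] matches the text ending at i
    hits = []
    for i, c in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(i - m + 1)
    return hits


if __name__ == "__main__":
    print(shift_and("ACA", "TACACAG"))
```

The whole update is one shift, one OR and one AND per text character, which is what makes the approach attractive for wide SIMD or GPU hardware.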
  • 8. GPU + Position-Weight Matrices (PWM)
    Parallel Position Weight Matrices Algorithms. M. Giraud and J.-S. Varré. ISPDC'09
    PWMs are used for modeling transcription factor binding sites, transcription start sites, protein domains, ...
    Score threshold or P-value computation: requires enumerating words
    Occurrences: requires scanning a very long sequence quickly
    [Figures: WebLogo 3.0 sequence logo of an example PWM; two plots of speed-up vs. matrix length comparing CPU (one thread), GeForce 8800 and GTX 280, one of them also showing GTX 280 (+ atomic)]
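The two tasks on this slide can be sketched on the CPU as follows. This is a minimal illustration, not the cudaPWM implementation. It assumes that the matrix is a list of per-column score dictionaries over ACGT and that the background is i.i.d. (uniform by default). The P-value is computed by brute-force enumeration of all 4^m words, which quickly becomes expensive as the matrix length m grows.

```python
from __future__ import annotations

import itertools

ALPHABET = "ACGT"


def scan(pwm: list[dict[str, float]], seq: str,
         threshold: float) -> list[tuple[int, float]]:
    """(position, score) of every window of seq scoring >= threshold.
    pwm[j][b] is the score of base b at column j of the matrix."""
    m = len(pwm)
    hits = []
    for i in range(len(seq) - m + 1):
        window = seq[i:i + m]
        if any(b not in ALPHABET for b in window):
            continue  # skip windows containing N or other non-ACGT letters
        score = sum(col[b] for col, b in zip(pwm, window))
        if score >= threshold:
            hits.append((i, score))
    return hits


def p_value(pwm: list[dict[str, float]], threshold: float,
            background: dict[str, float] | None = None) -> float:
    """P(score >= threshold) for a random word drawn from an i.i.d.
    background, by enumerating all 4**m words (short matrices only)."""
    bg = background or {b: 0.25 for b in ALPHABET}
    total = 0.0
    for word in itertools.product(ALPHABET, repeat=len(pwm)):
        if sum(col[b] for col, b in zip(pwm, word)) >= threshold:
            prob = 1.0
            for b in word:
                prob *= bg[b]
            total += prob
    return total


if __name__ == "__main__":
    # Toy 2-column matrix favouring the word "AC" (illustrative scores).
    pwm = [
        {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
        {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
    ]
    print(scan(pwm, "AACGAC", threshold=4.0))
    print(p_value(pwm, threshold=4.0))
```

Each window in `scan` is scored independently of the others, so the scan parallelizes naturally across a long sequence.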
  • 9. HPC Bioinformatics for human beings?
    Research in high-performance computing: nice ideas, nice papers, but not always exploited
    A few HPC bioinformatics framework projects...
    ⇒ far from the everyday usage of bioinformaticians and biologists
  • 10. www.biomanycores.org
    1. Share OpenCL code = public repository, open-source
    2. Make it easy = Bio* integration
    3. Benchmark algorithms, implementations, hardware
  • 11. www.biomanycores.org
    1. Share OpenCL code (currently CUDA) = public repository, open-source
    2. Make it easy = Bio* integration
    3. Benchmark algorithms, implementations, hardware
  • 12. Already included projects
    SWcuda – Smith-Waterman protein alignment
      CRIBI Genomics, University of Padova, Italy
      S. A. Manavski, G. Valle, CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment, BMC Bioinformatics 2008, 9(S2):S10
    pknotsRG – pseudoknots of an RNA sequence
      Universität Bielefeld, Germany
      J. Reeder, P. Steffen, R. Giegerich, pknotsRG: RNA pseudoknot folding including near-optimal structures and sliding windows, Nucl. Acids Res., 2007
    cudaPWM – scan a PWM against a DNA sequence
      Sequoia, LIFL, INRIA, Université Lille 1
      M. Giraud, J.-S. Varré, Parallel Position Weight Matrices Algorithms, ISPDC'09
    Interfaces to BioJava 1.6, BioPerl 1.5.2, and Biopython 1.50b
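For readers unfamiliar with what SWcuda computes, here is a plain CPU sketch of the Smith-Waterman local alignment score with affine gaps (Gotoh's recurrences), keeping only two rows of the matrices in memory. It is a reference illustration, not the CRIBI code. The toy match/mismatch function stands in for a substitution matrix such as BLOSUM62, and the gap convention (a gap of length k costs gap_open + (k - 1) * gap_extend) is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Callable

NEG_INF = float("-inf")


def simple_score(x: str, y: str) -> int:
    """Toy substitution score: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


def smith_waterman_score(a: str, b: str,
                         score: Callable[[str, str], int] = simple_score,
                         gap_open: int = 3, gap_extend: int = 1) -> int:
    """Best local alignment score of a and b (Smith-Waterman, affine gaps).
    A gap of length k costs gap_open + (k - 1) * gap_extend."""
    n = len(b)
    h_prev = [0] * (n + 1)        # H: best score of an alignment ending at (i, j)
    f_prev = [NEG_INF] * (n + 1)  # F: same, but ending with a gap in b
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [NEG_INF] * (n + 1)
        e = NEG_INF  # E: same, but ending with a gap in a (along row i)
        for j in range(1, n + 1):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            # The 0 lets a local alignment restart anywhere.
            h = max(0, h_prev[j - 1] + score(a[i - 1], b[j - 1]), e, f_cur[j])
            h_cur[j] = h
            best = max(best, h)
        h_prev, f_prev = h_cur, f_cur
    return best
```

For example, `smith_waterman_score("ACGT", "ACT", gap_open=1, gap_extend=1)` returns 5 (ACGT aligned with AC-T). The cost is proportional to len(a) × len(b) per pair, and a bank search repeats it for every bank sequence, which is the workload GPU implementations such as SWcuda distribute over many cores.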
  • 13.
  • 14. Biopython + CRIBI SW
      from Bio import SeqIO
      from Biomanycores import PadovaSW

      # SeqIO.parse returns a one-shot iterator: materialize the bank as a
      # list so that it can be reused for every query.
      bank = list(SeqIO.parse(open("uniprot-start.fa"), "fasta"))
      for query in SeqIO.parse(open("prot64.fa"), "fasta"):
          # align the query against the whole bank with SWcuda
          handle = PadovaSW.run(query, bank)
          result = PadovaSW.SWParser().parse()
          print(result)
  • 15. Biopython + CRIBI SW: tests on a GeForce 8800
      biopython$ time python sw-demo.py cuda
      ** cd ../bin/ ; ./swcuda config.gpu ../tmp/swcuda.fa ../tmp/swcuda.bank **
      1.846s
      12098 results...
      [(84.0, 0, 0, 'sp|P30350|ADH1_ANAPL'), (81.0, 0, 0, 'sp|P23991|ADH1_CHICK'), (81.0,
      real 2.81  user 1.79  sys 0.27

      biopython$ time python sw-demo.py cpu
      ** cd ../bin/ ; ./swcuda config.cpu ../tmp/swcuda.fa ../tmp/swcuda.bank **
      16.604s
      12098 results...
      [(84.0, 0, 0, 'sp|P30350|ADH1_ANAPL'), (81.0, 0, 0, 'sp|P23991|ADH1_CHICK'), (81.0,
      real 17.57  user 16.42  sys 0.14
    10×–15× paper speed-up (BMC Bioinformatics 2008, 9(S2))
    8.7× application speed-up
    6.2× final speed-up (including Biopython/Biomanycores)
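The speed-ups at the bottom of this slide can be related to the timings in the transcript. The sketch below uses the times printed after each `swcuda` command (1.846 s and 16.604 s) and the `real` times of the whole script (2.81 s and 17.57 s). In both runs, the time spent outside `swcuda` is just under 1 s. This fixed cost weighs far more on the fast GPU run. From these particular figures, the application ratio is about 9.0×, close to the 8.7× quoted on the slide, and the end-to-end ratio is about 6.25×.

```python
def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast run is than the slow one."""
    return slow_seconds / fast_seconds


# Timings copied from the GeForce 8800 test above, in seconds.
swcuda_gpu, swcuda_cpu = 1.846, 16.604  # printed after the swcuda command
script_gpu, script_cpu = 2.81, 17.57    # `real` time of the whole Python script

# Time spent outside swcuda (e.g. Python, Biopython/Biomanycores, file I/O).
outside_gpu = script_gpu - swcuda_gpu
outside_cpu = script_cpu - swcuda_cpu

if __name__ == "__main__":
    print(f"application speed-up: {speedup(swcuda_cpu, swcuda_gpu):.2f}x")
    print(f"end-to-end speed-up:  {speedup(script_cpu, script_gpu):.2f}x")
    print(f"time outside swcuda:  {outside_gpu:.2f} s (GPU run), "
          f"{outside_cpu:.2f} s (CPU run)")
```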
  • 16. BioPerl + CRIBI SW
    BioPerl tutorial:
      use Bio::Tools::pSW;
      $factory = new Bio::Tools::pSW('-matrix' => 'blosum62.bla',
                                     '-gap' => 12, '-ext' => 2);
      $factory->align_and_show($seq1, $seq2, STDOUT);
      $aln = $factory->pairwise_alignment($seq1, $seq2);
    With Biomanycores:
      use Bio::SeqIO;
      use Biomanycores::PadovaSW;
      $factory = PadovaSW->new();
      $factory->swcuda($inputseq, $bank);
      @r = $factory->parse_result();
  • 17. BioJava + PWM
      import org.biojavax.bio.seq.RichSequence;
      import org.biojava.bio.dp.SimpleWeightMatrix;
      ...
      import org.biomanycores.bio.pwm.*;
      ...
      {
          LillePWMScan scanner = new LillePWMScan(launcher);
          // read the sequence
          RichSequenceIterator it = null;
          BufferedReader in1 = new BufferedReader(new FileReader(args[1]));
          it = RichSequence.IOTools.readFastaDNA(in1, null);
          RichSequence query = it.nextRichSequence();
          // read a weight matrix
          SimpleWeightMatrix pwm = PFMParser.PARSER.get(args[2], alph, "ACGT");
          // scan the sequence
          List<PWMHit> al = scanner.scan(query, pwm, threshold);
      }
  • 18. Challenges
    Different APIs, different philosophies
      BioJava: no external program execution?
      object representation (alignments)
      object existence (PWM)
    Minimal modifications to the source code of applications
      CribiSW: command-line arguments
    Real-world pipelines?
      Bio* are not HPC frameworks
      succession of several programs
    Usage: requires the CUDA / OpenCL SDK
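One reading of "minimal modifications to the source code of applications" is that a Bio* interface drives the existing command-line program instead of linking to it. The sketch below is a hypothetical Python wrapper in that spirit. It writes the query and the bank as FASTA files, then runs the program with the argument order seen in the Biopython test (`./swcuda config.gpu <query file> <bank file>`, launched from `../bin/`). The function names, the file names and the injectable `runner` parameter are assumptions, not the Biomanycores API; `runner` lets the sketch be exercised without a GPU. The slide itself asks whether this kind of external execution fits BioJava.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Callable, Sequence


def write_fasta(records: Sequence[tuple[str, str]], path: Path) -> None:
    """Write (identifier, sequence) pairs to path in FASTA format."""
    with open(path, "w") as out:
        for name, seq in records:
            out.write(f">{name}\n{seq}\n")


def run_swcuda(query: tuple[str, str],
               bank: Sequence[tuple[str, str]],
               workdir: Path,
               config: str = "config.gpu",
               binary: str = "./swcuda",
               bin_dir: Path | None = None,
               runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Hypothetical wrapper around the unmodified command-line program:
    stage the inputs as FASTA files, run the program, return its stdout."""
    # Absolute paths, because the program may be run from another directory.
    query_path = (workdir / "swcuda.fa").resolve()
    bank_path = (workdir / "swcuda.bank").resolve()
    write_fasta([query], query_path)
    write_fasta(bank, bank_path)
    cmd = [binary, config, str(query_path), str(bank_path)]
    # cwd mirrors the `cd ../bin/` of the original call; check=True raises
    # CalledProcessError if the program exits with a non-zero status.
    completed = runner(cmd, cwd=bin_dir, capture_output=True, text=True, check=True)
    return completed.stdout
```

The returned text would then go to a result parser. Keeping the program untouched is the point, but it also means every call pays for file I/O and process start-up.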
  • 19. Licenses
    Projects must have an open-source license
    Bio* interfaces: same license as the parent API
      BioJava: LGPL 2.1
      BioPerl: Perl Artistic License
      Biopython: Biopython License
  • 20. www.biomanycores.org
    1. Share OpenCL code (currently CUDA) = public repository, open-source
       ⇒ bring new projects
    2. Make it easy = Bio* integration
       ⇒ integrate new projects
       ⇒ improve current interfaces
    3. Benchmark algorithms, implementations, hardware
       ⇒ think!
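For the benchmarking goal, a fair comparison needs at least repeated timings and a common baseline. The helper below is a hypothetical, minimal sketch: it takes the best-of-n wall-clock time with `time.perf_counter` and reports each implementation's speed-up relative to a named baseline. It says nothing about how Biomanycores benchmarks are actually run.

```python
from __future__ import annotations

import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Best (minimum) wall-clock time of fn() over `repeats` runs, in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedups(implementations: dict[str, Callable[[], object]],
             baseline: str, repeats: int = 5) -> dict[str, float]:
    """Speed-up of every implementation relative to the baseline one."""
    times = {name: best_time(fn, repeats) for name, fn in implementations.items()}
    # Guard against a zero reading from a very coarse clock.
    return {name: times[baseline] / max(t, 1e-9) for name, t in times.items()}


if __name__ == "__main__":
    data = list(range(200_000))

    def loop_sum() -> int:
        total = 0
        for x in data:
            total += x
        return total

    print(speedups({"loop": loop_sum, "builtin sum": lambda: sum(data)},
                   baseline="loop"))
```

Taking the minimum over repeats filters out interference from other processes. A real comparison of GPU code would also have to decide whether data transfers and device initialization belong inside the timed region.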