Informatics and Computing Infrastructure for Clinical High-Throughput Sequencing, Center for Computational Genetics, University of Pittsburgh, Michael Barmada Copenhagenomics 2012


Published in: Health & Medicine, Technology

  1. Computing Infrastructures for Clinical NGS: Barriers to centralizing data analysis at the University of Pittsburgh
     M. Michael Barmada, Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh
  2. University of Pittsburgh
     • Geographically dispersed campus (132 acres in Pittsburgh, plus regional campuses)
     • Large affiliated hospital system (UPMC - 23 hospitals spread over the tri-state area)
     • Ranked in the top cluster of research institutions in the US
       • 7th in the nation in NIH funding
  3. NGS at Pitt
  4. Common analysis hurdles for NGS
     • Hardware
       • Computing capacity
       • Network
       • Storage
     • Software
  5. Common problems - (1) Hardware
     • When NGS machines started appearing on campus, there were 14 “high-performance” computing centers
       • Most were small group- or department-specific clusters (<100 cores) with limited storage and standard (GigE) networking
       • Larger computing resources were available at the Pittsburgh Supercomputing Center, but with limited availability
  6. Center for Simulation and Modeling (SAM)
     • One large (>3000 cores) cluster existed on campus - established by computational chemistry and engineering groups
     • Large-capacity machines (>12 cores/48 GB RAM per node - many with 48 cores/128-256 GB RAM)
       • This cleared up the RAM and capacity problems
  7. Common Problems - (2) Storage
     • Despite early successes with the SAM cluster, problems started appearing as the number of users went up
       • Storage array - the SAM cluster uses a shared NFS array for /home - reading and writing large files (read/quality files) became a serious bottleneck
       • Upgraded the array to a high-performance system (Panasas) - allows parallel (DirectFS/pNFS) access and greater throughput than standard RAID
  8. Common Problems - (3) Networking
     • Networking within the SAM cluster is a combination of InfiniBand and Gigabit Ethernet - not a problem
     • Networking on campus was a problem
       • Old network segments (100 Mbit), firewalls (multiple hops) - maximum transfer speeds of only 10-15 Mbit/s
       • Upgrades “in progress”
  9. Common Problems - (3) Networking
     • Solutions
       • “Sneaker-net” - works, but leads to a proliferation of drives and potential for data loss/corruption
       • Globus/GridFTP - faster than the campus network (transfer speeds of 1-2 Gbit/s)
       • Cloud-based services (Seven Bridges) - surprisingly economical and efficient - sequencing centers upload data for individual groups, who then use the data for analysis (online or at a local cluster) and for backup (desktops)
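To put the rates quoted on slides 8 and 9 in perspective, the short calculation below estimates how long one dataset would take to move at each rate. It is only a back-of-the-envelope sketch: the 100 GB dataset size is an assumption chosen for illustration (the slides give no file sizes), and the function name `transfer_hours` is hypothetical.

```python
def transfer_hours(size_gb: float, rate_mbit_s: float) -> float:
    """Hours needed to move size_gb gigabytes (10^9 bytes each)
    over a link that sustains rate_mbit_s megabits per second."""
    bits = size_gb * 8e9                  # bytes -> bits
    seconds = bits / (rate_mbit_s * 1e6)  # Mbit/s -> bit/s
    return seconds / 3600


# ASSUMPTION: 100 GB per dataset; this figure is not from the slides.
DATASET_GB = 100

# Rates taken from the slides: 10-15 Mbit/s on the campus network,
# 1-2 Gbit/s with Globus/GridFTP.
RATES_MBIT_S = {
    "campus network (10 Mbit/s)": 10,
    "campus network (15 Mbit/s)": 15,
    "GridFTP (1 Gbit/s)": 1000,
    "GridFTP (2 Gbit/s)": 2000,
}

for label, rate in RATES_MBIT_S.items():
    print(f"{label}: {transfer_hours(DATASET_GB, rate):.2f} h")
```

Under that assumed size, the campus network needs roughly 15 to 22 hours per dataset, while the GridFTP rates bring the same transfer under 15 minutes.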
  10. Common Problems - (4) Software
      • Pipelines were created to link together common tools (BWA/NovoAlign/GATK/ANNOVAR) - but these require familiarity with the command line/Unix environment
      • With increasing use of NGS by medical/clinical research groups, we had more and more users who were not comfortable in a Unix environment
      • Solutions: train users or develop non-Unix-based interfaces
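As an illustration of what "linking together common tools" on slide 10 involves, here is a minimal dry-run sketch that only builds and prints the shell commands for an alignment step. It is not the pipeline used at Pitt. The use of `bwa mem` and of samtools for sorting and indexing is an assumption (samtools is not named on the slide), all file names are hypothetical, and the variant-calling and annotation steps (GATK, ANNOVAR) are left out because their invocations differ between versions.

```python
import shlex


def build_pipeline(reference, reads_1, reads_2, sample, threads=4):
    """Return the alignment steps as (argv, stdout_path) pairs.

    ASSUMPTION: the reference has already been indexed with `bwa index`.
    Nothing is executed here; the commands are only assembled.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # Paired-end alignment; bwa writes SAM to standard output.
        (["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2], sam),
        # Coordinate-sort the alignments into a BAM file.
        (["samtools", "sort", "-o", bam, sam], None),
        # Index the sorted BAM for downstream tools.
        (["samtools", "index", bam], None),
    ]


def render(steps):
    """Turn (argv, stdout_path) pairs into printable shell command lines."""
    lines = []
    for argv, stdout_path in steps:
        cmd = " ".join(shlex.quote(arg) for arg in argv)
        if stdout_path is not None:
            cmd += " > " + shlex.quote(stdout_path)
        lines.append(cmd)
    return lines


# Hypothetical file names, for illustration only.
for line in render(build_pipeline("ref.fa", "s1_R1.fastq", "s1_R2.fastq", "s1")):
    print(line)
```

Even this short chain assumes the user can manage paths, redirection, and intermediate files, which is the barrier the slide describes for clinical users.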
  11. Research Gateways
      • Several Bioinformatics/NGS gateways are in the process of being implemented
      • Each allows access to the computational resources of the SAM cluster using web-based or client-based interfaces
  12. Galaxy
  13. CLCbio Genomics Server
  14. CLCbio Genomics Server
      [Diagram labels: CLCbio Genomics Workbench, CLCbio Genomics Server]
  15. Genboree
  16. [Image-only slide; no text recovered]
  17. Issues with research gateways
      • Common data storage and data dedup
        • A current focus is configuring all NGS gateways to share the same common storage space and files, so that we do not need to duplicate data across multiple storage spaces
        • CLC bio Genomics Server “plays well with others”, as does Yabi, but Galaxy and Genboree do a lot of file permission modifications
        • Solution: create a meta-data store that ensures files are owned by the appropriate users and have appropriate permissions - coupled with cron tasks to monitor user/permission changes
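The meta-data store plus cron monitoring described on slide 17 could look roughly like the sketch below. This is a sketch under stated assumptions, not the implementation used at Pitt: the store is modelled as a plain dictionary mapping each path to its expected owner uid and permission bits, and the names `snapshot`, `audit`, and `demo` are hypothetical. A cron job would call something like `audit` periodically and report or repair whatever it returns.

```python
import os
import stat
import tempfile


def snapshot(path):
    """Record the current (owner uid, permission bits) of a file."""
    st = os.stat(path)
    return st.st_uid, stat.S_IMODE(st.st_mode)


def audit(expected):
    """Compare files on disk with the expected {path: (uid, mode)} records.

    Returns a list of (path, problem) tuples; an empty list means every
    file still matches the store.
    """
    problems = []
    for path, (uid, mode) in expected.items():
        try:
            actual_uid, actual_mode = snapshot(path)
        except FileNotFoundError:
            problems.append((path, "missing"))
            continue
        if actual_uid != uid:
            problems.append((path, f"owner {actual_uid}, expected {uid}"))
        if actual_mode != mode:
            problems.append((path, f"mode {actual_mode:o}, expected {mode:o}"))
    return problems


def demo():
    """Simulate a gateway loosening permissions on a shared file."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "sample.bam")  # hypothetical file
        open(path, "w").close()
        os.chmod(path, 0o640)
        store = {path: snapshot(path)}   # the "meta-data store"
        before = audit(store)            # nothing has changed yet
        os.chmod(path, 0o666)            # a gateway rewrites the mode
        after = audit(store)             # the audit now flags the file
        return before, after


before, after = demo()
print(before)
print(after)
```

Keeping the expected state outside the gateways means none of them has to be trusted to preserve ownership; the audit only needs read access to file metadata.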
  18. Cloud Computing
      • Another alternative: cloud computing/hybrid solutions (“cloud-bursting”)
        • Currently setting up cloud-based storage/staging of NGS data to circumvent networking issues on campus
        • Natural extension: allow users to analyze data “in the cloud”
        • Similar offerings are available from several companies - we’re working with Seven Bridges Genomics - nice billing interface for each individual use (storage/staging/analysis)
  19. Seven Bridges Genomics
  20. NGS clinical process
      [Flow diagram. Column labels: Patient/Family, UPMC, Pitt. Box labels: consented; Samples drawn; DNA/library prep; Sequencing; QC/filtering; Alignment; Variant Calling; Annotation; Analysis; Interpretation; Validation; Medical Report; EMR; Data Storage (BAM, VCF files); Data Storage (Raw, BAM, VCF)]
  21. The last analysis challenge
      • Even after fixing all of these issues, two major hurdles remain
        • Community
          • Organizing and coordinating all NGS efforts on campus would greatly speed up the pace of research
        • Education!
          • We need to educate clinicians and clinical-support staff (genetic counselors) to understand the limitations and the advantages of sequence data from the perspective of clinical utility
  22. Thanks!
