Informatics and Computing Infrastructure for Clinical High-Throughput Sequencing, Center for Computational Genetics, University of Pittsburgh, Michael Barmada Copenhagenomics 2012
  • Some quick facts about the University of Pittsburgh: we are very spread out, as is common in large academic/medical centers. UPMC in particular is one of the largest hospital systems in the US, serving a population of approximately 15 million people in what we call the tri-state area, with major emphasis in GI, transplant, cancer, and aging centers. Given the large amount of research funding brought in by University and UPMC researchers, you can imagine we are a very active research center.
  • This map shows the location of NGS machines at Pitt that have shown up in the last three years: a total of 11 machines of all types (454, Illumina, SOLiD, Ion Torrent). The lack of centralization of these resources has hurt efforts to get good (reproducible) data from these centers.

Presentation Transcript

  • Computing Infrastructures for Clinical NGS: Barriers to centralizing data analysis at the University of Pittsburgh. M. Michael Barmada, Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh
  • University of Pittsburgh
    • Geographically dispersed campus (132 acres in Pittsburgh, plus regional campuses)
    • Large affiliated hospital system (UPMC: 23 hospitals spread out over the tri-state area)
    • Has been ranked in the top cluster of research institutions in the US
      • 7th in the nation in terms of funding from NIH
  • NGS at Pitt
  • Common Analysis hurdles for NGS
    • Hardware
      • Computing capacity
      • Network
      • Storage
    • Software
  • Common Problems - (1) Hardware
    • When NGS machines started appearing on campus, there were 14 “high-performance” computing centers
      • Most were small group- or department-specific clusters (<100 cores) with limited storage and standard (GigE) networking
      • Larger computing resources were available at the Pittsburgh Supercomputing Center, but with limited availability
  • Center for Simulation and Modeling (SAM)
    • One large (>3000 cores) cluster existed on campus, established by computational chemistry and engineering groups
    • Large-capacity machines (>12 cores/48 GB RAM per node, many with 48 cores/128-256 GB RAM)
      • This cleared up the RAM and capacity problems
  • Common Problems - (2) Storage
    • Despite early successes with the SAM cluster, problems started appearing as the number of users went up
      • Storage array: the SAM cluster uses a shared NFS array for /home, so reading and writing large files (read/quality files) became a serious bottleneck
      • Upgraded the array to a high-performance system (Panasas), which allows parallel (DirectFS/pNFS) access and greater throughput than standard RAID
  • Common Problems - (3) Networking
    • Networking within the SAM cluster is a combination of InfiniBand and gigabit Ethernet: not a problem
    • Networking on campus was a problem
      • Old network segments (100 Mbit/s) and firewalls (multiple hops) kept maximum transfer speeds to only 10-15 Mbit/s
      • Upgrades “in progress”
  • Common Problems - (3) Networking
    • Solutions
      • “Sneaker-net”: works, but leads to a proliferation of drives and potential for data loss/corruption
      • Globus/GridFTP: faster than the campus network (transfer speeds of 1-2 Gbit/s)
      • Cloud-based services (Seven Bridges): surprisingly economical and efficient. Sequencing centers upload data for individual groups, who then use the data for analysis (online or at the local cluster) and for backup (desktops)
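
To put the transfer speeds quoted on the networking slides in perspective, here is a minimal back-of-the-envelope sketch. The 100 GB dataset size is an assumption chosen for illustration, not a figure from the talk, and the calculation ignores protocol overhead.

```python
def transfer_hours(size_gb: float, rate_mbit_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mbit_per_s megabits per second."""
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return size_megabits / rate_mbit_per_s / 3600


DATASET_GB = 100  # hypothetical sequencing run size (assumption, not from the slides)

for label, rate in [
    ("campus network, 10 Mbit/s", 10),
    ("campus network, 15 Mbit/s", 15),
    ("Globus/GridFTP, 1 Gbit/s", 1000),
    ("Globus/GridFTP, 2 Gbit/s", 2000),
]:
    print(f"{label}: {transfer_hours(DATASET_GB, rate):.1f} h")
```

Under these assumptions the same dataset needs roughly 15 to 22 hours over the old campus segments and well under a quarter of an hour at GridFTP speeds, which is why physically carrying drives was a workable (if risky) alternative.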
  • Common Problems - (4) Software
    • Pipelines were created for linking together common tools (BWA/NovoAlign/GATK/Annovar), but these require familiarity with the command line/Unix environment
    • With increasing use of NGS by medical/clinical research groups, we had more and more users who were not comfortable in a Unix environment
    • Solutions: train users or develop non-Unix-based interfaces
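
As an illustration of what "linking together common tools" looks like from the command line, here is a hedged sketch of a small pipeline wrapper. It is not the pipeline used at Pitt: the file names are hypothetical, samtools is an added assumption (it is not named on the slide), the command lines follow current BWA-MEM, samtools, and GATK 4 usage rather than the 2012 tool versions, and the NovoAlign and Annovar steps are left out.

```python
import shlex
import subprocess
from typing import List


def build_pipeline(ref: str, fq1: str, fq2: str, sample: str) -> List[List[str]]:
    """Return the commands for align -> sort -> index -> call variants.

    Assumes the reference is already indexed for BWA, samtools, and GATK.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    # GATK requires a read group; bwa expands the literal "\t" sequences itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        ["bwa", "mem", "-R", read_group, "-o", sam, ref, fq1, fq2],
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", vcf],
    ]


def run_pipeline(steps: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


# Hypothetical inputs; dry run only, so none of the tools need to be installed.
run_pipeline(build_pipeline("ref.fa", "s1_R1.fastq", "s1_R2.fastq", "s1"))
```

Even this short wrapper assumes the user can install the tools, manage paths, and read error output, which is the barrier for clinical users that the slide describes.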
  • Research Gateways
    • Several Bioinformatics/NGS gateways are in the process of being implemented
    • Each allows access to the computational resources of the SAM cluster using web-based or client-based interfaces
  • Galaxy
  • CLCbio Genomics Server
  • CLCbio Genomics Server (diagram labels: CLCbio Genomics Workbench, CLCbio Genomics Server)
  • Genboree
  • Issues with research gateways
    • Common data storage and data dedup
      • A current focus is configuring all NGS gateways to share the same storage space and files, so we do not need to duplicate data across multiple storage spaces
      • CLC bio Genomics Server “plays well with others”, as does Yabi, but Galaxy and Genboree do a lot of file permission modifications
      • Solution: create a metadata store that ensures files are owned by the appropriate users and have appropriate permissions, coupled with cron tasks to monitor user/permission changes
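
The metadata-store idea can be sketched as a small check that a cron task might run periodically. This is an illustrative sketch only: the talk does not describe the actual implementation, and the `Expected` record, the in-memory dictionary standing in for the store, and the function names are all hypothetical.

```python
import os
import pwd
import stat
from typing import Dict, List, NamedTuple


class Expected(NamedTuple):
    """Hypothetical metadata record: who should own a file and with which mode."""
    owner: str
    mode: int  # permission bits, e.g. 0o640


def owner_name(uid: int) -> str:
    """Resolve a uid to a user name, falling back to the numeric uid."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def find_drift(store: Dict[str, Expected]) -> List[str]:
    """Compare each file on disk with its expected owner and mode."""
    problems = []
    for path, want in store.items():
        try:
            st = os.stat(path)
        except FileNotFoundError:
            problems.append(f"{path}: missing")
            continue
        owner = owner_name(st.st_uid)
        mode = stat.S_IMODE(st.st_mode)  # keep only the permission bits
        if owner != want.owner:
            problems.append(f"{path}: owner {owner}, expected {want.owner}")
        if mode != want.mode:
            problems.append(f"{path}: mode {mode:o}, expected {want.mode:o}")
    return problems
```

A scheduled job could call `find_drift` on the shared storage after Galaxy or Genboree jobs finish and either report the differences or reset them with `os.chown` and `os.chmod`; whether to repair automatically or only alert is a policy choice the slide leaves open.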
  • Cloud Computing
    • Another alternative: cloud computing/hybrid solutions (“cloud-bursting”)
      • Currently setting up cloud-based storage/staging of NGS data to circumvent networking issues on campus
      • Natural extension to allow users to analyze data “in the cloud”
      • Similar offerings are available from several companies. We’re working with Seven Bridges Genomics, which has a nice billing interface for each individual use (storage/staging/analysis)
  • Seven Bridges Genomics
  • NGS clinical process (flow diagram spanning UPMC and Pitt; box labels: Patient/Family consented, Samples drawn, DNA/library prep, Sequencing, QC/filtering, Alignment, Variant Calling, Annotation, Analysis, Interpretation, Validation, Medical Report, EMR, Data Storage (BAM, VCF files), Data Storage (Raw, BAM, VCF))
  • The last analysis challenge
    • Even after fixing all of these issues, two major hurdles remain
      • Community
        • Organizing and coordinating all NGS efforts on campus would greatly speed up the pace of research
      • Education!
        • We need to educate clinicians and clinical-support staff (genetic counselors) to understand the limitations and the advantages of sequence data from the perspective of clinical utility
  • Thanks!