CCCB Germline Variant Analysis on Cloud Platform
1. CCCB Germline Variant Analysis on Cloud Platform
Center for Cancer Computational Biology (SM822)
Bioinformatics Team
Homepage: https://cccb.dfci.harvard.edu/
Twitter: @CCCBseq
2. Typical Problems with Data Analysis
Have sequencing data generated but...
○ don’t know where to securely store them long term
○ uploading to GenePattern or Galaxy for analysis is taking forever
○ my bioinformaticians cannot process it today
○ want to make additional differential expression contrasts
○ alignment is taking forever to run
○ my exome data is taking forever to run
○ don’t know how to work with variant data
○ my thousand exomes are crushing my bioinformaticians’ HPC server
○ I am the bioinformatician and I don’t have the time to do all these analyses!
CCCB Cloud Computing Systems can help!
3. Advantages of Using Cloud Systems
By integrating DFCI Google Virtual Private Cloud and Partners Dropbox Enterprise, the CCCB Cloud
Systems offer convenient, fast, and secure methods to transfer, analyze, and store large sequence data.
Convenient
○ Experimentalists can upload and analyze data on their own anytime
○ Simplified large data upload and download processes by connection to Dropbox
Fast
○ Germline variant analysis can typically be done within a day from either FASTQ or BAM files
○ Scalable infrastructure with virtually no computing resource limitation
○ Minimal wait time to get data analyzed
Secure
○ Google Cloud Platform (GCP) is covered by the Google-DFCI BAA to ensure HIPAA-compliant security
○ All data can be encrypted with SSL/TLS during transfer
○ Partners’ Dropbox Business can be used as a secure, long-term data archive
4. Important accounts and where to get them
DFCI G Suite Account (or just Google Account)
Google accounts linked to organization emails are preferred, although any
Google account can be used. Members of the DFCI community can request a DFCI
Google account (user@mail.dfci.harvard.edu) through the Research Computing
website: http://rc.dfci.harvard.edu/contact-research-computing
Partners Dropbox
Any Dropbox account will work with our systems. Partners Health provides virtually
unlimited encrypted storage on Dropbox Business, free of charge, to all Partners
community members (anyone with a partners.org email). Information is available here:
https://rc.partners.org/kb/collaboration/dropbox?article=2062
Agilent CrossLab (a.k.a iLab Solutions)
Like most cores and centers around DFCI, we use iLab to track all of our projects.
A free account can be requested at https://dfci.ilab.agilent.com/account/login
5. CCCB Data Analysis and Visualization Infrastructure
[Diagram. Components: Users; Local Drive; Dropbox (unlimited space via Partners); Analysis Portal (CCCB via DFCI GCP) with GATK Analysis and RNASeq Analysis; Variant Viewer; WebMeV. Connection labels: Upload, Download, Web Access, Direct data transfer, Under construction.]
6. Variant Analysis Pipeline by GATK
[Workflow diagram: Align reads -> Base recalibration -> Variant calling (several jobs in parallel) -> Merge VCFs]
Variant calling with GATK HaplotypeCaller
● SNPs and Indels
● Default parameters
● Parallelized between chromosomes
● Returns a VCF that can be annotated with different annotation systems
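The per-chromosome parallelization and final merge described above can be sketched as a set of command lines. This is only an illustration, not CCCB's actual pipeline: it assumes GATK4-style invocations (`gatk HaplotypeCaller` with `-R`, `-I`, `-L`, `-O`, and `gatk MergeVcfs` with `-I`, `-O`), and the chromosome list, file names, and function names are hypothetical. The commands are built but not executed.

```python
from typing import List

# Hypothetical chromosome list; the real pipeline's intervals are not stated in the slides.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def haplotypecaller_commands(reference: str, bam: str, sample: str) -> List[List[str]]:
    """Build one HaplotypeCaller command per chromosome (default parameters)."""
    commands = []
    for chrom in CHROMOSOMES:
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference,                    # reference FASTA
            "-I", bam,                          # aligned, recalibrated reads
            "-L", chrom,                        # restrict calling to one chromosome
            "-O", f"{sample}.{chrom}.vcf.gz",   # per-chromosome output
        ])
    return commands


def merge_command(sample: str) -> List[str]:
    """Build the command that merges the per-chromosome VCFs into one file."""
    cmd = ["gatk", "MergeVcfs"]
    for chrom in CHROMOSOMES:
        cmd += ["-I", f"{sample}.{chrom}.vcf.gz"]
    cmd += ["-O", f"{sample}.merged.vcf.gz"]
    return cmd
```

Each per-chromosome command is independent of the others, which is what allows them to run concurrently on separate cloud workers before the merge step.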
7. CCCB Cloud System- file uploads
- Upload methods:
- Dropbox*
- From local computer
- File chooser
- Drag/drop interface
* preferred. Fastest and most reliable.
- Currently supports upload of FASTQ-format and BAM files (see the file naming instructions)
- Email notification is sent when the transfer is complete.
8. CCCB Cloud System- sample annotation
Sample names are inferred from sequencing file names. Users can create new
samples or remove existing ones.
- Drag/drop files to the proper sample
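As a rough illustration of inferring sample names from file names, the sketch below groups files by a shared prefix. The naming convention used here (`<sample>_R1.fastq.gz`, `<sample>_R2.fastq.gz`, `<sample>.bam`) is an assumption made for the example; the portal's actual file naming instructions are not reproduced in these slides, and `group_by_sample` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# ASSUMED convention: optional _R1/_R2 read suffix, then a FASTQ or BAM extension.
_PATTERN = re.compile(
    r"^(?P<sample>.+?)(?:_R[12])?\.(?:fastq\.gz|fastq|fq\.gz|fq|bam)$"
)


def group_by_sample(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each inferred sample name to the files that belong to it."""
    samples: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        match = _PATTERN.match(name)
        if match is None:
            continue  # skip files that are not FASTQ or BAM
        samples[match.group("sample")].append(name)
    return dict(samples)


files = ["tumorA_R1.fastq.gz", "tumorA_R2.fastq.gz", "normal.bam", "notes.txt"]
print(group_by_sample(files))
```

Files that land under the wrong sample would then be corrected by hand, which corresponds to the drag/drop step in the interface.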
9. Downloading output files
Save output by direct download or Dropbox transfer
- Authenticated: only those logged in as your Google user can access files
10. Variant Visualization Platform
The DNARails VCF data visualization web app enables summarization and filtering
of individual VCF files
- https://variant-viz.tm4.org/
- Graphical interface
- Filtering of variants
- VEP annotated VCF
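To give a feel for what summarizing and filtering a VCF involves, here is a minimal standard-library sketch. It is not how DNARails works internally; the function name, the `min_qual` threshold, and the simplified classification (a record counts as a SNP only when REF and every ALT allele are single bases, and everything else is counted as an indel) are assumptions made for the example.

```python
from typing import Dict, Iterable


def summarize_vcf(lines: Iterable[str], min_qual: float = 0.0) -> Dict[str, int]:
    """Count SNP and indel records whose QUAL is at least min_qual."""
    counts = {"snp": 0, "indel": 0}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts, qual = fields[3], fields[4].split(","), fields[5]
        # Records with a missing QUAL (".") are kept rather than filtered out.
        if qual != "." and float(qual) < min_qual:
            continue
        if all(len(alt) == len(ref) == 1 for alt in alts):
            counts["snp"] += 1
        else:
            counts["indel"] += 1  # simplification: also catches MNPs and symbolic alleles
    return counts


records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr20\t100\t.\tA\tG\t50\tPASS\t.",
    "chr20\t200\t.\tAT\tA\t10\tPASS\t.",
    "chr20\t300\t.\tC\tCT\t99\tPASS\t.",
]
print(summarize_vcf(records, min_qual=30))
```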
11. Analysis of Large Exome Cohorts
● The web app for exome analysis pipelines is suitable for up to 20 samples
○ Same for DNARails visualization
● Larger data sets (from 20 to 1000+ samples) create new issues
○ Data transfer
○ Analyzing cross-sample
● Ensuring samples match the data
● Different software to analyze large data sets
● Larger data sets offer better means of variant filtration
● Custom project with us to provide a suitable analysis pipeline
12. Costs for Basic RNA-Seq and Exome Analysis
Example Costs for DFCI/BWH Members:
20 SR75bp samples for RNA-Seq (DGE): $145 + $15*20 = $445
20 PE75bp samples for Germline Variant Analysis: $145 + $50*20 = $1,145
- with Variant Annotation and Visualization: $1,145 + $20*20 = $1,545
Service                                  Unit          DFCI/BWH   External non-profit
Project Setup                            Per Project   $145       $189
RNA-Seq (DGE)                            Per Sample    $15        $18
Germline Variant Analysis                Per Exome     $50        $60
Variant Annotation and Visualization     Per Exome     $20        $24
WebMeV                                                 free       free
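The pricing rule in the table (one setup fee per project plus a per-sample rate for each selected service) can be written as a small calculator. The rates are taken from the table; the dictionary keys, tier names, and the `project_cost` function are hypothetical names chosen for this sketch.

```python
from typing import Iterable

# Rates in USD, copied from the price table.
PRICES = {
    "dfci_bwh": {"setup": 145, "rnaseq": 15, "germline": 50, "annotation": 20},
    "external": {"setup": 189, "rnaseq": 18, "germline": 60, "annotation": 24},
}


def project_cost(n_samples: int, services: Iterable[str], tier: str = "dfci_bwh") -> int:
    """Return setup fee plus n_samples times the summed per-sample rates."""
    rates = PRICES[tier]
    per_sample = sum(rates[service] for service in services)
    return rates["setup"] + n_samples * per_sample


print(project_cost(20, ["rnaseq"]))                  # 445
print(project_cost(20, ["germline"]))                # 1145
print(project_cost(20, ["germline", "annotation"]))  # 1545
```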
13. Request Project and Demo Accounts
Individuals can now request free demo accounts for
- RNA-Seq DGE pipeline on 6 single-read samples
- Variant Visualization Platform for hg19 chr20 from the 1000 Genomes Project
To request one, email cccb@jimmy.harvard.edu, providing a valid Google account,
with the subject line: [Demo] RNA-Seq DGE or [Demo] Variant Visualization