Ethan Willie summarizes his 8-month contribution to the Genome Sciences Centre, where he worked on several pipelines, including ABySS, Trans-ABySS, and Genome-Validator. He validated ChimeraScan, Trinity, Manta, and the hg38 annotations. Willie analyzed multiple projects, developed scripts to improve workflows, and gained skills in bioinformatics problem-solving, scripting, visualization, and presentation. He identifies troubleshooting and public speaking as areas for improvement, and hopes to further develop his genomics skills and apply his experience in future roles.
2. OVERVIEW
• Pipelines
• Projects
• Validation(s)
• ChimeraScan
• Trinity
• Manta
• Development
• Additional Work
• What I learned
• What I can improve
• Moving forward
• Acknowledgments
3. Pipelines
• ABySS: A de novo, parallel, paired-end sequence assembler for short reads.
• Trans-ABySS: Analyze assemblies for structural variants and splice variants
using a reference genome and annotations.
• Genome-Validator: Validate fusion and indel events from Trans-ABySS
against given BAM files and attempt to assign a ‘tumourigenicity’ of ‘somatic’
or ‘germline’ to each event when both a normal and a tumour genome are given.
• Delly: Discover and genotype structural variants using split-read and
paired-end evidence from massively parallel sequencing data.
• Microbial Detection Pipeline: Detect bacterial and/or viral sequences to
determine potential contamination or integration into the genome.
• Integration Site Pipeline: Detect putative sites where viral sequences
have integrated into human sequences.
• Probing Pipeline: Detect fusion and SNP mutations in genome and
transcriptome libraries.
• Compression and Transfer: Compress files and transfer them off scratch space
for archiving, reducing total space usage on scratch space.
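The somatic/germline assignment described for Genome-Validator can be illustrated with a minimal Python sketch. It assumes each event has already been checked against the tumour and normal BAM files, producing a count of supporting reads in each; the function name `classify_event` and the `min_support` threshold are hypothetical and are not Genome-Validator's actual logic or parameters.

```python
def classify_event(tumour_support: int, normal_support: int, min_support: int = 2) -> str:
    """Hypothetical sketch: label an event from its supporting read counts.

    tumour_support / normal_support: reads in the tumour and normal BAMs that
    support the event. min_support is an assumed evidence threshold.
    """
    if tumour_support < min_support:
        # Too little evidence in the tumour to call the event at all.
        return "unsupported"
    if normal_support >= min_support:
        # Well supported in both genomes: inherited, i.e. germline.
        return "germline"
    if normal_support == 0:
        # Seen only in the tumour: acquired in the tumour, i.e. somatic.
        return "somatic"
    # Some normal support, but below the threshold: leave undecided.
    return "ambiguous"


print(classify_event(10, 0))  # somatic
print(classify_event(8, 6))   # germline
```

Requiring zero normal reads before calling an event somatic is a deliberately conservative choice in this sketch, since a single normal read may indicate a germline event sequenced at low depth.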
6. ChimeraScan-0.4.5
A software package that detects gene fusions in paired-end RNA
sequencing (RNA-Seq) datasets. It differs from other fusion finders, such as
deFUSE, in that it adds a fragmentation step on top of the whole paired-end
approach, which deFUSE also uses.
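As a rough illustration of what a fragmentation step means, the Python sketch below splits one read into fixed-length segments that could each be aligned independently, so that a read spanning a fusion junction can have its segments land on two different genes. This is a generic sketch, not ChimeraScan's code; the `fragment_read` name and the 25 bp segment length are assumptions.

```python
from typing import List, Tuple


def fragment_read(seq: str, segment_length: int = 25) -> List[Tuple[int, str]]:
    """Hypothetical sketch: split a read into consecutive, non-overlapping segments.

    Returns (offset, segment) pairs. A trailing piece shorter than
    segment_length is dropped, since very short pieces align ambiguously.
    """
    return [
        (offset, seq[offset:offset + segment_length])
        for offset in range(0, len(seq) - segment_length + 1, segment_length)
    ]


# A 100 bp read yields four 25 bp segments at offsets 0, 25, 50 and 75.
for offset, segment in fragment_read("ACGT" * 25):
    print(offset, segment)
```

Keeping each segment's offset lets a later step reconstruct where in the read the alignment switches from one gene to another.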
Script(s):
• setup:
– /projects/trans_scratch/software/chimerascan/scripts/chimerascan_setup_final.sh
• checker:
– /projects/trans_scratch/software/chimerascan/scripts/chimerascan_checker.sh
• cleaner:
– /projects/trans_scratch/software/chimerascan/scripts/chimerascan_cleaner.sh
• binner:
– /projects/trans_scratch/software/chimerascan/scripts/binning_beta.py
• summarizer:
– /projects/trans_scratch/software/chimerascan/scripts/chimerascan.sum.sh
• report generator:
– /projects/trans_scratch/software/chimerascan/scripts/ChimeraScan.report.sh
7. Manta
Rapid detection of structural variants and indels for clinical sequencing
applications
Script(s):
• manta_sum.sh:
– /home/ewillie/tools/scripts/manta_sum.sh
• manta_delly_overlay.py:
– /home/ewillie/tools/scripts/manta_delly_overlay.py
• manta_gv2_overlay.py:
– /home/ewillie/tools/scripts/manta_gv2_overlay.py
• vcfToBedpe:
– /projects/trans_scratch/software/svtools-Manta2Bedpe/vcfToBedpe
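The overlay scripts listed above compare Manta calls against calls from other tools. Their exact matching rules are not described here, so the following Python sketch assumes a common approach: two structural variant calls match if they share an SV type and chromosome pair and both breakpoints fall within a fixed window. The `SVCall` record, the helper names, and the 500 bp window are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class SVCall(NamedTuple):
    """Hypothetical minimal record for one structural variant call."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    svtype: str


def calls_match(a: SVCall, b: SVCall, window: int = 500) -> bool:
    # Same type and chromosome pair, with both breakpoints within `window` bp.
    return (
        a.svtype == b.svtype
        and a.chrom1 == b.chrom1
        and a.chrom2 == b.chrom2
        and abs(a.pos1 - b.pos1) <= window
        and abs(a.pos2 - b.pos2) <= window
    )


def overlay(manta: List[SVCall], delly: List[SVCall], window: int = 500) -> List[Tuple[SVCall, SVCall]]:
    """Return every (manta, delly) pair of matching calls using a naive O(n*m) scan."""
    return [(m, d) for m in manta for d in delly if calls_match(m, d, window)]


manta_calls = [SVCall("chr1", 1000, "chr1", 5000, "DEL")]
delly_calls = [
    SVCall("chr1", 1200, "chr1", 4900, "DEL"),
    SVCall("chr2", 100, "chr2", 900, "DUP"),
]
print(overlay(manta_calls, delly_calls))  # one matching DEL pair
```

This sketch assumes both tools report breakpoints in the same order; real call sets may need normalizing first, for example by converting both to BEDPE.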
9. Additional Work
• Assemblies: Run ABySS to assemble sample(s) for downstream analysis.
• Analyses: Run various analysis tools on data and compare their results by means of
overlays and/or visualization.
• Overlays: Compare results between different tools or different settings to find
similarities and differences. The overlays are done using dedicated scripts, and Venn
diagrams are generated to illustrate the similarities and differences.
• Testing Scripts: New scripts such as integration_pipeline.sh were tested for potential
bugs and ease of use. Testing was done iteratively, with each iteration providing more
confidence in the script.
• ChimeraScan Wiki: Create a comprehensive wiki covering validation and a detailed
procedure for running the tool, along with the installation procedure, resource
requirements, guidance on interpreting the outputs, and debugging information.
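Once each tool's calls have been matched into shared event identifiers, the numbers shown in an overlay Venn diagram reduce to set arithmetic. The minimal Python sketch below assumes that matching has already been done; the `venn_regions` helper, the event IDs, and the tool labels are hypothetical. It counts the events in every exclusive region of the diagram, including empty ones.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def venn_regions(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count events in each exclusive Venn region.

    Each key is the group of tools that found an event (and no other tool);
    each value is the number of such events.
    """
    names = list(sets)
    regions: Dict[FrozenSet[str], int] = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # Events found by every tool in this combination...
            inside = set.intersection(*(sets[n] for n in combo))
            # ...minus events also found by any tool outside it.
            outside = set().union(*(sets[n] for n in names if n not in combo))
            regions[frozenset(combo)] = len(inside - outside)
    return regions


calls = {"Manta": {"e1", "e2", "e3"}, "Delly": {"e2", "e3", "e4"}, "GV": {"e3", "e5"}}
for tools, count in venn_regions(calls).items():
    print(sorted(tools), count)
```

Reporting empty regions as zero, rather than omitting them, keeps the output ready to label every section of a two- or three-set diagram.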
11. What I Learned
• Real world applications of bioinformatics.
• Problem solving including troubleshooting, debugging and querying the
literature.
• Bash scripting, including substantial knowledge of terminal
commands.
• Writing scripts to save time and make jobs more efficient (rather than doing a job
manually for more than 2 hours, write a script that does it in a fraction of that time).
• Greater attention to detail to help reduce my error rate.
• Time management, task prioritization and meeting deadlines.
• Visualizing and analyzing structural variants using IGV.
12. What I could work on
• Problem-solving and troubleshooting skills.
• A deeper understanding of the SVIA pipeline tools.
• Clear and concise presentation of my results.
• Minimizing my error rate when performing tasks.
• Verbal presentation skills.
• Developing an appetite for personal projects.
Any suggestions?
13. Moving Forward
• My interest in the algorithmic side of genomics has grown tremendously,
prompting me to take more applied algorithms courses.
• Obtaining a genomics certificate as part of my degree to further develop my
interest in the genomic sciences.
• Now that I am aware of the qualities and skills needed to succeed in this
rapidly changing industry, I will dedicate time to developing those qualities
and sharpening those skills.
• Improving my scripting abilities in both Python and bash to build on the
experience I have gained here over the last eight months.
• Applying the knowledge and skills I have acquired here in order to succeed
in a different work environment.