CNIT Final Presentation
     Chris Thompson
     April 18th, 2013
        CNIT 227
Table of Contents
• Introduction
• Materials
• Methods
• Results and Conclusion
INTRODUCTION
Bioinformatics
Bioinformatics – an interdisciplinary field that develops
and improves upon methods for storing, retrieving,
organizing, and analyzing biological data.

Bioinformatics is important because without the
technologies produced and developed through it, many
of the experiments and assays we do today would not
be possible.
CNIT
CNIT is the bioinformatics course at Purdue, focused on
annotating the genome of mycobacteriophages.

The overall goal is to annotate the genome of the
RiverMonster phage so that other researchers can use it
in the future.
Bacteriophages
• Viruses that infect and replicate in bacteria
• Among the most abundant biological entities in
  existence
• Many have a mosaic genome
• Wide range of potential applications
• Mycobacteriophages infect mycobacteria such as
  M. smegmatis
Clusters
• System to organize bacteriophages
• Phages sorted by factors such as genome
  length, presence of certain genes,
  organization of genome, GC content, and
  plaque size and characteristics
• Clusters A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P,
  Q, R, S, and T, plus Singletons
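
GC content, one of the sorting factors listed above, is simply the fraction of bases that are G or C. The following is a minimal sketch of that calculation; the function name and the example sequence are made up for illustration and are not RiverMonster data.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Hypothetical 10-base sequence used only to exercise the function.
example = "ATGCGCGCTA"
print(f"GC content: {gc_content(example):.0%}")
```

In practice the value would be computed over the whole genome sequence rather than a short fragment.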
RiverMonster
•   Discovered in 2010 in West Lafayette
•   Mycobacteriophage
•   Cluster E
•   144 genes in total
•   Many protein products are unknown
•   Overall geographic distribution is unknown

Through CNIT and bioinformatics we are trying to
answer some of the unknowns about RiverMonster.
MATERIALS
Bioinformatics Tools
•   DNA Master
•   Phamerator
•   Glimmer
•   GeneMark
•   NCBI and BLAST
•   Evernote
DNA Master
•   Designed and written by Dr. Jeffrey Lawrence
•   Annotation program
•   Can auto-annotate entire genomes
•   Uses information from Glimmer and GeneMark
•   Can locally BLAST genes
Phamerator
•   Developed in 2011
•   Linux-based bioinformatics program
•   Used for comparative phage genomics
•   Can visualize entire phage genomes
•   Groups related phage genes into “phams”
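
To give a feel for what grouping into "phams" means, here is a toy sketch that puts protein sequences in the same group whenever they are similar to any existing member (single-linkage grouping). This is not Phamerator's actual method: the position-by-position identity measure, the 0.7 threshold, the function names, and the sequences are all assumptions chosen for illustration.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Toy similarity: matching positions divided by the longer length (no alignment)."""
    if not a or not b:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))


def group_into_phams(proteins: dict, threshold: float = 0.7) -> list:
    """Single-linkage grouping: a protein joins a group if it is similar to any member."""
    parent = {name: name for name in proteins}

    def find(name):
        # Follow parent links to the group representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(proteins, 2):
        if identity(proteins[a], proteins[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in proteins:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical protein sequences, not taken from any real phage.
proteins = {"p1": "MKTAYIAKQR", "p2": "MKTAYIAKQK", "p3": "MSDNELLPWG"}
print(group_into_phams(proteins))
```

A real tool would compare sequences with a proper alignment program rather than the position-by-position count used here.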
Glimmer
• Stands for Gene Locator and Interpolated Markov
  ModelER
• Used for finding genes in microbial DNA
• Uses interpolated Markov models to distinguish
  between coding and non-coding DNA
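
Before any statistical model can score a candidate gene, the candidate open reading frames have to be located. The sketch below shows only that first step, scanning the forward strand for a start codon followed in frame by a stop codon. It is not Glimmer's algorithm, and the function name, minimum length, and test sequence are assumptions for illustration.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # start codons commonly used in bacteria and phages
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 3) -> list:
    """Return (start, end) pairs for forward-strand open reading frames.

    start is 0-based and inclusive; end is exclusive and includes the stop codon.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon seen in this frame since the last stop
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Hypothetical sequence containing one short reading frame.
print(find_orfs("CCATGAAATTTTAGCC"))
```

A gene finder would then score each candidate for coding potential and also repeat the scan on the reverse strand.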
GeneMark
• A family of gene prediction programs developed at
  the Georgia Institute of Technology
• Determines the protein-coding potential of a DNA
  sequence
• Uses Markov-model-based methods similar to those
  in Glimmer
NCBI and BLAST
• NCBI: National Center for Biotechnology Information
• BLAST: Basic Local Alignment Search Tool
• Compares nucleotide or protein sequences against a
  large database of known sequences
• Used to find similar gene sequences
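
The "local alignment" in BLAST's name refers to finding the best-matching region shared by two sequences. BLAST itself uses fast heuristics, so the sketch below instead shows the classic Smith-Waterman scoring recurrence that defines an exact local alignment score. The scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score between two sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences that share the region ACGT.
print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

Higher scores indicate a longer or cleaner shared region, which is the idea behind comparing BLAST scores between candidate gene calls.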
Evernote
• Started in 2008
• Designed for note-taking and archiving
• Used as an online lab notebook for CNIT
METHODS
Organization
•   Genome split into two sections
•   Genes 0 to 65 by Jon and Bill
•   Genes 66 to 144 by Chris and Nyema
•   Split again into four sections
•   0 to 23 by Jon
•   24 to 65 by Bill
•   100 to 123 by Chris
•   124 to 144 by Nyema
Process
•   Documented the auto-annotated gene call
•   Ran the Shine-Dalgarno test
•   BLASTed the gene and compared scores
•   Compared homologous genes in Phamerator
•   Made the final call
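
The Shine-Dalgarno step in the process above looks for a ribosome binding site just upstream of a candidate start codon. The sketch below is a simplified version of that idea: it counts matches to the consensus AGGAGG in a window before the start and reports the spacing. DNA Master uses its own scoring scheme, so the match count, the 20-base window, and the function name here are assumptions for illustration.

```python
SD_CONSENSUS = "AGGAGG"  # canonical Shine-Dalgarno consensus sequence


def best_sd_match(sequence: str, start: int, window: int = 20) -> tuple:
    """Find the best ungapped consensus match upstream of a start codon.

    Returns (matches, spacing), where spacing is the number of bases between
    the end of the matched site and the start codon. Returns (0, None) if the
    upstream region is too short or nothing matches.
    """
    seq = sequence.upper()
    upstream_begin = max(0, start - window)
    best = (0, None)
    for i in range(upstream_begin, start - len(SD_CONSENSUS) + 1):
        site = seq[i:i + len(SD_CONSENSUS)]
        matches = sum(1 for x, y in zip(site, SD_CONSENSUS) if x == y)
        spacing = start - (i + len(SD_CONSENSUS))
        if matches > best[0]:
            best = (matches, spacing)
    return best


# Hypothetical sequence: a perfect site, 7 bases of spacer, then an ATG start at index 17.
sequence = "TTTT" + "AGGAGG" + "TTTTTTT" + "ATGAAA"
print(best_sd_match(sequence, start=17))
```

Comparing this result across the alternative start codons of one gene is one way to choose between them.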
First Section
• Genes 66 to 144
• Calls split between even- and odd-numbered genes
• I called the even-numbered genes
• No especially difficult gene calls
• Gene 88 appears to belong to a family of kinases,
  many of them hypothetical
• Gene 92 belongs to a family of RNA ligases
• Gene 94 encodes the transcription factor WhiB
Second Section
•   Genes 101 to 123
•   Called every gene in this range
•   Gene 101 belongs to a protease family
•   Gene 112 matches polymerase genes
•   Genes 116 and 117 are reverse-strand genes
•   Gene 117 had many inconsistencies and was difficult to call
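
A reverse gene is read from the opposite strand, so its coding sequence is the reverse complement of the sequence stored in the genome file. A minimal sketch of that conversion follows; the function name and the example sequence are made up for illustration.

```python
# Translation table mapping each base to its complement, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


# Hypothetical fragment: on the forward strand it reads CTATTTCAT,
# but the gene on the reverse strand reads ATGAAATAG.
print(reverse_complement("CTATTTCAT"))
```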
RESULTS AND CONCLUSION
Accomplishments
•   Personally called 39 genes
•   Called 144 genes as a class
•   Analyzed protein products
•   Completed a final draft of the RiverMonster genome
Significance
• Genome can be used by future scientists
• Demonstrates the value of undergraduate research
• Learned about bioinformatics, bacteriophages,
  genomes, annotation, and biotechnology
Future Work
• Check and finalize all gene calls
• Compile the DNA Master file
• Send to HHMI SEA-PHAGES to be added to
  Phamerator
The End
