2. The term “Bioinformatics” was invented by Paulien
Hogeweg and Ben Hesper in 1970 as "the study of informatic
processes in biotic systems". Bioinformatics is the use of
information technology in biotechnology for the data storage,
data warehousing and analyzing the DNA sequences.
3. In Bioinformatics knowledge of many branches are required
like biology, mathematics, computer science, laws of physics
& chemistry, and of course sound knowledge of information
technology to analyze biotech data.
4. Given new DNA or protein sequence, biologist will want to
search databases of known sequences to look for anything
similar.
a. Sequence similarity can provide clues about function and
evolutionary relationships.
b. Databases such as GenBank are far too large to search
manually.
But to search them efficiently, we need an algorithm because
we shouldn't expect exact matches (so it's not really like
Google):
i. Genomes aren't static: mutations, insertions, deletions.
ii. Human (and machine) error in reading sequencing gels.
5. Bioinformatics intends to find new approaches to deal with the
volume and complexity of data.
Providing researchers with better access to analysis and
computing tools in order to advance understanding of our
genetic legacy and its role in health and disease.
6. There are many urgent open problems with lots of
opportunities to make fundamental contributions to the
society.
Develop our creativity and problem-solving skills to the limit.
Join a cross-disciplinary team and work with interesting
people.
Participate in unlocking the mysteries of life itself so as to
make the world a better place to live.
8. Analysis and interpretation of various kinds of data including;
nucleotides and amino acid sequences, protein domains, and
protein structures.
Development of new algorithms (mathematical formulas) and
statistics with which to assess relationships among members
of large data sets, such as methods to locate a gene within a
sequence, predict protein structure and/or function, and cluster
protein sequences into families of related sequences.
Gene therapy; mutations are easily detected and quantified
through next-generation sequencing technology in a
heterogeneous sample thus a cost effective precision medicine,
“right drug at right dose to the right patient at the right time”
can be administered.
9. Drug development; Bioinformatics explores the causes of
diseases at the molecular level, explains the phenomena of the
diseases on the gene/pathway level, makes use of computer
techniques such as data mining to analyze and interpret data
faster thus reducing the cost and time of drug discovery.
Preventative medicine; gene identification by sequence
inspection and prediction of splice sites allows mutations to be
corrected easily. This is much used in the analysis of mutations
that cause cancer.
Climate change Studies; Increasing levels of carbon dioxide
emission are thought to contribute to global climate change,
one way to decrease atmospheric carbon dioxide is to study
the genomes of microbes that use CO2 as their sole C source.
10. Analysis of gene expression; DNA micro arrays allow
simultaneous measurement of the level of transcription for
every gene in a genome (gene expression) by:
Tracking sample over a period of time to see gene expression
over time.
Tracking two different samples under same conditions to see
difference in gene expressions.
11. Computational evolutionary biology; Mouse provides insight
into human genetic disorder:
Waardenburg’s syndrome is characterized by pigmentary
dysphasia.
Disease gene linked to Chromosome 2, but not clear, where it
was located.
“Splotch” mice:
A breed of mice (with splotch gene) had similar symptoms.
Scientists succeeded in identifying location of gene in mice.
This gave clues as to where same gene is located in humans.
12. Data driven biology;
functional genomics
comparative genomics
systems biology
Molecular medicine;
identification of genetic components of various maladies
diagnosis/prognosis from sequence/expression
gene therapy
14. Waste cleanup
Microbial genome applications
Antibiotic resistance
Alternative energy sources
Crop improvement and development of resistant varieties
Forensic analysis
Bio-weapon creation
Insect resistance
Sequence analysis
Literature analysis
15. Genome Where Year
H. Influenza TIGR 1995
E. Coli K -12 Wisconsin 1997
S. cerevisiae (yeast) International Collaboration 1997
C. elegans (worm) Washington U./Sanger 1998
Drosophila M. (fruit fly) multiple groups 2000
E. Coli 0157:H7 (pathogen) Wisconsin 2000
H. Sapiens (human) International Collaboration/ Celera 2001
Mus musculus (mouse) International Collaboration 2002
Rattus norvegicus (rat) International Collaboration 2004
16. Over 300 publicly available databases pertaining to molecular
biology
GenBank
• Has over 61 million sequence entries
• With over 65 billion bases
UnitProtKB / Swis-Prot
• Has over 277 thousand protein sequence entries
• With over 100 million amino acids
Protein Data Bank
• 45,632 protein (and related) structures
17. ◦ Presence of repeats. Repeats are identical sequences that
occur in the genome in different locations and are often seen
in varying lengths and in the multiple copies. There are
several types of repeats: tandem repeats or interspersed
repeats. The read's originating from different copies of the
repeat appear identical to the assembler, causing errors in
the assembly.
◦ Contaminants in samples (e.g. from Bacteria or Human).
◦ PCR artifacts (e.g. Chimeras and Mutations)
◦ Sequencing errors, such as “Homopolymer” errors – when
e.g. 2+ run of same base.
◦ MID’s (multiplex indexes), primers/adapters still in the raw
reads.
◦ Polyploid genomes
18. Biological redundancy and multiplicity
Different sequences with similar structures
Organism with similar genes
Multiple functions of similar genes
Grouping of genes in pathways
Sequence redundancy in genomes
Significance of relationships and similarities
Signal vs. Noise
Lack of data
19. USA
NCBI (National Center for Biotechnology Information)
Europe
EBI (European Bioinformatics Institute, Hinxton, UK)
Japan
NIG (National Institute of Genetics)