BIOINFORMATICS
CHIRAG THAKKAR (MCA-37)
IIND SEM
CONTENTS
• Introduction
• History
• Need for bioinformatics
• Computational evolutionary biology
• Success
• Software and tools
INTRODUCTION
• Bioinformatics is the application of information technology to store, organize and analyse the vast amount of biological data.
• The stored data is available in the form of sequences and structures of proteins and nucleic acids (the information carriers).
• The biological information of nucleic acids is available as sequences, while the data of proteins is available as both sequences and structures.
• Sequences are represented in a single dimension, whereas structures contain the three-dimensional data of sequences.
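As a rough illustration of this difference, a sequence can be held as a one-dimensional string, while a structure attaches a 3D coordinate to each residue. The sketch below is in Python; the three-residue peptide and its coordinates are invented values for illustration, not experimental data.

```python
# Illustrative values only: the peptide and coordinates below are invented.
sequence = "MKT"  # one-dimensional: an ordered string of residue letters

# Three-dimensional: one (x, y, z) position per residue, in angstroms.
structure = [
    ("M", (0.0, 0.0, 0.0)),
    ("K", (3.8, 0.0, 0.0)),
    ("T", (3.8, 3.8, 0.0)),
]

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

# The sequence can be read off the structure, but not the other way round.
recovered = "".join(residue for residue, _ in structure)
end_to_end = distance(structure[0][1], structure[-1][1])
```

The structure carries strictly more information: spatial questions such as the end-to-end distance cannot be answered from the string alone.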
• Biologists collect molecular data: DNA and protein sequences, gene expression, etc.
• Computer scientists (plus mathematicians, statisticians, etc.) develop tools, software and algorithms to store and analyze the data.
• Bioinformaticians study biological questions by analyzing molecular data.
• Bioinformatics is the field of science in which biology, computer science and information technology merge into a single discipline.
HISTORY
• Over the years following 1981, the following events occurred:
• 579 human genes had been mapped.
• A method for automated DNA sequencing was invented.
• The Human Genome Organization (HUGO) was founded. This is an international organization of scientists involved in the Human Genome Project.
• The first complete genome sequence of a free-living organism was published, for the bacterium Haemophilus influenzae (1995).
• After 10 years:
• By 1991, a total of 1,879 human genes had been mapped.
• In 1993, Genethon, a human genome research center in France, produced a physical map of the human genome.
• After 3 more years:
• Genethon published the final version of the Human Genetic Map. This concluded the first phase of the Human Genome Project.
• Bioinformatics was fuelled by the need to create huge databases, such as GenBank, EMBL and the DNA Data Bank of Japan (DDBJ).
• These databases store and compare the DNA sequence data coming from the human genome and other genome sequencing projects.
• Today, bioinformatics encompasses protein structure analysis, gene and protein functional information, data from patients, pre-clinical and clinical trials, and the metabolic pathways of numerous species.
• The first bioinformatics databases were constructed a few years after the first protein sequences became available.
• A huge variety of data resources of different types and sizes is now available in the public domain through the Internet (www.ncbi.nlm.nih.gov).
• All of the original databases were organized in a very simple way, with data entries stored in flat files, as a single large text file.
• Later on, lookup indexes were added to allow convenient keyword searching of header information.
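The flat-file-plus-index idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the format of any real database: the entry layout (a ">" header line followed by sequence lines), the IDs, and the function names `build_index` and `fetch` are all hypothetical.

```python
import io

# Hypothetical flat file: each entry is a ">" header line plus sequence lines.
FLAT_FILE = """>ID001 human insulin precursor
MALWMRLLPL
>ID002 mouse hemoglobin alpha
MVLSGEDKSN
>ID003 human hemoglobin beta
MVHLTPEEKS
"""

def build_index(handle):
    """Map each lowercase header keyword to the file offsets of matching entries."""
    index = {}
    offset = handle.tell()
    line = handle.readline()
    while line:
        if line.startswith(">"):
            for word in line[1:].lower().split():
                index.setdefault(word, []).append(offset)
        offset = handle.tell()
        line = handle.readline()
    return index

def fetch(handle, offset):
    """Read one entry (header and joined sequence) starting at the given offset."""
    handle.seek(offset)
    header = handle.readline().rstrip("\n")
    sequence = []
    line = handle.readline()
    while line and not line.startswith(">"):
        sequence.append(line.strip())
        line = handle.readline()
    return header, "".join(sequence)

handle = io.StringIO(FLAT_FILE)  # stands in for a large text file on disk
index = build_index(handle)
hits = [fetch(handle, off) for off in index.get("hemoglobin", [])]
```

The index is built in one pass over the file; afterwards a keyword search jumps straight to the matching entries instead of rescanning the whole text.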
NEED FOR BIOINFORMATICS
• Bioinformatics draws on many areas of computer science, statistics, mathematics and engineering to process biological data.
• Complex machines are used to read in biological data at a much faster rate than before.
• Analyzing biological data may involve algorithms in artificial intelligence, soft computing, data mining, image processing and simulation.
• The algorithms in turn depend on theoretical foundations such as discrete mathematics, control theory, system theory, information theory and statistics.
• Commonly used software tools and technologies in the field include Java, C#, XML, Perl, C, C++, Python, R, SQL, CUDA, MATLAB and spreadsheet applications.
• the development of new algorithms (mathematical formulas) and
COMPUTATIONAL EVOLUTIONARY BIOLOGY
• Evolutionary biology is the study of the origin and descent of species, as well as their change over time.
• Informatics has assisted evolutionary biologists by enabling researchers to, for example, trace the evolution of organisms by measuring changes in their DNA.
SUCCESS
• 1) Analysis of gene expression
• 2) Analysis of regulation
• Clustering algorithms can be applied to expression data to determine which genes are co-expressed.
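One simple way to find co-expressed genes can be sketched as follows. This is a minimal Python sketch, assuming Pearson correlation as the similarity measure and single-linkage grouping at a fixed threshold; the gene names, expression values, threshold and the function `coexpression_clusters` are hypothetical, and real analyses typically use dedicated clustering methods.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_clusters(expression, threshold=0.9):
    """Group genes whose profiles correlate at or above the threshold."""
    parent = {gene: gene for gene in expression}

    def find(gene):
        # Follow parent links to the cluster representative (union-find).
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for g1, g2 in combinations(expression, 2):
        if pearson(expression[g1], expression[g2]) >= threshold:
            parent[find(g1)] = find(g2)  # merge the two clusters

    clusters = {}
    for gene in expression:
        clusters.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical expression levels measured under four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.2, 6.1, 3.9, 2.0],
}
clusters = coexpression_clusters(expression)
```

Genes whose levels rise and fall together across conditions end up in the same cluster, regardless of their absolute expression level.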
• 3) Analysis of protein expression
• Bioinformatics is very much involved in making sense of protein microarray and high-throughput (HT) mass spectrometry (MS) data.
• The latter involves the problem of matching large amounts of mass data against predicted masses from protein sequence databases.
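The mass-matching problem can be sketched in Python as follows. This is a simplified sketch, assuming unmodified peptides, monoisotopic residue masses and a fixed absolute tolerance; the peptide list, observed masses and the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free peptide termini

def peptide_mass(peptide):
    """Predicted monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def match_masses(observed, peptides, tolerance=0.01):
    """Map each observed mass to the peptides predicted within the tolerance."""
    predicted = {p: peptide_mass(p) for p in peptides}
    return {
        mass: [p for p, m in predicted.items() if abs(m - mass) <= tolerance]
        for mass in observed
    }

# Hypothetical database peptides and observed masses.
peptides = ["GAS", "SAG", "VK", "GG"]
observed = [233.10, 245.17, 300.00]
matches = match_masses(observed, peptides)
```

Peptides with the same composition, such as GAS and SAG, have the same mass, so a single mass can match several candidates; this ambiguity is part of what makes the matching problem hard at scale.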
• 4) Analysis of mutations in cancer
• Bioinformaticians continue to produce specialized automated systems to manage the sheer volume of sequence data produced, and they create new algorithms and software to compare the sequencing results to the growing collection of human genome sequences and germline polymorphisms.
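The comparison step can be illustrated with a deliberately small Python sketch. It assumes the sample has already been aligned to the reference with no insertions or deletions, and that known germline polymorphisms are given as a set of positions; the sequences, positions and the function name `candidate_somatic_variants` are hypothetical.

```python
def candidate_somatic_variants(reference, sample, germline_positions):
    """List (position, ref_base, sample_base) for differences at non-germline sites."""
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt and pos not in germline_positions
    ]

# Hypothetical data: positions are 0-based.
reference = "ACGTACGTAC"
tumour = "ACGAACGTTC"
germline = {3}  # position 3 is a known inherited polymorphism

variants = candidate_somatic_variants(reference, tumour, germline)
```

Differences that coincide with known germline polymorphisms are filtered out, leaving only the changes that may be specific to the tumour.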
• 5) Comparative genomics
• 6) High-throughput image analysis
• Computational technologies are used to accelerate or fully automate the processing, quantification and analysis of large amounts of high-information-content biomedical imagery.
• Benefits include improved accuracy, objectivity and speed.
SOFTWARE AND TOOLS
• Open-source bioinformatics software
• Many free and open-source software tools exist, and their number continues to grow.
• The range of open-source software packages includes titles such as Bioconductor, BioPerl, Biopython, BioJava, BioRuby, Bioclipse, EMBOSS, .NET Bio, Taverna workbench and UGENE.
• In order to maintain this tradition and create further opportunities, the non-profit Open Bioinformatics Foundation has supported the annual Bioinformatics Open Source Conference (BOSC).
• Web services in bioinformatics
• The main advantage is that end users do not have to deal with software and database maintenance overheads.
• Bioinformatics workflow management systems
• A bioinformatics workflow management system is a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, in a bioinformatics application.
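The core idea of composing and executing a series of steps can be sketched in Python. This is only a toy illustration, not how any particular workflow system works: the step names and the `run_workflow` function are hypothetical, and real systems add scheduling, provenance tracking and error handling.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def run_workflow(steps, data):
    """Run each named step in order, feeding each output into the next step."""
    log = []
    for name, step in steps:
        data = step(data)
        log.append((name, data))  # keep intermediate results for inspection
    return data, log

# A workflow is an ordered list of (name, function) pairs.
steps = [
    ("clean", lambda s: s.strip().upper()),
    ("reverse_complement", reverse_complement),
    ("gc_content", gc_content),
]
result, log = run_workflow(steps, " atgcgc\n")
```

Because the workflow is plain data, steps can be reordered, swapped or reused without changing the code that executes them.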
• Rosalind
• Rosalind is an educational resource and web
project for learning bioinformatics
through problem solving and computer
programming.
REFERENCES
• bioinfo.mbb.yale.edu
• www.ncbi.nlm.nih.gov
• bioinformaticsweb.net
• www.oxfordjournals.org
• www.umass.edu