Introductory Bioinformatics
Syllabus
• Unit 1
Introduction to Bioinformatics: Definition - Computational
Biology; Biological Data Acquisition: The form of biological
information; Retrieval methods for DNA sequence, protein
sequence, and protein structure information; Databases –
Format and Annotation: Conventions for database indexing
and specification of search terms, Common sequence file
formats. Annotated sequence databases - primary sequence
databases, protein sequence, and structure databases;
Organism-specific databases; Data – Access
• Unit 2
Biocomputing: Introduction to String Matching Algorithms.
Database Search Techniques - Local versus global. Sequence
Comparison and Alignment Techniques - Pairwise and
Multiple sequence alignment. Use of Scoring Matrices.
Dynamic programming algorithms: Needleman-Wunsch and
Smith-Waterman. Heuristic Methods of sequence alignment:
BLAST and PSI-BLAST. Multiple Sequence Alignment and
software tools for pairwise and multiple sequence alignment.
Phylogenetic analysis - PHYLIP.
• Unit 3
Profiles, motifs, and features identification using tools like
PROSITE. Automated Gene Prediction - ORF finding;
Visualization tool - PyMOL. Introduction to Signaling
Pathways. Machine Learning Methods in Bioinformatics -
Introduction to MATLAB.
Books to Refer
• Bioinformatics: Concepts, Skills & Applications – Rastogi
et al.
• Essential Bioinformatics – Xiong
• Developing Bioinformatics Computer Skills – Gibas &
Jambeck
• An Introduction to Bioinformatics Algorithms – Jones and
Pevzner
• Introduction to Bioinformatics – Krawetz & Womble
• Introduction to Bioinformatics – V. Kothekar
What is Bioinformatics?
[Figure: bioinformatics at the overlap of statistics, biology, and computer science]
Bioinformatics is an interdisciplinary scientific field that develops and uses
computational tools to collect, store, analyze, and interpret large amounts of
biological data, such as DNA, RNA, and protein sequences, to understand living
systems and disease.
It draws on computer science, mathematics, physics, and biology.
Computer Science
Bioinformatics applies techniques from machine learning, data mining, AI,
optimization, visualization, and simulation, and develops new techniques
as required.
Computational Biology
• Bioinformatics is limited to the sequence, structural, and functional
analysis of genes and genomes, together with the associated statistics.
• Computational Biology encompasses all biological areas that involve model
building, simulations, and theoretical methods.
• E.g., mathematical modelling of population dynamics, ADME (absorption,
distribution, metabolism, and excretion), and organ function.
Biology
• Bioinformatics involves the application of computational and statistical
techniques to the analysis and interpretation of biological data.
• Various types of biological data are used in bioinformatics, providing
insights into the structure, function, and relationships of biological entities.
• Genomic Data
• Transcriptomic Data
• Proteomic Data
• Metabolomic Data
• Structural Data
• Functional Genomics Data
• Phylogenetic Data
• Biological Literature and Annotations
Omics in Bioinformatics
• Bioinformatics plays a crucial role in processing, analyzing, and interpreting
the massive data sets generated by high-throughput techniques.
• Genomics: study of the entire genetic material of an organism (the genome)
• Data: DNA sequences, genome assemblies, gene annotations, and SNPs
• Transcriptomics: study of all RNA molecules (gene expression and regulation)
• Data: RNA sequencing (RNA-Seq) data, microarray data, and information on alternative
splicing and isoform expression
• Proteomics: study of the entire set of proteins
• Data: mass spectrometry data, protein-protein interaction networks, and protein structural
information
• Metabolomics: study of metabolites in biological systems
• Data: mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy data on
metabolite concentrations and profiles
• Epigenomics: study of heritable changes in gene function that do not involve alterations
to the underlying DNA sequence
• Data: DNA methylation, histone modifications, and chromatin structure
• Pharmacogenomics: study of how genetic variation affects individual responses to drugs,
with the aim of personalizing medicine
• Data: genetic variations relevant to drug metabolism, efficacy, and adverse reactions
• Metagenomics: study of the structure and function of all the nucleotide sequences isolated
from the organisms (typically microbes) in a bulk sample
• Data: DNA sequences from mixed microbial populations, functional gene annotations, and
taxonomic information
• Immunomics: study of the immune system
• Data: immune system components, antibody-antigen interactions, vaccine generation, and
immune cell signaling
• Interactomics: study of the interactions between biomolecules, such as protein-protein
interactions, to understand cellular functions and signaling pathways
• Data: protein-protein interaction networks, signaling pathway data, and information on
molecular interactions
Data Acquisition
• Data acquisition is the systematic collection of biological data from various
sources for analysis, interpretation, and further investigation.
• Stages of data acquisition:
• Data Generation: Obtaining raw biological data through experimental methods
(sequencing, microarrays, X-ray diffraction, mass spectrometry, etc.).
• Data Retrieval: Collecting existing biological data from public repositories (genomic,
proteomic, and expression data).
• Data Integration: Combining data from multiple sources to create a unified dataset
(composite databases).
• Data Cleaning and Preprocessing: Preparing data for analysis by handling missing
values and outliers and by normalizing measurements.
• Data Annotation: Categorizing, describing, or labeling data.
• Metadata Collection: Gathering additional contextual information (e.g., patient metadata).
• Data Storage and Sharing: Storing acquired data in a structured and accessible
manner.
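The cleaning and preprocessing stage listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed pipeline: the function name `clean_expression`, the use of `None` to mark missing values, the median-absolute-deviation outlier rule with its `cutoff`, and min-max scaling are all choices made for this example.

```python
import statistics


def clean_expression(values, cutoff=3.0):
    """Drop missing values, filter outliers, then min-max normalize.

    Hypothetical helper for illustration; `None` marks a missing value.
    """
    # Step 1: remove missing measurements.
    present = [v for v in values if v is not None]
    if not present:
        return []

    # Step 2: flag outliers by their distance from the median, measured
    # in units of the median absolute deviation (MAD).
    med = statistics.median(present)
    mad = statistics.median([abs(v - med) for v in present])
    if mad > 0:
        kept = [v for v in present if abs(v - med) / mad <= cutoff]
    else:
        kept = present  # no spread to judge outliers against

    # Step 3: rescale the remaining values to the range [0, 1].
    low, high = min(kept), max(kept)
    if high == low:
        return [0.0 for _ in kept]
    return [(v - low) / (high - low) for v in kept]


raw = [2.0, None, 4.0, 3.0, 100.0, 5.0]  # made-up example readings
print(clean_expression(raw))
```

In this made-up input the missing reading and the extreme value 100.0 are dropped before scaling. Real pipelines often impute missing values rather than discard them, and the right normalization depends on the data type.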
Types of DNA sequences and gene data
• Genomic DNA: the entire genome of an organism
• cDNA: DNA synthesized from a mature mRNA using reverse transcriptase (used for
making copies, PCR, and functional genomics)
• Recombinant DNA: artificially created DNA (cloning, GMOs, and
transgenic animals)
• ESTs (Expressed Sequence Tags): short sub-sequences of transcribed (cDNA) sequence
• GSSs (Genome Survey Sequences): short sub-sequences of genomic DNA
origin (dbGSS)
• SNPs (single-nucleotide polymorphisms)
• Gene-gene associations
• Gene-disease associations
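To make the cDNA entry in this list concrete, the sketch below mimics on paper what reverse transcriptase does: it builds the DNA strand complementary to an mRNA template. The function name `first_strand_cdna` and the example sequence are hypothetical, and the code assumes the input contains only the bases A, U, G, and C.

```python
# Base-pairing rules for copying an RNA template into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (written 5'->3') for an mRNA sequence."""
    mrna = mrna.upper()
    # Reading the template backwards gives the new strand in 5'->3' order.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


print(first_strand_cdna("AUGGCC"))  # prints GGCCAT
```

The result is the reverse complement of the mRNA with T in place of U. The second cDNA strand would carry the same sequence as the mRNA, again with T instead of U.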
