Established as per the Section 2(f) of the UGC Act, 1956
Approved by AICTE, COA and BCI, New Delhi
BIOINFORMATICS
School of Applied Sciences
Program: B.Sc BMCs
Course Title: Bioinformatics and Biostatistics
Course Code: B19MC3030
Course Type: Hardcore
Course Presenter: Prashantha C N
Course Mentor: Prashantha C N
Semester & Section: 3rd semester
Academic Year: 2020-21
Course Pre-requisites: Students should have basic knowledge of biology, statistics
and computer sciences. Students should be familiar with
the basic concepts of DNA, RNA and Protein.
L T P: L=4 T=0 P=4
Pedagogy:
Course objectives
1. The basic objective is to give students an introduction to the scope and objectives of bioinformatics.
2. Understanding of different types of biological databases and tools used to understand data structures.
Course outcome
1. Knowledge and awareness of the basic principles and concepts of biology and biological databases, literature
search methods.
Pedagogy chart
Course code POs/ COs PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PSO1 PSO2
B20MC3030 CO1 3 2 2 1 1 1 3 1
CO2 2 3 3 3 2 1 2 2 3 2
CO3 1 3 3 3 2 1 2 2 3 2
CO4 1 3 2 1 1 1 2 2 3 1
Syllabus
Unit | Topics | Course Outcomes | Program Outcomes
I | Introduction to Bioinformatics, Goal, Scope, Applications, Limitations | 1 | 1
I | Biological Databases: Types of Databases | 1 | 2
I | Literature databases: Open access and open sources, PubMed, PLoS, Biomed Central | 1 | 2
I | Information Retrieval from Biological Databases; Sequence Formats, Sequence databases, structural databases | 2 | 2
I | Genome Databases: Viral genome database (ICTVdb, VirGen), Bacterial Genomes database (Genomes OnLine Database –GOLD, Microbial Genome Database-MBGD), Organism specific Genome database (OMIM / OMIA, SGD, Worm Base, PlasmoDB, FlyBase, TAIR), and ligand databases | 3 | 3
Program Outcomes
PO-1: Science knowledge: Apply the knowledge of Biology, computer science and mathematics, to develop algorithms and analyze
biological programs related to drug discovery, and genomic analysis to solve biological problems.
PO-2: Problem analysis: Identify, formulate and analyze problems related to the various domains of Biological sciences such as
Molecular Biology, Biotechnology, Genetics, Healthcare, agricultural, animal sciences and Microbial biotechnology.
PO-3: Conduct investigations of complex problems: Use research-based knowledge and research methods including design of
experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
Reference Books
1. Andreas D. Baxevanis, B.F. Francis Ouellette. Bioinformatics: A Practical Guide to the Analysis of Genes
and Proteins, 2nd Edition, 2009.
2. David W. Mount., Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory
Press, New York. 2004.
3. Andrew R. Leach., Molecular Modeling Principles and Applications (2nd Ed.), Prentice Hall, USA. 2001.
4. G. E. Schulz., Principles of Protein Structure, Springer 2009.
e-Resources
1. https://youtu.be/w-uk-_TOgR0
2. https://www.youtube.com/watch?v=IrHDOEDtwD4
3. https://www.youtube.com/watch?v=48Xr-H05raA
Suggested Readings Unit wise
1. Jin Xiong., Essential Bioinformatics, Cambridge university press (2006).
Presentation Topics
Fundamentals of Bioinformatics: Introduction to Bioinformatics, Goal, Scope, Applications, Limitations,
Biological Databases: Types of Databases, Biological databases; Literature databases: Open access and open
sources, PubMed, PLoS, Biomed Central, Information Retrieval from Biological Databases; Sequence Formats,
Sequence databases, structural databases, Genome Databases: Viral genome database (ICTVdb, VirGen), Bacterial
Genomes database (Genomes OnLine Database –GOLD, Microbial Genome Database-MBGD), Organism specific
Genome database (OMIM / OMIA, SGD, Worm Base, PlasmoDB, FlyBase, TAIR), and ligand databases.
Assignments
1. Applications of Bioinformatics
2. Searching literature through PubMed
3. Genome File formats
4. Sequence File formats
5. Understanding of Viral Genome databases
6. Understanding of Bacterial Genome databases
7. Ligand Databases and their applications.
Unit-1:
Fundamentals of Bioinformatics
Bioinformatics
Biology + Mathematical + Statistical + Computational tools = Bioinformatics
Biology
Natural science that studies life and living organisms, including their physical, chemical, molecular and
physiological features, development and evolution.
Genetics, Molecular Biology, Biotechnology, Microbiology, Biochemistry, etc
Mathematical + Statistical
Mathematical formulations that underlie algorithms, for example for the study of DNA and protein interactions.
Statistical methods to analyze biological data and interpret the results.
Arithmetic calculations, normalization, standard deviation, probability,
correlation and regression, ANOVA and variance.
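As a rough illustration of the statistical measures listed above, the following Python sketch computes a mean, a sample standard deviation, a z-score normalization and a Pearson correlation. The expression values and the function names `pearson` and `zscores` are made up for this example and are not part of the course material.

```python
import math
import statistics

# Hypothetical expression values for one gene set under two conditions.
control = [2.0, 4.0, 6.0, 8.0]
treated = [1.0, 3.0, 5.0, 9.0]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def zscores(values):
    """Normalize values to zero mean and unit sample standard deviation."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

mean_control = statistics.mean(control)   # arithmetic mean
sd_control = statistics.stdev(control)    # sample standard deviation
normalized = zscores(control)             # normalization
r = pearson(control, treated)             # correlation
```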
Computational Tools
Biological data collection, storage, retrieval, analysis and interpretation
Bioinformatics is a field of science in which biology, computer science and information technology
merge into a single discipline to analyze biological information using computers and statistical
techniques.
Biological experiments can generate large amounts of data (macromolecular sequences, structures,
expression profiles, pathways etc).
Bioinformatics is about acquiring, managing, analyzing and understanding those data.
Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding
biological data.
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information
engineering, mathematics and statistics to analyze and interpret biological data.
Bioinformatics now entails the creation and advancement of databases, algorithms, computational and
statistical techniques, and theory to solve formal and practical problems arising from the management and
analysis of biological data.
Definitions of Bioinformatics
History of Bioinformatics
The first major bioinformatics project was undertaken by Margaret Dayhoff in 1965,
who developed the first protein sequence database, called the Atlas of Protein Sequence and
Structure.
She was an American physical chemist and a professor at Georgetown University Medical
Center.
Margaret Dayhoff,
1965
She originated one of the first substitution matrices, point accepted mutations (PAM).
The one-letter code used for amino acids was developed by her, reflecting an attempt to reduce the
size of the data files used to describe amino acid sequences in an era of punch-card computing.
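To show how much space the one-letter code saves, here is a small Python sketch that converts a peptide written in three-letter codes to one-letter codes. The mapping covers the 20 standard amino acids. The function name `to_one_letter` and the example peptide are illustrative assumptions.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(three_letter_seq):
    """Convert a hyphen-separated three-letter sequence to one-letter code."""
    residues = three_letter_seq.split("-")
    return "".join(THREE_TO_ONE[r] for r in residues)

peptide = "Met-Ala-Gly-Trp-Lys"      # hypothetical peptide
short = to_one_letter(peptide)       # one character per residue
```

The one-letter form needs one character per residue, compared with three or more in the hyphenated form.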
Subsequently, in the early 1970s, the Brookhaven National Laboratory established the Protein Data
Bank for archiving three-dimensional protein structures.
The first sequence alignment algorithm was developed by Needleman and Wunsch in 1970.
The first protein structure prediction algorithm was developed by Chou and Fasman in 1974.
GenBank, a public nucleotide sequence repository, was established in 1982.
FASTA, a fast database search program, was developed by William Pearson and David Lipman.
BLAST was developed by Stephen Altschul and coworkers in 1990.
Bioinformatics Vs Computational Biology
Bioinformatics differs from a related field known as computational biology.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and
their corresponding products and is often considered computational molecular biology.
However, computational biology encompasses all biological areas that involve computation.
For example, mathematical modeling of ecosystems, population dynamics, application of game
theory in behavioral studies, and phylogenetic construction using fossil records all employ
computational tools, but do not necessarily involve biological macromolecules.
GOAL OF BIOINFORMATICS
Bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation
of various types of data.
Development and implementation of computer programs that enable efficient access to, use and
management of, various types of information
Development of new algorithms (mathematical formulas) and statistical measures that assess
relationships among members of large data sets. For example, there are methods to locate a gene within
a sequence, to predict protein structure and/or function, and to cluster protein sequences into families of
related sequences.
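One of the methods mentioned above, locating a gene within a sequence, can be sketched in a very simplified form as an open reading frame (ORF) search. The Python sketch below scans only the forward strand, treats ATG as the start codon and stops at the first in-frame stop codon. Real gene finders use far more evidence, and the function name `find_orfs` and the `min_codons` parameter are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF here starts at ATG and runs to the first in-frame stop codon.
    The end index is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon appears.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs

example = "CCATGAAATTTGGGTAACC"   # hypothetical sequence
orfs = find_orfs(example)
```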
The primary goal of bioinformatics is to increase the understanding of biological processes.
• Pattern recognition
• Data mining
• Machine learning algorithms and visualization
Genomics Services
• Next Generation sequencing Data Analysis
• Microarray Data Analysis
Proteomics Services
• Proteomic structure prediction
• Protein models
• Protein-protein interactions
• Peptidomics service
Drug Discovery Services
• Structure based drug discovery
• Ligand Based drug discovery
• Pharmacophore & Pharmacokinetics
• Molecular Docking & Virtual Screening
• Phytochemical Extraction & Identification
• Chemical Synthesis and Characterization
Biostatistics & Bio-IT Services
• Clinical Data Analysis
• Algorithm Development
• Bio-Tools Development
• Biological Database Development
Bioinformatics
• Sequence similarity search
• Phylogenetic Analysis
• Primer Design
• Functional Annotation & Enrichment
• Gene Ontology & Pathway prediction
• Gene & Protein Prediction
Scope of Bioinformatics
(Diagram: Bioinformatics, Genomics, Proteomics, Drug Discovery, Biostatistics & Bio-IT)
Applications of Bioinformatics
Microbial genome applications
Molecular medicine
Personalized medicine
Preventative medicine
Gene therapy
Drug development
Antibiotic resistance
Evolutionary studies
Waste cleanup
Biotechnology
Climate change Studies
Alternative energy sources
Crop improvement
Forensic analysis
Bio-weapon creation
Insect resistance
Improve nutritional quality
Development of Drought resistant varieties
Veterinary Science
Biomedical Informatics
Medical Ontologies & standards
Clinical data management
Machine Learning
Computing
High performance computing
Software optimization & parallelization
Cloud Computing
Software engineering
Machine Learning
Data Management
Limitations
Computer-based programmes have helped in better understanding of various processes of life science.
However, bioinformatics has some limitations, which are listed below:
1) Bioinformatics requires a sophisticated molecular biology laboratory for in-depth study of
biomolecules. Establishing such laboratories requires a lot of funds.
2) Computer-based study of life science requires some training in the various computer programmes
used to study different processes of life science. Thus special training is required for
handling computer-based biological data.
3) There should be an uninterrupted electricity (power) supply for computer-aided biological investigations.
Interruption of power may sometimes lead to the loss of large amounts of data from computer memory.
4) Computers should be checked regularly for viruses, because viruses may cause problems such
as deletion of data and corruption of programmes.
5) The maintenance and upkeep of molecular laboratories involves a lot of expenditure, which sometimes
becomes a limiting factor for computer-based molecular studies.
Biological Databases
Databases are convenient systems to properly store, search and retrieve any type of data.
A database makes it easy to handle and share large amounts of data and supports large-scale analysis through
easy access and data updating.
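The store, search, retrieve and update operations described above can be illustrated with a tiny in-memory database. This Python sketch uses the standard `sqlite3` module. The table layout, accession numbers and sequences are invented for the example and do not correspond to any real biological database.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    "accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)

# Store: hypothetical records (accession, organism, sequence).
records = [
    ("DEMO001", "Homo sapiens", "ATGGCC"),
    ("DEMO002", "Escherichia coli", "ATGAAA"),
    ("DEMO003", "Homo sapiens", "ATGTTT"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)

# Search and retrieve: all accessions for one organism.
human = [
    row[0]
    for row in conn.execute(
        "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
        ("Homo sapiens",),
    )
]

# Update: correct the sequence of one record.
conn.execute(
    "UPDATE sequences SET sequence = ? WHERE accession = ?",
    ("ATGCCC", "DEMO001"),
)
updated = conn.execute(
    "SELECT sequence FROM sequences WHERE accession = ?", ("DEMO001",)
).fetchone()[0]
```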
Biological databases are libraries of life sciences information, collected from scientific experiments,
published literature, high-throughput experiment technology and computational analysis.
They contain information from genomics, proteomics and microarray gene expression studies.
Information contained in biological databases includes gene function, structure and localization (both
cellular and chromosomal), as well as biological sequences and structures.
Based on biological content such as DNA, RNA, protein and function, databases can be roughly
divided into three categories:
Primary Databases
Secondary Databases
Tertiary Databases
Primary Databases
These are the primary sources of data, used to store nucleic acid and protein sequences and structural
information of biological macromolecules.
Some primary databases:
• EMBL, GenBank, DDBJ (DNA Data Bank of Japan)
• SWISS-PROT (Swiss-Prot)
• PIR (Protein Information Resource)
• PDB (Protein Data Bank)
(The sequence collections in these databases are the result of basic research by academic, industrial
and sequencing laboratories.)
Secondary Databases
Secondary databases comprise data derived from the results of analyzing primary data.
Secondary databases often draw upon information from numerous sources, including other databases
(primary and secondary), controlled vocabularies and the scientific literature.
They are highly curated, often using a complex combination of computational algorithms and manual
analysis and interpretation to derive new knowledge from the public record of science.
Example: Uniprot, SwissProt/TrEMBL, PIR, CDD
InterPro (protein families, motifs and domains)
UniProt Knowledgebase (sequence and functional information on proteins)
Ensembl (variation, function, regulation and more layered onto whole genome sequences)
UniProt, InterproScan, Prosite, Pfam, PRINTS, BLOCKS
SCOP and CATH
SCOP: a Structural Classification of Proteins
Family: 1). All proteins that have residue identities of 30% and greater; 2). proteins with lower sequence
identities but whose functions and structures are very similar; for example, globins with sequence
identities of 15%.
Super family: Families whose proteins have low sequence identities but whose structures and, in many
cases, functional features suggest that a common evolutionary origin is probable, are placed together in
super families;
Common fold: Super families and families are defined as having a common fold if their proteins have
the same major secondary structures in the same arrangement and with the same topological connections.
Class: The different folds have been grouped into classes. Most of the folds are assigned to one of the
five structural classes:
1) all-α, those whose structure is essentially formed by α-helices;
2) all-β, those whose structure is essentially formed by β-sheets;
3) α/β, those with α-helices and β-strands;
4) α+β, those in which α-helices and β-strands are largely segregated;
5) multi-domain, those with domains of different fold and for which no homologues are known at
present.
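The SCOP family definition above relies on residue identity between proteins (30% and greater). As a hedged illustration, the Python sketch below computes percent identity from two sequences that are assumed to be already aligned, with gaps written as "-". Several conventions exist for the denominator. This version divides by the number of ungapped columns, which is an assumption and not necessarily what SCOP used. The name `percent_identity` is illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments: 4 identical residues in 5 ungapped columns.
identity = percent_identity("MKV-LA", "MRVQLA")
same_family_by_identity = identity >= 30.0   # first SCOP family rule above
```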
CATH: Class, Architecture, Topology and Homology
Class: Three major classes are recognized, based mainly on secondary structure content: mainly-alpha, mainly-beta and alpha-beta.
Architecture: This describes the overall shape of the domain structure as determined by the orientations
of the secondary structures but ignores the connectivity between the secondary structures.
Topology: Structures are grouped into fold families at this level depending on both the overall shape and
connectivity of the secondary structures.
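Homology: Domains are grouped into a homologous superfamily if they meet any one of the following criteria:
1) Sequence identity >= 35%, with 60% of the larger structure equivalent to the smaller
2) SSAP score >= 80.0 and sequence identity >= 20%, with 60% of the larger structure equivalent to the smaller
3) SSAP score >= 80.0, with 60% of the larger structure equivalent to the smaller, and
domains which have related functions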
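The homology thresholds listed above can be written as a simple decision function. The Python sketch below only encodes those thresholds as stated on the slide and is not the actual CATH classification procedure. The function and parameter names are assumptions. `overlap` stands for the percentage of the larger structure that is equivalent to the smaller one.

```python
def meets_cath_homology(seq_identity, ssap_score, overlap, related_function=False):
    """Check the slide's CATH homology thresholds (all values in percent
    except ssap_score, which is on the SSAP 0-100 scale)."""
    if overlap < 60.0:
        return False  # every criterion requires 60% structural equivalence
    if seq_identity >= 35.0:
        return True   # criterion 1: high sequence identity
    if ssap_score >= 80.0 and seq_identity >= 20.0:
        return True   # criterion 2: structural similarity plus moderate identity
    if ssap_score >= 80.0 and related_function:
        return True   # criterion 3: structural similarity plus related function
    return False

# Hypothetical domain pairs.
high_identity = meets_cath_homology(40.0, 50.0, 70.0)
structural = meets_cath_homology(22.0, 85.0, 65.0)
functional = meets_cath_homology(10.0, 85.0, 65.0, related_function=True)
rejected = meets_cath_homology(40.0, 90.0, 50.0)
```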
Tertiary Databases
Tertiary databases are also called protein 3D structure databases.
Protein Data Bank (PDB)
Nucleic Acid Database (NDB)
Protein Data Bank (PDB)
The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large
biological molecules, such as proteins and nucleic acids.
The data, typically obtained by X-ray crystallography, NMR spectroscopy, or, increasingly, cryo-
electron microscopy, and submitted by biologists and biochemists from around the world, are freely
accessible on the Internet via the websites of its member organizations (PDBe, PDBj, RCSB, and
BMRB).
The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.
Major Bioinformatics Databases
Literature Databases
An extensive search of the information available on a topic which results in a list of references to books,
periodicals, and other materials on the topic
Mainly:
• Books (printed or e-books)
• Journals (both)
• Research reports (both)
• Institutional publications(both)
• Govt. publications (both)
• Various NGO’s/ INGO’s publications
• Internet (Online resources)
• Intranet (Offline resources)
• Grey Literature
Open Access and Open Source
Open source software, like free software, is a kind of software whose source code is available for
inspection or modification.
Some open source software is available for a fee, but much of it is available at no cost.
Open access is a kind of access or availability.
This kind of access could apply to any digital content, such as software, music, movies, or news.
In this context, however, open access refers only to a certain kind of scientific and scholarly literature.
Sources of information
By nature/content of information-
1. Primary, e.g. journals, reports
2. Secondary, e.g. books
3. Tertiary, e.g. subject bibliographies
By format/media/channel
A. Printed (hard copy)
B. Electronic (soft copy)
a. Offline (intranet), e.g. DVD, CD, cassette
b. Online (internet)
Types A and B(a) are available in the library.
HOW CAN WE SEARCH?
Online searching
Internet searching for the purpose of academic, business and others.
PubMed (Databases) searching exclusively for medical literature.
PURPOSE OR NEEDS
1. To review theory
2. To frame the problem statement
3. To improve one's own knowledge
4. To learn the methodology
5. To assess the need for the problem
6. To find support on tools, methods, findings and data collection
7. To review what was done in the past
8. To generate ideas
Searching Literature using PubMed
Efficient literature search is essential to the practice of Evidence-Based Medicine.
PubMed provides free access to one of the largest searchable biomedical databases.
Efficient literature search using PubMed requires a good understanding of the available search
strategies and tools.
Several PubMed tools including 'Single Citation Matcher', 'Clinical Queries', 'Clipboard', 'Field Tags',
and 'Cubby' are highlighted using case based scenarios.
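Field tags narrow a PubMed search to a specific part of the record, for example `[Title]` or `[Author]`. The Python sketch below builds a tagged query and the corresponding NCBI E-utilities `esearch` URL without sending any request. The helper names `build_query` and `esearch_url` and the search terms are assumptions chosen for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_query(terms):
    """Join (text, field_tag) pairs with AND, e.g. ('BRCA1', 'Title')."""
    return " AND ".join(f"{text}[{tag}]" for text, tag in terms)

def esearch_url(query, retmax=20):
    """Return an ESearch URL for PubMed; the request itself is not sent."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"

query = build_query([("bioinformatics", "Title"), ("Dayhoff", "Author")])
url = esearch_url(query)
```

The same query string can be pasted directly into the PubMed search box.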
Sequence formats
Data is stored in a biological database in the form of sequences or molecular form.
Unique file format
Representation of data in biological database
Categories of file formats
Sequence database
Molecular database
Sequence file formats
GenBank flat-file Format
FASTA Format
Multi-FASTA Format
GCG Format
GCG-MSF Format
EMBL Format
Clustal Format
SWISS-PROT Format
GenBank
• Used by NCBI
• It is divided into three parts:
• Header: a brief, precise introductory part describing the record
• Features: all genes in the sequence, their locations in the genome, protein products,
coding regions, etc.
• Sequence: begins with ORIGIN and ends with //, e.g. ORIGIN atcgatcgatgcgctat //
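The three-part layout described above can be seen in a small parser. This Python sketch extracts only the sequence part of a GenBank flat file, that is, the lines between ORIGIN and the closing //. The record below is a made-up, heavily shortened example, and the function name `parse_genbank_sequence` is an assumption. For real work a maintained parser is preferable.

```python
def parse_genbank_sequence(text):
    """Return the sequence found between ORIGIN and // as one string."""
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True          # sequence lines start after this line
        elif line.startswith("//"):
            break                     # end of the record
        elif in_origin:
            # Drop position numbers and spaces, keep only the letters.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(seq_parts)

# Hypothetical, shortened record with header, features and sequence parts.
record = """LOCUS       DEMO0001   17 bp    DNA
DEFINITION  Hypothetical example record.
FEATURES             Location/Qualifiers
     gene            1..17
ORIGIN
        1 atcgatcgat gcgctat
//
"""
sequence = parse_genbank_sequence(record)
```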
FASTA
• One-line header
• Starts with > followed by the name of the gene
• Sequence of the gene or protein
• Blank spaces, paragraph marks and numerals are all ignored
• Asterisk (*) at the end
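The FASTA rules above translate into a short parser. The Python sketch below reads single or multi-FASTA text, takes the first word after > as the record name, and keeps only letters from the sequence lines, so spaces, numerals and a trailing asterisk are dropped as the slide describes. The function name `parse_fasta` and the sample records are assumptions for illustration.

```python
def parse_fasta(text):
    """Parse FASTA or multi-FASTA text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                              # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else "unnamed"
            records[name] = []
        elif name is not None:
            # Keep letters only: drops spaces, numerals and a trailing '*'.
            records[name].append("".join(ch for ch in line if ch.isalpha()))
    return {n: "".join(parts) for n, parts in records.items()}

# Hypothetical multi-FASTA input with two records.
fasta_text = ">gene1 demo\nATG C12\nGGA*\n\n>prot1\nMKV\n"
parsed = parse_fasta(fasta_text)
```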
Sequence Databases
Sequence generation
1. First Generation Sequencing
2. Second Generation Sequencing
3. Third Generation Sequencing
Types of Sequence Databases
Nucleotide Sequence Databases
1. EMBL/DDBJ/GenBank
2. RefSeq
3. Ensembl
Protein Sequence Databases
1. TrEMBL
2. GenPept
3. Entrez protein
4. Uniprot
Sequence Submission
1. Sequin
2. BankIt
3. Webin
Structural Databases
Protein Data Bank – X-Ray Crystallography
Genome Databases
Genomic databases are integral parts of human genome informatics, which enjoyed an exponential growth in
the post genomic era, as a result of the understanding of the genetic etiology of human disorders and the
identification of numerous genomic variants.
MethBase A reference methylome database
Animal Genome Size Database
Aspergillus Genomes
Atlas of Genetics and Cytogenetics in Oncology and Haematology
SilkBase Bombyx mori genome
BGD Bovine Genome Database
CGD Candida Genome Database
Chicken (Gallus gallus) Genome
CYORF Cyanobacteria Gene Annotation Database
Cytogenetics Gallery
OriDB DNA Replication Origin Database
wFleaBase Daphnia Water Flea Genome Database
diArk Database for eukaryotic genome and EST sequencing projects
DGV Database of Genomic Variants
DGVa Database of Genomic Variants archive
GenAge Database of genes related to ageing
dbVar Database of genomic structural variation
Dog Genome Annotation (Canis familiaris)
FlyBase Drosophila Genes and Genome Database
Ensembl Tutorial: Browsing genomes
Gene Ontology Consortium
GMOD Generic Model Organism Database
G10K Genome 10K Project: Genomes of 10,000 vertebrate species
Genome Sequencing Consortiums and Centers
GWIDD Genome Wide Docking Database
GOLD Genomes OnLine Database
euGenes Genomic Information for Eukaryotic Organisms
WormBase Genomics and biology of C. elegans and related nematodes
GiardiaDB Giardia lamblia Genomics Resource
Google Cloud Life Sciences
Honey Bee Genome Project - Apis mellifera
HGD Hymenoptera Genome Database
GeneLoc Integrated map for each human chromosome
IGSR International Genome Sample Resource
IMPC International Mouse Phenotyping Consortium
IRDB Inverted Repeats Database in genomic DNA
JGI Genome Portal
KEGG Organisms: Complete Genomes
KEGG Kyoto Encyclopedia of Genes and Genomes
MGI Mouse Genome Informatics
Mouse Genomes Project
Mouse Genomic Imprinting
NCBI Human Genome - Guide to Online Information
Nematode Genomes
PharmGKB Pharmacogenomics Knowledgebase
PlasmoDB Plasmodium Genome Resource
RGD Rat Genome Database
SGD Saccharomyces Genome Database
PomBase Schizosaccharomyces pombe Genome Project
Telomerase Database
TGD Tetrahymena Genome Database
UCSC Genome Bioinformatics
VEGA Vertebrate Genome Annotation database
Viral genome database
Bacterial Genomes database
Organism specific Genome database
ligand databases
Assignments
1. Applications of Bioinformatics
2. How to Search literature using PubMed
3. Write the importance of Genome Data Viewer
4. Insect Genome Databases and their applications
5. Gene to Disease Databases
Thank you

Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Heredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsHeredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of Traits
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Cytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxCytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptx
 

bioinformatics algorithms and its basics

  • 1. Established as per the Section 2(f) of the UGC Act, 1956 Approved by AICTE, COA and BCI, New Delhi BIOINFORMATICS School of Applied Sciences
  • 2. Established as per the Section 2(f) of the UGC Act, 1956 Approved by AICTE, COA and BCI, New Delhi Program: B.Sc BMCs Course Title: Bioinformatics and Biostatistics Course Code: B19MC3030 Course Type: Hardcore Course Presenter: Prashantha C N Course Mentor: Prashantha C N Semester: & Section: 3rd semester Academic Year: 2020-21 Course Pre-requisites: Students should have basic knowledge of biology, statistics and computer sciences. Students should be familiar with the basic concepts of DNA, RNA and Protein. L T P: L=4 T=0 P=4 Pedagogy:
  • 3. Course objectives 1. The basic objective is to give students an introduction, scope and objectives of bioinformatics 2. Understanding of different types of biological databases and tools used to understand data structures. Course outcome 1. Knowledge and awareness of the basic principles and concepts of biology and biological databases, literature search methods. Pedagogy chart Course code POs/ COs PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PSO1 PSO2 B20MC3030 CO1 3 2 2 1 1 1 3 1 CO2 2 3 3 3 2 1 2 2 3 2 CO3 1 3 3 3 2 1 2 2 3 2 CO4 1 3 2 1 1 1 2 2 3 1
  • 4. Syllabus Unit Topics Course Outcomes Program Outcomes I Introduction to Bioinformatics, Goal, Scope, Applications, Limitations 1 1 Biological Databases: Types of Databases, 1 2 Literature databases: Open access and open sources, PubMed, PLoS, Biomed Central, 1 2 Information Retrieval from Biological Databases; Sequence Formats, Sequence databases, structural databases, 2 2 Genome Databases: Viral genome database (ICTVdb, VirGen), Bacterial Genomes database (Genomes OnLine Database –GOLD, Microbial Genome Database- MBGD), Organism specific Genome database (OMIM / OMIA, SGD, Worm Base, PlasmoDB, FlyBase, TAIR), and ligand databases. 3 3 Program Outcomes PO-1: Science knowledge: Apply the knowledge of Biology, computer science and mathematics, to develop algorithms and analyze biological programs related to drug discovery, and genomic analysis to solve biological problems. PO-2: Problem analysis: Identify, formulate and analyze problems related to the various domains of Biological sciences such as Molecular Biology, Biotechnology, Genetics, Healthcare, agricultural, animal sciences and Microbial biotechnology. PO-3: Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
  • 5. Reference Books 1. Andreas D. Baxevanis, B.F. Francis Ouellette. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd Edition, 2009. 2. David W. Mount., Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, New York. 2004. 3. Andrew R. Leach., Molecular Modeling Principles and Applications (2nd Ed.), Prentice Hall, USA. 2001. 4. G. E. Schulz., Principles of Protein Structure, Springer 2009. e-Resources 1. https://youtu.be/w-uk-_TOgR0 2. https://www.youtube.com/watch?v=IrHDOEDtwD4 3. https://www.youtube.com/watch?v=48Xr-H05raA Suggested Readings Unit wise 1. Jin Xiong., Essential Bioinformatics, Cambridge university press (2006).
  • 6. Presentation Topics Fundamentals of Bioinformatics: Introduction to Bioinformatics, Goal, Scope, Applications, Limitations, Biological Databases: Types of Databases, Biological databases; Literature databases: Open access and open sources, PubMed, PLoS, Biomed Central, Information Retrieval from Biological Databases; Sequence Formats, Sequence databases, structural databases, Genome Databases: Viral genome database (ICTVdb, VirGen), Bacterial Genomes database (Genomes OnLine Database –GOLD, Microbial Genome Database-MBGD), Organism specific Genome database (OMIM / OMIA, SGD, Worm Base, PlasmoDB, FlyBase, TAIR), and ligand databases. Assignments 1. Applications of Bioinformatics 2. Searching literature through PubMed 3. Genome File formats 4. Sequence File formats 5. Understanding of Viral Genome databases 6. Understanding of Bacterial Genome databases 7. Ligand Databases and their applications.
  • 7. Unit-1: Fundamentals of Bioinformatics. Bioinformatics = Biology + Mathematical + Statistical + Computational tools. Biology: the natural science that studies life and living organisms in terms of their physical, chemical, molecular, physiological, developmental and evolutionary characteristics; it includes Genetics, Molecular Biology, Biotechnology, Microbiology, Biochemistry, etc. Mathematical + Statistical: mathematical formulations underlying prediction algorithms and the study of DNA and protein interactions; statistical methods to analyze biological data and interpret the results, including arithmetic calculations, normalization, standard deviation, probability, correlation and regression, ANOVA and variance. Computational tools: collection, storage, retrieval, analysis and interpretation of biological data.
  • 8. Bioinformatics is a field of a science in which biology, computer science and information technology merge into a single discipline to analyze biological information using computers and statistical techniques. Biological experiments can generate large amounts of data (macromolecular sequences, structures, expression profiles, pathways etc). Bioinformatics is about acquiring, managing, analyzing and understanding those data.
  • 9. Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Definitions of Bioinformatics
  • 10. History of Bioinformatics. The first major bioinformatics project was undertaken by Margaret Dayhoff in 1965, who developed the first protein sequence database, the Atlas of Protein Sequence and Structure. She was an American physical chemist and a professor at Georgetown University Medical Center. She originated one of the first substitution matrices, point accepted mutations (PAM). She also developed the one-letter code for amino acids, reflecting an attempt to reduce the size of the data files used to describe amino acid sequences in an era of punch-card computing. Subsequently, in the early 1970s, the Brookhaven National Laboratory established the Protein Data Bank for archiving three-dimensional protein structures. The first sequence alignment algorithm was developed by Needleman and Wunsch in 1970. The first protein structure prediction algorithm was developed by Chou and Fasman in 1974. GenBank was established in the early 1980s as a nucleotide sequence repository, and fast database search algorithms followed: FASTA was developed by William Pearson, and BLAST by Stephen Altschul and coworkers.
  • 11. Bioinformatics differs from a related field known as computational biology. Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products and is often considered computational molecular biology. However, computational biology encompasses all biological areas that involve computation. For example, mathematical modeling of ecosystems, population dynamics, application of the game theory in behavioral studies, and phylogenetic construction using fossil records all employ computational tools, but do not necessarily involve biological macromolecules. Bioinformatics Vs Computational Biology
  • 12. GOAL OF BIOINFORMATICS. Bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data. This involves the development and implementation of computer programs that enable efficient access to, use of and management of various types of information, and the development of new algorithms (mathematical formulas) and statistical measures that assess relationships among members of large data sets. For example, there are methods to locate a gene within a sequence, to predict protein structure and/or function, and to cluster protein sequences into families of related sequences. The primary goal of bioinformatics is to increase the understanding of biological processes. Techniques used: pattern recognition, data mining, machine learning algorithms and visualization.
  • 13. Scope of Bioinformatics. Bioinformatics: sequence similarity search; phylogenetic analysis; primer design; functional annotation and enrichment; gene ontology and pathway prediction; gene and protein prediction. Genomics services: Next Generation Sequencing data analysis; microarray data analysis. Proteomics services: protein structure prediction; protein models; protein-protein interactions; peptidomics service. Drug discovery services: structure-based drug discovery; ligand-based drug discovery; pharmacophore and pharmacokinetics; molecular docking and virtual screening; phytochemical extraction and identification; chemical synthesis and characterization. Biostatistics and Bio-IT services: clinical data analysis; algorithm development; bio-tools development; biological database development.
  • 14. Applications of Bioinformatics Microbial genome applications Molecular medicine Personalized medicine Preventative medicine Gene therapy Drug development Antibiotic resistance Evolutionary studies Waste cleanup Biotechnology Climate change Studies Alternative energy sources Crop improvement Forensic analysis Bio-weapon creation Insect resistance Improve nutritional quality Development of Drought resistant varieties Veterinary Science Biomedical Informatics Medical Ontologies & standards Clinical data management Machine Learning Computing High performance computing Software optimization & parallelization Cloud Computing Software engineering Machine Learning Data Management
  • 15. Limitations. Computer-based programmes have helped in better understanding of various processes of life science. However, bioinformatics has some limitations, which are listed below: 1) Bioinformatics requires a sophisticated molecular biology laboratory for in-depth study of biomolecules. Establishing such laboratories requires a lot of funds. 2) Computer-based study of life science requires training in the various computer programmes applicable to the study of different processes of life science. Thus special training is required for handling computer-based biological data. 3) There should be an uninterrupted electricity (power) supply for computer-aided biological investigations. Interruption of power may sometimes lead to the loss of large amounts of data from computer memory. 4) Computers should be checked regularly for viruses, because viruses may cause several problems such as deletion of data and corruption of programmes. 5) The maintenance and upkeep of molecular laboratories involves a lot of expenditure, which sometimes becomes a limiting factor for computer-based molecular studies.
  • 16. Biological Databases. Databases are a convenient system to properly store, search and retrieve any type of data. A database helps to easily handle and share large amounts of data and supports large-scale analysis through easy access and data updating. Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology and computational analysis. They contain information from genomics, proteomics and microarray gene expression. Information contained in biological databases includes function, gene structure, localization (both cellular and chromosomal), biological sequences and structures. Based on their biological contents, such as DNA, RNA, protein and functions, databases can be roughly divided into three categories: Primary Databases, Secondary Databases and Tertiary Databases.
  • 17. Primary Databases. These are the primary sources of data, used to store nucleic acid sequences, protein sequences and structural information of biological macromolecules. Some primary databases: • EMBL, GenBank, DDBJ (DNA Data Bank of Japan) • SWISS-PROT (Swiss-Prot) • PIR (Protein Information Resource) • PDB (Protein Data Bank). (The sequence collections in these databases result from the efforts of basic research in academic, industrial and sequencing labs.)
  • 18. Secondary Databases. Secondary databases comprise data derived from the results of analyzing primary data. Secondary databases often draw upon information from numerous sources, including other databases (primary and secondary), controlled vocabularies and the scientific literature. They are highly curated, often using a complex combination of computational algorithms and manual analysis and interpretation to derive new knowledge from the public record of science. Examples: UniProt Knowledgebase (sequence and functional information on proteins); SwissProt/TrEMBL; PIR; CDD; InterPro (protein families, motifs and domains); Ensembl (variation, function, regulation and more layered onto whole genome sequences); InterProScan, Prosite, Pfam, PRINTS, BLOCKS.
  • 19. SCOP AND CATH. SCOP: a Structural Classification of Proteins. Family: 1) all proteins that have residue identities of 30% and greater; 2) proteins with lower sequence identities but whose functions and structures are very similar; for example, globins with sequence identities of 15%. Superfamily: families whose proteins have low sequence identities but whose structures and, in many cases, functional features suggest that a common evolutionary origin is probable are placed together in superfamilies. Common fold: superfamilies and families are defined as having a common fold if their proteins have the same major secondary structures in the same arrangement and with the same topological connections. Class: the different folds have been grouped into classes. Most of the folds are assigned to one of five structural classes: 1) all-α, those whose structure is essentially formed by α-helices; 2) all-β, those whose structure is essentially formed by β-sheets; 3) α/β, those with α-helices and β-strands; 4) α+β, those in which α-helices and β-strands are largely segregated; 5) multi-domain, those with domains of different fold and for which no homologues are known at present.
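The 30% residue-identity rule for SCOP families can be illustrated with a small sketch. It assumes two sequences that are already aligned (equal length, gaps written as `-`) and counts identity over gap-free columns only; other definitions of percent identity use a different denominator, so the function names and the two example sequences below are purely hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_family_identity_rule(aligned_a, aligned_b, threshold=30.0):
    """True if the pair satisfies the first SCOP family criterion (>= 30%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned pair, for illustration only.
seq_a = "MKT-AYIAKQ"
seq_b = "MKTLAYLGKQ"
identity = percent_identity(seq_a, seq_b)
same_family_by_identity = meets_family_identity_rule(seq_a, seq_b)
```

A pair that fails this test can still belong to the same family under the second criterion (similar function and structure), which cannot be decided from sequence identity alone.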
  • 20. CATH: Class, Architecture, Topology and Homology. Class: three major classes are recognized: mainly-alpha, mainly-beta and alpha-beta. Architecture: this describes the overall shape of the domain structure as determined by the orientations of the secondary structures, but ignores the connectivity between the secondary structures. Topology: structures are grouped into fold families at this level depending on both the overall shape and the connectivity of the secondary structures. Homology: domains are grouped together if they meet one of the following criteria: 1) sequence identity >= 35%, with 60% of the larger structure equivalent to the smaller; 2) SSAP score >= 80.0 and sequence identity >= 20%, with 60% of the larger structure equivalent to the smaller; 3) SSAP score >= 80.0, with 60% of the larger structure equivalent to the smaller, and domains which have related functions.
  • 21. Tertiary Databases. Tertiary databases are also called protein 3D structure databases. Examples: Protein Data Bank (PDB), Nucleic Acid Database (NDB). Protein Data Bank (PDB): The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray crystallography, NMR spectroscopy, or, increasingly, cryo-electron microscopy, and submitted by biologists and biochemists from around the world, are freely accessible on the Internet via the websites of its member organizations (PDBe, PDBj, RCSB, and BMRB). The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.
  • 23. Literature Databases An extensive search of the information available on a topic which results in a list of references to books, periodicals, and other materials on the topic Mainly - Books ( printed or e-books) • Journals (both) • Research reports (both) • Institutional publications(both) • Govt. publications (both) • Various NGO’s/ INGO’s publications • Internet (Online resources) • Intranet (Offline resources) • Grey Literature
  • 24. Open Access and Open Source. Open source software, like free software, is a kind of software whose source code is available for inspection or modification. Some open source software is available for a fee, but much of it is available at no cost. Open access is a kind of access or availability. This kind of access could apply to any digital content, such as software, music, movies, or news. In scholarly publishing, however, the term calls for open access only to a certain kind of scientific and scholarly literature.
  • 25. Sources of information By nature/content of information- 1. Primary e.g. journal , reports 2. Secondary e.g. Books 3. Tertiary e.g. Subject bibliography By format/media/channel a. Hard copy (Print) b. Soft copy(Electronic) A. Printed B. Electronic a. Offline (Intranet) e.g. DVD, CD, cassette b. Online (Internet) Types A and B(a), In Library. HOW CAN WE SEARCH?
  • 26. Online searching Internet searching for the purpose of academic, business and others. PubMed (Databases) searching exclusively for medical literature. PURPOSE OR NEEDS 1. Review theory 2. Problem statement 3. To improve self knowledge 4. To Know methodology. 5. To assess need of problem 6. To support on tools, methods, findings, data collection 7. To review what was done in past 8. To generate Idea
  • 27. Searching Literature using PubMed Efficient literature search is essential to the practice of Evidence-Based Medicine. PubMed provides free access to one of the largest searchable biomedical databases. Efficient literature search using PubMed requires a good understanding of the available search strategies and tools. Several PubMed tools including 'Single Citation Matcher', 'Clinical Queries', 'Clipboard', 'Field Tags', and 'Cubby' are highlighted using case based scenarios.
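As a rough illustration of how field tags narrow a PubMed search, the sketch below builds a tagged query string and the corresponding request URL for the NCBI E-utilities ESearch service. It only constructs the URL and makes no network call. The helper names, the search words and the author name are hypothetical; `[Title]` and `[Author]` are standard PubMed field tags.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_term(title_words, author=None):
    """Combine title words (and optionally an author) using field tags."""
    parts = [f"{word}[Title]" for word in title_words]
    if author:
        parts.append(f"{author}[Author]")
    return " AND ".join(parts)


def esearch_url(term, retmax=20):
    """Return the ESearch URL for a PubMed query; retmax caps returned IDs."""
    query = urlencode({"db": "pubmed", "term": term, "retmax": retmax})
    return f"{ESEARCH}?{query}"


# Hypothetical search: papers with both words in the title by a given author.
term = pubmed_term(["bioinformatics", "databases"], author="Smith J")
url = esearch_url(term)
```

The same `term` string can be pasted directly into the PubMed search box; the URL form is only needed when the search is run from a program.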
  • 30. Sequence formats. Data is stored in a biological database in the form of sequences or molecular structures, and each database represents its data in its own file format. Categories of file formats: sequence database formats and molecular database formats. Sequence file formats: GenBank flat-file format, FASTA format, Multi-FASTA format, GCG format, GCG-MSF format, EMBL format, Clustal format, SWISS-PROT format.
  • 31. GenBank. The flat-file format used by NCBI. A record is divided into three parts. Header: a brief introductory description of the entry. Features: all genes in the sequence, the location of genes in the genome, protein products, coding regions, etc. Sequence: begins with the keyword ORIGIN and ends with //, for example: ORIGIN atcgatcgatgcgctat //
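The three-part layout can be made concrete with a minimal sketch that pulls the sequence out of the ORIGIN section of a GenBank-style record. The record below is a shortened, hypothetical example built around the sequence shown on the slide. It is not a complete GenBank entry, and the function name is an assumption.

```python
def genbank_sequence(record_text):
    """Return the sequence found between the ORIGIN line and the // line."""
    pieces = []
    in_origin = False
    for line in record_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True   # the sequence section starts after this line
            continue
        if line.startswith("//"):
            break              # end-of-record marker
        if in_origin:
            # Drop position numbers and spaces; keep only the letters.
            pieces.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(pieces)


# Hypothetical, heavily shortened record for illustration only.
record = """LOCUS       EXAMPLE01    17 bp    DNA
DEFINITION  Hypothetical record for illustration.
FEATURES             Location/Qualifiers
     gene            1..17
ORIGIN
        1 atcgatcgat gcgctat
//
"""
sequence = genbank_sequence(record)
```

For real records, a maintained parser is a safer choice than hand-written code, since actual entries carry many more header and feature fields.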
  • 32. FASTA. A one-line header that starts with > followed by the name of the gene, then the sequence of the gene or protein on the following lines. Blank spaces, paragraph marks and numerals within the sequence are all ignored. An asterisk (*) may mark the end of the sequence.
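The FASTA rules above can be sketched as a small reader. It assumes the whole file content is passed in as one string, treats every line starting with `>` as a new header, and keeps only letters from sequence lines, so spaces, numerals and a trailing `*` are dropped (alignment gap characters such as `-` would be dropped too). The function name and the example records are hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()     # header text after '>'
            records[header] = []
        elif header is not None:
            # Keep letters only: ignores spaces, numerals and '*'.
            records[header].append("".join(ch for ch in line if ch.isalpha()))
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical multi-FASTA content for illustration only.
example = """>seq1 hypothetical example
ATGC GATC 10
GGTA*
>seq2
MKV
"""
records = parse_fasta(example)
```

Because each `>` line starts a new entry, the same function also reads the Multi-FASTA format, which is several FASTA records in one file.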
  • 33. Sequence Databases Sequence generation 1. First Generation Sequencing 2. Second Generation Sequencing 3. Third Generation Sequencing Types of Sequence Databases Nucleotide Sequence Databases 1. EMBL/DDBJ/GenBank 2. RefSeq 3. Ensembl Protein Sequence Databases 1. TrEMBL 2. GenPept 3. Entrez protein 4. Uniprot Sequence Submission 1. Sequin 2. BankIt 3. Webin
  • 34. Structural Databases Protein Data Bank – X-Ray Crystallography
  • 35. Genome Databases Genomic databases are integral parts of human genome informatics, which enjoyed an exponential growth in the post genomic era, as a result of the understanding of the genetic etiology of human disorders and the identification of numerous genomic variants. MethBase A reference methylome database Animal Genome Size Database Aspergillus Genomes Atlas of Genetics and Cytogenetics in Oncology and Haematology SilkBase Bombyx mori genome BGD Bovine Genome Database CGD Candida Genome Database Chicken (Gallus gallus) Genome CYORF Cyanobacteria Gene Annotation Database Cytogenetics Gallery OriDB DNA Replication Origin Database wFleaBase Daphnia Water Flea Genome Database diArk Database for eukaryotic genome and EST sequencing projects DGV Database of Genomic Variants
  • 36. DGVa Database of Genomic Variants archive GenAge Database of genes related to ageing dbVar Database of genomic structural variation Dog Genome Annotation (Canis familiaris) FlyBase Drosophila Genes and Genome Database Ensembl Tutorial: Browsing genomes Gene Ontology Consortium GMOD Generic Model Organism Database G10K Genome 10K Project: Genomes of 10,000 vertebrate species Genome Sequencing Consortiums and Centers GWIDD Genome Wide Docking Database GOLD Genomes OnLine Database euGenes Genomic Information for Eukaryotic Organisms WormBase Genomics and biology of C. elegans and related nematodes GiardiaDB Giardia lamblia Genomics Resource Google Cloud Life Sciences Honey Bee Genome Project - Apis mellifera HGD Hymenoptera Genome Database GeneLoc Integrated map for each human chromosome IGSR International Genome Sample Resource
  • 37. IMPC International Mouse Phenotyping Consortium IRDB Inverted Repeats Database in genomic DNA JGI Genome Portal KEGG Organisms: Complete Genomes KEGG Kyoto Encyclopedia of Genes and Genomes MGI Mouse Genome Informatics Mouse Genomes Project Mouse Genomic Imprinting NCBI Human Genome - Guide to Online Information Nematode Genomes PharmGKB Pharmacogenomics Knowledgebase PlasmoDB Plasmodium Genome Resource RGD Rat Genome Database SGD Saccharomyces Genome Database PomBase Schizosaccharomyces pombe Genome Project Telomerase Database TGD Tetrahymena Genome Database UCSC Genome Bioinformatics VEGA Vertebrate Genome Annotation database
  • 44. Assignments 1. Applications of Bioinformatics 2. How to Search literature using PubMed 3. Write the importance of Genome Data Viewer 4. Insect Genome Databases and their applications 5. Gene to Disease Databases