Genome Data Management
Shabeer Ismaeel
MSc IT, II Semester
Department of Information Technology
Contents
• Biological Sciences
• Genetics
• Characteristics of Biological Data
• What is Bioinformatics?
• Human Genome and Availability of Information
• Existing Biological Databases
• Various Branches Benefited
Biological Sciences
– The biological sciences encompass an enormous variety of information.
• Environmental science gives us a view of how species live and interact in a world filled with natural phenomena.
• Biology and ecology study particular species.
• Anatomy focuses on the overall structure of an organism, documenting the physical aspects of individual bodies.
• Traditional medicine and physiology break the organism into systems and tissues and strive to collect information on the workings of these systems and of the organism as a whole.
• Histology and cell biology delve into the tissue and cellular levels and provide knowledge about the inner structure and function of the cell.
- This wealth of information, which has been generated, classified, and stored for centuries, has only recently become a major application of database technology.
Genetics
• Genetics has emerged as an ideal field for the application of information technology.
– In a broad sense, it can be thought of as the construction of models based on information about genes and populations, and the search for relationships within that information.
• Genes can be defined as units of heredity.
- The study of genetics can be divided into three branches:
• Mendelian genetics is the study of the transmission of traits between generations.
• Molecular genetics is the study of the chemical structure and function of genes at the molecular level.
• Population genetics is the study of how genetic information varies across populations of organisms.
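To make the population-genetics definition concrete: the basic quantity compared across populations is the allele frequency. The sketch below assumes diploid genotypes written as two-letter strings such as "AA", "Aa" or "aa" (a hypothetical encoding chosen for illustration), counts every allele copy, and divides by the total.

```python
from collections import Counter

def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Return the frequency of each allele in a list of diploid genotypes.

    Each genotype is a two-character string (e.g. "Aa"); every character
    counts as one copy of that allele.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}

# Hypothetical population: 36 AA, 48 Aa and 16 aa individuals.
population = ["AA"] * 36 + ["Aa"] * 48 + ["aa"] * 16
print(allele_frequencies(population))
```

For this invented population the output is {'A': 0.6, 'a': 0.4}: 120 of the 200 allele copies are A.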
• The origins of molecular genetics can be traced to two important discoveries:
- The first came in 1869, when Friedrich Miescher discovered nuclein, whose primary component is deoxyribonucleic acid (DNA). Subsequent research found that DNA and a related compound, ribonucleic acid (RNA), are composed of nucleotides (each combining a sugar, a phosphate, and a base) linked into long polymers via the sugar and phosphate groups.
- The second was the demonstration in 1944 by Oswald Avery that DNA is indeed the molecular substance carrying genetic information.
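Because a DNA polymer is a linear chain of nucleotides that differ only in their base (adenine, cytosine, guanine, or thymine), a database can store a DNA molecule as a string over the four-letter alphabet A, C, G, T. A minimal sketch, assuming plain unambiguous sequences (ambiguity codes such as N are deliberately rejected here):

```python
from collections import Counter

DNA_ALPHABET = frozenset("ACGT")

def base_composition(sequence: str) -> dict[str, int]:
    """Count each base in a DNA sequence, rejecting symbols outside A, C, G, T."""
    sequence = sequence.upper()
    unknown = set(sequence) - DNA_ALPHABET
    if unknown:
        raise ValueError(f"not a DNA base: {sorted(unknown)}")
    counts = Counter(sequence)
    # Report all four bases, including those that do not occur.
    return {base: counts.get(base, 0) for base in "ACGT"}

print(base_composition("gattaca"))
```

This prints {'A': 3, 'C': 1, 'G': 1, 'T': 2}. Validating against the alphabet on input is one simple way to keep malformed sequences out of a store.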
• Genes were shown to be segments of nucleic acid arranged linearly on chromosomes and to serve three primary functions:
- Replicating genetic information between generations,
- Providing blueprints for the creation of polypeptides, and
- Accumulating changes, thereby allowing evolution to occur.
- In 1953, Watson and Crick described the double-helix structure of DNA, which gave molecular biology a new direction.
Characteristics of Biological Data
• Biological data exhibits many special characteristics that make the management of biological information a particularly challenging problem.
• The field concerned with managing and analyzing biological information is called bioinformatics.
What is Bioinformatics?
• Bioinformatics is the field of science in which
biology, computer science, and information
technology merge into a single discipline.
• The ultimate goal of the field is to enable the
discovery of new biological insights as well as to
create a global perspective from which unifying
principles in biology can be detected.
• There are three important sub-disciplines within bioinformatics:
1. The development of new algorithms and statistics with which to assess relationships among members of large data sets.
2. The analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures.
3. The development and implementation of tools that enable efficient access to and management of different types of information.
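As one illustration of the first sub-discipline, the Needleman-Wunsch dynamic-programming algorithm scores how well two sequences align end to end. It is a classic textbook technique, not one these slides prescribe, and the scoring values used here (match +1, mismatch -1, gap -1) are illustrative assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Only two rows of the dynamic-programming matrix are kept, so memory
    use is proportional to len(b).
    """
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # a[i-1] against a gap
                curr[j - 1] + gap,   # b[j-1] against a gap
            )
        prev = curr
    return prev[-1]

print(global_alignment_score("ACGT", "AGT"))
```

This prints 2 (three matches and one gap). Keeping two rows saves memory, but recovering the alignment itself, not just its score, would require the full matrix or a traceback structure.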
Biological Data + Computer Calculations = Bioinformatics
[Figure: The Bioinformatics Spectrum]
Various Characteristics
• Biological data is highly complex when compared with most other domains or applications.
• The amount and range of variability in the data is high.
• Schemas in biological databases change at a rapid pace.
• Representations of the same data by different biologists will likely differ (even when using the same system).
• Most users of biological data do not require write access to the database; read-only access is adequate.
• Most biologists are not likely to have knowledge of the internal structure of the database or about schema design.
• The context of data gives it added meaning for its use in biological applications.
• Defining and representing complex queries is extremely important to the biologist.
• Users of biological information often require access to “old” values of the data, particularly when verifying previously reported results.
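Two of these characteristics, mostly read-only access and the need to retrieve “old” values, suggest an append-only design in which an update adds a new version instead of overwriting the previous one. A minimal in-memory sketch; the class name and the accession used below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class VersionedRecord:
    """Append-only history of one database entry; old values are never overwritten."""
    accession: str
    versions: list[str] = field(default_factory=list)

    def update(self, value: str) -> int:
        """Store a new value and return its 1-based version number."""
        self.versions.append(value)
        return len(self.versions)

    def get(self, version: int | None = None) -> str:
        """Return the latest value, or the value as of a given version."""
        if not self.versions:
            raise KeyError(f"{self.accession} has no versions")
        if version is None:
            return self.versions[-1]
        if not 1 <= version <= len(self.versions):
            raise KeyError(f"{self.accession} has no version {version}")
        return self.versions[version - 1]

record = VersionedRecord("HYPO0001")  # hypothetical accession
record.update("ACGT")
record.update("ACGTT")  # a later, corrected sequence
print(record.get(), record.get(1))
```

This prints ACGTT ACGT: readers see the current value by default, while a previously reported result can still be checked against version 1.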
What is the Human Genome?
- The term genome is defined as the total genetic information that can be obtained about an entity. For example, the human genome generally refers to the complete set of genes required to create a human being.
- Early estimates put this at more than 30,000 genes spread over 23 pairs of chromosomes, with an estimated 3 to 4 billion nucleotides; current estimates are closer to 20,000 protein-coding genes and about 3 billion base pairs.
- The goal of the Human Genome Project (HGP), which began in 1990, was to obtain the complete sequence (the ordering of the bases) of those nucleotides.
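The sequencing goal translates directly into a storage question. With only four possible bases, each position needs 2 bits, so, assuming 3 billion bases and no ambiguity codes, a packed genome takes about 750 MB versus about 3 GB at one byte per base as plain text. The `pack`/`unpack` helpers below are hypothetical illustrations; real 2-bit genome formats (for example UCSC's .2bit) also record runs of unknown bases and masking, which this sketch omits.

```python
# Hypothetical 2-bit code for the four unambiguous bases.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"

def pack(sequence: str) -> bytes:
    """Pack a DNA sequence at 2 bits per base (4 bases per byte).

    Raises KeyError for symbols outside A, C, G, T (e.g. N).
    """
    out = bytearray((len(sequence) + 3) // 4)
    for i, base in enumerate(sequence.upper()):
        out[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases; the length must be stored separately
    because the final byte may contain padding."""
    return "".join(
        BITS_TO_BASE[(data[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(length)
    )

# Storage arithmetic, assuming 3 billion bases.
bases = 3_000_000_000
print(f"packed: {bases * 2 // 8 / 1e6:.0f} MB, plain text: {bases / 1e6:.0f} MB")
```

This prints packed: 750 MB, plain text: 3000 MB, and unpack(pack("GATTACA"), 7) returns "GATTACA".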
Existing Biological Databases
• Some of the existing database systems that support, or have grown out of, the Human Genome Project include:
• GenBank
– The most notable DNA sequence database in the world today is GenBank, maintained by the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM).
– Established in 1978 as a repository for DNA sequence data.
– Since then, it has expanded to include sequence tag data, protein sequence data, three-dimensional protein structure, taxonomy, and links to the medical literature (MEDLINE).
- GenBank contains over 31 billion nucleotide bases of more than 24 million sequences from over 100,000 species, with roughly 1,400 new organisms being added each month.
- The database size in flat-file format is over 100 GB uncompressed and has been doubling every 15 months.
- The system is maintained as a combination of flat files, relational databases, and files containing Abstract Syntax Notation One (ASN.1), a standard notation for describing data structures with associated rules for encoding and decoding them.
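In GenBank's flat-file format, the sequence sits in the ORIGIN section as numbered lines of lowercase bases in blocks of ten, and each record ends with a `//` line. The sketch below extracts only that sequence from a single, heavily simplified record; the sample record is invented and its LOCUS line is abbreviated.

```python
def parse_origin(flat_file_text: str) -> str:
    """Extract the nucleotide sequence from the ORIGIN section of one record."""
    sequence_parts = []
    in_origin = False
    for line in flat_file_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of record
        elif in_origin:
            # Drop the leading position number; keep the blocks of bases.
            sequence_parts.extend(line.split()[1:])
    return "".join(sequence_parts).upper()

# Invented, simplified record for illustration only.
record = """LOCUS       EXAMPLE1                  24 bp    DNA     linear
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""
print(parse_origin(record))
```

This prints GATCCTCCATATACAACGGTATCT. For real files, a maintained parser such as Biopython's `Bio.SeqIO` module (`SeqIO.parse(handle, "genbank")`) handles the full format, including features and multi-record files.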
• The Genome Database (GDB)
- Created in 1989, GDB is a catalog of human gene mapping data; gene mapping is the process of associating a piece of information with a particular location on the human genome.
- The GDB system is built around Sybase, a commercial relational DBMS, and its data are modeled using standard Entity-Relationship techniques.
- GDB distributes a Database Access Toolkit.
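A relational, ER-style design for gene mapping data might keep genes and their mapped locations in separate tables. The sketch below uses Python's built-in SQLite as a stand-in for a commercial DBMS such as Sybase; the tables, columns, gene symbols and cytogenetic bands are all hypothetical and are not GDB's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY,
        symbol  TEXT NOT NULL UNIQUE
    );
    -- One gene may be mapped to several locations (e.g. by different studies).
    CREATE TABLE map_location (
        gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
        chromosome TEXT NOT NULL,
        band       TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO gene VALUES (?, ?)",
                 [(1, "GENE_A"), (2, "GENE_B"), (3, "GENE_C")])
conn.executemany("INSERT INTO map_location VALUES (?, ?, ?)",
                 [(1, "7", "q31.2"), (2, "17", "q21.31"), (3, "7", "p15.3")])

def genes_on_chromosome(chromosome: str) -> list[tuple[str, str]]:
    """Return (symbol, band) pairs for every gene mapped to a chromosome."""
    return conn.execute(
        """SELECT g.symbol, m.band
           FROM gene AS g JOIN map_location AS m ON m.gene_id = g.gene_id
           WHERE m.chromosome = ?
           ORDER BY g.symbol""",
        (chromosome,),
    ).fetchall()

print(genes_on_chromosome("7"))
```

This prints [('GENE_A', 'q31.2'), ('GENE_C', 'p15.3')]. Separating locations from genes lets one gene carry several mappings without duplicating gene data.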
Online Mendelian Inheritance in Man
• Online Mendelian Inheritance in Man (OMIM) is an electronic collection of information on the genetic basis of human disease.
• In 1991, its administration was transferred from Johns Hopkins University to the NCBI (National Center for Biotechnology Information), and the entire database was converted to NCBI's GenBank format. Today it contains more than 14,000 entries.
EcoCyc
– The Encyclopedia of Escherichia coli Genes and Metabolism (EcoCyc) is a recent experiment in combining information about the genome and the metabolism of E. coli K-12, a bacterial strain.
– The database was created in 1996 as a collaboration between Stanford Research Institute and the Marine Biological Laboratory.
Gene Ontology
– The Gene Ontology (GO) Consortium was formed in 1998 as a collaboration among three model organism databases: FlyBase, Mouse Genome Informatics (MGI), and the Saccharomyces (yeast) Genome Database (SGD).
• The goal is to produce a structured, precisely defined, common, controlled vocabulary for describing the roles of genes and gene products in any organism.
• At the time of writing, the latest release of the GO database had over 13,000 terms and more than 18,000 relationships between terms.
• GO was implemented using MySQL, an open-source relational database, and a monthly database release is available in SQL and XML (Extensible Markup Language) formats.
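Because GO terms are linked by relationships, and a term may have more than one parent, the vocabulary forms a directed acyclic graph rather than a tree. A common query is “all ancestors of a term”, which a recursive SQL query can answer. The sketch uses SQLite with a hypothetical one-table schema and made-up term IDs; it is not the GO database's actual schema. Recursive common table expressions of this shape are also supported by MySQL 8.0 and later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and term IDs: each row states "child is_a parent".
conn.execute("CREATE TABLE is_a (child TEXT NOT NULL, parent TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO is_a VALUES (?, ?)",
    [("T2", "T1"), ("T3", "T1"), ("T4", "T2"), ("T5", "T3"), ("T5", "T4")],
)  # T5 has two parents, so the terms form a DAG rather than a tree.

def ancestors(term_id: str) -> list[str]:
    """Return every term reachable from term_id by following is_a links upward."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT parent FROM is_a WHERE child = ?
            UNION  -- UNION (not UNION ALL) drops terms already reached
            SELECT is_a.parent FROM is_a JOIN up ON is_a.child = up.id
        )
        SELECT id FROM up ORDER BY id
        """,
        (term_id,),
    ).fetchall()
    return [term for (term,) in rows]

print(ancestors("T5"))
```

This prints ['T1', 'T2', 'T3', 'T4']. T5 reaches T1 through both of its parents, and using UNION rather than UNION ALL keeps each ancestor only once.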
[Table: Summary of the Major Genome-Related Databases]
Various Branches Benefited
• Medicine
• PharmacogenomicsPharmacogenomics
• Biotechnology
• Bioinformatics
• Proteomics
Genome data management

More Related Content

What's hot (20)

ENTREZ.ppt
ENTREZ.pptENTREZ.ppt
ENTREZ.ppt
 
Scop database
Scop databaseScop database
Scop database
 
DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)
 
Sequence Submission Tools
Sequence Submission ToolsSequence Submission Tools
Sequence Submission Tools
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
Pathways and genomes databases in bioinformatics
Pathways and genomes databases in bioinformaticsPathways and genomes databases in bioinformatics
Pathways and genomes databases in bioinformatics
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Fasta
FastaFasta
Fasta
 
TrEMBL
TrEMBLTrEMBL
TrEMBL
 
swiss-prot<bioinformatics>
swiss-prot<bioinformatics>swiss-prot<bioinformatics>
swiss-prot<bioinformatics>
 
Ncbi
NcbiNcbi
Ncbi
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Comparative genomics in eukaryotes, organelles
Comparative genomics in eukaryotes, organellesComparative genomics in eukaryotes, organelles
Comparative genomics in eukaryotes, organelles
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
Clustal
ClustalClustal
Clustal
 
blast bioinformatics
blast bioinformaticsblast bioinformatics
blast bioinformatics
 
Bioinformatics on internet
Bioinformatics on internetBioinformatics on internet
Bioinformatics on internet
 
Ddbj
DdbjDdbj
Ddbj
 
Web based servers and softwares for genome analysis
Web based servers and softwares for genome analysisWeb based servers and softwares for genome analysis
Web based servers and softwares for genome analysis
 

Similar to Genome data management

Human genome project - Decoding the codes of life
Human genome project - Decoding the codes of lifeHuman genome project - Decoding the codes of life
Human genome project - Decoding the codes of lifearjunaa7
 
Genomics and Bioinformatics
Genomics and BioinformaticsGenomics and Bioinformatics
Genomics and BioinformaticsAmit Garg
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomicsNikhil Aggarwal
 
GENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSGENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSsandeshGM
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)jmoore89
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformaticsbiinoida
 
introduction to bioinfromatics.pptx
introduction to bioinfromatics.pptxintroduction to bioinfromatics.pptx
introduction to bioinfromatics.pptxAbelPhilipJoseph
 
History and devolopment of bioinfomatics.ppt (1)
History and devolopment of bioinfomatics.ppt (1)History and devolopment of bioinfomatics.ppt (1)
History and devolopment of bioinfomatics.ppt (1)Madan Kumar Ca
 
PAPER 3.1 ~ HUMAN GENOME PROJECT
PAPER 3.1 ~  HUMAN GENOME PROJECTPAPER 3.1 ~  HUMAN GENOME PROJECT
PAPER 3.1 ~ HUMAN GENOME PROJECTNusrat Gulbarga
 
Computer science history.pdf
Computer science history.pdfComputer science history.pdf
Computer science history.pdfsirwansleman
 
Human genome project by kk sahu
Human genome project by kk sahuHuman genome project by kk sahu
Human genome project by kk sahuKAUSHAL SAHU
 
Bioinformatics
BioinformaticsBioinformatics
BioinformaticsAmna Jalil
 

Similar to Genome data management (20)

Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Basic of bioinformatics
Basic of bioinformaticsBasic of bioinformatics
Basic of bioinformatics
 
Human genome project - Decoding the codes of life
Human genome project - Decoding the codes of lifeHuman genome project - Decoding the codes of life
Human genome project - Decoding the codes of life
 
Genomics and Bioinformatics
Genomics and BioinformaticsGenomics and Bioinformatics
Genomics and Bioinformatics
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomics
 
GENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSGENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICS
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
introduction to bioinfromatics.pptx
introduction to bioinfromatics.pptxintroduction to bioinfromatics.pptx
introduction to bioinfromatics.pptx
 
History and devolopment of bioinfomatics.ppt (1)
History and devolopment of bioinfomatics.ppt (1)History and devolopment of bioinfomatics.ppt (1)
History and devolopment of bioinfomatics.ppt (1)
 
PAPER 3.1 ~ HUMAN GENOME PROJECT
PAPER 3.1 ~  HUMAN GENOME PROJECTPAPER 3.1 ~  HUMAN GENOME PROJECT
PAPER 3.1 ~ HUMAN GENOME PROJECT
 
Computer science history.pdf
Computer science history.pdfComputer science history.pdf
Computer science history.pdf
 
Human genome
Human genomeHuman genome
Human genome
 
The Human Genome Project
The Human Genome Project The Human Genome Project
The Human Genome Project
 
Human genome project by kk sahu
Human genome project by kk sahuHuman genome project by kk sahu
Human genome project by kk sahu
 
Genomics types
Genomics typesGenomics types
Genomics types
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Genomics
GenomicsGenomics
Genomics
 
Bioinformatics .pptx
Bioinformatics .pptxBioinformatics .pptx
Bioinformatics .pptx
 

More from Shareb Ismaeel

More from Shareb Ismaeel (9)

Cybercrimes
CybercrimesCybercrimes
Cybercrimes
 
Unions
UnionsUnions
Unions
 
Spanning trees
Spanning treesSpanning trees
Spanning trees
 
Multiprocessor structures
Multiprocessor structuresMultiprocessor structures
Multiprocessor structures
 
Installation testing
Installation testingInstallation testing
Installation testing
 
E mail systems
E mail systemsE mail systems
E mail systems
 
E commerce
E commerceE commerce
E commerce
 
Disk structure
Disk structureDisk structure
Disk structure
 
.Netframework
.Netframework.Netframework
.Netframework
 

Recently uploaded

VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture designssuser87fa0c1
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 

Recently uploaded (20)

VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture design
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 

Genome data management

  • 1. Genome Data ManagementGenome Data Management Shabeer Ismaeel MSC IT II SEMESTER Department Of Information Technology .
  • 2. • Biological Sciences.Biological Sciences. • Genetics.Genetics. • Characteristics of Biological Data.Characteristics of Biological Data. • What is Bioinformatics?What is Bioinformatics? • Human Genome and availability ofHuman Genome and availability of information .information . • Existing Biological Databases.Existing Biological Databases. • Various Branches Benefited.Various Branches Benefited. Contents
  • 3. Biological Sciences.Biological Sciences. – The biological sciences encompass an enormousThe biological sciences encompass an enormous variety of information.variety of information. • EnvironmentalEnvironmental sciencescience gives us a view of how speciesgives us a view of how species live and interact in a world filled with natural phenomena.live and interact in a world filled with natural phenomena. • BiologyBiology andand ecologyecology study particular species.study particular species. • AnatomyAnatomy focuses on the overall structure of an organism,focuses on the overall structure of an organism, documenting the physical aspects of individual bodies.documenting the physical aspects of individual bodies. • Traditional medicine and physiologyTraditional medicine and physiology break the organismbreak the organism into systems and tissues and strive to collect informationinto systems and tissues and strive to collect information on the workings of these systems and the organism as aon the workings of these systems and the organism as a wholewhole..
  • 4. • Histology and cell biologyHistology and cell biology delve into thedelve into the tissue and cellular levels and providetissue and cellular levels and provide knowledge about the inner structure andknowledge about the inner structure and function of the cell.function of the cell. -This wealth of information that has been-This wealth of information that has been generated, classified, and stored forgenerated, classified, and stored for centuries has only recently become acenturies has only recently become a major application of database technology.major application of database technology.
  • 5. Genetics.Genetics. • GeneticsGenetics has emerged as an ideal fieldhas emerged as an ideal field for the application of informationfor the application of information technology.technology. – In a broad sense, it can be taught of as theIn a broad sense, it can be taught of as the construction of models based onconstruction of models based on information about genes and populationinformation about genes and population and the seeking out of relationships in thatand the seeking out of relationships in that information.information. • Genes can be defined as units of heredity.Genes can be defined as units of heredity.
  • 6. The study of genetics can be divided into three branches: – Mendelian genetics is the study of the transmission of traits between generations. – Molecular genetics is the study of the chemical structure and function of genes at the molecular level. – Population genetics is the study of how genetic information varies across populations of organisms.
  • 7. The origins of molecular genetics can be traced to two important discoveries: – The first was Friedrich Miescher's discovery in 1869 of nuclein, whose primary component is deoxyribonucleic acid (DNA). Subsequent research found that DNA and a related compound, ribonucleic acid (RNA), are composed of nucleotides (each combining a sugar, a phosphate, and a base) linked into long polymers via the sugar and phosphate. – The second was Oswald Avery's demonstration in 1944 that DNA is indeed the molecular substance carrying genetic information.
  • 8. Genes were shown to be composed of chains of nucleic acids arranged linearly on chromosomes and to serve three primary functions: – replicating genetic information between generations, – providing blueprints for the creation of polypeptides, and – accumulating changes, thereby allowing evolution to occur. • Watson and Crick's discovery of the double-helix structure of DNA in 1953 gave molecular biology a new direction.
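In the double helix, A pairs with T and C pairs with G, and the two strands run in opposite directions, so either strand fully determines the other. A minimal Python sketch of that pairing rule (the function name `reverse_complement` is our own illustrative choice, and only the four unambiguous bases are handled):

```python
# Watson-Crick base pairing: A <-> T, C <-> G (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in the conventional 5'->3' direction."""
    # Complement each base, then reverse, because the two strands are antiparallel.
    return strand.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying the function twice returns the original strand, which mirrors the idea that each strand can serve as a template for copying the other.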
  • 9. Characteristics of Biological Data • Biological data exhibits many special characteristics that make management of biological information a particularly challenging problem. • The multidisciplinary field that has emerged to address the management and analysis of biological information is called bioinformatics.
  • 10. What is Bioinformatics? • Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. • The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be detected. • Bioinformatics has three important sub-disciplines:
  • 11. 1. The development of new algorithms and statistics with which to assess relationships among members of large data sets. 2. The analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. 3. The development and implementation of tools that enable efficient access to and management of different types of information.
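As one illustration of the first sub-discipline, an algorithm for assessing how related two sequences are, here is a minimal sketch of global alignment scoring in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not parameters from the slides; real tools use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch recurrence)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))  # 2 (A-GT: three matches, one gap)
```

The table is filled in O(len(a) x len(b)) time and space, which is why large-scale comparisons rely on heuristics layered on top of this basic recurrence.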
  • 12. Biological Data + Computer Calculations = Bioinformatics
  • 13. The Bioinformatics Spectrum
  • 14. Various Characteristics – Biological data is highly complex when compared with most other domains or applications. – The amount and range of variability in data is high. – Schemas in biological databases change at a rapid pace. – Representations of the same data by different biologists will likely be different (even using the same system). – Most users of biological data do not require write access to the database; read-only access is adequate.
  • 15. – Most biologists are not likely to have knowledge of the internal structure of the database or about schema design. – The context of data gives added meaning for its use in biological applications. – Defining and representing complex queries is extremely important to the biologist. – Users of biological information often require access to “old” values of the data, particularly when verifying previously reported results.
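One common way to keep "old" values available is to never overwrite a record: each update appends a new version, and readers can ask for a specific version or the latest one. The sketch below borrows the flavor of GenBank-style "accession.version" identifiers, but the class `VersionedStore` and its methods are hypothetical, written only to illustrate the idea.

```python
class VersionedStore:
    """Hypothetical append-only store: updates add versions, never overwrite."""

    def __init__(self) -> None:
        self._history = {}  # accession -> list of sequence versions (oldest first)

    def put(self, accession: str, sequence: str) -> str:
        """Store a new version and return its 'accession.version' identifier."""
        versions = self._history.setdefault(accession, [])
        versions.append(sequence)
        return f"{accession}.{len(versions)}"

    def get(self, versioned_id: str) -> str:
        """Fetch an exact historical version, e.g. 'X1.1'."""
        accession, _, version = versioned_id.rpartition(".")
        n = int(version)
        versions = self._history[accession]
        if not 1 <= n <= len(versions):
            raise KeyError(versioned_id)
        return versions[n - 1]

    def latest(self, accession: str) -> str:
        """Fetch the most recent version of a record."""
        return self._history[accession][-1]


if __name__ == "__main__":
    store = VersionedStore()
    store.put("X1", "ACGT")
    store.put("X1", "ACGA")  # a correction; version 1 is still retrievable
    print(store.get("X1.1"), store.latest("X1"))  # ACGT ACGA
```

Because the data is mostly read-only for its users, an append-only design like this costs little and lets anyone re-check a published result against the exact version it used.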
  • 16. What is the Human Genome? – The term genome is defined as the total genetic information that can be obtained about an entity. For example, the human genome generally refers to the complete set of genes required to create a human being. – The number was estimated at more than 30,000 genes spread over 23 pairs of chromosomes, with an estimated 3 to 4 billion nucleotides (current estimates put the number of protein-coding genes closer to 20,000). – The goal of the Human Genome Project (HGP), which began in 1990, was to obtain the complete sequence, that is, the ordering of the bases, of those nucleotides.
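Because DNA uses only four bases, each base can be stored in 2 bits instead of a full byte, which makes the storage cost of a 3 to 4 billion nucleotide sequence easy to estimate. The following is a minimal sketch of 2-bit packing; the bit assignment (A=00, C=01, G=10, T=11) is an arbitrary choice for illustration, and ambiguity codes such as N are deliberately not handled.

```python
# Assumed 2-bit code for the four bases (illustrative; N and other codes unsupported).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"  # index = 2-bit code


def pack(seq: str) -> bytes:
    """Pack a DNA string into 4 bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(BITS_TO_BASE[(data[i // 4] >> (2 * (i % 4))) & 0b11]
                   for i in range(length))


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed at 2 bits per base."""
    return (n_bases + 3) // 4


if __name__ == "__main__":
    for n in (3_000_000_000, 4_000_000_000):
        print(n, "bases ->", packed_size_bytes(n), "bytes packed vs", n, "bytes at 1 byte/base")
```

At 2 bits per base, 3 to 4 billion nucleotides fit in roughly 0.75 to 1 GB, a quarter of a plain one-character-per-base text file, before any annotation is added.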
  • 17. Existing Biological Databases. • Some of the existing database systems that support or have grown out of the Human Genome Project include: • GenBank – The preeminent DNA sequence database in the world today is GenBank, maintained by the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM). – Established in 1978 as a repository for DNA sequence data. – It has since expanded to include sequence tag data, protein sequence data, three-dimensional protein structure, taxonomy, and links to the medical literature (MEDLINE).
  • 18. – GenBank contains over 31 billion nucleotide bases in more than 24 million sequences from over 100,000 species, with roughly 1,400 new organisms added each month. – The database size in flat-file format is over 100 GB uncompressed and has been doubling every 15 months. – The system is maintained as a combination of flat files, relational databases, and files in Abstract Syntax Notation One (ASN.1), a standard notation for defining data structures together with rules for encoding and decoding them.
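A doubling time translates directly into a growth formula, size(t) = size0 × 2^(t / 15) with t in months. The sketch below works through that arithmetic, assuming (purely for illustration) that the 15-month doubling rate quoted above simply continued; it is not a forecast of actual GenBank growth.

```python
import math


def projected_size_gb(current_gb: float, months: float,
                      doubling_months: float = 15.0) -> float:
    """Size after `months`, assuming a constant doubling time."""
    return current_gb * 2 ** (months / doubling_months)


def months_to_reach(current_gb: float, target_gb: float,
                    doubling_months: float = 15.0) -> float:
    """Months until `target_gb` is reached, under the same assumption."""
    return doubling_months * math.log2(target_gb / current_gb)


if __name__ == "__main__":
    print(projected_size_gb(100, 60))            # 1600.0: four doublings in five years
    print(round(months_to_reach(100, 1000), 1))  # about 49.8 months to grow tenfold
```

Five years at this rate means a sixteenfold increase, which helps explain why the system mixes flat files, relational databases, and ASN.1 rather than relying on a single storage format.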
  • 19. • The Genome Database (GDB) – Created in 1989, GDB is a catalog of human gene mapping data; gene mapping is the process of associating a piece of information with a particular location on the human genome. – The GDB system is built around Sybase, a commercial relational DBMS, and its data are modeled using standard Entity-Relationship techniques. – GDB distributes a Database Access Toolkit.
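To make "Entity-Relationship modeling of gene mapping data" concrete, here is a minimal sketch of how a gene entity and its map locations might become relational tables. It uses Python's built-in `sqlite3` as a stand-in for Sybase, and the table names, columns, gene symbol `GENE_A`, and location values are all hypothetical placeholders, not GDB's actual schema or data.

```python
import sqlite3

# Hypothetical schema: one gene entity, one map-location entity, linked by a foreign key.
SCHEMA = """
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE
);
CREATE TABLE map_location (
    location_id INTEGER PRIMARY KEY,
    gene_id     INTEGER NOT NULL REFERENCES gene(gene_id),
    chromosome  TEXT NOT NULL,
    band        TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database populated with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene (gene_id, symbol) VALUES (1, 'GENE_A')")
    conn.execute(
        "INSERT INTO map_location (gene_id, chromosome, band) VALUES (1, '7', 'q31')"
    )
    return conn


def locations_for(conn: sqlite3.Connection, symbol: str) -> list:
    """Join the two entities to find where a gene has been mapped."""
    return conn.execute(
        "SELECT m.chromosome, m.band FROM gene AS g "
        "JOIN map_location AS m ON m.gene_id = g.gene_id WHERE g.symbol = ?",
        (symbol,),
    ).fetchall()


if __name__ == "__main__":
    print(locations_for(build_demo_db(), "GENE_A"))  # [('7', 'q31')]
```

Keeping locations in their own table lets one gene carry several mapped positions (for example, from different studies) without duplicating the gene record.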
  • 20. Online Mendelian Inheritance in Man • Online Mendelian Inheritance in Man (OMIM) is an electronic collection of information on the genetic basis of human disease. • In 1991 its administration was transferred from Johns Hopkins University to the NCBI (National Center for Biotechnology Information), and the entire database was converted to NCBI’s GenBank format. Today it contains more than 14,000 entries.
  • 21. EcoCyc – The Encyclopedia of Escherichia coli Genes and Metabolism (EcoCyc) is a recent experiment in combining information about the genome and the metabolism of E. coli K-12 (a bacterium). – The database was created in 1996 as a collaboration between Stanford Research Institute and the Marine Biological Laboratory.
  • 22. Gene Ontology – The Gene Ontology (GO) Consortium was formed in 1998 as a collaboration among three model organism databases: FlyBase, Mouse Genome Informatics (MGI), and the Saccharomyces (yeast) Genome Database (SGD). • The goal is to produce a structured, precisely defined, common, controlled vocabulary for describing the roles of genes and gene products in any organism. • The latest release of the GO database has over 13,000 terms and more than 18,000 relationships between terms. • GO was implemented using MySQL, an open-source relational database, and a monthly database release is available in SQL and XML (Extensible Markup Language) formats.
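The "relationships between terms" in GO form a directed acyclic graph: a term can have more than one parent, so the vocabulary is not a simple tree. A common query is "all ancestors of a term", for example to count a gene annotated to a specific term as also annotated to every more general term. The sketch below uses made-up term IDs (`T:1` to `T:4`) and a plain parent map; it is not GO's actual data or database schema.

```python
from collections import deque

# Hypothetical mini-ontology: child term -> list of parent terms.
# T:4 has two parents, which a tree could not represent.
PARENTS = {
    "T:4": ["T:2", "T:3"],
    "T:2": ["T:1"],
    "T:3": ["T:1"],
    "T:1": [],  # root
}


def ancestors(term: str, parents: dict) -> set:
    """All terms reachable by following parent links (breadth-first)."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:  # shared ancestors (like T:1) are visited once
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("T:4", PARENTS)))  # ['T:1', 'T:2', 'T:3']
```

In a relational implementation the same parent map would typically be a two-column table of (child, parent) pairs, with the traversal done in application code or precomputed into a transitive-closure table.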
  • 23. Summary of the Major Genome-Related Databases
  • 24. Various Branches Benefited. • Medicine • Pharmacogenomics • Biotechnology • Bioinformatics • Proteomics