Curation
Ewan Birney (tweetable)
Who am I?
• Associate Director at the European Bioinformatics Institute (EBI)
• Involved in genomics since I was 19 (> 20 years!)
• Trained as a biochemist – most people think I am CS
• Analysed – sometimes led – human/mouse/rat/platypus etc. genomes, ENCODE, others.
EBI is in Hinxton, South Cambridgeshire
EBI is part of EMBL, ~like CERN for molecular biology
Molecular Biology
• The study of how life works – at a molecular level

• Key molecules:
  • DNA – Information store (Disk)
  • RNA – Key information transformer, also does stuff (RAM)
  • Proteins – The business end of life (Chip, robotic arms)
  • Metabolites – Fuel and signalling molecules (electricity)
• Theories of how these interact – no theories to predict what
  they are
• Instead we determine attributes of molecules and store them in
  globally accessible, open, databases
Theory  Observation


                    Can accurately predict from models




 Must directly observe
    Molecular Geology,  Climate        High Energy
    Biology   Astronomy modelling      Physics
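This ratio is not well correlated with data size
[Figure: data size (vertical axis, marked ~5PB and ~60PB) plotted against ratio of model predictability (horizontal axis). High Energy Physics sits near ~60PB, Climate Models near ~5PB, with Molecular Biology and Astronomy in between]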
“Knowing stuff” is critical to biology…

• The bases of the human genome
  • … and the Mouse, Rat, Wheat, E. coli, Plasmodium, Cow….
• The functions of proteins
  • Enzymes, Transcription Factors, Signalling….
• The types of cells, their lineages and organ composition
  • …and all the molecular components in each cell
• Small molecules
  • … and their conversions, binding partners
• Structures of molecules, complexes and cells
  • … at atomic and higher resolution
Two fundamental types of information

• Experimental data
  • The result of a specific experiment
  • Often an experiment-specific, data-heavy part plus a “meta-data” part
  • Might be contradictory
  • “Primary paper”
• Consensus Knowledge
  • Integration of different strands of information on a topic
  • Realised as a computationally accessible scheme
  • “Review article”
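
The split between the two types can be sketched in code. This is only an illustrative toy, not how any EBI resource works: the `ExperimentalRecord` fields, the `consensus` function and the "most-reported assertion wins" rule are all assumptions made up for this example. It shows experimental records (data plus meta-data, possibly contradictory) being integrated into one computationally accessible summary per topic.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ExperimentalRecord:
    """One experiment: a data-heavy part plus its meta-data (hypothetical layout)."""
    source: str      # identifier of the primary paper (made up here)
    subject: str     # the topic the experiment is about
    assertion: str   # what this experiment reports
    metadata: dict = field(default_factory=dict)
    measurements: list = field(default_factory=list)


def consensus(records):
    """Toy integration rule: per subject, keep the most-reported assertion
    and list which sources support it and which conflict with it."""
    by_subject = {}
    for rec in records:
        by_subject.setdefault(rec.subject, []).append(rec)

    summary = {}
    for subject, recs in by_subject.items():
        counts = Counter(r.assertion for r in recs)
        top, _ = counts.most_common(1)[0]
        summary[subject] = {
            "assertion": top,
            "supporting": sorted(r.source for r in recs if r.assertion == top),
            "conflicting": sorted(r.source for r in recs if r.assertion != top),
        }
    return summary


# Three made-up experiments, one of which contradicts the others.
records = [
    ExperimentalRecord("paper-1", "protein-X", "binds protein-Y", {"sample": "liver"}),
    ExperimentalRecord("paper-2", "protein-X", "binds protein-Y", {"sample": "kidney"}),
    ExperimentalRecord("paper-3", "protein-X", "does not bind protein-Y", {"sample": "liver"}),
]
result = consensus(records)
print(result["protein-X"])
```

A real curator weighs evidence quality rather than counting votes; the point of the sketch is only that the consensus layer keeps links back to the individual, possibly conflicting, experiments.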
Five types of curation
Experimental Data Entry

• IntAct – Protein:Protein
  interactions


• GWAS Catalog –
  extraction of summary
  statistics
Experimental Meta data capture

• Sample, CDS lines in
  ENA
• Sample in MetaboLights,
  PRIDE etc
• Machine and analysis
  specification in PDB,
  PRIDE, ENA
Consensus integration of information

• GENCODE gene models in
  human
• Summaries and GO
  assignment in UniProt
• Pathway information in
  Reactome
• GO assignment and
  summaries in MODs (eg,
  PomBase, WormBase,
  PhytoPathDB etc)
Knowledge frameworks

•   The EC classification
•   Cell type ontologies
•   Cell lineages – Worms!
•   SNOMED, HPO etc
•   GO ontologies
Knowledge management

• Creation of rules
  representing ENA
  standards compliance
• Cross-ontology
  coordination (eg, EFO) or
  tying (GO ↔ ChEBI)
• RuleBase / UniRule
  curation processes
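
As a rough illustration of what “rules representing standards compliance” could look like, here is a minimal sketch. The rule names, the metadata fields and the `check_compliance` helper are all hypothetical; they are not ENA's actual checklist nor the RuleBase / UniRule formats. The idea is simply that each standard becomes a named, machine-checkable predicate.

```python
# Each rule is (name, predicate over a metadata dict). All names are hypothetical.
RULES = [
    ("has_sample_id", lambda m: bool(m.get("sample_id"))),
    ("has_organism", lambda m: bool(m.get("organism"))),
    ("collection_year_is_plausible",
     lambda m: isinstance(m.get("collection_year"), int)
     and 1900 <= m["collection_year"] <= 2100),
]


def check_compliance(metadata, rules=RULES):
    """Return the names of the rules that this metadata record fails."""
    return [name for name, predicate in rules if not predicate(metadata)]


# Made-up submissions: one complete, one with an empty ID and no year.
good = {"sample_id": "S1", "organism": "Homo sapiens", "collection_year": 2012}
bad = {"sample_id": "", "organism": "Mus musculus"}

print(check_compliance(good))
print(check_compliance(bad))
```

Keeping the rules as data rather than burying them in scripts means curators can review and extend them without touching the checking code.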
Data Entry vs Programming
[Figure: a spectrum from “Direct Data Entry” to “Programmatic Data Entry”. In between, moving from the direct end: improved data entry tools; “Messy” scripting; RuleBase and computationally accessible standards]
Thank You!
Curation Dilemma

• If you do your job well…
  • Everyone assumes it’s easy
  • People forget about the complexity
  • You are ignored
• If you do your job badly…
  • Everyone assumes it’s easy
  • People forget about the complexity
  • People complain
Why we need an infrastructure…
Infrastructures are critical…
But we only notice them when they go wrong
Biology already needs an information infrastructure

• For the human genome
  • (…and the mouse, and the rat, and… x 150 now, 1000 in the
    future!) - Ensembl
• For the function of genes and proteins
  • For all genes, in text and computational – UniProt and GO
• For all 3D structures
  • To understand how proteins work – PDBe
• For where things are expressed
  • The differences and functionality of cells - Atlas
…But this keeps on going…

• We have to scale across all of (interesting) life
  • There are a lot of species out there!
• We have to handle new areas, in particular medicine
  • A set of European haplotypes for good imputation
  • A set of actionable variants in germline and cancers
• We have to improve our chemical understanding
  • Of biological chemicals
  • Of chemicals which interfere with Biology
ELIXIR’s mission
To build a sustainable European infrastructure for biological information, supporting life science research and its translation to:
• medicine
• environment
• bioindustries
• society
How?

• Fully Centralised
  • Pros: stability, reuse, ease of learning
  • Cons: hard to concentrate expertise across all of life science; geographic and language placement; bottlenecks and lack of diversity
• Fully Distributed
  • Pros: responsive; geographic and language responsive
  • Cons: internal communication overhead; harder for end users to learn; harder to provide multi-decade stability
Research vs Healthcare

• Research
  • International
  • EBI / ELIXIR
  • English
  • Low legalities
• Healthcare
  • National
  • National healthcare
  • National language
  • Complex legalities
Other infrastructures needed for biology
• EuroBioImaging
  • Cellular and whole organism Imaging
• BioBanks (BBMRI)
  • We need numbers – European populations – in particular for rare
    diseases, but also for specific sub types of common disease
• Mouse models and phenotypes (Infrafrontier)
  • A baseline set of knockouts and phenotypes in our most tractable
    mammalian model
  • (it’s hard to prove something in human)
• Robust molecular assays in a clinical setting (EATRIS)
  • The ability to reliably use state of the art molecular techniques in a
    clinical research setting
(you can follow me on twitter @ewanbirney)
I blog and update this on Google Plus publicly

Ewan Birney Biocuration 2013

 
Knee anatomy and clinical tests 2024.pdf
Knee anatomy and clinical tests 2024.pdfKnee anatomy and clinical tests 2024.pdf
Knee anatomy and clinical tests 2024.pdf
 
Thyroid Gland- Gross Anatomy by Dr. Rabia Inam Gandapore.pptx
Thyroid Gland- Gross Anatomy by Dr. Rabia Inam Gandapore.pptxThyroid Gland- Gross Anatomy by Dr. Rabia Inam Gandapore.pptx
Thyroid Gland- Gross Anatomy by Dr. Rabia Inam Gandapore.pptx
 
heat stroke and heat exhaustion in children
heat stroke and heat exhaustion in childrenheat stroke and heat exhaustion in children
heat stroke and heat exhaustion in children
 
Novas diretrizes da OMS para os cuidados perinatais de mais qualidade
Novas diretrizes da OMS para os cuidados perinatais de mais qualidadeNovas diretrizes da OMS para os cuidados perinatais de mais qualidade
Novas diretrizes da OMS para os cuidados perinatais de mais qualidade
 
Pharynx and Clinical Correlations BY Dr.Rabia Inam Gandapore.pptx
Pharynx and Clinical Correlations BY Dr.Rabia Inam Gandapore.pptxPharynx and Clinical Correlations BY Dr.Rabia Inam Gandapore.pptx
Pharynx and Clinical Correlations BY Dr.Rabia Inam Gandapore.pptx
 
Sex determination from mandible pelvis and skull
Sex determination from mandible pelvis and skullSex determination from mandible pelvis and skull
Sex determination from mandible pelvis and skull
 

Ewan Birney Biocuration 2013

  • 2. Who am I? • Associate Director at European Bioinformatics Institute (EBI) • Involved in genomics since I was 19 (> 20 years!) • Trained as a biochemist – most people think I am CS • Analysed – sometimes led – human/mouse/rat/platypus etc. genomes, ENCODE, others. • EBI is in Hinxton, South Cambridgeshire • EBI is part of EMBL, ~like CERN for molecular biology
  • 3. Molecular Biology • The study of how life works – at a molecular level • Key molecules: • DNA – Information store (Disk) • RNA – Key information transformer, also does stuff (RAM) • Proteins – The business end of life (Chip, robotic arms) • Metabolites – Fuel and signalling molecules (electricity) • Theories of how these interact – no theories to predict what they are • Instead we determine attributes of molecules and store them in globally accessible, open databases
  • 4. Theory vs Observation [Figure: fields placed on a scale running from "Must directly observe" to "Can accurately predict from models": Molecular Biology; Geology, Astronomy; Climate modelling; High Energy Physics]
  • 5. This ratio is not well correlated with data size [Figure: Data Size (axis marked ~5PB and ~60PB) against Ratio of model predictability, with points for High Energy Physics, Molecular Biology, Astronomy and Climate Models]
  • 6. “Knowing stuff” is critical to biology… • The bases of the human genome • … and the Mouse, Rat, Wheat, E. coli, Plasmodium, Cow… • The functions of proteins • Enzymes, Transcription Factors, Signalling… • The types of cells, their lineages and organ composition • … and all the molecular components in each cell • Small molecules • … and their conversions, binding partners • Structures of molecules, complexes and cells • … at atomic and higher resolution
  • 7. Two fundamental types of information • Experimental data: • The result of a specific experiment • Often an experiment-specific, data-heavy part plus a “meta-data” part • Might be contradictory • “Primary paper” • Consensus Knowledge: • Integration of different strands of information on a topic • Realised as a computationally accessible scheme • “Review article”
  • 8. Five types of curation
  • 9. Experimental Data Entry • IntAct – Protein:Protein interactions • GWAS Catalog – extraction of summary statistics
  • 10. Experimental metadata capture • Sample, CDS lines in ENA • Sample in MetaboLights, PRIDE etc. • Machine and analysis specification in PDB, PRIDE, ENA
  • 11. Consensus integration of information • GENCODE gene models in human • Summaries and GO assignment in UniProt • Pathway information in Reactome • GO assignment and summaries in MODs (e.g. PomBase, WormBase, PhytoPathDB etc.)
  • 12. Knowledge frameworks • The EC classification • Cell type ontologies • Cell lineages – Worms! • SNOMED, HPO etc. • GO ontologies
  • 13. Knowledge management • Creation of rules representing ENA standards compliance • Cross-ontology coordination (e.g. EFO) or tying (GO and ChEBI) • RuleBase / UniRule curation processes
  • 14. Data Entry vs Programming [Figure: a scale running from Direct Data Entry to Programmatic Data Entry, labelled with: “Messy” Scripting; Improved data entry tools; RuleBase; Computationally Accessible Standards]
  • 16. Curation Dilemma • If you do your job well… • Everyone assumes it’s easy • People forget about the complexity • You are ignored • If you do your job badly… • Everyone assumes it’s easy • People forget about the complexity • People complain
  • 17. Why we need an infrastructure…
  • 19. But we only notice them when they go wrong
  • 20. Biology already needs an information infrastructure • For the human genome • (…and the mouse, and the rat, and… x 150 now, 1000 in the future!) - Ensembl • For the function of genes and proteins • For all genes, in text and computational – UniProt and GO • For all 3D structures • To understand how proteins work – PDBe • For where things are expressed • The differences and functionality of cells - Atlas
  • 21. ..But this keeps on going… • We have to scale across all of (interesting) life • There are a lot of species out there! • We have to handle new areas, in particular medicine • A set of European haplotypes for good imputation • A set of actionable variants in germline and cancers • We have to improve our chemical understanding • Of biological chemicals • Of chemicals which interfere with Biology
  • 22. ELIXIR’s mission To build a sustainable European infrastructure for biological information, supporting life science research and its translation to: medicine, environment, bioindustries, society
  • 23. How? • Fully Centralised: Pros: Stability, reuse, learning ease. Cons: Hard to concentrate expertise across all of life science; geographic, language placement; bottlenecks and lack of diversity • Fully Distributed: Pros: Responsive; geographic and language responsive. Cons: Internal communication overhead; harder for end users to learn; harder to provide multi-decade stability
  • 24. Research vs Healthcare • International vs National • EBI / Elixir vs Healthcare • English vs National Language • Low legalities vs Complex legalities
  • 25. Other infrastructures needed for biology • EuroBioImaging • Cellular and whole organism Imaging • BioBanks (BBMRI) • We need numbers – European populations – in particular for rare diseases, but also for specific sub types of common disease • Mouse models and phenotypes (Infrafrontier) • A baseline set of knockouts and phenotypes in our most tractable mammalian model • (it’s hard to prove something in human) • Robust molecular assays in a clinical setting (EATRIS) • The ability to reliably use state of the art molecular techniques in a clinical research setting
  • 26. (you can follow me on twitter @ewanbirney) I blog and update this on Google Plus publicly