Lewis isb 7 april2014

  1. THE WORLD OF BIOCURATION: OPTIMIZING ITS IMPACT. April 7, 2014, Seventh International Biocuration Conference
  2. WHAT IS A BIOCURATOR? Someone who is responsible for the care and supervision of biological knowledge resources and their use
  3. WHAT DO BIOCURATORS DO TODAY? • Credits to Kaveh Bazargan ᔥ • @kaveh1000
  4. FRUIT IN FOOD PROCESSOR
  5. SMOOTHIE
  6. RESEARCH
  7. RESEARCH IN WORD PROCESSOR
  8. PDF
  9. FRUIT??
  10. RESEARCH???
  11. RESEARCH?? YOU, THE BIOCURATOR
  12. BIOCURATORS OF THE WORLD UNITE! • You have nothing to lose but your PDF files
  13. THE WORLD OF BIOCURATION: OUR ROLE IN THE RESEARCH LIFE CYCLE
  14-15. DESIGNING EXPERIMENTS (http://www.nbcnews.com/id/49258816/ns/technology_and_science-science/t/live-concert-microbial-data-turned-song-lab/#.UzSB9ceT4_E)
  16. COLLECTING DATA (http://www.langdonbiology.org/AP/labs/Notebook/AP_notebook.htm)
  17. WRITING UP RESULTS (Thomas Nast, http://www.victorianweb.org/art/illustration/nast/51.jpg)
  18. REVIEWING CONCLUSIONS (http://rrresearch.fieldofscience.com/2012_02_01_archive.html)
  19. CAPTURING KNOWLEDGE
  20. ISB: CAPTURING KNOWLEDGE; DESIGNING EXPERIMENTS; COLLECTING DATA; REVIEWING CONCLUSIONS; WRITING UP RESULTS
  21. BIOCURATION INVERSION: ~300 BIOCURATORS. DESIGNING EXPERIMENTS; COLLECTING DATA; WRITING UP RESULTS; REVIEWING CONCLUSIONS; CAPTURING KNOWLEDGE. HUNDREDS OF THOUSANDS OF GRAD STUDENTS, POST-DOCS, LABORATORIES, JOURNALS (http://www.nsf.gov/statistics/nsf13331/pdf/nsf13331.pdf)
  22. IN THE LAB
  23. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats
  24. SUPPORT STANDARDS, THEY'RE OUR FRIEND • November, 1999 • 45 biologists • 14 days • 140 megabases of Drosophila genome • Published in March 2000. GENE ONTOLOGY, ET AL.
  25-30. QUEST FOR ORTHOLOGS • 30 phylogenomic databases • Vary in number of species, taxonomic range, sampling density, and methodology • Joint benchmarking effort • Only possible through the use of shared reference proteomes and formats (questfororthologs.org/, www.ebi.ac.uk/reference_proteomes)
  31. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Develop and follow guidelines (paper and web-based) • e.g. Gaudet, P., et al. Towards BioDBcore: a community-defined information specification for biological databases. Database 2011. PMCID: PMC3017395 • Resource Identification Initiative • www.force11.org/Resource_identification_initiative • Vasilevsky NA, et al. On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ. 2013 Sep 5;1:e148. doi: 10.7717/peerj.148. PubMed PMID: 24032093; PubMed Central PMCID: PMC3771067.
  32. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Embed community-accepted standards in the lab environment
  33-34. KNOCKOUT MOUSE PROJECT 2 • Broad standardized phenotyping of knockout mice on a standard genetic background • Data collection from many centres • www.mousephenotype.org • Cindy Smith
  35. PROTOCOLS ARE STANDARDIZED; REQUIRE USE OF PARTICULAR ONTOLOGY TERMS TO DESCRIBE PHENOTYPE
  36. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Embed community-accepted standards in the lab environment • Work with labs to embed standards into their data generation pipeline
  37. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Embed community-accepted standards in the lab environment • Stealth standards
  38-39. STANDARDS THROUGH UTILITY: APOLLO. CSIRO VIDEO; DEMO AT GENOMEARCHITECT.ORG
  40-45. TOOLS FOR THE COMMUNITY • Web-based so researchers anywhere have access • Concurrent access supports real-time collaboration • Built-in support for standards (transparently compliant) • Automatic generation of ready-made computable data • Client-side application relieves server bottleneck and supports privacy
  46. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Embed community-accepted standards in the lab environment • Stealth standards • Re-purpose internal curation tools for external users • Provide on-line documentation, hands-on training and rapid-response user help • Work with educators to make these tools an integral part of the curriculum • e.g. CACAO (Critical Assessment of Community Annotation using Ontologies), ecoliwiki.net/colipedia/index.php/CACAO_0.1 • DNA subway (Apollo)
  47. SUBMISSION
  48. SUBMITTING DATA IN A STRUCTURED WAY • CANTO: curation.pombase.org • Structured Digital Abstracts • Identifiers for all named genes, proteins, metabolites or other objects in the article • Main results described in simple ontology terms • Experimental evidence types • Not only a synopsis of the results but computer-readable • Gerstein, M., et al. Structured digital abstract makes text mining easy. Nature 447, 142 (10 May 2007) | doi:10.1038/447142a. • Minimal Information reporting guidelines • http://mibbi.sourceforge.net/portal.shtml
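The structured digital abstract described on slide 48 could be represented roughly as below. This is a minimal sketch, not the format proposed by Gerstein et al. or used by CANTO: the class names, field names, relation name, and every identifier (`EX:...`, `EXAMPLE_EVIDENCE`) are hypothetical placeholders chosen for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class Finding:
    """One main result, stated with identifiers instead of free text."""
    subject_id: str     # identifier of a gene, protein, or metabolite
    relation: str       # simple relation name (hypothetical vocabulary)
    object_id: str      # ontology term identifier
    evidence_type: str  # experimental evidence type


@dataclass
class StructuredAbstract:
    article_id: str
    entities: Dict[str, str] = field(default_factory=dict)  # name -> identifier
    findings: List[Finding] = field(default_factory=list)

    def unidentified(self) -> List[str]:
        """Return finding subjects that have no declared identifier."""
        declared = set(self.entities.values())
        missing: List[str] = []
        for finding in self.findings:
            if finding.subject_id not in declared and finding.subject_id not in missing:
                missing.append(finding.subject_id)
        return missing

    def to_json(self) -> str:
        """Serialize to JSON so the abstract is computer-readable."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder record: none of these identifiers refer to real database entries.
abstract = StructuredAbstract(
    article_id="EX:article-1",
    entities={"example gene": "EX:gene-1"},
    findings=[Finding("EX:gene-1", "involved_in", "EX:term-1", "EXAMPLE_EVIDENCE")],
)
print(abstract.to_json())
```

A submission tool could refuse a record while `unidentified()` is non-empty, which is one way to enforce "identifiers for all named objects" at the point of submission.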
  49-54. PUBLISHING • First there were letters • Then Henry Oldenburg created the first scientific journal in 1665 • Result: too much to absorb • Washed away on the sea of information
  55. CONSEQUENTLY… PEER AND EDITORIAL REVIEW BECAME A FILTER
  56. THE MEDIUM OF PUBLICATION IS CHANGING • Figshare: figshare.org • iDigBio: www.idigbio.org • Dryad: datadryad.org • eLife: www.elifesciences.org • Unlike journal articles, the scale of web-native publishing may overwhelm attempts at manual curation (using current strategies)
  57. DO WE NEED TO CURATE?
  58. SOME SAY NO… “…powerful, online filters will distill communities' impact judgements algorithmically” (Scholarship: Beyond the paper. Jason Priem. Nature 495, 437–440 (28 March 2013))
  59. DO WE NEED TO CURATE? • Resolution of differences • Clarity, eliminating noise • Validation & design of automated methods
  60. EVEN A PLACE LIKE GOOGLE USES CURATORS (*AND SOFTWARE) • Hundreds of operators per country • Multiple kinds of errors: overlapping jurisdictions, accidental merges, road maps to satellite images mismatch, etc. • Every road that you see has been hand-massaged • http://www.theatlantic.com/technology/archive/2012/09/how-google-builds-its-maps-and-what-it-means-for-the-future-of-everything/261913/
  61. DO WE NEED TO CURATE? • Resolution of differences • Clarity, eliminating noise • Validation & design of automated methods
  62-66. CLARITY • Answer boxes: Quick answers to concrete questions • Much of this information comes from Freebase, which is structured in terms of entities and properties (Robert West, et al. Knowledge Base Completion via Search-Based Question Answering. http://www.cs.ubc.ca/~murphyk/Papers/www14.pdf WWW’14 April 7–11, 2014, Seoul, Korea. ACM 978-1-4503-2744-2/14/04. DOI:2568032)
  67. DO WE NEED TO CURATE? • Resolution of differences • Clarity, eliminating noise • Validation & design of automated methods
  68. LITERATURE IS INFORMATIVE BUT IS NOT INFORMATION • PDF is still the dominant form of distribution • PDF “Annotation” • UTOPIA, www.utopiadocs.com • DOMEO, swan.mindinformatics.org • Textpresso, www.textpresso.org • All of these are still lacking domain specifics (or need to be taught) • FORCE11, www.force11.org • Common goal is advancing scientific communications • Beyond the PDF
  69-75. VALIDATION AND DESIGN OF AUTOMATED METHODS • Requires trusted reference datasets! • Biocurators are partners with developers! • Write/modify software; run the algorithm; evaluate results
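The "evaluate results" step of the loop on slides 69-75 depends on a trusted, curator-built reference set. A minimal sketch of that step is below, assuming annotations can be compared as exact (gene, term) pairs; the function name and the placeholder pairs are hypothetical, and real evaluations often need partial credit for near-miss ontology terms, which this does not attempt.

```python
def evaluate(predicted, reference):
    """Score predicted annotations against a curated reference set.

    Both arguments are iterables of hashable annotations, here (gene, term)
    pairs. Returns precision, recall, and F1 for exact matches only.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder data: the curated reference has three annotations,
# the automated method proposed two, one of which is correct.
reference_set = {("geneA", "term1"), ("geneB", "term2"), ("geneC", "term3")}
predicted_set = {("geneA", "term1"), ("geneB", "term9")}

scores = evaluate(predicted_set, reference_set)
print(scores)
```

Re-running the same evaluation after each software change gives developers and curators a shared number to track, which is the partnership the slide points to.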
  76. DO WE NEED TO CURATE? “…powerful, online filters will distill communities' impact judgements algorithmically” (Scholarship: Beyond the paper. Jason Priem. Nature 495, 437–440 (28 March 2013))
  77. DO WE NEED TO CURATE? “‘Big data hubris’ is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.” (The parable of Google Flu: traps in big data analysis. David Lazer et al. Science 14 March 2014: Vol. 343 no. 6176 pp. 1203-1205)
  78-79. DO WE NEED TO CURATE? • Yes • But…
  80. OUR STRENGTH IS IN QUALITY OF THE INFORMATION WE CAN PROVIDE: SYSTEMATIC REVIEW & CRITICISM IS REQUIRED
  81. HOW CAN BIOCURATORS ADDRESS CRITICISMS? “…literature curated datasets have inherent reliability difficulties…” (Cusick, M., et al. Literature-curated protein interaction datasets. Nat Methods. Jan 2009; 6(1): 39–46. PMCID: PMC2683745)
  82. THE RISK (BY ANALOGY): Greenberg, S., How citation distortions create unfounded authority: analysis of a citation network. BMJ July 2009; 339. DOI: http://dx.doi.org/10.1136/
  83. WE'RE RESPONSIBLE FOR THE QUALITY • “Reviewing the quality of the data is an obligation of any entity that assumes responsibility over the data.” • Limor Peer et al., IDCC 2014
  84. PAINT APOPTOSIS: SUMMARY • 52 families annotated: 8 were participants in execution phase of apoptosis • 44 others are either: A. upstream of apoptosis, B. phenotypes, C. targets
  85-87. Example 1: Protein (cytochrome c) upstream of apoptosis execution. Cytochrome c is directly involved in apoptotic DNA fragmentation ➢ [Cells] – [cytochrome c] = No apoptotic DNA fragmentation ➢ [Cells] – [cytochrome c] + [cytochrome c] = apoptotic DNA fragmentation
  88. Example 2: Phenotype of reduced cell survival and increased DNA fragmentation • E3 ubiquitin-protein ligase TRAF7 was annotated to execution phase of apoptosis ➢ Exogenous expression of TRAF7 ➢ No other data in terms of where in apoptosis this may be. ➢ All we know is altering TRAF7 levels affects apoptosis.
  89-91. Example 3: Target. DSG2 was annotated to execution phase of apoptosis. DSG2 is a *target* of a protease (caspase), and although its degradation indeed seems to be a part of apoptosis, it does not *mediate* apoptosis.
  92. PROVE THE NEED FOR BIOCURATION • Publish: Quantitative improvements before/after • Publish: Curator consistency studies • Publish: Independent external reviews
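One way a curator consistency study could be quantified is with a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two curators who each assign one category per item. The choice of kappa is an assumption, not something the slides specify, and the labels are hypothetical, borrowing category names from the apoptosis review above.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two curators labelling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both curators pick the same category
    # if each labelled independently at their own category rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both curators used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four protein families from two curators.
curator_1 = ["execution", "upstream", "target", "execution"]
curator_2 = ["execution", "upstream", "phenotype", "execution"]

kappa = cohens_kappa(curator_1, curator_2)
print(round(kappa, 3))
```

Raw percent agreement would be 75% here; kappa is lower because some of that agreement is expected by chance.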
  93. RECOGNITION & CREDIT: ORCID.ORG
  94. ENABLING RESEARCH
  95-100. WHAT IS A BIOCURATOR? • A highly skilled and trained keeper of our biological heritage of knowledge. • A content specialist who understands the research and can succinctly distill biological research results into computable data • Considers the ease of finding this information, its relatedness to other information, and its research and educational usability
  101. MODELS RECAPITULATE VARIOUS PHENOTYPIC ASPECTS OF DISEASE. GENOTYPE: PHENOTYPE • B6.Cg-Alms1foz/fox/J: increased weight, adipose tissue volume, glucose homeostasis altered • ALMS1(NM_015120.4) [c.10775delC] + [-]: obesity, diabetes mellitus, insulin resistance • kcnj11c14/c14; insrt143/+(AB): increased food intake, hyperglycemia, insulin resistance
  102. MODELS RECAPITULATE VARIOUS PHENOTYPIC ASPECTS OF DISEASE. GENOTYPE: PHENOTYPE • B6.Cg-Alms1foz/fox/J: increased weight, adipose tissue volume, glucose homeostasis altered • ?: obesity, diabetes mellitus, insulin resistance • kcnj11c14/c14; insrt143/+(AB): increased food intake, hyperglycemia, insulin resistance
  103. RESEARCH RESOURCES (Doelken S C et al. Dis. Model. Mech. 2013;6:358-372)
  104. CROSS-SPECIES PHENOTYPE COMPARISONS BY SEMANTIC SIMILARITY (Smedley D et al. Database. 2013; bat025. Mungall CJ et al. Genome Biol. 2010; 11(1):R2. Washington N et al. PLoS Biol 2009; e1000247)
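To give a feel for what comparing phenotypes by semantic similarity means, the sketch below scores two phenotype profiles by Jaccard overlap of their ontology terms plus all ancestors. This is one simple measure chosen for illustration, not necessarily the one used in the cited papers, and the tiny hierarchy in `PARENTS` is a made-up toy, not an excerpt from any real phenotype ontology.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def closure(profile, parents):
    """Expand a set of terms to include every ancestor of every term."""
    expanded = set()
    for term in profile:
        expanded |= ancestors(term, parents)
    return expanded


def jaccard_similarity(profile_a, profile_b, parents):
    """Shared terms divided by all terms, after ancestor expansion."""
    a, b = closure(profile_a, parents), closure(profile_b, parents)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy is-a hierarchy (child -> parents); labels are illustrative only.
PARENTS = {
    "obesity": ["increased body weight"],
    "increased body weight": ["growth/size phenotype"],
    "insulin resistance": ["abnormal glucose homeostasis"],
    "hyperglycemia": ["abnormal glucose homeostasis"],
    "abnormal glucose homeostasis": ["metabolism phenotype"],
}

model_profile = {"increased body weight", "abnormal glucose homeostasis"}
disease_profile = {"obesity", "insulin resistance"}

similarity = jaccard_similarity(model_profile, disease_profile, PARENTS)
print(round(similarity, 3))
```

The two profiles share no term literally, yet they score well above zero because their terms meet at common ancestors. That is why curated ontology annotations, rather than free-text descriptions, make cross-species comparison possible.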
  105. CANDIDATE GENE PRIORITIZATION
  106. PHENOTYPIC INTERPRETATION OF VARIANTS IN EXOMES (PHIVE) • Whole exome • Remove off-target and common variants • Variant score from allele freq and pathogenicity • Phenotype score from phenotypic similarity • PhenIX/PhIVE score to give final candidates • http://monarchinitiative.org
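The steps listed on slide 106 can be sketched as a small filter-and-rank pipeline. Everything numeric here is an assumption made for illustration: the 1% frequency cut-off, the variant score formula, and the plain average used to combine scores are not taken from the slides or from the published PhIVE/PhenIX methods, and the gene names and values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool
    allele_freq: float    # population allele frequency, 0..1
    pathogenicity: float  # predicted pathogenicity, 0..1


def variant_score(variant):
    """Assumed formula: rarer and more pathogenic variants score higher."""
    return variant.pathogenicity * (1.0 - variant.allele_freq)


def prioritize(variants, phenotype_scores, max_freq=0.01):
    """Filter variants, then rank genes by a combined score.

    phenotype_scores maps gene -> phenotypic similarity (0..1) between the
    patient and known phenotypes for that gene; missing genes score 0.
    """
    best = {}
    for variant in variants:
        # Step 1: remove off-target and common variants.
        if not variant.on_target or variant.allele_freq > max_freq:
            continue
        # Steps 2-4: combine variant and phenotype scores (assumed: average).
        phenotype = phenotype_scores.get(variant.gene, 0.0)
        combined = (variant_score(variant) + phenotype) / 2.0
        best[variant.gene] = max(combined, best.get(variant.gene, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Placeholder exome: B is too common, C is off-target.
exome = [
    Variant("GENE_A", True, 0.0, 0.9),
    Variant("GENE_B", True, 0.2, 0.95),
    Variant("GENE_C", False, 0.0, 0.99),
    Variant("GENE_D", True, 0.001, 0.5),
]
phenotype_similarity = {"GENE_A": 0.8, "GENE_D": 0.2}

candidates = prioritize(exome, phenotype_similarity)
print(candidates)
```

The phenotype score is where curated model-organism and disease annotations enter: without them, ranking would rest on the variant score alone.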
  107. CONFIRMED DIAGNOSES • Infantile Parkinsonism-dystonia • Wiedemann Steiner syndrome • de novo SYNGAP1 mutation leading to autosomal dominant mental retardation • Frank-ter Haar syndrome • Infantile hypophosphatasia • … (~28%)
  108-113. RELATEDNESS ACROSS BIOLOGY • Bio-Curator, not bio-Archivist • Actively trying to represent current best understanding • Support interoperability • Support research and educational usability • Support inference • Not just for supporting searches, not just for finding PDF/online papers!
  114-118. WHAT CAN BE DONE?
  119-121. BIODIVERSITY DATA JOURNAL: FROM WRITING, SUBMISSION, PEER-REVIEW, EDITING, PUBLICATION TO DISSEMINATION
  122-126. WHAT CAN ISB DO? • Tangible support of standards efforts • QfO, RII, MI, publish guidelines, validators … • Create a curation mindset across the entire life cycle • Support embedded/repurposed software, education, actively engage with text-miners, provide on-line support … • Prove the necessity for curation • Publish studies, greater emphasis on review and quality (assessment) • Work with traditional publishers • FORCE11, structured submissions
  127. WHAT CAN YOU DO? • Consider • The ease of finding information • Its relatedness to other information • Its research and educational usability
  128. RESEARCH?? YOU, THE BIOCURATOR. ISB
  129. ACKNOWLEDGEMENTS AND THANKS: YOU ARE NOT ALONE
