3. Metadata formats are the new
data formats
• There are scores of bio-metadata efforts that
are not overlapping but actually kinda are
14 September, 2017 ICBO 2017
BIOCONTAINERS
4. Turns out I was wrong
• The last few years have only strengthened my
view that I was wrong
• At ICBO 2012 I was invited to sit on the
discussion panel
• Competition turns out not to be a good thing –
the bad stuff never goes away
5. Ontology developers need users more
than users need ontology developers
Bad | Good
Sectarianism | Competition
Subjective discrimination | Scientific evidence
Telling users they’re doing it wrong | Listening to users
Alienating communities | Encouraging discussion (even when we don’t agree)
Philosophy | Philosophy
Pepsi | Coca Cola
THE GOOD, THE BAD, AND THE FUNDING
My ICBO 2012 panel slide
6. Bad | Good
Mappings between standards | Standards
Schisms in clinical data world | Clinical adoption of ontologies
High barriers to entry | Helping share open data
Ownership credit | Community credit
Complexities in applying common framework* | Common framework
Talks that don’t mention metadata standards | Talks about metadata standards
My ICBO 2017 version
Clinic icon by ibrandify / Freepik
7. Past is prologue
• 1960 – MeSH first published
• 1965 – SNOMED born
• 1969 – ICD6 published
• 1994 – ICD10 came into use by WHO
• 1997 – Enterprise Vocabulary Services (EVS) Project was launched
• 1997 – Skynet becomes self-aware
• 1998 – Gene Ontology born
• 1999 – SNOMED-CT born
• 2001 – OBO born
• 2003 – NCI Thesaurus OWL published
• 2003 – Human Disease Ontology (then just disease ontology) born
• 2004 – Mammalian Phenotype ontology released
• 2004 – First ChEBI released
• 2005 – NCBO becomes NIH National Centre for Biomedical Computing
• c2006 – OBO Foundry established
• 2008 – First EFO development
• 2009 – First ICBO
• 2009-2017 – Rapid growth of new OBO bio-ontologies from 60 to 155
• 2017 – 20th Bio-ontologies meet at ISMB
8. Historical Perspective
Olivier Bodenreider and Robert Stevens (2006) Bio-ontologies: current trends and future
directions. Brief Bioinform. 2006 Sep; 7(3): 256–274.
11. Have we won yet?
• How do we measure success?
• The Gene Ontology is now seen as an everyday ‘tool’
– Embedded in many analysis tools
– Literature mining, tagging articles
– Gene set enrichment
• It has penetration: I have many conversations with
clinicians talking about GO codes or enrichment but
nothing about ‘ontologies’
• It has no competitor
• It is a de facto standard for gene function annotation
12. HPO not won yet..
• …but is probably 2-0 up at half-time
• HPO being adopted by major clinical projects
• Opportunity remains for other bio-ontologies..
13. What about biggest areas?
[Figure: bar chart of NIH spending in $ millions (axis $0M to $8,000M) for Neurosciences, Cancer, Infectious Diseases, Women's Health, Brain Disorders, Rare Diseases, Pediatric and Aging; series FY 2013 Actual to FY 2016 Actual, FY 2017 Estimated and FY 2018 Estimated. Source: https://report.nih.gov/categorical_spending.aspx]
• Disease remains a huge area of spending
14. “Which disease standard should
I use?”
[Figure: a "Clinical" label alongside ten resources, each labelled "Disease"]
17. With modest grant funding, what Open
Bio-ontologies have done is remarkable
• Protégé and OBO-Edit are excellent tools
• These are organisations backing the Allotrope efforts in data models and
ontologies
• Think of what could be achieved if they invested in OBO
• Question: why has this happened?
18. Format Wars
• “A format war describes competition between
mutually incompatible proprietary formats that
compete for the same market” (Wikipedia)
• What we think of as ‘best’ technically may not
be what the users think of as ‘best’
• Better quality picture and sound, seen as less
important than length of storage
• Large and unwieldy, prone to damage if not
handled carefully
• Supporting tools lacking, improving slowly
19. Winning the war
• Crossing pre-clinical and clinical is possible -
HPO and GO prove this can happen
• But there is a schism in disease resources
• What can bio-ontologies offer that other
terminologies cannot?
• What are our major challenges and how can
we overcome them?
• Where are our new opportunity areas?
20. Lessons from getting people to
adopt ontologies
• EFO is an application ontology built for
application focused use cases
• It works because it looks like people’s data,
i.e. it has a lot of common terms across
multiple areas
• EFO is really a ‘method’ for generating an
application ontology
• We need tools that replicate and enact this
method for applications
EFO knows
words, it has the
best words
21. Tech lessons from EFO
• Testing works – the ROBOT tool is now
invaluable
• Diffs can be ‘informative’, which gets us closer to
the sort of CI we see in software development
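As a rough illustration of what an 'informative' diff and a CI-style check might look like, here is a minimal plain-Python sketch. It is not ROBOT and does not use ROBOT's interface; the `Term` class, the `EX:` identifiers and the two rules in `check_release` (no silently removed terms, no duplicate labels) are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str  # hypothetical identifier, e.g. "EX:0000001"
    label: str


def diff_releases(old, new):
    """Compare two releases (lists of Term) by identifier."""
    old_ix = {t.term_id: t for t in old}
    new_ix = {t.term_id: t for t in new}
    shared = old_ix.keys() & new_ix.keys()
    return {
        "added": sorted(new_ix.keys() - old_ix.keys()),
        "removed": sorted(old_ix.keys() - new_ix.keys()),
        "relabelled": sorted(
            i for i in shared if old_ix[i].label != new_ix[i].label
        ),
    }


def check_release(old, new):
    """CI-style gate: return a list of problems (empty means pass)."""
    problems = []
    for term_id in diff_releases(old, new)["removed"]:
        problems.append(f"{term_id} removed without deprecation")
    seen = {}  # lower-cased label -> first term_id that used it
    for term in new:
        key = term.label.lower()
        if key in seen:
            problems.append(
                f"duplicate label {term.label!r}: {seen[key]} and {term.term_id}"
            )
        else:
            seen[key] = term.term_id
    return problems


# Made-up example data, not taken from any real ontology.
old = [
    Term("EX:0000001", "asthma"),
    Term("EX:0000002", "lung disease"),
    Term("EX:0000003", "fever"),
]
new = [
    Term("EX:0000001", "asthma"),
    Term("EX:0000002", "lung disorder"),
    Term("EX:0000004", "Asthma"),
]

changes = diff_releases(old, new)
problems = check_release(old, new)
for line in problems:
    print("FAIL:", line)
```

In a build pipeline the release would be blocked whenever `problems` is non-empty, which is the behaviour borrowed from software CI.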
22. Tech lessons from EFO
• Combining lots of bits of ontologies
(especially manually) is complex and can
easily go wrong
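One way this goes wrong can be sketched in a few lines: two imported fragments disagree about the same identifier, or introduce the same label under different identifiers. The code below is only an illustration under assumed data structures (each fragment is a plain dict of identifier to label); the source names and `EX:` identifiers are hypothetical.

```python
from collections import defaultdict


def merge_fragments(fragments):
    """Merge {source: {term_id: label}}; the first source to define an ID wins.

    Returns the merged terms plus a list of conflicts, where a conflict is
    (term_id, first_source, first_label, other_source, other_label).
    """
    merged, origin, conflicts = {}, {}, []
    for source, terms in fragments.items():
        for term_id, label in terms.items():
            if term_id in merged:
                if merged[term_id] != label:
                    conflicts.append(
                        (term_id, origin[term_id], merged[term_id], source, label)
                    )
                continue
            merged[term_id] = label
            origin[term_id] = source
    return merged, conflicts


def duplicate_labels(merged):
    """Labels (case-insensitive) that are attached to more than one ID."""
    by_label = defaultdict(list)
    for term_id, label in merged.items():
        by_label[label.lower()].append(term_id)
    return {lab: sorted(ids) for lab, ids in by_label.items() if len(ids) > 1}


# Made-up fragments standing in for imports from two source ontologies.
fragments = {
    "disease_import": {"EX:1": "asthma", "EX:2": "lung"},
    "anatomy_import": {"EX:2": "pulmonary organ", "EX:3": "asthma"},
}

merged, conflicts = merge_fragments(fragments)
duplicates = duplicate_labels(merged)
print(conflicts)
print(duplicates)
```

Running checks like these automatically on every merge is cheaper than discovering the clash after curators have already used the affected terms.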
23. Application driven approach in
other areas
• Two approaches in Cellular Phenotype ontologies
– Fully automated: GO x PATO
– User driven: on request
• Familiar modes of access are key
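The fully automated GO x PATO approach can be pictured as a cross product: every process term is paired with every quality term to generate a candidate phenotype class. The sketch below assumes that reading; the identifiers are placeholders rather than real GO or PATO IDs, and the label and definition patterns are invented for illustration.

```python
from itertools import product

# Placeholder IDs and labels; not real GO or PATO identifiers.
processes = {"XP:1": "cell cycle", "XP:2": "apoptotic process"}
qualities = {"XQ:1": "increased rate", "XQ:2": "decreased rate"}


def compose_phenotypes(processes, qualities):
    """Generate one candidate phenotype per (quality, process) pair."""
    terms = []
    for (q_id, q_label), (p_id, p_label) in product(
        qualities.items(), processes.items()
    ):
        terms.append(
            {
                "id": f"{q_id}+{p_id}",  # hypothetical composite ID
                "label": f"{q_label} of {p_label}",
                "definition": f"'{q_label}' that inheres in '{p_label}'",
            }
        )
    return terms


phenotypes = compose_phenotypes(processes, qualities)
for term in phenotypes:
    print(term["id"], "|", term["label"])
```

The trade-off against the user-driven approach is visible even here: automation yields every combination, including ones nobody will ever annotate with, while on-request creation yields only the terms users actually ask for.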
24. Use helps – an authority saying
‘use this one’ helps more
Open Targets Platform www.opentargets.org
25. Still plenty of existing challenges
- Engaging end users
Apr 2017 Ontologies in Agriculture..
Where now?
26. We’re open – but requesting
terms can be challenging
28. Exposing ontologies to users
• Class descriptions in
ontologies can be hard to
consume
• User should not see
that…
• …but how then can they
evaluate the fitness and
correctness of the class?
• Accessibility barrier – also
barrier to engagement?
29. How do we validate our models?
• Validation by expert is crucial
• Integration tests?
30. Familiar modes of access (Part II)
- making consumption easier
31. Challenges - Self-organisation,
self-publishing
• June Nature article highlighted abuse in taxa creation
• Self-publication considered ‘enough’ for name to be accepted
• As bio-ontologists we are creating knowledge
• As it is adopted and used with data, it becomes ‘truth’
• Who is our independent governance body?
32. Challenges - Mapping between
standards is not enough
• Error prone
• Ignores underlying mess
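To make 'error prone' concrete, here is a small sketch that flags mappings which are not one-to-one, a common sign that two standards carve up the domain differently. The mapping pairs and the `A:` and `B:` prefixes are made up; real mapping files carry more information (for example the type of match), which this sketch ignores.

```python
from collections import defaultdict


def ambiguous_mappings(mappings):
    """Split out one-to-many and many-to-one cases from (source, target) pairs."""
    forward = defaultdict(set)   # source term -> target terms
    backward = defaultdict(set)  # target term -> source terms
    for source, target in mappings:
        forward[source].add(target)
        backward[target].add(source)
    one_to_many = {s: sorted(ts) for s, ts in forward.items() if len(ts) > 1}
    many_to_one = {t: sorted(ss) for t, ss in backward.items() if len(ss) > 1}
    return one_to_many, many_to_one


# Hypothetical mappings between two standards "A" and "B".
mappings = [
    ("A:1", "B:1"),
    ("A:1", "B:2"),  # A:1 is split across two B terms
    ("A:2", "B:3"),
    ("A:3", "B:3"),  # two A terms collapse into one B term
]

one_to_many, many_to_one = ambiguous_mappings(mappings)
print("one-to-many:", one_to_many)
print("many-to-one:", many_to_one)
```

A check like this only surfaces the symptoms; the underlying disagreement between the standards is still there, which is the point of the slide.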
33. New challenges:
Self-reporting
• “Biologists” producing data will not be
scientists
• They will be patients, healthy subjects
• How do we get them to use ontologies?
• Simpler schemas exist – schema.org’s
success is built on its simplicity
• Are we doomed to always ‘curate later’?
34. New challenges: Supporting trust
• What does fake data look like?
• Do we have a role in helping spot it?
35. New challenges: Surviving the AI
Bubble
• A lot of really exciting AI advances in the last couple of
years
• In the 1980s the second “AI winter” hit – huge expectations
led to inevitable disillusionment
• One of the biggest unsolved issues with ‘deep learning’
remains the opaque nature of the decision-making model
• “Why has the model made that decision?”
• This will become critical if AI is to penetrate
biomedicine
• Do we have a role as ontologists, semanticists,
curators?
36. Old and new challenges: FAIR
Data
• FAIR principles are being touted widely as the
signpost of good practice in open research
• Ontology use is being widely encouraged…
• …but FAIR is agnostic as to which ontology
• In one sense this adds to our initial problems –
competing standards, schisms in metadata
• I’d like to see FAIR endorse some preferred
standards
• Cf. CDISC
37. Standards and “Standards”
• Bio-ontologies are currently “standards”, which is to say they are de
facto standards but not really standards
• Ensuring people go to ‘source’ requires centres of authority
• We need to be able to answer the question ‘which ontology should I
use?’
• It is the single most asked question I get from academics and
industry alike
• There are many valid answers:
It has the terms you need
It is used by other people
you collaborate with
It’s the one your funders
mandate
It’s the one authorities
mandate
It integrates with other
ontologies
It is actively
developed
I started using an ontology
by mistake and now I’m
trapped - help me
38. Why use an ontology?
• “Rule 0 – Use a certified standard”
39. Standards work
• De facto standards –
– Adoption by independent major organisations or
communities
– HTML in the early 90s
• De jure standard –
– Independently audited and verified
– Guarantee that level of quality is reached
– HTML after 1995
• Where would we be if HTML had not become an official
standard for marking up web pages?
40. One OBO Standard, ontology
• OBO has been a huge success story as a
collaborative, community effort with little
direct funding
• OBO vision is a set of interconnected, non-
overlapping ontologies
• I think we’re close to that vision
• Time to think about OBO as one ontology
and not 50 separate ontologies
42. Summing up
• Bio-ontologies moving from de facto to full standard –
could we see an OBO ISO?
• Need to become more aggressive in pushing our bio-
ontologies into other use – CDISC, FAIR, Foundations
such as Allotrope
• Push for endorsement
• Pushing bio-ontologies into new application areas
• Open bio-ontologies should move from self-organised,
self-regulated to self-organised, independently-
regulated
• Appreciate how far this community has come and the
amazing work that has been achieved so far
43. Acknowledgements
Tony Stephenson
Nicholas Piano
Amy Tang
Richard Holland
Robert Stevens
Simon Jupp
Chris Mungall
Melissa Haendel
Helen Parkinson
Phillip Lord
Anna Farne-Malone
Emma Hastings
Drashtti Vasant
Mélanie Courtot
Frank Gibson
Alan Ruttenberg