The Role of Ontology in the Era of Big Military Data
1. Distributed Common Ground System – Army
(DCGS-A)
Barry Smith
Director
National Center for Ontological Research
The Role of Ontology in the Era
of Big (Military) Data
2. Distributed Development of a
Shared Semantic Resource (SSR)
in support of the US Army’s Distributed Common
Ground System Standard Cloud (DSC) initiative
with thanks to: Tanya Malyuta, Ron Rudnicki
Background materials: http://x.co/yYxN
4. Making data (re-)usable through
common controlled vocabularies
• Allow multiple databases to be treated as if
they were a single data source by eliminating
terminological redundancy in the ways data are
described
– not ‘Person’, and ‘Human’, and ‘Human Being’, and
‘Pn’, and ‘HB’, but simply: person
• Allow development and use of common tools
and techniques, common training, single
validation of data, focused around
– semantic technology
– coordinated ontology development and use
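The point about eliminating terminological redundancy can be illustrated with a minimal sketch. It assumes a hand-written synonym table; the table, the function name normalize_label, and the three database names are hypothetical and serve only to show several local labels collapsing onto the single preferred term person.

```python
# Hypothetical synonym table: every local label maps to one preferred term.
PREFERRED_TERM = {
    "person": "person",
    "human": "person",
    "human being": "person",
    "pn": "person",
    "hb": "person",
}


def normalize_label(label: str) -> str:
    """Return the preferred term for a local label, or raise if it is unknown."""
    key = label.strip().lower()
    if key not in PREFERRED_TERM:
        raise KeyError(f"no controlled-vocabulary term for label {label!r}")
    return PREFERRED_TERM[key]


# Three hypothetical source databases describing the same kind of entity.
source_labels = {"db_a": "Person", "db_b": "Human Being", "db_c": "HB"}
normalized = {db: normalize_label(label) for db, label in source_labels.items()}
print(normalized)
```

Once every source uses the same term, a query for person can be run across all three sources as if they were one.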
5. Ontology =def.
• controlled vocabulary organized as a graph
• nodes in the graph are terms representing types
in reality
• each node is associated with definition and
synonyms
• edges in the graph represent well-defined
relations between these types
• the graph is structured hierarchically via subtype
relations
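The definition above can be sketched as a small data structure. This is an illustration under stated assumptions, not a standard ontology API: the class names Term and Ontology, the relation labels is_a and has_platform, and the synonym "towing vehicle" are hypothetical. The term definitions are borrowed from the vehicle example that appears later in this deck.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """A node: a term representing a type, with a definition and synonyms."""
    name: str
    definition: str
    synonyms: list[str] = field(default_factory=list)


class Ontology:
    """A controlled vocabulary organized as a graph of terms and relations."""

    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        # Edges are stored as (subject, relation, object) triples.
        self.edges: set[tuple[str, str, str]] = set()

    def add_term(self, term: Term) -> None:
        self.terms[term.name] = term

    def add_edge(self, subject: str, relation: str, obj: str) -> None:
        if subject not in self.terms or obj not in self.terms:
            raise KeyError("both ends of an edge must be terms in the ontology")
        self.edges.add((subject, relation, obj))

    def ancestors(self, name: str) -> list[str]:
        """Follow is_a (subtype) edges upward, nearest supertype first."""
        result: list[str] = []
        frontier = [name]
        while frontier:
            current = frontier.pop(0)
            for subject, relation, obj in sorted(self.edges):
                if subject == current and relation == "is_a" and obj not in result:
                    result.append(obj)
                    frontier.append(obj)
        return result


onto = Ontology()
onto.add_term(Term("vehicle", "an object used for transporting people or goods"))
onto.add_term(Term("tractor", "a vehicle that is used for towing", ["towing vehicle"]))
onto.add_term(Term("wheeled tractor", "a tractor that has a wheeled platform"))
onto.add_term(Term("wheeled platform",
                   "a vehicle platform that provides mobility through the use of wheels"))
onto.add_edge("tractor", "is_a", "vehicle")
onto.add_edge("wheeled tractor", "is_a", "tractor")
onto.add_edge("wheeled tractor", "has_platform", "wheeled platform")
print(onto.ancestors("wheeled tractor"))
```

Only the is_a edges form the hierarchy; other relations, such as the hypothetical has_platform, are ordinary edges between types.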
6. Ontologies
• computer-tractable representations of types
in specific areas of reality
• divided into more and less general
– upper = organizing ontologies, provide common
architecture and thus promote interoperability
– lower = domain ontologies, provide grounding in
reality
• reflecting top-down and bottom-up strategy
7. Success story in biomedicine
Goal: integration of biological and clinical data
– across different species
– across levels of granularity (organ,
organism, cell, molecule)
– across different perspectives (physical,
biological, clinical)
– within and across domains (growth, aging,
environment, genetic disease, toxicity …)
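8. The Open Biomedical Ontologies (OBO) Foundry
Ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT)
• ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
• CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL), Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
• MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)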
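9. Population-level ontologies
Ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT)
• COMPLEX OF ORGANISMS
– independent continuant: Family, Community, Population
– dependent continuant: Population Phenotype
– occurrent: Population Process
• ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
• CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL), Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
• MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)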
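10. Environment Ontology
Ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT)
• ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
• CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL), Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
• MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
• Additional box in the figure: Environment Ontology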
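11. Rationale of OBO Foundry coverage
Ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT)
• ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO)
– occurrent: Organism-Level Process (GO)
• CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL), Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
– occurrent: Cellular Process (GO)
• MOLECULE
– independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)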
12. OBO Foundry approach extended into
other domains
• NIF Standard: Neuroscience Information Framework
• ISF Ontologies: Integrated Semantic Framework
• OGMS and Extensions: Ontology for General Medical Science
• IDO Consortium: Infectious Disease Ontology
• cROP: Common Reference Ontologies for Plants
19. The problem of joint / coalition operations
[Figure labels: Fire Support; Logistics; Air Operations; Intelligence; Civil-Military Operations; Targeting; Maneuver & Blue Force Tracking]
20. US DoD Civil Affairs strategy for non-classified
information sharing
21. Ontologies / semantic technology
can help to solve this problem
[Figure labels: Fire Support; Logistics; Air Operations; Intelligence; Civil-Military Operations; Targeting; Maneuver & Blue Force Tracking]
22. But if each community produces its own ontology,
this will merely create new, semantic siloes
[Figure labels: Fire Support; Logistics; Air Operations; Intelligence; Civil-Military Operations; Targeting; Maneuver & Blue Force Tracking]
23. What we are doing to avoid the
problem of semantic siloes
Distributed Development of a Shared
Semantic Resource
Pilot testing to demonstrate feasibility
25. Semantic Enhancement
Annotation (tagging) of source data models using
terms from coordinated ontologies
– data remain in their original state (are treated at arm's
length)
– tagged using interoperable ontologies created in tandem
– can be as complete as needed, lossless, long-lasting
because flexible and responsive
– big bang for buck – measurable benefit even from first
small investments
Coordination through shared governance and
training
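A minimal sketch of what annotation at arm's length might look like follows. The table names, column names, and ontology term labels are all hypothetical. The point it illustrates is that the annotations are stored beside the source data models and never modify them.

```python
# Hypothetical source data models: two tables with their own column names.
source_models = {
    "personnel_db.staff": ["Pn", "dob"],
    "hr_system.employees": ["Human Being", "birth_date"],
}

# Annotations live beside the sources and point at shared ontology terms.
# The term labels are illustrative, not taken from a real ontology.
annotations = {
    ("personnel_db.staff", "Pn"): "person",
    ("personnel_db.staff", "dob"): "date of birth",
    ("hr_system.employees", "Human Being"): "person",
    ("hr_system.employees", "birth_date"): "date of birth",
}


def columns_for(term: str) -> list[tuple[str, str]]:
    """Find every (table, column) pair tagged with the given ontology term."""
    return sorted(key for key, tagged in annotations.items() if tagged == term)


print(columns_for("person"))
```

Because the tags are separate from the data, a new source can be added, or a tag corrected, without touching any existing source.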
26. Main challenge: Will it scale?
The problem of scalability turns on three questions:
• can we accommodate ever increasing
volumes and types of data and numbers of
users?
• can we preserve coordination (consistency,
non-redundancy) as ever more domains
become involved?
• can we respond in agile fashion to ever
changing bodies of source data?
27. Strategy for agile ontology creation
• Identify or create carefully validated general
purpose plug-and-play reference ontology
modules for principal domains
• Develop a method whereby these reference
ontologies can be extended very easily to cope
with specific, local data through creation of
application ontologies
28. Reference Ontology
• vehicle =def. an object used for transporting people or goods
• tractor =def. a vehicle that is used for towing
• crane =def. a vehicle that is used for lifting and moving heavy objects
• vehicle platform =def. means of providing mobility to a vehicle
• wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
• tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks
Application Ontology
• artillery vehicle =def. vehicle designed for the transport of one or more artillery weapons
• wheeled tractor =def. a tractor that has a wheeled platform
• Russian wheeled tractor type T33 =def. a wheeled tractor of type T33 manufactured in Russia
• Ukrainian wheeled tractor type T33 =def. a wheeled tractor of type T33 manufactured in Ukraine
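The relationship between the two columns can be sketched in a few lines. This is an illustration only: the functions extend and lineage are hypothetical, each term is given a single parent for simplicity, and treating the first group of definitions as the reference module and the second as the application extension is an assumption about the slide layout.

```python
# Reference ontology module: term -> (parent term or None, definition).
reference = {
    "vehicle": (None, "an object used for transporting people or goods"),
    "tractor": ("vehicle", "a vehicle that is used for towing"),
    "crane": ("vehicle", "a vehicle that is used for lifting and moving heavy objects"),
}

# Application ontology: local terms, each defined under an existing term.
application = {
    "wheeled tractor": ("tractor", "a tractor that has a wheeled platform"),
    "Russian wheeled tractor type T33": (
        "wheeled tractor",
        "a wheeled tractor of type T33 manufactured in Russia",
    ),
}


def extend(reference, extension):
    """Return reference plus extension; every new term needs a known parent."""
    merged = dict(reference)
    for term, (parent, definition) in extension.items():
        if term in merged:
            raise ValueError(f"{term!r} is already defined and may not be redefined")
        if parent not in merged:
            raise ValueError(f"parent {parent!r} of {term!r} is not defined")
        merged[term] = (parent, definition)
    return merged


def lineage(ontology, term):
    """List the supertypes of a term, nearest first."""
    chain = []
    parent = ontology[term][0]
    while parent is not None:
        chain.append(parent)
        parent = ontology[parent][0]
    return chain


merged = extend(reference, application)
print(lineage(merged, "Russian wheeled tractor type T33"))
```

The check inside extend is the design point: an application term may only be added beneath a term that already exists, so local terms stay anchored to the reference module and never redefine it.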
42. Infectious Disease Ontology (IDO)
IDO Core (Reference Ontology)
• General terms in the ID domain.
IDO Extensions (Application Ontologies)
• Disease-, host-, pathogen-specific.
• Developed by subject matter experts.
The hub-and-spokes strategy ensures that logical
content of IDO Core is automatically inherited by
the IDO Extensions
with thanks to Lindsay Cowell (University of Texas SW
Medical Center) and Albert Goldfain (Blue Highway, Inc.)
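The hub-and-spokes inheritance described above can be sketched as a chain of module imports. The Module class is hypothetical, the extension term labels are invented for illustration, and the import chain between the module names IDOCore, IDOSa and IDOHumanSa (names that appear elsewhere in this deck) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Module:
    """An ontology module with its own terms and the modules it imports."""
    name: str
    own_terms: set[str]
    imports: list[Module] = field(default_factory=list)

    def all_terms(self) -> set[str]:
        """Own terms plus everything inherited through imports."""
        terms = set(self.own_terms)
        for imported in self.imports:
            terms |= imported.all_terms()
        return terms


# The hub: general terms in the infectious disease domain.
ido_core = Module("IDOCore", {"colonization", "pathogen", "infection"})

# Spokes: the extension term labels are hypothetical.
ido_sa = Module("IDOSa", {"Staph. aureus infection"}, imports=[ido_core])
ido_human_sa = Module(
    "IDOHumanSa", {"human Staph. aureus infection"}, imports=[ido_sa]
)

# A term added to the hub later is visible in every spoke without further work.
ido_core.own_terms.add("host")
print(sorted(ido_human_sa.all_terms()))
```

Because each extension computes its content from its imports on demand, a change to the core reaches the extensions without any copying.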
43. IDO Core
• Contains general terms in the ID domain:
– E.g., ‘colonization’, ‘pathogen’, ‘infection’
• A contract between IDO extension ontologies
and the datasets that use them.
• Intended to represent information along
several dimensions:
– biological scale (gene, cell, organ, organism, population)
– discipline (clinical, immunological, microbiological)
– organisms involved (host, pathogen, and vector types)
46. How IDO evolves: the case of Staph. aureus
[Figure labels]
• HUB and SPOKES: Domain ontologies
• SEMI-LATTICE: By subject matter experts in different communities of interest.
• Modules shown: IDOCore, IDOSa, IDOHumanSa, IDORatSa, IDOStrep, IDORatStrep, IDOHumanStrep, IDOMRSa, IDOHumanBacterial, IDOAntibioticResistant, IDOMAL, IDOHIV, IDOFLU
50. BWO:disease by infectious agent
= def. a disease that is the consequence of the presence of
pathogenic microbial agents, including pathogenic viruses,
pathogenic bacteria, fungi, protozoa, multicellular parasites,
and aberrant proteins known as prions
51. Strategy used to build BWO(I)
with thanks to Lindsay Cowell and Oliver He (Michigan)
1. Start with a glossary such as:
http://www.emedicinehealth.com/biological_warfare/
2. Select from IDO Core and related ontologies
(such as the ChEBI chemistry ontology) the
corresponding terms needed to describe bioweapons
3. All ontology terms keep their original definitions
and IDs.
4. The result is a spreadsheet
52. 5. Where glossary terms have no ontology
equivalent, create BWO ontology terms and
definitions as needed
[Figure callout: no corresponding ontology term]
53. 6. Use the Ontofox tool to create the first version of
the BWO(I) application ontology
(http://ontofox.hegroup.org/)
7. Use BWO(I) in annotations, and where gaps are
identified create extension terms, for instance
– weaponized brucella
– aerosol anthrax
– smallpox incubation period
This establishes a virtuous cycle between ontology
development and use in annotations
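The matching part of this strategy (steps 1 to 5) can be sketched as follows. This is a simplified illustration and does not reproduce what Ontofox does. The function build_rows is hypothetical, and the identifiers are placeholders (EX:0001, BWO_NEW:0001), not real ontology IDs.

```python
# Hypothetical lookup of existing ontology terms: label -> (source, ID).
# The IDs are placeholders, not real ontology identifiers.
existing_terms = {
    "pathogen": ("IDO Core", "EX:0001"),
    "infection": ("IDO Core", "EX:0002"),
}

# Hypothetical glossary entries.
glossary = ["Pathogen", "Infection", "Aerosol anthrax", "Weaponized brucella"]


def build_rows(glossary, existing_terms, new_prefix="BWO_NEW"):
    """Build one spreadsheet row per glossary entry.

    Matched entries keep the source and ID of the existing ontology term;
    unmatched entries are given a new placeholder ID in the BWO namespace.
    """
    rows = []
    next_new = 1
    for entry in glossary:
        key = entry.strip().lower()
        if key in existing_terms:
            source, term_id = existing_terms[key]
        else:
            source, term_id = "BWO", f"{new_prefix}:{next_new:04d}"
            next_new += 1
        rows.append({"glossary_term": entry, "source": source, "id": term_id})
    return rows


rows = build_rows(glossary, existing_terms)
for row in rows:
    print(row)
```

Rows whose source is BWO are the gaps: terms that need new definitions, which is where annotation work feeds back into ontology development.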
54. Potential uses of BWO
– semantic enhancement of bioweapons
intelligence data
– results will be automatically interoperable with
relevant bioinformatics and public health IT tools
for dealing with infections, epidemics, vaccines,
forensics, …
– to annotate research literature and research data
on bioweapons
– to create computable definitions to substitute for
definitions in free text glossaries
55. Why do people think they need lexicons?
• Training
• Compiling lessons learned
• Compiling results of testing, e.g. of proposed new
doctrine
• Collective inferencing
• Official reporting
• Doctrinal development
• Standard operating procedures
• Sharing of data
• People need to (ensure that they) understand
each other
Editor's Notes
Mental functioning related anatomical structure: an anatomical structure in which there inheres the disposition to be the agent of a mental process
Behaviour inducing state: a bodily quality inhering in a mental functioning related anatomical structure which leads to behaviour of some sort
Affective representation: a cognitive representation sustained by an organism about its own emotions
Cognitive representation: a representation which specifically depends on an anatomical structure in the cognitive system of an organism
Mental process: a bodily process which brings into being, sustains or modifies a cognitive representation or a behaviour inducing state
The subjective feeling component of the emotion. These avoid sounding tautological/repetitive by focusing on distinct separable aspects of the feeling. It would be strange to include subjective feelings as separate components in the ontology if the best we could manage would be terms such as ‘feeling angry’, ‘feeling frightened’. Still, those sorts of feelings are often referred to in the scientific literature.
Aims to overcome some of these obstacles. Ontologically correct natural language definitions.
Subscribing to IDO core means agreeing to a semantics. Without bias to any dimension.
Growth. Micro ontologies for particular diseases that inherit terms and axioms about host-pathogen interaction. They evolve in a cross-product way, so if you are interested in frog pneumonia, IDO Core has you covered.