For efficient and innovative use of big data, it is important to integrate multiple databases across domains. For example, various public databases have been developed in the life sciences, and finding novel scientific results with them is an essential technique. In social and business areas, open data strategies in many countries promote a diversity of public data, and combining big data with open data is a major challenge. In other words, the diversity of datasets is a problem that must be solved for big data.
Ontology provides systematized knowledge for integrating multiple datasets across domains together with their semantics. Linked Data also provides techniques for interlinking datasets based on Semantic Web technologies. We consider that combining ontology and Linked Data on the basis of ontological engineering can contribute to solving the diversity problem in big data.
In this talk, I discuss how ontological engineering could be applied to big data, with some trial examples.
Ontology Engineering for Big Data
1. Ontology Engineering
for Big Data
Kouji Kozaki
The Institute of Scientific and Industrial Research (I.S.I.R),
Osaka University, Japan
2013/09/03 1
Ontology and Semantic Web for Big Data
(ONSD2013) Workshop in the 2013
International Computer Science and
Engineering Conference
(ICSEC2013), Bangkok, Thailand, 5th
Sep. 2013
ONSD2013@ICEC2013
2. Self introduction: Kouji KOZAKI
Brief biography
2002 Received Ph.D. from Graduate School of Engineering, Osaka
University.
2002- Assistant Professor, 2008- Associate Professor in ISIR, Osaka
University.
Specialty
Ontological Engineering
Main research topics
Fundamental theories of ontological engineering
3. Ontological topics
Some examples of topics which I work on
Definition of disease
What’s “disease” ?
What’s “causal chain” ?
Is it an object or a process ?
Role theory
What’s the ontological difference among the following concepts?
Person …. Natural type
Teacher, Walker, Murderer, Mother …. Role (dependent concept)
4. Self introduction: Kouji KOZAKI
Brief biography
2002 Received Ph.D. from Graduate School of Engineering, Osaka University.
2002- Assistant Professor, 2008- Associate Professor in ISIR, Osaka University.
Specialty
Ontological Engineering
Main research topics
Fundamental theories of ontological engineering
Ontology development tool based on the ontological theories
Ontology development in several domains and ontology-based application
Hozo(法造) -an environment for ontology building/using- (1996- )
Software to support ontology (=法) building (=造) and use
It’s available at http://www.hozo.jp as free software
Registered users: 3,500 (June 2012)
Java API for application development is provided.
Supported formats: Original format, RDF(S), OWL.
Linked Data publishing support is coming soon.
5. My history on Ontology Building
2002-2007 Nano technology ontology
Supported by NEDO(New Energy and Industrial Technology Development Organization)
2006- Clinical Medical ontology
Supported by Ministry of Health, Labour and Welfare, Japan
Cooperated with: Graduate School of Medicine, The University of Tokyo.
2007-2009 Sustainable Science ontology
Cooperated with: Research Institute for Sustainability Science, Osaka Univ.
2007-2010 IBMD(Integrated Bio Medical Database)
Supported by MEXT through "Integrated Database Project".
Cooperated with: Tokyo Medical and Dental University, Graduate School of Medicine, Osaka U.
2008-2012 Protein Experiment Protocol ontology
Cooperated with: Institute for Protein Research, Osaka Univ.
2008-2010 Bio Fuel ontology
Supported by the Ministry of Environment, Japan.
2009-2012 Disaster Risk ontology
Cooperated with: NIED (National Research Institute for Earth Science and Disaster Prevention)
2012- Bio mimetic ontology
Supported by JSPS KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas
2012- Ontology of User Action on Web
Cooperated with: Consumer first Corp.
2013- Information Literacy ontology
Supported by JSPS KAKENHI
6. Agenda
(1) Motivation
Ontology vs. Big Data
How we can use ontology for big data?
(2) Case Studies towards Ontology Engineering
for Big Data
Ontology Exploration according to the users’ viewpoints
A Disease Ontology developed in Japanese Medical
Ontology Project
(3) Concluding Remarks
7. Ontology vs. Big Data
Question
Is Ontology useful for Big Data?
My answer: (I believe) Yes
Combination of ontology and Big Data could provide new solutions for many problems.
Ontology
Not so big (though some are big).
Built by hand.
Used based on semantics, by reasoning.
Big Data
Very big.
Collected automatically.
Used without semantics, by machine learning or data mining.
8. How to combine
Ontology and Big Data
Basic technology
Mapping ontology to database
Mapping classes (concepts) defined in ontology to database
schema
Mapping classes/instances defined in ontology to data in DB
Add metadata on data using vocabulary defined in
ontology
e.g. annotation on document such as webpage, paper etc.
Convert database (e.g. RDB) to ontology-based
(RDF) database
e.g. linked data such as DBpedia, some bioinformatics DBs, etc.
You can choose some of these technologies according to your purpose
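The first and third options above (mapping ontology classes to a database schema, and converting an RDB to an RDF database) can be illustrated with a minimal sketch. It assumes a single table whose rows become RDF-style triples; the namespace `http://example.org/`, the table name `patient`, and the column-to-property mapping are hypothetical and are not taken from the talk.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and mapping; none of these names come from the talk.
BASE = "http://example.org/"
CLASS_FOR_TABLE = {"patient": BASE + "onto/Patient"}
PROPERTY_FOR_COLUMN = {
    "name": BASE + "onto/name",
    "systolic_bp": BASE + "onto/bloodPressure",
}


def row_to_triples(table: str, key_column: str, row: Dict[str, object]) -> List[Triple]:
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = f"{BASE}{table}/{row[key_column]}"
    # Schema-level mapping: the table corresponds to an ontology class.
    triples = [(subject, "rdf:type", CLASS_FOR_TABLE[table])]
    # Data-level mapping: each mapped column becomes a property of the subject.
    for column, prop in PROPERTY_FOR_COLUMN.items():
        if row.get(column) is not None:
            triples.append((subject, prop, str(row[column])))
    return triples


rows = [{"id": 1, "name": "patient-1", "systolic_bp": 180}]
triples = [t for r in rows for t in row_to_triples("patient", "id", r)]
for triple in triples:
    print(triple)
```

In practice a dedicated RDF library and a standard mapping language would likely replace the plain tuples used here; the sketch only shows where the ontology vocabulary enters the conversion.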
9. How to combine
Ontology and Big Data
Case Study
A method for mapping Abnormality Ontology (in medical
domain) to medical database
10. Classification of Abnormality Representations 1
Various types of abnormality representations are used in the medical domain
blood pressure 200 mmHg / blood pressure is high / hypertension
blood glucose level 150 mg/dL / blood glucose level is high / hyperglycemia
11. Classification of Abnormality Representations 2
※Based on quality and quantity ontologies in the Upper Ontology “YAMATO”.
Quantitative representation: blood pressure 200 mmHg; blood glucose level 150 mg/dL
Qualitative representation: blood pressure is high; blood glucose level is high
Property representation: hypertension; hyperglycemia
☑ Diagnosis: identify a concrete value for each patient in clinical tests
☑ Definition of disease
Mapping: Abnormality Ontology / Medical Database
12. Is-a hierarchy of Abnormality Ontology
[Figure: three-layer is-a hierarchy; node labels listed by layer]
Layer 1: Generic Abnormal States (Object-independent)
Structural abnormality, Material abnormality, Size abnormality, Formational abnormality, Conformational abnormality, Topological abnormality, …
Small in size, Small in line, Small in area, Small in volume, Large in size, …
Layer 2: Object-dependent Abnormal States
Groupings: Tube-dependent, Blood vessel dependent, …
Narrowing tube, Narrowing of valve, Vascular stenosis, Arterial stenosis, Coronary stenosis, Gastrointestinal tract stenosis, Intestinal stenosis, Esophageal stenosis, …
Layer 3: Specific context-dependent Abnormal States (disease dependent)
Coronary stenosis in Angina pectoris
Coronary stenosis in Arteriosclerosis
Intestinal stenosis in Ileus
Esophageal stenosis in Esophagitis
13. How can we deal with clinical test data ?
•In hospitals, a huge volume of diagnostic/clinical test data has been accumulated.
•Most are quantitative data:
e.g., blood pressure 180 mmHg, blood cross-sectional area 40 mm²
Quantitative value (Vqt): 180 mmHg → Qualitative value (Vql): high
The quantitative value of blood pressure (180 mmHg) is compared with a threshold value (e.g., 140 mmHg) and converted into the qualitative value “high”.
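The conversion from a quantitative value (Vqt) to a qualitative value (Vql) can be sketched as a threshold comparison. This is a minimal illustration: the 140 mmHg threshold is the example value from the slide, while the function name, the use of `>=` at the boundary, and the label "not high" are assumptions.

```python
def to_qualitative(value_mmhg: float, threshold_mmhg: float = 140.0) -> str:
    """Map a quantitative blood pressure value to a qualitative value.

    The boundary handling (>=) and the "not high" label are assumptions
    made for this sketch, not part of the original model.
    """
    return "high" if value_mmhg >= threshold_mmhg else "not high"


print(to_qualitative(180.0))  # the slide's example value
print(to_qualitative(120.0))
```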
14. Basic policy for definition of abnormal states
A property is decomposed into a tuple:
<Attribute (A), Attribute Value (V)> in a qualitative form.
e.g. hypertension (Property P) = <blood pressure (Attribute A), high (Value V)>
Qualitative representation can be converted into a Property representation.
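The decomposition of a property into an attribute and value pair can be sketched as a small lookup in both directions. The two entries reuse the examples from the slides (hypertension, hyperglycemia); the class name `Quality`, the table `PROPERTY_OF`, and the function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Quality(NamedTuple):
    """Qualitative form: <Attribute (A), Attribute Value (V)>."""
    attribute: str
    value: str


# Hypothetical lookup table built from the examples on the slides.
PROPERTY_OF = {
    Quality("blood pressure", "high"): "hypertension",
    Quality("blood glucose level", "high"): "hyperglycemia",
}


def to_property(quality: Quality) -> Optional[str]:
    """Convert a qualitative representation into a property representation."""
    return PROPERTY_OF.get(quality)


def decompose(prop: str) -> Optional[Quality]:
    """Decompose a property back into its <A, V> tuple."""
    for quality, name in PROPERTY_OF.items():
        if name == prop:
            return quality
    return None


print(to_property(Quality("blood pressure", "high")))
print(decompose("hyperglycemia"))
```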
15. From clinical test data to abnormality knowledge
Quantity (clinical test data): blood pressure 180 mmHg; cross-section area xx cm²
Quality: blood pressure high; cross-section area small
Property (abnormality knowledge): Hypertension; Narrowing
Our model enables “interoperability” from clinical test data to conceptual knowledge about abnormal states.
Quantitative data can be converted, via the qualitative representation, into the property representation.
16. How to combine
Ontology and Big Data
Case Study
Annotation on web browsing history of users based on
Web User Action Ontology
17. Annotation on web browsing history of users based on ontology
[Figure: chart titled “Transition of usage types per conference”; labels “The amount of papers surveyed in each conference” (9 19 18 24 25 11 23 26 17 18) and “The amounts of types of usage”; vertical axis 0 to 40]
Web browsing history (access logs) of users
List of all URLs the user accessed, for 130M users × 2 years
Web User Action Ontology
Analysis of consumption behavior
This is collaborative work with Consumer first, Inc.
18. Basic Idea
The format of the access logs (Web browsing history) of
users provided by Consumer first, Inc.
User id, access date and time, URL …
Problem
A URL is a meaningless string for humans, although one can guess its contents if it is a famous site.
Diversity of access logs.
In order to analyze them, we need consistent meaning.
Annotations on the access log
We tried to add metadata which presents a human-understandable meaning of each URL.
We also developed a prototype of automatic annotation.
Its recall and relevance rates are about 0.7 to 0.9.
We think this result is not bad for statistical analysis.
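The idea of annotating each URL in an access log with a human-understandable concept can be sketched with simple rules. This is only an illustration and not the prototype mentioned above: the hosts, path prefixes, and concept names in `RULES` are hypothetical stand-ins for terms of a Web User Action Ontology.

```python
from urllib.parse import urlparse

# Hypothetical rules: (host, path prefix, ontology concept).
RULES = [
    ("shop.example.com", "/cart", "AddToCart"),
    ("shop.example.com", "/item", "ViewProduct"),
    ("search.example.com", "/", "Search"),
]


def annotate(url: str) -> str:
    """Return the concept attached to a URL, or "Unknown" if no rule matches."""
    parts = urlparse(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    for rule_host, prefix, concept in RULES:
        if host == rule_host and path.startswith(prefix):
            return concept
    return "Unknown"


# Access log records in the format: user id, access date and time, URL.
log = [
    ("u1", "2013-09-03 10:00", "http://search.example.com/?q=ontology"),
    ("u1", "2013-09-03 10:01", "http://shop.example.com/item/42"),
    ("u1", "2013-09-03 10:05", "http://shop.example.com/cart/add"),
]
annotated = [(user, when, url, annotate(url)) for user, when, url in log]
for record in annotated:
    print(record)
```

Once every record carries a concept, logs from different sites can be aggregated by concept rather than by raw URL string.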
19. Ontology Engineering
for Big Data
Basic technology
= How to combine Ontology and Big Data
Mapping ontology to database
Add metadata on data using vocabulary defined in
ontology
Convert database (e.g. RDB) to ontology-based
(RDF) database
How to use Combinations of Ontology and Big Data
Ontology can provide semantics to add to raw data.
Generalized concepts in ontology can connect data at various concept levels across domains.
We can use ontology as given (and authorized) knowledge to analyze big data.
20. Ontology Engineering
for Big Data
Features of ontology in class level
It reflects understanding of the target world.
Well organized ontologies have generalized rich knowledge
based on consistent semantics.
Ontologies are systematized knowledge of domains.
Combination of ontology and big data
Ontology can provide semantics to add to raw data.
Generalized concepts in ontology can connect data at various concept levels across domains.
We can use ontology as given (and authorized) knowledge to analyze big data.
21. Two possible ways to use ontology for big data
Use ontology to bridge datasets across domains
Use ontology to combine deep domain knowledge and raw data
[Figure labels: Metadata, LOD (Linked Open Data), Ontology, Big Data]
22. Case studies
Use ontology to bridge datasets across
domains
Understanding an Ontology through Divergent
Exploration
Presented at ESWC2011
Use ontology to combine deep domain
knowledge and raw data
Japanese Medical Ontology project
Disease ontology and Ontology of Abnormal
State
presented at ICBO (International Conference on Biomedical
Ontology) 2011, 2012 and 2013
23. Use ontology to bridge datasets
across domains
Basic technology
Terms (classes/instances) defined in ontology are used as common
vocabulary for search data.
If the ontology has mapping to Multiple DBs, the user can search
across them.
Motivation and Issue
Combinations of multiple datasets could be valuable for Big Data Analysis.
e.g. climate and agriculture, healthcare and life science, etc.
However, getting all combinations across multiple Big Data sets is not realistic because of their size.
Requests by the users are also very different according to their interests.
It is important to consider an efficient method to obtain meaningful combinations.
[Figure labels: Ontology; Common Vocabulary; Search; Search across multiple DBs; Documents / Law Data; Raw]
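Searching across multiple DBs through a common vocabulary can be sketched as follows. The sketch assumes that each ontology term is mapped to the local key used by each database; the term `onto:Rice`, the database names, and the records are hypothetical examples loosely following the "climate and agriculture" case.

```python
from typing import Dict, List

# Hypothetical mapping from an ontology term to the local key in each DB.
TERM_TO_LOCAL_KEY: Dict[str, Dict[str, str]] = {
    "onto:Rice": {"agriculture_db": "rice", "climate_db": "paddy"},
}

# Hypothetical databases, each indexed by its own local vocabulary.
DATABASES: Dict[str, Dict[str, List[str]]] = {
    "agriculture_db": {"rice": ["yield record 1"], "wheat": ["yield record 2"]},
    "climate_db": {"paddy": ["rainfall record 7"]},
}


def search(term: str) -> Dict[str, List[str]]:
    """Look up one ontology term in every database it is mapped to."""
    results = {}
    for db_name, local_key in TERM_TO_LOCAL_KEY.get(term, {}).items():
        results[db_name] = DATABASES[db_name].get(local_key, [])
    return results


print(search("onto:Rice"))
```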
24. A method to obtain meaningful combinations using ontology exploration
[Figure: layered framework, Layer 0 to Layer 4. Labels: Contents Management using the Metadata; Ontology-based Information Retrieval; Divergent Exploration; Map Generation Depending on Viewpoints; Comparison and Convergence of multiple Maps; Context Based Convergence; Problem Setting; Problem Solution; Innovation]
An ontology presents an explicit essential understanding of the target world.
It provides a base knowledge to be shared among the users.
They explore the ontology according to their viewpoint and generate conceptual maps as the result.
These maps represent understanding from their own viewpoints.
They can use the maps as viewpoints (combinations) to get data from multiple DBs.
25. (Divergent) Ontology exploration tool
1) Exploration of multi-perspective conceptual chains
Exploration of an ontology (“Hozo” – Ontology Editor)
Multi-perspective conceptual chains represent the explorer’s understanding of the ontology from a specific viewpoint.
2) Visualizations of conceptual chains
Visualizations as conceptual maps from different viewpoints
26. [Figure: ontology diagram with the following callouts]
Node represents a concept (=rdfs:Class)
Slot represents a relationship (=rdf:Property)
Is-a (sub-class-of) relationship
Referring to another concept
30. Functions for ontology
exploration
Exploration using the aspect dialog (manual exploration):
Divergent exploration from one concept using the aspect dialog for each step
Search path (machine exploration):
Exploration of paths from a starting point to ending points.
The tool allows post-hoc editing to extract only the interesting portions of the map.
Change view:
The tool has a function to highlight specified paths of conceptual chains on the generated map according to given viewpoints.
Comparison of maps:
The system can compare generated maps and show the conceptual chains common to both maps.
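The comparison of maps can be sketched by treating each conceptual map as a set of conceptual chains and taking the intersection. This is a simplified illustration, not the tool's implementation; the concept names in the two maps are hypothetical.

```python
from typing import Set, Tuple

Chain = Tuple[str, ...]

# Two hypothetical conceptual maps, each a set of conceptual chains (paths).
map_expert_1: Set[Chain] = {
    ("Biofuel production", "Deforestation", "Air pollution"),
    ("Biofuel production", "Food price rise"),
}
map_expert_2: Set[Chain] = {
    ("Biofuel production", "Deforestation", "Air pollution"),
    ("Biofuel production", "Job creation"),
}


def common_chains(a: Set[Chain], b: Set[Chain]) -> Set[Chain]:
    """Return the conceptual chains that appear in both maps."""
    return a & b


print(common_chains(map_expert_1, map_expert_2))
```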
31. Search Path
[Figure: screenshot of the search path function]
Starting point
Selecting of ending points: Ending point (1), Ending point (2), Ending point (3)
Finding all possible paths from the starting point to the ending points
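Finding all possible paths from a starting point to selected ending points can be sketched as a depth-first search over the concept graph. This is a generic sketch and not the algorithm used in the tool; the graph, the concept names, and the `max_depth` limit are assumptions.

```python
from typing import Dict, List, Set


def find_paths(graph: Dict[str, List[str]], start: str,
               goals: Set[str], max_depth: int = 6) -> List[List[str]]:
    """Enumerate simple paths from start to any goal (depth-limited DFS)."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node in goals and len(path) > 1:
            paths.append(list(path))
        if len(path) > max_depth:
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid revisiting concepts (no cycles)
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(start, [start])
    return paths


# Hypothetical concept graph; edges stand for relationships between concepts.
graph = {
    "Biofuel production": ["Farmland development", "Sugarcane harvesting"],
    "Farmland development": ["Deforestation"],
    "Sugarcane harvesting": ["Intentional burn"],
    "Deforestation": ["Intentional burn", "Ecosystem disruption"],
    "Intentional burn": ["Air pollution"],
}
paths = find_paths(graph, "Biofuel production", {"Air pollution"})
for p in paths:
    print(" -> ".join(p))
```

The depth limit keeps the number of enumerated paths manageable on larger ontologies, at the cost of missing very long chains.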
33. What does the result mean?
[Figure: generated map with the selected ending points]
Problem
Kinds of method to solve the problem
Possible combination of them
35. Usage and evaluation of
ontology exploration tool
Step 1: Usage for knowledge structuring in
sustainability science
Step 2: Verification of the exploration abilities of the
ontology exploration tool
Step 3: Experiments for evaluating the ontology
exploration tool
36. Sustainability Science
Sustainability Science probes interactions
between global, social, and human systems,
the complex mechanisms that lead to
degradation of these systems, and
concomitant risks to human well-being.
The journal provides a platform for building
sustainability science as a new academic
discipline.
These include endeavors to simultaneously
understand phenomena and solve problems,
uncertainty and application of the
precautionary principle, the co-evolution of
knowledge and recognition of problems, and
trade-offs between global and local problem
solving.
Volume 1 / 2006 - Volume 8 / 2013
Editor-in-Chief: Kazuhiko Takeuchi
Managing Editor: Osamu Saito
ISSN: 1862-4065 (print version)
ISSN: 1862-4057 (electronic version)
37. Knowledge Structuring in Sustainability Science
Sustainability Science (SS)
– We aimed at establishing a new interdisciplinary
scheme that serves as a basis for constructing a
vision that will lead global society to a sustainable
one.
– An integrated understanding of the entire field is required,
instead of domain-wise knowledge structuring.
Sustainability science ontology
– Developed in collaboration with domain experts at the
Osaka University Research Institute for
Sustainability Science (RISS).
– Number of concepts:649, Number of slots:
1,075
Usage of the ontology exploration tool
– It was confirmed that the exploration was fun for
them and the tool had a certain utility for achieving
knowledge structuring in sustainability science.
[Kumazawa 2009]
http://en.ir3s.u-tokyo.ac.jp/about_sus
Sustainability Science
38. Biofuel Use Strategies for Sustainable Development
(BforSD, FY2008-FY2010)
Development of an ontology-based
mapping system which creates
comprehensive views of problems and
policy measures on biofuel
(1) Structuring biofuel problems: Develop the
biofuel ontology which explicitly
conceptualizes biofuel problems through
literature review and interviews
(2)Develop an ontology exploration tool
which interactively generates conceptual
maps with paths between concepts in the
biofuel ontology
(3)In collaboration with other sub-themes,
develop an application method of this map
tool for policy making support to find,
frame and prioritize relevant problems and
policy measures.
(source) US DOE
One of the sub-themes
40. Verification of Ontology Exploration Tool
Verification methods
1) Enrichment of SS ontology
We enriched the SS ontology on the basis of
29 typical scenarios (cases) structured by
domain experts in biofuel through literature
review and interviews
29 scenarios
(cases)
27 conceptual
maps
41. Positive and negative effects of biofuel
(+): Positive effects, (−): Negative effects, (+/−): Both positive and negative effects
(Source) 1: UN-Energy (2007), 2: FAO (2008), 3: CBD (2008), 4: Martinelli et al. (2008)
1) Energy services for the poor
(+/−) Competition of biomass energy systems with the present use of biomass resources (such as agricultural residues) in applications such as animal feed and bedding, fertilizer, and construction materials [1]
(−) In many developing countries, small-scale biomass energy projects face challenges obtaining finance from traditional financing institutions [1]
(−) Liquid biofuels are likely to replace only a small share of global energy supplies and cannot alone eliminate our dependence on fossil fuels [2]
2) Agro-industrial development and job creation
(+) Biofuel is powering new small- and large-scale agro-industrial development and spawning new industries in industrialized and developing countries [1]
(+/−) In the short-to-medium term, bioenergy use will depend heavily on feedstock costs and reliability of supply, cost and availability of competing energy sources, and government policy decisions [1]
(+) In the longer term, the economics of biofuel will probably improve as agricultural productivity and agro-industrial efficiency improve, more supportive agricultural and energy policies are adopted, carbon markets mature and expand, and new methodologies for carbon sequestration accounting are developed [1]
(+) In the longer term, expanded demand and increased prices for agricultural commodities may represent opportunities for agricultural and rural development [2]
(+) Biofuel industries create jobs, including highly skilled science, engineering, and business-related employment; medium-level technical staff; low-skill industrial plant jobs; and unskilled agricultural labor [1]
(+/−) Small-scale and labor-intensive production often leads to trade-offs between production efficiency and economic competitiveness [1]
3) Health and gender
(−) Market opportunities cannot overcome existing social and institutional barriers to equitable growth, with exclusion factors such as gender, ethnicity, and political powerlessness, and may even worsen them [2]
(−) Forest burning for development of feedstock plantation and sugarcane burning to facilitate manual harvesting result in air pollution, higher surface water runoff, soil erosion, and unintended forest fires [3,4]
(−) Exploitation of cheap labor (plantation and migrant workers) [4]
(−) Increased use of pesticides could create health hazards for laborers and communities living near areas of feedstock production [1,3]
4) Agricultural structure
(−) The demand for land to grow biofuel crops could put pressure on competing land usage for food crops, resulting in an increase in food prices [1,2]
(+/−) Significant economies of scale can be gained from processing and distributing biofuels on a large scale. The transition to liquid biofuels can be harmful to farmers who do not own their own land, and to the rural and urban poor who are net buyers of food [1]
(−) While global market forces could lead to new and stable income streams, they could also increase marginalization of poor and indigenous people and affect traditional ways of living if they end up driving small farmers without clear titles from their land and destroying their livelihood [1]
42. Positive and negative effects of biofuel (continued)
5) Food security
(−) Demand for agricultural feedstock for liquid biofuels will be a significant factor for agricultural markets and world agriculture over the next decade and perhaps beyond [2]
(−) Rapidly growing demand for biofuel feedstock has contributed to higher food prices, which poses an immediate threat to the food security of poor net food buyers in both urban and rural areas [2]
(+/−) The effect of biofuels on food security is context-specific, depending on the particular technology and country characteristics involved [1]
6) Government budget
(−) Because ethanol is used largely as a substitute for gasoline, providing a large tax reduction for blending ethanol and gasoline reduces government revenue from this tax, mainly targeting the non-poor [1]
(−) Production of biofuels in many countries, except sugarcane-based ethanol production in Brazil, is not currently economically viable without subsidies, given existing agricultural production and biofuel-processing technologies and recent relative prices of commodity feedstock and crude oil [2]
(−) Policy interventions, especially in the form of subsidies and mandated blending of biofuels with fossil fuels, are driving the rush to liquid biofuels, which leads to high economic, social, and environmental costs in both developed and developing countries [2]
7) Trade, foreign exchange balance, and energy security
(+) Diversifying global fuel supplies could have beneficial effects on the global oil market and many developing countries because fossil fuel dependence has become a major risk for many developing economies [1]
(+/−) Rapidly rising demand for ethanol has had an impact on the price of sugar and maize in recent years, bringing substantial rewards to farmers not only in Brazil and the United States but around the world [1,2]
(−) Linking of agricultural prices to the vicissitudes of the world oil market clearly presents risks; however, it is an essential transition to the development of a biofuel industry that does not rely on major food commodity crops [1]
8) Biodiversity and natural resource management
(+/−) Depending on the types of crop grown, what they replaced, and the methods of cultivation and harvesting, biofuels can have negative and positive effects on land use, soil and water quality, and biodiversity [1,3]
(−) Problems with water availability and use may represent a limitation on agricultural biofuel production [1,3]
(−) Introduction of criteria, standards, and certification schemes for biofuels may generate indirect negative environmental and biodiversity effects, passively in other countries [3]
(−) If the production of biofuel feedstock requires increased fertilizer and pesticide use, there could be additional detrimental effects such as an increase in GHG emissions and eutrophicating nutrients, and biodiversity loss [3]
(−) Wild biodiversity is threatened by loss of habitat when the area under crop production is expanded, whereas agricultural biodiversity is vulnerable in the case of large-scale monocropping, which is based on a narrow pool of genetic material, and can also lead to reduced use of traditional varieties [2,3]
(+) If crops are grown on degraded or abandoned land, such as previously deforested areas or degraded crop- and grasslands, and if soil disturbances are minimized, feedstock production for biofuels can have a positive impact on biodiversity by restoring or conserving habitat and ecosystem function [3]
9) Climate change
(+/−) Full lifecycle GHG emissions of biofuel vary widely based on land use changes, choice of feedstock, agricultural practices, refining or conversion processes, and end-use practices [1,2]
(−) Land use change associated with production of biofuel feedstock can affect GHG emissions; draining wetlands and clearing land with fire are detrimental with regard to GHG emissions and air quality [2,3]
(−) The greatest potential for reducing GHG emission comes from replacement of coal rather than petroleum fuels [1]
(+) Biofuels offer the only realistic near-term renewable option for displacing and supplementing liquid transport fuels [1]
(+): Positive effects, (−): Negative effects, (+/−): Both positive and negative effects
(Source) 1: UN-Energy (2007), 2: FAO (2008), 3: CBD (2008), 4: Martinelli et al. (2008)
43. Verification of Ontology Exploration Tool
The concepts appearing in these scenarios were extracted and generalized to add into the ontology.
Example: Air pollution, forest fires, soil deterioration, and water pollution are attributed to intentional burning when forest is logged or sugarcane is harvested in farmland development for biofuel crops.
Extracted chain: burn agriculture (deforestation, soil deterioration caused by farmland development for biofuel crops) ⇒ harvest sugarcanes (air pollution caused by intentional burn), disruption of ecosystem caused by deforestation (water pollution)
44. Verification of Ontology Exploration Tool
Verification methods
1) Enrichment of SS ontology
We enriched the SS ontology on the basis of
29 typical scenarios (cases) structured by
domain experts in biofuel through literature
review and interviews
2) Verification of scenario reproducing
operations
We verified whether the ontology exploration
tool could generate conceptual maps which
represent original scenarios.
Result:
– 93% (27/29) of the scenarios were
successfully reproduced as conceptual maps.
29 scenarios
(cases)
27 conceptual
maps
45. Usage and evaluation of
ontology exploration tool
Step 1: Usage for knowledge structuring in
sustainability science
Step 2: Verification of the exploration abilities of the ontology exploration tool
Step 3: Experiments for evaluating the ontology exploration tool
1) Whether meaningful maps for domain experts were obtained.
2) Whether meaningful maps other than anticipated maps were obtained.
Anticipated maps: maps which represent the contents of the scenarios anticipated by ontology developers at the time of ontology construction.
Note: the subjects don’t know what scenarios are anticipated.
46. Experiment for evaluating
ontology exploration tool
Experimental method
1) The four experts generated conceptual maps with the tool in accordance with the condition settings of given tasks.
2) They removed paths that were apparently inappropriate from the paths of conceptual chains included in the generated maps.
3) They selected paths according to their interests and entered a four-level general evaluation with free comments.
The subjects:
4 experts in different fields.
A: Agricultural economics
B: Social science (stakeholder analysis)
C: Risk analysis
D: Metropolitan environmental planning
Four-level general evaluation:
A: Interesting
B: Important but ordinary
C: Neither good nor poor
D: Obviously wrong
47. Experimental results (1)
Table 2. Experimental results: number of selected paths, and path distribution based on general evaluation (A to D; non-empty cells only, left to right).
Task    Subject                  Selected paths  Path distribution
Task 1  Expert A                 2               2
Task 1  Expert A (second time)   1               1
Task 1  Expert B                 7               4 1 2
Task 1  Expert B (second time)   6               3 3
Task 1  Expert C                 8               1 5 2
Task 1  Expert D                 3               1 1 1
Task 2  Expert A                 1               1
Task 2  Expert B                 6               5 1
Task 2  Expert C                 7               2 4 1
Task 2  Expert D                 5               3 1 1
Task 3  Expert B                 8               4 2 2
Task 3  Expert C                 4               2 2
Task 3  Expert D                 3               3
Total                            61              A: 30, B: 22, C: 8, D: 1
48. Experimental results (1)
Number of maps generated: 13
Number of paths evaluated: 61
A: Interesting 30 (49%)
B: Important but ordinary 22 (36%)
C: Neither good nor poor 8 (13%)
D: Obviously wrong 1 (2%)
A and B together: 85%
We can conclude that the tool could generate maps or paths sufficiently meaningful for experts.
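The percentages above follow directly from the evaluation counts. A short check, using the four counts reported on the slide (variable names are illustrative):

```python
# Evaluation counts reported on the slide (61 evaluated paths in total).
counts = {"A": 30, "B": 22, "C": 8, "D": 1}
total = sum(counts.values())

# Share of each evaluation level, rounded to whole percent.
shares = {level: round(100 * n / total) for level, n in counts.items()}

# Share of paths rated A (interesting) or B (important but ordinary).
positive_share = round(100 * (counts["A"] + counts["B"]) / total)

print(total, shares, positive_share)
```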
49. Experimental results (2)
Quantitative comparison of the anticipated maps with the maps generated by the subjects
(N) Nodes and links included in the paths of anticipated maps
(M) Nodes and links included in the paths generated and selected by the experts
[Figure: Venn diagram of N and M; extracted region counts: 50, 150, 50, with the overlap labeled N∩M]
About half (50%) of the paths included in the anticipated maps were included in the maps generated by the experts.
About 75% of the paths in the generated maps are new paths which were not anticipated from the typical scenarios.
This is meaningful enough to claim positive support for the developed tool.
This suggests that the tool has sufficient potential to present unexpected contents and stimulate conception by the user.
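The two ratios can be reproduced from the Venn diagram counts under one reading of the figure. The assignment of the three extracted numbers to regions is an assumption: 50 elements only in N, 50 in N∩M, and 150 only in M. This reading is the one consistent with the stated 50% and 75%.

```python
# Assumed region counts (the assignment to regions is inferred, not stated).
n_only = 50   # in anticipated maps only
both = 50     # in both anticipated and generated maps (N ∩ M)
m_only = 150  # in generated maps only

# Share of the anticipated maps that the experts also generated.
anticipated_covered = both / (n_only + both)

# Share of the generated maps that was not anticipated (new paths).
newly_found = m_only / (m_only + both)

print(f"{anticipated_covered:.0%} of N covered, {newly_found:.0%} of M new")
```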
50. Summary: Use ontology to
bridge datasets across domains
Basic technology
Terms (classes/instances) defined in ontology are used as common
vocabulary for search data.
If the ontology has mapping to Multiple DBs, the user can search
across them.
Motivation and Issue
Combinations of multiple datasets could be valuable for Big Data
Analysis.
However, getting all combinations across multiple Big Data sets is not
realistic because of their size.
Requests by the users are very different according to their interests.
Ontology Engineering for Big Data to Solve the issue
Ontology exploration contributes to obtaining meaningful
combinations (= viewpoints) according to the users’
interests.
51. Case studies
Use ontology to bridge datasets across
domains
Understanding an Ontology through Divergent
Exploration
Presented at ESWC2011
Use ontology to combine deep domain
knowledge and raw data
Japanese Medical Ontology project
Disease ontology and Ontology of Abnormal
State
presented at ICBO (International Conference on Biomedical
Ontology) 2011, 2012 and 2013
52. Medical ontology project in Japan
Developed ontologies
Disease ontology:
Definitions of diseases as causal chains of abnormal states.
6000+ diseases
Anatomy ontology:
Connections between blood vessels, nerves, bones: 10,000+
It is based on ontological frameworks (upper level
ontology) which can be applied to other domains
Models for causal chains
Abnormal state ontology for data integration
General framework to define complicated structures
53. Disease Ontology
Definition of the disease ontology
How to connect the disease
ontology to medical database
54. An example of causal chains constituting diabetes
[Figure: causal chains of disorders that constitute diabetes and related diseases]
Legend: disorders (nodes); causal relationships; core causal chain of a disease (each color represents a disease); possible causes and effects
Diseases shown: Diabetes; Type I diabetes; Steroid diabetes; Diabetes-related blindness
Disorders shown: Destruction of pancreatic beta cells; Lack of insulin I in the blood; Long-term steroid treatment; Deficiency of insulin; Elevated level of glucose in the blood; Loss of sight
Is-a relations between diseases are identified using the chain-inclusion relationship between causal chains.
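The chain-inclusion idea on this slide can be sketched as a subset test on causal edges: a disease is treated as a kind of another disease when its causal chain includes the other's core chain. This is a minimal sketch under stated assumptions. The node labels come from the figure, but the exact edges between them are assumed for illustration and are not taken from the ontology itself.

```python
# Each disease is modeled as a set of (cause, effect) edges.
# The edges are illustrative assumptions based on the figure's labels.
CHAINS = {
    "Diabetes": {
        ("Deficiency of insulin", "Elevated level of glucose in the blood"),
    },
    "Type I diabetes": {
        ("Destruction of pancreatic beta cells", "Lack of insulin I in the blood"),
        ("Lack of insulin I in the blood", "Deficiency of insulin"),
        ("Deficiency of insulin", "Elevated level of glucose in the blood"),
    },
    "Steroid diabetes": {
        ("Long-term steroid treatment", "Deficiency of insulin"),
        ("Deficiency of insulin", "Elevated level of glucose in the blood"),
    },
}


def is_a(child, parent):
    """True if the child's chain includes every edge of the parent's chain."""
    return child != parent and CHAINS[parent] <= CHAINS[child]


def parents_of(disease):
    """All diseases whose causal chain is included in the given disease's chain."""
    return sorted(d for d in CHAINS if is_a(disease, d))


print(parents_of("Type I diabetes"))
print(parents_of("Steroid diabetes"))
```

With this toy data both Type I diabetes and Steroid diabetes come out as kinds of Diabetes, while neither is a kind of the other.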
55. Is-a hierarchy of Abnormality Ontology
[Figure: is-a hierarchy of abnormal states organized in three layers]
Layer 1: Generic abnormal states (object-independent)
Terms shown: Structural abnormality; Size abnormality; Formational abnormality; Conformational abnormality; Material abnormality; Topological abnormality; Small in size; Small in line; Small in area; Small in volume; Large in size; ...
Layer 2: Object-dependent abnormal states (tube-dependent, blood vessel-dependent, ...)
Terms shown: Narrowing tube; Narrowing of valve; Vascular stenosis; Arterial stenosis; Coronary stenosis; Gastrointestinal tract stenosis; Intestinal stenosis; Esophageal stenosis; ...
Layer 3: Specific context-dependent abnormal states (disease-dependent)
Terms shown: Coronary stenosis in Angina pectoris; Coronary stenosis in Arteriosclerosis; Intestinal stenosis in Ileus; Esophageal stenosis in Esophagitis; ...
56. Medical Department               No. of abnormal states   No. of diseases
Allergy and Rheumatoid               1,195                    87
Cardiovascular Medicine              3,052                    546
Diabetes and Metabolic Diseases      1,989                    445
Orthopedic Surgery                   1,883                    198
Nephrology and Endocrinology         1,706                    198
Neurology                            2,960                    396
Digestive Medicine                   1,125                    233
Respiratory Medicine                 1,739                    788
Ophthalmology                        1,306                    561
Hematology and Oncology              354                      415
Dermatology                          908                      1,086
Pediatrics                           2,334                    879
Otorhinolaryngology                  1,118                    470
Total                                21,669                   6,302
[Figure: disease chains graphical tool; Hozo ontology editor]
Clinicians from 13 medical departments describe
causal chains of diseases:
• 6,302 diseases
• 21,669 abnormal states
57. Each clinician defines diseases in terms of
causal chains in his/her division
(Table of abnormal states and diseases per medical department: same figures as on slide 56.)
[Figure: causal chain of Myocardial Infarction (disease); nodes are abnormal states, arrows are causal relationships]
58. Each clinician defines diseases in terms of
causal chains in his/her division
[Figure: causal chain of Myocardial Infarction (disease) placed across Layer 1, Layer 2, and Layer 3; nodes are abnormal states, arrows are causal relationships]
• Using the three-layer model of the abnormality ontology
• Combining causal chains that include the same or related
abnormal states by consulting the is-a hierarchy
⇒ Generic causal chains can be generated.
59. Each clinician describes the definition of a disease
(causal chains of the disease) at a particular department
[Figure: causal chain of Myocardial Infarction (disease) placed across Layer 1, Layer 2, and Layer 3; nodes are abnormal states, arrows are causal relationships]
From 13 medical divisions:
all of the roughly 21,000 abnormal states
can be visualized with
possible causal relationships
• Using the three-layer model of the abnormality ontology
• Combining causal chains that include the same or related
abnormal states by consulting the is-a hierarchy
⇒ Generic causal chains can be generated.
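The step from department-specific chains to generic causal chains can be sketched as follows: disease-dependent (Layer 3) states are lifted to their object-dependent (Layer 2) parents through the is-a hierarchy, and chains that then share a state are joined. This is only a rough sketch. The two is-a links follow the abnormality ontology figure, while the causal edges and the states "Plaque accumulation" and "Myocardial ischemia" are hypothetical examples, not data from the ontology.

```python
# Is-a links from Layer 3 (disease-dependent) to Layer 2 (object-dependent).
LAYER3_TO_LAYER2 = {
    "Coronary stenosis in Angina pectoris": "Coronary stenosis",
    "Coronary stenosis in Arteriosclerosis": "Coronary stenosis",
}

# Causal edges as described per disease (hypothetical examples).
DISEASE_CHAINS = {
    "Angina pectoris": [
        ("Coronary stenosis in Angina pectoris", "Myocardial ischemia"),
    ],
    "Arteriosclerosis": [
        ("Plaque accumulation", "Coronary stenosis in Arteriosclerosis"),
    ],
}


def generalize(state):
    """Lift a Layer 3 state to its Layer 2 parent; leave other states unchanged."""
    return LAYER3_TO_LAYER2.get(state, state)


def generic_edges(chains):
    """Union of all causal edges after generalizing both ends of each edge."""
    edges = set()
    for disease_edges in chains.values():
        for cause, effect in disease_edges:
            edges.add((generalize(cause), generalize(effect)))
    return edges


def downstream(state, edges):
    """All states reachable from the given state by following causal edges."""
    seen, stack = set(), [state]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if cause == current and effect not in seen:
                seen.add(effect)
                stack.append(effect)
    return seen


edges = generic_edges(DISEASE_CHAINS)
print(sorted(downstream("Plaque accumulation", edges)))
```

Before generalization the two chains share no state. After lifting both stenosis states to "Coronary stenosis", they join into one generic chain that crosses the two disease definitions.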
60. Knowledge provided by
the Disease Ontology
Definition of disease
It can answer the following questions:
What abnormal state could be a cause of which
diseases?
What conditions may occur in a patient with the
disease?
That is, it can provide base knowledge for
analyzing big data related to diseases.
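The two questions above can be read as simple lookups over causal chains. The sketch below assumes each disease is stored as an ordered list of abnormal states. The state names come from the diabetes example earlier in the talk, but the ordering and the function names are assumptions made for illustration.

```python
# Each disease is an ordered causal chain of abnormal states (assumed ordering).
DISEASE_CHAINS = {
    "Type I diabetes": [
        "Destruction of pancreatic beta cells",
        "Deficiency of insulin",
        "Elevated level of glucose in the blood",
    ],
    "Steroid diabetes": [
        "Long-term steroid treatment",
        "Deficiency of insulin",
        "Elevated level of glucose in the blood",
    ],
}


def diseases_involving(state):
    """Which diseases have this abnormal state somewhere in their causal chain?"""
    return sorted(d for d, chain in DISEASE_CHAINS.items() if state in chain)


def downstream_conditions(disease, state):
    """Which conditions may follow the given state in a patient with the disease?"""
    chain = DISEASE_CHAINS[disease]
    return chain[chain.index(state) + 1:]


print(diseases_involving("Deficiency of insulin"))
print(downstream_conditions("Steroid diabetes", "Long-term steroid treatment"))
```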
61. DEMO:
Visualization of abnormal state ontology
with possible causal relationships
Java client application developed with the Hozo API.
Disease Chain LOD
Linked Open Data converted from the disease ontology.
SPARQL endpoint (web API for queries) and visualization
tool for disease chains in HTML5.
http://lodc.med-ontology.jp/
62. SPARQL Endpoint
(a) The user can make simple
SPARQL queries by selecting
a property and an object from
lists.
(b) When the user selects a resource
shown as a query result, triples
connected to the resource are visualized.
(c) The user can also browse
connected triples by clicking
rectangles that represent the objects.
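Step (a) amounts to filling a fixed triple pattern with the selected property and object. A minimal sketch of such a query builder is shown below. The function name and the example IRIs are hypothetical and do not reflect the actual vocabulary of the Disease Chain LOD; sending the query to an endpoint is left out.

```python
def build_select_query(property_iri, object_iri, limit=100):
    """Build a SPARQL SELECT query for subjects with the given property and object."""
    for iri in (property_iri, object_iri):
        # Reject characters that are not allowed inside <...> in SPARQL.
        if any(ch in iri for ch in '<>" {}|\\^`'):
            raise ValueError(f"not a safe IRI: {iri!r}")
    return (
        "SELECT ?subject WHERE { "
        f"?subject <{property_iri}> <{object_iri}> . "
        "} "
        f"LIMIT {int(limit)}"
    )


# Hypothetical IRIs standing in for the entries the user picks from the lists.
query = build_select_query(
    "http://example.org/hasCause",
    "http://example.org/Coronary_stenosis",
    limit=10,
)
print(query)
```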
64. [Figure: connecting knowledge and data]
Knowledge: abnormal state is-a hierarchy; anomaly representation; abnormal state layers; generic chains; disease chains
Data: clinical DB
Link between the two: attribute ⇔ property interoperability
65. Summary (2): Disease Ontology
Disease Ontology
Provides domain knowledge described by medical
experts.
Medical DB (Big Data)
Provides evidential data from medical information systems
such as electronic medical records.
This could be a good example of combining
ontology and Big Data.
[Figure labels: Existing Knowledge; Evidence / New Knowledge]
66. Concluding Remarks
Ontology Engineering for Big Data
The combination of the two is promising!
Basic technology: how to combine ontology with big data
Map the ontology to a database
Add metadata to data using the vocabulary defined in the ontology
Convert a database (e.g. RDB) to an ontology-based (RDF) database
How to use Combinations of Ontology and Big Data:
Two possible approaches
Use ontology to bridge datasets across domains
Ontology exploration method to obtain meaningful combinations (=
viewpoints)
Use ontology to combine deep domain knowledge and raw data
Future Plan
Generalize our approaches and feed them back as new functions of
Hozo
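The third basic technology listed above, converting an RDB to an RDF database, can be illustrated with a simple row-to-triples mapping. This sketch is not the conversion used in the project and not a standard mapping: the namespace, the table, and the way IRIs are built are assumptions chosen for illustration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace


def rows_to_ntriples(conn, table, key_column):
    """Emit one N-Triples line per non-key, non-null cell of the table.

    The table name is interpolated into the SQL text, so it must be trusted.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            predicate = f"<{BASE}{table}#{column}>"
            # Escape backslashes and double quotes for the N-Triples literal.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


# Toy relational data (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, diagnosis TEXT)")
conn.execute("INSERT INTO patient VALUES (1, 'Angina pectoris')")
conn.execute("INSERT INTO patient VALUES (2, NULL)")

triples = rows_to_ntriples(conn, "patient", "id")
for line in triples:
    print(line)
```

In a fuller version the predicate IRIs would be replaced by properties defined in the ontology, which is where the mapping and metadata steps from the list come in.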
67. Acknowledgements
Part of this work was supported by JSPS KAKENHI
Grant Numbers 24120002 and 22240011.
Part of the research on medical ontology is supported
by the Ministry of Health, Labour and Welfare, Japan,
through its “Research and development of medical
knowledge base databases for medical information
systems” and by the Japan Society for the Promotion
of Science (JSPS) through its “Funding Program for
World-Leading Innovative R&D on Science and
Technology (FIRST Program)”.
I’m also grateful to all collaborators in each study.