Ontology Engineering for Big Data

For efficient and innovative use of big data, it is important to integrate multiple databases across domains. For example, various public databases have been developed in the life sciences, and finding novel scientific results with them is an essential technique. In social and business areas, open data strategies in many countries promote diverse public data, and combining big data with open data is a major challenge. That is, the diversity of datasets is a problem to be solved for big data.
Ontology provides systematized knowledge for integrating multiple datasets across domains together with their semantics. Linked Data also provides techniques for interlinking datasets based on Semantic Web technologies. We consider that combining ontology and Linked Data on the basis of ontological engineering can help solve the diversity problem in big data.
In this talk, I discuss how ontological engineering could be applied to big data, with some trial examples.

Ontology Engineering for Big Data: Presentation Transcript

  • Ontology Engineering for Big Data
    Kouji Kozaki, The Institute of Scientific and Industrial Research (I.S.I.R), Osaka University, Japan
    Ontology and Semantic Web for Big Data (ONSD2013) Workshop in the 2013 International Computer Science and Engineering Conference (ICSEC2013), Bangkok, Thailand, 5th Sep. 2013
    2013/09/03 ONSD2013@ICEC2013
  • Self introduction: Kouji KOZAKI
    - Brief biography
        - 2002: Received Ph.D. from the Graduate School of Engineering, Osaka University.
        - 2002-: Assistant Professor; 2008-: Associate Professor, ISIR, Osaka University.
    - Specialty
        - Ontological Engineering
    - Main research topics
        - Fundamental theories of ontological engineering
  • Ontological topics: some examples of topics I work on
    - Definition of disease
        - What is a "disease"?
        - What is a "causal chain"?
        - Is it an object or a process?
    - Role theory
        - What is the ontological difference among the following concepts?
        - Person, Teacher, Walker, Murderer, Mother, ...
        - [Figure labels: Natural type; Role (dependent concept)]
  • Self introduction: Kouji KOZAKI (continued)
    - Main research topics
        - Fundamental theories of ontological engineering
        - Ontology development tool based on the ontological theories
        - Ontology development in several domains and ontology-based applications
    - Hozo (法造): an environment for building and using ontologies (1996- )
        - Software to support ontology (= 法) building (= 造) and use
        - Available at http://www.hozo.jp as free software
        - Registered users: 3,500 (June 2012)
        - A Java API for application development is provided.
        - Supported formats: original format, RDF(S), OWL.
        - Linked Data publishing support is coming soon.
  • My history on Ontology Building
    - 2002-2007: Nano technology ontology. Supported by NEDO (New Energy and Industrial Technology Development Organization).
    - 2006-: Clinical Medical ontology. Supported by the Ministry of Health, Labour and Welfare, Japan. In cooperation with the Graduate School of Medicine, The University of Tokyo.
    - 2007-2009: Sustainable Science ontology. In cooperation with the Research Institute for Sustainability Science, Osaka Univ.
    - 2007-2010: IBMD (Integrated Bio Medical Database). Supported by MEXT through the "Integrated Database Project". In cooperation with Tokyo Medical and Dental University and the Graduate School of Medicine, Osaka U.
    - 2008-2012: Protein Experiment Protocol ontology. In cooperation with the Institute for Protein Research, Osaka Univ.
    - 2008-2010: Bio Fuel ontology. Supported by the Ministry of Environment, Japan.
    - 2009-2012: Disaster Risk ontology. In cooperation with NIED (National Research Institute for Earth Science and Disaster Prevention).
    - 2012-: Bio mimetic ontology. Supported by JSPS KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas.
    - 2012-: Ontology of User Action on Web. In cooperation with Consumer first Corp.
    - 2013-: Information Literacy ontology. Supported by JSPS KAKENHI.
  • Agenda
    - (1) Motivation
        - Ontology vs. Big Data
        - How can we use ontology for big data?
    - (2) Case studies towards Ontology Engineering for Big Data
        - Ontology exploration according to the users' viewpoints
        - A disease ontology developed in the Japanese Medical Ontology Project
    - (3) Concluding remarks
  • Ontology vs. Big Data
    - Question: Is ontology useful for Big Data?
    - My answer: (I believe) yes. Combining ontology and Big Data could provide new solutions for many problems.
    - Ontology
        - Not so big (though some are big).
        - Built by hand.
        - Used based on semantics, by reasoning.
    - Big Data
        - Very big.
        - Collected automatically.
        - Used without semantics, by machine learning or data mining.
  • How to combine Ontology and Big Data: basic technologies
    - Mapping an ontology to a database
        - Mapping classes (concepts) defined in the ontology to the database schema
        - Mapping classes/instances defined in the ontology to data in the DB
    - Adding metadata to data using the vocabulary defined in the ontology
        - e.g., annotation of documents such as web pages, papers, etc.
    - Converting a database (e.g., an RDB) to an ontology-based (RDF) database
        - e.g., linked data such as DBpedia, some bioinformatics DBs, etc.
    - You can choose among these technologies according to your purpose.
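The third technology, converting relational rows into an ontology-based (RDF-style) form, can be sketched as follows. This is a minimal illustration, not the speaker's method: the `ex:` prefix, the column-to-property mapping, and the sample row are all hypothetical, and triples are modeled as plain Python tuples rather than with an RDF library.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from database columns to ontology properties.
COLUMN_TO_PROPERTY: Dict[str, str] = {
    "name": "ex:hasName",
    "systolic_bp": "ex:hasBloodPressure",
}


def row_to_triples(table_class: str, key: str, row: Dict[str, object]) -> List[Triple]:
    """Convert one relational row into subject-predicate-object triples."""
    subject = f"ex:{table_class}/{row[key]}"
    # The table is mapped to an ontology class via rdf:type.
    triples: List[Triple] = [(subject, "rdf:type", f"ex:{table_class}")]
    # Each mapped, non-null column becomes one property assertion.
    for column, prop in COLUMN_TO_PROPERTY.items():
        if row.get(column) is not None:
            triples.append((subject, prop, str(row[column])))
    return triples


# Hypothetical sample row from a "Patient" table.
patient_row = {"id": 7, "name": "P-007", "systolic_bp": 180}
patient_triples = row_to_triples("Patient", "id", patient_row)
```

Keeping the mapping in a separate table means the same conversion code can be reused when the ontology vocabulary changes.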
  • Case study: a method for mapping the Abnormality Ontology (in the medical domain) to a medical database
  • Classification of Abnormality Representations 1
    Various types of abnormality representations are used in the medical domain:
    - blood pressure 200 mmHg / blood pressure is high / hypertension
    - blood glucose level 150 mg/dL / blood glucose level is high / hyperglycemia
  • Classification of Abnormality Representations 2 (based on the quality and quantity ontologies in the upper ontology "YAMATO")
    - Quantitative representation: blood pressure 200 mmHg; blood glucose level 150 mg/dL
    - Qualitative representation: blood pressure is high; blood glucose level is high
    - Property representation: hypertension; hyperglycemia
    - Diagnosis: identify a concrete value for each patient in clinical tests
    - Definition of disease
    - [Figure labels: Abnormality Ontology; Mapping; Medical Database]
  • Is-a hierarchy of Abnormality Ontology (figure)
    - Layer 1: Generic abnormal states (object-independent): Structural abnormality; Material abnormality; Size abnormality; Formational abnormality; Conformational abnormality; Topological abnormality; Small in size; Large in size; Small in line; Small in area; Small in volume; ...
    - Layer 2: Object-dependent abnormal states (tube-dependent, blood vessel dependent, ...): Narrowing tube; Narrowing of valve; Vascular stenosis; Arterial stenosis; Coronary stenosis; Gastrointestinal tract stenosis; Intestinal stenosis; Esophageal stenosis; ...
    - Layer 3: Specific context-dependent (disease dependent) abnormal states: Coronary stenosis in Angina pectoris; Coronary stenosis in Arteriosclerosis; Intestinal stenosis in Ileus; Esophageal stenosis in Esophagitis
  • How can we deal with clinical test data?
    - In hospitals, a huge volume of diagnostic/clinical test data has been accumulated.
    - Most are quantitative data, e.g., blood pressure 180 mmHg, blood cross-sectional area 40 mm².
    - A quantitative value (Vqt), e.g., 180 mmHg, is converted into a qualitative value (Vql), e.g., high, by comparison with a threshold value (e.g., 140 mmHg): blood pressure 180 mmHg -> blood pressure high.
  • Basic policy for definition of abnormal states
    - A property (P) is decomposed into a tuple <Attribute (A), Attribute Value (V)> in a qualitative form.
    - Example: hypertension (P) = <blood pressure (A), high (V)>.
    - A qualitative representation can be converted into a property representation.
  • Quantity Property blood pressure 180 mmhg cross-section area xxcmx2 abnormality knowledge Clinical test data blood pressure high cross-section area small Hypertension Narrowing Quality Our model enables “Interoperability” from Clinical test data to conceptual knowledge about abnormal States. 15 Qualitative representation can be converted Quantitative data to Property representation. 2013/09/03
  • Case study: annotation of users' web browsing histories based on the Web User Action Ontology
  • Annotation on web browsing history of users based on ontology
    - Web browsing history (access logs) of users: a list of all URLs each user accessed, for 130M users × 2 years
    - The histories are annotated using the Web User Action Ontology for analysis of consumption behavior.
    - This is collaborative work with Consumer first, Inc.
    - [Chart: "Trends in usage types by conference"; axes: the amount of papers surveyed in each conference, the amounts of types of usage]
  • Basic Idea
    - Format of the access logs (web browsing history) of users provided by Consumer first, Inc.
        - User ID, access date and time, URL, ...
    - Problem
        - A URL is a meaningless string for humans, although one can guess its contents if it is a famous site.
        - Access logs are diverse.
        - In order to analyze them, we need consistent meaning.
    - Annotations on the access log
        - We tried to add metadata that present a human-understandable meaning for each URL.
        - We also developed a prototype for automatic annotation.
        - Its recall and relevance rates are about 0.7 to 0.9.
        - We think this result is not bad for statistical analysis.
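The idea of annotating URLs with ontology concepts and then scoring the annotator can be sketched as follows. The slides do not describe the actual prototype, so this is a toy rule-based stand-in: the host names, concept names, and gold labels are hypothetical, and "relevance rate" is read here as precision, which is an assumption.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical rules: host name -> concept in a web user action ontology.
HOST_TO_CONCEPT: Dict[str, str] = {
    "shop.example.com": "OnlineShopping",
    "news.example.com": "NewsReading",
}


def annotate(url: str) -> Optional[str]:
    """Return the concept for a URL, or None when no rule applies."""
    return HOST_TO_CONCEPT.get(urlparse(url).hostname or "")


def precision_recall(predicted: List[Optional[str]], gold: List[str]) -> Tuple[float, float]:
    """Precision over annotated URLs, recall over all gold-labeled URLs."""
    annotated = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in annotated if p == g)
    precision = correct / len(annotated) if annotated else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


urls = [
    "http://shop.example.com/cart",
    "http://news.example.com/article/1",
    "http://unknown.example.org/",
]
gold_labels = ["OnlineShopping", "NewsReading", "VideoViewing"]
predicted_labels = [annotate(u) for u in urls]
precision, recall = precision_recall(predicted_labels, gold_labels)
```

In this toy run every annotation made is correct, but one URL is left without an annotation, so recall is lower than precision.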
  • Ontology Engineering for Big Data
    - Basic technologies: how to combine ontology and Big Data
        - Mapping an ontology to a database
        - Adding metadata to data using the vocabulary defined in the ontology
        - Converting a database (e.g., an RDB) to an ontology-based (RDF) database
    - How to use combinations of ontology and Big Data
        - Ontology can provide semantics to add to raw data.
        - Generalized concepts in an ontology can connect data at various concept levels across domains.
        - We can use ontology as given (and authorized) knowledge to analyze big data.
  • Ontology Engineering for Big Data
    - Features of ontology at the class level
        - It reflects an understanding of the target world.
        - Well-organized ontologies contain rich, generalized knowledge based on consistent semantics.
        - Ontologies are systematized knowledge of domains.
    - Combination of ontology and big data
        - Ontology can provide semantics to add to raw data.
        - Generalized concepts in an ontology can connect data at various concept levels across domains.
        - We can use ontology as given (and authorized) knowledge to analyze big data.
  • Two possible ways to use ontology for big data
    - Use ontology to bridge datasets across domains
    - Use ontology to combine deep domain knowledge and raw data
    - [Figure labels: Ontology; Metadata; LOD (Linked Open Data); Big Data]
  • Case studies
    - Use ontology to bridge datasets across domains
        - Understanding an Ontology through Divergent Exploration
        - Presented at ESWC2011
    - Use ontology to combine deep domain knowledge and raw data
        - Japanese Medical Ontology project
        - Disease ontology and Ontology of Abnormal State
        - Presented at ICBO (International Conference on Biomedical Ontology) 2011, 2012 and 2013
  • Use ontology to bridge datasets across domains
    - Basic technology
        - Terms (classes/instances) defined in an ontology are used as a common vocabulary for searching data.
        - If the ontology has mappings to multiple DBs, the user can search across them.
    - Motivation and issue
        - Combinations of multiple datasets could be valuable for Big Data analysis, e.g., climate and agriculture, healthcare and life science, etc.
        - However, obtaining all combinations across multiple big datasets is not realistic because of their size.
        - Requests by users also differ greatly according to their interests.
        - It is important to consider an efficient method for obtaining meaningful combinations.
    - [Figure: searching across multiple DBs through an ontology used as a common vocabulary; data source labels: Documents / Law, Raw Data]
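The "common vocabulary" idea, where one ontology term is mapped to a field in each database so that a single query reaches all of them, can be sketched as follows. Everything here is a hypothetical stand-in: the term `Precipitation`, the two toy databases, and their field names are invented for illustration.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical mapping: ontology term -> field name used by each database.
TERM_TO_FIELD: Dict[str, Dict[str, str]] = {
    "Precipitation": {"climate_db": "rain_mm", "agriculture_db": "rainfall"},
}

# Toy stand-ins for two independently designed databases.
DATABASES: Dict[str, List[Record]] = {
    "climate_db": [
        {"station": "S1", "rain_mm": 12.5},
        {"station": "S2", "temp_c": 30.1},
    ],
    "agriculture_db": [
        {"farm": "F1", "rainfall": 11.0, "yield_t": 4.2},
    ],
}


def search_by_term(term: str) -> Dict[str, List[Record]]:
    """Return, per database, the records that carry the field mapped to the term."""
    hits: Dict[str, List[Record]] = {}
    for db_name, field in TERM_TO_FIELD.get(term, {}).items():
        hits[db_name] = [rec for rec in DATABASES.get(db_name, []) if field in rec]
    return hits


results = search_by_term("Precipitation")
```

The user only needs to know the ontology term; the per-database field names stay hidden in the mapping.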
  • A method to obtain meaningful combinations using ontology exploration
    - An ontology presents an explicit, essential understanding of the target world. It provides base knowledge to be shared among users.
    - Users explore the ontology according to their viewpoints and generate conceptual maps as the result.
    - These maps represent understanding from their own viewpoints.
    - They can use the maps as viewpoints (combinations) to get data from multiple DBs.
    - [Figure labels: Problem Setting; Problem Solution; Innovation; Layers 0 to 4; Contents Management using the Metadata; Map Generation Depending on Viewpoints; Comparison and Convergence of Multiple Maps; Context-Based Convergence; Divergent Exploration; Ontology-based Information Retrieval]
  • (Divergent) Ontology exploration tool
    - 1) Exploration of multi-perspective conceptual chains: multi-perspective conceptual chains represent the explorer's understanding of the ontology from a specific viewpoint.
    - 2) Visualization of conceptual chains: visualization as conceptual maps from different viewpoints.
    - [Figure labels: Exploration of an ontology; "Hozo" ontology editor; Conceptual maps]
  • [Figure legend]
    - A node represents a concept (= rdfs:Class).
    - A slot represents a relationship (= rdf:Property).
    - Is-a (sub-class-of) relationship
    - Referring to another concept
  • [Screenshot labels] Aspect dialog; kinds of aspects; constriction tracing classes; option settings for exploration; property names; conceptual map visualizer. Selected relationships are traced and shown as links in the conceptual map.
  • Explore the focused (selected) path.
  • Functions for ontology exploration
    - Exploration using the aspect dialog (manual exploration)
        - Divergent exploration from one concept, using the aspect dialog at each step.
    - Search path (machine exploration)
        - Exploration of paths from a starting point to ending points.
        - The tool allows users to edit the map afterwards to extract only the interesting portions.
    - Change view
        - The tool can highlight specified paths of conceptual chains on the generated map according to given viewpoints.
    - Comparison of maps
        - The system can compare generated maps and show the conceptual chains common to both maps.
  • Search Path
    - Select a starting point and ending points (1), (2), (3).
    - The tool finds all possible paths from the starting point to the ending points.
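Finding all paths from a starting concept to a set of ending concepts can be sketched as a depth-first enumeration of cycle-free paths. This is not the tool's actual algorithm, which the slides do not describe; the graph encoding, the `max_links` bound, and the small biofuel-style graph are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

# concept -> list of (relationship, target concept)
Graph = Dict[str, List[Tuple[str, str]]]


def find_paths(graph: Graph, start: str, goals: Set[str], max_links: int = 5) -> List[List[str]]:
    """Enumerate cycle-free paths from start to any goal, depth-first."""
    found: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node in goals:
            found.append(path.copy())
        if len(path) - 1 >= max_links:  # bound the number of links in a path
            return
        for _relation, target in graph.get(node, []):
            if target not in path:  # never revisit a concept on the same path
                path.append(target)
                walk(target, path)
                path.pop()

    walk(start, [start])
    return found


# Hypothetical fragment of a biofuel ontology.
biofuel_graph: Graph = {
    "Biofuel production": [("requires", "Farmland development")],
    "Farmland development": [("causes", "Deforestation"), ("causes", "Soil deterioration")],
    "Deforestation": [("causes", "Water pollution")],
}
paths = find_paths(biofuel_graph, "Biofuel production", {"Water pollution", "Soil deterioration"})
```

Each returned path corresponds to one conceptual chain that could be drawn on a conceptual map.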
  • Search Path: selected ending points (figure)
  • What does the result mean?
    - [Figure labels: Selected ending points; Problem; Kinds of method to solve the problem; Possible combination of them]
  • DEMO: Ontology Exploration
  • Usage and evaluation of ontology exploration tool
    - Step 1: Usage for knowledge structuring in sustainability science
    - Step 2: Verification of exploring the abilities of the ontology exploration tool
    - Step 3: Experiments for evaluating the ontology exploration tool
  • Sustainability Science (journal)
    - Sustainability Science probes interactions between global, social, and human systems, the complex mechanisms that lead to degradation of these systems, and concomitant risks to human well-being.
    - The journal provides a platform for building sustainability science as a new academic discipline.
    - These include endeavors to simultaneously understand phenomena and solve problems, uncertainty and application of the precautionary principle, the co-evolution of knowledge and recognition of problems, and trade-offs between global and local problem solving.
    - Volume 1 / 2006 - Volume 8 / 2013. Editor-in-Chief: Kazuhiko Takeuchi. Managing Editor: Osamu Saito. ISSN: 1862-4065 (print version), 1862-4057 (electronic version).
  • Knowledge Structuring in Sustainability Science
    - Sustainability Science (SS)
        - We aimed at establishing a new interdisciplinary scheme that serves as a basis for constructing a vision that will lead global society to a sustainable one.
        - This requires an integrated understanding of the entire field instead of domain-wise knowledge structuring.
    - Sustainability science ontology
        - Developed in collaboration with domain experts at the Osaka University Research Institute for Sustainability Science (RISS).
        - Number of concepts: 649; number of slots: 1,075.
    - Usage of the ontology exploration tool
        - It was confirmed that the experts found the exploration fun and that the tool had a certain utility for achieving knowledge structuring in sustainability science [Kumazawa 2009].
    - [Figure source: http://en.ir3s.u-tokyo.ac.jp/about_sus]
  • Biofuel Use Strategies for Sustainable Development (BforSD, FY2008-FY2010)
    - One of the sub-themes: development of an ontology-based mapping system that creates comprehensive views of problems and policy measures on biofuel.
        - (1) Structuring biofuel problems: develop the biofuel ontology, which explicitly conceptualizes biofuel problems through literature review and interviews.
        - (2) Develop an ontology exploration tool that interactively generates conceptual maps with paths between concepts in the biofuel ontology.
        - (3) In collaboration with other sub-themes, develop an application method of this map tool for policy-making support to find, frame, and prioritize relevant problems and policy measures.
    - (Source) US DOE
  • Verification of Ontology Exploration Tool
    - Verification methods
        - 1) Enrichment of the SS ontology: we enriched the SS ontology on the basis of 29 typical scenarios (cases) structured by domain experts in biofuel through literature review and interviews.
    - [Figure: 29 scenarios (cases) -> 27 conceptual maps]
  • Positive and negative effects of biofuel
    Legend: (+) positive effects, (−) negative effects, (+/−) both positive and negative effects. Sources: [1] UN-Energy (2007), [2] FAO (2008), [3] CBD (2008), [4] Martinelli et al. (2008).
    1) Energy services for the poor
        - (+/−) Competition of biomass energy systems with the present use of biomass resources (such as agricultural residues) in applications such as animal feed and bedding, fertilizer, and construction materials [1]
        - (−) In many developing countries, small-scale biomass energy projects face challenges obtaining finance from traditional financing institutions [1]
        - (−) Liquid biofuels are likely to replace only a small share of global energy supplies and cannot alone eliminate our dependence on fossil fuels [2]
    2) Agro-industrial development and job creation
        - (+) Biofuel is powering new small- and large-scale agro-industrial development and spawning new industries in industrialized and developing countries [1]
        - (+/−) In the short-to-medium term, bioenergy use will depend heavily on feedstock costs and reliability of supply, cost and availability of competing energy sources, and government policy decisions [1]
        - (+) In the longer term, the economics of biofuel will probably improve as agricultural productivity and agro-industrial efficiency improve, more supportive agricultural and energy policies are adopted, carbon markets mature and expand, and new methodologies for carbon sequestration accounting are developed [1]
        - (+) In the longer term, expanded demand and increased prices for agricultural commodities may represent opportunities for agricultural and rural development [2]
        - (+) Biofuel industries create jobs, including highly skilled science, engineering, and business-related employment; medium-level technical staff; low-skill industrial plant jobs; and unskilled agricultural labor [1]
        - (+/−) Small-scale and labor-intensive production often leads to trade-offs between production efficiency and economic competitiveness [1]
    3) Health and gender
        - (−) Market opportunities cannot overcome existing social and institutional barriers to equitable growth, with exclusion factors such as gender, ethnicity, and political powerlessness, and may even worsen them [2]
        - (−) Forest burning for development of feedstock plantation and sugarcane burning to facilitate manual harvesting result in air pollution, higher surface water runoff, soil erosion, and unintended forest fires [3, 4]
        - (−) Exploitation of cheap labor (plantation and migrant workers) [4]
        - (−) Increased use of pesticides could create health hazards for laborers and communities living near areas of feedstock production [1, 3]
    4) Agricultural structure
        - (−) The demand for land to grow biofuel crops could put pressure on competing land usage for food crops, resulting in an increase in food prices [1, 2]
        - (+/−) Significant economies of scale can be gained from processing and distributing biofuels on a large scale. The transition to liquid biofuels can be harmful to farmers who do not own their own land, and to the rural and urban poor who are net buyers of food [1]
        - (−) While global market forces could lead to new and stable income streams, they could also increase marginalization of poor and indigenous people and affect traditional ways of living if they end up driving small farmers without clear titles from their land and destroying their livelihood [1]
  • Positive and negative effects of biofuel (continued)
    Legend: (+) positive effects, (−) negative effects, (+/−) both positive and negative effects. Sources: [1] UN-Energy (2007), [2] FAO (2008), [3] CBD (2008), [4] Martinelli et al. (2008).
    5) Food security
        - (−) Demand for agricultural feedstock for liquid biofuels will be a significant factor for agricultural markets and world agriculture over the next decade and perhaps beyond [2]
        - (−) Rapidly growing demand for biofuel feedstock has contributed to higher food prices, which poses an immediate threat to the food security of poor net food buyers in both urban and rural areas [2]
        - (+/−) The effect of biofuels on food security is context-specific, depending on the particular technology and country characteristics involved [1]
    6) Government budget
        - (−) Because ethanol is used largely as a substitute for gasoline, providing a large tax reduction for blending ethanol and gasoline reduces government revenue from this tax, mainly targeting the non-poor [1]
        - (−) Production of biofuels in many countries, except sugarcane-based ethanol production in Brazil, is not currently economically viable without subsidies, given existing agricultural production and biofuel-processing technologies and recent relative prices of commodity feedstock and crude oil [2]
        - (−) Policy interventions, especially in the form of subsidies and mandated blending of biofuels with fossil fuels, are driving the rush to liquid biofuels, which leads to high economic, social, and environmental costs in both developed and developing countries [2]
    7) Trade, foreign exchange balance, and energy security
        - (+) Diversifying global fuel supplies could have beneficial effects on the global oil market and many developing countries because fossil fuel dependence has become a major risk for many developing economies [1]
        - (+/−) Rapidly rising demand for ethanol has had an impact on the price of sugar and maize in recent years, bringing substantial rewards to farmers not only in Brazil and the United States but around the world [1, 2]
        - (−) Linking of agricultural prices to the vicissitudes of the world oil market clearly presents risks; however, it is an essential transition to the development of a biofuel industry that does not rely on major food commodity crops [1]
    8) Biodiversity and natural resource management
        - (+/−) Depending on the types of crop grown, what they replaced, and the methods of cultivation and harvesting, biofuels can have negative and positive effects on land use, soil and water quality, and biodiversity [1, 3]
        - (−) Problems with water availability and use may represent a limitation on agricultural biofuel production [1, 3]
        - (−) Introduction of criteria, standards, and certification schemes for biofuels may generate indirect negative environmental and biodiversity effects, passively in other countries [3]
        - (−) If the production of biofuel feedstock requires increased fertilizer and pesticide use, there could be additional detrimental effects such as increases in GHG emissions and eutrophicating nutrients, and biodiversity loss [3]
        - (−) Wild biodiversity is threatened by loss of habitat when the area under crop production is expanded, whereas agricultural biodiversity is vulnerable in the case of large-scale monocropping, which is based on a narrow pool of genetic material, and can also lead to reduced use of traditional varieties [2, 3]
        - (+) If crops are grown on degraded or abandoned land, such as previously deforested areas or degraded crop- and grasslands, and if soil disturbances are minimized, feedstock production for biofuels can have a positive impact on biodiversity by restoring or conserving habitat and ecosystem function [3]
    9) Climate change
        - (+/−) Full lifecycle GHG emissions of biofuel vary widely based on land use changes, choice of feedstock, agricultural practices, refining or conversion processes, and end-use practices [1, 2]
        - (−) Land use change associated with production of biofuel feedstock can affect GHG emissions; draining wetlands and clearing land with fire are detrimental with regard to GHG emissions and air quality [2, 3]
        - (−) The greatest potential for reducing GHG emissions comes from replacement of coal rather than petroleum fuels [1]
        - (+) Biofuels offer the only realistic near-term renewable option for displacing and supplementing liquid transport fuels [1]
  • Verification of Ontology Exploration Tool
    - The concepts appearing in these scenarios were extracted and generalized to be added to the ontology.
    - Example scenario: air pollution, the cause of forest fires, soil deterioration, and water pollution are attributed to intentional burning when forest is logged or sugarcane is harvested in farmland development for biofuel crops.
    - Extracted form: burn agriculture = (deforestation, soil deterioration caused by farmland development for biofuel crops) ⇒ harvest sugarcanes (air pollution caused by intentional burn), disruption of ecosystem caused by deforestation (water pollution)
  • Verification of Ontology Exploration Tool
    - Verification methods
        - 1) Enrichment of the SS ontology: we enriched the SS ontology on the basis of 29 typical scenarios (cases) structured by domain experts in biofuel through literature review and interviews.
        - 2) Verification of scenario-reproducing operations: we verified whether the ontology exploration tool could generate conceptual maps that represent the original scenarios.
    - Result
        - 93% (27/29) of the scenarios were successfully reproduced as conceptual maps.
    - [Figure: 29 scenarios (cases) -> 27 conceptual maps]
  • Usage and evaluation of ontology exploration tool
    - Step 1: Usage for knowledge structuring in sustainability science
    - Step 2: Verification of exploring the abilities of the ontology exploration tool
    - Step 3: Experiments for evaluating the ontology exploration tool
        - 1) Whether meaningful maps for domain experts were obtained.
        - 2) Whether meaningful maps other than anticipated maps were obtained.
    - Anticipated maps: maps representing the contents of the scenarios anticipated by the ontology developers at the time of ontology construction.
    - Note: the subjects do not know which scenarios are anticipated.
  • Experiment for evaluating ontology exploration tool
    - Subjects: 4 experts in different fields
        - A: Agricultural economics
        - B: Social science (stakeholder analysis)
        - C: Risk analysis
        - D: Metropolitan environmental planning
    - Experimental method
        - 1) The four experts generated conceptual maps with the tool in accordance with the condition settings of the given tasks.
        - 2) They removed paths that were apparently inappropriate from the paths of conceptual chains included in the generated maps.
        - 3) They selected paths according to their interests and entered a four-level general evaluation with free comments.
    - Four-level general evaluation
        - A: Interesting
        - B: Important but ordinary
        - C: Neither good nor poor
        - D: Obviously wrong
  • Experimental results (1)
    Table 2. Experimental results: number of selected paths and path distribution based on general evaluation (levels A to D).
    Task    Expert                   Selected paths   Non-empty rating counts (in A-to-D column order)
    Task 1  Expert A                 2                2
    Task 1  Expert A (second time)   1                1
    Task 1  Expert B                 7                4, 1, 2
    Task 1  Expert B (second time)   6                3, 3
    Task 1  Expert C                 8                1, 5, 2
    Task 1  Expert D                 3                1, 1, 1
    Task 2  Expert A                 1                1
    Task 2  Expert B                 6                5, 1
    Task 2  Expert C                 7                2, 4, 1
    Task 2  Expert D                 5                3, 1, 1
    Task 3  Expert B                 8                4, 2, 2
    Task 3  Expert C                 4                2, 2
    Task 3  Expert D                 3                3
    Total                            61               A: 30, B: 22, C: 8, D: 1
  • Experimental results (1): summary
    - Number of maps generated: 13
    - Number of paths evaluated: 61
        - A: Interesting: 30 (49%)
        - B: Important but ordinary: 22 (36%)
        - C: Neither good nor poor: 8 (13%)
        - D: Obviously wrong: 1 (2%)
    - Levels A and B together account for 85% of the evaluated paths.
    - We can conclude that the tool could generate maps or paths sufficiently meaningful for experts.
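The percentages on this slide follow directly from the rating totals (30, 22, 8, and 1 out of 61 paths). A short check, with variable names chosen only for this sketch:

```python
# Rating totals reported on the slide (61 evaluated paths in all).
ratings = {"A": 30, "B": 22, "C": 8, "D": 1}
total_paths = sum(ratings.values())

# Share of each evaluation level, rounded to whole percent.
shares = {level: round(100 * count / total_paths) for level, count in ratings.items()}

# Paths rated "Interesting" (A) or "Important but ordinary" (B).
meaningful_share = round(100 * (ratings["A"] + ratings["B"]) / total_paths)
```

The combined A and B share is the figure behind the conclusion that most selected paths were meaningful to the experts.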
  • Experimental results (2)
    - Quantitative comparison of the anticipated maps with the maps generated by the subjects
        - (N) Nodes and links included in the paths of the anticipated maps
        - (M) Nodes and links included in the paths generated and selected by the experts
        - [Venn diagram values: 50, 150, 50; intersection N∩M]
    - About half (50%) of the paths included in the anticipated maps were included in the maps generated by the experts.
    - About 75% of the paths in the generated maps are new paths that were not anticipated from the typical scenarios.
    - This is meaningful enough to claim positive support for the developed tool. It suggests that the tool has a sufficient possibility of presenting unexpected contents and stimulating conception by the user.
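The two ratios can be reproduced with set operations. The reading of the Venn diagram used here is an assumption: 50 elements only in N, 50 in N∩M, and 150 only in M, which is the assignment consistent with the stated 50% and 75%. The element identifiers are placeholders.

```python
from typing import Dict, Set


def overlap_stats(anticipated: Set[int], generated: Set[int]) -> Dict[str, float]:
    """Share of anticipated elements that were reproduced, and share of generated elements that are new."""
    common = anticipated & generated
    return {
        "anticipated_covered": len(common) / len(anticipated),
        "new_in_generated": len(generated - anticipated) / len(generated),
    }


# Placeholder identifiers: 0-49 only in N, 50-99 in both, 100-249 only in M (assumed reading).
anticipated_elements = set(range(0, 100))
generated_elements = set(range(50, 250))
stats = overlap_stats(anticipated_elements, generated_elements)
```

Under this reading, half of N is covered by M, and three quarters of M lies outside N.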
  • Summary: Use ontology to bridge datasets across domains
    - Basic technology
      - Terms (classes/instances) defined in an ontology are used as a common vocabulary for searching data.
      - If the ontology has mappings to multiple DBs, the user can search across them.
    - Motivation and issue
      - Combinations of multiple datasets could be valuable for Big Data analysis.
      - However, obtaining all combinations across multiple Big Data sets is not realistic because of their size.
      - Requests by users differ widely according to their interests.
    - Ontology Engineering for Big Data to solve the issue
      - Ontology exploration contributes to obtaining meaningful combinations (= viewpoints) according to the users’ interests.
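The basic technology above, one ontology term mapped to several databases, can be sketched as follows. The database names, field names, and records are hypothetical stand-ins, and the in-memory lists only imitate real DBs:

```python
# Two toy "databases" with different schemas (illustrative data only).
DATABASES = {
    "clinical_db": [
        {"id": 1, "diagnosis": "diabetes mellitus"},
        {"id": 2, "diagnosis": "ileus"},
    ],
    "literature_db": [
        {"doc": "d1", "topic": "Diabetes"},
        {"doc": "d2", "topic": "Angina pectoris"},
    ],
}

# Mapping from an ontology term to the field and value used in each database.
MAPPING = {
    "Diabetes": {
        "clinical_db": ("diagnosis", "diabetes mellitus"),
        "literature_db": ("topic", "Diabetes"),
    },
}


def search(term):
    """Return (database, record) pairs for every DB the ontology term maps to."""
    hits = []
    for db_name, (field, value) in MAPPING.get(term, {}).items():
        hits += [(db_name, rec) for rec in DATABASES[db_name] if rec.get(field) == value]
    return hits


print(search("Diabetes"))
```

The user asks with the ontology term only; the per-database vocabulary stays inside the mapping.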
  • Case studies
    - Use ontology to bridge datasets across domains
      - Understanding an Ontology through Divergent Exploration
      - Presented at ESWC2011
    - Use ontology to combine deep domain knowledge and raw data
      - Japanese Medical Ontology project
      - Disease ontology and Ontology of Abnormal State
      - Presented at ICBO (International Conference on Biomedical Ontology) 2011, 2012 and 2013
  • Medical ontology project in Japan
    - Developed ontologies
      - Disease ontology: definitions of diseases as causal chains of abnormal states; 6,000+ diseases
      - Anatomy ontology: connections between blood vessels, nerves, and bones; 10,000+
    - The ontologies are based on ontological frameworks (upper-level ontology) that can be applied to other domains
      - Models for causal chains
      - Abnormal state ontology for data integration
      - General framework to define complicated structures
  • Disease Ontology
    - Definition of the disease ontology
    - How to connect the disease ontology to medical databases
  • An example of causal chains constituting diabetes
    [Figure: causal chains of diabetes. Legend: disorders (nodes); causal relationships; core causal chain of a disease (each color represents a disease); possible causes and effects]
    - Disorders shown: Destruction of pancreatic beta cells; Lack of insulin I in the blood; Deficiency of insulin; Long-term steroid treatment; Elevated level of glucose in the blood; Loss of sight
    - Diseases shown: Diabetes; Type I diabetes; Steroid diabetes; Diabetes-related Blindness
    - Is-a relations between diseases are defined using the chain-inclusion relationship between causal chains.
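A minimal sketch of deriving is-a relations from chain inclusion. It assumes that the more specific disease is the one whose core causal chain includes the chain of the more general disease; the concrete links below are illustrative guesses, since the figure itself is not recoverable from the transcript:

```python
# Core causal chain of each disease as a set of (cause, effect) links.
# These links are assumptions for illustration, not the ontology's content.
CHAINS = {
    "Diabetes": {
        ("deficiency of insulin", "elevated level of glucose in the blood"),
    },
    "Type I diabetes": {
        ("destruction of pancreatic beta cells", "lack of insulin in the blood"),
        ("lack of insulin in the blood", "deficiency of insulin"),
        ("deficiency of insulin", "elevated level of glucose in the blood"),
    },
    "Steroid diabetes": {
        ("long-term steroid treatment", "deficiency of insulin"),
        ("deficiency of insulin", "elevated level of glucose in the blood"),
    },
}


def is_a(sub, sup, chains=CHAINS):
    """True when the chain of `sup` is strictly included in the chain of `sub`."""
    return chains[sup] < chains[sub]


def super_types(disease, chains=CHAINS):
    """All diseases whose chain is included in the chain of `disease`."""
    return sorted(d for d in chains if d != disease and is_a(disease, d, chains))


print(super_types("Type I diabetes"))
```

Under this reading the hierarchy need not be written by hand: it falls out of the chain definitions, and two sibling diseases such as Type I diabetes and Steroid diabetes are not related to each other.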
  • Is-a hierarchy of Abnormality Ontology
    [Figure: three-layer is-a hierarchy of abnormal states]
    - Layer 1: Generic Abnormal States (object-independent): Structural abnormality; Size abnormality; Formational abnormality; Conformational abnormality; Material abnormality; Topological abnormality; Small in size; Large in size; Small in line; Small in area; Small in volume; …
    - Layer 2: Object-dependent Abnormal States (tube-dependent, blood vessel dependent, …): Narrowing tube; Narrowing of valve; Vascular stenosis; Gastrointestinal tract stenosis; Arterial stenosis; Coronary stenosis; Intestinal stenosis; Esophageal stenosis; …
    - Layer 3: Specific context-dependent Abnormal States (disease dependent): Coronary stenosis in Angina pectoris; Coronary stenosis in Arteriosclerosis; Intestinal stenosis in Ileus; Esophageal stenosis in Esophagitis
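A sketch of how such a layered is-a hierarchy can be traversed to find a more generic abnormal state. The parent links and layer assignments are assumptions read off the slide's labels, not a copy of the actual ontology:

```python
# Child -> parent is-a links (assumed for illustration).
IS_A = {
    "Coronary stenosis in Angina pectoris": "Coronary stenosis",
    "Coronary stenosis": "Arterial stenosis",
    "Arterial stenosis": "Vascular stenosis",
    "Vascular stenosis": "Narrowing tube",
    "Narrowing tube": "Small in area",
    "Small in area": "Small in size",
    "Small in size": "Size abnormality",
    "Size abnormality": "Structural abnormality",
}

# Layer of each state: 1 generic, 2 object-dependent, 3 context-dependent.
LAYER = {
    "Coronary stenosis in Angina pectoris": 3,
    "Coronary stenosis": 2,
    "Arterial stenosis": 2,
    "Vascular stenosis": 2,
    "Narrowing tube": 2,
    "Small in area": 1,
    "Small in size": 1,
    "Size abnormality": 1,
    "Structural abnormality": 1,
}


def ancestors(state):
    """States above `state`, nearest first."""
    result = []
    while state in IS_A:
        state = IS_A[state]
        result.append(state)
    return result


def generalize(state, layer):
    """Nearest state at or above `state` that belongs to `layer`, else None."""
    for candidate in [state] + ancestors(state):
        if LAYER[candidate] == layer:
            return candidate
    return None


print(generalize("Coronary stenosis in Angina pectoris", 2))
```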
  • Clinicians from 13 medical departments describe causal chains of diseases:
    - 6,302 diseases
    - 21,669 abnormal states
    - Graphical tool: Hozo-Ontology Editor (disease chains)

    Medical Department                  No. of Abnormal States   No. of Diseases
    Allergy and Rheumatoid                               1,195                87
    Cardiovascular Medicine                              3,052               546
    Diabetes and Metabolic Diseases                      1,989               445
    Orthopedic Surgery                                   1,883               198
    Nephrology and Endocrinology                         1,706               198
    Neurology                                            2,960               396
    Digestive Medicine                                   1,125               233
    Respiratory Medicine                                 1,739               788
    Ophthalmology                                        1,306               561
    Hematology and Oncology                                354               415
    Dermatology                                            908             1,086
    Pediatrics                                           2,334               879
    Otorhinolaryngology                                  1,118               470
    Total                                               21,669             6,302
  • Each clinician defines diseases in terms of causal chains at his/her division.
    [Figure: causal chain of abnormal states and causal relationships for Myocardial Infarction (disease)]
  • Each clinician defines diseases in terms of causal chains at his/her division.
    [Figure: causal chains (abnormal states and causal relationships), e.g. Myocardial Infarction (disease), arranged over Layer 3, Layer 2 and Layer 1]
    - Using the three-layer model of the abnormality ontology
    - Combining causal chains that include the same or related abnormal states by consulting the is-a hierarchy
    ⇒ Generic causal chains can be generated.
  • Each clinician describes the definitions of diseases (causal chains of diseases) at a particular department.
    [Figure: causal chains (abnormal states and causal relationships), e.g. Myocardial Infarction (disease), arranged over Layer 3, Layer 2 and Layer 1]
    - From 13 medical divisions
    - All 21,000 abnormal states can be visualized with possible causal relationships
    - Using the three-layer model of the abnormality ontology
    - Combining causal chains that include the same or related abnormal states by consulting the is-a hierarchy
    ⇒ Generic causal chains can be generated.
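One way to picture the combination step: replace each context-dependent abnormal state with its more generic parent, then merge the links written by different departments. The sketch below is only an illustration of that idea; the department names, the state "Myocardial ischemia", and all links are hypothetical:

```python
from collections import defaultdict

# Context-dependent state -> object-dependent parent (assumed is-a links).
IS_A = {
    "Coronary stenosis in Angina pectoris": "Coronary stenosis",
    "Coronary stenosis in Arteriosclerosis": "Coronary stenosis",
}

# Causal links (cause, effect) as written separately by two departments.
DEPARTMENT_LINKS = {
    "Department X": [("Arteriosclerosis", "Coronary stenosis in Arteriosclerosis")],
    "Department Y": [("Coronary stenosis in Angina pectoris", "Myocardial ischemia")],
}


def generic_links(department_links, is_a):
    """Lift every link to its generic states and record which departments gave it."""
    merged = defaultdict(set)
    for dept, links in department_links.items():
        for cause, effect in links:
            merged[(is_a.get(cause, cause), is_a.get(effect, effect))].add(dept)
    return dict(merged)


def downstream(state, links):
    """All states reachable from `state` by following (cause, effect) links."""
    seen, stack = [], [state]
    while stack:
        current = stack.pop()
        for cause, effect in links:
            if cause == current and effect not in seen:
                seen.append(effect)
                stack.append(effect)
    return seen


generic = generic_links(DEPARTMENT_LINKS, IS_A)
print(downstream("Arteriosclerosis", generic))
```

Before generalization the two departments' links do not connect, because they name different context-dependent states. After lifting both to "Coronary stenosis", a single chain spans both departments.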
  • Knowledge provided by the Disease Ontology
    - Definition of disease
    - It can answer questions such as:
      - What abnormal state could be a cause of which diseases?
      - What conditions may occur in a patient with the disease?
    - That is, it can provide base knowledge for analyzing big data related to diseases.
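The two questions above map naturally onto lookups over causal chains. A minimal sketch, assuming each disease is stored as a list of (cause, effect) links; the links themselves are illustrative and not taken from the ontology:

```python
# Disease -> causal links (cause, effect); illustrative data only.
DISEASE_CHAINS = {
    "Type I diabetes": [
        ("destruction of pancreatic beta cells", "deficiency of insulin"),
        ("deficiency of insulin", "elevated level of glucose in the blood"),
    ],
    "Steroid diabetes": [
        ("long-term steroid treatment", "deficiency of insulin"),
        ("deficiency of insulin", "elevated level of glucose in the blood"),
    ],
}


def diseases_with_cause(state):
    """Diseases in whose chain `state` appears as a cause."""
    return sorted(
        disease
        for disease, links in DISEASE_CHAINS.items()
        if any(cause == state for cause, _ in links)
    )


def possible_conditions(disease):
    """All abnormal states mentioned in the chain of `disease`."""
    return sorted({state for link in DISEASE_CHAINS[disease] for state in link})


print(diseases_with_cause("deficiency of insulin"))
print(possible_conditions("Steroid diabetes"))
```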
  • DEMO
    - Visualization of the abnormal state ontology with possible causal relationships
      - Java client application developed with the Hozo API
    - Disease Chain LOD
      - Linked Open Data converted from the disease ontology
      - SPARQL endpoint (web API for queries) and visualization tool for disease chains in HTML5
      - http://lodc.med-ontology.jp/
  • SPARQL Endpoint
    (a) The user can make simple SPARQL queries by selecting a property and an object from lists.
    (b) When the user selects a resource shown as a query result, the triples connected to the resource are visualized.
    (c) The user can also browse connected triples by clicking the rectangles that represent the objects.
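A sketch of the kind of queries such an interface might generate. The function names, the example IRIs, and the endpoint URL are hypothetical; the actual vocabulary and endpoint path of the Disease Chain LOD are not given in the slides. Only the `query` request parameter is taken from the standard SPARQL protocol:

```python
from urllib.parse import urlencode


def select_subjects(property_iri, object_iri, limit=20):
    """Query for step (a): subjects that have the chosen property and object."""
    return (
        "SELECT ?s WHERE {\n"
        f"  ?s <{property_iri}> <{object_iri}> .\n"
        f"}} LIMIT {limit}"
    )


def connected_triples(resource_iri, limit=50):
    """Query for steps (b) and (c): outgoing triples of a selected resource."""
    return f"SELECT ?p ?o WHERE {{ <{resource_iri}> ?p ?o . }} LIMIT {limit}"


def request_url(endpoint, query):
    """GET URL that passes the query in the SPARQL protocol's `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = select_subjects("http://example.org/prop/hasCause", "http://example.org/state/X")
print(query)
print(request_url("http://example.org/sparql", query))
```

The IRIs are inserted without validation, so a real tool would need to check or escape them before building the query.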
  • Interoperability between knowledge and data (attribute ⇔ property)
    [Figure labels: Abnormal state is-a hierarchy; Abnormal states; Layers; Generic chains; Disease chains; Anomaly representation; Clinical DB; knowledge; data]
  • Summary (2): Disease Ontology
    - Disease Ontology (existing knowledge)
      - Provides domain knowledge described by medical experts.
    - Medical DB (Big Data) (evidence / new knowledge)
      - Provides evidential data from medical information systems such as electronic medical records.
    - It could be a good example of combining Ontology and Big Data.
  • Concluding Remarks
    - Ontology Engineering for Big Data
      - The combination of the two is good!
    - Basic technology: how to combine ontology with big data
      - Map the ontology to databases
      - Add metadata to data using the vocabulary defined in the ontology
      - Convert databases (e.g. RDB) to ontology-based (RDF) databases
    - How to use combinations of Ontology and Big Data: two possible approaches
      - Use ontology to bridge datasets across domains
        - Ontology exploration method to obtain meaningful combinations (= viewpoints)
      - Use ontology to combine deep domain knowledge and raw data
    - Future plan
      - Generalizing our approaches and feeding them back as new functions of Hozo
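As an illustration of the third basic technology, converting a relational table to RDF, the sketch below turns each row into N-Triples lines. It is a minimal example under stated assumptions: the namespace `http://example.org/`, the table `patient`, and the rule "one class per table, one property per column" are all hypothetical choices, not the method used in the talk:

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_to_ntriples(conn, table, key):
    """Turn every row of `table` into N-Triples lines; `key` is the primary-key column."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    lines = []
    for values in cursor:
        row = dict(zip(columns, values))
        subject = f"<{BASE}{table}/{row[key]}>"
        # The table name plays the role of the ontology class.
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}ontology/{table}> .")
        for column, value in row.items():
            if column == key or value is None:
                continue
            # Escape backslashes, quotes and newlines for a plain literal.
            literal = (
                str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            )
            lines.append(f'{subject} <{BASE}ontology/{column}> "{literal}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, diagnosis TEXT)")
conn.execute("INSERT INTO patient VALUES (1, 'Diabetes')")
for line in table_to_ntriples(conn, "patient", "id"):
    print(line)
```

In practice the column-to-property step is where the ontology mapping enters: instead of minting a property per column, each column would be mapped to a term defined in the ontology.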
  • Acknowledgements
    - A part of this work was supported by JSPS KAKENHI Grant Numbers 24120002 and 22240011.
    - A part of the research on medical ontology is supported by the Ministry of Health, Labor and Welfare, Japan, through its “Research and development of medical knowledge base databases for medical information systems” and by the Japan Society for the Promotion of Science (JSPS) through its “Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program)”.
    - I’m also grateful to all collaborators in each study.
  • Thank you for your attention!
    - Hozo Support Site: http://www.hozo.jp/
    - Contact: kozaki@ei.sanken.osaka-u.ac.jp