Social Tags and Linked Data for Ontology Development: A Case Study in the Financial Domain
We describe a domain ontology development approach that extracts domain terms from folksonomies and uses them to drive the search for classes and relationships in the Linked Open Data cloud. As a result, we obtain lightweight domain ontologies that combine the emergent knowledge of social tagging systems with formal knowledge from ontologies. To illustrate the feasibility of our approach, we have produced an ontology in the financial domain from tags available in Delicious, using DBpedia, OpenCyc and UMBEL as additional knowledge sources.

  • Tags and list names lack semantics:
    - Polysemy
    - Synonyms
    - Morphological variations (plurals, singular)
    - Verb conjugations
    Need to identify tag semantics and the relations between tags.

Social Tags and Linked Data for Ontology Development: A Case Study in the Financial Domain Presentation Transcript

  • 1. Social Tags and Linked Data for Ontology Development: A Case Study in the Financial Domain
    Andrés García-Silva†, Leyla Jael García-Castro±, Alexander García*, Oscar Corcho†
    † {hgarcia, ocorcho}@fi.upm.es, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
    ± leylajael@gmail.com, Universitat Jaume I, Castellón de la Plana, Spain
    * alexgarciac@gmail.com, State University, Florida, USA
    June 2014
  • 2. Introduction: Folksonomies
    - Web 2.0: user-generated content, social networks, and tools for organizing, sharing and discovering information
    - Tagging systems give rise to a folksonomy
    - [Figure: tagging example with the tags Java, Programming language, Tutorial, Java Persistent, Access, Database and Knowledge Base]
  • 3. Introduction: Folksonomies as a source of knowledge
    - Vocabulary emerges around resources and users: Golder and Huberman (2006), Marlow et al. (2006)
    - Maintained by a large user community
    - Flexible (unrestricted)
    - Up to date
    - Emergent semantics arise from the aggregation of individual classifications: Gruber (2007), Mika (2007), Specia and Motta (2007)
  • 4. State of the art: Folksonomies
    - Statistical-based approaches
      - Tag similarity measures (two tags are related if...): Cattuto et al. (2008), Markines et al. (2009), Körner et al. (2010), Benz et al. (2011)
      - Ontology generation: Heymann and Garcia-Molina (2006), Begelman et al. (2006), Hamasaki et al. (2007), Jäschke et al. (2008), Kennedy et al. (2007), Mika (2007), Benz et al. (2010), Limpens et al. (2010)
    - Ontology-based approaches: Angeletou et al. (2008), Cantador et al. (2008), García-Silva et al. (2009), Maala et al. (2008), Passant (2007), Tesconi et al. (2008)
    - Hybrid approaches: Giannakidou et al. (2008), Specia and Motta (2007)
  • 5. State of the art: Folksonomies
    Approach | Type | Auto | Data src. | Select. & cleaning | Context ident. | Disambiguation | Sem. ident. | Output | Evaluation | Domain knowledge
    Mika, 2007 | Stat | Yes | Del,Oth | Yes | Yes | No | Yes | Onto | Desc Study | No
    Hamasaki et al., 2007 | Stat | Yes | Pol | No | Yes | Yes | No | Onto | Task-based | No
    Jäschke et al., 2008 | Stat | Yes | Del,Bib | Yes | Yes | No | No | Hier | Desc Study | No
    Limpens et al., 2010 | Stat | Semi | Oth | No | No | Yes | Yes | Enri | Pres/Rec | No
    Begelman et al., 2006 | Stat | Yes | Del,Raw | Yes | Yes | No | No | Clus | Desc Study | No
    Kennedy et al., 2007 | Stat | Yes | Fli | Yes | Yes | Yes | Yes | Inst | Pres/Rec | No
    Heymann & Garcia-Molina, 2006 | Stat | Yes | Del,Cit | No | Yes | No | No | Hier | Task-based | No
    Benz et al., 2010 | Stat | Yes | Del | No | Yes | Yes | Yes | Hier | Pres/Rec | No
    Giannakidou et al., 2008 | Hyb | Yes | Fli | Yes | Yes | Yes | No | Clus | No | No
    Specia & Motta, 2007 | Hyb | Semi | Del,Fli | Yes | Yes | Yes | Yes | Onto | Desc Study | No
    Angeletou et al., 2008 | Ont | Yes | Fli | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
    Cantador et al., 2008 | Ont | Yes | Fli,Del | Yes | Yes | No | Yes | Inst | Pres/Rec | No
    Tesconi et al., 2008 | Ont | Yes | Del | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
    Passant, 2007 | Ont | No | Oth | Yes | Yes | Yes | Yes | Enri | Desc Study | No
    Maala et al., 2008 | Ont | Yes | Fli | Yes | Yes | No | Yes | Enri | Desc Study | No
    Limitations
    - Statistical-based
      - Most of the approaches do not distinguish between classes and instances
      - Relation semantics is limited to some types and is not precisely defined
      - No domain knowledge
    - Ontology-based
      - All the approaches produce either enrichments or instances (no classes)
      - Relations are not identified
      - No domain knowledge
    - Hybrid
      - Semi-automatic ontology generation
      - No domain knowledge
  • 6. Proposal
    Goal: Generate a baseline domain ontology, containing classes and relationships, out of folksonomy information.
    - Input: the folksonomy, plus domain-relevant resources (URLs) provided by domain experts
    - Terminology Extraction: produces a list of domain terms
    - Semantic Elicitation: the domain terms drive the extraction of domain classes and relationships from Linked Open Data (LOD)*
    * "Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/"
  • 7. Benefits
    We propose a process, driven by the domain terminology in the folksonomy, to extract domain knowledge from large and generic knowledge bases.
    - It may save time in the ontology development process
    - It allows ontology engineers to understand the domain with limited participation of domain experts
    - It yields smaller and more focused ontologies, which are potentially easier to understand and maintain
    - Complex queries and reasoning tasks may execute faster on smaller data sets
    - In observance of methodological practice, our technique harvests community knowledge and reuses existing ontologies
    - The ontology has links to external classes and relationships available in the Linked Open Data cloud
  • 8. Challenges
    Problem: Tags lack semantics
    - Ambiguity
    - Synonyms
    - Acronyms
    - Morphological variations
      - Plurals
      - Singulars
      - Verb conjugations
    - Misspellings
  • 9. Terminology Extraction Approach
    Goal: To extract domain terminology from the folksonomy
    - Folksonomy: A ⊆ U × T × R (users, tags, resources), with graph G = (V, E) where V = U ∪ T ∪ R and E = {(u, t, r) | (u, t, r) ∈ A}
    - Resource graph: G' = (V', E') where V' = R and E' = {(ri, rj) | ∃ ((u, tm, ri) ∈ A ∧ (u, tn, rj) ∈ A ∧ tm = tn)}
    - Spreading activation
      - Seeds: domain-relevant resources provided by domain experts
      - Nodes are weighted with an activation value, which is used to start the search
      - The activation value spreads to adjacent nodes through an activation function
      - Activation function: based on the tags shared between the visited node and the source node, and on the source node's activation value
      - If the activation function exceeds the threshold, the node is marked as activated and spreading continues to its adjacent nodes
      - Tags of activated nodes are collected as domain terms
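The extraction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not give the exact activation function, so the sketch assumes the source activation multiplied by the Jaccard overlap of the two resources' tag sets. The names `build_resource_graph` and `spread_activation`, and the seed activation of 1.0, are hypothetical.

```python
from collections import defaultdict, deque


def build_resource_graph(annotations):
    """Link two resources when the same user gave both the same tag."""
    by_user_tag = defaultdict(set)
    tags_of = defaultdict(set)
    for user, tag, resource in annotations:
        by_user_tag[(user, tag)].add(resource)
        tags_of[resource].add(tag)
    neighbours = defaultdict(set)
    for resources in by_user_tag.values():
        for r in resources:
            neighbours[r] |= resources - {r}  # self-loops are left out
    return neighbours, tags_of


def spread_activation(annotations, seeds, threshold=0.8):
    """Collect the tags of every resource activated from the seed resources."""
    neighbours, tags_of = build_resource_graph(annotations)
    activation = {seed: 1.0 for seed in seeds}  # assumed seed activation
    queue = deque(seeds)
    while queue:
        source = queue.popleft()
        for node in neighbours[source]:
            if node in activation:
                continue  # already activated
            shared = tags_of[source] & tags_of[node]
            union = tags_of[source] | tags_of[node]
            # Assumed activation function: source activation x Jaccard overlap.
            value = activation[source] * len(shared) / len(union)
            if value > threshold:
                activation[node] = value
                queue.append(node)
    terms = set()
    for resource in activation:
        terms |= tags_of[resource]
    return terms
```

A node that stays below the threshold from one source can still be activated later from another source, because only activated nodes are skipped.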
  • 10. Semantic Elicitation Approach
    Goal: To relate domain terms (tags) to DBpedia resources
    - Normalize the tag to the standard notation of DBpedia resource titles
    - Search for a resource with a label equal to the normalized tag using SPARQL
      - If none exists: use a spelling suggestion service and search again
      - If one exists: check whether it is related to a disambiguation resource
        - If true: retrieve the disambiguation candidates and select the candidate most similar to the tag context
          - Vector space model
          - Candidate resources are represented using their textual descriptions
          - The tag is represented using its context (i.e., co-occurring tags)
          - The most similar candidate is selected using cosine similarity
        - If false: select the resource (the default sense in Wikipedia)
    Reference: A. García-Silva, I. Cantador, Ó. Corcho (2012). Enabling folksonomies for knowledge extraction: A semantic grounding approach. International Journal on Semantic Web and Information Systems 8 (3), 24-41.
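The disambiguation step can be illustrated with a small vector space sketch. It assumes plain term-frequency vectors and whitespace tokenization, which the slide does not specify. The function names and the candidate identifiers in the usage note are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pick_sense(context_tags, candidates):
    """Return the candidate whose description best matches the tag context.

    context_tags: tags that co-occur with the ambiguous tag.
    candidates: mapping from candidate identifier to textual description.
    """
    context = Counter(tag.lower() for tag in context_tags)
    scores = {
        candidate: cosine(context, Counter(description.lower().split()))
        for candidate, description in candidates.items()
    }
    return max(scores, key=scores.get)
```

For instance, with the context tags `["loans", "deposits", "finance"]` and two made-up candidates, one described as "financial institution that accepts deposits and offers loans" and one as "land alongside a river or lake", the first candidate is selected because only its description shares terms with the context.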
  • 11. Semantic Elicitation Approach
    Goal: Identify classes from resources
    - Use the SPARQL ASK form to verify whether the entity is a class
    - If not: create queries to traverse all the possible paths of equivalence relations between the entity and a class in the RDF graph
      # Query 1
      ASK { <resource> rdf:type rdfs:Class }
      # Query 2
      SELECT ?class WHERE {
        <resource> ?rel1 ?class .
        ?class rdf:type rdfs:Class .
        FILTER (?rel1 = owl:sameAs)
      }
      # Query 3
      SELECT ?class WHERE {
        <resource> ?rel1 ?node .
        ?node ?rel2 ?class .
        ?class rdf:type rdfs:Class .
        FILTER ((?rel1 = owl:sameAs) && (?rel2 = owl:sameAs))
      }
    Reference: Philipp Heim, Sebastian Hellmann, Jens Lehmann, Steffen Lohmann and Timo Stegemann. RelFinder: Revealing Relationships in RDF Knowledge Bases. In: Proceedings of the 4th International Conference on Semantic and Digital Media Technologies (SAMT 2009), pages 182-187. Springer, Berlin/Heidelberg, 2009.
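Queries 2 and 3 follow one pattern, so they could be generated for any path length. The sketch below is one possible generator, not the authors' tooling. The function name `same_as_class_query` is hypothetical, and `owl:sameAs` is written directly as the predicate, which is equivalent to the FILTER form used on the slide.

```python
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
)


def same_as_class_query(resource_iri, hops):
    """Build a SELECT query following `hops` owl:sameAs links to a class."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # Chain: resource -> ?node1 -> ... -> ?class
    nodes = (
        [f"<{resource_iri}>"]
        + [f"?node{i}" for i in range(1, hops)]
        + ["?class"]
    )
    triples = [f"  {nodes[i]} owl:sameAs {nodes[i + 1]} ." for i in range(hops)]
    triples.append("  ?class rdf:type rdfs:Class .")
    return PREFIXES + "SELECT ?class WHERE {\n" + "\n".join(triples) + "\n}"
```

Calling it with `hops=1` corresponds to Query 2 and with `hops=2` to Query 3. The resulting string can be sent to any SPARQL endpoint.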
  • 12. Semantic Elicitation Approach
    Goal: To identify relations between classes
    - For each pair of classes, create queries to traverse all the possible paths between the two classes in the RDF graph, and retrieve the relationships
    Caveats
    - May result in adding non-relevant domain information to the ontology when:
      - The path is long
      - The path passes through abstract concepts or relationships, such as cyc:ObjectType or umbel:RefConcept
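A minimal sketch of how such path queries might be produced for a pair of classes is shown below. It only follows edges in the direction from the first class to the second, whereas a complete traversal would also need the reversed and mixed directions. The function names and the `max_hops` default are assumptions made for illustration.

```python
def relation_path_query(class_a, class_b, hops):
    """Build a SELECT query for directed paths of exactly `hops` edges."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    nodes = (
        [f"<{class_a}>"]
        + [f"?node{i}" for i in range(1, hops)]
        + [f"<{class_b}>"]
    )
    rels = [f"?rel{i}" for i in range(1, hops + 1)]
    triples = [f"  {nodes[i]} {rels[i]} {nodes[i + 1]} ." for i in range(hops)]
    # Return the relationships plus the intermediate nodes on the path.
    select = " ".join(rels + nodes[1:-1])
    return f"SELECT {select} WHERE {{\n" + "\n".join(triples) + "\n}"


def relation_path_queries(class_a, class_b, max_hops=2):
    """One query per path length, from 1 up to max_hops."""
    return [
        relation_path_query(class_a, class_b, n) for n in range(1, max_hops + 1)
    ]
```

Keeping `max_hops` small limits both the number of queries and the chance of reaching the class through unrelated intermediate nodes.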
  • 13. Semantic Elicitation Approach
    Minimizing the risk of adding non-relevant information to the ontology:
    - Keep the path length short
      - Our experiments show satisfactory results with short path lengths, which let us enrich the initial set of classes while preserving the precision of the ontology
    - Avoid high-level concepts
      - Create lists of high-level concepts collected from the knowledge base vocabularies, and filter out the paths containing those concepts
      - Knowledge base core vocabularies are usually well documented:
        - http://umbel.org/specifications/vocabulary
        - http://mappings.dbpedia.org/server/ontology/classes/
        - http://www.cyc.com/kb/thing
    - Use semantic similarity distances
      - Wu and Palmer (1994): depth of the classes and of their common subsumer in the taxonomy
      - Jiang and Conrath (1997): subclasses per class, class depth, information content, etc.
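As an illustration of the first similarity measure, the Wu and Palmer score is 2 · depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common subsumer of the two classes. The sketch below assumes a single-rooted taxonomy with one parent per class, stored as a child-to-parent dictionary, and counts the root as depth 1. The class names in the usage note are made up.

```python
def ancestors(taxonomy, cls):
    """Return the chain from cls up to the root, starting with cls itself."""
    chain = [cls]
    while cls in taxonomy:
        cls = taxonomy[cls]
        chain.append(cls)
    return chain


def depth(taxonomy, cls):
    """Depth of cls in the taxonomy, counting the root as depth 1."""
    return len(ancestors(taxonomy, cls))


def wu_palmer(taxonomy, a, b):
    """Wu and Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    up_b = set(ancestors(taxonomy, b))
    # Walking up from a, the first class also above b is the deepest subsumer.
    lcs = next(c for c in ancestors(taxonomy, a) if c in up_b)
    return 2 * depth(taxonomy, lcs) / (depth(taxonomy, a) + depth(taxonomy, b))
```

With the toy taxonomy `{"Organization": "Thing", "Company": "Organization", "Bank": "Company", "StockExchange": "Organization", "Person": "Thing"}`, `Bank` and `StockExchange` share `Organization` as their deepest subsumer, while `Bank` and `Person` only share the root, so the first pair scores higher. A path between two classes could then be discarded when the score falls below a cutoff chosen by the ontology engineer.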
  • 14. Evaluation: Experiment in the financial domain
    [Figure: experiment setup; labels: Input, Evaluation, Finance vocabulary]
  • 15. Evaluation: Experiment in the financial domain
    [Figure: experiment workflow; labels: Terminology Extraction, Finance Ontology, Finance vocabulary]
  • 16. Evaluation: Inspecting a financial ontology
    - We ran the process with an activation threshold of 0.8
    - The ontology produced consists of 187 classes, 378 relations of 8 different types, and 12 modules
  • 17. Evaluation: Inspecting a financial ontology
    Class precision = 80.67%, relation precision = 96.4%
    Ontology modules:
    Module | Precision (class) | Module | Precision (class)
    Organization | 77.80% | Stock Exchange | 84.60%
    Company | 88.50% | Money Transactions | 100%
    Person | 55.60% | Country | 100%
    Union | 3.74% | Research | 100%
    Banker | 100% | Driver | 0%
    Human | 100% | Member | 100%
  • 18. Conclusions
    - We have developed a method for automatically generating domain ontologies
      - Limited user participation
      - We benefit from the aggregation of individual classifications to extract an emergent domain vocabulary
    - In accordance with methodological guidelines, we reuse existing knowledge (the Web of Data)
      - We tap into existing links between data sets to collect related semantic information
      - We avoid, to some extent, semantic mismatches
      - We avoid heterogeneous representations
    - In practice, we expect the method to be used by ontology engineers to generate baseline ontologies that can be refined later according to the ontology requirements
  • 19. Future Work
    - Develop a method to automatically assess the validity of the relationships found in the Linked Data cloud. For example:
      - OpenCyc Stock Exchange is owl:sameAs UMBEL Exchange of User Rights
      - However, Stock Exchange is an organization, whereas Exchange of User Rights is an event
    - Use semantic similarity measures to decide whether to include the relationships found along a path between two classes
    - Discover and use datasets in the Linked Data cloud that cover the domain of interest