Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling @ESWC2014

The choice of which vocabulary to reuse when modeling and publishing Linked Open Data (LOD) is far from trivial. There is no study that investigates the different strategies of reusing vocabularies for LOD modeling and publishing. In this paper, we present the results of a survey with 79 participants that examines the most preferred vocabulary reuse strategies in LOD modeling. The participants, LOD publishers and practitioners, were asked to assess different vocabulary reuse strategies and explain their ranking decisions. We found significant differences between the modeling strategies, which include reusing popular vocabularies, minimizing the number of vocabularies, and staying within one domain vocabulary. A very interesting insight is that popularity in the sense of how frequently a vocabulary is used in a data source is more important than how often individual classes and properties are used in the LOD cloud. Overall, the results of this survey help in better understanding how data engineers reuse vocabularies and may also be used to develop future vocabulary engineering tools.

Published in: Science, Technology, Education
2 Comments
  • It's stated on slide 7: We define a popular vocabulary by the number of data sets using a vocabulary and the total number of occurrences of the vocabulary term. The higher the numbers the 'more popular' we consider the vocabulary. However, as far as I know, there is no set of 'agreed' popular vocabularies but the official W3C list.
  • Is there a set of agreed 'popular vocabularies' ? Or how do you define a 'popular vocabulary' ?

  • Provide clear data structure
    Make data easier to be consumed
    Establish an ontological agreement in data representation
  • Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling @ESWC2014

    1. Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling
       Johann Schaible, GESIS Leibniz-Institute for the Social Sciences, Cologne, Germany, johann.schaible@gesis.org
       Thomas Gottron, Institute for Web Science and Technologies, University of Koblenz-Landau, Germany, gottron@uni-koblenz.de
       Ansgar Scherp, Kiel University and Leibniz Information Center for Economics, Kiel, Germany, mail@ansgarscherp.net
       1) Extended version as technical report: http://bit.ly/lodsurveyreport
       2) Raw result data and survey in PDF: http://bit.ly/lodsurveydata
       #eswc2014Schaible
    2. Motivation…
       • How to…
         – …choose which vocabulary to reuse?
         – …find an appropriate mix of vocabularies?
       • In order to achieve aspects such as
         – providing a clear data structure
         – making data easier to consume
         – achieving ontological agreement
       → Leads to different reuse strategies
       → Based on experience and “gut feeling”
    3. …and Contribution
       Condense and aggregate experts’ knowledge and experience (“gut feeling”)
       1. Which aspects of reusing vocabularies are most important
       2. Which vocabulary reuse strategy to follow in a real-world scenario
    4. Survey Design
       • Ranking Task T1: Reuse vs. Interlink
       • Ranking Task T2: Appropriate mix of vocabularies
       • Ranking Task T3: Additional meta-information
       • Further survey elements: aspects for reusing vocabularies; reasons for ranking decision
       • Perspective of a LOD modeler
       • “Suppose you have to model data as LOD…”
    5. Ranking Tasks: Structure
       Assignment:
       • Model data from a specific domain as LOD
       • Need to reuse vocabularies
       • “Which of the provided options do you consider the better vocabulary reuse strategy?”
    6. Ranking Tasks: Example
       • Strategy minV: Reuse a minimum number of vocabularies
       • Strategy pop: Reuse mainly popular vocabularies
    7. Features for Popularity
       • Number of datasets using vocabulary V
       • Total occurrence of vocabulary term vi
       • Example models shown: Strategy minV, Strategy pop
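The two popularity features named on this slide can be illustrated with a minimal Python sketch. The slides do not say how the counts were obtained, so everything here is an assumption for illustration: the toy `datasets` mapping, the `prefix:term` notation, and the helper names `vocabulary_of`, `datasets_using`, and `term_occurrences` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data (invented for illustration): each dataset maps to the
# list of vocabulary terms it uses, written as "prefix:term".
datasets = {
    "dataset_a": ["foaf:name", "foaf:name", "dc:title"],
    "dataset_b": ["foaf:knows", "mo:MusicArtist"],
    "dataset_c": ["dc:title", "dc:creator", "foaf:name"],
}


def vocabulary_of(term: str) -> str:
    """Return the vocabulary prefix of a 'prefix:term' string."""
    return term.split(":", 1)[0]


def datasets_using(data: dict) -> Counter:
    """Feature 1: number of datasets that use each vocabulary at least once."""
    counts = Counter()
    for terms in data.values():
        # A set ensures each vocabulary is counted once per dataset.
        for vocab in {vocabulary_of(t) for t in terms}:
            counts[vocab] += 1
    return counts


def term_occurrences(data: dict) -> Counter:
    """Feature 2: total occurrences of each vocabulary term across all datasets."""
    counts = Counter()
    for terms in data.values():
        counts.update(terms)
    return counts
```

Under this reading, a vocabulary counts as more popular the higher both numbers are, which matches the definition quoted in the first comment above.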
    8. Ranking Task T1: Reuse vs. Interlink
       • Domain: Movies and actors
       • Vocabulary reuse strategies:
         1. pop: Reuse popular vocabularies
         2. link: Define own vocabulary and link it to existing popular vocabulary ()
         3. max: Reuse a maximum number of vocabularies (lower boundary)
       • Number of possible models to choose from: 3
    9. Ranking Task T2: Find an appropriate mix of different vocabularies
       • Domain: Publications and authors
       • Vocabulary reuse strategies:
         1. minV: Reuse a minimum number of vocabularies
         2. max: Reuse a maximum number of vocabularies (lower boundary)
         3. pop: Reuse popular vocabularies
         4. minC: Reuse a minimum number of vocabularies per concept
       • Number of possible models to choose from: 4
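To make the difference between minV (few vocabularies overall) and minC (few vocabularies per concept) concrete, here is a small Python sketch. The slides do not define how these quantities were measured, so the candidate `model`, the `prefix:term` notation, and the helper names are hypothetical and serve only as an illustration.

```python
# Hypothetical candidate model (invented for illustration): each concept maps
# to the vocabulary terms used to describe it, written as "prefix:term".
model = {
    "Publication": ["dc:title", "dc:creator", "swrc:pages"],
    "Author": ["foaf:name", "foaf:mbox"],
}


def vocabularies_per_concept(candidate: dict) -> dict:
    """For each concept, the set of vocabularies its terms come from (relevant to minC)."""
    return {
        concept: {term.split(":", 1)[0] for term in terms}
        for concept, terms in candidate.items()
    }


def total_vocabularies(candidate: dict) -> set:
    """The set of all vocabularies used anywhere in the model (relevant to minV and max)."""
    result = set()
    for vocabs in vocabularies_per_concept(candidate).values():
        result |= vocabs
    return result
```

In this toy model, three vocabularies are used in total, while the concept `Author` draws on only one. A modeler following minV would try to lower the first number, and one following minC the per-concept numbers.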
    10. Ranking Task T3: Vocabulary reuse given additional meta-information
        • Domain: Music and musical artists
        • Vocabulary reuse strategies:
          1. minD: Reuse only domain-specific vocabularies
          2. minV: Reuse a minimum number of vocabularies
          3. pop: Reuse popular vocabularies
        • Number of possible models to choose from: 3
    11. Results of Ranking Tasks
        Key insights
        • Reusing preferred over interlinking
        • Popular vocabularies preferred over minimizing the number of vocabularies
        • Additional meta-information has an effect on the choice
    12. Meta-Information Useful?
        Key insights
        • No definite favorite support
        • Number of datasets using a vocabulary preferred over total term occurrence
        • Information on most common use by others: not valuable
    13. Aspects for vocabulary reuse
        [Chart: ratings on a 5-point Likert scale (axis 0 to 5) for the aspects Clear Data Structure, Data Easier to Be Consumed, and Ontological Agreement; series: before ranking tasks, after first ranking task, after second ranking task]
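Ratings on a Likert scale are ordinal, so they are commonly summarized by the median. The sketch below shows one way to do that per aspect in Python. The rating values are invented for illustration and are not the survey's data; the raw results are linked from the slides.

```python
from statistics import median

# Hypothetical ratings on a 5-point Likert scale (NOT the survey's actual data).
ratings = {
    "clear data structure": [4, 5, 4, 3, 4],
    "data easier to be consumed": [5, 4, 4, 4, 3],
    "ontological agreement": [3, 4, 4, 5, 2],
}


def median_ratings(data: dict) -> dict:
    """Return the median rating for each aspect."""
    return {aspect: median(values) for aspect, values in data.items()}
```

Calling `median_ratings(ratings)` returns one median per aspect, which could then be compared across the three measurement points shown in the chart.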
    14. Participants
        • Linked Data experts and practitioners
        • Acquired through LOD and Semantic Web mailing lists
        • N = 79 (16 female, 63 male) (n.s. difference in answers)
        • 67% academia, 23% industry, 10% both
        • Research associates (22), postdocs (14), professors (8), engineers and other professions (27)
        • Age: M = 34.6, SD = 8.6
        • Experience in LOD (in years): M = 4, SD = 2.64
        • Expertise in consuming and publishing LOD: M = 3.64, SD = 1 (on a 5-point Likert scale) (n.s. difference in answers of group > 4 and group < 4)
    15. Conclusion
        • Which aspects are more important?
          – All aspects are “somewhat important” (Mdn = 4)
          – Aspects are rated higher in theory than in real life
        • Which strategy to follow?
          – Preferred choice: reuse popular vocabularies, which is rated better than minimizing the number of vocabularies
          – Popular vs. domain-specific vocabularies: unclear
          – Interlinking does not have good uptake
        • Which meta-information is most useful?
          – Number of datasets using a vocabulary
          – Most common use does not have good uptake
    16. Questions?
        Thank you very much for participating in the survey and helping me with my research
        1) Extended version as technical report: http://bit.ly/lodsurveyreport
        2) Raw result data and survey in PDF: http://bit.ly/lodsurveydata
        #eswc2014Schaible
