
Interlinking educational data to Web of Data (Thesis presentation)


This is a thesis presentation about interlinking educational data to the Web of Data. I explain how I used the Linked Data approach to expose educational data and interlink it with the Linked Open Data cloud.


  1. International doctorate thesis: Interlinking Educational Data to Web of Data. Presented by: Enayat Rajabi. Supervisors: Salvador Sanchez-Alonso, Miguel-Ángel Sicilia. May 2015
  2. Agenda
     1. Research context
     2. Motivation
     3. State of the art
     4. General objective & approach
     5. Specific objectives
     6. Studies & experimentations
     7. Conclusion & future work
  3. Research context: Linked Data
     • An approach for exposing structured data (triples) on the Web
     • Currently, the LOD cloud includes ~10,000 datasets (88 billion triples) in different domains
     • Datasets include metadata about objects
     • Interlinking tools help publishers to link datasets
     (Figure: the LOD cloud in May 2007, with 12 datasets)
  4. Research context: eLearning
     • eLearning repositories (including educational data)
     • eLearning metadata schemas (Dublin Core, IEEE LOM, …)
     • Analysis of the largest educational repository (GLOBE), with around one million metadata records
  5. Motivation
     • An increasing number of educational resources are published on the Web.
     • Some of these resources are implicitly or semantically related to each other.
     • The Linked Data approach allows resources to be reusable and accessible for learners.
     • A number of tools exist for exposing data and for semi-automatic linking between datasets.
  6. State of the art (background): practical steps for exposing data as Linked Data
     • Selecting a proper schema or ontology
     • Converting data into a structured format
     • Mapping data according to the ontology
     • Importing the RDF dump to a triple store
     • Setting up a SPARQL endpoint
     Questions and tasks noted alongside the steps: Is the (meta)data structure flat or hierarchical? Is there any mapping tool to convert data? Creating a dump file; selecting a proper triple store; setting up a SPARQL endpoint.
  7. Linked Data exposure infrastructure
  8. State of the art
     • Interlinking educational data: studying the importance of interlinking in an educational context (Stefan Dietz, 2012)
     • Exposing IEEE LOM as RDF: RDF binding of some IEEE LOM elements (Nilson & Palmér, 2002)
     • Interlinking tools: theoretical comparison of interlinking tools (Wolger et al., 2011)
  9. General objective: investigating an interlinking approach in an educational context
  10. General approach
  11. Specific objectives
      1. Analyzing an eLearning metadata schema for exposing it as Linked Open Data
      2. Examining the datasets in the Linked Open Data cloud
      3. Investigating existing interlinking tools in an educational context
      4. Assessing the interlinking results and their advantages
  12. Objective 1: Analyzing a metadata schema for exposing it as Linked Open Data
  13. Exposing a flat schema: mapping an RDB to Dublin Core (DCTerms:title, DCTerms:date, DCTerms:publisher)
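The flat-schema mapping on this slide can be illustrated with a minimal sketch. It assumes a hypothetical `resource` table with `title`, `date` and `publisher` columns and a placeholder base URI (`http://example.org/resource/`); none of these come from the thesis, and the slides do not say which mapping tool was actually used. Only the three DCTerms properties are taken from the slide.

```python
import sqlite3

DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical column-to-property mapping; only the three DCTerms
# properties themselves are named on the slide.
COLUMN_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "date": DCTERMS + "date",
    "publisher": DCTERMS + "publisher",
}


def escape_literal(value):
    """Escape a string so it can be used as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(connection, base_uri):
    """Turn each row of the (assumed) resource table into N-Triples lines."""
    cursor = connection.execute(
        "SELECT id, title, date, publisher FROM resource ORDER BY id")
    lines = []
    for row_id, title, date, publisher in cursor:
        subject = f"<{base_uri}{row_id}>"
        values = {"title": title, "date": date, "publisher": publisher}
        for column, value in values.items():
            if value is None:  # skip empty cells instead of emitting blanks
                continue
            literal = escape_literal(str(value))
            lines.append(
                f'{subject} <{COLUMN_TO_PROPERTY[column]}> "{literal}" .')
    return lines


def demo_ntriples():
    """Build a one-row in-memory database and map it to N-Triples."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT, "
        "date TEXT, publisher TEXT)")
    connection.execute(
        "INSERT INTO resource VALUES (1, 'Nuclear Energy', '2014-01-01', NULL)")
    return rows_to_ntriples(connection, "http://example.org/resource/")
```

Calling `demo_ntriples()` yields one triple per non-empty cell; the resulting lines could then be written to a dump file and loaded into a triple store.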
  14. Exposing a complex schema (IEEE LOM)
  15. IEEE LOM ontology
      The IEEE LOM schema has a hierarchical structure and supports different kinds of data types, so we had to:
      • Map the data types
      • Specify a correct element for the identifier (URI)
      • Choose a strategy for exposing aggregated elements (e.g., keyword)
      • Reuse existing vocabularies
      • Test the ontology in an implementation
  16. A case study based on the ontology
  17. Remarks of this investigation
      • The IEEE LOM schema was analyzed for the sake of: exposing its elements as Linked Open Data; creating a complete ontology; identifying the appropriate elements for interlinking.
      • The exposing approach was applied to other schemas as well.
      E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Resources to Web of Data through IEEE LOM,” Computer Science and Information Systems, vol. 12, no. 1, pp. 233–255, 2015.
  18. Objective 2: Examining the datasets in the Linked Open Data cloud
  19. The LOD datasets analysis
      We analyzed the Linked Open Data cloud to determine:
      1. Which datasets in the cloud are the most important to link to in an educational domain? (Examining the LOD cloud using Social Network Analysis (SNA))
      2. Which educational datasets are appropriate for interlinking? (Selecting a set of educational datasets in datahub using some metrics)
  20. The LOD datasets analysis
      We considered the LOD cloud as a directed graph and analyzed it according to the following SNA metrics:
      • Betweenness Centrality (BC): if a dataset has a high BC value, then many datasets are connected through it to others.
      • In-Degree: the number of datasets that point to the current dataset
      • Out-Degree: the number of datasets that the current dataset points to

      Dataset         | In-Degree | Out-Degree | Betweenness Centrality
      DBpedia         | 181       | 30         | 82,664
      Geonames        | 55        | 0          | 10,958
      DrugBank        | 8         | 12         | 7,446
      Bio2rdf-goa     | 11        | 8          | 3,751
      Ordnance-survey | 16        | 0          | 3,272
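The three metrics above can be sketched for a small directed graph as follows. This is an illustration only: the edge list in the usage note is hypothetical, the betweenness computation uses Brandes' algorithm for unweighted directed graphs without normalization, and the slides do not state which tool or normalization produced the values in the table.

```python
from collections import deque


def degree_counts(edges):
    """Return (in_degree, out_degree) dictionaries for a directed edge list."""
    edges = set(edges)  # ignore duplicate links between the same datasets
    nodes = {node for edge in edges for node in edge}
    in_degree = {node: 0 for node in nodes}
    out_degree = {node: 0 for node in nodes}
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


def betweenness_centrality(edges):
    """Unnormalized betweenness centrality (Brandes) on a directed graph."""
    edges = set(edges)
    nodes = sorted({node for edge in edges for node in edge})
    successors = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)

    centrality = {node: 0.0 for node in nodes}
    for start in nodes:
        # Breadth-first search counting shortest paths from `start`.
        order = []
        predecessors = {node: [] for node in nodes}
        path_count = {node: 0 for node in nodes}
        distance = {node: -1 for node in nodes}
        path_count[start] = 1
        distance[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbor in successors[current]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[current] + 1:
                    path_count[neighbor] += path_count[current]
                    predecessors[neighbor].append(current)
        # Accumulate dependencies in reverse breadth-first order.
        dependency = {node: 0.0 for node in nodes}
        while order:
            node = order.pop()
            for previous in predecessors[node]:
                share = path_count[previous] / path_count[node]
                dependency[previous] += share * (1 + dependency[node])
            if node != start:
                centrality[node] += dependency[node]
    return centrality
```

For a toy graph such as `[("a", "hub"), ("b", "hub"), ("hub", "c")]`, the node `hub` lies on every shortest path between the other datasets, which is the property the slide attributes to datasets with a high BC value.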
  21. The LOD datasets analysis (figure: datasets with high BC)
  22. Selecting educational datasets
      Exploring the LOD cloud to find educational datasets using the following steps:
      • Finding the datasets in datahub tagged with educational subjects
      • Checking the availability of their SPARQL endpoints or RDF dumps
      • Retrieving their specifications (size, metadata schema, language, …) from an interlinking point of view using SPARQL
  23. Datahub endpoint: exploring the datahub endpoint to find educational datasets
  24. Educational datasets bubble graph: selecting 20 available educational datasets
  25. Getting datasets specification using SPARQL

      Dataset | Size (triples) | SPARQL endpoint
      Charles University in Prague | 93,233,661 | http://linked.opendata.cz/sparql
      UNISTAT-KIS | 8,026,637 | http://data.linkedu.eu/kis/query
      Achievement Standards Network (ASN) | 7,494,201 | http://sparql.jesandco.org:8890/sparql
      Data.gov.uk | 6,619,847 | http://services.data.gov.uk/education/sparql
      University of Southampton | 5,726,668 | http://sparql.data.southampton.ac.uk/
      Yovisto - academic video search | 4,932,352 | http://sparql.yovisto.com/
      University of Muenster (LODUM) | 4,179,372 | http://data.uni-muenster.de/sparql/
      Open University in UK | 3,588,626 | http://data.open.ac.uk/sparql
      University of Huddersfield | 3,553,343 | http://data.linkedu.eu/hud/query
      Semantic ISVU (Kent) | 2,421,268 | http://kent.zpr.fer.hr:8080/educationalProgram/sparql
      University of Bristol | 1,885,124 | http://resrev.ilrt.bris.ac.uk/data-server-workshop/sparql
      Aalto University | 1,589,122 | http://data.aalto.fi/sparql
      Open Courseware Consortium metadata | 636,453 | http://data.linkedu.eu/ocw/query
      OxPoints (University of Oxford) | 318,392 | https://data.ox.ac.uk/sparql/
      TheSoz Thesaurus for the Social Sciences (GESIS) | 305,329 | http://lod.gesis.org/thesoz/sparql
      PROD | 62,375 | http://data.linkedu.eu/prod/query
      Open Data @ Tor Vergata | 54,968 | http://opendata.ccd.uniroma2.it/LMF/sparql/select
      Vytautas Magnus University, Kaunas | 39,279 | http://kaunas.rkbexplorer.com/sparql/
      MoreLab | 3,906 | http://www.morelab.deusto.es/joseki/articles
      Forge project | 132 | http://data.linkedu.eu/forge/query
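A dataset's size in triples can be requested from a SPARQL endpoint with a counting query. The sketch below only builds the request URL and makes no network call. The query text and the helper name `build_sparql_get_url` are assumptions for illustration; the slides do not show the queries that were actually run, and endpoints from 2015 may no longer respond or may not support SPARQL 1.1 aggregates.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed query: counts every triple in the endpoint's default graph.
COUNT_TRIPLES_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL using the standard `query` parameter."""
    # Append with "&" if the endpoint URL already carries a query string.
    separator = "&" if urlsplit(endpoint).query else "?"
    return endpoint + separator + urlencode({"query": query})


def decode_query(url):
    """Recover the SPARQL query text from a URL built above (for checking)."""
    return parse_qs(urlsplit(url).query)["query"][0]
```

For example, `build_sparql_get_url("http://data.open.ac.uk/sparql", COUNT_TRIPLES_QUERY)` produces a URL that could be fetched with any HTTP client; the response format depends on the endpoint and on the `Accept` header sent with the request.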
  26. Getting entities from the datasets (Open University of UK endpoint)
  27. Remarks of this investigation
      • Selecting the DBpedia dataset as the LOD hub for interlinking educational datasets
      • Identifying a set of well-formed educational datasets for interlinking
      E. Rajabi, S. Sanchez-Alonso, and M.-A. Sicilia, “Analyzing Broken Links on the Web of Data: an Experiment with DBpedia,” Journal of the Association for Information Science and Technology (JASIST), vol. 65, no. 8, pp. 1721–1727, 2014.
      E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015.
  28. Objective 3: Investigating existing interlinking tools in an educational context
  29. Interlinking tools (comparison)

      Tool       | Domain     | SPARQL / RDF dump | Manual / Automatic | Well-documented | Customization flexibility
      GWAP       | Multimedia | No                | Manual             | No              | Unknown
      LIMES      | LOD        | Yes               | Automatic          | Yes             | Yes
      LOD Refine | General    | Yes               | Automatic          | Yes             | Partially
      RDF-IA     | LOD        | RDF dump          | Automatic          | No              | Unknown
      SAI        | Multimedia | No                | Automatic          | No              | Unknown
      Silk       | LOD        | Yes               | Automatic          | Yes             | Yes
      UCI        | LOD        | Yes               | Manual             | No              | Unknown
  30. Interlinking tools (general idea)
      • Source: source data type: RDF dump; source entity: dcterms:title; filtering: English titles
      • Target: target data type: SPARQL endpoint; target entity: dcterms:title; filtering: English titles
      • Setting: matching algorithm: trigrams; threshold of acceptance: 95%; output file format: N-Triples; ...
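The configuration above (trigram matching on titles with a 95% acceptance threshold and N-Triples output) can be approximated in a few lines. This is a sketch under stated assumptions, not how Silk or LIMES implement it: the padding, the Dice-style score, the `owl:sameAs` link predicate and all URIs in the usage note are choices made here for illustration.

```python
def trigrams(text):
    """Return the set of character trigrams of a normalized, padded string."""
    normalized = " ".join(text.lower().split())
    if not normalized:
        return set()
    padded = f"  {normalized}  "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(first, second):
    """Dice-style overlap of trigram sets, between 0.0 and 1.0."""
    first_set, second_set = trigrams(first), trigrams(second)
    if not first_set or not second_set:
        return 0.0
    shared = len(first_set & second_set)
    return 2 * shared / (len(first_set) + len(second_set))


def link_titles(source_titles, target_titles, threshold=0.95):
    """Compare every source title with every target title.

    Both arguments map a resource URI to its title. Pairs scoring at or
    above the threshold are returned as (source_uri, target_uri, score).
    """
    links = []
    for source_uri, source_title in source_titles.items():
        for target_uri, target_title in target_titles.items():
            score = trigram_similarity(source_title, target_title)
            if score >= threshold:
                links.append((source_uri, target_uri, score))
    return links


def links_to_ntriples(links, predicate="http://www.w3.org/2002/07/owl#sameAs"):
    """Serialize accepted links as N-Triples (the predicate is an assumption)."""
    return [f"<{source}> <{predicate}> <{target}> ."
            for source, target, _score in links]
```

With `{"http://example.org/s/1": "Nuclear Energy"}` as the source and targets titled "nuclear energy" and "Solar Energy", only the first target passes the threshold. The pairwise comparison is quadratic in the number of titles, so it is suitable only for small inputs; dedicated tools use blocking or indexing to avoid comparing every pair.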
  31. Interlinking process
  32. The interlinking tools (Silk)
  33. The interlinking tools (LIMES): source and target datasets; condition
  34. The interlinking tools (LOD Refine)
  35. Sample interlinking results (exact matches)
      Title in both datasets: Nuclear Energy
        GLOBE resource: http://www.globe-info.org/ont/lom2owl#108450
        Target URI (ASN): http://schools.nyc.gov/NR/rdonlyres/6C64098F-0C24-4B27-A22F-F542A2F97DA0/130926/TTS_G11_LiteracySSandScience_NuclearEnergy.pdf
        Target URI (Bristol): http://resrev.ilrt.bris.ac.uk/research-revealed-hub/publications/118933#pub
        Target URI (Huddersfield): http://data.linkedu.eu/hud/book/118555
      Title in both datasets: Bibliography
        GLOBE resource: http://www.globe-info.org/ont/lom2owl#178214
        Target URI (OpenUK): http://resrev.ilrt.bris.ac.uk/research-revealed-hub/publications/15140#pub
        Target URI (Muenster): http://data.uni-muenster.de/context/istg/allegro/6/210/T00244773
  36. Evaluating the interlinking tools
      We used three tools to interlink GLOBE to DBpedia. (Figure: interlinking GLOBE and DBpedia on title)
  37. Evaluating the interlinking tools
      Does the result change if we use more than one tool? (Figure: common results among the tools)
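One simple way to look at the overlap between tools is to treat each tool's output as a set of (source, target) links and intersect the sets. The sketch below assumes that representation; the tool names and links in the usage note are placeholders, not results from the thesis.

```python
from collections import Counter


def common_links(results_by_tool):
    """Return the links that every tool reported."""
    link_sets = [set(links) for links in results_by_tool.values()]
    if not link_sets:
        return set()
    return set.intersection(*link_sets)


def link_support(results_by_tool):
    """Count, for each link, how many tools reported it."""
    support = Counter()
    for links in results_by_tool.values():
        support.update(set(links))  # a tool counts once per link
    return support
```

Given `{"tool_a": [("g1", "d1"), ("g2", "d2")], "tool_b": [("g1", "d1")]}`, `common_links` returns only `("g1", "d1")`, while `link_support` shows that `("g2", "d2")` was found by a single tool and may deserve a closer manual check.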
  38. Remarks of this investigation
      • Applying the interlinking tools for linking datasets is a reliable method.
      • Silk and LIMES were the efficient tools for similarity discovery among the LOD datasets.
      E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “An empirical study on the evaluation of interlinking tools on the Web of Data,” Journal of Information Science, vol. 40, pp. 637–648, 2014, first published on June 11, 2014.
  39. Objective 4: Assessing the interlinking results and their advantages
  40. Evaluating the interlinking results
      • Interlinking tools perform an interlinking process and print out the matched resources.
      • The question under discussion is: to what extent are the results reliable?
      • An important step after the interlinking is the evaluation of the results by human and domain experts.
  41. The interlinking approach
  42. GLOBE metadata analysis: creating criteria under which we can find appropriate elements for interlinking (datatype, completeness, content)
  43. GLOBE metadata analysis
  44. Interlinking results

                 | Title | Keyword | Taxon   | Coverage
      GLOBE      | 8,260 | 228,352 | 134,791 | 12,941
      Percentage | 2%    | 74%     | 76%     | 78%

      (Figure: interlinking through the Keyword element)
  45. Evaluating the interlinking results
      We evaluated the interlinking results from the following perspectives:
      • Reliability: level of agreement between the raters
      • Relationship among results (e.g., threshold 75%): Is parent of, Is related to, Is part of
      • Enrichment of content: linking one resource to many datasets on the Web
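The slide does not say which statistic was used for the level of agreement between the raters. As one common option, the sketch below computes Cohen's kappa for two raters who each label the same list of links; the choice of kappa and the label names in the usage note are assumptions made here.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item list")
    total = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / total
    # Expected agreement if the raters labelled independently at their own rates.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label]
                   for label in counts_a) / (total * total)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)
```

For instance, with rater A giving `["correct", "correct", "incorrect", "incorrect"]` and rater B giving `["correct", "correct", "incorrect", "correct"]`, the raters agree on three of four links, chance agreement is 0.5, and kappa is 0.5. With more than two raters, a statistic such as Fleiss' kappa would be needed instead.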
  46. Remarks of this investigation
      • Human experts (the raters of the results) agreed that the interlinking results are reliable.
      • Interlinking a learning repository to several educational datasets in the LOD cloud leads to the enrichment of content.
      • Interlinking results can help to find duplicate metadata.
      E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015.
      E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with Engineering-related Resources in GLOBE,” International Journal of Engineering and Education, vol. 31-3, 2015.
  47. Conclusions
      1. Exposing eLearning metadata as Linked Open Data
      • A complete analysis was done on exposing the IEEE LOM schema as RDF.
      • A new ontology was designed for the RDF binding of IEEE LOM.
      • Keyword, Coverage, Classification, and Title were appropriate elements for interlinking.
  48. Conclusions (cont.)
      2. Evaluating Linked Data tools & datasets
      • Silk and LIMES were the efficient frameworks in terms of discovering similarities between two datasets.
      • DBpedia was identified as the hub of the LOD cloud.
      • Twenty educational datasets were identified as the most suitable targets for interlinking.
      • The Open University of UK provides a rich metadata schema and a reliable endpoint.
  49. Conclusions (cont.)
      3. Enriching the educational datasets
      • Interlinking results were reviewed and verified by human experts.
      • Several educational resources were linked to more than one resource in the LOD cloud.
      • A duplicate identification approach was proposed after the analysis of the interlinking results.
  50. Additional contributions
      • Implementing several platforms for exposing data as Linked Data:
        Organic.Edunet (http://data.organic-edunet.eu)
        ARIADNE (http://ariadne.grnet.gr)
        Open Discovery Space (http://data.opendiscoveryspace.eu)
        Agrega (http://agrega2.red.es/)
      • Submitting the IEEE LOM ontology to Linked Open Vocabularies (LOV) at http://lov.okfn.org/dataset/lov/vocabs/lom
      • Developing an online mashup to interlink eLearning objects to the Web of Data (research stay in Agroknow)
      • Writing a book chapter about “Optimizing Big Data using the Linked Data approach”
  51. Additional contributions (cont.)
  52. Future work
      • Content: applying the interlinking approach to other educational repositories
      • Tools and software:
        Extending the tools to link one dataset to several datasets at the same time
        Adding semantic similarity services to the tools to improve the interlinking results
        Linking educational resources by crawling datasets
  53. Publications (journal papers)
      • E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Resources to Web of Data through IEEE LOM,” Computer Science and Information Systems, vol. 12, no. 1, pp. 233–255, 2015.
      • E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015, doi:10.1177/0165551515575922.
      • E. Rajabi, S. Sanchez-Alonso, and M.-A. Sicilia, “Analyzing Broken Links on the Web of Data: an Experiment with DBpedia,” Journal of the Association for Information Science and Technology (JASIST), vol. 65, no. 8, pp. 1721–1727, 2014, doi:10.1002/asi.23109.
      • E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “An empirical study on the evaluation of interlinking tools on the Web of Data,” Journal of Information Science, vol. 40, pp. 637–648, 2014, first published on June 11, 2014, doi:10.1177/0165551514538151.
      • E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with Engineering-related Resources in GLOBE,” International Journal of Engineering and Education, 2015. In press.
      • E. Rajabi, W. Greller, K. Niemann, K. Kastrantas, and S. Sanchez-Alonso, “Social data interoperability in educational repositories and federations,” International Journal of Metadata, Semantics and Ontologies, 8 (2), 169–178, 2013.
      • E. Rajabi, S. Sanchez-Alonso, M.-A. Sicilia, and N. Mouneselis, “A linked and open dataset from a network of learning repositories on organic agriculture,” British Journal of Educational Technology, submitted (under second review).
      • M-C Valiente, M.-A. Sicilia, E. Garcia-Barriocanal, and E. Rajabi, “Adopting the metadata approach to improve the search and analysis of educational resources for online learning,” Computers in Human Behavior, 2015. In press.
  54. Publications (conference papers)
      • E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with GLOBE Resources,” presented at the First International Conference on Technological Ecosystem for Enhancing Multiculturality, Salamanca, Spain, 2013.
      • E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Research Objects Interlinking: The Case of Dryad Repository,” presented at Metadata and Semantics Research, Karlsruhe, Germany, 2014.
      • E. Rajabi and S. Sanchez-Alonso, “Enriching the e-learning contents using interlinking,” presented at the 5th eLearning conference, Belgrade, Serbia, 2014. Link: https://scholar.google.es/scholar?oi=bibs&cluster=16249634834288673991&btnI=1&hl=en
      • E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “A Simple Approach towards SKOSification of Digital Repositories,” Metadata and Semantics Research, 67–74, 2013.
      • M-A. Sicilia, S. Sanchez-Alonso, E. Garcia-Barriocanal, J. Minguillón, and E. Rajabi, “Exploring the keyword space in large learning resource aggregations: the case of GLOBE,” Lacro workshop, April 2013.
