A structured catalog of open educational datasets

This slideset introduces some of our work towards building a structured catalog of educational datasets.

Transcript

  • 1. Building a structured catalog for educational datasets. Stefan Dietze, 04/07/13
  • 2. Linked Open (educational) Data
      - LOD: 300+ datasets, 32 billion distinct RDF statements
      - DataHub: 6000+ open datasets
      - LinkedUp: FP7-ICT-2012-8, CSA (http://linkedup-project.eu)
      - Goal: enabling large-scale take-up of (Linked) Open Data, with education as the application context
  • 3. Linked Open (educational) Data
      - LOD: 300+ datasets, 32 billion distinct RDF statements
      - DataHub: 6000+ open datasets
      - Example dataset description: http://datahub.io/dataset/bbc (60,000,000 triples)
      - Using/exploiting Linked Data in education? Obstacles:
          - lack of reliable dataset metadata about resource types, topics & disciplines, quality, timeliness & availability, and provenance
          - lack of links and cross-dataset references
          - lack of scalable query methods
  • 4. Linked Data "Observatory": processing chain
      - Steps: endpoint retrieval & graph extraction; schema extraction and mapping; sample graph extraction (per dataset); NER & NED, i.e. named entity recognition and disambiguation (per resource); interlinking & co-reference resolution (cross-dataset); category mapping, normalisation, filtering
      - Outputs: a dataset catalog/index and links/cross-references
      - Disambiguation example: an rdfs:label containing "…ECB…" may refer to dbpedia:European_Central_Bank (category dbpedia:Finance) or to dbpedia:England-Wales-Cricket-Board (category dbpedia:Sports)
      - Dataset metadata (RDF/VoID): schema mappings (types, properties); entities & categories; topic relevance scores; availability and timeliness data (tbc)
      - Goals:
          - an RDF catalog of datasets (a "dataset of datasets") that classifies datasets according to, e.g., represented types, disciplines/topics, data quality and accessibility
          - links and coreferences => a unified view on the data => Linked Education Graph
          - infrastructure & APIs for federated queries
  • 5. Linked Data "Observatory": the processing chain from slide 4, annotated with the related publications:
      - [WEBSCI13] d'Aquin, M., Adamou, A., Dietze, S.: Assessing the Educational Linked Data Landscape. ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
      - [DEXA13] Nunes, B. P., Mera, A., Casanova, M. A., Fetahu, B., Paes Leme, L., Dietze, S.: Complex Matching of RDF Datatype Properties. 24th International Conference on Database and Expert Systems Applications (DEXA 2013), Prague, Czech Republic, August 2013.
      - [ESWC13] Nunes, B. P., Dietze, S., Casanova, M. A., Kawase, R., Fetahu, B., Nejdl, W.: Combining a co-occurrence-based and a semantic measure for entity linking. ESWC 2013, 10th Extended Semantic Web Conference, May 2013.
      - [ISWC13?] Fetahu, B., Adamou, A., Dietze, S., d'Aquin, M., Nunes, B. P.: Indexing of Linked Data: What's all the data about. ISWC 2013, 12th International Semantic Web Conference (under review).
      - [TKDE12] Demidova, E., Zhou, X., Nejdl, W.: A Probabilistic Scheme for Keyword-Based Incremental Query Construction. IEEE Transactions on Knowledge and Data Engineering, 24(3):426-439, 2012.
  • 6. Challenge: data heterogeneity
      - Online lecture: <yov:Lecture8748720> <yov:title>Pluto & the Dwarf Planets</yov:title> … </yov:Lecture8748720>
      - Lecture slideset: <ss:SlideSet-2139393292> <title>Planetary motion & gravity</title> … </ss:SlideSet-2139393292>
      - Video documentary: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
      - Open questions: How related are these resources/entities (types, semantics)? What metadata exists about the datasets?
      - References: d'Aquin, Adamou & Dietze, WebSci 2013; Nunes et al., ESWC 2013
  • 7. Data disambiguation, linking & annotation
      - Online lecture: <yov:Lecture8748720> <yov:title>Pluto & the Dwarf Planets</yov:title> … </yov:Lecture8748720>
      - Video documentary: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
      - Which entities are meant: Brian Cox? Sun? Pluto?
      - Reference: Nunes et al., ESWC 2013
  • 8. Data disambiguation, linking & annotation
      - The online lecture, lecture slideset and video documentary from slide 6 are annotated with DBpedia entities: db:Pluto (Dwarf Planet), db:Astronomical Objects, db:Sun and db:Astronomy
      - Reference: Nunes et al., ESWC 2013
  • 9. Data disambiguation, linking & annotation
      - Data linking: computation of connectivity scores between resources/entities
      - Method: a combination of
          - (i) a semantic (graph-based) connectivity score (SCS) and
          - (ii) a Web co-occurrence-based measure (CBM), similar to the Normalized Google Distance (NGD)
      - For (i): an adaptation of the Katz index from social network analysis (SNA) to (linked) data graphs, taking into account both the number and the lengths of paths over transversal properties
      - Dataset categorisation: computation of normalised (DBpedia) category relevance scores for datasets
      - Figure: the lecture, slideset and documentary from slide 6 are linked via db:Pluto (Dwarf Planet), db:Astronomical Objects, db:Astronomy and db:Sun. The example scores shown next to db:Sun are SCS = 0.32 and CBM = 0.24
      - Reference: Nunes et al., ESWC 2013
  • 10. Data disambiguation, linking & annotation: evaluation
      - Evaluation based on USA Today news items (80,000 entity pairs)
      - Manually created gold standard (1,000 entity pairs)
      - Baseline: Explicit Semantic Analysis (ESA). CBM/SCS measure "relatedness", whereas ESA measures "similarity"
      - [Chart: precision, recall and F1 for SCS, CBM and ESA]
      - Reference: Nunes et al., ESWC 2013
  • 11. Enhanced dataset descriptions on the DataHub
      - Dataset classification: an expanded dataset catalog & graph
      - Dataset RDF graph: correlations between datasets based on their semantic annotations (categories)
      - Links: http://linkedup-project.eu, http://data.linkededucation.org/linkedup/catalog/
      - Reference: d'Aquin, Adamou & Dietze, WebSci 2013
  • 12. Thank you! http://purl.org/dietze
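
Slides 4 and 9 mention normalised (DBpedia) category relevance scores per dataset. These scores are derived from the entities recognised in sampled resources. The slides do not give the formula, so the Python sketch below assumes the simplest scheme: count how often each category is reached from the annotated entities, then divide by the total. The entity-to-category table and the function name `category_relevance` are hypothetical. They were chosen to mirror the ECB example on slide 4.

```python
from collections import Counter

# Hypothetical entity -> DBpedia category table, mirroring the ECB example.
ENTITY_CATEGORIES = {
    "dbpedia:European_Central_Bank": ["dbpedia:Finance"],
    "dbpedia:England-Wales-Cricket-Board": ["dbpedia:Sports"],
}


def category_relevance(entity_annotations, entity_categories):
    """Map each category to its share of all category hits.

    entity_annotations: entity IDs assigned by NER & NED to resources
    sampled from one dataset; repeated entities count repeatedly.
    Assumption: relevance = hit count / total hits (scores sum to 1).
    """
    counts = Counter()
    for entity in entity_annotations:
        for category in entity_categories.get(entity, []):
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Toy sample: two resources mention the bank, one the cricket board.
sample = [
    "dbpedia:European_Central_Bank",
    "dbpedia:European_Central_Bank",
    "dbpedia:England-Wales-Cricket-Board",
]
scores = category_relevance(sample, ENTITY_CATEGORIES)
print({c: round(s, 2) for c, s in scores.items()})
# {'dbpedia:Finance': 0.67, 'dbpedia:Sports': 0.33}
```

Normalising to shares makes datasets of different sizes comparable. A real pipeline would likely also weight or filter categories, which the slides list as a separate step.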
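
Slide 9 describes SCS as an adaptation of the Katz index and CBM as similar to the Normalized Google Distance. The sketch below shows only the two textbook measures these build on. It does not reproduce the SCS or CBM definitions from the ESWC 2013 paper. The first measure is a truncated Katz index, which sums beta^l times the number of walks of length l between two nodes. The second is NGD, computed from (assumed) search-engine hit counts. The damping factor `beta=0.5`, the cut-off `max_len=3` and the toy graph are illustrative assumptions. Note that NGD is a distance, where 0 means the terms always co-occur, whereas the relatedness scores on the slide grow with relatedness.

```python
import math


def katz_score(edges, source, target, beta=0.5, max_len=3):
    """Truncated Katz index between two nodes.

    Sums beta**l * (number of walks of length l from source to target)
    for l = 1..max_len. Edges are treated as undirected here (an
    assumption; RDF links are directed).
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    walks = {source: 1}  # node -> number of walks of the current length
    score = 0.0
    for length in range(1, max_len + 1):
        extended = {}
        for node, count in walks.items():
            for nb in neighbours.get(node, []):
                extended[nb] = extended.get(nb, 0) + count
        walks = extended
        score += beta ** length * walks.get(target, 0)
    return score


def ngd(f_x, f_y, f_xy, n):
    """Normalized Google Distance from hit counts.

    f_x, f_y: pages containing each term; f_xy: pages containing both;
    n: total number of indexed pages. 0 means the terms always co-occur.
    """
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Toy graph loosely modelled on slide 8.
edges = [
    ("db:Pluto", "db:Astronomical_Objects"),
    ("db:Sun", "db:Astronomical_Objects"),
    ("db:Astronomical_Objects", "db:Astronomy"),
]
print(katz_score(edges, "db:Pluto", "db:Sun"))  # 0.25 (one walk of length 2)
```

The damping factor makes longer connections count less, which matches the slide's remark that both the number and the lengths of paths matter.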
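
Slide 10 reports precision, recall and F1 against a manually created gold standard of entity pairs. The sketch below shows one common way to compute these. It assumes that a score threshold turns relatedness scores into binary predictions. The threshold, the pair scores and the gold set are made up for illustration and do not reproduce the paper's evaluation setup.

```python
def predicted_pairs(scores, threshold):
    """Entity pairs whose relatedness score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


def precision_recall_f1(predicted, gold):
    """Compare predicted related pairs with gold-standard related pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def pair(a, b):
    # frozenset makes (a, b) and (b, a) the same unordered pair
    return frozenset((a, b))


# Made-up scores and gold standard, for illustration only.
scores = {
    pair("db:Pluto", "db:Sun"): 0.9,
    pair("db:Pluto", "db:Astronomy"): 0.6,
    pair("db:Sun", "db:Finance"): 0.1,
}
gold = {
    pair("db:Pluto", "db:Sun"),
    pair("db:Pluto", "db:Astronomy"),
    pair("db:Sun", "db:Astronomy"),
}
p, r, f1 = precision_recall_f1(predicted_pairs(scores, 0.5), gold)
print(p, round(r, 2), round(f1, 2))  # 1.0 0.67 0.8
```

Sweeping the threshold would trade precision against recall, so curves per measure (SCS, CBM, ESA) are typically more informative than a single point.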
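
Slide 11 shows a dataset graph whose edges reflect correlations between the datasets' category annotations. The slide does not name the measure. The sketch below therefore assumes cosine similarity between category relevance vectors, together with a hypothetical cut-off `min_sim`. The dataset names and scores are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse {category: score} vectors."""
    dot = sum(w * v.get(category, 0.0) for category, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def dataset_edges(profiles, min_sim=0.5):
    """Connect every pair of datasets whose category profiles are similar.

    min_sim is a hypothetical cut-off; returns (a, b, similarity) triples.
    """
    names = sorted(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(profiles[a], profiles[b])
            if sim >= min_sim:
                edges.append((a, b, round(sim, 3)))
    return edges


# Made-up category relevance profiles for three hypothetical datasets.
profiles = {
    "dataset-A": {"db:Astronomy": 0.7, "db:Physics": 0.3},
    "dataset-B": {"db:Astronomy": 0.6, "db:Physics": 0.4},
    "dataset-C": {"db:Finance": 1.0},
}
print(dataset_edges(profiles))  # [('dataset-A', 'dataset-B', 0.983)]
```

Cosine ignores the overall magnitude of the profiles, so two datasets with the same topical mix are linked even if one is annotated far more densely than the other.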