KnowEscape workshop, OKCon 2013
Published in: Education, Technology

Transcript

  • 1. Curation and profiling of Linked Data
      KnowEscape workshop, Open Knowledge Conference 2013 (OKCon2013), 19/09/2013
      Stefan Dietze (1), Besnik Fetahu (1), Mathieu d’Aquin (2)
      (1) L3S Research Center (Germany); (2) The Open University (UK)
      http://linkedup-project.eu | http://purl.org/dietze | @stefandietze
  • 2. “LinkedUp”: Linking Web Data for Education
      European project aimed at advancing take-up of open data and related technologies (http://linkedup-project.eu)
      Success models: data & applications
        • LinkedUp Challenge to identify innovative tools & applications
        • Evaluation methods and approaches
      Data curation
        • Collecting & exposing open data of educational relevance => LinkedUp Data Catalog
        • Profiling and linking of Web Data for education => educational data graph
      Technology transfer & community-building
        • Disseminating knowledge & building communities (educators, computer scientists, data engineers)
        • Gathering stakeholder feedback: use cases and requirements
      Links: http://www.linkedup-challenge.org/ | http://linkedup-challenge.org/#usecases | http://data.linkededucation.org | http://linkedup-project.eu/events
  • 3. Problem: too many datasets, too little information
      Using/exploiting Linked Data in education?
        • LOD: 300+ datasets, 32+ billion distinct RDF statements
        • DataHub: 6000+ open datasets
        • http://datahub.io/dataset/bbc: 60,000,000 triples
        • Lack of reliable dataset metadata about
            • Resource types
            • Topics & disciplines
            • Quality, currentness & availability
            • Provenance
        • Lack of links and cross-dataset references
        • Lack of scalable query methods
  • 4. Data curation and dataset profiling: LinkedUp approach
      Goal: dataset metadata & search for data consumers
        • “LinkedUp/Linked Education cloud” as “expanded” subset of the LOD cloud at The DataHub (http://datahub.io/groups/linked-education)
        • RDF (VoID) catalog of datasets = dataset of datasets (Linked Education Catalog): classification of datasets according to, e.g., represented types, disciplines/topics, data quality, accessibility
        • Links and coreferences => unified view on data => Linked Education Graph
        • Infrastructure, unified (SPARQL) endpoint & APIs for distributed/federated querying
      Diagram labels: Educational Datasets; LinkedUp Catalog; LinkedUp Links
      Automated processing to generate:
        • Descriptive VoID/RDF Dataset Catalog
        • Data links
  • 5. Linked Data “Observatory” for linking and profiling
      Pipeline stages: Endpoint Retrieval & Graph Extraction; Schema Extraction and Mapping; Sample Graph Extraction (per dataset); NER & NED (per resource); Interlinking & Co-Resolution (cross-dataset); Category Mapping, Normalisation, Filtering
      Outputs: Dataset Catalog/Index; Links/Cross-references
      Disambiguation example: rdfs:label “…ECB…”? Candidate entities and categories: dbpedia:England-Wales-Cricket-Board, dbpedia:European_Central_Bank, dbpedia:Sports, dbpedia:Finance
      Dataset metadata (RDF/VoID):
        • Schema mappings (types, properties)
        • Entities & categories
        • Topic relevance scores
        • Availability, currentness data (tbc)
      References:
        • [WEBSCI'13] Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
        • [ESWC'13] Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013 - 10th Extended Semantic Web Conference (May 2013).
        • [ISWC'13] Generating structured Profiles of Linked Data Graphs, Fetahu, B., Adamou, A., Dietze, S., d’Aquin, M., Nunes, B.P., ISWC2013 - 12th International Semantic Web Conference; under review.
  • 6. Schema assessment and mapping
      Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties)
      How do po:Programme and sioc:Item relate?
        • BBC Programme: <po:Programme …> <po:title>Secret Universe – The Life of the Cell</po:title> … </po:Programme>
        • SlideShare Set: <sioc:Item …> <label>Viral diseases & bacteria</label> … </sioc:Item>
      http://datahub.io/group/linked-education
      Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
  • 7. Schema assessment and mapping
      Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties)
      Co-occurrence graph after mapping (201 frequent types mapped into 79 classes)
      Mapped types for po:Programme and sioc:Item: bibo:Slideshow, bibo:Film, bibo:Document
        • BBC Programme: <po:Programme …> <po:title>Secret Universe – The Life of the Cell</po:title> … </po:Programme>
        • SlideShare Set: <sioc:Item …> <label>Viral diseases & bacteria</label> … </sioc:Item>
      Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
  • 8. LinkedUp Data Catalog in a nutshell
        • VoID dataset catalog: browse, explore and query for datasets/types
        • Federated queries using type mappings
      http://datahub.io/group/linked-education
      http://data.linkededucation.org/linkedup/catalog/
  • 9. Dataset topic profiling: data heterogeneity?
        • Video: <yo:Video 8748720> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video 8748720>
        • Slideset: <sioc:Item 2139393292> <title>Planetary motion & gravity</title> … </sioc:Item 2139393292>
        • Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
      Topics/categories addressed? Relatedness of resources/entities? (types, semantics)
      Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013 - 10th Extended Semantic Web Conference (May 2013).
      Generating structured Profiles of Linked Data Graphs, Fetahu, B., Adamou, A., Dietze, S., d’Aquin, M., Nunes, B.P., ISWC2013 - 12th International Semantic Web Conference; under review.
  • 10. Data disambiguation, linking & profiling
        • Video: <yo:Video 8748720> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video 8748720>
        • Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
      Ambiguous mentions: Brian Cox? Sun? Pluto?
  • 11. Data disambiguation, linking & profiling
      Linked entities and categories: db:Pluto (Dwarf Planet), db:Sun, db:Astronomical Objects, db:Astronomy
        • Video: <yo:Video 8748720> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video 8748720>
        • Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
        • Slideset: <sioc:Item 2139393292> <title>Planetary motion & gravity</title> … </sioc:Item 2139393292>
  • 12. Data disambiguation, linking & profiling
      Data linking
        • Computation of connectivity scores between resources/entities
        • Method: combination of
            • (i) a semantic (graph-based) connectivity score (SCS) with
            • (ii) a Web co-occurrence-based measure (CBM) (similar to NGD)
        • For (i): adaptation of the Katz index from SNA for (linked) data graphs (considering path number and path lengths of transversal properties)
      Dataset categorisation: computation of normalised (DBpedia) category relevance scores for datasets
      Example: db:Pluto (Dwarf Planet), db:Sun, db:Astronomical Objects, db:Astronomy; SCS = 0.32, CBM = 0.24
        • Online Lecture: <yov:Lecture8748720> <title>Pluto & the Dwarf Planets</title> … </yov:Lecture8748720>
        • Slideset: <sioc:Item 2139393292> <title>Planetary motion & gravity</title> … </sioc:Item 2139393292>
        • Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
      http://purl.org/vol/doc/ | http://purl.org/vol/ns/
      Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013 - 10th Extended Semantic Web Conference (May 2013).
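
To make the graph-based part of the linking step more concrete, here is a minimal sketch of a plain, truncated Katz index in Python. It is not the adapted SCS measure from the cited paper: it simply sums, over walk lengths up to `max_len`, the number of walks between two nodes weighted by `beta ** length`. The toy graph, the function name `katz_score`, and the values of `beta` and `max_len` are all assumptions made for illustration.

```python
from typing import Dict, List


def katz_score(adj: Dict[str, List[str]], source: str, target: str,
               beta: float = 0.5, max_len: int = 4) -> float:
    """Truncated Katz index: sum of beta**l * (walks of length l), l = 1..max_len."""
    # walks[n] = number of walks of the current length from source to node n
    walks: Dict[str, int] = {source: 1}
    score = 0.0
    for length in range(1, max_len + 1):
        extended: Dict[str, int] = {}
        for node, count in walks.items():
            for neighbour in adj.get(node, []):
                extended[neighbour] = extended.get(neighbour, 0) + count
        walks = extended
        # Longer walks contribute less because beta < 1.
        score += (beta ** length) * walks.get(target, 0)
    return score


# Hypothetical toy graph loosely based on the entities named on the slide.
toy_graph = {
    "db:Pluto": ["db:Astronomical_objects"],
    "db:Sun": ["db:Astronomical_objects"],
    "db:Astronomical_objects": ["db:Pluto", "db:Sun", "db:Astronomy"],
    "db:Astronomy": ["db:Astronomical_objects"],
}

print(katz_score(toy_graph, "db:Pluto", "db:Sun", beta=0.5, max_len=4))
```

Two entities that share many short connecting walks score higher than entities connected only through long detours, which is the intuition behind using a Katz-style measure for connectivity.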
  • 13. Dataset profiling
        • Goal: extracting representative metadata (“topic profile”) for each dataset
        • Approach: computation of normalised (DBpedia) category relevance scores
        • Using representative sample resource sets per resource type & dataset
      Example (DBpedia category graph): db:Sun, db:Astronomical Objects, db:Astronomy
        • Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
      Generating structured Profiles of Linked Data Graphs, Fetahu, B., Adamou, A., Dietze, S., d’Aquin, M., Nunes, B.P., ISWC2013 - 12th International Semantic Web Conference; under review.
  • 14. Dataset profiling: topic extraction process (1/2)
      Pipeline stages: Endpoint Retrieval & Graph Extraction; Schema Extraction and Mapping; Sample Graph Extraction (per dataset/type); NER & NED (per resource); Interlinking & Co-Resolution (cross-dataset); Category Mapping, Normalisation, Filtering
      Outputs: Dataset Catalog/Index; Links/Cross-references
      Disambiguation example: rdfs:label “…ECB…”? Candidate entities and categories: dbpedia:England-Wales-Cricket-Board, dbpedia:European_Central_Bank, dbpedia:Sports, dbpedia:Finance
      Dataset metadata (RDF/VoID):
        • Schema mappings (types, properties)
        • Entities & categories
        • Topic relevance scores
        • Availability, currentness data (tbc)
      Step 1 – NER:
        • Online NER & NED vs. incremental similarity-based “NER”:
            • Online NER: DBpedia Spotlight
            • Incremental & similarity-based NER: compare (via Jaccard index) textual descriptions of already extracted entities with literal values of a resource instance (assumption: recurring entities likely within datasets)
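
The incremental, similarity-based variant of Step 1 can be sketched roughly as follows, assuming a simple word-token Jaccard index. The tokenisation, the `threshold` value, the function names and the sample entity descriptions are hypothetical choices for this sketch and are not taken from the slides.

```python
import re
from typing import Dict, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens of a literal or description."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_known_entity(literal: str, known: Dict[str, str],
                       threshold: float = 0.5) -> Optional[str]:
    """Return the already extracted entity whose description best matches the
    literal, or None if no candidate reaches the (assumed) threshold."""
    literal_tokens = tokens(literal)
    best_uri, best_sim = None, 0.0
    for uri, description in known.items():
        sim = jaccard(literal_tokens, tokens(description))
        if sim > best_sim:
            best_uri, best_sim = uri, sim
    return best_uri if best_sim >= threshold else None


# Hypothetical cache of entities extracted earlier in the same dataset.
known_entities = {
    "dbpedia:European_Central_Bank": "European Central Bank",
    "dbpedia:Sun": "Sun star solar system",
}

print(match_known_entity("the European Central Bank", known_entities))
```

When no cached entity is similar enough, a pipeline of this kind would presumably fall back to the online NER service; that fallback is left out here.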
  • 15. Dataset profiling: topic extraction process (2/2)
      Step 1 – NER:
        • Online NER & NED vs. incremental similarity-based “NER”:
            • Online NER: DBpedia Spotlight
            • Incremental & similarity-based NER: compare (via Jaccard index) textual descriptions of already extracted entities with literal values of a resource instance (assumption: recurring entities likely within datasets)
      Step 2 – Computation of profile (ranked categories)
        • Entities => DBpedia categories = “Topics”: extraction of topics from DBpedia entities via dcterms:subject
        • Expand the set of topics by leveraging hierarchical category organization (skos:broader)
        • Normalised topic score over topics and datasets, based on:
            • # of entities for dataset D
            • # of entities for all datasets
            • # of entities for topic t in dataset D
            • # of entities for topic t for all datasets
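
As a rough illustration of Step 2, the sketch below maps entities to topics, expands the topics one level up the category hierarchy, and tallies the per-dataset and overall entity counts per topic. It deliberately stops short of combining the counts into a score, because the transcript does not preserve the scoring formula. The dictionaries stand in for `dcterms:subject` and `skos:broader` lookups, and all names and sample data are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def expand_topics(entity: str, subject: Dict[str, List[str]],
                  broader: Dict[str, List[str]], max_hops: int = 1) -> Set[str]:
    """Topics of an entity plus their broader categories up to max_hops levels."""
    topics = set(subject.get(entity, ()))
    frontier = set(topics)
    for _ in range(max_hops):
        frontier = {b for t in frontier for b in broader.get(t, ())} - topics
        topics |= frontier
    return topics


def topic_counts(dataset_entities: Dict[str, Iterable[str]],
                 subject: Dict[str, List[str]],
                 broader: Dict[str, List[str]]) -> Tuple[Dict[str, Counter], Counter]:
    """Count entities per topic within each dataset and across all datasets.
    An entity is counted once per dataset in which it occurs."""
    per_dataset = {name: Counter() for name in dataset_entities}
    overall: Counter = Counter()
    for name, entities in dataset_entities.items():
        for entity in set(entities):
            for topic in expand_topics(entity, subject, broader):
                per_dataset[name][topic] += 1
                overall[topic] += 1
    return per_dataset, overall


# Hypothetical stand-ins for dcterms:subject and skos:broader lookups.
subject = {"db:Pluto": ["cat:Dwarf_planets"], "db:Sun": ["cat:Stars"]}
broader = {"cat:Dwarf_planets": ["cat:Astronomical_objects"],
           "cat:Stars": ["cat:Astronomical_objects"]}
datasets = {"A": ["db:Pluto", "db:Sun"], "B": ["db:Sun"]}

per_dataset, overall = topic_counts(datasets, subject, broader)
print(per_dataset["A"].most_common(), overall.most_common())
```

These counts correspond to the four quantities listed on the slide; how they are normalised into a single relevance score is described in the cited ISWC submission rather than here.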
  • 16. Dataset profile explorer
      http://data.linkededucation.org/
      http://data.linkededucation.org/linkedup/categories-explorer
      http://data.linkededucation.org/request/pipeline/sparql
  • 17. LinkedUp Data Catalog – hands-on in a nutshell
      http://data.linkededucation.org
      http://data.linkededucation.org/linkedup/catalog/sparql
      http://data.linkededucation.org/request/pipeline/sparql
      Highlights: type mappings, topic profiles/scores, query federation
      Querying FOR datasets
        • Retrieving datasets for categories:
            SELECT ?datasetname ?link ?score WHERE {
              ?linkset a void:Linkset .
              ?linkset vol:hasLink ?link .
              ?link vol:linksResource <http://dbpedia.org/resource/Category:Technology> .
              ?link vol:hasScore ?score .
              ?dataset a void:Dataset .
              ?linkset void:target ?dataset .
              ?dataset dcterms:title ?datasetname .
              FILTER (?score > 0.5)
            }
        • Retrieving datasets describing schools:
            select distinct ?endpoint ?cl where {
              ?ds void:sparqlEndpoint ?endpoint .
              { { ?ds void:classPartition [ void:class ?cl ] }
                UNION { ?ds void:subset [ void:classPartition [ void:class ?cl ] ] } }
              { { ?cl owl:equivalentClass aiiso:School }
                UNION { ?cl rdfs:subClassOf aiiso:School }
                UNION { FILTER ( str(?cl) = str(aiiso:School) ) } }
            }
      Querying THE datasets
        • Federated queries using mappings between aiiso:School and other “school” types:
            prefix void: <http://rdfs.org/ns/void#>
            prefix aiiso: <http://purl.org/vocab/aiiso/schema#>
            prefix owl: <http://www.w3.org/2002/07/owl#>
            prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            select distinct ?endpoint ?school ?cl where {
              … as above …. }
              service silent ?endpoint { ?school a ?cl }
            }
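
For readers who want to try the category query programmatically, the following sketch builds a SPARQL Protocol GET request with Python's standard library, without sending it. The endpoint URL is one of those listed on the slide, but which of the two endpoints serves this query, and whether either is still online, is an assumption. The `vol:` prefix is assumed to be the namespace URL shown on slide 12, `dcterms:` is the standard Dublin Core terms namespace, and `build_sparql_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as listed on the slide; its current availability is not known.
ENDPOINT = "http://data.linkededucation.org/request/pipeline/sparql"

# Category query from the slide, with prefix declarations added.
# The vol: namespace is an assumption based on the URL shown on slide 12.
CATEGORY_QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX vol: <http://purl.org/vol/ns/>
SELECT ?datasetname ?link ?score WHERE {
  ?linkset a void:Linkset .
  ?linkset vol:hasLink ?link .
  ?link vol:linksResource <http://dbpedia.org/resource/Category:Technology> .
  ?link vol:hasScore ?score .
  ?dataset a void:Dataset .
  ?linkset void:target ?dataset .
  ?dataset dcterms:title ?datasetname .
  FILTER (?score > 0.5)
}
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a GET request carrying the query as a URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, CATEGORY_QUERY)
print(request.full_url[:80])
```

To execute the query, the request object could be passed to `urllib.request.urlopen` and the response body parsed with `json.loads`; that step is omitted so the sketch runs without network access.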
  • 18. Outlook in a nutshell
        • Merging the two VoID datasets
            • Datasets and type mappings (LinkedUp Catalog)
            • Category annotations (data.linkededucation.org)
        • Extracting statistical observations (RDF Data Cube)
        • Feeding data back into the DataHub
        • Application to entire LOD cloud group on DataHub
        • Consideration of additional profiling features
            • Quality aspects
            • Dataset and link dynamics
            • Temporal and spatial coverage (=> http://www.duraark.eu)
  • 19. LinkedUp Vidi Competition
      Tools and demos that analyse or integrate open web data for educational purposes
        • Wanted: applications and tools that address real educational needs
        • Anyone can participate: researchers, students, developers, industry
        • Challenging focused tracks with clear goals
        • More data, more challenging, more support, more prizes
      Launch on 4 November 2013; submission deadline 14 February 2014; 20,000 Euro prize money
      More info: http://linkedup-challenge.org/
  • 20. Thank you!
      Contact
        • http://purl.org/dietze | @stefandietze
      See also (data)
        • http://datahub.io/group/linked-education
        • http://data.linkededucation.org
        • http://data.linkededucation.org/linkedup/catalog/
        • http://lak.linkededucation.org
      See also (general)
        • http://linkedup-project.eu
        • http://linkedup-challenge.org
        • http://linkededucation.org
        • http://linkeduniversities.org