euclid_linkedup WWW tutorial (Besnik Fetahu)
Presentation Transcript

  • Online Learning and Linked Data: Lessons Learned and Best Practices
    Dataset Profiling
    Besnik Fetahu, 3 April 2014
  • LinkedUp: Data Catalog Features
      • 34 linked datasets of educational relevance (http://datahub.io/dataset?organization=linked-education)
      • VoID representations of the datasets include the following information:
          • Manual dataset schema alignments:
                http://purl.org/ontology/bibo/Thesis owl:equivalentClass http://purl.org/ontology/bibo/Thesis
                http://swrc.ontoware.org/ontology#Article owl:equivalentClass http://purl.org/ontology/bibo/AcademicArticle
          • Accessibility information, i.e. the SPARQL endpoint URL:
                http://data.linkededucation.org/linkedup/dataset/data-open-ac-uk void:sparqlEndpoint http://data.open.ac.uk/query
    Co-occurrence graph of data types in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties.
    Assessing the Educational Linked Data Landscape. D’Aquin, M., Adamou, A., Dietze, S. ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
  • LinkedUp: Data Catalog Features
      • 34 linked datasets of educational relevance (http://datahub.io/dataset?organization=linked-education)
      • VoID representations of the datasets include the following information:
          • Type graph of each dataset's resources
          • Topic extraction for each dataset (dataset profiling)
    (Figure labels: morelab, OpenCourseWare)
  • LinkedUp: Data Catalog Features
      • 34 linked datasets of educational relevance (http://datahub.io/dataset?organization=linked-education)
      • VoID representations of the datasets include the following information:
          • Federated query interface:

        PREFIX void: <http://rdfs.org/ns/void#>
        PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>
        SELECT DISTINCT ?endpoint
        WHERE {
          ?ds void:sparqlEndpoint ?endpoint .
          {
            { ?ds void:classPartition [ void:class aiiso:School ] }
            UNION
            { ?ds void:subset [ void:classPartition [ void:class aiiso:School ] ] }
          }
        }
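The selection logic of the federated query above can be sketched in plain Python. This is only an illustration under stated assumptions: the `VoidDataset` class, the `endpoints_with_class` helper and the `example.org` catalog entries are hypothetical stand-ins for VoID descriptions, not part of the LinkedUp catalog or of any library.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

AIISO_SCHOOL = "http://purl.org/vocab/aiiso/schema#School"


@dataclass
class VoidDataset:
    """Hypothetical in-memory stand-in for one VoID dataset description."""
    uri: str
    sparql_endpoint: Optional[str] = None
    class_partitions: Set[str] = field(default_factory=set)
    subsets: List["VoidDataset"] = field(default_factory=list)


def endpoints_with_class(catalog, class_uri):
    """Return the distinct endpoints of datasets that declare class_uri
    directly or in an immediate subset (the two UNION branches)."""
    found = set()
    for ds in catalog:
        if ds.sparql_endpoint is None:
            continue  # the query requires a void:sparqlEndpoint triple
        direct = class_uri in ds.class_partitions
        via_subset = any(class_uri in sub.class_partitions for sub in ds.subsets)
        if direct or via_subset:
            found.add(ds.sparql_endpoint)
    return sorted(found)


# Made-up catalog entries, for illustration only.
CATALOG = [
    VoidDataset("http://example.org/ds/a", "http://example.org/a/sparql",
                class_partitions={AIISO_SCHOOL}),
    VoidDataset("http://example.org/ds/b", "http://example.org/b/sparql",
                subsets=[VoidDataset("http://example.org/ds/b/part",
                                     class_partitions={AIISO_SCHOOL})]),
    VoidDataset("http://example.org/ds/c", "http://example.org/c/sparql"),
]

print(endpoints_with_class(CATALOG, AIISO_SCHOOL))
```

Datasets `a` and `b` are selected (direct class partition and subset class partition respectively), while `c` is skipped because it declares no partition for the class.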
  • LinkedUp: Why dataset profiling?
      • A few characteristics of linked datasets (from the Linked Open Data Cloud):
          • Growing number of datasets: 227 datasets
          • Data represented as triples: 31 billion triples
          • Multi-lingual content: 18 languages
          • Broad set of topics covered
          • Inter-dataset links

    Domains covered by “lod-cloud” datasets:

    Domain                  Datasets  Triples         %        (Out-)Links  %
    Media                   25        1,841,852,061   5.82 %   50,440,705   10.01 %
    Geographic              31        6,145,532,484   19.43 %  35,812,328   7.11 %
    Government              49        13,315,009,400  42.09 %  19,343,519   3.84 %
    Publications            87        2,950,720,693   9.33 %   139,925,218  27.76 %
    Cross-domain            41        4,184,635,715   13.23 %  63,183,065   12.54 %
    Life sciences           41        3,036,336,004   9.60 %   191,844,090  38.06 %
    User-generated content  20        134,127,413     0.42 %   3,449,143    0.68 %
    Total                   295       31,634,213,770           503,998,829
  • LinkedUp: Why dataset profiling?
    How do I find information about “renewable energy”?
      • 31 billion triples, 18 languages, 180 organisations
      • How can we do that?
          • Check the datasets that cover this topic? Only 38 out of 228 datasets contain topic coverage information.
          • Use a SPARQL FILTER clause? A regex() filter has to check every triple for the given keyword.
          • What are all the possible forms of renewable energy? Solar energy, wind energy, geothermal, ...
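The limitation of a plain keyword filter can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the literals are invented, and `RELATED_FORMS` is a hand-written expansion standing in for the related topics that a dataset profile would supply.

```python
import re

# Made-up literal values standing in for text found in triples.
LITERALS = [
    "Offshore wind energy farm in the North Sea",
    "Annual report on renewable energy subsidies",
    "Geothermal plant capacity statistics",
    "Coal-fired power station emissions",
]


def keyword_matches(literals, keyword):
    """Case-insensitive substring search, similar in spirit to a regex() filter."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [text for text in literals if pattern.search(text)]


# Hypothetical hand-written list of forms of renewable energy.
RELATED_FORMS = ["renewable energy", "solar energy", "wind energy", "geothermal"]


def expanded_matches(literals, keywords):
    """Keep every literal that matches at least one of the keywords."""
    return [text for text in literals
            if any(keyword_matches([text], keyword) for keyword in keywords)]


print(keyword_matches(LITERALS, "renewable energy"))
print(expanded_matches(LITERALS, RELATED_FORMS))
```

The single keyword finds only the literal that spells out "renewable energy", whereas the expanded list also catches the wind and geothermal entries. Both variants still scan every literal, which is the cost the slide attributes to the filter approach.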
  • LinkedUp: How to profile Linked Data?
      • What is a linked data profile?
    Profile definition: Linked dataset profiles consist of structured information describing the datasets' topic coverage. A profile is represented as a graph. The vertices of the profile graph are datasets, resources, and topics. Its edges connect the pairs ‹dataset, resource› and ‹resource, topic›. Edges between resources and topics are weighted, conveying the relevance of a topic for a dataset.
      • A dataset consists of a set of resource instances:
            <resource_uri_1> <resource_uri_2> ... <resource_uri_n>
      • A resource is represented by a set of triples:
            <resource_uri_1> ?predicate_x value
            <resource_uri_1> ?predicate_y value
            <resource_uri_1> ?predicate_z value
      • A topic is equivalent to a DBpedia category, associated with one of the resource values.
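The profile graph defined above can be sketched as a small data structure. This is a minimal sketch, not the authors' implementation: the `ProfileGraph` class and its method names are hypothetical, the URIs and weights are invented, and taking the maximum weight per topic across resources is an assumed aggregation that the slide does not specify.

```python
from collections import defaultdict


class ProfileGraph:
    """Hypothetical container for <dataset, resource> and weighted <resource, topic> edges."""

    def __init__(self):
        self.resources_of = defaultdict(set)      # dataset -> set of resources
        self.topic_weights = defaultdict(dict)    # resource -> {topic: weight}

    def add_resource(self, dataset, resource):
        self.resources_of[dataset].add(resource)

    def add_topic(self, resource, topic, weight):
        self.topic_weights[resource][topic] = weight

    def topics_of(self, dataset):
        """Topics reachable from a dataset, highest weight first.
        Assumption: a topic's dataset-level weight is its maximum resource-level weight."""
        best = {}
        for resource in self.resources_of.get(dataset, ()):
            for topic, weight in self.topic_weights.get(resource, {}).items():
                best[topic] = max(weight, best.get(topic, weight))
        return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Invented example data.
graph = ProfileGraph()
DATASET = "http://example.org/dataset/energy"
graph.add_resource(DATASET, "http://example.org/resource/1")
graph.add_resource(DATASET, "http://example.org/resource/2")
graph.add_topic("http://example.org/resource/1", "Category:Renewable_energy", 0.8)
graph.add_topic("http://example.org/resource/1", "Category:Wind_power", 0.4)
graph.add_topic("http://example.org/resource/2", "Category:Renewable_energy", 0.6)

print(graph.topics_of(DATASET))
```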
  • LinkedUp: Profiling Linked Data
    i. Metadata extraction
    ii. Sampling of resource instances
    iii. Entity and topic extraction
    iv. Topic ranking (PageRank with Priors, HITS with Priors and K-Step Markov)
    v. Weighted dataset-topic profile graphs
    vi. Profiles representation
    A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles. Besnik Fetahu, Stefan Dietze, Bernardo Pereira Nunes, Marco Antonio Casanova, Davide Taibi, and Wolfgang Nejdl. In Proceedings of the 11th Extended Semantic Web Conference, Springer, 2014 (to appear).
  • Profiling Linked Data – (I)
    i. Metadata extraction
      • DataHub’s CKAN API
    ii. Sampling of resource instances
      • weighted, random, centrality
    iii. Entity and topic extraction
      • Consider only the textual values assigned to a resource
      • NER: extract and disambiguate named entities (DBpedia Spotlight)
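Two of the sampling strategies named above can be sketched as follows. The slide does not define them, so this is an assumption-laden illustration: `random_sample` draws uniformly, and `weighted_sample` assumes that a resource's weight is the number of triples describing it (a guess, not the published definition). The weighted draw uses the Efraimidis-Spirakis key trick for sampling without replacement; centrality-based sampling is not sketched.

```python
import random


def random_sample(resources, k, seed=0):
    """Uniform sample of k resource URIs without replacement."""
    rng = random.Random(seed)
    pool = sorted(resources)  # sort for reproducibility across runs
    return rng.sample(pool, min(k, len(pool)))


def weighted_sample(resource_sizes, k, seed=0):
    """Sample k URIs without replacement, favouring larger weights.
    Each item gets the key u ** (1 / weight) with u uniform in [0, 1);
    the k largest keys are kept."""
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / size), uri)
             for uri, size in sorted(resource_sizes.items()) if size > 0]
    keyed.sort(reverse=True)
    return [uri for _, uri in keyed[:k]]


# Invented resources with an assumed "number of triples" weight.
SIZES = {
    "http://example.org/r/1": 50,
    "http://example.org/r/2": 5,
    "http://example.org/r/3": 20,
    "http://example.org/r/4": 1,
    "http://example.org/r/5": 12,
}

print(random_sample(SIZES, 3, seed=42))
print(weighted_sample(SIZES, 3, seed=42))
```

A fixed seed keeps the sample reproducible, which is convenient when comparing ranking results across sample sizes.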
  • Profiling Linked Data – (II)
    iv. Topic ranking (PageRank with Priors, HITS with Priors and K-Step Markov)
      • Rank the topics of each dataset and compute their relevance w.r.t. the associated resources
    v. Weighted dataset-topic profile graph
      • The topic weights computed for each dataset are the weights of the <dataset, topic> edges
    vi. Profiles representation (Vocabulary of Interlinked Datasets (VoID) and Vocabulary of Links (VoL))
      • VoID: captures information about a linked dataset as a set of links
      • VoL: defines a link (of entity or topic type), along with the provenance information and the relevance score of that link
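One of the ranking approaches named above, PageRank with Priors, can be sketched as a power iteration in which the walker jumps back to a prior distribution with probability `beta`. This is a generic sketch of the technique, not the authors' code: the function name, the `beta` value, the toy graph and the choice of placing the prior on resources are all assumptions.

```python
def pagerank_with_priors(graph, priors, beta=0.3, iterations=50):
    """graph: node -> list of out-neighbours; priors: node -> non-negative prior mass.
    Returns a dict of scores that sum to 1."""
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    total = sum(priors.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("priors must put positive mass on at least one node")
    prior = {n: priors.get(n, 0.0) / total for n in nodes}
    rank = dict(prior)
    for _ in range(iterations):
        nxt = {n: beta * prior[n] for n in nodes}      # jump back to the prior
        for u in nodes:
            outs = graph.get(u, [])
            if outs:
                share = (1 - beta) * rank[u] / len(outs)
                for v in outs:
                    nxt[v] += share                    # follow an out-edge
            else:
                for n in nodes:                        # dangling node: restart from prior
                    nxt[n] += (1 - beta) * rank[u] * prior[n]
        rank = nxt
    return rank


# Toy graph linking two invented resources to two DBpedia-style categories.
GRAPH = {
    "r1": ["Renewable_energy", "Wind_power"],
    "r2": ["Renewable_energy"],
    "Renewable_energy": ["r1", "r2"],
    "Wind_power": ["r1"],
}
SCORES = pagerank_with_priors(GRAPH, priors={"r1": 1.0, "r2": 1.0})
print(sorted(SCORES.items(), key=lambda item: -item[1]))
```

In this toy graph the category linked from both resources ends up ranked above the one linked from a single resource.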
  • Profiling Linked Data: Representation Example
    (Figure labels:)
      • Dataset profile metadata
      • Dataset’s profile and index
      • Entity-type link: extracted entity; provenance information (resources) for the entity link
      • Topic-type link: extracted topic; provenance information (entities) for the topic link; topic relevance score
  • How are the profiles useful?
    How do I find information about “renewable energy”?

        PREFIX void: <http://rdfs.org/ns/void#>
        # The vol: prefix (Vocabulary of Links) must also be declared;
        # its namespace IRI is not given on the slide.
        SELECT ?dataset ?link ?score ?link_1 ?entity ?resource
        WHERE {
          ?dataset a void:Linkset .
          ?dataset vol:hasLink ?link .
          ?link vol:linksResource <http://dbpedia.org/resource/Category:Renewable_energy> .
          ?link vol:derivedFrom ?entity .
          ?link vol:hasScore ?score .
          ?link_1 vol:linksResource ?entity .
          ?dataset vol:hasLink ?link_1 .
          ?link_1 vol:derivedFrom ?resource
        }
        ORDER BY DESC(?score)

      • “Renewable energy” comes in different forms:
          • Solar energy
          • Wind farms
          • Biogas
          • Hydroelectricity, etc.
      • Example resources:
            http://enipedia.tudelft.nl/wiki/Windmar_Renewable_Energy
            http://enipedia.tudelft.nl/data/page/eGRID/Plant/57050
            http://enipedia.tudelft.nl/wiki/Us_Energy_Biogas_Corp
            http://www.reegle.info/profiles/JP
  • Profiling Linked Data: Evaluation (3 April 2014, Stefan Dietze)
      • Figure: Profiling accuracy for the different ranking approaches, using the full sample of analysed resource instances, with the NDCG score averaged over all datasets.
      • Figure: The correlation between ranking accuracy (averaged over all datasets, for ∆NDCG) and ranking time.
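The NDCG score used in the evaluation above can be computed as in the following sketch. It uses the common formulation with linear gain and a log2 position discount; whether the evaluation used exactly this variant is an assumption, and the relevance judgements and dataset names below are invented.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for positions i = 1..n."""
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalised by the DCG of the same judgements in ideal (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def mean_ndcg(per_dataset):
    """Average NDCG over all datasets; per_dataset maps a dataset name to its judgements."""
    return sum(ndcg(rels) for rels in per_dataset.values()) / len(per_dataset)


# Invented graded relevance judgements for the ranked topics of two datasets.
JUDGEMENTS = {
    "dataset-a": [3, 2, 3, 0, 1, 2],
    "dataset-b": [3, 2, 1],
}
print(round(mean_ndcg(JUDGEMENTS), 3))
```

A ranking that already lists topics in descending order of relevance scores 1.0, and any misordering lowers the score.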
  • Profiling Linked Data: Example use cases
      • Type-specific views on datasets/categories
          • “Document” (foaf:Document)
          • “Person” (foaf:Person)
          • “Course” (aiiso:Course)
          • LinkedUp Catalog only (as schema mappings are already available there)
      • Exploratory functionalities over the dataset profiles
          • Available for the LinkedUp Catalog and the LOD Cloud
  • Online Learning and Linked Data: Lessons Learned and Best Practices
    Cite4Me and LinkedUp Challenge
  • Semantic Search and Retrieval of Publications
      • Semantic Search
      • Graph Search
      • Paper Recommendation
      • In-depth Analysis
    Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications. Bernardo Pereira Nunes, Besnik Fetahu, Stefan Dietze, and Marco Antonio Casanova. Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, 2013.
  • LinkedUp: Veni Challenge
      • PoliMedia, 1st prize: http://www.polimedia.nl/
      • GlobeTown, 2nd prize: http://www.globe-town.org/
      • WeShare, 3rd prize / people's choice: http://seek.cloud.gsic.tel.uva.es/weshare/
      • Other entries: DataConf, KnowNodes, Mismuseos, ReCredible, YourHistory
  • Demos and Other Resources
      • Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications. Bernardo Pereira Nunes, Besnik Fetahu, Stefan Dietze, and Marco Antonio Casanova. Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, 2013.
      • A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles. Besnik Fetahu, Stefan Dietze, Bernardo Pereira Nunes, Marco Antonio Casanova, Davide Taibi, and Wolfgang Nejdl. In Proceedings of the 11th Extended Semantic Web Conference, Springer, 2014 (to appear).
      • Assessing the Educational Linked Data Landscape. D’Aquin, M., Adamou, A., Dietze, S. ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
      • LinkedUp Catalog: http://data.linkededucation.org/linkedup/catalog/
      • DevTalk LinkedUp: http://data.linkededucation.org/linkedup/devtalk/
      • LOD Profile Data: http://data-observatory.org/lod-profiles/sparql
      • LOD Profile Explorer: http://data-observatory.org/lod-profiles/profile-explorer
      • Cite4Me Application: http://www.cite4me.com/
      • LinkedUp Challenge: http://linkedup-challenge.org/