Turning Data into Knowledge (KESW2014 Keynote)
1. Turning Data into Knowledge – profiling and interlinking Web datasets
Stefan Dietze
L3S Research Center
- KESW2014 -
30/09/14
2. KESW2014
Introduction
Recent work on Linked Data exploration/discovery/search
Entity interlinking & dataset interlinking recommendation
Dataset profiling
Data consistency & conflicts
Research areas
Web science, Information Retrieval, Semantic Web & Linked Data, data & knowledge integration (mapping, classification, interlinking)
Application domains: education/TEL, Web archiving, …
Some projects
http://www.l3s.de/
See also: http://purl.org/dietze
3. KESW2014
Linked Data is awesome, but...
„HTTP-accessibility“ (SPARQL, URI-dereferencing)
„Structure“ & „Semantics“ (=> shared/linked vocabularies)
„Interlinked“
„Persistent“
Hm, really?
…why are so few datasets actually used?
Data reuse and in-links focus on trusted „reference graphs“ such as DBpedia, Freebase etc.
Long tail of LD datasets which are neither reused nor linked to (LOD Cloud alone: 300+ datasets, 50 bn triples)
Explanations?
4. KESW2014
Linked data is more diverse (and messy) than we think
Accessibility of datasets?
Less than 50% of all SPARQL endpoints are actually responsive at any given point in time [Buil-Aranda2013]
“THE” SPARQL protocol? No, but many variants & subsets
[Figure: SPARQL endpoint availability over time, from Buil-Aranda et al 2013]
“Semantics”, links, quality?
…data accuracy (eg DBpedia)? [Paulheim2013]
…vocabulary reuse? [D’AquinWebSci13]
…schema compliance (RDFS, schemas) [HoganJWS2012]
[Buil-Aranda2013] SPARQL Web-Querying Infrastructure: Ready for Action?, Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y., International Semantic Web Conference 2013 (ISWC2013).
[D’AquinWebSci13] Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
[Paulheim2013] Type Inference on Noisy RDF Data, Paulheim, H., Bizer, C., Semantic Web – ISWC 2013, Lecture Notes in Computer Science Volume 8218, 2013, pp 510-525.
[HoganJWS2012] An empirical survey of Linked Data conformance, Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., Journal of Web Semantics 14, 2012.
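The responsiveness figure quoted on this slide can be made concrete with a small probe. The following is only a minimal sketch, not the measurement setup of Buil-Aranda et al.: it assumes an endpoint counts as "responsive" when it answers a trivial ASK query with a 2xx status, and the names `http_probe`, `is_responsive` and `availability` are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Iterable

# Trivial query; any conforming endpoint should be able to answer it.
ASK_QUERY = "ASK { ?s ?p ?o }"


def http_probe(endpoint: str, timeout: float = 10.0) -> int:
    """Send the ASK query as a SPARQL-protocol GET request and return the HTTP status."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": ASK_QUERY})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def is_responsive(endpoint: str, probe: Callable[[str], int] = http_probe) -> bool:
    """Assumption: 'responsive' means the probe returns a 2xx status without error."""
    try:
        return 200 <= probe(endpoint) < 300
    except (urllib.error.URLError, OSError, ValueError):
        return False


def availability(endpoints: Iterable[str], probe: Callable[[str], int] = http_probe) -> float:
    """Fraction of endpoints that are responsive at the moment of probing."""
    endpoints = list(endpoints)
    if not endpoints:
        return 0.0
    return sum(is_responsive(e, probe) for e in endpoints) / len(endpoints)
```

Passing the probe as a parameter keeps the counting logic testable offline; repeating `availability` at regular intervals would yield an availability-over-time curve of the kind shown in the figure.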
5. KESW2014
What about data consistency?
Analyzing Relative Incompleteness of Movie Descriptions in the Web of Data: A Case Study, Yuan, W., Demidova, E., Dietze, S., Zhu, X., International Semantic Web Conference 2014 (ISWC2014)
6. KESW2014
Too many/diverse datasets, too little knowledge
Topics? Which datasets are useful & trustworthy for case XY (eg „learning about the solar system“)? Which topics are covered?
Types? Which datasets describe statistics, videos, slides, publications etc?
Quality? Currentness, dynamics, accessibility/reliability, data quantity & quality?
7. KESW2014
Data mapping, linking and profiling
Schema mappings [WebSci13]
Entity & dataset disambiguation & linking [ESWC13]
Topic profile extraction [WWW13, ESWC14]
[Diagram labels: Dataset Catalog/Registry, Dataset Metadata, BIBO, AAISO, FOAF, contains, db:Astronomy, db:Astro. Objects, bibo:Film, BBC Programme (po:Programme), Yovisto Video (yov:Video)]
BBC Programme: <po:Programme …> <po:Series>Wonders of the Solar System</.> <po:Actor>Brian Cox</…> </po:Programme…>
Yovisto Video: <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…>
9. KESW2014
Schema assessment and mapping
Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties)
[Figure labels: po:Programme, sioc:Item, yov:Video, ?]
Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
10. KESW2014
Schema assessment and mapping
Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties)
Co-occurrence after mapping into most frequent schemas (201 frequent types mapped into 79 classes)
[Figure labels: bibo:Film, bibo:Document, po:Programme, sioc:Item, foaf:Document, yov:Video, typeX]
Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
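To illustrate what "co-occurrence after mapping" means, the sketch below counts in how many datasets two types appear together, before and after rewriting types through a mapping table. It is an illustrative toy, not the WebSci13 analysis: the dataset names and the specific type mappings are assumptions made up for the example, and `type_cooccurrence` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Optional, Set, Tuple


def type_cooccurrence(
    datasets: Dict[str, Set[str]],
    mapping: Optional[Dict[str, str]] = None,
) -> Counter:
    """Count, for each unordered pair of types, the datasets in which both occur.

    If a mapping is given, every type is first replaced by its mapped class;
    unmapped types are kept unchanged.
    """
    mapping = mapping or {}
    counts: Counter = Counter()
    for types in datasets.values():
        canonical = sorted({mapping.get(t, t) for t in types})
        counts.update(combinations(canonical, 2))
    return counts


# Hypothetical datasets and mappings, for illustration only.
datasets = {
    "bbc": {"po:Programme", "foaf:Document"},
    "yovisto": {"yov:Video", "sioc:Item"},
    "other": {"bibo:Film", "bibo:Document"},
}
mapping = {
    "po:Programme": "bibo:Film",
    "yov:Video": "bibo:Film",
    "foaf:Document": "bibo:Document",
    "sioc:Item": "bibo:Document",
}

before = type_cooccurrence(datasets)
after = type_cooccurrence(datasets, mapping)
```

In this toy input, three unrelated type pairs collapse into a single pair shared by all three datasets once the mapping is applied, which is the effect the second figure on the slide visualises at larger scale.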
11. KESW2014
Application: LinkedUp Data Catalog in a nutshell
RDF (VoID) dataset catalog: browse & query distributed datasets
Federated queries using type mappings
Live information about endpoint accessibility
http://data.linkededucation.org/linkedup/catalog/
http://datahub.io/group/linked-education
DBpedia categories
12. KESW2014
Towards profiling: dataset disambiguation/linking
Relatedness of entities, meaningfulness of paths? [ESWC13]
Extraction of “topics” & relatedness of datasets [ESWC14]
[Diagram labels: BBC Programme (po:Programme), Yovisto Video (yov:Video), contains, db:Astro. Objects, db:CartoonCharacters, ?]
BBC Programme: <po:Programme …> <po:Series>Wonders of the Solar System</.> <po:Actor>Brian Cox</…> </po:Programme…>
Yovisto Video: <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…>
13. KESW2014
Dataset disambiguation/linking
Computation of connectivity scores between entities
Combination of (i) a semantic (graph-based) connectivity score (SCS) with (ii) a Web co-occurrence-based measure (CBM) (similar to NGD)
For (i): adaptation of Katz-Index from SNA for (linked) data graphs (considering path number and path lengths of transversal properties)
Example scores shown: SCS = 0.32, CBM = 0.24
[Diagram labels: BBC Programme (po:Programme), Yovisto Video (yov:Video), contains, db:Pluto (Dwarf Planet), db:Astronomical Objects, db:Sun, db:Astronomy]
BBC Programme: <po:Programme …> <po:Series>Wonders of the Solar System</.> <po:Actor>Brian Cox</…> </po:Programme…>
Yovisto Video: <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…>
Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013 - 10th Extended Semantic Web Conference (May 2013).
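The two ingredients named on this slide can be sketched in a few lines. This is a rough illustration under stated assumptions, not the SCS or CBM definitions from the ESWC 2013 paper: `katz_score` is a plain truncated Katz index over simple paths (damping factor `beta` and `max_length` are assumed parameters), `ngd` is the standard Normalized Google Distance that the slide names only as a point of comparison for CBM, and the toy graph is hypothetical.

```python
import math
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def count_paths(graph: Graph, source: str, target: str, max_length: int) -> List[int]:
    """Return counts[l] = number of simple paths with l edges from source to target."""
    counts = [0] * (max_length + 1)

    def walk(node: str, visited: Set[str], depth: int) -> None:
        if depth > 0 and node == target:
            counts[depth] += 1
            return
        if depth == max_length:
            return
        for neighbour in graph.get(node, ()):
            if neighbour not in visited:
                walk(neighbour, visited | {neighbour}, depth + 1)

    walk(source, {source}, 0)
    return counts


def katz_score(graph: Graph, source: str, target: str,
               max_length: int = 4, beta: float = 0.5) -> float:
    """Truncated Katz index: more paths raise the score, longer paths are damped by beta**l."""
    counts = count_paths(graph, source, target, max_length)
    return sum(beta ** length * counts[length] for length in range(1, max_length + 1))


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance from hit counts fx, fy, joint count fxy, index size n."""
    if fxy == 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical toy graph loosely based on the entities in the diagram.
toy_graph: Graph = {
    "db:Pluto": {"db:Astronomical_Objects", "db:Sun"},
    "db:Sun": {"db:Astronomical_Objects", "db:Pluto"},
    "db:Astronomical_Objects": {"db:Pluto", "db:Sun", "db:Astronomy"},
    "db:Astronomy": {"db:Astronomical_Objects"},
}
score = katz_score(toy_graph, "db:Pluto", "db:Astronomy")
```

In the toy graph there is one path of length 2 and one of length 3 between the two entities, so with `beta = 0.5` the score is 0.25 + 0.125. Note that NGD is a distance (0 means the terms always co-occur), whereas the scores on the slide are relatedness values, so any real combination would need a conversion step that is not shown here.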
14. KESW2014
Entity linking: evaluation
Evaluation based on USA Today News items (80,000 entity pairs)
Manually created gold standard (1000 entity pairs)
Baseline: Explicit Semantic Analysis (ESA) => CBM/SCS: „relatedness“; ESA: „similarity“
[Figure: Precision/Recall/F1 for SCS, CBM, ESA]
Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013 - 10th Extended Semantic Web Conference (May 2013).
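For readers unfamiliar with the reported measures, precision, recall and F1 against a gold standard of entity pairs can be computed as below. This is a generic sketch; the function name and the example pairs are hypothetical and do not come from the evaluation data.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(predicted: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Compare predicted related pairs with gold-standard pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical entity pairs, for illustration only.
predicted_pairs = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
gold_pairs = {("a", "b"), ("a", "c"), ("d", "e")}
p, r, f1 = precision_recall_f1(predicted_pairs, gold_pairs)
```

Here two of the four predicted pairs are in the gold standard, giving a precision of 0.5 and a recall of 2/3.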
15. KESW2014
„SCS Connector“ demo
http://lod2.inf.puc-rio.br/scs/SemConnectivities
SCS Connector – Quantifying and Visualising Semantic Paths between Entity Pairs, Nunes, B. P., Herrera, J. E. T., Taibi, D., Lopes, G. R., Casanova, M. A., Dietze, S., Demo Paper at 11th Extended Semantic Web Conference (ESWC2014), Heraklion, Crete, Greece (2014). *BEST ESWC2014 DEMO AWARD*
16. KESW2014
Dataset profiling: what‘s the data about?
Extracting representative (DBpedia) categories („topic profile“) & entities for arbitrary datasets
Sounds easy? But how to do that for 300+ datasets with < 50 bn triples?
Scalability vs representativeness: sampling & ranking for good scalability/accuracy balance [ESWC2014] (applied to all responsive LOD datasets)
[Diagram labels: Dataset Catalog/Registry, Dataset Metadata, db:Astronomy, db:Astro. Objects, db:Pluto (Dwarf Planet), Yovisto Video (yov:Video)]
Yovisto Video: <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…>
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece (2014).
17. KESW2014
Efficient dataset profiling: method
1. Sampling of resource instances (random sampling, weighted sampling, resource centrality sampling)
2. Entity and topic extraction (NER via DBpedia Spotlight, category mapping and expansion)
3. Normalisation and ranking (using graphical models such as PageRank with Priors, HITS with Priors and K-Step Markov)
Result: weighted dataset-topic profile graph
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece (2014).
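Steps 1 and 3 of the method can be sketched on a toy graph. This is a simplified illustration, not the ESWC2014 implementation: it shows only plain random sampling and a basic power-iteration form of PageRank with priors, the entity extraction of step 2 is assumed to have happened already, and all node names, the `back_prob` value and the function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def sample_resources(resources: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Step 1 (random sampling only): draw a fixed fraction of resources without replacement."""
    k = max(1, round(fraction * len(resources)))
    return random.Random(seed).sample(list(resources), k)


def pagerank_with_priors(
    graph: Dict[str, List[str]],
    priors: Dict[str, float],
    back_prob: float = 0.15,
    iterations: int = 50,
) -> Dict[str, float]:
    """Step 3: rank nodes relative to prior ('root') nodes.

    With probability back_prob the walk jumps back to a node drawn from the
    priors (which must sum to 1); otherwise it follows an outgoing edge.
    Nodes without outgoing edges hand their mass back to the priors.
    """
    nodes = set(graph) | {m for outs in graph.values() for m in outs}
    rank = {n: priors.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        updated = {n: back_prob * priors.get(n, 0.0) for n in nodes}
        for n in nodes:
            outs = graph.get(n, [])
            mass = (1.0 - back_prob) * rank[n]
            if not outs:
                for m, prior in priors.items():
                    updated[m] += mass * prior
                continue
            for m in outs:
                updated[m] += mass / len(outs)
        rank = updated
    return rank


# Hypothetical dataset -> sampled entities -> categories graph.
toy_graph = {
    "dataset": ["entity1", "entity2", "entity3"],
    "entity1": ["category:A"],
    "entity2": ["category:A"],
    "entity3": ["category:B"],
}
ranks = pagerank_with_priors(toy_graph, priors={"dataset": 1.0})
sample = sample_resources([f"resource{i}" for i in range(50)], fraction=0.1)
```

Using the dataset node as the only prior makes the resulting scores dataset-specific: `category:A`, reached through two of the three entities, ends up ranked above `category:B`, which is the kind of weighting a dataset-topic profile graph needs.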
18. KESW2014
Dataset profiling: exploring LOD datasets/topics in a nutshell
http://data-observatory.org/lod-profiles/
Automatic extraction of dataset “topics” [ESWC2014] => RDF/VoID dataset profiles
Visualisation & exploration of dataset-topic graph (datasets, topics, relationships)
Includes all (responsive) datasets of LOD Cloud
19. KESW2014
Dataset profiling: evaluation
Datasets & Ground Truth
Yovisto, Oxpoints, LAK Dataset, Semantic Web Dogfood
Crowd-sourced topic indicators from datasets (keywords, tags)
Manual mapping to entities & category extraction (ranking according to frequency)
Baselines
1) LDA, 2) tf/idf (applied to entire datasets)
Topic extraction according to our approach, weighting/ranking based on term weight
Measure
NDCG @ rank l
Performance (time/NDCG) for different sampling strategies/sizes etc
[Figure: NDCG (averaged over all datasets)]
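NDCG @ rank l compares the ranking a method produces with the best possible ordering of the same items. The sketch below assumes the common formulation with linear gain and a log2 position discount; the slides do not state which variant was used, so treat the gain function as an assumption.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], l: int) -> float:
    """Discounted cumulative gain of the first l items (linear gain, log2 discount)."""
    return sum(rel / math.log2(position + 2) for position, rel in enumerate(relevances[:l]))


def ndcg(relevances: Sequence[float], l: int) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering of the same items."""
    ideal = dcg(sorted(relevances, reverse=True), l)
    return dcg(relevances, l) / ideal if ideal > 0 else 0.0
```

The input is the list of ground-truth relevance values in the order the method ranked the topics, so `ndcg([3, 2, 1], 3)` is a perfect ranking, while swapping items lowers the score.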
20. KESW2014
What (dataset) have these categories in common?
dbp:Category:1955_births
dbp:Category:People_from_London
dbp:Category:Buzzwords
dbp:Category:Semantic_Web
dbp:Category:Web_Services
dbp:Category:HTTP
dbp:Category:Unitarian_Universalists
dbp:Category:World_Wide_Web
dbp:Category:Royal_Medal_winners
21. KESW2014
Diversity of category profile for a single publication
DBLP
Berners-Lee, Tim; Hendler, James; Lassila, Ora (2001). "The Semantic Web". Scientific American Magazine.
foaf:Person
foaf:Document
dbp:Tim_Berners-Lee
dbp:Semantic_Web
first-level categories (dcterms:subject)
dbp:Category:1955_births
dbp:Category:People_from_London
dbp:Category:Buzzwords
dbp:Category:Semantic_Web
dbp:Category:Web_Services
dbp:Category:HTTP
dbp:Category:Unitarian_Universalists
dbp:Category:World_Wide_Web
dbp:Category:Royal_Medal_winners
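The first-level categories on this slide are the values of dcterms:subject for each extracted entity. As a rough sketch of how such a lookup could be phrased, the helper below builds a SPARQL query string for one entity; the function name, the `limit` parameter and the IRI check are assumptions for illustration, and no endpoint is contacted.

```python
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"

# Characters that must not appear inside a SPARQL IRI reference.
_FORBIDDEN_IRI_CHARS = set('<>" {}|\\^`')


def first_level_categories_query(entity_uri: str, limit: int = 100) -> str:
    """Build a query for the categories directly attached to an entity via dcterms:subject."""
    if any(ch in _FORBIDDEN_IRI_CHARS for ch in entity_uri):
        raise ValueError(f"not a valid IRI: {entity_uri!r}")
    return (
        "SELECT DISTINCT ?category WHERE { "
        f"<{entity_uri}> <{DCTERMS_SUBJECT}> ?category "
        f"}} LIMIT {int(limit)}"
    )


query = first_level_categories_query("http://dbpedia.org/resource/Tim_Berners-Lee")
```

Running one such query per extracted entity and merging the results gives the kind of mixed category list shown above, where person-related and topic-related categories end up in the same profile.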
22. KESW2014
Type-specific exploration of dataset categories
http://data-observatory.org/led-explorer/
Type-specific views on datasets/categories
“Document” (foaf:Document)
“Person” (foaf:Person)
“Course” (aaiso:course)
Currently applied to datasets in LinkedUp Catalog only (as schema mappings already available here)
Exploring type-specific topic profiles of datasets: a demo for educational linked data, Taibi, D., Dietze, S., Fetahu, B., Fulantelli, G., Demo at International Semantic Web Conference 2014 (ISWC2014)
24. KESW2014
KEYSTONE & PROFILES 2014
KEYSTONE: semantic keyword-based search on structured data sources (2013-2017)
http://www.keystone-cost.eu/
Research network focused on distributed search, dataset profiling, to Semantic Web, Databases, etc.
Open to new members (beyond Europe)
PROFILES2014 - Dataset PROFIling & fEderated Search for Linked Data
http://www.keystone-cost.eu/profiles
Workshop collocated with ESWC2014
IJSWIS Special Issue on … LD search & profiling
http://www.ijswis.org/?q=node/51/
Deadline 8 December 2014
25. KESW2014
Summing up
Summary
Increasing amounts of data => require knowledge about nature and relationships of datasets
Profiling: scalable methods for extracting dataset metadata
Interlinking: connectivity of entities or datasets
What about LD evolution?
In RDF graphs (eg LOD Cloud), „all“ nodes are connected
Impact of evolution on preservation, linking and enrichment?
Which parts of datasets to preserve (entity „neighbourhood“)? => semantic relatedness/relevance/entity retrieval
Link correctness in evolving LD?
….
26. KESW2014
Спасибо! Thank You!
Acknowledgements
Besnik Fetahu (L3S)
Elena Demidova (L3S)
Bernardo Pereira Nunes (PUC Rio)
Marco Casanova (PUC Rio)
Luiz Andre Paes Leme (PUC Rio)
Giseli Lopes (PUC Rio)
Davide Taibi (CNR, IT)
Mathieu d’Aquin (Open University, UK)
and many more…
See also (general)
http://purl.org/dietze
http://linkedup-project.eu
http://duraark.eu
See also (data)
http://data.l3s.de
http://data.linkededucation.org
http://lak.linkededucation.org