Presented at the workshop of the "Reading Experience Database" (RED) project - London - 25/02/2011.
Discussion on how linked data can benefit research in humanities, using RED and data.open.ac.uk as early examples.
This document discusses using linked data for digital humanities projects. It describes how linked data allows for the flexible integration of heterogeneous data, metadata, and background knowledge from various sources. By reusing web resources, vocabularies, and ontologies through web standards like URIs, RDF, and SPARQL, linked data enables efficient investigation of integrated research questions across collections, institutions, and domains. It also explains how data provenance is important for digital humanities and fits well with linked data through standards like PROV-O. Examples are provided of digital history projects and a linked data project on Dutch ships and sailors that demonstrates these concepts.
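The flexible integration and provenance ideas above can be sketched in a few lines of plain Python: triples play the role of RDF statements, a wildcard match plays the role of a SPARQL pattern, and one triple records provenance in the style of PROV-O. The URIs and prefixes are invented for illustration, not actual identifiers from the RED project.

```python
# A minimal sketch of RDF-style triples and pattern matching in plain Python.
# All identifiers (red:exp42, red:source17, ...) are illustrative assumptions.

def match(triples, s=None, p=None, o=None):
    """Return all (subject, predicate, object) triples matching a pattern;
    None acts as a wildcard, like a variable in a SPARQL query."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

triples = [
    ("red:exp42", "rdf:type", "red:ReadingExperience"),
    ("red:exp42", "red:reader", "dbpedia:Jane_Austen"),
    ("red:exp42", "red:text", "dbpedia:Paradise_Lost"),
    # Provenance in the style of PROV-O: where this record was derived from.
    ("red:exp42", "prov:wasDerivedFrom", "red:source17"),
]

# "Which source was this reading experience derived from?"
print(match(triples, s="red:exp42", p="prov:wasDerivedFrom"))
```

Because background knowledge (here, DBpedia identifiers) uses the same triple shape, queries can cross collection boundaries without any schema alignment step, which is the integration benefit the abstract describes.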
The Challenge of Deeper Knowledge Graphs for Science - Paul Groth
Over the past five years, we have seen multiple successes in the development of knowledge graphs for supporting science, in domains ranging from drug discovery to social science. However, to really improve scientific productivity, we need to expand and deepen our knowledge graphs. To do so, I believe we need to address two critical challenges: 1) dealing with low-resource domains; and 2) improving quality. In this talk, I describe these challenges in detail and discuss some efforts to overcome them through techniques such as unsupervised learning, the use of non-experts in expert domains, and the integration of action-oriented knowledge (i.e. experiments) into knowledge graphs.
LIBER and LERU roadmap towards Open Access - LIBER Europe
The document discusses the shift towards open access and data sharing in scientific research. It notes that as research becomes more data-intensive and collaborative, new models are needed for scholarly communication that promote open data. The challenges of e-science, like managing large datasets, require solutions to balance the need for data sharing against researchers' career incentives. Research libraries and data centers must also adapt by directly supporting open scholarship and becoming more involved in the research process.
How the Web can change social science research (including yours) - Frank van Harmelen
A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and even better the Web of Data, as an instrument in their research.
Presentation for NEC Lab Europe.
Knowledge graphs are increasingly built using complex, multifaceted machine learning-based systems relying on a wide range of different data sources. To be effective, these systems must constantly evolve and thus be maintained. I present work on combining knowledge graph construction (e.g. information extraction) and refinement (e.g. link prediction) in end-to-end systems. In particular, I will discuss recent work on using inductive representations for link prediction. I then discuss the challenges of ongoing system maintenance, knowledge graph quality, and traceability.
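The construct-then-refine loop mentioned above can be illustrated with a toy example. The talk concerns learned, inductive representations; here a simple common-neighbour count stands in as the link-scoring function, purely to show how refinement proposes edges that construction missed. All entities and edges are invented.

```python
# Toy knowledge-graph refinement via link prediction.
# The scoring heuristic (shared neighbours) is a deliberate simplification,
# not the representation-learning method discussed in the talk.

from itertools import combinations

edges = {("marie_curie", "physics"), ("marie_curie", "chemistry"),
         ("paul_langevin", "physics"), ("ernest_rutherford", "physics"),
         ("ernest_rutherford", "chemistry")}

def neighbours(node):
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}

def score(u, v):
    """Score a candidate link by the number of neighbours the nodes share."""
    return len(neighbours(u) & neighbours(v))

# Rank unlinked entity pairs as candidate new edges for the graph.
people = ["marie_curie", "paul_langevin", "ernest_rutherford"]
candidates = sorted(
    ((score(u, v), u, v) for u, v in combinations(people, 2)),
    reverse=True)
print(candidates[0])  # the most plausible missing link
```

In an end-to-end system, construction (extraction) would keep adding edges while a scorer like this one, retrained as the graph evolves, proposes refinements; tracking why each edge was added is exactly the traceability problem the abstract raises.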
Thoughts on Knowledge Graphs & Deeper Provenance - Paul Groth
Thinking about the need for deeper provenance for knowledge graphs but also using knowledge graphs to enrich provenance. Presented at https://seminariomirianandres.unirioja.es/sw19/
This document discusses exposing humanities research data as linked open data to make it more accessible and connectable. It describes the benefits of following linked data principles by putting data online in a standard format, making it addressable through URIs, and linking it to other data. As an example, it outlines how the Reading Experience Database was connected to the web of data, allowing relationships to be represented between experiences, people, documents, and other metadata. Overall, the document argues that representing research as linked data provides opportunities for reuse, linking to other resources, and deriving new insights from the connections between data.
This document discusses linked data and semantic web technologies. It describes Mathieu d'Aquin, a research fellow at the Knowledge Media Institute of the Open University who works on semantic web, linked data, and knowledge technologies. It then provides an overview of key concepts in the semantic web and linked data, including using URIs to identify entities on the web, representing data as graphs using RDF, and linking data across the web. Examples are given of how linked data can be queried and used in applications.
The document discusses knowledge graphs and their future directions. It summarizes a panel discussion on knowledge graphs at ESWC 2020 and references several papers on industry-scale knowledge graphs, weak supervision for knowledge graph construction, and representing entities and identities in knowledge bases. It concludes that knowledge graph construction involves complex pipelines with many components and calls for an updated theory of knowledge engineering to address the demands of modern knowledge graphs at large scale and with continuous changes.
Data Communities - reusable data in and outside your organization - Paul Groth
Data is critical both to the functioning of an organization and as a product in its own right. How can you make that data more usable for both internal and external stakeholders? There is a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data (re)use; it can be overwhelming. Based on recent empirical work (analyzing data-reuse proxies at scale, understanding data sensemaking, and looking at how researchers search for data), I talk about which practices are a good place to start for helping others reuse your data. I put this in the context of the notion of data communities, which organizations can use to foster the use of data both internally and externally.
Content + Signals: The value of the entire data estate for machine learning - Paul Groth
Content-centric organizations have increasingly recognized the value of their material for analytics and decision support systems based on machine learning. However, as anyone involved in machine learning projects will tell you, the difficulty is not in the provision of the content itself but in the production of the annotations necessary to make use of that content for ML. The transformation of content into training data often requires manual human annotation. This is expensive, particularly when the nature of the content requires subject matter experts to be involved.
In this talk, I highlight emerging approaches to tackling this challenge using what's known as weak supervision - using other signals to help annotate data. I discuss how content companies often overlook resources that they have in-house to provide these signals. I aim to show how looking at a data estate in terms of signals can amplify its value for artificial intelligence.
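The weak-supervision idea sketched above can be made concrete: several noisy in-house "signals" vote on each document, and a simple majority vote produces training labels without manual annotation. The signals, labels, and documents below are invented for illustration; production systems additionally learn how much to trust each signal rather than voting uniformly.

```python
# A minimal weak-supervision sketch: noisy labelling functions vote on
# each document; a majority vote yields a training label. All signals
# and example documents here are hypothetical.

from collections import Counter

ABSTAIN = None

def lf_keyword(doc):       # signal 1: an in-house keyword list
    return "chemistry" if "reaction" in doc else ABSTAIN

def lf_taxonomy(doc):      # signal 2: an existing editorial taxonomy
    return "chemistry" if "catalyst" in doc else ABSTAIN

def lf_heuristic(doc):     # signal 3: a crude document-length heuristic
    return "other" if len(doc.split()) < 3 else ABSTAIN

def weak_label(doc, lfs):
    """Majority vote over the non-abstaining signals, or abstain."""
    votes = [v for v in (lf(doc) for lf in lfs) if v is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

lfs = [lf_keyword, lf_taxonomy, lf_heuristic]
print(weak_label("a catalyst speeds up the reaction", lfs))  # -> chemistry
```

The point of the talk's framing is that the labelling functions need not be written from scratch: existing taxonomies, editorial metadata, and legacy rules already in the data estate can each serve as one of these signals.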
The document discusses using the Semantic Web as a knowledge base for artificial intelligence applications. It describes how the Semantic Web publishes data on the web in a standardized, linked format. This vast amount of distributed knowledge could be mined by AI in various ways, such as linking data mining to find patterns, using reasoning to analyze and understand raw data, and assessing agreement between ontologies. The Semantic Web represents a large, collaborative base of formally represented knowledge that provides many opportunities for future AI research and applications.
Towards an Ontology for Historical Persons - John Bradley
This document discusses developing an Ontology for Historical Persons (OHP) to better structure prosopographical data on the semantic web. It provides examples of existing models like FOAF, TEI and DDH's factoid model. Developing a standardized OHP could help connect separate prosopography projects and move from closed to open collaboration. The OHP would define entities like persons, assertions, roles, events and relationships to provide a framework for consistently representing prosopographical data in a linked open manner. The document proposes an initial workshop to further explore and develop ideas for the OHP.
The Challenges of Making Data Travel, by Sabina Leonelli - LEARN Project
1st LEARN Workshop. Embedding Research Data as part of the research cycle. 29 Jan 2016. Presentation by Sabina Leonelli, Exeter Centre for the Study of Life Sciences (Egenis) & Department of Sociology, Philosophy and Anthropology, University of Exeter
This document outlines plans for the Clariah Structured Data Hub project. Clariah aims to provide humanities scholars access to large digital resources and tools to enable ground-breaking research. The Structured Data Hub will curate and link structured datasets on various levels from micro to macro. It will also create tools to facilitate the research process, such as data evaluation, linking, analysis, and visualization. The project will involve a design phase with two pilot studies, followed by preparation, execution, and close phases to develop a research infrastructure with linked data and tools.
Leveraging the power of the web - Open Repositories 2015 - Kaitlin Thaney
This document discusses leveraging the power of the open web for science. It notes that current systems are creating friction despite original intentions of openness. It advocates for building capacity for open, web-enabled research through infrastructure, tools, standards, incentives and training to support reuse, collaboration and interoperability. The goal is to foster sustainable communities of practitioners doing open science.
This document discusses using semantic web technologies to help make sense of big data by linking and integrating heterogeneous data sources. It presents a self-adaptive natural language interface model that takes a natural language query as input, considers possible concept annotations and SPARQL query patterns, runs the queries, and returns results to a reasoner to identify the correct query and answer. The model was tested on geography and Quran ontologies and was able to correctly answer questions with different SPARQL patterns. The conclusion discusses how semantic web and linked data can help analyze big data and create more personalized applications.
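The query-selection loop described above (generate candidate interpretations, run them, let a reasoner keep the one that answers) can be sketched minimally. Real systems emit SPARQL against an ontology; here each candidate interpretation is a plain function over a toy fact table, and the "reasoner" simply keeps the first candidate that returns a result. The facts and entities are invented.

```python
# A toy version of the self-adaptive NL-interface loop: try each
# candidate query pattern for a question, keep the first that answers.
# The fact table stands in for a SPARQL endpoint and is purely illustrative.

facts = {("capital_of", "ankara"): "turkey",
         ("longest_river", "turkey"): "kizilirmak"}

def interpret_as_capital(entity):
    return facts.get(("capital_of", entity))

def interpret_as_river(entity):
    return facts.get(("longest_river", entity))

def answer(question_entity, candidates):
    """Run each candidate query pattern; return the first non-empty result."""
    for candidate in candidates:
        result = candidate(question_entity)
        if result is not None:
            return result
    return None

print(answer("turkey", [interpret_as_capital, interpret_as_river]))
```

The described model is richer (it weighs concept annotations and feeds results back to a reasoner), but the core mechanism is this trial of alternative query patterns against the data.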
Building capacity for open, data-driven science - Grand Rounds - Kaitlin Thaney
Kaitlin Thaney gave a presentation on building capacity for open, data-driven science. She discussed leveraging the power of the web for open scholarship through access to content, data, code and materials. Adopting practices from open source development like code as a research object and iterative development can help further open science. Building capacity requires fostering sustainable practitioner communities through rewards, incentives and reputation systems while providing professional development support and lowering barriers to entry. Shifting to open practices is challenging and requires tools, cultural awareness, connections, skills training and incentives.
towards interoperable archives: the Universal Preprint Service initiative - Herbert Van de Sompel
The document discusses the Universal Preprint Service initiative which aims to promote interoperability between preprint archives. It provides background on existing preprint models and services. The initiative is supported by several organizations and held its first meeting in 1999 to discuss technical recommendations for achieving interoperability between archives.
Sources of Change in Modern Knowledge Organization Systems - Paul Groth
Talk covering how knowledge graphs are making us rethink how change occurs in Knowledge Organization Systems. Based on https://arxiv.org/abs/1611.00217
Making social science more reproducible by encapsulating access to linked data - Albert Meroño-Peñuela
This document discusses improving reproducibility in social science research by encapsulating access to linked data. It proposes using GitHub to collaboratively write SPARQL queries that can be used to combine and select subsets of linked open data. A tool called GRLC is presented that automatically builds APIs from the SPARQL queries in a GitHub repository. This allows research questions to be encoded as SPARQL and executed through HTTP links. GRLC has been successfully used in several domains and projects to improve data sharing and reuse.
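The core grlc idea, turning each named SPARQL query in a repository into an API operation, can be sketched as a simple mapping. The query files and routing below are invented for illustration; grlc itself generates full Web APIs (with parameter handling and documentation) from GitHub repositories.

```python
# A toy sketch of the grlc approach: each .rq file in a repository
# becomes an HTTP-style API path. File names and queries are hypothetical.

queries = {
    # filename -> SPARQL text, as the files might sit in a Git repository
    "ships_by_year.rq":
        "SELECT ?ship WHERE { ?ship :departed ?year . FILTER(?year = ?_year) }",
    "sailors_by_ship.rq":
        "SELECT ?sailor WHERE { ?sailor :served ?_ship . }",
}

def build_api(queries):
    """Map each query file to an API path, e.g. /api/ships_by_year."""
    return {"/api/" + name.removesuffix(".rq"): sparql
            for name, sparql in queries.items()}

api = build_api(queries)
print(sorted(api))
```

Because the queries live under version control, the exact data selection behind a result is reproducible: a paper can cite the repository revision, and anyone can re-execute the same query via the generated HTTP link.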
The presentation explores the trend towards a scholarly communication system that is friendly to machines. It presents three exhibits illustrating the trend and one exhibit illustrating inertia in the system. It makes the point that machine-actionability can be much more easily achieved if content and metadata are available in Open Access and under a permissive Creative Commons license. It also observes that even with content and metadata openly available, new costs related to advanced tools to explore the scholarly record will emerge. Finally, it points at significant challenges regarding the persistence of the scholarly record in light of increasingly interconnected and actionable content and advanced tools to interact with it.
The slides were used for a plenary presentation at the LIBER 2011 Conference in Barcelona, Spain, on June 30 2011.
This presentation was provided by Scott Ziegler of Louisiana State University during the NISO Virtual Conference, Open Data Projects, held on Wednesday, June 13, 2018.
This document describes QB'er, a tool for converting statistical datasets into linked open data on the semantic web. It aims to address problems with today's workflow for working with multiple disconnected datasets, including a lack of comparability and repeating cleaning efforts. QB'er allows researchers to standardize individual datasets according to community best practices, share code lists with colleagues, and publish standardized, interlinked datasets on a structured data hub. This grows a graph of interconnected datasets and makes the cleaning and mapping efforts reusable rather than disposable. A demonstration shows uploading a historical census dataset and mapping its variable values to codes while preserving the original values.
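The mapping step in that demonstration, standardising raw values against a shared code list while keeping the original literal, can be sketched briefly. The occupation values and codes below are invented examples, not QB'er's actual code lists.

```python
# A sketch of code-list mapping that preserves original values, in the
# spirit of the QB'er demonstration. Codes and values are hypothetical.

code_list = {"farmer": "OCC-611", "farm labourer": "OCC-611",
             "blacksmith": "OCC-722"}

def standardise(values, code_list):
    """Attach a standard code to each raw value, preserving the original."""
    return [{"original": value,
             "code": code_list.get(value.lower(), "UNMAPPED")}
            for value in values]

rows = standardise(["Farmer", "Blacksmith", "Weaver"], code_list)
print(rows)
```

Keeping the original string alongside the code is what makes the cleaning effort reusable rather than disposable: the mapping can be revised later, and other researchers can share and extend the same code list instead of re-cleaning the source data.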
Linked Data in a University Context: Publication, Applications and Beyond
The Open University (OU) is exposing its data as linked open data to make it more transparent, reusable and discoverable both internally and externally. This includes data about courses, research outputs, library resources and more. By linking its data to other university and external datasets, the OU aims to create new applications and make existing processes more efficient. Other universities in the UK and worldwide are now following the OU's example in publishing institutional data as linked open data.
Experience from 10 months of University Linked Data - Mathieu d'Aquin
Experience from 10 months of University Linked Data at the Open University:
1. The Open University exposed its public data as linked open data to make the data more discoverable, reusable, and integrated with other datasets.
2. Exposing data as linked data provides benefits like increased transparency, data reuse internally and externally, and reduced costs of managing the university's public data.
3. Other UK universities have since followed the Open University's example in exposing their data as linked data.
WWW2013 Tutorial: Linked Data & Education - Stefan Dietze
Linked data provides opportunities for sharing educational data on the web in a standardized way. It allows for the integration of heterogeneous educational resources and datasets from different platforms. This can enable new applications like cross-platform recommender systems and exploratory search. However, there are also challenges to address like annotation overhead, performance, and scalability when dealing with large amounts of distributed data.
The document discusses how linked open data and semantic web technologies can be applied to educational data and resources on the web. It provides examples of projects that aim to expose, interlink, and enrich educational datasets using these technologies. The goal is to improve data sharing and interoperability, facilitate reuse of open educational resources, and leverage linked data as a knowledge base to support learning and education.
Open Data Dialog 2013 - Linked Data in Education - Stefan Dietze
The document discusses opportunities and challenges of using linked data in education. It begins by outlining how linked data principles can be useful for sharing educational data by providing background knowledge and common standards and vocabularies. However, it notes that currently only a few datasets are actually reused or linked, in part due to heterogeneity in datasets, unreliable metadata, and a lack of links between datasets. The LinkedUp project aims to address these issues by collecting and profiling open educational datasets, generating links between them, and building applications and tools to help utilize the data. Key activities include developing a dataset catalog, generating topic profiles of datasets, running challenges to identify innovative applications, and engaging stakeholders.
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Stefan Dietze
Presentation from mentoring event of Open Education Europa Challenge (http://www.openeducationchallenge.eu/) about using Linked Data in educational applications.
Building the Open University's Web of Linked DataMathieu d'Aquin
The Open University is exposing its data as linked open data and integrating it using semantic web technologies. This includes data about courses, educational resources, research publications, podcasts, and more. The data is hosted at data.open.ac.uk and links to external datasets. Applications are being built that combine and explore this integrated data in new ways to benefit users, such as a mobile course explorer and tools to analyze research communities and impacts. Exposing university data as linked open data is gaining adoption in the UK and beyond.
Web Science Synergies: Exploring Web Knowledge through the Semantic WebStefan Dietze
The document discusses exploring web data and knowledge through the semantic web. It describes how the semantic web adds meaning to data through shared vocabularies and schemas. It also discusses challenges with the large number and diversity of linked open datasets, including issues with accessibility, heterogeneity of schemas, and data quality. It proposes approaches to address these challenges, such as dataset profiling, metadata catalogs, and infrastructure for federated querying.
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...Christoph Lange
The Linked Data paradigm has emerged as a powerful enabler for data and knowledge interlinking and exchange using standardised Web technologies.
In this article, we discuss our vision how the Linked Data paradigm can be employed to evolve the intranets of large organisations -- be it enterprises, research organisations or governmental and public administrations -- into networks of internal data and knowledge.
In particular for large enterprises data integration is still a key challenge. The Linked Data paradigm seems a promising approach for integrating enterprise data. Like the Web of Data, which now complements the original document-centred Web, data intranets may help to enhance and flexibilise the intranets and service-oriented architectures that exist in large organisations. Furthermore, using Linked Data gives enterprises access to 50+ billion facts from the growing Linked Open Data (LOD) cloud. As a result, a data intranet can help to bridge the gap between structured data management (in ERP, CRM or SCM systems) and semi-structured or unstructured information in documents, wikis or web portals, and make all of these sources searchable in a coherent way.
Keynote at Baltic DB&IS 2014, 9 June 2014, Tallinn, Estonia
Retrieval, Crawling and Fusion of Entity-centric Data on the WebStefan Dietze
Stefan Dietze gave a keynote presentation covering three main topics:
1) Challenges in entity retrieval from heterogeneous linked datasets and knowledge graphs due to diversity and lack of standardization.
2) Approaches for enabling discovery and search through dataset recommendation, profiling, and entity retrieval methods that cluster entities to address link sparsity.
3) Going beyond linked data to exploit semantics embedded in web markup, with case studies in data fusion for entity reconciliation and retrieval.
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the WebStefan Dietze
This document discusses enabling discovery and search of linked data and knowledge graphs. It presents approaches for dataset recommendation including using vocabulary overlap and existing links between datasets. It also discusses profiling datasets to create topic profiles using entity extraction and ranking techniques. These recommendation and profiling approaches aim to help with discovering relevant datasets and entities for a given topic or task.
Semantic Linking & Retrieval for Digital LibrariesStefan Dietze
An overview of recent works on entity linking and retrieval in large corpora, specifically bibliographic data. The works address both traditional Linked Data and knowledge graphs as well as data extracted from Web markup, such as the Web Data Commons.
Open Data & Education Seminar, ITMO, St Petersburg, March 2014Stefan Dietze
This document discusses using linked open data to improve education and learning. It describes how educational data was previously isolated in different platforms using competing standards, which caused issues with interoperability. Linked open data standards like RDF and SPARQL are helping to connect educational datasets into a joint graph to facilitate data sharing and reuse across repositories. Projects like LinkedUp are working to profile and link educational web data to build applications that can recommend resources and give insights into learning contexts using open datasets.
The proliferation of communication technologies is profoundly changing the nature of academic practice. In this presentation I describe the impact of blogging and social networking tools on the practice and dissemination of academic research across disciplinary boundaries. I suggest that the traditional notion of the university is giving way to communities of scholars who are not tied to particular institutions, and less dependent on traditional forms of dissemination and publication. The resulting ‘democratisation’ of academia is portrayed in terms of a tension between democracy and expert knowledge mediated by technology.
One prominent contemporary challenge for technologists is to understand the ongoing impact of technological change on academic communities. At The Open University, the Digital Scholarship research team is mapping the use of Twitter in order to better understand user engagement with these technologies. I will present headline findings from this research and discuss the implications for scholarly practice at the OU.
Scholarship in a connected world: New ways to know, new ways to showDerek Keats
The document discusses how libraries and scholarship are changing in a digital world of abundance rather than scarcity. It covers four key areas: ubiquitous computing, the social academic, research data, and free and open versus secret science. The author argues that libraries must adapt to this new environment by embracing new technologies, facilitating social and open sharing of knowledge, helping with research data management, and promoting open access over secret science.
Organizational Implications of Data Science Environments in Education, Resear...Victoria Steeves
Data science (DS) poses key organizational challenges for academic institutions. DS is a multidisciplinary field that includes a range of research methodologies and fields of inquiry. DS as a domain is interested in many of the same issues as libraries: data access and curation, reproducibility, the value of ontologies, and open scholarship. At the same time, identifying opportunities to collaborate and deploy unified services can be challenging. The Data Science Environment (DSE) program, co-funded by the Gordon and Betty Moore and Alfred P. Sloan foundations, provides resources to help universities develop collaborations between researchers, develop tools in DS, and create new career paths for data scientists. Working groups within the DSE focus on reproducibility, career paths, education/training, research methods, space issues, and software/tools. This program has introduced new opportunities for libraries to explore how to engage with this community and consider how to bring the expertise in the DS community to bear on library missions and goals. In this panel, program members from each of the three partner universities, the University of Washington, New York University and the University of California, Berkeley, consider the research questions of the DSE and the organizational impact of these groups in the University as a whole and for the libraries specifically. The panel will employ a case-study presentation model framed through three lenses: the role of data sciences in information science, the potential career paths for data scientists in libraries, and the potential amplification of information services (e.g. data curation, institutional repositories, scholarly publishing).
CNI Program: Talk Description: https://www.cni.org/topics/digital-curation/organizational-implications-of-data-science-environments-in-education-research-and-research-management-in-libraries
Video of Talk--Vimeo: https://vimeo.com/149713097
Video of Talk--YouTube: https://www.youtube.com/watch?v=L0G9JsPMEXY
Semantic societies: uncovering new research paths, engaging members bettercharlierapple
Societies' community role is threatened by the development of substitute internet communities. Smart use of technology to add new value is one way to strengthen the position of the Society at the centre of its discipline. This paper reports on a collaborative pilot study that is building a transatlantic community around two societies' journals, and explores the challenges inherent in deploying semantic technologies in the scholarly society sector.
Digital Humanities in a Linked Data World - Semantic AnnotationsDov Winer
This document discusses the use of semantic annotations and linked data in digital humanities projects. It begins by outlining some common "scholarly primitives" or methods used by humanities researchers, such as annotating, comparing, and representing. It then provides examples of digital humanities projects that employ techniques like semantic annotations, named entity identification, and linking open data to transform traditional scholarly workflows. Specifically, it describes projects involving networks of historical figures, semantic annotation of philosophical texts, and modeling relationships in a linked data framework. The document concludes by discussing how linked open data can treat the web as a global database and provides statistics on the growth of linked open datasets.
Similar to Exposing Humanities Data for Reuse and Linking - RED, linked data and the semantic web (20)
A factorial study of neural network learning from differences for regressionMathieu d'Aquin
The document describes a factorial study that trained neural networks to perform regression tasks using differences between cases rather than raw data. It varied factors like the amount of training data, number of epochs, number of similar cases used to determine differences, and whether original features were included with differences. The study found that learning from differences generally required similar data amounts but converged faster. Adding original features was not always beneficial but never significantly hurt performance. The best settings depended on the specific task. Learning from differences showed potential but has limitations like difficulty scaling to large datasets.
Recentrer l'intelligence artificielle sur les connaissancesMathieu d'Aquin
The document appears to contain rules for assigning values to variables (x[n]) based on logical conditions. It includes 14 rules using comparisons of the variable values, logical operators, and numeric values. It also reports the training and test accuracies of the rules as 92.13% and 89.3% respectively.
This document summarizes Mathieu d'Aquin's career path and research interests. It notes that he has worked at LORIA in Nancy, France from 2002-2006, at the Knowledge Media Institute at the Open University in Milton Keynes, UK from 2006-2017, and at the Data Science Institute at NUI Galway in Ireland from 2017-2021. His research has focused on using knowledge-driven and hybrid data-driven/knowledge-driven approaches to understand data provenance, content, and results from data analysis in order to achieve intelligent data understanding.
Unsupervised learning approach for identifying sub-genres in music scoresMathieu d'Aquin
This document discusses an unsupervised learning approach to identify sub-genres in music scores. It explores different ways of representing musical features like pitch and timing in vector formats that can be analyzed using clustering algorithms. Evaluating different feature representations on a sample of folk tunes, the best results were obtained using a combined weighting of pitch, timing, and beats extracted from audio files. This approach shows potential for applications like music information retrieval, studying musical genres, and finding connections between tunes.
Knowledge engineering remains relevant for developing knowledge-based systems and representing knowledge on the semantic web and in knowledge graphs. It also has applications in data science for understanding the relationships between data, models, and techniques. Recent work has applied knowledge engineering to explain data patterns, propagate data policies, and make technological artifacts more accessible to non-experts. The field can help scale and integrate tools for knowledge curation, explanation, and knowledge-driven data access and interpretation.
This document discusses the need to study data science as a discipline through examining the processes, techniques, and outputs. It presents data science as consisting of iterative steps like forming hypotheses, collecting and analyzing data, and extracting results. Ontologies and platforms are proposed as tools to systematically describe datasets, licenses, models, and tasks. Case studies examine modeling data flows and understanding patterns in large data science systems. The document argues for an interdisciplinary approach and using techniques like science fiction to ensure data science is developed and applied responsibly through considering social and ethical implications.
This document discusses dealing with open domain data and recent examples. It begins by explaining that typical knowledge systems are closed domain, while open domain systems can answer unknown questions. It then discusses early work using the Watson ontology and Semantic Web to build open domain question answering. A core assumption was that the Semantic Web would know everything if it continued growing, which did not occur. However, recent projects like AFEL have shown the Semantic Web and DBpedia can represent data from many domains and be used for tasks like detecting topics in activity streams, explaining patterns in data, and finding biases. While applications using open domain linked data are still limited, the ability to represent diverse data in a single graph remains important.
This document discusses web analytics and personal analytics for learning. It describes how web analytics can analyze user activities on websites and online systems. Personal analytics can help users improve their behavior by self-tracking. Learning analytics analyze student activities and data from university systems to provide recommendations and applications like vital signs dashboards for doctors. The goal of analytics for everyday learning (AFEL) is to create theory-backed methods and tools that support self-directed learners in making effective use of online resources according to their goals. A scenario is described of a learner who uses an AFEL dashboard to track her progress on different topics and set goals to focus more on areas she is weaker in, like statistics. Challenges discussed include collecting integrated personal data
Learning Analytics: understand learning and support the learnerMathieu d'Aquin
The document discusses learning analytics, which is defined as the measurement, collection, analysis, and reporting of data about learners and their contexts for the purpose of understanding and optimizing learning and the environments where it occurs. It provides examples of how learning analytics can be used for prediction, exploration, and interpretation of learning data. It also discusses challenges in recognizing and measuring learning using data from open, unconstrained online environments. Finally, it presents a cognitive model of learning and knowledge construction that involves constructive friction as the driving force behind learning.
The AFEL Project aims to create tools to support self-directed learners by analyzing data from their online activities. It collects browsing history and social media data to identify topics of interest and measure progress. Indicators show how learners engage with different topics over time. Learners can set goals which are checked against their daily activities. Recommendations guide further learning based on indicators and goals. The project developed a data platform, visual analytics tools, and a mobile app to help learners optimize their use of online resources for informal learning.
Assessing the Readability of Policy Documents: The Case of Terms of Use of On...Mathieu d'Aquin
This document summarizes a study that assessed whether the readability of terms of use documents from various websites is adapted to the education levels of their target audiences. It finds that readability is often not well-adapted, using two main methods: analyzing over 1500 terms of use with the SMOG readability index and comparing typical education levels of website audiences in different countries. Results show mismatches between document complexity and user education levels for many US and India-based sites. The study concludes readability assessment is useful but has limitations when applied broadly.
This document discusses using data to support self-directed learning. It presents a simple model of online learning involving people, resources, topics, and organizations. A scenario is described of a learner named Jane who uses an online dashboard to view her learning activities and progress across different topics. The dashboard helps Jane realize she has been procrastinating on topics she enjoys less, like statistics, and set goals to focus more on those areas. Challenges discussed include recognizing and measuring learning in open online environments. The document also references a cognitive model of learning as a co-evolutionary process driven by "constructive friction," and identifies indicators of learning like coverage of topics.
Towards an “Ethics in Design” methodology for AI research projects Mathieu d'Aquin
The document proposes an "Ethics in Design" methodology for AI research projects. It argues that current ethics debates focus too much on technical data protection and not broader societal impacts. The methodology calls for a reflective, dialectic process involving data scientists and social scientists throughout a project's lifecycle to identify ethical issues, minimize risks, and increase positive societal impact. It explores applying this approach to two case studies and outlines principles of being dialectic, reflective, creative, and all-encompassing. The document concludes by advocating adopting these guidelines and collaborating across fields to further develop ethics methodologies.
AFEL: Towards Measuring Online Activities Contributions to Self-Directed Lear...Mathieu d'Aquin
The document describes how Jane, a 37-year-old administrative assistant, uses the AFEL platform to track and improve her self-directed online learning activities related to her hobbies, career development, and math skills. Jane connects data from her browsing history, Facebook, and MOOCs to the AFEL dashboard. By reviewing her dashboard daily, Jane realizes she has been procrastinating on statistics and sets goals to focus more on it. The dashboard will now remind Jane of her goals and recommend additional learning activities.
From Knowledge Bases to Knowledge Infrastructures for Intelligent SystemsMathieu d'Aquin
1) The document discusses how knowledge representation and ontologies have evolved from closed knowledge bases for specific domains to open knowledge infrastructures that can handle large amounts of diverse data and information at scale.
2) It provides examples of how ontologies and semantic technologies are being used to build intelligent systems that can search, integrate, and automatically process and analyze large datasets.
3) Going forward, ontologies will play an important role in populating knowledge from data and dialog, enabling the automatic exploitation of data by autonomous agents, and enhancing data analytics and mining through semantic representation of datasets, tools, and policies.
Data analytics beyond data processing and how it affects Industry 4.0Mathieu d'Aquin
The document discusses how data analytics is moving beyond just data processing to affect Industry 4.0. It summarizes the research areas and industry partnerships of the Insight Centre for Data Analytics in NUI Galway, including linked data, machine learning, and media analytics. Key applications discussed are monitoring energy consumption using stream processing and event detection, predicting future behavior through machine learning, and detecting and classifying anomalies to inform predictive maintenance decisions.
Exposing Humanities Data for Reuse and Linking - RED, linked data and the semantic web
1. Exposing Humanities Data for Reuse and Linking: RED, linked data and the semantic web. Mathieu d'Aquin, Knowledge Media Institute, the Open University. LUCERO project, http://data.open.ac.uk
2. Motivation… From my rather ignorant perspective, humanities research = collecting data and using it for research and teaching. RED is obviously a perfect example of this. Challenges: How do we expose this data in such a way that all its potential uses become feasible? How do we expose this data so that it can connect to other collections, open information resources, etc.? How do we benefit from other information resources to enrich this data, derive new research questions, and connect it to aspects not originally thought about?
3. Linked Data (tada!!) A set of principles and technologies for a Web of Data: putting the "raw" data online in a standard representation (RDF); making the data Web-addressable (URIs); linking it to other data.
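These three principles can be sketched in a few lines of plain Python, treating data as subject-predicate-object triples identified by web URIs. The URIs below are illustrative placeholders, not actual RED identifiers:

```python
# Linked Data in miniature: statements are (subject, predicate, object)
# triples, and things are identified by web URIs. All example.org URIs
# here are placeholders, not real RED identifiers.
triples = [
    ("http://example.org/red/experience/1",
     "http://example.org/red/terms/readerInvolved",
     "http://example.org/red/person/dickens"),
    ("http://example.org/red/person/dickens",
     "http://xmlns.com/foaf/0.1/name",
     "Charles Dickens"),
    # A link into another dataset: the same person in DBpedia.
    ("http://example.org/red/person/dickens",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Charles_Dickens"),
]

def match(data, s=None, p=None, o=None):
    """Return every triple matching the pattern (None acts as a wildcard)."""
    return [t for t in data
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Who read in experience 1, and what is their name?
reader = match(triples, s="http://example.org/red/experience/1",
               p="http://example.org/red/terms/readerInvolved")[0][2]
name = match(triples, s=reader, p="http://xmlns.com/foaf/0.1/name")[0][2]
print(name)  # prints "Charles Dickens"
```

Because the reader's URI also appears in an owl:sameAs link, a consumer can follow it to DBpedia and pull in background knowledge that was never part of the original collection, which is exactly the linking benefit the slide describes.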
5. Linked Data at the OU? [Diagram] Internal sources: data from research outputs (RAE), ORO, OpenLearn content, archive of course material, the Library's catalogue of digital content, A/V material (podcasts, iTunesU). External links: DBpedia, geonames, data.gov.uk, the BBC, DBLP.
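Data integrated this way is typically exposed through a SPARQL endpoint. As a sketch, the snippet below only builds the HTTP GET request for such a query in SPARQL 1.1 Protocol style, without sending it; the endpoint URL and the Podcast class URI are assumptions for illustration, as the actual data.open.ac.uk vocabulary and endpoint may have changed since this talk:

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real data.open.ac.uk service may differ.
ENDPOINT = "http://data.open.ac.uk/sparql"

# Ask for ten podcast titles; the Podcast class URI is a placeholder,
# only dcterms:title is a real Dublin Core term.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?podcast ?title WHERE {
  ?podcast a <http://example.org/terms/Podcast> ;
           dcterms:title ?title .
} LIMIT 10
"""

def sparql_get_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET URL for the given query string."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

url = sparql_get_url(ENDPOINT, QUERY)
```

The point of the slide survives in the query itself: once course material, podcasts and research outputs share one data space, a single query can cut across what used to be separate repositories.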
9. Linked data… and humanities. Still early stage, but: Can there be a Web of Data for humanities? What are the implications? How can we benefit? Is this going to happen naturally, or should we make a particular effort? RED: an early example exploring the potential of linked data for humanities research.
10. [Diagram: the RED data model.] An Experience (a subclass of Event, with a date) connects a Person (readerInvolved, described by originCountry, occupation, religion and gender) and a Document (textInvolved, with title, description, publication date and creator/editor); events are related to a Location, with City and Country as subclasses, via locatedIn. The model reuses the LinkedEvents ontology, the CITO Citation Ontology (providesExcerptFor, givesBackgroundTo), Dublin Core and FOAF, and links to DBpedia.
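Putting the model into the "standard representation" mentioned earlier, a single reading experience can be written down as RDF triples. The sketch below emits slightly simplified N-Triples; the red namespace and identifiers are invented for illustration, and only dcterms:title is a real Dublin Core term:

```python
RED = "http://example.org/red/terms/"   # placeholder namespace, not RED's real one
DCT = "http://purl.org/dc/terms/"       # Dublin Core terms

def term(t):
    # Simplification: anything that looks like a URI gets angle brackets,
    # everything else is serialized as a plain string literal.
    return "<%s>" % t if t.startswith("http") else '"%s"' % t

def ntriples(triples):
    """Serialize (s, p, o) tuples as N-Triples lines."""
    return "\n".join("%s %s %s ." % (term(s), term(p), term(o))
                     for s, p, o in triples)

experience = [
    (RED + "experience/42", RED + "readerInvolved", RED + "person/1"),
    (RED + "experience/42", RED + "textInvolved", RED + "document/7"),
    (RED + "document/7", DCT + "title", "Hard Times"),
]
print(ntriples(experience))
```

A real deployment would use an RDF library, typed literals and the actual RED vocabulary, but the shape of the data, an experience linking a reader and a document, is the same as in the diagram.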
14. Conclusion. The benefits of exposing your research data as linked data are undeniable: allow for reuse and linking! Still, it requires effort. The potential of linking to other data is very promising: things no longer need to be aggregated, because they are already in the same data space, the Web… With this come all the issues around provenance, quality, trust, etc. This represents a serious conceptual shift in the way we manage and use academic/research/educational data.