Prov-O-Viz is a visualisation service for provenance graphs expressed using the W3C PROV vocabulary. It uses the Sankey-style visualisation from D3.js.
See http://provoviz.org
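To give a feel for the transformation involved, here is a minimal, hypothetical sketch (not Prov-O-Viz's actual implementation) of how a PROV-O graph can be turned into the nodes/links structure a D3.js Sankey layout consumes, using rdflib; the input file name and the uniform link weights are placeholders.

```python
# Hypothetical sketch (not the actual Prov-O-Viz code): turn a PROV-O graph
# into the {"nodes": [...], "links": [...]} structure used by D3.js Sankey
# layouts. Entities flow into the activities that used them (prov:used) and
# activities flow into the entities they generated (prov:wasGeneratedBy).
from rdflib import Graph, Namespace

PROV = Namespace("http://www.w3.org/ns/prov#")

def prov_to_sankey(path):
    g = Graph()
    g.parse(path, format="turtle")  # placeholder input file

    nodes, links, index = [], [], {}

    def node_id(resource):
        # Register each PROV resource once and return its node index.
        if resource not in index:
            index[resource] = len(nodes)
            nodes.append({"name": str(resource)})
        return index[resource]

    for activity, entity in g.subject_objects(PROV.used):
        links.append({"source": node_id(entity),
                      "target": node_id(activity), "value": 1})
    for entity, activity in g.subject_objects(PROV.wasGeneratedBy):
        links.append({"source": node_id(activity),
                      "target": node_id(entity), "value": 1})

    return {"nodes": nodes, "links": links}
```

Feeding the returned structure to d3-sankey then yields the flow diagram; giving every link a weight of 1 is a simplification for the sketch.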
Managing Metadata for Science and Technology Studies: the RISIS case (Rinke Hoekstra)
Presentation of our paper at the WHISE workshop at ESWC 2016 on requirements for metadata over non-public datasets for the science & technology studies field.
Towards Knowledge Graph based Representation, Augmentation and Exploration of... (Sören Auer)
Despite improved digital access to scientific publications in recent decades, the fundamental principles of scholarly communication remain unchanged and continue to be largely document-based. The document-oriented workflows in science have reached the limits of adequacy, as highlighted by recent discussions on the increasing proliferation of scientific literature, the deficiencies of peer review and the reproducibility crisis. We need to represent, analyse, augment and exploit scholarly communication in a knowledge-based way by expressing and linking scientific contributions and related artefacts through semantically rich, interlinked knowledge graphs. This should be based on deep semantic representation of scientific contributions; their manual, crowd-sourced and automatic augmentation; and finally the intuitive exploration of and interaction with the resulting scientific knowledge base through question answering. We need to synergistically combine automated extraction and augmentation techniques with large-scale collaboration to reach an unprecedented level of knowledge graph breadth and depth. As a result, knowledge-based information flows can facilitate completely new ways of search and exploration. The efficiency and effectiveness of scholarly communication will significantly increase, since ambiguities are reduced, reproducibility is facilitated, redundancy is avoided, provenance and contributions can be better traced, and the interconnections of research contributions are made more explicit and transparent. In this talk we will present first steps in this direction in the context of our Open Research Knowledge Graph initiative and the ScienceGRAPH project.
Content + Signals: The value of the entire data estate for machine learning (Paul Groth)
Content-centric organizations have increasingly recognized the value of their material for analytics and decision support systems based on machine learning. However, as anyone involved in machine learning projects will tell you, the difficulty is not in the provision of the content itself but in the production of the annotations necessary to make use of that content for ML. The transformation of content into training data often requires manual human annotation. This is expensive, particularly when the nature of the content requires subject matter experts to be involved.
In this talk, I highlight emerging approaches to tackling this challenge using what's known as weak supervision: using other signals to help annotate data. I discuss how content companies often overlook resources they already have in-house to provide these signals. I aim to show how looking at a data estate in terms of signals can amplify its value for artificial intelligence.
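As a rough illustration of the weak-supervision idea (my own sketch, not material from the talk), the snippet below combines several noisy labelling signals by majority vote; the signal functions and thresholds are invented, and real systems such as Snorkel learn to weight signals rather than voting uniformly.

```python
# Illustrative weak supervision: combine noisy labelling signals by majority
# vote. Each signal returns 1 (positive), 0 (negative), or None (abstain).
# Signals and thresholds are invented for the example; ties break arbitrarily.
from collections import Counter

def signal_has_keyword(doc):             # e.g. an editorial taxonomy hit
    return 1 if "genomics" in doc.lower() else None

def signal_trusted_source(doc, source):  # e.g. in-house source metadata
    return 1 if source in {"curated-archive"} else None

def signal_too_short(doc):               # heuristic negative signal
    return 0 if len(doc.split()) < 5 else None

def weak_label(doc, source):
    votes = [v for v in (signal_has_keyword(doc),
                         signal_trusted_source(doc, source),
                         signal_too_short(doc)) if v is not None]
    if not votes:
        return None  # all signals abstained; leave unlabelled
    return Counter(votes).most_common(1)[0][0]

print(weak_label("A short note on genomics pipelines", "web-crawl"))  # -> 1
```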
The literature contains a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data reuse. It can be overwhelming. Based on recent empirical work (analyzing data reuse proxies at scale, understanding data sensemaking and looking at how researchers search for data), I talk about what practices are a good place to start for helping others to reuse your data.
Presentation for NEC Lab Europe.
Knowledge graphs are increasingly built using complex, multifaceted machine learning-based systems relying on a wide range of different data sources. To be effective, these must constantly evolve and thus be maintained. I present work on combining knowledge graph construction (e.g. information extraction) and refinement (e.g. link prediction) in end-to-end systems. In particular, I will discuss recent work on using inductive representations for link prediction. I then discuss the challenges of ongoing system maintenance, knowledge graph quality and traceability.
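For readers unfamiliar with link prediction, here is a toy sketch of TransE-style triple scoring, where (head, relation, tail) is plausible when head + relation lands near tail in embedding space. This illustrates the general technique only, not the inductive-representation work the talk covers; the entities and vectors are placeholders.

```python
# Toy TransE-style scoring: score(h, r, t) = -||h + r - t||. With these
# untrained random vectors the ranking is arbitrary; the point is only to
# show the mechanics of scoring candidate links.
import numpy as np

rng = np.random.default_rng(0)
entities = {name: rng.normal(size=16)
            for name in ("amsterdam", "netherlands", "paris", "france")}
relations = {"capital_of": rng.normal(size=16)}

def score(head, rel, tail):
    return -np.linalg.norm(entities[head] + relations[rel] - entities[tail])

# Rank candidate tails for the query (amsterdam, capital_of, ?).
ranked = sorted(entities, key=lambda t: score("amsterdam", "capital_of", t),
                reverse=True)
print(ranked)
```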
Data Communities - reusable data in and outside your organization (Paul Groth)
Data is critical both for running an organization and as a product. How can you make that data more usable for both internal and external stakeholders? There is a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data (re)use. It can be overwhelming. Based on recent empirical work (analyzing data reuse proxies at scale, understanding data sensemaking, and looking at how researchers search for data), I talk about which practices are a good place to start for helping others to reuse your data. I put this in the context of the notion of data communities, which organizations can use to help foster the use of data both internally and externally.
Thoughts on Knowledge Graphs & Deeper Provenance (Paul Groth)
Thinking about the need for deeper provenance for knowledge graphs but also using knowledge graphs to enrich provenance. Presented at https://seminariomirianandres.unirioja.es/sw19/
From Text to Data to the World: The Future of Knowledge Graphs (Paul Groth)
Keynote Integrative Bioinformatics 2018
https://docs.google.com/document/d/1E7D4_CS0vlldEcEuknXjEnSBZSZCJvbI5w1FdFh-gG4/edit
Can we improve research productivity through providing answers stemming from knowledge graphs? In this presentation, I discuss different ways of building and combining knowledge graphs.
Slides of my talk at OSLCfest in Stockholm Nov 6, 2019
Video recording of the talk is available here:
https://www.facebook.com/oslcfest/videos/2261640397437958/
The need for a transparent data supply chain (Paul Groth)
Illustrating data supply chains and motivating the need for a more transparent data supply chain in the context of responsible data science. Presented at the 2018 KNAW-Royal Society bilateral meeting on responsible data science.
The Challenge of Deeper Knowledge Graphs for Science (Paul Groth)
Over the past five years, we have seen multiple successes in the development of knowledge graphs for supporting science in domains ranging from drug discovery to social science. However, in order to really improve scientific productivity, we need to expand and deepen our knowledge graphs. To do so, I believe we need to address two critical challenges: 1) dealing with low-resource domains; and 2) improving quality. In this talk, I describe these challenges in detail and discuss some efforts to overcome them through the application of techniques such as unsupervised learning, the use of non-experts in expert domains, and the integration of action-oriented knowledge (i.e. experiments) into knowledge graphs.
In the last decade, several Scientific Knowledge Graphs (SKGs) have been released, representing scientific knowledge in a structured, interlinked, and semantically rich manner. But what kind of information do they describe? How have they been built? What can we do with them? In this lecture, I will first provide an overview of well-known SKGs, like Microsoft Academic Graph, Dimensions, and others. Then, I will present the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 21M publications and 8M patents according to i) the research topics drawn from the Computer Science Ontology, ii) the type of the authors' affiliations (e.g., academia, industry), and iii) 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO). Finally, I will showcase a number of tools and approaches that use such SKGs to support researchers, companies, and policymakers in making sense of research dynamics.
Knowledge graphs - Ilaria Maresi, The Hyve, 23 April 2020 (Pistoia Alliance)
Data for drug discovery and healthcare is often trapped in silos, which hampers effective interpretation and reuse. To remedy this, such data needs to be linked both internally and to external sources to create a FAIR data landscape that can power semantic models and knowledge graphs.
The Roots: Linked data and the foundations of successful Agriculture Data (Paul Groth)
Some thoughts on successful data for the agricultural domain. Keynote at the Linked Open Data in Agriculture MACS-G20 Workshop in Berlin, September 27th and 28th, 2017. https://www.ktbl.de/inhalte/themen/ueber-uns/projekte/macs-g20-loda/lod/
The presentation was delivered by FORTH at the 3rd International Workshop on the role of Semantic Web in Provenance Management 2012 (SWPM2012) in Heraklion, Greece on 28th of May 2012.
Abstract:
Workflow systems can produce very large amounts of provenance information. In this paper we introduce provenance-based inference rules as a means to reduce the amount of provenance information that has to be stored, and to ease quality control (e.g., corrections). We motivate this kind of (provenance) inference and identify a number of basic inference rules over a conceptual model appropriate for representing provenance. The proposed inference rules concern the interplay between (i) actors and carried out activities, (ii) activities and devices that were used for such activities, and, (iii) the presence of information objects and physical things at events. However, since a knowledge base is not static but it changes over time for various reasons, we also study how we can satisfy change requests while supporting and respecting the aforementioned inference rules. Towards this end, we elaborate on the specification of the required change operations.
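A hypothetical sketch of one such rule, in the spirit of the paper rather than its actual formalism: if an actor carried out an activity and a device was used for that activity, the actor's presence at the use of that device can be inferred rather than stored.

```python
# Hypothetical provenance inference rule, in the spirit of the paper (not its
# actual formalism). Facts are stored as simple tuples; the rule derives
# presence facts so they need not be stored explicitly.
carried_out = {("alice", "scan_manuscript")}            # (actor, activity)
used_device = {("scan_manuscript", "flatbed_scanner")}  # (activity, device)

def infer_presence(carried_out, used_device):
    """If an actor carried out an activity and a device was used for that
    activity, infer the actor was present at the use of that device."""
    return {(actor, "present_at_use_of", device)
            for actor, activity in carried_out
            for act, device in used_device
            if act == activity}

print(infer_presence(carried_out, used_device))
# -> {('alice', 'present_at_use_of', 'flatbed_scanner')}
```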
This presentation provides an outlook on what we anticipate with the structured data hub: creating linkable datasets, enhancing the use of provenance, adding quality flags to data, answering new questions and, finally, borrowing from and providing to public sources such as DBpedia.
Advancing the comparability of occupational data through Linked Open Data (Richard Zijdeman)
Occupations are a crucial resource for historical research in a wide variety of fields. This presentation indicates the size of the error that is made when combining data from the two major classification schemes, OCCHISCO and HISCO. Next, it shows how Linked Data provides a solution to circumvent this and similar issues.
Historical occupational classification and occupational stratification schemes (Richard Zijdeman)
Lecture slides from a one-day course on the practice of coding historical occupations into HISCO and HISCAM, consisting of two 1.5-hour lectures and an afternoon hands-on computer session in R.
Introduction into R for historians, part 4: data manipulation (Richard Zijdeman)
Introduction into R for the European Historical Population Sample summer school, Cluj-Napoca, Romania, 2015. Aimed at an audience of historians with little quantitative training.
KeepIt Course 3: Provenance (and OPM), based on slides by Luc Moreau (JISC KeepIt project)
This presentation offers a brief introduction to provenance, a record of the process that led to the current state of an object, based on a new descriptive model designed to allow provenance information to be exchanged between systems: the Open Provenance Model (OPM). It was given as part of module 3 of a 5-module course on digital preservation tools for repository managers, presented by the JISC KeepIt project. For more on this and other presentations in this course, look for the tag 'KeepIt course' in the project blog http://blogs.ecs.soton.ac.uk/keepit/
Addressing Diversity in Archival Collections with Outreach (gibbsr55)
Slides for the "Addressing Diversity in Archival Collections with Outreach" presentation, given on December 2, 2009, at the University of Tennessee at Knoxville
Labour force participation of married women, US 1860-2010 (Richard Zijdeman)
In this presentation I describe the shape of the labour force participation curve of married women in the US. It is hypothesized to be U-shaped, but it appears to be more S-shaped. More importantly, the presentation provides an effort to test the underlying mechanisms of the U-shape at the US state level.
A Near-Duplicate Detection Algorithm to Facilitate Document Clustering (IJDKP)
Web mining faces huge problems due to duplicate and near-duplicate web pages. Detecting near-duplicates is very difficult in a collection as large as the Internet. The presence of these pages plays an important role in performance degradation when integrating data from heterogeneous sources, and they increase both index storage space and serving costs. Detecting them has many potential applications; for example, it may indicate plagiarism or copyright infringement. This paper concerns detecting, and optionally removing, duplicate and near-duplicate documents in order to perform document clustering. We demonstrate our approach in the web news articles domain. The experimental results show that our algorithm outperforms alternatives in terms of similarity measures, and identifying duplicate and near-duplicate documents reduced memory use in the repositories.
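The paper's own algorithm is not reproduced here, but a common baseline for near-duplicate detection, word shingling combined with Jaccard similarity, can be sketched as follows (a generic illustration with made-up documents and an arbitrary threshold):

```python
# Baseline near-duplicate detection: k-word shingles + Jaccard similarity.
# A generic illustration, not the algorithm proposed in the paper.
def shingles(text, k=3):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicates(docs, threshold=0.8):
    sigs = {name: shingles(text) for name, text in docs.items()}
    names = sorted(sigs)
    return [(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if jaccard(sigs[x], sigs[y]) >= threshold]

docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy cat",
    "c": "an entirely different news article body",
}
print(near_duplicates(docs, threshold=0.5))  # -> [('a', 'b')]
```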
"At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons."
Tim Berners-Lee, W3C Chair, Web Design Issues, September 1997
Provenance is focused on the description and understanding of where and how data is produced, the actors involved in the production of that data, and the processes by which the data was manipulated and transformed until it arrived at the collection from which it is being accessed. Provenance aims at providing the ability to trace the sources of data, enabling the exploration not just of the relationships between datasets, but also of their authors and affiliations, with the goal of preserving data ownership and establishing a notion of trust based on authenticity and reliability.
The Future Internet poses important challenges for provenance, derived from complex and rich scenarios characterized by the presence of large amounts of data stemming from heterogeneous sources like user communities, services, and things. Such challenges span technical as well as socioeconomic dimensions. The former includes aspects like vocabularies for representing provenance, interoperability and scalability issues, and means to produce, acquire, and reason with provenance in order to provide measures of trust and information quality. However, it is probably in the socioeconomic dimension where more significant efforts need to be made, addressing issues like the role of provenance in the overall picture of the Future Internet, entry barriers preventing the generation of provenance-aware internet content, the means required to incentivize the production of such content, and ways to prevent provenance forgery.
In this talk, we provide an overview of provenance and the above-mentioned challenges, and introduce ongoing work to address trust issues from the provenance perspective in the Future Internet. We also link provenance to other aspects relevant to trust discussed in the session, like security, legal frameworks, and economics.
Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation, at Earthcube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.
Like many other fields of Big Science, astrophysics and solar physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume-related challenges, including the use of high performance computing. In this talk, we will mainly focus on the other challenges from the perspective of collaborative sharing and reuse of the broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration and discovery capabilities. We will borrow examples of tools and capabilities from state-of-the-art work in supporting physicists (including astrophysicists) [1], life sciences [2], and material sciences [3], and describe the role of semantics and semantic technologies that make these capabilities possible or easier to realize. This applied and practice-oriented talk will complement more vision-oriented counterparts [4].
[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO Bioportal: http://bioportal.bioontology.org/ , Kno.e.sis's work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr... (IJEACS)
The huge amount of library data stored in modern research and statistics centres grows on a daily basis. As these databases grow exponentially in size over time, it becomes exceptionally difficult to understand the behaviour of the data and interpret the relationships that exist between attributes. This exponential growth poses new organizational challenges: conventional record management infrastructure can no longer provide precise and detailed information about the behaviour of data over time. There is confusion and concern in selecting tools that can support multi-dimensional big data visualization. Viewing all related data in a database at once is a problem that has attracted the interest of data professionals with machine learning skills. It is a lingering issue in the data industry because existing techniques cannot be used to filter noise from relevant data and fill in missing values to obtain the required information. The aim is to develop a stacked generalization model that combines the functionality of random forest and decision tree techniques for library database visualization. In this paper, the random forest and decision tree techniques were employed to effectively visualize large amounts of school library data. The proposed system was implemented with a few lines of Python code to create visualizations that help users understand and interpret the behaviour of data and its relationships at a glance. The model was trained and tested to learn and extract hidden patterns of data with a cross-validation test. Combining the functionalities of both models produced a stacked generalization model that performed better than the individual techniques. The stacked model produced 95%, followed by the RF, which produced a 95% accuracy rate and a 0.223600 RMSE value, in comparison with the DT, which recorded an 80.00% success rate and a 0.15990 RMSE value.
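As a hedged sketch of the stacking setup the abstract describes (using scikit-learn with a synthetic dataset and placeholder hyperparameters, not the authors' actual code or data):

```python
# Sketch of stacked generalization combining a random forest and a decision
# tree; the dataset, hyperparameters, and meta-learner are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner over base predictions
    cv=5,  # out-of-fold predictions are used to train the meta-learner
)

scores = cross_val_score(stack, X, y, cv=5)
print(f"stacked accuracy: {scores.mean():.3f}")
```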
Tutorial presented at 2012 ACM SIGHIT International Health Informatics Symposium (IHI 2012), January 28-30, 2012. http://sites.google.com/site/web2011ihi/participants/tutorials
This tutorial weaves together three themes and the associated topics:
[1] The role of biomedical ontologies
[2] Key Semantic Web technologies with focus on Semantic provenance and integration
[3] In-practice tools and real world use cases built to serve the needs of sleep medicine researchers, cardiologists involved in clinical practice, and work on vaccine development for human pathogens.
Linked data for Enterprise Data Integration (Sören Auer)
The Web is evolving into a Web of Data. In parallel, the intranets of large companies will evolve into Data Intranets based on the Linked Data principles. Linked Data has the potential to complement the SOA paradigm with a lightweight, adaptive data integration approach.
Engaging Information Professionals in the Process of Authoritative Interlinki... (Lucy McKenna)
Through the use of Linked Data (LD), Libraries, Archives and Museums (LAMs) have the potential to expose their collections to a larger audience and to allow for more efficient user searches. Despite this, relatively few LAMs have invested in LD projects and the majority of these display limited interlinking across datasets and institutions. A survey was conducted to understand Information Professionals' (IPs') position with regards to LD, with a particular focus on the interlinking problem. The survey was completed by 185 librarians, archivists, metadata cataloguers and researchers. Results indicated that, when interlinking, IPs find the process of ontology and property selection to be particularly challenging, and LD tooling to be technologically complex and unsuitable for their needs.
Our research is focused on developing an authoritative interlinking framework for LAMs with a view to increasing IP engagement in the linking process. Our framework will provide a set of standards to facilitate IPs in the selection of link types, specifically when linking local resources to authorities. The framework will include guidelines for authority, ontology and property selection, and for adding provenance data. A user interface will be developed which will direct IPs through the resource interlinking process as per our framework. Although there are existing tools in this domain, our framework differs in that it will be designed with the needs and expertise of IPs in mind. This will be achieved by involving IPs in the design and evaluation of the framework. A mock-up of the interface has already been tested and adjustments have been made based on the results. We are currently working on developing a minimum viable product so as to allow for further testing of the framework. We will present our updated framework, interface, and proposed interlinking solutions.
The main aim of Data-Centric Architecture is to reduce the complexity of information systems by using shared data with clear meaning. But how can you trust your data? How do you know whether it is accurate and reliable?
Providing geospatial information as Linked Open Data (Pat Kenny)
ADAPT is revolutionising the way people can seamlessly interact with digital content, systems and each other and enabling users to achieve unprecedented levels of access and efficiency. - Prof. Declan O'Sullivan, Trinity College Dublin. Address given at Ordnance Survey Ireland GI R&D Initiatives, Tuesday, 22 March 2016, 13:00 to 20:30 (GMT), Maynooth University.
Keynote presentation delivered at ELAG 2013 in Gent, Belgium, on May 29 2013. Discusses Research Objects and the relationship to work my team has been involved in during the past couple of years: OAI-ORE, Open Annotation, Memento.
Ontologies for Emergency & Disaster Management (Stephane Fellah)
OGC meeting, March 2014. OGC OWS-10 Cross-Community Interoperability: Ontologies for Emergency & Disaster Management (the application of geospatial linked data).
Similar to Prov-O-Viz: Interactive Provenance Visualization
Linkitup: Link Discovery for Research Data (Rinke Hoekstra)
Linkitup is a Web-based dashboard for enrichment of research output published via industry-grade data repository services. It takes metadata entered through Figshare.com and tries to find equivalent terms, categories, persons or entities on the Linked Data cloud and several Web 2.0 services. It extracts references from publications and tries to find the corresponding Digital Object Identifiers (DOIs). Linkitup feeds the enriched metadata back as links to the original article in the repository, but also builds an RDF representation of the metadata that can be downloaded separately, or published as research output in its own right. In this paper, we compare Linkitup to the standard workflow of publishing linked data, and show that it significantly lowers the threshold for publishing linked research data.
A Network Analysis of Dutch Regulations - Using the Metalex Document Server (Rinke Hoekstra)
In this paper we explore the possibilities of using the Linked Data representation of all Dutch regulations stored in the MetaLex Document Server for the purposes of network analysis over the citation graph between regulations, both at the document level and at the article level. We show that this is possible using relatively straightforward SPARQL queries, and present preliminary results of the analysis.
A Network Analysis of Dutch Regulations. Rinke Hoekstra. figshare.
http://dx.doi.org/10.6084/m9.figshare.689880
Retrieved 11:12, Oct 07, 2013 (GMT)
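The abstract notes that straightforward SPARQL suffices for this analysis. As an illustration, a citation in-degree query might look like the following, where the endpoint URL and the ex:cites predicate are assumptions for the example, not the actual MetaLex vocabulary:

```python
# Illustrative citation in-degree query over a regulation citation graph.
# The endpoint URL and ex:cites predicate are placeholders, not the actual
# MetaLex Document Server vocabulary. Requires the SPARQLWrapper package.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/metalex/sparql")  # placeholder
sparql.setQuery("""
    PREFIX ex: <http://example.org/vocab/>
    SELECT ?regulation (COUNT(?citing) AS ?indegree)
    WHERE { ?citing ex:cites ?regulation . }
    GROUP BY ?regulation
    ORDER BY DESC(?indegree)
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Print the ten most-cited regulations with their citation counts.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["regulation"]["value"], row["indegree"]["value"])
```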
This presentation describes the use by Data2Semantics (http://www.data2semantics.org) of the VIVO portal (http://vivoweb.org) for interlinking researchers contributing to projects within the COMMIT programme (http://www.commit-nl.nl).
The Data2Semantics project (COMMIT P23) is all about enriching research data and making it more reusable for future research. Using Linked Data for this task is a fairly obvious step to make (surprise!). However, there are several shortcomings in the current practices of publishing Linked Data that call for a slightly different approach, one which (hopefully) bridges a gap between Web 2.0 and Web 3.0. I will present a proof-of-concept service (Linkitup) that works on top of existing scientific data repositories and allows individual researchers to enrich their data with additional (linked) metadata.
Talk about the use of Linked Data in historical research on census data. Has some slides about TabLinker as well (http://github.com/Data2Semantics/TabLinker). Part of the Data2Semantics project (http://data2semantics.org).
Presentation for the Dutch Tax Administration (Belastingdienst) as part of a study into the possibilities and limitations of recognizing and extracting concepts and their definitions, and representing them with Semantic Web standards.
History of Knowledge Representation, SIKS Course 2010 (Rinke Hoekstra)
The goal of AI research is the simulation and approximation of human intelligence by computers. To a large extent this comes down to the development of computational reasoning services that allow machines to solve problems. Robots are the stereotypical example: imagine what a robot needs to know before it is able to interact with the world the way we do? It needs to have a highly accurate internal representation of reality. It needs to turn perception into action, know how to reach its goals, what objects it can use to its advantage, what kinds of objects exist, etc.
The field of knowledge representation (KR) tries to deal with the problems surrounding the incorporation of some body of knowledge (in whatever form) in a computer system, for the purpose of automated, intelligent reasoning. In this sense, knowledge representation is the basic research topic in AI. Any artificial intelligence is dependent on knowledge, and thus on a representation of that knowledge. The history of knowledge representation has been nothing less than turbulent. The roller coaster of promise of the 50's and 60's, the heated debates of the 70's, the decline and realism of the 80's and the ontology and knowledge management hype of the 90's each left a clear mark on contemporary knowledge representation technology and its application.
Presentation on publishing government data as linked data, with an emphasis on how context-dependence can remain respected in the process. Given for a group of people from (Bureau) Forum Standaardisatie, Novay, ICTU/eOverheid voor burgers, Information Dynamics and the Vrije Universiteit.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Generating a custom Ruby SDK for your web service or Rails API using Smithy (g2nightmarescribd)
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties - USA
Expansion of bot farms - how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks - Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
4. Definition (Oxford English Dictionary)
• The fact of coming from some particular source or quarter; origin, derivation;
• the history or pedigree of a work of art, manuscript, rare book, etc.;
• concretely, a record of the passage of an item through its various owners.
11.–15. Provenance
Making trust judgements on the Web
Licensing and attribution of combined information
Liability, trust and privacy in open government data
Compliance and auditing of business processes
Safeguarding quality, reproducibility and integrity of the scientific process
16. "Web Design Issues"
"At the toolbar (menu, whatever) associated with a document there is a button marked 'Oh, yeah?'. You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons."
Tim Berners-Lee, Web Design Issues, September 1997
19. Provenance in Open Government
Need provenance for data integration and reuse:
diversity of data sources
varying quality
different scope
different assumptions
"Provenance is the number one issue that we face when publishing government data in data.gov.uk"
John Sheridan, UK National Archives, data.gov.uk
20. Provenance in Science
"We need a paradigm that makes it simple [...] to perform and publish reproducible computational research. [...] a Reproducible Research Environment (RRE) [...] provides computational tools together with the ability to automatically track the provenance of data, analysis, and results and to package them (or pointers to persistent versions of them) for redistribution."
Jill Mesirov, Chief Informatics Officer of the MIT/Harvard Broad Institute, in Science, January 2010
Need provenance for reproducibility and verification of processes
21.
22. W3C Working Group
Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing.
http://www.w3.org/TR/prov-overview
Luc Moreau & Paul Groth
23. Provenance?
• Provenance = Metadata? Provenance can be seen as metadata, but not all metadata is provenance.
• Provenance = Trust? Provenance provides a substrate for deriving different trust metrics.
• Provenance = Authentication? Provenance records can be used to verify and authenticate amongst users.
24.–34. Three Dimensions
• Content: capturing and representing provenance information (recording, annotating, workflow systems)
• Management: storing, querying, and accessing provenance information (scalability, interoperability)
• Use: interpreting and understanding provenance in practice (trust, accountability, compliance, explanation, debugging)
40. Naive Approaches
InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping
Madelaine D. Boyd - http://www.seas.harvard.edu/sites/default/files/files/archived/Boyd.pdf
[Slide shows excerpts and figures from the thesis: Orbiter's node-link layout draws many long edges whose density obscures node relationships; effective large-graph visualizations instead present a summary view that can be explored, filtered, and expanded interactively; and tree visualizations (explicit node-link diagrams or implicit treemaps, per treevis.net) form a subfield of their own.]
41. InProv
InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping
Madelaine D. Boyd - http://www.seas.harvard.edu/sites/default/files/files/archived/Boyd.pdf
6 Final Design. Figure 30: A view of a cluster of system activity; this particular timeslice shows the activity of the init.sh and mount processes. The visualization was designed with the Visual Information-Seeking Mantra in mind: "overview first, zoom and filter, then details-on-demand" [56].
55. Discussion
• Provenance is vital in many areas: government, science, industry, ...
• PROV is the W3C standard for expressing provenance
• Provenance graphs can be overwhelming and complex
• PROV-O-Viz builds intuitive Sankey-style visualizations
• ... for any provenance trace expressed using PROV
http://semweb.cs.vu.nl/provoviz
Thanks to: Paul Groth, the Provenance XG and WG, Luc Moreau, James Cheney, Paolo Missier, Olaf Hartig, Satya Sahoo