This document introduces exploratory querying and SPEX, a tool for exploratory querying of spatial and temporal data. It summarizes the goals of exploratory querying, gives examples of exploratory querying software, and demonstrates SPEX through use cases of exploring integrated datasets like the Dutch BAG registry and real estate listings. The presentation describes SPEX's capabilities for interactively querying and visualizing linked geospatial and temporal data to understand dataset contents for further use and integration.
COOL-WD: A Completeness Tool for Wikidata - Fariz Darari
COOL-WD is a completeness tool for Wikidata. Its features are: (1) display any Wikidata entity enriched with completeness information for each of its properties; (2) add new completeness statements or remove incorrect ones; (3) aggregate completeness statements and analyze the completeness of classes of entities; and (4) process any SPARQL query over Wikidata and evaluate the completeness of the query answer.
This XML Prague 2015 pre-conference presentation shows practical usage of linked data sources. These sources can help to enrich content with entities, add links to external data sources, and use the enriched content in question answering, machine translation or other scenarios. The aim is to show the practical application of linked data sources in XML tooling. The presentation is an update on, and provides outcomes of, the related session held at XML Prague 2014.
The presentation was given by Mr. Bas Kempen, ISRIC, during the GSOC Mapping Global Training hosted by ISRIC - World Soil Information, 6 - 23 June 2017, Wageningen (The Netherlands).
The Use of Big Data Techniques for Digital Archiving - Sven Schlarb
These slides were used in a presentation at the "Our Digital Future - Multidisciplinary Perspectives on Long Term Data Preservation and Access" conference in Cambridge/UK in March 2016, in the session "Current and Future perspectives on technology for data preservation and sharing". They describe work in progress in the E-ARK project, which is co-funded by the European Commission and has as its main objective the creation of a scalable, open-source digital archiving system offering efficient search and access to the content of very large digital object collections. The focus of this presentation lies on describing the core big data technologies (Apache Hadoop, Apache HBase, and the document repository Lily developed by NGData), the architecture of the E-ARK integrated prototype implementation, and data mining use cases related to geographical data, named entity extraction, and OLAP data analysis.
Data curation and data archiving at different stages of the research process - Andrea Scharnhorst
Henk van den Berg, Jerry de Vries, Andrea Scharnhorst (2019) Data curation and data archiving at different stages of the research process. Presentation given at the DANS Colloquium on Research and Data: Women readers finding their literary foremothers, March 21, 2019, The Hague
LoCloud Geolocation enrichment tools, Siri Slettvag, Asplan Viak Internet (Av... - locloud
Presentation about the Geolocation Enrichment tools developed by Avinet as part of the LoCloud project. The tool can be used to add geographic locations to existing datasets provided by cultural institutions. It can be used as part of existing workflows by curators, or in crowd-sourcing projects with users finding places and adding coordinates.
http://www.locloud.eu
Here I motivate the need for improving the ways in which we access the integration-friendly space of Linked Data, by bridging the gap between various Linked Data querying methods, obtaining links to these queries, and providing RESTful APIs based on them.
GeoSemantic Technologies for Archaeological Resources - Paul Cripps
The semantics of heritage data is a growing area of interest with ontologies such as the CIDOC-CRM providing semantic frameworks and exemplary projects such as STAR and STELLAR demonstrating what can be done using semantic technologies applied to archaeological resources. In the world of the Semantic Web, advances regarding geosemantics have emerged to extend research more fully into the spatio-temporal domain, for example extending the SPARQL standard to produce GeoSPARQL. Importantly, the use of semantic technologies, particularly the structure of RDF, aligns with graph and network based approaches, providing a rich fusion of techniques for geospatial analysis of heritage data expressed in such a manner.
This paper gives an overview of the ongoing G-STAR research project (GeoSemantic Technologies for Archaeological Resources) with reference to broader sectoral links, particularly to commercial archaeology. Particular attention is paid to examining the integration of spatial data into the heritage Global Graph and the relationship between Spatial Data Infrastructure (SDI) and Linked Data, moving beyond notions of ‘location’ as simple nodes, placenames and coordinates towards fuller support for complex geometries and advanced spatial reasoning. Finally, the potential impacts of such research are discussed with particular reference to the current practice of commercial archaeology, access to and publishing of (legacy, big) data, and leveraging network models to better understand and manage change within archaeological information systems.
Lightning presentation for Moving People / Linking Lives at the University of Virginia, March 2015. It discusses EAC-CPF, mapping to RDF linked data, and using xEAC as the publication framework for highly interlinked archival authorities and scholarly prosopographies.
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio... - Marcus Smith
A presentation of two aspects of the linked open data work ongoing at the Swedish National Heritage Board (Riksantikvarieämbetet): Swedish Open Cultural Heritage (SOCH/K-samsök) and the Digital Archaeological Process (DAP).
Delivered at the Smithsonian, Washington, DC, 2014-11-10
Data Science at Scale: Using Apache Spark for Data Science at Bitly - Sarah Guido
Given at Data Day Seattle 2015.
Bitly generates over 9 billion clicks on shortened links a month, as well as over 100 million unique link shortens. Analyzing data of this scale is not without its challenges. At Bitly, we have started adopting Apache Spark as a way to process our data. In this talk, I’ll elaborate on how I use Spark as part of my data science workflow. I’ll cover how Spark fits into our existing architecture, the kind of problems I’m solving with Spark, and the benefits and challenges of using Spark for large-scale data science.
Slides from the first meeting of the project group PUSHPIN at the University of Paderborn. I present the general focus of the project group and the topics for the seminar phase.
Why do they call it Linked Data when they want to say...? - Oscar Corcho
The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, not only when discussing the goodness of Linked Data with outsiders, but also when reviewing papers for the COLD workshop series, I find myself, on many occasions, going back to the principles in order to see whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches that we have for publishing data on the Web, and we will reflect on why it is sometimes so difficult to reach an agreement on what we understand by Linked Data. Furthermore, we will take the opportunity to describe yet another approach that we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid, in order to facilitate Linked Data consumption.
Repositories are systems to safely store and publish digital objects and their descriptive metadata. Repositories mainly serve their data through web interfaces which are primarily oriented towards human consumption. They either hide their data behind non-generic interfaces or do not publish them at all in a way a computer can process easily. At the same time, the data stored in repositories are particularly suited for use in the Semantic Web, as metadata are already available. They do not have to be generated or entered manually for publication as Linked Data. In my talk I will present a concept of how metadata and digital objects stored in repositories can be woven into the Linked (Open) Data Cloud, and which characteristics of repositories have to be considered while doing so. One problem it targets is the use of existing metadata to publish Linked Data. The concept can be applied to almost every repository software. At the end of my talk I will present an implementation for DSpace, one of the most widely used repository software solutions. With this implementation, every institution using DSpace should be able to export their repository content as Linked Data.
The explosion in growth of the Web of Linked Data has provided, for the first time, a plethora of information in disparate locations, yet bound together by machine-readable, semantically typed relations. Utilisation of the Web of Data has been, until now, restricted to the members of the community, eating their own dogfood, so to speak. To the regular web user browsing Facebook and watching YouTube, this utility is yet to be realised. The primary factor inhibiting uptake is the usability of the Web of Data, where users are required to have prior knowledge of elements from the Semantic Web technology stack. Our solution to this problem is to hide the stack, allowing end users to browse the Web of Data, explore the information it contains, discover knowledge, and use Linked Data. We propose a template-based visualisation approach where information attributed to a given resource is rendered according to the rdf:type of the instance.
If You Have The Content, Then Apache Has The Technology! - gagravarr
Within the ASF, there are a wide variety of projects with technologies to help you store, retrieve, host, transform and generate content. This talk will review the landscape of Apache content technologies, provide a quick introduction to the more common and more interesting projects, and flag up new and innovative features within them. It'll also highlight talks from the rest of the week on many of the projects covered, so that you'll know where and when to go to learn more about those projects and technologies which catch your eye!
The Digital Archaeological Workflow: A Case Study from Sweden - Marcus Smith
The Digital Archaeological Workflow (DAP) is a programme of work being carried out at the Information Development Unit at the Swedish National Heritage Board, in partnership with the major Swedish archaeological stakeholders. The programme aims to streamline the flow of archaeological data (and its associated metadata) between different actors in the Swedish archaeological process, and to ensure that this data is preserved in a sustainable and accessible manner. It aims to address a number of problems which have hampered the practice of archaeology in Sweden for some time, but which have now started to become more acute as digital technology saturates the processes involved.
There is no centralised register of archaeological fieldwork in Sweden, making it difficult not only to keep track of what is going on where, but also to know what fieldwork – if any – has taken place in connection to a particular site in the national sites and monuments record. Sweden also has no central digital archive for the storage of either archaeological fieldwork data or reports; as such records are now produced digitally, valuable archaeological data is thus increasingly at risk of being lost.
Furthermore, despite the fact that almost all of the data and administrative metadata surrounding archaeological work are digital-born, they are still handled according to analogue paradigms, particularly when information must be shared between different organisations. Sources of archaeological data which are currently made available digitally by various national and local bodies are not typically linked together. This leads to inefficiencies in information transfer, duplication of data and effort, and to information describing the same 'objects' being stored in different systems within different organisations.
The DAP programme intends to address these problems over the course of a five-year period, using standardised platform-agnostic data formats and protocols to streamline information transfer between organisations, by releasing a series of open taxonomies and ontologies for common Swedish archaeological terms and concepts on the semantic web in order to facilitate data interoperability, and by creating a secure digital repository both for the raw data and reports arising from fieldwork and research. We aim to make this information freely available as linked open data.
Our overall mapping of the current Swedish archaeological process is complete (although some details remain) and we are currently working on a conceptual model on which our future information architecture will be based. In parallel, we are also working to translate our existing (analogue) archaeological taxonomies to SKOS and release them as linked open data authorities, beginning with the Swedish monument types thesaurus.
Duraspace Hot Topics Series 6: Metadata and Repository Services - Matthew Critchlow
Presented by Declan Fleming, Arwen Hutt, and Matt Critchlow. The second in a three part Webinar series on Research Data Curation at UC San Diego, as part of the larger Research Cyberinfrastructure initiative.
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud - Ontotext
This webinar will break down the roadblocks that prevent many from reaping the benefits of heavyweight Semantic Technology in small-scale projects. We will show you how to build Semantic Search & Analytics proofs of concept by using managed services in the Cloud.
Collaboratively Conceived, Designed and Implemented: Matching Visualization ... - Nancy Hoebelheinrich
Presented as a poster at the American Geophysical Union 2014 Annual Meeting in San Francisco, California on behalf of the ESIP Semantic Web Cluster's ToolMatch team.
Providing open data is of interest for its societal and commercial value, for transparency, and because more people can do fun things with data. There is a growing number of initiatives to provide open data from, for example, the UK government and the World Bank. However, much of this data is provided in formats such as Excel files, or even PDF files. This raises several questions:
- How best to provide access to data so it can be most easily reused?
- How to enable the discovery of relevant data within the multitude of available data sets?
- How to enable applications to integrate data from large numbers of formerly unknown data sources?
One way to address these issues is to use the design principles of linked data (http://www.w3.org/DesignIssues/LinkedData.html), which suggest best practices for how to publish and connect structured data on the Web. This presentation gives an overview of linked data technologies (such as RDF and SPARQL), examples of how they can be used, as well as some starting points for people who want to provide and use linked data.
The presentation was given on August 8, at the Hacknight event (http://hacknight.se/) of Forskningsavdelningen (http://forskningsavd.se/) (Swedish: “Research Department”) a hackerspace in Malmö.
1. Exploratory querying of the Dutch Georegisters with the purpose of further integration with other sources
by Stanislav Ronzhin & Rob Lemmens
2. Goals for today
• To introduce Exploratory Querying
• To present SPEX, a tool for Exploratory Querying in space and time
• To demonstrate SPEX in action
9. Exploratory Querying of spatio-temporal data
bagGeopunt:0546010000010756 geo:asWKT
"POINT(4.4884203637992 52.157846104773)".
bagWoonplaats:2088 a bag:Woonplaats;
bag:woonplaatsnaam "Leiden".
bagPand:0546100000040803 a bag:Pand;
bag:bouwjaar "1650";
bag:ingangsdatum "2010-08-26T00:00:00".
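As a rough sketch of how such triples can be queried (non-authoritative; the bag: and geo: prefix IRIs below are assumptions and would need to match the actual BAG endpoint), a SPARQL query listing buildings with their construction year and, where available, their geometry could look like this:

PREFIX bag: <http://bag.basisregistraties.overheid.nl/def/bag#>   # assumed prefix IRI
PREFIX geo: <http://www.opengis.net/ont/geosparql#>

SELECT ?pand ?bouwjaar ?geom
WHERE {
  ?pand a bag:Pand ;                       # buildings, as in the triples above
        bag:bouwjaar ?bouwjaar .           # construction year
  OPTIONAL { ?pand geo:asWKT ?geom . }     # point geometry, if present
}
LIMIT 100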
10. SPEX - Spatio-temporal Content Explorer
Scheider, S., Degbelo, A., Lemmens, R., van Elzakker, C., Zimmerhof, P., Kostic, N., Jones, J., & Banhatti, G. (2015, in press). Exploratory querying of SPARQL endpoints in space and time. Semantic Web journal.
11. Who are the users?
• Data managers, (geo)information professionals, non-experts (with a little help)
• Who want to understand the content of data for further use/integration
• Inexperienced semantic web/SPARQL users
12. Emergency management use case
• Browser for triplified and enriched Ushahidi data
• Selection of operating hospitals in some area
16. Query to select all the verblijfsobjects with their area in a neighborhood of interest
17. Query to select all the verblijfsobjects with their addresses in a neighborhood
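In textual form such a query might look roughly as follows (a hedged sketch: SPEX builds the pattern visually, and the bag:hoofdadres property name and the polygon coordinates are assumptions for illustration only):

PREFIX bag:  <http://bag.basisregistraties.overheid.nl/def/bag#>   # assumed prefix IRI
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?verblijfsobject ?adres
WHERE {
  ?verblijfsobject a bag:Verblijfsobject ;
                   bag:hoofdadres ?adres ;        # assumed property for the main address
                   geo:asWKT ?punt .
  # neighbourhood of interest given as an inline WKT polygon (placeholder coordinates)
  FILTER ( geof:sfWithin(?punt,
           "POLYGON((4.48 52.15, 4.50 52.15, 4.50 52.17, 4.48 52.17, 4.48 52.15))"^^geo:wktLiteral) )
}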
18. To sum up
• Exploratory Querying – simultaneously learning about the information needed while specifying it
• SPEX is a prototype tool for Exploratory Querying in space and time
• For those who want to know the content of data for further use/integration
19. Future development
• Develop workflow that would embed SPEX
• Named Graph support, k-Nearest Neighbor query, functionality for data extraction
Great big graphs have severe drawbacks for visualizing linked data sets, because what they precisely do not provide are overview, zoom and details-on-demand.
Think about a novice who wants to buy his or her first road bicycle. The problem here is that to a novice all of them look almost identical when searching on the Web. In order to find a suitable one, a novice has to deal with an old dilemma: learning concepts about bikes while at the same time specifying them. In the case of a bike search, a novice learns about different characteristics of a road bicycle (e.g. frame material, number of gears, etc.) while exploring the range of possible variants. When the buyer finally decides on the particular value of a characteristic (e.g. an aluminum frame), he or she is specifying this value.
In contrast to relational databases, linked data is self-descriptive. This means that linked metadata are just additional triples that are stored together with the other data triples. Therefore, it allows doing both specification and learning about concepts at the same time. For instance, the linked data representation of BGT data (Basisregistratie Grootschalige Topografie) consists of more than 180 different classes and numerous data triples. The question is how a user can find data of interest, for instance about his or her property. This can be done in a closed iterative loop of classify and instantiate queries (Scheider et al., 2015). The former searches for classes and relations, thus facilitating learning of the BGT concepts, while the latter helps to explore particular instances of the identified classes and relations. The results of a classify query feed into an instantiation query, which in turn provides information that feeds back into a new classify query, and so on. Thus, data exploration involves querying, and querying in turn cannot be done without exploration. This approach is called exploratory querying in (Kadlag et al., 2004) and, more generally, exploratory search in (Marchionini, 2006; White & Roth, 2009).
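In SPARQL terms, the classify/instantiate loop can be sketched roughly as follows (an illustrative, non-authoritative rendering; the IRIs in the second query are placeholders standing in for whatever class and property the classify step returned, not real BGT terms):

# Classify query (sketch): which classes and properties occur in the dataset?
SELECT DISTINCT ?class ?property
WHERE {
  ?s a ?class ;
     ?property ?o .
}

# Instantiate query (sketch): explore instances of a class and property
# picked from the classify result; the IRIs below are placeholders.
SELECT ?instance ?value
WHERE {
  ?instance a <http://example.org/ChosenClass> ;
            <http://example.org/chosenProperty> ?value .
}
LIMIT 50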
Of these, RelFinder and gFacet directly work on SPARQL endpoints and can be used without a priori knowledge about content, based on autosuggestion and feedback. RelFinder is restricted to instance-based queries without variables. gFacet can already be considered a visual query tool, as it follows a visual graph-pattern strategy very similar to the design principles proposed in this paper (compare Section 4.2).
Of these, iSPARQL, SPARQLViz, Visualbox and Sgvizler require substantial a priori knowledge about SPARQL or the contained vocabularies for building a meaningful query. LodLive, ViziQuer, NITELIGHT, QueryVOWL and SparqlFilterFlow, in contrast, have a form of overview and suggestion tool for available vocabularies and datasets in an endpoint, as well as substantial feedback and support in building queries. ViziQuer and NITELIGHT, however, seem to be made primarily for tech users and focus less on data exploration. An interesting interactive query tool suitable for inexperienced users, which gives feedback on data satisfying a query in terms of a “flow” chart, is SparqlFilterFlow.
In its most basic definition, geoinformation consists of three components: space, time and theme. These roughly correspond to where, when and what.
Instead, a mix of graphical options which fit these tasks and the corresponding data types is more adequate.
Map-like query interfaces can play a key role in retrieval tasks and are easily adopted by users.
Visual tool for construction of graph patterns
A result set pane, a query pane, a map and a time slider.
A visual query system needs to support query formulation by letting users select visual data representation elements and manipulate them. Each possible manipulation needs to translate into a syntactical operation in the formal query language. Even though the choice of a visual query interface depends on the query language (i.e., SPARQL), developments in the Semantic Web illustrate a wide variety of query construction approaches.
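For instance (a hedged sketch, not SPEX's actual internals), dragging a Woonplaats node onto the canvas and connecting it to an already-present Pand node could correspond to adding one triple pattern to the query under construction:

PREFIX bag: <http://bag.basisregistraties.overheid.nl/def/bag#>   # assumed prefix IRI, as above

# Query before the manipulation (sketch)
SELECT ?pand WHERE {
  ?pand a bag:Pand .
}

# Query after connecting a Woonplaats node; the linking property name is an
# assumption for illustration, not necessarily part of the BAG vocabulary.
SELECT ?pand ?woonplaats WHERE {
  ?pand a bag:Pand ;
        bag:ligtInWoonplaats ?woonplaats .
  ?woonplaats a bag:Woonplaats .
}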