Going for GOLD – Adventures in Open Linked Metadata

James Reid, Business Development and Projects Manager, EDINA, University of Edinburgh; Will Waites, University of Edinburgh; Ben Butchart, Senior Software Engineer, EDINA, University of Edinburgh

AGI GeoCommunity '11: Placing Ourselves in the New Economy
Introduction

EDINA (edina.ac.uk), as a JISC (www.jisc.ac.uk) national data centre, makes available, inter alia, many geographical datasets and services for use by the UK academic community. To support discovery of these resources it operates metadata catalogues that one can search and browse to find specific items of interest. These metadata are typically made available both in HTML form for human consumption and in a machine readable form, often as ISO XML for compliance with international obligations.

Catalogue systems which publish geospatial metadata (that is, information held in structured records that describe the characteristics of data and services), and which provide interfaces for their query and maintenance, have been promoted by the Open Geospatial Consortium (OGC, www.opengeospatial.org) for some time. The Catalogue Services (CS) specification, which provides just such a standard method of defining, querying and organising stores of geospatial metadata (Nebert et al. 2007), is the de facto standard for discovering information about geospatial data and services in a standards compliant fashion. The CS specification defines an HTTP protocol binding named Catalogue Services for the Web (CSW) which underpins resource discovery across and within Spatial Data Infrastructures (SDIs) such as the UK Location Information Infrastructure. The UK academic sector has its own thematic SDI and already has its own geospatial discovery service, GoGeo (www.gogeo.ac.uk).
Independent of this, there is a growing trend to make such metadata catalogues available in a uniform way using techniques from Linked Data (linkeddata.org) and the semantic web. Reasons for this include the ability to correlate metadata across different catalogue implementations that may use very different internal representations, to facilitate linking and annotation of the metadata by third parties, and to provide a basic level of referential infrastructure on top of which more difficult questions of provenance and quality can be addressed.
Context

An example of the broader relevance of CSW and geospatial metadata for discovery purposes is the recommendation issued in the context of INSPIRE by the INSPIRE Network Services Drafting Team (2008) to SDIs in the European Union to derive the base functionality of discovery services from the ISO profile of CSW. (INSPIRE defines the legislative framework for the establishment of a pan-European geospatial resource infrastructure; one of its key aims is to improve the 'discoverability' of geospatial resources by publication of resource metadata.) However, CSW is, arguably, not ideally suited for the modern Web, where search engines are the users' default gateway to information: several studies of research behaviour in the UK HFE sector support the view that 'googling' is a default reflex when seeking out resources, and as noted in 'Information behaviour of the researcher of the future' (UCL, 2008), "they [resource providers] need to make their sites more highly visible in cyberspace by opening them up to search engines". The reasons why CSW might be regarded as sub-optimal from a purely 'discoverability' perspective are (adapted after Lopez-Pellicer et al. 2010):
- Search engines are poorly optimised to index Deep Web databases. The term 'Deep Web' refers to database content that is effectively hidden from search engines behind Web forms and back-office applications. Surfacing Deep Web content is a research problem that has concerned the search engine community since its description by Bergman (2001). From this perspective, the underlying content of SDI metadata repositories is opaque and hidden behind catalogue application interfaces; as such, SDI metadata forms part of the Deep Web. Hence, 'findability' via search engines depends on the success of crawling processes that require the analysis of the Web interface, and then the automatic generation of appropriate queries.
- Applications are increasingly becoming Linked Data friendly. The Linked Data community, which has grown significantly over the last few years, promotes a Web of data based on the architectural principles of the Web (Bizer et al., 2008). Linked Data is less a technology than a set of best practices for publishing, sharing and connecting data and information using Uniform Resource Identifiers (URIs) that are resolved to Resource Description Framework (RDF) documents (http://www.w3.org/RDF/). RDF is a W3C recommendation for modeling and exchanging metadata (Miller et al., 2004). As an aside, the original Geography Markup Language (GML) model was based upon RDF, and vestiges of this ancestry are still evident today. In the UK, the Cabinet Office has published guidance on the design of URIs ("Designing URI Sets for the UK Public Sector", http://www.cabinetoffice.gov.uk/sites/default/files/resources/designing-URI-sets-uk-public-sector.pdf), which forms the basis for the UK Location Programme's approach to publishing Linked Data (see below, and e.g. http://location.defra.gov.uk/wp-content/uploads/2011/03/INSPIRE-UK-Location-and-Linked-Data.pdf).
- The evolution of metadata vocabularies towards RDF based models. Well known metadata vocabularies have evolved to models based on RDF, with an emphasis on the linking of metadata descriptions. The abstract data models of the Dublin Core Metadata Initiative (DCMI, http://dublincore.org/documents/dcmi-terms/) and the Open Archives Initiative (OAI, http://www.openarchives.org/) have evolved alongside development of the RDF data model. This has resulted in abstract models based on the RDF data model (Nilsson et al. 2008; Lagoze et al. 2008) which emphasise the use (and reuse) of entities rather than plain literals as the values of properties. This evolution ultimately enables the effective hyperlinking of metadata and traversal queries using query languages and protocols such as SPARQL (Seaborne et al., 2008), of the kind sketched below.
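For illustration only, a traversal query of the kind this linking enables might look as follows, using the DCat, DC and FOAF terms adopted later in this paper (prefix declarations omitted, as elsewhere in this paper; the data and endpoint are hypothetical):

# follow links from catalogue records to the datasets they describe,
# and on to the agents responsible for those datasets
SELECT ?dataset ?title ?name
WHERE {
  ?record  a              dcat:CatalogRecord ;
           dcat:dataset   ?dataset .
  ?dataset dc:title       ?title ;
           dc:contributor ?agent .
  ?agent   foaf:name      ?name .
}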
Whilst CSW are undoubtedly useful to enable discovery of, and provide access to, geographic information resources within the geographic community (Nogueras et al., 2005), and indeed are essential to the development of regional, national and global SDIs, they are nevertheless disjoint with the operational model of Deep Web crawlers. Popular search engines have developed several techniques to extract information from Deep Web databases without previous knowledge of their interfaces - Lopez-Pellicer et al. (2010) note that:

"The operational model for Web crawlers described in Raghavan (2001), based on (1) form analysis, (2) query generation and (3) response analysis is widely accepted. It models queries as functions with n named inputs X1..Xn, where the challenge is to discover the possible values of these named inputs that return most of the content of the database. This approach is suitable for CSW HTTP GET requests. However, the constraints are encoded in a single named input as a CQL string (see Nebert et al. 2007), or an XML Filter (Vretanos, 2004). This characteristic is incompatible with the query model of the Deep Web crawlers. Researchers working for search engines, such as Google (see Madhavan et al. 2008), discourage the alternative operational model that is based on the development of ad-hoc connectors as non-sustainable in production environments."
The upshot of this is that geospatial metadata is not as 'open' as it potentially could be, because of the formalised constraints imposed by the (pre-)existing geospatial metadata query standards. For example, the GoGeo CSW metadata repository effectively resides behind an opaque interface from the point of view of other (non-geospatial) communities (GoGeo also supports a Z39.50 target interface, but this is not heavily used). The CSW interface does not define a simple Web API to query and retrieve metadata, yet some communities that could potentially use CSW are accustomed to simple APIs and common formats for purposes of 'mash-up'. For example, many geo-mashups and related data services (see Turner, 2006) use Web APIs to access and share data built following the REST architectural style (Fielding, 2000). These APIs are characterised by the identification of resources by opaque URIs, semantic descriptions of resources, stateless and cacheable communication, and a uniform interface based on the verbs of the HTTP protocol, a style at odds with that adopted by CSW.
Approach

The work described below addressed the perceived shortcomings outlined above by producing Linked Data from extant ISO19115/19139 records. To do so we evaluated a number of alternative strategies for producing the RDF:

1. Metadata crosswalking. There are several geographic metadata crosswalks to the Dublin Core vocabulary, which may be viewed as the lowest common denominator metadata baseline. We adopted the use of a Dublin Core crosswalk to implement uniform mappings from geographic metadata schemas to the RDF data model. This approach consists of three steps:
   - Apply a metadata crosswalk from the original metadata schema (ISO) to the Dublin Core vocabulary.
   - Add additional metadata such as provenance of the record, original information model or crosswalk identification.
   - Apply the profile for expressing the metadata terms as RDF.

2. An alternative approach was to publish direct from the relational data store underpinning the metadata resource, direct to RDF. An extension to this approach was to produce the RDF direct from the Unified Modelling Language (UML) representations of the underlying schemas using a visual modeling approach.
Results
For the purposes of the project, we worked with two types of metadata catalogues:
1. Catalogue Services for the Web (CSW) services, important particularly because they are required for the implementation of the EU INSPIRE directive. We harvested, amongst others, the Scottish Location Information Discovery Service CSW for this purpose.

2. Customised catalogues that themselves aggregate data from various sources, as exemplified by GoGeo, and are typically implemented on top of a relational database schema.

These two separate catalogue implementations lend themselves conveniently to the two alternative strategies for RDF production. In the case of CSW the crosswalk approach is the obvious solution, whilst for database based schemas, schema mapping and UML derivation approaches seemed more appropriate.
For the CSW "metadata crosswalk" approach, we applied XSLT transforms. These are generally appropriate where data is available in a predictable, standard form, and are ideally suited to circumstances where administrative or other elevated network privileges are not available or practicable - for example, where the data is available via an HTTP API but connections to the underlying (SQL) database are not possible, i.e. the CSW is the proxy interface to the Deep Web. For the purposes of establishing a Linked Data production flow-line, the metadata are harvested and stored in an intermediate database (triplestore), and it is against this intermediate database that queries are performed and representations (RDF and others) are published.
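Once harvested into the triplestore, discovery reduces to ordinary SPARQL queries over the vocabularies described in the following sections. For example, a query of roughly this shape (prefix declarations omitted as elsewhere in this paper; the keyword value is illustrative) would list datasets by keyword, most recently modified first:

# list datasets carrying a given keyword, newest first
SELECT ?dataset ?title ?modified
WHERE {
  ?dataset a            dcat:Dataset ;
           dc:title     ?title ;
           dc:modified  ?modified ;
           dcat:keyword "crop circles" .
}
ORDER BY DESC(?modified)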
A Note on URI Governance

Implicit in any approach that mints URIs are certain assumptions: firstly, that a URI can be clearly and unambiguously defined and assigned to identify particular resources; secondly, that URIs are stable and persist. Axiomatically, the essence of a persistent URI is its immutability. Permanence implies long term commitment and ownership, which presupposes some established mechanism for governance, i.e. some authority has to set the rules and own the URIs which are used to identify things. The aforementioned Cabinet Office paper, "Designing URI Sets for the UK Public Sector", endeavoured to address this issue, but its adoption and adaptation to location (geospatial) information in "Designing URI Sets for Location" (http://location.defra.gov.uk/wp-content/uploads/2010/04/Designing_URI_Sets_for_Location-Ver0.5.pdf) highlighted an issue of political sensitivity - specifically, the fact that in its original presentation the guidance neglected to allow for nationally determined and nationally specific URI schemes. At the time of writing, the guidance is being recast to allow for Scottish, Welsh and Northern Irish URI naming schemes to be adopted if required by the devolved administrations. (The Scottish Government is adopting a http://{resource}.data.scotland.gov.uk pattern, where {resource} maps to the United Nations Classification of Functions of Government (COFOG) headings, using a cofog01, cofog02 approach rather than descriptive labels such as "general public services" or "defence". To improve the overall quality of this approach, work is being undertaken with the UN and Rensselaer (http://rpi.edu/) to produce a SKOS RDF version of the COFOG.)
A Note on Target Representation

Throughout, in the examples given below, Turtle notation is used (http://www.w3.org/TeamSubmission/turtle/); except where confusion or ambiguity might arise, prefix declarations are generally omitted and well known or standard prefixes are used.

As far as possible we worked with well known and widely used vocabularies (sets of agreed terms). The core vocabulary is Dublin Core Terms, hereafter simply referred to as DC. (We have studiously avoided the use of the legacy Dublin Core Elements namespace; for this reason, where the prefix dc: appears it is taken to refer to the Dublin Core Terms namespace.) Unfortunately, DC does not contain a notion of a collection, except indirectly as an underspecified range for dc:isPartOf, and does not contain any particular way to refer to a representation of an abstract dataset, such as a downloadable resource like a CSV or Shape file. It also contains no particular mechanism either to talk about people and organisations who might be authors or maintainers of datasets, or to represent geographical or temporal extent. For all else, bar the question of geographical extent, there are, fortunately, solutions that are more or less well established. For describing data catalogues there is a vocabulary specifically designed to augment DC, called the Data Catalog Vocabulary or DCat. Of particular interest are the concepts of dcat:Catalog, dcat:CatalogRecord and dcat:Distribution. Also used is dcat:Dataset, which is simply an alias for dc:Dataset.
Further predicates are used for expressing other metadata such as keywords and spatio-temporal granularity. For referring to people, either natural or corporate, common practice is to use the Friend-of-a-Friend or FOAF vocabulary (http://xmlns.com/foaf/spec/). Where the precise nature of the entity in question is unclear, foaf:Agent (a subclass of dc:Agent) was used. Where it is clear that a natural person is being referred to, the more specific foaf:Person was used, and if it was an organisation we used foaf:Organisation. For more involved descriptions of people and organisations and their relationships, the Organisation Ontology may be used; this vocabulary provides facilities for describing, for example, the role that a particular person may fill in an organisation.

Before addressing questions of how to represent spatial data (which are far from settled), consider the fictitious example catalogue in Figure 1.
@prefix ex: <http://example.org/> .

ex:catalogue a dcat:Catalog;
    dc:description "An example catalogue";
    dcat:keyword "Examples";
    dc:record ex:rec1, ex:rec2, ex:rec3.

ex:org a foaf:Organisation;
    foaf:name "Fictitious Inc.";
    foaf:mbox <mailto:someone@example.org>.

ex:sa a foaf:Organisation;
    foaf:name "Space Agency of Somewheria".

ex:rec1 a dcat:CatalogRecord;
    dc:modified "2011-07-01"^^xsd:date;
    dc:maintainer ex:org;
    dcat:dataset [
        a dcat:Dataset;
        dc:identifier "ABCD-EF12-3456-7890-DCBA";
        dc:modified "1984-03-23"^^xsd:date;
        dc:contributor ex:sa;
        dc:title "Pictures of crop circles from space";
        dcat:distribution [
            a dcat:Distribution;
            dc:description "Shape file download of crop circles";
            dcat:accessURL <http://download.example.org/crops.shp>
        ], [
            a dcat:Distribution
        ]
    ].

ex:rec2 a dcat:CatalogRecord;
    ...

ex:rec3 a dcat:CatalogRecord;
    ...

Figure 1. An example catalogue
This example is not intended to be a complete representation of all the metadata that may be contained in such a catalogue, but is intended to give a clearer idea of the structure. The distinction between the description of a catalogue record and the description of the dataset itself may seem pedantic but is in fact quite important. Frequently, the metadata record may be changed or updated in isolation, even when no change to the dataset itself has been made. The dataset may be maintained by one person or organisation, and this person may have no influence whatsoever over the metadata catalogue. Separating out the concepts is important in order to be able to express this difference. This separation is well known and is already expressed through existing geospatial metadata fields. It is worth noting the lack of an explicit URI to identify the dataset itself. This is for notational convenience in the above example. In practice, where a dataset has no natural dereferenceable URI, it is given a non-resolvable one in the urn:uuid namespace (where it has an identifier that is a conformant UUID). This non-resolvable URI is essentially a constant that is used to refer to the dataset in third-party documents. However, in the current project we were concerned with publishing catalogue records and not dataset descriptions as such, and for expediency (elaborated on in the section on the use of named graphs in the metadata crosswalk section below) it was more straightforward to adopt this approach - although it is not a necessary feature of the representation.
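As a sketch of that convention: if the dataset in Figure 1 carried a conformant UUID as its identifier (the one below is invented for illustration), the record could refer to it as follows.

ex:rec1 a dcat:CatalogRecord;
    dcat:dataset <urn:uuid:1b4e28ba-2fa1-11d2-883f-0016d3cca427>.

# the non-resolvable URI acts as a stable constant for third parties
<urn:uuid:1b4e28ba-2fa1-11d2-883f-0016d3cca427> a dcat:Dataset;
    dc:title "Pictures of crop circles from space".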
Geographical Metadata

When we talk about geographical metadata we typically mean an expression of the coverage (spatial extent) of a particular dataset. Most commonly this will be a (minimal) bounding box, but it may in principle be a shape of arbitrary complexity. Whilst there is well established practice for representing point data in RDF, there does not appear to be any such consensus when it comes to representing even a shape as simple as a rectangle (where by "rectangle" is meant the shape described by geodesics in a standard coordinate system such as WGS84, implied by the coordinates of two diagonally opposite points - a minimum bounding rectangle or MBR). What is reasonably certain is that the dc:spatial predicate should be used to link from the dataset to its spatial extent. However, the range of this predicate is simply dc:Location, and is not further specified other than that such an entity must describe "a spatial region or named place".
One approach, that taken by NeoGeo (http://geovocab.org/doc/neogeo.html), is to attempt to create a completely granular "native" RDF description of the geometry. This is not incorrect by any means, but was regarded as inappropriate for several reasons. Firstly, there is no support in any current software for presenting such data to the user as a map rendering. Secondly, correspondence with the authors of NeoGeo suggests that they are primarily interested in publishing such geometries as first class entities in themselves, whereas we assume that the geometries in our application are unlikely to be of primary interest outside of the context of the datasets which they annotate. Lastly, with an eventual eye to publishing these metadata using an RDF database that is aware of geospatial datatypes, the complexity involved in creating indexes for data expressed in this way can be considerable.

The Ordnance Survey, on the other hand, has opted to encode the geographical component of the data which it publishes simply as fragments of Geography Markup Language (GML) stored as literals (Goodwin et al. 2009). This works well in that it is easy to index, and many more tools exist that understand GML natively - at least relative to the "native" RDF approach.
A third approach is that presented in the GeoSPARQL W3C member submission (see http://geosparql.org/), which allows for both styles - though the specification document conflates vocabulary with query language extensions, which could potentially be better presented as two separate documents.

In our implementation we opted to construct literals using Well Known Text (WKT; see http://www.geoapi.org/3.0/javadoc/org/opengis/referencing/doc-files/WKT.html) and annotate them provisionally with terms from the GeoSPARQL namespace, in the hope that the useful parts of this vocabulary will be incorporated at a later date into a more widely used standard. That is, as the standards are still undergoing comment and revision cycles, it is too early to say which approach represents the 'best' for future proofing purposes, and consequently we have applied a 'best guess' logic. Considering the example in Figure 2: though we have used WKT as the lowest common denominator representation of geospatial data, there is room alongside for other equivalent representations, e.g. in GML as the Ordnance Survey does, or indeed in expressly materialised granular RDF as with NeoGeo. Yet it also retains the properties of being easy to display to users using common tools and easy to index when importing into an RDF database.
[
  a dcat:Dataset;
  dc:spatial [
    a geo:Geometry;
    geo:asWKT """<http://www.opengis.net/def/crs/ogc/1.3/CRS84>
      POLYGON((12.9709 52.3005, 12.9709 52.6954,
               13.8109 52.6954, 13.8109 52.3005,
               12.9709 52.3005))"""^^geo:WKTLiteral
  ]
].

Figure 2. Geographical metadata example.
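To make the point about parallel representations concrete, the same extent could in principle carry a GML fragment alongside the WKT, in the style used by the Ordnance Survey. This is a sketch only: the geo:asGML property and geo:GMLLiteral datatype are assumed here by analogy with geo:asWKT, and the exact names in the evolving GeoSPARQL drafts may differ.

[
  a dcat:Dataset;
  dc:spatial [
    a geo:Geometry;
    # the WKT literal as in Figure 2
    geo:asWKT """<http://www.opengis.net/def/crs/ogc/1.3/CRS84>
      POLYGON((12.9709 52.3005, 12.9709 52.6954,
               13.8109 52.6954, 13.8109 52.3005,
               12.9709 52.3005))"""^^geo:WKTLiteral;
    # an equivalent GML envelope carried as a literal
    geo:asGML """<gml:Envelope srsName="urn:ogc:def:crs:EPSG::4326">
        <gml:lowerCorner>52.3005 12.9709</gml:lowerCorner>
        <gml:upperCorner>52.6954 13.8109</gml:upperCorner>
      </gml:Envelope>"""^^geo:GMLLiteral
  ]
].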
Metadata Crosswalk
In some sense any transformation technique could be called a “metadata crosswalk”. The specific
meaning here is a transformation that is done on a particular document (in this case a catalogue
record) rather than at the level of a database or collection of such documents. The approach turns
on a 'harvesting' mechanism similar to that used more generally across the European spatial data
infrastructure and elsewhere. Algorithm 1 is executed periodically against a data source.
procedure Harvest(source, start)
    for xml in source modified since start do
        rdf ← Transform(xml)
        Store(rdf)
    end for
end procedure

Algorithm 1. Harvesting algorithm
Retrieved documents are transformed using an XSLT transform and are then stored in an RDF database (in our case 4Store, http://4store.org/). For this project a specialised CSW client was written; this implements the query and actual fetching of ISO19139 documents from a server such as Geonetwork (http://geonetwork-opensource.org/), which is used in national and thematic catalogues in the UK.
The storage step is also important in that it must extract from the intermediary RDF data a suitable graph name: a URI that will be used both to identify a catalogue record and to group the statements that belong in this record. It is clear from Figure 1 that, because a catalogue record is "several levels deep", it is not sufficient to just, say, consider all statements with a given subject in order to obtain a complete catalogue record. In order to save on the expense of complex queries, the data relating to a particular record is therefore grouped together into a named graph during harvesting. This also simplifies updating records that have changed, since it is merely necessary to replace the entire graph rather than do complex queries to determine exactly which statements need to be retracted before any new ones are added.
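In SPARQL 1.1 Update terms (still a working draft at the time of writing), refreshing a changed record then reduces to replacing its graph wholesale. A sketch, with a hypothetical graph URI, prefix declarations omitted as elsewhere in this paper, and the statements abbreviated from Figure 1:

# discard the old copy of the record, if any
DROP SILENT GRAPH <http://example.org/record/ABCD-EF12-3456-7890-DCBA> ;

# store the freshly harvested statements under the same graph name
INSERT DATA {
  GRAPH <http://example.org/record/ABCD-EF12-3456-7890-DCBA> {
    ex:rec1 a dcat:CatalogRecord ;
            dc:modified "2011-07-01"^^xsd:date .
    # ... remaining statements of the re-harvested record ...
  }
}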
The bulk of the logic, however, is done by the transformation step. In this case it uses an XSLT stylesheet (the current version is available at https://bitbucket.org/ww/gold/src/tip/static/csw/md_metadata.xsl) to transform ISO19139 XML data into RDF/XML, which may then be stored. The structure of this stylesheet is straightforward; however, it is perhaps appropriate to give some account of the specific mappings that we made. Salient mappings that we adopted are reproduced in Table 1.
ISO19139                                             RDF
gmd:MD_Metadata                                      dcat:CatalogRecord
gmd:fileIdentifier                                   dc:identifier (also used in construction of the URI)
gmd:contact/gmd:CI_ResponsibleParty                  foaf:Agent
gmd:identificationInfo/gmd:MD_DataIdentification     dcat:Dataset
gmd:identificationInfo/srv:SV_ServiceIdentification  dcat:Distribution
gmd:distributionInfo/gmd:MD_Distribution             dcat:Distribution
gmd:CI_Citation/gmd:title                            dc:title
gmd:CI_Citation/gmd:abstract                         dc:abstract
gmd:MD_Keywords                                      dcat:keyword
gmd:MD_LegalConstraints                              dc:accessRights, dc:rights
gmd:MD_Constraints                                   dc:rights
gmd:MD_Resolution                                    dcat:spatialGranularity, dcat:temporalGranularity
gmd:EX_GeographicBoundingBox                         dc:spatial (geo:Geometry)
gmd:EX_TemporalExtent                                dc:temporal (time:Interval)

Table 1. Crosswalk mapping of ISO19139 elements to common RDF vocabularies.
It should be noted, however, that this table is greatly simplified: in many cases it makes sense to produce more than one RDF statement for what appears as a single element in the ISO19139, particularly when one considers type assertions and necessary bits of indirection (a good example of this is the temporal extent, where the expression of a time interval in RDF can be quite verbose), and vice-versa: it takes many elements (at least eight) in the ISO19139 to express a geographical bounding box but, because we opted for the simpler representation, the RDF requires only three.
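To illustrate the temporal case: a single gmd:EX_TemporalExtent element crosswalks to something along these lines, a sketch using the OWL-Time vocabulary for the time: terms (the dates are invented):

[
  a dcat:Dataset;
  dc:temporal [
    a time:Interval;
    time:hasBeginning [ a time:Instant;
                        time:inXSDDateTime "1984-01-01T00:00:00Z"^^xsd:dateTime ];
    time:hasEnd       [ a time:Instant;
                        time:inXSDDateTime "1984-12-31T23:59:59Z"^^xsd:dateTime ]
  ]
].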
In our testing we experimented not only with CSW services run and maintained by ourselves but also with those run by others, and we found anomalies in some elements. For example, to distinguish between various types of dates (e.g. publication or modification date), it appears that the controlled vocabulary and localisation support in many catalogues is incomplete. This required a number of special cases in the transform, e.g. "publication" vs. "publicatie" as found in the gmd:dateType/gmd:CI_DateTypeCode/codeListValue elements of Dutch catalogues, which further complicated the transformation, requiring manual intervention where automated harvesting and RDF production proved problematic. Nevertheless, this approach worked surprisingly well and provided us with a generic pipeline for harvesting geospatial metadata from any CSW and producing Linked Data outputs, albeit by making some default choices on the vocabularies used and some simplification of the representation.
Database and UML to RDF Approaches
Database Mapping Approach
For our second approach to producing RDF Linked Data, the database mapping approach, we used the popular D2R software (http://sourceforge.net/projects/d2rq-map/). D2R is a toolset for publishing relational databases on the Semantic Web. It enables RDF and HTML browsers to navigate the content of the database, and allows applications to query the database using a formal query language, SPARQL (SPARQL Protocol and RDF Query Language, http://www.w3.org/TR/rdf-sparql-query/). In this approach, queries are expressed in the SPARQL query language (whether expressly composed or implied by the act of fetching or dereferencing a resource identifier) and are transformed into more traditional SQL queries that are then executed against the underlying database, the results being returned according to the standard practice for SPARQL query results. The major aspect for consideration here is the mapping between the database schema and the RDF constructs expected in the outputs. These mappings can become quite onerous to hand-craft, and we developed an approach to making this less tedious by extending the capabilities of a visual UML tool (we used Enterprise Architect).
Transforming the catalogue to Linked Data on the fly, by translating requests into SQL queries against the relational database in which it is stored and then transforming the results, has some advantages. Though it is slower than querying native RDF storage directly, chiefly because SPARQL queries may entail expensive self-joins when translated into SQL, there is no need to provision a large RDF database server to host what is essentially a cache of the original data. There is also no need to coordinate updates or periodically harvest the source. It may also be, with a well-normalised relational schema and a judiciously chosen set of indexes, that the relational database has very much smaller resource requirements than an equivalent RDF store; this is particularly relevant when there is a significant volume of data involved. Realistically, though, geospatial metadata repositories are seldom of sufficient size to make this a prima facie concern. Data currency is more of a consideration, and in instances where the underlying geospatial metadata store is regularly changing, it may be more appropriate to use a D2R type approach than to try to re-synchronise and lock-step harvested resources. Data custodians need to balance the trade-off between periodic harvesting and change frequency in order to ensure overall data currency. This is of course a generic issue and not limited solely to the type of data publication work explored here.
The D2R server is a single Java program that takes an input file describing the coordinates of the database to be translated and the translation rules. The process for configuring it is relatively simple:

1. Generate a crude configuration file using the generate-mapping program.
2. Refine and customise the mapping (the hard part!).
3. Run the d2r-server.

The result is an HTTP service that supports simple HTML and RDF representations of the resources in the database, and a SPARQL endpoint through which arbitrary queries may be made. For production use, a reverse proxy or embedding into a Java servlet system may be considered desirable.
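Step 2 is where the effort concentrates. For a flavour of the D2RQ mapping language, a minimal sketch follows; the connection details, table and column names and URI pattern are invented for illustration and do not reflect the actual GoGeo schema:

@prefix map:  <#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .

map:database a d2rq:Database ;
    d2rq:jdbcDSN "jdbc:postgresql://localhost/catalogue" ;
    d2rq:jdbcDriver "org.postgresql.Driver" ;
    d2rq:username "metadata" .

# each row of a hypothetical records table becomes a dcat:CatalogRecord
map:Record a d2rq:ClassMap ;
    d2rq:dataStorage map:database ;
    d2rq:uriPattern "record/@@records.file_identifier@@" ;
    d2rq:class dcat:CatalogRecord .

# its title column becomes a dc:title property on the record
map:recordTitle a d2rq:PropertyBridge ;
    d2rq:belongsToClassMap map:Record ;
    d2rq:property dc:title ;
    d2rq:column "records.title" .

At query time the server rewrites SPARQL patterns over dcat:CatalogRecord and dc:title into SQL over the records table, so the RDF view stays live against the database.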
There were, however, two main difficulties encountered with this approach. The first concerns schema stability. When working from a widely published international standard as the source data format (say, ISO), we can be reasonably confident that it will not change in the short term, and therefore any mapping from it to Linked Data will not need to change either; the same is not true for information in a relational database, where internal business rules and ad hoc schemas often prevail. The main use of relational databases is as internal data storage, and as such there are frequently no external constraints on the schema changing. As it stands, if the schema changes for whatever reason, the mapping for the D2R server must, of necessity, be revisited. Whilst this did not happen during the lifetime of this project, it can reasonably be expected to happen from time to time in the future as the services are extended or modified.
The other difficulty is related to the first. A further property of stable, agreed international standards as data sources is that they tend to be relatively well documented. It is much less common to have the same level of documentation for a relational database schema that is intended primarily for internal use. Consequently, the process of refining the mapping configuration for the D2R server is not trivial if the schema is at all complicated or subject to frequent change. The relative merits of the two approaches, crosswalking versus derivation directly from a database, are ultimately bound to issues of available infrastructure, skillsets, data currency and so on. Table 2 provides a summary of the two approaches.
Approach used               | Crosswalk                                | D2R
----------------------------|------------------------------------------|--------------------------------------
Data currency               | Cached                                   | Live
Computing resources         | RAM-bound (for the RDF database)         | CPU- and network-bound
Moving parts /              | Harvesting/update machinery;             | Just the D2R software
administrative complexity   | RDF storage; HTTP/REST service           |
Configuration complexity    | Simple                                   | Complex
Skillsets for customisation | 1. RDF (Turtle, RDF/XML)                 | 1. RDF (Turtle, RDF/XML)
                            | 2. XSLT (for customising the             | 2. Java (customisation of behaviour,
                            |    transformation)                       |    deployment, appearance)
                            | 3. HTML+JS+CSS (for customising          | 3. HTML+JS+CSS (customisation of
                            |    human-readable output)                |    behaviour)
                            | 4. Go language (for customising the      |
                            |    behaviour of this particular          |
                            |    implementation)                       |
Look and feel               | Easy customisation of HTML files         | Somewhat more complex
SPARQL queries              | Faster                                   | Slower

Table 2: Side-by-side comparison of the Crosswalk and D2R based approaches
Production from UML – a UML Profile for D2R scripting
As already noted, the task of republishing data held in a relational database as Linked Data principally involves two main technical challenges:
1. Converting the tables, columns and relationships in the relational model to appropriate RDF constructs (that is, RDF classes and properties)
2. Mapping columns and relationships to appropriate existing Linked Data vocabularies.
This is the approach described above and is in fact a fairly common problem, one which has led to the evolution of tools such as D2R. These tools have emerged to assist dataset providers in mapping relational models to RDF and tend to treat the relational database as a virtual RDF graph, enabling access through APIs such as Sesame30 and Jena31. While the D2R platform greatly facilitated semantic access to data in our geospatial relational database, the creation of the necessary D2R mapping scripts is tedious and error-prone, as no visual authoring tools are currently available.
As many UML authoring tools already have good support for relational database modelling, we extended our UML tool to support simple D2R constructs, thereby automatically generating D2R scripts from within a visual editing environment. Our approach was to define an XML profile with D2R constructs that could augment the existing UML stereotypes used in standard database modelling (<table>, <column> etc.). The data modeller first loads the relational model into the UML tool; most UML tools have a simple import mechanism for loading tables directly from the relational store. The data modeller then imports the 'D2R profile' (our GOLD extensions) to expand their toolbox. These imported D2R tools provide access to constructs such as "ClassMap" and "PropertyBridge" that can then be dragged onto the tables and columns in the UML model to define the mappings. Once the modeller has completed all the mappings, they can export the UML model to XMI32. The output XMI contains all the information that the data modeller specified in the UML modelling tool, including the stereotypes and TaggedValues from the D2R profile. An XSLT stylesheet can then be applied to the XMI file to generate the D2R mapping in RDF format. In the case of Enterprise Architect it was possible to specify the XSLT stylesheet as part of the XMI export, so the export and XSL transform can be combined into one step (a simplified stylesheet sketch follows).
Our experience with this approach suggests that for Linked Data production directly from a relational data store, particularly one whose table structure is more than very simple, visual editing support for mapping production is both more intuitive from the data modeller's perspective and, significantly, less error-prone than conventional hand-crafting approaches. It is also more adaptable and robust to underlying schema changes, and more flexible where multiple target output schemas are required. Our D2R profile extensions are available on request.
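The stylesheet itself only needs to walk the exported XMI, pick out the elements carrying the profile's stereotypes and tagged values, and emit the corresponding D2R statements as text. A heavily simplified sketch follows; the XMI element and attribute names are hypothetical, as the exact structure varies by UML tool and version.

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>

      <!-- Emit a d2rq:ClassMap for every table element carrying the
           ClassMap stereotype from the D2R profile -->
      <xsl:template match="element[@stereotype='ClassMap']">
        <xsl:text>map:</xsl:text>
        <xsl:value-of select="@name"/>
        <xsl:text> a d2rq:ClassMap ;&#10;  d2rq:uriPattern "</xsl:text>
        <xsl:value-of select="tag[@name='uriPattern']/@value"/>
        <xsl:text>" .&#10;&#10;</xsl:text>
      </xsl:template>

      <!-- Suppress all other text content in the XMI -->
      <xsl:template match="text()"/>
    </xsl:stylesheet>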
30 http://www.w3.org/2001/sw/wiki/Sesame
31 http://jena.sourceforge.net/
32 The XML Metadata Interchange (XMI) is an Object Management Group (OMG) standard for exchanging metadata information.
Conclusion
In endeavouring to expose the Deep Web, we have taken a Linked Data approach and published geospatial metadata as RDF. We have explored alternative options for RDF generation, from crosswalking through well-known vocabularies such as Dublin Core to RDF generation directly from a relational database. In either case there is an expectation that the outputs will be consistent, yet the vagaries of geospatial data (quality aspects) mean that establishing which approach is more flexible or robust in any particular application instance is necessarily subject to trial and error. We have established a basic workflow and infrastructure which support CSW harvesting (from any CSW) and automated Linked Data publication of that geospatial metadata by adopting well-known and frequently used vocabularies, e.g. FOAF and DCAT. An open question remains as to whether or not Linked Data is the panacea for resource discovery and reuse that its proponents assert. A significant issue still to overcome is the establishment of core, common vocabularies, particularly with respect to the alternate and competing approaches to the representation of geometry information.