This document proposes a system to give researchers credit for depositing their data by allowing them to easily submit a "data paper" about their deposited data set. It involves developing a helper application that would integrate with data repositories and publishers to streamline the process of depositing data and submitting the associated data paper. The proposal outlines three phases: requirements gathering and design, development and trial deployment, and expansion and sustainability. The goal is to incentivize data sharing by providing researchers a publication and citation opportunity for describing and taking credit for their datasets.
Using Neo4j for exploring the research graph connections made by RD-Switchboard (amiraryani)
In this talk, Jingbo Wang (NCI) and Amir Aryani (ANDS) presented Neo4j queries that help data managers explore the connections between datasets, researchers, grants, and publications using the graph model and the Research Data Switchboard. They also discussed the paper "Graph connections made by RD-Switchboard using NCI’s metadata", presented at the Reproducible Open Science workshop in Hannover, September 2016.
RankBrain represents a new way of measuring relevance, built on teaching machines to understand relationships between words. How should RankBrain change our approach to SEO and specifically to keyword research? Will we need to fight machine learning with machine learning?
This document discusses converting metadata to linked open data. It provides an overview of the process of mapping metadata fields and their values to URIs and standardized vocabularies. This involves selecting existing terms where possible, cleaning up field values, and manually mapping values that don't match existing terms. It also discusses tools for working with linked data and principles for publishing open data online.
Open Harvester - Search publications for a researcher from CrossRef, PubMed a... (Muhammad Javed)
A Java prototype that processes a result set of pre-downloaded data (from a database) and allows researchers to claim their publications from a ranked list.
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni... (Stuart Chalk)
Scientists are looking for ways to leverage Web 2.0 technologies in the research laboratory, and as a consequence a number of approaches to web-based electronic notebooks are being evaluated. In this presentation I discuss the Eureka Research Workbench, an electronic laboratory notebook built on semantic technology and XML. Using this approach, the context of the information recorded in the laboratory can be captured and searched along with the data itself. A discussion of the current system is presented, along with the next planned development of the framework and long-term plans relative to linked open data. Presented at the 246th American Chemical Society Meeting in Indianapolis, IN, USA on September 12th, 2013.
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ... (Stuart Chalk)
An electronic laboratory notebook (ELN) can be characterized as a system that allows scientists to capture the data and resources used in performing scientific experiments. This allows users to easily organize and find their data; however, little information about the scientific process is recorded.
In this paper we highlight the current status of progress toward semantic representation of science in ELNs.
This document discusses text and data mining (TDM) and provides definitions from 1982, 1999, and 2008 that describe mining as automatically generating logical representations of text passages, the (semi)automated discovery of trends and patterns across large datasets, and the use of automated methods to exploit knowledge in biomedical literature. It also lists different types of content that can be mined, such as images, graphs, tables, datasets, and text, and provides 101 potential uses for content mining, such as finding papers about chemistry in German or papers acknowledging support from the Wellcome Trust.
Svante Schubert presented on metadata and the new metadata model for OpenDocument Format 1.2. The new model addresses limitations of the current ODF metadata by making it more extensible and descriptive. It uses RDF and OWL to annotate content in a common way, aligning with semantic web standards. Metadata is stored in RDF files and linked to content elements via IDs. This allows software to more easily find, combine and share information. OpenOffice.org 3 will provide APIs to access and extend the new metadata capabilities.
The document discusses OpenURL activity data collected by the OpenURL Router. It describes what the data includes, such as anonymized IP addresses and metadata about journal articles accessed. The goals of the project are to make this activity data openly available, develop prototype services using the data, and potentially aggregate data from other institutions to analyze usage on a broader scale. Key considerations for aggregating data include legal issues regarding personal data, technical challenges in standardizing data extraction, and the financial costs of ongoing data sharing and maintenance.
This document summarizes basic search techniques for navigating electronic information sources. It discusses searching by keywords and phrases, truncation to find different word forms, proximity searches to specify distance between words within sentences or paragraphs, Boolean searches using AND, OR and NOT operators, limiting searches by date or file type, and field searching within specific fields like titles or URLs. The techniques described allow researchers to efficiently search and retrieve relevant electronic documents.
Presentation on the use of the Eureka Research Workbench to store data and scientific workflow information. Presented online as part of the Dial-a-molecule 'Liberating Laboratory Data' event (http://www.dial-a-molecule.org/wp/events-listing/liberating-laboratory-data/)
A Generic Scientific Data Model and Ontology for Representation of Chemical Data (Stuart Chalk)
The current movement toward openness and sharing of data is likely to have a profound effect on the speed of scientific research and the complexity of questions we can answer. However, a fundamental problem with currently available datasets (and their metadata) is heterogeneity in terms of implementation, organization, and representation.
To address this issue we have developed a generic scientific data model (SDM) to organize and annotate raw and processed data, and the associated metadata. This paper will present the current status of the SDM, its implementation in JSON-LD, and the associated scientific data model ontology (SDMO). Example usage of the SDM to store data from a variety of sources will be discussed, along with future plans for the work.
Annotopia open annotation services platform (Tim Clark)
Annotopia is an open-access, open-source, open annotation services platform developed for scientific annotation of documents and datasets on the web using the W3C Open Annotation model (http://www.openannotation.org/spec/core/).
Using Annotopia, virtually any client application including lightweight web clients, can create, selectively share, and access annotation of web documents and data. This can be done regardless of the ownership of the base objects being annotated.
Annotopia supports unstructured, semi-structured, and fully structured (semantic) annotation; manual and automated (text mining) annotation; and permissions, groups, and sharing. It also provides access to specialized vocabulary and text analytics services.
Annotopia is an open source platform licensed under Apache 2.0.
The document discusses stacks and queues, which are linear data structures that maintain order. Stacks follow LIFO (last in, first out) order, where new elements are added to the top and the top element is removed first. Queues follow FIFO (first in, first out) order, where new elements are added to the rear and elements are removed from the front. The document compares stacks and queues, noting that stacks are used for calculations and function calls while queues are used for character buffers and print queues.
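To make the contrast concrete, here is a minimal Python sketch (not from the document) of both disciplines:

```python
from collections import deque

# Stack: LIFO (last in, first out). Push and pop happen at the same
# end, the "top"; a plain Python list supports this directly.
stack = []
stack.append("first")   # push
stack.append("second")  # push
print(stack.pop())      # prints "second": the most recent element leaves first

# Queue: FIFO (first in, first out). Enqueue at the rear, dequeue from
# the front; deque gives O(1) operations at both ends, unlike list.pop(0).
queue = deque()
queue.append("first")   # enqueue at the rear
queue.append("second")  # enqueue at the rear
print(queue.popleft())  # prints "first": the earliest element leaves first
```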
This document discusses challenges with the current scientific publishing system and proposes a vision for next generation scientific publishing (NGSP). Some key problems include retractions due to misconduct, lack of reproducibility, and non-reusable data and methods. NGSP would feature transparent and computable data and methods, open annotation of narratives and objects, and no restrictions on text mining or remixing. It would move information more quickly and allow verification through an open, service-oriented system without walled gardens. Taking NGSP forward will require collaboration across stakeholders in research communications.
Scientific Units in the Electronic Age (Stuart Chalk)
Scientists have standardized on the SI unit system since the late 1700s. While much work has been done over the years to refine and redefine the system, little has formally been done to standardize the representation of SI units in electronic systems.
This paper will present a summary of current efforts toward electronic representation of scientific units in text, XML, and RDF, an analysis of needs for current computer/network systems, and an outline of future work.
The document discusses making data FAIR (Findable, Accessible, Interoperable, and Reusable) through a novel combination of web technologies. It describes the core FAIR principles for each component - findable, accessible, interoperable, and reusable. It then discusses how applying these principles through an "internet-inspired" approach using existing standards and protocols could help make large, heterogeneous and complex data more actionable for various applications and users. The presentation provides examples of how this could work through a layered architecture similar to the internet, with shared standards and specifications at each layer.
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using... (Stuart Chalk)
Recently, the US government has mandated that publicly funded scientific research data be made freely available in a usable form, allowing integration of the data into other systems. While this mandate has been articulated, existing publications and new papers (PDF) still do not provide accessible data, meaning their usefulness is limited without human intervention.
This presentation outlines our efforts to extract scientific data from PDF files, using the PDFToText software and regular expressions (regex), and process it into a form that structures the data and its context (metadata). Extracted data is processed (cleaned, normalized), organized, and inserted into a contextually developed MySQL database. The data and metadata can then be output using a generic JSON-LD based scientific data model (SDM) under development in our laboratory.
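The paper's actual extraction rules are not reproduced here. As a rough sketch of the pipeline's shape, assuming the Poppler pdftotext command-line tool, placeholder filenames, and a hypothetical regex for one property, the first two stages might look like:

```python
import re
import subprocess

# Convert a PDF to plain text with the pdftotext command-line tool
# (part of Poppler/Xpdf); "-layout" preserves column positions.
subprocess.run(["pdftotext", "-layout", "paper.pdf", "paper.txt"], check=True)

with open("paper.txt", encoding="utf-8") as f:
    text = f.read()

# A hypothetical extraction rule: capture a measured value with units,
# e.g. "melting point 128.5 °C". Real rules would be tailored to each
# journal's layout and cover many property types.
pattern = re.compile(r"melting point\s+(\d+(?:\.\d+)?)\s*°?C", re.IGNORECASE)
for match in pattern.finditer(text):
    value = float(match.group(1))
    print({"property": "melting point", "value": value, "unit": "degC"})
```

Records of this shape could then be cleaned, inserted into the database, and serialized downstream, as the abstract describes.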
This document discusses basic search techniques for electronic information sources. It describes keyword and phrase searching, truncation searching using right, left, and internal truncation, proximity searching within words, sentences and paragraphs with ordered and unordered options, Boolean searching using AND, OR and NOT operators, limiting searches by date, file type or other limits, and field searching within specific fields like title or URL. The techniques covered allow researchers to more effectively search and navigate the growing amount of electronic information available.
May 2012 JaxDUG presentation by Zachary Gramana on using the Lucene.NET library to add search functionality to .NET applications. Contains an overview of search/information retrieval concepts and highlights some common use-cases.
FAIRPORT domain-specific metadata using W3C DCAT & SKOS with ontology views (Tim Clark)
FAIRPORT is an international project to develop a lightweight interoperability architecture for biomedical - and potentially other - data repositories.
This slide deck is a presentation to the FAIRPORT technical team. It describes a proposed model for supporting domain-specific search metadata using a common schema model across all repositories.
The proposal makes use of the following existing technologies, with minor extensions:
- the W3C DCAT model for dataset description
- the W3C SKOS knowledge organization system
- the OWL 2 Web Ontology Language
- the Dublin Core vocabulary
- the NCBO BioPortal biomedical ontologies collection
Linked Open Data Fundamentals for Libraries, Archives and Museums (trevorthornton)
This document provides an overview of linked open data concepts for libraries, archives, and museums. It discusses what linked open data is, potential benefits for cultural institutions, and technical concepts like URIs, HTTP, RDF, ontologies, and SPARQL. The document also covers publishing linked open data by establishing URIs for resources and using content negotiation. Trust and attribution of linked data sources are addressed. Open data licensing, including options from Creative Commons, is also summarized.
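As a small illustration of these concepts (not from the document; the item URI and VIAF identifier below are placeholders), a collection record can be published as RDF using Python's rdflib:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# Mint an HTTP URI for a collection item; the domain is a placeholder.
item = URIRef("http://example.org/collection/item/42")

SCHEMA = Namespace("http://schema.org/")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("schema", SCHEMA)

g.add((item, RDF.type, SCHEMA.Photograph))
g.add((item, DCTERMS.title, Literal("Main Street, 1912")))
# Link to an external authority URI rather than repeating a name string:
g.add((item, DCTERMS.creator, URIRef("http://viaf.org/viaf/123456")))  # placeholder VIAF ID

print(g.serialize(format="turtle"))
```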
This document discusses building Linked Data apps for the iPhone. It covers key Linked Data concepts like URIs, RDF, and SPARQL. It also outlines the technologies needed to develop for the iPhone, such as Objective-C, HTML, and various libraries. Finally, it presents Lodsy as an example Linked Data app for the iPhone that uses facets and views to browse and display RDF data on maps and in detail views.
Interlinking educational data to Web of Data (Thesis presentation) (Enayat Rajabi)
This is a thesis presentation about interlinking educational data to the Web of Data. I explain how I used the Linked Data approach to expose and interlink educational data to the Linked Open Data cloud.
This presentation is an introduction to RDFa, created as the fourth assignment for IST 681 at the iSchool, Syracuse University. The presentation was made by Kai Li, a library science student at Syracuse University.
It's not rocket surgery - Linked In: ALA 2011 (Ross Singer)
This document provides a brief introduction to linked library data and linked data concepts. It explains the core principles of linked data: using URIs as names for things and including links between URIs so that additional related data can be discovered. It also discusses common vocabularies and schemas used in linked data, such as Dublin Core, Bibliontology, and RDA Elements. The document uses a sample book record to demonstrate how linked data can be modeled and interconnected using these vocabularies and external data sources like VIAF, LOC, and Geonames.
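The deck's actual sample record is not reproduced here, but a comparable book record in Turtle, with links out to VIAF, LCSH, and Geonames (all identifiers below are placeholders), could be parsed like so:

```python
from rdflib import Graph

# A sketch of a single book record as linked data, using Dublin Core terms
# and the Bibliographic Ontology (bibo), with links to external authorities:
# VIAF for the author, LCSH for the subject, Geonames for the place.
book_ttl = """
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix bibo:    <http://purl.org/ontology/bibo/> .

<http://example.org/book/1>
    a bibo:Book ;
    dcterms:title   "Linked Data for Libraries" ;
    dcterms:creator <http://viaf.org/viaf/123456> ;
    dcterms:subject <http://id.loc.gov/authorities/subjects/sh00000000> ;
    dcterms:spatial <http://sws.geonames.org/0000000/> .
"""

g = Graph()
g.parse(data=book_ttl, format="turtle")
print(f"{len(g)} triples parsed")
```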
Linked Data provides a standardized framework for publishing structured data on the web by linking data instead of documents. It uses URIs, HTTP, and RDF to link related data across different sources to create a global data space without silos. EnAKTing is a research project focused on building ontologies from large-scale user participation, querying linked data at web-scale, and visualizing the massive amounts of interconnected data. Some of its applications include services for discovering backlinks, geographical resources, and dataset equivalences in the Web of Data.
Providing open data is of interest for its societal and commercial value, for transparency, and because more people can do fun things with data. There is a growing number of initiatives to provide open data from, for example, the UK government and the World Bank. However, much of this data is provided in formats such as Excel files, or even PDF files. This raises several questions:
- How best to provide access to data so it can be most easily reused?
- How to enable the discovery of relevant data within the multitude of available data sets?
- How to enable applications to integrate data from large numbers of formerly unknown data sources?
One way to address these issues is to use the design principles of linked data (http://www.w3.org/DesignIssues/LinkedData.html), which suggest best practices for publishing and connecting structured data on the Web. This presentation gives an overview of linked data technologies (such as RDF and SPARQL), examples of how they can be used, and some starting points for people who want to provide and use linked data.
The presentation was given on August 8 at the Hacknight event (http://hacknight.se/) of Forskningsavdelningen (http://forskningsavd.se/) (Swedish: “Research Department”), a hackerspace in Malmö.
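As an illustration of the SPARQL side (not taken from the talk), the SPARQLWrapper library can query a public endpoint; DBpedia is used here only as a well-known example of queryable linked data:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Ask a public SPARQL endpoint for the English label of a resource.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
      <http://dbpedia.org/resource/Linked_data> rdfs:label ?label .
      FILTER (lang(?label) = "en")
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["label"]["value"])
```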
Learning Resource Metadata Initiative: using schema.org to describe open educ... (Phil Barker)
This paper discusses the Learning Resource Metadata Initiative (LRMI), an international project that aims to facilitate the discovery of educational resources through the use of embedded metadata that can be used by search engines (e.g. Google, Yahoo, Bing, Yandex) to refine the search services they offer. LRMI has extended the schema.org metadata vocabulary with terms that are specifically relevant to aiding the discovery of learning resources.
The document discusses using schema.org to describe open educational resources in order to help users more easily find resources that meet their needs. It describes how the Learning Resource Metadata Initiative (LRMI) extended schema.org by adding educational parameters that were previously missing, such as educational alignment, learning resource type, and typical age range. A prototype Google custom search engine is provided as an example of how these LRMI extensions could be used to narrow searches for educational resources.
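For illustration (the resource and alignment values are invented, though the property names are genuine LRMI additions to schema.org), a learning resource described with these extensions as JSON-LD might look like:

```python
import json

# A sketch of LRMI metadata embedded as schema.org JSON-LD.
resource = {
    "@context": "http://schema.org/",
    "@type": "CreativeWork",
    "name": "Introduction to Fractions",
    "url": "http://example.org/oer/fractions",
    "learningResourceType": "lesson plan",   # LRMI addition to schema.org
    "typicalAgeRange": "8-10",               # LRMI addition
    "educationalAlignment": {                # LRMI addition
        "@type": "AlignmentObject",
        "alignmentType": "teaches",
        "educationalFramework": "Common Core State Standards",
        "targetName": "CCSS.Math.Content.3.NF.A.1"
    }
}
print(json.dumps(resource, indent=2))
```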
Transmission6 - Publishing Linked Data (Bill Roberts)
This document provides guidance on publishing linked data by describing how to (1) use URIs to identify things, (2) make those URIs accessible via HTTP, (3) provide useful information about those URIs using standards, and (4) include links between URIs. It recommends starting by describing important things and assigning them URIs, and then representing the descriptions in both human-readable and machine-readable formats like RDF. Publishers should also include links between related URIs and provide licensing information.
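A minimal sketch of steps (2) and (3), assuming Flask and placeholder URIs and data: the same URI answers with HTML for browsers and Turtle for machine clients via content negotiation.

```python
from flask import Flask, Response, request

app = Flask(__name__)

# Placeholder descriptions; a real publisher would generate these from a store.
TURTLE = '<http://example.org/id/thing/1> <http://purl.org/dc/terms/title> "Thing 1" .'
HTML = "<html><body><h1>Thing 1</h1></body></html>"

@app.route("/id/thing/<thing_id>")
def describe(thing_id):
    # Pick the best representation the client accepts.
    best = request.accept_mimetypes.best_match(["text/turtle", "text/html"])
    if best == "text/turtle":
        return Response(TURTLE, mimetype="text/turtle")
    return Response(HTML, mimetype="text/html")
```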
Presentation created for the CILIP Cataloguing Interest Group event on Linked Data, 25th November 2013 (http://www.cilip.org.uk/cataloguing-and-indexing-group/events/linked-data-what-cataloguers-need-know-cig-event)
Commodity Semantic Search: A Case Study of DiscoverEd (Nathan Yergler)
The document discusses DiscoverEd, an open source search tool for open educational resources (OER) that are labeled with Creative Commons licenses. It describes how DiscoverEd uses curators to identify and add metadata to educational resources, which are then indexed using Nutch and can be queried based on metadata fields. Current work focuses on improving provenance tracking of curator metadata and developing tools to help curators publish more linked data descriptions of their resources.
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs (Josef Petrák)
The document discusses the Semantic Web and RDF data formats. It provides an overview of RDF syntaxes like RDF/XML, N3, N-Triples, RDF/JSON, and RDFa. It also discusses software APIs for working with RDF data in languages like Java, PHP, and Ruby. The document outlines handling RDF data using statement-centric, resource-centric, and ontology-centric models, as well as named graphs. It provides examples of reading RDF data from files and querying RDF data using SPARQL.
This document provides an introduction to linked data and RDF. It discusses:
1. The principles of linked data, which involve using URIs to identify things and including links to other related resources.
2. The goals of linked data, which are to transfer information between machines without loss of meaning by identifying data on the web using shared vocabularies and RDF.
3. An overview of RDF, which structures data as subject-predicate-object triples and can be serialized in formats like RDF/XML and Turtle to represent typed links between resources.
Choices, modelling and Frankenstein Ontologies (benosteen)
This document discusses an ontology project at the University of Bristol. It addresses issues with representing research information, which changes frequently. The project uses a combination of ontologies like FOAF, Bio, and Dcterms to model "Things" like people and publications. Context about these Things, like time periods of validity, is represented using named graphs. The current implementation stores this information in a Fedora object store with RDF serialization. The project aims to gather relevant domain taxonomies and provide APIs for researchers to maintain them, taking a "Frankenstein" approach of combining relevant standards. It notes some design flaws of the CERIF interchange format compared to the linked data approach taken.
The document discusses IIIF (International Image Interoperability Framework), annotations, and discourse. It provides an overview of IIIF, describing its image, presentation, and search APIs. It discusses how annotations can be used with IIIF manifests to provide additional context and information. Examples are given of active IIIF efforts involving granularity, 3D images, time-based media, and taking advantage of the web. The document concludes with a demo of the Mirador IIIF viewer.
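For context, the IIIF Image API packs a complete image request into the URL path; the sketch below (server and identifier invented) shows the pattern:

```python
# IIIF Image API URL pattern:
#   {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
def iiif_image_url(server: str, identifier: str,
                   region: str = "full", size: str = "512,",
                   rotation: str = "0", quality: str = "default",
                   fmt: str = "jpg") -> str:
    """Build a IIIF Image API request; defaults ask for a 512px-wide JPEG."""
    return f"{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

print(iiif_image_url("https://iiif.example.org/image", "page-001"))
# -> https://iiif.example.org/image/page-001/full/512,/0/default.jpg
```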
Active Digital Preservation and Data/Metadata Migration (Karen Estlund)
Presentation by Nick Ruest and Karen Estlund at spring 2017 Coalition for Networked Information meeting on digital preservation, Fedora, and import / export efforts.
This document discusses using the International Image Interoperability Framework (IIIF) for digitized newspapers. It outlines the goals of the IIIF Newspaper Interest Group, which aims to promote best practices for exploiting IIIF capabilities for searching, discovering, and annotating newspaper content. Examples are provided of how OCR text and ALTO metadata can be represented as IIIF image and annotation APIs. Guidelines are presented for mapping newspaper structures like titles and pages to IIIF concepts and exposing newspaper collections through APIs.
This document discusses using the IIIF standard to provide access to digitized newspaper collections. It provides examples of several openly accessible newspaper collections and notes that newspapers typically receive more page views than other types of digital collections. The document outlines possible ways to map newspaper metadata and images to the IIIF standard to allow zooming, panning, clipping of images and text correction/annotation. Implementing IIIF could improve access and discovery of historical newspaper content.
Portland Common Data Model (PCDM): Creating and Sharing Complex Digital Objects (Karen Estlund)
Interoperability has long been a goal of digital repositories, as demonstrated by efforts ranging from OAI-PMH, to attempts to create common APIs such as IIIF, to community based metadata standards such as Dublin Core. As repositories have matured and the desire to work more collaboratively and reuse source code has grown, the need for a common understanding of how digital objects are conceived and represented is essential. The Portland Common Data Model (PCDM) is an effort to create a shared, linked data-based model for representing complex digital objects. Starting in the Hydra community but quickly expanding to include contributors from Islandora, Fedora, the Digital Public Library of America, and other repository-related service communities, PCDM is the result of over sixty practitioners’ contributions to a shared model for structuring digital objects. The process was holistic and rooted in concrete use-cases. An initial in-person meeting in Portland, Oregon in fall 2014 resulted in the release of the first draft of the data model for which it is named. With this shared model, we intend to further the goal of interoperability across repositories and related technologies. This presentation will review the origins of PCDM, provide a general technical overview, update on current status, and forecast future work.
Comparison of integration of local digital newspaper projects at the University of Utah Marriott Library, University of Oregon Libraries, and Penn State University Libraries with content created as part of participation in the National Digital Newspaper Program.
Beyond NDNP: Technical Specifications Working Group (Karen Estlund)
Recommendations for a digital newspaper descriptive metadata specification, based on the National Digital Newspaper Program (NDNP) technical specification but extended for uses beyond NDNP.
Library Support for Journal Publishing: Emphasis on multi-modal open peer rev... (Karen Estlund)
Brief review of the University of Oregon Libraries journal publishing program, followed by an in-depth look at Ada. Content also provided by Sarah Hamid and Bryce Peake.
This document summarizes current efforts related to digitizing and providing access to historical newspaper content. It discusses several national programs and projects focused on digitizing newspapers, developing metadata and technical standards, and addressing copyright issues. It also outlines different levels at which newspaper content can be described and accessed (title, issue, page, article levels) and highlights some specific digital newspaper collections and interfaces. Finally, it poses questions to the audience about desired functionality and remaining challenges.
This document summarizes a presentation on rights technical design given at the DPLA Fest 2015. It discusses the common namespace and class used for rights statements. It also describes the URI design for rights statements and how they are designed to be both machine-readable and human-readable. Finally, it lists the technical working group members who worked on rights statement design.
Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer... (Karen Estlund)
Presentation by: Karen Estlund, Sarah Hamid, and Bryce Peake
At the CNI spring 2012 meeting, we presented on a new collaborative journal publishing project from The Fembot Collective and the University of Oregon (UO) Libraries, Ada: A Journal of Gender, New Media, and Technology. The Fembot Collective is a collaborative of feminist media scholars, producers, and artists engaged with the intersection of new media and technology and scholarly communication. One aspiration of this project was to reclaim the means of scholarly production through a community-centered model of open peer review and multi-modal publication processes. As a work in progress, Ada has continuously evolved to meet the needs of diverse authors, readers, and commentators. In the face of changing scholarly communication practices, the Fembot and library collaboration offers an alternative system of open-access publication and review that recaptures academic production structures in favor of cross-disciplinary, multi-modal, collaborative knowledge. Our community standards state that “responding is political work” emphasizing a space that demands constant redirection and active participation by its collaborators in order to generate new expressions of feminist open access scholarship over time. Now in our third year of publication and working on our ninth issue, we will review lessons learned about audience, production, infrastructure, design and assessment. We will discuss the ways in which our intervention has been transformed by, while also transforming, discussions about participatory media, open and collaborative peer review, production costs, and the intersections of technical and intellectual labor.
http://adanewmedia.org
http://fembotcollective.org
https://library.uoregon.edu/digitalscholarship
APIs & Open Data with Oregon Digital Newspapers (Karen Estlund)
The document discusses how APIs and open data can provide access to digitized newspaper content from the Oregon Digital Newspaper Program. It provides examples of different types of API requests that can be used including searches, requests for individual pages or titles, and batch requests. One example of a student project that used the newspaper data is also mentioned, and potential future opportunities are discussed.
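The document's exact requests are not reproduced here. As a hedged sketch, sites built on the open-source Chronicling America stack commonly answer search requests of this shape; the base URL and response fields below are assumptions, not verified against the Oregon service:

```python
import requests

BASE = "https://oregonnews.uoregon.edu"  # assumed base URL; check the site's API docs

# Full-text page search, returned as JSON (chronam-style request).
resp = requests.get(f"{BASE}/search/pages/results/",
                    params={"andtext": "salmon cannery", "format": "json"})
resp.raise_for_status()

# Print the newspaper title and date for the first few hits.
for item in resp.json().get("items", [])[:5]:
    print(item.get("title"), item.get("date"))
```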
This document summarizes an RDF in Hydra summit. It lists the participants and the objective of examining the implications and opportunities of Fedora 4 and Hydra. Key discussion highlights included RDF as a mechanism for leveraging linked data, the convoluted terminology around data models and ontologies, and examples of RDF usage in Hydra practice. Reasons for using RDF included granularity of description, flexibility and extensibility of schemas, reusability of linked data, and RDF being machine actionable. Proposed next steps included forming a Hydra working group to address open issues and holding breakout sessions.
Linked data intro primer
1. Linked Data Principles
Oregon Digital Linked Data
Workshop, Eugene, Oregon
November 25, 2013
Tom Johnson
thomas.johnson@oregonstate.edu
2. 4 Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
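A quick illustration of principles 2 and 3 (not part of the slides): dereference an HTTP URI and ask for RDF via content negotiation, using DBpedia as a well-known server that supports it.

```python
import requests

# Look up an HTTP URI and request a machine-readable representation.
resp = requests.get(
    "http://dbpedia.org/resource/Oregon",
    headers={"Accept": "text/turtle"},
    allow_redirects=True,  # the resource URI redirects to a data document
)
print(resp.headers.get("Content-Type"))
print(resp.text[:300])  # Turtle triples about Oregon, with links to more URIs
```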
8. Practical Semantics
➢ Hierarchical Metadata Terms
⇒ relationships between vocabularies
⇒ e.g. mrel:photographer < dc:contributor
➢ Domain and Range Statements
⇒ Limit vocabulary application for data quality and interoperability
➢ Objects in one statement can be subjects in others.
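An illustrative sketch of the slide's example (the namespace URI and the domain/range choices are assumptions, not from the deck):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS, DCTERMS, FOAF

# Assumed namespace standing in for the MARC relators vocabulary; the term
# name follows the slide's own notation.
MREL = Namespace("http://id.loc.gov/vocabulary/relators/")

g = Graph()
g.bind("mrel", MREL)
g.bind("dcterms", DCTERMS)

# mrel:photographer < dc:contributor, as a subproperty statement.
g.add((MREL.photographer, RDFS.subPropertyOf, DCTERMS.contributor))
# Domain and range statements limit how the term applies (illustrative choices).
g.add((MREL.photographer, RDFS.domain, FOAF.Image))  # only images have photographers
g.add((MREL.photographer, RDFS.range, FOAF.Agent))   # values must be agents

print(g.serialize(format="turtle"))
# A reasoner can now infer that any mrel:photographer statement also
# implies a dcterms:contributor statement.
```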
9. Global Scale
➢ Statement-centric
➢ Model is “Open World”
⇒ Data we don’t have is assumed to be unknown locally, not globally.
➢ Outside data is valued
➢ Linking is web scale
10. Resources
➢ Linked Open Vocabularies (vocabulary search engine)
http://lov.okfn.org/dataset/lov/
➢ W3C Library Linked Data Incubator Group Reports http://www.w3.org/2005/Incubator/lld/
➢ Open Metadata Registry (hosts RDA vocabularies) http://metadataregistry.org/