MAKING SOCIAL SCIENCE MORE REPRODUCIBLE
BY ENCAPSULATING ACCESS TO LINKED DATA
Albert Meroño-Peñuela
Richard Zijdeman
Ashkan Ashkpour
Rinke Hoekstra
ESSHC 2018
Vrije Universiteit Amsterdam
 VU University Amsterdam – Computer
Science (Knowledge Representation &
Reasoning group)
 International Institute of Social
History (IISG), Amsterdam
 CLARIAH – National Infrastructure for
Digital Humanities
> DataLegend: Structured Data Hub
 DANS: CEDAR – Dutch historical
censuses as 5-star LOD
2
INSTITUTIONAL SLIDE
Links > Queries
3
QUERIES AND LINKS
 Reproducibility: proxy for replicability
 Key for the scientific method
 Currently, we include a small link to “cite datasets” as data
provenance
4
REPRODUCIBILITY & DATA PROVENANCE
 Point to an “open dataset”
 Two big problems
Combination of multiple datasets
Subset of the original data
 Usually these two are resolved with “data munging”
 NOT part of the citation
 Critical for reproducibility!
5
DATA “CITATIONS”
 Solutions achieved through Linked (Open) Data
> Combination of datasets: RDF
> Selecting and transforming subsets of the data: SPARQL
 Success shown in a great number of disciplines, including
social history
> See http://linkeddatabook.com/editions/1.0/
6
SEMANTIC WEB SOLUTIONS
7
RDF: COMBINING DATASETS
<https://www.w3.org/People/Berners-Lee/>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Person>
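A minimal sketch of why RDF makes combining datasets straightforward, using only plain Python tuples rather than an RDF library. The first triple is the one shown on the slide. The second dataset, and its use of the FOAF `name` property, is a hypothetical addition for illustration.

```python
# Each RDF statement is a (subject, predicate, object) triple.
TIMBL = "https://www.w3.org/People/Berners-Lee/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Dataset A holds the triple from the slide.
dataset_a = {(TIMBL, RDF_TYPE, FOAF_PERSON)}

# Dataset B is hypothetical: a second source describing the same IRI.
dataset_b = {(TIMBL, FOAF_NAME, '"Tim Berners-Lee"')}

# Both sources identify the subject with the same IRI,
# so combining them is a plain set union.
merged = dataset_a | dataset_b


def describe(graph, subject):
    """Return every (predicate, object) pair known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in describe(merged, TIMBL):
    print(predicate, obj)
```

Shared global identifiers do the work here: no join keys or schema alignment step is needed before the union.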
 Implement the “research question” of the study
8
SPARQL: SELECTING SUBSETS OF THE DATA
Unfortunately there are two problems...
1. Encoding a research question in SPARQL is difficult (as
you’ve seen)
2. Lack of methods and tools to save, maintain and execute
these queries quickly (i.e. without having to write them
again)
9
OKAY, HOW CAN I USE THEM?
 http://grlc.io/ and
https://github.com/CLARIAH/grlc
1. Incentivizes collaborative writing
of SPARQL queries on GitHub
2. Automatically builds APIs using
those queries, providing
executable URIs (HTTP links)
This means:
 External query management
 API is organized just as the GitHub
repository
 Thin layer – nothing stored server-
side
10
 Collaborative writing of research
questions in SPARQL
 Good support of query curation
processes
> Versioning
> Branching
> Clone-pull-push
> Pull requests
 Web-friendly features!
> One URI per query
> Uniquely identifiable
> De-referenceable
(raw.githubusercontent.com)
11
SPARQL IN GITHUB
12
AUTOMATIC BUILD OF APIS
• 1 research
question = 1
SPARQL query = 1
URI (HTTP link)
• Actionable
(executable if we
click them)
• JSON specification,
Swagger-UI for
human readability
13
… AND THE ACTIONABLE LINKS?
 Assuming your queries are at
https://github.com/:owner/:repo
> http://grlc.io/api/:owner/:repo/spec returns the JSON swagger
spec
> http://grlc.io/api/:owner/:repo/ returns the swagger UI
> http://grlc.io/api/:owner/:repo/:operation?p_1=v_1&...&p_n=v_n
calls the operation with the specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters
 Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their
decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference
queries, get the SPARQL, and parse it
> Supports versioning through http://grlc.io/api/:owner/:repo/commit/:sha
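The URL patterns above can be sketched as a few helper functions. This is only an illustration of the routes listed on the slide: the helper names are made up here, and `year` is a hypothetical parameter name.

```python
from urllib.parse import urlencode

GRLC_API = "http://grlc.io/api"  # public instance named on the slide


def spec_url(owner, repo):
    """URL that returns the JSON Swagger spec for a GitHub repository."""
    return f"{GRLC_API}/{owner}/{repo}/spec"


def operation_url(owner, repo, operation, params=None):
    """URL that runs one query (operation), with optional parameter values."""
    url = f"{GRLC_API}/{owner}/{repo}/{operation}"
    if params:
        url += "?" + urlencode(params)
    return url


def versioned_api_url(owner, repo, sha):
    """Base URL of the API as it stood at a given commit."""
    return f"{GRLC_API}/{owner}/{repo}/commit/{sha}"


# "year" is a hypothetical parameter name, used only for illustration.
print(operation_url("CLARIAH", "wp-queries", "MyQuery", {"year": "1889"}))
```

Because each link is just a string built from the repository coordinates, it can be pasted into a paper, a notebook, or a browser.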
14
GRLC’S ARCHITECTURE
15
SPARQL DECORATOR SYNTAX
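Other slides in this deck mention two decorators, `#+ endpoint:` and `#+ pagination: 100`, written as comment lines at the top of a query file. A rough sketch of how such lines could be read out of a `.rq` file follows. It handles only single-line `key: value` decorators and is not grlc's actual parser. The endpoint URL and the query are placeholders.

```python
def parse_decorators(query_text):
    """Collect single-line '#+ key: value' decorators from a SPARQL query."""
    decorators = {}
    for line in query_text.splitlines():
        line = line.strip()
        if line.startswith("#+"):
            # Split at the first colon only, so URL values stay intact.
            key, sep, value = line[2:].partition(":")
            if sep:
                decorators[key.strip()] = value.strip()
    return decorators


# Hypothetical query file content; the endpoint URL is a placeholder.
EXAMPLE_QUERY = """\
#+ endpoint: http://example.org/sparql
#+ pagination: 100

SELECT ?s WHERE { ?s ?p ?o }
"""

print(parse_decorators(EXAMPLE_QUERY))
```

Since decorators are ordinary SPARQL comments, the same file still runs unchanged against any endpoint.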
16
SPARQL + RDFA + DUMPS + #LD
• Compatible with most
Linked Data access
methods
• Loads remote RDFa/dumps
in memory
• Uses TPF for #LD servers
• Mixes all these into one
homogeneous API
17
PROVENANCE
• Two sources: query history,
spec generation
• Uses W3C PROV
• Uses Git2PROV to get
query history
• Adds spec provenance at
generation time
• Visualizations with PROV-O-Viz
(http://provoviz.org/)
18
ENUMERATIONS & DROPDOWNS
• Fills in the
swag[paths][op][method][parameters][enum] array
• Uses the triple pattern of
the SPARQL query’s BGP
against the same SPARQL
endpoint
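A sketch of the first bullet, assuming a Swagger 2.0-style spec held as a Python dict in which `parameters` is a list of objects. The function name, the example path and the values are hypothetical. In grlc the values would come from running the query's triple pattern against the endpoint, whereas here they are simply passed in.

```python
def set_enum(swag, path, method, param_name, values):
    """Attach the allowed values of one parameter as its 'enum' array."""
    for param in swag["paths"][path][method]["parameters"]:
        if param["name"] == param_name:
            # De-duplicate and sort so the dropdown is stable.
            param["enum"] = sorted(set(values))
            return True
    return False


# Hypothetical spec fragment with a single query parameter.
swag = {
    "paths": {
        "/houseType_all": {
            "get": {
                "parameters": [
                    {"name": "year", "in": "query", "type": "string"}
                ]
            }
        }
    }
}

set_enum(swag, "/houseType_all", "get", "year", ["1889", "1879", "1889"])
print(swag["paths"]["/houseType_all"]["get"]["parameters"][0]["enum"])
```

With the `enum` array filled in, Swagger UI can render the parameter as a dropdown instead of a free-text field.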
19
CONTENT NEGOTIATION
• API endpoints can now
end with .content_type
(e.g. grlc.io/CLARIAH/wp-queries/MyQuery.csv)
• Supports .csv, .json,
.html (can be extended)
• grlc sets the ‘Accept’ HTTP
header and passes through
whatever ‘Content-Type’
the SPARQL endpoint
returns
• Up to the SPARQL
endpoint to accept it
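The suffix handling described above could look roughly like this. The three extensions are the ones listed on the slide, mapped to their standard media types. The function name and the default value are assumptions, not grlc's documented behavior.

```python
import posixpath

# Extensions named on the slide, mapped to standard media types.
MEDIA_TYPES = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".html": "text/html",
}


def split_content_type(path, default="application/json"):
    """Split a request path into (route, value for the Accept header).

    The default is an assumption made for this sketch.
    """
    route, ext = posixpath.splitext(path)
    if ext in MEDIA_TYPES:
        return route, MEDIA_TYPES[ext]
    # Unknown or missing suffix: leave the path untouched.
    return path, default


print(split_content_type("/CLARIAH/wp-queries/MyQuery.csv"))
```

Supporting another format then means adding one entry to the mapping, which matches the "can be extended" note on the slide.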
20
SAME QUESTIONS, DIFFERENT DATASETS
Three ways of separating queries (research questions,
SPARQL) from the data (datasets, endpoints):
• Using a grlc-repository endpoint.txt file (OK, but all
queries asked against same data)
• Using a query-dependent #+ endpoint: name (OK,
but endpoint still depends on query at execution)
• Using a query-dependent HTTP parameter (OK,
profit!)
21
SAME QUESTIONS, DIFFERENT DATASETS
If our SPARQL query (=research question) is at
http://grlc.io/api/user/repo/query
We can ask it to many different endpoints using
http://grlc.io/api/user/repo/query?endpoint=dataset1
http://grlc.io/api/user/repo/query?endpoint=dataset2
etc
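As a small illustration, the same query link can be expanded into one link per dataset. The `endpoint` parameter and the base URL come from the slide. The helper name and the example endpoint address are hypothetical.

```python
from urllib.parse import urlencode

QUERY_URL = "http://grlc.io/api/user/repo/query"  # placeholder from the slide


def urls_for_endpoints(query_url, endpoints):
    """Build one executable link per SPARQL endpoint for the same query."""
    return [f"{query_url}?{urlencode({'endpoint': e})}" for e in endpoints]


for url in urls_for_endpoints(QUERY_URL, ["dataset1", "dataset2"]):
    print(url)

# A full endpoint address is percent-encoded when passed as a parameter.
print(urls_for_endpoints(QUERY_URL, ["http://example.org/sparql"])[0])
```

The research question stays fixed in the repository while the data source varies per request.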
 1,551 unique visitors since July 2016
 3,251 sessions
 58.97% return rate
 5 active open source contributors, 31 pull requests
 Community of users and developers
22
QUALITATIVE EVALUATION
> “multiple copies of the same queries in different places
(…) was problematic. grlc allows queries to be
maintained in a single location”
> “with grlc the R code becomes clearer due to the
decoupling with SPARQL; and shorter, since a curl
suffices to retrieve the data”
> “it allows us to manage SPARQL queries separate from
the rest of the API – this enables, for instance, to have
different queries without having to deploy a new version
of the API”
> “we use grlc to provide FAQ for those who would prefer
REST over SPARQL, but also to explore the data”
> “we use grlc to expose the ECAI conference proceedings
not only as Linked Data that can be used by Semantic
Web practitioners, but also as a Web API that web
developers can consume”
> “grlc helps to share, extend and repurpose queries by
providing a URI for the resulted queries and by
supporting collaborative update of those queries”
23
USE CASES
24
QUANTITATIVE EVALUATION
The cost of grlc is independent of the dataset size
HTTP requests and payloads are important costs
 The need of data links in papers for reproducibility
 Dataset citations not enough
> Combine datasets
> Reduce and transform datasets
 We use Linked Open Data and Semantic Web technology in grlc
> Motivates collaborative writing of research questions in SPARQL
> Enables the maintenance and creation of ACTIONABLE DATA LINKS to
combine and transform datasets
> Allows separating queries from data, and asking the same questions of
different datasets
 Success in multiple domains, including Social Science & History
 Open source
http://grlc.io
https://github.com/CLARIAH/grlc
25
CONCLUSIONS
THANK YOU!
DATALEGEND.NET
CLARIAH.NL
26
http://grlc.io/
27
PAGINATION
• Large query results are
typically hard for consuming
applications to handle
• Split the result into multiple
parts (or “pages”)
• Page size: #+ pagination: 100
• Navigating pages
• rel=next,prev,first,last links
in the HTTP headers (GitHub
API traversal convention)
• Extra request parameter
?page (defaults to 1)
~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18447
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=3
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18142
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
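A consuming application can follow these pages by reading the `Link` header. The sketch below parses a header value of the shape shown in the curl output into a dict keyed by relation. The function name is made up here, and the regular expression covers only this simple `<url>; rel=name` form.

```python
import re

# Matches one '<url>; rel=name' entry, with or without quotes around name.
LINK_PATTERN = re.compile(r'<([^>]+)>;\s*rel="?([\w-]+)"?')


def parse_link_header(value):
    """Map each relation (next, prev, first, last) to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


BASE = "http://localhost:8088/api/CEDAR-project/Queries/houseType_all"
header = f"<{BASE}?page=2>; rel=next, <{BASE}?page=889>; rel=last"

links = parse_link_header(header)
print(links["next"])
print(links["last"])
```

A client would request `links["next"]` repeatedly until the response no longer carries a `next` relation.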
28
DOCKER CONTAINER
• Uses docker
• Infrastructure-independent
install
• Bundles (composes) all required
packages (python, python libs,
grlc, nginx). Can be easily
extended to more
• Publicly available at
hub.docker.com
• One-command server deploy:
docker pull clariah/grlc
29
QUANTITATIVE EVALUATION
30
QUANTITATIVE EVALUATION

 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 

Recently uploaded (20)

GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 

Making social science more reproducible by encapsulating access to linked data

  • 1. MAKING SOCIAL SCIENCE MORE REPRODUCIBLE BY ENCAPSULATING ACCESS TO LINKED DATA
      Albert Meroño-Peñuela, Richard Zijdeman, Ashkan Ashkpour, Rinke Hoekstra
      ESSHC 2018
  • 2. INSTITUTIONAL SLIDE
      • VU University Amsterdam – Computer Science (Knowledge Representation & Reasoning group)
      • International Institute of Social History (IISG), Amsterdam
      • CLARIAH – National Infrastructure for Digital Humanities
          > DataLegend: Structured Data Hub
      • DANS: CEDAR – Dutch historical censuses as 5-star LOD
  • 3. QUERIES AND LINKS
      Links > Queries
  • 4. REPRODUCIBILITY & DATA PROVENANCE
      • Reproducibility: proxy for replicability
      • Key for the scientific method
      • Currently, we include a small link to “cite datasets” as data provenance
  • 5. DATA “CITATIONS”
      • Point to an “open dataset”
      • Two big problems
          > Combination of multiple datasets
          > Subset of the original data
      • Usually these two are resolved with “data munging”
      • NOT part of the citation
      • Critical for reproducibility!
  • 6. SEMANTIC WEB SOLUTIONS
      • Solutions achieved through Linked (Open) Data
          > Combination of datasets: RDF
          > Selecting and transforming subsets of the data: SPARQL
      • Success shown in a great number of disciplines, including social history
          > See http://linkeddatabook.com/editions/1.0/
  • 7. RDF: COMBINING DATASETS
      <https://www.w3.org/People/Berners-Lee/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>
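The triple on slide 7 can be used to illustrate why RDF makes combining datasets cheap: if two datasets use the same URI for the same thing, merging them is roughly a set union. The following is a minimal sketch in plain Python, not an RDF library; the second dataset and its `foaf:name` statement are assumptions added for illustration.

```python
# Triples modelled as (subject, predicate, object) tuples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
TIMBL = "https://www.w3.org/People/Berners-Lee/"

# Dataset A holds the triple shown on the slide.
dataset_a = {(TIMBL, RDF_TYPE, FOAF_PERSON)}
# Dataset B is hypothetical: it describes the same subject URI.
dataset_b = {(TIMBL, FOAF_NAME, "Tim Berners-Lee")}

# Shared URIs let the two graphs be combined with a plain set union.
merged = dataset_a | dataset_b


def describe(graph, subject):
    """Return all (predicate, object) pairs known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in describe(merged, TIMBL):
    print(predicate, obj)
```

In practice an RDF library and a triple store would replace the Python sets, but the merge step stays conceptually the same.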
  • 8. SPARQL: SELECTING SUBSETS OF THE DATA
      • Implement the “research question” of the study
  • 9. OKAY, HOW CAN I USE THEM?
      Unfortunately there are two problems...
      1. Encoding a research question in SPARQL is difficult (as you’ve seen)
      2. Lack of methods and tools to save, maintain and execute these queries quickly (i.e. without having to write them again)
  • 10. http://grlc.io/ and https://github.com/CLARIAH/grlc
      1. Incentivizes collaborative writing of SPARQL queries in GitHub
      2. Automatically builds APIs using those queries, providing executable URIs (HTTP links)
      This means:
      • External query management
      • API is organized just as the GitHub repository
      • Thin layer: nothing stored server-side
  • 11. SPARQL IN GITHUB
      • Collaborative writing of research questions in SPARQL
      • Good support of query curation processes
          > Versioning
          > Branching
          > Clone-pull-push
          > Pull requests
      • Web-friendly features!
          > One URI per query
          > Uniquely identifiable
          > De-referenceable (raw.githubusercontent.com)
  • 12. AUTOMATIC BUILD OF APIS
      • 1 research question = 1 SPARQL query = 1 URI (HTTP link)
      • Actionable (executable if we click them)
      • JSON specification, Swagger-UI for human readability
  • 13. … AND THE ACTIONABLE LINKS?
      • Assuming your queries are at https://github.com/:owner/:repo
          > http://grlc.io/api/:owner/:repo/spec returns the JSON Swagger spec
          > http://grlc.io/api/:owner/:repo/ returns the Swagger UI
          > http://grlc.io/api/:owner/:repo/:operation?p_1=v_1...p_n=v_n calls operation with the specified parameter values
          > Uses BASIL’s SPARQL variable name convention for query parameters
      • Sends requests to
          > https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their decorators
          > https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference queries, get the SPARQL, and parse it
          > Supports versioning through http://grlc.io/api/:owner/:repo/commit/:sha
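The URL patterns on slide 13 can be sketched as a few helper functions. This is only an illustration of the route layout listed on the slide, not part of grlc itself; the function names are made up, and the owner, repository and parameter names in the usage lines are placeholders.

```python
from urllib.parse import urlencode

GRLC_BASE = "http://grlc.io/api"  # public instance named on the slide


def spec_url(owner, repo):
    """URL of the JSON Swagger spec for a GitHub repository of queries."""
    return f"{GRLC_BASE}/{owner}/{repo}/spec"


def ui_url(owner, repo):
    """URL of the Swagger UI for the same repository."""
    return f"{GRLC_BASE}/{owner}/{repo}/"


def operation_url(owner, repo, operation, params=None):
    """URL that executes one query, with optional parameter values."""
    url = f"{GRLC_BASE}/{owner}/{repo}/{operation}"
    if params:
        # urlencode percent-escapes values and joins pairs with '&'.
        url += "?" + urlencode(params)
    return url


def versioned_url(owner, repo, sha):
    """URL pinned to one commit, following the /commit/:sha pattern."""
    return f"{GRLC_BASE}/{owner}/{repo}/commit/{sha}"


# Placeholder names, for illustration only.
print(spec_url("some-owner", "some-repo"))
print(operation_url("some-owner", "some-repo", "some_query", {"p_1": "v_1"}))
```

Because each helper returns a plain HTTP link, the result can be pasted into a paper, a browser or a `curl` call unchanged.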
  • 16. SPARQL + RDFA + DUMPS + #LD
      • Compatible with most Linked Data access methods
      • Loads remote RDFa/dumps in memory
      • Uses TPF (Triple Pattern Fragments) for #LD servers
      • Mixes all these into one homogeneous API
  • 17. PROVENANCE
      • Two sources: query history, spec generation
      • Uses W3C PROV
      • Uses Git2PROV to get query history
      • Adds spec provenance at generation time
      • Visualizations with PROV-O-Viz (http://provoviz.org/)
  • 18. ENUMERATIONS & DROPDOWNS
      • Fills in the swag[paths][op][method][parameters][enum] array
      • Uses the triple pattern of the SPARQL query’s BGP against the same SPARQL endpoint
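The enumeration step on slide 18 can be pictured as writing a list of values into the Swagger dictionary. The sketch below assumes the candidate values have already been fetched from the SPARQL endpoint; the operation path, the parameter name and the values are hypothetical, and the function is not grlc's actual code.

```python
def fill_enum(swag, op, method, param_name, values):
    """Attach the allowed values to one parameter of one operation."""
    for param in swag["paths"][op][method]["parameters"]:
        if param["name"] == param_name:
            # Deduplicate and sort so the dropdown is stable across runs.
            param["enum"] = sorted(set(values))
    return swag


# Hypothetical spec fragment with a single query parameter.
swag = {
    "paths": {
        "/some_query": {
            "get": {
                "parameters": [
                    {"name": "municipality", "in": "query", "type": "string"}
                ]
            }
        }
    }
}

# Values as they might come back from the endpoint (made up here).
fill_enum(swag, "/some_query", "get", "municipality",
          ["Utrecht", "Amsterdam", "Utrecht"])
print(swag["paths"]["/some_query"]["get"]["parameters"][0]["enum"])
```

Swagger UI renders a parameter with an `enum` array as a dropdown, which is what gives the slide its title.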
  • 19. CONTENT NEGOTIATION
      • API endpoints can now end with .content_type (e.g. grlc.io/CLARIAH/wp-queries/MyQuery.csv)
      • Supports .csv, .json, .html (can be extended)
      • grlc sets ‘Accept’ HTTP header and agnostically returns same ‘Content-Type’ as the SPARQL endpoint
      • Up to the SPARQL endpoint to accept it
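The suffix handling on slide 19 amounts to translating a file extension into an `Accept` header. A minimal sketch follows; the three MIME types are the conventional ones for these formats and are an assumption here, since the slide does not list the exact values grlc sends.

```python
# Assumed mapping from URL suffix to the Accept header value.
ACCEPT_BY_EXTENSION = {
    "csv": "text/csv",
    "json": "application/json",
    "html": "text/html",
}


def split_content_type(path):
    """Split 'owner/repo/query.csv' into the query path and an Accept value.

    Returns (path, None) when the path has no recognised suffix.
    """
    head, dot, ext = path.rpartition(".")
    if dot and ext in ACCEPT_BY_EXTENSION:
        return head, ACCEPT_BY_EXTENSION[ext]
    return path, None


print(split_content_type("CLARIAH/wp-queries/MyQuery.csv"))
print(split_content_type("CLARIAH/wp-queries/MyQuery"))
```

Supporting another format would then mean adding one entry to the mapping, which matches the "can be extended" remark on the slide.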
  • 20. SAME QUESTIONS, DIFFERENT DATASETS
      Three ways of separating queries (research questions, SPARQL) from the data (datasets, endpoints):
      • Using a grlc-repository endpoint.txt file (OK, but all queries asked against same data)
      • Using a query-dependent #+ endpoint: name (OK, but endpoint still depends on query at execution)
      • Using a query-dependent HTTP parameter (OK, profit!)
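The three options on slide 20 can be sketched as a lookup over the HTTP parameters, the `#+` decorators in the query file, and the repository-wide `endpoint.txt`. This is a simplified illustration, not grlc's parser: decorators are read as plain `#+ key: value` lines, and the precedence (HTTP parameter first, then decorator, then file) is an assumption that the slide does not state. The endpoint URLs are placeholders.

```python
def read_decorators(query_text):
    """Collect '#+ key: value' comment lines from a SPARQL query file."""
    decorators = {}
    for line in query_text.splitlines():
        line = line.strip()
        if line.startswith("#+"):
            # Split on the first colon only, so URLs in the value survive.
            key, sep, value = line[2:].partition(":")
            if sep:
                decorators[key.strip()] = value.strip()
    return decorators


def resolve_endpoint(http_params, decorators, repo_default):
    """Pick the endpoint; the order of precedence is assumed."""
    return (http_params.get("endpoint")
            or decorators.get("endpoint")
            or repo_default)


query = """#+ endpoint: http://example.org/sparql
#+ pagination: 100
SELECT * WHERE { ?s ?p ?o } LIMIT 10"""

decorators = read_decorators(query)
print(resolve_endpoint({}, decorators, "http://example.org/from-file"))
print(resolve_endpoint({"endpoint": "http://example.org/other"},
                       decorators, "http://example.org/from-file"))
```

Only the third option keeps the query text itself free of any reference to a dataset, which is why the slide prefers it.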
  • 21. SAME QUESTIONS, DIFFERENT DATASETS
      If our SPARQL query (= research question) is at
          http://grlc.io/api/user/repo/query
      we can ask it to many different endpoints using
          http://grlc.io/api/user/repo/query?endpoint=dataset1
          http://grlc.io/api/user/repo/query?endpoint=dataset2
          etc.
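Asking one question of several datasets, as on slide 21, then reduces to generating one link per endpoint. A short sketch, using the placeholder names from the slide; the function name is made up.

```python
from urllib.parse import urlencode


def links_per_endpoint(query_url, endpoints):
    """Build one executable link per endpoint for the same query."""
    return [f"{query_url}?{urlencode({'endpoint': e})}" for e in endpoints]


links = links_per_endpoint("http://grlc.io/api/user/repo/query",
                           ["dataset1", "dataset2"])
for link in links:
    print(link)
```

With a real endpoint URL as the value, `urlencode` would percent-escape it so that it survives as a single query parameter.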
  • 22. QUALITATIVE EVALUATION
      • 1,551 unique visitors since July 2016
      • 3,251 sessions
      • 58.97% return rate
      • 5 active open source contributors, 31 pull requests
      • Community of users and developers
  • 23. USE CASES
      > “multiple copies of the same queries in different places (…) was problematic. grlc allows queries to be maintained in a single location”
      > “with grlc the R code becomes clearer due to the decoupling with SPARQL; and shorter, since a curl suffices to retrieve the data”
      > “it allows us to manage SPARQL queries separate from the rest of the API – this enables, for instance, to have different queries without having to deploy a new version of the API”
      > “we use grlc to provide FAQ for those who would prefer REST over SPARQL, but also to explore the data”
      > “we use grlc to expose the ECAI conference proceedings not only as Linked Data that can be used by Semantic Web practitioners, but also as a Web API that web developers can consume”
      > “grlc helps to share, extend and repurpose queries by providing a URI for the resulted queries and by supporting collaborative update of those queries”
  • 24. QUANTITATIVE EVALUATION
      • The cost of grlc is independent of the dataset size
      • HTTP requests and payloads are important costs
  • 25. CONCLUSIONS
      • The need for data links in papers for reproducibility
      • Dataset citations are not enough
          > Combine datasets
          > Reduce and transform datasets
      • We use Linked Open Data and Semantic Web technology in grlc
          > Motivates collaborative writing of research questions in SPARQL
          > Enables the maintenance and creation of ACTIONABLE DATA LINKS to combine and transform datasets
          > Allows separating queries from data, and asking the same questions of different datasets
      • Success in multiple domains, including Social Science & History
      • Open source: http://grlc.io and https://github.com/CLARIAH/grlc
  • 26. THANK YOU!
      DATALEGEND.NET, CLARIAH.NL, http://grlc.io/
  • 27. PAGINATION
      • Large query results are typically hard for consuming applications to handle
      • Split the result in multiple parts (or “pages”)
      • Size? #+ pagination: 100
      • Navigating pages
          > rel=next,prev,first,last links in the HTTP headers (GitHub API Traversal convention)
          > Extra request parameter ?page (defaults to 1)

      ~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all
      HTTP/1.0 200 OK
      Content-Type: text/csv; charset=UTF-8
      Content-Length: 18447
      Server: grlc/1.0.0
      Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last

      ~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=3
      HTTP/1.0 200 OK
      Content-Type: text/csv; charset=UTF-8
      Content-Length: 18142
      Server: grlc/1.0.0
      Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
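A client can follow the `Link` headers shown on slide 27 without knowing how many pages exist. The sketch below parses a header value of the form printed by the first `curl` call; the function names are made up, and the regular expression is a simplification that handles only `<url>; rel=name` pairs rather than the full Web Linking grammar.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches '<url>; rel=name', with or without quotes around the name.
LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="?([A-Za-z]+)"?')


def parse_link_header(value):
    """Map each rel name (next, prev, first, last) to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


def page_number(url):
    """Read the ?page parameter; the slide says it defaults to 1."""
    pages = parse_qs(urlsplit(url).query).get("page", ["1"])
    return int(pages[0])


# Header value copied from the first response on the slide.
header = (
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next, "
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last"
)

links = parse_link_header(header)
print(links["next"])
print(page_number(links["last"]))
```

A consuming application would request `links["next"]` repeatedly until the response no longer carries a `next` relation.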
  • 28. DOCKER CONTAINER
      • Uses docker
      • Infrastructure-independent install
      • Bundles (composes) all required packages (python, python libs, grlc, nginx). Can be easily extended to more
      • Publicly available at hub.docker.com
      • One-command server deploy: docker pull clariah/grlc

Editor's Notes

  1. “A way of studying the connections of the past through connections in the present…”
  2. Talk about how all these social science history related projects brought interesting new ideas to Semantic Web research. SSH can benefit from these results too, from the methodological point of view.
  3. Links like the ones we use on the Web: HTTP resources (like web pages). If the Web is really about variety, both methods should be allowed, empowered and freely exchangeable…
  4. Great for attribution and for tracking use. But to what extent does it enable replication? Not enough: there are whole tracks in conferences with replication studies and negative results. Why does this happen?
  5. Instructions on how to merge them? How to clean them? Outliers? Suited for the purpose of the study? Critical in Social History: combination of datasets is mandatory for interdisciplinary studies (demography, economics, history of work, etc.). “Projections” or “aggregations” or other kinds of queries and transformations that we do before analysis.
  6. MIME, enumerate, method, pagination
  7. MIME, enumerate, method, pagination