The Dendro research data management platform
Applying ontologies to long-term preservation in a collaborative
environment

João Rocha da Silva, joaorosilva@gmail.com
João Aguiar Castro, joaoaguiarcastro@gmail.com
Faculdade de Engenharia da Universidade do Porto / INESC TEC

Cristina Ribeiro, mcr@fe.up.pt
João Correia Lopes, jlopes@fe.up.pt
DEI, Faculdade de Engenharia da Universidade do Porto / INESC TEC

iPRES 2014, October 6-10, 2014, Melbourne, Australia
Contents 
• Research data management in the long tail 
• Linked Open Data: why do we need it? 
• Collaboration for easier metadata production 
• The Dendro platform 
• Conclusions 
2
Research Data Management 
in the long tail of research 
Why we need to start early 
3
The long tail of research 
2011: Science magazine reviewers 
are asked about their data requirements 
~1700 replied 
4
Source 
Dealing with data. Challenges and opportunities. Introduction. (2011). Science 
(New York, N.Y.), 331(6018), 692–3. doi:10.1126/science.331.6018.692 
5
Gathering 
Processing 
Paper writing 
Preservation, 
Sharing 
7
Gathering 
Processing 
Paper writing 
Researcher 
leaves 
Metadata 
8
Gathering 
Processing 
Paper writing 
Project ends 
9
Gathering 
Processing 
Paper writing 
“Where is the data?” 
“How / when / by whom was the data 
produced?” 
10
Curators cannot cope with a posteriori 
description 
Researchers must participate 
in RDM from the start 
They are the domain experts 
11
Linked Open Data 
What is it? Why do we need it? 
12
Linked Open Data 
• Simplicity! 
- LOD is a very simple model for representing knowledge 
• Meaning! 
- Resources are interlinked by properties with established 
meaning 
• Interoperability! 
- Standard methods for querying data - SPARQL 
- Representations use standard formats - RDF, OWL 
13
[Figure: RDF graph describing one file in Dendro]
Resource: http://dendro.fe.up.pt/project/datanotes/data/base%20data.xls
- rdf:type: nie:File
- nie:title: "base data.xls"
- dc:title: "Base data of the DCB experiments"
- dcb:initialCrackLength: "180mm"
- nie:isLogicalPartOf: http://dendro.fe.up.pt/project/datanotes/data
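
The graph on this slide can also be written down as plain triples. The sketch below is only an illustration of the data model, not Dendro's own code: it serializes the five statements as N-Triples using nothing but the Python standard library. The rdf and nie namespace URIs are the published ones, binding dc to Dublin Core Elements 1.1 is an assumption, and the dcb namespace is a placeholder because the slide does not give it.

```python
# Prefix-to-namespace map. The dcb entry is a hypothetical placeholder,
# and binding "dc" to Dublin Core Elements 1.1 is an assumption.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "nie": "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#",
    "dcb": "http://example.org/dcb#",  # hypothetical namespace
}

FILE = "http://dendro.fe.up.pt/project/datanotes/data/base%20data.xls"
FOLDER = "http://dendro.fe.up.pt/project/datanotes/data"


def expand(curie):
    """Turn a prefixed name such as 'dc:title' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(text):
    """Format a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# The five statements shown on the slide, as (subject, predicate, object).
TRIPLES = [
    (uri(FILE), uri(expand("rdf:type")), uri(expand("nie:File"))),
    (uri(FILE), uri(expand("nie:title")), literal("base data.xls")),
    (uri(FILE), uri(expand("dc:title")),
     literal("Base data of the DCB experiments")),
    (uri(FILE), uri(expand("nie:isLogicalPartOf")), uri(FOLDER)),
    (uri(FILE), uri(expand("dcb:initialCrackLength")), literal("180mm")),
]


def to_ntriples(triples):
    """Serialize the triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Every statement has the same subject-predicate-object shape, which is the simplicity the previous slide refers to.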
14
                  Analytical Chemistry     Fracture Mechanics       …
                  Dataset                  Dataset
Generic           Author                   Author                   …
                  Description              Description
                  Creation date            Creation date
                  …                        …
Domain-specific   Sample Count             Initial Crack Length     …
                  Analysed Substance       Specimen Type
                  …                        …
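
One way to read this slide: every dataset shares the generic descriptors, and each domain adds its own. The sketch below illustrates that layering with the descriptor names from the slide. The function name and data layout are hypothetical and are not taken from Dendro.

```python
# Generic descriptors apply to every dataset, whatever its domain.
GENERIC = ["Author", "Description", "Creation date"]

# Domain-specific descriptors, keyed by research domain (names from the slide).
DOMAIN_SPECIFIC = {
    "Analytical Chemistry": ["Sample Count", "Analysed Substance"],
    "Fracture Mechanics": ["Initial Crack Length", "Specimen Type"],
}


def descriptors_for(domain):
    """Return the generic descriptors followed by the domain's own.

    An unknown domain falls back to the generic descriptors only.
    """
    return GENERIC + DOMAIN_SPECIFIC.get(domain, [])


if __name__ == "__main__":
    for name in DOMAIN_SPECIFIC:
        print(name, "->", ", ".join(descriptors_for(name)))
```

Adding a new domain only means adding one entry to the mapping, with no change to the generic part.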
15
Collaboration 
For metadata useful now and in the future 
16
Gathering 
Processing 
Paper writing 
Preservation, 
Sharing 
17
Gathering 
Deposit 
Collaboration Description 
“Freeze” in 
repository 
Sharing 
18
Gathering 
… 
19
Demo 
Dendroβ 
20
The Dendro platform 
An open-source platform for Linked Open Data in 
research environments 
21
Metadata 
Ontologies 
Description 
• Data store fully built on 
Linked Data 
• No relational database 
to preserve 
• Model can grow by 
loading more ontologies 
• External systems can 
retrieve resources via 
SPARQL 
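
As an illustration of the last bullet, the sketch below prepares a SPARQL query that an external system might send over HTTP GET, following the standard SPARQL protocol (the query travels in the `query` parameter). The endpoint address is a placeholder, since the slides do not give one, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; the slides do not state the real one.
ENDPOINT = "http://example.org/sparql"

# List the files that are logically part of the folder shown earlier,
# together with their file names.
QUERY = """
PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
SELECT ?file ?title
WHERE {
  ?file nie:isLogicalPartOf <http://dendro.fe.up.pt/project/datanotes/data> ;
        nie:title ?title .
}
""".strip()


def sparql_get_url(endpoint, query):
    """Build the GET URL defined by the SPARQL protocol (query=...)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(sparql_get_url(ENDPOINT, QUERY))
```

Because the query language and protocol are standard, the client does not need to know anything about the platform's internals.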
22
Metadata 
Ontologies 
File 
Storage 
Deposit 
• GridFS cluster for 
large or 
numerous files 
• Can work in the 
cloud if needed 
23
Metadata 
Ontologies 
Business 
Logic 
File 
Storage 
Collaboration 
• Flexible access control 
system 
• Backup / Restore 
• Version history 
• File type previews 
• Integration 
- DSpace (SWORD) 
- ePrints (SWORD) 
- CKAN 
- Figshare 
- … 
24
Metadata 
Ontologies 
API 
Business 
Logic 
File 
Storage 
Sharing 
• All operations 
available via RESTful 
API using JSON 
• All resources are de-referenceable 
(HTTP 
content negotiation) 
• Plugin architecture 
allows integration 
with external systems 
Web UI 
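
HTTP content negotiation means that the same resource URI can be requested in different representations by changing the Accept header. The sketch below builds two such requests with the Python standard library without sending them. The resource URI comes from the earlier graph slide; the media types offered here (JSON and RDF/XML) are assumed from the formats named in the Extras slide, and the helper name is hypothetical.

```python
from urllib.request import Request

# Resource URI taken from the graph slide earlier in the deck.
RESOURCE = "http://dendro.fe.up.pt/project/datanotes/data"


def negotiated_request(uri, media_type):
    """Prepare a GET request asking for one representation of a resource."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Same URI, two representations: JSON for API clients, RDF/XML for
# Linked Data consumers.
json_request = negotiated_request(RESOURCE, "application/json")
rdf_request = negotiated_request(RESOURCE, "application/rdf+xml")

if __name__ == "__main__":
    for request in (json_request, rdf_request):
        print(request.get_method(), request.full_url,
              "Accept:", request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs offline.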
25
For curators 
• Curators can work with researchers to build more 
ontologies using existing tools (e.g. Protégé) 
• Established ontologies can be loaded (DC, FOAF…) 
• Ontologies mature (reuse across Dendro instances) 
• Data, metadata and its meaning go together 
Beyond INSPIRE: An ontology for biodiversity metadata records 
Rocha da Silva, J., Castro, J., Ribeiro, C., Honrado, J., Lomba, A., Gonçalves, J. 
10th International Workshop on Ontology Content (OntoContent 2014) 
(pre-print available at http://dendro.fe.up.pt/) 

Creating lightweight ontologies for dataset description: Practical applications in a 
cross-domain research data management workflow 
Castro, J., Rocha da Silva, J., Ribeiro, C. 
Digital Libraries 2014 (DL2014) 
(pre-print available at http://dendro.fe.up.pt/) 
26
For programmers 
• 100% Open-source software 
• Rich API allows Dendro to be connected to almost 
any system (e.g. mobile apps) 
Ontology-based multi-domain metadata for research data management using triple stores 
Rocha da Silva, J., Ribeiro, C., Correia Lopes, J. 
18th International Database Engineering & Applications Symposium (IDEAS 2014) 
(pre-print available at http://dendro.fe.up.pt/) 

LabTablet: semantic metadata collection on a multi-domain laboratory notebook 
Amorim, R., Castro, J., Rocha da Silva, J., Ribeiro, C. 
8th Metadata and Semantics Research Conference (MTSR 2014) 
(pre-print available at http://dendro.fe.up.pt/) 
27
Dendro dies, data lives on 
Triple Store: the “Database” 
Ontologies: the “Documentation” 
28
Conclusions 
• Research data management should start early 
• Linked Open Data: simple, interoperable, flexible 
• Collaboration support helps researchers while 
gathering metadata for later deposit 
• Dendro: a fully open-source platform for RDM, built 
on Linked Open Data 
• Dendro integrates with major repository platforms 
29
Conclusions (cont’d) 
• Ontologies: source of metadata descriptors 
• Data model grows as more ontologies are loaded 
• Curators can model and share the ontologies 
• Domain ontologies evolve with reuse 
30
Visit us at 
http://dendro.fe.up.pt
João Rocha da Silva 
PhD Student, Senior Web Developer, Semantic Web 
at INESC TEC 
João Rocha da Silva is an Informatics Engineering 
PhD student at the Faculty of Engineering of the 
University of Porto. He specializes in research 
data management, applying the latest Semantic 
Web technologies to the adequate preservation 
and discovery of research data assets. 
He is also an experienced freelance iOS 
developer with several apps published on the App 
Store, and a self-taught DIY mechanic with a 
special interest in classic cars, particularly his 1987 
Toyota Corolla GT Twin Cam, also known as 
Hachi-Roku or AE86. 

João Aguiar Castro 
PhD Student, Research Data Management researcher 
at INESC TEC 
João Aguiar Castro holds a Master's degree in 
Information Science, and is currently a Digital 
Platforms PhD student at the Faculty of Engineering of 
the University of Porto. He is a research data 
management researcher, working in particular on the 
definition of application profiles that meet the metadata 
needs of different research domains. 

Cristina Ribeiro 
Assistant Professor in Informatics Engineering at 
Universidade do Porto, Researcher at INESC TEC 
Cristina Ribeiro is an Assistant Professor in 
Informatics Engineering at Universidade do Porto 
and a researcher at INESC TEC. She graduated 
in Electrical Engineering, and holds a Master's 
in Electrical and Computer Engineering and a Ph.D. 
in Informatics. Her teaching includes undergraduate 
and graduate courses in information retrieval, 
digital libraries, knowledge representation and 
markup languages. She has been involved in 
research projects in the areas of cultural heritage, 
multimedia databases and information retrieval. 
Currently her main research interests are 
information retrieval, digital preservation and the 
management of research data. 

João Correia Lopes 
Assistant Professor in Informatics Engineering at 
Universidade do Porto, Researcher at INESC TEC 
João Correia Lopes is an Assistant Professor in 
Informatics Engineering at Universidade do Porto and a 
researcher at INESC TEC. He graduated in Electrical 
Engineering from the University of Porto in 1984 and holds a 
PhD in Computing Science from Glasgow University 
(1997). His teaching includes undergraduate and 
graduate courses in databases and web applications, 
software engineering and object-oriented programming, 
markup languages and semantic web. He has been 
involved in research projects in the areas of long-term 
preservation, service-oriented architectures and 
e-Science. Currently his main research interests are 
e-Science and the management of research data. 
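Extras
[Figure: Dendro architecture. Labels recovered from the diagram:]
- Users: human users (web), external systems 
- Presentation: web interface in AngularJS (JavaScript); API and SPARQL endpoint serving RDF/XML, HTML and JSON 
- Business logic: NodeJS (JavaScript), with a DB adapter, an ES endpoint and a GridFS client, each exchanging JSON with the data layer 
- Data: graph database (LOD) on OpenLink Virtuoso 7; distributed document index on ElasticSearch; file storage cluster on MongoDB (GridFS) 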
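[Figure: Dendro workflow in four numbered steps. Labels recovered from the diagram:]
- Actors: data producers, curator, data reuser 
- Data flow: working files, Dendro, metadata validation, deposit, curated dataset 
- Ontologies: specification of new metadata ontologies; domain-specific lightweight ontologies (dcb) alongside FOAF and DC; example properties dc:title, nie:isPartOf, dcb:specimenLength; ontology concept reuse; sharing and evolution; “mature” ontologies on the web 
- Access: web portal, SPARQL endpoint, free-text search, API 
- External repositories: CKAN, Dryad 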