eBank UK: linking research data, scholarly communication
and learning
Dr Liz Lyon, Rachel Heery, Monica Duke,
UKOLN, University of Bath
Dr Simon Coles, Dr Jeremy Frey, Prof. Michael Hursthouse,
School of Chemistry, University of Southampton
Dr Leslie Carr, Christopher Gutteridge,
School of Electronics and Computer Science, University of Southampton
Abstract
This paper includes an overview of the changing landscape of scholarly communication and
describes outcomes from the innovative eBank UK project, which seeks to build links from
e-research through to e-learning. By way of introduction, the scholarly knowledge cycle is
described, and the role of digital repositories and aggregator services in linking data-sets from
Grid-enabled projects to e-prints through to peer-reviewed articles as resources in portals and
Learning Management Systems is assessed. The development outcomes from the eBank UK project
are presented, including the distributed information architecture, requirements for common
ontologies, data models, metadata schema, open linking technologies, provenance and workflows.
Some emerging challenges for the future are presented in conclusion.
1. Introduction and context – the
scholarly knowledge cycle
The background and context for the eBank
project is the changing landscape of scholarly
communication afforded by the developing
global network infrastructure (Internet, World
Wide Web, Semantic Web and Grid) together
with the potential for increasing links between
the activities of e-research / e-science, digital
libraries and e-learning at all levels from
schools to post-graduate.
The flow of data and information in the
process of research and learning in an academic
setting leads ultimately to the creation and
acquisition of knowledge. Data and information
are continuously used and re-used in new ways as
part of a research activity or in the development
of course materials in support of learning. These
workflows are interlinked and together underpin
the scholarly knowledge cycle (figure 1) [SKC].
The creation, management, sharing and
interoperation of original data (which may be,
for example, numerical data generated by an
experiment or a survey, or images captured as
part of a clinical study together with the
associated metadata) is an intrinsic part of e-
Science, keeping data & metadata together. This
process consists of one or more subprocesses
which might include aggregation of
experimental data, selection of a particular data
subset, repetition of a laboratory experiment,
statistical analysis or modelling of a set of data,
manipulation of a molecular structure,
annotation of a diagram or editing of a digital
image, and which in turn generate modified
datasets. This in turn entails managing version
control and access to previous versions, as well
as auditing information on processes and
transformations.
The successive levels of derived data are
closely related to the original data (through
refinement, transformation, interpretation,
generalisation etc.) and may themselves be
further processed to form successive
interpretations or abstractions. If the data at any
particular stage of processing is considered to
be generically useful, it may be considered for
inclusion in a publicly-available database for
use beyond the boundaries of the project which
funded the data. This has only been undertaken
in fairly specialised fields of joint community
endeavour (for example protein and genome
sequence databases in the bioinformatics field
and, in chemistry, material safety information
and the crystal structures of the Cambridge
Structural Database), as such databases require
some level of service guarantee from a service
provider. Provision in other areas of chemistry,
for example spectroscopic data, is patchy due to
the cost of maintaining the data as well as
attitudes towards sharing.
Figure 1: The Scholarly Knowledge Cycle
Eventually, useful and interesting
knowledge can be interpreted from the
processed data; this knowledge is incorporated
into a scientific article for communication with
the scientific community through the
publication process. Sufficient data must be
included to support the arguments and claims
advanced in the article; however, the practical
limits of the publication medium mean that the
data is shown in a very restricted form (such as
a graph or summary table). As well as being
submitted for peer review and publication, the
paper may be deposited in an eprint
archive for immediate dissemination and
discussion by the community.
These secondary items (the articles and
database tables) can themselves re-enter the
research cycle through a citation in a related
paper or inclusion in a meta-study, or inclusion
in postgraduate training or undergraduate
learning activities by a reference in a reading
list or in modular materials as part of an online
course in a Learning Management System.
The eBank UK project is addressing this
challenge of whole-lifecycle reuse by
investigating the role of aggregator services in
linking data-sets from Grid-enabled projects to
e-prints contained in digital repositories through
to peer-reviewed articles as resources in e-
learning portals.
2. The eBank UK Project –
addressing the data publication
bottleneck
This innovative JISC-funded project, which is
led by UKOLN in partnership with the
Universities of Southampton and Manchester, is
seeking to build the links between e-research
data, scholarly communication and other on-line
sources. It is working in the chemistry domain
with the EPSRC funded eScience testbed
CombeChem [combechem], which is a pilot
project that seeks to apply the Grid philosophy
to integrate existing structure and property data
sources into an information and knowledge
environment. The specific exemplar chosen
from this subject area is that of crystallography,
as it has a strict workflow and produces data
that is rigidly formatted to an internationally
accepted standard. It therefore demonstrates the
usefulness of an end-to-end philosophy,
tracking from laboratory to literature and back
again (figure 2).
Figure 2. A generalised workflow for
crystallography experiments.
The EPSRC National Crystallography
Service (NCS) is housed in the School of
Chemistry at the University of Southampton,
and is an ideal case study due to its high sample
throughput, state of the art instrumentation,
expert personnel and profile in the academic
chemistry community. Moreover, recent
advances in crystallographic technology and
computational resources have caused an
explosion of crystallographic data, as shown by
the recent exponential growth of the Cambridge
Structural Database (CSD). However, despite
this rise it is commonly recognised that, via
traditional publication routes, only
approximately 20% of the data generated
globally is reaching the public domain. This
situation is even worse in the high throughput
NCS scenario, where approximately 15% of its
output is disseminated, despite producing more
than 60 peer reviewed journal articles per annum.
With the advent of the eScience environment
imminent, this problem can only get more
severe.
The key issue is that the accepted unit of
dissemination is the peer-reviewed journal
article; the relationship between the article and
the collection of data that underpins it (as made
available in the published article itself) is weak,
being expressed in reduced graphical or tabular
form. Consequently not only is a small minority
of scientific research being published in
recognised outlets, but the amount of useful
information being disseminated by those articles
is a tiny fraction of the available (original)
information.
To improve dissemination of published
articles, the Open Archives Initiative (OAI) allows
researchers to share metadata describing papers
that they make available in ‘institutional’ or
‘subject-based’ repositories. Building on the
OAI concept we present an institutional
repository that also makes available all the raw,
derived and results data from a crystallographic
experiment that cannot be included in the
peer-reviewed published article (figure 3). A schema
for the crystallographic experiment has been
devised that details crystallographic metadata
items and is built on a generic schema for
scientific experiments.
Figure 3. A crystallographic EPrint
record, containing bibliographic information,
an interactive visualisation of the derived
structure, metadata and links to all the data
generated during the course of the
experiment.
The deposition process generates metadata
to be presented to the OAI interface by two
different mechanisms. Firstly the depositor must
enter metadata manually, which is mainly
common bibliographic information but also
includes chemical information items that have
been added to the conventional Dublin Core
schema. The chemical information includes, for
example, an InChI code, which is an
international identifier that uniquely describes
the two dimensional structure.
Secondly, during the deposition of data in a
Crystallographic EPrint, metadata items are
seamlessly extracted and indexed for searching
at the local archive level. The top level
document includes 'Dublin Core bibliographic'
and 'chemical identifier' metadata elements in
an Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH) [OAIPMH]
compliant form, which allow access to a
secondary level of searchable crystallographic
metadata items (figure 4), which are in turn
directly linked to the associated archived data.
In this manner the output from a
crystallographic experiment may be
disseminated as ‘data’ in such a way that
aggregator services and researchers may add
value to it and transform it into knowledge, and
the publication bottleneck problem can be
overcome.

[Diagram: data hierarchy. Labels: INVESTIGATION; HOLDING; RAW DATA, DERIVED, RESULTS; DATASET (contains DATAFILES); CIF; STRUCTURE REPORT; REPORT (EPrint); EBank REPORT; JOURNAL PUBLICATION; EPrint (Local); EBank (World)]

Stages of the crystallography experiment, with the files and metadata associated with each stage:

Initialisation: Mount new sample on diffractometer; parameterisation to set up data collection
  Files (name, type, description):
    datcol.non      ASCII    Parameters for producing i*.kcd files
    i*.kcd          BINARY   Unit cell determination images
    .drx            ASCII    Unit cell
  Metadata (name, data type):
    Morphology               STRING

Collection: Collect data
  Files (name, type, description):
    collect.rmat    ASCII    Orientation of the crystal
    datcol.non      ASCII    Command file
    .py             ASCII    Proprietary configuration file
    s*.kcd          BINARY   Diffraction images
    *scan*.jpg      JPG      Visual version of .kcd file
  Metadata (name, data type):
    Instrument_Type          STRING
    Temperature              INTEGER
    Software_Name (x n)      STRING (x n)
    Software_Version (x n)   INTEGER (x n)
    Software_URL (x n)       URL (x n)

Processing: Process and correct images
  Files (name, type, description):
    scale_all.in    ASCII    Result of processing
    scale_all.out   ASCII    Result of correction on processed data
    .hkl            ASCII    Derived data set
    .htm            HTML     Report file
  Metadata (name, data type):
    Cell_a                   INTEGER
    Cell_b                   INTEGER
    Cell_c                   INTEGER
    Cell_alpha               INTEGER
    Cell_beta                INTEGER
    Cell_gamma               INTEGER
    Crystal_system           STRING
    Completeness             INTEGER
    Software_Name (x n)      STRING (x n)
    Software_Version (x n)   INTEGER (x n)
    Software_URL (x n)       URL (x n)

Solution: Solve structures
  Files (name, type, description):
    .prp            ASCII    Symmetry file, log of process
    xs.lst          ASCII    Solution log file
  Metadata (name, data type):
    Space_group              STRING
    Figure_of_merit          INTEGER
    Software_Name (x n)      STRING (x n)
    Software_Version (x n)   INTEGER (x n)
    Software_URL (x n)       URL (x n)

Refinement: Refine structure
  Files (name, type, description):
    xl.lst          ASCII    Final refinement listing
    .res            ASCII    Output coordinates
  Metadata (name, data type):
    R1_obs                   INTEGER
    wR2_obs                  INTEGER
    R1_all                   INTEGER
    wR2_all                  INTEGER
    Software_Name (x n)      STRING (x n)
    Software_Version (x n)   INTEGER (x n)
    Software_URL (x n)       URL (x n)

CIF: Produce CIF
  Files (name, type, description):
    .cif            ASCII    Final results

Report: Generate e-Print report
  Files (name, type, description):
    .html           HTML     Publication format (HTML/XHTML)
  Metadata (name, data type):
    Authors                  STRING
    Affiliations             STRING
    Formula                  STRING
    Compound_name            STRING
    2D_diagram               STRING
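The two-level arrangement described above (a top-level record carrying bibliographic and chemical identifier metadata, giving access to per-stage crystallographic metadata linked to the archived files) can be sketched as a small data model. This is an illustrative sketch only, not the eBank UK implementation: the class names `StageRecord` and `CrystalEPrint`, the method names and all sample values are assumptions; only the stage names and metadata item names are taken from the schema.

```python
from dataclasses import dataclass, field


@dataclass
class StageRecord:
    """One experiment stage: its archived files plus stage-level metadata."""
    name: str
    files: list[str] = field(default_factory=list)
    metadata: dict[str, object] = field(default_factory=dict)


@dataclass
class CrystalEPrint:
    """Top-level record: bibliographic and chemical identifier metadata,
    giving access to a second level of per-stage metadata."""
    title: str
    creators: list[str]
    inchi: str
    stages: list[StageRecord] = field(default_factory=list)

    def find_stage_metadata(self, key: str) -> dict[str, object]:
        """Return {stage name: value} for every stage that records `key`."""
        return {s.name: s.metadata[key] for s in self.stages if key in s.metadata}

    def files_for(self, stage_name: str) -> list[str]:
        """Return the files archived for the named stage (empty if unknown)."""
        for s in self.stages:
            if s.name == stage_name:
                return s.files
        return []


# Hypothetical sample values, for illustration only.
record = CrystalEPrint(
    title="Example compound",
    creators=["Surname, Forename"],
    inchi="InChI=1S/CH4/h1H4",
    stages=[
        StageRecord("Collection", ["collect.rmat", "s01.kcd"], {"Temperature": 120}),
        StageRecord("Solution", ["xs.lst"], {"Space_group": "P21/c"}),
    ],
)
```

A search over the second-level metadata, such as `record.find_stage_metadata("Space_group")`, leads back to the stage and, through `files_for`, to the archived files for that stage.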
3. Architecture and Data Flow
The metadata about the datasets available in the
repository will be harvested by a central service
using the OAI-PMH. This metadata will then be
indexed together with any other available
metadata about research publications. A
searchable interface will enable users to
discover datasets as well as related literature.
The metadata contains links back to the datasets
which the user will be able to follow in order to
obtain access to the original data, when this is
available. Harvested metadata from different
repositories not only provides a common entry
point to potentially disparate resources (such as
datasets in dataset repositories and published
literature which may reside elsewhere) but also
offers the potential of enhancement of the
metadata such as the addition of subject
keywords to research datasets based on the
knowledge of subject classification terms
assigned to related publications. A further area
of work investigates the embedding of the
search interface within web sites, adopting their
look-and-feel. The PSIgate portal (see footnote 1)
will be used to pilot these embedding techniques
based on CGI-mechanisms and portal-related
standards, particularly for resource discovery in
a teaching and research training context.
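The metadata enhancement mentioned above, adding subject keywords to dataset records from the subject terms of related publications, might be sketched as follows. This is a minimal sketch under stated assumptions: the record layout (`subjects`, `related_datasets`), the function name `enrich_datasets` and the identifiers are hypothetical and are not taken from the eBank UK service.

```python
def enrich_datasets(datasets, publications):
    """Copy subject keywords from publications onto the datasets they relate to.

    datasets: dict mapping dataset identifier -> set of subject keywords
    publications: list of dicts with 'subjects' (list of str) and
                  'related_datasets' (list of dataset identifiers)
    Returns a new dict; the inputs are left unchanged.
    """
    enriched = {ds_id: set(keywords) for ds_id, keywords in datasets.items()}
    for pub in publications:
        for ds_id in pub.get("related_datasets", []):
            # Links to datasets that were not harvested are ignored.
            if ds_id in enriched:
                enriched[ds_id].update(pub.get("subjects", []))
    return enriched


# Hypothetical harvested records, for illustration only.
datasets = {"ds:001": {"crystal structure"}, "ds:002": set()}
publications = [
    {"subjects": ["palladium complexes", "phosphines"], "related_datasets": ["ds:001"]},
    {"subjects": ["polymorphism"], "related_datasets": ["ds:002", "ds:999"]},
]
enriched = enrich_datasets(datasets, publications)
```

The sketch assumes that the link between a publication and a dataset is already present in the harvested metadata; discovering such links is a separate problem.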
During deposition of the crystallographic experiment data sets into the
e-data repository, a number of descriptive elements are either entered by
the depositor (figure 6), or generated automatically, as described
previously. The repository is OAI-PMH compliant, such that it can respond
to a number of (HTTP) requests made by ‘harvesters’. Harvesters collect
metadata from the e-data repositories and can pool metadata from
distributed sources, thus performing the function of aggregators. Once an
aggregator has harvested collections of metadata, it can take on the role
of the ‘service provider’ (in OAI-PMH terms). eBank UK has developed a
demonstrator of one such aggregator, or service provider.
Figure 4. A representation of the crystallography experiment schema,
indicating all the files generated during the course of the experiment.
Footnote 1: http://www.psigate.ac.uk/
One immediate benefit of the harvesting approach is that the service
provider can now act as a single entry point for the discovery of
resources that are distributed in disparate repositories. Once the
metadata has been harvested by the eBank UK aggregator, an SGML indexing
and search engine is used to provide the searching functionality over the
metadata. The search results can be presented to the user through an HTML
interface with links back to the data repositories where the original
data resides. Any links made between different data sets or data sets and
published literature by searching across the aggregated metadata are
revealed to the user through the display of the search results.
Moreover, the Cheshire search engine [Cheshire] used in the eBank UK
service provider makes it simple to make the data available through
additional machine interfaces, such as the SOAP-based Search and Retrieve
Web Service (SRW) [SRW]. This provides a basis for embedding the search
service into external frameworks, such as portals; other mechanisms that
could be employed include simple CGI-based mechanisms and portal
protocols such as WSRP.
The harvesting approach overcomes the limitations of cross searching
protocols, which are well-documented in the digital library literature
[Powell, Lossau]. However, the knowledge that the aggregator possesses
about the resources in the repositories is embodied in the metadata that
has been harvested, and thus the services that it can offer are directly
related to the quality, quantity and content of the metadata that is
available (see section 4).
Figure 5. The schema elements presented to the OAI interface.
Columns: Data Name | Data Description | XML wrapped content
  EPrint_type | ‘Crystal Structure’ | Phrase ‘Crystal Structure’
  Creators | ePrint author(s) | ePrint authors ‘Surname, Christian name, initial’
  Affiliations | Institution(s) of creator(s) | Various authors addresses
  Formula_empirical | Total atom count | Atom symbols with their total count (can be real number) subscript
  Title | IUPAC Chemical name | Chemical name with text & integers
  CCDC_Code | Cambridge Structural Database identifier | 6 digit code
  Compound_class | Chemical category | 1 word descriptor of chemical category
  Keywords | Undefined descriptors | Phrase describing chemical relevance
  Available_data | Actual data available for various eData Report stages | Presence of data associated with all stages
  Related_publications | Other output relating to this compound/structure | Literature reference link
  Publication_date | Date of releasing eData Report to eBank/world | Date of public release of ePrint
  Structure_diagram | 3D structure diagram | Three dimensional structural diagram in CML format
  InChI | International Chemical Identifier | Unique compound identifier (contains some structural information)
Data Type column: String throughout (one entry given as String (set)), except CML for Structure_diagram and InChI.
4. Meeting the interoperability
challenge: design of an XML schema
for harvesting.
The OAI-PMH supports selective harvesting of repositories, based on the
partitioning of resources into ‘sets’ or on last updated date, so that
cumulative harvesting of repositories can be performed. Data Exchange in
the OAI-PMH is carried out in XML. Whilst the protocol specifies
provision of the simple Dublin Core set of elements as a minimum
interoperability requirement, the exchange of more specialised metadata
sets is encouraged (see figure 5 for significant chemistry information
used in eBank). The metadata, encoded in XML, is simply wrapped up in the
OAI-PMH response wrappers (also encoded in XML). The alternative metadata
formats can be specified in the requests by stating the ‘metadata format’
parameter required.
Experience within the ePrints community has shown that there are
significant challenges to be met when building services on aggregated
metadata [Guy]. Although a simple set of metadata elements like the
Dublin Core goes some way to providing interoperability,
guidelines are required to help reach some consensus on the specific
interpretation of the use of those elements, for example by specifying an
acceptable minimum of metadata to be completed, agreeing on the format
for author names and encoding the identifiers of the resources described
[Powell2].
In the eBank UK project we have made use of the extension mechanisms of
the so-called qualified Dublin Core. We reuse some of the basic 15 Dublin
Core elements, such as creator and type [SimpleDC]. The identifier field
contains a link to the HTML page in the edata repository that contains
links to all the data sets related to an experiment. The metadata also
contains references to the identifiers of individual sets of data within
an experiment, to enable future links to be made should that data be
referenced elsewhere. Vocabularies specific to the crystallography data
are being used to describe the dataset types as well as the names and
identifiers of the chemicals which are the subject of the experimental
data. The encoding of these vocabularies in the XML schemas has been
developed in line with the recommendations of the Dublin Core Metadata
Initiative [DCguidelines], and they can be easily substituted when
standard vocabularies emerge within the Crystallography community.
Further, the Dublin Core guidelines for encoding in XML recommend that
“Encoding schemes should be implemented using the 'xsi:type' attribute of
the XML element for the property.”
Figure 6. Entering metadata during the deposition process.
There are a number of guidelines for emergent standards in the chemistry
community [Chemistry xml]; an example from our schema is the
representation of the Chemical empirical formula in the following manner:
<dc:subject xsi:type="ebankterms:empiricalFormula">C288 H200 Cl24 F48 O48 P16 Pd16</dc:subject>
There are a number of reasons for using the combination of Simple and
Qualified Dublin Core. Firstly, the use of qualified DC should facilitate
the mapping of the elements to simple Dublin Core, which as mentioned
earlier, is a requisite for OAI-PMH compliant repositories. Secondly, by
re-using and adhering to a well-documented standard it is hoped that the
applicability of the approach will extend to other communities that the
eBank UK project hopes to interact with in the future.
In principle, the Dublin Core fields used for ‘bibliographic’ information
in the data set descriptions should work well with descriptions of
published literature available through the OAI-PMH and which could be
harvested and cross-searched in the eBank UK service. In practice the
Open Access approach is not currently as widespread in the Chemistry
community as in some other fields, and metadata available via the OAI-PMH
that describes the published literature is rather scarce. However, some
publishers have recently become more willing to expose bibliographic
information on articles in journals that they publish more openly, i.e.
through specified open standards (if not the articles themselves)
[announcement]. The eBank UK project has been pro-active in engaging the
leading organisations involved in the quality control, publication and
dissemination of crystallography data and it is hoped that the relevant
publishers can be encouraged to release their metadata for interaction
with the eBank UK service demo.
One further point that should be noted is that the current metadata
exchanged to an extent hides a degree of complexity in the underlying
experimental data being deposited by the crystallographers. Such data is
often manifested as a series of related files that may have some
underlying ordering, based, for example, on the stage of the experiment
from which the data was produced. In this context, the eBank UK project
is investigating the use of so-called ‘packaging standards’ which attempt
to address the need to describe more complex objects. This is an issue
which is increasingly becoming of interest to the OAI-PMH community
[Herbert]. One other initiative, which is currently seeking to increase
access to scientific (climatic) data, has opted for a simplified approach
of offering data (often in the region of terabytes in size) through one
‘access’ point identified by a single identifier [German]. On the other
hand, the CCLRC [cclrc] is developing a model which attempts to make
explicit the inherent complexity of scientific datasets, and the eBank UK
project could contribute the experience gained to date to this effort.
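The harvesting mechanics described at the start of this section can be illustrated with a short sketch: building a selective `ListRecords` request, and picking out qualified `dc:subject` elements by their `xsi:type` attribute. This is a sketch under stated assumptions rather than the eBank UK harvester. The request arguments (`verb`, `metadataPrefix`, `set`, `from`) are those defined by the OAI-PMH, but the base URL, the set name, the record wrapper and the `ebankterms` namespace URI are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
DC_SUBJECT = "{http://purl.org/dc/elements/1.1/}subject"


def list_records_url(base_url, metadata_prefix, set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request, optionally restricted by set and date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def typed_subjects(record_xml):
    """Return (xsi:type value, normalised text) for each qualified dc:subject."""
    root = ET.fromstring(record_xml)
    return [
        (el.get(XSI_TYPE), " ".join((el.text or "").split()))
        for el in root.iter(DC_SUBJECT)
        if el.get(XSI_TYPE) is not None
    ]


# Hypothetical record fragment; the ebankterms namespace URI is a placeholder.
sample = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:ebankterms="http://example.org/ebankterms/">
  <dc:subject xsi:type="ebankterms:empiricalFormula">C288
H200 Cl24 F48 O48 P16 Pd16</dc:subject>
  <dc:subject>unqualified keyword</dc:subject>
</record>
"""

url = list_records_url("http://example.org/oai", "oai_dc",
                       set_spec="crystal", from_date="2004-01-01")
subjects = typed_subjects(sample)
```

A consumer that only understands simple Dublin Core can ignore the `xsi:type` attribute and still read the element as an ordinary subject, which is the fallback behaviour that the combination of simple and qualified Dublin Core is intended to allow.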
5. Conclusions – issues and challenges
for the future
This paper recounts the lessons learnt so far in the provision of
experimental data together with the results of analysis linked to an
e-print article and has assessed the requirements for future work and
identified the key challenges that still need to be addressed. Finally,
the immediate benefits to the learning & teaching and research
communities in the scholarly environment are indicated and considered
together with the potential for very significant impact in the longer
term.
An immediate challenge for the project is to work with the
crystallographic community to develop an internationally standard
vocabulary for the exchange of structural chemistry data via OAI-PMH
procedures. The eBank UK approach must then be promoted to
crystallography experimentalists and sold to the publishing community as
a whole in order to embed it into the scholarly learning and research
cycles. At the same time the issues of maintenance of the software and
archive along with preservation of the data may also be addressed.
The primary challenge for the future will be to generalise this approach
to archiving and disseminating crystallographic data so that it may be
employed in other areas of chemistry and also further to all the
experimental sciences.
References
[announcement] Institute of Physics
http://www.iop.org/news/0467j
[Chemistry xml] XML Data Dictionaries in Chemistry
http://www.iupac.org/projects/2002/2002-022-1-024.html
[cclrc] Sufi, S., Matthews, B. & Kleese van Dam, K. (2003). An
interdisciplinary model for the representation of scientific studies and
associated data holdings. UK e-Science All Hands Meeting, Nottingham,
2-4 September 2003.
http://www.nesc.ac.uk/events/ahm2003/AHMCD/pdf/020.pdf
[Cheshire] Cheshire Web Page
http://cheshire.berkeley.edu/
[combechem] Frey, J. G., Bradley, M., Essex, J. W., Hursthouse, M. B.,
Lewis, S. M., Luck, M. M., Moreau, L., De Roure, D. C., Surridge, M. and
Welsh, A. (2003) Grid Computing – Making the Global Infrastructure a
Reality, in Berman, F., Fox, G. and Hey, T., Eds. Wiley Series in
Communications Networking and Distributed Systems, pp. 945-962. John
Wiley and Sons.
[DCguidelines] Andy Powell and Pete Johnston (2003) Guidelines for
implementing Dublin Core in XML
http://www.dublincore.org/documents/2003/04/02/dc-xml-guidelines/
[eBank UK Project] eBank UK Project, JISC/EPSRC E-Science
http://www.ukoln.ac.uk/projects/ebank-uk/
[German] Publication and Citation of Scientific Primary Data. DFG-Support
Program: "Information-infrastructure of network-based
scientific-cooperation and digital publication"
Project Page: http://www.std-doi.de/
[Guy] Marieke Guy and Andy Powell (2004) Improving the Quality of
Metadata in Eprint Archives, Ariadne Issue 38, January 2004
http://www.ariadne.ac.uk/issue38/guy/intro.html
[Herbert] Jeroen Bekaert, Patrick Hochstenbach and Herbert Van de Sompel
(2003) Using MPEG-21 DIDL to Represent Complex Digital Objects in the Los
Alamos National Laboratory Digital Library. D-Lib Magazine. Vol 9 No 11.
http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
[Lossau] Norbert Lossau (2004) Search Engine Technology and Digital
Libraries: Libraries Need to Discover the Academic Internet. D-Lib
Magazine. Volume 10 Number 6. June 2004 ISSN 1082-9873
http://www.dlib.org/dlib/june04/lossau/06lossau.html
[OAIPMH] The Open Archives Initiative Protocol for Metadata Harvesting
http://www.openarchives.org/OAI/openarchivesprotocol.html
[Powell] Andy Powell (2001) An OAI Approach to Sharing Subject Gateway
Content. Poster paper, 10th WWW conference, Hong Kong. May 2001
http://www.rdn.ac.uk/publications/www10/oaiposter/
[Powell2] Andy Powell, Michael Day and Peter Cliff (2003). Using simple
Dublin Core to describe eprints. March 2003
http://www.rdn.ac.uk/projects/eprints-uk/docs/simpledc-guidelines/
[SimpleDC] Expressing Simple Dublin Core in XML
http://www.dublincore.org/documents/2002/07/31/dcmes-xml/
[SKC] Liz Lyon. (2003) eBank UK – building the links between research
data, scholarly communication and learning. Ariadne, Issue 36
http://www.ariadne.ac.uk/issue36/lyon/
[SRW] SRW Search/Retrieve Web Service
http://www.loc.gov/z3950/agency/zing/srw/

  • 1. eBank UK: linking research data, scholarly communication and learning Dr Liz Lyon, Rachel Heery, Monica Duke, UKOLN, University of Bath Dr Simon Coles, Dr Jeremy Frey, Prof. Michael Hursthouse, School of Chemistry, University of Southampton Dr Leslie Carr, Christopher Gutteridge, School of Electronics and Computer Science, University of Southampton Abstract This paper includes an overview of the changing landscape of scholarly communication and describes outcomes from the innovative eBank UK project, which seeks to build links from e- research through to e-learning. As introduction, the scholarly knowledge cycle is described and the role of digital repositories and aggregator services in linking data-sets from Grid-enabled projects to e-prints through to peer-reviewed articles as resources in portals and Learning Management Systems, are assessed. The development outcomes from the eBank UK project are presented including the distributed information architecture, requirements for common ontologies, data models, metadata schema, open linking technologies, provenance and workflows. Some emerging challenges for the future are presented in conclusion. 1. Introduction and context – the scholarly knowledge cycle The background and context for the eBank project is the changing landscape of scholarly communication afforded by the developing global network infrastructure (Internet, World Wide Web, Semantic Web and Grid) together with the potential for increasing links between the activities of e-research / e-science, digital libraries and e-learning at all levels from schools to post-graduate. The flow of data and information in the process of research and learning in an academic setting leads ultimately to the creation and acquisition of knowledge. Data and information is continuously used and re-used in new ways as part of a research activity or in the development of course materials in support of learning. 
These workflows are interlinked and together underpin the scholarly knowledge cycle (figure 1) [SKC]. The creation, management, sharing and interoperation of original data (which may be, for example, numerical data generated by an experiment or a survey, or images captured as part of a clinical study together with the associated metadata) is an intrinsic part of e-Science, keeping data and metadata together. This process consists of one or more subprocesses which might include aggregation of experimental data, selection of a particular data subset, repetition of a laboratory experiment, statistical analysis or modelling of a set of data, manipulation of a molecular structure, annotation of a diagram or editing of a digital image, and which in turn generate modified datasets. This in turn entails managing version control and access to previous versions, as well as auditing information on processes and transformations. The successive levels of derived data are closely related to the original data (through refinement, transformation, interpretation, generalisation etc.) and may themselves be further processed to form successive interpretations or abstractions. If the data at any particular stage of processing is considered to be generically useful, it may be considered for inclusion in a publicly-available database for use beyond the boundaries of the project which funded the data. This has only been undertaken in fairly specialised fields of joint community endeavour (for example, protein and genome sequence databases in bioinformatics and, in chemistry, material safety information and the crystal structures in the Cambridge Structural Database), as such databases require some level of service guarantee from a service provider.
  • 2. Figure 1: The Scholarly Knowledge Cycle
Provision in other chemistry areas, such as spectroscopic data, is patchy due to the cost of maintaining the data as well as attitudes towards sharing. Eventually, useful and interesting knowledge can be interpreted from the processed data; this knowledge is incorporated into a scientific article for communication with the scientific community through the publication process. Sufficient data must be included to support the arguments and claims advanced in the article; however, the practical limits of the publication medium mean that the data is shown in a very restricted form (such as a graph or summary table). As well as being submitted for peer review and publication, the paper may be deposited in an eprint archive for immediate dissemination and discussion by the community. These secondary items (the articles and database tables) can themselves re-enter the research cycle through a citation in a related paper or inclusion in a meta-study, or through inclusion in postgraduate training or undergraduate learning activities by a reference in a reading list or in modular materials as part of an online course in a Learning Management System. The eBank UK project is addressing this challenge of whole-lifecycle reuse by investigating the role of aggregator services in linking data-sets from Grid-enabled projects to e-prints contained in digital repositories through to peer-reviewed articles as resources in e-learning portals.
2. The eBank UK Project – addressing the data publication bottleneck
This innovative JISC-funded project, which is led by UKOLN in partnership with the Universities of Southampton and Manchester, is seeking to build the links between e-research data, scholarly communication and other on-line sources.
It is working in the chemistry domain with the EPSRC-funded eScience testbed CombeChem [combechem], which is a pilot project that seeks to apply the Grid philosophy to integrate existing structure and property data sources into an information and knowledge environment. The specific exemplar chosen from this subject area is that of crystallography, as it has a strict workflow and produces data
  • 3. that is rigidly formatted to an internationally accepted standard, which demonstrates the usefulness of an end-to-end philosophy, tracking from laboratory to literature and back again (figure 2).
Figure 2. A generalised workflow for crystallography experiments.
The EPSRC National Crystallography Service (NCS) is housed in the School of Chemistry at the University of Southampton, and is an ideal case study due to its high sample throughput, state of the art instrumentation, expert personnel and profile in the academic chemistry community. Moreover, recent advances in crystallographic technology and computational resources have caused an explosion of crystallographic data, as shown by the recent exponential growth of the Cambridge Structural Database (CSD). However, despite this rise it is commonly recognised that, via traditional publication routes, only approximately 20% of the data generated globally is reaching the public domain. This situation is even worse in the high throughput NCS scenario, where approximately 15% of its output is disseminated, despite the service producing more than 60 peer-reviewed journal articles per annum. With the advent of the eScience environment imminent, this problem can only get more severe. The key issue is that the accepted unit of dissemination is the peer-reviewed journal article; the relationship between the article and the collection of data that underpins it (as made available in the published article itself) is weak, being expressed in reduced graphical or tabular form. Consequently, not only is a small minority of scientific research being published in recognised outlets, but the amount of useful information being disseminated by those articles is a tiny fraction of the available (original) information.
Figure 3. A crystallographic EPrint record, containing bibliographic information, an interactive visualisation of the derived structure, metadata and links to all the data generated during the course of the experiment.
To improve dissemination of published articles, the Open Archives Initiative (OAI) allows researchers to share metadata describing papers that they make available in ‘institutional’ or ‘subject-based’ repositories. Building on the OAI concept we present an institutional repository that also makes available all the raw, derived and results data from a crystallographic experiment that cannot be included in the peer-reviewed published article (figure 3). A schema for the crystallographic experiment has been devised that details crystallographic metadata items and is built on a generic schema for scientific experiments. The deposition process generates metadata to be presented to the OAI interface by two different mechanisms. Firstly the depositor must enter metadata manually, which is mainly common bibliographic information but also includes chemical information items that have been added to the conventional Dublin Core schema. The chemical information includes, for example, an InChI code, which is an international identifier that uniquely describes the two dimensional structure.
[Figure residue: schema diagram with the labels INVESTIGATION, DATA HOLDING, DATASET (Contains DATAFILES), RAW, DERIVED, RESULTS, CIF, STRUCTURE REPORT, EBank REPORT (EPrint), JOURNAL PUBLICATION, EPrint (Local), EBank (World).]
  • 4. Figure 4. A representation of the crystallography experiment schema, indicating all the files generated during the course of the experiment.
Each stage is listed with its description, the files associated with it (file, type, description) and the metadata associated with it (name, data type).
Initialisation: Mount new sample on diffractometer. Parameterisation to set up data collection.
    Files: datcol.non (ASCII), Parameters for producing i*.kcd files; i*.kcd (BINARY), Unit cell determination images; .drx (ASCII), Unit cell
    Metadata: Morphology (STRING)
Collection: Collect Data
    Files: collect.rmat (ASCII), Orientation of the crystal; datcol.non (ASCII), Command file; .py (ASCII), Proprietary configuration file; s*.kcd (BINARY), Diffraction images; *scan*.jpg (JPG), Visual version of .kcd file
    Metadata: Instrument_Type (STRING); Temperature (INTEGER); Software_Name (x n) (STRING); Software_Version (x n) (INTEGER); Software_URL (x n) (URL)
Processing: Process and correct images
    Files: scale_all.in (ASCII), Result of processing; scale_all.out (ASCII), Result of correction on processed data; .hkl (ASCII), Derived data set; .htm (HTML), Report file
    Metadata: Cell_a, Cell_b, Cell_c, Cell_alpha, Cell_beta, Cell_gamma (INTEGER); Crystal_system (STRING); Completeness (INTEGER); Software_Name (x n) (STRING); Software_Version (x n) (INTEGER); Software_URL (x n) (URL)
Solution: Solve Structures
    Files: .prp (ASCII), Symmetry file, log of process; xs.lst (ASCII), Solution log file
    Metadata: Space_group (STRING); Figure_of_merit (INTEGER); Software_Name (x n) (STRING); Software_Version (x n) (INTEGER); Software_URL (x n) (URL)
Refinement: Refine Structure
    Files: xl.lst (ASCII), Final refinement listing; .res (ASCII), Output coordinates
    Metadata: R1_obs, wR2_obs, R1_all, wR2_all (INTEGER); Software_Name (x n) (STRING); Software_Version (x n) (INTEGER); Software_URL (x n) (URL)
CIF: Produce CIF
    Files: .cif (ASCII), Final results
Report: Generate e-Print report
    Files: .html (HTML), Publication format (HTML/XHTML)
    Metadata: Authors, Affiliations, Formula, Compound_name (STRING); 2D_diagram (STRING)
During the deposition of data in a Crystallographic EPrint, metadata items are seamlessly extracted and indexed for searching at the local archive level. The top level document includes 'Dublin Core bibliographic' and 'chemical identifier' metadata elements in an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [OAIPMH] compliant form, which allow access to a secondary level of searchable crystallographic metadata items (figure 4), which are in turn directly linked to the associated archived data. In this manner the output from a crystallographic experiment may be disseminated as ‘data’ in such a way that aggregator services and researchers may add value to it and transform it into knowledge, and the publication bottleneck problem can be overcome.
3. Architecture and Data Flow
The metadata about the datasets available in the repository will be harvested by a central service using the OAI-PMH. This metadata will then be indexed together with any other available metadata about research publications. A searchable interface will enable users to discover datasets as well as related literature. The metadata contains links back to the datasets, which the user will be able to follow in order to obtain access to the original data, when this is available. Harvested metadata from different repositories not only provides a common entry point to potentially disparate resources (such as datasets in dataset repositories and published literature which may reside elsewhere) but also offers the potential of enhancement of the metadata, such as the addition of subject keywords to research datasets based on the knowledge of subject classification terms assigned to related publications. A further area of work investigates the embedding of the search interface within web sites, adopting their look-and-feel. The PSIgate [1] portal will be used to pilot these embedding techniques based on CGI-mechanisms and portal-related standards, particularly for resource discovery in a teaching and research training context.
[1] http://www.psigate.ac.uk/
During deposition of the crystallographic experiment data sets into the e-data repository, a number of descriptive elements are either entered by the depositor (figure 6), or generated automatically, as described previously. The repository is OAI-PMH compliant, such that it
  • 5. can respond to a number of (HTTP) requests made by ‘harvesters’. Harvesters collect metadata from the e-data repositories and can pool metadata from distributed sources, thus performing the function of aggregators. Once an aggregator has harvested collections of metadata, it can take on the role of the ‘service provider’ (in OAI-PMH terms). eBank UK has developed a demonstrator of one such aggregator, or service provider. One immediate benefit of the harvesting approach is that the service provider can now act as a single entry point for the discovery of resources that are distributed in disparate repositories. Once the metadata has been harvested by the eBank UK aggregator, an SGML indexing and search engine is used to provide the searching functionality over the metadata. The search results can be presented to the user through an HTML interface with links back to the data repositories where the original data resides. Any links made between different data sets, or between data sets and published literature, by searching across the aggregated metadata are revealed to the user through the display of the search results. Moreover, the Cheshire search engine [Cheshire] used in the eBank UK service provider makes it simple to make the data available through additional machine interfaces, such as the SOAP-based Search and Retrieve Web Service (SRW) [SRW]. This provides a basis for embedding the search service into external frameworks, such as portals; other mechanisms that could be employed include simple CGI-based mechanisms and portal protocols such as WSRP. The harvesting approach overcomes the limitations of cross searching protocols, which are well-documented in the digital library literature [Powell, Lossau]. However, the knowledge that the aggregator possesses about the resources in the repositories is embodied in the metadata that has been harvested, and thus the services that it can offer are directly related to the quality, quantity and content of the metadata that is available (see section 4).
Figure 5. The schema elements presented to the OAI interface.
Each element is listed as Data Name: Data Description; XML wrapped content. Data Type is String for the textual elements (one of them a String set) and CML for Structure_diagram.
    EPrint_type: ‘Crystal Structure’ ePrint; Phrase ‘Crystal Structure’
    Creators: ePrint author(s); ePrint authors ‘Surname, Christian name, initial’
    Affiliations: Institution(s) of creator(s); Various authors addresses
    Formula_empirical: Total atom count; Atom symbols with their total count (can be real number) subscript
    Title: IUPAC Chemical name; Chemical name with text & integers
    CCDC_Code: Cambridge Structural Database identifier; 6 digit code
    Compound_class: Chemical category; 1 word descriptor of chemical category
    Available_data: Actual data available for various eData Report stages; Presence of data associated with all stages
    Related_publications: Other output relating to this compound/structure; Literature reference link
    Publication_date: Date of releasing eData Report to eBank/world; Date of public release of ePrint
    Keywords: Undefined descriptors; Phrase describing chemical relevance
    Structure_diagram: 3D structure diagram; Three dimensional structural diagram in CML format
    InChI: International Chemical Identifier; Unique compound identifier (contains some structural information)
4. Meeting the interoperability challenge: design of an XML schema for harvesting.
The OAI-PMH supports selective harvesting of repositories, based on the partitioning of resources into ‘sets’ or on last updated date, so that cumulative harvesting of repositories can be performed. Data exchange in the OAI-PMH is carried out in XML. Whilst the protocol specifies provision of the simple Dublin Core set of elements as a minimum interoperability requirement, the exchange of more specialised metadata sets is encouraged (see figure 5 for significant chemistry information used in eBank). The metadata, encoded in XML, is simply wrapped up in the OAI-PMH response wrappers (also encoded in XML). The alternative metadata formats can be specified in the requests by stating the ‘metadata format’ parameter required. Experience within the ePrints community has shown that there are significant challenges to be met when building services on aggregated metadata [Guy]. Although a simple set of metadata elements like the Dublin Core goes some way to providing interoperability, guidelines are required to help reach some consensus on the specific interpretation of the use of those elements, for example by specifying an acceptable minimum of metadata to be completed, agreeing on the format for author names and encoding the identifiers of the resources described [Powell2]. In the eBank UK project we have made use of the extension mechanisms of the so-called qualified Dublin Core. We reuse some of the
  • 6. basic 15 Dublin Core elements, such as creator and type [SimpleDC]. The identifier field contains a link to the HTML page in the edata repository that contains links to all the data sets related to an experiment. The metadata also contains references to the identifiers of individual sets of data within an experiment, to enable future links to be made should that data be referenced elsewhere. Vocabularies specific to the crystallography data are being used to describe the dataset types as well as the names and identifiers of the chemicals which are the subject of the experimental data. The encoding of these vocabularies in the XML schemas has been developed in line with the recommendations of the Dublin Core Metadata Initiative [DCguidelines], and they can be easily substituted when standard vocabularies emerge within the Crystallography community. Further, the Dublin Core guidelines for encoding in XML recommend that “Encoding schemes should be implemented using the 'xsi:type' attribute of the XML element for the property.” There are a number of guidelines for emergent standards in the chemistry community [Chemistry xml]; an example from our schema is the representation of the Chemical empirical formula in the following manner:
<dc:subject xsi:type="ebankterms:empiricalFormula">C288 H200 Cl24 F48 O48 P16 Pd16</dc:subject>
Figure 6. Entering metadata during the deposition process.
There are a number of reasons for using the combination of Simple and Qualified Dublin Core. Firstly, the use of qualified DC should facilitate the mapping of the elements to simple Dublin Core, which, as mentioned earlier, is a requisite for OAI-PMH compliant repositories. Secondly, by re-using and adhering to a well-documented standard it is hoped that the applicability of the approach will extend to other communities that the eBank UK project hopes to interact with in the future. In principle, the Dublin Core fields used for ‘bibliographic’ information in the data set descriptions should work well with descriptions of published literature available through the OAI-PMH and which could be harvested and cross-searched in the eBank UK service. In practice the Open Access approach is not currently as widespread in the Chemistry community as in some other fields, and metadata available via the OAI-PMH that describes the published literature is rather scarce. However, some publishers have recently become more willing to expose bibliographic information on articles in journals that they publish more openly, that is, through specified open standards (if not the articles themselves) [announcement]. The eBank UK project has been pro-active in engaging the leading organisations involved in the quality control, publication and dissemination of crystallography data, and it is hoped that the relevant publishers can be encouraged to release their metadata for interaction with the eBank UK service demo.
One further point that should be noted is that the current metadata exchanged to an extent hides a degree of complexity in the underlying experimental data being deposited by the crystallographers. Such data is often manifested as a series of related files that may have some underlying ordering, based, for example, on the stage of the experiment from which the data was produced. In this context, the eBank UK project is investigating the use of so-called ‘packaging standards’ which attempt to address the need to describe more complex objects. This is an issue which is increasingly becoming of interest to the OAI-PMH community [Herbert]. One other initiative, which is currently seeking to increase access to scientific (climatic) data, has opted for a simplified approach of offering data (often in the region of terabytes in size) through one ‘access’ point identified by a single identifier [German]. On the other hand, the CCLRC [cclrc] is developing a model which attempts to make explicit the inherent complexity of scientific datasets, and the eBank UK project could contribute the experience gained to date to this effort.
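As an illustration of two points made in section 4, the sketch below builds an OAI-PMH ListRecords request that harvests selectively by set and by date, and then reads a Dublin Core element whose encoding scheme is given by the 'xsi:type' attribute. It is a minimal sketch, not the eBank UK implementation: the repository URL, the set name, the sample record wrapper and the namespace URI bound to the ebankterms prefix are all assumptions made for the example. Only the request parameters (verb, metadataPrefix, set, from) and the dc:subject element are taken from the protocol and from the text above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Assumption: placeholder URI, the real ebankterms namespace is not given in the text.
EBANK_NS = "http://example.org/ebankterms/"


def list_records_url(base_url, metadata_prefix, set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for selective harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec      # harvest only one 'set' of resources
    if from_date is not None:
        params["from"] = from_date    # harvest only records updated since this date
    return base_url + "?" + urlencode(params)


# Hypothetical record fragment; only the dc:subject line comes from the paper.
SAMPLE_RECORD = f"""
<record xmlns:dc="{DC_NS}" xmlns:xsi="{XSI_NS}" xmlns:ebankterms="{EBANK_NS}">
  <dc:type>Crystal Structure</dc:type>
  <dc:subject xsi:type="ebankterms:empiricalFormula">C288 H200 Cl24 F48 O48 P16 Pd16</dc:subject>
  <dc:subject>unqualified keyword</dc:subject>
</record>
"""


def qualified_subjects(xml_text):
    """Map each xsi:type encoding scheme to the value of its dc:subject element."""
    root = ET.fromstring(xml_text.strip())
    found = {}
    for element in root.iter(f"{{{DC_NS}}}subject"):
        scheme = element.get(f"{{{XSI_NS}}}type")
        if scheme is not None:  # skip subjects without an encoding scheme
            found[scheme] = (element.text or "").strip()
    return found


if __name__ == "__main__":
    # Hypothetical repository endpoint and set name.
    print(list_records_url("http://example.org/oai", "oai_dc",
                           set_spec="crystal", from_date="2004-01-01"))
    print(qualified_subjects(SAMPLE_RECORD))
```

A harvester that only understands simple Dublin Core can ignore the xsi:type attribute and still read the element as an ordinary subject, which is the mapping benefit the text attributes to qualified DC.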
  • 7. 5. Conclusions – issues and challenges for the future
This paper recounts the lessons learnt so far in the provision of experimental data together with the results of analysis linked to an e-print article and has assessed the requirements for future work and identified the key challenges that still need to be addressed. Finally, the immediate benefits to the learning & teaching and research communities in the scholarly environment are indicated and considered together with the potential for very significant impact in the longer term. An immediate challenge for the project is to work with the crystallographic community to develop an internationally standard vocabulary for the exchange of structural chemistry data via OAI-PMH procedures. The eBank UK approach must then be promoted to experimentalists and sold to the publishing crystallography community as a whole in order to embed it into the scholarly learning and research cycles. At the same time the issues of maintenance of the software and archive along with preservation of the data may also be addressed. The primary challenge for the future will be to generalise this approach to archiving and disseminating crystallographic data so that it may be employed in other areas of chemistry and also further to all the experimental sciences.
References
[announcement] Institute of Physics http://www.iop.org/news/0467j
[Chemistry xml] XML Data Dictionaries in Chemistry http://www.iupac.org/projects/2002/2002-022-1-024.html
[cclrc] Sufi, S., Matthews, B. & Kleese van Dam, K. (2003). An interdisciplinary model for the representation of scientific studies and associated data holdings. UK e-Science All Hands Meeting, Nottingham, 2-4 September 2003. http://www.nesc.ac.uk/events/ahm2003/AHMCD/pdf/020.pdf
[Cheshire] Cheshire Web Page http://cheshire.berkeley.edu/
[combechem] Frey, J. G., Bradley, M., Essex, J. W., Hursthouse, M. B., Lewis, S. M., Luck, M. M., Moreau, L., De Roure, D. C., Surridge, M. and Welsh, A. (2003) Grid Computing – Making the Global Infrastructure a Reality, in Berman, F., Fox, G. and Hey, T., Eds. Wiley Series in Communications Networking and Distributed Systems, pp. 945-962. John Wiley and Sons.
[DCguidelines] Andy Powell and Pete Johnston (2003) Guidelines for implementing Dublin Core in XML http://www.dublincore.org/documents/2003/04/02/dc-xml-guidelines/
[eBank UK Project] eBank UK Project JISC/EPSRC E-Science http://www.ukoln.ac.uk/projects/ebank-uk/
[German] Publication and Citation of Scientific Primary Data. DFG-Support Program: "Information-infrastructure of network-based scientific-cooperation and digital publication" Project Page: http://www.std-doi.de/
[Guy] Marieke Guy and Andy Powell (2004) Improving the Quality of Metadata in Eprint Archives, Ariadne Issue 38 January-2004 http://www.ariadne.ac.uk/issue38/guy/intro.html
[Herbert] Jeroen Bekaert, Patrick Hochstenbach and Herbert Van de Sompel (2003) Using MPEG-21 DIDL to Represent Complex Digital Objects in the Los Alamos National Laboratory Digital Library. D-Lib Magazine. Vol 9 No 11. http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
[Lossau] Norbert Lossau (2004) Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet. D-Lib Magazine. Volume 10 Number 6. June 2004 ISSN 1082-9873 http://www.dlib.org/dlib/june04/lossau/06lossau.html
[OAIPMH] The Open Archives Initiative Protocol for Metadata Harvesting http://www.openarchives.org/OAI/openarchivesprotocol.html
[Powell] Andy Powell (2001) An OAI Approach to Sharing Subject Gateway Content. Poster paper, 10th WWW conference, Hong Kong. May 2001 http://www.rdn.ac.uk/publications/www10/oaiposter/
  • 8. [Powell2] Andy Powell, Michael Day and Peter Cliff (2003). Using simple Dublin Core to describe eprints. March 2003 http://www.rdn.ac.uk/projects/eprints-uk/docs/simpledc-guidelines/
[SimpleDC] Expressing Simple Dublin Core in XML http://www.dublincore.org/documents/2002/07/31/dcmes-xml/
[SKC] Liz Lyon. (2003) eBank UK – building the links between research data, scholarly communication and learning. Ariadne, Issue 36 http://www.ariadne.ac.uk/issue36/lyon/
[SRW] SRW Search/Retrieve Web Service http://www.loc.gov/z3950/agency/zing/srw/