2. What is Structural Biology and the SB Interest Group
Chairs:
Lucia Banci (U. Florence)
Chris Morris (STFC)
Antonio Rosato (U. Florence)
RDA Plenary 3 provided most of the indications that
led to the implementation of the West-Life VRE
4. 16 Instruct Centres
157 platforms offered for access
>2650 registered users
6 National Affiliated Centres
11 user networks
Operational since 2012
For all Instruct platforms: www.structuralbiology.eu
5. Instruct is the major player for integrative structural biology
across Europe, making it possible to tackle demanding projects that
require joint technical efforts.
It also links to other biomedical infrastructures via the CORBEL
project.
Instruct offers economic gains through new marketable technologies,
new drug and vaccine development, and improved health.
Impact
Scarselli, Cantini, Banci, Rappuoli et al.,
Science Transl. Med. 2011
By knowing the structural properties of the antigens
and of the epitopes in all the variants, a chimera
antigen was produced which elicits complete
protective immunity. Patent WO 2011051893 A1
Structure-based design of a Vaccine
against Meningococcus B
6. What do we have and what do we need regarding DATA?
Open Data culture pervasive in the Life Sciences (except
pharma). However this only applies to final (i.e. atomic
coordinates) and supporting (~intermediate) data. Structures
have their own DOIs, so are citable
Structural data are discoverable and accessible via the
wwPDB. Supporting data not necessarily so (but
INSTRUCT mandates that). Raw data are not discoverable,
usually unregistered
There have been significant efforts to ensure that
supporting data are reusable, i.e. structural data can be
re-computed
The above applies to traditional, single-technique
studies
7.
8. What do we have and what do we need regarding DATA?
In the INSTRUCT data policy:
"storage of data is the responsibility of the User to whom it belongs. [...]
Instruct Centres are not required to take responsibility for storing data beyond
the immediate acquisition visit or the time taken for post experimental analysis
if the latter is also provided by the Centre. However, Instruct Centres aspire to
offer an archive to store data, especially in cases where the data volume makes
this more practical than transferring the data, although there is not a concrete
time limit for this implementation"
The data belong to the Users until they are published.
Then, the Users must make the data available
Specialized trusted repositories exist for most
individual types of structural data.
9. What about integration of different structural data?
INSTRUCT is “Integrative Structural Biology”
However, if the data from each technique are deposited in
separate repositories, re-use will not be possible
The wwPDB task force recommended:
In addition to archiving the models themselves, all relevant experimental data
and metadata as well as experimental and computational protocols should be
archived. Inclusivity is key.
A. Sali et al. Structure 2015
10. Key issues for INSTRUCT users
SB researchers visit multiple experimental facilities
and use multiple techniques to obtain info on their
systems
Correspondingly, they collect data in a variety of
different formats, described by different
metadata, even for the same feature (e.g. sample
preparation)
The aforementioned visits take place over several
weeks or even months, also depending on the
information obtained
Facilities have different policies for users’ data
storage, sometimes providing only very short term
storage
11. What do we need regarding DATA?
With integrated structural biology becoming routine we will
need to
Re-evaluate and merge ontologies and metadata in a
cross-technique manner.
Metadata need to be generated early in integrated SB to
allow data to be meaningfully combined later on.
However the infrastructure only sees the users when their
samples are ready.
Document how raw/intermediate data are combined to
get to a final structural description.
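One way to picture the two needs above (cross-technique metadata and a record of how data are combined) is the minimal sketch below. It is illustrative only: the field names in `FIELD_MAP`, the file names, and the software labels are hypothetical and are not taken from any INSTRUCT or wwPDB schema.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from technique-specific field names to one shared vocabulary.
FIELD_MAP = {
    "nmr": {"buffer": "sample_buffer", "temp_K": "temperature_kelvin"},
    "xray": {"crystallization_buffer": "sample_buffer", "temperature": "temperature_kelvin"},
}


def harmonize(technique, metadata):
    """Rename known technique-specific keys; keep unknown keys unchanged."""
    mapping = FIELD_MAP[technique]
    return {mapping.get(key, key): value for key, value in metadata.items()}


@dataclass
class ProvenanceRecord:
    output: str    # file or data set that was produced
    inputs: list   # files or data sets it was derived from
    software: str  # processing step that produced it


@dataclass
class ProvenanceLog:
    records: list = field(default_factory=list)

    def add(self, output, inputs, software):
        self.records.append(ProvenanceRecord(output, list(inputs), software))

    def lineage(self, output):
        """Return every upstream input that contributed to `output`."""
        by_output = {record.output: record for record in self.records}
        seen = []
        stack = [output]
        while stack:
            record = by_output.get(stack.pop())
            if record is None:
                continue  # raw data: nothing further upstream
            for item in record.inputs:
                if item not in seen:
                    seen.append(item)
                    stack.append(item)
        return seen


# Usage: two techniques feed one final structural description.
log = ProvenanceLog()
log.add("restraints.tbl", ["nmr_raw.fid"], "peak-picking")
log.add("map.mrc", ["em_frames.tif"], "reconstruction")
log.add("model.pdb", ["restraints.tbl", "map.mrc"], "integrative-modelling")
```

Recording the mapping and the provenance entries at acquisition time, rather than at deposition, is what would let the data be combined meaningfully later on.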
12. Users worry more about data
processing than data management
Agree / Disagree
If a web portal offers integrated access to data archives and to processing software, I will use it
Last year I discarded some samples or files because their provenance was not recorded well enough.
Last year I repeated some work because I could not find the sample or file produced.
I would use combined techniques if the software was easier to get and use.
Potential conflict with the open access paradigm
13. A VRE for structural biology
west-life.eu
For a description of the life cycle of SB data: https://goo.gl/G8wqrE
14. Toward a new VRE portal
Also: OneData from the INDIGO-DataCloud project
18. Data challenges and opportunities for a
federated data infrastructure
• Provide collaborative environments to share / process /
analyze data (“à la Dropbox”)
• Provide simple mechanisms to manage the data: sharing /
protection / deletion
• Various access levels depending on security requirements
– Personal X.509 certificates (not user friendly)
– robot certificates
– open access
• Control data quality
• Provide mechanisms for citation / recognition
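The access levels and sharing mechanisms listed above could be modelled roughly as follows. This is a minimal sketch under stated assumptions: the ordering of the three levels, the `Dataset` class, and the rule that protected data are readable only by the owner or by users it was shared with are all hypothetical, not a description of the West-Life implementation.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class AccessLevel(IntEnum):
    # Assumed ordering from weakest to strongest credential (not stated in the slides).
    OPEN = 0
    ROBOT_CERTIFICATE = 1
    PERSONAL_CERTIFICATE = 2


@dataclass
class Dataset:
    name: str
    owner: str
    required_level: AccessLevel
    shared_with: set = field(default_factory=set)

    def share(self, user):
        """Grant another user read access (the 'sharing' mechanism)."""
        self.shared_with.add(user)

    def can_read(self, user, presented_level):
        """Open data are readable by anyone; protected data need a strong
        enough credential and either ownership or an explicit share."""
        if self.required_level == AccessLevel.OPEN:
            return True
        if presented_level < self.required_level:
            return False
        return user == self.owner or user in self.shared_with


# Usage: unpublished raw data stay protected, a published model is open.
raw = Dataset("raw_frames", "alice", AccessLevel.PERSONAL_CERTIFICATE)
published = Dataset("final_model", "alice", AccessLevel.OPEN)
```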
19.
20. Schematic diagram of a structure determination data pipeline.
From Westbrook et al. NAR, 2003
21. Open data access
• What is the data lifecycle in Structural Biology?
• Often linked to a specific grant