In a single generation, technology and economic conditions have radically altered the pace and practice of research. The quantity and complexity of scientific data have grown exponentially. Once manageable by lab notebooks alone, datasets now routinely outstrip the capabilities of ad-hoc management strategies assembled from notebooks, document management systems and data file servers. Management systems for particular data types exist, but they create data silos in projects that span disciplines. The number of software packages that we typically rely upon for primary and follow-on analyses has grown along with dataset size and complexity. With each analysis and transfer of critical data in and out of isolated software environments, we risk losing the provenance that links raw data to final product. Once relatively rare in the life sciences, team-based research is now common, adding the challenge of managing data and analyses across a group of researchers. Combined, these trends threaten to erode the careful record of data collection and analysis that is the cornerstone of the scientific method. Being a fully engaged and productive life science researcher now requires significant project and knowledge management skills, including the ability to organize large, multi-faceted datasets and to collaborate efficiently with numerous internal and external colleagues. What is needed is a solution that facilitates data organization and exploration, captures and maintains provenance of analysis, and enables sharing of raw or filtered data, annotations, analyses and insights with collaborators.
At Physion, we are rethinking the “lab notebook” by embracing the challenges of modern collaborative science. I hope to foster an active discussion based upon our product development experiences, perspective on current and evolving data management practices and challenges, and your experiences as researchers in this new era.
About Physion
Physion is dedicated to helping scientists do great science. By combining technical expertise with deep domain knowledge, we strive to engineer software that frees scientists to be more productive researchers.
About Barry Wark
Barry is the founder and President of Physion. He received a B.S. in Symbolic Systems from Stanford University in 2002 and a PhD from the Graduate Program in Neurobiology and Behavior at the University of Washington in 2009. Barry has been developing scientific software since 1996.
Stanford Neurosciences Professional Development Seminar April 2013
1. Comprehensive data management and collaboration
in life sciences
Barry Wark, Ph.D.
Founder and President, Physion
barry@physion.us
Twitter @barryjwark
Wednesday, April 10, 13
3. The nature of scientific research has changed, challenging
the fundamentals of the scientific method
There are technological solutions that can help you
overcome these challenges
Think globally, act locally
5. The nature of scientific research has changed
fundamentally
Biology is a context dependent system. Studying
context dependence requires lots of data.
‣Data volume
• High-content screening: desktop confocal can image 25,000 samples per day
• Human genome $5000, and falling fast
• IonWorks Barracuda® can perform 6,000 whole-cell patch clamp experiments per hour
‣Data variety
• “Coherent” data sets (e.g. Sage, Personal Genome Project)
• Behavior, anatomy, physiology, genomics experiments on the same subject
‣Analytical tools
• Central computing resources, elastic provisioning
• Open source software democratizes contribution and distribution
‣Teams
• Experimental and analytical specialization
• Research cores and consortia
• Distributed across organizations and institutions
7. What is scientific data?
Goal: synthesize understanding
of the world
•Subject history
•Subject preparation
•Procedure
•Measurements
•Simulation
•Derived values
•Analysis
•Intuition
•Conclusions
•Intellectual trajectory
9. Data management is a growing challenge
http://stats.stackexchange.com/questions/16889/ideas-for-lab-notebook-software
10. Data management landscape
[Figure: landscape chart. Labels: Complexity/cost, Knowledge management. Entries: Enterprise SDMS, Analytical tools, ELN, Paper notebook.]
11. Data management landscape
[Figure: landscape chart. Axes: Complexity/cost; Pipeline stage (Acquisition to Analysis). Entries: Enterprise SDMS, Analytical tools, ELN, OSF, Paper notebook, Figshare.]
12. Data management landscape
[Figure: landscape chart. Axes: Complexity/cost; Pipeline stage (Acquisition to Analysis). Entries: Enterprise SDMS, Analytical tools, Ovation, ELN, OSF, Paper notebook, Figshare.]
13. Ovation’s data model describes science
Ovation is built to represent the language of science. Scientific data, regardless of
discipline, fits this model.
Analogous example shows that representing music in the appropriate language of the domain
provides an appropriate data model
Lab notebook representation: Music, in the language of the domain expert. May include margin notes, etc.
Ovation representation: Computer representation in the language of the domain expert (including “margin notes” from composer, conductor, etc.). Any genre of music is representable.
14. Ubiquitous data model is the correct granularity for
knowledge transfer
Ovation’s data model is more granular than an ELN. Instead of losing information
during conversion to (and from) a report format such as a Word document or PDF,
Ovation allows data to be transferred in the natural language and granularity of
science.
Information lost in transfer: analogous example shows that transferring data via a “report” (a sound recording) produces an information bottleneck
Data transferred directly: seamless collaboration and data transfer removes information bottlenecks
15. Common data model enables collaboration
Interoperability across institutional boundaries is easier with Ovation than other
solutions. Unlike ad-hoc or customized data management systems, every Ovation
customer uses the same data model.
[Figure: Individual researcher, Collaborators, Global community]
Data transfer via Ovation data model
16. Ovation Scientific Data Management System®
• Comprehensive data management
• Multi-modality
• Multi-user annotation
• Analysis provenance
• Seamless user experience
• Double-click installation
• Integration with existing tools: Matlab, Python,
R, Java
• Guide to success
• Effective collaboration
• Distributed and co-located experts
• Data ownership maintained
• Cloud-based replication and archiving
17. What is the exact record of modern research?
Source
ID: xyz123
Birthday: Dec-1-2010
Number of offspring: 2
Source
Mother:
Father: Source
Greg Schwartz Noldus
18. Integrated analysis workflow
Analysis pipelines that begin with a search facilitate
automatic incorporation of new results
Acquire Organize Search Analyze
%% Run a simple query
iterator = context.query('Epoch', ' ...criteria... ');
while(iterator.hasNext())
    currEpoch = iterator.next();  % fetch the next matching Epoch
    ...analyze currEpoch...
end
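The search-first idea above can be sketched outside of Ovation. The following is a minimal, self-contained Python illustration, not Ovation's API: the `Epoch` class, the `query` helper and the `mean_response` analysis are all hypothetical names invented for this example, and the "database" is just an in-memory list. The point it shows is that an analysis which starts from a query picks up newly acquired records on the next run without any change to the analysis code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Epoch:
    """Hypothetical stand-in for one recorded epoch (not Ovation's class)."""
    protocol: str
    samples: List[float]


def query(store: Iterable[Epoch], criteria: Callable[[Epoch], bool]) -> Iterator[Epoch]:
    """Lazily yield every epoch in the store that satisfies the criteria."""
    return (epoch for epoch in store if criteria(epoch))


def mean_response(store: Iterable[Epoch], protocol: str) -> float:
    """Average all samples from epochs recorded under the given protocol."""
    matches = query(store, lambda epoch: epoch.protocol == protocol)
    values = [sample for epoch in matches for sample in epoch.samples]
    if not values:
        raise ValueError(f"no epochs match protocol {protocol!r}")
    return sum(values) / len(values)


# Assumed toy data: an in-memory list plays the role of the database.
store = [Epoch("contrast", [1.0, 3.0]), Epoch("luminance", [10.0])]
before = mean_response(store, "contrast")    # uses the one matching epoch

store.append(Epoch("contrast", [5.0, 7.0]))  # a newly acquired epoch
after = mean_response(store, "contrast")     # the rerun includes it automatically
```

Because the analysis never holds a fixed list of files, acquisition and analysis stay decoupled: adding data changes the query result, not the pipeline.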
19. Integrated analysis workflow
Acquire Organize
Search Analyze
Acquire Organize
Replication technology allows Ovation to replicate a subset of the database for data locality within a computational cluster.
Execute workflows on a local or cloud cluster
20. Share data in context
DerivedResponse
Trial
name: spikes
parameters: {…}
code: spikes.m
Stimulus Response
ovation:///f694d05a-131b-4644-aa7c-f6e8934e60c0/
DerivedResponse
Trial
name: spikes
parameters: {…}
code: spikes.m
Stimulus Response
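As a rough illustration of sharing by reference, the URI shown on this slide can be taken apart with Python's standard library. This sketch assumes, without confirmation from the slides, that the path of an `ovation:///` URI is a single UUID naming the shared entity. The function name `entity_id_from_uri` is hypothetical, and nothing here contacts or models the actual Ovation service.

```python
import uuid
from urllib.parse import urlparse


def entity_id_from_uri(uri: str) -> uuid.UUID:
    """Extract the UUID from a URI shaped like the one on the slide.

    Assumption: the path component is a single UUID naming the shared entity.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "ovation":
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    # uuid.UUID raises ValueError if the path is not a well-formed UUID.
    return uuid.UUID(parsed.path.strip("/"))


entity_id = entity_id_from_uri("ovation:///f694d05a-131b-4644-aa7c-f6e8934e60c0/")
```

A collaborator who receives such a reference gets an identifier for the record itself, so the surrounding context (trial, stimulus, response, analysis code) can be looked up rather than copied into a report.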
21. Share data in context
Project Source
Experiment Experiment
Device
Trial Group
DerivedResponse
Trial Trial Trial
name: spikes
parameters: {…}
code: spikes.m
Stimulus Response
Stimulus Response
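The lower levels of the hierarchy on this slide can be sketched as plain data structures. This is a hedged illustration in Python, not Ovation's schema: the class layout, the `detect_spikes` function and its `threshold` parameter are assumptions made for the example, and the Project, Source, Experiment, Device and Trial Group levels are omitted for brevity. Only the field names `name`, `parameters` and `code`, and the values `spikes` and `spikes.m`, come from the slide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DerivedResponse:
    """An analysis result stored with the record of how it was produced."""
    name: str
    parameters: Dict[str, Any]
    code: str          # file name of the analysis code, e.g. "spikes.m"
    values: List[int]


@dataclass
class Trial:
    """One trial, holding the stimulus description and the measured response."""
    stimulus: Dict[str, Any]
    response: List[float]
    derived: List[DerivedResponse] = field(default_factory=list)


def detect_spikes(trial: Trial, threshold: float) -> DerivedResponse:
    """Toy threshold detector (hypothetical) that attaches its result to the trial."""
    indices = [i for i, value in enumerate(trial.response) if value > threshold]
    result = DerivedResponse(
        name="spikes",
        parameters={"threshold": threshold},  # parameters travel with the result
        code="spikes.m",
        values=indices,
    )
    trial.derived.append(result)
    return result


# Assumed toy data for illustration only.
trial = Trial(stimulus={"contrast": 0.36}, response=[0.1, 2.5, 0.3, 3.1])
spikes = detect_spikes(trial, threshold=1.0)
```

Keeping the derived value attached to its trial, together with the parameters and code reference, is what lets a collaborator trace a result back to the raw measurement.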
22. Ovation enables researchers to extract more
knowledge from existing data
• Lab’s lifetime work was enough data to answer fundamental questions about signal
and noise in the early visual system
• Data was locked in individual’s ad-hoc data management
• Ovation enabled meta-analysis of this existing data
• New graduate students start with the old data, not new experiments
[Figure: page excerpt from a journal article (Doan et al., Arrestin Competition, (38):11867–11879)]
“Ovation has changed the way we do science…”
—Fred Rieke
23. Whose data?
Open vs. Proprietary science
•Funding agency mandates
  •NIH and NSF require data management plans for new applications
•New repositories
  •Open Science Framework
  •Figshare
•Personal options
  •Creative Commons
  •Portable Legal Consent (human subjects)
  •Blogs, Twitter
26. ovation.io
• Store and archive all your data
• Safe, secure, highly reliable cloud storage
• “Offline” archiving
• Collaborate locally and globally
• Share selected data with designated users or the public
• Make your data available wherever you need it
• Replicate and synchronize data to multiple devices
• Benefit from our scalable cloud-based architecture
• Pay for what you use
• Simple monthly fee
28. Neuron
Inference in Visual Adaptation
Collaboration with ovation.io
[Slide images: a protein sequence record (sp|P63252|1-427) and a page excerpt from the Neuron article, section “Contrast and Luminance Adaptation Exhibit Multiple Timescales”, including Figure 1, “The Time Course of Adaptation following an Increase in Temporal Contrast Depends on the Period between Contrast Switches”.]
29. Early access for Stanford Neurosciences Program
In conjunction with this seminar, we are providing early-
access accounts on ovation.io for
Stanford Neuroscience Program students
•Collaboration events
•Survey
•Adoption
•Feedback!
•How much data?
Prize for most collaborative student
30. Getting started with Ovation
✓Signup
✓Download
✓Get started
http://ovation.io info@ovation.io @ovation_io