UW eScience Institute, April 2013


In a single generation, technology and economic conditions have radically altered the pace and practice of research. Scientific data in the life sciences, once manageable in a laboratory notebook, have exploded in scale and complexity. The number of software packages and distributed computational resources available to scientists for data storage and analysis has undergone a similar expansion. Once solitary, research is now increasingly team-based, spanning cross-disciplinary and cross-institutional collaborations. Collaboration that requires specialized scientific computing resources magnifies the challenges of integrating raw data and maintaining analysis provenance. Consequently, the full potential of these resources can only be realized if the entire pipeline from data collection to analysis can readily capture the annotations and intuition of each distributed collaborator. Currently, few tools integrate data management, provenance tracking and collaborative infrastructure into a package palatable to all stakeholders in this growing, distributed team.

Ovation™ (http://ovation.io) is a distributed and eventually consistent data management and collaboration platform. Ovation’s data model, interface and API are closely matched to the mental model of researchers, facilitating adoption by experimental and computational research teams. Ovation integrates with researchers’ existing acquisition and analysis tools including Matlab, Python, R and Mathematica. The Ovation platform helps individual scientists organize their data and track provenance, and empowers collaborative project teams through sharing of data, annotations and analyses. I will share our experience in deploying Ovation to research groups in the life sciences and discuss the potential of deeper integration with computational resources such as those at the UW eScience Institute.

Published in: Education


  1. 1. Challenges in Life Sciences data management and cloud-enabled collaboration. Barry Wark, Ph.D., Founder and President, Physion. barry@physion.us, Twitter @barryjwark
  2. 2. Barry Wark: PyXG, Voyeur, Symphony
  3. 3. The nature of scientific research has changed, challenging the fundamentals of the scientific method. Life scientists need solutions that help them bridge local needs with global resources. Think globally, act locally.
  4. 4. The nature of scientific research has changed fundamentally. Biology is a context dependent system. Studying context dependence requires lots of data.
      ‣ Data volume
          • High-content screening: desktop confocal can image 25,000 samples per day
          • Human genome $5000, and falling fast
          • IonWorks Barracuda® can perform 6,000 whole-cell patch clamp experiments per hour
      ‣ Data variety
          • “Coherent” data sets (e.g. Sage, Personal Genome Project)
          • Behavior, anatomy, physiology, genomics experiments on the same subject
      ‣ Analytical tools
          • Central computing resources, elastic provisioning
          • Open source software democratizes contribution and distribution
      ‣ Teams
          • Experimental and analytical specialization
          • Research cores and consortia
          • Distributed across organizations and institutions
  5. 5. Pipelined data flow through computational resources. [Diagram labels: Researcher, dataset, Analyst, Result/Report]
  6. 6. Data that is not easily pipelined doesn’t get incorporated. [Diagram labels: three Researcher/dataset pairs, only one feeding Analyst and Result/Report; annotation: “Not scalable”]
  7. 7. Analysis provenance that transits individual researchers is hard to track. [Diagram labels: Researcher, dataset, Analyst, Result/Report, repeated across several researchers]
  8. 8. Comprehensive data management must span the entire data lifecycle. [Chart: Complexity/Cost versus data lifecycle stage (Acquisition to Analysis); items plotted: Enterprise SDMS, Analytical tools, ELN, OSF, Paper notebook, Figshare]
  9. 9. Comprehensive data management must span the entire data lifecycle. [Chart: Complexity/Cost versus data lifecycle stage (Acquisition to Analysis); items plotted: Enterprise SDMS, Analytical tools, Ovation, ELN, OSF, Paper notebook, Figshare]
  10. 10. Ovation’s data model describes science. Ovation is built to represent the language of science. Scientific data, regardless of discipline, fits this model. Analogous example shows that representing music in the appropriate language of the domain provides an appropriate data model.
      • Lab notebook representation: Music, in the language of the domain expert. May include margin notes, etc.
      • Ovation representation: Computer representation in the language of the domain expert (including “margin notes” from composer, conductor, etc.). Any genre of music is representable.
  11. 11. Ubiquitous data model is the correct granularity for knowledge transfer. Ovation’s data model is more granular than an ELN. Instead of losing information during conversion to (and from) a report format such as a Word document or PDF, Ovation allows data to be transferred in the natural language and granularity of science.
      • Information lost in transfer: Analogous example shows that transferring data via a “report” (a sound recording) produces an information bottleneck.
      • Data transferred directly: Seamless collaboration and data transfer removes information bottlenecks.
  12. 12. Common data model enables collaboration. Interoperability across institutional boundaries is easier with Ovation than other solutions. Unlike ad-hoc or customized data management systems, every Ovation customer uses the same data model. [Diagram labels: Individual researcher, Collaborators, Global community; data transfer via Ovation data model]
  13. 13. The Ovation data model for subject definition. [Diagram entities: Project, Experiment, Epoch, Protocol, Procedure, Subject]
      Subject {
          species : Drosophila melanogaster,
          father : 79326326-9CC0-4770-8DC6-3695113C7A64,
          mother : A2D40CFF-3016-41AE-AC67-BB09A7D8D9E1
      }
  14. 14. The Ovation data model for measurements. [Diagram entities: Project, Subject, Protocol, Experiment, EquipmentSetup, Epoch, Procedure, Measurement, DataElement]
  15. 15. The Ovation data model for analysis provenance. [Diagram entities: AnalysisRecord, linked to Measurement inputs (optionally named) and DataElement outputs (named)]
  16. 16. Ovation architecture. http://ovation.io [Diagram labels: ACL, ACL, Object storage, Cloudant, CouchDB, CouchDB, Local file cache, hidden, visible]
  17. 17. Ovation uses eventual consistency. Ovation chooses availability and partition tolerance over consistency (so you can work from the coffee shop). [Diagram labels: Client 1: X2 1 Y1; Client 2: X2’ 1 Y1; Cloud: X1 Y1]
  18. 18. Ovation uses eventual consistency. This means conflicting edits can be made by disconnected clients. Append-only (mostly) and user-isolated changes at the edge of the object graph minimize these conflicts. [Diagram labels: Client 1: X2 1 Y1; Client 2: X2’ 1 Y1; Cloud: X2 X2’ 1 Y1]
  19. 19. Ovation uses eventual consistency. Ovation requires users to resolve conflicts that they have authority to decide during sync. [Diagram labels: Client 1: X2 3 Y1; Client 2: X2’ 3 Y1; Cloud: X2 X2’ 3 Y1]
  20. 20. Ovation Scientific Data Management System®
      • Comprehensive data management
          • Multi-modality
          • Multi-user annotation
          • Analysis provenance
      • Seamless user experience
          • Double-click installation
          • Integration with existing tools: Matlab, Python, R, Java
          • Guide to success
      • Effective collaboration
          • Distributed and co-located experts
          • Data ownership maintained
          • Cloud-based replication and archiving
  21. 21. Integrated analysis workflow. Analysis pipelines that begin with a search facilitate automatic incorporation of new results. [Workflow stages: Acquire, Organize, Search, Analyze]
      %% Run a simple query
      iterator = context.query(Epoch, ...criteria... );
      while(iterator.hasNext())
          currEpoch = iterator.next();
          ...analyze currEpoch...
      end
  22. 22. Integrated analysis workflow. [Workflow stages: Acquire, Organize, Search, Analyze] Replication technology allows Ovation to replicate a subset of the database for data locality within a computational cluster. Execute workflows on a local or cloud cluster.
  23. 23. context = NewDataStoreCoordinator(username, password).getContext();
      epochs = context.query(context.getQuery('query-name'));

      %% analysis parameters
      params = struct();
      params.MaxLag = 1000;          % time window for cross-correlation function
      params.ResponseDelayPts = 0;   % exclude at end of modulated light
      params.MinAnalysisEpochs = 3;
      params.FrequencyCutoff = 500;
      params.FlushData = 1;

      %% ANALYZE AND COLLECT RESULTS ====> ORIGINAL ANALYSIS CODE HERE <====

      %% save analysis record for this figure
      ar = project.insertAnalysisRecord('Figure 1', epochs, 'AnalysisFunction.m', params, svnRevision, svnURL);
      ar.setUserDescription('Manuscript - Figure 1');
      ar.addTag('<manuscript>');
      ar.addOutput('Figure 1a', './Figure1a.pdf');
      ar.addResource('Figure 1b', './Figure1b.pdf');
  24. 24. Share data in context. [Diagram: a Trial with Stimulus, Response and DerivedResponse (name: spikes, parameters: {…}, code: spikes.m), shared via the link ovation:///f694d05a-131b-4644-aa7c-f6e8934e60c0/ and shown again on the recipient’s side]
  25. 25. Share data in context. [Diagram entities: Project, Source, Experiment, Experiment, Device, Trial Group, Trial, Trial, Trial, each Trial with Stimulus and Response; DerivedResponse (name: spikes, parameters: {…}, code: spikes.m)]
  26. 26. Ovation enables researchers to extract more knowledge from existing data
      • Lab’s lifetime work was enough data to answer fundamental questions about signal and noise in the early visual system
      • Data was locked in individuals’ ad-hoc data management
      • Ovation enabled meta-analysis of this existing data
      • New graduate students start with the old data, not new experiments
      “Ovation has changed the way we do science…” (Fred Rieke)
      [Slide background: page from a journal article (header fragment: “et al. • Arrestin Competition … (38):11867–11879”); body text not recoverable]
  27. 27. Our vision: living data sets Data Data Data
  28. 28. Our vision: living data sets Data Data Data
  29. 29. ovation.io
      • Store and archive all your data
          • Safe, secure, highly reliable cloud storage
          • “Offline” archiving
      • Make your data available wherever you need it
          • Replicate and synchronize data to multiple devices
          • Benefit from our scalable cloud-based architecture
      • Collaborate locally and globally
          • Share selected data with designated users or the public
      • Pay for what you use
          • Simple monthly fee
  30. 30. Data replication with ovation.io
  31. 31. Neuron Inference in Visual Adaptation Collaboration with ovation.io [Slide images: a protein sequence in FASTA format (>sp|P63252|1-427) and a page excerpt from a paper on contrast and luminance adaptation in mouse retinal ganglion cells, including “Figure 1. The Time Course of Adaptation following an Increase in Temporal Contrast Depends on the Period between Contrast Switches”; the overlapping page text is not recoverable in reading order]
  32. 32. Analysis provenance that transits individual researchers is hard to track. [Diagram labels: Researcher, dataset, Analyst, Result/report, repeated across several researchers]
  33. 33. Ovation enables integration of non-pipeline data, and comprehensive analysis provenance. [Diagram labels: Researcher, dataset, Analyst, Result/report, repeated across several researchers]
  34. 34. Getting started with Ovation: ✓ Signup ✓ Download ✓ Get started. http://ovation.io, info@ovation.io, @ovation_io
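
Slides 13 to 15 and 23 describe a data model in which measurements hang off epochs and an analysis record ties outputs back to the inputs and parameters that produced them. The following is a minimal Python sketch of that idea, not the Ovation API: the class names echo the slide labels, but every field, method and value here is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from uuid import uuid4


@dataclass
class Measurement:
    """A named data element recorded during an epoch (hypothetical fields)."""
    name: str
    values: List[float]


@dataclass
class Epoch:
    """One recording period; holds the measurements made under a protocol."""
    protocol: str
    measurements: List[Measurement] = field(default_factory=list)


@dataclass
class AnalysisRecord:
    """Ties analysis outputs back to the measurements and parameters used."""
    name: str
    inputs: List[Measurement]
    parameters: Dict[str, float]
    outputs: Dict[str, str] = field(default_factory=dict)
    uuid: str = field(default_factory=lambda: str(uuid4()))

    def input_names(self) -> List[str]:
        """List the names of the input measurements, for provenance reports."""
        return [m.name for m in self.inputs]


# Illustrative usage with made-up data.
epoch = Epoch(protocol="contrast-step")
epoch.measurements.append(Measurement("current", [0.1, 0.4, 0.2]))

record = AnalysisRecord("Figure 1", list(epoch.measurements), {"MaxLag": 1000})
record.outputs["Figure 1a"] = "./Figure1a.pdf"
print(record.input_names())
```

Keeping inputs, parameters and outputs in one record is what lets a later reader ask which data and settings produced a given figure.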
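
Slides 17 to 19 describe disconnected clients making conflicting edits that a user later resolves during sync. A rough sketch of how such a conflict could be detected is shown below. It is a deliberately simplified model with integer revision numbers; the `Revision`, `sync` and `resolve` names are hypothetical, and it does not reproduce how Ovation or CouchDB store revisions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    number: int             # depth of this revision in the edit history
    parent: Optional[int]   # revision number this edit was based on
    value: str


def sync(cloud: List[Revision], incoming: Revision) -> List[Revision]:
    """Merge one client revision into the cloud's list of live revisions.

    More than one live revision afterwards means the object is in conflict.
    """
    # The revision that the incoming edit was based on is superseded.
    live = [r for r in cloud if r.number != incoming.parent]
    if incoming not in live:
        live.append(incoming)
    return live


def resolve(live: List[Revision], value: str) -> List[Revision]:
    """Collapse conflicting revisions into a single new, user-chosen revision."""
    top = max(r.number for r in live)
    return [Revision(number=top + 1, parent=top, value=value)]


# Two clients edit X1 while disconnected, then both sync.
cloud = [Revision(1, None, "X1")]
cloud = sync(cloud, Revision(2, 1, "X2"))    # client 1
cloud = sync(cloud, Revision(2, 1, "X2'"))   # client 2
in_conflict = len(cloud) > 1

# A user with authority picks the winning value.
cloud = resolve(cloud, "X2")
```

Because neither client is blocked while offline, the cost of availability is paid later, at sync time, by whoever has authority to choose between the competing revisions.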
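
Slide 21 notes that a pipeline which begins with a search picks up new results automatically. The sketch below illustrates why, under the assumption that epochs are plain dictionaries in an in-memory list; `query` and `mean_response` are hypothetical helpers, not Ovation calls.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Epoch = Dict[str, Any]  # hypothetical stand-in for a stored epoch


def query(epochs: Iterable[Epoch], criteria: Callable[[Epoch], bool]) -> Iterator[Epoch]:
    """Yield every epoch that satisfies the search criteria."""
    return (e for e in epochs if criteria(e))


def mean_response(epochs: Iterable[Epoch]) -> float:
    """Toy analysis: average the 'response' field over the matched epochs."""
    values = [float(e["response"]) for e in epochs]
    return sum(values) / len(values)  # assumes at least one epoch matched


def is_contrast(epoch: Epoch) -> bool:
    return epoch["protocol"] == "contrast"


store: List[Epoch] = [
    {"protocol": "contrast", "response": 1.0},
    {"protocol": "luminance", "response": 9.0},
    {"protocol": "contrast", "response": 3.0},
]
before = mean_response(query(store, is_contrast))

# A newly acquired epoch is included the next time the same pipeline runs.
store.append({"protocol": "contrast", "response": 5.0})
after = mean_response(query(store, is_contrast))
```

The analysis code never lists its inputs by hand, so re-running it after acquisition is enough to fold in the new data.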
