
Capturing Context in Scientific Experiments: Towards Computer-Driven Science

Scientists publish computational experiments in ways that do not facilitate reproducibility or reuse. Significant domain expertise, time and effort are required to understand scientific experiments and their research outputs. To improve this situation, mechanisms are needed to capture the exact details and the context of computational experiments. Only then will intelligent systems be able to help researchers understand, discover, link and reuse the products of existing research.

In this presentation I will introduce my work and vision towards enabling scientists to share, link, curate and reuse their computational experiments and results. In the first part of the talk, I will present my work on capturing and sharing the context of scientific experiments by using scientific workflows and machine-readable representations. Thanks to this approach, experiment results are described in an unambiguous manner, have a clear trace of their creation process and include a pointer to the sources used for their generation. In the second part of the talk, I will describe examples of how the context of scientific experiments may be exploited to browse, explore and inspect research results. I will end the talk by presenting new ideas for improving and benefiting from the capture of the context of scientific experiments, and for involving scientists in the process of curating and creating abstractions on available research metadata.


  1. 1. Capturing Context in Scientific Experiments: Towards Computer-Driven Science Daniel Garijo Information Sciences Institute and Department of Computer Science https://w3id.org/people/dgarijo @dgarijov dgarijo@isi.edu
  2. 2. A prediction of the future… from the past Useful for: • Everyday tasks • Organize agenda • Calls • Look for information • Research features • Summarize related work • Reuse and comparison of work • Highlights • Do new data analyses Capturing Context in Scientific Experiments: Towards Computer-Driven Science 2 Source: https://www.businessinsider.com.au/apple-future-computer-knowledge-navigator-john-sculley-george-lucas-2017-10, https://www.youtube.com/watch?v=QRH8eimU_20 The Knowledge Navigator (Apple, 1987)
  3. 3. Meeting expectations… • In terms of Data • Open datasets • Open metadata portals • In terms of Software • Open Source repositories • Containers and virtual machines • In terms of Publications • Open journals • Open methods/protocols 3Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  4. 4. What are we missing? • Methods in publications are not designed for intelligent systems • Objectives, hypotheses, methodology and conclusions are tailored for humans • The link between data, software and publications is not clear (if it exists) • Functionality and instructions for executing software require specific domain expertise • Publications are difficult to reuse and reproduce 4 [Screenshot: "Retracted Scientific Studies: A Growing List", NYTimes.com — a 2011 study in Nature found a 10-fold increase in retraction notices during the preceding decade. Among the most prominent cases: in 1998 The Lancet published a study by Dr. Andrew Wakefield suggesting that autism in children was caused by the combined vaccine for measles, mumps and rubella; The Lancet retracted it in 2010 following a review of Dr. Wakefield's scientific methods and financial conflicts. Vaccination rates tumbled in Britain, measles cases grew, and American antivaccine groups seized on the research.] Vaccines and Autism Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  5. 5. The Cost of Reproducibility 5 • Necessary to fill in the gaps • 2 months of effort to reproduce a published method [Kinnings et al, PLOS 2010] • The authors' expertise was required Comparison of ligand binding sites Comparison of dissimilar protein structures Graph network generation Molecular Docking [Garijo et al PLOS] Collaboration with UCSD Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  6. 6. Scientist-Driven Science 6 Scientists: • Keep their own records • Write their own software (data cleaning, reformatting, analysis) • Run the experiments • Manually analyze results and compare to the state of the art. Scientist + Automated Tools — tools help: • Searching • Setting up execution • Visualizing • Sharing. Requirements: • Data/dataset metadata • Software metadata • Method description • User/domain expertise. Scientist + Intelligent System — intelligent systems help: • Comparing • Reusing/repurposing • Testing new hypotheses • Explaining results. Requirements: • Functionality • Relations between data, software and method • Provenance. Together, these requirements form the context of a computational experiment. Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  7. 7. Outline • Capturing and publishing context of computational experiments • From scientific workflows to Linked Data • Capturing software functionality • Representing software metadata • Using context to facilitate reusability and exploration of experiments • Detecting commonalities among experiments • Explaining computational results • Using context in Intelligent Systems • Hypothesis testing • Environmental sciences modeling • A vision for context capture in computer-driven science 7Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  8. 8. Introduction — Background: Computational Experiments 8 Lab book → Digital log; Laboratory protocol (recipe) → Scientific workflow; Experiment → In silico experiment. Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  9. 9. Outline • Capturing and publishing context of computational experiments • From scientific workflows to Linked Data • Capturing software functionality • Representing software metadata • Using context to facilitate reusability and exploration of experiments • Detecting commonalities among experiments • Explaining computational results • Using context in Intelligent Systems • Hypothesis testing • Environmental sciences modeling • A vision for context capture in computer-driven science 9Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  10. 10. Workflow representation: structures interchanged in the workflow lifecycle. Workflow Template (design): Dataset → Stemmer algorithm → Result → Term weighting algorithm → FinalResult. Workflow Instance (instantiation): concrete choices, e.g., File: Dataset123 with the LovinsStemmer algorithm and the IDF algorithm, or File: Dataset124 with the PorterStemmer algorithm. Workflow Execution Trace (execution): the recorded runs of each instance (e.g., the LovinsStemmer execution over Dataset123, producing results resultaa1 and fresultaa2). Capturing Context in Scientific Experiments: Towards Computer-Driven Science 11
  11. 11. Requirements for workflow representation [Garijo et al., 2017 FGCS]: workflow template description and the link between templates and executions → Plan: P-Plan [Garijo et al 2012] http://purl.org/net/p-plan; workflow execution trace description → Provenance: PROV (W3C) [Lebo et al 2013] http://www.w3.org/ns/prov#; workflow attribution and metadata → Dublin Core, PROV (W3C). Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  12. 12. OPMW: Extending provenance standards and plan models template1 opmw:isVariableOfTemplate opmw:isVariable OfTemplate Input Dataset Term Weighting Topics p-plan:isOutputVarOf p-plan:hasInputVar opmw:isStepOf Template opmw:correspondsTo Template opmw:corresponds toTemplateArtifact opmw:corresponds toTemplateProcess opmw:corresponds toTemplateArtifact opmw:Workflow ExecutionProcess opmw:Workflow ExecutionAccount prov:Entity prov:Activity prov:Bundle PROV, OPM Extension opmv:Artifact opmo:Account opmv:Process opmw:Workflow ExecutionArtifact opmw:Workflow TemplateArtifact opmw:Workflow TemplateProcess opmw:Workflow Template p-plan:Plan p-plan:Step p-plan:Variable P-Plan extension Class Object property Legend Instance ofInstance Subclass of execution1 File: Dataset123 IDF (java) File: FResultaa2 prov:wasGeneratedBy prov:used opmo:account opmo:account opmo:account http://www.opmw.org/ontology/ A Vocabulary for Workflow Representation: OPMW Capturing Context in Scientific Experiments: Towards Computer-Driven Science 13
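The OPMW links on this slide can be sketched in a few lines of plain Python (no RDF library), modeling statements as (subject, predicate, object) triples. The ontology and PROV namespace URIs are the ones shown on the slides; the instance namespace and resource names are hypothetical.

```python
# Minimal sketch of the OPMW template/execution correspondence.
OPMW = "http://www.opmw.org/ontology/"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/run/"  # hypothetical instance namespace

triples = {
    # Template level: a variable of the workflow template
    (EX + "InputDataset", "rdf:type", OPMW + "WorkflowTemplateArtifact"),
    # Execution level: the concrete file consumed in one run
    (EX + "Dataset123", "rdf:type", OPMW + "WorkflowExecutionArtifact"),
    # OPMW extends PROV, so execution artifacts are also prov:Entity
    (EX + "Dataset123", "rdf:type", PROV + "Entity"),
    # The execution artifact points back to the template variable it instantiates
    (EX + "Dataset123", OPMW + "correspondsToTemplateArtifact", EX + "InputDataset"),
}

# Follow the correspondence link from execution back to template
template_of = {s: o for (s, p, o) in triples
               if p == OPMW + "correspondsToTemplateArtifact"}
print(template_of[EX + "Dataset123"])  # -> http://example.org/run/InputDataset
```

The correspondence properties are what make the lifecycle navigable: given any execution result, a query can recover the abstract step or variable it came from.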
  13. 13. Publishing workflows as Linked Data Specification Why Linked Data? •Facilitates exploitation of workflow resources in a homogeneous manner Adapted methodology from [Villazón-Terrazas et al 2011] Tested it for the WINGS workflow system 1 Base URI = http://www.opmw.org/ Ontology URI = http://www.opmw.org/ontology/ Assertion URI = http://www.opmw.org/export/resource/ClassName/instanceName Examples: http://www.opmw.org/export/resource/WorkflowTemplate/ABSTRACTSUBWFDOCKING http://www.opmw.org/export/resource/WorkflowExecutionAccount/ACCOUNT1348629350796 Publishing scientific workflows as Linked Data 14 Capturing Context in Scientific Experiments: Towards Computer-Driven Science
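The URI scheme in the specification step can be sketched as a small helper; the base URI and the `ClassName/instanceName` pattern come from the slide, while percent-encoding the names is an assumption for URI safety.

```python
# Sketch of the assertion-URI scheme:
#   http://www.opmw.org/export/resource/ClassName/instanceName
from urllib.parse import quote

BASE = "http://www.opmw.org/export/resource/"

def assertion_uri(class_name: str, instance_name: str) -> str:
    """Build a resource URI; names are percent-encoded to stay URI-safe."""
    return BASE + quote(class_name, safe="") + "/" + quote(instance_name, safe="")

print(assertion_uri("WorkflowTemplate", "ABSTRACTSUBWFDOCKING"))
# -> http://www.opmw.org/export/resource/WorkflowTemplate/ABSTRACTSUBWFDOCKING
```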
  14. 14. Publishing workflows as Linked Data Why Linked Data? •Facilitates exploitation of workflow resources in a homogeneous manner Adapted methodology from [Villazón-Terrazas et al 2011] Tested it for the WINGS workflow system Publishing scientific workflows as Linked Data Specification Modeling 1 2 OPMW P-Plan OPM DC PROV 15 Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  15. 15. Publishing workflows as Linked Data Why Linked Data? •Facilitates exploitation of workflow resources in a homogeneous manner Adapted methodology from [Villazón-Terrazas et al 2011] Tested it for the WINGS workflow system Publishing scientific workflows as Linked Data 16 Capturing Context in Scientific Experiments: Towards Computer-Driven Science Specification Modeling Generation 1 2 3 Workflow system Workflow Template Workflow execution OPMW export OPMW RDF
  16. 16. Publishing workflows as Linked Data Why Linked Data? •Facilitates exploitation of workflow resources in a homogeneous manner Adapted methodology from [Villazón-Terrazas et al 2011] Tested it for the WINGS workflow system Publishing scientific workflows as Linked Data 17 Capturing Context in Scientific Experiments: Towards Computer-Driven Science Specification Modeling Generation Publication 1 2 3 4 RDF Triple store Permanent web-accessible file store RDF Upload Interface SPARQL Endpoint OPMW RDF
  17. 17. Publishing workflows as Linked Data Why Linked Data? •Facilitates exploitation of workflow resources in a homogeneous manner Adapted methodology from [Villazón-Terrazas et al 2011] Tested it for the WINGS workflow system Publishing scientific workflows as Linked Data 18 Capturing Context in Scientific Experiments: Towards Computer-Driven Science Specification Modeling Generation Publication 1 2 3 4 Exploitation 5 Curl Linked Data Browser SPARQL endpoint Workflow explorer
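The exploitation step (5) could, for instance, send a SPARQL query to the published endpoint to list workflow templates. The prefix matches the ontology URI on the earlier slides; whether a given template carries an `rdfs:label` in the published data is an assumption, so this is a sketch, not a tested endpoint query.

```python
# A candidate SPARQL query for the exploitation step.
PREFIXES = """\
PREFIX opmw: <http://www.opmw.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""

query = PREFIXES + """\
SELECT ?template ?label WHERE {
  ?template a opmw:WorkflowTemplate .
  OPTIONAL { ?template rdfs:label ?label }
}
LIMIT 10
"""

print("opmw:WorkflowTemplate" in query)  # -> True
```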
  18. 18. Outline • Capturing and publishing context of computational experiments • From scientific workflows to Linked Data • Capturing software functionality • Representing software metadata • Using context to facilitate reusability and exploration of experiments • Detecting commonalities among experiments • Explaining computational results • Using context in Intelligent Systems • Hypothesis testing • Machine learning analysis • Environmental sciences modeling • A vision for context capture in computer-driven science 18Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  19. 19. Capturing software functionality [Garijo et al 2014a] (Collaboration with U. of Manchester) Is it possible to generalize workflow steps based on their functionality in an experiment? 19Capturing Context in Scientific Experiments: Towards Computer-Driven Science • What kind of data manipulations are performed in a workflow? •E.g.: •Data retrieval •Data preparation •Data curation •Data visualization • etc.
  20. 20. Capturing software functionality [Garijo et al 2014a] (Collaboration with U. of Manchester) Analyzed software steps of 260 workflows from 4 different workflow systems Created a catalog of workflow step functionalities (motifs) Guidelines for annotating workflows Catalog available at: http://purl.org/net/wf-motifs# 20 Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  21. 21. Outline • Capturing and publishing context of computational experiments • From scientific workflows to Linked Data • Capturing software functionality • Representing software metadata • Using context to facilitate reusability and exploration of experiments • Detecting commonalities among experiments • Explaining computational results • Using context in Intelligent Systems • Hypothesis testing • Environmental sciences modeling • A vision for context capture in computer-driven science 21Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  22. 22. Capturing Software Metadata [Gil et al 2015] • Scientific workflows capture some software metadata • A high amount of software is not used in scientific workflows • Software in open repositories often has missing metadata • How do I use it? • What can I use it with? • What are the dependencies? • Is it still maintained? • How can I contribute? • … • Ontology for scientific software metadata • Described with scientists in mind: • How can scientists contribute to populate it? • What do scientists need in terms of software? 22 Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  23. 23. Software Metadata: Categories 23Capturing Context in Scientific Experiments: Towards Computer-Driven Science Used in the OntoSoft metadata Registry: http://ontosoft.org/portals http://ontosoft.org/software
  24. 24. Using the ontology in the OntoSoft software registry 24 Capturing Context in Scientific Experiments: Towards Computer-Driven Science Software entries from distributed repositories are readily accessible Semantic search Comparison matrix of software entries (PIHM, PIHMgis, DrEICH, TauDEM, WBMsed) Metadata completion highlighted Software is contrasted by property
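The comparison-matrix idea can be sketched as a property-by-property pivot over software metadata entries, with missing metadata surfaced explicitly. The entry names echo the slide, but their field values here are hypothetical placeholders, not the registry's actual records.

```python
# Toy comparison matrix over software metadata entries.
entries = {
    "PIHM":   {"license": "(some license)", "language": "(some language)",
               "maintained": "yes"},              # hypothetical values
    "TauDEM": {"license": "(some license)"},      # most fields missing
}
properties = ["license", "language", "maintained"]

# One row per property; missing metadata is highlighted as "(missing)"
matrix = {
    prop: {name: meta.get(prop, "(missing)") for name, meta in entries.items()}
    for prop in properties
}
print(matrix["maintained"]["TauDEM"])  # -> (missing)
```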
  25. 25. Outline • Capturing and publishing context of computational experiments • From scientific workflows to Linked Data • Capturing software functionality • Representing software metadata • Using context to facilitate reusability and exploration of experiments • Detecting commonalities among experiments • Explaining computational results • Using context in Intelligent Systems • Hypothesis testing • Environmental sciences modeling • A vision for context capture in computer-driven science 25Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  26. 26. Detecting commonalities in computational experiments [Garijo et al 2014b] Problems to address: • Workflows have many detailed steps and may be difficult to understand • The general method may not be apparent • How are different workflows related? • What steps do they have in common? 26 Capturing Context in Scientific Experiments: Towards Computer-Driven Science A B C A F D A B C G B H A B F B E Common workflow fragments Workflow 1 Workflow 2 Workflow 3
  27. 27. 1 2 3 4 28Capturing Context in Scientific Experiments: Towards Computer-Driven Science A method for detecting reusable workflow fragments [Garijo et al 2014b] Dataset Stemmer algorithm Result Term weighting algorithm FinalResult Stemmer algorithm Term weighting algorithm Duplicated workflows are removed Single-step workflows are removed
  28. 28. 1 2 3 4 29 Capturing Context in Scientific Experiments: Towards Computer-Driven Science A method for detecting reusable workflow fragments [Garijo et al 2014b] Popular graph mining techniques: • Inexact FSM: uses heuristics to calculate the similarity between two graphs; the solution might not be complete • Exact FSM: delivers all the possible fragments to be found in the dataset
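The exact-FSM idea can be illustrated with a toy version: workflows as sets of labeled edges (step A feeds step B), with a "fragment" reduced to a single edge counted across workflows. Real exact FSM (as in FragFlow) exhaustively enumerates larger subgraphs; the corpus below is hypothetical, echoing the A/B/C sketches on the previous slide.

```python
# Toy exact frequent-fragment detection over labeled workflow edges.
from collections import Counter

workflows = [  # hypothetical corpus of step-dependency edges
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "C"), ("C", "G")},
    {("A", "F"), ("F", "D")},
]

min_support = 2  # a fragment must occur in at least 2 workflows
counts = Counter(edge for wf in workflows for edge in wf)
frequent = sorted(edge for edge, n in counts.items() if n >= min_support)
print(frequent)  # -> [('A', 'B'), ('B', 'C')]
```

Because every candidate is enumerated and counted exactly, no frequent fragment is missed — the trade-off, as the slide notes, is cost, which inexact FSM avoids via heuristics.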
  29. 29. 1 2 3 4 30Capturing Context in Scientific Experiments: Towards Computer-Driven Science A method for detecting reusable workflow fragments [Garijo et al 2014b] Remove redundant fragments
  30. 30. 1 2 3 4 31Capturing Context in Scientific Experiments: Towards Computer-Driven Science A method for detecting reusable workflow fragments [Garijo et al 2014b] Link fragments back to the workflows where they were found http://purl.org/net/wf-fd
  31. 31. ? Research question: Are our proposed workflow fragments useful? •A fragment is useful if it has been designed and (re)used by a user. •Comparison between proposed fragments and user designed fragments (groupings) and workflows Workflow fragment assessment 32Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  32. 32. ? Workflow fragment assessment 33Capturing Context in Scientific Experiments: Towards Computer-Driven Science Metrics: Precision and recall Fragments (F) Workflows (W) Groupings (G)
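The metrics on this slide can be sketched directly: proposed fragments (F) are scored against user-defined groupings (G). This toy version counts only exact matches, whereas the evaluation also considers similar fragments; the fragment contents below are hypothetical.

```python
# Precision/recall of proposed fragments vs. user-defined groupings.
def precision_recall(proposed, groupings):
    hits = {f for f in proposed if f in groupings}
    precision = len(hits) / len(proposed) if proposed else 0.0
    recall = len(hits) / len(groupings) if groupings else 0.0
    return precision, recall

# Hypothetical fragments, written as frozensets of step names
F = {frozenset({"Stem", "Weight"}), frozenset({"Filter", "Plot"})}
G = {frozenset({"Stem", "Weight"}), frozenset({"Sort", "Merge"})}
print(precision_recall(F, G))  # -> (0.5, 0.5)
```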
  33. 33. ? Workflow fragment assessment 34 Capturing Context in Scientific Experiments: Towards Computer-Driven Science Workflow corpora User Corpus 1 (WC1) • Designed mostly by a single user • 790 workflows (475 after data preparation) User Corpus 2 (WC2) • Created by a user, with collaborations from others • 113 workflows (96 after data preparation) Multi-User Corpus 3 (WC3) • Workflows submitted by 62 users during the month of Jan 2014 • 5859 workflows (357 after data preparation) User Corpus 4 (WC4) • Designed mostly by a single user • 53 workflows (50 after data preparation)
  34. 34. ? Workflow fragment assessment 35 Capturing Context in Scientific Experiments: Towards Computer-Driven Science Result assessment • 30%-60% of proposed fragments are equal to user-defined groupings or workflows • 40%-80% of proposed fragments are equal or similar to user-defined groupings or workflows Commonly occurring patterns are potentially useful for users designing workflows What about the rest of the fragments? Are those useful?
  35. 35. ? Workflow fragment assessment 36Capturing Context in Scientific Experiments: Towards Computer-Driven Science User feedback: user survey Q1: Would you consider the proposed fragment a valuable grouping? •I would not select it as a grouping (0) •I would use it as a grouping with major changes (i.e., adding/removing more than 30% of the steps) (1) •I would use it as a grouping with minor changes (i.e., adding/removing less than 30% of the steps) (2). •I would use it as a grouping as it is (3) Q2: What do you think about the complexity of the fragment? •The fragment is too simple (0) •The fragment is fine as it is (1) •The fragment has too many steps (2) Not enough evidence to state that all proposed workflow fragments are useful
  36. 36. Outline • Capturing and publishing context of computational experiments • From scientific workflows to Linked Data • Capturing software functionality • Representing software metadata • Using context to facilitate reusability and exploration of experiments • Detecting commonalities among experiments • Explaining computational results • Using context in Intelligent Systems • Hypothesis testing • Environmental sciences modeling • A vision for context capture in computer-driven science 36Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  37. 37. Using captured context to explain results [Gil and Garijo 2016] Current methods in papers are ambiguous, incomplete and described at inconsistent levels of detail Comparison of ligand binding sites Comparison of dissimilar protein structures Graph network generation Molecular Docking "The SMAP software was used to compare the binding sites of the 749 M.tb protein structures plus 1,446 homology models (a total of 2,195 protein structures) with the 962 binding sites of 274 approved drugs, in an all-against-all manner. While the binding sites of the approved drugs were already defined by the bound ligand, the entire protein surface of each of the 2,195 M.tb protein structures was scanned in order to identify alternative binding sites. For each pairwise comparison, a P-value representing the significance of the binding site similarity was calculated." 38 Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  38. 38. Using captured context to explain results [Gil and Garijo 2016] Current methods in papers are ambiguous, incomplete and described at inconsistent levels of detail Goal: Automatically generate reports from computer-generated data analysis records • Reports must: • Be truthful to actual events • Enable inspection • Be human-understandable • Abstract details • Ideally: • Become part of papers • Have persistent evidence • Be adapted to different audiences/expertise/purpose 39 Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  39. 39. Data Narratives 1. A record of events that describe a new result • A workflow and/or provenance of all the computations executed 2. Persistent entries for key entities involved • URIs/DOIs for data, software versions, workflow,… 3. Narrative account(s) • Human-consumable rendering(s) that includes pointers to the detailed records and entries • Each account is generated for a different audience/purpose • A casual reader, a close colleague, someone inspecting how the work was done, someone reproducing the work 40Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  40. 40. Data Narrative Accounts: An example 40 “Topic modeling was run on the Reuters R8 dataset (10.6084/ m9.figshare.776887), and English Words dataset (10.6084/m9.figshare.776888), with iterations set to 100, stop word size set to 3, number of topics set to 10 and batch size set to 10. The results are at 10.6084/m9.figshare.776856” “The topics at 10.6084/m9.figshare.776856 were found in the Reuters R8 dataset (10.6084/m9.figshare.776887) and English Words dataset (10.6084/m9.figshare.776888)” • Execution view • Inputs, parameters and main outputs • Data view • Just the data that influenced the results • Method view • Main steps based on their functionality “Topic training was run on the input dataset. The results are product of PlotTopics, a visualization step” Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  41. 41. • Dependency view • How the steps depend on each other • Implementation view • How the steps were implemented in the execution • Software view • Details on the software used to implement the steps Data Narrative Accounts: An example 41 “First, the input data is filtered by Stop Words, followed by Small Words, Format Dataset, and Train Topics. The final results are produced by Plot Topics” “Train topics was implemented using Latent Dirichlet allocation” “The train topics step was generated with Online LDA open source software, written in Java. Plot topics was generated with the Termite software.” Capturing Context in Scientific Experiments: Towards Computer-Driven Science
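The views above can be sketched as template-based narrative generation: one provenance record, several audience-specific templates. The dataset and result DOIs come from the slide's examples; the record's field names and the template wording are illustrative, not the system's actual query patterns.

```python
# Minimal sketch of audience-specific narrative accounts from one record.
record = {
    "method": "Topic modeling",
    "inputs": "the Reuters R8 dataset (10.6084/m9.figshare.776887)",
    "result_doi": "10.6084/m9.figshare.776856",
    "final_step": "PlotTopics",
}

templates = {
    "execution": "{method} was run on {inputs}. The results are at {result_doi}.",
    "method": "{method} was run on the input dataset. The results are "
              "product of {final_step}, a visualization step.",
}

# Render every view of the same underlying provenance record
for view, tpl in templates.items():
    print(view + ": " + tpl.format(**record))
```

Each account points back at the same persistent entries (DOIs), which is what keeps the abstracted narrative truthful to the detailed records.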
  42. 42. DANA: DAta NArratives 42 Components: Experiment Records, Provenance Repository, Experiment-specific Knowledge Base, DANA Generator, Software registry, Query patterns, Data Narrative aggregator, Narrative accounts. Steps: 1. Identify which experiment records to describe 2. Generate an experiment-specific knowledge base 3. Create the Data Narrative from templates 4. Produce narrative accounts Capturing Context in Scientific Experiments: Towards Computer-Driven Science https://knowledgecaptureanddiscovery.github.io/DataNarratives/
  43. 43. Formative evaluation • Survey with 6 target scenarios • Each scenario: • Description of a situation where a user has to do a task • A workflow sketch of the analysis done • Six candidate narratives of that workflow sketch. • 12 responses from users • Results • Each narrative is considered appropriate for describing some scenario • Different users chose different narratives for each scenario 43Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  44. 44. Outline • Capturing and publishing context of computational experiments • From scientific workflows to Linked Data • Capturing software functionality • Representing software metadata • Using context to facilitate reusability and exploration of experiments • Detecting commonalities among experiments • Explaining computational results • Using context in Intelligent Systems • Hypothesis testing • Environmental sciences modeling • A vision for context capture in computer-driven science 44Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  45. 45. Using Context for Hypothesis Testing [Gil et al 2016] 45 Capturing Context in Scientific Experiments: Towards Computer-Driven Science Example: the hypothesis "Protein PRKCDBP is expressed in samples of patient P36" is tested against the data with workflows (Wf0, Wf1, Wf2) combined into meta-workflows (comparison via similarity metrics); hypothesis revision yields "PRKCDBP mutation is expressed in P36".
  46. 46. Hypothesis Testing: My Contribution [Garijo et al 2017] 46Capturing Context in Scientific Experiments: Towards Computer-Driven Science HG2 HE2 HG1 HE1 HS2 Protein EGFR Colon Cancer SubtypeA Associated With revisionOf HS1 Protein EGFR Colon Cancer Associated With wasGeneratedBy Execution 1 wasGeneratedBy HQ2 Execution 2 C1 hasConfidence Report L2 hasConfidenceLevel wasGeneratedBy HQ1 C1 hasConfidence Report L1 hasConfidenceLevel Statement Qualifier Evidence History The DISK Ontology: http://disk-project.org/ontology/disk/
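The DISK-style hypothesis history on this slide — statements linked by revision, each with the execution and confidence that produced it — can be sketched as a small data model. The class and field names below are illustrative stand-ins, not the ontology's actual terms.

```python
# Sketch of a hypothesis-revision chain (DISK-style statement history).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hypothesis:
    statement: str
    confidence: float                       # confidence level from the report
    revision_of: Optional["Hypothesis"] = None  # link to the prior statement

h1 = Hypothesis("Protein EGFR is associated with colon cancer", 0.6)
h2 = Hypothesis("Protein EGFR is associated with colon cancer subtype A",
                0.8, revision_of=h1)

# Walk the revision chain back to the original statement
chain = []
h: Optional[Hypothesis] = h2
while h is not None:
    chain.append(h.statement)
    h = h.revision_of
print(len(chain))  # -> 2
```

Keeping the full chain, rather than only the latest statement, is what lets an agent explain how evidence from successive executions refined the hypothesis.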
  47. 47. Using Context for Environmental Sciences Modeling 47 Capturing Context in Scientific Experiments: Towards Computer-Driven Science Work in progress • A modeler wants to predict a situation • E.g., the impact of drought in the Amazon • An intelligent system assists: • Finding data of interest • Connecting environmental models: hydrology, economy, agronomy, etc. • Facilitating the execution of models • Visualizing results My contribution: • Extending our software ontology to capture requirements of environmental models • Relating variables to inputs, units, time, etc. Variables such as albedo, soil moisture, soil quality, precipitation, commodity prices, property rights, market access, crop/forest yields, land use and household type connect the Climate Model, Hydrology Model, Economy Model, etc., so the intelligent system can turn a scenario, a data catalog and a model catalog into predictions.
  48. 48. Outline • Capturing and publishing context of computational experiments • From scientific workflows to Linked Data • Capturing software functionality • Representing software metadata • Using context to facilitate reusability and exploration of experiments • Detecting commonalities among experiments • Explaining computational results • Using context in Intelligent Systems • Hypothesis testing • Environmental sciences modeling • A vision for context capture in computer-driven science 48Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  49. 49. Where are we headed? 49 Scientist Driven Science Computer Driven Science Scientist Scientist + Automated Tools Scientist + Intelligent System Intelligent System + Scientist • Can an Intelligent System co-author a paper? Can it be an author? • Can it win a Nobel prize? [Kitano, ISWC 2016] • What do we need to capture (in Software, Data, Methods, Provenance)? 1. Functionality and abstraction 2. Granularity 3. Importance Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  50. 50. Next steps for context capture in computational experiments • Capturing different levels of abstraction in experiments • Using user expertise to curate captured context • What do users consider important? • Improve explanation of details • How can we identify the core function of a software step? • Represent the goal and objectives of a computational experiment 50 Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  51. 51. Summing up • Context is needed to understand and reuse computational experiments • Sharing context from computational experiments • Scientific workflows and their executions • Software functionality and metadata • Getting value out of context • Reusability, exploration, explanation • Used to power intelligent systems! • Next steps • Representing functionality and levels of abstraction • Interact with users to curate context 51Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  52. 52. Special thanks • Yolanda Gil • Varun Ratnakar • Oscar Corcho • Pinar Alper • Khalid Belhajjame • Asuncion Gomez Perez • Idafen Santana Perez • Felisa Verdejo • Francisco Garijo 52Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  53. 53. References • [Kinnings et al, PLOS 2010]: Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, Bourne PE (2010) The Mycobacterium tuberculosis Drugome and Its Polypharmacological Implications. PLoS Comput Biol 6(11): e1000976. https://doi.org/10.1371/journal.pcbi.1000976 • [Garijo et al PLOS]: Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLoS ONE 8(11): e80278. https://doi.org/10.1371/journal.pone.0080278 • [Garijo et al 2014a]: Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.; Gil, Y.; and Goble, C. Common motifs in scientific workflows: An empirical analysis. Future Generation Computer Systems, 36: 338-351. 2014. • [Garijo et al 2014b]: Garijo, D.; Corcho, O.; Gil, Y.; Gutman, B. A.; Dinov, I. D.; Thompson, P.; and Toga, A. FragFlow: automated fragment detection in scientific workflows. In e-Science (e-Science), 2014 IEEE 10th International Conference on, volume 1, pages 281-289, 2014. IEEE. • [Gil and Garijo 2016]: Gil, Y.; and Garijo, D. Towards Automating Data Narratives. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, pages 565-576, 2017. ACM. • [Garijo et al 2017]: Garijo, D.; Gil, Y.; and Ratnakar, V. The DISK Hypothesis Ontology: Capturing Hypothesis Evolution for Automated Discovery. In Proceedings of the Workshop on Capturing Scientific Knowledge (SciKnow), held in conjunction with the ACM International Conference on Knowledge Capture (K-CAP), Austin, Texas, 2017. • [Garijo et al 2017 FGCS]: Garijo, D.; Gil, Y.; and Corcho, O. Abstract, link, publish, exploit: An end-to-end framework for workflow sharing. Future Generation Computer Systems, 2017. • [Gil et al 2015]: Gil, Y.; Ratnakar, V.; and Garijo, D. OntoSoft: Capturing scientific software metadata. In Proceedings of the 8th International Conference on Knowledge Capture, page 32, 2015. ACM. • [Kitano ISWC 2016]: Kitano, H. Artificial Intelligence to Win the Nobel Prize and Beyond: Creating the Engine for Scientific Discovery. Keynote. http://iswc2016.semanticweb.org/pages/program/keynote-kitano.html 53 Capturing Context in Scientific Experiments: Towards Computer-Driven Science
  54. 54. Capturing Context in Scientific Experiments: Towards Computer-Driven Science: Daniel Garijo Information Sciences Institute and Department of Computer Science https://w3id.org/people/dgarijo @dgarijov dgarijo@isi.edu
