Reproducibility 1

This is a keynote that I gave at the PoliWeb workshop on the state of the art in data science reproducibility. In the first part, I review tools that have been developed over the last few years. In the second part, I focus on proposals that I have been involved in to facilitate workflow reproducibility and preservation.

Published in: Data & Analytics

Reproducibility 1

  1. 1. Khalid Belhajjame https://sites.google.com/site/kbelhajjame @kbelhajj
  2. 2. “Science is built upon the foundations of theory and experiment validated and improved through open, transparent communication. With the increasingly central role of computation in scientific discovery, this means communicating all details of the computations needed for others to replicate the experiment.” V. Stodden, D. H. Bailey, J. M. Borwein, R. J. LeVeque, W. Rider, and W. Stein. Setting the default to reproducible: Reproducibility in computational and experimental mathematics. Khalid Belhajjame @ PoliWeb Workshop, 2014
  3. 3. basic studies on cancer are unreliable, with grim consequences for producing new medicines in the future
  4. 4. The research result obtained by Stapel and co-workers Roos Vonk (Radboud University) and Marcel Zeelenberg (Tilburg University), showing that meat eaters are more selfish than vegetarians, which was widely publicized in Dutch media, is suspected to be based on faked data.
  5. 5. • Replication means conducting studies with independent:
          ◦ Investigators
          ◦ Data
          ◦ Methods
          ◦ Laboratories
          ◦ Instruments
        • Replication is the ultimate standard for strengthening evidence and trust in scientific findings.
        • However, replication is most of the time not possible: expensive (time and money), opportunistic
  6. 6. • Scholarly article: is not enough, (re)useless
        • Reproducibility (reproducible research): make data and code available so that others may reproduce findings
        • Replication: way too expensive
  7. 7. Workability [Figure: cost plotted against reproducibility level; labels: (Almost) Nothing, Reproducibility, Replicability]
  8. 8. • The huge increases in performance, at the level of both hardware and software, mean that highly complex analyses are possible.
        • However, these same advances mean a higher risk of generating results that cannot be reproduced.
  9. 9. • Researchers in experimental biology carefully use lab notebooks to document different aspects of their experiments.
        • This is not the case for computational scientists, who tend to run their analyses with no clear record of the exact process they followed or of the intermediary datasets (results) they used and generated.
        • It is therefore possible that numerous published results are unreliable or even completely invalid.
  10. 10. • Often, there is no record of the process (workflow) that produced the computational results published in scholarly communications.
        • Even the code is missing, or has undergone changes.
          ◦ It cannot be used to process the data referred to (if we are lucky).
  11. 11. “The reproducible research movement recognizes that traditional scientific research and publication practices now fall short …, and encourages all those involved in the production of computational science ... to facilitate and practice really reproducible research.” V. Stodden, D. H. Bailey, J. M. Borwein, R. J. LeVeque, W. Rider, and W. Stein. Setting the default to reproducible: Reproducibility in computational and experimental mathematics.
        We have recently witnessed the emergence of a number of methods and tools for enabling reproducibility.
  12. 12. • System-level reproducibility: ReproZip, Burrito, ES3
        • Scripting-oriented reproducibility: IPython, Knitr, IJulia
        • Workflow-oriented reproducibility: Galaxy, Taverna, VisTrails
        • Article-centered reproducibility: SOLE, DEEP, SHARE
        • Investigation-oriented reproducibility: ISA, Research Object, FuGE
  13. 13. Packing experiments with ReproZip. The authors' experiment is executed in computational environment E; ReproZip captures the provenance of the execution (processes p and p') as a provenance tree.
  14. 14. Capture of provenance. For each process, ReproZip records:
        • command-line arguments
        • working directory
        • files read
        • files written
        • …
  15. 15. Identification of necessary components from the execution provenance tree:
        • Input and output files (description of data)
        • Executable programs and steps (description of experiment)
        • Environment variables, dependencies, … (description of environment)
  17. 17. Specification of workflow: the descriptions of data, experiment and environment are used to produce a VisTrails workflow and a reproducible package. Figure taken from Chirigati et al., 2012
  19. 19. • IPython provides a rich architecture for interactive computing with:
          ◦ A browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media.
          ◦ Support for interactive data visualization and use of GUI toolkits.
  22. 22. • Inputs to computational science are not linked with its outputs.
          ◦ Inputs: large quantities of data, complex data manipulation and/or numerical simulation, use of large and often distributed software stacks.
          ◦ Outputs: research papers (text-based, non-interactive)
        • Authors and readers approach computational science from opposite directions
        • The objective of SOLE is to link research papers with the auxiliary resources that have been utilized, e.g., datasets, software programs, files, etc.
  24. 24. • Assists users in submitting structured content via simple templates and an internal authoring tool
        • Performs value-added semantic annotation of the experimental metadata
  25. 25. Reproducibility: Combating Decay, Duplicate Detection, Summarization
  26. 26. Scientific Workflow Reproducibility
  27. 27. • Data-driven analysis pipelines
        • Systematic gathering of data and analysis tools into computational solutions for scientific problem-solving
        • Tools for automating frequently performed data-intensive activities
        • Provenance for the resulting datasets
          ◦ The method followed
          ◦ The resources used
          ◦ The datasets used
  28. 28. Example application areas [Credit Carole A. Goble]:
        • GWAS, Pharmacogenomics: association study of Nevirapine-induced skin rash in Thai population
        • Trypanosomiasis (sleeping sickness parasite) in African cattle
        • Astronomy
        • HelioPhysics
        • Library Doc Preservation
        • Systems Biology of Micro-Organisms
        • Observing Systems Simulation Experiments (JPL, NASA)
        • BioDiversity: Invasive Species Modelling
  29. 29. • Scientific workflows are primarily used to specify and enact in silico experiments
        • However, they can also be used as a means to document the experiment that the scientist ran, and even repurpose it!
        • Scientific workflows are increasingly adopted in modern sciences: transparent documentation of experimental methods, repeatable and configurable
        [Figure: example workflow in which two "Kegg pathway query" steps (chromosome17, chromosome37) feed a "Detect common pathways" step that outputs "Common pathways"]
  30. 30. • Workflow decay: a decayed or reduced ability of a workflow to be executed or to produce the same results
        • To better understand workflow decay, we conducted an empirical analysis to identify its causes.
        • To do so, we analyzed a sample of real workflows to determine whether they suffer from decay and the reasons for it
  31. 31. • Taverna workflows from myExperiment.org
          ◦ Taverna 1
          ◦ Taverna 2
        • Selection process
          ◦ By the creation year
          ◦ By the creator
          ◦ By the domain
        • Software environment
          ◦ Taverna 2.3
        • Experiment metadata
          ◦ June-July 2012
          ◦ 4 researchers
  32. 32. Number of Taverna 1 workflows from 2007 to 2011

                  2007   2008   2009   2010   2011
        Tested      12     10     10     10     4*
        Total       74    341    101     26     13

        Number of Taverna 2 workflows from 2009 to 2012

                  2009   2010   2011   2012
        Tested      12     10     15      9
        Total       97    308    289    184
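As a quick consistency check on the sample sizes reported on this slide, the following minimal sketch sums the tested and total counts per Taverna version. The numbers are transcribed from the slide; the variable and function names are illustrative only.

```python
# Per-year counts transcribed from the slide: year -> (tested, total).
taverna1 = {2007: (12, 74), 2008: (10, 341), 2009: (10, 101), 2010: (10, 26), 2011: (4, 13)}
taverna2 = {2009: (12, 97), 2010: (10, 308), 2011: (15, 289), 2012: (9, 184)}


def summarize(table):
    """Return (tested, total, sampled fraction) for one Taverna version."""
    tested = sum(t for t, _ in table.values())
    total = sum(n for _, n in table.values())
    return tested, total, tested / total


t1_tested, t1_total, t1_fraction = summarize(taverna1)
t2_tested, t2_total, t2_fraction = summarize(taverna2)
all_tested = t1_tested + t2_tested

print(f"Taverna 1: {t1_tested} of {t1_total} workflows tested ({t1_fraction:.1%})")
print(f"Taverna 2: {t2_tested} of {t2_total} workflows tested ({t2_fraction:.1%})")
print(f"Tested overall: {all_tested}")
```

The overall tested count agrees with the 92 tested workflows mentioned later in the talk.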
  34. 34. ¡ 75% of the 92 tested workflows failed to be either executed or produce the same result (if testable) ¡ Those from early years (2007-2009) had 91% failure rate Khalid Belhajjame @ PoliWeb Workshop, 2014 Taverna 1 Taverna 2 34
  35. 35. • Manual analysis
          ◦ By the validation report from the Taverna workbench
          ◦ By interpreting experiment results reported by Taverna
        • Identified 4 categories of causes
          ◦ Missing example data
          ◦ Missing execution environment
          ◦ Insufficient descriptions of workflows
          ◦ Volatile third-party resources
        • Other possible factors not considered
          ◦ Changes in the local operating environment (hardware, OS, middleware, compiler, etc.)
  36. 36. Causes, refined causes and examples:
        • Third-party resources are not available
          ◦ Refined cause: the underlying dataset, particularly a locally hosted in-house dataset, is no longer available. Example: the researcher hosting the data changed institution and the server is no longer available.
          ◦ Refined cause: services are deprecated. Example: DDBJ web services are no longer provided, despite the fact that they are used in many myExperiment workflows.
        • Third-party resources are available but not accessible
          ◦ Refined cause: data is available but identified using different IDs than the ones known to the user. Example: for scalability reasons, the input data is superseded by new data, making the workflow not executable or making it provide wrong results.
          ◦ Refined cause: data is available, but permission, a certificate, or network access is needed to reach it. Example: cannot get the input, which is a security token that can only be obtained by a registered user of ChemSpider.
          ◦ Refined cause: services are available, but permission, a certificate, or network access is needed to access and invoke them. Example: the security policies of the execution framework are updated due to new hosting institution rules.
        • Third-party resources have changed
          ◦ Refined cause: services are still available under the same identifiers, but their functionality has changed. Example: the web services are updated.
  37. 37. ¡ 50% of the decay was caused by volatility of 3rd-party resource § Unavailable § Inaccessible § Updated ¡ Missing example data § Unable to re-run ¡ Missing execution environment § Such as local plugins ¡ Insufficient metadata § Such as any required dependency libraries or permission information Khalid Belhajjame @ PoliWeb Workshop, 2014 37
  38. 38. Scientific Workflow Reproducibility
  39. 39. • Some of the services that compose workflows are annotated using concepts from domain ontologies
        • Such annotations can be used to repair workflows
          ◦ Identify available services that can play the same role as an unavailable service within a workflow.
  40. 40. • Task ontology: captures information about the action carried out by service operations within a domain of interest, e.g., Sequence_alignment and Protein_identification
        • Domain ontology: captures information about the application domains covered by operation parameters, e.g., Protein_record and DNA_sequence
  41. 41. Task replaceability: For an operation op2 to be able to substitute an operation op1, op2 must fulfil a task that is equivalent to or subsumes the task op1 performs.
  42. 42. Parameter replaceability: For two connected parameters to be compatible, the domain of the output must be the same as, or a subconcept of, the domain of the subsequent input.
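The two replaceability conditions can be sketched as subsumption checks over small ontologies. This is only an illustration under stated assumptions: the ontologies are toy dictionaries mapping a concept to its parent, the concepts `Pairwise_sequence_alignment` and `Biological_record` are hypothetical additions (only `Sequence_alignment`, `Protein_identification`, `Protein_record` and `DNA_sequence` appear in the talk), and the function names are not taken from the original work.

```python
# Toy ontologies (hypothetical): child concept -> parent concept.
TASK_PARENTS = {"Pairwise_sequence_alignment": "Sequence_alignment"}
DOMAIN_PARENTS = {"Protein_record": "Biological_record"}


def subsumes(general, specific, parents):
    """True if `general` is the same concept as `specific` or one of its ancestors."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = parents.get(current)  # climb one level; None at the root
    return False


def task_replaceable(task_op1, task_op2):
    """op2 may replace op1 if op2's task is equivalent to or subsumes op1's task."""
    return subsumes(task_op2, task_op1, TASK_PARENTS)


def parameter_compatible(output_domain, input_domain):
    """The output's domain must equal, or be a subconcept of, the next input's domain."""
    return subsumes(input_domain, output_domain, DOMAIN_PARENTS)


print(task_replaceable("Pairwise_sequence_alignment", "Sequence_alignment"))
print(parameter_compatible("Protein_record", "Biological_record"))
```

A real implementation would query an ontology reasoner rather than walk a parent dictionary; the dictionary keeps the sketch self-contained.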
  43. 43. While the method just presented is sound, its practical applicability is hindered by the following facts:
        • Semantic annotations of web services are scarce.
        • Our experience suggests that a large proportion of existing semantic annotations suffer from inaccuracies.
        • As a result, a substitute that is discovered for replacing an unavailable operation using such annotations may turn out to be unsuitable, and, conversely, a suitable substitute may be discarded.
  44. 44. • Existing workflow specifications
        • Provenance traces of missing operations
  45. 45. Formally, let wf1 be a workflow in which the operation op1 is unavailable. The operation op2 can replace the operation op1 in terms of its inputs and outputs if: [condition shown as a formula on the slide]
  46. 46. • In addition to compatibility in terms of inputs and outputs, we have to check that the candidate substitute performs a task compatible with that of the unavailable operation.
        • To perform this test, we exploit the following observation: an operation op2 is able to replace the operation op1 in terms of task if, for every possible input instance that op1 is able to consume, op2 delivers the same output as that obtained by invoking op1.
        • To perform the above test, however, we would have to call the missing operation op1!
        • The solution that we adopt to overcome this problem makes use of workflow provenance logs. These are traces that contain the intermediate data that were used as input and delivered as output by the constituent operations of a workflow when enacted.
  47. 47. • An operation op2 may be compatible in terms of task with op1 if op2, when fed the same input values, delivers the same results that op1 delivered in past executions, as recorded in provenance logs.
        • Notice that we say may be compatible. This is because we may not be able to compare the outputs obtained for every possible input value of the operation op1.
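The provenance-based check can be sketched as replaying logged inputs against the candidate operation. This is a minimal sketch, assuming that a provenance log is available as (input, output) pairs recorded for op1 and that the candidate is a plain callable; the name `may_replace` and the `same` comparison parameter are illustrative and not part of the original work.

```python
from typing import Any, Callable, Iterable, Tuple


def may_replace(
    candidate: Callable[[Any], Any],
    provenance_log: Iterable[Tuple[Any, Any]],
    same: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> bool:
    """Replay the inputs logged for the missing operation against `candidate`.

    Returns True only if at least one logged pair exists and the candidate
    reproduces every logged output. True means "may be compatible": the log
    covers past executions only, not every possible input.
    """
    checked = 0
    for logged_input, logged_output in provenance_log:
        try:
            produced = candidate(logged_input)
        except Exception:
            return False  # the candidate cannot even consume a logged input
        if not same(produced, logged_output):
            return False
        checked += 1
    return checked > 0


# Hypothetical log of a missing operation that upper-cased sequences.
log = [("acgt", "ACGT"), ("tt", "TT")]
print(may_replace(str.upper, log))
print(may_replace(str.lower, log))
```

Passing a custom `same` function leaves room for the representation-aware comparison discussed in the following slides.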
  48. 48. • The condition that we have described for checking the suitability of an operation as a substitute for another one may be stronger than is required in practice.
        • Various parameter representations are adopted in bioinformatics.
        • Because of representation mismatch, a service operation that performs a task similar to that of the missing operation may be found to be unsuitable.
  49. 49. Example of values (Value1, Value2) delivered by two operations using the same input value: CosSym(value1, value2) = 0.007
  50. 50. To overcome this problem, we use a two-step process when comparing the values of parameters:
        1. Given a parameter value, we derive its representation.
        2. If the representation is associated with a key attribute (identifier), we extract the value of that attribute.
        If two parameter values are associated with identifiers, then they are compared by comparing their identifiers.
  51. 51. Example of values delivered by two operations using the same input value: Value1 in Fasta format, Value2 in Uniprot format
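The two-step comparison can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the work: it assumes a FASTA header of the form `>db|ACCESSION|NAME` and a UniProt flat-file record with an `AC` line, the sample values are abridged and illustrative, and the function names are hypothetical.

```python
import re


def detect_representation(value: str) -> str:
    """Step 1: derive the representation of a parameter value (assumed heuristics)."""
    text = value.lstrip()
    if text.startswith(">"):
        return "fasta"
    if text.startswith("ID   "):
        return "uniprot"
    return "unknown"


def extract_identifier(value: str):
    """Step 2: extract the key attribute (identifier), if the representation has one."""
    representation = detect_representation(value)
    if representation == "fasta":
        header = value.lstrip().splitlines()[0]
        match = re.match(r">\w+\|([^|\s]+)\|", header)
        return match.group(1) if match else None
    if representation == "uniprot":
        match = re.search(r"^AC\s+([^;\s]+);", value, re.MULTILINE)
        return match.group(1) if match else None
    return None


def same_record(value1: str, value2: str) -> bool:
    """Compare by identifier when both values carry one, else fall back to equality."""
    id1, id2 = extract_identifier(value1), extract_identifier(value2)
    if id1 is not None and id2 is not None:
        return id1 == id2
    return value1 == value2


# Abridged, illustrative values for the same protein in two representations.
fasta_value = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha\nMVLSPADKTN\n"
uniprot_value = "ID   HBA_HUMAN   Reviewed;   142 AA.\nAC   P69905; P01922;\nSQ   SEQUENCE\n"
print(same_record(fasta_value, uniprot_value))
```

The two strings differ almost entirely as text, which is why a textual similarity score is low, yet they are matched once their identifiers are compared.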
  52. 52. Scientific Workflow Reproducibility
  53. 53. • Scientific workflows are increasingly used by scientists as a means for specifying and enacting their experiments.
        • They tend to be data intensive
        • The data sets obtained as a result of their enactment can be stored in public repositories to be queried, analyzed and used to feed the execution of other workflows.
  54. 54. • The datasets obtained as a result of workflow execution often contain duplicates.
        • As a result:
          ◦ The analysis and interpretation of workflow results may become tedious.
          ◦ The presence of duplicates also unnecessarily increases the size of workflow results.
  55. 55. • Research in duplicate record detection has been active for more than three decades.
          ◦ Elmagarmid et al., 2007 conducted a comprehensive survey of the topic.
        • We do not aim to design yet another algorithm for comparing and matching records.
        • Rather, we investigate how provenance traces produced as a result of workflow executions can be used to guide the detection of duplicate records in workflow results.
        Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, and Vassilios S. Verykios. Duplicate record detection: A survey. IEEE Trans. Knowl. Data Eng., 19(1):1–16, 2007.
  56. 56. • A data-driven workflow can be defined as a directed graph: $wf = \langle N, E \rangle$
        • A node represents an analysis operation, which has a set of input and output parameters: $\langle op, I_{op}, O_{op} \rangle \in N$
        • The edges are dataflow dependencies: $\langle \langle op, o \rangle, \langle op', i \rangle \rangle \in E$
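The graph definition can be mirrored in a small data structure. This is a minimal sketch: the class and method names are illustrative assumptions, and the example reuses the IdentifyProtein and GetGOTerm operations that appear later in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    """A node <op, I_op, O_op>: an operation with input and output parameters."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass
class Workflow:
    """A workflow wf = <N, E> with dataflow edges <<op, o>, <op', i>>."""
    nodes: dict = field(default_factory=dict)  # operation name -> Operation
    edges: set = field(default_factory=set)    # {((op, o), (op', i)), ...}

    def add_operation(self, operation: Operation) -> None:
        self.nodes[operation.name] = operation

    def connect(self, source: str, output: str, target: str, input_: str) -> None:
        """Add a dataflow dependency from an output parameter to an input parameter."""
        if output not in self.nodes[source].outputs:
            raise ValueError(f"{source} has no output parameter {output!r}")
        if input_ not in self.nodes[target].inputs:
            raise ValueError(f"{target} has no input parameter {input_!r}")
        self.edges.add(((source, output), (target, input_)))


wf = Workflow()
wf.add_operation(Operation("IdentifyProtein", inputs=("i",), outputs=("o",)))
wf.add_operation(Operation("GetGOTerm", inputs=("i'",), outputs=("o'",)))
wf.connect("IdentifyProtein", "o", "GetGOTerm", "i'")
print(wf.edges)
```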
  57. 57. The execution of workflows gives rise to a provenance trace, which we capture using two relations.
        • Transformation: to specify that the execution of an operation took as input a given ordered set of records and generated another ordered set of records: $OutB_{op} \xleftarrow{T} InB_{op}$, where $OutB_{op} = \langle \langle op, o_1, r_{o_1} \rangle, \dots, \langle op, o_m, r_{o_m} \rangle \rangle$ and $InB_{op} = \langle \langle op, i_1, r_{i_1} \rangle, \dots, \langle op, i_n, r_{i_n} \rangle \rangle$
        • Transfer: to specify the transfer of records along the edges of the workflow: $\langle op', i', r \rangle \leftarrow \langle op, o, r \rangle$
  58. 58. To guide the detection of duplicates in workflow results we exploit the following fact:
        • An operation that is known to be deterministic produces identical output bindings given the same input binding: $deterministic(op) \wedge (OutB_{op} \xleftarrow{T} InB_{op}) \wedge (OutB'_{op} \xleftarrow{T} InB_{op}) \Rightarrow id(OutB_{op}, OutB'_{op})$
  59. 59. Provenance-Guided Detection of Duplicates: Example
        [Figure: workflow IdentifyProtein (input i, output o) followed by GetGOTerm (input i', output o'), with record sets $R_i$, $R_o$, $R'_i$ and $R'_o$ bound to these parameters]
        1. The set of records $R_i$ that are bound to the input parameter of the starting operation are compared to identify duplicate records. The result of this phase is a partition of $R_i$ into disjoint sets of identical records $R_i^1, \dots, R_i^n$.
  60. 60. Provenance-Guided Detection of Duplicates: Example
        [Figure: workflow IdentifyProtein (input i, output o) followed by GetGOTerm (input i', output o')]
        2. The sets of records $R_o$, $R'_i$ and $R'_o$ are partitioned into sets of identical records based on the partitioning of $R_i$. For example, $R_o$ is partitioned into $R_o^1, \dots, R_o^n$, where $R_o^k = \{ r_o \in R_o \text{ s.t. } \exists r_i \in R_i^k,\ \langle IdentifyProtein, o, r_o \rangle \xleftarrow{T} \langle IdentifyProtein, i, r_i \rangle \}$
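The two phases of the example can be sketched as follows: only the input records of the starting operation are compared by value, and the resulting partition is then pushed along the provenance links. This is a simplified sketch assuming deterministic operations with one input and one output, records addressed by hypothetical identifiers, and provenance given as (source record, target record) pairs; it is not the algorithm as published.

```python
from collections import defaultdict


def partition_by_value(records):
    """Phase 1: group record ids whose values are identical (records: id -> value)."""
    groups = defaultdict(set)
    for record_id, value in records.items():
        groups[value].add(record_id)
    return list(groups.values())


def propagate(partition, links):
    """Phase 2: map each block of identical records onto the records derived from it.

    `links` are (source id, target id) pairs taken from the provenance trace,
    either transformations (input -> output) or transfers (output -> next input).
    """
    targets = defaultdict(set)
    for source_id, target_id in links:
        targets[source_id].add(target_id)
    return [set().union(*(targets[s] for s in block)) for block in partition]


# Hypothetical records bound to the input i of IdentifyProtein.
r_i = {"ri1": "PEPTIDE_A", "ri2": "PEPTIDE_B", "ri3": "PEPTIDE_A"}
identify_protein = [("ri1", "ro1"), ("ri2", "ro2"), ("ri3", "ro3")]  # transformations
transfer = [("ro1", "r'i1"), ("ro2", "r'i2"), ("ro3", "r'i3")]        # transfers

ri_partition = partition_by_value(r_i)
ro_partition = propagate(ri_partition, identify_protein)
rpi_partition = propagate(ro_partition, transfer)
print(ro_partition)
print(rpi_partition)
```

Only the records in `r_i` are compared by value; the duplicates among the downstream records follow from the trace.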
  61. 61. Provenance-Guided Detection of Duplicates: Example
        • In the example just described, the operations that compose the workflow have exactly one input and one output parameter.
          ◦ However, the algorithm we developed supports operations with multiple input and output parameters.
        • Notice that we assume that the analysis operations that compose the workflow are deterministic. This is not always the case.
          ◦ This raises the question of how to determine whether a given operation is deterministic.
  62. 62. To verify the determinism of operations, we use an approach whereby operations are probed.
        1. Given an operation op, we select example values that can be used by the inputs of op, and invoke op using those values multiple times.
        2. If op produces identical output values given identical input values, then it is likely to be deterministic; otherwise, it is not deterministic.
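The probing approach can be sketched in a few lines. This is a minimal illustration that assumes operations are plain callables with a single input and that outputs can be compared with `==`; the function name and the number of repetitions are arbitrary choices.

```python
import itertools


def probe_determinism(op, example_inputs, repetitions=3):
    """Invoke `op` several times per example input and compare the outputs.

    Returns False as soon as two invocations with the same input disagree.
    True only means "likely deterministic" for the probed values.
    """
    for value in example_inputs:
        first_output = op(value)
        for _ in range(repetitions - 1):
            if op(value) != first_output:
                return False
    return True


# A deterministic operation and a hypothetical one whose output changes per call.
call_counter = itertools.count()


def stamped(value):
    """Returns the input together with a call counter, so no two outputs agree."""
    return (value, next(call_counter))


print(probe_determinism(sorted, ["cab", "bca"]))
print(probe_determinism(stamped, ["cab"]))
```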
  63. 63. To support duplicate detection in collection-based workflows we need to be able to:
        • Identify when two collections are identical. Two collections $R_i$ and $R_j$ are identical if they are of the same size and there is a bijective mapping $map : R_i \rightarrow R_j$ that maps each record $r_i$ in $R_i$ to a record $r_j$ in $R_j$ such that $r_i$ and $r_j$ are identical.
        • Identify duplicate records between two collections that are known to be identical. Identify a bijective mapping that maps every $r_i$ in $R_i$ to an identical $r_j$ in $R_j$.
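Both requirements can be sketched with a single function that tries to build the bijective mapping. This sketch assumes that `identical` behaves as an equivalence relation, in which case a greedy pairing is sufficient; the function name and the index-based result are illustrative choices.

```python
def match_collections(ri, rj, identical=lambda a, b: a == b):
    """Try to build a bijective mapping between two collections of records.

    Returns a dict {index in ri: index in rj} pairing identical records, or
    None if the collections are not identical. Greedy pairing is enough when
    `identical` is an equivalence relation (assumed here).
    """
    if len(ri) != len(rj):
        return None
    unmatched = list(range(len(rj)))
    mapping = {}
    for a_index, a in enumerate(ri):
        for position, b_index in enumerate(unmatched):
            if identical(a, rj[b_index]):
                mapping[a_index] = b_index
                del unmatched[position]  # each record of rj is used at most once
                break
        else:
            return None  # no identical partner left for this record
    return mapping


print(match_collections(["a", "b", "a"], ["b", "a", "a"]))
print(match_collections(["a", "a"], ["a", "b"]))
```

A non-None result answers the first requirement (the collections are identical), and the mapping itself answers the second (which records are duplicates of each other).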
  64. 64. Scientific Workflow Reproducibility
  65. 65. • Overwhelming for users who are not the developers
        • Abstractions required for reporting
        • Lineage queries result in very long trails
  66. 66. • a.k.a. Shims (D. Hull et al)
        • Dealing with data and protocol heterogeneities
        • Local organization of data
        ~ 60% (Garijo D., Alper P., Belhajjame K. et al)
  67. 67. Process-wise and data-wise abstractions
        • Sub-workflows
          ◦ Not always a significant unit of function (e.g. aesthetic purposes)
        • Bookmarked data links
          ◦ Cluster the output signature
          ◦ Further complicates workflow
        • Components
          ◦ Library dependent
  68. 68. • A graph model for representing workflows
        • Graph re-write rules for summarization: IF performs certain function (motifs) THEN re-write WF graph (reduction primitives)
  69. 69. • Domain-independent categorization
          ◦ Data-oriented nature
          ◦ Resource/implementation-oriented nature
        • Captured in a lightweight OWL ontology: http://purl.org/net/wf-motifs
  70. 70. • Pure dataflows: $W = \langle N, E \rangle$
        • Operation and port nodes: $N = N_{op} \cup N_{p}$
        • Dataflow edges: $E = E_{op \rightarrow p} \cup E_{p \rightarrow p} \cup E_{p \rightarrow op}$
  71. 71. Example motif annotations (DataRetrieval, DataMoving):
        motifs(color_pathway_by_objects) = {m1:DataRetrieval}
        motifs(Get_Image_From_URL_2) = {m2:DataMoving}
  72. 72. • Collapse (Up/Down)
        • Eliminate
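A rule of the form "IF an operation carries a given motif THEN eliminate it" can be sketched on a simple edge set. The semantics used here are an assumption made for illustration: eliminating a node removes it and reconnects its predecessors to its successors. The operation `render_image` and the motif `DataVisualization` are hypothetical; the other two operations and motifs are the ones named in the talk. Ports are omitted for brevity.

```python
def eliminate(edges, node):
    """Remove `node` and bridge each of its predecessors to each of its successors."""
    predecessors = {s for s, t in edges if t == node and s != node}
    successors = {t for s, t in edges if s == node and t != node}
    kept = {(s, t) for s, t in edges if node not in (s, t)}
    return kept | {(p, s) for p in predecessors for s in successors}


def summarize_by_eliminate(edges, motifs, unwanted):
    """Single-rule strategy: eliminate every operation annotated with an unwanted motif."""
    for node in [n for n, annotated in motifs.items() if annotated & unwanted]:
        edges = eliminate(edges, node)
    return edges


# Operation-level dataflow (ports omitted).
edges = {
    ("color_pathway_by_objects", "Get_Image_From_URL_2"),
    ("Get_Image_From_URL_2", "render_image"),
}
motifs = {
    "color_pathway_by_objects": {"DataRetrieval"},
    "Get_Image_From_URL_2": {"DataMoving"},
    "render_image": {"DataVisualization"},
}
summary = summarize_by_eliminate(edges, motifs, unwanted={"DataMoving"})
print(summary)
```

A collapse primitive would instead merge the annotated node into its predecessor (up) or successor (down) rather than dropping it, which is why it needs more specific annotation.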
  76. 76. • Strategies as a set of rules for summarization
        • Two sample strategies based on an empirical analysis of workflows
        • Reporting:
          ◦ Process: significant activities (Retrieval, Analysis, Visualization)
          ◦ Data: reduced cardinality, stripped of protocol-specific payload/formatting
  77. 77. • By-Eliminate
          ◦ Minimal annotation effort
          ◦ Single rule
        • By-Collapse
          ◦ More specific annotation
          ◦ Multiple rules
  78. 78. [Figure: summarization architecture with components Workflow Designer, Taverna Workbench, Motif Ontology, WF Description, Summarization Rules, Summarizer, WF Summary]
  79. 79. • 30 workflows from the Taverna system
        • Entire dataset queries accessible from http://www.myexperiment.org/packs/467.html
        • Manual annotation using the Motif Vocabulary
  80. 80. By-Collapse, By-Elimination
        • Causal ordering of operations
        • Reduced depth
  85. 85. • Establishing trust, but also understanding and reusability, in computational science is needed more than ever
        • Reproducibility seems to be a cost-effective solution
        • A number of tools and methods have been developed for doing so.
        • However, ... that is not enough
        • Changing our ways (culture) of doing science is more challenging
  86. 86. • Pinar Alper
        • Óscar Corcho
        • Fernando Chirigati
        • Juliana Freire
        • David De Roure
        • Yolanda Gil
        • Daniel Garijo
        • Carole Goble
        • David Koop
        • Stian Soiland-Reyes
        • Paolo Missier
        • Jun Zhao
        • and many others …
