Domains such as drug discovery, data science, and policy studies increasingly rely on combining complex analysis pipelines with integrated data sources to reach conclusions. A key question then arises: what are these conclusions based upon? There is thus a tension between integrating data for analysis and understanding where that data comes from (its provenance). In this talk, I describe recent work that attempts to facilitate transparency by combining provenance tracked within databases with the data integration and analytics pipelines that feed them. I discuss this with respect to use cases from public policy as well as drug discovery.
Given at: http://ccct.uva.nl/content/ccct-seminar-21-february-2014