The data, they are a-changin’

Professor of Big Data Analytics
Jun. 10, 2016


Editor's Notes

  1. The problem of selective recomputation summarises the main problems in computational reproducibility. This seems too broad, so we need to focus on specific regions of this problem space. We do this through a running example, SVI: a workflow, white box, with many observables and control over provenance traces.
  2. that associates a class label to each input variant depending on their estimated deleteriousness, using a simple “traffic light” notation
  3. \diffin(x_i^v, x_i^{v'})
  4. \diffd(D_i^v, D_i^{v'}), \quad d_{ij} \in \diffd(D_i^v, D_i^{v'}), \quad \texttt{used}(P_j, d_{ij}, [\texttt{prov:role} = \texttt{'dep'}]) \in \mathit{prov(\y^v)}
  5. \diffd(D_i^v, D_i^{v'}), \quad d_{ij} \in \diffd(D_i^v, D_i^{v'}), \quad \texttt{used}(P_j, \D_i, [\texttt{prov:role} = \texttt{'dep'}]), \quad d_{ij} \in D_i
  6. \diffo(y_i^v, y_i^{v'})
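
Notes 4 and 5 combine a diff over two versions of a dependency with `used` provenance records whose role is 'dep'. One possible reading, sketched below in Python, is that an outcome becomes a candidate for recomputation when its provenance trace used an item that appears in the diff. This is an illustration under stated assumptions, not the method from the slides: the names `Used`, `diff_d` and `affected`, the dictionary representation of a dependency version, and the sample identifiers (`var1`, `y1`, `P1`, ...) are all hypothetical.

```python
from dataclasses import dataclass

_MISSING = object()  # sentinel so that "absent" differs from any stored value


@dataclass(frozen=True)
class Used:
    """Hypothetical provenance record: `process` read `item` under `role`."""
    process: str
    item: str
    role: str = "dep"


def diff_d(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Items added, removed, or changed between two versions of a dependency."""
    keys = old.keys() | new.keys()
    return {k for k in keys if old.get(k, _MISSING) != new.get(k, _MISSING)}


def affected(prov: dict[str, list[Used]], changed: set[str]) -> set[str]:
    """Outcomes whose trace used, in the 'dep' role, at least one changed item."""
    return {
        outcome
        for outcome, records in prov.items()
        if any(r.role == "dep" and r.item in changed for r in records)
    }


# Two made-up versions of a reference dataset with traffic-light labels.
d_v1 = {"var1": "green", "var2": "amber", "var3": "red"}
d_v2 = {"var1": "green", "var2": "red", "var4": "green"}

# Made-up provenance traces for three earlier outcomes.
prov = {
    "y1": [Used("P1", "var1")],
    "y2": [Used("P1", "var2")],
    "y3": [Used("P2", "var1"), Used("P2", "var3")],
}

changed = diff_d(d_v1, d_v2)
to_recompute = affected(prov, changed)
print(sorted(changed), sorted(to_recompute))
```

In this toy data only `y1` is untouched by the change, so it would be skipped. Note 5 records usage of the whole dependency (`D_i`) rather than of individual items; with traces that coarse, the membership test above could not discriminate between outcomes and every outcome that used the dependency would be selected.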