ReComp@Scalable
Newcastle, May 9, 2016
Your data won’t stay smart forever:
exploring the temporal dimension of (big data) analytics
Paolo Missier, Jacek Cala, Manisha Rathi
Scalable Computing Group Seminar
Newcastle, May 2016
Panta Rhei (Heraclitus)
(Painting by Johannes Moreelse)
Data to Knowledge
The Data-to-knowledge axiom of the Knowledge Economy:
[Diagram: Big Data → The Big Analytics Machine → “Valuable Knowledge”. Meta-knowledge: algorithms, tools, middleware, reference datasets.]
The missing element: time
[Diagram: the Data-to-knowledge pipeline with time axes (t) added: Big Data → The Big Analytics Machine → “Valuable Knowledge” in successive versions V1, V2, V3. Meta-knowledge: algorithms, tools, middleware, reference datasets.]
The ReComp decision support system
Observe change
• In big data
• In meta-knowledge
Assess and measure
• Knowledge decay
Estimate
• Cost and benefits of refresh
Enact
• Reproduce (analytics) processes
[Diagram: the pipeline with time axes (t) and knowledge versions V1, V2, V3, as on the previous slide.]
ReComp
[Diagram: the ReComp Decision Support System, with steps Observe change, Assess and measure, Estimate, Enact. Labels around it: Change Events; Diff(.,.) functions; “business Rules”; Previously Computed KAs and their metadata; Prioritised KAs; Cost estimates; Reproducibility assessment.]
KA: Knowledge Assets
ReComp scenarios
ReComp scenario | Target impact areas | Why is ReComp relevant? | Proof of concept experiments | Expected optimisation
Dataflow, experimental science | Genomics | Rapid knowledge advances; rapid scaling up of genetic testing at population level | WES/SVI pipeline, workflow implementation (eScience Central) | Timeliness and accuracy of patient diagnosis subject to budget constraints
Time series analysis | Personal health monitoring; smart city analytics; IoT data streams | Rapid data drift; cost of computation at network edge (e.g. IoT) | NYC taxi rides challenge (DEBS’15) | Use of low-power edge devices when outcome is predictable and data drift is low
Data layer optimisation | Tuning of large-scale data management stack | Optimal data organisation sensitive to current data profiles | Graph DB re-partitioning | System throughput vs cost of re-tuning
Model learning | Applications of predictive analytics | Predictive models are very sensitive to data drift | Twitter content analysis | Sustained model predictive power over time vs retraining cost
Simulation | TBD | Repeated simulation: computationally expensive but often not beneficial | Flood modelling / CityCAT Newcastle | Computational resources vs marginal benefit of new simulation model
Data-intensive systems: two properties
1. Observability (transparency)
How much of a data-intensive system can we observe?
• structure + data flow
2. Control
How much control do we have over the system?
• Execution frequency; total or partial execution
• Input density
Observability / transparency
Aspect | White box | Black box
Structure (static view) | Dataflow: eScience Central, Taverna, VisTrails… Scripting: R, Matlab, Python... | Functions semantics; packaged components; third party services
Data dependencies (runtime view) | Provenance recording: inputs, reference datasets, component versions, outputs | Inputs and outputs only; no data dependencies; no details on individual components
Cost | Detailed resource monitoring; Cloud → £££ | Wall clock time; service pricing; setup time (e.g. model learning)
Example: genomics / variant interpretation
What changes:
- Patient variants → improved sequencing / variant calling
- ClinVar, OMIM evolve rapidly
- New reference data sources
White box ReComp
[Diagram: process P takes inputs x11, x12 and dependencies D11, D12, and produces output y11.]
For each run i:
Observables:
Inputs X = {xi1, xi2, …}
Outputs y = {yi1, yi2, …}
Dependencies Di1, Di2, ...
Provenance prov(y)
Cost(y)
Process structure P = P1.P2…Pk
Measurable changes:
Input diff: δ(x_t, x_{t+1})
Output diff: δ(y_t, y_{t+1})
Dependency diff: δ(D_t, D_{t+1})
Control:
Complete / partial rerun
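As a rough illustration of what a dependency diff δ could look like, the sketch below assumes that each version of a reference dataset can be loaded as a dictionary keyed by record identifier. The names `VersionDiff` and `diff_versions`, and the record identifiers and labels in the example, are hypothetical and not part of ReComp.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class VersionDiff:
    """Record identifiers that differ between two versions of a dataset."""
    added: Set[str]
    removed: Set[str]
    changed: Set[str]

    def all_keys(self) -> Set[str]:
        # Every record touched by the change, regardless of the kind of change.
        return self.added | self.removed | self.changed


def diff_versions(old: Dict[str, str], new: Dict[str, str]) -> VersionDiff:
    """Compare two versions of a keyed dataset (hypothetical helper)."""
    old_keys, new_keys = set(old), set(new)
    added = new_keys - old_keys
    removed = old_keys - new_keys
    # A record counts as changed when it exists in both versions with different content.
    changed = {k for k in old_keys & new_keys if old[k] != new[k]}
    return VersionDiff(added, removed, changed)


if __name__ == "__main__":
    # Made-up records, for illustration only.
    version_t = {"rec1": "benign", "rec2": "uncertain", "rec3": "pathogenic"}
    version_t1 = {"rec1": "benign", "rec2": "pathogenic", "rec4": "benign"}
    print(diff_versions(version_t, version_t1))
```

The same shape would work for input and output diffs, provided inputs and outputs can be keyed in a comparable way; that is an assumption of this sketch rather than something the slides state.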
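A history of runs
[Diagram: Run 1, Patient A: process P with inputs x11, x12 and dependencies D11, DCV produces y1. Run 2, Patient B: process P with inputs x21, x22 and dependencies D21, DCV produces y2.]
H = {<x, D, P, y, prov(y), cost(y)>}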
Note on dependencies Dij:
Fine-grained: Dij are the results of a query to a dependent data source (OMIM)
Coarse-grained: only record that (a specific version of) OMIM has been used
ReComp questions
• Scoping:
Which patients (from a large cohort) are going to be affected by a change in input/reference data?
• Impact:
For each patient in scope, how likely is the diagnosis to change?
Approach:
Given D_{t+1} and changes δ(D_t, D_{t+1}):
For each patient X and outcome y, query prov(y) to find references to D_t.
Patient X is in scope if prov(y) ∩ δ(D_t, D_{t+1}) is not empty
Missier, Paolo, Jacek Cala, and Eldarina Wijaya. “The Data, They Are a-Changin’.” In Proc. TAPP’16 (Theory and Practice of Provenance), edited by Sarah Cohen-Boulakia. Washington D.C., USA: USENIX Association, 2016. https://arxiv.org/abs/1604.06412.
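The scoping rule above (a patient is in scope when the provenance of the outcome overlaps the changed records) can be sketched as a set intersection over the run history. This is a simplified illustration: the `Run` class and `in_scope` function are hypothetical names, provenance is reduced to a flat set of referenced record identifiers, and the patient and record identifiers are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Run:
    """One simplified entry of the history H = {<x, D, P, y, prov(y), cost(y)>}."""
    patient: str
    outcome: str               # y
    prov_refs: FrozenSet[str]  # reference records that prov(y) points to
    cost: float                # cost(y), in arbitrary units


def in_scope(history: List[Run], changed: Set[str]) -> List[Run]:
    """Return the runs whose provenance references at least one changed record."""
    return [run for run in history if run.prov_refs & changed]


if __name__ == "__main__":
    history = [
        Run("A", "y1", frozenset({"r1", "r2"}), 10.0),
        Run("B", "y2", frozenset({"r3"}), 12.0),
    ]
    changed = {"r2", "r9"}  # identifiers of records that differ between versions
    print([run.patient for run in in_scope(history, changed)])  # prints ['A']
```

With coarse-grained dependencies (only the dataset version is recorded), every run that used the old version would be in scope, so the filter only becomes selective when fine-grained provenance is available.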
Example: flood modelling in Newcastle
CityCAT (City Catchment Analysis Tool)
Inputs
• Topography (DEMs from LIDAR)
• Physical structures (buildings etc.)
• Landuse data
Outputs
• High resolution grid of flood depths
Observables:
Inputs X = {xi1, xi2, …}
Outputs y = {yi1, yi2, …}
Cost(y)
Measurable changes:
Input diff: δ(x_t, x_{t+1})
Control:
Simulation rerun
Grid resolution
Regional boundaries
Example: model learning pattern
Black box ReComp:
Observables:
Outputs y = {yi1, yi2, …}
Cost(y) (retraining)
Measurable changes:
Output quality relative to ground truth: qty(y_t)
Control:
Request to retrain
Black box ReComp – I
• When does the model require retraining?
• What is the expected cost and benefit of re-training the model at
any given time?
The Velox system / Berkeley AMP Lab:
Crankshaw, Daniel, Peter Bailis, Joseph E. Gonzalez, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan. “The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox.” In Procs. CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA. http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper19u.pdf.
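One simple way to frame the two questions above is to compare the estimated benefit of retraining with its cost. The sketch below is an assumption-laden illustration, not how Velox or ReComp decide: the class `RetrainAdvisor`, the sliding window, and the linear conversion of quality loss into value (`value_per_quality_point`) are all hypothetical.

```python
from collections import deque
from typing import Deque


class RetrainAdvisor:
    """Recommend retraining when the estimated benefit exceeds the retraining cost."""

    def __init__(self, baseline_quality: float, retrain_cost: float,
                 value_per_quality_point: float, window: int = 5) -> None:
        self.baseline = baseline_quality      # quality right after the last training
        self.retrain_cost = retrain_cost      # cost(y) of one retraining
        self.value = value_per_quality_point  # assumed value of one unit of quality
        self.recent: Deque[float] = deque(maxlen=window)

    def observe(self, quality: float) -> None:
        """Record the latest output quality measured against ground truth."""
        self.recent.append(quality)

    def expected_benefit(self) -> float:
        """Value of recovering the quality lost over the recent window."""
        if not self.recent:
            return 0.0
        mean_quality = sum(self.recent) / len(self.recent)
        return max(0.0, self.baseline - mean_quality) * self.value

    def should_retrain(self) -> bool:
        return self.expected_benefit() > self.retrain_cost


if __name__ == "__main__":
    advisor = RetrainAdvisor(baseline_quality=0.9, retrain_cost=5.0,
                             value_per_quality_point=100.0, window=3)
    for q in (0.9, 0.88, 0.8, 0.8, 0.8):  # made-up quality measurements
        advisor.observe(q)
        print(q, advisor.should_retrain())
```

The sketch assumes that retraining restores the baseline quality, which will not hold in general; a more careful estimate would learn the expected gain from the history of past retraining runs.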
Example: time series pattern
Black box ReComp – top-k most frequent NYC taxi routes over time
Control input data drift:
• How often and how densely should we sample from the stream to keep the output sufficiently current?
Observables:
Outputs y = {yi1, yi2, …}
Cost(y)
Measurable changes:
Output diff: δ(y_t, y_{t+1})
Control:
Sample frequency / sample density
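To make the output diff concrete for the top-k routes case, the sketch below computes the k most frequent routes in two successive windows of rides and measures how much the two result sets differ. Using Jaccard distance as the diff is an assumption of this example, not something the slides prescribe; the function names and the grid cell identifiers are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Route = Tuple[str, str]  # (pickup cell, drop-off cell); cell ids are hypothetical


def top_k_routes(rides: Iterable[Route], k: int) -> List[Route]:
    """Return the k most frequent routes in a window of rides."""
    return [route for route, _ in Counter(rides).most_common(k)]


def output_diff(prev: List[Route], curr: List[Route]) -> float:
    """Jaccard distance between two top-k sets: 0.0 = identical, 1.0 = disjoint."""
    a, b = set(prev), set(curr)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


if __name__ == "__main__":
    window_1 = [("c1", "c2")] * 3 + [("c2", "c3")] * 2 + [("c3", "c4")]
    window_2 = [("c1", "c2")] * 3 + [("c5", "c6")] * 2 + [("c3", "c4")]
    y_t = top_k_routes(window_1, k=2)
    y_t1 = top_k_routes(window_2, k=2)
    print(output_diff(y_t, y_t1))
```

A low diff across successive windows suggests that the stream could be sampled less often or less densely; how low is low enough is a policy choice outside this sketch.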
Example: graph repartitioning
• Taper Partitioner optimises for a given query workload
• Performance of a partitioning is well defined
• # inter-partition traversals
• Performance degrades when query workload changes
[Plot: Inter-partition traversals vs. % workload change. Y axis: inter-partition traversals (0 to 3,000,000); x axis: % workload change (0% to 120%).]
Observables:
Outputs y = {yi1, yi2, …}
Cost(y) (re-partition)
Measurable changes:
Output quality: #ipt
Control:
Re-partition requests
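The quality measure #ipt can be sketched as a count of traversed edges whose endpoints sit in different partitions. The code below is an illustration only and does not reproduce Taper: the function names, the tolerance-based trigger, and the tiny example graph are assumptions.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (source vertex, target vertex) traversed by a query


def inter_partition_traversals(partition: Dict[str, int],
                               traversals: Iterable[Edge]) -> int:
    """Count traversed edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in traversals if partition[u] != partition[v])


def should_repartition(ipt_now: int, ipt_at_last_partitioning: int,
                       tolerance: float = 0.5) -> bool:
    """Request a re-partition once #ipt grows beyond a tolerated fraction."""
    return ipt_now > ipt_at_last_partitioning * (1.0 + tolerance)


if __name__ == "__main__":
    partition = {"a": 0, "b": 0, "c": 1, "d": 1}
    old_workload = [("a", "b"), ("c", "d"), ("b", "c")]
    new_workload = [("a", "c"), ("b", "d"), ("b", "c"), ("a", "b")]
    before = inter_partition_traversals(partition, old_workload)
    after = inter_partition_traversals(partition, new_workload)
    print(before, after, should_repartition(after, before))
```

In a full decision, the trigger would also weigh Cost(y) of re-partitioning against the throughput lost to the extra traversals.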
A summary of ReComp problems
Observables:
Inputs X = {xi1, xi2, …}
Outputs y = {yi1, yi2, …}
Dependencies Di1, Di2, ...
Provenance prov(y)
Cost(y)
Process structure P = P1.P2…Pk
Measurable changes:
Input diff: δ(x_t, x_{t+1})
Output diff: δ(y_t, y_{t+1})
Dependency diff: δ(D_t, D_{t+1})
Quality(y)
Control:
- Data selection
- Partial / complete rerun
Problem | Requires | Sample problems
Forwards. Scoping: identify affected population subset. Change impact analysis: inputs, dependencies → outputs | White box | Dataflow analytics; simulation; many-runs problems
Backwards. React to output instability / input drift | Black box | Model learning; time series analytics; data-driven optimisation
The next steps -- challenges
1. Optimisation:
Observables + control → reactive system
+ cost and utility functions → optimisation problems
2. Learning from history
Can we use history to learn estimates of impact without the need for
actual re-computation?
3. Software infrastructure and tooling
ReComp is a metadata management and analytics exercise
4. Reproducibility:
What really happens when I press the “ReComp” button?
5. Impact:
How do we address key impact areas?
- e-health
- Genomics
- Smart city management
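For challenge 1, once each candidate re-computation has an estimated benefit and cost, choosing what to refresh under a budget becomes an optimisation problem. The sketch below uses a greedy benefit-to-cost heuristic, which is a simple stand-in and not guaranteed to be optimal; the names `Candidate` and `select_recomputations` and the example figures are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str           # the knowledge asset to refresh
    est_benefit: float  # estimated utility of refreshing it
    est_cost: float     # estimated cost of the re-computation (must be > 0)


def select_recomputations(candidates: List[Candidate],
                          budget: float) -> List[Candidate]:
    """Greedily pick re-computations by benefit/cost ratio within the budget."""
    for c in candidates:
        if c.est_cost <= 0:
            raise ValueError(f"cost must be positive: {c.name}")
    ranked = sorted(candidates, key=lambda c: c.est_benefit / c.est_cost,
                    reverse=True)
    chosen: List[Candidate] = []
    spent = 0.0
    for c in ranked:
        if spent + c.est_cost <= budget:
            chosen.append(c)
            spent += c.est_cost
    return chosen


if __name__ == "__main__":
    # Made-up estimates, for illustration only.
    candidates = [
        Candidate("patient A", est_benefit=10.0, est_cost=5.0),
        Candidate("patient B", est_benefit=6.0, est_cost=2.0),
        Candidate("patient C", est_benefit=4.0, est_cost=4.0),
    ]
    print([c.name for c in select_recomputations(candidates, budget=7.0)])
```

The quality of the selection depends entirely on the estimates, which ties this back to challenge 2: learning impact estimates from the history of past runs.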

Interpretable and robust hospital readmission predictions from Electronic Hea...
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overview
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
 
ReComp and P4@NU: Reproducible Data Science for Health
ReComp and P4@NU: Reproducible Data Science for HealthReComp and P4@NU: Reproducible Data Science for Health
ReComp and P4@NU: Reproducible Data Science for Health
 

Recently uploaded

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

Your data won’t stay smart forever: exploring the temporal dimension of (big data) analytics

  • 1. ReComp@Scalable, Newcastle, May 9, 2016. Your data won’t stay smart forever: exploring the temporal dimension of (big data) analytics. Paolo Missier, Jacek Cala, Manisha Rathi. Scalable Computing Group Seminar, Newcastle, May 2016. Panta Rhei (Heraclitus) (*). (*) Painting by Johannes Moreelse.
  • 2. Data to Knowledge. The Data-to-knowledge axiom of the Knowledge Economy: [Diagram: Big Data and Meta-knowledge (Algorithms, Tools, Middleware, Reference datasets) feed The Big Analytics Machine, which yields “Valuable Knowledge”.]
  • 3. The missing element: time. [Diagram: as in slide 2, with time axes t added to Big Data, The Big Analytics Machine and Meta-knowledge (Algorithms, Tools, Middleware, Reference datasets), and successive versions V1, V2, V3 of “Valuable Knowledge”.]
  • 4. The ReComp decision support system. Observe change: in big data; in meta-knowledge. Assess and measure: knowledge decay. Estimate: cost and benefits of refresh. Enact: reproduce (analytics) processes. [Diagram: Big Data, Meta-knowledge (Algorithms, Tools, Middleware, Reference datasets) and The Big Analytics Machine over time t, yielding versions V1, V2, V3 of “Valuable Knowledge”.]
  • 6. ReComp scenarios (columns: ReComp scenario | Target impact areas | Why is ReComp relevant? | Proof of concept experiments | Expected optimisation)
      - Dataflow, experimental science | Genomics | Rapid knowledge advances; rapid scaling up of genetic testing at population level | WES/SVI pipeline, workflow implementation (eScience Central) | Timeliness and accuracy of patient diagnosis subject to budget constraints
      - Time series analysis | Personal health monitoring; smart city analytics; IoT data streams | Rapid data drift; cost of computation at network edge (e.g. IoT) | NYC taxi rides challenge (DEBS’15) | Use of low-power edge devices when outcome is predictable and data drift is low
      - Data layer optimisation | Tuning of large-scale data management stack | Optimal data organisation sensitive to current data profiles | Graph DB re-partitioning | System throughput vs cost of re-tuning
      - Model learning | Applications of predictive analytics | Predictive models are very sensitive to data drift | Twitter content analysis | Sustained model predictive power over time vs retraining cost
      - Simulation | TBD | Repeated simulation: computationally expensive but often not beneficial | Flood modelling / CityCat Newcastle | Computational resources vs marginal benefit of new simulation model
  • 7. Data-intensive systems: two properties. 1. Observability (transparency): How much of a data-intensive system can we observe? Structure + data flow. 2. Control: How much control do we have over the system? Execution frequency (total, partial); input density.
  • 8. Observability / transparency (columns: Aspect | White box | Black box)
      - Structure (static view) | Dataflow: eScience Central, Taverna, VisTrails…; Scripting: R, Matlab, Python... | Function semantics; packaged components; third party services
      - Data dependencies (runtime view) | Provenance recording: inputs, reference datasets, component versions, outputs | Input; outputs; no data dependencies; no details on individual components
      - Cost | Detailed resource monitoring; Cloud → £££ | Wall clock time; service pricing; setup time (e.g. model learning)
  • 9. Example: genomics / variant interpretation. What changes: - Patient variants → improved sequencing / variant calling - ClinVar, OMIM evolve rapidly - New reference data sources
  • 10. White box ReComp. [Diagram: process P with inputs x11, x12, dependencies D11, D12 and output y11.] For each run i: Observables: Inputs X = {xi1, xi2, …}; Outputs y = {yi1, yi2, …}; Dependencies D11, D12, ...; Provenance prov(y); Cost(y); Process structure P = P1.P2…Pk. Measurable changes: Input diff: δ(xt, xt+1); Output diff: δ(yt, yt+1); Dependency diff: δ(Dt, Dt+1). Control: Complete / partial rerun.
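The slide names diff functions δ(·,·) without saying how they are computed. As a minimal sketch only, assuming a dependency version can be viewed as a dictionary from record key to value, a dependency diff δ(Dt, Dt+1) could report added, removed and changed keys. The names `Diff`, `diff` and the sample variant records are hypothetical and do not come from the talk.

```python
from typing import Any, Dict, Hashable, NamedTuple, Set


class Diff(NamedTuple):
    """Hypothetical result type for delta(D_t, D_t+1)."""
    added: Set[Hashable]
    removed: Set[Hashable]
    changed: Set[Hashable]


def diff(old: Dict[Hashable, Any], new: Dict[Hashable, Any]) -> Diff:
    """Compare two versions of a keyed dataset (assumed representation)."""
    old_keys, new_keys = set(old), set(new)
    common = old_keys & new_keys
    return Diff(
        added=new_keys - old_keys,
        removed=old_keys - new_keys,
        changed={k for k in common if old[k] != new[k]},
    )


# Made-up reference dataset at versions t and t+1 (illustrative values only).
D_t = {"var1": "benign", "var2": "uncertain", "var3": "pathogenic"}
D_t1 = {"var1": "benign", "var2": "pathogenic", "var4": "benign"}
delta = diff(D_t, D_t1)
```

The same function could serve as an input diff or output diff whenever inputs and outputs can be keyed in this way; real reference databases would likely need a domain-specific comparison.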
  • 11. A history of runs. [Diagram: Run 1, Patient A: process P with inputs x11, x12, dependencies D11, DCV and output y1. Run 2, Patient B: process P with inputs x21, x22, dependencies D21, DCV and output y2.] H = {<x, D, P, y, prov(y), cost(y)>}. Note on dependencies Dij: Fine-grained: Dij are the results of a query to a dependent data source (OMIM). Coarse-grained: only record that (a specific version of) OMIM has been used.
  • 12. ReComp questions. Scoping: Which patients (from a large cohort) are going to be affected by a change in input/reference data? Impact: For each patient in scope, how likely is the patient’s diagnosis to change? Approach: Given Dt+1 and changes δ(Dt, Dt+1): for each patient X and outcome y, query prov(y) to find references to Dt. Patient X is in scope if prov(y) ∩ δ(Dt, Dt+1) is not empty. Reference: Missier, Paolo, Jacek Cala, and Eldarina Wijaya. “The Data, They Are a-Changin’.” In Proc. TAPP’16 (Theory and Practice of Provenance), edited by Sarah Cohen-Boulakia. Washington D.C., USA: USENIX Association, 2016. https://arxiv.org/abs/1604.06412.
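The scoping test on this slide (a patient is in scope if prov(y) ∩ δ(Dt, Dt+1) is not empty) can be sketched as a set intersection over a history of runs. This is an illustration under assumptions: provenance is reduced to the set of reference-record identifiers each run used, and the `Run` record, `in_scope` function and sample identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Run:
    """Simplified, hypothetical history entry <x, D, P, y, prov(y), cost(y)>."""
    patient: str
    prov_refs: FrozenSet[str]  # reference records cited in prov(y)
    cost: float                # cost(y), arbitrary units


def in_scope(history: List[Run], changed: Set[str]) -> List[str]:
    """Return patients whose provenance touches any changed reference record."""
    return [run.patient for run in history if run.prov_refs & changed]


# Made-up history and change set (illustrative values only).
history = [
    Run("A", frozenset({"var1", "var2"}), 1.0),
    Run("B", frozenset({"var1"}), 1.2),
    Run("C", frozenset({"var3", "var5"}), 0.8),
]
changed = {"var2", "var3", "var4"}  # keys reported by delta(D_t, D_t+1)
affected = in_scope(history, changed)
```

Only patients A and C are selected here, so patient B would not need to be reconsidered. This requires the fine-grained dependency recording described on slide 11; with coarse-grained recording every run that used the changed source would be in scope.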
  • 13. Example: flood modelling in Newcastle. CityCAT (City Catchment Analysis Tool). Inputs: topography (DEMs from LIDAR); physical structures (buildings etc.); land use data. Outputs: high resolution grid of flood depths. Observables: Inputs X = {xi1, xi2, …}; Outputs y = {yi1, yi2, …}; Cost(y). Measurable changes: Input diff: δ(xt, xt+1). Control: Simulation rerun; grid resolution; regional boundaries.
  • 14. Example: model learning pattern. Black box ReComp. Observables: Outputs y = {yi1, yi2, …}; Cost(y) (retraining). Measurable changes: Output quality relative to ground truth: qty(yt). Control: Request to retrain.
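The pattern on this slide observes only outputs and retraining cost, measures output quality against ground truth, and controls a retrain request. A minimal sketch of that loop follows, assuming quality qty(yt) is plain classification accuracy and that the decision is a simple threshold-plus-budget rule. Both choices, and the names `quality` and `should_request_retrain`, are assumptions made for illustration, not the method of the talk.

```python
from typing import Sequence


def quality(predictions: Sequence[int], ground_truth: Sequence[int]) -> float:
    """qty(y_t) taken here as accuracy against ground truth (an assumption)."""
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(predictions)


def should_request_retrain(
    qty_now: float, min_quality: float, retrain_cost: float, budget: float
) -> bool:
    """Hypothetical rule: retrain when quality is too low and cost is affordable."""
    return qty_now < min_quality and retrain_cost <= budget


# Made-up predictions and labels for one time window.
qty_t = quality([1, 0, 1, 1], [1, 1, 1, 0])
decision = should_request_retrain(qty_t, min_quality=0.8, retrain_cost=10.0, budget=20.0)
```

A more faithful treatment would weigh the expected benefit of retraining against its cost rather than use a fixed threshold.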
  • 15. Black box ReComp – I. When does the model require retraining? What is the expected cost and benefit of retraining the model at any given time? The Velox system / Berkeley AMP Lab: Crankshaw, Daniel, Peter Bailis, Joseph E Gonzalez, Haoyuan Li, Zhao Zhang, Michael J Franklin, Ali Ghodsi, and Michael I Jordan. “The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox.” In Proc. CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA. http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper19u.pdf.
  • 16. Example: time series pattern. Black box ReComp: top-k most frequent NYC taxi routes over time. Control input data drift: How often and how densely should we sample from the stream to keep the output sufficiently current? Observables: Outputs y = {yi1, yi2, …}; Cost(y). Measurable changes: Output diff: δ(yt, yt+1). Control: Sample frequency / sample density.
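To make the control knobs on this slide concrete, the sketch below computes the top-k most frequent routes over one window of a stream, with sample density modelled as "keep every n-th trip", and measures the output diff δ(yt, yt+1) as a Jaccard distance between consecutive top-k sets. The sampling scheme, the distance measure, the function names and the trip data are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Route = Tuple[str, str]  # (pickup cell, drop-off cell), hypothetical encoding


def top_k_routes(trips: Iterable[Route], k: int, sample_every: int = 1) -> List[Route]:
    """Top-k routes in a window, keeping only every `sample_every`-th trip."""
    sampled = (trip for i, trip in enumerate(trips) if i % sample_every == 0)
    return [route for route, _ in Counter(sampled).most_common(k)]


def output_diff(prev: Sequence[Route], curr: Sequence[Route]) -> float:
    """delta(y_t, y_t+1) as Jaccard distance between two top-k sets."""
    a, b = set(prev), set(curr)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Two made-up consecutive windows of trips.
window_1 = [("A", "B")] * 5 + [("C", "D")] * 3 + [("E", "F")] * 1
window_2 = [("A", "B")] * 4 + [("E", "F")] * 3 + [("C", "D")] * 1
y_t = top_k_routes(window_1, k=2)
y_t1 = top_k_routes(window_2, k=2)
drift = output_diff(y_t, y_t1)
```

A low diff over several windows would suggest that a sparser sample, and hence a cheaper computation, still keeps the output sufficiently current.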
  • 17. Example: graph repartitioning. Taper Partitioner optimises for a given query workload. Performance of a partitioning is well defined: # inter-partition traversals. Performance degrades when query workload changes. [Figure: Inter-partition traversals vs. % workload change; x-axis: % workload change (0% to 120%), y-axis: inter-partition traversals (0 to 3,000,000).] Observables: Outputs y = {yi1, yi2, …}; Cost(y) (re-partition). Measurable changes: Output quality: #ipt. Control: Re-partition requests.
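The quality measure on this slide, the number of inter-partition traversals (#ipt), can be illustrated with a small sketch. It assumes a query workload is given as a list of vertex paths and a partitioning as a vertex-to-partition map. This is not the Taper implementation; the function name and the toy graph are hypothetical.

```python
from typing import Dict, Iterable, List


def inter_partition_traversals(
    partition_of: Dict[str, int], query_paths: Iterable[List[str]]
) -> int:
    """Count edge traversals whose endpoints lie in different partitions."""
    count = 0
    for path in query_paths:
        for u, v in zip(path, path[1:]):  # consecutive vertices on the path
            if partition_of[u] != partition_of[v]:
                count += 1
    return count


# Made-up partitioning tuned for the old workload, then a changed workload.
partition_of = {"a": 0, "b": 0, "c": 1, "d": 1}
old_workload = [["a", "b"], ["c", "d"], ["a", "b", "c"]]
new_workload = [["a", "c"], ["b", "d"], ["a", "b", "c"]]
ipt_old = inter_partition_traversals(partition_of, old_workload)
ipt_new = inter_partition_traversals(partition_of, new_workload)
```

In this toy case the count rises from 1 to 3 once the workload changes, which is the kind of degradation that would be weighed against the cost of a re-partition request.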
  • 18. A summary of ReComp problems. Observables: Inputs X = {xi1, xi2, …}; Outputs y = {yi1, yi2, …}; Dependencies D11, D12, ...; Provenance prov(y); Cost(y); Process structure P = P1.P2…Pk. Measurable changes: Input diff: δ(xt, xt+1); Output diff: δ(yt, yt+1); Dependency diff: δ(Dt, Dt+1); Quality(y). Control: data selection; partial / complete rerun. (Table columns: Problem | Requires | Sample problems)
      - Forwards. Scoping: identify affected population subset. Change impact analysis: inputs, dependencies → outputs | White box | Dataflow analytics; simulation; many-runs problems
      - Backwards. React to output instability / input drift | Black box | Model learning; time series analytics; data-driven optimisation
  • 19. The next steps -- challenges. 1. Optimisation: Observables + control → reactive system; + cost and utility functions → optimisation problems. 2. Learning from history: Can we use history to learn estimates of impact without the need for actual re-computation? 3. Software infrastructure and tooling: ReComp is a metadata management and analytics exercise. 4. Reproducibility: What really happens when I press the “ReComp” button? 5. Impact: How do we address key impact areas? - e-health - Genomics - Smart city management

Editor's Notes

  1. The times they are a’changin