Prof. Paolo Missier
School of Computer Science
University of Birmingham, UK
April 5th, 2024
Towards explanations for Data-Centric AI
using provenance records
My contacts:
Outline
• Basics of data provenance for DAG pipelines
• Provenance in the context of Data-Centric AI use cases: Levels of detail / granularity
• Data Provenance for Data Science: methods and tooling
• Challenge: Why+provenance
Summary of data-centric use cases
1. Model-driven incremental data cleaning
   1.1 Training set cleaning
   1.2 Label correction
2. Training set optimization
   2.1 Removing hard/easy examples
   2.2 Reducing redundancies
Summary of data-centric use cases
Context: ActiveClean
- Type of operation: select items from training set for manual cleaning. Item transformation: x -> x’
- Strategy: iterative batch cleaning strategy driven by SGD
- Data processing and model training: ActiveClean processing is interleaved with model training; both stop at the same time.

Context: Training set debugging
- Type of operation: select items from training set for label correction. Item transformation: y -> y’
- Strategy: aims to rank data points and minimize manual corrections
- Data processing and model training: the re-labelling strategy is incremental and interleaved with model retraining. However, the winning strategy is not published and thus its generalizability is not clear.

Context: Training set optimization, reducing redundancy by removing similar points
- Type of operation: prune items from training set. Filtering: remove (y)
- Strategy: cluster data points in embedded space, select representatives from each cluster
- Data processing and model training: training set pruning happens before model training

Context: Training set optimization, reducing redundancy by pruning hard/easy examples
- Type of operation: prune items from training set
- Strategy: identify simple / hard examples, sample from those depending on training set size
- Data processing and model training: training set pruning happens before model training
Reproducibility, explainability
The use cases provide examples of complex data transformations and data filtering operators
We aim to answer three types of questions:
• Which data transformations were applied to raw input dataset(s) to generate the final
training set used for modelling?
• Dataset level
• Which of the individual data items were affected by each of the transformations, and what
was the effect?
• Data item level
• Why was a specific data item transformed?
Representing provenance
A formal, interoperable data model and syntax for generic provenance constructs
- accommodates layers I, II
- extensible to a domain vocabulary → e.g. DC-Check
Seedat, Nabeel, Fergus Imrie, and Mihaela van der Schaar. ‘DC-Check: A Data-Centric AI Checklist to Guide the Development of Reliable Machine
Learning Systems’. arXiv, 9 November 2022. http://arxiv.org/abs/2211.05764.
The W3C PROV model (2013)
[Figure: a processing activity consumes Input 1 … Input n (usage) and produces Output 1 … Output m (generation); each output is linked back to the inputs by derivation.]
Basic data derivation pattern: transformation
Consider an abstract data transformation operator: 𝐷 → 𝐷ʹ
[Figure: A used D; D’ wasGeneratedBy A; D’ wasDerivedFrom D.]
We can record the provenance of 𝐷ʹ as a derivation from 𝐷
- mediated by some abstract activity 𝐴 that represents the cleaning or pruning operations
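A minimal sketch of this pattern, assuming we only want to emit the three core PROV relations as PROV-N-style strings. The helper name `derivation_pattern` and the identifiers `D`, `D_prime` and `A` are illustrative; a complete PROV-N document would also need namespace prefixes and a document wrapper, which are omitted here.

```python
def derivation_pattern(source: str, target: str, activity: str) -> list[str]:
    """Return PROV-N-style statements: `target` derived from `source` via `activity`."""
    return [
        f"entity({source})",
        f"entity({target})",
        f"activity({activity})",
        f"used({activity}, {source})",            # the activity read the input
        f"wasGeneratedBy({target}, {activity})",  # the activity produced the output
        f"wasDerivedFrom({target}, {source})",    # direct data-to-data link
    ]


statements = derivation_pattern("D", "D_prime", "A")
print("\n".join(statements))
```

The `wasDerivedFrom` statement is kept alongside `used` and `wasGeneratedBy` so that data lineage can be followed without going through the activity.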
Item-level data transformation (1-1)
This high-level provenance is not very informative if we want to account for how 𝐴 operates on each data item
In the simple examples, 𝐴 performs 1-1, item-wise transformations:
𝑥 ∈ 𝐷 → 𝑥ʹ ∈ 𝐷ʹ
where either 𝑥ʹ = 𝑥 or 𝑥ʹ is a clean version of 𝑥
[Figure: A used x1 … xn and generated x’1 … x’n; each x’i wasDerivedFrom xi.]
PROV-N representation, for each item i = 1, …, n:
used(A, xi)
wasGeneratedBy(x’i, A)
wasDerivedFrom(x’i, xi)
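As an illustration, the item-wise pattern could be generated as follows. This is a sketch under assumptions: items are identified by position, the identifiers `x1`, `x1_prime`, … are invented for the example, and the extra `changed` list (items whose value actually differs) is a convenience that is not part of PROV.

```python
def item_level_provenance(before, after, activity="A"):
    """Emit per-item PROV-N-style statements for a 1-1 transformation.

    before[i] and after[i] are the i-th items of D and D'.
    """
    if len(before) != len(after):
        raise ValueError("a 1-1 transformation preserves the number of items")
    statements, changed = [], []
    for i, (x, x_new) in enumerate(zip(before, after), start=1):
        statements += [
            f"used({activity}, x{i})",
            f"wasGeneratedBy(x{i}_prime, {activity})",
            f"wasDerivedFrom(x{i}_prime, x{i})",
        ]
        if x != x_new:          # x' is a cleaned version of x
            changed.append(i)
    return statements, changed


# Hypothetical cleaning step: strip surrounding whitespace.
stmts, changed = item_level_provenance([" a", "b", "c "], ["a", "b", "c"])
print(changed)
```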
Item-level data transformation (1-many)
This notation can also be used to capture M-N transformations
- to represent the effects of data imputation using statistics that affect multiple data points simultaneously
- This can be achieved by adding relationship instances as needed
Example: {wasDerivedFrom(𝑥ʹ𝑖, 𝑦)} for 𝑖 = 1, ..., 𝑛
denotes a single value 𝑦 ∈ 𝐷 used to produce multiple values 𝑥ʹ1, ... , 𝑥ʹ𝑛
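The imputation case can be sketched as below, assuming mean imputation over a single numeric column. Every imputed value is derived from all observed values that contributed to the mean, while untouched values are derived only from themselves. The function name and the zero-based identifiers are illustrative.

```python
from statistics import mean


def impute_with_provenance(values):
    """Replace None with the mean of the observed values and record derivations."""
    observed = [i for i, v in enumerate(values) if v is not None]
    fill = mean(values[i] for i in observed)   # raises if nothing was observed
    imputed, derivations = [], []
    for i, v in enumerate(values):
        if v is None:
            imputed.append(fill)
            # many-to-one: the new value depends on every observed input
            derivations += [f"wasDerivedFrom(x{i}_prime, x{j})" for j in observed]
        else:
            imputed.append(v)
            derivations.append(f"wasDerivedFrom(x{i}_prime, x{i})")
    return imputed, derivations


imputed, derivations = impute_with_provenance([1.0, None, 3.0, None])
print(imputed)
```

Note that the number of derivation statements grows with (missing values × observed values), which is one reason fine-grained capture needs to be efficient.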
Unfolding process iterations
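[Figure: the process unfolded over its iterations. Activity Ai used Di-1, and the dataset version it generated (wgby) wasDerivedFrom Di-1; Di-1 in turn traces back through Ai-1 to the initial D.]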
Item-level data selection
Here we only need to represent whether each input datapoint survives the selection operator
PROV can be used to assert that operator op has removed datapoint 𝑥 ∈ 𝐷 from its output 𝐷ʹ
There is actually no need to represent the provenance of the surviving datapoints
Suppose op removed 𝑚 items 𝑥1, ..., 𝑥𝑚 from 𝐷. Using PROV, this is asserted as: {wasInvalidatedBy(𝑥𝑖, op)} for 𝑖 = 1, ..., 𝑚
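A small sketch of a selection operator that records one `wasInvalidatedBy` statement per removed item and nothing for the survivors. The predicate, the item identifiers and the function name are assumptions made for the example.

```python
def select_with_provenance(items, keep, op="op"):
    """Apply predicate `keep`; emit wasInvalidatedBy for each removed item."""
    survivors, invalidations = [], []
    for i, x in enumerate(items, start=1):
        if keep(x):
            survivors.append(x)            # no provenance needed for survivors
        else:
            invalidations.append(f"wasInvalidatedBy(x{i}, {op})")
    return survivors, invalidations


# Hypothetical filter: drop negative values.
survivors, invalidations = select_with_provenance([5, -1, 7, -3], lambda v: v >= 0)
print(invalidations)
```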
Data derivation through pipelines
When operators are composed into pipelines, provenance is a composition of the corresponding provenance patterns
Consider a sequential pipeline consisting of abstract data processing operators op1 ... op𝑛 and a training operator Tr
Each op𝑖 takes an input dataset 𝐷 and produces an output 𝐷ʹ: 𝐷ʹ = op𝑖(𝐷)
Similarly, training takes some 𝐷 and produces a model 𝑀: 𝑀 = Tr(𝐷)
Starting from the initial “raw” dataset 𝐷0, and denoting the intermediate datasets by 𝐷𝑖, this pipeline can be written as
{𝐷𝑖 = op𝑖(𝐷𝑖−1)} for 𝑖 = 1, ..., 𝑛, and 𝑀 = Tr(𝐷𝑛)
Corresponding provenance:
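[Figure: D0 → OP1 → D1 → … → OPn → Dn → Tr → M. In the provenance graph each operator used its input dataset and each output wasGeneratedBy (wgby) its operator: used(OP1, D0), wasGeneratedBy(D1, OP1), …, used(Tr, Dn), wasGeneratedBy(M, Tr).]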
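The composition can be sketched as a runner that executes the operators in sequence and appends the provenance pattern of each step. The operators `drop_missing` and `scale`, and the stand-in "model" `train_mean`, are hypothetical placeholders chosen only to make the example runnable.

```python
def run_pipeline(d0, ops, train):
    """Run ops in sequence, then train; return the model and PROV-N-style statements."""
    statements, data = [], d0
    for i, op in enumerate(ops, start=1):
        data = op(data)                                   # D_i = op_i(D_{i-1})
        statements += [
            f"used(op{i}, D{i - 1})",
            f"wasGeneratedBy(D{i}, op{i})",
        ]
    model = train(data)                                   # M = Tr(D_n)
    statements += [f"used(Tr, D{len(ops)})", "wasGeneratedBy(M, Tr)"]
    return model, statements


def drop_missing(rows):
    return [r for r in rows if r is not None]


def scale(rows):
    return [r / 10 for r in rows]


def train_mean(rows):
    return sum(rows) / len(rows)   # stand-in for a real training step


model, statements = run_pipeline([10, None, 30], [drop_missing, scale], train_mean)
print("\n".join(statements))
```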
Extension to DAG topologies is straightforward
These assertions extend naturally to pipelines with multiple inputs and outputs --> Directed Acyclic Graphs
Example: inputs 𝐷0^𝑎, 𝐷0^𝑏, 𝐷0^𝑐 are processed independently and eventually merged into 𝐷𝑛:
[Figure: OP1 maps D0^a to D1^a and OP2 maps D0^b to D1^b; OP3 merges the b and c branches into a dataset D^bc, and OP4 merges the a branch with D^bc into the final D^abc. The provenance graph mirrors this DAG with used and wgby (wasGeneratedBy) edges.]
IDEAL
2023
Data Cleaning simulation pattern
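[Figure, competitor side: a cleaning priority strategy is applied to Dn to give D’; the fixed training code trains a model M’ on D’, and model evaluation yields an eval score.]
[Figure, evaluator side: the labels of Dtr are corrupted to give Dn; a clean model is trained on Dtr.]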
A noisy version Dn is generated from
Dtr (eg label flipping)
Target performance recorded by
training on Dtr and testing on Dtest
Strategies are scored based on number
of cleaning actions required to achieve
95% of target performance
- Corrupt some of the labels in Dtr → Dn
- Let P be the target performance (training on Dtr) and Pn the model performance when using Dn for training. Pn will be less than P
- The strategy must suggest a ranking of the examples in Dn such that, by "cleaning" them in that order, performance increases towards P
What can be learnt from this exercise?
[Figure: Dn → cleaning strategy → D’ → model training → M’ → eval, repeated; Mbest is passed on to MLOps.]
The challenge is effectively a simulation of a two-level iterative process.
Challenge winners will have developed and demonstrated new strategies for training set debugging
However:
The strategy may be optimized for dataset Dn, task T, and the pre-selected model
Provenance and versioning
[Figure: Dn → CSi → Di → model training → M’ → eval; Mbest is passed on to MLOps.]
We would like to:
1. Document that Di was derived from Dn using CSi, as part of a longer pipeline
2. Be able to identify:
   1. What effect CSi had on Dn:
      1. Which data labels were cleaned
      2. Why they were cleaned
3. Make sure CSi can be reused safely:
   1. Specify assumptions, pre-requisites
   2. Provide examples of past usages
Provenance layer I: whole dataset
Assumptions:
- Dn, Di atomic units of data
- CS atomic unit of processing
Reproducibility: “Outer layer” questions
- Where does Di come from?
- Which version of Di was used to train Mbest?
Derivation:
Di was derived from Dn using CSi
Mbest was trained on Di
Attribution:
CSi was created by <creator C>
[Figure: CS used x^n_j; x^i_j wasGeneratedBy CS; x^i_j wasDerivedFrom x^n_j; CS wasAssociatedWith C.]
Provenance layer I specification
entity(D_noisy, [ prov:type="training-set" ])
entity(D_clean, [ prov:type="training-set" ])
entity(e1, [ prov:type="training-data", inSet='D_noisy', index=j, val=V ])
entity(e2, [ prov:type="training-data", inSet='D_clean', index=j, val=W ])
agent(C, [ prov:type="CS-creator" ])
activity(CS, [ prov:type="cleaning-strategy", version="v1.0", desc="…" ])
wasDerivedFrom(e2, e1)
used(CS, e1)
wasGeneratedBy(e2, CS)
wasAssociatedWith(CS, C)
Surface representation: PROV-N
Internal representation: property-value graphs. Hint: Neo4j works well…
Provenance layer II: data-granular provenance
Assumptions:
- Dn = {x^n_j}, Di = {x^i_j}
- CS atomic unit of processing
Explainability: Data-level Questions
- which x^n_j were cleaned?
- “how dirty was Dn?”
  in aggregate: how many labels were cleaned to achieve a target performance?
Derivations:
for each x^i_j that has been cleaned by CSi:
x^i_j was derived from x^n_j
Provenance layer II specification
[Figure: for each cleaned item, CSi used x^n_j; x^i_j wasGeneratedBy CSi; x^i_j wasDerivedFrom x^n_j; CSi wasAssociatedWith C. The pattern is repeated for every cleaned item.]
Representing entanglements
The term “entanglement” denotes an iterative interleaving of data preparation and modelling
During generic iteration 𝑖:
- Assess processor takes a partially cleaned training set 𝐷𝑖 along with current model version 𝑀𝑖 trained on 𝐷𝑖
- determine next batch of items in 𝐷𝑖 to be cleaned
- Clean is a separate processor, yielding a new version 𝐷𝑖+1
- This is used to train 𝑀𝑖+1 etc
[Figure: Train on D0 produces M1; in each iteration Assess identifies the cleaning targets, Clean produces Di+1, and Train produces Mi+1.]
Provenance of entanglements
- PROV can be used to express a provenance graph for this process
- the graph must capture an unfolding of the process execution over the set of its iterations
Starting from version 𝐷𝑖+1 of the data and moving backwards in time:
- 𝐷𝑖+1 was generated by instance 𝑖 + 1 of the clean processor
- This took as input the batch of data items identified by assess𝑖+1 as targets for cleaning
- This in turn required 𝑀𝑖 (generated by the 𝑖-th training iteration) and 𝐷𝑖
- PROV allows annotations to be added to each entity, activity, and relationship
- These annotations may be drawn from:
  - a standard vocabulary (e.g. role, to qualify the role of a processor in the pipeline)
  - custom vocabularies, for instance to associate performance metrics with each version of the model
[Figure: unfolded provenance graph over two consecutive iterations, linking the Clean, Train and Assess activities to the entities Di-1, Mi-1, targets, Di, Mi, targets, Di+1.]
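One possible way to materialise such an unfolded graph is sketched below. The indexing is an assumption based on the backward reading above: `train_i` uses `D_i` to produce `M_i`, `assess_{i+1}` uses `D_i` and `M_i` to produce the targets, and `clean_{i+1}` produces `D_{i+1}`. Edges are stored as `(relation, subject, object)` tuples so that they can be queried directly.

```python
def unfold(n_iterations):
    """Return provenance edges for n iterations of train / assess / clean."""
    g = []
    for i in range(n_iterations):
        g += [
            ("used", f"train{i}", f"D{i}"),
            ("wasGeneratedBy", f"M{i}", f"train{i}"),
            ("used", f"assess{i + 1}", f"D{i}"),
            ("used", f"assess{i + 1}", f"M{i}"),
            ("wasGeneratedBy", f"targets{i + 1}", f"assess{i + 1}"),
            ("used", f"clean{i + 1}", f"D{i}"),
            ("used", f"clean{i + 1}", f"targets{i + 1}"),
            ("wasGeneratedBy", f"D{i + 1}", f"clean{i + 1}"),
            ("wasDerivedFrom", f"D{i + 1}", f"D{i}"),
        ]
    return g


def trace_back(entity, g):
    """Return the activity that generated `entity` and the entities it used."""
    activity = next(o for rel, s, o in g if rel == "wasGeneratedBy" and s == entity)
    inputs = [o for rel, s, o in g if rel == "used" and s == activity]
    return activity, inputs


graph = unfold(2)
print(trace_back("D2", graph))
print(trace_back("targets2", graph))
```

Repeated calls to `trace_back` walk the graph backwards in time, one activity at a time, until the initial dataset `D0` is reached.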
Use case 2: training set optimisation
Motivation: training efficiency
→ model performance (test loss) correlates with training data size D according to a power law [11]
However, “Since scalings with N (model size), D (training tokens), Cmin (compute budget) are power-laws,
there are diminishing returns with increasing scale.” [11]
[11] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. Scaling laws for neural language
models. arXiv preprint arXiv:2001.08361, 2020.
This motivates trying to optimize D:
1. Redundancy in D leads to wasted training time
2. Not all training examples are equally important for training:
   → which ones should be kept / removed?
Training set optimization Task 1: reducing redundancy
[12] Abbas, Amro, Kushal Tirumala, Dániel Simig, Surya Ganguli, and Ari S. Morcos. ‘SemDeDup: Data-Efficient Learning at Web-Scale through
Semantic Deduplication’. arXiv, 22 March 2023. http://arxiv.org/abs/2303.09540.
Approach [12]:
1. Map the training set D to an embedding space, using pre-trained foundation models
2. Cluster all data points in the embedding space using k-means
3. Using cosine similarity, identify similar points within each cluster; apply a threshold and select representatives
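Step 3 can be illustrated with the sketch below. It assumes that embeddings and cluster assignments (steps 1 and 2) are already available, and it uses a simple greedy rule: an item is kept only if it is not too similar to an item already kept in the same cluster. This is a simplification for illustration and not necessarily the selection rule used in [12]; the threshold value is arbitrary.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def dedup_within_clusters(embeddings, cluster_of, threshold=0.95):
    """Greedy near-duplicate removal, comparing items only within their cluster."""
    kept_by_cluster = {}
    kept, removed = [], []
    for idx, emb in enumerate(embeddings):
        kept_here = kept_by_cluster.setdefault(cluster_of[idx], [])
        if any(cosine(emb, embeddings[k]) >= threshold for k in kept_here):
            removed.append(idx)      # near-duplicate of a kept representative
        else:
            kept_here.append(idx)
            kept.append(idx)
    return kept, removed


# Toy 2-D embeddings with precomputed cluster labels.
embeddings = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.02, 1.0)]
cluster_of = [0, 0, 1, 1]
kept, removed = dedup_within_clusters(embeddings, cluster_of)
print(kept, removed)
```

The `removed` indices are exactly the items for which a removal would be recorded in the provenance.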
Training set optimization Task 2: pruning easy/hard examples
Main findings from [13]:
1. Not all training examples are created equal
• Hard vs easy
2. The best pruning strategy depends on the amount of initial data
• Small TS → keep the easy examples
• Large TS → keep the hard examples
[13] Sorscher, Ben, et al. Beyond neural scaling laws: beating power law scaling via data pruning, Advances in Neural Information Processing Systems 35 (2022): 19523-19536.
[Figure reproduced from [13]]
A really simple pruning method, very similar to Task 1:
"To compute a self-supervised pruning metric for ImageNet, we perform k-means clustering in the embedding space of an ImageNet pre-trained self-supervised model and define the difficulty of each data point by the Euclidean distance to its nearest cluster centroid, or prototype"
Caveat: only tested on ImageNet!
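The metric quoted above can be sketched in a few lines, assuming centroids have already been computed by k-means. The `prune` helper, its `keep_fraction` parameter and the toy data are illustrative and do not come from [13].

```python
import math


def difficulty(point, centroids):
    """Euclidean distance from `point` to its nearest centroid (prototype)."""
    return min(math.dist(point, c) for c in centroids)


def prune(points, centroids, keep_fraction, keep="hard"):
    """Return the indices of the points to keep, hardest or easiest first."""
    scores = [difficulty(p, centroids) for p in points]
    order = sorted(range(len(points)), key=lambda i: scores[i],
                   reverse=(keep == "hard"))
    n_keep = max(1, round(keep_fraction * len(points)))
    return sorted(order[:n_keep])


points = [(0.0, 0.0), (0.1, 0.0), (2.0, 0.0), (5.0, 5.0), (5.0, 6.5)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
hard = prune(points, centroids, 0.4, keep="hard")   # suited to a large training set
easy = prune(points, centroids, 0.4, keep="easy")   # suited to a small training set
print(hard, easy)
```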
Provenance for training set optimization
This is a classic filter pipeline, only a little more sophisticated:
[Figure, black box: Filter used TSfull; TSopt wasGeneratedBy Filter; TSopt wasDerivedFrom TSfull.]
[Figure, gray box: Filter unfolds into TSfull → Embed → TSemb → Cluster → TSclus → Select → TSopt, with used and wgby (wasGeneratedBy) edges at each step, and TSopt wasDerivedFrom TSfull.]
Layers I and II are very similar to Use Case 1:
Reproducibility:
- Where does TSopt come from?
→ black / gray box options
Provenance for training set optimization / Layer II
Assumptions:
- TSfull = {ti}, TSopt = {ti}
- Filter is an atomic unit of processing
Explainability: Data-level Questions:
- which ti were filtered out?
- “how redundant was TSfull?”
Derivations:
for each ti that has been removed by Filter:
ti was invalidated by Filter
[Figure: Filter used TSfull; each removed ti wasInvalidatedBy Filter.]
How can we generate these provenance graphs?
Key idea for Layer II (data-granular): Interpreter-level observer
- Requires an observer at the boundaries of CS, i.e. to tell which x.label values have changed
- Observer has access to individual dataframe elements
- But it is unaware of data transformation semantics
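The observer idea can be sketched as a wrapper that snapshots the data before the operation and compares it element by element afterwards, without knowing what the operation does. This is an illustration only, not the DPDS implementation: it assumes rows are dictionaries, that the operation preserves row order and count, and `clean_labels` is a hypothetical cleaning strategy.

```python
import copy
from functools import wraps


def observed(op):
    """Wrap `op` so that each call also returns the element-level changes."""
    @wraps(op)
    def wrapper(rows):
        before = copy.deepcopy(rows)       # snapshot at the input boundary
        after = op(rows)
        changes = [
            (i, col, old[col], new[col])
            for i, (old, new) in enumerate(zip(before, after))
            for col in old
            if col in new and old[col] != new[col]
        ]
        return after, changes
    return wrapper


@observed
def clean_labels(rows):                    # hypothetical cleaning strategy CS
    return [{**r, "label": r["label"].strip().lower()} for r in rows]


rows = [{"x": 1, "label": "Cat "}, {"x": 2, "label": "dog"}]
cleaned, changes = clean_labels(rows)
print(changes)
```

Each entry in `changes` corresponds to one element-level derivation; the observer can say that a label changed, but not why.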
[14] A. Chapman, P. Missier, G. Simonelli, and R. Torlone. 2020. Capturing and querying fine-grained provenance of preprocessing
pipelines in data science. Proc. VLDB Endow. 14, 4 (December 2020), 507–520. https://doi.org/10.14778/3436905.3436911
[15] A. Chapman, L. Lauro, P. Missier, and R. Torlone. 2022. DPDS: assisting data science with data provenance. Proc. VLDB Endow. 15, 12
(2022), 3614–3617. https://doi.org/10.14778/3554821.3554857
Adriane Chapman, Luca Lauro, Paolo Missier, and Riccardo Torlone. 2024. Supporting Better Insights of Data Science Pipelines with Fine-
grained Provenance. ACM Trans. Database Syst. Just Accepted (February 2024). https://doi.org/10.1145/3644385
A starting point:
Data Provenance for Data Science (DPDS)
Capturing provenance: Layer I
[Figure: Dn → CSi → Di → model training → Mbest → MLOps.]
Layer I (coarse): Process-level observer
Typical implementation:
- Pandas / Spark Python pipeline over dataframe datasets
- CS can be a method call or a code block:
1 - Method call:
Di = CS(Dn)
2 - Code block, taking Dn as input and producing Di, delimited by markers:
“Begin CS”
--
--
--
“End CS”
In both cases the recorded provenance is: used(CS, Dn), wasGeneratedBy(Di, CS), wasDerivedFrom(Di, Dn)
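For the code-block case, the begin/end markers could be realised with a context manager, as in the sketch below. The names `cleaning_strategy` and `provenance_log` are assumptions for the example; the statements are recorded only if the block completes without raising.

```python
from contextlib import contextmanager

provenance_log = []


@contextmanager
def cleaning_strategy(name, input_name, output_name):
    """Delimit a code block and record coarse (Layer I) provenance for it."""
    yield                                   # the body of the `with` block runs here
    provenance_log.extend([
        f"used({name}, {input_name})",
        f"wasGeneratedBy({output_name}, {name})",
        f"wasDerivedFrom({output_name}, {input_name})",
    ])


Dn = [3, None, 5]
with cleaning_strategy("CS", "Dn", "Di"):   # "Begin CS"
    Di = [v for v in Dn if v is not None]
# "End CS"
print(provenance_log)
```

The observer here sees only dataset names, not their contents, which is what makes this layer cheap to capture.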
Running example: A simple pipeline
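[Figure: Da and Db are left-joined on (K1, K2); all missing values are imputed; the result is left-joined with Dc on (K1, K2); E and F are imputed; E is one-hot encoded (adding columns ‘E4’, ‘Ex’, ‘E1’); finally column ‘E’ is removed. Intermediate datasets are labelled D1 … D6.]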
import pandas as pd
# df_A, df_B, df_C: input dataframes sharing the join keys 'key1' and 'key2'
df = pd.merge(df_A, df_B, on=['key1', 'key2'], how='left')  # join
df = df.fillna('imputed')  # imputation of all missing values
df = pd.merge(df, df_C, on=['key1', 'key2'], how='left')  # join
df = df.fillna(value={'E': 'Ex', 'F': 'Fx'})  # imputation of E and F
# one-hot encoding
c = 'E'
dummies = []
dummies.append(pd.get_dummies(df[c]))
df_dummies = pd.concat(dummies, axis=1)
df = pd.concat((df, df_dummies), axis=1)
df = df.drop([c], axis=1)  # remove the original column 'E'
Aims
Capture, store and query element-level provenance
- Derivation of each element of each intermediate dataframe (when possible)
- Efficiently, at scale
[Figure: df_A (df_-1) and df_B (df_0) feed the Join, followed by fillna; intermediate dataframes are numbered, e.g. df_1.]
Granularity
Base case:
- opaque program Po
- coarse-grained dataset
Default provenance:
- Every output depends on every input
<latexit sha1_base64="7Vic4pjHuZg2JCQiZFc95WbmPNk=">AAAJ6nicjZZfb9s2EMDVrtvidH/a7bEvRIOgA7aldlFse6yzemmAIvG2pC0QBwZFnRWiFKmRlF3D0EfoSx9WFHvdF9rjvk2PshtRlNpNgAHe3e+OR9/xpDgX3Nh+/98rVz+69vEnn271tq9/9vkXX964+dUTowrN4JQpofSzmBoQXMKp5VbAs1wDzWIBT+PnPzv70zlow5U8scsczjOaSj7jjFpU/f5iem96Y6e/168e0l4MNoudaPOMpze3/pkkihUZSMsENeZs0M/t+Ypqy5mAcntSGMgpe05TOMOlpBmY81WVa0l2UZOQmdL4k5ZUWt9jRTNjllmMZEbthQltTtllOyvs7KfzFZd5YUGy9UazQhCriDs4SbgGZsUSF5RpjrkSdkE1ZRb/nm18JhIWTGUZlclqkms1L1cTtw23q0oqm0QCeQ04oQwj+P6hVcWz2kxj0/JWwnNHoQwTTGv7+CB0B1lbRzK04plr8xCF0O7FHqatrV0FEXi8ISoxiLBI46WHLNL9ZYhwORMFVgp8jprDd+qk7bJIZk34IWg+h+QXrbIWSxcNdtE6pLX6RDWY1h/hVB8EEhCQUuufgcaqHcYda3/ZOqnSWdc5jcW75MFrOSyxXwOQrgRBnUZeB49a/YtjQukEXJfiksEf77HnmmdwCd0JmwHmGfUuwlpspSJVAjV05KRgO3dHa6KSwgMLGoN3K9ZimE9ajIYu23S6Sou7QMuSkF2Sgvy+MDgmiMJpSHBycTDfEbwIfM7dOgzCvSi8ChMSdDQ8uES+RcTdlOZeSJJM6fduSnJRGDJEB4sTaHeXqBw0tUqH6WhVeP/xQSW27mxA0bSTM13BpthiOKIClAllvCFUiQGiIfcq4iTKWoWjaRNbyx2ghgUOaT9eJYd98MK6dq9bbi2H7fu/ss/8zTIsU1fuld4biL91kkrnF1SG9HGl7fagQnQ7DYX4kB/ih0YJHDtJI/1LZVhH9d810uHtCoGxe+/Wb5zhyaOQODw6qoHJHEfWBVg6xVdyq7OOT086UVXYFnt41M1y2WaTZtGTrqKPjx978RgVZFyW+BE0CD952osn9/YGP+zd//X+zoP9zefQVnQruh19Ew2iH6MH0aNoHJ1GLEqjl9Gf0eue6L3qven9tUavXtn4fB01nt7fbwE5Fckp</latexit>
x2
<latexit sha1_base64="r+fEUvStuYOORyMK9Rc4VuDM2e8=">AAAJ6nicjZbdb9s2EMDVj61x9tWPx70QDYIN2JbZRdH2sU7npQGKxNuStkAcGBR1VohSpEpSdg1Df8Je9rCi2Ov+oT3uv9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33l6rXrH318Y6u3/cmnn33+xc1bt58bVWgGp0wJpV/G1IDgEk4ttwJe5hpoFgt4Eb964uwv5qANV/LELnM4z2gq+YwzalH1y5vpYHpzp7/Xrx7SXgw2i51o84ynt7b+miSKFRlIywQ15mzQz+35imrLmYBye1IYyCl7RVM4w6WkGZjzVZVrSXZRk5CZ0viTllRa32NFM2OWWYxkRu2FCW1O2WU7K+zs0fmKy7ywINl6o1khiFXEHZwkXAOzYokLyjTHXAm7oJoyi3/PNj4TCQumsozKZDXJtZqXq4nbhttVJZVNIoG8BpxQhhF8/9Cq4lltprFpeSvhuaNQhgmmtX18ELqDrK0jGVrxzLV5iEJo92IP09bWroIIPNsQlRhEWKTx0kMW6f4yRLiciQIrBT5HzeG/6qTtskhmTfgH0HwOyY9aZS2WLhrsonVIa/WJajCtP8KpPggkICCl1j8DjVU7jDvW/rJ1UqWzrnMai3fJg9dyWGK/BiBdCYI6jbwOHrX6F8eE0gm4LsUlg9fvseeaZ3AJfRU2A8wz6l2EtdhKRaoEaujIScF27o7WRCWFBxY0Bu9WrMUwn7QYDV226XSVFt8DLUtCdkkK8rvC4JggCqchwcnFwXxL8CLwOXfrMAj3ovAqTEjQ0fDgEvkGEXdTmnshSTKl37spyUVhyBAdLE6g3V2ictDUKh2mo1Xh/ccHldi6swFF007OdAWbYovhiApQJpTxhlAlBoiG3KuIkyhrFY6mTWwtd4AaFjik/XiVHPbBG+vavW65tRy27//KPvM3y7BMXblXem8g/txJKp1fUBnSx5W224MK0e00FOJDfogfGiVw7CSN9C+VYR3Vf9dIh7crBMbuvVu/cYYnT0Pi8OioBiZzHFkXYOkUX8mtzjo+PelEVWFb7OFRN8tlm02aRU+6ij4+fubFY1SQcVniR9Ag/ORpL57f2xs82Lv/0/2dx/ubz6Gt6MvobvR1NIgeRo+jp9E4Oo1YlEa/Rr9Hb3ui91vvXe+PNXr1ysbnTtR4en/+Ay+RySg=</latexit>
x1
<latexit sha1_base64="xzqgBwgQmDpW/YChTUBac1Vu1fE=">AAAJ6nicjZZfb9s2EMDVdlvjdH/657EvxIJgA7ZldlG0fazTemmAIvG2pC0QBwZFnRWiFKmSlF3D0EfYyx42DHvdF9rjvk2PshdRlNpNgAHe3e+OR9/xpDgX3Nh+/58rV6999PEn17d62zc+/ezzL27euv3CqEIzOGVKKP0qpgYEl3BquRXwKtdAs1jAy/j1E2d/OQdtuJIndpnDeUZTyWecUYuqn5fTe9ObO/29fvWQ9mKwWexEm2c8vbX19yRRrMhAWiaoMWeDfm7PV1RbzgSU25PCQE7Za5rCGS4lzcCcr6pcS7KLmoTMlMaftKTS+h4rmhmzzGIkM2ovTGhzyi7bWWFnj85XXOaFBcnWG80KQawi7uAk4RqYFUtcUKY55krYBdWUWfx7tvGZSFgwlWVUJqtJrtW8XE3cNtyuKqlsEgnkNeCEMozg+4dWFc9qM41Ny1sJzx2FMkwwre3jg9AdZG0dydCKZ67NQxRCuxd7mLa2dhVE4PmGqMQgwiKNlx6ySPeXIcLlTBRYKfA5ag7/VSdtl0Uya8JPQfM5JD9olbVYumiwi9YhrdUnqsG0/gin+iCQgICUWv8MNFbtMO5Y+8vWSZXOus5pLN4lD17LYYn9GoB0JQjqNPI6eNTqXxwTSifguhSXDN68x55rnsEl9FXYDDDPqHcR1mIrFakSqKEjJwXbuTtaE5UUHljQGLxbsRbDfNJiNHTZptNVWnwPtCwJ2SUpyO8Kg2OCKJyGBCcXB/MtwYvA59ytwyDci8KrMCFBR8ODS+QbRNxNae6FJMmUfu+mJBeFIUN0sDiBdneJykFTq3SYjlaF9x8fVGLrzgYUTTs50xVsii2GIypAmVDGG0KVGCAacq8iTqKsVTiaNrG13AFqWOCQ9uNVctgHb61r97rl1nLYvv8r+8zfLMMydeVe6b2B+FMnqXR+QWVIH1fabg8qRLfTUIgP+SF+aJTAsZM00r9UhnVU/10jHd6uEBi79279xhmePAuJw6OjGpjMcWRdgKVTfCW3Ouv49KQTVYVtsYdH3SyXbTZpFj3pKvr4+LkXj1FBxmWJH0GD8JOnvXhxb2/wYO/+j/d3Hu9vPoe2orvRl9HX0SB6GD2OnkXj6DRiURr9Ev0W/d4TvV97f/T+XKNXr2x87kSNp/fXO0KbySo=</latexit>
y2
<latexit sha1_base64="CU9JDVqglwSIhF3eDSADQxQg63A=">AAAJ6nicjZbdb9s2EMDVj61x9tWPx70QC4IN2JbaQ9H2sc7qpgGKxNuStkAcGBR1VohSpEpSdg1Df8Je9tCh2Ov+oT3uv9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33l6rXrH318Y6u3/cmnn33+xc1bt58bVWgGp0wJpV/G1IDgEk4ttwJe5hpoFgt4Eb/60dlfzEEbruSJXeZwntFU8hln1KLql+V0ML2509/rVw9pLwabxU60ecbTW1t/TRLFigykZYIaczbo5/Z8RbXlTEC5PSkM5JS9oimc4VLSDMz5qsq1JLuoSchMafxJSyqt77GimTHLLEYyo/bChDan7LKdFXb28HzFZV5YkGy90awQxCriDk4SroFZscQFZZpjroRdUE2Zxb9nG5+JhAVTWUZlsprkWs3L1cRtw+2qksomkUBeA04owwi+f2hV8aw209i0vJXw3FEowwTT2j4+CN1B1taRDK145to8RCG0e7GHaWtrV0EEnm2ISgwiLNJ46SGLdH8ZIlzORIGVAp+j5vBfddJ2WSSzJvwYNJ9D8kSrrMXSRYNdtA5prT5RDab1RzjVB4EEBKTU+megsWqHccfaX7ZOqnTWdU5j8S558FoOS+zXAKQrQVCnkdfBo1b/4phQOgHXpbhk8Po99lzzDC6hr8NmgHlGvYuwFlupSJVADR05KdjO3dGaqKTwwILG4N2KtRjmkxajocs2na7S4i7QsiRkl6Qgvy8MjgmicBoSnFwczHcELwKfc7cOg3AvCq/ChAQdDQ8ukW8RcTeluReSJFP6vZuSXBSGDNHB4gTa3SUqB02t0mE6WhXef3xQia07G1A07eRMV7ApthiOqABlQhlvCFVigGjIvYo4ibJW4WjaxNZyB6hhgUPaj1fJYR+8sa7d65Zby2H7/q/sM3+zDMvUlXul9wbiz52k0vkFlSF9XGm7PagQ3U5DIT7kh/ihUQLHTtJI/1IZ1lH9d410eLtCYOzeu/UbZ3jyNCQOj45qYDLHkXUBlk7xldzqrOPTk05UFbbFHh51s1y22aRZ9KSr6OPjZ148RgUZlyV+BA3CT5724vkPe4P7e/d+urfzaH/zObQVfRl9FX0TDaIH0aPoaTSOTiMWpdGv0dvo957o/dZ71/tjjV69svG5EzWe3p//ADkXySk=</latexit>
y1
P0
<latexit sha1_base64="7Vic4pjHuZg2JCQiZFc95WbmPNk=">AAAJ6nicjZZfb9s2EMDVrtvidH/a7bEvRIOgA7aldlFse6yzemmAIvG2pC0QBwZFnRWiFKmRlF3D0EfoSx9WFHvdF9rjvk2PshtRlNpNgAHe3e+OR9/xpDgX3Nh+/98rVz+69vEnn271tq9/9vkXX964+dUTowrN4JQpofSzmBoQXMKp5VbAs1wDzWIBT+PnPzv70zlow5U8scsczjOaSj7jjFpU/f5iem96Y6e/168e0l4MNoudaPOMpze3/pkkihUZSMsENeZs0M/t+Ypqy5mAcntSGMgpe05TOMOlpBmY81WVa0l2UZOQmdL4k5ZUWt9jRTNjllmMZEbthQltTtllOyvs7KfzFZd5YUGy9UazQhCriDs4SbgGZsUSF5RpjrkSdkE1ZRb/nm18JhIWTGUZlclqkms1L1cTtw23q0oqm0QCeQ04oQwj+P6hVcWz2kxj0/JWwnNHoQwTTGv7+CB0B1lbRzK04plr8xCF0O7FHqatrV0FEXi8ISoxiLBI46WHLNL9ZYhwORMFVgp8jprDd+qk7bJIZk34IWg+h+QXrbIWSxcNdtE6pLX6RDWY1h/hVB8EEhCQUuufgcaqHcYda3/ZOqnSWdc5jcW75MFrOSyxXwOQrgRBnUZeB49a/YtjQukEXJfiksEf77HnmmdwCd0JmwHmGfUuwlpspSJVAjV05KRgO3dHa6KSwgMLGoN3K9ZimE9ajIYu23S6Sou7QMuSkF2Sgvy+MDgmiMJpSHBycTDfEbwIfM7dOgzCvSi8ChMSdDQ8uES+RcTdlOZeSJJM6fduSnJRGDJEB4sTaHeXqBw0tUqH6WhVeP/xQSW27mxA0bSTM13BpthiOKIClAllvCFUiQGiIfcq4iTKWoWjaRNbyx2ghgUOaT9eJYd98MK6dq9bbi2H7fu/ss/8zTIsU1fuld4biL91kkrnF1SG9HGl7fagQnQ7DYX4kB/ih0YJHDtJI/1LZVhH9d810uHtCoGxe+/Wb5zhyaOQODw6qoHJHEfWBVg6xVdyq7OOT086UVXYFnt41M1y2WaTZtGTrqKPjx978RgVZFyW+BE0CD952osn9/YGP+zd//X+zoP9zefQVnQruh19Ew2iH6MH0aNoHJ1GLEqjl9Gf0eue6L3qven9tUavXtn4fB01nt7fbwE5Fckp</latexit>
x2
<latexit sha1_base64="r+fEUvStuYOORyMK9Rc4VuDM2e8=">AAAJ6nicjZbdb9s2EMDVj61x9tWPx70QDYIN2JbZRdH2sU7npQGKxNuStkAcGBR1VohSpEpSdg1Df8Je9rCi2Ov+oT3uv9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33l6rXrH318Y6u3/cmnn33+xc1bt58bVWgGp0wJpV/G1IDgEk4ttwJe5hpoFgt4Eb964uwv5qANV/LELnM4z2gq+YwzalH1y5vpYHpzp7/Xrx7SXgw2i51o84ynt7b+miSKFRlIywQ15mzQz+35imrLmYBye1IYyCl7RVM4w6WkGZjzVZVrSXZRk5CZ0viTllRa32NFM2OWWYxkRu2FCW1O2WU7K+zs0fmKy7ywINl6o1khiFXEHZwkXAOzYokLyjTHXAm7oJoyi3/PNj4TCQumsozKZDXJtZqXq4nbhttVJZVNIoG8BpxQhhF8/9Cq4lltprFpeSvhuaNQhgmmtX18ELqDrK0jGVrxzLV5iEJo92IP09bWroIIPNsQlRhEWKTx0kMW6f4yRLiciQIrBT5HzeG/6qTtskhmTfgH0HwOyY9aZS2WLhrsonVIa/WJajCtP8KpPggkICCl1j8DjVU7jDvW/rJ1UqWzrnMai3fJg9dyWGK/BiBdCYI6jbwOHrX6F8eE0gm4LsUlg9fvseeaZ3AJfRU2A8wz6l2EtdhKRaoEaujIScF27o7WRCWFBxY0Bu9WrMUwn7QYDV226XSVFt8DLUtCdkkK8rvC4JggCqchwcnFwXxL8CLwOXfrMAj3ovAqTEjQ0fDgEvkGEXdTmnshSTKl37spyUVhyBAdLE6g3V2ictDUKh2mo1Xh/ccHldi6swFF007OdAWbYovhiApQJpTxhlAlBoiG3KuIkyhrFY6mTWwtd4AaFjik/XiVHPbBG+vavW65tRy27//KPvM3y7BMXblXem8g/txJKp1fUBnSx5W224MK0e00FOJDfogfGiVw7CSN9C+VYR3Vf9dIh7crBMbuvVu/cYYnT0Pi8OioBiZzHFkXYOkUX8mtzjo+PelEVWFb7OFRN8tlm02aRU+6ij4+fubFY1SQcVniR9Ag/ORpL57f2xs82Lv/0/2dx/ubz6Gt6MvobvR1NIgeRo+jp9E4Oo1YlEa/Rr9Hb3ui91vvXe+PNXr1ysbnTtR4en/+Ay+RySg=</latexit>
x1
<latexit sha1_base64="xzqgBwgQmDpW/YChTUBac1Vu1fE=">AAAJ6nicjZZfb9s2EMDVdlvjdH/657EvxIJgA7ZldlG0fazTemmAIvG2pC0QBwZFnRWiFKmSlF3D0EfYyx42DHvdF9rjvk2PshdRlNpNgAHe3e+OR9/xpDgX3Nh+/58rV6999PEn17d62zc+/ezzL27euv3CqEIzOGVKKP0qpgYEl3BquRXwKtdAs1jAy/j1E2d/OQdtuJIndpnDeUZTyWecUYuqn5fTe9ObO/29fvWQ9mKwWexEm2c8vbX19yRRrMhAWiaoMWeDfm7PV1RbzgSU25PCQE7Za5rCGS4lzcCcr6pcS7KLmoTMlMaftKTS+h4rmhmzzGIkM2ovTGhzyi7bWWFnj85XXOaFBcnWG80KQawi7uAk4RqYFUtcUKY55krYBdWUWfx7tvGZSFgwlWVUJqtJrtW8XE3cNtyuKqlsEgnkNeCEMozg+4dWFc9qM41Ny1sJzx2FMkwwre3jg9AdZG0dydCKZ67NQxRCuxd7mLa2dhVE4PmGqMQgwiKNlx6ySPeXIcLlTBRYKfA5ag7/VSdtl0Uya8JPQfM5JD9olbVYumiwi9YhrdUnqsG0/gin+iCQgICUWv8MNFbtMO5Y+8vWSZXOus5pLN4lD17LYYn9GoB0JQjqNPI6eNTqXxwTSifguhSXDN68x55rnsEl9FXYDDDPqHcR1mIrFakSqKEjJwXbuTtaE5UUHljQGLxbsRbDfNJiNHTZptNVWnwPtCwJ2SUpyO8Kg2OCKJyGBCcXB/MtwYvA59ytwyDci8KrMCFBR8ODS+QbRNxNae6FJMmUfu+mJBeFIUN0sDiBdneJykFTq3SYjlaF9x8fVGLrzgYUTTs50xVsii2GIypAmVDGG0KVGCAacq8iTqKsVTiaNrG13AFqWOCQ9uNVctgHb61r97rl1nLYvv8r+8zfLMMydeVe6b2B+FMnqXR+QWVIH1fabg8qRLfTUIgP+SF+aJTAsZM00r9UhnVU/10jHd6uEBi79279xhmePAuJw6OjGpjMcWRdgKVTfCW3Ouv49KQTVYVtsYdH3SyXbTZpFj3pKvr4+LkXj1FBxmWJH0GD8JOnvXhxb2/wYO/+j/d3Hu9vPoe2orvRl9HX0SB6GD2OnkXj6DRiURr9Ev0W/d4TvV97f/T+XKNXr2x87kSNp/fXO0KbySo=</latexit>
y2
<latexit sha1_base64="CU9JDVqglwSIhF3eDSADQxQg63A=">AAAJ6nicjZbdb9s2EMDVj61x9tWPx70QC4IN2JbaQ9H2sc7qpgGKxNuStkAcGBR1VohSpEpSdg1Df8Je9tCh2Ov+oT3uv9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33l6rXrH318Y6u3/cmnn33+xc1bt58bVWgGp0wJpV/G1IDgEk4ttwJe5hpoFgt4Eb/60dlfzEEbruSJXeZwntFU8hln1KLql+V0ML2509/rVw9pLwabxU60ecbTW1t/TRLFigykZYIaczbo5/Z8RbXlTEC5PSkM5JS9oimc4VLSDMz5qsq1JLuoSchMafxJSyqt77GimTHLLEYyo/bChDan7LKdFXb28HzFZV5YkGy90awQxCriDk4SroFZscQFZZpjroRdUE2Zxb9nG5+JhAVTWUZlsprkWs3L1cRtw+2qksomkUBeA04owwi+f2hV8aw209i0vJXw3FEowwTT2j4+CN1B1taRDK145to8RCG0e7GHaWtrV0EEnm2ISgwiLNJ46SGLdH8ZIlzORIGVAp+j5vBfddJ2WSSzJvwYNJ9D8kSrrMXSRYNdtA5prT5RDab1RzjVB4EEBKTU+megsWqHccfaX7ZOqnTWdU5j8S558FoOS+zXAKQrQVCnkdfBo1b/4phQOgHXpbhk8Po99lzzDC6hr8NmgHlGvYuwFlupSJVADR05KdjO3dGaqKTwwILG4N2KtRjmkxajocs2na7S4i7QsiRkl6Qgvy8MjgmicBoSnFwczHcELwKfc7cOg3AvCq/ChAQdDQ8ukW8RcTeluReSJFP6vZuSXBSGDNHB4gTa3SUqB02t0mE6WhXef3xQia07G1A07eRMV7ApthiOqABlQhlvCFVigGjIvYo4ibJW4WjaxNZyB6hhgUPaj1fJYR+8sa7d65Zby2H7/q/sM3+zDMvUlXul9wbiz52k0vkFlSF9XGm7PagQ3U5DIT7kh/ihUQLHTtJI/1IZ1lH9d410eLtCYOzeu/UbZ3jyNCQOj45qYDLHkXUBlk7xldzqrOPTk05UFbbFHh51s1y22aRZ9KSr6OPjZ148RgUZlyV+BA3CT5724vkPe4P7e/d+urfzaH/zObQVfRl9FX0TDaIH0aPoaTSOTiMWpdGv0dvo957o/dZ71/tjjV69svG5EzWe3p//ADkXySk=</latexit>
y1
- Transparent program PT
- Fine-grained datasets
PT
…
…
…
…
…
…
…
…
<latexit sha1_base64="WDFO0CJ+nkhQJarjpMsYWauNLLg=">AAAJ6XicjZZfb9s2EMDVrtvidH/a9XEvRIOgA7Zl9lBse6yzummAIvGKpC0QBwZFnRUiFKmRlF3D0Dfoyx5WFH3dJ9rjvs2OshtRlNpNgAHe3e+OR9/xpDgX3Nh+/59r1z+68fEnn271tm9+9vkXX966/dUzowrN4JQpofSLmBoQXMKp5VbAi1wDzWIBz+PLX539+Ry04Uqe2GUO5xlNJZ9xRi2qni7vTW/t9Pf61UPai8FmsRNtnvH09tbfk0SxIgNpmaDGnA36uT1fUW05E1BuTwoDOWWXNIUzXEqagTlfVamWZBc1CZkpjT9pSaX1PVY0M2aZxUhm1F6Y0OaUXbazws5+OV9xmRcWJFtvNCsEsYq4c5OEa2BWLHFBmeaYK2EXVFNm8d/ZxmciYcFUllGZrCa5VvNyNXHbcLuqpLJJJJDXgBPKMILvH1pVPKvNNDYtbyU8dxTKMMG0to8PQneQtXUkQyueuTYPUQjtXuxh2traVRCBJxuiEoMIizReesgi3V+GCJczUWClwOeoOXynTtoui2TWhB+C5nNIHmmVtVi6aLCL1iGt1SeqwbT+CKf6IJCAgJRa/ww0Vu0w7lj7y9ZJlc66zmks3iUPXsthif0agHQlCOo08jp41OpfnBJKJ+C6FJcMfn+PPdc8gyvoXtgMMM+odxHWYisVqRKooSMnBdu5O1oTlRQeWNAYvFuxFsN80mI0dNmm01Va/AC0LAnZJSnI7wuDY4IoHIYEJxcH8x3Bi8Dn3K3DINyLwqswIUFHw4Mr5FtE3E1p7oUkyZR+76YkF4UhQ3SwOIF2d4nKQVOrdJiOVoX3Hx9UYuvOBhRNOznTFWyKLYYjKkCZUMYbQpUYIBpyryJOoqxVOJo2sbXcAWpY4JD241Vy2AcvrWv3uuXWcti+/yv7zN8swzJ15V7pvYH4tJNUOr+gMqSPK223BxWi22koxIf8ED80SuDYSRrpXynDOqr/rpEOb1cIjN17t37jDE8eh8Th0VENTOY4si7A0im+kluddXx60omqwrbYw6Nulss2mzSLnnQVfXz8xIvHqCDjssSPoEH4ydNePPtxb/DT3v3f7u882N98Dm1FX0d3o2+iQfRz9CB6HI2j04hFs+hV9Gf0unfZ+6P3pvd2jV6/tvG5EzWe3l//AiP5yLY=</latexit>
y0
<latexit sha1_base64="DwI/TEIjT7TE0TxGOcOt4qwCSCM=">AAAJ6nicjZZfb9s2EMDV7l+ctVu7Pu6FWBCkwNbMHoptj3VaNw1QJN6WtAXiwKCos0KUIjWSsmcY+gh96cOGYa/7Qnvst+lR9iKKUrsJMMC7+93x6DueFOeCG9vvv7l2/YMPP/r4k63e9qc3bn72+a3bXzwzqtAMzpgSSr+IqQHBJZxZbgW8yDXQLBbwPH750Nmfz0EbruSpXeZwkdFU8hln1KLql+Xe3vTWTn+/Xz2kvRhsFjvR5hlPb2/9M0kUKzKQlglqzPmgn9uLFdWWMwHl9qQwkFP2kqZwjktJMzAXqyrXkuyiJiEzpfEnLam0vseKZsYssxjJjNpLE9qcsst2XtjZjxcrLvPCgmTrjWaFIFYRd3CScA3MiiUuKNMccyXskmrKLP492/hMJCyYyjIqk9Uk12periZuG25XlVQ2iQTyGnBCGUbw/UOrime1mcam5a2E545CGSaY1vbxYegOsraOZGjFM9fmIQqh3Ys9TFtbuwoi8HRDVGIQYZHGSw9ZpAfLEOFyJgqsFPgcNUf/qpO2yyKZNeFHoPkcksdaZS2WLhrsonVIa/WpajCtP8Kp3gskICCl1j8DjVU7jDvWwbJ1UqWzrnMai3fJg9dyWGK/BiBdCYI6jbwOHrX6F8eE0gm4LsUlg1/fYc81z+AK2gubAeYZ9S7CWmylIlUCNXTspGA7d0dropLCAwsag3cr1mKYT1qMhi7bdLpKi2+BliUhuyQFea8wOCaIwmlIcHJxMN8QvAh8zt06DMK9KLwKExJ0NDy8Qr5GxN2U5l5Ikkzpd25KclEYMkQHixNod5eoHDS1SofpaFV4//FhJbbubEDRtJMzXcGm2GI4ogKUCWW8IVSJAaIh9yriJMpahaNpE1vLHaCGBQ5pP14lh33wm3XtXrfcWg7b939ln/mbZVimrtwrvTcQf+4klc4vqQzpk0rb7UGF6HYaCvE+P8SPjBI4dpJG+lfKsI7qv2ukw9sVAmP33q3fOMPTJyFxdHxcA5M5jqxLsHSKr+RWZ52cnXaiqrAt9ui4m+WyzSbNoiddRR+fPPXiMSrIuCzxI2gQfvK0F8++2x98v3//p/s7Dw42n0Nb0ZfRV9HdaBD9ED2InkTj6CxiURq9in6P/uiJ3uven72/1uj1axufO1Hj6f39FsSqyOc=</latexit>
y00
<latexit sha1_base64="6btFIAfnhYUQuiag2Q9KpTGQ07U=">AAAJ6nicjZZfb9s2EMDVdlvj7F/bPfaFWBBkwLbUHoq2j3U6Lw1QJN6WtAXiwKCos0KUIjWSsmsY+gh72cOGoq/7Qnvct9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33t+o0PPvzo5lZv++NPPv3s81u37zw3qtAMzpgSSr+MqQHBJZxZbgW8zDXQLBbwIn71xNlfzEEbruSpXeZwkdFU8hln1KLq59d7e9NbO/39fvWQ9mKwWexEm2c8vb311yRRrMhAWiaoMeeDfm4vVlRbzgSU25PCQE7ZK5rCOS4lzcBcrKpcS7KLmoTMlMaftKTS+h4rmhmzzGIkM2ovTWhzyi7beWFnjy5WXOaFBcnWG80KQawi7uAk4RqYFUtcUKY55krYJdWUWfx7tvGZSFgwlWVUJqtJrtW8XE3cNtyuKqlsEgnkNeCEMozg+4dWFc9qM41Ny1sJzx2FMkwwre3jw9AdZG0dydCKZ67NQxRCuxd7mLa2dhVE4NmGqMQgwiKNlx6ySA+WIcLlTBRYKfA5ao7+VSdtl0Uya8Lfg+ZzSH7QKmuxdNFgF61DWqtPVYNp/RFO9V4gAQEptf4ZaKzaYdyxDpatkyqddZ3TWLxLHryWwxL7NQDpShDUaeR18KjVvzgmlE7AdSkuGfzyDnuueQZX0F7YDDDPqHcR1mIrFakSqKFjJwXbuTtaE5UUHljQGLxbsRbDfNJiNHTZptNVWtwDWpaE7JIU5LeFwTFBFE5DgpOLg/mG4EXgc+7WYRDuReFVmJCgo+HhFfI1Iu6mNPdCkmRKv3NTkovCkCE6WJxAu7tE5aCpVTpMR6vC+48PK7F1ZwOKpp2c6Qo2xRbDERWgTCjjDaFKDBANuVcRJ1HWKhxNm9ha7gA1LHBI+/EqOeyD19a1e91yazls3/+VfeZvlmGZunKv9N5A/KmTVDq/pDKkTypttwcVottpKMT7/BA/Mkrg2Eka6V8pwzqq/66RDm9XCIzde7d+4wxPn4bE0fFxDUzmOLIuwdIpvpJbnXVydtqJqsK22KPjbpbLNps0i550FX188syLx6gg47LEj6BB+MnTXjz/bn/wYP/+j/d3Hh9sPoe2orvRl9FX0SB6GD2Onkbj6CxiURr9Gv0e/dETvd96b3pv1+j1axufL6LG0/vzH7skyOY=</latexit>
x00
<latexit sha1_base64="I6+n8F2FX3hZLaVhUUK85v5iUBg=">AAAJ6XicjZZfb9s2EMDVrtvi7F+7PfaFaBB0wLbUHoq1j3U6Nw1QJF6RtAXiwKCos0KEIjWSsmsY+gZ72cOKoq/7RHvct9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33t+kc3Pv7k063e9meff/HlVzdvff3CqEIzOGVKKP0qpgYEl3BquRXwKtdAs1jAy/jysbO/nIM2XMkTu8zhPKOp5DPOqEXV89d3pzd3+nv96iHtxWCz2Ik2z3h6a+uvSaJYkYG0TFBjzgb93J6vqLacCSi3J4WBnLJLmsIZLiXNwJyvqlRLsouahMyUxp+0pNL6HiuaGbPMYiQzai9MaHPKLttZYWcPz1dc5oUFydYbzQpBrCLu3CThGpgVS1xQpjnmStgF1ZRZ/He28ZlIWDCVZVQmq0mu1bxcTdw23K4qqWwSCeQ14IQyjOD7h1YVz2ozjU3LWwnPHYUyTDCt7eOD0B1kbR3J0Ipnrs1DFEK7F3uYtrZ2FUTg2YaoxCDCIo2XHrJI95chwuVMFFgp8DlqDv9VJ22XRTJrwj+D5nNInmiVtVi6aLCL1iGt1SeqwbT+CKf6IJCAgJRa/ww0Vu0w7lj7y9ZJlc66zmks3iUPXsthif0agHQlCOo08jp41OpfnBJKJ+C6FJcMfn2PPdc8gyvobtgMMM+odxHWYisVqRKooSMnBdu5O1oTlRQeWNAYvFuxFsN80mI0dNmm01Va3ANaloTskhTkD4XBMUEUDkOCk4uD+Z7gReBz7tZhEO5F4VWYkKCj4cEV8h0i7qY090KSZEq/d1OSi8KQITpYnEC7u0TloKlVOkxHq8L7jw8qsXVnA4qmnZzpCjbFFsMRFaBMKOMNoUoMEA25VxEnUdYqHE2b2FruADUscEj78So57IPX1rV73XJrOWzf/5V95m+WYZm6cq/03kB83kkqnV9QGdLHlbbbgwrR7TQU4kN+iB8aJXDsJI30r5RhHdV/10iHtysExu69W79xhidPQ+Lw6KgGJnMcWRdg6RRfya3OOj496URVYVvs4VE3y2WbTZpFT7qKPj5+5sVjVJBxWeJH0CD85GkvXvy4N/hp7/4v93ce7W8+h7ai29Gd6NtoED2IHkVPo3F0GrFoFv0W/RG96V32fu+97b1bo9evbXy+iRpP789/ABp0yLU=</latexit>
x0
[Figure labels: $\{x'_{lr}\ y'_{ij}\}$, $\{x'_{lr}\ y''_{ij}\}$, $\{x''_{hk}\ y''_{ij}\}$]
- Transparent pipeline
- Fine-grained datasets
[Figure: pipeline $P'_T, \ldots, P^n_T$; data items $y'$, $y''$, $x''$, $x'$, $y'_{ij}$, $x'_{lk}$; dataset $D_{hk}$]
- Transparent program $P_T$
- Coarse-grained datasets
[Figure: program $P_T$ with inputs $x_1$, $x_2$, outputs $y_1$, $y_2$, and function $f$]
if c:
    y1 ← x1
else:
    y1 ← x2
y2 ← f(x1, x2)
Runtime: c == True
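The conditional fragment above can be made concrete with a small sketch. It assumes, purely for illustration, that a transparent program lets us record which inputs each output actually read on the branch taken at runtime. The names `run_with_provenance` and `prov`, and the body of `f`, are hypothetical and not taken from the slides.

```python
def f(x1, x2):
    # Hypothetical stand-in for the function f on the slide.
    return x1 + x2


def run_with_provenance(c, x1, x2):
    """Run the conditional fragment and record, per output, the inputs it read."""
    prov = {}
    if c:
        y1 = x1
        prov["y1"] = {"x1"}      # only x1 is read on this branch
    else:
        y1 = x2
        prov["y1"] = {"x2"}      # only x2 is read on this branch
    y2 = f(x1, x2)
    prov["y2"] = {"x1", "x2"}    # f reads both inputs
    return (y1, y2), prov


# Runtime: c == True
outputs, prov = run_with_provenance(True, 10, 20)
print(outputs, prov)
```

A static view of the program could only say that y1 may depend on x1 or x2; the runtime record narrows this to the input actually used.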
34
A generic dataframe observer for Pandas
Approach:
- add an observer to monitor dataframe changes
- mostly transparent to application
- some control surfaced
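As a rough illustration of the observer idea, the sketch below wraps a dataframe-to-dataframe step and records how the frame changed. It is not the API of the tool presented here: the decorator `observed`, the `log` list, and the example step `drop_age` are all assumptions made for this example.

```python
import pandas as pd

log = []  # hypothetical in-memory record of observed changes


def observed(step):
    """Wrap a DataFrame -> DataFrame step and record how the frame changed."""
    def wrapper(df, *args, **kwargs):
        out = step(df, *args, **kwargs)
        log.append({
            "step": step.__name__,
            "shape_in": df.shape,
            "shape_out": out.shape,
            "added_cols": sorted(set(out.columns) - set(df.columns)),
            "removed_cols": sorted(set(df.columns) - set(out.columns)),
        })
        return out
    return wrapper


@observed
def drop_age(df):
    # Application code stays unchanged apart from the decorator.
    return df.drop(columns=["Age"])


df = pd.DataFrame({"Cid": [1, 2], "Age": [25, 40]})
result = drop_age(df)
print(log[0])
```

The application only gains a decorator, which is one way to read "mostly transparent to application".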
IDEAL 2023
35
Approach to design (II)
- Grounded in well-known dataframe transformation operators
- Open: accommodates any transformation within three broad classes
36
Data reduction
$D' = \pi_C(D)$, $D' = \sigma_C(D)$ - Projection, Selection
Example: $\pi_{\{Cid, Gender, Age\}}(\sigma_{Age<30}(D))$
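The projection-after-selection example can be written directly in Pandas. This is a minimal sketch; the dataframe contents and the extra `Zip` column are invented for illustration.

```python
import pandas as pd

# Toy dataset D; the values are made up.
D = pd.DataFrame({
    "Cid": [1, 2, 3],
    "Gender": ["F", "M", "F"],
    "Age": [25, 41, 29],
    "Zip": ["B15", "NE1", "B15"],
})

selected = D[D["Age"] < 30]                    # selection: keep rows with Age < 30
reduced = selected[["Cid", "Gender", "Age"]]   # projection: keep three columns
print(reduced)
```

Both operators only remove rows or columns, which is why they are grouped as data reduction.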
37
Data augmentation
Vertical augmentation
General form: $\alpha^{\rightarrow}_{f(X):Y}$
Example: $\alpha^{\rightarrow}_{f_1(Age):ageRange}(D)$
Horizontal augmentation
General form: $\alpha^{\downarrow}_{X:f(Y)}(D)$
Example: $E_2 = \alpha^{\downarrow}_{Gender:f_2(Age)}(D)$ (group by gender, avg(age))
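Both kinds of augmentation can be sketched in Pandas. The binning function `f1` and the data are hypothetical. The sketch computes the aggregated rows for the horizontal case as a separate frame, since the slide does not state whether they are appended to D.

```python
import pandas as pd

D = pd.DataFrame({
    "Cid": [1, 2, 3, 4],
    "Gender": ["F", "M", "F", "M"],
    "Age": [25, 41, 35, 21],
})


def f1(age):
    # Hypothetical binning function standing in for f1.
    return "under30" if age < 30 else "30plus"


# Vertical augmentation: derive a new column ageRange from Age.
vertical = D.assign(ageRange=D["Age"].map(f1))

# Horizontal augmentation: new rows from grouping by Gender and averaging Age.
horizontal = D.groupby("Gender", as_index=False)["Age"].mean()

print(vertical)
print(horizontal)
```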
38
Data transformation
$\tau_{f(X)}$
The transformation of a set of features $X$ of $D$ using a function $f$ is obtained by substituting each value $d_{ia}$ with $f(d_{*a})$, for each feature $a$ occurring in $X$.
Example: data imputation. Here $f$ replaces nulls with the most frequent value, for column Zip:
$\tau_{f(Zip)}(D)$
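The imputation example might look as follows in Pandas. This is a sketch under the assumption that "most frequent value" means the column mode over non-null entries; the data are invented.

```python
import pandas as pd

D = pd.DataFrame({
    "Cid": [1, 2, 3, 4],
    "Zip": ["B15", None, "B15", "NE1"],
})


def f(column):
    # f sees the whole column: nulls are replaced with its most frequent value.
    most_frequent = column.mode().iloc[0]
    return column.fillna(most_frequent)


# Apply f to the feature Zip only; other columns are left untouched.
transformed = D.assign(Zip=f(D["Zip"]))
print(transformed)
```

The shape of the dataframe is unchanged; only values in the transformed feature differ.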
39
Data fusion: join and append
Join: $D^L \bowtie^{t}_{C} D^R$
Append: $D^L \uplus D^R$
Example (inner join): $D^L \bowtie^{inner}_{D^L.CId = D^R.CId} D^R$
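In Pandas, join and append correspond naturally to `merge` and `concat`. The sketch below uses invented data, with `how` playing the role of the join type and `on` the join condition.

```python
import pandas as pd

DL = pd.DataFrame({"CId": [1, 2, 3], "Age": [25, 41, 35]})
DR = pd.DataFrame({"CId": [2, 3, 4], "Zip": ["NE1", "B15", "M1"]})

# Inner join on DL.CId = DR.CId: only matching keys survive.
inner = DL.merge(DR, on="CId", how="inner")

# Left join: every row of DL is kept, unmatched rows get nulls on the right side.
left = DL.merge(DR, on="CId", how="left")

# Append: stack the rows of both dataframes; missing columns are filled with nulls.
appended = pd.concat([DL, DR], ignore_index=True)

print(inner)
print(left)
print(appended)
```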
40
Conceptual provenance capture model: templates
$\alpha^{\rightarrow}_{f_1(Age):ageRange}(D)$
A different provenance template pt𝜏 is associated with each type 𝜏 of operator
41
Capturing provenance: bindings
At runtime, when operator o of type 𝜏 is executed, the appropriate template pt𝜏 for 𝜏 is selected
Data items from the inputs and outputs of the operator are used to bind the variables in the template
op: {old values: F, I, V} → {new values: F’, J, V’}
+ Binding rules:
For $i : 1 \dots n$:
used ent.: $[\langle F = X_m,\ I = i,\ V = D_{i,X_m} \rangle \mid X_m \in X]$
generated ent.: $[\langle F' = Y_h,\ J = i,\ v = f(D_{i,X}) \rangle \mid Y_h \in Y]$
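The binding rules can be illustrated with a minimal sketch, assuming a dataframe is represented as a plain list of row dictionaries. The names `bind_template` and `age_range`, and the age threshold of 65, are hypothetical and chosen only to mirror the f1(Age):ageRange example; they are not taken from the authors' tooling.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]


def bind_template(
    rows: Sequence[Row],
    in_features: Sequence[str],
    out_features: Sequence[str],
    f: Callable[[Row], Row],
) -> List[Dict[str, List[Row]]]:
    """For each row i, bind the 'used' and 'generated' entity variables."""
    bindings = []
    for i, row in enumerate(rows, start=1):
        # Inputs of the operator for row i: one value per input feature X_m.
        inputs = {x: row[x] for x in in_features}
        outputs = f(inputs)
        # One used entity per input feature, one generated entity per output feature.
        used = [{"F": x, "I": i, "V": inputs[x]} for x in in_features]
        generated = [{"F": y, "J": i, "V": outputs[y]} for y in out_features]
        bindings.append({"used": used, "generated": generated})
    return bindings


def age_range(values: Row) -> Row:
    """Hypothetical derivation function; the threshold is illustrative only."""
    return {"ageRange": "65+" if values["Age"] >= 65 else "<65"}


rows = [{"Age": 34}, {"Age": 71}]
bindings = bind_template(rows, ["Age"], ["ageRange"], age_range)
```

Each element of `bindings` corresponds to one iteration of "for i : 1 … n" and could be used to instantiate the template for that row.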
42
This applies to all operators
43
Implementation
We use templates in combination with dataframe diff.
For each input/output pair Din, Dout of dataframes:
1. Compare both the shapes and values of Din, Dout (*)
2. Use the diff to:
• Select the appropriate template
• Bind the template variables using the relevant values in the two dataframes
• Generate an instantiated provlet
(*) extends to joins, append
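The two steps above can be sketched as follows, assuming dataframes are plain lists of row dictionaries aligned by position. The function `select_template` and the exact mapping from diff outcome to template name are assumptions made for illustration; the slides name the templates but the extracted text does not fully preserve which test leads to which template.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]
Frame = List[Row]


def columns(df: Frame) -> List[str]:
    """Column names in first-seen order."""
    cols: List[str] = []
    for row in df:
        for name in row:
            if name not in cols:
                cols.append(name)
    return cols


def count_nulls(df: Frame, col: str) -> int:
    return sum(1 for row in df if row.get(col) is None)


def select_template(d_in: Frame, d_out: Frame) -> Optional[str]:
    """Pick a template name from the shape and value diff (assumed mapping)."""
    cols_in, cols_out = set(columns(d_in)), set(columns(d_out))
    added, removed = cols_out - cols_in, cols_in - cols_out
    # Shape changes are checked first.
    if added and removed:
        return "data transformation (composite)"
    if added:
        return "horizontal augmentation"
    if removed:
        return "reduction by projection"
    if len(d_out) < len(d_in):
        return "reduction by selection"
    if len(d_out) > len(d_in):
        return "rows added"  # template name not recoverable from the slides
    # Same shape: look at value changes column by column.
    shared = columns(d_in)
    if any(count_nulls(d_out, c) < count_nulls(d_in, c) for c in shared):
        return "data transformation (imputation)"
    if any(a.get(c) != b.get(c) for a, b in zip(d_in, d_out) for c in shared):
        return "data transformation"
    return None  # no difference detected


d_in = [{"A": 1, "B": None}, {"A": 2, "B": 5}]
d_imputed = [{"A": 1, "B": 5}, {"A": 2, "B": 5}]
template = select_template(d_in, d_imputed)
```

A real implementation would work on actual dataframes and would handle joins and appends separately, as the footnote indicates.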
44
Running Example
[Figure: pipeline. Da, Db → Left join (K1,K2) → D1 → Impute all missing → D2; D2, Dc → Left join (K1,K2) → D3 → Impute E,F → D4 → Add ‘E4’, ‘Ex’, ‘E1’ → D5 → Remove ‘E’ → D6.]
$D_1 = D_a \bowtie^{left}_{K_1,K_2} D_b$
$D_2 = \tau_{f_1(*)}(D_1)$
$D_3 = D_2 \bowtie^{left}_{K_1,K_2} D_c$
$D_4 = \tau_{f_2(E,F)}(D_3)$
$D_5 = \overset{\rightarrow}{\alpha}_{h(E):\{E_4,E_x,E_1\}}(D_4)$
$D_6 = \pi_{\{A_x,B,A_y,D,C,F,E_4,E_x,E_1\}}(D_5)$
45
Summary: Shape and value changes
[Figure: decision diagram mapping dataframe diffs to templates.
Shape changes: tests "Rows added?", "Rows removed?", "Columns added?", "Columns removed?"; templates: horizontal augmentation, reduction by selection, reduction by projection, data transformation (composite), data transformation.
Value changes, for each column: tests "Nulls reduced?", "Values changed?"; templates: data transformation (imputation), data transformation, 1-1 derivations.]
46
Running Example
Dataframes Diff template
D1 ← {Da, Db} Explicit join provenance pattern
D2 ← D1 value change, reduced nulls → imputation Data transformation
D3 ← {D2, Dc} Explicit join provenance pattern
D4 ← D3 value change, reduced nulls → imputation Data transformation
D5 ← D4 Shape change, column(s) added <wait!>
D6 ← D5 Shape change, column(s) removed Data transformation, composite
47
Evaluation: Provenance capture times
48
Evaluation: Provenance query times on Neo4j
49
Scalability
Synthetic benchmarking datasets created using TPC-DI
50
Scalability: capture and storage / TPC-DI datasets
[Figure panels: Basic operators; Join + append operators]
51
How can we realise why+provenance?
52
Representing provenance: Layer III
Need a language to express solution-specific explanations:
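- “Why was xnj cleaned?”
[Figure: provenance graph over nodes xnj, xij, C, and CS, with edges wasGeneratedBy, used, wasAssociatedWith, and wasDerivedFrom, and a “why” annotation.]
- “Why was ti selected/removed?”
1. ti belongs to cluster Ch,
2. there exists tj in Ch such that d(ti,tj) < δ,
→ ti and tj redundant,
→ tj selected, ti removed
[Figure: provenance graph over nodes TSfull, ti, and Filter, with edges wasInvalidatedBy and used, and a “why” annotation.]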
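The redundancy explanation (ti in cluster Ch, some tj in Ch with d(ti,tj) < δ, so tj is kept and ti is removed) can be sketched as code that emits one explanation record per removed item. This is a minimal illustration under stated assumptions: clusters are given as input, distance is Euclidean, and representatives are chosen greedily in input order. The function name `explain_pruning` and the record fields are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Item = Tuple[str, Sequence[float]]


def explain_pruning(clusters: Dict[str, List[Item]], delta: float):
    """Prune near-duplicates inside each cluster and record why each item was removed."""
    kept: List[str] = []
    explanations: List[dict] = []
    for cluster_id, items in clusters.items():
        representatives: List[Item] = []
        for item_id, vec in items:
            # Look for an already selected t_j in the same cluster with d(t_i, t_j) < delta.
            match = None
            for rep_id, rep_vec in representatives:
                distance = math.dist(vec, rep_vec)
                if distance < delta:
                    match = (rep_id, distance)
                    break
            if match is None:
                representatives.append((item_id, vec))
                kept.append(item_id)
            else:
                explanations.append({
                    "item": item_id,
                    "decision": "removed",
                    "cluster": cluster_id,
                    "redundant_with": match[0],
                    "distance": match[1],
                    "delta": delta,
                })
    return kept, explanations


clusters = {"C1": [("t1", (0.0, 0.0)), ("t2", (0.1, 0.0)), ("t3", (5.0, 5.0))]}
kept, explanations = explain_pruning(clusters, delta=0.5)
```

Each record answers "why was ti removed?" with the cluster, the retained neighbour, and the distance that fell below δ, which is the information the "why" annotation would need to carry.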
53
Capturing provenance: Layer III
Layer III (data- and process-granular):
Requires an explanation generator as part of the transformation logic
Approach: operators send “explanations” to the provenance server through an API at runtime
At a chosen granularity: dataset → data item
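A minimal sketch of this approach at item granularity, assuming an in-memory store stands in for the provenance server. `ProvStore`, `ProvRecord`, `send`, and the mean-imputation operator are all hypothetical names and choices used for illustration; the slides do not specify the API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvRecord:
    """One {x_i, x'_i, expl_i} triple, tagged with the operator that produced it."""
    operator: str
    item_id: int
    before: Optional[float]
    after: Optional[float]
    explanation: str


@dataclass
class ProvStore:
    """In-memory stand-in for the provenance server (Prov-DB)."""
    records: List[ProvRecord] = field(default_factory=list)

    def send(self, record: ProvRecord) -> None:
        self.records.append(record)


def impute_mean(data: List[Optional[float]], store: ProvStore) -> List[Optional[float]]:
    """Replace missing values with the mean and report each change with its reason."""
    known = [x for x in data if x is not None]
    if not known:
        return list(data)  # nothing to impute from
    mean = sum(known) / len(known)
    out: List[Optional[float]] = []
    for i, x in enumerate(data):
        if x is None:
            out.append(mean)
            store.send(ProvRecord(
                operator="impute_mean",
                item_id=i,
                before=x,
                after=mean,
                explanation=f"value missing; replaced with mean of {len(known)} observed values",
            ))
        else:
            out.append(x)
    return out


store = ProvStore()
cleaned = impute_mean([1.0, None, 3.0], store)
```

The explanation is generated inside the operator, where the reason for the change is known, rather than being inferred afterwards from a diff of D and D'.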
[Figure: D → CS → D’, with {xi, x’i, expli} records sent to Prov-DB.]
IDEAL
2023
56
Layer III provenance: Preliminary ideas
We frame the general provenance granularity problem in terms of the two orthogonal
dimensions:
- data derivation, from dataset to item-level
- detail of processor behaviour, from class to internal logic
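[Figure: granularity grid. Data detail runs from dataset to item; processor detail runs from class level to logic level.]
- class level, dataset: transformation D → D’; selection D → D’ ⊆ D
- class level, item: transformation { x → x’ }, x ∈ D, x’ ∈ D’; selection { x ∈ D | 𝜎(x’) = True }
- logic level, dataset: processor logic
- logic level, item: Why x? (transformation, selection); Why x’? (transformation, augmentation)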
57
Processor logic at dataset level
[Figure: iterative cleaning loop at dataset level. Nodes Ai, Ti, Mi, Di, CTi, Mi-1, Di-1; labels Assessment, Cleaning targets, Cleaning, Training, Model; edges used and wgby (wasGeneratedBy).]
58
Processor logic at item level
“why did the assessor 𝐴 choose 𝑥 for cleaning?”
“how did the cleaner 𝐶 choose the replacement value?”
“why did 𝑥 ∈ 𝐷 get selected for removal from the training set?”
59
A possible vocabulary / library: DC-Check
[1] Seedat, Nabeel, Fergus Imrie, and Mihaela van der Schaar. ‘DC-Check: A Data-Centric AI Checklist to Guide the Development of Reliable Machine Learning Systems’. arXiv, 9 November 2022. http://arxiv.org/abs/2211.05764.
62
Summary of goals and action plan
[Figure: diagram linking problem instances (Data, Training, Ops), Prov-DB, and a curated Data toolkit, with arrows labelled “Observe / record”, “Enable reuse”, and “Reproduce / explain”.]
Goals: to support
• Reusability and emerging best practices for complex data intervention + usage patterns
• Reproducibility, explainability of pipeline instances
How:
- Enable data processing observations / capture
- Build a curated catalogue of interventions + usage patterns
- Associate provenance with data + model versions
Challenges:
- Observability: instrumenting a common runtime for transparent capture
- Granularity: pick a layer (I-II-III): precision vs scalability → how much do we need?
- “why?” vocabulary and language for expressing explanations
63
Summary of references
[1] Seedat, Nabeel, Fergus Imrie, and Mihaela van der Schaar. ‘DC-Check: A Data-Centric AI Checklist to Guide the Development of Reliable Machine Learning Systems’. arXiv, 9 November 2022. http://arxiv.org/abs/2211.05764.
[2] Jarrahi, Mohammad Hossein, Ali Memariani, and Shion Guha. ‘The Principles of Data-Centric AI’. Commun. ACM 66, 8 (August 2023), 84–92. https://doi.org/10.1145/3571724
[3] Zha, Daochen, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, and Xia Hu. ‘Data-Centric AI: Perspectives and Challenges’. arXiv, 2 April 2023. http://arxiv.org/abs/2301.04819.
[4] Zha, Daochen, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Zhimeng Jiang, Shaochen Zhong, and Xia Hu. ‘Data-Centric Artificial Intelligence: A Survey’. arXiv, 11 June 2023. https://doi.org/10.48550/arXiv.2303.10158.
[5] Singh, Prerna. ‘Systematic Review of Data-Centric Approaches in Artificial Intelligence and Machine Learning’. Data Science and Management 6, no. 3 (1 September 2023): 144–57. https://doi.org/10.1016/j.dsm.2023.06.001.
[6] Neutatz, Felix, et al. ‘From Cleaning before ML to Cleaning for ML’. IEEE Data Eng. Bull. 44.1 (2021): 24-41.
[7] Mazumder, Mark, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, et al. ‘DataPerf: Benchmarks for Data-Centric AI Development’. arXiv, 13 October 2023. https://doi.org/10.48550/arXiv.2207.10062.
[8] Kaplan, J., S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. ‘Scaling laws for neural language models’. arXiv preprint arXiv:2001.08361, 2020.
[9] Abbas, Amro, Kushal Tirumala, Dániel Simig, Surya Ganguli, and Ari S. Morcos. ‘SemDeDup: Data-Efficient Learning at Web-Scale through Semantic Deduplication’. arXiv, 22 March 2023. http://arxiv.org/abs/2303.09540.
[10] Sorscher, Ben, et al. ‘Beyond neural scaling laws: beating power law scaling via data pruning’. Advances in Neural Information Processing Systems 35 (2022): 19523-19536.
[11] Chapman, A., P. Missier, G. Simonelli, and R. Torlone. ‘Capturing and querying fine-grained provenance of preprocessing pipelines in data science’. Proc. VLDB Endow. 14, 4 (December 2020), 507–520. https://doi.org/10.14778/3436905.3436911
[12] Chapman, A., L. Lauro, P. Missier, and R. Torlone. ‘DPDS: assisting data science with data provenance’. Proc. VLDB Endow. 15, 12 (2022), 3614–3617. https://doi.org/10.14778/3554821.3554857
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Towards explanations for Data-Centric AI using provenance records

  • 1. Towards explanations for Data-Centric AI using provenance records. Prof. Paolo Missier, School of Computer Science, University of Birmingham, UK. April 5th, 2024.
  • 2. Outline
      • Basics of data provenance for DAG pipelines
      • Provenance in the context of Data-Centric AI use cases: levels of detail / granularity
      • Data Provenance for Data Science: methods and tooling
      • Challenge: Why+provenance
  • 3. Summary of data-centric use cases
      1. Model-driven incremental data cleaning
         1. Training set cleaning
         2. Label correction
      2. Training set optimization
         1. Removing hard/easy examples
         2. Reducing redundancies
  • 4. Summary of data-centric use cases
      ActiveClean
        Type of operation: select items from training set for manual cleaning. Item transformation: x -> x’
        Strategy: iterative batch cleaning strategy driven by SGD
        Data processing and model training: ActiveClean processing is interleaved with model training; both stop at the same time.
      Training set debugging
        Type of operation: select items from training set for label correction. Item transformation: y -> y’
        Strategy: aims to rank data points and minimize manual corrections
        Data processing and model training: the re-labelling strategy is incremental and interleaved with model retraining. However, the winning strategy has not been published, so its generalizability is not clear.
      Training set optimization, reducing redundancy by removing similar points
        Type of operation: prune items from training set. Filtering: remove (y)
        Strategy: cluster data points in embedded space, select representatives from each cluster
        Data processing and model training: training set pruning happens before model training
      Training set optimization, reducing redundancy by pruning hard/easy examples
        Type of operation: prune items from training set
        Strategy: identify simple / hard examples, sample from those depending on training set size
        Data processing and model training: training set pruning happens before model training
  • 5. Reproducibility, explainability
      The use cases provide examples of complex data transformations and data filtering operators.
      We aim to answer three types of questions:
      • Dataset level: which data transformations were applied to the raw input dataset(s) to generate the final training set used for modelling?
      • Data item level: which of the individual data items were affected by each of the transformations, and what was the effect?
      • Why was a specific data item transformed?
  • 6. Representing provenance
      A formal, interoperable data model and syntax for generic provenance constructs:
      - accommodates layers I, II
      - extensible to a domain vocabulary → e.g. DC-Check
      Seedat, Nabeel, Fergus Imrie, and Mihaela van der Schaar. ‘DC-Check: A Data-Centric AI Checklist to Guide the Development of Reliable Machine Learning Systems’. arXiv, 9 November 2022. http://arxiv.org/abs/2211.05764.
  • 7. The W3C PROV model (2013)
      [Diagram: a processing activity uses Input 1 … Input n (usage) and generates Output 1 … Output m (generation); outputs are linked back to inputs by derivation.]
  • 8. Basic data derivation pattern: transformation
      Consider an abstract data transformation operator: 𝐷 → 𝐷ʹ
      We can record the provenance of 𝐷ʹ as a derivation from 𝐷, mediated by some abstract activity 𝐴 that represents the cleaning or pruning operations.
      [Diagram: A used D; D’ wasGeneratedBy A; D’ wasDerivedFrom D.]
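The derivation pattern on this slide can be written down as a handful of PROV-N statements. The following is a minimal sketch in plain Python that only builds the statement strings; the helper name `derivation_pattern` and the `ex:` identifiers are illustrative assumptions, not part of the talk.

```python
def derivation_pattern(source: str, derived: str, activity: str) -> list[str]:
    """Return PROV-N statements saying `derived` came from `source` via `activity`."""
    return [
        f"entity({source})",
        f"entity({derived})",
        f"activity({activity})",
        f"used({activity}, {source})",            # A used D
        f"wasGeneratedBy({derived}, {activity})",  # D' wasGeneratedBy A
        f"wasDerivedFrom({derived}, {source})",    # D' wasDerivedFrom D
    ]


# Hypothetical identifiers for D, D' and A
statements = derivation_pattern("ex:D", "ex:D_prime", "ex:A")
print("\n".join(statements))
```

The same three relations (used, wasGeneratedBy, wasDerivedFrom) recur in every pattern that follows; only the granularity of the entities changes.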
  • 9. Item-level data transformation (1-1)
      This high-level provenance is not very informative if we want to account for how 𝐴 operates on each data item.
      In the simple examples, 𝐴 performs 1-1, item-wise transformations: 𝑥 ∈ 𝐷 → 𝑥ʹ ∈ 𝐷ʹ, where either 𝑥ʹ = 𝑥 or 𝑥ʹ is a clean version of 𝑥.
      [Diagram, PROV-N representation: A used x1 … xn; x’1 … x’n wasGeneratedBy A; each x’i wasDerivedFrom xi.]
  • 10. Item-level data transformation (1-many)
      This notation can also be used to capture M-N transformations, for example to represent the effects of data imputation using statistics that affect multiple data points simultaneously. This is achieved by adding relationship instances as needed.
      Example: {wasDerivedFrom(𝑥𝑖ʹ, 𝑦)}, 𝑖 = 1, …, 𝑛
      denotes a single value 𝑦 ∈ 𝐷 used to produce multiple values 𝑥1ʹ, …, 𝑥𝑛ʹ.
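As a rough illustration of the M-N case, consider mean imputation: every imputed value depends on every observed value that fed the mean. The sketch below assumes a list with `None` for missing entries; the function name and the `x{i}` / `x{i}_prime` identifiers are made up for this example.

```python
from statistics import mean


def impute_with_mean(values):
    """Fill None entries with the mean of observed entries and list the derivations."""
    observed = [i for i, v in enumerate(values) if v is not None]
    fill = mean(values[i] for i in observed)
    imputed = [fill if v is None else v for v in values]
    # One wasDerivedFrom instance per (imputed item, contributing item) pair
    derivations = [
        f"wasDerivedFrom(x{i}_prime, x{j})"
        for i, v in enumerate(values)
        if v is None
        for j in observed
    ]
    return imputed, derivations


imputed, derivations = impute_with_mean([1.0, None, 3.0, None])
print(imputed)
print("\n".join(derivations))
```

With two missing items and two observed items, four relationship instances are produced, which is the "adding relationship instances as needed" step in practice.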
  • 11. Unfolding process iterations
      [Diagram: successive activity instances Ai-1 and Ai, linked to dataset versions D, Di-1 and D’ by used, wasGeneratedBy (wgby) and wasDerivedFrom edges.]
  • 12. Item-level data selection
      Here we only need to represent whether each input datapoint survives the selection operator.
      PROV can be used to assert that operator op has removed datapoint 𝑥 ∈ 𝐷 from its output 𝐷ʹ.
      There is no need to represent the provenance of the surviving datapoints.
      Suppose op removed 𝑚 items from 𝐷. Using PROV, this is asserted as:
      {wasInvalidatedBy(𝑥𝑖, op)}, 𝑖 = 1, …, 𝑚
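A small sketch of the selection case, assuming the removals are expressed with the PROV `wasInvalidatedBy` relation (the relation the deck uses later for the Filter operator). The dictionary layout, the predicate, and the function name are illustrative assumptions.

```python
def selection_provenance(items, keep, op="op"):
    """Apply a filter and assert invalidation only for the removed items."""
    survivors = {k: v for k, v in items.items() if keep(v)}
    removed = [k for k in items if k not in survivors]
    # Survivors need no statement: only removals are recorded
    statements = [f"wasInvalidatedBy({k}, {op})" for k in removed]
    return survivors, statements


D = {"x1": 0.2, "x2": 0.9, "x3": 0.4}
D_prime, removal_statements = selection_provenance(D, keep=lambda v: v < 0.5)
print(removal_statements)
```

The provenance record grows with the number of removed items 𝑚, not with the size of 𝐷.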
  • 13. Data derivation through pipelines
      When operators are composed into pipelines, provenance is a composition of the corresponding provenance patterns.
      Consider a sequential pipeline consisting of abstract data processing operators op1 … op𝑛 and a training operator Tr.
      Each op𝑖 takes an input dataset 𝐷 and produces an output 𝐷ʹ: 𝐷ʹ = op𝑖(𝐷)
      Similarly, training takes some 𝐷 and produces a model 𝑀: 𝑀 = Tr(𝐷)
      Starting from the initial “raw” dataset 𝐷0, and denoting with 𝐷𝑖 the intermediate datasets, this pipeline can be written as:
      {𝐷𝑖 = op𝑖(𝐷𝑖−1)}, 𝑖 = 1, …, 𝑛; 𝑀 = Tr(𝐷𝑛)
      Corresponding provenance:
      [Diagram: D0 → OP1 → D1 → … → OPn → Dn → Tr → M, where each operator used its input and each output wasGeneratedBy (wgby) its operator.]
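The composition of patterns along a sequential pipeline can be sketched as a loop that runs each operator and appends one used / wasGeneratedBy pair per step. This is an illustrative sketch only: the `run_pipeline` helper, the toy operators, and the toy "training" function (a mean) are assumptions.

```python
def run_pipeline(d0, ops, train):
    """Run op_1..op_n, then train, recording used / wasGeneratedBy for each step."""
    prov, data = [], d0
    for i, op in enumerate(ops, start=1):
        data = op(data)  # D_i = op_i(D_{i-1})
        prov += [f"used(op{i}, D{i - 1})", f"wasGeneratedBy(D{i}, op{i})"]
    model = train(data)  # M = Tr(D_n)
    prov += [f"used(Tr, D{len(ops)})", "wasGeneratedBy(M, Tr)"]
    return model, prov


# Toy operators: drop missing values, then rescale
ops = [
    lambda d: [x for x in d if x is not None],
    lambda d: [x * 2 for x in d],
]
model, prov = run_pipeline([1, None, 3], ops, train=lambda d: sum(d) / len(d))
print(model)
print("\n".join(prov))
```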
  • 14. Extension to DAG topologies is straightforward
      These assertions extend naturally to pipelines with multiple inputs and outputs → Directed Acyclic Graphs.
      Example: inputs 𝐷𝑎0, 𝐷𝑏0, 𝐷𝑐0 are processed independently and eventually merged into 𝐷𝑛.
      [Diagram: operators OP1 to OP4 over datasets Da0, Da1, Db0, Db1, Dc0, Dbc0 and Dabc3, connected by used and wasGeneratedBy (wgby) edges.]
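Once the used / wasGeneratedBy edges of a DAG are recorded, lineage questions become graph traversals. The sketch below borrows the dataset and operator names from the diagram, but the exact wiring (which operator consumes which dataset) is an assumption, since the transcript does not preserve the edges.

```python
# Assumed wiring: OP1 and OP2 process two inputs independently,
# OP3 merges two branches, OP4 merges everything.
used = {
    "OP1": ["Da0"],
    "OP2": ["Db0"],
    "OP3": ["Db1", "Dc0"],
    "OP4": ["Da1", "Dbc0"],
}
generated_by = {"Da1": "OP1", "Db1": "OP2", "Dbc0": "OP3", "Dabc3": "OP4"}


def lineage(entity):
    """Return every dataset upstream of `entity` in the provenance DAG."""
    seen, stack = set(), [entity]
    while stack:
        activity = generated_by.get(stack.pop())  # None for raw inputs
        for src in used.get(activity, []):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


print(sorted(lineage("Dabc3")))
```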
  • 15. Data cleaning simulation pattern (IDEAL 2023)
      - A noisy version Dn is generated from Dtr by corrupting some of the labels (e.g. label flipping): Dtr → Dn
      - The target performance P is recorded by training on Dtr and testing on Dtest
      - Let Pn be the model performance when using Dn for training. Pn will be less than P
      - A strategy must suggest a ranking of examples in Dn such that, by "cleaning" those in order, performance increases, approximating P
      - Strategies are scored based on the number of cleaning actions required to achieve 95% of target performance
      [Diagram: on the evaluator side, labels in Dtr are corrupted to give Dn, and fixed training and evaluation code produces the score; on the competitor side, a cleaning priority strategy drives the cleaning of Dn into D’, followed by model training (M’) and model evaluation.]
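The scoring rule can be sketched as a loop that applies cleaning actions in ranked order until performance reaches 95% of the target. In this sketch the `evaluate` callback stands in for retraining and testing the model; here it is replaced by a toy proxy (label agreement with the clean set), which is an assumption made only to keep the example self-contained.

```python
def actions_to_target(ranking, noisy, clean, evaluate, target, fraction=0.95):
    """Count cleaning actions needed to reach `fraction` of the target performance."""
    labels = list(noisy)
    if evaluate(labels) >= fraction * target:
        return 0
    for n, idx in enumerate(ranking, start=1):
        labels[idx] = clean[idx]  # one cleaning action
        if evaluate(labels) >= fraction * target:
            return n
    return None  # target not reached with this ranking


clean = [0, 1, 0, 1]
noisy = [1, 1, 0, 0]  # labels 0 and 3 were flipped


def toy_score(labels):
    # Toy stand-in for "train on labels, test on Dtest"
    return sum(a == b for a, b in zip(labels, clean)) / len(clean)


good = actions_to_target([0, 3, 1, 2], noisy, clean, toy_score, target=1.0)
bad = actions_to_target([1, 2, 0, 3], noisy, clean, toy_score, target=1.0)
print(good, bad)
```

A ranking that puts the corrupted items first needs fewer actions, which is what the challenge rewards.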
  • 16. What can be learnt from this exercise?
      The challenge is effectively a simulation of a two-level iterative process.
      Challenge winners will have developed and demonstrated new strategies for training set debugging.
      However: a strategy may be optimized for dataset Dn, task T, and the pre-selected model.
      [Diagram: Dn → cleaning strategy → D’ → model training → M’ → eval, iterated; Mbest is handed over to MLOps.]
  • 17. Provenance and versioning
      We would like to:
      1. Document that Di was derived from Dn using CSi, as part of a longer pipeline
      2. Be able to identify what effect CSi had on Dn:
         1. Which data labels were cleaned
         2. Why they were cleaned
      3. Make sure CSi can be reused safely:
         1. Specify assumptions, pre-requisites
         2. Provide examples of past usages
      [Diagram: Dn → CSi → Di → model training → M’ → eval; Mbest is handed over to MLOps.]
  • 18. Provenance layer I: whole dataset
      Assumptions:
      - Dn, Di are atomic units of data
      - CS is an atomic unit of processing
      Reproducibility: “outer layer” questions
      - Where does Di come from?
      - Which version of Di was used to train Mbest?
      Derivation: Di was derived from Dn using CSi; Mbest was trained on Di
      Attribution: CSi was created by <creator C>
      [Diagram: CS used x^n_j; x^i_j wasGeneratedBy CS; x^i_j wasDerivedFrom x^n_j; CS wasAssociatedWith C.]
  • 19. Provenance layer I specification
      Surface representation: PROV-N

        entity(D_noisy, [prov:type="training-set"])
        entity(D_clean, [prov:type="training-set"])
        entity(e1, [prov:type="training-data", inSet="D_noisy", index=j, val=V])
        entity(e2, [prov:type="training-data", inSet="D_clean", index=j, val=W])
        entity(C, [prov:type="prov:agent", prov:type="CS-creator"])
        activity(CS, [prov:type="cleaning-strategy", version="v1.0", desc="…"])
        wasDerivedFrom(e2, e1)
        used(CS, e1)
        wasGeneratedBy(e2, CS)
        wasAssociatedWith(CS, C)

      Internal representation: property-value graphs! Hint: Neo4j works well…
      [Diagram: CS used x^n_j; x^i_j wasGeneratedBy CS; x^i_j wasDerivedFrom x^n_j; CS wasAssociatedWith C.]
  • 20. Provenance layer II: data-granular provenance
      Assumptions:
      - Dn = {x^n_j}, Di = {x^i_j}
      - CS is an atomic unit of processing
      Explainability: data-level questions
      - Which x^n_j were cleaned?
      - “How dirty was Dn?” In aggregate: how many labels were cleaned to achieve a target performance?
      Derivations: for each x^i_j that has been cleaned by CSi: x^i_j was derived from x^n_j
  • 21. Provenance layer II specification
      Assumptions:
      - Dn = {x^n_j}, Di = {x^i_j}
      - CS is an atomic unit of processing
      Derivations: for each x^i_j that has been cleaned by CSi: x^i_j was derived from x^n_j
      [Diagram: CSi used each x^n_j; each cleaned x^i_j wasGeneratedBy CSi and wasDerivedFrom the corresponding x^n_j; CSi wasAssociatedWith C.]
  • 22. Representing entanglements
      The term “entanglement” denotes an iterative interleaving of data preparation and modelling.
      During generic iteration 𝑖:
      - The Assess processor takes a partially cleaned training set 𝐷𝑖 along with the current model version 𝑀𝑖 trained on 𝐷𝑖, and determines the next batch of items in 𝐷𝑖 to be cleaned
      - Clean is a separate processor, yielding a new version 𝐷𝑖+1
      - This is used to train 𝑀𝑖+1, etc.
      [Diagram: Train on D0 yields M1; Assess produces the cleaning targets; Clean yields Di+1; Train yields Mi+1.]
  • 23. Provenance of entanglements
      - PROV can be used to express a provenance graph for this process
      - The graph must capture an unfolding of the process execution over the set of its iterations
      Starting from version 𝐷𝑖+1 of the data and moving backwards in time:
      - 𝐷𝑖+1 was generated by instance 𝑖 + 1 of the clean processor
      - This took as input the batch of data items identified by assess𝑖+1 as targets for cleaning
      - This required 𝑀𝑖, which was generated by the 𝑖-th training iteration, and 𝐷𝑖
      PROV allows annotations to be added to each entity, activity, and relationship. These annotations may be drawn from:
      - a standard vocabulary (role, to qualify the role of a processor in the pipeline)
      - custom vocabularies, for instance to associate performance metrics with each version of the model
      [Diagram: unfolded graph over consecutive iterations, linking Clean, Train and Assess instances with datasets Di-1, Di, Di+1, models Mi-1, Mi and the cleaning targets.]
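The unfolding can be sketched as a loop that emits one group of PROV statements per iteration. The indexing below follows the textual description on this slide (train_i produces M_i from D_i; assess_{i+1} and clean_{i+1} lead to D_{i+1}); the identifier scheme, and the assumption that clean also uses D_i directly, are illustrative choices rather than something the slides state.

```python
def unfold(n_iterations):
    """Emit PROV-style statements for n iterations of train / assess / clean."""
    prov = []
    for i in range(n_iterations):
        nxt = i + 1
        prov += [
            f"used(train_{i}, D_{i})",
            f"wasGeneratedBy(M_{i}, train_{i})",
            f"used(assess_{nxt}, D_{i})",
            f"used(assess_{nxt}, M_{i})",
            f"wasGeneratedBy(targets_{nxt}, assess_{nxt})",
            f"used(clean_{nxt}, D_{i})",  # assumption: clean reads the data it modifies
            f"used(clean_{nxt}, targets_{nxt})",
            f"wasGeneratedBy(D_{nxt}, clean_{nxt})",
        ]
    return prov


entangled = unfold(2)
print("\n".join(entangled))
```

Because each iteration gets its own activity and entity identifiers, the resulting graph stays acyclic even though the process itself is a loop.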
  • 24. Use case 2: training set optimisation
      Motivation: training efficiency → model performance (test loss) correlates with training data size D according to a power law [11].
      However, “Since scalings with N (model size), D (training tokens), Cmin (compute budget) are power-laws, there are diminishing returns with increasing scale.” [11]
      This motivates trying to optimize D:
      1. Redundancy in D leads to wasted training time
      2. Not all training examples are equally important for training: which ones should be kept / removed?
      [11] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
  • 25. Training set optimization, Task 1: reducing redundancy
      Approach [12]:
      1. Map the training set D to an embedded space, using pre-trained foundation models
      2. Cluster all data points in embedded space using k-means
      3. Using cosine similarity, identify similar points within each cluster. Threshold and select
      [12] Abbas, Amro, Kushal Tirumala, Dániel Simig, Surya Ganguli, and Ari S. Morcos. ‘SemDeDup: Data-Efficient Learning at Web-Scale through Semantic Deduplication’. arXiv, 22 March 2023. http://arxiv.org/abs/2303.09540.
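Step 3 of the approach can be sketched as follows, assuming the embeddings and cluster assignments from steps 1 and 2 are already available. This is a simplified greedy variant (keep a point unless it is too similar to a point already kept in the same cluster), not the selection rule from [12]; the function names and the 0.95 threshold are assumptions.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def dedup_within_clusters(embeddings, clusters, threshold=0.95):
    """Return indices of points kept after per-cluster similarity thresholding."""
    kept_by_cluster = defaultdict(list)
    kept = []
    for idx, (vec, c) in enumerate(zip(embeddings, clusters)):
        # Only compare against points already kept in the same cluster
        if all(cosine(vec, embeddings[k]) < threshold for k in kept_by_cluster[c]):
            kept_by_cluster[c].append(idx)
            kept.append(idx)
    return kept


embeddings = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.1, 1.0)]
clusters = [0, 0, 1, 1]  # assumed output of k-means
kept = dedup_within_clusters(embeddings, clusters)
removed = [i for i in range(len(embeddings)) if i not in kept]
print(kept, removed)
```

The `removed` indices are exactly the items for which an invalidation record would be asserted in the provenance graph.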
  • 26. Training set optimization, Task 2: pruning easy/hard examples
      Main findings from [13]:
      1. Not all training examples are created equal
         • Hard vs easy
      2. The best pruning strategy depends on the amount of initial data
         • Small training set (TS) → keep the easy examples
         • Large TS → keep the hard examples
      A very simple pruning method, very similar to Task 1:
      "To compute a self-supervised pruning metric for ImageNet, we perform k-means clustering in the embedding space of an ImageNet pre-trained self-supervised model and define the difficulty of each data point by the Euclidean distance to its nearest cluster centroid, or prototype" [13]
      Caveat: only tested on ImageNet!
      [Figure reproduced from [13].]
      [13] Sorscher, Ben, et al. Beyond neural scaling laws: beating power law scaling via data pruning. Advances in Neural Information Processing Systems 35 (2022): 19523-19536.
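The pruning metric quoted above (distance to the nearest centroid) is easy to sketch once embeddings and centroids are given. The code below assumes both are available as plain tuples; the `prune` helper, its parameters, and the toy data are illustrative and do not reproduce the experimental setup of [13].

```python
import math


def difficulty(points, centroids):
    """Difficulty of each point: Euclidean distance to its nearest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


def prune(points, centroids, keep_fraction, keep="hard"):
    """Return sorted indices of the points retained after pruning."""
    scores = difficulty(points, centroids)
    order = sorted(
        range(len(points)), key=scores.__getitem__, reverse=(keep == "hard")
    )
    n_keep = max(1, round(keep_fraction * len(points)))
    return sorted(order[:n_keep])


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 8.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]  # assumed output of k-means
hard_kept = prune(points, centroids, keep_fraction=0.5, keep="hard")
easy_kept = prune(points, centroids, keep_fraction=0.5, keep="easy")
print(hard_kept, easy_kept)
```

Switching `keep` between "easy" and "hard" corresponds to the small versus large training set regimes in the findings above.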
  • 27. Provenance for training set optimization
      This is a classic filter pipeline, only a little more sophisticated:
      [Diagram (black box): Filter used TSfull; TSopt wasGeneratedBy Filter; TSopt wasDerivedFrom TSfull. Filter expands into TSfull → Embed → Cluster → Select → TSopt.]
      Layers I and II are very similar to Use Case 1.
      Reproducibility:
      - Where does TSopt come from? → black / gray box options
      [Diagram (gray box): Embed, Cluster and Select are chained through the intermediate datasets TSemb and TSclus by used and wasGeneratedBy (wgby) edges; TSopt wasDerivedFrom TSfull.]
  • 28. Provenance for training set optimization / Layer II
      Assumptions:
      - TSfull = {ti}, TSopt = {ti}
      - Filter is an atomic unit of processing
      Explainability: data-level questions
      - Which ti were filtered out?
      - “How redundant was TSfull?”
      Derivations: for each ti that has been removed by Filter: ti was invalidated by Filter
      [Diagram: Filter used TSfull; each removed ti wasInvalidatedBy Filter.]
  • 29. How can we generate these provenance graphs?
      A starting point: Data Provenance for Data Science (DPDS)
      Key idea for Layer II (data-granular): interpreter-level observer
      - Requires an observer at the boundaries of CS, i.e. to tell which x.label have changed
      - The observer has access to individual dataframe elements
      - But it is unaware of data transformation semantics
      [Diagram: CSi used x^n_j; x^i_j wasGeneratedBy CSi; x^i_j wasDerivedFrom x^n_j; CSi wasAssociatedWith C.]
      [14] A. Chapman, P. Missier, G. Simonelli, and R. Torlone. 2020. Capturing and querying fine-grained provenance of preprocessing pipelines in data science. Proc. VLDB Endow. 14, 4 (December 2020), 507–520. https://doi.org/10.14778/3436905.3436911
      [15] A. Chapman, L. Lauro, P. Missier, and R. Torlone. 2022. DPDS: assisting data science with data provenance. Proc. VLDB Endow. 15, 12 (2022), 3614–3617. https://doi.org/10.14778/3554821.3554857
      Adriane Chapman, Luca Lauro, Paolo Missier, and Riccardo Torlone. 2024. Supporting Better Insights of Data Science Pipelines with Fine-grained Provenance. ACM Trans. Database Syst. Just Accepted (February 2024). https://doi.org/10.1145/3644385
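The observer idea can be illustrated with a decorator that snapshots the data before a cleaning step and compares labels afterwards, without knowing anything about what the step does. This is only a sketch of the principle and not the DPDS implementation: it uses a list of dicts in place of a dataframe, assumes the step preserves row order and count, and all names are hypothetical.

```python
import copy
import functools


def observe(activity, log):
    """Wrap a cleaning step and log which item labels changed across it."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(records):
            before = copy.deepcopy(records)  # snapshot at the input boundary
            after = fn(records)
            for j, (old, new) in enumerate(zip(before, after)):
                if old["label"] != new["label"]:
                    # Item j of the output was derived from item j of the input
                    log.append((activity, j, old["label"], new["label"]))
            return after

        return wrapper

    return decorator


provenance_log = []


@observe("CS", provenance_log)
def flip_suspicious(records):
    # Toy cleaning strategy: flip the binary label of low-confidence records
    return [
        {**r, "label": 1 - r["label"]} if r["score"] < 0.2 else r for r in records
    ]


noisy = [{"label": 1, "score": 0.9}, {"label": 0, "score": 0.1}]
cleaned = flip_suspicious(noisy)
print(provenance_log)
```

The log answers "which labels were cleaned" but not "why", which matches the limitation noted in the last bullet above.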
  • 30. Capturing provenance: Layer I
      Layer I (coarse): process-level observer
      Typical implementation:
      - Pandas / Spark Python pipeline with dataframe datasets
      - CS can be a method call or a code block:
        1. Method call: Di = CS(Dn)
        2. Code block taking Dn to Di, delimited by “Begin CS” and “End CS” markers
      [Diagram: Dn → CSi → Di → model training → Mbest, handed over to MLOps. In both cases the recorded pattern is: CS used Dn; Di wasGeneratedBy CS; Di wasDerivedFrom Dn.]
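For the code-block case, the "Begin CS" / "End CS" markers map naturally onto a Python context manager. The sketch below records only dataset-level statements; the `activity` helper, the module-level `prov` list, and the dataset names are assumptions for illustration.

```python
from contextlib import contextmanager

prov = []


@contextmanager
def activity(name, used, generated):
    """Delimit a code block and record coarse (Layer I) provenance for it."""
    prov.append(f"used({name}, {used})")  # "Begin CS"
    yield
    # "End CS": reached only if the block completed without raising
    prov.append(f"wasGeneratedBy({generated}, {name})")
    prov.append(f"wasDerivedFrom({generated}, {used})")


Dn = [3, None, 5]
with activity("CS", used="Dn", generated="Di"):
    Di = [x for x in Dn if x is not None]

print("\n".join(prov))
```

The method-call case, Di = CS(Dn), can be handled the same way by wrapping the call inside the `with` block.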
  • 31. Running example: a simple pipeline
      [Diagram: Da and Db are left-joined on (K1, K2) and all missing values are imputed; the result is left-joined with Dc on (K1, K2) and E, F are imputed; one-hot encoding then adds columns ‘E4’, ‘Ex’, ‘E1’ and removes ‘E’. Intermediate datasets are labelled D1 to D6.]

        import pandas as pd

        # df_A, df_B, df_C are the input dataframes (Da, Db, Dc)
        df = pd.merge(df_A, df_B, on=['key1', 'key2'], how='left')  # join
        df = df.fillna('imputed')                                    # imputation
        df = pd.merge(df, df_C, on=['key1', 'key2'], how='left')    # join
        df = df.fillna(value={'E': 'Ex', 'F': 'Fx'})                 # imputation

        # one-hot encoding of column 'E'
        c = 'E'
        dummies = []
        dummies.append(pd.get_dummies(df[c]))
        df_dummies = pd.concat(dummies, axis=1)
        df = pd.concat((df, df_dummies), axis=1)  # add the indicator columns
        df = df.drop([c], axis=1)                 # remove the original column 'E'
  • 32. Aims
      Capture, store and query element-level provenance:
      - Derivation of each element of each intermediate dataframe (when possible)
      - Efficiently, at scale
      [Diagram: dataframes df_A (df_-1), df_B (df_0) and df_1, linked by the Join and fillna steps.]
  • 33. Granularity
      Base case:
      - Opaque program Po
      - Coarse-grained dataset
      Default provenance:
      - Every output depends on every input
      [Figure: opaque program Po with inputs x1, x2 and outputs y1, y2.]
      - Transparent program PT
      - Fine-grained datasets
      [Figure: transparent program PT over fine-grained elements; recoverable element labels include y′, y″ and x″.] <latexit 
sha1_base64="I6+n8F2FX3hZLaVhUUK85v5iUBg=">AAAJ6XicjZZfb9s2EMDVrtvi7F+7PfaFaBB0wLbUHoq1j3U6Nw1QJF6RtAXiwKCos0KEIjWSsmsY+gZ72cOKoq/7RHvct9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33t+kc3Pv7k063e9meff/HlVzdvff3CqEIzOGVKKP0qpgYEl3BquRXwKtdAs1jAy/jysbO/nIM2XMkTu8zhPKOp5DPOqEXV89d3pzd3+nv96iHtxWCz2Ik2z3h6a+uvSaJYkYG0TFBjzgb93J6vqLacCSi3J4WBnLJLmsIZLiXNwJyvqlRLsouahMyUxp+0pNL6HiuaGbPMYiQzai9MaHPKLttZYWcPz1dc5oUFydYbzQpBrCLu3CThGpgVS1xQpjnmStgF1ZRZ/He28ZlIWDCVZVQmq0mu1bxcTdw23K4qqWwSCeQ14IQyjOD7h1YVz2ozjU3LWwnPHYUyTDCt7eOD0B1kbR3J0Ipnrs1DFEK7F3uYtrZ2FUTg2YaoxCDCIo2XHrJI95chwuVMFFgp8DlqDv9VJ22XRTJrwj+D5nNInmiVtVi6aLCL1iGt1SeqwbT+CKf6IJCAgJRa/ww0Vu0w7lj7y9ZJlc66zmks3iUPXsthif0agHQlCOo08jp41OpfnBJKJ+C6FJcMfn2PPdc8gyvobtgMMM+odxHWYisVqRKooSMnBdu5O1oTlRQeWNAYvFuxFsN80mI0dNmm01Va3ANaloTskhTkD4XBMUEUDkOCk4uD+Z7gReBz7tZhEO5F4VWYkKCj4cEV8h0i7qY090KSZEq/d1OSi8KQITpYnEC7u0TloKlVOkxHq8L7jw8qsXVnA4qmnZzpCjbFFsMRFaBMKOMNoUoMEA25VxEnUdYqHE2b2FruADUscEj78So57IPX1rV73XJrOWzf/5V95m+WYZm6cq/03kB83kkqnV9QGdLHlbbbgwrR7TQU4kN+iB8aJXDsJI30r5RhHdV/10iHtysExu69W79xhidPQ+Lw6KgGJnMcWRdg6RRfya3OOj496URVYVvs4VE3y2WbTZpFT7qKPj5+5sVjVJBxWeJH0CD85GkvXvy4N/hp7/4v93ce7W8+h7ai29Gd6NtoED2IHkVPo3F0GrFoFv0W/RG96V32fu+97b1bo9evbXy+iRpP789/ABp0yLU=</latexit> x0 <latexit 
sha1_base64="jiqv4wWSi+neM/UI4QKRqZzPeBg=">AAAKCnicjZZfb+Q0EMBzx79u+deDR14MVXVIQNlFJ+DxtrD0Kp3aBbV3J3WrlePMpuYcO9jO7q2iPPPCV+GFBxDilS/AI9+GcXZpHCd3EGklz8xvxuPMeDZxLrixw+Hft26/9PIrr762M9h9/Y0333p77847j4wqNIMLpoTST2JqQHAJF5ZbAU9yDTSLBTyOn37l7I+XoA1X8tyuc7jKaCr5gjNqUTXfe39Wkmd356XQFSEzAQtLtVYrskYd/74is2q+tz88HNYP6S5G28V+tH2m8zs7f80SxYoMpGWCGnM5Gub2qqTaciag2p0VBnLKntIULnEpaQbmqqzPUpED1CRkoTT+pCW11vcoaWbMOouRzKi9NqHNKftsl4VdfHlVcpkXFiTbbLQoBLGKuBdDEq6BWbHGBWWaY66EXVNNmcXXt4vPTMKKqSyjMilnuVbLqpy5bbgta6lqEwnkDeCEKozg+4dWFS8aM41Nx1sJzx2FKkwwbezT49AdZGOdyNCKZ27MYxRCuxd7nHa2dhVE4OGWqMUgwiqN1x6ySo/WIcLlQhRYKfA5ak7+VSddl1WyaMNfg+ZLSL7RKuuwdNViV51DWqvPVYvpvAineiGQgICUWv8MNFbdMO5YR+vOSZXO+s5p8JL6G2/ksMR+DUC6EgR1mngdPOn0L44RpRNwXYpLBj88x55rnsENdDdsBlhm1LsIG7GTilQJNNCpk4Lt3B1tiFoKDyxoDN6t2IhhPmkxGbts03mZFp8CrXDuHZAU5CeFwTFBFE5LgpOLg/mY4EXgS+7WYRDuReF1mJCgk/HxDfIRIu6mtPdCkmRKP3dTkovCkDE6WJxABwdE5aCpVTpMR6vCe8fHtdi5swFF017O9AWbY4vhiApQJpTxhlAtBoiG3KuIkyjrFI6mbWwj94AaVjik/Xi1HPbBM+vavWm5jRy27//KPvM3y7BMfbnXem8gftdLKp1fUxnSZ7W234MK0e80FuJFfoifGCVw7CSt9G+UYR3Vf9dIh7crBKbuf7f5xxmfPwiJk9PTBpgtcWRdg6X4mSE7nXV2cd6LqsJ22JPTfpbLLpu0i570FX169tCLx6gg08p9BI3CT57u4tFnh6PPD+99e2///tH2c2gnei/6IPowGkVfRPejB9E0uohY9GP0c/Rr9Nvgp8Evg98Hf2zQ27e2Pu9GrWfw5z88a9Wd</latexit> {x0 lr y0 ij} <latexit 
sha1_base64="NqcxcrQ0rsfiFl3i99oBHxn5Teo=">AAAKC3icjZbNbtw2EICVpD9e989Jjr0QMQwXaOvuFkHaY9bt1jEQ2NvCTgJYxoKiZmUmFKmQ1G4Wgu699FV66aFF0WsfoMe+TUnt1qJIJa2ABTgz3wyHmuGskoJRpYfDv2/cvPXW2++8uzXYfu/9Dz78aOf2nSdKlJLAORFMyGcJVsAoh3NNNYNnhQScJwyeJi++sfanC5CKCn6mVwVc5jjjdE4J1kY127kXV+jV/qxiskYoZjDXWEqxRKt9o6TPaxTXs53d4cGweVC4GG0Wu9Hmmc5ub/0Vp4KUOXBNGFbqYjQs9GWFpaaEQb0dlwoKTF7gDC7MkuMc1GXVHKZGe0aTormQ5sc1arSuR4VzpVZ5Ysgc6yvl26yyz3ZR6vnXlxXlRamBk/VG85IhLZB9MyilEohmK7PARFKTKyJXWGKizfvbNk/MYUlEnmOeVnEhxaKuYrsN1VUj1V0ihaIFrFD7EVx/3yqSeWvGiQq8BXPcjVD7CWatfXrkuwNvrRPuW82ZW/PYCL7diT3Ogq1tBQ3weEM0ohdhmSUrB1lmhysfoXzOSlMpcDmsjv9Vp6HLMp134W9B0gWk30mRByxedthlcEit5ZnoMMGLsKo3AikwyLB2z4ATEYaxxzpcBScVMu87pzK31N14LfsldmsA3JbAq9PE6eBJ0L9mjgiZgu1SsyTw8jX2QtIcrqF9vxlgkWPnIqzFIBUuUmihEyt529k72hKN5B+Y4QScW7EW/XyycjK22WazKiu/AFybwbeHMuCfl8qMCSTMuERmclFQnyFzEeiC2rUfhDpRaBPGJ/BkfHSNfGoQe1O6exkS5UK+dlNUsFKhsXHQZgLt7SFRgMRaSD8dKUrnHR81YnBnPQpnvZzqCzYzLWZGlIcSJpQzhBrRQyQUTkWshElQOJx1sbXcA0pYmiHtxmtkvw9eadvubcutZb99/1f2ubtZbsrUl3ujdwbiD72kkMUV5j592mj7PTBj/U5jxt7kZ/BjJZgZO2kn/WulX0fx3zWS/u3ygan9323/ccZnj3zi+OSkBeKFGVlXoLH5zOBBZ52en/WiotQBe3zSz1Iesmm36Glf0aenj514BDM0re1H0Mj/5AkXT748GD04uP/9/d2Hh5vPoa3o4+he9Ek0ir6KHkaPoml0HpHox+jn6Nfot8FPg18Gvw/+WKM3b2x87kadZ/DnP+cO1c4=</latexit> {x0 lr y00 ij} <latexit 
sha1_base64="3PylV2aI3eJhnvBvxRESOyUdht8=">AAAKDHicjZZfj9w0EMDT8u/2+NfCIy8Wp9MhAdddVLU8dg+W60nV3YLu2kqX08pxZrPmHDvYzm5XUT4AL3wVXngAIV5555Fvg51dLo6dFiKt5Jn5zXicGc8mKRhVejj8+9bt115/4823dga7b7/z7nvv37n7wVMlSknggggm5PMEK2CUw4WmmsHzQgLOEwbPkuuvrP3ZEqSigp/rdQFXOc44nVOCtVHN7uzFFXpxcDCrFtc1QjGDucZSihVaWyX9vkZxbajh4bB5ULgYbRd70faZzu7u/BWngpQ5cE0YVupyNCz0VYWlpoRBvRuXCgpMrnEGl2bJcQ7qqmpOU6N9o0nRXEjz4xo1WtejwrlS6zwxZI71Qvk2q+yzXZZ6/uVVRXlRauBks9G8ZEgLZF8NSqkEotnaLDCR1OSKyAJLTLR5gbvmiTmsiMhzzNMqLqRY1lVst6G6aqS6S6RQtIAVaj+C6+9bRTJvzThRgbdgjrsRaj/BrLVPj3134K11wn2rOXNrHhvBtzuxx1mwta2gAZ5siUb0IqyyZO0gq+xo7SOUz1lpKgUuh9XJv+o0dFml8y78NUi6hPQbKfKAxasOuwoOqbU8Fx0meBFW9UogBQYZ1u4ZcCLCMPZYR+vgpELmfedU5pa6G29kv8RuDYDbEnh1mjgdPAn61wwSIVOwXWqWBH54ib2QNIcb6MBvBljm2LkIGzFIhYsUWujUSt529o62RCP5B2Y4AedWbEQ/n6ycjG222azKynuAazP49lEG/PNSmTGBhJmXyEwuCuozZC4CXVK79oNQJwptwvgEnoyPb5BPDWJvSncvQ6JcyJduigpWKjQ2DtpMoP19JAqQWAvppyNF6bzj40YM7qxH4ayXU33BZqbFzIjyUMKEcoZQI3qIhMKpiJUwCQqHsy62kXtACSszpN14jez3wQtt271tuY3st+//yj53N8tNmfpyb/TOQPyulxSyWGDu02eNtt8DM9bvNGbsVX4GP1GCmbGTdtK/Ufp1FP9dI+nfLh+Y2v/d9h9nfP7YJ05OT1sgXpqRtQCNzWcGDzrr7OK8FxWlDtiT036W8pBNu0VP+4o+PXvixCOYoWltP4JG/idPuHj6xeHoweH9b+/vPTrafg7tRB9FH0efRKPoYfQoehxNo4uIRD9GP0e/Rr8Nfhr8Mvh98McGvX1r6/Nh1HkGf/4DI8vV9A==</latexit> {x00 hk y00 ij} - Transparent pipeline - Fine-grained datasets P’T … … … … … … … … <latexit 
sha1_base64="WDFO0CJ+nkhQJarjpMsYWauNLLg=">AAAJ6XicjZZfb9s2EMDVrtvidH/a9XEvRIOgA7Zl9lBse6yzummAIvGKpC0QBwZFnRUiFKmRlF3D0Dfoyx5WFH3dJ9rjvs2OshtRlNpNgAHe3e+OR9/xpDgX3Nh+/59r1z+68fEnn271tm9+9vkXX966/dUzowrN4JQpofSLmBoQXMKp5VbAi1wDzWIBz+PLX539+Ry04Uqe2GUO5xlNJZ9xRi2qni7vTW/t9Pf61UPai8FmsRNtnvH09tbfk0SxIgNpmaDGnA36uT1fUW05E1BuTwoDOWWXNIUzXEqagTlfVamWZBc1CZkpjT9pSaX1PVY0M2aZxUhm1F6Y0OaUXbazws5+OV9xmRcWJFtvNCsEsYq4c5OEa2BWLHFBmeaYK2EXVFNm8d/ZxmciYcFUllGZrCa5VvNyNXHbcLuqpLJJJJDXgBPKMILvH1pVPKvNNDYtbyU8dxTKMMG0to8PQneQtXUkQyueuTYPUQjtXuxh2traVRCBJxuiEoMIizReesgi3V+GCJczUWClwOeoOXynTtoui2TWhB+C5nNIHmmVtVi6aLCL1iGt1SeqwbT+CKf6IJCAgJRa/ww0Vu0w7lj7y9ZJlc66zmks3iUPXsthif0agHQlCOo08jp41OpfnBJKJ+C6FJcMfn+PPdc8gyvoXtgMMM+odxHWYisVqRKooSMnBdu5O1oTlRQeWNAYvFuxFsN80mI0dNmm01Va/AC0LAnZJSnI7wuDY4IoHIYEJxcH8x3Bi8Dn3K3DINyLwqswIUFHw4Mr5FtE3E1p7oUkyZR+76YkF4UhQ3SwOIF2d4nKQVOrdJiOVoX3Hx9UYuvOBhRNOznTFWyKLYYjKkCZUMYbQpUYIBpyryJOoqxVOJo2sbXcAWpY4JD241Vy2AcvrWv3uuXWcti+/yv7zN8swzJ15V7pvYH4tJNUOr+gMqSPK223BxWi22koxIf8ED80SuDYSRrpXynDOqr/rpEOb1cIjN17t37jDE8eh8Th0VENTOY4si7A0im+kluddXx60omqwrbYw6Nulss2mzSLnnQVfXz8xIvHqCDjssSPoEH4ydNePPtxb/DT3v3f7u882N98Dm1FX0d3o2+iQfRz9CB6HI2j04hFs+hV9Gf0unfZ+6P3pvd2jV6/tvG5EzWe3l//AiP5yLY=</latexit> y0 <latexit 
sha1_base64="DwI/TEIjT7TE0TxGOcOt4qwCSCM=">AAAJ6nicjZZfb9s2EMDV7l+ctVu7Pu6FWBCkwNbMHoptj3VaNw1QJN6WtAXiwKCos0KUIjWSsmcY+gh96cOGYa/7Qnvst+lR9iKKUrsJMMC7+93x6DueFOeCG9vvv7l2/YMPP/r4k63e9qc3bn72+a3bXzwzqtAMzpgSSr+IqQHBJZxZbgW8yDXQLBbwPH750Nmfz0EbruSpXeZwkdFU8hln1KLql+Xe3vTWTn+/Xz2kvRhsFjvR5hlPb2/9M0kUKzKQlglqzPmgn9uLFdWWMwHl9qQwkFP2kqZwjktJMzAXqyrXkuyiJiEzpfEnLam0vseKZsYssxjJjNpLE9qcsst2XtjZjxcrLvPCgmTrjWaFIFYRd3CScA3MiiUuKNMccyXskmrKLP492/hMJCyYyjIqk9Uk12periZuG25XlVQ2iQTyGnBCGUbw/UOrime1mcam5a2E545CGSaY1vbxYegOsraOZGjFM9fmIQqh3Ys9TFtbuwoi8HRDVGIQYZHGSw9ZpAfLEOFyJgqsFPgcNUf/qpO2yyKZNeFHoPkcksdaZS2WLhrsonVIa/WpajCtP8Kp3gskICCl1j8DjVU7jDvWwbJ1UqWzrnMai3fJg9dyWGK/BiBdCYI6jbwOHrX6F8eE0gm4LsUlg1/fYc81z+AK2gubAeYZ9S7CWmylIlUCNXTspGA7d0dropLCAwsag3cr1mKYT1qMhi7bdLpKi2+BliUhuyQFea8wOCaIwmlIcHJxMN8QvAh8zt06DMK9KLwKExJ0NDy8Qr5GxN2U5l5Ikkzpd25KclEYMkQHixNod5eoHDS1SofpaFV4//FhJbbubEDRtJMzXcGm2GI4ogKUCWW8IVSJAaIh9yriJMpahaNpE1vLHaCGBQ5pP14lh33wm3XtXrfcWg7b939ln/mbZVimrtwrvTcQf+4klc4vqQzpk0rb7UGF6HYaCvE+P8SPjBI4dpJG+lfKsI7qv2ukw9sVAmP33q3fOMPTJyFxdHxcA5M5jqxLsHSKr+RWZ52cnXaiqrAt9ui4m+WyzSbNoiddRR+fPPXiMSrIuCzxI2gQfvK0F8++2x98v3//p/s7Dw42n0Nb0ZfRV9HdaBD9ED2InkTj6CxiURq9in6P/uiJ3uven72/1uj1axufO1Hj6f39FsSqyOc=</latexit> y00 <latexit 
sha1_base64="6btFIAfnhYUQuiag2Q9KpTGQ07U=">AAAJ6nicjZZfb9s2EMDVdlvj7F/bPfaFWBBkwLbUHoq2j3U6Lw1QJN6WtAXiwKCos0KUIjWSsmsY+gh72cOGoq/7Qnvct9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33t+o0PPvzo5lZv++NPPv3s81u37zw3qtAMzpgSSr+MqQHBJZxZbgW8zDXQLBbwIn71xNlfzEEbruSpXeZwkdFU8hln1KLq59d7e9NbO/39fvWQ9mKwWexEm2c8vb311yRRrMhAWiaoMeeDfm4vVlRbzgSU25PCQE7ZK5rCOS4lzcBcrKpcS7KLmoTMlMaftKTS+h4rmhmzzGIkM2ovTWhzyi7beWFnjy5WXOaFBcnWG80KQawi7uAk4RqYFUtcUKY55krYJdWUWfx7tvGZSFgwlWVUJqtJrtW8XE3cNtyuKqlsEgnkNeCEMozg+4dWFc9qM41Ny1sJzx2FMkwwre3jw9AdZG0dydCKZ67NQxRCuxd7mLa2dhVE4NmGqMQgwiKNlx6ySA+WIcLlTBRYKfA5ao7+VSdtl0Uya8Lfg+ZzSH7QKmuxdNFgF61DWqtPVYNp/RFO9V4gAQEptf4ZaKzaYdyxDpatkyqddZ3TWLxLHryWwxL7NQDpShDUaeR18KjVvzgmlE7AdSkuGfzyDnuueQZX0F7YDDDPqHcR1mIrFakSqKFjJwXbuTtaE5UUHljQGLxbsRbDfNJiNHTZptNVWtwDWpaE7JIU5LeFwTFBFE5DgpOLg/mG4EXgc+7WYRDuReFVmJCgo+HhFfI1Iu6mNPdCkmRKv3NTkovCkCE6WJxAu7tE5aCpVTpMR6vC+48PK7F1ZwOKpp2c6Qo2xRbDERWgTCjjDaFKDBANuVcRJ1HWKhxNm9ha7gA1LHBI+/EqOeyD19a1e91yazls3/+VfeZvlmGZunKv9N5A/KmTVDq/pDKkTypttwcVottpKMT7/BA/Mkrg2Eka6V8pwzqq/66RDm9XCIzde7d+4wxPn4bE0fFxDUzmOLIuwdIpvpJbnXVydtqJqsK22KPjbpbLNps0i550FX188syLx6gg47LEj6BB+MnTXjz/bn/wYP/+j/d3Hh9sPoe2orvRl9FX0SB6GD2Onkbj6CxiURr9Gv0e/dETvd96b3pv1+j1axufL6LG0/vzH7skyOY=</latexit> x00 <latexit 
sha1_base64="I6+n8F2FX3hZLaVhUUK85v5iUBg=">AAAJ6XicjZZfb9s2EMDVrtvi7F+7PfaFaBB0wLbUHoq1j3U6Nw1QJF6RtAXiwKCos0KEIjWSsmsY+gZ72cOKoq/7RHvct9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33t+kc3Pv7k063e9meff/HlVzdvff3CqEIzOGVKKP0qpgYEl3BquRXwKtdAs1jAy/jysbO/nIM2XMkTu8zhPKOp5DPOqEXV89d3pzd3+nv96iHtxWCz2Ik2z3h6a+uvSaJYkYG0TFBjzgb93J6vqLacCSi3J4WBnLJLmsIZLiXNwJyvqlRLsouahMyUxp+0pNL6HiuaGbPMYiQzai9MaHPKLttZYWcPz1dc5oUFydYbzQpBrCLu3CThGpgVS1xQpjnmStgF1ZRZ/He28ZlIWDCVZVQmq0mu1bxcTdw23K4qqWwSCeQ14IQyjOD7h1YVz2ozjU3LWwnPHYUyTDCt7eOD0B1kbR3J0Ipnrs1DFEK7F3uYtrZ2FUTg2YaoxCDCIo2XHrJI95chwuVMFFgp8DlqDv9VJ22XRTJrwj+D5nNInmiVtVi6aLCL1iGt1SeqwbT+CKf6IJCAgJRa/ww0Vu0w7lj7y9ZJlc66zmks3iUPXsthif0agHQlCOo08jp41OpfnBJKJ+C6FJcMfn2PPdc8gyvobtgMMM+odxHWYisVqRKooSMnBdu5O1oTlRQeWNAYvFuxFsN80mI0dNmm01Va3ANaloTskhTkD4XBMUEUDkOCk4uD+Z7gReBz7tZhEO5F4VWYkKCj4cEV8h0i7qY090KSZEq/d1OSi8KQITpYnEC7u0TloKlVOkxHq8L7jw8qsXVnA4qmnZzpCjbFFsMRFaBMKOMNoUoMEA25VxEnUdYqHE2b2FruADUscEj78So57IPX1rV73XJrOWzf/5V95m+WYZm6cq/03kB83kkqnV9QGdLHlbbbgwrR7TQU4kN+iB8aJXDsJI30r5RhHdV/10iHtysExu69W79xhidPQ+Lw6KgGJnMcWRdg6RRfya3OOj496URVYVvs4VE3y2WbTZpFT7qKPj5+5sVjVJBxWeJH0CD85GkvXvy4N/hp7/4v93ce7W8+h7ai29Gd6NtoED2IHkVPo3F0GrFoFv0W/RG96V32fu+97b1bo9evbXy+iRpP789/ABp0yLU=</latexit> x0 Pn T <latexit 
sha1_base64="XR1xHgdabLZzPwu5UV3KZ4hDO0k=">AAAJ7nicjZZRb9s2EICVrtvidFvb7XEvxIKgA9al9lB0e4yzuWmAInGHpCkQBwZFnRU2FKmRlD1D0I/Yyx46FHvd39nj/k1J2Y0oUu0qwADv7rvjUXc8K84ZVbrf/2/jxkc3P/7k083e1q3PPv/i9p27Xz5XopAETolgQr6IsQJGOZxqqhm8yCXgLGZwFl/9bO1nc5CKCn6ilzlcZDjldEYJ1kZ1trw3LenLanpnu7/brx8ULgbrxXa0fsbTu5v/ThJBigy4JgwrdT7o5/qixFJTwqDamhQKckyucArnZslxBuqirPOt0I7RJGgmpPlxjWqt61HiTKllFhsyw/pS+Tar7LKdF3r200VJeV5o4GS10axgSAtkD48SKoFotjQLTCQ1uSJyiSUm2ryiLfNMOCyIyDLMk3KSSzGvyondhuqylqo2kUDeAFao/Aiuv28V8awx41gF3oI57kao/ATTxj4+8N2BN9YR963mzI15aATf7sQepsHWtoIGeLomatGLsEjjpYMs0v2lj1A+Y4WpFLgcVodv1UnoskhmbfgXkHQOyWMpsoDFixa7CA6ptTwRLSZ4EVb1XiABBinW7hlwLMIw9lj7y+CkQmZd51Ta3CUHXsl+id0aALcl8Oo0cjp4FPSvGRVCJmC71CwJ/PYOey5pBtfQPb8ZYJ5h5yKsxCAVLhJooCMredvZO9oQteQfmOEYnFuxEv180mI0tNmm0zItHgCuKoR2UAr8+0KZMYGEmYjITC4K6j4yF4HOqV37QagThdZhfAKPhgfXyHcGsTelvZchUSbkOzdFOSsUGhoHbSbQzg4SOUishfTTkaJw3vFBLQZ31qNw2smprmBT02JmRHkoYUI5Q6gWPURC7lTESpgEhcNpG1vJHaCEhRnSbrxa9vvgd23bvWm5ley37wdln7mbZaZMXbnXemcg/tpJCplfYu7Tx7W22wMz1u00ZOx9fgY/VIKZsZO00r9W+nUU/18j6d8uHxjb/93mH2d48sQnDo+OGmAyNyPrEjQ23xg86Kzj05NOVBQ6YA+PulnKQzZpFz3pKvr4+KkTj2CGxpX9CBr4nzzh4vkPu4NHuw+fPdze219/Dm1GX0ffRN9Gg+jHaC96Eo2j04hEV9Ef0avor17e+7P3uvf3Cr2xsfb5Kmo9vX/eAKS3yxI=</latexit> y0 ij <latexit 
sha1_base64="3NfXwQdj/FlcBXZom/kAk4DjJws=">AAAJ7XicjZZfb9s2EMDVdlvj7F//PO6FWBBswLbMLoq2j3VWLw1QJN6QpAXiwKCos8yGIlWSsmcI+g572cOGYa/9PH3stykpexFFqt0EGODd/e549B1PinNGle733167fuOjjz+5udXb/vSzz7/48tbtO2dKFJLAKRFMyBcxVsAoh1NNNYMXuQScxQyex5c/WfvzBUhFBT/RqxwuMpxyOqMEa6M6ezIt55fV9NZOf69fPyhcDDaLnWjzjKe3t95MEkGKDLgmDCt1Pujn+qLEUlPCoNqeFApyTC5xCudmyXEG6qKs063QrtEkaCak+XGNaq3rUeJMqVUWGzLDeq58m1V22c4LPXt0UVKeFxo4WW80KxjSAtmzo4RKIJqtzAITSU2uiMyxxESbf2jbPBMOSyKyDPOknORSLKpyYrehuqylqk0kkDeAFSo/guvvW0U8a8w4VoG3YI67ESo/wbSxjw98d+CNdcR9qzlzYx4awbc7sYdpsLWtoAGebYha9CIs03jlIMt0f+UjlM9YYSoFLofV4b/qJHRZJrM2/AQkXUDysxRZwOJli10Gh9RanogWE/wRVvVBIAEGKdbuGXAswjD2WPur4KRCZl3nVNrcJQdey36J3RoAtyXw6jRyOngU9K+ZFEImYLvULAm8eo89lzSDK+gbvxlgkWHnIqzFIBUuEmigIyt529k72hC15B+Y4RicW7EW/XzSYjS02abTMi1+BFxVCO2iFPgPhTJjAgkzEJGZXBTU98hcBLqgdu0HoU4UWofxCTwaHlwh3xnE3pT2XoZEmZDv3RTlrFBoaBy0mUC7u0jkILEW0k9HisL5jw9qMbizHoXTTk51BZuaFjMjykMJE8oZQrXoIRJypyJWwiQoHE7b2FruACUszZB249Wy3we/advuTcutZb99/1f2mbtZZsrUlXutdwbir52kkPkcc58+rrXdHpixbqchYx/yM/ihEsyMnaSV/pXSr6P47xpJ/3b5wNi+d5s3zvDkqU8cHh01wGRhRtYcNJ6aV3LQWcenJ52oKHTAHh51s5SHbNIuetJV9PHxMycewQyNK/sRNPA/ecLF2b29wYO9+7/c33m8v/kc2oq+ir6Ovo0G0cPocfQ0GkenEYleRr9Hf0Z/9UTvj97fvX/W6PVrG5+7UevpvX4HCebKrA==</latexit> Dhk <latexit 
sha1_base64="DYY2PTbqLLiFDxWxO/IiCvrO+XA=">AAAJ7nicjZZfb9s2EMDV7k/jdH/a9bEvxIKgBbZldlG0e6zTemmAIvGKpCkQBwZFnRUiFKmRlF1D0IfoSx9WFH3t19njvk1J2YsoUu0mwADv7nfHo+94UpwzqnS//8+Vq198+dXX1zZ6m9e/+fa772/c/OGFEoUkcEwEE/JljBUwyuFYU83gZS4BZzGDk/jisbWfzEEqKviRXuZwluGU0xklWBvVyas705JdVNMbW/2dfv2gcDFYL7ai9TOe3tz4e5IIUmTANWFYqdNBP9dnJZaaEgbV5qRQkGNygVM4NUuOM1BnZZ1vhbaNJkEzIc2Pa1RrXY8SZ0ots9iQGdbnyrdZZZfttNCz385KyvNCAyerjWYFQ1oge3iUUAlEs6VZYCKpyRWRcywx0eYv2jTPhMOCiCzDPCknuRTzqpzYbagua6lqEwnkDWCFyo/g+vtWEc8aM45V4C2Y426Eyk8wbezjPd8deGMdcd9qztyYh0bw7U7sYRpsbStogGdroha9CIs0XjrIIt1d+gjlM1aYSoHLYbX/rzoJXRbJrA0/AUnnkPwuRRaweNFiF8EhtZZHosUEf4RVfRZIgEGKtXsGHIswjD3W7jI4qZBZ1zmVNnfJgVeyX2K3BsBtCbw6jZwOHgX9a0aFkAnYLjVLAn9+wp5LmsEldMdvBphn2LkIKzFIhYsEGujASt529o42RC35B2Y4BudWrEQ/n7QYDW226bRMi18BVxVC2ygF/kuhzJhAwkxEZCYXBfUzMheBzqld+0GoE4XWYXwCj4Z7l8hPBrE3pb2XIVEm5Cc3RTkrFBoaB20m0PY2EjlIrIX005GicP7jvVoM7qxH4bSTU13BpqbFzIjyUMKEcoZQLXqIhNypiJUwCQqH0za2kjtACQszpN14tez3wStt271puZXst+//yj5zN8tMmbpyr/XOQHzeSQqZn2Pu04e1ttsDM9btNGTsc34G31eCmbGTtNK/VPp1FP9dI+nfLh8Y2/du88YZHj31if2DgwaYzM3IOgeNp+aVHHTW4fFRJyoKHbD7B90s5SGbtIuedBV9fPjMiUcwQ+PKfgQN/E+ecPHi3s7gwc79P+5vPdpdfw5tRLejH6O70SB6GD2Knkbj6Dgi0UX0OvoretvLe29673rvV+jVK2ufW1Hr6X34CMFEyxU=</latexit> x0 lk Pn T Pn T - Transparent program PT - coarse-grained datasets <latexit 
sha1_base64="7Vic4pjHuZg2JCQiZFc95WbmPNk=">AAAJ6nicjZZfb9s2EMDVrtvidH/a7bEvRIOgA7aldlFse6yzemmAIvG2pC0QBwZFnRWiFKmRlF3D0EfoSx9WFHvdF9rjvk2PshtRlNpNgAHe3e+OR9/xpDgX3Nh+/98rVz+69vEnn271tq9/9vkXX964+dUTowrN4JQpofSzmBoQXMKp5VbAs1wDzWIBT+PnPzv70zlow5U8scsczjOaSj7jjFpU/f5iem96Y6e/168e0l4MNoudaPOMpze3/pkkihUZSMsENeZs0M/t+Ypqy5mAcntSGMgpe05TOMOlpBmY81WVa0l2UZOQmdL4k5ZUWt9jRTNjllmMZEbthQltTtllOyvs7KfzFZd5YUGy9UazQhCriDs4SbgGZsUSF5RpjrkSdkE1ZRb/nm18JhIWTGUZlclqkms1L1cTtw23q0oqm0QCeQ04oQwj+P6hVcWz2kxj0/JWwnNHoQwTTGv7+CB0B1lbRzK04plr8xCF0O7FHqatrV0FEXi8ISoxiLBI46WHLNL9ZYhwORMFVgp8jprDd+qk7bJIZk34IWg+h+QXrbIWSxcNdtE6pLX6RDWY1h/hVB8EEhCQUuufgcaqHcYda3/ZOqnSWdc5jcW75MFrOSyxXwOQrgRBnUZeB49a/YtjQukEXJfiksEf77HnmmdwCd0JmwHmGfUuwlpspSJVAjV05KRgO3dHa6KSwgMLGoN3K9ZimE9ajIYu23S6Sou7QMuSkF2Sgvy+MDgmiMJpSHBycTDfEbwIfM7dOgzCvSi8ChMSdDQ8uES+RcTdlOZeSJJM6fduSnJRGDJEB4sTaHeXqBw0tUqH6WhVeP/xQSW27mxA0bSTM13BpthiOKIClAllvCFUiQGiIfcq4iTKWoWjaRNbyx2ghgUOaT9eJYd98MK6dq9bbi2H7fu/ss/8zTIsU1fuld4biL91kkrnF1SG9HGl7fagQnQ7DYX4kB/ih0YJHDtJI/1LZVhH9d810uHtCoGxe+/Wb5zhyaOQODw6qoHJHEfWBVg6xVdyq7OOT086UVXYFnt41M1y2WaTZtGTrqKPjx978RgVZFyW+BE0CD952osn9/YGP+zd//X+zoP9zefQVnQruh19Ew2iH6MH0aNoHJ1GLEqjl9Gf0eue6L3qven9tUavXtn4fB01nt7fbwE5Fckp</latexit> x2 <latexit 
sha1_base64="r+fEUvStuYOORyMK9Rc4VuDM2e8=">AAAJ6nicjZbdb9s2EMDVj61x9tWPx70QDYIN2JbZRdH2sU7npQGKxNuStkAcGBR1VohSpEpSdg1Df8Je9rCi2Ov+oT3uv9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33l6rXrH318Y6u3/cmnn33+xc1bt58bVWgGp0wJpV/G1IDgEk4ttwJe5hpoFgt4Eb964uwv5qANV/LELnM4z2gq+YwzalH1y5vpYHpzp7/Xrx7SXgw2i51o84ynt7b+miSKFRlIywQ15mzQz+35imrLmYBye1IYyCl7RVM4w6WkGZjzVZVrSXZRk5CZ0viTllRa32NFM2OWWYxkRu2FCW1O2WU7K+zs0fmKy7ywINl6o1khiFXEHZwkXAOzYokLyjTHXAm7oJoyi3/PNj4TCQumsozKZDXJtZqXq4nbhttVJZVNIoG8BpxQhhF8/9Cq4lltprFpeSvhuaNQhgmmtX18ELqDrK0jGVrxzLV5iEJo92IP09bWroIIPNsQlRhEWKTx0kMW6f4yRLiciQIrBT5HzeG/6qTtskhmTfgH0HwOyY9aZS2WLhrsonVIa/WJajCtP8KpPggkICCl1j8DjVU7jDvW/rJ1UqWzrnMai3fJg9dyWGK/BiBdCYI6jbwOHrX6F8eE0gm4LsUlg9fvseeaZ3AJfRU2A8wz6l2EtdhKRaoEaujIScF27o7WRCWFBxY0Bu9WrMUwn7QYDV226XSVFt8DLUtCdkkK8rvC4JggCqchwcnFwXxL8CLwOXfrMAj3ovAqTEjQ0fDgEvkGEXdTmnshSTKl37spyUVhyBAdLE6g3V2ictDUKh2mo1Xh/ccHldi6swFF007OdAWbYovhiApQJpTxhlAlBoiG3KuIkyhrFY6mTWwtd4AaFjik/XiVHPbBG+vavW65tRy27//KPvM3y7BMXblXem8g/txJKp1fUBnSx5W224MK0e00FOJDfogfGiVw7CSN9C+VYR3Vf9dIh7crBMbuvVu/cYYnT0Pi8OioBiZzHFkXYOkUX8mtzjo+PelEVWFb7OFRN8tlm02aRU+6ij4+fubFY1SQcVniR9Ag/ORpL57f2xs82Lv/0/2dx/ubz6Gt6MvobvR1NIgeRo+jp9E4Oo1YlEa/Rr9Hb3ui91vvXe+PNXr1ysbnTtR4en/+Ay+RySg=</latexit> x1 <latexit 
sha1_base64="xzqgBwgQmDpW/YChTUBac1Vu1fE=">AAAJ6nicjZZfb9s2EMDVdlvjdH/657EvxIJgA7ZldlG0fazTemmAIvG2pC0QBwZFnRWiFKmSlF3D0EfYyx42DHvdF9rjvk2PshdRlNpNgAHe3e+OR9/xpDgX3Nh+/58rV6999PEn17d62zc+/ezzL27euv3CqEIzOGVKKP0qpgYEl3BquRXwKtdAs1jAy/j1E2d/OQdtuJIndpnDeUZTyWecUYuqn5fTe9ObO/29fvWQ9mKwWexEm2c8vbX19yRRrMhAWiaoMWeDfm7PV1RbzgSU25PCQE7Za5rCGS4lzcCcr6pcS7KLmoTMlMaftKTS+h4rmhmzzGIkM2ovTGhzyi7bWWFnj85XXOaFBcnWG80KQawi7uAk4RqYFUtcUKY55krYBdWUWfx7tvGZSFgwlWVUJqtJrtW8XE3cNtyuKqlsEgnkNeCEMozg+4dWFc9qM41Ny1sJzx2FMkwwre3jg9AdZG0dydCKZ67NQxRCuxd7mLa2dhVE4PmGqMQgwiKNlx6ySPeXIcLlTBRYKfA5ag7/VSdtl0Uya8JPQfM5JD9olbVYumiwi9YhrdUnqsG0/gin+iCQgICUWv8MNFbtMO5Y+8vWSZXOus5pLN4lD17LYYn9GoB0JQjqNPI6eNTqXxwTSifguhSXDN68x55rnsEl9FXYDDDPqHcR1mIrFakSqKEjJwXbuTtaE5UUHljQGLxbsRbDfNJiNHTZptNVWnwPtCwJ2SUpyO8Kg2OCKJyGBCcXB/MtwYvA59ytwyDci8KrMCFBR8ODS+QbRNxNae6FJMmUfu+mJBeFIUN0sDiBdneJykFTq3SYjlaF9x8fVGLrzgYUTTs50xVsii2GIypAmVDGG0KVGCAacq8iTqKsVTiaNrG13AFqWOCQ9uNVctgHb61r97rl1nLYvv8r+8zfLMMydeVe6b2B+FMnqXR+QWVIH1fabg8qRLfTUIgP+SF+aJTAsZM00r9UhnVU/10jHd6uEBi79279xhmePAuJw6OjGpjMcWRdgKVTfCW3Ouv49KQTVYVtsYdH3SyXbTZpFj3pKvr4+LkXj1FBxmWJH0GD8JOnvXhxb2/wYO/+j/d3Hu9vPoe2orvRl9HX0SB6GD2OnkXj6DRiURr9Ev0W/d4TvV97f/T+XKNXr2x87kSNp/fXO0KbySo=</latexit> y2 <latexit 
sha1_base64="CU9JDVqglwSIhF3eDSADQxQg63A=">AAAJ6nicjZbdb9s2EMDVj61x9tWPx70QC4IN2JbaQ9H2sc7qpgGKxNuStkAcGBR1VohSpEpSdg1Df8Je9tCh2Ov+oT3uv9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33l6rXrH318Y6u3/cmnn33+xc1bt58bVWgGp0wJpV/G1IDgEk4ttwJe5hpoFgt4Eb/60dlfzEEbruSJXeZwntFU8hln1KLql+V0ML2509/rVw9pLwabxU60ecbTW1t/TRLFigykZYIaczbo5/Z8RbXlTEC5PSkM5JS9oimc4VLSDMz5qsq1JLuoSchMafxJSyqt77GimTHLLEYyo/bChDan7LKdFXb28HzFZV5YkGy90awQxCriDk4SroFZscQFZZpjroRdUE2Zxb9nG5+JhAVTWUZlsprkWs3L1cRtw+2qksomkUBeA04owwi+f2hV8aw209i0vJXw3FEowwTT2j4+CN1B1taRDK145to8RCG0e7GHaWtrV0EEnm2ISgwiLNJ46SGLdH8ZIlzORIGVAp+j5vBfddJ2WSSzJvwYNJ9D8kSrrMXSRYNdtA5prT5RDab1RzjVB4EEBKTU+megsWqHccfaX7ZOqnTWdU5j8S558FoOS+zXAKQrQVCnkdfBo1b/4phQOgHXpbhk8Po99lzzDC6hr8NmgHlGvYuwFlupSJVADR05KdjO3dGaqKTwwILG4N2KtRjmkxajocs2na7S4i7QsiRkl6Qgvy8MjgmicBoSnFwczHcELwKfc7cOg3AvCq/ChAQdDQ8ukW8RcTeluReSJFP6vZuSXBSGDNHB4gTa3SUqB02t0mE6WhXef3xQia07G1A07eRMV7ApthiOqABlQhlvCFVigGjIvYo4ibJW4WjaxNZyB6hhgUPaj1fJYR+8sa7d65Zby2H7/q/sM3+zDMvUlXul9wbiz52k0vkFlSF9XGm7PagQ3U5DIT7kh/ihUQLHTtJI/1IZ1lH9d410eLtCYOzeu/UbZ3jyNCQOj45qYDLHkXUBlk7xldzqrOPTk05UFbbFHh51s1y22aRZ9KSr6OPjZ148RgUZlyV+BA3CT5724vkPe4P7e/d+urfzaH/zObQVfRl9FX0TDaIH0aPoaTSOTiMWpdGv0dvo957o/dZ71/tjjV69svG5EzWe3p//ADkXySk=</latexit> y1 PT <latexit 
sha1_base64="r+fEUvStuYOORyMK9Rc4VuDM2e8=">AAAJ6nicjZbdb9s2EMDVj61x9tWPx70QDYIN2JbZRdH2sU7npQGKxNuStkAcGBR1VohSpEpSdg1Df8Je9rCi2Ov+oT3uv9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33l6rXrH318Y6u3/cmnn33+xc1bt58bVWgGp0wJpV/G1IDgEk4ttwJe5hpoFgt4Eb964uwv5qANV/LELnM4z2gq+YwzalH1y5vpYHpzp7/Xrx7SXgw2i51o84ynt7b+miSKFRlIywQ15mzQz+35imrLmYBye1IYyCl7RVM4w6WkGZjzVZVrSXZRk5CZ0viTllRa32NFM2OWWYxkRu2FCW1O2WU7K+zs0fmKy7ywINl6o1khiFXEHZwkXAOzYokLyjTHXAm7oJoyi3/PNj4TCQumsozKZDXJtZqXq4nbhttVJZVNIoG8BpxQhhF8/9Cq4lltprFpeSvhuaNQhgmmtX18ELqDrK0jGVrxzLV5iEJo92IP09bWroIIPNsQlRhEWKTx0kMW6f4yRLiciQIrBT5HzeG/6qTtskhmTfgH0HwOyY9aZS2WLhrsonVIa/WJajCtP8KpPggkICCl1j8DjVU7jDvW/rJ1UqWzrnMai3fJg9dyWGK/BiBdCYI6jbwOHrX6F8eE0gm4LsUlg9fvseeaZ3AJfRU2A8wz6l2EtdhKRaoEaujIScF27o7WRCWFBxY0Bu9WrMUwn7QYDV226XSVFt8DLUtCdkkK8rvC4JggCqchwcnFwXxL8CLwOXfrMAj3ovAqTEjQ0fDgEvkGEXdTmnshSTKl37spyUVhyBAdLE6g3V2ictDUKh2mo1Xh/ccHldi6swFF007OdAWbYovhiApQJpTxhlAlBoiG3KuIkyhrFY6mTWwtd4AaFjik/XiVHPbBG+vavW65tRy27//KPvM3y7BMXblXem8g/txJKp1fUBnSx5W224MK0e00FOJDfogfGiVw7CSN9C+VYR3Vf9dIh7crBMbuvVu/cYYnT0Pi8OioBiZzHFkXYOkUX8mtzjo+PelEVWFb7OFRN8tlm02aRU+6ij4+fubFY1SQcVniR9Ag/ORpL57f2xs82Lv/0/2dx/ubz6Gt6MvobvR1NIgeRo+jp9E4Oo1YlEa/Rr9Hb3ui91vvXe+PNXr1ysbnTtR4en/+Ay+RySg=</latexit> x1 <latexit 
sha1_base64="CU9JDVqglwSIhF3eDSADQxQg63A=">AAAJ6nicjZbdb9s2EMDVj61x9tWPx70QC4IN2JbaQ9H2sc7qpgGKxNuStkAcGBR1VohSpEpSdg1Df8Je9tCh2Ov+oT3uv9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33l6rXrH318Y6u3/cmnn33+xc1bt58bVWgGp0wJpV/G1IDgEk4ttwJe5hpoFgt4Eb/60dlfzEEbruSJXeZwntFU8hln1KLql+V0ML2509/rVw9pLwabxU60ecbTW1t/TRLFigykZYIaczbo5/Z8RbXlTEC5PSkM5JS9oimc4VLSDMz5qsq1JLuoSchMafxJSyqt77GimTHLLEYyo/bChDan7LKdFXb28HzFZV5YkGy90awQxCriDk4SroFZscQFZZpjroRdUE2Zxb9nG5+JhAVTWUZlsprkWs3L1cRtw+2qksomkUBeA04owwi+f2hV8aw209i0vJXw3FEowwTT2j4+CN1B1taRDK145to8RCG0e7GHaWtrV0EEnm2ISgwiLNJ46SGLdH8ZIlzORIGVAp+j5vBfddJ2WSSzJvwYNJ9D8kSrrMXSRYNdtA5prT5RDab1RzjVB4EEBKTU+megsWqHccfaX7ZOqnTWdU5j8S558FoOS+zXAKQrQVCnkdfBo1b/4phQOgHXpbhk8Po99lzzDC6hr8NmgHlGvYuwFlupSJVADR05KdjO3dGaqKTwwILG4N2KtRjmkxajocs2na7S4i7QsiRkl6Qgvy8MjgmicBoSnFwczHcELwKfc7cOg3AvCq/ChAQdDQ8ukW8RcTeluReSJFP6vZuSXBSGDNHB4gTa3SUqB02t0mE6WhXef3xQia07G1A07eRMV7ApthiOqABlQhlvCFVigGjIvYo4ibJW4WjaxNZyB6hhgUPaj1fJYR+8sa7d65Zby2H7/q/sM3+zDMvUlXul9wbiz52k0vkFlSF9XGm7PagQ3U5DIT7kh/ihUQLHTtJI/1IZ1lH9d410eLtCYOzeu/UbZ3jyNCQOj45qYDLHkXUBlk7xldzqrOPTk05UFbbFHh51s1y22aRZ9KSr6OPjZ148RgUZlyV+BA3CT5724vkPe4P7e/d+urfzaH/zObQVfRl9FX0TDaIH0aPoaTSOTiMWpdGv0dvo957o/dZ71/tjjV69svG5EzWe3p//ADkXySk=</latexit> y1 <latexit 
sha1_base64="xzqgBwgQmDpW/YChTUBac1Vu1fE=">AAAJ6nicjZZfb9s2EMDVdlvjdH/657EvxIJgA7ZldlG0fazTemmAIvG2pC0QBwZFnRWiFKmSlF3D0EfYyx42DHvdF9rjvk2PshdRlNpNgAHe3e+OR9/xpDgX3Nh+/58rV6999PEn17d62zc+/ezzL27euv3CqEIzOGVKKP0qpgYEl3BquRXwKtdAs1jAy/j1E2d/OQdtuJIndpnDeUZTyWecUYuqn5fTe9ObO/29fvWQ9mKwWexEm2c8vbX19yRRrMhAWiaoMWeDfm7PV1RbzgSU25PCQE7Za5rCGS4lzcCcr6pcS7KLmoTMlMaftKTS+h4rmhmzzGIkM2ovTGhzyi7bWWFnj85XXOaFBcnWG80KQawi7uAk4RqYFUtcUKY55krYBdWUWfx7tvGZSFgwlWVUJqtJrtW8XE3cNtyuKqlsEgnkNeCEMozg+4dWFc9qM41Ny1sJzx2FMkwwre3jg9AdZG0dydCKZ67NQxRCuxd7mLa2dhVE4PmGqMQgwiKNlx6ySPeXIcLlTBRYKfA5ag7/VSdtl0Uya8JPQfM5JD9olbVYumiwi9YhrdUnqsG0/gin+iCQgICUWv8MNFbtMO5Y+8vWSZXOus5pLN4lD17LYYn9GoB0JQjqNPI6eNTqXxwTSifguhSXDN68x55rnsEl9FXYDDDPqHcR1mIrFakSqKEjJwXbuTtaE5UUHljQGLxbsRbDfNJiNHTZptNVWnwPtCwJ2SUpyO8Kg2OCKJyGBCcXB/MtwYvA59ytwyDci8KrMCFBR8ODS+QbRNxNae6FJMmUfu+mJBeFIUN0sDiBdneJykFTq3SYjlaF9x8fVGLrzgYUTTs50xVsii2GIypAmVDGG0KVGCAacq8iTqKsVTiaNrG13AFqWOCQ9uNVctgHb61r97rl1nLYvv8r+8zfLMMydeVe6b2B+FMnqXR+QWVIH1fabg8qRLfTUIgP+SF+aJTAsZM00r9UhnVU/10jHd6uEBi79279xhmePAuJw6OjGpjMcWRdgKVTfCW3Ouv49KQTVYVtsYdH3SyXbTZpFj3pKvr4+LkXj1FBxmWJH0GD8JOnvXhxb2/wYO/+j/d3Hu9vPoe2orvRl9HX0SB6GD2OnkXj6DRiURr9Ev0W/d4TvV97f/T+XKNXr2x87kSNp/fXO0KbySo=</latexit> y2 f <latexit 
sha1_base64="r+fEUvStuYOORyMK9Rc4VuDM2e8=">AAAJ6nicjZbdb9s2EMDVj61x9tWPx70QDYIN2JbZRdH2sU7npQGKxNuStkAcGBR1VohSpEpSdg1Df8Je9rCi2Ov+oT3uv9lR9iKKUrsJMMC7+93x6DueFOeCG9vv/33l6rXrH318Y6u3/cmnn33+xc1bt58bVWgGp0wJpV/G1IDgEk4ttwJe5hpoFgt4Eb964uwv5qANV/LELnM4z2gq+YwzalH1y5vpYHpzp7/Xrx7SXgw2i51o84ynt7b+miSKFRlIywQ15mzQz+35imrLmYBye1IYyCl7RVM4w6WkGZjzVZVrSXZRk5CZ0viTllRa32NFM2OWWYxkRu2FCW1O2WU7K+zs0fmKy7ywINl6o1khiFXEHZwkXAOzYokLyjTHXAm7oJoyi3/PNj4TCQumsozKZDXJtZqXq4nbhttVJZVNIoG8BpxQhhF8/9Cq4lltprFpeSvhuaNQhgmmtX18ELqDrK0jGVrxzLV5iEJo92IP09bWroIIPNsQlRhEWKTx0kMW6f4yRLiciQIrBT5HzeG/6qTtskhmTfgH0HwOyY9aZS2WLhrsonVIa/WJajCtP8KpPggkICCl1j8DjVU7jDvW/rJ1UqWzrnMai3fJg9dyWGK/BiBdCYI6jbwOHrX6F8eE0gm4LsUlg9fvseeaZ3AJfRU2A8wz6l2EtdhKRaoEaujIScF27o7WRCWFBxY0Bu9WrMUwn7QYDV226XSVFt8DLUtCdkkK8rvC4JggCqchwcnFwXxL8CLwOXfrMAj3ovAqTEjQ0fDgEvkGEXdTmnshSTKl37spyUVhyBAdLE6g3V2ictDUKh2mo1Xh/ccHldi6swFF007OdAWbYovhiApQJpTxhlAlBoiG3KuIkyhrFY6mTWwtd4AaFjik/XiVHPbBG+vavW65tRy27//KPvM3y7BMXblXem8g/txJKp1fUBnSx5W224MK0e00FOJDfogfGiVw7CSN9C+VYR3Vf9dIh7crBMbuvVu/cYYnT0Pi8OioBiZzHFkXYOkUX8mtzjo+PelEVWFb7OFRN8tlm02aRU+6ij4+fubFY1SQcVniR9Ag/ORpL57f2xs82Lv/0/2dx/ubz6Gt6MvobvR1NIgeRo+jp9E4Oo1YlEa/Rr9Hb3ui91vvXe+PNXr1ysbnTtR4en/+Ay+RySg=</latexit> x1 <latexit 
sha1_base64="7Vic4pjHuZg2JCQiZFc95WbmPNk=">AAAJ6nicjZZfb9s2EMDVrtvidH/a7bEvRIOgA7aldlFse6yzemmAIvG2pC0QBwZFnRWiFKmRlF3D0EfoSx9WFHvdF9rjvk2PshtRlNpNgAHe3e+OR9/xpDgX3Nh+/98rVz+69vEnn271tq9/9vkXX964+dUTowrN4JQpofSzmBoQXMKp5VbAs1wDzWIBT+PnPzv70zlow5U8scsczjOaSj7jjFpU/f5iem96Y6e/168e0l4MNoudaPOMpze3/pkkihUZSMsENeZs0M/t+Ypqy5mAcntSGMgpe05TOMOlpBmY81WVa0l2UZOQmdL4k5ZUWt9jRTNjllmMZEbthQltTtllOyvs7KfzFZd5YUGy9UazQhCriDs4SbgGZsUSF5RpjrkSdkE1ZRb/nm18JhIWTGUZlclqkms1L1cTtw23q0oqm0QCeQ04oQwj+P6hVcWz2kxj0/JWwnNHoQwTTGv7+CB0B1lbRzK04plr8xCF0O7FHqatrV0FEXi8ISoxiLBI46WHLNL9ZYhwORMFVgp8jprDd+qk7bJIZk34IWg+h+QXrbIWSxcNdtE6pLX6RDWY1h/hVB8EEhCQUuufgcaqHcYda3/ZOqnSWdc5jcW75MFrOSyxXwOQrgRBnUZeB49a/YtjQukEXJfiksEf77HnmmdwCd0JmwHmGfUuwlpspSJVAjV05KRgO3dHa6KSwgMLGoN3K9ZimE9ajIYu23S6Sou7QMuSkF2Sgvy+MDgmiMJpSHBycTDfEbwIfM7dOgzCvSi8ChMSdDQ8uES+RcTdlOZeSJJM6fduSnJRGDJEB4sTaHeXqBw0tUqH6WhVeP/xQSW27mxA0bSTM13BpthiOKIClAllvCFUiQGiIfcq4iTKWoWjaRNbyx2ghgUOaT9eJYd98MK6dq9bbi2H7fu/ss/8zTIsU1fuld4biL91kkrnF1SG9HGl7fagQnQ7DYX4kB/ih0YJHDtJI/1LZVhH9d810uHtCoGxe+/Wb5zhyaOQODw6qoHJHEfWBVg6xVdyq7OOT086UVXYFnt41M1y2WaTZtGTrqKPjx978RgVZFyW+BE0CD952osn9/YGP+zd//X+zoP9zefQVnQruh19Ew2iH6MH0aNoHJ1GLEqjl9Gf0eue6L3qven9tUavXtn4fB01nt7fbwE5Fckp</latexit> x2 if c: y1 ß x1 else: y1 ß x2 Y2 ß f(x1, x2) Runtime: c == True
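The small conditional program on this slide shows why runtime information matters: which input y1 derives from depends on the value of c. A minimal sketch of how that dependency could be recorded, assuming a hypothetical `run_with_provenance` helper and a plain dict of derivations (neither comes from the slides):

```python
def run_with_provenance(c, x1, x2, f):
    """Run the toy program and record which inputs each output used.

    The returned provenance maps each output name to the set of input
    names it was derived from during this particular execution.
    """
    prov = {}
    if c:
        y1 = x1
        prov["y1"] = {"x1"}      # branch taken: y1 derives from x1 only
    else:
        y1 = x2
        prov["y1"] = {"x2"}      # other branch: y1 derives from x2 only
    y2 = f(x1, x2)
    prov["y2"] = {"x1", "x2"}    # y2 always depends on both inputs
    return (y1, y2), prov


# Hypothetical inputs, with c == True as in the slide's runtime note.
outputs, prov = run_with_provenance(True, 10, 20, lambda a, b: a + b)
print(outputs)  # (10, 30)
```

Without the runtime value of c, an analysis of the program text alone would have to report both x1 and x2 as possible sources of y1; observing the execution narrows this to the branch actually taken.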
  • 34. A generic dataframe observer for Pandas. Approach: add an observer to monitor dataframe changes; mostly transparent to the application; some control surfaced. IDEAL 2023
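One way to picture the observer approach on this slide is to wrap each dataframe operation and compare the frame before and after it. The sketch below assumes a hypothetical `observe` helper and log format; it is not the tool presented in the talk, only an illustration of the idea.

```python
import pandas as pd


def observe(df, op, log):
    """Apply `op` to `df` and append a summary of what changed to `log`."""
    out = op(df)
    log.append({
        "op": getattr(op, "__name__", repr(op)),
        "rows_removed": sorted(set(df.index) - set(out.index)),
        "rows_added": sorted(set(out.index) - set(df.index)),
        "cols_removed": sorted(set(df.columns) - set(out.columns)),
        "cols_added": sorted(set(out.columns) - set(df.columns)),
    })
    return out


def drop_missing_age(df):
    """Example operation: discard rows whose Age is null."""
    return df.dropna(subset=["Age"])


log = []
df = pd.DataFrame({"Cid": [1, 2, 3], "Age": [25.0, None, 41.0]})
clean = observe(df, drop_missing_age, log)
print(log[0]["rows_removed"])  # [1]
```

The application code only changes at the call site, which is roughly what "mostly transparent" suggests; a real observer would also need to detect in-place value changes, which this diff of index and column labels does not.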
  • 35. Approach to design (II): grounded in well-known dataframe transformation operators; open: accommodates any transformation within three broad classes.
  • 36. Data reduction: projection, $D' = \pi_C(D)$, and selection, $D' = \sigma_C(D)$. Example: $\pi_{\{Cid, Gender, Age\}}(\sigma_{Age<30}(D))$
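The projection-after-selection example on this slide maps directly onto two pandas indexing steps. The sketch below uses a made-up dataframe `D` (the column values are assumptions for illustration) and shows that the surviving index labels already give row-level provenance for a reduction.

```python
import pandas as pd

# Hypothetical input data; only the column names follow the slide.
D = pd.DataFrame({
    "Cid": [1, 2, 3, 4],
    "Gender": ["F", "M", "F", "M"],
    "Age": [25, 31, 19, 45],
    "Zip": ["B15", "NE1", "B29", "NE4"],
})

# Selection (Age < 30): keep only the rows satisfying the predicate.
selected = D[D["Age"] < 30]
# Projection (Cid, Gender, Age): keep only the listed columns.
reduced = selected[["Cid", "Gender", "Age"]]

# pandas preserves index labels, so each surviving row still carries
# the label of the input row it came from.
kept_rows = reduced.index.tolist()
dropped_cols = [c for c in D.columns if c not in reduced.columns]
print(kept_rows)     # [0, 2]
print(dropped_cols)  # ['Zip']
```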
  • 37. Data augmentation. Vertical augmentation: $\alpha^{\rightarrow}_{f(X):Y}$; example: $\alpha^{\rightarrow}_{f_1(Age):ageRange}(D)$. Horizontal augmentation: $\alpha^{\downarrow}_{X:f(Y)}(D)$; example: $E_2 = \alpha^{\downarrow}_{Gender:f_2(Age)}(D)$ (group by Gender, avg(Age)).
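Both augmentation examples on this slide can be mimicked in pandas. The sketch below makes three assumptions that are not in the slides: the contents of `D`, a hypothetical bucketing function `age_range` standing in for f1, and the reading that horizontal augmentation appends the aggregate rows below the original ones.

```python
import pandas as pd

# Hypothetical input data; only the column names follow the slide.
D = pd.DataFrame({
    "Cid": [1, 2, 3, 4],
    "Gender": ["F", "M", "F", "M"],
    "Age": [25, 31, 19, 45],
})


def age_range(age):
    """Hypothetical f1: bucket an age into a coarse range label."""
    return "<30" if age < 30 else "30+"


# Vertical augmentation: add a column ageRange derived from Age.
E1 = D.assign(ageRange=D["Age"].map(age_range))

# Horizontal augmentation (assumed reading): compute avg(Age) per Gender
# and append the aggregate rows to the original dataframe.
agg = D.groupby("Gender", as_index=False)["Age"].mean()
E2 = pd.concat([D, agg], ignore_index=True)

print(E1["ageRange"].tolist())  # ['<30', '30+', '<30', '30+']
print(len(E2))                  # 6
```

For provenance, the vertical case links each new cell to the Age value in the same row, while each appended aggregate row depends on every input row in its Gender group.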
  • 38. 38 Data transformation $\tau_{f(X)}$: the transformation of a set of features X of D using a function f is obtained by substituting each value $d_{ia}$ with $f(d_{*a})$, for each feature a occurring in X. Example: data imputation. Here f replaces nulls with the most frequent value, for column Zip: $\tau_{f(\mathit{Zip})}(D)$
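A minimal sketch of the imputation example, again with dict rows standing in for a dataframe. The helper names `transform`, `impute_mode` and `most_frequent` and the sample Zip values are illustrative assumptions. The sketch lets f depend on the whole column, since the most frequent value can only be computed column-wide.

```python
from collections import Counter


def most_frequent(values):
    # Most common non-null value in a column.
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0]


def impute_mode(column_values):
    # Builds f for one column: nulls become the most frequent value.
    fill = most_frequent(column_values)
    return lambda v: fill if v is None else v


def transform(rows, columns, make_f):
    # For each feature a in columns, substitute every value with f(value).
    out = [dict(r) for r in rows]  # copy, so the input dataset is left untouched
    for a in columns:
        f = make_f([r[a] for r in rows])
        for r in out:
            r[a] = f(r[a])
    return out


D = [{"Zip": "NE1"}, {"Zip": None}, {"Zip": "NE1"}, {"Zip": "B15"}]
D_imputed = transform(D, ["Zip"], impute_mode)
```

The shape of the dataset is unchanged; only values in the listed columns differ, and the number of nulls goes down.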
  • 39. 39 Data fusion: join and append Join: $D^L \bowtie^{t}_{C} D^R$, with join type t and join condition C. Append: $D^L \uplus D^R$. Example (inner join): $D^L \bowtie^{\mathit{inner}}_{D^L.CId = D^R.CId} D^R$
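Join and append can be sketched over dict rows as below. This is an illustration under stated assumptions: a single-column equi-join, only the `inner` and `left` join types, and `None` used as the null value. The function names and the sample columns `Name` and `City` are hypothetical; `CId` is the join column from the slide.

```python
def join(left, right, key, how="inner"):
    # Equi-join on one key column; how is "inner" or "left".
    # Assumes key is the only column name shared by the two inputs.
    right_cols = {c for r in right for c in r if c != key}
    out = []
    for lrow in left:
        matches = [r for r in right if r[key] == lrow[key]]
        if matches:
            out.extend({**lrow, **m} for m in matches)
        elif how == "left":
            # Keep the unmatched left row, padding right-hand columns with nulls.
            out.append({**lrow, **{c: None for c in right_cols}})
    return out


def append(left, right):
    # All rows of left followed by all rows of right, padded to a common schema.
    cols = list(dict.fromkeys(c for r in left + right for c in r))
    return [{c: r.get(c) for c in cols} for r in left + right]


DL = [{"CId": 1, "Name": "Ann"}, {"CId": 2, "Name": "Bob"}]
DR = [{"CId": 1, "City": "Leeds"}]

inner = join(DL, DR, "CId")
left = join(DL, DR, "CId", how="left")
both = append(DL, DR)
```

Unlike the single-input operators, each output row here may derive from rows of two different inputs, which is why these operators need their own provenance pattern.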
  • 40. 40 Conceptual provenance capture model: templates Example operator: $\alpha^{\rightarrow}_{f_1(\mathit{Age}):\mathit{ageRange}}(D)$. A different provenance template pt𝜏 is associated with each type 𝜏 of operator
  • 41. 41 Capturing provenance: bindings At runtime, when operator o of type 𝜏 is executed, the appropriate template pt𝜏 for 𝜏 is selected. Data items from the inputs and outputs of the operator are used to bind the variables in the template: op {old values: F, I, V} → {new values: F’, J, V’} + binding rules. For $i : 1 \ldots n$: used entities: $[\langle F = X_m, I = i, V = D_{i,X_m} \rangle \mid X_m \in X]$; generated entities: $[\langle F' = Y_h, J = i, V' = f(D_{i,X}) \rangle \mid Y_h \in Y]$
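The binding step can be sketched as a loop over row indices that fills the template variables from the input and output rows. This is a hedged illustration: `bind_template`, the dict layout of a provlet, and the sample data are assumptions, and the output value is read from the output dataframe rather than recomputed with f.

```python
def bind_template(op_name, d_in, d_out, X, Y):
    # For each row index i, emit the used and generated entities of one provlet.
    # Assumes d_in and d_out have the same number of rows, aligned by position.
    provlets = []
    for i, (row_in, row_out) in enumerate(zip(d_in, d_out), start=1):
        # One used entity per input feature X_m: (F = X_m, I = i, V = D[i, X_m]).
        used = [{"F": xm, "I": i, "V": row_in[xm]} for xm in X]
        # One generated entity per output feature Y_h: (F' = Y_h, J = i, V' = value).
        generated = [{"F'": yh, "J": i, "V'": row_out[yh]} for yh in Y]
        provlets.append({"op": op_name, "used": used, "generated": generated})
    return provlets


d_in = [{"Age": 25}, {"Age": 70}]
d_out = [{"Age": 25, "ageRange": "<30"}, {"Age": 70, "ageRange": "60+"}]
provlets = bind_template("vertical_augmentation", d_in, d_out, X=["Age"], Y=["ageRange"])
```

Each provlet links the cells that were read to the cells that were produced for one row, so provenance is captured at the level of individual values rather than whole datasets.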
  • 42. 42 This applies to all operators
  • 43. 43 Implementation We use templates in combination with dataframe diff. For each input/output pair Din, Dout of dataframes: 1. Compare both the shapes and values of Din, Dout (*). 2. Use the diff to: • Select the appropriate template • Bind the template variables using the relevant values in the two dataframes • Generate an instantiated provlet. (*) Extends to joins and append.
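A minimal sketch of the diff-then-select idea, with dict rows standing in for dataframes. The function names, the structure of the diff result, and in particular the mapping in `select_template` are illustrative assumptions, not the actual rules used by the tooling.

```python
def columns(rows):
    # Set of all column names appearing in the rows.
    return set().union(*(r.keys() for r in rows))


def nulls(rows, col):
    # Number of null (None or missing) values in a column.
    return sum(1 for r in rows if r.get(col) is None)


def diff(d_in, d_out):
    # Step 1: compare the shapes and the values of the two dataframes.
    cols_in, cols_out = columns(d_in), columns(d_out)
    shape = {
        "rows_added": len(d_out) > len(d_in),
        "rows_removed": len(d_out) < len(d_in),
        "columns_added": sorted(cols_out - cols_in),
        "columns_removed": sorted(cols_in - cols_out),
    }
    values = {}
    if len(d_in) == len(d_out):
        # Cell-by-cell comparison only makes sense when rows align one to one.
        for c in sorted(cols_in & cols_out):
            values[c] = {
                "nulls_reduced": nulls(d_out, c) < nulls(d_in, c),
                "values_changed": any(a.get(c) != b.get(c) for a, b in zip(d_in, d_out)),
            }
    return shape, values


def select_template(shape, values):
    # Step 2 (hypothetical mapping): pick a template name from the diff.
    if any(v["nulls_reduced"] for v in values.values()):
        return "data transformation (imputation)"
    if any(v["values_changed"] for v in values.values()):
        return "data transformation"
    if any(shape.values()):
        return "shape change"
    return "no change"


before = [{"Zip": "NE1"}, {"Zip": None}]
imputed = [{"Zip": "NE1"}, {"Zip": "NE1"}]
widened = [{"Zip": "NE1", "Flag": 1}, {"Zip": None, "Flag": 0}]
```

Calling `select_template(*diff(before, imputed))` yields the imputation template, while `select_template(*diff(before, widened))` reports a shape change. The point of the approach is that the operator does not need to be instrumented: its effect is inferred from the two dataframes.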
  • 44. 44 Running Example Pipeline: $D_1 = D_a \bowtie^{\mathit{left}}_{K1,K2} D_b$ (left join on K1, K2); $D_2 = \tau_{f_1(*)}(D_1)$ (impute all missing); $D_3 = D_2 \bowtie^{\mathit{left}}_{K1,K2} D_c$ (left join on K1, K2); $D_4 = \tau_{f_2(E,F)}(D_3)$ (impute E, F); $D_5 = \alpha^{\rightarrow}_{h(E):\{E4,Ex,E1\}}(D_4)$ (add ‘E4’, ‘Ex’, ‘E1’); $D_6 = \pi_{\{Ax,B,Ay,D,C,F,E4,Ex,E1\}}(D_5)$ (remove ‘E’)
  • 45. 45 Summary: Shape and value changes [Figure: decision diagram. Shape changes are tested in turn (Rows added? Rows removed? Columns added? Columns removed?) and select among the templates: horizontal augmentation, reduction by selection, reduction by projection, data transformation (composite), data transformation. Value changes are tested for each column (Nulls reduced? Values changed?) and select among the templates: data transformation (imputation), data transformation, 1-1 derivations.]
  • 46. 46 Running Example Dataframes / Diff / Template: D1 ← {Da, Db}: explicit join provenance pattern. D2 ← D1: value change, reduced nulls → imputation; data transformation. D3 ← {D2, Dc}: explicit join provenance pattern. D4 ← D3: value change, reduced nulls → imputation; data transformation. D5 ← D4: shape change, column(s) added; <wait!>. D6 ← D5: shape change, column(s) removed; data transformation, composite.
  • 50. 50 Scalability: capture and storage / TCI-DI datasets [Plots: basic operators; join + append operators]
  • 51. 51 How can we realise why+provenance?
  • 52. 52 Representing provenance: Layer III Need a language to express solution-specific explanations: - “Why was ti selected/removed?” 1. ti belongs to cluster Ch, 2. there exists tj in Ch such that d(ti,tj) < δ → ti and tj redundant → tj selected, ti removed. - “Why was xnj cleaned?” [Diagrams: provenance graph over xnj, xij, CS and C with edges wasGeneratedBy, used, wasAssociatedWith, wasDerivedFrom, annotated with “why”; provenance graph over TSfull, ti and Filter with edges wasInvalidatedBy, used, annotated with “why”.]
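The redundancy explanation above can be sketched as a pruning routine that records, for every removed item, the cluster, the witness item and the distance that justified the removal. This is an illustration only: the greedy keep-first strategy, the Euclidean distance, the record fields, and the sample points are assumptions rather than the method of any cited system.

```python
import math


def explain_pruning(points, clusters, delta):
    # points: item id -> coordinates; clusters: cluster id -> list of item ids.
    # Keeps the first item of each group of near-duplicates and removes the rest,
    # recording why each removed item was removed.
    explanations = {}
    for ch, members in clusters.items():
        kept = []
        for ti in members:
            # Look for an already kept tj in the same cluster with d(ti, tj) < delta.
            witness = next(
                (tj for tj in kept if math.dist(points[ti], points[tj]) < delta), None
            )
            if witness is None:
                kept.append(ti)
            else:
                explanations[ti] = {
                    "cluster": ch,
                    "redundant_with": witness,
                    "distance": math.dist(points[ti], points[witness]),
                    "delta": delta,
                    "outcome": "removed",
                }
    return explanations


points = {"t1": (0.0, 0.0), "t2": (0.1, 0.0), "t3": (5.0, 5.0)}
clusters = {"C1": ["t1", "t2"], "C2": ["t3"]}
why = explain_pruning(points, clusters, delta=0.5)
```

Here `why["t2"]` states that t2 was removed because it lies in cluster C1 within δ of the retained item t1. A record of this kind is what a “why” annotation on the provenance graph would need to carry.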
  • 53. 53 Capturing provenance: Layer III Layer III (data- and process-granular): requires an explanation generator as part of the transformation logic. Approach: operators send “explanations” to a provenance server through an API at runtime, at a chosen granularity: dataset → data item. [Diagram: provenance graph over xnj, xij, CS and C with edges wasGeneratedBy, used, wasAssociatedWith, wasDerivedFrom, annotated with “why”; operator CS maps D to D’ and sends {xi, x’i, expli} to Prov-DB.] IDEAL 2023
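A minimal sketch of an operator that emits item-level explanations at runtime. Everything here is hypothetical: `ProvStore` is an in-memory stand-in for the provenance server, its `send` method is not a real API, and the cleaning rule is invented for illustration. Each record follows the {xi, x’i, expli} shape.

```python
from dataclasses import dataclass, field


@dataclass
class ProvStore:
    # In-memory stand-in for the provenance server (Prov-DB).
    records: list = field(default_factory=list)

    def send(self, operator, x, x_new, expl):
        # One record per changed item: original, replacement, explanation.
        self.records.append({"operator": operator, "x": x, "x_new": x_new, "expl": expl})


def clean_negative_ages(rows, store):
    # Hypothetical cleaner: replaces negative ages with None and explains each change.
    out = []
    for r in rows:
        if r["Age"] is not None and r["Age"] < 0:
            fixed = {**r, "Age": None}
            store.send("clean_negative_ages", r, fixed, "Age < 0 is invalid; set to null")
            out.append(fixed)
        else:
            out.append(r)
    return out


store = ProvStore()
D = [{"Age": 30}, {"Age": -1}]
D_clean = clean_negative_ages(D, store)
```

The explanation is generated inside the operator, because only the transformation logic knows why an item was changed. This is the cost of Layer III compared with capture based purely on observing inputs and outputs.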
  • 54. 56 Layer III provenance: Preliminary ideas We frame the general provenance granularity problem in terms of two orthogonal dimensions: - data derivation, from dataset to item level - detail of processor behaviour, from class to internal logic. [Grid: data detail (dataset, item) against processor detail (class level, logic level). Class level, dataset: transformation D → D’; selection D → D’ ⊆ D. Class level, item: transformation { x → x’ }, x ∈ D, x’ ∈ D’; selection { x ∈ D | 𝜎(x’) = True }. Logic level, dataset: processor logic. Logic level, item: Why x? (transformation, selection); Why x’? (transformation, augmentation).]
  • 55. 57 Processor logic at dataset level [Diagram: iterative cleaning loop over nodes Ai, Ti, Mi, Di, Ci-i, CTi, Mi-1, Di-1, connected by used and wgby (wasGeneratedBy) edges; legend: Cleaning targets, Assessment, Training, Cleaning, Model.]
  • 56. 58 Processor logic at item level “why did the assessor 𝐴 choose 𝑥 for cleaning?” “how did the cleaner 𝐶 choose the replacement value?” “why did 𝑥 ∈ 𝐷 get selected for removal from the training set?”
  • 57. 59 A possible vocabulary / library: DC-Check [1] Seedat, Nabeel, Fergus Imrie, and Mihaela van der Schaar. ‘DC-Check: A Data-Centric AI Checklist to Guide the Development of Reliable Machine Learning Systems’. arXiv, 9 November 2022. http://arxiv.org/abs/2211.05764.
  • 58. 62 Summary of goals and action plan [Diagram labels: problem instances, Prov-DB, Data, Training, Ops, Enable reuse, Observe / record, Reproduce / explain, Curated Data toolkit] Goals: to support • Reusability and emerging best practices for complex data intervention + usage patterns • Reproducibility, explainability of pipeline instances How: - Enable data processing observations / capture - Build a curated catalogue of interventions + usage patterns - Associate provenance with data + model versions Challenges: - Observability: instrumenting a common runtime for transparent capture - Granularity: pick a layer (I-II-III): precision vs scalability → how much do we need? - “why?” vocabulary and language for expressing explanations
  • 59. 63 Summary of references [1] Seedat, Nabeel, Fergus Imrie, and Mihaela van der Schaar. ‘DC-Check: A Data-Centric AI Checklist to Guide the Development of Reliable Machine Learning Systems’. arXiv, 9 November 2022. http://arxiv.org/abs/2211.05764. [2] Mohammad Hossein Jarrahi, Ali Memariani, and Shion Guha. 2023. The Principles of Data-Centric AI. Commun. ACM 66, 8 (August 2023), 84–92. https://doi.org/10.1145/3571724 [3] Zha, Daochen, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, and Xia Hu. ‘Data-Centric AI: Perspectives and Challenges’. arXiv, 2 April 2023. http://arxiv.org/abs/2301.04819. [4] Zha, Daochen, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Zhimeng Jiang, Shaochen Zhong, and Xia Hu. ‘Data-Centric Artificial Intelligence: A Survey’. arXiv, 11 June 2023. https://doi.org/10.48550/arXiv.2303.10158. [5] Singh, Prerna. ‘Systematic Review of Data-Centric Approaches in Artificial Intelligence and Machine Learning’. Data Science and Management 6, no. 3 (1 September 2023): 144–57. https://doi.org/10.1016/j.dsm.2023.06.001. [6] Neutatz, Felix, et al. "From Cleaning before ML to Cleaning for ML." IEEE Data Eng. Bull. 44.1 (2021): 24-41. [7] Mazumder, Mark, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, et al. ‘DataPerf: Benchmarks for Data-Centric AI Development’. arXiv, 13 October 2023. https://doi.org/10.48550/arXiv.2207.10062. [8] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020. [9] Abbas, Amro, Kushal Tirumala, Dániel Simig, Surya Ganguli, and Ari S. Morcos. ‘SemDeDup: Data-Efficient Learning at Web-Scale through Semantic Deduplication’. arXiv, 22 March 2023. http://arxiv.org/abs/2303.09540. [10] Sorscher, Ben, et al. ‘Beyond neural scaling laws: beating power law scaling via data pruning’. Advances in Neural Information Processing Systems 35 (2022): 19523-19536. [11] A. Chapman, P. Missier, G. Simonelli, and R. Torlone. 2020. Capturing and querying fine-grained provenance of preprocessing pipelines in data science. Proc. VLDB Endow. 14, 4 (December 2020), 507–520. https://doi.org/10.14778/3436905.3436911 [12] A. Chapman, L. Lauro, P. Missier, and R. Torlone. 2022. DPDS: assisting data science with data provenance. Proc. VLDB Endow. 15, 12 (2022), 3614–3617. https://doi.org/10.14778/3554821.3554857