Reconciling Conflicting Data Curation Actions:
Transparency through Argumentation
Yilin Xia (yilinx2@illinois.edu)
Shawn Bowers (bowers@gonzaga.edu)
Lan Li (lanl2@illinois.edu)
Bertram Ludäscher (ludaesch@illinois.edu)
Data Cleaning: the story so far …
● 80% of data science is data wrangling … (or so they say)
● Interactive data cleaning (e.g., Excel, OpenRefine, …)
● Script-based (e.g., Python/pandas, R, …)
● Single-user/single-curator setting (… only the lonely …)
● Multi-user/multi-curator collaboration (… friends …)
Collaborative Data Cleaning: Pros & possible Cons
Joining forces & pooling expertise
→ higher throughput (efficiency)
→ higher data quality output
But also …
→ Need to coordinate more (e.g., vertical- and/or horizontal splitting, ...)
→ Need to resolve conflicts / disputes
→ Cost of collaboration
Collaborative Data Cleaning Part-I: Provenance + Expert Merge
Collaborative DC Provenance Model (CDCM)
Expert Recipe Merge
Ross Loretta
Whole Team > Sum of Members?
● Before: Expert coordinator, merging bits & pieces of data cleaning recipes
● Alternative: Tightly-coupled, well-planned collaboration (“eager”)
● New proposal: Loosely-coupled or ad-hoc collaboration (“lazy”) + automated conflict-resolution strategy
[Figure: Ross + Loretta < Team Rosetta]
Loosely-Coupled Multi-Curator Data Cleaning Example
Book Title | Author | Date
Against Method | Feyerabend, P. | 1975
Changing Order | Collins, H.M. | ␣␣1985␣
Exceeding Our Grasp | P. Kyle Stanford | 2006
Theory of Information | | 1992
Wrangling Goal: Create an APA-style in-text citation based on the given dataset D
Curators: Ross and Loretta
Data Cleaning Actions (Transformation Model)
cell_edit(row_id, column_name, new_value) | Cell-Level
del_row(row_id) | Row-Level
del_col(column_name) | Column-Level
split_col(column_name, separator) | Column-Level
transform(column_name, function) | Column-Level
join_col(set_of_column_names, separator, new_column_name) | Column-Level
rename(column_name, new_column_name) | Column-Level
… | …
OpenRefine
Data Cleaning Actions → Recipes
Recipe 1
Step | Action
1 | rename("Book Title", "Book-Title")
2 | cell_edit(3, "Author", "Stanford, P.")
3 | transform("Date", "value.toNumber()")
4 | del_row(4)
5 | split_col("Author", ",")
6 | del_col("Author 2")
7 | join_col("Author 1", "Date", ",", "Citation")

Recipe 2
Step | Action
1 | rename("Book Title", "Book_Title")
2 | transform("Date", "value.trim()")
3 | cell_edit(4, "Author", "Shannon, C.E.")
4 | cell_edit(3, "Author", "Stanford, P.K.")
5 | split_col("Author", ",")
6 | rename("Author 1", "Last Name")
7 | rename("Author 2", "First Name")
8 | join_col("Last Name", "Date", ",", "Citation")
Data Cleaning Actions → Recipes
Recipe 1
Step | Action
E | rename("Book Title", "Book-Title")
F | cell_edit(3, "Author", "Stanford, P.")
G | transform("Date", "value.toNumber()")
H | del_row(4)
I | split_col("Author", ",")
J | del_col("Author 2")
K | join_col("Author 1", "Date", ",", "Citation")

Recipe 2
Step | Action
L | rename("Book Title", "Book_Title")
M | transform("Date", "value.trim()")
N | cell_edit(4, "Author", "Shannon, C.E.")
O | cell_edit(3, "Author", "Stanford, P.K.")
P | split_col("Author", ",")
Q | rename("Author 1", "Last Name")
R | rename("Author 2", "First Name")
S | join_col("Last Name", "Date", ",", "Citation")
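To make the two recipes concrete, here is a minimal Python sketch that replays them on the example dataset D. It is not the authors' implementation and it does not call OpenRefine. The list-of-dicts table, the fixed 1-based row ids, the Python stand-ins for the GREL expressions `value.trim()` and `value.toNumber()`, and the choice to strip whitespace from split parts are all assumptions made for illustration.

```python
import copy

# Dataset D from the example slide; "id" is an assumed fixed, 1-based row id.
DATA = [
    {"id": 1, "Book Title": "Against Method", "Author": "Feyerabend, P.", "Date": "1975"},
    {"id": 2, "Book Title": "Changing Order", "Author": "Collins, H.M.", "Date": "  1985 "},
    {"id": 3, "Book Title": "Exceeding Our Grasp", "Author": "P. Kyle Stanford", "Date": "2006"},
    {"id": 4, "Book Title": "Theory of Information", "Author": "", "Date": "1992"},
]

# Assumed Python equivalents of the GREL expressions used in the recipes.
FUNCTIONS = {
    "value.trim()": lambda v: v.strip(),
    "value.toNumber()": lambda v: int(v),
}

def rename(rows, old, new):
    for r in rows:
        r[new] = r.pop(old)

def cell_edit(rows, row_id, col, value):
    for r in rows:
        if r["id"] == row_id:
            r[col] = value

def transform(rows, col, expr):
    fn = FUNCTIONS[expr]
    for r in rows:
        r[col] = fn(r[col])

def del_row(rows, row_id):
    rows[:] = [r for r in rows if r["id"] != row_id]

def del_col(rows, col):
    for r in rows:
        r.pop(col, None)

def split_col(rows, col, sep):
    # Assumption: keep the source column, add "<col> 1" and "<col> 2", strip parts.
    for r in rows:
        parts = [p.strip() for p in r[col].split(sep, 1)]
        parts += [""] * (2 - len(parts))
        r[f"{col} 1"], r[f"{col} 2"] = parts

def join_col(rows, *args):
    # Arguments: one or more source columns, then separator, then new column name.
    *cols, sep, new = args
    for r in rows:
        r[new] = sep.join(str(r[c]) for c in cols)

RECIPE_1 = [
    (rename, "Book Title", "Book-Title"),
    (cell_edit, 3, "Author", "Stanford, P."),
    (transform, "Date", "value.toNumber()"),
    (del_row, 4),
    (split_col, "Author", ","),
    (del_col, "Author 2"),
    (join_col, "Author 1", "Date", ",", "Citation"),
]

RECIPE_2 = [
    (rename, "Book Title", "Book_Title"),
    (transform, "Date", "value.trim()"),
    (cell_edit, 4, "Author", "Shannon, C.E."),
    (cell_edit, 3, "Author", "Stanford, P.K."),
    (split_col, "Author", ","),
    (rename, "Author 1", "Last Name"),
    (rename, "Author 2", "First Name"),
    (join_col, "Last Name", "Date", ",", "Citation"),
]

def run(recipe, data):
    """Apply each step in order to a private copy of the data."""
    rows = copy.deepcopy(data)
    for op, *args in recipe:
        op(rows, *args)
    return rows

citations_1 = [r["Citation"] for r in run(RECIPE_1, DATA)]
citations_2 = [r["Citation"] for r in run(RECIPE_2, DATA)]
print(citations_1)
print(citations_2)
```

Because the separator `","` is taken literally, the joined citations in this sketch contain no space after the comma. Recipe 1 yields three citations (row 4 is deleted), while Recipe 2 yields four (row 4 is repaired instead).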
Data Cleaning Results
Result of Recipe 1
Book-Title | Author | Date | Author 1 | Citation
Against Method | Feyerabend, P. | 1975 | Feyerabend | Feyerabend, 1975
Changing Order | Collins, H.M. | 1985 | Collins | Collins, 1985
Exceeding Our Grasp | Stanford, P. | 2006 | Stanford | Stanford, 2006
Theory of Information | | 1992 | |

Result of Recipe 2
Book_Title | Author | Date | Last Name | First Name | Citation
Against Method | Feyerabend, P. | 1975 | Feyerabend | P. | Feyerabend, 1975
Changing Order | Collins, H.M. | 1985 | Collins | H.M. | Collins, 1985
Exceeding Our Grasp | Stanford, P.K. | 2006 | Stanford | P.K. | Stanford, 2006
Theory of Information | Shannon, C.E. | 1992 | Shannon | C.E. | Shannon, 1992

Annotations:
rename("Book Title", "Book-Title") vs. rename("Book Title", "Book_Title")
del_row(4)
transform("Date", "value.toNumber()") vs. transform("Date", "value.trim()")
Modeling Data Cleaning Conflicts
Execution Order Data Cleaning Actions
Attack Relationship
defeated(𝑋) ← attacks(𝑌, 𝑋), ¬ defeated(𝑌).
Operation Attack Relation (example, one of many)
Attack Relationship (A = row operation, B = column operation)
A \ B | update(r,c,v1) | del_row(r) | del_col(c) | split_col(c,sp1) | transform(c,F1) | join_col(c,...ci,sp1,cn1) | rename(c,c1)
update(r,c,v2) | A ⟷ B | | | | | |
del_row(r) | A ⟶ B | ∅ | | | | |
del_col(c) | A ⟶ B | ∅ | ∅ | | | |
split_col(c,sp2) | A ⟵ B | ∅ | A ⟵ B | A ⟷ B | | |
transform(c,F2) | A ⟷ B | ∅ | A ⟵ B | A ⟶ B | A ⟷ B | |
join_col(c,...ci,sp2,cn2) | A ⟵ B | ∅ | A ⟵ B | ∅ | A ⟵ B | A ⟷ B |
rename(c,c2) | A ⟶ B | ∅ | A ⟷ B | A ⟶ B | A ⟶ B | A ⟶ B | A ⟷ B
Describe whether/how operations A and B are in conflict with each other
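The attack table can be read as a lookup keyed by a pair of operation kinds. The sketch below transcribes the lower triangle of the slide's table and fills in the upper triangle by flipping the arrow. It assumes that A is the row operation and B the column operation, and that both operations target the same row or column, as the shared parameters r and c suggest. The names `OPS`, `LOWER`, `FLIP`, and `relation` are hypothetical helpers, not part of the authors' code.

```python
# Operation kinds in the order used by the table's rows and columns.
OPS = ["update", "del_row", "del_col", "split_col", "transform", "join_col", "rename"]

# Lower triangle of the table: LOWER[row_op][k] is the cell for column OPS[k].
# "A->B": A attacks B, "A<-B": B attacks A, "A<->B": mutual attack, None: no conflict.
LOWER = {
    "update":    ["A<->B"],
    "del_row":   ["A->B", None],
    "del_col":   ["A->B", None, None],
    "split_col": ["A<-B", None, "A<-B", "A<->B"],
    "transform": ["A<->B", None, "A<-B", "A->B", "A<->B"],
    "join_col":  ["A<-B", None, "A<-B", None, "A<-B", "A<->B"],
    "rename":    ["A->B", None, "A<->B", "A->B", "A->B", "A->B", "A<->B"],
}

# Swapping the roles of A and B reverses a one-way attack.
FLIP = {"A->B": "A<-B", "A<-B": "A->B", "A<->B": "A<->B", None: None}

def relation(a, b):
    """Attack relation with operation kind `a` playing A and `b` playing B."""
    i, j = OPS.index(a), OPS.index(b)
    if j <= i:
        return LOWER[a][j]
    # The cell is stored with the roles swapped, so flip its direction.
    return FLIP[LOWER[b][i]]

print(relation("del_row", "update"))    # a row deletion vs. a cell edit in that row
print(relation("update", "split_col"))  # a cell edit vs. a split of that column
```

A conflict detector would additionally have to check that the two actions really refer to the same row or column before consulting the table. That check is left out here.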
Example Data Cleaning Conflicts
Attack | Description
E ↔ L | rename("Book Title", "Book-Title") ↔ rename("Book Title", "Book_Title")
H → N | del_row(4) → cell_edit(4, "Author", "Shannon, C.E.")
F → P | cell_edit(3, "Author", "Stanford, P.") → split_col("Author", ",")
… | …
defeated(𝑋) ← attacks(𝑌, 𝑋), ¬ defeated(𝑌).
Formal Argumentation
BBC4 Moral Maze
Modeling Conflict: Argumentation Frameworks
defeated(𝑋) ← attacks(𝑌, 𝑋), ¬ defeated(𝑌).
[Figure: attack graph with a accepted, b defeated, c and d undecided]
1. a isn’t attacked at all
2. ⇒ a is accepted
3. a attacks b
4. ⇒ b is defeated
5. ⇒ b's attack on c can be ignored
6. c and d attack each other
7. ⇒ the status of c and d is undecided
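The seven steps above can be sketched as a small fixpoint computation. The code below is an illustrative Python version of the standard grounded (well-founded) labelling, not the authors' implementation, which is rule-based. The function name `grounded_labels` and the edge-list encoding are assumptions.

```python
def grounded_labels(arguments, attacks):
    """Label each argument accepted, defeated, or undecided (grounded semantics)."""
    attackers = {a: set() for a in arguments}
    for src, dst in attacks:
        attackers[dst].add(src)

    label = {}
    changed = True
    while changed:
        changed = False
        for x in arguments:
            if x in label:
                continue
            # Accepted: every attacker is already defeated (or there is none).
            if all(label.get(y) == "defeated" for y in attackers[x]):
                label[x] = "accepted"
                changed = True
            # Defeated: at least one attacker is accepted.
            elif any(label.get(y) == "accepted" for y in attackers[x]):
                label[x] = "defeated"
                changed = True
    # Whatever could not be decided stays undecided.
    return {x: label.get(x, "undecided") for x in arguments}

# The example from the slide: a attacks b, b attacks c, c and d attack each other.
ARGS = ["a", "b", "c", "d"]
ATTACKS = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")]
labels = grounded_labels(ARGS, ATTACKS)
print(labels)
```

The loop only ever adds labels, so it terminates after at most one pass per argument.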
Solving Conflict: Argumentation Frameworks (AF)
Input: AF (attack graph)
Output: solved AF
defeated(𝑋) ⇐ attacks(𝑌, 𝑋), not defeated(𝑌).
Argument X is defeated if it is attacked by Y and Y is not defeated.
Refined Conflict Analysis: Stable Models (Extensions)
Well-founded Solution (“skeptical” reasoning)
Stable Solution 1 (“brave” reasoning)
Stable Solution 2 (“brave” reasoning)
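To illustrate how stable solutions refine the undecided part, the sketch below enumerates stable extensions by brute force: a set of arguments is stable if it is conflict-free and attacks every argument outside it. It reuses the four-argument example (a attacks b, b attacks c, c and d attack each other). That this is the graph shown on this slide is an assumption, and the exhaustive search is only practical for small graphs.

```python
from itertools import combinations

def stable_extensions(arguments, attacks):
    """Return all stable extensions of the attack graph, smallest sets first."""
    attacks = set(attacks)
    result = []
    for k in range(len(arguments) + 1):
        for subset in combinations(arguments, k):
            s = set(subset)
            # No member of the set attacks another member (or itself).
            conflict_free = not any((x, y) in attacks for x in s for y in s)
            # Every argument outside the set is attacked by some member.
            attacks_rest = all(
                any((x, y) in attacks for x in s)
                for y in arguments if y not in s
            )
            if conflict_free and attacks_rest:
                result.append(s)
    return result

ARGS = ["a", "b", "c", "d"]
ATTACKS = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")]
extensions = stable_extensions(ARGS, ATTACKS)
print(extensions)
```

Each stable extension resolves the c/d standoff one way or the other, while a stays accepted and b stays defeated in both.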
Solving Ross + Loretta (= Rosetta) ad-hoc “collaboration”
Yilin Xia, Shawn Bowers, Lan Li and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository.
https://github.com/idaks/Games-and-Argumentation/tree/idcc
Refined Solution (Stable Model/Stable Extension)
Refined Solution put back in Recipe Order
Step | Action | Curator
E | rename("Book Title", "Book-Title") | Alice
M | transform("Date", "value.trim()") | Bob
H | del_row(4) | Alice
O | cell_edit(3, "Author", "Stanford, P.K.") | Bob
P | split_col("Author", ",") | Bob
J | del_col("Author 2") | Alice
Q | rename("Author 1", "Last Name") | Bob
S | join_col("Last Name", "Date", ",", "Citation") | Bob
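One way to put an accepted set of steps back into recipe order is to sort the steps by their position in their original recipe and break ties by recipe. The sketch below does this for the accepted steps listed above. The slides do not state the ordering rule, so this rule is an assumption that happens to reproduce the order shown. The function name `merge_in_recipe_order` is hypothetical.

```python
# Step labels of the two source recipes, in execution order.
RECIPE_1 = ["E", "F", "G", "H", "I", "J", "K"]
RECIPE_2 = ["L", "M", "N", "O", "P", "Q", "R", "S"]

# Steps accepted in the refined solution, as listed on the slide.
ACCEPTED = {"E", "M", "H", "O", "P", "J", "Q", "S"}

def merge_in_recipe_order(recipes, accepted):
    """Order accepted steps by (position in own recipe, recipe index)."""
    keyed = []
    for recipe_index, recipe in enumerate(recipes):
        for position, step in enumerate(recipe, start=1):
            if step in accepted:
                keyed.append((position, recipe_index, step))
    return [step for _, _, step in sorted(keyed)]

merged = merge_in_recipe_order([RECIPE_1, RECIPE_2], ACCEPTED)
print(merged)
```

Sorting by position keeps each curator's own steps in their original relative order, which matters here because the split must run before the steps that use the columns it creates.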
Et voilà! The merged recipe and combined solution!
Conclusions (Work in Progress) & Future Work
An approach based on formal argumentation frameworks for
- modeling the actions of users’ data-cleaning recipes
- identifying conflicting actions across recipes
- providing users with new tools to help resolve these conflicts and generate a single, unified, merged recipe
An algorithm helps to process recipes and resolve conflicts automatically
Take dependencies into account when modeling
Explore criteria that can be used to evaluate possible merged recipes

A presentation that explain the Power BI Licensing
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 

Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation

  • 1. Reconciling Conflicting Data Curation Actions: Transparency through Argumentation
      Yilin Xia (yilinx2@illinois.edu), Shawn Bowers (bowers@gonzaga.edu), Lan Li (lanl2@illinois.edu), Bertram Ludäscher (ludaesch@illinois.edu)
  • 2. Data Cleaning: the story so far …
      ● 80% of data science is data wrangling … (or so they say)
      ● Interactive data cleaning (e.g., Excel, OpenRefine, …)
      ● Script-based (e.g., Python/pandas, R, …)
      ● Single-user/single-curator setting (… only the lonely …)
      ● Multi-user/multi-curator collaboration (… friends …)
  • 3. Collaborative Data Cleaning: Pros & possible Cons
      Joining forces & pooling expertise
        → higher throughput (efficiency)
        → higher data quality output
      But also …
        → Need to coordinate more (e.g., vertical and/or horizontal splitting, ...)
        → Need to resolve conflicts / disputes
        → Cost of collaboration
  • 4. Collaborative Data Cleaning Part-I: Provenance + Expert Merge
      (Figure labels: Collaborative DC Provenance Model (CDCM); Expert Recipe Merge)
  • 5. Whole Team > Sum of Members?
      ● Before: Expert coordinator, merging bits & pieces of data cleaning recipes
      ● Alternative: Tightly-coupled, well-planned collaboration (“eager”)
      ● New proposal: Loosely-coupled or ad-hoc collaboration (“lazy”) + automated conflict-resolution strategy
      (Figure labels: Ross + Loretta < Rosetta Team)
  • 6. Loosely-Coupled Multi-Curator Data Cleaning Example
      Book Title            | Author           | Date
      Against Method        | Feyerabend, P.   | 1975
      Changing Order        | Collins, H.M.    | ␣␣1985 ␣
      Exceeding Our Grasp   | P. Kyle Stanford | 2006
      Theory of Information |                  | 1992
      Wrangling Goal: Create an APA-style in-text citation based on the given dataset D
      (Figure labels: Ross, Loretta)
  • 7. Data Cleaning Actions (Transformation Model)
      Action                                                      | Level
      cell_edit(row_id, column_name, new_value)                   | Cell-Level
      del_row(row_id)                                             | Row-Level
      del_col(column_name)                                        | Column-Level
      split_col(column_name, separator)                           | Column-Level
      transform(column_name, function)                            | Column-Level
      join_col(set_of_column_names, separator, new_column_name)   | Column-Level
      rename(column_name, new_column_name)                        | Column-Level
      …                                                           | …
      (Figure label: OpenRefine)
  • 8. Data Cleaning Actions → Recipes
      Recipe 1
      Step | Action
      1    | rename("Book Title", "Book-Title")
      2    | cell_edit(3, "Author", "Stanford, P.")
      3    | transform("Date", "value.toNumber()")
      4    | del_row(4)
      5    | split_col("Author", ",")
      6    | del_col("Author 2")
      7    | join_col("Author 1", "Date", ",", "Citation")
      Recipe 2
      Step | Action
      1    | rename("Book Title", "Book_Title")
      2    | transform("Date", "value.trim()")
      3    | cell_edit(4, "Author", "Shannon, C.E.")
      4    | cell_edit(3, "Author", "Stanford, P.K.")
      5    | split_col("Author", ",")
      6    | rename("Author 1", "Last Name")
      7    | rename("Author 2", "First Name")
      8    | join_col("Last Name", "Date", ",", "Citation")
  • 9. Data Cleaning Actions → Recipes (steps relabeled E to S)
      Recipe 1
      Step | Action
      E    | rename("Book Title", "Book-Title")
      F    | cell_edit(3, "Author", "Stanford, P.")
      G    | transform("Date", "value.toNumber()")
      H    | del_row(4)
      I    | split_col("Author", ",")
      J    | del_col("Author 2")
      K    | join_col("Author 1", "Date", ",", "Citation")
      Recipe 2
      Step | Action
      L    | rename("Book Title", "Book_Title")
      M    | transform("Date", "value.trim()")
      N    | cell_edit(4, "Author", "Shannon, C.E.")
      O    | cell_edit(3, "Author", "Stanford, P.K.")
      P    | split_col("Author", ",")
      Q    | rename("Author 1", "Last Name")
      R    | rename("Author 2", "First Name")
      S    | join_col("Last Name", "Date", ",", "Citation")
  • 10. Data Cleaning Results
      Result of Recipe 1
      Book-Title            | Author         | Date | Author 1   | Citation
      Against Method        | Feyerabend, P. | 1975 | Feyerabend | Feyerabend, 1975
      Changing Order        | Collins, H.M.  | 1985 | Collins    | Collins, 1985
      Exceeding Our Grasp   | Stanford, P.   | 2006 | Stanford   | Stanford, 2006
      Theory of Information |                | 1992 |            |
      Result of Recipe 2
      Book_Title            | Author         | Date | Last Name  | First Name | Citation
      Against Method        | Feyerabend, P. | 1975 | Feyerabend | P.         | Feyerabend, 1975
      Changing Order        | Collins, H.M.  | 1985 | Collins    | H.M.       | Collins, 1985
      Exceeding Our Grasp   | Stanford, P.K. | 2006 | Stanford   | P.K.       | Stanford, 2006
      Theory of Information | Shannon, C.E.  | 1992 | Shannon    | C.E.       | Shannon, 1992
      Highlighted actions:
        rename("Book Title", "Book-Title") vs. rename("Book Title", "Book_Title")
        del_row(4)
        transform("Date", "value.toNumber()") vs. transform("Date", "value.trim()")
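The following is a minimal sketch of how the actions of the transformation model could be interpreted over a small in-memory table, replaying Recipe 2 on the example dataset. Several points are assumptions rather than anything stated on the slides: rows are keyed by their original 1-based row_id, split_col keeps the source column and trims whitespace from each part, transform takes a Python callable instead of a GREL expression string, and join_col is called with ", " (the slide writes "," but the result table shows a space after the comma). All function names mirror the slide's action names but are otherwise hypothetical.

```python
# Hypothetical interpreter for the slide's action names; not the authors' implementation.
Table = dict[int, dict[str, str]]  # row_id -> {column name: cell value}

def make_table() -> Table:
    rows = [
        ("Against Method", "Feyerabend, P.", "1975"),
        ("Changing Order", "Collins, H.M.", "  1985 "),  # untrimmed date, as on slide 6
        ("Exceeding Our Grasp", "P. Kyle Stanford", "2006"),
        ("Theory of Information", "", "1992"),
    ]
    return {i: {"Book Title": t, "Author": a, "Date": d}
            for i, (t, a, d) in enumerate(rows, start=1)}

def cell_edit(t: Table, row_id: int, col: str, value: str) -> None:
    t[row_id][col] = value

def del_row(t: Table, row_id: int) -> None:
    del t[row_id]

def del_col(t: Table, col: str) -> None:
    for row in t.values():
        del row[col]

def rename(t: Table, col: str, new_col: str) -> None:
    for row in t.values():
        row[new_col] = row.pop(col)

def transform(t: Table, col: str, fn) -> None:
    for row in t.values():
        row[col] = fn(row[col])

def split_col(t: Table, col: str, sep: str) -> None:
    # Assumption: new columns are named "<col> 1", "<col> 2", ... and short rows are padded.
    parts = {rid: [p.strip() for p in row[col].split(sep)] for rid, row in t.items()}
    width = max((len(p) for p in parts.values()), default=0)
    for rid, ps in parts.items():
        for i in range(width):
            t[rid][f"{col} {i + 1}"] = ps[i] if i < len(ps) else ""

def join_col(t: Table, cols: list[str], sep: str, new_col: str) -> None:
    for row in t.values():
        row[new_col] = sep.join(row[c] for c in cols)

def run_recipe_2() -> Table:
    t = make_table()
    rename(t, "Book Title", "Book_Title")
    transform(t, "Date", str.strip)  # stands in for "value.trim()"
    cell_edit(t, 4, "Author", "Shannon, C.E.")
    cell_edit(t, 3, "Author", "Stanford, P.K.")
    split_col(t, "Author", ",")
    rename(t, "Author 1", "Last Name")
    rename(t, "Author 2", "First Name")
    join_col(t, ["Last Name", "Date"], ", ", "Citation")
    return t

if __name__ == "__main__":
    for row in run_recipe_2().values():
        print(row["Citation"])
```

Under these assumptions the printed citations agree with the Citation column of the Recipe 2 result table.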
  • 11. Modeling Data Cleaning Conflicts
      (Figure labels: Execution Order; Data Cleaning Actions; Attack Relationship)
      defeated(𝑋) ← attacks(𝑌, 𝑋), ¬ defeated(𝑌).
  • 12. Operation Attack Relation (example, one of many)
      Describes whether/how operations A and B are in conflict with each other.
      Attack Relationship (rows: A, columns: B)
      A \ B                      | update(r,c,v1) | del_row(r) | del_col(c) | split_col(c,sp1) | transform(c,F1) | join_col(c,...ci,sp1,cn1) | rename(c,c1)
      update(r,c,v2)             | A ⟷ B          |            |            |                  |                 |                           |
      del_row(r)                 | A ⟶ B          | ∅          |            |                  |                 |                           |
      del_col(c)                 | A ⟶ B          | ∅          | ∅          |                  |                 |                           |
      split_col(c,sp2)           | A ⟵ B          | ∅          | A ⟵ B      | A ⟷ B            |                 |                           |
      transform(c,F2)            | A ⟷ B          | ∅          | A ⟵ B      | A ⟶ B            | A ⟷ B           |                           |
      join_col(c,...ci,sp2,cn2)  | A ⟵ B          | ∅          | A ⟵ B      | ∅                | A ⟵ B           | A ⟷ B                     |
      rename(c,c2)               | A ⟶ B          | ∅          | A ⟷ B      | A ⟶ B            | A ⟶ B           | A ⟶ B                     | A ⟷ B
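One way to make the attack relation above machine-readable is a lower-triangular lookup table. The sketch below is an illustration only. It assumes that rows are operation A and columns are operation B (an orientation inferred from the example attacks on the next slide, not stated explicitly), and it does not check that the two actions touch the same row r or column c, which the slide's notation presupposes. The names KINDS, MATRIX and attacks are hypothetical.

```python
# Operation kinds in the order used by the slide's matrix.
KINDS = ["update", "del_row", "del_col", "split_col", "transform", "join_col", "rename"]

# Lower-triangular matrix copied from the slide; row = A, column = B (assumed orientation).
# "->": A attacks B, "<-": B attacks A, "<->": mutual attack, None: no conflict (∅).
MATRIX = [
    ["<->"],
    ["->", None],
    ["->", None, None],
    ["<-", None, "<-", "<->"],
    ["<->", None, "<-", "->", "<->"],
    ["<-", None, "<-", None, "<-", "<->"],
    ["->", None, "<->", "->", "->", "->", "<->"],
]

def attacks(x: str, y: str) -> bool:
    """Return True if an operation of kind x attacks one of kind y.

    Assumes both operations target the same row/column; that check is omitted here.
    """
    i, j = KINDS.index(x), KINDS.index(y)
    if i >= j:
        # x plays the role of A (row), y the role of B (column).
        return MATRIX[i][j] in ("->", "<->")
    # Only the lower triangle is stored, so swap roles: y is A, x is B.
    return MATRIX[j][i] in ("<-", "<->")

if __name__ == "__main__":
    print(attacks("update", "split_col"))  # cell_edit vs. split_col on the same column
    print(attacks("rename", "join_col"))
```

With this encoding, a cell edit attacks a later split of the same column but not the other way around, and two renames of the same column attack each other.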
  • 13. Example Data Cleaning Conflicts
      Attack | Description
      E ↔ L  | rename("Book Title", "Book-Title") ↔ rename("Book Title", "Book_Title")
      H → N  | del_row(4) → cell_edit(4, "Author", "Shannon, C.E.")
      F → P  | cell_edit(3, "Author", "Stanford, P.") → split_col("Author", ",")
      …      | …
      defeated(𝑋) ← attacks(𝑌, 𝑋), ¬ defeated(𝑌).
  • 14. Formal Argumentation
      (Image: BBC4 Moral Maze)
  • 15. Modeling Conflict: Argumentation Frameworks
      defeated(𝑋) ← attacks(𝑌, 𝑋), ¬ defeated(𝑌).
      (Figure labels: a accepted, b defeated, c undecided, d undecided)
      1. a isn’t attacked at all
      2. ⇒ a is accepted
      3. a attacks b
      4. ⇒ b is defeated
      5. ⇒ the attack of b on c can be ignored
      6. c and d attack each other
      7. ⇒ the status of c and d is undecided
  • 16. Solving Conflict: Argumentation Frameworks (AF)
      Input: AF (attack graph) → Output: solved AF
      defeated(𝑋) ← attacks(𝑌, 𝑋), ¬ defeated(𝑌).
      Argument X is defeated if it is attacked by Y and Y is not defeated.
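The rule above can be evaluated by a simple fixpoint iteration. The sketch below is one possible way to do so, not the authors' solver: it labels an argument accepted once all of its attackers are defeated, defeated once some attacker is accepted, and leaves everything else undecided. It is run on the four-argument example from slide 15 (a attacks b, b attacks c, c and d attack each other). The function name grounded_labelling is hypothetical.

```python
def grounded_labelling(arguments, attacks):
    """Label each argument 'accepted', 'defeated' or 'undecided'.

    attacks is an iterable of (attacker, target) pairs.
    """
    attackers = {x: set() for x in arguments}
    for y, x in attacks:
        attackers[x].add(y)

    accepted, defeated = set(), set()
    changed = True
    while changed:  # iterate until no label changes (a fixpoint)
        changed = False
        for x in arguments:
            if x in accepted or x in defeated:
                continue
            if attackers[x] <= defeated:
                # Every attacker is defeated (vacuously true if unattacked).
                accepted.add(x)
                changed = True
            elif attackers[x] & accepted:
                # Attacked by an argument that is not defeated.
                defeated.add(x)
                changed = True

    return {
        x: "accepted" if x in accepted else "defeated" if x in defeated else "undecided"
        for x in arguments
    }

ARGS = ["a", "b", "c", "d"]
ATTACKS = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")]

if __name__ == "__main__":
    print(grounded_labelling(ARGS, ATTACKS))
```

On this graph the result agrees with the walkthrough on slide 15: a is accepted, b is defeated, and c and d remain undecided.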
  • 17. Refined Conflict Analysis: Stable Models (Extensions)
      (Figure panels: Well-founded Solution, “skeptical” reasoning; Stable Solution 1, “brave” reasoning; Stable Solution 2, “brave” reasoning)
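Stable extensions resolve the arguments that the well-founded solution leaves undecided. As a rough illustration, and assuming the same four-argument graph as on slide 15 (the slide's figure may use a different graph), the brute-force sketch below enumerates every set of arguments that is conflict-free and attacks all arguments outside it. This is exponential in the number of arguments and is meant only to show the definition; the function name stable_extensions is hypothetical.

```python
from itertools import combinations

def stable_extensions(arguments, attacks):
    """Enumerate stable extensions: conflict-free sets that attack every outside argument."""
    attack_set = set(attacks)
    args = sorted(arguments)
    result = []
    for size in range(len(args) + 1):
        for subset in combinations(args, size):
            s = set(subset)
            # No member of s attacks another member (or itself).
            conflict_free = not any((x, y) in attack_set for x in s for y in s)
            # Every argument outside s is attacked by some member of s.
            attacks_rest = all(
                any((x, y) in attack_set for x in s) for y in set(args) - s
            )
            if conflict_free and attacks_rest:
                result.append(frozenset(s))
    return result

ARGS = ["a", "b", "c", "d"]
ATTACKS = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")]

if __name__ == "__main__":
    for ext in stable_extensions(ARGS, ATTACKS):
        print(sorted(ext))
```

For this graph there are two stable extensions, one accepting c and one accepting d, each of which also contains a.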
  • 18. Solving Ross + Loretta (= Rosetta) ad-hoc “collaboration”
      Yilin Xia, Shawn Bowers, Lan Li and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository. https://github.com/idaks/Games-and-Argumentation/tree/idcc
  • 19. Refined Solution (Stable Model/Stable Extension)
  • 20. Refined Solution put back in Recipe Order
      Step | Action                                         | Curator
      E    | rename("Book Title", "Book-Title")             | Alice
      M    | transform("Date", "value.trim()")              | Bob
      H    | del_row(4)                                     | Alice
      O    | cell_edit(3, "Author", "Stanford, P.K.")       | Bob
      P    | split_col("Author", ",")                       | Bob
      J    | del_col("Author 2")                            | Alice
      Q    | rename("Author 1", "Last Name")                | Bob
      S    | join_col("Last Name", "Date", ",", "Citation") | Bob
  • 21. Et voilà! The merged recipe and combined solution!
  • 22. Conclusions (Work in Progress) & Future Work
      An approach based on formal argumentation frameworks for
        - modeling the actions of users’ data-cleaning recipes
        - identifying conflicting actions across recipes
        - providing users with new tools to help resolve these conflicts and generate a single, unified, merged recipe
      An algorithm helps auto-process recipes and solve conflicts
      Take dependencies into account when modeling
      Explore criteria that can be used to evaluate possible merged recipes
  • 23. Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation
      Yilin Xia (yilinx2@illinois.edu), Shawn Bowers (bowers@gonzaga.edu), Lan Li (lanl2@illinois.edu), Bertram Ludäscher (ludaesch@illinois.edu)