Reconciling Conflicting Data Curation Actions:
Transparency through Argumentation
Yilin Xia (yilinx2@illinois.edu)
Shawn Bowers (bowers@gonzaga.edu)
Lan Li (lanl2@illinois.edu)
Bertram Ludäscher (ludaesch@illinois.edu)
Data Cleaning: the story so far …
● 80% of data science is data wrangling … (or so they say)
● Interactive data cleaning (e.g., Excel, OpenRefine, …)
● Script-based data cleaning (e.g., Python/pandas, R, …)
● Single-user/single-curator setting (… only the lonely …)
● Multi-user/multi-curator collaboration (… friends …)
Collaborative Data Cleaning: Pros & possible Cons
Joining forces & pooling expertise
⇒ higher throughput (efficiency)
⇒ higher-quality data output
But also …
⇒ need to coordinate more (e.g., vertical and/or horizontal splitting, …)
⇒ need to resolve conflicts/disputes
⇒ cost of collaboration
Collaborative Data Cleaning Part I: Provenance + Expert Merge
● Collaborative DC Provenance Model (CDCM)
● Expert Recipe Merge
Whole Team > Sum of Members?
● Before: an expert coordinator merges bits and pieces of data cleaning recipes
● Alternative: tightly coupled, well-planned collaboration (“eager”)
● New proposal: loosely coupled or ad hoc collaboration (“lazy”) + an automated conflict-resolution strategy
[Figure: curators Ross + Loretta form team “Rosetta”]
Loosely-Coupled Multi-Curator Data Cleaning Example
Row | Book Title | Author | Date
1 | Against Method | Feyerabend, P. | 1975
2 | Changing Order | Collins, H.M. | ␣␣1985 ␣
3 | Exceeding Our Grasp | P. Kyle Stanford | 2006
4 | Theory of Information | | 1992
(␣ marks a space character; row 4 has no author.)
Wrangling goal: create an APA-style in-text citation (e.g., “Feyerabend, 1975”) from the given dataset D.
[Figure: curators Ross and Loretta]
Data Cleaning Actions (Transformation Model)
Action | Level
cell_edit(row_id, column_name, new_value) | Cell-level
del_row(row_id) | Row-level
del_col(column_name) | Column-level
split_col(column_name, separator) | Column-level
transform(column_name, function) | Column-level
join_col(set_of_column_names, separator, new_column_name) | Column-level
rename(column_name, new_column_name) | Column-level
… | …
[Logo: OpenRefine]
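The action signatures above can be read as a small table-transformation API. The following Python sketch interprets them and replays Recipe 1 on dataset D. It is an illustration only, not the authors' implementation, and several details are assumptions: rows live in a dict keyed by stable 1-based row ids (so del_row never renumbers), split_col keeps the source column and appends columns named "Author 1", "Author 2", … with surrounding blanks stripped, new columns go at the end, and plain Python functions stand in for OpenRefine GREL expressions such as value.toNumber().

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Table:
    columns: List[str]
    rows: Dict[int, Dict[str, str]]  # stable 1-based row_id -> {column: value}


def cell_edit(t: Table, row_id: int, column: str, new_value: str) -> None:
    t.rows[row_id][column] = new_value


def del_row(t: Table, row_id: int) -> None:
    del t.rows[row_id]  # remaining row ids are not renumbered


def del_col(t: Table, column: str) -> None:
    t.columns.remove(column)
    for row in t.rows.values():
        del row[column]


def split_col(t: Table, column: str, separator: str) -> None:
    # Assumption: keep the source column and append "<column> 1", "<column> 2", ...
    parts = {rid: [p.strip() for p in row[column].split(separator)]
             for rid, row in t.rows.items()}
    width = max((len(p) for p in parts.values()), default=0)
    new_cols = [f"{column} {i + 1}" for i in range(width)]
    t.columns.extend(new_cols)
    for rid, row in t.rows.items():
        for i, name in enumerate(new_cols):
            row[name] = parts[rid][i] if i < len(parts[rid]) else ""


def transform(t: Table, column: str, fn: Callable[[str], str]) -> None:
    for row in t.rows.values():
        row[column] = fn(row[column])


def join_col(t: Table, columns: List[str], separator: str, new_column: str) -> None:
    t.columns.append(new_column)
    for row in t.rows.values():
        row[new_column] = separator.join(row[c] for c in columns)


def rename(t: Table, column: str, new_column: str) -> None:
    t.columns[t.columns.index(column)] = new_column
    for row in t.rows.values():
        row[new_column] = row.pop(column)


# Dataset D (row 2's date carries extra blanks; row 4 has no author).
D = Table(
    columns=["Book Title", "Author", "Date"],
    rows={
        1: {"Book Title": "Against Method", "Author": "Feyerabend, P.", "Date": "1975"},
        2: {"Book Title": "Changing Order", "Author": "Collins, H.M.", "Date": "  1985  "},
        3: {"Book Title": "Exceeding Our Grasp", "Author": "P. Kyle Stanford", "Date": "2006"},
        4: {"Book Title": "Theory of Information", "Author": "", "Date": "1992"},
    },
)

# Recipe 1; str(int(v)) is a stand-in for the GREL expression value.toNumber().
rename(D, "Book Title", "Book-Title")
cell_edit(D, 3, "Author", "Stanford, P.")
transform(D, "Date", lambda v: str(int(v)))
del_row(D, 4)
split_col(D, "Author", ",")
del_col(D, "Author 2")
join_col(D, ["Author 1", "Date"], ",", "Citation")

print([row["Citation"] for row in D.rows.values()])
# -> ['Feyerabend,1975', 'Collins,1985', 'Stanford,2006']
```

With the separator written exactly as in the recipe (",") the sketch yields "Feyerabend,1975"; passing ", " instead gives the APA form “Feyerabend, 1975”. The resulting column order (Book-Title, Author, Date, Author 1, Citation) matches the Recipe 1 result table.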
Data Cleaning Actions ⇒ Recipes
Recipe 1
Step | Action
1 | rename("Book Title", "Book-Title")
2 | cell_edit(3, "Author", "Stanford, P.")
3 | transform("Date", "value.toNumber()")
4 | del_row(4)
5 | split_col("Author", ",")
6 | del_col("Author 2")
7 | join_col("Author 1", "Date", ",", "Citation")

Recipe 2
Step | Action
1 | rename("Book Title", "Book_Title")
2 | transform("Date", "value.trim()")
3 | cell_edit(4, "Author", "Shannon, C.E.")
4 | cell_edit(3, "Author", "Stanford, P.K.")
5 | split_col("Author", ",")
6 | rename("Author 1", "Last Name")
7 | rename("Author 2", "First Name")
8 | join_col("Last Name", "Date", ",", "Citation")
Data Cleaning Actions ⇒ Recipes (steps labeled E through S)
Recipe 1
Step | Action
E | rename("Book Title", "Book-Title")
F | cell_edit(3, "Author", "Stanford, P.")
G | transform("Date", "value.toNumber()")
H | del_row(4)
I | split_col("Author", ",")
J | del_col("Author 2")
K | join_col("Author 1", "Date", ",", "Citation")

Recipe 2
Step | Action
L | rename("Book Title", "Book_Title")
M | transform("Date", "value.trim()")
N | cell_edit(4, "Author", "Shannon, C.E.")
O | cell_edit(3, "Author", "Stanford, P.K.")
P | split_col("Author", ",")
Q | rename("Author 1", "Last Name")
R | rename("Author 2", "First Name")
S | join_col("Last Name", "Date", ",", "Citation")
Data Cleaning Results
Result of Recipe 1
Book-Title | Author | Date | Author 1 | Citation
Against Method | Feyerabend, P. | 1975 | Feyerabend | Feyerabend, 1975
Changing Order | Collins, H.M. | 1985 | Collins | Collins, 1985
Exceeding Our Grasp | Stanford, P. | 2006 | Stanford | Stanford, 2006
Theory of Information | | 1992 | |

Result of Recipe 2
Book_Title | Author | Date | Last Name | First Name | Citation
Against Method | Feyerabend, P. | 1975 | Feyerabend | P. | Feyerabend, 1975
Changing Order | Collins, H.M. | 1985 | Collins | H.M. | Collins, 1985
Exceeding Our Grasp | Stanford, P.K. | 2006 | Stanford | P.K. | Stanford, 2006
Theory of Information | Shannon, C.E. | 1992 | Shannon | C.E. | Shannon, 1992

Highlighted differences:
● rename("Book Title", "Book-Title") vs. rename("Book Title", "Book_Title")
● del_row(4)
● transform("Date", "value.toNumber()") vs. transform("Date", "value.trim()")
Modeling Data Cleaning Conflicts
[Figure: data cleaning actions in execution order, connected by an attack relationship]
defeated(X) ← attacks(Y, X), ¬defeated(Y).
Operation Attack Relation (example, one of many)
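Attack relationship (rows: operation A; columns: operation B)
A \ B | update(r,c,v1) | del_row(r) | del_col(c) | split_col(c,sp1) | transform(c,F1) | join_col(c,…,ci,sp1,cn1) | rename(c,c1)
update(r,c,v2) | A ⟷ B | | | | | |
del_row(r) | A ⟶ B | ∅ | | | | |
del_col(c) | A ⟶ B | ∅ | ∅ | | | |
split_col(c,sp2) | A ⟵ B | ∅ | A ⟵ B | A ⟷ B | | |
transform(c,F2) | A ⟷ B | ∅ | A ⟵ B | A ⟶ B | A ⟷ B | |
join_col(c,…,ci,sp2,cn2) | A ⟵ B | ∅ | A ⟵ B | ∅ | A ⟵ B | A ⟷ B |
rename(c,c2) | A ⟶ B | ∅ | A ⟷ B | A ⟶ B | A ⟶ B | A ⟶ B | A ⟷ B
The table describes whether and how operations A and B conflict: A ⟶ B means A attacks B, A ⟵ B means B attacks A, A ⟷ B means they attack each other, and ∅ means no conflict. Each pair of operation types appears once, so only the lower triangle is filled in; update(r,c,v) denotes a cell edit (cell_edit in the recipes).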
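One way to operationalize this table is a lookup keyed by the pair of operation types, applied only when two actions touch the same target. The Python sketch below does that for the two recipes (steps E through S). Its assumptions are ours, not the authors': rows of the table are action A and columns action B; update(r, c, v) corresponds to cell_edit; two actions share a target when they edit the same cell, when a del_row and a cell edit name the same row id, or otherwise when they mention a common column; only the source column of rename counts, and argument values such as functions and separators are ignored.

```python
from itertools import product
from typing import Dict, FrozenSet, NamedTuple, Optional, Set, Tuple

# Key (A, B) = (row operation, column operation) of the attack table.
# "->": A attacks B, "<-": B attacks A, "<->": mutual, None: no conflict.
TABLE: Dict[Tuple[str, str], Optional[str]] = {
    ("update", "update"): "<->",
    ("del_row", "update"): "->", ("del_row", "del_row"): None,
    ("del_col", "update"): "->", ("del_col", "del_row"): None,
    ("del_col", "del_col"): None,
    ("split_col", "update"): "<-", ("split_col", "del_row"): None,
    ("split_col", "del_col"): "<-", ("split_col", "split_col"): "<->",
    ("transform", "update"): "<->", ("transform", "del_row"): None,
    ("transform", "del_col"): "<-", ("transform", "split_col"): "->",
    ("transform", "transform"): "<->",
    ("join_col", "update"): "<-", ("join_col", "del_row"): None,
    ("join_col", "del_col"): "<-", ("join_col", "split_col"): None,
    ("join_col", "transform"): "<-", ("join_col", "join_col"): "<->",
    ("rename", "update"): "->", ("rename", "del_row"): None,
    ("rename", "del_col"): "<->", ("rename", "split_col"): "->",
    ("rename", "transform"): "->", ("rename", "join_col"): "->",
    ("rename", "rename"): "<->",
}
FLIP = {"->": "<-", "<-": "->", "<->": "<->", None: None}


class Act(NamedTuple):
    label: str
    kind: str             # "update" stands for cell_edit (assumption)
    row: Optional[int]    # set only for update and del_row
    cols: FrozenSet[str]  # columns the action reads or rewrites


def act(label: str, kind: str, *cols: str, row: Optional[int] = None) -> Act:
    return Act(label, kind, row, frozenset(cols))


def same_target(a: Act, b: Act) -> bool:
    if "del_row" in (a.kind, b.kind):
        return a.row == b.row                       # row level: same row id
    if a.kind == b.kind == "update":
        return a.row == b.row and a.cols == b.cols  # same cell
    return bool(a.cols & b.cols)                    # shared column


def attacks_between(a: Act, b: Act) -> Set[Tuple[str, str]]:
    if not same_target(a, b):
        return set()
    if (a.kind, b.kind) in TABLE:
        rel = TABLE[(a.kind, b.kind)]
    else:
        rel = FLIP[TABLE[(b.kind, a.kind)]]  # look up the mirrored cell
    edges = set()
    if rel in ("->", "<->"):
        edges.add((a.label, b.label))
    if rel in ("<-", "<->"):
        edges.add((b.label, a.label))
    return edges


RECIPE_1 = [act("E", "rename", "Book Title"), act("F", "update", "Author", row=3),
            act("G", "transform", "Date"), act("H", "del_row", row=4),
            act("I", "split_col", "Author"), act("J", "del_col", "Author 2"),
            act("K", "join_col", "Author 1", "Date")]
RECIPE_2 = [act("L", "rename", "Book Title"), act("M", "transform", "Date"),
            act("N", "update", "Author", row=4), act("O", "update", "Author", row=3),
            act("P", "split_col", "Author"), act("Q", "rename", "Author 1"),
            act("R", "rename", "Author 2"), act("S", "join_col", "Last Name", "Date")]

ATTACKS: Set[Tuple[str, str]] = set()
for a, b in product(RECIPE_1, RECIPE_2):  # only cross-recipe pairs
    ATTACKS |= attacks_between(a, b)
print(sorted(ATTACKS))
```

Among the 19 edges it finds are E ↔ L (the competing renames), H → N (deleting row 4 defeats the edit of row 4), and F → P (the cell edit attacks the split of the same column).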
Example Data Cleaning Conflicts
Attack | Description
E ↔ L | rename("Book Title", "Book-Title") ↔ rename("Book Title", "Book_Title")
H → N | del_row(4) → cell_edit(4, "Author", "Shannon, C.E.")
F → P | cell_edit(3, "Author", "Stanford, P.") → split_col("Author", ",")
… | …
defeated(X) ← attacks(Y, X), ¬defeated(Y).
Formal Argumentation
[Image: BBC4 Moral Maze]
Modeling Conflict: Argumentation Frameworks
defeated(X) ← attacks(Y, X), ¬defeated(Y).
[Figure: attack graph over arguments a, b, c, d, labeled accepted (a), defeated (b), and undecided (c, d)]
1. a isn't attacked at all
2. ⇒ a is accepted
3. a attacks b
4. ⇒ b is defeated
5. ⇒ b's attack on c can be ignored
6. c and d attack each other
7. ⇒ the status of c and d is undecided
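The walkthrough is an instance of grounded labeling: repeatedly accept arguments whose attackers are all defeated, defeat arguments attacked by an accepted one, and leave the rest undecided. A minimal Python sketch, with the example's edges (a → b, b → c, c ↔ d) read off the numbered steps; the function name is ours, for illustration only:

```python
from typing import Dict, Set, Tuple


def grounded_labeling(args: Set[str], attacks: Set[Tuple[str, str]]) -> Dict[str, str]:
    """Label arguments accepted/defeated/undecided by propagating from unattacked ones."""
    attackers = {x: {y for (y, z) in attacks if z == x} for x in args}
    label = {x: "undecided" for x in args}
    changed = True
    while changed:
        changed = False
        for x in sorted(args):
            if label[x] != "undecided":
                continue
            if all(label[y] == "defeated" for y in attackers[x]):
                label[x] = "accepted"  # every attacker is defeated (or there is none)
                changed = True
            elif any(label[y] == "accepted" for y in attackers[x]):
                label[x] = "defeated"  # some accepted argument attacks x
                changed = True
    return label


AF = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")}
print(sorted(grounded_labeling({"a", "b", "c", "d"}, AF).items()))
# -> [('a', 'accepted'), ('b', 'defeated'), ('c', 'undecided'), ('d', 'undecided')]
```

Note that c stays undecided even though its attacker b is defeated, because its other attacker d is never resolved; this mirrors steps 5 to 7.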
Solving Conflict: Argumentation Frameworks (AF)
[Figure: input AF (attack graph) and output (solved AF)]
defeated(X) ← attacks(Y, X), ¬defeated(Y).
Argument X is defeated if it is attacked by some argument Y that is not itself defeated.
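Because the rule negates its own head predicate, a standard way to evaluate it under the well-founded semantics is the alternating fixpoint: read ¬defeated(Y) against a fixed assumed set, and apply that antimonotone step twice per round until nothing changes. The sketch below is written in Python rather than Datalog and is our illustration, not the authors' solver. The edge list is our own reading of the attack table applied to Recipes 1 and 2 (steps E through S), so treat it as an assumption.

```python
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]  # (attacker Y, attacked X)


def gamma(assumed: FrozenSet[str], attacks: Set[Edge]) -> FrozenSet[str]:
    """Derive defeated(X) for each attacks(Y, X) whose Y is not in the assumed defeated set."""
    return frozenset(x for (y, x) in attacks if y not in assumed)


def well_founded(args: Set[str], attacks: Set[Edge]):
    """Alternating fixpoint: gamma applied twice is monotone, so iterate from the empty set."""
    certainly: FrozenSet[str] = frozenset()  # defeated(X) is true
    while True:
        nxt = gamma(gamma(certainly, attacks), attacks)
        if nxt == certainly:
            break
        certainly = nxt
    possibly = gamma(certainly, attacks)     # defeated(X) is true or undefined
    accepted = set(args) - possibly
    return accepted, set(certainly), set(possibly - certainly)


# Our reading of the attack table for Recipes 1 and 2 (not the authors' data).
EDGES = {("E", "L"), ("L", "E"), ("F", "O"), ("O", "F"), ("F", "P"),
         ("G", "M"), ("M", "G"), ("G", "S"), ("H", "N"), ("N", "I"),
         ("O", "I"), ("I", "P"), ("P", "I"), ("J", "R"), ("R", "J"),
         ("M", "K"), ("Q", "K"), ("K", "S"), ("S", "K")}
ARGS = set("EFGHIJKLMNOPQRS")

accepted, defeated, undecided = well_founded(ARGS, EDGES)
print(sorted(accepted), sorted(defeated))  # -> ['H', 'Q'] ['K', 'N']
```

Only H (del_row(4)) and Q (rename("Author 1", "Last Name")) go unattacked, so they are accepted and N and K are defeated; the remaining eleven actions stay undecided, which is why a refined analysis is needed to settle on one merged recipe.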
Refined Conflict Analysis: Stable Models (Extensions)
[Figure panels: well-founded solution (“skeptical” reasoning); stable solution 1 (“brave” reasoning); stable solution 2 (“brave” reasoning)]
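Stable extensions can be checked directly from their definition: a set of arguments that is conflict-free and attacks every argument outside it. The brute-force Python sketch below enumerates them for the four-argument example from the walkthrough (a → b, b → c, c ↔ d); the function name is ours, and the enumeration is exponential in the number of arguments, so it only illustrates the definition.

```python
from itertools import combinations
from typing import List, Set, Tuple


def stable_extensions(args: List[str], attacks: Set[Tuple[str, str]]) -> List[Set[str]]:
    """Brute force: conflict-free sets that attack every argument outside them."""
    found = []
    for k in range(len(args) + 1):
        for combo in combinations(args, k):
            s = set(combo)
            conflict_free = not any((x, y) in attacks for x in s for y in s)
            covers_rest = all(any((x, y) in attacks for x in s)
                              for y in args if y not in s)
            if conflict_free and covers_rest:
                found.append(s)
    return found


AF = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")}
for ext in stable_extensions(["a", "b", "c", "d"], AF):
    print(sorted(ext))
# -> ['a', 'c']
# -> ['a', 'd']
```

Argument a belongs to every stable extension and to the well-founded solution, so skeptical reasoning accepts it; c and d are each accepted only in some stable extension, i.e., under brave reasoning. For larger attack graphs an answer set solver is the usual tool rather than enumeration.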
Solving Ross + Loretta (= Rosetta) ad hoc “collaboration”
Yilin Xia, Shawn Bowers, Lan Li and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository.
https://github.com/idaks/Games-and-Argumentation/tree/idcc
Refined Solution (Stable Model/Stable Extension)
Yilin Xia, Shawn Bowers, Lan Li and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository.
https://github.com/idaks/Games-and-Argumentation/tree/idcc
Refined Solution put back in Recipe Order
Step | Action | Curator
E | rename("Book Title", "Book-Title") | Alice
M | transform("Date", "value.trim()") | Bob
H | del_row(4) | Alice
O | cell_edit(3, "Author", "Stanford, P.K.") | Bob
P | split_col("Author", ",") | Bob
J | del_col("Author 2") | Alice
Q | rename("Author 1", "Last Name") | Bob
S | join_col("Last Name", "Date", ",", "Citation") | Bob
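The slide does not state how the accepted steps are ordered. One rule that reproduces the table is to sort accepted steps by their position within their own recipe, breaking ties in favor of Recipe 1; the Python sketch below encodes that guess. The rule and the function name are our assumptions, not the authors' method.

```python
from typing import List, Set

RECIPE_1 = ["E", "F", "G", "H", "I", "J", "K"]       # Alice
RECIPE_2 = ["L", "M", "N", "O", "P", "Q", "R", "S"]  # Bob


def merge(recipes: List[List[str]], accepted: Set[str]) -> List[str]:
    """Hypothetical ordering rule: sort accepted steps by their position in their
    own recipe, breaking ties in favor of the earlier recipe."""
    keyed = [(pos, r, label)
             for r, steps in enumerate(recipes)
             for pos, label in enumerate(steps)
             if label in accepted]
    return [label for _, _, label in sorted(keyed)]


ACCEPTED = {"E", "H", "J", "M", "O", "P", "Q", "S"}
print(merge([RECIPE_1, RECIPE_2], ACCEPTED))
# -> ['E', 'M', 'H', 'O', 'P', 'J', 'Q', 'S']
```

The rule ignores dependencies between steps (for example, Q renames the column "Author 1" that P creates), which the conclusions list as future work.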
Et voilà! The merged recipe and combined solution!
Yilin Xia, Shawn Bowers, Lan Li and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository.
https://github.com/idaks/Games-and-Argumentation/tree/idcc
Conclusions (Work in Progress) & Future Work
Conclusions: an approach based on formal argumentation frameworks for
- modeling the actions in users’ data-cleaning recipes,
- identifying conflicting actions across recipes, and
- providing users with new tools that help resolve these conflicts and generate a single, unified, merged recipe.
Future work:
- an algorithm that automatically processes recipes and resolves conflicts
- taking dependencies into account when modeling
- exploring criteria that can be used to evaluate possible merged recipes
23
Reconciling Conflicting Data Curation Actions:
Transparency Through Argumentation
Yilin Xia yilinx2@illinois.edu
Shawn Bowers bowers@gonzaga.edu
Lan Li lanl2@illinois.edu
Bertram Ludäscher ludaesch@illinois.edu

More Related Content

Similar to Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation

Power of functions in a typed world
Power of functions in a typed worldPower of functions in a typed world
Power of functions in a typed worldDebasish Ghosh
 
Devnology Workshop Genpro 2 feb 2011
Devnology Workshop Genpro 2 feb 2011Devnology Workshop Genpro 2 feb 2011
Devnology Workshop Genpro 2 feb 2011Devnology
 
Chapter09.ppt
Chapter09.pptChapter09.ppt
Chapter09.pptbutest
 
Upstate CSCI 525 Data Mining Chapter 3
Upstate CSCI 525 Data Mining Chapter 3Upstate CSCI 525 Data Mining Chapter 3
Upstate CSCI 525 Data Mining Chapter 3DanWooster1
 
CptS 440 / 540 Artificial Intelligence
CptS 440 / 540 Artificial IntelligenceCptS 440 / 540 Artificial Intelligence
CptS 440 / 540 Artificial Intelligencebutest
 
Fusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with ViewsFusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with ViewsPhilip Schwarz
 
Embarrassingly parallel database calls with Python (PyData Paris 2015 )
Embarrassingly parallel database calls with Python (PyData Paris 2015 )Embarrassingly parallel database calls with Python (PyData Paris 2015 )
Embarrassingly parallel database calls with Python (PyData Paris 2015 )GoDataDriven
 
03Preprocessing01.pdf
03Preprocessing01.pdf03Preprocessing01.pdf
03Preprocessing01.pdfAlireza418370
 
03Preprocessing_plp.pptx
03Preprocessing_plp.pptx03Preprocessing_plp.pptx
03Preprocessing_plp.pptxProfPPavanKumar
 
03Preprocessing_plp.pptx
03Preprocessing_plp.pptx03Preprocessing_plp.pptx
03Preprocessing_plp.pptxProfPPavanKumar
 

Similar to Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation (20)

Power of functions in a typed world
Power of functions in a typed worldPower of functions in a typed world
Power of functions in a typed world
 
Functional programming
Functional programmingFunctional programming
Functional programming
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
R language introduction
R language introductionR language introduction
R language introduction
 
Devnology Workshop Genpro 2 feb 2011
Devnology Workshop Genpro 2 feb 2011Devnology Workshop Genpro 2 feb 2011
Devnology Workshop Genpro 2 feb 2011
 
Dbms
DbmsDbms
Dbms
 
Chapter09.ppt
Chapter09.pptChapter09.ppt
Chapter09.ppt
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Warehousing
WarehousingWarehousing
Warehousing
 
Upstate CSCI 525 Data Mining Chapter 3
Upstate CSCI 525 Data Mining Chapter 3Upstate CSCI 525 Data Mining Chapter 3
Upstate CSCI 525 Data Mining Chapter 3
 
CptS 440 / 540 Artificial Intelligence
CptS 440 / 540 Artificial IntelligenceCptS 440 / 540 Artificial Intelligence
CptS 440 / 540 Artificial Intelligence
 
Ghost
GhostGhost
Ghost
 
Fusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with ViewsFusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with Views
 
PyData Paris 2015 - Track 3.1 Niels Zeilemaker
PyData Paris 2015 - Track 3.1 Niels ZeilemakerPyData Paris 2015 - Track 3.1 Niels Zeilemaker
PyData Paris 2015 - Track 3.1 Niels Zeilemaker
 
Embarrassingly parallel database calls with Python (PyData Paris 2015 )
Embarrassingly parallel database calls with Python (PyData Paris 2015 )Embarrassingly parallel database calls with Python (PyData Paris 2015 )
Embarrassingly parallel database calls with Python (PyData Paris 2015 )
 
20170509 rand db_lesugent
20170509 rand db_lesugent20170509 rand db_lesugent
20170509 rand db_lesugent
 
03Preprocessing01.pdf
03Preprocessing01.pdf03Preprocessing01.pdf
03Preprocessing01.pdf
 
03Preprocessing_plp.pptx
03Preprocessing_plp.pptx03Preprocessing_plp.pptx
03Preprocessing_plp.pptx
 
03Preprocessing.ppt
03Preprocessing.ppt03Preprocessing.ppt
03Preprocessing.ppt
 
03Preprocessing_plp.pptx
03Preprocessing_plp.pptx03Preprocessing_plp.pptx
03Preprocessing_plp.pptx
 

More from Bertram Ludäscher

Games, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family ReunionGames, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family ReunionBertram Ludäscher
 
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Bertram Ludäscher
 
[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database Rules[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database RulesBertram Ludäscher
 
[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database Rules[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database RulesBertram Ludäscher
 
Answering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsAnswering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsBertram Ludäscher
 
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Bertram Ludäscher
 
Which Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A DialogueWhich Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A DialogueBertram Ludäscher
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesBertram Ludäscher
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesBertram Ludäscher
 
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsPossible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsBertram Ludäscher
 
Deduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseDeduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseBertram Ludäscher
 
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...Bertram Ludäscher
 
Dissecting Reproducibility: A case study with ecological niche models in th...
Dissecting Reproducibility:  A case study with ecological niche models  in th...Dissecting Reproducibility:  A case study with ecological niche models  in th...
Dissecting Reproducibility: A case study with ecological niche models in th...Bertram Ludäscher
 
Incremental Recomputation: Those who cannot remember the past are condemned ...
Incremental Recomputation:  Those who cannot remember the past are condemned ...Incremental Recomputation:  Those who cannot remember the past are condemned ...
Incremental Recomputation: Those who cannot remember the past are condemned ...Bertram Ludäscher
 
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsValidation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsBertram Ludäscher
 
An ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflowsAn ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflowsBertram Ludäscher
 
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses ApproachKnowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses ApproachBertram Ludäscher
 
Whole-Tale: The Experience of Research
Whole-Tale: The Experience of ResearchWhole-Tale: The Experience of Research
Whole-Tale: The Experience of ResearchBertram Ludäscher
 
ETC & Authors in the Driver's Seat
ETC & Authors in the Driver's SeatETC & Authors in the Driver's Seat
ETC & Authors in the Driver's SeatBertram Ludäscher
 
From Provenance Standards and Tools to Queries and Actionable Provenance
From Provenance Standards and Tools to Queries and Actionable ProvenanceFrom Provenance Standards and Tools to Queries and Actionable Provenance
From Provenance Standards and Tools to Queries and Actionable ProvenanceBertram Ludäscher
 

More from Bertram Ludäscher (20)

Games, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family ReunionGames, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion
 
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
 
[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database Rules[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database Rules
 
[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database Rules[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database Rules
 
Answering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsAnswering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query Patterns
 
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
 
Which Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A DialogueWhich Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A Dialogue
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science Tales
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science Tales
 
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsPossible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
 
Deduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseDeduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
 
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
 
Dissecting Reproducibility: A case study with ecological niche models in th...
Dissecting Reproducibility:  A case study with ecological niche models  in th...Dissecting Reproducibility:  A case study with ecological niche models  in th...
Dissecting Reproducibility: A case study with ecological niche models in th...
 
Incremental Recomputation: Those who cannot remember the past are condemned ...
Incremental Recomputation:  Those who cannot remember the past are condemned ...Incremental Recomputation:  Those who cannot remember the past are condemned ...
Incremental Recomputation: Those who cannot remember the past are condemned ...
 
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsValidation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
 
An ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflowsAn ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflows
 
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses ApproachKnowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
 
Whole-Tale: The Experience of Research
Whole-Tale: The Experience of ResearchWhole-Tale: The Experience of Research
Whole-Tale: The Experience of Research
 
ETC & Authors in the Driver's Seat
ETC & Authors in the Driver's SeatETC & Authors in the Driver's Seat
ETC & Authors in the Driver's Seat
 
From Provenance Standards and Tools to Queries and Actionable Provenance
From Provenance Standards and Tools to Queries and Actionable ProvenanceFrom Provenance Standards and Tools to Queries and Actionable Provenance
From Provenance Standards and Tools to Queries and Actionable Provenance
 

Recently uploaded

社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeralNABLAS株式会社
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp onlinebalibahu1313
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理pyhepag
 
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7gragkhusi
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证ju0dztxtn
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证dq9vz1isj
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...ssuserf63bd7
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationmuqadasqasim10
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshareraiaryan448
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonPayment Village
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfgreat91
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理pyhepag
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理pyhepag
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Jon Hansen
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsBrainSell Technologies
 

Recently uploaded (20)

社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 

Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation

  • 1. 1 Reconciling Conflicting Data Curation Actions: Transparency through Argumentation Yilin Xia (yilinx2@illinois.edu) Shawn Bowers (bowers@gonzaga.edu) Lan Li (lanl2@illinois.edu) Bertram Ludäscher (ludaesch@illinois.edu)
  • 2. Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation Data Cleaning: the story so far … ● 80% of data science is data wrangling … (or so they say) ● Interactive data cleaning (e.g. Excel, OpenRefine, … ) ● Script-based (e.g., Python/pandas, R, … ) ● Single-user/single-curator setting (… only the lonely … ) ● Multi-user/multi-curator collaboration (… friends ..)
  • 3. Collaborative Data Cleaning: Pros & Possible Cons
    ● Joining forces & pooling expertise
      → higher throughput (efficiency)
      → higher-quality data output
    ● But also …
      → need to coordinate more (e.g., vertical and/or horizontal splitting, …)
      → need to resolve conflicts/disputes
      → cost of collaboration
  • 4. Collaborative Data Cleaning, Part I: Provenance + Expert Merge
    ● Collaborative DC (data cleaning) Provenance Model (CDCM)
    ● Expert Recipe Merge
  • 5. Whole Team > Sum of Members?
    ● Before: an expert coordinator merges bits and pieces of data cleaning recipes.
    ● Alternative: tightly coupled, well-planned collaboration ("eager").
    ● New proposal: loosely coupled or ad hoc collaboration ("lazy") + an automated conflict-resolution strategy.
    [Figure: curators Ross + Loretta as the team "Rosetta"]
  • 6. Loosely-Coupled Multi-Curator Data Cleaning Example (curators: Ross and Loretta). Dataset D (␣ marks a whitespace character):
    | Book Title | Author | Date |
    |---|---|---|
    | Against Method | Feyerabend, P. | 1975 |
    | Changing Order | Collins, H.M. | ␣␣1985␣ |
    | Exceeding Our Grasp | P. Kyle Stanford | 2006 |
    | Theory of Information | | 1992 |
    Wrangling goal: create an APA-style in-text citation based on the given dataset D.
  • 7. Data Cleaning Actions (Transformation Model; cf. OpenRefine)
    | Action | Level |
    |---|---|
    | cell_edit(row_id, column_name, new_value) | Cell |
    | del_row(row_id) | Row |
    | del_col(column_name) | Column |
    | split_col(column_name, separator) | Column |
    | transform(column_name, function) | Column |
    | join_col(set_of_column_names, separator, new_column_name) | Column |
    | rename(column_name, new_column_name) | Column |
    | … | … |
  • 8. Data Cleaning Actions → Recipes
    Recipe 1:
    | Step | Action |
    |---|---|
    | 1 | rename("Book Title", "Book-Title") |
    | 2 | cell_edit(3, "Author", "Stanford, P.") |
    | 3 | transform("Date", "value.toNumber()") |
    | 4 | del_row(4) |
    | 5 | split_col("Author", ",") |
    | 6 | del_col("Author 2") |
    | 7 | join_col("Author 1", "Date", ",", "Citation") |
    Recipe 2:
    | Step | Action |
    |---|---|
    | 1 | rename("Book Title", "Book_Title") |
    | 2 | transform("Date", "value.trim()") |
    | 3 | cell_edit(4, "Author", "Shannon, C.E.") |
    | 4 | cell_edit(3, "Author", "Stanford, P.K.") |
    | 5 | split_col("Author", ",") |
    | 6 | rename("Author 1", "Last Name") |
    | 7 | rename("Author 2", "First Name") |
    | 8 | join_col("Last Name", "Date", ",", "Citation") |
  • 9. Data Cleaning Actions → Recipes (steps labeled E to K and L to S)
    Recipe 1:
    | Step | Action |
    |---|---|
    | E | rename("Book Title", "Book-Title") |
    | F | cell_edit(3, "Author", "Stanford, P.") |
    | G | transform("Date", "value.toNumber()") |
    | H | del_row(4) |
    | I | split_col("Author", ",") |
    | J | del_col("Author 2") |
    | K | join_col("Author 1", "Date", ",", "Citation") |
    Recipe 2:
    | Step | Action |
    |---|---|
    | L | rename("Book Title", "Book_Title") |
    | M | transform("Date", "value.trim()") |
    | N | cell_edit(4, "Author", "Shannon, C.E.") |
    | O | cell_edit(3, "Author", "Stanford, P.K.") |
    | P | split_col("Author", ",") |
    | Q | rename("Author 1", "Last Name") |
    | R | rename("Author 2", "First Name") |
    | S | join_col("Last Name", "Date", ",", "Citation") |
  • 10. Data Cleaning Results
    Recipe 1 result (row 4 is the one removed by del_row(4)):
    | Book-Title | Author | Date | Author 1 | Citation |
    |---|---|---|---|---|
    | Against Method | Feyerabend, P. | 1975 | Feyerabend | Feyerabend, 1975 |
    | Changing Order | Collins, H.M. | 1985 | Collins | Collins, 1985 |
    | Exceeding Our Grasp | Stanford, P. | 2006 | Stanford | Stanford, 2006 |
    | Theory of Information | | 1992 | | |
    Recipe 2 result:
    | Book_Title | Author | Date | Last Name | First Name | Citation |
    |---|---|---|---|---|---|
    | Against Method | Feyerabend, P. | 1975 | Feyerabend | P. | Feyerabend, 1975 |
    | Changing Order | Collins, H.M. | 1985 | Collins | H.M. | Collins, 1985 |
    | Exceeding Our Grasp | Stanford, P.K. | 2006 | Stanford | P.K. | Stanford, 2006 |
    | Theory of Information | Shannon, C.E. | 1992 | Shannon | C.E. | Shannon, 1992 |
    Actions behind the differences: rename("Book Title", "Book-Title") vs. rename("Book Title", "Book_Title"); del_row(4); transform("Date", "value.toNumber()") vs. transform("Date", "value.trim()").
  • 11. Modeling Data Cleaning Conflicts. [Figure: data cleaning actions in execution order, with the attack relationship between them] Rule: defeated(X) ← attacks(Y, X), ¬defeated(Y).
  • 12. Operation Attack Relation (one example of many): describes whether/how operations A (rows) and B (columns) are in conflict with each other; ∅ means no conflict.
    | A \ B | update(r,c,v1) | del_row(r) | del_col(c) | split_col(c,sp1) | transform(c,F1) | join_col(c,...ci,sp1,cn1) | rename(c,c1) |
    |---|---|---|---|---|---|---|---|
    | update(r,c,v2) | A ⟷ B | | | | | | |
    | del_row(r) | A ⟶ B | ∅ | | | | | |
    | del_col(c) | A ⟶ B | ∅ | ∅ | | | | |
    | split_col(c,sp2) | A ⟵ B | ∅ | A ⟵ B | A ⟷ B | | | |
    | transform(c,F2) | A ⟷ B | ∅ | A ⟵ B | A ⟶ B | A ⟷ B | | |
    | join_col(c,...ci,sp2,cn2) | A ⟵ B | ∅ | A ⟵ B | ∅ | A ⟵ B | A ⟷ B | |
    | rename(c,c2) | A ⟶ B | ∅ | A ⟷ B | A ⟶ B | A ⟶ B | A ⟶ B | A ⟷ B |
  • 13. Example Data Cleaning Conflicts
    | Attack | Description |
    |---|---|
    | E ↔ L | rename("Book Title", "Book-Title") ↔ rename("Book Title", "Book_Title") |
    | H → N | del_row(4) → cell_edit(4, "Author", "Shannon, C.E.") |
    | F → P | cell_edit(3, "Author", "Stanford, P.") → split_col("Author", ",") |
    | … | … |
    Rule: defeated(X) ← attacks(Y, X), ¬defeated(Y).
  • 14. Formal Argumentation. [Image: BBC Radio 4, The Moral Maze]
  • 15. Modeling Conflict: Argumentation Frameworks. Rule: defeated(X) ← attacks(Y, X), ¬defeated(Y). Example with arguments a, b, c, d (resulting labels: a accepted, b defeated, c and d undecided):
    1. a isn't attacked at all ⇒ a is accepted.
    2. a attacks b ⇒ b is defeated.
    3. ⇒ b's attack on c can be ignored.
    4. c and d attack each other ⇒ their status is undecided.
  • 16. Solving Conflict: Argumentation Frameworks (AF). Input: an AF (attack graph). Output: the solved AF. Rule: defeated(X) ← attacks(Y, X), not defeated(Y), i.e., argument X is defeated if it is attacked by some argument Y that is not itself defeated.
  • 17. Refined Conflict Analysis: Stable Models (Extensions). [Figure panels: well-founded solution ("skeptical" reasoning); stable solution 1 ("brave" reasoning); stable solution 2 ("brave" reasoning)]
  • 18. Solving the Ross + Loretta (= "Rosetta") ad hoc "collaboration". Source: Yilin Xia, Shawn Bowers, Lan Li, and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository. https://github.com/idaks/Games-and-Argumentation/tree/idcc
  • 19. Refined Solution (Stable Model/Stable Extension)
  • 20. Refined Solution Put Back in Recipe Order
    | Step | Action | Curator |
    |---|---|---|
    | E | rename("Book Title", "Book-Title") | Alice |
    | M | transform("Date", "value.trim()") | Bob |
    | H | del_row(4) | Alice |
    | O | cell_edit(3, "Author", "Stanford, P.K.") | Bob |
    | P | split_col("Author", ",") | Bob |
    | J | del_col("Author 2") | Alice |
    | Q | rename("Author 1", "Last Name") | Bob |
    | S | join_col("Last Name", "Date", ",", "Citation") | Bob |
  • 21. Et voilà! The merged recipe and combined solution!
  • 22. Conclusions (Work in Progress) & Future Work
    An approach based on formal argumentation frameworks for
    - modeling the actions of users' data-cleaning recipes;
    - identifying conflicting actions across recipes;
    - providing users with new tools that help resolve these conflicts and generate a single, unified, merged recipe.
    ● An algorithm helps auto-process recipes and resolve conflicts.
    ● Take dependencies into account when modeling.
    ● Explore criteria that can be used to evaluate possible merged recipes.
  • 23. Yilin Xia (yilinx2@illinois.edu), Shawn Bowers (bowers@gonzaga.edu), Lan Li (lanl2@illinois.edu), Bertram Ludäscher (ludaesch@illinois.edu)
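The transformation model on slide 7 and the two recipes on slides 8 to 10 can be replayed with a small Python sketch. This is not the authors' implementation: the table representation (a dict keyed by stable 1-based row ids), the function bodies, and the behavior of split_col (keep the source column, as slide 10 shows, and strip whitespace from the parts) are assumptions. The GREL expressions value.toNumber() and value.trim() are approximated with int() and str.strip(), and join_col is called with ", " instead of the recipes' "," so the citations read as on slide 10 (e.g., "Feyerabend, 1975").

```python
import copy
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory model: stable 1-based row id -> {column name: cell value}.
Table = Dict[int, Dict[str, str]]


def cell_edit(t: Table, row_id: int, col: str, value: str) -> None:
    t[row_id][col] = value                      # cell-level


def del_row(t: Table, row_id: int) -> None:
    del t[row_id]                               # row-level; other row ids stay stable


def del_col(t: Table, col: str) -> None:
    for row in t.values():
        row.pop(col, None)


def rename(t: Table, col: str, new_col: str) -> None:
    for row in t.values():
        if col in row:
            row[new_col] = row.pop(col)


def transform(t: Table, col: str, fn: Callable[[str], str]) -> None:
    for row in t.values():
        if col in row:
            row[col] = fn(row[col])


def split_col(t: Table, col: str, sep: str) -> None:
    # Assumption: keep the original column and add "<col> 1", "<col> 2", ...
    for row in t.values():
        for i, part in enumerate(row.get(col, "").split(sep), start=1):
            row[f"{col} {i}"] = part.strip()


def join_col(t: Table, cols: List[str], sep: str, new_col: str) -> None:
    for row in t.values():
        row[new_col] = sep.join(row.get(c, "") for c in cols)


Recipe = List[Tuple[Callable[..., None], tuple]]


def run(recipe: Recipe, table: Table) -> Table:
    t = copy.deepcopy(table)                    # leave the input dataset untouched
    for action, args in recipe:
        action(t, *args)
    return t


# Dataset D from slide 6; "  1985 " carries the stray whitespace shown as ␣.
D: Table = {
    1: {"Book Title": "Against Method", "Author": "Feyerabend, P.", "Date": "1975"},
    2: {"Book Title": "Changing Order", "Author": "Collins, H.M.", "Date": "  1985 "},
    3: {"Book Title": "Exceeding Our Grasp", "Author": "P. Kyle Stanford", "Date": "2006"},
    4: {"Book Title": "Theory of Information", "Author": "", "Date": "1992"},
}

recipe_1: Recipe = [
    (rename, ("Book Title", "Book-Title")),
    (cell_edit, (3, "Author", "Stanford, P.")),
    (transform, ("Date", lambda v: str(int(v)))),   # stand-in for GREL value.toNumber()
    (del_row, (4,)),
    (split_col, ("Author", ",")),
    (del_col, ("Author 2",)),
    (join_col, (["Author 1", "Date"], ", ", "Citation")),
]

recipe_2: Recipe = [
    (rename, ("Book Title", "Book_Title")),
    (transform, ("Date", str.strip)),               # stand-in for GREL value.trim()
    (cell_edit, (4, "Author", "Shannon, C.E.")),
    (cell_edit, (3, "Author", "Stanford, P.K.")),
    (split_col, ("Author", ",")),
    (rename, ("Author 1", "Last Name")),
    (rename, ("Author 2", "First Name")),
    (join_col, (["Last Name", "Date"], ", ", "Citation")),
]

result_1 = run(recipe_1, D)
result_2 = run(recipe_2, D)
print([row["Citation"] for row in result_1.values()])
# ['Feyerabend, 1975', 'Collins, 1985', 'Stanford, 2006']
print([row["Citation"] for row in result_2.values()])
# ['Feyerabend, 1975', 'Collins, 1985', 'Stanford, 2006', 'Shannon, 1992']
```

Running both recipes reproduces the divergence on slide 10: Recipe 1 yields three citations because it deletes row 4, while Recipe 2 repairs row 4's author and yields four.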
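The slide-12 conflict matrix can be encoded as a lookup table. The Python sketch below is hypothetical: it reads rows as operation A and columns as operation B (the reading under which the slide-13 examples come out as stated, e.g., del_row(4) attacks cell_edit(4, ...) and cell_edit(3, ...) attacks split_col("Author", ",")), treats update as the cell_edit action, and answers queries about the empty upper triangle by symmetry. It only classifies operation kinds; deciding whether two concrete actions actually touch the same row or column is left out.

```python
from typing import Dict, Optional, Tuple

# Lower triangle of the slide-12 matrix, keyed by (A, B) = (row, column).
# Assumed reading: "->" A attacks B, "<-" B attacks A, "<->" mutual attack,
# None (∅) no conflict; "update" plays the role of cell_edit.
MATRIX: Dict[Tuple[str, str], Optional[str]] = {
    ("update", "update"): "<->",
    ("del_row", "update"): "->", ("del_row", "del_row"): None,
    ("del_col", "update"): "->", ("del_col", "del_row"): None, ("del_col", "del_col"): None,
    ("split_col", "update"): "<-", ("split_col", "del_row"): None,
    ("split_col", "del_col"): "<-", ("split_col", "split_col"): "<->",
    ("transform", "update"): "<->", ("transform", "del_row"): None,
    ("transform", "del_col"): "<-", ("transform", "split_col"): "->",
    ("transform", "transform"): "<->",
    ("join_col", "update"): "<-", ("join_col", "del_row"): None,
    ("join_col", "del_col"): "<-", ("join_col", "split_col"): None,
    ("join_col", "transform"): "<-", ("join_col", "join_col"): "<->",
    ("rename", "update"): "->", ("rename", "del_row"): None,
    ("rename", "del_col"): "<->", ("rename", "split_col"): "->",
    ("rename", "transform"): "->", ("rename", "join_col"): "->",
    ("rename", "rename"): "<->",
}

FLIP = {"->": "<-", "<-": "->", "<->": "<->", None: None}


def relation(a: str, b: str) -> Optional[str]:
    """How operation kind a relates to kind b, from a's point of view."""
    if (a, b) in MATRIX:
        return MATRIX[(a, b)]
    return FLIP[MATRIX[(b, a)]]          # upper triangle: swap roles and flip the arrow


# Slide-13 pairs, using the step labels of slide 9.
examples = [("E", "rename", "L", "rename"),
            ("H", "del_row", "N", "update"),
            ("F", "update", "P", "split_col")]
for x, kind_x, y, kind_y in examples:
    print(x, relation(kind_x, kind_y), y)   # prints: E <-> L / H -> N / F -> P
```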
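The rule defeated(X) ← attacks(Y, X), ¬defeated(Y) from slides 11, 15, and 16 has a three-valued well-founded solution that coincides with the grounded labelling of the attack graph. A minimal Python sketch, assuming the graph is given as (attacker, target) pairs and using a hypothetical function name, computes it as a least fixpoint: an argument becomes accepted once all of its attackers are defeated, and defeated once some attacker is accepted; whatever remains is undecided. The input is the four-argument example of slide 15.

```python
from typing import Dict, Iterable, Set, Tuple


def grounded_labelling(args: Iterable[str], attacks: Set[Tuple[str, str]]) -> Dict[str, str]:
    """Label arguments under defeated(X) <- attacks(Y, X), not defeated(Y),
    using its well-founded (grounded) semantics, computed as a least fixpoint.
    `attacks` holds (attacker, target) pairs."""
    arg_list = list(args)
    attackers = {x: {y for (y, z) in attacks if z == x} for x in arg_list}
    accepted: Set[str] = set()
    defeated: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for x in arg_list:
            if x in accepted or x in defeated:
                continue
            if attackers[x] <= defeated:      # every attacker is defeated (or there are none)
                accepted.add(x)
                changed = True
            elif attackers[x] & accepted:     # some attacker is accepted
                defeated.add(x)
                changed = True
    return {x: "accepted" if x in accepted else "defeated" if x in defeated else "undecided"
            for x in arg_list}


# Slide-15 example: a attacks b, b attacks c, c and d attack each other.
labels = grounded_labelling("abcd", {("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")})
print(labels)  # {'a': 'accepted', 'b': 'defeated', 'c': 'undecided', 'd': 'undecided'}
```

Both update conditions are monotone in the growing accepted and defeated sets, so the loop order does not affect the result.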
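The "brave" stable solutions on slide 17 correspond to the stable models of the same rule, which are exactly the stable extensions of the attack graph: conflict-free sets S of arguments that attack every argument outside S (S is the set of non-defeated arguments). The brute-force Python sketch below, with a hypothetical function name, is exponential in the number of arguments and meant only for small graphs; the input is the slide-15 graph.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def stable_extensions(args: Sequence[str], attacks: Set[Tuple[str, str]]) -> List[Set[str]]:
    """Enumerate conflict-free sets S that attack every argument outside S.
    These are the stable models of defeated(X) <- attacks(Y, X), not defeated(Y),
    with S = the arguments that are not defeated."""
    found = []
    for k in range(len(args) + 1):
        for combo in combinations(args, k):
            s = set(combo)
            if any((y, x) in attacks for y in s for x in s):
                continue                              # not conflict-free
            attacked = {x for (y, x) in attacks if y in s}
            if set(args) - s <= attacked:            # every outsider is attacked by S
                found.append(s)
    return found


# Slide-15 graph: a attacks b, b attacks c, c and d attack each other.
exts = stable_extensions("abcd", {("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")})
brave = set().union(*exts)                                # accepted in some stable solution
skeptical = set.intersection(*exts) if exts else set()    # accepted in every stable solution
print(sorted(brave), sorted(skeptical))  # ['a', 'c', 'd'] ['a']
```

For this graph there are two stable extensions, {a, c} and {a, d}: a is accepted in both, while c and d are each accepted in only one, matching their undecided status in the well-founded solution. An odd attack cycle such as x → y → z → x has no stable extension at all.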
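Slide 20 lists the accepted actions of the refined solution interleaved from both recipes. One ordering rule that reproduces that listing is to sort the accepted steps by their position within their own recipe (slide 9) and break ties in favor of Recipe 1. The Python sketch below assumes this rule, which the slides do not state, uses a hypothetical helper name, and takes the curator names Alice and Bob from slide 20. It does not check dependencies between actions, which the conclusions list as something still to be taken into account.

```python
from typing import Dict, List, Tuple

# Step labels from slide 9: label -> (recipe number, position in that recipe, action).
STEPS: Dict[str, Tuple[int, int, str]] = {
    "E": (1, 1, 'rename("Book Title", "Book-Title")'),
    "F": (1, 2, 'cell_edit(3, "Author", "Stanford, P.")'),
    "G": (1, 3, 'transform("Date", "value.toNumber()")'),
    "H": (1, 4, 'del_row(4)'),
    "I": (1, 5, 'split_col("Author", ",")'),
    "J": (1, 6, 'del_col("Author 2")'),
    "K": (1, 7, 'join_col("Author 1", "Date", ",", "Citation")'),
    "L": (2, 1, 'rename("Book Title", "Book_Title")'),
    "M": (2, 2, 'transform("Date", "value.trim()")'),
    "N": (2, 3, 'cell_edit(4, "Author", "Shannon, C.E.")'),
    "O": (2, 4, 'cell_edit(3, "Author", "Stanford, P.K.")'),
    "P": (2, 5, 'split_col("Author", ",")'),
    "Q": (2, 6, 'rename("Author 1", "Last Name")'),
    "R": (2, 7, 'rename("Author 2", "First Name")'),
    "S": (2, 8, 'join_col("Last Name", "Date", ",", "Citation")'),
}
CURATOR = {1: "Alice", 2: "Bob"}  # Recipe 1 steps are Alice's, Recipe 2 steps Bob's (slide 20)


def merge_in_recipe_order(accepted: List[str]) -> List[Tuple[str, str, str]]:
    """Assumed rule: order by position within the recipe, ties go to Recipe 1."""
    ordered = sorted(accepted, key=lambda lbl: (STEPS[lbl][1], STEPS[lbl][0]))
    return [(lbl, STEPS[lbl][2], CURATOR[STEPS[lbl][0]]) for lbl in ordered]


# Accepted arguments of the refined (stable) solution, given in arbitrary order.
merged = merge_in_recipe_order(["S", "Q", "J", "P", "O", "H", "M", "E"])
for label, action, curator in merged:
    print(label, action, curator)
```

The printed order is E, M, H, O, P, J, Q, S, the same sequence as on slide 20.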