Dataset Quality Visualization in BEXIS2
Master Thesis
Nafiseh Navabpour
March 2021
Supervisors
Prof. Dr. Birgitta König-Ries, Roman Gerlach, Sirko Schindler
Outline
1. Problem statement
2. The path in solving the problem
3. A prototypical solution
4. Future perspective
Problem statement
Select a dataset
Dataset review
Summary of problem statement
• Data is continuously produced
• Data is stored at different quality levels in data portals
• Data consumers search data portals to review dataset quality
• Reviewing dataset quality is time-consuming
=> Dataset quality overview
The path in solving the problem
• Interview with BEXIS2 users
• Literature review
• Looking at other data portals
Interview with BEXIS2 users
Upload data
• Excel sheet
• TXT
• CSV
• Image
Quality check
• Comprehensible metadata
• Well-defined dataset
• Clearly defined variables
• Accurate metadata
• Correct information
• Complete data
• Ready for analyzing
• Ready for reuse
Literature review: Data quality dimensions
• Intrinsic DQ: Accuracy, Objectivity, Believability, Reputation
• Contextual DQ: Value-added, Relevancy, Timeliness, Completeness, Amount of data
• Representational DQ: Interpretability, Ease of understanding, Consistency, Conciseness
• Accessibility DQ: Accessibility, Access security
Wang and Strong 1996.
Literature review: Dataset summarization
• Data origin
• The way of specifying the time
• The way of specifying the location
• Dataset completion
• Anything else that makes data clearer
• A short description
• Dataset format and size
• Data headers
• Data types and data values
Koesten et al. 2020.
Dataset quality components
Quality components                  Quality attributes
Description length                  Accuracy, Completeness
Dataset format and size             Accessibility
A list of variables                 Relevancy, Completeness, Understandability
Data types, data distribution       Completeness, Amount of data
A list of files, file extensions    Relevancy, Understandability, Accessibility
Dataset contributors                Believability
Metadata/data completeness          Completeness
Dataset security level              Accessibility, Security
Shared elements                     Reputation
Comparison                          Value-added
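The slides do not say how a component such as "Metadata/data completeness" is measured. As a minimal sketch, assuming completeness means the share of non-empty values per variable, it could be computed as follows. The function name `completeness`, the row format, and the sample values are hypothetical and are not taken from BEXIS2.

```python
from typing import Dict, Mapping, Optional, Sequence

# A row maps a variable name to its raw cell value (assumed format, not the BEXIS2 model).
Row = Mapping[str, Optional[str]]


def completeness(rows: Sequence[Row], variables: Sequence[str]) -> Dict[str, float]:
    """Return, per variable, the fraction of rows that hold a non-empty value."""
    result: Dict[str, float] = {}
    for name in variables:
        if not rows:
            # No rows at all: report 0.0 instead of dividing by zero.
            result[name] = 0.0
            continue
        # A cell counts as filled if it is neither None nor blank after stripping.
        filled = sum(1 for row in rows if (row.get(name) or "").strip())
        result[name] = filled / len(rows)
    return result


# Hypothetical sample data for illustration only.
rows = [
    {"plot": "A1", "height_cm": "12.5"},
    {"plot": "A2", "height_cm": ""},
    {"plot": "", "height_cm": None},
    {"plot": "A4", "height_cm": "9.0"},
]
scores = completeness(rows, ["plot", "height_cm"])
```

A per-variable score keeps the result easy to show next to the list of variables; a single dataset-level figure could be derived by averaging the values.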
Literature review: Dataset quality visualization pipeline
• Data import
• Data preparation
• Mapping
• Data manipulation
• Rendering
Qin et al. 2020.
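The pipeline stages can be illustrated with a small sketch in which each stage is one function. This is not the prototype's implementation: the order in which the stages are chained here (import, preparation, manipulation, mapping, rendering), the function names, the sample CSV text, and the text-based bar chart are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Tuple


def import_data(text: str) -> List[Dict[str, str]]:
    """Data import: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare(rows: List[Dict[str, str]], column: str) -> List[str]:
    """Data preparation: keep only the non-empty values of one column."""
    return [row[column].strip() for row in rows if (row.get(column) or "").strip()]


def manipulate(values: List[str]) -> Counter:
    """Data manipulation: aggregate the values into counts."""
    return Counter(values)


def map_to_marks(counts: Counter, width: int = 20) -> List[Tuple[str, int]]:
    """Mapping: turn each count into a bar length, scaled to the largest count."""
    largest = max(counts.values(), default=0)
    if largest == 0:
        return []
    return [(label, round(width * n / largest)) for label, n in sorted(counts.items())]


def render(marks: List[Tuple[str, int]]) -> str:
    """Rendering: draw the marks as a plain-text bar chart."""
    return "\n".join(f"{label:<6}{'#' * length}" for label, length in marks)


# Hypothetical input: four files, one of them without a recorded format.
SAMPLE = "file,format\na,csv\nb,csv\nc,txt\nd,\n"
marks = map_to_marks(manipulate(prepare(import_data(SAMPLE), "format")))
chart = render(marks)
```

Keeping each stage as a separate function means a different renderer, for example an interactive chart, could replace `render` without touching the earlier stages.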
Looking at other data portals
https://data.cityofnewyork.us
https://www.kaggle.com
A prototypical solution
Dataset quality overview
a) General quality information
b) Comparison
c) Data quality
General quality information
Dataset list
Comparison
Legend
Compare dataset sizes
Tabular data quality
File data quality
Positive and negative points
+ Short text
+ Similar shapes
+ Color palette
+ Legend and tooltips
+ Interactive elements
- Loading time
Future perspective
• Shorten loading time
• Review users’ feedback
• Add new features
• Data visualization
Thank you for your attention!
Jena, 23.03.2021
Nafiseh Navabpour
Nafiseh.Navabpour@uni-jena.de
References
• ATTO – Amazon Tall Tower Observatory, Feb. 2021. [Online]. Available: https://www.attoproject.org/.
• Covid-19 world vaccination progress, Feb. 2021. [Online]. Available: https://www.kaggle.com/gpreda/covid-world-vaccination-progress.
• L. Koesten, E. Simperl, T. Blount, E. Kacprzak, and J. Tennison, “Everything you always wanted to know about a dataset: Studies in data summarisation,” International Journal of Human-Computer Studies, vol. 135, p. 102367, 2020. DOI: 10.1016/j.ijhcs.2019.10.004.
• N. Navabpour, Dataset quality visualization in BEXIS2, version 1.0.0, Mar. 2021. DOI: 10.5281/zenodo.4485845.
• Covid-19 free meals locations, Feb. 2021. [Online]. Available: https://data.cityofnewyork.us/Education/COVID-19-Free-Meals-Locations/sp4a-vevi.
• X. Qin, Y. Luo, N. Tang, and G. Li, “Making data visualization more efficient and effective: A survey,” The VLDB Journal, vol. 29, no. 1, pp. 93–117, 2020. DOI: 10.1007/s00778-019-00588-3.
• R. Y. Wang and D. M. Strong, “Beyond accuracy: What data quality means to data consumers,” Journal of Management Information Systems, vol. 12, no. 4, pp. 5–33, 1996. DOI: 10.1080/07421222.1996.11518099.
