Session: Approaches to Improved Collection and Dissemination of Earth Science Data Quality Information
AGU 2015 | San Francisco | 14-18 December 2015
MINERAL RESOURCES FLAGSHIP
Anusuriya Devaraju and Jens Klump
(anusuriya.devaraju@csiro.au)
Using Feedback from Data Consumers to Capture Quality
Information on Environmental Research Data
Images: Anett Moritz, bpaquality.wordpress.com
Outline
• Definitions (User Feedback, Research Data, Data Quality)
• Motivation
• Goals & Solutions
• Summary
User Feedback
• Feedback refers to information about reactions to a product.
• Feedback types: user experience (assessment and usage); general (comment, how-to, suggestion, dissuasion); rating; requirements (feature, content).
Image by Commonwealth Fund
Research Datasets
Research data are facts, observations or experiences on which an argument, theory or test is based.¹
¹ The University of Melbourne draft policy on the Management of Research Data and Records
Data Quality
• The quality of data is often examined based on several categories.*
• Quality = fitness for use (Wang & Strong, 1996): data that are appropriate for use or meet user needs.
• Datasets are often used for a purpose different from the intended one.
• An inadequate understanding of that purpose may lead to poor quality of derived data.
* R.Y. Wang, D.M. Strong, "Beyond accuracy: what data quality means to data consumers", 1996.
(Image: https://wq.io/research/quality)
Outline
• Definitions (User Feedback, Research Data, Data Quality)
• Motivation
• Goals & Solutions
• Summary
Quality Measures in Practice
Data quality descriptions supplied by data providers might be incomplete or may only address specific quality aspects:
• Accessibility, e.g., persistent identification, file format
• Completeness, e.g., required metadata
• Compliance with community standards
• Privacy and confidentiality concerns
• Code review, e.g., checking and verifying replication code
• Links to other research products
Scholarly and data journals may take a role in ensuring data quality, but this mechanism only applies to data sets submitted to the journals.
Reference: http://www.ijdc.net/index.php/ijdc/article/view/9.1.263/358
User Feedback and Data Quality
Data consumers may complement existing entities in assessing and documenting the quality of published data sets. Data quality information may be gathered via a user feedback approach.
[Diagram: the provider is responsible for data creation and publication; the consumer contributes discovered issues, data applications and derived datasets.]
Image: http://whartonmagazine.com/blogs/women-and-leadership-moving-forward/
Existing Feedback Mechanisms
Examples of research data portals and their feedback mechanisms:
• Research Data Australia (RDA): general feedback form, and user-contributed tags for data discovery
• CSIRO Data Access Portal: refers users to the email address of the data collector in the metadata
• TERN Data Discovery Portal: general contact form
• Australian Ocean Data Network (AODN) Portal: general contact form and portal help forum
• Atlas of Living Australia (ALA): UserVoice feedback portal
• OzFlux Data Portal: email link (for all inquiries and assistance)
• National Marine Mammal Data Portal: general feedback form
• Urban Research Infrastructure Network: email link for general inquiries, and social media buttons for distributing the link of a data set
Why Does Quality Information From Users Matter?
Feedback information from data consumers gives other users and data providers better insight into the application and assessment of published data sets.
[Figure: an example of corrected groundwater chemistry data sets provided by the Geological Survey of South Australia, with correction notes produced by Gray and Bardwell (2015).]
Why Does Quality Information From Users Matter?
Data providers may use the feedback information to handle erroneous data and to improve existing data collection and processing methods.
[Figure: an issue-tracking component installed as part of the Terrestrial Environmental Observatories (TERENO) data portal.]
Outline
• Definitions (User Feedback, Research Data, Data Quality)
• Motivation
• Goals & Solutions
• Summary
Goals
Develop a systematic and reusable approach to:
1. Capture user feedback on the application and assessment of research datasets (with identifiers)
2. Link feedback information to the actual datasets
3. Support discovery of research data using feedback information
User Feedback System
[Architecture diagram: a data portal with a feedback plugin submits feedback over REST (JSON) to a feedback web service backed by a MySQL feedback data store; a D2R server with a D2R engine maps the store to RDF and exposes a SPARQL endpoint for Linked Data and SPARQL clients.]
Feedback from users may be gathered:
• Implicitly (automated tracking of data activities)
• Explicitly (predefined input templates)
The system operates in three steps: (1) gather feedback, (2) store feedback, (3) publish feedback.
[Figure: the prototype of the user feedback system]
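As an illustration of the gathering step, an explicit feedback record might be submitted to the feedback web service as JSON along these lines. All field names and identifiers here are assumptions for illustration; the prototype's actual REST schema is not given in the slides.

```python
# Sketch of an explicit feedback record as it might be POSTed (as JSON)
# to the feedback web service. Field names are illustrative assumptions.
import json

feedback = {
    "type": "error_report",  # one of the feedback types: usage, rating, ...
    "target": {
        "dataset_id": "doi:10.0000/example-dataset",  # hypothetical identifier
        "subset": "2014 groundwater chemistry records",
    },
    "contributor": {"name": "A. Researcher", "affiliation": "Example University"},
    "comment": "Conductivity values appear to be off by a factor of 10.",
    "supporting_documents": ["http://example.org/notes/correction.pdf"],
}

payload = json.dumps(feedback)
# The service would store this record and later expose it as RDF via D2R.
print(payload)
```

Linking the record to a persistent dataset identifier (here a hypothetical DOI) is what allows the feedback to be attached to the actual data set in the later steps.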
1. Gather Feedback
2. Store Feedback
A relational data model representing key aspects of user feedback:
• Feedback types and contributors
• Target data and context
• Supporting documents
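The relational model above might be sketched as follows, using SQLite for illustration (the prototype uses MySQL, and all table and column names here are hypothetical):

```python
# Minimal sketch of the relational feedback model: contributors,
# feedback records tied to a target data set, and supporting documents.
# SQLite stands in for the prototype's MySQL store; names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contributor (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE feedback (
    id             INTEGER PRIMARY KEY,
    contributor_id INTEGER REFERENCES contributor(id),
    feedback_type  TEXT NOT NULL,   -- e.g. error report, rating, usage note
    target_dataset TEXT NOT NULL,   -- identifier of the data set
    context        TEXT,            -- e.g. the subset or purpose of use
    comment        TEXT
);
CREATE TABLE supporting_document (
    id          INTEGER PRIMARY KEY,
    feedback_id INTEGER REFERENCES feedback(id),
    url         TEXT NOT NULL
);
""")
conn.execute("INSERT INTO contributor (id, name) VALUES (1, 'A. Researcher')")
conn.execute(
    "INSERT INTO feedback (id, contributor_id, feedback_type, target_dataset, comment) "
    "VALUES (1, 1, 'error report', 'doi:10.0000/example-dataset', 'Suspected unit error')"
)
row = conn.execute(
    "SELECT c.name, f.feedback_type, f.target_dataset "
    "FROM feedback f JOIN contributor c ON f.contributor_id = c.id"
).fetchone()
print(row)  # ('A. Researcher', 'error report', 'doi:10.0000/example-dataset')
```

Keeping contributors and supporting documents in separate tables lets one feedback record carry several attachments, and lets a D2R mapping expose each table as its own RDF class.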
3. Publish Feedback
[Figure: a high-level overview of the W3C PROV model. Image: http://www.w3.org/TR/2013/NOTE-prov-primer-20130430/]
3. Publish Feedback
Feedback published as Linked Data
[Figure: entities and agent involved in an error-report feedback activity.]
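To make the publishing step concrete, a single error-report feedback activity could be rendered in Turtle with W3C PROV terms along the following lines. All URIs, identifiers and the exact triple shapes are hypothetical; in the prototype the RDF is produced by the D2R server from the MySQL store, not hand-built like this.

```python
# Sketch: serialize one error-report feedback record as Turtle using
# W3C PROV and Dublin Core vocabulary terms. All URIs are hypothetical.

def feedback_to_turtle(feedback_id, dataset_uri, reporter, note):
    """Render a feedback record as a PROV entity generated by a
    feedback activity that used the target data set."""
    return f"""\
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/feedback/> .

ex:{feedback_id} a prov:Entity ;
    dct:description "{note}" ;
    prov:wasGeneratedBy ex:{feedback_id}-activity ;
    prov:wasAttributedTo ex:{reporter} .

ex:{feedback_id}-activity a prov:Activity ;
    prov:used <{dataset_uri}> .

ex:{reporter} a prov:Agent .
"""

ttl = feedback_to_turtle(
    "fb42",
    "http://example.org/dataset/gw-chem",
    "alice",
    "Suspected unit error in conductivity column",
)
print(ttl)
```

Because the feedback entity points at the data set via `prov:used`, a Linked Data client that resolves the data set's URI can discover the feedback attached to it.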
Conclusions
• We developed a prototype user feedback system to capture quality information (assessment and application) on research datasets from users.
• The prototype supports the retrieval and publication of user feedback information by combining a number of open-source technologies.
• The feedback records are made available as Linked Data to promote integration with other sources on the Web.
• The W3C PROV model is used to represent the provenance of user feedback information.
What’s Next?
• Track data application and assessment in a development environment
Thank You…
IMPORTANT ASPECTS:
VALUE, EASY, FAST..

Editor's Notes

  • #5 Feedback - information about reactions to a product, a person's performance of a task, etc. which is used as a basis for improvement.
  • #6 Research Data: Data are facts, observations or experiences on which an argument, theory or test is based. Data may be numerical, descriptive or visual. Data may be raw or analysed, experimental or observational. Data includes: laboratory notebooks; field notebooks; primary research data (including research data in hardcopy or in computer readable form); questionnaires; audiotapes; videotapes; models; photographs; films; test responses. Research collections may include slides; artefacts; specimens; samples. Provenance information about the data might also be included: the how, when, where it was collected and with what (for example, instrument). The software code used to generate, annotate or analyse the data may also be included. Research data means information objects generated by scholarly projects for example through experiments, measurements, surveys or interviews.
  • #7 The concept of "fitness for use" emphasizes the importance of taking a consumer viewpoint of quality, because ultimately it is the consumer who will judge whether or not a product is fit for use. We define "data quality" as data that are fit for use by data consumers.
  • #9 What do we know about the quality of these datasets? Why does quality matter? Who should be responsible for their quality? In the research data ecosystem, several entities are responsible for data quality. Data producers (researchers and agencies) play a major role in this aspect, as they often include validation checks or data cleaning as part of their work. It is possible that quality information is not supplied with published data sets; where it is available, the descriptions might be incomplete, ambiguous or limited to specific quality aspects. Data repositories have built infrastructures to share data, but not all of them assess data quality; they normally provide guidelines for documenting quality information. Some suggest that scholarly and data journals should take a role in ensuring data quality, by involving reviewers to assess the data sets used in articles and by incorporating data quality criteria in the author guidelines. However, this mechanism primarily addresses data sets submitted to journals.
  • #10 Note that not all user feedback records are classified as quality information.
  • #12 Linking the corrected data sets and the supporting documents to the existing data repository can improve the re-usability of the data, reduce the duplication of effort in data handling, and potentially stimulate collaborations among researchers working in similar domains.
  • #13 an issue tracking component installed as part of the Terrestrial Environmental Observatories (TERENO) data portal is used by TERENO members to report any problems or issues related to data sets made available through the portal.
  • #19 Vocabularies from Dublin Core and PROV-O are used to clarify the source and attribution of feedback.
  • #20 Vocabularies from Dublin Core and PROV-O are used to clarify the source and attribution of feedback.
  • #21 The framework comprises a browser plug-in, a web service and a data model such that feedback can be easily reported, retrieved and searched. The feedback records are also made available as Linked Data to promote integration with other sources on the Web. Vocabularies from Dublin Core and PROV-O are used to clarify the source and attribution of feedback. The application of the framework is illustrated with the CSIRO’s Data Access Portal. Provenance of feedback data was annotated with the W3C PROV ontology