Maximizing Correctness with Minimal User Effort to Learn Data Transformations

Data transformation often requires users to write many trivial, task-dependent programs to transform thousands of records. Recent programming-by-example (PBE) approaches enable users to transform data without coding. A key challenge for these PBE approaches is delivering correctly transformed results on large datasets, since the transformation programs are likely to be generated by non-expert users. To address this challenge, existing approaches aim to identify a small set of potentially incorrect records and ask users to examine these records instead of the entire dataset. However, because transformation scenarios are highly task-dependent, existing approaches cannot capture the incorrect records across various scenarios.

We present an approach that learns from past transformation scenarios to generate a meta-classifier that identifies the incorrect records. Our approach color-codes the transformed records and presents them for users to examine. Users can either enter an example for a record that was transformed incorrectly or confirm the correctness of a transformed record. The approach then learns from the users' labels to refine the meta-classifier so that it identifies incorrect records more accurately. Simulation results and a user study show that our method can identify the incorrectly transformed records and reduce the user effort required to examine the results.
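
A minimal Python sketch of how a meta-classifier of this kind might combine several base classifiers through a weighted vote. The slide transcript depicts weights w1, w2, … applied to selected classifiers, but everything concrete here is an assumption made for illustration: the base classifiers (`not_all_digits`, `contains_space`), the weight values, and the zero decision threshold are hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, Sequence

# A base classifier looks at a (raw, transformed) pair and votes
# +1 ("looks incorrect") or -1 ("looks correct").
BaseClassifier = Callable[[str, str], int]


def not_all_digits(raw: str, transformed: str) -> int:
    """Hypothetical check: flag outputs that are not purely numeric."""
    return -1 if transformed.isdigit() else 1


def contains_space(raw: str, transformed: str) -> int:
    """Hypothetical check: flag outputs that still contain whitespace."""
    return 1 if " " in transformed else -1


def meta_score(raw: str, transformed: str,
               classifiers: Sequence[BaseClassifier],
               weights: Sequence[float]) -> float:
    """Weighted sum of the base classifiers' votes."""
    return sum(w * c(raw, transformed) for c, w in zip(classifiers, weights))


def is_suspicious(raw: str, transformed: str,
                  classifiers: Sequence[BaseClassifier],
                  weights: Sequence[float]) -> bool:
    """Binary decision: a positive weighted vote marks the record as suspicious."""
    return meta_score(raw, transformed, classifiers, weights) > 0


# Illustrative weights (assumed, not learned from real data).
CLASSIFIERS = [not_all_digits, contains_space]
WEIGHTS = [0.6, 0.4]

for raw, out in [("11”H x 6”", "11"), ("30 x 46”", "30 x 46")]:
    print(raw, "->", out, "| suspicious:", is_suspicious(raw, out, CLASSIFIERS, WEIGHTS))
```

In a real system the weights would be learned from labeled past scenarios and refined with the users' labels; the fixed values above only show how the votes are combined.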

Published in: Data & Analytics

Maximizing Correctness with Minimal User Effort to Learn Data Transformations

  1. Maximizing Correctness with Minimal User Effort to Learn Data Transformations. Bo Wu and Craig Knoblock, University of Southern California, Department of Computer Science
  2. Art website Buyer
  3. Dimension of artworks
  4. Programming by Example. Video is from the official Excel YouTube channel (https://www.youtube.com/watch?v=YPG8PAQQ894)
  5. Too Many Records
  6. Overconfident Users: Users are often too confident to examine the results thoroughly
  7. Variations
  8. Problem: Enable the users of PBE systems to achieve maximal correctness with minimal effort on large datasets. Help users identify at least one of the incorrect records in every iteration with minimal effort on large datasets
  9. Approach Overview
     • Entire dataset (Raw → Transformed): 10“ H x 8” W → 10; H: 58 x W:25” → 58; 12”H x 9”W → 12; 11”H x 6” → 11; …; 30 x 46” → 30 x 46
     • Random sampling, giving the sampled records: 10“ H x 8” W → 10; 11”H x 6” → 11; …; 30 x 46” → 30 x 46
     • Verifying records: 11”H x 6” → 11; 30 x 46” → 30 x 46; …
     • Sorting and color-coding: 30 x 46” → 30 x 46; 11”H x 6” → 11; …
  10. Learning from users’ feedback
  11. Verifying Records
      • First, recommend records causing runtime errors
        – Records that cause the program to exit abnormally
      • Second, recommend potentially incorrect records
        – Learn a binary meta-classifier
      Input: 2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic
      Ex. (Raw → Transformed): 11”H x 6” → 11; 30 x 46” → 30 x 46; …
  12. Learning the Meta-classifier
      [Figure: three groups of classifiers labeled Similarity (cs1 … cs4), Program agreement (cp1 … cp4), and Format ambiguity (cf1 … cf4); the meta-classifier combines selected classifiers (cs3, cs4, cp2, cf1, …) with weights w1, w2, w3, w4, …]
  13. Evaluation
      • The recommendation contains incorrect records
  14. Evaluation
      • The recommendation can place incorrect records on top
  15. User study. Experiment setup:
      • 5 scenarios with 4000 records per scenario
      • 10 graduate students divided into two groups
  16. Summary and Future Work
      • Summary
        – Sample records
        – Identify incorrect/questionable records
        – Allow user to refine the recommendation
        – Color-code the results
      • Future work
        – Show histograms of the data
        – Translate the program to readable natural text
  17. Questions? Data and system available at https://github.com/areshand/Web-Karma
  18. Type of Classifiers
      • Classifier based on distance
      • Classifier based on agreement of programs
      • Classifier based on format ambiguity
  19. Learning from various past results
      • Scenario (Raw → Transformed): 26" H x 24" W x 12.5 → 26; Framed at 21.75" H x 24.25” W → 21; 12" H x 9" → 12; …
      • Scenario (Raw → Transformed): Ravage 2099#24 (November, 1994) → November, 1994; Gambit III#1 (September, 1997) → September, 1997; (comic) Spidey Super Stories#12/2 (September, 1975) → comic; …
      Legend: Examples, Incorrect records, Correct records
  20. Sorting Records
      [Flowchart: Checking transformed records → Runtime errors? Yes: rank records using #failed_subprograms. No: rank records using meta-classifier output]
      Record → #failed_subprograms:
      • 2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic → 3
      • 1998 Honda Civic 12k miles s. Auto. - $3800 (Arcadia) → 2
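
The ranking flow on the Sorting Records slide can be sketched in Python as follows. This is one reading of the flowchart, not the authors' code: it assumes that when any record triggers a runtime error, those records are ranked by `#failed_subprograms`, and that otherwise all records are ranked by the meta-classifier output. The `Record` class, the `recommend` function, the `top_k` parameter, and the `meta_score` values are hypothetical names and numbers introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    raw: str
    failed_subprograms: int  # number of subprograms that failed at runtime
    meta_score: float        # assumed meta-classifier output; higher = more suspicious


def recommend(records: List[Record], top_k: int = 10) -> List[Record]:
    """Return up to top_k records for the user to examine first."""
    failing = [r for r in records if r.failed_subprograms > 0]
    if failing:
        # Runtime errors present: rank by how many subprograms failed.
        ranked = sorted(failing, key=lambda r: r.failed_subprograms, reverse=True)
    else:
        # No runtime errors: rank by the meta-classifier output instead.
        ranked = sorted(records, key=lambda r: r.meta_score, reverse=True)
    return ranked[:top_k]


# The two example records from the slide; meta_score is a placeholder here.
records = [
    Record("1998 Honda Civic 12k miles s. Auto. - $3800 (Arcadia)", 2, 0.0),
    Record("2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic", 3, 0.0),
]
for r in recommend(records):
    print(r.failed_subprograms, r.raw)
```

With the slide's two records, the Mitsubishi entry (3 failed subprograms) is listed before the Honda entry (2).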
