Aditya Parameswaran
Assistant Professor
University of Illinois
(w/ Manasi Vartak, Samuel Madden @ MIT;
Tarique Siddiqui, Silu Huang @ Illinois)
http://data-people.cs.illinois.edu
DSIA Workshop, VIS 2015
Towards Visualization
Recommendation Systems
1
“Bring out your dead!” courtesy Monty Python
The Dark Ages of Visualization
Recommendations
Substantial manual effort and tedious trial-and-error
2
To the Age of Enlightenment:
the Holy Grail
Can we build systems that automatically recommend
visualizations highlighting patterns of interest?
3
“The Holy Grail” courtesy Monty Python
Why now?
Reason 1: Too much data: records and attributes
Most of the dataset is unexplored!
4
Why now?
Reason 2: Lack of skills
Harvard Business Review Mashable.com
5
Limitations in Current Tools
• Big Picture
• Analyst Preferences
• Specification
• Exploration
not ACID …
6
Limitations in Current Tools
• Big Picture
– Poor comprehension of context
• Analyst Preferences
– Limited understanding of user interests
• Specification
– Insufficient means to specify trends of interest
• Exploration
– Inadequate navigation to unexplored areas
7
Recent Attempts at Vizrec Systems
• Tableau Elastic
• Voyager
• Harvest
• Profiler
• Our systems
– SeeDB [VLDB 14 x 2, VLDB 16]
– zenvisage [unpublished]
This conference!
8
Still early days!
SeeDB: Comparative Tasks
Task:
Compare staplers (target, query)
with other products
Results:
Visualizations where staplers
“differ most” from other products
Issue: Many attributes → many, many visualizations!
[Figure: bar charts by state (MA, CA, IL, NY) comparing "Stapler sales" with "Other sales" and "Stapler prod" with "Other prod"]
9
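The comparative task above (find the views where staplers "differ most" from other products) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Euclidean distance between normalized aggregates is an assumed stand-in for "differ most", the function names are hypothetical, and the toy rows are invented rather than taken from the talk.

```python
from collections import defaultdict
from math import sqrt


def normalized_aggregate(rows, dimension, measure):
    """SUM(measure) GROUP BY dimension, rescaled so the group values sum to 1."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return dict(totals)
    return {group: value / grand_total for group, value in totals.items()}


def deviation(target_dist, reference_dist):
    """Euclidean distance between two distributions (an assumed metric)."""
    groups = set(target_dist) | set(reference_dist)
    return sqrt(sum(
        (target_dist.get(g, 0.0) - reference_dist.get(g, 0.0)) ** 2
        for g in groups
    ))


def rank_views(rows, is_target, dimensions, measures):
    """Score every (dimension, measure) view; highest deviation first."""
    target = [r for r in rows if is_target(r)]
    reference = [r for r in rows if not is_target(r)]
    scored = []
    for d in dimensions:
        for m in measures:
            score = deviation(
                normalized_aggregate(target, d, m),
                normalized_aggregate(reference, d, m),
            )
            scored.append((score, d, m))
    return sorted(scored, reverse=True)


# Hypothetical toy data, not taken from the talk.
rows = [
    {"product": "stapler", "state": "MA", "quarter": "Q1", "sales": 40},
    {"product": "stapler", "state": "MA", "quarter": "Q2", "sales": 40},
    {"product": "stapler", "state": "CA", "quarter": "Q1", "sales": 10},
    {"product": "stapler", "state": "CA", "quarter": "Q2", "sales": 10},
    {"product": "pen", "state": "MA", "quarter": "Q1", "sales": 10},
    {"product": "pen", "state": "MA", "quarter": "Q2", "sales": 10},
    {"product": "pen", "state": "CA", "quarter": "Q1", "sales": 40},
    {"product": "pen", "state": "CA", "quarter": "Q2", "sales": 40},
]

ranked = rank_views(
    rows,
    is_target=lambda r: r["product"] == "stapler",
    dimensions=["state", "quarter"],
    measures=["sales"],
)
for score, dimension, measure in ranked:
    print(f"{measure} by {dimension}: deviation {score:.3f}")
```

In this toy data, staplers sell mostly in MA while the other product sells mostly in CA, so "sales by state" ranks first; sales by quarter is identical for both groups and scores zero. The number of candidate views grows with dimensions × measures, which is the "many attributes" issue the slide raises.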
: Search Tasks
Very early demo! Feedback welcome.
(you saw it here first...)
10
5 Recommendation Axes
• Specification of Intended Task or Insight
– e.g., comparative (X vs. Y), search (find X matching
desired criteria), outliers (find unusual X)
• Data Characteristics
– e.g., typical correlations, patterns, trends across
attributes, across rows
• Semantics or Domain Knowledge
• Visual Ease of Understanding
• Analyst Preferences
11
data-people.cs.illinois.edu/papers/dsia.pdf
Architectural Considerations
• Pre-computation
• Online computation
– Sharing
– Parallelism
– Pruning
– Approximations [VLDB ’15]
12
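To make the "Sharing" bullet concrete, here is a minimal sketch of one way sharing could work: several group-by aggregates are computed in a single pass over the data instead of one scan per candidate view. The function names and toy rows are hypothetical, and this is not a description of the actual optimizer in SeeDB or any other system.

```python
from collections import defaultdict


def separate_scans(rows, views):
    """Baseline: one full scan of the data per (dimension, measure) view."""
    results = {}
    for dimension, measure in views:
        groups = defaultdict(float)
        for row in rows:
            groups[row[dimension]] += row[measure]
        results[(dimension, measure)] = dict(groups)
    return results


def shared_scan(rows, views):
    """Sharing: a single scan that updates the aggregates of every view."""
    results = {view: defaultdict(float) for view in views}
    for row in rows:  # the data is read once, for all views
        for dimension, measure in views:
            results[(dimension, measure)][row[dimension]] += row[measure]
    return {view: dict(groups) for view, groups in results.items()}


# Hypothetical toy data, not taken from the talk.
rows = [
    {"state": "MA", "quarter": "Q1", "sales": 30, "profit": 5},
    {"state": "MA", "quarter": "Q2", "sales": 20, "profit": 4},
    {"state": "CA", "quarter": "Q1", "sales": 10, "profit": 2},
    {"state": "CA", "quarter": "Q2", "sales": 40, "profit": 9},
]
views = [
    ("state", "sales"),
    ("state", "profit"),
    ("quarter", "sales"),
    ("quarter", "profit"),
]

shared = shared_scan(rows, views)
separate = separate_scans(rows, views)
print(shared == separate)
```

Both functions return the same aggregates; the shared version reads the rows once rather than once per view, which matters when the data lives in a DBMS and each scan is a separate query.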
A Clarion Call to DSIA Researchers…
Visualization Recommendation Systems:
are critically important
are timely
lead to interesting viz, db, ml, hci problems
Let’s move towards the age of enlightenment!
“The Holy Grail” courtesy Monty Python
13
Ongoing Projects in Interactive Analytics
Minimizing effort & maximizing efficiency
http://data-people.cs.illinois.edu
• Data Manipulation [VLDB ’15 x 2]
• Data Visualization [VLDB ’14 x 2, VLDB ’15, VLDB ’16]
• Data Collaboration [VLDB ’15 x 2, CIDR ’15, TAPP ’15]
• Data Processing with [VLDB ’15, HCOMP ’15, KDD ’15]
datahub
14
Recent Papers, Demos
POPULACE
15
Research Thrust II: Crowds
Minimizing cost and maximizing accuracy in
human-powered data management
Data Processing
Algorithms
Auxiliary Plugins:
Quality, Pricing
Data Processing
Systems
Filter [SIGMOD12, VLDB14] Max [SIGMOD12]
Clean [KDD12, TKDD13] Categorize [VLDB11]
Search [ICDE14] Debug [NIPS12] Count [HCOMP15]
Deco [CIKM12, VLDB12, TR12, SIGMOD Record 12]
DataSift [HCOMP13, SIGMOD14] HQuery [CIDR11]
Conf [KDD13, ICDE15] Evict [TR12] Debias [KDD15]
Pricing [VLDB15] Quality [HCOMP14]
16
Human-in-the-loop
Data Management
Dual personalities
• Analysts supervising the analysis
– How do we help them get the insights they want?
• Crowds helping the analysis
– How do we best make use of them to process data?
17
[Figure: architecture diagram. A Middleware Layer containing an Optimizer (Sharing, Pruning) sits between the Visualizations and the DBMS and issues Queries (100s)]
18
[Figure: interface screenshot with labeled regions: Task Specification, Manual Visualization Builder, Visualization Pane, Recommendation Bar]
User Study
Part I: Validate utility metric vs. other metrics
– See paper!
Part II: Study impact of recommendations
– H1: SeeDB finds interesting visualizations faster
– H2: Users prefer tool w/ recommendations
I. SeeDB enables faster analysis
• Users view more visualizations with SeeDB
• Users bookmark more visualizations with SeeDB
• Bookmark rate 3X higher with SeeDB
          # charts         # bookmarks     bookmark rate
Manual    6.3 +/- 3.8      1.1 +/- 1.45    0.14 +/- 0.16
SeeDB     10.8 +/- 4.41    3.4 +/- 1.35    0.43 +/- 0.23
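As a quick arithmetic check on the "3X" claim, the ratios of the reported means can be computed directly. This sketch uses only the mean values from the table above and ignores the reported spreads; the variable names are illustrative.

```python
# Mean values as reported in the table: (Manual, SeeDB).
reported_means = {
    "# charts": (6.3, 10.8),
    "# bookmarks": (1.1, 3.4),
    "bookmark rate": (0.14, 0.43),
}

# Ratio of the SeeDB mean to the Manual mean for each column.
ratios = {
    column: seedb / manual
    for column, (manual, seedb) in reported_means.items()
}
for column, ratio in ratios.items():
    print(f"{column}: SeeDB / Manual = {ratio:.2f}")
```

The bookmark-rate ratio comes out at roughly 3.07, consistent with the "3X higher" bullet.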
II. Users Prefer SeeDB
100% of users prefer SeeDB over Manual
“. . . quickly deciding what correlations are relevant” and
“[analyze] . . . a new dataset quickly”
“. . . great tool for proposing a set of initial queries for a
dataset”
“. . . potential downside may be that it made me lazy so I
didn’t bother thinking as much about what I really could study
or be interested in”
Questions on Part 2?
Overall research agenda …
Human-in-the-loop
Data Management
24
25

Editor's Notes

  • #3 Despite the advent of visualization tools like Tableau, we’re still in the Dark Ages of visualization recommendations. Current tools are akin to a movie catalog, where you can see the list of available movies, select the ones you want, and see information about them. If you don’t know the movie you want to watch, you’ll have to look at a whole lot of movies before you find what you desire. In other words, current visualization systems involve substantial manual effort and tedious trial-and-error before you get the desired result.
  • #4 Let’s move to the Age of Enlightenment. Much like the Netflix and Amazon recommendations of today, …
  • #5 Why is this timely? Datasets are increasingly large, with large numbers of records and attributes. As a result, most of the dataset is unexplored, motivating the need for recommendations for the unexplored areas.
  • #6 The second reason is that everyone wants to be a data scientist (and who are we to argue?), but many don’t really have the skills. We need to build the tools that help them get the insights they need.
  • #7 So what do current systems lack? I’m a database guy, and for some reason we love chemistry-based acronyms, so here’s a new one.
  • #8 Provide a… Is the dip in sales in February expected, or is it anomalous? Current tools do not take into account typical browsing patterns. For instance, what if the analyst wants to find all products that took a hit in February? Can we find all attributes on which two products differ? Often users focus on a tiny portion of the dataset, perhaps due to inexperience.
  • #9 As it turns out, we aren’t the only ones preaching this wisdom. Recent systems partially address these limitations, including one from Tableau and one appearing at this very conference from Jeff and the UW folks. I’m going to tell you about our systems to give you a flavor of what we’re talking about.
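  • #10 Caters to the user specification of a comparative task. What SeeDB will provide are visualizations where staplers “differ most” from other products, among all the visualizations. The key issue here is that many attributes lead to many, many visualizations.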
  • #11 Caters to the user specification of a search task
  • #12 In our workshop paper, we identified 5 recommendation axes. … which is very hard. … there is a ton of work from the viz community on this.
  • #13 In building these vizrec systems, there are a number of interesting systems challenges. What should be done online and what offline? Online, how do we maximize sharing and parallelism in evaluating these recommendations? How do we … that we know are not useful? How do we leverage approximations to return results faster, or return approximate results?
  • #14 In the age of data science
  • #19 Overall architecture: a middleware layer sits between the UI and the DBMS. The user task (compare married/unmarried) is broken down into a collection of queries; the optimizer handles these queries using a combination of … optimizations and makes repeated queries to the DBMS.
  • #23 Note of caution