SlideShare a Scribd company logo
The State-of-the-Art of Set Visualization
This paper talks about different kinds of set visualization techniques to visualize large number
of possible relations between collections of elements (set – collection of unique elements). This
paper categorizes these techniques into six main categories:
 Euler and Venn diagrams (Euler Diagrams, ComED, DupED, Bubble sets)
 Overlays (Bubble sets, Linesets, kelp diagrams, kelp fusion, Icon Lists)
 Node-link diagrams(ComED , DupED , KelpDiagrams, Kelp Fusion, LinkedLists , Anchored
Maps, Pivot paths, Frequency grids)
 Matrix-based techniques ( Conset, Onset, Frequency grids, similarity matrix etc.)
 Aggregation-based techniques (Radial sets, parallel sets, linear diagram, upset , Meta
crystal etc.)
 Scatter plots and other techniques (Scatter view , Bicentric Diagrams)
Comparison of selected techniques by tasks they support:
Key Insight:
“Clearly, there is no single technique that supports all tasks. The choice of the technique to use
for a specific problem requires extensive analysis of the problem domain and its data
characteristics. This is important to determine the tasks that need to be supported and the
actual scalability requirements.”
Scalability limits depend on the complexity of the specific data set, such as overlap strength and
skewness (measure of the asymmetry of the distribution) in the set sizes.
Categories according to visual elements they use:
 Regions
 Lines
 Glyphs and icons
 Combination thereof
Problems which can be modeled using sets:
 Tags and multi-label classifications
 Subscriptions
 Voting: Many voting scenarios, a voter can select one more candidates for a specific
election. Each candidate defines a set over the voters who selected him/her.
 Surveys: In case of multiple-choice survey questions which can be answered by
selecting multiple answers each answer defines a set over survey projects who selected
this answer
 Probabilistic events: Each probabilistic event can be modeled as a subset of the sets of
all possible outcomes.
 Fuzzy clustering: In fuzzy clustering, one point can belong to multiple clusters, which
results in set overlaps.
 Query results: items satisfying a query can be modeled as a set over all data items. It is
possible to compare multiple
 Meta search
 Faceted search: “Faceted Search” offers dynamic filters that are automatically
generated based on your actual query results. These filters let you quickly and easily
hone your query over 50 million LinkedIn profiles based on 8 facets: current company,
past company, location, relationship, location, industry, school, and profile languages.
 Formal concept analysis: Modeling concepts and their relations to reason problems
 Genomics: Modeling individuals as a set that contains specific genes.
Common Tasks with set typed data:
 Tasks related to elements: Related to membership of elements in the sets in one word
containment (what elements are contained in a particular set) this task is at element
level.
 Tasks related to sets and set relations: This task is at set level higher level reasoning
about sets not taking individual elements into consideration. Grouping sets into set
families, analyze inclusion relations etc.
 Tasks related to element attributes: This task is concerned with interrelation between
element memberships and attributes.
Open Problems:
 Generating Euler diagrams with specific properties: Tools that determine whether a
diagram can be drawn with desired properties and propose alternative solutions to non-
draw able cases (e.g. using shading or approximate areas) would improve the quality of
the generated Euler diagrams and their applicability in various domains. No
implementation is available yet in this direction.
 Scalability: Scalability of certain techniques is severely limited Improving upon these
limits is necessary to address various real world problems that involve a large number of
sets and/or elements.
 The role of ordering: Lot of work has been done for reordering generic matrices and
node-link diagrams to reveal clusters and/or reduce clutter this work needs to be
revisited from sets perspective.
 Evaluation: More evaluation work is needed to determine which techniques work well
for which data characteristics and tasks, and to steer future research towards promising
directions.
 Visualizing sets in the context of other data types: Further work is needed to visualize
sets over elements in a timeline, a tree or a multi-variate visualization.
 Comparing multiple set families: Comparison might involve knowing which set relations
or attribute values change most/least across the different families techniques are
needed in scalable way in number of sets and families to support such comparison tasks.
 Time-varying set-typed data: Attribute values of the elements might change over time
even with static set memberships. Visualizing changes in set-typed data is challenging,
as the data are already complex
 Visualizing fuzzy and uncertain set memberships: More work on both analytical and
visual methods is needed to communicate the fuzziness in the data and study its effect
on various set-related tasks.
Ideas and research directions that could improve on existing set visualization techniques:
 Interaction: Interactivity makes simplifying complex visualizations possible by showing
certain information on demand and selecting certain parts to explore in more detail. It
also facilitates various comparisons within one set family or across multiple families.
 Coordinated multiple views: Can reduce the complexity of the data by showing
information at multiple levels of detail. This can also provide complementary
perspectives on the data (e.g. overlap matrix + spatial set distribution) to enrich the
analysis.
 Small multiples: Could provide solutions to visualizations that are severely limited in the
number of sets, such as Euler diagrams, Mosaic Displays or Double-Decker plots. They
can also be used to compare, for instance, data with certain attribute values to
determine if they correlate with certain set relations or membership patterns.
 Hybrid representations: Might be useful especially when the sets can be semantically
divided in two groups.
 Matrix based representations: There are several possibilities to encode multiple values
in a matrix cell this can be employed to show aggregated information on the elements
and their attributes, as with aggregation-based techniques.
 Analytical methods can transform large set-typed data: Several aggregations of the
elements are possible based on their set-memberships, degrees and attribute values.
Intuitive set-operations can be used, e.g. to aggregate multiple sets, or to replace a
large family of sets with a smaller family over the same elements.

More Related Content

What's hot

Hierarchical clustering and topology for psychometric validation
Hierarchical clustering and topology for psychometric validationHierarchical clustering and topology for psychometric validation
Hierarchical clustering and topology for psychometric validation
Colleen Farrelly
 
Matlab Data And Statistics
Matlab Data And StatisticsMatlab Data And Statistics
Matlab Data And Statistics
DataminingTools Inc
 
Machine Learning by Analogy
Machine Learning by AnalogyMachine Learning by Analogy
Machine Learning by Analogy
Colleen Farrelly
 
Quantum generalized linear models
Quantum generalized linear modelsQuantum generalized linear models
Quantum generalized linear models
Colleen Farrelly
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
s v
 
Deep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsDeep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problems
Colleen Farrelly
 
PyData Miami 2019, Quantum Generalized Linear Models
PyData Miami 2019, Quantum Generalized Linear ModelsPyData Miami 2019, Quantum Generalized Linear Models
PyData Miami 2019, Quantum Generalized Linear Models
Colleen Farrelly
 
Morse-Smale Regression
Morse-Smale RegressionMorse-Smale Regression
Morse-Smale Regression
Colleen Farrelly
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
緯鈞 沈
 
Multiscale Mapper Networks
Multiscale Mapper NetworksMultiscale Mapper Networks
Multiscale Mapper Networks
Colleen Farrelly
 
18 ijcse-01232
18 ijcse-0123218 ijcse-01232
18 ijcse-01232
Shivlal Mewada
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
Valerii Klymchuk
 
WILF2011 - slides
WILF2011 - slidesWILF2011 - slides
WILF2011 - slides
Gabriella Casalino
 
Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
DataminingTools Inc
 
Marketing analytics - clustering Types
Marketing analytics - clustering TypesMarketing analytics - clustering Types
Marketing analytics - clustering Types
Suryakumar Thangarasu
 
Tensor decompositions for medical analytics
Tensor decompositions for medical analyticsTensor decompositions for medical analytics
Tensor decompositions for medical analytics
Colleen Farrelly
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
Valerii Klymchuk
 
Logistic regression: topological and geometric considerations
Logistic regression: topological and geometric considerationsLogistic regression: topological and geometric considerations
Logistic regression: topological and geometric considerations
Colleen Farrelly
 
Exploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationExploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image Segmentation
Christopher Peter Makris
 
AI to advance science research
AI to advance science researchAI to advance science research
AI to advance science research
Ding Li
 

What's hot (20)

Hierarchical clustering and topology for psychometric validation
Hierarchical clustering and topology for psychometric validationHierarchical clustering and topology for psychometric validation
Hierarchical clustering and topology for psychometric validation
 
Matlab Data And Statistics
Matlab Data And StatisticsMatlab Data And Statistics
Matlab Data And Statistics
 
Machine Learning by Analogy
Machine Learning by AnalogyMachine Learning by Analogy
Machine Learning by Analogy
 
Quantum generalized linear models
Quantum generalized linear modelsQuantum generalized linear models
Quantum generalized linear models
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Deep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsDeep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problems
 
PyData Miami 2019, Quantum Generalized Linear Models
PyData Miami 2019, Quantum Generalized Linear ModelsPyData Miami 2019, Quantum Generalized Linear Models
PyData Miami 2019, Quantum Generalized Linear Models
 
Morse-Smale Regression
Morse-Smale RegressionMorse-Smale Regression
Morse-Smale Regression
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Multiscale Mapper Networks
Multiscale Mapper NetworksMultiscale Mapper Networks
Multiscale Mapper Networks
 
18 ijcse-01232
18 ijcse-0123218 ijcse-01232
18 ijcse-01232
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
 
WILF2011 - slides
WILF2011 - slidesWILF2011 - slides
WILF2011 - slides
 
Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
 
Marketing analytics - clustering Types
Marketing analytics - clustering TypesMarketing analytics - clustering Types
Marketing analytics - clustering Types
 
Tensor decompositions for medical analytics
Tensor decompositions for medical analyticsTensor decompositions for medical analytics
Tensor decompositions for medical analytics
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Logistic regression: topological and geometric considerations
Logistic regression: topological and geometric considerationsLogistic regression: topological and geometric considerations
Logistic regression: topological and geometric considerations
 
Exploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationExploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image Segmentation
 
AI to advance science research
AI to advance science researchAI to advance science research
AI to advance science research
 

Similar to Summary2 (1)

Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...
Infrrd
 
Tdm recent trends
Tdm recent trendsTdm recent trends
Tdm recent trends
KU Leuven
 
Uml - An Overview
Uml - An OverviewUml - An Overview
Uml - An Overview
Raj Thilak S
 
08 class and sequence diagrams
08   class and sequence diagrams08   class and sequence diagrams
08 class and sequence diagrams
kebsterz
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD Editor
 
Schema Integration, View Integration and Database Integration, ER Model & Dia...
Schema Integration, View Integration and Database Integration, ER Model & Dia...Schema Integration, View Integration and Database Integration, ER Model & Dia...
Schema Integration, View Integration and Database Integration, ER Model & Dia...
Mobarok Hossen
 
Object oriented analysis and design unit- iv
Object oriented analysis and design unit- ivObject oriented analysis and design unit- iv
Object oriented analysis and design unit- iv
Shri Shankaracharya College, Bhilai,Junwani
 
Lx3520322036
Lx3520322036Lx3520322036
Lx3520322036
IJERA Editor
 
Data models
Data modelsData models
Data models
Hira Bukhari
 
Data models
Data modelsData models
Data models
Hira Bukhari
 
F04463437
F04463437F04463437
F04463437
IOSR-JEN
 
Class and object_diagram
Class  and object_diagramClass  and object_diagram
Class and object_diagram
Sadhana28
 
SE - System Models
SE - System ModelsSE - System Models
SE - System Models
Jomel Penalba
 
Ch8
Ch8Ch8
A03202001005
A03202001005A03202001005
A03202001005
theijes
 
Learning to rank image tags with limited training examples
Learning to rank image tags with limited training examplesLearning to rank image tags with limited training examples
Learning to rank image tags with limited training examples
CloudTechnologies
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
Varad Meru
 
Lecture 16 requirements modeling - scenario, information and analysis classes
Lecture 16   requirements modeling - scenario, information and analysis classesLecture 16   requirements modeling - scenario, information and analysis classes
Lecture 16 requirements modeling - scenario, information and analysis classes
IIUI
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data Streams
IJERA Editor
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
Amit Sheth
 

Similar to Summary2 (1) (20)

Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...
 
Tdm recent trends
Tdm recent trendsTdm recent trends
Tdm recent trends
 
Uml - An Overview
Uml - An OverviewUml - An Overview
Uml - An Overview
 
08 class and sequence diagrams
08   class and sequence diagrams08   class and sequence diagrams
08 class and sequence diagrams
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Schema Integration, View Integration and Database Integration, ER Model & Dia...
Schema Integration, View Integration and Database Integration, ER Model & Dia...Schema Integration, View Integration and Database Integration, ER Model & Dia...
Schema Integration, View Integration and Database Integration, ER Model & Dia...
 
Object oriented analysis and design unit- iv
Object oriented analysis and design unit- ivObject oriented analysis and design unit- iv
Object oriented analysis and design unit- iv
 
Lx3520322036
Lx3520322036Lx3520322036
Lx3520322036
 
Data models
Data modelsData models
Data models
 
Data models
Data modelsData models
Data models
 
F04463437
F04463437F04463437
F04463437
 
Class and object_diagram
Class  and object_diagramClass  and object_diagram
Class and object_diagram
 
SE - System Models
SE - System ModelsSE - System Models
SE - System Models
 
Ch8
Ch8Ch8
Ch8
 
A03202001005
A03202001005A03202001005
A03202001005
 
Learning to rank image tags with limited training examples
Learning to rank image tags with limited training examplesLearning to rank image tags with limited training examples
Learning to rank image tags with limited training examples
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 
Lecture 16 requirements modeling - scenario, information and analysis classes
Lecture 16   requirements modeling - scenario, information and analysis classesLecture 16   requirements modeling - scenario, information and analysis classes
Lecture 16 requirements modeling - scenario, information and analysis classes
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data Streams
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
 

More from Adarsh Burma

Summary_Onset (1)
Summary_Onset (1)Summary_Onset (1)
Summary_Onset (1)
Adarsh Burma
 
unofficial_Transcript (1)
unofficial_Transcript (1)unofficial_Transcript (1)
unofficial_Transcript (1)Adarsh Burma
 
Toss_Up
Toss_UpToss_Up
Toss_Up
Adarsh Burma
 
Final_Report_new (1)
Final_Report_new (1)Final_Report_new (1)
Final_Report_new (1)
Adarsh Burma
 
brf to mathml
brf to mathmlbrf to mathml
brf to mathml
Adarsh Burma
 
Academic_Projects
Academic_ProjectsAcademic_Projects
Academic_Projects
Adarsh Burma
 
Keste_Projects
Keste_ProjectsKeste_Projects
Keste_Projects
Adarsh Burma
 

More from Adarsh Burma (7)

Summary_Onset (1)
Summary_Onset (1)Summary_Onset (1)
Summary_Onset (1)
 
unofficial_Transcript (1)
unofficial_Transcript (1)unofficial_Transcript (1)
unofficial_Transcript (1)
 
Toss_Up
Toss_UpToss_Up
Toss_Up
 
Final_Report_new (1)
Final_Report_new (1)Final_Report_new (1)
Final_Report_new (1)
 
brf to mathml
brf to mathmlbrf to mathml
brf to mathml
 
Academic_Projects
Academic_ProjectsAcademic_Projects
Academic_Projects
 
Keste_Projects
Keste_ProjectsKeste_Projects
Keste_Projects
 

Summary2 (1)

  • 1. The State-of-the-Art of Set Visualization This paper talks about different kinds of set visualization techniques to visualize large number of possible relations between collections of elements (set – collection of unique elements). This paper categorizes these techniques into six main categories:  Euler and Venn diagrams (Euler Diagrams, ComED, DupED, Bubble sets)  Overlays (Bubble sets, Linesets, kelp diagrams, kelp fusion, Icon Lists)  Node-link diagrams(ComED , DupED , KelpDiagrams, Kelp Fusion, LinkedLists , Anchored Maps, Pivot paths, Frequency grids)  Matrix-based techniques ( Conset, Onset, Frequency grids, similarity matrix etc.)  Aggregation-based techniques (Radial sets, parallel sets, linear diagram, upset , Meta crystal etc.)  Scatter plots and other techniques (Scatter view , Bicentric Diagrams) Comparison of selected techniques by tasks they support:
  • 2. Key Insight: “Clearly, there is no single technique that supports all tasks. The choice of the technique to use for a specific problem requires extensive analysis of the problem domain and its data characteristics. This is important to determine the tasks that need to be supported and the actual scalability requirements.” Scalability limits depend on the complexity of the specific data set, such as overlap strength and skewness (measure of the asymmetry of the distribution) in the set sizes. Categories according to visual elements they use:  Regions  Lines  Glyphs and icons  Combination thereof Problems which can be modeled using sets:  Tags and multi-label classifications  Subscriptions
  • 3.  Voting: Many voting scenarios, a voter can select one more candidates for a specific election. Each candidate defines a set over the voters who selected him/her.  Surveys: In case of multiple-choice survey questions which can be answered by selecting multiple answers each answer defines a set over survey projects who selected this answer  Probabilistic events: Each probabilistic event can be modeled as a subset of the sets of all possible outcomes.  Fuzzy clustering: In fuzzy clustering, one point can belong to multiple clusters, which results in set overlaps.  Query results: items satisfying a query can be modeled as a set over all data items. It is possible to compare multiple  Meta search  Faceted search: “Faceted Search” offers dynamic filters that are automatically generated based on your actual query results. These filters let you quickly and easily hone your query over 50 million LinkedIn profiles based on 8 facets: current company, past company, location, relationship, location, industry, school, and profile languages.
  • 4.  Formal concept analysis: Modeling concepts and their relations to reason problems  Genomics: Modeling individuals as a set that contains specific genes. Common Tasks with set typed data:  Tasks related to elements: Related to membership of elements in the sets in one word containment (what elements are contained in a particular set) this task is at element level.  Tasks related to sets and set relations: This task is at set level higher level reasoning about sets not taking individual elements into consideration. Grouping sets into set families, analyze inclusion relations etc.  Tasks related to element attributes: This task is concerned with interrelation between element memberships and attributes. Open Problems:  Generating Euler diagrams with specific properties: Tools that determine whether a diagram can be drawn with desired properties and propose alternative solutions to non- draw able cases (e.g. using shading or approximate areas) would improve the quality of the generated Euler diagrams and their applicability in various domains. No implementation is available yet in this direction.  Scalability: Scalability of certain techniques is severely limited Improving upon these limits is necessary to address various real world problems that involve a large number of sets and/or elements.
  • 5.  The role of ordering: Lot of work has been done for reordering generic matrices and node-link diagrams to reveal clusters and/or reduce clutter this work needs to be revisited from sets perspective.  Evaluation: More evaluation work is needed to determine which techniques work well for which data characteristics and tasks, and to steer future research towards promising directions.  Visualizing sets in the context of other data types: Further work is needed to visualize sets over elements in a timeline, a tree or a multi-variate visualization.  Comparing multiple set families: Comparison might involve knowing which set relations or attribute values change most/least across the different families techniques are needed in scalable way in number of sets and families to support such comparison tasks.  Time-varying set-typed data: Attribute values of the elements might change over time even with static set memberships. Visualizing changes in set-typed data is challenging, as the data are already complex  Visualizing fuzzy and uncertain set memberships: More work on both analytical and visual methods is needed to communicate the fuzziness in the data and study its effect on various set-related tasks. Ideas and research directions that could improve on existing set visualization techniques:  Interaction: Interactivity makes simplifying complex visualizations possible by showing certain information on demand and selecting certain parts to explore in more detail. It also facilitates various comparisons within one set family or across multiple families.  Coordinated multiple views: Can reduce the complexity of the data by showing information at multiple levels of detail. This can also provide complementary perspectives on the data (e.g. overlap matrix + spatial set distribution) to enrich the analysis.  Small multiples: Could provide solutions to visualizations that are severely limited in the number of sets, such as Euler diagrams, Mosaic Displays or Double-Decker plots. They can also be used to compare, for instance, data with certain attribute values to determine if they correlate with certain set relations or membership patterns.  Hybrid representations: Might be useful especially when the sets can be semantically divided in two groups.  Matrix based representations: There are several possibilities to encode multiple values in a matrix cell this can be employed to show aggregated information on the elements and their attributes, as with aggregation-based techniques.
  • 6.  Analytical methods can transform large set-typed data: Several aggregations of the elements are possible based on their set-memberships, degrees and attribute values. Intuitive set-operations can be used, e.g. to aggregate multiple sets, or to replace a large family of sets with a smaller family over the same elements.