CS4202 Research & Development Project 
Literature Survey on Data Visualization and 
Complex Event Processing Rule Generation 
Project Group Name : Vivarana 
Project Supervisors 
Prof. Gihan Dias 
Eng. Charith Chitraranjan 
Group Members 
100112V - E.A.S.D.Edirisinghe 
100132G - W.V.D.Fernando 
100440A - R.H.T.D.Ranasinghe 
100444N - M.C.S.Ranatunga
Table of Contents 
1. Introduction 
2. Multidimensional data visualization 
3. Visualization techniques 
3.1 Scatter Plots 
3.1.1 Rank-by-feature framework 
3.1.2 Rolling Dice Framework 
3.1.3 Shortcomings of Scatterplot Matrix (SPLOM) 
3.2 Parallel Coordinates 
3.2.1 Definition and Representation 
3.2.3 Brushing 
3.2.4 Axis Reordering 
3.2.5 Data Clustering 
3.2.6 Statistical Coloring 
3.2.7 Scaling 
3.2.8 Limitations 
3.3 Radviz 
3.4 Mosaic Plots 
3.5 Self Organizing Maps 
3.6 Sunburst Visualization 
3.7 Trellis Visualization 
3.8 Grand Tour 
3.8.1 Tours 
3.8.2 Tour methods 
4. CEP Rule generation 
4.1 iCEP 
4.2 Tuning rule parameters using the Prediction-Correction Paradigm 
4.2.1 Model 
4.2.2 System State 
4.2.3 Rule Tuning Mechanism 
References
List of Figures 
Figure 1: A scatterplot of the distribution of drivers’ visibility range against their age 
Figure 2: A scatterplot matrix displays of data with three variants X, Y, and Z. 
Figure 3: Rank-by-feature framework interface for scatterplots (2D). 
Figure 4: Rank-by-feature visualization for a dataset of demographic and health related statistics for 3138 U.S. counties 
Figure 5: Scatterplot matrix navigation for a digital camera dataset. 
Figure 6: Stage-by-stage overview of the scatterplot animated transition 
Figure 7: Scatterplot matrix for the “Nuts-and-bolts” dataset 
Figure 8: Generalized Plot Matrix for the “Nuts-and-bolts” dataset 
Figure 9: Parallel coordinate plot with 8 variables for 250 cars 
Figure 10: Parallel Coordinate plot for a point 
Figure 11: Parallel Coordinate plot for points in a line with m < 0 
Figure 12: Parallel Coordinate plot for points in a line with 0<m<1 
Figure 13: Negative correlation between Car Weight and the Year 
Figure 14: Using brushing to filter Cars with 6 cylinders 
Figure 15: Using composite brushing to filter cars with 6 cylinders made in '76 
Figure 16: An example of smooth brushing 
Figure 17: Angular Brushing 
Figure 18: Multiple ways of ordering N axes in parallel coordinates 
Figure 19: Two clusters represented in parallel coordinates 
Figure 20: Multiple clusters visualized in parallel coordinates in different colors 
Figure 21: Variable length Opacity Bands representing a cluster in parallel coordinate 
Figure 22: Parallel-coordinates plot using polylines and using bundled curves 
Figure 23: Statistically colored Parallel Coordinates plot on weight of cars 
Figure 24: Three scaling options for visualizing the stage times in the Tour de France 
Figure 25: Parallel Coordinates plot for a data set with 8000 rows 
Figure 26: Parallel coordinates for the “Olive Oils” data 
Figure 27: Parallel Coordinates visualization with Z score coloring 
Figure 28: Parallel Coordinates drawn on same data set using data selection 
Figure 29: Radviz Visualization for multi-dimensional data 
Figure 30: Mosaic plot for the Titanic data showing the distribution of passengers' survival based on their class and sex 
Figure 31: Double decker plot for the Titanic data 
Figure 32: Training a self-organizing map. 
Figure 33: A self-organizing map trained on the poverty levels of countries 
Figure 34: A sunburst visualization summarizing user paths through a fictional e-commerce site. 
Figure 35: Trellis Chart for a dataset on sales 
Figure 36: Trellis Display of Scatter Plots (Relationship of Gifts Given/Received on Revenue)
Figure 37: Grand Tours 
Figure 38: 1D grand tour path in 3D 
Figure 39: Structure of the iCEP framework 
Figure 40: Prediction Correction Paradigm 
Figure 41: An overview of rules tuning method
1. Introduction 
Nowadays almost every action or event occurring in the real world, whether it is a change of temperature detected by a sensor, a change in stock market prices, or the movement of an object tracked through GPS coordinates, is digitally collected and stored for further exploration and analysis; sometimes a pre-specified action is triggered in real time when a particular event occurs. Complex Event Processing (CEP) engines are used to analyze these events on the fly and to execute appropriate pre-specified actions. One downside of real-time event monitoring and processing using a CEP engine is that a domain expert must write the necessary CEP rules in order to detect interesting events and trigger an appropriate response. Sometimes the domain expert might lack the knowledge to write efficient CEP rules for a particular CEP engine using its query language, or might need to explore, understand and analyze the incoming event stream prior to writing any rules. By providing an interactive visualization of the data to domain experts, we can support them in the process of generating CEP rules. Hence, this literature survey mainly contains two sections. Section 3 presents our findings on interactive visualization techniques: we describe scatterplots (section 3.1) and parallel coordinates (section 3.2) in detail and briefly introduce other promising visualization techniques. Section 4 contains our findings on two methods of CEP rule generation, namely iCEP and rule parameter tuning. Further, section 2 contains an overview of multidimensional visualization (principles, techniques, problems) for the sake of completeness.
2. Multidimensional data visualization 
Recent advances in technology have enabled the generation of vast amounts of data in a wide range of fields, and these data keep getting more complex. Data analysts look for patterns, anomalies and structures in the data, and analyzing it can lead to important knowledge discoveries that are valuable to users. The benefits of such understanding are reflected in better business decision making, more accurate medical diagnosis, finer engineering and, in a general sense, more refined conclusions. 
Visualizing these complex data can provide an overview and a summary of the data, and can help in identifying areas of interest within it. Good data visualization techniques that allow users to explore and manipulate the data can empower them in analyzing it and in identifying important patterns and trends that may otherwise have stayed hidden. 
Multi-dimensional data visualization is a very active research area that goes back many years [1]. In this survey we have focused on 2D multi-dimensional data visualization techniques, because a 2D visualization presents a surface that is familiar to users and easy to navigate, making it easier to analyze and interact with the data. 
There are multiple challenges that need to be overcome in multidimensional data visualization. Finding a good visualization involves finding a good compromise among the following: 
● Mapping - Finding a good mapping from a multi-dimensional space to a two dimensional space is not a simple task. The final representation of the data should be intuitive and interpretable. Users should be able to identify patterns and trends in the multi-dimensional data using the two dimensional representation. 
● Large amounts of data - Modern datasets contain very large numbers of records, which can lead to very dense visualizations. This causes loss of information, because users lose the ability to distinguish small differences in the data. 
● Dimensionality - Displaying the information of multiple dimensions in a two dimensional space can also lead to very dense and cluttered visualizations. Techniques need to be developed that allow users to reduce the clutter and identify important information in the data. Techniques such as principal component analysis [2] can help in identifying important dimensions in the data. 
● Assessing effectiveness - Information needs vary widely from one dataset to another, so there is no "silver bullet" visualization technique that can solve every problem. Different datasets and requirements call for different visualization methods. There is no general method to assess the effectiveness of one visualization method over another, and no fixed process that yields a visualization method that works for every dataset. 
Further, according to E.R. Tufte [3], a good visualization comprises the following qualities: 
● Show data variations instead of design variations. This quality encourages the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, etc. One way to achieve this quality in a visualization is to have a high data-ink ratio [4] and a high data density. 
● Clear, detailed and thorough labeling and appropriate scales. A visualization can use layering and separation techniques to show the labels of the data items. 
● The size of a graphic effect should be directly proportional to the numeric quantities it represents. This can be achieved by avoiding chart junk, such as unnecessary 3D and shadowing effects, and by reducing the lie factor [5]. 
In order to make visualizations more user friendly, a number of interaction techniques have been proposed [6]. It should be noted that the behavior of these interaction techniques differs from one visualization technique to another. Interaction techniques allow the user to directly interact with a visualization and change it according to the exploration objective. The list below contains the major interaction techniques we have identified. 
● Dynamic Projections 
Dynamic projection means dynamically changing the projection in order to explore a multidimensional data set. A classic example is the Grand Tour [7], which tries to show all interesting pairs of dimensions of a multidimensional dataset as a series of scatterplots. The sequence of projections can be random, manual, pre-computed, or even data driven, depending on the visualization technique. 
● Interactive Filtering 
When exploring large datasets, interactively partitioning the data and focusing on interesting subsets is a must. This can be achieved through direct selection of the desired subset (browsing) or through specifying the properties of the desired subset (querying). However, browsing becomes difficult and querying becomes inaccurate as the dataset grows larger. As a solution to this problem, techniques such as Magic Lens [8] and InfoCrystal [9] have been developed to improve interactive filtering in data exploration. 
● Interactive Zooming 
Zooming is used in almost all interactive visualizations. When dealing with large amounts of data, the data is often highly compressed in order to provide an overview. In such cases zooming does not only mean displaying the data objects larger; the data representation should also automatically change to present more detail at higher zoom levels (decompressing). The initial (compressed) view allows the user to identify patterns, correlations and outliers, and by zooming in to an area of interest the user can study the data objects within that region in more detail. 
● Interactive Distortion 
Interactive distortion techniques help the data exploration process by providing a way of focusing on details while preserving an overview of the data. The basic idea of distortion is to show a portion of the data at a high level of detail while the remainder is shown at a lower level of detail. 
● Interactive Linking and Brushing 
The idea of linking and brushing is to combine different visualization methods to overcome the shortcomings of any single technique. As an example, one could visualize a scatterplot matrix (section 3.1) for a data set, and when some points in a particular scatterplot are brushed, those points get highlighted in all other scatterplots. Interactive changes made in one visualization are thus automatically reflected in the others.
3. Visualization techniques 
3.1 Scatter Plots 
Scatterplots are a commonly used visualization technique for multivariate data sets. There are mainly 2D and 3D scatterplot visualizations. In a 2D scatterplot, data points from two dimensions of a dataset are plotted in a Cartesian coordinate system whose two axes represent the selected dimensions, resulting in a scattering of points. An example of a scatterplot showing the distribution of drivers' visibility range against their age is shown in Figure 1. 
Figure 1: A scatterplot of the distribution of drivers’ visibility range against their age 
The positions of the data points represent the corresponding dimension values. Scatterplots are useful for visually identifying correlations between two selected variables of a multidimensional data set, and for finding clusters or outliers in the dataset. A single scatterplot can only depict the correlation between two dimensions; a limited number of additional dimensions can be mapped to the color, size or shape of the plotted points. 
Advocates of 3D scatterplots argue that since the natural world is three dimensional, users can readily grasp 3D representations. However, there is substantial empirical evidence that for multidimensional ordinal data (rather than 3D real objects such as chairs or skeletons), users struggle with occlusion and the cognitive burden of navigation as they try to find desired viewpoints [10]. Advocates of higher dimensional displays have demonstrated attractive possibilities, but their strategies remain difficult for most users to grasp. 
Since two-dimensional scatterplot presentations offer ample power while maintaining comprehensibility, many variations have been proposed. One method used to visualize multivariate data with 2D scatterplots is the scatterplot matrix (SPLOM) [1]. 
Figure 2: A scatterplot matrix displays of data with three variants X, Y, and Z [1]. 
Each individual plot in the SPLOM is identified by its row and column number in the matrix [1]. For example, the identity of the upper left plot of the matrix in Figure 2 is (1, 3) and the lower right plot is (3, 1). The empty diagonal cells display the variable names. Plot (2, 1) is the scatterplot of parameter X against Y, while plot (1, 2) is the reverse, i.e. Y versus X. 
One major disadvantage of the SPLOM is that as the number of dimensions of the data set grows, the n-by-n SPLOM grows and each individual scatterplot gets less space. The following frameworks provide a solution to that problem by incorporating interactive techniques into the traditional SPLOM. 
3.1.1 Rank-by-feature framework 
Many variations of the initial SPLOM have been proposed to enhance its interactivity and interpretability. One such enhancement is the rank-by-feature framework [10]. Instead of directly visualizing the data points against all pairs of dimensions, this framework allows the user to select an interesting ranking criterion, as described later in this section.
Figure 3: Rank-by-feature framework interface for scatterplots (2D). All pairs of dimensions are sorted according to the current ordering criterion (correlation coefficient) (A) in the ordered list (C). The score overview (B) shows an overview of the scores of all pairs of dimensions. A mouse-over event activates a cell in the score overview, highlights the corresponding item in the ordered list (C) and shows the corresponding scatterplot in the scatterplot browser (D) simultaneously. In the scatterplot browser (D) it is also easy to traverse the scatterplot space by changing the X or Y axis using the item sliders on the horizontal or vertical axis. (Demographic and health related statistics for 3138 U.S. counties with 17 attributes.) 
Figure 3 shows a dataset of demographic and health related statistics for 3138 U.S. counties with 17 attributes, visualized through the rank-by-feature framework and its interface consists of four coordinated components: control panel (Figure 3A), score overview (Figure 3B), ordered list (Figure 3C), and scatterplot browser (Figure 3D). 
The user can select an ordering criterion in the control panel (Figure 3A), and the ordered list (Figure 3C) shows the pairs of dimensions (scatterplots) sorted according to their score for that criterion, with the scores color-coded in the background. However, users cannot see an overview of all relationships between variables at a glance in the ordered list. Hence the score overview (Figure 3B), an m-by-m grid view in which all dimensions are aligned along the rows and columns, has been implemented. Each cell of the score overview represents a scatterplot whose horizontal and vertical axes are the dimensions at the corresponding column and row respectively. 
Since this matrix is symmetric, only the lower-triangular part is shown. Each cell is color-coded by its score value using the same mapping scheme as in the ordered list. The scatterplot corresponding to the selected cell is shown in the scatterplot browser (Figure 3D) simultaneously, and the corresponding item is highlighted in the ordered list (Figure 3C). In the scatterplot browser, users can quickly look through scatterplots by using the item sliders attached to the scatterplot view.
Simply by dragging the vertical or horizontal item slider bar, users can change the dimension for either the horizontal or vertical axis respectively while preserving the other axis. 
The list below contains the ranking criteria suggested by this framework. 
● Correlation coefficient (-1 to 1): 
The Pearson's correlation coefficient r for a scatterplot S with n points (x_i, y_i) [12] is defined in Equation 1:

r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}

Equation 1: Pearson's correlation coefficient (r) for a scatterplot (S) with n points 
Pearson's r is a number between -1 and 1; the sign and magnitude indicate the direction and the strength of the relationship respectively. Although correlation does not necessarily imply causality, it can provide a good clue to the true cause, which could be another variable. Linear relationships are common and simple to understand. As a visual representation of the linear relationship between two variables, the line of best fit (regression line) is drawn over the scatterplots. 
● Least square error for curvilinear regression (0 to 1) 
This criterion sorts scatterplots by their least-square error from the optimal quadratic curve fit, so that the user can isolate scatterplots where all points are closely/loosely arranged along a quadratic curve. In some scenarios it is interesting to find nonlinear relationships in the data set in addition to linear ones. 
● Quadracity (0 to infinity) 
The "Quadricity" criterion is added to emphasize the real quadratic relationships. It ranks scatterplots according to the coefficient of the highest degree term, so that users can easily identify ones that are more quadratic than others. 
● The number of potential outliers (0 to n) 
Distance-based outlier detection methods such as DB-out [13] or density-based methods such as the Local Outlier Factor (LOF) method [14] can be used to detect outliers in a scatterplot. The rank-by-feature framework uses the LOF-based method (Figure 4), since it is more flexible and dynamic in terms of outlier definition and detection. The outliers are highlighted with yellow triangles in the scatterplot browser view. 
Figure 4: Rank-by-feature visualization for a dataset of demographic and health related statistics for 3138 U.S. counties with 17 attributes, visualized with the Number of Potential Outliers ranking criterion. 
● The number of items in the region of interest (0 to n) 
This criterion allows the user to draw a free-form polygonal region of interest on the scatterplot. The framework then uses the number of data points inside the region to order all scatterplots, so that the user can easily find the ones with the most/least items in the specified region. 
● Uniformity of scatterplots (0 to infinity) 
To calculate this criterion the two-dimensional space is divided into regular grid cells and each cell is used as a bin. For example, if a k-by-k grid has been generated, the entropy of a scatterplot S is

H(S) = -\sum_{i=1}^{k}\sum_{j=1}^{k} p_{ij} \log_2 p_{ij}

where p_ij is the probability that an item belongs to the cell at (i, j) of the grid. 
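As a concrete illustration, the uniformity score can be computed as exactly this grid entropy. The following is a minimal Python sketch of our own (assuming numpy; the grid size k and the base-2 logarithm follow the definition above), not the framework's actual implementation:

import numpy as np

def scatterplot_entropy(x, y, k=10):
    # Bin the points of scatterplot S into a k-by-k grid.
    counts, _, _ = np.histogram2d(x, y, bins=k)
    p = counts.ravel() / counts.sum()   # cell probabilities p_ij
    p = p[p > 0]                        # treat 0 * log2(0) as 0
    return -np.sum(p * np.log2(p))

A uniform scatter of points approaches the maximum entropy log2(k*k), while points concentrated in a few cells score close to 0.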
3.1.2 Rolling Dice Framework 
Rolling dice is another framework which utilizes the SPLOM to visualize multidimensional data [15]. In this framework, transitions from one scatterplot to another are performed as animated rotations in 3D space, similar to a rolling die. The rolling dice framework also suggests a visual querying technique, so that a user can refine a query by exploring how the same query would look in any scatterplot. 
Figure 5: Scatterplot matrix navigation for a digital camera dataset [15]. The main interface proposed by the rolling dice framework consists of a scatterplot matrix component (A), a scatterplot component (B) and a query layer component (C) 
The interface proposed by the framework mainly consists of three components: the scatterplot component (Figure 5B), the scatterplot matrix component (Figure 5A) and the query layer component (Figure 5C). The scatterplot component shows the currently viewed cell of the scatterplot matrix with the name and labels of the two displayed axes. The scatterplot matrix component can be used both as an overview and as a navigational tool. Navigation in the scatterplot matrix is restricted to orthogonal movement along the same row or column of the matrix, so that one dimension of the focused scatterplot is always preserved while the other changes. The change is visualized using a 3D rotation animation, which gives a semantic meaning to the movement of the points by allowing the human mind to interpret the motion as shape [16].
The transition between scatterplots is performed as a three-stage animation: extrusion into 3D, rotation, and projection back into 2D. More specifically, given the two currently visualized dimensions x and y and a vertical transition to a new dimension y', the transition follows the steps below (also depicted in Figure 6, and sketched in code after the list). 
Figure 6: Stage-by-stage overview of the scatterplot animated transition: Extrusion (A, B), rotation (C), Projection (D, E) 
● Extrusion: The scatterplot visualizing the x and y axes is extruded into 3D, where y' becomes the new depth coordinate of each data point. At the end of this step the 2D scatterplot has become 3D (Figures 6A and 6B). 
● Rotation: The scatterplot is rotated 90 degrees up or down, causing the axis previously along the depth dimension to become the new vertical axis (Figure 6C). 
● Projection: The 3D plot is projected back into 2D with x and y' as the new horizontal and vertical axes (Figures 6D and 6E). 
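The three stages can be sketched with a simple orthographic model. This is our own minimal numpy illustration, not the framework's rendering code (the real implementation also animates axes and labels):

import numpy as np

def transition_frames(x, y, y_new, steps=30):
    # Extrusion: each 2D point (x, y) gains y_new as its depth coordinate.
    pts = np.column_stack([x, y, y_new])
    for t in np.linspace(0.0, np.pi / 2, steps):
        # Rotation about the horizontal axis by angle t; at t = 90 degrees
        # the former depth axis (y_new) has become the vertical axis.
        c, s = np.cos(t), np.sin(t)
        rot = np.array([[1.0, 0.0, 0.0],
                        [0.0,   c,   s],
                        [0.0,  -s,   c]])
        rotated = pts @ rot.T
        # Projection: drop the depth coordinate to get back to 2D.
        yield rotated[:, 0], rotated[:, 1]

# Usage: for xs, ys in transition_frames(x, y, y_new): draw one animation frame.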
Further, the rolling dice framework suggests a method called query sculpting, which allows selecting data items in the main scatterplot visualization using 2D bounding shapes (convex hulls) and iteratively refining that selection from other viewpoints while navigating the scatterplot matrix. As shown in Figure 5C, the query layer component is used for selecting, naming and clearing color-coded queries during visual exploration. Clicking and dragging one query onto another performs a union or intersection operation (by dragging with the left or right mouse button respectively). Each query layer also provides a visual indication of the percentage of items currently selected by it.
3.1.3 Shortcomings of Scatterplot Matrix (SPLOM) 
In order to discuss the shortcomings of the SPLOM, let us consider a fictitious "nuts-and-bolts" dataset. This dataset, shown in Table 1, involves 3 (independent) categorical variables: Region (North, Central, and South), Month (January, February, ...), and Product (Nuts or Bolts). It also contains 3 (dependent) continuous variables: Sales, Equipment costs, and Labor costs. 
Region   Month   Product   Sales   Equipment costs   Labor costs 
North    Jan     Nuts      2.78    0.92              4.30 
North    Feb     Nuts      4.92    1.64              4.30 
...      ...     ...       ...     ...               ... 
South    Dec     Bolts     9.50    2.44              5.20 

Table 1: “Nuts-and-Bolts” dataset 
Figure 7 shows the SPLOM for the "nuts-and-bolts" dataset. The top three scatterplots (e.g. Month vs Region) each show a crossing of two categorical variables, resulting in an uninformative grid of points. Further, the scatterplots showing continuous vs categorical variables suffer from overplotting (e.g. Sales vs Product). 
Figure 7: Scatterplot matrix for the “Nuts-and-bolts” dataset
In order to overcome this issue, the Generalized Plot Matrix (GPLOM) [17] has been proposed. The GPLOM uses heatmaps to visualize pairs of categorical variables, bar charts to visualize continuous vs categorical variables, and scatterplots to visualize pairs of continuous variables. It is important to note that in this scheme the scatterplots show individual tuples, whereas the bar charts and heatmaps show aggregated data. Figure 8 shows the GPLOM for the "nuts-and-bolts" dataset. Even though the GPLOM is a better choice than the SPLOM for visualizing a combination of continuous and categorical variables, by mixing 3 chart types it loses the consistency of the matrix. 
Figure 8: Generalized Plot Matrix for the “Nuts-and-bolts” dataset 
3.2 Parallel Coordinates 
Parallel coordinates, introduced by Inselberg and Dimsdale [11, 19], is a popular technique for transforming multidimensional data into a 2D image. The m-dimensional data items are represented as lines crossing m parallel axes, each axis corresponding to one dimension of the original data. Fundamentally, parallel coordinates differ from other visualization methodologies in that they yield a graphical representation of multidimensional relations rather than just visualizing a finite set of points [19]. 
Figure 9 displays a parallel coordinate plot with 8 variables, using a dataset [58] which contains information about cars, such as economy (mpg), cylinders and displacement (cc), for a selected sample of cars manufactured between 1970 and 1982. 
Figure 9: Parallel coordinate plot with 8 variables for 250 cars 
3.2.1 Definition and Representation 
On the plane with xy-Cartesian coordinates, starting on the y-axis, N copies of the real line, labeled x1, x2, x3, ..., xN, are placed equidistant and perpendicular to the x-axis. They are the axes of the parallel coordinate system for the Euclidean N-dimensional space R^N, all having the same positive orientation as the y-axis [11]. 
Figure 10: Parallel Coordinate plot for a point
Figure 10 shows how a point C with coordinates (c1, c2, c3, ..., cN) is represented by a polygonal line. In the same way, m data points can be represented by m polygonal lines. 
A point in 2D Cartesian space is represented by a single line in parallel coordinates. Extending this, a line in 2D Cartesian space is represented in parallel coordinates by selecting a set of collinear points on the line and representing each of those points in the parallel coordinates visualization; the lines representing those points all intersect at a single point. If the distance between the axes is d, a line l: y = mx + b is mapped to the intersection point

\bar{l} = \left(\frac{d}{1-m},\ \frac{b}{1-m}\right)
For lines with negative slope (m < 0) the intersection point lies between the axes, as in Figure 11. 
Figure 11: Parallel Coordinate plot for points in a line with m < 0 
For m > 1 the intersection point lies to the left of the X1 axis, while for lines with 0 < m < 1 it lies to the right of the X2 axis, as in Figure 12. 
Figure 12: Parallel Coordinate plot for points in a line with 0<m<1
The above point-line duality can be considered one of the main advantages of parallel coordinates, since it gives the representation a statistical interpretation. In the statistical setting, the following interpretations can be made: for highly negatively correlated pairs of dimensions, the dual line segments tend to cross near a single point between the two parallel axes, whereas parallel or almost parallel lines between axes indicate positive correlation between the variables [20, 21]. For example, we can see a highly negative correlation between weight and year in Figure 13. 
Figure 13: Negative correlation between Car Weight and the Year 
Over the years, parallel coordinates have been enhanced by many researchers, who have improved the technique for better data investigation and easier, more user-friendly interaction by adding brushing, data clustering, real-time reordering of coordinate axes, etc. 
3.2.3 Brushing 
Brushing is considered a very effective technique for specifying an explicit focus during information visualization [22]. The user actively marks subsets of the data set as being especially interesting, and the points contained by the brush are colored differently from the other points to make them stand out [23]. For example, a user interested in cars having 6 cylinders can use brushing as depicted in Figure 14.
Figure 14: Using brushing to filter Cars with 6 cylinders 
The introduction of composite brushes [23] allows users to define their focus more specifically. Composite brushes are combinations of single brushes whose result is the conjunction of those single brushes. For example, a user interested in cars having 6 cylinders that were produced in '76 can use composite brushing as depicted in Figure 15, and as sketched in code below. 
Figure 15: Using composite brushing to Filter Cars with 6 cylinders made in 76’ 
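Conceptually, a composite brush is just the logical AND of per-axis predicates. A minimal pandas sketch of our own (the table contents and column names are hypothetical stand-ins for the cars data):

import pandas as pd

# Hypothetical cars table; column names are assumptions for illustration.
cars = pd.DataFrame({
    "cylinders": [4, 6, 6, 8, 6],
    "year":      [75, 76, 70, 76, 76],
    "weight":    [2265, 3439, 3329, 4354, 3432],
})

brush_cylinders = cars["cylinders"] == 6    # single brush on one axis
brush_year = cars["year"] == 76             # a second single brush
focus = cars[brush_cylinders & brush_year]  # composite brush: conjunction

# Rows in `focus` get the highlight color; all other rows stay as context.
print(focus)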
The brushing techniques we have seen so far make a discrete distinction between focus and context, so we cannot see how similar the other data points are to the focused ones. The solution that has been brought forward for this is called smooth brushing [22], where a multi-valued or even continuous transition is allowed, which inherently expresses the similarity between data points in focus and their context. This corresponds to a degree-of-interest (DOI) function which maps non-binarily into the [0, 1] range. Often, such a non-binary DOI function is defined by means of spatial distances, i.e., the DOI value reflects the distance of a data point from a so-called center of interest.
Figure 16: An example of smooth brushing: note the gradual changes in drawing intensity, which reflect the respective degree of interest, after smooth brushing of the 2nd axis. 
Standard brushing primarily acts along the axes, but a technique called angular brushing opens up the space between axes for brushing [22]. The user can interactively specify a subset of slopes, which then marks as part of the current focus all data points that exhibit the matching correlation between the brushed axes. For example, a user interested only in data with a negative correlation between horsepower and acceleration can use angular brushing as shown in Figure 17.
Figure 17: Angular brushing, reading between the lines: whereas most line segments go up between the 2nd and the 3rd axis (visualizing a positive correlation of values there), just a few go down; those have been emphasized through angular brushing 
3.2.4 Axis Reordering 
One strength of parallel coordinates, as described in section 3.2.1, is its effectiveness in visualizing relations between coordinate axes. By interactively bringing axes next to each other, the user can investigate how values are related with special respect to two of the data dimensions. The order of the axes clearly affects the patterns revealed by a parallel coordinate plot. Figure 18 shows 3 of the N! (here N = 8) ways of ordering the axes, but only plot C in Figure 18 is capable of showing the highly negative correlation between weight and economy. 
Many researchers address this problem using a measure to score an ordering of the axes, while others discuss how to visualize multiple orderings in a single display [24]. Approaches based on the combination of the nonlinear correlation coefficient and the Singular Value Decomposition algorithm have been suggested [25]. Using these approaches, the first notable axis can be selected based on mathematical theory, and all axes are then reordered in line with the degree of similarity among them [25]. A code sketch of the scoring idea follows Figure 18. 
Figure 18: Multiple ways of ordering N axes in parallel coordinates: (A): The default Order of the Axes, (B): Axes are re-ordered to see the correlation between Year and Power - highly negative correlation is observed. (C): Axes are reordered to see the correlation between Weight and the Economy - highly negative correlation is observed. 
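As a rough illustration of such scoring (a deliberately simplified stand-in of our own, not the nonlinear-correlation and SVD based method of [25]), one can greedily build an order in which strongly correlated axes become adjacent, assuming a pandas dataframe of numeric columns:

import numpy as np
import pandas as pd

def greedy_axis_order(df):
    # Score an ordering by the absolute pairwise correlation of adjacent axes;
    # greedily append the axis most correlated with the last one placed.
    corr = df.corr().abs().to_numpy()
    np.fill_diagonal(corr, -1.0)          # never pair an axis with itself
    remaining = set(range(len(df.columns)))
    first = int(np.unravel_index(corr.argmax(), corr.shape)[0])
    order = [first]
    remaining.discard(first)
    while remaining:
        nxt = max(remaining, key=lambda j: corr[order[-1], j])
        order.append(nxt)
        remaining.discard(nxt)
    return [df.columns[i] for i in order]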
3.2.5 Data Clustering 
Parallel coordinates are a good technique for showing clusters in a data set, and researchers have used many methods to display clusters in them.
Coloring is one method that has been used to show clusters in parallel coordinates [26]: different colors are assigned to different clusters. Figure 19 shows two explicitly given clusters represented with 2 different colors. 
Figure 19: Two clusters represented in parallel coordinates with two different colors (red and blue) 
Figure 20 shows the same cluster visualization technique for many clusters, for a data set taken from the USDA National Nutrient Database. 
Figure 20: Multiple clusters visualized in parallel coordinates in different colors 
Variable length opacity bands [26] are another technique for showing clusters in parallel coordinates. Figure 21 shows a graduated band, faded from a dense middle to transparent edges, that visually encodes the information of a cluster. The mean stretches across the middle of the band and is encoded with the deepest opacity. This allows the user to differentiate sparse, broad clusters from narrow, dense ones. The top and bottom edges of the band are fully transparent, and the opacity across the rest of the band is linearly interpolated. The thickness of the band at each axis represents the extent of the cluster in that dimension.
Figure 21: Variable length Opacity Bands representing a cluster in parallel coordinate 
Curve bundling [27] is also used to visualize clusters in parallel coordinates. Bundled curve plots extend the traditional polyline plots and are designed to reveal the structure of clusters previously identified in the input data. Given a data point (P1, P2, ..., PN), its corresponding polyline is replaced by a piecewise cubic Bezier curve preserving the following properties. (Denote the main axes by X1, X2, X3, ..., XN to avoid confusion between them and the added axes.) 
● The curve interpolates P1, P2,..., PN at the main axes 
● Curves corresponding to data points that belong to the same cluster are bundled between adjacent main axes. This is accomplished by inserting a virtual axis midway between the main axes and by appropriately positioning the Bézier control points along the virtual axis. To support curve bundling, control points that define curves within the same cluster are attracted toward a cluster centroid along the virtual axis. 
Figure 22 compares a polyline plot with its bundled-curve counterpart. Polylines require color coding to distinguish clusters, whereas curve bundles rely on geometric proximity to naturally represent cluster information. The cluttered appearance of color-coded polylines, the standard approach to cluster-membership visualization, motivates the new geometry-based method.
23 
Figure 22: Parallel-coordinates plot (A) using polylines with color coding to show clusters, and (B) using bundled curves 
Bundling violates the point-line duality discussed in section 3.2.1, but it can be used to visualize clusters using geometry only, leaving the color channel free for other uses such as the statistical coloring described in section 3.2.6. Many algorithms have been proposed for adjusting the shape of the Bezier curves [27, 28, 29]. 
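A much simplified sketch of the control point placement on the virtual axis (our own illustration, reducing each segment to a single quadratic Bezier for brevity, whereas [27] uses piecewise cubic curves):

def bundle_control_points(y_left, y_right, centroid_mid, beta=0.7):
    # Virtual axis midway between two adjacent main axes (taken at x = 0 and
    # x = 1); its control point is attracted toward the cluster centroid
    # with strength beta, so curves of one cluster bundle together.
    mid = (1 - beta) * 0.5 * (y_left + y_right) + beta * centroid_mid
    return [(0.0, y_left), (0.5, mid), (1.0, y_right)]  # one Bezier segment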
3.2.6 Statistical Coloring 
Coloring the polygonal lines can also be used to display statistical properties of an axis. A popular color scheme is to color by the z-score for a chosen dimension, so that the data distribution of that dimension can be understood. Figure 23 shows how z-score coloring has been applied to the weight dimension of the cars data set; a small code sketch follows the figure. 
Figure 23: Statistically colored Parallel Coordinates plot on weight of cars - Cars that have a high weight will be blue in color while low weight vehicles are colored red.
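In practice this only requires mapping each item's z-score through a diverging colormap. A minimal matplotlib sketch of our own, with random stand-in weights:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
weight = rng.normal(3000, 800, size=250)        # stand-in for the cars' weights

z = (weight - weight.mean()) / weight.std()     # z-score of each data item
colors = plt.cm.coolwarm_r(plt.Normalize()(z))  # low weight -> red, high -> blue
# Polyline i of the parallel coordinates plot is then drawn with colors[i].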
3.2.7 Scaling 
Scaling of the axes is another interesting property of parallel coordinates. The default is to plot all values over the full range of each axis, between the minimum and the maximum of the variable. Several other scaling methods have been suggested [21]; a common one is to use a common scale over all axes. Figure 24 shows the difference between scaling methods for the individual stage times of the 155 cyclists who finished the 2005 Tour de France bicycle race. Figure 24A is plotted with the default scaling and Figure 24B uses a common scale over all axes. Neither Figure 24A nor Figure 24B is capable of revealing correlations between axes, even though Figure 24B shows the outliers clearly; the spread between the first and the last cyclist is almost invisible for most of the stages. In Figure 24C, a common scale is used for all stages, but each stage is aligned at its median value. It is the user's experience, domain knowledge and use case that define the scale and alignment in a parallel coordinates plot [21]; the three options are sketched in code after the figure. 
Figure 24: Three scaling options for visualizing the stage times in the Tour de France 2005: (A): All stages are scaled individually between the minimum and maximum value of the stage (the usual default for parallel coordinate plots). (B): A common scale is used, i.e., the minimum/maximum time over all stages is used as the global minimum/maximum for all axes. (C): Common scale for all stages, but each stage is aligned at the median value of that stage. 
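The three options differ only in how the raw values are rescaled before plotting. A minimal numpy sketch of our own (rows are items, columns are axes; mode names are our labels, not from [21]):

import numpy as np

def scale_axes(data, mode="individual"):
    # Return per-axis values rescaled for plotting.
    data = np.asarray(data, dtype=float)
    if mode == "individual":                       # option (A): per-axis min-max
        lo, hi = data.min(axis=0), data.max(axis=0)
        return (data - lo) / (hi - lo)
    span = data.max() - data.min()                 # global range over all axes
    if mode == "common":                           # option (B): one global scale
        return (data - data.min()) / span
    if mode == "median":                           # option (C): common scale,
        return (data - np.median(data, axis=0)) / span  # medians aligned at 0
    raise ValueError(mode)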
3.2.8 Limitations 
Even though parallel coordinates are a great tool for visualizing high dimensional data, they soon reach their limits. When used with a very large dataset, parallel coordinates have some known weaknesses: 
1. Cross-over problem - The zigzagging polygonal lines used for data representation are not continuous. They generally lose visual continuation across the parallel coordinate axes, making it difficult to follow lines that share a common point on an axis. 
2. When two or more data points have the same or similar values for a subset of the attributes, the corresponding polylines may overlap and clutter the visualization. 
Figure 25 depicts these two problems in a parallel coordinate plot drawn for 8000 data points. 
Figure 25: Parallel Coordinates plot for a data set with 8000 rows. (Food information taken from USDA National Nutrient Database) 
Given a very large data set, these two problems make it hard to draw conclusions about correlations between axes, and brushing will not give a clear picture of the data either. 
One solution to the above problems is to use α-blending [21]. When α-blending is used, each polyline is plotted with only α percent opacity. With smaller α values, areas of high line density become more visible and are better contrasted against areas of low density.
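In a typical plotting library, α-blending amounts to drawing every polyline with a small alpha value. A minimal matplotlib sketch with random stand-in data (our own illustration):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
data = rng.normal(size=(8000, 8))   # stand-in for a large 8-dimensional dataset
x = np.arange(data.shape[1])

# Every polyline is drawn with a small alpha: dense bundles accumulate ink
# and stand out, while isolated lines fade into a faint background.
plt.plot(x, data.T, color="steelblue", alpha=0.05, linewidth=0.5)
plt.xticks(x, [f"dim {i + 1}" for i in x])
plt.show()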
The data in Figure 26 are real data from Forina et al. [32] on the fatty acid content of Italian olive oil samples from nine regions. Figures 26A, B and C show the same plot of all eight fatty acids with decreasing α values. Depending on the amount of α-blending applied, the group structure of some of the nine regions is more or less visible [21]. 
It is hard to decide on a value for α in advance; the user must adjust it until the graph yields enough insight. 
Figure 26: Parallel coordinates for the “Olive Oils” data with different alpha values. α = 0.5 (A), α = 0.1 (B), and α = 0.05 (C)
Clustering and statistical coloring, described in sections 3.2.5 and 3.2.6, also reduce these weaknesses of parallel coordinates. 
Figure 27: Parallel coordinates visualization with Z score coloring: Z score coloring based on the amount of water - foods with high water percentage will have blue color while foods with lower water percentage will have red color 
As Figure 27 shows, the point-line duality is better preserved when statistical coloring is used. Data preprocessing techniques can also be used to overcome the limitations of parallel coordinates: data selection and data aggregation. Data selection means that a display does not represent the dataset as a whole but only a portion of it, selected in a certain way [30]. The display is supplied with interactive controls for changing the current selection, which results in showing another portion of the data [30]. 
Figure 28 shows how displaying a portion of the data overcomes the weaknesses of parallel coordinates. Figure 28A displays only the food group of sausages and luncheon meats; Figures 28B and 28C display the food groups of beef products and of spices and herbs respectively, which is a better visualization than displaying the whole data set. 
Data aggregation reduces the amount of data under visualization by grouping individual items into subsets, often called 'aggregates', for which collective characteristics can be computed. The aggregates and their characteristics (jointly called 'aggregated data') are then explored instead of the original data. In parallel coordinates, for example, a whole cluster can be drawn as a single polygonal line, which reduces the limitations mentioned at the beginning of this section.
Figure 28: Parallel Coordinates drawn on same data set using data selection: (A): Displays food group of sausages and luncheon meats. (B): Displays food groups of beef products. (C): Displays food groups of spices and herbs 
Parallel coordinates might be the plot least affected by the curse of dimensionality, since they can represent as many dimensions as the screen width permits. But a limitation still appears for high dimensional data, because the distance d between two adjacent axes decreases as the number of dimensions grows; as a result, the correlations between axes might no longer be clear in the plot. Most applications assume it is up to the user to decide which attributes should be kept in, or removed from, a visualization. This is not a good approach for a user without domain knowledge; instead, parallel coordinates themselves can be used to reduce the dimensionality of the data set [31].
When discussing axis reordering in section 3.2.4, we talked about obtaining a measure of axis similarity. Once the most similar axes are identified through that algorithm, the application can suggest that the user remove them and keep one significant axis out of each group of similar axes [31]. In that way redundant attributes can be removed from the visualization, and the space can be used efficiently to represent the remaining attributes. 
Parallel coordinates are a good technique for visualizing data. They support many user interactions and data analytic techniques, and even though they have limits, researchers have found many ways to overcome them. Parallel coordinates remain a hot topic in data visualization research. 
3.3 Radviz 
The Radviz (Radial Visualization) visualization method [33] maps a set of n dimensional data points onto a two dimensional space. All dimensions are represented by a set of equally spaced anchor points on the circumference of a circle. 
For each data instance, imagine a set of springs that connects the data point to the anchor point for each dimension. The spring constant for the spring that connects to the ith anchor corresponds to the value of the ith dimension of the data instance. Each data point is then displayed where the sum of all the spring forces equals 0. All the data point values are usually normalized to have values between 0 and 1. 
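This spring equilibrium has a closed form: each point lands at the weighted average of the anchor positions, with weights given by its normalized dimension values. A minimal numpy sketch of our own, under that assumption:

import numpy as np

def radviz_positions(data):
    # Rows of `data` (n items x d dimensions, values normalized to [0, 1])
    # are projected to 2D. Anchors sit equally spaced on the unit circle.
    n, d = data.shape
    angles = 2 * np.pi * np.arange(d) / d
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    # Spring constants are the dimension values; the equilibrium point of
    # each item is the weighted average of the anchor positions.
    weights = data / data.sum(axis=1, keepdims=True)
    return weights @ anchors

# Note: rows that are scalar multiples of one another, e.g. (1, 1, 1, 1)
# and (10, 10, 10, 10), map to the same 2D point (the overlap problem below).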
Consider the example in Figure 29A: this data has 8 dimensions {d1, d2, ..., d8}, and each data point is connected to the anchors using springs as shown in the diagram. Following this procedure for all the records in the dataset leads to the Radviz display. Figure 29B shows a Radviz representation of a dataset on transitional cell carcinoma (TCC) of the bladder generated by the Clifford Lab at LSUHSC-S [34]. 
One major disadvantage of this method is the overlap of points. Consider the following two points in a 4 dimensional data space: (1, 1, 1, 1) and (10, 10, 10, 10). These two records will overlap in a Radviz display even though they are clearly different, because the dimensions pull both of them equally.
Figure 29: Radviz Visualization for multi-dimensional data. (A): Shows the set of springs and the forces exerted by those springs on a single data point. (B): A Radviz representation for a dataset on transitional cell carcinoma 
Categorical dimensions cannot be visualized directly with Radviz and require additional preprocessing: each categorical dimension must first be flattened to create a new dimension for each possible category. This becomes problematic as the number of possible categories increases and may lead to poor visualizations. 
Another challenge in generating good visualizations with this method is identifying a good ordering for the anchor points that correspond to the dimensions. A good ordering needs to be found that makes it easy to identify patterns in the data. An interactive approach that allows for changing the position of anchor points can be used to help users overcome this issue. 
3.4 Mosaic Plots 
Mosaic plots [35, 36] are a popular method of visualizing categorical data. They provide a way of visualizing the counts in a multivariate n-way contingency table. The frequencies in the contingency table are represented by a group of rectangles whose areas are proportional to the frequency of each cell in the contingency table. 
A mosaic plot starts as a single rectangle. At each stage of plot creation, the rectangles are split parallel to one of the two axes based on the proportions of data belonging to each category. An example of a mosaic plot is shown in Figure 30. It shows a mosaic plot of the Titanic dataset, which describes the attributes of the passengers on the Titanic and details of their survival. 
Figure 30: Mosaic plot for the Titanic data showing the distribution of passengers' survival based on their class and sex 
The process of creating a mosaic display can be described as follows [37]; a code sketch follows the steps. 
Let us assume that we want to construct a mosaic plot for p categorical variables X1,..., Xp. Let ci be the number of categories of variable Xi, i = 1, . . . , p. 
1. Start with one single rectangle r (of width w and height h), and let i = 1. 
2. Cut rectangle ri−1 into ci pieces: find all observations corresponding to rectangle ri−1, and find the breakdown for variable Xi (i.e., count the number of observations that fall into each of its categories). Split the width (height) of rectangle ri−1 into ci pieces, where the widths (heights) are proportional to the breakdown, and keep the height (width) of each piece the same as that of ri−1. Call these new rectangles rji, with j = 1, ..., ci. 
3. Increase i by 1. 
4. While i <= p, repeat steps 2 and 3 for all rji−1, with j = 1, ..., ci−1. 
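The recursion above can be sketched in a few lines of Python (a simplified illustration of our own, assuming pandas, alternating split directions, and no spacing between tiles):

import pandas as pd

def mosaic_rects(df, variables, rect=(0.0, 0.0, 1.0, 1.0), depth=0):
    # Recursively split `rect` by the category proportions of each variable;
    # returns (category-path, (x, y, w, h)) pairs for the leaf rectangles.
    if not variables:
        return [((), rect)]
    x, y, w, h = rect
    counts = df[variables[0]].value_counts()   # categories ordered by frequency
    total, offset, leaves = counts.sum(), 0.0, []
    for category, count in counts.items():
        frac = count / total
        if depth % 2 == 0:                      # even depths split the width
            sub = (x + offset * w, y, w * frac, h)
        else:                                   # odd depths split the height
            sub = (x, y + offset * h, w, h * frac)
        subset = df[df[variables[0]] == category]
        for path, r in mosaic_rects(subset, variables[1:], sub, depth + 1):
            leaves.append(((category,) + path, r))
        offset += frac
    return leaves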
In standard mosaic plots the rectangle is divided both horizontally and vertically. A variation of mosaic plots that divides the rectangle only horizontally, called the double decker plot, has been proposed [38]; it can be used to visualize association rules. An example of a double decker plot is shown in Figure 31 for the same data as in Figure 30. There are other variations of mosaic plots, such as fluctuation diagrams, that try to increase their usability. 
Figure 31: Double decker plot for the Titanic data showing the distribution of passengers' survival based on their class and sex 
Mosaic plots are an interesting visualization technique for categorical data, but they cannot handle continuous data: to display continuous data in a mosaic plot, the data first needs to be converted to categorical through a process such as binning. Mosaic plots require the visual comparison of rectangles and their sizes to understand the data, which becomes complicated as the number of rectangles grows and the distance between them increases, so they become harder to interpret and understand. Vastly different aspect ratios of the rectangles also compound the difficulty of comparing their sizes. 
Another issue with mosaic plots is that they become more complex as the number of dimensions in the data increases. Each additional dimension requires the rectangles to be split again, which at least doubles the possible number of rectangles, leading to a final visualization that is not very user friendly. 
3.5 Self Organizing Maps 
Self-organizing maps (SOMs) [39] are a type of neural network that has been widely used in data exploration and visualization, among many other uses. SOMs use an unsupervised learning algorithm to perform a topology preserving mapping from a high dimensional data space to a lower dimensional map (usually a two dimensional lattice). The mapping preserves the topology of the high dimensional data space, such that data points lying near each other in the original multidimensional space map to nearby units in the output space. Generating a self-organizing map consists of training a set of neurons with the dataset. At each step of the training, an input data item is matched against the neurons, from which the closest one is chosen as the winner. Then the weights of the winner and of its neighborhood are updated to reinforce this behavior. The final result is a topology preserving ordering where similar new data entries match to neurons near each other. 
Figure 32: Training a self-organizing map. For each data item, the closest neuron is selected using some distance metric 
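A minimal sketch of this training loop (our own illustration, assuming numpy and a Gaussian neighborhood; for brevity the learning rate and radius are kept fixed, whereas real implementations decay both over time):

import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr=0.5, radius=3.0):
    rng = np.random.default_rng(0)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))   # one weight vector per neuron
    coords = np.dstack(np.mgrid[0:h, 0:w])        # lattice coordinates (h, w, 2)
    for _ in range(epochs):
        for idx in rng.permutation(data.shape[0]):
            item = data[idx]
            # Winner: the neuron whose weight vector is closest to the item.
            dist = np.linalg.norm(weights - item, axis=2)
            win = np.unravel_index(dist.argmin(), dist.shape)
            # Pull the winner and its lattice neighborhood toward the item.
            d_lattice = np.linalg.norm(coords - np.array(win), axis=2)
            influence = np.exp(-(d_lattice ** 2) / (2 * radius ** 2))
            weights += lr * influence[..., None] * (item - weights)
    return weights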
An example of a self-organizing map is shown in Figure 33, which was trained on the poverty levels of countries [40]. As can be seen clearly, countries with similar poverty levels are matched to neurons close to each other: the USA, Canada and other countries with lower poverty sit together in the yellow and green areas, while countries such as Afghanistan and Mali, which have high poverty levels, are grouped together in the purple areas. This demonstrates the topology preserving aspect of SOMs. 
Figure 33: A self-organizing map trained on the poverty levels of countries
There are some challenges with using self-organizing maps for multidimensional data visualization. 
1. SOMs are not unique. The same data can lead to widely different outcomes depending on the initialization of the SOM, so the same data may yield different visualizations and cause confusion. 
2. While similar data points are grouped together in a SOM, similar groups are not guaranteed to be close to each other; a SOM may place similar groups in multiple places in the map. 
3. SOMs are not very user friendly compared with other visualization techniques. It is not easy to look at a SOM and interpret the data. 
4. The process of creating a SOM is computationally expensive, and the computational requirements grow as the dimensionality of the data increases. For modern data sources that are highly complex and detailed, this is a major drawback. 
3.6 Sunburst Visualization 
The Sunburst technique, like the Tree Map [44], is a space-filling visualization, but it uses a radial rather than a rectangular layout to visualize hierarchical information [43]. It is comparable to a set of nested pie charts. It can be used to show hierarchical information such as the elements of a decision tree; the compact layout avoids the problem of decision trees growing too wide to fit the display area. It is akin to visualizing the tree top down: the center represents the root of the hierarchy, the ring around it its children, and deeper levels lie farther from the center. The angle swept out by an item and its color correspond to attributes of the data. For instance, in a visualization of a file system, the angle may correspond to the file/directory size and the color to the file type. An example sunburst display is shown in Figure 34. This visualization has been used to summarize user navigation paths through a website [41], and also to visualize frequent item sets [42].
Figure 34: A sunburst visualization summarizing user paths through a fictional e-commerce site. The inner ring represents the first event in the visit (showing here, for example, that most visits start on the homepage and approximately one-third start on a product page). The outer rings represent the subsequent events. 
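Computing a sunburst layout reduces to recursively assigning each node an angular extent proportional to its size within its parent's slice. A small Python sketch of our own (the dictionary structure is a hypothetical input format):

def sunburst_angles(node, start=0.0, extent=360.0, depth=0, out=None):
    # `node` is a dict: {"name": str, "size": float, "children": [...]};
    # children partition their parent's angular slice, one ring further out.
    if out is None:
        out = []
    out.append((node["name"], depth, start, start + extent))
    children = node.get("children", [])
    total = sum(c["size"] for c in children)
    angle = start
    for c in children:
        span = extent * c["size"] / total   # slice proportional to size
        sunburst_angles(c, angle, span, depth + 1, out)
        angle += span
    return out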
3.7 Trellis Visualization 
A trellis chart, also known as small multiples [45], panel chart, lattice chart or grid chart, is a layout of smaller charts in a grid with consistent scales. Each smaller chart represents an item in a category, named a "condition" [48]; the data displayed in each smaller chart is conditional on that item. Trellis charts are useful for finding structure and patterns in complex data. The grid layout looks similar to a garden trellis, hence the name.
Figure 35: Trellis Chart for a dataset on sales 
The main aspects of trellis displays are columns, rows, panels and pages [46]. Figure 35 consists of 4 columns, 1 row, 4 panels and 1 page. Trellised visualizations enable the user to quickly recognize similarities or differences between categories in the data. Each individual panel displays a subset of the original data table, where the subsets are defined by the categories available in a column or hierarchy. To make the plots comparable across rows and columns, the same scales are used in all panels [47]. 
The benefits of trellis charts are: 
● They are easy to understand. A Trellis Chart is a basic chart type repeated many times. If you understand the basic chart type, you can understand the whole Trellis Chart. 
● Having many small charts enables you to view complex multi-dimensional data in a flat 2D layout avoiding the need for confusing 3D charts. 
● The grid layout combined with consistent scales makes data comparison simple. Just look up/down or across the charts. 
Figure 36 contains a trellis chart for the Minnesota barley data from The Design of Experiments [59] by R.A. Fisher. The trial involved planting 10 varieties of barley in 6 different sites over two different years, and the researchers measured the yield in bushels per acre for each of the 120 possibilities.
Figure 36: Minnesota Barley Data Trellis Chart 
3.8 Grand Tour 
The grand tour is one of the tour methods used to find structure in multidimensional data, and it can be applied to show multidimensional data on a 2D computer display. A tour is a subset of all possible projections of the multidimensional data; the different tour methods combine several static projections, using different interpolation techniques, into a movie, which is called a tour [50]. 
3.8.1 Tours 
In a static projection, some of the information in the dataset is lost to the user. But if several projections onto different planes are shown to the user step by step, the user can build an overview of the structure of the multivariate data.
Tours provide a general approach to choosing and viewing data projections, allowing the viewer to mentally connect disparate views, and thus supporting the exploration of a high-dimensional space. 

Figure 37: (A): The scatterplot shows a multidimensional data set (some census data [49]); the data is mapped to coordinates in a multidimensional space. (B): A snapshot of the grand tour, a projection of the data onto a single plane. 
3.8.2 Tour methods 
● Grand tour - Shows projections of the multivariate data via a random walk through the projection space. 
● Projection Pursuit (PP) guided tour - The tour concentrates on more interesting views, chosen according to a PP index. 
● Manual control - The user decides which direction the tour takes. 
The grand tour method chooses the target plane by random selection: a frame is randomly selected from the space of all possible projections. A target frame is chosen by standardizing a random vector drawn from a standard multivariate normal distribution: sample p values from a standard univariate normal distribution, resulting in a sample from a standard multivariate normal. Standardizing this vector to have length one gives a random point on a (p−1)-dimensional sphere, that is, a randomly generated projection vector. Doing this twice gives a 2D projection, where the second vector is orthonormalized against the first. Figure 38 illustrates the tour path. 
Figure 38: Grand tour path in 3D space. 
The solid circle in Figure 38 indicates the first point on the tour path, corresponding to the starting frame. The solid square indicates the last point in the tour path, i.e., the last projection computed. Each point corresponds to a projection from 3 dimensions to one dimension; the projection looks as if the data space is viewed from that direction. In the grand tour, this point is chosen randomly.
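The random frame selection described above amounts to a few lines of linear algebra. The following Python sketch, assuming NumPy is available (function and variable names are ours, not from the literature), generates a random 2D target frame and projects a dataset onto it:

import numpy as np

def random_target_frame(p, rng):
    """Return two orthonormal p-dimensional vectors spanning a random 2D plane."""
    # Sampling p standard-normal values gives a draw from a standard
    # multivariate normal; normalizing to unit length yields a uniformly
    # random direction on the (p-1)-dimensional sphere.
    v1 = rng.standard_normal(p)
    v1 /= np.linalg.norm(v1)
    # Orthonormalize a second random vector against the first
    # (one Gram-Schmidt step) to complete the 2D frame.
    v2 = rng.standard_normal(p)
    v2 -= (v2 @ v1) * v1
    v2 /= np.linalg.norm(v2)
    return np.column_stack([v1, v2])            # p x 2 projection matrix

# Usage: project 100 points in 5 dimensions onto one random tour frame.
rng = np.random.default_rng(42)
X = rng.random((100, 5))
X2d = X @ random_target_frame(X.shape[1], rng)  # 100 x 2 coordinates to plot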
4. CEP Rule generation 
Recent advances in technology have enabled the generation of vast amounts of data in a wide range of fields. This data is created continuously, in large quantities, over time, as data streams. Complex Event Processing (CEP) can be used to analyze and process these large data streams to identify interesting situations and respond to them as quickly as possible. 
Complex event processors are used in almost every domain: vehicular traffic analysis, network monitoring, sensor data analysis [51], stock market trend analysis [52], and fraud detection [53]. Any system that requires real-time monitoring can use a complex event processor. 
In CEP, the processing takes place according to user-defined rules, which specify the relations between the observed events and the actions required by the user. For example, in a network monitoring system, a complex event processor can be used to notify the system administrator about excessive internet usage by a user on that network. An example rule looks like this: 
from currentsums[bandwidth>100000] 
select User_IP 
insert into shouldNotify; 
Here, if a user's bandwidth exceeds the limit, the administrator receives a notification. The value of the limit in this example should be low enough to catch heavy usage, yet high enough to ignore normal users. 
Any complex event processing rule has a condition to check and an action associated with that condition. So, regardless of the domain, any system using a CEP engine depends heavily on the rules defined by the user. 
In current complex event processing applications, users need to manually specify the rules that are used to identify and act on important patterns in the event streams. This is a complex and arduous task that is time consuming, involves a lot of trial and error, and typically requires domain-specific information that is hard to identify accurately. 
Rule writing is therefore typically done by domain experts, who study the parameters available in the event streams, manually or using external data analysis tools, to identify the events that need to be specially handled. Needless to say, incorrect estimation of the relevant parameters in the rules negatively impacts the utility of the systems that depend on accurate processing of these events. Even for domain experts, manually specifying textual rules in a CEP-specific rule language is not a very user-friendly experience. Moreover, maintaining the system after a rule is specified, so that it provides the same functionality through changing data and behavior, may require periodic updates to the rule that demand the same effort as was initially spent. 
Several approaches [54, 55, 56] have been proposed to overcome these difficulties, using data mining and knowledge discovery techniques to generate rules based on available data. These give users the ability to automatically generate rules based on their requirements. 
Two approaches that can help in generating CEP rules are discussed here. One is a framework that learns, from historical traces, the hidden causality between the received events and the situations to detect, and uses it to automatically generate CEP rules [54]. The other is to start from a skeleton of the rule and use historical traces to tune the parameters of the final rule [55]. 
4.1 iCEP 
iCEP [54] analyzes historical traces and learns from them. It adopts a highly modular design, with different components considering different aspects of the rule. 
The following terminology and definitions are used in the framework. 
Each event notification is assumed to be characterized by a type and a set of attributes. The event type defines the number, order, names, and types of the attributes that compose the event itself. It is also assumed that events occur instantaneously at some point in time. Accordingly, each notification includes a timestamp, which represents the time of occurrence of the event it encodes. The authors use the following example event of type ‘Temp’: 
Temp@10(room=123, value=24.5) 
This event encodes the fact that the air temperature measured inside room 123 at time 10 was 24.5 °C. 
Another aspect of the terminology used by the authors is the difference between primitive and composite events. Simple events similar to the one given above are considered primitive events. A composite event is defined using a pattern of primitive events. When such a pattern is identified, the CEP engine derives that a composite event has occurred and notifies the interested components. An event trace that ends with the occurrence of the composite event is called a positive event trace. 
The iCEP framework uses the following basic building blocks, found in most CEP systems, to generate filters for events (a hypothetical rule combining several of them is sketched after the list). 
➔ Selection: filters relevant event notifications according to the values of their attributes. 
➔ Conjunction: combines event notifications together 
➔ Parameterization: introduces constraints involving the values carried by different events. 
➔ Sequence: introduces ordering relations among events. 
➔ Window: defines the maximum timeframe of a pattern. 
➔ Aggregation: introduces constraints involving aggregated values. 
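For illustration, a hypothetical rule combining selection, sequence, and a window could look as follows, written in the same SQL-like style as the bandwidth example above (the syntax is illustrative and not tied to any particular engine):

from every t = TempReadings[value > 30] -> s = SmokeReadings[level > 0.5]
within 10 sec
select t.room
insert into fireAlarms;

Here the bracketed predicates are selections on the two streams, the arrow introduces a sequence (a high temperature followed by smoke), and the within clause is a window limiting the pattern to a ten-second timeframe.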
iCEP uses a set of modules that generate a combination of the above building blocks to form CEP rules. The framework uses a training data set created from historical traces to generate rules using a supervised learning technique. 
The learning method is based on the following observation. 
Consider the following positive event trace 
ε1 : A@0, B@2, C@3 
This implies the following set of constraints Sε1 
- A: an event of type A must occur 
- B: an event of type B must occur 
- C: an event of type C must occur 
- A→B: the event of type A must occur before that of type B 
- A→C: the event of type A must occur before that of type C 
- B→C: the event of type B must occur before that of type C 
We can assert that, for each rule r and event trace ε, r fires if and only if Sr ⊆ Sε, where Sr is the complete set of constraints that need to be satisfied for the rule to fire. 
Using these considerations, the problem of rule generation can be expressed as the problem of identifying Sr. Given a positive trace ε, Sε can be considered an over-constraining approximation of Sr. To produce a better approximation of Sr, we can consider the set of all positive traces collectively and take the conjunction (intersection) of all the constraint sets they generate. 
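The following Python sketch illustrates this intuition for the occurrence and ordering constraints only (selection, window, and aggregation constraints are omitted); the trace representation and all names are ours, not the iCEP implementation:

from itertools import combinations

def constraints_of(trace):
    """Derive the constraint set S_eps of one positive trace, given as a
    list of (event_type, timestamp) pairs."""
    constraints = set()
    for ev_type, _ in trace:
        constraints.add(("occurs", ev_type))        # each event type must occur
    for (a, ta), (b, tb) in combinations(trace, 2):
        if ta < tb:
            constraints.add(("before", a, b))       # ordering constraints
        elif tb < ta:
            constraints.add(("before", b, a))
    return constraints

def approximate_rule(positive_traces):
    """Approximate S_r as the conjunction (set intersection) of the
    constraint sets of all positive traces."""
    sets = [constraints_of(t) for t in positive_traces]
    rule = sets[0]
    for s in sets[1:]:
        rule &= s
    return rule

eps1 = [("A", 0), ("B", 2), ("C", 3)]
eps2 = [("A", 1), ("C", 4), ("B", 6)]
print(sorted(approximate_rule([eps1, eps2])))
# The occurrence constraints and A->B, A->C survive; B->C is dropped
# because the second trace contradicts it.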
Building on these intuitions, the iCEP framework follows these steps to generate rules: 
1. Determine the relevant timeframe to consider (window size) 
2. Identify the relevant event types and attributes 
3. Determine the selection and parameter constraints 
4. Discover ordering constraints (sequences) 
5. Identify aggregate and negation constraints. 
Figure 39: Structure of the iCEP framework 
The overall structure of the framework is shown in Figure 39. The problem is broken down into sub-problems and solved using different modules (described below) that work together. 
● Event Learner: The event learner tries to determine which primitive event types are required for the composite event to occur. It considers the window size as an optional input parameter and cuts each positive trace so that it ends with the occurrence of the composite event. For each positive trace, the event learner extracts the set of event types it contains. Then, following the general intuition described above, it computes and outputs the intersection of all these sets. 
● Window Learner: The window learner is responsible for learning the size of the window that includes all primitive events required for a composite event. If the required event types are known, the window learner tries to identify a window size that ensures all required primitive events are present in all positive traces. If the required event types are not known, the window learner and event learner use an iterative approach, where increasing window sizes are fed to the event learner until the required accuracy of the rule is reached.
● Constraint Learner: This module receives the filtered event traces from the above two modules and tries to identify possible constraints on the parameters. For each parameter, it first looks for an equality constraint, where all positive traces contain a single value; failing that, it generates an inequality constraint that bounds values between the minimum and maximum values observed across all positive traces. 
● Aggregate Learner: As shown in Figure 39, the aggregate learner runs in parallel with the constraint learner. Instead of looking for single-value constraints, the aggregate learner applies aggregation functions such as ‘sum’ and ‘average’ over the time window, across all the events of a certain type, to generate constraints. 
The other modules in the framework use similar methods to identify the remaining aspects of the rule. 
The effectiveness of the framework has been assessed using the following steps (a small sketch of the comparison in step 4 appears after the list). 
1. Use an existing rule created by a domain expert that identifies a set of composite events in a data stream and collect the positive traces. 
2. Use iCEP with the data collected in the above step to generate a rule 
3. Run the data again through the CEP engine with the generated rule and capture the composite events triggered. 
4. Compare the two sets of composite events and calculate precision and recall. 
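A minimal sketch of the comparison in step 4, assuming composite events can be matched by their timestamps (the matching criterion and the numbers below are illustrative):

def precision_recall(expert_events, generated_events):
    truth, found = set(expert_events), set(generated_events)
    tp = len(truth & found)                       # true positives
    precision = tp / len(found) if found else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# Expert rule fired at t = 10, 25, 40, 61; generated rule at t = 10, 25, 40, 55.
print(precision_recall({10, 25, 40, 61}, {10, 25, 40, 55}))   # (0.75, 0.75)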
The results have been promising, with a precision of around 94% in some of the tests run by the authors. But the system is far from perfect, and the following are some of the challenges that need to be overcome. 
1. A large training dataset with many positive traces is required to generate good rules with high precision. The training methodology considers only the conjunction of all the positive traces to generate rules, so without a large number of positive traces covering the variations in the data, generating accurate rules is difficult. 
2. High computational requirements. The iterative approach used with the window learner and event learner translates to a large number of computations. Without hints from a domain expert on the window size or the required events and parameters, the runtime and computational cost increase rapidly.
3. The generated rules require tuning and cleanup by the user. As the rules are generated automatically, the constraints may be over-constraining or may contain mistakes when applied to previously unseen conditions, so they require a final cleanup by the users. 
4.2 Tuning rule parameters using the Prediction-Correction Paradigm 
A mechanism has been proposed by Yulia Turchin et al. to automate both the initial definition of rules and their update over time [55]. It consists of two main repetitive stages, namely rule parameter prediction and rule parameter correction. Parameter prediction is performed by updating the parameters using available expert knowledge about future changes in the parameters. Rule parameter correction utilizes expert feedback about the actual past occurrence of events, together with the events materialized by the CEP framework, to tune rule parameters. For example, in an intrusion detection system [57], a domain expert can specify a rule as follows: “If the size of the received packet from a user has a high level of deviation from the ‘normal’ packet size, with estimated mean m1 and standard deviation σ1, infer an event E1 representing the anomaly level of the packet size.” It is hard to determine the values of m1 and σ1, and moreover the specified values can change over time due to the dynamic nature of network traffic. Rule parameter determination and tuning can be done as follows: given a set of rules, provide initial values for the rule parameters and then modify them as required. For example, for a given rule, the rule tuning algorithm might suggest replacing the value m1 with a value m2 such that m2 < m1. The initial prediction of m1 can be treated as a special case of tuning, where an arbitrary value is corrected to m1 by the rule tuning algorithm. This rule tuning algorithm should be tied to the ability of the system to correctly predict events, so that the algorithm can see, for instance, that the parameter m1 is too high, that many intrusions were consequently not detected, and that it therefore needs to be reduced to m2.
Figure 40: Prediction Correction Paradigm 
The proposed framework is based on the Kalman estimator, which is a simple type of supervised, Bayesian, predict-correct estimator [18]. As shown in Figure 40, the framework learns and updates the system state in two stages, namely rule parameter prediction and rule parameter update. Unsupervised learning is carried out in the rule parameter prediction stage: rule parameters are updated without any user feedback, relying on preexisting knowledge about how the parameters might change over time and on the events created by the inference algorithm. In the rule parameter update stage, the parameters are tuned in a supervised manner, using the domain expert’s feedback and recently generated events to carry the rule parameters forward to the next stage. User feedback can be given in two forms: direct and indirect. Direct feedback involves changes to the system state, while indirect feedback provides an assessment of the correctness of the estimated event history. 
4.2.1 Model 
The model of this method consists of events, rules, and the system. Here, an event is a significant (of interest to the system) actual occurrence in the system; examples include notifications of login attempts and failures of IT components. We can therefore define an event history h as the set of all events (of interest to the system), together with their associated data, and an event notification as an estimation of the occurrence of an event. Some events may not be notified, and some non-occurring events may be notified, because of faulty equipment. Therefore, we can also define an estimated event history h’ of notified events (of interest to the system). Events can be of two types: explicit events and inferred events. Explicit events are signaled by event sources; for example, a new network connection request is an explicit event. Inferred events are the events materialized by the system based on other events; for example, an illegal connection attempt event is an inferred event materialized by the network security system, based on the explicit event of a new network connection and an inferred event of an unsuccessful user authorization. Inferred events, just like explicit events, belong to event histories: inferred events that actually occurred in the real world belong to the event history h, and those that are only estimated to have occurred belong to the estimated event history h’. 
Events can be inferred by rules. A rule can be represented by a quadruple r = <sr, pr, ar, mr>. sr is a selection function that filters events according to rule r; events selected by the selection function are said to be relevant events, and the input to this function is an event history h. pr is a predicate, defined over a filtered event history, determining when events become candidates for materialization. ar is an association function, which defines how many events should be materialized, as well as which subsets of the selectable events are associated with each materialized event. mr is a mapping function that determines the attribute values of the events materialized by ar. 
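As a minimal sketch of this quadruple as a data structure (assuming events are plain Python dictionaries; the field names mirror the paper’s notation, but the representation and the example rule are ours):

from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    select: Callable      # s_r: filter the relevant events from a history h
    predicate: Callable   # p_r: decide when events become materialization candidates
    associate: Callable   # a_r: group selectable events per materialized event
    map_attrs: Callable   # m_r: compute attribute values of the materialized event

# Example: infer an "Overheat" event when any Temp reading exceeds m1 = 30.
m1 = 30.0
overheat = Rule(
    select=lambda h: [e for e in h if e["type"] == "Temp"],
    predicate=lambda evs: any(e["value"] > m1 for e in evs),
    associate=lambda evs: [[e for e in evs if e["value"] > m1]],
    map_attrs=lambda evs: {"type": "Overheat", "rooms": [e["room"] for e in evs]},
)

history = [{"type": "Temp", "time": 10, "room": 123, "value": 24.5},
           {"type": "Temp", "time": 11, "room": 124, "value": 31.0}]
relevant = overheat.select(history)
if overheat.predicate(relevant):
    for group in overheat.associate(relevant):
        print(overheat.map_attrs(group))   # {'type': 'Overheat', 'rooms': [124]}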
4.2.2 System State 
It is expected that the expert can provide the form of sr, pr, ar, and mr, but providing accurate values is difficult. These values are called rule parameters, and the set of all parameters is called the system state. The system state is updated by the framework as shown in Figure 40. In the predict stage, parameters are updated using knowledge of how the rule might change over time and the updated event history h. In the update stage, parameters are updated by direct feedback, where an exact rule parameter is given, or in an indirect manner, where events in the estimated event history h’ are marked according to whether they actually occurred or not. 
4.2.3 Rule Tuning Mechanism 
In order to tune rule parameters, this framework uses the discrete Kalman filter technique. The filter estimates the process state at some time and then obtains feedback in the form of (noisy) measurements.
The rule tuning model consists of two recursive equations: a time equation, which shows how the parameters change over time, and a history equation, which shows the outcome of a set of rules and their parameters. The time equation is a function of the previous system state (the set of rule parameters) and the actual event history of that time period; its output is the current system state. The history equation is a function of the current set of rule parameters, the set of explicit events during that time period, and the actual event history of the previous time period; its output is the actual event history. But since the current system state is not known, another equation, known as the estimated event history equation, is used; it differs from the original history equation by using the estimated current system state (the estimated current set of rule parameters), and its output is the estimated current event history. This can be used to evaluate the performance of the inference mechanism. Performance evaluation is based on comparing the estimated event history received from the inference mechanism with the actual event history provided by expert feedback at the end of time interval k. From this we can measure precision and recall: precision is the percentage of correctly inferred events relative to the total number of events inferred in the time interval, and recall is the percentage of correctly inferred events (i.e., true positives) relative to the actual total number of events that occurred in the time interval. 
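As a minimal sketch of this predict-correct loop for a single rule parameter (the threshold m1 from the intrusion detection example), the following assumes a one-dimensional random-walk model for the parameter; the noise values and the feedback signal are illustrative, not the paper’s actual estimator:

import numpy as np

def tune_threshold(feedback, m0=100.0, p0=25.0, q=0.5, r=4.0):
    """Track a drifting rule parameter from noisy expert feedback.
    q: process noise (how fast the parameter may drift over time).
    r: measurement noise (how unreliable each piece of feedback is)."""
    m, p = m0, p0
    estimates = []
    for z in feedback:
        p = p + q                 # predict: the parameter follows a random walk
        k = p / (p + r)           # correct: Kalman gain weighs the new feedback
        m = m + k * (z - m)
        p = (1 - k) * p
        estimates.append(m)
    return estimates

# Usage: expert feedback suggests the true threshold drifted from ~100 to ~80.
rng = np.random.default_rng(0)
noisy_feedback = np.linspace(100, 80, 30) + rng.normal(0, 2, size=30)
print([round(v, 1) for v in tune_threshold(noisy_feedback)[-5:]])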
Figure 41: An overview of the rule tuning method 
The rule tuning method consists of a repetitive sequence of actions that should be performed for correct evaluation and dynamic update of the rule parameters. The sequence is illustrated in Figure 41.
The above model is a generic model for automating rule parameter tuning in CEP systems. Further, it serves as a proof of concept for automatic rule parameter tuning in cases where doing it manually becomes a cognitive challenge. However, the model is generic, and an actual implementation will require substantial work and tailoring to the specific requirement (such as the intrusion detection example mentioned here). Nevertheless, given the promising results of the empirical study, this model can serve as a theoretical basis for any such work.
References 
1. Wong, Pak Chung, and R. Daniel Bergeron. "30 Years of Multidimensional Multivariate Visualization." In Scientific Visualization, pp. 3-33. 1994. 
2. Jolliffe, Ian. Principal component analysis. John Wiley & Sons, Ltd, 2005. 
3. Tufte, E. R., & Graves-Morris, P. R. (1983). The visual display of quantitative information (Vol. 2). Cheshire, CT: Graphics press. 
4. Data-Ink Ratio. [ONLINE] Available at: http://www.infovis-wiki.net/index.php/Data-Ink_Ratio. [Last Accessed 5 Nov. 2014]. 
5. Lie Factor. [ONLINE] Available at: http://www.infovis-wiki.net/index.php?title=Lie_Factor. [Last Accessed 5 Nov. 2014]. 
6. Keim, D. A. (2002). Information visualization and visual data mining. Visualization and Computer Graphics, IEEE Transactions on, 8(1), 1-8. 
7. Asimov, D. (1985). The grand tour: a tool for viewing multidimensional data. SIAM Journal on Scientific and Statistical Computing, 6(1), 128-143. 
8. Bier, E. A., Stone, M. C., Pier, K., Buxton, W., & DeRose, T. D. (1993, September). Toolglass and magic lenses: the see-through interface. In Proceedings of the 20th annual conference on Computer graphics and interactive techniques (pp. 73-80). ACM. 
9. Spoerri, A. (1995). InfoCrystal, a visual tool for information retrieval (Doctoral dissertation, Massachusetts Institute of Technology). 
10. Seo, J., & Shneiderman, B. (2005). A rank-by-feature framework for interactive exploration of multidimensional data. Information Visualization, 4(2), 96-113. 
11. Inselberg, A., & Dimsdale, B. (1987). Parallel coordinates for visualizing multi-dimensional geometry (pp. 25-44). Springer Japan. 
12. Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(347-352), 240-242. 
13. Knorr, E. M., Ng, R. T., & Tucakov, V. (2000). Distance-based outliers: algorithms and applications. The VLDB Journal—The International Journal on Very Large Data Bases, 8(3-4), 237-253. 
14. Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: identifying density-based local outliers. In ACM Sigmod Record (Vol. 29, No. 2, pp. 93-104). ACM. 
15. Elmqvist, N., Dragicevic, P., & Fekete, J. D. (2008). Rolling the dice: Multidimensional visual exploration using scatterplot matrix navigation. Visualization and Computer Graphics, IEEE Transactions on, 14(6), 1539-1148. 
16. Ullman, S. (1979). The interpretation of visual motion. Massachusetts Inst of Technology Pr. 
17. Im, J. F., McGuffin, M. J., & Leung, R. (2013). Gplom: The generalized plot matrix for visualizing multidimensional multivariate data. Visualization and Computer Graphics, IEEE Transactions on, 19(12), 2606-2614.
18. R. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35 – 45, 1960. 
19. A. Inselberg and B. Dimsdale. Parallel Coordinates: A Tool for Visualizing Multi-dimensional Geometry, 1990. 
20. Savoska, S., & Loskovska, S. (2009, November). Parallel Coordinates as Tool of Exploratory Data Analysis. In 17th Telecommunications Forum TELFOR, Belgrade, Serbia (pp. 24-26). 
21. Chen, C. H., Härdle, W., & Unwin, A. (2008). Handbooks of Computational Statistics: Data Visualization. 164 - 174 
22. Hauser, H., Ledermann, F., & Doleisch, H. (2002). Angular brushing of extended parallel coordinates. In Information Visualization, 2002. INFOVIS 2002. IEEE Symposium on (pp. 127-130). IEEE. 
23. Martin, A. R., & Ward, M. O. (1995, October). High dimensional brushing for interactive exploration of multivariate data. In Proceedings of the 6th Conference on Visualization'95 (p. 271). IEEE Computer Society. 
24. Heinrich, J., & Weiskopf, D. (2012). State of the art of parallel coordinates. In Eurographics 2013-State of the Art Reports (pp. 95-116). The Eurographics Association. 
25. Lu, L. F., Huang, M. L., & Huang, T. H. (2012, December). A new axes re-ordering method in parallel coordinates visualization. In Machine Learning and Applications (ICMLA), 2012 11th International Conference on (Vol. 2, pp. 252-257). IEEE. 
26. Fua, Y. H., Ward, M. O., & Rundensteiner, E. A. (1999, October). Hierarchical parallel coordinates for exploration of large datasets. In Proceedings of the conference on Visualization'99: celebrating ten years (pp. 43-50). IEEE Computer Society Press. 
27. Luo, Y., Weiskopf, D., Zhang, H., & Kirkpatrick, A. E. Cluster visualization in parallel coordinates using curve bundles. 
28. Heinrich, J., Luo, Y., Kirkpatrick, A. E., Zhang, H., & Weiskopf, D. (2011). Evaluation of a bundling technique for parallel coordinates. arXiv preprint arXiv:1109.6073. 
29. Zhou, H., Yuan, X., Qu, H., Cui, W., & Chen, B. (2008, May). Visual clustering in parallel coordinates. In Computer Graphics Forum (Vol. 27, No. 3, pp. 1047-1054). Blackwell Publishing Ltd. 
30. Andrienko, G., & Andrienko, N. (2005). Blending aggregation and selection: Adapting parallel coordinates for the visualization of large datasets. The Cartographic Journal, 42(1), 49-60. 
31. Artero, A. O., de Oliveira, M. C. F., & Levkowitz, H. (2006, July). Enhanced high dimensional data visualization through dimension reduction and attribute arrangement. In Information Visualization, 2006. IV 2006. Tenth International Conference on (pp. 707-712). IEEE.
52 
32. Forina, M., Armanino, C., Lanteri, S. and Tiscornia, E. (1983). Classification of olive oils from their fatty acid composition, in H. Martens and H. Russwurm (eds), Food Research and Data Analysis, Applied Science Publishers, London UK, pp. 189-214 
33. Hoffman, Patrick, Georges Grinstein, Kenneth Marx, Ivo Grosse, and Eugene Stanley. "DNA visual and analytic data mining." In Visualization'97., Proceedings, pp. 437-441. IEEE, 1997. 
34. R. Stone II, A.L. Sabichi, J. Gill, I.Lee, R. Loganatharaj, M. Trutschl, U. Cvek, J.L. Clifford. Identification of genes involved in early stage bladder cancer progression [Unpublished]. 
35. Hartigan, J. A., and Kleiner, B. (1981). Mosaics for contingency tables. In W. F. Eddy (Ed.), Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface. New York: Springer-Verlag. 
36. Friendly, M. (2002). A brief history of the mosaic display. Journal of Computational and Graphical Statistics, 11(1). 
37. Hofmann, H. (2008). Mosaic plots and their variants. In Handbook of data visualization (pp. 617-642). Springer Berlin Heidelberg. 
38. Hofmann, H., Siebes, A. P., & Wilhelm, A. F. (2000, August). Visualizing association rules with interactive mosaic plots. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 227-235). ACM. 
39. Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464-1480. 
40. Kaski, S., & Kohonen, T. (1996). Exploratory data analysis by the self-organizing map: Structures of welfare and poverty in the world. In Neural networks in financial engineering. Proceedings of the third international conference on neural networks in the capital markets. 
41. K. Rodden, “Applying a sunburst visualization to summarize user navigation sequences”, IEEE Comput. Graph. Appl. Mag., Vol. 34, iss. 5, pp. 36-40, Sept.-Oct. 2014. 
42. Keim, D. A., Schneidewind, J., & Sips, M. (2005). Fp-viz: Visual frequent pattern mining. Bibliothek der Universität Konstanz. 
43. J. Stasko. SunBurst [Online]. Available: http://www.cc.gatech.edu/gvu/ii/sunburst/ 
44. R. Vliegen, J.J. van Wijk and E.-J. van der Linden, "Visualizing Business Data with Generalized Treemaps", IEEE Trans. Visualization and Computer Graphics, vol. 12, no. 5, pp. 789-796, Sept./Oct. 2006. 
45. E. Tufte, “Small Multiples,” in Envisioning Information, Cheshire, CT: Graphics Press, ch. 4, pp. 67-80. 
46. R.A. Becker, et al., "The visual design and control of trellis display,” J. Comp. Graph. Stat., vol. 5, iss. 2, pp. 123-155, 1996. 
47. M. Theus, “High Dimensional Data Visualizations,” in C. Chen et al., Handbook of Data Visualization, Berlin: Springer, part II, ch. 6, sec. 3, pp. 156-163. 
48. What is a Trellis Chart [Online]. Available: http://trellischarts.com/what-is-a-trellis-chart
49. M.Y. Huh and K. Kiyeol, "Visualization of multidimensional data using modifications of the Grand Tour," J. Appl. Statist., vol. 29, no. 5, pp. 721-728, 2002. 
50. D. Cook et al., “Grand Tours, Projection Pursuit Guided Tours, and Manual Controls,” in C. Chen et al., Handbook of Data Visualization, Berlin: Springer, part III, ch. 2, pp. 296-312. 
51. Broda, K., Clark, K., Miller, R., & Russo, A. (2009). SAGE: a logical agent-based environment monitoring and control system (pp. 112-117). Springer Berlin Heidelberg. 
52. A. J. Demers, J. Gehrke, M. Hong, M. Riedewald, and W. M. White. Towards expressive publish/subscribe systems. In EDBT, pages 627–644, 2006. 
53. N. P. Schultz-Møller, M. Migliavacca, and P. Pietzuch. Distributed complex event processing with query rewriting. In DEBS, pages 4:1–4:12. ACM, 2009. 
54. Margara, A., Cugola, G., & Tamburrelli, G. (2014, May). Learning from the past: automated rule generation for complex event processing. In Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems (pp. 47-58). ACM. 
55. Turchin, Yulia, Avigdor Gal, and Segev Wasserkrug. "Tuning complex event processing rules using the prediction-correction paradigm." In Proceedings of the Third ACM International Conference on Distributed Event-Based Systems, p. 10. ACM, 2009. 
56. Mutschler, C., & Philippsen, M. (2012). Learning event detection rules with noise hidden Markov models. In AHS (pp. 159-166). 
57. Axelsson, S. (2000). Intrusion detection systems: A survey and taxonomy (Vol. 99). Technical report. 
58. A selected set of attributes for a sample of cars manufactured within 1970 to 1982. [ONLINE] Available at: http://web.pdx.edu/~gerbing/data/cars.csv. [Last Accessed 5 Nov. 2014]. 
59. Fisher, R. A. (1935). The design of experiments.

More Related Content

What's hot

Vector and Raster Data data model
Vector and Raster Data data modelVector and Raster Data data model
Vector and Raster Data data modelCalcutta University
 
Models of spatial process by sushant
Models of spatial process by sushantModels of spatial process by sushant
Models of spatial process by sushantsushantsawant13
 
Difference between gis and cad
Difference between gis and cadDifference between gis and cad
Difference between gis and cadSumant Diwakar
 
Spatial Analysis Using GIS
Spatial Analysis Using GISSpatial Analysis Using GIS
Spatial Analysis Using GISPrachi Mehta
 
Object-Oriented Image Processing Of An High Resolution Satellite Imagery With...
Object-Oriented Image Processing Of An High Resolution Satellite Imagery With...Object-Oriented Image Processing Of An High Resolution Satellite Imagery With...
Object-Oriented Image Processing Of An High Resolution Satellite Imagery With...CSCJournals
 
A NOVEL APPROACH TO SMOOTHING ON 3D STRUCTURED ADAPTIVE MESH OF THE KINECT-BA...
A NOVEL APPROACH TO SMOOTHING ON 3D STRUCTURED ADAPTIVE MESH OF THE KINECT-BA...A NOVEL APPROACH TO SMOOTHING ON 3D STRUCTURED ADAPTIVE MESH OF THE KINECT-BA...
A NOVEL APPROACH TO SMOOTHING ON 3D STRUCTURED ADAPTIVE MESH OF THE KINECT-BA...csandit
 
Representing Uncertainty in Situation Maps for Disaster Management
Representing Uncertainty in Situation Maps for Disaster ManagementRepresenting Uncertainty in Situation Maps for Disaster Management
Representing Uncertainty in Situation Maps for Disaster Managementhje
 
Implementation of Fractal Image Compression on Medical Images by Different Ap...
Implementation of Fractal Image Compression on Medical Images by Different Ap...Implementation of Fractal Image Compression on Medical Images by Different Ap...
Implementation of Fractal Image Compression on Medical Images by Different Ap...ijtsrd
 
[Seminar] 200508 joohee kim
[Seminar] 200508 joohee kim[Seminar] 200508 joohee kim
[Seminar] 200508 joohee kimivaderivader
 
Single view vs. multiple views scatterplots
Single view vs. multiple views scatterplotsSingle view vs. multiple views scatterplots
Single view vs. multiple views scatterplotsIJECEIAES
 
Geographical information system unit 5
Geographical information  system unit 5Geographical information  system unit 5
Geographical information system unit 5WE-IT TUTORIALS
 
Spatial_Data_Analysis_with_open_source_softwares[1]
Spatial_Data_Analysis_with_open_source_softwares[1]Spatial_Data_Analysis_with_open_source_softwares[1]
Spatial_Data_Analysis_with_open_source_softwares[1]Joachim Nkendeys
 

What's hot (19)

Vector and Raster Data data model
Vector and Raster Data data modelVector and Raster Data data model
Vector and Raster Data data model
 
Models of spatial process by sushant
Models of spatial process by sushantModels of spatial process by sushant
Models of spatial process by sushant
 
Difference between gis and cad
Difference between gis and cadDifference between gis and cad
Difference between gis and cad
 
Spatial Analysis Using GIS
Spatial Analysis Using GISSpatial Analysis Using GIS
Spatial Analysis Using GIS
 
Data Visualization by David Kretch
Data Visualization by David KretchData Visualization by David Kretch
Data Visualization by David Kretch
 
GIS & Raster
GIS & RasterGIS & Raster
GIS & Raster
 
Object-Oriented Image Processing Of An High Resolution Satellite Imagery With...
Object-Oriented Image Processing Of An High Resolution Satellite Imagery With...Object-Oriented Image Processing Of An High Resolution Satellite Imagery With...
Object-Oriented Image Processing Of An High Resolution Satellite Imagery With...
 
A NOVEL APPROACH TO SMOOTHING ON 3D STRUCTURED ADAPTIVE MESH OF THE KINECT-BA...
A NOVEL APPROACH TO SMOOTHING ON 3D STRUCTURED ADAPTIVE MESH OF THE KINECT-BA...A NOVEL APPROACH TO SMOOTHING ON 3D STRUCTURED ADAPTIVE MESH OF THE KINECT-BA...
A NOVEL APPROACH TO SMOOTHING ON 3D STRUCTURED ADAPTIVE MESH OF THE KINECT-BA...
 
Representing Uncertainty in Situation Maps for Disaster Management
Representing Uncertainty in Situation Maps for Disaster ManagementRepresenting Uncertainty in Situation Maps for Disaster Management
Representing Uncertainty in Situation Maps for Disaster Management
 
Beyond Bag of Features: Adaptive Hilbert Scan Based Tree for Image Retrieval
Beyond Bag of Features: Adaptive Hilbert Scan Based Tree for Image RetrievalBeyond Bag of Features: Adaptive Hilbert Scan Based Tree for Image Retrieval
Beyond Bag of Features: Adaptive Hilbert Scan Based Tree for Image Retrieval
 
Implementation of Fractal Image Compression on Medical Images by Different Ap...
Implementation of Fractal Image Compression on Medical Images by Different Ap...Implementation of Fractal Image Compression on Medical Images by Different Ap...
Implementation of Fractal Image Compression on Medical Images by Different Ap...
 
[Seminar] 200508 joohee kim
[Seminar] 200508 joohee kim[Seminar] 200508 joohee kim
[Seminar] 200508 joohee kim
 
3 D Analyst
3 D Analyst3 D Analyst
3 D Analyst
 
Single view vs. multiple views scatterplots
Single view vs. multiple views scatterplotsSingle view vs. multiple views scatterplots
Single view vs. multiple views scatterplots
 
Vector data model
Vector data modelVector data model
Vector data model
 
Geographical information system unit 5
Geographical information  system unit 5Geographical information  system unit 5
Geographical information system unit 5
 
Spatial_Data_Analysis_with_open_source_softwares[1]
Spatial_Data_Analysis_with_open_source_softwares[1]Spatial_Data_Analysis_with_open_source_softwares[1]
Spatial_Data_Analysis_with_open_source_softwares[1]
 
Raster data model
Raster data modelRaster data model
Raster data model
 
Raster
RasterRaster
Raster
 

Viewers also liked

Sql injection
Sql injectionSql injection
Sql injectionBee_Ware
 
Top 10 professors of organic geochemistry
Top 10 professors of organic geochemistryTop 10 professors of organic geochemistry
Top 10 professors of organic geochemistryAsmaa Mohammed
 
Websense security prediction 2014
Websense   security prediction 2014Websense   security prediction 2014
Websense security prediction 2014Bee_Ware
 
Claudine florence three tools to use in a singing career
Claudine florence three tools to use in a singing careerClaudine florence three tools to use in a singing career
Claudine florence three tools to use in a singing careerclaudine7874
 
Technology integration
Technology integrationTechnology integration
Technology integrationbriggsad
 
Kilpailukykysopimuksen vaikutusarvio
Kilpailukykysopimuksen vaikutusarvioKilpailukykysopimuksen vaikutusarvio
Kilpailukykysopimuksen vaikutusarvioOlli Kärkkäinen
 
Freedom of Choice with Home Care
Freedom of Choice with Home CareFreedom of Choice with Home Care
Freedom of Choice with Home CareTracy Steel
 
Higgs bosob machine learning challange
Higgs bosob machine learning challangeHiggs bosob machine learning challange
Higgs bosob machine learning challangeTharindu Ranasinghe
 
Moving beyond passwords - Consumer attitudes on online authentication
Moving beyond passwords - Consumer attitudes on online authenticationMoving beyond passwords - Consumer attitudes on online authentication
Moving beyond passwords - Consumer attitudes on online authenticationBee_Ware
 
Всеукраїнський Рух ЗАВТРА
Всеукраїнський Рух ЗАВТРАВсеукраїнський Рух ЗАВТРА
Всеукраїнський Рух ЗАВТРАmariya776
 

Viewers also liked (20)

Sql injection
Sql injectionSql injection
Sql injection
 
Video Production Beginner's Guide
Video Production Beginner's GuideVideo Production Beginner's Guide
Video Production Beginner's Guide
 
Top 10 professors of organic geochemistry
Top 10 professors of organic geochemistryTop 10 professors of organic geochemistry
Top 10 professors of organic geochemistry
 
ecoupons
ecouponsecoupons
ecoupons
 
Websense security prediction 2014
Websense   security prediction 2014Websense   security prediction 2014
Websense security prediction 2014
 
Claudine florence three tools to use in a singing career
Claudine florence three tools to use in a singing careerClaudine florence three tools to use in a singing career
Claudine florence three tools to use in a singing career
 
Technology integration
Technology integrationTechnology integration
Technology integration
 
Erreportajea dna
Erreportajea dnaErreportajea dna
Erreportajea dna
 
Kilpailukykysopimuksen vaikutusarvio
Kilpailukykysopimuksen vaikutusarvioKilpailukykysopimuksen vaikutusarvio
Kilpailukykysopimuksen vaikutusarvio
 
ฟอร มโครงร างโครงงานคอมพ_วเตอร_
ฟอร มโครงร างโครงงานคอมพ_วเตอร_ฟอร มโครงร างโครงงานคอมพ_วเตอร_
ฟอร มโครงร างโครงงานคอมพ_วเตอร_
 
Asignacion iv
Asignacion ivAsignacion iv
Asignacion iv
 
Vocales ruddy
Vocales ruddyVocales ruddy
Vocales ruddy
 
morphometric analysis
morphometric analysismorphometric analysis
morphometric analysis
 
Freedom of Choice with Home Care
Freedom of Choice with Home CareFreedom of Choice with Home Care
Freedom of Choice with Home Care
 
Higgs bosob machine learning challange
Higgs bosob machine learning challangeHiggs bosob machine learning challange
Higgs bosob machine learning challange
 
Moving beyond passwords - Consumer attitudes on online authentication
Moving beyond passwords - Consumer attitudes on online authenticationMoving beyond passwords - Consumer attitudes on online authentication
Moving beyond passwords - Consumer attitudes on online authentication
 
ฟอร มโครงร างโครงงานคอมพ_วเตอร_
ฟอร มโครงร างโครงงานคอมพ_วเตอร_ฟอร มโครงร างโครงงานคอมพ_วเตอร_
ฟอร มโครงร างโครงงานคอมพ_วเตอร_
 
Всеукраїнський Рух ЗАВТРА
Всеукраїнський Рух ЗАВТРАВсеукраїнський Рух ЗАВТРА
Всеукраїнський Рух ЗАВТРА
 
WK2 Project: Storyboard
WK2 Project: StoryboardWK2 Project: Storyboard
WK2 Project: Storyboard
 
NDO
NDONDO
NDO
 

Similar to Vivarana literature survey

A Review on data visualization tools used for Big Data
A Review on data visualization tools used for Big DataA Review on data visualization tools used for Big Data
A Review on data visualization tools used for Big DataIRJET Journal
 
A Study on Data Visualization Techniques of Spatio Temporal Data
A Study on Data Visualization Techniques of Spatio Temporal DataA Study on Data Visualization Techniques of Spatio Temporal Data
A Study on Data Visualization Techniques of Spatio Temporal DataIJMTST Journal
 
Transport for London - London's Operations Digital Twin
Transport for London - London's Operations Digital TwinTransport for London - London's Operations Digital Twin
Transport for London - London's Operations Digital TwinNeo4j
 
Visual Analytics: Traffic Collisions in Italy
Visual Analytics: Traffic Collisions in ItalyVisual Analytics: Traffic Collisions in Italy
Visual Analytics: Traffic Collisions in ItalyRoberto Falconi
 
CNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITIONCNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITIONIRJET Journal
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...IJEACS
 
Design and implement a reality-based 3D digitisation and modelling project
Design and implement a reality-based 3D digitisation and modelling projectDesign and implement a reality-based 3D digitisation and modelling project
Design and implement a reality-based 3D digitisation and modelling project3D ICONS Project
 
Mi 291 chapter 3 (reverse engineering)(1)
Mi 291 chapter 3 (reverse engineering)(1)Mi 291 chapter 3 (reverse engineering)(1)
Mi 291 chapter 3 (reverse engineering)(1)varun teja G.V.V
 
Model and Implementation of Large Scale Fingerprint Image Retrieval
Model and Implementation of Large Scale Fingerprint Image RetrievalModel and Implementation of Large Scale Fingerprint Image Retrieval
Model and Implementation of Large Scale Fingerprint Image Retrievalijtsrd
 
Automated Generation for Number Plate Detection and Recognition
Automated Generation for Number Plate Detection and RecognitionAutomated Generation for Number Plate Detection and Recognition
Automated Generation for Number Plate Detection and RecognitionIRJET Journal
 
11.concept for a web map implementation with faster query response
11.concept for a web map implementation with faster query response11.concept for a web map implementation with faster query response
11.concept for a web map implementation with faster query responseAlexander Decker
 
Concept for a web map implementation with faster query response
Concept for a web map implementation with faster query responseConcept for a web map implementation with faster query response
Concept for a web map implementation with faster query responseAlexander Decker
 
Web Graph Clustering Using Hyperlink Structure
Web Graph Clustering Using Hyperlink StructureWeb Graph Clustering Using Hyperlink Structure
Web Graph Clustering Using Hyperlink Structureaciijournal
 
A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...
A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...
A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...theijes
 
A Transfer Learning Approach to Traffic Sign Recognition
A Transfer Learning Approach to Traffic Sign RecognitionA Transfer Learning Approach to Traffic Sign Recognition
A Transfer Learning Approach to Traffic Sign RecognitionIRJET Journal
 
A deep learning based stereo matching model for autonomous vehicle
A deep learning based stereo matching model for autonomous vehicleA deep learning based stereo matching model for autonomous vehicle
A deep learning based stereo matching model for autonomous vehicleIAESIJAI
 
IRJET - Finger Vein Extraction and Authentication System for ATM
IRJET -  	  Finger Vein Extraction and Authentication System for ATMIRJET -  	  Finger Vein Extraction and Authentication System for ATM
IRJET - Finger Vein Extraction and Authentication System for ATMIRJET Journal
 
IRJET- 3D Object Recognition of Car Image Detection
IRJET-  	  3D Object Recognition of Car Image DetectionIRJET-  	  3D Object Recognition of Car Image Detection
IRJET- 3D Object Recognition of Car Image DetectionIRJET Journal
 
Data Visualization in Big Data Analytics
Data Visualization in Big Data AnalyticsData Visualization in Big Data Analytics
Data Visualization in Big Data AnalyticsShrinivasTayade
 

Similar to Vivarana literature survey (20)

A Review on data visualization tools used for Big Data
A Review on data visualization tools used for Big DataA Review on data visualization tools used for Big Data
A Review on data visualization tools used for Big Data
 
A Study on Data Visualization Techniques of Spatio Temporal Data
A Study on Data Visualization Techniques of Spatio Temporal DataA Study on Data Visualization Techniques of Spatio Temporal Data
A Study on Data Visualization Techniques of Spatio Temporal Data
 
Transport for London - London's Operations Digital Twin
Transport for London - London's Operations Digital TwinTransport for London - London's Operations Digital Twin
Transport for London - London's Operations Digital Twin
 
Visual Analytics: Traffic Collisions in Italy
Visual Analytics: Traffic Collisions in ItalyVisual Analytics: Traffic Collisions in Italy
Visual Analytics: Traffic Collisions in Italy
 
CNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITIONCNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITION
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
 
Design and implement a reality-based 3D digitisation and modelling project
Design and implement a reality-based 3D digitisation and modelling projectDesign and implement a reality-based 3D digitisation and modelling project
Design and implement a reality-based 3D digitisation and modelling project
 
Mi 291 chapter 3 (reverse engineering)(1)
Mi 291 chapter 3 (reverse engineering)(1)Mi 291 chapter 3 (reverse engineering)(1)
Mi 291 chapter 3 (reverse engineering)(1)
 
Model and Implementation of Large Scale Fingerprint Image Retrieval
Model and Implementation of Large Scale Fingerprint Image RetrievalModel and Implementation of Large Scale Fingerprint Image Retrieval
Model and Implementation of Large Scale Fingerprint Image Retrieval
 
Automated Generation for Number Plate Detection and Recognition
Automated Generation for Number Plate Detection and RecognitionAutomated Generation for Number Plate Detection and Recognition
Automated Generation for Number Plate Detection and Recognition
 
11.concept for a web map implementation with faster query response
11.concept for a web map implementation with faster query response11.concept for a web map implementation with faster query response
11.concept for a web map implementation with faster query response
 
Concept for a web map implementation with faster query response
Concept for a web map implementation with faster query responseConcept for a web map implementation with faster query response
Concept for a web map implementation with faster query response
 
Web Graph Clustering Using Hyperlink Structure
Web Graph Clustering Using Hyperlink StructureWeb Graph Clustering Using Hyperlink Structure
Web Graph Clustering Using Hyperlink Structure
 
A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...
A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...
A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...
 
A Transfer Learning Approach to Traffic Sign Recognition
A Transfer Learning Approach to Traffic Sign RecognitionA Transfer Learning Approach to Traffic Sign Recognition
A Transfer Learning Approach to Traffic Sign Recognition
 
Paper Dec 2016
Paper Dec 2016Paper Dec 2016
Paper Dec 2016
 
A deep learning based stereo matching model for autonomous vehicle
A deep learning based stereo matching model for autonomous vehicleA deep learning based stereo matching model for autonomous vehicle
A deep learning based stereo matching model for autonomous vehicle
 
IRJET - Finger Vein Extraction and Authentication System for ATM
IRJET -  	  Finger Vein Extraction and Authentication System for ATMIRJET -  	  Finger Vein Extraction and Authentication System for ATM
IRJET - Finger Vein Extraction and Authentication System for ATM
 
IRJET- 3D Object Recognition of Car Image Detection
IRJET-  	  3D Object Recognition of Car Image DetectionIRJET-  	  3D Object Recognition of Car Image Detection
IRJET- 3D Object Recognition of Car Image Detection
 
Data Visualization in Big Data Analytics
Data Visualization in Big Data AnalyticsData Visualization in Big Data Analytics
Data Visualization in Big Data Analytics
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad EscortsCall girls in Ahmedabad High profile
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
Figure 33: A self-organizing map trained on the poverty levels of countries
Figure 34: A sunburst visualization summarizing user paths through a fictional e-commerce site
Figure 35: Trellis chart for a data set on sales
Figure 36: Trellis display of scatter plots (relationship of gifts given/received on revenue)
Figure 37: Grand Tours
Figure 38: 1D grand tour path in 3D
Figure 39: Structure of the iCEP framework
Figure 40: Prediction-Correction Paradigm
Figure 41: An overview of the rule tuning method
1. Introduction

Nowadays every action or event occurring in the real world, whether it be a change of temperature detected by a sensor, a change in stock market prices, or the movement of objects tracked through GPS coordinates, is digitally collected and stored for further exploration and analysis, and sometimes a pre-specified action is triggered in real time when a particular event occurs. Complex Event Processing (CEP) engines are used to analyze these events on the fly and to execute appropriate pre-specified actions. One downside of real-time event monitoring and processing using a CEP engine is that a domain expert must write the necessary CEP rules in order to detect interesting events and trigger an appropriate response. Sometimes the domain expert might lack the knowledge to write efficient CEP rules for a particular CEP engine using its query language, or he might need to explore, understand and analyze the incoming event stream prior to writing any rules. By providing an interactive visualization of the data to domain experts, we can support their process of generating CEP rules.

Hence, this literature survey mainly contains two sections. Section 3 presents our findings on interactive visualization techniques: it describes scatterplots (section 3.1) and parallel coordinates (section 3.2) in detail and briefly introduces other promising visualization techniques. Section 4 contains our findings on two methods of CEP rule generation, namely iCEP and rule parameter tuning. Further, section 2 contains an overview of multidimensional visualization (principles, techniques, problems) for the sake of completeness.
2. Multidimensional data visualization

Recent advances in technology have enabled the generation of vast amounts of data in a wide range of fields, and these data keep getting more complex. Data analysts look for patterns, anomalies and structures in the data. Analyzing the data can lead to important knowledge discoveries that are valuable to users; the benefits of such understanding are reflected in business decision making, more accurate medical diagnosis, finer engineering and more refined conclusions in a general sense. Visualizing these complex data can provide an overview and a summary of the data and can help in identifying areas of interest. Good data visualization techniques that allow users to explore and manipulate the data can empower them in analyzing the data and identifying important patterns and trends that may otherwise have stayed hidden.

Multi-dimensional data visualization is a very active research area that goes back many years [1]. In this survey we have focused on 2D multi-dimensional data visualization techniques, because 2D visualizations make it easy for users to analyze and interact with the data: a 2D surface is more familiar to users and is easy to navigate.

There are multiple challenges that need to be overcome in multidimensional data visualization. Finding a good visualization means finding a good compromise among the following:

● Mapping - Finding a good mapping from a multi-dimensional space to a two-dimensional space is not a simple task. The final representation of the data should be intuitive and interpretable. Users should be able to identify patterns and trends in the multi-dimensional data using the two-dimensional representation.

● Large amounts of data - Modern datasets contain very large amounts of data that can lead to very dense visualizations. This causes loss of information in the visualization because users lose the ability to distinguish between small differences in the data.

● Dimensionality - Displaying the information of multiple dimensions in a two-dimensional space can also lead to very dense and cluttered visualizations. Techniques need to be developed to allow users to reduce the clutter and identify important information in the data. Techniques such as principal component analysis [2] can help in identifying important dimensions in the data.
● Assessing effectiveness - Information needs vary widely with each data set, so there is no "silver bullet" visualization technique that solves every problem. Different datasets and requirements call for different visualization methods, and there is no general method to assess the effectiveness of one visualization method over another, nor a process that can be followed to arrive at a visualization method that works for any dataset.

Further, according to E.R. Tufte [3] a good visualization has the following qualities:

● It shows data variations instead of design variations. This encourages the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, etc. One way to achieve this quality is to have a high data-to-ink ratio [4] and a high data density.

● It has clear, detailed and thorough labeling and appropriate scales. A visualization can use layering and separation techniques to show the labels of the data items.

● The size of a graphic effect is directly proportional to the numeric quantities. This can be achieved by avoiding chart junk such as unnecessary 3D and shadowing effects and by reducing the lie factor [5].

In order to make visualizations more user friendly, a number of interaction techniques have been proposed [6]. It should be noted that the behavior of these interaction techniques differs from one visualization technique to another. However, interaction techniques allow the user to directly interact with the visualization and to change it according to the exploration objective. The list below contains the major interaction techniques we have identified.

● Dynamic Projections
Dynamic projection means dynamically changing the projection in order to explore a multidimensional data set. A classic example is the Grand Tour [7], which tries to show all interesting pairs of dimensions of a multidimensional dataset as a series of scatterplots. However, the sequence of projections can be random, manual, pre-computed, or even data driven depending on the visualization technique.
● Interactive Filtering
When exploring large datasets, interactively partitioning the data and focusing on interesting subsets is a must. This can be achieved through direct selection of the desired subset (browsing) or through specifying the properties of the desired subset (querying). However, browsing becomes difficult and querying becomes inaccurate as the dataset grows larger. As a solution to this problem a number of techniques such as Magic Lens [8] and InfoCrystal [9] have been developed to improve interactive filtering in data exploration.

● Interactive Zooming
Zooming is used in almost all interactive visualizations. When dealing with large amounts of data, the data is sometimes highly compressed in order to provide an overview of it. In such cases zooming does not only mean displaying the data objects larger; the data representation should also automatically change to present more details at higher zoom levels (decompressing). The initial (compressed) view allows the user to identify patterns, correlations and outliers, and by zooming in to an interesting area the user can study the data objects within that region in more detail.

● Interactive Distortion
Interactive distortion techniques help the data exploration process by providing a way to focus on details while preserving an overview of the data. The basic idea of distortion is to show a portion of the data with a high level of detail while other portions are shown at a lower level of detail.

● Interactive Linking and Brushing
The idea of linking and brushing is to combine different visualization methods to overcome the shortcomings of single techniques. For example, one could visualize a scatterplot matrix (section 3.1) for a data set, and when some points in a particular scatterplot are brushed, those points get highlighted in all other scatterplots. Hence interactive changes made in one visualization are automatically reflected in the others.
3. Visualization techniques

3.1 Scatter Plots

Scatterplots are a commonly used visualization technique for multivariate data sets. Mainly there are 2D and 3D scatterplot visualizations. In a 2D scatterplot, data points from two dimensions of a dataset are plotted in a Cartesian coordinate system where the two axes represent the selected dimensions, resulting in a scattering of points. An example of a scatterplot showing the distribution of drivers' visibility range against their age is shown in Figure 1.

Figure 1: A scatterplot of the distribution of drivers' visibility range against their age

The positions of the data points represent the corresponding dimension values. Scatterplots are useful for visually identifying correlations between two selected variables of a multidimensional data set, or for finding clusters of individual points (outliers) in the dataset. A single scatterplot can only depict the correlation between two dimensions; a limited number of additional dimensions can be mapped to the color, size or shape of the plotted points.

Advocates of 3D scatterplots argue that since the natural world is three dimensional, users can readily grasp 3D representations. However, there is substantial empirical evidence that for multidimensional ordinal data (rather than 3D real objects such as chairs or skeletons), users struggle with occlusion and the cognitive burden of navigation as they try to find desired viewpoints [10].
Advocates of higher dimensional displays have demonstrated attractive possibilities, but their strategies are still difficult for most users to grasp. Since two-dimensional scatterplot presentations offer ample power while maintaining comprehensibility, many variations have been proposed. One method used to visualize multivariate data with 2D scatterplots is the scatterplot matrix (SPLOM) [1].

Figure 2: A scatterplot matrix display of data with three variates X, Y, and Z [1].

Each individual plot in the SPLOM is identified by its row and column number in the matrix [1]. For example, the identity of the upper left plot of the matrix in Figure 2 is (1, 3) and the lower right plot is (3, 1). The empty diagonal displays the variable names. Plot (2, 1) is the scatterplot of parameter X against Y while plot (1, 2) is the reverse, i.e. Y versus X.

One major disadvantage of SPLOMs is that as the number of dimensions of the data set grows, the n-by-n SPLOM grows with it and each individual scatterplot gets less space. The following frameworks provide a solution to that problem by incorporating interactive techniques into the traditional SPLOM.

3.1.1 Rank-by-feature framework

Many variations of the initial SPLOM have been proposed to enhance its interactivity and interpretability. One such enhancement is the rank-by-feature framework [10]. Instead of directly visualizing the data points against all pairs of dimensions, this framework allows the user to select an interesting ranking criterion, as described later in this section.
Figure 3: Rank-by-feature framework interface for scatterplots (2D). All pairs of dimensions are sorted according to the current ordering criterion (correlation coefficient) (A) in the ordered list (C). The score overview (B) shows an overview of the scores of all pairs of dimensions. A mouse-over event activates a cell in the score overview, highlights the corresponding item in the ordered list (C) and shows the corresponding scatterplot in the scatterplot browser (D) simultaneously. In the scatterplot browser (D) it is also easy to traverse scatterplot space by changing the X or Y axis using the item sliders on the horizontal or vertical axis. (Demographic and health-related statistics for 3138 U.S. counties with 17 attributes.)

Figure 3 shows a dataset of demographic and health-related statistics for 3138 U.S. counties with 17 attributes, visualized through the rank-by-feature framework. Its interface consists of four coordinated components: the control panel (Figure 3A), the score overview (Figure 3B), the ordered list (Figure 3C), and the scatterplot browser (Figure 3D). The user selects an ordering criterion in the control panel (Figure 3A), and the ordered list (Figure 3C) shows the pairs of dimensions (scatterplots) sorted according to the score of that criterion, with the scores color-coded in the background. But users cannot see an overview of all relationships between variables at a glance in the ordered list. Hence the score overview (Figure 3B), an m-by-m grid view where all dimensions are aligned in the rows and columns, has been implemented. Each cell of the score overview represents a scatterplot whose horizontal and vertical axes are the dimensions at the corresponding column and row respectively. Since this matrix is symmetric, only the lower-triangular part is shown. Each cell is color-coded by its score value using the same mapping scheme as in the ordered list. The scatterplot corresponding to the cell is shown in the scatterplot browser (Figure 3D) simultaneously, and the corresponding item is highlighted in the ordered list (Figure 3C). In the scatterplot browser, users can quickly look through scatterplots by using the item sliders attached to the scatterplot view.
Simply by dragging the vertical or horizontal item slider bar, users can change the dimension of either the horizontal or vertical axis while preserving the other axis. The list below contains the ranking criteria suggested by this framework.

● Correlation coefficient (-1 to 1): Pearson's correlation coefficient r for a scatterplot S with n points (xi, yi) [12] is defined in Equation 1:

    r = Σi (xi − x̄)(yi − ȳ) / ( √Σi (xi − x̄)² · √Σi (yi − ȳ)² )    (Equation 1)

Pearson's r is a number between -1 and 1. The sign and the magnitude tell the direction and the strength of the relationship respectively. Although correlation does not necessarily imply causality, it can provide a good clue to the true cause, which could be another variable. Linear relationships are common and simple to understand; as a visual representation of the linear relationship between two variables, the line of best fit (regression line) is drawn over the scatterplots.

● Least square error for curvilinear regression (0 to 1): This criterion sorts scatterplots in terms of the least-square error from the optimal quadratic curve fit, so that the user can isolate scatterplots where all points are closely/loosely arranged along a quadratic curve. In some scenarios it might be interesting to find nonlinear relationships in the data set in addition to linear ones.

● Quadracity (0 to infinity): The quadracity criterion is added to emphasize real quadratic relationships. It ranks scatterplots according to the coefficient of the highest-degree term, so that users can easily identify the ones that are more quadratic than others.

● The number of potential outliers (0 to n): Distance-based outlier detection methods such as DB-out [13] or density-based methods such as the Local Outlier Factor (LOF)-based method [14] can be used to detect outliers in a scatterplot. The rank-by-feature framework uses the LOF-based method (Figure 4), since it is more flexible and dynamic in terms of outlier definition and detection.
The outliers are highlighted with yellow triangles in the scatterplot browser view.

Figure 4: Rank-by-feature visualization for a data set of demographic and health-related statistics for 3138 U.S. counties with 17 attributes, visualized with the "number of potential outliers" ranking criterion.

● The number of items in the region of interest (0 to n): This criterion allows the user to draw a free-form polygonal region of interest on the scatterplot. The framework then uses the number of data points in the region to order all scatterplots, so that the user can easily find the ones with the most/least items in the specified region.

● Uniformity of scatterplots (0 to infinity): To calculate this criterion, the two-dimensional space is divided into regular grid cells and each cell is used as a bin. For example, if a k-by-k grid has been generated, the entropy of a scatterplot S would be

    H(S) = − Σi Σj pij log2 pij

where pij is the probability that an item belongs to the cell at (i, j) of the grid.
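To make the ranking idea concrete, below is a minimal Python sketch (not the authors' implementation) of two of these criteria: ranking all dimension pairs by Pearson's r, and the grid-entropy uniformity score. The synthetic dataset and the injected correlated pair are made up for illustration.

    # Sketch of two rank-by-feature criteria: correlation and uniformity.
    import numpy as np
    from itertools import combinations

    def correlation_scores(data):
        """Score every pair of columns by Pearson's r and rank by |r|, descending."""
        scores = []
        for i, j in combinations(range(data.shape[1]), 2):
            r = np.corrcoef(data[:, i], data[:, j])[0, 1]
            scores.append(((i, j), r))
        return sorted(scores, key=lambda s: abs(s[1]), reverse=True)

    def entropy_score(x, y, k=10):
        """Uniformity criterion: entropy of a k-by-k binning of the scatterplot."""
        counts, _, _ = np.histogram2d(x, y, bins=k)
        p = counts.ravel() / counts.sum()
        p = p[p > 0]                       # empty cells contribute 0 * log 0 = 0
        return -(p * np.log2(p)).sum()

    rng = np.random.default_rng(0)
    data = rng.normal(size=(500, 4))
    data[:, 1] = 0.8 * data[:, 0] + 0.2 * data[:, 1]   # inject a correlated pair
    print(correlation_scores(data)[0])                 # pair (0, 1) ranks first
    print(entropy_score(data[:, 0], data[:, 1]))

The real framework adds the remaining criteria (regression error, LOF outlier counts, region-of-interest counts) as further scoring functions over the same pair enumeration.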
3.1.2 Rolling Dice Framework

Rolling Dice is another framework which utilizes a SPLOM to visualize multidimensional data [15]. In this framework, transitions from one scatterplot to another are performed as animated rotations in 3D space, similar to a rolling die. The Rolling Dice framework also suggests a visual querying technique so that a user can refine a query by exploring how the same query would look in any scatterplot.

Figure 5: Scatterplot matrix navigation for a digital camera dataset [15]. The main interface proposed by the Rolling Dice framework consists of a scatterplot matrix component (A), a scatterplot component (B) and a query layer component (C).

The interface proposed by the framework mainly consists of three components: the scatterplot component (Figure 5B), the scatterplot matrix component (Figure 5A) and the query layer component (Figure 5C). The scatterplot component shows the currently viewed cell of the scatterplot matrix with the names and labels of the two displayed axes. The scatterplot matrix component can be used both as an overview and as a navigational tool. Navigation in the scatterplot matrix is restricted to orthogonal movement along the same row or column of the matrix, so that one dimension of the focused scatterplot is always preserved while the other changes. The change is visualized using a 3D rotation animation, which gives a semantic meaning to the movement of the points, allowing the human mind to interpret the motion as shape [16].
The transition between scatterplots is performed as a three-stage animation: extrusion into 3D, rotation, and projection back into 2D. More specifically, given the two currently visualized dimensions x and y and a vertical transition to a new dimension y', the animation follows the steps below (also depicted in Figure 6).

Figure 6: Stage-by-stage overview of the scatterplot animated transition: extrusion (A, B), rotation (C), projection (D, E)

● Extrusion: The scatterplot visualizing the x and y axes is extruded into 3D, where y' becomes the new depth coordinate of each data point. At the end of this step the 2D scatterplot has become 3D (Figures 6A and 6B).
● Rotation: The scatterplot is rotated 90 degrees up or down, causing the axis previously along the depth dimension to become the new vertical axis (Figure 6C).
● Projection: The 3D plot is projected back into 2D, with x and y' as the new horizontal and vertical axes (Figures 6D and 6E).

Further, the Rolling Dice framework suggests a method called query sculpting, which allows selecting data items in the main scatterplot visualization using 2D bounding shapes (convex hulls) and iteratively refining that selection from other viewpoints while navigating the scatterplot matrix. As shown in Figure 5C, the query layer component is used for selecting, naming and clearing color-coded queries during visual exploration. Clicking and dragging one query onto another performs a union or intersection operation (by dragging with the left or right mouse button respectively). Each query layer also provides a visual indication of the percentage of items currently selected by it.
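One plausible way to compute the animation frames is sketched below; this is only an illustration of the extrude-rotate-project idea under an orthographic camera, not the paper's actual rendering code.

    # Sketch of the rolling-dice transition: extrude y' into depth, rotate the
    # cloud 90 degrees about the horizontal axis, project back to the screen.
    import numpy as np

    def transition_frames(x, y, y_new, steps=30):
        """Yield intermediate 2D positions while rotating from (x, y) to (x, y_new)."""
        pts = np.stack([x, y, y_new], axis=1)         # extrusion: y' becomes depth
        for t in np.linspace(0.0, np.pi / 2, steps):  # rotate 90 degrees in total
            c, s = np.cos(t), np.sin(t)
            rotated_y = c * pts[:, 1] + s * pts[:, 2] # orthographic screen height
            yield np.stack([pts[:, 0], rotated_y], axis=1)

At t = 0 the frame shows the original (x, y) plot and at t = π/2 it shows (x, y'), so the intermediate frames give exactly the "rolling" motion described above.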
3.1.3 Shortcomings of the Scatterplot Matrix (SPLOM)

To discuss the shortcomings of the SPLOM, let us consider a fictitious "nuts-and-bolts" dataset. This dataset, shown in Table 1, involves 3 (independent) categorical variables: Region (North, Central, and South), Month (January, February, ...), and Product (Nuts or Bolts). It also contains 3 (dependent) continuous variables: Sales, Equipment costs, and Labor costs.

    Region | Month | Product | Sales | Equipment costs | Labor costs
    North  | Jan   | Nuts    | 2.78  | 0.92            | 4.30
    North  | Feb   | Nuts    | 4.92  | 1.64            | 4.30
    ...    | ...   | ...     | ...   | ...             | ...
    South  | Dec   | Bolts   | 9.50  | 2.44            | 5.20

Table 1: The "nuts-and-bolts" dataset

Figure 7 shows the SPLOM for the nuts-and-bolts dataset. The top three scatterplots (e.g. Month vs Region) each show a crossing of two categorical variables, resulting in an uninformative grid of points. Further, the scatterplots showing continuous vs categorical variables suffer from overplotting (e.g. Sales vs Product).

Figure 7: Scatterplot matrix for the "nuts-and-bolts" dataset
To overcome this issue, the Generalized Plot Matrix (GPLOM) [17] has been proposed. The GPLOM uses heatmaps to visualize pairs of categorical variables, bar charts to visualize continuous vs categorical variables, and scatterplots to visualize pairs of continuous variables. It is important to note that in this scenario the scatterplots show individual tuples, whereas the bar charts and heatmaps show aggregated data. Figure 8 shows the GPLOM for the "nuts-and-bolts" dataset. Even though the GPLOM is a better choice than the SPLOM for visualizing a combination of continuous and categorical variables, it uses three types of charts and therefore loses the consistency of the matrix.

Figure 8: Generalized Plot Matrix for the "nuts-and-bolts" dataset

3.2 Parallel Coordinates

Parallel coordinates, introduced by Inselberg and Dimsdale [11, 19], is a popular technique for transforming multidimensional data into a 2D image. The m-dimensional data items are represented as lines crossing m parallel axes, each axis corresponding to one dimension of the original data.
Fundamentally, parallel coordinates differ from all other visualization methodologies in that they yield a graphical representation of multidimensional relations rather than just visualizing a finite set of points [19]. Figure 9 displays a parallel coordinate plot with 8 variables, using a dataset [58] which contains information about cars, such as economy (mpg), cylinders and displacement (cc), for a selected sample of cars manufactured between 1970 and 1982.

Figure 9: Parallel coordinate plot with 8 variables for 250 cars

3.2.1 Definition and Representation

On the plane with xy-Cartesian coordinates, starting on the y-axis, N copies of the real line, labeled x1, x2, x3, ..., xN, are placed equidistant and perpendicular to the x-axis. They are the axes of the parallel coordinate system for Euclidean N-dimensional space R^N, all having the same positive orientation as the y-axis [11].

Figure 10: Parallel Coordinate plot for a point
Figure 10 shows how a point C with coordinates (c1, c2, c3, ..., cN) is represented by a polygonal line. In the same way, m data points can be represented by m polygonal lines.

A point in 2D Cartesian space is represented by a single line in parallel coordinates. Extending this, a line in 2D Cartesian space is represented in parallel coordinates by selecting a set of collinear points on the line and representing each of those points in the parallel coordinates visualization. The line segments that represent those points all intersect at a single point. For a line l: y = mx + b, if the distance between the axes is d, this intersection point l̄ is

    l̄ = ( d / (1 − m), b / (1 − m) )

For lines with negative slope (m < 0) the intersection point lies between the axes, as in Figure 11.

Figure 11: Parallel Coordinate plot for points in a line with m < 0

For m > 1 the intersection point lies left of the X1 axis, while for lines with 0 < m < 1 it lies right of the X2 axis, as in Figure 12.

Figure 12: Parallel Coordinate plot for points in a line with 0 < m < 1
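The duality is easy to verify numerically. The short sketch below (an illustration, not from the cited papers) represents each point (x, y) as the segment from height x on the first axis to height y on the second axis and checks that all segments pass through the predicted dual point.

    # Numerical check of the point-line duality for y = m*x + b, axis spacing d.
    import numpy as np

    m, b, d = -0.5, 2.0, 1.0
    xs = np.array([0.0, 1.0, 2.0])            # three collinear points
    ys = m * xs + b
    # At horizontal position t, the segment from (0, x) to (d, y) has height
    # x + (y - x) * t / d.
    t = d / (1 - m)                           # predicted horizontal crossing
    heights = xs + (ys - xs) * t / d
    print(np.allclose(heights, b / (1 - m)))  # True: all segments meet at l-bar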
The above property can be considered one of the main advantages of parallel coordinates. Parallel coordinates representations can also provide statistical interpretations of the data. In the statistical setting, the following interpretations can be made: for highly negatively correlated pairs, the dual line segments in parallel coordinates tend to cross near a single point between the two parallel axes, while parallel or almost parallel lines between axes indicate positive correlation between variables [20, 21]. For example, we can see that there is a highly negative correlation between weight and year in Figure 13.

Figure 13: Negative correlation between Car Weight and the Year

Over the years parallel coordinates have been enhanced by many researchers, who have improved the technique for better data investigation and for easier, user-friendly interaction by adding brushing, data clustering, real-time reordering of coordinate axes, etc.

3.2.3 Brushing

Brushing is considered to be a very effective technique for specifying an explicit focus during information visualization [22]. The user actively marks subsets of the data set as being especially interesting, and the points contained by the brush are colored differently from other points to make them stand out [23]. For example, if the user is interested in cars having 6 cylinders, he can use brushing as depicted in Figure 14.
Figure 14: Using brushing to filter cars with 6 cylinders

The introduction of composite brushes [23] allows users to define their focus more specifically. Composite brushes are combinations of single brushes whose result is the conjunction of those single brushes. For example, if the user is interested in cars having 6 cylinders that were produced in '76, he can use composite brushing as depicted in Figure 15.

Figure 15: Using composite brushing to filter cars with 6 cylinders made in '76

The brushing techniques we have seen so far use a discrete distinction between focus and context; with them we cannot see how similar other data points are to the focused data points. The solution that has been brought forward for this is called smooth brushing [22], where a multi-valued or even continuous transition is allowed, which inherently captures the similarity between data points in focus and their context. This corresponds to a degree-of-interest (DOI) function which non-binarily maps into the [0, 1] range. Often such a non-binary DOI function is defined by means of spatial distances, i.e., the DOI value reflects the distance of a data point from a so-called center of interest.

Figure 16: An example of smooth brushing: note the gradual changes of drawing intensity, which reflect the respective degree of interest, after smooth brushing of the 2nd axis.
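A minimal sketch of such a distance-based DOI function is shown below (an illustration of the idea, assuming a linear fall-off; the papers cited above leave the exact shape of the function open).

    # Smooth brushing DOI: interest decays from 1 at the center-of-interest to 0
    # at the brush radius, instead of the usual binary in/out classification.
    import numpy as np

    def doi(values, center, radius):
        """Map each value on a brushed axis into [0, 1] by distance from center."""
        return np.clip(1.0 - np.abs(values - center) / radius, 0.0, 1.0)

    cylinders = np.array([4, 5, 6, 7, 8])
    print(doi(cylinders, center=6, radius=2))   # [0.  0.5 1.  0.5 0. ]

The resulting DOI values can then drive the per-line opacity or color intensity, producing exactly the gradual focus-context transition visible in Figure 16.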
Standard brushing primarily acts along the axes, but a technique called angular brushing enables the space between the axes for brushing [22]. The user can interactively specify a subset of slopes, which marks as part of the current focus all data points that exhibit the matching correlation between the brushed axes. For example, if the user is interested only in data that has a negative correlation between horsepower and acceleration, he can use angular brushing as shown in Figure 17.

Figure 17: Angular brushing - reading between the lines: whereas most line segments go up between the 2nd and the 3rd axis (visualizing a positive correlation of values there), just a few go down; those have been emphasized through angular brushing.
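The selection rule itself is simple, as the following sketch shows (assuming axis values normalized to [0, 1] and unit spacing between axes; the variable names are made up).

    # Angular brushing between two adjacent axes: keep records whose segment
    # slope falls inside the brushed slope range, e.g. all downward lines.
    import numpy as np

    def angular_brush(a, b, slope_min, slope_max):
        """Boolean mask for lines between axes a and b (unit axis spacing)."""
        slope = b - a                        # rise over a run of 1 between axes
        return (slope >= slope_min) & (slope <= slope_max)

    horsepower = np.array([0.9, 0.4, 0.2, 0.7])    # normalized to [0, 1]
    acceleration = np.array([0.2, 0.6, 0.9, 0.3])
    mask = angular_brush(horsepower, acceleration, -np.inf, 0.0)
    print(mask)   # [ True False False  True ] -- the negatively sloped records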
3.2.4 Axis Reordering

One strength of parallel coordinates, as described in section 3.2.1, is their effectiveness at visualizing relations between coordinate axes. By bringing axes next to each other in an interactive way, the user can investigate how values are related to each other with special respect to two of the data dimensions. The order of the axes clearly affects the patterns revealed by a parallel coordinate plot. Figure 18 shows 3 ways out of the N! (N = 8 in this case) ways of ordering the axes, but only plot C in Figure 18 is capable of showing that there is a highly negative correlation between weight and economy. Many researchers address this problem using some measure to score an ordering of the axes, while others discuss how to visualize multiple orderings in a single display [24]. Several approaches based on the combination of the Nonlinear Correlation Coefficient and the Singular Value Decomposition algorithm [25] have been suggested.
By using these approaches, the first remarkable axis can be selected on a mathematical basis, and all axes are re-ordered in line with the degree of similarity among them [25].

Figure 18: Multiple ways of ordering N axes in parallel coordinates: (A): the default order of the axes. (B): axes re-ordered to see the correlation between Year and Power - a highly negative correlation is observed. (C): axes re-ordered to see the correlation between Weight and Economy - a highly negative correlation is observed.

3.2.5 Data Clustering

Parallel coordinates are a good technique for showing clusters in a data set, and researchers have used many techniques to display them.
Coloring is one method that has been used to show clusters in parallel coordinates [26]: different colors are assigned to different clusters. Figure 19 shows two explicitly given clusters represented with two different colors.

Figure 19: Two clusters represented in parallel coordinates with two different colors (red and blue)

Figure 20 shows the same cluster visualization technique with many more clusters, for a data set taken from the USDA National Nutrient Database.

Figure 20: Multiple clusters visualized in parallel coordinates in different colors

Variable-length opacity bands [26] are another technique for showing clusters in parallel coordinates. Figure 21 shows a graduated band, faded from a dense middle to transparent edges, that visually encodes the information of one cluster. The mean stretches across the middle of the band and is encoded with the deepest opacity. This allows the user to differentiate sparse, broad clusters from narrow, dense ones. The top and bottom edges of the band have full transparency, and the opacity across the rest of the band is linearly interpolated. The thickness of the band at each axis represents the extent of the cluster in that dimension.
Figure 21: Variable-length opacity bands representing a cluster in parallel coordinates

Curve bundling [27] is also used to visualize clusters in parallel coordinates. Bundled curve plots extend the traditional polyline plots and are designed to reveal the structure of clusters previously identified in the input data. Given a data point (P1, P2, ..., PN), its corresponding polyline is replaced by a piecewise cubic Bezier curve preserving the following properties (denote the main axes by X1, X2, X3, ..., XN to avoid confusion between them and the added axes):

● The curve interpolates P1, P2, ..., PN at the main axes.
● Curves corresponding to data points that belong to the same cluster are bundled between adjacent main axes. This is accomplished by inserting a virtual axis midway between the main axes and by appropriately positioning the Bezier control points along the virtual axis. To support curve bundling, control points that define curves within the same cluster are attracted toward a cluster centroid along the virtual axis.

Figure 22 compares a polyline plot with its counterpart using bundled curves. Polylines require color coding to distinguish clusters, whereas curve bundles rely on geometric proximity to naturally represent cluster information. The cluttered visualization of color-coded polylines, which is the standard approach to cluster-membership visualization, motivates the new geometry-based method.

Figure 22: Parallel-coordinates plot (A) using polylines with color coding to show clusters, and (B) using bundled curves
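The control-point placement described above can be sketched in a few lines (an illustration under the stated assumptions; the bundling-strength parameter beta is our own name, and the cited papers [27, 28, 29] propose more refined placement rules).

    # Cluster bundling between two adjacent axes: one control point per line on
    # a virtual mid-axis, pulled toward its cluster centroid by strength beta
    # (beta = 0 reproduces the ordinary straight-line plot).
    import numpy as np

    def bundle_control_points(y_left, y_right, labels, beta=0.7):
        """Control points on the virtual axis midway between two main axes."""
        mid = 0.5 * (y_left + y_right)
        ctrl = np.empty_like(mid)
        for c in np.unique(labels):
            members = labels == c
            centroid = mid[members].mean()       # cluster centroid on mid-axis
            ctrl[members] = (1 - beta) * mid[members] + beta * centroid
        return ctrl   # use as the inner Bezier control, endpoints on the axes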
Bundling violates the point-line duality discussed in section 3.2.1, but it can be used to visualize clusters using geometry only, leaving the color channel free for other uses such as the statistical coloring described in section 3.2.6. Many algorithms have been proposed for adjusting the shape of the Bezier curves [27, 28, 29].

3.2.6 Statistical Coloring

Coloring the polygonal lines can be used to display statistical properties of an axis. A popular color scheme is to color by the z-score of a chosen dimension, so that the data distribution of that dimension becomes visible. Figure 23 shows how z-score coloring has been used on the weight dimension of the cars data set.

Figure 23: Statistically colored parallel coordinates plot on the weight of cars - cars with a high weight are blue while low-weight vehicles are colored red.
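A small sketch of the coloring step (an illustration with made-up weights; the reversed coolwarm colormap is chosen only to match the blue-for-heavy convention of Figure 23):

    # Z-score coloring for one axis: standardize the chosen dimension and map
    # it through a diverging colormap, one RGBA color per record.
    import numpy as np
    import matplotlib.pyplot as plt

    weights = np.random.default_rng(1).normal(3000, 600, size=250)
    z = (weights - weights.mean()) / weights.std()
    colors = plt.cm.coolwarm_r((z - z.min()) / (z.max() - z.min()))
    # Draw each polyline with its color so heavy (blue) cars can be traced
    # across all axes, not just on the weight axis.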
3.2.7 Scaling

The scaling of the axes is also an interesting property of parallel coordinates. The default is to plot all values over the full range of each axis, between the minimum and the maximum of the variable. Several other scaling methods have been suggested by researchers [21]; a common one is to use a common scale over all axes. Figure 24 shows the difference between the scaling methods, using the individual stage times of the 155 cyclists who finished the 2005 Tour de France bicycle race. Figure 24A is plotted with the default scaling and Figure 24B is plotted using a common scale over all axes. Neither Figure 24A nor Figure 24B is capable of revealing correlations between axes, even though Figure 24B shows the outliers clearly; the spread between the first and the last cyclist is almost invisible for most of the stages. In Figure 24C a common scale is used for all stages, but each stage is aligned at its median value. It is the user's experience, domain knowledge and use case that define the scale and alignment of a parallel coordinates plot [21].

Figure 24: Three scaling options for visualizing the stage times in the Tour de France 2005: (A): all stages are scaled individually between the minimum and maximum value of the stage (the usual default for parallel coordinate plots). (B): a common scale is used, i.e., the minimum/maximum time over all stages is used as the global minimum/maximum for all axes. (C): a common scale for all stages, but each stage is aligned at the median value of that stage.
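The three options reduce to three small normalization functions, sketched below (assuming a riders-by-stages array of times; this is an illustration, not code from [21]).

    # The three axis-scaling options for a (riders x stages) array of times.
    import numpy as np

    def scale_individual(stages):
        """(A) Per-axis min-max: each stage fills its full axis."""
        lo, hi = stages.min(axis=0), stages.max(axis=0)
        return (stages - lo) / (hi - lo)

    def scale_common(stages):
        """(B) One global min-max shared by all axes."""
        return (stages - stages.min()) / (stages.max() - stages.min())

    def scale_median_aligned(stages):
        """(C) Common scale, each axis centered on its stage median (0)."""
        span = stages.max() - stages.min()
        return (stages - np.median(stages, axis=0)) / span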
3.2.8 Limitations

Even though parallel coordinates are a great tool for visualizing high dimensional data, they soon reach their limits. With very large datasets, parallel coordinates have some identified weaknesses:

1. The cross-over problem - The zigzagging polygonal lines used for data representation are not continuous. They generally lose visual continuation across the parallel-coordinate axes, making it difficult to follow lines that share a common point along an axis.
2. When two or more data points have the same or similar values for a subset of the attributes, the corresponding polylines may overlap and clutter the visualization.

Figure 25 depicts these two problems with a parallel coordinate plot drawn for 8000 data points.

Figure 25: Parallel coordinates plot for a data set with 8000 rows (food information taken from the USDA National Nutrient Database)

Given a very large data set, these two problems make it hard to draw conclusions about the correlations between axes, and brushing will not give a clear idea about the data either. One solution to both problems is to use α-blending [21]. When α-blending is used, each polyline is plotted with only α percent opacity. With smaller α values, areas of high line density are more visible and hence are better contrasted against areas with a small density.
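In matplotlib, for example, α-blending is a one-line change, as this minimal sketch with random stand-in data shows:

    # Alpha-blending a dense parallel-coordinates plot: with 8000 rows, an
    # alpha around 0.01-0.1 lets the dense bands emerge from the clutter.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    data = rng.random((8000, 5))          # stand-in for normalized columns
    fig, ax = plt.subplots()
    ax.plot(data.T, color="steelblue", alpha=0.02, linewidth=0.5)
    ax.set_xticks(range(5))               # one tick per parallel axis
    plt.show()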
The data in Figure 26 are real data from Forina et al. [32] on the fatty acid content of Italian olive oil samples from nine regions. Figures 26A, B and C show the same plot of all eight fatty acids with α values of 0.5, 0.1, and 0.01 respectively. Depending on the amount of α-blending applied, the group structure of some of the nine regions is more or less visible [21]. It is hard to decide on a value for α a priori; the user must adjust it until the graph gives enough insight.

Figure 26: Parallel coordinates for the "Olive Oils" data with different alpha values: α = 0.5 (A), α = 0.1 (B), and α = 0.01 (C)
The clustering and statistical coloring techniques covered in sections 3.2.5 and 3.2.6 also reduce these weaknesses of parallel coordinates.

Figure 27: Parallel coordinates visualization with z-score coloring based on the amount of water - foods with a high water percentage are blue while foods with a lower water percentage are red

As Figure 27 shows, the point-line duality is preserved better when statistical coloring is used.

Data preprocessing techniques can also be used to overcome the limitations of parallel coordinates: data selection and data aggregation. Data selection means that a display does not represent the dataset as a whole but only a portion of it, selected in a certain way [30]. The display is supplied with interactive controls for changing the current selection, which results in showing another portion of the data [30]. Figure 28 shows how displaying a portion of the data helps to overcome the weaknesses of parallel coordinates: Figure 28A displays only the food group of sausages and luncheon meats, while Figures 28B and 28C display the food groups of beef products and of spices and herbs respectively, which gives a better visualization than plotting the whole data set.

Data aggregation reduces the amount of data under visualization by grouping individual items into subsets, often called 'aggregates', for which some collective characteristics can be computed. The aggregates and their characteristics (jointly called 'aggregated data') are then explored instead of the original data. In parallel coordinates, for example, a whole cluster can be drawn as a single polygonal line, which reduces the limitations mentioned at the beginning of this section.
Figure 28: Parallel coordinates drawn on the same data set using data selection: (A): the food group of sausages and luncheon meats. (B): the food group of beef products. (C): the food group of spices and herbs.

Parallel coordinates might be the plot least affected by the curse of dimensionality, since they can represent as many dimensions as the screen width permits. But a limitation still arises for high dimensional data, because the distance d between two adjacent axes decreases as the number of dimensions increases; as a result, the correlations between axes might not be clear in the plot. Most applications assume it is up to the user to decide which attributes should be kept in, or removed from, a visualization. This is not a good approach for a user who lacks domain knowledge; instead, parallel coordinates themselves can be used to reduce the dimensions of the data set [31].
As discussed for axis reordering in section 3.2.4, a measure of axis similarity can be computed. Once the most similar axes are identified through that algorithm, the application can suggest that the user remove them and keep one significant axis in place of all the identified similar axes [31]. In that way redundant attributes can be removed from the visualization, and the space can be used efficiently to represent the remaining attributes.

Parallel coordinates are a good technique for visualizing data. They support many user interactions and data analytic techniques, and even though they have limits, researchers have found many ways to overcome those limitations. Parallel coordinates remain a hot topic in data visualization research.

3.3 Radviz

The Radviz (Radial Visualization) method [33] maps a set of n-dimensional data points onto a two-dimensional space. All dimensions are represented by a set of equally spaced anchor points on the circumference of a circle. For each data instance, imagine a set of springs that connects the data point to the anchor point of each dimension. The spring constant of the spring connected to the i-th anchor corresponds to the value of the i-th dimension of the data instance. Each data point is then displayed at the position where the sum of all spring forces equals 0. All data point values are usually normalized to values between 0 and 1. Consider the example in Figure 29A: this data has 8 dimensions {d1, d2, ..., d8}, and each data point is connected as shown in the diagram using springs. Following this procedure for all the records in the dataset leads to the Radviz display. Figure 29B shows a Radviz representation of a dataset on transitional cell carcinoma (TCC) of the bladder generated by the Clifford Lab at LSUHSC-S [34].

One major disadvantage of this method is the overlap of points. Consider the following two points in a 4-dimensional data space: (1, 1, 1, 1) and (10, 10, 10, 10). These two data records will overlap in a Radviz display even though they are clearly different, because the dimensions pull both of them equally.

Figure 29: Radviz visualization for multi-dimensional data. (A): the set of springs and the forces exerted by those springs on a single data point. (B): a Radviz representation of a dataset on transitional cell carcinoma
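The spring equilibrium has a closed form: each record lands at the weighted average of the anchor positions, with weights proportional to its dimension values. A minimal sketch (an illustration of the standard Radviz formula, not code from [33]):

    # Radviz mapping: each record settles where the spring forces cancel, i.e.
    # at the value-weighted average of the dimension anchors on the unit circle.
    import numpy as np

    def radviz(data):
        """Project an (n_records x n_dims) array of nonnegative values to 2D."""
        n_dims = data.shape[1]
        theta = 2 * np.pi * np.arange(n_dims) / n_dims
        anchors = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # (n_dims, 2)
        weights = data / data.sum(axis=1, keepdims=True)            # spring constants
        return weights @ anchors                                    # equilibrium points

    print(radviz(np.array([[1, 1, 1, 1], [10, 10, 10, 10.0]])))
    # Both rows land at (almost exactly) the center -- the overlap problem above.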
Categorical dimensions cannot be visualized directly with Radviz and require additional preprocessing: each categorical dimension needs to be flattened to create a new dimension for each possible category. This becomes problematic as the number of possible categories increases and may lead to poor visualizations. Another challenge in generating good visualizations with this method is identifying a good ordering of the anchor points that correspond to the dimensions; a good ordering makes it easy to identify patterns in the data. An interactive approach that allows changing the positions of the anchor points can help users overcome this issue.

3.4 Mosaic Plots

Mosaic plots [35, 36] are a popular method of visualizing categorical data. They provide a way of visualizing the counts in a multivariate n-way contingency table. The frequencies in the contingency table are represented by a group of rectangles whose areas are proportional to the frequency of each cell in the table. A mosaic plot starts as a rectangle; then, at each stage of plot creation, the rectangles are split parallel to one of the two axes according to the proportions of data belonging to each category.
An example of a mosaic plot is shown in Figure 30. It shows a mosaic plot for the Titanic dataset, which describes the attributes of the passengers on the Titanic and details of their survival.

Figure 30: Mosaic plot for the Titanic data showing the distribution of passengers' survival based on their class and sex

The process of creating a mosaic display can be described as follows [37]. Assume we want to construct a mosaic plot for p categorical variables X1, ..., Xp, and let ci be the number of categories of variable Xi, i = 1, ..., p.

1. Start with one single rectangle r (of width w and height h), and let i = 1.
2. Cut rectangle ri−1 into ci pieces: find all observations corresponding to rectangle ri−1, and find the breakdown for variable Xi (i.e., count the number of observations that fall into each of its categories). Split the width (height) of rectangle ri−1 into ci pieces, where the widths (heights) are proportional to the breakdown, and keep the height (width) of each piece the same as ri−1. Call these new rectangles ri,j, with j = 1, ..., ci.
3. Increase i by 1.
4. While i <= p, repeat steps 2 and 3 for all ri−1,j with j = 1, ..., ci−1.
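The recursion maps directly to code. Below is a compact sketch of the splitting scheme (an illustration of the algorithm above with alternating split directions and a toy made-up data frame, not the implementation from [37]):

    # Recursive mosaic construction: split a rectangle along alternating axes
    # in proportion to the category shares at each level of the hierarchy.
    import pandas as pd

    def mosaic(df, variables, rect=(0.0, 0.0, 1.0, 1.0), depth=0):
        """Yield (rectangle, category-path) leaves for categorical columns."""
        if not variables:
            yield rect, ()
            return
        x, y, w, h = rect
        shares = df[variables[0]].value_counts(normalize=True)
        offset = 0.0
        for category, share in shares.items():
            if depth % 2 == 0:                 # horizontal split, then vertical
                sub = (x + offset * w, y, share * w, h)
            else:
                sub = (x, y + offset * h, w, share * h)
            part = df[df[variables[0]] == category]
            for leaf, path in mosaic(part, variables[1:], sub, depth + 1):
                yield leaf, (category,) + path
            offset += share

    toy = pd.DataFrame({"class": ["1st", "3rd", "3rd", "2nd"],
                        "sex": ["F", "M", "M", "F"]})
    for rect, path in mosaic(toy, ["class", "sex"]):
        print(path, rect)                      # one rectangle per table cell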
In standard mosaic plots the rectangle is divided both horizontally and vertically. A variation of mosaic plots that only divides the rectangle horizontally has been proposed, called double decker plots [38]; these can be used to visualize association rules. An example of a double decker plot is shown in Figure 31 for the same data as in Figure 30. There are other variations of mosaic plots, such as fluctuation diagrams, that try to increase their usability.

Figure 31: Double decker plot for the Titanic data showing the distribution of passengers' survival based on their class and sex

Mosaic plots are an interesting visualization technique for categorical data, but they cannot handle continuous data. To display continuous data in a mosaic plot, the data first needs to be made categorical through a process such as binning. Mosaic plots require the visual comparison of rectangles and their sizes to understand the data, which becomes complicated as the number of rectangles grows and the distance between two rectangles increases; this makes them harder to interpret and understand. Vastly different aspect ratios of the rectangles further compound the difficulty of comparing their sizes. Another issue with mosaic plots is that they become more complex as the number of dimensions in the data increases: each additional dimension requires the rectangles to be split again, which at least doubles the possible number of rectangles, leading to a final visualization that is not very user friendly.

3.5 Self Organizing Maps

Self-organizing maps (SOM) [39] are a type of neural network that has been used widely in data exploration and visualization, among many other uses. SOMs use an unsupervised learning algorithm to perform a topology-preserving mapping from a high dimensional data space to a lower dimensional map (usually a two dimensional lattice).
The mapping preserves the topology of the high dimensional data space, such that data points lying near each other in the original multidimensional space map to nearby units in the output space.

Generating a self-organizing map consists of training a set of neurons with the dataset. At each step of the training, an input data item is matched against the neurons, and the closest one is chosen as the winner. Then the weights of the winner and of its neighborhood are updated to reinforce this behavior. The final result is a topology-preserving ordering where similar new data entries match neurons near to each other.

Figure 32: Training a self-organizing map. For each data item, the closest neuron is selected using some distance metric.

An example of a self-organizing map is shown in Figure 33. This shows a self-organizing map trained on the poverty levels of countries [40]. As can be seen clearly, countries with similar poverty levels are matched to neurons close to each other: the USA, Canada and other countries with lower poverty are together in the yellow and green areas, while countries such as Afghanistan and Mali, which have high poverty levels, are grouped together in the purple areas. This demonstrates the topology-preserving aspect of SOMs.

Figure 33: A self-organizing map trained on the poverty levels of countries
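The winner-plus-neighborhood update described above fits in a few lines. A bare-bones sketch follows (an illustration with a Gaussian neighborhood and fixed learning rate; real implementations add decay schedules and better initialization):

    # One SOM training pass on a 2D lattice of neurons.
    import numpy as np

    rng = np.random.default_rng(3)
    grid_h, grid_w, dim = 10, 10, 4
    weights = rng.random((grid_h, grid_w, dim))        # neuron codebook vectors
    rows, cols = np.indices((grid_h, grid_w))

    def train_step(x, lr=0.5, radius=2.0):
        dist = np.linalg.norm(weights - x, axis=2)     # distance to every neuron
        wi, wj = np.unravel_index(dist.argmin(), dist.shape)   # the winner
        grid_dist2 = (rows - wi) ** 2 + (cols - wj) ** 2
        influence = np.exp(-grid_dist2 / (2 * radius ** 2))    # neighborhood
        weights[...] += lr * influence[..., None] * (x - weights)

    for x in rng.random((1000, dim)):                  # one epoch of updates
        train_step(x)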
There are some challenges with using self-organizing maps for multidimensional data visualization:

1. SOMs are not unique. The same data can lead to widely different outcomes depending on the initialization of the SOM, so the same data may yield different visualizations and lead to confusion.
2. While similar data points are grouped together in a SOM, similar groups are not guaranteed to be close to each other. A SOM may place similar groups in multiple places on the map.
3. SOMs are not very user friendly compared with other visualization techniques; it is not easy to look at a SOM and interpret the data.
4. The process of creating a SOM is computationally expensive, and the computational requirements grow as the dimensionality of the data increases. For modern data sources that are highly complex and detailed, this becomes a major drawback.

3.6 Sunburst Visualization

The Sunburst technique, like the Tree Map [44], is a space-filling visualization, but it uses a radial rather than a rectangular layout to visualize hierarchical information [43]. It is comparable to nested pie charts, and it can be used to show hierarchical information such as the elements of a decision tree. This compact visualization avoids the problem of decision trees getting too wide to fit the display area; it is akin to visualizing the tree in a top-down manner, with the center representing the root of the decision tree and the ring around it its children. In Sunburst, the top of the hierarchy is at the center and deeper levels lie farther away from the center. The angle swept out by an item and its color correspond to some attributes of the data. For instance, in a visualization of a file system, the angle may correspond to the file/directory size and the color may correspond to the file type. An example Sunburst display is shown in Figure 34. This visualization has been used to summarize user navigation paths through a website [41], and it has also been used to visualize frequent item sets [42].

Figure 34: A sunburst visualization summarizing user paths through a fictional e-commerce site. The inner ring represents the first event in the visit (showing here, for example, that most visits start on the homepage and approximately one-third start on a product page). The outer rings represent the subsequent events.
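The layout rule behind Figure 34 is simply that each node sweeps an angle proportional to its count and children subdivide their parent's angular extent. A small sketch (the node structure and the toy visit counts are made up):

    # Sunburst layout: assign each node (label, count, children) an angular
    # extent proportional to its count, within its parent's extent.
    def sunburst_angles(node, start=0.0, extent=360.0, depth=0, out=None):
        """Return (label, depth, start_angle, extent) rows for every node."""
        out = [] if out is None else out
        label, count, children = node
        out.append((label, depth, start, extent))
        total = sum(c[1] for c in children) or 1
        angle = start
        for child in children:
            share = extent * child[1] / total
            sunburst_angles(child, angle, share, depth + 1, out)
            angle += share
        return out

    visits = ("home", 100, [("product", 60, []), ("search", 30, []), ("exit", 10, [])])
    for row in sunburst_angles(visits):
        print(row)   # depth picks the ring, start/extent pick the wedge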
3.7 Trellis Visualization

A trellis chart, also known as small multiples [45], panel chart, lattice chart or grid chart, is a layout of smaller charts in a grid with consistent scales. Each smaller chart represents an item in a category, called a "condition" [48]; the data displayed on each smaller chart is conditional on the items in that category. Trellis charts are useful for finding structure and patterns in complex data. The grid layout looks similar to a garden trellis, hence the name trellis chart.
Figure 35: Trellis chart for a data set on sales

The main aspects of a trellis display are columns, rows, panels and pages [46]. Figure 35 consists of 4 columns, 1 row, 4 panels and 1 page. Trellised visualizations enable the user to quickly recognize similarities or differences between different categories in the data. Each individual panel in a trellis visualization displays a subset of the original data table, where the subsets are defined by the categories available in a column or hierarchy. To make the plots comparable across rows and columns, the same scales are used in all the panels [47]. The benefits of trellis charts are:

● They are easy to understand. A trellis chart is a basic chart type repeated many times; if you understand the basic chart type, you can understand the whole trellis chart.
● Having many small charts enables you to view complex multi-dimensional data in a flat 2D layout, avoiding the need for confusing 3D charts.
● The grid layout combined with consistent scales makes data comparison simple: just look up/down or across the charts.

Figure 36 contains a trellis chart for the Minnesota barley data from The Design of Experiments [59] by R.A. Fisher. The trial involved planting 10 varieties of barley in 6 different sites over two different years, and the researchers measured the yield in bushels per acre for each of the 120 possibilities.

Figure 36: Minnesota barley data trellis chart
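A trellis display of this kind can be sketched with pandas and matplotlib; the file name and column names below ("barley.csv", "site", "variety", "yield") are hypothetical stand-ins for however the barley data is stored.

    # One panel per condition (site), with shared scales across all panels.
    import pandas as pd
    import matplotlib.pyplot as plt

    barley = pd.read_csv("barley.csv")     # hypothetical: yield, variety, site
    sites = sorted(barley["site"].unique())
    fig, axes = plt.subplots(1, len(sites), sharex=True, sharey=True,
                             figsize=(14, 3))
    for ax, site in zip(axes, sites):      # shared axes keep panels comparable
        panel = barley[barley["site"] == site]
        ax.scatter(panel["yield"], panel["variety"], s=10)
        ax.set_title(site)
    plt.show()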
3.8 Grand Tour

The grand tour is one of the tour methods used to find structure in multidimensional data. This method can be applied to show multidimensional data on a 2D computer display. A tour is a subset of all the possible projections of multidimensional data; the different tour methods combine several static projections, using different interpolation techniques, into a movie, which is called a tour [50].

3.8.1 Tours

In a static projection some of the information in the dataset is lost to the user. But if several projections onto different planes are shown to the user step by step, the user can get an overview of the structure of the multivariate data.
Tours provide a general approach to choosing and viewing data projections, allowing the viewer to mentally connect disparate views and thus supporting the exploration of a high-dimensional space.

Figure 37: (A): The scatterplot shows a multidimensional data set (census data [49]). The data is mapped to coordinates in a multidimensional space. (B): A snapshot of the grand tour, a projection of the data onto a single plane.

3.8.2 Tour methods

● Grand Tour - shows all projections of the multivariate data by a random walk through the landscape.
● Projection Pursuit (PP) guided tour - the tour gives more attention to more interesting views, based on a PP index.
● Manual control - the user decides which tour direction to take.

The grand tour method chooses the target plane by random selection: a frame is randomly selected from the space of all possible projections. A target frame is chosen by standardizing a random vector from a standard multivariate normal distribution: sample p values from a standard univariate normal distribution, resulting in a sample from a standard multivariate normal.
  • 43. 39 standard multivariate normal. Standardizing this vector to have length equal to one gives a random value from a (p−1) dimensional sphere, that is, a randomly generated projection vector. Do this twice to get a 2D projection, where the second vector is orthonormalized on the first. Figure 38 illustrates the tour path. Figure 38: grand tour path in 3D space. The solid circle in Figure 38 indicates the first point on the tour path corresponding to the starting frame. The solid square indicates the last point in the tour path, or the last projection computed. Each point corresponds to a projection from 3 dimensions to one dimension. The projection will look as if the data space is viewed from that direction. In grand tour this point is chosen randomly.
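The following sketch implements the random frame selection described above using NumPy; it is illustrative code, not taken from the cited work:

    import numpy as np

    def random_projection_frame(p, rng=None):
        """Generate a random 2D projection frame for p-dimensional data."""
        if rng is None:
            rng = np.random.default_rng()
        # Sample a p-dimensional standard normal vector and normalize it:
        # a random point on the (p-1)-sphere, i.e. a random projection vector.
        v1 = rng.standard_normal(p)
        v1 /= np.linalg.norm(v1)
        # Sample a second vector and orthonormalize it against the first
        # (Gram-Schmidt) to complete the 2D frame.
        v2 = rng.standard_normal(p)
        v2 -= (v2 @ v1) * v1
        v2 /= np.linalg.norm(v2)
        return np.column_stack([v1, v2])  # p x 2 projection matrix

    # Projecting an (n x p) data matrix X onto the random plane:
    # X2d = X @ random_projection_frame(X.shape[1])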
4. CEP Rule generation

Recent advances in technology have enabled the generation of vast amounts of data in a wide range of fields. This data is created continuously, in large quantities, over time, as data streams. Complex Event Processing (CEP) can be used to analyze and process these large data streams in order to identify interesting situations and respond to them as quickly as possible. Complex event processors are used in almost every domain: vehicular traffic analysis, network monitoring, sensor data analysis [51], stock market trend analysis [52] and fraud detection [53]. Any system that requires real-time monitoring can use a complex event processor.

In CEP, the processing takes place according to user-defined rules, which specify the relations between the observed events and the actions required by the user. For example, in a network monitoring system a complex event processor can be used to notify the system administrator about excessive internet usage by a user on the network. An example rule looks like this:

from currentsums[bandwidth > 100000] select User_IP insert into shouldNotify;

Here, if a user's bandwidth exceeds the limit, the administrator receives a notification. The value of the limit in this example should be low enough to catch heavy usage, yet high enough to ignore normal users. Any complex event processing rule has a condition to check and an action associated with that condition, so regardless of the domain, any system using a CEP engine depends heavily on the rules defined by the user.

In current complex event processing applications, users need to manually specify the rules that are used to identify and act on important patterns in the event streams. This is a complex and arduous task: it is time consuming, involves a lot of trial and error, and typically requires domain-specific knowledge that is hard to identify accurately. Rule writing is therefore typically done by domain experts who study the parameters available in the event streams, manually or using external data analysis tools, to identify the
events that need to be specially handled. Needless to say, incorrect estimation of the relevant parameters in the rules negatively impacts the utility of the systems that depend on accurate processing of these events. Even for domain experts, manually specifying textual rules in a CEP-specific rule language is not a very user-friendly experience. And once a rule is specified, maintaining the same functionality through changing data and behavior may require periodic updates to the rule, each of which can require the same effort as initially spent.

Several approaches [54, 55, 56] have been proposed to overcome these difficulties by using data mining and knowledge discovery techniques to generate rules from available data, giving users the ability to automatically generate rules based on their requirements. Two such approaches are discussed below. One uses a framework that learns, from historical traces, the hidden causalities between the received events and the situations to detect, and uses them to automatically generate CEP rules [54]. The other starts from a skeleton of the rule and uses historical traces to tune the parameters of the final rule [55].

4.1 iCEP

iCEP [54] analyzes historical traces and learns from them. It adopts a highly modular design, with different components considering different aspects of the rule. The framework uses the following terminology and definitions. Each event notification is assumed to be characterized by a type and a set of attributes; the event type defines the number, order, names, and types of the attributes that compose the event itself. It is also assumed that events occur instantaneously at some point in time. Accordingly, each notification includes a timestamp, which represents the time of occurrence of the event it encodes. The authors of the paper use the following example event of type Temp:

Temp@10(room=123, value=24.5)

This event encodes the fact that the air temperature measured inside room 123 at time 10 was 24.5 °C. Another aspect of the terminology used by the authors is the distinction between primitive and composite events. Simple events like the one above are considered primitive events, while a composite event is defined using a pattern of primitive events. When such a pattern is
identified, the CEP engine derives that a composite event has occurred and notifies the interested components. An event trace that ends with the occurrence of the composite event is called a positive event trace.

The iCEP framework builds rules from the following basic building blocks, which are used in most CEP systems to filter events:
➔ Selection: filters relevant event notifications according to the values of their attributes.
➔ Conjunction: combines event notifications together.
➔ Parameterization: introduces constraints involving the values carried by different events.
➔ Sequence: introduces ordering relations among events.
➔ Window: defines the maximum timeframe of a pattern.
➔ Aggregation: introduces constraints involving some aggregated value.

iCEP uses a set of modules that generate combinations of the above building blocks to form CEP rules. The framework takes a training data set created from historical traces and generates rules using a supervised learning technique, based on the following idea. Consider the positive event trace

ε1: A@0, B@2, C@3

This implies the following set of constraints Sε1:
- A: an event of type A must occur
- B: an event of type B must occur
- C: an event of type C must occur
- A→B: the event of type A must occur before that of type B
- A→C: the event of type A must occur before that of type C
- B→C: the event of type B must occur before that of type C

We can assert that, for each rule r and event trace ε, r fires if and only if Sr ⊆ Sε, where Sr is the complete set of constraints that must be satisfied for the rule to fire. With this view, the problem of rule generation becomes the problem of identifying Sr. Given a positive trace ε, Sε can be considered an over-constraining
approximation of Sr. To produce a better approximation of Sr, we can consider the set of all positive traces collectively and take the conjunction (intersection) of all the generated constraint sets. Using these intuitions, the iCEP framework follows these steps to generate a rule:
1. Determine the relevant timeframe to consider (window size).
2. Identify the relevant event types and attributes.
3. Determine the selection and parameter constraints.
4. Discover ordering constraints (sequences).
5. Identify aggregate and negation constraints.

Figure 39: Structure of the iCEP framework

The structure of the framework is shown in figure 39. The problem is broken down into sub-problems and solved using different modules (described below) that work together.
● Event Learner: The event learner tries to determine which primitive event types are required for the composite event to occur (a sketch of this step is given at the end of this subsection). It takes the window size as an optional input parameter and cuts each positive trace so that it ends with the occurrence of the composite event. For each positive trace, the event learner extracts the set of event types the trace contains; then, following the general intuition described above, it computes and outputs the intersection of all these sets.
● Window Learner: The window learner is responsible for learning the size of the window that includes all primitive events required for a composite event. If the required event types are known, the window learner tries to identify a window size that ensures all required primitive events are present in all positive traces. If the required event types are not known, the window learner and event learner use an iterative approach in which increasing window sizes are fed to the event learner until a required accuracy in the rule is reached.
● Constraint Learner: This module receives the filtered event traces from the above two modules and tries to identify possible constraints on the parameters. For each parameter it first looks for an equality constraint, where all positive traces contain a single value; failing that, it generates an inequality constraint that bounds the parameter between the minimum and maximum values observed across all positive traces.
● Aggregate Learner: As shown in figure 39, the aggregate learner runs in parallel with the constraint learner. Instead of looking for single-value constraints, it applies aggregation functions such as sum and average over the time window, over all the events of a certain type, to generate constraints.

The other modules in the framework use similar methods to identify the remaining aspects of the rule. The effectiveness of the framework has been assessed using the following steps:
1. Use an existing rule, created by a domain expert, that identifies a set of composite events in a data stream, and collect the positive traces.
2. Run iCEP on the data collected in the above step to generate a rule.
3. Run the data again through the CEP engine with the generated rule and capture the composite events triggered.
4. Compare the two versions and calculate precision and recall.

The results have been promising, with a precision of around 94% in some of the tests run by the authors. But the system is far from perfect, and the following are some of the challenges that need to be overcome.
1. A large training dataset with many positive traces is required to generate good rules with high precision. The training methodology considers only the conjunction of all the positive traces, so without a large number of positive traces covering the variations in the data, generating accurate rules is difficult.
2. High computational requirements. The iterative approach used by the window learner and event learner translates into a large number of computations. Without hints from a domain expert on the window size or the required events and parameters, the runtime and computational cost increase rapidly.
3. The generated rules require tuning and cleanup by the user. Because the rules are generated automatically, the constraints may be over-constraining or may contain mistakes when applied to previously unseen conditions, so they require a final cleanup by the users.
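To summarize the core learning idea, the following is a minimal sketch of the event learner's intersection step; the trace representation here is a simplified assumption for illustration, not iCEP's actual data model:

    from functools import reduce

    # Each positive trace is a list of (event_type, timestamp) pairs,
    # cut so that it ends with the occurrence of the composite event.
    positive_traces = [
        [("A", 0), ("B", 2), ("C", 3)],
        [("B", 1), ("A", 4), ("D", 5), ("C", 7)],
        [("A", 2), ("C", 3), ("B", 6), ("C", 8)],
    ]

    # Extract the set of event types in each trace and intersect them:
    # only types present in *every* positive trace can be required
    # for the composite event to occur.
    type_sets = [{etype for etype, _ in trace} for trace in positive_traces]
    required_types = reduce(set.intersection, type_sets)
    print(required_types)  # {'A', 'B', 'C'}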
4.2 Tuning rule parameters using the Prediction-Correction Paradigm

A mechanism has been proposed by Turchin et al. to automate both the initial definition of rules and their update over time [55]. It consists of two main repetitive stages, namely rule parameter prediction and rule parameter correction. Parameter prediction updates the parameters using available expert knowledge about how the parameters are expected to change. Rule parameter correction utilizes expert feedback about the actual past occurrence of events, together with the events materialized by the CEP framework, to tune the rule parameters.

For example, in an intrusion detection system [57] a domain expert can specify a rule as follows: "If the size of the packet received from a user deviates strongly from the 'normal' packet size, with estimated mean m1 and standard deviation σ1, infer an event E1 representing the anomaly level of the packet size." It is hard to determine the values of m1 and σ1, and moreover the specified values can change over time due to the dynamic nature of network traffic.

Rule parameter determination and tuning can then be done as follows: given a set of rules, provide initial values for the rule parameters and modify them as required. For example, for a given rule, the tuning algorithm might suggest replacing the value m1 with a value m2 such that m2 < m1. The initial prediction of m1 can be treated as a special case of tuning in which an arbitrary value is corrected to m1 by the tuning algorithm. This tuning must be tied to the system's ability to correctly predict events, so that the algorithm can observe that the parameter m1 is too high, causing many intrusions to go undetected, and therefore needs to be reduced to m2.
Figure 40: Prediction-Correction Paradigm

The proposed framework is based on the Kalman estimator, a simple type of supervised, Bayesian, predict-correct estimator [18]. As shown in figure 40, the framework learns and updates the system state in two stages, namely rule parameter prediction and rule parameter update. Rule parameter prediction is unsupervised: the parameters are updated without any user feedback, relying on pre-existing knowledge about how the parameters might change over time and on the events created by the inference algorithm. In the rule parameter update stage, the parameters are tuned in a supervised manner, using the domain expert's feedback and recently generated events, and passed on to the next stage. User feedback can take two forms, direct and indirect: direct feedback involves changes to the system state, while indirect feedback provides an assessment of the correctness of the estimated event history.

4.2.1 Model

The model of this method consists of events, rules and the system. Here an event means a significant (of interest to the system) actual occurrence in the system; examples include notifications of login attempts and failures of IT components. We can therefore define an event history h as the set of all events of interest to the system, together with their associated data, and an event notification as an estimation of an occurrence of an event. Some events may not be notified, and some non-occurring events may be notified, for example because of faulty equipment. We can therefore also define the estimated event history h' of notified events of interest to the system. Events can
be of two types: explicit events and inferred events. Explicit events are signaled by event sources; for example, a new network connection request is an explicit event. Inferred events are materialized by the system based on other events; for example, an illegal connection attempt event is an inferred event materialized by the network security system based on the explicit event of a new network connection and an inferred event of an unsuccessful user authorization. Inferred events, just like explicit events, belong to event histories: inferred events that actually occurred in the real world belong to the event history h, while those that are only estimated to have occurred belong to the estimated event history h'.

Events can be inferred by rules. A rule can be represented by a quadruple r = <sr, pr, ar, mr>, where:
- sr is a selection function that filters events according to rule r; its input is an event history h, and the events it selects are said to be relevant events.
- pr is a predicate, defined over a filtered event history, that determines when events become candidates for materialization.
- ar is an association function that defines how many events should be materialized, as well as which subsets of selectable events are associated with each materialized event.
- mr is a mapping function that determines the attribute values of the materialized events.

4.2.2 System State

It is expected that an expert can provide the forms of sr, pr, ar and mr, but providing accurate values is difficult. These values are called rule parameters, and the set of all parameters is called the system state. The system state is updated as shown in figure 40. In the predict stage, the parameters are updated using knowledge of how the rule might change over time, together with the updated event history h. In the update stage, the parameters are corrected either by direct feedback, where an exact rule parameter is given, or indirectly, where events in the estimated event history h' are marked as having actually occurred or not.

4.2.3 Rule Tuning Mechanism

To tune the rule parameters, this framework uses the discrete Kalman filter technique: the filter estimates the process state at some time and then obtains feedback in the form of (noisy) measurements.
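As a minimal sketch of such a predict-correct loop for a single scalar rule parameter (e.g. m1), here is a one-dimensional Kalman-style update; the noise values and measurements are illustrative assumptions, not values from the paper:

    # 1D Kalman-style predict-correct loop for one scalar rule parameter.
    # x: current parameter estimate (e.g. m1), p: its estimated variance.
    # q: process noise (how much the "true" parameter drifts per interval).
    # r: measurement noise (how unreliable each feedback measurement is).
    def tune_parameter(x, p, measurements, q=0.01, r=0.5):
        for z in measurements:  # z: a feedback-derived estimate of the parameter
            # Predict: assume the parameter persists, but uncertainty grows.
            p = p + q
            # Correct: blend prediction and measurement via the Kalman gain k.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1 - k) * p
        return x, p

    # Hypothetical usage: a rough initial guess for m1, refined step by step
    # as expert feedback arrives at the end of each time interval.
    m1, var = tune_parameter(x=1500.0, p=100.0, measurements=[1320.0, 1290.0, 1305.0])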
The rule tuning model consists of two recursive equations: a time equation, which shows how the parameters change over time, and a history equation, which gives the outcome of a set of rules and their parameters. The time equation is a function of the previous system state (the set of rule parameters) and the actual event history of that time period; its output is the current system state. The history equation is a function of the current set of rule parameters, the set of explicit events during that time period and the actual event history of the previous time period; its output is the actual event history. Since the current system state is not known, a third equation, the estimated event history equation, is used; it differs from the history equation in that it uses the estimated current system state (the estimated current set of rule parameters), and its output is the estimated current event history.

This can be used to evaluate the performance of the inference mechanism. Performance evaluation is based on comparing the estimated event history received from the inference mechanism with the actual event history provided by expert feedback at the end of time interval k. From this comparison we can measure precision and recall: precision is the percentage of correctly inferred events relative to the total number of events inferred in the time interval, while recall is the percentage of correctly inferred events (i.e., true positives) relative to the actual total number of events that occurred in the time interval (a small sketch of this comparison is given after figure 41).

Figure 41: An overview of the rule tuning method

The rule tuning method consists of a repetitive sequence of actions that should be performed for correct evaluation and dynamic update of the rule parameters. The sequence is illustrated in Figure 41.
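Returning to the precision and recall measures defined above, a minimal sketch of the comparison (the event identifiers are hypothetical):

    # h_hat: events inferred by the system in interval k (estimated history h').
    # h:     events that actually occurred, per expert feedback.
    h_hat = {"E1@3", "E1@7", "E1@12"}
    h     = {"E1@3", "E1@12", "E1@20"}

    true_positives = h_hat & h
    precision = len(true_positives) / len(h_hat)  # correct among inferred
    recall    = len(true_positives) / len(h)      # correct among actual
    print(precision, recall)  # 0.666..., 0.666...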
The above model is a generic model for automating rule parameter tuning in CEP systems, and it serves as a proof of concept for automatic rule parameter tuning in settings where doing so manually becomes a cognitive challenge. Because the model is generic, an actual implementation will require substantial work and tailoring to the specific requirement (such as the intrusion detection example mentioned here). Nevertheless, given the promising results of the empirical study, the model can serve as a theoretical basis for any such work.
References

1. Wong, Pak Chung, and R. Daniel Bergeron. "30 Years of Multidimensional Multivariate Visualization." In Scientific Visualization, pp. 3-33. 1994.
2. Jolliffe, Ian. Principal component analysis. John Wiley & Sons, Ltd, 2005.
3. Tufte, E. R., & Graves-Morris, P. R. (1983). The visual display of quantitative information (Vol. 2). Cheshire, CT: Graphics Press.
4. Data-Ink Ratio. [ONLINE] Available at: http://www.infovis-wiki.net/index.php/Data-Ink_Ratio. [Last Accessed 5 Nov. 2014].
5. Lie Factor. [ONLINE] Available at: http://www.infovis-wiki.net/index.php?title=Lie_Factor. [Last Accessed 5 Nov. 2014].
6. Keim, D. A. (2002). Information visualization and visual data mining. Visualization and Computer Graphics, IEEE Transactions on, 8(1), 1-8.
7. Asimov, D. (1985). The grand tour: a tool for viewing multidimensional data. SIAM Journal on Scientific and Statistical Computing, 6(1), 128-143.
8. Bier, E. A., Stone, M. C., Pier, K., Buxton, W., & DeRose, T. D. (1993, September). Toolglass and magic lenses: the see-through interface. In Proceedings of the 20th annual conference on Computer graphics and interactive techniques (pp. 73-80). ACM.
9. Spoerri, A. (1995). InfoCrystal, a visual tool for information retrieval (Doctoral dissertation, Massachusetts Institute of Technology).
10. Seo, J., & Shneiderman, B. (2005). A rank-by-feature framework for interactive exploration of multidimensional data. Information Visualization, 4(2), 96-113.
11. Inselberg, A., & Dimsdale, B. (1987). Parallel coordinates for visualizing multi-dimensional geometry (pp. 25-44). Springer Japan.
12. Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(347-352), 240-242.
13. Knorr, E. M., Ng, R. T., & Tucakov, V. (2000). Distance-based outliers: algorithms and applications. The VLDB Journal, 8(3-4), 237-253.
14. Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: identifying density-based local outliers. In ACM Sigmod Record (Vol. 29, No. 2, pp. 93-104). ACM.
15. Elmqvist, N., Dragicevic, P., & Fekete, J. D. (2008). Rolling the dice: Multidimensional visual exploration using scatterplot matrix navigation. Visualization and Computer Graphics, IEEE Transactions on, 14(6), 1539-1548.
16. Ullman, S. (1979). The interpretation of visual motion. MIT Press.
17. Im, J. F., McGuffin, M. J., & Leung, R. (2013). GPLOM: The generalized plot matrix for visualizing multidimensional multivariate data. Visualization and Computer Graphics, IEEE Transactions on, 19(12), 2606-2614.
18. Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35-45.
19. Inselberg, A., & Dimsdale, B. (1990). Parallel coordinates: A tool for visualizing multi-dimensional geometry.
20. Savoska, S., & Loskovska, S. (2009, November). Parallel Coordinates as Tool of Exploratory Data Analysis. In 17th Telecommunications Forum TELFOR, Belgrade, Serbia (pp. 24-26).
21. Chen, C. H., Härdle, W., & Unwin, A. (2008). Handbooks of Computational Statistics: Data Visualization, pp. 164-174.
22. Hauser, H., Ledermann, F., & Doleisch, H. (2002). Angular brushing of extended parallel coordinates. In Information Visualization, 2002. INFOVIS 2002. IEEE Symposium on (pp. 127-130). IEEE.
23. Martin, A. R., & Ward, M. O. (1995, October). High dimensional brushing for interactive exploration of multivariate data. In Proceedings of the 6th Conference on Visualization '95 (p. 271). IEEE Computer Society.
24. Heinrich, J., & Weiskopf, D. (2012). State of the art of parallel coordinates. In Eurographics 2013 - State of the Art Reports (pp. 95-116). The Eurographics Association.
25. Lu, L. F., Huang, M. L., & Huang, T. H. (2012, December). A new axes re-ordering method in parallel coordinates visualization. In Machine Learning and Applications (ICMLA), 2012 11th International Conference on (Vol. 2, pp. 252-257). IEEE.
26. Fua, Y. H., Ward, M. O., & Rundensteiner, E. A. (1999, October). Hierarchical parallel coordinates for exploration of large datasets. In Proceedings of the conference on Visualization '99: celebrating ten years (pp. 43-50). IEEE Computer Society Press.
27. Luo, Y., Weiskopf, D., Zhang, H., & Kirkpatrick, A. E. Cluster visualization in parallel coordinates using curve bundles.
28. Heinrich, J., Luo, Y., Kirkpatrick, A. E., Zhang, H., & Weiskopf, D. (2011). Evaluation of a bundling technique for parallel coordinates. arXiv preprint arXiv:1109.6073.
29. Zhou, H., Yuan, X., Qu, H., Cui, W., & Chen, B. (2008, May). Visual clustering in parallel coordinates. In Computer Graphics Forum (Vol. 27, No. 3, pp. 1047-1054). Blackwell Publishing Ltd.
30. Andrienko, G., & Andrienko, N. (2005). Blending aggregation and selection: Adapting parallel coordinates for the visualization of large datasets. The Cartographic Journal, 42(1), 49-60.
31. Artero, A. O., de Oliveira, M. C. F., & Levkowitz, H. (2006, July). Enhanced high dimensional data visualization through dimension reduction and attribute arrangement. In Information Visualization, 2006. IV 2006. Tenth International Conference on (pp. 707-712). IEEE.
32. Forina, M., Armanino, C., Lanteri, S., & Tiscornia, E. (1983). Classification of olive oils from their fatty acid composition. In H. Martens & H. Russwurm (Eds.), Food Research and Data Analysis (pp. 189-214). London, UK: Applied Science Publishers.
33. Hoffman, P., Grinstein, G., Marx, K., Grosse, I., & Stanley, E. (1997). DNA visual and analytic data mining. In Visualization '97, Proceedings (pp. 437-441). IEEE.
34. Stone II, R., Sabichi, A. L., Gill, J., Lee, I., Loganatharaj, R., Trutschl, M., Cvek, U., & Clifford, J. L. Identification of genes involved in early stage bladder cancer progression. [Unpublished].
35. Hartigan, J. A., & Kleiner, B. (1981). Mosaics for contingency tables. In W. F. Eddy (Ed.), Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface. New York: Springer-Verlag.
36. Friendly, M. (2002). A brief history of the mosaic display. Journal of Computational and Graphical Statistics, 11(1).
37. Hofmann, H. (2008). Mosaic plots and their variants. In Handbook of Data Visualization (pp. 617-642). Springer Berlin Heidelberg.
38. Hofmann, H., Siebes, A. P., & Wilhelm, A. F. (2000, August). Visualizing association rules with interactive mosaic plots. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 227-235). ACM.
39. Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464-1480.
40. Kaski, S., & Kohonen, T. (1996). Exploratory data analysis by the self-organizing map: Structures of welfare and poverty in the world. In Neural networks in financial engineering. Proceedings of the third international conference on neural networks in the capital markets.
41. Rodden, K. (2014). Applying a sunburst visualization to summarize user navigation sequences. IEEE Computer Graphics and Applications, 34(5), 36-40.
42. Keim, D. A., Schneidewind, J., & Sips, M. (2005). FP-Viz: Visual frequent pattern mining. Bibliothek der Universität Konstanz.
43. Stasko, J. SunBurst [Online]. Available: http://www.cc.gatech.edu/gvu/ii/sunburst/
44. Vliegen, R., van Wijk, J. J., & van der Linden, E.-J. (2006). Visualizing business data with generalized treemaps. IEEE Transactions on Visualization and Computer Graphics, 12(5), 789-796.
45. Tufte, E. Small multiples. In Envisioning Information (ch. 4, pp. 67-80). Cheshire, CT: Graphics Press.
46. Becker, R. A., et al. (1996). The visual design and control of trellis display. Journal of Computational and Graphical Statistics, 5(2), 123-155.
47. Theus, M. High dimensional data visualizations. In C. Chen et al. (Eds.), Handbook of Data Visualization (part II, ch. 6, sec. 3, pp. 156-163). Berlin: Springer.
48. What is a Trellis Chart. [Online]. Available: http://trellischarts.com/what-is-a-trellis-chart
49. Huh, M. Y., & Kim, K. (2002). Visualization of multidimensional data using modifications of the Grand Tour. Journal of Applied Statistics, 29(5), 721-728.
50. Cook, D., et al. Grand tours, projection pursuit guided tours, and manual controls. In C. Chen et al. (Eds.), Handbook of Data Visualization (part III, ch. 2, pp. 296-312). Berlin: Springer.
51. Broda, K., Clark, K., Miller, R., & Russo, A. (2009). SAGE: a logical agent-based environment monitoring and control system (pp. 112-117). Springer Berlin Heidelberg.
52. Demers, A. J., Gehrke, J., Hong, M., Riedewald, M., & White, W. M. (2006). Towards expressive publish/subscribe systems. In EDBT (pp. 627-644).
53. Schultz-Møller, N. P., Migliavacca, M., & Pietzuch, P. (2009). Distributed complex event processing with query rewriting. In DEBS (pp. 4:1-4:12). ACM.
54. Margara, A., Cugola, G., & Tamburrelli, G. (2014, May). Learning from the past: automated rule generation for complex event processing. In Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems (pp. 47-58). ACM.
55. Turchin, Y., Gal, A., & Wasserkrug, S. (2009). Tuning complex event processing rules using the prediction-correction paradigm. In Proceedings of the Third ACM International Conference on Distributed Event-Based Systems (p. 10). ACM.
56. Mutschler, C., & Philippsen, M. (2012). Learning event detection rules with noise hidden Markov models. In AHS (pp. 159-166).
57. Axelsson, S. (2000). Intrusion detection systems: A survey and taxonomy (Vol. 99). Technical report.
58. A selected set of attributes for a sample of cars manufactured within 1970 to 1982. [ONLINE] Available at: http://web.pdx.edu/~gerbing/data/cars.csv. [Last Accessed 5 Nov. 2014].
59. Fisher, R. A. (1935). The design of experiments.