
Abstract—The intention of this paper is to present a model for applying predictive data mining and association rule mining to continuously changing spatio-temporal data. Because such data is highly variable, models that handle highly variable spatio-temporal data efficiently are of considerable research importance. The main issues are managing the large volume of data and limiting the amount of computation needed to process it. We consider a general problem involving $n$ points in a $u$-dimensional space, where each point has a $v$-dimensional feature vector describing its properties.
I. INTRODUCTION
Associative rule mining is the discovery of association rules showing attribute-value conditions that frequently occur together in a given data set. An association rule is written $A \Rightarrow B$, meaning that the tuples in the database that satisfy the conditions in $A$ are likely also to satisfy the conditions in $B$, where $A$ and $B$ are sets of attribute conditions: $A = A_1 \wedge A_2 \wedge A_3 \wedge \cdots \wedge A_m$ and $B = B_1 \wedge B_2 \wedge B_3 \wedge \cdots \wedge B_n$. For example, the rule

$\mathrm{age}(X, \text{"4..9"}) \Rightarrow \mathrm{buys}(X, \text{"chocolate"}) \wedge \mathrm{plays}(X, \text{"cricket"})$ [support = 3%, confidence = 50%]

means that 3% of all the people under study are aged between 4 and 9, buy chocolate, and play cricket, and that a person aged between 4 and 9 buys chocolate and plays cricket with probability 50%. Predictive data mining analyses current and historical facts to predict future events. We may also need to predict the class to which a given object belongs, or to estimate its value, based on past observations.
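The support and confidence figures in the example follow directly from their definitions. The following is a minimal Python sketch of that arithmetic; the function name `support_confidence`, the record layout, and the toy data are hypothetical and serve only as an illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


def support_confidence(records: List[Record], antecedent: Predicate,
                       consequent: Predicate) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = |records satisfying A and B| / |all records|
    confidence = |records satisfying A and B| / |records satisfying A|
    """
    n_total = len(records)
    n_a = sum(1 for r in records if antecedent(r))
    n_ab = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_ab / n_total if n_total else 0.0
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence


# Hypothetical toy data, for illustration only.
people = [
    {"age": 5, "buys": {"chocolate"}, "plays": {"cricket"}},
    {"age": 7, "buys": {"chocolate"}, "plays": set()},
    {"age": 12, "buys": {"chocolate"}, "plays": {"cricket"}},
    {"age": 30, "buys": {"coffee"}, "plays": {"tennis"}},
]


def aged_4_to_9(r: Record) -> bool:
    return 4 <= r["age"] <= 9


def chocolate_and_cricket(r: Record) -> bool:
    return "chocolate" in r["buys"] and "cricket" in r["plays"]


print(support_confidence(people, aged_4_to_9, chocolate_and_cricket))  # (0.25, 0.5)
```

Here one of the four people satisfies both sides (support 0.25), and one of the two people aged 4 to 9 satisfies the consequent (confidence 0.5).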
This work was supported by the Indian Institute of Information Technology-Allahabad as part of a semester project, carried out under the consistent mentorship of Dr. O. P. Vyas (DAAD Fellow, Technical University of Kaiserslautern, Germany; AOTS Fellow, CICC, Japan).
Polisetti Vinay is an undergraduate student in the Information Technology stream at the Indian Institute of Information Technology (e-mail: vinaychitti@gmail.com, iit2007106@iiita.ac.in; phone: 979-350-6423).
P. Sai Krishna Reddy is an undergraduate student in the Information Technology stream at the Indian Institute of Information Technology (e-mail: saikrishna.3490@gmail.com; phone: 979-523-5237).
Anirudh Perugu is an undergraduate student in the Information Technology stream at the Indian Institute of Information Technology (e-mail: akoolstud@gmail.com, iit2007060@iiita.ac.in; phone: 903-206-1867).
II. PROBLEM ADDRESSED
Suppose we have $n$ points (nodes) in a $u$-dimensional space, and let each point have a $v$-dimensional feature vector. The spatial properties of a point are its location in space, and its temporal co-ordinates are the values of its feature vector. As time progresses, the points change their positions in the $u$-dimensional space as well as their feature vectors. We need a mechanism to mine data that varies in both its spatial and its temporal co-ordinates.
III. REPRESENTATION OF DATA
Suppose we denote the spatial co-ordinates of a point by the vector $\langle d_1, d_2, d_3, \ldots, d_u \rangle$ and its feature vector by $\langle t_1, t_2, t_3, \ldots, t_v \rangle$. Since all these values are real numbers, we maintain a single $(u+v)$-dimensional vector for each point, called the combined vector. In the combined vector, the first $u$ co-ordinates are spatial and the next $v$ co-ordinates are temporal (features). We also maintain a check board onto which all the points are mapped. Let the number of points be $n = 2^{2k}$, where $k \geq 2$. We can then map the points onto a check board of size $2^k \times 2^k$.
For example, 16 points can be mapped onto a check board of size $4 \times 4$, as in Fig. 1. Each block stores the combined vector of the corresponding node.
Fig. 1. A model check board for 16 points
N1 N2 N3 N4
N5 N6 N7 N8
N9 N10 N11 N12
N13 N14 N15 N16
We repeat this mapping at every time instant, obtaining a snapshot of the data each time, so we get a sequence of check board snapshots at different times.
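A minimal sketch of this representation step, assuming each snapshot arrives as a list of $n = 2^{2k}$ points in node order $N_1, N_2, \ldots$ and that nodes are laid out row by row as in Fig. 1. The helper names `combined_vector` and `to_check_board` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def combined_vector(spatial: Sequence[float], features: Sequence[float]) -> Vector:
    """Concatenate u spatial co-ordinates and v feature values into one (u+v)-vector."""
    return list(spatial) + list(features)


def to_check_board(vectors: List[Vector]) -> List[List[Vector]]:
    """Lay n = 2**(2k) combined vectors out row by row on a 2**k x 2**k board.

    vectors[0] (node N1) goes to the top-left block, as in Fig. 1.
    """
    n = len(vectors)
    side = math.isqrt(n)
    # n must be a perfect square whose side length is a power of two.
    if n == 0 or side * side != n or side & (side - 1):
        raise ValueError("number of points must be of the form 2**(2k)")
    return [vectors[row * side:(row + 1) * side] for row in range(side)]


# One hypothetical snapshot: 16 nodes with u = 2 spatial and v = 3 feature values.
snapshot = [combined_vector([float(i), float(2 * i)], [0.0, 1.0, 2.0]) for i in range(16)]
board = to_check_board(snapshot)
print(len(board), len(board[0]))   # 4 4
print(board[1][0] is snapshot[4])  # True: N5 starts the second row
```

Calling `to_check_board` once per time instant yields the sequence of check board snapshots described above.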
Predictive & Association Rule Mining of Spatio-Temporal Data
Polisetti Vinay, P. Sai Krishna Reddy, Anirudh Perugu, Dr. O. P. Vyas
Indian Institute of Information Technology-Allahabad
(iit2007106, iit2007081, iec2007048)@iiita.ac.in, dropvyas@gmail.com

IV. MANIPULATION OF DATA
We now convert the check boards of the previous step into a matrix called the motley matrix, which is the data structure we use for mining the spatio-temporal data. A motley matrix is produced from any two consecutive check board snapshots by replacing each block of the check board with a color. We denote the points that change only spatially with blue, the points that change only their temporal co-ordinates with green, the points that change both spatially and temporally with red, and the points that do not change with white. For example, if from time $T_0$ (the first check board snapshot) to $T_1$ (the second snapshot) a node changes spatially but not temporally, we replace it with blue, and we do the same for every point. This yields a colored check board in which each node is replaced by the color corresponding to the change it underwent in the transition from $T_0$ to $T_1$. We then divide the colored check board into four equal quadrants, divide each quadrant into four further equal quadrants, and so on recursively, until every block contains a single color. The resulting figure is called the motley matrix.
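The coloring rule and the recursive subdivision can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `change_color` compares the full spatial and feature parts of two combined vectors exactly, and `motley_blocks` returns the uniform blocks of the motley matrix as hypothetical `(top, left, size, color)` tuples.

```python
from typing import List, Optional, Sequence, Tuple

Block = Tuple[int, int, int, str]  # (top row, left column, side length, color)


def change_color(prev: Sequence[float], curr: Sequence[float], u: int) -> str:
    """Color one node by comparing its combined vectors at T_i and T_{i+1}."""
    spatial = list(prev[:u]) != list(curr[:u])
    temporal = list(prev[u:]) != list(curr[u:])
    if spatial and temporal:
        return "red"
    if spatial:
        return "blue"
    if temporal:
        return "green"
    return "white"


def motley_blocks(colors: List[List[str]], top: int = 0, left: int = 0,
                  size: Optional[int] = None) -> List[Block]:
    """Split a square 2**k x 2**k color grid into quadrants until each is uniform."""
    if size is None:
        size = len(colors)
    first = colors[top][left]
    if all(colors[r][c] == first
           for r in range(top, top + size) for c in range(left, left + size)):
        return [(top, left, size, first)]
    half = size // 2
    blocks: List[Block] = []
    # Visit quadrants in the order top-left, top-right, bottom-left, bottom-right.
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        blocks.extend(motley_blocks(colors, top + dr, left + dc, half))
    return blocks


grid = [["white"] * 4 for _ in range(4)]
grid[0][0] = change_color([0.0, 0.0, 1.0], [1.0, 0.0, 1.0], u=2)  # "blue"
print(motley_blocks(grid))
```

For this grid the top-left quadrant is split into four single cells, while the other three quadrants remain whole white blocks, giving seven blocks in total. A single cell is always uniform, so the recursion terminates.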
Fig. 2. Model of a possible motley matrix [2]
In the figure above, the board has been divided recursively into four equal quadrants until every block has a single color. The transition from $T_0$ to $T_1$ yields one check board and hence one motley matrix; similarly, the transition from $T_1$ to $T_2$ yields another check board and another motley matrix. The process continues, producing one motley matrix per time instant, and we use these motley matrices to mine our data.
V. CONSTRUCTION OF OPTIONAL QUAD TREE
We now need an efficient data structure to represent the motley matrix of the previous section. We use a Multi-Version Optional Quad Tree (MVOQT) to represent it, and we use the following convention to convert a motley matrix into a tree whose height is at most $\log_4 n$, so that any block can be reached from the root in $O(\log n)$ time.
Fig. 3. Quadrant numbering convention used to construct the MVOQT
1 2
3 4
The root of the tree represents the entire, undivided motley matrix. When the motley matrix is divided into four equal quadrants, we add four children to the root in the order shown in Fig. 3. If a quadrant is divided further, we add four children to the corresponding node of the tree. The tree for the motley matrix above is as follows.
Fig. 3. Optional quad tree for the motley matrix of Fig. 2 [2]
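The tree construction under the numbering convention of Fig. 3 can be sketched as below. It builds a single version only (one motley matrix); sharing unchanged subtrees between successive versions, which the multi-version structure of [1] relies on, is not shown. `QuadNode`, `build_quadtree`, and `color_at` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QuadNode:
    """Optional quad tree node: a leaf stores one color; an internal node has 4 children."""
    color: Optional[str] = None
    # Children follow Fig. 3: 1 top-left, 2 top-right, 3 bottom-left, 4 bottom-right.
    children: List["QuadNode"] = field(default_factory=list)


def build_quadtree(colors: List[List[str]], top: int = 0, left: int = 0,
                   size: Optional[int] = None) -> QuadNode:
    """Build the tree for a 2**k x 2**k color grid; a uniform block becomes a leaf."""
    if size is None:
        size = len(colors)
    first = colors[top][left]
    if all(colors[r][c] == first
           for r in range(top, top + size) for c in range(left, left + size)):
        return QuadNode(color=first)
    half = size // 2
    return QuadNode(children=[build_quadtree(colors, top + dr, left + dc, half)
                              for dr, dc in ((0, 0), (0, half), (half, 0), (half, half))])


def color_at(root: QuadNode, row: int, col: int, size: int) -> str:
    """Walk from the root to the leaf covering (row, col) in at most log2(size) steps."""
    node = root
    while node.children:
        size //= 2
        quadrant = (2 if row >= size else 0) + (1 if col >= size else 0)
        row, col = row % size, col % size
        node = node.children[quadrant]
    assert node.color is not None  # leaves always carry a color
    return node.color


grid = [["white"] * 4 for _ in range(4)]
grid[0][0] = "blue"
tree = build_quadtree(grid)
print(color_at(tree, 0, 0, 4))  # blue
```

`color_at` halves the block size at every step, so reaching any cell of a $2^k \times 2^k$ board takes at most $k = \log_4 n$ steps, which matches the logarithmic bound claimed for the MVOQT.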
Once the quad tree has been constructed, we can use it in many ways to answer queries. We needed an effective data structure to represent the motley matrix, with the requirement that the entire matrix be traversable in the minimum possible time. A natural choice is a tree with a considerable branching factor (4 in the case of the MVOQT). We use the motley matrix and the MVOQT to find patterns in the given data, and we intend to perform classification, association rule mining, and predictive data mining to answer the most likely queries.
VI. CLASSIFICATION RULE MINING
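We now traverse the tree to mine the data. While building the motley matrix, we checked whether each point had changed spatially, temporally, both, or not at all. This can be done as follows. Repeatedly generate two random numbers $r_1$ and $r_2$, where $0 < r_1 \leq u$ and $u + 1 \leq r_2 \leq u + v$, so that $r_1$ indexes a spatial co-ordinate and $r_2$ a temporal (feature) co-ordinate of the combined vector. If

$N_1^{T_i}(r_1) \neq N_1^{T_{i+1}}(r_1)$,

where $N_1^{T_i}(r)$ denotes the $r$-th co-ordinate of the combined vector of node $N_1$ at time $T_i$, then we assume that point $N_1$ has changed spatially from $T_i$ to $T_{i+1}$. Applying the same test with $r_2$ shows whether it changed temporally, and repeating these operations for every point tells us which points changed spatially, temporally, both, or not at all.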
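The sampled test described above can be sketched as follows, under the assumption that one probe per part of the combined vector is drawn per call. `sampled_change` is a hypothetical name, and the indices are 0-based, unlike the 1-based $r_1$ and $r_2$ in the text.

```python
import random
from typing import Sequence


def sampled_change(prev: Sequence[float], curr: Sequence[float], u: int, v: int,
                   rng: random.Random) -> str:
    """Classify one node from a single random probe of each part of its combined vector."""
    r1 = rng.randint(0, u - 1)       # spatial part: indices 0 .. u-1 (inclusive)
    r2 = rng.randint(u, u + v - 1)   # feature part: indices u .. u+v-1 (inclusive)
    spatial = prev[r1] != curr[r1]
    temporal = prev[r2] != curr[r2]
    if spatial and temporal:
        return "both"
    if spatial:
        return "spatial"
    if temporal:
        return "temporal"
    return "none"


rng = random.Random(0)
before = [1.0, 2.0, 10.0, 20.0, 30.0]   # u = 2 spatial, v = 3 feature values
after = [3.0, 4.0, 10.0, 20.0, 30.0]    # both spatial values moved, features unchanged
print(sampled_change(before, after, u=2, v=3, rng=rng))  # spatial
```

A probe can only miss a change, never invent one: if just one feature value changed, a single probe may land on a different index and report `"none"`. Repeating the probe, or comparing all $u + v$ co-ordinates, reduces or removes such misses at the cost of more comparisons.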
In effect, this gives us a classification: we know which points are changing spatially at which instant, and from there we can analyze patterns by observing which points change and at what rate. The method is efficient because, if the time interval between two consecutive snapshots is kept small, very few spatial changes occur and they can be tracked easily.
VII. ASSOCIATIVE RULE MINING
After building the MVOQT, we ship the data to four linearly accessible data structures (for example, vectors), as follows.
Fig. 4. Four linearly accessible data structures, each indexed by the time labels $T_0$ to $T_4$: 4(a) spatial vector, 4(b) temporal vector, 4(c) spatio-temporal vector, 4(d) null vector
Once an MVOQT is built, we ship its data to the corresponding linear data structure at every instant. The points that changed only spatially from $T_0$ to $T_1$ are stored in the spatial vector under the label $T_0$; in general, the points that change only spatially from $T_i$ to $T_{i+1}$ are stored in the spatial vector under the label $T_i$. The other vectors are filled in the same way: points that change only temporally go to the temporal vector, points that change in both respects go to the spatio-temporal vector, and points that do not change go to the null vector. For example, for two consecutive instants $T_0$ and $T_1$, the entries of the spatial vector are as follows.
Fig. 5. Entries in the spatial vector: 5(a) entry at $T_0$; 5(b) entry at $T_1$
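Shipping one motley matrix into the four vectors can be sketched as follows. It assumes each interval $T_i \to T_{i+1}$ has already been reduced to a color per node; the names `VECTOR_FOR_COLOR`, `make_vectors`, and `ship` are hypothetical, and the actual entries of Fig. 5 may carry co-ordinate values rather than node names only.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

# Which linear vector each motley-matrix color is shipped to.
VECTOR_FOR_COLOR = {"blue": "spatial", "green": "temporal",
                    "red": "spatio-temporal", "white": "null"}

Vectors = Dict[str, DefaultDict[str, List[str]]]


def make_vectors() -> Vectors:
    """Create the four vectors; each maps a time label such as 'T0' to a list of nodes."""
    return {name: defaultdict(list) for name in VECTOR_FOR_COLOR.values()}


def ship(vectors: Vectors, label: str, node_colors: Dict[str, str]) -> None:
    """Store each node under `label` in the vector that matches its color."""
    for node, color in node_colors.items():
        vectors[VECTOR_FOR_COLOR[color]][label].append(node)


vectors = make_vectors()
ship(vectors, "T0", {"N1": "green", "N2": "white", "N3": "blue", "N4": "blue"})
ship(vectors, "T1", {"N1": "green", "N2": "red", "N3": "blue", "N4": "blue"})
print(dict(vectors["spatial"]))  # {'T0': ['N3', 'N4'], 'T1': ['N3', 'N4']}
```

Keying each vector by time label keeps every interval's entries in one place, so scanning a single vector label by label is a linear pass.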
By observing five or six such instances of the spatial vector (and of the other vectors), we can frame rules to mine the data. From the two instances above we can see that whenever nodes 3 and 4 both change spatially and node 4 doubles its co-ordinates $\langle d_1, d_2 \rangle$, node 3 also doubles only its co-ordinates $\langle d_1, d_2 \rangle$, and all the other co-ordinates stay unchanged. We write this as the association rule

$\mathrm{Node}(N_4, d_1, d_2) \Rightarrow \mathrm{Node}(N_3, d_1, d_2)$.

We can also observe that whenever node 4 does not change its $d_3$, $d_4$, and $d_5$ co-ordinate values, neither does node 3. We write this as

$\mathrm{Node}(N_4, \lnot d_3, \lnot d_4, \lnot d_5) \Rightarrow \mathrm{Node}(N_3, \lnot d_3, \lnot d_4, \lnot d_5)$.
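Checking a rule such as the first one against the stored entries amounts to counting intervals. A minimal sketch, assuming each spatial-vector entry keeps a node's spatial co-ordinates before and after the interval; `doubles`, `rule_confidence`, and the sample values are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# label -> node -> (spatial co-ordinates at T_i, spatial co-ordinates at T_{i+1})
Entry = Tuple[Sequence[float], Sequence[float]]
SpatialVector = Dict[str, Dict[str, Entry]]


def doubles(entry: Entry, coords: Sequence[int]) -> bool:
    """True if every co-ordinate index in `coords` exactly doubled."""
    before, after = entry
    return all(after[j] == 2 * before[j] for j in coords)


def rule_confidence(vector: SpatialVector, a: str, b: str,
                    coords: Sequence[int]) -> Tuple[int, float]:
    """Support count and confidence of: `a` doubles `coords` => `b` doubles `coords`."""
    hits = [label for label, nodes in vector.items()
            if a in nodes and doubles(nodes[a], coords)]
    both = [label for label in hits
            if b in vector[label] and doubles(vector[label][b], coords)]
    return len(both), (len(both) / len(hits) if hits else 0.0)


# Hypothetical entries; indices 0 and 1 stand for d1 and d2.
spatial_vector = {
    "T0": {"N4": ([1.0, 2.0, 5.0], [2.0, 4.0, 5.0]),
           "N3": ([3.0, 1.0, 7.0], [6.0, 2.0, 7.0])},
    "T1": {"N4": ([2.0, 4.0, 5.0], [4.0, 8.0, 5.0]),
           "N3": ([6.0, 2.0, 7.0], [12.0, 4.0, 7.0])},
}
print(rule_confidence(spatial_vector, "N4", "N3", coords=[0, 1]))  # (2, 1.0)
```

As with ordinary association rules, one would typically keep such a rule only if its support count and confidence exceed chosen thresholds.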
As time passes, we keep building MVOQTs and filling in the entries of the respective vectors, and we can then look for patterns in each vector. The main advantage of the MVOQT is that a group of nodes with the same color is stored as a single leaf, so the whole group can be reached in one step. We can also compare different motley matrices and predict the class to which a given node
belongs. For example, if node $N_1$ is green in motley matrix 1, green in motley matrix 2, and green again in motley matrix 3, we can predict that $N_1$ will be green in the fourth motley matrix with a probability of 0.75. Similarly, by analyzing the values of a particular spatial or temporal co-ordinate across different motley matrices, we can also predict that co-ordinate's future value.
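The text gives 0.75 for three consecutive greens without stating the estimator behind it. The sketch below therefore assumes one simple, clearly labeled estimator, Laplace's rule of succession, which gives 0.8 for the same history; a different estimator would give a different number. `predict_next_color` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def predict_next_color(history: List[str]) -> Tuple[str, float]:
    """Predict a node's color in the next motley matrix from its past colors.

    Assumed estimator (not taken from the paper): Laplace's rule of succession,
    treating "the next color equals the most frequent past color c" as a binary
    event, so P = (count(c) + 1) / (len(history) + 2).
    """
    if not history:
        raise ValueError("need at least one past motley matrix")
    color, count = Counter(history).most_common(1)[0]
    return color, (count + 1) / (len(history) + 2)


print(predict_next_color(["green", "green", "green"]))  # ('green', 0.8)
```

The smoothing keeps the estimate below 1 even after an unbroken run of the same color, which is why a short history never yields certainty.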
VIII. CONCLUSION
We conclude that by using three data structures, namely the motley matrix, the Multi-Version Optional Quad Tree (MVOQT), and a linearly accessible data structure such as a vector or an array of arrays to which the data is shipped, we can effectively reduce the amount of manipulation needed to reach a node with a specified property.
ACKNOWLEDGMENT
Polisetti Vinay thanks Prof. Dr. O. P. Vyas for his consistent encouragement and for motivating the authors from the very beginning in every respect, and also thanks Mr. Dheeraj Tyagi for his wholehearted help throughout the preparation of this paper.
REFERENCES
[1] T. Tzouramanis, M. Vassilakopoulos, and Y. Manolopoulos, "Multiversion Linear Quadtree for Spatio-Temporal Data," Data Engineering Lab, Department of Informatics, Aristotle University.
[2] T.-S. Yeh and B. de Cambray, "Modeling Highly Variable Spatio-Temporal Data," PRISM Laboratory (CNRS, Universities of Paris VI and Versailles-St Quentin) & GDR Cassini.
[3] M. J. Egenhofer, "What's special about spatial? Database requirements for vehicle navigation in geographic space," in Proc. ACM SIGMOD, May 1993, pp. 398-402.
[4] M. F. Mokbel, X. Xiong, M. A. Hammad, and W. G. Aref, "Continuous Query Processing of Spatio-temporal Data Streams in PLACE," Department of Computer Sciences, Purdue University, West Lafayette, IN 47907-1398.
[5] J. Han and M. Kamber, Data Mining: Concepts and Techniques.
[6] E. Mesrobian, R. Muntz, E. Shek, J. R. Santos, J. Yi, K. Ng, S. Y. Chien, C. Mechoso, J. Farrara, P. Stolorz, and H. Nakamura, "Exploratory data mining and analysis using CONQUEST," in Proc. IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, IEEE, New York, 1995, pp. 281-286.
[7] K. Koperski and J. Han, "Discovery of Spatial Association Rules in Geographic Information Databases," in Proc. Fourth International Symposium on Large Spatial Databases, Maine, 1995, pp. 47-6.

More Related Content

What's hot

DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering TypesAshwin Shenoy M
 
EFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUE
EFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUEEFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUE
EFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUEIJNSA Journal
 
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...IJERD Editor
 
Linear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data ScienceLinear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data SciencePremier Publishers
 
HW2-1_05.doc
HW2-1_05.docHW2-1_05.doc
HW2-1_05.docbutest
 
Reproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishReproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishtuxette
 
A Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image SimilarityA Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image SimilarityFarah M. Altufaili
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelstuxette
 
System for Prediction of Non Stationary Time Series based on the Wavelet Radi...
System for Prediction of Non Stationary Time Series based on the Wavelet Radi...System for Prediction of Non Stationary Time Series based on the Wavelet Radi...
System for Prediction of Non Stationary Time Series based on the Wavelet Radi...IJECEIAES
 
Square transposition: an approach to the transposition process in block cipher
Square transposition: an approach to the transposition process in block cipherSquare transposition: an approach to the transposition process in block cipher
Square transposition: an approach to the transposition process in block cipherjournalBEEI
 
Secure Linear Transformation Based Cryptosystem using Dynamic Byte Substitution
Secure Linear Transformation Based Cryptosystem using Dynamic Byte SubstitutionSecure Linear Transformation Based Cryptosystem using Dynamic Byte Substitution
Secure Linear Transformation Based Cryptosystem using Dynamic Byte SubstitutionCSCJournals
 
Aggregation computation over distributed data streams(the final version)
Aggregation computation over distributed data streams(the final version)Aggregation computation over distributed data streams(the final version)
Aggregation computation over distributed data streams(the final version)Yueshen Xu
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 
One dimensional vector based pattern
One dimensional vector based patternOne dimensional vector based pattern
One dimensional vector based patternijcsit
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingIOSR Journals
 

What's hot (18)

DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering Types
 
Forecasting Gasonline Price in Vietnam Based on Fuzzy Time Series and Automat...
Forecasting Gasonline Price in Vietnam Based on Fuzzy Time Series and Automat...Forecasting Gasonline Price in Vietnam Based on Fuzzy Time Series and Automat...
Forecasting Gasonline Price in Vietnam Based on Fuzzy Time Series and Automat...
 
EFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUE
EFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUEEFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUE
EFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUE
 
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
IJERD(www.ijerd.com)International Journal of Engineering Research and Develop...
 
Linear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data ScienceLinear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data Science
 
HW2-1_05.doc
HW2-1_05.docHW2-1_05.doc
HW2-1_05.doc
 
Reproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishReproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfish
 
31 (1)
31 (1)31 (1)
31 (1)
 
A Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image SimilarityA Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image Similarity
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
 
System for Prediction of Non Stationary Time Series based on the Wavelet Radi...
System for Prediction of Non Stationary Time Series based on the Wavelet Radi...System for Prediction of Non Stationary Time Series based on the Wavelet Radi...
System for Prediction of Non Stationary Time Series based on the Wavelet Radi...
 
Square transposition: an approach to the transposition process in block cipher
Square transposition: an approach to the transposition process in block cipherSquare transposition: an approach to the transposition process in block cipher
Square transposition: an approach to the transposition process in block cipher
 
Application of Matrices
Application of MatricesApplication of Matrices
Application of Matrices
 
Secure Linear Transformation Based Cryptosystem using Dynamic Byte Substitution
Secure Linear Transformation Based Cryptosystem using Dynamic Byte SubstitutionSecure Linear Transformation Based Cryptosystem using Dynamic Byte Substitution
Secure Linear Transformation Based Cryptosystem using Dynamic Byte Substitution
 
Aggregation computation over distributed data streams(the final version)
Aggregation computation over distributed data streams(the final version)Aggregation computation over distributed data streams(the final version)
Aggregation computation over distributed data streams(the final version)
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
One dimensional vector based pattern
One dimensional vector based patternOne dimensional vector based pattern
One dimensional vector based pattern
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in Datamining
 

Similar to Applying predictive data mining and associative rule mining to spatio-temporal data

Extracting ocean
Extracting oceanExtracting ocean
Extracting oceanes712
 
Application of panel data to the effect of five (5) world development indicat...
Application of panel data to the effect of five (5) world development indicat...Application of panel data to the effect of five (5) world development indicat...
Application of panel data to the effect of five (5) world development indicat...Alexander Decker
 
Heptagonal Fuzzy Numbers by Max Min Method
Heptagonal Fuzzy Numbers by Max Min MethodHeptagonal Fuzzy Numbers by Max Min Method
Heptagonal Fuzzy Numbers by Max Min MethodYogeshIJTSRD
 
Vector and Matrix operationsVector and Matrix operations
Vector and Matrix operationsVector and Matrix operationsVector and Matrix operationsVector and Matrix operations
Vector and Matrix operationsVector and Matrix operationsssuser2624f71
 
Topological Data Analysis of Complex Spatial Systems
Topological Data Analysis of Complex Spatial SystemsTopological Data Analysis of Complex Spatial Systems
Topological Data Analysis of Complex Spatial SystemsMason Porter
 
Student_Garden_geostatistics_course
Student_Garden_geostatistics_courseStudent_Garden_geostatistics_course
Student_Garden_geostatistics_coursePedro Correia
 
Student_Garden_geostatistics_course
Student_Garden_geostatistics_courseStudent_Garden_geostatistics_course
Student_Garden_geostatistics_coursePedro Correia
 
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...Dominic Suciu
 
Statistics Questions to Answer.doc.rtf2Note An Excel Wor.docx
Statistics Questions to Answer.doc.rtf2Note An Excel Wor.docxStatistics Questions to Answer.doc.rtf2Note An Excel Wor.docx
Statistics Questions to Answer.doc.rtf2Note An Excel Wor.docxdessiechisomjj4
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component AnalysisMason Ziemer
 
Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in StocksApplying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in StocksLawrence Takeuchi
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paperprreiya
 
Finding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By QuadtreeFinding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By Quadtreeiosrjce
 
(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot LearningMasahiro Suzuki
 
Mining Regular Patterns in Data Streams Using Vertical Format
Mining Regular Patterns in Data Streams Using Vertical FormatMining Regular Patterns in Data Streams Using Vertical Format
Mining Regular Patterns in Data Streams Using Vertical FormatCSCJournals
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalAlexander Decker
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalAlexander Decker
 

Similar to Applying predictive data mining and associative rule mining to spatio-temporal data (20)

Introduction to VTK
Introduction to VTKIntroduction to VTK
Introduction to VTK
 
Extracting ocean
Extracting oceanExtracting ocean
Extracting ocean
 
Application of panel data to the effect of five (5) world development indicat...
Application of panel data to the effect of five (5) world development indicat...Application of panel data to the effect of five (5) world development indicat...
Application of panel data to the effect of five (5) world development indicat...
 
Heptagonal Fuzzy Numbers by Max Min Method
Heptagonal Fuzzy Numbers by Max Min MethodHeptagonal Fuzzy Numbers by Max Min Method
Heptagonal Fuzzy Numbers by Max Min Method
 
Bay's marko chain
Bay's marko chainBay's marko chain
Bay's marko chain
 
Vector and Matrix operationsVector and Matrix operations
Vector and Matrix operationsVector and Matrix operationsVector and Matrix operationsVector and Matrix operations
Vector and Matrix operationsVector and Matrix operations
 
Topological Data Analysis of Complex Spatial Systems
Topological Data Analysis of Complex Spatial SystemsTopological Data Analysis of Complex Spatial Systems
Topological Data Analysis of Complex Spatial Systems
 
Student_Garden_geostatistics_course
Student_Garden_geostatistics_courseStudent_Garden_geostatistics_course
Student_Garden_geostatistics_course
 
Student_Garden_geostatistics_course
Student_Garden_geostatistics_courseStudent_Garden_geostatistics_course
Student_Garden_geostatistics_course
 
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
 
Statistics Questions to Answer.doc.rtf2Note An Excel Wor.docx
Statistics Questions to Answer.doc.rtf2Note An Excel Wor.docxStatistics Questions to Answer.doc.rtf2Note An Excel Wor.docx
Statistics Questions to Answer.doc.rtf2Note An Excel Wor.docx
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component Analysis
 
Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in StocksApplying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paper
 
Ca36464468
Ca36464468Ca36464468
Ca36464468
 
Finding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By QuadtreeFinding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By Quadtree
 
(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning
 
Mining Regular Patterns in Data Streams Using Vertical Format
Mining Regular Patterns in Data Streams Using Vertical FormatMining Regular Patterns in Data Streams Using Vertical Format
Mining Regular Patterns in Data Streams Using Vertical Format
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
 

Applying predictive data mining and associative rule mining to spatio-temporal data

  • 1.  Abstract—The intention of this paper is to present a model to apply predictive data mining and Associative rule mining to the ever changing spatio-temporal data. As the data is highly varying, there is immense research importance to the models that handle the highly variable spatio-temporal data efficiently. The main issues are the management of large volume of data and the amount of calculations to be done to deal with such a data. We assume a general problem involving n points in a u-dimensional space where each point having a feature vector of v-dimensions denoting it’s properties. I. INTRODUCTION ssociactive rule mining is the discovery of association rules showing attribute value conditions that occur frequently together in a given data. The associative rule is represented as A=>B which means that all the tuples in the database that satisfy the conditions in A are likely to satisfy the conditions in B where A and B are set of attributes. A=A1^A2^A3^…Am and B=B1^B2^B3^….Bn For example age( X,”4..9” ) => buys( X,”chocolate” ) ^ plays( X,”cricket” ) (support=3%,confidence =50%) means that among all the people under study, 3% people are of age between 4 and 9. And it is 50% probable that people of this age buy chocolates and play cricket. Predictive data mining analyses current and historical facts to determine the fate of future event. We may also need predict the class to which the given object belongs and estimate its value depending upon the past experiences. is a template for Microsoft Word versions 6.0 or later. This work was supported in as a part of semester project by the Indian Institute Of Information Technology-Allahabad. 
We were under the consistent mentorship of Dr.O.P.Vyas (DAAD Fellow (Technical University of Kaiserslautern - Germany) AOTS Fellow (CICC- Japan) ) Polisetti Vinay is a under graduate student of Indian Institute of Information Technology pursuing in the Information Technology stream (e-mail:vinaychitti@gmail.com, iit2007106@iiita.ac.in, phone: 979-350-6423) P.Sai Krishna Reddy is a under graduate student of Indian Institute of Information Technology pursuing in the Information Technology stream(e-mail:saikrishna.3490@gmail.com, phone: 979-523-5237) Anirudh Perugu is a under graduate student of Indian Institute of Information Technology pursuing in the Information Technology stream (e-mail:akoolstud@gmail.com, iit2007060@iiita.ac.in, phone: 903-206-1867) II. PROBLEM ADRESED Suppose we have “n” points (nodes) in a plane of u- dimensional space. Let each point be having a feature vector of v-dimensions. As the spatial properties of points are their location in space and the temporal co-ordinates are the feature vector. With the progress of time, the points are changing their positions in d-dimension plane along with their feature vector. We need to find a mechanismto mine the data which is varying both in spatial and temporal co- ordinates. III. REPRESENTATION OF DATA Suppose we denote the spatial co-ordinate of any point as a vector  udddd ,...,, 321 and the feature vector of any point is represented as  vtttt ,...,, 321 .As all these values are real numbers we maintain only a single vector of (u+v) dimensions for each point called combined vector.In the combined vector first u co-ordinates are spatial and next v co-ordinates are temporal (features).We also maintain a check board on which all the points are mapped. Let the number of points be k2 2 (i,e n= k2 2 where k  2) .We can map the points to the check board of size( k 2 * k 2 ). Suppose if we have 16 points in plane we can then map it with check board of size (4*4) as in fig1. 
At each block we store the combined vector of the corresponding node.

Fig. 1. A model check board for 16 points:
N1 N2 N3 N4
N5 N6 N7 N8
N9 N10 N11 N12
N13 N14 N15 N16

We repeat this mapping from instant to instant and obtain snapshots of the data over time, so that we get a different snapshot of the check board at each time.

IV. MANIPULATION OF DATA

We now convert the check boards of the previous step into a matrix called the motley matrix, which is the data structure used in mining the spatio-temporal data.

Predictive & Association Rule Mining of Spatio-Temporal Data
Polisetti Vinay, P. Sai Krishna Reddy, Anirudh Perugu, Dr. O. P. Vyas
Indian Institute of Information Technology-Allahabad
(iit2007106, iit2007081, iec2007048)@iiita.ac.in, dropvyas@gmail.com

We use the following notation to produce a motley matrix from any two consecutive snapshots of the check board, replacing each block of the check board with a color: points that change only spatially are colored blue, points that change only their temporal coordinates are colored green, points that change both spatially and temporally are colored red, and points that do not change are colored white. For example, from time $T_0$ (snapshot of the first check board) to $T_1$ (snapshot of the second check board), a node that changes spatially but not temporally is colored blue, and we do this for all the points. We finally obtain a colored check board in which each node is replaced by the color corresponding to the change it underwent in the transition from $T_0$ to $T_1$. We then divide this colored check board into four equal quadrants, divide each quadrant into four further equal quadrants, and so on recursively, until every block contains a single color. The figure thus obtained is called the motley matrix.

Fig. 2. Model of a possible motley matrix [2].

In Fig. 2 the board has been divided recursively into four equal quadrants until every block has a single color. From $T_0$ to $T_1$ we get one check board and thus one motley matrix; similarly, from $T_1$ to $T_2$ we get another check board and another motley matrix. The process continues, producing one motley matrix per instant, and we use these motley matrices to mine our data.

V. CONSTRUCTION OF OPTIONAL QUAD TREE

We now need an efficient data structure to emulate the motley matrix of the previous section. We use a Multi Version Optional Quad Tree (MVOQT) to represent the motley matrix, together with the following convention for converting the motley matrix into a tree. Because each level of the tree halves the side of a block, the tree over a $2^k \times 2^k$ board has depth at most $k = \log_4 n$, so any block can be reached from the root in $O(\log n)$ time.
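The coloring rule of Section IV and the recursive subdivision that the optional quad tree mirrors can be sketched in Python as below. This is a minimal sketch under stated assumptions: a part of the combined vector counts as changed only if it differs exactly; the child order 1 to 4 is taken to be top-left, top-right, bottom-left, bottom-right (the convention of Fig. 3 is not recoverable from the text); and one tree is built per pair of snapshots, without the version sharing of a true multi-version structure [1]. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

BLUE, GREEN, RED, WHITE = "blue", "green", "red", "white"


def change_color(before: Sequence[float], after: Sequence[float], u: int) -> str:
    """Color one node from its combined vectors at T_i and T_(i+1).

    The first u entries are spatial, the remaining v are features; exact
    equality is assumed when deciding whether a part has changed.
    """
    spatial = list(before[:u]) != list(after[:u])
    temporal = list(before[u:]) != list(after[u:])
    if spatial and temporal:
        return RED
    if spatial:
        return BLUE
    if temporal:
        return GREEN
    return WHITE


@dataclass
class QuadNode:
    color: Optional[str] = None  # set only on leaves (uniform blocks)
    children: List["QuadNode"] = field(default_factory=list)  # empty on leaves


def build_quadtree(colors: List[List[str]], row: int = 0, col: int = 0,
                   size: Optional[int] = None) -> QuadNode:
    """Split a 2^k x 2^k color board recursively until each block has one color."""
    if size is None:
        size = len(colors)
    if size == 0:
        raise ValueError("empty board")
    block = {colors[r][c] for r in range(row, row + size) for c in range(col, col + size)}
    if len(block) == 1:
        return QuadNode(color=block.pop())
    half = size // 2
    # Assumed child order 1..4: top-left, top-right, bottom-left, bottom-right.
    return QuadNode(children=[
        build_quadtree(colors, row, col, half),
        build_quadtree(colors, row, col + half, half),
        build_quadtree(colors, row + half, col, half),
        build_quadtree(colors, row + half, col + half, half),
    ])


def motley_tree(board_t0, board_t1, u: int) -> QuadNode:
    """Color every cell from two consecutive check boards, then build the tree."""
    colors = [[change_color(a, b, u) for a, b in zip(row0, row1)]
              for row0, row1 in zip(board_t0, board_t1)]
    return build_quadtree(colors)


def count_leaves(node: QuadNode) -> int:
    """Number of uniform blocks in the motley matrix."""
    return 1 if not node.children else sum(count_leaves(c) for c in node.children)
```

Uniform quadrants become single leaves, so a block of identically colored nodes is reached in one step; only quadrants that mix colors are split further.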
Fig. 3. Convention used to construct the MVOQT (quadrants numbered 1 to 4).

The root of the tree represents the entire undivided motley matrix. When the motley matrix is divided into four equal quadrants, we add four children to the root, in the order shown in Fig. 3. If a quadrant is divided further, we add children to the corresponding node of the tree. The tree for the motley matrix above is as follows.

Fig. 3. Optional quad tree for the motley matrix of Fig. 2 [2].

After constructing the quad tree we can use it in many ways to answer queries. We needed an effective data structure to represent the motley matrix, with the requirement that the entire matrix could be traversed in the minimum possible time; a tree with a considerable branching factor (4 in the case of the MVOQT) meets this requirement. We use the motley matrix and the MVOQT to find patterns in the given data, and we intend to perform classification, association rule mining and predictive data mining to answer the most probable queries.

VI. CLASSIFICATION RULE MINING

We now traverse the tree to mine the data. While building the motley matrix we checked whether each point had changed spatially, temporally, both or not at all. This can be done as follows. Repeatedly generate two random numbers $r_1$ and $r_2$, where $0 < r_1 \le u$ and $u < r_2 \le u + v$. Writing $N_1^{T_i}(r)$ for the $r$-th coordinate of the combined vector of point $N_1$ at instant $T_i$, if $N_1^{T_i}(r_1) \neq N_1^{T_{i+1}}(r_1)$ then we assume that point $N_1$ has changed spatially from $T_i$ to $T_{i+1}$; the same test with $r_2$ detects a temporal change. Continuing in this way we learn which points changed spatially, temporally, both or not at all.

This indirectly gives us a classification: we know which points change spatially at which instant, and from there we can analyze patterns, knowing which points change at what rate. The method is efficient when the interval between two consecutive snapshots is kept small, because few spatial changes then occur between snapshots and they can be tracked easily.

VII. ASSOCIATIVE RULE MINING

After building the MVOQT we ship the data to four linearly accessible data structures (for example, vectors), as follows.

Fig. 4. Four linearly accessible data structures, each with slots labeled $T_0$ to $T_4$: (a) spatial vector, (b) temporal vector, (c) spatio-temporal vector, (d) null vector.

Once the MVOQT is built, we ship the data to the corresponding linear data structure from instant to instant. All points that changed only spatially from $T_0$ to $T_1$ are stored in the spatial vector under the label $T_0$; in general, points that change only spatially from $T_i$ to $T_{i+1}$ are stored in the spatial vector under the label $T_i$. The other vectors are filled in the same way. For example, for two consecutive instants $T_0$ and $T_1$, the entries in the spatial vector are as follows.

Fig. 5. Entries in the spatial vector: (a) at $T_0$; (b) at $T_1$.

By observing five or six such instances of the spatial vector (and of the other vectors), we can frame rules to mine the data.
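The random-coordinate test of Section VI, the shipping step of Section VII, and the support and confidence of a simple rule over the labeled slots can be sketched together in Python. Assumptions, none taken from the paper: each slot $T_i$ of a vector is treated as one transaction; the rule only checks whether two nodes appear together in the spatial vector (the coordinate-level conditions of the example rules are not modelled); and the number of random probes, `trials`, is a hypothetical parameter. Because only sampled coordinates are compared, a change confined to unsampled coordinates can be missed, which is the trade-off the random test makes.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Snapshot = Dict[str, List[float]]  # node id -> combined vector (u spatial + v features)
VECTORS = ("spatial", "temporal", "spatio-temporal", "null")


def sampled_change(before: Sequence[float], after: Sequence[float], u: int,
                   rng: random.Random, trials: int = 3) -> Tuple[bool, bool]:
    """Probe random r1 in [1, u] and r2 in [u+1, u+v]; return (spatial, temporal)."""
    v = len(before) - u
    spatial = temporal = False
    for _ in range(trials):
        r1 = rng.randint(1, u)          # 1-based spatial index, as in the text
        r2 = rng.randint(u + 1, u + v)  # 1-based feature index
        spatial = spatial or before[r1 - 1] != after[r1 - 1]
        temporal = temporal or before[r2 - 1] != after[r2 - 1]
    return spatial, temporal


def ship(snapshots: Sequence[Snapshot], u: int, seed: int = 0) -> Dict[str, List[Set[str]]]:
    """Fill the four vectors; slot i holds each node's change class from T_i to T_(i+1)."""
    rng = random.Random(seed)
    vectors: Dict[str, List[Set[str]]] = {name: [] for name in VECTORS}
    for t0, t1 in zip(snapshots, snapshots[1:]):
        slots: Dict[str, Set[str]] = {name: set() for name in VECTORS}
        for node, before in t0.items():
            s, t = sampled_change(before, t1[node], u, rng)
            if s and t:
                slots["spatio-temporal"].add(node)
            elif s:
                slots["spatial"].add(node)
            elif t:
                slots["temporal"].add(node)
            else:
                slots["null"].add(node)
        for name in VECTORS:
            vectors[name].append(slots[name])
    return vectors


def rule_stats(slots: List[Set[str]], antecedent: str, consequent: str) -> Tuple[float, float]:
    """Support and confidence of 'antecedent in slot => consequent in slot'."""
    n = len(slots)
    with_a = sum(1 for s in slots if antecedent in s)
    with_both = sum(1 for s in slots if antecedent in s and consequent in s)
    support = with_both / n if n else 0.0
    confidence = with_both / with_a if with_a else 0.0
    return support, confidence
```

With the seed fixed, `ship` is deterministic; `rule_stats(vectors["spatial"], "N4", "N3")` then reports how often both nodes changed only spatially (support) and how often N3 did so when N4 did (confidence).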
From the two spatial-vector instances of Fig. 5 we can see that, whenever both node 3 and node 4 change spatially and node 4 doubles its coordinates $\langle d_1, d_2 \rangle$, node 3 also changes spatially and doubles only its coordinates $\langle d_1, d_2 \rangle$, while the rest do not change. We express this as the associative rule $\mathrm{Node}(N_4, d_1, d_2) \Rightarrow \mathrm{Node}(N_3, d_1, d_2)$. We can also observe that when node 4 does not change its $d_3$, $d_4$ and $d_5$ coordinate values, neither does node 3; writing $!$ for an unchanged coordinate, this gives $\mathrm{Node}(N_4, !d_3, !d_4, !d_5) \Rightarrow \mathrm{Node}(N_3, !d_3, !d_4, !d_5)$.

As time passes we keep building MVOQTs and filling the entries of the respective vectors, and we look for patterns in those vectors. The main advantage of the MVOQT is that a group of nodes with similar properties can be reached in a single step, since one leaf of the tree covers a whole block of nodes that underwent the same kind of change. We can compare different motley matrices and can also predict the class to which a given node belongs. For example, if node $N_1$ is green in motley matrix 1, in motley matrix 2 and in motley matrix 3, then we predict that $N_1$ will be green in the 4th motley matrix with a probability of 0.75. Similarly, by analyzing particular spatial or temporal coordinate values across different motley matrices, we can also predict the value of a particular spatial or temporal coordinate.

VIII. CONCLUSION

We conclude that by using three data structures, namely the motley matrix, the Multi Version Optional Quad Tree (MVOQT), and a linearly accessible data structure (such as a vector or an array of arrays) to which the data is shipped, we can effectively reduce the number of manipulations needed to reach a node with a specified property.

ACKNOWLEDGMENT

Polisetti Vinay thanks Prof. Dr. O. P. Vyas for his consistent encouragement and for motivating the team in all aspects right from the beginning, and also thanks Mr. Dheeraj Tyagi for his wholehearted help throughout in presenting this paper.

REFERENCES

[1] T. Tzouramanis, M. Vassilakopoulos, and Y. Manolopoulos, "Multiversion Linear Quadtree for Spatio-Temporal Data," Data Engineering Lab, Department of Informatics, Aristotle University.
[2] T.-S. Yeh and B. de Cambray, "Modeling Highly Variable Spatio-Temporal Data," PRISM Laboratory (CNRS, Universities of Paris VI and Versailles-St Quentin) & GDR Cassini.
[3] M. J. Egenhofer, "What's special about spatial? Database requirements for vehicle navigation in geographic space," in Proc. ACM SIGMOD, May 1993, pp. 398-402.
[4] M. F. Mokbel, X. Xiong, M. A. Hammad, and W. G. Aref, "Continuous Query Processing of Spatio-temporal Data Streams in PLACE," Department of Computer Sciences, Purdue University, West Lafayette, IN 47907-1398.
[5] J. Han and M. Kamber, Data Mining: Concepts and Techniques.
[6] E. Mesrobian, R. Muntz, E. Shek, J. R. Santos, J. Yi, K. Ng, S. Y. Chien, C. Mechoso, J. Farrara, P. Stolorz, and H. Nakamura, "Exploratory data mining and analysis using CONQUEST," in Proc. IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, IEEE, New York, 1995, pp. 281-286.
[7] K. Koperski and J. Han, "Discovery of Spatial Association Rules in Geographic Information Databases," in Proc. Fourth International Symposium on Large Spatial Databases, Maine, 1995, pp. 47-6