SlideShare a Scribd company logo
1 of 2
Download to read offline
A Methodology for Classifying Visitors to an Amusement Park
VAST Challenge 2015 - Mini-Challenge 1
Gustavo Dejean *, Departamento de Computacion
Universidad Nacional del Oeste - San Antonio de Padua, Pcia. de Buenos Aires - Argentina.
ABSTRACT
The main contribution of this work is showing how to obtain
a classification of visitors to an amusement park by using
cluster analysis and visualization techniques. The selection
of variables for K-means algorithm and the results obtained
are visually analyzed in dispersion graphs according to their
Principal Components, in boxplots and in a Linear Model
so as to fine-tune a result that can explain differences
between the groups and similarities within each group.
Keywords: visual analysis, vast challenge, data mining,
clustering, K-means, cluster analysis.
1 INTRODUCTION
Frequently, clustering algorithms identity clusters that do not exist
in reality [I] or trivial clusters that are not relevant for
researcher's goals. That is why, it is important to be able to
visualize results. This is attained both with 2D and 3D diagrams
with their first Principal Components (PC), as well as with other
types of diagrams.
2 THEORY
2.1 Material
Three archives of moves were used, plus a fourth one that
completed Friday's missing records. Overall, they contain
26,021,945 records. Each record accounts for a move or a check­
in from a visitor at the park on a specific day and time. Archives
are included in [2]. Tools that were used include: Postgresql 8.4
database, IBM SPSS Statics v22; JMP II (SW), and Tableau 8.2.
2.2 Methods
The process used for reaching to a final cluster may be divided
into four stages, that were executed concurrently with different
intensities, according to the progress of the work. These four
stages were: I) Creation of new variables; 2) Creation of models
and visualization; 3) Principal Components (PC); 4) K-means
and cluster visualization.
2.2.1 Creation of New Variables
Approximately 64 new variables were created. Some of them
were instrumental for developing a Linear Model that enabled
visualizing and understanding the data and outcomes that were
obtained. Table 1 describes the main variables that were created.
2.2.2 Creation of Models and Visualization
The Linear Model (LM) exhibited in Figure 1 was instrumental
for visualizing the outcome of the final clustering.
* email: dejean2010@gmail.com
IEEE Conference on Visual Analytics Science and Technology 2015
October 25-30, Chicago, II, USA
978-1-4673-9783-4/15/$31.00 ©2015 IEEE
The LM shows a linear correlation between the number of moves
vs. the amount of check-ins in the games (for every ID), yielding
R2 = 0,848. An interesting observation of the LM is that, in
general, each point represents a group of visitors. That is, each
group of persons entering the park through the same door and at
the same time was represented in the LM by points that were
overlapping or very close to one another. It will always be
interesting visualizing any possible outcome from a clustering
process by using this Model.
Figure 1: Linear Model. counCol movement vs.
counCof
_
check-in.
151
152
By observing the outlier points in the LM, 39 IDs with zero
check-in in the games were identified. These IDs were grouped in
Cluster 0; and they were not taken into account when applying the
K-means algorithm nor in any other LM tunings, nor in the
analysis of Pc. In Cluster 0, this would be represented mainly by
the park employees. The LM without outliers remained with a R2
=0,854.
2.2.3 Principal Components
In the case of the study, as the dimensionality of the problem was
greater than three, the PC were used for reducing it and, thus, for
visualizing the outcomes in 2D or 3D graphs, by using the first
two or three PC.
2.2.4 K-means and cluster visualization
An important point in this stage is determining which is the
appropriate amount for visitor classification. The CCC criterion
was used: Sarle's Cubic Clustering Criterion [3] taking the first
peak with CCC >3. The other important point is selecting the
variables for the input of the K-means algorithm that may yield a
better grouping.
3 RESULTS
With the described procedure, for a subset of variables, a K = 6
was obtained. Thus, in the final result, 7 clusters are obtained.
(N.B. Bear in mind that cluster 0 had been created), see figure 2.
Figure 2: Distribution of each Cluster, Cluster a is not shown in
the histogram.
The interpretation of each group was done by using boxplots, as
shown in Figures 3. For example, when comparing the cluster, it
was discovered that cluster 1 and cluster 3 show preferences for
Thill Rides games and rejection for Kidder Rides. Similarly, with
the other variables, in general, relevant differences were obtained
between the clusters, thus accounting for the classification that
was obtained.
Another interesting visualization of the results is shown in
Figure 4, where the LM is used for visualizing each cluster. It
should be noted, for example, that it is easy to read and detect
which clusters have compulsive players and which have very low
profile players, among other features. Overall, each cluster has a
different straight line tangent. The number of groups in each
cluster was roughly obtained with a SQL query grouped per time
of entry and per entry point.
Figure 3: Preferences for Thill Rides and rejection for Kidder
Rides for cluster 1 and 3
Figure 4: Visualization of Clusters in the LM Rides
4 CONCLUSION
In the resulting classification, it was possible to verify the
existence of similarities between the visitors of each cluster and
the differences between different clusters, mainly using boxplots,
but also leveraging interpretation with the LM visualization.
ACKNOWLEDGMENTS
To the authorities of Universidad Nacional del Oeste and
specially to Mg Antonio Fotti and Lie, Ariel Aizemberg for
enabling the research task.
REFERENCES
[1] Johnson Dallas E., Metodos Multivariados Aplicados al
analisis de Datos, page 321. Mexico, International Thomson
Editores, 2000.
[2] http://www.vacommunity.org/2015+VAST+Challenge
%3A+MCI
[3] Sarle, W.S., The CuvicClustering Criterion, SAS Technical
Report A-I08, Cary NC: SAS Institute, Inc., 1983.

More Related Content

What's hot

IRJET- An Efficient Reverse Converter for the Three Non-Coprime Moduli Set {4...
IRJET- An Efficient Reverse Converter for the Three Non-Coprime Moduli Set {4...IRJET- An Efficient Reverse Converter for the Three Non-Coprime Moduli Set {4...
IRJET- An Efficient Reverse Converter for the Three Non-Coprime Moduli Set {4...IRJET Journal
 
Spme 2013 segmentation
Spme 2013 segmentationSpme 2013 segmentation
Spme 2013 segmentationQujiang Lei
 
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]Mumbai B.Sc.IT Study
 
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...MLAI2
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reductionMarco Quartulli
 
facility layout paper
 facility layout paper facility layout paper
facility layout paperSaurabh Tiwary
 
Dynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansDynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansWrishin Bhattacharya
 
A New Key Stream Generator Based on 3D Henon map and 3D Cat map
A New Key Stream Generator Based on 3D Henon map and 3D Cat mapA New Key Stream Generator Based on 3D Henon map and 3D Cat map
A New Key Stream Generator Based on 3D Henon map and 3D Cat maptayseer Karam alshekly
 
Edge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsEdge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsMLAI2
 
Ijcga1
Ijcga1Ijcga1
Ijcga1ijcga
 
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...ArchiLab 7
 
Analysis of CANADAIR CL-215 retractable landing gear.
Analysis of CANADAIR CL-215 retractable landing gear.Analysis of CANADAIR CL-215 retractable landing gear.
Analysis of CANADAIR CL-215 retractable landing gear.Nagesh NARASIMHA PRASAD
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin Rshanelynn
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionSaad Elbeleidy
 
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...IOSR Journals
 

What's hot (18)

D044042432
D044042432D044042432
D044042432
 
IRJET- An Efficient Reverse Converter for the Three Non-Coprime Moduli Set {4...
IRJET- An Efficient Reverse Converter for the Three Non-Coprime Moduli Set {4...IRJET- An Efficient Reverse Converter for the Three Non-Coprime Moduli Set {4...
IRJET- An Efficient Reverse Converter for the Three Non-Coprime Moduli Set {4...
 
Spme 2013 segmentation
Spme 2013 segmentationSpme 2013 segmentation
Spme 2013 segmentation
 
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
 
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
 
Isam2_v1_2
Isam2_v1_2Isam2_v1_2
Isam2_v1_2
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
 
facility layout paper
 facility layout paper facility layout paper
facility layout paper
 
Deep MIML Network
Deep MIML NetworkDeep MIML Network
Deep MIML Network
 
Dynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansDynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c means
 
A New Key Stream Generator Based on 3D Henon map and 3D Cat map
A New Key Stream Generator Based on 3D Henon map and 3D Cat mapA New Key Stream Generator Based on 3D Henon map and 3D Cat map
A New Key Stream Generator Based on 3D Henon map and 3D Cat map
 
Edge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsEdge Representation Learning with Hypergraphs
Edge Representation Learning with Hypergraphs
 
Ijcga1
Ijcga1Ijcga1
Ijcga1
 
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
 
Analysis of CANADAIR CL-215 retractable landing gear.
Analysis of CANADAIR CL-215 retractable landing gear.Analysis of CANADAIR CL-215 retractable landing gear.
Analysis of CANADAIR CL-215 retractable landing gear.
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
 

Similar to A Methodology for Classifying Visitors to an Amusement Park VAST Challenge 2015 - Mini-Challenge 1

IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...
IRJET-  	  Finding Dominant Color in the Artistic Painting using Data Mining ...IRJET-  	  Finding Dominant Color in the Artistic Painting using Data Mining ...
IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...IRJET Journal
 
IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET Journal
 
Application of pattern recognition techniques in banknote classification
Application of pattern recognition techniques in banknote classificationApplication of pattern recognition techniques in banknote classification
Application of pattern recognition techniques in banknote classificationNikos Mouzakitis
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniquestalktoharry
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2IAEME Publication
 
Mat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataMat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataKathleneNgo
 
IRJET- Crowd Density Estimation using Image Processing
IRJET- Crowd Density Estimation using Image ProcessingIRJET- Crowd Density Estimation using Image Processing
IRJET- Crowd Density Estimation using Image ProcessingIRJET Journal
 
Proposed algorithm for image classification using regression-based pre-proces...
Proposed algorithm for image classification using regression-based pre-proces...Proposed algorithm for image classification using regression-based pre-proces...
Proposed algorithm for image classification using regression-based pre-proces...IJECEIAES
 
Control chart pattern recognition using k mica clustering and neural networks
Control chart pattern recognition using k mica clustering and neural networksControl chart pattern recognition using k mica clustering and neural networks
Control chart pattern recognition using k mica clustering and neural networksISA Interchange
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...sipij
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsCSCJournals
 
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATION
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATIONFUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATION
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATIONijcsit
 
A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...
A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...
A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...IRJET Journal
 
Image Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansImage Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansEditor IJCATR
 
A Study in Employing Rough Set Based Approach for Clustering on Categorical ...
A Study in Employing Rough Set Based Approach for Clustering  on Categorical ...A Study in Employing Rough Set Based Approach for Clustering  on Categorical ...
A Study in Employing Rough Set Based Approach for Clustering on Categorical ...IOSR Journals
 
IMAGE SEGMENTATION AND ITS TECHNIQUES
IMAGE SEGMENTATION AND ITS TECHNIQUESIMAGE SEGMENTATION AND ITS TECHNIQUES
IMAGE SEGMENTATION AND ITS TECHNIQUESIRJET Journal
 
Clustering of Big Data Using Different Data-Mining Techniques
Clustering of Big Data Using Different Data-Mining TechniquesClustering of Big Data Using Different Data-Mining Techniques
Clustering of Big Data Using Different Data-Mining TechniquesIRJET Journal
 

Similar to A Methodology for Classifying Visitors to an Amusement Park VAST Challenge 2015 - Mini-Challenge 1 (20)

IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...
IRJET-  	  Finding Dominant Color in the Artistic Painting using Data Mining ...IRJET-  	  Finding Dominant Color in the Artistic Painting using Data Mining ...
IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...
 
IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature Descriptor
 
Application of pattern recognition techniques in banknote classification
Application of pattern recognition techniques in banknote classificationApplication of pattern recognition techniques in banknote classification
Application of pattern recognition techniques in banknote classification
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniques
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2
 
Mat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataMat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports Data
 
IRJET- Crowd Density Estimation using Image Processing
IRJET- Crowd Density Estimation using Image ProcessingIRJET- Crowd Density Estimation using Image Processing
IRJET- Crowd Density Estimation using Image Processing
 
Proposed algorithm for image classification using regression-based pre-proces...
Proposed algorithm for image classification using regression-based pre-proces...Proposed algorithm for image classification using regression-based pre-proces...
Proposed algorithm for image classification using regression-based pre-proces...
 
Control chart pattern recognition using k mica clustering and neural networks
Control chart pattern recognition using k mica clustering and neural networksControl chart pattern recognition using k mica clustering and neural networks
Control chart pattern recognition using k mica clustering and neural networks
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
FREE- REFERENCE IMAGE QUALITY ASSESSMENT FRAMEWORK USING METRICS FUSION AND D...
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster Results
 
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATION
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATIONFUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATION
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATION
 
Vivarana literature survey
Vivarana literature surveyVivarana literature survey
Vivarana literature survey
 
A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...
A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...
A Hybrid Data Clustering Approach using K-Means and Simplex Method-based Bact...
 
Image Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansImage Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K Means
 
A Study in Employing Rough Set Based Approach for Clustering on Categorical ...
A Study in Employing Rough Set Based Approach for Clustering  on Categorical ...A Study in Employing Rough Set Based Approach for Clustering  on Categorical ...
A Study in Employing Rough Set Based Approach for Clustering on Categorical ...
 
IMAGE SEGMENTATION AND ITS TECHNIQUES
IMAGE SEGMENTATION AND ITS TECHNIQUESIMAGE SEGMENTATION AND ITS TECHNIQUES
IMAGE SEGMENTATION AND ITS TECHNIQUES
 
Clustering of Big Data Using Different Data-Mining Techniques
Clustering of Big Data Using Different Data-Mining TechniquesClustering of Big Data Using Different Data-Mining Techniques
Clustering of Big Data Using Different Data-Mining Techniques
 

Recently uploaded

Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...Delhi Call girls
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxAniqa Zai
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...vershagrag
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 

Recently uploaded (20)

Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
 
Call Girls in G.T.B. Nagar (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in G.T.B. Nagar  (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in G.T.B. Nagar  (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in G.T.B. Nagar (delhi) call me [🔝9953056974🔝] escort service 24X7
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 

A Methodology for Classifying Visitors to an Amusement Park VAST Challenge 2015 - Mini-Challenge 1

  • 1. A Methodology for Classifying Visitors to an Amusement Park VAST Challenge 2015 - Mini-Challenge 1 Gustavo Dejean *, Departamento de Computacion Universidad Nacional del Oeste - San Antonio de Padua, Pcia. de Buenos Aires - Argentina. ABSTRACT The main contribution of this work is showing how to obtain a classification of visitors to an amusement park by using cluster analysis and visualization techniques. The selection of variables for K-means algorithm and the results obtained are visually analyzed in dispersion graphs according to their Principal Components, in boxplots and in a Linear Model so as to fine-tune a result that can explain differences between the groups and similarities within each group. Keywords: visual analysis, vast challenge, data mining, clustering, K-means, cluster analysis. 1 INTRODUCTION Frequently, clustering algorithms identity clusters that do not exist in reality [I] or trivial clusters that are not relevant for researcher's goals. That is why, it is important to be able to visualize results. This is attained both with 2D and 3D diagrams with their first Principal Components (PC), as well as with other types of diagrams. 2 THEORY 2.1 Material Three archives of moves were used, plus a fourth one that completed Friday's missing records. Overall, they contain 26,021,945 records. Each record accounts for a move or a check­ in from a visitor at the park on a specific day and time. Archives are included in [2]. Tools that were used include: Postgresql 8.4 database, IBM SPSS Statics v22; JMP II (SW), and Tableau 8.2. 2.2 Methods The process used for reaching to a final cluster may be divided into four stages, that were executed concurrently with different intensities, according to the progress of the work. These four stages were: I) Creation of new variables; 2) Creation of models and visualization; 3) Principal Components (PC); 4) K-means and cluster visualization. 2.2.1 Creation of New Variables Approximately 64 new variables were created. Some of them were instrumental for developing a Linear Model that enabled visualizing and understanding the data and outcomes that were obtained. Table 1 describes the main variables that were created. 2.2.2 Creation of Models and Visualization The Linear Model (LM) exhibited in Figure 1 was instrumental for visualizing the outcome of the final clustering. * email: dejean2010@gmail.com IEEE Conference on Visual Analytics Science and Technology 2015 October 25-30, Chicago, II, USA 978-1-4673-9783-4/15/$31.00 ©2015 IEEE The LM shows a linear correlation between the number of moves vs. the amount of check-ins in the games (for every ID), yielding R2 = 0,848. An interesting observation of the LM is that, in general, each point represents a group of visitors. That is, each group of persons entering the park through the same door and at the same time was represented in the LM by points that were overlapping or very close to one another. It will always be interesting visualizing any possible outcome from a clustering process by using this Model. Figure 1: Linear Model. counCol movement vs. counCof _ check-in. 151
  • 2. 152 By observing the outlier points in the LM, 39 IDs with zero check-in in the games were identified. These IDs were grouped in Cluster 0; and they were not taken into account when applying the K-means algorithm nor in any other LM tunings, nor in the analysis of Pc. In Cluster 0, this would be represented mainly by the park employees. The LM without outliers remained with a R2 =0,854. 2.2.3 Principal Components In the case of the study, as the dimensionality of the problem was greater than three, the PC were used for reducing it and, thus, for visualizing the outcomes in 2D or 3D graphs, by using the first two or three PC. 2.2.4 K-means and cluster visualization An important point in this stage is determining which is the appropriate amount for visitor classification. The CCC criterion was used: Sarle's Cubic Clustering Criterion [3] taking the first peak with CCC >3. The other important point is selecting the variables for the input of the K-means algorithm that may yield a better grouping. 3 RESULTS With the described procedure, for a subset of variables, a K = 6 was obtained. Thus, in the final result, 7 clusters are obtained. (N.B. Bear in mind that cluster 0 had been created), see figure 2. Figure 2: Distribution of each Cluster, Cluster a is not shown in the histogram. The interpretation of each group was done by using boxplots, as shown in Figures 3. For example, when comparing the cluster, it was discovered that cluster 1 and cluster 3 show preferences for Thill Rides games and rejection for Kidder Rides. Similarly, with the other variables, in general, relevant differences were obtained between the clusters, thus accounting for the classification that was obtained. Another interesting visualization of the results is shown in Figure 4, where the LM is used for visualizing each cluster. It should be noted, for example, that it is easy to read and detect which clusters have compulsive players and which have very low profile players, among other features. Overall, each cluster has a different straight line tangent. The number of groups in each cluster was roughly obtained with a SQL query grouped per time of entry and per entry point. Figure 3: Preferences for Thill Rides and rejection for Kidder Rides for cluster 1 and 3 Figure 4: Visualization of Clusters in the LM Rides 4 CONCLUSION In the resulting classification, it was possible to verify the existence of similarities between the visitors of each cluster and the differences between different clusters, mainly using boxplots, but also leveraging interpretation with the LM visualization. ACKNOWLEDGMENTS To the authorities of Universidad Nacional del Oeste and specially to Mg Antonio Fotti and Lie, Ariel Aizemberg for enabling the research task. REFERENCES [1] Johnson Dallas E., Metodos Multivariados Aplicados al analisis de Datos, page 321. Mexico, International Thomson Editores, 2000. [2] http://www.vacommunity.org/2015+VAST+Challenge %3A+MCI [3] Sarle, W.S., The CuvicClustering Criterion, SAS Technical Report A-I08, Cary NC: SAS Institute, Inc., 1983.