SlideShare a Scribd company logo
UNSUPERVISED LEARNING: SIMILARITIES AND
DISTANCE FUNCTIONS FOR IOT DATA
G. CASOLLA, V. SCHIANO, S. CUOMO, F. GIAMPAOLO, F. PICCIALLI
University of Naples Federico II
INTRODUCTION (1)
In our research we studied visitors’ behaviours inside one of the most
important museum in Italy, the National Archaeological Museum of
Naples, through a deployed IoT framework. It is composed by smart
sensor boards with Bluetooth and Wi-Fi capabilities. The IoT system
is able to track a visitor path by collecting his position and the related
time-of-stay inside the museum rooms [1]. Our dataset relies on about
19 000 unique users behavioural data composed by two features: (i) the
visiting Path (non-numerical data) and (ii) the Time spent (numerical
data).
Classifying unstructured and unlabelled data is a key challenge dealing
with three main research tasks:
• the study and selection of the appropriate similarity and distance
functions,
• the selection of the number of clusters in which partitioning the
data,
• the choice of the most suitable algorithm to achieve an accurate
data clustering.
SIMILARITIES AND DISTANCE FUNCTIONS (3)
(a) - Cosine similarity
scos(S, T; q) =
v(S; q) · v(T; q)
v(S; q) 2 v(T; q) 2
(b) - Jaccard similarity
sJac(S, T; q) =
|Q(S; q) ∩ Q(T; q)|
|Q(S; q) ∪ Q(T; q)|
(c) - Longest Common Substring
dLCS(S, T) =
LCS(n, m)
n + m
where
LCS(i, j) =



max(i, j) if min(i, j) = 0,
LCS(i − 1, j − 1) if S(i) = T(j),
1 + min {LCS(i − 1, j), LCS(i, j − 1)} otherwise
(d) - Generalized Levenshtein distance
dLv(S, T) =
Lv(n, m)
max{n, m}
where
Lv(i, j) =



max(i, j) if min(i, j) = 0,
min



Lv(i − 1, j) + 1
Lv(i, j − 1) + 1
Lv(i − 1, j − 1) + 1(Si=Tj )
otherwise.
(a) (b)
(c) (d)
More in detail, the q-grams based distances
are calculated from q-grams based similari-
ties with the formula:
dx(S, T) = 1 − sx(S, T).
COMPARISON (4)
An ordered F-measure heatmap representing the comparison among all the experimented
clustering methodologies.
THEORY (2)
The Clustering algorithms we considered
are:
• Hierarchical Clustering
• K-Medoids (PAM)
A similarity function is a tool that gives the
strength of the relationship between two or
more data items.
To deal with the non-numerical Path fea-
ture of our dataset, that can be treated as a
string, we have analyzed four strings simi-
larity functions. A string can be defined as
sequence of finite characters from a finite al-
phabet. A q-gram is a string with q consecu-
tive characters. The q-grams from a string S
are realized by collecting in sequences q char-
acters from S and saving the appearing q-
grams. To deal with the numerical Time fea-
ture we have pre-computed the distance ma-
trix by using the well-known Euclidean dis-
tance and then merged with the other strings’
distances with the formula defined as fol-
lows:
Dsum =
K
i=1
ωiD(i)
where D(i)
is a single distance matrix, K is
the total number of distances that we want to
merge and ωi is the inverse of the maximum
element of the i-th distance matrix D(i)
. It’s
shown that, to improve the performance of a
classification, it is better to combine multiple
distances than to use a single distance matrix
[2].
REFERENCES (5)
[1] F. Piccialli, Y. Yoshimura, P. Benedusi, C. Ratti, and
S. Cuomo, Lessons learned from longitudinal mod-
eling of mobile-equipped visitors in a complex mu-
seum. Neural Comp. and App., pp. 1-17, 2019
[2] A. Ibba, R. Duin, and W.-J. Lee, A study on com-
bining sets of differently measured dissimilarities.
ICPR’10, pp. 3360-3363, 2010

More Related Content

What's hot

Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biologyKernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
tuxette
 
Trajectory Mining Icdm 2009
Trajectory Mining Icdm 2009Trajectory Mining Icdm 2009
Trajectory Mining Icdm 2009
Chang Cheng
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
tuxette
 
Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysis
tuxette
 
Fundamental matrix
Fundamental matrixFundamental matrix
Fundamental matrix
Su Yan-Jen
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practice
tuxette
 
YOLACT
YOLACTYOLACT
Multidimensional access methods
Multidimensional access methodsMultidimensional access methods
Multidimensional access methodsunyil96
 
Reproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishReproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfish
tuxette
 
Heterogeneous data fusion with multiple kernel growing self organizing maps
Heterogeneous data fusion with multiple kernel growing self organizing mapsHeterogeneous data fusion with multiple kernel growing self organizing maps
Heterogeneous data fusion with multiple kernel growing self organizing maps
Pruthuvi Maheshakya Wijewardena
 
A spatio-temporal visual analysis tool for historical dictionaries.
A spatio-temporal visual analysis tool for historical dictionaries. A spatio-temporal visual analysis tool for historical dictionaries.
A spatio-temporal visual analysis tool for historical dictionaries.
Technological Ecosystems for Enhancing Multiculturality
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
tuxette
 
Lect12 graph mining
Lect12 graph miningLect12 graph mining
Lect12 graph mining
Houw Liong The
 
An Efficient Arabic Text Spotting from Natural Scenes Images
An Efficient Arabic Text Spotting from Natural Scenes ImagesAn Efficient Arabic Text Spotting from Natural Scenes Images
An Efficient Arabic Text Spotting from Natural Scenes Images
Reham Marzouk
 
Survey on Frequent Pattern Mining on Graph Data - Slides
Survey on Frequent Pattern Mining on Graph Data - SlidesSurvey on Frequent Pattern Mining on Graph Data - Slides
Survey on Frequent Pattern Mining on Graph Data - SlidesKasun Gajasinghe
 
Interaction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsInteraction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and Physics
Ken Kuroki
 
HW2-1_05.doc
HW2-1_05.docHW2-1_05.doc
HW2-1_05.docbutest
 
Parcel Lot Division with cGAN
Parcel Lot Division with cGANParcel Lot Division with cGAN
Parcel Lot Division with cGAN
Matthew To
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
tuxette
 
Orpailleur -- triclustering talk
Orpailleur -- triclustering talkOrpailleur -- triclustering talk
Orpailleur -- triclustering talk
Dmitrii Ignatov
 

What's hot (20)

Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biologyKernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
 
Trajectory Mining Icdm 2009
Trajectory Mining Icdm 2009Trajectory Mining Icdm 2009
Trajectory Mining Icdm 2009
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
 
Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysis
 
Fundamental matrix
Fundamental matrixFundamental matrix
Fundamental matrix
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practice
 
YOLACT
YOLACTYOLACT
YOLACT
 
Multidimensional access methods
Multidimensional access methodsMultidimensional access methods
Multidimensional access methods
 
Reproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishReproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfish
 
Heterogeneous data fusion with multiple kernel growing self organizing maps
Heterogeneous data fusion with multiple kernel growing self organizing mapsHeterogeneous data fusion with multiple kernel growing self organizing maps
Heterogeneous data fusion with multiple kernel growing self organizing maps
 
A spatio-temporal visual analysis tool for historical dictionaries.
A spatio-temporal visual analysis tool for historical dictionaries. A spatio-temporal visual analysis tool for historical dictionaries.
A spatio-temporal visual analysis tool for historical dictionaries.
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
 
Lect12 graph mining
Lect12 graph miningLect12 graph mining
Lect12 graph mining
 
An Efficient Arabic Text Spotting from Natural Scenes Images
An Efficient Arabic Text Spotting from Natural Scenes ImagesAn Efficient Arabic Text Spotting from Natural Scenes Images
An Efficient Arabic Text Spotting from Natural Scenes Images
 
Survey on Frequent Pattern Mining on Graph Data - Slides
Survey on Frequent Pattern Mining on Graph Data - SlidesSurvey on Frequent Pattern Mining on Graph Data - Slides
Survey on Frequent Pattern Mining on Graph Data - Slides
 
Interaction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsInteraction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and Physics
 
HW2-1_05.doc
HW2-1_05.docHW2-1_05.doc
HW2-1_05.doc
 
Parcel Lot Division with cGAN
Parcel Lot Division with cGANParcel Lot Division with cGAN
Parcel Lot Division with cGAN
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Orpailleur -- triclustering talk
Orpailleur -- triclustering talkOrpailleur -- triclustering talk
Orpailleur -- triclustering talk
 

Similar to Unsupervised Learning: Similarities and distance functions for IoT data

Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...
Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...
Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...
ijsrd.com
 
Laplacian-regularized Graph Bandits
Laplacian-regularized Graph BanditsLaplacian-regularized Graph Bandits
Laplacian-regularized Graph Bandits
lauratoni4
 
A probabilistic data encryption scheme (pdes)
A probabilistic data encryption scheme (pdes)A probabilistic data encryption scheme (pdes)
A probabilistic data encryption scheme (pdes)
Alexander Decker
 
From Signal to Symbols
From Signal to SymbolsFrom Signal to Symbols
From Signal to Symbols
gpano
 
ME Synopsis
ME SynopsisME Synopsis
ME Synopsis
Poonam Debnath
 
Learning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RLLearning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RL
lauratoni4
 
Detection of Dense, Overlapping, Geometric Objects
Detection of Dense, Overlapping, Geometric Objects Detection of Dense, Overlapping, Geometric Objects
Detection of Dense, Overlapping, Geometric Objects
gerogepatton
 
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTSDETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
gerogepatton
 
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTSDETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
ijaia
 
Kernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingKernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingIAEME Publication
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Introduction to image processing and pattern recognition
Introduction to image processing and pattern recognitionIntroduction to image processing and pattern recognition
Introduction to image processing and pattern recognition
Saibee Alam
 
T24144148
T24144148T24144148
T24144148
IJERA Editor
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2
University of Salerno
 
Ew4301904907
Ew4301904907Ew4301904907
Ew4301904907
IJERA Editor
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957
IJMER
 
Taxis.key
Taxis.keyTaxis.key
Taxis.key
Oleguer Sagarra
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities inmoresmile
 
Data Structures unit I Introduction - data types
Data Structures unit I Introduction - data typesData Structures unit I Introduction - data types
Data Structures unit I Introduction - data types
AmirthaVarshini80
 
A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...
journalBEEI
 

Similar to Unsupervised Learning: Similarities and distance functions for IoT data (20)

Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...
Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...
Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...
 
Laplacian-regularized Graph Bandits
Laplacian-regularized Graph BanditsLaplacian-regularized Graph Bandits
Laplacian-regularized Graph Bandits
 
A probabilistic data encryption scheme (pdes)
A probabilistic data encryption scheme (pdes)A probabilistic data encryption scheme (pdes)
A probabilistic data encryption scheme (pdes)
 
From Signal to Symbols
From Signal to SymbolsFrom Signal to Symbols
From Signal to Symbols
 
ME Synopsis
ME SynopsisME Synopsis
ME Synopsis
 
Learning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RLLearning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RL
 
Detection of Dense, Overlapping, Geometric Objects
Detection of Dense, Overlapping, Geometric Objects Detection of Dense, Overlapping, Geometric Objects
Detection of Dense, Overlapping, Geometric Objects
 
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTSDETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
 
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTSDETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
 
Kernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingKernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of moving
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Introduction to image processing and pattern recognition
Introduction to image processing and pattern recognitionIntroduction to image processing and pattern recognition
Introduction to image processing and pattern recognition
 
T24144148
T24144148T24144148
T24144148
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2
 
Ew4301904907
Ew4301904907Ew4301904907
Ew4301904907
 
Ba2419551957
Ba2419551957Ba2419551957
Ba2419551957
 
Taxis.key
Taxis.keyTaxis.key
Taxis.key
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities in
 
Data Structures unit I Introduction - data types
Data Structures unit I Introduction - data typesData Structures unit I Introduction - data types
Data Structures unit I Introduction - data types
 
A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...
 

Recently uploaded

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 

Recently uploaded (20)

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 

Unsupervised Learning: Similarities and distance functions for IoT data

  • 1. UNSUPERVISED LEARNING: SIMILARITIES AND DISTANCE FUNCTIONS FOR IOT DATA G. CASOLLA, V. SCHIANO, S. CUOMO, F. GIAMPAOLO, F. PICCIALLI University of Naples Federico II INTRODUCTION (1) In our research we studied visitors’ behaviours inside one of the most important museum in Italy, the National Archaeological Museum of Naples, through a deployed IoT framework. It is composed by smart sensor boards with Bluetooth and Wi-Fi capabilities. The IoT system is able to track a visitor path by collecting his position and the related time-of-stay inside the museum rooms [1]. Our dataset relies on about 19 000 unique users behavioural data composed by two features: (i) the visiting Path (non-numerical data) and (ii) the Time spent (numerical data). Classifying unstructured and unlabelled data is a key challenge dealing with three main research tasks: • the study and selection of the appropriate similarity and distance functions, • the selection of the number of clusters in which partitioning the data, • the choice of the most suitable algorithm to achieve an accurate data clustering. SIMILARITIES AND DISTANCE FUNCTIONS (3) (a) - Cosine similarity scos(S, T; q) = v(S; q) · v(T; q) v(S; q) 2 v(T; q) 2 (b) - Jaccard similarity sJac(S, T; q) = |Q(S; q) ∩ Q(T; q)| |Q(S; q) ∪ Q(T; q)| (c) - Longest Common Substring dLCS(S, T) = LCS(n, m) n + m where LCS(i, j) =    max(i, j) if min(i, j) = 0, LCS(i − 1, j − 1) if S(i) = T(j), 1 + min {LCS(i − 1, j), LCS(i, j − 1)} otherwise (d) - Generalized Levenshtein distance dLv(S, T) = Lv(n, m) max{n, m} where Lv(i, j) =    max(i, j) if min(i, j) = 0, min    Lv(i − 1, j) + 1 Lv(i, j − 1) + 1 Lv(i − 1, j − 1) + 1(Si=Tj ) otherwise. (a) (b) (c) (d) More in detail, the q-grams based distances are calculated from q-grams based similari- ties with the formula: dx(S, T) = 1 − sx(S, T). COMPARISON (4) An ordered F-measure heatmap representing the comparison among all the experimented clustering methodologies. THEORY (2) The Clustering algorithms we considered are: • Hierarchical Clustering • K-Medoids (PAM) A similarity function is a tool that gives the strength of the relationship between two or more data items. To deal with the non-numerical Path fea- ture of our dataset, that can be treated as a string, we have analyzed four strings simi- larity functions. A string can be defined as sequence of finite characters from a finite al- phabet. A q-gram is a string with q consecu- tive characters. The q-grams from a string S are realized by collecting in sequences q char- acters from S and saving the appearing q- grams. To deal with the numerical Time fea- ture we have pre-computed the distance ma- trix by using the well-known Euclidean dis- tance and then merged with the other strings’ distances with the formula defined as fol- lows: Dsum = K i=1 ωiD(i) where D(i) is a single distance matrix, K is the total number of distances that we want to merge and ωi is the inverse of the maximum element of the i-th distance matrix D(i) . It’s shown that, to improve the performance of a classification, it is better to combine multiple distances than to use a single distance matrix [2]. REFERENCES (5) [1] F. Piccialli, Y. Yoshimura, P. Benedusi, C. Ratti, and S. Cuomo, Lessons learned from longitudinal mod- eling of mobile-equipped visitors in a complex mu- seum. Neural Comp. and App., pp. 1-17, 2019 [2] A. Ibba, R. Duin, and W.-J. Lee, A study on com- bining sets of differently measured dissimilarities. ICPR’10, pp. 3360-3363, 2010