This document describes a methodology for creating dynamic crowding maps from mobile phone data to estimate population exposure during floods. It involves the following steps:
1) Applying the Histogram of Oriented Gradients (HOG) to reduce the dimensionality of mobile phone user data and cluster similar days.
2) Functionally clustering the daily density profiles (DDP) of mobile phone users over time to group days with similar patterns.
3) Estimating the total population exposed ("city users") by spatially matching mobile phone and census data to correct for market share.
4) Visualizing representative daily profiles for clusters using functional box plots of DDP trends over time.
The method is applied to a case study area.
Human activity spatio-temporal indicators using mobile phone data (University of Salerno)
In the context of Smart Cities, monitoring the dynamics of people's presence is a crucial aspect for the well-being of an urban area. We use mobile phone data as a proxy for the total number of people (Carpita & Simonetto 2014), with the specific aim of computing region-specific spatio-temporal indicators. Telecom Italia Mobile (TIM), the largest operator in Italy, thanks to a research agreement with the Statistical Office of the Municipality of Brescia, provided us with about two years (April 2014 to June 2016) of High-Frequency Daily Mobile Phone Density Profiles (DMPDPs) in the form of a regular polygon grid sampled every 15 minutes. Densities have to be rescaled in order to express the total number of people rather than just TIM users. Separately, for selected regions in the province of Brescia, characterized as either working or residential areas, we group similar DMPDPs and characterize the groups by their spatial and temporal components. In doing so, we propose a mixed-approach procedure.
Here we describe a federated learning based traffic flow prediction system. Federated learning addresses the problem of data security while enabling collaborative learning: model parameters are shared, not the data.
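A minimal sketch of the federated-averaging idea this abstract alludes to: each client trains on its private data and only parameter vectors travel to the server. The linear model, learning rate, and round counts are illustrative assumptions, not the system's actual design.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training (linear regression); raw data never leaves the client."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_average(global_w, client_data):
    """Server step (FedAvg): average locally trained parameters, weighted by client size."""
    updates, sizes = [], []
    for X, y in client_data:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

# Two clients holding private traffic-flow data (features -> flow counts).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

w = np.zeros(2)
for _round in range(20):        # communication rounds: only `w` is exchanged
    w = federated_average(w, clients)
print(np.round(w, 2))           # recovers something close to true_w
```

The key property is visible in the loop: the server only ever sees `w`, never the clients' `(X, y)` pairs.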
Interpretability Evaluation of Annual Mosaic Image of MTB Model for Land Cove... (TELKOMNIKA JOURNAL)
To verify whether the annual mosaic image of the MTB model is acceptable for further digital analysis, it is necessary to evaluate its visual interpretability. The MTB model is an effort to integrate multi-scene and multi-temporal data to obtain a minimum-cloud-cover mosaic image in locations that are often covered by clouds and haze. This study evaluates the interpretability of the annual mosaic image for analysis of land cover changes. The data used are images from 2015, 2016, and 2017 covering a part of central Sumatra. Visual interpretation proceeds through a series of steps, starting with identification of the objects using interpretation keys, followed by spectral band correlations and scattergram analysis, and ending with a consistency assessment. The consistency assessment step is performed to determine how clearly and easily objects can be recognized in the annual mosaic images. The results showed that the optimal spectral bands for RGB combinations in visual interpretation were Band SWIR-1, Band NIR, and Band Red. Based on the evaluation results, the annual mosaic image of the MTB model gave consistent results for the clearness and ease of object recognition. Thus the annual mosaic image of the MTB model with 0.02 x 0.02 degree tiles is acceptable for further digital processing as well as digital land cover analysis.
In the context of Smart Cities, local institutions face an increasing need to monitor the dynamics of people's presence inside urban areas in order to plan the improvement and maintenance of the urban infrastructure. Rectangular grid polygons reporting the density of mobile phone users (Carpita, Simonetto, 2014) are a source of very large data. Telecom Italia Mobile (TIM), currently the largest operator in Italy in this sector, thanks to a research agreement with the Statistical Office of the Municipality of Brescia, provided us with about two years (April 2014 to June 2016, n ≈ 700) of Daily Mobile Phone Density Profiles (DMPDPs) for the Province of Brescia, in the form of a regular grid of 923 x 607 cells sampled every 15 minutes.
In order to find regularities and detect anomalies in the flow of people's presence, this work aims to cluster similar DMPDPs, where each DMPDP is characterized both by its 2-D spatial component (i.e. 923 x 607 dimensions, one for each cell of the grid) and by its temporal component (i.e. each cell has repeated values in time, for a total of 96 daily dimensions per cell). So, while each DMPDP counts p ≈ 50 million (923 x 607 x 96) space-time dimensions, time and economic constraints prevent us from having a longer time series of DMPDPs. In these terms, grouping DMPDPs is a High Dimensional Low Sample Size (HDLSS) problem, since p ≫ n.
We propose a mixed-approach procedure that we apply to the city of Brescia. First, borrowing the method of the Histogram of Oriented Gradients (HOG) from the image clustering literature (Tomasi, 2012), we reduce the dimensionality of the DMPDPs by computing their feature extractions. In doing so, we tune the HOG parameters in order to reduce the DMPDP dimensionality as much as possible while preserving as much as possible of the information contained in the extracted features. With this approach we preserve both the spatial and the temporal components of the DMPDPs. Then, using the extracted HOG features, we group the DMPDPs by applying - and testing the feasibility of - different clustering approaches for large data.
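The HOG reduction step can be sketched as follows. This is a minimal, numpy-only HOG (per-cell orientation histograms weighted by gradient magnitude), not the authors' tuned pipeline; the grid size, cell size, and bin count are illustrative assumptions, and a library implementation such as skimage.feature.hog would normally be used.

```python
import numpy as np

def hog_features(img, cell=16, bins=8):
    """Minimal HOG: per-cell histograms of gradient orientations, weighted by magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    h, w = img.shape
    feats = []
    for i in range(0, h - h % cell, cell):
        for j in range(0, w - w % cell, cell):
            a = ang[i:i + cell, j:j + cell].ravel()
            m = mag[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    f = np.concatenate(feats)
    n = np.linalg.norm(f)
    return f / n if n else f

# Illustrative stand-in for one DMPDP snapshot; the real grids are 923 x 607.
rng = np.random.default_rng(1)
snapshot = rng.random((96, 64))
features = hog_features(snapshot)
print(snapshot.size, '->', features.size)        # 6144 -> 192
```

The trade-off the abstract describes is visible in the parameters: larger `cell` (or fewer `bins`) compresses harder, at the cost of spatial and orientation detail in the extracted features.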
This project retrieves similar geographic images from a dataset based on the extracted features. Retrieval is the process of collecting the relevant images from a dataset which contains a large number of images. Initially, a preprocessing step removes noise from the input image with the help of a Gaussian filter. As the second step, the Gray Level Co-occurrence Matrix (GLCM), Scale Invariant Feature Transform (SIFT), and Moment Invariant Feature algorithms are implemented to extract features from the images. After this process, the relevant geographic images are retrieved from the dataset using Euclidean distance. Here, the dataset consists of 40 images in total, from which the images related to the input image are retrieved using Euclidean distance. For SIFT to perform reliable recognition, it is important that the features extracted from the training image be detectable even under changes in image scale, noise, and illumination. The GLCM calculates how often pairs of pixels with given gray-level values co-occur. While the focus is on image retrieval, the project is also effectively applicable to detection and classification.
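The GLCM-plus-Euclidean-distance retrieval step can be sketched as below. This is a deliberately tiny, numpy-only co-occurrence matrix with two texture summaries (contrast and energy); the quantization level, offset, and the synthetic "dataset" are illustrative assumptions, and a library routine such as skimage's graycomatrix would normally be used.

```python
import numpy as np

def glcm_features(img, levels=8):
    """Tiny GLCM: co-occurrence of quantized gray levels for the horizontal
    (0, 1) offset, summarized by contrast and energy."""
    q = np.minimum((img * levels).astype(int), levels - 1)
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):  # horizontal neighbours
        glcm[a, b] += 1
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = np.sum((i - j) ** 2 * glcm)
    energy = np.sqrt(np.sum(glcm ** 2))
    return np.array([contrast, energy])

def retrieve(query, dataset, k=2):
    """Rank dataset images by Euclidean distance in feature space."""
    qf = glcm_features(query)
    dists = [np.linalg.norm(qf - glcm_features(img)) for img in dataset]
    return np.argsort(dists)[:k]

# Synthetic dataset: smooth gradient images vs. pure noise images.
rng = np.random.default_rng(0)
smooth = [np.tile(np.linspace(0, 1, 32), (32, 1)) + 0.01 * rng.random((32, 32))
          for _ in range(3)]
noisy = [rng.random((32, 32)) for _ in range(3)]
dataset = smooth[1:] + noisy        # 5 images; the query resembles `smooth`
hits = retrieve(smooth[0], dataset)
print(hits)                          # the smooth images (indices 0 and 1) rank first
```

Because smooth textures concentrate co-occurrences near the GLCM diagonal (low contrast, high energy) while noise spreads them out, Euclidean distance in this feature space separates the two groups cleanly.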
30.03.2017 Data Science Meetup - USER JOURNEY ANALYSIS, BETWEEN BUDGET ALLOCA... (Zalando adtech lab)
Prof. Dr. Burkhardt Funk from Leuphana University held this presentation on "User Journey Analysis, Between Budget Allocation and Micro Decisions" at the DATA SCIENCE MEETUP HAMBURG in the Zalando adtech lab office on 30th March 2017.
We illustrate unsupervised and supervised learning algorithms that accurately classify the lithological variations in 3D seismic data. We demonstrate blind source separation techniques such as principal component analysis (PCA) and noise-adjusted principal components in conjunction with Kohonen self-organizing maps to produce superior unsupervised classification maps.
Further, we utilize training in PCA space for maximum likelihood (ML) supervised classification. Results demonstrate that the ML supervised classification produces an improved classification of the facies in the 3D seismic dataset from the Anadarko basin in central Oklahoma.
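The PCA-then-ML pipeline can be sketched as follows, on synthetic stand-ins for seismic trace vectors. This is a generic illustration under stated assumptions (two Gaussian "facies" classes, numpy-only PCA, a per-class Gaussian maximum-likelihood rule), not the authors' seismic workflow.

```python
import numpy as np

def pca(X, k):
    """Project centered data onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def ml_classify(train, labels, test):
    """Maximum-likelihood classification: per-class Gaussian (mean, covariance)."""
    classes = np.unique(labels)
    stats = []
    for c in classes:
        Z = train[labels == c]
        mu = Z.mean(axis=0)
        cov = np.cov(Z.T) + 1e-6 * np.eye(Z.shape[1])   # regularize for stability
        stats.append((mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1]))
    preds = []
    for x in test:
        ll = [-0.5 * (logdet + (x - mu) @ icov @ (x - mu))
              for mu, icov, logdet in stats]
        preds.append(classes[int(np.argmax(ll))])
    return np.array(preds)

# Two synthetic "facies" with different mean trace vectors, 20 samples each.
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(20, 10))
B = rng.normal(2.0, 1.0, size=(20, 10))
X = np.vstack([A, B])
y = np.array([0] * 20 + [1] * 20)

Z = pca(X, k=3)                  # train the ML classifier in PCA space
pred = ml_classify(Z, y, Z)
print((pred == y).mean())        # near-perfect separation on this easy example
```

Training in the reduced PCA space keeps the per-class covariance matrices small and well-conditioned, which is the practical motivation for combining the two steps.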
Ivan Sahumbaiev "Deep Learning approaches meet 3D data" (Fwdays)
During this talk, I'll be talking about how 3D data can be processed with Deep Learning models. The main focus will be on point clouds.
Session agenda:
What 3D data is and its representations
Overview of libraries to visualize and process it
How to collect it: cameras, calibration
The current state of the art for point cloud processing with Deep Learning models:
Classification problem: models to use
Segmentation problem: models to use
Datasets, losses, and training routine
Point cloud correspondences:
Spectral methods to generate correspondences
Limitations
2021-02-25 Robotics Study Group: Building "FAST-MAP", a fast MAP-estimation method for particle filters (Mori Ken)
Particle filters (PFs) are used for the discrete approximation of dynamic and non-Gaussian probability distributions using numerous particles. Maximum a posteriori (MAP) estimation, which is a point estimation method to extract a unique state value from the probability distributions formed by a PF, functions appropriately against multimodal distributions. However, MAP entails an enormous calculation cost. Therefore, we propose a method to perform MAP estimation with a low calculation cost by compressing the information configured by PF, using adaptive vector quantization. For MAP estimation with 900 particles, the proposed method reduced the computational cost by approximately 96% compared to the conventional method and maintained the same estimation accuracy during the simulation.
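The cost being attacked can be sketched as follows: the conventional way to extract a MAP estimate from a particle set evaluates a weighted kernel density at every particle, an O(N^2) computation, and returns the densest particle. The kernel, bandwidth, and bimodal example are illustrative assumptions; FAST-MAP's vector-quantization compression is not reproduced here.

```python
import numpy as np

def pf_map_estimate(particles, weights, bandwidth=0.3):
    """Conventional MAP extraction from a particle set: weighted Gaussian kernel
    density evaluated at every particle (O(N^2)), then argmax. This is the cost
    that FAST-MAP reduces by vector-quantizing the particle set first."""
    d2 = (particles[:, None] - particles[None, :]) ** 2
    dens = (np.exp(-d2 / (2 * bandwidth ** 2)) * weights[None, :]).sum(axis=1)
    return particles[np.argmax(dens)]

# Bimodal particle cloud (900 particles, matching the abstract's experiment size):
# the weighted mean would fall between the modes, while MAP picks the heavier one.
rng = np.random.default_rng(0)
particles = np.concatenate([rng.normal(-2, 0.2, 300), rng.normal(3, 0.2, 600)])
weights = np.full(particles.size, 1 / particles.size)
est = pf_map_estimate(particles, weights)
print(round(est, 2))    # near 3.0, not the mean (about 1.3)
```

The example shows why MAP "functions appropriately against multimodal distributions": the point estimate lands on a mode rather than in the empty region between modes.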
The variational Gaussian process (VGP) is a Bayesian nonparametric model which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity.
Modelling traffic flows with gravity models and mobile phone large data (University of Salerno)
The analysis of origin-destination traffic flows is useful in many contexts of application, such as urban planning and tourism economics, and has commonly been studied through the Gravity Model, which in its simplest formulation states that flows are proportional to the masses of both origin and destination and inversely proportional to the distance between them. Using data from the flow of mobile phone signals among different areas, recorded on an hourly basis for several months, in this study we use the Gravity Model to characterize the dynamics of such flows over time in the strongly urbanized and flood-prone area of the Mandolossa (western outskirts of Brescia, northern Italy), with the final aim of predicting traffic flow during flood episodes. In order to better account for the dynamics of flows over time, we introduce into the model a more accurate set of explanatory variables: (i) the density of mobile phone users by area and time period and (ii) some appropriate temporal effects. Preliminary results show that the joint use of these two novel sets of explanatory variables allows us to obtain a better linear fitting of the Gravity Model and a better traffic flow prediction for flood risk evaluation.
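The "simplest formulation" mentioned above, T_ij = k * M_i * M_j / d_ij, is linear in logs and can be fitted by ordinary least squares. The sketch below does this on synthetic flows; the constant, noise level, and variable ranges are illustrative assumptions, and the study's extra explanatory variables (user density, temporal effects) are not modelled here.

```python
import numpy as np

# Synthetic origin-destination data following T = k * Mi * Mj / d (plus noise).
rng = np.random.default_rng(0)
n = 200
Mi = rng.uniform(1e3, 1e5, n)          # origin "masses"
Mj = rng.uniform(1e3, 1e5, n)          # destination "masses"
d = rng.uniform(1, 50, n)              # distances
T = 0.01 * Mi * Mj / d * rng.lognormal(0, 0.1, n)

# Log-linearized gravity model: log T = log k + a*log Mi + b*log Mj + c*log d.
X = np.column_stack([np.ones(n), np.log(Mi), np.log(Mj), np.log(d)])
coef, *_ = np.linalg.lstsq(X, np.log(T), rcond=None)
print(np.round(coef[1:], 2))           # exponents near [1.0, 1.0, -1.0]
```

Extra explanatory variables of the kind the abstract describes would simply enter as additional columns of `X` in this log-linear regression.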
Individual movements and geographical data mining. Clustering algorithms for ... (Beniamino Murgante)
Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes.
Giuseppe Borruso, Gabriella Schoier - University of Trieste
Edge detection algorithm based on quantum superposition principle and photons... (IJECEIAES)
The detection of object edges in images is a crucial step employed in a vast number of computer vision applications, for which a series of different algorithms has been developed in the last decades. This paper proposes a new edge detection method based on quantum information, which is achieved in two main steps: (i) an image enhancement stage that employs the quantum superposition law and (ii) an edge detection stage based on the probability of photon arrival at the camera sensor. The proposed method has been tested on synthetic and real images devoted to agriculture applications, where the Fram & Deutsch criterion has been adopted to evaluate its performance. The results show that the proposed method gives better results in terms of detection quality and computation time compared to classical edge detection algorithms such as Sobel, Kayyali, and Canny, and a more recent algorithm based on Shannon entropy.
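For reference, the classical Sobel baseline the paper compares against can be sketched in a few lines: two 3x3 kernels estimate the horizontal and vertical gradients, and the thresholded gradient magnitude marks edges. The threshold and the toy step-edge image are illustrative assumptions.

```python
import numpy as np

def sobel_edges(img, thresh=0.5):
    """Classical Sobel edge detection: correlate with the two 3x3 kernels,
    take the gradient magnitude, threshold relative to the maximum."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    mag = np.hypot(gx, gy)
    return mag > thresh * mag.max()

# Vertical step edge: Sobel should fire along the two columns straddling the step.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
print(edges.astype(int))
```

Any competing detector, including the quantum-based one, is ultimately judged against simple baselines like this on detection quality and runtime.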
Laurent Etienne's presentation at Geomatics Atlantic 2012 (www.geomaticsatlantic.com) in Halifax, June 2012. More session details at http://lanyrd.com/2012/geomaticsatlantic2012/stbgx/ .
A strategy for the matching of mobile phone signals with census data (University of Salerno)
Administrative data allow us to count the number of residents. The geo-localization of people by mobile phone, by quantifying the number of people at a given moment in time, enriches the amount of information useful for "smart" (city) evaluations. However, using Telecom Italia Mobile (TIM) data, we are able to characterize the spatio-temporal dynamics of presences in the city for TIM users only. A strategy to estimate total presences is needed. In this paper we propose a strategy to extrapolate the total number of people using TIM data only. To do so, we apply a spatial record linkage of mobile phone data with administrative archives, using the number of residents at the level of the "sezione di censimento" (census section).
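The rescaling idea can be sketched numerically. The assumption made explicit here (and it is an assumption of this sketch, not a detail confirmed by the abstract) is that night-time operator counts approximate residents, so linking them to census counts per section yields a local market share by which any later snapshot is divided. All numbers are invented for illustration.

```python
import numpy as np

# Per census section ("sezione di censimento"):
residents = np.array([1200, 800, 2000])   # census counts (administrative data)
tim_night = np.array([420, 300, 640])     # TIM users observed at night

# Spatial record linkage gives a local market share. Assumption of this sketch:
# night-time users approximate residents, so share = TIM users / residents.
share = tim_night / residents

# Rescale any later TIM snapshot to estimated total presences.
tim_day = np.array([900, 150, 500])
total_day = tim_day / share
print(np.round(total_day).astype(int))    # estimated total presences per section
```

Computing the share per census section, rather than one national figure, lets the correction absorb spatial variation in operator penetration.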
Trajectory Segmentation and Sampling of Moving Objects Based On Representativ... (ijsrd.com)
Moving Object Databases (MODs), although ubiquitous, still call for methods that can understand, search, analyze, and browse their spatiotemporal content. In this paper, we propose a method for trajectory segmentation and sampling based on the representativeness of the (sub-)trajectories in the MOD. In order to find the most representative sub-trajectories, the following methodology is proposed. First, a novel global voting algorithm is performed, based on local density and trajectory similarity information. This method is applied to each segment of the trajectory, forming a local trajectory descriptor that represents line-segment representativeness. The sequence of this descriptor over a trajectory gives the voting signal of the trajectory, where high values correspond to the most representative parts. Then, a novel segmentation algorithm is applied to this signal that automatically estimates the number of partitions and the partition borders, identifying partitions that are homogeneous in their representativeness. Finally, a sampling method over the resulting segments yields the most representative sub-trajectories in the MOD. Our experimental results on synthetic and real MODs verify the effectiveness of the proposed scheme, also in comparison with other sampling techniques.
Towards an Affordable GIS for Analysing Public Transport Mobility Data (Benito Zaragozí)
Slides of a presentation at the GISTAM 2020 conference (streaming event). It's a proposal for naming the outputs of querying a database containing public transportation data from smart cards.
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION (csandit)
In this paper, we present and analyze different approaches implemented to solve the pedestrian detection problem. The Histogram of Oriented Laplacian (HOL) is a feature descriptor that aims to highlight objects in digital images; the Discrete Cosine Transform (DCT), in its global (GDCT) and local (LDCT) versions, converts image pixels into frequency coefficients that we then use as features in the process. We implemented these methods independently, combined them, and used their outputs in a classifier; the newly generated classifier proved its efficiency in certain cases. The performance of these methods and their combination is tested on the most popular datasets in pedestrian detection, INRIA and Daimler.
Geotagging Social Media Content with a Refined Language Modelling Approach (Symeon Papadopoulos)
Presentation of a geotagging approach for social media content with a refined language modelling approach. Presented at PAISI workshop, co-located with PA-KDD 2015, Ho Chi Minh City, Vietnam
Smart Urban Planning Support through Web Data Science on Open and Enterprise ... (Gloria Re Calegari)
Prediction of expensive datasets starting from a set of cheap heterogeneous information sources in smart city scenarios.
Prediction of the population and land use of Milano starting from data about Points Of Interest and phone activity.
MODELLING DYNAMIC PATTERNS USING MOBILE DATA (cscpconf)
Understanding, modeling, and simulating human mobility among urban regions is a very challenging effort. It is very important in rescue situations for many kinds of events, whether indoor events like the evacuation of buildings or outdoor ones like public assemblies and community evacuations; in emergency situations several incidents can happen, and overcrowding causes the injuries and deaths that emerge during such situations. It also serves urban planning and smart cities. The aim of this study is to explore the characteristics of human mobility patterns and model them mathematically, depending on inter-event time and traveled distance (displacement) parameters, using CDRs (Call Detail Records) collected during the Armada festival in France. The results of the numerical simulation endorse the findings of other studies, in that most real-system patterns approximately follow an exponential distribution. In the future, the mobility patterns could be classified according to (working or off) days, and the radius of gyration could be considered as an effective parameter in modelling human mobility.
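The inter-event-time analysis described above can be sketched as follows: extract gaps between consecutive CDR timestamps and fit an exponential distribution by maximum likelihood (whose rate estimate is simply one over the mean gap). The synthetic Poisson-process data and the 10-minute mean gap are illustrative assumptions, not the study's data.

```python
import numpy as np

# Synthetic CDR timestamps for one user (seconds). Events drawn from a Poisson
# process, so the inter-event times are exponentially distributed.
rng = np.random.default_rng(0)
mean_gap = 600.0                       # assumed: one event every ~10 minutes
gaps = rng.exponential(mean_gap, size=5000)
timestamps = np.cumsum(gaps)

# The MLE for the exponential rate is 1 / mean(inter-event time).
inter_event = np.diff(timestamps)
lam_hat = 1.0 / inter_event.mean()
print(round(1.0 / lam_hat))            # estimated mean gap, near 600 s
```

On real CDRs, comparing the empirical survival function of `inter_event` against `exp(-lam_hat * t)` on a log scale is the usual check that the exponential form actually holds.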
Modelling dynamic patterns using mobile datacsandit
Understanding, modeling and simulating human mobility among urban regions is very challengeable
effort. It is very important in rescue situations for many kinds of events, either in the indoor events like
evacuation of buildings or outdoor ones like public assemblies, community evacuation, in exigency
situations there are several incidents could be happened, the overcrowding causes injuries and death
cases, which are emerged during emergency situations, as well as it serves urban planning and smart
cities. The aim of this study is to explore the characteristics of human mobility patterns, and model them
mathematically depending on inter-event time and traveled distances (displacements) parameters by using
CDRs (Call Detailed Records) during Armada festival in France. However, the results of the numerical
simulation endorse the other studies findings in that the most of real systems patterns are almost follows an
exponential distribution. In the future the mobility patterns could be classified according (work or off)
days, and the radius of gyration could be considered as effective parameter in modelling human mobility
PREDICTION OF STORM DISASTER USING CLOUD MAP-REDUCE METHODAM Publications
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information the patterns, associations, or relationships among all this data can provide information. Spatial Data Mining (SDM) is the part of data mining. It is mainly used for finding the figure in data that is related to space. In spatial data mining, to get the result analyst use geographical or spatial information which require some special technique to get geographical data in the appropriate formats. SDM is mainly used in earth science data, crime mapping, census data, traffic data, and cancer cluster (i.e. to investigate environment health hazards). For real time processing, Stream data mining is used for the prediction of storm using spatial dataset with the help of stream data mining strategy i.e. CMR (Cloud Map-Reduce). In this, Stream data mining is presented to detect the storm disaster of coastal area which is located in Central America country and then by taking the dataset of coastal area which are having various regions. There are various regions i.e. Panama, Greater Antilles, Mexico golf which is used to detect the storm. It also detects that, is the region is affected or not, if affected then which area of that region is affected and from this it is helpful to predict the storm before the disaster occur. In this, two parameters are used to test the technique i.e. processing time and computational load and using this parameter compare the previous and proposed technique. The main aim of this is to predict storm disaster before it happen with the help of directionality and velocity of storm.
Topographic Information System as a Tool for Environmental Management, a Case...iosrjce
IOSR Journal of Environmental Science, Toxicology and Food Technology (IOSR-JESTFT) multidisciplinary peer-reviewed Journal with reputable academics and experts as board member. IOSR-JESTFT is designed for the prompt publication of peer-reviewed articles in all areas of subject. The journal articles will be accessed freely online.
A Study on Data Visualization Techniques of Spatio Temporal DataIJMTST Journal
Data visualization is an important tool to analyze complex Spatio Temporal data. The spatio-temporal data
can be visualized using 2D, 3D or any other type of maps. Cartography is the major technique used in
mapping. The data can also be visualized by placing different layers of maps one on other, which is done by
using GIS. Many data visualization techniques are in trend but the usage of the techniques must be decided
by considering the application requirements.
Predictive geospatial analytics using principal component regression IJECEIAES
Nowadays, exponential growth in geospatial or spatial data all over the globe, geospatial data analytics is absolutely deserved to pay attention in manipulating voluminous amount of geodata in various forms increasing with high velocity. In addition, dimensionality reduction has been playing a key role in high-dimensional big data sets including spatial data sets which are continuously growing not only in observations but also in features or dimensions. In this paper, predictive analytics on geospatial big data using Principal Component Regression (PCR), traditional Multiple Linear Regression (MLR) model improved with Principal Component Analysis (PCA), is implemented on distributed, parallel big data processing platform. The main objective of the system is to improve the predictive power of MLR model combined with PCA which reduces insignificant and irrelevant variables or dimensions of that model. Moreover, it is contributed to present how data mining and machine learning approaches can be efficiently utilized in predictive geospatial data analytics. For experimentation, OpenStreetMap (OSM) data is applied to develop a one-way road prediction for city Yangon, Myanmar. Experimental results show that hybrid approach of PCA and MLR can be efficiently utilized not only in road prediction using OSM data but also in improvement of traditional MLR model.
Detecting and classifying moments in basketball matches using sensor tracked ...University of Salerno
Data analytics in sports is crucial to evaluate the performance of single players and the whole team. The literature proposes a number of tools for both offence and defence scenarios. Data coming from tracking location of players, in this respect, may be used to enrich the amount of useful information. In basketball,
however, actions are interleaved with inactive periods. This paper describes a methodological approach to automatically identify active periods during a game and to classify them as offensive or defensive. The method is based on the application
of thresholds to players kinematic parameters, whose values undergo a “tuning” strategy similar to Receiver Operating Characteristic curves, using a “ground truth” extracted from the video of the games
To assess the scoring probability of teams and players in different areas of a court map is an important topic in basketball analytics, in order to define both game strategies and training programmes.
In this contribution we propose a method based on regression trees, aimed to define a partition of the court in rectangles with maximally different scoring probabilities. Each analysed team/player has its/his own partition, so comparisons can be made among different teams/players.
In addition, shooting efficiency measures computed within the rectangles can be used to define spatial scoring performance indicators.
Because of the advent of GPS techniques, a wide range of scientific literature on Sport Science is nowadays devoted to the analysis of players’ movement in relation to team performance in the context of big data analytics. A specific research question regards whether certain patterns of space among players affect team performance, from both an offensive and a defensive perspective. Using a time series of basketball players’ coordinates, we focus on the dynamics of the surface area of the five players on the court with a two-fold purpose: (i) to give tools allowing a detailed description and analysis of a game with respect to surface areas dynamics and (ii) to investigate its influence on the points made by both the team and the opponent. We propose a three-step procedure integrating different statistical modelling approaches. Specifically, we first employ a Markov Switching Model (MSM) to detect structural changes in the surface area. Then, we perform descriptive analyses in order to highlight associations between regimes and relevant game variables. Finally, we assess the relation between the regime probabilities and the scored points by means of Vector Auto Regressive (VAR) models. We carry out the proposed procedure using real data and, in the analyzed case studies, we find that structural changes are strongly associated to offensive and defensive game phases and that there is some association between the surface area dynamics and the points scored by the team and the opponent.
In the domain of Sport Analytics, Global Positioning Systems devices are intensively used as they permit to retrieve players' movements. Team sports' managers and coaches are interested on the relation between players' patterns of movements and team performance, in order to better manage their team. In this paper we propose a Cluster Analysis and Multidimensional Scaling approach to find and describe separate patterns of players movements. Using real data of multiple professional basketball teams, we find, consistently over different case studies, that in the defensive clusters players are close one to another while the transition cluster are characterized by a large space among them. Moreover, we find the pattern of players' positioning that produce the best shooting performance.
In these slides I show how different kind of geo-referenced objects can be processed and manipulated in smart cities' studies.
I also presents a proposal for applying a gravity model to human mobility by replacing masses with phone cells.
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...University of Salerno
A new approach in team sports analysis consists in studying positioning and movements of players during the game in relation to team performance. State of the art tracking systems produce spatio-temporal traces of players that have facilitated a variety of research aimed to extract insights from trajectories. Several methods borrowed from machine learning, network and complex systems, geographic information system, computer vision and statistics have been proposed. After having reviewed the state of the art in those niches of literature aiming to extract useful information to analysts and experts in terms of relation between players' trajectories and team performance, this paper presents preliminary results from analysing trajectories data and sheds light on potential future research in this eld of study. In particular, using convex hulls, we find interesting regularities in players' movement patterns.
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...University of Salerno
Global Positioning Systems (GPS) are nowadays intensively used in Sport Science as they permit to capture the space-time trajectories of players, with the aim to infer useful information to coaches in addition to traditional statistics. In our application to basketball, we used Cluster Analysis in order to split the match in a number of separate time-periods, each identifying homogeneous spatial relations among players in the court. Results allowed us to identify differences in spacing among players, distinguish defensive or offensive actions, analyze transition probabilities from a certain group to another one.
Global Positioning Systems (GPS) are nowadays intensively used in Sport Science as they capture the
trajectories of players and /or the ball, sometimes together with play-by-play recording the time of match
events, with the aim of infer to supply coaches, experts and analysts with useful information in addition to
traditional statistics. To find any regularities and synchronizations in players‘ trajectories, and to study their
relationship with team's performance, however, is a complex task, because of the strong interdependencies
among players in the court and because of external factors that can influence players. To this aim, a variety
of methods has been proposed in Sport Science literature, which borrow from the disciplines of Machine
Learning, Network and Complex Systems, Geographical Information Systems, Computer Vision and Statistics.
In this seminar, with an application to basketball, I propose a methodological approach that can be
generalized to other team sports. I first demonstrate the usefulness of a visual tool approach in order to
extract preliminar insights from trajectories, then, I use data mining techniques such as Cluster Analysis and
Multidimensional Scaling to decompose the game into homogeneous phases in terms of spatial relations.
To conclude, I present specific research questions, such as: i) who is the most influencing player of the team?
ii) how much each player influences the others? iii) how much trajectories are determined by trajectories of
other players and by external factors? where the adoption of methods traditionally used in Spatial Statistics
and Spatial Econometrics could have a potential. In this regard, the seminar is also intended as a `platform”
to launch new research challenges and to search for collaboration
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...University of Salerno
Nonlinear estimation of the gravity model with Poisson/negative binomial methods has become popular to model international trade flows, as it permits a better accounting for large numbers of zero flows. Nevertheless, as trade flows are not independent of each other due to spatial autocorrelation, those methods lead to biased parameter estimates. To overcome this problem, eigenvector spatial filtering variants of the Poisson/Negative binomial specification has been proposed in the literature of gravity modelling of trade. This paper contributes to the literature in two ways. First, by employing a stepwise selection criterion for spatial filters which is based on robust (sandwich) p-values and does not require likelihood-based indicators. In this respect, we develop an ad hoc backward stepwise function in R. Second, using this function, we select a reduced set of spatial filters that properly accounts for importer-side and exporter-side specific spatial effects, both at the count and the logit process. Applying this estimation strategy to a cross-section of bilateral trade flows between a set of worldwide countries for the year 2000, we find that our specification outperforms the benchmark models, in terms of model fitting, both considering the AIC and in predicting zero (and small) flows.
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...University of Salerno
Disentangling the relations between human migrations and water resources is relevant for food security and trade policy in water-scarce countries. It is commonly believed that human migrations are beneficial to the water endowments of origin countries for reducing the pressure on local resources. We show here that such belief is over-simplistic. We reframe the problem by considering the international food trade and the corresponding virtual water fluxes, which quantify the water used for the production of traded agricultural commodities. By means of robust analytical tools, we show that migrants strengthen the commercial links between countries, triggering trade fluxes caused by food consumption habits persisting after migration. Thus migrants significantly increase the virtual water fluxes and the use of water in the countries of origin. The flux ascribable to each migrant, i.e. the “water suitcase”, is found to have increased from 321 m3/y in 1990 to 1367 m3/y in 2010. A comparison with the water footprint of individuals shows that where the water suitcase exceeds the water footprint of inhabitants, migrations turn out to be detrimental to the water endowments of origin countries, challenging the common perception that migrations tend to relieve the pressure on the local (water) resources of origin countries.
''The global virtual water network'' is a FIRB project funded by MIUR which aims at studying the main characteristics and implications of the virtual water flows associated to the international trade of food.
The project has the following main goals:
understanding the global dynamics of virtual water flows;
investigating the international water (and food) trade network;
evaluating impacts and feedbacks for food security;
assessing the vulnerability of the system to crises.
We aim at investigating the complex relationships between climatic, agronomic and socio-economic factors and how they shape the evolution of the worldwide trade of virtual water.
This 10 hours class is intended to give students the basis to empirically solve statistical problems. Talk 1 serves as an introduction to the statistical software R, and presents how to calculate basic measures such as mean, variance, correlation and gini index. Talk 2 shows how the central limit theorem and the law of the large numbers work empirically. Talk 3 presents the point estimate, the confidence interval and the hypothesis test for the most important parameters. Talk 4 introduces to the linear regression model and Talk 5 to the bootstrap world. Talk 5 also presents an easy example of a markov chains.
All the talks are supported by script codes, in R language.
This 10 hours class is intended to give students the basis to empirically solve statistical problems. Talk 1 serves as an introduction to the statistical software R, and presents how to calculate basic measures such as mean, variance, correlation and gini index. Talk 2 shows how the central limit theorem and the law of the large numbers work empirically. Talk 3 presents the point estimate, the confidence interval and the hypothesis test for the most important parameters. Talk 4 introduces to the linear regression model and Talk 5 to the bootstrap world. Talk 5 also presents an easy example of a markov chains.
All the talks are supported by script codes, in R language.
This 10 hours class is intended to give students the basis to empirically solve statistical problems. Talk 1 serves as an introduction to the statistical software R, and presents how to calculate basic measures such as mean, variance, correlation and gini index. Talk 2 shows how the central limit theorem and the law of the large numbers work empirically. Talk 3 presents the point estimate, the confidence interval and the hypothesis test for the most important parameters. Talk 4 introduces to the linear regression model and Talk 5 to the bootstrap world. Talk 5 also presents an easy example of a markov chains.
All the talks are supported by script codes, in R language.
This 10 hours class is intended to give students the basis to empirically solve statistical problems. Talk 1 serves as an introduction to the statistical software R, and presents how to calculate basic measures such as mean, variance, correlation and gini index. Talk 2 shows how the central limit theorem and the law of the large numbers work empirically. Talk 3 presents the point estimate, the confidence interval and the hypothesis test for the most important parameters. Talk 4 introduces to the linear regression model and Talk 5 to the bootstrap world. Talk 5 also presents an easy example of a markov chains.
All the talks are supported by script codes, in R language.
This 10 hours class is intended to give students the basis to empirically solve statistical problems. Talk 1 serves as an introduction to the statistical software R, and presents how to calculate basic measures such as mean, variance, correlation and gini index. Talk 2 shows how the central limit theorem and the law of the large numbers work empirically. Talk 3 presents the point estimate, the confidence interval and the hypothesis test for the most important parameters. Talk 4 introduces to the linear regression model and Talk 5 to the bootstrap world. Talk 5 also presents an easy example of a markov chains.
All the talks are supported by script codes, in R language.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Phenomics assisted breeding in crop improvementIshaGoswami9
As the population is increasing and will reach about 9 billion upto 2050. Also due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate
change, and increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional
genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding
the complex characteristics of multiple gene, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can
be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus,
high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes
during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology,
and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Toxic effects of heavy metals : Lead and Arsenicsanjana502982
Heavy metals are naturally occuring metallic chemical elements that have relatively high density, and are toxic at even low concentrations. All toxic metals are termed as heavy metals irrespective of their atomic mass and density, eg. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing the study of interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflect spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light received by the analyte.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200pc, stellar masses of
M⋆ ∼ 107−108M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr−1
. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...Wasswaderrick3
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/ velocity and then from this we derive the Pouiselle flow equation, the transition flow equation and the turbulent flow equation. In the situations where there are no viscous effects , the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes equation of terminal velocity and turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Dynamic crowding maps
Carpita, Metulini

Outline: The project · Data · Methodology · Application · Conclusions · References
The project
• Ongoing project (until 06/2022): this talk describes work conducted together with Prof. Roberto Ranzi and Dr. Matteo Balistrocchi (Department of Civil, Environmental, Architectural Engineering and Mathematics, UNIBS) in the context of the MoSoRe project (Regione Lombardia, Call HUB Research & Innovation: Infrastrutture e servizi per la Mobilità Sostenibile e Resiliente - MoSoRe@Unibs, ID 1180965 - POR FESR 2014-2020).
• Scientific output:
1. Metulini, R., Carpita, M. (2020). A Spatio-Temporal Indicator for City Users based on Mobile Phone Signals and Administrative Data. Social Indicators Research, 1-21. DOI: 10.1007/s11205-020-02355-2
2. Balistrocchi, M., Metulini, R., Carpita, M., Ranzi, R. (2020). Dynamic maps of people exposure to floods based on mobile phone data. Natural Hazards and Earth System Sciences, in press. DOI: 10.5194/nhess-2020-201
The context of application
• Floods are natural hazards that afflict nearly 20 million people worldwide (Kellens et al., 2013), posing a serious challenge to the protection of human lives.
• Urbanization dramatically increases people's exposure and vulnerability to floods, since most recent urban development has occurred in flood-prone areas.
• Effective emergency management plans are intended to provide communities with early warnings and reliable real-time information.
• We provide a detailed and reliable picture of the real-time spatio-temporal variability of flood risk by proxying it with dynamic crowding maps built from mobile phone data for reference groups of days.
Data
• Erlang mobile phone measures (Erlang, 1909): the average number of mobile phone users (MPU) with a SIM connected to the network, recorded at constant time steps over a georeferenced grid of square cells. Available for Telecom Italia Mobile (TIM) for the period 04/2014 to 08/2016, thanks to a collaboration with the Statistical Office of the Comune di Brescia.
• Census data from ISTAT, reporting the residential population (as of 01/01/2016) by age, for each sezione di censimento (SC).
The set-up
• To study MPU spatio-temporal variability we define the subject of our analysis: the daily density profiles (DDP).
• Let $e_{it}$ be the number of MPU in the $i$-th grid cell in a generic time interval $t$,
• let $I_r = \{i_1, \ldots, i_m\}$ be the set of grid cells in the region $r$ of interest,
• let $T_d = \{t_1, \ldots, t_o\}$ be the set of time intervals in a day $d$.
• $DDP_{rd}$ is defined as the vector of the sums of MPU over the region's cells (one sum for each time interval) in region $r$ and day $d$ (length $o$):

$$ DDP_{rd} = \left( \sum_{l=1}^{m} e_{i_l,t_1},\ \sum_{l=1}^{m} e_{i_l,t_2},\ \ldots,\ \sum_{l=1}^{m} e_{i_l,t_o} \right) $$

• Goal: classify the occurrences in the time series of $DDP_{rd}$ over the set $\{d_1, \ldots, d_n\}$ of $n$ analyzed days; in other words, cluster similar $DDP_{rd}$.
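The DDP computation above reduces, for one region and one day, to summing a cell × time-interval array over the region's cells. A minimal sketch (all data and names here are synthetic placeholders; the dimensions match the m = 400 cells and o = 96 quarter-hours quoted in the deck):

```python
import numpy as np

# Illustrative dimensions: m = 400 grid cells, o = 96 quarter-hour intervals
rng = np.random.default_rng(0)
m, o = 400, 96
e = rng.poisson(lam=50.0, size=(m, o))  # e[i, t]: MPU count in cell i at interval t

# Cells belonging to the region of interest I_r (here: an arbitrary subset)
region_cells = np.arange(0, 120)

# DDP_rd: for each time interval, sum the MPU over the region's cells (length o)
ddp = e[region_cells, :].sum(axis=0)

print(ddp.shape)  # (96,)
```

Stacking the DDP vectors of all n days then gives the n × o matrix whose rows are to be clustered.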
Issues
• Our dataset amounts to $n$ observations (days) and $p = m \times o$ features per day (cells × quarters). Consider one year of data ($n = 365$): with $o = 96$ (quarter-hours per day) and $m = 400$ (grid cells of the sample area), $p = 38{,}400 \gg n$.
• Since the number of features is much larger than the number of observations, we are in a high-dimensional data setup (Donoho, 2000).
• Traditional techniques (Arabie and De Soete, 1996) may not return robust results on high-dimensional data, for example because of the curse of dimensionality (Keogh and Mueen, 2017).
• Bouveyron et al. (2007) addressed this issue with regard to clustering. However, as suggested by Jović et al. (2015), a suitable solution is a preliminary data reduction strategy.
• The Histogram of Oriented Gradients (HOG) approach is used for data reduction.
The Strategy ...
( ... to take into account days’ similarity)
| Step | Type                        | Aim                          | Method                               | Features                  |
|------|-----------------------------|------------------------------|--------------------------------------|---------------------------|
| 1    | data reduction + clustering | find similar raster images   | HOG + k-means clustering             | HOG features              |
| 2    | clustering                  | find similar functional curves | functional model-based clustering  | DDP features              |
| 3    | population assessment       | estimate city users          | spatial match of MPU and census data | DDP features + population |
| 4    | visualization               | find reference daily profiles | functional box plots                | DDP features              |
HOG data reduction
• For a given t, let ι_t = {e_{i_1,t}, e_{i_2,t}, ..., e_{i_m,t}} be the MPU vector of
region r at time instant t (dimension m).
• Aim: to reduce ι_t to a smaller vector of values κ_t (dimension m′ < m)
that retains the relevant information contained in ι_t.
• To do so, the vector, separately for each t, undergoes a histogram of
oriented gradients (HOG) feature extraction (Dalal and Triggs, 2005).
• More precisely, the normalized vector z_t = {e_{i,t} / max_{i∈I_r}(e_{i,t})},
∀i ∈ I_r, undergoes HOG.
• HOG method:
1 split the m cells of the grid into S smaller grids G_1, ..., G_S
(G_i ∩ G_j = ∅, ∀i ≠ j), where √S is a parameter to be chosen;
2 for each grid G_i, direction and magnitude gradient matrices are
computed (Dalal and Triggs, 2005);
3 from the two gradient matrices, the histogram of gradients is
determined, with k equal bins (k a parameter to be chosen).
• κ_t is stacked over the subscript t in order to derive (for region r and
day d) the vector of features κ_d (dimension S · k · o), d = 1, ..., n.
HOG data reduction ... explained
From an n × n raster of MPU counts, e.g.

 93 124  77 ...
217  55  94 ...
 24  77 109 ...
...

to X_t, the matrix representing the number of people in each cell at time t:
1 standardize the MPU data;
2 split the matrix into sub-matrices;
3 for each sub-matrix, compute the matrices of gradients (using the
Sobel operator);
4 assign each value of the direction matrix to one of the k bins of
the histogram, using its magnitude as weight, to produce the
vector of features;
5 stack into a single vector the features of all quarters of the day.
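The numbered steps can be sketched in Python (a simplified illustration of HOG feature extraction, not the exact Dalal-Triggs implementation; np.gradient stands in for the Sobel operator, and the names hog_features, sqrt_S and k are illustrative):

```python
import numpy as np

def hog_features(x, sqrt_S=2, k=4):
    """Simplified HOG features for one time instant (a sketch only)."""
    z = x / x.max()                       # 1. standardize the MPU data
    gy, gx = np.gradient(z)               # gradient matrices (np.gradient
                                          #    stands in for the Sobel operator)
    mag = np.hypot(gx, gy)                # magnitude matrix
    ang = np.arctan2(gy, gx) % np.pi      # direction matrix, in [0, pi)
    n = z.shape[0] // sqrt_S              # 2. side of each sub-grid
    feats = []
    for r in range(sqrt_S):
        for c in range(sqrt_S):
            m = mag[r*n:(r+1)*n, c*n:(c+1)*n].ravel()
            a = ang[r*n:(r+1)*n, c*n:(c+1)*n].ravel()
            # 3-4. k-bin histogram of directions, weighted by magnitude
            hist, _ = np.histogram(a, bins=k, range=(0, np.pi), weights=m)
            feats.append(hist)
    return np.concatenate(feats)          # length S * k (step 5 then stacks
                                          # these vectors over all quarters)

x = np.random.default_rng(0).integers(0, 200, size=(20, 20)).astype(float)
kappa_t = hog_features(x, sqrt_S=2, k=4)
print(kappa_t.shape)  # -> (16,)
```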
First step clustering
• Days are clustered in terms of how MPU are distributed over region r
according to index i, i.e., according to similarity in the raster image.
• The objects to be clustered are the n days, and κ_d contains the
S · k · o (with S · k · o < m · o) features for day d, ∀d = d_1, ..., d_n.
• The k-means clustering method (Hartigan and Wong, 1979) is
adopted (after having tested against the curse of dimensionality).
• According to the Hartigan and Wong criterion, the number of clusters
H is chosen by minimizing, over different values of H, the ratio between
the total within-cluster sum of squares and the total sum of squares.
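A minimal sketch of this step, with toy data (a pure-NumPy Lloyd's k-means stands in for the Hartigan-Wong algorithm; since the WSS/TSS ratio generally keeps decreasing as H grows, in practice one looks for an elbow rather than a strict minimum):

```python
import numpy as np

def kmeans(X, H, iters=50, seed=0):
    """Minimal Lloyd's k-means (a stand-in for Hartigan-Wong, 1979)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), H, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for h in range(H):
            if (labels == h).any():                 # guard empty clusters
                centers[h] = X[labels == h].mean(axis=0)
    wss = ((X - centers[labels]) ** 2).sum()        # total within-cluster SS
    return labels, wss

rng = np.random.default_rng(1)
kappa = rng.normal(size=(100, 16))                  # toy: n days x HOG features
tss = ((kappa - kappa.mean(axis=0)) ** 2).sum()     # total sum of squares
ratios = {H: kmeans(kappa, H)[1] / tss for H in (2, 3, 4, 5)}
print(ratios)                                       # WSS/TSS, screened over H
```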
Second step clustering
• Aim: to consider similarity in the functional form of the DDP_{rd}, viewed
as functional curves.
• We consider DDP_{rd} as a collection of functional observations
x_{rd}(T_d), T_d ∈ (t_1, ..., t_o) (length o) (e.g., the value Σ_{l=1}^{m} e_{i_l,t_1}
at t_1), with d varying in d = {d_1, ..., d_n}.
• We adopt a model-based functional data clustering method
(MB-FAC, Bouveyron et al., 2015), which provides estimated curves
with cluster-specific parameters, to group the days d (cluster objects)
in terms of the o DDP_{rd} values (cluster variables).
• We adopt the following path:
1 functional data outlier detection by a likelihood ratio test
(LRT) to remove anomalous DDP_{rd}, as proposed by
Febrero-Bande et al. (2008);
2 the Bouveyron et al. (2015) clustering method, using the funFEM
package in R.
• The method suits high-dimensional data: it employs a sub-space
clustering criterion (Agrawal et al., 1998): it considers just the
minimum number of variables needed for grouping objects.
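The second step relies on the funFEM R package; as a simplified, hypothetical Python stand-in, each DDP curve can be projected onto a low-order polynomial basis and the coefficients clustered (plain k-means below replaces the discriminative functional mixture model of Bouveyron et al., 2015):

```python
import numpy as np

rng = np.random.default_rng(2)
o, n = 96, 40
t = np.linspace(0, 1, o)

# Toy DDP curves: days 0-19 single-peak, days 20-39 double-peak
ddp = np.array([np.sin(np.pi * t) if i < 20 else np.sin(2 * np.pi * t) ** 2
                for i in range(n)]) + 0.05 * rng.normal(size=(n, o))

# Project each curve on a degree-5 polynomial basis (coefficients per day)
coef = np.polynomial.polynomial.polyfit(t, ddp.T, deg=5).T   # shape (n, 6)

# Cluster the basis coefficients with a tiny k-means (H = 2 groups)
H = 2
centers = coef[[0, n - 1]].copy()        # one seed curve from each regime
for _ in range(30):
    labels = ((coef[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    centers = np.array([coef[labels == h].mean(axis=0) for h in range(H)])

print(np.bincount(labels))               # sizes of the two groups of days
```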
Population assessment - I
• Aim: to estimate the total amount of people (city users), whereas
MPU are available for just one mobile phone company.
• We compute an estimate of the market share of the provider
company, to correct the DDP_{rd},
• by comparing the number of residents from archives with the number
of TIM users in a residential area in late evening hours (assuming
that, at that time, residential sezioni di censimento (SC) are
populated only by residents).
• The MPU grid is made of square cells while SCs are irregular polygons →
the number of TIM users belonging to each SC needs to be retrieved
by intersecting the two sources.
• The portion of each cell belonging to the SC polygon was calculated
in order to count how many TIM users are present in each polygon,
using the function extract in the raster package in R.
Population assessment - II
• Let Cell_j, ∀j = 1, 2, ..., J_{SC}, be the cells of the sample area; the ratio

A_j = area(SC ∩ Cell_j) / area(Cell_j)

represents how much of Cell_j is included in the chosen SC;
• let TUC_j be the MPU in Cell_j; the estimated number of MPU in SC is

ETU_{SC} = Σ_j TUC_j · A_j .

• The estimated company market share in SC is given by

ETMS_{SC} = ETU_{SC} / P_{SC}

where P_{SC} is the number of residents of that SC (children and elderly
people excluded).
• The median me(·) of the ETMS_{SC} can be used as a proxy for the
company market share at the city level;
• the city-users estimate is given by

DDP̂_{rd} = DDP_{rd} / me(ETMS_{SC})
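With toy numbers (all values below are invented for illustration), the computation reads:

```python
import numpy as np

# A_j: fraction of each cell's area inside the SC; TUC_j: MPU per cell
A   = np.array([1.0, 0.4, 0.7, 0.0])
TUC = np.array([30.0, 50.0, 20.0, 40.0])

ETU_SC = (TUC * A).sum()          # estimated MPU in the SC: 30 + 20 + 14 = 64
P_SC = 240                        # residents (children and elderly excluded)
ETMS_SC = ETU_SC / P_SC           # estimated market share for this SC

# The median over all SCs proxies the city-level share; DDP is then rescaled
ETMS_all = np.array([ETMS_SC, 0.30, 0.22, 0.28])   # toy values for other SCs
share = np.median(ETMS_all)
ddp = np.array([180.0, 210.0, 190.0])              # a toy DDP_rd (TIM users)
ddp_hat = ddp / share                              # estimated city users
print(ETU_SC, round(ETMS_SC, 4), round(share, 4))
```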
Visualization
• Consider DDP_{rd} as a functional curve x_{rd}(T_d) displaying, on the
y-axis, the sum of MPU in region r and day d against, on the x-axis,
the time instants T_d ∈ (t_1, ..., t_o).
• Functional box plots (FBP, Sun and Genton, 2011) can be used to
display the profile of each final cluster.
• For cluster h, let d_h = {d_{1h}, ..., d_{n_h h}} be the group of days belonging
to cluster h, and let DDP̂_{rd,h} = [DDP̂_{rd_1,h}, ..., DDP̂_{rd_{n_h},h}] be the
matrix of dimension o × n_h with a DDP̂_{rd} of cluster h in each column.
• By considering each DDP̂_{rd} a curve, the FBP representing the profile
plot of the total number of people (that we call city users) at different
hours (curves ranked by band depth), for cluster h, is computed using
the matrix DDP̂_{rd,h}
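A rough sketch of the central summary behind an FBP (the actual method of Sun and Genton (2011) ranks entire curves by band depth; the pointwise quartile envelope below is only an approximation under that caveat, on toy curves):

```python
import numpy as np

rng = np.random.default_rng(3)
o, n_h = 96, 30                       # quarters per day, days in cluster h
t = np.arange(o)

# Toy matrix DDP-hat_{rd,h}: one estimated daily curve per column
curves = 1000 * (1 + np.sin(2 * np.pi * t / o))[:, None] \
         + 50 * rng.normal(size=(o, n_h))

median_curve = np.median(curves, axis=1)            # central profile
lo, hi = np.percentile(curves, [25, 75], axis=1)    # 50% central envelope

print(median_curve.shape)  # -> (96,)
```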
Case study description
• WGS 84 UTM 32N coordinates: 5,040,920–5,049,980 N,
585,970–592,970 E (area of about 64 km²), centred on the
Mandolossa-Gandovere network (grid of 20×20 cells of 150 m² each)
• at 15-minute intervals (quarters) over the period July 1st, 2015 –
August 10th, 2016.
• After imputing missing quarters and removing full days with too many
missing quarters, we ended up with 360 valid days.
• HOG parameters: √S = 3, k = 4.
• The interest is in the residential and industrial parts of 4 specific areas
(Moie di Sotto, Villaggio Badia and Fantasina, the southern Gandovere
canal, Roncadelle)
First step clustering: results
Figure: Spine-plots representing the first-step clustering of days along (a)
months and (b) days of the week (green: all days mostly occurring in July,
August and September; blue: working days mostly occurring from February to
June; red: working days mostly occurring from October to January; yellow:
weekends mostly occurring from October to June)
Discussion
• The combination of:
1 the high spatial resolution (150 m²) and short time step (15 min)
of the data, and
2 the application of the proposed statistical strategy, designed for
high-dimensional data,
permits
1 reliable population assessments even for small areas, and
2 a precise evaluation of the temporal dynamics of city users in the
sample area.
• The functional box plot results are meaningful:
1 working days and weekends show different temporal dynamics
when they belong to working months (October to June);
2 daily dynamics in the summer months (July, August and
September, holidays in Italy) must be regarded as different
from the others;
3 working days and weekends feature more similar daily density
profiles during those summer months.
References - I
1 Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P.: Automatic
subspace clustering of high dimensional data for data mining applications,
in: Proceedings of the 1998 ACM SIGMOD International Conference on
Management of Data, ACM Press, 94–105, doi:10.1145/276304.276314,
1998
2 Arabie, P., and De Soete, G.: Clustering and classification, World Scientific,
doi:10.1142/1930, 1996
3 Bouveyron, C., Girard, S., and Schmid, C.: High-dimensional data
clustering, Comput. Stat. Data An., 52(1), 502–519,
doi:10.1016/j.csda.2007.02.009, 2007
4 Bouveyron, C., Côme, E., & Jacques, J. (2015). The discriminative
functional mixture model for a comparative analysis of bike sharing
systems. The Annals of Applied Statistics, 9(4), 1726-1760.
5 Dalal, N., and Triggs, B.: Histograms of oriented gradients for human
detection, in: Proceedings of the International Conference on Computer
Vision & Pattern Recognition (CVPR ’05), doi:10.1109/CVPR.2005.177,
2005.
6 Donoho, D. L.: High-dimensional data analysis: The curses and blessings of
dimensionality, AMS Math Challenges Lecture, 1–32, 2000
7 Erlang, A. K. (1909). The theory of probabilities and telephone
conversations. Nyt. Tidsskr. Mat. Ser. B, 20, 33-39.
References - II
1 Febrero-Bande, M., Galeano, P., & González-Manteiga, W. (2008). Outlier
detection in functional data by depth measures, with application to identify
abnormal NOx levels. Environmetrics: The official journal of the
International Environmetrics Society, 19(4), 331-345.
2 Hartigan, J. A., and Wong, M. A.: Algorithm AS 136: A k-means
clustering algorithm, J. R. Stat. Soc., Series C (Applied Statistics), 28(1),
100–108, doi:10.2307/2346830, 1979.
3 Kellens, W., Terpstra, T., and De Maeyer, P.: Perception and
communication of flood risks: A systematic review of empirical research,
Risk Anal., 33/1, 24–49, doi:10.1111/j.1539-6924.2012.01844.x, 2013
4 Keogh, E., and Mueen, A.: Curse of dimensionality, in: Sammut, C., and
Webb, G. I. (eds.), Encyclopedia of Machine Learning and Data Mining,
314–315, Springer, doi:10.1007/978-1-4899-7687-1-192, 2017.
5 Jović, A., Brkić, K., and Bogunović, N.: A review of feature selection methods
with applications, in: Proceedings of the 38th International Convention on
Information and Communication Technology, Electronics and
Microelectronics (MIPRO), 1200–1205, doi:10.1109/MIPRO.2015.7160458,
2015
6 Sun, Y., & Genton, M. G. (2011). Functional boxplots. Journal of
Computational and Graphical Statistics, 20(2), 316-334.
Supplemental
Figure: Snapshots of a dynamic map showing the spatiotemporal distribution of
mobile phone users (MPU) at 12 pm on 17/11/2015 (Wednesday); base
map: Lombardy Regional Technical Map CTR 1:5000, provided by Lombardy
Region (www.geoportale.regione.lombardia.it).