This document outlines a thesis on using data fusion methods for location data analysis. The thesis will examine using aggregated call detail records and social media data to detect and describe events in urban areas. It will also analyze re-identifying anonymized call detail records by fusing them with social network data. The document discusses different location data types, outlier detection methods, evaluating event detection precision and recall, and improving results by combining datasets. It also covers calculating user uniqueness from mobility data and the probabilistic approach to re-identification.
Information Fusion Methods for Location Data Analysis
1. Information fusion methods for
location data analysis
Candidate: Alket Cecaj Supervisor: Prof. Marco Mamei
Doctorate School in Industrial Innovation Engineering
2. Thesis outline
• Introduction
• Data Fusion for Event Detection and Event Description Using Aggregated CDR
• Re-identification of Anonymized CDR Records Using Information Fusion
• Privacy issues
• Conclusions
3. Data Fusion and Location data
• Data Fusion
• Location Data types:
- CDR (Call Detail Records), aggregated or individual.
- Geo-tagged social network data or location-based services (LBS) such as Foursquare.
- Location data published as open data, for example census data.
4. Data fusion for event detection by using aggregated
CDR and geo-tagged social network data
• Detecting and describing events happening in urban areas by
analysing spatio-temporal data
• Previous work: Laura Ferrari, Marco Mamei, Massimo Colonna (2012): “People get together on special
events: Discovering happenings in the city via cell network analysis”. Pervasive Computing and Communications
Workshops (PERCOM Workshops), 2012 IEEE International Conference on.
• Publication: Cecaj Alket, Marco Mamei (2016): “Data Fusion for City Life Event Detection”. In: Journal of
Ambient Intelligence and Humanized Computing, pp. 1–15.
11. By combining the results from
the two datasets
• Improvement of the precision-recall
performance of the method
• The improvement is limited in the
long run by the main dataset.
• The same improvement can be
observed also by joining the results
of the other datasets.
Improving event detection results by data fusion
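The effect of combining results can be illustrated with a minimal sketch. It assumes that each detected event is identified by an (area, day) pair and that fusion is a plain union of the two result sets. The event names, the data, and the union rule are hypothetical illustrations, not the method or data used in the thesis.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) for a set of detected events."""
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical events, each identified by (area, day).
ground_truth = {("stadium", 1), ("square", 2), ("arena", 3), ("park", 4)}
cdr_detected = {("stadium", 1), ("square", 2), ("mall", 5)}
social_detected = {("square", 2), ("arena", 3)}

# Assumed fusion rule: the union of the two result sets.
fused = cdr_detected | social_detected

print(precision_recall(cdr_detected, ground_truth))
print(precision_recall(social_detected, ground_truth))
print(precision_recall(fused, ground_truth))  # (0.75, 0.75)
```

In this toy data the union recovers an event that the CDR results miss, so recall rises from 0.5 to 0.75. With other data a union can also lower precision, because false positives from both sources accumulate.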
12. By using the CDR data the
events can be detected but
not described:
• By joining the results the data
can complement and enrich
each other.
• In this case the social dataset
can be used to describe
the events semantically
Data fusion for Event description
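One way to picture this kind of description is sketched below: given an event detected from CDR data, collect the geo-tagged posts that fall in the same area and time window and report their most frequent words. The record layout (`area`, `time`, `text`), the word-length filter, and the sample data are assumptions made for illustration only. The thesis may describe events differently.

```python
from collections import Counter


def describe_event(event, tweets, top_n=3):
    """Return the most frequent words in tweets posted in the event's area and time window."""
    words = Counter()
    for tweet in tweets:
        same_area = tweet["area"] == event["area"]
        in_window = event["start"] <= tweet["time"] <= event["end"]
        if same_area and in_window:
            # Crude filter (assumption): ignore very short words.
            words.update(w for w in tweet["text"].lower().split() if len(w) > 3)
    return [word for word, _ in words.most_common(top_n)]


# Hypothetical event detected from CDR data (times are hours of the day).
event = {"area": "stadium", "start": 18, "end": 22}
tweets = [
    {"area": "stadium", "time": 19, "text": "Great concert tonight"},
    {"area": "stadium", "time": 20, "text": "concert crowd is huge"},
    {"area": "square", "time": 19, "text": "quiet evening"},
    {"area": "stadium", "time": 9, "text": "morning run"},
]

print(describe_event(event, tweets, top_n=1))  # ['concert']
```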
13. Re-identification of CDR data by using social
network geo-tagged data
Information fusion for de-anonymizing anonymized CDR data.
Montjoye, Y. et al. (2013). “Unique in the crowd: The privacy bounds of
human mobility”. In: Scientific Reports 3, pp. 161–180
Cecaj, Alket, Marco Mamei, and Franco Zambonelli (2015). “Re-identification and Information
Fusion Between Anonymized CDR and Social Network Data”. Journal of Ambient Intelligence
and Humanized Computing, pp. 1–14.
17. • Given that CDR user Ci has Ni events (points) in common with FTi, how likely is it that the two
users are the same?
• The question is both novel (no other work addresses it in this domain) and fundamental
• Conditional probability
• Even if the percentage is low, in a data set of millions of users a considerable
number of them can be identified.
Re-identification: probabilistic approach
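One way to read the conditional probability on this slide is through Bayes' rule. The sketch below is a toy illustration, not the model used in the thesis: the uniform prior over `n_candidates` social users and the per-point match probabilities `p_match_same` and `p_match_diff` are all hypothetical.

```python
def posterior_same_user(n_common, n_candidates,
                        p_match_same=0.9, p_match_diff=0.01):
    """P(same user | n_common matching points), via Bayes' rule.

    Illustrative assumptions (not taken from the thesis):
    - uniform prior 1 / n_candidates that a given social user is the CDR user
    - each point matches independently, with probability p_match_same when the
      two users are the same person and p_match_diff when they are not.
    """
    prior = 1.0 / n_candidates
    like_same = p_match_same ** n_common
    like_diff = p_match_diff ** n_common
    evidence = like_same * prior + like_diff * (1.0 - prior)
    return like_same * prior / evidence


# The posterior grows quickly with the number of points in common.
for n in (0, 1, 2, 4):
    print(n, posterior_same_user(n, n_candidates=1_000_000))
```

With no points in common the posterior equals the prior. Under these assumed probabilities each additional common point multiplies the odds by `p_match_same / p_match_diff`, which is one way to see why even a handful of matching points can single out a user.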
18. Conclusions
• Information fusion as an enabling process for novel applications
- Future work oriented towards the “structured data fusion” idea
• Privacy
- anonymity vs. re-identification and remaining utility of data
- variations of existing privacy-preserving techniques (e.g. differential privacy)
19. Publications
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli: “Collective Awareness
for Human ICT Collaboration in Smart Cities”. IEEE WETICE International conference on state-of-the-art research in
enabling technologies for collaboration 17-20 2013.
• Alket Cecaj, Marco Mamei, Nicola Bicocchi: “Re-identification of Anonymized CDR datasets Using Social Network
Data”. IEEE Percom International conference on Pervasive Computing and Communications. Budapest, Hungary 24-28, 2014.
• Cecaj Alket, Marco Mamei (2016): “Data Fusion for City Life Event Detection”. In: Journal of Ambient Intelligence and
Humanized Computing, pp 1–15.
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli (2014): “Social
Collective Awareness in Socio-Technical Urban Superorganisms”. Social Collective Intelligence: Combining the Powers
of Humans and Machines to Build a Smarter Society, Part III, Applications and Case Studies, page 227.
• Cecaj, Alket, Marco Mamei, and Franco Zambonelli (2015). “Re-identification and Information Fusion Between
Anonymized CDR and Social Network Data”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–14.
Editor's Notes
The aim of my thesis work is to:
1- develop data fusion techniques for geo-referenced data.
2- This work has, on the one hand,
2.1- made it possible to develop several applications that enrich the data sets themselves and,
2.2- on the other hand, highlighted some privacy problems that arise from the data fusion process.
The thesis is organized as follows:
After an introductory part, it presents a study on the automatic detection of large events in urban areas
using aggregated mobile phone data and geo-referenced social data.
From aggregated data we then move to anonymized CDR data, which show individual mobility traces. In particular, this work shows how applying data fusion to these data can affect privacy.
Finally, together with the conclusions, several open issues are presented, concerning both data fusion and privacy preservation.
Data fusion is the process of combining and integrating multiple data sets. The process analyses several data sets so that each of them can interact with, inform and complement the others.
As for the types of geo-referenced data, these include CDRs, or Call Detail Records, which come in two formats:
Aggregated activity levels (calls, SMS or data connections) in a given area
Data showing individual mobility traces
Other sources of location data are geo-referenced social data and open data, for example census data.
1- I start with the first data fusion application, an event detection system that uses aggregated CDR data.
City managers and local authorities very often need to understand (sometimes urgently, in an emergency) what is happening in a given area of the city, or simply to understand the dynamics of an urban area in terms of traffic, air pollution, movement of people, etc., and to make improvements accordingly.
2- This study follows this direction and aims to create an application that can automatically detect events
in urban areas by analysing aggregated CDR data and geo-referenced social data.
3- Other work in this area: Ferrari, Mamei, Colonna (2012), presented at PerCom 2012.
4- This work was published in the «Journal of Ambient Intelligence and Humanized Computing».
1- (CDR data) Aggregated CDR (Call Detail Records) data show activity levels in terms of incoming and outgoing calls and SMS in a given area.
2- (Data provided) The data were provided during a Big Data Challenge organized by TIM Telecom Italia in 2014 and cover two cities, Milan and Trento.
3- (The graph shows) the activity levels, over a two-month period, of a grid cell near a stadium where sports events typically take place at weekends.
For our approach to event analysis and detection we aggregated the data both spatially and temporally.
Spatial aggregation helps us in two ways:
1- first, if the area where an event takes place is split across several cells, aggregation lets us
identify the event area with one or at most two cells;
2- second, on the computational side, with fewer cells we can detect events in less time.
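As a rough illustration of spatial aggregation, the sketch below merges a fine grid into coarser blocks by summing activity levels. The `(row, col)` cell indexing, the block size `factor` and the sample values are assumptions made for the example; the notes do not say how cells were actually merged.

```python
from collections import defaultdict


def aggregate_cells(activity, factor=2):
    """Merge a fine grid into coarser blocks of factor x factor cells.

    activity maps (row, col) -> activity level for one time slot.
    The (row, col) indexing and the block size are illustrative assumptions.
    """
    coarse = defaultdict(float)
    for (row, col), level in activity.items():
        coarse[(row // factor, col // factor)] += level
    return dict(coarse)


# Made-up activity levels: an event spread over four neighbouring fine cells.
fine = {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 8.0, (1, 1): 15.0, (0, 2): 3.0}
coarse = aggregate_cells(fine)
print(coarse)
```

The four fine cells covering the event collapse into a single coarse cell, and the number of cells to scan drops accordingly.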
Temporal aggregation, in turn, helps us approximate the probability density distribution of a cell's activity levels, which is bimodal as in a), with a normal distribution as in d), by aggregating the data on an hourly basis and distinguishing between working days and weekends.
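One possible reading of this temporal aggregation is to group a cell's samples by hour of day and by day type, so that each group can be modelled separately. The record layout, the function name `hourly_profiles` and the sample values below are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_profiles(records):
    """Group activity samples by (day type, hour of day).

    records is an iterable of (timestamp, level) pairs for one cell.
    Returns {("weekday" | "weekend", hour): [levels, ...]}.
    """
    groups = defaultdict(list)
    for ts, level in records:
        # Monday is 0, so 5 and 6 are Saturday and Sunday.
        day_type = "weekend" if ts.weekday() >= 5 else "weekday"
        groups[(day_type, ts.hour)].append(level)
    return dict(groups)


# Made-up samples for a single cell.
samples = [
    (datetime(2013, 11, 4, 9, 10), 120.0),  # Monday
    (datetime(2013, 11, 5, 9, 40), 130.0),  # Tuesday
    (datetime(2013, 11, 9, 9, 0), 60.0),    # Saturday
]
profiles = hourly_profiles(samples)
print(profiles)
```

Each `(day type, hour)` group then holds the values that are compared with one another when looking for unusual activity.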
The normal distribution of the data makes it possible to use the boxplot effectively as a tool for representing a cell's activity levels over time. Modelling the data in this way, I can use an outlier (and therefore event) detection method known as the boxplot rule.
With this method I identify outliers as values above the upper bound UB, where UB = Q75 + k * IQR
and IQR = Q75 – Q25.
Taking a given activity level or threshold as reference, I evaluate each time the number of events found for that threshold.
The coefficient k lets me decide whether or not to count as events the peaks found with respect to different
cell activity levels.
Other versions of this method are also tested, using Q50 or Q75 in place of IQR; these methods will be referred to
as IQR, M and Q75.
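The boxplot rule can be sketched in a few lines. This is a minimal version under stated assumptions: quartiles come from `statistics.quantiles` with the inclusive method, the default `k=1.5` is the conventional boxplot value rather than one taken from the thesis, and the M and Q75 variants are read as replacing the IQR term in UB with Q50 or Q75.

```python
import statistics


def upper_bound(levels, k=1.5, spread="IQR"):
    """Upper bound UB = Q75 + k * spread for a list of activity levels.

    spread selects the variant: "IQR" (Q75 - Q25), "M" (Q50) or "Q75".
    Reading the M and Q75 variants this way is an assumption.
    """
    q25, q50, q75 = statistics.quantiles(levels, n=4, method="inclusive")
    spreads = {"IQR": q75 - q25, "M": q50, "Q75": q75}
    return q75 + k * spreads[spread]


def detect_events(levels, k=1.5, spread="IQR"):
    """Return the indices of values above the upper bound (candidate events)."""
    ub = upper_bound(levels, k, spread)
    return [i for i, value in enumerate(levels) if value > ub]


# Made-up activity levels for one cell at the same hour on different days.
history = [100, 110, 95, 105, 98, 102, 400, 101]
print(upper_bound(history), detect_events(history))
```

Raising `k` pushes the upper bound up, so fewer and larger peaks are reported as events.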
1- I compare the results of the event detection method with a set of ground truth data.
2- These are events that took place in the area during the period covered by the dataset itself, such as football matches, fairs, protests and other events involving large numbers of people.
1- We then evaluate the recall and precision of the system by comparing its results with the ground truth data.
2- In this case recall is the ratio between the events recognized as such and the events that actually took place in the area.
3- Precision is a measure of the quality of the detections: the number of detected events that appear in the ground truth divided by the total number of events that my event detection method (analysing my data) reports.
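The two measures can be computed directly once detected and ground truth events share an identifier. In the sketch below events are represented as hypothetical `(place, date)` pairs; how events were actually matched to the ground truth is not stated in the notes.

```python
def precision_recall(detected, ground_truth):
    """Precision and recall for two collections of event identifiers."""
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Made-up events identified by (place, date); not real results.
ground_truth = {("stadium", "2013-11-10"), ("fair", "2013-11-16"),
                ("square", "2013-12-01")}
detected = {("stadium", "2013-11-10"), ("fair", "2013-11-16"),
            ("station", "2013-11-20"), ("park", "2013-11-21")}

precision, recall = precision_recall(detected, ground_truth)
print(precision, recall)
```

Here two of the four detections are real events (precision 0.5) and two of the three real events are found (recall about 0.67).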
The graph on the right shows the precision and recall results obtained with the median method for various values of k.
In particular, each curve in the graph on the right was obtained with a single value of k while varying the reference threshold level; going from a value of 1000 to about 2500, the number of events found also varies. For each reference threshold level I obtain
a certain value of precision and recall, which I plot in the graph. Moving from the top graph to the bottom one, the number of events found decreases
because I ignore events of lower magnitude and concentrate on the larger ones. As a result recall decreases while precision increases.
In particular, for low k such as 0.5 (as in the first graph at the top) recall is higher but precision is low, whereas for high k precision improves
but recall starts from a lower initial value.
Many other experiments were run on both cities and with different cell types.
To integrate the event detection results obtained with the CDR data and with the social data, we take the set union of the events detected in each of the two data sets. We then evaluate precision and recall on the integrated results. The red curve shows the final precision and recall values. In particular, for the same recall there is an improvement in precision, although this improvement is limited by the events obtained with the main dataset, the CDR one.
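The fusion step described here is a plain set union followed by the same evaluation. The sketch below uses made-up event identifiers chosen so that the effect is visible; it is not a reproduction of the reported curves, and a union can also lower precision when the second data set adds false positives.

```python
def precision_recall(detected, ground_truth):
    """Precision and recall for two non-empty sets of event identifiers."""
    hits = len(detected & ground_truth)
    return hits / len(detected), hits / len(ground_truth)


# Hypothetical event identifiers; "x1" is a false positive from the CDR data.
ground_truth = {"e1", "e2", "e3", "e4"}
cdr_events = {"e1", "e2", "x1"}
social_events = {"e2", "e3"}

# Data fusion by set union of the events detected in each data set.
fused_events = cdr_events | social_events

print("CDR only:", precision_recall(cdr_events, ground_truth))
print("fused:   ", precision_recall(fused_events, ground_truth))
```

In this toy case the social data set contributes one real event that the CDR data missed, so both precision and recall rise after the union.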
Another advantage of data fusion is that the two data sets are complementary for the purpose of event description. They enrich the final result, since the social data set can describe the events detected with the CDR data, simply by analysing the topics and keywords that appear in the text of social users' status updates once the results are integrated.
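A very simple form of keyword-based event description is to count the most frequent terms in the posts that fall in the cell and time window of a detected event. The sketch below assumes that selection has already been done; the stopword list, the tokenizer and the sample posts are illustrative, and the topic analysis used in the thesis may be more elaborate.

```python
import re
from collections import Counter

# A tiny illustrative stopword list, not a complete one.
STOPWORDS = {"the", "a", "at", "in", "to", "is", "and", "of", "for", "on"}


def describe_event(posts, top_n=3):
    """Return the most frequent non-stopword terms in the posts' text."""
    counts = Counter()
    for text in posts:
        for word in re.findall(r"[a-z#@']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]


# Made-up posts assumed to come from the cell and time window of one event.
posts = [
    "At the stadium for the derby",
    "Derby night! Great match at the stadium",
    "Traffic near the stadium, derby crowd",
]
print(describe_event(posts, top_n=2))
```

The top terms give a short semantic label for an event that the CDR data alone can only locate in space and time.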
A conclusion from this first part of the thesis therefore concerns the opportunities that data fusion methods offer to enrich
and complement both the data of a data set and the results of the analysis.
The data used in the previous example are provided in aggregated form and therefore contain no references to individuals. In other cases, however, CDRs contain anonymized data in which the user id is a unique hash code.
Even though the data in this case are anonymized (anonymization is not enough, although it is very often considered safe), there is always the possibility that they can be de-anonymized by using other data for re-identification, such as geo-referenced social data.
This is possible because each individual's mobility traces (like their digital traces) are unique. Starting from this notion of uniqueness of mobility traces, the following study shows how data fusion techniques can be used to re-identify anonymized CDR users.
1- Two types of data set: two CDR data sets and two geo-referenced social data sets.
The first graph, top left, shows the distribution of events (call, SMS, Internet) per user for the first CDR data set. Next to it is a plot of a mobility measure for these users called the «radius of gyration», which expresses the average length of the CDR users' paths.
The graph below shows the same measures for the social users, i.e. events per user and «radius of gyration».
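The notes do not give a formula for the radius of gyration. The sketch below assumes the usual definition, the root-mean-square distance of a user's recorded points from their centroid, and assumes points already projected onto planar coordinates (for example kilometres) rather than latitude and longitude.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of a user's points from their centroid.

    points is a non-empty list of (x, y) coordinates in a planar projection;
    the planar projection is an assumption made to keep the example short.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


# Made-up trace: four events at the corners of a 2 x 2 square.
trace = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
rg = radius_of_gyration(trace)
print(rg)
```

A user who always appears in the same place has a radius of zero, while users who cover a wider area have larger values.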
In particular, the analysis of the uniqueness of mobility paths gives us two kinds of information:
1- the average number of points or events needed to identify an individual as unique;
2- the percentage of CDR users who have a unique path and can therefore be associated with a single mobility trace.
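A simplified way to estimate uniqueness is to sample a few points from each user's trace and check whether any other trace also contains all of them. The sketch below is an assumption about how such a test could look, not the procedure used in the thesis; the `(cell, hour)` point format, the function name `unique_fraction` and the sample traces are hypothetical.

```python
import random


def unique_fraction(traces, n_points, seed=0):
    """Fraction of users whose n_points sampled points match no other trace.

    traces maps a user id to a set of (cell, hour) points.
    """
    rng = random.Random(seed)
    unique = 0
    for user, points in traces.items():
        # Sort first so that sampling is reproducible for a given seed.
        sample = set(rng.sample(sorted(points), min(n_points, len(points))))
        matches = [u for u, other in traces.items() if sample <= other]
        if matches == [user]:
            unique += 1
    return unique / len(traces)


# Made-up traces: u1 and u2 share two points, u3 shares none.
traces = {
    "u1": {("A", 8), ("B", 12), ("C", 18)},
    "u2": {("A", 8), ("B", 12), ("D", 18)},
    "u3": {("E", 9), ("F", 13), ("G", 19)},
}
for n in (1, 2, 3):
    print(n, unique_fraction(traces, n))
```

Repeating the estimate for increasing `n_points` shows how many points are needed before most users become unique.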
Put this in context with a concrete example.
A first conclusion can be drawn by looking at the data in the graph on the left, which shows the number of points
There are two conclusions in particular:
1- Data fusion is a process that enables a range of applications; however, the field still lacks a notion of structured data fusion.
2- The second conclusion concerns privacy, in particular that of individual CDR data, which although anonymized can
The publications we have produced on these topics.