The document discusses using mobility traces and context information to detect loss or theft of mobile devices. It proposes converting traces and context into "behavior text" representations, then building an n-gram language model to establish a baseline for normal behavior. The model can detect anomalies indicating potential loss or theft events by flagging sequences with unexpectedly low probabilities. The approach aims to discover such events early for notification and recovery efforts.
The penetration of mobile devices equipped with various embedded sensors also makes it possible to capture the physical and virtual context of the user and the surrounding environment. Further, modeling human behavior from these data becomes increasingly important with the growing popularity of context-aware computing and people-centric applications, which use users' behavior patterns to improve existing services or enable new ones. In many natural settings, however, their broader application is hindered by three main challenges: the rarity of labels, the uncertainty of activity granularities, and the difficulty of multi-dimensional sensor fusion.
We introduce a new mobile system framework, SenSec, which uses passive sensory data to ensure the security of applications and data on mobile devices.
SenSec constantly collects sensory data from accelerometers, gyroscopes and magnetometers and constructs the gesture model of how a user uses the device.
SenSec calculates the sureness that the mobile device is being used by its owner.
Based on the sureness score, mobile devices can dynamically request the user to provide active authentication (such as a strong password), or disable certain features of the device to protect the user's privacy and information security.
In this paper, we model such gesture patterns through a continuous n-gram language model using a set of features constructed from these sensors. We built a mobile application prototype based on this model and used it to perform both user classification and user authentication experiments. User studies show that SenSec can achieve 75% accuracy in identifying users and 71.3% accuracy in detecting non-owners, with only 13.1% false alarms.
Guest Lecture: SenSec - Mobile Security through BehavioMetrics Jiang Zhu
This document summarizes research on using mobile sensor data and behavioral biometrics for user authentication and activity recognition. It describes collecting data from accelerometers, GPS, WiFi and applications to build language models of user behavior. Scores are calculated to determine the likelihood a behavior belongs to a user or activity class. Authentication is triggered based on thresholds. The system was tested to identify users from single key presses and detect anomalies with days of training data at 80% accuracy. Future work involves expanded data collection, improved models, integration with security frameworks, and ensuring user privacy.
SenSec: Mobile Application Security through Passive Sensing (Jiang Zhu)
The document proposes a smartphone-based behavioral authentication system called SenSec. It collects sensor data to build user behavior models. Features are extracted from the sensor data and used to build risk analysis trees to detect anomalies. When anomalies are detected, a certainty score is broadcast and can trigger authentication for sensitive applications. The system was tested on a dataset of 25 users, achieving over 98% accuracy in user identification. Extensions and integrations with other systems are discussed to enhance security, privacy, and energy efficiency.
TaintDroid is a system that provides dynamic taint tracking and analysis for Android. It tracks privacy sensitive information like location, contacts etc. at variable, message, method and file levels with 14% overhead. Testing 30 apps found 20 shared information unexpectedly, like sending device IDs or location to ad servers. TaintDroid effectively demonstrates the need for stronger mobile privacy but has limitations like requiring OS modifications and false positives. Future work aims to reduce false positives, integrate crowdsourcing and detect privacy information leakage attempts.
Behaviometrics: Behavior Modeling from Heterogeneous Sensory Time-Series (Jiang Zhu)
Over the decades, we have seen tremendous success in biometrics technologies being used in all types of applications based on the physical attributes of the individual such as face, fingerprints, voice and iris. Inspired by this, we introduce a new concept Mobile Behaviometrics, which uses algorithms and models to measure and quantify unique human behavioral patterns in place of human bio-attributes. Behaviometrics algorithms take multiple data from various sensors as input and fuse them to build behavioral models which are capable of producing application specific quantitative analysis on the unique individuals that were the originators of the data.
The document discusses intuitive user interfaces and one-touch interactions. It describes a company called IntuitiveUI that aims to simplify device usage through predictive modeling and a one-touch experience. IntuitiveUI uses sensors and logging of user behaviors to build statistical models and predict common actions based on context like time, location and past events. This allows displaying relevant options with a single touch rather than multiple taps through conventional menus. The approach aims to overcome challenges of mobile complexity but challenges include uneven data collection and meeting user expectations.
Context is King: AR, AI, Salience, and the Constant Next Scenario (Clark Dodsworth)
Clark Dodsworth’s AREvent talk, Santa Clara, CA, June 3, 2010: "Context is King: AR, AI, Salience, and the Constant Next Scenario". Mostly about smartphone AR as a gateway to context-aware computing becoming indispensable.
The document discusses how networks and applications can become more aware of each other to improve the experience for end users. Currently, networks and applications operate independently without much visibility into each other. The document proposes that applications share information about end users and traffic with networks, and networks share information about topology, bandwidth, and resources with applications. This would allow applications to optimize content placement and resource usage, and networks to gain insights to better optimize traffic and provide new services. The document argues this type of programmable network can improve areas like security, performance, analytics and more.
The document summarizes Radhika Dharurkar's Masters thesis defense on context-aware middleware for activity recognition. It provides an overview of her motivation, approach, implementation, experiments and results. Her work involved developing a prototype system that can predict 10 activities using data from smartphone sensors and other sources with better than average precision. Experiments were conducted collecting data from 2 users over 2 weeks to evaluate different classification algorithms on recognizing activities like working, studying, sleeping, etc. The most confused activities in classification were working/studying with others like coffee/snacks and sleeping.
Mobile Oxford - Open Source Junction, 5 July 2011 (Tim Fernando)
This document is a case study about the Mobile Oxford project from the University of Oxford. It describes how Mobile Oxford was created as an open source and accessible mobile website to provide services to students, staff and visitors of the university. It aggregates data from various university systems and provides features like transport information, contacts, library search, and tools from their learning management system. Mobile Oxford is now developed as part of the open source Molly Project to ensure long term sustainability and benefit other universities.
1) The document discusses a context-aware mobile social web, where mobile applications can access and use contextual information about users, such as their location, activity, and device characteristics.
2) Telecom Italia has developed platforms and applications to enable this context-aware mobile social web, including tools for collecting, representing, and analyzing context data, as well as applications that provide location-based recommendations and allow users to tag content with context.
3) Lessons from deploying these applications include the importance of openness to popular social networks, using context accurately, promoting high-quality content, and ensuring user privacy and comfort with sharing personal information. Standardization of context representation could help address challenges involving context certification and sharing
Uncovering Remote Peering Interconnections at IXPs (APNIC)
This document discusses remote peering at internet exchange points (IXPs). It presents a new methodology to accurately infer which peers are connected to IXPs through remote peering. The methodology uses factors like port capacity, ping round-trip times, colocation facilities, connections to multiple IXPs, and private connectivity to determine if a peer is local or remote. Applying this methodology to top IXPs found that around a third of members peer remotely, and large IXPs have around 40% remote peers. Remote peering is becoming a more popular practice that allows IXPs to expand their geographical reach. A public portal was also proposed to provide monthly snapshots of remote versus local peering inferences.
From Context-awareness to Human Behavior Patterns (Ville Antila)
Ville Antila discusses using smartphones to detect daily routines and human behavior patterns through continuous context logging. Smartphones can sense context through built-in sensors and log location, device usage, physical activity, and Bluetooth snapshots. This data is interpreted to estimate routines like locations visited and detect changes. Example applications include context-adaptive feedback that considers situation suitability, and context-based user interface migration between devices. Challenges include ensuring quality, user awareness of adaptive behavior, and testing context-aware applications in real-world use.
An Architecture for Privacy-Sensitive Ubiquitous Computing at MobiSys 2004 (Jason Hong)
Some older research I did looking at one way of building privacy-sensitive apps for ubiquitous computing environments. The core idea is to focus on locality, where all of the data is sensed and processed locally as much as possible.
Privacy is the most often-cited criticism of ubiquitous computing, and may be the greatest barrier to its long-term success. However, developers currently have little support in designing software architectures and in creating interactions that are effective in helping end-users manage their privacy. To address this problem, we present Confab, a toolkit for facilitating the development of privacy-sensitive ubiquitous computing applications. The requirements for Confab were gathered through an analysis of privacy needs for both end-users and application developers. Confab provides basic support for building ubiquitous computing applications, providing a framework as well as several customizable privacy mechanisms. Confab also comes with extensions for managing location privacy. Combined, these features allow application developers and end-users to support a spectrum of trust levels and privacy needs.
Authors are Jason Hong and James Landay
This document discusses cyber-physical systems and the Internet of Things. It outlines Tata Consultancy Services' research programs in areas like mobile phone sensing, camera sensing, signal and image processing, and human activity detection using sensors. The goals are to develop an IoT platform for affordable healthcare and wellness solutions using mobile phones to detect physiological parameters. Research is also described on indoor localization, cognitive load detection using EEG, and emotion recognition using cameras. TCS has several innovation labs conducting exploratory research on mobile interactive remote sensing applications.
The document discusses OpenSense, a research project that aims to monitor air pollution through dense deployments of wireless sensor nodes. The goals are to gather more precise, location-dependent pollution data in real-time. This would help officials and citizens, but poses technical challenges around sensor coordination, data quality, and privacy. The researchers use a utility-based control approach to optimize measurements across different system layers and models. OpenSense has deployed sensor nodes on buses, trams, and stationary wireless installations in several Swiss cities.
MindTrek2011 - ContextCapture: Context-based Awareness Cues in Status Updates (Ville Antila)
Presentation of an experimental mobile application, which allows users to add different descriptions of context information to their Facebook status updates. The meaningfulness and the usage of different context descriptions were evaluated in a two-week user trial. The results show that the most frequently used awareness cues in the test setting were location, surroundings, friends and activity. The results also indicate that user-defined semantic abstractions of context items (e.g. “home”, “work”) were often more informative and useful than more accurate indicators (e.g. the address or the name of the place). We also found out that using shared context from friends in vicinity (e.g. identifying the people around) needs careful design to overcome the extended privacy implications.
Mobile Oxford - Open Source Junction, 29 March 2011 (Tim Fernando)
This document discusses Mobile Oxford, a mobile website developed at the University of Oxford to aggregate content and services for students, staff, and visitors onto multiple mobile devices. It was developed as an open source project called Molly to ensure long-term sustainability and accessibility. Mobile Oxford has become a central hub providing places, transport, contacts, library search, and tools from the Weblearn learning management system. It is developed entirely as open source software to allow customization and benefits from ongoing community contributions.
Jonathan Lenaghan, VP of Science and Technology, PlaceIQ, at MLconf ATL 2016 (MLconf)
Discerning Human Behavior from Mobility Data: Mobility data encompasses many elements, including location history, latitude coordinates, longitude coordinates, anonymized mobile device IDs, and timestamps. Such data are generated, for instance, by automobile navigation applications and by the mobile advertising ecosystem. Typical sources of mobility data contain extensive inaccuracies that result from a variety of sources, ranging from shortcomings in location services on mobile devices to the intentional misrepresentation of spatial coordinates by bad ecosystem actors. In this talk, we describe a production data pipeline, Darwin, which analyzes the location quality of mobility data to measure how accurately a set of mobility data represents true movement patterns. Darwin uses a number of measures that are ultimately combined into two quality scores: hyper-locality and clusterability. These measurements include techniques from information theory, the mean number of spatial clusters, the compactness of the clusters, and the differences between the empirical distribution of digits in the spatial coordinates and reference distributions.
Network Driven Behaviour Modelling for Designing User Centred IoT Services (Fahim Kawsar)
We are observing a monumental effort from the industry and academia to make everything connected. Naturally, to understand the needs of these connected things, we need a better understanding of humans and where, when, and how they interact. Then we can create digital services and capabilities that fundamentally change the way we experience our lives. IoT 1.0 is all about connectivity, and scale. IoT 2.0 will be about learning and contextual automation. Designing intention- and behavior-aware services will be the principal source of differentiation, and competitive advantage for the industry players. In this talk I argue that for wide scale adoption, and market penetration of personalized IoT services, existing network infrastructure should play the key role for sensing and learning, by eliminating the cost of deployment and management of many sensors. I will show then how wireless network can be used as a sensing platform to model human behaviour and to redefine people-content, people-thing, and people-people interaction experience in an IoT enabled world.
This document provides an agenda for the MOBISYS seminar, listing the speakers and their topics. It includes snippets of conversations between the speakers as they introduce themselves and their topics. The document discusses research related to pervasive computing, wireless networks, and mobile systems.
This document provides an agenda for the MOBISYS seminar, listing the speakers and their topics. It includes snippets of conversations between the speakers as they introduce themselves and their topics. The document discusses research related to pervasive computing, wireless networks, and mobile systems.
Sense Networks uses proprietary technology and location analytics expertise to analyze location history and behavior patterns to deliver unique insights and intelligence. They have built a high-capacity platform called MacroSense that can extract information from tens of millions of user locations and points of interest, segmenting users and predicting future behaviors. Their location-based segments have been shown to drive user responses and actions.
Digital Aura allows Alice and Bob to connect via their Bluetooth devices when Bob recognizes Alice's request for help with algorithms from her profile as he passes by. His phone alerts him and transfers Alice's contact information so they can get in touch.
1. Jiang Zhu and Joy Y. Zhang
Carnegie Mellon University
August 2nd, 2011
2. • Monitor and track user mobility behavior in a WLAN environment using RSS traces
• Convert mobility traces and other context information to Behavior Text representations
• Build an n-gram language model from the behavior text and use it for anomaly detection to discover loss or theft events
4. [Chart: mobile device loss or theft rates (0-60%) across frequently visited U.S. cities, including Miami, New York, Los Angeles, Phoenix, Sacramento, Chicago, Dallas, Houston, Philadelphia, Boston and San Francisco]
Strategy One survey conducted among a U.S. sample of 3,017 adults age 18 years or older, September 21-28, 2010, with an oversample in the top 20 cities (based on population).
5. Business and personal applications running together
Corporate messaging and email on personal devices
Intranet wireless access on personal devices
Personal finance and banking on corporate devices
Mobile payments and credentials
• CAPEX loss
• Data loss
• Recovery effort
• Loss of business
"The 329 organizations polled had collectively lost more than 86,000 devices ... with average cost of lost data at $49,246 per device, worth $2.1 billion or $6.4 million per organization."
"The Billion Dollar Lost-Laptop Study," conducted by Intel Corporation and the Ponemon Institute, analyzed the scope and circumstances of missing laptop PCs.
6. Detection: discover the loss or theft early enough to initiate other steps
Mitigation: revoke access to sensitive data, applications or services
Notification: notify owners, administrators or authorities
Recovery: rescue the device; recover/restore data
7. • Mobility as behavior
• Mobility modeling is a well-studied research area
• Can be measured and tracked: Wi-Fi, GPS, cellular, etc.
• Other contextual information can be combined: Bluetooth, accelerometer, etc.
• Other motivating applications
• Healthcare: inpatient telemetry
• Education: monitoring young children
• Law enforcement: inmate monitoring and control
9. • Past and current locations trigger future locations
[Diagram: example transitions among locations: Hallway A, Hallway B, Office, Break Room, Bathroom]
• User mobility as a short sequence of locations [1] [2]
• "Language as action": language vs. streams of sensor data
• Composing elements: sensor data vs. words in a corpus
• Sequence structure: local dependency vs. "grammar"
[1] Aipperspach et al., "Modeling Human Behavior from Simple Sensors in the Home", PerCom 2006
[2] Buthpitiya et al., "n-gram Geo-Trace Modeling", Pervasive 2011
10. • User location at time t depends only on the last n-1 locations
• A sequence of locations can be predicted from the n consecutive locations in the past
• Maximum Likelihood Estimation from training data by counting
• MLE assigns zero probability to unseen n-grams
• Incorporate a smoothing function (Katz): discount probability mass from observed grams and reserve it for unseen grams
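To make the counting and smoothing step concrete, here is a minimal Python sketch of an n-gram model over pseudo-location labels. The absolute-discount fallback is a simplified stand-in for the Katz back-off the slide mentions, and all names and values are illustrative rather than taken from the authors' implementation.

```python
from collections import Counter

def train_ngram(labels, n=3):
    """Count n-grams and their (n-1)-gram contexts in a pseudo-location sequence."""
    ngrams = Counter(tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))
    contexts = Counter(tuple(labels[i:i + n - 1]) for i in range(len(labels) - n + 1))
    return ngrams, contexts, set(labels)

def prob(ngrams, contexts, vocab, context, loc, discount=0.5):
    """P(loc | context): discounted MLE counts; the reserved mass is spread
    uniformly over locations never seen after this context (a simplified
    stand-in for Katz smoothing)."""
    ctx_count = contexts.get(context, 0)
    if ctx_count == 0:                           # unseen context: fall back to uniform
        return 1.0 / len(vocab)
    seen = {l for l in vocab if (context + (l,)) in ngrams}
    count = ngrams.get(context + (loc,), 0)
    if count > 0:
        return (count - discount) / ctx_count    # discounted MLE for observed grams
    reserved = discount * len(seen) / ctx_count  # mass taken from observed grams
    return reserved / max(1, len(vocab) - len(seen))

# Toy usage on a short pseudo-location trace
trace = ["L1", "L2", "L3", "L1", "L2", "L3", "L1", "L4"]
ngrams, contexts, vocab = train_ngram(trace, n=3)
print(prob(ngrams, contexts, vocab, ("L1", "L2"), "L3"))  # frequent transition
print(prob(ngrams, contexts, vocab, ("L1", "L2"), "L4"))  # unseen, gets reserved mass
```

With the discount the conditional probabilities for a given context still sum to one, which is what the later log-probability scoring relies on.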
11. • Long-distance dependency of words in sentences
• Trigrams for "I hit the tennis ball": "I hit the", "hit the tennis", "the tennis ball"
• "I hit ball" is not captured
• A future pseudo location may depend on locations far in the past; intermediate behavior has little relevance or influence
• Noise in the collected data: "ping-pong" effect in WLAN association, interference, sampling errors, etc.
• Model size
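Skipped n-grams are one way to capture such non-adjacent dependencies. The following is a hedged sketch of a k-skip-n-gram extractor (not the authors' code); applied to the example sentence it produces "I hit ball" alongside the contiguous trigrams.

```python
from itertools import combinations

def skip_ngrams(tokens, n=3, k=2):
    """k-skip-n-grams: n tokens kept in order, with at most k positions
    skipped inside the span covered by the gram."""
    grams = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n + k]      # widest span a gram may cover
        for rest in combinations(range(1, len(window)), n - 1):
            grams.append((window[0],) + tuple(window[i] for i in rest))
    return grams

print(skip_ngrams("I hit the tennis ball".split()))
# Contains ('I', 'hit', 'ball') in addition to the contiguous trigrams.
```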
13. • Collect RSS of the devices on multiple WAPs with timestamps
• Aggregate and serialize into a time series of RSS vectors
* Lin et al., "WASP: An enhanced indoor location algorithm for a congested Wi-Fi environment"
14. • Dimensionality of the RSS vector is too fine for modeling
• Proximity in location results in similar RSS vectors
• K-means clustering with a distance function similar to WASP [1]; each cluster is assigned a pseudo-location label
[1] Lin et al., "WASP: An enhanced indoor location algorithm for a congested Wi-Fi environment"
15. • Repeating location labels dominate the n-gram statistics
• Extract "duration" by counting repeating labels
• Only append the "duration" label if the mutual information between location and duration is high
• Dependency: "Conference Room" + "1 hour" infers "Meeting"
• Personal: "Professor's Office" + "10 minutes" infers "Student's quick chat"
• Segment behavior text sequences based on time-of-day
• Behavior follows routine and agenda
• Varies among users
• Cut the boundary based on activity level
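A hedged sketch of the collapsing step under the 13-second sampling assumption: repeated labels are run-length encoded into (location, duration) pairs, durations are bucketed, and mutual information between location and duration bucket is computed. The bucket edges and the MI check are illustrative choices, not the authors' exact thresholds.

```python
from collections import Counter
from itertools import groupby
from math import log2

def collapse_runs(labels, sample_seconds=13):
    """Run-length encode repeated pseudo-location labels into (location, seconds) pairs."""
    return [(loc, len(list(grp)) * sample_seconds) for loc, grp in groupby(labels)]

def duration_bucket(seconds):
    """Map a dwell time onto a coarse duration token (bucket edges are illustrative)."""
    for limit, tag in [(60, "<1min"), (600, "<10min"), (3600, "<1hr")]:
        if seconds < limit:
            return tag
    return ">=1hr"

def mutual_information(pairs):
    """Mutual information between location and duration bucket over observed pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    locs = Counter(l for l, _ in pairs)
    buckets = Counter(b for _, b in pairs)
    return sum(c / n * log2((c / n) / ((locs[l] / n) * (buckets[b] / n)))
               for (l, b), c in joint.items())

labels = ["Office"] * 300 + ["HallwayA"] * 4 + ["ConfRoom"] * 280 + ["HallwayA"] * 3
tokens = [(loc, duration_bucket(sec)) for loc, sec in collapse_runs(labels)]
print(tokens)
print(mutual_information(tokens))  # high MI argues for keeping the duration token
```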
16. [System diagram: the RSS trace and other sensing feed feature extraction (pseudo-location extraction plus other features); preprocessing fuses them into behavior text; an n-gram model over the behavior text drives anomaly detection, outputting Anomaly Y/N]
17. • Feed the sequence of past locations in a sliding window of size N to the n-gram model for testing
• For a testing sequence of pseudo locations:
• Estimate the average log probability that this sequence is generated by the n-gram or skipped n-gram model
• If this likelihood drops below a threshold, flag an anomaly alert (a scoring sketch follows this list)
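A minimal sketch of the scoring step. The `model.logprob(token, history)` call is a hypothetical interface to the trained, smoothed n-gram model (not an actual API from this work); the score is the average log probability of the last N behavior-text tokens, and an alert fires when it falls below the chosen threshold.

```python
def window_scores(tokens, model, n=5, window=20):
    """Average log probability of each sliding window under the n-gram model."""
    scores = []
    for end in range(window, len(tokens) + 1):
        chunk = tokens[end - window:end]
        logp = 0.0
        for i, tok in enumerate(chunk):
            history = tuple(chunk[max(0, i - n + 1):i])   # last n-1 tokens
            logp += model.logprob(tok, history)            # assumed model API
        scores.append(logp / window)
    return scores

def flag_anomalies(scores, threshold):
    """Indices of windows whose likelihood drops below the threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]
```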
17
18. [Figure: average log probability vs. sliding-window position, with low and high detection thresholds; point A is the actual anomaly, B illustrates detection delay, and C/D illustrate false positives]
18
19. System pipeline: Sensing (RSS trace, other sensing) → Preprocessing (extract pseudo locations, extract other features, fusion, behavior text generation) → Anomaly Detection (n-gram model, threshold) → Anomaly Y/N
19
21. Dataset
Users: 40
Location: Cisco SJC 14 1F, Alpha networks
RSS sampling rate: 13 sec
Period: 5 days
Number of WAPs: 87
Device: Cisco Aironet 1500 + MSE
Dataset size: 3.2 mil points
• RSS vector clustering: run a small subset of the trace with different K and evaluate clustering performance by average distance to centroids
• K = 3X #WAPs has the best trade-offs
• Yields ~260 pseudo locations
21
22. • Testing samples
Positive sample: simulated anomaly created by splicing traces from two different users (a splicing sketch follows this list)
Negative sample: trace from the "owner"
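A sketch of how a positive (simulated-theft) sample could be spliced together; picking the first label the two users share as the splice point is my own simplification of the "intersection point" idea from the notes.

```python
def splice_traces(owner_trace, other_trace):
    """Simulate a stolen-device event: keep the owner's behavior text up to
    the first label both users share, then continue with the other user's
    trace from that point on. Returns None if the traces never intersect."""
    shared = set(owner_trace) & set(other_trace)
    for i, label in enumerate(owner_trace):
        if label in shared:
            j = other_trace.index(label)
            return owner_trace[:i + 1] + other_trace[j + 1:]
    return None
```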
22
23. • Train n-gram models with 8 hours of data
• A continuous 5-gram model and a skipped 3-gram with skipping factor k=2 result in similar accuracy, ~60%
• Model complexity: k-order reduction
• Skip factor K is data dependent; it suits the particular scenario in our data set: an office floor with hallways and corridors
• Further investigation is needed to find the optimal K
• Replacing repeating labels with the duration feature improves the model
Before collapsing, 5-gram statistics are dominated by several sequences with long repeating locations; the top 200 grams are repeating labels
After collapsing, 5-gram statistics are well distributed
• The time-of-day feature brings only marginal improvement, <1%
23
24. [Figure: ROC curves (true positive rate vs. false positive rate) for training data sizes of 8 and 12 hours]
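A sketch of reading a detection threshold off the ROC data, assuming per-window scores and ground-truth anomaly labels as above; scikit-learn's roc_curve is used purely for illustration (lower scores mean "more anomalous", hence the negation).

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold(y_true, scores, max_fpr=0.1):
    """y_true: 1 for anomalous windows, 0 for normal ones.
    Returns (score_threshold, tpr, fpr): flag windows whose score falls
    below score_threshold, chosen as the best TPR within the FPR budget."""
    fpr, tpr, thresholds = roc_curve(y_true, -np.asarray(scores))
    ok = fpr <= max_fpr
    best = int(np.argmax(tpr[ok]))
    return -thresholds[ok][best], tpr[ok][best], fpr[ok][best]
```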
24
25. [Figure: detection accuracy vs. n-gram order (1-10) for training data sizes of 4, 8 and 12 hours]
25
27. System pipeline: Sensing (RSS trace, other sensing) → Preprocessing (extract pseudo locations, extract other features, fusion, behavior text generation) → Anomaly Detection (n-gram model, threshold) → Anomaly Y/N
• Experiments to discover loss or theft events through anomaly detection reach 70~80% accuracy with only 8 hours of training data
27
28. Thank you.
And special thanks to our sponsors
CyLab Mobility Research Center
Cisco Systems Inc.
Army Research Office
33. • Extract a mobility model from real traces in a WLAN environment [1]
• Extract mobility tracks and duration from WLAN association records
• Analyze mobility characteristics: pause time, speed, direction, destination region and their distributions
• Build an empirical model to generate synthetic traces
• Steady-state and transient behavior can be modeled with a Semi-Markov model [2]
Transition probability matrix and sojourn time distribution
• Language model to model behavior from sensors in the home [3]
Shows support for the similarity between language and behavior
Smoothed n-gram model to make single-step predictions on binary sensor readings from a smart home
[1] Kim et al, "Extracting a Mobility Model from Real User Traces", INFOCOM 2006
[2] Lee and Hou, "Modeling Steady-State and Transient Behaviors of User Mobility", MobiHoc 2006
[3] Aipperspach, et al, "Modeling Human Behavior from Simple Sensors in the Home", PerCom 2006
33
34. • Overhead and lack of granularity in inferring user location and pause time from WLAN association records [Kim'06] → fine-grain, higher-dimension trace data to model mobility behavior, such as RSS beacon traces
• Model complexity and computational overhead not suitable for real-time applications [Lee'06] → simple and cost-effective model to capture mobility, reducing ping-pong effects
• It is straightforward to convert binary sensor data to behavior text for LM-based analysis [Aipp'06], but heterogeneous multi-valued sensory data is hard to convert to a single-dimension behavior text
34
36. • Calculate coordinates for each RSS vector using the "indoor location" algorithm [1] and generate a hot-region plot
[1] Lin, et al, "WASP: An enhanced indoor location algorithm for a congested Wi-Fi environment"
36
37. • Select the 10 users with the lowest cross entropy (a sketch follows below)
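A sketch of the cross-entropy criterion over unigram pseudo-location distributions, which is how I read the speaker notes; the additive smoothing constant is an assumption to avoid log(0).

```python
import math
from collections import Counter

def cross_entropy(trace_p, trace_q, vocab, eps=1e-6):
    """H(P, Q) over pseudo-location labels: how surprising user Q's label
    distribution makes user P's trace. Lower values indicate users whose
    mobility areas overlap more."""
    p, q = Counter(trace_p), Counter(trace_q)
    n_p, n_q = len(trace_p), len(trace_q)
    h = 0.0
    for label in vocab:
        p_prob = p[label] / n_p
        q_prob = (q[label] + eps) / (n_q + eps * len(vocab))  # smoothed
        if p_prob > 0:
            h -= p_prob * math.log(q_prob)
    return h
```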
37
38. • Help Cisco adopt this model in the Mobility Services Engine
• Heterogeneous sensor data fusion
Network traffic patterns from wireless controllers
Applications, Memory and battery status
GPS, accelerometers, gyroscope, temperature, etc
• Advanced Model
Leverage the internal factorized relationships among various sensors
• Factor Language Model
• More Applications
Prediction: resource allocation, energy saving, personalized services
Anomaly detection: adaptive authentication, patient telemetry
38
40. • Confirm similarity between language and behavior
• Multi-dimension to single dimension and n-gram: low complexity
but good results
• Potential problems:
• Dimensionality reduction to 1-D to use the language approach in modeling may cause loss of the relationships among multi-dimensional data
[Diagram: Sensor 1 and Sensor 2 jointly determining a State]
• The skipped n-gram approach is data dependent and may bring only marginal improvement or even worse results
40
Editor's Notes
Good afternoon everybody.. Thank you for coming. Today.. I’ll be presenting my work on a language approach for detecting anomalies in user mobility behaviors by modeling their WiFi traces
As a quick overview of our work, in order to do anomaly detection, we monitor and track user mobility behavior through the RSS trace from the WiFi environment. We then convert these traces and other context information to a behavior text representation. After that we build an n-gram language model and use it to discover anomalies such as device loss or theft.
So why we want to study the anomaly detection in such an environment… let me talk about our motivation.So, as we all know, Mobile applications and devices are becoming ubiquitous. On one side, mobile devices make our lives convenient. And people love it. But on the other side, the broad adoption of mobile applications such as email, messaging.. online banking and personal finance expose our identities and privacy to greater risk. The devices are portable and can be used almost everywhere we go, therefore they are also easy to lose or be stolen.
Last year, a survey showed that on average 36% of the people who participated have experienced device loss or theft in the past. Among the regions surveyed, Miami and New York have loss rates as high as 50%. It also shows that a big portion of the losses happened at places we often visit, such as university campuses and office buildings.
Losing a mobile device nowadays is not the same as 10, 20 years ago. With the proliferation of mobile devices in corporate environment, the boundary of personal devices and business devices are so blurry. People are using the same devices to gain intranet wireless access, to check corporate emails, to work on business documents, even to access trade secrets. If the devices are lost, there would be greater risks in term of data loss.Another survey shows that the data loss cost is about 50 thousand dollars per device or 6.4 million dollars per organization in the past.
Given the high device loss rate and high cost associated with these losses, accountable schemes are needed to promptly and accurately discover and detect these undesirable events. Such detections will facilitate subsequent notification, mitigation and recovery process to control or even avoid the damages. And in our research work, we are focusing on the detection part of the whole action chains, namely “Anomaly Detection” First we collect user behavior and build an accountable behavior model. We can monitoruser behavior constantly and compare it with the learned model. if it deviates from the learned model, we can flag an alert.
Behavior is a broad concept. Here, we want to leverage mobility as behavior as the example just shown. The reason is the following , first, mobility modeling has been studied thoroughly in the past and there are a lot of methodology that we can borrow from and lessions we can learn from. Secondly, mobility can be easily measured in the current computing environment, WIFi, GPS, cellular and can be combined with other context information such as bluetooth and other sensors. …Although our focus is on the detection of mobile device loss and theft, there are a lot of other motivating applications of mobility anomaly detection. One interesting example among other listed here is inpatient monitoring or telemetry. Imagine if we can detect anomaly in inpatient’s mobility in a hospital, medical help can be called up to handle the situation promptly.
What we would like our system to do is to ..Sense the WiFI signals of the mobile devices …. Do some preprocessing on the data …. And then feed it into an anomaly detection model which then outputs whether there is an anomaly or not let’s take a look at the assumptions on which our approach is based on.
Our system is based on the assumption that a user will have a unique set of locations which act as triggers for their future locations. For example, an employee exiting the break room may have two destinations: hallway A and hallway B. If we know he is taking hallway A, <<hit enter>> we know that he will be in his office soon. Otherwise, <<hit enter>> he may go to the bathroom instead. Previous work showed that the user mobility model can be estimated by short sequences of locations … and showed a correlation between human behavior and natural language. … Research also showed that language models can be used to effectively detect anomalies in geo-traces.
So, building along this line, we use a continuous n-gram model to learn the sequence of locations from a user's WiFi traces. The n-gram model works under the assumption that the next location in the sequence depends on just the last n-1 locations. Once the n-gram model is trained, we can use it to calculate the probability of all possible next locations given the past n-1 locations, and see which one is the most likely location. To train the model, we use maximum likelihood estimation on the training sequences to estimate these conditional probabilities, just by counting. As shown in the equation, the MLE probability of being in a location at time i conditioned on the past n-1 history locations is just the count of all such n-sequences in the data divided by the count of all these (n-1)-sequences. There is one small problem with this approach. Let's say our model comes across a location that has not been seen in training: it just assumes a zero probability. This may push the system to trigger an anomaly alert. Luckily, the n-gram model is very robust in handling unseen labels if we use smoothing. Smoothing algorithms such as Katz take some probability mass from the seen locations and reserve it for those unseen locations.
In natural language, words in a sentence may have long-distance dependencies. For example, the sentence “I hit the tennis ball” … has 3 tri-grams.. “I hit the” … “hit the tennis” .. And.. “the tennis ball” It is clear that an equally important tri-gram “I hit ball” is not normally captured by the continuous n-gram… because the separators ‘the” “tennis” is in the middle. If we could skip the separators … and we can form this important tri-gram. I hit ball Similarity, in our continuous n-gram model I just described, user’s next locations is dependent only on his n-1 previous locations. However, in many cases this may not be true.Use the same example, if a user is leaving the break room and entering hallway that leads to his office, we can predict he will be in his office soon. The intermediate locations along the hallway and before entering the office are not that important. Those locations can be skipped in the modeling. As shown in the diagram here, ABC is the break room, ACD is the entrance of the hallway and EDB is the office. Anything in the middle can be skipped and still give the same results. By skipping detracting grams, now… the effective n-gram order becomes (n-d). Therefore, we can reduce the size of the model in terms of computation and storage because the n-gram model has better performance for a lower value of n.
Now we have talked about our language-based model on the right-hand side. But we can't feed the WiFi traces to the n-gram model directly. Firstly, n-gram models can't handle numeric data like signal strength; they can only take discrete sets of symbols. The second issue is that, even though we represent the RSS trace as vectors, the amount of data required to create a model with reasonable accuracy would be immense, because it is not likely there will be repeating signal strengths with exactly the same readings. Therefore, we need to take a look at our data and find a way to convert the sensed data into a text representation.
The Wifi trace we collect in our system is different from the Dartmouth data set. The management, control and data frames from a device will be heard by multiple APs. In our particular setup, these APs will record the Received signal strength or RSS of those frame along with the Identity of the device and timing information.These traces will be aggregated to a central location .. where we can serialize these traces based on the time stamp and classify them using the device IDs. So.. for a particular device, we can build a time series of RSS vector, each element in the vector is the RSS from a particular AP. These series of RSS vector along with other context information serves as the input to the preprocessing module…. Where we will convert these to a text representation before feed them into our n-gram model.
From the signal propagation model, if two vectors are very similar, we know that the location where this vectors are measured should be within a reasonable proximity. Based on this assumption, we want to partition the RSS vector space into many “pseudo locations” and assign each “pseudo location” a unique label. By pseudo, we mean we don’t need to know the exact location of the reading, we just need to distinguish between two different locationsWell, this can be easily done by clustering algorithm… for example K-means clustering. In the k-mean clustering runs, we use a distance function similar to redpin and WASP in addition to the standard cosine function to reduce the noise caused by interference.Once the clustering is done, we assign labels to all the members belong to the same cluster….
We also incorporated other features. Due to the way the data is collected and aggregated, there could be a lot of repeating labels in the sequences if a user stays at one location for a long time. To extract one more "duration" feature, we count the repeating labels, remove the repeating sequence and add a new label with both location and duration information. One minor improvement we made is to only append the duration label if the mutual information between the location and duration is high. Intuitively, we want to capture the correlations between the location and the duration. For example, conference room + 1 hour will imply a meeting, while office + 10 min will imply a quick visit. The time-of-day feature is also quantized into 4 labels and appended to the main pseudo location label. The quantization is not based on a fixed boundary, because we know that a user's mobility also follows certain regularities due to job roles and responsibilities, and sometimes it follows a personalized agenda. We choose the boundary for time of day based on the user's activity level. <<next slide>> Mutual information: $I(X;Y) = \int_Y \int_X p(x,y) \log \frac{p(x,y)}{p_1(x)\,p_2(y)} \,dx\,dy$; $I(X;Y) = 0$ implies independence, and $I(X;Y) \ge 0$.
Now we have the Sensing, Preprocessing and Modeling parts in place, let’s take a look how this system is used to do anomaly detection
We feed the RSS trace to the preprocessing module and then feed it to the n-gram model.. And the n-gram model continuously produces the likelihood estimate for the last N behavior text,… specifically, we will calculate the average log probability of this N behavior text using this equation If this likelihood drops below a certain threshold, the system will trigger an anomaly alert.
This graph shows the anomaly detection process and demonstrate different threshold may cause either detection delay (B) or cause false positives (point C & D) when point A is the actual anomaly point. The way to find the right threshold is to use receiver-operating-characteristic curve or ROC curve. We will look at this in more details later in the talk.
So, this complete the whole system architecture. We have the sensing part that produce RSS traces, we have preprocessing part that convert the traces and other context information to behavior text and we have the modeling training and inference part that is used to do anomaly detection with a design parameter “threshold”
Now, let’s discuss the experiments we did.Before looking at the experiments and results, let me describe the data set we used.
So… we collected the RSS traces from 87 WAPs in an office building over 5 days. The time precision of the RSS samples is at the 13-second level. These traces contain complete data for 40 users, and in total we have about 3.2 million data points. To determine the number of clusters in the k-means clustering, we took a small subset of the traces and ran the algorithm with different Ks. We evaluated the results by looking at the average distance to centroids and the number of iterations. If we choose K as the number of APs, it will be similar to using association records. If K is too large, the clustering algorithm will take long to finish and the resulting n-gram model will have a large vocabulary size. We found that picking K as 3 times the number of APs provides reasonable clustering performance and quality compared to 4 times or 5 times. This resulted in about 260 pseudo location labels. Backup data points: pseudo locations from RSS (other schemes not very …); about 1500 RSS data points per user on average, with RSS from 3-7 WAPs; assuming a user is up half of the time -> 80k data points per user for 5 days; 3.2 million data points collected for 40 users; 20 million RSS readings; for each of these 40 users, 16K RSS vectors in total.
To validate our system, we need to have some testing data. However, from the trace we collected, there are no recorded anomaly fortunately. We created simulated device stolen events by splicing two users’ trace segments at their intersection points…. where similar label or labels sequences are shared. We combined this simulated traces with normal traces to create a testing data set.
Before we run experiments to explore the design parameter space such as threshold, n-gram order n and training size, we want to gain some insights on whether the model works and whether the ideas in preprocessing ,, we described.. have some impacts. First, we want to how skipped n-gram affect our model. Using 8 hours of data, we train a continuous 5-gram model and skip-2 5-ngram model. Both model can capture similar length of mobility behavior and with similar detection accuracy. But the skip n-gram model has k-order reduction in the model size. This particular scenario works is probably due to the environment where the data is collected. The office floor has hallways and corridors and people have to follow those to walk around. We also found that … removing the repeating labels and adding the duration features help in the model. The 5-gram model was dominated by these repeating labels. Actually top 200 grams are repeating or partially repeating grams. After we enable the duration feature, the 5-grams statistics are better distributed. Lastly, we found the time-of-day feature doesn’t provide much gain as it brings about less than 1% improvement. This is probably due to the length of the training data. 8 hours training may not be able to capture the daily routine that well, so… time-of-day feature doesn’t have significant effect on the results.
Now we gained some insights on our approach. It is time to explore some of the design parameters we mentioned in the beginning. The first set of experiments is to find the best anomaly detection threshold. Actually there is no best threshold, the threshold is depending on the applications we are running. What’s the requirements on the detection accuracy? Can we allow much false positive? Do we have enough training data? To provide a guideline in answering these questions, we plot Receiver Operating Characteristic curve (or ROC curve) Essentially, ROC curve is about the trade-offs between the true-positive rate and false-positive rate in our anomaly detection. We perform the experiments with different training data sizes. We plot the ROC curve by varying the threshold and record the TPR and FPRWith the ROC curve, we can decide the threshold for a particular application depending on The amount of data the model should see before the model can detect anomaly The required TPR Or the acceptable FPRFor example, we want to use 8 hour training size and want to have less than 0.1 false positive rate, then we just need to locate this point and obtain the threshold by which this data point is generated. (0.4) We need to use threshold < 0.4 in order to fulfill the FPR requirement. Another example: let’s say we want to have the same FPR requirement but want to have TPR > 0.8, then we have to use more than 8 hours training size to archive this goal.
We plot this graphs with different training size and n-gram orders. From the graph, we can see several things. A higher order model captures more context and in turn increase accuracy. But…. , accuracy saturates beyond 5, which means in user’s behavior is more likely to be dependent on its last 5 pseudo locations. This resonates with the past work we mentioned in the beginning. It also tells us that increase the model complexity beyond this point will NOT bring about significant improvement.Second, it shows that if the training size is as small as 4 hours, it may not capture users’ mobility behavior thoroughly enough to make an accurate detection. Also, the closeness between 8 hr and 12 hour curves also suggests that our system will provide relative good results if we have observed users’ behavior for 8 hours. One interesting point to make here is the 12 hour and 8 hour curve cross over at the lower n-gram orders. While this could be due to errors in handling the data, our explanation is leaning towards that the bigger training data set will exposure more common locations that are not captured in the shorter training size. With these common locations, people are sharing a lot of shorter sequences, leading to more simulated anomaly are not detected and … bring down the accuracy.
So now lets see what we conclude from this work and the future work we plan to do
In conclusion, we have built a system that monitors and tracks user mobility behavior through the RSS trace from the WLAN environment. We convert these traces and other context information to a behavior text representation, and we build an n-gram language model and use it to discover anomalies such as device loss or theft.
Finally, I would like to thank our sponsors from Cylab, Cisco and Army ResearchAnd Thank you all very much for your attention.
Think of a simple example, where the red traces on this office floor represent the usual mobility of a user. In this case, the user is finishing a meeting in a conference room and is going back to his cubicle. <<hit enter>> Now, if we look at another path the user is taking, instead of going this way, he is going in the other direction, <<hit enter>> then deviating further and further like this. In such a case, we would want to flag this as an anomaly. It could be the case that a visitor who attended the meeting took the device the employee forgot in the conference room and went away. The device may still have access to the company's internal network and other data sources; by receiving this alert, the infrastructure could revoke its authentication credentials temporarily until the user can authenticate himself again. <<hit enter>> Now, if instead of going further away, he is going back to his cubicle, just by taking an alternate path, we probably do not want to flag this as an anomaly.
As I just mentioned … mobility modeling is a well studied research area. Before we go into talking about our model, let me talk about some related work.
Mobility models have been heavily used in networking research, especially in ad hoc networks. Popular models such as random waypoints are derived from mathematical simplifications. Work by a group at Dartmouth College is among the first attempts to construct a WiFi mobility model from real-world traces. The trace data is basically the association records collected from the WiFi environment. Because the association records may not reflect a user's actual location, they developed methods and heuristics to extract mobility tracks and pause times. They derived distributions for pause time, speed, direction of travel and destination region, and used these to build an empirical model to generate synthetic traces. There are other works modeling mobility using Markov models. However, research showed that in real traces pause time doesn't follow an exponential distribution, so a Markov model may not be realistic if the pause duration follows other distributions. Another group at UIUC used the same data set and adopted a semi-Markov model to study steady-state and transient behavior. They constructed a transition probability matrix and sojourn time distributions, and built a time-location prediction algorithm to handle load balancing in WiFi networks. Another work, using Georgia Tech's smart home data set, caught our attention. In that work, the authors use a simple smoothed n-gram model to make single-step predictions on binary sensor readings. It further supports the similarity between language and human behavior. It actually inspired us to look at solutions for mobility modeling using a language approach.
All this existing work motivate us to think more on how to build a simple and effective mobility model to capture human behaviors. First, WiFi association records is one level indirection from the user mobility. We would like to have more direct sensor readings to reflect user mobility tracks. Secondly, semi-markov model or even DBN models are too complex for real time application. For the anomaly detection application that we are interested in, we need to come up with a simpler approach in order to have real time performance. Lastly, language and n-gram approach seems very promising on the simplicity side, however, converting mobility traces. Mostly multi-valued data streams, to a single demension text representation is very challenging. It is even more challenging if we want add other context information to it. With these findings and thoughts in mind, let me start to describe our approach. <<hit enter>>
Since we are reusing other user’s trace for testing, there is a problem that could lead to unfair evaluation. If the users that we used to splice the traces … have very different mobility regions, it should be very easy to detect the simulated anomaly… because their uni-gram statistics are so different. We would like to evaluate the systems using the testing data sets that are generated from users who share mobility behaviors. First, we want to see if user’s mobility areas are separable. We run “indoor location” algorithm and calculate the (x,y) coordinates. This gave us a chance to visualize the mobility patterns and coverage area.As shown in this particular graph, orange and green users are completely separated… and the red and blue have some overlap, but still partitioned. We need to remove user pairs like this in our simulated anomaly generation process.
Of course, we can NOT run the locationing algorithm for all our traces. We want to filter out those users at the pseudo location label level. …Cross entropy provides a way to measure the correlations of two distributions, and it is a good fit for our problem.We calculate the cross entropy of pseudo location labels for all the 40 users …. And we chose the 10 users with least cross entropy. This is to ensure these users mobility paths strongly overlap and it will provide fair evaluation with the simulated anomaly.
For future work, as part of the sponsored research, we will help Cisco integrate this model into their MSE as a value-added mobility application. This model will work with the existing CCX solution to help with enterprise device security, as well as leveraging its prediction capability to improve VoIP roaming performance. We are also looking into obtaining more heterogeneous sensor data from the current system, such as traffic patterns, device capabilities, and other external sensors such as GPS and temperature, to build a more robust sensor fusion framework. As mentioned in the previous slides, to handle the factor relationships among different sensors, we plan to adopt a factor language model. Last but not least, we are looking for opportunities to apply this work to more appealing applications in healthcare and in security.
One big message from this work is that we confirm the similarity between language and behavior again. N-gram model is simple and versatile enough for various applications. We demonstrated that we can combine multi-dimension data into a single dimension and convert to behavior text. We also demonstrated some of our ideas in preprocessing, modeling and testing led to reasonable improvements. Through experiments, we explored the parameters space and gained valuable insights. We also discovered some potential problems with these ideas. Especially with the dimensionality reduction. If the sensors have internal relationship and different factors towards the behavior modeling, reducing them blindly to 1-D may actually lose that information. Also, the skipped n-gram model is dependent on the data and needs further investigation.