Icccn2011 jiang-0802


  • Good afternoon, everybody, and thank you for coming. Today I'll be presenting my work on a language approach for detecting anomalies in user mobility behavior by modeling WiFi traces.
  • As a quick overview of our work: to do anomaly detection, we monitor and track user mobility behavior through the RSS trace from the WiFi environment. We then convert these traces and other context information to a behavior text representation. After that, we build an n-gram language model and use it to discover anomalies such as device loss or theft.
  • So why do we want to study anomaly detection in such an environment? Let me talk about our motivation. As we all know, mobile applications and devices are becoming ubiquitous. On one side, mobile devices make our lives convenient, and people love them. On the other side, the broad adoption of mobile applications such as email, messaging, online banking and personal finance exposes our identities and privacy to greater risk. The devices are portable and can be used almost everywhere we go, so they are also easy to lose or to steal.
  • Last year, a survey showed that, on average, 36% of the people who participated had experienced device loss or theft in the past. Among the regions surveyed, Miami and New York have loss rates as high as 50%. It also shows that a big portion of the losses happened at places that we visit often, such as university campuses and office buildings.
  • Losing a mobile device nowadays is not the same as 10 or 20 years ago. With the proliferation of mobile devices in the corporate environment, the boundary between personal devices and business devices is blurry. People are using the same devices to gain intranet wireless access, to check corporate email, to work on business documents, even to access trade secrets. If the devices are lost, there are greater risks in terms of data loss. Another survey shows that the data loss cost is about 50 thousand dollars per device, or 6.4 million dollars per organization.
  • Given the high device loss rate and the high cost associated with these losses, accountable schemes are needed to discover and detect these undesirable events promptly and accurately. Such detection will facilitate the subsequent notification, mitigation and recovery processes to control or even avoid the damage. In our research work, we are focusing on the detection part of the whole action chain, namely "Anomaly Detection". First we collect user behavior and build an accountable behavior model. We can then monitor user behavior constantly and compare it with the learned model. If it deviates from the learned model, we can flag an alert.
  • Behavior is a broad concept. Here, we want to leverage mobility as behavior, as in the example just shown. The reasons are the following. First, mobility modeling has been studied thoroughly in the past, so there are a lot of methodologies that we can borrow and lessons we can learn from. Secondly, mobility can be easily measured in the current computing environment through WiFi, GPS and cellular, and it can be combined with other context information such as Bluetooth and other sensors. Although our focus is on the detection of mobile device loss and theft, there are a lot of other motivating applications of mobility anomaly detection. One interesting example among those listed here is inpatient monitoring, or telemetry. Imagine if we could detect an anomaly in an inpatient's mobility in a hospital: medical help could be called to handle the situation promptly.
  • What we would like our system to do is to sense the WiFi signals of the mobile devices, do some preprocessing on the data, and then feed it into an anomaly detection model, which outputs whether there is an anomaly or not. Let's take a look at the assumptions on which our approach is based.
  • Our system is based on the assumption that a user has a unique set of locations which act as triggers for their future locations. For example, an employee exiting the break room may have two destinations: hallway A and hallway B. If we know he is taking hallway A, <<hit enter>> we know that he will be in his office soon. Otherwise, <<hit enter>> he may go to the bathroom instead. Previous work showed that the user mobility model can be estimated from short sequences of locations, and showed a correlation between human behavior and natural language. Research also showed that a language model can be used to effectively detect anomalies in geo-tracing.
  • So, building along this line, we use a continuous n-gram model to learn the sequence of locations from a user's WiFi traces. The n-gram model works under the assumption that the next location in the sequence depends on just the last n-1 locations. Once the n-gram model is trained, we can use it to calculate the probability of all possible next locations given the past n-1 locations, and see which one is the most likely. To train the model, we use maximum likelihood estimation on the training sequences to estimate these conditional probabilities, just by counting. As shown in this equation, the MLE probability of being in a location at time i, conditioned on the past n-1 locations, is just the count of that n-location sequence in the data divided by the count of its (n-1)-location history. There is one small problem with this approach. Let's say our model comes across a location that has not been seen in training. It just assumes a zero probability, which may push the system to trigger an anomaly alert. Luckily, the n-gram model is very robust in handling unseen labels if we use smoothing. Smoothing algorithms such as Katz take some probability mass from the seen locations and reserve it for the unseen ones.
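The counting estimate described in these notes can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names and the toy trace are hypothetical, and add-one (Laplace) smoothing stands in for the Katz smoothing named in the talk because it fits in a few lines.

```python
from collections import Counter

def train_ngram(sequence, n):
    """Count n-grams and the (n-1)-gram histories that precede them."""
    starts = range(len(sequence) - n + 1)
    ngrams = Counter(tuple(sequence[i:i + n]) for i in starts)
    histories = Counter(tuple(sequence[i:i + n - 1]) for i in starts)
    return ngrams, histories

def mle_prob(ngrams, histories, history, nxt):
    """Maximum likelihood estimate: count(history + next) / count(history)."""
    h = tuple(history)
    if histories[h] == 0:
        return 0.0
    return ngrams[h + (nxt,)] / histories[h]

def laplace_prob(ngrams, histories, history, nxt, vocab_size):
    """Add-one smoothing (an assumption here; the talk uses Katz smoothing)."""
    h = tuple(history)
    return (ngrams[h + (nxt,)] + 1) / (histories[h] + vocab_size)

# Hypothetical pseudo-location trace for one user.
trace = ["break", "hallA", "office", "break", "hallB", "bath",
         "break", "hallA", "office"]
ngrams, histories = train_ngram(trace, 3)
seen = mle_prob(ngrams, histories, ("break", "hallA"), "office")
unseen = mle_prob(ngrams, histories, ("break", "hallA"), "bath")
smoothed = laplace_prob(ngrams, histories, ("break", "hallA"), "bath", vocab_size=5)
```

With plain counting the unseen continuation gets probability zero, while the smoothed estimate reserves a small non-zero probability for it, which is the effect the notes attribute to smoothing.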
  • In natural language, words in a sentence may have long-distance dependencies. For example, the sentence "I hit the tennis ball" has 3 tri-grams: "I hit the", "hit the tennis" and "the tennis ball". It is clear that an equally important tri-gram, "I hit ball", is not normally captured by the continuous n-gram, because the separators "the" and "tennis" are in the middle. If we could skip the separators, we could form this important tri-gram, "I hit ball". Similarly, in the continuous n-gram model I just described, a user's next location depends only on his n-1 previous locations. However, in many cases this may not be true. Using the same example, if a user is leaving the break room and entering the hallway that leads to his office, we can predict he will be in his office soon. The intermediate locations along the hallway and before entering the office are not that important. Those locations can be skipped in the modeling. As shown in the diagram here, ABC is the break room, ACD is the entrance of the hallway and EDB is the office. Anything in the middle can be skipped and still give the same results. By skipping d detracting grams, the effective n-gram order becomes (n-d). Therefore, we can reduce the size of the model in terms of computation and storage, because the n-gram model has better performance for a lower value of n.
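The tennis-ball example can be reproduced with a small skip-gram extractor. This sketch uses the common k-skip-n-gram definition (at most k skipped tokens in total), which is an assumption and may differ from the skipped model used in the talk; the function name is hypothetical.

```python
from itertools import combinations

def skip_ngrams(tokens, n, k):
    """All in-order n-grams that skip at most k tokens in total."""
    grams = []
    for start in range(len(tokens)):
        # The gram may span at most n + k consecutive positions.
        window_end = min(start + n + k, len(tokens))
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append(tuple(tokens[i] for i in (start,) + rest))
    return grams

sentence = ["I", "hit", "the", "tennis", "ball"]
continuous = skip_ngrams(sentence, 3, 0)   # the three ordinary tri-grams
with_skips = skip_ngrams(sentence, 3, 2)   # also contains ("I", "hit", "ball")
```

Setting k to 0 recovers the continuous tri-grams, and allowing two skips admits "I hit ball", at the cost of a larger set of candidate grams per window.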
  • Now we have talked about our language-based model on the right-hand side. But we can't feed the WiFi traces to the n-gram model directly. Firstly, n-gram models can't handle numeric data like signal strength; they can only take discrete sets of symbols. The second issue is that, even if we represent the RSS trace as vectors, the amount of data required to create a model with reasonable accuracy would be immense, because it is not likely that signal strength readings will repeat with exactly the same values. Therefore, we need to take a look at our data and find a way to convert the sensed data into a text representation.
  • The WiFi trace we collect in our system is different from the Dartmouth data set. The management, control and data frames from a device will be heard by multiple APs. In our particular setup, these APs record the received signal strength, or RSS, of those frames along with the identity of the device and timing information. These traces are aggregated at a central location, where we can serialize them based on the time stamp and classify them using the device IDs. So, for a particular device, we can build a time series of RSS vectors, where each element in a vector is the RSS from a particular AP. This series of RSS vectors, along with other context information, serves as the input to the preprocessing module, where we convert it to a text representation before feeding it into our n-gram model.
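A minimal sketch of the aggregation step described here, under stated assumptions: the record layout `(timestamp, device_id, ap_id, rss)`, the function name and the -100 dBm fill value for APs that did not hear a frame are all hypothetical and not taken from the talk.

```python
from collections import defaultdict

def build_rss_series(records, ap_ids, missing=-100.0):
    """Group (timestamp, device_id, ap_id, rss) records into a
    time-ordered series of RSS vectors per device."""
    by_device = defaultdict(lambda: defaultdict(dict))
    for ts, device, ap, rss in records:
        by_device[device][ts][ap] = rss
    series = {}
    for device, by_time in by_device.items():
        series[device] = [
            # One vector per timestamp, one element per AP in a fixed order.
            (ts, [by_time[ts].get(ap, missing) for ap in ap_ids])
            for ts in sorted(by_time)
        ]
    return series

records = [
    (13, "dev1", "ap2", -70.0),
    (0, "dev1", "ap1", -40.0),
    (0, "dev1", "ap2", -80.0),
    (0, "dev2", "ap1", -60.0),
]
series = build_rss_series(records, ap_ids=["ap1", "ap2"])
```

Keeping the AP order fixed across all vectors is what makes two vectors comparable element by element in the later clustering step.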
  • From the signal propagation model, if two vectors are very similar, we know that the locations where these vectors were measured should be within reasonable proximity. Based on this assumption, we want to partition the RSS vector space into many "pseudo locations" and assign each "pseudo location" a unique label. By pseudo, we mean that we don't need to know the exact location of the reading; we just need to distinguish between two different locations. This can be easily done with a clustering algorithm, for example k-means clustering. In the k-means clustering runs, we use a distance function similar to Redpin and WASP, in addition to the standard cosine function, to reduce the noise caused by interference. Once the clustering is done, we assign the same label to all the members that belong to the same cluster.
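The pseudo-location labelling can be illustrated with a tiny k-means routine. This is a sketch under assumptions: it uses plain Euclidean distance rather than the Redpin/WASP-like and cosine distances mentioned in the talk, a naive "first k vectors" initialisation, and hypothetical names and toy readings.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Minimal k-means; returns a cluster index per vector and the centroids."""
    centroids = [list(v) for v in vectors[:k]]  # naive deterministic init
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical RSS vectors (dBm) heard by three APs.
rss_vectors = [
    (-40, -80, -90), (-85, -45, -70), (-42, -79, -88),
    (-83, -47, -72), (-41, -81, -91),
]
assignment, _ = kmeans(rss_vectors, k=2)
labels = [f"L{a}" for a in assignment]  # pseudo location labels
```

The labels carry no physical coordinates; they only distinguish one region of the RSS space from another, which is all the n-gram model needs.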
  • We also incorporated other features. Because of the way the data is collected and aggregated, there can be a lot of repeating labels in the sequences if a user stays at one location for a long time. To extract one more feature, "duration", we count the repeating labels, remove the repeating sequence and add a new label carrying both location and duration information. One minor improvement we made is to append the duration label only if the mutual information between the location and the duration is high. Intuitively, we want to capture the correlation between the location and the duration. For example, conference room + 1 hour implies a meeting, while office + 10 minutes implies a quick visit. The time-of-day feature is also quantized into 4 labels and appended to the main pseudo location label. The quantization is not based on a fixed boundary, because we know that a user's mobility also follows certain regularities due to job roles and responsibilities. Sometimes it follows a personalized agenda. We choose the boundaries for time of day based on the user's activity level. <<next slide>> Mutual information: $I(X,Y) = \int_y \int_x p(x,y) \log \frac{p(x,y)}{p_1(x)\,p_2(y)} \, dx \, dy$, with $I(X,Y) \ge 0$, and $I(X,Y) = 0$ when X and Y are independent.
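The duration feature can be sketched as a run-length collapse of the label sequence. The 13-second sampling period comes from the data set description in this talk; the bucket boundaries, the `label|bucket` token format and the function names are hypothetical, and the mutual-information test that decides whether to append the duration is left out of this sketch.

```python
from itertools import groupby

def duration_bucket(seconds, bounds=(60, 600, 3600)):
    """Quantize a stay duration into a coarse bucket (boundaries are assumed)."""
    for bound in bounds:
        if seconds <= bound:
            return f"<={bound}s"
    return f">{bounds[-1]}s"

def collapse_runs(labels, sample_seconds=13):
    """Replace each run of repeated labels with one label+duration token."""
    tokens = []
    for label, run in groupby(labels):
        count = sum(1 for _ in run)
        tokens.append(f"{label}|{duration_bucket(count * sample_seconds)}")
    return tokens

trace = ["office"] * 50 + ["hall"] * 2 + ["office"]
behavior_text = collapse_runs(trace)
```

After collapsing, a 50-sample stay becomes a single token, so long stays no longer flood the n-gram counts with repeated labels.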
  • Now that we have the sensing, preprocessing and modeling parts in place, let's take a look at how this system is used to do anomaly detection.
  • We feed the RSS trace to the preprocessing module and then feed the result to the n-gram model. The n-gram model continuously produces the likelihood estimate for the last N behavior texts. Specifically, we calculate the average log probability of these N behavior texts using this equation. If this likelihood drops below a certain threshold, the system triggers an anomaly alert.
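A sliding-window detector along these lines might look as follows. It is a sketch only: `prob_fn` is a stand-in for any smoothed n-gram model that never returns zero, and the toy bigram table, window size and threshold are hypothetical values chosen for illustration.

```python
import math

def average_log_prob(window, prob_fn, n):
    """Mean log probability of each label in the window given its n-1 predecessors."""
    total, count = 0.0, 0
    for i in range(n - 1, len(window)):
        history = tuple(window[i - n + 1:i])
        total += math.log(prob_fn(history, window[i]))
        count += 1
    return total / count

def detect(sequence, prob_fn, n, window_size, threshold):
    """Return the end positions of all windows whose score falls below the threshold."""
    alerts = []
    for end in range(window_size, len(sequence) + 1):
        score = average_log_prob(sequence[end - window_size:end], prob_fn, n)
        if score < threshold:
            alerts.append(end - 1)
    return alerts

# Toy smoothed bigram model: known transitions are likely, everything else gets 0.01.
known = {("A", "B"): 0.9, ("B", "A"): 0.9}
prob_fn = lambda history, nxt: known.get(history + (nxt,), 0.01)

sequence = list("ABABABXYXY")
alerts = detect(sequence, prob_fn, n=2, window_size=4, threshold=-1.0)
```

In this toy run the first alert fires on the first window that contains an unfamiliar label; a lower threshold would delay the alert until more of the window is unfamiliar.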
  • This graph shows the anomaly detection process and demonstrates that different thresholds may cause either a detection delay (point B) or false positives (points C and D) when point A is the actual anomaly point. The way to find the right threshold is to use the receiver operating characteristic curve, or ROC curve. We will look at this in more detail later in the talk.
  • So, this completes the whole system architecture. We have the sensing part that produces RSS traces, the preprocessing part that converts the traces and other context information to behavior text, and the model training and inference part that is used to do anomaly detection, with a design parameter, the "threshold".
  • Now, let's discuss the experiments we did. Before looking at the experiments and results, let me describe the data set we used.
  • So, we collected the RSS traces from 87 WAPs in an office building over 5 days. The RSS is sampled at roughly 13-second granularity. These traces contain complete data for 40 users, and in total we have about 3.2 million data points. To determine the number of clusters in the k-means clustering, we took a small subset of the traces and ran the algorithm with different values of K. We evaluated the results by looking at the average distance to centroids and the number of iterations. If we choose K as the number of APs, the result will be similar to using association records. If K is too large, the clustering algorithm takes long to finish and the resulting n-gram model has a large vocabulary. We found that picking K as 3 times the number of APs provides reasonable clustering performance and quality compared to 4 times or 5 times. This resulted in about 260 pseudo location labels. Backup data points: pseudo location from RSS (other schem not very ….); 1500 data points (RSS) per user at average; RSS from 3-7 WAPs; assuming the user is up half of the time -> 80k data points per user for 5 days; 3.2 mil data points collected for 40 users; 20 mil RSS readings; for each of these 40 users, 16K RSS vectors in total.
  • To validate our system, we need some testing data. Fortunately for the users, the traces we collected contain no recorded anomalies. So we created simulated device-stolen events by splicing two users' trace segments at their intersection points, where a similar label or label sequence is shared. We combined these simulated traces with normal traces to create a testing data set.
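One way to picture the splicing is the sketch below. It is an assumption-laden illustration, not the authors' procedure: it splices at the first single label the two traces share, whereas the talk also mentions shared label sequences, and the function name and `min_prefix` parameter are hypothetical.

```python
def splice_at_intersection(owner, other, min_prefix=1):
    """Follow the owner's trace up to the first label shared with `other`,
    then continue with the other user's trace after that label.
    Returns (spliced_trace, splice_index), or None if nothing is shared."""
    first_seen = {}
    for j, label in enumerate(other):
        first_seen.setdefault(label, j)
    for i, label in enumerate(owner):
        if i >= min_prefix and label in first_seen:
            j = first_seen[label]
            return owner[:i + 1] + other[j + 1:], i
    return None

owner = ["A", "B", "C", "D"]
thief = ["X", "C", "Y", "Z"]
result = splice_at_intersection(owner, thief)
```

The returned splice index marks where the simulated theft begins, which is the ground-truth anomaly point needed later to score detections.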
  • Before we run experiments to explore the design parameter space, such as the threshold, the n-gram order n and the training size, we want to gain some insight into whether the model works and whether the preprocessing ideas we described have an impact. First, we want to see how the skipped n-gram affects our model. Using 8 hours of data, we train a continuous 5-gram model and a skip-2 5-gram model. Both models capture a similar length of mobility behavior, with similar detection accuracy, but the skipped n-gram model has a k-order reduction in model size. That this works in our scenario is probably due to the environment where the data was collected: the office floor has hallways and corridors, and people have to follow those to walk around. We also found that removing the repeating labels and adding the duration feature helps the model. The 5-gram model was dominated by these repeating labels; in fact, the top 200 grams were repeating or partially repeating grams. After we enabled the duration feature, the 5-gram statistics were better distributed. Lastly, we found that the time-of-day feature doesn't provide much gain, as it brings less than 1% improvement. This is probably due to the length of the training data: 8 hours of training may not capture the daily routine well, so the time-of-day feature doesn't have a significant effect on the results.
  • Now that we have gained some insight into our approach, it is time to explore some of the design parameters we mentioned in the beginning. The first set of experiments is to find the best anomaly detection threshold. Actually, there is no single best threshold; the threshold depends on the application we are running. What are the requirements on detection accuracy? How many false positives can we allow? Do we have enough training data? To provide a guideline for answering these questions, we plot the receiver operating characteristic curve, or ROC curve. Essentially, the ROC curve is about the trade-off between the true-positive rate and the false-positive rate in our anomaly detection. We perform the experiments with different training data sizes. We plot the ROC curve by varying the threshold and recording the TPR and FPR. With the ROC curve, we can decide the threshold for a particular application depending on the amount of data the model should see before it can detect anomalies, the required TPR, or the acceptable FPR. For example, if we want to use an 8-hour training size and want a false-positive rate of less than 0.1, we just need to locate this point and obtain the threshold by which this data point was generated (0.4). We need to use a threshold < 0.4 in order to fulfill the FPR requirement. Another example: let's say we have the same FPR requirement but want a TPR > 0.8; then we have to use more than 8 hours of training data to achieve this goal.
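The threshold sweep behind an ROC curve can be sketched as below. The scores, thresholds and function names are hypothetical; the sketch assumes each test sample has already been reduced to a single average log probability and that an alert fires when that score is below the threshold.

```python
def roc_points(normal_scores, anomaly_scores, thresholds):
    """For each threshold return (threshold, FPR, TPR)."""
    points = []
    for t in thresholds:
        tpr = sum(s < t for s in anomaly_scores) / len(anomaly_scores)
        fpr = sum(s < t for s in normal_scores) / len(normal_scores)
        points.append((t, fpr, tpr))
    return points

def pick_threshold(points, max_fpr):
    """Best-TPR operating point whose FPR stays within the allowed budget."""
    allowed = [p for p in points if p[1] <= max_fpr]
    return max(allowed, key=lambda p: p[2]) if allowed else None

# Hypothetical average log probabilities for owner traces and spliced traces.
normal = [-0.2, -0.3, -0.5, -1.2]
anomalous = [-0.9, -1.5, -2.0, -3.0]
points = roc_points(normal, anomalous, thresholds=[-2.5, -1.0, -0.4])
choice = pick_threshold(points, max_fpr=0.25)
```

Reading the curve this way mirrors the procedure in the notes: fix the acceptable false-positive rate first, then take the threshold that gives the highest true-positive rate within that budget.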
  • We plot these graphs with different training sizes and n-gram orders. From the graph, we can see several things. First, a higher-order model captures more context and in turn increases accuracy. But accuracy saturates beyond 5, which means a user's behavior is most likely dependent on the last 5 pseudo locations. This resonates with the past work we mentioned in the beginning. It also tells us that increasing the model complexity beyond this point will NOT bring significant improvement. Second, it shows that if the training size is as small as 4 hours, it may not capture users' mobility behavior thoroughly enough to make an accurate detection. Also, the closeness of the 8-hour and 12-hour curves suggests that our system will provide relatively good results once we have observed users' behavior for 8 hours. One interesting point here is that the 12-hour and 8-hour curves cross over at the lower n-gram orders. While this could be due to errors in handling the data, our explanation leans towards the bigger training data set exposing more common locations that are not captured with the shorter training size. With these common locations, people share a lot of shorter sequences, so more simulated anomalies go undetected, which brings down the accuracy.
  • So now let's see what we conclude from this work and the future work we plan to do.
  • In conclusion, we have built a system that monitors and tracks user mobility behavior through the RSS trace from the WLAN environment. We convert these traces and other context information to a behavior text representation. And we build an n-gram language model and use it to discover anomalies such as device loss or theft.
  • Finally, I would like to thank our sponsors from CyLab, Cisco and Army Research. And thank you all very much for your attention.
  • Think of a simple example, where the red traces on this office floor represent the usual mobility of a user. In this case, the user is finishing a meeting in a conference room and is going back to his cubicle. <<hit enter>> Now, if we look at another path the user is taking, instead of going this way, he is going in the other direction, <<hit enter>> then deviating further and further like this. In such a case, we would want to flag this as an anomaly. It could be that a visitor who attended the meeting took the device the employee forgot in the conference room and went away. The device may still have access to the company's internal network and other data sources. On receiving this alert, the infrastructure would revoke the authentication credentials temporarily until the user can authenticate himself again. <<hit enter>> Now, suppose that instead of going further away, he is going back to his cubicle, just by taking an alternate path. In this case, we probably do not want to flag this as an anomaly.
  • As I just mentioned, mobility modeling is a well-studied research area. Before we go into our model, let me talk about some related work.
  • Mobility models have been heavily used in networking research, especially in ad hoc networks. Popular models such as random waypoint are derived from mathematical simplifications. Work by a group at Dartmouth College is among the first attempts to construct a WiFi mobility model from real-world traces. The trace data is basically the association records collected from the WiFi environment. Because the association records may not reflect the user's actual location, they developed methods and heuristics to extract mobility tracks and pause time. They drew distributions for pause time, speed, direction of travel and destination region, and used these to build an empirical model to generate synthetic traces. There are other works that model mobility using Markov models. However, research showed that in real traces pause time doesn't follow an exponential distribution, so a Markov model may not be realistic if the pause duration follows other distributions. Another group at UIUC used the same data set and adopted a semi-Markov model to study the steady-state and transient behavior. They constructed a transition probability matrix and sojourn time distributions, and built a time-location prediction algorithm to handle load balancing in WiFi networks. Another work, using Georgia Tech's smart home data set, captured our attention. In that work, the authors use a simple smoothed n-gram model to make single-step predictions on binary sensor readings. It further supports the similarity between language and human behavior. It actually inspired us to look at solving mobility modeling with a language approach.
  • All this existing work motivated us to think more about how to build a simple and effective mobility model to capture human behavior. First, WiFi association records are one level of indirection away from the user's mobility. We would like to have more direct sensor readings to reflect user mobility tracks. Secondly, semi-Markov models or even DBN models are too complex for real-time applications. For the anomaly detection application that we are interested in, we need to come up with a simpler approach in order to have real-time performance. Lastly, the language and n-gram approach seems very promising on the simplicity side; however, converting mobility traces, which are mostly multi-valued data streams, to a single-dimension text representation is very challenging. It is even more challenging if we want to add other context information to it. With these findings and thoughts in mind, let me start to describe our approach. <<hit enter>>
  • Since we are reusing other users' traces for testing, there is a problem that could lead to unfair evaluation. If the users whose traces we splice have very different mobility regions, it should be very easy to detect the simulated anomaly, because their uni-gram statistics are so different. We would like to evaluate the system using testing data sets that are generated from users who share mobility behaviors. First, we want to see if users' mobility areas are separable. We run the "indoor location" algorithm and calculate the (x,y) coordinates. This gives us a chance to visualize the mobility patterns and coverage areas. As shown in this particular graph, the orange and green users are completely separated, and the red and blue users have some overlap but are still partitioned. We need to remove user pairs like these in our simulated anomaly generation process.
  • Of course, we can NOT run the locationing algorithm for all our traces. We want to filter out those users at the pseudo location label level. Cross entropy provides a way to measure the correlation of two distributions, and it is a good fit for our problem. We calculate the cross entropy of pseudo location labels for all 40 users, and we choose the 10 users with the least cross entropy. This ensures that these users' mobility paths strongly overlap, which provides a fair evaluation with the simulated anomalies.
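The cross-entropy comparison can be sketched on uni-gram label distributions. The add-alpha smoothing, the function names and the three toy users are assumptions for illustration; the talk does not say how the distributions were estimated or how pairwise scores were combined to pick the 10 users.

```python
import math
from collections import Counter

def unigram_dist(labels, vocab, alpha=1.0):
    """Smoothed distribution of pseudo location labels for one user."""
    counts = Counter(labels)
    total = len(labels) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x); lower means q explains p's locations better."""
    return -sum(p[v] * math.log2(q[v]) for v in p if p[v] > 0)

vocab = ["L1", "L2", "L3"]
user_a = unigram_dist(["L1", "L1", "L2"], vocab)
user_b = unigram_dist(["L1", "L2", "L2"], vocab)
user_c = unigram_dist(["L3", "L3", "L3"], vocab)

overlapping = cross_entropy(user_a, user_b)   # users sharing locations
separated = cross_entropy(user_a, user_c)     # users in different areas
```

In this toy case the pair that shares locations scores lower than the separated pair, which is the property used to keep only user pairs whose spliced traces are hard to tell apart from uni-gram statistics alone.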
  • For future work: as part of the sponsored research, we will help Cisco integrate this model into their MSE as a value-added mobility application. This model will work with the existing CCX solution to help with enterprise device security, as well as leveraging its prediction capability to improve VoIP roaming performance. We are also looking into obtaining more heterogeneous sensor data from the current system, such as traffic patterns, device capability and other external sensors such as GPS and temperature, to build a more robust sensor fusion framework. As mentioned in the previous slides, to solve the problem of the factor relationships among different sensors, we plan to adopt a factor language model. Last but not least, we are looking for opportunities to apply this work to more appealing applications in healthcare and in security.
  • One big message from this work is that we again confirm the similarity between language and behavior. The n-gram model is simple and versatile enough for various applications. We demonstrated that we can combine multi-dimensional data into a single dimension and convert it to behavior text. We also demonstrated that some of our ideas in preprocessing, modeling and testing led to reasonable improvements. Through experiments, we explored the parameter space and gained valuable insights. We also discovered some potential problems with these ideas, especially with the dimensionality reduction. If the sensors have internal relationships and contribute different factors to the behavior modeling, reducing them blindly to 1-D may lose that information. Also, the skipped n-gram model is dependent on the data and needs further investigation.

    1. Jiang Zhu and Joy Y. Zhang, Carnegie Mellon University, August 2nd, 2011
    2. • Monitor and track user mobility behavior in WLAN environment using RSS trace
       • Convert mobility traces and other context information to Behavior Text representations
       • Build n-gram language model with behavior text and use it for anomaly detection to discover loss or theft events
    4. [Bar chart: mobile device loss or theft by city, on an axis from 0% to 60%. Cities shown: Miami, New York, Los Angeles, Phoenix, Sacramento, Chicago, Dallas, Houston, Philadelphia, Boston, San Francisco.]
       Source: Strategy One survey conducted among a U.S. sample of 3017 adults age 18 years and older, September 21-28, 2010, with an oversample in the top 20 cities (based on population).
    5. Business and personal applications running together:
       • Corporate messaging, email on personal devices
       • Intranet wireless access on personal devices
       • Personal finance and banking on corporate devices
       • Mobile payments and credentials
       Consequences of loss:
       • CAPEX loss
       • Data loss
       • Recovery effort
       • Loss of business
       "The 329 organizations polled had collectively lost more than 86,000 devices … with average cost of lost data at $49,246 per device, worth $2.1 billion or $6.4 million per organization."
       "The Billion Dollar Lost-Laptop Study," conducted by Intel Corporation and the Ponemon Institute, analyzed the scope and circumstances of missing laptop PCs.
    6. • Detection: To discover the loss and theft early enough to initiate other steps
       • Mitigation: Revoke access to sensitive data, applications or services
       • Notification: Notify owners, administrators or authority
       • Recovery: Rescue device; recover/restore data
    7. • Mobility as Behavior
         • Mobility modeling is a well studied research area
         • Can be measured and tracked: Wi-Fi, GPS, Cellular, etc
         • Other contextual information can be combined: Bluetooth, accelerometer, etc
       • Other motivating applications
         • Healthcare: Inpatient telemetry
         • Education: Young children monitoring
         • Law enforcement: Inmates monitoring and control
    9. • Past and current location trigger future locations
         [Diagram: Break Room, Hallway A, Office, Hallway B, Bathroom]
       • User mobility as short sequence of locations [1] [2]
       • "Language as action": Language vs. streams of sensor data
         • Composing elements: sensor data vs. words in corpus
         • Sequence structure: local dependency vs. "grammar"
       [1] Aipperspach, et al, "Modeling Human Behavior from Simple sensors in the Home", PerCom 2006
       [2] Buthpitiya, et al, "n-gram Geo-Trace Modeling", Pervasive 2011
    10. • User location at time t depends only on the last n-1 locations
        • Sequence of locations can be predicted by n consecutive locations in the past
        • Maximum Likelihood Estimation from training data by counting: $P(l_i \mid l_{i-n+1}, \dots, l_{i-1}) = \frac{C(l_{i-n+1}, \dots, l_i)}{C(l_{i-n+1}, \dots, l_{i-1})}$
        • MLE assigns zero probability to unseen n-grams
          • Incorporate smoothing function (Katz)
          • Discount probability for observed grams
          • Reserve probability for unseen grams
    11. • Long distance dependency of words in sentences
          • tri-grams for "I hit the tennis ball": "I hit the", "hit the tennis", "the tennis ball"
          • "I hit ball" not captured
        • Future pseudo location depends on locations far in the past. Intermediate behavior has little relevance or influence
          • Noise in the data collected: "ping-pong" effect in WLAN association, interference, sampling errors, etc
          • Model size
    12. [Diagram: Sensing -> RSS Trace -> Preprocessing -> N-gram Model -> Anomaly Detection -> Anomaly Y/N]
    13. • Collect RSS of the devices on multiple WAPs with timestamps
        • Aggregate and serialize into time series of RSS vectors
        * Lin, et al, "WASP: An enhanced indoor location algorithm for a congested wi-fi environment"
    14. • Dimensionality in RSS vector: too fine for modeling
        • Proximity in location results in similar RSS vector
        • K-means clustering algorithm with distance function similar to WASP [1] and each cluster assigned a pseudo location label
        [1] Lin, et al, "WASP: An enhanced indoor location algorithm for a congested wi-fi environment"
    15. • Repeating location labels dominate n-gram statistics
        • Extracting "duration" by counting repeating labels
        • Only append "duration" label if Mutual Information of location and duration is high
          • Dependency: "Conference Room" + "1 hour" infer "Meeting"
          • Personal: "Professor's Office" + "10 minutes" infer "Student's quick chat"
        • Segment behavior text sequences based on time-of-day
          • Behavior follows routine and agenda
          • Varying among users
          • Cut the boundary based on activity level
    16. [Diagram: Sensing -> RSS Trace -> Preprocessing (Extract Pseudo Location, Extract Other Features, Fusion, Behavior Text Generation) -> N-gram Model -> Anomaly Detection -> Anomaly Y/N]
    17. • Feed sequence of the past locations in a sliding window of size N to n-gram model for testing
        • For a testing sequence of pseudo locations
        • Estimate the average log probability this sequence is generated from the n-gram or skipped n-gram model
        • If this likelihood drops below a threshold, flag an anomaly alert
    18. [Plot: average log probability (y-axis, 0 to 0.8) against sliding window position (x-axis), showing the log probability curve, a low threshold and a high threshold line, and marked points A, B, C and D.]
    19. [Diagram: Sensing -> RSS Trace -> Preprocessing (Extract Pseudo Location, Extract Other Features, Fusion, Behavior Text Generation) -> N-gram Model -> Anomaly Detection (> Threshold) -> Anomaly Y/N]
    21. Dataset
        • Users: 40
        • Location: Cisco SJC 14 1F Alpha networks
        • RSS sampling rate: 13 sec
        • Period: 5 days
        • Number of WAPs: 87
        • Device: Cisco Aironet 1500 + MSE
        • Dataset Size: 3.2 mil points
        RSS vector clustering
        • Run small subset trace with different K and evaluate clustering performance by average distance to centroids
        • K = 3X #WAPs has the best trade-offs
        • Yield ~260 pseudo locations
    22. • Testing samples
          • Positive sample: simulated anomaly by splicing traces from two different users
          • Negative sample: trace from "owner"
    23. • Train n-gram models with 8 hour data
        • Continuous 5-gram model and Skipped 3-gram with skipping factor k=2 result in similar accuracy ~ 60%
          • Model complexity: k-order reduction
          • Skip factor K is data dependent: particular scenarios in our data set: office with hallways and corridors
          • Further investigation needed to find the optimal K
        • Replacing repeating labels with duration feature improves the model
          • Before collapsing, 5-gram statistics are dominated by several sequences with long repeating locations. Top 200 grams are repeating labels
          • After collapsing, 5-gram statistics are well distributed
        • Time-of-day has only marginal improvement, <1%
    24. [Plot: ROC curves, true positive rate against false positive rate (both 0 to 1), for data sizes of 8 hours and 12 hours.]
    25. [Plot: accuracy (0 to 1) against n-gram order (0 to 10), for data sizes of 4, 8 and 12 hours.]
    27. [Diagram: Sensing -> RSS Trace -> Preprocessing (Extract Pseudo Location, Extract Other Features, Fusion, Behavior Text Generation) -> N-gram Model -> Anomaly Detection (> Threshold) -> Anomaly Y/N]
        • Experiments to discover loss or theft event through anomaly detection with 70~80% accuracy with only 8 hours of training data
    28. Thank you. And special thanks to our sponsors:
        • CyLab Mobility Research Center
        • Cisco Systems Inc.
        • Army Research Office
    33. • Extract Mobility model from real trace in WLAN environment [1]
          • Extract mobility tracks, duration from WLAN association records
          • Analyze mobility characteristics: pause time, speed, direction, destination region and their distributions
          • Build empirical model to generate synthetic trace
        • Steady state and transient behavior can be modeled with Semi-Markov model [2]
          • Transition probability matrix and sojourn time distribution
        • Language model to model behavior from sensors in home [3]
          • Show support on similarity between language and behavior
          • Smoothed n-gram model to make single-step prediction on binary sensor readings from smart home
        [1] Kim et al, "Extracting a Mobility Model from Real User Traces", INFOCOM 2006
        [2] Lee and Hou, "Modeling Steady-State and Transient Behaviors of User Mobility", MobiHoc 2006
        [3] Aipperspach, et al, "Modeling Human Behavior from Simple sensors in the Home", PerCom 2006
    34. • Overhead and lack of granularity in inferring user location and pause time from WLAN association records [Kim'06]
          • Fine-grain, higher dimension trace data to model mobility behavior, such as RSS beacons trace
        • Model complexity and computational overhead not suitable for real time application [Lee'06]
          • Simple and cost-effective model to capture mobility, reducing ping-pong effects
        • It is straightforward to convert binary sensor data to behavior text for LM-based analysis [Aipp'06]
          • Heterogeneous multi-valued sensory data is hard to convert to a single-dimension behavior text
    36. • Calculate coordinates for each RSS vector using "Indoor location" algorithm [1] and generate hot region plot
        [1] Lin, et al, "WASP: An enhanced indoor location algorithm for a congested wi-fi environment"
    37. • Select 10 users with the least cross entropy
    38. • Help Cisco to adopt this model to Mobility Service Engine
        • Heterogeneous sensor data fusion
          • Network traffic patterns from wireless controllers
          • Applications, memory and battery status
          • GPS, accelerometers, gyroscope, temperature, etc
        • Advanced Model
          • Leverage the internal factorized relationships among various sensors
          • Factor Language Model
        • More Applications
          • Prediction: resource allocation, energy saving, personalized services
          • Anomaly detection: adaptive authentication, patient telemetry
    40. • Confirm similarity between language and behavior
        • Multi-dimension to single dimension and n-gram: low complexity but good results
        • Potential problems:
          • Dimensionality reduction to 1-D to use language approach in modeling may cause loss of the relationship among multi-dimensional data [Diagram: Sensor 1, Sensor 2, State]
          • Skipped n-gram approach is dependent on the data and may only have marginal improvement or even worse results