Variable neighborhood Prediction of temporal collective profiles by Keun-Woo Lim, Telecom ParisTech


Temporal collective profiles generated by mobile network users can be used to predict network usage, which in turn can be used to improve the performance of the network to meet user demands. This presentation describes a prediction method for temporal collective profiles that is suitable for online network management. Using a weighted graph representation, the target sample is observed during a given period to determine a set of neighboring profiles that are considered to behave similarly enough. The prediction of the target profile is based on the weighted average of its neighbors, where the optimal number of neighbors is selected through a form of variable neighborhood search. This method is applied to two datasets, one provided by a mobile network service provider and the other by a Wi-Fi service provider. The proposed prediction method can conveniently characterize user behavior via graph representation, while outperforming existing prediction methods. Also, unlike existing methods that utilize categorization, it has a low computational complexity, which makes it suitable for online network analysis.

Published in: Data & Analytics

  1. VARIABLE NEIGHBORHOOD PREDICTION OF TEMPORAL COLLECTIVE PROFILES
     Presentation for EuroIoTA ’16
     Speaker: Keun-Woo Lim, Telecom ParisTech
     24-11-2016
  2. Brief Overview
     • What do we do in this work?
         • Analysis of temporal collective profiles (time series)
         • Use of mobile datasets (cellular, Wi-Fi)
         • Real-time and lightweight prediction (online prediction)
     • What do we try to achieve?
         • Better prediction accuracy
         • Lower computational complexity
         • Better application & use case
  3. Contents
     • Introduction
     • Methodology
     • Prediction
     • Outlier Detection
     • Future Work
  4. Introduction
  5. Temporal collective profiles
     • Representation of data that aggregates the behavior of a group of individuals over time
     • Can be categorized in various ways
     • “Daily Profiles”
  6. What is collected?
     • Basic telephone calls and SMS?
     • However, we want to focus on more specific matters
         • Specific application data
         • Usage of Internet services
  7. Why do we analyze these data?
     • For “online network analysis”
         • Real-time prediction of the near future
         • Recognition of sudden changes/outliers
         • Timely adaptation
     • Use cases
         • Resource allocation
         • Traffic handling
         • Social behavior
  8. Requirements
     • Low computational complexity
         • Lightweight prediction methods
     • Accuracy
         • Predictions still have to be accurate
  9. Dataset
     • Cellular mobile dataset
         • 1 week of data from 90 LACs (location area codes) in Paris
         • More than 500 daily profiles
     • Wi-Fi cloud dataset
         • 122 days (March 1st to June 30th, 2014)
         • 60 million URL connection logs (top 20 mobile applications)
  10. Methodology
  11. What should we do with daily profiles?
      • Daily profiles can be:
          • Very similar to each other (same day, location, etc.)
          • Very different too (outliers, events)
      • We use methods to calculate similarity
          • Cluster similar profiles
          • Distinguish different profiles
  12. Previous work (offline analysis) [1]
      • Utilization of clustering methods (UPGMA)
          • With similarity comparison techniques (DTW, quantiles)
      • Not ideal for online data analysis
          • Clustering may take a long time (O(M²N³) with DTW)
      [1] K. Lim, S. Secci, L. Tabourier, B. Tebbani, “Characterizing and predicting mobile application usage,”
  13. Profile similarity
      • We use two examples of similarity measures (M values in a time series)
          • Euclidean distance (ED) = Θ(M)
          • Dynamic time warping (DTW) = Θ(M²)
      • For a specific dataset containing N profiles, comparing all profiles with each other costs
          • ED = Θ(N²M)
          • DTW = Θ(N²M²)
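The two measures named on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the DTW variant assumes an absolute-difference local cost with no warping window, which the slides do not specify.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length profiles, Θ(M) time."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(a, b):
    """Classic dynamic time warping, Θ(M²) time for two length-M profiles.

    Assumption: local cost is |a[i] - b[j]| and no window constraint is used.
    """
    inf = float("inf")
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]
```

With either measure, a lower value means the two profiles are more alike; DTW additionally tolerates patterns that are shifted in time.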
  14. Weighted graph representation
      • Using similarity measures, we acquire a graph structure of neighbors
      • E.g., if ED is used, lower value = more similar
  15. Filtering paths
      • Filter out neighbors with high distance
      • Depending on the value of α, the number of neighbors changes for all profiles
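A rough sketch of the graph construction and filtering described on the two slides above. The exact filtering rule is not stated on the slide; the code assumes the same rule that the deck later gives for neighbor selection (keep an edge if its distance is at most α times the distance to the closest profile). `build_graph` and `filter_edges` are hypothetical names.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def build_graph(profiles, distance=euclidean_distance):
    """Return {i: {j: distance}} for all pairs; Θ(N²) distance evaluations."""
    graph = {i: {} for i in range(len(profiles))}
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            d = distance(profiles[i], profiles[j])
            graph[i][j] = d
            graph[j][i] = d
    return graph


def filter_edges(graph, alpha):
    """Keep, per node, only edges no longer than alpha times its shortest edge.

    Assumption: this threshold rule is borrowed from the neighbor-selection
    slide; the filtering slide itself does not define it.
    """
    filtered = {}
    for node, edges in graph.items():
        if not edges:
            filtered[node] = {}
            continue
        limit = alpha * min(edges.values())
        filtered[node] = {j: d for j, d in edges.items() if d <= limit}
    return filtered
```

Because the threshold is relative to each node's own closest neighbor, the same α yields a different neighbor count for each profile.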
  16. Visualization of graph structure
      • Example graph structure for ED – cellular dataset
  17. Variable Neighborhood Prediction (VNP)
  18. Principle of VNP
      • For a new day $x_n(t)$, we configure
          • $t_0 = 0$, $t_1 \in [0, 24]$, $t_2 = 24$ (hours)
      • Objective
          • Observation period = $x_n(t_0, t_1)$
          • Create a temporal profile to predict $x_n(t_1, t_2)$
      • Find $x$ from the observation period
          • The closest profile $x$, comparing $x(t_0, t_1)$ with $x_n(t_0, t_1)$
  19. Find the neighbors
      • Using the closest neighbor $x$, we find the group of neighbors $N_n$ to be used for prediction
      • For any other profile $y$ of the training set,
          • $y \in N_n \iff s(x_n(t_0, t_1), y(t_0, t_1)) \le a \cdot s(x_n(t_0, t_1), x(t_0, t_1))$
  20. Creating the prediction profile
      • Using $N_n$, formulate the prediction
          • $\hat{x}_n(t) = \frac{\sum_{y \in N_n} s(x_n, y) \cdot y(t)}{\sum_{y \in N_n} s(x_n, y)}$
      • Simply put, it is the weighted average over the profiles of its neighborhood
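The three VNP steps (find the closest profile, select the neighborhood, take the weighted average) might look like the sketch below. It applies the weights $s(x_n, y)$ literally as written on the slide; whether $s$ should first be converted from a distance into a similarity is not stated in the deck, so that choice is left open here. `vnp_predict` and its parameters are hypothetical, and the fallback to a plain mean when all weights are zero is an assumption added to avoid division by zero.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def vnp_predict(observed, training, a, distance=euclidean_distance):
    """Predict the rest of a day from its first len(observed) samples.

    observed: samples of the new day over the observation period [t0, t1)
    training: list of full-day profiles, each longer than `observed`
    a:        neighborhood factor, a >= 1 (a == 1 keeps only the closest)
    """
    if a < 1:
        raise ValueError("a must be at least 1")
    t1 = len(observed)
    # Step 1: score every training profile on the observation period.
    scores = [distance(observed, y[:t1]) for y in training]
    closest = min(scores)
    # Step 2: keep profiles within a times the closest score.
    neighbors = [(s, y) for s, y in zip(scores, training) if s <= a * closest]
    # Step 3: weighted average over the neighborhood for t in [t1, t2).
    total = sum(s for s, _ in neighbors)
    horizon = range(t1, len(training[0]))
    if total == 0:
        # Assumption: exact matches give zero weights, so use a plain mean.
        return [sum(y[t] for _, y in neighbors) / len(neighbors) for t in horizon]
    return [sum(s * y[t] for s, y in neighbors) / total for t in horizon]
```

For a per-hour analysis, the same function would simply be called again each hour with a longer `observed` prefix.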
  21. Training Parameter $a$
      • $a$ can be tuned to select the optimal number of neighbors
          • Variable neighborhood search to find the $a$ that yields the highest accuracy over time
          • E.g. $1.0 < a < 10.0$
      • Drawbacks
          • Increase in complexity (recalculate for each $a$)
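The slide does not spell out the search procedure, so the sketch below substitutes a plain scan over evenly spaced candidate values for the variable neighborhood search. `candidate_values`, `select_a`, the step size, and the `mean_error` callback (which would return the average prediction error over past days for a given $a$) are all hypothetical.

```python
def candidate_values(low=1.0, high=10.0, step=0.5):
    """Evenly spaced candidate values for a in [low, high]."""
    count = int(round((high - low) / step))
    return [low + k * step for k in range(count + 1)]


def select_a(candidates, mean_error):
    """Return (best_a, best_error) by evaluating every candidate once.

    mean_error: callable mapping a candidate a to its average error;
    each call implies recomputing the neighborhoods for that a.
    """
    best_a, best_err = None, float("inf")
    for a in candidates:
        err = mean_error(a)
        if err < best_err:
            best_a, best_err = a, err
    return best_a, best_err
```

The cost grows linearly with the number of candidates, which is the complexity drawback the slide mentions.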
  22. Calculating multiple $t_1$
      • For a more fine-grained prediction, multiple $t_1$ can be used in one day
      • Repetition of the VNP (e.g. per-hour analysis)
  23. Handling Complexity - VNP
      • Cost of computing the neighborhood of the target day, per $a$:
          • ED = Θ(NM)
          • DTW = Θ(NM²)
      • Depending on N, this can be large in practice
      • Also, in the case of multiple-$t_1$ analysis, a large M can have an impact
  24. Handling Complexity - Graph
      • Building the graph can be heavy
          • ED = Θ(N²M)
          • DTW = Θ(N²M²)
      • Luckily, the graph representation is only updated once per day
          • However, it is needed for various M in the case of multiple-$t_1$ analysis
      • Also, space partitioning can be used to reduce time
          • Via k-d tree
          • This can reduce the complexity of ED to Θ(log(N)M)
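To illustrate the space-partitioning idea, here is a small k-d tree with a nearest-neighbor query under Euclidean distance. It is a textbook sketch with hypothetical names, not the implementation used in the talk. Note that the logarithmic query time is an average-case figure and that k-d trees tend to lose their advantage as the number of dimensions (here, M) grows.

```python
def build_kdtree(points, depth=0):
    """Recursively split points on the median of one coordinate per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance (monotone in ED, cheaper to compare)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(node, query, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    if best is None or sq_dist(query, node["point"]) < sq_dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only descend into the far side if the splitting plane is close enough.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, best)
    return best
```

Usage: build the tree once per day from the stored profiles (as tuples), then call `nearest(tree, observed_profile)` to find the closest profile without scanning all N.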
  25. Prediction Analysis
  26. Prediction accuracy analysis
      • Prediction evaluated through the relative error, defined as
          • $\varepsilon = \frac{\sum_{t=t_1}^{t_2} (x_n(t) - \hat{x}_n(t))^2}{\sum_{t=t_1}^{t_2} x_n(t)^2}$
      • Comparison with closest neighbor ($a = 1$), UPGMA
      • $t_1 = 12$
      • [Figure panels: cellular data - ED; cellular data - DTW]
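The relative error above translates directly into code. In this sketch `relative_error` is a hypothetical name, and both arguments are assumed to cover only the prediction period from $t_1$ to $t_2$.

```python
def relative_error(actual, predicted):
    """Sum of squared errors divided by the sum of squared actual values."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    numerator = sum((x - p) ** 2 for x, p in zip(actual, predicted))
    denominator = sum(x ** 2 for x in actual)
    if denominator == 0:
        raise ValueError("actual profile is all zeros; error is undefined")
    return numerator / denominator
```

A value of 0 means a perfect prediction, and a value of 1 is what an all-zero prediction would score.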
  27. Effect of changing $t_1$
      • Per-hour analysis
      • The length of the observation period may also affect the performance of prediction
      • [Figure panels: cellular data - ED; cellular data - DTW]
  28. Time consumption
      • The required time can be acceptable for both methods in a per-hour analysis
      • However, caution is needed for DTW when many profiles are used
      • [Figure panels: cellular data - ED; cellular data - DTW]
  29. Distribution of α
      • The distribution of the optimal α is concentrated in the range [1, 2], allowing us to easily limit the range of α
      • The number of neighbors is heterogeneous, but most profiles have fewer than 20
  30. Conclusion & Future work
  31. Conclusion & Future work
      • We have proposed a methodology for online prediction of mobile time-series datasets
          • Acceptable time for our current dataset
          • Can be used for other time-series datasets in various IoT environments
      • Further studies include
          • Testing on a larger-scale dataset
  32. Any Questions?
  33. Appendix – Wi-Fi data prediction
      • [Figure panels: Wi-Fi data - ED; Wi-Fi data - DTW]