Smart Home Technologies: Data Mining and Prediction

    1. Smart Home Technologies
       • Data Mining and Prediction
    2. Objectives of Data Mining and Prediction
       • Large amounts of sensor data have to be "interpreted" to acquire knowledge about tasks that occur in the environment
       • Patterns in the data can be used to predict future events
       • Knowledge of tasks facilitates the automation of task components to improve the inhabitants' experience
    3. Data Mining and Prediction
       • Data mining attempts to extract patterns from the available data
         • Associative patterns: what data attributes occur together?
         • Classification: what indicates a given category?
         • Temporal patterns: what sequences of events occur frequently?
    4. Example Patterns
       • Associative pattern
         • When Bob is in the living room he likes to watch TV and eat popcorn with the light turned off.
       • Classification
         • Action movie fans like to watch Terminator, drink beer, and have pizza.
       • Sequential pattern
         • After coming out of the bedroom in the morning, Bob turns off the bedroom lights, then goes to the kitchen where he makes coffee, and then leaves the house.
    5. Data Mining and Prediction
       • Prediction attempts to form patterns that permit it to predict the next event(s) given the available input data
         • Deterministic prediction: if Bob leaves the bedroom before 7:00 am on a workday, then he will make coffee in the kitchen.
         • Probabilistic sequence model: if Bob turns on the TV in the evening, then 80% of the time he will go to the kitchen to make popcorn.
    6. Objective of Prediction in Intelligent Environments
       • Anticipate inhabitant actions
       • Detect unusual occurrences (anomalies)
       • Predict the right course of action
       • Provide information for decision making
         • Automate repetitive tasks (e.g., prepare coffee in the morning, turn on lights)
         • Eliminate unnecessary steps and improve sequences (e.g., determine whether it will likely rain, based on the weather forecast and external sensors, to decide whether to water the lawn)
    7. What to Predict
       • Behavior of the inhabitants
         • Location
         • Tasks / goals
         • Actions
       • Behavior of the environment
         • Device behavior (e.g., heating, AC)
         • Interactions
    8. Example: Location Prediction
       • Where will Bob go next?
       • Location_t+1 = f(x)
       • Input data x:
         • Location_t, Location_t-1, …
         • Time, date, day of the week
         • Sensor data
    9. Example: Location Prediction
       Time    Date    Day       Location_t     Location_t+1
       6:30    02/25   Monday    Bedroom        Bathroom
       7:00    02/25   Monday    Bathroom       Kitchen
       7:30    02/25   Monday    Kitchen        Garage
       17:30   02/25   Monday    Garage         Kitchen
       18:00   02/25   Monday    Kitchen        Bedroom
       18:10   02/25   Monday    Bedroom        Living room
       22:00   02/25   Monday    Living room    Bathroom
       22:10   02/25   Monday    Bathroom       Bedroom
       6:30    02/26   Tuesday   Bedroom        Bathroom
    10. Example: Location Prediction
        • Learned pattern:
          If   Day = Monday…Friday
          and  Time > 06:00
          and  Time < 07:00
          and  Location_t = Bedroom
          then Location_t+1 = Bathroom
    11. Prediction Techniques
        • Classification-based approaches
          • Nearest Neighbor
          • Neural Networks
          • Bayesian Classifiers
          • Decision Trees
        • Sequential behavior modeling
          • Hidden Markov Models
          • Temporal Belief Networks
    12. Classification-Based Prediction
        • Problem
          • Input: state of the environment
            • Attributes of the current state (inhabitant location, device status, etc.)
            • Attributes of previous states
          • Output: concept description
            • The concept indicates the next event
          • The prediction has to be applicable to future examples
    13. Instance-Based Prediction: Nearest Neighbor
        • Use previous instances as a model for future instances
        • The prediction for the current instance is the classification of the most similar previously observed instance:
          • Instances with correct classifications (predictions) (x_i, f(x_i)) are stored
          • Given a new instance x_q, the prediction is that of the most similar stored instance x_k: f(x_q) = f(x_k)
    14. Example: Location Prediction
        • (The location table from slide 9 is shown again.)
    15. Nearest Neighbor Example: Inhabitant Location
        • Training instances (with concept):
          ((Bedroom, 6:30), Bathroom), ((Bathroom, 7:00), Kitchen),
          ((Kitchen, 7:30), Garage), ((Garage, 17:30), Kitchen), …
        • Similarity metric:
          d((location1, time1), (location2, time2)) = 1000 * (location1 ≠ location2) + |time1 - time2|
        • Query instance: x_q = (Bedroom, 6:20)
        • Nearest neighbor: x_k = (Bedroom, 6:30), with d(x_k, x_q) = 10
        • Prediction f(x_k): Bathroom
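        A minimal Python sketch of this lookup, following the slide's similarity metric (the function names and the minutes-based time encoding are illustrative assumptions):

            def to_minutes(hhmm):
                # Convert 'H:MM' strings such as '6:30' into minutes since midnight.
                h, m = hhmm.split(":")
                return int(h) * 60 + int(m)

            def distance(a, b):
                # d = 1000 * (location1 != location2) + |time1 - time2|
                (loc_a, t_a), (loc_b, t_b) = a, b
                return 1000 * (loc_a != loc_b) + abs(to_minutes(t_a) - to_minutes(t_b))

            # Stored instances: ((location_t, time), location_t+1)
            training = [
                (("Bedroom", "6:30"), "Bathroom"),
                (("Bathroom", "7:00"), "Kitchen"),
                (("Kitchen", "7:30"), "Garage"),
                (("Garage", "17:30"), "Kitchen"),
            ]

            def predict(query):
                # Return the label of the single most similar stored instance.
                nearest = min(training, key=lambda inst: distance(inst[0], query))
                return nearest[1]

            print(predict(("Bedroom", "6:20")))  # -> 'Bathroom' (distance 10 to the 6:30 instance)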
    16. Nearest Neighbor
        • Training instances and the similarity metric form regions where a concept (prediction) applies
        • Uncertain information and incorrect training instances lead to incorrect classifications
    17. k-Nearest Neighbor
        • Instead of using only the most similar instance, combine the k most similar instances
          • Given query x_q, estimate the concept (prediction) by a majority vote of the k nearest neighbors
          • Or choose the concept with the highest sum of inverse distances to the neighbors (see the formula below)
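        The distance-weighted vote referred to above is not reproduced on the exported slide; a standard formulation (an assumption, not taken from the deck) is, in LaTeX:

            \hat{f}(x_q) = \arg\max_{c}\; \sum_{i=1}^{k} \frac{1}{d(x_q, x_i)}\,\delta\bigl(c, f(x_i)\bigr),
            \qquad \delta(a,b) = 1 \text{ if } a=b, \text{ else } 0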
    18. k-Nearest Neighbor Example
        • TV viewing preferences
          • Distance function?
            • What are the important attributes?
            • How can they be compared?

        Time    Date    Day        Channel   Title          Genre
        19:30   02/25   Thursday   27        Cops           Reality
        19:00   02/26   Friday     11        News           News
        12:00   02/27   Saturday   21        Terminator I   Action
        20:00   02/27   Saturday   8         News           News
        …       …       …          …         …              …
        21:00   02/25   Thursday   33        News           News

        Time    Date    Day        Channel   Title               Genre
        13:30   03/20   Sunday     13        Antiques Roadshow   Reality
        20:00   03/21   Monday     8         60 Minutes          News
        22:00   03/22   Tuesday    13        Nova                Documentary
        …       …       …          …         …                   …
        22:00   03/20   Sunday     4         News                News
    19. k-Nearest Neighbor Example
        • Distance function example:
          • Most important matching attribute: show name
          • Second most important attribute: time
          • Third most important attribute: genre
          • Fourth most important attribute: channel
        • Does he/she like to watch Nova?

        Time    Date    Day         Channel   Title          Genre
        16:30   04/20   Wednesday   13        WW II Planes   Documentary
        20:00   04/22   Friday      8         60 Minutes     News
        …       …       …           …         …              …
        21:00   04/21   Thursday    33        News           News
    20. Nearest Neighbor
        • Advantages
          • Fast training (just store the instances)
          • Complex target functions
          • No loss of information
        • Problems
          • Slow at query time (all instances have to be evaluated)
          • Sensitive to the correct choice of similarity metric
          • Easily fooled by irrelevant attributes
    21. Decision Trees
        • Use training instances to build a sequence of evaluations that determines the correct category (prediction):
          If Bob is in the bedroom, then
            if the time is between 6:00 and 7:00, then
              Bob will go to the bathroom
            else …
        • The sequence of evaluations is represented as a tree whose leaves are labeled with the categories
    22. Decision Tree Induction
        • Algorithm (main loop):
          1. A = best attribute for the next node
          2. Assign A as the attribute for the node
          3. For each value of A, create a descendant node
          4. Sort the training examples to the descendants
          5. If the training examples are perfectly classified, stop; otherwise iterate over the descendants
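        A compact Python sketch of the attribute-selection step in this loop (entropy-based information gain, as described on the next slide; the names are illustrative):

            import math
            from collections import Counter

            def entropy(labels):
                # Shannon entropy of a list of class labels.
                total = len(labels)
                return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

            def information_gain(examples, attribute):
                # Entropy reduction achieved by splitting `examples` (dicts) on `attribute`.
                base = entropy([e["label"] for e in examples])
                remainder = 0.0
                for value in {e[attribute] for e in examples}:
                    subset = [e["label"] for e in examples if e[attribute] == value]
                    remainder += len(subset) / len(examples) * entropy(subset)
                return base - remainder

            def best_attribute(examples, attributes):
                # Step 1 of the main loop: the attribute with the highest gain.
                return max(attributes, key=lambda a: information_gain(examples, a))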
    23. Decision Tree Induction
        • The best attribute is chosen based on the information-theoretic concept of entropy
          • Choose the attribute that reduces the entropy (~uncertainty) most
        • [Figure: splitting on attribute A1 leaves both branches (values v1, v2) with 25 Bathroom / 25 Kitchen examples (still uncertain), while splitting on A2 yields pure branches: 50 Bathroom / 0 Kitchen under v1 and 0 Bathroom / 50 Kitchen under v2.]
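        The standard definitions behind "best attribute", in LaTeX:

            H(S) = -\sum_{c} p_c \log_2 p_c, \qquad
            \mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

        For the split sketched above, A1 leaves each branch at 25/25, so H(S_v) = 1 bit and the gain is 0; A2 produces pure branches, so the gain equals the full initial entropy of 1 bit.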
    24. Decision Tree Example: Inhabitant Location
        • [Figure: decision tree. The root tests Day. For M…F: test Time > 6:00 (yes/no); if yes, test Time < 7:00; if yes, test Location_t = Bedroom and predict Bathroom. For Sat/Sun: a separate branch tests Location_t (Living Room, Bedroom, …).]
    25. Example: Location Prediction
        • (The location table from slide 9 is shown again.)
    26. Decision Trees
        • Advantages
          • Understandable rules
          • Fast learning and prediction
          • Lower memory requirements
        • Problems
          • Replication problem (each category requires multiple branches)
          • Limited rule representation (attributes are assumed to be locally independent)
          • Numeric attributes can lead to large branching factors
    27. Artificial Neural Networks
        • Use a numeric function to compute the correct category. The function is learned by repeatedly presenting the set of training instances, with each attribute value translated into a number.
        • Neural networks are motivated by the functioning of neurons in the brain:
          • Functions are computed in a distributed fashion by a large number of simple computational units
    28. Neural Networks
    29. Computer vs. Human Brain
                               Computer                          Human Brain
        Computational units    1 CPU, 10^8 gates                 10^11 neurons
        Storage units          10^10 bits RAM, 10^12 bits disk   10^11 neurons, 10^14 synapses
        Cycle time             10^-9 sec                         10^-3 sec
        Bandwidth              10^9 bits/sec                     10^14 bits/sec
        Neuron updates / sec   10^6                              10^14
    30. Artificial Neurons
        • Artificial neurons are a much simplified computational model of neurons
          • Output: a weighted sum of the inputs passed through an activation function (see below)
          • A function is learned by adjusting the weights w_j
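        The output formula did not survive the slide export; the usual form for such a unit (an assumption consistent with the weight-adjustment description) is, in LaTeX:

            o = g\Bigl(\sum_{j} w_j\, x_j\Bigr)

        where g is an activation function (threshold or sigmoid, as on the following slides).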
    31. Artificial Neuron
        • Activation functions
    32. Perceptrons
        • Perceptrons use a single unit with a threshold activation function to distinguish two categories
    33. Perceptron Learning
        • Weights are updated based on the training instances (x^(i), f(x^(i))) presented:
          • The weights are adjusted to move the output closer to the desired target concept
          • The learning rate η determines how fast the weights are adjusted (too small requires many training steps, too large can prevent learning)
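        The update rule is likewise missing from the export; the standard perceptron rule (an assumption) matches the description above, in LaTeX:

            w_j \;\leftarrow\; w_j + \eta\,\bigl(f(x^{(i)}) - o^{(i)}\bigr)\, x_j^{(i)}

        where o^(i) is the unit's output for training instance x^(i).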
    34. Limitation of Perceptrons
        • Perceptrons learn only linearly separable functions
          • E.g., XOR cannot be learned
    35. Feedforward Networks with Sigmoid Units
        • Networks of units with sigmoid activation functions can learn arbitrary functions
    36. Feedforward Networks with Sigmoid Units
        • General networks permit arbitrary state-based categories (predictions) to be learned
    37. Learning in Multi-Layer Networks: Error Back-Propagation
        • As in perceptrons, differences between the output of the network and the target concept are propagated back to the input weights
        • Output errors for hidden units are computed from the propagated errors at the inputs of the output units
        • Weight updates correspond to gradient descent on the output error function in weight space
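        A minimal numpy sketch of gradient-descent training for a one-hidden-layer sigmoid network (illustrative only; this is not the network used in the cited work):

            import numpy as np

            def sigmoid(z):
                return 1.0 / (1.0 + np.exp(-z))

            def train(X, y, hidden=4, eta=0.5, epochs=20000, seed=0):
                # Backpropagation for one hidden layer of sigmoid units.
                rng = np.random.default_rng(seed)
                W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
                W2 = rng.normal(scale=0.5, size=(hidden, 1))
                for _ in range(epochs):
                    h = sigmoid(X @ W1)                    # forward pass
                    out = sigmoid(h @ W2)
                    d_out = (out - y) * out * (1 - out)    # error at the output units
                    d_hid = (d_out @ W2.T) * h * (1 - h)   # error propagated back to hidden units
                    W2 -= eta * h.T @ d_out                # gradient descent in weight space
                    W1 -= eta * X.T @ d_hid
                return W1, W2

            # XOR: the function a single perceptron cannot learn (slide 34)
            X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
            y = np.array([[0], [1], [1], [0]], dtype=float)
            W1, W2 = train(X, y)
            print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))  # typically close to [0, 1, 1, 0]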
    38. Neural Network Examples
        • Prediction
          • Predict steering commands in cars
          • Modeling of device behavior
          • Face and object recognition
          • Pose estimation
        • Decision and control
          • Heating and AC control
          • Light control
          • Automated vehicles
    39. Neural Network Example: Prediction of Lighting
        • University of Colorado Adaptive Home [DLRM94]
        • A neural network learns to predict the light level after a set of lights is changed
          • Input:
            • The current light device levels (7 inputs)
            • The current light sensor levels (4 inputs)
            • The new light device levels (7 inputs)
          • Output:
            • The new light sensor levels (4 outputs)
        • [DLRM94] Dodier, R. H., Lukianow, D., Ries, J., & Mozer, M. C. (1994). A comparison of neural net and conventional techniques for lighting control. Applied Mathematics and Computer Science, 4, 447-462.
    40. Neural Networks
        • Advantages
          • General-purpose learner (can learn arbitrary categories)
          • Fast prediction
        • Problems
          • All inputs have to be translated into numeric values
          • Slow training
          • Learning might result in a local optimum
    41. Bayes Classifier
        • Use Bayesian probabilities to determine the most likely next event for the given instance, given all the training data
          • The conditional probabilities are determined from the training data
    42. Naive Bayes Classifier
        • The Bayes classifier requires estimating P(x | f) for all x and f by counting occurrences in the training data
          • Generally too complex for large systems
        • The naive Bayes classifier assumes that the attributes are statistically independent given the class
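        The decision rule implied here is the standard naive Bayes rule (not reproduced on the exported slide), in LaTeX:

            \hat{f}(x_1,\dots,x_n) \;=\; \arg\max_{f}\; P(f) \prod_{i=1}^{n} P(x_i \mid f)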
    43. Bayes Classifier
        • Advantages
          • Yields the optimal prediction (given the assumptions)
          • Can handle discrete or numeric attribute values
          • The naive Bayes classifier is easy to compute
        • Problems
          • The optimal Bayes classifier is computationally intractable
          • The naive Bayes assumption is usually violated
    44. Bayesian Networks
        • Bayesian networks explicitly represent the dependence and independence of the various attributes
          • Attributes are modeled as nodes in a network, and links represent conditional probabilities
          • The network forms a causal model of the attributes
        • The prediction can be included as an additional node
        • Probabilities in Bayesian networks can be calculated efficiently using analytical or statistical inference techniques
    45. Bayesian Network Example: Location Prediction
        • All state attributes are represented as nodes (Day, Time, Room, Get ready, Prediction)
          • Nodes can include attributes that are not observable
        • Conditional probability table P(Bathroom | Room R, Get ready Gr):

                        R = Bedroom   R = Kitchen
          Gr = True     0.8           0.1
          Gr = False    0.2           0.0
    46. Bayesian Networks
        • Advantages
          • Efficient inference mechanism
          • Readable structure
          • For many problems relatively easy to design by hand
          • Mechanisms for learning the network structure exist
        • Problems
          • Building the network automatically is complex
          • Does not handle sequence information
    47. Sequential Behavior Prediction
        • Problem
          • Input: a sequence of states or events
            • States can be represented by their attributes (inhabitant location, device status, etc.)
            • Events can be raw observations (sensor readings, inhabitant input, etc.)
          • Output: the predicted next event
          • The model of behavior has to be built from past instances and be usable for future predictions
    48. Sequence Prediction Techniques
        • String matching algorithms
          • Deterministic best match
          • Probabilistic matching
        • Markov models
          • Markov chains
          • Hidden Markov models
        • Dynamic belief networks
    49. String-Based Prediction
        • Use the string of previous events or states and find a part that matches the current history
          • The prediction is either the event that followed the best (longest) matching string, or the event most likely to follow strings that partially match the history
        • Issues:
          • How is the quality of a match determined?
          • How can such a predictor be represented efficiently if the previous event string is long?
    50. Example System: IPAM [DH98]
        • Predicts UNIX commands issued by a user
        • Calculates p(x_t | x_t-1) based on frequency
          • Update the current p(Predicted | x_t-1) by a factor α
          • Update the current p(Observed | x_t-1) by 1 - α
          • More recent events are weighted more heavily
        • Data
          • 77 users, 2-6 months, >168,000 commands
          • Accuracy below 40% for a single guess, but better than a naive Bayes classifier
        • [DH98] B. D. Davison and H. Hirsh. Probabilistic Online Action Prediction. Intelligent Environments: Papers from the AAAI 1998 Spring Symposium, Technical Report SS-98-02, pp. 148-154: AAAI Press.
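        A small Python sketch of this kind of exponentially weighted frequency table (a simplified reading of the description above, not the authors' implementation; ALPHA and the table layout are illustrative):

            from collections import defaultdict

            ALPHA = 0.8  # decay factor: more recent commands weigh more heavily

            # p[prev][cmd] ~ estimated probability that `cmd` follows `prev`
            p = defaultdict(lambda: defaultdict(float))

            def update(prev, observed):
                # Decay all current estimates for `prev`, then boost the observed command.
                for cmd in p[prev]:
                    p[prev][cmd] *= ALPHA
                p[prev][observed] += 1.0 - ALPHA

            def predict(prev):
                # Most probable next command after `prev`, or None if `prev` is unseen.
                return max(p[prev], key=p[prev].get) if p[prev] else None

            history = ["ls", "cd", "ls", "vi", "ls", "cd", "ls", "vi"]
            for a, b in zip(history, history[1:]):
                update(a, b)
            print(predict("ls"))  # -> 'vi': the most recent continuation is weighted most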
    51. Example System: ONISI [GP00]
        • Look for historical state/action sequences that match the immediate history and determine the quality of predictions from these sequences
          • In state s at time t, compute l_t(s,a): the average length of the k longest matching sequences ending in action a
          • In state s, compute f(s,a): the frequency with which action a has been executed from state s
          • Rank predictions by combining both quantities (a reconstruction of the ranking formula is given below)
        • [GP00] Peter Gorniak and David Poole. Predicting Future User Actions by Observing Unmodified Applications. Seventeenth National Conference on Artificial Intelligence (AAAI-2000), August 2000.
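        The ranking formula itself did not survive the export; reconstructed from the worked example on the next slide (so the exact normalization is an inference), it appears to be, in LaTeX:

            R_t(s,a) \;=\; \lambda\,\frac{l_t(s,a)}{\sum_{a'} l_t(s,a')} \;+\; (1-\lambda)\,\frac{f(s,a)}{\sum_{a'} f(s,a')}

        where the second denominator equals the number of visits to state s.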
    52. ONISI Example [GP00]
        • With k = 3: for action a3 there are only two matches, of lengths 1 and 2, so l_t(s3, a3) = (0 + 1 + 2) / 3 = 1
        • If λ = 0.9, the sum of averaged lengths over all actions is 5, a3 has occurred 50 times in s3, and s3 has been visited 100 times, then
          R_t(s3, a3) = 0.9 * 1/5 + 0.1 * 50/100 = 0.18 + 0.05 = 0.23
    53. Example Sequence Predictors
        • Advantages
          • Permit predictions based on a sequence of events
          • Simple learning mechanism
        • Problems
          • Relatively ad hoc weighting of sequence matches
          • Limited prediction capabilities
          • Large overhead for long past state/action sequences
    54. Markov Chain Prediction
        • Use the string of previous events or states to create a model of the event-generating process
          • Models are probabilistic and can be constructed from the observed behavior of the system
          • The prediction is the event that is most likely to be generated by the model
        • Issues:
          • What form should the model take?
            • String-based models
            • State-based models
    55. Example System: Active LeZi [GC03]
        • Assumptions:
          • Event sequences are fairly repeatable
          • They are generated by a deterministic source
        • Construct the model as a parse tree of possible event sequences
          • Nodes are events with associated frequencies
          • The model is constructed using the LZ78 text compression algorithm
        • [GC03] K. Gopalratnam and D. J. Cook. Active LeZi: An Incremental Parsing Algorithm for Device Usage Prediction in the Smart Home. In Proceedings of the Florida Artificial Intelligence Research Symposium, 2003.
    56. Text Compression: LZ78
        • Parses the string x1, x2, …, xi into c(i) substrings w1, w2, …, wc(i) that form the set of phrases used for compression
          • Each prefix of a phrase wj is also a phrase wi in the set
          • Example: the input aaababbbbbaabccddcbaaaa yields the phrases a, aa, b, ab, bb, bba, abc, c, d, dc, ba, aaa
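        A short Python sketch of this parse (standard LZ78 phrase building); it reproduces the phrase list above:

            def lz78_phrases(sequence):
                # Greedily split `sequence` into phrases, each one being a previously
                # seen phrase extended by a single new symbol.
                phrases, current = [], ""
                for symbol in sequence:
                    current += symbol
                    if current not in phrases:   # new phrase: record it and start over
                        phrases.append(current)
                        current = ""
                return phrases

            print(lz78_phrases("aaababbbbbaabccddcbaaaa"))
            # ['a', 'aa', 'b', 'ab', 'bb', 'bba', 'abc', 'c', 'd', 'dc', 'ba', 'aaa']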
    57. Active LeZi
        • Represent the compression phrases as a parse tree with frequency statistics
          • E.g., for aaababbbbbaabccddcbaaaa
    58. Prediction in Active LeZi
        • Calculate the probability of each possible event
          • To calculate the probability, transitions across phrase boundaries have to be considered:
            • Slide a window across the input sequence
            • Its length k equals the longest phrase seen so far
            • Gather statistics on all possible contexts
            • This yields an order k-1 Markov model
        • Output the event with the greatest probability across all contexts as the prediction
    59. Example: Probability of a
        • Order 2: a follows 2 of the 5 times that the context aa appears
        • Order 1: a follows 5 of the 10 times that a appears
        • Order 0: a accounts for 10 of the 23 total symbols
        • The blended probability combines these orders (see the reconstruction below)
        • The probability of escaping to a lower order is the frequency of null endings
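        The blended probability computed on this slide was lost in the export. Assuming the escape probabilities are the null-ending frequencies 3/5 at order 2 and 5/10 at order 1 (a PPM-style reconstruction, not taken from the deck), it would be, in LaTeX:

            P(a) = \frac{2}{5} + \frac{3}{5}\left(\frac{5}{10} + \frac{5}{10}\cdot\frac{10}{23}\right) \approx 0.83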
    60. Active LeZi Example: Prediction on Simulated MavHome Data
        • The data simulate a single inhabitant interacting with the devices in the home
          • Repetitive behavior patterns are embedded in the data (e.g., a morning routine)
          • Time is ignored in the prediction
          • Only device interactions are recorded
    61. Active LeZi
        • Advantages
          • Permits predictions based on a sequence of events
          • Does not require the construction of states
          • Permits probabilistic predictions
        • Problems
          • The tree can become very large (long prediction times)
          • Non-optimal predictions if the tree is not sufficiently deep
    62. Markov Chain Models
        • Markov chain models represent the event-generating process probabilistically
          • A Markov model can be described by a tuple <S, T> of states and transition probabilities
          • Markov assumption: the current state contains all information about the past that is necessary to predict the probability of the next state:
            P(x_t+1 | x_t, x_t-1, …, x_0) = P(x_t+1 | x_t)
          • Transitions correspond to events that occurred in the environment (inhabitant actions, etc.)
        • Prediction of the next state (and event): the successor with the highest transition probability from the current state (see the sketch below)
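        A minimal Python sketch of learning transition probabilities by counting and predicting the most likely successor (the state sequence follows the location table of slide 9; richer state encodings, as on the next slide, work the same way):

            from collections import Counter, defaultdict

            def learn_transitions(states):
                # Estimate P(next | current) from observed transitions by counting.
                counts = defaultdict(Counter)
                for current, nxt in zip(states, states[1:]):
                    counts[current][nxt] += 1
                return {s: {nxt: n / sum(c.values()) for nxt, n in c.items()}
                        for s, c in counts.items()}

            def predict_next(T, state):
                # Prediction = successor state with the highest transition probability.
                return max(T[state], key=T[state].get)

            sequence = ["Bedroom", "Bathroom", "Kitchen", "Garage", "Kitchen", "Bedroom",
                        "Living room", "Bathroom", "Bedroom", "Bathroom"]
            T = learn_transitions(sequence)
            print(predict_next(T, "Bedroom"))  # -> 'Bathroom' (2 of 3 observed transitions)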
    63. Markov Chain Example
        • Example states:
          • S = {(Room, Time, Day, Previous Room)}
          • Transition probabilities can be calculated from training data by counting occurrences
        • [Figure: state transition diagram over states x1 … x6]
    64. Markov Models
        • Advantages
          • Permit probabilistic predictions
          • Transition probabilities are easy to learn
          • The representation is easy to interpret
        • Problems
          • The state space has to have the Markov property
          • State space selection is not automatic
          • States might have to include previous information
          • State attributes might not be observable
    65. Partially Observable Markov Models
        • Partially observable Markov models (POMMs) extend Markov models by permitting states to be only partially observable
          • A system is represented by a tuple <S, T, O, V>, where <S, T> is a Markov model and O, V map observations about the state to state probabilities:
            • O = {o_i} is the set of observations
            • V(x, o) = P(o | x)
        • To determine a prediction, the probability of being in any given state is computed
    66. Partially Observable Markov Models
        • The prediction is the most likely next state given the information about the current state (i.e., the current belief state):
          • The belief state B is a probability distribution over the state space:
            B = ((x_1, P(x_1)), …, (x_n, P(x_n)))
          • Prediction of the next state: see the formula below
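        The prediction formula was lost in the export; a standard form consistent with the description (an assumption) combines the belief state with the transition model, in LaTeX:

            P(x_{t+1} = x' \mid B) \;=\; \sum_{i=1}^{n} P(x' \mid x_i)\, P(x_i),
            \qquad \hat{x}_{t+1} \;=\; \arg\max_{x'}\; P(x_{t+1} = x' \mid B)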
    67. Hidden Markov Models
        • Hidden Markov models (HMMs) provide mechanisms to learn the Markov model <S, T> underlying a POMM from the sequence of observations
          • The Baum-Welch algorithm learns the transition and observation probabilities as well as the state space (only the number of states has to be given)
          • The model learned is the one most likely to explain the observed training sequences
    68. Hidden Markov Model Example
        • Tossing a balanced coin, starting with a biased coin that always starts with heads
    69. Partially Observable Markov Models
        • Advantages
          • Permit optimal predictions
          • HMMs provide algorithms to learn the model
          • With HMMs, a Markovian state space description does not have to be known
        • Problems
          • The state space can be enormous
          • Learning an HMM is generally very complex
          • Computing the belief state is computationally expensive
    70. Example Location Prediction Task
        • Environment and observations (example observation sequences, one per line):
          [0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
          [0, 1, 0, 2, 4, 5, 4, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2]
          [0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
          [0, 1, 0, 2, 0, 2, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6]
          [0, 1, 0, 2, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
          [4, 3, 4, 2, 0, 1, 0, 0, 0, 1, 2, 4, 5, 4, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 5, 4, 6, 6]
    71. Neural Network Predictor
        • Example network and training data
          • The data have to be divided into training instances
          • The inputs represent the current and 4 past locations

          # Input training pattern 1:  6 6 6 6 6
          # Output training pattern 1: 1.000
          # Input training pattern 2:  6 6 4 3 4
          # Output training pattern 2: 0.333
          # Input training pattern 3:  6 6 6 6 6
          # Output training pattern 3: 1.000
          # Input training pattern 4:  6 6 6 6 6
          # Output training pattern 4: 1.000
          # Input training pattern 5:  6 6 6 6 6
          # Output training pattern 5: 1.000
    72. Neural Network Predictor
        • Learning performance depends on:
          • Network topology
          • Input representation
          • Learning rate
    73. Hidden Markov Model Example
        • Input representation and learned HMM:
          • Initial and final HMM model
    74. Dynamic Bayesian Networks
        • Dynamic Bayesian networks use a Bayesian network to represent the belief state
          • The state is constructed from a set of attributes (nodes)
          • Transitions over time are modeled as links between the model at time t and the model at time t+1
        • [Figure: nodes Day, Time, Room, Get ready at time t, linked to the same nodes at time t+1]
    75. Dynamic Bayesian Networks
        • Advantages
          • Handle partial observability
          • More compact model
          • The belief state is inherent in the network
          • Simple prediction of the next belief state
        • Problems
          • The state attributes have to be predetermined
          • Learning the probabilities is very complex
    76. Conclusions
        • Prediction is important in intelligent environments
          • It captures repetitive patterns (activities)
          • It helps automate activities (but it only tells what will happen next, not what the system should do next)
        • Different prediction algorithms have different strengths and weaknesses:
          • Select a prediction approach that is suitable for the particular problem
          • There is no "best" prediction approach
        • Optimal prediction is a very hard problem and is not yet solved
