Smart Home Technologies Data Mining and Prediction

Transcript

    • 1. Smart Home Technologies
      - Data Mining and Prediction
    • 2. Objectives of Data Mining and Prediction
      - Large amounts of sensor data have to be "interpreted" to acquire knowledge about tasks that occur in the environment
      - Patterns in the data can be used to predict future events
      - Knowledge of tasks facilitates the automation of task components to improve the inhabitants' experience
    • 3. Data Mining and Prediction
      - Data mining attempts to extract patterns from the available data
        - Associative patterns: what data attributes occur together?
        - Classification: what indicates a given category?
        - Temporal patterns: what sequences of events occur frequently?
    • 4. Example Patterns
      - Associative pattern
        - When Bob is in the living room he likes to watch TV and eat popcorn with the light turned off.
      - Classification
        - Action movie fans like to watch Terminator, drink beer, and have pizza.
      - Sequential patterns
        - After coming out of the bedroom in the morning, Bob turns off the bedroom lights, then goes to the kitchen where he makes coffee, and then leaves the house.
    • 5. Data Mining and Prediction
      - Prediction attempts to form patterns that permit it to predict the next event(s) given the available input data.
        - Deterministic predictions: if Bob leaves the bedroom before 7:00 am on a workday, then he will make coffee in the kitchen.
        - Probabilistic sequence models: if Bob turns on the TV in the evening, then 80% of the time he will go to the kitchen to make popcorn.
    • 6. Objective of Prediction in Intelligent Environments
      - Anticipate inhabitant actions
      - Detect unusual occurrences (anomalies)
      - Predict the right course of action
      - Provide information for decision making
        - Automate repetitive tasks (e.g. prepare coffee in the morning, turn on lights)
        - Eliminate unnecessary steps and improve sequences (e.g. determine whether it is likely to rain, based on the weather forecast and external sensors, to decide whether to water the lawn)
    • 7. What to Predict
      - Behavior of the inhabitants
        - Location
        - Tasks / goals
        - Actions
      - Behavior of the environment
        - Device behavior (e.g. heating, AC)
        - Interactions
    • 8. Example: Location Prediction
      - Where will Bob go next?
      - Location_t+1 = f(x)
      - Input data x:
        - Location_t, Location_t-1, …
        - Time, date, day of the week
        - Sensor data
    • 9. Example: Location Prediction
      Time   Date   Day      Location_t    Location_t+1
      6:30   02/25  Monday   Bedroom       Bathroom
      7:00   02/25  Monday   Bathroom      Kitchen
      7:30   02/25  Monday   Kitchen       Garage
      17:30  02/25  Monday   Garage        Kitchen
      18:00  02/25  Monday   Kitchen       Bedroom
      18:10  02/25  Monday   Bedroom       Living room
      22:00  02/25  Monday   Living room   Bathroom
      22:10  02/25  Monday   Bathroom      Bedroom
      6:30   02/26  Tuesday  Bedroom       Bathroom
    • 10. Example: Location Prediction
      - Learned pattern:
        If Day = Monday…Friday
        & Time > 0600
        & Time < 0700
        & Location_t = Bedroom
        Then Location_t+1 = Bathroom
    • 11. Prediction Techniques
      - Classification-based approaches
        - Nearest neighbor
        - Neural networks
        - Bayesian classifiers
        - Decision trees
      - Sequential behavior modeling
        - Hidden Markov models
        - Temporal belief networks
    • 12. Classification-Based Prediction
      - Problem
        - Input: state of the environment
          - Attributes of the current state (inhabitant location, device status, etc.)
          - Attributes of previous states
        - Output: concept description
          - The concept indicates the next event
        - The prediction has to be applicable to future examples
    • 13. Instance-Based Prediction: Nearest Neighbor
      - Use previous instances as a model for future instances
      - The prediction for the current instance is the classification of the most similar previously observed instance.
        - Instances with correct classifications (predictions) (x_i, f(x_i)) are stored
        - Given a new instance x_q, the prediction is that of the most similar stored instance x_k: f(x_q) = f(x_k)
    • 14. Example: Location Prediction
      (Training data table repeated from slide 9.)
    • 15. Nearest Neighbor Example: Inhabitant Location
      - Training instances (with concept):
        ((Bedroom, 6:30), Bathroom), ((Bathroom, 7:00), Kitchen),
        ((Kitchen, 7:30), Garage), ((Garage, 17:30), Kitchen), …
      - Similarity metric:
        d((location_1, time_1), (location_2, time_2)) = 1000 · (location_1 ≠ location_2) + |time_1 – time_2|
      - Query instance: x_q = (Bedroom, 6:20)
      - Nearest neighbor: x_k = (Bedroom, 6:30), d(x_k, x_q) = 10
      - Prediction f(x_k): Bathroom
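    The following is a minimal Python sketch of the 1-nearest-neighbor lookup on slide 15; the instance encoding (times in minutes) and the function names are illustrative assumptions, not part of the original slides.

      # 1-nearest-neighbor location prediction with the slide's similarity metric:
      # 1000 per location mismatch plus the absolute time difference (in minutes).
      def distance(a, b):
          (loc1, t1), (loc2, t2) = a, b
          return 1000 * (loc1 != loc2) + abs(t1 - t2)

      def predict_1nn(training, query):
          # Return the stored prediction of the closest training instance.
          nearest = min(training, key=lambda inst: distance(inst[0], query))
          return nearest[1]

      training = [
          (("Bedroom", 6 * 60 + 30), "Bathroom"),
          (("Bathroom", 7 * 60), "Kitchen"),
          (("Kitchen", 7 * 60 + 30), "Garage"),
          (("Garage", 17 * 60 + 30), "Kitchen"),
      ]
      print(predict_1nn(training, ("Bedroom", 6 * 60 + 20)))  # -> Bathroom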
    • 16. Nearest Neighbor
      - Training instances and the similarity metric form regions where a concept (prediction) applies
      - Uncertain information and incorrect training instances lead to incorrect classifications
    • 17. k-Nearest Neighbor
      - Instead of using the most similar instance, use the average of the k most similar instances
        - Given query x_q, estimate the concept (prediction) as the majority vote of the k nearest neighbors
        - Or, estimate the concept as the one with the highest sum of inverse distances:
          f(x_q) = argmax_c Σ_{i=1..k} 1 / d(x_q, x_i), summed over neighbors x_i with f(x_i) = c
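    A possible distance-weighted vote corresponding to the "sum of inverse distances" rule above, as a hedged Python sketch (the data layout is an assumption):

      from collections import defaultdict

      def predict_knn(training, query, distance, k=3):
          # Take the k most similar instances and let each vote for its concept,
          # weighted by the inverse of its distance to the query.
          neighbors = sorted(training, key=lambda inst: distance(inst[0], query))[:k]
          votes = defaultdict(float)
          for features, concept in neighbors:
              d = distance(features, query)
              votes[concept] += 1.0 / d if d > 0 else float("inf")
          return max(votes, key=votes.get)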
    • 18. k-Nearest Neighbor Example
      - TV viewing preferences
        - Distance function?
          - What are the important attributes?
          - How can they be compared?
      Time   Date   Day       Channel  Title         Genre
      19:30  02/25  Thursday  27       Cops          Reality
      12:00  02/27  Saturday  21       Terminator I  Action
      20:00  02/27  Saturday  8        News          News
      19:00  02/26  Friday    11       News          News
      21:00  02/25  Thursday  33       News          News
      …

      Time   Date   Day       Channel  Title              Genre
      13:30  03/20  Sunday    13       Antiques Roadshow  Reality
      20:00  03/21  Monday    8        60 Minutes         News
      22:00  03/22  Tuesday   13       Nova               Documentary
      22:00  03/20  Sunday    4        News               News
      …
    • 19. k-Nearest Neighbor Example
      - Distance function example:
        - Most important matching attribute: show name
        - Second most important attribute: time
        - Third most important attribute: genre
        - Fourth most important attribute: channel
      - Does he/she like to watch Nova?
      Time   Date   Day        Channel  Title         Genre
      16:30  04/20  Wednesday  13       WW II Planes  Documentary
      20:00  04/22  Friday     8        60 Minutes    News
      21:00  04/21  Thursday   33       News          News
      …
    • 20. Nearest Neighbor
      - Advantages
        - Fast training (just store instances)
        - Can represent complex target functions
        - No loss of information
      - Problems
        - Slow at query time (have to evaluate all instances)
        - Sensitive to the correct choice of similarity metric
        - Easily fooled by irrelevant attributes
    • 21. Decision Trees
      - Use training instances to build a sequence of evaluations that determines the correct category (prediction)
        If Bob is in the Bedroom then
          if the time is between 6:00 and 7:00 then
            Bob will go to the Bathroom
          else …
      - The sequence of evaluations is represented as a tree where leaves are labeled with the category
    • 22. Decision Tree Induction
      - Algorithm (main loop)
        - A = best attribute for the next node
        - Assign A as the attribute for the node
        - For each value of A, create a descendant node
        - Sort training examples to the descendants
        - If the training examples are perfectly classified, stop; else iterate over the descendants
    • 23. Decision Tree Induction
      - The best attribute is chosen based on the information-theoretic concept of entropy
        - Choose the attribute that reduces the entropy (~uncertainty) the most
      - Example: with 50 Bathroom and 50 Kitchen instances, splitting on A1 leaves both descendants with Bathroom (25) / Kitchen (25) (no reduction in entropy), while splitting on A2 yields Bathroom (50) / Kitchen (0) and Bathroom (0) / Kitchen (50) (perfectly classified), so A2 is chosen.
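    A small Python check of the entropy argument above (illustrative sketch; the class counts are taken from the A1/A2 example):

      from math import log2

      def entropy(counts):
          total = sum(counts)
          return -sum((c / total) * log2(c / total) for c in counts if c > 0)

      def split_entropy(children):
          # Weighted entropy of the class counts in each descendant after a split.
          total = sum(sum(child) for child in children)
          return sum(sum(child) / total * entropy(child) for child in children)

      before = entropy([50, 50])                               # 1.0 bit of uncertainty
      gain_a1 = before - split_entropy([[25, 25], [25, 25]])   # 0.0 -> A1 tells us nothing
      gain_a2 = before - split_entropy([[50, 0], [0, 50]])     # 1.0 -> A2 is chosen
      print(gain_a1, gain_a2)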
    • 24. Decision Tree Example: Inhabitant Location
      [Decision tree diagram: Day at the root (branches M…F and Sat/Sun), tests on Location_t and on Time > 6:00 / Time < 7:00, with leaves such as Bathroom and Living Room.]
    • 25. Example: Location Prediction
      (Training data table repeated from slide 9.)
    • 26. Decision Trees
      - Advantages
        - Understandable rules
        - Fast learning and prediction
        - Lower memory requirements
      - Problems
        - Replication problem (each category requires multiple branches)
        - Limited rule representation (attributes are assumed to be locally independent)
        - Numeric attributes can lead to large branching factors
    • 27. Artificial Neural Networks
      - Use a numeric function to calculate the correct category. The function is learned from repeated presentation of the set of training instances, where each attribute value is translated into a number.
      - Neural networks are motivated by the functioning of neurons in the brain.
        - Functions are computed in a distributed fashion by a large number of simple computational units
    • 28. Neural Networks
    • 29. Computer vs. Human Brain
                              Computer                          Human Brain
      Computational units     1 CPU, 10^8 gates                 10^11 neurons
      Storage units           10^10 bits RAM, 10^12 bits disk   10^11 neurons, 10^14 synapses
      Cycle time              10^-9 sec                         10^-3 sec
      Bandwidth               10^9 bits/sec                     10^14 bits/sec
      Neuron updates / sec    10^6                              10^14
    • 30. Artificial Neurons
      - Artificial neurons are a much simplified computational model of neurons
        - Output: o = g(Σ_j w_j · x_j), i.e. an activation function g applied to the weighted sum of the inputs
        - A function is learned by adjusting the weights w_j
    • 31. Artificial Neuron
      - Activation functions (e.g. threshold, sigmoid)
    • 32. Perceptrons
      - Perceptrons use a single unit with a threshold function to distinguish two categories
    • 33. Perceptron Learning
      - Weights are updated based on the training instances (x^(i), f(x^(i))) presented:
        w_j ← w_j + η · (f(x^(i)) − o(x^(i))) · x_j^(i)
        - Adjusts the weights in order to move the output closer to the desired target concept.
        - The learning rate η determines how fast to adjust the weights (too slow will require many training steps, too fast will prevent learning).
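    A minimal sketch of this update rule in Python, learning a linearly separable OR function; the constant-bias input encoding is an assumption used only for illustration:

      def perceptron_update(weights, x, target, eta=0.1):
          # Threshold the weighted sum, then nudge each weight toward the target.
          # x[0] is a constant 1 so that weights[0] acts as the threshold/bias.
          output = 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0
          return [w + eta * (target - output) * xi for w, xi in zip(weights, x)]

      # Repeated presentation of a tiny training set for logical OR.
      data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
      w = [0.0, 0.0, 0.0]
      for _ in range(10):
          for x, t in data:
              w = perceptron_update(w, x, t)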
    • 34. Limitation of Perceptrons
      - Learns only linearly separable functions
        - E.g. XOR cannot be learned
    • 35. Feedforward Networks with Sigmoid Units
      - Networks of units with sigmoid activation functions can learn arbitrary functions
    • 36. Feedforward Networks with Sigmoid Units
      - General networks permit arbitrary state-based categories (predictions) to be learned
    • 37. Learning in Multi-Layer Networks: Error Back-Propagation
      - As in perceptrons, differences between the output of the network and the target concept are propagated back to the input weights.
      - Output errors for hidden units are computed based on the propagated errors for the inputs of the output units.
      - Weight updates correspond to gradient descent on the output error function in weight space.
    • 38. Neural Network Examples
      - Prediction
        - Predict steering commands in cars
        - Modeling of device behavior
        - Face and object recognition
        - Pose estimation
      - Decision and control
        - Heating and AC control
        - Light control
        - Automated vehicles
    • 39. Neural Network Example: Prediction of Lighting
      - University of Colorado Adaptive Home [DLRM94]
      - A neural network learns to predict the light level after a set of lights are changed
        - Input:
          - The current light device levels (7 inputs)
          - The current light sensor levels (4 inputs)
          - The new light device levels (7 inputs)
        - Output:
          - The new light sensor levels (4 outputs)
      - [DLRM94] Dodier, R. H., Lukianow, D., Ries, J., & Mozer, M. C. (1994). A comparison of neural net and conventional techniques for lighting control. Applied Mathematics and Computer Science, 4, 447-462.
    • 40. Neural Networks
      - Advantages
        - General-purpose learner (can learn arbitrary categories)
        - Fast prediction
      - Problems
        - All inputs have to be translated into numeric inputs
        - Slow training
        - Learning might result in a local optimum
    • 41. Bayes Classifier
      - Use Bayesian probabilities to determine the most likely next event for the given instance, given all the training data:
        f(x) = argmax_f P(f | x) = argmax_f P(x | f) · P(f)
        - Conditional probabilities are determined from the training data.
    • 42. Naive Bayes Classifier
      - The Bayes classifier requires estimating P(x | f) for all x and f by counting occurrences in the training data.
        - Generally too complex for large systems
      - The naive Bayes classifier assumes that the attributes are statistically independent given the class:
        P(x | f) = Π_i P(x_i | f)
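    A compact naive Bayes sketch built from counts, as described above; the attribute encoding and the tiny training set are illustrative assumptions, not data from the slides:

      from collections import Counter, defaultdict

      def train_naive_bayes(instances):
          # instances: list of (attribute_tuple, concept); store raw counts.
          class_counts = Counter(concept for _, concept in instances)
          value_counts = defaultdict(Counter)   # (concept, attribute index) -> value counts
          for attrs, concept in instances:
              for i, value in enumerate(attrs):
                  value_counts[(concept, i)][value] += 1
          return class_counts, value_counts, len(instances)

      def predict(model, attrs):
          class_counts, value_counts, n = model
          best, best_score = None, -1.0
          for concept, count in class_counts.items():
              score = count / n                                        # prior P(f)
              for i, value in enumerate(attrs):
                  score *= value_counts[(concept, i)][value] / count   # P(x_i | f)
              if score > best_score:
                  best, best_score = concept, score
          return best

      model = train_naive_bayes([
          (("Bedroom", "weekday"), "Bathroom"),
          (("Bedroom", "weekday"), "Bathroom"),
          (("Bedroom", "weekend"), "Living room"),
      ])
      print(predict(model, ("Bedroom", "weekday")))  # -> Bathroom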
    • 43. Bayes Classifier
      - Advantages
        - Yields the optimal prediction (given the assumptions)
        - Can handle discrete or numeric attribute values
        - The naive Bayes classifier is easy to compute
      - Problems
        - The optimal Bayes classifier is computationally intractable
        - The naive Bayes assumption is usually violated
    • 44. Bayesian Networks
      - Bayesian networks explicitly represent the dependence and independence of various attributes.
        - Attributes are modeled as nodes in a network and links represent conditional probabilities.
        - The network forms a causal model of the attributes
      - The prediction can be included as an additional node.
      - Probabilities in Bayesian networks can be calculated efficiently using analytical or statistical inference techniques.
    • 45. Bayesian Networks Example: Location Prediction
      - All state attributes are represented as nodes (here: Prediction, Room, Get ready, Time, Day).
        - Nodes can include attributes that are not observable.
      - Conditional probability table P(Bathroom | R = Room, Gr = Get ready):
                      R = Bedroom   R = Kitchen
        Gr = True         0.8           0.1
        Gr = False        0.2           0.0
    • 46. Bayesian Networks
      - Advantages
        - Efficient inference mechanism
        - Readable structure
        - For many problems relatively easy to design by hand
        - Mechanisms for learning the network structure exist
      - Problems
        - Building the network automatically is complex
        - Does not handle sequence information
    • 47. Sequential Behavior Prediction
      - Problem
        - Input: a sequence of states or events
          - States can be represented by their attributes (inhabitant location, device status, etc.)
          - Events can be raw observations (sensor readings, inhabitant input, etc.)
        - Output: the predicted next event
        - A model of behavior has to be built from past instances and be usable for future predictions.
    • 48. Sequence Prediction Techniques
      - String matching algorithms
        - Deterministic best match
        - Probabilistic matching
      - Markov models
        - Markov chains
        - Hidden Markov models
      - Dynamic belief networks
    • 49. String-Based Prediction
      - Use the string of previous events or states to find a part that matches the current history.
        - The prediction is either the event that followed the best (longest) matching string, or the event most likely to follow strings partially matching the history.
      - Issues:
        - How to determine the quality of a match?
        - How can such a predictor be represented efficiently if the previous event string is long?
    • 50. Example System: IPAM [DH98]
      - Predicts UNIX commands issued by a user
      - Calculates p(x_t | x_t-1) based on frequency
        - Update the current p(Predicted | x_t-1) by α
        - Update the current p(Observed | x_t-1) by 1 − α
        - Weights more recent events more heavily
      - Data
        - 77 users, 2-6 months, >168,000 commands
        - Accuracy less than 40% for one guess, but better than a naive Bayes classifier
      - [DH98] B. D. Davison and H. Hirsh. Probabilistic Online Action Prediction. Intelligent Environments: Papers from the AAAI 1998 Spring Symposium, Technical Report SS-98-02, pp. 148-154. AAAI Press.
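    One way to read this recency-weighted update is sketched below in Python; it is an illustrative interpretation (a real implementation would also seed the table from observed command frequencies so each row stays a proper distribution):

      from collections import defaultdict

      class RecencyWeightedPredictor:
          # Table of P(next | previous) that weights recent observations more heavily.
          def __init__(self, alpha=0.8):
              self.alpha = alpha
              self.table = defaultdict(dict)     # previous event -> {next event: weight}

          def observe(self, previous, current):
              row = self.table[previous]
              for event in row:                  # decay every entry in the row ...
                  row[event] *= self.alpha
              # ... and give the event that actually occurred the freed-up mass.
              row[current] = row.get(current, 0.0) + (1.0 - self.alpha)

          def predict(self, previous):
              row = self.table[previous]
              return max(row, key=row.get) if row else None

      p = RecencyWeightedPredictor()
      for prev, cur in [("ls", "cd"), ("ls", "cd"), ("ls", "vim")]:
          p.observe(prev, cur)
      print(p.predict("ls"))  # -> cd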
    • 51. Example System: ONISI [GP00]
      - Look for historical state/action sequences that match the immediate history and determine the quality of the predictions from these sequences
        - In state s at time t, compute l_t(s, a): the average length of the k longest matching sequences ending in action a
        - In state s, compute f(s, a): the frequency with which action a was executed from state s
        - Rank predictions by a weighted combination of both, e.g.
          R_t(s, a) = α · l_t(s, a) / Σ_a' l_t(s, a') + (1 − α) · f(s, a) / n(s)
          (form inferred from the worked example on the next slide)
      - [GP00] Peter Gorniak and David Poole. Predicting Future User Actions by Observing Unmodified Applications. Seventeenth National Conference on Artificial Intelligence (AAAI-2000), August 2000.
    • 52. ONISI Example [GP00]
      - k = 3; for action a3 there are only two matches, of length 1 and 2, so l_t(s3, a3) = (0 + 1 + 2) / 3 = 1
      - If α = 0.9, the sum of averaged lengths over all actions is 5, a3 has occurred 50 times in s3, and s3 has been visited 100 times, then R_t(s3, a3) = 0.9 · 1/5 + 0.1 · 50/100 = 0.18 + 0.05 = 0.23
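    The arithmetic on this slide can be checked with a small helper; since the slides do not show the ranking formula explicitly, the form below is reconstructed from this worked example:

      def onisi_rank(avg_len, total_avg_len, freq, visits, alpha=0.9):
          # Blend match-length evidence with the plain frequency of the action in the state.
          return alpha * avg_len / total_avg_len + (1 - alpha) * freq / visits

      print(onisi_rank(avg_len=1.0, total_avg_len=5.0, freq=50, visits=100))  # ~0.23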
    • 53. Example Sequence Predictors
      - Advantages
        - Permit predictions based on a sequence of events
        - Simple learning mechanism
      - Problems
        - Relatively ad hoc weighting of sequence matches
        - Limited prediction capabilities
        - Large overhead for long past state/action sequences
    • 54. Markov Chain Prediction
      - Use the string of previous events or states to create a model of the event-generating process.
        - Models are probabilistic and can be constructed from the observed behavior of the system
        - The prediction is the event that is most likely to be generated by the model.
      - Issues:
        - What form should the model take?
          - String-based models
          - State-based models
    • 55. Example System: Active LeZi [GC03]
      - Assumptions:
        - Event sequences are fairly repeatable
        - Generated by a deterministic source
      - Constructs the model as a parse tree of possible event sequences
        - Nodes are events with associated frequencies
        - The model is constructed using the LZ78 text compression algorithm
      - [GC03] K. Gopalratnam and D. J. Cook. Active LeZi: An Incremental Parsing Algorithm for Device Usage Prediction in the Smart Home. In Proceedings of the Florida Artificial Intelligence Research Symposium, 2003.
    • 56. Text Compression: LZ78
      - Parses the string x_1, x_2, …, x_i into c(i) substrings w_1, w_2, …, w_c(i) that form the set of phrases used for compression
        - Each prefix of a phrase w_j is also a phrase w_i in the set used for compression
        - Example: the input aaababbbbbaabccddcbaaaa yields the phrases a, aa, b, ab, bb, bba, abc, c, d, dc, ba, aaa
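    The LZ78 parsing step can be sketched in a few lines of Python; running it on the slide's example string reproduces the phrase list above:

      def lz78_phrases(sequence):
          # Greedily split the input into phrases; each new phrase extends a
          # previously seen phrase by exactly one symbol.
          phrases, seen, current = [], set(), ""
          for symbol in sequence:
              current += symbol
              if current not in seen:
                  seen.add(current)
                  phrases.append(current)
                  current = ""
          return phrases

      print(lz78_phrases("aaababbbbbaabccddcbaaaa"))
      # ['a', 'aa', 'b', 'ab', 'bb', 'bba', 'abc', 'c', 'd', 'dc', 'ba', 'aaa']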
    • 57. Active LeZi
      - Represents the compression phrases as a parse tree with frequency statistics
        - E.g. for aaababbbbbaabccddcbaaaa
    • 58. Prediction in Active LeZi
      - Calculate the probability for each possible event
        - To calculate the probability, transitions across phrase boundaries have to be considered
          - Slide a window across the input sequence
          - Window length k equal to the longest phrase seen so far
          - Gather statistics on all possible contexts
          - This yields an order k-1 Markov model
      - Output the event with the greatest probability across all contexts as the prediction
    • 59. Example: Probability of a
      - Order 2: a follows the context aa in 2 of the 5 times that aa appears
      - Order 1: a follows the context a in 5 of the 10 times that a appears
      - Order 0: a accounts for 10 of the 23 total symbols
      - The blended probability combines the orders, weighting each lower-order estimate by the probability of escaping to that order:
        P(a) = P_2(a) + Esc_2 · (P_1(a) + Esc_1 · P_0(a)), with P_2(a) = 2/5, P_1(a) = 5/10, P_0(a) = 10/23
      - The probability of escaping to a lower order = the frequency of null endings
    • 60. Active LeZi Example: Prediction on Simulated MavHome Data
      - The data simulates a single inhabitant interacting with the devices in the home
        - Repetitive behavior patterns are embedded in the data (e.g. a morning routine)
        - Time is ignored in the prediction
        - Only device interactions are recorded
    • 61. Active LeZi
      - Advantages
        - Permits predictions based on a sequence of events
        - Does not require the construction of states
        - Permits probabilistic predictions
      - Problems
        - The tree can become very large (long prediction times)
        - Non-optimal predictions if the tree is not sufficiently deep
    • 62. Markov Chain Models
      - Markov chain models represent the event-generating process probabilistically.
        - A Markov model can be described by a tuple <S, T> of states and transition probabilities.
        - Markov assumption: the current state contains all information about the past that is necessary to predict the probability of the next state:
          P(x_t+1 | x_t, x_t-1, …, x_0) = P(x_t+1 | x_t)
        - Transitions correspond to events that occurred in the environment (inhabitant actions, etc.)
      - Prediction of the next state (and event): the state x with the highest transition probability P(x | x_t)
    • 63. Markov Chain Example
      - Example states:
        - S = {(Room, Time, Day, Previous Room)}
        - Transition probabilities can be calculated from the training data by counting occurrences
      [State-transition diagram over states x_1 … x_6]
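    A minimal sketch of learning such transition probabilities by counting and then predicting the most likely next state; the room sequence is invented purely for illustration:

      from collections import Counter, defaultdict

      def learn_transitions(states):
          # Count observed transitions and normalize into P(next | current).
          counts = defaultdict(Counter)
          for current, nxt in zip(states, states[1:]):
              counts[current][nxt] += 1
          return {s: {n: c / sum(row.values()) for n, c in row.items()}
                  for s, row in counts.items()}

      def predict_next(transitions, state):
          row = transitions.get(state, {})
          return max(row, key=row.get) if row else None

      rooms = ["Bedroom", "Bathroom", "Kitchen", "Garage", "Kitchen",
               "Bedroom", "Living room", "Bathroom", "Bedroom", "Bathroom"]
      model = learn_transitions(rooms)
      print(predict_next(model, "Bedroom"))  # -> Bathroom (2 of 3 observed transitions)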
    • 64. Markov Models
      - Advantages
        - Permit probabilistic predictions
        - Transition probabilities are easy to learn
        - The representation is easy to interpret
      - Problems
        - The state space has to have the Markov property
        - State space selection is not automatic
        - States might have to include previous information
        - State attributes might not be observable
    • 65. Partially Observable MMs
      - Partially observable Markov models (POMMs) extend Markov models by permitting states to be only partially observable.
        - Systems can be represented by a tuple <S, T, O, V> where <S, T> is a Markov model and O, V map observations about the state to probabilities of a given state
          - O = {o_i} is the set of observations
          - V: V(x, o) = P(o | x)
      - To determine a prediction, the probability of being in any given state is computed
    • 66. Partially Observable MMs
      - The prediction is the most likely next state given the information about the current state (i.e. the current belief state):
        - The belief state B is a probability distribution over the state space:
          B = ((x_1, P(x_1)), …, (x_n, P(x_n)))
        - Prediction of the next state:
          P(x_t+1 = x) = Σ_i P(x_i) · T(x_i, x), with the predicted state being the x that maximizes this probability
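    A short sketch of propagating a belief state one step through the transition model to obtain the predicted next-state distribution (the numbers are illustrative only):

      def next_state_distribution(belief, transitions):
          # belief: {state: probability}; transitions: {state: {next state: probability}}.
          # Weight each possible transition by how likely we are to be in its source state.
          prediction = {}
          for state, p_state in belief.items():
              for nxt, p_next in transitions.get(state, {}).items():
                  prediction[nxt] = prediction.get(nxt, 0.0) + p_state * p_next
          return prediction

      belief = {"Bedroom": 0.7, "Bathroom": 0.3}
      transitions = {"Bedroom": {"Bathroom": 0.8, "Living room": 0.2},
                     "Bathroom": {"Kitchen": 1.0}}
      print(next_state_distribution(belief, transitions))
      # -> roughly {'Bathroom': 0.56, 'Living room': 0.14, 'Kitchen': 0.3}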
    • 67. Hidden Markov Models
      - Hidden Markov models (HMMs) provide mechanisms to learn the Markov model <S, T> underlying a POMM from the sequence of observations.
        - The Baum-Welch algorithm learns the transition and observation probabilities as well as the state space (only the number of states has to be given)
        - The model learned is the one that is most likely to explain the observed training sequences
    • 68. Hidden Markov Model Example
      - Tossing a balanced coin, starting with a biased coin that always starts heads
    • 69. Partially Observable MMs
      - Advantages
        - Permit optimal predictions
        - HMMs provide algorithms to learn the model
        - With HMMs, the Markovian state space description does not have to be known
      - Problems
        - The state space can be enormous
        - Learning an HMM is generally very complex
        - Computing the belief state is computationally expensive
    • 70. Example Location Prediction Task
      - Environment and observations (one observation sequence per line):
      [0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
      [0, 1, 0, 2, 4, 5, 4, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2]
      [0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
      [0, 1, 0, 2, 0, 2, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6]
      [0, 1, 0, 2, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
      [4, 3, 4, 2, 0, 1, 0, 0, 0, 1, 2, 4, 5, 4, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 4, 5, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 3, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 5, 4, 6, 6]
    • 71. Neural Network Predictor
      - Example network and training data
        - The data has to be divided into training instances
        - Inputs represent the current and 4 past locations
      # Input training pattern 1:  6 6 6 6 6
      # Output training pattern 1: 1.000
      # Input training pattern 2:  6 6 4 3 4
      # Output training pattern 2: 0.333
      # Input training pattern 3:  6 6 6 6 6
      # Output training pattern 3: 1.000
      # Input training pattern 4:  6 6 6 6 6
      # Output training pattern 4: 1.000
      # Input training pattern 5:  6 6 6 6 6
      # Output training pattern 5: 1.000
    • 72. Neural Network Predictor
      - Learning performance depends on:
        - Network topology
        - Input representation
        - Learning rate
    • 73. Hidden Markov Model Example
      - Input representation and learned HMM:
        - Initial and final HMM model
    • 74. Dynamic Bayesian Networks
      - Dynamic Bayesian networks use a Bayesian network to represent the belief state.
        - The state is constructed from a set of attributes (nodes)
        - Transitions over time are modeled as links between a model at time t and a model at time t+1 (e.g. the nodes Room, Get ready, Time, Day replicated at time t and time t+1)
    • 75. Dynamic Bayesian Networks
      - Advantages
        - Handle partial observability
        - More compact model
        - The belief state is inherent in the network
        - Simple prediction of the next belief state
      - Problems
        - State attributes have to be predetermined
        - Learning the probabilities is very complex
    • 76. Conclusions
      - Prediction is important in intelligent environments
        - It captures repetitive patterns (activities)
        - It helps automate activities (but it only tells what will happen next, not what the system should do next)
      - Different prediction algorithms have different strengths and weaknesses:
        - Select a prediction approach that is suitable for the particular problem.
        - There is no "best" prediction approach.
      - Optimal prediction is a very hard problem and is not yet solved.
