  • What is RL? The process by which an agent (player/entity) improves its behavior in an environment via experience.

    1. Context Aware Resource Management in Multi-Inhabitant Smart Homes: A Nash H-learning based Approach
       Nirmalya Roy, Abhishek Roy & Sajal K. Das
       Presented by: Viraj Bhat (virajbATcaip * rutgers * edu)
    2. Q-learning
       • Q-learning is a Reinforcement Learning (RL) algorithm.
         ◦ It does not need a model of its environment and can be used on-line.
       • It estimates the values of state-action pairs.
       • Q(s, a) = the expected discounted sum of future payoffs obtained by taking action a from state s and following an optimal policy thereafter.
       • The optimal action from any state is the one with the highest Q-value.
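As a concrete illustration (a sketch, not the paper's code), one tabular Q-learning update can be written in a few lines; the smart-home state and action names below are hypothetical:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical smart-home example: switching the lights on in the kitchen
# while the inhabitant is there earns a reward of 1.
Q = defaultdict(float)  # all Q-values start at 0
q_update(Q, "kitchen", "lights_on", 1.0, "kitchen", ["lights_on", "lights_off"])
print(Q[("kitchen", "lights_on")])  # 0.1 after the first update
```

Repeating such updates while acting greedily (with some exploration) makes the highest-Q action converge toward the optimal one, as the slide states.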
    3. Nash Equilibrium
       • Game theory: selecting the best option when the costs and benefits of each possibility are not pre-determined, but instead depend on the future behavior of the other players.
         ◦ Zero-sum game: a player benefits only at the expense of others (chess, go, matching pennies).
         ◦ Non-zero-sum game: one player's gain does not necessarily correspond to a loss by another (prisoner's dilemma).
       • A Nash Equilibrium is a set of mixed strategies for finite, non-cooperative games between two or more players whereby no player can improve his or her payoff by unilaterally changing strategy.
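To make the definition concrete, here is a minimal brute-force check for pure-strategy Nash equilibria in a two-player bimatrix game (a sketch; the payoff values are the standard prisoner's dilemma, not numbers from the paper):

```python
def pure_nash(A, B):
    """Return all pure-strategy Nash equilibria (i, j) of a bimatrix game,
    where A[i][j] is the row player's payoff and B[i][j] the column player's."""
    eqs = []
    for i in range(len(A)):
        for j in range(len(A[0])):
            # (i, j) is an equilibrium iff neither player gains by deviating alone
            row_best = all(A[i][j] >= A[k][j] for k in range(len(A)))
            col_best = all(B[i][j] >= B[i][k] for k in range(len(A[0])))
            if row_best and col_best:
                eqs.append((i, j))
    return eqs

# Prisoner's dilemma (strategy 0 = cooperate, 1 = defect): (defect, defect)
# is the unique pure equilibrium even though mutual cooperation pays more.
A = [[-1, -3], [0, -2]]   # row player's payoffs
B = [[-1, 0], [-3, -2]]   # column player's payoffs
print(pure_nash(A, B))    # [(1, 1)]
```

This also illustrates the non-zero-sum bullet: at (defect, defect) both players lose relative to (cooperate, cooperate), so one player's loss is not the other's gain.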
    4. Entropy
       • Entropy in information theory describes how much randomness (or, alternatively, uncertainty) there is in a signal or random event [Shannon's entropy].
       • Entropy satisfies these assumptions:
         ◦ The measure should be continuous: changing one of the probabilities by a very small amount should only change the entropy by a small amount.
         ◦ If all outcomes are equally likely, then increasing the number of samples should always increase the entropy.
         ◦ The entropy of the final result should be a weighted sum of the entropies of its parts.
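The Shannon entropy behind these assumptions is simple to compute; a minimal sketch:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(shannon_entropy([0.25] * 4))   # 2.0 bits: four equally likely outcomes
print(shannon_entropy([0.9, 0.1]))   # ~0.47 bits: a highly predictable event
```

The last line is the relevant case here: an inhabitant who is in one location 90% of the time has low entropy, which is exactly what a good location predictor drives toward.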
    5. Smart Home
       • Goal: provide its inhabitants with
         ◦ maximum possible comfort,
         ◦ minimal resource consumption, and
         ◦ a reduced overall cost of maintaining a home.
       • How it does it
         ◦ Autonomously acquires and applies knowledge about its inhabitants ("context awareness"); the infrastructure needs to be cognizant of its context.
         ◦ Adapts to inhabitants' behavior and preferences (strikes a balance).
       • Problem
         ◦ Mobility of individuals creates uncertainty about location and subsequent activities; optimal location prediction across multiple inhabitants is NP-hard (proved in this paper).
         ◦ Contexts of multiple inhabitants in the same environment are inherently correlated and interdependent.
    6. Contributions
       • Prove that prediction of location across multiple inhabitants is an NP-hard problem.
       • Develop Nash H-learning, which
         ◦ exploits the correlation of mobility patterns across inhabitants, and
         ◦ minimizes the overall joint uncertainty.
       • Predict the most likely routes followed by multiple inhabitants.
       • Knowledge of inhabitants' contexts, such as location and associated activities, helps control automated devices.
       • Nash H-learning performs better than predictive schemes optimized for individual inhabitants' location/activity: collective prediction beats individual prediction.
    7. Single-Inhabitant Location Tracking
       • Symbolic interpretation of the inhabitant's movement (mobility) profile as captured by RFID readers, sensors and pressure switches.
       • The inhabitant's current location is a reflection of a mobility/activity profile that can be learned over time in an online fashion.
         ◦ Mobility → stochastic process → piecewise stationary (repetitive nature of routes) → LZ-78 (text compression) → minimize overall entropy
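The LZ-78 parsing in the last bullet can be sketched as follows; the route symbols (single-letter room labels) are hypothetical:

```python
def lz78_phrases(route):
    """LZ-78 incremental parsing: scan the symbol stream and emit each
    previously unseen phrase; repetitive routes yield few, long phrases."""
    seen, current, phrases = set(), "", []
    for symbol in route:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    return phrases

# 'k' = kitchen, 'l' = living room, 'b' = bedroom (hypothetical labels):
print(lz78_phrases("klklbklbklb"))  # ['k', 'l', 'kl', 'b', 'klb']
```

Because the inhabitant's routes repeat, the phrase dictionary grows slowly, which is exactly why an LZ-78 style predictor can drive the per-symbol entropy of the mobility profile down.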
    8. Multi-Inhabitant Location Prediction
       • For a group of inhabitants residing in a smart home consisting of L different locations, the objective is to maximize the number of successful location predictions.
       • The problem of maximizing the number of successful predictions of multiple inhabitants' locations is NP-hard.
         ◦ Proof: reduction from Set Packing.
       • Set Packing: the goal is to select the maximum number of mutually disjoint subsets from a collection S.
       • Tracking: each location, identified by a sensor, is occupied by at most one inhabitant.
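For intuition on the reduction target, here is a brute-force Set Packing sketch; it is exponential in the number of sets, as expected for an NP-hard problem, and the sets themselves are illustrative:

```python
from itertools import combinations

def max_set_packing(sets):
    """Brute force: find a largest collection of mutually disjoint sets.
    Tries every subset, largest first -- exponential time."""
    for k in range(len(sets), 0, -1):
        for combo in combinations(sets, k):
            # pairwise disjoint iff no element is counted twice
            if sum(len(s) for s in combo) == len(set().union(*combo)):
                return list(combo)
    return []

# Illustrative sensor-location subsets:
packing = max_set_packing([{1, 2}, {2, 3}, {4}, {3, 5}])
print(packing)  # [{1, 2}, {4}, {3, 5}] -- three mutually disjoint sets
```

The "at most one inhabitant per sensed location" constraint in the tracking problem plays the role of disjointness here, which is what makes the reduction work.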
    9. Predictive Nash H-learning
       • Background
         ◦ Assumption: every agent wants to satisfy its own preferences.
         ◦ Goal: achieve a suitable balance among the preferences of all inhabitants residing in a smart home (non-cooperative game theory).
         ◦ Every n-player stochastic game possesses at least one Nash equilibrium.
       • Entropy-based H-learning
         ◦ Learn to perform actions that optimize reward.
         ◦ Learns a value function that maps state-action pairs to future reward using an entropy measure, H.
         ◦ (new experience + old value function) → statistically improved value function
    10. Nash-H Algorithm
       • The learning agent, indexed by i, learns its H-values by forming an arbitrary guess at time 0 (initially assumed to be zero).
       • At each time t, agent i observes the current state and takes its action.
       • It then observes its own reward, the actions taken by the other agents, their rewards, and the new state, and calculates a Nash equilibrium.
         ◦ The other agents' H-values are not given; they start initialized to zero.
         ◦ Agent i observes the other agents' immediate rewards and previous actions.
         ◦ Agent i updates its beliefs about agent j's H-function according to the same updating rule it applies to its own.
       • Learning parameters lie in (0, 1).
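The update step on this slide can be sketched by analogy with Nash Q-learning (Hu & Wellman, cited in the references); the paper's exact Nash-H update may differ, and `nash_value_next` below stands in for the Nash-equilibrium value of the next state, which the agent computes from its estimates of everyone's H-functions:

```python
from collections import defaultdict

def nash_h_update(H_i, s, joint_action, r_i, nash_value_next,
                  alpha=0.1, beta=0.9):
    """One Nash-Q-style update of agent i's table over (state, joint action):
    H(s, a) <- (1 - alpha) * H(s, a) + alpha * (r_i + beta * NashValue(s')).
    alpha, beta in (0, 1) are the learning parameters from the slide."""
    key = (s, joint_action)
    H_i[key] = (1 - alpha) * H_i[key] + alpha * (r_i + beta * nash_value_next)
    return H_i[key]

H = defaultdict(float)  # arbitrary initial guess: all zeros, as on the slide
nash_h_update(H, "s0", ("stay", "move"), 1.0, 0.0)
print(H[("s0", ("stay", "move"))])  # 0.1 after the first update
```

Each agent runs the same rule once for itself and once per other agent (using the observed rewards and actions), which is how it maintains beliefs about the others' H-functions.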
    11. NHL Algorithm [algorithm shown as a figure]
    12. Convergence of the NHL Algorithm
       • Every state and action for k = 1, …, n is visited infinitely often.
       • Updates of the H-function occur only at the current state.
         ◦ The learning rate satisfies the standard stochastic-approximation conditions.
       • Proof:
         ◦ The iterative utility functions converge to a Nash equilibrium with probability 1.
         ◦ The predictive H-learning framework used to predict entropies converges to a Nash equilibrium.
    13. Worst-Case Analysis
       • Use the ratio of the worst possible Nash equilibrium to the "social optimum" as the measure of the effectiveness of the system.
         ◦ Worst-case analysis (coordination ratio) = worst possible cost / optimal cost.
         ◦ Similar to the problem of throwing m balls into m bins and finding the expected maximum number of balls in any bin.
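The balls-into-bins analogy is easy to simulate; a small sketch (the numbers are illustrative, not from the paper's analysis):

```python
import random

def max_bin_load(m, seed=0):
    """Throw m balls into m bins uniformly at random and return the maximum
    load; its expectation is known to grow like ln(m) / ln(ln(m))."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    bins = [0] * m
    for _ in range(m):
        bins[rng.randrange(m)] += 1
    return max(bins)

print(max_bin_load(1000))  # a small number, typically around 4-6 for m = 1000
```

The slow-growing maximum load is what bounds the coordination ratio: even the worst Nash equilibrium is not far from the social optimum.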
    14. Inhabitants' Joint Typical Routes
       • Use the concept of the jointly typical set and the asymptotic equipartition property (AEP) to derive small subsets of highly probable routes.
       • The system captures the typical set of inhabitant movement profiles from the H-learning scheme.
         ◦ It uses this to predict the inhabitants' most likely routes.
         ◦ A parameter δ measures the gap between the ideal probability of a typical route and the probability that the route is stored in the dictionary.
         ◦ In the experiments, δ ≤ 0.01.
    15. Smart Home: Resource and Comfort Management
       • Mobility-aware energy consumption
         ◦ Devices such as lights, fans or air-conditioners operate proactively to conserve energy during occupants' absence (from particular places).
       • Smart temperature control
         ◦ Distributed control system; the preconditioning period is the time needed to bring the temperature to a specific level.
       • Estimation of inhabitants' comfort
         ◦ A subjective measure; difficult to categorize, so it is approximated.
         ◦ A joint function of temperature deviation, number of manually operated devices, and time spent.
    16. Experiments
       • Inhabitants in a smart home are equipped with RF tags sensed by RF readers.
       • The sensors placed in the smart home work in coordination with the RF readers.
       • Studied energy management both without any predictive scheme and with per-inhabitant location prediction.
       • Simulation experiments ran for 12 weeks over 4 inhabitants and 2 visitors.
       [Figure: raw data]
    17. Predictive Location Estimation
       • H-learning approach
         ◦ The entropy of a resident is between 1 and 3; that of a visitor is about 4 (higher uncertainty).
       • Nash H-learning
         ◦ The entropy of an individual is 4.0 at the start.
         ◦ As the system learns, the entropy reduces to 1.0.
         ◦ The total entropy is between 10.0 and 4.0.
       [Figures: H-learning vs. Nash H-learning]
    18. Prediction Successes
       • H-learning is capable of estimating the location of residents with 90% accuracy within 3 weeks.
         ◦ Accuracy for visitors is between 50% and 60%.
       • Nash H-learning is initially slow but achieves a higher success rate of 95% for joint predictions.
         ◦ Because individual predictions are correlated, only about 80% is the maximum achievable individually.
       • Nash H-learning leads to a higher success rate than simple H-learning.
       [Figures: H-learning vs. Nash H-learning]
    19. Inhabitants' Joint Typical Routes
       • The size of the individual and joint typical sets is initially 50% of the total routes.
       • The size shrinks to less than 10% as the system captures the inhabitants' movements.
    20. Storage Overhead
       • The Nash-H scheme has a low storage (memory) overhead of about 10 KB.
       • This compares with a total storage of 40 KB for the existing per-inhabitant prediction scheme.
    21. Energy Savings
       • Without prediction, daily energy consumption is 20 kWh.
       • Using predictive schemes, daily average energy consumption is kept at 4 kWh (after the system learns).
    22. Comfort
       • The Nash H-learning scheme reduced the number of manual operations performed and the time spent by all inhabitants.
    23. Comments and Discussion
       • How does this system scale as the number of individuals in the system increases?
       • The system uses joint typical sets, which hopefully characterize the entire set and represent the most probable routes; what if this is not the case?
       • Results on the entropy of visitors (which is far worse than that of residents) are not presented for the Nash-H learning framework (Prediction Successes).
       • To obtain successful predictions, the memory required for storage in learning schemes keeps increasing.
         ◦ Figure 8 is cleverly plotted to highlight the Nash-H scheme.
    24. References
       • N. Roy, A. Roy, S. K. Das, "Context-Aware Resource Management in Multi-Inhabitant Smart Homes: A Nash H-Learning based Approach", Pervasive Computing and Communications, Pisa, Italy, 13-17 March 2006.
       • "Information Entropy".
       • J. Hu, M. P. Wellman, "Nash Q-Learning for General-Sum Stochastic Games", Journal of Machine Learning Research 4 (2003), 1039-1069.
       • "Nash Equilibrium".