Context-Aware Resource Management in Multi-Inhabitant Smart Homes: A Nash H-learning based Approach
Nirmalya Roy, Abhishek Roy & Sajal K. Das
Presented by: Viraj Bhat (virajbATcaip * rutgers * edu)
Q-learning
- Q-learning is a Reinforcement Learning (RL) algorithm.
- It does not need a model of its environment and can be used on-line.
- It estimates the values of state-action pairs: Q(s,a) is the expected discounted sum of future payoffs obtained by taking action a from state s and following an optimal policy thereafter.
- The optimal action from any state is the one with the highest Q-value.
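For reference, the textbook tabular Q-learning update can be sketched as below; the epsilon-greedy policy, parameter values, and function names are illustrative assumptions, not details from the paper.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative learning rate, discount, exploration

Q = defaultdict(float)  # Q[(state, action)] -> estimated discounted return

def choose_action(state, actions):
    """Epsilon-greedy selection: mostly pick the action with the highest Q-value."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```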
Nash Equilibrium
- Game theory studies the selection of the best option when the costs and benefits of each choice are not pre-determined but depend on the behavior of the other players.
- Zero-sum game: a player benefits only at the expense of others (chess, Go, matching pennies).
- Non-zero-sum game: a gain by one player does not necessarily correspond to a loss by another (prisoner's dilemma).
- A Nash Equilibrium is a set of (possibly mixed) strategies for a finite, non-cooperative game between two or more players such that no player can improve his or her payoff by unilaterally changing strategy.
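A minimal sketch of checking for pure-strategy Nash equilibria in a two-player game; the prisoner's-dilemma payoffs used here are the textbook example, not data from the paper.

```python
# Payoff matrices: PAYOFF[(row_action, col_action)] = (row player's payoff, column player's payoff).
# Classic prisoner's dilemma: C = cooperate, D = defect.
ACTIONS = ["C", "D"]
PAYOFF = {
    ("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3),  ("D", "D"): (-2, -2),
}

def is_nash(a_row, a_col):
    """A pure-strategy profile is a Nash equilibrium if neither player can gain
    by unilaterally deviating to another action."""
    row_ok = all(PAYOFF[(a_row, a_col)][0] >= PAYOFF[(alt, a_col)][0] for alt in ACTIONS)
    col_ok = all(PAYOFF[(a_row, a_col)][1] >= PAYOFF[(a_row, alt)][1] for alt in ACTIONS)
    return row_ok and col_ok

print([p for p in PAYOFF if is_nash(*p)])  # [('D', 'D')] -- mutual defection
```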
Entropy
- Entropy in information theory describes how much randomness (or, alternatively, "uncertainty") there is in a signal or random event [Shannon's entropy].
- Entropy satisfies the following assumptions:
  - Continuity: changing the value of one of the probabilities by a very small amount should only change the entropy by a small amount.
  - Monotonicity: if all outcomes are equally likely, then increasing the number of outcomes should always increase the entropy.
  - Additivity: the entropy of the final result should be a weighted sum of the entropies of its parts.
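The Shannon entropy the slides rely on is H(X) = -sum_x p(x) log2 p(x); a small sketch below, with distribution values that are purely illustrative.

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum_x p(x) * log2 p(x), measured in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform choice over 4 rooms is maximally uncertain (2 bits);
# a strongly predictable inhabitant has much lower entropy.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(shannon_entropy([0.90, 0.05, 0.03, 0.02]))  # ~0.62
```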
Smart Home
- Goal: provide its inhabitants the maximum possible comfort, minimize resource consumption, and reduce the overall cost of maintaining the home.
- How it does it:
  - Autonomously acquires and applies knowledge about its inhabitants ("context awareness"); the infrastructure needs to be cognizant of its context.
  - Adapts to the inhabitants' behavior and preferences (strikes a balance).
- Problem:
  - Mobility of individuals creates uncertainty about their locations and subsequent activities.
  - Optimal location prediction across multiple inhabitants is NP-hard (proved in this paper).
  - Contexts of multiple inhabitants in the same environment are inherently correlated and interdependent.
Contributions
- Proves that location prediction across multiple inhabitants is an NP-hard problem.
- Develops Nash H-learning, which exploits the correlation of mobility patterns across inhabitants and minimizes the overall joint uncertainty.
- Predicts the most likely routes followed by multiple inhabitants; knowledge of the inhabitants' contexts, such as location and associated activities, helps control automated devices.
- Nash H-learning performs better than predictive schemes optimized for individual inhabitants' location/activity: collective is better than individual.
Single Inhabitant Location Tracking
- Symbolic interpretation of the inhabitant's movement (mobility) profile, as captured by RFID readers, sensors, and pressure switches.
- The inhabitant's current location is a reflection of a mobility/activity profile that can be learned over time in an online fashion.
- Mobility is modeled as a stochastic process (routes are repetitive in nature) that is piecewise stationary; an LZ-78 (text compression) based predictor is used to minimize the overall entropy.
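A minimal sketch of LZ-78 style incremental parsing over a symbolic location string, to show how repeated route fragments build up a dictionary whose phrases can then drive prediction; the room symbols are made up for illustration.

```python
def lz78_parse(sequence):
    """LZ-78 incremental parsing: split the sequence into phrases, where each new
    phrase is the shortest prefix not already in the dictionary. Frequently repeated
    route fragments become long phrases, which is what drives the entropy estimate down."""
    dictionary, phrases, current = {""}, [], ""
    for symbol in sequence:
        current += symbol
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)
    return phrases

# Symbols: K = kitchen, L = living room, B = bedroom (illustrative route history).
print(lz78_parse("KLBKLBKLBKLB"))  # ['K', 'L', 'B', 'KL', 'BK', 'LB', 'KLB']
```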
Multi-Inhabitant Location Prediction
- For a group of inhabitants residing in a smart home with L different locations, the objective is to maximize the number of successful location predictions.
- The problem of maximizing the number of successful predictions of multiple inhabitants' locations is NP-hard.
- Proof: by reduction from Set Packing.
  - Set Packing: given a collection of subsets of a set S, the goal is to find the maximum number of mutually disjoint subsets.
  - Tracking: each location, identified by a sensor, can be occupied by at most one inhabitant, so a consistent joint prediction corresponds to a collection of mutually disjoint assignments.
Predictive Nash-H Learning: Background
- Assumption: every agent wants to satisfy its own preferences.
- Goal: achieve a suitable balance among the preferences of all inhabitants residing in the smart home (non-cooperative game theory).
- Every n-player stochastic game possesses at least one Nash equilibrium.
- Entropy (H) learning: learn to perform actions that optimize reward; learns a value function that maps state-action pairs to future reward using the entropy measure H. (new experience + old value function) -> statistically improved value function.
Nash-H Algorithm
- The learning agent, indexed by i, learns its H-values by forming an arbitrary guess at time 0 (initially assumed to be zero).
- At each time t, agent i observes the current state and takes its action.
- It then observes its own reward, the actions taken by the other agents, their rewards, and the new state, and calculates a Nash equilibrium for the stage game.
- The other agents' H-values are not given, so they are also initialized to zero; agent i observes the other agents' immediate rewards and previous actions and updates its beliefs about agent j's H-function using the same updating rule it applies to its own.
- The learning parameters lie in (0, 1).
NHL Algorithm (pseudo-code and update equations shown as a figure in the original slide)
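The paper's actual update equations are in the figure this slide reproduced and are not available in this text. As a rough illustration only, the sketch below follows the Nash-Q style update of Hu & Wellman with an entropy-based value in place of the Q-value; the variable names and the precise form of the update are assumptions, not the paper's equations.

```python
# Rough illustration of a Nash-Q style update with an entropy-based value H.
# This is NOT the paper's Nash-H update verbatim; it only shows the general shape
# of blending a new experience with the old value function.

def nash_h_update(H_i, state, joint_action, reward_i, nash_value_i,
                  alpha=0.1, gamma=0.9):
    """H_i maps (state, joint_action) -> value estimate for agent i.
    nash_value_i is agent i's value of the Nash equilibrium of the stage game
    at the next state, computed from its current beliefs about all agents' H-functions."""
    old = H_i.get((state, joint_action), 0.0)                        # arbitrary initial guess: 0
    target = reward_i + gamma * nash_value_i                         # new experience
    H_i[(state, joint_action)] = (1 - alpha) * old + alpha * target  # blended with old value
    return H_i
```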
Convergence of the NHL Algorithm
- Every state, and every action of each agent k = 1, ..., n, is visited infinitely often.
- Updates of the H-function occur only at the current state.
- The learning rate satisfies the usual decay conditions (the equations are omitted in this text; see the sketch below).
- Proof: the iterative utility functions converge to the Nash equilibrium with probability 1, so the predictive H-learning framework used to predict entropies converges to a Nash equilibrium.
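The learning-rate conditions were shown as equations in the original slide and are not reproduced here; assuming they are the standard stochastic-approximation (Robbins-Monro) conditions used in Q-learning-style convergence proofs, they read:

```latex
\alpha_t \in [0,1), \qquad \sum_{t=0}^{\infty} \alpha_t = \infty, \qquad \sum_{t=0}^{\infty} \alpha_t^{2} < \infty
```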
Worst-Case Analysis
- The ratio of the worst possible Nash equilibrium to the "social optimum" is used as the measure of the effectiveness of the system.
- Worst-case analysis (coordination ratio) = worst possible cost / optimal cost.
- This is similar to the problem of throwing m balls into m bins and finding the expected maximum number of balls in any bin.
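For reference, a standard balls-and-bins fact (not a figure quoted from the paper): when m balls are thrown uniformly at random into m bins, the expected maximum load is

```latex
\mathbb{E}\left[\max_{j}\,\mathrm{load}(j)\right] = \Theta\!\left(\frac{\ln m}{\ln \ln m}\right)
```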
Inhabitants' Joint Typical Routes
- Use the concept of the jointly typical set and the asymptotic equipartition property (AEP) to derive small subsets of highly probable routes.
- The system captures the typical set of inhabitant movement profiles from the H-learning scheme and uses it to predict the inhabitants' most likely routes.
- This is measured using a tolerance δ, which bounds the gap between the ideal probability of a typical route and the probability that the route is stored in the dictionary; in the experiments δ ≤ 0.01.
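For context, the standard δ-typical set from the AEP is given below; this is the textbook definition, with symbols chosen here for illustration rather than copied from the paper.

```latex
A_{\delta}^{(n)} = \left\{ x^{n} : \left| -\tfrac{1}{n}\log_{2} p(x^{n}) - H(X) \right| \le \delta \right\}
```

By the AEP, the typical set carries almost all of the probability mass while containing only about 2^{nH(X)} sequences, which is why the set of likely routes stays small once the learned entropy is low.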
Smart Home: Resource and Comfort Management
- Mobility-aware energy consumption: devices such as lights, fans, or the air-conditioner operate in a proactive manner to conserve energy during the occupants' absence from particular places.
- Smart temperature control: a distributed control system; the preconditioning period is the time needed to bring the temperature to a specified level.
- Estimation of inhabitants' comfort: comfort is a subjective measure that is difficult to categorize, so it is approximated as a joint function of temperature deviation, the number of manual device operations, and the time spent on them.
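A purely illustrative sketch of such a joint discomfort measure, combining the three quantities the slide lists; the weights and functional form are assumptions, not the paper's comfort model.

```python
def discomfort(temp_deviation, manual_ops, manual_time, weights=(0.5, 0.3, 0.2)):
    """Illustrative only: a weighted combination of temperature deviation,
    number of manual device operations, and time spent on them.
    Lower values mean higher estimated comfort."""
    w_temp, w_ops, w_time = weights
    return w_temp * temp_deviation + w_ops * manual_ops + w_time * manual_time
```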
Experiments
- Inhabitants in a smart home are equipped with RF tags sensed by RF readers; the sensors placed in the smart home work in coordination with the RF readers.
- Energy management was studied without any predictive scheme and with per-inhabitant location prediction as baselines.
- Simulation experiments ran for 12 weeks over 4 inhabitants and 2 visitors.
(Figure: raw data.)
Predictive Location Estimation
- H-learning approach: the entropy of a resident falls in the range 1-3, while a visitor's entropy stays around 4 (high uncertainty).
- Nash H-learning: the entropy of an individual is 4.0 at the start and, as the system learns, reduces to about 1.0; the total (joint) entropy falls from about 10.0 to 4.0.
(Figures: entropy over time for H-learning and Nash H-learning.)
Prediction Successes
- H-learning is capable of estimating the locations of residents with 90% accuracy within 3 weeks; for visitors the accuracy is between 50% and 60%.
- Nash H-learning is initially slower but reaches a higher success rate of 95% for joint predictions; because individual predictions are correlated, only about 80% is the maximum achievable with per-inhabitant prediction.
- Overall, Nash H-learning leads to a higher success rate than simple H-learning.
(Figures: prediction success over time for H-learning and Nash H-learning.)
Inhabitants' Joint Typical Routes
- The size of the individual and joint typical sets is initially about 50% of the total routes.
- It shrinks to less than 10% as the system captures the inhabitants' movements.
Storage Overhead
- The Nash-H scheme has a low storage (memory) overhead of about 10 KB, compared with a total of about 40 KB for the existing per-inhabitant prediction scheme.
Energy Savings
- Without prediction, the daily energy consumption is about 20 kWh.
- Using the predictive schemes, the daily average energy consumption is kept at about 4 kWh (after the system learns).
Comfort
- The Nash H-learning scheme reduced the number of manual operations performed and the time spent on them for all inhabitants.
Comments and Discussion
- How does this system scale as the number of individuals in the system increases?
- The system relies on joint typical sets, which are assumed to characterize the whole route set and represent the most probable routes; what happens if this is not the case?
- Results on the entropy of visitors (which is far worse than that of residents) are not presented within the Nash-H learning framework (Prediction Successes).
- To get successful predictions, the memory required for storing the learned routes keeps increasing.
- Figure 8 is cleverly plotted to highlight the Nash-H scheme.
References
- N. Roy, A. Roy, S. K. Das, "Context-Aware Resource Management in Multi-Inhabitant Smart Homes: A Nash H-Learning based Approach", Proc. IEEE Pervasive Computing and Communications (PerCom), Pisa, Italy, 13-17 March 2006.
- "Information Entropy", http://en.wikipedia.org/wiki/Shannon_entropy
- J. Hu, M. P. Wellman, "Nash Q-Learning for General-Sum Stochastic Games", Journal of Machine Learning Research 4 (2003), pp. 1039-1069.
- "Nash Equilibrium", http://en.wikipedia.org/wiki/Nash_equilibrium

Editor's Notes

- #3 What is RL? Reinforcement Learning is the process by which an agent/player/entity improves its behavior in an environment via experience.