A Column Generation Approach to Solve Multi-Team
Influence Maximization Problem for Social Lottery Design
by
Manjunath Holaykoppa Nanjunda Jois
02-01-2015
A thesis submitted to the
Faculty of the Graduate School of the
State University of New York at Buffalo
in partial fulfillment of the requirements for the degree of
Master of Science
Department of Industrial and Systems Engineering
Acknowledgements
I would like to express my special thanks of gratitude to my Advisor Dr. Alexander Nikolaev and
Co-Advisor Dr. Jose Walteros for their continuous support and guidance throughout the research
work.
Contents
Acknowledgements
List of Figures
List of Tables
Abstract
1 Introduction
2 Literature Review
3 Influence Maximization for Social Lottery Design: Intuition and Problem Formulation
3.1 Problem Description
3.2 Calculation of Influencing Capacity
3.3 Multi-Team Influence Maximization Problem
4 A Column Generation Approach for MTIMP
4.1 Framework Overview
4.2 Initial Basic Feasible Solution
4.3 Master Problem for MTIMP
4.4 Pricing Problem (Sub Problem) for MTIMP
4.5 A Column Generation Algorithm
4.6 Further Enhancements
5 Computational Experiments
5.1 Experimental Design
5.2 Random Lottery Design
5.3 Results
6 Discussion
6.1 Limitations
6.2 Future Scope
7 Conclusion
Bibliography
8 Appendix
List of Figures
2.1 Empirical Evidence of a Diminishing Returns Phenomenon in Social Influence
3.1 Social Network Depicting Active Energy Savers in a Community
3.2 Awareness Spread with Different Groups of Winners
3.3 Piecewise Linear Function to Model Awareness Gain Related to Winning Neighbors
8.1 Results of Computational Study on Advogato Network - Instance 1
8.2 Results of Computational Study on Advogato Network - Instance 2
8.3 Results of Computational Study on Advogato Network - Instance 3
8.4 Results of Computational Study on Advogato Network - Instance 4
8.5 Results of Computational Study on Advogato Network - Instance 5
8.6 Results of Computational Study on Advogato Network - Instance 6
8.7 Results of Computational Study on Advogato Network - Instance 7
8.8 Results of Computational Study on Advogato Network - Instance 8
8.9 Results of Computational Study on Advogato Network - Instance 9
8.10 Results of Computational Study on Advogato Network - Instance 10
8.11 Results of Computational Study on Advogato Network - Instance 11
8.12 Results of Computational Study on Advogato Network - Instance 12
8.13 Results of Computational Study on Enron Email Network - Instance 1
8.14 Results of Computational Study on Enron Email Network - Instance 2
8.15 Results of Computational Study on Enron Email Network - Instance 3
8.16 Results of Computational Study on Enron Email Network - Instance 4
List of Tables
5.1 Results of Computational Study on Advogato Network with Bi-modal Distribution for Savings
5.2 Results of Computational Study on Advogato Network with Chi-square Distribution for Savings
5.3 Results of Computational Study on Enron Email Network with Bi-modal Distribution for Savings
5.4 Results of Computational Study on Enron Email Network with Chi-square Distribution for Savings
Abstract
The conventional Influence Maximization problem is that of finding a team (a small subset) of seed
nodes in a social network that maximizes the spread of influence over the whole network. This paper
considers a lottery system aimed at maximizing the awareness spread to promote energy conservation
behavior, and models it as a stochastic Influence Maximization problem with constraints ensuring
lottery fairness. The resulting Multi-Team Influence Maximization problem involves assigning
probabilities to multiple teams of seeds (interpreted as lottery winners) to maximize the expected
awareness spread. This variation of the Influence Maximization problem is modeled as a Linear
Program; however, enumerating all possible teams is a hard task, since the number of feasible teams
grows exponentially with the network size. To address this challenge, we develop a column
generation based approach that solves the problem with a limited number of candidate teams, where
new candidates are generated and added to the problem iteratively. We adopt a piecewise linear
function to model the impact of including a new team, so as to pick only those teams that can
improve the existing solution. We demonstrate that this approach solves such influence maximization
problems to optimality, and we perform a computational study with real-world social network data
sets to showcase the efficiency of the approach in finding lottery designs for optimal awareness
spread. Lastly, we explore other scenarios in which this model can be used to optimally solve
otherwise hard influence maximization problems.
Chapter 1
Introduction
The efficient consumption of energy is an important issue considering its impact on the environment
and the economy. Residential energy use accounts for 21% of total U.S. energy consumption
[USE, 2014], and utility companies are held responsible for promoting efficient energy utilization
behavior among their consumers. Energy conscious consumer behavior, however, does not decrease
the revenue generated through energy sales, as the government regulates energy prices to
offset any revenue loss. By incentivizing energy savings, the utility companies are able to serve more
customers with the same power output and the same profitability [App, 2009], making incentive
program design an important research and investment area.
Several energy efficiency programs have been developed by major U.S. companies, e.g.,
Google Powermeter, Microsoft Hohm, Hug Energy, Tendril Energy efficiency and O-Power. Failures
of such programs can be attributed to low consumer interest in the use of complex technology
and procedures. However, some programs have shown positive results by engaging consumers
on a social level without the need for any technical skills [Nayak A.V, 2013].
With this in mind, we develop the idea of a lottery system to promote efficient energy
utilization, together with a scalable mathematical model that solves the associated design problem
to optimality, i.e., ensuring the maximal spread of energy saving awareness.
The Lottery System Framework
The key concept of the lottery system is to aggregate the energy savings of small groups and award
them to a few households exhibiting the desired energy conservation behavior, ensuring that the
resulting prize amount is significant enough to create interest and increase awareness in a
community. To implement such a lottery system, we begin by informing the residents of the selected
community about the lottery program and the fact that they have higher chances of winning a prize
of significant amount by contributing more energy savings. At the end of each period, a group of
winners is selected simultaneously and the pooled prize amount is distributed equally among them.
Upon distributing the lottery prize in this manner, we can expect the winners to share their
experience with their neighbors, resulting in a spread of awareness. We believe that with increased
awareness about the program, more and more households would be motivated to reduce their energy
consumption in the hope of winning such a lottery in the future. Since we are considering groups of
winners, if we wish to select S winners from a given community with N households, we would have
$\binom{N}{S}$ groups of winners to choose from. The problem of interest here is to find the optimal
assignment of winning probabilities to all groups such that the awareness spread is maximized in
expectation, while ensuring that the probability of winning for any individual household is
proportional to its energy savings level.
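To give a sense of how quickly the number of candidate groups grows, the short sketch below evaluates the binomial coefficient "N choose S" for a few community sizes. The (N, S) pairs and the helper name `team_count` are illustrative assumptions and are not taken from the computational study.

```python
from math import comb


def team_count(n_households: int, n_winners: int) -> int:
    """Number of distinct winner groups of size n_winners among n_households."""
    return comb(n_households, n_winners)


if __name__ == "__main__":
    # Illustrative (N, S) pairs; these are assumptions, not instances from the thesis.
    for n, s in [(34, 2), (100, 2), (1000, 10)]:
        print(f"N={n:5d}, S={s:2d}: {team_count(n, s)} possible groups")
```

Even a community of 34 households with 2 winners already yields 561 groups, which is why explicit enumeration becomes impractical as the community grows.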
To successfully solve for the best assignment of probabilities to multiple teams (groups of
winners), we need to ensure two things: (1) the lottery system is fair, i.e., the probability of
a participant winning increases with their share of contribution to the total savings pool;
(2) the winners in each combination are suitably located in the social network to maximize awareness
spread. To accomplish these two tasks, we develop a mathematical model that accurately
depicts the spread of awareness and provides ways to maximize it. Considering that influence
maximization problems are generally hard to solve [Kempe et al., 2003], we try to develop an elegant
model with simple dynamics of influence spread while ensuring that the model corresponds to
real-world social dynamics as closely as possible. While the problem of energy conservation
considered here serves exactly this purpose, we believe this model would be applicable to other
problems of a similar nature as well. Also, since the importance of energy conservation is a
well-established concept and since we are aware of the existing methodology adopted by utility
companies, we shift our attention towards the development of an efficient mathematical model, which
is the primary concern of this paper.
In an effort to model such a lottery system we study existing literature to understand the following
aspects: (1) How to model awareness spread related to energy conscious behavior over a social
network? (2) How to solve such a model to obtain an optimal solution that maximizes the spread of
awareness?
The paper is organized as follows. Chapter 2 reviews the literature on the spread
of influence in a social network and on existing methods for identifying the best set of nodes for
maximizing the awareness spread. In Chapter 3 we define the problem of social lottery design and
develop a linear program to solve it. Chapter 4 elaborates on the column generation method used to
solve large real-world instances of the problem. Chapter 5 contains computational experiments on
empirical data sets. In Chapter 6 we discuss limitations and the scope for future work.
Chapter 2
Literature Review
Relevance of “Word-of-mouth” Communications for Adoption
of New Product or Idea.
The first part of our search leads us back to the experimental marketing research of the 1950s,
which focused on modeling the diffusion and adoption of new products in the market. These efforts
were primarily aimed at improving marketing methodologies by considering the effect of
“word-of-mouth” communications, which were established to be a key factor in convincing customers to
make a purchase decision [Brooks, 1957]. Contrary to the popular belief that technology alienates
individuals from society and norms of conformance, it was found that new generations, much less
fixed by traditions, are all the more dependent on social conformance to “keep in tune with the life
style of the moment”. Furthermore, the need for such consensual validation was observed to be more
pronounced in the early introductory period of a new product in the market.
Most of the earlier research focused on the individual attributes and behavior of users and
generally suffered from a lack of understanding of network phenomena. The study on the integration
of network and diffusion models [Valente, 1995] opened up a broad array of research topics in which
influence spread was studied not in isolation but together with the underlying network topology.
One of the first models of this kind focused on improving sales by marketing not only to those
customers who would be expected to buy a new product but also to customers with a higher network
value, i.e., those who could influence others in their network to make a purchase
decision [Domingos and Richardson, 2001] [Richardson and Domingos, 2002]. These papers establish
that considering the potential of individual entities in a network to adopt a certain idea
or product, without considering their network position and how they can influence other entities
in the network, results in a sub-optimal solution when trying to maximize profit from sales.
Hence, it is important to consider the network structure along with individual attributes when
trying to maximize the expected growth in awareness.
Contribution of Winners in Spreading Awareness
During the 1950s, it was found that the adoption of a new idea or product generally follows a
“two-step flow of communication”, wherein any propaganda is first evaluated and adopted by opinion
leaders, who in turn motivate the less active sections of the population to follow suit
[Katz, 1957]. Note that interpersonal relations can be viewed as (1) channels of information flow,
(2) channels of social pressure, or (3) channels of social support. Apart from the conventional
factors that result in an entity being influenced by peers, the author describes a phenomenon in
which undecided entities adopt an idea as a result of the “Influencee” wanting to be as much like
the “Influencer” as possible. This
strengthens the premise that an environment-protecting lottery system should be fair, i.e., favoring
active energy savers. In other words, the households that save more energy should be more likely to
win. As such, they would play an effective role in influencing others, as they would be perceived to
be in a more desirable position than non-savers. Another study from this period
reveals that “No material interest is involved in the recommendation” is the most basic motivation
for the “listener” in accepting and acting on any recommendation [Dichter, 1966]. This generally
holds true in the setting of energy saving, as a winner sharing their experience of participating in
the program and winning the lottery has no material interest in doing so and hence is well
positioned to spread awareness, e.g., in comparison with direct marketing options.
Effect of Awareness Gain towards Higher Engagement and
Improved Energy Conscious Behavior
The popular Bass Model [Bass, 1969] depicts how the timing of a consumer’s initial purchase is
correlated with the number of previous buyers in the society. The general diffusion model proposed
by Bass provides a widely accepted framework for forecasting the adoption of a new product: when
potential consumers do not have an objective basis to evaluate the product, they wait for a general
acceptance of the product in the community. While this was one of the first models to consider
consensual validation as a factor in the adoption of a new product or idea, many other
studies [Granovetter, 1973] [Granovetter, 1978] [Kempe et al., 2003] [Brown and Reingen, 1987]
demonstrate the validity of herd behavior in human decision making. It has also been found that one
can expect a “system delay” in the adoption of risky ideas, wherein the stable influence adoption
configuration delays accepting the risky innovation until others have demonstrated its feasibility
through prior adoption [Becker, 1970]. Considering that participating in the lottery system does not
entail any material risk, we need not consider this aspect while modeling our problem. Hence, we
can conclude that the willingness to adopt a new idea increases with social pressure and with the
number of connected peers already adopting the idea. As such, for modeling the lottery system,
we can assume that the degree of engagement of a household increases in proportion to the
awareness brought about by the connected peers who are already actively saving energy. Since, by
design of the lottery system, winners are more likely to exhibit energy conscious behavior, we can
justifiably assume that the more winners an individual finds close by, the more their awareness and
degree of engagement towards saving energy can be expected to increase.
Impact of Multiple Winning Neighbors on Awareness Gain of
Individual Entities
The analytical model aimed at predicting market responses to new consumer products utilizes a
model of “diminishing returns to media weight” [Claycamp and Liddy, 1969] and has been found to
closely match real-world market responses [Mahajan et al., 1984]. To elaborate, the incremental
influence gained by an individual exposed repeatedly to the same or a similar kind of marketing
diminishes with each exposure. This relates well to our problem, where any household may have more
than one winner in its neighborhood. We would like to incorporate the concept of diminishing returns
on awareness gain as the number of winning neighbors increases, since we may expect the information
shared by different winners to overlap, reducing the new information obtained each time.
Alternatively, this phenomenon can be expressed in terms of the level of interest in adopting a new
product or idea, which might increase substantially the first few times a person is introduced to
the idea, whereas repeated exposure to the same information may become redundant and less effective.
A study reveals that the probability of an individual participating increases with the
perceived number of participants already existing in the network [Granovetter, 1978].
However, another study notes that a small group of highly resourceful and interested actors
may contribute to a cause regardless of their own personal gain if they believe that their actions
can make a positive change to the community [Macy, 1991]. In this model, as opposed to each
individual deciding successively based on the number of households already saving energy in their
neighborhood, all participants choose whether to participate based on their perceived level
of existing participation and their personal interests. The aforementioned article elaborates on the
differences between individuals deciding simultaneously based on what they can perceive at a given
point in time (parallel choice) and individuals deciding one after the other, with
each person’s decision having an impact on the next one in the chain (serial choice). While there
is a considerable difference between the two models, the results of simulation experiments indicate
that the expected level of contribution in both cases increases in the beginning, when there are
fewer adopters of the idea, but levels off in the end.
Modeling Influence Spread and Awareness Gain over the Network
While the influence gained by an individual can be attributed to many factors, we can consider one
of the most important factors: an enthusiastic recommendation from a neighboring household upon
winning the lottery prize. The number of people willing to communicate with their peers about a
product is considered the single most significant factor in evaluating customer retention and the
degree of acceptance of a product in a community [Reichheld, 2003].
With the understanding that winning households play a key role in spreading awareness
among their neighbors, we try to establish the degree to which one or more winning neighbors
contribute to the increase in awareness gained by an individual in the network. As mentioned
earlier, it is worthwhile to consider a diminishing returns concept to model the awareness gain.
An analysis of historical data from Wikipedia, where social interactions are defined by editors
writing on the user-talk pages of others [Cosley et al., 2010], confirms this phenomenon: the
probability of an individual editing an article increases with the number of connections who are
previous editors, as shown in Figure (2.1.a).
We can observe here and in other studies of a similar nature [Shi et al., 2009] [Angst et al., 2010]
[Goyal et al., 2010] that the probability of a user acting on a certain aspect increases drastically
when the first few of their neighbors adopt this behavior. However, the increase in probability
quickly levels off, with little to no effect from the actions of the remaining users in the network.
As such, we can consider a monotonically increasing piecewise linear function of the number of
winning neighbors, defined over a small number of neighbors, under which the awareness gain exhibits
diminishing returns as the number of winning neighbors increases.
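A minimal sketch of such a function is given below. The per-neighbor increments (0.5, 0.25, 0.125) and the name `awareness_gain` are hypothetical placeholders chosen for illustration; they are not the breakpoints used later in the model.

```python
def awareness_gain(winning_neighbors: int, increments=(0.5, 0.25, 0.125)) -> float:
    """Concave piecewise-linear awareness gain (illustrative values only).

    The k-th winning neighbor adds increments[k-1]; neighbors beyond
    len(increments) add nothing, so the function levels off.
    """
    if winning_neighbors < 0:
        raise ValueError("number of winning neighbors must be non-negative")
    if any(a < b for a, b in zip(increments, increments[1:])):
        raise ValueError("increments must be non-increasing (diminishing returns)")
    return float(sum(increments[:winning_neighbors]))


if __name__ == "__main__":
    for k in range(6):
        print(k, awareness_gain(k))
```

Because the increments are non-increasing, each additional winning neighbor contributes no more than the previous one, and the gain stays flat once the listed increments are exhausted.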
In another study on the adoption of Twitter hashtags [Romero et al., 2011], it is found that
repeated exposures to a hashtag continue to have significant marginal effects on increasing the
probability of adoption, as shown in Figure (2.1.c). It is also shown that the probability of an
individual joining a community is influenced by the number of friends he or she has within the
community [Backstrom et al., 2006], as shown in Figure (2.1.b). A study of user grouping behavior
across different online forums also confirms the law of diminishing returns [Shi et al., 2009]; its
results are shown in Figure (2.1.d).
Figure 2.1: Empirical Evidence of a Diminishing Returns Phenomenon in Social Influence
Top Left: Probability of editing an Article on Wikipedia as a function of number of previous editors who are also
connections through an interaction network. Top Right: Probability of joining a LiveJournal community as a
function of the number of friends already in the community. Bottom Left: Fraction of users who adopt a hashtag
after their Kth exposure to it. Bottom Right: Probability of a user joining a community as a function of number
of reply friends who are active in that community.
Degree of Engagement as a Continuous Parameter as Opposed
to Binary Choice of Participation
In a study aimed at understanding diffusion delays or failures associated with some innovations, a
simple model is set up in which the level of belief regarding an idea increases with every social
interaction confirming it, up to a threshold value [Deroïan, 2002]. In other words, as opposed
to the binary choice prevalent in the scenario of purchasing a new product, in this
model the state of each agent is given by an expected utility which describes the disposition to pay
for adopting the new technology, from bad to good appreciation. This conforms well with the concept
of participating in the energy conservation program, as the desired behavior of a participant cannot
be construed as a decision to participate or not; rather, it is the degree of awareness or
inclination towards making efforts to save energy. As such, we could adopt a continuous parameter to
account
for this level of inclination of the participant. The final objective of this paper is to maximize
the degree of awareness subject to constraints that ensure lottery fairness.
Ideal Number of Winners to be Selected
Assuming the influence spread is correlated with the winning amount, we could model the problem to
find an optimal number of winners. However, this would increase the complexity of the problem of
assigning probabilities of winning to groups of households. Furthermore, a simulation study
on a large-scale network [Aral et al., 2011] found that seeding more than 0.2% of the population
is wasteful because the gain from their adoption is lower than the gain from their natural adoption
(without seeding). Hence, we would like to limit this parameter to a fixed value. Also, a
survey on “Energy Awareness Spread” shows that the majority of individuals would consider $1,000
a significant prize amount that could motivate them to undertake additional energy saving efforts.
Assuming that each household exhibiting energy conscious behavior saves at least $10 on
average, the pooled savings of 100 such households amount to one $1,000 prize. It would therefore be
counterproductive to select more than 1% of this population as winners, as this
would reduce the winning amount and may result in a lack of interest even from winning households
[Nayak A.V, 2013]. This survey also shows that only a small portion of a population would be
interested in actively saving energy to begin with. Hence, we take the number of winners to be 1% of
the households contributing energy savings for the problem under consideration.
Importance of Fairness for Positive Awareness Spread
Lastly, we consider the aspect of fairness, which may very well be the most important aspect of
this paper and of developing the mathematical model. While fairness intentions intuitively seem
to play an important role in economic relations, political struggles, and legal disputes, there is
surprisingly little direct evidence for their behavioral importance. A study on this topic indicates
that the attribution of fairness intentions is important in the domains of both negatively and
positively reciprocal behavior [Falk et al., 2008]. This study shows that people take into account
not only the distributive consequences of an action but also the intention it signals when judging
the fairness of that action. Another study, specific to lottery systems, indicates that results
produced by an unbiased procedure tend to be more acceptable than those produced by a biased
procedure [Bolton et al., 2005]. In other words, people are willing to impose a cost on both
themselves and others to resist procedures that they deem to be biased against them.
Establishing the Social Network Structure
While we have assumed that the underlying social network of the community being considered
is readily available, this may not always be the case. In modern times, with the extensive
use of cell phones and Internet connectivity, it may be fairly easy to establish the social network
of a community. It has been found that cell phone records can be used to establish the friendship
network with a high degree of accuracy [Eagle et al., 2009].
By aggregating the available digital footprints of online users, we can infer the network structure
and observe macroscopic behavior. This is arguably one of the most important aspects of
computational social science [Lazer et al., 2009]. There also exist many apps intended to be used as
tools for social bonding while performing various routine activities such as fitness training,
shopping and eating. These apps generally collect data pertaining to individuals' habits, behavior
and social connections. The underlying social network of a community, as related to energy conscious
behavior, could be mapped by apps developed for this purpose.
Other Popular Models Not Applicable to Awareness Spread
through Lottery System
Apart from the aforementioned articles, there are other notable publications on the subject of
Influence Maximization modeling [Kempe et al., 2005] [Borgatti, 2006] which are not considered
here in detail, for two reasons: (1) these methods typically construe influence propagation as the
activation of individuals in the network, whereas our problem is better modeled with a continuous
degree of awareness or participation; (2) most of these methods are computationally intractable and
only provide heuristics to arrive at an approximate solution. The focus of this paper is to develop
a practical model which is readily usable to increase awareness spread in a community of interest.
Chapter 3
Influence Maximization for Social
Lottery Design: Intuition and
Problem Formulation
3.1 Problem Description
With reference to the framework of the lottery system described in Chapter 1, we need to assign
probabilities to all possible sets of a fixed number of winners so as to maximize the awareness
spread in expectation. Awarding a lottery prize to a certain number of participants would motivate
them to continue their energy saving endeavor and also to spread awareness among other households
connected to them in the social network. For such a lottery system to be fair and impart a positive
message in the society, it would be appropriate to provide a higher chance of winning to
participants who have undertaken energy efficiency measures to reduce their energy consumption. When
a prize is awarded to participants who have put forth considerable effort in reducing their energy
consumption, they would be highly motivated to share their experience with their peers and would
influence a positive change in the behavior of their neighbors. While the aforementioned conditions
are met, we would like to maximize the spread of influence without disturbing the fairness of the
lottery system. In other words, regardless of the probability of any group of households winning,
the individual probability of any household winning, summed over all groups containing it, would
remain proportional to its savings.
The idea here is to spread the lottery amount across the network so that the influence spread is
not restricted to a certain region that may have a higher concentration of individuals contributing
higher energy savings. To illustrate this, let us consider a hypothetical case with a social network
of 34 participants [Zachary, 1977] in which 2 winners are to be selected. To clearly understand
the nature of this problem, let us consider an extreme case where the savings contributed by nodes
1, 33 and 34 are equal and significantly higher than those of the remaining participants, as
illustrated in Figure (3.1).
If we were to select two winners from this system arbitrarily, without considering the effect of
influence, then the probabilities of teams {1, 33}, {1, 34} and {33, 34} winning would be the same.
However, as we can see from Figure (3.2), the influencing potential of team {1, 34} is significantly
higher than that of {33, 34}.
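As a rough illustration of this comparison, the sketch below counts the distinct households adjacent to at least one winner for each of the three teams. Two assumptions are made here. First, the neighbor lists of nodes 1, 33 and 34 are transcribed from the commonly distributed edge list of the network in [Zachary, 1977] and should be checked against the data actually used. Second, counting distinct neighbors is only a crude proxy for influencing potential and is not the measure of influencing capacity developed in Section 3.2.

```python
# Assumed neighbor lists (1-indexed) for the three high-saving nodes.
NEIGHBORS = {
    1: {2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 20, 22, 32},
    33: {3, 9, 15, 16, 19, 21, 23, 24, 30, 31, 32, 34},
    34: {9, 10, 14, 15, 16, 19, 20, 21, 23, 24, 27, 28, 29, 30, 31, 32, 33},
}


def direct_reach(team):
    """Households adjacent to at least one winner, excluding the winners themselves."""
    reached = set()
    for node in team:
        reached |= NEIGHBORS[node]
    return reached - set(team)


if __name__ == "__main__":
    for team in [(1, 33), (1, 34), (33, 34)]:
        print(team, len(direct_reach(team)))
```

Under these assumptions, nodes 33 and 34 share most of their neighbors, so team {33, 34} reaches far fewer distinct households than team {1, 34}, whose members sit in different parts of the network.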
The objective of the problem is to assign higher probabilities to teams with a higher
influencing potential while ensuring that the expected earnings of any household through
different combinations of teams remain proportional to its individual energy saving contribution.
Figure 3.1: Social Network Depicting Active Energy Savers in a Community
(a) Awareness Spread when Nodes 33 and 34 Win (b) Awareness Spread when Nodes 1 and 34 Win
Figure 3.2: Awareness Spread with Different Groups of Winners
With this, we can now list the specific challenges involved in solving such a problem as follows:
1. Enumerating all possible groups of winners.
2. Assigning a probability of being picked as the winners to each group while ensuring:
• The lottery system is fair, i.e., the chance of winning a certain amount through the lottery for
individual households (hereinafter referred to as the “Marginal Probability”) is proportional
to their energy savings contribution.
• The awareness spread is maximized over the social network.
Let us consider a community with a set of $N$ households where we intend to launch the
aforementioned lottery system. Let $N'$ be the set of households who succeed in saving energy in a given
period and let $M$ be the total monetary equivalent of the amount of energy saved. We fix
the number of winners among whom the pooled amount $M$ is distributed as $S$, where $S = N'/100$.
Assuming we do not award prizes to households that do not save energy, we would have a total of
$\binom{N'}{S}$ groups to which probabilities must be assigned. While in some cases this may restrict
the awareness spread to a specific region where households are already demonstrating energy-conscious
behavior, this restriction can be easily lifted by considering a minimal probability of winning for all
households regardless of their energy savings contribution, in which case $N' = N$. Let us assume that
we can enumerate all possible combinations of winners, say $K$, such that $K = \binom{N'}{S}$. For the
sake of simplicity, we shall use the name of each set to also denote the total number of elements in
that set wherever applicable. Let $i$ index the households in the set $N'$ and let $k$ index the
combinations of winners. We define $P_i$ to be the probability of household $i$ winning a prize
amount of $M$, such that $\sum_{i \in N'} P_i = 1$. Then, $P_i M$ is the monetary gain for household $i$
in expectation. Also, let $C_k$ represent the amount of awareness spread over the network when the
prize amount is distributed among the members of group $k$. Our entire problem can now be reduced to
assigning a probability value $X_k$ to each group of winners $k$ such that $\sum_{k \in K} C_k X_k$ is
maximized, while ensuring that the expected winnings of any household $i$ are still $P_i M$. Thus,
$X_k$ is the only decision variable in this model.
3.2 Calculation of Influencing Capacity
In order to calculate the influencing capacity of every group of winners, it is necessary to establish a
model for the propagation of influence through the network and the resulting awareness spread. The
literature on influence spread reviewed in Chapter 2 supports the argument that there is a strong
correlation between the adoption of a new idea and the number of neighbors who have already adopted
it. We observe that the probability of adoption rises considerably with the first few adopting
neighbors but remains roughly constant after a certain number of neighbors have adopted the idea; a
further increase in the number of adopters has little effect on the probability that the idea
is adopted. In the survey conducted to understand the expectations of individuals in undertaking
effort to save energy [Nayak A.V, 2013], we observe that a finite number of influencing
agents in the neighborhood of an individual would suffice to motivate them to take part in energy
conservation. With this understanding, we construct a piecewise linear function that relates the
influence gained by an individual to the number of winners in their neighborhood, as
shown in Figure (3.3).
Figure 3.3: Piecewise Linear Function to Model Awareness Gain Related to Winning Neighbors
With this, we can calculate the total awareness spread $C_k$ when a group $k$ wins the lottery by
aggregating the awareness gained by all $N$ individual households in the network.
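As a minimal sketch of this aggregation, the Python snippet below computes $C_k$ for a given team on a small undirected network. It assumes that the awareness function is the lower envelope of a few lines capped at a maximum value, and that a winning household is itself at the maximum. The slopes, intercepts, the 0 to 10 scale and the toy network are hypothetical and are not taken from Figure (3.3).

```python
# Hypothetical piecewise linear awareness function on a 0-10 scale:
# each (slope, intercept) pair is one line; the gain is their lower envelope.
SEGMENTS = [(4, 0), (2, 4), (0, 10)]
MAX_AWARENESS = 10  # assumed cap on the awareness of any household


def awareness_gain(winning_neighbors, is_winner):
    """Awareness gained by one household."""
    if is_winner:
        return MAX_AWARENESS  # a winner is assumed to be fully aware
    envelope = min(m * winning_neighbors + b for m, b in SEGMENTS)
    return min(envelope, MAX_AWARENESS)


def team_awareness(neighbors, team):
    """Total awareness C_k when the households in `team` win."""
    team = set(team)
    total = 0
    for household, adjacent in neighbors.items():
        w = len(team & adjacent)  # number of winning neighbors
        total += awareness_gain(w, household in team)
    return total


# Toy network (assumed): 0 is linked to 1, 2 and 3; 3 is linked to 4.
NEIGHBORS = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}

print(team_awareness(NEIGHBORS, {0, 4}))  # 36
print(team_awareness(NEIGHBORS, {1, 2}))  # 28
```

In this toy case the team {0, 4} reaches more of the network than {1, 2}, which mirrors the comparison between teams made in Figure (3.2).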
3.3 Multi-Team Influence Maximization Problem
Assuming that we have now listed all the $K$ possible groups of winners and calculated their
corresponding awareness spreading capacity $C_k$, we can formulate a linear program to determine the
optimal assignment of probabilities to each of these $K$ groups to maximize the total awareness spread
in expectation, subject to a constraint ensuring that the expected winning amounts of all individual
households are proportional to their energy savings contributions.
To define such a constraint, let us represent each winning group $k$ by its incidence vector $a_k$ such that
$$a_{ik} = \begin{cases} 1 & \text{if participant } i \text{ is a member of team } k, \\ 0 & \text{otherwise.} \end{cases}$$
Since we know that the total awareness spread through any single group of winners $k$ is $C_k$ and
that group $k$ has a probability $X_k$ of being selected, we can define $C_k X_k$ as the expected awareness
spread from group $k$. We formulate a linear program to maximize such awareness spread as follows:
Multi-Team Influence Maximization Problem (MTIMP)
$$\max \sum_{k \in K} C_k X_k \tag{3.1}$$
Subject to:
$$\sum_{k \in K} \frac{a_{ik} X_k}{S} = P_i, \quad \forall\, i \in N', \tag{3.2}$$
$$X_k \ge 0, \quad \forall\, k \in K. \tag{3.3}$$
Equation (3.1) represents the expected awareness gained by all households based on the probability
$X_k$ of group $k$ winning the lottery prize. Equation (3.2) collects the probabilities of all the
different groups through which a household $i$ could win; the sum of these, scaled by the share of
the prize received, should equal the marginal probability of the household winning. To clarify, since
we distribute the winning amount equally among all members of the selected group, each member of a
winning group receives $M/S$. The expected winning amount of household $i$ must equal the amount
$P_i M$ that it is entitled to based on its contribution to the savings pool, so we have:
$$\sum_{k \in K} a_{ik} X_k \frac{M}{S} = P_i M, \quad \forall\, i \in N'.$$
Canceling $M$ from both sides of the equation, we have the following constraint:
$$\sum_{k \in K} \frac{a_{ik} X_k}{S} = P_i, \quad \forall\, i \in N'.$$
Lastly, we have the non-negativity constraint defined by Equation (3.3) for the decision variable $X_k$.
We do not include any constraint to ensure that the probabilities of all $K$ groups add up to
1, as this is satisfied provided the marginal probabilities $P_i$ of all households add up to 1.
Proposition 1: $\sum_{k \in K} X_k = 1$ as long as $\sum_{i \in N'} P_i = 1$.
Proof: By summing both sides of Equation (3.2) over all $i$ we have
$$\sum_{i \in N'} \sum_{k \in K} \frac{a_{ik} X_k}{S} = \sum_{i \in N'} P_i.$$
When $\sum_{i \in N'} P_i = 1$ we get
$$\sum_{k \in K} \sum_{i \in N'} \frac{a_{ik} X_k}{S} = 1.$$
Since $\sum_{i \in N'} a_{ik} = S$ by definition, we have
$$\sum_{k \in K} \frac{X_k S}{S} = 1,$$
$$\sum_{k \in K} X_k = 1.$$
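As a small worked illustration of Proposition 1, consider a hypothetical instance (the numbers are chosen only for convenience) with three households, marginal probabilities $P_1 = 1/2$, $P_2 = 3/10$, $P_3 = 1/5$, and $S = 2$, so that the candidate teams are $\{1,2\}$, $\{1,3\}$ and $\{2,3\}$. Constraint (3.2) then reads

$$\frac{X_{12} + X_{13}}{2} = \frac{1}{2}, \qquad \frac{X_{12} + X_{23}}{2} = \frac{3}{10}, \qquad \frac{X_{13} + X_{23}}{2} = \frac{1}{5}.$$

Adding the three equations gives $X_{12} + X_{13} + X_{23} = 1$, as the proposition states. Solving the system yields

$$X_{12} = \frac{3}{5}, \qquad X_{13} = \frac{2}{5}, \qquad X_{23} = 0,$$

which is non-negative and therefore feasible for this instance.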
Proposition 2: The total number of groups with non-zero probability in an optimal basic solution is
always less than or equal to $N'$.
Proof: The model has $K$ variables and $N'$ constraints, ignoring the non-negativity constraints on
the decision variables. As such, any basic feasible solution has at most $N'$ basic variables.
Therefore, if an extreme-point algorithm such as the simplex method is used, the optimal solution it
returns has at most $N'$ teams with probability greater than 0, while the remaining groups (at least
$K - N'$ of them) have probability 0.
While this model accurately represents the problem at hand, enumerating all possible combinations
of winners (i.e., $\binom{N'}{S}$ in total) is computationally prohibitive. Hence, we consider an alternate
approach that uses column generation to work only with those variables (sets of teams) that have a
significant effect in spreading awareness, resulting in better computational efficiency.
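To give a sense of scale, the short Python sketch below counts the candidate teams for two illustrative sizes. The first matches the 34-participant, 2-winner example of Section 3.1; the second pair of numbers is an assumed instance used only to show how quickly the count grows.

```python
import math


def number_of_teams(active_savers, winners):
    """Number of ways to choose `winners` households from `active_savers`."""
    return math.comb(active_savers, winners)


print(number_of_teams(34, 2))  # 561

# Assumed larger instance: hundreds of active savers, a handful of winners.
print(number_of_teams(515, 5) > 10**11)  # True
```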
Chapter 4
A Column Generation Approach for MTIMP
In general, column generation divides the original problem into two problems, namely the master
problem and the subproblem. The master problem is a reformulation of the original problem defined
over a manageable subset of variables. At each iteration of the column generation algorithm,
additional variables are introduced into the master problem via the subproblem whenever they are
needed to establish optimality. Column generation has been successfully applied to a wide array of
industrial problems generally associated with a large number of variables, such as vehicle routing,
crew scheduling and cutting stock problems.
4.1 Framework Overview
Considering that enumerating all possible combinations of winners is computationally difficult, we
can begin solving the problem with only a few combinations of winners. The set of variables considered
for solving the master problem is referred to as the candidate list. Upon solving such a restricted
master problem, we can analyze the values of the dual variables to determine a new combination of
winners to be included in the master problem. However, to be able to do this, we would first need to
have a set of variables $K'$ from which one can obtain a feasible basis for the master problem. More
formally, we need to start with a set of variables for which there exists a convex combination
that satisfies the marginal probability constraint of all households considered. Unfortunately,
because of the problem’s structure, generating such a $K'$ is not a trivial task.
4.2 Initial Basic Feasible Solution
For any given problem with $N'$ households with positive energy savings contribution, we can initialize
$K'$ as a set of $N'$ dummy teams, each composed of a single household. Note that,
upon solving the master problem with this set of variables, assuming that dummy team $k$ is composed
of household $i$, the optimal solution would be the trivial $X_k = P_i$, as the marginal probability
constraint can only be satisfied when the probability of every team is equal to the marginal
probability of the individual household contained therein. Clearly, since such dummy teams
contain only one member as opposed to $S$ (the required size), the resulting solution is not valid
for the original problem. However, adding these teams provides the master problem with an initial
basis.
Now, to avoid having the dummy teams in the final solution, their corresponding influence
capacity is set to 0 or less. Hence, after several iterations, these variables will be sequentially replaced
by teams of valid size. This approach for generating initial feasible solutions is motivated by other
penalization methods (Big-M) typically used when solving linear programs [Bazaraa et al., 2011].
4.3 Master Problem for MTIMP
We can define the initial set of variables as the set $K'$ such that $K' \subset K$. Later on, we shall elaborate
on ways to start with a better-quality initial feasible solution, which would help us obtain the optimal
solution faster. If we consider a set of $N'$ variables as described earlier, we can define the restricted
master problem as below:
$$\max \sum_{k \in K'} C_k X_k \tag{4.1}$$
Subject to:
$$\sum_{k \in K'} \frac{a_{ik} X_k}{S} = P_i, \quad \forall\, i \in N', \tag{4.2}$$
$$X_k \ge 0, \quad \forall\, k \in K'. \tag{4.3}$$
Proposition 3: The optimal solution of the restricted master problem always forms a
lower bound to the original problem.
Proof: Any feasible solution of the restricted master problem is also feasible for the original
problem once the variables outside $K'$ are set to 0. Moreover, assuming none of the existing
variables in the candidate list are removed, the feasible region of the master problem can only grow
with the inclusion of new variables. Therefore, the current solution remains feasible in all
subsequent iterations, and the optimal value of the master problem cannot fall below its current
value. Hence, we can consider the optimal value of the master problem in any iteration to be a lower
bound to the original problem.
Upon solving the restricted master problem, we can analyze the dual values associated with each
constraint to determine the impact of increasing or decreasing the right-hand side of the constraint
on the objective function value. More specifically, the dual values obtained after solving the master
problem represent the change in the objective function value per unit change in the right-hand side
of the corresponding constraint.
By considering all the dual values, we can estimate the implicit cost of including a new group
of winners in the optimal solution as the sum product of the dual values associated with the
marginal probability constraints of the winners present in the group. For a group of winners $k$, the
reduction in the objective function value caused by including this team in the optimal solution is given by
$$\sum_{i \in N'} \frac{a_{ik}}{S} \beta_i, \tag{4.4}$$
where $\beta_i$ is the dual value associated with constraint $i$, obtained by solving the master problem.
4.4 Pricing Problem (Sub Problem) for MTIMP
The premise in column generation is that many of the variables will be non-basic in an optimal
solution to the original problem, so one should only generate those variables that have the potential
to improve the objective function value. Thus, the objective function for the pricing problem is the
reduced cost of the non-basic variables.
The reduced cost of a variable represents the marginal improvement in the objective function
that would be obtained if the value of the variable were increased. The reduced cost of including a
group $k$ is given by the difference between the increase in the objective function and the penalty for
including the said team as described in Equation (4.4). Hence, we can compute the reduced cost for
any group of winners $k$ as
$$C_k - \sum_{i \in N'} \frac{a_{ik}}{S} \beta_i. \tag{4.5}$$
A group $k$ can increase the objective function value only when the value of the reduced cost
in Equation (4.5) is greater than zero. In other words, when the reduced cost of a variable
is greater than zero, the increase in the objective function is greater than the penalty
associated with including the said group. Hence, our objective for the pricing problem is to
determine a group of winners $k$ such that
$$C_k - \sum_{i \in N'} \frac{a_{ik}}{S} \beta_i > 0. \tag{4.6}$$
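The reduced cost test in Equations (4.5) and (4.6) can be sketched in a few lines of Python. The dual values and the awareness figure below are hypothetical numbers chosen for illustration, and the function and argument names are not part of the original model.

```python
def reduced_cost(awareness, team, duals, winners_per_team):
    """Reduced cost C_k - sum_i (a_ik / S) * beta_i of a candidate team.

    awareness:        C_k, the awareness spread of the team
    team:             iterable of household ids with a_ik = 1
    duals:            mapping household id -> beta_i
    winners_per_team: S
    """
    penalty = sum(duals[i] for i in team) / winners_per_team
    return awareness - penalty


duals = {1: 12.0, 2: 30.0, 3: 6.0}  # hypothetical beta_i values

print(reduced_cost(20.0, {1, 3}, duals, 2))  # 11.0, worth adding
print(reduced_cost(20.0, {1, 2}, duals, 2))  # -1.0, not worth adding
```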
In order to determine such a group $k$, we can formulate an integer linear program whose objective
function maximizes the reduced cost by deciding which households are included in the
group. For this purpose, let us define $Y$ as the binary vector of size $N'$ representing the decision
variable, where
$$Y_i = \begin{cases} 1 & \text{if participant } i \text{ is a member of the new group of winners,} \\ 0 & \text{otherwise.} \end{cases}$$
It is worth noting that $Y$ will later become the incidence vector $a_k$ when the new group
of winners $k$ is added to the master problem. Hence, we can write the reduced cost in Equation
(4.6) as
$$C_k - \sum_{i \in N'} \frac{Y_i}{S} \beta_i. \tag{4.7}$$
Now, the only unknown value in Equation (4.7) is $C_k$, which can be computed as a function of
$Y$. More specifically, $C_k$ can be computed by aggregating the awareness gained by each individual in
the network when team $k$ wins. The individual awareness gain is a function of the number of winning
neighbors of the individual, which is again a function of $Y$. Furthermore, the function
representing the awareness gain based on the number of winning neighbors, as described in Section 3.2,
is non-linear. To linearize this expression, let $Z_i$ represent the awareness gained by
household $i$ when group $k$ wins. We can compute $Z_i$ by counting the number of winners
in the neighborhood of $i$, say $w_i$, when group $k$ wins. We represent the connections of
household $i$ in the social network by $e_{ij}$ such that
$$e_{ij} = \begin{cases} 1 & \text{if household } i \text{ is connected to household } j, \\ 0 & \text{otherwise.} \end{cases}$$
Now, we can compute the number of winning neighbors of household $i$ when a
team $k$ given by $Y$ wins as
$$\sum_{j \in N'} e_{ij} Y_j = w_i, \quad \forall\, i \in N. \tag{4.8}$$
We can represent the piecewise linear function for determining the awareness gained by each
individual as a set of lines, each expressed through its slope and intercept. In other words, let
$m_t$ and $b_t$ represent the slope and intercept of line $t$ of the piecewise linear
function considered. Also, let $b_T$ be the highest intercept, representing the maximum awareness that
can be gained by any individual. Now, based on the values of $w_i$ obtained through Equation (4.8),
one can express the awareness gained by individual $i$ using variable $Z_i$ constrained as follows:
$$m_t w_i + b_t + b_T Y_i \ge Z_i, \quad \forall\, i \in N, \ \forall\, t \le T. \tag{4.9}$$
Here, the term $b_T Y_i$ represents the fact that the motivation level of a household reaches its
highest limit when the household itself has won, regardless of how many of its neighbors have won
(for households outside $N'$, which cannot win, this term is zero).
To ensure that no additional awareness gain is accounted for beyond the maximum limit $b_T$, we
need
$$Z_i \le b_T, \quad \forall\, i \in N. \tag{4.10}$$
Under constraints (4.8), (4.9) and (4.10), and by incorporating $Z_i$ into the objective of the
maximization problem, one ensures that the value of $Z_i$ indeed reflects the awareness gained by
household $i$ when the team $k$ represented by $Y$ wins the lottery. Aggregating the awareness gained by
individual households over the entire network, we obtain the total influencing capacity of team $k$
over the network:
$$\sum_{i \in N} Z_i = C_k, \quad \text{when Equations (4.8)-(4.10) are satisfied.}$$
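Hence, one can formulate the pricing problem as follows:
$$\max \sum_{i \in N} Z_i - \sum_{i \in N'} Y_i \frac{\beta_i}{S}$$
Subject to:
$$\begin{aligned}
\sum_{j \in N'} e_{ij} Y_j &= w_i, && \forall\, i \in N, \\
m_t w_i + b_t + b_T Y_i &\ge Z_i, && \forall\, i \in N, \ \forall\, t \le T, \\
Z_i &\le b_T, && \forall\, i \in N, \\
\sum_{i \in N'} Y_i &= S, \\
Y_i &\in \{0, 1\}, && \forall\, i \in N', \\
Z_i &\ge 0, \\
w_i &\ge 0.
\end{aligned}$$
The last four constraints ensure that the number of winning households is equal to the required,
fixed number of winners $S$, and impose the binary and non-negativity restrictions on the variables.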
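For very small instances, the pricing problem can be checked by brute force instead of an integer programming solver. The sketch below is such a stand-in, not the method used in this work: it enumerates every team of the required size and evaluates the objective directly, taking each $Z_i$ at the largest value permitted by constraints (4.9) and (4.10). The line parameters, the toy network and the dual values are hypothetical.

```python
from itertools import combinations

SEGMENTS = [(4, 0), (2, 4), (0, 10)]  # hypothetical (m_t, b_t) lines
B_MAX = 10                            # hypothetical b_T


def awareness(neighbors, team):
    """Sum of Z_i, each at its largest value allowed by (4.9)-(4.10)."""
    total = 0
    for i, adjacent in neighbors.items():
        w_i = len(adjacent & team)       # Equation (4.8)
        y_i = 1 if i in team else 0
        z_i = min(m * w_i + b + B_MAX * y_i for m, b in SEGMENTS)
        total += min(z_i, B_MAX)
    return total


def price_by_enumeration(neighbors, savers, duals, size):
    """Return (best reduced cost, best team) over all teams of `size` savers."""
    best_value, best_team = None, None
    for members in combinations(sorted(savers), size):
        team = set(members)
        value = awareness(neighbors, team) - sum(duals[i] for i in team) / size
        if best_value is None or value > best_value:
            best_value, best_team = value, team
    return best_value, best_team


# Toy network (assumed): 0 is linked to 1, 2 and 3; 3 is linked to 4.
NEIGHBORS = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
DUALS = {0: 40, 1: 0, 2: 0, 3: 0, 4: 0}  # hypothetical beta_i values

print(price_by_enumeration(NEIGHBORS, NEIGHBORS, DUALS, 2))  # (32.0, {1, 3})
```

With a large dual value on household 0, the best team avoids it even though it is the best-connected node, which is how the dual values steer the sub-problem toward teams that the current master solution undervalues.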
4.5 A Column Generation Algorithm
In order to iteratively solve for the optimal solution using the aforementioned master problem and
sub-problem, we follow the column generation algorithm stated below:
1: Construction of an initial feasible solution: An initial basic feasible solution is generated
by one of the techniques mentioned in this paper.
2: Restricted Master Problem: The linear program is solved with the current candidate
list of variables and the dual values are obtained as $\beta$.
3: Sub-Problem: With the $\beta$ values obtained from the previous step, we solve the sub-problem.
4: Evaluation of optimality: If the objective function value of the sub-problem is $\le 0$, there
does not exist any new group of winners which, when added to the candidate list, would improve
the objective function. Hence, the incumbent solution obtained at Step 2 is optimal.
Otherwise, proceed to Step 5.
5: Update candidate list: The new group obtained from the sub-problem is added to the
candidate list and we return to Step 2.
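The five steps can be sketched as a generic loop in Python. The two solver callables, `solve_master` and `solve_pricing`, are placeholders that the caller must supply (for example, wrappers around an LP solver and an integer programming solver); their names and return shapes are assumptions made for this sketch. The `tolerance` argument defaults to 0, which corresponds to the test in Step 4.

```python
def column_generation(initial_columns, solve_master, solve_pricing,
                      tolerance=0.0, max_iterations=1000):
    """Generic column generation loop following Steps 1-5.

    solve_master(columns) -> (objective value, dual values)
    solve_pricing(duals)  -> (best reduced cost, new column)
    Both callables are placeholders supplied by the caller.
    """
    columns = list(initial_columns)                   # Step 1
    objective = None
    for _ in range(max_iterations):
        objective, duals = solve_master(columns)      # Step 2
        reduced_cost, column = solve_pricing(duals)   # Step 3
        if reduced_cost <= tolerance:                 # Step 4
            break
        columns.append(column)                        # Step 5
    return objective, columns


# Stand-in solvers used only to exercise the loop; they solve nothing.
def fake_master(columns):
    return float(len(columns)), {"n": len(columns)}


def fake_pricing(duals):
    return 3 - duals["n"], "col%d" % duals["n"]


print(column_generation(["c0"], fake_master, fake_pricing))
# (3.0, ['c0', 'col1', 'col2'])
```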
4.6 Further Enhancements
Stopping before Optimality:
We can also terminate the column generation process when we are close to the optimal solution,
in an effort to save computational time, by evaluating whether the objective function value of the
sub-problem is $\le \epsilon$ in Step 4. Here, $\epsilon$ represents the error tolerance, or the
permissible distance from the optimal solution at which we would wish to stop the process.
Improved Initial Basic Feasible Solution:
While solving the multi-team social lottery design problem using the above algorithm, it
is worthwhile to note that the initial basic feasible solution considered here is invalid for all practical
purposes if we wish to distribute the prize among more than one winner. This results in poor
convergence in the initial stages, as we would need to run at least $N'$ iterations to find a set of new
valid variables to replace the initial basic feasible solution. Hence, it would be beneficial
to consider an algorithm that improves the initial basic feasible solution. One strategy
is to include random groups of winners along with the singleton groups, which on their own
already guarantee a feasible solution. If there exists a convex
combination of these randomly generated groups along with the singleton groups such that the marginal
probability constraint is satisfied, then this would yield a better starting solution than
the singleton groups alone, whose starting objective value is always 0. Better yet, we can append
groups of winners that are highly likely to be in the optimal solution based on individual attributes
such as degree centrality or eigenvector centrality.
We can observe here that the contribution of any group of winners to the objective function
value is equal to the product of the group’s aggregate influencing capacity and the probability of the
group winning. Furthermore, the probability of any group is limited by the minimum of the marginal
probabilities of all winning households in the group. To clarify, the probability of any group winning
can never exceed the marginal probability of any winner in the group, as this would
violate the fairness constraint. As such, the minimum marginal probability of the winning households in
a group forms the upper bound for the probability that can be assigned to this group. Considering
this, we can generate groups that can make a high contribution to the objective function, and
hence are more likely to be in the optimal solution, using the heuristic detailed at the
end of this Section.
The idea here is to generate groups of winners with high influencing capacity that could
potentially be part of the optimal solution. We can accomplish this by sequentially adding individual
households based on the number of neighbors they have and their individual marginal probability.
Upon considering a set of households for a team, we can take the minimum marginal probability
of these households as the maximum probability that can be assigned to the team. Once this is done,
before finding the next set of households to be grouped, we reduce the marginal probability of
the households already considered by the maximum probability that can be assigned to the group
they are present in. By sequentially forming new candidate teams in this manner we can expect
to form $N' - S$ groups, all of which could be present in the optimal solution without violating any
constraints. However, the last set of $S$ winners that may be left out can only form a team when
all of their remaining probabilities are equal. In the unlikely event that this happens, we would not
only have a feasible solution but also a solution close to the optimal. While the $N' - S$ groups of
winners generated thus far may not form a feasible solution on their own, they are good candidates
to be included in the candidate list along with the singleton groups of winners detailed earlier.
The algorithm to generate such teams is as follows:
1: Initialize: let Current Winner = 0 and Current Team = 1.
2: If fewer than S households with a positive remaining marginal probability are left, stop.
Otherwise, search for the household with the highest product of its remaining marginal probability
and the number of neighbors connected to it, among the households not already included in the
group with index Current Team.
3: Add the household to the group Current Team.
4: Increment Current Winner and mark the household as used for Current Team.
5: If Current Winner = S, then reduce the marginal probability of every household in Current Team
by the minimum of the marginal probabilities of all households in Current Team, increment
Current Team and set Current Winner = 0.
6: Go to Step 2.
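The team-building heuristic can be sketched in Python as follows. This is one possible reading of the steps, under a few assumptions that are not stated in the text: ties are broken by household id, the procedure stops once fewer than S households with positive remaining probability are left, and exact fractions are used so that a remaining probability reaches exactly zero. The example data are hypothetical.

```python
from fractions import Fraction


def greedy_teams(marginal, degree, size):
    """Greedily form candidate teams of `size` households.

    marginal: household -> marginal probability
    degree:   household -> number of neighbors
    Returns a list of (team, probability cap) pairs.
    """
    remaining = dict(marginal)
    teams = []
    while True:
        available = [h for h in remaining if remaining[h] > 0]
        if len(available) < size:
            break  # not enough households left to fill a team
        # Highest (remaining probability x degree) first; ties by household id.
        available.sort(key=lambda h: (-remaining[h] * degree[h], h))
        team = tuple(sorted(available[:size]))
        cap = min(remaining[h] for h in team)  # largest probability the team can take
        for h in team:
            remaining[h] -= cap
        teams.append((team, cap))
    return teams


marginal = {0: Fraction(4, 10), 1: Fraction(3, 10),
            2: Fraction(2, 10), 3: Fraction(1, 10)}
degree = {0: 3, 1: 2, 2: 2, 3: 1}

for team, cap in greedy_teams(marginal, degree, 2):
    print(team, cap)
# (0, 1) 3/10
# (0, 2) 1/10
# (2, 3) 1/10
```

Each pass drives at least one household's remaining probability to zero, so the loop ends after at most as many passes as there are households. In this small example the last two households happen to have equal remaining probabilities, which is the special case in which the final team can also be formed.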
Chapter 5
Computational Experiments
5.1 Experimental Design
To evaluate the performance of this model, we considered two real-world social network data sets,
namely: (1) the Advogato trust metric [kon, 2014], converted to an undirected network, resulting in N
= 5,155 nodes and 39,285 edges. We have also removed the singleton nodes not connected to any
other node in the network, as these would not add any value to our study of awareness spread. (2)
The Enron email network [Leskovec et al., 2009], which is also converted to an undirected graph,
resulting in N = 36,692 nodes and 183,831 edges.
In a real-world scenario, it is reasonable to assume that only a small portion of all households are
willing to undertake considerable effort to save energy [Nayak A.V, 2013]. Furthermore, implementing the
proposed lottery design for a community where all the households are already contributing to energy
conservation may not be very effective. As such, we consider instances with different portions
of households contributing to energy savings. More specifically, for the Advogato network, we consider
the cases with the number of active savers $N'$ = 10%, 15%, 20%, 30%, 40%, 50% of $N$, and
for the Enron email network, $N'$ = 15%, 20% of $N$.
Also, we consider two distributions to depict the different amounts of energy saved by individual
households, namely a Chi-square distribution with 2 degrees of freedom and a bi-modal normal distribution.
Here, the Chi-square distribution is used assuming only a few households would be interested in, as well
as capable of, saving higher amounts of energy. The bi-modal normal distribution represents the case
where one segment of the community is better equipped to save energy and the other is not, while
individual interest in engaging in energy conservation may be normally distributed.
Lastly, the number of winners in a single lottery round is taken to be S = 1%N based on the
argument in Section 2, to ensure a resulting lottery prize amount which is significant enough to
attract the interest of households. The stopping criterion is based on a gap of $\epsilon = 0.00001$.
For instances with $N < 10{,}000$ nodes we find that a solution very close to optimal is reached
relatively quickly, within $3N$ iterations.
5.2 Random Lottery Design
To establish the effectiveness of optimizing the lottery design, we compute the influence spread in the
absence of the optimization model, in which case the probability of any group of households winning
is given by the product of the marginal winning probabilities of all the members of the group.
Enumerating all the group combinations to calculate the awareness spread from
each team is computationally hard. Also, when the number of active savers is considerably large, we
can expect the marginal probability of any single household to be very small. As such, we consider
an approximation technique to establish the influence spread value under random winner selection,
which involves assuming that the energy savings, and hence the marginal probabilities, of all
energy-saving households are equal. With this, we can calculate the probability of a certain number of
neighbors $s$ winning the lottery as that of a binomial random variable with parameters $p = 1/N'$
and $n = N$ taking the value $k = s$. Calculating the total expected awareness spread by way of this
approximation yields a value for each instance; running the program for close to 30 hours (wall time),
we find the results shown in the Appendix.
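The binomial calculation behind this approximation can be sketched in Python as below. The function names are illustrative, and the awareness function passed in as `gain` is hypothetical; which values of `n` and `p` to use for a given household is left to the caller, since the sketch only shows how a binomial probability is combined with an awareness function to obtain an expected gain.

```python
import math


def binomial_pmf(n, p, s):
    """Probability of exactly s successes in n independent trials."""
    return math.comb(n, s) * p**s * (1 - p) ** (n - s)


def expected_gain(n, p, gain):
    """Expected awareness gain when the number of winning neighbors is
    Binomial(n, p) and `gain(s)` maps that number to an awareness value."""
    return sum(binomial_pmf(n, p, s) * gain(s) for s in range(n + 1))


def capped_gain(s):
    """Hypothetical awareness function: 4 units per winning neighbor, capped at 10."""
    return min(4 * s, 10)


# Illustrative call with assumed parameters.
print(round(expected_gain(20, 0.05, capped_gain), 3))
```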
While we do not find any significant difference in computational efficiency between the two
distribution types, we observe that the computational time increases with the number of
winners to be selected and with the number of active savers. This can be understood as follows: the
number of constraints of the pricing problem increases with the size of the network, and the
feasible region grows with the number of winners to be selected. Results related to
the objective value, the binomial approximation and the improvement achieved through use of the
optimization model are shown in Tables (5.1 - 5.4).
| Active savers N' | Iterations completed | Awareness spread (Random Design) | Awareness spread (MTIMP) | Upper Bound | GAP | Optimization Effectiveness |
|---|---|---|---|---|---|---|
| 515 | 1243 | 29.93 | 40.58 | 40.58 | 0.00% | 35.60% |
| 773 | 4621 | 55.3 | 68.99 | 68.99 | 0.00% | 24.80% |
| 1030 | 5304 | 66.75 | 101.36 | 102.07 | 0.00% | 51.90% |
| 1546 | 5675 | 102.6 | 150.79 | 156.16 | 0.00% | 47.00% |
| 2061 | 6517 | 118.17 | 192.16 | 207.21 | 0.10% | 62.60% |
| 2577 | 5775 | 149.33 | 223.8 | 270.92 | 0.20% | 49.90% |

Table 5.1: Results of Computational Study on Advogato Network with Bi-modal Distribution for
Savings
| Active savers N' | Iterations completed | Awareness spread (Random Design) | Awareness spread (MTIMP) | Upper Bound | GAP | Optimization Effectiveness |
|---|---|---|---|---|---|---|
| 515 | 1323 | 29.93 | 42 | 42 | 0.00% | 40.30% |
| 773 | 5017 | 55.3 | 64.56 | 64.56 | 0.00% | 16.70% |
| 1030 | 6220 | 66.75 | 104.33 | 104.99 | 0.00% | 56.30% |
| 1546 | 6492 | 102.6 | 141.36 | 152.81 | 0.10% | 37.80% |
| 2061 | 5408 | 118.17 | 189.99 | 264.51 | 0.40% | 60.80% |
| 2577 | 2380 | 149.33 | 180.32 | 423.87 | 1.40% | 20.80% |

Table 5.2: Results of Computational Study on Advogato Network with Chi-square Distribution for
Savings
| Active savers N' | Iterations completed | Awareness spread (Random Design) | Awareness spread (MTIMP) | Upper Bound | GAP | Optimization Effectiveness |
|---|---|---|---|---|---|---|
| 5502 | 710 | 241.71 | 267.83 | 1552.11 | 4.80% | 10.80% |
| 7339 | 932 | 414.85 | 309.85 | 2260.42 | 6.30% | -25.30% |

Table 5.3: Results of Computational Study on Enron Email Network with Bi-modal Distribution
for Savings
| Active savers N' | Iterations completed | Awareness spread (Random Design) | Awareness spread (MTIMP) | Upper Bound | GAP | Optimization Effectiveness |
|---|---|---|---|---|---|---|
| 5503 | 1385 | 241.67 | 242.32 | 1551.21 | 5.40% | 0.30% |
| 7338 | 1036 | 414.9 | 317.83 | 2108.87 | 5.60% | -23.40% |

Table 5.4: Results of Computational Study on Enron Email Network with Chi-square Distribution
for Savings
5.3 Results
Through the plots of the lower bound and upper bound on the objective function value against the
number of iterations, one can observe that the upper bound drops drastically during the initial phase,
while the drop slows as the gap from optimality shrinks. Hence, starting with a higher-quality initial
feasible solution would enable reaching the optimal solution, or the desired gap from it, more quickly.
We can observe from Tables (5.1 - 5.4) that for the cases where the optimization model has reached
the stopping criterion or is otherwise close to optimal, there is a significant improvement in the
objective function value from adopting the optimization model over picking lottery winners through
random design. We find that the solution found through the column generation method is on
average 40% better than that of picking the winners based only on their marginal probability,
without considering the social network topology. This is a considerable improvement and would help
spread awareness about energy conservation much faster than traditional means. Furthermore,
this would have a cascading effect: with more households actively saving energy, the lottery
amount can be distributed among more households, ensuring a dramatic spread of awareness over
subsequent periods.
22
Chapter 6
Discussion
The mathematical model shown here can be applied to other problems of a similar nature as well,
wherein there is no implicit “activation” point for an individual to act based on influence gained from
their neighbors and the influence gained is a piecewise linear, monotonically increasing function. For
example, the model is applicable to brand awareness programs, where a company may be willing to
provide free samples to a few individuals who are generally interested in the company’s products and who
can evangelize a new product among their peers, improving brand reputation as well as awareness.
This is very similar to the lottery problem, as there is no purchase decision to be made, but rather
a degree of enthusiasm shown by individuals regarding a product. Furthermore,
since the selected individuals obtain a free sample, this can be construed as winning a lottery.
This would depend heavily on the nature of the product (obtaining a brand new cell phone would be
similar to winning a lottery, while obtaining free samples of shampoo may not be as attractive) and on
how the company plans to start the seeding process. In the case of inexpensive products, companies can
choose to provide gift hampers including an array of their products to improve the experience of the
individuals selected for seeding. For such a scenario, we could use the column generation approach
for MTIMP to solve for the optimal assignment of probabilities to teams of seeds to maximize the
expected awareness spread.
6.1 Limitations
For the sake of our study we have considered the number of winners to be fixed and known. Also,
the total savings are distributed equally among all winners. This is a limitation, since
winners of higher amounts may spread more awareness, and as such it may be beneficial to allot
an appropriate amount to each winner to maximize the awareness spread. Incorporating this aspect
would make the number of winners and the amounts won by them decision variables, which
would further complicate the problem but is worth considering in future research. This also implies
that the function considered here to depict the influence gain would be different for each winner
depending on the amount won.
For the influence spread, we limit ourselves to considering the awareness spread delivered directly
by the winners. In reality, however, we can expect people to be influenced via more distant connections
as well. Our model can be easily updated with additional constraints to include influencing
effects from friends of friends. However, this would increase the computational time required to
solve the problem.
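One way to picture the friends-of-friends extension is to count, for each household, the winners at distance one and at distance two. The sketch below does this with a discount on second-hop winners. The weight second_hop_weight and the function name two_hop_exposure are hypothetical and are not part of the model in this thesis.

```python
from typing import Dict, FrozenSet, Set


def two_hop_exposure(node: str, team: FrozenSet[str],
                     neighbors: Dict[str, Set[str]],
                     second_hop_weight: float = 0.5) -> float:
    """Count winners adjacent to node plus discounted winners two hops away."""
    direct = neighbors[node] & team
    # Collect friends of friends, excluding the node and its direct friends.
    second_hop: Set[str] = set()
    for friend in neighbors[node]:
        second_hop |= neighbors[friend]
    second_hop -= neighbors[node]
    second_hop.discard(node)
    indirect = second_hop & team
    return len(direct) + second_hop_weight * len(indirect)
```

The returned exposure could then be passed to the awareness gain function in place of the plain count of winning neighbors. Because second-hop neighborhoods are typically much larger than direct ones, each household's constraint would involve many more candidate winners, which is consistent with the increase in computational time noted above.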
In our model, we have assumed that the initial awareness or level of interest of all households
is zero, ignoring the fact that some of the households actively saved energy. This is in consideration of
the fact that many households may accidentally end up saving energy over short periods; for example,
they may be away on vacation for most of the period. So, assuming that a certain household
is fully aware of energy conservation techniques because it saved energy over short
periods may not be very accurate. To know with confidence whether the energy savers are saving
energy consciously, we would need to collect data over many periods, determine which participants
are consistently performing well, and incorporate a higher initial awareness level in our model to
accurately depict the scenario of some households exhibiting a higher awareness level before the
lottery prize is distributed.
While our model is very similar to that of [Deroïan, 2002], the details of this paper are limited to
one time step. However, by updating the initial awareness level of all households and accounting for any
changes to the social network, we can repeatedly use the above method to maximize the influence spread.
6.2 Future Scope
The aforementioned model can be further augmented with online social networking systems to enable
increased social pressure and support. Also, since we are aware of the benefits of providing incentives
over seeding strategies [Aral et al., 2011], it would be worthwhile to consider the idea of an online
system to monitor friendship status as well as the savings achieved by individuals. Referring new
households can be encouraged with an increase in the chance of winning the lottery. While this would
no doubt improve social pressure and overall behavior, it may also prove to be too intrusive and
may adversely affect our objective of improving energy conservation.
Since the computational time reduces drastically with a better initial feasible solution, it may
be worthwhile to consider improved heuristics for determining the initial basic feasible solution. Also,
in the model described here we establish the probability of a group of households winning.
Alternatively, a method can be developed to sequentially pick winners under the same premise of
ensuring fairness to all households.
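As one possible starting point for such a sequential method, the sketch below uses systematic sampling with probability proportional to size, a standard survey-sampling technique that is swapped in here for illustration and is not the method developed in this thesis. It picks a fixed number of winners in a single pass so that each household's chance of winning is proportional to its savings. It assumes non-negative savings, guarantees only the fairness requirement, and makes no attempt to maximize awareness spread; the name systematic_pps and its parameters are hypothetical.

```python
import random
from typing import Dict, List, Optional


def systematic_pps(savings: Dict[str, float], num_winners: int,
                   start: Optional[float] = None) -> List[str]:
    """Pick num_winners households in one sequential pass.

    Household i is selected with probability num_winners * s_i / sum(s),
    which requires that no single share exceeds 1 / num_winners.
    """
    total = sum(savings.values())
    if total <= 0:
        raise ValueError("total savings must be positive")
    incl = {h: num_winners * s / total for h, s in savings.items()}
    if any(p > 1.0 + 1e-9 for p in incl.values()):
        raise ValueError("a household's share is too large for proportional odds")
    # One uniform draw fixes the selection points u, u + 1, ..., u + S - 1.
    point = random.random() if start is None else start
    winners: List[str] = []
    cumulative = 0.0
    for household, p in incl.items():
        cumulative += p
        # Each household owns an interval of length p <= 1, so it can
        # contain at most one selection point.
        if point < cumulative and len(winners) < num_winners:
            winners.append(household)
            point += 1.0
    return winners
```

The order in which households are visited does not change the individual winning probabilities, but it does change which groups can win together, so the ordering is where network position could be taken into account.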
Chapter 7
Conclusion
We believe the model developed here can be effectively used to promote efficient energy usage
behavior in society. The general disinterest towards saving energy on a routine basis can be
overcome by adopting such engaging and involving programs. We have shown that the optimization
technique developed herein can improve the awareness spread to a considerable degree as opposed
to picking households as winners based on their individual savings alone. The mathematical
model developed here provides many insights into the structure of such a problem, which can be adopted
for future research. The computational study demonstrates that the model can be readily solved for
small communities, but the computational time increases with the number of nodes and
winners to be selected. However, considering that the problem need only be solved once per period,
the computational time may not be the most challenging aspect of implementing such a social lottery
design. Lastly, the concept of social lotteries to encourage some behavior has the potential to be
applied to a wide variety of industry problems, especially those dealing with customers connected
through a social network.
Bibliography
[App, 2009] (2009). The effect of energy efficiency programs on electric utility revenue requirements
– http://www.publicpower.org/files/pdfs/effectofenergyefficiency.pdf.
[kon, 2014] (2014). Advogato network dataset – KONECT –
http://konect.uni-koblenz.de/networks/advogato.
[USE, 2014] (2014). Nov 2014 monthly energy review. Technical report, U.S. Energy Information
Administration.
[Angst et al., 2010] Angst, C. M., Agarwal, R., Sambamurthy, V., and Kelley, K. (2010). Social
contagion and information technology diffusion: the adoption of electronic medical records in U.S.
hospitals. Management Science, 56(8):1219–1241.
[Aral et al., 2011] Aral, S., Muchnik, L., and Sundararajan, A. (2011). Engineering social
contagions: Optimal network seeding and incentive strategies. In Winter Conference on Business
Intelligence.
[Backstrom et al., 2006] Backstrom, L., Huttenlocher, D., Kleinberg, J., and Lan, X. (2006). Group
formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th
ACM SIGKDD international conference on Knowledge discovery and data mining, pages 44–54.
ACM.
[Bass, 1969] Bass, F. (1969). A new product growth model for consumer durables. Management
Science, 15(5).
[Bazaraa et al., 2011] Bazaraa, M. S., Jarvis, J. J., and Sherali, H. D. (2011). Linear programming
and network flows. John Wiley & Sons.
[Becker, 1970] Becker, M. H. (1970). Sociometric location and innovativeness: Reformulation and
extension of the diffusion model. American Sociological Review, pages 267–282.
[Bolton et al., 2005] Bolton, G. E., Brandts, J., and Ockenfels, A. (2005). Fair procedures: Evidence
from games involving lotteries. The Economic Journal, 115(506):1054–1076.
[Borgatti, 2006] Borgatti, S. P. (2006). Identifying sets of key players in a social network.
Computational & Mathematical Organization Theory, 12(1):21–34.
[Brooks, 1957] Brooks, R. C. (1957). Word-of-mouth advertising in selling new products. The
Journal of Marketing, pages 154–161.
[Brown and Reingen, 1987] Brown, J. J. and Reingen, P. H. (1987). Social ties and word-of-mouth
referral behavior. Journal of Consumer research, pages 350–362.
[Claycamp and Liddy, 1969] Claycamp, H. J. and Liddy, L. E. (1969). Prediction of new product
performance: An analytical approach. Journal of Marketing Research, pages 414–420.
[Cosley et al., 2010] Cosley, D., Huttenlocher, D. P., Kleinberg, J. M., Lan, X., and Suri, S. (2010).
Sequential influence models in social networks. ICWSM, 10:26.
[Deroïan, 2002] Deroïan, F. (2002). Formation of social networks and diffusion of innovations.
Research Policy, 31(5):835–846.
[Dichter, 1966] Dichter, E. (1966). How word-of-mouth advertising works. Harvard business review,
44(6):147–160.
[Domingos and Richardson, 2001] Domingos, P. and Richardson, M. (2001). Mining the network
value of customers. In Proceedings of the seventh ACM SIGKDD international conference on
Knowledge discovery and data mining, pages 57–66. ACM.
[Eagle et al., 2009] Eagle, N., Pentland, A. S., and Lazer, D. (2009). Inferring friendship
network structure by using mobile phone data. Proceedings of the National Academy of Sciences,
106(36):15274–15278.
[Falk et al., 2008] Falk, A., Fehr, E., and Fischbacher, U. (2008). Testing theories of
fairness–intentions matter. Games and Economic Behavior, 62(1):287–303.
[Gale and Kariv, 2003] Gale, D. and Kariv, S. (2003). Bayesian learning in social networks. Games
and Economic Behavior, 45(2):329–346.
[Goyal et al., 2010] Goyal, A., Bonchi, F., and Lakshmanan, L. V. (2010). Learning influence
probabilities in social networks. In Proceedings of the third ACM international conference on Web
search and data mining, pages 241–250. ACM.
[Granovetter, 1978] Granovetter, M. (1978). Threshold models of collective behavior. American
journal of sociology, pages 1420–1443.
[Granovetter, 1973] Granovetter, M. S. (1973). The strength of weak ties. American journal of
sociology, pages 1360–1380.
[Katz, 1957] Katz, E. (1957). The two-step flow of communication: An up-to-date report on an
hypothesis. Public opinion quarterly, 21(1):61–78.
[Kempe et al., 2003] Kempe, D., Kleinberg, J., and Tardos, É. (2003). Maximizing the spread of
influence through a social network. In Proceedings of the ninth ACM SIGKDD international
conference on Knowledge discovery and data mining, pages 137–146. ACM.
[Kempe et al., 2005] Kempe, D., Kleinberg, J., and Tardos, É. (2005). Influential nodes in a diffusion
model for social networks. In Automata, languages and programming, pages 1127–1138. Springer.
[Lazer et al., 2009] Lazer, D., Pentland, A. S., Adamic, L., Aral, S., Barabasi, A. L., Brewer, D.,
Christakis, N., Contractor, N., Fowler, J., Gutmann, M., et al. (2009). Life in the network: the
coming age of computational social science. Science (New York, NY), 323(5915):721.
[Leskovec et al., 2009] Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. (2009).
Community structure in large networks: Natural cluster sizes and the absence of large well-defined
clusters. Internet Mathematics, 6(1):29–123.
[Macy, 1991] Macy, M. W. (1991). Chains of cooperation: Threshold effects in collective action.
American Sociological Review, pages 730–747.
[Mahajan et al., 1984] Mahajan, V., Muller, E., and Sharma, S. (1984). An empirical comparison
of awareness forecasting models of new product introduction. Marketing Science, 3(3):179–197.
[Nayak A.V, 2013] Nayak A.V, Nikolaev A.G, J. M. (2013). Increasing the energy conservation
awareness using the influential power of a lottery system.
[Reichheld, 2003] Reichheld, F. F. (2003). The one number you need to grow. Harvard business
review, 81(12):46–55.
[Richardson and Domingos, 2002] Richardson, M. and Domingos, P. (2002). Mining
knowledge-sharing sites for viral marketing. In Proceedings of the eighth ACM SIGKDD international
conference on Knowledge discovery and data mining, pages 61–70. ACM.
[Romero et al., 2011] Romero, D. M., Meeder, B., and Kleinberg, J. (2011). Differences in the
mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion
on twitter. In Proceedings of the 20th international conference on World wide web, pages 695–704.
ACM.
[Shi et al., 2009] Shi, X., Zhu, J., Cai, R., and Zhang, L. (2009). User grouping behavior in
online forums. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge
discovery and data mining, pages 777–786. ACM.
[Valente, 1995] Valente, T. W. (1995). Network models of the diffusion of innovations, volume 2.
Hampton Press Cresskill, NJ.
[Zachary, 1977] Zachary, W. W. (1977). An information flow model for conflict and fission in small
groups. Journal of anthropological research, pages 452–473.
Chapter 8
Appendix
The graphs plot the objective function values obtained at each iteration of the column generation
approach to solving the MTIMP. The upper and lower bounds on the objective function value at
any iteration, denoted by orange and blue dots respectively, represent the maximum and minimum
awareness spread one can expect with the existing candidate list of variables at that iteration. The
black line indicates the objective function value obtained through the random design and
marks the threshold beyond which the column generation approach can be considered more
effective at spreading awareness through the network. One can observe from these plots that the number
of iterations required to reach a given gap from optimality increases with N and with the
corresponding increase in S. Furthermore, even though the initial basic feasible
solution may have an objective function value lower than that of the random design, once the gap
is sufficiently small, the objective function value of the column generation approach is approximately
40% higher than that of the random design.
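The notion of a gap from optimality used when reading these plots can be sketched as follows. This is an assumption for illustration: the relative gap is taken here as (upper bound - lower bound) / upper bound, which is a common convention for maximization problems, and the thesis may define the gap differently. The function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def relative_gap(lower: float, upper: float) -> float:
    """Relative optimality gap for a maximization problem."""
    if upper <= 0:
        raise ValueError("upper bound must be positive")
    return (upper - lower) / upper


def first_iteration_within(bounds: Iterable[Tuple[float, float]],
                           tol: float) -> Optional[int]:
    """Return the 1-based iteration at which the gap first falls to tol or below."""
    for iteration, (lower, upper) in enumerate(bounds, start=1):
        if relative_gap(lower, upper) <= tol:
            return iteration
    return None  # the tolerance was never reached
```

Given the per-iteration (lower bound, upper bound) pairs read off a plot, first_iteration_within reports how many iterations were needed to reach a chosen tolerance, which is the quantity compared across instances above.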
Figure 8.1: Results of Computational Study on Advogato Network - Instance 1
Figure 8.2: Results of Computational Study on Advogato Network - Instance 2
Figure 8.3: Results of Computational Study on Advogato Network - Instance 3
Figure 8.4: Results of Computational Study on Advogato Network - Instance 4
Figure 8.5: Results of Computational Study on Advogato Network - Instance 5
Figure 8.6: Results of Computational Study on Advogato Network - Instance 6
Figure 8.7: Results of Computational Study on Advogato Network - Instance 7
Figure 8.8: Results of Computational Study on Advogato Network - Instance 8
Figure 8.9: Results of Computational Study on Advogato Network - Instance 9
Figure 8.10: Results of Computational Study on Advogato Network - Instance 10
Figure 8.11: Results of Computational Study on Advogato Network - Instance 11
Figure 8.12: Results of Computational Study on Advogato Network - Instance 12
Figure 8.13: Results of Computational Study on Enron Email Network - Instance 1
Figure 8.14: Results of Computational Study on Enron Email Network - Instance 2
Figure 8.15: Results of Computational Study on Enron Email Network - Instance 3
Figure 8.16: Results of Computational Study on Enron Email Network - Instance 4

Master Thesis - A Column Generation Approach to Solve Multi-Team Influence Maximization Problem for Social Lottery Design

  • 1. A Column Generation Approach to Solve Multi-Team Influence Maximization Problem for Social Lottery Design by Manjunath Holaykoppa Nanjunda Jois 02-01-2015 A thesis submitted to the Faculty of the Graduate School of the State University of New York at Buffalo in partial fulfillment of the requirements for the degree of Master of Science Department of Industrial and Systems Engineering
  • 2. Acknowledgements I would like to express my special thanks of gratitude to my Advisor Dr. Alexander Nikolaev and Co-Advisor Dr. Jose Walteros for their continuous support and guidance throughout the research work. ii
  • 3. Contents Acknowledgements ii List of Figures iv List of Tables v Abstract vi 1 Introduction 1 2 Literature Review 3 3 Influence Maximization for Social Lottery Design: Intuition and Problem For- mulation 9 3.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2 Calculation of Influencing Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.3 Multi-Team Influence Maximization Problem . . . . . . . . . . . . . . . . . . . . . . 12 4 A Column Generation Approach for MTIMP 14 4.1 Framework Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 4.2 Initial Basic Feasible Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 4.3 Master Problem for MTIMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.4 Pricing Problem (Sub Problem) for MTIMP . . . . . . . . . . . . . . . . . . . . . . . 15 4.5 A Column Generation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 4.6 Further Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 5 Computational Experiments 20 5.1 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 5.2 Random Lottery Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 6 Discussion 23 6.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 6.2 Future Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 7 Conclusion 25 Bibliography 26 8 Appendix 29 iii
  • 4. List of Figures 2.1 Empirical Evidence of a Diminishing Returns Phenomenon in Social Influence . . . . 6 3.1 Social Network Depicting Active Energy Savers in a Community . . . . . . . . . . . 10 3.2 Awareness Spread with Different Groups of Winners . . . . . . . . . . . . . . . . . . 10 3.3 Piecewise Linear Function to Model Awareness Gain Related to Winning Neighbors 11 8.1 Results of Computational Study on Advagato Network - Instance 1 . . . . . . . . . . 30 8.2 Results of Computational Study on Advagato Network - Instance 2 . . . . . . . . . . 31 8.3 Results of Computational Study on Advagato Network - Instance 3 . . . . . . . . . . 32 8.4 Results of Computational Study on Advagato Network - Instance 4 . . . . . . . . . . 33 8.5 Results of Computational Study on Advagato Network - Instance 5 . . . . . . . . . . 34 8.6 Results of Computational Study on Advagato Network - Instance 6 . . . . . . . . . . 35 8.7 Results of Computational Study on Advagato Network - Instance 7 . . . . . . . . . . 36 8.8 Results of Computational Study on Advagato Network - Instance 8 . . . . . . . . . . 37 8.9 Results of Computational Study on Advagato Network - Instance 9 . . . . . . . . . . 38 8.10 Results of Computational Study on Advagato Network - Instance 10 . . . . . . . . . 39 8.11 Results of Computational Study on Advagato Network - Instance 11 . . . . . . . . . 40 8.12 Results of Computational Study on Advagato Network - Instance 12 . . . . . . . . . 41 8.13 Results of Computational Study on Enron Email Network - Instance 1 . . . . . . . . 42 8.14 Results of Computational Study on Enron Email Network - Instance 2 . . . . . . . . 43 8.15 Results of Computational Study on Enron Email Network - Instance 3 . . . . . . . . 44 8.16 Results of Computational Study on Enron Email Network - Instance 4 . . . . . . . . 45 iv
  • 5. List of Tables 5.1 Results of Computational Study on Advagato Network with Bi-modal Distribution for Savings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 5.2 Results of Computational Study on Advagato Network with Chi-square Distribution for Savings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 5.3 Results of Computational Study on Enron Email Network with Bi-modal Distribution for Savings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 5.4 Results of Computational Study on Enron Email Network with Chi-square Distribu- tion for Savings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 v
  • 6. Abstract The conventional Influence Maximization problem is the problem of finding such a team (a small subset) of seed nodes in a social network that would maximize the spread of influence over the whole network. This paper considers a lottery system aimed at maximizing the awareness spread to promote energy conservation behavior as a stochastic Influence Maximization problem with the constraints ensuring lottery fairness. The resulting Multi-Team Influence Maximization problem involves assigning the probabilities to multiple teams of seeds (interpreted as lottery winners) to maximize the expected awareness spread. Such a variation of the Influence Maximization problem is modeled as a Linear Program; however, enumerating all the possible teams is a hard task considering that the feasible team count grows exponentially with the network size. In order to address this challenge, we develop a column generation based approach to solve the problem with a limited num- ber of candidate teams, where new candidates are generated and added to the problem iteratively. We adopt a piecewise linear function to model the impact of including a new team so as to pick only such teams which can improve the existing solution. We demonstrate that with this approach we can solve such influence maximization problems to optimality, and perform computational study with real-world social network data sets to showcase the efficiency of the approach in finding lottery designs for optimal awareness spread. Lastly, we explore other possible scenarios where this model can be utilized to optimally solve the otherwise hard to solve influence maximization problems. vi
  • 7. Chapter 1 Introduction The efficient consumption of energy is an important issue considering its impact on the environ- ment and the economy. Residential energy use accounts for 21% of total U.S energy consumption [USE, 2014], and utility companies are held responsible for promoting efficient energy utilization behavior with its consumers. Energy conscious consumer behavior, however, can never result in any decrease in revenue generated through energy sales as government regulates energy prices to offset any revenue loss. By incentivizing energy savings, the utility companies are able to serve more customers with the same power output and the same profitability [App, 2009], making incentive program design an important research and investment area. Several Energy efficiency programs have been developed by major U.S. companies: e.g., such as Google Powermeter, Microsoft Holm, Hug Energy, Tendril Energy efficiency and O-Power. Failures of such programs can be attributed to the low consumer interest in the use of complex technology and procedures. However, some programs have shown positive results by engaging the consumers on a social level without the need for any technical skills [Nayak A.V, 2013]. With this in mind, we develop the idea of a lottery system to promote the efficient energy utilization by developing a scalable mathematical model to solve such a problem to optimiality, i.e., ensuring the maximal spread of energy saving awareness. The Lottery System Framework The key concept of the lottery system is to aggregate energy savings of small groups and award it to few households depicting desired energy conservation behavior ensuring the resultant prize amount is significant enough to create interest and increase awareness in a community. 
To implement such a lottery system, we begin by informing the residents of the selected community about the lottery program and the fact that they have higher chances of winning a lottery prize of significant amount by contributing to more energy savings. At the end of each period a group of winners shall be selected simultaneously and the pooled prize amount is distributed equally among them. Upon distributing the lottery prize in this manner, we can expect the winners to share their experience with their neighbors resulting in spread of awareness. We believe that with increased awareness about the program, more and more households would be motivated to reduce their energy consumption in solicitation of winning such a lottery in the future. Since we are considering groups of winners, if we wish to select S winners from a given community with N households, we would have N S groups of winners to choose from. The problem of interest here would be to find the optimal assignment of the winning probabilities to all groups such that the awareness spread is maximized in expectation, while ensuring that the probability of winning for any individual household winning is proportional to their energy savings level. 1
  • 8. Manjunath Holaykoppa Nanjunda Jois Introduction To successfully solve for the best assignment of probabilities to multiple teams (groups of win- ners), we would need to ensure two things, namely: (1) Lottery system is fair and the probability of a participant winning increases with increase in their share of contribution to the total savings pool, (2) The combination of winners are suitably located in the social network to maximize awareness spread. In order to accomplish these two tasks we develop a mathematical model to accurately depict the spread of awareness and provide ways to maximize the same. Considering that influence maximization problems are generally hard to solve [Kempe et al., 2003], we try to develop an elegant model with simple dynamics of influence spread while ensuring that the model corresponds to real world social dynamics as closely as possible. While the problem of energy conservation considered serves exactly this purpose, we believe this model would be applicable to other problems of similar nature as well. Also, since the importance of energy conservation is a well established concept and since we are aware of the existing methodology adopted by utility companies, we shift our attention towards the development of an efficient mathematical model, which is the primary concern of this paper. In an effort to model such a lottery system we study existing literature to understand the following aspects: (1) How to model awareness spread related to energy conscious behavior over a social network? (2) How to solve such a model to obtain an optimal solution that maximizes the spread of awareness? The paper consists of the following sections. Chapter 2 contains literature review related to spread of influence in a social network and existing methods towards identifying the best set of nodes for maximizing the awareness spread. In Chapter 3 we define the problem of social lottery design and develop a linear program to solve the same. 
Chapter 4 elaborates on column generation method to solve large real world instances of the problem. Chapter 5 contains computational experiments on empirical data-sets. In Chapter 6 we discuss further about limitations and scope for future works. 2
  • 9. Chapter 2 Literature Review Relevance of “Word-of-mouth” Communications for Adoption of New Product or Idea. The first part of our search leads us back to the experimental marketing research of the 1950s that focused on modeling the diffusion and adoption of new products in the market. These efforts were primarily aimed at improving marketing methodologies by considering the effect of “word-of-mouth” communications, which was established to be a key factor in convincing customers to make a purchase decision [Brooks, 1957]. Contrary to the popular belief that technology alienates individuals from society and norms of conformance, it was found that new generations, much less fixed by traditions, are all the more dependent on social conformance to “keep in tune with the life style of the moment”. Furthermore, the need for such consensual validation was observed to be more pronounced in the early introductory period of a new product in the market. Most of the earlier research focused on the individual attributes and behavior of users and generally suffered from a lack of understanding of network phenomena. The study on the integration of network and diffusion models [Valente, 1995] opened up a broad array of research topics in which influence spread was studied not in isolation but together with the underlying network topology. One of the first models of this kind also focused on improving sales by marketing not only to those customers who would be expected to buy a new product but also to customers with a higher network value, that is, those who could influence others in their network to make a purchase decision [Domingos and Richardson, 2001] [Richardson and Domingos, 2002].
In these papers, it is established that considering the potential of individual entities in a network to adopt a certain idea or product, without considering their network position and how they can influence other entities in the network, results in a sub-optimal solution when trying to maximize profit from sales. Hence, it is important to consider the network structure along with individual attributes when trying to maximize the expected growth in awareness. Contribution of Winners in Spreading Awareness During the 1950s, it was found that the adoption of a new idea or product generally follows a “two-step flow of communication” wherein any propaganda is first evaluated and adopted by opinion leaders, who in turn motivate the less active sections of the population to follow suit [Katz, 1957]. Note that interpersonal relations can be viewed as (1) channels of information flow, (2) channels of social pressure, or (3) channels of social support. Apart from the conventional factors that result in an entity being influenced by peers, the author describes a tendency of undecided entities to adopt an idea as a result of the “Influencee” wanting to be as much like the “Influencer” as possible.
  • 10. This strengthens the premise that an environment-protecting lottery system should be fair, i.e., favor active energy savers. In other words, the households that save more energy should be more likely to win. As such, they would play an effective role in influencing others, as they would be perceived to be in a more desirable position than non-savers. Another study from this period reveals that “No material interest is involved in the recommendation” is the most basic motivation for the “listener” in accepting and acting on any recommendation [Dichter, 1966]. This generally holds true in the setting of energy saving: a winner sharing their experience of participating in the program and winning the lottery has no material interest in doing so and hence is well positioned to spread awareness, e.g., in comparison with direct marketing options. Effect of Awareness Gain towards Higher Engagement and Improved Energy Conscious Behavior The popular Bass Model [Bass, 1969] depicts how the timing of a consumer’s initial purchase is correlated with the number of previous buyers in the society. The general diffusion model proposed by Bass provides a widely accepted framework for forecasting the adoption of a new product: when potential consumers do not have an objective basis on which to evaluate the product, they wait for a general acceptance of the product in the community. While this was one of the first models to consider consensual validation as a factor in the adoption of a new product or idea, there have been many other studies [Granovetter, 1973][Granovetter, 1978] [Kempe et al., 2003] [Brown and Reingen, 1987] that demonstrate the validity of herd behavior in human decision making.
Also, it is found that one can expect a “system delay” in the adoption of risky ideas, wherein the stable influence adoption configuration delays accepting the risky innovation until others have demonstrated its feasibility through prior adoption [Becker, 1970]. Considering that participating in the lottery system does not entail any material risk, we need not consider this aspect while modeling our problem. Hence, we can conclude that the willingness to adopt a new idea increases with social pressure and with the number of connected peers already adopting the idea. As such, for modeling the lottery system, we can assume that the degree of engagement of a household increases in proportion to the awareness brought about by the connected peers who are already actively saving energy. Since, by design of the lottery system, winners are more likely to exhibit energy-conscious behavior, we can justifiably assume that the more winners an individual finds close by, the more their awareness and degree of engagement towards saving energy can be expected to increase. Impact of Multiple Winning Neighbors on Awareness Gain of Individual Entities The analytical model aimed at predicting market responses to new consumer products utilizes a model of “diminishing returns to media weight” [Claycamp and Liddy, 1969] and has been found to closely match real-world market responses [Mahajan et al., 1984]. To elaborate, each additional exposure of an individual to the same or a similar kind of marketing yields a smaller incremental influence gain. This relates well to our problem, in which any household may have more than one winner in its neighborhood. We would like to incorporate the concept of diminishing returns on awareness gain as the number of winning neighbors increases, since we may expect the information shared by different winners to overlap, reducing the new information obtained each time.
Alternatively, this phenomenon can be expressed in terms of the level of interest in adopting a new product or idea, which might increase substantially the first few times a person is introduced to the idea, while repeated exposure to the same information becomes redundant and less effective.
  • 11. A study reveals that the probability of an individual participating increases with the perceived number of participants already existing in the network [Granovetter, 1978]. However, in another study we note that a small group of highly resourceful and interested actors may contribute to a cause regardless of their own personal gain if they believe that their actions can make a positive change to the community [Macy, 1991]. In this model, as opposed to each individual deciding successively based on the number of households already saving energy in their neighborhood, all participants choose whether to participate based on their perceived level of existing participation and their personal interests. The aforementioned article elaborates on the differences between individuals deciding simultaneously based on what they can perceive at a given point in time (parallel choice) and individuals deciding one after the other, with each person’s decision having an impact on the next one in the chain (serial choice). While there is a considerable difference between the two models, the results of simulation experiments indicate that the expected level of contribution in both cases increases in the beginning, when there are fewer adopters of the idea, but levels off in the end. Modeling Influence Spread and Awareness Gain over the Network While the influence gained by an individual can be attributed to many factors, we can consider one of the most important factors: an enthusiastic recommendation from a neighboring household upon winning the lottery prize. The number of people willing to communicate with their peers about a product is considered the single most significant factor in evaluating customer retention and the degree of acceptance of a product in a community [Reichheld, 2003].
With the understanding that winning households play a key role in spreading awareness among their neighbors, we try to establish the degree to which one or more winning neighbors contribute to the increase in awareness gained by an individual in the network. As mentioned earlier, it is worthwhile to consider a diminishing returns concept to model the awareness gain. An analysis of historical data from Wikipedia, where social interactions are defined by editors writing on the user-talk pages of others [Cosley et al., 2010], confirms that the probability of an individual editing an article increases with the number of their connections who are previous editors, as shown in Figure (2.1.a). We can observe here and in other studies of a similar nature [Shi et al., 2009] [Angst et al., 2010] [Goyal et al., 2010] that the probability of a user acting on a certain aspect increases drastically when the first few of their neighbors adopt this behavior. However, the increase in probability quickly levels off, with little to no effect from the actions of the remaining users in the network. As such, we can consider a monotonically increasing piecewise linear function of the number of winning neighbors, defined over a small number of neighbors, under which the awareness gain exhibits diminishing returns as the number of winning neighbors increases. In another study on the adoption of Twitter hashtags [Romero et al., 2011], it is found that repeated exposures to a hashtag continue to have significant marginal effects on increasing the probability of adoption, as shown in Figure (2.1.c). It is also shown that the probability of an individual joining a community is influenced by the number of friends he or she has within the community [Backstrom et al., 2006], as shown in Figure (2.1.b).
A study of user grouping behavior across different online forums also confirms the law of diminishing returns [Shi et al., 2009]; its results are shown in Figure (2.1.d).
  • 12. Figure 2.1: Empirical Evidence of a Diminishing Returns Phenomenon in Social Influence. Top Left: Probability of editing an article on Wikipedia as a function of the number of previous editors who are also connections through an interaction network. Top Right: Probability of joining a LiveJournal community as a function of the number of friends already in the community. Bottom Left: Fraction of users who adopt a hashtag after their Kth exposure to it. Bottom Right: Probability of a user joining a community as a function of the number of reply friends who are active in that community. Degree of Engagement as a Continuous Parameter as Opposed to Binary Choice of Participation In a study aimed at understanding diffusion delays or failures associated with some innovations, a simple model is set up in which the level of belief regarding an idea increases, up to a threshold value, with every social interaction confirming it [Deroïan, 2002]. In other words, as opposed to the binary choice prevalent in the scenario of purchasing a new product, in this model the state of each agent is given by an expected utility that describes the disposition to pay for adopting the new technology, from bad to good appreciation. This conforms well with the concept of participating in the energy conservation program, as the desired behavior of a participant cannot be construed as a decision to participate or not; rather, it is a degree of awareness or inclination towards making efforts to save energy. As such, we could adopt a continuous parameter to account for this level of inclination of the participant, and the final objective of this paper is to maximize the degree of awareness subject to a constraint that ensures lottery fairness.
  • 13. Ideal Number of Winners to be Selected Assuming the influence spread is correlated with the winning amount, we can model the problem to find an optimal number of winners. However, this would increase the complexity of the problem of assigning probabilities of winning to groups of households. Furthermore, based on a simulation study on a large-scale network [Aral et al., 2011], it is found that seeding more than 0.2% of the population is wasteful because the gain from their adoption is lower than the gain from their natural adoption (without seeding). Hence, we would like to limit this parameter to a fixed value. Also, a survey on “Energy Awareness Spread” shows that the majority of individuals would consider $1,000 a significant prize amount that could motivate them to undertake additional energy saving efforts. Assuming that each household exhibiting energy-conscious behavior saves at least $10 on average, a $1,000 prize corresponds to the pooled savings of about 100 households. It would therefore be counterproductive to select more than 1% of such a population as winners, as this would reduce the winning amount and may result in a lack of interest even from winning households [Nayak A.V, 2013]. This survey also shows that only a small portion of a population would be interested in actively saving energy to begin with. Hence, we take 1% of the households contributing energy savings as the number of winners to be selected for the problem under consideration. Importance of Fairness for Positive Awareness Spread Lastly, we consider the aspect of fairness, which may very well be the most important aspect of this paper and of the development of the mathematical model.
Intuitively, fairness intentions seem to play an important role in economic relations, political struggles, and legal disputes, yet there is surprisingly little direct evidence for their behavioral importance. A study on this topic indicates that the attribution of fairness intentions is important in the domains of both negatively and positively reciprocal behavior [Falk et al., 2008]. This study shows that, when judging the fairness of an action, people take into account not only its distributive consequences but also the intention it signals. Another study specific to lottery systems indicates that results produced by an unbiased procedure tend to be more acceptable than those produced by a biased procedure [Bolton et al., 2005]. In other words, people are willing to impose a cost on both themselves and others to resist procedures that they deem to be biased against them. Establishing the Social Network Structure While we have assumed that the underlying social network of the community being considered is readily available, this may not always be the case. In modern times, with the extensive use of cell phones and Internet connectivity, it may be fairly easy to establish the social network of a community. It has been found that a friendship network can be established from cell phone records with a high degree of accuracy [Eagle et al., 2009]. By aggregating the available digital footprints of online users, we can infer the network structure and observe macroscopic behavior. This is arguably one of the most important aspects of computational social science [Lazer et al., 2009]. There also exist many apps intended to be used as tools for social bonding while performing routine activities such as fitness training, shopping, and eating. These apps generally collect data pertaining to an individual’s habits, behavior, and social connections.
The underlying social network of a community, as related to energy-conscious behavior, can be mapped by apps developed for this purpose.
  • 14. Other Popular Models Not Applicable to Awareness Spread through Lottery System Apart from the aforementioned articles, there are other notable publications on the subject of influence maximization modeling [Kempe et al., 2005] [Borgatti, 2006] which are not considered here in detail, for two reasons: (1) these methods typically construe influence propagation through the activation of individuals in the network, whereas our problem is better modeled with a continuous degree of awareness or participation; (2) most of these methods are computationally intractable and only provide heuristics to arrive at an approximate solution. The focus of this paper is to develop a practical model that is readily usable to increase awareness spread in a community of interest.
  • 15. Chapter 3 Influence Maximization for Social Lottery Design: Intuition and Problem Formulation 3.1 Problem Description With reference to the framework of the lottery system described in Chapter 1, we need to assign probabilities to all possible sets of a fixed number of winners so as to maximize the awareness spread in expectation. Awarding a lottery prize to a certain number of participants would motivate them to continue their energy saving endeavor and also to spread awareness among other households connected to them in the social network. For such a lottery system to be fair and impart a positive message in the society, it would be appropriate to provide a higher chance of winning to participants who have undertaken efficient measures to reduce their energy consumption. When a prize is awarded to participants who have put forth considerable effort in reducing their energy consumption, they would be highly motivated to share their experience with their peers and would influence a positive change in the behavior of their neighbors. Provided the aforementioned conditions are met, we would like to maximize the spread of influence without disturbing the fairness of the lottery system. In other words, regardless of the probability of any group of households winning, the individual probability of any household winning through all combinations of groups would remain proportional to its savings. The idea here is to spread the lottery amount across the network so that the influence spread is not restricted to a certain region that may have a higher concentration of individuals contributing higher energy savings. To illustrate this, let us consider a hypothetical case with a social network of 34 participants [Zachary, 1977] in which the number of winners to be selected is 2.
To clearly understand the nature of this problem, let us consider an extreme case where the savings contributed by nodes 1, 33 and 34 are equal and significantly higher than those of the remaining participants, as illustrated in Figure (3.1). If we were to select two winners from this system arbitrarily, without considering the effect of influence, then the probabilities of teams {1, 33}, {1, 34} and {33, 34} winning would be the same. However, as we can see from Figure (3.2), the influencing potential of team {1, 34} is significantly higher than that of {33, 34}. The objective of the problem is to assign higher probabilities to teams with a higher influencing potential while ensuring that the expected earnings of any household through different combinations of teams are still proportional to its individual energy saving contribution.
  • 16. Figure 3.1: Social Network Depicting Active Energy Savers in a Community. Figure 3.2: Awareness Spread with Different Groups of Winners. (a) Awareness Spread when Nodes 33 and 34 Win. (b) Awareness Spread when Nodes 1 and 34 Win. With this, we can now list the specific challenges involved in solving such a problem as follows: 1. Enumerating all possible groups of winners. 2. Assigning to each group a probability of being picked as the winners while ensuring that: • The lottery system is fair, i.e., the chance of an individual household winning a certain amount through the lottery, hereinafter referred to as the “Marginal Probability”, is proportional to its energy savings contribution. • The awareness spread is maximized over the social network. Let us consider a community with a set of $N$ households where we intend to launch the aforementioned lottery system. Let $N'$ be the set of households who succeed in saving energy in a given period and let $M$ be the total monetary equivalent of the amount of energy saved. Now we can fix the number of winners among which the pooled amount $M$ is distributed as $S$, where $S = N'/100$. Assuming we do not award prizes to households not saving energy, we would have a total of $\binom{N'}{S}$ groups to which probabilities must be assigned. While in some cases this may restrict the awareness spread to a specific region where households are already demonstrating energy-conscious behavior, this restriction can be easily lifted by considering a minimal probability of winning for all households regardless of their energy savings contribution, in which case $N' = N$.
  • 17. Let us assume that we can enumerate all possible combinations of winners, say $K$, such that $K = \binom{N'}{S}$. For the sake of simplicity, we shall use the names of the sets synonymously with the total number of elements in the respective sets wherever applicable. Let us consider an index $i$ for every household in the set $N'$ and an index $k$ for every combination of winners. We can define $P_i$ to be the probability that household $i$ wins a prize amount of $M$, such that $\sum_{i \in N'} P_i = 1$. Then, $P_i M$ is the monetary gain for household $i$ in expectation. Also, let $C_k$ represent the amount of awareness spread over the network when the prize amount is distributed among the members of group $k$. Now, our entire problem can be reduced to assigning a probability value $X_k$ to each group of winners $k$ such that $\sum_{k \in K} C_k X_k$ is maximized, while ensuring that the expected winnings of any household $i$ are still $P_i M$. Thus, $X_k$ is the only decision variable in this model. 3.2 Calculation of Influencing Capacity In order to calculate the influencing capacity of every group of winners, it is necessary to establish a model for the propagation of influence through the network and the resulting awareness spread. With reference to the literature review on influence spread detailed in Chapter 2, we find evidence to support the argument that there exists a strong correlation between the adoption of a new idea and the number of neighbors who have already adopted the idea. We observe that the probability of adoption increases considerably with the first few adopting neighbors but remains constant after a certain number of neighbors have adopted the idea. Any further increase in the number of people who adopt the idea has little effect on the probability that the idea is adopted.
In the survey conducted to understand what individuals expect in order to undertake energy saving efforts [Nayak A.V, 2013], we observe that a finite number of influencing agents in the neighborhood of an individual would suffice to motivate them to take part in energy conservation. With this understanding, we construct a piecewise linear function to establish the influence gained by an individual in relation to the number of winners in their neighborhood, as shown in Figure (3.3). Figure 3.3: Piecewise Linear Function to Model Awareness Gain Related to Winning Neighbors. With this, we can calculate the total awareness spread $C_k$ when a group $k$ wins the lottery by aggregating the awareness gained by all $N$ individual households in the network.
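The calculation of $C_k$ described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the breakpoints of the piecewise linear function, the toy network, and the helper names `awareness_gain`, `winning_neighbors` and `team_awareness` are all hypothetical, and a winner is assumed to gain awareness only through winning neighbors.

```python
# Hypothetical concave piecewise linear gain, given as (slope, intercept) pairs.
# The gain for w winning neighbors is the minimum over the lines, which gives
# diminishing returns and a cap at the highest intercept (0.75 here).
LINES = [(0.5, 0.0), (0.25, 0.25), (0.0, 0.75)]


def awareness_gain(w):
    """Awareness gained by one household with w winning neighbors."""
    return min(slope * w + intercept for slope, intercept in LINES)


def winning_neighbors(adj, team, i):
    """Number of neighbors of household i that belong to the winning team."""
    return sum(1 for j in adj[i] if j in team)


def team_awareness(adj, team):
    """Total awareness spread C_k: sum of the gains of all households."""
    members = set(team)
    return sum(awareness_gain(winning_neighbors(adj, members, i)) for i in adj)


# Toy network (assumed for illustration): household -> list of neighbors.
ADJ = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}

if __name__ == "__main__":
    for team in [(1, 2), (3, 4)]:
        print(team, team_awareness(ADJ, team))
```

On this toy network the centrally placed team (3, 4) reaches every household, whereas team (1, 2) leaves households 4 and 5 untouched, which mirrors the comparison of teams {1, 34} and {33, 34} in Section 3.1.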
  • 18. 3.3 Multi-Team Influence Maximization Problem Assuming that we have now listed all the $K$ possible groups of winners and calculated their corresponding awareness spreading capacity $C_k$, we can formulate a linear program to determine the optimal assignment of probabilities to each of these $K$ groups so as to maximize the total awareness spread in expectation, subject to a constraint ensuring that the expected winning amounts of all individual households are proportional to their energy savings contributions. To define such a constraint, let us represent each winning group $k$ by an incidence vector $a_k$ such that $a_{ik} = 1$ if participant $i$ is a member of team $k$, and $a_{ik} = 0$ otherwise. Since we know that the total awareness spread through any single group of winners $k$ is $C_k$ and that group $k$ has a probability $X_k$ of being selected, we can define $C_k X_k$ as the expected awareness spread from group $k$. We formulate a linear program to maximize such awareness spread as follows: Multi-Team Influence Maximization Problem (MTIMP)
$$\max \; \sum_{k \in K} C_k X_k \tag{3.1}$$
$$\text{subject to:} \quad \sum_{k \in K} \frac{a_{ik} X_k}{S} = P_i, \quad \forall\, i \in N', \tag{3.2}$$
$$X_k \ge 0, \quad \forall\, k \in K. \tag{3.3}$$
Equation (3.1) represents the expected awareness gained by all households based on the probability $X_k$ of group $k$ winning the lottery prize. Equation (3.2) collects the probabilities of all the different groups through which a household $i$ could win; their sum, scaled by $1/S$, should be equal to the marginal probability of the household winning. To clarify, since we distribute the winning amount equally among all members of the selected group, each member receives $M/S$. Also, since the expected winnings of a household need to match the amount based on its contribution to the savings pool, we have:
$$\sum_{k \in K} a_{ik} X_k \frac{M}{S} = P_i M, \quad \forall\, i \in N'.$$
Canceling $M$ from both sides of the equation, we have the following constraint:
$$\sum_{k \in K} \frac{a_{ik} X_k}{S} = P_i, \quad \forall\, i \in N'.$$
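As a small worked illustration of the fairness constraint, consider a hypothetical instance that is not taken from the thesis: three saving households, $S = 2$ winners, and assumed marginal probabilities $P = (0.5, 0.3, 0.2)$. The possible teams are $\{1,2\}$, $\{1,3\}$ and $\{2,3\}$, and writing the constraint once per household gives:

```latex
\begin{aligned}
\tfrac{1}{2}\left(X_{\{1,2\}} + X_{\{1,3\}}\right) &= 0.5,\\
\tfrac{1}{2}\left(X_{\{1,2\}} + X_{\{2,3\}}\right) &= 0.3,\\
\tfrac{1}{2}\left(X_{\{1,3\}} + X_{\{2,3\}}\right) &= 0.2,
\end{aligned}
\qquad\Longrightarrow\qquad
X_{\{1,2\}} = 0.6,\quad X_{\{1,3\}} = 0.4,\quad X_{\{2,3\}} = 0.
```

In this tiny case the fairness constraints alone determine the team probabilities, and they sum to 1. With more households there are typically far more teams than constraints, so many assignments are feasible and the objective selects among them.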
Lastly, we have the non-negativity constraint defined by Equation (3.3) for the decision variable $X_k$. We do not include any constraint to ensure that the probabilities of all $K$ groups add up to 1, as this is satisfied provided the marginal probabilities $P_i$ of all households add up to 1. Proposition 1: $\sum_{k \in K} X_k = 1$ as long as $\sum_{i \in N'} P_i = 1$. Proof: By summing both sides of Equation (3.2) over all $i$ we have
$$\sum_{i \in N'} \sum_{k \in K} \frac{a_{ik} X_k}{S} = \sum_{i \in N'} P_i.$$
  • 19. When $\sum_{i \in N'} P_i = 1$ we get
$$\sum_{k \in K} \sum_{i \in N'} \frac{a_{ik} X_k}{S} = 1.$$
Since $\sum_{i \in N'} a_{ik} = S$ by definition, we have
$$\sum_{k \in K} \frac{X_k S}{S} = 1, \qquad \text{that is,} \qquad \sum_{k \in K} X_k = 1.$$
Proposition 2: The total number of groups with non-zero probability in an optimal solution obtained by an extreme-point algorithm is always less than or equal to $N'$. Proof: Assume there are $K$ groups. Ignoring the non-negativity constraints on the decision variables, the model has $N'$ constraints. As such, any basic feasible solution has at most $N'$ basic variables. Therefore, if an extreme-point algorithm such as the simplex method is used, the optimal solution obtained has at most $N'$ teams with probability greater than 0, while the remaining $K - N'$ groups have probability 0. While this model accurately represents the problem at hand, enumerating all possible combinations of winners (i.e., $\binom{N'}{S}$ in total) is computationally prohibitive. Hence, we consider an alternative approach that uses column generation to work only with those variables, or sets of teams, that have a significant effect in spreading awareness, resulting in better computational efficiency.
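To give a feel for why full enumeration is prohibitive, the short sketch below counts the number of possible teams for a few community sizes, using the 1% rule for the number of winners. The community sizes and the function name `number_of_teams` are chosen only for illustration.

```python
from math import comb


def number_of_teams(n_savers, n_winners):
    """Number of distinct winner groups: 'n_savers choose n_winners'."""
    return comb(n_savers, n_winners)


if __name__ == "__main__":
    # Hypothetical community sizes; winners are 1% of the saving households.
    for n_savers in (100, 1000, 10000):
        n_winners = n_savers // 100
        print(n_savers, n_winners, number_of_teams(n_savers, n_winners))
```

Even at 1,000 saving households with 10 winners, the count already exceeds 10^23 columns, while Proposition 2 indicates that a basic optimal solution uses at most as many teams as there are saving households.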
  • 20. Chapter 4 A Column Generation Approach for MTIMP In general, column generation divides the original problem into two problems: the master problem and the subproblem. The master problem is a reformulation of the original problem defined over a manageable subset of variables. At each iteration of the column generation algorithm, additional variables are introduced into the master problem via the subproblem, in case they are required to declare optimality. Column generation has been successfully applied to a wide array of industrial problems that generally involve a large number of variables, such as vehicle routing, crew scheduling and cutting stock problems. 4.1 Framework Overview Considering that enumerating all possible combinations of winners is computationally difficult, we can begin solving the problem with only a few combinations of winners. The set of variables considered for solving the master problem is referred to as the candidate list. Upon solving such a restricted master problem, we can analyze the values of the dual variables to determine a new combination of winners to be included in the master problem. However, to be able to do this, we first need a set of variables $K'$ from which one can obtain a feasible basis for the master problem. More formally, we need to start with a set of variables for which there exists a convex combination that satisfies the marginal probability constraints of all households considered. Unfortunately, because of the problem’s structure, generating such a $K'$ is not a trivial task. 4.2 Initial Basic Feasible Solution For any given problem with $N'$ households with positive energy savings contributions, we can initialize $K'$ as a set of $N'$ dummy teams, each composed of a single individual household.
Note that, upon solving the master problem with this set of variables, assuming that dummy team $k$ is composed of household $i$, the optimal solution would be the trivial $X_k = P_i$, as the marginal probability constraint can only be satisfied when the probability of every team is equal to the marginal probability of the individual household contained therein. Clearly, since such dummy teams contain only one member as opposed to $S$ (which is the required size), the resulting solution is not valid for the original problem. However, adding these teams provides the master problem with an initial basis. Now, to avoid having the dummy teams in the final solution, their corresponding influence capacity is set to 0 or less. Hence, after several iterations, these variables will be sequentially replaced by teams of valid size. This approach for generating initial feasible solutions is motivated by other penalization methods (Big-M) typically used when solving linear programs [Bazaraa et al., 2011].
  • 21. 4.3 Master Problem for MTIMP We can define the initial set of variables as the set $K'$ such that $K' \subset K$. Later on, we shall elaborate on ways to start with a better-quality initial feasible solution, which would help us obtain the optimal solution faster. If we consider a set of $N'$ variables as described earlier, we can define the restricted master problem as below:
$$\max \; \sum_{k \in K'} C_k X_k \tag{4.1}$$
$$\text{subject to:} \quad \sum_{k \in K'} \frac{a_{ik} X_k}{S} = P_i, \quad \forall\, i \in N', \tag{4.2}$$
$$X_k \ge 0, \quad \forall\, k \in K'. \tag{4.3}$$
Proposition 3: The optimal solution of the restricted master problem always forms a lower bound to the original problem. Proof: Any solution of the restricted master problem is also a solution of the original problem in which the variables outside the candidate list are set to zero. Assuming none of the existing variables in the candidate list are removed, the feasible region of the master problem can only grow with the inclusion of new variables. Therefore, the current solution remains feasible in all subsequent iterations, and the optimal value of the master problem in later iterations cannot be lower than the current value. Hence, we can consider the optimal solution of the master problem in any iteration to be a lower bound to the original problem. Upon solving the restricted master problem, we can analyze the dual values associated with each constraint to determine the impact on the objective function value of increasing or decreasing the right-hand side of the constraint. More specifically, the dual values obtained after solving the master problem represent the change in the objective function value per unit change in the right-hand side of the corresponding constraint.
By considering all the dual values, we can estimate the implicit cost of including a new group of winners in the optimal solution as the sum-product of the dual values associated with the marginal probability constraints of the winners present in the group. For a group of winners $k$, the reduction in the objective function value caused by including this team in the optimal solution is given by

\[ \sum_{i \in N} \frac{a_{ik}}{S} \beta_i, \tag{4.4} \]

where $\beta_i$ is the dual value associated with constraint (4.2) for household $i$, obtained by solving the master problem.

4.4 Pricing Problem (Sub Problem) for MTIMP

The premise in column generation is that many of the variables will be non-basic in an optimal solution to the original problem, so one should only generate those variables that have the potential to improve the objective function value. Thus, the objective function of the pricing problem is the reduced cost of the non-basic variables. The reduced cost of a variable represents the marginal improvement in the objective function that would be obtained if the value of the variable were increased. The reduced cost of including a group $k$ is given by the difference between the increase in the objective function and the penalty for including the said team, as described in Equation (4.4). Hence, we can compute the reduced cost for any group of winners $k$ as

\[ C_k - \sum_{i \in N} \frac{a_{ik}}{S} \beta_i. \tag{4.5} \]

A group $k$ can increase the objective function value only when the reduced cost in Equation (4.5) is greater than zero. In other words, when the reduced cost of a variable is greater than zero, the increase in the objective function is greater than the penalty associated with including the said group. Hence, our objective for the pricing problem is to determine a group of winners $k$ such that

\[ C_k - \sum_{i \in N} \frac{a_{ik}}{S} \beta_i > 0. \tag{4.6} \]

In order to determine such a group $k$, we can formulate an integer linear program whose objective is to maximize the reduced cost by deciding which households are included in the group. For this purpose, let $Y$ be the binary decision vector of size $N$, where $Y_i = 1$ if participant $i$ is a member of the new group of winners and $Y_i = 0$ otherwise. It is worth noting that $Y$ will later serve as the incidence vector $a_k$ when the new group of winners $k$ is added to the master problem. Hence, we can write the reduced cost in Equation (4.6) as

\[ C_k - \sum_{i \in N} \frac{Y_i}{S} \beta_i. \tag{4.7} \]

Now, the only unknown value in Equation (4.7) is $C_k$, which can be computed as a function of $Y_i$. More specifically, $C_k$ can be computed by aggregating the awareness gained by each individual in the network when team $k$ wins. The individual awareness gain is a function of the number of winning neighbors of the individual, which is again a function of $Y_i$. Furthermore, the function representing the awareness gain based on the number of winning neighbors, as described in Section 3.2, is essentially non-linear. To linearize this expression, let $Z_i$ represent the awareness gained by household $i$ when group $k$ wins.
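As a small illustration of Equations (4.5) and (4.7), the reduced cost of a candidate group can be evaluated directly once $C_k$ and the dual values are known. The function name, argument names and numbers below are hypothetical and chosen only for this sketch.

```python
def reduced_cost(c_k, members, beta, num_winners):
    """Reduced cost of a group: C_k - sum_i (a_ik / S) * beta_i.

    members lists the households i with a_ik = 1 (equivalently Y_i = 1).
    """
    dual_penalty = sum(beta[i] for i in members) / num_winners
    return c_k - dual_penalty


# Illustrative numbers only (not taken from the thesis experiments).
beta = [1.0, 2.0, 3.0, 0.5]  # hypothetical dual values, one per household
rc = reduced_cost(c_k=5.0, members=[0, 2], beta=beta, num_winners=2)
print(rc)  # 5.0 - (1.0 + 3.0) / 2 = 3.0
```

A positive value, as in this toy case, means the group would be a candidate for entering the master problem.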
Now, we can compute $Z_i$ by counting the number of winners in the neighborhood of $i$, say $w_i$, when group $k$ wins. We represent the connections of household $i$ in the social network by $e_{ij}$, where $e_{ij} = 1$ if household $i$ is connected to household $j$ and $e_{ij} = 0$ otherwise. The number of winning neighbors of household $i$ when the team $k$ given by $Y$ wins is then

\[ \sum_{j \in N} e_{ij} Y_j = w_i, \quad \forall\, i \in N. \tag{4.8} \]

We can represent the piecewise linear function for the awareness gained by each individual as a set of lines, each expressed through its slope and intercept. In other words, let $m_t$ and $b_t$ represent the slope and intercept of line $t$ of the piecewise linear function considered. Also, let $b_T$ be the highest intercept, representing the maximum awareness that can be gained by any individual. Now, based on the values of $w_i$ obtained through Equation (4.8), one can express the awareness gained by individual $i$ using the variable $Z_i$, constrained as follows:

\[ m_t w_i + b_t + b_T Y_i \ge Z_i, \quad \forall\, i \in N,\ \forall\, t \le T. \tag{4.9} \]

Here, the term $b_T Y_i$ represents the fact that the motivation level of a household reaches its highest limit when the household itself has won, regardless of how many of its neighbors have won. To ensure that no awareness gain is accounted for beyond the maximum limit $b_T$, we need

\[ Z_i \le b_T, \quad \forall\, i \in N. \tag{4.10} \]

Under constraints (4.8), (4.9) and (4.10), and by incorporating $Z_i$ into the objective of the maximization problem, one ensures that the value of $Z_i$ indeed reflects the awareness gained by household $i$ when the team $k$ represented by $Y$ wins the lottery. Aggregating the awareness gained by individual households over the entire network, we obtain the total influencing capacity of team $k$ over the network, $\sum_{i \in N} Z_i = C_k$, when Equations (4.8)-(4.10) are satisfied. Hence, one can formulate the pricing problem as follows:

\begin{align*}
\max \quad & \sum_{i \in N} Z_i - \sum_{i \in N} \frac{Y_i \beta_i}{S} \\
\text{subject to} \quad & \sum_{j \in N} e_{ij} Y_j = w_i, && \forall\, i \in N, \\
& m_t w_i + b_t + b_T Y_i \ge Z_i, && \forall\, i \in N,\ \forall\, t \le T, \\
& Z_i \le b_T, && \forall\, i \in N, \\
& \sum_{i \in N} Y_i = S, \\
& Y_i \in \{0, 1\}, && \forall\, i \in N, \\
& Z_i \ge 0,\ w_i \ge 0, && \forall\, i \in N.
\end{align*}

The last three sets of constraints fix the number of winning households to the required number of winners $S$ and impose the binary and non-negativity restrictions.

4.5 A Column Generation Algorithm

In order to iteratively solve for the optimal solution using the aforementioned master problem and sub-problem, we follow the column generation algorithm stated below:
1: Construction of an initial feasible solution: An initial basic feasible solution is generated by one of the techniques mentioned in this paper.
2: Restricted Master Problem: The linear program is solved with the current candidate list of variables, and the dual values are obtained as $\beta$.
3: Sub-Problem: With the $\beta$ values obtained in the previous step, we solve the sub-problem.
4: Evaluation of optimality: If the objective function value of the sub-problem is $\le 0$, there does not exist any new group of winners which, when added to the candidate list, would improve the objective function. Hence, the incumbent solution obtained at Step 2 is optimal. Otherwise, proceed to Step 5.
5: Update candidate list: The new group obtained from the sub-problem is added to the candidate list and we proceed to Step 2.

4.6 Further Enhancements

Stopping before Optimality: We can also terminate the column generation process when we are close to the optimal solution, in an effort to save computational time, by checking whether the objective function value of the sub-problem is $\le \epsilon$ in Step 4. Here, $\epsilon$ represents the error tolerance, or the permissible distance from the optimal solution at which we wish to stop the process.

Improved Initial Basic Feasible Solution: While we are solving the multi-team social lottery design problem using the above algorithm, it is worth noting that the initial basic feasible solution considered here is invalid for all practical purposes if we wish to distribute the prize among more than one winner. This results in poor convergence in the initial stages, as we would need to run at least $N$ iterations to find a set of new valid variables to replace the initial basic feasible solution. Hence, it would be beneficial to consider an algorithm that improves the initial basic feasible solution.
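Steps 3 and 4 of the algorithm in Section 4.5, together with the tolerance $\epsilon$ described under "Stopping before Optimality", can be illustrated on a toy network. The sketch below is not the thesis implementation: it replaces the integer program by exhaustive enumeration of all teams of size $S$ (only workable for tiny instances), and it assumes that maximization pushes each $Z_i$ to the largest value allowed by (4.9) and (4.10). All function names, the network, the lines $(m_t, b_t)$ and the dual values are hypothetical.

```python
from itertools import combinations


def awareness(winning_neighbors, is_winner, lines, b_max):
    """Largest Z_i allowed by (4.9) and (4.10) for one household.

    lines is a list of (slope m_t, intercept b_t) pairs; b_max plays the role of b_T.
    """
    bound = min(m * winning_neighbors + b for m, b in lines)
    if is_winner:
        bound += b_max
    return max(0.0, min(b_max, bound))


def price_by_enumeration(neighbors, beta, num_winners, lines, b_max):
    """Return (best reduced cost, best team) over all teams of size S."""
    n = len(neighbors)
    best_value, best_team = float("-inf"), None
    for team in combinations(range(n), num_winners):
        members = set(team)
        total_awareness = 0.0
        for i in range(n):
            # (4.8): w_i counts the winning neighbors of household i
            w_i = sum(1 for j in neighbors[i] if j in members)
            total_awareness += awareness(w_i, i in members, lines, b_max)
        value = total_awareness - sum(beta[i] for i in team) / num_winners
        if value > best_value:
            best_value, best_team = value, team
    return best_value, best_team


def should_stop(pricing_value, eps=0.0):
    """Step 4: stop when no column has reduced cost above the tolerance."""
    return pricing_value <= eps


# Toy path network 0 - 1 - 2 with made-up awareness lines and dual values.
neighbors = [[1], [0, 2], [1]]
lines = [(0.5, 0.0), (0.0, 1.0)]  # hypothetical (m_t, b_t); last intercept is b_T
value, team = price_by_enumeration(neighbors, [0.0, 0.0, 0.0], 1, lines, b_max=1.0)
print(value, team, should_stop(value))
```

With all dual values at zero, the middle household is the best single winner in this toy network, and the positive reduced cost means the loop would continue with Step 5.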
One strategy here could be to include a random combination of groups of winners along with the simple solution of groups consisting of only one winner, which would definitely yield a feasible solution. If there exists a convex combination of these randomly generated groups and the singleton groups such that the marginal probability constraint is satisfied, then this would yield a better starting solution than the singleton groups alone, since the starting objective value of the latter is always 0. Better yet, we can append groups of winners that are highly likely to be in the optimal solution based on individual attributes such as degree centrality or eigenvector centrality.

We can observe here that the contribution of any group of winners to the objective function value is equal to the product of the group's aggregate influencing capacity and the probability of the group winning. Furthermore, the probability of any group is limited by the minimum of the marginal probabilities of the winning households in the group. To clarify, the probability of a group winning can never exceed the marginal probability of any winner in the group, as this would violate the fairness constraint. As such, the minimum marginal probability of the winning households in a group forms the upper bound on the probability that can be assigned to this group. Considering this, we can generate groups that can make a high contribution to the objective function, and hence are more likely to be part of the optimal solution, using the heuristic detailed at the end of this section.

The idea here is to generate groups of winners with high influencing capacity that could potentially be part of the optimal solution. We can accomplish this by sequentially adding individual households based on the number of neighbors they have and their individual marginal probability.
Upon considering a set of households for a team, we can take the minimum marginal probability of these households as the maximum probability that can be assigned to the team. Once this is done, before finding the next set of households to be grouped, we reduce the marginal probability of the households already considered by the maximum probability that can be assigned to the group they are present in. By sequentially forming new candidate teams in this manner, we can expect to form $N - S$ groups, all of which could be present in the optimal solution without violating any constraints. However, the last set of $S$ winners that may be left out can only form a team when all of their remaining probabilities are equal. In the unlikely event that this happens, we would have not only a feasible solution but also a solution close to the optimal. While the $N - S$ groups of winners generated thus far may not form a feasible solution on their own, they are good candidates to be included in the candidate list along with the singleton groups of winners detailed earlier. The algorithm to generate such teams is as follows:

1: Initialize: let Current Winner = 0 and Current Team = 1.
2: Search for the household with the highest product of its marginal probability and the number of neighbors connected to it. Also, ensure that the household is not already included in the group with index Current Team.
3: Add the household to the group Current Team.
4: Increment Current Winner and mark the household as used for Current Team.
5: If Current Winner = S, then for every household in Current Team set marginal probability = marginal probability - minimum of the marginal probabilities of all households in Current Team, increment Current Team, set Current Winner = 0 and return to Step 2.
6: Otherwise, go to Step 2.
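The team-generation steps above can be sketched in a few lines. This is one possible reading of the heuristic, not the code used in the thesis: it assumes that only households with remaining marginal probability are eligible and that the procedure stops once fewer than $S$ such households are left (the steps themselves do not state a stopping rule). The function name, the tolerance `tol` and the toy data are hypothetical.

```python
def greedy_teams(marginal_prob, degree, num_winners, tol=1e-12):
    """Greedily build candidate teams and the largest probability each can carry."""
    remaining = list(marginal_prob)
    teams = []
    while True:
        # Assumption: only households with probability mass left are eligible.
        eligible = [i for i in range(len(remaining)) if remaining[i] > tol]
        if len(eligible) < num_winners:
            break
        # Step 2: rank by (remaining marginal probability) x (number of neighbors).
        eligible.sort(key=lambda i: (-remaining[i] * degree[i], i))
        team = eligible[:num_winners]
        # Step 5: the team's probability is capped by its smallest member probability.
        team_prob = min(remaining[i] for i in team)
        for i in team:
            remaining[i] -= team_prob
        teams.append((team, team_prob))
    return teams


teams = greedy_teams([0.5, 0.25, 0.125, 0.125], degree=[3, 2, 2, 1], num_winners=2)
for team, prob in teams:
    print(team, prob)
```

In this toy run the last two households happen to have equal remaining probabilities, so a final team can be formed; this corresponds to the unlikely case mentioned in the text.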
Chapter 5 Computational Experiments

5.1 Experimental Design

To evaluate the performance of this model, we considered two real-world social network data sets: (1) the Advogato trust metric network [kon, 2014], converted to an undirected network with N = 5,155 nodes and 39,285 edges, from which we removed the singleton nodes not connected to any other node, as these would not add any value to our study of awareness spread; and (2) the Enron email network [Leskovec et al., 2009], also converted to an undirected graph, with N = 36,692 nodes and 183,831 edges.

In a real-world scenario, it is reasonable to assume that only a small portion of all households is willing to undertake considerable effort to save energy [Nayak A.V, 2013]. Furthermore, implementing the proposed lottery design for a community where all the households are already contributing to energy conservation may not be very effective. As such, we consider different instances with different portions of households contributing to energy savings. More specifically, for the Advogato network we consider the cases with the number of active savers N' = 10%, 15%, 20%, 30%, 40%, 50% of N, and for the Enron email network N' = 15%, 20% of N.

Also, we consider two distributions to depict the different amounts of energy saved by individual households, namely a Chi-square distribution with 2 degrees of freedom and a bi-modal normal distribution. Here, the Chi-square distribution is used on the assumption that only a few households would be both interested and competent enough to save higher amounts of energy. The bi-modal normal distribution represents the case where one segment of the community is better equipped to save energy and the other is not, while individual interest in energy conservation may be normally distributed.
Lastly, the number of winners in a single lottery round is taken to be S = 1% of N, based on the argument in Section 2, to ensure a lottery prize amount that is significant enough to attract the interest of households. The stopping criterion is based on a gap of $\epsilon$ = 0.00001. For instances with N < 10,000 nodes, we find that a solution very close to optimal is reached relatively quickly, within 3N iterations.

5.2 Random Lottery Design

To establish the effectiveness of optimizing the lottery design, we compute the influence spread in the absence of the optimization model, in which case the probability of any group of households winning is given by the product of the marginal winning probabilities of all the members of the group winning together. Enumerating all the group combinations to calculate the awareness spread from each team is computationally hard. Also, when the number of active savers is considerably large, we can expect the marginal probability of any single household to be very small. As such, we consider an approximation technique to establish the influence spread value under random winner selection, which involves assuming that the energy savings, and hence the marginal probabilities, of all energy-saving households are equal. With this, we can calculate the probability of a certain number of neighbors $s$ winning the lottery from a binomial random variable with parameters $p = 1/N$, $n = N$ and $k = s$. Calculating the total expected awareness spread by way of such an approximation yields a value for each instance, and by running the program for close to 30 hours (wall time) we find the results shown in the Appendix.

While we do not find any significant difference in computational efficiency between the two distribution types, we observe that the computational time increases with the number of winners to be selected and with the number of active savers. This is to be expected, since the number of constraints in the pricing problem grows with the size of the network, and the feasible region grows with the number of winners to be selected. Results for the objective value, the binomial approximation and the improvement achieved through use of the optimization model are shown in Tables 5.1-5.4.
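The binomial approximation described above amounts to taking an expectation of the awareness gain over the number of winning neighbors. The sketch below shows only that computation in generic form; the values of `n` and `p` should be set as in the text, and `toy_gain` is a hypothetical stand-in for the awareness function of Section 3.2, which is not reproduced here.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def expected_gain(n, p, gain):
    """Expected value of gain(k) when k ~ Binomial(n, p)."""
    return sum(binomial_pmf(k, n, p) * gain(k) for k in range(n + 1))


# Hypothetical capped-linear awareness gain; the thesis uses the function of Section 3.2.
def toy_gain(k):
    return min(1.0, 0.25 * k)


n = 20  # illustrative value only
print(expected_gain(n, 1.0 / n, toy_gain))
```

Summing this expectation over all households would give an approximate total awareness spread under random selection, under the equal-savings assumption stated in the text.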
Active      Iterations   Awareness spread         Upper     GAP     Optimization
savers N'   completed    Random design   MTIMP    bound             effectiveness
515         1243         29.93           40.58    40.58     0.00%   35.60%
773         4621         55.3            68.99    68.99     0.00%   24.80%
1030        5304         66.75           101.36   102.07    0.00%   51.90%
1546        5675         102.6           150.79   156.16    0.00%   47.00%
2061        6517         118.17          192.16   207.21    0.10%   62.60%
2577        5775         149.33          223.8    270.92    0.20%   49.90%

Table 5.1: Results of Computational Study on Advogato Network with Bi-modal Distribution for Savings

Active      Iterations   Awareness spread         Upper     GAP     Optimization
savers N'   completed    Random design   MTIMP    bound             effectiveness
515         1323         29.93           42       42        0.00%   40.30%
773         5017         55.3            64.56    64.56     0.00%   16.70%
1030        6220         66.75           104.33   104.99    0.00%   56.30%
1546        6492         102.6           141.36   152.81    0.10%   37.80%
2061        5408         118.17          189.99   264.51    0.40%   60.80%
2577        2380         149.33          180.32   423.87    1.40%   20.80%

Table 5.2: Results of Computational Study on Advogato Network with Chi-square Distribution for Savings

Active      Iterations   Awareness spread         Upper     GAP     Optimization
savers N'   completed    Random design   MTIMP    bound             effectiveness
5502        710          241.71          267.83   1552.11   4.80%   10.80%
7339        932          414.85          309.85   2260.42   6.30%   -25.30%

Table 5.3: Results of Computational Study on Enron Email Network with Bi-modal Distribution for Savings
Active      Iterations   Awareness spread         Upper     GAP     Optimization
savers N'   completed    Random design   MTIMP    bound             effectiveness
5503        1385         241.67          242.32   1551.21   5.40%   0.30%
7338        1036         414.9           317.83   2108.87   5.60%   -23.40%

Table 5.4: Results of Computational Study on Enron Email Network with Chi-square Distribution for Savings

5.3 Results

From the plots of the lower and upper bounds on the objective function value against the number of iterations, one can observe that the upper bound drops drastically during the initial phase, while the drop slows as the gap from optimality shrinks. Hence, starting with a higher-quality initial feasible solution would enable reaching the optimal solution, or the desired gap from the optimal solution, more quickly. We can observe from Tables 5.1-5.4 that, for the cases where the optimization model has reached the stopping criterion or is otherwise close to optimal, there is a significant improvement in the objective function value when adopting the optimization model over picking lottery winners through random design. We find that the solution found through the column generation method is on average 40% better than picking the winners based only on their marginal probability, without considering the social network topology. This is a considerable improvement and would help spread awareness about energy conservation much faster than traditional means. Furthermore, this would have a cascading effect: with more households actively saving energy, the lottery amount can be distributed among more households, ensuring a dramatic spread of awareness over subsequent periods.
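The "Optimization effectiveness" column can be recomputed from the two awareness-spread columns. The sketch below does this for Table 5.1, assuming effectiveness is the relative improvement of MTIMP over the random design; the function names and the `relative_gap` definition are assumptions of this sketch rather than definitions given in the text.

```python
def effectiveness(random_design, mtimp):
    """Relative improvement of the MTIMP solution over the random design."""
    return (mtimp - random_design) / random_design


def relative_gap(lower_bound, upper_bound):
    """Gap between the upper bound and the incumbent, relative to the incumbent."""
    return (upper_bound - lower_bound) / lower_bound


# (active savers, random design, MTIMP, upper bound) copied from Table 5.1
table_5_1 = [
    (515, 29.93, 40.58, 40.58),
    (773, 55.3, 68.99, 68.99),
    (1030, 66.75, 101.36, 102.07),
    (1546, 102.6, 150.79, 156.16),
    (2061, 118.17, 192.16, 207.21),
    (2577, 149.33, 223.8, 270.92),
]
for savers, rnd, mtimp, ub in table_5_1:
    print(savers, f"{effectiveness(rnd, mtimp):.1%}", f"{relative_gap(mtimp, ub):.2f}")
```

Under this definition the recomputed effectiveness values agree with the last column of Table 5.1 after rounding. The GAP column appears to agree with `relative_gap` when read as a ratio rounded to one decimal (for example, about 0.21 for N' = 2577) rather than as a percentage; that reading is inferred from the numbers and is not stated in the text.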
Chapter 6 Discussion

The mathematical model shown here can be applied to other problems of a similar nature as well, wherein there is no implicit "activation" point for an individual to act based on the influence gained from their neighbors, and the influence gained is a piecewise linear, monotonically increasing function. For example, the model is applicable to brand awareness programs, where a company may be willing to provide free samples to a few individuals who are generally interested in the company's products and who can evangelize a new product among their peers, improving brand reputation as well as awareness. This is very similar to the lottery problem, as there is no purchase decision to be made, but rather a degree of enthusiasm shown by individuals regarding a product. Furthermore, since the selected individuals obtain a free sample, this can be construed as winning a lottery. How well the analogy holds would depend strongly on the nature of the product (obtaining a brand new cell phone would be similar to winning a lottery, while obtaining free samples of shampoo may not be quite as good) and on how the company plans to start the seeding process. In the case of inexpensive products, companies can choose to provide gift hampers including an array of their products to improve the experience of the individuals selected for seeding. For such a scenario, we could use the column generation approach for MTIMP to solve for the optimal assignment of probabilities to teams of seeds so as to maximize the expected awareness spread.

6.1 Limitations

For the sake of our study, we have assumed that the number of winners is fixed and known. Also, the total savings are distributed equally among all winners. This is a limitation, since winners of higher amounts may spread more awareness, and as such it may be beneficial to allot an appropriate amount to each winner to maximize the awareness spread.
Incorporating this aspect would make the number of winners and the amounts won by them decision variables, which would further complicate the problem but is worth considering for future research. It also implies that the function used here to depict the influence gain would be different for each winner winning a certain amount.

For the influence spread, we limit ourselves to considering the awareness spread delivered directly by the winners. However, in reality we can expect people to be influenced via more distant connections as well. Our model can be easily updated with additional constraints to include influencing effects from friends of friends. However, this would increase the computational time required to solve the problem.

In our model, we have assumed that the initial awareness, or level of interest, of all households is zero, ignoring the fact that some of the households actively saved energy. This is in consideration of the fact that many households may accidentally end up saving energy over short periods; for example, they may be away on vacation for most of the period. So, assuming that a certain household is fully aware of energy conservation techniques based on the fact that it saved energy over a short period may not be very accurate. In order to know with confidence whether the energy savers are saving energy consciously, we would need to collect data over many periods, determine which participants are consistently performing well and incorporate a higher initial awareness level in our model to accurately depict the scenario of some households exhibiting a higher awareness level before the lottery prize is distributed.

While our model is very similar to that of [Deroïan, 2002], the details of this paper are limited to one time step. However, by updating the initial awareness level of all households and any changes to the social network, we can repeatedly use the above method to maximize the influence spread.

6.2 Future Scope

The aforementioned model can be further empowered with online social networking systems to enable increased social pressure and support. Also, since we are aware of the benefits of providing incentives over seeding strategies [Aral et al., 2011], it would be worthwhile to consider the idea of an online system to monitor the friendship status as well as the savings achieved by individuals. Referring new households can be encouraged with an increase in the chance of winning the lottery. While this would no doubt improve the social pressure and overall behavior, it may also prove to be too intrusive and may adversely affect our objective of improving energy conservation.

Since the computational time reduces drastically with a better initial feasible solution, it may be worthwhile to consider improved heuristics for determining an improved basic feasible solution. Also, in the model described here we establish the probability of a group of households winning. Alternatively, a method can be developed to sequentially pick winners under the same premise of ensuring fairness to all households.
Chapter 7 Conclusion

We believe the model developed here can be effectively used to promote efficient energy usage behavior in society. The general disinterest towards saving energy on a routine basis can be overcome by adopting such engaging and involving programs. We have shown that the optimization technique developed herein can improve the awareness spread to a considerable degree, as opposed to picking households as winners based on their individual savings alone. The mathematical model developed here provides many insights into the structure of such a problem, which can be adopted for future research. The computational study demonstrates that the model can be readily solved for small communities, but the computational time increases with the number of nodes and the number of winners to be selected. However, considering that the problem need only be solved once per period, the computational time may not be the most challenging aspect of implementing such a social lottery design. Lastly, the concept of social lotteries to encourage a desired behavior has the potential to be applied to a wide variety of industry problems, especially those dealing with customers connected through a social network.
Bibliography

[App, 2009] (2009). The effect of energy efficiency programs on electric utility revenue requirements – http://www.publicpower.org/files/pdfs/effectofenergyefficiency.pdf.

[kon, 2014] (2014). Advogato network dataset – KONECT – http://konect.uni-koblenz.de/networks/advogato.

[USE, 2014] (2014). Nov 2014 monthly energy review. Technical report, U.S. Energy Information Administration.

[Angst et al., 2010] Angst, C. M., Agarwal, R., Sambamurthy, V., and Kelley, K. (2010). Social contagion and information technology diffusion: the adoption of electronic medical records in US hospitals. Management Science, 56(8):1219–1241.

[Aral et al., 2011] Aral, S., Muchnik, L., and Sundararajan, A. (2011). Engineering social contagions: Optimal network seeding and incentive strategies. In Winter Conference on Business Intelligence.

[Backstrom et al., 2006] Backstrom, L., Huttenlocher, D., Kleinberg, J., and Lan, X. (2006). Group formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 44–54. ACM.

[Bass, 1969] Bass, F. (1969). A new product growth model for consumer durables. management sciences. Institute for Operations Research and the Management Sciences. Evanston, XV (5).

[Bazaraa et al., 2011] Bazaraa, M. S., Jarvis, J. J., and Sherali, H. D. (2011). Linear programming and network flows. John Wiley & Sons.

[Becker, 1970] Becker, M. H. (1970). Sociometric location and innovativeness: Reformulation and extension of the diffusion model. American Sociological Review, pages 267–282.

[Bolton et al., 2005] Bolton, G. E., Brandts, J., and Ockenfels, A. (2005). Fair procedures: Evidence from games involving lotteries. The Economic Journal, 115(506):1054–1076.

[Borgatti, 2006] Borgatti, S. P. (2006). Identifying sets of key players in a social network. Computational & Mathematical Organization Theory, 12(1):21–34.

[Brooks, 1957] Brooks, R. C. (1957). Word-of-mouth advertising in selling new products. The Journal of Marketing, pages 154–161.

[Brown and Reingen, 1987] Brown, J. J. and Reingen, P. H. (1987). Social ties and word-of-mouth referral behavior. Journal of Consumer Research, pages 350–362.

[Claycamp and Liddy, 1969] Claycamp, H. J. and Liddy, L. E. (1969). Prediction of new product performance: An analytical approach. Journal of Marketing Research, pages 414–420.
[Cosley et al., 2010] Cosley, D., Huttenlocher, D. P., Kleinberg, J. M., Lan, X., and Suri, S. (2010). Sequential influence models in social networks. ICWSM, 10:26.

[Deroïan, 2002] Deroïan, F. (2002). Formation of social networks and diffusion of innovations. Research Policy, 31(5):835–846.

[Dichter, 1966] Dichter, E. (1966). How word-of-mouth advertising works. Harvard Business Review, 44(6):147–160.

[Domingos and Richardson, 2001] Domingos, P. and Richardson, M. (2001). Mining the network value of customers. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 57–66. ACM.

[Eagle et al., 2009] Eagle, N., Pentland, A. S., and Lazer, D. (2009). Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences, 106(36):15274–15278.

[Falk et al., 2008] Falk, A., Fehr, E., and Fischbacher, U. (2008). Testing theories of fairness - intentions matter. Games and Economic Behavior, 62(1):287–303.

[Gale and Kariv, 2003] Gale, D. and Kariv, S. (2003). Bayesian learning in social networks. Games and Economic Behavior, 45(2):329–346.

[Goyal et al., 2010] Goyal, A., Bonchi, F., and Lakshmanan, L. V. (2010). Learning influence probabilities in social networks. In Proceedings of the third ACM international conference on Web search and data mining, pages 241–250. ACM.

[Granovetter, 1978] Granovetter, M. (1978). Threshold models of collective behavior. American Journal of Sociology, pages 1420–1443.

[Granovetter, 1973] Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, pages 1360–1380.

[Katz, 1957] Katz, E. (1957). The two-step flow of communication: An up-to-date report on an hypothesis. Public Opinion Quarterly, 21(1):61–78.

[Kempe et al., 2003] Kempe, D., Kleinberg, J., and Tardos, É. (2003). Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146. ACM.

[Kempe et al., 2005] Kempe, D., Kleinberg, J., and Tardos, É. (2005). Influential nodes in a diffusion model for social networks. In Automata, languages and programming, pages 1127–1138. Springer.

[Lazer et al., 2009] Lazer, D., Pentland, A. S., Adamic, L., Aral, S., Barabasi, A. L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., et al. (2009). Life in the network: the coming age of computational social science. Science (New York, NY), 323(5915):721.

[Leskovec et al., 2009] Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. (2009). Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123.

[Macy, 1991] Macy, M. W. (1991). Chains of cooperation: Threshold effects in collective action. American Sociological Review, pages 730–747.

[Mahajan et al., 1984] Mahajan, V., Muller, E., and Sharma, S. (1984). An empirical comparison of awareness forecasting models of new product introduction. Marketing Science, 3(3):179–197.
[Nayak A.V, 2013] Nayak A.V, Nikolaev A.G, J. M. (2013). Increasing the energy conservation awareness using the influential power of a lottery system.

[Reichheld, 2003] Reichheld, F. F. (2003). The one number you need to grow. Harvard Business Review, 81(12):46–55.

[Richardson and Domingos, 2002] Richardson, M. and Domingos, P. (2002). Mining knowledge-sharing sites for viral marketing. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 61–70. ACM.

[Romero et al., 2011] Romero, D. M., Meeder, B., and Kleinberg, J. (2011). Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on Twitter. In Proceedings of the 20th international conference on World wide web, pages 695–704. ACM.

[Shi et al., 2009] Shi, X., Zhu, J., Cai, R., and Zhang, L. (2009). User grouping behavior in online forums. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 777–786. ACM.

[Valente, 1995] Valente, T. W. (1995). Network models of the diffusion of innovations, volume 2. Hampton Press Cresskill, NJ.

[Zachary, 1977] Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, pages 452–473.
Chapter 8 Appendix

The graphs plot the objective function values obtained at each iteration of the column generation approach to solving the MTIMP. The upper bound and lower bound on the objective function value at any iteration, denoted by orange and blue dots respectively, represent the maximum and minimum awareness spread one can expect with the existing candidate list of variables at that iteration. The black line indicates the objective function value that can be obtained through random design and marks the threshold beyond which the column generation approach can be considered more efficient in spreading awareness through the network. One can observe from these plots that the number of iterations required to reach a given gap from optimality increases with N and the corresponding increase of S. Furthermore, even though the initial basic feasible solution may have an objective function value lower than that of random design, note that when the gap is considerably small, the objective function value of the column generation approach is approximately 40% higher than that of random design.
Figure 8.1: Results of Computational Study on Advogato Network - Instance 1
Figure 8.2: Results of Computational Study on Advogato Network - Instance 2
Figure 8.3: Results of Computational Study on Advogato Network - Instance 3
Figure 8.4: Results of Computational Study on Advogato Network - Instance 4
Figure 8.5: Results of Computational Study on Advogato Network - Instance 5
Figure 8.6: Results of Computational Study on Advogato Network - Instance 6
Figure 8.7: Results of Computational Study on Advogato Network - Instance 7
Figure 8.8: Results of Computational Study on Advogato Network - Instance 8
Figure 8.9: Results of Computational Study on Advogato Network - Instance 9
Figure 8.10: Results of Computational Study on Advogato Network - Instance 10
Figure 8.11: Results of Computational Study on Advogato Network - Instance 11
Figure 8.12: Results of Computational Study on Advogato Network - Instance 12
Figure 8.13: Results of Computational Study on Enron Email Network - Instance 1
Figure 8.14: Results of Computational Study on Enron Email Network - Instance 2
Figure 8.15: Results of Computational Study on Enron Email Network - Instance 3
Figure 8.16: Results of Computational Study on Enron Email Network - Instance 4