Information Networks And Their Dynamics


  1. Information Networks and their Dynamics
     Srinath Srinivasa
     IIIT Bangalore and Oktave Research Foundation
     [email_address]
  2. Partially based on the book
     - Sage Publishers, New Delhi, London, Thousand Oaks, 2006, ISBN 0761935126
  3. Recent new additions to our vocabulary
     - Telemedicine
     - SMS/MMS
     - e-learning
     - Net banking
     - E-ticketing
     - Open source
     - Privacy policy
     - EULA
     - …
     - Phishing
     - Hacking
     - Cyber crimes
     - Virus / Spyware / Adware / Malware
     - Cyber squatting
     - Identity theft
     - Piracy
     - …
  4. The “information age”
     - Comprehensive change brought about by information and communication technologies (ICTs)
     - Qualitative changes affecting the underlying mental model or “paradigm”
     - Changes affecting the way we live (not just businesses)
     - Separation of information transactions from material transactions
  5. The information age
     [Figure: then, a material exchange network; now, an information exchange network over the Internet, mobile networks, databases, etc.]
  6. Material exchange
     - Constrained by the laws of physics
     - Conserved transactions
     - High cost of replication
     - High cost of transportation
  7. Information exchange with today’s ICTs
     - Intangible (little or no physical constraint)
     - Non-conserved transactions
     - Extremely low replication costs
     - Extremely low transportation costs
     - Hard to “snatch away” internalized information
  8. Information Networks
     - Historically, information was “piggybacked” over a material carrier, giving information networks the same characteristics as material networks
     - With today’s technologies, communication and coordination are separated from transport and logistics
     - Several kinds of transactions are pure information transactions with no material component, e.g. software, data, news, knowledge
     - How are such information networks different from material exchange networks?
  9. Outline
     - Part I: Information networks and the Power Law distribution
     - Part II: Underlying dynamics
     - Part III: Social information networks
  10. Part I: Information Networks and the Power Law Distribution
  11. Distribution of marks in an exam
     - i.i.d. (independent and identically distributed) processes
     - Approximates a Gaussian or “normal” distribution (binomial in the discrete case)
     - Mode near the mean
     - Ubiquitous
     - Finite variance and the central limit theorem
  12. Distribution of email recipients
     - Most recipients have received a very small number of emails
     - However, a small number of recipients have received a very large number of emails
     - Approximates the “power law” distribution
     - Infinite variance; a scale-free system
  13. The Power Law distribution
     - Pr[X = x] ∝ x^(−α) for a given exponent α
     - Straight line on a log-log scale
     - Infinite variance
     - Scale-free (self-similar)
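The scale-free property can be checked directly: rescaling x leaves the shape of f(x) = x^(−α) unchanged up to a constant factor, and the log-log slope recovers −α. A minimal sketch (function and sample points are illustrative):

```python
import math

ALPHA = 2.0

def power_law(x, alpha=ALPHA):
    """Unnormalized power-law density f(x) = x^(-alpha)."""
    return x ** (-alpha)

# Scale invariance (self-similarity): rescaling x by a factor c multiplies
# f by the constant c^(-alpha), independent of x, so the shape is preserved.
c = 10.0
ratios = [power_law(c * x) / power_law(x) for x in (1.0, 5.0, 50.0, 500.0)]

# Straight line on a log-log scale: log f(x) = -alpha * log x, so the
# slope between any two points is exactly -alpha.
x1, x2 = 3.0, 300.0
slope = (math.log(power_law(x2)) - math.log(power_law(x1))) \
        / (math.log(x2) - math.log(x1))

print(ratios)  # all approximately c^(-alpha) = 0.01
print(slope)   # approximately -2.0
```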
  14. Underlying random processes
     - Exam system: a set of n independent random processes
     - Email system: a set of n interdependent random processes
     - Emails are part of conversations
  15. Power Laws in nature
     - Population distribution across human settlements
     - Global airline networks
     - WWW in-degree and out-degree
     - Sizes of blood vessels in the human body
     - Wealth distribution
     - Frequency of word occurrence in documents
     - Frequency of keyword searches on the web
     - Distribution of earthquake sizes against their frequency
     - etc.
  16. Characteristics of the Power Law
     - Intuitive
       - A very small number of very large entities and a very large number of very small entities
       - Infinite variance or “long-tailed” distribution (for certain ranges of the exponent α)
  17. Characteristics of the Power Law
     - Mathematical
       - Distribution function
       - Scale-invariance property
       - Log-linear relationship with the exponent
  18. Other pertinent distributions
     - Zipf distribution
       - Empirical result for word frequencies in document corpora
       - f(x): frequency of word x
       - r(x): rank of word x (the r-th most frequent word)
       - f(x) ∝ 1 / r(x)
     - Shown to be equivalent to the power-law distribution
  19. Other pertinent distributions
     - Pareto’s law
       - Pr[X ≥ x] = (x_min / x)^α, where x_min is the minimum value taken by x and α > 0
       - When 0 < α ≤ 1 the mean is infinite, and when 1 < α ≤ 2 the variance is infinite
       - Informally called the 80-20 principle
       - Shown to be equivalent to the power-law distribution
  20. Other pertinent distributions
     - Log-normal distribution
       - y = f(x) is log-normally distributed if ln y is normally distributed
       - Approximates a power law if the variance of ln y is very large
       - An alternative (sometimes better) characterization of interdependent random processes
       - Generated by products of i.i.d. random processes
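The last bullet can be illustrated with the standard library: multiplying many i.i.d. positive factors yields a right-skewed, lognormal-like variable, while its logarithm (a sum of i.i.d. terms) is roughly normal by the central limit theorem. A sketch with illustrative parameters:

```python
import math
import random

random.seed(42)

def product_of_iid(n=50):
    """Product of n i.i.d. positive random factors."""
    p = 1.0
    for _ in range(n):
        p *= random.uniform(0.5, 1.5)
    return p

samples = sorted(product_of_iid() for _ in range(20000))
logs = sorted(math.log(s) for s in samples)

# The product itself is strongly right-skewed (lognormal-like):
# its mean sits far above its median.
mean = sum(samples) / len(samples)
median = samples[len(samples) // 2]

# Its logarithm is a *sum* of i.i.d. terms, so by the central limit
# theorem it is approximately normal: mean and median nearly coincide.
log_mean = sum(logs) / len(logs)
log_median = logs[len(logs) // 2]

print(mean / median)          # well above 1: heavy right tail
print(log_mean - log_median)  # near 0: symmetric, bell-shaped
```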
  21. Part II: Underlying Dynamics
  22. Non-linearity
     - Interdependent systems with circular causalities
     - Also called “complex systems”
     - Feedback: a central characteristic
     - Positive feedback (reinforcing loops) and negative feedback (balancing loops)
  23. Non-linearity: growth
     - Feedback makes the present state of the system a function of its previous states: x_{t+1} = r · x_t
     - When x_0 > 0 and r > 1, we have positive feedback and x grows over time
  24. Non-linearity: saturation
     - However, every system usually also has a “saturation” point beyond which it cannot grow; the system approaches the saturation point asymptotically
     - If, w.l.o.g., the saturation point is 1, the dynamical equation becomes x_{t+1} = r · x_t · (1 − x_t)
     - This is called the “logistic” equation (the population equation) and is representative of a large class of real-world systems
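The logistic equation is easy to experiment with. A sketch: for growth rates 1 < r < 3 the iteration saturates at the fixed point x* = 1 − 1/r, whatever the starting value in (0, 1).

```python
def logistic_step(x, r):
    """One step of the logistic map x_{t+1} = r * x_t * (1 - x_t)."""
    return r * x * (1.0 - x)

def iterate(x0, r, steps):
    x = x0
    for _ in range(steps):
        x = logistic_step(x, r)
    return x

# For 1 < r < 3 the population saturates at x* = 1 - 1/r,
# regardless of the starting value in (0, 1).
r = 2.5
x_final = iterate(0.1, r, 1000)
print(x_final)  # approximately 0.6 = 1 - 1/2.5
```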
  25. The logistic equation in everyday terms
     - The rich get richer – up to a certain point
     - Large cities attract more migrants – until their infrastructure saturates
     - Celebrities (people who have media attention) get more media attention – until people get bored of them
     - Pages with high PageRank get higher PageRank – until user attention or search engine popularity saturates
     - A large population leads to a larger population – until resources saturate
  26. Sensitivity to initial conditions
     - Case: what happens when two or more non-linear processes share resources among themselves?
  27. Sensitivity to initial conditions
     [Figure]
  28. Sensitivity to initial conditions
     - The growth rate r of both A and B feeds on the same population base
     - The growth of A is at the cost of B, and vice versa
     - The growth of either A or B depends on its present population
     - Small differentials in initial populations can tilt the balance irrevocably
  29. Preferential attachment
     - The population distribution among the cells follows a power law
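Preferential attachment (“the rich get richer”) can be simulated in a few lines. A sketch, with the growth rule and parameters chosen for illustration: each new node links to an existing node with probability proportional to that node’s current degree, so a handful of hubs emerge while most nodes stay at degree 1.

```python
import random

random.seed(7)

def preferential_attachment(n):
    """Grow a network: each new node links to one existing node chosen
    with probability proportional to that node's current degree."""
    degree = [1, 1]   # start with two connected nodes
    targets = [0, 1]  # multiset: node i appears degree[i] times
    for new in range(2, n):
        old = random.choice(targets)  # degree-proportional choice
        degree.append(1)
        degree[old] += 1
        targets.extend([new, old])
    return degree

deg = preferential_attachment(2000)
avg = sum(deg) / len(deg)  # just under 2 (a tree: n - 1 edges)
print(max(deg), avg)
# A few hubs accumulate many links while most nodes remain at degree 1.
```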
  30. Impact of growth rate on dynamics
     [Figure]
  31. Impact of growth rate on dynamics
     [Figures: r = 3.0, r = 3.1, r = 3.2, r = 3.5]
  32. Impact of growth rate on dynamics
     [Figures: r = 3.7, r = 3.9]
  33. Period doubling and chaos
     - Increasing the growth rate in a saturating system leads to oscillations of increasing frequency
     - For growth rates r ∈ [3, 4), a phenomenon called “period doubling” or “bifurcation” is witnessed, with oscillations developing sub-oscillations
     - The rate at which sub-oscillations develop in the logistic equation is known to approach a constant (~4.66920) called Feigenbaum’s constant
     - When r ≥ 4, the system breaks down (iterates escape the unit interval)
  34. Period doubling in the logistic equation
     [Figure: bifurcation diagram]
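The period-doubling behaviour is reproducible with a few lines of code. A sketch at r = 3.2, just past the first bifurcation, where the orbit settles into a stable period-2 cycle:

```python
def logistic_orbit(r, x0=0.2, burn_in=2000, length=8):
    """Iterate x_{t+1} = r*x*(1-x), discard transients, return the tail."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    tail = []
    for _ in range(length):
        tail.append(x)
        x = r * x * (1.0 - x)
    return tail

# r = 3.2 lies just past the first bifurcation: the orbit oscillates
# between two values (a stable period-2 cycle).
tail = logistic_orbit(3.2)
print(tail)  # two distinct values, alternating
```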
  35. Attractors
     - A stable non-linear system eventually displays an “attractor” pattern
     - Attractor patterns can be “emergent” or “scale-invariant”
     - Emergence: an aggregate property that cannot be seen in the individual parts
     - Scale invariance: sub-systems displaying the same properties as the aggregate
  36. Emergent Attractors
     [Figure]
  37. Emergent Attractors
     [Figure]
  38. Emergent Attractors
     [Figure]
  39. Scale-invariant attractors
     [Figure]
  40. Scale-invariant attractors
     [Figure]
  41. Part III: Social Information Networks
  42. Outline for Part III
     - Random graphs
       - Largest connected component
       - Small-world networks
     - Information cascades
     - Emergence of network topology
  43. Machines vs. Societies
     - Machines
       - Designed for a specific purpose
       - Structure, a result of design
       - Complementary components
       - Component dynamics need coordination
     - Societies
       - Made up of autonomous actors pursuing self-interest
       - Structure an emergent property, a result of evolution
       - Actor dynamics need management
     - Machines of nature – living beings – are more like societies than machines
  44. Social information networks
     - Information networks formed in a society of autonomous actors
     - Network connections are typically a function of self-interest dynamics
     - The resulting network structure is interesting for its attractor properties
  45. Random graphs
     - The simplest form of social network models
     - Given a population of nodes, edges are added at random
     - Properties to observe:
       - Size of the largest connected component (system connectivity)
       - Diameter of the graph (maximum degree of separation)
  46. Random graphs
     - Largest connected component
       - Measures system connectivity
       - Calibrates the spread of ideas and influence
     - Diameter of the graph
       - Measures the degree of separation
       - Calibrates distortion (or lack of it) in the spread of ideas and influence
     - A large connected component
       - Useful for disseminating information
     - A small degree of separation
       - Useful for business connections to develop
  47. Largest connected component
     [Figure]
  48. Largest connected component
     - Connectivity in a system with n nodes witnesses an inflection roughly when n/2 random edges have been added
     - With n random edges, roughly 80% of the system is connected
     - Connectivity starts saturating around 4n random edges
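These thresholds can be checked with a quick simulation. A sketch using union-find (node count and edge multiples are illustrative; self-loops and repeated pairs are allowed, as in the uniform random model):

```python
import random
from collections import Counter

random.seed(1)

def largest_component_fraction(n, m):
    """Add m uniformly random edges to n isolated nodes and return the
    fraction of nodes in the largest connected component."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for _ in range(m):
        a, b = random.randrange(n), random.randrange(n)
        parent[find(a)] = find(b)

    counts = Counter(find(i) for i in range(n))
    return max(counts.values()) / n

n = 5000
fractions = {m: largest_component_fraction(n, m) for m in (n // 2, n, 4 * n)}
print(fractions)  # inflection near n/2 edges, ~80% at n, near-total at 4n
```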
  49. Random graph diameter
     [Figure]
  50. Random graph diameter
     - Adding random edges increases connectivity, but initially it also increases the overall degree of separation!
     - The degree of separation starts falling only after reaching a peak value
     - (More communication links make the world bigger before they make it smaller)
     - Small-world networks: networks whose diameter is much smaller than their number of nodes
  51. Clustered graphs
     - Social networks are better modeled as clustered graphs rather than pure random graphs
     - Clustered-graph property: if A knows B and C, then with very high probability B and C know each other
     - Random or “long-distance” edges link disparate clusters or communities
  52. Clustered graphs in metric spaces
     - Nodes are arranged in a metric space (with a distance function between node pairs)
     - Connection probability falls with distance
     - Random connections become rarer as distance increases
  53. Clustered graphs in metric spaces
     - Node u connects to node v with probability proportional to d(u,v)^(−α), where d(u,v) is the distance between u and v and α is the “clustering exponent”
  54. Clustered graphs in metric spaces
     - When α is high, the network becomes a clustered graph
     - The network has a large number of local connections, making it easy to navigate
     - It has a very small number of long-distance connections, making the diameter high
  55. Clustered graphs in metric spaces
     - When α is small, long-distance connections are as frequent as local connections
     - With enough edges, the diameter of the graph becomes small
     - But navigability suffers! Even though short paths exist, they cannot be discovered from local information
  56. Kleinberg connectivity
     - At a critical value α = 2 (for a two-dimensional grid), the clustering property of large α and the small-world property of small α balance each other
     - Such a graph not only has a short diameter; its short paths are also discoverable from local information
     - Such connectivity is also called Kleinberg connectivity
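The effect of the exponent can be seen by sampling a node’s long-range contact with probability proportional to d^(−α). A sketch on a one-dimensional ring (purely illustrative of how α shifts link lengths; the critical value α = 2 above refers to Kleinberg’s two-dimensional grid model):

```python
import random

random.seed(3)

def sample_link_distance(n, alpha):
    """Pick a long-range contact for a node on a ring of n nodes,
    with probability proportional to d^(-alpha); return the distance d."""
    distances = list(range(1, n // 2 + 1))
    weights = [d ** (-alpha) for d in distances]
    return random.choices(distances, weights=weights)[0]

def median_distance(n, alpha, trials=4000):
    ds = sorted(sample_link_distance(n, alpha) for _ in range(trials))
    return ds[trials // 2]

n = 1000
print(median_distance(n, 0.0))  # alpha = 0: links uniform, mostly long
print(median_distance(n, 2.0))  # alpha = 2: links overwhelmingly local
```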
  57. Kleinberg connectivity
     - An optimal graph structure, balancing the spread of information while minimizing distortion
     - An alternate way of verifying Kleinberg connectivity: a node has the same connectivity with nodes at different levels of granularity
     - Example: if you have n friends on the same street, n friends in the city, n friends in the country, and n friends across the world, you have Kleinberg connectivity
  58. Information cascades
     - The spread of information/ideas/fads across large populations
     - Two critical factors determine information cascades:
       - Network configuration
       - “Conformity”
  59. Asch conformity experiment
     [Figure]
  60. Asch conformity experiment
     - A majority of the subjects decided to conform to the group opinion, even though the correct answer was starkly visible!
     - The probability of conformance was found to be a function of the ratio of majority to minority, rather than of absolute numbers
  61. Conformity and cascades
     - A is more likely than B to adopt a new idea spreading through the network
  62. Information cascades
     - An idea originating at ‘a’ cascades to b, c and h when the conformity threshold is 0.5. It never cascades to ‘d’, because d is under pressure from e, f and g to conform to the status quo.
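This cascade can be reproduced with a simple threshold rule. The graph below is a hypothetical reconstruction consistent with the slide’s caption (the actual figure’s edges are not in the transcript): ‘a’ seeds the idea, and ‘d’ is held back by its status-quo neighbours e, f and g.

```python
# Hypothetical graph consistent with the slide's description.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "h"),
         ("h", "d"), ("d", "e"), ("d", "f"), ("d", "g")]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def cascade(seed, threshold=0.5):
    """A node adopts once at least `threshold` of its neighbors have."""
    adopted = {seed}
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node not in adopted:
                if len(nbrs & adopted) / len(nbrs) >= threshold:
                    adopted.add(node)
                    changed = True
    return adopted

print(sorted(cascade("a")))  # the idea reaches b, c and h but never d
```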
  63. Information cascades
     - Too little connectivity: insufficient exposure; not conducive to information cascades
     - Too much connectivity: inertia and conformance; not conducive to information cascades
       - In stark contrast to the epidemic spread of diseases, where higher connectivity means greater chances of an epidemic
  64. Emergence of network topology [Venkatasubramanian et al. 2004]
     - Given a society of n actors (nodes)
     - Each actor has survival demands, the supply for which may exist anywhere in the network
     - The communication network has three optimization criteria:
       - Efficiency
       - Robustness
       - Cost
  65. Emergence of network topology
     - Cost: each communication channel (edge) adds to the cost; cost is kept constant by giving each node only one edge
     - Efficiency: the system is efficient if the all-pairs separation between nodes is minimized
     - Robustness: the system is robust if the network remains connected in the face of node failures
  66. Emergence of network topology
     - Topology breeding:
       - Cost is kept constant by giving each node exactly one edge
       - Robustness is bounded by allowing the failure of any one node
       - Random topologies are generated and combined; topologies with lower fitness are discarded
       - Fitness is calculated using a parameter α that trades off efficiency against robustness
  67. Emergence of network topology
     - Emergent topology when α = 1 (100% importance to efficiency, 0% to robustness): the star
     - The star has the smallest degree of separation for a network of n nodes and n edges
     - Failure of the central node disconnects the society
  68. Emergence of network topology
     - Emergent topology when α = 0 (100% importance to robustness, 0% to efficiency): the circle
     - The circle keeps the society connected in the face of a single node failure
     - High degree of separation (not efficient)
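The efficiency/robustness trade-off between the two extreme topologies can be checked directly. A sketch with n = 20 nodes (BFS for separations; single-node deletion for robustness):

```python
from collections import deque

def avg_separation(adj):
    """Average shortest-path length over all node pairs (BFS per node)."""
    n = len(adj)
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def connected_after_removal(adj, removed):
    """Is the graph still connected after deleting one node?"""
    nodes = [u for u in range(len(adj)) if u != removed]
    seen, q = {nodes[0]}, deque([nodes[0]])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v != removed and v not in seen:
                seen.add(v)
                q.append(v)
    return len(seen) == len(nodes)

n = 20
star = {i: [0] for i in range(1, n)}
star[0] = list(range(1, n))
circle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

print(avg_separation(star), avg_separation(circle))  # star far shorter
print(connected_after_removal(star, 0))    # hub failure disconnects: False
print(connected_after_removal(circle, 0))  # circle survives any failure: True
```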
  69. Emergence of network topology
     - Emergent topology when α = 0.78
     - Intermediate values of α give a variety of “hub and spoke” topologies: combinations of circle and star
     - As n → ∞, the degree distribution in the hub-and-spoke topology resembles a power law
  70. Perceived value and saturation
     - In a society, actors connect to one another to receive “value”
     - In deciding to connect to somebody, there is a “perceived value” function to be optimized
     - Two cases of networks:
       - Small number of partners (costly connections; material exchange networks)
       - Large number of partners (frictionless connections; information networks)
  71. Perceived value and saturation
     - When an actor connects to another actor i, there is a perceived value v_i attached to that actor
     - In addition, there is a satisfaction value or saturation limit S for each actor
     - Connections are established until the accumulated perceived value reaches the saturation limit
     - Law of diminishing returns: the perceived value assigned to the k-th connection decreases as k increases, even if the intrinsic value provided by the node stays the same
     - Cumulative value at node j after z connections: S_j^z = Σ_{k=1..z} v/k
  72. Perceived value and saturation
     - As z → ∞, the cumulative value at any node j can be approximated as S_j^z = v [ln z + c]
     - Setting the intrinsic value v = 1, the average global satisfaction metric is given by S = ⟨S_j^z⟩ = c + ⟨ln z(j)⟩
     - In other words, the global satisfaction measure grows as the log of the average node degree
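The ln z approximation here is just the harmonic series: with diminishing returns v/k for the k-th connection, the cumulative value after z connections is v·(1 + 1/2 + … + 1/z) ≈ v·(ln z + c), where c ≈ 0.5772 is the Euler-Mascheroni constant. A quick check:

```python
import math

# Diminishing returns: the k-th connection contributes v/k, so the
# cumulative value after z connections is the harmonic sum
#   S_z = v * (1 + 1/2 + ... + 1/z)  ~  v * (ln z + c),
# where c ~ 0.5772 is the Euler-Mascheroni constant.
v, z = 1.0, 10 ** 6
harmonic = sum(v / k for k in range(1, z + 1))
approx = v * (math.log(z) + 0.5772156649)
print(harmonic, approx, abs(harmonic - approx))  # difference ~ 1/(2z)
```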
  73. Perceived value and saturation
     - Maximum entropy: in addition to saturation, connections are assumed to be made in a least-biased fashion, so as to minimize the latent uncertainty about connections in the face of failures
     - The resulting distribution of node degrees can be derived from the maximum entropy principle under the constraint on the global satisfaction function: S ∝ ⟨ln z⟩
     - As z → ∞, this yields a power-law distribution: Pr[z] ∝ z^(−α)
  74. The power-law network is hence an optimal network topology in frictionless transactions arising out of a number of individual decisions aiming to maximize value and minimize uncertainty!
  75. Thank You! Q & A
  76. Further reading
     - L. A. Adamic. Zipf, Power-laws and Pareto: A ranking tutorial. HP Labs technical report. http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
     - Karthik B.R., Aditya Ramana Rachakonda, Srinath Srinivasa. Strange Central-Limit Properties of Keyword Queries on the Web. IIITB Technical Report, 2007.
     - Jon Kleinberg. The small-world phenomenon: An algorithmic perspective. 2000. http://www.cs.cornell.edu/home/kleinber/swn.ps
     - Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, Vol. 286, pp. 509–512, 1999.
     - M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, Vol. 1, No. 2, pp. 226–251, 2003.
     - M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, Vol. 46, pp. 323–351, 2005.
     - Venkat Venkatasubramanian, Santhoji Katare, Priyan R. Patkar, Fang-ping Mu. Spontaneous emergence of complex optimal networks through evolutionary adaptation. Computers and Chemical Engineering, Vol. 28, pp. 1789–1798, 2004.
     - Venkat Venkatasubramanian, Dimitris Politis, Priyan Patkar. Entropy maximization as a holistic design principle for complex, optimal networks. AIChE (American Institute of Chemical Engineers) Journal, Vol. 52, No. 3, pp. 1004–1009, March 2006.
