Aristotle University, Department of Mathematics
Master in Web Science, supported by the Municipality of Veria

Stochastic Modeling of Web Evolution
S. Amarantidis, I. Antoniou, M. Vafopoulos
Mathematics Department, Aristotle University of Thessaloniki

SMTDA 2010, Chania, Crete, Greece
Contents
- What is the Web?
- What are the issues and problems?
- The Web as a Complex System
- Query-Web Models
- Stochastic Models and the Web
What is the Web?
Internet ≠ Web
Web: a system of interlinked hypertext documents (HTML) with unique addresses (URIs), accessed via the Internet (HTTP)
Web milestones
- 1989: Tim Berners-Lee proposes the idea at CERN
- 1994: Dertouzos (MIT) and Metakides (EU) create the W3C, appointing TBL as director
Why is the Web so successful?
It is based on an architecture (HTTP, URI, HTML) which is:
- simple
- free or cheap
- open source
- extensible
- tolerant
- networked
- fun & powerful
- universal
Why is it so successful?
- A new experience of exploring and editing huge amounts of information, people, and abilities, anytime, from anywhere
- The biggest human system with no central authority and control, but with log data (Yotta* Bytes/sec)
- It has not yet revealed its full potential…
*Yotta = 10^24
We knew the Web was big...
- 1 trillion unique URIs (Google blog, 7/25/2008)
- 2 billion users
- Google: 300 million searches/day
- US: 15 billion searches/month
- 72% of the Web population are active on at least 1 social network…
Source: blog.usaseopros.com/2009/04/15/google-searches-per-day-reaches-293-million-in-march-2009/
Web: the new continent
Facebook: 400 million active users
- 50% of active users log on to Facebook on any given day
- 35 million users update their status each day
- 60 million status updates posted each day
- 3 billion photos uploaded to the site each month
Twitter: 75 million active users, 141 employees
YouTube: 350 million daily visitors
Flickr: 35 million daily visitors
Web: the new continent
- Online advertising spending in the UK has overtaken television expenditure for the first time [4 billion Euros/year] (30/9/2009, BBC)
- In the US, spending on digital marketing will overtake that of print for the first time in 2010
- Amazon.com: 50 million daily visitors, 60 billion dollars market capitalization, 24,000 employees
Web generations
Web: what are the issues and related problems?
- Surfing (navigating) safely
- Finding relevant and credible information (example: research)
- Creating a successful e-business
- Reducing tax evasion
- Enabling local economic development
- Communicating with friends, colleagues, customers, citizens, voters,…
Need to study the Web
The Web is the largest human information construct in history. The Web is transforming society… It is time to study it systematically, as a stand-alone socio-technical artifact.
How to study the Web?
Analyze the interplay among:
- Structure
- Function
- Evolution
The Web as a highly inter-connected, large complex system
Web Modeling
Understand, measure and model the Web's evolution, in order to optimize its social benefit through effective policies
What is the Structure of the Web?
The Web as a Graph:
- Nodes: the websites (URIs), more than 1 trillion
- Links: the hyperlinks, 5 links per page on average
- Weights: link assessment
The WWW graph is a Directed Evolving Graph
[Figure: example of a weighted directed graph, with edge weights such as 0.5, 1.23, 0.21, 2.14]
Statistical Analysis of Graphs
The degree distribution P(k) = P(d ≤ k) is the distribution function of the random variable d that counts the degree of a randomly chosen node.
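As an illustration, a minimal sketch computing this empirical distribution function, assuming networkx and numpy are available; the random directed graph is a placeholder for real Web data:

```python
# Empirical degree distribution P(k) = P(d <= k) for a random directed graph.
import networkx as nx
import numpy as np

G = nx.gnp_random_graph(1000, 0.01, directed=True)  # placeholder Web-like graph
degrees = np.array([deg for _, deg in G.in_degree()])

def degree_cdf(k):
    """Fraction of nodes whose in-degree is at most k."""
    return float(np.mean(degrees <= k))

print(degree_cdf(5), degree_cdf(20))
```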
Statistical Analysis of the Web Graph
Four major findings:
- power law degree distribution (self-similarity): internet traffic [Faloutsos], Web links [Barabasi]
- small world property (the diameter is much smaller than the order of the graph): easy communication
- many dense bipartite subgraphs
- on-line property (the number of nodes and edges changes with time)
Distribution of links on the World-Wide Web
P(k) ∼ k^(−γ) power law
a, Outgoing links (URLs found on an HTML document); b, Incoming links to a Web document; c, Average of the shortest path between two documents as a function of system size [Barabasi et al. 1999]
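A hedged sketch of how γ can be estimated in practice: the Barabási-Albert graph stands in for Web crawl data (an assumption), and the log-log least-squares fit is a crude estimator; maximum-likelihood methods are preferred for serious work.

```python
# Estimate the power-law exponent gamma from a degree histogram.
import networkx as nx
import numpy as np

G = nx.barabasi_albert_graph(10_000, 5)   # stand-in for a Web crawl
degrees = np.array([deg for _, deg in G.degree()])

ks, counts = np.unique(degrees, return_counts=True)
pk = counts / counts.sum()

mask = ks >= 5                            # heuristic tail cutoff
slope, _ = np.polyfit(np.log(ks[mask]), np.log(pk[mask]), 1)
print(f"estimated gamma ~ {-slope:.2f}")  # the BA model predicts gamma = 3
```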
Small World Property
Social Communication Networks [Watts-Strogatz 1998]: short average path lengths and high clustering.
WWW Average Distance (Shortest Path) between 2 Documents:
<ℓ> = 0.35 + 2.06 log10(n)
<ℓ> = 18.6 for n = 8 x 10^8 (1999)
<ℓ> = 18.9 for n = 10^9 (2009), since 0.35 + 2.06 x 9 ≈ 18.9
Two randomly chosen documents on the web are on average 19 clicks away from each other (Small World).
Web dynamics
- Search (PageRank, HITS, Markov matrices)
- Traffic
- Evolution: graph generators
- Games: mechanism design (auctions)
- Queries-search engine-Web
Search: The Hyperlink Matrix
The PageRank vector π is an eigenvector of the Hyperlink Markov matrix M for the eigenvalue 1; π is a stationary distribution:
Mπ = π
π = (π(κ)), where π(κ) is the PageRank of the web page κ
dim M = the number of web pages that can be crawled by search engines
Basis of Google's Algorithm
- If the Markov matrix M is ergodic, the stationary distribution vector π is unique.
- If the Markov matrix M is mixing, then π is calculated as the limit π = lim M^ν ρ (ν → ∞), for every initial probability distribution ρ.
- The 2nd eigenvalue of M estimates the speed of convergence.
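A minimal power-iteration sketch of these two slides; the toy 3-page matrix and the damping factor 0.85, which makes the matrix ergodic and mixing, are illustrative conventions, not details from the slides:

```python
import numpy as np

def pagerank(M, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on the damped, column-stochastic hyperlink matrix."""
    n = M.shape[0]
    G = d * M + (1 - d) * np.ones((n, n)) / n   # ergodic "Google matrix"
    pi = np.full(n, 1.0 / n)                    # any initial distribution rho
    for _ in range(max_iter):
        nxt = G @ pi
        if np.abs(nxt - pi).sum() < tol:        # converged: M pi = pi
            break
        pi = nxt
    return pi

# Toy 3-page web; column j lists where page j links, uniformly.
M = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
print(pagerank(M))

# The modulus of the 2nd eigenvalue bounds the convergence rate.
G = 0.85 * M + 0.15 * np.ones((3, 3)) / 3
print(np.sort(np.abs(np.linalg.eigvals(G)))[-2])
```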
Internet Traffic
Prigogine and Herman 1971: a stochastic model of vehicular traffic dynamics, based on statistical physics, between the macroscopic "fluid dynamics" model and the individual vehicle model (1st order SDE).
- f0 is the "desired" velocity distribution function
- x and v are the position and velocity of the "vehicles"
- ⟨v⟩ is the average velocity
- c is the concentration of the "vehicles"
- P is the probability of "passing", in the sense of increase of flow
- T is the relaxation time
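The kinetic equation itself appeared as an image on the original slide. Reconstructed from the symbol list above, the standard form of the Prigogine-Herman equation reads:

```latex
\frac{\partial f}{\partial t} + v\,\frac{\partial f}{\partial x}
  \;=\; \frac{f_0 - f}{T} \;+\; c\,(1 - P)\,\bigl(\langle v \rangle - v\bigr)\, f
```

The first term relaxes the velocity distribution f(x, v, t) toward the desired distribution f0 over the relaxation time T; the second is the interaction term, pulling vehicles toward the average velocity ⟨v⟩ when passing (probability P) fails.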
Adaptation of the Prigogine-Herman Model for Internet Traffic [Antoniou, Ivanov 2002, 2003]
- Vehicles = the Information Packages
- Statistics of Information Packages: Log-Normal Distribution
The Origin of Power Law in Network Structure and Network Traffic
Kolmogorov 1941, The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers, Dokl. Akad. Nauk SSSR 30, 301.
- The origin of Self-Similar Stochastic Processes: the model of homogeneous fragmentation
- Applying a variant of the central limit theorem, Kolmogorov found that the logarithms of the grain sizes are normally distributed
- Before Fractals and modern scale-free models; Wavelet Analysis of data [Antoniou, Ivanov 2002]
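A toy simulation of the fragmentation argument (illustrative, not Kolmogorov's derivation): each split multiplies a grain's size by an independent random factor, so log-sizes are sums of i.i.d. terms and the central limit theorem makes them approximately normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_grains, n_splits = 100_000, 50

sizes = np.ones(n_grains)
for _ in range(n_splits):
    sizes *= rng.uniform(0.1, 1.0, size=n_grains)  # random fragmentation ratio

log_sizes = np.log(sizes)
# For large n_splits the histogram of log_sizes is close to a Gaussian,
# i.e. the sizes themselves are log-normally distributed.
print("mean and std of log-sizes:", log_sizes.mean(), log_sizes.std())
```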
Evolution: Graph Generators
- Erdős-Rényi (ER) model [Erdős, Rényi '60]
- Small-world model [Watts, Strogatz '98]
- Preferential Attachment [Barabási, Albert '99]
- Edge Copying models [Kumar et al. '99], [Kleinberg et al. '99]
- Forest Fire model [Leskovec, Faloutsos '05]
- Kronecker graphs [Leskovec, Chakrabarti, Kleinberg, Faloutsos '07]
- Optimization-based models [Carlson, Doyle '00], [Fabrikant et al. '02]
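Three of these generators are available off the shelf in networkx; a short sketch with illustrative parameters:

```python
import networkx as nx

er = nx.erdos_renyi_graph(n=1000, p=0.01)          # Erdős-Rényi '60
ws = nx.watts_strogatz_graph(n=1000, k=10, p=0.1)  # Watts-Strogatz '98
ba = nx.barabasi_albert_graph(n=1000, m=5)         # Barabási-Albert '99

for name, g in [("ER", er), ("WS", ws), ("BA", ba)]:
    degs = [deg for _, deg in g.degree()]
    print(name, "max degree:", max(degs))          # only BA grows large hubs
```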
Evolution: Game Theoretic Models
- Stageman (2004), Information Goods and Advertising: An Economic Model of the Internet
- Zsolt Katona and Miklos Sarvary (2007), Network Formation and the Structure of the Commercial World Wide Web
- Kumar (2009), Why do Consumers Contribute to Connected Goods
Evolution: Queries - Search Engine - Web
Kouroupas, Koutsoupias, Papadimitriou, Sideri (KKPS) 2005
- Economics-inspired model (utility)
- Explains scale-free behavior
In the Web, three types of entities exist:
- Documents, i.e. web pages, created by authors [n]
- Users [m]
- Topics [k]
k ≤ m ≤ n
The KKPS model
- The Search Engine recommends Documents to the Users
- A User obtains satisfaction (Utility) after being presented with some Documents by a Search Engine
- Users choose and endorse those Documents that have the highest Utility for them, and then
- Search Engines make better recommendations based on these endorsements
Documents
For each topic t ≤ k there is a Document vector Dt of length n (relevance of Document d for Topic t).
For Dt the value 0 is very probable, so that about k-1 of every k entries are 0.
User-Query
Users can be thought of as simple Queries asked by individuals.
For each topic t there is a User vector Rt of length m (relevance of User-Query i for Topic t), with about m/k non-zero entries.
User-Query
- The number of Documents proposed by the Search Engine is fixed and denoted by α
- The number of endorsements per User-Query is also fixed and denoted by b
- b ≤ α ≤ n
The algorithm
Step 1: A User-Query, for a specific Topic, is entered in the Search Engine.
Step 2: The Search Engine recommends α relevant Documents. The listing order is defined by a rule. In the very first operation of the Search Engine, the rule is random listing according to some probability distribution.
Step 3: Among the α recommended Documents, b are endorsed on the basis of highest Utility. In this way, the bipartite graph S = ([m], [n], L) of Document endorsements is formed. Compute the in-degree of the Documents from the endorsements.
The algorithm
Step 4: Repeat Step 1 for another Topic.
Step 5: Repeat Step 2. The rule for Document listing is decreasing in-degree, as computed in Step 3, for the specific User-Query.
Step 6: Repeat Step 3.
Step 7: Repeat Steps 4, 5, 6 for a number of iterations necessary for statistical convergence ("that is, until very few changes are observed in the www state").
A simulation sketch of the full loop follows below.
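A hedged simulation sketch of Steps 1-7. The uniform random relevance values, the sparsity choices, and the single global in-degree listing shared by all queries are simplifying assumptions of this sketch, not details fixed by the KKPS paper.

```python
import numpy as np

rng = np.random.default_rng(1)
k, m, n = 10, 200, 2000      # topics <= users <= documents
alpha, b = 20, 5             # recommended and endorsed counts, b <= alpha <= n

# D[t, d]: relevance of Document d for Topic t, ~ (k-1)/k of entries zero.
D = rng.random((k, n)) * (rng.random((k, n)) < 1.0 / k)
# R[t, i]: relevance of User-Query i for Topic t, ~ m/k non-zero per topic.
R = rng.random((k, m)) * (rng.random((k, m)) < 1.0 / k)

U = R.T @ D                  # U[i, d]: Utility of Document d for User-Query i

in_degree = np.zeros(n)
order = rng.permutation(n)   # Step 2, first round: random listing

for _ in range(10):          # Step 7: iterate toward statistical convergence
    for i in range(m):       # Steps 1-3 (and 4-6) for every User-Query
        recommended = order[:alpha]                         # alpha documents
        best_b = recommended[np.argsort(U[i, recommended])[-b:]]
        in_degree[best_b] += 1                              # b endorsements
    order = np.argsort(-in_degree)  # Step 5: re-list by decreasing in-degree

print("largest in-degrees:", np.sort(in_degree)[-5:])
```

The in-degree vector produced this way is the quantity whose distribution is examined for power-law behavior in the experiments below.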
Utility
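The formula on this slide was an image. Since the discussion slides below describe Utility as a time-invariant linear function of R and D, a plausible reconstruction (an assumption, not the slide's own equation) is:

```latex
U(i, d) \;=\; \sum_{t=1}^{k} R_t(i)\, D_t(d)
```

i.e. the Utility of Document d for User-Query i is the topic-wise inner product of the two relevance vectors.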
Results of statistical experiments (KKPS)
- For a wide range of values of the parameters m, n, k, α, b, the in-degree of the documents is power-law distributed
- The price of anarchy (efficiency of the algorithm) improved radically during the first 2-3 iterations, and later the improvement had a slower rate
Results of statistical experiments (KKPS)
- When the number of topics k increases, the efficiency of the algorithm increases
- When α increases (the number of documents recommended by the search engine), the efficiency of the algorithm also increases
- Increasing b (the number of endorsed documents per user) causes the efficiency of the algorithm to decrease
Results of statistical experiments (Amarantidis-Antoniou-Vafopoulos)
We extend the investigation in two directions:
- Uniform, Poisson and Normal initial random distributions of the Documents' in-degree (Step 2)
- different values of α, b and k
Results of statistical experiments (Amarantidis-Antoniou-Vafopoulos)
In the case α = b, the validity of the power law becomes less significant as b increases.
(b: the number of endorsed documents per user; α: the number of documents recommended by the search engine)
Results of statistical experiments (Amarantidis-Antoniou-Vafopoulos)
An increase in the number of Topics k results in faster decay of the power law exponent.
Power law for the case b ≤ α
Efficiency of the search algorithm (α = b)
The efficiency of the search algorithm increases when the number of topics k increases.
[confirmation of KKPS results]
Efficiency of the search algorithm in the case b ≤ α
The efficiency of the search algorithm increases when α, the number of Documents recommended by the Search Engine, increases.
[confirmation of KKPS results]
Efficiency of the search algorithm (b ≤ α)
The efficiency of the search algorithm increases when b, the number of endorsed Documents per User-Query, increases.
[KKPS results not confirmed]
Discussion of statistical experiments (Amarantidis-Antoniou-Vafopoulos)
- α = b: all recommended Documents are endorsed according to the highest in-degree criterion
- Utility is useful only in terms of establishing compatibility between the Utility Matrix and the Users-Queries and Documents bipartite graph
Discussion of statistical experiments (Amarantidis-Antoniou-Vafopoulos)
On the origin of the power law distribution of the in-degree of Documents, two mechanisms are identified in the KKPS model:
- Users-Queries endorse a small fraction of the Documents presented (b)
- Assuming a small fraction of poly-topic Documents, the algorithm creates a high number of endorsements for them
The above mechanisms are not exhaustive for the real Web graph. Indexing algorithms, crawler design, and Document structure and evolution should be examined as possible additional mechanisms contributing to the manifestation of the power law distribution.
Discussion on the Endorsement Mechanism
"The endorsement mechanism does not need to be specified, as soon as it is observable by the Search Engine. For example, endorsing a Document may entail clicking it, or pointing a hyperlink to it."
This KKPS hypothesis does not take into account the fundamental difference between clicking a link (browsing) and creating a hyperlink.
Discussion
Web traffic is observable by the website owner or administrator through the corresponding log file, and by third parties, whether authorized (like search engine cookies, which can trace clicking behavior) or malicious.
Discussion
On the contrary, creating a hyperlink results in a more "permanent" link between two Documents, which is observable by all Users-Queries and Search Engines. Therefore, the KKPS algorithm actually examines Web traffic, and not the hyperlink structure of Documents, which is the basis of the in-degree Search Engine algorithms.
Discussion
Web traffic, as well as Web content editing, is not taken into account in the algorithms of Search Engines based on the in-degree (e.g. PageRank). These algorithms were built for Web 1.0, where Web content update and traffic monetization were not so significant.
Discussion
In the present Web 2.0 era of rapid change, the Web graph, content and traffic should all be taken into account in efficient search algorithms. Therefore, birth-death processes for Documents and links, and Web traffic, should be introduced in Web models, combined with content update (Web 2.0) and semantic markup (Web 3.0) for Documents.
Discussion
The discrimination between Users and Queries could facilitate extensions of the KKPS model:
- teleportation (a direct visit to a Document which bypasses Search Engines)
- different types of Users, and relevance feedback between Documents and Queries
Discussion
KKPS: Utility is defined to be a time-invariant linear function of R and D, which by construction does not affect the www state when α = b. This does not take into account the dynamic dependence of the Utility on the www state. In reality, the evolution of the www state will change both R and D. A future extension of the KKPS model should account for user behavior by incorporating Web browsing and editing preferences.
Discussion
It would be useful to offer deeper insight into the Web's economic aspects in the KKPS model:
- valuation mechanisms for Web traffic and link structures
- monetizing the search procedure (sponsored search; digital, excludable, and anti-rival goods, etc.)
Stochastic Models and the Web
Webmetrics: statistical models for the Web's function, structure & evolution, in order to evaluate individual, business and public policies
Master in Web Science
Based on Web assessment, mathematical modeling and operation, combined with business applications and societal transformations in the knowledge society. Web science studies, apart from Academic, Research and Training careers, offer remarkable opportunities in Business.
Master in Web Science
Michalis Vafopoulos
