The Google Pagerank algorithm - How does it work?
Transcript

  • 1. QAB Term 1: Markov Chains and Google Inc. Gustavo Arguello, Kundan Bhaduri, Verity Noble. IMBA Nov 2010 N1, IE Business School, Maria de Molina 11, Madrid 28002, Spain
  • 2. QAB Term 1 Project: Markov Chains and Google Inc.
    Table of Contents
    Implementing Markov Chains with Google PageRank
    Issues to be addressed
    Techniques that may be used to overcome the problem of solving such a large system
    Exhibit 1: A sample 4-state Markov chain with transition probabilities
    Exhibit 2: Sample 4X4 transition Matrix
    Exhibit 3: Explaining the basis of Markov’s chain
    Exhibit 4: Demonstrating the stable state values using simple matrix multiplication
    Exhibit 5: Calculating the steady state eigen values πA and πE
    Exhibit 6: The improved Google PageRank algorithm
    Exhibit 7: PageRank of the search string ‘Techbend blog’
    Exhibit 8: The correlation between a webpage and the rest of the web
    Exhibit 9: KundanBhaduri.com and its links to other sites
    Exhibit 10: Applying Markov Chain method to calculate the PageRank for ‘TechBend blog’
    Exhibit 11: Computing a small Eigen value with Power Method
    IMBA NOV 2010 N1: Gustavo Arguello | Kundan Bhaduri | Verity Noble
  • 3. Implementing Markov Chains with Google PageRank
    In its most basic form, a homogeneous Markov chain (Exhibit 1) is a series of events or actions that follow one another, in which the transition from one state to the next is memoryless. More formally, a Markov chain is a collection of random variables {Xt} with the property that, given the current state, the future is conditionally independent of the past. [1] The probabilities of moving between states are collected in a square matrix known as the transition matrix. We can therefore classify a problem as solvable by the theory of Markov chains if it has the following characteristics:
    a) At any point in time, each object is in one and exactly one defined state. At the end of the period, the object can move to a new state or remain in its original state. [2]
    b) The objects move between states according to transition probabilities (Exhibit 2) that depend only on the current state. The probabilities of moving to all possible states sum to one.
    c) The transition probabilities (of going from A to B) remain constant over time.
    To develop an understanding of how to solve a Markov chain, assume that the simple 2-state chain in Exhibit 3 describes a simple website. From the homepage (E), a user clicks a link that leads her to page (A) 70% of the time, while the remaining 30% of the time she clicks a link that keeps her on the same page (E). Similarly, once the user is on page (A), 40% of the time she clicks a link back to (E), and the remaining 60% of the time she clicks a link that keeps her on the same page (A). The Markov chain can help us find the probability of a random user being on any given page after X iterations of this chain.
    The website administrator might want to use this information to decide which page to focus on in order to maximise ad revenue. Please note that Google’s implementation of the Markov chain is that of a non-absorbing Markov chain.
    To solve this problem, we start by using the tree method to calculate the second-level probability Pij(2), i.e. the probability of going from node i to node j in two steps, where i and j are each either E or A, as given in Exhibit 4. Here we observe that the probability of landing on page A is now 63% if the user started at E and 64% if she started at A. If we continue in this way for up to 7 iterations, we find that the probability values reach a steady state and no longer change.
    To find the steady-state probabilities of both webpages, we use the steady-state equation π = π*P and solve it as shown in Exhibit 5. This establishes the steady-state values πA and πE (referred to in this report as their Eigen values) as approximately 0.64 and 0.36 respectively. We can therefore recommend that it is wiser to spend advertising effort on page A, since in the long run it is nearly twice as likely to attract clicks as page E. As we progress towards looking at how Google ranks pages according to their relevance, it will be interesting to note that these Eigen values play a significant part.
    Markov chains have significant uses in industrial research, organizational behaviour, financial markets analysis, human resource planning, marketing forecasting and so on. A very interesting use of Markov chains has been in the music industry.
    As early as the 1950s, music composers used Markov chains to study the pattern of notes in popular songs [3] and thereby create new music sequences based on the studied notes.
    The example of linked webpages discussed above can now be extrapolated to calculate the probability of arriving at any webpage for a given search criterion, if the entire World Wide Web is considered as one large, connected, memoryless chain. Based on the relevance criterion, we can estimate the highest relevance factor, and therefore any page’s utility rank for a search string. This is the rationale behind Google’s patented PageRank algorithm.
    [1] Weisstein, Eric W. "Markov Chain." From MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/MarkovChain.html
    [2] Tamara Lynn Anthony, Rice University: Markov Chains
    [3] Verbeurgt Karsten, Dinolfo Michael, Fayer Mikhail: Extracting Patterns in Music for Composition via Markov Chains
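The two-page example can be checked with a few lines of Python. This is only a sketch: the function name `two_state_steady_state` and its parameter names are illustrative and do not come from the report. It solves π = π*P for a 2-state chain using the balance condition πE · p(E→A) = πA · p(A→E) together with πE + πA = 1.

```python
def two_state_steady_state(p_ea, p_ae):
    """Steady state of a 2-state chain with P = [[1 - p_ea, p_ea], [p_ae, 1 - p_ae]].

    From pi_A = p_ea * pi_E + (1 - p_ae) * pi_A we get pi_E * p_ea = pi_A * p_ae,
    and with pi_E + pi_A = 1 this gives pi_A = p_ea / (p_ea + p_ae).
    """
    pi_a = p_ea / (p_ea + p_ae)
    return 1.0 - pi_a, pi_a


# Values from the website example: E -> A with 0.7, A -> E with 0.4.
pi_e, pi_a = two_state_steady_state(p_ea=0.7, p_ae=0.4)
print(f"pi_E = {pi_e:.4f}, pi_A = {pi_a:.4f}")  # pi_E = 0.3636, pi_A = 0.6364
```

The closed form only applies to two states; larger chains need a linear solve or an iterative method.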
  • 4. Google’s PageRank algorithm [4] is a stochastic algorithm that determines the significance of a page relative to a search string. It is not the only factor that Google uses to rank pages, but it is an important one. For Google (or for a web administrator), the PageRank of a page denotes the probability of a random web surfer reaching that page after clicking on many links. The PageRanks form a probability distribution over web pages, which is why the PageRanks of all pages sum to 1. Refer to Exhibit 6 for a mathematical representation of the PageRank algorithm. Essentially, the Google PageRank method ranks a page higher (i.e. as more important) when other highly ranked or more important pages link to it.
    Let us explain the algorithm with a real-life example. One of the co-authors of this report is an active technology blogger and writes a blog called “The TechBend” at www.KundanBhaduri.com. Exhibit 7 shows that the Google PageRank for the search string “Techbend blog” is highest for www.KundanBhaduri.com, which therefore appears at the top of Google’s search results. Interestingly, although there are other professional sites and blogs with domain names such as www.TechBend.com, they do not appear anywhere close to the top of the search results on Google. Let us explore how this was achieved using Markov chains.
    Holistically, the internet as we know it is a connected graph of interlinked webpages (Exhibit 8). It therefore has an exhaustively large transition probability matrix. One look at Exhibit 9 tells us that for the homepage of The Techbend to rank high on Google’s PageRank, its Eigen value has to be higher than that of all other competing webpages that have the same context. More specifically, the Eigen values have to be high on connections to those nodes (webpages) in the matrix which themselves have high Eigen values with other connections.
    In other words, the probability of reaching our target page will be high when coming from another high-probability page. We tested this logic with Exhibits 3 and 5, where we saw that A achieved a higher Eigen value because it was more probable to arrive at A from E or to remain on A itself. This logic is at the core of Google’s PageRank.
    In our example, www.KundanBhaduri.com achieves a higher PageRank by being linked with other highly prominent websites such as Techcrunch, Engadget and TED. Since these sites enjoy a higher PageRank and link back to The Techbend Blog, the overall probability of a random surfer arriving at www.KundanBhaduri.com is higher than it is for www.TechBend.com. This is explained by a higher Eigen value (Exhibit 10) and therefore a higher PageRank for The Techbend. An important point to emphasize here is that what matters is not just the number of links that a webpage exchanges with others but their relative importance in the universe of all such links.
    Issues to be addressed
    However, since the internet is an exhaustively large set of nodes (over 1 trillion) [5], there are some issues that need to be addressed to make the Markov chain model functional for Google PageRank. Firstly, the calculation of the Eigen vector for such a large (and growing) matrix is non-trivial. We will address this issue in the second part of the report. Beyond that, handling dangling nodes (i.e. dead pages) and choosing an appropriate damping factor are significant issues. The damping factor refers to the probability that the random user will not abruptly end the session (by either exiting the browser or typing a new URL). To avoid creating an absorbing Markov chain, pages with no outbound links are assumed to link out to all other pages in the collection.
    Their PageRank scores are therefore divided evenly amongst all other pages.
    Calculating the preliminary transition matrix of the web is also a significant challenge given the massive size of the World Wide Web. A workaround to this problem is to ‘guess’ the transition matrix and then progressively correct the values. Since Google recalculates the PageRanks every time it crawls the web, the approximation error decreases with each iteration.
    [4] Hwai-Hui Fu, Dennis K. J. Lin and Hsien-Tang Tsai (Dept. of Bus. Administration, Shu-Te University): Applied Stochastic models in Business and Industry
    [5] Alpert, Jesse, and Nissan Hajaj. "We Knew the Web Was Big..." Official Google Blog. 25 July 2008. Web. 06 Feb. 2011.
  • 5. Techniques that may be used to overcome the problem of solving such a large system
    Now that we understand how Google was able to apply a form of Markov chain modelling to create its PageRank system, we will address one of the most significant problems it faced: solving the system π = πP. For a small matrix, we can quickly find exact solutions. When the web was much smaller, Google could compute the steady-state vector of 26 million pages in about 2 hours. [6] The resulting computation would then be used for a fixed period of time. However, because of the sheer size of the World Wide Web, which Google asserts is now over the 1 trillion mark [7], the resulting stochastic matrix will now contain over a trillion rows and columns.
    Additionally, given the dynamics of Web 2.0, it would no longer be efficient for Google to use the stale data from these computations for a fixed time interval. “Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day”. [8] In sum, the ever-changing and ever-expanding nature of the World Wide Web and its content, coupled with the search engine’s commitment to provide the best information available, only serves to multiply the problem of solving the aforementioned system.
    If you think about it, the resulting matrix of the web, with its more than a trillion columns and rows, is going to be composed mostly of zeroes, given that most webpages link to only a very small number of other web pages.
    In fact, a 2004 study shows that the average number of out-links from a given webpage is just 52, hence on average only 52 of the roughly one trillion elements in a row are non-zero. [9] This means that the web matrix is very sparse.
    To solve this problem, one of the main tools that can be used (or a variation thereof that Google appears to have implemented) is called “The Power Method” or “Power Iteration”. Applied to the Google matrix, this method converges to the PageRank vector; in other words, it ultimately helps us define the weighting or importance of our webpages relative to the entire matrix. The power method is an iterative process for approximating eigenvalues; we will use it to find our dominant eigenvalue and eigenvector. “Eigenvectors of a square matrix are the non-zero vectors which, after being multiplied by the matrix, remain proportional to the original vector". [10] To implement this method, we must assume that our matrix, which we will now refer to as matrix A, has a dominant eigenvalue with corresponding dominant eigenvectors. The dominant eigenvector of a matrix is an eigenvector corresponding to the eigenvalue of largest magnitude of that matrix. To approximate a dominant eigenvector, we choose an initial approximation of one of the dominant eigenvectors of A, which we will call π0. Then we can form the following sequence [11]:
    \[
    \begin{aligned}
    \pi_1 &= A\pi_0 \\
    \pi_2 &= A\pi_1 = A(A\pi_0) = A^2\pi_0 \\
    \pi_3 &= A\pi_2 = A(A^2\pi_0) = A^3\pi_0 \\
    &\;\;\vdots \\
    \pi_k &= A\pi_{k-1} = A(A^{k-1}\pi_0) = A^k\pi_0
    \end{aligned}
    \]
    For large values of k, this method provides a good approximation of the dominant eigenvector of matrix A. The method requires successive iterates until some convergence criterion is satisfied. With our dominant eigenvector, we can find our dominant eigenvalue using the Rayleigh quotient, as follows [12]:
    [6] Alpert, Jesse, and Nissan Hajaj. "We Knew the Web Was Big..." Official Google Blog. 25 July 2008. Web. 06 Feb. 2011. <http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html>.
    [7] Ibid.
    [8] Ibid.
    [9] Anuj Nanavati, Arindam Chakraborty, David Deangelis, Hasrat Godil, and Thomas D’Silva, An investigation of documents on the World Wide Web, http://www.iit.edu/~dsiltho/Investigation.pdf, December 2004.
    [10] "Eigenvalues and Eigenvectors." Wikipedia, the Free Encyclopedia. 27 Sept. 2010. Web. 10 Feb. 2011. http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors.
    [11] Larson, Ron, David C. Falvo, and Bruce H. Edwards. Elementary Linear Algebra. Boston: Houghton Mifflin, 2004. 550-58. Print.
    [12] Ibid.
  • 6. \[ \lambda = \frac{A\pi \cdot \pi}{\pi \cdot \pi} \]
    “In cases for which the power method generates a good approximation of a dominant eigenvector, the Rayleigh quotient provides a correspondingly good approximation of the dominant eigenvalue”. [13]
    One of the unique features of the Google matrix, as we briefly mentioned before, is that the total number of nonzero elements in a given row is quite small, due to the small number of hyperlinks that a given webpage might contain (Exhibit 11). Since all our computations involve this sparse matrix multiplied by vectors, an iteration of the power method is considered very cheap. [14]
    Another necessary technique Google implemented to make this system solvable was the fix to the dangling node problem. What happens when a user arrives at a webpage that does not link out to another webpage? Does our random surfer become absorbed by this webpage and never leave? This is the dangling node problem: unless we do something to correct the situation, our Markov chain would treat these nodes as absorbing states. Suppose the Google matrix is called H. To correct for this, we could create a new matrix S = H + dw, where d is a column vector that identifies dangling nodes, with an entry of 1 if the node is dangling and 0 otherwise, and w is a row vector (w1, w2, …, wn) that determines where our random surfer will go in order not to become absorbed. One way of assigning values to this row vector is to say that there is an equal probability that our surfer will land on any of the n webpages that exist, so w would look like this: (1/n, 1/n, …, 1/n). Whilst there are other ways to assign w, this is the most common, and is sufficient for our purposes.
    Another important technique that may be used by Google to help solve the system is the inclusion of a damping factor.
    The damping factor is added to account for the possibility that a web surfer may at any time choose not to follow the links available on a given webpage and instead type in any URL to go to a page that is outside the current chain. In fact, Brin and Page reference the damping factor in their original paper on Google (submitted while at Stanford): “The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85”. [15]
    While the damping factor is intended to model the behaviour of a random web surfer, it also serves the additional purpose of speeding up convergence of the power method. This is because the ratio of the two eigenvalues largest in magnitude determines how quickly the method converges. [16] It has been proven that the magnitude of the second-largest eigenvalue of the Google matrix is less than or equal to the damping factor used. [17] The smaller the damping factor, the more quickly the power method converges. According to Rebecca Wills, only 29 iterations are required for the difference between iterates to become less than $10^{-2}$ when using a damping factor of 0.85; the number of iterations goes up to 44 when the damping factor is raised to 0.90. [18] Hence, a suitably chosen damping factor makes this complex system quicker to solve by reducing the number of iterations needed to compute the PageRank vector.
    Solving this enormous system is certainly no easy task for Google, especially at the speed that it requires. Google has nevertheless been able to overcome these significant obstacles through the careful application of existing mathematical algorithms.
    [13] Larson, Ron, David C. Falvo, and Bruce H. Edwards. Elementary Linear Algebra. Boston: Houghton Mifflin, 2004. 550-58. Print.
    [14] Wills, Rebecca. “Google’s PageRank: The Math Behind the Search Engine.” The Mathematical Intelligencer 28.4 (Fall 2006): 6-11.
    [15] Brin, S., and Page L. "The Anatomy of a Large-scale Hypertextual Web Search Engine." Computer Networks and ISDN Systems 30.1-7 (1998): 107-17. Print.
    [16] Gene H. Golub and Charles F. Van Loan, Matrix computations, 3rd ed., The Johns Hopkins University Press, 1996.
    [17] Taher H. Haveliwala and Sepandar D. Kamvar, The second eigenvalue of the Google matrix, Tech. report, Stanford University, 2003.
    [18] Wills, Rebecca. “Google’s PageRank: The Math Behind the Search Engine.” The Mathematical Intelligencer 28.4 (Fall 2006): 6-11.
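The techniques on slides 5 and 6 (power iteration, the uniform dangling-node fix, the damping factor and the Rayleigh quotient) can be combined in a small Python sketch. Everything here is an assumption made for illustration: the four-page `toy_links` graph, the function names, the row-vector convention π ← πG and the stopping tolerance are not taken from the report, and this is not a description of Google's production system.

```python
def google_step(pi, links, damping):
    """One power-method step pi -> pi * G, touching only existing links."""
    n = len(pi)
    # Rank held by dangling pages is spread evenly: the w = (1/n, ..., 1/n) fix.
    dangling_mass = sum(pi[i] for i in range(n) if not links.get(i))
    base = (1.0 - damping) / n + damping * dangling_mass / n
    nxt = [base] * n
    for i, outs in links.items():
        for j in outs:
            nxt[j] += damping * pi[i] / len(outs)
    return nxt


def pagerank_power(links, n, damping=0.85, tol=1e-10, max_iter=200):
    """Iterate from the uniform vector until successive iterates differ by < tol."""
    pi = [1.0 / n] * n
    for iteration in range(1, max_iter + 1):
        nxt = google_step(pi, links, damping)
        diff = sum(abs(a - b) for a, b in zip(nxt, pi))
        pi = nxt
        if diff < tol:
            break
    return pi, iteration


def rayleigh_quotient(pi, links, damping):
    """Estimate the dominant eigenvalue as (pi G . pi) / (pi . pi)."""
    g_pi = google_step(pi, links, damping)
    return sum(a * b for a, b in zip(g_pi, pi)) / sum(a * a for a in pi)


# Hypothetical 4-page web: page 3 has no outbound links (a dangling node).
toy_links = {0: [1, 2], 1: [2], 2: [0], 3: []}
ranks, iterations = pagerank_power(toy_links, n=4)
lam = rayleigh_quotient(ranks, toy_links, damping=0.85)
print([round(r, 4) for r in ranks], iterations, round(lam, 6))
```

Because each step only walks the stored links, the cost per iteration grows with the number of links rather than with the square of the number of pages, which is the sparsity argument made above. For a stochastic matrix the dominant eigenvalue is 1, so the Rayleigh quotient serves here as a sanity check rather than as new information.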
  • 7. Exhibit 1: A sample 4-state Markov chain with transition probabilities
    [Diagram: states 1, 2, 3 and 4 connected by transitions labelled P11, P12, P23, P24, P34 and P41]
    Exhibit 2: Sample 4X4 transition Matrix
    Exhibit 3: Explaining the basis of Markov’s chain [19]
    [19] Image taken from http://en.wikipedia.org/wiki/Markov_chain
  • 8. Exhibit 4: Demonstrating the stable state values using simple matrix multiplication
    \[ P = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}, \qquad P_{ij}(2) = (P^2)_{ij} \]
    \[ P^2 = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix} \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.37 & 0.63 \\ 0.36 & 0.64 \end{pmatrix} \]
    Power   Row 1                  Row 2
    P3      0.363     0.637        0.364     0.636
    P4      0.3637    0.6363       0.3636    0.6364
    P5      0.36363   0.63637      0.36364   0.63636
    P6      0.363637  0.636363     0.363636  0.636364
    P7      0.363636  0.636364     0.363636  0.636364
    P8      0.363636  0.636364     0.363636  0.636364
    P9      0.363636  0.636364     0.363636  0.636364
    P10     0.363636  0.636364     0.363636  0.636364
    P11     0.363636  0.636364     0.363636  0.636364
    P12     0.363636  0.636364     0.363636  0.636364
    Stable state values: P7 to P12
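The repeated multiplication in Exhibit 4 can be reproduced with plain Python lists. This is a sketch under the assumption that the matrix is ordered (E, A) in both rows and columns; the helper name `mat_mul` and the `powers` dictionary are illustrative.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


P = [[0.3, 0.7],
     [0.4, 0.6]]

# powers[k] holds P raised to the k-th power, for k = 1 .. 12.
powers = {1: P}
for k in range(2, 13):
    powers[k] = mat_mul(powers[k - 1], P)

for k in (2, 3, 7, 12):
    print(k, [[round(x, 6) for x in row] for row in powers[k]])
```

Both rows approach the same vector, which is why the starting page stops mattering after a handful of steps.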
  • 9. Exhibit 5: Calculating the steady state eigen values πA and πE
    \[ \pi = \pi * P \]
    Therefore,
    \[ \begin{pmatrix} \pi_E & \pi_A \end{pmatrix} = \begin{pmatrix} \pi_E & \pi_A \end{pmatrix} * \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix} \]
    Solving these two equations:
    1. \( \pi_E = 0.3\,\pi_E + 0.4\,\pi_A \)
    2. \( \pi_A = 0.7\,\pi_E + 0.6\,\pi_A \)
    Also, we know that:
    3. \( \pi_E + \pi_A = 1 \)
    Since equations 1 and 2 are equivalent, we solve equations 2 and 3 together:
    \[ \pi_A = 0.7\,(1 - \pi_A) + 0.6\,\pi_A \]
    Or, \( \pi_A = 7/11 \approx 0.64 \)
    And, \( \pi_E = 4/11 \approx 0.36 \)
    Exhibit 6: The improved Google PageRank algorithm
    \[ PR(A) = (1 - d) * \frac{1}{N} + d * \left( \frac{PR(T_1)}{C(T_1)} + \frac{PR(T_2)}{C(T_2)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right) \]
    Where:
    • PR(A) is the PageRank of page A
    • PR(Ti) is the PageRank of pages Ti that link to page A
    • C(Ti) is the number of outbound links on page Ti
    • n is the total number of all pages that link to page A
    • N is the total number of all pages on the web
    • d is the damping factor
    It is noteworthy that there is an adjusting damping factor involved in the calculation. The above equation represents the final version of the PageRank algorithm, with the damping factor incorporated in the first term on the RHS of the equation.
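The per-page formula of Exhibit 6 can be applied directly by sweeping over the pages until the values settle. The sketch below assumes the standard normalised form PR(A) = (1 − d)/N + d · Σ PR(Ti)/C(Ti); the three-page graph `out_links`, the function name and the fixed number of sweeps are hypothetical choices for illustration, and every page in the toy graph has at least one outbound link so that no dangling-node fix is needed.

```python
def pagerank_by_formula(out_links, d=0.85, sweeps=100):
    """Repeatedly apply PR(A) = (1 - d)/N + d * sum(PR(T)/C(T)) over pages T linking to A."""
    pages = list(out_links)
    n_pages = len(pages)
    pr = {page: 1.0 / n_pages for page in pages}
    for _ in range(sweeps):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that has a link to this page.
            inbound = sum(pr[t] / len(out_links[t])
                          for t in pages if page in out_links[t])
            new_pr[page] = (1.0 - d) / n_pages + d * inbound
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = pagerank_by_formula(out_links)
print({page: round(value, 4) for page, value in pr.items()})
```

In this toy graph C collects links from both A and B, so it ends up ranked above B even though each page has only a handful of links.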
  • 10. Exhibit 7: PageRank of the search string ‘Techbend blog’
    Exhibit 8: The correlation between a webpage and the rest of the web [20]
    The importance of these links determines the overall importance of your webpage to the PageRank algorithm.
    [20] Laure Ninove, Cristobald de Kerchove, Paul Van Dooren: Université Catholique de Louvain. http://www.esat.kuleuven.be/scd/golub/presentations/Gene_PVD.pdf
  • 11. Exhibit 9: KundanBhaduri.com and its links to other sites
    [Diagram: the homepage of KundanBhaduri.com, which hosts the blog The TechBend, linked with TechCrunch (very high PageRank), Engadget (very high PageRank), TED (very high PageRank) and the rest of the Internet]
  • 12. Exhibit 10: Applying Markov Chain method to calculate the PageRank for ‘TechBend blog’
    Following is the probability matrix that shows the likelihood of a user clicking on a page to arrive at the homepage of another website when she is searching for the string “TechBend blog”. All site names here refer to their respective homepages, for the purpose of Markov chain analysis.
    From \ To             KundanBhaduri.com  TechBend.com  Engadget.com  TED.com  TechCrunch.com  …  Rest of the Internet
    KundanBhaduri.com     0.6                0.3           0.01          0.03     0.01            …  …
    TechBend.com          0.42               0.1           0.12          0.01     0.11            …  …
    Engadget.com          0.65               0.02          0.1           0.21     0.01            …  …
    TED.com               0.54               0.22          0.1           0        0.09            …  …
    TechCrunch.com        0.64               0.17          0.13          0.01     0               …  …
    …                     0.59               0.31          0.02          0.04     0.01            …  …
    Rest of the Internet  …                  …             …             …        …               …  …
    Transition Probabilities of KundanBhaduri.com and TechBend.com
    For the stable-state matrix:
    \[ \pi = \pi * P \qquad (1) \]
    We assume:
    Webpage              Eigen Value
    KundanBhaduri.com    πA
    TechBend.com         πB
    Engadget.com         πC
    TED.com              πD
    TechCrunch.com       πE
    Therefore using (1), we get:
    \[ \pi_A = \pi_A * 0.6 + \pi_B * 0.42 + \pi_C * 0.65 + \pi_D * 0.54 + \pi_E * 0.64 + \ldots * 0.59 + \ldots \qquad (2) \]
    \[ \pi_B = \pi_A * 0.3 + \pi_B * 0.1 + \pi_C * 0.02 + \pi_D * 0.22 + \pi_E * 0.17 + \ldots * 0.31 + \ldots \qquad (3) \]
    It is clear from equations (2) and (3) that πA >> πB, considering that there are no other webpages on the internet that are more important (i.e. have a higher probability rank) than the pages described in the above table. Therefore, we conclude that KundanBhaduri.com will have a higher PageRank than TechBend.com for the search term ‘TechBend blog’.
  • 13. Exhibit 11: Computing a small Eigen value with Power Method
    We know that: π = π*P
    For a hypothetical P of order 20X20, notice that most of the entries are zero. This considerably reduces the total cost of computing π*P, since every term that involves a zero entry contributes nothing to the sum.
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 4 0 0 4 0 0 9 0 0 7 0 0 0 1 0 9 0 0 6 0 0 12 0 8 0 0 8 0 0 5 0 0 2 0 8 0 7 0 0 8 0 0 4 0 2 0 2 0 5 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 8 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 7 8 0 6 0 6 0 8 1 0 0 0 0 0 0 0 0 8 0 0 0 0 9 0 0 0 2 0 1 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 0 0 0 0 0 8 0 0 7
    Therefore \( \pi_A = \sum_k \pi_k * P_{kj} \) for j = 1, where k runs over the states.
    Since most of the terms above are zero, we only need to account for rows 1 and 4 of the table above. Therefore, πA = 1 * πA + 8 * πD
    This lets us work with a large Markov transition probability matrix at a much lower computational cost.
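The saving described in Exhibit 11 comes from storing and visiting only the nonzero entries. The following Python sketch illustrates the idea on a hypothetical 5-state row-stochastic matrix (not the 20X20 matrix of the exhibit); the dictionary layout `rows[i][j]` and the function name `sparse_vec_mat` are assumptions made for this example.

```python
def sparse_vec_mat(pi, rows):
    """Compute pi * P, where rows[i] maps column j to the nonzero entry P[i][j]."""
    result = [0.0] * len(pi)
    multiplications = 0
    for i, cols in rows.items():
        for j, p in cols.items():
            result[j] += pi[i] * p
            multiplications += 1
    return result, multiplications


# Hypothetical sparse transition matrix with 7 nonzero entries out of 25.
rows = {
    0: {0: 0.5, 3: 0.5},
    1: {2: 1.0},
    2: {0: 0.25, 1: 0.75},
    3: {0: 1.0},
    4: {3: 1.0},
}
pi = [0.2] * 5
new_pi, mults = sparse_vec_mat(pi, rows)
print(new_pi, mults)
```

Here one step needs 7 multiplications instead of the 25 a dense 5X5 product would use. At web scale, with about 52 out-links per page, the same idea is what makes an iteration over a trillion-row matrix feasible at all.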