5. A B C D E
Backlinks (Links In) 1 2 2 1 3
OutLinks 1 1 3 2* 1
A
B
C
D
E
1-Damping Factor
New Value
@Dixon_Jones
6. A B C D E
Backlinks (Links In) 1 2 2 1 3
OutLinks 1 1 3 2 1
A
B
C
D
E
Damping Factor
New Value
7. A B C D E
Backlinks (Links In) 1 2 2 1 3
OutLinks 1 1 3 2 1
A
B
C
D
E
1 -Damping Factor
New Value
8. A B C D E
Backlinks (Links In) 1 2 2 1 3
OutLinks 1 1 3 2 1
A
B
C
D
E
1-Damping Factor
New Value
9. A B C D E
Backlinks (Links In) 1 2 2 1 3
OutLinks 1 1 3 2 1
A
B
C
D
E
1-Damping Factor
New Value
10. A B C D E
Backlinks (Links In) 1 2 2 1 3
OutLinks 1 1 3 2 1
A
B
C
D
E
1-Damping Factor
New Value
11. A B C D E
Backlinks (Links In) 1 2 2 1 2
OutLinks 1 1 3 2 1
A
B
C
D
E
1-Damping Factor
New Value
PageRank
Multiplier = (Page Rank × Damping factor) / Number Outlinks
Multiplier
0.15
12.
                   A            B            C            D            E
Page Rank          1            2            2            1            2
Out links          1            1            3            2            1
Multiplier         0.85         1.7          0.566666667  0.425        1.7
A
B
C
D
E
1-Damping Factor   .15          .15          .15          .15          .15
New Page Rank
Multiplier = (Page Rank × 0.85) / Number Outlinks
13.
                   A            B            C            D            E
Page Rank          1            2            2            1            2
Out links          1            1            3            2            1
Multiplier         0.85         1.7          0.566666667  0.425        1.7
A                  -            -            0.85         -            -
B                  -            -            1.7          -            -
C                  0.566666667  0.566666667  -            -            0.566666667
D                  -            0.425        -            -            0.425
E                  -            -            -            1.7          -
1-Damping Factor   .15          .15          .15          .15          .15
New Page Rank      0.716666667  1.141666667  2.7          1.85         1.141666667
(Rows: the page giving the link. Columns: the page receiving it.)
Multiplier = (Page Rank × 0.85) / Number Outlinks
14. My Excel Workings
Email talk@dixonjones.com with “PageRank” in the subject to join
the Mailing list and receive the Excel Spreadsheet.
15.
                   A            B            C            D            E
Page Rank          1            2            2            1            2
Out links          1            1            3            2            1
Multiplier         0.85         1.7          0.566666667  0.425        1.7
A                  -            -            0.85         -            -
B                  -            -            1.7          -            -
C                  0.566666667  0.566666667  -            -            0.566666667
D                  -            0.425        -            -            0.425
E                  -            -            -            1.7          -
1-Damping Factor   .15          .15          .15          .15          .15
New Page Rank      0.716666667  1.141666667  2.7          1.85         1.141666667
First Iteration
16.
                   A            B            C            D            E
Page Rank          0.716666667  1.141666667  2.7          1.85         1.141666667
Out links          1            1            3            2            1
Multiplier         0.609166667  0.970416667  0.765        0.78625      0.970416667
A                  -            -            0.609166667  -            -
B                  -            -            0.970416667  -            -
C                  0.765        0.765        -            -            0.765
D                  -            0.78625      -            -            0.78625
E                  -            -            -            0.970416667  -
1-Damping Factor   .15          .15          .15          .15          .15
New Page Rank      0.915        1.70125      1.729583333  1.120416667  1.70125
Second Iteration…
17.
                   A            B            C            D            E
Page Rank          0.915        1.70125      1.729583333  1.120416667  1.70125
Out links          1            1            3            2            1
Multiplier         0.85         1.7          0.566666667  0.425        1.7
A                  -            -            0.85         -            -
B                  -            -            1.7          -            -
C                  0.566666667  0.566666667  -            -            0.566666667
D                  -            0.425        -            -            0.425
E                  -            -            -            1.7          -
1-Damping Factor   .15          .15          .15          .15          .15
New Page Rank      0.742        1.457        2.617        1.726        1.457
Third Iteration…
26. Works best with Universal Data set
• Every signal is small
• Individually prone to error or opinion
• At scale the error decreases
• Confidence increases
http://info.majestic.com/universal
Of course – it is the matrix form of the PageRank algorithm. The algorithm that has made Larry Page and Sergey Brin two of the richest, most powerful people in the world. This is the maths that built Google.
This says “The PageRank of a page in this iteration equals 1 minus a damping factor, PLUS… for every link into the page (except for links to itself), add the PageRank of the linking page, divided by the number of outbound links on that page and multiplied by the damping factor.”
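For those who like symbols, that sentence can be written as the update rule below. The notation is an assumption made here for illustration and is not taken from the slide: d is the damping factor (0.85 in the worked example), B(p) is the set of pages that link to page p, and L(q) is the number of outbound links on page q.

```latex
PR_{t+1}(p) \;=\; (1 - d) \;+\; d \sum_{\substack{q \in B(p) \\ q \neq p}} \frac{PR_{t}(q)}{L(q)}
```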
Easy right?!
Well – maybe for a few of you. But this algorithm is fundamental in understanding links and, in particular, in understanding why most links count for nothing or almost nothing. When you get to grips with this algorithm, you will be light years ahead of other SEOs… but I never really see it properly explained. I guarantee that even if you know this algorithm inside out, you’ll see some unexpected results from this maths and you will also never use the phrase “Domain Authority” in front of a customer again.
I am not asking anyone here to know much more than simple Excel.
I am going to start by showing you how that maths applies to this representation of a VERY small Internet system with only 5 nodes. Then we will look at a very slightly different map which has profound consequences to our results.
Before we start, maybe have a look at this and GUESS which node has the highest PageRank (The head of the tadpole lines are the “arrows” to show the direction of the links).
So the PageRank algorithm is called an iterative algorithm. We start with some estimates and then we continually refine our understanding of the ecosystem we are measuring. So how can we see how this formula applies to this ecosystem?
Firstly, we need to create a matrix… we have nodes A to E. I’ll call them pages for now, because it is a terminology we understand, but the hardcore fans should know I mean “nodes”, as this is important later.
1: The start value (in this case) is the number of actual links to each “node”. Most people actually set this to 1 to start, but there are two great reasons for using link counts. First, it is a better approximation to start with than giving everything the same value, so the algorithm stabilizes in fewer iterations. Second, it is so useful for checking my spreadsheet in a second… so node A has one link in (from page C).
2: Now let’s map out all the blanks in a matrix… starting with the rule that a page cannot link to itself (OK… it can… but not in the algo).
Node A ONLY links to C
Node B ONLY links to C
Node C links to A, B and E
D – Links to B and 3 TIMES to E! Do you count it once or 3 times? I’m going to count it ONCE right now, but we’ll come back to that oddity later.
E only links out to D
So here’s the grid. We can check a few things here… 8 green boxes = the number of links in our algorithm (if we only count the 3 links from D to E once).
Also – note that the majority of this grid is red… most pages on the Internet do not link to each other.
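If you would rather check the grid in code than in Excel, here is a minimal Python sketch of the same link map. The names LINKS, grid, out_links and back_links are illustrative assumptions, not part of the original spreadsheet, and the three links from D to E are counted once, as above.

```python
# Link map from the notes: linking page -> pages it links to.
# D's three links to E are counted once, as in the talk.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["D"],
}
PAGES = sorted(LINKS)

# grid[source][target] is True where a link exists (a "green box")
# and False where it does not (a "red box").
grid = {s: {t: (t in LINKS[s]) for t in PAGES} for s in PAGES}

# Outlinks are counted along each row, backlinks down each column.
out_links = {p: len(LINKS[p]) for p in PAGES}
back_links = {p: sum(grid[s][p] for s in PAGES) for p in PAGES}
green_boxes = sum(out_links.values())

print("Out links  :", out_links)
print("Back links :", back_links)
print("Green boxes:", green_boxes, "out of", len(PAGES) ** 2)
```

Counting the links both ways (by row and by column) is a quick sanity check: the two totals should always agree.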
This is a simplification of that formula. It’s not TOO scary now is it? So now we can add the multiplier to each column. This is how much of its value each link will pass on to pages it links to.
So – for example, Page A has PR 1, multiplied by 0.85 and divided by its single outbound link. So the multiplier is 0.85.
On page C, the PR = 2. The multiplier is 2 × 0.85, all divided by the three outbound links. This means each link lends a score of 0.566666.
(This presentation is not going to go into the case of when the Outlinks is zero.)
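The multiplier row can be sketched in a few lines of Python. This is only an illustration of the arithmetic above: the function name multiplier and the dictionaries are assumptions made here, and the zero-outlink case is simply rejected because the talk does not cover it.

```python
DAMPING = 0.85  # damping factor used in the worked example

# Start values (link counts) and outlink counts for the five nodes.
page_rank = {"A": 1, "B": 2, "C": 2, "D": 1, "E": 2}
out_links = {"A": 1, "B": 1, "C": 3, "D": 2, "E": 1}


def multiplier(rank, n_out, damping=DAMPING):
    """Value passed along EACH outbound link: rank x damping / outlinks."""
    if n_out == 0:
        # Not handled in this presentation, so refuse rather than guess.
        raise ValueError("pages with zero outlinks are not covered here")
    return rank * damping / n_out


multipliers = {p: multiplier(page_rank[p], out_links[p]) for p in page_rank}

for page, value in multipliers.items():
    print(page, round(value, 9))
```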
So now we go along the rows, filling in the green boxes. So…
Page A gives one link TO page C… each link it gives has a value of 0.85… so we put 0.85 in this box.
Page C links to THREE pages, giving 0.5666667 to each one… and so on until the green boxes are filled.
Now… if you remember, the multiplier only passed on 85% of each page’s value, so we need to add 1 minus the damping factor (0.15) back to every page. This means the total amount of PageRank will stay stable.
Then we add up the columns, to find new PageRanks for each page!
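Here is a minimal Python sketch of one full pass, for anyone who wants to check the column sums without the spreadsheet. The function name iterate_once and the LINKS dictionary are assumptions made for illustration. It follows the steps described above: give every page 1 minus the damping factor, then add each link's multiplier to the page it points at.

```python
DAMPING = 0.85

# Linking page -> pages it links to (D's links to E counted once).
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["D"],
}


def iterate_once(rank, links=LINKS, damping=DAMPING):
    """Return the new PageRank of every page after a single pass."""
    # Every page starts the pass with 1 - damping factor.
    new_rank = {page: 1 - damping for page in links}
    for source, targets in links.items():
        # The multiplier: what each outbound link from `source` passes on.
        share = rank[source] * damping / len(targets)
        for target in targets:
            new_rank[target] += share  # summing down the target's column
    return new_rank


start = {"A": 1, "B": 2, "C": 2, "D": 1, "E": 2}  # link counts as start values
first = iterate_once(start)
second = iterate_once(first)

print("First iteration :", {p: round(v, 9) for p, v in first.items()})
print("Second iteration:", {p: round(v, 9) for p, v in second.items()})
```

Feeding the output of one call into the next is the code equivalent of pasting the bottom row back into the top row.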
Now that is really all there is to the PageRank Algorithm – but I did say it is iterative. So you need to do it again and again to get to the real PageRank for every page. I therefore cut and paste the values back into the start values to get the next iteration. My boxes are already referenced, so the next iteration is worked out instantly…
If you want to see my Excel spreadsheet, by the way, here’s what to do.
So I take the numbers at the bottom…
And put them into the top… giving me new numbers at the bottom, which I…
Cut and paste into the top again to get the third iteration…
So this is what happens to the numbers after 15 iterations…. Look at how the 5 nodes are all stabilising to the same numbers. If we had started with all pages being 1, by the way, which is what most people tell you to do, this would have taken many more iterations to get to a stable set of numbers.
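The repeated cut and paste can be sketched as a simple loop. This is a hedged illustration rather than the spreadsheet itself: the function name pagerank, the LINKS dictionary and the default of 15 passes are choices made here. You can swap in any start values you like to see how quickly the numbers settle.

```python
DAMPING = 0.85

# Linking page -> pages it links to (D's links to E counted once).
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["D"],
}


def pagerank(start, iterations=15, links=LINKS, damping=DAMPING):
    """Apply the single-pass update `iterations` times and return the result."""
    rank = dict(start)
    for _ in range(iterations):
        new_rank = {page: 1 - damping for page in links}
        for source, targets in links.items():
            share = rank[source] * damping / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank  # "paste" the bottom row back into the top row
    return rank


link_counts = {"A": 1, "B": 2, "C": 2, "D": 1, "E": 2}
after_15 = pagerank(link_counts, iterations=15)

for page, value in sorted(after_15.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 4))
```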
So now we have done the maths, we can see which is the most important page on our Internet.
Was it the one you guessed? Well whether you said yes OR no…. It’s now time to reveal the wider story.
You recall I said “nodes” instead of “pages”? That’s because this was doing the PageRank at the lowest common denominator I had… 5 nodes. But what if these were actually domains, not pages?…
So now we have 10 nodes, not 5… and IMPORTANTLY, we now have some internal linking….
Where do you think the power will lie in this version of the Internet?
Am I mad enough to do all of the calculations again? Oh yeah…
You can see all the workings in the spreadsheet if you want, but here’s the shortened version…
… and here are the actual scores for every page.
The winning page being Node E1.
The winning domain was site C in the 5 node model, so if you had used domain level modelling, you would have hoped for links from pages which are amongst the WORST at the page level.
PageRank was only EVER done at the page level… Majestic does our calculations at top level, subdomain level and page level – and in the quest to show our customers higher link counts, we default to TLD first – as do our competitors… but it is the PAGE level that counts.
If you build a new site and only used Domain Authority to create links, you could EASILY have got linked from the worst page possible, even if it was from the best domain, because of the INTERNAL LINKS of the other web pages! How on earth are you going to be able to see the strength of a link if that strength depends on the internal links on an entirely different website?!
The second observation is that the data does not have to be complete, but it works best with a universal data set.
Back in 2014, one of our researchers wrote this blog post after a study using the PageRank algorithm ONLY on Wikipedia showed Carl Linnaeus as more influential than Jesus or Hitler.
Majestic’s Citation Flow, as a proxy to PageRank, could have told the researcher a different, more likely result, as our data uses a larger section of the Internet.
The next oddity is that the majority of pages have hardly any PageRank at all! The top three pages in this 10 node model account for 75-80% of the entire PageRank of the system.
The last oddity is – the original guess… of using link counts as an initial estimate for PageRank sucks as a metric. This chart has plotted the PageRank of each of the pages as an area. When we started, page C3 was our best guess for the highest PageRank. But look at how much love it loses by the end of the modelling.
… which is just one reason why Google still cannot let it go… This is a Tweet from Google Gary.
Lastly – PageRank is not about rankings, because pure PageRank does NOT consider context. So be very wary of using page metrics that are based on search visibility. Majestic’s Citation Flow is about the purest correlation to PageRank currently available, although the algorithm is a little different.