The PageRank algorithm calculates the importance of web pages based on the structure of incoming links. It models a random web surfer that randomly clicks on links, and also occasionally jumps to a random page. Pages are given more importance if they are linked to by other important pages. The algorithm represents this as a Markov chain and computes the PageRank scores through an iterative process until convergence. It has the advantages of being resistant to spam and efficiently pre-computing scores independently of user queries.
This presentation describes in simple terms how the PageRank algorithm, developed by Google's founders, works. It presents the actual algorithm and explains how the calculations are done and how ranks are assigned to a web page.
1. PageRank Algorithm
El Habib NFAOUI (elhabib.nfaoui@usmba.ac.ma)
LIIAN Laboratory, Faculty of Sciences Dhar Al Mahraz, Fes
Sidi Mohamed Ben Abdellah University, Fes
2018-2019
3. 1. Introduction
Hyperlinks are a special feature of the Web: they link Web pages to form a huge
network. They have been exploited for many purposes, especially for Web search.
Google’s early success was largely attributed to its hyperlink-based ranking algorithm
called PageRank, which originated from social network analysis [1].
The two best-known Web hyperlink analysis algorithms are PageRank and HITS
(Hypertext Induced Topic Search).
4. 2. PageRank
The PageRank algorithm was first introduced by L. Page and S. Brin (1998), and later became
the backbone of Google’s search engine. PageRank calculates an importance ranking
for every web page using the hyperlink structure of the web.
The importance ranking is represented by a global score assigned to every web page.
PageRank is a static ranking of Web pages in the sense that a PageRank value is
computed for each page off-line and it does not depend on search queries. The
PageRank of a node will depend on the link structure of the web graph.
Given a query, a web search engine computes a composite score for each web page
that combines hundreds of features such as cosine similarity and term proximity,
together with the PageRank score. This composite score is used to provide a ranked
list of results for the query.
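To make the idea of a composite score concrete, here is a minimal sketch. Real search engines do not publish how they combine features, so the linear weighting, the `weight` value, the page names, and the two-dimensional term vectors below are all hypothetical; only the idea of mixing a query-dependent score (cosine similarity) with a precomputed, query-independent PageRank comes from the text.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def composite_score(similarity, pagerank, weight=0.7):
    """Hypothetical linear mix of a query-dependent and a static score."""
    return weight * similarity + (1 - weight) * pagerank

# Hypothetical data: PageRank is computed off-line and simply looked up here.
pagerank_scores = {"p1": 0.2, "p2": 0.7, "p3": 0.1}
page_vectors = {"p1": [1.0, 0.0], "p2": [1.0, 1.0], "p3": [1.0, 0.0]}

def rank(query_vector):
    """Return page names ordered by decreasing composite score."""
    scores = {
        page: composite_score(cosine_similarity(query_vector, vector),
                              pagerank_scores[page])
        for page, vector in page_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank([1.0, 0.0])
```

In this toy data, p1 and p3 match the query equally well, so the precomputed PageRank decides their relative order.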
5. 2.1 PageRank scoring
Consider a random surfer who surfs the web pages at random:
- The surfer starts at a random page.
- At each time step, the surfer leaves the current page along one of the links
on that page, chosen equiprobably.
As the surfer proceeds in this random walk (surf) from node to node, he
visits some nodes more often than others; intuitively, these are nodes with
many links coming in from other frequently visited nodes. The idea behind
PageRank is that pages visited more often in this walk are more
important.
[Figure: a page with three out-links, each followed with probability 1/3]
Sec. 21.2
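The random walk described above can be simulated directly. The following is a minimal sketch, assuming a graph in which every page has at least one out-link (no teleporting yet); the function name `simulate_surfer` and the three-page graph are made up for illustration.

```python
import random

def simulate_surfer(links, steps, seed=0):
    """Estimate how often a random surfer visits each page.

    links: dict mapping each page to a non-empty list of pages it links to.
    Returns a dict of visit frequencies (fractions of all steps).
    """
    rng = random.Random(seed)
    pages = sorted(links)
    current = rng.choice(pages)               # start at a random page
    visits = {page: 0 for page in pages}
    for _ in range(steps):
        visits[current] += 1
        current = rng.choice(links[current])  # follow one out-link, equiprobably
    return {page: count / steps for page, count in visits.items()}

# Hypothetical three-page graph in which every page has an out-link.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
frequencies = simulate_surfer(toy_links, steps=100_000)
```

In this toy graph, pages A and C end up visited about twice as often as B, which matches the intuition that pages with more in-links from frequently visited pages are visited more often.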
6. 2.2 Teleporting (or teleportation)
What if the current location of the surfer has no out-links?
To address this, an additional operation is introduced for our random surfer: the
teleport operation.
In the teleport operation the surfer jumps from a node to any other node in the web
graph. This could happen because he types an address into the URL bar of his browser.
The destination of a teleport operation is modeled as being chosen uniformly at random
from all web pages. In other words, if N is the total number of nodes in the web graph,
the teleport operation takes the surfer to each node with probability 1/N.
How do we model the random surfer process?
7. 3. Markov chains
A Markov chain consists of $n$ states, plus an $n \times n$ transition probability matrix $P$.
At each step, we are in one of the states.
For $1 \le i, j \le n$, the matrix entry $P_{ij}$ tells us the probability of $j$ being the next state,
given we are currently in state $i$.
[Figure: state i with a transition to state j labelled $P_{ij}$. A self-transition with $P_{ii} > 0$ is OK.]
Clearly, for all $i$, $\sum_{j=1}^{n} P_{ij} = 1$.
Sec. 21.2.1
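As a small illustration of these definitions, the sketch below checks that every row of a transition matrix sums to 1 and advances a probability distribution by one step (a row vector times P). The matrix `P_toy` and the helper names are hypothetical.

```python
def is_row_stochastic(P, tol=1e-12):
    """True if every entry is non-negative and every row sums to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )

def step(x, P):
    """One step of the chain: row vector x multiplied by matrix P."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 3-state chain; state 0 has a self-transition (P[0][0] > 0).
P_toy = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
x0 = [1.0, 0.0, 0.0]   # start in state 0 with certainty
x1 = step(x0, P_toy)   # distribution after one step
```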
8. 4. Random surfer model
We can view a random surfer on the web graph as a Markov chain (Markov
chains are abstractions of random walks). In this Markov chain model, each
Web page or node in the Web graph is regarded as a state. A hyperlink is a
transition, which leads from one state to another state with a transition
probability. Transition probability represents the probability of moving
from one web page to another. The teleport operation contributes to these
transition probabilities. Thus, this framework models Web surfing as a
stochastic process. It models a Web surfer randomly surfing the Web as a
state transition in the Markov chain.
9. 4. Random surfer model
The adjacency matrix A of the web graph is defined as follows: if there is a hyperlink from
page i to page j, then $A_{ij} = 1$, otherwise $A_{ij} = 0$. We can readily derive the transition
probability matrix P for our Markov chain from the N × N matrix A:

$$
P_{ij} =
\begin{cases}
\alpha \, \dfrac{A_{ij}}{\sum_{k=1}^{N} A_{ik}} + \dfrac{1 - \alpha}{N} & \text{if page } i \text{ has at least one out-link, i.e. } \sum_{k=1}^{N} A_{ik} > 0 \\[2ex]
\dfrac{1}{N} & \text{otherwise}
\end{cases}
\qquad \text{(Equation 1)}
$$

- With probability $\alpha$, the random surfer clicks on one of the hyperlinks. This is known as
transportation. Each hyperlink has an equal probability of being clicked. $\alpha$ is a
damping factor usually set to 0.85.
- With the complementary probability $1 - \alpha$ (= 0.15), the random surfer jumps to some other
web page (e.g., enters the URL into the address bar of the browser). This is known as
teleportation. Each web page has an equal probability of being jumped to.
- N is the total number of nodes in the web graph.
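The construction of P from A can be sketched as follows. This is one reading of Equation 1, assuming that the damping factor (called `alpha` here) scales the click probabilities, that the remaining probability is spread uniformly over all N pages, and that a page with no out-links jumps uniformly at random. The function name and the toy adjacency matrix are hypothetical.

```python
def transition_matrix(A, alpha=0.85):
    """Build a row-stochastic transition matrix from a 0/1 adjacency matrix."""
    n = len(A)
    P = []
    for row in A:
        out_degree = sum(row)
        if out_degree > 0:
            # Click a link with probability alpha, teleport with 1 - alpha.
            P.append([alpha * a / out_degree + (1 - alpha) / n for a in row])
        else:
            # No out-links: teleport to any page uniformly at random.
            P.append([1.0 / n] * n)
    return P

# Hypothetical 3-page graph: page 0 has no out-links.
A_toy = [
    [0, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
]
P_toy = transition_matrix(A_toy)
```

Every row of the result sums to 1, so it is a valid transition probability matrix for the Markov chain.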
10. 5. PageRank algorithm
The PageRank of page j is the sum of the PageRank scores of pages i linking to j,
weighted by the probability of going from i to j. In words, the PageRank thesis reads
as follows:
A Web page is important if it is pointed to by other important pages.
Let R be an N-dimensional row vector of the PageRank values of all pages, i.e.,
$R = (R_1, R_2, \dots, R_N)$.
The PageRank vector is then recursively defined as the solution of the equation
$R = R P$, that is, $R_j = \sum_{i=1}^{N} R_i P_{ij}$ for every page j.
(Recursive calculation of the PageRanks. The matrix P includes the transportation and the
teleportation operations defined previously.)
11. 5. PageRank algorithm
Input:
- The adjacency matrix A of the web graph;
- $\alpha$: damping factor; // usually set to 0.85
- $\varepsilon$: pre-specified threshold (desired precision); // used in the stopping condition
Output: PageRank vector R
Initialization:
- Using Equation 1, calculate the probability matrix P;
- PageRank vector $R_0 = (1/N, \dots, 1/N)$;
- $k = 0$;
Repeat
    $R_{k+1} = R_k P$;
    $k = k + 1$;
Until $\lVert R_k - R_{k-1} \rVert_1 < \varepsilon$
Simple iterative algorithm for calculating the PageRank vector R.
The iteration ends when the PageRank
values do not change much, i.e., they converge.
In this algorithm, the iteration ends once
the L1-norm of the residual vector is less
than the pre-specified threshold. Note
that the L1-norm of a vector is simply
the sum of the absolute values of all its components.
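The iterative algorithm above can be sketched in a few lines. This is a minimal version, assuming the transition matrix P has already been computed and is row-stochastic, that the initial vector is uniform (1/N per page), and that a `max_iter` safety cap, which is not part of the slide's algorithm, is acceptable. The matrix `P_toy` is a hypothetical 3-page example.

```python
def pagerank(P, eps=1e-6, max_iter=1000):
    """Power iteration R <- R P until the L1 change drops below eps.

    Returns the PageRank vector and the number of iterations performed.
    """
    n = len(P)
    R = [1.0 / n] * n                     # uniform initial scores
    for k in range(1, max_iter + 1):
        R_next = [sum(R[i] * P[i][j] for i in range(n)) for j in range(n)]
        # L1-norm of the residual: sum of absolute component differences.
        residual = sum(abs(a - b) for a, b in zip(R_next, R))
        R = R_next
        if residual < eps:
            return R, k
    return R, max_iter

# Hypothetical transition matrix for a 3-page graph (rows sum to 1).
P_toy = [
    [1 / 3, 1 / 3, 1 / 3],
    [0.475, 0.05, 0.475],
    [0.9, 0.05, 0.05],
]
scores, iterations = pagerank(P_toy)
```

The returned scores sum to 1, so multiplying them by 100 gives percentages of importance.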
12. 6. Example
Consider the social network given below. PageRank algorithm can find the
importance ranking of the nodes in the network.
$\alpha$: the damping factor
13. 6. Example
Transportation:
The matrix T gives the pairwise transportation probabilities. $T_{ij}$ is the probability
that the random surfer transports from page i to page j (nodes are numbered in
alphabetic order, i.e., A=1, B=2, ...).
14. 6. Example
Teleportation:
The matrix D gives the pairwise teleportation probabilities. $D_{ij}$ is the probability that
the random surfer teleports from page i to page j (nodes are numbered in alphabetic order,
i.e., A=1, B=2, ...). Note that teleportation probabilities depend only on whether a node
is dangling or not. Here, node A is dangling and all other nodes are non-dangling.
Dangling nodes: nodes with no outgoing edges (links).
15. 6. Example
Random surfing probabilities:
The final transition probabilities for the random surfer are given by P = T + D.
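The split of the surfer's probabilities into a transportation part and a teleportation part can be sketched as below. The matrices from the slides are not reproduced here, so this is an assumed reading: T holds the damped click probabilities (all zeros for a dangling node), and D holds uniform jump probabilities that depend only on whether the row's node is dangling. The names `alpha`, `transport_matrix`, `teleport_matrix`, and the 3-node graph are hypothetical.

```python
def transport_matrix(A, alpha=0.85):
    """T[i][j]: probability of following a link from i to j."""
    T = []
    for row in A:
        out_degree = sum(row)
        if out_degree > 0:
            T.append([alpha * a / out_degree for a in row])
        else:
            T.append([0.0] * len(row))    # dangling node: no links to follow
    return T

def teleport_matrix(A, alpha=0.85):
    """D[i][j]: probability of jumping from i to j; constant within a row."""
    n = len(A)
    D = []
    for row in A:
        value = (1 - alpha) / n if sum(row) > 0 else 1.0 / n
        D.append([value] * n)
    return D

# Hypothetical 3-node graph in which node 0 is dangling.
A_toy = [
    [0, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
]
T = transport_matrix(A_toy)
D = teleport_matrix(A_toy)
# P = T + D, added element by element.
P = [[t + d for t, d in zip(t_row, d_row)] for t_row, d_row in zip(T, D)]
```

Each row of T alone sums to either 0 (dangling) or the damping factor, and adding D brings every row of P to 1.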
16. 6. Example
PageRank computation:
As mentioned before, each web page has an initial score, which is 1/11 = 0.0909 (step
0). Using the basic version of the PageRank algorithm given previously, we can compute the
PageRank scores of each page. Below are the PageRank vectors corresponding to the
given social network:
Convergence occurs when the L1-norm of the change in PageRank scores between successive
steps is less than $10^{-6}$, and it takes 82 steps to converge. S shows the scores for the
first 3 steps and the last 2 steps. Scores are normalized to sum to 1. In order to get the
percentage of importance, scores can be multiplied by 100. The last
row of S gives the final percentages. (source: Shatlyk Ashyralyyev, CS533 course)
17. 7. Strengths of PageRank
The main advantage of PageRank is its ability to fight spam. A page is important if
the pages pointing to it are important. Since it is not easy for a Web page owner to add
in-links to his/her page from other important pages, it is not easy to influence
PageRank. Nevertheless, there are reported ways to influence PageRank.
Recognizing and fighting spam is an important issue in Web search.
Another major advantage of PageRank is that it is a global measure and is query
independent. That is, the PageRank values of all the pages on the Web are
computed and saved off-line rather than at query time. At query time, only a
lookup is needed to find the value to be integrated with other strategies to rank the
pages. It is thus very efficient at query time. Both advantages
contributed greatly to Google’s success.
We note again that link-based ranking is not the only strategy used in a search
engine. Many other information retrieval methods, heuristics, and empirical
parameters are also employed. However, their details are not published. Also,
PageRank is not the only link-based static and global ranking algorithm. All major
search engines, such as Bing and Yahoo!, have their own algorithms.
18. References
[1] Wasserman, S. and Faust, K. Social Network Analysis. Cambridge University Press, 1994.
[2] Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval.
Cambridge University Press, 2009.
[3] Liu, B. Web Data Mining. Second Edition. Springer-Verlag Berlin Heidelberg, 2011.
622 pages. ISBN: 978-3-642-19459-7.
[4] Ashyralyyev, S. CS533 course, Bilkent University.