This presentation describes in simple terms how the PageRank algorithm by Google's founders works. It presents the actual algorithm and tries to explain how the calculations are done and how ranks are assigned to any webpage.
PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is referred to as the PageRank of E and denoted by PR(E). Other factors like Author Rank can contribute to the importance of an entity.
A PageRank results from a mathematical algorithm based on the webgraph, created by all World Wide Web pages as nodes and hyperlinks as edges, taking into consideration authority hubs such as cnn.com or usa.gov. The rank value indicates the importance of a particular page. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank of all pages that link to it ("incoming links"). A page that is linked to by many pages with high PageRank receives a high rank itself.
Numerous academic papers concerning PageRank have been published since Page and Brin's original paper.[5] In practice, the PageRank concept may be vulnerable to manipulation. Research has been conducted into identifying falsely influenced PageRank rankings. The goal is to find an effective means of ignoring links from documents with falsely influenced PageRank.
Other link-based ranking algorithms for Web pages include the HITS algorithm invented by Jon Kleinberg (used by Teoma and now Ask.com), the IBM CLEVER project, the TrustRank algorithm and the Hummingbird algorithm.
This presentation won me the best presentation award at my University Tech fest "Allegretto" in 2008.
I have also presented this seminar as part of the B.Tech curriculum in the 7th semester.
1. Jung Hoon Kim
N5, Room 2239
E-mail: junghoon.kim@kaist.ac.kr
2014.01.14
KAIST Knowledge Service Engineering
Data Mining Lab.
2. Introduction
First introduced by Sergey Brin & Larry Page in 1998
By 1996, the original ranking algorithms were no longer suitable for the Web
The number of Web pages grew rapidly
In 1996, the query "classification technique" returned 10 million relevant pages!
Content similarity methods are easily spammed
They are vulnerable to spam pages
3. Basic
The PageRank algorithm rests on two principles:
A hyperlink from a page pointing to another page is an implicit conveyance of authority to the target page. Thus, the more in-links a page i receives, the more prestige page i has.
Pages that point to page i also have their own prestige scores. A page with a higher prestige score pointing to i is more important than a page with a lower prestige score pointing to i.
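The first principle can be illustrated with a minimal sketch that simply counts in-links. The edge list and the helper name `in_link_counts` are hypothetical, chosen only for illustration; the second principle (weighting each in-link by the prestige of its source) is what the later slides build up to.

```python
from collections import Counter

# Hypothetical edge list: (source page, target page).
links = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A"), ("D", "A")]

def in_link_counts(edges):
    """Count how many in-links each target page receives."""
    return Counter(target for _, target in edges)

counts = in_link_counts(links)
```

In this toy graph, B receives the most in-links, so under the first principle alone it would have the most prestige.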
4. Principle
Hyperlink trick
A page with many incoming links is more important
5. Authority
Something endorsed by a person with more authority is more important
John is a computer scientist
Alice is a cook
6. Big picture
Being famous means having many incoming edges
7. Cyclic problem
On the Web, there are many cycles like this one
This matrix has the cycle A->B->E
As a result, the scores keep increasing without bound
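A minimal sketch of why a cycle breaks naive score passing. It assumes the cycle closes with an edge E -> A, and it uses a hypothetical update rule (each score becomes 1 plus the scores of the pages linking in); neither detail is stated on the slide.

```python
# Hypothetical three-page cycle: A -> B -> E -> A (closing edge assumed).
in_links = {"A": ["E"], "B": ["A"], "E": ["B"]}

def naive_scores(in_links, rounds):
    """Repeatedly set each score to 1 plus the scores of the pages linking in."""
    scores = {page: 1 for page in in_links}
    for _ in range(rounds):
        scores = {page: 1 + sum(scores[src] for src in sources)
                  for page, sources in in_links.items()}
    return scores

after_5 = naive_scores(in_links, 5)
after_50 = naive_scores(in_links, 50)
```

Every extra round adds 1 to each score, so the values never settle however long the update runs.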
8. Random surfer trick
To avoid these problems, they adopted the random surfer model
From each node, the surfer can jump to any other node
This solves the cycle problem
Nodes with many incoming links still receive a high rank
The jump is controlled by the damping factor (d)
In Google's initial model, d = 0.85, so the surfer jumps to a random page with probability 1 - d = 0.15
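The random surfer can be sketched as a small simulation. The graph, the function name `random_surfer`, the seed, and the number of steps are all assumptions for illustration; the jump probability of 0.15 follows the value on the slide.

```python
import random

# Hypothetical link graph: page -> pages it links to.
graph = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "B"],
    "D": ["B"],
}

def random_surfer(graph, steps, jump_probability=0.15, seed=0):
    """Estimate ranks as the fraction of steps the surfer spends on each page."""
    rng = random.Random(seed)
    pages = sorted(graph)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        out_links = graph[current]
        if not out_links or rng.random() < jump_probability:
            current = rng.choice(pages)      # random jump to any page
        else:
            current = rng.choice(out_links)  # follow a hyperlink
    return {page: count / steps for page, count in visits.items()}

ranks = random_surfer(graph, steps=100_000)
```

Because the surfer can always jump away, no cycle can trap it, and the visit fractions settle to stable values: pages with many in-links (B here) are visited most, while a page with no in-links (D) is reached only by random jumps.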
9. Test
Result of a test run 1000 times
The result is nearly correct
D and A have high ranks
A has only one incoming link
Expressing the ranks as percentages makes them easier to compare
13. Formula
Mathematically, we have a system of n linear equations.
P = (P1, P2, P3, …, Pn)
A is the adjacency matrix, so we can write this formula
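One standard way to write this system is sketched below. It assumes that O_j denotes the number of out-links of page j, that E is the set of hyperlinks, and that A is the adjacency matrix with each row divided by its out-degree; these symbols are assumptions, chosen to agree with the eigenvector statement on the linear algebra slide.

```latex
P(i) = \sum_{(j,i) \in E} \frac{P(j)}{O_j}, \qquad i = 1, \dots, n
\quad\Longleftrightarrow\quad
P = A^{T} P,
\qquad
A_{ij} =
\begin{cases}
  \dfrac{1}{O_i} & \text{if } (i,j) \in E, \\[4pt]
  0 & \text{otherwise,}
\end{cases}
```

where P = (P(1), ..., P(n))^T is treated as a column vector.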
15. Linear Algebra
formula
P is an eigenvector with the corresponding eigenvalue of 1.
1 is the largest eigenvalue and the PageRank vector P is the principal eigenvector
To calculate P, we can use the power iteration algorithm
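Power iteration can be sketched in a few lines of plain Python. The 3-page matrix `M` is a hypothetical example that plays the role of A^T (each column sums to 1), and the fixed iteration count is an assumption; the sketch only illustrates the idea and is not tuned for large graphs.

```python
def power_iteration(matrix, iterations=100):
    """Repeatedly multiply a start vector by `matrix` and renormalise to sum 1."""
    n = len(matrix)
    vector = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        vector = [value / total for value in product]
    return vector

# Hypothetical 3-page example; M plays the role of A^T, so each column sums to 1.
M = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
P = power_iteration(M)
# Multiplying once more should leave P unchanged (eigenvalue 1).
MP = [sum(M[i][j] * P[j] for j in range(3)) for i in range(3)]
```

For this matrix the vector settles at (4/9, 2/9, 3/9), and `MP` equals `P` up to rounding, which is what "eigenvector with eigenvalue 1" means in practice.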
16. Condition
However, this requires that A be a stochastic matrix and that it be irreducible and aperiodic
We can view the graph as a Markov model
Each web page is a node and each hyperlink is a transition
A is not a stochastic matrix because it has a zero row (row 5). A zero row means the page has no out-links.
So we fix the problem by adding a complete set of outgoing links from each such page i to all the pages on the Web
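The fix for zero rows can be sketched as follows. The function name `to_stochastic` and the 3-page adjacency matrix are hypothetical; the slide's own matrix (with its zero row 5) is not reproduced in the transcript.

```python
def to_stochastic(adjacency):
    """Turn a 0/1 adjacency matrix into a row-stochastic transition matrix.

    Rows with out-links are divided by their out-degree; zero rows (pages with
    no out-links) are replaced by a uniform row 1/n, i.e. a link to every page.
    """
    n = len(adjacency)
    result = []
    for row in adjacency:
        out_degree = sum(row)
        if out_degree == 0:
            result.append([1.0 / n] * n)
        else:
            result.append([entry / out_degree for entry in row])
    return result

# Hypothetical 3-page graph in which page 2 has no out-links.
adjacency = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
A = to_stochastic(adjacency)
```

After the fix every row sums to 1, so each row can be read as the transition probabilities out of one page.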
18. Irreducible
If there is some pair of nodes u and v with no path from u to v, A is not irreducible.
If there is a path from u to v for every pair of nodes, A is irreducible!
A state i is periodic with period k > 1 if k is the largest number such that all paths leading from state i back to state i have a length that is a multiple of k. If a state is not periodic, it is aperiodic. A Markov chain is aperiodic if all states are aperiodic.
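Irreducibility can be checked with a simple reachability search, sketched below under the assumption that the graph is given as a dictionary from each node to its out-neighbours. The helper names and both example graphs are hypothetical.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from `start` by following edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen

def is_irreducible(graph):
    """True if every node can reach every other node (strong connectivity)."""
    nodes = set(graph)
    return all(reachable(graph, node) == nodes for node in graph)

# Hypothetical graphs: a 3-cycle, and the same cycle plus a dead-end page D.
cycle = {"A": ["B"], "B": ["E"], "E": ["A"]}
with_dead_end = {"A": ["B", "D"], "B": ["E"], "E": ["A"], "D": []}
```

The plain cycle passes the check, while the graph with the dead end fails it because nothing is reachable from D. The plain cycle is still periodic (every return path has a length that is a multiple of 3), so irreducibility alone does not satisfy both conditions.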
19. Page Rank
It is easy to deal with the above two problems with a
single strategy
We add a link from each page to every page and give each
link a small transition probability controlled by a parameter
d
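In formula form, this strategy is commonly written as below. The notation is an assumption: E is the n-by-n matrix of all ones, A is the stochastic transition matrix, d is the probability of following a hyperlink (so a random jump happens with probability 1 - d), and P is normalised to sum to 1.

```latex
P = \left( (1 - d)\,\frac{E}{n} + d\,A^{T} \right) P
\qquad\Longleftrightarrow\qquad
P(i) = \frac{1 - d}{n} + d \sum_{j=1}^{n} A_{ji}\, P(j),
\qquad \sum_{i=1}^{n} P(i) = 1 .
```

Every entry of the combined matrix is at least (1 - d)/n > 0, so every page can reach every other page in one step, which makes the chain both irreducible and aperiodic.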
20. Page Rank
The computation of the PageRank values of the Web pages can
be done using the power iteration method, which produces
the principal eigenvector with an eigenvalue of 1
The iteration ends when the PageRank values do not
change much or converge.
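A minimal sketch of the full iteration with a stopping rule. The graph, the function name `pagerank`, the tolerance, and the iteration cap are assumptions for illustration; d is taken here as the probability of following a link, with the commonly cited value 0.85, so a random jump happens with probability 1 - d = 0.15.

```python
def pagerank(graph, d=0.85, tolerance=1e-10, max_iterations=1000):
    """Power iteration for PageRank; stops when the total change is below `tolerance`."""
    pages = sorted(graph)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for iteration in range(1, max_iterations + 1):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(ranks[page] for page in pages if not graph[page])
        new_ranks = {page: (1.0 - d) / n + d * dangling / n for page in pages}
        for page in pages:
            for target in graph[page]:
                new_ranks[target] += d * ranks[page] / len(graph[page])
        change = sum(abs(new_ranks[page] - ranks[page]) for page in pages)
        ranks = new_ranks
        if change < tolerance:
            break
    return ranks, iteration

# Hypothetical link graph: page -> pages it links to.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["B"]}
ranks, iterations = pagerank(graph)
```

The loop ends as soon as the summed absolute change between two successive rank vectors drops below the tolerance, which is one concrete reading of "do not change much". Page D has no in-links, so its rank stays at the baseline (1 - d)/n.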
21. Real Page rank
Dealing with web spam is the most important issue
Giving every page an equal random surfer constant and recalculating all
the pages takes many iterations
Currently, Google uses more than 200 factors to calculate
rankings on the Web