Page access coefficient algorithm for information filtering

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print),
ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 60-69, © IAEME
PAGE ACCESS COEFFICIENT ALGORITHM FOR INFORMATION
FILTERING IN SOCIAL NETWORK
Mrs. L. Rajeswari, Computer Centre, Alagappa University, Karaikudi, India
Dr. S.S. Dhenakaran, Department of Comp. Sci. & Engg., Alagappa University, Karaikudi, India
ABSTRACT
A social network [9] is defined as a network of interactions or relationships, where
the nodes consist of actors and the edges consist of the relationships or interactions between
these actors. Information retrieval is used to obtain relevant information, presenting the
needed data and ignoring irrelevant data. For information filtering, two well-known algorithms
exist: PageRank and Weighted PageRank. In both algorithms, however, the computational
cost is high and many iterations are involved. In this paper, a new algorithm called
PAC (Page Access Coefficient) is proposed to calculate the rank of Web pages for social
networks in order to reduce the calculations and the time complexity.
Keywords: Social Network, Web Pages, Information Filtering, PageRank, Weighted
PageRank, PAC.
I INTRODUCTION
The World Wide Web [1] is a collection of information resources on the Internet that
are accessed using the Hypertext Transfer Protocol. It is a repository of many interlinked
hypertext documents, accessed via the Internet. The Web may contain text, images, videos and
other multimedia data.
Social networks [9] have become very popular in recent years because of the
increasing proliferation and affordability of Internet-enabled devices such as personal
computers, mobile devices and other more recent hardware innovations such as Internet
tablets. This is evidenced by the burgeoning popularity of many online social networks such
as Twitter, Facebook and LinkedIn. Social networks can be defined either in the context of
systems such as Facebook, which are explicitly designed for social interactions, or in terms of
other sites such as Flickr which are designed for a different service such as content sharing,
but also allow an extensive level of social interaction.
A social network is a social structure made up of individuals (or organizations) called
“nodes”, which are tied (connected) by one or more specific types of interdependency, such
as friendship, kinship, common interest, financial exchange, or relationships of belief,
knowledge or prestige. Social networking sites [3] are the portals of entry to the Internet for
many millions of users, and they are being used both for advertisement and for the ensuing
commerce. Social networks can be used to mitigate the privacy and access challenges that
arise when the amount of shared content is growing at an exponential rate.
A group of individuals with connections to other social worlds is likely to have access
to a wider range of information. It is better for individual success to have connections to a
variety of networks rather than many connections within a single network. Some social
networks such as YouTube and Google Video are used to share multimedia content, and
others such as LiveJournal and BlogSpot are used to share blogs. Full participation in an online
social network [3] requires users to register a (pseudo) identity with the network, though
some sites do allow browsing public data without explicit sign-on. Users may
volunteer information about themselves (e.g. their birthday, place of residence, interests, etc.),
all of which constitutes the user’s profile. The social network itself [3] is composed of links
between users. Some sites allow users to link to any other user (without consent from the link
recipient), while other sites follow a two-phase procedure that only allows a link to be
established when both parties agree. Certain sites, such as Flickr, have social networks with
directed links (meaning a link from A to B does not imply the presence of a reverse link),
whereas others, such as Orkut, have social networks with undirected links.
Information Filtering
Information filtering is a name used to describe a variety of processes involved in the
delivery of information to people who need it. Information retrieval has been characterized in
a variety of ways, ranging from a description of its goal to relatively abstract models of its
components and processes. Information retrieval is used to obtain relevant information,
presenting the needed data and ignoring irrelevant data. For this, information filtering is
necessary.
The goal of information filtering [4] is to eliminate redundant or unsuitable
information and thus overcome information overload. Information filtering helps users to
choose, from an abundant number of possibilities (available products, potential friends, etc.),
those that are most likely to be of interest or use to them.
Relevant information [5] can be defined solely for a specific user and in the
context of a particular domain or topic. The shared “social” information can be used to
improve the task of retrieving relevant information, and for refining each agent’s particular
knowledge. Information filtering techniques are used in different applications, not only in
the web context, but in thematic areas as varied as voice recognition, classification in
telescopic astronomy or evaluation of financial risk. Information filtering is also used to
select a particular product in online product sales.
II BACKGROUND
A. Ranking web pages
With the rapid growth [7] of the WWW and the users’ demand for knowledge, it is
becoming more difficult to manage the information on the WWW and satisfy user needs.
Therefore, users are looking for better information retrieval techniques and tools to locate,
extract, filter and find the necessary information. Most users rely on information retrieval
tools such as search engines to find information on the WWW. Web mining and
ranking mechanisms therefore become very important for effective information retrieval.
With the rapid growth [6] of the Web, providing relevant pages of the highest quality
to users based on their queries becomes increasingly difficult. The reasons are that some
web pages are not self-descriptive and some links exist purely for navigational purposes.
Therefore, finding appropriate pages through a search engine that relies on web contents or
makes use of hyperlink information is very difficult. To overcome the above-mentioned
problems, several algorithms have been developed. Among them, PageRank in [7] and
Weighted PageRank in [7] are commonly used algorithms in Web Structure Mining.
B. PageRank
PageRank [8] is one of the most commonly used algorithms for ranking pages.
The working of the PageRank algorithm depends upon the link structure of the web pages. The
PageRank algorithm is based on the concept that if a page has important links pointing
towards it, then its links towards other pages are also to be considered
important. PageRank [7] provides a more advanced way to compute the
importance or relevance of a Web page than simply counting the number of pages that
link to it (called “backlinks”). PageRank considers the backlinks in deciding the
rank score.
Assume an arbitrary page A has pages T1 to Tn pointing to it (incoming links).
PageRank can be calculated by the following equation.
PR(A) = (1-d) + d(PR(T1)/O(T1) + ... + PR(Tn)/O(Tn))    (1)
The parameter d is a damping factor, usually set to 0.85. O(A) is defined as the
number of links going out of page A.
Let us take an example of hyperlink structure of four pages A, B, C and D as shown
in Fig.1 below. The PageRank for Pages A, B, C and D can be calculated using (1).
Fig.1 Hyperlink Structure for 4 pages (Page A, Page B, Page C and Page D)
All PageRank values are initialised to 1, and each newly computed value is used immediately
in the equations that follow it.
PR(A) = (1-d) + d(PR(B)/O(B) + PR(C)/O(C) + PR(D)/O(D))
      = (1-0.85) + 0.85(1/3 + 1/3 + 1/1) = 1.566667
PR(B) = (1-d) + d(PR(A)/O(A) + PR(C)/O(C)) = 1.099167
PR(C) = (1-d) + d(PR(A)/O(A) + PR(B)/O(B)) = 1.127264
PR(D) = (1-d) + d(PR(B)/O(B) + PR(C)/O(C)) = 0.780822
The second iteration is obtained by taking the above PageRank values:
PR(A) = (1-0.85) + 0.85(1.099167/3 + 1.127264/3 + 0.780822/1)
      = 1.444521
PR(B) = 0.15 + 0.85(1.444521/2 + 1.127264/3) = 1.083313
PR(C) = 0.15 + 0.85(1.444521/2 + 1.083313/3) = 1.07086
PR(D) = 0.15 + 0.85(1.083313/3 + 1.07086/3) = 0.760349
By the 34th iteration, the PageRank values converge, as shown in Table 1 below:
Table 1
Iteration  A         B         C         D
1          1         1         1         1
2          1.566667  1.099167  1.127264  0.780822
3          1.444521  1.083313  1.07086   0.760349
4          1.406645  1.051235  1.045674  0.744124
---        ---       ---       ---       ---
---        ---       ---       ---       ---
33         1.31351   0.988244  0.988244  0.710005
34         1.313509  0.988244  0.988244  0.710005
35         1.313509  0.988244  0.988244  0.710005
For a small set of pages, it is easy to calculate the PageRank values. But for a
Web having billions of pages, it is not easy to do the calculation as above.
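The iteration above can be sketched in a few lines of Python. This is only an illustrative sketch: the link list is inferred from the worked equations for Fig.1 (the figure itself is not reproduced here), and the names OUT_LINKS, pagerank_sweep and pagerank, as well as the stopping tolerance, are assumptions introduced for the example. Each sweep updates the values in place, in the order A, B, C, D, as in the hand calculation.

```python
# Link list inferred from the worked equations for Fig.1 (an assumption):
# source page -> pages it links to.
OUT_LINKS = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A"],
}


def pagerank_sweep(ranks, out_links, d=0.85):
    """Apply equation (1) once to every page, updating `ranks` in place."""
    for page in out_links:
        # Sum PR(T)/O(T) over every page T that links to `page`.
        incoming = sum(
            ranks[src] / len(targets)
            for src, targets in out_links.items()
            if page in targets
        )
        ranks[page] = (1 - d) + d * incoming
    return ranks


def pagerank(out_links, d=0.85, tol=1e-6, max_sweeps=1000):
    """Repeat sweeps until no value changes by more than `tol` (hypothetical tolerance)."""
    ranks = {page: 1.0 for page in out_links}
    for sweep in range(1, max_sweeps + 1):
        previous = dict(ranks)
        pagerank_sweep(ranks, out_links, d)
        if max(abs(ranks[p] - previous[p]) for p in ranks) < tol:
            return ranks, sweep
    return ranks, max_sweeps


if __name__ == "__main__":
    final_ranks, sweeps = pagerank(OUT_LINKS)
    print(sweeps, {p: round(r, 6) for p, r in final_ranks.items()})
```

The number of sweeps needed depends on the chosen tolerance, which is why even this four-page example requires dozens of passes before the values settle.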
C. Weighted PageRank Algorithm:
Wenpu Xing and Ali Ghorbani [6] proposed a Weighted PageRank (WPR) algorithm, which
is an extension of the PageRank algorithm. This algorithm assigns larger rank values to the
more important pages rather than dividing the rank value of a page evenly among its outgoing
linked pages. Each outgoing link gets a value proportional to its importance. The importance
is assigned in terms of weight values to the incoming and outgoing links, denoted as
W^in(m,n) and W^out(m,n) respectively. W^in(m,n), as shown in equation (3), is the weight of
link (m,n) calculated based on the number of incoming links of page n and the number of
incoming links of all reference pages of page m.
$$W^{in}_{(m,n)} = \frac{I_n}{\sum_{p \in R(m)} I_p} \qquad (3)$$

$$W^{out}_{(m,n)} = \frac{O_n}{\sum_{p \in R(m)} O_p} \qquad (4)$$
Here In and Ip are the number of incoming links of page n and page p respectively, and R(m)
denotes the reference page list of page m. W^out(m,n), as shown in equation (4), is the weight
of link (m,n) calculated based on the number of outgoing links of page n and the number of
outgoing links of all reference pages of m, where On and Op are the number of outgoing
links of page n and page p respectively. The formula proposed in [6] for the WPR is shown in
equation (5), which is a modification of the PageRank formula in [7]:
$$WPR(n) = (1-d) + d \sum_{m \in B(n)} WPR(m)\, W^{in}_{(m,n)}\, W^{out}_{(m,n)} \qquad (5)$$
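As a rough illustration of how equations (3) to (5) fit together, the following Python sketch computes the link weights and applies one WPR update. It rests on stated assumptions: R(m) is read as the set of pages that m links to, B(n) as the set of pages linking to n, and every reference page is assumed to have at least one outgoing link. The three-page link structure and the names OUT_LINKS, in_degree, link_weights and wpr_sweep are hypothetical and are not taken from the paper.

```python
# Hypothetical three-page link structure, used only for illustration:
# source page m -> its reference pages R(m).
OUT_LINKS = {
    "X": ["Y", "Z"],
    "Y": ["X"],
    "Z": ["X", "Y"],
}


def in_degree(out_links):
    """Count the incoming links I_p of every page p."""
    counts = {page: 0 for page in out_links}
    for targets in out_links.values():
        for target in targets:
            counts[target] += 1
    return counts


def link_weights(out_links):
    """Return {(m, n): (W_in, W_out)} following equations (3) and (4)."""
    incoming = in_degree(out_links)
    outgoing = {page: len(targets) for page, targets in out_links.items()}
    weights = {}
    for m, refs in out_links.items():
        total_in = sum(incoming[p] for p in refs)
        # Assumes at least one reference page of m has outgoing links.
        total_out = sum(outgoing[p] for p in refs)
        for n in refs:
            weights[(m, n)] = (incoming[n] / total_in, outgoing[n] / total_out)
    return weights


def wpr_sweep(ranks, out_links, d=0.85):
    """Apply equation (5) once to every page, updating `ranks` in place."""
    weights = link_weights(out_links)
    for n in out_links:
        total = sum(
            ranks[m] * w_in * w_out
            for (m, target), (w_in, w_out) in weights.items()
            if target == n
        )
        ranks[n] = (1 - d) + d * total
    return ranks
```

Under this reading, the incoming weights of the links leaving any page m sum to 1, and so do the outgoing weights, so a page's rank is shared among its reference pages in proportion to their link counts rather than evenly.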
Using the same hyperlink structure as shown in Fig.1, the WPR values for Pages
A, B, C and D can be calculated using (5) as follows.
WPR(A) = (1-0.85)+0.85(1*3/5*1/3 + 1*3/5*1/3+1*3/4*1)
= 1.127
WPR(B) = (1-0.85)+0.85(1.127*1/3*1/2 + 1*2/5*1/2)
= 0.499
WPR(C) = (1-0.85)+0.85(1.127*1/3*1/2 + 0.499*2/5*1/2)
= 0.392
WPR(D) = (1-0.85)+0.85(0.499*1/2*1 + 0.392*2/5*1/3)
= 0.406
From the above calculations, WPR(A) > WPR(B) > WPR(D) > WPR(C). This
result shows that the page rank order differs from that of PageRank.
To differentiate the WPR from the PageRank, Xing and Ghorbani [6] categorized the resultant
pages of a query into four categories based on their relevancy to the given query. They are:
1. Very Relevant Pages (VR): Pages containing very important information related
to a given query.
2. Relevant Pages (R): Pages that are relevant but do not have important information about a
given query.
3. Weak Relevant Pages (WR): Pages that may have the query keywords but do not
have the relevant information.
4. Irrelevant Pages (IR): Pages having neither relevant information nor the query
keywords.
The PageRank and WPR algorithms both provide ranked pages in sorted order to
the users based on the given query. So, in the resultant list, the number of relevant pages and
their order are very important for users. Xing and Ghorbani proposed a Relevance Rule to calculate
the relevancy value of each page in the list of pages. That makes WPR different from
PageRank.
Experimental studies by Xing and Ghorbani showed that WPR produces larger relevancy
values than PageRank.
III PROPOSED WORK
While filtering the information, the query results (the information relevant to users) must
be ranked, because most users rarely go beyond the first few query results. The concept of
ranking is most commonly used in search engines. This paper proposes a new ranking
algorithm for information filtering in social networks.
For ranking the pages, the existing PageRank uses iterated calculations and the
WPR uses weights for each and every page.
In the proposed work, by contrast, a Page Access Coefficient (PAC) is calculated for each page.
Information retrieval is then done based on the PAC values of the web pages.
A. Calculation of Page Access Coefficient (PAC)
In the proposed work, the rank of a Web page is calculated based on the number of
incoming links of the page, the number of outgoing links of the page and the total number of
pages. The proposed formula for calculating PAC is shown below:
PAC(A) = IA + (OA / n)    (6)
where
PAC = Page Access Coefficient
A   = Web page
IA  = number of pages referring to Page A (incoming links of Page A)
OA  = number of pages referred to by Page A (outgoing links of Page A)
n   = total number of pages
Let us take as an example the hyperlink structure of four pages A, B, C and D shown in
Fig.1. The PAC for Pages A, B, C and D can be calculated using (6) as follows.
Here, n = 4, the total number of pages.
PAC(A) = IA + (OA / n) = 3 + (2/4) = 3.5
PAC(B) = IB + (OB / n) = 2 + (3/4) = 2.75
PAC(C) = IC + (OC / n) = 2 + (3/4) = 2.75
PAC(D) = ID + (OD / n) = 2 + (1/4) = 2.25
From the above calculations,
PAC(A) > PAC(B) = PAC(C) > PAC(D)
According to the PAC (Page Access Coefficient) calculation, the values of Pages B and C
coincide after a single pass, just as their PageRank values coincide at convergence in Table 1.
B. Algorithm for calculating PAC
Input: Let G = (V,E) be a social network. V denotes the set of pages and E denotes the set of
links between pages. For a page A ∈ V, IA denotes the number of incoming links of Page A and
OA denotes the number of outgoing links of Page A.
Output: The list of Page Access Coefficients, stored in PAC.
Algorithm 1
Parameters: a social network G = (V,E),
            an evaluating page A ∈ V;
            PAC(A) ← Page Access Coefficient of Page A
            n ← size of V (total number of pages)
for all A ∈ V do
    IA ← number of incoming links of Page A
    OA ← number of outgoing links of Page A
end for
for all A ∈ V do
    PAC(A) = IA + (OA / n)
end for
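Algorithm 1 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function name page_access_coefficients and the representation of links as (source, target) pairs are assumptions, and the Fig.1 link list is inferred from the in-link and out-link counts used in the worked example.

```python
from collections import Counter


def page_access_coefficients(pages, links):
    """Return {page: I_page + O_page / n} for a list of directed links."""
    n = len(pages)
    # First loop of Algorithm 1: count incoming and outgoing links.
    incoming = Counter(target for _, target in links)
    outgoing = Counter(source for source, _ in links)
    # Second loop of Algorithm 1: apply equation (6) to every page.
    return {page: incoming[page] + outgoing[page] / n for page in pages}


# Link list inferred from the Fig.1 example (an assumption, since the
# figure itself is not reproduced in the text).
FIG1_PAGES = ["A", "B", "C", "D"]
FIG1_LINKS = [
    ("A", "B"), ("A", "C"),
    ("B", "A"), ("B", "C"), ("B", "D"),
    ("C", "A"), ("C", "B"), ("C", "D"),
    ("D", "A"),
]

if __name__ == "__main__":
    for page, value in page_access_coefficients(FIG1_PAGES, FIG1_LINKS).items():
        print(page, value)
```

Counting the links takes one pass over the link list, after which each PAC value is a single addition and division.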
IV RESULTS AND DISCUSSIONS
Let us compute the ranks of the Web pages shown in Fig.2 below using the PageRank,
WPR and proposed PAC algorithms.
Fig.2 Hyperlink Structure for 5 pages (Pages A, B, C, D and E)
PageRank Output
PR(A) = (1-d)+d(PR(B)/2+PR(E)/2)
=(1-0.85)+0.85*(1/2+1/2) = 1
PR(B) = 0.15 +0.85*(1/2+1/2+1/2) = 1.425
PR(C) = 0.15+0.85*(1/2) = 0.575
PR(D) = 0.15 +0.85 *(1.425/2+0.575/2) = 1.0
PR(E) = 0.15 + 0.85*(0.575/2 + 1) = 0.8194
The second iteration is obtained below by taking the above PageRank values:
PR(A) = 1.1039
PR(B) = 1.2118
PR(C) = 0.5750
PR(D) = 0.9094
PR(E) = 1.1673
By the 44th iteration, the PageRank values converge, as shown in Table 2 below:
Table 2
Iteration A B C D E
1 1 1 1 1 1
2 1.0000 1.4250 0.5750 1.0000 0.8194
3 1.1039 1.2118 0.5750 0.9094 1.1673
4 1.1611 1.3840 0.6191 1.0013 1.2643
--- --- ---- --- --- ---
--- --- ---- --- --- ---
41 1.6683 1.9246 0.8590 1.3330 1.6482
42 1.6684 1.9246 0.8590 1.3331 1.6482
43 1.6684 1.9247 0.8591 1.3331 1.6482
44 1.6685 1.9247 0.8591 1.3331 1.6483
45 1.6685 1.9247 0.8591 1.3331 1.6483
From the above, PR(B) > PR(A) > PR(E) > PR(D) > PR(C).
WPR output:
WPR(A) = 0.66
WPR(B) = 1.5933
WPR(C) = 0.2161
WPR(D) = 0.3152
WPR(E) = 0.40755
From above, WPR(B)>WPR(A)>WPR(E)>WPR(D)>WPR(C).

Use the hyperlink structure shown in Fig. 2 for the PAC calculation.
The PAC values for Pages A, B, C, D and E are obtained as follows.
G = (V,E)
V = {A, B, C, D, E}
E = {(A,B),(A,C),(B,A),(B,D),(C,B),(C,D),(C,E),(D,E),(E,A),(E,B)}
n = 5
Pages In Out
A 2 2
B 3 2
C 1 2
D 2 1
E 2 2
PAC output:
PAC(A) = IA + (OA / n) = 2 + (2/5) = 2.4
PAC(B) = IB + (OB / n) = 3 + (2/5) = 3.4
PAC(C) = IC + (OC / n) = 1 + (2/5) = 1.4
PAC(D) = ID + (OD / n) = 2 + (1/5) = 2.2
PAC(E) = IE + (OE / n) = 2 + (2/5) = 2.4
Here, PAC(B) > PAC(A) = PAC(E) > PAC(D) > PAC(C).
The PR, WPR and PAC values calculated for Fig. 2 are shown above.
From these, it is seen that the PAC results are obtained in a single iteration.
The rank orders produced by PR, WPR and PAC for Pages A, B, C, D and E are almost the
same: all three rank B first and C last, and the only difference is that PAC gives A and E
equal values.
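The comparison of rank orders can be checked with a short Python sketch. The scores are copied from the values reported above for Fig. 2 (the PR values from iteration 45 of Table 2); the names REPORTED and rank_tiers are assumptions introduced for the example. Pages with equal scores are placed in the same tier.

```python
# Scores as reported in the text for Fig.2 (PR from iteration 45 of Table 2).
REPORTED = {
    "PR": {"A": 1.6685, "B": 1.9247, "C": 0.8591, "D": 1.3331, "E": 1.6483},
    "WPR": {"A": 0.66, "B": 1.5933, "C": 0.2161, "D": 0.3152, "E": 0.40755},
    "PAC": {"A": 2.4, "B": 3.4, "C": 1.4, "D": 2.2, "E": 2.4},
}


def rank_tiers(scores):
    """Group pages into tiers of equal score, highest score first."""
    tiers = {}
    for page, score in scores.items():
        tiers.setdefault(score, []).append(page)
    return [sorted(tiers[score]) for score in sorted(tiers, reverse=True)]


if __name__ == "__main__":
    for name, scores in REPORTED.items():
        # Pages within a tier are joined with "=", tiers with ">".
        print(name, " > ".join("=".join(tier) for tier in rank_tiers(scores)))
```

Grouping by tiers rather than sorting outright keeps the tie between A and E visible instead of hiding it behind an arbitrary tie-break.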
In the PR calculation, many iterations have to be performed, which increases the calculation
time.
In the WPR calculation, the link weights for each vertex have to be calculated, which increases
the calculation complexity.
In the PAC calculation, the results are obtained in a single step, which reduces the
time and complexity of the calculation.
The time complexity of the proposed algorithm is O(n), since only n iterations are needed
for n vertices.

CONCLUSION
With the rapid growth of the WWW and the users’ demand for knowledge, it is becoming
more difficult to manage the information on the WWW and satisfy user needs. Therefore,
users are looking for better information filtering techniques to locate, extract, filter and find
the necessary information. This paper also studies the performance of the PageRank and WPR
algorithms. It is found that the computational complexity is high in the existing algorithms.
So, a new algorithm is proposed to calculate the PAC for social networks. The proposed
algorithm is found to be better than the existing algorithms, since it
reduces the calculations and the time complexity.
REFERENCES
[1] Pooja Sharma et al., “Weighted Page Rank for Ordering Web Search Result”, IJEST,
Vol. 2(12), 2010, 7301-7310.
[2] Kristina Lerman, “Social Browsing & Information Filtering in Social Media”,
arXiv:0710.5697v1 [cs.CY], 30 Oct 2007.
[3] Alan E. Mislove, “Online Social Networks: Measurement, Analysis and Applications to
Distributed Information Systems”, Ph.D. thesis, Rice University, April 2009.
[4] Matus Medo, “Network-based information filtering algorithms: ranking and
recommendation”, arXiv:1208.4552v1 [cs.SI], 22 Aug 2012.
[5] Delgado et al., “Content-based Collaborative Information Filtering”, Nagoya 466
Japan, {jdelgado,ishii,tomkey}@ics.nitech.ac.jp.
[6] Wenpu Xing and Ali Ghorbani, “Weighted PageRank Algorithm”, Proc. of the Second
Annual Conference on Communication Networks and Services Research
(CNSR’04), IEEE, 2004.
[7] Ashutosh Kumar Singh, Ravi Kumar P, “A Comparative Study of Page Ranking
Algorithms for Information Retrieval”, IJECE, 4.7.2009.
[8] Dilip Kumar Sharma and A.K. Sharma, “A Comparative Analysis of Web Page
Ranking Algorithms”, IJCSE, Vol. 02, No. 08, 2010, 2670-2676.
[9] Charu C. Aggarwal, “An Introduction to Social Network Data Analytics”, IBM T.J.
Watson Research Center.
[10] Muhanad A. Al-Khalisy and Dr. Haider K. Hoomod, “POSN: Private Information
Protection in Online Social Networks”, International Journal of Computer Engineering &
Technology (IJCET), Volume 4, Issue 2, 2013, pp. 340-355, ISSN Print: 0976-6367, ISSN
Online: 0976-6375.
[11] Alamelu Mangai J, Santhosh Kumar V and Sugumaran V, “Recent Research in Web
Page Classification – A Review”, International Journal of Computer Engineering &
Technology (IJCET), Volume 1, Issue 1, 2010, pp. 112-122, ISSN Print: 0976-6367, ISSN
Online: 0976-6375.
