International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 3, May – June (2013), © IAEME
60
PAGE ACCESS COEFFICIENT ALGORITHM FOR INFORMATION
FILTERING IN SOCIAL NETWORK
Mrs. L.Rajeswari * Dr.S.S.Dhenakaran
Computer Centre Department of Comp.Sci. & Engg.
Alagappa University Alagappa University
Karaikudi. India. Karaikudi. India.
ABSTRACT
A social network [9] is defined as a network of interactions or relationships, where
the nodes consist of actors and the edges consist of the relationships or interactions between
these actors. Information retrieval is used to obtain relevant information, presenting the needed
data and ignoring irrelevant data. For information filtering, there exist two commonly used
algorithms, PageRank and Weighted PageRank. In both algorithms, however, the computational
cost is high and many iterations are involved. In this paper, a new algorithm called
PAC (Page Access Coefficient) is proposed to calculate the rank of Web pages for social
networks in order to reduce the calculations and the time complexity.
Keywords: Social Network, Web Pages, Information Filtering, PageRank, Weighted
PageRank, PAC.
I INTRODUCTION
The World Wide Web [1] is a collection of information resources on the Internet that
are accessed using the Hypertext Transfer Protocol. It is a repository of many interlinked
hypertext documents, accessed via the Internet. The Web may contain text, images, videos and
other multimedia data.
Social networks [9] have become very popular in recent years because of the
increasing proliferation and affordability of Internet-enabled devices such as personal
computers, mobile devices and other more recent hardware innovations such as Internet
tablets. This is evidenced by the burgeoning popularity of many online social networks such
as Twitter, Facebook and LinkedIn. Social networks can be defined either in the context of
systems such as Facebook which are explicitly designed for social interactions, or in terms of
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING
& TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 3, May-June (2013), pp. 60-69
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com
IJCET
© I A E M E
other sites such as Flickr which are designed for a different service such as content sharing,
but also allow an extensive level of social interaction.
A social network is a social structure made up of individuals (or organizations) called
“nodes”, which are tied (connected) by one or more specific types of interdependency, such
as friendship, kinship, common interest, financial exchange, or relationships of belief,
knowledge or prestige. Social networking sites [3] are the portals of entry to the Internet for
many millions of users, and they are being used both for advertisement and for the ensuing
commerce. Social networks can be used to mitigate the privacy and access challenges that
arise when the amount of shared content grows at an exponential rate.
A group of individuals with connections to other social worlds is likely to have access
to a wider range of information. It is better for individual success to have connections to a
variety of networks rather than many connections within a single network. Other social
networks such as YouTube and Google Video are used to share multimedia content, and
others such as LiveJournal and BlogSpot are used to share blogs. Full participation in an online
social network [3] requires users to register a (pseudo) identity with the network, though
some sites do allow browsing public data without explicit sign-on. Users may
volunteer information about themselves (e.g. their birthday, place of residence, interests, etc.),
all of which constitutes the user’s profile. The social network itself [3] is composed of links
between users. Some sites allow users to link to any other user (without consent from the link
recipient), while other sites follow a two-phase procedure that only allows a link to be
established when both parties agree. Certain sites, such as Flickr, have social networks with
directed links (meaning a link from A to B does not imply the presence of a reverse link),
whereas others, such as Orkut, have social networks with undirected links.
Information Filtering
Information filtering is a name used to describe a variety of processes involved in the
delivery of information to people who need it. Information retrieval has been characterized in
a variety of ways, ranging from a description of its goal to relatively abstract models of its
components and processes. Information retrieval is used to obtain relevant information,
presenting needed data and ignoring irrelevant data. For this, information filtering is necessary.
The goal of information filtering [4] is to eliminate the redundant or unsuitable
information and thus overcome the information overload. Information filtering helps users to
choose from an abundant number of possibilities (available products, potential friends, etc.)
those that are most likely to be of interest or use for them.
Relevant information [5] can be defined solely for a specific user and under the
context of a particular domain or topic. The shared “social” information can be used to
improve the task of retrieving relevant information, and for refining each agent’s particular
knowledge. Information filtering techniques are used in different applications, not only in
the web context but also in areas as varied as voice recognition, classification in
telescopic astronomy and evaluation of financial risk. Information filtering is also used to
select particular products in online product sales.
II BACKGROUND
A. Ranking web pages
With the rapid growth [7] of the WWW and users’ demand for knowledge, it is
becoming more difficult to manage the information on the WWW and satisfy user needs.
Therefore, users are looking for better information retrieval techniques and tools to locate,
extract, filter and find the necessary information. Most users rely on information retrieval
tools such as search engines to find information on the WWW. Web mining and
ranking mechanisms therefore become very important for effective information retrieval.
With the rapid growth [6] of the Web, providing relevant pages of the highest quality
to users based on their queries becomes increasingly difficult. The reasons are that some
web pages are not self-descriptive and some links exist purely for navigational purposes.
Therefore, finding appropriate pages through a search engine that relies on web contents or
makes use of hyperlink information is very difficult. To overcome the above-mentioned
problems, several algorithms have been developed. Among them, PageRank in [7] and
Weighted PageRank in [7] are commonly used algorithms in Web Structure Mining.
B. PageRank
PageRank [8] is one of the most commonly used algorithms for ranking pages.
The PageRank algorithm depends on the link structure of the web pages. It is
based on the concept that if a page has important links pointing
towards it, then the links from this page to other pages are also to be considered
important. PageRank [7] provides a more advanced way to compute the
importance or relevance of a Web page than simply counting the number of pages that
link to it (called “backlinks”). PageRank considers the backlinks in deciding the
rank score.
Assume an arbitrary page A has pages T1 to Tn pointing to it (incoming links).
PageRank can be calculated by the following equation.
PR(A) = (1-d) + d(PR(T1)/O(T1) + … + PR(Tn)/O(Tn)) (1)
The parameter d is a damping factor, usually set to 0.85. O(A) is defined as the
number of links going out of page A.
Let us take an example of hyperlink structure of four pages A, B, C and D as shown
in Fig.1 below. The PageRank for Pages A, B, C and D can be calculated using (1).
Fig.1 Hyperlink Structure for 4 pages (nodes: Page A, Page B, Page C and Page D)
Each page starts with a PageRank of 1, and each newly computed value is used immediately
in the remaining calculations of the same iteration.
PR(A) = (1-d)+d(PR(B)/O(B)+PR(C)/O(C)+PR(D)/O(D))
= (1-0.85)+0.85(1/3+1/3+1/1) = 1.566667
PR(B) = (1-d)+d(PR(A)/O(A)+PR(C)/O(C)) = 1.099167
PR(C) = (1-d)+d(PR(A)/O(A)+PR(B)/O(B)) = 1.127264
PR(D) = (1-d)+d(PR(B)/O(B)+PR(C)/O(C)) = 0.780822
The second iteration uses the above PageRank values:
PR(A) = (1-0.85)+0.85(1.099167/3 + 1.127264/3 + 0.780822/1)
= 1.444521
PR(B) = 0.15 + 0.85(1.444521/2 + 1.127264/3) = 1.083313
PR(C) = 0.15 + 0.85(1.444521/2 + 1.083313/3) = 1.070860
PR(D) = 0.15 + 0.85(1.083313/3 + 1.070860/3) = 0.760349
At the 34th iteration, the PageRank values converge, as shown in Table 1 below:
Table 1
Iteration A B C D
1 1 1 1 1
2 1.566667 1.099167 1.127264 0.780822
3 1.444521 1.083313 1.07086 0.760349
4 1.406645 1.051235 1.045674 0.744124
--- --- --- --- ---
--- --- --- --- ---
33 1.31351 0.988244 0.988244 0.710005
34 1.313509 0.988244 0.988244 0.710005
35 1.313509 0.988244 0.988244 0.710005
For a small set of pages, it is easy to calculate the PageRank values. But for a
Web having billions of pages, a calculation like the one above is not easy.
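The iteration above can be sketched in Python. This is a minimal sketch rather than the authors' implementation: the link list OUT_LINKS is inferred from the worked equations (for example, A receives links from B, C and D), and the function names, tolerance and sweep limit are illustrative assumptions. Each newly computed value is used immediately within the same sweep, which is what the first-iteration figures above imply.

```python
# Out-links of each page. ASSUMPTION: inferred from the worked equations
# for the four-page example, since the figure itself only names the pages.
OUT_LINKS = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A"],
}


def pagerank_sweep(ranks, out_links, d=0.85):
    """Apply equation (1) once to every page, updating ranks in place."""
    for page in ranks:
        # Pages that link to `page` (its backlinks).
        sources = [s for s, targets in out_links.items() if page in targets]
        total = sum(ranks[s] / len(out_links[s]) for s in sources)
        ranks[page] = (1 - d) + d * total
    return ranks


def pagerank(out_links, d=0.85, tol=1e-7, max_sweeps=1000):
    """Repeat sweeps until no rank changes by more than `tol`."""
    ranks = {page: 1.0 for page in out_links}
    sweeps = 0
    for sweeps in range(1, max_sweeps + 1):
        previous = dict(ranks)
        pagerank_sweep(ranks, out_links, d)
        if max(abs(ranks[p] - previous[p]) for p in ranks) < tol:
            break
    return ranks, sweeps


first = pagerank_sweep({page: 1.0 for page in OUT_LINKS}, OUT_LINKS)
final, sweeps = pagerank(OUT_LINKS)
print({p: round(v, 6) for p, v in first.items()})
print({p: round(v, 6) for p, v in final.items()}, "after", sweeps, "sweeps")
```

The number of sweeps needed depends on the chosen tolerance, so it is printed rather than assumed.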
C. Weighted PageRank Algorithm:
Wenpu Xing and Ali Ghorbani [6] proposed the Weighted PageRank (WPR) algorithm, which
is an extension of the PageRank algorithm. This algorithm assigns larger rank values to the
more important pages rather than dividing the rank value of a page evenly among its outgoing
linked pages. Each outgoing link gets a value proportional to its importance. The importance
is assigned in terms of weight values to the incoming and outgoing links, denoted as
$W^{in}_{(m,n)}$ and $W^{out}_{(m,n)}$ respectively. $W^{in}_{(m,n)}$, as shown in
equation (3), is the weight of link(m,n) calculated based on the number of incoming links of
page n and the number of incoming links of all reference pages of page m.
$$W^{in}_{(m,n)} = \frac{I_n}{\sum_{p \in R(m)} I_p} \tag{3}$$
$$W^{out}_{(m,n)} = \frac{O_n}{\sum_{p \in R(m)} O_p} \tag{4}$$
Here $I_n$ and $I_p$ are the number of incoming links of page n and page p respectively, and
R(m) denotes the reference page list of page m. $W^{out}_{(m,n)}$, as shown in equation (4),
is the weight of link(m,n) calculated based on the number of outgoing links of page n and the
number of outgoing links of all reference pages of page m, where $O_n$ and $O_p$ are the
number of outgoing links of page n and page p respectively. The formula proposed in [6] for
WPR, which is a modification of the PageRank formula in [7], is shown in equation (5), where
B(n) denotes the set of pages that point to page n.
$$WPR(n) = (1-d) + d \sum_{m \in B(n)} WPR(m)\, W^{in}_{(m,n)}\, W^{out}_{(m,n)} \tag{5}$$
Using the same hyperlink structure as shown in Fig.1, the WPR values for pages
A, B, C and D can be calculated using (5) as follows.
WPR(A) = (1-0.85)+0.85(1*3/5*1/3 + 1*3/5*1/3+1*3/4*1)
= 1.127
WPR(B) = (1-0.85)+0.85(1.127*1/3*1/2 + 1*2/5*1/2)
= 0.499
WPR(C) = (1-0.85)+0.85(1.127*1/3*1/2 + 0.499*2/5*1/2)
= 0.392
WPR(D) = (1-0.85)+0.85(0.499*1/2*1 + 0.392*2/5*1/3)
= 0.406
The above calculations give WPR(A)>WPR(B)>WPR(D)>WPR(C). This
result shows that the rank order under WPR is different from that under PageRank.
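Equations (3) to (5) can be sketched in Python as follows. This is only an illustrative sketch: the link list is inferred from the worked PageRank equations for Fig.1, and all function names are assumptions. The weights are computed directly from the definitions in equations (3) and (4), so the numbers it produces need not coincide with the worked figures above.

```python
# Out-links of each page, i.e. the reference page list R(m).
# ASSUMPTION: inferred from the worked PageRank equations for Fig.1.
OUT_LINKS = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A"],
}


def in_links(out_links):
    """Build B(n): for every page n, the pages that point to it."""
    incoming = {page: [] for page in out_links}
    for source, targets in out_links.items():
        for target in targets:
            incoming[target].append(source)
    return incoming


def w_in(out_links, incoming, m, n):
    """Equation (3): I_n divided by the sum of I_p over p in R(m)."""
    return len(incoming[n]) / sum(len(incoming[p]) for p in out_links[m])


def w_out(out_links, m, n):
    """Equation (4): O_n divided by the sum of O_p over p in R(m).

    Assumes at least one page in R(m) has an outgoing link.
    """
    return len(out_links[n]) / sum(len(out_links[p]) for p in out_links[m])


def wpr_sweep(ranks, out_links, d=0.85):
    """Apply equation (5) once to every page, updating ranks in place."""
    incoming = in_links(out_links)
    for n in ranks:
        total = sum(
            ranks[m] * w_in(out_links, incoming, m, n) * w_out(out_links, m, n)
            for m in incoming[n]
        )
        ranks[n] = (1 - d) + d * total
    return ranks


incoming = in_links(OUT_LINKS)
weight_in_BA = w_in(OUT_LINKS, incoming, "B", "A")
weight_out_BA = w_out(OUT_LINKS, "B", "A")
wpr = wpr_sweep({page: 1.0 for page in OUT_LINKS}, OUT_LINKS)
print(weight_in_BA, weight_out_BA)
print({p: round(v, 6) for p, v in wpr.items()})
```

Keeping the two weights as separate functions makes it easy to inspect the weight of a single link(m,n) before running a full sweep.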
To differentiate WPR from PageRank, Xing and Ghorbani [6] categorized the resultant
pages of a query into four categories based on their relevancy to the given query:
1. Very Relevant Pages (VR): Pages containing very important information related
to a given query.
2. Relevant Pages (R): Pages that are relevant but do not have important information about a
given query.
3. Weak Relevant Pages (WR): Pages that may have the query keywords but do not
have the relevant information.
4. Irrelevant Pages (IR): Pages having neither relevant information nor the query
keywords.
The PageRank and WPR algorithms both provide ranked pages in sorted order to
the users based on the given query. So, in the resultant list, the number of relevant pages and
their order are very important for users. Xing and Ghorbani proposed a Relevance Rule to calculate
the relevancy value of each page in the list of pages. That makes WPR different from
PageRank.
Experimental studies by Xing and Ghorbani showed that WPR produces larger relevancy
values than PageRank.
III PROPOSED WORK
While filtering the information, the query (relevant information of users) results must
be ranked, because most users rarely go beyond the first few query results. The concept of
ranking is most commonly used in search engines. This paper proposes a new ranking
algorithm for information filtering in social networks.
For ranking the pages, the existing PageRank uses iterated calculations and
WPR uses a weight for each and every page.
In the proposed work, by contrast, a Page Access Coefficient (PAC) is calculated for each page.
The information retrieval is done based on the PAC values of the web pages.
A. Calculation of Page Access Coefficient (PAC)
In the proposed work, the rank of a Web page is calculated based on the number of
incoming links of the page, the number of outgoing links of the page and the total number of
pages. The proposed formula for calculating PAC is shown below:
PAC(A) = IA + (OA / n) (6)
where PAC Page Access Coefficient
A Web Page
IA number of pages referring Page A
(incoming links of Page A)
OA number of pages referred by Page A
(outgoing links of Page A)
n total number of pages
Let us take the example of the hyperlink structure of four pages A, B, C and D shown in
Fig.1. The PAC for pages A, B, C and D can be calculated using (6) as follows.
Here, n = 4, the total number of pages.
PAC(A) = IA + (OA / n) = 3 + (2/4) = 3.5
PAC(B) = IB + (OB / n) = 2 + (3/4) = 2.75
PAC(C) = IC + (OC / n) = 2 + (3/4) = 2.75
PAC(D) = ID + (OD / n) = 2 + (1/4) = 2.25
The above calculations show that
PAC(A)>PAC(B)=PAC(C)>PAC(D)
With the PAC (Page Access Coefficient) calculation, the values of pages B and C coincide
after a single pass, as they do in the converged PageRank values of Table 1.
B. Algorithm for calculating PAC
Input: Let G=(V,E) be a social network, where V denotes the set of pages and E denotes the
links between pages. IV denotes the number of incoming links of page V. OV denotes the number of
outgoing links of page V.
Output: The list of Page Access Coefficients, stored in PAC.
Algorithm 1
Parameters: a social network G=(V,E),
an evaluating page A ∈ V;
PAC(A): Page Access Coefficient of page A
n: size of V (total number of pages)
for i=1 to n do (A is the i-th page of V)
IA ← number of incoming links of page A
OA ← number of outgoing links of page A
End for
for all A ∈ V do
PAC(A) = IA + (OA / n)
End for
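Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name and the edge-list representation of G=(V,E) are illustrative, and the links used for the four-page example are inferred from the worked PageRank equations for Fig.1.

```python
def page_access_coefficients(pages, links):
    """Compute PAC(A) = I_A + (O_A / n) for every page A, as in formula (6).

    pages: list of page names (the set V)
    links: list of (source, target) pairs (the set E)
    """
    n = len(pages)
    incoming = {page: 0 for page in pages}
    outgoing = {page: 0 for page in pages}

    # First loop of Algorithm 1: count incoming and outgoing links.
    for source, target in links:
        outgoing[source] += 1
        incoming[target] += 1

    # Second loop of Algorithm 1: one PAC value per page, no iteration.
    return {page: incoming[page] + outgoing[page] / n for page in pages}


# ASSUMPTION: edge list inferred from the worked equations for Fig.1.
PAGES = ["A", "B", "C", "D"]
LINKS = [
    ("A", "B"), ("A", "C"),
    ("B", "A"), ("B", "C"), ("B", "D"),
    ("C", "A"), ("C", "B"), ("C", "D"),
    ("D", "A"),
]

pac = page_access_coefficients(PAGES, LINKS)
print(pac)
```

In this sketch the link counts are gathered in one pass over the edge list, after which each PAC value is a single arithmetic expression.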
IV RESULTS AND DISCUSSIONS
Let us compute the rank of the Web pages shown in Fig.2 below using PageRank,
WPR and the proposed PAC algorithm.
Fig.2 Hyperlink Structure for 5 pages (nodes: A, B, C, D and E)
PageRank Output
PR(A) = (1-d)+d(PR(B)/2+PR(E)/2)
=(1-0.85)+0.85*(1/2+1/2) = 1
PR(B) = 0.15 +0.85*(1/2+1/2+1/2) = 1.425
PR(C) = 0.15+0.85*(1/2) = 0.575
PR(D) = 0.15 +0.85 *(1.425/2+0.575/2) = 1.0
PR(E) = 0.15 + 0.85*(0.575/2 + 1) = 0.8194
The second iteration is obtained below by taking the above PageRank values:
PR(A) = 1.1039
PR(B) = 1.2118
PR(C) = 0.5750
PR(D) = 0.9094
PR(E) = 1.1673
At the 44th iteration, the PageRank values converge, as shown in Table 2 below:
Table 2
Iteration A B C D E
1 1 1 1 1 1
2 1.0000 1.4250 0.5750 1.0000 0.8194
3 1.1039 1.2118 0.5750 0.9094 1.1673
4 1.1611 1.3840 0.6191 1.0013 1.2643
--- --- ---- --- --- ---
--- --- ---- --- --- ---
41 1.6683 1.9246 0.8590 1.3330 1.6482
42 1.6684 1.9246 0.8590 1.3331 1.6482
43 1.6684 1.9247 0.8591 1.3331 1.6482
44 1.6685 1.9247 0.8591 1.3331 1.6483
45 1.6685 1.9247 0.8591 1.3331 1.6483
From above, PR(B)>PR(A)>PR(E)>PR(D)>PR(C).
WPR output:
WPR(A) = 0.66
WPR(B) = 1.5933
WPR(C) = 0.2161
WPR(D) = 0.3152
WPR(E) = 0.40755
From above, WPR(B)>WPR(A)>WPR(E)>WPR(D)>WPR(C).
Using the hyperlink structure shown in Fig.2, the PAC calculation is carried out as follows.
The PAC values for pages A, B, C, D and E are calculated below.
G = (V,E)
V = {A, B, C, D, E}
E = {(A,B),(A,C),(B,A),(B,D),(C,B),(C,D),(C,E),(D,E),(E,A),(E,B)}
n = 5
Pages In Out
A 2 2
B 3 2
C 1 2
D 2 1
E 2 2
PAC output:
PAC(A) = IA + (OA / n) = 2 + (2/5) = 2.4
PAC(B) = IB + (OB / n) = 3 + (2/5) = 3.4
PAC(C) = IC + (OC / n) = 1 + (2/5) = 1.4
PAC(D) = ID + (OD / n) = 2 + (1/5) = 2.2
PAC(E) = IE + (OE / n) = 2 + (2/5) = 2.4
Here, PAC(B)>PAC(A)=PAC(E)>PAC(D)>PAC(C).
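The PAC output above, together with its rank order, can be reproduced with a short Python sketch. The link counts are copied from the In/Out table as printed; the names DEGREES, pac and rank_groups are illustrative assumptions. Exact fractions are used so that tied pages compare equal.

```python
from fractions import Fraction

# (incoming, outgoing) link counts, copied from the In/Out table as printed.
DEGREES = {"A": (2, 2), "B": (3, 2), "C": (1, 2), "D": (2, 1), "E": (2, 2)}


def pac(incoming, outgoing, n):
    """Formula (6): PAC = I + O / n, kept exact so that ties compare equal."""
    return incoming + Fraction(outgoing, n)


def rank_groups(scores):
    """Group pages with equal scores, highest score first."""
    groups = {}
    for page, score in scores.items():
        groups.setdefault(score, []).append(page)
    return [sorted(groups[score]) for score in sorted(groups, reverse=True)]


n = len(DEGREES)
scores = {page: pac(i, o, n) for page, (i, o) in DEGREES.items()}
ranking = rank_groups(scores)

print({page: float(score) for page, score in scores.items()})
print(ranking)
```

Pages that share a group, here A and E, are tied under PAC and would need a secondary criterion if a strict order were required.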
The PR, WPR and PAC values calculated for Fig. 2 are shown above.
From these, it is seen that the results for PAC are obtained in a single iteration.
The rank orders of pages A, B, C, D and E under PR, WPR and PAC are almost the same;
the only difference is that PAC gives pages A and E equal values.
In the PR calculation, many iterations have to be performed, which increases the calculation time.
In the WPR calculation, the weights for each vertex have to be calculated, which increases
the calculation complexity.
In the PAC calculation, the results are obtained in a single step, which reduces the
time and complexity of the calculation.
The time complexity of the proposed algorithm is O(n), since for n vertices only n
iterations are needed.
CONCLUSION
With the rapid growth of the WWW and users’ demand for knowledge, it is becoming
more difficult to manage the information on the WWW and satisfy user needs. Therefore,
users are looking for better information filtering techniques to locate, extract, filter and find
the necessary information. This paper also studies the performance of the PageRank and WPR
algorithms. It is found that the computational complexity is high in the existing algorithms.
So, a new algorithm is proposed to calculate the PAC for social networks. The proposed
algorithm is found to be better than the existing algorithms, since it
reduces the calculations and the time complexity.
REFERENCES
[1] Pooja Sharma et al. “Weighted Page Rank for Ordering Web Search Result”, IJEST ,
Vol.2(12),2010,7301-7310.
[2] Kristina Lerman, “Social Browsing & Information Filtering in Social Media”,
arXiv:0710.5697v1 [cs.CY] 30 Oct 2007.
[3] Alan E.Mislove, “Online Social Networks: Measurement, Analysis and Applications to
Distributed Information Systems”, in Ph.D. thesis, April 2009, Rice University.
[4] Matus Medo, “Network-based information filtering algorithms: ranking and
recommendation”, arXiv:1208.4552v1 [cs.SI] 22 Aug 2012.
[5] DELGADO et al. “Content-based Collaborative Information Filtering”, Nagoya 466
Japan, {jdelgado,ishii,tomkey}@ics.nitech.ac.jp.
[6] Wenpu Xing and Ali Ghorbani, “Weighted PageRank Algorithm”, Proc. of the Second
Annual Conference on Communication Networks and Services Research
(CNSR’04), IEEE, 2004.
[7] Ashutosh Kumar Singh, Ravi Kumar P, “A comparative Study of Page Ranking
Algorithms for Information Retrieval”, IJECE, 4.7.2009.
[8] Dilip Kumar Sharma and A.K.Sharma, “ A Comparative Analysis of Web Page
Ranking Algorithms”, IJCSE, Vol.02,No.08,2010,2670-2676.
[9] Charu C. Aggarwal, “An Introduction to Social Network Data Analytics”, IBM T.J.
Watson Research Center.
[10] Muhanad A. Al-Khalisy and Dr.Haider K. Hoomod, “POSN: Private Information
Protection in Online Social Networks”, International Journal of Computer Engineering &
Technology (IJCET), Volume 4, Issue 2, 2013, pp. 340 - 355, ISSN Print: 0976 – 6367, ISSN
Online: 0976 – 6375.
[11] Alamelu Mangai J, Santhosh Kumar V and Sugumaran V, “Recent Research in Web
Page Classification – A Review”, International Journal of Computer Engineering &
Technology (IJCET), Volume 1, Issue 1, 2010, pp. 112 - 122, ISSN Print: 0976 – 6367, ISSN
Online: 0976 – 6375.

Page access coefficient algorithm for information filtering

  • 1.
    International Journal ofComputer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 3, May – June (2013), © IAEME 60 PAGE ACCESS COEFFICIENT ALGORITHM FOR INFORMATION FILTERING IN SOCIAL NETWORK Mrs. L.Rajeswari * Dr.S.S.Dhenakaran Computer Centre Department of Comp.Sci. & Engg. Alagappa University Alagappa University Karaikudi. India. Karaikudi. India. ABSTRACT Social Network [9] is defined as the network of interactions or relationships, where the nodes consist of actors, and the edges consist of the relationship or interaction between these actors. Information retrieval is used to get the relevant information, viewing needed and ignoring irrelevant data. For information filtering, there exist two special algorithms called PageRank and Weighted PageRank. But in both the algorithms, the computational calculations are high and involve many more iterations. In this paper, a new algorithm called PAC (Page Access Coefficient) is proposed to calculate the Rank of Web Pages for Social Network in order to reduce the calculations and the time complexity. Keyword: Social Network, Web Pages, Information Filtering, PageRank, Weighted PageRank, PAC. I INTRODUCTION The world wide web[1] is a collection of information resources on the Internet that are using the Hypertext Transfer protocol. It is repository of many interlinked hypertext documents, accessed via the Internet. Web may contain text, images, videos and other multimedia data. Social networks [9] have become vary popular in recent years because of the increasing proliferation and affordability of internet enabled devices such as personal computers, mobile devices and other more recent hardware innovations such as internet tablets. This is evidenced by the burgeoning popularity of many online social networks such as Twitter, Facebook and LinkedIn. 
Social networks can be defined either in the context of systems such as Facebook which are explicitly designed for social interactions, or in terms of INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) ISSN 0976 – 6367(Print) ISSN 0976 – 6375(Online) Volume 4, Issue 3, May-June (2013), pp. 60-69 © IAEME: www.iaeme.com/ijcet.asp Journal Impact Factor (2013): 6.1302 (Calculated by GISI) www.jifactor.com IJCET © I A E M E
  • 2.
    International Journal ofComputer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 3, May – June (2013), © IAEME 61 other sites such as Flickr which are designed for a different service such as content sharing, but also allow an extensive level of social interaction. A social network is a social structure made up of individuals (or organizations) called “nodes”, which are tied (connected) by one or more specific types of interdependency, such as friendship, kinship, common interest, and financial exchange, relationships of belief, knowledge or prestige. Social networking sites [3] are the portals of entry to the Internet for many millions of users, and they are being used both for advertisement as well as for ensuing commerce. Social networks can be used to mitigate the privacy and access challenge that arise when the amount of shared content is growing at an exponential rate. A group of individuals with connections to other social world is likely to have access to a wider range of information. It is better for individual success to have connections to a variety of networks rather than many connections within a single network. Other Social networks such as YouTube and Google Video are used to share multimedia content, and others such as LiveJournal and BlogSpot are used to share blogs. Full Participation in online social network [3] requires users to register a (pseudo) identity with the network, though some sites do allow browsing public data without explicit sign-on. Users may share volunteer information about themselves (e.g. their birthday, place of residence, interest, etc.) all of which constitute the user’s profile. The Social network itself [3] is composed of links between users. Some sites allow users to link to any other user (without consent from the link recipient), while other sites follow a two-phase procedure that only allows a link to be established when both parties agree. 
Certain sites, such as Flickr, have social networks with directed links (meaning a link from A to B does not imply the presence of a reverse links), whereas others, such as Orkut, have social networks with undirected links. Information Filtering Information filtering is a name used to describe a variety of process involving in the delivery of information to people who need it. Information retrieval has been characterized in a variety of ways, ranging from a description of its goal, to relatively abstract models of its components and process. Information retrieval is used to get the relevant information, viewing needed data and ignoring irrelevant data. For this, information filtering is necessary. The goal of information filtering [4] is to eliminate the redundant or unsuitable information and thus overcome the information overload. Information filtering helps users to choose from an abundant number of possibilities (available products, potential friends, etc.) those that are most likely to be of interest or use for them. Relevant information [5] can be defined solely for a specific user and under the context of a particular domain or topic. The shared “social” information can be used to improve the task of retrieving relevant information, and for refining each agent’s particular knowledge. The information filtering techniques are used in different applications, not only in the web context, but in thematic issues as varied as voice recognition, classification of telescopic astronomy or evaluation of financial risk. The information filtering is used to select the particular product in online product sales. II BACKGROUND A. Ranking web pages With the rapid growth [7] of WWW and the user’s demand on knowledge, it is becoming more difficult to manage the information on WWW and satisfy the user needs.
  • 3.
    International Journal ofComputer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 3, May – June (2013), © IAEME 62 Therefore, the users are looking for better information retrieval techniques and tools to locate, extract, filter and find the necessary information. Most of the users use information retrieval tools like search engines to find the information from the WWW. So Web mining and ranking mechanism becomes very important for effective information retrieval. With the rapid growth [6] of the Web, providing relevant pages of the highest quality to the users based on their queries becomes increasingly difficult. The reasons are that, some web pages are not self-descriptive and some links exist purely for navigational purposes. Therefore, finding appropriate pages through a search engine that relies on web contents or makes use of hyperlink information is very difficult. To overcome the above mentioned problems, several algorithms have been developed. Among them PageRank in [7] and Weighted PageRank in [7] are commonly used algorithms in Web Structure Mining. B. PageRank PageRank [8] is the most commonly used algorithms for ranking the various pages. Working of PageRank algorithm depends upon link structure of the web pages. The PageRank algorithm is based on the concepts that if a page contains an important links towards it, then the links of this page towards the other page are also to be considered as an important page. The PageRank [7] provides a more advanced way to compute the importance or relevance of a Web page than simply counting the number of pages that are linking to it (called a “backlinks”). The PageRank considers the back link in deciding the rank score. Assume an arbitrary page A has pages T1 to Tn pointing to it (incoming link). PageRank can be calculated by the following equation. 
PR(A) = (1-d)+d(PR(T1) / O(T1)+…………+PR(Tn )/O(Tn ) (1) The parameter d is a damping factor, usually set to 0.85. O(A) is defined as the number of links going out of page A. Let us take an example of hyperlink structure of four pages A, B, C and D as shown in Fig.1 below. The PageRank for Pages A, B, C and D can be calculated using (1). Fig.1 Hyperlink Structure for 4 pages Page A Page B Page D Page C
  • 4.
    International Journal ofComputer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 3, May – June (2013), © IAEME 63 PR(A)=(1-d)+d(PR(B)/O(B)+PR(C)/O(C)+PR(D)/O(D)) = (1-0.85)+0.85(1/3+1/3+1/1) =1.566667 PR(B)=(1-d)+d(PR(A)/O(A)+PR(C)/O(C)) = 1.099167 PR(C)=(1-d)+d(PR(A)/O(A)+PR(B)/O(B)) = 1.127264 PR(D)=(1-d)+d(PR(B)/O(B)+PR(C)/O(C)) = 0.780822 The second iteration by taking the above PageRank values. : PR(A ) = (1-0.85)+0.85(1.099167/3 + 1.127264/3 + 0.780822/1) = 1.444521 PR(B) = 0.15 + 0.85(1.444521/2 + 1.127643/3) = 1.083313 PR(C) = 0.15 +0.85(1.444521/2 + 1.083313/3 ) = 1.07086 PR(D) = 0.15 +0.85(1.083313/3 + 1.070863/3) = 0.760349 During the 34th iteration, the PageRank get converged as shown in Table 1 below:. Table 1 Iteration A B C D 1 2 3 4 -- -- 33 34 35 1 1.566667 1.444521 1.406645 --- --- 1.31351 1.313509 1.313509 1 1.099167 1.083313 1.051235 --- --- 0.988244 0.988244 0.988244 1 1.127264 1.07086 1.045674 --- --- 0.988244 0.988244 0.988244 1 0.780822 0.760349 0.744124 --- --- 0.710005 0.710005 0.710005 For smaller set of pages, it is easy to calculate and find out the PageRank values. But for a Web having billions of pages, it is not easy to do the calculation like above. C. Weighted PageRank Algorithm: Wenpu Xing and Ali Ghorbani [6] proposed a Weighted PageRank algorithm which is an extension of the PageRank algorithm. This algorithm assigns a larger rank values to the more important pages rather than dividing the rank value of a page evenly among its outgoing linked pages. Each outgoing link gets a value proportional to its importance. The importance is assigned in terms of weight values to the incoming and outgoing links and are denoted as Win (m,n) and Wout (m,n) respectively. Win (m,n) as shown in equation (3) is the weight of link(m,n) calculated based on the number of incoming links of page n and the number of incoming links of all reference pages of page m. )3( )( ),( ∑∈ = mRp p nin nm I I W
  • 5.
    International Journal ofComputer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 3, May – June (2013), © IAEME 64 )4( )( ),( ∑∈ = mRp p nout nm O O W Where In and IP are the number of incoming links of page n and page p respectively. R(m) denote the reference page list oa page m. Wout (m,n) is as shown in equation (4) is the weight of link(m,n) calculated based on the number of outgoing links of page n and the number of outgoing links of all reference pages of m. Where On and Op are the number of outgoing links of page n and p respectively. The formula proposed in [6] for the WPR is shown in equation (5), which is the modification of the PageRank formula in [7] )5()()1()( ),(),( )( out nm in nm nBm WWmWPRddnWPR ∑∈ +−= By using the same hyperlink structure as shown in Fig.1, the WPR equation for Page A, B, C and D can be calculated using (5) as follows. WPR(A) = (1-0.85)+0.85(1*3/5*1/3 + 1*3/5*1/3+1*3/4*1) = 1.127 WPR(B) = (1-0.85)+0.85(1.127*1/3*1/2 + 1*2/5*1/2) = 0.499 WPR(C) = (1-0.85)+0.85(1.127*1/3*1/2 + 0.499*2/5*1/2) = 0.392 WPR(D) = (1-0.85)+0.85(0.499*1/2*1 + 0.392*2/5*1/3) = 0.406 In the above calculations, it is defined as WPR(A)>WPR(B)>WPR(D)>WPR(C). This results shows that the page rank order is different from PageRank. To differentiate the WPR from the PageRank, Wenpu et al, categorized the resultant pages of a query into four categories based on their relevancy to the given query. They are 1. Very Relevant Pages (VR ): The pages containing very important information related to a given query. 2. Relavant Pages (R ): Pages are relevant but not having important information about a given query. 3. Weak Relevant Pages (WR ): Pages may have the query keywords but they do not have the relevant information. 4. Irrelevant pages (IR): Pages not having any relevant information and query keywords. 
The PageRank and WPR algorithms both provide pages to the users in ranked order based on the given query. So, in the resultant list, the number of relevant pages and their order are very important for users. Wenpu et al. proposed a Relevance Rule to calculate
the relevancy value of each page in the list of pages. That makes WPR different from PageRank. Experimental studies by Wenpu et al. showed that WPR produces larger relevancy values than PageRank.

III PROPOSED WORK

When filtering information, the query results (the information relevant to users) must be ranked, because most users rarely go beyond the first few query results. The concept of ranking is most commonly used in search engines. This paper proposes a new ranking algorithm for information filtering in social networks. To rank pages, the existing PageRank uses iterated calculations and WPR uses a weight for each and every page. In the proposed work, a Page Access Coefficient (PAC) is calculated for each page, and information retrieval is done based on the PAC values of the Web pages.

A. Calculation of Page Access Coefficient (PAC)

In the proposed work, the rank of a Web page is calculated from the number of incoming links of the page, the number of outgoing links of the page and the total number of pages. The proposed formula for calculating PAC is shown below:

$$PAC(A) = I_A + \frac{O_A}{n} \tag{6}$$

where
PAC     Page Access Coefficient
A       Web page
$I_A$   number of pages referring to page A (incoming links of page A)
$O_A$   number of pages referred to by page A (outgoing links of page A)
n       total number of pages

Let us take as an example the hyperlink structure of four pages A, B, C and D shown in Fig. 1.
The PAC values for pages A, B, C and D can be calculated using (6). Here n = 4, the total number of pages.

PAC(A) = I_A + (O_A / n) = 3 + (2/4) = 3.5
PAC(B) = I_B + (O_B / n) = 2 + (3/4) = 2.75
PAC(C) = I_C + (O_C / n) = 2 + (3/4) = 2.75
PAC(D) = I_D + (O_D / n) = 2 + (1/4) = 2.25

From the above calculations, PAC(A) > PAC(B) = PAC(C) > PAC(D). The PAC values are obtained in a single pass, and pages B and C receive the same value.
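The calculation in (6) can be sketched in Python as follows. The edge list is an assumption reconstructed from the incoming and outgoing link counts used above (Fig. 1 itself is not reproduced here), and the function name `page_access_coefficient` is illustrative rather than part of the proposed algorithm listing.

```python
from collections import Counter

# Assumed link structure of Fig. 1 as (source, target) pairs.
FIG1_EDGES = [("A", "B"), ("A", "C"),
              ("B", "A"), ("B", "C"), ("B", "D"),
              ("C", "A"), ("C", "B"), ("C", "D"),
              ("D", "A")]


def page_access_coefficient(pages, edges):
    """Return {page: I + O / n} following equation (6)."""
    n = len(pages)
    incoming = Counter(dst for _, dst in edges)  # I: pages referring to a page
    outgoing = Counter(src for src, _ in edges)  # O: pages referred to by a page
    return {p: incoming[p] + outgoing[p] / n for p in pages}


pac = page_access_coefficient(["A", "B", "C", "D"], FIG1_EDGES)
# Print the pages from highest to lowest PAC value.
for page, value in sorted(pac.items(), key=lambda kv: -kv[1]):
    print(page, value)
```

Counting the links takes one pass over the edge list, after which each page needs a single arithmetic step; no repeated sweeps over the graph are required.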
B. Algorithm for calculating PAC

Input: Let G = (V, E) be a social network, where V denotes the set of pages and E denotes the links between pages. $I_V$ denotes the number of incoming links of page V and $O_V$ denotes the number of outgoing links of page V.

Output: The list of Page Access Coefficients, stored in PAC.

Algorithm 1
Parameters: a social network G = (V, E); an evaluating page A ∈ V;
            PAC(A), the Page Access Coefficient of page A
    n ← size of V (total number of pages)
    for all A ∈ V do
        I_A ← number of incoming links of page A
        O_A ← number of outgoing links of page A
    end for
    for all A ∈ V do
        PAC(A) = I_A + (O_A / n)
    end for

IV RESULTS AND DISCUSSIONS

Let us compute the rank of the Web pages shown in Fig. 2 below using PageRank, WPR and the proposed PAC algorithm.

Fig. 2 Hyperlink structure for 5 pages (A, B, C, D, E)
PageRank Output

PR(A) = (1-d) + d(PR(B)/2 + PR(E)/2) = (1-0.85) + 0.85*(1/2 + 1/2) = 1
PR(B) = 0.15 + 0.85*(1/2 + 1/2 + 1/2) = 1.425
PR(C) = 0.15 + 0.85*(1/2) = 0.575
PR(D) = 0.15 + 0.85*(1.425/2 + 0.575/2) = 1.0
PR(E) = 0.15 + 0.85*(0.575/2 + 1) = 0.8194

The second iteration is obtained by taking the above PageRank values:

PR(A) = 1.1039
PR(B) = 1.2118
PR(C) = 0.5750
PR(D) = 0.9094
PR(E) = 1.1673

The PageRank values converge at the 44th iteration, as shown in Table 2 below.

Table 2

Iteration   A        B        C        D        E
1           1        1        1        1        1
2           1.0000   1.4250   0.5750   1.0000   0.8194
3           1.1039   1.2118   0.5750   0.9094   1.1673
4           1.1611   1.3840   0.6191   1.0013   1.2643
---         ---      ---      ---      ---      ---
---         ---      ---      ---      ---      ---
41          1.6683   1.9246   0.8590   1.3330   1.6482
42          1.6684   1.9246   0.8590   1.3331   1.6482
43          1.6684   1.9247   0.8591   1.3331   1.6482
44          1.6685   1.9247   0.8591   1.3331   1.6483
45          1.6685   1.9247   0.8591   1.3331   1.6483

From the above, PR(B) > PR(A) > PR(E) > PR(D) > PR(C).

WPR output:

WPR(A) = 0.66
WPR(B) = 1.5933
WPR(C) = 0.2161
WPR(D) = 0.3152
WPR(E) = 0.40755

From the above, WPR(B) > WPR(A) > WPR(E) > WPR(D) > WPR(C).
Using the hyperlink structure shown in Fig. 2, the PAC calculation for pages A, B, C, D and E is as follows.

G = (V, E)
V = {A, B, C, D, E}
E = {(A,B), (A,C), (B,A), (B,D), (C,B), (C,D), (C,E), (D,E), (E,A), (E,B)}
n = 5

Pages   In   Out
A       2    2
B       3    2
C       1    2
D       2    1
E       2    2

PAC output:

PAC(A) = I_A + (O_A / n) = 2 + (2/5) = 2.4
PAC(B) = I_B + (O_B / n) = 3 + (2/5) = 3.4
PAC(C) = I_C + (O_C / n) = 1 + (2/5) = 1.4
PAC(D) = I_D + (O_D / n) = 2 + (1/5) = 2.2
PAC(E) = I_E + (O_E / n) = 2 + (2/5) = 2.4

Here, PAC(B) > PAC(A) = PAC(E) > PAC(D) > PAC(C).

The PR, WPR and PAC values calculated for Fig. 2 are shown above. The PAC results are obtained in a single iteration, and the orderings of pages A, B, C, D and E under PR, WPR and PAC are almost the same; the only difference is that PAC assigns pages A and E equal values. The PR calculation needs many iterations, which increases the calculation time. The WPR calculation needs a weight for each vertex, which increases the calculation complexity. The PAC calculation obtains its results in a single step, and therefore reduces the time and complexity of the calculation. The time complexity of the proposed algorithm is O(n) once the link counts are available, since each of the n vertices is visited only once.
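The following is a small sketch of how the PAC values for Fig. 2 translate into a ranked list, including the tie between pages A and E. The In and Out counts are taken from the table above as given. The helper name `rank_with_ties` is hypothetical, and it uses standard competition ranking, which is one possible tie convention and not something this paper specifies.

```python
# In/Out link counts for Fig. 2, copied from the table in the text.
DEGREES = {"A": (2, 2), "B": (3, 2), "C": (1, 2), "D": (2, 1), "E": (2, 2)}


def pac_from_degrees(degrees):
    """Apply PAC = I + O / n to a table of (in, out) counts."""
    n = len(degrees)
    return {page: i + o / n for page, (i, o) in degrees.items()}


def rank_with_ties(scores):
    """Competition ranking: equal scores share a rank (1, 2, 2, 4, ...)."""
    # Sort by descending score, breaking ties by page name for stable output.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    ranks = {}
    for position, (page, score) in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and score == previous[1]:
            ranks[page] = ranks[previous[0]]  # share the rank of the tied page
        else:
            ranks[page] = position
    return ranks


pac = pac_from_degrees(DEGREES)
print(rank_with_ties(pac))
```

Under this convention page B is ranked first, A and E share second place, and D and C follow in fourth and fifth place, which mirrors the ordering PAC(B) > PAC(A) = PAC(E) > PAC(D) > PAC(C).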
CONCLUSION

With the rapid growth of the WWW and users' demand for knowledge, it is becoming more difficult to manage the information on the WWW and satisfy user needs. Therefore, users are looking for better information filtering techniques to locate, extract, filter and find the necessary information. This paper also studies the performance of the PageRank and WPR algorithms. It is found that the computational complexity is high in the existing algorithms. So, a new algorithm is proposed to calculate the PAC for social networks. The proposed algorithm is found to be better than the existing algorithms, since it reduces the calculations and the time complexity.

REFERENCES

[1] Pooja Sharma et al., “Weighted Page Rank for Ordering Web Search Result”, IJEST, Vol. 2(12), 2010, 7301-7310.
[2] Kristina Lerman, “Social Browsing & Information Filtering in Social Media”, arXiv:0710.5697v1 [cs.CY], 30 Oct 2007.
[3] Alan E. Mislove, “Online Social Networks: Measurement, Analysis and Applications to Distributed Information Systems”, Ph.D. thesis, Rice University, April 2009.
[4] Matus Medo, “Network-based information filtering algorithms: ranking and recommendation”, arXiv:1208.4552v1 [cs.SI], 22 Aug 2012.
[5] Delgado et al., “Content-based Collaborative Information Filtering”, Nagoya 466 Japan, {jdelgado,ishii,tomkey}@ics.nitech.ac.jp.
[6] Wenpu Xing and Ali Ghorbani, “Weighted PageRank Algorithm”, Proc. of the Second Annual Conference on Communication Networks and Services Research (CNSR’04), IEEE, 2004.
[7] Ashutosh Kumar Singh and Ravi Kumar P, “A Comparative Study of Page Ranking Algorithms for Information Retrieval”, IJECE, 4.7.2009.
[8] Dilip Kumar Sharma and A. K. Sharma, “A Comparative Analysis of Web Page Ranking Algorithms”, IJCSE, Vol. 02, No. 08, 2010, 2670-2676.
[9] Charu C. Aggarwal, “An Introduction to Social Network Data Analytics”, IBM T. J. Watson Research Center.
[10] Muhanad A. Al-Khalisy and Dr. Haider K. Hoomod, “POSN: Private Information Protection in Online Social Networks”, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 340 - 355, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[11] Alamelu Mangai J, Santhosh Kumar V and Sugumaran V, “Recent Research in Web Page Classification – A Review”, International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 1, 2010, pp. 112 - 122, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.