Page access coefficient algorithm for information filtering
Published in: Technology
Transcript

  • 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976 – 6375 (Online), Volume 4, Issue 3, May – June (2013), © IAEME

PAGE ACCESS COEFFICIENT ALGORITHM FOR INFORMATION FILTERING IN SOCIAL NETWORK

Mrs. L. Rajeswari, Computer Centre, Alagappa University, Karaikudi, India.
Dr. S. S. Dhenakaran, Department of Comp. Sci. & Engg., Alagappa University, Karaikudi, India.

ABSTRACT

A social network [9] is defined as a network of interactions or relationships, where the nodes consist of actors and the edges consist of the relationships or interactions between these actors. Information retrieval is used to get relevant information, keeping the data a user needs and ignoring irrelevant data. For information filtering, there exist two well-known algorithms, PageRank and Weighted PageRank. In both algorithms, however, the computational cost is high and many iterations are required. In this paper, a new algorithm called PAC (Page Access Coefficient) is proposed to calculate the rank of Web pages for social networks in order to reduce the calculations and the time complexity.

Keywords: Social Network, Web Pages, Information Filtering, PageRank, Weighted PageRank, PAC.

I INTRODUCTION

The World Wide Web [1] is a collection of information resources on the Internet that use the Hypertext Transfer Protocol. It is a repository of many interlinked hypertext documents, accessed via the Internet. The Web may contain text, images, videos and other multimedia data.

Social networks [9] have become very popular in recent years because of the increasing proliferation and affordability of internet-enabled devices such as personal computers, mobile devices and other more recent hardware innovations such as internet tablets. This is evidenced by the burgeoning popularity of many online social networks such as Twitter, Facebook and LinkedIn.
Social networks can be defined either in the context of systems such as Facebook, which are explicitly designed for social interactions, or in terms of other sites such as Flickr, which are designed for a different service such as content sharing but also allow an extensive level of social interaction.

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET), ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 60-69, © IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

  • 2. A social network is a social structure made up of individuals (or organizations) called “nodes”, which are tied (connected) by one or more specific types of interdependency, such as friendship, kinship, common interest, financial exchange, or relationships of belief, knowledge or prestige. Social networking sites [3] are the portals of entry to the Internet for many millions of users, and they are being used both for advertisement and for the ensuing commerce. Social networks can be used to mitigate the privacy and access challenges that arise when the amount of shared content is growing at an exponential rate.

A group of individuals with connections to other social worlds is likely to have access to a wider range of information. It is better for individual success to have connections to a variety of networks rather than many connections within a single network. Other social networks such as YouTube and Google Video are used to share multimedia content, and others such as LiveJournal and BlogSpot are used to share blogs. Full participation in an online social network [3] requires users to register a (pseudo) identity with the network, though some sites do allow browsing public data without explicit sign-on. Users may volunteer information about themselves (e.g. their birthday, place of residence, interests, etc.), all of which constitutes the user’s profile. The social network itself [3] is composed of links between users. Some sites allow users to link to any other user (without consent from the link recipient), while other sites follow a two-phase procedure that only allows a link to be established when both parties agree.
Certain sites, such as Flickr, have social networks with directed links (meaning a link from A to B does not imply the presence of a reverse link), whereas others, such as Orkut, have social networks with undirected links.

Information Filtering

Information filtering is a name used to describe a variety of processes involved in the delivery of information to people who need it. Information retrieval has been characterized in a variety of ways, ranging from a description of its goal to relatively abstract models of its components and processes. Information retrieval is used to get relevant information, keeping the needed data and ignoring irrelevant data. For this, information filtering is necessary.

The goal of information filtering [4] is to eliminate redundant or unsuitable information and thus overcome information overload. Information filtering helps users to choose, from an abundant number of possibilities (available products, potential friends, etc.), those that are most likely to be of interest or use to them.

Relevant information [5] can be defined solely for a specific user and in the context of a particular domain or topic. The shared “social” information can be used to improve the task of retrieving relevant information and to refine each agent’s particular knowledge. Information filtering techniques are used in different applications, not only in the web context but in areas as varied as voice recognition, classification in telescopic astronomy or evaluation of financial risk. Information filtering is also used to select particular products in online product sales.

II BACKGROUND

A. Ranking web pages

With the rapid growth [7] of the WWW and the users’ demand for knowledge, it is becoming more difficult to manage the information on the WWW and satisfy user needs.
  • 3. Therefore, users are looking for better information retrieval techniques and tools to locate, extract, filter and find the necessary information. Most users rely on information retrieval tools such as search engines to find information on the WWW, so Web mining and ranking mechanisms become very important for effective information retrieval.

With the rapid growth [6] of the Web, providing relevant pages of the highest quality to users based on their queries becomes increasingly difficult. The reasons are that some web pages are not self-descriptive and some links exist purely for navigational purposes. Therefore, finding appropriate pages through a search engine that relies on web contents or makes use of hyperlink information is very difficult. To overcome these problems, several algorithms have been developed. Among them, PageRank [7] and Weighted PageRank [7] are commonly used algorithms in Web Structure Mining.

B. PageRank

PageRank [8] is one of the most commonly used algorithms for ranking pages. The PageRank algorithm depends on the link structure of the web pages. It is based on the idea that if a page has important links pointing to it, then its links to other pages are also considered important. PageRank [7] provides a more advanced way to compute the importance or relevance of a Web page than simply counting the number of pages that link to it (called “backlinks”). PageRank considers the backlinks in deciding the rank score.

Assume an arbitrary page A has pages T1 to Tn pointing to it (incoming links). PageRank can be calculated by the following equation:

\[ PR(A) = (1-d) + d\left(\frac{PR(T_1)}{O(T_1)} + \cdots + \frac{PR(T_n)}{O(T_n)}\right) \tag{1} \]

The parameter d is a damping factor, usually set to 0.85.
O(A) is defined as the number of links going out of page A.

Let us take an example of a hyperlink structure of four pages A, B, C and D as shown in Fig.1 below. The PageRank for Pages A, B, C and D can be calculated using (1).

Fig.1 Hyperlink Structure for 4 pages (Page A, Page B, Page C, Page D)
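The update in equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The edge list `FIG1_EDGES` is an assumption: the figure itself is not reproduced in the transcript, so the links are inferred from the in-links and out-degrees that appear in the worked calculation for Fig.1. The function names, the tolerance `tol` and the sweep limit are hypothetical. Each pass updates the pages in place in the order A, B, C, D, so a page sees the values already updated earlier in the same pass, which is how the worked calculation proceeds.

```python
# Assumed edge list for Fig.1 (source page, target page), inferred from the
# terms of the worked calculation; the original figure is not available here.
FIG1_EDGES = [
    ("A", "B"), ("A", "C"),
    ("B", "A"), ("B", "C"), ("B", "D"),
    ("C", "A"), ("C", "B"), ("C", "D"),
    ("D", "A"),
]


def pagerank_sweep(pr, edges, d=0.85):
    """Apply equation (1) once to every page, updating `pr` in place."""
    out_deg = {}
    in_links = {}
    for src, dst in edges:
        out_deg[src] = out_deg.get(src, 0) + 1
        in_links.setdefault(dst, []).append(src)
    for page in sorted(pr):
        incoming = in_links.get(page, [])
        pr[page] = (1 - d) + d * sum(pr[t] / out_deg[t] for t in incoming)
    return pr


def pagerank(edges, d=0.85, tol=1e-7, max_sweeps=1000):
    """Repeat sweeps until no value changes by more than `tol`."""
    pages = sorted({p for edge in edges for p in edge})
    pr = {p: 1.0 for p in pages}  # every page starts at 1, as in Table 1
    for sweep in range(1, max_sweeps + 1):
        previous = dict(pr)
        pagerank_sweep(pr, edges, d)
        if max(abs(pr[p] - previous[p]) for p in pages) < tol:
            return pr, sweep
    return pr, max_sweeps


first_pass = pagerank_sweep({p: 1.0 for p in "ABCD"}, FIG1_EDGES)
converged, sweeps_used = pagerank(FIG1_EDGES)
```

Under these assumptions, `first_pass` holds the values after one pass and `converged` the values once the updates have settled; the number of passes needed depends on the chosen tolerance.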
  • 4. First iteration:

PR(A) = (1-d) + d(PR(B)/O(B) + PR(C)/O(C) + PR(D)/O(D)) = (1-0.85) + 0.85(1/3 + 1/3 + 1/1) = 1.566667
PR(B) = (1-d) + d(PR(A)/O(A) + PR(C)/O(C)) = 1.099167
PR(C) = (1-d) + d(PR(A)/O(A) + PR(B)/O(B)) = 1.127264
PR(D) = (1-d) + d(PR(B)/O(B) + PR(C)/O(C)) = 0.780822

The second iteration uses the above PageRank values:

PR(A) = (1-0.85) + 0.85(1.099167/3 + 1.127264/3 + 0.780822/1) = 1.444521
PR(B) = 0.15 + 0.85(1.444521/2 + 1.127264/3) = 1.083313
PR(C) = 0.15 + 0.85(1.444521/2 + 1.083313/3) = 1.07086
PR(D) = 0.15 + 0.85(1.083313/3 + 1.070863/3) = 0.760349

During the 34th iteration, the PageRank values converge, as shown in Table 1 below.

Table 1

Iteration   A          B          C          D
1           1          1          1          1
2           1.566667   1.099167   1.127264   0.780822
3           1.444521   1.083313   1.07086    0.760349
4           1.406645   1.051235   1.045674   0.744124
...         ...        ...        ...        ...
33          1.31351    0.988244   0.988244   0.710005
34          1.313509   0.988244   0.988244   0.710005
35          1.313509   0.988244   0.988244   0.710005

For a small set of pages, it is easy to calculate the PageRank values. But for a Web having billions of pages, it is not easy to do the calculation as above.

C. Weighted PageRank Algorithm

Wenpu Xing and Ali Ghorbani [6] proposed the Weighted PageRank (WPR) algorithm, which is an extension of the PageRank algorithm. This algorithm assigns larger rank values to the more important pages rather than dividing the rank value of a page evenly among its outgoing linked pages. Each outgoing link gets a value proportional to its importance. The importance is assigned in terms of weight values for the incoming and outgoing links, denoted Win(m,n) and Wout(m,n) respectively. Win(m,n), shown in equation (3), is the weight of link(m,n), calculated from the number of incoming links of page n and the number of incoming links of all reference pages of page m.

\[ W^{in}_{(m,n)} = \frac{I_n}{\sum_{p \in R(m)} I_p} \tag{3} \]
  • 5. where In and Ip are the number of incoming links of page n and page p respectively, and R(m) denotes the reference page list of page m. Wout(m,n), shown in equation (4), is the weight of link(m,n), calculated from the number of outgoing links of page n and the number of outgoing links of all reference pages of page m.

\[ W^{out}_{(m,n)} = \frac{O_n}{\sum_{p \in R(m)} O_p} \tag{4} \]

where On and Op are the number of outgoing links of page n and page p respectively. The formula proposed in [6] for the WPR is shown in equation (5), which is a modification of the PageRank formula in [7]:

\[ WPR(n) = (1-d) + d \sum_{m \in B(n)} WPR(m)\, W^{in}_{(m,n)}\, W^{out}_{(m,n)} \tag{5} \]

where B(n) is the set of pages that point to page n.

Using the same hyperlink structure as shown in Fig.1, the WPR for Pages A, B, C and D can be calculated using (5) as follows.

WPR(A) = (1-0.85) + 0.85(1*3/5*1/3 + 1*3/5*1/3 + 1*3/4*1) = 1.127
WPR(B) = (1-0.85) + 0.85(1.127*1/3*1/2 + 1*2/5*1/2) = 0.499
WPR(C) = (1-0.85) + 0.85(1.127*1/3*1/2 + 0.499*2/5*1/2) = 0.392
WPR(D) = (1-0.85) + 0.85(0.499*1/2*1 + 0.392*2/5*1/3) = 0.406

The above calculations give WPR(A) > WPR(B) > WPR(D) > WPR(C). This result shows that the page rank order is different from that of PageRank.

To differentiate the WPR from the PageRank, Wenpu et al. categorized the resultant pages of a query into four categories based on their relevancy to the given query. They are:

1. Very Relevant Pages (VR): Pages containing very important information related to a given query.
2. Relevant Pages (R): Pages that are relevant but do not have important information about a given query.
3. Weak Relevant Pages (WR): Pages that may have the query keywords but do not have the relevant information.
4. Irrelevant Pages (IR): Pages having neither relevant information nor the query keywords.

The PageRank and WPR algorithms both provide ranked pages in sorted order to the users based on the given query.
So, in the resultant list, the number of relevant pages and their order are very important for users. Wenpu et al. proposed a Relevance Rule to calculate the relevancy value of each page in the list of pages. This is what differentiates WPR from PageRank.

  • 6. Experimental studies by Wenpu et al. showed that the WPR produces larger relevancy values than the PageRank.

III PROPOSED WORK

While filtering the information, the query results (the information relevant to the user) must be ranked, because most users rarely go beyond the first few query results. The concept of ranking is most commonly used in search engines. This paper proposes a new ranking algorithm for information filtering in social networks.

For ranking the pages, the existing PageRank uses iterated calculations and the WPR uses a weight for each and every page. In the proposed work, a Page Access Coefficient (PAC) is calculated for each page, and information retrieval is done based on the PAC values of the web pages.

A. Calculation of Page Access Coefficient (PAC)

In the proposed work, the rank of a Web page is calculated based on the number of incoming links of the page, the number of outgoing links of the page and the total number of pages. The proposed formula for calculating PAC is shown below:

\[ PAC(A) = I_A + \frac{O_A}{n} \tag{6} \]

where
PAC   Page Access Coefficient
A     Web page
IA    number of pages referring to Page A (incoming links of Page A)
OA    number of pages referred to by Page A (outgoing links of Page A)
n     total number of pages

Let us take an example of a hyperlink structure of four pages A, B, C and D as shown in Fig.1.
The PAC for Pages A, B, C and D can be calculated using (6). Here, n = 4 is the total number of pages.

PAC(A) = IA + (OA / n) = 3 + (2/4) = 3.5
PAC(B) = IB + (OB / n) = 2 + (3/4) = 2.75
PAC(C) = IC + (OC / n) = 2 + (3/4) = 2.75
PAC(D) = ID + (OD / n) = 2 + (1/4) = 2.25

From the above calculations,

PAC(A) > PAC(B) = PAC(C) > PAC(D)

According to the PAC (Page Access Coefficient) calculation, the values are obtained in a single pass, and Pages B and C receive the same PAC value.
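The calculation above can be reproduced with a short Python sketch of formula (6). It is an illustration under stated assumptions rather than the authors' code: the edge list `FIG1_EDGES` is inferred from the in-link and out-link counts used in the worked example, the function name `page_access_coefficient` is hypothetical, and n is taken to be the number of distinct pages that appear in at least one link.

```python
from collections import Counter

# Assumed edge list for Fig.1 (source page, target page), chosen to match the
# in-link and out-link counts used in the worked example.
FIG1_EDGES = [
    ("A", "B"), ("A", "C"),
    ("B", "A"), ("B", "C"), ("B", "D"),
    ("C", "A"), ("C", "B"), ("C", "D"),
    ("D", "A"),
]


def page_access_coefficient(edges):
    """Return {page: I_page + O_page / n} following formula (6)."""
    pages = sorted({p for edge in edges for p in edge})
    n = len(pages)                              # total number of pages
    out_deg = Counter(src for src, _ in edges)  # O: outgoing links per page
    in_deg = Counter(dst for _, dst in edges)   # I: incoming links per page
    return {p: in_deg[p] + out_deg[p] / n for p in pages}


pac = page_access_coefficient(FIG1_EDGES)
# Sort by descending PAC; ties (here B and C) are broken by page name.
ranking = sorted(pac, key=lambda p: (-pac[p], p))
```

Because the incoming-link count enters as a whole number and the outgoing-link term is at most (n-1)/n, the incoming-link count always decides the order in this formula and the outgoing-link count only separates pages with equal incoming links.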
  • 7. B. Algorithm for calculating PAC

Input: Let G = (V, E) be a social network, where V denotes the set of pages and E denotes the links between pages. IV denotes the number of incoming links of page V. OV denotes the number of outgoing links of page V.

Output: The list of Page Access Coefficients, stored in PAC.

Algorithm 1
Parameters: a social network G = (V, E), an evaluating page A ∈ V;
    PAC(A) ← Page Access Coefficient of Page A
    n ← size of V (total number of pages)
    for i = 1 to n do
        IA ← incoming links of Page A
        OA ← outgoing links of Page A
    end for
    for all A ∈ V do
        PAC(A) = IA + (OA / n)
    end for

IV RESULTS AND DISCUSSIONS

Let us compute the rank for the Web pages shown in Fig.2 below using PageRank, WPR and the proposed PAC algorithm.

Fig.2 Hyperlink Structure for 5 pages (A, B, C, D, E)
  • 8. PageRank output:

PR(A) = (1-d) + d(PR(B)/2 + PR(E)/2) = (1-0.85) + 0.85*(1/2 + 1/2) = 1
PR(B) = 0.15 + 0.85*(1/2 + 1/2 + 1/2) = 1.425
PR(C) = 0.15 + 0.85*(1/2) = 0.575
PR(D) = 0.15 + 0.85*(1.425/2 + 0.575/2) = 1.0
PR(E) = 0.15 + 0.85*(0.575/2 + 1) = 0.8194

The second iteration is obtained below by taking the above PageRank values:

PR(A) = 1.1039
PR(B) = 1.2118
PR(C) = 0.5750
PR(D) = 0.9094
PR(E) = 1.1673

During the 44th iteration, the PageRank values converge, as shown in Table 2 below.

Table 2

Iteration   A        B        C        D        E
1           1        1        1        1        1
2           1.0000   1.4250   0.5750   1.0000   0.8194
3           1.1039   1.2118   0.5750   0.9094   1.1673
4           1.1611   1.3840   0.6191   1.0013   1.2643
...         ...      ...      ...      ...      ...
41          1.6683   1.9246   0.8590   1.3330   1.6482
42          1.6684   1.9246   0.8590   1.3331   1.6482
43          1.6684   1.9247   0.8591   1.3331   1.6482
44          1.6685   1.9247   0.8591   1.3331   1.6483
45          1.6685   1.9247   0.8591   1.3331   1.6483

From the above, PR(B) > PR(A) > PR(E) > PR(D) > PR(C).

WPR output:

WPR(A) = 0.66
WPR(B) = 1.5933
WPR(C) = 0.2161
WPR(D) = 0.3152
WPR(E) = 0.40755

From the above, WPR(B) > WPR(A) > WPR(E) > WPR(D) > WPR(C).
  • 9. Use the hyperlink structure shown in Fig.2 for the PAC calculation. The PAC for Pages A, B, C, D and E is calculated as follows.

G = (V, E)
V = {A, B, C, D, E}
E = {(A,B), (A,C), (B,A), (B,D), (C,B), (C,D), (C,E), (D,E), (E,A), (E,B)}
n = 5

Pages   In   Out
A       2    2
B       3    2
C       1    2
D       2    1
E       2    2

PAC output:

PAC(A) = IA + (OA / n) = 2 + (2/5) = 2.4
PAC(B) = IB + (OB / n) = 3 + (2/5) = 3.4
PAC(C) = IC + (OC / n) = 1 + (2/5) = 1.4
PAC(D) = ID + (OD / n) = 2 + (1/5) = 2.2
PAC(E) = IE + (OE / n) = 2 + (2/5) = 2.4

In this, PAC(B) > PAC(A) = PAC(E) > PAC(D) > PAC(C).

The PR, WPR and PAC values calculated for Fig.2 are shown above. They show that the results for PAC are obtained in a single iteration. The orders of the PR, WPR and PAC values of Pages A, B, C, D and E are almost the same: the only difference is that PAC assigns Pages A and E the same value, whereas PR and WPR rank A above E.

In the PR calculation, many iterations have to be performed, which increases the calculation time. In the WPR calculation, the weight for each vertex has to be calculated, which increases the calculation complexity. In the PAC calculation, the results are obtained in a single step, which reduces the time and complexity of the calculation.

The time complexity of the proposed algorithm is O(n), since a single pass over the n vertices suffices.
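The statement that the three orderings are almost the same can be checked pairwise. The sketch below is only an illustration: it takes the values reported in this section for Fig.2 (the converged row of Table 2, the WPR output and the PAC output) as given and does not recompute them, and the helper `compare_orders` is a hypothetical name. For every pair of pages it records whether two scorings order the pair the same way, the opposite way, or whether at least one of them ties the pair.

```python
from itertools import combinations

# Values as reported in this section for the pages of Fig.2.
PR = {"A": 1.6685, "B": 1.9247, "C": 0.8591, "D": 1.3331, "E": 1.6483}
WPR = {"A": 0.66, "B": 1.5933, "C": 0.2161, "D": 0.3152, "E": 0.40755}
PAC = {"A": 2.4, "B": 3.4, "C": 1.4, "D": 2.2, "E": 2.4}


def compare_orders(x, y):
    """Count page pairs ordered the same way, oppositely, or tied in x or y."""
    concordant = discordant = tied = 0
    for p, q in combinations(sorted(x), 2):
        dx = x[p] - x[q]
        dy = y[p] - y[q]
        if dx == 0 or dy == 0:
            tied += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    return concordant, discordant, tied


pr_vs_pac = compare_orders(PR, PAC)
wpr_vs_pac = compare_orders(WPR, PAC)
pr_vs_wpr = compare_orders(PR, WPR)
```

With these reported values, no pair of pages is ordered in opposite directions by any two of the three scorings; the single tied pair against PAC is (A, E).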
  • 10. CONCLUSION

With the rapid growth of the WWW and the users’ demand for knowledge, it is becoming more difficult to manage the information on the WWW and satisfy user needs. Therefore, users are looking for better information filtering techniques to locate, extract, filter and find the necessary information. This paper also studies the performance of the PageRank and WPR algorithms. It is found that the computational complexity is high in the existing algorithms. So, a new algorithm is proposed to calculate the PAC for social networks. The proposed algorithm is found to be better than the existing algorithms, since it reduces the calculations and the time complexity.

REFERENCES

[1] Pooja Sharma et al., “Weighted Page Rank for Ordering Web Search Result”, IJEST, Vol. 2(12), 2010, 7301-7310.
[2] Krishna Lerman, “Social Browsing & Information Filtering in Social Media”, arXiv:0710.5697v1 [cs.CY], 30 Oct 2007.
[3] Alan E. Mislove, “Online Social Networks: Measurement, Analysis and Applications to Distributed Information Systems”, Ph.D. thesis, Rice University, April 2009.
[4] Matus Medo, “Network-based information filtering algorithms: ranking and recommendation”, arXiv:1208.4552v1 [cs.SI], 22 Aug 2012.
[5] Delgado et al., “Content-based Collaborative Information Filtering”, Nagoya 466, Japan, {jdelgado,ishii,tomkey}@ics.nitech.ac.jp.
[6] Wenpu Xing and Ali Ghorbani, “Weighted PageRank Algorithm”, Proc. of the Second Annual Conference on Communication Networks and Services Research (CNSR’04), IEEE, 2004.
[7] Ashutosh Kumar Singh and Ravi Kumar P, “A Comparative Study of Page Ranking Algorithms for Information Retrieval”, IJECE, 4.7.2009.
[8] Dilip Kumar Sharma and A. K. Sharma, “A Comparative Analysis of Web Page Ranking Algorithms”, IJCSE, Vol. 02, No. 08, 2010, 2670-2676.
[9] Charu C. Aggarwal, “An Introduction to Social Network Data Analytics”, IBM, T.J. Research Centre.
[10] Muhanad A. Al-Khalisy and Dr. Haider K. Hoomod, “POSN: Private Information Protection in Online Social Networks”, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 340 - 355, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[11] Alamelu Mangai J, Santhosh Kumar V and Sugumaran V, “Recent Research in Web Page Classification – A Review”, International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 1, 2010, pp. 112 - 122, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
