Published in: Technology, Education
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 5, Issue 6, June (2014), pp. 11-18 © IAEME

SAMPLING ONLINE SOCIAL NETWORKS USING OUTLIER INDEXING

Mr. Yogesh P Murumkar, Student at PVPIT, Bavdhan, Pune
Prof. Yogesh B. Gurav, Asst. Professor, PVPIT, Bavdhan, Pune

I. ABSTRACT

Online social networks are growing, and with them the interest in using the underlying network infrastructure to improve the information available to a user about his or her social peers. Earlier work studied the utility of data perturbed by linear transformations: additive, multiplicative, or a combination of the two. We discuss nonlinear data distortion using possibly nonlinear random data transformations and show how it can be useful for anomaly detection while maintaining the confidentiality of sensitive data sets. We develop bounds on the expected accuracy of the nonlinear distortion and quantify privacy using standard definitions. The main attraction of this approach is that it lets a user control the amount of privacy by varying the degree of nonlinearity. We analyze the transformation in full generality and then show that, for specific cases, it is distance preserving. We focus on improving the performance of collecting information from the neighborhood of a user (node) in a dynamic social network. We introduce sampling-based algorithms that explore a user's social network while respecting its structure and that quickly estimate quantities of interest. We also analyze variants of the basic sampling scheme that exploit correlations across our samples, under both distributed and centralized network models. In the proposed system we use an outlier indexing algorithm on large datasets, because random samples can be used for a wide range of analytical tasks.
A main contribution of this paper is the discussion of the relationship between the invertibility of a transformation and privacy preservation, and the application of these techniques to outlier detection. Experiments conducted on real-life data sets demonstrate the effectiveness of the approach.

Index Terms: Online Social Network, Information Networks, Search Process, Query Processing, Performance Evaluation, Privacy.

© IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2014): 8.5328 (calculated by GISI), www.jifactor.com
II. INTRODUCTION

Over the last decade, the World Wide Web and Web search engines have fundamentally transformed the way people find and share information. Recently, a new form of publishing and locating information, known as online social networking, has become very popular. The social network structure can be modeled as a graph G in which nodes represent individuals and edges represent the relationships among them. We model environments in which social peers participate in a centralized social network (where knowledge of the network structure is assumed) or a distributed one (where the network structure is unknown or limited). In either case, we assume that the rate of change of the content in these networks is high. Given such an environment, we define the following problems: sampling nodes in social networks, sampling information in social networks, and handling low selectivity. Centralized graphs are typical of social networking sites in which complete knowledge of the users' network is maintained.

Changing trends in the use of web technology, aimed at enhancing interconnectivity, self-expression, and information sharing on the web, have led to the emergence of online social networking services. This is evident in the multitude of activity and social interaction that takes place on web sites such as Facebook, MySpace, and Twitter, to name a few. At the same time, the desire to connect and interact extends far beyond centralized social networking sites and takes the form of ad hoc social networks formed by instant messaging clients, VoIP software, or mobile geosocial networks. While numerous studies have focused on the hyperlinked structure of the Web and have exploited it for searching content, few studies, if any, have examined the information exchange in online social networks [1].

The majority of all combinatorial computing applications can apparently be handled only by what amounts to an exhaustive search through all possibilities [2]. The effectiveness of the branch-and-bound procedure for solving mixed integer programming (MIP) problems has made it a method of choice in commercial software for several decades [3]. Anyone who has used a backtracking procedure will probably have observed some problem instances being solved almost immediately, and other problem instances of a similar size taking an inordinate length of time to solve [4]. Online social networks have become increasingly popular in the recent decade, which has given rise to an increasing need to analyze their properties and compare them with one another; many properties of online social networks are considered important [5]. A more efficient distributed algorithm for the DFS traversal of a network can help reduce the complexity of other distributed graph algorithms that use a distributed DFS traversal as their basic building block [6]. Many special traversal techniques have been applied to solve graph-related problems [7]. A distributed algorithm has been presented for constructing breadth-first search (BFS) trees. A BFS tree is a tree of shortest paths from a given root node to all other nodes of a network under the assumption of unit edge weights; such trees provide useful building blocks for a number of routing and control functions in communication networks [8]. Other work surveys many of the measures used to describe and evaluate the efficiency and effectiveness of large-scale search services; these measures reveal a domain rich in complexity and scale [9]. Complex networks describe a wide range of systems in nature and society. Frequently cited examples include the cell, a network of chemicals linked by chemical reactions, and the Internet, a network of routers and computers connected by physical links [10].

Section III reviews the related literature, and Section IV presents the proposed algorithm.

III. LITERATURE REVIEW

A. Mislove, K.P. Gummadi, and P. Druschel [1] examined the potential for using online social networks to enhance Internet search. They analyzed the differences between
the Web and social networking systems in terms of the mechanisms they use to publish and locate useful information. They discussed the benefits of integrating the mechanisms for finding useful content in both the Web and social networks. Their initial results from a social networking experiment suggest that such integration has the potential to improve the quality of the Web search experience.

D.E. Knuth [2] observed that one of the chief difficulties associated with the so-called backtracking technique for combinatorial problems has been the inability to predict the efficiency of a given algorithm, or to compare the efficiencies of different approaches, without actually writing and running the programs. His paper presents a simple method that produces reasonable estimates for most applications, requiring only a modest amount of hand calculation. The method should prove to be of considerable utility in connection with D. H. Lehmer's branch-and-bound approach to combinatorial optimization.

G. Cornuéjols, M. Karamanov, and Y. Li [3] showed empirically that the branch-and-bound solution time of an MIP solver can be roughly estimated in the early stages of the solution process. They proposed a procedure for this estimation based on parameters of a small subtree. Their experiments showed that, in a relatively short time, one can obtain sufficient information to predict the total running time with an error within a factor of five. This procedure can easily be built into an MIP solver. It is fast and does not interfere with the branch-and-bound algorithm.

P. Kilby, J. Slaney, S. Thiébaux, and T. Walsh [4] proposed two new online methods for estimating the size of a backtracking search tree.
The first method is based on a weighted sample of the branches visited by chronological backtracking. The second is a recursive method based on assuming that the unexplored part of the search tree will be similar to the part explored so far. They compare these methods against an older method due to Knuth based on random probing. They show that these methods can reliably estimate the size of search trees explored by both optimization and decision procedures. They also demonstrate that these methods for estimating search tree size can be used to select the algorithm likely to perform best on a particular problem instance.

L. Katzir, E. Liberty, and O. Somekh [5] presented two algorithms for estimating the size of graphs. Both algorithms rely on nodes being sampled from the graph's stationary distribution. They showed both analytically and experimentally that, for social networks and other small-world graphs, these algorithms considerably outperform uniform node sampling. They consistently provide more accurate estimates while using a smaller number of samples. This result is all the more notable because sampling nodes uniformly is strictly harder than sampling them according to the stationary distribution.

IV. PROPOSED ALGORITHM

Figure 1: Flow Diagram
The objectives of the proposed system are to explore the underlying social structure and information, to improve accuracy, to improve efficiency, to study the application of sampling-based algorithms, and to improve efficiency for low-selectivity quantities using the Outlier Indexing technique.

Figure 2: Block Diagram

We present the algorithmic details of our proposed methods. First, we describe SampleDyn, an algorithm that computes a near-uniform sample of users in dynamic social networks.

Sampling Dynamic Social Networks

Let Dd(v) be the vicinity of a user v at depth d. We introduce the algorithm SampleDyn, which takes as input the user v, the size of the sample n, the network depth d, and a constant value for the parameter C, and obtains a near-uniform random sample of users by performing random walks on the nodes of Dd(v).

Algorithm 1: Sampling in Dynamic Social Networks
procedure SAMPLEDYN(u, n, d, C)
    T = NULL, samples = 0, Sample = array of size n
    while samples < n do
        if (v = RANDOMWALK(u, d, C, T)) != 0 then
            Sample[samples++] = v
        end if
    end while
end procedure
procedure RANDOMWALK(u, d, C, T)
    depth = 0, ps = 1
    while depth < d do
        pick v ∈ children(u) ∪ {u} with pv = 1/(degree(u) + 1)
        if T ∪ {v} has no cycle then
            add v to T
            ps = ps · pv
            if v = u then
                accept with probability C/ps
                if accepted then
                    return v
                else
                    return 0
                end if
            else
                u = v, depth++
            end if
        end if
    end while
    return 0
end procedure

Using Separate Samples

A first approach is to draw a separate, independent sample from D(v) for each item and use it to estimate the aggregate count of that item.

Algorithm 2: Counts Estimation (Separate Samples)
procedure EVALSINGLE(v, d, C, n, X)
    S = array of size n
    Count = array of size |X|
    for all x ∈ X do
        S = SAMPLEDYN(v, n, d, C)
        for all i ∈ S do
            Count[x] = Count[x] + count_{i,x}
        end for
    end for
    return Count
end procedure

Distributed Outlier Detection

First, we give an outlier detection algorithm for horizontally partitioned data without considering privacy. Consider a distributed setting with p players, each player holding a subset of the objects in the whole database. In this setting, each player first computes its set of local outliers by running the centralized algorithm on its local dataset. After the local outliers are generated, all the players communicate to compute the global outliers from the sets of local outliers. At the end of the algorithm, each player holds its subset of the actual global outliers.

The distributed algorithm DistributedOD is divided broadly into three phases. In the first phase, all players communicate to compute the global parameters; each player then locally computes its set of local probable outliers M′. In the second phase, GlobalApproxOD, the players communicate to compute their subsets of global probable outliers. Finally, in the third phase, GlobalOD, the players communicate again to compute their subsets of the actual global outliers. The figure gives an overview of the process in the distributed setting from the perspective of one player in a two-player setting.
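The local-then-global idea described above can be illustrated with a minimal sketch. It is not the DistributedOD algorithm itself: it omits LSH and any privacy protection, uses exact Euclidean distances, and assumes that an object is an outlier when at most (1 − pt) × n objects (counting itself) lie within distance dt of it. The function names and the data are hypothetical.

```python
import math


def neighbors_within(point, dataset, dt):
    """Count the objects in dataset that lie within distance dt of point."""
    return sum(1 for other in dataset if math.dist(point, other) <= dt)


def local_candidates(dataset, dt, max_neighbors):
    """Local phase: keep objects with few neighbors in the local data.

    Adding the other player's data can only add neighbors, so every global
    outlier held by this player is also a local candidate.
    """
    return [p for p in dataset
            if neighbors_within(p, dataset, dt) <= max_neighbors]


def global_outliers(own, other, dt, pt):
    """Global phase: re-check the local candidates against all the data."""
    n = len(own) + len(other)        # global parameter shared by the players
    max_neighbors = (1 - pt) * n     # assumed outlier threshold
    return [p for p in local_candidates(own, dt, max_neighbors)
            if neighbors_within(p, own, dt)
            + neighbors_within(p, other, dt) <= max_neighbors]


# Two players with horizontally partitioned 2-D points (made-up data).
data_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
data_b = [(0.1, 0.1), (0.0, 0.2), (5.1, 5.0), (9.0, 9.0)]

outliers_a = global_outliers(data_a, data_b, dt=0.5, pt=0.8)
outliers_b = global_outliers(data_b, data_a, dt=0.5, pt=0.8)
```

In this toy data, (5.0, 5.0) is a local candidate for the first player but is discarded once the second player's nearby point is counted, whereas (9.0, 9.0) remains an outlier for the second player.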
The figure also shows the round complexity of our algorithm, which holds for the multi-player setting as well.

Algorithm 3 DistributedOD: Outlier Detection Algorithm for Horizontal Distribution
Require: Players PA and PB, PA's dataset DA, PB's dataset DB, distance threshold dt, point threshold pt, approximation factor ε
Ensure: PA's outliers MA
At PA:
    PA sends |DA| to PB
At PB:
    n = |DA| + |DB|
    PB sends n to PA
At PA:
    p′t = (1 − pt) × n
    R = dt/(1 + ε)
    T^A_{L×H} = LSH(DA, R)
    compute bt
    M′A = ApproximateOD(DA, T^A_{L×H}, p′t, bt)
    M″A = GlobalApproxOD(M′A)
    MA = GlobalOD(M″A)

We give the distributed algorithm in a two-player setting, which can easily be extended to a p-player setting. Consider two players, denoted PA and PB, with local datasets DA and DB. We present the algorithm such that one player, say PA, is able to compute its subset of the global outliers at the end of the algorithm. The same algorithm enables PB to compute its subset of the global outliers by simply interchanging the roles of PA and PB.

Using the Same Sample

An alternative approach is to draw a sample S only once and reuse it to estimate the aggregate counts for every item x ∈ X. This algorithm evaluates a batch of items at each visit to a sampled node.

Cost Analysis

Our sampling algorithms provide an alternative to performing an exhaustive search, or crawl, of the network of a user using depth-first search or breadth-first search.

Cost Model

Let Dd(v) = (N, E) be the neighborhood of a user v at depth d, where N is the set of nodes and E the set of links in the network. Nodes are autonomous in that they perform their own computation and communicate with each other only by sending messages. Each node is unique and has local information, such as the identity of each of its neighbors. We assume that each node handles messages from and to its neighbors and performs local computations in zero time, meaning that communication delays outweigh local computations on the nodes.

V. EXPERIMENTAL ANALYSIS

Sampling Accuracy

Performing random walks by selecting each outgoing edge with equal probability picks leaf nodes in a biased manner. This is because some leaves, for example leaves that are close to the root, are more likely to be destinations of random walks than other leaves. In our first set of experiments, we explore the effect of this bias on the sampling accuracy and compare the performance of this naive sampling method, called Naive, with our sampling method, EvalSingle.
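The bias described above, and the kind of acceptance-rejection correction used by SampleDyn, can be illustrated on a toy tree. This is a simplified sketch under stated assumptions, not the authors' implementation: the tree and the function names are illustrative, the number of children plus one stands in for degree(u) + 1, the cycle check is omitted because the input is a tree, and the acceptance probability is taken to be C/ps with C no larger than the smallest path probability.

```python
import random
from collections import Counter


def naive_walk(children, root, rng):
    """Follow a uniformly chosen child at every step until a leaf is reached."""
    node = root
    while children.get(node):
        node = rng.choice(children[node])
    return node


def corrected_walk(children, root, max_depth, c, rng):
    """One walk with a 'stay here' option; returns a node, or None if rejected."""
    node, p_path = root, 1.0
    for _ in range(max_depth + 1):
        options = children.get(node, []) + [node]   # children(u) plus u itself
        choice = rng.choice(options)
        p_path *= 1.0 / len(options)                # probability of this path
        if choice == node:
            # Accept with probability c / p_path so every node ends up being
            # returned with the same overall probability c.
            return node if rng.random() < min(1.0, c / p_path) else None
        node = choice
    return None


def sample_nodes(children, root, n, max_depth, c, seed=0):
    """Repeat walks until n of them have been accepted."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        node = corrected_walk(children, root, max_depth, c, rng)
        if node is not None:
            samples.append(node)
    return samples


# Toy tree: leaf "a" is close to the root, leaves "c", "d", "e" are deeper.
tree = {"r": ["a", "b"], "b": ["c", "d", "e"]}
rng = random.Random(1)
naive_counts = Counter(naive_walk(tree, "r", rng) for _ in range(6000))
uniform_counts = Counter(
    sample_nodes(tree, "r", 6000, max_depth=2, c=1 / 12, seed=1))
```

In this tree the naive walk reaches the shallow leaf "a" with probability 1/2 and each deeper leaf with probability 1/6, while the corrected walk returns each of the six nodes with probability 1/12 per walk. Roughly half of the corrected walks are rejected, which is the source of the extra cost discussed in the next subsection.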
Sampling Cost

EvalSingle performs considerably better in terms of accuracy than a naive sampling method, but because many of the performed random walks end up rejecting a selected leaf node, it can be expensive. In this experiment, we evaluate the cost of our sampling method against the naive sampling method and against the cost of crawling the entire neighborhood of a user. For the experimental results we use synthetic user search history logs. The synthetic log consists of the same users as the real log (from the AOL data set, along with their search history logs), but we populate the users' history logs with high numbers of queries and URL counts. The following table shows the sampling result of the query.

Name               Type       Users   Queries  Urls
Real dataset       Real       75888   4026350  2789542
Synthetic dataset  Synthetic  50      200      150

Table 1: The sampling result of query

Figure 3: Existing & Proposed Graph

Figure 3 shows accuracy versus data size for the existing and proposed systems. Table 2 compares the existing and proposed systems.

                   Existing System  Proposed System
Efficiency         Low              High
Sampling Accuracy  Medium           High
Sampling Cost      Low              High

Table 2: Comparison with Existing system & Proposed system
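To make the cost comparison concrete, the sketch below counts node visits under a cost model in which each contacted node costs one message. It compares a breadth-first crawl of the whole neighborhood with a fixed number of naive random walks on a synthetic complete tree. The tree shape, the function names, and the choice of 20 walks are assumptions for illustration; walks rejected by SampleDyn would add to the walk cost and are not modeled here.

```python
import random
from collections import deque


def complete_tree(fanout, depth):
    """Build a complete tree as a dict mapping each node id to its children."""
    children, frontier, next_id = {}, [0], 1
    for _ in range(depth):
        new_frontier = []
        for node in frontier:
            kids = list(range(next_id, next_id + fanout))
            next_id += fanout
            children[node] = kids
            new_frontier.extend(kids)
        frontier = new_frontier
    return children


def crawl_cost(children, root, max_depth):
    """Number of nodes contacted by a breadth-first crawl down to max_depth."""
    seen, queue = {root}, deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return len(seen)


def walk_hops(children, root, n_walks, max_depth, seed=0):
    """Total hops made by n_walks naive random walks of at most max_depth hops."""
    rng = random.Random(seed)
    hops = 0
    for _walk in range(n_walks):
        node = root
        for _step in range(max_depth):
            kids = children.get(node)
            if not kids:
                break
            node = rng.choice(kids)
            hops += 1
    return hops


neighborhood = complete_tree(fanout=10, depth=3)
crawl = crawl_cost(neighborhood, 0, max_depth=3)
walks = walk_hops(neighborhood, 0, n_walks=20, max_depth=3)
```

With a fanout of 10 and depth 3, the crawl contacts all 1111 nodes, while 20 walks make 60 hops in total. The gap widens as the neighborhood grows, since the crawl cost scales with the number of nodes and the walk cost with the number of samples times the depth.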
VI. CONCLUSION AND FUTURE WORK

Our research presents methods for quickly collecting information from the neighborhood of a user in a dynamic social network when knowledge of its structure is limited or not available. Our methods rely on efficient approximation through sampling-based algorithms. By sampling, we avoid visiting all nodes around a user and thus improve performance, as shown by running experiments on real and synthetic data sets. Despite its potential, our sampling method inherits the limitations of sampling and is expected to be inefficient for quantities with very low selectivity. A similar problem arises when answering queries using sample collections; solutions there rely on weighted sampling based on workload information. However, in our setting the data stored at each node change rapidly, so this method is not directly applicable. Our algorithms also assume that information from each user's network, such as web history logs, is available. Access to users' personal information infringes on privacy, and thus privacy concerns may serve as a major obstacle toward acceptance of our algorithms. Systems that use our algorithms should be designed to approach social translucence, striking a balance among visibility, awareness of others, and accountability. A main contribution of this paper is the discussion of the relationship between the invertibility of a transformation and privacy preservation, and the application of these techniques to outlier detection.

In future work, apart from hierarchical index structures, the proposed scheme of CS-SSE can be extended to other data structures such as hashing, which may further improve performance in terms of server-side computations. One may also work toward achieving a constant-round protocol for the proposed CS-SSE scheme, as opposed to the logarithmic-round protocol.

VII. REFERENCES

[1] A. Mislove, K.P. Gummadi, and P. Druschel, "Exploiting Social Networks for Internet Search," Proc. Fifth Workshop Hot Topics in Networks (HotNets), 2006.
[2] D.E. Knuth, "Estimating the Efficiency of Backtrack Programs," Math. of Computation, vol. 29, no. 129, pp. 121-136, 1975.
[3] G. Cornuéjols, M. Karamanov, and Y. Li, "Early Estimates of the Size of Branch-and-Bound Trees," INFORMS J. Computing, vol. 18, pp. 86-96, 2006.
[4] P. Kilby, J. Slaney, S. Thiébaux, and T. Walsh, "Estimating Search Tree Size," Proc. Nat'l Conf. Artificial Intelligence (AAAI), 2006.
[5] L. Katzir, E. Liberty, and O. Somekh, "Estimating Sizes of Social Networks via Biased Sampling," Proc. 20th Int'l Conf. World Wide Web (WWW), 2011.
[6] S.A.M. Makki and G. Havas, "Distributed Algorithms for Depth-First Search," Information Processing Letters, vol. 60, no. 1, pp. 7-12, 1996.
[7] T.-Y. Cheung, "Graph Traversal Techniques and the Maximum Flow Problem in Distributed Computation," IEEE Trans. Software Eng., vol. SE-9, no. 4, pp. 504-512, July 1983.
[8] B. Awerbuch and R.G. Gallager, "A New Distributed Algorithm to Find Breadth First Search Trees," IEEE Trans. Information Theory, vol. 33, no. 3, pp. 315-322, May 1987.
[9] G. Pass, A. Chowdhury, and C. Torgeson, "A Picture of Search," Proc. First Int'l Conf. Scalable Information Systems (InfoScale), 2006.
[10] R. Albert and A.-L. Barabási, "Statistical Mechanics of Complex Networks," Rev. Modern Physics, vol. 74, p. 47, 2002.
[11] Muhanad A. Al-Khalisy and Dr. Haider K. Hoomod, "POSN: Private Information Protection in Online Social Networks," International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 340-355, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[12] L. Rajeswari and Dr. S.S. Dhenakaran, "Page Access Coefficient Algorithm for Information Filtering in Social Network," International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013, pp. 60-69, ISSN Print: 0976-6367, ISSN Online: 0976-6375.