Performance of Neo4j versus MongoDB for Social Actions

Santosh S Ravi and Kalyanaraman Santhanam
University of Southern California
May 7, 2014

Abstract

Much of the data collected today is highly connected, because social networks and other Internet companies accumulate it through social interactions. Social network analysis (SNA) is the analysis of such data: it views social relationships in terms of network theory, as a set of users and the relationships between them. Graph databases such as Neo4j have emerged to handle these requirements by allowing efficient index-free lookups. We study the performance of Neo4j relative to other NoSQL data stores, especially MongoDB, using three social metrics: distance, network closure and assortativity.

1 Introduction

For the past few decades, relational databases have been popular for storing large amounts of structured data because of their ACID guarantees. The recent growth in the volume of data from social networks and cloud services has led to the development of non-traditional NoSQL data stores such as MongoDB, Neo4j and HBase. Because we need to model highly interconnected social networking data, graph databases are particularly interesting: such data maps directly onto their model. Modeling social networking data in a relational database requires many-to-many relations, and even a simple path traversal between two actors requires many join operations. Graph databases, by contrast, store data in a way that makes such traversals easy.

Popular benchmarks such as the Yahoo! Cloud Serving Benchmark (YCSB) framework help evaluate the performance of emerging cloud serving systems under different workloads. However, YCSB is not suited to evaluating popular social networking actions such as View Profile and List Friends on these systems.
BG overcomes this limitation: it is a benchmark designed to evaluate the performance of data stores for interactive social networking actions and sessions. BG computes either a Social Action Rating (SoAR) or a Socialites rating of a data store. These ratings measure how much concurrent activity a system can sustain while a fixed percentage of its requests still meets a response-time requirement. We used BG to assess the performance of the Distance, Network Closure and Assortativity social metrics on both Neo4j and MongoDB. Neo4j is a popular Java-based graph database that offers high performance, availability and ACID transactions; it supports a query language, Cypher, for accessing the data in the database. We compared the performance of the social metrics on embedded Neo4j, Neo4j Cypher over REST, and MongoDB.

2 Description

2.1 Data stores

• Neo4j Community 2.0.0
  Run mode: RESTful and embedded
  Query mode: Java API and Cypher 2.0
• MongoDB 2.6

2.2 Test setup

All benchmarks were performed on a single machine with the following specifications: 2.6 GHz Intel Core i5, 8 GB 1600 MHz DDR3 RAM, 256 GB SSD, OS X 10.9.2.

2.3 BGBenchmark

We used BGBenchmark v0.1.4776 to analyze the social networking actions Assortativity, Network Closure and Distance. We also leveraged BGBenchmark's viewprofile action to test these social actions.

2.4 Data Model and Workload

Figure 1 shows BGBenchmark's data model. The benchmarking workload consists of 10,000 users, with 4 friends and 10 resources per user. The friendship relationships were created so that the data forms a torus, i.e., every user is connected to every other user through a chain of friend-of-friend relationships. The BGWorkload generator assigns each user a unique userid between 0 and 9999.

2.5 Social Metrics

The following social metrics were identified for the scope of this project.

Network Closure: A measure of the completeness of relational triads. An individual's assumption of network closure (i.e., that their friends are also friends) is called transitivity. Transitivity is an outcome of the individual or situational trait of Need for Cognitive Closure.

Assortativity: The extent to which actors form ties with similar versus dissimilar others. Similarity can be defined by gender, race, age, occupation, educational achievement, status, values or any other salient characteristic.
Distance: The minimum number of ties required to connect two particular actors, as popularized by the idea of "six degrees of separation".

Figure 1: Data model used for benchmarking

2.6 Implementation

The code developed falls into three parts: embedded Neo4j, RESTful Neo4j and MongoDB. The goal was to find the best implementation for the social metrics identified in Section 2.5. To keep the comparison fair, we used the same algorithm in all three. The Distance metric is computed using breadth-first search (BFS); the Assortativity metric iterates through the properties/attributes and finds their intersection; and Network Closure retrieves all the nodes/actors one hop away from a given node/actor in a single query.

2.7 Test Suite

The implementations were tested for accuracy using a JUnit test suite. Network Closure and Assortativity are straightforward to test compared to the Distance action. Since we already know that the graph topology forms a torus, Network Closure results can be validated by adding/subtracting the outgoing and incoming friend counts to/from the userid. To test the Assortativity results, we inserted 'country name' and 'organization' properties with synthetic data for every 100 users. We also validated the Distance metric against a formula, since this proved less tedious than checking the actual BFS results:

$$\text{distance} = \min\left(\frac{|d - s|}{f}, \frac{|N - d + s|}{f}\right)$$

where s and d are the source and destination userids, f is the number of outgoing friend relationships, and N is the total number of users.
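The Distance check in Section 2.7 can be illustrated with a small Python sketch: a breadth-first search over a generated friendship graph, compared against a closed-form distance. This is not the authors' JUnit code. It assumes a hypothetical generator (`build_torus`) in which user i has outgoing friendships to users i+1, ..., i+f (mod N) that are traversed in both directions, one possible reading of the torus topology, and it rounds partial hops up, which the formula leaves implicit. Under these assumptions the closed form is ceil(min(|d − s|, N − |d − s|) / f).

```python
import math
from collections import deque


def build_torus(n_users: int, f: int) -> dict[int, set[int]]:
    """Hypothetical generator: user i befriends users i+1..i+f (mod n_users).

    Each friendship is stored in both directions so BFS can walk it either way.
    """
    adj: dict[int, set[int]] = {u: set() for u in range(n_users)}
    for u in range(n_users):
        for k in range(1, f + 1):
            v = (u + k) % n_users
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distance(adj: dict[int, set[int]], source: int, dest: int) -> int:
    """Minimum number of ties between source and dest, or -1 if unreachable."""
    if source == dest:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        for friend in adj[node]:
            if friend == dest:
                return depth + 1  # BFS reaches dest first along a shortest path
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return -1


def torus_distance(s: int, d: int, n_users: int, f: int) -> int:
    """Closed-form distance on the ring lattice, rounding partial hops up."""
    gap = abs(d - s)
    return math.ceil(min(gap, n_users - gap) / f)


if __name__ == "__main__":
    N, F = 100, 2  # small stand-ins for the 10,000-user workload
    graph = build_torus(N, F)
    for dest in range(N):
        assert bfs_distance(graph, 0, dest) == torus_distance(0, dest, N, F)
    print(bfs_distance(graph, 0, 50))
```

With N = 100 and f = 2, the search from user 0 to user 50 takes 25 hops, matching 50 / 2. Checking every destination against the closed form mirrors the motivation in Section 2.7: the formula is cheap to evaluate, so it can validate the BFS without hand-tracing paths.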
Figure 2: Throughput (actions/sec) comparison

3 Findings

3.1 Observation

Comparing the social metrics, we find that network closure and assortativity achieve much higher throughput than the distance metric. This is because the distance metric performs graph traversals, whereas the other two metrics only perform lookups on userid, an indexed attribute in Neo4j. Of these two, network closure performs worse, as expected, since it iterates over all neighbours of the given nodes to find the intersection of friend members between the user nodes.

As expected, the embedded Java API significantly outperforms the other configurations, because it eliminates network overhead and object marshalling/unmarshalling overhead. We believe the main reasons for the poor performance of Neo4j Cypher are network overhead and the query parsing and optimization performed by the Cypher engine.

Neo4j's index-free lookup takes center stage for the distance metric: MongoDB is slower by at least a factor of 10 than Neo4j REST and by at least a factor of 100 than embedded Neo4j. For the workload examined, the distance metric requires on average 2,000 to 3,000 index lookups for the BFS traversal to reach the goal node. Because Neo4j uses its Relationship Expander for path traversals, it avoids the user-index lookups that MongoDB must perform.

For the other metrics, MongoDB performs roughly on par with Neo4j REST. However, we believe the MongoDB wire protocol plays a significant part in MongoDB's higher throughput for assortativity: it operates over TCP/IP at the transport layer using BSON, whereas Neo4j REST adds HTTP protocol headers and JSON encoding/decoding on top of TCP/IP.
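To make the index-lookup argument concrete, the following Python sketch contrasts two BFS styles over the same small friendship graph. `CountingIndex` is a hypothetical stand-in for a document store that must run one indexed query by userid to fetch each expanded user's friend list, while `Node` models a graph store whose nodes hold direct references to their neighbours, which is what a relationship expander follows. Neither class reflects the actual MongoDB or Neo4j APIs, and `ring_friends` is a hypothetical graph generator; the sketch only counts how many lookups the document-store style needs.

```python
from collections import deque
from dataclasses import dataclass, field


class CountingIndex:
    """Hypothetical document-store stand-in: friend ids looked up by userid.

    Each call to get() models one indexed query and is counted.
    """

    def __init__(self, friend_ids: dict[int, list[int]]) -> None:
        self._docs = friend_ids
        self.lookups = 0

    def get(self, userid: int) -> list[int]:
        self.lookups += 1
        return self._docs[userid]


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes fit in sets
class Node:
    """Hypothetical graph-store node holding direct references to friends."""

    userid: int
    friends: list["Node"] = field(default_factory=list)


def bfs_via_index(index: CountingIndex, source: int, dest: int) -> int:
    """BFS where expanding each user costs one index lookup."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        userid, depth = frontier.popleft()
        if userid == dest:
            return depth
        for friend in index.get(userid):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return -1


def bfs_via_pointers(source: Node, dest: Node) -> int:
    """BFS that reaches neighbours by following references, with no index."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node is dest:
            return depth
        for friend in node.friends:
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return -1


def ring_friends(n_users: int, f: int) -> dict[int, list[int]]:
    """Hypothetical ring lattice: each user befriends the f users on each side."""
    return {
        u: [(u + k) % n_users for k in range(-f, f + 1) if k != 0]
        for u in range(n_users)
    }


if __name__ == "__main__":
    ids = ring_friends(100, 2)
    index = CountingIndex(ids)
    nodes = {u: Node(u) for u in ids}
    for u, friend_ids in ids.items():
        nodes[u].friends = [nodes[v] for v in friend_ids]
    print(bfs_via_index(index, 0, 50), index.lookups)
    print(bfs_via_pointers(nodes[0], nodes[50]))
```

On this 100-user ring, where each user has 4 friends, both searches find user 50 at a distance of 25 hops, but the index-based search performs between 97 and 99 lookups, one for every user it expands before reaching the goal. The pointer-based search does the same traversal without any index access, which is the cost difference the observation above attributes to Neo4j's traversal model.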
Figure 3: Throughput (actions/sec) comparison for the distance metric

4 Future Work

We would like to add these social metrics to the BGBenchmark framework to extend its current set of actions. We believe that implementing new social metrics would give a better understanding of how data stores handle complex graph operations than the existing simple operations do.