Cassovary is Twitter's "big graph" processing library for the JVM, written in Scala. It is designed from the ground up to efficiently handle graphs with billions of nodes and edges.
The project involves porting Cassovary from Scala 2.9.3 to Scala 2.10, making the best use of the new features of the upgraded version wherever possible and rewriting the portions that break, while trying to maintain backward compatibility.

Published in: Software, Technology

  • 1. @Cassovary Porting Cassovary to Scala 2.10 BY SHRIKRISHNA HOLLA, VINOD KUMAR L
  • 2. Context: Cassovary ● Simple big graph processing library for the JVM ● Designed for Scalability ● Not a database, so no persistence ● No partitioning, so better performance ● Written in Scala and can be used with any JVM-hosted language
  • 3. Problem Cassovary, currently written for Scala 2.9.x, needs to work with Scala 2.10.x ● Rationale: An opportunity to learn a functional programming language, Scala ● Scope: Any Scala program, written in Scala 2.10, must be able to use Cassovary ● Binary compatibility: Not possible
  • 4. Design and Approach ● Mistaken initial approach ● Realization ● Revised approach ● If we were to do the same for other projects…
  • 5. Time Estimates ● Initial time estimate a. Two weeks to learn Scala b. Three weeks for implementation c. Two weeks for code review and merging d. Exam period factored in, plus a one-week buffer ● Initial time estimate pushed back by two weeks because of the mistaken approach ● Code review: no reply from the community yet
  • 6. Coding philosophies of community ● Less documentation from the start ● Coding style - code as documentation ● Descriptive variable and function names
  • 7. Code organization ● Similar to Java ● build.sbt, which contains the configuration rules for building the library ● Library source in src/main ● Test cases in src/test ● 147 test cases
  • 8. Implementation Details ● Initially started by trying to determine the similarities and differences between the Scala versions ● First resolved library dependencies ● Scala-Actor model vs Akka-Actor model ● After porting completely to Scala 2.10, realized cross-building wasn’t done ● Rewrite ● Binary incompatibility between Scala 2.9 and 2.10
  • 9. #Challenges ● Learning Scala ● Figuring out the similarities and differences between the two versions ● Resolving library dependencies for each version ● Cross building ● Effective communication with community
  • 10. Development ● Done entirely over Git, hosted on GitHub ● Created a fork of the original repository ● Two development branches ● One branch for merging changes ● Travis CI used to check builds ● Sent our first pull request, got build errors for previous versions ● Sent a revised pull request, yet to hear back
  • 11. @Community Communication ● Google Groups mailing list ● GitHub issue list
  • 12. Pending work ● Layered Label Propagation - intended, but overtaken ● Benchmark performance for graph algorithms - halfway
  • 13. Future ● Seeing our code merged upstream ● Taking up the pending work ● Google Summer of Code 2014 ● Future employers? ;-)
  • 14. Summary ● Software Engineering concepts not consciously used ● SE process: V model ● After implementation, if something was wrong, went back to design (e.g. the cross-build case) ● Group harmony: worked on separate things, came together when merging ● Blogs: Intentionally non-technical
  • 15. Gallery
  • 16. Gallery
  • 17. Gallery
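
Slides 8 and 9 mention cross-building and per-version library dependencies without showing what that looks like in build.sbt. The fragment below is a minimal sketch of the general sbt technique, not Cassovary's actual build file. It assumes sbt 0.13-style syntax; the project name, the exact Scala patch versions, and the Akka dependency and its version are illustrative assumptions only.

```scala
// build.sbt (illustrative sketch; not taken from the Cassovary repository)

name := "graph-lib-example" // hypothetical project name

// Version used by plain commands such as `sbt compile` (assumed patch version)
scalaVersion := "2.10.3"

// Versions built when a command is prefixed with `+`, e.g. `sbt +test`.
// Scala 2.9 and 2.10 are not binary compatible, so each one gets its own artifact.
crossScalaVersions := Seq("2.9.3", "2.10.3")

// Choose dependencies per Scala version (assumed example: actors).
libraryDependencies ++= {
  if (scalaVersion.value.startsWith("2.9."))
    Seq.empty // scala.actors is part of the 2.9 standard library
  else
    // `%%` appends the Scala binary version, giving akka-actor_2.10
    Seq("com.typesafe.akka" %% "akka-actor" % "2.2.3")
}
```

With a setup like this, `sbt +test` runs the test suite once per entry in `crossScalaVersions`, which is the kind of check that catches a port that builds on 2.10 but breaks on 2.9.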