
Cassovary is Twitter's "big graph" processing library for the JVM, written in Scala. It is designed from the ground up to efficiently handle graphs with billions of nodes and edges.
The project involves porting Cassovary from Scala 2.9.3 to Scala 2.10: making the best use of the new features in the upgraded runtime wherever possible and rewriting any portions that break, while trying to maintain backward compatibility.

Published in: Software, Technology


  1. @Cassovary Porting Cassovary to Scala 2.10 BY SHRIKRISHNA HOLLA VINOD KUMAR L
  2. Context: Cassovary
     ● Simple big graph processing library for the JVM
     ● Designed for scalability
     ● Not a database, so no persistence
     ● No partitioning, so better performance
     ● Written in Scala and can be used with any JVM-hosted language
  3. Problem: Cassovary, currently written for Scala 2.9.x, needs to work with Scala 2.10.x
     ● Rationale: An opportunity to learn a functional programming language, Scala
     ● Scope: Any Scala program written in Scala 2.10 must be able to use Cassovary
     ● Binary compatibility: Not possible
  4. Design and Approach
     ● Mistaken initial approach
     ● Realization
     ● Revised approach
     ● If we were to do the same for other projects…
  5. Time Estimates
     ● Initial time estimate
       a. Two weeks to learn Scala
       b. Three weeks for implementation
       c. Two weeks for code review and merging
       d. Exam period factored in. One week buffer
     ● Initial time estimate pushed back by two weeks because of the mistaken approach
     ● Code review: no reply from the community yet
  6. Coding philosophies of the community
     ● Less documentation from the start
     ● Coding style: code as documentation
     ● Descriptive variable and function names
  7. Code organization
     ● Similar to Java
     ● build.sbt, which contains the configuration rules for building the library
     ● Library source in src/main
     ● Test cases in src/test
     ● 147 test cases
  8. Implementation Details
     ● Initially started by trying to determine the similarities and differences between the Scala versions
     ● First resolved library dependencies
     ● Scala-Actor model vs Akka-Actor model
     ● After porting completely to Scala 2.10, realized cross-building wasn’t done
     ● Rewrite
     ● Binary incompatibility between Scala 2.9 and 2.10
  9. #Challenges
     ● Learning Scala
     ● Figuring out the similarities and differences between the two versions
     ● Resolving library dependencies for each version
     ● Cross-building
     ● Effective communication with the community
  10. Development
     ● Completely over Git, hosted on GitHub
     ● Created a fork of the original repository
     ● Two development branches
     ● One branch for merging changes
     ● Travis CI used to check builds
     ● Sent our first pull request, got build errors for previous versions
     ● Sent a revised pull request, yet to hear back
  11. @Community Communication
     ● Google Groups mailing list
     ● GitHub issue list
  12. Pending work
     ● Layered Label Propagation: intended, but overtaken
     ● Benchmark performance for graph algorithms: halfway
  13. Future
     ● Seeing our code merged upstream
     ● Taking up pending work
     ● Google Summer of Code 2014
     ● Future employers? ;-)
  14. Summary
     ● Software Engineering concepts not consciously used
     ● SE process: V model
     ● After implementation, if something was wrong, came back to design (e.g. the cross-build case)
     ● Group harmony: worked on separate things, came together when merging
     ● Blogs: Intentionally non-technical
  15. Gallery
  16. Gallery
  17. Gallery
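
The slides name cross-building as the step that was missed after the first port: Scala 2.9 and 2.10 are not binary compatible, so the library has to be compiled separately for each version. A minimal sketch of what that could look like in a build.sbt is given below. It assumes sbt 0.13-style syntax, and the project name, the exact version numbers, and the choice of the `scala-actors` module are illustrative assumptions, not Cassovary's actual build settings.

```scala
// build.sbt (illustrative sketch, not Cassovary's real build file)

// Hypothetical project name used only for this example.
name := "graph-lib-example"

// Default Scala version used by plain `sbt compile` / `sbt test`.
scalaVersion := "2.10.3"

// Versions to build against when a task is prefixed with `+`.
// The exact patch versions here are assumptions.
crossScalaVersions := Seq("2.9.3", "2.10.3")

// Per-version dependencies: in Scala 2.10 the original scala.actors
// package lives in a separate scala-actors module, while in 2.9 it is
// part of the standard library, so nothing extra is needed there.
libraryDependencies ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, 9)) => Seq.empty[ModuleID]
    case _            => Seq("org.scala-lang" % "scala-actors" % scalaVersion.value)
  }
}
```

With a setup along these lines, `sbt +test` runs the test suite once per entry in `crossScalaVersions`, which is the kind of check that would surface build errors for the older version before a pull request is sent.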