Infinite Gerrit
Haithem Jarraya
GerritForge
Haithem@gerritforge.com
http://www.gerritforge.com
2
About Haithem
• Haithem Jarraya
Big Data consultant based in London who has worked across several
industries: telecommunications, advertising, financial services,
travel and government.
• Expertise in the real-time big data ecosystem: Apache Kafka,
Apache Cassandra, Apache Spark.
3
About GerritForge
Founded in 2009 in London, UK
Mission:
Integrate Gerrit with
the enterprise
4
Agenda
• Vision for Infinite Gerrit
• Why Cassandra?
• DfsObjDatabase: proposed Cassandra schema
  • Pack list
  • Packs
5
Gerrit – End Goal
• Every Gerrit master accepts writes, which in turn increases throughput.
• More flexibility for scaling up.
• Zero downtime, zero data loss.
• Lower Gerrit operational cost by automatically scaling down instances.
6
Gerrit – Challenges
• Distributed storage
• Database replication
• Concurrent ref updates
• Index updates
• Cache consistency
• Shared sessions
• Agreement protocol between nodes
8
C* in brief
• Each table's rows are distributed across N token ranges, and each token
range is replicated R times.
• Fast writes: commit log (durable writes), then memtable, then flushed to SSTables.
• Fast reads: Bloom filter, row key cache, row cache, SSTable index entry.
• Compaction, repairs, materialized views (3.x)…
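To make the token-range and replication idea concrete, here is a minimal Java sketch of a ring with one token per node and SimpleStrategy-style placement: a key's token falls into the range owned by the first node whose token is greater than or equal to it, and the next R-1 nodes clockwise hold the other replicas. Assumptions: CRC32 stands in for Cassandra's default Murmur3 partitioner, there are no virtual nodes, and the node names and tokens are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Sketch: single-token-per-node ring with SimpleStrategy-style replica placement.
// Assumption: CRC32 stands in for Cassandra's Murmur3 partitioner; no vnodes.
final class TokenRing {
    // token -> node; each node owns the range (previous token, its own token]
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicationFactor;

    TokenRing(int replicationFactor) {
        this.replicationFactor = replicationFactor;
    }

    void addNode(String node, long token) {
        ring.put(token, node);
    }

    // Hash a partition key to a token in [0, 2^32).
    static long token(String partitionKey) {
        CRC32 crc = new CRC32();
        crc.update(partitionKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Owner of the key's token range, followed by the next R-1 distinct nodes clockwise.
    List<String> replicasFor(String partitionKey) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        Long owner = ring.ceilingKey(token(partitionKey));
        if (owner == null) {
            owner = ring.firstKey(); // token past the last node: wrap around
        }
        List<String> clockwise = new ArrayList<>(ring.tailMap(owner, true).values());
        clockwise.addAll(ring.headMap(owner, false).values());
        for (String node : clockwise) {
            if (replicas.size() == replicationFactor) {
                break;
            }
            if (!replicas.contains(node)) {
                replicas.add(node);
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing(3); // R = 3
        ring.addNode("node-a", 1_000_000_000L);
        ring.addNode("node-b", 2_000_000_000L);
        ring.addNode("node-c", 3_000_000_000L);
        ring.addNode("node-d", 4_000_000_000L);
        System.out.println(ring.replicasFor("my-project"));
    }
}
```

With four nodes and R = 3, every key lands on three consecutive nodes of the ring, so losing any single node still leaves two copies of each token range.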
9
DFS – Storage Layer for JGit
• org...storage.dfs.DfsObjDatabase
• org….storage.dfs.DfsRefDatabase
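Concurrent ref updates, one of the challenges listed earlier, come down to an atomic compare-and-put: a ref moves from an old object id to a new one only if nobody else moved it first. As we understand it, JGit's DFS ref database leaves this check to the storage backend. The hypothetical in-memory store below, built on ConcurrentHashMap, only illustrates the semantics a Cassandra-backed implementation would have to provide (for example through conditional updates, i.e. lightweight transactions); the class name and the string object ids are illustrative, not JGit's API.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical ref store: a ref is updated only if it still holds the expected old id.
// Object ids are plain strings here; JGit uses ObjectId/Ref types instead.
final class CasRefStore {
    private final ConcurrentHashMap<String, String> refs = new ConcurrentHashMap<>();

    // oldId == null means "create": succeeds only if the ref does not exist yet.
    boolean compareAndPut(String refName, String oldId, String newId) {
        if (oldId == null) {
            return refs.putIfAbsent(refName, newId) == null;
        }
        // Atomic check-and-set: replaces only if the current value equals oldId
        return refs.replace(refName, oldId, newId);
    }

    Optional<String> read(String refName) {
        return Optional.ofNullable(refs.get(refName));
    }

    public static void main(String[] args) {
        CasRefStore store = new CasRefStore();
        store.compareAndPut("refs/heads/master", null, "c1");
        // Two writers both saw "c1" and race to advance the branch; only one wins.
        boolean first = store.compareAndPut("refs/heads/master", "c1", "c2");
        boolean second = store.compareAndPut("refs/heads/master", "c1", "c3");
        System.out.println(first + " " + second + " " + store.read("refs/heads/master").orElse("?"));
    }
}
```

The losing writer gets false and must re-read the ref and retry (or reject the push); in a multi-master setup the hard part is making that check atomic across nodes rather than within one JVM.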
10
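Packs C* schema – Packs
• Packs
```sql
CREATE TABLE git_store.packs (
  id uuid,        -- pack file id
  ext text,       -- file extension, e.g. 'pack' or 'idx'
  offset bigint,  -- byte offset of the stored block within the file
  value blob,     -- block contents
  PRIMARY KEY ((id, ext, offset))  -- composite partition key: one partition per block
);
```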
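Because offset is part of the partition key, each stored block of a pack file is its own partition, so a reader can fetch exactly the blocks covering a byte range. A minimal in-memory sketch of that layout, assuming fixed-size chunks; the chunk size and class names are hypothetical, and a real store would issue one SELECT per (id, ext, offset) key instead of a HashMap lookup.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// In-memory model of git_store.packs: one row (and one partition) per (id, ext, offset).
// Assumption: fixed-size chunks; the chunk size is a hypothetical parameter.
final class PackChunks {
    record Key(UUID id, String ext, long offset) {}

    private final Map<Key, byte[]> rows = new HashMap<>();
    private final int chunkSize;

    PackChunks(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Split a file into chunks keyed by their starting byte offset.
    void write(UUID id, String ext, byte[] file) {
        for (int off = 0; off < file.length; off += chunkSize) {
            int end = Math.min(file.length, off + chunkSize);
            rows.put(new Key(id, ext, off), Arrays.copyOfRange(file, off, end));
        }
    }

    // Random read of up to len bytes at pos, touching only the chunks that overlap it.
    byte[] read(UUID id, String ext, long pos, int len) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long chunkStart = (pos / chunkSize) * chunkSize; // align down to a chunk boundary
        while (out.size() < len) {
            byte[] chunk = rows.get(new Key(id, ext, chunkStart));
            if (chunk == null) {
                break; // no such chunk: past the end of the file
            }
            int from = (int) Math.max(0, pos - chunkStart);
            if (from >= chunk.length) {
                break; // pos lies beyond the end of the last, partial chunk
            }
            int n = Math.min(chunk.length - from, len - out.size());
            out.write(chunk, from, n);
            chunkStart += chunkSize;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        PackChunks store = new PackChunks(4); // tiny chunks to keep the example readable
        UUID id = UUID.randomUUID();
        store.write(id, "pack", "PACK0123456789".getBytes(StandardCharsets.US_ASCII));
        // Bytes 3..8 span three chunks: "PACK", "0123", "4567"
        byte[] slice = store.read(id, "pack", 3, 6);
        System.out.println(new String(slice, StandardCharsets.US_ASCII)); // prints K01234
    }
}
```

Keeping each block in its own partition spreads a large pack evenly across the token ring, at the cost of one lookup per block on reads.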
11
Packs C* schema – Pack list
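• Pack list
```sql
CREATE TABLE git_store.pack_list (
  name text,    -- partition key (presumably the repository name)
  id uuid,      -- pack id, matching git_store.packs.id
  ext text,     -- file extension, e.g. 'pack' or 'idx'
  size bigint,  -- file size in bytes
  PRIMARY KEY (name, id, ext))
WITH CLUSTERING ORDER BY (id DESC, ext ASC);
```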
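In pack_list, name is the partition key and (id, ext) are clustering columns, so listing the packs under one name is a single-partition read that comes back already sorted by id descending, then ext ascending, and re-inserting the same (name, id, ext) overwrites the row. The sketch models that with one TreeMap per partition. Java's UUID.compareTo is only a stand-in for Cassandra's uuid ordering, which is not identical, and the ids in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// In-memory model of git_store.pack_list: one partition per name,
// rows kept in clustering order (id DESC, ext ASC).
final class PackList {
    record Clustering(UUID id, String ext) {}
    record Row(String name, UUID id, String ext, long size) {}

    // Stand-in for the table's clustering order; UUID.compareTo approximates Cassandra's.
    static final Comparator<Clustering> CLUSTERING_ORDER = (a, b) -> {
        int byId = b.id().compareTo(a.id());                 // id DESC
        return byId != 0 ? byId : a.ext().compareTo(b.ext()); // ext ASC
    };

    private final Map<String, TreeMap<Clustering, Long>> partitions = new HashMap<>();

    // Like a CQL INSERT: an upsert on the full primary key (name, id, ext).
    void insert(String name, UUID id, String ext, long size) {
        partitions.computeIfAbsent(name, k -> new TreeMap<>(CLUSTERING_ORDER))
                  .put(new Clustering(id, ext), size);
    }

    // Like SELECT * FROM pack_list WHERE name = ?: one partition, already sorted.
    List<Row> select(String name) {
        List<Row> rows = new ArrayList<>();
        partitions.getOrDefault(name, new TreeMap<>(CLUSTERING_ORDER))
                  .forEach((c, size) -> rows.add(new Row(name, c.id(), c.ext(), size)));
        return rows;
    }

    public static void main(String[] args) {
        PackList list = new PackList();
        UUID idA = new UUID(0L, 1L); // hypothetical ids chosen so the ordering is easy to follow
        UUID idB = new UUID(0L, 2L);
        list.insert("my-project", idA, "pack", 1024);
        list.insert("my-project", idA, "idx", 128);
        list.insert("my-project", idB, "pack", 2048);
        list.insert("my-project", idB, "idx", 256);
        list.select("my-project").forEach(System.out::println);
    }
}
```

With random (version 4) UUIDs, "id DESC" yields a stable but arbitrary order rather than newest-first; if recency ordering were the goal, a time-based id would likely be needed.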
12
LWB – HTTP route per repo
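The slide title only states that HTTP traffic is routed per repository. One possible reading, sketched here purely as an assumption, is a front end that extracts the repository name from the Git smart-HTTP path (info/refs, git-upload-pack, git-receive-pack, with Gerrit's optional /a/ authenticated prefix) and always sends that repository to the same backend. The class name, backend names and the modulo hash are all hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical per-repository router: all Git HTTP traffic for one repo goes to one backend.
final class RepoRouter {
    private static final List<String> GIT_SUFFIXES =
        List.of("/info/refs", "/git-upload-pack", "/git-receive-pack");

    private final List<String> backends;

    RepoRouter(List<String> backends) {
        this.backends = List.copyOf(backends);
    }

    // "/a/team/app.git/info/refs" -> "team/app"; empty if the path is not a Git request.
    // The path is expected without the query string (e.g. "?service=git-upload-pack").
    static Optional<String> repoName(String path) {
        for (String suffix : GIT_SUFFIXES) {
            if (path.endsWith(suffix)) {
                String repo = path.substring(0, path.length() - suffix.length());
                if (repo.startsWith("/a/")) {
                    repo = repo.substring(2); // drop Gerrit's authenticated-access prefix
                }
                if (repo.startsWith("/")) {
                    repo = repo.substring(1);
                }
                if (repo.endsWith(".git")) {
                    repo = repo.substring(0, repo.length() - 4);
                }
                return repo.isEmpty() ? Optional.empty() : Optional.of(repo);
            }
        }
        return Optional.empty();
    }

    // Same repository name -> same backend (simple modulo hashing).
    Optional<String> route(String path) {
        return repoName(path)
            .map(repo -> backends.get(Math.floorMod(repo.hashCode(), backends.size())));
    }

    public static void main(String[] args) {
        RepoRouter router = new RepoRouter(List.of("gerrit-1", "gerrit-2", "gerrit-3"));
        // Both requests name the same repository, so they go to the same backend
        System.out.println(router.route("/a/team/app.git/info/refs"));
        System.out.println(router.route("/team/app/git-receive-pack"));
    }
}
```

Plain modulo hashing remaps most repositories whenever a backend is added or removed; a token ring like Cassandra's would limit that churn when scaling up or down.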
13
Questions?
14
Resources
• http://cassandra.apache.org
• http://www.datastax.com
• http://eclipse.org/jgit
• Dynamo paper: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
• Bigtable paper: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf