Infinite Gerrit

Haithem Jarraya
GerritForge
Haithem@gerritforge.com
http://www.gerritforge.com
2
About Haithem
• Haithem Jarraya
Big Data consultant based in London, with experience across several
industries: telecommunications, advertising, financial services,
travel and government.
• Expertise in the real-time big data ecosystem: Apache Kafka,
Apache Cassandra, Apache Spark.
3
About GerritForge
Founded in 2009 in London, UK
Mission:
Integrate Gerrit with
the Enterprise
4
Agenda
• Vision for Infinite Gerrit
• Why Cassandra?
• DfsObjDatabase – Cassandra proposed schema
  • Pack list
  • Packs
5
Gerrit – End Goal
• Every Gerrit master accepts writes, which in
turn increases throughput.
• More flexibility for scaling up.
• Zero downtime, zero data loss.
• Lower Gerrit operational cost by automatically
scaling down instances.
6
Gerrit – Challenges
• Distributed storage
• Database replication
• Concurrent ref updates
• Index updates
• Cache consistency
• Shared sessions
• Agreement protocol between nodes
8
C* in brief
• Each table's rows are distributed across N token ranges, and each token range is
replicated R times.
• Fast writes: commit log (durable writes), memtable, then flushed to SSTables.
• Fast reads: bloom filter, row key cache, row cache, SSTable index entry.
• Compaction, repairs, materialized views (3.x)…
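
The first point above (rows spread over N token ranges, each range replicated R times) can be sketched roughly as follows. This is a simplified illustration, not Cassandra's actual implementation: it assumes one token per node and an MD5 hash, whereas a real cluster uses its configured partitioner and replication strategy. The names `token`, `Ring` and the node labels are hypothetical.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    """Map a partition key to a ring position (MD5 is an assumption for this sketch)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class Ring:
    """Toy token ring: one token per node, R consecutive nodes hold each range."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.rf = replication_factor
        # Each node owns the token range that ends at its own token.
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the R nodes that store the given partition key."""
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(self.rf)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("some-pack-id")
```

Any node can compute `replicas` for a key without asking a coordinator, which is what lets every node accept reads and writes.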
9
DFS – Storage Layer for JGit
• org...storage.dfs.DfsObjDatabase
• org...storage.dfs.DfsRefDatabase
10
Packs C* schema – Packs
• Packs
CREATE TABLE git_store.packs (
    id uuid,
    ext text,
    offset bigint,
    value blob,
    PRIMARY KEY ((id, ext, offset))
);
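
One way to read this schema is that each row holds one block of a pack file, addressed by the byte offset at which the block starts. The slide does not say so explicitly, so the sketch below is an assumption: a dictionary stands in for the `git_store.packs` table, and `BLOCK_SIZE`, `PackStore`, `write` and `read` are hypothetical names. The block size is kept tiny only to make the example easy to follow.

```python
import uuid

BLOCK_SIZE = 4  # hypothetical block size in bytes; a real deployment would use far larger blocks


class PackStore:
    """In-memory stand-in for git_store.packs: (id, ext, offset) -> value."""

    def __init__(self):
        self.rows = {}

    def write(self, pack_id, ext, data: bytes):
        """Split the file into fixed-size blocks, one row per block."""
        for offset in range(0, len(data), BLOCK_SIZE):
            self.rows[(pack_id, ext, offset)] = data[offset:offset + BLOCK_SIZE]

    def read(self, pack_id, ext, position: int, length: int) -> bytes:
        """Read `length` bytes starting at `position` by fetching the covering blocks."""
        out = bytearray()
        end = position + length
        offset = position - position % BLOCK_SIZE  # start of the first covering block
        while offset < end:
            block = self.rows.get((pack_id, ext, offset))
            if block is None:  # read ran past the end of the file
                break
            lo = max(position, offset) - offset
            hi = min(end, offset + len(block)) - offset
            out += block[lo:hi]
            offset += BLOCK_SIZE
        return bytes(out)


store = PackStore()
pack_id = uuid.uuid4()
payload = b"PACK0123456789"
store.write(pack_id, "pack", payload)
chunk = store.read(pack_id, "pack", 3, 7)
```

Because `(id, ext, offset)` is the whole partition key, every block lookup is an exact-key read of a single partition, and the blocks of one pack spread across the cluster.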
11
Packs C* schema – Pack list
• Pack list
CREATE TABLE git_store.pack_list (
    name text,
    id uuid,
    ext text,
    size bigint,
    PRIMARY KEY (name, id, ext)
) WITH CLUSTERING ORDER BY (id DESC, ext ASC);
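
Here `name` is the partition key, while `id` and `ext` are clustering columns, so all pack entries for one name live in a single partition and come back already sorted. The sketch below mimics that behaviour in memory. It assumes `name` is the repository name, and `PackList`, `add` and `list_packs` are hypothetical helpers. Python compares UUIDs by integer value, which may differ from Cassandra's uuid ordering, so the `id` order is illustrative only.

```python
import uuid
from collections import defaultdict


class PackList:
    """In-memory stand-in for git_store.pack_list, partitioned by name."""

    def __init__(self):
        # name -> {(id, ext): size}; one inner dict per partition
        self.partitions = defaultdict(dict)

    def add(self, name, pack_id, ext, size):
        self.partitions[name][(pack_id, ext)] = size

    def list_packs(self, name):
        """Return (id, ext, size) rows for one partition, id DESC then ext ASC."""
        rows = [(pid, ext, size)
                for (pid, ext), size in self.partitions.get(name, {}).items()]
        rows.sort(key=lambda row: row[1])                # ext ASC
        rows.sort(key=lambda row: row[0], reverse=True)  # id DESC (sort is stable)
        return rows


older, newer = uuid.UUID(int=1), uuid.UUID(int=2)  # fixed ids for a repeatable example
packs = PackList()
packs.add("project-a", older, "pack", 100)
packs.add("project-a", older, "idx", 10)
packs.add("project-a", newer, "pack", 200)
listing = packs.list_packs("project-a")
```

Listing the packs of one repository is then a single-partition read, which is the access pattern a pack list lookup needs.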
12
LWB – HTTP route per repo
13
Questions?
14
Resources
• http://cassandra.apache.org
• http://www.datastax.com
• http://eclipse.org/jgit
• Dynamo paper: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
• Bigtable paper: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf