About Haithem
• Haithem Jarraya
Big Data consultant based in London, with experience across several
industries: telecommunications, advertising, financial services,
travel, and government.
• Expertise in the real-time big data ecosystem: Apache Kafka,
Apache Cassandra, Apache Spark.
Gerrit – End Goal
Every Gerrit master accepts writes, which in
turn increases throughput.
Increase flexibility for scaling up.
Zero downtime, zero data loss.
Reduce Gerrit operational cost by automatically
scaling down instances.
C* in brief
Each table has its rows distributed across N token ranges, and each token range is
replicated R times.
Fast writes: commit log (durable writes), memtable, then flushed to SSTable.
Fast reads: bloom filter, row key cache, row cache, SSTable index entry.
Compaction, repairs, materialized views (3.x)…
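The write and read paths above can be sketched with a toy, single-node model. This is only an illustration under simplifying assumptions, not how Cassandra is implemented: `ToyNode`, `BloomFilter`, and `flush_threshold` are hypothetical names, the key and row caches and the SSTable index are left out, and reads simply prefer the newest SSTable rather than reconciling cells by timestamp.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: may report false positives, never false negatives."""

    def __init__(self, bits=256, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # bit set stored in a single integer

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.field |= 1 << p

    def might_contain(self, key):
        return all((self.field >> p) & 1 for p in self._positions(key))


class ToyNode:
    """Hypothetical model of the commit log -> memtable -> SSTable flow."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only record of unflushed writes
        self.memtable = {}     # in-memory view of recent writes
        self.sstables = []     # immutable (bloom, rows) pairs, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append for durability
        self.memtable[key] = value            # 2. update in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. persist as an SSTable

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        rows = dict(sorted(self.memtable.items()))  # SSTables are sorted by key
        self.sstables.append((bloom, rows))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest SSTable first; the bloom filter lets us skip tables
        # that definitely do not hold the key.
        for bloom, rows in reversed(self.sstables):
            if bloom.might_contain(key) and key in rows:
                return rows[key]
        return None
```

A write touches only an append-only log and an in-memory map, which is why it is cheap; a read consults the bloom filter before paying for an SSTable lookup.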
Packs C* schema – Pack list
Pack list
CREATE TABLE git_store.pack_list (
    name text,
    id uuid,
    ext text,
    size bigint,
    PRIMARY KEY (name, id, ext)
) WITH CLUSTERING ORDER BY (id DESC, ext ASC);
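To show what the primary key and clustering order imply, here is a small in-memory sketch: `name` selects the partition, and rows inside it come back ordered by `id` descending, then `ext` ascending. `PackList`, `upsert`, and `select` are hypothetical names, not a Cassandra client API, and the sketch assumes ids compare the way Python compares `uuid.UUID` objects (by 128-bit integer value), which may differ from Cassandra's own uuid ordering.

```python
import uuid
from collections import defaultdict


class PackList:
    """Toy stand-in for git_store.pack_list (illustration only)."""

    def __init__(self):
        # partition key (name) -> {(id, ext): size}
        self._partitions = defaultdict(dict)

    def upsert(self, name, id, ext, size):
        # Same primary key (name, id, ext) overwrites the row, as a CQL INSERT would.
        self._partitions[name][(id, ext)] = size

    def select(self, name):
        """Return all rows of one partition in clustering order."""
        rows = self._partitions.get(name, {})
        ordered = sorted(rows.items(), key=lambda kv: kv[0][1])  # ext ASC
        # Python's sort is stable, so ties on id keep the ext ASC order.
        ordered.sort(key=lambda kv: kv[0][0], reverse=True)      # id DESC
        return [(name, i, e, s) for (i, e), s in ordered]


# Usage with made-up values:
packs = PackList()
older, newer = uuid.UUID(int=1), uuid.UUID(int=2)
packs.upsert("repo", older, "pack", 10)
packs.upsert("repo", newer, "pack", 20)
packs.upsert("repo", newer, "idx", 5)
for row in packs.select("repo"):
    print(row)
```

With a time-ordered id, the descending clustering order would put the most recently written entries first within a partition.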
Resources
http://cassandra.apache.org
http://www.datastax.com
http://eclipse.org/jgit
Dynamo paper: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
Bigtable paper: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf