CTF3, Stripe's third Capture-the-Flag, focused on distributed systems engineering with a goal of learning to build fault-tolerant, performant software while playing around with a bunch of cool cutting-edge technologies.
More here: https://stripe.com/blog/ctf3-launch.
Stripe CTF3 wrap-up
1. Greg Brockman
Andy Brody
Christian Anderson
Philipp Antoni
Carl Jackson
Jonas Schneider
Siddarth Chandrasekaran
Ludwig Pettersson
Nelson Elhage
Steve Woodrow
Jorge Ortiz
8. git push → lvl0-asdf@stripe-ctf.com:level0
[Architecture diagram. Entry points: stripe-ctf.com. IN A, https://stripe-ctf.com/
Components shown: gate, ctfweb, submitter, colossus, build, queue, ctfdb, worker, test case generator, gitcoin
Labels shown: (nginx, haproxy), (sinatra), (poseidon), (LDAP, scoring), (mongo), (RabbitMQ), (docker)]
9. What Went Wrong
– containerization
– garbage collection
- containers, filesystems, disk space
– system stability
– bugs, misconfiguration
10. What Went Right
– containerization
– service architecture
- queueing, separation of roles
– load balancing
– horizontal scaling
13. Level 0: mmap
● mmap,munmap - map or unmap files or
devices into memory
● mmap the dictionary into memory
● You can actually mmap stdin as well!
● Binary search!
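The mmap-plus-binary-search idea above can be sketched roughly as follows. This is a minimal illustration, assuming the dictionary is a file of newline-terminated words sorted in byte order; `contains`, `write_sorted_dictionary` and `open_dictionary` are hypothetical names, not part of any CTF3 solution.

```python
import mmap
import tempfile


def contains(mm, word):
    """Binary-search `mm` (byte-sorted, one word per line) for `word`."""
    lo, hi = 0, len(mm)  # lo and hi always sit on line boundaries
    while lo < hi:
        mid = (lo + hi) // 2
        # Find the start of the line that covers offset `mid`.
        nl = mm.rfind(b"\n", lo, mid)
        start = lo if nl == -1 else nl + 1
        end = mm.find(b"\n", start)
        if end == -1:  # last line has no trailing newline
            end = len(mm)
        line = mm[start:end]
        if line == word:
            return True
        if line < word:
            lo = end + 1  # continue in the lines after this one
        else:
            hi = start    # continue in the lines before this one
    return False


def write_sorted_dictionary(words):
    """Hypothetical helper: write byte-sorted words to a temp file, return its path."""
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
        f.write(b"\n".join(sorted(words)) + b"\n")
        return f.name


def open_dictionary(path):
    """Map the whole file read-only; the OS pages it in on demand."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Nothing is parsed or copied up front here: each lookup touches only the few pages the binary search lands on.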
14. Level 0: Bloom filters
● Hash function: f(str) => int
● Look at the result of N hash functions.
● Probabilistic.
● False positives, but no false negatives.
● If you run into false positives, just push again!
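A Bloom filter along these lines might look like the sketch below. The class name, the bit-array size and the choice of SHA-256 with a one-byte index prefix as the "N hash functions" are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive N hash functions by prefixing the item with the function index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


words = BloomFilter(num_bits=1 << 16, num_hashes=4)
for w in (b"apple", b"fig", b"pear"):
    words.add(w)
```

Every added word sets all of its bit positions, so a later lookup of that word can never come back negative; a word that was never added only passes if all of its positions happen to be set by others.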
15. Level 0: Minimal perfect hashing
● Given dictionary D = {w₁, w₂, … wₙ}
● use MATH to generate a hash function f
● f : D → {0..n-1} is one-to-one
● aka every word hashes to a different small integer
● So you can build a no-collisions hash table
● CMPH - C Minimal Perfect Hashing Library
● Build this ahead of time, link it to the binary
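To make the one-to-one property concrete, here is a toy sketch that finds a minimal perfect hash by brute-force seed search. It is not the algorithm CMPH uses and only works for small word lists; `slot` and `find_perfect_seed` are hypothetical names.

```python
import hashlib


def slot(word, seed, n):
    """Seeded hash of `word` reduced to a table slot in 0..n-1."""
    digest = hashlib.sha256(seed.to_bytes(4, "big") + word).digest()
    return int.from_bytes(digest[:8], "big") % n


def find_perfect_seed(words, max_seed=1_000_000):
    """Try seeds until every word lands in a different slot."""
    n = len(words)
    for seed in range(max_seed):
        if len({slot(w, seed, n) for w in words}) == n:
            return seed
    raise ValueError("no perfect seed found; use a real MPH library")


DICTIONARY = [b"apple", b"fig", b"kiwi", b"lime", b"pear", b"plum"]
SEED = find_perfect_seed(DICTIONARY)

# Build the collision-free table once, ahead of time.
TABLE = [None] * len(DICTIONARY)
for w in DICTIONARY:
    TABLE[slot(w, SEED, len(DICTIONARY))] = w


def in_dictionary(word):
    # One hash, one comparison: no probing or chaining needed.
    return TABLE[slot(word, SEED, len(TABLE))] == word
```

The comparison in `in_dictionary` still matters, because a word outside the dictionary hashes to some occupied slot too.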
20. Git object model
$ git cat-file -p 000000effe7d920b391a24633e7298469dcf51b5
tree 7da86a5b10ff6db916598b653ce63e1dc0cb73c8
parent 0000000df4815161b72f4c5ed23e9fbf5deed922
author Alyssa P Hacker <alyssa@example.com> 1391129313 +0000
committer Alyssa P Hacker <alyssa@example.com> 1391129313 +0000
Mined a Gitcoin!
nonce 0302d1e2
$ git show 000000
error: short SHA1 000000 is ambiguous.
error: short SHA1 000000 is ambiguous.
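Git names an object by the SHA-1 of a short header (`<type> <size>\0`) followed by the object body, which is why changing the nonce line changes the commit ID. The sketch below reproduces that hashing and searches for a nonce; it assumes the goal is a commit ID that sorts below a difficulty string, and `mine`, `BASE_BODY` and the nonce line format are illustrative assumptions rather than the level's actual miner.

```python
import hashlib


def git_object_sha1(kind, body):
    """Hash an object the way Git does: sha1(b"<type> <size>\\0" + body)."""
    header = b"%s %d\x00" % (kind, len(body))
    return hashlib.sha1(header + body).hexdigest()


# Commit body modelled on the example above (header lines, blank line, message).
BASE_BODY = (
    b"tree 7da86a5b10ff6db916598b653ce63e1dc0cb73c8\n"
    b"parent 0000000df4815161b72f4c5ed23e9fbf5deed922\n"
    b"author Alyssa P Hacker <alyssa@example.com> 1391129313 +0000\n"
    b"committer Alyssa P Hacker <alyssa@example.com> 1391129313 +0000\n"
    b"\n"
    b"Mined a Gitcoin!\n"
)


def mine(base_body, difficulty, max_tries=1_000_000):
    """Append nonce lines until the commit ID sorts below `difficulty`."""
    for nonce in range(max_tries):
        body = base_body + b"nonce %08x\n" % nonce
        digest = git_object_sha1(b"commit", body)
        if digest < difficulty:
            return nonce, digest
    raise RuntimeError("no nonce found within max_tries")
```

Calling `mine(BASE_BODY, "01")` needs a hash starting with `00`, so it should finish after a few hundred tries on average; each extra leading zero in the difficulty multiplies the expected work by 16.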
46. The problem
● Text search over ~100M of text
● Arbitrary substring search (not just whole
words)
● There is a “free” indexing stage
● Distribute across up to 4 nodes
● Each node is limited to 500M of RAM
47. Search 101: Inverted Index
/tree/A: “the quick brown fox jumps over …”
/tree/B: “the fox is red”
“the” → [A, B]
“quick” → [A]
“brown” → [A]
“fox” → [A, B]
“red” → [B]
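The mapping above can be built with a few lines of code. This is a minimal sketch that splits on whitespace only; `build_inverted_index` is a hypothetical name and the document texts are shortened versions of the examples above.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


DOCS = {
    "A": "the quick brown fox jumps over",
    "B": "the fox is red",
}
INDEX = build_inverted_index(DOCS)
```

A whole-word query is then a single dictionary lookup, for example `INDEX["fox"]`.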
48. Search 102: Arbitrary Substring
● “trigram index”
● Store an inverted index of trigrams
● “the quick brown fox …” →
“the”, “he_”, “e_q”, “_qu”, “qui”, “uic”, …
● To query, look up all trigrams in the search
term, and intersect
● Search(“rown”) →
index[“row”] ∩ index[“own”]
○ Check each result to verify the match
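A trigram index with the intersect-then-verify query described above could be sketched like this. The function names are hypothetical, and the fallback for search terms shorter than three characters (scan everything) is an assumption the slides do not cover.

```python
from collections import defaultdict


def trigrams(text):
    """All overlapping 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, docs, term):
    grams = trigrams(term)
    if grams:
        # Only documents containing every trigram can contain the term.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        candidates = set(docs)  # term too short to index: scan all documents
    # Trigram hits are necessary but not sufficient, so verify each candidate.
    return sorted(d for d in candidates if term in docs[d])


DOCS = {
    "A": "the quick brown fox jumps over",
    "B": "the fox is red",
}
INDEX = build_trigram_index(DOCS)
```

The verification step is needed because a document can contain all of a term's trigrams without containing them adjacently.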
49. Sharding
● We give you four nodes
● …but they all run on the same physical node
during grading
● And we didn’t resource-limit grading
containers (other than memory)
● So you don’t actually get more CPU, disk
I/O, or memory bandwidth
● Sharding ended up not really mattering
50. Winning the contest
● The spec is for arbitrary substring search
● But we only generate/query words from a
dictionary
● Some words are substrings of other
words…
● … but not too many
● Use an inverted index over dictionary words
51. Handling substrings
● Option A
○ substrings : word → [all words containing that word]
○ index : word → [list of lines containing that full word]
    for word in substrings[query]:
        results += index[word]
○ Can compute substrings table by brute search
○ When indexing lines, just split into words
● Option B
○ index : word → [all lines containing that word,
including as a substring]
○ Need to do the substring search as you index each
line
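Option A can be fleshed out as in the sketch below, which assumes queries are always dictionary words and lines are split on whitespace. The function names and the tiny dictionary are illustrative only.

```python
from collections import defaultdict


def build_substrings(dictionary):
    """Brute-force table: word -> every dictionary word containing it."""
    return {w: [other for other in dictionary if w in other] for w in dictionary}


def build_index(lines):
    """Whole-word inverted index: word -> line numbers containing that word."""
    index = defaultdict(list)
    for lineno, line in enumerate(lines):
        for word in set(line.split()):
            index[word].append(lineno)
    return index


def search(query, substrings, index):
    results = set()
    # Union the postings of every word that has the query as a substring.
    for word in substrings.get(query, []):
        results.update(index.get(word, []))
    return sorted(results)


DICTIONARY = ["cat", "catalog", "dog"]
LINES = ["my catalog", "a cat sat", "dog days"]
SUBSTRINGS = build_substrings(DICTIONARY)
INDEX = build_index(LINES)
```

The quadratic `build_substrings` pass depends only on the dictionary, so it can be computed once, while indexing itself stays a plain split into words.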
52. Other ways to beat the level
● Slurp the entire tree into RAM and use a
decent string search
○ (not java.lang.String.indexOf() -- that’s slow!)
● Shell out to grep
○ GNU grep is fast
56. Octopus
● (Grumpy) network simulator
● Submits queries and checks for correctness
● Several “monkeys” manipulate the network:
○ Netsplit monkey
○ Lagsplit monkey
○ SPOF monkey
○ etc.
57. Consensus Algorithms
● Raft (“In Search of an Understandable Consensus Algorithm”, Diego
Ongaro and John Ousterhout, 2013)
● Zab (“Zab: High-performance broadcast for primary-backup systems”,
Flavio P. Junqueira, Benjamin C. Reed, and Marco Serafini, 2011)
● Paxos (“The Part-Time Parliament”, Leslie Lamport, 1998, originally
submitted 1990)
● Viewstamped Replication (“Viewstamped Replication:
A New Primary Copy Method to Support Highly Available Distributed
Systems”, Brian M. Oki and Barbara H. Liskov, 1988)
etc. etc.
59. Gotchas: Idempotency
● Octopus sends node1 a commit
● Node1 forwards it to the leader, node0
● Node0 processes it and sends it back
● Octopus kills the node0 ⇔ node1 link
vs.
● Octopus sends node1 a commit
● Node1 forwards it to the leader, node0
● Octopus kills the node0 ⇔ node1 link
60. Gotchas: Idempotency
● How do you tell these two cases apart?
● Naive: resubmit the query to find out!
● If the query was processed, return the old
result
● Common trick: Idempotency tokens
UPDATE ctf3 SET friendCount=friendCount+10,
requestCount=requestCount+1, favoriteWord="jjfqcjamhpghnqq"
WHERE name="carl"; SELECT * FROM ctf3
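The idempotency-token trick can be illustrated with a small sketch: the forwarding node attaches a unique token to each query, and the leader remembers the result per token so a retry returns the old result instead of applying the update twice. The `Leader` class and its single counter are hypothetical stand-ins for a real replicated state machine.

```python
import uuid


class Leader:
    def __init__(self):
        self.friend_count = 0
        self.results = {}  # token -> result of the first execution

    def execute(self, token, increment):
        if token in self.results:
            # Already processed: return the old result, do not re-apply.
            return self.results[token]
        self.friend_count += increment
        result = f"friendCount={self.friend_count}"
        self.results[token] = result
        return result


leader = Leader()
token = str(uuid.uuid4())              # chosen once by the forwarding node
first = leader.execute(token, 10)
retry = leader.execute(token, 10)      # resubmitted after a dropped reply
```

With the token in place, the forwarding node no longer needs to know which of the two failure cases happened; resubmitting is safe in both.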
61. Making it Fast
● Every top solution replaced sql.go
● Two main strategies:
○ Write your own sqlite (or enough of it to pass)
○ Use sqlite bindings, :memory: database
● These perform roughly equally well
● Raft has a few timers you can tune
● Golf network traffic
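The `:memory:` strategy keeps the whole database inside the process instead of going through a file or an external binary. The top solutions were not written in Python; the sketch below only shows the idea using Python's standard `sqlite3` module. The column names come from the example query earlier in the deck, while the column types, the initial row and the helper names are assumptions.

```python
import sqlite3


def make_db():
    """Create an in-memory database holding one hypothetical ctf3 row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ctf3 (name TEXT PRIMARY KEY, friendCount INTEGER, "
        "requestCount INTEGER, favoriteWord TEXT)"
    )
    conn.execute("INSERT INTO ctf3 VALUES ('carl', 0, 0, '')")
    return conn


def apply_update(conn, name, word):
    """Run the UPDATE from the example query, then return all rows."""
    conn.execute(
        "UPDATE ctf3 SET friendCount = friendCount + 10, "
        "requestCount = requestCount + 1, favoriteWord = ? WHERE name = ?",
        (word, name),
    )
    return conn.execute("SELECT * FROM ctf3").fetchall()


conn = make_db()
rows = apply_update(conn, "carl", "jjfqcjamhpghnqq")
```

Because nothing touches disk, the state disappears when the connection closes, which is acceptable only because the consensus log is what provides durability.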
69. Leaderboard Top 10
● Everyone changed SQLite “bindings”
● Seven solutions used go-raft
● Two solutions used redirect-to-master
● One solution implemented Raft in C++
● If at first you don’t succeed…
○ Mean submissions: 1444 (stddev 1031)
○ Max: 3946
○ Min: 58 (the C++ solution)
70. Speculative: Time-based consensus
● All nodes were run on the same host
● Local clock: total ordering on events
● Leaderless state machine
● ???
● Profit?
● Similar ideas to Spanner (Google)
71. Greg Brockman
Andy Brody
Christian Anderson
Philipp Antoni
Carl Jackson
Jonas Schneider
Siddarth Chandrasekaran
Ludwig Pettersson
Nelson Elhage
Steve Woodrow
Jorge Ortiz