I had seen Torvalds’ talk on YouTube about git. But it wasn’t really about git; it was more about distributed version control. It answered many of my questions and clarified a lot of DVCS ideas for me. I still wasn’t sold on the whole idea, though, and I had no idea what it was good for.
What’s special about GitHub is that people use the site in spite of git. Many git haters use it because of what it is: not just a place to host git repositories, but a place to share code with others.
2008 january
We launched the beta in January at Steff’s on 2nd Street in San Francisco’s SOMA district. The first non-GitHub user was wycats, and the first project was merb-core. They wanted to use the site for their refactoring and 0.9 branch.
.com
This talk covers .com, as opposed to FI, which I’m not going to get into today. You’ll have to invite PJ out if you want to hear about that.
the web app
As everyone knows, a web “site” is really a bunch of different components. Some of them generate and deliver HTML to you, but most of them don’t. Either way, let’s start with the HTMLy parts.
rails
We use Ruby on Rails 2.2.2 as our web framework. It’s kept up to date with all the security patches, and it includes custom patches we’ve written ourselves as well as patches we’ve cherry-picked from more recent versions of Rails.
We found out in March 2008 that Rails was moving to GitHub. That was a bit of a surprise, since we had reached out to them earlier and they had turned us down.
rails
But there are entire presentations on Rails, so I’m not going to go further into it here. As for whether it scales, we’ll let you know when we find out, because so far it hasn’t come close to being a problem.
In fact, the Coderack competition opens voting to the public this week. Coders created and submitted dozens of Rack middleware for the competition. I was a judge, so I’ve already seen the submissions. Some of my favorites were:
chimney
All user routes are kept in Redis. Chimney is how our BERT-RPC clients know which server to hit. If Redis is down, it falls back to a local cache and auto-detection.
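The lookup order is Redis first, then a local cache, then auto-detection. Here is a minimal sketch of that fallback chain. It is not Chimney’s actual code. `RedisDown`, `primary_get`, and `detect_backend` are hypothetical stand-ins for a Redis client error, a Redis GET, and whatever probing auto-detection really does.

```python
from typing import Callable, Dict, Optional


class RedisDown(Exception):
    """Raised by the primary store when Redis can't be reached (hypothetical)."""


class RouteLookup:
    """Chimney-style route lookup: Redis first, then a local cache, then auto-detection.

    `primary_get` stands in for a Redis GET and `detect_backend` for whatever
    probing auto-detection does; both are illustrative assumptions.
    """

    def __init__(self,
                 primary_get: Callable[[str], Optional[str]],
                 detect_backend: Callable[[str], Optional[str]]) -> None:
        self.primary_get = primary_get
        self.detect_backend = detect_backend
        self.local_cache: Dict[str, str] = {}

    def route_for(self, user: str) -> Optional[str]:
        try:
            backend = self.primary_get(user)
        except RedisDown:
            # Redis is unavailable: use the last route we saw for this user,
            # and only auto-detect if we have no cached route.
            backend = self.local_cache.get(user) or self.detect_backend(user)
        if backend is not None:
            self.local_cache[user] = backend  # remember it for the next outage
        return backend
```

Every successful answer refreshes the local cache, so an outage only forces auto-detection for users the process has never routed before.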
chimney
Chimney can also be told that a backend is down. It was optimized for “connection refused” errors, but in practice those weren’t the real problem.
proxymachine
All anonymous git clones hit the front-end machines. The git-daemon connects to proxymachine, which uses Chimney to proxy your connection between the front-end machine and the back-end machine that holds the actual git repository. It’s very fast and transparent to you.
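Below is a rough sketch of the routing decision a proxy like this makes. It reads the first pkt-line of a `git://` request (a 4-hex-digit length, then `git-upload-pack /user/repo.git`, a NUL, `host=...`, and another NUL), pulls out the user, and picks a back end. This is a Python model, not proxymachine’s Ruby API. The `routes` dict is a hypothetical stand-in for the Chimney lookup, and the `host:port` return value is an assumption.

```python
import re
from typing import Dict, Optional

# First pkt-line of an anonymous clone, e.g.
# b"0036git-upload-pack /mojombo/grit.git\x00host=github.com\x00"
CLONE_REQUEST = re.compile(
    rb"^[0-9a-f]{4}git-upload-pack /([\w.\-]+)/[\w.\-]+\x00host=[\w.\-]+\x00"
)


def pick_backend(data: bytes, routes: Dict[str, str]) -> Optional[str]:
    """Return the back end ("host:port") that should serve this clone, if known.

    Returns None when the request line hasn't fully arrived yet (or isn't a
    clone), so the caller can wait for more bytes or drop the connection.
    """
    match = CLONE_REQUEST.match(data)
    if match is None:
        return None
    user = match.group(1).decode()
    return routes.get(user)  # stand-in for asking Chimney where this user lives
```

Only the user segment is needed to route. Once a back end is chosen, the proxy just shuttles bytes in both directions, which is why clients never notice it.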
fragments
Formerly, we invalidated most of our fragments using a generation scheme: you put a number into a bunch of related keys and increment it when you want all those caches to be missed (thus creating new cache entries with fresh data).
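A minimal sketch of generational cache keys follows. A plain dict stands in for memcached, and the `gen:<namespace>` key layout is an illustrative assumption, not GitHub’s actual key format.

```python
from typing import Any, Callable, Dict


class GenerationalCache:
    """Generation-based fragment invalidation over a dict standing in for memcached."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}

    def _generation(self, namespace: str) -> int:
        # The current generation number for a group of related keys lives under its own key.
        return self.store.setdefault(f"gen:{namespace}", 0)

    def key(self, namespace: str, name: str) -> str:
        return f"{namespace}:{self._generation(namespace)}:{name}"

    def fetch(self, namespace: str, name: str, compute: Callable[[], Any]) -> Any:
        k = self.key(namespace, name)
        if k not in self.store:
            self.store[k] = compute()  # miss: render the fragment and cache it
        return self.store[k]

    def invalidate(self, namespace: str) -> None:
        # Bumping the number makes every related key miss at once. The old
        # entries aren't deleted; they sit in the cache until they're evicted.
        self.store[f"gen:{namespace}"] = self._generation(namespace) + 1
```

Invalidation is a single increment no matter how many fragments share the namespace. The cost is that every bump leaves a full set of orphaned entries behind, taking up memory until the cache evicts them.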
fragments
But we had high cache eviction due to low RAM and hardware constraints, and found that the scheme did more harm than good. We also noticed that some cached data we wanted to keep forever was being evicted, because the slabs holding generational keys were filling up fast.
page
We cache entire pages using nginx’s memcached module. That’s mostly HTML, but also other data that gets hit a lot and changes rarely.
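This sketch models the whole-page flow. nginx looks the page up in memcached by key. On a hit, the app is never touched. On a miss, the request falls through to the app, which renders the page and stores it where nginx will look next time. It assumes the memcached key is the request URI (a common choice for nginx’s `$memcached_key`). A dict stands in for memcached, and `app` is a hypothetical render function.

```python
from typing import Callable, Dict


def serve(uri: str, cache: Dict[str, str], app: Callable[[str], str]) -> str:
    """Model of whole-page caching in front of the app.

    The lookup is what nginx does; the render-and-store half is the app's job,
    since nginx's memcached module only reads from the cache.
    """
    page = cache.get(uri)
    if page is not None:
        return page      # hit: served straight from memcached
    page = app(uri)      # miss: the request falls through to the app
    cache[uri] = page    # the app writes the page under the key nginx reads
    return page
```

Keeping the lookup in nginx makes a hit cost one memcached GET, with no Ruby process involved.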
sha asset id
Instead of
scripty 301
When we changed our wiki URL structure, we set up dynamic 301 redirects for the old URLs. Scriptaculous’ old wiki was getting hit so much that we put its redirect into nginx itself. This took strain off our web app and made the redirects happen almost instantly.
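A dynamic redirect is a pattern match from an old URL shape to a new one, answered with a 301. The talk doesn’t give the actual URL formats, so both the old `/wiki/<user>/<repo>/<page>` shape and the new `/<user>/<repo>/wikis/<page>` shape below are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical old wiki URL shape; the real patterns aren't given in the talk.
OLD_WIKI = re.compile(r"^/wiki/(?P<user>[\w.-]+)/(?P<repo>[\w.-]+)/(?P<page>[^/]+)$")


def wiki_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_location) for an old wiki URL, or None if it isn't one."""
    match = OLD_WIKI.match(path)
    if match is None:
        return None
    new_path = f"/{match.group('user')}/{match.group('repo')}/wikis/{match.group('page')}"
    return 301, new_path
```

For a single hot project, the same mapping can be expressed as an nginx `rewrite ... permanent;` rule, which returns the 301 before the request ever reaches Rails.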
ajax loading
We also load data via Ajax in many places. Sometimes a piece of information just takes too long to retrieve, and in those instances we usually load it in with Ajax.
If Walker sees that it doesn’t have all the information it needs, it kicks off a job to stick that information in memcached.
We then periodically hit a URL that checks whether the information is in memcached. If it is, we fetch it and rewrite the page with the new information.
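The check-cache, enqueue-job, poll pattern can be sketched roughly as follows. A dict stands in for memcached and a list for the background job queue. For brevity, the initial check and the polled URL are folded into one hypothetical `poll` method. The rule that a key is only enqueued once is an assumption, not something the talk states.

```python
from typing import Callable, Dict, List, Optional


class SlowDataEndpoint:
    """Poll-until-cached sketch: a dict stands in for memcached, a list for the job queue."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.queue: List[str] = []

    def poll(self, key: str) -> Optional[str]:
        """Return the data if it's cached; otherwise make sure a job is queued."""
        if key in self.cache:
            return self.cache[key]  # the client rewrites the page with this
        if key not in self.queue:
            self.queue.append(key)  # kick off the slow work once, not on every poll
        return None                 # not ready yet: the client polls again later

    def run_jobs(self, compute: Callable[[str], str]) -> None:
        """Worker side: compute each queued item and stick it in the cache."""
        while self.queue:
            key = self.queue.pop(0)
            self.cache[key] = compute(key)
```

The page itself never blocks on the slow data. It renders right away, and the browser fills in the missing piece once a poll comes back non-empty.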
test unit
We mostly use Ruby’s test/unit. We’ve experimented with other libraries, including test/spec, shoulda, and RSpec, but in the end we keep coming back to test/unit.
git fixtures
As many of our fixtures are git repositories, we specify in the test which SHA we expect to be the HEAD of that fixture. This means we can completely delete a git repository in one test, then have it back in a pristine state in another. We plan to move all our fixtures to a similar git-based system in the future.
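This is one way a test helper could bring a fixture repository back to the SHA a test expects. It re-clones the repo if an earlier test deleted it, then hard-resets and cleans it. It is a sketch that assumes `git` is on the PATH. The `restore_fixture` helper and the idea of keeping a separate untouched `pristine` copy are assumptions, not GitHub’s test code.

```python
import os
import subprocess


def git(*args: str, cwd: str) -> str:
    """Run a git command in `cwd` and return its stripped stdout."""
    result = subprocess.run(["git", *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def restore_fixture(pristine: str, work: str, expected_sha: str) -> None:
    """Bring the fixture at `work` back to `expected_sha`.

    `pristine` is a hypothetical untouched copy of the fixture repository;
    `work` is the copy tests are free to mutate or delete.
    """
    if not os.path.isdir(work):
        # An earlier test deleted the repository entirely: clone it back.
        subprocess.run(["git", "clone", "--quiet", pristine, work],
                       check=True, capture_output=True)
    git("reset", "--hard", "--quiet", expected_sha, cwd=work)
    git("clean", "-f", "-d", "-x", "-q", cwd=work)  # drop untracked leftovers
    head = git("rev-parse", "HEAD", cwd=work)
    if head != expected_sha:
        raise AssertionError(f"fixture HEAD is {head}, expected {expected_sha}")
```

Pinning the SHA in the test makes the expected state explicit. If the fixture drifts, the helper fails loudly instead of letting tests run against the wrong history.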
ci joe
We use CI Joe, a continuous integration server, to run our tests after each push. He then notifies us if the tests fail.
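Stripped down, a push-triggered CI run executes the test command and reports only on failure. The sketch below illustrates that loop in Python and is not CI Joe itself. `test_command` and `notify` are hypothetical stand-ins for the project’s test runner and whatever notification channel is used.

```python
import subprocess
from typing import Callable, Sequence


def run_build(test_command: Sequence[str], notify: Callable[[str], None]) -> bool:
    """Run the test suite once (e.g. triggered by a push) and notify on failure.

    Returns True when the tests pass. `test_command` and `notify` are
    hypothetical; CI Joe's own configuration is not shown here.
    """
    result = subprocess.run(list(test_command), capture_output=True, text=True)
    if result.returncode != 0:
        notify(f"Build failed:\n{result.stdout}{result.stderr}")
        return False
    return True
```

Staying quiet on success keeps the notifications meaningful: a message from CI always means someone needs to look.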
staging
We also always deploy the current branch to staging. This means you can be working on your branch, someone else can be working on theirs, and you don’t need to reconcile the two to test out a feature. That’s one of the best parts of git.