GITHUB
DOWN THE RABBIT HOLE
     Vagmi Mudumbai
    @vagmi, @dharanasoft
github.com/vagmi

DISTRIBUTED
      FAST
TOTALLY AWESOME


https://github.com/defunkt
https://github.com/mojombo
https://github.com/pjhyett
AND A LOT MORE
GENERIC ARCHITECTURE
Service Frontend / Proxy Balancer
Analytics / Monitoring Services
App Server x 4
Queue / Async Server
Persistence Cluster x 2
ONLY ONE PROBLEM
Multiple Service Points
SSH, Git, Smart HTTP, HTTP API, HTTP Frontend
TECHNICAL CHALLENGES
• Git relies on the filesystem to store its repositories
  • Blob objects
  • Packfiles
  • Git GC
• SSH relies on the filesystem (~/.ssh/authorized_keys)
• Ergo, GitHub is bloody hard to scale
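
To make the filesystem dependence concrete, here is a minimal Python sketch of how Git lays out a loose blob object on disk: the object id is the SHA-1 of a `blob <size>\0` header plus the content, and the zlib-compressed bytes are stored under `objects/<first two hex chars>/<remaining 38>`. The function names `blob_id` and `write_loose_blob` are made up for this illustration; packfiles and GC are not covered.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def blob_id(content: bytes) -> str:
    """Return the Git object id (SHA-1) for a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def write_loose_blob(objects_dir: Path, content: bytes) -> str:
    """Write a blob as a loose object under objects_dir and return its id.

    Illustrative helper (not part of Git): it mirrors the on-disk layout
    objects/<2 hex chars>/<38 hex chars> with zlib-compressed contents.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    oid = hashlib.sha1(header + content).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # objects are content-addressed, so never rewritten
        path.write_bytes(zlib.compress(header + content))
    return oid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        oid = write_loose_blob(Path(tmp), b"hello\n")
        print(oid, "->", Path(tmp) / oid[:2] / oid[2:])
```

Every read or write of a repository ends up as file operations like these, which is why the repositories cannot simply be moved behind a stateless application tier.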
GITHUB IS NOT ON THE CLOUD



THE HTTP STACK
ldirectord (a really smart load balancer): lb1a, lb1b
Frontends: fe1, fe2, fe3, fe4 (8 core, 16 GB)
16 Unicorn processes: u0 to uf
db1a, db1b (8 core, 16 GB, 15k RPM SAS drives)
MYSQL IS THE CACHE



16 Unicorn processes: u0 to uf
ProxyMachine (16 instances): fe1, fe2, fe3, fe4
Smoke, Chimney (maps user/repo to machine/folder): db1a, db1b
Ernie, Grit: fs1a, fs2a, fs3a, fs4a
             fs1b, fs2b, fs3b, fs4b
             (8 core, 16 GB, 6x300 GB 15k RPM SAS, RAID 10)
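
The "maps user/repo to machine/folder" step can be pictured as a small lookup table. The Python sketch below is an illustration under stated assumptions, not Chimney's actual implementation: the `Route` and `RouteTable` names, the in-memory dictionary, and the `/data/repositories/...` folder layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Where a repository lives: a file server name and a folder on it."""
    machine: str
    folder: str


class RouteTable:
    """Hypothetical routing table: each user is pinned to one file server."""

    def __init__(self, root: str = "/data/repositories") -> None:
        self._root = root  # assumed folder layout, for illustration only
        self._machine_for_user: dict[str, str] = {}

    def assign(self, user: str, machine: str) -> None:
        """Record which file server holds this user's repositories."""
        self._machine_for_user[user] = machine

    def lookup(self, user: str, repo: str) -> Route:
        """Resolve user/repo to a machine and folder; KeyError if unknown."""
        machine = self._machine_for_user[user]
        return Route(machine, f"{self._root}/{user}/{repo}.git")


if __name__ == "__main__":
    table = RouteTable()
    table.assign("mojombo", "fs1a")
    print(table.lookup("mojombo", "jekyll"))
```

With a table like this, the proxy layer on the frontends only needs the `user/repo` part of a request to decide which fs machine should serve it.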
ALL THIS HAPPENS IF THE CACHE IS A MISS



16 Unicorn processes: u0 to uf
memcache1 on fs1b, memcache2 on fs2b, memcache3 on fs3b, memcache4 on fs4b
12 GB of memcached on each fs slave
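
The cache-hit versus cache-miss behaviour can be sketched as a cache-aside lookup. This Python example is only an illustration: a plain dictionary stands in for memcached, and `CacheAside` and `slow_load` are hypothetical names representing the cache layer and the expensive trip through the proxy, routing, and file server layers.

```python
from typing import Callable, Dict


class CacheAside:
    """Check the cache first; call the slow loader only on a miss."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load                  # expensive path, used on a miss
        self._store: Dict[str, str] = {}   # stand-in for memcached
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._store:             # hit: the backend is never touched
            return self._store[key]
        self.misses += 1                   # miss: do the full round trip
        value = self._load(key)
        self._store[key] = value           # remember it for the next request
        return value


if __name__ == "__main__":
    def slow_load(key: str) -> str:
        # Hypothetical stand-in for reading Git data from a file server.
        return f"data for {key}"

    cache = CacheAside(slow_load)
    cache.get("mojombo/jekyll:HEAD")
    cache.get("mojombo/jekyll:HEAD")
    print("misses:", cache.misses)
```

Only the first request for a key pays for the full path; repeats are answered from memory.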


The SSH Stack
SSH STACK
ldirectord (a really smart load balancer): lb1a, lb1b
Custom SSHD, Gerve: fe1, fe2, fe3, fe4
Chimney: db1a, db1b
git: fs1a, fs2a, fs3a, fs4a
     fs1b, fs2b, fs3b, fs4b
GITD STACK
ldirectord (a really smart load balancer): lb1a, lb1b
ProxyMachine: fe1, fe2, fe3, fe4
Chimney: db1a, db1b
gitd: fs1a, fs2a, fs3a, fs4a
      fs1b, fs2b, fs3b, fs4b
WHAT ABOUT QUEUES?



http://github.com/defunkt/resque
WHAT ABOUT ARCHIVE DOWNLOADS?




WHAT ABOUT THE WIKI?



GITHUB PAGES
https://github.com/mojombo/jekyll




SHAMELESSLY STOLEN FROM

https://github.com/blog/530-how-we-made-github-fast




http://ruby-lang.org, http://rubyonrails.org/
https://github.com/mojombo/ernie, http://www.erlang.org/, http://bert-rpc.org/
http://git-scm.com
http://nginx.org/
http://unicorn.bogomips.org/
http://mysql.com/
ldirectord: http://horms.net/projects/ldirectord/
http://www.drbd.org/
http://haproxy.1wt.eu/
http://redis.io/
http://nodejs.org/





GitHub - Down the Rabbit Hole