git internals
dbyrne
23andMe
agenda
1. data ... in two commits
a. blobs
b. trees
c. commits
2. algorithms ... in six commits
a. commit
b. branch
c. merge
d. reset
objects
commits
trees
blobs
objects, blobs
$ git init objects_example
$ cd objects_example
$ echo "puts 'hello world'" > foo.rb
$ git add foo.rb
$ strings .git/index | grep hello
$ strings .git/index | grep foo.rb
foo.rb
objects, blobs
$ find .git/objects -type f
.git/objects/aa/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
$ git cat-file -t aaaa # what type of object is this?
blob
$ git cat-file -p aaaa # print the object
puts 'hello world'
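What `git cat-file` does for a loose object can be sketched in a few lines: the file under `.git/objects` is zlib-compressed and starts with a `<type> <size>` header followed by a NUL byte. This is a minimal sketch assuming a SHA-1 repository and loose (unpacked) objects; `write_loose_object` and `read_loose_object` are hypothetical helper names, not git APIs.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir, obj_type, payload):
    """Store payload the way a loose object is laid out and return its id."""
    store = f"{obj_type} {len(payload)}".encode() + b"\0" + payload
    oid = hashlib.sha1(store).hexdigest()
    # The first two hex digits become the directory, the other 38 the file name.
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(zlib.compress(store))
    return oid


def read_loose_object(git_dir, oid):
    """Return (type, payload) for a full 40-character object id."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as fh:
        store = zlib.decompress(fh.read())
    header, _, payload = store.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(payload):
        raise ValueError("size in header does not match payload")
    return obj_type, payload


if __name__ == "__main__":
    # A throwaway directory stands in for .git so no real repository is touched.
    with tempfile.TemporaryDirectory() as git_dir:
        oid = write_loose_object(git_dir, "blob", b"puts 'hello world'\n")
        print(read_loose_object(git_dir, oid))
```

The type returned here corresponds to `cat-file -t` and the payload to `cat-file -p` for a blob.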
objects, blobs
$ echo "puts 'hello world'" > bar.rb
$ git add bar.rb
$ strings .git/index | grep bar.rb
bar.rb
$ find .git/objects -type f # no new blob
.git/objects/aa/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
objects
$ git commit -m 'commit msg #1'
$ find .git/objects -type f
.git/objects/aa/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
.git/objects/bb/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
.git/objects/cc/cccccccccccccccccccccccccccccccccccccc
objects, trees
$ git cat-file -t bbbb
tree
$ git cat-file -p bbbb
100644 blob aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa bar.rb
100644 blob aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa foo.rb
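The listing above is a pretty-printed view. The raw tree payload is a sequence of `<mode> <name>`, a NUL byte, and the 20 raw bytes of the object id. The sketch below builds and parses that layout; `build_tree` and `parse_tree` are hypothetical names, and the plain sort by name is a simplification of git's real ordering rule for directories.

```python
def build_tree(entries):
    """Serialize (mode, hex_id, name) entries into a raw tree payload.

    Simplification: entries are sorted by plain name, which matches git
    only when no directory name is a prefix of a sibling's name.
    """
    out = b""
    for mode, hex_id, name in sorted(entries, key=lambda e: e[2]):
        out += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return out


def parse_tree(raw):
    """Yield (mode, type, hex_id, name), similar to `git cat-file -p <tree>`."""
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)
        nul = raw.index(b"\0", space)
        mode = raw[pos:space].decode()
        name = raw[space + 1:nul].decode()
        hex_id = raw[nul + 1:nul + 21].hex()  # 20 raw bytes for SHA-1
        if mode == "40000":
            kind = "tree"
        elif mode == "160000":
            kind = "commit"  # submodule entry
        else:
            kind = "blob"
        # Stored modes drop the leading zero; the printed form pads to 6 digits.
        yield mode.zfill(6), kind, hex_id, name
        pos = nul + 21


if __name__ == "__main__":
    raw = build_tree([
        ("100644", "aa" * 20, "foo.rb"),
        ("100644", "aa" * 20, "bar.rb"),
    ])
    for entry in parse_tree(raw):
        print(*entry)
```

Both entries carry the same id because both names point at the one blob created earlier.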
objects, commits
$ git cat-file -t cccc
commit
$ git cat-file -p cccc
tree bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
author dbyrne <dennis.byrne@gmail.com> 1474171822 -0700
committer dbyrne <dennis.byrne@gmail.com> 1474171822 -0700
commit msg #1
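A commit payload is plain text: header lines (`tree`, zero or more `parent`, `author`, `committer`), a blank line, then the message. A rough parser, assuming no multi-line headers such as signatures; `parse_commit` is a hypothetical helper name.

```python
def parse_commit(text):
    """Split a commit payload into (headers, message).

    headers maps each key to a list, because `parent` can appear
    zero times (root commit), once, or several times (merge).
    """
    head, _, message = text.partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers, message


if __name__ == "__main__":
    payload = (
        "tree " + "b" * 40 + "\n"
        "author dbyrne <dennis.byrne@gmail.com> 1474171822 -0700\n"
        "committer dbyrne <dennis.byrne@gmail.com> 1474171822 -0700\n"
        "\n"
        "commit msg #1\n"
    )
    headers, message = parse_commit(payload)
    print(headers["tree"], headers.get("parent", []), message.strip())
```

The first commit has no `parent` header at all, which is what makes it a root commit.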
[diagram: commit c -> tree b -> blob a, listed as both foo.rb and bar.rb]
objects
$ mkdir dir
$ echo "puts 'hello world'" > dir/baz.rb
$ git add dir/baz.rb
$ git commit -m 'commit msg #2'
objects
$ find .git/objects -type f
.git/objects/aa/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
.git/objects/bb/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
.git/objects/cc/cccccccccccccccccccccccccccccccccccccc
.git/objects/dd/dddddddddddddddddddddddddddddddddddddd <- new
.git/objects/ee/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee <- new
.git/objects/ff/ffffffffffffffffffffffffffffffffffffff <- new
objects
$ git cat-file -p HEAD^{commit} # same as ffff
tree eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
parent cccccccccccccccccccccccccccccccccccccccc
author dbyrne <dennis.byrne@gmail.com> 1474172113 -0700
committer dbyrne <dennis.byrne@gmail.com> 1474172113 -0700
commit msg #2
objects
$ git cat-file -p HEAD^{tree} # same as eeee
100644 blob aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa bar.rb
040000 tree dddddddddddddddddddddddddddddddddddddddd dir
100644 blob aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa foo.rb
objects
$ git cat-file -t dddd
tree
$ git cat-file -p dddd
100644 blob aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa baz.rb
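[diagram: commit c -> tree b (/) -> blob a (foo.rb, bar.rb); commit f -> parent c; commit f -> tree e (/) -> blob a (foo.rb, bar.rb) and tree d (/dir) -> blob a (baz.rb)]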
objects
there are three types of objects: blobs, trees & commits
blobs represent files
blobs are created with the add, not the commit command
blobs are snapshots, not deltas or patches
the add command records snapshots every time
a file can be both “staged” and “modified”
trees represent directories
trees point to blobs and/or trees
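The snapshot points above can be made concrete with the blob id itself: it is a hash of the content only, so `git add` records whatever the file holds at that moment, and a later edit makes the file both staged and modified. A minimal sketch for SHA-1 repositories; `blob_id` and `file_state` are hypothetical helpers, not git commands.

```python
import hashlib


def blob_id(content):
    """Object id of a blob holding these bytes (SHA-1 repositories)."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def file_state(head_id, index_id, worktree_content):
    """Classify one path by comparing HEAD, the index and the working file."""
    states = []
    if index_id != head_id:
        states.append("staged")      # index differs from the last commit
    if blob_id(worktree_content) != index_id:
        states.append("modified")    # working file differs from the index
    return states or ["clean"]


if __name__ == "__main__":
    committed = blob_id(b"print 'commit #4'\n")
    staged = blob_id(b"print 'commit #4'\nlost?\n")   # snapshot taken by `git add`
    worktree = b"print 'commit #4'\nlost?\nno\n"      # edited again after the add
    print(file_state(committed, staged, worktree))    # ['staged', 'modified']
```

Identical content always yields the same id, which is why adding a second file with the same content creates no new blob.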
refs
$ ls .git/HEAD
.git/HEAD <- important
$ tree .git/refs/
├── heads <- important
│ └── master
├── remotes
├── stash
└── tags
git commit
$ git init refs_example
$ cd refs_example
$ echo "print 'commit #1'" > spam.py
$ git add spam.py
$ git commit -m 'commit msg #1'
[master (root-commit) 1111111] commit msg #1
git commit
$ ls .git/objects/11
11111111111111111111111111111111111111
$ cat .git/refs/heads/master
1111111111111111111111111111111111111111
$ cat .git/HEAD
ref: refs/heads/master
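The two `cat` commands above are the whole lookup: HEAD names a ref, and the ref file holds a commit id. A small sketch of that resolution, assuming loose ref files only (packed refs are ignored); `resolve_head` is a hypothetical helper name.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Return (ref_name_or_None, commit_id_or_None) for HEAD."""
    with open(os.path.join(git_dir, "HEAD")) as fh:
        head = fh.read().strip()
    if not head.startswith("ref: "):
        return None, head                 # detached HEAD: the file holds a commit id
    ref = head[len("ref: "):]
    ref_path = os.path.join(git_dir, *ref.split("/"))
    if not os.path.exists(ref_path):
        return ref, None                  # no commit yet, or the ref is packed
    with open(ref_path) as fh:
        return ref, fh.read().strip()


if __name__ == "__main__":
    # Build a tiny stand-in for .git rather than reading a real repository.
    with tempfile.TemporaryDirectory() as git_dir:
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as fh:
            fh.write("1" * 40 + "\n")
        with open(os.path.join(git_dir, "HEAD"), "w") as fh:
            fh.write("ref: refs/heads/master\n")
        print(resolve_head(git_dir))
```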
git commit
[diagram: HEAD -> master -> 1111]
top to bottom
HEAD -> branch -> commit -> tree -> blob (5 levels)
git commit
$ echo "print 'commit #2'" >> spam.py
$ git commit -am 'commit msg #2'
[master 2222222] commit msg #2
$ cat .git/HEAD
ref: refs/heads/master
git commit
[diagram: HEAD -> master -> 2222 -> 1111]
git commit
$ git show-ref master --abbrev
2222222 refs/heads/master
$ git cat-file -p 2222 | grep parent
parent 1111111111111111111111111111111111111111
git branch
$ git branch feature_branch
$ ls .git/refs/heads/
feature_branch
master
$ git show-ref --abbrev
2222222 refs/heads/feature_branch
2222222 refs/heads/master
git branch
[diagram: HEAD -> master -> 2222 -> 1111; feature_branch -> 2222]
git checkout
$ cat .git/HEAD
ref: refs/heads/master <- before
$ git checkout feature_branch
$ cat .git/HEAD
ref: refs/heads/feature_branch <- after
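At the ref level, the branch and checkout steps are two tiny file writes. The sketch below models only that part and assumes loose refs; a real `git checkout` also updates the index and working directory. `create_branch` and `point_head_at` are hypothetical helper names.

```python
import os
import tempfile


def create_branch(git_dir, name, commit_id):
    """Model of `git branch <name>`: one small file under refs/heads."""
    path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(commit_id + "\n")


def point_head_at(git_dir, name):
    """Model of the HEAD update performed by `git checkout <name>`."""
    with open(os.path.join(git_dir, "HEAD"), "w") as fh:
        fh.write(f"ref: refs/heads/{name}\n")


if __name__ == "__main__":
    # A throwaway directory stands in for .git.
    with tempfile.TemporaryDirectory() as git_dir:
        create_branch(git_dir, "master", "2" * 40)
        create_branch(git_dir, "feature_branch", "2" * 40)  # same commit, new name
        point_head_at(git_dir, "feature_branch")
        print(sorted(os.listdir(os.path.join(git_dir, "refs", "heads"))))
        with open(os.path.join(git_dir, "HEAD")) as fh:
            print(fh.read().strip())
```

No objects are copied in either step, which is why branching is cheap.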
git checkout
[diagram: HEAD -> feature_branch -> 2222 -> 1111; master -> 2222]
working on a branch
$ echo "print 'commit #3'" > eggs.py
$ git add eggs.py
$ git commit -m 'commit msg #3'
[feature_branch 3333333] commit msg #3
$ git show-ref --abbrev
3333333 refs/heads/feature_branch
2222222 refs/heads/master
working on a branch
[diagram: HEAD -> feature_branch -> 3333 -> 2222 -> 1111; master -> 2222]
divergence
$ git checkout master
$ echo "print 'commit #4'" > spam.py
$ git commit -am 'commit msg #4'
[master 4444444] commit msg #4
divergence
[diagram: HEAD -> master -> 4444 -> 2222 -> 1111; feature_branch -> 3333 -> 2222]
git merge
$ git merge --no-edit feature_branch
Merge made by the 'recursive' strategy. <- important
eggs.py | 1 +
1 file changed, 1 insertion(+)
create mode 100644 eggs.py
git merge
[diagram: HEAD -> master -> 5555, with parents 4444 and 3333; feature_branch -> 3333; 4444 and 3333 -> 2222 -> 1111]
git lol
$ git log --graph --oneline --decorate
* 5555555 (HEAD -> master) Merge branch 'feature_branch'
|\
| * 3333333 (feature_branch) commit msg #3
* | 4444444 commit msg #4
|/
* 2222222 commit msg #2
* 1111111 commit msg #1
git merge
$ git show-ref master --abbrev
5555555 refs/heads/master
$ git cat-file -p 5555 | grep parent
parent 4444444444444444444444444444444444444444 <- master
parent 3333333333333333333333333333333333333333 <- feature_branch
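Before it can create a two-parent commit, a merge needs the point where the two histories diverged. A toy model of that search over the commit graph from these slides, using the abbreviated ids as dictionary keys; `ancestors` and `merge_bases` are hypothetical helpers, and real git uses a more efficient traversal.

```python
def ancestors(parents, start):
    """All commits reachable from start by following parent links (inclusive)."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }


if __name__ == "__main__":
    # History just before the merge: master at 4444, feature_branch at 3333.
    parents = {"1111": [], "2222": ["1111"], "3333": ["2222"], "4444": ["2222"]}
    print(merge_bases(parents, "4444", "3333"))   # {'2222'}
    # The merge commit then records both tips as parents, first parent = master.
    parents["5555"] = ["4444", "3333"]
```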
merge vs. rebase
merge: record of what actually happened
vs.
rebase: story of how your project was made
git reset (to a commit)
                        change index?   change working dir?   working dir safe?
git reset --soft SHA    No              No                    Yes
git reset --mixed SHA   Yes             No                    Yes
git reset --hard SHA    Yes             Yes                   No
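The three modes differ only in how far the reset reaches: every mode moves the branch, `--mixed` also rewrites the index, and `--hard` also rewrites the working directory. A toy model with plain dictionaries; the `state` and `target` layouts are assumptions made for illustration, not git data structures.

```python
def reset(state, target, mode="mixed"):
    """Return a new state after `git reset --<mode> <target>`."""
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    new = {
        "head": target["commit"],             # every mode moves the branch
        "index": dict(state["index"]),
        "worktree": dict(state["worktree"]),
    }
    if mode in ("mixed", "hard"):
        new["index"] = dict(target["files"])      # index matches the target commit
    if mode == "hard":
        new["worktree"] = dict(target["files"])   # local edits are overwritten
    return new


if __name__ == "__main__":
    target = {"commit": "4444", "files": {"spam.py": "v4"}}
    state = {
        "head": "5555",
        "index": {"spam.py": "v4", "eggs.py": "v3"},
        "worktree": {"spam.py": "v4 + local edit", "eggs.py": "v3"},
    }
    for mode in ("soft", "mixed", "hard"):
        print(mode, reset(state, target, mode))
```

Only the `hard` result loses the local edit, which is the "working dir safe? No" cell.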
git reset
$ git show-ref --abbrev
3333333 refs/heads/feature_branch
5555555 refs/heads/master
$ git reset --hard HEAD^
HEAD is now at 4444444 commit msg #4
$ git show-ref --abbrev
3333333 refs/heads/feature_branch
4444444 refs/heads/master
git reset
[diagram: HEAD -> master -> 4444 -> 2222 -> 1111; feature_branch -> 3333 -> 2222; 5555 is no longer referenced by any branch]
git reset, simplified
[diagram: the same graph without 5555: HEAD -> master -> 4444 -> 2222 -> 1111; feature_branch -> 3333 -> 2222]
git rebase
$ git show-ref feature_branch --abbrev
3333333 refs/heads/feature_branch
$ git checkout feature_branch
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: commit msg #3
$ git show-ref feature_branch --abbrev
6666666 refs/heads/feature_branch
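The id changes from 3333 to 6666 because a commit's id is a hash of its payload, and the payload names its parent. The sketch below hashes two otherwise identical commits with different parents. It assumes a SHA-1 repository and reuses one author line for the committer; in a real rebase the tree and committer timestamp usually differ too. `commit_id` is a hypothetical helper.

```python
import hashlib


def commit_id(tree, parents, author, message):
    """Hash a commit payload the way git hashes objects (SHA-1 repositories)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


if __name__ == "__main__":
    who = "dbyrne <dennis.byrne@gmail.com> 1474172113 -0700"
    tree = "e" * 40                      # placeholder tree id
    before = commit_id(tree, ["2" * 40], who, "commit msg #3")
    after = commit_id(tree, ["4" * 40], who, "commit msg #3")
    print(before != after)               # True: new parent, new commit id
```

Rebase therefore never moves a commit; it writes a new one and repoints the branch.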
git rebase
[diagram: HEAD -> feature_branch -> 6666 -> 4444 -> 2222 -> 1111; master -> 4444; 3333 is no longer referenced by any branch]
git rebase, simplified
[diagram: the same graph without 3333: HEAD -> feature_branch -> 6666 -> 4444 -> 2222 -> 1111; master -> 4444]
git merge - fast forward
$ git checkout master
$ git merge feature_branch
Updating 4444444..6666666
Fast-forward <- important
eggs.py | 1 +
1 file changed, 1 insertion(+)
create mode 100644 eggs.py
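A fast-forward is possible when the current branch tip is an ancestor of the branch being merged; the merge then only moves a ref and creates no commit. A toy check over the graph from these slides; `is_ancestor` and `try_fast_forward` are hypothetical helpers and `refs` is a plain dictionary standing in for `.git/refs/heads`.

```python
def is_ancestor(parents, old, new):
    """True if `old` is reachable from `new` by following parent links."""
    seen, stack = set(), [new]
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False


def try_fast_forward(refs, parents, current, other):
    """Move `current` to `other`'s commit when no merge commit is needed."""
    if is_ancestor(parents, refs[current], refs[other]):
        refs[current] = refs[other]
        return True
    return False


if __name__ == "__main__":
    parents = {"1111": [], "2222": ["1111"], "4444": ["2222"], "6666": ["4444"]}
    refs = {"master": "4444", "feature_branch": "6666"}
    print(try_fast_forward(refs, parents, "master", "feature_branch"), refs)
```

With the earlier, diverged history (master at 4444, feature_branch at 3333) the check fails, which is when git falls back to a real merge commit.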
git merge - fast forward
[diagram: HEAD -> master -> 6666 -> 4444 -> 2222 -> 1111; feature_branch -> 6666]
git merge - fast forward
$ git show-ref --abbrev
6666666 refs/heads/feature_branch
6666666 refs/heads/master
$ ls
eggs.py
spam.py
git reflog
$ git reflog
6666666 HEAD@{0}: merge feature_branch: Fast-forward
...
6666666 HEAD@{3}: rebase: commit msg #3
...
4444444 HEAD@{6}: reset: moving to HEAD^
5555555 HEAD@{7}: merge feature_branch: Merge made by the 'recursive' strategy.
...
git fsck
$ echo "lost?" >> spam.py && git add spam.py
$ echo "no" >> spam.py && git add spam.py
$ git fsck --unreachable
unreachable blob 7777777777777777777777777777777777777777
$ git cat-file -p 7777
print 'commit #4'
lost?
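"Unreachable" is a graph property: start from every root (branches, the index, reflog entries), follow references, and whatever is never visited is unreachable. A toy model of that walk; apart from 7777 the object names are made-up labels, and `reachable` and `unreachable` are hypothetical helpers.

```python
def reachable(objects, roots):
    """Ids reachable from roots; objects maps id -> list of referenced ids."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(objects[oid])
    return seen


def unreachable(objects, roots):
    """Ids present in the object store that no root leads to."""
    return set(objects) - reachable(objects, roots)


if __name__ == "__main__":
    objects = {
        "commit-6666": ["tree-6666", "commit-4444"],
        "commit-4444": ["tree-4444"],
        "tree-6666": ["blob-spam", "blob-eggs"],
        "tree-4444": ["blob-spam"],
        "blob-spam": [],
        "blob-eggs": [],
        "7777": [],       # spam.py after the first `git add` ("lost?")
        "blob-no": [],    # spam.py after the second `git add` ("no")
    }
    # Roots: the branch tip plus the blob the index currently points at.
    roots = ["commit-6666", "blob-no"]
    print(unreachable(objects, roots))   # {'7777'}
```

The second `git add` replaced the index entry, so nothing points at the first snapshot any more.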
git gc
$ git fsck --unreachable
unreachable blob 7777777777777777777777777777777777777777
$ git gc --prune=all
$ git fsck --unreachable
$ git cat-file -p 7777
fatal: Not a valid object name 7777
detached head state
$ git checkout 5555555
<(HEAD detached at 5555555)> $ cat .git/HEAD
5555555555555555555555555555555555555555
<(HEAD detached at 5555555)> $ # look around, even do work
$ git checkout -b nostalgia
$ git branch
feature_branch
master
* nostalgia
trends
