GIT INSIDE OUT
MICHAEL NADEL
▸Developer @ Pine River Capital Management
▸New(ish) to .NET
▸3-year Git practitioner
▸Please reach out!
▸michael.nadel@gmail.com
▸@mnadel
GIT IS HARD
▸Linus Torvalds, creator of Git (and Linux)
▸Initial revision of “git”, the information manager from hell
▸I didn’t really expect anyone to use it because it’s so hard to use.
▸Andrew Morton, lead Linux kernel developer
▸Git is expressly designed to make you feel less intelligent than you thought you were.
THE CHALLENGE WITH GIT
▸Plenty of rope
▸Paradigm shifts
▸Distributed
▸Content-addressable filesystem
WHY INSIDE OUT?
AGENDA
▸Paradigm shifts
▸Conceptual models
▸Overview of internals
▸Dissect common operations
DISTRIBUTED
▸No central authority (except by convention)
[Diagram: your repo/clone and others’ repos, exchanging history via clone, fetch, and push]
COMMITTING != SHARING
▸Separate concerns
▸Crafting your history
▸Publishing your history
▸Richer workflows
▸Commit, commit, commit, squash, push
▸Reorder, push subset
▸Enforced code reviews
CONTENT ADDRESSABLE
▸Version control is an abstraction on top of a primitive key/value store
▸hash-object
▸cat-file
▸Prove it: cat-file performs no magic
CONTENT ADDRESSABLE
▸Given arbitrary content
▸Git primitive: hash-object
▸Git returns its key (SHA-1 hash of a type/size header plus the contents)
▸Git primitive: cat-file
▸Git returns the original content
▸Final proof!
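The hash-object / cat-file round trip can be sketched in a few lines of Python. This is a minimal model assuming Git's loose-object format: the key is the SHA-1 of a `<type> <size>\0` header followed by the content, and the object is stored zlib-compressed at `objects/<first 2 hex chars>/<remaining 38>`. `hash_object`, `write_object`, and `cat_file` are hypothetical helpers named after the Git commands, not Git's own code, and packfiles are ignored.

```python
import hashlib
import os
import tempfile
import zlib


def _header(obj_type: str, content: bytes) -> bytes:
    # Every object is prefixed with "<type> <size>\0" before hashing/storing
    return f"{obj_type} {len(content)}\0".encode()


def hash_object(content: bytes, obj_type: str = "blob") -> str:
    """The key: SHA-1 of header + content, as 40 hex characters."""
    return hashlib.sha1(_header(obj_type, content) + content).hexdigest()


def write_object(git_dir: str, content: bytes, obj_type: str = "blob") -> str:
    """Store header + content, zlib-compressed, at objects/ab/cdef..."""
    key = hash_object(content, obj_type)
    obj_dir = os.path.join(git_dir, "objects", key[:2])
    os.makedirs(obj_dir, exist_ok=True)
    with open(os.path.join(obj_dir, key[2:]), "wb") as f:
        f.write(zlib.compress(_header(obj_type, content) + content))
    return key


def cat_file(git_dir: str, key: str) -> bytes:
    """Look the key up, decompress, drop the header: no magic."""
    with open(os.path.join(git_dir, "objects", key[:2], key[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    _header_bytes, _, content = raw.partition(b"\0")
    return content


with tempfile.TemporaryDirectory() as git_dir:
    key = write_object(git_dir, b"test content\n")
    print(key)                     # d670460b4b4aece5915caf5c68d12f560a9fe3e4
    print(cat_file(git_dir, key))  # b'test content\n'
```

Reading an object back is just decompress-and-strip-header, which is the point of the "no magic" proof.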
CONTENT ADDRESSABLE FILESYSTEM
▸Instead of text, how about your filesystem?
CONCEPTUAL MODELS
▸Git as a Database
▸Store, retrieve, search your source code & its history
▸Git as a Graph
▸CRUD operations are performed against a graph of commits
GIT AS A DATABASE
▸CRUD, search operations
▸Data types
▸Commit: structured text
▸Tree: structured list of (mode, name, hash) entries
▸Blob: byte[]
DB TYPE - COMMIT
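A commit is small structured text. The sketch below assumes the basic layout `git cat-file -p` shows for a commit (a `tree` line, one `parent` line per parent, `author` and `committer` lines, a blank line, then the message) and hashes it with the same header scheme as a blob. `commit_body` is a hypothetical helper, and the signature string and messages are made-up placeholders.

```python
import hashlib
from typing import Sequence


def hash_object(content: bytes, obj_type: str) -> str:
    """Key = SHA-1 of '<type> <size>\\0' + content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def commit_body(tree: str, parents: Sequence[str], author: str,
                committer: str, message: str) -> bytes:
    """Serialize a commit: tree line, one parent line per parent,
    author and committer lines, a blank line, then the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}"]
    return ("\n".join(lines) + "\n\n" + message).encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of an empty tree
SIG = "A U Thor <author@example.com> 1112911993 -0700"  # placeholder signature

root = commit_body(EMPTY_TREE, [], SIG, SIG, "first\n")
root_id = hash_object(root, "commit")
child = commit_body(EMPTY_TREE, [root_id], SIG, SIG, "second\n")
print(child.decode())
```

Because a commit's parent ids are part of the bytes being hashed, changing any ancestor would give every descendant a new key.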
DB TYPE - TREE
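Trees are what give blobs their filenames. A minimal sketch, assuming Git's tree encoding: each entry is `<mode> <name>\0` followed by the 20 raw bytes of the child's SHA-1, with entries sorted by name and directories compared as if their name ended in `/`. The helper names are hypothetical, and only the regular-file (`100644`) and directory (`40000`) modes are shown.

```python
import hashlib
from typing import List, Tuple

# A tree entry: (mode, name, hex object id). "100644" is a regular file,
# "40000" a subdirectory (Git writes directory modes without a leading zero).
Entry = Tuple[str, str, str]


def hash_object(content: bytes, obj_type: str) -> str:
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def tree_body(entries: List[Entry]) -> bytes:
    """Serialize entries as '<mode> <name>\\0' + 20 raw id bytes.
    Filenames live here, not in the blobs."""
    def sort_key(entry: Entry) -> bytes:
        mode, name, _ = entry
        # Directories sort as if their name ended in "/"
        return (name + "/" if mode == "40000" else name).encode()

    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return out


blob_id = hash_object(b"test content\n", "blob")
sub_id = hash_object(tree_body([("100644", "test.txt", blob_id)]), "tree")
root_id = hash_object(tree_body([("100644", "test.txt", blob_id),
                                 ("40000", "lib", sub_id)]), "tree")
print(root_id)
```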
DB TYPE - BLOB
CONTENT ADDRESSABLE FILESYSTEM
[Diagram: the object graph of a small repository, each object labeled by an abbreviated hash]
GIT AS A GRAPH
▸What operations must I perform to get the graph to look the way I want?
GIT COMMANDMENTS
▸Git is immutable
▸No updates, only appends
▸Git is a directed acyclic graph (DAG)
▸Directed: edges point from a commit to its parent(s), so traversal goes in a single direction
▸Acyclic: no cycles; following parents never leads back to where you started
▸Every command is an operation on the graph
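The commandments can be modeled with an append-only store of frozen nodes. This is a toy (the `Commit` and `CommitGraph` names and single-letter ids are hypothetical stand-ins for SHA-1s), but it shows the rules at work: nodes are never modified, and because a commit can only point to parents that already exist, no cycle can form. In Git the same guarantee falls out of ids being hashes of content that includes the parent ids.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Commit:
    """Toy commit node; frozen=True means it cannot change after creation."""
    id: str
    parents: Tuple[str, ...] = ()


class CommitGraph:
    """Append-only toy commit store (hypothetical; not Git's implementation)."""

    def __init__(self) -> None:
        self._commits: Dict[str, Commit] = {}

    def add(self, commit: Commit) -> None:
        if commit.id in self._commits:
            raise ValueError("no updates, only appends")
        # Parents must already exist, so a new node can never become its
        # own ancestor: the graph stays acyclic.
        missing = [p for p in commit.parents if p not in self._commits]
        if missing:
            raise KeyError(f"unknown parents: {missing}")
        self._commits[commit.id] = commit

    def reachable(self, start: str) -> Set[str]:
        """Walk parent edges (the only direction available) from start."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            cid = stack.pop()
            if cid not in seen:
                seen.add(cid)
                stack.extend(self._commits[cid].parents)
        return seen


g = CommitGraph()
g.add(Commit("a"))
g.add(Commit("b", ("a",)))
g.add(Commit("c", ("a",)))
g.add(Commit("m", ("b", "c")))   # a merge: one node, two parents
print(sorted(g.reachable("m")))  # ['a', 'b', 'c', 'm']
print(sorted(g.reachable("b")))  # ['a', 'b']
```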
GIT IS IMMUTABLE
[Diagram: previous snapshot vs. current snapshot]
▸Branch
▸Commit
▸Fetch
▸Merge
▸Push
▸Rebase
DISSECT COMMON OPERATIONS
REFS, HEADS, BRANCHES
▸Ref is a pointer to a commit
▸Branch is a ref
▸HEAD is a pointer to your current branch
▸Branches have “namespaces”
BRANCH (BEFORE)
▸A branch is a ref
▸A ref is a pointer to a commit
BRANCH (AFTER)
▸A branch is a ref
▸A ref is a pointer to a commit
▸Heads (refs/heads) contain your branches
▸Remotes (refs/remotes) contain remote-tracking branches, one directory per remote (e.g., origin)
▸“Namespaces” are directories
▸Branches are small files containing the 40-character hex SHA-1 of a commit object
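Creating a branch really is just writing a small file. A sketch assuming loose refs (real repositories may also keep refs in a `packed-refs` file, which this ignores): the ref file holds 40 hex characters plus a newline, slash-separated names become directories, and `HEAD` normally holds `ref: refs/heads/<branch>`. `write_ref` and `resolve_head` are hypothetical helpers, and the commit id is a placeholder.

```python
import os
import tempfile


def write_ref(git_dir: str, ref: str, commit_id: str) -> str:
    """Create or move a branch by writing a commit id under refs/.
    A name like "refs/heads/feature/login" becomes nested directories."""
    path = os.path.join(git_dir, *ref.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write((commit_id + "\n").encode())  # 40 hex chars + newline
    return path


def resolve_head(git_dir: str) -> str:
    """Follow HEAD (normally 'ref: refs/heads/<branch>') to a commit id."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        with open(os.path.join(git_dir, *ref.split("/"))) as f:
            return f.read().strip()
    return head  # a detached HEAD holds a commit id directly


with tempfile.TemporaryDirectory() as git_dir:
    commit_id = "0123456789abcdef" * 2 + "01234567"  # placeholder 40-hex id
    path = write_ref(git_dir, "refs/heads/feature/login", commit_id)
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/feature/login\n")
    print(os.path.getsize(path))               # 41
    print(resolve_head(git_dir) == commit_id)  # True
```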
BRANCH IMPLEMENTATION
COMMIT (BEFORE)
▸A commit references its parent
▸HEAD points at the branch; the branch points at the commit
COMMIT (AFTER)
▸Add a new node
▸Advance branch
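A toy in-memory model of those two steps: add a node whose parent is the current branch tip, then advance the branch. `ToyRepo` and the ids `c1`, `c2` are hypothetical; real Git would also write tree and blob objects and hash the commit.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    id: str
    parents: Tuple[str, ...]


class ToyRepo:
    """Hypothetical in-memory model: commits, branch pointers, and HEAD."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.branches: Dict[str, str] = {}
        self.head = "master"  # HEAD names the current branch

    def commit(self, new_id: str) -> Commit:
        parent: Optional[str] = self.branches.get(self.head)
        node = Commit(new_id, (parent,) if parent else ())
        self.commits[new_id] = node          # 1. add a new node
        self.branches[self.head] = new_id    # 2. advance the current branch
        return node


repo = ToyRepo()
repo.commit("c1")
repo.commit("c2")
print(repo.commits["c2"].parents, repo.branches["master"])  # ('c1',) c2
```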
FETCH (BEFORE)
▸Fetch brings remote objects into repo
▸Refs, commits
▸But Git is immutable
▸Graft remote commits into graph
▸Updates refs in remote namespace
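Fetch, modeled on a `commit id -> parent ids` dictionary. Assumptions: `fetch` is a hypothetical helper, ids are toy strings, and the remote is given as plain dictionaries. Missing commits are appended, and only refs under `refs/remotes/<remote>/` move; your own branches stay where they were, so the grafted nodes are not reachable from them.

```python
from typing import Dict, Tuple

Graph = Dict[str, Tuple[str, ...]]  # commit id -> parent ids


def fetch(local: Graph, local_refs: Dict[str, str],
          remote: Graph, remote_heads: Dict[str, str],
          remote_name: str = "origin") -> None:
    """Toy fetch: copy missing commits, then move remote-tracking refs.
    Local branches (refs/heads/*) are left alone."""
    for cid, parents in remote.items():
        # Same id means same content, so existing nodes never change
        local.setdefault(cid, parents)
    for branch, cid in remote_heads.items():
        local_refs[f"refs/remotes/{remote_name}/{branch}"] = cid


local = {"a": (), "b": ("a",)}
refs = {"refs/heads/master": "b"}
remote = {"a": (), "b": ("a",), "c": ("b",)}
fetch(local, refs, remote, {"master": "c"})
print(refs)  # {'refs/heads/master': 'b', 'refs/remotes/origin/master': 'c'}
```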
FETCH (AFTER)
MERGE (BEFORE)
▸Merge is just a commit object
▸But one that refers to two parents
MERGE (AFTER)
▸Creates new commit with two parents
▸Current branch, other branch
▸Advance branch
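Graph-wise, a merge is one new node with two parents. This sketch, using a hypothetical `merge` helper on a toy `commit id -> parent ids` dictionary, models only that shape; real Git also has to combine the two trees, which is omitted here.

```python
from typing import Dict, Tuple

Graph = Dict[str, Tuple[str, ...]]  # commit id -> parent ids


def merge(graph: Graph, branches: Dict[str, str],
          current: str, other: str, merge_id: str) -> None:
    """Toy merge commit: one new node whose parents are the two branch
    tips (current branch first); then advance only the current branch."""
    graph[merge_id] = (branches[current], branches[other])
    branches[current] = merge_id


graph = {"a": (), "b": ("a",), "x": ("a",)}
branches = {"master": "b", "feature": "x"}
merge(graph, branches, "master", "feature", "m")
print(graph["m"], branches)  # ('b', 'x') {'master': 'm', 'feature': 'x'}
```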
FAST-FORWARD MERGE (BEFORE)
▸Type of merge
▸Is your branch’s commit an ancestor of the other branch’s commit?
FAST-FORWARD MERGE (AFTER)
▸Simplest merge
▸Pointer manipulation only
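A fast-forward needs no new node: if your branch's commit is an ancestor of the other branch's, the branch pointer just moves. The helpers below are hypothetical and work on a toy `commit id -> parent ids` dictionary.

```python
from typing import Dict, Tuple

Graph = Dict[str, Tuple[str, ...]]  # commit id -> parent ids


def is_ancestor(graph: Graph, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable by walking parents from `descendant`."""
    stack, seen = [descendant], set()
    while stack:
        cid = stack.pop()
        if cid == ancestor:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(graph.get(cid, ()))
    return False


def fast_forward(graph: Graph, branches: Dict[str, str],
                 current: str, other: str) -> bool:
    """Move `current` to `other`'s tip if that needs no merge commit."""
    if not is_ancestor(graph, branches[current], branches[other]):
        return False  # histories diverged: a real merge (or rebase) is needed
    branches[current] = branches[other]  # pointer manipulation only
    return True


graph = {"a": (), "b": ("a",), "c": ("b",)}
branches = {"master": "a", "feature": "c"}
print(fast_forward(graph, branches, "master", "feature"), branches["master"])  # True c
```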
REBASE (BEFORE)
▸Type of merge
▸Without merge commit objects
REBASE (AFTER)
▸Replay your commits onto the other branch
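Rebase in the same toy model: each commit unique to your branch is copied, oldest first, as a new node on top of the other branch, and then your branch pointer moves to the last copy. The originals stay in the graph (no longer reachable from the branch), which is how immutability is preserved. Assumptions: the branches share a base, your branch has no merge commits, and the primed ids (`x'`) are hypothetical stand-ins for the new hashes Git would compute.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Tuple[str, ...]]  # commit id -> parent ids


def reachable(graph: Graph, start: str) -> Set[str]:
    seen: Set[str] = set()
    stack = [start]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(graph[cid])
    return seen


def rebase(graph: Graph, branches: Dict[str, str],
           current: str, onto: str) -> List[str]:
    """Toy rebase of a linear branch; returns the ids of the new copies."""
    upstream = reachable(graph, branches[onto])
    to_replay: List[str] = []
    cid = branches[current]
    while cid not in upstream:        # walk back to the shared base
        to_replay.append(cid)
        cid = graph[cid][0]
    tip, created = branches[onto], []
    for old in reversed(to_replay):   # oldest first
        new = old + "'"               # hypothetical id: same change, new parent
        graph[new] = (tip,)           # a NEW node; the old one is untouched
        created.append(new)
        tip = new
    branches[current] = tip
    return created


graph = {"a": (), "b": ("a",), "x": ("a",), "y": ("x",)}
branches = {"master": "b", "feature": "y"}
rebase(graph, branches, "feature", "master")
print(branches["feature"], graph["x'"], graph["y'"])  # y' ('b',) ("x'",)
```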
PUSH
▸Two-step process
▸“Reverse” fetch — push commits, refs
▸Attempt a fast-forward merge
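Push as the two steps above: copy commits to the remote, then try to fast-forward the remote's branch. A toy sketch with a hypothetical `push` helper on `commit id -> parent ids` dictionaries; real Git sends only the objects the remote lacks rather than everything.

```python
from typing import Dict, Tuple

Graph = Dict[str, Tuple[str, ...]]  # commit id -> parent ids


def is_ancestor(graph: Graph, ancestor: str, descendant: str) -> bool:
    stack, seen = [descendant], set()
    while stack:
        cid = stack.pop()
        if cid == ancestor:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(graph.get(cid, ()))
    return False


def push(local: Graph, local_heads: Dict[str, str],
         remote: Graph, remote_heads: Dict[str, str], branch: str) -> bool:
    """Toy push: copy commits, then attempt a fast-forward of the remote ref."""
    for cid, parents in local.items():   # step 1: "reverse" fetch
        remote.setdefault(cid, parents)
    new_tip, old_tip = local_heads[branch], remote_heads.get(branch)
    if old_tip is not None and not is_ancestor(remote, old_tip, new_tip):
        return False                     # step 2 fails: not a fast-forward
    remote_heads[branch] = new_tip       # step 2: fast-forward the ref
    return True


remote = {"a": (), "b": ("a",)}
remote_heads = {"master": "b"}
local = {"a": (), "b": ("a",), "c": ("b",)}
print(push(local, {"master": "c"}, remote, remote_heads, "master"))  # True

# A teammate pushes "d" on top of "c"; meanwhile we commit "e" on "c" too
remote["d"] = ("c",)
remote_heads["master"] = "d"
local["e"] = ("c",)
print(push(local, {"master": "e"}, remote, remote_heads, "master"))  # False
```

In the rejected case, recovering means getting "d" into your history (fetch, then merge or rebase) and pushing again.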
PUSH
▸What will happen if you push?
▸How do you recover?
SAVING YOURSELF - RESET
SAVING YOURSELF - REFLOG
SAVING YOURSELF - INTERACTIVE REBASE
RECAP - CONCEPTUAL MODELS
▸Duality of Git
▸As a database
▸As an immutable DAG
▸Reasoning through problems
▸Launch SmartGit & observe the result of commands against the DAG
RECAP - “OH SHIT!” COMMANDS
▸git reflog
▸git reset
▸--soft won’t affect your index or workspace
▸--hard will make your index and workspace reflect where your HEAD moved to (you can lose uncommitted work)
▸git rebase -i
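A toy model of why these commands are safe to reach for: `reset` only moves a pointer, the reflog remembers every position, and only `--hard` overwrites the workspace. `ToyRepo` is hypothetical and leaves out the index, which real `git reset` also updates in its default (`--mixed`) and `--hard` modes; the reflog here is kept oldest first, whereas `git reflog` prints newest first.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # path -> file contents


class ToyRepo:
    """Hypothetical model: a branch pointer, a workspace, and a reflog."""

    def __init__(self, snapshots: Dict[str, Snapshot], tip: str) -> None:
        self.snapshots = snapshots                # commit id -> snapshot
        self.branch = tip
        self.workspace: Snapshot = dict(snapshots[tip])
        self.reflog: List[str] = [tip]            # oldest first

    def reset(self, target: str, hard: bool = False) -> None:
        self.branch = target           # every mode moves the pointer
        self.reflog.append(target)     # ...and the move is recorded
        if hard:
            # --hard: overwrite the workspace; uncommitted edits are gone
            self.workspace = dict(self.snapshots[target])


snapshots = {"c1": {"a.txt": "v1"}, "c2": {"a.txt": "v2"}}
repo = ToyRepo(snapshots, "c2")
repo.reset("c1", hard=True)                # "oh no": c2 seems gone
repo.reset(repo.reflog[-2], hard=True)     # the reflog still knows c2
print(repo.branch, repo.workspace)         # c2 {'a.txt': 'v2'}
```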
RESOURCES
▸https://git-scm.com/book/en/v1/Git-Internals
▸https://github.com/pluralsight/git-internals-pdf/releases
▸https://pinboard.in/u:mnadel/t:git/
▸http://www.syntevo.com/smartgit/
▸Free for open source projects
▸Also: https://www.sourcetreeapp.com/
▸michael.nadel@gmail.com


Editor's Notes

  • #3 Take informal survey!
  • #4 http://typicalprogrammer.com/linus-torvalds-goes-off-on-linux-and-git/ http://www.linuxfoundation.org/news-media/blogs/browse/2012/02/greatness-git
  • #6 How many people can relate? I often found myself in this situation. Then I started learning more & more about Git’s internals, and found myself in this situation less & less. I started talking to other people about it and, it turns out, they had a similar experience. This is why I want to take a depth-first approach with you folks tonight. I think it’s important to grok Git’s internals in order to be able to reason your way through situations you find yourself in. And I want to share that journey with you this evening.
  • #7 NEXT: Distributed
  • #8 Git is egalitarian
  • #9 NEXT: Content addressable
  • #10 Porcelain vs plumbing
  • #17 NEXT: Conceptual models
  • #18 NEXT: Git as a database
  • #27 Note that the filename isn’t part of the blob
  • #28 NEXT: Git as a graph
  • #30 A tree is a DAG in which each child has a single parent. It’s immutable b/c of the key-value store.
  • #31 NEXT: Dissect common operations. Ruby on Rails SVN repo: 115M; Ruby on Rails Git repo: 13M
  • #35 Implemented as writing a 40-character hex hash to a file on your file system. This is why branching is blazing fast.
  • #40 It's a *D*AG. Since new nodes aren't reachable from HEAD, your view of the graph hasn't changed, thus we haven't violated its immutability.
  • #48 NEXT: Saving yourself series
  • #51 NEXT: Recap
  • #53 NEXT: Resources