KI University - Git internals

> ls .git/
A deep dive into git internals

Before we start …
http://pollev.com/markusfuchs839

Who knows what “Event sourcing” is?

“Classic” way of storing data in a database
ID  NAME                             IS_ARCHIVED  SPEAKER_ID  ADDRESS_ID
1   Die zahnärztliche Niederlassung  false        15          84
2   Psychosomatik I                  true         301         12

“Event sourced” way
ID  COURSE_ID  EVENT_TYPE            EVENT_DATA
1   1          CourseCreated         { "name": "Psychosomatik 1", "address_id": 84, "speaker_id": 12 }
2   1          CourseSpeakerChanged  { "new_speaker_id": 15 }
3   1          CourseNameChanged     { "new_name": "Psychosomatik I" }

“Event sourced” way
Handles CourseCreated event
Handles CourseSpeakerChanged event
Handles CourseNameChanged event
Handles CourseArchived event
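
The four handlers above can be sketched as a small replay function. This is only an illustration in Python, not the code shown in the talk: the Course class, the apply and replay names, and the empty payload of CourseArchived are assumptions. The events are the ones from the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Course:
    name: str
    speaker_id: int
    address_id: int
    is_archived: bool = False


def apply(course, event_type, data):
    """Return the course state after applying a single event."""
    if event_type == "CourseCreated":
        return Course(data["name"], data["speaker_id"], data["address_id"])
    if event_type == "CourseSpeakerChanged":
        return replace(course, speaker_id=data["new_speaker_id"])
    if event_type == "CourseNameChanged":
        return replace(course, name=data["new_name"])
    if event_type == "CourseArchived":
        # Assumption: this event carries no payload.
        return replace(course, is_archived=True)
    raise ValueError(f"unknown event type: {event_type}")


def replay(events):
    """Rebuild the state by applying all events in order."""
    course = None
    for event_type, data in events:
        course = apply(course, event_type, data)
    return course


EVENTS = [
    ("CourseCreated", {"name": "Psychosomatik 1", "address_id": 84, "speaker_id": 12}),
    ("CourseSpeakerChanged", {"new_speaker_id": 15}),
    ("CourseNameChanged", {"new_name": "Psychosomatik I"}),
]

current = replay(EVENTS)        # state after all events
as_created = replay(EVENTS[:1])  # "course at time X": replay only a prefix
print(current)
print(as_created)
```

The current state is never stored, it is always derived. That is why "course at time X" is just a replay of a shorter event list, and why a query such as "find all archived courses" needs a replay per course.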

Pros and cons
• Audit log for free
• Version tracking for free
• Course at time X
But:
• Performance suffers with a large number of events
• Querying is not as easy because we never store the current state in
the database (Find all archived courses)
• It can make some operational tasks harder

🤔Why am I telling you this?

Pros and cons
• Audit log for free
• Version tracking for free
• Course at time X
But:
• Filtering is not as easy because we never store the current state in
the database (Find all archived courses)
• It can make some operational tasks harder
• Performance suffers with a large number of events

Learning goals
• A little bit about the history of git
• Internal storage mechanisms
• What’s in the .git/ folder?
• Which data structures are used by git?

git (/ɡɪt/)
“I'm an egoistical bastard, and I name all my projects after myself.
First 'Linux', now 'git'.”
Linus Torvalds
Source: https://www.urbandictionary.com/define.php?term=Git

Facts
• Development started in April 2005
• Linux kernel team was using BitKeeper (but the owner withdrew
free use of the product)
• Linus Torvalds wanted a distributed version control system (DVCS), but none of the existing ones met his needs

> ls .git/
Let’s dive in … 🏊♂️

https://github.com/fum36205/repository

01 Create a new file
02 Add the file to the index
03 Create a new commit
04 We can see the commit in the log (incl. its hash)

01 Switch to a new branch that starts at our initial commit
02 Create a copy of our hello.txt file and name it hello3.txt
03 Create a new commit with it
04 We can see all commits of this branch but not the ones from master

Learnings so far …
• .git/refs/heads contains one text file for each branch (named
like the branch itself) = “branch pointer”
e.g. .git/refs/heads/master, .git/refs/heads/feature/a
• HEAD is a text file that normally contains the path (relative to
.git/) to the currently checked out branch
e.g. ref: refs/heads/feature/a
(with a detached HEAD it contains a commit hash instead)
• Creating new branches is very easy (we only need to store a
reference to the commit)
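
To make the learnings above concrete, here is a minimal Python sketch that reads HEAD and creates a branch by writing a single text file. It works on a throwaway directory that only mimics the layout of .git/. The function names and the made-up commit hash are assumptions, and a real repository may additionally keep refs in a packed-refs file, which this sketch ignores.

```python
import tempfile
from pathlib import Path


def current_branch_ref(git_dir):
    """Return the ref HEAD points to, e.g. 'refs/heads/master'.

    Returns None for a detached HEAD, where the file holds a commit hash.
    """
    head = (git_dir / "HEAD").read_text().strip()
    prefix = "ref: "
    return head[len(prefix):] if head.startswith(prefix) else None


def read_ref(git_dir, ref):
    """Return the commit hash stored in a branch pointer file."""
    return (git_dir / ref).read_text().strip()


def create_branch(git_dir, name, commit_hash):
    """Creating a branch means writing one small text file."""
    ref_file = git_dir / "refs" / "heads" / name
    # A branch called "feature/a" becomes the file "a" in the folder "feature".
    ref_file.parent.mkdir(parents=True, exist_ok=True)
    ref_file.write_text(commit_hash + "\n")


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    git_dir.mkdir()
    fake_commit = "a" * 40  # made-up 40-character hash, not a real commit

    create_branch(git_dir, "master", fake_commit)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")

    # New branch that starts at the commit master currently points to.
    start = read_ref(git_dir, current_branch_ref(git_dir))
    create_branch(git_dir, "feature/a", start)
    (git_dir / "HEAD").write_text("ref: refs/heads/feature/a\n")

    head_ref = current_branch_ref(git_dir)
    head_commit = read_ref(git_dir, head_ref)
    print(head_ref, head_commit)
```

Do not run this against a real repository. It is meant to show how little data a branch is, not to replace git branch or git switch.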

How does git know which commits
belong to a branch?

01 Get the commit hashes of master …
02 … and feature/a
03 Print out the contents of both commits with git cat-file

Learnings
• git cat-file -p <hash> allows us to look at the contents of a
commit
• Each commit contains
  • Information about its author
  • The commit message
  • A timestamp of when it was created
  • A reference to its direct parent commit (a merge commit has several)
• Following the parent references, the commits of a branch form a linked list
that can be traversed back to the initial/first commit
• .git/refs/heads/… points to the head of this linked list
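
Walking this linked list can be sketched in a few lines of Python. The text layout (tree, parent, author, committer, blank line, message) follows what git cat-file -p prints for a commit. The in-memory STORE, the shortened IDs (c1, c2, t1, t2), the author and the commit messages are made up for illustration. The sketch follows only the first parent and ignores multi-line headers such as signatures.

```python
def parse_commit(text):
    """Split a commit's text into its header fields and its message."""
    header_text, _, message = text.partition("\n\n")
    fields = {"parent": []}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # a merge commit has several parents
        else:
            fields[key] = value
    return fields, message.strip()


def history(store, start):
    """Yield (id, message) from `start` back to the first commit."""
    current = start
    while current is not None:
        fields, message = parse_commit(store[current])
        yield current, message
        parents = fields["parent"]
        current = parents[0] if parents else None  # no parent: initial commit


# Hypothetical object store with shortened, made-up IDs.
STORE = {
    "c1": (
        "tree t1\n"
        "author Jane Doe <jane@example.com> 1600000000 +0200\n"
        "committer Jane Doe <jane@example.com> 1600000000 +0200\n"
        "\n"
        "Initial commit\n"
    ),
    "c2": (
        "tree t2\n"
        "parent c1\n"
        "author Jane Doe <jane@example.com> 1600000100 +0200\n"
        "committer Jane Doe <jane@example.com> 1600000100 +0200\n"
        "\n"
        "Add hello3.txt\n"
    ),
}

log = list(history(STORE, "c2"))  # "c2" plays the role of the branch pointer
print(log)
```

A branch needs to store only the ID of its newest commit. Everything older is reachable through the parent references.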

Q: How is the commit hash generated?
A: By hashing its contents with SHA-1.
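
As a rough Python sketch of the hashing step: git prepends a small header (object type, a space, the content length in bytes, a NUL byte) to the content and takes the SHA-1 of the result. The function name git_object_id and the two commit bodies are assumptions for illustration. The scheme is the same for every object type, so the sketch is checked against the well-known ID of the empty blob.

```python
import hashlib


def git_object_id(obj_type, body):
    """SHA-1 over '<type> <size>\\0' followed by the object's content."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Well-known ID of an empty file in any SHA-1 based git repository.
empty_blob = git_object_id("blob", b"")
print(empty_blob)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Two made-up commit bodies that differ only in the message.
base = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Jane Doe <jane@example.com> 1600000000 +0200\n"
    b"committer Jane Doe <jane@example.com> 1600000000 +0200\n"
    b"\n"
)
commit_a = git_object_id("commit", base + b"Initial commit\n")
commit_b = git_object_id("commit", base + b"Initial commit!\n")
print(commit_a != commit_b)  # True
```

Because author, timestamp, message, tree and parent are all part of the hashed content, changing any of them yields a different commit hash.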

Q: How does git store the files?

Let’s create a subfolder
01 Move hello.txt into a newly created subfolder
02 Create a new commit with this change
03 The tree object of the root folder now contains a reference to another tree object

commit -> tree -> blob
               -> blob

commit -> tree -> blob

commit -> tree -> tree -> blob
               -> blob

-> = referenced by hash

There are THREE different kinds of “objects”
(a fourth kind, the annotated tag, is not covered here)
commit
• Author
• Commit message
• Timestamp
• Reference to the previous commit
• Reference to the tree of the root folder
tree
• One per folder (incl. one for the root folder)
• Contains the names of all files and subfolders in the folder and
references to their corresponding blob and tree objects
blob
• File contents
Hash = SHA-1(object)

But where are these objects stored?

The objects themselves are stored in a key-value store (KVS)
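
A minimal Python sketch of such a store, modelled on git's loose objects: the key is the object's hash, the value is the zlib-compressed object, and the file lives under objects/ in a folder named after the first two characters of the hash. The function names are assumptions, the sketch writes to a temporary directory, and packfiles (which git also uses) are not handled.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_object(objects_dir, obj_type, body):
    """Store an object under its own hash and return that hash."""
    raw = f"{obj_type} {len(body)}".encode() + b"\0" + body
    oid = hashlib.sha1(raw).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    if not path.exists():  # same content, same key: nothing new to store
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(raw))
    return oid


def read_object(objects_dir, oid):
    """Look an object up by hash and return (type, content)."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, _, size = header.decode().partition(" ")
    if int(size) != len(body):
        raise ValueError("stored size does not match content length")
    return obj_type, body


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / ".git" / "objects"
    first = write_object(objects_dir, "blob", b"hello\n")
    second = write_object(objects_dir, "blob", b"hello\n")  # identical content
    stored_files = sum(1 for p in objects_dir.rglob("*") if p.is_file())
    loaded = read_object(objects_dir, first)
    print(first == second, stored_files, loaded)
```

Writing the same content twice produces the same key, so only one file ends up on disk.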

Advantages
• Efficient storage/transfer because objects with the same content
are only stored once (same hash)
• If you fucked something up the chances are very high that it can
be fixed 🎉

Merkle trees
Source: https://komodoplatform.com/whats-merkle-tree/

commit: 676e6c8
  tree: d80ea91
    tree: 61b7138
      blob: 9b4930f
    blob: 9b4930f

commit: c3b1130
  tree: ab3a8b9
    tree: 61b7138
      blob: 9b4930f
    blob: ed9e506
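
The effect in the diagram (one changed file gives a new root tree, while the untouched subfolder keeps its hash) can be reproduced with a short Python sketch. A tree's content is a list of entries, each made of mode, name, a NUL byte and the 20-byte binary hash of the referenced object. The helper names, the folder name "subfolder" and the file contents are assumptions, and sorting entries by plain name is a simplification of git's ordering rule.

```python
import hashlib


def object_id(obj_type, body):
    """SHA-1 over '<type> <size>\\0' followed by the content."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries):
    """Hash a folder from (mode, name, hex_id) entries.

    Simplification: entries are sorted by plain name.
    """
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id in sorted(entries, key=lambda entry: entry[1])
    )
    return object_id("tree", body)


FILE, FOLDER = "100644", "40000"  # modes of a regular file and a subfolder

hello = object_id("blob", b"hello\n")
sub = tree_id([(FILE, "hello.txt", hello)])
root_before = tree_id([(FOLDER, "subfolder", sub), (FILE, "hello3.txt", hello)])

# Change only hello3.txt in the root folder.
changed = object_id("blob", b"hello again\n")
sub_after = tree_id([(FILE, "hello.txt", hello)])
root_after = tree_id([(FOLDER, "subfolder", sub_after), (FILE, "hello3.txt", changed)])

print(sub == sub_after)           # True: the untouched subfolder is reused
print(root_before != root_after)  # True: the change bubbles up to the root
```

Since the commit contains the hash of its root tree, the new root tree also leads to a new commit hash, just like the two commits in the diagram.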

Further reading
Building Git
James Coglan
https://shop.jcoglan.com/building-git/
Pro Git
Scott Chacon
https://git-scm.com/book/de/v2
