> ls .git/
A deep dive into git internals
Before we start …
http://pollev.com/markusfuchs839
Who knows what “Event sourcing” is?
“Classic” way of storing data in a database
ID  NAME                             IS_ARCHIVED  SPEAKER_ID  ADDRESS_ID
1   Die zahnärztliche Niederlassung  false        15          84
2   Psychosomatik I                  true         301         12
“Event sourced” way
ID  COURSE_ID  EVENT_TYPE            EVENT_DATA
1   1          CourseCreated         { "name": "Psychosomatik 1", "address_id": 84, "speaker_id": 12 }
2   1          CourseSpeakerChanged  { "new_speaker_id": 15 }
3   1          CourseNameChanged     { "new_name": "Psychosomatik I" }
“Event sourced” way
Handles CourseCreated event
Handles CourseSpeakerChanged event
Handles CourseNameChanged event
Handles CourseArchived event
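The handlers above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the `Course` class, `apply_event`, and `replay` are hypothetical names, and the events are the three rows from the event table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Course:
    name: str
    speaker_id: int
    address_id: int
    is_archived: bool = False


def apply_event(course, event_type, data):
    """Return the course state after applying a single event."""
    if event_type == "CourseCreated":
        return Course(data["name"], data["speaker_id"], data["address_id"])
    if event_type == "CourseSpeakerChanged":
        return replace(course, speaker_id=data["new_speaker_id"])
    if event_type == "CourseNameChanged":
        return replace(course, name=data["new_name"])
    if event_type == "CourseArchived":
        return replace(course, is_archived=True)
    raise ValueError(f"unknown event type: {event_type}")


def replay(events):
    """Rebuild the current state by applying all events in order."""
    course = None
    for event_type, data in events:
        course = apply_event(course, event_type, data)
    return course


EVENTS = [
    ("CourseCreated", {"name": "Psychosomatik 1", "address_id": 84, "speaker_id": 12}),
    ("CourseSpeakerChanged", {"new_speaker_id": 15}),
    ("CourseNameChanged", {"new_name": "Psychosomatik I"}),
]

current = replay(EVENTS)
```

Replaying only a prefix of the stream, e.g. `replay(EVENTS[:1])`, gives the course as it was at that point in time.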
Pros and cons
• Audit log for free
• Version tracking for free
• The state of a course at any time X can be reconstructed
But:
• Performance suffers with a large number of events
• Querying is harder because the current state is never stored in the database (e.g. “find all archived courses”)
• It can make some operational tasks harder
🤔Why am I telling you this?
Learning goals
• A little bit about the history of git
• Internal storage mechanisms
• What’s in the .git/ folder?
• Which data structures are used by git?
git (/ɡɪt/)
“I'm an egoistical bastard, and I name all my projects after myself. First 'Linux', now 'git'.”
Linus Torvalds
Source: https://www.urbandictionary.com/define.php?term=Git
Facts
• Development started in April 2005
• Linux kernel team was using BitKeeper (but the owner withdrew
free use of the product)
• Linus Torvalds wanted a DVCS but none met his needs
> ls .git/
Let’s dive in … 🏊
https://github.com/fum36205/repository
01 Create a new file
02 Add the file to the index
03 Create a new commit
04 We can see the commit in the log (incl. its hash)
01 Switch to a new branch that starts at our initial commit
02 Create a copy of our hello.txt file and name it hello3.txt
03 Create a new commit with it
04 We can see all commits of this branch but not the ones from master
Learnings so far …
• .git/refs/heads contains one text file for each branch (named like the branch itself) = “branch pointer”; each file holds the hash of the branch's latest commit
e.g. .git/refs/heads/master, .git/refs/heads/feature/a
• HEAD is a text file that contains the path (relative to .git/) to the currently checked-out branch
e.g. ref: refs/heads/feature/a
(with a detached HEAD it contains a commit hash instead)
• Creating new branches is very cheap (we only need to store a reference to a commit)
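The lookup from HEAD to a commit hash can be sketched as plain file reads. This is a simplified illustration with hypothetical helper names (`resolve_head`, `create_branch`); it assumes every branch has its own file under refs/heads and ignores other places where git may keep refs (such as a packed-refs file).

```python
from pathlib import Path


def resolve_head(git_dir):
    """Return (branch_ref, commit_hash) for the current checkout.

    branch_ref is None when HEAD holds a commit hash directly (detached HEAD).
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        return ref, (git_dir / ref).read_text().strip()
    return None, head


def create_branch(git_dir, name, commit_hash):
    """Creating a branch only means writing one small text file."""
    ref_file = Path(git_dir) / "refs" / "heads" / name
    ref_file.parent.mkdir(parents=True, exist_ok=True)
    ref_file.write_text(commit_hash + "\n")
```

Pointing `resolve_head` at a scratch directory that contains only a HEAD file and a refs/heads folder is enough to try it out; it should not be used to write into a real repository.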
How does git know which commits
belong to a branch?
01 Get the commit hashes of master …
02 … and feature/a
03 Print out the contents of both commits with git cat-file
Learnings
• git cat-file -p <hash> allows us to look at the contents of a commit
• Each commit contains
  • Information about its author
  • The commit message
  • A timestamp of when it was created
  • A reference to its direct ancestor (parent) commit
• All commits of a single branch form a linked list that can be traversed back to the initial/first commit (a merge commit has more than one parent)
• .git/refs/heads/… points to the head of this linked list
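Walking a branch back to its first commit is an ordinary linked-list traversal. The sketch below uses made-up, abbreviated hashes and messages (`COMMITS`, `BRANCHES`, and `log` are hypothetical names) and ignores merge commits.

```python
# Hypothetical commits: hash -> (parent hash or None, commit message).
COMMITS = {
    "a1": (None, "Initial commit"),
    "b2": ("a1", "Change on master"),
    "c3": ("a1", "Add hello3.txt"),
}

# Branch pointers: branch name -> hash of the newest commit on that branch.
BRANCHES = {"master": "b2", "feature/a": "c3"}


def log(branch):
    """Yield (hash, message) from the branch pointer back to the first commit."""
    current = BRANCHES[branch]
    while current is not None:
        parent, message = COMMITS[current]
        yield current, message
        current = parent
```

`list(log("feature/a"))` visits c3 and then a1, but never b2: a commit belongs to a branch only if it is reachable from the branch pointer.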
Q: How is the commit hash generated?
A: By hashing its contents with SHA-1.
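A rough sketch of what "hashing its contents" means for a commit. The field layout follows the text that git cat-file -p prints for a commit, with the timezone fixed to +0000 for simplicity; `commit_hash` and all example values are hypothetical.

```python
import hashlib


def commit_hash(tree, parent, author, timestamp, message):
    """Hash a commit built from its tree, parent, author, time and message."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    content = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    # The object is prefixed with its type and size before hashing.
    header = f"commit {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


first = commit_hash("d" * 40, None, "Jane Doe <jane@example.com>", 1700000000, "Initial commit")
other = commit_hash("d" * 40, None, "Jane Doe <jane@example.org>", 1700000000, "Initial commit")
```

`first` and `other` differ although only the author's email address changed: any change to the commit's data produces a new hash.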
Q: How does git store the files?
👀
Let’s create a subfolder
01 Move hello.txt into a newly created subfolder
02 Create a new commit with this change
03 The tree object of the root folder now contains a reference to another tree object
[Diagram: three commits, each referencing a tree; the trees reference blobs and, in the last commit, another tree. Arrow = referenced by hash]
There are THREE different kinds of “objects”
commit
• Author
• Commit message
• Reference to the previous commit
• Timestamp
• Reference to the tree object of the root folder
tree
• One per folder (incl. one for the root folder)
• Contains the names of all files and subfolders in the folder and references to their corresponding blob or tree objects
blob
• File contents
Hash = SHA-1(object)
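“Hash = SHA-1(object)” can be reproduced with Python's hashlib. The sketch assumes a repository that uses SHA-1 object names; `git_object_hash` is a hypothetical helper name.

```python
import hashlib


def git_object_hash(obj_type, content):
    """Hash an object the way git names it: SHA-1 over '<type> <size>\\0' + content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


blob_id = git_object_hash("blob", b"hello world\n")
```

For a file containing the single line hello world, `blob_id` should be the same value that git hash-object prints for that file. Two files with identical contents always get the same blob hash, regardless of their names.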
But where are these objects stored?
The objects themselves are stored in a key-value store (KVS)
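A minimal sketch of such a key-value store on disk, assuming the layout of loose objects under .git/objects: the key is the object's hash, its first two hex characters name the sub-directory, and the value is the zlib-compressed object. `write_object` and `read_object` are hypothetical names, and packed objects are not covered.

```python
import hashlib
import zlib
from pathlib import Path


def write_object(objects_dir, obj_type, content):
    """Store an object under its hash and return the key."""
    store = f"{obj_type} {len(content)}".encode() + b"\0" + content
    key = hashlib.sha1(store).hexdigest()
    path = Path(objects_dir) / key[:2] / key[2:]
    if not path.exists():  # identical content is stored only once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return key


def read_object(objects_dir, key):
    """Look an object up by its hash and return (type, content)."""
    raw = zlib.decompress((Path(objects_dir) / key[:2] / key[2:]).read_bytes())
    header, _, content = raw.partition(b"\0")
    obj_type, _size = header.decode().split(" ")
    return obj_type, content
```

Writing the same content twice returns the same key and leaves a single file on disk. It is safest to experiment in a scratch directory rather than inside a real .git/objects folder.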
Advantages
• Efficient storage/transfer because objects with the same content
are only stored once (same hash)
• If you fucked something up the chances are very high that it can
be fixed 🎉
Merkle trees (not “Merkel”)
Source: https://komodoplatform.com/whats-merkle-tree/
commit: 676e6c8
  tree: d80ea91
    tree: 61b7138
      blob: 9b4930f
    blob: 9b4930f
commit: c3b1130
  tree: ab3a8b9
    tree: 61b7138
      blob: 9b4930f
    blob: ed9e506
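The effect shown in the two commits above can be sketched with an in-memory object store. The file names and contents are made up, and `put`, `put_tree`, and `diff` are hypothetical helpers. Tree entries follow git's "mode name, NUL byte, binary hash" layout, but entries are simply sorted by name, which is a simplification of git's ordering rules.

```python
import hashlib


def put(store, obj_type, content, children=None):
    """Store an object under its SHA-1 and return the hex key."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    key = hashlib.sha1(header + content).hexdigest()
    store[key] = children  # None for blobs, {name: key} for trees
    return key


def put_tree(store, folder):
    """folder maps names to bytes (a file) or to a dict (a subfolder)."""
    children, body = {}, b""
    for name in sorted(folder):
        item = folder[name]
        if isinstance(item, dict):
            mode, key = b"40000", put_tree(store, item)
        else:
            mode, key = b"100644", put(store, "blob", item)
        children[name] = key
        body += mode + b" " + name.encode() + b"\0" + bytes.fromhex(key)
    return put(store, "tree", body, children)


def diff(store, old, new, path=""):
    """Yield changed paths; subtrees with equal hashes are never entered."""
    if old == new:
        return
    old_kids, new_kids = store.get(old), store.get(new)
    if not isinstance(old_kids, dict) or not isinstance(new_kids, dict):
        yield path
        return
    for name in sorted(set(old_kids) | set(new_kids)):
        child_path = f"{path}/{name}".lstrip("/")
        yield from diff(store, old_kids.get(name), new_kids.get(name), child_path)


store = {}
before = put_tree(store, {"hello3.txt": b"hello\n", "subfolder": {"hello.txt": b"hello\n"}})
after = put_tree(store, {"hello3.txt": b"hello again\n", "subfolder": {"hello.txt": b"hello\n"}})
changed = list(diff(store, before, after))
```

Only hello3.txt shows up in `changed`. The root tree gets a new hash because one of its entries changed, while the subfolder keeps its hash and is skipped during the comparison.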
Further reading
Building Git
James Coglan
https://shop.jcoglan.com/building-git/
Pro Git
Scott Chacon
https://git-scm.com/book/de/v2
Thank you

KI University - Git internals


Editor's Notes

  1. Modifying the data is as simple as updating the row in the database
  2. “Event sourcing”: we don’t store the current state of the object but a stream of events that modified it: CourseCreated, CourseChanged, CourseArchived. Later, we can “reconstruct” the current state of the object by applying all events in order again. If we want to archive the course, we need to insert a new event into the stream.
  3. Reconstructing the current state later happens in code, where we iterate over all events in the stream (from the database) and apply the corresponding changes, e.g. the CourseArchived event is handled by setting tha
  4. You store the timestamp of the event and also the user who executed the operation. CQRS. Snapshots to cache intermediate states.
  7. Indeed, if we type git log -p, for each commit you can see all changes that have been made to the working directory (creating files, modifying files, deleting files). The current state of the working directory can always be derived by applying all changes/change events since the beginning.
  8. Did you already know that git works like this? https://www.polleverywhere.com/multiple_choice_polls/VnIwdXx7qmTsGepodbekY
  9. What has git to do with these two persons (Linus and Angela Merkel) and a shopping cart? Bingo: but please only in your mind (don’t scream “bingo” ;))
  10. Applying patches should be fast. “Take Concurrent Versions System (CVS) as an example of what not to do; if in doubt, make the exact opposite decision.” Include very strong safeguards against corruption, either accidental or malicious. Support a distributed, BitKeeper-like workflow.
  11. Learning goal of today: to know what’s inside the .git folder in a repository folder
  12. Git already created a .git/ folder with some files and subfolders, but most of it is still empty. Only the hooks folder contains some example hooks, and the refs folder contains two empty sub-directories called “heads” and “tags”. I’ve also published this sample repository on GitHub (so you can follow along if you want to).
  13. We take all that and hash it with SHA-1. Any change to any of this data also means that a new hash is generated (changing the email of the author, for example).
  14. And the folder structure of the working directory at point in time X (commit X)
  15. Let’s go back to our two commits and what git stores about them. If you looked carefully, you might already have noticed that there is another piece of information stored that I didn’t mention yet.
  16. We can again use our friend git cat-file to find out what’s behind the hash. Why do they all contain the same hash? Because the hashes for blobs are calculated the same way as those of the commits: by hashing their content. Files that have the same content are stored under the same hash. Because we just copied the file we initially created, they all have the same hash.
  17. Only references. We still don’t know where the actual contents of the objects (e.g. files) are stored.
  18. We didn’t look into the objects directory yet. That’s what we’re going to do now.
  19. Where the key is the object’s hash. The objects folder contains sub-directories named after the first two hexadecimal characters of the hash.
  21. Every time you run “git add”, a new blob object is added to the objects KVS (unless there already is an object with the same contents). Objects with the same hash are only stored once. Objects are serialized with a short type/size header and then deflated with zlib. Objects are never removed as long as there is a reference to them somewhere. The garbage collector removes unreferenced objects. A packing algorithm compresses objects.
  22. Unless it was already sent to the remote; even then it’s more of a communication problem. Say you accidentally created a squashed merge commit instead of a “normal” merge commit.
  23. Select the **true** statements about the way _git_ works. https://www.polleverywhere.com/multiple_choice_polls/QH0fsFke1EnN0cjmZHgDR
  24. Three side tasks: What has git to do with these two persons and a shopping cart?
  25. Named after its inventor, Ralph Merkle. Also called a hash tree. Leaf nodes contain the actual content; inner nodes contain the hash of their children. Cryptographic properties: finding differences between two trees is really easy/efficient. Used in Bitcoin (Merkle proof), where a user can verify that a transaction has taken place by sending only hashes.
  26. Trust in probabilities: in practice, nobody could come up with a chain of hashes that produces the exact same Merkle root.
  27. If we change a file (hello3.txt) and create a new commit, this blob will get a new hash (because its contents have changed). In turn, the tree object will also get a new hash (because its contents, the references to the blobs/trees, have changed). And we also get a new commit hash (not only because of the modified tree hash but also because of the timestamp). If we later (git log -p) want to find out where these two commits differ (and where not), we only need to traverse the trees: subtrees with the same hash have not changed.