New Views on your History
with git replace
Christian Couder, Murex
chriscool@tuxfamily.org
OSDC.fr 2013
October 5, 2013
About Git
A Distributed Version Control System
(DVCS):
● created by Linus Torvalds
● maintained by Junio Hamano
● since 2005
● preferred VCS among open source
developers
Git Design
Git is made of these things:
● “Objects”
● “Refs”
● config, indexes, logs, hooks,
grafts, packs, ...
Only “Objects” and “Refs” are
transferred from one repository to
another.
Git Objects

● Blob: content of a file
● Tree: content of a directory
● Commit: state of the whole source code
● Tag: stamp on an object
Git Objects Storage

● Git Objects are stored in a
content addressable database.
● The key to retrieve each Object is the
SHA-1 of the Object’s content.
● A SHA-1 is a 160-bit / 40-hex / 20-byte
hash value which is considered
unique.
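The key derivation can be checked by hand. The following is a minimal sketch, assuming a Linux-like system where `sha1sum` is available and a Git that uses its default SHA-1 object format: the key of a blob is the SHA-1 of a short header (`blob <size>` followed by a NUL byte) plus the content itself.

```shell
# Content to hash (echo-style, so it ends with a newline).
content='Whatever...'

# Key as computed by Git; without -w nothing is written,
# so no repository is needed.
key_git=$(printf '%s\n' "$content" | git hash-object --stdin)

# Same key by hand: SHA-1 over "blob <size>\0" followed by the content.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
key_sha=$( (printf 'blob %s\0' "$size"; printf '%s\n' "$content") |
    sha1sum | cut -d' ' -f1 )

echo "git hash-object: $key_git"
echo "manual SHA-1:    $key_sha"
```

Both lines should show the same 40-hex value, which is why identical content is only ever stored once.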
Blob
SHA1: e8455...

blob = content of a file
blob

size

/* content of this blob, it can be
anything like an image, a video,
... but most of the time it is
source code like:*/
#include <stdio.h>
int main(void)
{
printf("Hello world!\n");
return 0;
}
Example of storing and
retrieving a blob
# echo "Whatever..." | git hash-object -w --stdin
aa02989467eea6d8e0bc68f3663de51767a9f5b1
# git cat-file -p aa02989467
Whatever...
Tree
SHA1: 0de24...
tree = content of a directory

tree size
blob hello.c e8455...
tree lib 10af9...

It can point to blobs and
other trees.
Example of storing and
retrieving a tree
# BLOB=aa02989467eea6d8e0bc68f3663de51767a9f5b1
# (printf "100644 whatever.txt\0"; echo $BLOB | xxd -r -p) |
git hash-object -t tree -w --stdin
0625da548ef0a7038c44b480f10d5550b2f2f962
# git cat-file -p 0625da548e
100644 blob aa02989467... whatever.txt
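Writing the binary tree format by hand is instructive, but Git also ships a plumbing command for this. The sketch below builds the same kind of one-entry tree with `git mktree`, which reads entries in `git ls-tree` format. The throwaway repository created with `mktemp` is an assumption made only for illustration.

```shell
# Throwaway repository so the example does not touch real data.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

BLOB=$(echo "Whatever..." | git hash-object -w --stdin)

# One entry per line: <mode> SP <type> SP <sha1> TAB <name>
TREE=$(printf '100644 blob %s\twhatever.txt\n' "$BLOB" | git mktree)

# Show the stored tree in readable form.
git cat-file -p "$TREE"
```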
Commit
SHA1: 98ca9...
commit = information
about some changes

commit size
tree 0de24...
parents ()
author Christian <timestamp>
committer Christian <timestamp>
My commit message

It points to one tree and 0
or more parents.
Example of storing and
retrieving a commit (1)
# TREE=0625da548ef0a7038c44b480f10d5550b2f2f962
# ME="Christian Couder <chriscool@tuxfamily.org>"
# DATE=$(date "+%s %z")
# (echo -e "tree $TREE\nauthor $ME $DATE";
echo -e "committer $ME $DATE\n\nfirst commit") |
git hash-object -t commit -w --stdin
37449e955443883a0a888ee100cfd0a7ba7927b3
Example of storing and
retrieving a commit (2)
# git cat-file -p 37449e9554
tree 0625da548ef0a7038c44b480f10d5550b2f2f962
author Christian Couder <chriscool@tuxfamily.org> 1380447450 +0200
committer Christian Couder <chriscool@tuxfamily.org> 1380447450 +0200

first commit
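The same kind of commit object can probably be created more comfortably with the `git commit-tree` plumbing command, which fills in the author, committer and timestamp lines itself. This is a sketch in a throwaway repository; the "Demo" identity and the temporary directory are assumptions for illustration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
# Hypothetical identity, set through the environment for this demo only.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

BLOB=$(echo "Whatever..." | git hash-object -w --stdin)
TREE=$(printf '100644 blob %s\twhatever.txt\n' "$BLOB" | git mktree)

# A commit with one tree and no parents (a root commit).
COMMIT=$(git commit-tree "$TREE" -m "first commit")

git cat-file -p "$COMMIT"
```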
Git Objects Relations
[Figure: two commits and the objects they reference]
● Commit e84c7... ("Initial commit", author and committer Christian, no parents) points to tree 29c43...
● Tree 29c43... contains blob hello.c (0de24..., "int main() { ... }") and tree doc (98ca9...)
● Tree 98ca9... contains blobs readme (677f4...) and install (23ae9...)
● The second commit ("Change hello.c", author and committer Arnaud, parent e84c7...) points to tree 5c11f...
● Tree 5c11f... contains blob hello.c (bc789..., "int main(void) { ... }") and the same tree doc (98ca9...)
Git Refs
● Head: branch,
.git/refs/heads/
● Tag: lightweight tag,
.git/refs/tags/
● Remote: distant repository,
.git/refs/remotes/
● Note: note attached to an object,
.git/refs/notes/
● Replace: replacement of an object,
.git/refs/replace/
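To see several of these namespaces side by side, the sketch below creates a branch, a lightweight tag and a note for one commit, then lists every ref with `git for-each-ref`. The repository, identity and ref names (`demo`, `v0`) are assumptions for illustration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

TREE=$(git mktree </dev/null)                  # the empty tree
COMMIT=$(git commit-tree "$TREE" -m "first commit")

git update-ref refs/heads/demo "$COMMIT"       # a branch
git tag v0 "$COMMIT"                           # a lightweight tag
git notes add -m "a note" "$COMMIT"            # a note, under refs/notes/commits

# Each ref is just a name that resolves to a SHA-1.
refs=$(git for-each-ref --format='%(refname)')
echo "$refs"
```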
Example of storing and
retrieving a branch
# git update-ref refs/heads/master 37449e9554
# git rev-parse master
37449e955443883a0a888ee100cfd0a7ba7927b3
# git reset --hard master
HEAD is now at 37449e9 first commit
# cat whatever.txt
Whatever...
Result from previous examples
master → commit 37449e9554 → tree 0625da548e → blob aa02989467
Commits in Git form a DAG
(Directed Acyclic Graph)

● history direction is from left to right
● new commits point to their parents
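The parent links can be inspected directly. In this small sketch (throwaway repository and "Demo" identity are assumptions), commit B is created after commit A, and `git rev-list --parents` prints each commit followed by its parents.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"; A=$(git rev-parse HEAD)
git commit -q --allow-empty -m "B"; B=$(git rev-parse HEAD)

# Each output line: a commit, then the SHA-1 of each of its parents.
git rev-list --parents HEAD

# The newer commit points back to the older one, never the reverse.
parent_of_B=$(git rev-parse "$B^")
```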
git bisect

[Figure: commit graph with blue and red commits; B is the first red commit]

● B introduces a bad behavior called "bug" or
"regression"
● red commits are called "bad"
● blue commits are called "good"
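A bisection can be automated with `git bisect run`. The sketch below builds a toy history of eight commits in which commit 5 introduces the "bug", then lets Git find it. The file names and the test command (`grep -q ok status.txt`) are hypothetical stand-ins for a real test suite.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Eight commits; from commit 5 on, status.txt no longer says "ok".
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > counter.txt
    if [ "$i" -ge 5 ]; then echo "bug" > status.txt; else echo "ok" > status.txt; fi
    git add counter.txt status.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi
    if [ "$i" -eq 5 ]; then culprit=$(git rev-parse HEAD); fi
done

# Give the bad commit first, then a known good one.
git bisect start HEAD "$good" >/dev/null
# The command must exit 0 on "good" commits and non-zero on "bad" ones.
git bisect run grep -q ok status.txt >/dev/null
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $(git log -1 --format=%s "$found")"
```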
Problem when bisecting
Sometimes the commit that introduced a bug
will be in an untestable area of the graph.
For example:
W---X---X1---X2---X3---Y---Z
Commit X introduced a breakage, later fixed
by commit Y.
Possible solutions
Possible solutions to bisect anyway:
● apply a patch before testing and remove it
afterwards (can be done using "git cherry-pick"), or
● create a fixed up branch (can be done with
"git rebase -i"), for example:
  X+Y---X1'---X2'---X3'---Z'
 /
W---X---X1---X2---X3---Y---Z---Z1
A good solution
The idea is that we will replace Z with Z' so that
we bisect from the beginning using the fixed up
branch.
$ git replace Z Z'

  X+Y---X1'---X2'---X3'---Z'---Z1
 /
W---X---X1---X2---X3---Y---Z
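The effect of replacing Z with Z' can be tried on a toy history. In this sketch the fixed up branch is built by hand rather than with "git rebase -i", purely to keep the example short; the `snap` helper, the file names and the shortened chain (W, X, X1, Y, Z) are assumptions for illustration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# snap <message> <build state> <feature level>: record one commit.
snap() {
    echo "$2" > build.txt
    echo "$3" > feature.txt
    git add build.txt feature.txt
    git commit -q -m "$1"
}

snap W  ok     0; W=$(git rev-parse HEAD)
snap X  broken 1            # X introduces the breakage
snap X1 broken 2
snap Y  ok     2            # Y fixes it
snap Z  ok     3; Z=$(git rev-parse HEAD)

# Fixed up branch starting from W; no commit on it is broken.
git checkout -q -b fixed "$W"
snap "X+Y" ok 1
snap "X1'" ok 2
snap "Z'"  ok 3; Zp=$(git rev-parse HEAD)

git replace "$Z" "$Zp"

# Walking back from Z now follows the fixed up branch.
git log --format=%s "$Z"
with_replace=$(git rev-list --count "$Z")
without_replace=$(git --no-replace-objects rev-list --count "$Z")
```

With the replace ref in place the walk from Z visits Z', X1', X+Y and W, so a bisection never lands on the untestable commits X and X1.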
Grafts
Created mostly for projects like the Linux
kernel that have old repositories.
● “.git/info/grafts” file
● each line describes the parents of a
commit
● <commit> <parent> [<parent>]*
● this overrides the content in the
commit
Problem with Grafts

They are neither objects nor refs, so
they cannot be easily transferred.
We need something that is either:
● an object, or
● a ref
Solution, part 1: replace ref

● It is a ref in .git/refs/replace/
● Its name is the SHA-1 of the
object that should be replaced.
● It contains, so it points to, the
SHA-1 of the replacement object.
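Because a replace ref is an ordinary ref, it can be created with `git update-ref` alone. The sketch below does this for two blobs in a throwaway repository (an assumption for illustration) and shows that reading the original object then yields the replacement's content.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

A=$(echo "old content" | git hash-object -w --stdin)
B=$(echo "new content" | git hash-object -w --stdin)

# Ref name: SHA-1 of the replaced object.
# Ref value: SHA-1 of the replacement object.
git update-ref "refs/replace/$A" "$B"

git replace -l                                # lists the replaced object
git cat-file -p "$A"                          # content of the replacement
git --no-replace-objects cat-file -p "$A"     # content of the original
```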
Solution, part 2: git replace

● git replace [ -f ] <object> <replacement>:
to create a replace ref
● git replace -d <object>:
to delete a replace ref
● git replace [ -l [ pattern ] ]:
to list some replace refs
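The three forms above can be exercised in a few lines. This is a sketch with hypothetical commits that differ only in their message; the throwaway repository and "Demo" identity are assumptions.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

TREE=$(git mktree </dev/null)
OLD=$(git commit-tree "$TREE" -m "old message")
NEW=$(git commit-tree "$TREE" -m "new message")
NEWER=$(git commit-tree "$TREE" -m "newer message")

git replace "$OLD" "$NEW"                  # create a replace ref
listed=$(git replace -l)                   # list replace refs
seen=$(git log -1 --format=%s "$OLD")

git replace -f "$OLD" "$NEWER"             # -f overwrites an existing one
seen_again=$(git log -1 --format=%s "$OLD")

git replace -d "$OLD" >/dev/null           # delete the replace ref
after=$(git log -1 --format=%s "$OLD")

echo "$seen / $seen_again / $after"
```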
Replace ref transfer
● as with heads, tags, notes, remotes
● except that there are no shortcuts and
you must be explicit
● refspec: refs/replace/*:refs/replace/*
● refspec can be configured (in .git/config),
or used on the command line (after git
push/fetch <remote>)
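The explicit refspec can be tried between local repositories. In this sketch, `src`, `shared.git` and `clone` are hypothetical directories under a temporary folder; a fresh clone only sees the replacement after it fetches `refs/replace/*` explicitly.

```shell
work=$(mktemp -d) || exit 1
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/src"
git init -q --bare "$work/shared.git"

cd "$work/src" || exit 1
git commit -q --allow-empty -m "old message"
OLD=$(git rev-parse HEAD)
NEW=$(git commit-tree "$(git rev-parse 'HEAD^{tree}')" -m "new message")
git replace "$OLD" "$NEW"

# Push the branch as usual, plus the replace refs with an explicit refspec.
branch=$(git symbolic-ref --short HEAD)
git push -q "$work/shared.git" "$branch" 'refs/replace/*:refs/replace/*'

# A fresh clone does not receive refs/replace/ by default...
git clone -q "$work/shared.git" "$work/clone"
cd "$work/clone" || exit 1
before=$(git log -1 --format=%s "$OLD")

# ...until they are fetched explicitly. The same refspec could instead be
# added as a fetch line for the remote in .git/config.
git fetch -q origin 'refs/replace/*:refs/replace/*'
after=$(git log -1 --format=%s "$OLD")

echo "before fetch: $before, after fetch: $after"
```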
Creating replacement objects
When it is needed the following commands
can help:
● git rebase [ -i ]
● git cherry-pick
● git hash-object
● git filter-branch
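As one possible use of `git hash-object` from the list above, the sketch below creates a replacement commit with a corrected author by editing the raw commit text. The names "Wrong Name" and "Right Name" and the throwaway repository are assumptions for illustration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME="Wrong Name" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "some work"
OLD=$(git rev-parse HEAD)

# Rewrite the raw commit text and store the result as a new commit object.
NEW=$(git cat-file commit "$OLD" |
    sed 's/^author Wrong Name/author Right Name/' |
    git hash-object -t commit -w --stdin)

git replace "$OLD" "$NEW"

fixed=$(git log -1 --format=%an "$OLD")
raw=$(git --no-replace-objects log -1 --format=%an "$OLD")
echo "with replace: $fixed, without: $raw"
```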
What can it be used for?
Create new views of your history.
Right now only 2 views are possible:
● the view with all the replace refs enabled
● the view with all the replace refs disabled,
using --no-replace-objects or the
GIT_NO_REPLACE_OBJECTS
environment variable
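The two views can be compared on the same object. This sketch uses two hypothetical commits in a throwaway repository and reads the first one three ways: by default, with the command-line flag, and with the environment variable.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

TREE=$(git mktree </dev/null)
OLD=$(git commit-tree "$TREE" -m "original")
NEW=$(git commit-tree "$TREE" -m "replacement")
git replace "$OLD" "$NEW"

# View 1: replace refs enabled (the default).
view_enabled=$(git log -1 --format=%s "$OLD")

# View 2: replace refs disabled, for one command or via the environment.
view_flag=$(git --no-replace-objects log -1 --format=%s "$OLD")
view_env=$(GIT_NO_REPLACE_OBJECTS=1 git log -1 --format=%s "$OLD")

echo "$view_enabled / $view_flag / $view_env"
```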
Why new views?
● split old and new history or merge them
● fix bugs to bisect on a clean history
● fix mistakes in author, committer,
timestamps
● remove big files to have something lighter
to use, when you don’t need them
● prepare a repo cleanup
● mask/unmask some steps
● ...
Limitations
● everything is still in the repo
● so the repo is still big
● there are probably bugs
● confusing?
● ...
Current and future work
● a script to replace grafts
● fix bugs
● allow subdirectories in .git/refs/replace/
● maybe allow “views” as set of active
subdirectories
● ...
Considerations
● best of both worlds: immutability and
configurability of history
● no true view
● history is important for freedom
Many thanks to:
● Junio Hamano (comments, help, discussions,
reviews, improvements),
● Ingo Molnar,
● Linus Torvalds,
● many other great people in the Git and Linux
communities, especially: Andreas Ericsson,
Johannes Schindelin, H. Peter Anvin, Daniel
Barkalow, Bill Lear, John Hawley, ...
● OSDC/OWF organizers and attendees,
● Murex, the company I am working for.
Questions?
