Make Git understand Excel workbook files. A hands-on talk about how to use Git as an Excel/VBA developer.
European Spreadsheet Risk Interest Group (Eusprig), 5 July 2018, Imperial College London, United Kingdom
8. xltrail.com
Git and Excel… Will It Work?
• Out of the box, good enough to create audit logs and manage versions
• But not exactly love at first sight
• Workbooks are just binary files (black box) to Git
• git diff and git merge are useless
• Git knows that a workbook has changed (file hash), but that in itself is not useful enough
• Git’s decentralised design could get in our way:
– Full repository clone
– If we commit a 5 MB workbook twice a day, we end up with 250 × 2 × 5 MB ≈ 2.5 GB repository size after one year
– git clone becomes painfully slow and takes up a lot of space
• Ensure repository does not blow up in size
G(r)it, Spit, And a Whole Lot of Duct Tape
• Git is (very) flexible
• Git is not opinionated
• Git has a thriving ecosystem
• How can we ensure our repositories do not explode in size?
• How can we make git diff / git merge work with Excel workbooks?
• How can we share workbooks without losing our sanity?
It’s game on!
Option 1: Version-control VBA with Git Hooks
• Ask not what Git can do for Excel – ask what Excel can do for Git
• Maintain VBA code “outside Excel”
– Export VBA code from workbook into standalone text files
– Use built-in git diff / git merge
– Import VBA code back into workbook after merge operations
• Git hooks are event triggers to execute code on certain Git actions
• Git pre-commit hook
– Triggered before git commit
– Execute code to export a copy of the workbook’s VBA code into text files
– Text files are added to the commit
– Actual commit is executed
• git merge works out of the box, but the merged code has to be re-imported manually into the workbook
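As a rough sketch of the pre-commit idea above, the following Python snippet writes a hook script into a repository's .git/hooks folder. The helper name `install_pre_commit_hook` and the exporter script `export_vba.py` are hypothetical placeholders, not part of xltrail; the actual VBA export is assumed to live in that script and is not implemented here.

```python
import stat
from pathlib import Path

# Hypothetical exporter script: assumed to dump each workbook's VBA modules
# into text files and `git add` them. It is NOT implemented in this sketch.
HOOK_TEMPLATE = """#!/bin/sh
# pre-commit hook: export VBA code to text files before the commit is created
python "{export_script}" || exit 1
"""


def install_pre_commit_hook(repo_dir, export_script="export_vba.py"):
    """Write an executable pre-commit hook into <repo_dir>/.git/hooks."""
    hooks_dir = Path(repo_dir) / ".git" / "hooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)
    hook_path = hooks_dir / "pre-commit"
    hook_path.write_text(HOOK_TEMPLATE.format(export_script=export_script))
    # Git only runs hooks that are marked executable
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Because the hook exits with a non-zero code when the export fails, Git would abort the commit rather than record a workbook without its exported code.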
Git Workflows 1: Handling Merge Conflicts
• Workflow problems all revolve around merge conflicts
– Someone pushed to remote while we were editing the branch locally
– How to merge changes from our branch back into the master branch
– This is all by design, as Git is a distributed version control system
• Single branch workflow
– Everybody works on master
– Merge conflict when new commits have been pushed in the meantime
• Solution 1:
– git stash / git pull -X theirs / git stash pop
– Gets latest from remote, diff against our version and merge manually
• Solution 2:
– git pull -X ours
– Gets latest version from remote and resolves the merge conflict by making our version the new latest commit (brute force)
• Solution 3:
– Merge for real
• Similar issues and remedies apply to branch-based workflow
I think this is the beginning of a beautiful friendship
• Git and Excel can work
• Leverages a well-established version control system
• Can be as complex (or as simple) as we want it to be
• Who is it for?
– Excel/VBA developer
– The “business” Excel user
No diff viewer (client)
Thank You
https://www.xltrail.com
bjoern.stiel@zoomeranalytics.com
https://linkedin.com/in/bjoernstiel
Editor's Notes
Hi, my name is Bjorn Stiel and I am the co-author of xltrail, a Git extension for Excel workbooks.
I had some issues trying to come up with a hook for this talk to ignite your excitement.
While I was searching for ideas, I came across a Jamie Oliver talk. It’s a TED Talk from 2010 and he starts with:
“Sadly, in the next 18 minutes, while we chat, four Americans will be dead from the food they eat.”
And I thought that was shocking and exciting at the same time.
If anything, in the next 18 minutes, while we chat, 400 Americans will have produced 4000 lines of VBA code and 1200 new spreadsheets.
And as you know,
Some of these workbooks are part and parcel of a business.
Some have probably been around for a long time.
Some look back at a long life of changes, bugs and bug-fixes and varying.
Now, we have heard a lot about EUC and version control today.
I want you to forget about EUC for a second and take you on a brief journey
through the basics of Git and how it can help you as an Excel/VBA developer
by simply leveraging what “professional developers” have been doing for quite some time.
It’s all about free/open-source solutions so if you are interested in the snippets I show,
please contact me at the end.
Excel workbooks are code.
VBA most definitely is code.
Spreadsheet formulas are code, too – functional reactive programming, anyone?
Fortunately, there is Git that does just that.
What happens when you commit an Excel workbook file into Git?
Well, it works, as in: it can be committed.
Unfortunately, it is just one big solid black box which Git doesn’t understand.
Why is this a problem?
In fact, there are two problems:
- The first is a size problem. Because of the decentralised nature of Git, every developer has the full change history on their computer. Every time a large binary file is changed and committed, the repository grows by roughly the size of that file. In other words, if you deal with a 5 MB workbook that is committed three times a day, your repository grows by ~75 MB per week (5 MB × 3 commits × 5 working days). Luckily, this is a solved problem thanks to a Git extension called Git LFS (Large File Storage).
- The second problem is that Git does not know what to do with the file. Git knows that it is a binary file rather than a text file, but to Git it is simply a black box. This renders git-diff and git-merge useless, which in turn makes core features like change management and branching a lot more difficult, as we have to resort to manual comparison and manual merging.
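The size arithmetic behind the first problem can be sketched in a few lines of Python. This is a back-of-the-envelope estimate only: it assumes every commit stores a full new copy of the workbook, a five-day working week and roughly 250 working days per year, and the function name `repo_growth_mb` is made up for illustration.

```python
def repo_growth_mb(file_size_mb, commits_per_day, working_days):
    """Rough upper bound: every commit adds one full copy of the file."""
    return file_size_mb * commits_per_day * working_days


weekly = repo_growth_mb(5, 3, 5)    # 5 MB workbook, 3 commits/day, 5 working days
yearly = repo_growth_mb(5, 3, 250)  # assumption: about 250 working days per year
print(f"{weekly} MB/week, {yearly / 1000:.2f} GB/year")  # 75 MB/week, 3.75 GB/year
```

Modern workbook files are already compressed, so Git's own compression is unlikely to shrink these numbers by much.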
Let’s see how we can make git-diff and git-merge work for Excel workbook files.
The good news is that Git is very flexible.
It can be extended easily, as Git’s extension model follows the Unix philosophy of composing small, simple programs.
That means that Git “extensions” can be written in any language.
By following a few simple rules, it is possible to add commands that appear as if they were built in.
For example, if you wanted to create a custom command like “git xltrail”, you can do so by simply:
- naming your executable/script git-COMMANDNAME (e.g. git-xltrail.exe)
- making your executable/script available on $PATH
So one way would be to have a custom diff and merge command.
Something like “git xltrail diff” and “git xltrail merge”
That’s great.
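To make the naming rule concrete, here is a minimal sketch of what a script called git-xltrail could contain, written in Python. It is not the real xltrail client: the subcommands only print what they would do, and the function names are made up for illustration.

```python
"""Hypothetical `git-xltrail` script: Git runs it for `git xltrail <subcommand>`."""
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="git xltrail")
    sub = parser.add_subparsers(dest="subcommand", required=True)
    for name in ("diff", "merge"):
        cmd = sub.add_parser(name, help=f"{name} Excel workbooks (placeholder)")
        cmd.add_argument("paths", nargs="*", help="workbook files")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder: a real tool would diff or merge the workbooks here
    print(f"would run {args.subcommand} on {args.paths}")
    return 0

# When saved as an executable file named git-xltrail somewhere on $PATH,
# the file would end with:  raise SystemExit(main())
```

With such a file on $PATH, typing `git xltrail diff Book1.xlsm` makes Git look up the git-xltrail executable and pass it the remaining arguments.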
But we can do even better: We can even override git builtins like git-diff and git-merge by customising .gitattributes and .gitconfig (globally or per-repository, there is a variety of possibilities).
In .gitconfig we define a custom diff command named “xltrail” which executes the command “python /dev/xltrail-client/diff.py” (or a compiled executable).
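What such a diff.py might do is sketched below. This is not the xltrail implementation, just a simplified stand-in: it relies on the fact that .xlsx/.xlsm workbooks are zip containers and only reports which parts inside the container differ. Git calls an external diff command with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode); added or deleted files and the older binary .xls format are not handled here.

```python
import zipfile


def workbook_parts(path):
    """Map each part inside an .xlsx/.xlsm zip container to its CRC-32 checksum."""
    with zipfile.ZipFile(path) as archive:
        return {info.filename: info.CRC for info in archive.infolist()}


def diff_workbooks(old_file, new_file):
    """Return (status, part) tuples, ordered by part name, for parts that differ."""
    old, new = workbook_parts(old_file), workbook_parts(new_file)
    changes = []
    for part in sorted(set(old) | set(new)):
        if part not in old:
            changes.append(("added", part))
        elif part not in new:
            changes.append(("removed", part))
        elif old[part] != new[part]:
            changes.append(("modified", part))
    return changes


def main(argv):
    # Git passes: path old-file old-hex old-mode new-file new-hex new-mode
    path, old_file, _, _, new_file, _, _ = argv[:7]
    print(f"workbook {path}")
    for status, part in diff_workbooks(old_file, new_file):
        print(f"  {status}: {part}")
    return 0
```

A real differ would go further and compare the VBA modules and cell formulas inside those parts; the sketch stops at the container level to stay within the standard library.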
In .gitattributes we define a list of file extensions and instruct git to use “xltrail” for the “diff” command.
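The two configuration pieces might look roughly like the text generated below. The driver name and command are the ones mentioned in the talk; the list of file extensions is an assumption, and the two helper functions are only there to show the shape of the files.

```python
def gitconfig_diff_block(driver, command):
    """Text of a .gitconfig section that registers an external diff driver."""
    return f'[diff "{driver}"]\n\tcommand = {command}\n'


def gitattributes_lines(extensions, driver):
    """Text of .gitattributes lines routing the given file extensions to the driver."""
    return "".join(f"*.{ext} diff={driver}\n" for ext in extensions)


# Command taken from the talk; the extension list is an assumption
print(gitconfig_diff_block("xltrail", "python /dev/xltrail-client/diff.py"), end="")
print(gitattributes_lines(["xlsx", "xlsm", "xlsb"], "xltrail"), end="")
```

The first snippet of output belongs in .gitconfig (or in the repository's .git/config), the second in a .gitattributes file at the root of the repository.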
That’s diffing sorted. But that’s only the first half of the match.
Time to play the second half: git-merge.
Without a working git-merge, we get a merge conflict every single time, as Git simply sees two different binary files.
A naïve solution would be: git merge -X ours – which simply resolves the merge conflicts by accepting our version of the file and discarding the other file version.
Before we go into technical implementation, you might wonder what the use case for merging workbooks is.
People might work on a new VBA feature and branch off master in order to do that.
In the meantime, someone fixed a bug on the master branch.
Manually merging the changes in… is error-prone.
The spreadsheet use case is probably less clear. It is also more complicated, for two reasons:
- spreadsheets are a lot more complicated in the first place (this applies to diffing, too)
- there’s other stuff to consider, for example named ranges
Back to technical implementation:
The road is very similar to what we did with the custom differ.
Which means: back to .gitconfig and .gitattributes.
The .gitconfig block:
The merge block contains the merge driver's identifier, which is used to reference the merge driver later.
The name property contains a description of the merge driver.
The driver property contains the command that will be called when a conflict occurs. A handful of predefined placeholders are available, most notably:
%O: ancestor’s version of the conflicting file
%A: current version of the conflicting file
%B: other branch's version of the conflicting file
See https://git-scm.com/docs/gitattributes for the full reference.
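To illustrate the contract between Git and a merge driver, here is a minimal Python sketch. The config text uses a hypothetical script name (merge.py) and description, and the driver only works at whole-file level; it is not how xltrail merges workbooks. Git expects the driver to leave the merge result in the %A file and to exit with 0 on success or a non-zero code on conflict.

```python
import shutil
from pathlib import Path

# Hypothetical config: the description and the script name merge.py are assumptions
MERGE_CONFIG = (
    '[merge "xltrail"]\n'
    "\tname = xltrail merge driver for Excel workbooks\n"
    "\tdriver = python merge.py %O %A %B\n"
)
GITATTRIBUTES = "*.xlsm merge=xltrail\n"


def same(a, b):
    """Byte-for-byte comparison of two files."""
    return Path(a).read_bytes() == Path(b).read_bytes()


def merge_driver(ancestor, current, other):
    """Whole-file three-way merge; returns the exit code Git expects."""
    if same(current, other) or same(ancestor, other):
        return 0  # both sides agree, or only our side changed: keep ours
    if same(ancestor, current):
        shutil.copyfile(other, current)  # only their side changed: take theirs
        return 0
    return 1  # both sides changed differently: report a conflict
```

Git usually settles the one-sided cases itself before calling a driver, so in practice the interesting branch is the last one; that is where a workbook-aware merge of VBA modules and sheets would have to go.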
Live coding example
- time for a simple example
- spin up example (without xltrail installed)
- git xltrail install --local
- change VBA code
- change spreadsheet code
- brief peek into cloud.xltrail.com and spreadsheet differ: https://cloud.xltrail.com/#/workbook/github.com%2FZoomerAnalytics%2Fxltrail-demo.git%2FEvaluator_model.xlsm/sheet/5%20Year%20Plan/diff/3c54f16b657dce1e87e87435597f1376a6d8a926?branch=master