How to Reduce Database Load with Sparse Branches
John LoVerso
Software Archeologist
2
MathWorks
 We are a 3500+ person company dedicated to accelerating
the pace of engineering and science
 We have ~90 products based upon our core platforms
• MATLAB – The Language of Technical Computing
• Simulink – Simulation and Model-Based Design
3
Technical Challenges
 Unified code base from which full product family is released
twice a year
 Managing a code base of over 1 million files
• 5,000+ components, acyclic dependencies
 Integrating changes from ~1500 developers
on ~270 active branches (streams)
4
Multilevel Streams Hierarchy
//mw/main
//mw/integ2 //mw/integ1
//mw/product1 //mw/product5
5
Merge Down and Merge Up
 Build and Test infrastructure blesses submitted changes
• Qualify changes already submitted to the branch
 Merge down from the last blessed change-level
• Almost never merge from the tip
 Cannot copy up; must merge
6
Componentized Streams
 Our branches are built out of components
 We use pairs of stream specs for each branch
 For details, see
• Moving 1000 Users and 100 Branches Into Streams
- from MERGE2014
• Outsmarting Merge Edge Cases in Component Based Development
- from MERGE2016
7
Pair of Streams
 Wide-open development stream specs
Stream: //mw/product
Parent: //mw/integ
Paths:
share …
 Virtual streams computed from component information
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
share matlab/toolbox/prodA/...
share matlab/toolbox/prodB/...
8
Average Week
            Changes   Files
                      Min   Average   Max      Total
New Work    3328      1     8         52       25641
Merges      610       1     1890      39305    1153239
Changes submitted over one week in our main product depot
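A quick calculation on the table's figures shows how much of the week's submitted file traffic comes from merges. This is only arithmetic on the numbers above; the variable names are illustrative, and it assumes the Min/Average/Max columns count files per change.

```python
# Figures copied from the "Average Week" table.
new_work = {"changes": 3328, "files_total": 25641}
merges = {"changes": 610, "files_total": 1153239}

# Fraction of all submitted files that were part of a merge change.
total_files = new_work["files_total"] + merges["files_total"]
merge_share = merges["files_total"] / total_files

# Average files per change, recomputed from the totals as a sanity check.
avg_new = new_work["files_total"] / new_work["changes"]
avg_merge = merges["files_total"] / merges["changes"]

print(f"merge share of submitted files: {merge_share:.1%}")
print(f"files per new-work change: {avg_new:.1f}")
print(f"files per merge change: {avg_merge:.1f}")
```

The recomputed averages agree with the table's Average column, which supports reading that column as files per change.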
9
2015.1 Merge Meltdown
 Many regressions in integration engine
 Lost weeks of developer time
 All because we merge all the time
10
What If We Didn’t Need to Merge?
 At any time, the number of files on a branch with unique
changes is small
 The rest of the files are the same as on the parent branch
11
What is Sparse Branching?
 Sparse branching, a.k.a. lightweight or just-in-time
branching, is a strategy where files are only branched when
modified and are otherwise just a reference to a file on the
corresponding parent branch
 Initial creation of a sparse branch is an O(1) operation akin
to creating a clone or snapshot in a copy-on-write filesystem
like ZFS
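The copy-on-write idea can be sketched with a toy model. This is not how Perforce stores branches; the class and its method names are hypothetical and only illustrate that creation does no per-file work and that unmodified files resolve to the parent.

```python
class SparseBranch:
    """Toy model: a branch that stores only the files changed on it."""

    def __init__(self, parent, parent_change):
        # Creation records references only; no per-file work is done here.
        self.parent = parent                # dict: path -> content on the parent
        self.parent_change = parent_change  # parent change the branch is pinned to
        self.active = {}                    # path -> content branched on this branch

    def read(self, path):
        # Files changed on the branch win; everything else falls through.
        if path in self.active:
            return self.active[path]
        return self.parent[path]

    def write(self, path, content):
        # The first write to a path is the moment that file is branched.
        self.active[path] = content


parent = {"prodA/file1": "v1", "prodB/file2": "v1"}
branch = SparseBranch(parent, parent_change=1000)
branch.write("prodA/file1", "v2")
print(branch.read("prodA/file1"), branch.read("prodB/file2"))
```

Only the one modified file occupies space on the branch; the parent's copy is left untouched.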
12
Why not simply use Task Streams?
 They are database expensive
• O(n) rather than O(1) to create
• They consume database space until deleted
 They require unnecessary merging in order to be kept up-to-date
with their parent
• They still expose the user to the vagaries of complex merges
 They have limitations
• You can only merge from their parent
• Can’t recreate them if they are destroyed
• No virtual stream support
13
Some Terms
 Winked-in file
• a file mapped to a revision on an ancestor branch (lazy copy)
 Active file
• a file with changes that have not yet propagated to the parent branch
 Make concrete
• the act of branching a winked-in file in order to make it active
14
Our Approach to Sparse Branches
 A sparse branch is defined by the stream Paths:
• Winked-in files use “import path@change” to map the paths from an
ancestor branch
• Files are made active on the branch by inserting “share” lines for each file
after populating it on the sparse branch
 Moving ahead to a new parent change involves
• Advancing the change number in the “import path@change” directives
• Merging changes down from the parent into active files
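The Paths generation described above might be sketched as follows. The function name and its parameters are hypothetical; it only formats the "import" and "share" lines in the shape the slides show and does not talk to a server.

```python
def sparse_paths(components, parent_stream, change, active_files):
    """Build the Paths lines for a sparse virtual stream.

    components:    component directories, e.g. "matlab/toolbox/prodA"
    parent_stream: stream the winked-in files are imported from
    change:        parent change number the imports are pinned to
    active_files:  files already made concrete on this branch
    """
    lines = [
        f"import {c}/... {parent_stream}/{c}/...@{change}" for c in components
    ]
    # Share lines follow the imports, as in the slides' examples.
    lines += [f"share {f}" for f in sorted(active_files)]
    return lines


components = ["matlab/toolbox/prodA", "matlab/toolbox/prodB"]
active = {"matlab/toolbox/prodA/file1"}

at_1000 = sparse_paths(components, "//mw/integ", 1000, active)
# Moving ahead to a newer parent change only rewrites the pinned number;
# merging the parent's changes into the active files is a separate step.
at_1100 = sparse_paths(components, "//mw/integ", 1100, active)

print("\n".join(at_1000))
```

Regenerating the whole Paths field from the component list and the active-file list keeps the spec a pure function of those two inputs, which is one reasonable design under these assumptions.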
15
Pair of Streams
 Wide-open development stream specs
Stream: //mw/product
Parent: //mw/integ
Paths:
share …
 Virtual streams computed from component information
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
share matlab/toolbox/prodA/...
share matlab/toolbox/prodB/...
16
Pair of Streams – Sparse Branch
 Wide-open development stream specs
Stream: //mw/product
Parent: //mw/integ
Paths:
share …
 Virtual streams computed from component information
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
import matlab/toolbox/prodA/... //mw/integ/matlab/toolbox/prodA/...@1000
import matlab/toolbox/prodB/... //mw/integ/matlab/toolbox/prodB/...@1000
17
Pair of Streams – Sparse Branch
With Active Files
 Wide-open development stream specs
Stream: //mw/product
Parent: //mw/integ
Paths:
share …
 Virtual streams computed from component information
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
import matlab/toolbox/prodA/... //mw/integ/matlab/toolbox/prodA/...@1000
import matlab/toolbox/prodB/... //mw/integ/matlab/toolbox/prodB/...@1000
share matlab/toolbox/prodA/file1
18
Multi-level Sparse Hierarchies
Stream: //mw/integ~CTB
Parent: //mw/integ
Paths:
import matlab/toolbox/prodA/... //mw/main/matlab/toolbox/prodA/...@980
import matlab/toolbox/prodB/... //mw/main/matlab/toolbox/prodB/...@980
share matlab/toolbox/prodB/file2
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
import matlab/toolbox/prodA/... //mw/main/matlab/toolbox/prodA/...@980
import matlab/toolbox/prodB/... //mw/main/matlab/toolbox/prodB/...@980
import matlab/toolbox/prodB/file2 //mw/integ/matlab/toolbox/prodB/file2@1000
share matlab/toolbox/prodA/file1
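One way to read the two-level example: the child keeps the same base imports as its sparse parent, adds a per-file import for each file that is active on the parent, and then lists its own active files. The sketch below reproduces the //mw/product~CTB Paths from those inputs. The function and argument names are hypothetical, and the Perforce wildcard is written as "...".

```python
def child_paths(base_imports, parent_stream, parent_change,
                parent_active, own_active):
    """Compose Paths lines for a sparse branch whose parent is also sparse."""
    lines = list(base_imports)
    # Files active on the parent must come from the parent, not the ancestor.
    for f in sorted(parent_active):
        lines.append(f"import {f} {parent_stream}/{f}@{parent_change}")
    # Files active on this branch are shared last.
    lines += [f"share {f}" for f in sorted(own_active)]
    return lines


base = [
    "import matlab/toolbox/prodA/... //mw/main/matlab/toolbox/prodA/...@980",
    "import matlab/toolbox/prodB/... //mw/main/matlab/toolbox/prodB/...@980",
]
product_paths = child_paths(
    base,
    parent_stream="//mw/integ",
    parent_change=1000,
    parent_active={"matlab/toolbox/prodB/file2"},
    own_active={"matlab/toolbox/prodA/file1"},
)
print("\n".join(product_paths))
```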
20
Transparent to the User
 Once a sparse branch is created, user commands should be
entirely agnostic to the nature of the branch
• add/edit/delete/move/unshelve/merge should all just work
 Updating to a newer change level of the parent is special
• merge + revert of newly branched files
21
But what about new work?
We have explored two approaches:
 Branch On Edit
• Just-in-time branching of a file as soon as “p4 edit” happens
 Branch On Submit
• Files are opened on the parent branch and are only branched at submit
time
22
Branch On Edit
 Has the benefit of being easier to implement
 Broker wrapper to intercept operations that open files
• edit, delete, move, open, unshelve, merge
• Compute all the files about to be operated on
• Invoke ‘p4 populate’ to make the affected winked-in files concrete on
the sparse branch
• Invoke ‘p4 flush’ to switch the have revision to the newly branched file
• Let the original operation complete so that it opens the branched revision
 Trigger (change-content) to add files to active list on submit
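The wrapper's steps can be sketched as a function that plans, but does not run, the commands for a single-level sparse branch. The function name, its arguments, and the exact command arguments are assumptions for illustration; the slides do not show the real broker logic, and multi-level hierarchies would need a per-file source lookup.

```python
def branch_on_edit_plan(files, active, parent, change, branch):
    """Return p4 command lines a wrapper might run before an operation
    such as 'p4 edit' is allowed to proceed. Nothing is executed here."""
    plan = []
    for f in files:
        if f in active:
            continue  # already concrete on the sparse branch
        src = f"{parent}/{f}@{change}"   # winked-in revision on the parent
        dst = f"{branch}/{f}"            # new concrete file on the sparse branch
        plan.append(["p4", "populate", "-d", "Dynamically branched", src, dst])
        # Point the client's have list at the branched file without
        # rewriting the workspace copy.
        plan.append(["p4", "flush", dst])
    return plan


plan = branch_on_edit_plan(
    files=["matlab/toolbox/robotics/Makefile"],
    active=set(),
    parent="//mw/robotics",
    change=1705498,
    branch="//pdb/jloverso/robotics/demo",
)
for cmd in plan:
    print(" ".join(cmd))
```

Files that are already active produce no commands, so repeated edits of the same file cost nothing extra.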
23
How It Looks
$ p4 stream -t sparse_v1 -P //mw/robotics@1705498 //pdb/jloverso/robotics/demo
Stream //pdb/jloverso/robotics/demo created.
$ p4 stream -o //pdb/jloverso/robotics/demo~CTB | tail -2
import matlab/toolbox/robotics/... //mw/robotics/matlab/toolbox/robotics/...@1705498
$ p4 files //pdb/jloverso/robotics/demo/...
//pdb/jloverso/robotics/demo/... - no such file(s).
24
How It Looks - Sync
$ p4 client -s -S //pdb/jloverso/robotics/demo
Client jloverso.demo switched.
$ p4 sync
…
//mw/robotics/matlab/toolbox/robotics/Makefile#3 - /sandbox/jloverso.demo/matlab/toolbox/robotics/Makefile
//mw/robotics/matlab/toolbox/robotics/baseline.cpp#7 - /sandbox/jloverso.demo/matlab/toolbox/robotics/baseline.cpp
…
25
How It Looks - Edit
$ p4 have matlab/toolbox/robotics/Makefile
//mw/robotics/matlab/toolbox/robotics/Makefile#3 - /sandbox/jloverso.demo/matlab/toolbox/robotics/Makefile
$ p4 edit matlab/toolbox/robotics/Makefile
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#1 - opened for edit
$ p4 have matlab/toolbox/robotics/Makefile
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#1 - /sandbox/jloverso.demo/matlab/toolbox/robotics/Makefile
26
How It Looks – Dynamically Branched
$ p4 files //pdb/jloverso/robotics/demo/...
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#1 - branch change 1722340 (text)
$ p4 filelog //pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile
... #1 change 1722340 branch on 2016/03/31 by jloverso@jloverso.demo (text) 'Dynamically branched'
... ... branch from //mw/robotics/matlab/toolbox/robotics/Makefile#1,#4
27
How It Looks - Submit
$ p4 submit -d "new work"
Submitting change 1722341.
Locking 1 files ...
edit //pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#2
Change 1722341 submitted.
$ p4 stream -o //pdb/jloverso/robotics/demo~CTB | tail -3
import matlab/toolbox/robotics/... //mw/robotics/matlab/toolbox/robotics/...@1705498
share matlab/toolbox/robotics/Makefile
28
Branch On Submit
 Provides a truer version of copy-on-write semantics
 Pending changes are fully discardable with no remnants or
commit server impact
 Requires the ability to re-base (reopen) the files in a pending
change from one branch to another
• Can be done by creating journal entries in order to modify entries in
db.have, db.working, and db.locks tables
• This is known in 2015.2 as “p4 sync -r”
29
Status
 Branch-on-edit in production use for
• private developer branches
• “fixes” branches that are created on-the-fly for each change that is
processed by our Build & Test automation
 We have a prototype of branch-on-submit
• Some limitations; being worked on
 Both versions support multilevel hierarchies
30
Future Plans
 Our goal is to get all non-mainline branches to be sparse
• This is where we can truly reduce database sizes
 Possibility of open-source release of broker and trigger logic
• Some internal dependencies need to be eliminated
Thank you!
John.LoVerso@MathWorks.com

Editor's Notes

  • #13 - Complex merges are merges that cannot be committed atomically due to limitations in the Perforce integration engine
  • #23 Poses additional challenges for multi-level sparse hierarchies