diG : Deploying large applications with Git
                         Alec Clews              alec.clews@voga.com.au
5. Security polices that make installing additional software, daemons and setuid programs a
      problem (ssh was already...
Component Overview

Work flow
The diG staging repository is a Git version control database

   1.   Software is built a...
There is a branch in diG staging for each application environment and on each corresponding
application environment there ...
1. Release check in
   2. Deployment merge
   3. After the configuration of a new release on an environment (NB Requires a...
On the staging repository
    git remote add –m <envID> -f <envID>                      ssh://$username@$host:$port$locati...
Add the configuration changes to the local repository
Add any configuration changes back into the application repository
network storage. This could result in major savings on storage (Git needs only store each
        unique object version on...
Upcoming SlideShare
Loading in …5

Deploy Application Files with Git


Published on

Presentation for SAGE AU 2009

Source document at http://github.com/alecthegeek/diG

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Deploy Application Files with Git

  1. 1. diG : Deploying large applications with Git Alec Clews alec.clews@voga.com.au Principal Consultant at Voga Consulting Services Abstract When deploying files into commercial environments it is vital to document and audit the installation of files across multiple application instances. This paper presents a working example that addresses some of the major issues relating to deployment such as automation, access control, logging and change auditing. Traditionally the Git version control tool is used to manage development source code and as a delivery point for other tools such as such Capistrano or Puppet. However current deployment and system management frameworks often use a specialist domain language or pidgin and need significant configuration which can make them more complex to use. Some large tools also require programming implementation classes in a language such as Java to perform specific actions. In a system management environment this effort is worthwhile, but for application deployment (often with little time or budget available) it is often too much effort. Using Git with a number of simple wrapper scripts can greatly simplify the deployment implementation. A basic understanding of Git and ssh is assumed. Motivation Deployment of software in large scale, commercial, environments presents a set of requirements and constraints that are only usually considered as an after thought. These deployment processes should be: 1. Repeatable: Can be run repeatedly as required 2. Reproducible: Can be re-used and installed across different systems as needed 3. Documented 4. Audited: Records should be maintained of all activity 5. Secure: Access and functions should be restricted and based on some form of access control 6. Automated: As little human interaction should be required as possible 7. Easy to use: There should be as few opportunities for errors as possible This paper introduces some simple recipes (refered to collectively to as diG – deploy interactivity with Git) that can be quickly implemented to address these needs. Whilst there are a number of commercial and open source alternatives available, with much greater functionality, they all require significant set-up and infrastructure. diG is a quick win solution with small cost and effort but limited functionality. Background This approach was originally developed for teams with the following needs: 1. Frequent releases 2. Regular internal and external audits 3. Over forty application instances across production, test and development 4. An application that occupies 2.5Gb of file storage in over 10,000 files
  2. 2. 5. Security polices that make installing additional software, daemons and setuid programs a problem (ssh was already in use and installation of Git was easy) Scope The features provided by diG are outlined below in the user stories. However it is also important to articulate the things that diG does not do Excluded features: 1. Application configuration 2. Application management 3. Operating system operations (beyond the manipulation of application file system objects) User Stories The following user stories describe the features that diG provides: • diG will accept release contents from a configuration management or version control tool. If available a release identifier must be tracked in some fashion • diG will provide a report that lists the contents of each accepted release • diG will have a function to physically deploy release contents to a specific environment NB: • Pre and post processing of a releases contents are subject to separate deliveries outside the scope of diG (c.f. Environment configuration scripts) • A release can be re-deployed to any environment at will • diG will provide a report that lists the contents of the release deployed to each environment • diG will provide details of any differences between the release deployed to an environment and any changes detected to the objects in the same runtime environment (a change audit report). • A (semi automatic) 'roll back' feature to a documented state would be highly desirable • diG will allow detected changes to be marked as benign and ignored for future change audit reports Solution Overview Git, a distributed version control system, was selected for five reasons: • Tracks changes to files • Zero capital cost • Supports ssh as a transport mechanism • Does not require a server process • Excellent support for branches Subsequently Git has been the correct choice because of • Flexibility • Ease of testing
  3. 3. Component Overview Work flow The diG staging repository is a Git version control database 1. Software is built and released from an existing set of processes and tools 2. The release is identified with an external identifier 3. A new empty branch is created on the diG staging repository (a release branch) 4. The release contents are committed to the diG staging repository on the new release branch 5. The Git commit is a record in staging of the release contents. This may be tagged and signed if required
  4. 4. There is a branch in diG staging for each application environment and on each corresponding application environment there is another Git repository (one repository per application installation) 6. To release to an environment a release branch is merged to an application branch in the staging repository 7. The updated application branch is pushed to the remote application repository 8. On the application environment files are checked out (deployed) from Git 9. The remote application is configured as needed 10. Any configuration changes can be added back to the application repository 11. Configuration changes are pulled back to the staging repository. Assumptions 1. Each application environment has a unique and consistent identifier 2. Git software is installed on each system 3. ssh is installed on each system Set-up Involves two steps 1. Creating a staging repository 2. Creating a runtime environment definition (environment branch in staging) and corresponding repository. Repeated for each application environment as needed. Custom Merge Drivers When Git attempts to bring two branches together it performs a merge. In diG a merge occurs when a release is applied to an environment – the release branch is merged onto the contents of the current application branch. However the Git default behaviour takes similar files and integrates non overlapping changes. For a release this is not what we want (code merge has already been done during the development and build cycle) and files in the new release should overwrite existing application files. This is accomplished by defining a custom merge driver that implements this behaviour. 1. Define a merge driver stanza in the Git configuration file 2. Define the file pattern for which this driver will be used Refer to the Git documentation for more details1 The set-up scripts below perform these steps when the repositories are created by 'git init' This is done on all Git repositories that diG uses (for completeness, although the current use cases do not require merging on the application environment repositories). Naming Conventions and Git Tagging Using diG affectively requires the use of a consistent naming convention for all branches, tags etc. For examples in the following examples all new delivery branch names are prefixed with “DELIVERY_”. Git tags, basically immutable names that can be associated with specific commits, are useful for both auditing and reporting purposes, and a consistent naming convention should also be adopted. In the reporting section it is assumed that all deployments are tagged with a name that is prefixed with “DEPLOYED”, although the specific tagging is not show. The following minimum set of tag points is suggested 1 http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html#_defining_a_custom_merge_driver
  5. 5. 1. Release check in 2. Deployment merge 3. After the configuration of a new release on an environment (NB Requires a pull back to staging first) Create a staging repository On the staging repository (this may require substantial free desk space) create a Git repository. e.g. 1. Creates a new empty Git staging database (a repository or repo) 2. mkdir stagingRepo 3. cd stagingRepo 4. git init # TODO add –shared options 5. Creates a custom merge driver. Explained above 6. echo '[merge "overwrite"]' >> .git/config 7. echo ' name = overwrite using cp' >> .git/config 8. echo ' driver = cp %B %A' >> .git/config 9. echo ' #recursive = defaults to driver' >> .git/config 10. echo '* merge=overwrite' >> .git/info/attributes 11. An initial empty commit and branch to which all subsequent deliveries can be added 12. git commit --allow-empty -m "Creating empty staging repo" 13. git branch -m master Empty # rename master branch Adding a new environment Before deployments occur to a remote application system create a repository for each application runtime environment to be maintained. This is created on the operating system where the application is installed On the application runtime system: Ensure Git is installed Ensure there is adequate free hard disk space. This will depend on the size of the application and the size/number of releases to be managed. Login as the account to own and maintain the deployment database. A separate account is suggested 1. mkdir <envID>Repo 2. cd <envID>Repo 3. git init # TODO add –shared options 4. echo '[merge "overwrite"]' >> .git/config 5. echo ' name = overwrite using cp' >> .git/config 6. echo ' driver = cp %B %A' >> .git/config 7. echo ' #recursive = defaults to driver' >> .git/config 8. echo '* merge=overwrite' >> .git/info/attributes #no spaces around '=' 9. GIT_DIR=$location/<envID>Repo/.git ; export GIT_DIR 10. cd <AppEnvLocation> 11. Create .gitignore file. See below At this point the application should be stopped 12. find . -type f -print | git update-index --add –stdin The application may be restarted 13. git commit –m "Initial <envID> load" 14. git branch –m master <envID> The command 'git update-index' is used rather than the more usual 'git add' because: 1. It overcomes limitations of specifying lists of file names on the command line 2. Improves performance by not invoking Git add for every file 3. Provides an opportunity to apply selection criteria on the files to be included if .gitignore is not sufficient.
  6. 6. On the staging repository git remote add –m <envID> -f <envID> ssh://$username@$host:$port$location where $username, $host, $port and $location represent the location and credentials for the remote application repository Creating a .gitignore file In the application root directory a .gitignore2 must be created. It lists the file patterns that are NOT to be tracked by the system (e.g. application log files). Delivering a new release to staging The delivery mechanism is currently very simple. The release process, from the configuration management tool (e.g. Subversion3) will place a new release in an empty directory (e.g. a Subversion tag path). The delivery process attempts to identify the release ID, in the following example this value has been placed in $DEL_NAME. 1. git checkout -b $DEL_NAME EMPTY # We create a new empty branch for delivery 2. cp -a $DEL_LOC/. . # assumes files are in $DEL_LOC 3. git add . 4. git commit -m "Accepting delivery $DEL_NAME" 5. git tag DELIVERY_${DEL_NAME} # Optional but recommended Deploying a new release to an environment This is a multi-stage process Deployment Overview On staging 1. Merge the appropriate release branch into the application branch. NB This will use the custom merge driver and overwrite application files 2. Push the release files to the remote application environment repository On the runtime environment 3. Deliver files to the runtime environment from the local repository 4. Configure application (not part of diG) 5. Add the configuration changes to the local repository On Staging 6. Pull any changes back from the environment repository Push release files to remote application environment repository 1. Merge a release onto a environment git merge --stat --commit -m "Adding delivery $DEL_NAME to environment $ENV_NAME" DELIVERY_$DEL_NAME 2. Move files to remote environment repo. NB This does not touch the current runtime environment git push --verbose --force $envID Deliver files to the runtime environment from the local repository Push application files from the new delivery onto the application environment git reset --hard HEAD At this point configuration and installation processes would be executed 2 http://www.kernel.org/pub/software/scm/git/docs/gitignore.html 3 http://subversion.tigris.org/
  7. 7. Add the configuration changes to the local repository Add any configuration changes back into the application repository 1. git add . 2. git commit –m “Message” Pull any changes back from the environment repository Bring configuration changes back to staging so that the staging repository is the single source of truth git fetch <envID> HEAD Change Auditing Application environments may be monitored using a combination of 'git status' and 'git diff'. When changes are detected they can removed with 'git reset --hard HEAD' and 'git clean -fd' Reports Reports will be specific to the requirements of each organisation, however Git does maintain complete records of all files contents and their movement. By identifying artefact's unique sha1 name it is possible to track a complete deployment lifecycle. Reports can created by running scripts with Git commands and there are many examples across the Internet. Reports are made easier by using a consistent naming conventions in branches and tags. Example report #! /bin/bash # Offer a list of previous deployments and display their contents select rel_id in `git tag -l DEPLOYED*` quit; do case $rel_id in quit) break;; *) git log --summary $rel_id;; esac done Conclusion diG has limitations but is easy to set up and use. In an environment where no provision has been made to manage the deployment of file-system objects this solution it can be a convenient way to set up a deployment and audit solution. This work has been tested on Linux, Solaris 8 and Windows. So far there have been no portability issues. Further work • Development and refinement of the workflow for various deployment requirements and application types • Scalability testing in order to discover the limits at which diG become impractical to use • Integration of change audit on none file-system objects (e.g. database schema objects) Possibly via checksum storage • Further automation and integration of pre and post-deployment scripts and Git hook scripts (c.f. http://toroid.org/ams/git-website-howto) • A mechanism for a release to delete objects (files, database tables etc) from an environment • A documented security model • Investigation of improvements possible by placing the various Git repositories on shared
  8. 8. network storage. This could result in major savings on storage (Git needs only store each unique object version once) References and Further Reading Git 1. Git explained on Wikipedia http://en.wikipedia.org/wiki/Git_(software) 2. Git Home page by Petr Baudis http://git.or.cz/index.html 3. Git SCM page by Scott Chacon http://git-scm.com/ 4. Git Wiki http://git.or.cz/gitwiki/FrontPage 5. Git FAQ http://git.or.cz/gitwiki/GitFaq 6. Packaging Software Using Git http://www.golden- gryphon.com/software/misc/packaging.html Alternative Open Source Products NB This list is illustrative only. There are doubtless many other tools, apologies to any that have been omitted. 1. Puppet http://reductivelabs.com/trac/puppet/wiki/AboutPuppet 2. Capistrano http://www.capify.org/ 3. heroku http://heroku.com/ 4. Fabric http://docs.fabfile.org/ 5. Vlad the Deployer http://rubyhitsquad.com/Vlad_the_Deployer.html 6. ControlTier http://open.controltier.com/ 7. Sun N1 Service Provisioning System (N.B. free of cost, but restricted license) http://www.sun.com/software/products/service_provisioning/ Acknowledgements This paper was written with the help of Marilyn Clews and Dr Nithya Rae License and Availability This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this licence, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California 94105, USA This paper and the any examples are freely available from http://github.com/alecthegeek/diG/tree/master. Contributions and comments are welcome About the Author Alec is principal at Voga Consulting Services (http://voga.com.au), a boutique consultancy that specialises in Application Life Cycle Management (ALM). He drinks too much espresso and probably spends too much time on various social websites as alecthegeek or Alec Clews. LinkedIn profile http://www.linkedin.com/in/alecclews