A short talk I gave to my group to explain the basics of hg (Mercurial) and version control

  1. 1. Giovanni Dall'Olio, IBE (UPF-CEXS)● Introduction to version control and hg for our bioinformatics group
  2. 2. What is hg?● Programmers use software to keep track of all the versions of the code they write. These programs are called Version Control Systems (VCS)● There are many VCS programs; the best known are CVS, Subversion, Git, hg (Mercurial) and Bazaar● Git, hg and Bazaar are newer and based on a different paradigm called Distributed Version Control System (DVCS)
  3. 3. How will hg be useful for us?● Keep versions of the scripts we create ● also of the datasets, results, etc.● Have a common, official version of the pipeline and the scripts on bitbucket.org● Everybody will work on their own computer, on their own copy of the scripts; every once in a while, they will merge it with the official version
  4. 4. Installing hg● Hg can run on any major operating system● On Linux, install it through your software center; on Debian or Ubuntu you can also use: ● sudo apt-get install mercurial● On other operating systems, go to http://mercurial.selenic.com/ and download the installer
  5. 5. Initial hg configuration● Hg stores its configuration in a file called: ● ~/.hgrc on Unix ● C:\Documents and Settings\your_name\.hgrc on Windows● Open it and write your username in the [ui] section: [ui] username = Giovanni Dall'Olio <dalloliogm@gmail.com>
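The slide shows the configuration on a single line, but in the file the section header and the setting go on separate lines. The following is a minimal sketch, assuming Python's standard `configparser` module as a stand-in for a text editor; the helper name `render_hgrc` is made up for this illustration. It only builds the text and prints it, so it does not touch any existing `~/.hgrc`.

```python
import configparser
import io


def render_hgrc(username):
    """Return the text of a minimal hgrc that only sets the [ui] username.

    render_hgrc is a hypothetical helper for this example, not part of hg.
    """
    # interpolation=None keeps any '%' in the value from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config["ui"] = {"username": username}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


hgrc_text = render_hgrc("Giovanni Dall'Olio <dalloliogm@gmail.com>")
print(hgrc_text)
```

Copying the printed text into the configuration file by hand gives the same result as typing it in an editor.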
  6. 6. The basic operations of a VCS● Creating a repository ● This means starting to keep track of the versions of the files in a project● Adding files to the repository ● Files are not tracked unless you say so● Committing changes ● Saving a version of the current state of the files● Pushing the changes and merging them with the standard version
  7. 7. Creating a repository● Create a new directory and create the repo with: ● hg init
  8. 8. Effect of creating a new repo● A hidden directory (.hg) will be created● From now on, you can run other hg commands in this directory
  9. 9. Adding files to the repo● By default, no files are added to the repository● This means that if you create a new file in the directory, hg will not track it
  10. 10. Creating a file
  11. 11. Files are not added automatically to the repo● The command: ● hg log file.txt● should return the history of changes to the file file.txt. Since it is not in the repo yet, nothing is shown
  12. 12. hg add● To add a file to the repository, use hg add● This tells hg to record the changes made to that file
  13. 13. Committing changes● The most important operation in a VCS is the commit● This operation saves the state of the tracked files and associates it with a version● One commit → one version
  14. 14. Committing a change● We have added the file file.txt to the repo● This is a change compared to the previous version (where this file was not present) ● So we have to record it with a commit
  15. 15. Our first commit
  16. 16. Effects of adding a file and committing● From now on, all the changes made to the file will be tracked
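The sequence from the last slides (hg init, create a file, hg add, commit, hg log) can be replayed in a throwaway directory. The sketch below is only an illustration: it drives the `hg` command line through Python's `subprocess` module, whereas in the talk the commands are typed in a terminal. It assumes `hg` is installed and on the PATH, and returns `None` otherwise. The function name, the file content, the commit message and the demo user are made up for the example.

```python
import pathlib
import shutil
import subprocess
import tempfile


def run_first_commit_demo():
    """Create a temporary repo, commit file.txt, and return the hg log text.

    Hypothetical helper; returns None when hg is not installed.
    """
    if shutil.which("hg") is None:
        return None
    with tempfile.TemporaryDirectory() as repo:
        def hg(*args):
            # Run one hg command inside the temporary repository.
            done = subprocess.run(["hg", *args], cwd=repo, check=True,
                                  capture_output=True, text=True)
            return done.stdout

        hg("init")                                   # creates the hidden .hg directory
        pathlib.Path(repo, "file.txt").write_text("first line\n")
        hg("add", "file.txt")                        # start tracking the file
        # -u sets the committer explicitly, so the demo works without an hgrc.
        hg("commit", "-m", "Add: file.txt",
           "-u", "Demo User <demo@example.org>")     # one commit, one version
        return hg("log", "file.txt")                 # history of file.txt


log_text = run_first_commit_demo()
print(log_text if log_text is not None else "hg is not installed")
```

The temporary directory is deleted at the end, so nothing is left behind on disk.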
  17. 17. What is being committed?● Every time you commit a new version, hg stores the set of changes since the previous version● Some older VCS stored a copy of all the files for each version ● => a lot of disk space used● By storing only the changes, hg takes less space and makes it easier to compare versions
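The idea of storing only the changes can be sketched in a few lines. This is a toy model built on Python's `difflib`, not hg's actual storage format, which is more elaborate; the function names and the two example versions are invented for the illustration.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Record only what is needed to turn old_lines into new_lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace old_lines[i1:i2] with these new lines (possibly none).
            delta.append((i1, i2, new_lines[j1:j2]))
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the new version from the old version plus the stored delta."""
    result = list(old_lines)
    # Apply from the end so that earlier indices stay valid.
    for i1, i2, replacement in reversed(delta):
        result[i1:i2] = replacement
    return result


# Made-up file contents for two consecutive versions.
version_1 = ["load data", "filter samples", "plot results", "save figure"]
version_2 = ["load data", "filter samples", "plot results in colour",
             "save figure", "write report"]

delta = make_delta(version_1, version_2)
rebuilt = apply_delta(version_1, delta)
stored_lines = sum(len(replacement) for _, _, replacement in delta)
print(f"full copy: {len(version_2)} lines, delta: {stored_lines} lines")
```

Only the changed and the added line end up in the delta, yet the second version can still be rebuilt exactly from the first.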
  18. 18. Hg diff● The hg diff command will show the differences between the current content of a file and its last committed version
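To get a feeling for how such a comparison reads, here is a small sketch that uses Python's `difflib.unified_diff` rather than hg itself. It produces output in the unified diff style, where lines starting with `-` come from the committed version and lines starting with `+` from the current one. The file contents and the `a/` and `b/` labels are made up for the example.

```python
import difflib

# Made-up contents: the last committed version and the edited working copy.
committed = ["first line\n", "second line\n"]
working = ["first line\n", "second line, now edited\n"]

diff_lines = list(difflib.unified_diff(
    committed, working,
    fromfile="a/file.txt",   # label for the committed version
    tofile="b/file.txt",     # label for the working copy
))
print("".join(diff_lines), end="")
```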
  19. 19. Hg log● Hg log will show the history of the changes in the repository
  20. 20. Hg log
  21. 21. The story continues...● The basic operations in a VCS are adding files to be tracked and committing changes● Next week we will see how to keep a copy of our repository on a remote server, and how to collaborate with other people● Now I will show you some examples of how a version control system can be used
  22. 22. Example: backup● Imagine that, by mistake, you remove a file or a directory from your project● With a VCS, you can revert to the previous version and get the files back
  23. 23. Example: tracking code● VCS were developed to track changes in code ● Return to the point where you made a mistake or a typo ● Implement a parallel version of the code, for example to try a different library or approach (branching) ● Remember what you were doing when you have to change code written months ago
  24. 24. Example: releasing software● Mr. Werewolf publishes a program to predict when the moon will be full● The code is adopted by the werewolf community. Papers are published using it● At some point, another werewolf discovers a bug in the code. It will be possible to find the version where the error was introduced and identify all the versions affected
  25. 25. Example: tracking data● Version control can be applied to a dataset● Example: Mr Dracula wants to write a paper on the quality of the blood in his neighborhood. Every time he gets new data, he commits a change
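  26. 26. Tracking everything else● VCS can be applied to many kinds of files● Binary files can be stored, but usually it is not possible to see meaningful differences between their versions or to merge them● OpenOffice documents can be tracked (they are XML inside a compressed archive, so they behave like binary files unless saved as flat XML)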
  27. 27. Tracking huge files● Hg stores the differences between two versions● Storing all the 1000g data would take: ● Some gigabytes to store a compressed version of the files ● Less space to store the following commits (but these commits will take time)● It may not be worth putting gigabytes of data under version control ● No standard solution to date ● Some hg extensions exist for big files
  28. 28. How frequently should I commit?● Everybody has their own philosophy ● Some people prefer to commit every small change ● Others prefer to make only one big commit every day● As a general rule: ● The bigger the commit, the harder it is to integrate if there are conflicts ● It's up to you to decide
  29. 29. How to write the perfect commit message● One or two sentences● Avoid generic messages ● “new changes”, “fixed bugs”● Use tags like Fix, Add, Config, etc.: ● “Fix: error when reading file” ● “Add: new function for plotting results”● Mention the files changed if you think it may be useful: ● “Implemented new sorting algorithm for sorting.py”
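These guidelines are simple enough to check mechanically. The sketch below is one possible way to do it, not something hg provides: the function name is invented, the tag list and the generic messages are the ones from the slide, and treating the tag as mandatory is an assumption made for the example (the slide only recommends it).

```python
import re

ALLOWED_TAGS = ("Fix", "Add", "Config")           # tags named on the slide
GENERIC_MESSAGES = {"new changes", "fixed bugs"}  # messages the slide warns against
# A message should start with "<Tag>: " followed by some text.
TAG_PATTERN = re.compile(r"^(%s): \S" % "|".join(ALLOWED_TAGS))


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    text = message.strip()
    if text.lower() in GENERIC_MESSAGES:
        problems.append("message is too generic")
    if not TAG_PATTERN.match(text):
        problems.append("message does not start with a tag such as 'Fix: '")
    return problems


for candidate in ("Fix: error when reading file", "new changes"):
    print(repr(candidate), "->", check_commit_message(candidate) or "ok")
```

A group that prefers free-form messages can drop the tag check and keep only the test for generic messages.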