Understanding and using GNU/Linux
by Giuseppe Profiti
Updated September 2015
Tutorial for the Programming for Bioinformatics course,
International Master of Bioinformatics
University of Bologna, Italy
http://www.biocomp.unibo.it/lsbioinfo/
First version: 12/2013
Last version: 11/2015
2. November 2015 Giuseppe Profiti 2/66
Goals and means
● Goals
– Understanding what an Operating System is
– Knowing how to use GNU/Linux proficiently
● Means
– Simple examples (maybe biology-inspired)
– Exercises and hands-on
● Not covered
– Formal details
– “How do I use <your favourite software>?”
What is an Operating System?
● It's a piece of software
● It manages hardware and software resources
● It's useful for general-purpose and heterogeneous hardware systems
Image from Wikimedia Commons, Public Domain
Hardware, OS and software
[Figure: layers labelled “Hardware” and “Operating system”]
Image from Flickr, released under Creative Commons BY by Petr Dosek
Same OS, different software
Image from Wikimedia Commons, Public Domain (NASA)
Image from Flickr, Creative Commons BY by Texas A&M University
Another example
Image from Flickr, released under Creative Commons BY by Andrea Arden
Different hardware, different OS and software
GNU/Linux
● Inspired by Unix (it is a Unix-like system)
● Linux is the kernel
– Manages the hardware, memory and so on
● GNU is a set of software and tools
– They run on top of Linux
– Provide functionality
● Multi-user, multi-threaded
● Distributions: Ubuntu, Lubuntu, Xubuntu, Debian, Red Hat...
● Mac OS X is based on Unix too
What's the difference?
Image from Wikimedia Commons, GNU GPL license
A Linux distribution includes
● The Kernel (Linux)
● An install system for the distribution
● Drivers
– How the system can manage specific hardware
● A package manager
– To install and update software
– Usually different from one distribution to the other
Login
● Once started, the system asks for your
– Username
– Password
● Each user has their own main folder (home directory) on disk
● Users have different access rights
● The superuser (called “root”) can do everything
● On Ubuntu, the main user created during installation can run programs as root when needed, using sudo
Shell
● It is the main interface with the system
● Can be used to
– Navigate the file system
– Execute tools
– Install software
– Connect to other machines
– Edit files
– … everything the system can do
● Also called Console, or Terminal
What a shell looks like
Image from Wikimedia Commons, licensed as Public Domain by User:AVRS
“It's a trap!”
Every time you use the mouse in a shell,
you are doing something wrong.
Image by Manuel R., Wikimedia Commons, CC-BY
Exercise 1: Open a shell
● If you don't use the Graphical User Interface
– You are already in a shell
● If you use the Graphical User Interface
– In Ubuntu: click the logo, type “terminal”, select it
– Other systems: look for a terminal icon among the applications
● The terminal may have a black, white or coloured background
– Whatever the colour, it works in the same way
The prompt
● It is a string showing that the shell is ready
● It may show the current directory
● It usually ends with $, %, > or # (# often means you are root)
● After it, you can type a command
● After a command, press the Enter key
Exercise 2: create a directory
● To create a directory (or folder) type:
mkdir tutorial-p2b
● and press the Enter key ↵
● What do you see?
● To check the existence of the new directory:
ls
● and press the Enter key ↵
Upper-case and lower-case
● The shell is CASE SENSITIVE
– Upper-case and lower-case are different
● LS is different from ls
● Tutorial-p2b is not tutorial-p2b
● So, to run a program, you have to type its name exactly
● You can use the TAB key ↹ to complete a filename after typing its first letters
– IF the system can tell which file you want
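As a rough illustration of the completion rule, here is a minimal Python sketch. It completes a prefix only as far as the matching names agree: one match gives the full name, several matches give only their longest common prefix. This is similar to what bash does on a single TAB press. The function name `complete` is hypothetical, and matching is case sensitive, like the shell.

```python
import os

def complete(prefix, names):
    """Return what TAB completion would fill in for `prefix` (a sketch)."""
    # Case sensitive, like the shell: "Tut" does not match "tutorial-p2b"
    matches = [n for n in names if n.startswith(prefix)]
    if not matches:
        return prefix  # nothing to complete
    # One match: the full name. Several matches: only their common part,
    # so you have to type more letters to tell them apart.
    return os.path.commonprefix(matches)

if __name__ == "__main__":
    entries = ["tutorial-p2b", "tutorial-old", "test.txt"]
    print(complete("tut", entries))          # tutorial-
    print(complete("tutorial-p", entries))   # tutorial-p2b
    print(complete("Tut", entries))          # Tut
```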
Exercise 3: look inside a directory
● Type:
ls tutorial-p2b ↵
● Type:
ls Tutorial-p2b ↵
● Type:
ls tut
● Then press the TAB key ↹, then the Enter key ↵
File system
● It stores both data files and programs
● Directories are lists of files
● Hierarchical structure
● The root of the tree is the directory /
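[Figure: directory tree with / at the root, containing home, etc and bin; home contains the users' directories me and you]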
Filesystem
● Files and directories are stored in a filesystem
● The filesystem is like a tree:
– It has one root directory “/”
– Each subdirectory is a branch in the tree
– Each file is a leaf
Path
● A path specifies a location in the filesystem
● It indicates the branches to follow
● Each branch (directory) is separated by /
● The path can be absolute or relative
● Absolute: always starts from the root
– e.g. “/home/Alice/Desktop/vacation/sunset.jpg”
● Relative: starts from your current directory
– e.g. “Desktop/vacation/sunset.jpg” if you are in /home/Alice/
Special directories
● The current directory is “.”
– So “sunset.jpg” and “./sunset.jpg” are the same file
● The parent directory (one level up) is “..”
– e.g. if you are in “/home/Alice/Desktop/work/”, you write “../vacation/sunset.jpg”
– If you are in “/home/Alice/experiment/data/”, you write “../../Desktop/vacation/sunset.jpg”
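Python's standard `posixpath` module can show how a relative path is resolved against the current directory. Here is a minimal sketch using Alice's paths from the examples. Note that `posixpath.normpath` works only on the text of the path: it does not check that the directories exist, and it does not follow symbolic links.

```python
import posixpath

def resolve(current_dir, relative_path):
    """Turn a relative path into an absolute one, resolving '.' and '..'."""
    # join() glues the two paths; normpath() drops '.' and collapses 'dir/..'
    return posixpath.normpath(posixpath.join(current_dir, relative_path))

print(resolve("/home/Alice/", "Desktop/vacation/sunset.jpg"))
print(resolve("/home/Alice/Desktop/work/", "../vacation/sunset.jpg"))
print(resolve("/home/Alice/experiment/data/", "../../Desktop/vacation/sunset.jpg"))
# all three print /home/Alice/Desktop/vacation/sunset.jpg
```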
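Exercise 4: path
[Figure: directory tree rooted at /, with the directories HOME and WORK; HOME contains A and B, WORK contains A; files named 1.TXT, 2.TXT and 3.TXT appear in several of these directories]
While in /HOME/ check the following relative paths: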
● A/1.TXT
● ../WORK/1.TXT
● ../WORK/A/../1.TXT
● ../WORK/A/../../HOME/B/../A/1.TXT
Specify the absolute paths for
the following files:
● leftmost and rightmost 3.TXT
● leftmost and rightmost 1.TXT
File permissions
● Files can be read, written and executed
● The owner of a file can restrict these operations
– For herself
– For other members of the group
– For everyone else
Examples:
● Experiment data that should not be overwritten
● Data shared only with group members for read
purposes
File permissions
● Permissions can be changed using chmod
● The shortcuts are:
– User (u), Group (g), Others (o), All (a)
– adding (+), removing (-)
– Read (r), Write (w) and eXecute (x)
● To remove the write permission from the group:
chmod g-w filename
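Under the hood, each permission is one bit of the file's mode. `chmod g-w` clears the group-write bit and leaves the others alone. Here is a minimal Python sketch of the same operation using `os.chmod` and the `stat` constants. The file `example.txt` is made up and lives in a temporary directory.

```python
import os
import stat
import tempfile

def remove_group_write(path):
    """Equivalent of `chmod g-w path`: clear only the group-write bit."""
    mode = stat.S_IMODE(os.stat(path).st_mode)  # current permission bits
    os.chmod(path, mode & ~stat.S_IWGRP)        # clear group write, keep the rest

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    open(path, "w").close()
    os.chmod(path, 0o664)                        # rw-rw-r--
    remove_group_write(path)
    print(stat.filemode(os.stat(path).st_mode))  # -rw-r--r--
```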
Show file permissions
● To show the permissions use
ls -l
● For each file, the line starts with a string such as
-rw-r-xr--
– rw- (characters 2-4) is for the user (read and write)
– r-x (characters 5-7) is for the group (read and execute)
– r-- (characters 8-10) is for others (read only)
● The first character is the file type: - for a regular file, d for a directory
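To practise reading these strings, you can split one into its four parts. Here is a minimal Python sketch. The function name `explain` is hypothetical. It only knows the `-`, `d` and `l` file types and shows any special permission letters (such as `s`) unchanged.

```python
def explain(perms):
    """Split an `ls -l` permission string into type, user, group and others."""
    kinds = {"-": "regular file", "d": "directory", "l": "symbolic link"}
    names = {"r": "read", "w": "write", "x": "execute"}
    result = {"type": kinds.get(perms[0], perms[0])}
    # Characters 2-4, 5-7 and 8-10 (indices 1-3, 4-6, 7-9) are the triplets
    for who, start in (("user", 1), ("group", 4), ("others", 7)):
        triplet = perms[start:start + 3]
        result[who] = [names.get(c, c) for c in triplet if c != "-"]
    return result

print(explain("-rw-r-xr--"))
# {'type': 'regular file', 'user': ['read', 'write'], 'group': ['read', 'execute'], 'others': ['read']}
```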
File types
● Extensions mean nothing
– .doc, .jpg and so on are just conventions
● Text and binary files
– Text can be printed and read by humans
● Plain text, CSV, XML are all text-based
– Binary files are meant to be read by programs
● Data and programs
– A program can be executed by the system
(but the execute permission alone does not make a file a program)
Programs and processes
● An executable program sits on the disk
● A running program becomes a process
– You can have multiple processes spawned from the same program: e.g. many blastall instances running at once
● Each process has a unique identifier (pid)
● To inspect the running processes: ps or top
● To stop a running process, press CTRL+c in its shell or use
kill <pid>
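You can watch the same life cycle from Python. Here is a minimal sketch, assuming a POSIX system such as GNU/Linux. It starts a child process (a Python interpreter sleeping for 20 minutes, standing in for a real tool), prints its pid, and then stops it with SIGTERM. SIGTERM is the signal that `kill <pid>` sends by default.

```python
import signal
import subprocess
import sys

# Start a process that would otherwise run for 20 minutes
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1200)"])
print("started process with pid", child.pid)

child.send_signal(signal.SIGTERM)  # same as running: kill <pid>
child.wait()                       # wait for it to end and collect its status
# On POSIX, a negative return code means "terminated by that signal"
print("return code:", child.returncode)  # -15 on Linux, i.e. -SIGTERM
```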
Exercise 5: processes
● Open two shells
● In one shell run the following command
sleep 20m
● In the other shell, run ps to find the pid of sleep
● Kill the process using
kill <pid>
● Note: on a remote server, CTRL+c works only while your connection to that shell is still open; otherwise use kill
Parameters vs arguments
● Arguments are the subject of the operation
– ls /home/Alice/Desktop
– kill 260046
● Parameters (or options) modify the behaviour
– ls -l /home/Beatrix/Desktop
– top -h
● Parameters usually start with a minus sign
– A single one for single-letter parameters (-h, -p, -t)
– A double one for longer parameters (--help, --out)
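Inside a program, parameters and arguments arrive as a list of strings. Here is a minimal sketch with Python's standard `argparse` module for a hypothetical tool that, like `head`, takes a short `-n` / long `--lines` option and one file argument. argparse also adds `-h`/`--help` automatically.

```python
import argparse

parser = argparse.ArgumentParser(description="Print the first lines of a file")
# A parameter (option): modifies the behaviour and has a default value
parser.add_argument("-n", "--lines", type=int, default=10,
                    help="number of lines to print")
# A positional argument: the subject of the operation
parser.add_argument("filename", help="file to read")

args = parser.parse_args(["-n", "5", "data.txt"])       # like: mytool -n 5 data.txt
print(args.lines, args.filename)                        # 5 data.txt
args = parser.parse_args(["--lines", "3", "data.txt"])  # like: mytool --lines 3 data.txt
print(args.lines, args.filename)                        # 3 data.txt
```

In a real script you would call `parser.parse_args()` with no list, so that it reads the command line from `sys.argv`.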
Moving files around
● You can copy files using the command cp
– cp path/of/original/file path/of/copy
● You can move files using mv
– mv path/of/original/file new/path
● You can delete files using rm
– rm file/to/delete
– Warning: deletion is permanent
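The same three operations are available from Python's standard library, which is handy inside analysis scripts. Here is a minimal sketch working in a temporary directory. The file names are made up for the example.

```python
import os
import shutil
import tempfile

def copy_move_delete(work):
    """Run cp, mv and rm equivalents inside the directory `work`."""
    original = os.path.join(work, "reads.fastq")
    with open(original, "w") as f:
        f.write("@read1\nACGT\n+\nIIII\n")

    backup = os.path.join(work, "reads.backup.fastq")
    shutil.copy(original, backup)      # cp reads.fastq reads.backup.fastq

    renamed = os.path.join(work, "sample1.fastq")
    shutil.move(original, renamed)     # mv reads.fastq sample1.fastq

    os.remove(backup)                  # rm reads.backup.fastq (no undo!)
    return sorted(os.listdir(work))

with tempfile.TemporaryDirectory() as work:
    print(copy_move_delete(work))      # ['sample1.fastq']
```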
Redirection
● You can save the result of commands to a file
● The output is redirected using >
ls > files.list
● The file is created (or emptied, if it exists) before ls runs
● To keep the existing content, append with >>
– Adds the output to the end of the file
● Errors are not “output”: redirect them with 2>
● Redirect both output and errors with &>
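`subprocess.run` in Python can send a command's output and errors to files, mirroring `>`, `>>` and `2>`. Opening the file with mode "w" empties it first, like `>`; mode "a" appends, like `>>`. Here is a minimal sketch, assuming a POSIX system where `ls` is available. The file names are only examples. Because the file is opened before `ls` runs, `ls` lists `files.list` itself.

```python
import os
import subprocess
import tempfile

os.chdir(tempfile.mkdtemp())  # work in a fresh, empty directory

# ls > files.list  ("w" empties the file before writing)
with open("files.list", "w") as out:
    subprocess.run(["ls"], stdout=out)

# ls >> files.list  ("a" adds to the end of the file)
with open("files.list", "a") as out:
    subprocess.run(["ls"], stdout=out)

# ls missing-dir 2> errors.log  (only the error messages go to the file)
with open("errors.log", "w") as err:
    subprocess.run(["ls", "missing-dir"], stderr=err)

print(open("files.list").read())   # files.list, twice
print(open("errors.log").read())   # the error message from ls
```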
Inspecting a file
● head prints the first 10 lines
● tail prints the last 10 lines
– You can change the number of lines of both head and tail with the -n parameter (e.g. head -n 20)
● cat shows the whole file
– Beware of long files
● more shows the whole file, one page at a time
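`head` and `tail` can work on files of any size without loading them whole. Here is a minimal Python sketch of the idea. `head` stops after n lines. `tail` keeps only the last n lines seen, in a bounded `collections.deque`. The function names mirror the commands but are otherwise hypothetical.

```python
import itertools
from collections import deque

def head(lines, n=10):
    """First n lines (like head -n N); stops reading after them."""
    return list(itertools.islice(lines, n))

def tail(lines, n=10):
    """Last n lines (like tail -n N); older lines fall off the deque."""
    return list(deque(lines, maxlen=n))

numbers = [f"line {i}" for i in range(1, 101)]
print(head(numbers, 3))   # ['line 1', 'line 2', 'line 3']
print(tail(numbers, 2))   # ['line 99', 'line 100']
# With a real file: with open("data.txt") as f: print(head(f))
```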
Editing a file
● There are too many editors to list them all; here are a few
● On the shell
– cat > filename writes everything you type to the file
● CTRL+d ends the input
– nano, pico: easy to use
– vim, emacs: more advanced
● On the GUI
– gedit
– gvim
Finding text: grep
● It prints the lines containing a match
grep "pattern" filename
● The pattern can be a string or a regular expression
● Useful parameters
– -w matches whole words only (e.g. “gene” does not match “genes”)
– -x matches whole lines only
– -i ignores case (uppercase = lowercase)
– -v inverts the match (i.e. prints the lines NOT containing the pattern)
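Here is a minimal Python sketch of these options using the `re` module. It only approximates grep. For example, `-w` here uses the regex word boundary `\b`, which treats letters, digits and underscore as word characters, close to GNU grep's definition. The function name `grep` and its keyword arguments are hypothetical.

```python
import re

def grep(pattern, lines, w=False, x=False, i=False, v=False):
    """Return the lines matching `pattern`, with grep-like options."""
    if w:
        pattern = r"\b(?:" + pattern + r")\b"   # -w: whole words only
    if x:
        pattern = r"^(?:" + pattern + r")$"     # -x: whole line only
    regex = re.compile(pattern, re.IGNORECASE if i else 0)  # -i
    # -v inverts the test: keep the lines that do NOT match
    return [line for line in lines if bool(regex.search(line)) != v]

data = ["gene ACT1", "gene act1", "ACT10 protein", "ACT1"]
print(grep("ACT1", data))            # ['gene ACT1', 'ACT10 protein', 'ACT1']
print(grep("ACT1", data, w=True))    # ['gene ACT1', 'ACT1']
print(grep("ACT1", data, x=True))    # ['ACT1']
print(grep("act1", data, i=True))    # all four lines
print(grep("ACT1", data, v=True))    # ['gene act1']
```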
Exercise 6: grep
● Download the following file
http://profiti.web.cs.unibo.it/res/p2b/ex.tar.gz
● Move it to the working directory
● Uncompress it
tar -xvf ex.tar.gz
Exercise 6: grep
● Find all the lines containing “m” in test1.txt
grep "m" test1.txt
● Find all the lines NOT containing “m” in test1.txt
grep -v "m" test1.txt
Finding text: grep /2
● You can provide a file of patterns
grep -f patterns.txt data.txt
● Each line of patterns.txt is used as a separate pattern
● It may take a while if the two files are big
Comparing
● Look for the differences between two similar files
diff file1 file2
● Compares the two files line by line
● Output
– Line numbers for the different lines
– “<” for lines only in file1
– “>” for lines only in file2
● Its output is not always easy to read
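Python's standard `difflib` module compares lists of lines in a similar way. Here is a minimal sketch that, like `diff`, prefixes lines only in the first file with `<` and lines only in the second with `>`. Unlike `diff`, it prints no line numbers. The function name `simple_diff` is hypothetical.

```python
import difflib

def simple_diff(lines1, lines2):
    """Return '<' / '>' lines for the differences between two lists of lines."""
    result = []
    for entry in difflib.ndiff(lines1, lines2):
        code, text = entry[:2], entry[2:]
        if code == "- ":
            result.append("< " + text)   # only in the first file
        elif code == "+ ":
            result.append("> " + text)   # only in the second file
        # "  " (common lines) and "? " (hint lines) are skipped
    return result

old = ["ATG", "CCC", "GGG", "TAA"]
new = ["ATG", "CCA", "GGG", "TAA"]
print("\n".join(simple_diff(old, new)))
```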
Sorting
● Diffing is easier when data are sorted
sort filename
● Useful parameters:
– -n numerical sort (otherwise “100” comes before “2”)
– -r reverse sort
– -k x sort starting from column x (-k x,x uses column x only)
– -t x uses x as column separator
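The difference between the default text sort and `-n` is easy to see in Python. Strings are compared character by character, so "100" comes before "2". Here is a minimal sketch that mimics `-t`, `-k` and `-r` on a small, made-up comma-separated table. Real `sort` also follows the locale's collation rules, which this sketch ignores.

```python
rows = ["geneB,100", "geneA,2", "geneC,30"]

# Like: sort -t, -k2,2      (text comparison of column 2)
by_text = sorted(rows, key=lambda r: r.split(",")[1])
print(by_text)         # ['geneB,100', 'geneA,2', 'geneC,30']

# Like: sort -t, -k2,2 -n   (numeric comparison of column 2)
by_number = sorted(rows, key=lambda r: int(r.split(",")[1]))
print(by_number)       # ['geneA,2', 'geneC,30', 'geneB,100']

# Like: sort -t, -k2,2 -n -r  (numeric, largest first)
by_number_desc = sorted(rows, key=lambda r: int(r.split(",")[1]), reverse=True)
print(by_number_desc)  # ['geneB,100', 'geneC,30', 'geneA,2']
```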
Getting columns
● Printing a specific column with cut (ex.: 3rd)
cut -f 3 filename
● Columns are separated by TAB by default; you can specify a different separator with -d
● Useful arguments for -f:
– N prints the Nth column, counted starting from 1
– N- prints from the Nth to the end of the line
– N-M prints from Nth to Mth (included)
– -M prints from 1 up to Mth (included)
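Here is a minimal Python sketch of how one `-f` field specification is interpreted. The function `cut` is written just for illustration. It handles a single N, N-, N-M or -M item, while real `cut` also accepts comma-separated lists such as 1,3.

```python
def cut(line, fields, delimiter="\t"):
    """Select fields like `cut -f FIELDS -d DELIMITER` (fields count from 1)."""
    columns = line.split(delimiter)
    if "-" in fields:
        start, end = fields.split("-")
        first = int(start) if start else 1          # "-M" starts at column 1
        last = int(end) if end else len(columns)    # "N-" goes to the end
    else:
        first = last = int(fields)                  # a single column
    return delimiter.join(columns[first - 1:last])

row = "chr1\t100\t200\tgeneA"
print(cut(row, "3"))               # 200
print(cut(row, "2-"))              # 100<TAB>200<TAB>geneA
print(cut(row, "2-3"))             # 100<TAB>200
print(cut("a,b,c,d", "-2", ","))   # a,b
```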
Pipe: motivation
● Example: I want the file names for all the files
with rwx permissions
● Solution with redirection:
ls -l > files.list
grep "rwx" files.list > wanted-files.list
cut -f 10- -d" " wanted-files.list > result.list
Pipe
● Too many intermediate files
– Possibly big: disk space issues
– Hard to remember: do I need myfiles.list or my.list?
● Rule of thumb: keep intermediate results only if you
need it later for further analysis
● For everything else, use pipe |
ls -l | grep "rwx" | cut -f 10- -d" " > result.list
● A pipe sends the output of a command to the input of the following one
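Python's `subprocess` module can build the same kind of pipeline. It connects each process's standard output to the next one's standard input, with no intermediate files. Here is a minimal sketch, assuming a POSIX system with `printf`, `sort` and `head` available. It is equivalent to the shell pipeline in the first comment.

```python
import subprocess

# printf 'geneC\ngeneA\ngeneB\n' | sort | head -n 2
p1 = subprocess.Popen(["printf", "geneC\\ngeneA\\ngeneB\\n"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["sort"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # only sort reads from printf now
p3 = subprocess.Popen(["head", "-n", "2"], stdin=p2.stdout,
                      stdout=subprocess.PIPE, text=True)
p2.stdout.close()  # only head reads from sort now
output, _ = p3.communicate()
p1.wait()
p2.wait()
print(output, end="")   # geneA, then geneB
```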
Pipe
● All the previous examples also work without a file as input, reading from a pipe instead
● The first 10 lines of a list of files
ls | head
● The first column of the last line of a sorted file
sort file.txt | tail -1 | cut -f 1
Pipe vs sequence
● Pipe sends the result to the next command
● If you want to execute commands in sequence,
separate them using ;
ls; head test.txt
● What if the second depends on the first? Use &&: the second command runs only if the first succeeds
python my.py > a.txt && sort a.txt
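The difference between `;` and `&&` lies in the exit status. Every command returns a number, and 0 means success. Here is a minimal Python sketch of both behaviours using `subprocess.run` and the standard `true` and `false` commands, assumed available as on any POSIX system. The helper names are hypothetical.

```python
import subprocess

def run_sequence(commands):
    """Like `cmd1; cmd2; ...`: run every command, whatever happens."""
    return [subprocess.run(cmd).returncode for cmd in commands]

def run_and(commands):
    """Like `cmd1 && cmd2 && ...`: stop at the first command that fails."""
    codes = []
    for cmd in commands:
        codes.append(subprocess.run(cmd).returncode)
        if codes[-1] != 0:   # a non-zero exit status means failure
            break
    return codes

print(run_sequence([["false"], ["true"]]))  # e.g. [1, 0]: true still runs
print(run_and([["false"], ["true"]]))       # e.g. [1]: true is skipped
```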
Shell scripting
● What if the command is very long and you have
to use it again?
● What if you have to repeat the same operations
for many inputs?
● Shell scripting is programming for the shell
● Same primitives as programming languages
– if conditions, for loops
– Parameters, variables
Shell scripting /2
● Save commands to a text file
● Add execution permissions to the file
● Call the file from the shell
● Example:
for i in $(ls *.fasta); do
  echo $i, $(grep "^>" $i | wc -l);
done | sort -n -k 2 > $1
Shell scripting /3
● $( ) is replaced by the output of the commands inside it
● Useful with cat and any other command that prints content
Shell scripting /4
● * is a wildcard: it matches any string, including the empty one
● In this case, every file name ending with “.fasta”
● Other wildcards are:
– ? matches any single character
– [] groups choices, e.g. [ae] matches either a or e
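Python's standard `fnmatch` module implements the same shell-style wildcards, so you can test what a pattern matches. Here is a minimal sketch on a made-up list of file names. The shell expands patterns against the files in the current directory; in Python, `glob.glob` does that part.

```python
import fnmatch

files = ["a.fasta", "b.fasta", "notes.txt", "e1.fasta", "seq.fasta.gz"]

print(fnmatch.filter(files, "*.fasta"))   # ['a.fasta', 'b.fasta', 'e1.fasta']
print(fnmatch.filter(files, "?.fasta"))   # ['a.fasta', 'b.fasta']
print(fnmatch.filter(files, "[ae]*"))     # ['a.fasta', 'e1.fasta']
```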
Shell scripting /5
● for executes the commands between do and done once for each value in the list
● i is the iteration variable: at each iteration it takes one of the values (in the example, a file name), and you access its value with $i
Shell scripting /6
● The combined output of all the iterations is passed to sort
● The script produces a list of fasta files, each with its number of entries, sorted by that number
Shell scripting /7
● The final result is redirected to the file named by the first command-line argument ($1)
● Examples:
bash myscript.sh result1.txt
bash myscript.sh result2.txt
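For comparison, here is a minimal Python sketch of what the shell script computes. For each `.fasta` file in the current directory, it counts the header lines (those starting with `>`). It then writes `name, count` lines, sorted by count, to the file given as the first argument. The function name `count_entries` is hypothetical. It checks headers with `startswith` instead of a regular expression.

```python
import glob
import sys

def count_entries(pattern="*.fasta"):
    """Return (filename, number of '>' header lines) pairs, sorted by count."""
    counts = []
    for name in glob.glob(pattern):                          # for i in *.fasta
        with open(name) as f:
            n = sum(1 for line in f if line.startswith(">"))  # grep "^>" | wc -l
        counts.append((name, n))
    return sorted(counts, key=lambda pair: pair[1])          # sort -n -k 2

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "w") as out:                      # > $1
        for name, n in count_entries():
            out.write(f"{name}, {n}\n")
```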
Awk
● Awk executes a series of commands for each
line of the input
● It can execute different commands for different
lines, using matching regular expressions
● It may be faster than other tools
● It is easy to use and powerful
Awk /2
awk '/<regex>/ {<commands>}' a.txt
● You can specify multiple regular expressions
● Commands can contain if statements and assignments
● Two special keywords instead of regex
– BEGIN matches the beginning of the input, before
the first line
– END matches the end of the input, after the last line
Awk /3
awk 'BEGIN {a=0} {a=a+1} END{print a}'
● It counts the number of lines
● Before the first line, sets the variable a to zero
● For each line, increases the counter
– There is no regex, so each line matches
● At the end, prints the value of the counter
● Unlike wc -l, it also counts a last line with no trailing newline
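The two tools count different things. `wc -l` counts newline characters, while awk counts the lines (records) it reads. Here is a minimal Python sketch of both counting rules on a made-up text whose last line has no trailing newline.

```python
text = "seq1\nseq2\nseq3"           # the last line has no trailing newline

wc_count = text.count("\n")          # what wc -l counts: newline characters
awk_count = len(text.splitlines())   # what awk counts: lines (records)

print(wc_count, awk_count)           # 2 3
```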
Awk /4
awk '{print $2,$3}'
● Prints the second and the third column
● By default, columns are separated by runs of spaces or tabs
● You can specify a different separator with -F
awk -F "," '{print $2,$3}'
● NF is the number of columns (or “fields”)
● $NF is the value of the last column
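Awk's default field splitting behaves much like Python's `str.split()` with no arguments: both split on runs of spaces and tabs and ignore leading and trailing blanks. `-F ","` corresponds to `split(",")`. Here is a minimal sketch on made-up lines. Remember that awk counts fields from 1, while Python indices start at 0.

```python
line = "chr1   100\t200  geneA"
fields = line.split()          # default awk splitting: runs of blanks
print(fields[1], fields[2])    # awk '{print $2,$3}'    -> 100 200
print(len(fields))             # NF                     -> 4
print(fields[-1])              # $NF                    -> geneA

csv_line = "geneA,chr1,100,200"
cols = csv_line.split(",")     # awk -F ","
print(cols[1], cols[2])        # awk -F "," '{print $2,$3}'  -> chr1 100
```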
Awk /5
awk '/^ATOM/ {if ($5=="A") print $7,$8,$9}'
● Prints the coordinates (x, y, z) of each atom in chain A of a PDB file
● It matches only lines starting with “ATOM”
● You can select lines not matching a pattern
awk '!/(TAG)|(TAA)|(TGA)/ {print $3,$4}'
● The ! means “not matching”
● Round brackets group patterns
● | is for alternatives
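Both awk filters can be written in Python with the `re` module. Here is a minimal sketch. Like the awk versions, it assumes whitespace-separated columns, which works for simple PDB ATOM records but not when adjacent PDB columns touch. The sample lines and function names are made up for the example.

```python
import re

def chain_coordinates(lines, chain="A"):
    """awk '/^ATOM/ {if ($5=="A") print $7,$8,$9}' as a Python function."""
    coords = []
    for line in lines:
        if re.search(r"^ATOM", line):            # /^ATOM/
            f = line.split()                     # whitespace-separated columns
            if f[4] == chain:                    # $5 is f[4] (Python counts from 0)
                coords.append((f[6], f[7], f[8]))  # $7, $8, $9
    return coords

STOP = re.compile(r"(TAG)|(TAA)|(TGA)")          # | separates the alternatives

def without_stop_codons(lines):
    """awk '!/(TAG)|(TAA)|(TGA)/ {print $3,$4}' as a Python function."""
    return [tuple(line.split()[2:4]) for line in lines if not STOP.search(line)]

pdb = [
    "ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 44.79           N",
    "ATOM      2  CA  MET B   1      37.055  18.631  28.710  1.00 43.12           C",
    "HETATM  900  O   HOH A 301      10.000  11.000  12.000  1.00 20.00           O",
]
print(chain_coordinates(pdb))   # [('38.198', '19.582', '28.998')]
print(without_stop_codons(["s1 ATGCCC 10 20", "s2 ATGTAA 30 40"]))  # [('10', '20')]
```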
Awk exercise 1
● Using the example files from
http://profiti.web.cs.unibo.it/res/p2b/ex.tar.gz
1. Print lines containing m in test1.txt
2. Print lines not containing m in test1.txt
3. Print lines with A in the second column in test1.txt
4. Print the third column of test1.txt
(a) Using comma as separator
(b) Using E as separator
Awk /6
awk 'BEGIN {name=""}
/^>/ {name=$0; d[name]=""}
!/^>/ {d[name]=d[name]+length($0)}
END {for (i in d)
print substr(i,2,length(i)),d[i]}'
● Uses an associative array d, similar to Python dictionaries
● $0 is the whole line
● substr extracts a substring; positions start from 1
● Prints a list of fasta entries and their lengths
Awk exercise 2
● Print the sum of the elements of the third
column of test1.txt
● Print the average of the elements of the fourth
column of test1.txt
● Take a look at data1.txt and data2.txt
– Did you just open them with an editor?
– Did you just use cat?
Awk exercise 3
● How many lines in data1.txt and data2.txt?
$ wc -l data*
2999997 data1.txt
2999999 data2.txt
● Is it true?
– data1.txt contains 2999998 lines
– data2.txt contains 3000000 lines
● They contain the same numbers, except for 2
● Which ones?
Awk /7
awk 'BEGIN {while ((getline < "patterns.txt") > 0) diz[$1]=0}
{if ($1 in diz) print $0}'
● Similar to grep -f patterns.txt, but it compares the whole first column exactly
● getline reads the file one line at a time
● The first column of each line becomes a key in the array
● The input is then checked against existing keys
● For big files, it is faster than grep
– O(N*M) vs O(N+M), with N input lines and M patterns
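The speed-up comes from the data structure. The patterns are stored as keys of a hash table, so checking each input line is a single lookup instead of a comparison against every pattern. Here is a minimal Python sketch of the same idea with a `set`. The function name and sample identifiers are hypothetical. Like the awk program, it compares the whole first column exactly instead of searching for substrings.

```python
def filter_by_first_column(patterns, lines):
    """Keep the lines whose first column is one of the patterns."""
    wanted = {p.split()[0] for p in patterns if p.strip()}  # like diz[$1]=0
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] in wanted:   # like: if ($1 in diz)
            kept.append(line)
    return kept

patterns = ["P12345", "Q99999"]
data = ["P12345 kinase", "A00001 unknown", "Q99999 transporter"]
print(filter_by_first_column(patterns, data))
# ['P12345 kinase', 'Q99999 transporter']
```

Changing `in wanted` to `not in wanted` keeps the lines that are missing from the pattern file, which is the kind of check Awk exercise 3 asks for.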
Awk exercise 3, solution
diff <(sort data1.txt) <(sort data2.txt)
● Diff is picky, the result is not that good
– Took 14 seconds on a test computer
grep -v -f data1.txt data2.txt
● Good luck, it may take a while
– It may freeze your computer
● Awk takes 4 seconds on a test computer
Awk vs Python
● Reading fasta, awk style
awk 'BEGIN {name=""}
/^>/ {name=$0; d[name]=""}
!/^>/ {d[name]=d[name]+length($0)}
END {for (i in d)
print substr(i,2,length(i)),d[i]}'
● Note: awk scripts can be saved to a file
● Use the -f option to call the saved file
Awk vs Python
● Reading fasta, Python style
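import sys

# Total sequence length for each entry of the FASTA file given as argument
d = {}
name = ""
with open(sys.argv[1]) as f:
    for r in f:
        r = r.rstrip()
        if not r:
            continue                # skip empty lines
        if r[0] == ">":
            name = r[1:]            # header without the leading ">"
            d[name] = 0
        else:
            d[name] += len(r)       # add the length of this sequence line
for k in d:
    print(k, d[k])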