Master unix 2011

  1. Introduction to Programming and Algorithms
     Paolo Marcatili, “Sapienza” University of Rome, Biocomputing group
     Bioinformatics master course, ‘11/’12
  2. Course Calendar & structure
     Fri 2, Dec: Introduction & Unix
     Fri 13, Jan: Perl – Data types
     Fri 20, Jan: Perl – IO and loops
     Fri 27, Jan: Algorithms: Sort and Search
     Thu 2, Feb: Perl – Parsing
     Thu 9, Feb: Seminar – Machine Learning
     Fri 10, Feb: Perl – Practicals
     Fri 17, Feb: Perl – Practicals
  3. Unix for Bioinformaticians
     A survival guide
  4. Agenda
     •  Unix
     •  Folders
     •  Files
     •  Processes
     •  Redirection
  5. Unix
  6. What is Unix?
  7. What is Unix?
     An operating system that is stable, multi-user and multi-tasking, for servers, desktops and laptops.
  8. Types of Unix
     •  Solaris
     •  OS-X
     •  Linux!
  9. Unix Operating System
  10. What’s in Unix?
      Files (data)
      Processes (actions)
  11. Unix Filesystem
  12. Terminal
      It makes the difference!
      Powerful
      Transparent
      User-unfriendly
  13. Our Task Today
  14. Human Immunoglobulins
  15. Human Immunoglobulins
  16. The data
      We have 3 files with FASTA sequences of:
      Heavy (heavy.fasta)
      Lambda (lambda.fasta)
      Kappa (kappa.fasta)
  17. The Task
      •  Look at the data
      •  Correct errors
      •  Do some analysis
      •  Make a single fasta file
  18. Folders: read and write
  19. Change Directory
      To know where you are:
      >pwd
      /home/bioinfoSMFN
      Let’s move!
      >cd /home/bioinfoSMFN/Desktop/task
      Or
      >cd Desktop/
  20. Folder content
      Let’s check what’s in the folder:
      >ls
      Better:
      >ls -la
      Cheat: parameters are useful!
      >ls -lart
  21. Wildcards
      Extremely useful!
      >ls -la *heavy*
      >ls -la heavy.?asta
      * = 0, 1, 2, … occurrences of any character
      ? = exactly 1 occurrence of any character
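The two wildcards can be tried safely in a scratch directory. This is a minimal sketch: apart from heavy.fasta, the file names below are invented for the demonstration, and the files are empty.

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
# Hypothetical file names, created empty just to have something to match.
touch heavy.fasta heavy.pasta heavy_copy.fasta kappa.fasta

# * matches zero or more characters: every name containing "heavy" is listed.
star_matches=$(ls *heavy*)
echo "$star_matches"

# ? matches exactly one character: heavy.fasta and heavy.pasta match,
# heavy_copy.fasta and kappa.fasta do not.
question_matches=$(ls heavy.?asta)
echo "$question_matches"
```

Note that the shell, not ls, expands the pattern: ls only ever sees the list of matching names.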
  22. Commands
      To know more about a command:
      >man ls
      >whatis ls
      >apropos ls
      Or Google!
  23. Unix Permissions
  24. Folders - summary
      Command         Meaning
      ls              list files and directories
      ls -a           list all files and directories (hidden ones included)
      mkdir           make a directory
      cd directory    change to named directory
      cd              change to home-directory
      cd ~            change to home-directory
      cd ..           change to parent directory
      pwd             path of the current directory
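The commands in the summary can be chained into a short session. The sketch below assumes a temporary sandbox; the directory name task echoes the course folder, while the two file names are made up.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir task            # make a directory
cd task               # change to the named directory
pwd                   # print where we are now

# Hypothetical files: one hidden (leading dot), one visible.
touch .hidden visible.txt
plain=$(ls)           # hidden files are skipped
all=$(ls -a)          # -a also shows ., .. and .hidden
echo "ls:    $plain"
echo "ls -a: $all"

cd ..                 # back to the parent directory
pwd
```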
  25. Files: read and write
  26. Backup original data
      Create a directory:
      >mkdir backup
      Copy the files into the directory:
      >cp heavy.fasta backup/
      >cp heavy.fasta heavy_copy.fasta
      >mv heavy_copy.fasta backup/heavy_backup.fasta
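The slide copies only heavy.fasta; one way to back up all three files at once is to combine cp with a wildcard. The sketch below uses placeholder file contents, since the real course data is not reproduced here.

```shell
work=$(mktemp -d)
cd "$work"

# Placeholder one-record files standing in for the course data.
for name in heavy kappa lambda; do
    printf '>%s_example\nACDEFG\n' "$name" > "$name.fasta"
done

mkdir backup
cp *.fasta backup/    # the shell expands *.fasta to the three file names

# cmp is silent and exits 0 when two files are byte-identical.
cmp heavy.fasta backup/heavy.fasta && echo "backup of heavy.fasta is identical"

backed_up=$(ls backup | wc -l | tr -d ' ')
echo "$backed_up files in backup/"
```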
  27. Read a file
      Let’s see…
      >cat heavy.fasta
  28. Read a file
      Let’s see…
      >cat heavy.fasta
      Mmh…
      >less heavy.fasta
      Hint: q to quit… but if you are stuck try ctrl+c
  29. Look for text
      If we want to look for something while in less:
      / 91979410 (enter)
  30. Edit a file
      Let’s remove the first H sequence
      >vi heavy.fasta
      >nano heavy.fasta
      For those who live in 2008
      >gedit heavy.fasta
  31. Look for text
      How many Igs in each file? We can count the lines!
      >wc heavy.fasta
  32. Look for text
      How many Igs in each file? We can count the lines!
      >wc heavy.fasta
      There are more lines than proteins!!!
  33. Grep
      # of proteins = # of ">" in a file
      >grep ">" heavy.fasta
      >grep -c ">" heavy.fasta
      Do it for all the files and write the result (I think they are too many…)
  34. Grep
      >grep -c V-region heavy.fasta
      >grep -c v-region heavy.fasta
      >grep -ci v-region heavy.fasta
      Ok, better!
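The difference between counting lines and counting records, and the effect of -i, can be seen on a tiny example. This is a sketch on an invented two-record file, not on the course data. The quotes around ">" matter: unquoted, the shell would treat > as a redirection.

```shell
work=$(mktemp -d)
cd "$work"

# Invented sample: two records, the first one split over two sequence lines.
printf '>seq1 V-region\nQVQLVQSG\nAEVKKPGA\n>seq2 v-region\nEVQLVESG\n' > sample.fasta

lines=$(wc -l < sample.fasta | tr -d ' ')       # all lines, headers and sequence
records=$(grep -c ">" sample.fasta)             # one header line per record
exact=$(grep -c V-region sample.fasta)          # case-sensitive match
anycase=$(grep -ci v-region sample.fasta)       # -i ignores case

echo "lines=$lines records=$records exact=$exact anycase=$anycase"
```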
  35. Files: summary
      Command          Meaning
      cp file1 file2   copy file1 and call it file2
      mv file1 file2   move or rename file1 to file2
      rm file          remove a file
      cat file         display a file
      less file        display a file a page at a time
      head file        display the first few lines
      tail file        display the last few lines
      grep key file    search a file for keywords
      wc file          count lines, words and characters
  36. Processes
  37. Run a process
      Process = execution of some instructions
      Usually the instructions are in a file.
      e.g. less -> /usr/bin/less
      So
      >/usr/bin/less heavy.fasta
  38. Run - It’s simple
      Ok, so let’s try
      >./loop.pl
  39. Run - It’s simple
      Ok, so let’s try
      >./loop.pl
      Ok, and now?
  40. Controlling processes
      Ctrl+z -> go to sleep
      Now you’ve got the control again!
      But it’s still not dead…
      bg
      Ctrl+C
  41. Controlling processes
      Or:
      >./loop.pl
      Ctrl+z
      >ps
      >top (q to exit)
      >kill the_number_that_you_have_just_read
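The ps and kill steps can also be scripted. In this sketch, sleep stands in for loop.pl (which is not reproduced here), and the shell variable $! supplies the process ID instead of reading it from the ps output by eye.

```shell
# Start a long-running process in the background; & returns the prompt at once.
sleep 300 &
pid=$!                      # $! holds the PID of the last background process

ps -p "$pid"                # show only that process

kill "$pid"                 # send the default signal (SIGTERM)

# wait collects the exit status of the terminated process.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status after kill: $status"
```

A status above 128 means the process was ended by a signal; with SIGTERM (signal 15) shells typically report 143.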
  42. Redirection
      Control the force
  43. Inputs and outputs
      Keyboard, Mouse, Tablet -> Kernel -> Display, Printer, File
  44. Write into a file
      >cat > lista.txt
      Heavy 29061
      Ctrl+d
      And
      >less lista.txt
  45. Append to a file
      >cat >> lista.txt
      Kappa 7476
      Ctrl+d
      And
      >less lista.txt
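The same two steps work without typing at the keyboard if echo provides the text. A minimal sketch, run in a temporary directory; scratch.txt is a made-up file used only to show that > overwrites.

```shell
work=$(mktemp -d)
cd "$work"

echo "Heavy 29061" > lista.txt     # > creates the file, or overwrites it
echo "Kappa 7476" >> lista.txt     # >> appends to the end
cat lista.txt

# Using > twice keeps only the last write.
echo "first" > scratch.txt
echo "second" > scratch.txt
cat scratch.txt
```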
  46. The big one!
      >cat heavy.fasta > bigone.fasta
      >cat kappa.fasta >> bigone.fasta
      >cat lambda.fasta >> bigone.fasta
      >grep -c ">" bigone.fasta
  47. Extract headers
      >grep ">" bigone.fasta > headers.head
      >sort headers.head > headers_sort.head
      Do we need the first file?
  48. Extract headers
      >grep ">" bigone.fasta > headers.head
      >sort headers.head > headers_sort.head
      Do we need the first file? No!
      >grep ">" bigone.fasta | sort > headers_sorted.head
      | is called pipe, and it’s something
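That the pipe gives the same result as the intermediate file can be checked directly. The sketch below uses an invented three-record bigone.fasta.

```shell
work=$(mktemp -d)
cd "$work"

# Invented sample with headers in non-alphabetical order.
printf '>zeta\nAAA\n>alpha\nCCC\n>mid\nGGG\n' > bigone.fasta

# Two steps, with an intermediate file.
grep ">" bigone.fasta > headers.head
sort headers.head > headers_sort.head

# One step: grep's output goes straight into sort's input.
grep ">" bigone.fasta | sort > headers_sorted.head

# cmp exits 0 when the two files are byte-identical.
cmp headers_sort.head headers_sorted.head && echo "same result"
```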
  49. Redirect everything!
      >grep ">" biggone.fasta 2> err.log
      >less err.log
      (biggone.fasta does not exist, so grep writes an error message, and 2> sends it to err.log)
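Standard output and standard error are separate streams, which a short experiment makes visible. This is a sketch in an empty temporary directory, so biggone.fasta is guaranteed to be missing; out.txt and all.log are hypothetical file names.

```shell
work=$(mktemp -d)
cd "$work"

# > captures standard output, 2> captures standard error.
status=0
grep ">" biggone.fasta > out.txt 2> err.log || status=$?
echo "grep exit status: $status"   # grep reports a status above 1 on errors
cat err.log                         # the error message ended up here

# 2>&1 sends standard error to wherever standard output is going.
grep ">" biggone.fasta > all.log 2>&1 || true
```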
  50. Redirect - summary
      Command            Meaning
      command > file     redirect std output to a file
      command >> file    append std output to a file
      command < file     redirect std input from a file
      cmd1 | cmd2        pipe the output of cmd1 to the input of cmd2
      cat f1 f2 > f0     concatenate f1 and f2 to f0
      sort               sort data
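The two summary entries not shown on earlier slides, input redirection and concatenation, could look like this. The file names f0, f1 and f2 come from the summary; their contents are made up.

```shell
work=$(mktemp -d)
cd "$work"

printf 'b\na\n' > f1
printf 'd\nc\n' > f2

cat f1 f2 > f0          # f0 now holds f1 followed by f2

# < feeds the file to sort on standard input instead of naming it as an argument.
sorted=$(sort < f0)
echo "$sorted"
```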
  51. Quiz
      •  What does this command perform?
         >grep -v ">" heavy.fasta
      •  How many proteins in each file?
      •  How many residues in each file? (hint: wc -m counts the number of characters in its input, newlines included)
      •  Average length of proteins in each file?
      •  Average content of Prolines of VH, VL and VK? (hint: look at grep parameters)
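One possible way to approach the counting questions, shown on an invented two-record file rather than on the course data. It is a sketch, not the intended solution: it assumes that tr is used to drop newlines before counting, and that grep supports -o (GNU and BSD grep do).

```shell
work=$(mktemp -d)
cd "$work"

# Invented sample: seq1 = ACDPPG (6 residues), seq2 = PPK (3 residues).
printf '>seq1\nACDP\nPG\n>seq2\nPPK\n' > sample.fasta

records=$(grep -c ">" sample.fasta)

# -v inverts the match: keep only sequence lines, then drop the newlines
# so that wc -m counts residues only.
residues=$(grep -v ">" sample.fasta | tr -d '\n' | wc -m | tr -d ' ')

# -o prints each match on its own line, so wc -l counts every P.
prolines=$(grep -v ">" sample.fasta | grep -o P | wc -l | tr -d ' ')

echo "records=$records residues=$residues prolines=$prolines"
echo "average length (integer division): $((residues / records))"
```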
