Workshop on command line tools - day 2

Slides of the I Workshop on command-line tools with the collaboration of CAG (Center for Applied Genomics - Children's Hospital of Philadelphia) bioinformatics analysts.

2nd day

Published in: Software
  1. I Workshop on command-line tools (day 2)
     Center for Applied Genomics
     Children's Hospital of Philadelphia
     February 12-13, 2015
  2. awk - a powerful way to check conditions and show specific columns
     Example: show only CNVs that span 3 targets (exons) or fewer
     tail -n +2 DATA.xcnv | awk '$8 <= 3'
  3. awk - different ways to do the same thing
     tail -n +2 DATA.xcnv | awk '$8 <= 3'
     # same effect 1
     tail -n +2 DATA.xcnv | awk '$8 <= 3 {print}'
     # same effect 2 (an if statement must sit inside an action block)
     tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) print}'
     # same effect 3
     tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) print $0}'
     # different effect: prints only the first column
     tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) print $1}'
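
The equivalence of these forms can be checked on a small table. This is a minimal sketch: the three rows below are made up for illustration (they are not taken from DATA.xcnv), and column 8 stands in for the number of targets.

```shell
# Toy tab-delimited table with made-up values; column 8 plays the role of NUM_TARG.
demo=$(mktemp)
printf 'S1\tDEL\tchr1:100-200\t1.5\tchr1\t150\t1..2\t2\n'  > "$demo"
printf 'S2\tDUP\tchr2:300-900\t4.0\tchr2\t600\t5..9\t5\n' >> "$demo"
printf 'S3\tDEL\tchr3:10-50\t0.8\tchr3\t30\t3..5\t3\n'    >> "$demo"

# Pattern only: printing the line is the default action.
out_pattern=$(awk '$8 <= 3' "$demo")
# Pattern with an explicit action.
out_action=$(awk '$8 <= 3 {print}' "$demo")
# if statement inside an action block (the braces are required).
out_if=$(awk '{ if ($8 <= 3) print $0 }' "$demo")
# Different effect: only the first column of the matching rows.
out_col1=$(awk '{ if ($8 <= 3) print $1 }' "$demo")

echo "$out_col1"
rm -f "$demo"
```

The first three variables hold identical text; the last one holds only the sample names of the matching rows.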
  4. awk - more options on if statement
     # Applying XHMM "gold" thresholds (KB >= 1,
     # NUM_TARG >= 3, Q_SOME >= 65, Q_NON_DIPLOID >= 65)
     tail -n +2 DATA.xcnv | awk '$4 >= 1 && $8 >= 3 && $10 >= 65 && $11 >= 65' > DATA.gold.xcnv
     # Using only awk
     awk 'NR > 1 && $4 >= 1 && $8 >= 3 && $10 >= 65 && $11 >= 65' DATA.xcnv > DATA.gold2.xcnv
  5. diff - compare files line by line
     # Compare
     diff DATA.gold.xcnv DATA.gold2.xcnv
     # Tip: install tkdiff to use a
     # graphical version of diff
  6. Exercises
     1. Using adhd.map, show 10 SNPs with rsID starting with 'rs' on chrom. 2, between positions 1Mb and 2Mb
     2. Check which chromosome has more SNPs
     3. Check which SNP IDs are duplicated
  7. Suggestions
     # 1.
     grep '\brs' adhd.map |
       awk '$1 == 2 && int($4) >= 1000000 && int($4) <= 2000000' |
       less
     # 2.
     cut -f1 adhd.map | sort | uniq -c | sort -k1n | tail -1
     # 3.
     cut -f2 adhd.map | sort | uniq -c | awk '$1 > 1'
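
The three suggestions can be tried without adhd.map on a small stand-in file. This is a sketch under assumptions: the five rows are invented, the column layout (chromosome, SNP ID, genetic distance, position) is assumed to match the map file used in the workshop, and `uniq -d` is swapped in for the `uniq -c | awk` step as another way to list repeated IDs.

```shell
# Toy map file with made-up SNPs: chromosome, SNP ID, genetic distance, position.
map=$(mktemp)
printf '1\trs101\t0\t500000\n'     > "$map"
printf '2\trs202\t0\t1500000\n'   >> "$map"
printf '2\trs203\t0\t1800000\n'   >> "$map"
printf '2\tSNP_A-9\t0\t1200000\n' >> "$map"
printf '2\trs202\t0\t2500000\n'   >> "$map"

# 1. IDs starting with "rs" on chromosome 2 between 1 Mb and 2 Mb (first 10 only).
in_range=$(awk '$2 ~ /^rs/ && $1 == 2 && $4 >= 1000000 && $4 <= 2000000 {print $2}' "$map" | head -n 10)

# 2. Chromosome with the most SNPs: count per chromosome, sort by count, keep the last.
top_chrom=$(cut -f1 "$map" | sort | uniq -c | sort -k1,1n | tail -n 1 | awk '{print $2}')

# 3. Duplicated SNP IDs: uniq -d prints one copy of each repeated line.
dups=$(cut -f2 "$map" | sort | uniq -d)

echo "in range: $in_range"
echo "top chromosome: $top_chrom"
echo "duplicated: $dups"
rm -f "$map"
```

Matching the ID with `$2 ~ /^rs/` anchors the test to the start of the second column, which avoids relying on the `\b` word-boundary extension of grep.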
  8. More awk - inserting external variables
     awk -v Mb=1000000 -v chrom=2 \
       '$1 == chrom && int($4) >= Mb && int($4) <= 2*Mb' \
       adhd.map | less
     # Printing specific columns
     awk -v Mb=1000000 -v chrom=2 \
       '$1 == chrom && int($4) >= Mb && int($4) <= 2*Mb {print $1" "$2" "$4}' \
       adhd.map | less
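
The same `-v` mechanism also carries shell variables into awk. A minimal sketch, assuming a made-up three-line map file; the variable names `chrom`, `start` and `end` are chosen for this example only.

```shell
# Toy map file (made-up values): chromosome, SNP ID, genetic distance, position.
map=$(mktemp)
printf '1\trs101\t0\t1500000\n'  > "$map"
printf '2\trs202\t0\t1500000\n' >> "$map"
printf '2\trs203\t0\t2500000\n' >> "$map"

# Shell variables, passed to awk with -v under shorter names.
chrom=2
start=1000000
end=2000000

# OFS sets the separator used between the comma-separated print arguments.
hits=$(awk -v c="$chrom" -v s="$start" -v e="$end" \
    'BEGIN { OFS = "\t" } $1 == c && $4 >= s && $4 <= e { print $1, $2, $4 }' "$map")

echo "$hits"
rm -f "$map"
```

Passing values with `-v` keeps the awk program in single quotes, so the shell never touches `$1` or `$4`.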
  9. Using awk to check number of variants in ped files
     # Options using only awk, but they take (much) more time
     awk 'NR == 1 {print (NF-6)/2}' adhd.ped
     awk 'NR < 2 {print (NF-6)/2}' adhd.ped
     # Slow, too
     # Better alternative
     head -n 1 adhd.ped | awk '{print (NF-6)/2}'
     # Now, the map file
     wc -l adhd.map
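
The awk-only commands are slow because awk keeps reading every remaining line after it has printed the answer. One way to keep a single awk call and still stop early is to add `exit` to the action. A sketch on a made-up two-line ped file with three variants:

```shell
# Toy ped file (made-up): 6 leading columns, then 2 alleles per variant.
ped=$(mktemp)
printf 'F1 I1 0 0 1 2 A A C T G G\n'  > "$ped"
printf 'F2 I2 0 0 2 1 A C C C G T\n' >> "$ped"

# Print the variant count from the first line, then stop reading the file.
n_variants=$(awk 'NR == 1 { print (NF - 6) / 2; exit }' "$ped")

echo "$n_variants"
rm -f "$ped"
```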
  10. time - time command execution
      time head -n 1 adhd.ped | awk '{print (NF-6)/2}'
      real 0m0.485s
      user 0m0.391s
      sys 0m0.064s
      time awk 'NR < 2 {print (NF-6)/2}' adhd.ped
      # Forget… just press Ctrl+C
      real 1m0.611s
      user 0m51.261s
      sys 0m0.826s
  11. top - display and update sorted information about processes / display Linux tasks
      top
      z : color
      k : kill process
      u : choose specific user
      c : show complete commands running
      1 : show usage of individual CPUs
      q : quit
  12. screen - screen manager with terminal emulation (i)
      screen
      screen -S <session_name>
      Ctrl+a, then c: create window
      Ctrl+a, then n: go to next window
      Ctrl+a, then p: go to previous window
      Ctrl+a, then 0: go to window number 0
      Ctrl+a, then d: detach (leave your session, but keep it running)
  13. screen - screen manager with terminal emulation (ii)
      Ctrl+a, then [ : activate copy mode (to scroll screen)
      q : quit copy mode
      exit : close current window
      screen -r : resume the only detached session
      screen -r <session_name> : resume a specific detached session
      screen -rD <session_name> : detach the session elsewhere if needed, then reattach it here
  14. split - split a file into pieces
      split -l <lines_of_each_piece> <input> <prefix>
      # Example
      split -l 100000 adhd.map map_
      wc -l map_*
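
A small run shows how the pieces are named and sized. This sketch assumes a made-up 10-line input and the hypothetical prefix `part_`; with the default suffixes the pieces are called `part_aa`, `part_ab`, and so on.

```shell
# Work in a scratch directory so the pieces are easy to clean up.
workdir=$(mktemp -d)
seq 1 10 > "$workdir/numbers.txt"

# 10 lines in pieces of 4 lines: part_aa (4), part_ab (4), part_ac (2).
split -l 4 "$workdir/numbers.txt" "$workdir/part_"

n_pieces=$(ls "$workdir"/part_* | wc -l | tr -d ' ')
last_lines=$(wc -l < "$workdir/part_ac" | tr -d ' ')
# Concatenating the pieces in name order gives back every line.
rejoined=$(cat "$workdir"/part_* | wc -l | tr -d ' ')

echo "$n_pieces pieces, last piece has $last_lines lines, $rejoined lines in total"
rm -rf "$workdir"
```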
  15. in-line Perl/sed to find and replace (i)
      head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr/CHR/g'
      head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr//g'
      # Other possibilities
      head DATA.gold.xcnv | cut -f3 | perl -pe 's|chr||g'
      head DATA.gold.xcnv | cut -f3 | perl -pe 's!chr!!g'
      head DATA.gold.xcnv | cut -f3 | sed 's/chr//g'
      # Creating a BED file (replace ":" and "-" with tabs)
      head DATA.gold.xcnv | cut -f3 | perl -pe 's/[:-]/\t/g'
  16. in-line Perl/sed to find and replace (ii)
      # "s" means substitute
      # "g" means global (replace all matches, not only first)
      # See the difference...
      head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/g'
      head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/'
      # Adding more replacements
      head DATA.gold.xcnv | cut -f3 | sed 's/1/one/g; s/2/two/g'
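
The effect of the `g` flag is easiest to see on a single string. A minimal sketch, assuming the made-up interval `chr9:99-199`. The last command swaps in `tr` as a portable stand-in for the Perl tab replacement used for the BED file.

```shell
# Made-up interval in the chrom:start-end style of column 3.
region='chr9:99-199'

# With g: every 9 on the line is replaced.
all=$(echo "$region" | sed 's/9/nine/g')
# Without g: only the first 9 on the line is replaced.
first=$(echo "$region" | sed 's/9/nine/')
# BED-like columns: drop "chr", then turn ":" and "-" into tabs with tr.
bed=$(echo "$region" | sed 's/chr//' | tr ':-' '\t\t')

echo "$all"
echo "$first"
echo "$bed"
```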
  17. copy from terminal to clipboard / paste from clipboard to terminal
      # This is like Ctrl+V in your terminal
      pbpaste
      # This is like Ctrl+C from your terminal
      head DATA.xcnv | pbcopy
      # Then, Ctrl+V in another text editor
      # On Linux, you can install "xclip"
      http://sourceforge.net/projects/xclip/
  18. datamash - command-line calculations
      tail -n +2 DATA.xcnv | \
        head | \
        cut -f6,10,11 | \
        datamash mean 1 sum 2 min 3
      # mean of 1st column
      # sum of 2nd column
      # minimum of 3rd column
      http://www.gnu.org/software/datamash/
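
Where datamash is not installed, the same three statistics can be computed with awk. This is a sketch of an alternative, not of how datamash works: the three rows are made up, and the output formatting is not guaranteed to match datamash exactly.

```shell
# Toy three-column table (made-up values), tab-delimited.
vals=$(mktemp)
printf '10\t1\t7\n20\t2\t5\n30\t3\t9\n' > "$vals"

# Mean of column 1, sum of column 2, minimum of column 3.
summary=$(awk -F'\t' '
    NR == 1 { min = $3 + 0 }                 # start the minimum at the first row
    { total1 += $1; total2 += $2
      if ($3 + 0 < min) min = $3 + 0 }
    END { if (NR > 0) printf "%g\t%g\t%g\n", total1 / NR, total2, min }
' "$vals")

echo "$summary"
rm -f "$vals"
```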
  19. touch - change file access and modification times
      ls -lh DATA.gold.xcnv
      touch DATA.gold.xcnv
      ls -lh DATA.gold.xcnv
  20. Introduction to "for" loop
      tail -n +2 DATA.xcnv | cut -f1 | sort | uniq | head > samples.txt
      for sample in `cat samples.txt`; do touch $sample.txt; done
      ls -lh Sample*
      for sample in `cat samples.txt`; do mv $sample.txt $sample.csv; done
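
The loop can also be written so that names containing spaces survive, by reading the list line by line and quoting the variable. A minimal sketch with three hypothetical sample names, run in a scratch directory:

```shell
workdir=$(mktemp -d)
# Hypothetical sample names, one per line.
printf 'Sample_A\nSample_B\nSample_C\n' > "$workdir/samples.txt"

# Read one name per line; the quotes keep each name as a single word.
while IFS= read -r sample; do
    touch "$workdir/$sample.txt"
done < "$workdir/samples.txt"

# Loop over the files themselves and swap the extension.
for f in "$workdir"/Sample_*.txt; do
    mv "$f" "${f%.txt}.csv"      # ${f%.txt} strips the trailing .txt
done

n_csv=$(ls "$workdir"/*.csv | wc -l | tr -d ' ')
echo "$n_csv files renamed"
rm -rf "$workdir"
```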
  21. Variables (i)
      i=1
      name=Leandro
      count=`wc -l adhd.map`
      echo $i
      echo $name
      echo $count
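
Note that `wc -l adhd.map` prints the file name after the number, so `$count` holds both. A sketch of one way to keep only the number, using a made-up three-line file and the `$(...)` form of command substitution:

```shell
# Made-up three-line file standing in for adhd.map.
f=$(mktemp)
printf 'a\nb\nc\n' > "$f"

# With a file argument, wc prints the count followed by the file name.
with_name=$(wc -l "$f")
# Reading from standard input leaves only the count.
count=$(wc -l < "$f")
# Arithmetic expansion drops any padding spaces some systems add.
count=$((count + 0))

echo "$with_name"
echo "$count"
rm -f "$f"
```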
  22. Variables (ii)
      # Examples
      bwa=/home/users/llima/tools/bwa
      hg19=/references/hg19.fasta
      # Do not run
      $bwa index $hg19
  23. System variables
      echo $HOME
      echo $USER
      echo $PWD
      # directories where bash looks for your programs
      echo $PATH
  24. Exercise
      1. Create a program that shows input parameters/arguments
      2. Create a program (say, "fields", or "colnames") that prints the column names of a <tab>-delimited file (example: DATA.xcnv)
      3. Send this program to your PATH
  25. Running a bash script (i)
      cat > arguments.sh
      echo Your program is $0
      echo Your first argument is $1
      echo Your second argument is $2
      echo You entered $# parameters.
      # Ctrl+D (on a new line) to end the input and exit "cat"
  26. Running a bash script (ii)
      bash arguments.sh
      bash arguments.sh A B C D E
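
The script can also be created without typing into `cat` interactively, by using a here-document. A minimal sketch: the quoted `'EOF'` delimiter keeps `$0`, `$1` and `$#` from being expanded while the file is written, and the scratch directory is only there to keep the example self-contained.

```shell
workdir=$(mktemp -d)

# Write the script; nothing between the EOF markers is expanded here.
cat > "$workdir/arguments.sh" <<'EOF'
#!/bin/bash
echo "Your program is $0"
echo "Your first argument is $1"
echo "Your second argument is $2"
echo "You entered $# parameters."
EOF

# Run it with five arguments and capture what it prints.
output=$(bash "$workdir/arguments.sh" A B C D E)
echo "$output"
rm -rf "$workdir"
```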
  27. chmod - set permissions (i)
      ls -lh arguments.sh
      -rw-r--r--
      # First character
      b Block special file.
      c Character special file.
      d Directory.
      l Symbolic link.
      s Socket link.
      p FIFO.
      - Regular file.
  28. chmod - set permissions (ii)
      Next characters
      user, group, others | read, write, execute
      ls -lh arguments.sh
      -rw-r--r--
      # Everybody can read
      # Only user can write/modify
  29. chmod - set permissions (iii)
      # Add writing permission to group
      chmod g+w arguments.sh
      ls -lh arguments.sh
      # Remove writing permission from group
      chmod g-w arguments.sh
      ls -lh arguments.sh
      # Add execution permission to all
      chmod a+x arguments.sh
      ls -lh arguments.sh
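
The permission string can be compared before and after a change. A sketch on a throwaway file with the hypothetical name `hello.sh`; the numeric mode `644` is used only to start from the known state `-rw-r--r--` shown on the slides.

```shell
workdir=$(mktemp -d)
script="$workdir/hello.sh"
printf '#!/bin/sh\necho hello\n' > "$script"

# Start from a known state: user read/write, group and others read only.
chmod 644 "$script"
before=$(ls -l "$script" | cut -c1-10)

# Add execution permission for user, group and others.
chmod a+x "$script"
after=$(ls -l "$script" | cut -c1-10)

echo "$before -> $after"
rm -rf "$workdir"
```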
  30. Running your program
      # Run the script directly (it is executable now)
      ./arguments.sh
      ./arguments.sh A B C D E
      # change the name
      mv arguments.sh arguments
      # Send to your PATH (showing on Mac)
      sudo cp arguments /usr/local/bin/
      # Go to another directory
      # Type argu<Tab>, and "which arguments"
      # Run your program again
