Nerd talk: regexes

Published in: Technology

  1. Regexes: It's magic!
  2. “Some people, when confronted with a problem, think 'I know, I'll use regular expressions!' Now they have two problems.”
  3. *
  4. Perl-style regex: It's magic done right!
  5. Metacharacters
       ^   beginning
       $   end
       .   anything
       \   escape
     Example: /^....G..AA$/
  6. Escaped characters
       \s   whitespace
       \S   not-whitespace
       \w   word character
       \d   digit
       \.   dot
       \\   backslash
     Examples: /^\w\w\w\wG\w\wAA$/ and /^\d\d\d\d\d\d\d\d$/
  7. Repetition
       ?        0 or 1 time
       *        0 or more times
       +        1 or more times
       *?       ungreedy *
       +?       ungreedy +
       {m}      m times
       {m,n}    m up to n times
       {m,n}?   ungreedy {m,n}
     Examples: /^\w{4}G\w{2}AA$/ and /^\d{1,2}\d{1,2}\d{2,4}$/
  8. Grouping
       [ABC]          any of these characters
       (AB|BC|CA)     any of these expressions
       (THIS!)        save this
       [A-Za-z0-9]    ranges
     Examples: /^[ACTG]{4}G[ACTG]{2}AA$/ and /^(0?[1-9]|[0-2]\d|3[01]) (0?\d|1[0-2]) (\d{2}|\d{4})$/
  9. OVERKILL http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb
  10. In Python (sigh...)
  11. E.g.: finding files
  12. E.g.: finding files
      ls -la | grep -e '->' | grep -v 'bubo' | grep -v 'Daniel'
  13. E.g.: demultiplexing fastq
        1. Barcode
        2. Primer
        3. Random nucleotides
      grep -P '1:N:0:ACTGGTT' -A3 --no-group-separator multiplex_R1.fastq | grep -P '^[ACTGN]{4}CCC[ACGT]T[GC]AGATA' -A2 -B1 --no-group-separator > deplexed_R1.fq
  14. E.g.: paper figures! From the subset of unique sequences that span the entire region under study, how many unique sequences are matched by each primer combination?
  15. Sed: find & replace
      “Are you gonna talk about vim regexes?”
      “Sed regexes are weird”
      My workaround: use ranges [0-9] [A-Z] [a-z] [A-Za-z]
  16. Sed: find & replace
      “Are you gonna talk about vim regexes?”
      “Sed regexes are weird”
      My workaround: use ranges [0-9] [A-Z] [a-z] [A-Za-z]
      E.g.:
      “Oh noes, Americans don't know how to separate decimals!”
        sed 's/\./,/g' hisfile.tab > myfile.tab
      “Oh noes, this bloody file was edited in Windows!”
        sed 's/\r$//' theirfile.tab > decentfile.tab
      “Oh noes, CASAVA 1.6 has a slash in it!”
        sed 's,/1, 1:N:0:NNNNNN,' oldfile.fq > newfile.fq
  17. Other neat stuff
        grep (-c)
        sort (-n, -r, -k, -t)
        uniq -c
  18. LMGTFY:
        sed     http://www.tutorialspoint.com/unix/unix-regular-expressions.htm
        grep    http://linux.about.com/od/commands/l/blcmdl1_grep.htm
        Perl    http://www.cs.tut.fi/~jkorpela/perl/regexp.html
        Python  http://docs.python.org/2/howto/regex.html
        Vim     http://vimregex.com/
  19. sed 's/fear of regex/love of regex/g'
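
The patterns on slides 5 to 8 can be tried out in Python (sigh...) with the standard `re` module. This is only a minimal sketch: the names `DNA_PATTERNS`, `DATE_PATTERN`, `matching_patterns` and `parse_date` are made up for illustration, and it assumes the spaces between the three groups of the date pattern are literal separators.

```python
import re

# DNA patterns transcribed from slides 5-8; each one is a tighter
# version of the one before it.
DNA_PATTERNS = {
    "metacharacters": r"^....G..AA$",
    "escaped": r"^\w\w\w\wG\w\wAA$",
    "repetition": r"^\w{4}G\w{2}AA$",
    "grouping": r"^[ACTG]{4}G[ACTG]{2}AA$",
}

# Date pattern from slide 8. Assumption: the spaces between the three
# groups are literal separators (day, month, year).
DATE_PATTERN = r"^(0?[1-9]|[0-2]\d|3[01]) (0?\d|1[0-2]) (\d{2}|\d{4})$"


def matching_patterns(sequence):
    """Return the names of the DNA patterns that match the sequence."""
    return [name for name, pattern in DNA_PATTERNS.items()
            if re.search(pattern, sequence)]


def parse_date(text):
    """Return the (day, month, year) strings saved by the groups, or None."""
    match = re.search(DATE_PATTERN, text)
    return match.groups() if match else None


if __name__ == "__main__":
    print(matching_patterns("ACTGGTCAA"))  # all four patterns match
    print(matching_patterns("AC-GGTCAA"))  # only the dot-based pattern matches
    print(parse_date("31 12 2014"))        # the parentheses save each part
```

For instance, `matching_patterns("ACTGGNNAA")` drops only the last pattern, since `N` is a word character but not one of `[ACTG]`.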
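
The find & replace jobs from slide 16 can also be sketched with `re.sub` in Python. The function names below are hypothetical, and this is one possible translation of the sed one-liners, not the only one. It assumes the Windows file uses `\r\n` line endings and that each read header is passed in as a single line.

```python
import re


def decimal_points_to_commas(text):
    # The backslash makes the dot literal; an unescaped dot would
    # match (and replace) every character.
    return re.sub(r"\.", ",", text)


def strip_carriage_returns(text):
    # Assumption: Windows line endings are "\r\n"; keep only the "\n".
    return re.sub(r"\r\n", "\n", text)


def rename_read_header(line):
    # count=1 replaces only the first "/1" on the line, like a sed
    # substitution without the g flag.
    return re.sub(r"/1", " 1:N:0:NNNNNN", line, count=1)


if __name__ == "__main__":
    print(decimal_points_to_commas("3.14\t2.5"))
    print(re.sub(r".", ",", "3.14"))  # the pitfall: every character is replaced
    print(repr(strip_carriage_returns("a\r\nb\r\n")))
    print(rename_read_header("@SEQ_ID/1"))
```

The second `print` shows why the escape matters: without it, the dot is the "anything" metacharacter from slide 5.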
