Hacker102 - RegExes w/JavaScript and Python

Basic introduction to regexes using JavaScript and Python. Developed for code4lib 2010 conference preconf "Hacker 101/102".

Published in: Technology
Transcript

  • 1. hacker 102 code4lib 2010 preconference Asheville, NC, USA 2010-02-21
  • 2. iv. regular expressions JavaScript
  • 3. if all language looked like “aabaaaabbbabaababa” it’d be easy to parse
  • 4. parsing “aabaaaabbbabaababa” • there are two elements, “a” and “b” • either may occur in any order • /([ab]+)/
  • 5. • [] denotes “elements” or “class” • // demarcates regex • + denotes “one or more of previous thing” • () denotes “remember this matched group” • /[ab]/ # an ‘a’ or a ‘b’ • /[ab]+/ # one or more ‘a’s or ‘b’s • /([ab]+)/ # a group of one or more ‘a’s or ‘b’s
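The notation on slide 5 can also be tried outside the browser. A minimal sketch in Python, assuming the standard `re` module stands in for JavaScript's `/.../` literals (Python has no regex literal, so each pattern goes in a raw string); the variable names are illustrative only:

```python
import re

text = "aabaaaabbbabaababa"

# [ab] matches a single 'a' or 'b'
one = re.match(r"[ab]", text)

# [ab]+ matches one or more 'a's or 'b's
run = re.match(r"[ab]+", text)

# ([ab]+) matches the same run and remembers it as group 1
grouped = re.match(r"([ab]+)", text)

print(one.group(0))      # the first character only
print(run.group(0))      # the whole string
print(grouped.group(1))  # the whole string, via the remembered group
```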
  • 6. to firebug!
  • 7. • [a-z] is any lower case char between a and z • [0-9] is any digit • + is one or more of previous thing • ? is zero or one of previous thing • | is or, e.g. (a|b) is ‘a’ or ‘b’ • * is zero to many of previous thing • . matches any character (except a newline, by default)
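Each operator on slide 7 can be checked with a one-line experiment. A short sketch in Python, assuming the standard `re` module; the sample strings are made up for illustration:

```python
import re

# [a-z]+ : one or more lower case letters
lower = re.findall(r"[a-z]+", "code4lib 2010")      # ['code', 'lib']

# [0-9]+ : one or more digits
digits = re.findall(r"[0-9]+", "code4lib 2010")     # ['4', '2010']

# ? : zero or one of the previous thing (the 'u' is optional)
optional = re.findall(r"colou?r", "color colour")   # ['color', 'colour']

# * : zero to many of the previous thing
star = re.findall(r"ab*", "a ab abbb")              # ['a', 'ab', 'abbb']

# | : or, between whole alternatives
either = re.findall(r"cat|dog", "cat dog bird")     # ['cat', 'dog']

# inside [] the | is just a literal character
pipe = re.findall(r"[a|b]", "a|b")                  # ['a', '|', 'b']

# . : any single character except a newline
dot = re.findall(r"b.t", "bat bit b\nt")            # ['bat', 'bit']
```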
  • 8. • [^a-z] is anything *but* [a-z] • [a-zA-Z0-9] is any of a-z, A-Z, 0-9 • {5} matches exactly 5 of the preceding thing • {2,} matches at least 2 of the preceding thing • {2,6} matches from 2 to 6 of preceding thing • \d is like [0-9] (any digit) • \S is any non-whitespace
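The classes and counts on slide 8 work the same way in Python. A brief sketch, assuming the standard `re` module and borrowing a holdings string from the sample data later in the talk:

```python
import re

sample = "Vol. 70 (1984) - Vol. 94 (2008)"

# [^a-z] : anything but a lower case letter
not_lower = re.findall(r"[^a-z]", "a1b2")            # ['1', '2']

# \d is like [0-9]; {4} means exactly four of them
years = re.findall(r"\d{4}", sample)                 # ['1984', '2008']

# {2,} : at least two digits in a row
two_plus = re.findall(r"\d{2,}", sample)             # ['70', '1984', '94', '2008']

# {2,6} : from two to six of the preceding thing
in_range = bool(re.fullmatch(r"a{2,6}", "aaa"))      # True
too_short = bool(re.fullmatch(r"a{2,6}", "a"))       # False

# \S+ : a run of non-whitespace characters
chunks = re.findall(r"\S+", "Vol. 70 (1984)")        # ['Vol.', '70', '(1984)']
```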
  • 9. try this • visit any web page • open firebug console • title = window.document.title • try regexes to match parts of the title
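The exercise on slide 9 is meant for the Firebug console, but the same experiments can be sketched in Python. This assumes the `re` module and uses a hard-coded `title` string (the title of this talk) as a stand-in for `window.document.title`:

```python
import re

# stand-in for window.document.title
title = "Hacker102 - RegExes w/JavaScript and Python"

# the part before the " - " separator
before = re.match(r"(.+) - ", title).group(1)

# every word that starts with a capital letter
caps = re.findall(r"[A-Z][a-zA-Z]+", title)

# the first run of digits
number = re.search(r"[0-9]+", title).group(0)

print(before, caps, number)
```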
  • 10. most every language has regex support
  • 11. try unix “grep”
  • 12. v. glue it together Python
  • 13. problem: Carol’s data
  • 14. TITLE: ABA journal. BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008) CURRENT VOL.: Vol. 95 (2009) - OTHER LIBRARIES: Miami:v. 68 (1982) - USDC: v. 88 (2002) - Birm.:v. 89 (2003) - (Formerly: American Bar Association Journal) (Bound and on Hein) TITLE: Administrative law review. BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60 (2008) CURRENT VOL.: Vol. 61 (2009) - (Bound and on Hein)
  • 15. starter code for you
  • 16.
      #!/usr/bin/env python3
      import re

      # a tag is a run of capitals, spaces, and periods before a colon
      re_tag = re.compile(r'([A-Z .]+):')
      # a title is everything after "TITLE: "
      re_title = re.compile(r'TITLE: (.*)')

      with open('journals-carol-bean.txt') as f:
          for line in f:
              line = line.strip()
              if line == "":
                  continue  # skip blank lines
              m1 = re_tag.match(line)
              m2 = re_title.match(line)
              print("\n->", line, "<-")
              if m1 or m2:
                  print("MATCH")
              if m1:
                  print('tag:', m1.groups())
              if m2:
                  print('title:', m2.groups())
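One possible next step from the starter code is to group the tagged lines into one record per title. The sketch below is an assumption-laden illustration, not the workshop's solution: the line breaks in `SAMPLE` are guessed (the transcript flattens the data from slide 14, and only a subset of its fields is used), and `re_field` and `parse_records` are hypothetical names.

```python
import re

# Subset of the slide 14 data. One field per line is an assumption.
SAMPLE = """\
TITLE: ABA journal.
BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)
CURRENT VOL.: Vol. 95 (2009) -

TITLE: Administrative law review.
BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60 (2008)
CURRENT VOL.: Vol. 61 (2009) -
"""

# tag, colon, optional spaces, then the rest of the line as the value
re_field = re.compile(r'([A-Z .]+): *(.*)')


def parse_records(text):
    """Group tagged lines into one dict per TITLE (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        m = re_field.match(line.strip())
        if not m:
            continue  # blank or untagged line
        tag, value = m.groups()
        if tag == 'TITLE':
            records.append({})  # a TITLE line starts a new record
        if records:
            records[-1][tag] = value
    return records


records = parse_records(SAMPLE)
for record in records:
    print(record)
```

Keeping each record as a plain dict keyed by tag means new tags in the data need no code changes; lines that do not start with an upper case tag are simply skipped.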
