Regular Expressions in Oracle


Oracle Database supports Perl- and POSIX-compatible regular expressions through five elegant and powerful functions: REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_INSTR, REGEXP_LIKE, and REGEXP_COUNT.

This session will demonstrate their nuances and how to use them effectively for data cleansing, manipulation and selection, for validating things such as Social Security Numbers, credit cards, IP addresses, phone numbers, DNAs, XMLs, for extracting things such as email-ids, hostnames from URLs and strings, and for transposing delimited columns to rows. There will be a demo of a few tricky examples taken from and

The session will end with fuzzy matching and optimization techniques, and things to watch out for.

Published in: Technology

  1. Regular Expressions in Oracle. Logan Palanisamy
  2. Agenda
     • Introduction to regular expressions
     • REGEXP_* functions in Oracle
     • Coffee break
     • Examples
     • More examples
  3. Meeting Basics
     • Put your phones/pagers on vibrate/mute.
     • Messenger: change the status to offline or in-meeting.
     • Remote attendees: mute yourself (*6). Ask questions via Adobe Connect.
  4. What are Regular Expressions?
     • A way to express patterns, e.g. credit cards, license plate numbers, vehicle identification numbers, voter IDs, driving licenses.
     • UNIX (grep, egrep), PHP, and Java support regular expressions.
     • Perl made them popular.
  5. String operations before Regular Expression support in Oracle
     • Pull the data from the DB and process it in the middle tier or front end.
     • OWA_PATTERN in 9i and before.
     • LIKE operator.
  6. LIKE operator
     • % matches zero or more of any character.
     • _ matches exactly one character.
     • Examples:
       WHERE col1 LIKE 'abc%';
       WHERE col1 LIKE '%abc';
       WHERE col1 LIKE 'ab_d';
       WHERE col1 LIKE 'ab\_d' ESCAPE '\';
       WHERE col1 NOT LIKE 'abc%';
     • Very limited functionality. Checking whether the first character is numeric takes ten predicates: where c1 like '0%' OR c1 like '1%' OR .. .. c1 like '9%'
     • Trivial with a regular expression: where regexp_like(c1, '^[0-9]')
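
The contrast on this slide can be sketched as follows. The table `like_demo`, its column `c1`, and the sample rows are hypothetical and used only for illustration.

```sql
-- Hypothetical table and rows, for illustration only.
CREATE TABLE like_demo (c1 VARCHAR2(20));
INSERT INTO like_demo VALUES ('9lives');
INSERT INTO like_demo VALUES ('abcdef');
INSERT INTO like_demo VALUES ('ab_d');
INSERT INTO like_demo VALUES ('abXd');

-- LIKE needs one predicate per digit to test the first character.
SELECT c1 FROM like_demo
 WHERE c1 LIKE '0%' OR c1 LIKE '1%' OR c1 LIKE '2%' OR c1 LIKE '3%'
    OR c1 LIKE '4%' OR c1 LIKE '5%' OR c1 LIKE '6%' OR c1 LIKE '7%'
    OR c1 LIKE '8%' OR c1 LIKE '9%';

-- The regular-expression version is a single predicate.
SELECT c1 FROM like_demo WHERE REGEXP_LIKE(c1, '^[0-9]');

-- ESCAPE makes the underscore literal: matches 'ab_d' but not 'abXd'.
SELECT c1 FROM like_demo WHERE c1 LIKE 'ab\_d' ESCAPE '\';
```

Both digit queries return only '9lives' for this sample data.
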
  7. Regular Expressions: meta characters
     Meta character   Meaning
     .      Matches any single character except newline.
     *      Matches zero or more of the character preceding it, e.g. bugs*, table.*
     ^      Denotes the beginning of the line. ^A denotes lines starting with A.
     $      Denotes the end of the line. :$ denotes lines ending with a colon.
     \      Escape character (\., \*, \[, \\, etc.)
     []     Matches any one of the characters within the brackets, e.g. [aeiou], [a-z], [a-zA-Z], [0-9], [[:alpha:]], [a-z?,!]
     [^]    Negation: matches any one character other than the ones inside the brackets, e.g. ^[^13579] denotes lines that start with something other than an odd digit; [^02468]$ denotes lines that end with something other than an even digit.
  8. Extended Regular Expressions
     Meta character   Meaning
     |      Alternation, e.g. ho(use|me), the(y|m), (they|them)
     +      One or more occurrences of the previous character.
     ?      Zero or one occurrence of the previous character.
     {n}    Exactly n repetitions of the previous char or group.
     {n,}   n or more repetitions of the previous char or group.
     {,m}   Zero to m repetitions of the previous char or group.
     {n,m}  n to m repetitions of the previous char or group.
     (....) Grouping or subexpression.
     \n     Back reference, where n stands for the nth subexpression, e.g. \1 is the back reference for the first subexpression.
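
A few of the meta characters from the two tables above, tried against literals on `dual`. This is a minimal sketch; the sample strings are made up for illustration.

```sql
-- Each query returns one row (the value 1) because its pattern matches.
SELECT 1 FROM dual WHERE REGEXP_LIKE('bugsss', '^bugs*$');          -- * : zero or more 's'
SELECT 1 FROM dual WHERE REGEXP_LIKE('home', '^ho(use|me)$');       -- | : alternation inside a group
SELECT 1 FROM dual WHERE REGEXP_LIKE('123-45-6789',
                                     '^[0-9]{3}-[0-9]{2}-[0-9]{4}$'); -- {n} : exact repetition
SELECT 1 FROM dual WHERE REGEXP_LIKE('3.14', '^[0-9]+\.[0-9]+$');   -- \. : literal period
SELECT 1 FROM dual WHERE REGEXP_LIKE('the the cat', '(the) \1');    -- \1 : back reference
```
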
  9. POSIX Character Classes
     POSIX       Description
     [:alnum:]   Alphanumeric characters
     [:alpha:]   Alphabetic characters
     [:ascii:]   ASCII characters
     [:blank:]   Space and tab
     [:cntrl:]   Control characters
     [:digit:]   Digits
     [:xdigit:]  Hexadecimal digits
     [:graph:]   Visible characters (i.e. anything except spaces, control characters, etc.)
     [:lower:]   Lowercase letters
     [:print:]   Visible characters and spaces (i.e. anything except control characters)
     [:punct:]   Punctuation and symbols
     [:space:]   All whitespace characters, including line breaks
     [:upper:]   Uppercase letters
     [:word:]    Word characters (letters, numbers and underscores)
  10. Perl Character Classes
     Perl   POSIX            Description
     \d     [[:digit:]]      [0-9]
     \D     [^[:digit:]]     [^0-9]
     \w     [[:alnum:]_]     [0-9a-zA-Z_]
     \W     [^[:alnum:]_]    [^0-9a-zA-Z_]
     \s     [[:space:]]
     \S     [^[:space:]]
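
The POSIX and Perl notations from the tables above can be compared side by side. A small sketch with made-up sample strings; the Perl-style shorthand assumes Oracle Database 10g Release 2 or later.

```sql
-- POSIX class inside a bracket expression.
SELECT REGEXP_REPLACE('Order 66, aisle 7', '[[:digit:]]', '#') FROM dual;  -- 'Order ##, aisle #'

-- Equivalent Perl-style shorthand.
SELECT REGEXP_REPLACE('Order 66, aisle 7', '\d', '#') FROM dual;           -- 'Order ##, aisle #'

-- [[:space:]] covers blanks and tabs alike (CHR(9) is a tab).
SELECT REGEXP_REPLACE('a b' || CHR(9) || 'c', '[[:space:]]+', '_') FROM dual;  -- 'a_b_c'
```
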
  11. Tools to learn Regular Expressions
  12. REGEXP_* functions
     • Available from 10g onwards.
     • Powerful and flexible, but CPU-hungry.
     • Easy and elegant, but sometimes less performant.
     • Usable on a text literal, a bind variable, or any column that holds character data such as CHAR, NCHAR, CLOB, NCLOB, NVARCHAR2, and VARCHAR2 (but not LONG).
     • Useful as a column constraint for data validation.
  13. 13. REGEXP_LIKE Determines whether pattern matches. REGEXP_LIKE (source_str, pattern, [,match_parameter]) Returns TRUE or FALSE. Use in WHERE clause to return rows matching a pattern Use as a constraint  alter table t add constraint alphanum check (regexp_like (x, [[:alnum:]])); Use in PL/SQL to return a boolean.  IF (REGEXP_LIKE(v_name, [[:alnum:]])) THEN .. Cant be used in SELECT clause regexp_like.sql
  14. 14. REGEXP_SUBSTR Extracts the matching pattern. Returns NULL when nothing matches REGEXP_SUBSTR(source_str, pattern [, position [, occurrence [, match_parameter]]]) position: character at which to begin the search. Default is 1 occurrence: The occurrence of pattern you want to extract regexp_substr.sql
  15. 15. REGEXP_INSTR Returns the location of match in a string REGEXP_INSTR(source_str, pattern, [, position [, occurrence [, return_option [, match_parameter]]]]) return_option:  0, the default, returns the position of the first character.  1 returns the position of the character following the occurence. regexp_instr.sql
  16. 16. REGEXP_REPLACE Search and Replace a pattern REGEXP_REPLACE(source_str, pattern [, replace_str] [, position [, occurrence [, match_parameter]]]]) If replace_str is not specified, pattern/search_str is replaced with empty string occurence:  when 0, the default, replaces all occurrences of the match.  when n, any positive integer, replaces the nth occurrence. regexp_replace.sql
  17. 17. REGEXP_COUNT New in 11g Returns the number of times a pattern appears in a string. REGEXP_COUNT(source_str, pattern [,position [,match_param]]) For simple patterns it is same as (LENGTH(source_str) – LENGTH(REPLACE(source_str, pattern)))/LENGTH(pattern) regexp_count.sql
  18. Pattern Matching modifiers
     • i: specifies case-insensitive matching (ignore case).
     • c: specifies case-sensitive matching.
     • n: allows the period (.) to match the newline character.
     • m: treats the source string as multiple lines.
     • x: ignores whitespace characters in the pattern.
     • When match_parameter is not specified:
       case sensitivity is determined by the NLS_SORT parameter (BINARY, BINARY_CI);
       a period (.) doesn't match the newline character;
       the source string is treated as a single line.
     • Demo script: match_params.sql
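
Each modifier can be tried in isolation. A minimal sketch on made-up literals; `CHR(10)` is a newline, and REGEXP_COUNT assumes 11g or later.

```sql
-- i: case-insensitive match; returns a row.
SELECT 1 FROM dual WHERE REGEXP_LIKE('ORACLE', 'oracle', 'i');

-- c: case-sensitive match; returns no row.
SELECT 1 FROM dual WHERE REGEXP_LIKE('ORACLE', 'oracle', 'c');

-- n: the period may match the newline between 'a' and 'b'.
SELECT REGEXP_COUNT('a' || CHR(10) || 'b', 'a.b', 1, 'n') FROM dual;   -- 1
SELECT REGEXP_COUNT('a' || CHR(10) || 'b', 'a.b') FROM dual;           -- 0

-- m: ^ anchors at the start of every line, not just the start of the string.
SELECT REGEXP_COUNT('a' || CHR(10) || 'b', '^[a-z]', 1, 'm') FROM dual;  -- 2
SELECT REGEXP_COUNT('a' || CHR(10) || 'b', '^[a-z]') FROM dual;          -- 1

-- x: whitespace in the pattern is ignored; returns a row.
SELECT 1 FROM dual WHERE REGEXP_LIKE('abc', 'a b c', 'x');
```
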
  19. Is a CHAR column all numeric?
     • to_number(c1) returns ORA-01722: invalid number if a varchar2 column contains alpha characters.
     • Demo script: is_numeric.sql
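
One way to avoid ORA-01722 is to test the value with a pattern before converting it. A sketch only: the sample values are hypothetical, and the pattern covers unsigned integers, not signs or decimals.

```sql
-- The WITH clause stands in for a real table with a VARCHAR2 column c1.
WITH t_num AS (
  SELECT '123'  AS c1 FROM dual UNION ALL
  SELECT '12a'        FROM dual UNION ALL
  SELECT '4567'       FROM dual
)
SELECT c1,
       CASE WHEN REGEXP_LIKE(c1, '^[0-9]+$')
            THEN TO_NUMBER(c1)          -- converted only when every character is a digit
       END AS c1_as_number              -- NULL for '12a'
  FROM t_num;
```
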
  20. 20. Check constraints Put validation close to where the data is stored No need to have validation at different clients check_constraint.sql
  21. 21. Extract email-ids Find email-ids embedded in text strings. Possible email-id formats:    extract_emailid.sql
  22. 22. Extract dates Extract dates embedded in text strings. Possible formats 1/5/2007, 2-5-03, 12-31-2009, 1/31/10, 2/5-10 extract_date.sql
  23. Extracting hostnames from URLs
     • Extract hostnames/domain-names embedded in text strings.
     • Possible formats: =sbc&.gx=1&.rand=fegr2vucbecu5
     • Demo script: extract_hostname.sql
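
One possible two-step approach: first cut out the scheme and host, then strip the scheme. The sample text and patterns are assumptions for illustration and expect a URL with an explicit scheme such as `http://`.

```sql
SELECT REGEXP_REPLACE(
         -- Step 1: scheme plus host, stopping at '/', ':', '?', '#' or whitespace.
         REGEXP_SUBSTR('See http://www.example.com/path?x=1 for details',
                       '[[:alpha:]]+://[^/:?#[:space:]]+'),
         -- Step 2: remove the leading scheme.
         '^[[:alpha:]]+://') AS hostname
  FROM dual;   -- 'www.example.com'
```
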
  24. Convert value pairs to XML
     • Input: a string such as remain1=1;remain2=2;
     • Output: an XML string <remain1><value=1></remain1> <remain2><value=2></remain2>
     • Demo script: convert_to_xml.sql
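
The transformation above can be sketched with two captured groups and back references in the replacement string. This is one possible approach, not necessarily the one in the demo script.

```sql
-- \1 is the name before '=', \2 the value before ';'.
-- The replacement ends with a space; RTRIM drops the last one.
SELECT RTRIM(
         REGEXP_REPLACE('remain1=1;remain2=2;',
                        '([^=;]+)=([^;]*);',
                        '<\1><value=\2></\1> ')) AS xml_str
  FROM dual;   -- '<remain1><value=1></remain1> <remain2><value=2></remain2>'
```
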
  25. Sort IP addresses in numerical order
     • Sort IP addresses, which are stored as character strings, in numerical order.
     • Input:
     • Demo script: sort_ip_address.sql
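
A sketch of one way to do this: extract each octet with REGEXP_SUBSTR and sort on its numeric value. The sample addresses are hypothetical.

```sql
-- The WITH clause stands in for a real table with a character column ip.
WITH ip_demo AS (
  SELECT '10.2.3.4'    AS ip FROM dual UNION ALL
  SELECT '9.1.1.1'           FROM dual UNION ALL
  SELECT '10.10.1.1'         FROM dual UNION ALL
  SELECT '192.168.0.1'       FROM dual
)
SELECT ip
  FROM ip_demo
 ORDER BY TO_NUMBER(REGEXP_SUBSTR(ip, '[0-9]+', 1, 1)),   -- first octet
          TO_NUMBER(REGEXP_SUBSTR(ip, '[0-9]+', 1, 2)),   -- second octet
          TO_NUMBER(REGEXP_SUBSTR(ip, '[0-9]+', 1, 3)),   -- third octet
          TO_NUMBER(REGEXP_SUBSTR(ip, '[0-9]+', 1, 4));   -- fourth octet
-- Order: 9.1.1.1, 10.2.3.4, 10.10.1.1, 192.168.0.1
```

A plain `ORDER BY ip` would sort these as strings and place 9.1.1.1 last.
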
  26. Extract first name, last name, and middle initial
     • Extract the first name and last name, with an optional middle initial.
     • Demo script: first_last_mi.sql
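
One possible sketch, assuming names of the form "First [M[.]] Last" made of alphabetic characters only. The sample names and the pattern are hypothetical. The WHERE clause matters: REGEXP_REPLACE returns the source unchanged when the pattern does not match.

```sql
WITH people AS (
  SELECT 'John Q Public' AS full_name FROM dual UNION ALL
  SELECT 'Jane Doe'                   FROM dual
)
SELECT full_name,
       -- group 1: first name, group 3: middle initial (optional), group 4: last name
       REGEXP_REPLACE(full_name,
         '^([[:alpha:]]+) +(([[:alpha:]])\.? +)?([[:alpha:]]+)$', '\1') AS first_name,
       REGEXP_REPLACE(full_name,
         '^([[:alpha:]]+) +(([[:alpha:]])\.? +)?([[:alpha:]]+)$', '\3') AS middle_initial,
       REGEXP_REPLACE(full_name,
         '^([[:alpha:]]+) +(([[:alpha:]])\.? +)?([[:alpha:]]+)$', '\4') AS last_name
  FROM people
 WHERE REGEXP_LIKE(full_name,
         '^([[:alpha:]]+) +(([[:alpha:]])\.? +)?([[:alpha:]]+)$');
```
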
  27. Finding the Last Occurrence
     • Find the last numeric sequence in a string, e.g. return 567 from abc/123/def567/xyz.
     • INSTR and SUBSTR allow a backward search when position is negative.
     • REGEXP functions don't allow a backward search.
     • Demo script: last_occurrence.sql
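
Two possible workarounds for the missing backward search, shown on the string from the slide. These are sketches and may differ from the demo script.

```sql
-- 11g and later: ask for occurrence number REGEXP_COUNT, i.e. the last one.
SELECT REGEXP_SUBSTR('abc/123/def567/xyz', '[0-9]+', 1,
                     REGEXP_COUNT('abc/123/def567/xyz', '[0-9]+')) AS last_num
  FROM dual;   -- '567'

-- 10g: let a greedy .* consume everything up to the last run of digits.
SELECT REGEXP_REPLACE('abc/123/def567/xyz',
                      '^.*[^0-9]([0-9]+)[^0-9]*$', '\1') AS last_num
  FROM dual;   -- '567'
```
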
  28. 28. Fuzzy Match Tables t1 and t2 each have a varchar2(12) column (t1.x, t2.y). A row in t1 is considered a match for a row in t2, if any six characters in t1.x matches with any six characters in t2.y fuzzy_match.sql
  29. The lazy operator
     • ? placed after a quantifier (as in *? or +?) makes it lazy (non-greedy).
     • Demo script: greedy_lazy.sql
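
The difference between greedy and lazy matching shows up clearly on a made-up string with two tag pairs:

```sql
-- Greedy: .+ runs to the LAST '>' in the string.
SELECT REGEXP_SUBSTR('<b>bold</b> and <i>italic</i>', '<.+>') FROM dual;
-- '<b>bold</b> and <i>italic</i>'

-- Lazy: .+? stops at the FIRST '>' that completes a match.
SELECT REGEXP_SUBSTR('<b>bold</b> and <i>italic</i>', '<.+?>') FROM dual;
-- '<b>'
```
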
  30. Meta-characters with multiple meanings
     • The same meta characters are used with multiple meanings:
       ^ is used for anchoring and negation;
       ? is used as a quantifier and as the lazy operator;
       () is used for grouping or sub-expressions.
     • Demo script: metachars_with_multiple_meanings.sql
  31. Nuances
     • ? (zero or one) and * (zero or more) can sometimes mislead you.
     • Demo script: nuances.sql
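
A typical way `*` and `?` mislead: both accept zero occurrences, so an unanchored pattern built only from them matches any non-null string. A small sketch:

```sql
-- Returns a row: "zero digits" is a valid match, even though 'abc' has no digits.
SELECT 1 FROM dual WHERE REGEXP_LIKE('abc', '[0-9]*');

-- Returns a row for the same reason: zero or one digit at the start.
SELECT 1 FROM dual WHERE REGEXP_LIKE('abc', '^[0-9]?');

-- No row: + requires at least one digit.
SELECT 1 FROM dual WHERE REGEXP_LIKE('abc', '[0-9]+');

-- No row: anchoring both ends forces the whole string to be digits.
SELECT 1 FROM dual WHERE REGEXP_LIKE('abc', '^[0-9]*$');
```
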
  32. Stored patterns
     • Patterns can be stored in table columns and referenced in REGEXP functions.
     • No need to hard-code them.
     • Demo script: stored_patterns.sql
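
A minimal sketch of the idea, with hypothetical table names, column names, and a US ZIP code rule chosen only as an example:

```sql
-- Hypothetical rule table: one row per named pattern.
CREATE TABLE validation_rule (
  rule_name VARCHAR2(30) PRIMARY KEY,
  pattern   VARCHAR2(200)
);
INSERT INTO validation_rule VALUES ('us_zip', '^[0-9]{5}(-[0-9]{4})?$');

-- Hypothetical data table.
CREATE TABLE address_demo (zip VARCHAR2(10));
INSERT INTO address_demo VALUES ('94089');
INSERT INTO address_demo VALUES ('94089-1234');
INSERT INTO address_demo VALUES ('9408');

-- The pattern comes from the rule table instead of a hard-coded literal.
SELECT a.zip
  FROM address_demo a
  JOIN validation_rule r ON r.rule_name = 'us_zip'
 WHERE REGEXP_LIKE(a.zip, r.pattern);   -- returns '94089' and '94089-1234'
```

Changing the rule then means updating one row rather than editing every query.
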
  33. Random things
     • Insert a dash before the last two digits.
     • Remove a substring.
     • Get rid of useless commas from a string.
     • Find the word that comes immediately before a substring (e.g. XXX).
     • Replace whole words, not their parts.
     • Trim the trailing digits.
     • Demo script: random.sql
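
Three of these tasks, sketched on made-up literals. "Useless commas" is interpreted here as runs of repeated commas, which is an assumption.

```sql
-- Insert a dash before the last two digits.
SELECT REGEXP_REPLACE('12345', '([0-9]{2})$', '-\1') FROM dual;   -- '123-45'

-- Collapse runs of commas into a single comma.
SELECT REGEXP_REPLACE('a,,b,,,c', ',+', ',') FROM dual;           -- 'a,b,c'

-- Trim the trailing digits.
SELECT REGEXP_REPLACE('abc123', '[0-9]+$') FROM dual;             -- 'abc'
```
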
  34. A few other points
     • When not to use regular expressions: if the same thing can be done without regular expressions and without too much coding.
     • POSIX notations need double brackets: [[:upper:]]. [:upper:] won't work. [[:UPPER:]] won't work either; the class name has to be in lowercase letters.
     • Locale support is provided with collation elements [[.ce.]] and equivalence classes [[=e=]].
     • MySQL supports regular expressions with RLIKE.
  35. 35. References Oracle® Database Advanced Application Developers Guide ( 1/appdev.112/e17125/adfns_regexp.htm#CHDGH BHF) Anti-Patterns in Regular Expressions: First Expressions. An article by Jonathan Gennick Oracle Magazine, Sep/Oct 2003. Oracle Regular Expressions Pocket Reference by Gonathan Gennick gexPocketData.sql
  38. 38. Q&A devel_oracle@