Regular Expressions in Oracle

Oracle Database supports Perl- and POSIX-compatible regular expressions through five elegant and powerful functions: REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_INSTR, REGEXP_LIKE, and REGEXP_COUNT.

This session will demonstrate their nuances and how to use them effectively for data cleansing, manipulation, and selection: validating values such as Social Security numbers, credit card numbers, IP addresses, phone numbers, DNA sequences, and XML; extracting email IDs and hostnames from URLs and strings; and transposing delimited columns to rows. It will also demo a few tricky examples taken from forums.oracle.com and asktom.oracle.com.

The session will end with fuzzy matching, optimization techniques, and things to watch out for.





    Presentation Transcript

    • Regular Expressions in Oracle, by Logan Palanisamy
    • Agenda: Introduction to regular expressions; REGEXP_* functions in Oracle; Coffee break; Examples; More examples
    • Meeting basics: Put your phones/pagers on vibrate/mute. Messenger: change your status to offline or in-meeting. Remote attendees: mute yourself (*6) and ask questions via Adobe Connect.
    • What are regular expressions? A way to express patterns such as credit card numbers, license plate numbers, vehicle identification numbers, voter IDs, and driving licenses. UNIX tools (grep, egrep), PHP, and Java support regular expressions; Perl made them popular.
    • String operations before regular expression support in Oracle: pull the data from the DB and process it in the middle tier or front end; use OWA_PATTERN (9i and earlier); use the LIKE operator.
    • LIKE operator % matches zero or more of any character _ matches exactly one character Examples  WHERE col1 LIKE abc%;  WHERE col1 LIKE %abc;  WHERE col1 LIKE ab_d;  WHERE col1 LIKE ab_d escape ;  WHERE col1 NOT LIKE abc%; Very limited functionality  Check whether first character is numeric: where c1 like 0% OR c1 like 1% OR .. .. c1 like 9%  Very trivial with Regular Exp: where regexp_like(c1, ^[0-9])
    • Regular Expressions
        Metacharacter  Meaning
        .      Matches any single character except newline
        *      Matches zero or more occurrences of the preceding character, e.g. bugs*, table.*
        ^      Denotes the beginning of the line, e.g. ^A denotes lines starting with A
        $      Denotes the end of the line, e.g. :$ denotes lines ending with a colon
        \      Escape character, e.g. \., \*, \[, \\
        [ ]    Matches any one of the characters within the brackets, e.g. [aeiou], [a-z], [a-zA-Z], [0-9], [[:alpha:]], [a-z?,!]
        [^ ]   Negation: matches any one character other than the ones inside the brackets, e.g. ^[^13579] denotes lines not starting with an odd digit, [^02468]$ denotes lines not ending with an even digit
    • Extended Regular Expressions
        Metacharacter  Meaning
        |       Alternation, e.g. ho(use|me), the(y|m), (they|them)
        +       One or more occurrences of the preceding character or group
        ?       Zero or one occurrence of the preceding character or group
        {n}     Exactly n repetitions of the preceding character or group
        {n,}    n or more repetitions of the preceding character or group
        {0,m}   Zero to m repetitions of the preceding character or group
        {n,m}   n to m repetitions of the preceding character or group
        (....)  Grouping or subexpression
        \n      Back reference, where n stands for the nth subexpression (\1 through \9), e.g. \1 is the back reference for the first subexpression
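A short sketch of back references and alternation, assuming Oracle 10g or later. The strings are made-up sample values; \1 and \2 in the replacement string refer to the first and second parenthesized subexpressions, and a back reference can also appear inside the pattern itself:

```sql
SELECT REGEXP_REPLACE('John Smith',
                      '^([[:alpha:]]+) ([[:alpha:]]+)$',
                      '\2, \1')                  AS last_first,   -- 'Smith, John'
       -- A back reference inside the pattern finds a doubled character.
       REGEXP_SUBSTR('hello', '(.)\1')           AS doubled,      -- 'll'
       -- Alternation inside a group: ho(use|me) matches 'house' or 'home'.
       CASE WHEN REGEXP_LIKE('my home', 'ho(use|me)')
            THEN 'match' ELSE 'no match' END     AS alternation   -- 'match'
FROM   dual;
```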
    • POSIX Character Classes
        POSIX       Description
        [:alnum:]   Alphanumeric characters
        [:alpha:]   Alphabetic characters
        [:ascii:]   ASCII characters
        [:blank:]   Space and tab
        [:cntrl:]   Control characters
        [:digit:]   Digits
        [:xdigit:]  Hexadecimal digits
        [:graph:]   Visible characters (i.e. anything except spaces, control characters, etc.)
        [:lower:]   Lowercase letters
        [:print:]   Visible characters and spaces (i.e. anything except control characters)
        [:punct:]   Punctuation and symbols
        [:space:]   All whitespace characters, including line breaks
        [:upper:]   Uppercase letters
        [:word:]    Word characters (letters, numbers, and underscores)
    • Perl Character Classes
        Perl  POSIX            Meaning
        \d    [[:digit:]]      [0-9]
        \D    [^[:digit:]]     [^0-9]
        \w    [[:alnum:]_]     [0-9a-zA-Z_]
        \W    [^[:alnum:]_]    [^0-9a-zA-Z_]
        \s    [[:space:]]      whitespace
        \S    [^[:space:]]     non-whitespace
    • Tools to learn regular expressions: http://www.weitz.de/regex-coach/ and http://www.regexbuddy.com/
    • REGEXP_* functions
        Available from 10g onwards.
        Powerful, flexible, easy, and elegant, but CPU-hungry: they can be slower than equivalent plain string functions (INSTR, SUBSTR, REPLACE).
        Usable on a text literal, a bind variable, or any column that holds character data, such as CHAR, NCHAR, CLOB, NCLOB, NVARCHAR2, and VARCHAR2 (but not LONG).
        Useful in column constraints for data validation.
    • REGEXP_LIKE Determines whether pattern matches. REGEXP_LIKE (source_str, pattern, [,match_parameter]) Returns TRUE or FALSE. Use in WHERE clause to return rows matching a pattern Use as a constraint  alter table t add constraint alphanum check (regexp_like (x, [[:alnum:]])); Use in PL/SQL to return a boolean.  IF (REGEXP_LIKE(v_name, [[:alnum:]])) THEN .. Cant be used in SELECT clause regexp_like.sql
    • REGEXP_SUBSTR Extracts the matching pattern. Returns NULL when nothing matches REGEXP_SUBSTR(source_str, pattern [, position [, occurrence [, match_parameter]]]) position: character at which to begin the search. Default is 1 occurrence: The occurrence of pattern you want to extract regexp_substr.sql
    • REGEXP_INSTR Returns the location of match in a string REGEXP_INSTR(source_str, pattern, [, position [, occurrence [, return_option [, match_parameter]]]]) return_option:  0, the default, returns the position of the first character.  1 returns the position of the character following the occurence. regexp_instr.sql
    • REGEXP_REPLACE Search and Replace a pattern REGEXP_REPLACE(source_str, pattern [, replace_str] [, position [, occurrence [, match_parameter]]]]) If replace_str is not specified, pattern/search_str is replaced with empty string occurence:  when 0, the default, replaces all occurrences of the match.  when n, any positive integer, replaces the nth occurrence. regexp_replace.sql
    • REGEXP_COUNT New in 11g Returns the number of times a pattern appears in a string. REGEXP_COUNT(source_str, pattern [,position [,match_param]]) For simple patterns it is same as (LENGTH(source_str) – LENGTH(REPLACE(source_str, pattern)))/LENGTH(pattern) regexp_count.sql
    • Pattern-matching modifiers (match_parameter)
        i: case-insensitive matching (ignore case)
        c: case-sensitive matching
        n: allows the period (.) to match the newline character
        m: treats the source string as multiple lines (^ and $ match at the start and end of each line)
        x: ignores whitespace characters in the pattern
        When match_parameter is not specified:
          case sensitivity is determined by the NLS_SORT parameter (BINARY, BINARY_CI)
          a period (.) doesn't match the newline character
          the source string is treated as a single line
        match_params.sql
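A sketch exercising each modifier on made-up strings, assuming Oracle 11g or later (REGEXP_COUNT). The case-sensitive count passes 'c' explicitly so the result does not depend on the session's NLS_SORT setting; CHR(10) builds a newline:

```sql
SELECT REGEXP_SUBSTR('Oracle ORACLE oracle', 'oracle', 1, 1, 'i')  AS ci_first,      -- 'Oracle'
       REGEXP_COUNT('Oracle ORACLE oracle', 'oracle', 1, 'c')       AS cs_count,      -- 1
       REGEXP_COUNT('Oracle ORACLE oracle', 'oracle', 1, 'i')       AS ci_count,      -- 3
       REGEXP_COUNT('line1' || CHR(10) || 'line2', '^line', 1, 'm') AS m_count,       -- 2
       REGEXP_COUNT('line1' || CHR(10) || 'line2', '^line')         AS default_count, -- 1
       CASE WHEN REGEXP_LIKE('a' || CHR(10) || 'b', 'a.b', 'n')
            THEN 'Y' ELSE 'N' END                                   AS n_flag,        -- 'Y'
       CASE WHEN REGEXP_LIKE('abc', 'a b c', 'x')
            THEN 'Y' ELSE 'N' END                                   AS x_flag         -- 'Y'
FROM   dual;
```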
    • Is a CHAR column all numeric? TO_NUMBER(c1) raises ORA-01722: invalid number if a VARCHAR2 column contains alphabetic characters. is_numeric.sql
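One possible regex-based check, sketched under stated assumptions: it accepts an optional sign, digits, and an optional period as the decimal separator. It deliberately ignores exponents, surrounding spaces, and NLS-specific separators that TO_NUMBER may accept, so it is not an exact equivalent of TO_NUMBER; the `t_demo` data is hypothetical:

```sql
WITH t_demo AS (
  SELECT '123'   AS c1 FROM dual UNION ALL
  SELECT '-45.6'       FROM dual UNION ALL
  SELECT '12a'         FROM dual UNION ALL
  SELECT '1.2.3'       FROM dual
)
SELECT c1,
       -- optional sign, then digits with optional fraction, or a bare fraction like .5
       CASE WHEN REGEXP_LIKE(c1, '^[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)$')
            THEN 'Y' ELSE 'N' END AS is_numeric
FROM   t_demo;
-- 123 Y, -45.6 Y, 12a N, 1.2.3 N
```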
    • Check constraints: put validation close to where the data is stored, so there is no need to repeat it in every client. check_constraint.sql
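A minimal sketch of format validation in a check constraint, using the Social Security number example from the session abstract. The table, column, and constraint names are hypothetical, and the pattern only checks the NNN-NN-NNNN layout, not whether the number is actually valid:

```sql
-- Hypothetical table; names are illustrative.
CREATE TABLE employee_demo (
  emp_id NUMBER PRIMARY KEY,
  ssn    VARCHAR2(11)
         CONSTRAINT employee_demo_ssn_fmt
         CHECK (REGEXP_LIKE(ssn, '^[0-9]{3}-[0-9]{2}-[0-9]{4}$'))
);

INSERT INTO employee_demo VALUES (1, '123-45-6789');  -- accepted
INSERT INTO employee_demo VALUES (2, '123456789');    -- rejected with ORA-02290
```

Because the rule lives in the table definition, every client (SQL*Plus, application code, batch loads) gets the same check.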
    • Extract email IDs: find email IDs embedded in text strings. Possible email ID formats: abc123@company.com, namex@mail.company.com, xyz_1@yahoo.co.in. extract_emailid.sql
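A sketch of one way to pull every email ID out of a single text value, assuming Oracle 11g or later (REGEXP_COUNT). The pattern is a simplified assumption, not a full RFC 5322 address grammar, and the sample sentence is made up:

```sql
WITH src AS (
  SELECT 'Contact abc123@company.com or xyz_1@yahoo.co.in for details' AS txt
  FROM   dual
)
-- local part, '@', domain labels, then a dot and a 2+ letter top-level domain
SELECT REGEXP_SUBSTR(txt, '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}',
                     1, LEVEL) AS email_id
FROM   src
CONNECT BY LEVEL <= REGEXP_COUNT(txt, '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}');
-- abc123@company.com
-- xyz_1@yahoo.co.in
```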
    • Extract dates: extract dates embedded in text strings. Possible formats: 1/5/2007, 2-5-03, 12-31-2009, 1/31/10, 2/5-10. extract_date.sql
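A sketch that matches the listed formats: one or two digits, a / or - separator, one or two digits, another separator, then a four- or two-digit year. It assumes Oracle 11g or later and only checks shape, not calendar validity (13-45-2009 would also match); the sample text is made up:

```sql
WITH src AS (
  SELECT 'Paid 1/5/2007, due 12-31-2009, renewed 2/5-10.' AS txt FROM dual
)
SELECT REGEXP_SUBSTR(txt, '[0-9]{1,2}[/-][0-9]{1,2}[/-]([0-9]{4}|[0-9]{2})',
                     1, LEVEL) AS date_str
FROM   src
CONNECT BY LEVEL <= REGEXP_COUNT(txt, '[0-9]{1,2}[/-][0-9]{1,2}[/-]([0-9]{4}|[0-9]{2})');
-- 1/5/2007, 12-31-2009, 2/5-10
```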
    • Extracting hostnames from URLs: extract hostnames/domain names embedded in text strings. Possible formats: http://us.mg201.mail.yahoo.com/dc/launch?.partner=sbc&.gx=1&.rand=fegr2vucbecu5, https://www.mybank.com:8080/abc/xyz, www.mybank.com, ftp://www.mycharity.org/abc/xyz. extract_hostname.sql
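A sketch for URLs stored one per value (it assumes the value starts with the URL rather than having it embedded mid-sentence). An optional scheme such as http:// is captured as group 1, and the hostname is everything up to the first '/', ':' or '?'. SET DEFINE OFF stops SQL*Plus from treating the & in the first URL as a substitution variable:

```sql
-- Stop SQL*Plus from prompting for &.gx and &.rand.
SET DEFINE OFF

WITH urls AS (
  SELECT 'http://us.mg201.mail.yahoo.com/dc/launch?.partner=sbc&.gx=1&.rand=fegr2vucbecu5' AS url FROM dual UNION ALL
  SELECT 'https://www.mybank.com:8080/abc/xyz' FROM dual UNION ALL
  SELECT 'www.mybank.com'                      FROM dual UNION ALL
  SELECT 'ftp://www.mycharity.org/abc/xyz'     FROM dual
)
SELECT url,
       -- group 1: optional scheme; group 2: host; .*$ swallows the rest
       REGEXP_REPLACE(url, '^([[:alpha:]]+://)?([^/:?]+).*$', '\2') AS hostname
FROM   urls;
-- us.mg201.mail.yahoo.com, www.mybank.com, www.mybank.com, www.mycharity.org
```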
    • Convert value pairs to XML Input: A string such as remain1=1;remain2=2; Output: An XML string <remain1><value=1></remain1> <remain2><value=2></remain2> convert_to_xml.sql
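A single REGEXP_REPLACE can produce the output shape shown on the slide: each name=value; pair is captured as two subexpressions and rewritten with back references. This is a sketch, not necessarily the approach in convert_to_xml.sql. It mirrors the slide's <value=1> notation, which is not well-formed XML; a replacement string of '<\1><value>\2</value></\1>' would give well-formed elements instead:

```sql
SELECT REGEXP_REPLACE('remain1=1;remain2=2;',
                      '([^=;]+)=([^;]*);',          -- \1 = name, \2 = value
                      '<\1><value=\2></\1>') AS xml_like
FROM   dual;
-- '<remain1><value=1></remain1><remain2><value=2></remain2>'
```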
    • Sort IP addresses in numerical order: sort IP addresses that are stored as character strings in numerical order. sort_ip_address.sql
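One possible approach, sketched on made-up IPv4 addresses: extract each of the four octets with REGEXP_SUBSTR and sort by their numeric values. A plain ORDER BY ip would sort as text and put '10.1.1.20' before '10.1.1.3' and '9.255.0.1' last:

```sql
WITH ips AS (
  SELECT '10.1.1.20'   AS ip FROM dual UNION ALL
  SELECT '10.1.1.3'          FROM dual UNION ALL
  SELECT '192.168.0.1'       FROM dual UNION ALL
  SELECT '9.255.0.1'         FROM dual
)
SELECT ip
FROM   ips
ORDER  BY TO_NUMBER(REGEXP_SUBSTR(ip, '[0-9]+', 1, 1)),   -- 1st octet
          TO_NUMBER(REGEXP_SUBSTR(ip, '[0-9]+', 1, 2)),   -- 2nd octet
          TO_NUMBER(REGEXP_SUBSTR(ip, '[0-9]+', 1, 3)),   -- 3rd octet
          TO_NUMBER(REGEXP_SUBSTR(ip, '[0-9]+', 1, 4));   -- 4th octet
-- 9.255.0.1, 10.1.1.3, 10.1.1.20, 192.168.0.1
```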
    • Extract first name, last name, and middle initial: extract the first name and last name, with an optional middle initial. first_last_mi.sql
    • Finding the last occurrence: find the last numeric sequence in a string, e.g. return 567 from abc/123/def567/xyz. INSTR and SUBSTR accept a negative position (counting back from the end of the string); the REGEXP functions don't allow a backward search. last_occurrence.sql
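Two possible workarounds, sketched under the assumption of Oracle 11g or later (both use 11g features: REGEXP_COUNT, and the sixth subexpression argument of REGEXP_SUBSTR). The first asks for occurrence number REGEXP_COUNT(...), i.e. the last one; if the string has no digits that occurrence is 0, which is not allowed and raises an error. The second anchors the pattern to the end of the string and returns subexpression 1:

```sql
SELECT REGEXP_SUBSTR('abc/123/def567/xyz', '[0-9]+', 1,
                     REGEXP_COUNT('abc/123/def567/xyz', '[0-9]+'))  AS via_count,   -- '567'
       -- digits, then only non-digits up to the end; return group 1
       REGEXP_SUBSTR('abc/123/def567/xyz', '([0-9]+)[^0-9]*$',
                     1, 1, 'c', 1)                                  AS via_anchor   -- '567'
FROM   dual;
```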
    • Fuzzy match: tables t1 and t2 each have a VARCHAR2(12) column (t1.x, t2.y). A row in t1 is considered a match for a row in t2 if any six characters in t1.x match any six characters in t2.y. fuzzy_match.sql
    • The lazy operator: a ? placed after a quantifier (*?, +?, ??, {n,m}?) makes it lazy (non-greedy), so it matches as little as possible. greedy_lazy.sql
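A small sketch of greedy versus lazy matching on a made-up HTML fragment, assuming Oracle 10gR2 or later (where the non-greedy quantifiers are available). The greedy .* runs to the last '>' in the string, while .*? stops at the first one:

```sql
SELECT REGEXP_SUBSTR('<b>bold</b> and <i>italic</i>', '<.*>')  AS greedy,  -- '<b>bold</b> and <i>italic</i>'
       REGEXP_SUBSTR('<b>bold</b> and <i>italic</i>', '<.*?>') AS lazy     -- '<b>'
FROM   dual;
```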
    • Metacharacters with multiple meanings: the same metacharacters are used with different meanings. ^ is used for anchoring and for negation (inside brackets); ? is used as a quantifier and as the lazy operator; () is used for grouping and for subexpressions. metachars_with_multiple_meanings.sql
    • Nuances: ? (zero or one) and * (zero or more) can sometimes mislead you, because they also match an empty string. nuances.sql
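A sketch of the empty-match trap on made-up literals. '[0-9]*' "matches" 'abc' because zero digits is a valid match, and REGEXP_SUBSTR returns the leftmost match, which here is the empty string at position 1 (NULL in Oracle) rather than '123'. Anchoring the pattern or using + instead of * avoids the surprise:

```sql
SELECT CASE WHEN REGEXP_LIKE('abc', '[0-9]*')   THEN 'Y' ELSE 'N' END AS star_like,   -- 'Y'
       CASE WHEN REGEXP_LIKE('abc', '^[0-9]*$') THEN 'Y' ELSE 'N' END AS anchored,    -- 'N'
       REGEXP_SUBSTR('abc123', '[0-9]*')                              AS star_substr, -- NULL
       REGEXP_SUBSTR('abc123', '[0-9]+')                              AS plus_substr  -- '123'
FROM   dual;
```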
    • Stored patterns: patterns can be stored in table columns and referenced in REGEXP functions, so there is no need to hard-code them. stored_patterns.sql
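A minimal sketch of keeping patterns as data. The `regex_patterns` table, its rows, and the sample input values are hypothetical; the point is that the second argument of REGEXP_LIKE can be a column instead of a literal:

```sql
-- Hypothetical lookup table of named validation patterns.
CREATE TABLE regex_patterns (
  pattern_name VARCHAR2(30)  PRIMARY KEY,
  pattern      VARCHAR2(200) NOT NULL
);

INSERT INTO regex_patterns VALUES ('US_ZIP', '^[0-9]{5}(-[0-9]{4})?$');
INSERT INTO regex_patterns VALUES ('US_SSN', '^[0-9]{3}-[0-9]{2}-[0-9]{4}$');

WITH input_values AS (
  SELECT '94105'      AS val FROM dual UNION ALL
  SELECT '94105-1234'        FROM dual UNION ALL
  SELECT '9410'              FROM dual
)
SELECT i.val
FROM   input_values i
JOIN   regex_patterns p ON p.pattern_name = 'US_ZIP'
WHERE  REGEXP_LIKE(i.val, p.pattern);   -- pattern comes from the table
-- 94105, 94105-1234
```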
    • Random things: insert a dash before the last two digits; remove a substring; get rid of useless commas in a string; find the word that comes immediately before a substring (e.g. XXX); replace whole words, not parts of words; trim the trailing digits. random.sql
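Sketches of the listed tasks on made-up literals; they are possible approaches, not the contents of random.sql. The word-before-XXX query assumes Oracle 11g or later (subexpression argument). Oracle's regex syntax has no word-boundary token, so the whole-word replacement emulates one with (^|non-word char) and (non-word char|$) groups and puts them back via \1 and \2; one limitation is that in a single pass, two occurrences separated by a single space ('cat cat cat') share that space, so the middle one is skipped:

```sql
SELECT REGEXP_REPLACE('1234567', '([0-9]{2})$', '-\1')     AS dash_before_last2,  -- '12345-67'
       REGEXP_REPLACE('abcXYZdef', 'XYZ')                  AS remove_substring,   -- 'abcdef'
       -- collapse runs of commas, then strip a leading or trailing comma
       REGEXP_REPLACE(REGEXP_REPLACE('a,,b,,,c,', ',{2,}', ','),
                      '^,|,$')                             AS tidy_commas,        -- 'a,b,c'
       REGEXP_REPLACE('order12345', '[0-9]+$')             AS trim_trailing_dig   -- 'order'
FROM   dual;

SELECT REGEXP_SUBSTR('the quick XXX fox', '([[:alnum:]]+) +XXX',
                     1, 1, 'c', 1)                         AS word_before_xxx,    -- 'quick'
       REGEXP_REPLACE('cat catalog cat',
                      '(^|[^[:alnum:]_])cat([^[:alnum:]_]|$)',
                      '\1dog\2')                           AS whole_word          -- 'dog catalog dog'
FROM   dual;
```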
    • A few other points
        When not to use regular expressions: if the same thing can be done without them and without too much coding.
        POSIX classes need double brackets: [[:upper:]]. [:upper:] won't work, and [[:UPPER:]] won't work either; the class name has to be in lowercase letters.
        Locale support is provided with collation elements [[.ce.]] and equivalence classes [[=e=]].
        MySQL supports regular expressions with RLIKE.
    • References Oracle® Database Advanced Application Developers Guide (http://download.oracle.com/docs/cd/E11882_0 1/appdev.112/e17125/adfns_regexp.htm#CHDGH BHF) Anti-Patterns in Regular Expressions: http://gennick.com/antiregex.html First Expressions. An article by Jonathan Gennick Oracle Magazine, Sep/Oct 2003. Oracle Regular Expressions Pocket Reference by Gonathan Gennick http://examples.oreilly.com/9780596006013/Re gexPocketData.sql
    • References ... http://www.psoug.org/reference/regexp.html http://download.oracle.com/docs/cd/E11882_01/se rver.112/e10592/conditions007.htm#SQLRF00501 http://www.oracle.com/technology/pub/articles/sat ernos_regexp.html http://www.oracle.com/technology/products/datab ase/application_development/pdf/TWP_Regular_E xpressions.pdf http://asktom.oracle.com/pls/asktom/asktom.searc h?p_string=regexp_
    • References ... http://www.oracle.com/technology/obe/10gr2_db_single/de velop/regexp/regexp_otn.htm http://www.oracle.com/technology/sample_code/tech/pl_sq l/index.html http://forums.oracle.com/forums/thread.jspa?threadID=427 716 http://forums.oracle.com/forums/search.jspa?threadID=&q= regular+expression&objID=f75&dateRange=all&userID=&nu mResults=120&rankBy=9 http://www.oracle.com/technology/sample_code/tech/pl_sq l/index.html http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUE STION_ID:2200894550208#1568589800346862515
    • Q&A devel_oracle@