Python Programming - XI. String Manipulation and Regular Expressions
    Presentation Transcript

    • PYTHON PROGRAMMING Text Processing XI. String Manipulation and Regular Expressions Engr. Ranel O. Padon
    • PYTHON PROGRAMMING TOPICS: I. Introduction to Python Programming; II. Python Basics; III. Controlling the Program Flow; IV. Program Components: Functions, Classes, Packages, and Modules; V. Sequences (List and Tuples), and Dictionaries; VI. Object-Based Programming: Classes and Objects; VII. Customizing Classes and Operator Overloading; VIII. Object-Oriented Programming: Inheritance and Polymorphism; IX. Randomization Algorithms; X. Exception Handling and Assertions; XI. String Manipulation and Regular Expressions; XII. File Handling and Processing; XIII. GUI Programming Using Tkinter
    • Text Processing: String Manipulation, Regular Expressions
    • TEXT PROCESSING * used to develop text editors, word processors, page-layout software, computerized typesetting systems, and other text-processing software * used to search for patterns in text * used to validate user input * used to process the contents of text files
    • STRING MANIPULATION Strings are made up of Characters. Characters are made up of: Digits (0, 1, 2, …, 9) Letters (a, b, c, …, z) Symbols (@, *, #, $, %, &, …)
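To make the digit / letter / symbol split concrete, here is a minimal sketch that sorts the characters of a string into those three groups using built-in `str` predicates. The sample string is hypothetical and not taken from the slides.

```python
# Hypothetical sample string, chosen only for illustration.
sample = "Room 42 @ 5%"

# Classify each character with built-in str predicates.
digits = [ch for ch in sample if ch.isdigit()]
letters = [ch for ch in sample if ch.isalpha()]
# Treat anything that is neither alphanumeric nor whitespace as a symbol.
symbols = [ch for ch in sample if not ch.isalnum() and not ch.isspace()]

print("digits: ", digits)
print("letters:", letters)
print("symbols:", symbols)
```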
    • String Methods
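The slides that listed the string methods did not survive in this transcript. As a stand-in, the sketch below exercises a few commonly used built-in `str` methods. The choice of methods and the sample text are assumptions, not a reconstruction of the original slides.

```python
# Hypothetical sample text, not from the original slides.
text = "  Python is fun, and Python is powerful.  "

stripped = text.strip()                  # drop leading/trailing whitespace
first = stripped.find("Python")          # index of the first occurrence, or -1
last = stripped.rfind("Python")          # index of the last occurrence, or -1
count = stripped.count("Python")         # number of non-overlapping occurrences
replaced = stripped.replace("Python", "Regex")
words = stripped.split()                 # split on runs of whitespace
joined = "-".join(words[:3])             # glue pieces back together

print(first, last, count)
print(replaced)
print(joined)
print(stripped.upper())
print(stripped.startswith("Python"), stripped.endswith("."))
```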
    • STRING MANIPULATION | Samples
    • REGULAR EXPRESSIONS To test whether a string contains a day of the week, a program must check for “Monday”, “Tuesday”, and so on. With string methods, that means calling find() seven times; a regular expression handles it in a single pattern.
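The day-of-week check can be sketched both ways. The function names and the compact pattern below are illustrative choices, not taken from the slides.

```python
import re

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def has_day_with_find(text):
    """String-method approach: one find() call per day name."""
    return any(text.find(day) != -1 for day in DAYS)

# Regex approach: one pattern covers all seven names.
# (?:...) groups the alternatives without capturing them.
DAY_PATTERN = re.compile(r"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")

def has_day_with_regex(text):
    """Regex approach: a single search over the text."""
    return DAY_PATTERN.search(text) is not None

print(has_day_with_find("See you on Friday"))
print(has_day_with_regex("See you on Friday"))
print(has_day_with_regex("See you soon"))
```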
    • REGULAR EXPRESSIONS * use string methods for simple text processing * string methods are more readable and simpler than regular expressions
    • REGULAR EXPRESSION: a text pattern that a program uses to find substrings matching a required pattern; an expression that specifies a set of strings; a pattern-matching mechanism; also known as a regex; introduced in the 1950s as part of formal language theory
    • REGULAR EXPRESSIONS Very powerful: hundreds of lines of code can often be reduced to a single, elegant regular expression. Used to construct compilers, interpreters, text editors, …; to search for and match text patterns; and to validate text data formats, especially input data.
    • REGULAR EXPRESSIONS Popular programming languages have RegEx capabilities: Perl, JavaScript, PHP, Python, Ruby, Tcl, Java, C, C++, C#, .NET, …
    • REGEX | General Concepts: Alternative; Grouping; Quantification; Anchors; Meta-characters; Character Classes
    • REGEX | General Concepts: Alternative: |; Grouping: ( ); Quantification: ? + * {m,n}; Anchors: ^ $; Meta-characters: . [ ] [-] [^ ]; Character Classes: \w \d \s \W …
    • REGEX | Alternative “ranel|ranilio” == “ranel” or “ranilio” “gray|grey” == “gray” or “grey”
    • REGEX | Grouping “ran(el|ilio)” == “ranel” or “ranilio” “gr(a|e)y” == “gray” or “grey” “ra(mil|n(ny|el))” == “ramil” or “ranny” or “ranel”
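The alternation and grouping examples above can be checked with Python's `re.fullmatch`, which succeeds only when the whole string matches. The helper name `matches` and the rejected candidates ("groy", "ranil") are illustrative additions.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

# Alternation: either whole word.
print(matches(r"gray|grey", ["gray", "grey", "groy"]))

# Grouping: the alternatives are limited to the part inside ( ).
print(matches(r"gr(a|e)y", ["gray", "grey", "groy"]))

# Groups can be nested.
print(matches(r"ra(mil|n(ny|el))", ["ramil", "ranny", "ranel", "ranil"]))
```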
    • REGEX | Quantification | ? ? == zero or one of the preceding element “rani?el” == “raniel” or “ranel” “colou?r” == “colour” or “color”
    • REGEX | Quantification | * * == zero or more of the preceding element “goo*gle” == “gogle” or “google” or “gooooogle” “(ha)*” == “” or “ha” or “haha” or “hahahahaha” “12*3” == “13” or “1223” or “12223”
    • REGEX | Quantification | + + == one or more of the preceding element “goo+gle” == “google” or “gooogle” or “gooooogle” “(ha)+” == “ha” or “haha” or “hahahahaha” “12+3” == “123” or “1223” or “12223”
    • REGEX | Quantification | {m,n} {m,n} == m to n repetitions of the preceding element (write the bounds without spaces; Python's re module treats “{2, 3}” as literal text, not a quantifier) “go{2,3}gle” == “google” or “gooogle” “6{3,6}” == “666” or “6666” or “66666” or “666666” “5{3}” == “555” “a{2,}” == “aa” or “aaa” or “aaaa” or “aaaaa” …
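The four quantifier forms can be compared side by side. This sketch uses `re.fullmatch` so that a pattern is accepted only when it covers the entire candidate string. The helper `full` and the rejected candidates are illustrative.

```python
import re

def full(pattern, s):
    """True when the pattern matches the entire string."""
    return re.fullmatch(pattern, s) is not None

cases = {
    r"colou?r":    ["color", "colour", "colouur"],                # ? : zero or one
    r"goo*gle":    ["gogle", "google", "ggle"],                   # * : zero or more
    r"goo+gle":    ["gogle", "google", "gooooogle"],              # + : one or more
    r"go{2,3}gle": ["gogle", "google", "gooogle", "goooogle"],    # {m,n}
    r"5{3}":       ["55", "555"],                                 # exactly m
    r"a{2,}":      ["a", "aa", "aaaaa"],                          # m or more
}

for pattern, candidates in cases.items():
    accepted = [s for s in candidates if full(pattern, s)]
    print(pattern, "->", accepted)
```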
    • REGEX | Anchors | ^ ^ == matches the starting position within the string “^laman” == “lamang” or “lamang-loob” or “lamang-lupa” “^2013” == “2013”, “2013-12345”, “2013/1320”
    • REGEX | Anchors | $ $ == matches the ending position within the string “laman$” == “halaman” or “kaalaman” “2013$” == “2013”, “777-2013”, “0933-445-2013”
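A short sketch of both anchors, using `re.search` so that the anchor (rather than the function) decides where the match may occur. The word list reuses the examples above, plus "laman" itself as an extra case.

```python
import re

starts_with_laman = re.compile(r"^laman")   # must match at the beginning
ends_with_laman = re.compile(r"laman$")     # must match at the end

words = ["lamang", "lamang-loob", "halaman", "kaalaman", "laman"]

at_start = [w for w in words if starts_with_laman.search(w)]
at_end = [w for w in words if ends_with_laman.search(w)]

print("^laman:", at_start)
print("laman$:", at_end)
```

A word such as "laman" satisfies both anchors, because the match begins at the start of the string and finishes at its end.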
    • REGEX | Meta-characters | . . == matches any single character (by default, any character except a newline) “ala.” == “alat” or “alas” or “ala2” (but not “ala”, which has no fourth character) “1.3” == “123” or “143” or “1s3”
    • REGEX | Meta-characters | [ ] [ ] == matches a single character that is contained within the brackets. “[abc]” == “a” or “b” or “c” “[aoieu]” == any vowel “[0123456789]” == any digit
    • REGEX | Meta-characters | [ - ] [ - ] == matches a single character that is contained within the brackets and the specified range. “[a-c]” == “a” or “b” or “c” “[a-z]” == all alphabet letters (lowercase only) “[a-zA-Z]” == all letters (lowercase & uppercase) “[0-9]” == all digits
    • REGEX | Meta-characters | [^ ] [^ ] == matches a single character that is not contained within the brackets. “[^aeiou]” == any character that is not a lowercase vowel (consonants, digits, symbols, …) “[^0-9]” == any non-digit “[^abc]” == any character, but not “a”, “b”, or “c”
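The dot and the three bracket forms can be tried with `re.findall`, which returns every non-overlapping match. The input strings below are made up for illustration.

```python
import re

# "." needs exactly one character, so the bare "ala" at the end is skipped.
dot_hits = re.findall(r"ala.", "alat alas ala2 ala")

# [ ] : any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")

# [ - ] : any one character in the range.
a_to_c = re.findall(r"[a-c]", "abcdef")

# [^ ] : any one character NOT listed.
non_digits = re.findall(r"[^0-9]", "a1b2")

print(dot_hits)
print(vowels)
print(a_to_c)
print(non_digits)
```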
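    • REGEX | Character Classes Character classes specify a group of characters to match in a string, for example \d (digits), \w (word characters: letters, digits, and underscore), \s (whitespace), and their negations \D, \W, and \S.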
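The slide's table of character classes did not survive in this transcript. The sketch below shows the standard shorthand classes of Python's `re` module on a made-up sentence.

```python
import re

# Hypothetical sample text.
text = "Order #42 ships on 2013-07-15"

numbers = re.findall(r"\d+", text)       # runs of digits
words = re.findall(r"\w+", text)         # runs of letters, digits, underscore
pieces = re.split(r"\s+", text)          # split on runs of whitespace
non_word = re.findall(r"\W", "a-b c")    # single non-word characters

print(numbers)
print(words)
print(pieces)
print(non_word)
```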
    • REGEX | Summary: Alternative: |; Grouping: ( ); Quantification: ? + * {m,n}; Anchors: ^ $; Meta-characters: . [ ] [-] [^ ]; Character Classes: \w \d \s \W …
    • REGEX | Combo
    • REGEX | Date Validation “1/3/2013” or “24/2/2020” (\d{1,2}/\d{1,2}/\d{4})
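A minimal sketch of the date pattern in Python, assuming the whole input must be a date (hence `fullmatch`). The function name `looks_like_date` is hypothetical. The pattern checks only the shape of the text, so impossible dates such as 99/99/9999 still pass.

```python
import re

DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")

def looks_like_date(text):
    """True when text has the shape d/m/yyyy (format check only)."""
    return DATE_PATTERN.fullmatch(text) is not None

for candidate in ["1/3/2013", "24/2/2020", "2013/1/3", "99/99/9999"]:
    print(candidate, looks_like_date(candidate))
```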
    • REGEX | Alphanumeric, -, & _ “rr2000” or “ranel_padon” or “Oblan-Padon” ([a-zA-Z0-9-_]+)
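A sketch of the same check in Python. The hyphen is moved to the end of the bracket set here, which is an equivalent and slightly less ambiguous spelling of the slide's pattern. The function name and the rejected inputs are illustrative.

```python
import re

# Letters, digits, underscore, and hyphen; hyphen last so it is clearly literal.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")

def is_simple_name(text):
    """True when text consists only of letters, digits, '-' and '_'."""
    return NAME_PATTERN.fullmatch(text) is not None

for candidate in ["rr2000", "ranel_padon", "Oblan-Padon", "ranel padon", ""]:
    print(repr(candidate), is_simple_name(candidate))
```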
    • REGEX | Numbers in 1 to 50 “1” or “50” or “14” (^[1-9]{1}$|^[1-4]{1}[0-9]{1}$|^50$)
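The three alternatives cover 1 to 9, 10 to 49, and 50. One way to gain confidence in such a pattern is to test it against every integer in a wider range, as in this sketch.

```python
import re

ONE_TO_FIFTY = re.compile(r"^[1-9]{1}$|^[1-4]{1}[0-9]{1}$|^50$")

# Try every integer from 0 to 100 and keep the ones the pattern accepts.
accepted = [n for n in range(0, 101) if ONE_TO_FIFTY.search(str(n))]

print(accepted[0], accepted[-1], len(accepted))
```

The `{1}` quantifiers are redundant (a single element already matches exactly once), so `^[1-9]$|^[1-4][0-9]$|^50$` behaves the same way.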
    • REGEX | HTML Tags “<title>” or “<strong>” or “</body>” (<(/?[^>]+)>)
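A sketch of the tag pattern with `re.findall`. The outer pair of parentheses from the slide is dropped so that `findall` returns just the captured tag names. The HTML snippet is made up. A pattern like this is fine for quick extraction, but it is not a substitute for a real HTML parser.

```python
import re

# Hypothetical HTML fragment.
html = "<title>Regex</title><body><strong>bold</strong></body>"

# One capturing group: the tag content between < and >, with optional "/".
tags = re.findall(r"<(/?[^>]+)>", html)

print(tags)
```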
    • PYTHON REGEX | Raw String
    • PYTHON REGEX | Raw String r Two Solutions:
    • PYTHON REGEX | Raw String r Raw Strings are used for enhancing readability.
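The slide content for raw strings did not survive in this transcript. A reasonable reading, and only an assumption, is that the "two solutions" are doubling every backslash or prefixing the literal with `r`. The sketch below shows that both spell the same pattern, and what goes wrong when neither is used.

```python
import re

# Two ways to write the regex \d{4} as a Python string literal.
escaped = "\\d{4}"    # solution 1: escape the backslash
raw = r"\d{4}"        # solution 2: raw string, backslashes kept as typed

print(escaped == raw)
print(re.search(raw, "Year 2013").group())

# Why it matters: in a normal string "\b" is a backspace character,
# while in a raw string r"\b" reaches the regex engine as a word boundary.
sentence = "cat concat cat."
with_raw = re.findall(r"\bcat\b", sentence)
without_raw = re.findall("\bcat\b", sentence)

print(with_raw)
print(without_raw)
```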
    • PYTHON REGEX | The re Module
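The slide body for the `re` module is missing from this transcript. As a stand-in, here is a minimal sketch of four core calls: `re.compile`, `search`, `match`, and `findall`. The phone-number text and pattern are hypothetical.

```python
import re

# Hypothetical sample text.
text = "Call 555-1234 or 555-9876 before Friday"

# compile() builds a reusable pattern object.
phone = re.compile(r"\d{3}-\d{4}")

first = phone.search(text)        # first match anywhere in the text
at_start = phone.match(text)      # match only at the very beginning
all_numbers = phone.findall(text) # every non-overlapping match

print(first.group(), first.start())
print(at_start)
print(all_numbers)
```

`match` returns `None` here because the text starts with "Call", not with digits; `search` is the usual choice when the pattern may appear anywhere.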
    • PYTHON REGEX | Samples
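The sample slides did not survive in this transcript. The following sketch is not a reconstruction of them; it shows three other everyday `re` operations (capturing groups, `re.sub`, and `re.split`) on made-up inputs.

```python
import re

# Capturing groups pull out the parts of a match.
found = re.search(r"(\d{1,2})/(\d{1,2})/(\d{4})", "Due: 24/2/2020")
day, month, year = found.groups()

# sub() replaces every match; \1, \2, \3 refer back to the groups.
reordered = re.sub(r"(\d{1,2})/(\d{1,2})/(\d{4})", r"\3-\2-\1", "Due: 24/2/2020")

# sub() can also normalize whitespace.
squeezed = re.sub(r"\s+", " ", "too    many   spaces")

# split() cuts the text wherever the pattern matches.
fields = re.split(r"[,;]\s*", "a, b;c,  d")

print(day, month, year)
print(reordered)
print(squeezed)
print(fields)
```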
    • REFERENCES Deitel, Deitel, Liperi, and Wiedermann - Python: How to Program (2001). Disclaimer: Most of the images/information used here have no proper source citation, and I do not claim ownership of these either. I don’t want to reinvent the wheel, and I just want to reuse and reintegrate materials that I think are useful or cool, then present them in another light, form, or perspective. Moreover, the images/information here are mainly used for illustration/educational purposes only, in the spirit of openness of data, spreading light, and empowering people with knowledge.