Python Programming - XI. String Manipulation and Regular Expressions
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Python Programming - XI. String Manipulation and Regular Expressions

on

  • 1,048 views

 

Statistics

Views

Total Views
1,048
Views on SlideShare
1,046
Embed Views
2

Actions

Likes
3
Downloads
66
Comments
0

1 Embed 2

http://www.slideee.com 2

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Python Programming - XI. String Manipulation and Regular Expressions Presentation Transcript

  • 1. PYTHON PROGRAMMING Text Processing XI. String Manipulation and Regular Expressions Engr. Ranel O. Padon
  • 2. PYTHON PROGRAMMING TOPICS I • Introduction to Python Programming II • Python Basics III • Controlling the Program Flow IV • Program Components: Functions, Classes, Packages, and Modules V • Sequences (List and Tuples), and Dictionaries VI • Object-Based Programming: Classes and Objects VII • Customizing Classes and Operator Overloading VIII • Object-Oriented Programming: Inheritance and Polymorphism IX • Randomization Algorithms X • Exception Handling and Assertions XI • String Manipulation and Regular Expressions XII • File Handling and Processing XIII • GUI Programming Using Tkinter
  • 3. Text Processing String Manipulation Regular Expressions
  • 4. TEXT PROCESSING * used to develop text editors, word processors, page-layout soft-ware, computerized typesetting systems, and other text-processing software * used to search for patterns in text * used to validate user-inputs * used to process the contents of text files
  • 5. STRING MANIPULATION Strings are made up of Characters. Characters are made up of: Digits (0, 1, 2, …, 9) Letters (a, b, c, …, z) Symbols (@, *, #, $, %, &, …)
  • 6. String Methods
  • 7. String Methods
  • 8. String Methods
  • 9. String Methods
  • 10. String Methods
  • 11. STRING MANIPULATION | Samples
  • 12. STRING MANIPULATION | Samples
  • 13. STRING MANIPULATION | Samples
  • 14. STRING MANIPULATION | Samples
  • 15. STRING MANIPULATION | Samples
  • 16. STRING MANIPULATION | Samples
  • 17. STRING MANIPULATION | Samples
  • 18. STRING MANIPULATION | Samples
  • 19. STRING MANIPULATION | Samples
  • 20. STRING MANIPULATION | Samples
  • 21. REGULAR EXPRESSIONS to test if a certain string contains a day of a week, it has to test if it contains “Monday,” “Tuesday”, and so on. you will need to use the find() method seven times but, it could be solved elegantly by Regular Expressions
  • 22. REGULAR EXPRESSIONS * use string methods for simple text processing * string methods are more readable and simpler than regular expressions
  • 23. REGULAR EXPRESSION text pattern that a program uses to find substrings that will match the required pattern expression that specify a set of strings a pattern matching mechanism also known as Regex introduced in the 1950s as part of formal language theory
  • 24. REGULAR EXPRESSIONS very powerful! hundreds of code could be reduced to a one-liner elegant regular expression. used to construct compilers, interpreters, text editors, … used to search & match text patterns used to validate text data formats especially input data
  • 25. REGULAR EXPRESSIONS Popular programming languages have RegEx capabilities: Perl, JavaScript, PHP, Python, Ruby, Tcl, Java, C, C++, C#, .Net, Ruby, …
  • 26. REGEX Popular programming languages have RegEx capabilities: Perl, JavaScript, PHP, Python, Ruby, Tcl, Java, C, C++, C#, .Net, Ruby, …
  • 27. REGEX | General Concepts  Alternative  Grouping  Quantification  Anchors  Meta-characters  Character Classes
  • 28. REGEX | General Concepts  Alternative: |  Grouping: ()  Quantification: ? + * {m,n}  Anchors: ^$  Meta-characters: . [ ] [-] [^ ]  Character Classes: w d s W …
  • 29. REGEX | Alternative “ranel|ranilio” == “ranel” or “ranilio” “gray|grey” == “gray” or “grey”
  • 30. REGEX | Grouping “ran(el|ilio)” == “ranel” or “ranilio” “gr(a|e)y” == “gray” or “grey” “ra(mil|n(ny|el))” == “ramil” or “ranny” or “ranel”
  • 31. REGEX | Quantification | ? ? == zero or one of the preceding element “rani?el” == “raniel” or “ranel” “colou?r” == “colour” or “color”
  • 32. REGEX | Quantification | * * == zero or more of the preceding element “goo*gle” == “gogle” or “google” or “gooooogle” “(ha)*” == “” or “ha” or “haha” or “hahahahaha” “12*3” == “13” or “1223” or “12223”
  • 33. REGEX | Quantification | + + == one or more of the preceding element “goo+gle” == “google” or “gooogle” or “gooooogle” “(ha)+” == “ha” or “haha” or “hahahahaha” “12+3” == “123” or “1223” or “12223”
  • 34. REGEX | Quantification | {m,n} {m, n} == m to n times of the preceding element “go{2, 3}gle” == “google” or “gooogle” “6{3, 6}” == “666” or “6666” or “66666” or “666666” “5{3}” == “555” “a{2,}” == “aa” or “aaa” or “aaaa” or “aaaaa” …
  • 35. REGEX | Anchors | ^ ^ == matches the starting position within the string “^laman” == “lamang” or “lamang-loob” or “lamang-lupa” “^2013” == “2013”, “2013-12345”, “2013/1320”
  • 36. REGEX | Anchors | $ $ == matches the ending position within the string “laman$” == “halaman” or “kaalaman” “2013$” == “2013”, “777-2013”, “0933-445-2013”
  • 37. REGEX | Meta-characters | . . == matches any single character “ala.” == “ala” or “alat” or “alas” or “ala2” “1.3” == “123” or “143” or “1s3”
  • 38. REGEX | Meta-characters | [ ] [ ] == matches a single character that is contained within the brackets. “[abc]” == “a” or “b” or “c” “[aoieu]” == any vowel “[0123456789]” == any digit
  • 39. REGEX | Meta-characters | [ - ] [ - ] == matches a single character that is contained within the brackets and the specified range. “[a-c]” == “a” or “b” or “c” “[a-z]” == all alphabet letters (lowercase only) “[a-zA-Z]” == all letters (lowercase & uppercase) “[0-9]” == all digits
  • 40. REGEX | Meta-characters | [^ ] [^ ] == matches a single character that is not contained within the brackets. “[^aeiou]” == any non-vowel “[^0-9]” == any non-digit “[^abc]” == any character, but not “a”, “b”, or “c”
  • 41. REGEX | Character Classes Character classes specifies a group of characters to match in a string
  • 42. REGEX | Summary  Alternative: |  Grouping: ()  Quantification: ? + * {m,n}  Anchors: ^$  Meta-characters: . [ ] [-] [^ ]  Character Classes: w d s W …
  • 43. REGEX | Combo
  • 44. REGEX | Date Validation “1/3/2013” or “24/2/2020” (d{1,2}/d{1,2}/d{4})
  • 45. REGEX | Alphanumeric, -, & _ “rr2000” or “ranel_padon” or “Oblan-Padon” ([a-zA-Z0-9-_]+)
  • 46. REGEX | Numbers in 1 to 50 “1” or “50” or “14” (^[1-9]{1}$|^[1-4]{1}[0-9]{1}$|^50$)
  • 47. REGEX | HTML Tags “<title>” or “<strong>” or “/body” (<(/?[^>]+)>)
  • 48. PYTHON REGEX | Raw String
  • 49. PYTHON REGEX | Raw String r Two Solutions:
  • 50. PYTHON REGEX | Raw String r Raw Strings are used for enhancing readability.
  • 51. PYTHON REGEX | Raw String
  • 52. PYTHON REGEX | The re Module
  • 53. PYTHON REGEX | Samples
  • 54. PYTHON REGEX | Samples
  • 55. PYTHON REGEX | Samples
  • 56. PYTHON REGEX | Samples
  • 57. PYTHON REGEX | Samples
  • 58. PYTHON REGEX | Samples
  • 59. PYTHON REGEX | Samples
  • 60. PYTHON REGEX | Samples
  • 61. PYTHON REGEX | Samples
  • 62. REFERENCES  Deitel, Deitel, Liperi, and Wiedermann - Python: How to Program (2001).  Disclaimer: Most of the images/information used here have no proper source citation, and I do not claim ownership of these either. I don’t want to reinvent the wheel, and I just want to reuse and reintegrate materials that I think are useful or cool, then present them in another light, form, or perspective. Moreover, the images/information here are mainly used for illustration/educational purposes only, in the spirit of openness of data, spreading light, and empowering people with knowledge. 