Extracting data from text documents using the regex
 

Slide Deck from RegEx meets .Net at Silicon Valley Code Camp 2011


    • Extracting Data from Text Documents using the Regex Class Steve Mylroie
    • Bio Steve Mylroie
      • Current Status
        • Semi Retired – 1099 Consultant (Microsoft Stack)
      • Baynet Roles:
        • Co-Chair South Bay Chapter, Treasurer, Board Member
      • Employment History – 40+ years
        • Semiconductor Industry
          • Signetics, NV Philips, Monolithic Memories, AMD, KLA-Tencor, Promise System (Samsung)
          • Process Development, TCAD, Metrology Tools, Factory Management Software, Shop Floor Control systems
        • Medical Startups
          • QuickSilver Systems Lummisys (Ultrasound Image Management) (NT)
          • 5 Degree Bios (Cancer Treatment Planning) (DotNetNuke)
        • Education
          • BSEE, U of W; MS and PhD EE, Stanford
    • Roadmap
      • The BAYNET Application (Problem Statement)
        • Email Delivery Error Reports (RFCs 821, 1893, 2034, 3464)
      • What is Regex
      • The .Net Implementation
      • Regex syntax (brief)
      • Code
        • SMTPExtendedStatusMessage
        • SMTPDiagnosticMessage
        • SMTPDeliveryErrorReport
        • Main
      • Demo
      • Regex vs String Class (Time Permitting)
    • SMTP Email Delivery Failure Reports
      • Baynet was receiving around 1,050 of these files with every meeting announcement posting.
      • Need to Automate Analysis
      • Needed to transfer error information to a Database for reporting, analysis and correction
    • Email Error Reports
      • Returned in the body of a textual Delivery Status Notification prefixed to the original message, which is returned to the sender
      • RFC 821, Section 4: 3-digit reply codes; 4XX and 5XX are error replies
      • RFC 1893, RFC 2034, RFC 3464: Extended Error Codes
        • three dot-separated fields
          • Class field: one digit (2, 4 or 5)
          • Subject field: one to three digits
          • Detail field: one to three digits
          • C.SSS.DDD
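
The C.SSS.DDD layout above maps directly onto three named groups. The following is a minimal sketch, not the code from the talk: the class name `StatusCodeSketch`, the method `Describe`, and the sample lines are assumptions made for illustration. The pattern is deliberately loose and would also match a look-alike such as the tail of an IP address.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StatusCodeSketch
{
    // class "." subject "." detail: class is 2, 4 or 5; subject and detail are 1 to 3 digits.
    private static readonly Regex EnhancedCode = new Regex(
        @"\b(?<class>[245])\.(?<subject>\d{1,3})\.(?<detail>\d{1,3})\b");

    // Returns "class/subject/detail", or null when the line holds no enhanced status code.
    public static string Describe(string line)
    {
        Match m = EnhancedCode.Match(line);
        if (!m.Success)
        {
            return null;
        }
        return m.Groups["class"].Value + "/" +
               m.Groups["subject"].Value + "/" +
               m.Groups["detail"].Value;
    }

    public static void Main()
    {
        // Hypothetical sample lines, shaped like those found in delivery reports.
        Console.WriteLine(Describe("Status: 5.1.1"));
        Console.WriteLine(Describe("550 5.1.1 User unknown"));
    }
}
```

In the second sample the three-digit reply code 550 is skipped, because the class digit must be followed immediately by a dot.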
    • What is Regex
      • Text parsing and Text Replacement Utility based on a Pattern matching syntax
        • Originally a UNIX shell utility
        • Today there are UNIX and Linux shell utilities; C++ and Java libraries; JavaScript, PHP, Ruby, Python, Perl, PowerShell, VB 6, MySQL, Oracle, PostgreSQL, awk, and VBScript implementations, in addition to a .NET version
        • Web sites devoted to Regex
          • http://www.regular-expressions.info/
          • http://www.regexlib.com/
          • http://regexlib.com/CheatSheet.aspx
        • Regex documentation in Visual Studio Help
          • http://msdn.microsoft.com/en-us/library/az24scfc(VS.90).aspx
          • http://msdn.microsoft.com/en-us/library/az24scfc.aspx
    • .Net Implementation
      • Class Regex (System.dll)
      • Namespace System.Text.RegularExpressions
      • Static and Dynamic (instance) Implementation
      • Static Implementation Public Methods
        • IsMatch, Match, Matches, Replace, Split, Escape, Unescape, CompileToAssembly
      • Static Implementation Public Property
        • CacheSize (Default size 15)
      • Versions
        • .NET 1.0+, Compact Framework 1.0+, Silverlight, XNA 1.0
        • (Perl 5 compatibility or ECMAScript compatibility)
    • .Net Implementation (Cont)
      • Constructors
        • Regex(String pattern)
        • Regex(String pattern, RegexOptions options)
      • Dynamic Version
        • Same methods, more parameter options
        • Two Added Properties (Options, RightToLeft)
        • Compiled but not cached by default
      • Most Options can also be set using inline elements in the pattern string
      • BackReferences
      • System.Web.RegularExpressions
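
To make the static, instance, and inline-option forms concrete, here is a small sketch. The class `RegexUsageSketch`, its method names, and the header pattern are hypothetical; only the `Regex` members shown (`IsMatch`, the two-argument constructor, `RegexOptions.IgnoreCase`) come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUsageSketch
{
    // Hypothetical header pattern, used only for illustration.
    private const string HeaderPattern = "^final-recipient:";

    // Instance form: options are passed to the constructor as a RegexOptions value.
    private static readonly Regex Header =
        new Regex(HeaderPattern, RegexOptions.IgnoreCase);

    // Static form: no Regex object to manage; the parsed pattern goes into the static cache.
    public static bool ViaStatic(string line)
    {
        return Regex.IsMatch(line, HeaderPattern, RegexOptions.IgnoreCase);
    }

    public static bool ViaInstance(string line)
    {
        return Header.IsMatch(line);
    }

    // Inline form: (?i) switches on case-insensitive matching inside the pattern itself.
    public static bool ViaInline(string line)
    {
        return Regex.IsMatch(line, "(?i)" + HeaderPattern);
    }

    // Back reference: \1 must repeat whatever group 1 captured (here, a doubled word).
    public static bool HasDoubledWord(string line)
    {
        return Regex.IsMatch(line, @"\b(\w+)\s+\1\b");
    }

    public static void Main()
    {
        string line = "Final-Recipient: rfc822; someone@example.com";
        Console.WriteLine(ViaStatic(line));
        Console.WriteLine(ViaInstance(line));
        Console.WriteLine(ViaInline(line));
        Console.WriteLine(HasDoubledWord("mail to the the user"));
    }
}
```

When the same pattern is applied to many lines, as in a batch of error reports, the instance form avoids a cache lookup on every call.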
    • Other Objects In The Namespace
      • Match Object
      • Matches Collection
      • Group Object
      • Groups Collection
      • Captured Group Object
      • Capture Collection
      • Individual Capture
      • Regular Expression Engine
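
The objects listed above nest: a `MatchCollection` holds `Match` objects, each `Match` exposes `Groups`, and each `Group` exposes `Captures`. The sketch below walks that hierarchy. The `ObjectModelSketch` class, the `CollectCodes` method, and the "codes:" input format are assumptions chosen to show a group that captures more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ObjectModelSketch
{
    // Collects every capture of the repeated "code" group across all matches.
    public static List<string> CollectCodes(string text)
    {
        List<string> found = new List<string>();
        MatchCollection matches = Regex.Matches(text, @"codes:(?:\s(?<code>\d{3}))+");
        foreach (Match match in matches)
        {
            // Group.Value holds only the last capture; Captures holds all of them.
            Group group = match.Groups["code"];
            foreach (Capture capture in group.Captures)
            {
                found.Add(capture.Value);
            }
        }
        return found;
    }

    public static void Main()
    {
        foreach (string code in CollectCodes("codes: 450 550 552"))
        {
            Console.WriteLine(code);
        }
    }
}
```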
    • Regex Pattern Syntax
      • MSDN Web Page containing enumeration of regex expression syntax
      • http://msdn.microsoft.com/en-us/library/az24scfc.aspx
    • Regex Syntax Example
      • U.S. Currency Validator Pattern as C# String
        • @"^\s*[+-]?\s?\$?\s?\d+(\.\d{2})?$"
      • Pattern elements
        • ^ Start at the beginning of the string.
        • \s* Match zero or more white-space characters.
        • [+-]? Match zero or one occurrence of either the positive sign or the negative sign.
        • \s? Match zero or one white-space character.
        • \$? Match zero or one occurrence of the dollar sign.
        • \s? Match zero or one white-space character.
        • \d+ Match one or more decimal digits.
        • (\.\d{2})? Match a decimal point followed by two decimal digits, zero or one time.
        • $ Match the end of the string.
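
A quick way to check a validator pattern like this one is to run it against a few inputs. The sketch below assumes the slide's pattern with its backslash escapes written out in full. The `CurrencySketch` class, the `IsUsCurrency` method, and the sample inputs are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencySketch
{
    // Assumed form of the slide's pattern: optional sign, optional dollar sign,
    // one or more digits, then optionally a decimal point and exactly two digits.
    private const string Pattern = @"^\s*[+-]?\s?\$?\s?\d+(\.\d{2})?$";

    public static bool IsUsCurrency(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        string[] samples = { "$1.00", "-$ 42", " +12.34", "1.5", "$1,000.00", "abc" };
        foreach (string sample in samples)
        {
            Console.WriteLine("'{0}' -> {1}", sample, IsUsCurrency(sample));
        }
    }
}
```

Note that this form rejects thousands separators and a single fractional digit; whether that is acceptable depends on the data being validated.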
    • Database Schema
    • Code Example
      • Enough PowerPoint time; on to some code
    • Demos
      • Demo (10 Error Reports)
      • Query Demos (Full DataSet)