Incremental Evolving Grammar Fragments


Incremental Evolving Grammar Fragments, UK Computational Intelligence Workshop 2008, Leicester UK

  • We deal with diverse sources of data every day: product manuals, technical reports, the media, and communication with friends. New data is generated every minute, and it is outstripping our ability to reuse existing data. When data is requested, the user has to sift through multiple possibilities to locate relevant answers.
  • These are some of the images I found when searching for the keyword "information overload". People are burdened by information overload especially because human understanding of words is generally based on usage, content and experience rather than on formal logical definitions. There is therefore a need to use soft computing techniques to manage this intrinsic uncertainty. We aim to use computational intelligence to extract (or at least approximate) aspects of the human-understandable semantic content.
  • Our approach examines the underlying structure of textual data, which can usually be explained by discovering the grammar behind the data. Texts can be divided into short and long texts, and the approach to discovering their patterns differs according to their language model.
  • We are currently focusing on UK addresses. We do not use a full language model because UK addresses are usually short, loosely structured, and follow multiple patterns.
  • A proper UK address usually has the form number, street name, town and postcode. However, alternative patterns also exist, so the challenge is to learn the variations in the data.
  • Next I would like to describe some of the success stories in grammar learning. These are standard machine learning approaches that can create grammar fragments, but they generally use a two-stage (train/test) approach that requires a fully representative set of training examples, and they cannot easily cope with the addition of a new training example.
  • The genetic algorithm (GA) is a popular machine learning method, and the first part of this research investigated its suitability for the grammar learning task. However, our observations show that this approach is not suitable for incremental learning. A GA naturally fits the common train-then-test machine learning setting, so it is inefficient to rebuild when new data not covered by the training set is fed in.
  • Because of this shortcoming of the GA on our data, we moved to a fuzzy approach. We try to incorporate the notions of uncertainty and redundancy in the address. Figure 2 shows the overlap between grammar terminals and the multiple possible ways of deriving the grammar for an address.
  • The fuzzy approach adopted in this research is loosely inspired by the Levenshtein edit distance, which measures the difference between two sequences by counting the number of insertions, deletions and substitutions needed to make them the same.
  • After the matrix traversal, the marked elements are combined for approximation. Three operators are involved: create, merge and replace. These operators maintain a fixed grammar representation throughout the learning task, namely that each grammar fragment has only a single optional element.
  • The overlap between the source and the target grammar is measured using the Fuzzy Grammar Overlap function. The overlap matrix is then traversed to detect the changes needed to transform the source grammar into the target grammar. This incremental technique cumulatively learns each new example without sacrificing the parsing of past examples. The traversal starts from the final cell, and the minimum-cost neighbour of the current cell is identified iteratively. A horizontal or vertical move means that the element is optional, whereas a diagonal move means that the element pair should be saved as a duplicate grammar. If the target grammar element in this duplicate pair is more general than the source element, the pair is generalized.
  • This is an overview of the incremental learning approach. First the grammar for the address is derived. Then the new grammar is approximated to the existing final grammar, which is the target grammar, so that both are parsed equally.
  • In conclusion, the devised algorithm has shown the ability to learn and adapt to new data without sacrificing past data. The approximation operators introduced here avoid the common genetic operators. Future work includes refining the approximation method and moving to other datasets.
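The edit-distance idea mentioned in the notes above can be illustrated with a minimal Python sketch. It is the standard Levenshtein distance applied to any sequence, so it works on characters and on grammar elements alike. It is not the authors' fuzzy overlap cost, whose equations are not reproduced in this transcript, and the function name edit_distance is an assumption made for illustration. The two inputs are the string and grammar examples from Tables 1 and 2 of the slides.

```python
def edit_distance(source, target):
    """Standard Levenshtein distance between two sequences (strings or lists)."""
    n, m = len(source), len(target)
    # previous[j] is the cost of turning the first i-1 source items
    # into the first j target items.
    previous = list(range(m + 1))
    for i in range(1, n + 1):
        current = [i] + [0] * m
        for j in range(1, m + 1):
            # Substitution is free when the two items are equal.
            substitute = previous[j - 1] + (source[i - 1] != target[j - 1])
            delete = previous[j] + 1
            insert = current[j - 1] + 1
            current[j] = min(substitute, delete, insert)
        previous = current
    return previous[m]


# Table 1 example: character sequences.
print(edit_distance("WEDNESDAY", "TUESDAY"))  # 4

# Table 2 example: sequences of grammar elements.
source_grammar = ["Number", "Word", "Word", "Streetending", "Placename"]
target_grammar = ["Number", "Placename", "Streetending", "Placename", "Countyname"]
print(edit_distance(source_grammar, target_grammar))  # 3
```

Treating a grammar as a list of element names is the simplest way to reuse the string algorithm unchanged. A fuzzy variant would replace the equality test with a graded comparison between elements.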

    1. Incremental Evolutionary Grammar Fragments
       Nurfadhlina Mohd Sharef, Trevor Martin, Yun Shen
       Artificial Intelligence Group, University of Bristol, BS8 1TR, UK
       [email_address], [email_address], [email_address]
    2. Outline
       • Background of problem
       • Literature review
       • Shortcoming of evolutionary approach
       • Fuzzy text pattern learning
       • Grammar Approximation
       • Conclusion
    3. Digital Obesity
       • Sources shown: report, website, newspaper, TV news, pamphlet, comic books, brochures, meeting, SMS/MMS
    4. Information Overload?
    5. Text Structure
       • Grammar: the word order governs the message that is to be delivered in the sentences
       • Short vs. long texts
       • A full language model (such as the subject-verb-object approach) is difficult to specify, complex to process, and subject to problem domains.
    6. Learning Text Fragments
       • Shorter sentences
       • Less structured
       • Multiple patterns
       • Do not follow formal grammar rules
       • No need for a complete language model
       • E.g.:
           • dates and times,
           • names of products,
           • names of people,
           • simple sentence forms such as questions, complaints, and news.
    7. Grammars for Postal Address
       • Basic pattern:
           • number, street name, town, postCode
           • '21 London Rd Ipswich Suffolk IP1 2EZ'
       • And others:
           • '29 Meredith Rd Ipswich'
           • number, street name, town
           • 'Belfairs Hotel 33 Graham Rd Ipswich'
           • word, business, number, street name, town
       • The variations of the pattern will probably increase as more data samples are encountered.
       • Address A: 29 Meredith Rd Ipswich
       • Address B: Future House, 31, Mars Ave, Mars
       • A is an address, but is B a valid address? A and B are valid addresses!
    8. Existing Approaches
       • Tagging-based information extraction
       • Document distributions and statistical models
       • Evolutionary genetic algorithms
       • Semantic nets
       • Fuzzy methods
       • These aim to generate grammars that parse a fully defined dataset, and cannot easily cope with the addition of a new training example.
       • Figure 1: Example of information tagging
    9. Genetic Algorithm for Grammar Parsing
       • Goal: generate a grammar that would cover past and new examples
       • Approach: binary trees of non-terminal nodes
           • left branch: T := {word, number, street ending,…}
           • right branch: T ∪ {AND, OR, OPTIONAL}
       • Population setting: groups of grammar files with a varied number of grammar definitions
       • Mating selection (elitist): among files within and between groups, and among grammar elements in and between groups
       • Genetic operators: crossover and mutation
       • Fitness function: measures the ability of the grammar to parse test strings
       • Figure 2: Address Grammar Fragments Binary Tree (example address: 25 acacia avenue)
    10. Result:
       1. Fitness is low although all grammars have converged (average highest score = 0.388, highest score = 0.6)
       2. Effective for grammar building, but requires complete retraining if the initial set of examples is not sufficiently general to create a good classifier.
       • Figure 3: Parsing score of generated grammar groups in generation 0
       • Figure 4: Parsing score of generated grammar groups in generation 32
    11. Fuzzy Approach for Text Pattern Learning
       • To describe a relation between the text and the grammar fragment
       • Represents the membership degree of the grammar belonging to the text.
       • The grammar elements can be terminals as well as fuzzy sets
       • Figure 5: Partial Order Table for UK Address
    12. Grammar Similarity
       • Fuzzy Grammar and Fuzzy Membership
           • Loosely inspired by Levenshtein Edit Distance

       Table 1: Example of string edit distance operation (*I: Insert, D: Delete, S: Substitute)

       Source string    W    E    D    N    E  S  D  A  Y
       Target string    T    U    -    -    E  S  D  A  Y
       Edit distance*   S=1  S=1  D=1  D=1  =  =  =  =  =

       Table 2: Example of Grammar Edit Distance Operation (*I: Insert, D: Delete, S: Substitute)

       Source grammar   Number  Word       Word  Streetending  Placename  -
       Target grammar   Number  Placename  -     Streetending  Placename  Countyname
       Edit distance*   =       S=1        D=1   =             =          I=1
    13. Fuzzy Parsing
       • Fuzzy Membership: measures the parsing degree of a grammar on strings
       • Fuzzy Overlap: Cost_GG(GS, GT) is an estimate of the cost of changing a string parsed by the grammar GS into one parsed by the grammar GT.
       • … (Eq. 1) … (Eq. 2) … (Eq. 3)
       • I: insertion, D: deletion, S: substitution, Rs: remainder in the source, Rt: remainder in the target
    14. Equations (I)
       • S, T: sequences of grammar elements
       • s, t: terminal symbols
       • TS_i and TS_j: (fuzzy) sets of terminal symbols
       • X: any single grammar element
       • Hs, Ht: tags
    15. Equations (II)
       • Same symbol legend as slide 14.
    16. Equations (III)
       • Same symbol legend as slide 14.
    17. Incremental Evolution Strategy
       • Suppose we have a set of positive examples (P).
       • We find the grammar fragment H_max that parses S_p with maximum membership
       • If Cost_GG(S_p, H_max) ≤ (Cost_GG(S_p, H_i))
       • Then we shall incrementally alter H_max or create a new grammar.
       • Cost_GG(S_p, H_max) ≥ max(Cost_GG(H_i, H_max))
    18. Grammar Approximation Operators
       • Create a new rule H_new ::= S_p, where an appropriate substring can be tagged, restricted so that only a single optional element is maintained
           • H_final = [H_i]GH_i+1
       • Merge duplicate grammar definitions that can be generalized, and replace them with a more general fuzzy superset grammar
           • H_i := {g_i, g_i+1,…, g_n}, g_i = moreGeneral(gS, gT)
       • Replace contiguous optional grammar elements with an optional fuzzy grammar
           • [H_new] = {g_i, g_i+1,…, g_n}
    19. Fuzzy Grammar Overlap

       Figure 6: Overlap Matrix. Each cell lists I D S, where I: insertion + remainder of target, D: deletion + remainder of source, S: substitution. Rows are source grammar elements and columns are target grammar elements.

       Source \ Target     Null (col 0)  number (col 1)  anyWord (col 2)  streetend (col 3)  placename (col 4)  postcode (col 5)
       Null (row 0)        0 0 0         1 0 0           2 0 0            3 0 0              4 0 0              5 0 0
       number (row 1)      0 1 0         0 0 0 (E)       1 0 0            2 0 0              3 0 0              5 0 0
       placename (row 2)   0 2 0         0 1 0           0 0 1 (D)        1 0 1              2 0 1              4 0 1
       streetend (row 3)   0 3 0         0 2 0           0 1 1            0 0 1 (C)          1 0 1 (B)          3 0 1 (A)

       • Traversal path: (A) 3 0 1, (B) 1 0 1, (C) 0 0 1, (D) 0 0 1, (E) 0 0 0
       • Approximation Result: X:=placeName, X:=anyWord, G1:=placeName-postCode, Addr:=number-X-streetend-[G1]
       • Approximation Result Generalized: G1:=placeName-postCode, Addr:=number-anyWord-streetend-[G1]
       • Final cost: I=0, D=0, S=0; I=0, D=0, S=1
    20. Grammar Approximation

       Figure 7: Grammar Approximation Example

       • Address: 107 hatfield rd ipswich ip3 9ag
           • Grammar derived from address: number-placeName-streetend-placeName-postCode
           • Approximated grammar: ADDR:=number-placeName-streetend-placeName-postCode
       • Address: 121 sidegate ln ipswich
           • Grammar derived from address: number-anyWord-streetend-placeName
           • Approximated grammar: G1:=streetend-placeName-[postCode], G2:=anyWord, G2:=placeName, ADDR:=number-G2-G1
       • Address: alnesbourne priory club nacton rd ipswich
           • Grammar derived from address: anyWord-anyWord-anyWord-anyWord-streetend-placeName
           • Approximated grammar: G1:=postCode, G2:=streetend-placeName-[G1], G3:=anyWord-anyWord, G4:=anyWord, G4:=number, ADDR:=G4-anyWord-[G3]-G2
    21. Conclusion and Future Work
       • The fuzzy method outperforms the standard genetic techniques for creating fuzzy grammars
       • Highlight: ability to learn new text patterns without sacrificing past data
       • Approximation operators: avoid the common genetic operators
       • Future work: refine the approximation method and test it on other, more softly structured data
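The matrix traversal described in the notes and on the Fuzzy Grammar Overlap slide can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' algorithm: it uses a single scalar cost instead of the (I, D, S) triples of Figure 6, it does not enforce the single-optional constraint, and the generality map PARENT (placeName is covered by anyWord) and the function names more_general and merge_grammars are hypothetical. A diagonal move keeps the more general element of the pair, and a horizontal or vertical move marks the element as optional with square brackets.

```python
INF = float("inf")

# Hypothetical generality map: element -> more general element covering it.
PARENT = {"placeName": "anyWord"}


def more_general(a, b, parent=PARENT):
    """Return the more general of a and b, or None if they are unrelated."""
    if a == b:
        return a
    if parent.get(a) == b:
        return b
    if parent.get(b) == a:
        return a
    return None


def merge_grammars(source, target, parent=PARENT):
    """Align two grammars with an edit-distance matrix and merge them."""

    def sub(a, b):
        # Equal elements are free, related elements cost 1,
        # unrelated elements cannot be paired on the diagonal.
        if a == b:
            return 0
        return 1 if more_general(a, b, parent) is not None else INF

    n, m = len(source), len(target)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub(source[i - 1], target[j - 1]),
                cost[i - 1][j] + 1,  # source element left unmatched
                cost[i][j - 1] + 1,  # target element left unmatched
            )

    # Walk back from the final cell towards the origin.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub(source[i - 1], target[j - 1]):
            merged.append(more_general(source[i - 1], target[j - 1], parent))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            merged.append("[" + source[i - 1] + "]")
            i -= 1
        else:
            merged.append("[" + target[j - 1] + "]")
            j -= 1
    merged.reverse()
    return merged, cost[n][m]


new = ["number", "anyWord", "streetend", "placeName"]
old = ["number", "placeName", "streetend", "placeName", "postCode"]
print(merge_grammars(new, old))
# (['number', 'anyWord', 'streetend', 'placeName', '[postCode]'], 2)
```

The two inputs are the grammars derived from '121 sidegate ln ipswich' and '107 hatfield rd ipswich ip3 9ag' in Figure 7. The merged sequence corresponds to the approximated grammar ADDR:=number-G2-G1 with G2:=anyWord and G1:=streetend-placeName-[postCode] once the named fragments are expanded.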