2008/10/07 Regular meeting



  1. 1. A fuzzy symbolic inference system for postal address component extraction and labelling. P. Nagabhushan, S.A. Angadi, and B.S. Anami. FSKD, 2006. Speaker: Shu-Ying Li. 2008/10/7
  2. 2. Outline <ul><li>Introduction </li></ul><ul><li>Postal mail address component labelling problem </li></ul><ul><li>Symbolic representation </li></ul><ul><li>Fuzzy symbolic inference system </li></ul><ul><li>Results and discussions </li></ul><ul><li>Conclusion </li></ul>
  3. 3. Introduction <ul><li>The task of address component labelling is similar to text/word categorization. </li></ul><ul><li>A symbolic similarity measure is devised for identifying the address component labels using the symbolic representation of the postal address and a symbolic knowledge base. </li></ul><ul><li>The similarity measure acts as a fuzzy membership function, since it gives the approximate nearness to the various possible labels. </li></ul><ul><li>The alpha-cut set is further used in defining a confidence value for the decision made. </li></ul><ul><li>The methodology achieves a labelling accuracy of 94%. </li></ul>
  4. 4. Postal mail address component labelling problem (1/2) <ul><li>The structure of postal addresses is fairly standardized in some countries, but other countries have unstructured postal addresses for which it is hard to devise a standard address format. </li></ul>
  5. 5. Postal mail address component labelling problem (2/2) <ul><li>Addresses may sometimes contain misspellings. </li></ul><ul><li>Not every address contains all the components, and some addresses may contain more than one value for the same component type. </li></ul><ul><li>The address component labelling task is not trivial, particularly when the addresses are unstructured and the labelling has to be based on the address information itself. </li></ul><ul><ul><li>The methodology can be adopted in other countries that have a similar unstructured format. </li></ul></ul>
  6. 6. Symbolic representation: postal address (1/3) <ul><li>Some of the fields of a postal address are qualitative; other fields, such as house number, are quantitative. </li></ul><ul><ul><ul><li>A postal address may not contain all the possible fields. </li></ul></ul></ul><ul><li>Symbolic objects offer a formal methodology to represent such variable information about an entity. </li></ul><ul><ul><ul><li>Assertion object: a conjunction of events pertaining to a given object. </li></ul></ul></ul><ul><ul><ul><li>Hoard object: a collection of one or more assertion objects. </li></ul></ul></ul><ul><ul><ul><li>Synthetic object: a collection of one or more hoard objects. </li></ul></ul></ul><ul><li>An event is a pair that links a feature variable to a feature value. </li></ul>
  7. 7. Symbolic representation: postal address (2/3) <ul><li>The postal address object is described as a hoard object consisting of three assertion objects, namely Addressee, Location and Place, as described below. </li></ul><ul><ul><ul><li>[Addressee]: the name and other personal details. </li></ul></ul></ul><ul><ul><ul><li>[Location]: the geographical position. </li></ul></ul></ul><ul><ul><ul><li>[Place]: the city, town or village. </li></ul></ul></ul><ul><li>The feature variables or postal address fields of the different assertion objects are listed below. </li></ul>
  8. 8. Symbolic representation: postal address (3/3) <ul><li>Each feature describes some aspect of the object, and all the features together completely specify the assertion objects. </li></ul><ul><li>A typical postal address and its representation as a symbolic object are given in Table 1. </li></ul>
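The hoard/assertion/event structure described in slides 6 to 8 can be pictured with a small data-structure sketch. This is only an illustration: the class names, the feature variable names (`name`, `house_number`, `road`, `place_name`) and the sample address are all hypothetical, since the transcript does not reproduce the paper's feature list or Table 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AssertionObject:
    """A conjunction of events; each event links a feature variable to a value."""
    name: str
    events: Dict[str, str] = field(default_factory=dict)


@dataclass
class HoardObject:
    """A collection of one or more assertion objects."""
    name: str
    assertions: List[AssertionObject] = field(default_factory=list)

    def present_features(self) -> List[Tuple[str, str]]:
        # Only the fields actually present in this address are listed,
        # reflecting that an address need not contain every possible field.
        return [(a.name, var) for a in self.assertions for var in a.events]


# Hypothetical address and feature names, for illustration only.
address = HoardObject("PostalAddress", [
    AssertionObject("Addressee", {"name": "A. Kumar"}),
    AssertionObject("Location", {"house_number": "12", "road": "Station Road"}),
    AssertionObject("Place", {"place_name": "Sampletown"}),
])
```

Calling `address.present_features()` lists the (assertion, feature) pairs that this particular address provides, which is the kind of variable information symbolic objects are meant to capture.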
  9. 9. Symbolic representation: knowledge base for address component labelling (1/2) <ul><li>The symbolic knowledge base used in this work provides a systematic approach to address component labelling and improved performance. </li></ul><ul><li>The symbolic knowledge base, AD_COMP_KB, is organized as a synthetic object of three hoard objects. </li></ul><ul><ul><ul><li>Addressee knowledge base: [Addresskb] </li></ul></ul></ul><ul><ul><ul><li>Location knowledge base: [Locationkb] </li></ul></ul></ul><ul><ul><ul><li>Place knowledge base: [Placekb] </li></ul></ul></ul>
  10. 10. Symbolic representation: knowledge base for address component labelling (2/2) <ul><li>The hoard objects are made of assertion objects as detailed in Figure 2. </li></ul><ul><li>All the assertion objects of the symbolic knowledge base have the events described in Figure 3. </li></ul>
  11. 11. Fuzzy symbolic inference system <ul><li>Symbolic similarity measure for address component labelling. </li></ul><ul><li>Fuzzy symbolic methodology for address component labelling. </li></ul>
  12. 12. Symbolic similarity measure for address component labelling <ul><li>The similarity measure is made up of three components: </li></ul><ul><ul><ul><li>Similarity due to position </li></ul></ul></ul><ul><ul><ul><li>Similarity due to content </li></ul></ul></ul><ul><ul><ul><li>Similarity due to span of the two objects being compared. </li></ul></ul></ul><ul><li>The similarity measure gives the similarity of the input component to the various component labels (assertion objects) of the symbolic synthetic object AD_COMP_KB. </li></ul><ul><li>The similarity between the i-th input component (IP_i) and the j-th component label (ct_j) of the knowledge base is computed from the quantities defined on the next slide. </li></ul>
  13. 13. <ul><ul><ul><li>n: the number of available components in the input address. </li></ul></ul></ul><ul><ul><ul><li>m: the number of possible component labels, or assertion objects, in the knowledge base. </li></ul></ul></ul><ul><ul><ul><li>EV: a value of 7, representing the seven events of the assertion objects. </li></ul></ul></ul><ul><ul><ul><li>netsim_k is calculated for each event of the assertion object, using the computations implied in (7) for the first five events to calculate content similarity and (8) for the last two to calculate span and content similarity. </li></ul></ul></ul><ul><ul><ul><li>Interse is the number of words/elements common to the input component and the component label under test. </li></ul></ul></ul><ul><ul><ul><li>Comp_IP is the number of words/elements in the input component. </li></ul></ul></ul><ul><ul><ul><li>Comp_KB is the number of words/elements in the component label (knowledge base) under test. </li></ul></ul></ul><ul><ul><ul><li>Sum_IP_KB = Comp_IP + Comp_KB − Interse. </li></ul></ul></ul>
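The transcript does not reproduce the similarity equations themselves, so the following is only a sketch under two stated assumptions: that each content similarity netsim_k is the ratio Interse / Sum_IP_KB (the definition of Sum_IP_KB as a union size suggests this), and that the overall similarity is the mean of the netsim_k values over the EV = 7 events. The function names are hypothetical, and the paper's equations (7) and (8) may differ.

```python
from typing import Iterable, Sequence


def event_similarity(input_words: Iterable[str], label_words: Iterable[str]) -> float:
    """Assumed content similarity for one event: Interse / Sum_IP_KB."""
    ip = {w.lower() for w in input_words}   # distinct words of the input component
    kb = {w.lower() for w in label_words}   # distinct words of the label under test
    interse = len(ip & kb)                  # Interse: words common to both
    sum_ip_kb = len(ip) + len(kb) - interse # Sum_IP_KB = Comp_IP + Comp_KB - Interse
    return interse / sum_ip_kb if sum_ip_kb else 0.0


def component_similarity(net_sims: Sequence[float], ev: int = 7) -> float:
    """Assumed aggregation: mean of the per-event similarities over EV events."""
    if len(net_sims) != ev:
        raise ValueError("expected one similarity value per event")
    return sum(net_sims) / ev
```

For example, comparing the input words `["Station", "Road"]` with a label entry `["road"]` gives one common word out of two distinct words, so `event_similarity` returns 0.5 under these assumptions.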
  14. 14. Fuzzy symbolic methodology for address component labelling <ul><li>To decide which component class the input component belongs to, a de-fuzzification process is taken up. </li></ul><ul><li>The de-fuzzification is done by defining the fuzzy α-cut set. The α value is calculated from the following quantities. </li></ul><ul><ul><ul><li>S0 is the maximum similarity value obtained for the input component. </li></ul></ul></ul><ul><ul><ul><li>DFC is the de-fuzzification constant and is taken as 0.1. </li></ul></ul></ul><ul><li>The alpha-cut set is obtained from the similarity array by including all the members of the similarity array whose value is greater than α. </li></ul>
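The α-cut step on this slide can be sketched as follows. The equation for α is missing from the transcript, so the sketch assumes α = S0 − DFC, which is one natural reading of the two quantities defined; the paper's actual equation may differ. The function name, label names and similarity values are hypothetical.

```python
from typing import Dict, Tuple


def alpha_cut(similarities: Dict[str, float], dfc: float = 0.1) -> Tuple[float, Dict[str, float]]:
    """Return (alpha, cut set) for one input component.

    Assumption: alpha = S0 - DFC, where S0 is the maximum similarity.
    """
    if not similarities:
        return 0.0, {}
    s0 = max(similarities.values())  # S0: best similarity for this component
    alpha = s0 - dfc                 # assumed form of the alpha equation
    # Keep every label whose similarity is strictly greater than alpha.
    cut = {label: s for label, s in similarities.items() if s > alpha}
    return alpha, cut


# Hypothetical similarity array for a single input component.
alpha, cut = alpha_cut({"road": 0.82, "area": 0.75, "place": 0.40})
```

With these made-up values, the labels `road` and `area` survive the cut while `place` is discarded, so the final decision and its confidence would be made among the surviving labels.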
  15. 15. <ul><li>The confidence of the system in a given component label is evaluated using an equation with the following terms. </li></ul><ul><ul><ul><li>C_{i,j} is the confidence of assigning the j-th component label to the i-th input component. </li></ul></ul></ul><ul><ul><ul><li>n is the number of input components and p is the number of component labels in the α-cut set. </li></ul></ul></ul><ul><ul><ul><li>S_j is the similarity of the i-th input component with the j-th component label in the similarity array. </li></ul></ul></ul>
  16. 16. Results and discussions (1/2)
  17. 17. Results and discussions (2/2)
  18. 18. Conclusion <ul><li>The fuzzy symbolic methodology presented in this paper addresses the extraction and labelling of postal address components. </li></ul><ul><ul><ul><li>It employs symbolic similarity measures and the fuzzy alpha-cut method for labelling address components and deciding on each component's label. </li></ul></ul></ul>