Text Mining Documents in Electronic
   Data Interchange Environment
            Dr. Zakaria Suliman Zubi,
            Associate Professor,
            Computer Science Department,
            Faculty of Science,
            Sirt University,
            Sirt, Libya.



                                  LOGO
Add your company slogan
Contents
    1   Abstract
    2   Introduction
    3   Types of Text Mining
    4   Types of Information and Methods
    5   Methods and Algorithms used
    6   Types of Outputs
    7   Applications of Text Mining in EDI databases
    8   Experimental Results
    9   Conclusion

     Abstract
1.   The Internet is a huge source of electronic text documents
     in many languages.
2.   Electronic documents can be interchanged over the web
     through Electronic Data Interchange (EDI) environments.
3.   Text data can be exchanged over the web in an EDI
     format such as X12.
4.   The EDI format can be transformed and stored in a
     database.
5.   The EDI database is normalized and mapped into a
     flat file, for example a spreadsheet.
6.   Text mining with a clustering method is applied.
7.   The k-means algorithm is used with the Euclidean distance measure.
8.   We generate a dataset using the text mining application
     WEKA to show some experimental results.


     Introduction
1.   The Internet is a huge source of electronic documents in
     many languages.

2.   Electronic documents may contain text, images, audio, and
     video.

3.   Text documents may be written in Latin-script languages such as
     English, French, and Spanish, or in non-Latin-script languages such as
     Arabic, Chinese, Japanese, and Indian languages.

4.   The text content is usually the most valuable part of an
     electronic document, which makes text mining and information
     retrieval approaches a natural fit.

5.   Electronic Data Interchange (EDI) is another approach for
     interchanging electronic documents over the web.



     Introduction (cont.)
6.   EDI is becoming increasingly important as an easy mechanism for
     organizations to manage, buy, sell, and trade information. ANSI has
     approved a set of EDI standards known as the X12 standards.

7.   The X12 standards define how the electronic documents are
     represented.

8.   Agreeing on these standards is a necessary condition for any two
     organizations to start business transactions.

9.   The EDI format can be transformed and stored in a database.

Figure: EDI documents-to-database-to-text-mining life cycle.

     Introduction (cont.)
10.  The EDI database is normalized and mapped into a flat
     file, for example a spreadsheet.

11.  Text mining with a clustering method is applied.

12.  The k-means algorithm is used with the Euclidean distance
     measure.

13.  We generate a dataset using the text mining application
     WEKA to show some experimental results.




       Types of Text Mining
The purposes of using text mining or data mining are:
   •      To improve customer acquisition and retention.
   •      To reduce fraud.
   •      To identify internal inefficiencies and then revamp
          operations.
   •      To map the unexplored terrain of the Internet.


The major types of tools used in text mining are:
   I.     Artificial neural networks;
   II.    Decision trees;
   III.   Genetic algorithms;
   IV.    Rule induction;
   V.     Nearest neighbor method;
   VI.    Data visualization.


    Types of Information and Methods
Text mining usually produces five types of information:

   1.   Associations: arise when occurrences are linked in a single
        event.
   2.   Sequences: events that are linked over time.
   3.   Classification: helps discover the characteristics of customers
        who are likely to leave and provides a model that can be used
        to predict who they are.
   4.   Forecasting: estimates the future value of continuous variables,
        such as sales figures, based on patterns within the data.
   5.   Clustering: one of the essential methods in text mining, used
        to discover different groupings within the data.





Types of Information and Methods (cont.)
               Clustering:
               1.   Is an unsupervised learning process
                    applied to the text data; it relies on a
                    pre-specified number of clusters rather
                    than on labeled examples.

               2.   We use a common partitioning
                    method, the k-means algorithm.

               3.   We measure the Euclidean distance of
                    each document from the cluster
                    centroids.

               4.   The aim is to improve the performance of
                    text mining on electronic documents.




     Methods and Algorithms used
1. Clustering using the k-means algorithm:
•    The k-means algorithm assigns each point to the cluster whose
     centroid is nearest.

•    The centroid is the average of all the points in the cluster; that is, its
     coordinates are the arithmetic mean, for each dimension separately,
     over all the points in the cluster.

•    For example, suppose the data set has three dimensions and the cluster
     has two points, X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid
     is Z = (z1, z2, z3), where z1 = (x1 + y1)/2, z2 = (x2 + y2)/2, and
     z3 = (x3 + y3)/2.

     The algorithm steps are:

       1.   Input: D := {d1, d2, ..., dn}; k := the number of clusters;
       2.   Select k document vectors as the initial centroids of the k clusters;
       3.   Repeat:
       4.       Select one vector d from the remaining documents;
       5.       Compute the similarity between d and each of the k centroids;
       6.       Put d in the closest cluster and recompute the centroids;
       7.   Until the centroids do not change;
       8.   Output: k clusters of documents.
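
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the WEKA implementation: it uses the common batch variant (reassign every document, then recompute all centroids) instead of updating after each single document, and it assumes the first k vectors serve as the initial centroids. The function name `kmeans` and the sample vectors are made up for the example.

```python
import math

def kmeans(vectors, k, max_iter=100):
    """Cluster equal-length numeric vectors into k groups (batch k-means)."""
    # Assumption for this sketch: the first k vectors are the initial centroids.
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign every vector to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no document changed cluster, so the centroids are stable
        assignment = new_assignment
        # Recompute each centroid as the per-dimension mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical two-dimensional document vectors.
docs = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centers = kmeans(docs, k=2)
print(labels)   # cluster index for each document
print(centers)  # one centroid per cluster
```

Choosing the initial centroids differently (for example at random from a seed) can change the resulting clusters, which is why tools usually expose a seed parameter.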

    Methods and Algorithms used (cont.)
2. Bag-of-words documents: Representing the electronic
 documents in the EDI database as bags of words has
 the following features:
      •   A text document is represented by the words it contains (and
          their occurrence counts), e.g., "Lord of the rings" → {"the", "lord",
          "rings", "of"}. This representation is highly efficient, which
          makes learning far simpler and easier. The order of the words
          is ignored, which is acceptable for certain applications.

      •   Stemming, which identifies a word by its root, is also applied,
          e.g., flying, flew → fly; it is used to reduce dimensionality.

      •   Stop words are also removed, because the most common words
          are unlikely to help text mining, e.g., "the", "a", "an", "you",
          etc.

      •   Each document is represented by the set of its word
          frequencies and the categories it belongs to.
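
As a rough illustration of these features, the sketch below builds a bag of words with Python's standard library. The stop list contains only the four words named above, and the `ROOTS` lookup table is a hypothetical stand-in for a real stemmer; both are assumptions made for the example.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "you"}      # only the examples listed above
ROOTS = {"flying": "fly", "flew": "fly"}    # hypothetical stand-in for a stemmer

def bag_of_words(text):
    """Return word counts after lowercasing, stop-word removal and root mapping."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [ROOTS.get(t, t) for t in tokens if t not in STOP_WORDS]
    return Counter(kept)

bag = bag_of_words("The lord of the rings flew; the rings are flying")
print(bag)
```

Word order is lost in `bag`; only the counts per word remain, which is exactly what the clustering step consumes.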
     Methods and Algorithms used (cont.)
3. Representation of text in EDI documents:
•   An EDI text document is represented as a bag of words, in which
    words are treated independently, without considering their order.
•   Each word corresponds to a dimension in the resulting data space,
    and each document becomes a vector of non-negative values, one per
    dimension. We also remove stop words.
•   We use the frequency of each term as its weight, which means that terms
    that appear more frequently are treated as more important and more
    descriptive for the document.
•   Let D = {d1, ..., dn} be a set of documents and T = {t1, ..., tm} the set
    of distinct terms occurring in D.
•   A document d is represented as a vector td. Let tf(d, t) denote the
    frequency of term t ∈ T in document d ∈ D. Then the vector
    representation of a document d is td = (tf(d, t1), ..., tf(d, tm)).
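
The definition of td can be tried out directly. The following sketch assumes whitespace tokenization and an alphabetically sorted term set T; the three sample documents are invented for the example.

```python
from collections import Counter

def term_vectors(documents):
    """Return the term list T and one term-frequency vector per document."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    terms = sorted(set().union(*counts))        # T: distinct terms occurring in D
    # td = (tf(d, t1), ..., tf(d, tm)); a Counter yields 0 for absent terms.
    vectors = [[c[t] for t in terms] for c in counts]
    return terms, vectors

D = ["payment order payment", "invoice payment", "invoice invoice remittance"]
T, vectors = term_vectors(D)
print(T)
for v in vectors:
    print(v)
```

Every vector has length m = len(T), so any two documents can be compared dimension by dimension.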

     Methods and Algorithms used (cont.)
4. Distance measures map the distance between the representations
     of two objects into a single numeric value, which depends
     on two factors: the properties of the two objects and the measure itself.
     To qualify as a metric, a distance measure d must satisfy
     the following four conditions, where x, y, and z are any objects
     (electronic documents) in a data set and d(x, y) is the distance
     between x and y.
        1.   The distance between any two points must be non-negative,
             that is, d(x, y) ≥ 0.

        2.   The distance between two objects must be zero if and only if the
             two objects are identical, that is, d(x, y) = 0 if and only if x = y.

        3.   Distance must be symmetric, that is, the distance from x to y is the
             same as the distance from y to x: d(x, y) = d(y, x).

        4.   The measure must satisfy the triangle inequality, which is d(x, z) ≤
             d(x, y) + d(y, z).
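
The four conditions can be checked numerically on a small sample of points. The helper below is a hypothetical sketch: passing the check on a sample does not prove that a measure is a metric, but a single failure shows that it is not. It compares the Euclidean distance (`math.dist`) with the squared Euclidean distance, which fails the triangle inequality.

```python
import math
from itertools import product

def is_metric_on(points, d, tol=1e-12):
    """Check the four metric conditions for distance function d on sample points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                        # 1. non-negativity
            return False
        if (d(x, y) == 0) != (x == y):         # 2. zero only for identical objects
            return False
        if abs(d(x, y) - d(y, x)) > tol:       # 3. symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # 4. triangle inequality
            return False
    return True

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

points = [(0, 0), (1, 0), (2, 0)]
euclid_ok = is_metric_on(points, math.dist)
squared_ok = is_metric_on(points, squared_euclidean)
print(euclid_ok, squared_ok)
```

For the three collinear points, the squared distance from (0, 0) to (2, 0) is 4, which exceeds 1 + 1 via the middle point, so condition 4 fails.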


     Methods and Algorithms used (cont.)
Euclidean distance measure:
•   A widely used measure in text clustering problems.
•   It is also the default distance measure used with the k-means
    algorithm.
•   It is the ordinary distance between two points and can be easily
    measured with a ruler in two- or three-dimensional space.
•   Given two documents da and db represented by their term vectors ta
    and tb respectively, the Euclidean distance of the two documents
    is defined as:

     $d(t_a, t_b) = \left( \sum_{i=1}^{m} \left( tf(d_a, t_i) - tf(d_b, t_i) \right)^2 \right)^{1/2}$

     For general vectors x and y this is: $\mathrm{distance}(x, y) = \left( \sum_i (x_i - y_i)^2 \right)^{1/2}$.
•   Squared Euclidean distance is used when we want to place greater
    weight on objects that are further apart. It is computed as
    $\mathrm{distance}(x, y) = \sum_i (x_i - y_i)^2$.
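
Both formulas are short enough to write out directly. In this sketch the three term vectors are invented; they are chosen so that the second pair is further apart than the first, which shows how squaring increases the relative weight of distant objects.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def squared_euclidean(x, y):
    """Same sum without the square root."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# Hypothetical term-frequency vectors over three terms.
ta = [2, 0, 1]
tb = [0, 0, 1]
tc = [6, 3, 1]

print(euclidean(ta, tb), squared_euclidean(ta, tb))
print(euclidean(ta, tc), squared_euclidean(ta, tc))
```

Here tc is 2.5 times as far from ta as tb is under the Euclidean distance, but 6.25 times as far under the squared version.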

     Methods and Algorithms used (cont.)
5. Dataset: We propose a collection of banking-transaction
    EDI electronic text data gathered from EDI databases.
        1. The EDI text data were collected and
           aggregated into seven main
           categories.
        2. We created an EDI corpus.
        3. This corpus represents the dataset; it
           consists of 2000 EDI electronic
           documents of different lengths that
           belong to the seven categories.
        4. The categories are transaction
           divisions in the X12 standard EDI
           format.


    Methods and Algorithms used (cont.)
6. Translating EDI to databases:
         1)   This is an essential process for storing and accessing our
              transaction information in a valid database format that
              is supported by all common database systems.
         2)   It can be done by translating an EDI message in the X12
              standard format into a variety of transactions.
         3)   Each transaction file format is identified by a mapping file
              and can be transformed into a flat file format.
         4)   Mapping the translated EDI message into the database
              constructs a database like the one illustrated in the figure.
         5)   The flat file can be in any common form, for instance
              comma-separated format. The
              redundancy of data in the flat table can be clearly seen
              even in a small portion of an EDI file.
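
A toy version of the flattening step might look as follows. It is only a sketch under stated assumptions: it treats `~` as the segment terminator and `*` as the element separator, which are common in X12 files but are actually declared per interchange; the sample message is made up and heavily shortened; and the function names and the three-column layout are hypothetical, not the mapping used in the study.

```python
import csv
import io

def x12_to_rows(message, segment_terminator="~", element_separator="*"):
    """Flatten an X12-style message into (segment, position, value) rows."""
    rows = []
    for segment in message.split(segment_terminator):
        segment = segment.strip()
        if not segment:
            continue  # skip the empty piece after the last terminator
        elements = segment.split(element_separator)
        seg_id = elements[0]
        for pos, value in enumerate(elements[1:], start=1):
            rows.append((seg_id, pos, value))
    return rows

def rows_to_csv(rows):
    """Write the rows as comma-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["segment", "position", "value"])
    writer.writerows(rows)
    return buf.getvalue()

# Made-up, shortened message used only for illustration.
message = "ST*820*0001~BPR*C*150.00*C*ACH~SE*3*0001~"
rows = x12_to_rows(message)
flat = rows_to_csv(rows)
print(flat)
```

The segment identifier is repeated on every row of the flat output, a small instance of the redundancy mentioned in point 5.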


        Types of Outputs
 • By applying text mining to EDI data, a retailer can identify the demographics of its customers,
such as gender, marital status, and number of children, and the products that they buy.
 • This information can have a tremendous positive impact on operations by
decreasing inventory movement and by placing inventory in locations where it is likely
to sell.

         1.    Buying patterns of customers; associations among customer
               demographic characteristics; predictions of which customers will
               respond to which mailings;

         2.    Patterns of fraudulent credit card usage; identities of “loyal” customers;
               credit card spending by customer groups; predictions of customers who
               are likely to change their credit card affiliation;

         3.    Predictions of which customers will buy new insurance policies;
               behavior patterns of risky customers; expectations of fraudulent
               behavior;

         4.    Characterizations of patient behavior to predict the frequency of office visits.





     Applications of Text Mining in EDI databases
•   Text mining and EDI applications can be used in a variety of
    sectors: consumer product sales, finance, manufacturing, health,
    banking, insurance, and utilities.

•   Customer-based businesses can benefit from these technologies (text
    mining and EDI) if the following types of data are available in EDI
    databases for text mining applications:
       1)   demographics, such as age, gender, and marital status;

       2)   banking and economic status, such as salary, profession, and
            household income;

       3)   geographic details, such as city, state, or region; and

       4)   other demographics, such as education or hobbies.



       Experimental Results
•   We generate the clustered dataset by using the Euclidean distance measure in
    the k-means algorithm to assign every item to its nearest cluster
    center, using a common text mining application called WEKA.

•   The EDI banking text dataset is normalized into a flat file and
    represented in comma-separated format, which gives the primary
    dataset.

•   The resulting data file consists of 600 instances.

•   We use the k-means algorithm to cluster the customers in the
    bank dataset and to characterize the resulting customer
    segments.

•   Because k-means operates on numerical attribute values, we convert
    the dataset into the standard spreadsheet format and convert
    categorical attributes to binary.
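
The conversion of categorical attributes to binary can be sketched as below. The attribute names and values (`age`, `region`, `married`) are hypothetical examples of what a bank customer record might contain, not the actual columns of the dataset, and the helper `to_binary` is written for this illustration only.

```python
def to_binary(records, categorical):
    """Replace each categorical attribute with one 0/1 column per category value."""
    # Collect the distinct values of every categorical attribute, in sorted order.
    levels = {a: sorted({r[a] for r in records}) for a in categorical}
    header = []
    for attr in records[0]:
        if attr in categorical:
            header += [f"{attr}={v}" for v in levels[attr]]
        else:
            header.append(attr)
    rows = []
    for r in records:
        row = []
        for attr, value in r.items():
            if attr in categorical:
                row += [1 if value == v else 0 for v in levels[attr]]
            else:
                row.append(value)  # numeric attributes are kept unchanged
        rows.append(row)
    return header, rows

# Hypothetical customer records.
records = [
    {"age": 48, "region": "TOWN", "married": "YES"},
    {"age": 23, "region": "RURAL", "married": "NO"},
    {"age": 51, "region": "TOWN", "married": "NO"},
]
header, rows = to_binary(records, categorical={"region", "married"})
print(header)
for row in rows:
    print(row)
```

After this step every attribute is numeric, so Euclidean distances between instances are well defined.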



       Experimental Results (cont.)
•   The WEKA k-means algorithm uses the Euclidean distance measure to
    compute distances between instances and clusters.

•   We enter seven as the number of clusters, together with a seed value
    that is used to generate a random number for making the initial
    assignment of instances to clusters.

•   WEKA reports the centroid of every cluster as well as statistics
    on the number and percentage of instances assigned to the different
    clusters.

•   Cluster centroids are the mean vectors for each cluster (so each
    dimension value in the centroid is the mean value of
    that dimension in the cluster).

•   In the final data portion, each instance has its assigned cluster as
    the last attribute value.
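
The per-cluster statistics described above can be recomputed from any output in which the cluster label is the last attribute. The sketch below is not WEKA code; the function `summarize`, the label strings, and the four sample instances are assumptions made for illustration.

```python
from collections import defaultdict

def summarize(instances):
    """Count, percentage and centroid per cluster; the label is the last value."""
    groups = defaultdict(list)
    for *features, cluster in instances:
        groups[cluster].append(features)
    total = len(instances)
    summary = {}
    for cluster, members in sorted(groups.items()):
        # Centroid: mean of each dimension over the cluster's members.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        summary[cluster] = {
            "count": len(members),
            "percent": 100.0 * len(members) / total,
            "centroid": centroid,
        }
    return summary

# Hypothetical clustered instances: (age, married flag, assigned cluster).
instances = [
    (48, 1, "cluster0"),
    (52, 1, "cluster0"),
    (23, 0, "cluster1"),
    (25, 0, "cluster1"),
]
summary = summarize(instances)
for name, stats in summary.items():
    print(name, stats)
```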



        Conclusion
• In this paper, we combined two common technologies, EDI and text mining.

• EDI, together with a transformation process, provided the database storage.

• We used text mining to extract useful, hidden, and previously unknown patterns or
information from EDI text databases.

• We focused only on the most interesting point of intersection between EDI
and text mining.

• The EDI-format file was translated into a normalized flat file in comma-separated
format.

• The flat file represented the EDI database, for which we proposed a dataset of
banking-transaction EDI electronic text data gathered from EDI databases.

• For text mining, we suggested clustering with the k-means algorithm, using the
Euclidean distance measure to assign every item to its nearest cluster center.

• In the experimental section, we used the text mining application
WEKA to present our results visually.

Thank you!
34
35

Edi text

  • 1.
    Text Mining Documentsin Electronic Data Interchange Environment Dr. Zakaria Suliman Zubi, Associate Professor , Computer Science Department, Faculty of Science , Sirt University, Sirt ,Libya. LOGO
  • 2.
    Add your companyslogan Contents 1 Abstract. 2 Introduction . 3 Types of Text Mining . 4 Types of Information and Methods . 5 Methods and Algorithms used. 6 Types of Outputs . 7 Applications of Text Mining in EDI databases. 8 Experimental Results. www.themegallery.com LOGO 9 Conclusion.
  • 3.
    Add your companyslogan Abstract 1. Internet is a huge source of electronic text documents, in multilingual languages. 2. Electronic documents could be interchanged through the web via Electronic Data Interchange (EDI) environments. 3. The text data can be exchanged in the web in an EDI format such as X12 formats. 4. The EDI format can be transformed and stored in a database. 5. The EDI database will be normalized and mapped into a flat file in a form such as spreadsheets. 6. Text mining using clustering method were applied. 7. K-mean algorithm used with Euclidean distance measure. 8. We generate a dataset using text mining application program solution called WEKA, to show some experimental results. www.themegallery.com LOGO
  • 4.
    Add your companyslogan Contents 1 Abstract. 2 Introduction . 3 Types of Text Mining. 4 Types of Information and Methods . 5 Methods and Algorithms used. 6 Types of Outputs . 7 Applications of Text Mining in EDI databases. 8 Experimental Results. www.themegallery.com LOGO 9 Conclusion.
  • 5.
    Add your companyslogan Introduction 1. Internet is a huge source of electronic documents in multilingual languages. 2. Electronic documents may contains text, images, audios and videos. 3. Text documents may contains text in Latin languages such as English, French, Spanish,….etc Or Non-Latin's languages such as Arabic, Chinese, Japanese, Indian,…etc. 4. As a matter of fact, text content of any electronic document is the most significant value in any document, which makes applying text mining or information retrieval approaches much more reasonable. 5. Electronic Data Interchange (EDI) is another approach for electronic documents interchange through the web in Electronic Data Interchange (EDI) environments. www.themegallery.com LOGO
  • 6.
    Add your companyslogan Introduction (Continue…..) 6. EDI is becoming progressively more significant as an easy mechanism for organizations to manage, buy, sell, and trade information. ANSI has approved a set of EDI standards known as the X12 standards. 7. X12 standards represented the electronic documents. 8. These electronic standards are a necessary condition between any two organization to start a business transactions. 9. The EDI format can be transformed and stored in a database. EDI documents- to- database – to- text mining life cycle. www.themegallery.com LOGO
  • 7.
    Add your companyslogan Introduction (Continue…..) 9. The EDI database will be normalized and mapped into a flat file in a form such as spreadsheets. 10. Text mining using clustering method will applied. 11. K-mean algorithm used with Euclidean distance measure. 12. We generate a dataset using text mining application program solution called WEKA, to show some experimental results. www.themegallery.com LOGO
  • 8.
    Add your companyslogan Contents 1 Abstract. 2 Introduction . 3 Types of Text Mining. 4 Types of Information and Methods . 5 Methods and Algorithms used. 6 Types of Outputs . 7 Applications of Text Mining in EDI databases. 8 Experimental Results. www.themegallery.com LOGO 9 Conclusion.
  • 9.
    Add your companyslogan Types of Text Mining The purposes of using text mining or data mining: ď‚§ To improve customer achievement and maintenance. ď‚§ To reduce fraud . ď‚§ To identify internal inefficiencies and then revamp operations. ď‚§ To map the unexplored environment of the Internet. The major types of tools used in text mining are: I. Artificial Neural Networks; II. Decision trees; III. Genetic algorithms; IV. Rule induction; V. Nearest Neighbor Method; VI. Data Visualization; www.themegallery.com LOGO
  • 10.
    Add your companyslogan Contents 1 Abstract. 2 Introduction . 3 Types of Text Mining. 4 Types of Information and Methods. 5 Methods and Algorithms used. 6 Types of Outputs . 7 Applications of Text Mining in EDI databases. 8 Experimental Results. www.themegallery.com LOGO 9 Conclusion.
  • 11.
    Add your companyslogan Types of Information and Methods Text mining usually produces five types of information such as: Turn out when occurrences 1. Associations; linked in a single occasion. 2. Sequences; 3. Classifications; Procedures linked over time based on the event that happen. 4. Forecasting 5. Clustering; It Classificationfuture value ofto guesses the can assist us Is one of the essential methods used discover the personality sales continuous variables like of in text mining approaches to discovercustomers who are likelywithin figures based on patterns to different groupings with the data. the data. provides a model that leave and used to expect who they are. www.themegallery.com LOGO
  • 12.
    Add your companyslogan Types of Information and Methods (count) Clustering: 1. Is unsupervised learning process applied to the text data depending on pre-specified knowledge . 2. We use a common partitioned method called K-means algorithm. 3. We calculate the distance measures by using Euclidean measures from the centroid. 4. Improving performance of text in electronic documents. www.themegallery.com LOGO
  • 13.
    Add your companyslogan Contents 1 Abstract. 2 Introduction . 3 Types of Text Mining. 4 Types of Information and Methods. 5 Methods and Algorithms used. 6 Types of Outputs . 7 Applications of Text Mining in EDI databases. 8 Experimental Results. www.themegallery.com LOGO 9 Conclusion.
  • 14.
    Add your companyslogan Methods and Algorithms used 1. Clustering using k- means Algorithm:  The k-means algorithm assigns each point to the cluster whose centroid is the nearest point.  The center is the average of all the points in the cluster that is, its coordinates are the arithmetic mean for each dimension separately over all the points in the cluster.  The data set has three dimensions and the cluster has two points: X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid Z becomes Z = (z1, z2, z3), where z1 = (x1 + y1)/2 and z2 = (x2 + y2)/2 and z3 = (x3 + y3)/2.The algorithm steps are: 1. Input D:= {d1,d2,….,dn}; k:= the cluster number; 2. Select k document vectors as initial centriods of k cluster; 3. Repeat; 4. Select one vector d in remaining documents; 5. Compute similarities between d and k centriods; 6. Put d in the closest cluster and recomputed the centriods; 7. Until the centriods don't change; 8. Output: k clusters of documents. www.themegallery.com LOGO
  • 15.
    Add your companyslogan Methods and Algorithms used (count) 2.Bag-of-Words Document : The generation of electronic documents as a bag of words in EDI database will leads to the following features:  Text document is represented by the words it contains (and their occurrences) e.g., "Lord of the rings" → {"the", "lord", "rings", "of"}. This representation has a high efficient which makes learning far simpler and easer. The order of words in this case is not important for certain application.  Stemming to identify a word by it's root is also conducted e.g., flying, flew → fly, it's used to reduce dimensionality.  Stop words are also used whereas, the most common words are unlikely to help text mining e.g., "the", "a", "an", "you" ..etc.  Each document represented by the set of its word frequencies and categories that it belongs too. www.themegallery.com LOGO
  • 16.
    Add your companyslogan Methods and Algorithms used (count) 3. Text in EDI document representation :  The representation of EDI text document will be as a bag of words, which appears independently without considering the order.  Each word corresponds to a dimension in the resulting data space and each document then becomes a vector consisting of non-negative values on each dimension. We also remove stop words  We uses the frequency of each term as its weight, which means terms that appear more frequently are more important and descriptive for the document.  Let D = {d1, . . . , dn} be a set of documents and T = {t1, . . . ,tm} the set of distinct terms occurring in D.  A document represented as a vector td. Let tf(d, t) signify the frequency of term t ε T in document d ε D. Then the vector representation of a document d: td = (tf(d, t1), . . . , tf(d, tm)) www.themegallery.com LOGO
  • 17.
    Add your companyslogan Methods and Algorithms used (count) 4. Distance Measures map the distance between the representative description of two objects into a single numeric value, which depends on two factors the properties of the two objects and the measure it. To qualify a distance measure as a metric, a measure d must satisfy the following four conditions. 1. Let x and y be any two objects (electronic document) in a data set and d(x, y) be the distance between x and y The distance between any two points must be nonnegative, that is, d(x, y) ≥ 0. 2. The distance between two objects must be zero if and only if the two objects are identical, that is, d(x, y) = 0 if and only if x = y. 3. Distance must be symmetric, that is, distance from x to y is the same as the distance from y to x, i.e. d(x, y) = d(y, x). 4. The measure must satisfy the triangle inequality, which is d(x, z) ≤ d(x, y) + d(y, z). www.themegallery.com LOGO
  • 18.
    Add your companyslogan Methods and Algorithms used (count) Euclidian distance Measures :  A widely used method in text clustering problem.  It is also the default distance measure used with the K-means algorithm.  It is also the ordinary distance between two points and can be easily measured with a ruler in two or three-dimensional space.  If we give two documents da and db represented by their term vectors ta and tb respectively, the Euclidean distance of the two documents defined as: It can be calculated also: distance(x,y) = {Σi (xi - yi)2 }½.  Squared Euclidean distance: is used also when we want a greater weight on objects that are further apart. This distance computed in the following: distance(x,y) = Σi (xi - yi)2 www.themegallery.com LOGO
  • 19.
    Methods and Algorithms used (cont.)
    5. Dataset: We propose a collection of EDI electronic text data from banking transactions, gathered from EDI databases.
       1. The EDI text data were collected and aggregated into seven main categories.
       2. We created an EDI corpus.
       3. This corpus represents the dataset, which consists of 2000 EDI electronic documents of different lengths that belong to seven categories.
       4. The categories are transaction divisions in the X12 standard EDI format.
  • 20.
    Methods and Algorithms used (cont.)
    6. Translating EDI to databases:
       1) This is an essential process for storing and accessing our transaction information in a valid database format that supports all common database formats.
       2) It can be done by translating an EDI message in the EDI X12 standard format into a variety of transactions.
       3) Each transaction file format is identified by a mapping file and can be transformed into a flat file format.
       4) Mapping the translated EDI message into the database constructs a database, as illustrated in the figure.
       5) This flat file can be in any common form, for instance comma-separated format. The redundancy of data in the flat table can be clearly seen from a small portion of an EDI file.
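The translation step can be sketched as follows. This is a simplified illustration, not the mapping-file process described above: it assumes the conventional X12 delimiters (`*` between elements, `~` after each segment), and the sample message, the column names and the function names are made up for the example.

```python
import csv
import io

def parse_x12(message, segment_terminator="~", element_separator="*"):
    """Split an X12 message into segments, each a list of elements."""
    segments = []
    for raw in message.split(segment_terminator):
        raw = raw.strip()
        if raw:
            segments.append(raw.split(element_separator))
    return segments

def to_flat_rows(segments):
    """One row per data element; the transaction-set fields repeat on every row."""
    rows = []
    set_id = control = ""
    for seg in segments:
        seg_id, elements = seg[0], seg[1:]
        if seg_id == "ST":  # transaction set header: set identifier, control number
            set_id, control = elements[0], elements[1]
        for pos, value in enumerate(elements, start=1):
            rows.append([set_id, control, seg_id, pos, value])
    return rows

HEADER = ["transaction_set", "control_number", "segment_id", "element_position", "value"]

def to_csv(rows):
    """Write the flat rows as comma-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()

# Made-up fragment of a payment order (transaction set 820).
sample = "ST*820*0001~BPR*C*1500.00*C*ACH~TRN*1*REF12345~SE*4*0001~"
rows = to_flat_rows(parse_x12(sample))
print(to_csv(rows))
```

Every row of the output repeats the transaction set identifier and control number, which is a small-scale view of the redundancy that a flat table introduces.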
  • 23.
    Types of Outputs
    By applying text mining to EDI data, a retailer can identify the demographics of its customers, such as gender, marital status and number of children, along with the products that they buy. This information can have a considerable positive impact on operations by decreasing inventory movement and by placing inventory in locations where it is likely to sell. Typical outputs include:
       1. Buying patterns of customers; associations among customer demographic characteristics; predictions of which customers will respond to which mailings.
       2. Patterns of fraudulent credit card usage; identities of "loyal" customers; credit card spending by customer groups; predictions of customers who are likely to change their credit card affiliation.
       3. Predictions of which customers will buy new insurance policies; behavior patterns of risky customers; expectations of fraudulent behavior.
       4. Characterizations of patient behavior to predict the frequency of office visits.
  • 25.
    Applications of Text Mining in EDI databases
       - Text mining and EDI applications can be used in a variety of sectors: consumer product sales, finance, manufacturing, health, banking, insurance, and utilities.
       - Customer-based businesses can benefit from these technologies (text mining and EDI) if the following types of data are available in EDI databases for text-mining applications:
         1) demographics, such as age, gender and marital status;
         2) banking and economic status, such as salary, profession and household income;
         3) geographic details, such as city, state or region;
         4) other demographics, such as education, hobbies or marital status.
  • 27.
    Experimental Results
       - We generate the dataset with a common text mining application called WEKA, using the k-means algorithm with the Euclidean distance measure to assign every item to its nearest cluster center.
       - The EDI banking text dataset is normalized into a flat file and represented in comma-separated format. A primary dataset is then created.
       - The resulting data file consists of 600 instances.
       - We use the k-means algorithm to cluster the customers in the bank dataset and to characterize the resulting customer segments.
       - Since k-means operates on numerical attribute values, we convert the dataset into the standard spreadsheet format and convert categorical attributes to binary.
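The conversion of categorical attributes to binary can be sketched as a simple one-hot encoding. The attribute names and records below are hypothetical and only stand in for the bank dataset; WEKA provides its own filters for this step, which this sketch does not reproduce.

```python
def one_hot_encode(records, categorical):
    """Replace each categorical attribute with one 0/1 column per category."""
    categories = {
        attr: sorted({record[attr] for record in records}) for attr in categorical
    }
    encoded = []
    for record in records:
        row = {}
        for attr, value in record.items():
            if attr in categories:
                for category in categories[attr]:
                    row[f"{attr}={category}"] = 1 if value == category else 0
            else:
                row[attr] = float(value)  # numeric attributes are kept as numbers
        encoded.append(row)
    return encoded

# Hypothetical customer records.
records = [
    {"age": 48, "region": "TOWN", "married": "YES"},
    {"age": 23, "region": "RURAL", "married": "NO"},
]
encoded = one_hot_encode(records, categorical=["region", "married"])
print(encoded[0])
```

After this step every attribute is numeric, so the Euclidean distance between two instances is well defined.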
  • 28.
    Experimental Results (cont.)
       - The WEKA k-means algorithm uses the Euclidean distance measure to compute distances between instances and clusters.
       - We enter seven as the number of clusters, together with a seed value that is used to generate the random numbers for the initial assignment of instances to clusters.
       - WEKA reports the centroid of every cluster, as well as statistics on the number and percentage of instances assigned to the different clusters.
       - Cluster centroids are the mean vectors of the clusters (so each dimension value in a centroid is the mean value of that dimension over the cluster).
       - In the final data portion, each instance has its assigned cluster as the last attribute value.
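The procedure described above can be illustrated with a minimal k-means sketch in plain Python. It is not WEKA's implementation: the initialization (sampling k instances as starting centroids), the seed value, the function name and the six sample points are assumptions chosen for the example, and it uses two clusters rather than seven to keep the data small.

```python
import math
import random
from collections import Counter

def k_means(points, k, seed=10, max_iter=100):
    """Cluster `points` into k groups using Euclidean distance (Lloyd's algorithm)."""
    rng = random.Random(seed)           # the seed fixes the random initial centroids
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assign every instance to its nearest cluster center.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break                        # no instance changed cluster: converged
        assignments = new_assignments
        # Recompute each centroid as the mean vector of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

# Hypothetical two-dimensional instances forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = k_means(points, k=2)

# Number and percentage of instances per cluster, in the spirit of WEKA's summary.
for cluster, size in sorted(Counter(assignments).items()):
    print(cluster, size, f"{100 * size / len(points):.0f}%")

# Each instance with its assigned cluster as the last attribute value.
labelled = [p + (a,) for p, a in zip(points, assignments)]
```

Changing the seed changes the starting centroids and, on less clearly separated data, may change the final clusters, which is why the seed is reported alongside the number of clusters.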
  • 32.
    Conclusion
       - In this paper, we combined two common technologies: EDI and text mining.
       - EDI, together with a transformation process, provided the database storage.
       - We used text mining to extract useful, hidden and previously unknown patterns or information from EDI text databases.
       - We focused only on the most interesting point of intersection between EDI and text mining.
       - The EDI-format file was translated into a normalized flat file in comma-separated format.
       - The flat file represented the EDI database, for which we proposed a dataset of EDI electronic text data from banking transactions, gathered from EDI databases.
       - For text mining, we suggest clustering with the k-means algorithm, using the Euclidean distance measure to assign every item to its nearest cluster center.
       - In the experimental section, we used a text mining application called WEKA to present our results visually.