Text Mining Documents in Electronic Data Interchange Environment
            Dr. Zakaria Suliman Zubi,
            Associate Professor,
            Computer Science Department,
            Faculty of Science,
            Sirt University,
            Sirt, Libya.



                                  LOGO
Add your company slogan
Contents
    1   Abstract.
    2   Introduction.
    3   Types of Text Mining.
    4   Types of Information and Methods.
    5   Methods and Algorithms used.
    6   Types of Outputs.
    7   Applications of Text Mining in EDI databases.
    8   Experimental Results.
    9   Conclusion.
                       www.themegallery.com     LOGO

     Abstract
1.   The Internet is a huge source of electronic text documents
     in many languages.
2.   Electronic documents can be interchanged over the web
     through Electronic Data Interchange (EDI) environments.
3.   Text data can be exchanged over the web in an EDI
     format such as X12.
4.   The EDI format can be transformed and stored in a
     database.
5.   The EDI database is normalized and mapped into a
     flat file in a form such as a spreadsheet.
6.   Text mining with a clustering method is applied.
7.   The k-means algorithm is used with the Euclidean distance measure.
8.   We generate a dataset using a text mining application
     called WEKA to show some experimental results.



     Introduction
1.   The Internet is a huge source of electronic documents in
     many languages.

2.   Electronic documents may contain text, images, audio and
     video.

3.   Text documents may contain text in Latin-script languages such as
     English, French and Spanish, or in non-Latin-script languages such as
     Arabic, Chinese, Japanese and Indian languages.

4.   The text content of an electronic document is usually its
     most significant part, which makes text mining and
     information retrieval approaches a reasonable choice.

5.   Electronic Data Interchange (EDI) is another approach for
     interchanging electronic documents over the web.



    Introduction (continued)
                      6.   EDI is becoming progressively more
                           significant as an easy mechanism for
                           organizations to manage, buy, sell, and
                           trade information. ANSI has approved a
                           set of EDI standards known as the X12
                           standards.

                      7.   The X12 standards define how the electronic
                           documents are represented.

                      8.   These standards are a necessary condition
                           for any two organizations to start
                           business transactions.

                      9.   The EDI format can be transformed and
                           stored in a database.

Figure: EDI documents-to-database-to-text-mining life cycle.

Introduction (continued)
             10. The EDI database is normalized and
                 mapped into a flat file in a form
                 such as a spreadsheet.

             11. Text mining with a clustering
                 method is applied.

             12. The k-means algorithm is used with
                 the Euclidean distance measure.

             13. We generate a dataset using a text
                 mining application called WEKA to
                 show some experimental results.




       Types of Text Mining
The purposes of using text mining or data mining:
         To improve customer acquisition and retention.
         To reduce fraud.
         To identify internal inefficiencies and then revamp
          operations.
         To map the unexplored environment of the Internet.


The major types of tools used in text mining are:
   I.     Artificial neural networks;
   II.    Decision trees;
   III.   Genetic algorithms;
   IV.    Rule induction;
   V.     Nearest neighbor method;
   VI.    Data visualization.


    Types of Information and Methods
Text mining usually produces five types of information:

   1.   Associations: arise when occurrences are linked in a single event.

   2.   Sequences: events that are linked over time, based on the order in
        which they happen.

   3.   Classifications: classification can help us discover the
        characteristics of customers who are likely to leave, and provides
        a model that can be used to predict who they are.

   4.   Forecasting: estimates the future value of continuous variables,
        such as sales figures, based on patterns within the data.

   5.   Clustering: one of the essential methods used in text mining
        approaches to discover different groupings within the data.





Types of Information and Methods (cont.)
               Clustering:
               1.   Is an unsupervised learning process
                    applied to the text data without
                    relying on pre-specified knowledge
                    such as class labels.

               2.   We use a common partitioning
                    method, the k-means algorithm.

               3.   We calculate the distance of each
                    document from the cluster centroid
                    using the Euclidean measure.

               4.   Clustering is intended to improve the
                    performance of text mining on
                    electronic documents.




     Methods and Algorithms used
1. Clustering using the k-means algorithm:
     The k-means algorithm assigns each point to the cluster whose
     centroid is nearest.

    The centroid is the average of all the points in the cluster; that is, its
     coordinates are the arithmetic mean for each dimension separately
     over all the points in the cluster.

    For example, if the data set has three dimensions and the cluster has two
     points, X = (x1, x2, x3) and Y = (y1, y2, y3), then the centroid Z becomes
     Z = (z1, z2, z3), where z1 = (x1 + y1)/2, z2 = (x2 + y2)/2 and z3 = (x3 + y3)/2.

    The algorithm steps are:

       1.   Input: D := {d1, d2, ..., dn}; k := the number of clusters;
       2.   Select k document vectors as the initial centroids of the k clusters;
       3.   Repeat:
       4.   Select one vector d from the remaining documents;
       5.   Compute the similarities between d and the k centroids;
       6.   Put d in the closest cluster and recompute the centroids;
       7.   Until the centroids do not change;
       8.   Output: k clusters of documents.
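
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the implementation used in the presentation: it uses the standard batch form of k-means (reassign all documents, then recompute all centroids) rather than the one-document-at-a-time update listed above, and the function names, the `seed` parameter and the sample vectors are assumptions made for the example.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Per-dimension arithmetic mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(dim) / n for dim in zip(*points))

def k_means(docs, k, seed=0, max_iter=100):
    """Cluster document vectors into k groups (batch k-means sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(docs, k)          # k documents as initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assign every document to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(d, centroids[c]))
            for d in docs
        ]
        if new_assignment == assignment:     # nothing moved: converged
            break
        assignment = new_assignment
        # Recompute each centroid; keep the old one if a cluster is empty.
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids

# Centroid of a two-point, three-dimensional cluster, as in the example above.
z = centroid([(1.0, 2.0, 3.0), (3.0, 4.0, 5.0)])   # (2.0, 3.0, 4.0)

# Four toy "document vectors" forming two obvious groups.
docs = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = k_means(docs, k=2)
```

The initial centroids depend on the seed, so cluster numbering can differ between runs even when the grouping is the same.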

    Methods and Algorithms used (cont.)
2. Bag-of-words documents: Representing the electronic
 documents in the EDI database as bags of words leads to
 the following features:
         A text document is represented by the words it contains (and
          their occurrences), e.g., "Lord of the rings" → {"the", "lord",
          "rings", "of"}. This representation is highly efficient, which
          makes learning far simpler and easier. The order of words
          is not important for certain applications.

         Stemming, which identifies a word by its root, is also applied,
          e.g., flying, flew → fly; it is used to reduce dimensionality.

         Stop words are also removed, since the most common words
          are unlikely to help text mining, e.g., "the", "a", "an", "you",
          etc.

         Each document is represented by the set of its word
          frequencies and the categories that it belongs to.
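
As a rough illustration of the bag-of-words idea, the sketch below counts the words of a document and optionally drops stop words. The tokenizer (lower-cased alphabetic runs) and the small stop-word list are assumptions for the example, and stemming is left out because it would need a proper stemmer.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the deck's list).
STOP_WORDS = {"the", "a", "an", "you", "of"}

def bag_of_words(text, stop_words=frozenset()):
    """Lower-case, split into alphabetic tokens, drop stop words, and count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Word order is discarded; only the words and their counts remain.
full_bag = bag_of_words("Lord of the rings")
filtered_bag = bag_of_words("Lord of the rings", STOP_WORDS)
```

With the stop words removed, only "lord" and "rings" remain, which shows how stop-word filtering shrinks the number of dimensions.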
     Methods and Algorithms used (cont.)
3. Text representation in EDI documents:
   An EDI text document is represented as a bag of words, in which
    words appear independently and their order is not considered.
   Each word corresponds to a dimension in the resulting data space,
    and each document then becomes a vector consisting of non-negative
    values on each dimension. We also remove stop words.
   We use the frequency of each term as its weight, which means that terms
    that appear more frequently are treated as more important and more
    descriptive for the document.
    Let D = {d1, . . . , dn} be a set of documents and T = {t1, . . . , tm} the set
    of distinct terms occurring in D.
    A document d is represented as a vector td. Let tf(d, t) denote the
    frequency of term t ∈ T in document d ∈ D. Then the vector
    representation of a document d is td = (tf(d, t1), . . . , tf(d, tm)).
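
The vector td can be built along the following lines. This is a sketch under simple assumptions: terms are whitespace-separated, lower-cased words, T is kept in sorted order so that every vector uses the same dimensions, and the two sample documents are invented.

```python
from collections import Counter

def tf_vectors(documents):
    """Return (terms, vectors) where vectors[i][j] = tf(d_i, t_j)."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    # T = {t1, ..., tm}: all distinct terms, sorted for a stable order.
    terms = sorted(set().union(*counts))
    # A term missing from a document gets frequency 0.
    vectors = [[c[t] for t in terms] for c in counts]
    return terms, vectors

docs = ["payment order payment", "order status"]
terms, vectors = tf_vectors(docs)
# terms   -> ['order', 'payment', 'status']
# vectors -> [[1, 2, 0], [1, 0, 1]]
```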

     Methods and Algorithms used (cont.)
4. Distance measures map the distance between the representative
     descriptions of two objects into a single numeric value, which depends
     on two factors: the properties of the two objects and the measure itself.
     To qualify as a metric, a distance measure d must satisfy
     the following four conditions. Let x, y and z be any objects
     (electronic documents) in a data set and d(x, y) be the distance
     between x and y.
        1.   The distance between any two objects must be non-negative,
             that is, d(x, y) ≥ 0.

        2.   The distance between two objects must be zero if and only if the
             two objects are identical, that is, d(x, y) = 0 if and only if x = y.

        3.   Distance must be symmetric, that is, the distance from x to y is the
             same as the distance from y to x, i.e. d(x, y) = d(y, x).

        4.   The measure must satisfy the triangle inequality, which is d(x, z) ≤
             d(x, y) + d(y, z).
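
The four conditions can be checked numerically on a handful of sample points, as in the sketch below. Passing such a check only shows that no violation was found on those points; it is not a proof. The helper names, the tolerance and the sample points are assumptions for the example. On these points the Euclidean distance passes, while the squared Euclidean distance fails the triangle inequality.

```python
import math
from itertools import product

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def satisfies_metric_axioms(d, points, tol=1e-9):
    """Check the four conditions on every pair and triple of sample points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                          # 1. non-negativity
            return False
        if (d(x, y) == 0) != (x == y):           # 2. zero only for identical objects
            return False
        if abs(d(x, y) - d(y, x)) > tol:         # 3. symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:    # 4. triangle inequality
            return False
    return True

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
euclidean_ok = satisfies_metric_axioms(euclidean, points)
squared_ok = satisfies_metric_axioms(squared_euclidean, points)
```

The squared distance fails because going from (0, 0) to (2, 0) costs 4, while the detour through (1, 0) costs only 1 + 1 = 2.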


     Methods and Algorithms used (cont.)
Euclidean distance measure:
   A widely used measure in text clustering problems.
   It is also the default distance measure used with the k-means
    algorithm.
   It is the ordinary distance between two points and can be easily
    measured with a ruler in two- or three-dimensional space.
   Given two documents da and db represented by their term vectors ta
    and tb respectively, the Euclidean distance of the two documents is
    defined as:

     $D_E(t_a, t_b) = \left( \sum_{i=1}^{m} (t_{a,i} - t_{b,i})^2 \right)^{1/2}$

     where $t_{a,i}$ is the i-th component of ta. Equivalently, for any two
     vectors x and y: $\mathrm{distance}(x, y) = \left( \sum_i (x_i - y_i)^2 \right)^{1/2}$.
   Squared Euclidean distance is used when we want to place greater
    weight on objects that are further apart. It is computed as
    $\mathrm{distance}(x, y) = \sum_i (x_i - y_i)^2$.
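
A small worked example may help to compare the two formulas. The term vectors below are hypothetical and the function names are chosen for the illustration only.

```python
import math

def euclidean(ta, tb):
    """Euclidean distance between two term vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ta, tb)))

def squared_euclidean(ta, tb):
    """Same sum without the square root; large gaps dominate more strongly."""
    return sum((a - b) ** 2 for a, b in zip(ta, tb))

t_a = [1, 2, 0]     # hypothetical term-frequency vectors
t_b = [1, 0, 1]
t_c = [4, 6, 0]

near = euclidean(t_a, t_b)        # sqrt(0 + 4 + 1) = sqrt(5)
far = euclidean(t_a, t_c)         # sqrt(9 + 16 + 0) = 5.0

# The far pair is about 2.2 times as distant as the near pair under the
# Euclidean distance, but 5 times as distant under the squared version.
ratio_plain = far / near
ratio_squared = squared_euclidean(t_a, t_c) / squared_euclidean(t_a, t_b)
```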

     Methods and Algorithms used (count)
5. Dataset :We propose a collection of a banking
    transaction of EDI electronic text data that been
    gathered from EDI databases.
        1. EDI text data collected and
           aggregated in seven main
           categories.
        2. We create an EDI corpus.
        3. This corpus represent the datasets
           that consist of 2000 EDI electronic
           documents of different lengths that
           belongs to seven categories.
        4. the categories are transactions
           divisions in X12 standard EDI
           format.


    Methods and Algorithms used (cont.)
6. Translating EDI to databases:
         1)   This is an essential process for storing and accessing our
              transaction information in a valid format that is
              supported by all common databases.
         2)   It can be done by translating an EDI message in X12
              standard format into a variety of transactions.
         3)   Each transaction file format is identified by a mapping file
              and can be transformed into a flat file format.
         4)   Mapping the translated EDI message into the database
              constructs a database like the one illustrated in the figure.
         5)   This flat file can be in any common form, for instance
              comma-separated format. The redundancy of data in the
              flat table can be clearly seen from a small portion of an
              EDI file.
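
The translation into a flat file could look roughly like the sketch below. It is not the mapping used in the presentation: it assumes the common `*` element separator and `~` segment terminator, keeps one row per segment, and repeats the transaction set code and control number from the ST segment on every row. The sample message is invented and heavily simplified, and the column names are hypothetical.

```python
import csv
import io

def x12_to_rows(message, element_sep="*", segment_end="~"):
    """Flatten a simplified X12-style message into one row per segment.

    Every row repeats the transaction set code and control number taken from
    the enclosing ST segment, which is the redundancy a flat file introduces.
    """
    rows = []
    set_code = control_no = ""
    for raw in message.split(segment_end):
        segment = raw.strip()
        if not segment:
            continue
        seg_id, *elements = segment.split(element_sep)
        if seg_id == "ST":                       # start of a transaction set
            set_code, control_no = elements[0], elements[1]
        rows.append([set_code, control_no, seg_id, "|".join(elements)])
        if seg_id == "SE":                       # end of the transaction set
            set_code = control_no = ""
    return rows

def rows_to_csv(rows):
    """Write the rows as comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["set_code", "control_no", "segment", "elements"])
    writer.writerows(rows)
    return buffer.getvalue()

# Invented sample, heavily simplified; not a complete or valid X12 interchange.
sample = "ST*820*0001~BPR*C*150.00*C*ACH~TRN*1*REF123~SE*4*0001~"
rows = x12_to_rows(sample)
flat_file = rows_to_csv(rows)
```

Every row carries the same `820` and `0001` values, which is the kind of redundancy the flat table exhibits.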


        Types of Outputs
 Using text mining on EDI data, a retailer can identify the demographics of its customers,
such as gender, marital status and number of children, and the products that they buy.
 This information can have a tremendous positive impact on operations by
decreasing inventory movement and by placing inventory in locations where it is likely
to sell.

         1.    Buying patterns of customers; associations among customer
               demographic characteristics; predictions of which customers will
               respond to which mailings;

         2.    Patterns of fraudulent credit card usage; identities of "loyal" customers;
               credit card spending by customer groups; predictions of customers who
               are likely to change their credit card affiliation;

         3.    Predictions of which customers will buy new insurance policies;
               behavior patterns of risky customers; expectations of fraudulent
               behavior;

         4.    Characterizations of patient behavior to predict the frequency of office visits.





     Applications of Text Mining in EDI databases
   Text mining and EDI applications can be used in a variety of
    sectors: consumer product sales, finance, manufacturing, health,
    banking, insurance, and utilities.

   We can benefit from these technologies (text mining and EDI) if
    the following types of data are available in EDI databases for
    text-mining applications in customer-based businesses:
       1)   demographics, such as age, gender and marital status;

       2)   banking and economic status, such as salary, profession and
            household income; and,

       3)   geographic details, such as city, state or region.

       4)   Other details, such as education or hobbies,
            can also be used.



       Experimental Results
   We generate the dataset using WEKA, a common text mining
    application, with the k-means algorithm and the Euclidean distance
    measure to assign every item to its nearest cluster center.

   The EDI banking text dataset is normalized into a flat file and
    represented in comma-separated format to create a primary
    dataset.

   The resulting data file consists of 600 instances.

   We use the k-means algorithm to cluster the customers in the
    bank dataset and to characterize the resulting customer
    segments.

   Since k-means accepts only numerical attribute values, we convert
    the dataset into the standard spreadsheet format and convert
    categorical attributes to binary.
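
Converting categorical attributes to binary can be illustrated as follows. This is a plain-Python sketch of the idea, not WEKA's own filter; the attribute names and the two sample records are hypothetical.

```python
def to_binary_attributes(records, categorical):
    """Replace each categorical attribute with 0/1 columns, one per observed value."""
    # Collect the observed values of every categorical attribute, in sorted order.
    values = {name: sorted({r[name] for r in records}) for name in categorical}
    encoded = []
    for r in records:
        row = {}
        for name, value in r.items():
            if name in values:
                for v in values[name]:
                    row[f"{name}={v}"] = 1 if value == v else 0
            else:
                row[name] = value             # numeric attributes pass through
        encoded.append(row)
    return encoded

# Hypothetical customer records; attribute names are illustrative only.
customers = [
    {"age": 48, "region": "TOWN", "married": "YES"},
    {"age": 23, "region": "RURAL", "married": "NO"},
]
encoded = to_binary_attributes(customers, ["region", "married"])
```

After this step every attribute is numeric, so the Euclidean distance between two instances is well defined.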



       Experimental Results (cont.)
   The WEKA k-means algorithm uses the Euclidean distance measure to
    compute distances between instances and clusters.

   We enter seven as the number of clusters, along with a seed value
    used to generate the random numbers for the initial assignment of
    instances to clusters.

   WEKA reports the centroid of every cluster as well as statistics
    on the number and percentage of instances assigned to each
    cluster.

   Cluster centroids are the mean vectors for each cluster (so each
    dimension value in the centroid corresponds to the mean value for
    that dimension in the cluster).

   In the final data portion, each instance has its assigned cluster as
    the last attribute value.
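
The kind of summary described above (instances per cluster, percentages, and the cluster appended as the last attribute) can be reproduced in a few lines. The sketch below does not call WEKA; the instances, the cluster assignments and the `cluster<N>` label format are all invented for the illustration.

```python
from collections import Counter

def cluster_summary(instances, labels):
    """Return per-cluster (count, percentage) and instances with the label appended."""
    counts = Counter(labels)
    total = len(labels)
    stats = {c: (n, 100.0 * n / total) for c, n in sorted(counts.items())}
    # Append the assigned cluster as the last attribute value of each instance.
    labelled = [list(inst) + [f"cluster{c}"] for inst, c in zip(instances, labels)]
    return stats, labelled

# Hypothetical numeric instances and cluster assignments.
instances = [[48, 1, 0], [23, 0, 1], [51, 1, 0], [25, 0, 1]]
labels = [0, 1, 0, 0]
stats, labelled = cluster_summary(instances, labels)
# stats -> {0: (3, 75.0), 1: (1, 25.0)}
```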



        Conclusion
In this paper, we combined two common technologies: EDI and text mining.

EDI, together with a transformation process, provided the database storage.

We used text mining to extract useful, hidden and previously unknown patterns or
information from EDI text databases.

We focused only on the most interesting point of intersection between EDI
and text mining.

The EDI-format file was translated into a normalized flat file in comma-separated
format.

The flat file represented the EDI database, for which we proposed a dataset of
banking-transaction EDI electronic text data gathered from EDI databases.

For text mining, we suggest clustering with the k-means algorithm, using the
Euclidean distance measure to assign every item to its nearest cluster center.

In the experimental section, we used a text mining application called
WEKA to present our results visually.

Thank you!
34
35

More Related Content

What's hot

AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...csandit
 
Lecture Notes in Computer Science:
Lecture Notes in Computer Science:Lecture Notes in Computer Science:
Lecture Notes in Computer Science:butest
 
Drubbing an Audio Messages inside a Digital Image Using (ELSB) Method
Drubbing an Audio Messages inside a Digital Image Using (ELSB) MethodDrubbing an Audio Messages inside a Digital Image Using (ELSB) Method
Drubbing an Audio Messages inside a Digital Image Using (ELSB) MethodIOSRJECE
 
EXPERIMENTING WITH THE NOVEL APPROACHES IN TEXT STEGANOGRAPHY
EXPERIMENTING WITH THE NOVEL APPROACHES IN TEXT STEGANOGRAPHYEXPERIMENTING WITH THE NOVEL APPROACHES IN TEXT STEGANOGRAPHY
EXPERIMENTING WITH THE NOVEL APPROACHES IN TEXT STEGANOGRAPHYIJNSA Journal
 
L10: queries and indices
L10: queries and indicesL10: queries and indices
L10: queries and indicesmedialeg gmbh
 
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...cscpconf
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningIOSR Journals
 
Machine Learning based Text Classification introduction
Machine Learning based Text Classification introductionMachine Learning based Text Classification introduction
Machine Learning based Text Classification introductionTreparel
 
Enriching search results using ontology
Enriching search results using ontologyEnriching search results using ontology
Enriching search results using ontologyIAEME Publication
 
Boosting the Capacity of Web based Steganography by Utilizing Html Space Code...
Boosting the Capacity of Web based Steganography by Utilizing Html Space Code...Boosting the Capacity of Web based Steganography by Utilizing Html Space Code...
Boosting the Capacity of Web based Steganography by Utilizing Html Space Code...ITIIIndustries
 
Performance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganographyPerformance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganographyjournalBEEI
 
Adding structure to unstructured content for enhanced findability hakan tylen
Adding structure to unstructured content for enhanced findability hakan tylenAdding structure to unstructured content for enhanced findability hakan tylen
Adding structure to unstructured content for enhanced findability hakan tylenDynamic People B.V.
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESijnlc
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...ijtsrd
 
Web_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibWeb_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibEl Habib NFAOUI
 
Top 10 cited articles in nlp
Top 10 cited articles in nlpTop 10 cited articles in nlp
Top 10 cited articles in nlpkevig
 

What's hot (19)

AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
 
Lecture Notes in Computer Science:
Lecture Notes in Computer Science:Lecture Notes in Computer Science:
Lecture Notes in Computer Science:
 
Drubbing an Audio Messages inside a Digital Image Using (ELSB) Method
Drubbing an Audio Messages inside a Digital Image Using (ELSB) MethodDrubbing an Audio Messages inside a Digital Image Using (ELSB) Method
Drubbing an Audio Messages inside a Digital Image Using (ELSB) Method
 
EXPERIMENTING WITH THE NOVEL APPROACHES IN TEXT STEGANOGRAPHY
EXPERIMENTING WITH THE NOVEL APPROACHES IN TEXT STEGANOGRAPHYEXPERIMENTING WITH THE NOVEL APPROACHES IN TEXT STEGANOGRAPHY
EXPERIMENTING WITH THE NOVEL APPROACHES IN TEXT STEGANOGRAPHY
 
L10: queries and indices
L10: queries and indicesL10: queries and indices
L10: queries and indices
 
Convolutional Neural Networks
Convolutional Neural Networks Convolutional Neural Networks
Convolutional Neural Networks
 
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern Mining
 
Machine Learning based Text Classification introduction
Machine Learning based Text Classification introductionMachine Learning based Text Classification introduction
Machine Learning based Text Classification introduction
 
Enriching search results using ontology
Enriching search results using ontologyEnriching search results using ontology
Enriching search results using ontology
 
Boosting the Capacity of Web based Steganography by Utilizing Html Space Code...
Boosting the Capacity of Web based Steganography by Utilizing Html Space Code...Boosting the Capacity of Web based Steganography by Utilizing Html Space Code...
Boosting the Capacity of Web based Steganography by Utilizing Html Space Code...
 
Performance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganographyPerformance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganography
 
Adding structure to unstructured content for enhanced findability hakan tylen
Adding structure to unstructured content for enhanced findability hakan tylenAdding structure to unstructured content for enhanced findability hakan tylen
Adding structure to unstructured content for enhanced findability hakan tylen
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
 
Web_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibWeb_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_Habib
 
ocr
ocrocr
ocr
 
Top 10 cited articles in nlp
Top 10 cited articles in nlpTop 10 cited articles in nlp
Top 10 cited articles in nlp
 

Edi text

  • 1. Text Mining Documents in Electronic Data Interchange Environment. Dr. Zakaria Suliman Zubi, Associate Professor, Computer Science Department, Faculty of Science, Sirt University, Sirt, Libya.
  • 2. Contents
      1. Abstract
      2. Introduction
      3. Types of Text Mining
      4. Types of Information and Methods
      5. Methods and Algorithms Used
      6. Types of Outputs
      7. Applications of Text Mining in EDI Databases
      8. Experimental Results
      9. Conclusion
  • 3. Abstract
      1. The Internet is a huge source of electronic text documents in many languages.
      2. Electronic documents can be interchanged over the web via Electronic Data Interchange (EDI) environments.
      3. Text data can be exchanged over the web in an EDI format such as X12.
      4. The EDI format can be transformed and stored in a database.
      5. The EDI database is normalized and mapped into a flat file, in a form such as a spreadsheet.
      6. Text mining using a clustering method was applied.
      7. The k-means algorithm was used with the Euclidean distance measure.
      8. We generated a dataset using a text mining application called WEKA to show some experimental results.
  • 5. Introduction
      1. The Internet is a huge source of electronic documents in many languages.
      2. Electronic documents may contain text, images, audio, and video.
      3. Text documents may contain text in Latin-script languages such as English, French, and Spanish, or in non-Latin-script languages such as Arabic, Chinese, Japanese, and Indian languages.
      4. The text content of an electronic document is usually its most significant part, which makes applying text mining or information retrieval approaches to it reasonable.
      5. Electronic Data Interchange (EDI) is another approach to interchanging electronic documents over the web.
  • 6. Introduction (continued)
      6. EDI is becoming progressively more significant as an easy mechanism for organizations to manage, buy, sell, and trade information. ANSI has approved a set of EDI standards known as the X12 standards.
      7. The X12 standards define how the electronic documents are represented.
      8. Agreeing on these standards is a necessary condition for any two organizations to start business transactions.
      9. The EDI format can be transformed and stored in a database, which gives the life cycle: EDI documents, to database, to text mining.
  • 7. Introduction (continued)
      10. The EDI database is normalized and mapped into a flat file, in a form such as a spreadsheet.
      11. Text mining using a clustering method is then applied.
      12. The k-means algorithm is used with the Euclidean distance measure.
      13. We generate a dataset using a text mining application called WEKA to show some experimental results.
  • 9. Types of Text Mining
      The purposes of using text mining or data mining are:
      - To improve customer acquisition and retention.
      - To reduce fraud.
      - To identify internal inefficiencies and then revamp operations.
      - To map the unexplored environment of the Internet.
      The major types of tools used in text mining are:
      I. Artificial neural networks
      II. Decision trees
      III. Genetic algorithms
      IV. Rule induction
      V. Nearest neighbor method
      VI. Data visualization
  • 11. Types of Information and Methods
      Text mining usually produces five types of information:
      1. Associations: arise when occurrences are linked in a single event.
      2. Sequences: events linked over time.
      3. Classifications: can help us discover the characteristics of customers who are likely to leave, and provide a model that can be used to predict who they are.
      4. Forecasting: estimates the future value of continuous variables, such as sales figures, based on patterns within the data.
      5. Clustering: one of the essential methods used in text mining approaches to discover different groupings within the data.
  • 12. Types of Information and Methods (continued)
      Clustering:
      1. Is an unsupervised learning process applied to the text data: it needs no pre-labeled examples, only pre-specified parameters such as the number of clusters.
      2. We use a common partitioning method, the k-means algorithm.
      3. We calculate distances from the centroids using the Euclidean measure.
      4. The goal is to improve the performance of text mining on electronic documents.
  • 14. Methods and Algorithms Used
      1. Clustering using the k-means algorithm:
         - The k-means algorithm assigns each point to the cluster whose centroid is nearest.
         - The centroid is the average of all the points in the cluster; that is, its coordinates are the arithmetic mean, for each dimension separately, over all the points in the cluster.
         - For example, if the data set has three dimensions and a cluster has two points, X = (x1, x2, x3) and Y = (y1, y2, y3), then the centroid Z = (z1, z2, z3) has z1 = (x1 + y1)/2, z2 = (x2 + y2)/2, and z3 = (x3 + y3)/2.
      The algorithm steps are:
         1. Input: D = {d1, d2, ..., dn}; k = the number of clusters.
         2. Select k document vectors as the initial centroids of the k clusters.
         3. Repeat:
         4.    Select one vector d from the remaining documents.
         5.    Compute the similarities between d and the k centroids.
         6.    Put d in the closest cluster and recompute the centroids.
         7. Until the centroids do not change.
         8. Output: k clusters of documents.
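The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the slides: it uses the common batch variant (all documents are reassigned before the centroids are recomputed) rather than the one-document-at-a-time update described in steps 4 to 6, it takes the first k documents as the initial centroids, and the sample vectors are made up.

```python
import math

def centroid(points):
    """Arithmetic mean of the points, computed per dimension."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def euclidean(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kmeans(docs, k, max_iter=100):
    # Assumption: the first k documents serve as the initial centroids.
    centroids = [tuple(d) for d in docs[:k]]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for d in docs:
            # Put d in the cluster whose centroid is nearest.
            nearest = min(range(k), key=lambda j: euclidean(d, centroids[j]))
            clusters[nearest].append(d)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new_centroids = [centroid(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:  # stop when the centroids do not change
            break
        centroids = new_centroids
    return clusters, centroids

# Made-up three-dimensional document vectors.
docs = [(1.0, 1.0, 0.0), (9.0, 8.0, 1.0), (1.0, 2.0, 0.0), (8.0, 9.0, 1.0)]
clusters, centroids = kmeans(docs, k=2)
```

With this sample, the two low-valued vectors end up in one cluster and the two high-valued vectors in the other, and each centroid is the per-dimension mean of its cluster, as in the Z = (z1, z2, z3) example.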
  • 15. Methods and Algorithms Used (continued)
      2. Bag-of-words documents: representing the electronic documents in the EDI database as bags of words leads to the following features:
         - A text document is represented by the words it contains (and their occurrences), e.g., "Lord of the rings" → {"the", "lord", "rings", "of"}. This representation is highly efficient, which makes learning far simpler and easier. The order of the words is not important for certain applications.
         - Stemming, which identifies a word by its root, is also applied, e.g., flying, flew → fly. It is used to reduce dimensionality.
         - Stop words are removed, since the most common words are unlikely to help text mining, e.g., "the", "a", "an", "you", etc.
         - Each document is represented by the set of its word frequencies and the categories it belongs to.
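As a rough sketch of the bag-of-words idea, the snippet below counts the words of a document and optionally drops stop words. The tokenizer (lowercase ASCII letters only) and the tiny stop-word set, which holds just the four examples named on the slide, are assumptions made for illustration. Stemming is left out because it would need a stemmer the slides do not specify.

```python
import re
from collections import Counter

# Assumption: only the example stop words from the slide; real lists are longer.
STOP_WORDS = frozenset({"the", "a", "an", "you"})

def bag_of_words(text, stop_words=frozenset()):
    """Return word -> occurrence count, ignoring word order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

bag = bag_of_words("Lord of the rings")
filtered = bag_of_words("Lord of the rings", STOP_WORDS)
```

Here `bag` holds the four words of the slide's example with a count of one each, while `filtered` no longer contains "the".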
  • 16. Methods and Algorithms Used (continued)
      3. Text representation of EDI documents:
         - An EDI text document is represented as a bag of words, in which the words appear independently and their order is not considered.
         - Each word corresponds to a dimension in the resulting data space, and each document becomes a vector with a non-negative value on each dimension. We also remove stop words.
         - We use the frequency of each term as its weight, which means that terms that appear more frequently are treated as more important and more descriptive of the document.
         - Let D = {d1, ..., dn} be a set of documents and T = {t1, ..., tm} the set of distinct terms occurring in D.
         - A document d is represented as a vector td. Let tf(d, t) denote the frequency of term t ∈ T in document d ∈ D. Then the vector representation of a document d is td = (tf(d, t1), ..., tf(d, tm)).
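The vector td = (tf(d, t1), ..., tf(d, tm)) can be built as sketched below. The whitespace tokenizer, the alphabetical ordering of T, and the two sample documents are assumptions made for the example; the slides do not fix any of them.

```python
from collections import Counter

def term_vectors(documents):
    """Return the sorted term set T and one term-frequency vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    terms = sorted({t for doc in tokenized for t in doc})  # T = {t1, ..., tm}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)                      # tf(d, t) for this document
        vectors.append([counts[t] for t in terms]) # missing terms count as 0
    return terms, vectors

# Made-up documents standing in for EDI text.
docs = ["payment order payment", "invoice order"]
terms, vectors = term_vectors(docs)
```

Every vector has the same length m, one entry per distinct term, so any two documents can be compared dimension by dimension.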
  • 17. Methods and Algorithms Used (continued)
      4. Distance measures: a distance measure maps the distance between the representative descriptions of two objects into a single numeric value, which depends on two factors: the properties of the two objects and the measure itself. Let x and y be any two objects (electronic documents) in a data set and d(x, y) be the distance between them. To qualify as a metric, a measure d must satisfy the following four conditions:
         1. The distance between any two points must be non-negative, that is, d(x, y) ≥ 0.
         2. The distance between two objects must be zero if and only if the two objects are identical, that is, d(x, y) = 0 if and only if x = y.
         3. Distance must be symmetric, that is, the distance from x to y is the same as the distance from y to x: d(x, y) = d(y, x).
         4. The measure must satisfy the triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).
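The four conditions can be checked numerically on a small sample of points, as in the sketch below. A check on a finite sample can only expose violations; it does not prove that a measure is a metric. The helper name, the tolerance, and the sample points are assumptions chosen for illustration.

```python
import math
from itertools import product

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def satisfies_metric_conditions(d, points, tol=1e-9):
    """Test the four metric conditions on every pair and triple of sample points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                      # 1. non-negativity
            return False
        if (d(x, y) <= tol) != (x == y):     # 2. zero exactly when x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:     # 3. symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # 4. triangle inequality
            return False
    return True

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0)]
euclidean_ok = satisfies_metric_conditions(euclidean, points)
squared_ok = satisfies_metric_conditions(squared_euclidean, points)
```

On these points the Euclidean distance passes all four checks, while the squared Euclidean distance fails the triangle inequality: from (0, 0) to (6, 8) it gives 100, but going through (3, 4) gives 25 + 25 = 50.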
  • 18. Methods and Algorithms Used (continued)
      Euclidean distance measure:
         - A widely used measure in text clustering problems.
         - It is also the default distance measure used with the k-means algorithm.
         - It is the ordinary distance between two points and can easily be measured with a ruler in two- or three-dimensional space.
         - Given two documents da and db represented by their term vectors ta and tb respectively, the Euclidean distance between the two documents is calculated, writing x = ta and y = tb, as $\mathrm{distance}(x, y) = \left( \sum_i (x_i - y_i)^2 \right)^{1/2}$.
         - Squared Euclidean distance is used when we want to place greater weight on objects that are further apart. It is computed as $\mathrm{distance}(x, y) = \sum_i (x_i - y_i)^2$.
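A short worked example may make the two formulas concrete. The term vectors below are made up for illustration and do not come from the EDI dataset.

```latex
% Hypothetical term vectors: x = t_a = (0, 1, 2), y = t_b = (1, 1, 0)
\begin{align*}
\sum_i (x_i - y_i)^2 &= (0 - 1)^2 + (1 - 1)^2 + (2 - 0)^2 = 1 + 0 + 4 = 5 \\
\text{Squared Euclidean distance} &= 5 \\
\text{Euclidean distance} &= 5^{1/2} \approx 2.236
\end{align*}
```

The squared form skips the square root, so large per-term differences dominate the result more strongly than in the ordinary Euclidean distance.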
  • 19. Methods and Algorithms Used (continued)
      5. Dataset: we propose a collection of EDI electronic text data on banking transactions, gathered from EDI databases.
         1. The EDI text data is collected and aggregated into seven main categories.
         2. We create an EDI corpus.
         3. This corpus represents the dataset, which consists of 2000 EDI electronic documents of different lengths that belong to seven categories.
         4. The categories are transaction divisions in the X12 standard EDI format.
  • 20. Methods and Algorithms Used (continued)
      6. Translating EDI to databases:
         1) This is an essential process for storing and accessing our transaction information in a valid database format, one that supports all common database formats.
         2) It can be done by translating an EDI message in the X12 standard format into a variety of transactions.
         3) Each transaction file format is identified by a mapping file and can be transformed into a flat file format.
         4) Mapping the translated EDI message into the database constructs a database, as illustrated in the figure.
         5) This flat file can be in any common form, for instance comma-separated format. The redundancy of data in the flat table can be clearly seen from a small portion of an EDI file.
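The translation from an EDI message to a comma-separated flat file might look roughly like the sketch below. It assumes the conventional X12 delimiters ("~" ending each segment and "*" separating elements), skips the mapping files mentioned above, and uses a simplified, made-up message fragment rather than a complete interchange. The function names and the generic column names are hypothetical.

```python
import csv
import io

def x12_to_rows(message, segment_terminator="~", element_separator="*"):
    """Split an X12-style message into one list of elements per segment."""
    rows = []
    for raw in message.split(segment_terminator):
        segment = raw.strip()
        if segment:
            rows.append(segment.split(element_separator))
    return rows

def rows_to_csv(rows):
    """Write the segments as a flat comma-separated table, padding short rows."""
    width = max(len(r) for r in rows)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical generic column names; a real mapping file would name the fields.
    writer.writerow(["segment"] + [f"element_{i}" for i in range(1, width)])
    for r in rows:
        writer.writerow(r + [""] * (width - len(r)))
    return buffer.getvalue()

# Simplified, illustrative fragment; not a complete or validated interchange.
message = "ST*820*0001~BPR*C*1500.00*C*ACH~TRN*1*REF12345~SE*4*0001~"
rows = x12_to_rows(message)
flat_file = rows_to_csv(rows)
```

Each segment becomes one row of the flat table, with the segment identifier in the first column and empty cells where a segment has fewer elements than the widest one.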
  • 23. Types of Outputs
      By applying text mining to EDI data, a retailer can identify the demographics of its customers, such as gender, marital status, and number of children, as well as the products that they buy. This information can have a tremendous positive impact on operations by decreasing inventory movement and by placing inventory in locations where it is likely to sell. Typical outputs include:
      1. Buying patterns of customers; associations among customer demographic characteristics; predictions of which customers will respond to which mailings.
      2. Patterns of fraudulent credit card usage; identities of "loyal" customers; credit card spending by customer groups; predictions of customers who are likely to change their credit card affiliation.
      3. Predictions of which customers will buy new insurance policies; behavior patterns of risky customers; expectations of fraudulent behavior.
      4. Characterizations of patient behavior to predict the frequency of office visits.
  • 25. Applications of Text Mining in EDI Databases
      - Text mining and EDI applications can be used in a variety of sectors: consumer product sales, finance, manufacturing, health, banking, insurance, and utilities.
      - Customer-based businesses can benefit from these technologies (text mining and EDI) if the following types of data are available in the EDI databases:
         1) demographics, such as age, gender, and marital status;
         2) banking and economic status, such as salary, profession, and household income;
         3) geographic details, such as city, state, or region;
         4) other demographics, such as education and hobbies.
  • 27. Experimental Results
      - We generate the dataset using a common text mining application called WEKA, applying the k-means algorithm with the Euclidean distance measure to assign every item to its nearest cluster center.
      - The EDI banking text dataset is normalized into a flat file and represented in comma-separated format. A primary dataset is created.
      - The resulting data file consists of 600 instances.
      - We use the k-means algorithm to cluster the customers in the bank dataset and to characterize the resulting customer segments.
      - Since k-means accepts only numerical attribute values, we convert the dataset into the standard spreadsheet format and convert the categorical attributes to binary.
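Converting categorical attributes to binary can be sketched as below: each distinct value of a categorical attribute becomes its own 0/1 column. The attribute names and values are invented for the example and are not taken from the bank dataset, and the function is a hypothetical helper rather than a WEKA filter.

```python
def to_binary_attributes(records, categorical):
    """Replace each categorical attribute with one 0/1 column per distinct value."""
    values = {name: sorted({r[name] for r in records}) for name in categorical}
    encoded = []
    for r in records:
        row = {}
        for name, value in r.items():
            if name in values:
                for v in values[name]:
                    row[f"{name}={v}"] = 1 if value == v else 0
            else:
                row[name] = float(value)  # numeric attributes pass through
        encoded.append(row)
    return encoded

# Hypothetical customer records.
records = [
    {"age": 48, "region": "TOWN", "married": "NO"},
    {"age": 40, "region": "RURAL", "married": "YES"},
]
encoded = to_binary_attributes(records, ["region", "married"])
```

After the conversion every attribute is numeric, so the Euclidean distance between two instances is well defined.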
  • 28. Experimental Results (continued)
      - The WEKA k-means algorithm uses the Euclidean distance measure to compute distances between instances and clusters.
      - We enter seven as the number of clusters, together with a seed value used to generate the random numbers for the initial assignment of instances to clusters.
      - WEKA reports the centroid of every cluster, as well as statistics on the number and percentage of instances assigned to the different clusters.
      - Cluster centroids are the mean vectors for each cluster (so each dimension value in the centroid corresponds to the mean value for that dimension in the cluster).
      - In the final data portion, each instance has its assigned cluster as the last attribute value.
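The role of the seed and the per-cluster statistics can be illustrated with a small Python sketch. This is not WEKA's code or API: the helper names are hypothetical, the seeded choice of starting points is only one possible way to initialize, and the cluster assignments are made up.

```python
import random
from collections import Counter

def initial_centroids(instances, k, seed):
    """Pick k starting instances reproducibly: the same seed gives the same picks."""
    rng = random.Random(seed)
    return rng.sample(instances, k)

def cluster_summary(assignments):
    """Return cluster -> (number of instances, percentage of all instances)."""
    counts = Counter(assignments)
    total = len(assignments)
    return {c: (n, 100.0 * n / total) for c, n in sorted(counts.items())}

# Made-up cluster labels for eight instances.
assignments = [0, 1, 1, 2, 1, 0, 2, 1]
summary = cluster_summary(assignments)
```

Fixing the seed makes a run repeatable, which matters because k-means results depend on the initial assignment.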
  • 32. Conclusion
      In this paper, we combined two common technologies, EDI and text mining. EDI, together with a transformation process, provided the database storage. We used text mining to extract useful, hidden, and previously unknown patterns or information from EDI text databases, and we focused only on the most interesting point of intersection between EDI and text mining. The EDI file was translated into a normalized flat file in comma-separated format. The flat file represented the EDI database, for which we proposed a dataset of EDI electronic text data on banking transactions gathered from EDI databases. For text mining, we suggested clustering with the k-means algorithm, using the Euclidean distance measure to assign every item to its nearest cluster center. In the experimental section, we used a text mining application called WEKA to present our results visually.