    Edi text: Presentation Transcript

    • Text Mining Documents in Electronic Data Interchange Environment. Dr. Zakaria Suliman Zubi, Associate Professor, Computer Science Department, Faculty of Science, Sirt University, Sirt, Libya.
    • Contents
      1. Abstract
      2. Introduction
      3. Types of Text Mining
      4. Types of Information and Methods
      5. Methods and Algorithms Used
      6. Types of Outputs
      7. Applications of Text Mining in EDI Databases
      8. Experimental Results
      9. Conclusion
    • Abstract
      1. The Internet is a huge source of electronic text documents in many languages.
      2. Electronic documents can be interchanged through the web via Electronic Data Interchange (EDI) environments.
      3. Text data can be exchanged on the web in an EDI format such as the X12 formats.
      4. The EDI format can be transformed and stored in a database.
      5. The EDI database is normalized and mapped into a flat file in a form such as a spreadsheet.
      6. Text mining using a clustering method was applied.
      7. The k-means algorithm was used with the Euclidean distance measure.
      8. We generate a dataset using a text mining application called WEKA to show some experimental results.
    • Introduction
      1. The Internet is a huge source of electronic documents in many languages.
      2. Electronic documents may contain text, images, audio, and video.
      3. Text documents may be written in Latin-script languages such as English, French, and Spanish, or in non-Latin-script languages such as Arabic, Chinese, Japanese, and Indian languages.
      4. The text content is usually the most valuable part of an electronic document, which makes text mining and information retrieval approaches a reasonable choice.
      5. Electronic Data Interchange (EDI) is another approach for interchanging electronic documents through the web.
    • Introduction (cont.)
      6. EDI is becoming progressively more significant as an easy mechanism for organizations to manage, buy, sell, and trade information. ANSI has approved a set of EDI standards known as the X12 standards.
      7. The X12 standards represent the electronic documents.
      8. These electronic standards are a necessary condition for any two organizations to start business transactions.
      9. The EDI format can be transformed and stored in a database.
      [Figure: EDI documents-to-database-to-text-mining life cycle]
    • Introduction (cont.)
      10. The EDI database is normalized and mapped into a flat file in a form such as a spreadsheet.
      11. Text mining using a clustering method is applied.
      12. The k-means algorithm is used with the Euclidean distance measure.
      13. We generate a dataset using a text mining application called WEKA to show some experimental results.
    • Types of Text Mining
      The purposes of using text mining or data mining:
      - To improve customer acquisition and retention.
      - To reduce fraud.
      - To identify internal inefficiencies and then revamp operations.
      - To map the unexplored environment of the Internet.
      The major types of tools used in text mining are:
      I. Artificial neural networks
      II. Decision trees
      III. Genetic algorithms
      IV. Rule induction
      V. Nearest neighbor method
      VI. Data visualization
    • Types of Information and Methods
      Text mining usually produces five types of information:
      1. Associations: arise when occurrences are linked in a single event.
      2. Sequences: events linked over time, based on the events that happen.
      3. Classifications: classification can help us discover the characteristics of customers who are likely to leave, and provides a model that can be used to predict who they are.
      4. Forecasting: estimates the future value of continuous variables, such as sales figures, based on patterns within the data.
      5. Clustering: one of the essential methods used in text mining approaches to discover different groupings within the data.
    • Types of Information and Methods (cont.)
      Clustering:
      1. Is an unsupervised learning process applied to the text data, without depending on pre-specified knowledge such as class labels.
      2. We use a common partitioning method, the k-means algorithm.
      3. We calculate each document's distance from the centroid using the Euclidean measure.
      4. The aim is to improve the handling of text in electronic documents.
    • Methods and Algorithms Used
      1. Clustering using the k-means algorithm:
      The k-means algorithm assigns each point to the cluster whose centroid is nearest. The centroid is the average of all the points in the cluster; that is, its coordinates are the arithmetic mean, for each dimension separately, over all the points in the cluster.
      For example, suppose the data set has three dimensions and the cluster has two points, $X = (x_1, x_2, x_3)$ and $Y = (y_1, y_2, y_3)$. Then the centroid is $Z = (z_1, z_2, z_3)$, where $z_1 = (x_1 + y_1)/2$, $z_2 = (x_2 + y_2)/2$, and $z_3 = (x_3 + y_3)/2$.
      The algorithm steps are:
      1. Input: D := {d1, d2, ..., dn}; k := the number of clusters;
      2. Select k document vectors as the initial centroids of the k clusters;
      3. Repeat:
      4. Select one vector d from the remaining documents;
      5. Compute the similarities between d and the k centroids;
      6. Put d in the closest cluster and recompute the centroids;
      7. Until the centroids do not change;
      8. Output: k clusters of documents.
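The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WEKA implementation used in the talk: it uses the common batch variant (assign every document, then recompute all centroids) rather than the one-vector-at-a-time update in steps 4 to 6, and the names `euclidean`, `centroid`, `kmeans`, and the `seed` and `max_iter` parameters are hypothetical.

```python
import math
import random


def euclidean(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(points):
    """Arithmetic mean of the points, computed per dimension."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def kmeans(docs, k, seed=0, max_iter=100):
    """Batch k-means sketch: returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Step 2: pick k document vectors as the initial centroids.
    centroids = rng.sample(docs, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Steps 4-6 (batch form): put every document in its closest cluster.
        clusters = [[] for _ in range(k)]
        for d in docs:
            nearest = min(range(k), key=lambda j: euclidean(d, centroids[j]))
            clusters[nearest].append(d)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new_centroids = [
            centroid(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        # Step 7: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# The two-point example from the slide, with concrete numbers:
print(centroid([(1, 2, 3), (3, 4, 5)]))  # (2.0, 3.0, 4.0)
```

The seeded random generator is a design choice that makes the initial centroid selection repeatable between runs.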
    • Methods and Algorithms Used (cont.)
      2. Bag-of-words documents:
      Representing the electronic documents in the EDI database as bags of words leads to the following features:
      - A text document is represented by the words it contains (and their occurrences), e.g., "Lord of the rings" → {"the", "lord", "rings", "of"}. This representation is highly efficient, which makes learning far simpler and easier. The order of the words is not important for certain applications.
      - Stemming, which identifies a word by its root, is also applied, e.g., flying, flew → fly. It is used to reduce dimensionality.
      - Stop words are also removed, since the most common words are unlikely to help text mining, e.g., "the", "a", "an", "you", etc.
      - Each document is represented by the set of its word frequencies and the categories it belongs to.
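A minimal sketch of the bag-of-words idea, assuming a simple lowercase alphabetic tokenizer and using the four stop words named on the slide as the default list. The function name `bag_of_words` is hypothetical, and stemming is left out because the slide does not say which stemmer was used.

```python
import re
from collections import Counter

# Stop words taken from the slide's examples; a real list would be longer.
STOP_WORDS = frozenset({"the", "a", "an", "you"})


def bag_of_words(text, stop_words=STOP_WORDS):
    """Return word -> occurrence count, ignoring word order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


# Without stop-word removal, the slide's example keeps all four words.
print(sorted(bag_of_words("Lord of the rings", stop_words=())))
# ['lord', 'of', 'rings', 'the']
```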
    • Methods and Algorithms Used (cont.)
      3. Text representation in EDI documents:
      An EDI text document is represented as a bag of words, in which words appear independently and their order is not considered. Each word corresponds to a dimension in the resulting data space, and each document becomes a vector of non-negative values, one per dimension. We also remove stop words.
      We use the frequency of each term as its weight, which means that terms appearing more frequently are treated as more important and more descriptive for the document.
      Let $D = \{d_1, \ldots, d_n\}$ be a set of documents and $T = \{t_1, \ldots, t_m\}$ the set of distinct terms occurring in $D$. A document $d$ is represented as a vector $\vec{t_d}$. Let $tf(d, t)$ denote the frequency of term $t \in T$ in document $d \in D$. Then the vector representation of a document $d$ is $\vec{t_d} = (tf(d, t_1), \ldots, tf(d, t_m))$.
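The vector representation can be illustrated as follows. This sketch assumes documents are already tokenized into word lists with stop words removed; the function name `term_frequency_vectors` and the sample tokens are hypothetical. The terms are sorted only so that the dimension order is reproducible.

```python
def term_frequency_vectors(documents):
    """Build T (sorted distinct terms) and one tf vector per document."""
    terms = sorted({t for doc in documents for t in doc})
    # Component i of a document's vector is tf(d, t_i).
    vectors = [[doc.count(t) for t in terms] for doc in documents]
    return terms, vectors


docs = [["pay", "bank", "pay"], ["bank", "transfer"]]
terms, vectors = term_frequency_vectors(docs)
print(terms)    # ['bank', 'pay', 'transfer']
print(vectors)  # [[1, 2, 0], [1, 0, 1]]
```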
    • Methods and Algorithms Used (cont.)
      4. Distance measures:
      A distance measure maps the distance between the representative descriptions of two objects into a single numeric value, which depends on two factors: the properties of the two objects and the measure itself.
      Let $x$ and $y$ be any two objects (electronic documents) in a data set, and let $d(x, y)$ be the distance between them. To qualify as a metric, a measure $d$ must satisfy the following four conditions:
      1. The distance between any two points must be non-negative, that is, $d(x, y) \ge 0$.
      2. The distance between two objects must be zero if and only if the two objects are identical, that is, $d(x, y) = 0$ if and only if $x = y$.
      3. Distance must be symmetric, that is, the distance from $x$ to $y$ is the same as the distance from $y$ to $x$: $d(x, y) = d(y, x)$.
      4. The measure must satisfy the triangle inequality, $d(x, z) \le d(x, y) + d(y, z)$.
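The four conditions can be checked numerically on a small sample of points. This is only a spot check on the given points, not a proof; the helper name `is_metric_on` and the tolerance `tol` are assumptions. The example also shows that squared Euclidean distance fails the triangle inequality, so it is a distance measure but not a metric.

```python
import itertools
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def is_metric_on(points, d, tol=1e-9):
    """Check the four metric conditions for d on a finite set of points."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:                       # 1. non-negativity
            return False
        if (d(x, y) <= tol) != (x == y):      # 2. zero iff identical
            return False
        if abs(d(x, y) - d(y, x)) > tol:      # 3. symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # 4. triangle inequality
            return False
    return True


sample = [(0, 0), (3, 4), (6, 8), (1, 1)]
print(is_metric_on(sample, euclidean))          # True
print(is_metric_on(sample, squared_euclidean))  # False
```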
    • Methods and Algorithms Used (cont.)
      Euclidean distance measure:
      A widely used measure in text clustering problems. It is also the default distance measure used with the k-means algorithm. It is the ordinary distance between two points and can easily be measured with a ruler in two- or three-dimensional space.
      Given two documents $d_a$ and $d_b$ represented by their term vectors $t_a$ and $t_b$ respectively, the Euclidean distance between the two documents is computed over the vector components. With $x = t_a$ and $y = t_b$:
      $\text{distance}(x, y) = \left( \sum_i (x_i - y_i)^2 \right)^{1/2}$
      Squared Euclidean distance is used when we want to place greater weight on objects that are further apart. It is computed as:
      $\text{distance}(x, y) = \sum_i (x_i - y_i)^2$
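Both formulas translate directly into code. The sketch below assumes the two documents are given as equal-length term-frequency vectors; the function names and sample vectors are hypothetical.

```python
import math


def squared_euclidean(x, y):
    """Sum over i of (x_i - y_i)^2."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def euclidean(x, y):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(x, y))


t_a = (0, 0)
t_b = (3, 4)
print(euclidean(t_a, t_b))          # 5.0
print(squared_euclidean(t_a, t_b))  # 25
```

Squaring grows faster than the plain distance, so a pair of documents that is three times further apart contributes nine times as much, which is the "greater weight" effect the slide describes.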
    • Methods and Algorithms Used (cont.)
      5. Dataset:
      We propose a collection of banking transactions in the form of EDI electronic text data gathered from EDI databases.
      1. The EDI text data is collected and aggregated into seven main categories.
      2. We create an EDI corpus.
      3. This corpus represents the dataset, which consists of 2000 EDI electronic documents of different lengths belonging to seven categories.
      4. The categories are transaction divisions in the X12 standard EDI format.
    • Methods and Algorithms Used (cont.)
      6. Translating EDI to databases:
      1) Translation is an essential process for storing and accessing our transaction information in a valid database format; it supports all common database formats.
      2) It can be done by translating an EDI message in the X12 standard formats into a variety of transactions.
      3) Each transaction file format is identified by a mapping file and can be transformed into a flat file format.
      4) Mapping the translated EDI message into the database constructs a database, as illustrated in the figure.
      5) The flat file can be in any common form, for instance comma-separated format.
      The redundancy of data in the flat table can be clearly seen from a small portion of an EDI file.
      [Table: a small portion of an EDI file as a flat table]
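As a rough illustration of the EDI-to-flat-file step, the sketch below splits an X12-style message into segments and elements and writes them as comma-separated rows. It is an assumption-laden toy, not the translator used in the talk: it assumes `*` as the element separator and `~` as the segment terminator, the sample message is made up, and the function names and `element_N` column headers are hypothetical. A real translator would apply a mapping file per transaction type.

```python
import csv
import io


def x12_to_rows(message, segment_terminator="~", element_separator="*"):
    """Split an X12-style message into one list of elements per segment."""
    rows = []
    for segment in message.split(segment_terminator):
        segment = segment.strip()
        if segment:
            rows.append(segment.split(element_separator))
    return rows


def rows_to_csv(rows):
    """Write the segments as a flat comma-separated table, padding short rows."""
    width = max(len(r) for r in rows)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["segment_id"] + [f"element_{i}" for i in range(1, width)])
    for r in rows:
        writer.writerow(r + [""] * (width - len(r)))
    return buf.getvalue()


# Made-up sample message for illustration only.
sample = "ST*820*0001~BPR*C*150.00*C*ACH~SE*3*0001~"
print(rows_to_csv(x12_to_rows(sample)))
```

Padding every row to the same width is what makes the result a flat table, and it also makes the empty, redundant cells of the flat form visible.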
    • Types of Outputs
      By applying text mining to EDI data, a retailer can identify the demographics of its customers, such as gender, marital status, and number of children, and the products that they buy. This information can have a tremendous positive impact on operations by decreasing inventory movement and by placing inventory in locations where it is likely to sell.
      1. Buying patterns of customers; associations among customer demographic characteristics; predictions of which customers will respond to which mailings.
      2. Patterns of fraudulent credit card usage; identities of "loyal" customers; credit card spending by customer groups; predictions of customers who are likely to change their credit card affiliation.
      3. Predictions of which customers will buy new insurance policies; behavior patterns of risky customers; expectations of fraudulent behavior.
      4. Characterizations of patient behavior to predict the frequency of office visits.
    • Applications of Text Mining in EDI Databases
      Text mining and EDI applications can be used in a variety of sectors: consumer product sales, finance, manufacturing, health, banking, insurance, and utilities.
      We can benefit from these technologies (text mining and EDI) if the following types of data are available in EDI databases for text mining applications in customer-based businesses:
      1) demographics, such as age, gender, and marital status;
      2) banking and economic status, such as salary, profession, and household income;
      3) geographic details, such as city, state, or region;
      4) other demographics, such as education or hobbies.
    • Add your company slogan Experimental Results We generate the dataset by using Euclidean distance measures in k-mean algorithms to assign every item to its nearest cluster center using a common text mining application called WEKA. The EDI banking text dataset normalized in a flat file and represented in a comma-separated format. A primary dataset will be created. The resulting data file consists of 600 instances. We will use the K-means algorithm to cluster the customers in the bank dataset, to characterize the resulting customer data segments. Since K-mean permit numerical values for attributes, so we convert the dataset into the standard spreadsheet format and convert categorical attributes to binary. www.themegallery.com LOGO
    • Experimental Results (cont.)
      The WEKA k-means algorithm uses the Euclidean distance measure to compute distances between instances and clusters.
      We specify seven clusters and a seed value; the seed is used to generate the random numbers for the initial assignment of instances to clusters.
      WEKA reports the centroid of every cluster, as well as statistics on the number and percentage of instances assigned to each cluster.
      Cluster centroids are the mean vectors of the clusters, so each dimension value in a centroid corresponds to the mean value of that dimension in the cluster.
      In the final data portion, each instance has its assigned cluster as the last attribute value.
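The shape of this output (cluster index appended as the last attribute, plus per-cluster counts and percentages) can be mimicked with a short sketch. This does not call WEKA and is not its implementation; the function names, the two sample centroids, and the sample instances are assumptions for illustration.

```python
import math
from collections import Counter


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def assign_clusters(instances, centroids):
    """Append the index of the nearest centroid as each instance's last attribute."""
    labeled = []
    for inst in instances:
        nearest = min(
            range(len(centroids)), key=lambda j: euclidean(inst, centroids[j])
        )
        labeled.append(list(inst) + [nearest])
    return labeled


def cluster_summary(labeled):
    """Map cluster index -> (number of instances, percentage of all instances)."""
    counts = Counter(row[-1] for row in labeled)
    total = len(labeled)
    return {c: (n, 100.0 * n / total) for c, n in sorted(counts.items())}


instances = [(0, 0), (1, 0), (10, 10), (9, 10)]
centroids = [(0, 0), (10, 10)]
labeled = assign_clusters(instances, centroids)
print(labeled)                   # [[0, 0, 0], [1, 0, 0], [10, 10, 1], [9, 10, 1]]
print(cluster_summary(labeled))  # {0: (2, 50.0), 1: (2, 50.0)}
```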
    • Add your company slogan ConclusionIn this paper, we have used a homogenous mixture of two common technologies suchas EDI and Text mining.EDI with a transformation process represented the database storage.We used Text Mining to extract the useful hidden and previously unknown patterns orinformation from EDI text databases.We also circled only the most interesting intersection point that correlates between EDIand text mining.In EDI format, the file was translated into a normalized flat file in a comma-separatedformat.The flat file represented the EDI database where we propose a dataset collected from abanking transaction of EDI electronic text data which been gathered from EDI databases.In text mining, we suggest to use k-mean algorithm in clustering method to calculate theEuclidean distance measures to assign every item to its nearest cluster center. In the experimental section, we used a text mining application program solution calledWEKA to represent our results in a visual fashion. www.themegallery.com LOGO
    • Thank you!