Lecture 10 on Data Mining and Agents



  1. 1. Data Mining Techniques <ul><li>Cluster Analysis </li></ul><ul><li>Induction </li></ul><ul><li>Neural Networks </li></ul><ul><li>OLAP </li></ul><ul><li>Data Visualization </li></ul>
  2. 2. Association Rule <ul><li>An association rule is a rule that implies an association relationship among a set of objects in a database (such as “occur together” or “one implies the other”). </li></ul><ul><li>Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X => Y, where X and Y are sets of items. </li></ul><ul><li>The intuitive meaning of such a rule is that transactions in the database that contain X tend to contain Y. </li></ul>
  3. 3. Support <ul><li>The support of an item set S is the percentage of the transactions in the transaction set T that contain S. </li></ul><ul><li>If U is the set of all transactions in T that contain all items in S, then support(S) = (|U| / |T|) * 100%, where |U| and |T| are the number of elements in U and T, respectively. </li></ul>
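The support definition above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `support` and the four transactions in `T` are assumptions made up for the example.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Percentage of the transactions in T that contain every item of S."""
    s = frozenset(itemset)
    if not transactions:
        raise ValueError("T must contain at least one transaction")
    # U is the subset of T whose transactions contain all items in S.
    u = [t for t in transactions if s <= t]
    return len(u) / len(transactions) * 100.0


# Hypothetical transaction set T (illustrative data, not from the lecture).
T = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "cheese"}),
    frozenset({"cheese", "coke"}),
    frozenset({"bread", "butter", "ham"}),
]

print(support({"bread"}, T))            # 3 of the 4 transactions contain bread
print(support({"bread", "butter"}, T))  # 2 of the 4 contain both items
```

Representing each transaction as a `frozenset` makes the "contains all items in S" test a single subset comparison (`s <= t`).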
  4. 4. Confidence <ul><li>The confidence of a candidate rule X => Y is calculated as support(X ∪ Y) / support(X), where X ∪ Y is the item set containing all items of X and Y. </li></ul><ul><li>The confidence of the rule X => Y is the percentage of transactions containing the items in X that also contain the items in Y. </li></ul>
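Substituting the support definition shows why the two statements of confidence agree. In the sketch below, $X \cup Y$ is the item set containing all items of X and Y, and $U_S$ is an assumed shorthand (extending the U of the support slide) for the set of transactions in T that contain all items in S. The factor $|T|$ cancels, so confidence is the fraction of transactions containing X that also contain Y.

```latex
\mathrm{confidence}(X \Rightarrow Y)
  = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}
  = \frac{|U_{X \cup Y}| / |T|}{|U_{X}| / |T|}
  = \frac{|U_{X \cup Y}|}{|U_{X}|}
```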
  5. 5. Example: Association Rule <ul><li>In a store we might have I={cheese,ham,bread,butter,salt,coke} </li></ul><ul><li>A transaction could look like: t={bread,butter} for a customer who bought bread and butter. </li></ul><ul><li>An association rule could look like the following: bread=>butter with support 60% and confidence 80%, meaning that 60% of all transactions contain both bread and butter, and 80% of the customers who bought bread also bought butter. </li></ul>
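To make the 60% and 80% figures concrete, here is a small Python sketch. The 20 transactions are hypothetical, chosen only so that the counts reproduce the numbers on the slide; the helper name `count` is likewise an assumption.

```python
# Hypothetical data: 12 bread+butter baskets, 3 bread+cheese, 5 cheese+coke.
transactions = (
    [{"bread", "butter"}] * 12
    + [{"bread", "cheese"}] * 3
    + [{"cheese", "coke"}] * 5
)


def count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


n_total = len(transactions)                         # |T|
n_bread = count({"bread"}, transactions)            # transactions containing X
n_both = count({"bread", "butter"}, transactions)   # transactions containing X and Y

support_pct = 100 * n_both / n_total      # share of all transactions
confidence_pct = 100 * n_both / n_bread   # share of the bread transactions

print(f"bread => butter: support {support_pct:.0f}%, confidence {confidence_pct:.0f}%")
```

Note that support is measured against all 20 transactions, while confidence is measured only against the 15 that contain bread.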
  6. 6. Apriori Algorithm <ul><li>Find all combinations of items that have transaction support above minimum support. Call those combinations frequent itemsets. </li></ul><ul><li>Use the frequent itemsets to generate the desired rules. </li></ul>
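The second step, turning frequent itemsets into rules, can be sketched as follows. This is one straightforward way to do it, assuming the first step has already produced a mapping from each frequent itemset to its support count; the function name `generate_rules`, the `min_confidence` parameter, and the example counts are all hypothetical.

```python
from itertools import combinations


def generate_rules(supports, min_confidence):
    """Return (X, Y, confidence) for every rule X => Y that meets min_confidence.

    supports maps each frequent itemset (a frozenset) to its support count.
    Every non-empty subset of a frequent itemset is itself frequent, so the
    lookup supports[x] below always succeeds for a complete mapping.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty X and a non-empty Y
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                y = itemset - x
                # The ratio of counts equals the ratio of support percentages.
                confidence = sup / supports[x]
                if confidence >= min_confidence:
                    rules.append((x, y, confidence))
    return rules


# Hypothetical support counts out of 20 transactions.
supports = {
    frozenset({"bread"}): 15,
    frozenset({"butter"}): 13,
    frozenset({"bread", "butter"}): 12,
}

for x, y, conf in generate_rules(supports, min_confidence=0.8):
    print(sorted(x), "=>", sorted(y), round(conf, 3))
```

Support counts are used instead of percentages here so that the confidence ratio is computed from integers.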
  7. 7. Apriori Algorithm (cont’d) <ul><li>Pass 1 </li></ul><ul><ul><li>Generate the candidate itemsets in C_1 </li></ul></ul><ul><ul><li>Save the frequent itemsets in L_1 </li></ul></ul><ul><li>Pass k </li></ul><ul><li>1. Generate the candidate itemsets in C_k from the frequent itemsets in L_{k-1} </li></ul><ul><ul><li>a. Join L_{k-1} with L_{k-1}, as follows: insert into C_k select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1} from L_{k-1} p, L_{k-1} q where p.item_1 = q.item_1, ..., p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1} </li></ul></ul>
  8. 8. Apriori Algorithm (cont’d) <ul><ul><li>b. Generate all (k-1)-subsets from the candidate itemsets in C_k </li></ul></ul><ul><ul><li>c. Prune all candidate itemsets from C_k where some (k-1)-subset of the candidate itemset is not in the frequent itemset L_{k-1} </li></ul></ul><ul><li>2. Scan the transaction database to determine the support for each candidate itemset in C_k </li></ul><ul><li>3. Save the frequent itemsets in L_k </li></ul>
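The passes described on the two slides above can be sketched in Python as follows, with candidate sets C_k and frequent sets L_k held in ordinary sets and dicts. This is a compact illustration under stated assumptions, not an optimized or authoritative implementation: the function name `apriori`, the use of an absolute count for `min_support` (the slides define support as a percentage), and the example transactions are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    min_support is an absolute transaction count (an assumption of this
    sketch; the slides state support as a percentage).
    """
    transactions = [frozenset(t) for t in transactions]

    def frequent(candidates):
        # Scan the database and keep the candidates that meet minimum support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Pass 1: C_1 is every single item; L_1 keeps the frequent ones.
    items = set().union(*transactions)
    level = frequent({frozenset([i]) for i in items})
    result = dict(level)

    k = 2
    while level:
        prev = sorted(tuple(sorted(s)) for s in level)
        candidates = set()
        # Join: pair itemsets of L_{k-1} that share their first k-2 items,
        # with the last item of p ordered before the last item of q.
        for p, q in combinations(prev, 2):
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(frozenset(p + (q[-1],)))
        # Prune: drop candidates that have a (k-1)-subset not in L_{k-1}.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Scan and save the frequent itemsets as L_k.
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


# Hypothetical transactions (illustrative data, not from the lecture).
T = [
    {"bread", "butter", "cheese"},
    {"bread", "butter"},
    {"bread", "cheese"},
    {"butter", "cheese"},
    {"bread", "butter", "cheese"},
]
for itemset, n in sorted(apriori(T, min_support=3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), n)
```

The prune step is what keeps the candidate sets small: an itemset can only be frequent if all of its subsets are frequent, so any candidate with an infrequent (k-1)-subset is discarded before the database scan.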
  9. 9. Smart Web Search Agents <ul><li>Data Search Engines >> Information Search Agents </li></ul><ul><li>Traditional searching on the Web is done using one of the following three: </li></ul><ul><ul><li>Directories (Yahoo, Lycos, etc.) </li></ul></ul><ul><ul><li>Search Engines (AltaVista, NorthernLight, etc.) </li></ul></ul><ul><ul><li>Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.) </li></ul></ul><ul><li>All of these involve keyword searches. Drawbacks: not easily personalized; too many results (although many give relevancy factors) </li></ul>
  10. 10. <ul><li>- local cache databases (containing frequently asked queries/results; possibly updated periodically - nightly!) </li></ul><ul><li>- local cache information base (containing mined information and discovered knowledge for efficient personal use) </li></ul><ul><li>- domain-based agents (e.g. Job Search; Sports-NBA Stats, Bibliography-Digital Libraries) </li></ul>
  11. 11. Intelligent Tools for E-Business <ul><li>Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems </li></ul><ul><li>Learning Algorithms, Heuristic Searching </li></ul><ul><li>Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery </li></ul><ul><li>Prediction & Time Series Analysis </li></ul><ul><li>Information Retrieval, Intelligent User Interface </li></ul><ul><li>Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems </li></ul>
  12. 12. Enhancing E-Business Process Through Data Mining <ul><li>Quality of discovered knowledge </li></ul><ul><ul><li>Having the right data </li></ul></ul><ul><ul><li>Having appropriate data mining tools!!! </li></ul></ul><ul><li>Traditional Data Mining Tools </li></ul><ul><ul><li>Simple query and reporting </li></ul></ul><ul><ul><li>Visualization-driven data exploration tools, OLAP </li></ul></ul><ul><ul><li>Discovery process is user-driven </li></ul></ul>
  13. 13. Intelligent Data Mining Tools <ul><li>Automate the process of discovering patterns/knowledge in data </li></ul><ul><li>Require hypothesis, exploration </li></ul><ul><li>Derive business knowledge (patterns) from data </li></ul><ul><li>Combine business knowledge of users with results of discovery algorithms </li></ul>
  14. 14. Intelligent Information Agents <ul><li>The Data Mining Problem: </li></ul><ul><ul><li>Clustering/ Classification </li></ul></ul><ul><ul><li>Association </li></ul></ul><ul><ul><li>Sequencing </li></ul></ul><ul><li>Viewed as an Optimization Problem </li></ul><ul><li>Tools: Genetic Algorithms </li></ul>
  15. 15. Fuzzy Rule Discovery <ul><li>Rule discovery: the discovery of associations between business events, i.e. which items are purchased together </li></ul><ul><li>To support flexible querying and intelligent searching, fuzzy queries are used to uncover potentially valuable knowledge </li></ul><ul><li>A fuzzy query uses fuzzy terms like tall, small, and near to define linguistic concepts and formulate a query </li></ul><ul><li>The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segmentation in the data </li></ul>
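A fuzzy term such as tall can be modelled as a membership function that returns a degree between 0 and 1 instead of a yes/no answer. The Python sketch below shows one common way to do this with a piecewise-linear ramp. The breakpoints (160 cm and 190 cm), the 0.5 threshold, the function names `tall` and `fuzzy_query`, and the sample records are all illustrative assumptions, not values from the lecture.

```python
def tall(height_cm, low=160.0, high=190.0):
    """Degree (0..1) to which a height counts as 'tall'.

    Piecewise-linear ramp; the 160 cm and 190 cm breakpoints are
    illustrative assumptions, not values from the lecture.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def fuzzy_query(records, membership, key, threshold=0.5):
    """Return (degree, record) pairs with degree >= threshold, best match first."""
    scored = [(membership(r[key]), r) for r in records]
    matches = [(d, r) for d, r in scored if d >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Hypothetical records.
people = [
    {"name": "A", "height": 158},
    {"name": "B", "height": 175},
    {"name": "C", "height": 192},
    {"name": "D", "height": 181},
]

for degree, person in fuzzy_query(people, tall, "height"):
    print(person["name"], round(degree, 2))
```

Unlike a crisp query such as `height > 180`, the fuzzy query ranks borderline records by how well they match the linguistic concept rather than excluding them outright.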