Lecture 10 on Data Mining and Agents
  • 1. Data Mining Techniques
    • Cluster Analysis
    • Induction
    • Neural Networks
    • OLAP
    • Data Visualization
  • 2. Association Rule
    • An association rule is a rule that implies certain association relationships among a set of objects (such as “occur together” or “one implies the other”) in a database.
    • Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X ⇒ Y, where X and Y are disjoint sets of items.
    • The intuitive meaning of such a rule is that transactions in the database that contain X tend to also contain Y.
  • 3. Support
    • The support of an itemset S is the percentage of transactions in the transaction set T that contain S.
    • If U is the set of all transactions that contain all items in S, then support(S) = (|U| / |T|) × 100%, where |U| and |T| are the numbers of elements in U and T, respectively.
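A minimal Python sketch of the support formula, assuming each transaction is represented as a frozenset of item names. The function name `support` and the transaction set `T` are hypothetical illustrations, not data from the lecture.

```python
from typing import FrozenSet, List


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Return support(S) as a percentage: (|U| / |T|) * 100."""
    if not transactions:
        raise ValueError("the transaction set T must not be empty")
    # U = the transactions that contain every item in S (subset test with <=)
    u = [t for t in transactions if itemset <= t]
    return len(u) / len(transactions) * 100.0


# Hypothetical transaction set T (illustrative data only).
T = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "salt"}),
    frozenset({"cheese", "coke"}),
    frozenset({"bread", "ham"}),
]
print(support(frozenset({"bread", "butter"}), T))  # 2 of 4 transactions -> 50.0
```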
  • 4. Confidence
    • The confidence of a candidate rule X ⇒ Y is calculated as support(X ∪ Y) / support(X).
    • The confidence of rule X ⇒ Y represents the percentage of transactions containing all items in X that also contain all items in Y.
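The confidence formula can be sketched the same way. This hypothetical helper works with fractions rather than percentages (the ratio is identical either way) and uses an illustrative transaction set. None of the names come from the lecture.

```python
from typing import FrozenSet, List


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """confidence(X => Y) = support(X ∪ Y) / support(X)."""
    sup_x = support(x, transactions)
    if sup_x == 0:
        raise ValueError("X occurs in no transaction, so confidence is undefined")
    return support(x | y, transactions) / sup_x


# Hypothetical transactions (illustrative only).
T = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "salt"}),
    frozenset({"cheese", "coke"}),
    frozenset({"bread", "ham"}),
]
# 2 of the 3 transactions containing bread also contain butter.
print(confidence(frozenset({"bread"}), frozenset({"butter"}), T))
```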
  • 5. Example: Association Rule
    • In a store, we might have the item set I = {cheese, ham, bread, butter, salt, coke}.
    • A transaction could look like t = {bread, butter} for a customer who bought bread and butter.
    • An association rule could then be bread ⇒ butter with support 60% and confidence 80%: 60% of all transactions contain both bread and butter, and 80% of the customers who bought bread also bought butter.
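Confidence is a ratio of supports. That means the two figures in this example together determine how often bread appears at all:

```latex
\begin{aligned}
\text{support}(\{\text{bread}, \text{butter}\}) &= 0.60,\\
\text{confidence}(\text{bread} \Rightarrow \text{butter}) &= \frac{\text{support}(\{\text{bread}, \text{butter}\})}{\text{support}(\{\text{bread}\})} = 0.80,\\
\Rightarrow\quad \text{support}(\{\text{bread}\}) &= \frac{0.60}{0.80} = 0.75 .
\end{aligned}
```

So, under these figures, 75% of all transactions contain bread.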
  • 6. Apriori Algorithm
    • Find all combinations of items that have transaction support above minimum support. Call those combinations frequent itemsets.
    • Use the frequent itemsets to generate the desired rules.
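Step 2 (rule generation) can be sketched as follows, assuming step 1 has already produced a dictionary that maps each frequent itemset to its support. For each frequent itemset S and each non-empty proper subset X, the candidate rule X ⇒ S - X has confidence support(S) / support(X). Because every subset of a frequent itemset is itself frequent, support(X) is always available. The function `generate_rules`, the `min_conf` threshold, and the support values shown are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (X, Y, confidence)


def generate_rules(freq: Dict[FrozenSet[str], float], min_conf: float) -> List[Rule]:
    """Emit every rule X => S - X whose confidence reaches min_conf."""
    rules: List[Rule] = []
    for itemset, sup in freq.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty left and right side
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                lhs = frozenset(x)
                # lhs is a subset of a frequent itemset, so it is in freq too.
                conf = sup / freq[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical frequent itemsets with supports (fractions), as if from step 1.
freq = {
    frozenset({"bread"}): 0.75,
    frozenset({"butter"}): 0.5,
    frozenset({"bread", "butter"}): 0.5,
}
for lhs, rhs, conf in generate_rules(freq, min_conf=0.7):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

With `min_conf=0.7`, only butter ⇒ bread (confidence 1.0) survives. Bread ⇒ butter has confidence 0.5 / 0.75 ≈ 0.67.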
  • 7. Apriori Algorithm(cont’d)
    • Pass 1
      • Generate the candidate itemsets in C_1
      • Save the frequent itemsets in L_1
    • Pass k
      • Step 1: Generate the candidate itemsets in C_k from the frequent itemsets in L_{k-1}
        • Join L_{k-1} with L_{k-1} as follows: insert into C_k select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1} from L_{k-1} p, L_{k-1} q where p.item_1 = q.item_1, ..., p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
  • 8. Apriori Algorithm(cont’d)
    • Pass k, Step 1 (cont’d)
      • Generate all (k-1)-subsets of each candidate itemset in C_k
      • Prune every candidate itemset from C_k that has some (k-1)-subset not in the frequent itemsets L_{k-1}
    • Pass k, Step 2: Scan the transaction database to determine the support for each candidate itemset in C_k
    • Pass k, Step 3: Save the frequent itemsets in L_k
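The passes above can be combined into a compact, in-memory Python sketch of the level-wise search. It assumes the whole transaction set fits in memory as frozensets and that items are comparable strings (needed for the `<` in the join condition). The join mirrors the SQL by pairing sorted (k-1)-itemsets that agree on their first k-2 items. The prune step then drops any candidate with an infrequent (k-1)-subset. `apriori`, `min_support` (a fraction), and the sample data are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: List[FrozenSet[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent itemset mapped to its support (as a fraction)."""
    n = len(transactions)

    def frequent(candidates: Iterable[FrozenSet[str]]) -> Dict[FrozenSet[str], float]:
        # Scan the transaction database to count each candidate's support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in counts:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Pass 1: candidate 1-itemsets C_1; the frequent ones form L_1.
    level = frequent({frozenset({item}) for t in transactions for item in t})
    result = dict(level)
    k = 2
    while level:
        # Join L_{k-1} with itself: same first k-2 items, p's last item < q's last item.
        prev = sorted(tuple(sorted(s)) for s in level)
        candidates = set()
        for i, p in enumerate(prev):
            for q in prev[i + 1:]:
                if p[:-1] == q[:-1] and p[-1] < q[-1]:
                    cand = frozenset(p + (q[-1],))
                    # Prune: every (k-1)-subset must be in L_{k-1}.
                    if all(frozenset(s) in level for s in combinations(cand, k - 1)):
                        candidates.add(cand)
        level = frequent(candidates)  # L_k
        result.update(level)
        k += 1
    return result


# Hypothetical transactions (illustrative only).
T = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "salt"}),
    frozenset({"cheese", "coke"}),
    frozenset({"bread", "ham"}),
]
for itemset, sup in sorted(apriori(T, 0.5).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Counting each pass in memory keeps the sketch short. A database-backed version would replace `frequent` with one scan of the stored transactions per pass.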
  • 9. Smart Web Search Agents
    • Data Search Engines >> Information Search Agents
    • Traditional Web search uses one of three kinds of tools:
      • Directories (Yahoo, Lycos, etc.)
      • Search engines (AltaVista, NorthernLight, etc.)
      • Metasearch engines (MetaCrawler, SavvySearch, AskJeeves, etc.)
    • All of these rely on keyword search. Drawbacks: results are not easily personalized, and searches return too many results (although many tools give relevance scores)
  • 10.
    • Local cache databases (containing frequently asked queries and their results, possibly updated periodically, e.g., nightly)
    • Local cache information bases (containing mined information and discovered knowledge for efficient personal use)
    • Domain-based agents (e.g., Job Search; Sports: NBA stats; Bibliography: digital libraries)
  • 11. Intelligent Tools for E-Business
    • Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems
    • Learning Algorithms, Heuristic Searching
    • Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery
    • Prediction & Time Series Analysis
    • Information Retrieval, Intelligent User Interface
    • Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems
  • 12. Enhancing E-Business Process Through Data Mining
    • Quality of discovered knowledge
      • Having right data
      • Having appropriate data mining tools
    • Traditional Data Mining Tools
      • Simple query and reporting
      • Visualization-driven data exploration tools and OLAP
      • The discovery process is user-driven
  • 13. Intelligent Data Mining Tools
    • Automate the process of discovering patterns/knowledge in data
    • Require hypothesis, exploration
    • Derive business knowledge (patterns) from data
    • Combine business knowledge of users with results of discovery algorithms
  • 14. Intelligent Information Agents
    • The Data Mining Problem:
      • Clustering/Classification
      • Association
      • Sequencing
    • Viewed as an Optimization Problem
    • Tools: Genetic Algorithms
  • 15. Fuzzy Rule Discovery
    • Rule discovery: the discovery of associations between business events, e.g., which items are purchased together
    • To support flexible querying and intelligent searching, fuzzy queries have been developed to uncover potentially valuable knowledge
    • A fuzzy query uses fuzzy terms such as tall, small, and near to define linguistic concepts and formulate a query
    • Automated search for fuzzy rules is carried out by discovering fuzzy clusters or segments in the data
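A fuzzy query can be sketched with simple membership functions. In this sketch, "tall" and "near" are modeled as linear ramps, and the fuzzy AND takes the minimum of the two membership degrees (a common choice, assumed here rather than taken from the lecture). Records whose combined degree reaches a threshold are returned. The breakpoints (160 to 190 cm, 1 to 10 km), the field names, and the customer records are all hypothetical.

```python
from typing import Dict, List, Tuple


def ramp_up(x: float, a: float, b: float) -> float:
    """Membership that rises linearly from 0 (x <= a) to 1 (x >= b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def tall(height_cm: float) -> float:
    # Hypothetical breakpoints: not tall at or below 160 cm, fully tall from 190 cm.
    return ramp_up(height_cm, 160.0, 190.0)


def near(distance_km: float) -> float:
    # Hypothetical breakpoints: fully near within 1 km, not near from 10 km on.
    return 1.0 - ramp_up(distance_km, 1.0, 10.0)


def fuzzy_query(records: List[Dict], threshold: float) -> List[Tuple[str, float]]:
    """Return (name, degree) for records matching "tall AND near" to at least threshold."""
    matches = []
    for r in records:
        # Fuzzy AND modeled as the minimum of the membership degrees.
        degree = min(tall(r["height_cm"]), near(r["distance_km"]))
        if degree >= threshold:
            matches.append((r["name"], degree))
    return matches


# Hypothetical customer records (illustrative only).
customers = [
    {"name": "ann", "height_cm": 185.0, "distance_km": 2.0},
    {"name": "bob", "height_cm": 170.0, "distance_km": 1.0},
    {"name": "cy", "height_cm": 195.0, "distance_km": 12.0},
]
print(fuzzy_query(customers, threshold=0.5))
```

Unlike a crisp query such as `height_cm > 180`, each record gets a degree of match between 0 and 1. The threshold then decides how loosely "tall" and "near" are interpreted.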