
Lecture 10 on Data Mining and Agents

Transcript

• 1. Data Mining Techniques
• Cluster Analysis
• Induction
• Neural Networks
• OLAP
• Data Visualization
• 2. Association Rule
• An association rule is a rule that implies certain association relationships among a set of objects in a database (such as “occur together” or “one implies the other”).
• Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X => Y, where X and Y are sets of items.
• The intuitive meaning of such a rule is that transactions in the database that contain X tend to contain Y.
• 3. Support
• The support of an item set S is the percentage of those transactions in T which contain S.
• If U is the set of all transactions that contain all items in S, then support(S) = (|U| / |T|) *100%, where |U| and |T| are the number of elements in U and T, respectively.
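The support definition above can be sketched in a few lines of Python. This is only an illustration: the function name `support` and the toy transaction list are assumptions made for the example, not part of the lecture.

```python
def support(itemset, transactions):
    """Percentage of transactions in T that contain every item of S."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    # U: the transactions that contain all items in S
    matching = [t for t in transactions if itemset <= set(t)]
    return 100.0 * len(matching) / len(transactions)


# Hypothetical toy database T (invented for illustration)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "coke"},
    {"cheese", "ham"},
    {"bread", "butter", "salt"},
]

print(support({"bread", "butter"}, transactions))  # 60.0
```

Here |U| = 3 and |T| = 5, so the support of {bread, butter} is 60%.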
• 4. Confidence
• The confidence of a candidate rule X => Y is calculated as support(X ∪ Y) / support(X).
• The confidence of the rule X => Y is the percentage of transactions containing the items in X that also contain the items in Y.
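A minimal sketch of the confidence calculation for a rule X => Y, expressed as a percentage. The helper names and the toy database are assumptions for illustration only.

```python
def support(itemset, transactions):
    """Percentage of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    count = sum(1 for t in transactions if itemset <= set(t))
    return 100.0 * count / len(transactions)


def confidence(x, y, transactions):
    """Confidence of X => Y: support(X ∪ Y) / support(X), as a percentage."""
    support_x = support(x, transactions)
    if support_x == 0.0:
        return 0.0  # X never occurs, so the rule cannot be evaluated
    return 100.0 * support(set(x) | set(y), transactions) / support_x


# Hypothetical toy database (invented for illustration)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "coke"},
    {"cheese", "ham"},
    {"bread", "butter", "salt"},
]

print(confidence({"bread"}, {"butter"}, transactions))  # 75.0
```

In this toy data, four transactions contain bread and three of those also contain butter, so the confidence of bread => butter is 75%.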
• 5. Example: Association Rule
• In a store we might have I={cheese,ham,bread,butter,salt,coke}
• A transaction could look like t={bread,butter} for a customer who bought bread and butter.
• An association rule might look like bread=>butter with support 60% and confidence 80%: 60% of all transactions contain both bread and butter, and 80% of the customers who bought bread also bought butter.
• 6. Apriori Algorithm
• Find all combinations of items that have transaction support above minimum support. Call those combinations frequent itemsets.
• Use the frequent itemsets to generate the desired rules.
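The second step, turning frequent itemsets into rules, might look like the sketch below. It assumes the frequent itemsets and their supports (in percent) are already available in a dictionary; the name `generate_rules`, the `min_confidence` threshold, and the sample supports are all assumptions made for this example.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Build rules X => Y from frequent itemsets.

    `frequent` maps frozenset itemsets to their support in percent.
    Returns a list of (X, Y, support, confidence) tuples.
    """
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty X and a non-empty Y
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                y = itemset - x
                # Every subset of a frequent itemset is itself frequent,
                # so its support is expected to be in the dictionary.
                conf = 100.0 * sup / frequent[x]
                if conf >= min_confidence:
                    rules.append((x, y, sup, conf))
    return rules


# Hypothetical supports (invented for illustration)
frequent = {
    frozenset({"bread"}): 80.0,
    frozenset({"butter"}): 60.0,
    frozenset({"bread", "butter"}): 60.0,
}

for x, y, sup, conf in generate_rules(frequent, min_confidence=70.0):
    print(sorted(x), "=>", sorted(y), "support", sup, "confidence", conf)
```

With these numbers, bread => butter has confidence 75% and butter => bread has confidence 100%, so both pass a 70% threshold.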
• 7. Apriori Algorithm (cont’d)
• Pass 1
• Generate the candidate itemsets in C_1
• Save the frequent itemsets in L_1
• Pass k
• Generate the candidate itemsets in C_k from the frequent itemsets in L_{k-1}
• Join L_{k-1} with L_{k-1}, as follows: insert into C_k select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1} from L_{k-1} p, L_{k-1} q where p.item_1 = q.item_1, ..., p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
• 8. Apriori Algorithm (cont’d)
• Candidate generation, continued: generate all (k-1)-subsets of each candidate itemset in C_k
• Candidate generation, continued: prune from C_k every candidate itemset that has some (k-1)-subset not in the frequent itemsets L_{k-1}
• Then scan the transaction database to determine the support of each remaining candidate itemset in C_k
• Finally, save the candidates that meet the minimum support as the frequent itemsets in L_k
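The passes described above (join, prune, scan, save) can be sketched as follows. This is a compact illustration, not an optimized implementation. Itemsets are represented as sorted tuples so that the join condition on item positions is easy to express. The function name `apriori`, the `min_support` parameter (in percent), and the toy data are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset as sorted tuple: support in %} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    def keep_frequent(candidates):
        # Scan the database and keep the candidates that meet min_support.
        result = {}
        for cand in candidates:
            count = sum(1 for t in transactions if set(cand) <= t)
            sup = 100.0 * count / n
            if sup >= min_support:
                result[cand] = sup
        return result

    # Pass 1: C_1 holds every single item; L_1 keeps the frequent ones.
    c1 = sorted({(item,) for t in transactions for item in t})
    level = keep_frequent(c1)
    all_frequent = dict(level)

    k = 2
    while level:
        prev = sorted(level)
        # Join: p and q agree on their first k-2 items and
        # p's last item sorts before q's last item.
        candidates = [p + (q[-1],) for p in prev for q in prev
                      if p[:-1] == q[:-1] and p[-1] < q[-1]]
        # Prune: drop candidates with a (k-1)-subset that is not in L_{k-1}.
        candidates = [c for c in candidates
                      if all(s in level for s in combinations(c, k - 1))]
        # Scan and save L_k.
        level = keep_frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Hypothetical toy database (invented for illustration)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "coke"},
    {"cheese", "ham"},
    {"bread", "butter", "salt"},
]

for itemset, sup in sorted(apriori(transactions, min_support=40.0).items()):
    print(itemset, sup)
```

With a 40% minimum support, the frequent itemsets in this toy data are {bread}, {butter}, {cheese}, and {bread, butter}. The loop stops at pass 3 because the single frequent 2-itemset cannot be joined with anything.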
• 9. Smart Web Search Agents
• Data Search Engines >> Information Search Agents
• - Traditional searching on the Web is done using one of the following three:
• - Directories (Yahoo, Lycos, etc)
• - Search Engines (AltaVista, NorthernLight, etc)
• - Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc)
• All of these involve keyword searches. Drawbacks: not easily personalized, and too many results (although many give relevancy factors)
• 10.
• - local cache databases (containing frequently asked queries/results; possibly updated periodically - nightly!)
• - local cache information base (containing mined information and discovered knowledge for efficient personal use)
• - domain-based agents (e.g. Job Search; Sports-NBA Stats, Bibliography-Digital Libraries)
• 11. Intelligent Tools for E-Business
• Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems
• Learning Algorithms, Heuristic Searching
• Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery
• Prediction & Time Series Analysis
• Information Retrieval, Intelligent User Interface
• Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems
• 12. Enhancing E-Business Process Through Data Mining
• Quality of discovered knowledge
• Having right data
• Having appropriate data mining tools!!!
• Simple query and reporting
• Visualization driven data exploration tools, OLAP
• Discovery process is user driven
• 13. Intelligent Data Mining Tools
• Automate the process of discovering patterns/knowledge in data
• Require hypothesis, exploration
• Derive business knowledge (patterns) from data
• Combine business knowledge of users with results of discovery algorithms
• 14. Intelligent Information Agents
• The Data Mining Problem:
• Clustering/Classification
• Association
• Sequencing
• Viewed as an Optimization Problem
• Tools: Genetic Algorithms
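As one way to picture a data mining task viewed as an optimization problem, the sketch below uses a very small genetic algorithm to place k cluster centroids for one-dimensional data by minimizing the sum of squared distances to the nearest centroid. Everything here is an assumption made for illustration: the cost function, the selection, crossover and mutation choices, the parameter values, and the sample points. The lecture does not specify any of them.

```python
import random


def cost(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((p - c) ** 2 for c in centroids) for p in points)


def genetic_clustering(points, k, generations=50, pop_size=20, seed=0):
    """Search for k centroids with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    lo, hi = min(points), max(points)
    # Each individual (chromosome) is a list of k candidate centroids.
    population = [[rng.uniform(lo, hi) for _ in range(k)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda ind: cost(ind, points))
        history.append(cost(population[0], points))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0][:]]           # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: take each gene from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasionally nudge a centroid with Gaussian noise.
            child = [g + rng.gauss(0.0, 0.1 * (hi - lo))
                     if rng.random() < 0.2 else g
                     for g in child]
            children.append(child)
        population = children
    best = min(population, key=lambda ind: cost(ind, points))
    history.append(cost(best, points))
    return best, history


# Hypothetical data with two visible groups (invented for illustration)
points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
best, history = genetic_clustering(points, k=2)
print(sorted(best), history[-1])
```

Because the best individual is always copied unchanged into the next generation, the best cost recorded in `history` never gets worse from one generation to the next.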
• 15. Fuzzy Rules Discovering
• Rule discovery: the discovery of associations between business events, e.g. which items are purchased together
• To support flexible querying and intelligent searching, fuzzy queries are used to uncover potentially valuable knowledge
• A fuzzy query uses fuzzy terms like tall, small, and near to define linguistic concepts and formulate a query
• The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segments in the data
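A fuzzy query over terms such as tall and near could be sketched as below. The membership functions, their breakpoints (160 to 190 cm, 1 to 10 km), the use of min for fuzzy AND, and the sample records are all assumptions chosen for illustration; the lecture does not define them.

```python
def ramp_up(x, low, high):
    """Membership that rises linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def tall(height_cm):
    # Assumed breakpoints: not tall at all below 160 cm, fully tall from 190 cm.
    return ramp_up(height_cm, 160.0, 190.0)


def near(distance_km):
    # Assumed breakpoints: fully near within 1 km, not near beyond 10 km.
    return 1.0 - ramp_up(distance_km, 1.0, 10.0)


def fuzzy_query(records, threshold):
    """Return (degree, name) for records that are 'tall AND near', best first.

    Fuzzy AND is taken as the minimum of the two membership degrees.
    """
    scored = [(min(tall(r["height_cm"]), near(r["distance_km"])), r["name"])
              for r in records]
    return sorted([(d, name) for d, name in scored if d >= threshold],
                  reverse=True)


# Hypothetical records (invented for illustration)
records = [
    {"name": "A", "height_cm": 190.0, "distance_km": 1.0},
    {"name": "B", "height_cm": 175.0, "distance_km": 5.5},
    {"name": "C", "height_cm": 150.0, "distance_km": 0.5},
]

print(fuzzy_query(records, threshold=0.5))  # [(1.0, 'A'), (0.5, 'B')]
```

Unlike a crisp query such as height > 180, each record matches to a degree between 0 and 1, and the threshold controls how strict the query is.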