Data Mining

Transcript

  • 1. Data Mining Using IBM Intelligent Miner. Presented by: Qiyan (Jennifer) Huang
  • 2. Outline <ul><li>Introduction </li></ul><ul><li>Mining Process </li></ul><ul><li>Main Functionalities of Intelligent Miner </li></ul><ul><li>Other Data Mining Products </li></ul><ul><li>Data Mining and Privacy </li></ul><ul><li>Summary </li></ul><ul><li>References </li></ul>
  • 3. What is Data Mining? <ul><li>Data mining: discovering interesting patterns from large amounts of data </li></ul><ul><ul><li>Also known as knowledge discovery (mining) in databases (KDD), data/pattern analysis, information harvesting, business intelligence, etc. </li></ul></ul>
  • 4. Evolution of Database Technology <ul><li>1960s: </li></ul><ul><ul><li>Data collection, database creation </li></ul></ul><ul><li>1970s: </li></ul><ul><ul><li>Relational data model, relational DBMS implementation </li></ul></ul><ul><li>1980s to present: </li></ul><ul><ul><li>RDBMS, advanced data models </li></ul></ul><ul><li>1990s to 2000s: </li></ul><ul><ul><li>Data mining and data warehousing, multimedia databases, and Web databases </li></ul></ul>
  • 5. Data Mining vs. Database Query <ul><li>Database query </li></ul><ul><ul><li>Find all customers who have purchased milk. </li></ul></ul><ul><ul><li>Identify customers who have purchased more than $10,000 in the last month. </li></ul></ul><ul><li>Data mining </li></ul><ul><ul><li>Find all items that are frequently purchased with milk (association rules). </li></ul></ul><ul><ul><li>Identify customers with similar buying habits (clustering). </li></ul></ul>
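The contrast on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `purchases` data, the function names, and the `min_count` threshold are all hypothetical and do not come from the slides or from Intelligent Miner.

```python
from collections import Counter

# Hypothetical toy data: customer -> list of baskets (sets of items).
purchases = {
    "ann": [{"milk", "bread"}, {"milk", "cereal"}],
    "bob": [{"beer", "chips"}],
    "cho": [{"milk", "bread", "eggs"}],
}

def customers_who_bought(item):
    """Database-style query: an exact filter over stored facts."""
    return sorted(c for c, baskets in purchases.items()
                  if any(item in b for b in baskets))

def items_bought_with(item, min_count=2):
    """Mining-style question: which items co-occur with `item` often enough?"""
    counts = Counter()
    for baskets in purchases.values():
        for basket in baskets:
            if item in basket:
                counts.update(basket - {item})
    return {other: n for other, n in counts.items() if n >= min_count}

print(customers_who_bought("milk"))  # ['ann', 'cho']
print(items_bought_with("milk"))     # {'bread': 2}
```

The first function answers a question whose form is known in advance; the second searches for a pattern (frequent co-occurrence) that nobody asked for by name.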
  • 6. Data Mining Process (KDD): Databases → Data Cleaning → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge. Source: J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2001
  • 7. About DB2 Intelligent Miner <ul><li>DB2 Intelligent Miner for Data “focused on the large-scale mining, such as large volumes of data, parallel data mining on Windows NT, Sun Solaris, and OS/390” – IBM </li></ul>
  • 8. Main Functionalities <ul><li>Cluster analysis </li></ul><ul><ul><li>Group data that share similar trends and patterns </li></ul></ul><ul><li>Classification </li></ul><ul><ul><li>Predict an outcome based on historical data </li></ul></ul><ul><li>Association analysis </li></ul><ul><ul><li>Find frequent patterns </li></ul></ul>
  • 18. This follows an example from Quinlan’s ID3 Classification
  • 20. Classification
  • 21. This follows an example from Quinlan’s ID3 Classification
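ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows that one selection step only; the records and attribute names are hypothetical toy data, not the example shown on the slide.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of `target` minus its weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy records, not taken from the slides.
rows = [
    {"student": "yes", "credit": "fair", "buys": "yes"},
    {"student": "yes", "credit": "excellent", "buys": "yes"},
    {"student": "no", "credit": "fair", "buys": "no"},
    {"student": "no", "credit": "excellent", "buys": "no"},
]

# ID3 would split first on the attribute with the largest gain.
best = max(["student", "credit"], key=lambda a: information_gain(rows, a, "buys"))
print(best)
```

In this toy data `student` separates the classes perfectly, so its gain is one full bit, while `credit` tells us nothing.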
  • 22. Association <ul><ul><li>Association rule: identifies relationships between items that occur together </li></ul></ul><ul><ul><li>Example </li></ul></ul><ul><ul><li>“30% of all transactions include a shirt, and 60% of those transactions also include a tie” </li></ul></ul><ul><ul><ul><li>Confidence is 60%: the share of shirt transactions that also include a tie </li></ul></ul></ul><ul><ul><ul><li>Support is the share of all transactions that include both a shirt and a tie: 30% × 60% = 18% </li></ul></ul></ul><ul><ul><ul><li>Lift = confidence / share of all transactions that include a tie; if ties appear in 30% of all transactions, lift = 60% / 30% = 2 </li></ul></ul></ul>
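A minimal sketch of how the three measures can be computed from raw transactions. The basket data is hypothetical, built so that 30% of transactions contain a shirt and 60% of those also contain a tie. Lift is taken here as confidence divided by the share of transactions that contain the rule head, which is an assumption about the definition in use.

```python
# Hypothetical baskets: 50 transactions, 15 with a shirt, 9 of those with a tie.
transactions = (
    [{"shirt", "tie"}] * 9
    + [{"shirt"}] * 6
    + [{"tie"}] * 6
    + [{"socks"}] * 29
)

def rule_metrics(transactions, body, head):
    """Return (support, confidence, lift) for the rule body => head."""
    n = len(transactions)
    body_count = sum(1 for t in transactions if body <= t)
    head_count = sum(1 for t in transactions if head <= t)
    both_count = sum(1 for t in transactions if (body | head) <= t)
    support = both_count / n               # share of all transactions with body and head
    confidence = both_count / body_count   # share of body transactions that also have head
    lift = confidence / (head_count / n)   # confidence relative to the head's base rate
    return support, confidence, lift

support, confidence, lift = rule_metrics(transactions, {"shirt"}, {"tie"})
print(f"support={support:.0%} confidence={confidence:.0%} lift={lift:.1f}")
# prints: support=18% confidence=60% lift=2.0
```

A lift above 1 means the head shows up more often alongside the body than it does across all transactions.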
  • 23. Association <ul><li>Support (%) Confidence (%) Type Lift Rule Body => Rule Head </li></ul><ul><li>5.5286 34.0800 + 2.7300 [203] + [1207] => [1716] </li></ul><ul><li>7.0388 34.1300 + 2.7400 [203] + [1719] => [1716] </li></ul><ul><li>5.4662 34.1700 + 2.7400 [202] + [802] => [1716] </li></ul><ul><li>5.8805 34.3400 + 2.7500 [203] + [802] => [1716] </li></ul><ul><li>5.0163 34.4900 + 2.7600 [203] + [705] => [1716] </li></ul><ul><li>7.1279 34.7400 + 2.7800 [202] + [1718] => [1716] </li></ul><ul><li>5.8226 34.7600 + 3.3900 [711] + [203] => [710] </li></ul><ul><li>5.0697 34.8300 + 2.7400 [202] + [1702] => [1703] </li></ul><ul><li>5.2836 34.8300 + 2.7400 [202] + [1207] => [1703] </li></ul><ul><li>5.4350 34.9400 + 3.4100 [201] + [711] => [710] </li></ul><ul><li>5.3459 35.0200 + 2.7600 [201] + [1702] => [1703] </li></ul>
  • 24. Data Mining Products <ul><li>More than 50 commercial data mining tools </li></ul><ul><li>Wide range of pricing </li></ul><ul><ul><li>SAS Institute’s Enterprise Miner ~ $80k </li></ul></ul><ul><ul><li>SPSS Inc. Clementine ~ $75k </li></ul></ul><ul><ul><li>IBM Intelligent Miner ~ $60k </li></ul></ul><ul><ul><li>Desktop products start at a few hundred dollars </li></ul></ul>
  • 25. Data Mining Products: Data Mining Product Comparison by Algorithm (columns: Algorithm, IBM, SAS, SPSS) Neural Network √ √ √; Decision Tree √ √ √; Kohonen Self-Organizing Map √ √; Clustering √ √; Association √ √; Nearest Neighbour √
  • 26. Data Mining & Privacy <ul><li>Release a limited subset of the data </li></ul><ul><ul><li>Hide attributes that are potentially related to personal information </li></ul></ul><ul><li>Release encrypted data </li></ul><ul><li>Audit to detect misuse of data </li></ul><ul><li>Set up a data mining controller </li></ul>
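The first measure on this slide, releasing only a limited subset of attributes, can be sketched as a simple projection. The attribute names and the choice of which ones count as personal are assumptions made for illustration; the slides do not list any.

```python
# Hypothetical set of attributes treated as personal information.
SENSITIVE_ATTRIBUTES = {"name", "phone", "address"}

def limited_view(records, hidden=SENSITIVE_ATTRIBUTES):
    """Return copies of the records with the hidden attributes left out."""
    return [{k: v for k, v in r.items() if k not in hidden} for r in records]

customers = [
    {"name": "Ann Lee", "phone": "555-0100", "city": "Calgary", "spend": 420.0},
    {"name": "Bob Roy", "phone": "555-0101", "city": "Regina", "spend": 95.5},
]

for row in limited_view(customers):
    print(row)
```

Dropping columns alone does not guarantee privacy, since the remaining attributes can still identify a person in combination, which is one reason the slide also lists auditing and a mining controller.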
  • 27. Summary <ul><li>Introduction to Data Mining </li></ul><ul><li>A KDD Data Mining Process </li></ul><ul><li>Functionalities of Intelligent Miner </li></ul><ul><li>Commercial Data Mining Tools </li></ul><ul><li>Data Mining & Privacy </li></ul>
  • 28. References <ul><li>Angoss Whitepaper: http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html. Retrieved on Oct 26, 2003 </li></ul><ul><li>C. Clifton & D. Marks. Security and Privacy Implications of Data Mining. 1996 </li></ul><ul><li>D. W. Abbott, I. P. Matkovsky & J. F. Elder IV. An Evaluation of High-end Data Mining Tools </li></ul><ul><li>Elder Research. http://www.rgrossman.com/faq/dm-02.htm. Retrieved on Oct 28, 2003 </li></ul><ul><li>IBM. DB2 Intelligent Miner. http://www-3.ibm.com/software/data/iminer/. Retrieved on Oct 26, 2003 </li></ul><ul><li>J. F. Elder & D. W. Abbott. A Comparison of Leading Data Mining Tools. August 1998 </li></ul><ul><li>J. Han & M. Kamber. Data Mining: Concepts and Techniques. 2000 </li></ul><ul><li>http://www.cald.cs.cmu.edu/summerschool03/PrivacyPreservingDM.ppt. Retrieved on Nov 10, 2003 </li></ul><ul><li>Robert Grossman. http://www.datamininglab.com/toolcomp.html#comparison. Retrieved on Oct 20, 2003 </li></ul><ul><li>SPSS. http://www.spss.com/. Retrieved on Nov 12, 2003 </li></ul>
  • 30. Evolution of Database Technology <ul><li>1960s: </li></ul><ul><ul><li>Data collection, database creation, and network DBMS </li></ul></ul><ul><li>1970s: </li></ul><ul><ul><li>Relational data model, relational DBMS implementation </li></ul></ul><ul><li>1980s: </li></ul><ul><ul><li>RDBMS, advanced data models </li></ul></ul><ul><li>1990s to 2000s: </li></ul><ul><ul><li>Data mining and data warehousing, multimedia databases, and Web databases </li></ul></ul>
  • 31. Data Mining: On What Kind of Data? <ul><li>Data Sources </li></ul><ul><ul><li>Relational database </li></ul></ul><ul><ul><li>Data warehouses </li></ul></ul><ul><ul><li>Transactional databases </li></ul></ul><ul><ul><li>WWW </li></ul></ul><ul><li>Data types </li></ul><ul><ul><li>Audio </li></ul></ul><ul><ul><li>Image </li></ul></ul><ul><ul><li>Text </li></ul></ul>
  • 32. Output: A Decision Tree for “buys_computer”. Root: age? Branch <=30: student? (no: no; yes: yes). Branch 30..40: yes. Branch >40: credit rating? (excellent: no; fair: yes).
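Once learned, a decision tree is just nested tests, so the tree on this slide can be written as a short function. The leaf assignments under student? and credit rating? are read from the flattened figure and assumed to follow the usual form of this textbook example; the function and parameter names are hypothetical.

```python
def buys_computer(age, student, credit_rating):
    """Classify one record: age at the root, then student or credit rating."""
    if age <= 30:
        # Younger customers: the outcome depends on student status.
        return "yes" if student else "no"
    if age <= 40:
        # The middle age band is a leaf.
        return "yes"
    # Older customers: the outcome depends on credit rating.
    return "yes" if credit_rating == "fair" else "no"

print(buys_computer(age=25, student=True, credit_rating="fair"))        # yes
print(buys_computer(age=35, student=False, credit_rating="excellent"))  # yes
print(buys_computer(age=50, student=False, credit_rating="excellent"))  # no
```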
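  • 33. Neural network (figure: a single neuron). The input vector x = (x0, x1, ..., xn) and the weight vector w = (w0, w1, ..., wn) are combined into a weighted sum ∑ wi xi; a bias term (μk in the figure) is subtracted, and the activation function f produces the output y.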
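The single neuron in this figure can be sketched as a weighted sum followed by an activation function. Two things here are assumptions, not taken from the slide: the sigmoid is chosen as the activation function f, and the bias is subtracted from the weighted sum before f is applied.

```python
import math

def neuron_output(x, w, bias=0.0):
    """Weighted sum of the inputs minus a bias, passed through a sigmoid."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    # Sigmoid activation (an assumed choice of f): squashes any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-(weighted_sum - bias)))

# Hypothetical input and weight vectors.
y = neuron_output(x=[1.0, 0.5, -1.0], w=[0.2, 0.4, 0.1])
print(round(y, 4))
```

A network chains many such units, feeding the outputs of one layer into the next as inputs.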
  • 34. Neural network (figure: network diagram labelled with the values 0.15, 0.29, 0.11, 0.25, 0.09, 0.23, 0.32, 0.27)
  • 35. Neural network
  • 36. Applications of Clustering <ul><li>Pattern Recognition </li></ul><ul><li>Image Processing </li></ul><ul><li>Economic Science (especially market research) </li></ul><ul><li>WWW </li></ul><ul><ul><li>Document classification </li></ul></ul><ul><ul><li>Cluster Weblog data to discover groups of similar access patterns </li></ul></ul>
  • 37. Data Mining & Privacy (diagram: Data Mining Tool, Mining Controller, Data Warehouse)
  • 38. Examples of Clustering Applications <ul><li>Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs </li></ul><ul><li>Insurance: Identify groups of motor insurance policy holders with a high average claim cost </li></ul><ul><li>City planning: Identify groups of houses according to their house type, value, and geographical location </li></ul><ul><li>Earthquake studies: Observed earthquake epicenters should be clustered along continental faults </li></ul>
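To make the marketing example concrete, here is a minimal sketch that groups customers by a single numeric attribute. It uses plain k-means purely as a familiar illustration; the slides do not say which clustering algorithm Intelligent Miner applies, and the spend figures and starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one numeric attribute, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to the nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer.
spend = [20, 25, 30, 200, 220, 240]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(centers)   # [25.0, 220.0]
print(clusters)  # [[20, 25, 30], [200, 220, 240]]
```

The two groups that emerge (low and high spenders) are the kind of customer segments a marketer could then target separately.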
  • 39. Association <ul><li>Association and pattern analysis </li></ul><ul><ul><li>Applications: </li></ul></ul><ul><ul><ul><li>Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc. </li></ul></ul></ul><ul><ul><li>Examples, written as rule [support, confidence]: </li></ul></ul><ul><ul><ul><li>buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%] </li></ul></ul></ul><ul><ul><ul><li>major(x, “CS”) ^ takes(x, “DB”) → grade(x, “A”) [1%, 75%] </li></ul></ul></ul>
  • 40. Data Mining: On What Kind of Data? <ul><li>Relational databases </li></ul><ul><li>Data warehouses </li></ul><ul><li>Transactional databases </li></ul><ul><li>Advanced DB and information repositories </li></ul><ul><ul><li>Object-oriented and object-relational databases </li></ul></ul><ul><ul><li>Text databases and multimedia databases </li></ul></ul><ul><ul><li>Heterogeneous and legacy databases </li></ul></ul><ul><ul><li>WWW </li></ul></ul>
  • 41. Steps of a KDD Process <ul><li>Learning the application domain: </li></ul><ul><ul><li>relevant prior knowledge and goals of application </li></ul></ul><ul><li>Creating a target data set: data selection </li></ul><ul><li>Data cleaning and preprocessing (may take 60% of effort!) </li></ul><ul><li>Data reduction and transformation: </li></ul><ul><ul><li>Find useful features, dimensionality/variable reduction, invariant representation. </li></ul></ul><ul><li>Choosing functions of data mining: </li></ul><ul><ul><li>summarization, classification, regression, association, clustering. </li></ul></ul><ul><li>Choosing the mining algorithm(s) </li></ul><ul><li>Data mining: search for patterns of interest </li></ul><ul><li>Pattern evaluation and knowledge presentation: </li></ul><ul><ul><li>visualization, transformation, removing redundant patterns, etc. </li></ul></ul><ul><li>Use of discovered knowledge </li></ul>
  • 42. Strengths and Weaknesses <ul><li>Strengths </li></ul><ul><ul><li>Algorithm breadth </li></ul></ul><ul><ul><li>Graphical output </li></ul></ul><ul><ul><li>Available for PC and mainframe environments </li></ul></ul><ul><li>Weaknesses </li></ul><ul><ul><li>No automation </li></ul></ul><ul><ul><li>Data has to reside in IBM’s database system </li></ul></ul>
