Slide 1 - Department of Computer Science


  1. 1. Data Mining with Oracle using Classification and Clustering Algorithms. Proposed and presented by Nhamo Mdzingwa. Supervisor: John Ebden
  2. 2. Presentation Outline <ul><li>Problem Statement </li></ul><ul><li>Objective </li></ul><ul><li>Background </li></ul><ul><li>Expected Results </li></ul><ul><li>Possible Extensions </li></ul><ul><li>Plan of action </li></ul><ul><li>Timeline </li></ul><ul><li>Literature Survey </li></ul><ul><li>Questions </li></ul>
  3. 3. Problem Statement <ul><li>The commercial world is reacting quickly to the growth and potential of the DM area: a wide range of tools is being marketed as DM suites. </li></ul><ul><li>Examples of these are: </li></ul><ul><ul><ul><li>Oracle DM </li></ul></ul></ul><ul><ul><ul><li>DB2’s Intelligent Miner </li></ul></ul></ul><ul><ul><ul><li>Informix’s Data Mine </li></ul></ul></ul><ul><ul><ul><li>SQL Data miner </li></ul></ul></ul><ul><ul><ul><li>Ghost miner </li></ul></ul></ul><ul><ul><ul><li>Clementine 9.0 (SPSS) </li></ul></ul></ul><ul><ul><ul><li>SAS </li></ul></ul></ul><ul><ul><ul><li>Gornish systems, etc. </li></ul></ul></ul>
  4. 4. Problem <ul><li>It is vital to know which algorithms a DM suite uses and which algorithm to apply to a particular data set. </li></ul><ul><li>It is equally important to know how well each algorithm performs in terms of accuracy, efficiency and effectiveness within a particular DM suite, e.g. Oracle DM. </li></ul>
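As a rough illustration of how accuracy and efficiency might be measured for a classifier, here is a minimal Python sketch. It does not use Oracle DM; the helper names (`accuracy`, `timed`, `majority_classifier`) and the toy labels are assumptions made up for this example, and the majority-class model is only a stand-in for a real algorithm.

```python
import time
from typing import Callable, Sequence, Tuple


def accuracy(predicted: Sequence, actual: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def timed(fn: Callable, *args) -> Tuple[object, float]:
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def majority_classifier(train_labels, test_rows):
    """Hypothetical stand-in model: always predicts the most common label."""
    majority = max(set(train_labels), key=train_labels.count)
    return [majority for _ in test_rows]


# Toy data, invented for illustration only.
train_labels = ["yes", "yes", "no", "yes"]
test_rows = [[1], [2], [3], [4]]
test_labels = ["yes", "no", "yes", "yes"]

predictions, seconds = timed(majority_classifier, train_labels, test_rows)
print(f"accuracy={accuracy(predictions, test_labels):.2f}, time={seconds:.6f}s")
```

The same two measurements could in principle be taken for any model built by a DM suite, as long as its predictions on held-out data can be exported and compared with the known labels.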
  5. 5. Objective <ul><li>Investigate two types of algorithms, classification and clustering, available in Oracle Data Mining (ODM). </li></ul><ul><li>Apply the two types of algorithms to actual data. </li></ul><ul><li>Analyse and evaluate the results in terms of performance. </li></ul>
  6. 6. What is Data Mining? (Background) <ul><li>Simply put, DM is knowledge discovery. </li></ul><ul><li>DM is the process of automatic discovery of [hidden] patterns and relationships within enormous amounts of data. </li></ul><ul><li>It is a powerful new technology that allows businesses to make proactive, knowledge-driven decisions by trying to predict the future. </li></ul><ul><li>The data (which represents knowledge) is normally stored in databases and data warehouses (typical size: terabytes). </li></ul>
  7. 7. Automatic discovery is implemented by the use of algorithms provided by DM suites <ul><li>E.g. Oracle offers: </li></ul><ul><li>Adaptive Bayes Network supporting decision trees (classification) </li></ul><ul><li>Naive Bayes (classification) </li></ul><ul><li>Model Seeker (classification) </li></ul><ul><li>k-Means (clustering) </li></ul><ul><li>O-Cluster (clustering) </li></ul><ul><li>Predictor Variance (attribute importance) </li></ul><ul><li>Apriori (association rules) </li></ul>
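To make the clustering entry concrete, here is a minimal sketch of the textbook k-means procedure in plain Python. It is not Oracle's implementation, which may differ in initialisation and other details; the function names, the naive "first k points" initialisation and the sample points are assumptions chosen so the example stays short and deterministic.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty collection of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points: Sequence[Point], k: int, max_iter: int = 100):
    """Textbook k-means; returns (centroids, cluster index per point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: naive initialisation with the first k points.
    centroids: List[Point] = list(points[:k])
    assignments: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return centroids, assignments


# Toy two-dimensional data, invented for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)
```

Note that no labels are supplied: the algorithm groups the points using only the input attributes.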
  8. 8. DM strategies <ul><li>Algorithms are grouped as either supervised or unsupervised learning strategies. </li></ul><ul><li>Supervised learning (input attributes and one or more output attributes): </li></ul><ul><ul><li>Classification: Naive Bayes, Model Seeker, Adaptive Bayes </li></ul></ul><ul><ul><li>Estimation </li></ul></ul><ul><ul><li>Prediction: Predictor Variance </li></ul></ul><ul><li>Unsupervised learning (input attributes but no output attributes): </li></ul><ul><ul><li>Clustering: k-Means, O-Cluster </li></ul></ul>
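The supervised case, where each training row has input attributes plus a known output attribute, can be sketched with a small categorical Naive Bayes classifier. This is a generic textbook version with add-one (Laplace) smoothing, not Oracle's implementation; the function names and the weather-style rows are assumptions invented for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for add-one smoothing.
    distinct = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct


def predict(model, row):
    """Return the class with the highest log-probability score for row."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(distinct[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration: (outlook, windy) -> decision.
rows = [
    ("sunny", "no"),
    ("sunny", "yes"),
    ("rainy", "yes"),
    ("overcast", "no"),
    ("rainy", "no"),
]
labels = ["play", "play", "stay", "play", "stay"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "no")))
print(predict(model, ("rainy", "yes")))
```

In contrast to clustering, the `labels` list supplies the output attribute during training, which is what makes the strategy supervised.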
  9. 9. <ul><li>The data mining process involves a series of steps: defining a business problem, gathering and preparing the data, building and evaluating mining models, applying the models, and disseminating the new information. </li></ul>
  10. 10. Expected Results <ul><li>Aim to determine which algorithm is most effective and suitable for data mining on a given dataset, since datasets differ. </li></ul>
  11. 11. Possible Extensions to the Project: <ul><li>Testing the same algorithms with tools offered by other vendors, </li></ul><ul><li>e.g. testing with the DM suite in SQL and checking whether the results are similar. </li></ul><ul><li>If they are not, investigating why the results differ could be another extension. </li></ul>
  12. 12. Plan of Action <ul><li>Carry out a literature search: </li></ul><ul><ul><li>mainly to obtain background knowledge and an understanding of the field. </li></ul></ul><ul><li>Get to know the Oracle DM suite: </li></ul><ul><ul><li>Do the DM tutorials provided by Oracle. </li></ul></ul><ul><ul><li>The server Ora1 is the machine I’ll be working with. </li></ul></ul><ul><ul><li>It already has JDeveloper, the Oracle 10g database and Oracle 9i DM installed. </li></ul></ul>
  13. 13. Timeline <ul><li>Continuation from literature and tutorials: done </li></ul><ul><li>Investigate clustering and classification algorithms (theory): 2nd term, 15 to 30 April </li></ul><ul><li>Find suitable computerised case studies of the use of the above algorithms, with or without Oracle: 2nd term, end of May </li></ul><ul><li>Search for databases for testing (possibilities: AIDS data and faculty data): 2nd term, end of May </li></ul><ul><li>Apply the algorithms to the data found, then critically analyse and assess the results: second semester </li></ul><ul><li>Write up paper: September vacation and 3rd term </li></ul><ul><li>Final project write-up: due 7/11 </li></ul>
  14. 14. Literature Survey <ul><li>Richard J. Roiger and Michael W. Geatz, Data mining: a tutorial-based primer. Boston, Massachusetts, Addison Wesley, 2003. </li></ul><ul><li>This book will provide the background and practical knowledge required for the project research, and it also presents different data mining methodologies that may be useful. </li></ul>
  15. 15. <ul><li>David Hand, Heikki Mannila and Padhraic Smyth, Principles of data mining. Cambridge, Massachusetts, MIT Press, 2001. </li></ul><ul><li>Jesus Mena, Data mining your website. Digital Press, 1999. </li></ul><ul><li>Jiawei Han and Micheline Kamber, Data mining: concepts and techniques. San Francisco, California, Morgan Kaufmann, 2001. </li></ul><ul><li>Robert P. Trueblood and John N. Lovett, Jnr., Data Mining and Statistical Analysis Using SQL, USA, Apress, </li></ul><ul><li>http://www.lc.leidenuniv.nl/awcourse/oracle/datamine.920/a95961/preface.htm </li></ul><ul><li>http://www.oracle.com/technology/products/oracle9i/htdocs/o9idm_faq.html </li></ul><ul><li>http://fas.sfu.ca/cs/research/groups/DB/sections/publication/kdd/kdd.html </li></ul>
  16. 16. Questions? Thank you
