An Investigation into Commercial Data Mining

  1. An Evaluation of Commercial Data Mining. Proposed and presented by Emily Davis. Supervisor: John Ebden
  2. Statement of the Problem <ul><li>An evaluation of commercial data mining capabilities, taking Oracle9i’s Data Mining Suite as the example. </li></ul>
  3. Background <ul><li>Data mining is a relatively new offshoot of database technology. It has arisen from the ability of computers to: </li></ul><ul><li>Store vast quantities of data in data warehouses. </li></ul><ul><li>Implement ingenious algorithms for the mining of data. </li></ul><ul><li>Use these algorithms to analyse these vast quantities of data in a reasonable amount of time. </li></ul>
  4. <ul><li>Data mining discovers the patterns in data that represent knowledge. </li></ul><ul><li>Three questions are of interest: which algorithms data mining suites use, how well each category of algorithm performs on data, and what kind of results it produces. </li></ul><ul><li>Another important issue is the usability of the algorithms. </li></ul><ul><li>Random number example taken from http://www.saltspring.com/brochmann/math/mining/mining1.html </li></ul>
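  5. <table>
     <tr><th>#</th><th>data a</th><th>data b</th><th>data c</th></tr>
     <tr><td>1</td><td>0.71132700</td><td>0.15379400</td><td>1.88403600</td></tr>
     <tr><td>2</td><td>0.62219935</td><td>0.83119106</td><td>3.73797189</td></tr>
     <tr><td>3</td><td>0.33872289</td><td>0.80881084</td><td>3.10387831</td></tr>
     <tr><td>4</td><td>0.54262732</td><td>0.35427095</td><td>2.14806749</td></tr>
     <tr><td>5</td><td>0.50631348</td><td>0.71599532</td><td>3.16061290</td></tr>
     <tr><td>6</td><td>0.00132503</td><td>0.22447315</td><td>0.67606951</td></tr>
     <tr><td>7</td><td>0.76211535</td><td>0.94620700</td><td>4.36285170</td></tr>
     <tr><td>8</td><td>0.91026206</td><td>0.89499186</td><td>4.50549970</td></tr>
     <tr><td>9</td><td>0.92640874</td><td>0.47156928</td><td>3.26752532</td></tr>
     <tr><td>10</td><td>0.49323546</td><td>0.27673696</td><td>1.81668179</td></tr>
     <tr><td>11</td><td>0.04501477</td><td>0.30142353</td><td>0.99430013</td></tr>
     <tr><td>12</td><td>0.49180000</td><td>0.17909135</td><td>1.52087404</td></tr>
     <tr><td>13</td><td>0.06747225</td><td>0.85629071</td><td>2.70381663</td></tr>
     <tr><td>14</td><td>0.84239974</td><td>0.41916601</td><td>2.94229750</td></tr>
     </table>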
  6. <table>
     <tr><th>#</th><th>data a</th><th>data b</th><th>data c</th></tr>
     <tr><td>49</td><td>0.07845276</td><td>0.69584199</td><td>2.24443147</td></tr>
     <tr><td>50</td><td>0.07548299</td><td>0.52973340</td><td>1.74016616</td></tr>
     <tr><td>51</td><td>0.72301849</td><td>0.97594044</td><td>????????</td></tr>
     </table>
     <ul><li>Data a and data b are random numbers generated in Excel. </li></ul><ul><li>Data c = 2*(data a) + 3*(data b). </li></ul>
  7. <ul><li>51st value calculated by Excel: 4.37385831 </li></ul><ul><li>Value calculated using Knowledge Miner, a Macintosh data mining tool: 4.34791231 </li></ul><ul><li>Equation it produced: 1.97*(data a) + 2.96*(data b) + 0.0324 </li></ul>
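
The random number example on slides 5 to 7 can be reproduced in a few lines. The slides do not say which method Knowledge Miner uses, so the sketch below swaps in ordinary least squares (solved through the normal equations) as a stand-in. The seed, the helper names `solve_linear_system` and `fit_linear_model`, and the freshly generated random columns are all assumptions made for illustration; only the formula for data c and the two inputs of row 51 come from the slides.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augment the matrix so row operations apply to both sides at once.
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_linear_model(rows, targets):
    """Least-squares fit of target = weights . row + intercept."""
    design = [list(row) + [1.0] for row in rows]  # trailing 1.0 is the intercept term
    size = len(design[0])
    gram = [[sum(d[i] * d[j] for d in design) for j in range(size)] for i in range(size)]
    moment = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(size)]
    return solve_linear_system(gram, moment)


random.seed(0)  # arbitrary seed, used only so the run is repeatable
rows = [(random.random(), random.random()) for _ in range(50)]
targets = [2 * a + 3 * b for a, b in rows]  # data c = 2*(data a) + 3*(data b)

coef_a, coef_b, intercept = fit_linear_model(rows, targets)

# Row 51 from the slides, whose data c value was withheld.
a51, b51 = 0.72301849, 0.97594044
predicted_c = coef_a * a51 + coef_b * b51 + intercept
print(coef_a, coef_b, intercept, predicted_c)
```

Because the generated data contains no noise, a least-squares fit recovers the coefficients almost exactly. Knowledge Miner's equation on slide 7 is close to, but not identical with, the generating formula, which suggests it works differently from this sketch.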
  8. <ul><li>The experiment was repeated using three columns of random numbers and this equation: </li></ul><ul><li>Data d = 23*(data a) - 4.5*(data b) + (data a + data c). </li></ul><ul><li>The last five entries for data d were missing from the column. </li></ul>
  9. <table>
     <tr><th>Generated by Excel</th><th>Predicted by Knowledge Miner</th></tr>
     <tr><td>14.7314558</td><td>14.7341613</td></tr>
     <tr><td>12.0720505</td><td>12.0731391</td></tr>
     <tr><td>22.0008992</td><td>22.0080223</td></tr>
     <tr><td>7.52633344</td><td>7.52465867</td></tr>
     <tr><td>5.25167700</td><td>5.24861860</td></tr>
     </table>
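
How close the Knowledge Miner predictions come to the Excel values can be quantified directly. The sketch below assumes the two lists on slide 9 are given in the same row order, so that the first prediction belongs to the first Excel value and so on. The variable names are illustrative.

```python
# Values copied from slide 9; pairing by position is an assumption.
excel_values = [14.7314558, 12.0720505, 22.0008992, 7.52633344, 5.25167700]
predicted_values = [14.7341613, 12.0731391, 22.0080223, 7.52465867, 5.24861860]

# Absolute error per row, and the same error relative to the Excel value.
absolute_errors = [abs(p - e) for p, e in zip(predicted_values, excel_values)]
relative_errors = [err / abs(e) for err, e in zip(absolute_errors, excel_values)]

max_absolute_error = max(absolute_errors)
mean_absolute_error = sum(absolute_errors) / len(absolute_errors)
max_relative_error = max(relative_errors)

print(f"max absolute error:  {max_absolute_error:.5f}")
print(f"mean absolute error: {mean_absolute_error:.5f}")
print(f"max relative error:  {max_relative_error:.3%}")
```

Under that pairing, every prediction is within 0.01 of the Excel value, and the largest relative error is below 0.1%.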
  10. Plan of Action <ul><li>Literature Survey (and other resources) </li></ul><ul><li>Install Software for Oracle </li></ul><ul><li>Get to know the Oracle Suite </li></ul><ul><li>Evaluate Oracle9i’s Data Mining Suite </li></ul>
  11. Install Software for Oracle <ul><li>Including JDeveloper </li></ul><ul><li>May be extended to the installation of other commercial data mining suites, e.g.: </li></ul><ul><li>DB2’s Intelligent Miner </li></ul><ul><li>Informix’s Data Mine </li></ul>
  12. Investigate Oracle9i’s Data Mining Suite <ul><li>Two major algorithm types: supervised and unsupervised learning. </li></ul><ul><li>A medical example: </li></ul><ul><li>Supervised learning: researchers input medical profiles into a leukaemia model to predict propensity for the disease. </li></ul><ul><li>Unsupervised learning: searches for clusters of related information in data sets to reveal insights about diseases and patient populations. </li></ul>
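
The difference between the two algorithm types can be shown on a toy data set. The sketch below is not taken from Oracle9i's suite: it uses a nearest-centroid classifier as a stand-in for supervised learning and a basic k-means loop as a stand-in for unsupervised learning. The records, the labels "low" and "high", and all function names are hypothetical.

```python
def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(point, centres):
    """Index of the centre closest to point."""
    return min(range(len(centres)), key=lambda i: squared_distance(point, centres[i]))


# Hypothetical toy records: ((feature_1, feature_2), label)
labelled = [((1.0, 1.2), "low"), ((0.8, 1.0), "low"), ((1.1, 0.9), "low"),
            ((4.0, 4.2), "high"), ((4.1, 3.9), "high"), ((3.8, 4.0), "high")]

# Supervised: the known labels are used to build one centroid per class.
labels = sorted({label for _, label in labelled})
class_centres = [centroid([p for p, l in labelled if l == label]) for label in labels]


def classify(point):
    """Predict the label of the class whose centroid is nearest."""
    return labels[nearest(point, class_centres)]


# Unsupervised: k-means sees only the points, never the labels.
def k_means(points, centres, iterations=10):
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # Keep the old centre if a cluster ends up empty.
        centres = [centroid(g) if g else c for g, c in zip(groups, centres)]
    assignments = [nearest(p, centres) for p in points]
    return centres, assignments


points = [p for p, _ in labelled]
centres, assignments = k_means(points, centres=[points[0], points[3]])
print(classify((3.9, 4.1)), assignments)
```

The classifier can name the group a new record belongs to because it was trained on labels. The clustering step only reports that two groups exist and leaves their interpretation to the analyst.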
  13. Get to know the Oracle DM Suite (a major task). <ul><li>Explore JDeveloper, Oracle’s Java development environment, and the Java-based API of Oracle9i Data Mining. </li></ul><ul><li>The API follows JDM (Java Data Mining), a standard supported by Oracle, Sun, IBM and others. </li></ul><ul><li>Explore DM4J (Data Mining for Java), the new graphical user interface for Oracle DM. </li></ul>
  14. Addressing the Problem: <ul><li>Run the different algorithms available in the data mining suite. </li></ul><ul><li>Document and analyse the results in terms of the performance and effectiveness of each algorithm. </li></ul>
  15. Expected Results: <ul><li>The ability to say conclusively whether Oracle's data mining capabilities are inferior or superior to anything else in the marketplace, and why this can be stated. </li></ul>
  16. Possible Extensions to the Project: <ul><li>To have sufficient knowledge of the topic to give recommendations or feedback: </li></ul><ul><li>to Oracle regarding their data mining suite. </li></ul><ul><li>to IT customers wanting to purchase data mining suites. </li></ul><ul><li>Explore the field of random stereograms: could a computer see them? If not, why not? </li></ul>
  17. Literature Survey <ul><li>Principles of Data Mining by David Hand, Heikki Mannila and Padhraic Smyth, Cambridge, Massachusetts, MIT Press, 2001 – algorithmic concepts </li></ul><ul><li>Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber, San Francisco, California, Morgan Kaufmann, 2001 – algorithmic evaluations </li></ul><ul><li>Data Mining: A Tutorial-Based Primer by Richard J. Roiger and Michael W. Geatz, Boston, Massachusetts, Addison Wesley, 2003 – practical knowledge and processing </li></ul>
  18. <ul><li>Data Mining by Pieter Adriaans and Dolf Zantinge, Harlow, England, Addison Wesley, 1996 – real life application </li></ul><ul><li>Data Mining and Statistical Analysis Using SQL by Robert P. Trueblood and John N. Lovett, Jnr., USA, Apress, 2001 – statistical principles </li></ul><ul><li>Data Mining Using SAS Applications by George Fernandez, USA, Chapman and Hall/CRC, 2003 – methodologies </li></ul>
  19. <ul><li>Mastering Data Mining: The Art and Science of Customer Relationship Management by Michael J.A. Berry and Gordon S. Linoff, USA, Wiley Computer Publishing, 2000 – building effective models </li></ul><ul><li>Data Preparation for Data Mining by Dorian Pyle, San Francisco, California, Morgan Kaufmann, 2000 – demo code, 10 Golden Rules </li></ul>
  20. <ul><li>The white paper Data Mining: Beyond Algorithms by Dr Akeel Al-Attar, available at http://www.attar.com/tutor/mining.htm </li></ul><ul><li>Summary from the KDD-03 Panel, Data Mining: The Next Ten Years, available at http://www.acm.org/sigs/sigkdd/explorations/issue5-2/pnl_10yrs_final1.pdf </li></ul><ul><li>Oracle Website </li></ul><ul><li>Oracle Magazine </li></ul>
