Data Mining with JDM API, Regina Wang
Data Mining with JDM API by Regina Wang (4/11)


  1. Data Mining with JDM API, Regina Wang
  2. Data Mining
       • Also known as Knowledge Discovery in Databases (KDD).
       • Searching large volumes of data for patterns.
       • The nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
       • The science of extracting useful information from large data sets or databases.
       • Uses computational techniques from statistics, machine learning, and pattern recognition.
  3. Descriptive Statistics
       • Collect data
       • Classify data
       • Summarize data
       • Present data
       • Make inferences to draw conclusions (inferential statistics)
           - Point and interval estimation
           - Hypothesis testing
           - Prediction
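The "summarize" and "interval estimation" steps above can be sketched in a few lines of Java. This is a minimal illustration, not part of the original slides: the class and method names are hypothetical, and the interval uses the normal-approximation multiplier 1.96, which is an assumption that only suits reasonably large samples (a t quantile would be more appropriate for small ones).

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
class SummaryStats {
    // Arithmetic mean of the sample.
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sumSquares = 0.0;
        for (double x : xs) {
            sumSquares += (x - m) * (x - m);
        }
        return Math.sqrt(sumSquares / (xs.length - 1));
    }

    // Approximate 95% confidence interval for the mean,
    // assuming a normal approximation (z = 1.96).
    static double[] approxConfidenceInterval95(double[] xs) {
        double m = mean(xs);
        double halfWidth = 1.96 * sampleStdDev(xs) / Math.sqrt(xs.length);
        return new double[] { m - halfWidth, m + halfWidth };
    }

    public static void main(String[] args) {
        double[] data = { 2, 4, 4, 4, 5, 5, 7, 9 };
        double[] ci = approxConfidenceInterval95(data);
        System.out.println("mean = " + mean(data));
        System.out.println("sample std dev = " + sampleStdDev(data));
        System.out.println("approx. 95% CI = [" + ci[0] + ", " + ci[1] + "]");
    }
}
```

The mean and standard deviation are point summaries of the data; the interval is the simplest form of the "interval estimation" listed on the slide.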
  4. Machine Learning
       • Concerned with the development of techniques which allow computers to "learn".
       • Concerned with the algorithmic complexity of computational implementations.
       • Many inference problems turn out to be NP-hard or harder.
  5. Common Machine Learning Algorithm Types
       • Supervised learning: relies on prior knowledge (labeled training data)
       • Unsupervised learning: relies on the statistical regularity of the patterns
       • Semi-supervised learning
       • Reinforcement learning
       • Transduction
       • Learning to learn
  6. Pattern Recognition
       • The act of taking in raw data and taking an action based on the category of the data.
       • Aims to classify data patterns based on prior knowledge or on statistical information.
       • Depending on whether a training set is available, learning is supervised or unsupervised.
       • Two approaches: statistical (decision theory) and syntactic (structural).
  7. Supervised Techniques
       • Classification:
           - k-Nearest Neighbors
           - Naïve Bayes
           - Classification Trees
           - Discriminant Analysis
           - Logistic Regression
           - Neural Nets
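As an illustration of the first classification technique listed, here is a minimal k-nearest-neighbors sketch in plain Java. It is not taken from the slides or from JDM: the class name, the use of squared Euclidean distance, and the tie-breaking rule (the label that first reaches the highest vote count among the nearest points wins) are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical k-NN classifier for illustration only.
class KnnClassifier {
    // Returns the majority label among the k training points nearest to the query.
    static String classify(double[][] points, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[points.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort training indices by distance to the query, nearest first.
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> squaredDistance(points[i], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int j = 0; j < Math.min(k, order.length); j++) {
            String label = labels[order[j]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    // Squared Euclidean distance; the square root is not needed for ranking.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9} };
        String[] labels = { "a", "a", "a", "b", "b", "b" };
        System.out.println(classify(points, labels, new double[] {1.5, 1.5}, 3));
    }
}
```

The labeled training points are the "prior knowledge" that makes this a supervised technique: nothing is fitted ahead of time, and the labels are consulted directly at prediction time.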
  8. Supervised Techniques
       • Prediction (Estimation):
           - Regression
           - Regression Trees
           - k-Nearest Neighbors
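For the prediction (estimation) case, the simplest instance of "Regression" is a one-variable least-squares line. The sketch below is an assumption-laden illustration rather than anything from the slides: the class name is hypothetical, and it handles only a single numeric predictor with no checks for degenerate input such as all x values being equal.

```java
// Hypothetical one-variable least-squares regression for illustration only.
class SimpleLinearRegression {
    final double slope;
    final double intercept;

    // Fits y = intercept + slope * x by ordinary least squares.
    SimpleLinearRegression(double[] x, double[] y) {
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
    }

    // Predicts a numeric target for a new input value.
    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        SimpleLinearRegression model = new SimpleLinearRegression(
                new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println("slope = " + model.slope + ", intercept = " + model.intercept);
        System.out.println("prediction at x = 5: " + model.predict(5));
    }
}
```

Unlike classification, the output here is a number rather than a category, which is the distinction the two "Supervised Techniques" slides draw.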
  9. Unsupervised Techniques
       • Cluster Analysis
       • Principal Components
       • Association Rules
       • Collaborative Filtering
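Cluster analysis is also the task used in the JDM code example later in the deck. As a rough idea of what a clustering algorithm does internally, here is a minimal k-means sketch (Lloyd's algorithm) in plain Java. It is illustrative only: JDM does not prescribe this algorithm, the class name is hypothetical, and the caller supplies the initial centroids, which the method updates in place.

```java
import java.util.Arrays;

// Hypothetical k-means sketch for illustration only.
class KMeansSketch {
    // Runs Lloyd's algorithm from the given initial centroids (modified in place)
    // and returns the index of the cluster assigned to each point.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int k = centroids.length;
        int dim = points[0].length;
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int nearest = 0;
                for (int c = 1; c < k; c++) {
                    if (dist2(points[p], centroids[c]) < dist2(points[p], centroids[nearest])) {
                        nearest = c;
                    }
                }
                if (assignment[p] != nearest) {
                    assignment[p] = nearest;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }

            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // leave empty clusters where they are
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    // Squared Euclidean distance between two points.
    static double dist2(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {1.5, 2}, {1, 0.5}, {8, 8}, {9, 9}, {8.5, 9.5} };
        double[][] centroids = { {0, 0}, {10, 10} };
        System.out.println(Arrays.toString(cluster(points, centroids, 100)));
    }
}
```

No labels are involved: the grouping emerges only from the distances between points, which is what makes the technique unsupervised.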
  10. Java Data Mining API (JDM)
       • Data-mining tools were traditionally provided in products with vendor-specific interfaces.
       • The Java Data Mining API (JDM) defines a common Java API to interact with data-mining systems.
       • Developed by the Java Community Process (JCP) Data Mining Expert Group.
  11. JDM Current Versions
       • JDM 1.0 (JSR 73): final specification released in August 2004
           - http://www.jcp.org/en/jsr/detail?id=73
       • JDM 2.0 (JSR 247): Early Review
           - http://www.jcp.org/en/jsr/detail?id=247
       • JDM targets the Java™ 2 Platform, both Enterprise Edition (J2EE™) and Standard Edition (J2SE™).
  12. Data Mining System
       • A typical data-mining system consists of
           - a data-mining engine
           - a repository that persists the data-mining artifacts, such as the models, created in the process.
       • The actual data is obtained via a database connection, or via a file-system API.
  13. JDM Architectural Components
       • Application programming interface (API)
       • Data mining engine (DME), also called a data mining server (DMS): provides the infrastructure that offers a set of data mining services to its API clients.
       • Mining object repository (MOR): used by the DME to persist data mining objects.
  14. Key JDM API benefit: it abstracts the physical components, tasks, and algorithms into Java classes.
       Figure 1. Components of a data-mining system.
  15. Building a data-mining model
       • Decide what you want to learn.
       • Select and prepare your data.
       • Choose mining tasks and configure the mining algorithms.
       • Build your data-mining model.
       • Test and refine the models.
       • Report findings or predict future outcomes.
  16. Data Mining Process
       Figure 2. Data mining steps.
  17. Usage of JDM API
       • Use JDM to explore the mining object repository (MOR) and find out which models and model-building parameters work best.
       • Follow a few simple steps that map the data-mining process to JDM interactions.
       • Build a Java data-mining GUI application.
  18. Figure 3. Top level packages. Figure 4. Top level interfaces.
  19. Figure 4. Top level interfaces.
  20. Using the JDM API
       • Identify the data you wish to use to build your model (your build data) with a URL that points to that data.
       • Specify the type of model you want to build, such as clustering, classification, or association rules, and the parameters to the build process. Such parameters are termed build settings in JDM. These tasks are represented by API classes.
       • Create a logical representation of your data to select certain attributes of the physical data, and then map those attributes to logical values.
  21. Using the JDM API
       • Specify the parameters to your data-mining algorithms.
       • Create a build task, and apply to that task the physical data references and the build settings.
       • Finally, execute the task. The outcome of that execution is your data model. That model will have a signature, a kind of interface that describes the possible input attributes for later applying the model to additional data.
  22. Using data model and results
       • Once you've created a model, you can test that model, and then even apply the model to additional data. Building, testing, and applying the model to additional data is an iterative process that, ideally, yields increasingly accurate models.
       • Those models can then be saved in the MOR, and used to either explain data, or to predict the outcome of new data in relation to your data-mining objective.
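Testing a model usually means applying it to held-out data whose outcomes are already known and measuring how often it is right. The sketch below shows that idea for a classifier in plain Java. It does not use the JDM test-task API: the class name is hypothetical, and the model is represented by a `java.util.function.Function` purely as a stand-in.

```java
import java.util.function.Function;

// Hypothetical evaluation helper for illustration only; not the JDM test API.
class ModelEvaluation {
    // Fraction of held-out cases for which the model predicts the expected label.
    static double accuracy(Function<double[], String> model,
                           double[][] testInputs, String[] expected) {
        if (testInputs.length == 0 || testInputs.length != expected.length) {
            throw new IllegalArgumentException("test inputs and labels must be non-empty and aligned");
        }
        int correct = 0;
        for (int i = 0; i < testInputs.length; i++) {
            if (model.apply(testInputs[i]).equals(expected[i])) {
                correct++;
            }
        }
        return (double) correct / testInputs.length;
    }

    public static void main(String[] args) {
        // A toy threshold "model" standing in for a real mined model.
        Function<double[], String> model = x -> x[0] > 5 ? "high" : "low";
        double[][] inputs = { {1}, {7}, {6}, {2} };
        String[] expected = { "low", "high", "low", "low" };
        System.out.println("accuracy = " + accuracy(model, inputs, expected));
    }
}
```

Rebuilding the model with different settings and comparing the resulting accuracy is one way to carry out the iterative build, test, and refine loop described above.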
  23. JDM Data Connection
       • A JDM connection is represented here by the engine variable, which is of type javax.datamining.resource.Connection. JDM connections are very similar to JDBC connections, with one connection per thread.
       • Object factories are obtained from the connection by class name:

           PhysicalDataSetFactory pdsFactory = (PhysicalDataSetFactory)
               engine.getFactory("javax.datamining.data.PhysicalDataSet");
  24. JDM Data Connection
       • Build data is referenced via a PhysicalDataSet object, which, in turn, loads the data from a file or a database table, referenced with a URL:

           PhysicalDataSet dataSet =
               pdsFactory.create("file:///export/data/textFileData.data", true);
  25. Code Example: Building a clustering model

           import javax.datamining.clustering.ClusteringSettings;
           import javax.datamining.clustering.ClusteringSettingsFactory;
           import javax.datamining.data.LogicalData;
           import javax.datamining.data.LogicalDataFactory;
           import javax.datamining.data.PhysicalDataSet;
           import javax.datamining.data.PhysicalDataSetFactory;

           // Assumes dmeConn is an open javax.datamining.resource.Connection
           // and uri is a String that locates the build data.

           // Create the physical representation of the data
           PhysicalDataSetFactory pdsFactory = (PhysicalDataSetFactory)
               dmeConn.getFactory("javax.datamining.data.PhysicalDataSet");
           PhysicalDataSet buildData = pdsFactory.create(uri, true);
           dmeConn.saveObject("myBuildData", buildData, false);

           // Create the logical representation of the data from physical data
           LogicalDataFactory ldFactory = (LogicalDataFactory)
               dmeConn.getFactory("javax.datamining.data.LogicalData");
           LogicalData ld = ldFactory.create(buildData);
           dmeConn.saveObject("myLogicalData", ld, false);

           // Create the settings to build a clustering model
           ClusteringSettingsFactory csFactory = (ClusteringSettingsFactory)
               dmeConn.getFactory("javax.datamining.clustering.ClusteringSettings");
           ClusteringSettings clusteringSettings = csFactory.create();
           clusteringSettings.setLogicalDataName("myLogicalData");
           clusteringSettings.setMaxNumberOfClusters(20);
  26. Code Example: Building a clustering model (cont'd)

           import javax.datamining.ExecutionHandle;
           import javax.datamining.ExecutionState;
           import javax.datamining.ExecutionStatus;
           import javax.datamining.task.BuildTask;
           import javax.datamining.task.BuildTaskFactory;

           clusteringSettings.setMinClusterCaseCount(5);
           dmeConn.saveObject("myClusteringBS", clusteringSettings, false);

           // Create a task to build a clustering model with data and settings
           BuildTaskFactory btFactory = (BuildTaskFactory)
               dmeConn.getFactory("javax.datamining.task.BuildTask");
           BuildTask task = btFactory.create(
               "myBuildData", "myClusteringBS", "myClusteringModel");
           dmeConn.saveObject("myClusteringTask", task, false);

           // Execute the task and check the status
           ExecutionHandle handle = dmeConn.execute("myClusteringTask");
           handle.waitForCompletion(Integer.MAX_VALUE); // wait until done
           ExecutionStatus status = handle.getLatestStatus();
           if (ExecutionState.success.equals(status.getState())) {
               // task completed successfully...
           }
  27. References
       • Java Data Mining Specification: http://www.jcp.org/en/jsr/detail?id=73
       • Frank Sommers, "Mine Your Own Data with the JDM API", July 7, 2005: http://www.artima.com/lejava/articles/data_mining.html
       • http://www.stanford.edu/class/cs345a/#handouts