Weka and NetDraw



  1. 1. MIS510 Spring 2009 Introduction to Weka and NetDraw
  2. 2. Outline <ul><li>Weka </li></ul><ul><ul><li>Introduction </li></ul></ul><ul><ul><li>Weka Tools/Functions </li></ul></ul><ul><ul><li>How to use Weka? </li></ul></ul><ul><ul><ul><li>Weka Data File Format (Input) </li></ul></ul></ul><ul><ul><ul><li>Weka for Data Mining </li></ul></ul></ul><ul><ul><ul><li>Sample Output from Weka (Output) </li></ul></ul></ul><ul><ul><li>Conclusion </li></ul></ul><ul><li>NetDraw </li></ul><ul><ul><li>Introduction </li></ul></ul><ul><ul><li>How to use NetDraw? </li></ul></ul><ul><ul><ul><li>NetDraw Input Data File Format </li></ul></ul></ul><ul><ul><ul><li>Draw Networks using NetDraw </li></ul></ul></ul><ul><ul><li>Conclusion </li></ul></ul>
  3. 3. Weka
  4. 4. Introduction to Weka (Data Mining Tool) <ul><li>Weka was developed at the University of Waikato in New Zealand. http://www.cs.waikato.ac.nz/ml/weka/ </li></ul><ul><li>Weka is an open source data mining tool developed in Java. It is used for research, education, and applications. It runs on Windows, Linux, and Mac. </li></ul>
  5. 5. What can Weka do? <ul><li>Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset (using the GUI) or called from your own Java code (using the Weka Java library). </li></ul><ul><li>Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. </li></ul>
  6. 6. Weka Tools/Functions <ul><li>Tools (or functions) in Weka include: </li></ul><ul><ul><li>Data preprocessing (e.g., Data Filters), </li></ul></ul><ul><ul><li>Classification (e.g., BayesNet, KNN, C4.5 Decision Tree, Neural Networks, SVM), </li></ul></ul><ul><ul><li>Regression (e.g., Linear Regression, Isotonic Regression, SVM for Regression), </li></ul></ul><ul><ul><li>Clustering (e.g., Simple K-means, Expectation Maximization (EM)), </li></ul></ul><ul><ul><li>Association rules (e.g., Apriori Algorithm, Predictive Accuracy, Confirmation Guided), </li></ul></ul><ul><ul><li>Feature Selection (e.g., Cfs Subset Evaluation, Information Gain, Chi-squared Statistic), and </li></ul></ul><ul><ul><li>Visualization (e.g., viewing different two-dimensional plots of the data). </li></ul></ul>
  7. 7. Weka’s Role in the Big Picture <ul><li>Input </li></ul><ul><li>Raw data </li></ul><ul><li>Data Mining by Weka </li></ul><ul><li>Pre-processing </li></ul><ul><li>Classification </li></ul><ul><li>Regression </li></ul><ul><li>Clustering </li></ul><ul><li>Association Rules </li></ul><ul><li>Visualization </li></ul><ul><li>Output </li></ul><ul><li>Result </li></ul>
  8. 8. How to use Weka? <ul><li>Weka Data File Format (Input) </li></ul><ul><li>Weka for Data Mining </li></ul><ul><li>Sample Output from Weka (Output) </li></ul>
  9. 9. Weka Data File Format (Input) <ul><li>FILE FORMAT </li></ul><ul><li>@relation RELATION_NAME </li></ul><ul><li>@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE </li></ul><ul><li>@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE </li></ul><ul><li>@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE </li></ul><ul><li>@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE </li></ul><ul><li>@data </li></ul><ul><li>DATAROW1 </li></ul><ul><li>DATAROW2 </li></ul><ul><li>DATAROW3 </li></ul><ul><li>The most popular data input format for Weka is “arff” (with “arff” being the extension of your input data file). </li></ul>
  10. 10. Example of an “arff” Input File <ul><li>@relation heart-disease-simplified </li></ul><ul><li>@attribute age numeric </li></ul><ul><li>@attribute sex { female, male} </li></ul><ul><li>@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} </li></ul><ul><li>@attribute cholesterol numeric </li></ul><ul><li>@attribute exercise_induced_angina { no, yes} </li></ul><ul><li>@attribute class { present, not_present} </li></ul><ul><li>@data </li></ul><ul><li>63,male,typ_angina,233,no,not_present </li></ul><ul><li>67,male,asympt,286,yes,present </li></ul><ul><li>67,male,asympt,229,yes,present </li></ul><ul><li>38,female,non_anginal,?,no,not_present </li></ul><ul><li>... </li></ul>In this example, age and cholesterol are numeric attributes; sex, chest_pain_type, exercise_induced_angina, and class are nominal attributes. A “?” marks a missing value.
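To make the layout of an “arff” file concrete, here is a minimal sketch that reads the header and data rows using only the Java standard library. `ArffSketch` is a hypothetical helper written for illustration, not part of Weka, and it assumes a simple dense file: unquoted attribute names without spaces, comma-separated values, and no commas inside values. In practice Weka's own loader should be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of Weka): reads a simple, dense ARFF text. */
final class ArffSketch {
    final String relation;
    final List<String> attributeNames = new ArrayList<>();
    final List<String> attributeTypes = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    ArffSketch(String text) {
        String rel = "";
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and % comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                rows.add(line.split("\\s*,\\s*")); // one value per attribute
            } else if (lower.startsWith("@relation")) {
                rel = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Assumption: unquoted attribute names without spaces.
                String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                attributeNames.add(parts[0]);
                attributeTypes.add(parts.length > 1 ? parts[1] : "");
            } else if (lower.startsWith("@data")) {
                inData = true; // everything after @data is a data row
            }
        }
        relation = rel;
    }

    /** The header and two of the data rows from the slide's example. */
    static String sample() {
        return String.join("\n",
            "@relation heart-disease-simplified",
            "@attribute age numeric",
            "@attribute sex { female, male}",
            "@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}",
            "@attribute cholesterol numeric",
            "@attribute exercise_induced_angina { no, yes}",
            "@attribute class { present, not_present}",
            "@data",
            "63,male,typ_angina,233,no,not_present",
            "38,female,non_anginal,?,no,not_present");
    }

    public static void main(String[] args) {
        ArffSketch arff = new ArffSketch(sample());
        System.out.println("relation: " + arff.relation);
        for (int i = 0; i < arff.attributeNames.size(); i++) {
            System.out.println(arff.attributeNames.get(i) + " : " + arff.attributeTypes.get(i));
        }
        System.out.println("data rows: " + arff.rows.size());
    }
}
```

Each data row must supply one value per declared attribute, in declaration order, which is why the sketch keeps names, types, and row values in parallel lists.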
  11. 11. Weka for Data Mining <ul><li>There are two main ways to use Weka for your data mining tasks. </li></ul><ul><ul><li>Use the Weka Graphical User Interface (GUI) </li></ul></ul><ul><ul><ul><li>The GUI is straightforward and easy to use, but it is not flexible: it cannot be called from your own application. </li></ul></ul></ul><ul><ul><li>Import the Weka Java library into your own Java application. </li></ul></ul><ul><ul><ul><li>Developers can use the Weka Java library to develop software, or modify the source code to meet special requirements. This is more flexible and advanced, but not as easy to use as the GUI. </li></ul></ul></ul>
  12. 12. Weka GUI (screenshot annotations: different analysis tools/functions; different attributes to choose; the value set of the chosen attribute and the number of input items with each value)
  13. 13. Weka GUI Classification Algorithms
  14. 14. Import the Weka Java library into your own Java application <ul><li>Three sets of classes you may need to use when developing your own application </li></ul><ul><ul><li>Classes for Loading Data </li></ul></ul><ul><ul><li>Classes for Classifiers </li></ul></ul><ul><ul><li>Classes for Evaluation </li></ul></ul>
  15. 15. Classes for Loading Data <ul><li>Related Weka classes </li></ul><ul><ul><li>weka.core.Instances </li></ul></ul><ul><ul><li>weka.core.Instance </li></ul></ul><ul><ul><li>weka.core.Attribute </li></ul></ul><ul><li>How to load an input data file into Instances? </li></ul><ul><ul><li>Every data row -> Instance, every attribute -> Attribute, the whole file -> Instances </li></ul></ul># Load a file as Instances FileReader reader = new FileReader(path); Instances instances = new Instances(reader); reader.close();
  16. 16. Classes for Loading Data <ul><li>Instances contains Attribute and Instance objects </li></ul><ul><ul><li>How to get every Instance within the Instances? </li></ul></ul><ul><ul><li>How to get an Attribute? </li></ul></ul># Get Instance Instance instance = instances.instance(index); # Get Instance Count int instanceCount = instances.numInstances(); # Get Attribute Attribute attribute = instances.attribute(index); # Get Attribute Count int attributeCount = instances.numAttributes();
  17. 17. Classes for Loading Data <ul><ul><li>How to get the Attribute value of each Instance? </li></ul></ul><ul><ul><li>Class Index (Very important! It tells Weka which attribute is the class to predict.) </li></ul></ul># Get value instance.value(index); or instance.value(attribute); # Get Class Index instances.classIndex(); or instances.classAttribute().index(); # Set Class Index instances.setClass(attribute); or instances.setClassIndex(index);
  18. 18. Classes for Classifiers <ul><li>Weka classes for C4.5, Naïve Bayes, and SVM </li></ul><ul><ul><li>Classifier: all classes which extend weka.classifiers.Classifier </li></ul></ul><ul><ul><ul><li>C4.5: weka.classifiers.trees.J48 </li></ul></ul></ul><ul><ul><ul><li>NaiveBayes: weka.classifiers.bayes.NaiveBayes </li></ul></ul></ul><ul><ul><ul><li>SVM: weka.classifiers.functions.SMO </li></ul></ul></ul><ul><li>How to build a classifier? </li></ul># Build a C4.5 Classifier Classifier c = new weka.classifiers.trees.J48(); c.buildClassifier(trainingInstances); # Build an SVM Classifier Classifier e = new weka.classifiers.functions.SMO(); e.buildClassifier(trainingInstances);
  19. 19. Classes for Evaluation <ul><li>Related Weka classes </li></ul><ul><ul><li>weka.classifiers.CostMatrix </li></ul></ul><ul><ul><li>weka.classifiers.Evaluation </li></ul></ul><ul><li>How to use the evaluation classes? </li></ul># Use Classifier To Do Classification CostMatrix costMatrix = null; Evaluation eval = new Evaluation(testingInstances, costMatrix); for (int i = 0; i < testingInstances.numInstances(); i++) { eval.evaluateModelOnceAndRecordPrediction(c, testingInstances.instance(i)); } # Print the results once, after all testing instances have been evaluated System.out.println(eval.toSummaryString(false)); System.out.println(eval.toClassDetailsString()); System.out.println(eval.toMatrixString());
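To illustrate the kind of numbers an evaluation summary and confusion matrix report, here is a small sketch that tallies a confusion matrix and accuracy from actual and predicted class indices. It uses only the Java standard library; `ConfusionSketch` and the label arrays in `main` are made up for illustration and are not Weka's implementation or real results.

```java
/** Hypothetical helper (not part of Weka): confusion matrix and accuracy. */
final class ConfusionSketch {

    /** counts[a][p] = number of items whose actual class is a and predicted class is p. */
    static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        int[][] counts = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            counts[actual[i]][predicted[i]]++;
        }
        return counts;
    }

    /** Fraction of items on the diagonal, i.e. predicted class equals actual class. */
    static double accuracy(int[][] counts) {
        int correct = 0;
        int total = 0;
        for (int a = 0; a < counts.length; a++) {
            for (int p = 0; p < counts[a].length; p++) {
                total += counts[a][p];
                if (a == p) {
                    correct += counts[a][p];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Made-up labels for illustration: 0 = present, 1 = not_present.
        int[] actual = {0, 0, 1, 1, 1};
        int[] predicted = {0, 1, 1, 1, 0};
        int[][] counts = confusionMatrix(actual, predicted, 2);
        for (int[] row : counts) {
            System.out.println(java.util.Arrays.toString(row));
        }
        System.out.println("accuracy = " + accuracy(counts));
    }
}
```

Rows are actual classes and columns are predicted classes, so off-diagonal cells show which classes get confused with each other.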
  20. 20. Classes for Evaluation <ul><li>Cross Validation </li></ul><ul><ul><li>In cross-validation, we split a single dataset into N equal shares (folds). In each of N rounds, N-1 folds are used as the training dataset and the remaining fold is used as the testing dataset, so every fold is used for testing exactly once. </li></ul></ul><ul><ul><li>The most widely used setting is 10-fold cross-validation. </li></ul></ul>
  21. 21. Classes for Evaluation <ul><li>How to obtain the training dataset and the testing dataset? </li></ul>Random random = new Random(seed); instances.randomize(random); instances.stratify(N); for (int i = 0; i < N; i++) { Instances train = instances.trainCV(N, i, random); Instances test = instances.testCV(N, i); }
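The N-fold split can also be sketched without Weka, to show what happens to the data in each round. The sketch below uses only the Java standard library; `KFoldSketch` is a hypothetical helper, and unlike the Weka calls above it does not stratify by class, it only shuffles row indices and deals them into folds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of Weka): plain, unstratified N-fold split. */
final class KFoldSketch {

    /** Shuffles the indices 0..n-1 with the given seed and deals them into numFolds folds. */
    static List<List<Integer>> folds(int n, int numFolds, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % numFolds).add(indices.get(i)); // round-robin keeps fold sizes within 1
        }
        return folds;
    }

    /** Training indices for one round: every fold except the testing fold. */
    static List<Integer> trainIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(25, 10, 42L);
        for (int round = 0; round < folds.size(); round++) {
            System.out.println("round " + round
                + ": train=" + trainIndices(folds, round).size()
                + " test=" + folds.get(round).size());
        }
    }
}
```

Fixing the seed makes the split reproducible, which matters when comparing classifiers on the same folds.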
  22. 22. Sample Output from Weka
  23. 23. Conclusion about Weka <ul><li>In sum, the overall goal of Weka is to build a state-of-the-art facility for developing machine learning (ML) techniques and to allow people to apply them to real-world data mining problems. </li></ul><ul><li>Detailed documentation about the different functions provided by Weka can be found on the Weka website. </li></ul><ul><li>Weka is available at: </li></ul><ul><li>http://www.cs.waikato.ac.nz/ml/weka </li></ul>
  24. 24. NetDraw
  25. 25. Introduction to NetDraw (Visualization Tool) <ul><li>NetDraw is an open source program written by Steve Borgatti from Analytic Technologies. It is often used for visualizing both 1-mode and 2-mode social network data. </li></ul><ul><li>You can download it from: </li></ul><ul><ul><li>http://www.analytictech.com/downloadnd.htm </li></ul></ul><ul><li>(Compared to Weka, it is much easier to use :P) </li></ul>
  26. 26. What can NetDraw do? <ul><li>NetDraw can: </li></ul><ul><ul><li>handle multiple relations at the same time, and </li></ul></ul><ul><ul><li>use node attributes to set colors, shapes, and sizes of nodes. </li></ul></ul><ul><li>Pictures can be saved in metafile, jpg, gif and bitmap formats. </li></ul><ul><li>Two basic kinds of layouts are implemented: a circle and an MDS based on geodesic distance. </li></ul><ul><li>You can also rotate, flip, shift, resize and zoom configurations. </li></ul>
  27. 27. How to use NetDraw? <ul><li>NetDraw Input Data File Format </li></ul><ul><li>Draw Networks using NetDraw </li></ul>
  28. 28. NetDraw Input Data File Format: “vna” Data Format. The VNA data format (with “vna” being the extension of the input data file) allows users to store not only network data but also attributes of the nodes, along with information about how to display them (color, size, etc.). Example:
      *node data
      "ID", num
      "$10 Gift Card off REGIS SALON (SALON SERVICES) + E" 2
      "$10 iTunes Gift Certificate exp 9/2008" 2
      "$10 STARBUCKS gift CARD CERTIFICATE" 3
      "$10 Target Gift Card" 3
      "$10.00 iTunes Music Gift Card - Free Shipping" 2
      "$100 Best Buy Gift Card" 15
      "$100 Gap Gift Card - FREE Shipping" 9
      … … … … … … … …
      *Tie data
      FROM TO "Strength"
      "Home Depot Gift Card $500." "$100 Home Depot Gift Card Accepted Nationwide" 1
      "** $250 Best Buy GiftCard Gift Card Gift Certifica" "$25 Best Buy Gift Card for Store or Online!" 1
      "$50 Bed Bath & Beyond Gift Card - FREE SHIPPING!" "$200 Cost Plus World Market Gift Card 4 Jewelry Be" 1
      "$500.00 Best Buy gift certificate" "$15 Best Buy Gift Card *Free Shipping*" 1
      "$25 Best Buy Gift Card for Store or Online!" "$15 Best Buy Gift Card *Free Shipping*" 1
      "Bath and Body Works $25 Gift Card" "$200 Cost Plus World Market Gift Card 4 Jewelry Be" 1
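As a rough sketch of how such a file could be produced from your own data, the following Java code builds the text of a “vna” file with a node section and a tie section, copying the header lines from the slide's example. `VnaSketch`, `Tie`, and the placeholder labels are hypothetical; the sketch assumes labels contain no double-quote characters and covers only the two sections shown above, not the full format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: writes node and tie data in the layout shown on the slide. */
final class VnaSketch {

    /** One directed tie with an integer strength. */
    static final class Tie {
        final String from;
        final String to;
        final int strength;

        Tie(String from, String to, int strength) {
            this.from = from;
            this.to = to;
            this.strength = strength;
        }
    }

    /** Assumption: labels never contain a double quote, so no escaping is done. */
    static String quote(String label) {
        return "\"" + label + "\"";
    }

    /** nodes maps each node ID to its numeric attribute (the "num" column). */
    static String toVna(Map<String, Integer> nodes, List<Tie> ties) {
        StringBuilder sb = new StringBuilder();
        sb.append("*node data\n");
        sb.append("\"ID\", num\n"); // header line as it appears on the slide
        for (Map.Entry<String, Integer> node : nodes.entrySet()) {
            sb.append(quote(node.getKey())).append(' ').append(node.getValue()).append('\n');
        }
        sb.append("*Tie data\n");
        sb.append("FROM TO \"Strength\"\n");
        for (Tie tie : ties) {
            sb.append(quote(tie.from)).append(' ')
              .append(quote(tie.to)).append(' ')
              .append(tie.strength).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder items and counts, for illustration only.
        Map<String, Integer> nodes = new LinkedHashMap<>();
        nodes.put("item A", 2);
        nodes.put("item B", 3);
        List<Tie> ties = new ArrayList<>();
        ties.add(new Tie("item A", "item B", 1));
        System.out.print(toVna(nodes, ties));
    }
}
```

The returned text can be saved to a file with the “vna” extension; a `LinkedHashMap` is used so nodes are written in insertion order.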
  29. 29. Draw Networks using NetDraw (screenshot annotations: display setup of the nodes and relations; different functions; the networks, with nodes representing the individuals and links representing the relations)
  30. 30. Analysis Example: Hot Item Analysis based on gift card sales information from eBay <ul><li>Each circle in the graph represents an active item in the database. </li></ul><ul><li>The label of the circle is the item title. </li></ul><ul><li>The bigger the circle and its label, the hotter the item. </li></ul><ul><li>Items are clustered together based on the brand information. </li></ul><ul><li>Hot Topics during April 15 – April 22, 2007 </li></ul><ul><li>Hot Topics during April 22 – April 29, 2007 </li></ul>
  31. 31. Conclusion <ul><li>In sum, NetDraw can be used for social network visualization. </li></ul><ul><li>There are a lot of parameters to play with in the tool. The results can be saved as EMF, WMF, BMP and JPG files. </li></ul><ul><li>NetDraw is available at: </li></ul><ul><li>http://www.analytictech.com/downloadnd.htm </li></ul><ul><li>The website also provides detailed documentation. </li></ul><ul><li>If you are interested, you may also try other visualization tools such as JUNG (http://jung.sourceforge.net/) and GraphViz (http://www.graphviz.org/). </li></ul>
  32. 32. Some Suggestions <ul><li>Carefully prepare your data according to the input format required by each tool. </li></ul><ul><li>Read the documentation of each tool that you decide to use and understand its functionality. Think about how it can be applied to your project. </li></ul><ul><li>Download and play with the tools. You cannot learn anything unless you try them yourself! </li></ul>
  33. 33. Thanks! Good luck with your projects!