1. Combining the Neo4j graph database with Weka: a basic “toy” example drawn from mining SEC Form D filings
2. Experiment: Find the intersection among VC firms related to Google and its latest acquisitions (i.e. the “Dataset”), and play with “predicting” the chance of a newly funded startup being acquired by Google by examining proximity.
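The “intersection among VC firms” idea can be illustrated with a plain set intersection. This is only a minimal sketch in plain Java, not the deck's actual Neo4j query; the class name `InvestorOverlapSketch`, the method `sharedInvestors`, and the investor names (`VC_A`, `VC_B`, `VC_C`) are all made up for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

class InvestorOverlapSketch {
    // Returns the investors that appear in both sets (sorted for stable output).
    static Set<String> sharedInvestors(Set<String> first, Set<String> second) {
        Set<String> shared = new TreeSet<>(first);
        shared.retainAll(second); // keep only names also present in 'second'
        return shared;
    }

    public static void main(String[] args) {
        // Hypothetical data: investors behind a past acquisition and a new startup.
        Set<String> acquiredCompanyInvestors = Set.of("VC_A", "VC_B");
        Set<String> newStartupInvestors = Set.of("VC_B", "VC_C");
        System.out.println(sharedInvestors(acquiredCompanyInvestors, newStartupInvestors));
    }
}
```

A non-empty result suggests the new startup sits close to a company Google has already acquired, which is the kind of proximity the experiment examines.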
3. Weka: Machine-learning toolkit containing classification and clustering algorithms. Used here to create recommendations based on input. Neo4j: Graph database, well suited to social-network data. Used here to find the “shortest path” between two nodes.
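To make the “shortest path” notion concrete, here is a minimal sketch of a breadth-first search over an in-memory, undirected graph. It is a plain-Java stand-in and does not use the Neo4j API; the class `ShortestPathSketch`, its methods, and the node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class ShortestPathSketch {
    // Undirected adjacency list: node name -> names of neighbouring nodes.
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Number of edges on a shortest path from start to goal, or -1 if none exists.
    int shortestPathLength(String start, String goal) {
        if (!adjacency.containsKey(start) || !adjacency.containsKey(goal)) {
            return -1;
        }
        Map<String, Integer> distance = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) {
                return distance.get(current);
            }
            for (String next : adjacency.get(current)) {
                if (!distance.containsKey(next)) { // first visit is the shortest
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    // Hypothetical toy graph: companies and VC firms as nodes.
    static ShortestPathSketch toyGraph() {
        ShortestPathSketch graph = new ShortestPathSketch();
        graph.addEdge("Google", "AcquiredCo");   // Google acquired AcquiredCo
        graph.addEdge("AcquiredCo", "VC_A");     // VC_A funded AcquiredCo
        graph.addEdge("VC_A", "NewStartup");     // VC_A also funded NewStartup
        graph.addEdge("VC_B", "OtherStartup");   // not connected to Google
        return graph;
    }

    public static void main(String[] args) {
        ShortestPathSketch graph = toyGraph();
        System.out.println(graph.shortestPathLength("Google", "NewStartup"));   // prints 3
        System.out.println(graph.shortestPathLength("Google", "OtherStartup")); // prints -1
    }
}
```

In this sketch, a smaller path length means closer proximity to Google; a value like this could serve as the "path" input fed to the classifier.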
15. 1) Create Attributes:
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
Attribute pathAttribute = new Attribute("path");
Attribute categoryAttribute = new Attribute("category");
Attribute similarityAttribute = new Attribute("similarity");
Attribute probabilityAttribute = new Attribute("probability");
In Weka, a FastVector is the container for Attributes:
FastVector allAttributes = new FastVector(4);
allAttributes.addElement(pathAttribute);
allAttributes.addElement(categoryAttribute);
allAttributes.addElement(similarityAttribute);
allAttributes.addElement(probabilityAttribute);
2) Create Dataset: An Instance is a “container” of attribute values, and the dataset (Instances) is a container of Instance objects.
Instances trainingDataSet = new Instances("VC", allAttributes, 17);
trainingDataSet.setClassIndex(3); // the "probability" attribute is the value to predict
For each instance we set the values to be trained upon:
Instance instance = new Instance(4);
instance.setDataset(trainingDataSet);
instance.setValue(0, path);
instance.setValue(1, category);
instance.setValue(2, similarity);
instance.setValue(3, rank);
trainingDataSet.add(instance);
16. 3) Train Classifier and Evaluate:
import weka.classifiers.Evaluation;
import weka.classifiers.functions.RBFNetwork;
RBFNetwork rbfLearner = new RBFNetwork();
rbfLearner.setNumClusters(17);
rbfLearner.buildClassifier(trainingDataSet);
Evaluation learningSetEvaluation = new Evaluation(trainingDataSet);
learningSetEvaluation.evaluateModel(rbfLearner, trainingDataSet);
4) Predict Unknown Cases:
Instance testInstance = new Instance(4);
testInstance.setDataset(trainingDataSet);
testInstance.setValue(0, path);
testInstance.setValue(1, category);
testInstance.setValue(2, similarity);
testInstance.setValue(3, 0); // placeholder; this is the value being predicted
double prediction = rbfLearner.classifyInstance(testInstance);