Neo4J and Weka 2

Combining a recommendation engine with a graph database as a sample of the potential of emerging technologies.

Published in: Technology, Education

Transcript of "Neo4J and Weka 2 "

  1. Combining the Neo4j graph database with Weka
     A basic “toy” example drawn from mining SEC Form D filings
  2. Experiment: Find the intersection among VC firms related to Google and its latest acquisitions (i.e. the “Dataset”), and play with “predicting” the chance of a newly funded startup being acquired by Google by examining proximity.
  3. Weka:
     A machine learning toolkit containing classification and clustering algorithms. Used here to create recommendations based on input.
     Neo4j:
     A graph database, well suited to social network data. Used here to find the “shortest path” between two nodes.
  4. Neo4j can handle large sets of unstructured linked data:
  5. RDF: Subject - Property - Object
     Neo4j: Node 1 - Relationship - Node 2
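
The correspondence between an RDF-style statement and a node-relationship-node edge can be sketched in plain Java, with no Neo4j dependency. This is only an illustration of the data shape: the class name `TripleSketch`, its `Edge` type and the `outgoing` helper are hypothetical, and the `RelType` values simply mirror the relationship names that appear in the traversal slide.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (hypothetical names, not the Neo4j API):
// one fact stored as Node 1 - Relationship - Node 2.
class TripleSketch {
    // Relationship names mirroring those used in the slides.
    enum RelType { FUNDED, ACQUIRED, BOARD }

    static final class Edge {
        final String from;   // RDF subject  / Neo4j start node
        final RelType type;  // RDF property / Neo4j relationship
        final String to;     // RDF object   / Neo4j end node

        Edge(String from, RelType type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }

        // Reads the edge back as a subject-property-object statement.
        String asStatement() {
            return from + " " + type + " " + to;
        }
    }

    // Returns all edges that start at the given node.
    static List<Edge> outgoing(List<Edge> edges, String node) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from.equals(node)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Edge e = new Edge("Sequoia Capital", RelType.FUNDED, "Google");
        System.out.println(e.asStatement());
    }
}
```

The same three fields serve both readings, which is why statements of this kind map naturally onto a graph database.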
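  6. Statement: “Sequoia Capital funded Google”
     Initialize the database:
         import org.neo4j.graphdb.*;
         import org.neo4j.kernel.EmbeddedGraphDatabase;
         import org.neo4j.index.IndexService;
         import org.neo4j.index.lucene.LuceneIndexService;
         GraphDatabaseService graphDb = new EmbeddedGraphDatabase("SEC");
         IndexService index = new LuceneIndexService(graphDb);
     Create the nodes (in Neo4j 1.x, writes must run inside a transaction):
         Transaction tx = graphDb.beginTx();
         Node sequoia = graphDb.createNode();
         sequoia.setProperty("name", "Sequoia Capital");
         Node google = graphDb.createNode();
         google.setProperty("name", "Google");
         index.index(sequoia, "name", "Sequoia Capital");
     Create the relationship (InvestorRelationTypes is the relationship-type enum used in the traversal):
         Relationship rel = sequoia.createRelationshipTo(google, InvestorRelationTypes.FUNDED);
         tx.success();
         tx.finish();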
  7. Traverse the graph from a node:
         Traverser traverser = node.traverse(
             Traverser.Order.DEPTH_FIRST,
             StopEvaluator.END_OF_NETWORK,
             new ReturnableEvaluator() {
                 public boolean isReturnableNode(TraversalPosition currentPosition) {
                     // No relationship has been traversed yet at the start node.
                     Relationship last = currentPosition.lastRelationshipTraversed();
                     // Return only nodes reached through a FUNDED relationship.
                     return last != null && last.isType(InvestorRelationTypes.FUNDED);
                 }
             },
             InvestorRelationTypes.BOARD, Direction.INCOMING,
             InvestorRelationTypes.FUNDED, Direction.INCOMING,
             InvestorRelationTypes.ACQUIRED, Direction.OUTGOING);
         return traverser.getAllNodes();
  8. “Path to Google:”
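
The “shortest path” idea mentioned earlier can be illustrated with a breadth-first search over a small in-memory graph. This is a plain-Java sketch, not the Neo4j traversal from the slides (which is depth-first and returns all matching nodes). The class `PathToGoogleSketch` and its methods are hypothetical, relationship direction is ignored, and apart from the Sequoia Capital and Google link stated in the deck, the names `StartupX` and `InvestorA` are made-up placeholders.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch (hypothetical names): shortest path by breadth-first search.
class PathToGoogleSketch {
    // Undirected adjacency list: proximity ignores relationship direction here.
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    void relate(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Returns the node names from start to goal, or an empty list if unreachable.
    List<String> shortestPath(String start, String goal) {
        if (!adjacency.containsKey(start) || !adjacency.containsKey(goal)) {
            return Collections.emptyList();
        }
        Map<String, String> parent = new HashMap<>(); // visited node -> predecessor
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) {
                // Walk the predecessor chain back to the start.
                LinkedList<String> path = new LinkedList<>();
                for (String n = goal; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            for (String next : adjacency.get(current)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        PathToGoogleSketch graph = new PathToGoogleSketch();
        graph.relate("Sequoia Capital", "Google");   // from the deck
        graph.relate("Sequoia Capital", "StartupX"); // placeholder data
        graph.relate("StartupX", "InvestorA");       // placeholder data
        System.out.println(graph.shortestPath("InvestorA", "Google"));
    }
}
```

The number of hops in the returned path is one possible way to express the “proximity” of a startup to Google as a number that a learner can consume.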
  9. Weka
     Create attributes (table input)
     Create dataset for learning
     Build predictive model
     Evaluate quality of model
     Predict the rank based on input
  10. Basic terms in Weka
      Dataset: A set of data items, the dataset, is a very basic concept of machine learning. A dataset is roughly equivalent to a two-dimensional spreadsheet or database table. In Weka a dataset is a collection of Instances.
      Concept: the thing to be learned
  11. Instance: a dataset consists of instances
  12. Attribute: each instance consists of attributes
  13. Classifier
      Weka
      Create attributes (table input)
      Create dataset for learning
      Build predictive model
      Evaluate quality of model
      Predict the rank based on input
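
The terms above can be pictured as a small table: attributes are the columns, instances are the rows, and the concept is the column to be learned. The following plain-Java sketch is only a mental model and does not use the Weka API. The class `DatasetTermsSketch` and its methods are hypothetical, and the numeric values in `main` are invented placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (hypothetical names, not the Weka API):
// a dataset as a table of instances (rows) over attributes (columns).
class DatasetTermsSketch {
    final String name;
    final List<String> attributes;                      // column names
    final List<double[]> instances = new ArrayList<>(); // one row per instance
    final int classIndex;                               // column holding the concept to learn

    DatasetTermsSketch(String name, List<String> attributes, int classIndex) {
        this.name = name;
        this.attributes = attributes;
        this.classIndex = classIndex;
    }

    // Adds one instance; it must supply exactly one value per attribute.
    void add(double... values) {
        if (values.length != attributes.size()) {
            throw new IllegalArgumentException("expected " + attributes.size() + " values");
        }
        instances.add(values.clone());
    }

    // Looks up a single cell by row number and attribute name.
    double value(int row, String attribute) {
        int column = attributes.indexOf(attribute);
        if (column < 0) {
            throw new IllegalArgumentException("unknown attribute: " + attribute);
        }
        return instances.get(row)[column];
    }

    String classAttribute() {
        return attributes.get(classIndex);
    }

    public static void main(String[] args) {
        DatasetTermsSketch data = new DatasetTermsSketch(
            "VC", Arrays.asList("path", "category", "similarity", "probability"), 3);
        data.add(2, 1, 0.8, 0.6); // placeholder values
        System.out.println(data.classAttribute() + " = " + data.value(0, "probability"));
    }
}
```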
  14. Example: Attributes
  15. 1) Create attributes:
          import weka.core.Attribute;
          import weka.core.FastVector;
          import weka.core.Instance;
          import weka.core.Instances;
          Attribute pathAttribute = new Attribute("path");
          Attribute categoryAttribute = new Attribute("category");
          Attribute similarityAttribute = new Attribute("similarity");
          Attribute probabilityAttribute = new Attribute("probability");
      In Weka a FastVector is the container for attributes:
          FastVector allAttributes = new FastVector(4);
          allAttributes.addElement(pathAttribute);
          allAttributes.addElement(categoryAttribute);
          // The next two additions are missing on the slide but are needed for 4-value instances.
          allAttributes.addElement(similarityAttribute);
          allAttributes.addElement(probabilityAttribute);
      2) Create dataset: an Instance is a “container” of attribute values, and the dataset is a container of Instances.
          Instances trainingDataSet = new Instances("VC", allAttributes, 17);
          // Not on the slide: Weka needs to know which attribute is the one to learn.
          trainingDataSet.setClassIndex(3);
      For each instance we set the values to be trained upon (path, category, similarity and rank are numeric values computed beforehand):
          Instance instance = new Instance(4);
          instance.setDataset(trainingDataSet);
          instance.setValue(0, path);
          instance.setValue(1, category);
          instance.setValue(2, similarity);
          instance.setValue(3, rank);
          trainingDataSet.add(instance);
  16. 3) Train classifier and evaluate
          import weka.classifiers.Evaluation;
          import weka.classifiers.functions.RBFNetwork;
          RBFNetwork rbfLearner = new RBFNetwork();
          rbfLearner.setNumClusters(17);
          rbfLearner.buildClassifier(trainingDataSet);
          // Evaluated on the training data itself, as on the slide.
          Evaluation learningSetEvaluation = new Evaluation(trainingDataSet);
          learningSetEvaluation.evaluateModel(rbfLearner, trainingDataSet);
      4) Predict unknown cases
          Instance testInstance = new Instance(4);
          testInstance.setDataset(trainingDataSet);
          testInstance.setValue(0, path);
          testInstance.setValue(1, category);
          testInstance.setValue(2, similarity);
          testInstance.setValue(3, 0);  // placeholder; this is the value being predicted
          double prediction = rbfLearner.classifyInstance(testInstance);