Neo4J and Weka 2
Combining a recommendation engine with a graph database as a sample of the potential of emerging technologies.

Transcript

  • 1. Combining the NEO4J graph database with WEKA
    A basic “toy” example based on mining SEC Form D filings
  • 2. Experiment: Find the intersection among VC firms related to Google and its latest acquisitions (i.e. the “Dataset”), then play with “predicting” the chance that a newly funded startup will be acquired by Google by examining its proximity in the graph.
  • 3. Weka:
    A machine learning toolkit containing classification and clustering algorithms. Used here to create recommendations based on input.
    Neo4j:
    A graph database, well suited to social network data. Used here to find the “shortest path” between two nodes.
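The “shortest path” idea can be illustrated without any database. The following is a minimal sketch in plain Java using breadth-first search; it is not the Neo4j API. The class name `ShortestPathSketch`, its methods, and the nodes `StartupX`, `StartupY` and `OtherVC` are hypothetical; only Sequoia Capital and Google come from the slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class ShortestPathSketch {
    // Undirected adjacency list: node name -> neighbouring node names.
    private final Map<String, List<String>> adjacency = new HashMap<>();

    void connect(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Breadth-first search. Returns the node names from start to goal,
    // or an empty list when the two nodes are not connected.
    List<String> shortestPath(String start, String goal) {
        Map<String, String> cameFrom = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        cameFrom.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) {
                // Walk the predecessor links back to the start.
                List<String> path = new ArrayList<>();
                for (String n = goal; n != null; n = cameFrom.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : adjacency.getOrDefault(current, Collections.emptyList())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        ShortestPathSketch g = new ShortestPathSketch();
        g.connect("Sequoia Capital", "Google");    // from the slides
        g.connect("Sequoia Capital", "StartupX");  // hypothetical
        g.connect("OtherVC", "StartupY");          // hypothetical, not linked to Google
        System.out.println(g.shortestPath("StartupX", "Google"));
        System.out.println(g.shortestPath("StartupY", "Google"));
    }
}
```

The length of the returned path is one possible proximity measure: a shorter path to Google means a closer relationship, and an empty list means no relationship at all.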
  • 4. Neo4J can handle large sets of unstructured linked data:
  • 5. RDF: Subject - Property - Object
    Neo4J: Node 1 - Relationship - Node 2
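To make the correspondence concrete, here is a small plain-Java sketch that maps an RDF-style triple onto a node-relationship-node edge. The class and field names are hypothetical and neither an RDF library nor Neo4j is involved; it only shows that the two shapes carry the same three pieces of information.

```java
class TripleSketch {
    // RDF-style statement: subject - property - object.
    static final class Triple {
        final String subject;
        final String property;
        final String object;

        Triple(String subject, String property, String object) {
            this.subject = subject;
            this.property = property;
            this.object = object;
        }
    }

    // Property-graph view of the same fact: node - relationship - node.
    static final class Edge {
        final String startNode;
        final String relationshipType;
        final String endNode;

        Edge(String startNode, String relationshipType, String endNode) {
            this.startNode = startNode;
            this.relationshipType = relationshipType;
            this.endNode = endNode;
        }
    }

    // The subject becomes the start node, the property becomes the
    // relationship type, and the object becomes the end node.
    static Edge toEdge(Triple t) {
        return new Edge(t.subject, t.property, t.object);
    }

    static String describe(Edge e) {
        return "(" + e.startNode + ")-[" + e.relationshipType + "]->(" + e.endNode + ")";
    }

    public static void main(String[] args) {
        Triple statement = new Triple("Sequoia Capital", "FUNDED", "Google");
        System.out.println(describe(toEdge(statement)));
    }
}
```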
  • 6. Statement:
    “Sequoia Capital Funded Google”
    Initialize Database:
    GraphDatabaseService graphDb = new EmbeddedGraphDatabase( "SEC" );
    IndexService index = new LuceneIndexService( graphDb );
    // Note: in embedded Neo4j the writes below need to run inside a transaction.
    Create the Nodes:
    Node sequoia = graphDb.createNode();
    sequoia.setProperty( "name", "Sequoia Capital" );
    Node google = graphDb.createNode();
    google.setProperty( "name", "Google" );
    index.index( sequoia, "name", "Sequoia Capital" );
    Create Relationship:
    Relationship rel = sequoia.createRelationshipTo( google, InvestorRelationTypes.FUNDED );
  • 7. Traverser traverser = node.traverse(
        Order.DEPTH_FIRST,
        StopEvaluator.END_OF_NETWORK,
        new ReturnableEvaluator() {
            public boolean isReturnableNode( TraversalPosition currentPosition ) {
                // The start node has no last relationship, so guard against null.
                Relationship last = currentPosition.lastRelationshipTraversed();
                // Return only nodes that were reached over a FUNDED relationship.
                return last != null && last.isType( InvestorRelationTypes.FUNDED );
            }
        },
        InvestorRelationTypes.BOARD, Direction.INCOMING,
        InvestorRelationTypes.FUNDED, Direction.INCOMING,
        InvestorRelationTypes.ACQUIRED, Direction.OUTGOING );
    return traverser.getAllNodes();
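The logic of that traversal can be sketched in plain Java, independent of Neo4j: walk depth-first, follow BOARD and FUNDED relationships against their direction and ACQUIRED relationships along it, and return only nodes that were reached over a FUNDED relationship. The class `TraversalSketch`, its methods, and the nodes `StartupX` and `OtherVC` are hypothetical; this is an illustration of the idea, not the Neo4j traverser itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TraversalSketch {
    enum RelType { BOARD, FUNDED, ACQUIRED }

    // Directed, typed relationship: from -[type]-> to.
    static final class Edge {
        final String from;
        final RelType type;
        final String to;

        Edge(String from, RelType type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void relate(String from, RelType type, String to) {
        edges.add(new Edge(from, type, to));
    }

    // Depth-first walk from start. Each node is visited once; it is part of
    // the result only if the relationship that first led to it was FUNDED.
    Set<String> fundersReachableFrom(String start) {
        Set<String> visited = new HashSet<>();
        Set<String> result = new LinkedHashSet<>();
        visit(start, null, visited, result);
        return result;
    }

    private void visit(String node, RelType last, Set<String> visited, Set<String> result) {
        if (!visited.add(node)) {
            return; // already seen
        }
        if (last == RelType.FUNDED) {
            result.add(node);
        }
        for (Edge e : edges) {
            // BOARD and FUNDED are followed as incoming relationships.
            boolean incoming = e.to.equals(node)
                    && (e.type == RelType.FUNDED || e.type == RelType.BOARD);
            // ACQUIRED is followed as an outgoing relationship.
            boolean outgoing = e.from.equals(node) && e.type == RelType.ACQUIRED;
            if (incoming) {
                visit(e.from, e.type, visited, result);
            }
            if (outgoing) {
                visit(e.to, e.type, visited, result);
            }
        }
    }

    public static void main(String[] args) {
        TraversalSketch g = new TraversalSketch();
        g.relate("Sequoia Capital", RelType.FUNDED, "Google"); // from the slides
        g.relate("Google", RelType.ACQUIRED, "StartupX");      // hypothetical
        g.relate("OtherVC", RelType.FUNDED, "StartupX");       // hypothetical
        System.out.println(g.fundersReachableFrom("Google"));
    }
}
```

Starting from Google, the walk collects Google's own funders and the funders of companies Google acquired, which is the set of VC firms the experiment is interested in.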
  • 8. “Path to Google:”
  • 9. Weka
    Create Attributes (table input)
    Create DataSet for Learning
    Build predictive model
    Evaluate quality of Model
    Predict the rank based on input
  • 10. Basic terms in WEKA
    • Dataset
    A dataset, a set of data items, is a very basic concept of machine learning. It is roughly equivalent to a two-dimensional spreadsheet or database table. In WEKA a dataset is a collection of Instances.
    • Concept – The thing to be learned
    • 11. Instance – A dataset consists of instances
    • 12. Attribute – Each instance consists of attributes
    • 13. Classifier
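The spreadsheet analogy for these terms can be sketched in plain Java: attribute names are the column headers, each instance is one row of values, and the dataset is the collection of rows. This is only an illustration under that analogy, not Weka's own classes; `DatasetSketch` and its methods are hypothetical, the attribute names are the ones used later in the slides, and the numeric values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DatasetSketch {
    final List<String> attributeNames;                   // column headers
    final List<double[]> instances = new ArrayList<>();  // one row per instance

    DatasetSketch(String... attributeNames) {
        this.attributeNames = Arrays.asList(attributeNames);
    }

    // Adds one instance; it must supply a value for every attribute.
    void add(double... values) {
        if (values.length != attributeNames.size()) {
            throw new IllegalArgumentException(
                    "expected " + attributeNames.size() + " values, got " + values.length);
        }
        instances.add(values.clone());
    }

    // Looks up a single cell by row index and attribute name.
    double value(int row, String attribute) {
        int column = attributeNames.indexOf(attribute);
        if (column < 0) {
            throw new IllegalArgumentException("unknown attribute: " + attribute);
        }
        return instances.get(row)[column];
    }

    public static void main(String[] args) {
        DatasetSketch data = new DatasetSketch("path", "category", "similarity", "probability");
        data.add(2.0, 1.0, 0.5, 0.25); // made-up values for illustration
        System.out.println(data.value(0, "similarity"));
    }
}
```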
  • 14. Example: Attributes
  • 15. 1) Create Attributes:
    Attribute pathAttribute = new Attribute("path");
    Attribute categoryAttribute = new Attribute("category");
    Attribute similarityAttribute = new Attribute("similarity");
    Attribute probabilityAttribute = new Attribute("probability");
    In Weka, a FastVector is the container for Attributes:
    FastVector allAttributes = new FastVector(4);
    allAttributes.addElement(pathAttribute);
    allAttributes.addElement(categoryAttribute);
    allAttributes.addElement(similarityAttribute);
    allAttributes.addElement(probabilityAttribute);
    2) Create Dataset:
    An Instance is a “container” of attribute values, and the Dataset is a container of Instances.
    Instances trainingDataSet = new Instances("VC", allAttributes, 17);
    For each instance we set the values to be trained upon:
    Instance instance = new Instance(4);
    instance.setDataset(trainingDataSet);
    instance.setValue(0, path);
    instance.setValue(1, category);
    instance.setValue(2, similarity);
    instance.setValue(3, rank);
    trainingDataSet.add(instance);
  • 16. 3) Train Classifier and Evaluate
    trainingDataSet.setClassIndex(3); // tell Weka which attribute to predict; here the last one
    RBFNetwork rbfLearner = new RBFNetwork();
    rbfLearner.setNumClusters(17);
    rbfLearner.buildClassifier(trainingDataSet);
    // Evaluated on the training data itself, so this measures fit rather than generalization.
    Evaluation learningSetEvaluation = new Evaluation(trainingDataSet);
    learningSetEvaluation.evaluateModel(rbfLearner, trainingDataSet);
    4) Predict Unknown Cases
    Instance testInstance = new Instance(4);
    testInstance.setDataset(trainingDataSet);
    testInstance.setValue(0, path);
    testInstance.setValue(1, category);
    testInstance.setValue(2, similarity);
    testInstance.setValue(3, 0); // placeholder; this is the value to be predicted
    double prediction = rbfLearner.classifyInstance(testInstance);
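Step 3 evaluates the model by comparing its predictions with known values. As a rough illustration of what such an evaluation measures for a numeric target, the plain-Java sketch below computes two common error statistics by hand. It is not Weka's implementation; the class name `ErrorMetricsSketch` is hypothetical and the numbers in `main` are made up.

```java
class ErrorMetricsSketch {
    private static void check(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    // Average of |actual - predicted| over all instances.
    static double meanAbsoluteError(double[] actual, double[] predicted) {
        check(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    // Square root of the average squared difference; penalizes large misses more.
    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        check(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        double[] actual = {0.9, 0.2, 0.5};    // made-up known ranks
        double[] predicted = {0.8, 0.4, 0.5}; // made-up model outputs
        System.out.println("MAE:  " + meanAbsoluteError(actual, predicted));
        System.out.println("RMSE: " + rootMeanSquaredError(actual, predicted));
    }
}
```

Lower values mean the predictions sit closer to the known ranks. Computing them on data that was not used for training gives a more honest picture than computing them on the training set.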