Building a Directed Graph with MongoDB

Details of how Wordnik built a directed graph on top of MongoDB. This is the presentation given during MongoSF 2011 by Tony Tam.

Transcript

  • 1. Building a Directed Graph with MongoDB
    MongoSF 5/24/2011
    By Tony Tam @fehguy
  • 2. Who is Wordnik
    Word + Meaning Discovery Engine
    Clustered Application built with:
    Scala/Java/Jetty
    Only way in is via REST
    19M API calls/day @ 7ms/query average
    Physical servers
    72GB RAM, 8 core
    4.3TB DAS
    We’ve been MongoDB users for ~1.5 yrs
    Used in master/slave
    14B documents in MongoDB
  • 3. Why a graph for words
    Technique to model network relationships
    Properties are dynamic
    Links are “arbitrary”
    Runtime performance
    Answers in < 5ms/request
    Routing functions based on goals
    “find most likely word for X”
    “find more common form of Y”
  • 4. Why a graph for words
    Misspellings, abbreviations, texting, Twitter
  • 5. More about graphs
    Different types of Graphs
    Decisions have huge impact on design + implementation
    Nodes (vertices)
    String and numeric properties
    Edges (links)
    Finite set of labeled edge types (~30)
    Multiple target nodes per edge
    Each potentially different weight
    Directed, non-symmetrical
  • 6. Why build on MongoDB?
    Word Graph is core to Wordnik
    Many ways to build a graph
    Dedicated graph DBs
    Relational DBs
    MongoDB Document Storage
    Uber-flexible
    Successfully routes in < 5ms
    Long runway for scale-out
    Limit storage infrastructure components
    Easy to implement
  • 7. Wordnik graph data model
    Nodes
    _id field holds name, object type
    Index at no extra cost
    Arbitrary number of properties
    Only two datatypes for us, String, Double
    Node type info in node ID (_id)
    na_corpusCount => Double
    sa_source => String
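The node layout on this slide can be pictured as a plain document. The following is a minimal sketch in plain JavaScript, with no MongoDB driver involved. The helper `nodeId` and all field values are hypothetical, and reading the `na_` and `sa_` prefixes as "numeric" and "string" is an inference from the two examples on the slide.

```javascript
// Hypothetical helper: joins name and object type with "|",
// matching the "cat|word" _id form used elsewhere in the slides.
function nodeId(name, type) {
  return `${name}|${type}`;
}

// All values below are invented for illustration.
const catNode = {
  _id: nodeId("cat", "word"),  // _id is always indexed, so lookups by name+type need no extra index
  na_corpusCount: 12345.0,     // Double-valued property
  sa_source: "example-corpus", // String-valued property
};

console.log(catNode._id); // cat|word
```

Because the name and type live in `_id`, fetching a node is a primary-key lookup rather than a query on a secondary index.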
  • 8. Wordnik graph data model
    Edges
    Destination(s)
    Weight
    Link Properties
    Stored in Mongo Arrays
    Array size is app limited
    Use $push, $pop
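One way to picture an edge document with an application-limited array is sketched below in plain JavaScript. The field names `targets`, `to` and `weight`, the edge type `synonym`, and the cap of 3 are all assumptions made for illustration. Only the `$push` operator name comes from the slide.

```javascript
// Assumed shape: the edge _id extends the source node _id with an edge type.
const synonymEdge = {
  _id: "cat|word|synonym",
  targets: [{ to: "feline|word", weight: 0.9 }], // one entry per destination node
};

const MAX_TARGETS = 3; // hypothetical cap, enforced by the application

// Builds the update document that would be sent to MongoDB to append one target.
function buildPushUpdate(to, weight) {
  return { $push: { targets: { to, weight } } };
}

// In-memory stand-in for the write path: the application, not the database,
// refuses to grow the array past the cap.
function addTarget(edgeDoc, to, weight) {
  if (edgeDoc.targets.length >= MAX_TARGETS) {
    return false;
  }
  edgeDoc.targets.push({ to, weight });
  return true;
}

console.log(JSON.stringify(buildPushUpdate("kitty|word", 0.7)));
```

Keeping the cap in application code keeps each edge document bounded in size, at the cost of the application having to decide which targets to keep.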
  • 9. Access to mongo
    Mongo Access via DAO layer
    Limit queries to ones that work “well”
    ALL queries use index
    Find Node “cat” of type “word”:
    db.node.findOne({_id:"cat|word"})
    Find Edge types for above:
    db.edge.find({_id:/^cat\|word\|/},{_id:1})
    Serialization/deserialization
    Done “the old-fashioned way”
    BasicDBObject, BasicDBList faster than mappers for our use case
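Since `|` is the alternation operator in a regular expression, a prefix query on these `_id` values only behaves as a "starts with" match when the separator is escaped. A minimal sketch of a helper that builds such a pattern follows. `escapeRegex` and `prefixPattern` are hypothetical names, not part of any MongoDB API.

```javascript
// Escape every regex metacharacter so the text is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored, case-sensitive "starts with" pattern.
function prefixPattern(prefix) {
  return new RegExp("^" + escapeRegex(prefix));
}

const edgesOfCat = prefixPattern("cat|word|");

console.log(edgesOfCat.test("cat|word|synonym")); // true
console.log(edgesOfCat.test("dog|word|synonym")); // false

// Without escaping, the pattern is three alternatives: "^cat", "word" and the
// empty string, so it matches every _id.
console.log(/^cat|word|/.test("dog|word|synonym")); // true
```

The resulting `RegExp` could be passed as the `_id` condition of a `find`. An anchored, case-sensitive prefix is the form of regular expression that MongoDB can satisfy from an index.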
  • 10. Query efficiency
    Max execution time is f (ahops)
  • 11. Routing, traversals, functions
    Typically find path from A to B
    Routes have costs
    Low cost or high probability
    Our use case is atypical
    LinkedIn vs. Maps
    Not from A to B
    More like “from A with 3 hops”
    This matters!
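The "from A with 3 hops" style of traversal can be sketched as a breadth-first expansion that stops at a fixed depth. The version below runs over an in-memory adjacency list with invented data. It illustrates the general technique and is not Wordnik's routing code, which the slides do not show.

```javascript
// Toy adjacency list standing in for edge documents; all data is invented.
const toyGraph = new Map([
  ["a", ["b", "c"]],
  ["b", ["d"]],
  ["c", ["d"]],
  ["d", ["e"]],
  ["e", ["f"]],
]);

// Breadth-first expansion from `start`, stopping after `maxHops` levels.
// Returns a Map of reachable node -> number of hops needed to reach it.
function withinHops(adjacency, start, maxHops) {
  const hops = new Map([[start, 0]]);
  let frontier = [start];
  for (let depth = 1; depth <= maxHops && frontier.length > 0; depth++) {
    const next = [];
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) || []) {
        if (!hops.has(neighbor)) {
          hops.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return hops;
}

console.log([...withinHops(toyGraph, "a", 3).keys()]); // [ 'a', 'b', 'c', 'd', 'e' ]
```

Each level of the loop corresponds to one round of edge fetches, so the amount of work is bounded by the hop limit rather than by the size of the whole graph.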
  • 12. Performance + Scaling
  • 13. Performance + scaling
    Query by index only
    Use regex syntax in restricted fashion
    Starts with only
    No look behind
    Case sensitive
    Boring? Fast?
    Sharding is a no-brainer
    What about ObjectId()?
  • 14. Performance + scaling
    Horizontal? Vertical? Both? And when?
    Separate collections by edge type/object type
    Increases storage needs
    Collections all have padding, 30 collections => ~30x padding
    Sharding
    Use slick, built-in Mongo sharding
    Roll your own based on your data
    What does Wordnik do?
    Neither! (yet)
    30M Nodes, 50M Edges
    One collection for nodes
    One collection for edges
  • 15. Performance + scaling
    Selecting a shard key
    Done in application logic based on OUR data
    Depends on what you need
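As an illustration of choosing a shard in application logic, the sketch below routes a node and all of its edges to the same shard by hashing the shared `name|type` prefix of their `_id` values. This is an assumption about what such logic could look like, not the scheme Wordnik uses. `shardKeyOf` and `shardFor` are hypothetical names, and the hash is a simple FNV-1a variant applied to UTF-16 code units.

```javascript
// Take the "name|type" prefix, so "cat|word" and "cat|word|synonym" share a key.
function shardKeyOf(id) {
  return id.split("|").slice(0, 2).join("|");
}

// 32-bit FNV-1a over UTF-16 code units; deterministic across runs.
function fnv1a(text) {
  let hash = 2166136261;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0;
  }
  return hash >>> 0;
}

// Map a node or edge _id to a shard index in [0, shardCount).
function shardFor(id, shardCount) {
  return fnv1a(shardKeyOf(id)) % shardCount;
}

console.log(shardFor("cat|word", 4) === shardFor("cat|word|synonym", 4)); // true
```

Co-locating a node with its outgoing edges means a single hop touches one shard. Whether that is the right trade-off depends on the data, as the slide notes.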
  • 16. End result
    Solves Wordnik Graph infrastructure needs
    Store Word nodes with UGC, corpus, structured, analytical data
    Batch fetch Edges @ > 50k/second
    Find Edge + endpoints in 80mS
    Powers our…
    Word Selection
    Canonicalization
    Misspelling
    “Did you mean” logic
    Classification + Matching Engine
  • 17. Examples
    Misspellings
    Abbreviations
    Lemmatization
  • 18. Examples
    Term normalization
    Find similar words
    Meaning normalization
    Find “more common” form
  • 19. Examples
    Applied Word Graph
    Recall:
    “Computers are stupid”
    English is complex
    Clustering + classification algorithms:
    Stink without consistent data
    “The” => “the” (duh)
    “geese” => “goose” (ok)
    Stink when they’re slow
    Graph + Clustering/Classification
    Just add data
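The normalization step described above ("The" => "the", "geese" => "goose") can be sketched as following "preferred form" edges until none remain. The edge data, the function name `canonicalize` and the hop limit are invented for this example. The slides do not show how Wordnik implements it.

```javascript
// Invented edge data: each entry maps a surface form to its preferred form.
const preferredForm = new Map([
  ["The", "the"],
  ["geese", "goose"],
]);

// Follow preferred-form edges for at most `maxHops` steps.
// The hop limit also guards against accidental cycles in the data.
function canonicalize(word, edges, maxHops = 5) {
  let current = word;
  for (let i = 0; i < maxHops && edges.has(current); i++) {
    current = edges.get(current);
  }
  return current;
}

const tokens = ["The", "geese", "honk"].map((w) => canonicalize(w, preferredForm));
console.log(tokens); // [ 'the', 'goose', 'honk' ]
```

Feeding the normalized tokens into clustering or classification is what gives those algorithms the consistent input the slide asks for.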
  • 20. MongoDB makes a great graph back-end
    See more about Wordnik APIs:
    http://developer.wordnik.com
    Further Reading
    Migrating from MySQL to MongoDB
    http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik
    Maintaining your MongoDB Installation
    http://www.slideshare.net/fehguy/mongo-sv-tony-tam
    Source Code
    Mapping Benchmark
    https://github.com/fehguy/mongodb-benchmark-tools
    Wordnik OSS Tools
    https://github.com/wordnik/wordnik-oss
  • 21. MongoDB makes a great graph back-end
    Questions?