Building a Directed Graph with MongoDB

Details of how Wordnik built a directed graph on top of MongoDB. This is the presentation given during MongoSF 2011 by Tony Tam.

  • Very interesting presentation. But why did you not go with a pure graph DB?

  • 1. Building a Directed Graph with MongoDB
    MongoSF 5/24/2011
    By Tony Tam @fehguy
  • 2. Who is Wordnik
    Word + Meaning Discovery Engine
    Clustered Application built with:
    Only way in is via REST
    19M API calls/day @ 7ms/query average
    Physical servers
    72GB RAM, 8 core
    4.3TB DAS
    We’re MongoDB users for ~1.5 yrs
    Used in master/slave
    14B documents in MongoDB
  • 3. Why a graph for words
    Technique to model network relationships
    Properties are dynamic
    Links are “arbitrary”
    Runtime performance
    Answers in < 5ms/request
    Routing functions based on goals
    “find most likely word for X”
    “find more common form of Y”
  • 4. Why a graph for words
    Misspellings, abbreviations, texting, Twitter
  • 5. More about graphs
    Different types of Graphs
    Decisions have huge impact on design + implementation
    Nodes (vertices)
    String and numeric properties
    Edges (links)
    Finite set of labeled edge types (~30)
    Multiple target nodes per edge
    Each potentially different weight
    Directed, non-symmetrical
  • 6. Why build on MongoDB?
    Word Graph is core to Wordnik
    Many ways to build a graph
    Dedicated graph DBs
    Relational DBs
    MongoDB Document Storage
    Successfully routes in < 5ms
    Long runway for scale-out
    Limit storage infrastructure components
    Easy to implement
  • 7. Wordnik graph data model
    _id field holds name, object type
    Index at no extra cost
    Arbitrary number of properties
    Only two datatypes for us, String, Double
    Node type info in node ID (_id)
    na_corpusCount => Double
    sa_source => String
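A minimal sketch of what a node document along these lines could look like, written as a plain Python dict rather than against a live database. The `"<type>/<name>"` layout of `_id`, the helper names, and the sample values are assumptions for illustration; the slide only says that `_id` carries both the name and the object type. Reading `na_` as "numeric attribute" and `sa_` as "string attribute" is likewise an interpretation of the two examples on the slide.

```python
from typing import Dict, Union

PropertyValue = Union[str, float]


def make_node_id(name: str, node_type: str) -> str:
    # Hypothetical encoding: the slide does not show the real _id layout.
    return f"{node_type}/{name}"


def make_node(name: str, node_type: str,
              properties: Dict[str, PropertyValue]) -> Dict[str, PropertyValue]:
    """Build a node document with only String and Double properties."""
    doc: Dict[str, PropertyValue] = {"_id": make_node_id(name, node_type)}
    for key, value in properties.items():
        if key.startswith("na_"):      # assumed: numeric attribute, stored as a double
            doc[key] = float(value)
        elif key.startswith("sa_"):    # assumed: string attribute
            doc[key] = str(value)
        else:
            raise ValueError(f"unknown property prefix: {key}")
    return doc


# Sample values are made up for the example.
cat = make_node("cat", "word", {"na_corpusCount": 120345, "sa_source": "example-corpus"})
```

Because the name and type sit in `_id`, a lookup by name and type rides on the index MongoDB always maintains for `_id`, which is presumably what "index at no extra cost" refers to.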
  • 8. Wordnik graph data model
    Link Properties
    Stored in Mongo Arrays
    Array size is app limited
    Use $push, $pop
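As a rough illustration of the points above, the sketch below builds `$push` and `$pop` update documents and applies them to an in-memory dict, so it runs without a database. `$push` and `$pop` are real MongoDB update operators (`$pop` with `1` removes the last array element, `-1` the first); the `targets` field, the `to`/`weight` keys, the edge `_id`, the cap of 3, and the toy `apply_update` interpreter are all assumptions made for the example.

```python
from typing import Any, Dict

MAX_TARGETS = 3  # hypothetical application-side limit on array size


def apply_update(doc: Dict[str, Any], update: Dict[str, Any]) -> None:
    """Toy in-memory stand-in for the effect of $push / $pop on one document."""
    for field, value in update.get("$push", {}).items():
        doc.setdefault(field, []).append(value)
    for field, direction in update.get("$pop", {}).items():
        items = doc.get(field, [])
        if items:
            # MongoDB semantics: 1 removes the last element, -1 the first.
            items.pop(-1 if direction == 1 else 0)


def add_target(edge: Dict[str, Any], target_id: str, weight: float) -> None:
    """Append a weighted target, evicting the oldest one when the cap is hit."""
    if len(edge.get("targets", [])) >= MAX_TARGETS:
        apply_update(edge, {"$pop": {"targets": -1}})
    apply_update(edge, {"$push": {"targets": {"to": target_id, "weight": weight}}})


edge = {"_id": "word/cat/synonym"}  # hypothetical edge id layout
for target, weight in [("word/feline", 0.9), ("word/kitty", 0.8),
                       ("word/puss", 0.5), ("word/moggy", 0.3)]:
    add_target(edge, target, weight)
```

The cap is enforced by the application here, matching the slide's note that array size is app limited; evicting the oldest entry is just one possible policy.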
  • 9. Access to mongo
    Mongo Access via DAO layer
    Limit queries to ones that work “well”
    ALL queries use index
    Find Node “cat” of type “word”:
    Find Edge types for above:
    Done “the old-fashioned way”
    BasicDBObject, BasicDBList faster than mappers for our use case
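The queries themselves did not survive in this transcript, so the following is only a sketch of the idea: a small DAO that exposes a fixed set of lookups, each keyed on `_id`. Two dicts stand in for the node and edge collections so the example runs without a driver. The class name, method names, and the `"<type>/<name>"` and `"<type>/<name>/<edgeType>"` id layouts are hypothetical, not Wordnik's actual scheme.

```python
from typing import Any, Dict, Iterable, List, Optional


class GraphDao:
    """Only exposes lookups that can be answered from the _id index."""

    def __init__(self, nodes: Iterable[Dict[str, Any]],
                 edges: Iterable[Dict[str, Any]]) -> None:
        # In-memory stand-ins for the two collections, keyed by _id.
        self._nodes = {doc["_id"]: doc for doc in nodes}
        self._edges = {doc["_id"]: doc for doc in edges}

    @staticmethod
    def node_query(name: str, node_type: str) -> Dict[str, str]:
        # The query document a driver would be given: exact match on _id.
        return {"_id": f"{node_type}/{name}"}

    def find_node(self, name: str, node_type: str) -> Optional[Dict[str, Any]]:
        return self._nodes.get(self.node_query(name, node_type)["_id"])

    def find_edge_types(self, name: str, node_type: str) -> List[str]:
        # Assumed layout: edge ids start with the source node id.
        prefix = f"{node_type}/{name}/"
        return sorted(eid[len(prefix):] for eid in self._edges
                      if eid.startswith(prefix))


dao = GraphDao(
    nodes=[{"_id": "word/cat", "na_corpusCount": 120345.0}],
    edges=[{"_id": "word/cat/synonym"}, {"_id": "word/cat/hypernym"},
           {"_id": "word/dog/synonym"}],
)
```

With this data, `dao.find_node("cat", "word")` returns the node document and `dao.find_edge_types("cat", "word")` lists the edge types stored for it. Callers never compose queries themselves, which is one way to keep every query on an index.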
  • 10. Query efficiency
    Max execution time is f (ahops)
  • 11. Routing, traversals, functions
    Typically find path from A to B
    Routes have costs
    Low cost or high probability
    Our use case is atypical
    LinkedIn vs. Maps
    Not from A to B
    More like “from A with 3 hops”
    This matters!
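To make the "from A with 3 hops" idea concrete, here is a minimal sketch of a hop-bounded breadth-first traversal over a directed, weighted adjacency map. The graph contents and the function name are made up for illustration, and the talk does not state that Wordnik uses breadth-first search specifically.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(target, weight), ...]


def within_hops(graph: Graph, start: str, max_hops: int) -> Dict[str, int]:
    """Return {node: hop count} for every node reachable in at most max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for target, _weight in graph.get(node, []):
            if target not in seen:
                seen[target] = seen[node] + 1
                queue.append(target)
    return seen


graph: Graph = {
    "a": [("b", 0.9), ("c", 0.4)],
    "b": [("d", 0.7)],
    "d": [("e", 0.5)],
    "e": [("f", 0.1)],
}
reachable = within_hops(graph, "a", 3)
```

There is no destination node here: the work is bounded by the hop limit and the fan-out at each step rather than by the distance to a target, which is why this differs from a maps-style route search.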
  • 12. Performance + Scaling
  • 13. Performance + scaling
    Query by index only
    Use regex syntax in restricted fashion
    Starts with only
    No look behind
    Case sensitive
    Boring? Fast?
    Sharding is a no-brainer
    What about ObjectId()?
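A short sketch of why the restricted regex form matters. A case-sensitive pattern anchored with `^` names a contiguous range of keys in a sorted index, so it can be answered with a range scan instead of testing every key. The example builds such a query document and imitates the range scan over a sorted Python list; the `_id` values and function names are hypothetical.

```python
import bisect
import re
from typing import Dict, List


def starts_with_query(prefix: str) -> Dict[str, Dict[str, str]]:
    # Anchored, case-sensitive, no look-behind: the restricted form from the slide.
    return {"_id": {"$regex": "^" + re.escape(prefix)}}


def prefix_scan(sorted_ids: List[str], prefix: str) -> List[str]:
    """Imitate an index range scan: seek to the prefix, read until it stops matching."""
    start = bisect.bisect_left(sorted_ids, prefix)
    matches = []
    for key in sorted_ids[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches


ids = sorted(["word/dog", "word/cat/synonym", "word/car", "word/cat", "Word/cat"])
query = starts_with_query("word/cat")
hits = prefix_scan(ids, "word/cat")
```

An unanchored or case-insensitive pattern gives no such starting point in the index, so every key would have to be examined.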
  • 14. Performance + scaling
    Horizontal? Vertical? Both? And when?
    Separate collections by edge type/object type
    Increases storage needs
    Collections all have padding, 30 collections => ~30x padding
    Use slick, built-in Mongo sharding
    Roll your own based on your data
    What does Wordnik do?
    Neither! (yet)
    30M Nodes, 50M Edges
    One collection for nodes
    One collection for edges
  • 15. Performance + scaling
    Selecting a shard key
    Done in application logic based on OUR data
    Depends on what you need
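One way application-side shard selection could look, as a sketch only. The talk does not say which key Wordnik would pick (and notes elsewhere that they do not shard yet), so hashing the node `_id` is purely an assumption, as are the function name and the shard count.

```python
import hashlib


def shard_for(node_id: str, shard_count: int) -> int:
    """Map a node id to a shard number using a stable hash."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # hashlib is used instead of the built-in hash(), which varies between runs.
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


shard = shard_for("word/cat", 4)  # hypothetical id and shard count
```

Hashing spreads ids evenly but scatters related nodes across shards. A key derived from the data, such as node type or a name prefix, would keep related nodes together at the cost of uneven shard sizes, which is the kind of trade-off "depends on what you need" points at.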
  • 16. End result
    Solves Wordnik Graph infrastructure needs
    Store Word nodes with UGC, corpus, structured, analytical data
    Batch fetch Edges @ > 50k/second
    Find Edge + endpoints in 80 ms
    Powers our…
    Word Selection
    “Did you mean” logic
    Classification + Matching Engine
  • 17. Examples
  • 18. Examples
    Term normalization
    Find similar words
    Meaning normalization
    Find “more common” form
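A small sketch of how a "more common form" lookup could be phrased over the graph: follow one edge type from a word to its candidate forms and keep whichever has the highest corpus count. The edge type name `variant`, the data layout, and the counts are invented for the example; only the `na_corpusCount` property name comes from the slides.

```python
from typing import Dict, List, Tuple

Nodes = Dict[str, Dict[str, float]]
Edges = Dict[Tuple[str, str], List[str]]  # (source word, edge type) -> targets


def more_common_form(nodes: Nodes, edges: Edges, word: str,
                     edge_type: str = "variant") -> str:
    """Return the word itself or a linked form with a higher corpus count."""
    candidates = [word] + edges.get((word, edge_type), [])
    # max() keeps the first candidate on ties, so the input word wins a tie.
    return max(candidates,
               key=lambda w: nodes.get(w, {}).get("na_corpusCount", 0.0))


# Counts below are made up.
nodes: Nodes = {
    "teh": {"na_corpusCount": 12.0},
    "the": {"na_corpusCount": 5000000.0},
}
edges: Edges = {("teh", "variant"): ["the"]}

best = more_common_form(nodes, edges, "teh")
```

Here `best` is `"the"`, while a word with no outgoing `variant` edges is returned unchanged.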
  • 19. Examples
    Applied Word Graph
    “Computers are stupid”
    English is complex
    Clustering + classification algorithms:
    Stink without consistent data
    “The” => “the” (duh)
    “geese” => “goose” (ok)
    Stink when they’re slow
    Graph + Clustering/Classification
    Just add data
  • 20. MongoDB makes a great graph back-end
    See more about Wordnik APIs:
    Further Reading
    Migrating from MySQL to MongoDB
    Maintaining your MongoDB Installation
    Source Code
    Mapping Benchmark
    Wordnik OSS Tools
  • 21. MongoDB makes a great graph back-end