Building a Directed Graph with MongoDB

Building A directed graph with mongodb MongoSF 5/24/2011 By Tony Tam @fehguy

Who is wordnik Word + Meaning Discovery Engine Clustered Application built with: Scala/Java/Jetty Only way in is via REST 19M API calls/day @ 7ms/query average Physical servers 72GB RAM, 8 core 4.3TB DAS We’re MongoDB users for ~1.5 yrs Used in master/slave 14B documents in MongoDB

Why a graph for words Technique to model network relationships Properties are dynamic Links are “arbitrary” Runtime performance Answers in < 5ms/request Routing functions based on goals “find most likely word for X” “find more common form of Y”

Why a graph for words Misspellings, abbreviations, texting, Twitter

More about graphs Different types of Graphs Decisions have huge impact on design + implementation Nodes (vertices) String and numeric properties Edges (links) Finite set of labeled edge types (~30) Multiple target nodes per edge Each potentially different weight Directed, non-symmetrical

Why build on Mongodb? Word Graph is core to Wordnik Many ways to build a graph Dedicated graph DBs Relational DBs MongoDB Document Storage Uber-flexible Successfully routes in < 5ms Long runway for scale-out Limit storage infrastructure components Easy to implement

Wordnik graph data model Nodes _id field holds name, object type Index at no extra cost Arbitrary number of properties Only two datatypes for us, String, Double Node type info in node ID (_id) na_corpusCount => Double sa_source => String

Wordnik graph data model Edges Destination(s) Weight Link Properties Stored in Mongo Arrays Array size is app limited Use $push, $pop

Access to mongo Mongo Access via DAO layer Limit queries to ones that work“well” ALL queries use index Find Node “cat” of type “word”: db.node.findOne({_id:"cat|word"}) Find Edge types for above: db.edge.find({_id:/^catword/},{_id:1}) Serialization/deserialization Done “the old-fashioned way” BasicDBObject, BasicDBList faster than mappers for our use case

Query efficiency Max execution time is f (ahops)

Routing, traversals, functions Typically find path from A to B Routes have costs Low cost or high probability Our use case is atypical LinkedIn vs. Maps Not from A to B More like “from A with 3 hops” This matters!

Performance + scaling Query by index only Use regex syntax in restricted fashion Starts with only No look behind Case sensitive Boring? Fast? Sharding is a no-brainer What about ObjectId()?

Performance + scaling Horizontal? Vertical? Both? And when? Separate collections by edge type/object type Increases storage needs Collections all have padding, 30 collections => ~30x padding Sharding Use slick, built-in Mongo sharding Roll your own based on your data What does Wordnik do? Neither! (yet) 30M Nodes, 50M Edges One collection for nodes One collection for edges

Performance + scaling Selecting a shard key Done in application logic based on OUR data Depends on what you need

End result Solves Wordnik Graph infrastructure needs Store Word nodes with UGC, corpus, structured, analytical data Batch fetch Edges @ > 50k/second Find Edge + endpoints in 80mS Powers our… Word Selection Canonicalization Misspelling “Did you mean” logic Classification + Matching Engine

Examples Misspellings Abbreviations Lemmatization

Examples Term normalization Find similar words Meaning normalization Find “more common” form

examples Applied Word Graph Recall: “Computers are stupid” English is complex Clustering + classification algorithms: Stink without consistent data “The” => “the” (duh) “geese” => “goose” (ok) Stink when they’re slow Graph + Clustering/Classification Just add data

MongoDB makes a Great graph back-end See more about Wordnik APIs: http://developer.wordnik.com Further Reading Migrating from MySQL to MongoDB http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik Maintaining your MongoDB Installation http://www.slideshare.net/fehguy/mongo-sv-tony-tam Source Code Mapping Benchmark https://github.com/fehguy/mongodb-benchmark-tools Wordnik OSS Tools https://github.com/wordnik/wordnik-oss

MongoDB makes a Great graph back-end Questions?

Building a Directed Graph with MongoDB

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Building a Directed Graph with MongoDB

Similar to Building a Directed Graph with MongoDB (20)

More from Tony Tam

More from Tony Tam (20)

Recently uploaded

Recently uploaded (20)

Building a Directed Graph with MongoDB