Finding Love with MongoDB

{ name : "Oliver Dodd",
email : "oliver.dodd@gmail.com",
twitter : "01001111"
}

Traditional Search

Unidirectional User Defined Criteria

eHarmony Matching

Bidirectional User Defined Criteria

Matching Overview

•  Potential Match Finder
•  Machine Learned Matching
•  Match Delivery

Photo Credits

Magnifying glass: andercismo @ http://www.flickr.com/photos/andercismo/

Machine learning: University of Maryland Press Releases @ http://www.flickr.com/photos/umdnews/

Mailman: http://www.flickr.com/photos/noizephotography/

Potential Match Generator

•  Find candidates that meet user’s
preferences.

•  Ensure user doesn’t violate each
candidate’s preferences.

•  Discard pairings that violate Compatibility
Models.

•  Do this as fast as possible.
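
The three filtering steps above can be sketched in memory as follows. This is only an illustration, not eHarmony's implementation: the field names (`age`, `prefs`, `minAge`, `maxAge`) and the always-true compatibility check are assumptions made for the example.

```javascript
// Hypothetical preference check: is `person` inside the age range `prefs` asks for?
function meetsPreferences(prefs, person) {
  return person.age >= prefs.minAge && person.age <= prefs.maxAge;
}

// Placeholder for the Compatibility Models; it accepts every pairing in this sketch.
function passesCompatibilityModels(user, candidate) {
  return true;
}

function potentialMatches(user, candidates, isCompatible = passesCompatibilityModels) {
  return candidates
    .filter((c) => meetsPreferences(user.prefs, c)) // candidate meets the user's preferences
    .filter((c) => meetsPreferences(c.prefs, user)) // user does not violate the candidate's preferences
    .filter((c) => isCompatible(user, c));          // discard pairings the models reject
}

const user = { name: "u", age: 30, prefs: { minAge: 25, maxAge: 35 } };
const candidates = [
  { name: "a", age: 28, prefs: { minAge: 29, maxAge: 40 } }, // mutual fit
  { name: "b", age: 50, prefs: { minAge: 20, maxAge: 60 } }, // outside the user's range
  { name: "c", age: 27, prefs: { minAge: 40, maxAge: 50 } }, // user is outside c's range
];
console.log(potentialMatches(user, candidates).map((c) => c.name));
```

The second filter is what makes the criteria bidirectional; a traditional search would stop after the first.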

Legacy “Potential Match Generator”

Redesign

Requirements for a new data store

–  Centralized
–  Scalable
–  Automagical
–  Easy to maintain
–  Fast, multi-attribute searches

New “Potential Match Generator”

Why MongoDB?

•  Scalability

•  Built in sharding and replication

•  Autobalancing

•  Rich, complex queries

Why MongoDB?

MongoDB is web scale.

Wins

•  Deploy new instances on demand.
–  No need to load a local database.

•  Adding replicas is easy and fast.

•  Fast queries when isolated to a shard.

•  Flexible schema
–  No more reloading for minor data model changes.

•  Built-in iterative fetching.

Losses

•  No schema = larger footprint (field names are stored in every document).

•  Traditional DBAs can’t help (without training).

•  Aggregation queries are drastically different.

•  Initial configuration can be a long, manual
process.

Use Real Queries

Turn on the fire hose
When testing or even evaluating, use production data and
queries.

photo by Official U.S. Navy Imagery on Flickr

Use Real Queries

Unleash the Chaos Monkey
Kill your own mongod instances to ensure your cluster and
applications continue to function normally.

photo by dboy @ http://www.flickr.com/photos/dannyboyster/

Minimize

Minify property names.
–  In Java, use Morphia for mapping; in Scala, use Salat
(also good for queries, but we developed our own generic Query API).
–  Use one or two characters per property name.

Consider retrieving full objects from another
collection or data store, storing only what you
absolutely need for your queries in the search
store.
–  On a related note, cache full objects; cache query results only if
your queried attributes are small in number.
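
Without a mapping library, the same idea can be sketched by hand. The names below (`SHORT_KEYS`, `minify`, `expand`, and the example fields) are hypothetical and not taken from Morphia or Salat; this only shows the translation between readable and stored property names.

```javascript
// Hypothetical mapping from readable property names to one- or two-character stored keys.
const SHORT_KEYS = { birthYear: "by", gender: "g", postalCode: "pc" };

// Rename keys according to `mapping`; keys without an entry pass through unchanged.
function minify(doc, mapping = SHORT_KEYS) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    const mapped = Object.prototype.hasOwnProperty.call(mapping, key) ? mapping[key] : key;
    out[mapped] = value;
  }
  return out;
}

// Reverse the mapping to turn a stored document back into readable form.
function expand(doc, mapping = SHORT_KEYS) {
  const reverse = Object.fromEntries(
    Object.entries(mapping).map(([longName, shortName]) => [shortName, longName])
  );
  return minify(doc, reverse);
}

const stored = minify({ birthYear: 1980, gender: "f", postalCode: "90404" });
console.log(stored);         // document with short keys, as it would be saved
console.log(expand(stored)); // readable document again
```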

Indexes

When performing large, variable, multi-attribute
searches, have a decent number of indexes.
Cover the major types of queries and
the worst performing outliers.

–  What is present in every query?

–  What are the best performing attributes when present?

–  What should my index look like when no high performing
attributes appear in the query?

Indexes

Omit ranges unless they are absolutely critical;
if needed, put them at the end.
–  Can I replace this with an $in clause?

–  Can this be prioritized in its own index?

–  Should there be versions of this index with and without this
particular attribute?

–  Will the appearance of this attribute in the index give me any
speed advantage over inspecting the full object?

Indexes

Ordering is very, very important.
–  Attributes for which a user can only have a single value
should appear towards the top of the index.

–  Attributes that depend on the values of another attribute
should appear in immediate succession.

–  Again, put ranges at the bottom. If multiple ranges are
necessary, ensure that they appear in order of their ability to
reduce the working set.

The order of fields in an index should be:
First, fields on which you will query for exact values.
Second, fields on which you will sort.
Finally, fields on which you will query for a range of values.
Eric@MongoLab - http://blog.mongolab.com/2012/06/cardinal-ins/
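
A minimal sketch of that ordering rule, assuming you have already classified each field as exact-match, sort, or range. `buildIndexSpec` and the field names are hypothetical; the resulting object has the shape accepted by `db.collection.createIndex()`.

```javascript
// Build a compound index key spec in the order: exact-match fields, sort fields, range fields.
// Assumes field names are not integer-like strings, since JavaScript objects reorder such keys.
function buildIndexSpec({ equality = [], sort = [], range = [] }) {
  const spec = {};
  const seen = new Set();
  for (const field of [...equality, ...sort, ...range]) {
    if (seen.has(field)) continue; // keep the first (earliest) placement of a field
    seen.add(field);
    spec[field] = 1;               // ascending; direction is an arbitrary choice in this sketch
  }
  return spec;
}

const spec = buildIndexSpec({
  equality: ["gender", "country"], // hypothetical single-valued attributes
  sort: ["score"],
  range: ["birthyear"],
});
console.log(spec); // could be passed to db.users.createIndex(spec) in the mongo shell
```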

Indexes

Analyze slow queries to find out what attributes
you can capitalize on.

When building a compound index for
multi-attribute queries, don’t include fields
that appear only inside $or clauses.
db.toasters.find({
    slots: 4,
    canBagel: true,
    $or: [
        { material: "stainless-steel" },
        { price: { $lte: 50 } }
    ]
})
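
For the toaster query above, a small helper can list the fields that remain candidates for the compound index. `compoundIndexFields` is a hypothetical name, and the sketch only inspects top-level keys.

```javascript
// Return the top-level fields of a query, skipping operators such as $or and $and.
// Fields that appear only inside $or are therefore left out of the compound index.
function compoundIndexFields(query) {
  return Object.keys(query).filter((key) => !key.startsWith("$"));
}

const toasterQuery = {
  slots: 4,
  canBagel: true,
  $or: [{ material: "stainless-steel" }, { price: { $lte: 50 } }],
};

console.log(compoundIndexFields(toasterQuery)); // slots and canBagel; material and price are excluded
```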

Queries – Ranges

Translate "between" queries to $in clauses when
dealing with discrete values.

$and: [
    { a: { $gte: 0 } },
    { a: { $lte: 5 } }
]

becomes

a: { $in: [0,1,2,3,4,5]}
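
The translation can be automated for integer-valued fields. `rangeToIn` is a hypothetical helper, and it assumes the bounds are inclusive integers, as in the example above.

```javascript
// Turn an inclusive integer range on `field` into an equivalent $in clause.
function rangeToIn(field, low, high) {
  if (!Number.isInteger(low) || !Number.isInteger(high) || low > high) {
    throw new RangeError("bounds must be integers with low <= high");
  }
  const values = [];
  for (let v = low; v <= high; v++) values.push(v);
  return { [field]: { $in: values } };
}

console.log(JSON.stringify(rangeToIn("a", 0, 5)));
```

This only pays off when the range is small; a wide range produces a very long $in list.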

Attributes - Decrease Granularity

birthdate => birthyear

floats => ints

number_of_items => has_items?
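
A sketch of these three reductions applied when writing a profile to the search store. The `height` field stands in for any float attribute and is an assumption, as is the `coarsen` helper itself.

```javascript
// Reduce a full profile to coarser, cheaper-to-index search attributes.
function coarsen(profile) {
  return {
    birthyear: profile.birthdate.getUTCFullYear(), // birthdate => birthyear
    height: Math.round(profile.height),            // float => int (hypothetical field)
    has_items: profile.number_of_items > 0,        // count => boolean
  };
}

const full = {
  birthdate: new Date(Date.UTC(1984, 5, 15)),
  height: 180.4,
  number_of_items: 3,
};
console.log(coarsen(full));
```

Coarser values take fewer distinct forms, which also makes range-to-$in translation practical.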

Sharding

•  Try to isolate queries to a particular shard.

•  Ensure that your data and indexes can fit
entirely in memory.

•  If certain attributes ALWAYS appear in the
query and, in combination, give you a large
number of well distributed data partitions,
consider making them the shard key.
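
A rough way to check whether a query can be isolated to one shard is to confirm that it supplies an exact value for every shard key field. This is a simplification of how routing actually works, and both the helper name and the example fields are hypothetical.

```javascript
// Treat primitives, null, and Dates as exact values; operator objects such as
// { $gte: 5 } or { $in: [...] } may span several shards and do not count.
function isExactValue(value) {
  return value === null || typeof value !== "object" || value instanceof Date;
}

// True when every shard key field appears in the query with an exact value.
function canTargetOneShard(query, shardKeyFields) {
  return shardKeyFields.every(
    (field) => Object.prototype.hasOwnProperty.call(query, field) && isExactValue(query[field])
  );
}

const shardKey = ["gender", "country"]; // hypothetical shard key
console.log(canTargetOneShard({ gender: "f", country: "US", birthyear: { $gte: 1980 } }, shardKey));
console.log(canTargetOneShard({ gender: "f", birthyear: 1984 }, shardKey));
```

Running a check like this over logged production queries is one way to see whether a candidate shard key really does appear in every query.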

We’re Hiring

http://www.eharmony.com/about/careers
