Aggregating Social Data by Location

Managing Social Content with MongoDB
Chris Harris - charris@10gen.com
twitter: @cj_harris5

Traditional Architecture

(Diagram: HTML -> Web Server -> Application Server [Controllers -> Services] -> SQL -> Database)

Increase in Write Ratio
Users don’t just want to read content!

They want to share and contribute to the content!

High volume of writes!

Need to Scale the Data Source

(Diagram: JSON, JSON, JSON -> Web Server -> Application Server [Service #1] -> SQL)

Bottleneck!

Application Cache?

(Diagram: JSON, JSON, JSON -> Web Server -> Application Server [Service #1 + App Cache] -> SQL)

Issues

+ Read-only data is served from the cache

- Writes slow down because both the cache and the database must be updated

- Cached data must be kept in sync across application servers
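A toy sketch of the two "-" points, assuming a write-through cache: plain `Map` objects stand in for each application server's local cache and for the shared database (hypothetical stand-ins, not any particular caching product). One logical write touches two stores, and a second server's cache silently goes stale.

```javascript
// Hypothetical shared "database" behind two application servers.
const database = new Map();

function makeAppServer() {
  const cache = new Map(); // each server has its own in-process cache
  let storeWrites = 0;     // counts how many stores our writes touched
  return {
    read(key) {
      // Reads are served from the local cache, filled from the database on a miss.
      if (!cache.has(key)) cache.set(key, database.get(key));
      return cache.get(key);
    },
    write(key, value) {
      // Write-through: every write updates the database AND this server's cache.
      database.set(key, value);
      storeWrites++;
      cache.set(key, value);
      storeWrites++;
    },
    storeWrites: () => storeWrites,
  };
}

const serverA = makeAppServer();
const serverB = makeAppServer();

database.set("article:1", "v1");
serverB.read("article:1");        // B caches "v1"
serverA.write("article:1", "v2"); // A updates the database and only its own cache

console.log(serverA.storeWrites());     // 2
console.log(serverB.read("article:1")); // v1 (stale: B's cache was never told)
```

Keeping B in sync would need some invalidation or broadcast mechanism between servers, which is exactly the extra coordination the slide warns about.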

IT needs are evolving...

Agile Development
• Iterative
• Continuous

Data Volume, Type & Use
• Trillions of records
• 100's of millions of queries per second
• Real-Time Analytics
• Unstructured / semi-structured

New Hardware Architectures
• Commodity servers
• Cloud Computing
• Horizontal Scaling

Tradeoff: Scale vs. Functionality

(Chart: scalability & performance vs. depth of functionality, placing memcached, key/value stores, and RDBMS)

Terminology

RDBMS        MongoDB
Table        Collection
Row(s)       JSON Document
Index        Index
Join         Embedding & Linking

Publishing Content with MongoDB

A simple start

article = {author: "Chris",
date: new Date(),
title: "Managing Social Content"}

> db.articles.save(article)

Map the documents to your application.

Find the document

> db.articles.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
title: "Managing Social Content"
}

Note:
• _id must be unique, but can be anything you'd like
• A BSON ObjectId is generated by default if no _id is supplied
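To make the `_id` rules concrete, here is a toy in-memory "collection" with a `save` that generates an id when none is given and replaces the document when the `_id` already exists. `newObjectIdLike()` is a hypothetical helper: it produces 24 hex characters like an ObjectId, but it is NOT the real ObjectId algorithm (which combines a timestamp with other components).

```javascript
const crypto = require("crypto");

// Hypothetical stand-in for ObjectId generation: 12 random bytes as 24 hex chars.
function newObjectIdLike() {
  return crypto.randomBytes(12).toString("hex");
}

// Toy save: generate an _id if missing, otherwise keep the caller's value.
// Because _id is unique, saving an existing _id replaces that document.
function save(collection, doc) {
  const stored = { ...doc };
  if (stored._id === undefined) stored._id = newObjectIdLike();
  collection.set(stored._id, stored);
  return stored._id;
}

const articles = new Map();
const id1 = save(articles, { author: "Chris", title: "Managing Social Content" });
const id2 = save(articles, { _id: 1000, author: "Chris" });                // any value can be an _id
save(articles, { _id: 1000, author: "Chris", title: "Updated" });          // replaces _id 1000
console.log(articles.size); // 2
```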

Add an index, find via index
> db.articles.ensureIndex({author: 1})
> db.articles.find({author: 'Chris'})

{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}

Secondary index on "author"
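The idea behind a secondary index can be sketched in a few lines: a lookup structure from a field value to the `_id`s of the documents holding it, so a query on that field avoids scanning every document. This is a simplification under stated assumptions: MongoDB's real indexes are B-trees, not a `Map` of arrays, and the sample titles other than the deck's own are hypothetical.

```javascript
const docs = [
  { _id: 1, author: "Chris", title: "Managing Social Content" },
  { _id: 2, author: "Marc", title: "Geo Queries" },     // hypothetical sample
  { _id: 3, author: "Chris", title: "Aggregation" },    // hypothetical sample
];

// Build a toy index: field value -> list of matching _ids.
function buildIndex(collection, field) {
  const index = new Map();
  for (const doc of collection) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const byId = new Map(docs.map((d) => [d._id, d]));
const authorIndex = buildIndex(docs, "author");

// Rough equivalent of find({author: "Chris"}) answered from the index.
const chrisDocs = (authorIndex.get("Chris") || []).map((id) => byId.get(id));
console.log(chrisDocs.map((d) => d._id)); // [ 1, 3 ]
```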

Extending the schema

http://nysi.org.uk/kids_stuff/rocket/rocket.htm

Adding Tags

> db.articles.update(
{title: "Managing Social Content" },
{$push: {tags: {$each: ["MongoDB", "NoSQL"]}}}
)

Push social "tags" into the existing article
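One detail of `$push` matters when adding several tags: given a bare array, `$push` appends that whole array as a single nested element; the `$each` modifier (MongoDB 2.4+) appends each value individually. The toy functions below model just that difference on a plain object; they are illustrations, not MongoDB's implementation.

```javascript
// Models {$push: {field: value}}: appends `value` as ONE element, even if it is an array.
function push(doc, field, value) {
  doc[field] = [...(doc[field] || []), value];
}

// Models {$push: {field: {$each: values}}}: appends every element of `values`.
function pushEach(doc, field, values) {
  doc[field] = [...(doc[field] || []), ...values];
}

const a = { title: "Managing Social Content" };
push(a, "tags", ["MongoDB", "NoSQL"]);
console.log(JSON.stringify(a.tags)); // [["MongoDB","NoSQL"]]  (nested array)

const b = { title: "Managing Social Content" };
pushEach(b, "tags", ["MongoDB", "NoSQL"]);
console.log(JSON.stringify(b.tags)); // ["MongoDB","NoSQL"]
```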

Find the document

> db.articles.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
title: "Managing Social Content",
tags: [ "MongoDB", "NoSQL" ]
}

Query operators
• Conditional operators:
‣ $ne, $in, $nin, $mod, $all, $size, $exists,
$type, ...
‣ $lt, $lte, $gt, $gte

• Update operators:
‣ $set, $inc, $push
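A minimal matcher for a handful of the conditional operators above, applied to top-level fields of plain objects. It is a teaching sketch under simplifying assumptions: real MongoDB query semantics (array fields, dotted paths, cross-type ordering) are considerably richer, and unsupported operators simply throw here.

```javascript
// Each operator is a predicate on (fieldValue, operatorArgument).
const ops = {
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $ne: (v, arg) => v !== arg,
  $in: (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
  $exists: (v, arg) => (v !== undefined) === arg,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    // A plain value means equality; an object means operator conditions.
    if (cond === null || typeof cond !== "object" || Array.isArray(cond)) {
      return value === cond;
    }
    return Object.entries(cond).every(([op, arg]) => {
      const test = ops[op];
      if (!test) throw new Error(`unsupported operator: ${op}`);
      return test(value, arg);
    });
  });
}

const articles = [
  { author: "Chris", comments_count: 1 },
  { author: "Marc", comments_count: 0 },
  { author: "Fred" },
];

console.log(articles.filter((d) => matches(d, { comments_count: { $gt: 0 } })).length);        // 1
console.log(articles.filter((d) => matches(d, { comments_count: { $exists: false } })).length); // 1
console.log(articles.filter((d) => matches(d, { author: { $in: ["Chris", "Fred"] } })).length);  // 2
```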

Extending the Schema
new_comment = {author: "Marc",
date: new Date(),
text: "great article",
stars: 5}

> db.articles.update(
{title: "Managing Social Content" },

{"$push": {comments: new_comment},
"$inc": {comments_count: 1}
}
)

Extending the Schema
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
title : "Managing Social Content",
tags : [ "MongoDB", "NoSQL" ],
comments : [{
author : "Marc",
date : ISODate("2012-01-23T14:31:53.848Z"),
text : "great article",
stars : 5
}],
comments_count: 1
}
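Sending `$push` and `$inc` in one update keeps the comments array and `comments_count` consistent, since MongoDB applies an update to a single document atomically. The toy function below mirrors that pair of changes on a plain object; like `$inc`, it starts the counter from zero when the field does not exist yet. It is an illustration only.

```javascript
// Toy equivalent of {$push: {comments: comment}, $inc: {comments_count: 1}}.
function addComment(article, comment) {
  article.comments = [...(article.comments || []), comment];
  article.comments_count = (article.comments_count || 0) + 1;
  return article;
}

const article = { author: "Chris", title: "Managing Social Content", tags: ["MongoDB", "NoSQL"] };
addComment(article, { author: "Marc", text: "great article", stars: 5 });
console.log(article.comments_count);     // 1
console.log(article.comments[0].author); // Marc
```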

Trees
//Embedded Tree

{ ...
comments : [{
author : "Marc", text : "...",
replies : [{
author : "Fred", text : "...",
replies : []
}]
}]
}
+ PROs: Single Document, Performance, Intuitive

- CONs: Hard to search, hard to return partial results, 16MB document size limit
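Why "hard to search": finding, say, every comment by one author in an embedded tree means walking it recursively, because matches can sit at any depth. A depth-first sketch over a plain object shaped like the embedded tree above (the replies by "Fred"/"Ann" and their texts are hypothetical sample data):

```javascript
const article = {
  comments: [
    {
      author: "Marc", text: "great article",
      replies: [{ author: "Fred", text: "agreed", replies: [] }],
    },
    { author: "Ann", text: "thanks", replies: [] },
  ],
};

// Depth-first walk collecting every comment, at any depth, by the given author.
function findByAuthor(comments, author, found = []) {
  for (const c of comments) {
    if (c.author === author) found.push(c);
    findByAuthor(c.replies || [], author, found);
  }
  return found;
}

console.log(findByAuthor(article.comments, "Fred").map((c) => c.text)); // [ 'agreed' ]
```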

One to Many - Normalized
// Articles collection
{ _id : 1000,
author : "Chris",
date: ISODate("2012-01-23T14:01:00.117Z"),
title : "Managing Social Content"
}
// Comments collection
{ _id : 1,
article : 1000,
author : "Marc",
date : ISODate("2012-01-23T14:31:53.848Z"),
...
}
> article = db.articles.findOne({title: "Managing Social Content"});
> db.comments.find({article: article._id});
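The normalized layout turns one read into two lookups: fetch the single article, then use its `_id` as the link into the comments collection. The sketch below uses arrays as stand-ins for the two collections; `Array.prototype.find` plays the role of `findOne` (first match or nothing). Comment `_id: 2` is hypothetical sample data.

```javascript
const articles = [{ _id: 1000, author: "Chris", title: "Managing Social Content" }];
const comments = [
  { _id: 1, article: 1000, author: "Marc", text: "great article" },
  { _id: 2, article: 1001, author: "Fred", text: "other article" }, // hypothetical
];

// Step 1: a findOne-style lookup returns one document, not a cursor.
const article = articles.find((a) => a.title === "Managing Social Content");
// Step 2: follow the link from comment.article to article._id.
const articleComments = comments.filter((c) => c.article === article._id);
console.log(articleComments.map((c) => c._id)); // [ 1 ]
```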

Array of Ancestors
(Figure: message tree in which a has replies b and e, b has replies c and d, and e has reply f)
// Store all ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }

Array of Ancestors
// find all messages whose thread contains "b" (every descendant of b)
> db.msg_tree.find({"thread": "b"})
// find all messages that replied directly to "b"
> db.msg_tree.find({"replyTo": "b"})
// find all ancestors of "f"
> threads = db.msg_tree.findOne({"_id": "f"}).thread
> db.msg_tree.find({"_id": {$in: threads}})
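The same three queries, emulated with array filters over the `msg_tree` documents above. This is an in-memory sketch; in MongoDB you would index `thread` and `replyTo`. Querying from "a" shows the difference between descendants (any depth) and direct replies.

```javascript
const msgTree = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

const ids = (docs) => docs.map((d) => d._id);

// find({thread: x}): a scalar matched against an array field tests membership,
// so this returns every descendant of x.
const descendantsOf = (x) => msgTree.filter((m) => (m.thread || []).includes(x));
// find({replyTo: x}): direct replies to x only.
const repliesTo = (x) => msgTree.filter((m) => m.replyTo === x);
// findOne({_id: x}).thread, then find({_id: {$in: thread}}): ancestors of x.
const ancestorsOf = (x) => {
  const thread = msgTree.find((m) => m._id === x).thread || [];
  return msgTree.filter((m) => thread.includes(m._id));
};

console.log(ids(descendantsOf("a"))); // [ 'b', 'c', 'd', 'e', 'f' ]
console.log(ids(repliesTo("a")));     // [ 'b', 'e' ]
console.log(ids(ancestorsOf("f")));   // [ 'a', 'e' ]
```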

Geospatial
• Geohash stored in a B-tree
• Only the first two values of the location field are indexed
db.articles.save({
loc: { long: 40.739037, lat: 40.739037 }
});

db.articles.save({
loc: [40.739037, 40.739037]
});

db.articles.ensureIndex({"loc": "2d"})
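A sketch of the general geohash idea behind "geohash stored in a B-tree": alternately bisect the longitude and latitude ranges, emitting one bit per step, so nearby points tend to share key prefixes and therefore sort near each other in an ordinary B-tree. This follows the standard geohash convention (longitude first, latitude bounded to ±90); MongoDB's 2d index uses its own binary encoding and bounds, so this is not its exact on-disk format.

```javascript
// Interleave one bit per bisection step: even steps split longitude, odd steps latitude.
function geoHashBits(lon, lat, nBits) {
  let lonMin = -180, lonMax = 180;
  let latMin = -90, latMax = 90;
  let bits = "";
  for (let i = 0; i < nBits; i++) {
    if (i % 2 === 0) {
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { bits += "1"; lonMin = mid; } else { bits += "0"; lonMax = mid; }
    } else {
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { bits += "1"; latMin = mid; } else { bits += "0"; latMax = mid; }
    }
  }
  return bits;
}

console.log(geoHashBits(10.40744, 57.64911, 15)); // 110100010010101
```

Two points a few hundred metres apart, such as (10.40744, 57.64911) and (10.41, 57.65), produce the same first 10 bits, which is what lets a prefix range scan find neighbours.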

Geospatial Query
• A single document can store several indexed locations
• $near may return the same document once for each matching location
• $within returns each matching document once and only once

Find 100 nearby locations:

> db.locations.find({loc: {$near: [37.75, -122.42]}});

Find all locations within a box
> box = [[40, 40], [60, 60]]
> db.locations.find({loc: {$within: {$box: box}}});
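The two query shapes can be sketched in flat 2d geometry over in-memory points: `$within` with `$box` is a containment filter that yields each document at most once, while `$near` orders by distance from a center and keeps the closest results (the default `limit` of 100 here mirrors the slide). The points `p1` to `p3` are hypothetical sample data, and distances are plain Euclidean, as for a 2d index.

```javascript
const locations = [
  { name: "p1", loc: [45, 50] },
  { name: "p2", loc: [70, 70] },
  { name: "p3", loc: [41, 41] },
];

// $within + $box: keep points inside [[minX, minY], [maxX, maxY]].
function withinBox(docs, [[minX, minY], [maxX, maxY]]) {
  return docs.filter(({ loc: [x, y] }) => x >= minX && x <= maxX && y >= minY && y <= maxY);
}

// $near: sort by Euclidean distance from the center, keep the closest `limit`.
function near(docs, [cx, cy], limit = 100) {
  const dist = ({ loc: [x, y] }) => Math.hypot(x - cx, y - cy);
  return [...docs].sort((a, b) => dist(a) - dist(b)).slice(0, limit);
}

console.log(withinBox(locations, [[40, 40], [60, 60]]).map((d) => d.name)); // [ 'p1', 'p3' ]
console.log(near(locations, [40, 40], 2).map((d) => d.name));               // [ 'p3', 'p1' ]
```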

Aggregation framework

• New aggregation framework
• Declarative framework (no JavaScript)
• Describe a chain of operations to apply
• Expression evaluation
• Return computed values
• Framework: new operations added easily
• C++ implementation

Aggregation - Pipelines

• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Members of a collection are passed through
a pipeline to produce a result
• Analogous to a Unix pipe, e.g. ps -ef | grep -i mongod

Example - twitter
{
"_id" : ObjectId("4f47b268fb1c80e141e9888c"),
"user" : {
"friends_count" : 73,
"location" : "Brazil",
"screen_name" : "Bia_cunha1",
"name" : "Beatriz Helena Cunha",
"followers_count" : 102
}
}

• Find the total number of followers and friends by location

Example - twitter
db.tweets.aggregate([
  {$match:
    {"user.friends_count": { $gt: 0 },
     "user.followers_count": { $gt: 0 }
    }
  },
  {$project:
    { location: "$user.location",
      friends: "$user.friends_count",
      followers: "$user.followers_count"
    }
  },
  {$group:
    {_id: "$location",
     friends: {$sum: "$friends"},
     followers: {$sum: "$followers"}
    }
  }
]);
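To show what each stage of the pipeline above computes, here is a plain-JavaScript emulation over a few hypothetical tweets. Each stage is a function from an array of documents to a new array, and `runPipeline` (a hypothetical helper) feeds the output of one stage into the next, much like a Unix pipe. It mimics the semantics only; the real framework runs server-side in C++. Note that MongoDB does not guarantee the order of `$group` output, while this sketch happens to preserve first-seen order.

```javascript
const tweets = [
  { user: { location: "Brazil", friends_count: 73, followers_count: 102 } },
  { user: { location: "Brazil", friends_count: 10, followers_count: 5 } },
  { user: { location: "London", friends_count: 0, followers_count: 40 } },
  { user: { location: "London", friends_count: 3, followers_count: 7 } },
];

// $match: keep users with at least one friend and one follower.
const match = (docs) =>
  docs.filter((d) => d.user.friends_count > 0 && d.user.followers_count > 0);

// $project: reshape each document to the fields we need.
const project = (docs) =>
  docs.map((d) => ({
    location: d.user.location,
    friends: d.user.friends_count,
    followers: d.user.followers_count,
  }));

// $group with $sum: total friends and followers per location.
const group = (docs) => {
  const totals = new Map();
  for (const { location, friends, followers } of docs) {
    const t = totals.get(location) || { _id: location, friends: 0, followers: 0 };
    t.friends += friends;
    t.followers += followers;
    totals.set(location, t);
  }
  return [...totals.values()];
};

const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const result = runPipeline(tweets, [match, project, group]);
console.log(JSON.stringify(result));
// [{"_id":"Brazil","friends":83,"followers":107},{"_id":"London","friends":3,"followers":7}]
```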

Example - twitter
• $match: the predicate that filters the input documents
• $project: the parts of the document you want to project
• $group: the function to apply to the result set (here, $sum of friends and followers per location)

Example - twitter
{
"result" : [
{
"_id" : "Far Far Away",
"friends" : 344,
"followers" : 789
},
...
],
"ok" : 1
}

Demo
Demo files are at https://gist.github.com/2036709

Use Cases
• Content Management
• Operational Intelligence
• Product Data Management
• User Data Management
• High Volume Data Feeds

Questions

• 10gen Services
– Development Support
– Consultancy
– TAM
– Production Support

• Free online MongoDB training
– Develop
– Deploy
– Classes start Oct. 2012
