Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
#MongoDBBasics @MongoDB

‘Build an Application’ Webinar Series

Schema design and
application architecture
Matthew Bates
S...
Agenda
• Working with documents
• Application requirements
• Schema design iteration

• ‘myCMS’ sample application archite...
The Lingo
RDBMS

MongoDB

Database

➜ Database

Table

➜ Collection

Row

➜ Document

Index

➜ Index

Join

➜ Embedded Doc...
Modelling data
Example document
{
‘_id’ : ObjectId(..),
‘title’ : ‘Schema design in MongoDB’,
‘author’ : ‘mattbates’,
‘text’ : ‘Data in M...
‘myCMS’ requirements
• Different types of categorised articles.
• Users can register as members, login, edit profile and l...
‘myCMS’ entities
• Articles
• Different types – blogs, galleries, surveys
• Embedded multimedia (images, video)
• Tags

• ...
Typical (relational) ERD

* Note: this is an illustrative example.
Design schema.. In application code
# Python dictionary (or object)
>>> article = { ‘title’ : ‘Schema design in MongoDB’,
...
Let’s add a headline image
>>> img_data = Binary(open(‘article_img.jpg’).read())
>>> article = { ‘title’ : ‘Schema design ...
And different types of article
>>> article = { ‘title’ : ‘Favourite web application framework’,
‘author’ : ‘mattbates’,
‘s...
Users and profiles
>>> user= { ‘user’ : ‘mattbates’,
‘email’ : ‘matt.bates@mongodb.com’,
‘password’ : ‘xxxxxxxxxx’,
‘joine...
Modelling comments (1)
• Two collections – articles and comments
• Use a reference (i.e. foreign key) to link together
• B...
Modelling comments (2)
{

• Single articles collection –

embed comments in article
documents
• Pros
• Single query, docum...
Modelling comments (3)
• Another option: hybrid of (2) and (3), embed

top x comments (e.g. by date, popularity) into
the ...
Modelling comments (3)
{
‘_id’ : ObjectId(..),
‘title’ : ‘Schema design in MongoDB’,
‘author’ : ‘mattbates’,
‘date’ : ISOD...
Modelling comments (3)
{
‘_id’ : ObjectId(..),
‘article_id’ : ObjectId(..),
‘page’ : 1,
‘count’ : 42
‘comments’ : [
{
‘tex...
Modelling interactions
• Interactions
– Article views
– Comments
– (Social media sharing)

• Requirements
– Time series
– ...
Modelling interactions
• Document per article per day –

‘bucketing’
• Daily counter and hourly sub-

document counters fo...
JSON and RESTful API
Real applications are not built at a shell – let’s build a RESTful
API.
Client-side
JSON
(eg AngularJ...
myCMS REST endpoints
Method

URI

Action

GET

/articles

Retrieve all articles

GET

/articles-by-tag/[tag]

Retrieve all...
Getting started with the skeleton
code
$ git clone http://www.github.com/mattbates/mycms_mongodb
$ cd mycms-mongodb
$ virt...
RESTful API methods in Python +
Flask
@app.route('/cms/api/v1.0/articles', methods=['GET'])

def get_articles():
"""Retrie...
RESTful API methods in Python +
Flask
@app.route('/cms/api/v1.0/articles/<string:article_id>/comments', methods = ['POST']...
RESTful API methods in Python +
Flask
# $inc the page count if bucket size (100) is exceeded
if page['count'] > 100:
db.ar...
RESTful API methods in Python +
Flask
def add_interaction(article_id, type):

"""Record the interaction (view/comment) for...
Testing the API – retrieve articles
$ curl -i http://localhost:5000/cms/api/v1.0/articles

HTTP/1.0 200 OK
Content-Type: a...
Testing the API – comment on an
article
$ curl -H "Content-Type: application/json" -X POST -d '{"text":"An interesting
art...
Schema iteration
New feature in the backlog?
Documents have dynamic schema so we just iterate
the object schema.
>>> user ...
Scale out with sharding
Summary
• Flexible schema documents with ability to embed

rich and complex data structures for performance
• Schema is de...
Further reading
• ‘myCMS’ skeleton source code:

http://www.github.com/mattbates/mycms_mongodb
• Data Models

http://docs....
Next Session– 20th February

• Daniel Roberts
– Interacting with the database
• Query and Update Language
• Interactions b...
#MongoDBBasics @MongoDB

Thank You
Q&A with the team
Webinar: Build an Application Series - Session 2 - Getting Started
Webinar: Build an Application Series - Session 2 - Getting Started
Upcoming SlideShare
Loading in …5
×

Webinar: Build an Application Series - Session 2 - Getting Started

9,075 views

Published on

This session - presented by Matthew Bates, Solutions Architect & Consulting Engineer at MongoDB - will cover an outline of an application, schema design decisions, application functionality and design for scale out.

About the speaker

Matthew Bates is a Solutions Architect in the EMEA region for MongoDB and helps advise customers how to best use and make the most out of MongoDB in their organisations. He has a background in solutions for the acquisition, management and exploitation of big data in government and public sector and telco industries through his previous roles at consultancy firms and a major European telco. He's a Java and Python coder and has a BSc(Hons) in Computer Science from the University of Nottingham.


Next in the Series:

February 20th 2014
Build an Application Series - Session 3 - Interacting with the database:
This webinar will discuss queries and updates and the interaction between an application and a database

March 6th 2014
Build an Application Series - Session 4 - Indexing:
This session will focus on indexing strategies for the application, including geo spatial and full text search

March 20th 2014
Build an Application Series - Session 5 - Reporting in your application:
This session covers Reporting and Aggregation Framework and Building application usage reports

April 3th 2014
Operations for your application - Session 6 - Deploying the application:
By this stage, we will have built the application. Now we need to deploy it. We will discuss architecture for High Availability and scale out

April 17th 2014
Operations for your application - Session 7 - Backup and DR:
This webinar will discuss back up and restore options. Learn what you should do in the event of a failure and how to perform a backup and recovery of the data in your applications

May 6th 2014
Operations for your application - Session 8 - Monitoring and Performance Tuning:
The final webinar of the series will discuss what metrics are important and how to manage and monitor your application for key performance.

Published in: Technology
  • Be the first to comment

Webinar: Build an Application Series - Session 2 - Getting Started

  1. 1. #MongoDBBasics @MongoDB ‘Build an Application’ Webinar Series Schema design and application architecture Matthew Bates Solution Architect/Consulting Engineer, MongoDB EMEA
  2. 2. Agenda • Working with documents • Application requirements • Schema design iteration • ‘myCMS’ sample application architecture and code snippets • ‘Genius Bar’ Q&A with the MongoDB EMEA Solution Architects
  3. 3. The Lingo RDBMS MongoDB Database ➜ Database Table ➜ Collection Row ➜ Document Index ➜ Index Join ➜ Embedded Document Foreign Key ➜ Reference
  4. 4. Modelling data
  5. 5. Example document { ‘_id’ : ObjectId(..), ‘title’ : ‘Schema design in MongoDB’, ‘author’ : ‘mattbates’, ‘text’ : ‘Data in MongoDB has a flexible schema..’, ‘date’ : ISODate(..), ‘tags’ : [‘MongoDB’, ‘schema’], ‘comments’ : [ { ‘text ‘ : ‘Really useful..’, ts: ISODate(..) } ] }
  6. 6. ‘myCMS’ requirements • Different types of categorised articles. • Users can register as members, login, edit profile and logout. • Users can post new articles and comment on articles. • Collect and analyse usage stats – article publishing, views and interactions– for on-site and admin analytics.
  7. 7. ‘myCMS’ entities • Articles • Different types – blogs, galleries, surveys • Embedded multimedia (images, video) • Tags • Users • Profile • Interactions • Comments • Views
  8. 8. Typical (relational) ERD * Note: this is an illustrative example.
  9. 9. Design schema.. In application code # Python dictionary (or object) >>> article = { ‘title’ : ‘Schema design in MongoDB’, ‘author’ : ‘mattbates’, ‘section’ : ‘schema’, ‘slug’ : ‘schema-design-in-mongodb’, ‘text’ : ‘Data in MongoDB has a flexible schema..’, ‘date’ : datetime.datetime.utcnow(), ‘tags’ : [‘MongoDB’, ‘schema’] } >>> db[‘articles’].insert(article)
  10. 10. Let’s add a headline image >>> img_data = Binary(open(‘article_img.jpg’).read()) >>> article = { ‘title’ : ‘Schema design in MongoDB’, ‘author’ : ‘mattbates’, ‘section’ : ‘schema’, ‘slug’ : ‘schema-design-in-mongodb’, ‘text’ : ‘Data in MongoDB has a flexible schema..’, ‘date’ : datetime.datetime.utcnow(), ‘tags’ : [‘MongoDB’, ‘schema’], ‘headline_img’ : { ‘img’ : img_data, ‘caption’ : ‘A sample document at the shell’ }} >>> db[‘articles’].insert(article)
  11. 11. And different types of article >>> article = { ‘title’ : ‘Favourite web application framework’, ‘author’ : ‘mattbates’, ‘section’ : ‘web-dev’, ‘slug’ : ‘web-app-frameworks’, ‘gallery’ : [ { ‘img_url’ : ‘http://x.com/45rty’, ‘caption’ : ‘Flask’, ..}, .. ] ‘date’ : datetime.datetime.utcnow(), ‘tags’ : [‘MongoDB’, ‘schema’], } >>> db[‘articles’].insert(article)
  12. 12. Users and profiles >>> user= { ‘user’ : ‘mattbates’, ‘email’ : ‘matt.bates@mongodb.com’, ‘password’ : ‘xxxxxxxxxx’, ‘joined’ : datetime.datetime.utcnow() ‘location’ : { ‘city’ : ‘London’ }, } >>> db[‘users’].insert(user)
  13. 13. Modelling comments (1) • Two collections – articles and comments • Use a reference (i.e. foreign key) to link together • But.. N+1 queries to retrieve article and comments { ‘_id’ : ObjectId(..), ‘title’ : ‘Schema design in MongoDB’, ‘author’ : ‘mattbates’, ‘date’ : ISODate(..), ‘tags’ : [‘MongoDB’, ‘schema’], ‘section’ : ‘schema’, ‘slug’ : ‘schema-design-in-mongodb’, ‘comments’ : [ ObjectId(..), …] } { } ‘_id’ : ObjectId(..), ‘article_id’ : 1, ‘text’ : ‘A great article, helped me understand schema design’, ‘date’ : ISODate(..),, ‘author’ : ‘johnsmith’
  14. 14. Modelling comments (2) { • Single articles collection – embed comments in article documents • Pros • Single query, document designed for the access pattern • Locality (disk, shard) • Cons • Comments array is unbounded; documents will grow in size (remember 16MB document limit) ‘_id’ : ObjectId(..), ‘title’ : ‘Schema design in MongoDB’, ‘author’ : ‘mattbates’, ‘date’ : ISODate(..), ‘tags’ : [‘MongoDB’, ‘schema’], … ‘comments’ : [ { ‘text’ : ‘A great article, helped me understand schema design’, ‘date’ : ISODate(..), ‘author’ : ‘johnsmith’ }, … ] }
  15. 15. Modelling comments (3) • Another option: hybrid of (2) and (3), embed top x comments (e.g. by date, popularity) into the article document • Fixed-size (2.4 feature) comments array • All other comments ‘overflow’ into a comments collection (double write) in buckets • Pros – Document size is more fixed – fewer moves – Single query built – Full comment history with rich query/aggregation
  16. 16. Modelling comments (3) { ‘_id’ : ObjectId(..), ‘title’ : ‘Schema design in MongoDB’, ‘author’ : ‘mattbates’, ‘date’ : ISODate(..), ‘tags’ : [‘MongoDB’, ‘schema’], … ‘comments_count’: 45, ‘comments_pages’ : 1 ‘comments’ : [ { ‘text’ : ‘A great article, helped me understand schema design’, ‘date’ : ISODate(..), ‘author’ : ‘johnsmith’ }, … ] } Total number of comments • Integer counter updated by update operation as comments added/removed Number of pages • Page is a bucket of 100 comments (see next slide..) Fixed-size comments array • 10 most recent • Sorted by date on insertion
  17. 17. Modelling comments (3) { ‘_id’ : ObjectId(..), ‘article_id’ : ObjectId(..), ‘page’ : 1, ‘count’ : 42 ‘comments’ : [ { ‘text’ : ‘A great article, helped me understand schema design’, ‘date’ : ISODate(..), ‘author’ : ‘johnsmith’ }, … } One comment bucket (page) document containing up to about 100 comments Array of 100 comment subdocuments
  18. 18. Modelling interactions • Interactions – Article views – Comments – (Social media sharing) • Requirements – Time series – Pre-aggregated in preparation for analytics
  19. 19. Modelling interactions • Document per article per day – ‘bucketing’ • Daily counter and hourly sub- document counters for interactions • Bounded array (24 hours) • Single query to retrieve daily article interactions; ready-made for graphing and further aggregation { ‘_id’ : ObjectId(..), ‘article_id’ : ObjectId(..), ‘section’ : ‘schema’, ‘date’ : ISODate(..), ‘daily’: { ‘views’ : 45, ‘comments’ : 150 } ‘hours’ : { 0 : { ‘views’ : 10 }, 1 : { ‘views’ : 2 }, … 23 : { ‘comments’ : 14, ‘views’ : 10 } } }
  20. 20. JSON and RESTful API Real applications are not built at a shell – let’s build a RESTful API. Client-side JSON (eg AngularJS, HTTP(S) REST Python web app Pymongo driver Examples to follow: Python RESTful API using Flask microframework (BSON)
  21. 21. myCMS REST endpoints Method URI Action GET /articles Retrieve all articles GET /articles-by-tag/[tag] Retrieve all articles by tag GET /articles/[article_id] Retrieve a specific article by article_id POST /articles Add a new article GET /articles/[article_id]/comments Retrieve all article comments by article_id POST /articles/[article_id]/comments Add a new comment to an article. POST /users Register a user user GET /users/[username] Retrieve user’s profile PUT /users/[username] Update a user’s profile
  22. 22. Getting started with the skeleton code $ git clone http://www.github.com/mattbates/mycms_mongodb $ cd mycms-mongodb $ virtualenv venv $ source venv/bin/activate $ pip install –r requirements.txt $ mkdir –p data/db $ mongod --dbpath=data/db --fork --logpath=mongod.log $ python web.py [$ deactivate]
  23. 23. RESTful API methods in Python + Flask @app.route('/cms/api/v1.0/articles', methods=['GET']) def get_articles(): """Retrieves all articles in the collection sorted by date """ # query all articles and return a cursor sorted by date cur = db['articles'].find().sort('date’) if not cur: abort(400) # iterate the cursor and add docs to a dict articles = [article for article in cur] return jsonify({'articles' : json.dumps(articles, default=json_util.default)})
  24. 24. RESTful API methods in Python + Flask @app.route('/cms/api/v1.0/articles/<string:article_id>/comments', methods = ['POST']) def add_comment(article_id): """Adds a comment to the specified article and a bucket, as well as updating a view counter "”” … page_id = article['last_comment_id'] // 100 … # push the comment to the latest bucket and $inc the count page = db['comments'].find_and_modify( { 'article_id' : ObjectId(article_id), 'page' : page_id}, { '$inc' : { 'count' : 1 }, '$push' : { 'comments' : comment } }, fields= {'count' : 1}, upsert=True, new=True)
  25. 25. RESTful API methods in Python + Flask # $inc the page count if bucket size (100) is exceeded if page['count'] > 100: db.articles.update( { '_id' : article_id, 'comments_pages': article['comments_pages'] }, { '$inc': { 'comments_pages': 1 } } ) # let's also add to the article itself # most recent 10 comments only res = db['articles'].update( {'_id' : ObjectId(article_id)}, {'$push' : {'comments' : { '$each' : [comment], '$sort' : {’date' : 1 }, '$slice' : -10}}, '$inc' : {'comment_count' : 1}}) …
  26. 26. RESTful API methods in Python + Flask def add_interaction(article_id, type): """Record the interaction (view/comment) for the specified article into the daily bucket and update an hourly counter """ ts = datetime.datetime.utcnow() # $inc daily and hourly view counters in day/article stats bucket # note the unacknowledged w=0 write concern for performance db['interactions'].update( { 'article_id' : ObjectId(article_id), 'date' : datetime.datetime(ts.year, ts.month, ts.day)}, { '$inc' : { 'daily.{}’.format(type) : 1, 'hourly.{}.{}'.format(ts.hour, type) : 1 }}, upsert=True, w=0)
  27. 27. Testing the API – retrieve articles $ curl -i http://localhost:5000/cms/api/v1.0/articles HTTP/1.0 200 OK Content-Type: application/json Content-Length: 20 Server: Werkzeug/0.9.4 Python/2.7.6 Date: Sat, 01 Feb 2014 09:52:57 GMT { "articles": "[{"author": "mattbates", "title": "Schema design in MongoDB", "text": "Data in MongoDB has a flexible schema..", "tags": ["MongoDB", "schema"], "date": {"$date": 1391293347408}, "_id": {"$oid": "52ed73a30bd031362b3c6bb3"}}]" }
  28. 28. Testing the API – comment on an article $ curl -H "Content-Type: application/json" -X POST -d '{"text":"An interesting article and a great read."}' http://localhost:5000/cms/api/v1.0/articles/52ed73a30bd031362b3c6bb3/comment s { "comment": "{"date": {"$date": 1391639269724}, "text": "An interesting article and a great read."}” }
  29. 29. Schema iteration New feature in the backlog? Documents have dynamic schema so we just iterate the object schema. >>> user = { ‘username’ : ‘matt’, ‘first’ : ‘Matt’, ‘last’ : ‘Bates’, ‘preferences’ : { ‘opt_out’ : True } } >>> user.save(user)
  30. 30. Scale out with sharding
  31. 31. Summary • Flexible schema documents with ability to embed rich and complex data structures for performance • Schema is designed around data access patterns – and not data storage • Referencing for more flexibility • Develop schema with scale out in mind; shard key consideration is important (more to come)
  32. 32. Further reading • ‘myCMS’ skeleton source code: http://www.github.com/mattbates/mycms_mongodb • Data Models http://docs.mongodb.org/manual/data-modeling/ • Use case - metadata and asset management: http://docs.mongodb.org/ecosystem/use-cases/metadataand-asset-management/ • Use case - storing comments:http://docs.mongodb.org/ecosystem/usecases/storing-comments/
  33. 33. Next Session– 20th February • Daniel Roberts – Interacting with the database • Query and Update Language • Interactions between the application and database – Code Examples
  34. 34. #MongoDBBasics @MongoDB Thank You Q&A with the team

×