2014 03-12-fr schema design and app architecture-2
  • In the filing cabinet model, the patient's x-rays, checkups, and allergies are stored in separate drawers and pulled together (like an RDBMS). In the file folder model, we store all of the patient information in a single folder (like MongoDB).
  • Priority: floating point number between 0 and 1000. Highest member that is up to date wins. Up to date == within 10 seconds of primary. If a higher priority member catches up, it will force an election and win. Slave Delay: lags behind master by a configurable time delay. Automatically hidden from clients. Protects against operator errors: fat fingering, application corrupts data.
  • Large scale operation can be combined with high performance on commodity hardware through horizontal scaling. Build: a document oriented database maps perfectly to object oriented languages. Scale: MongoDB presents a clear path to scalability that isn't ops intensive, and provides the same interface for a sharded cluster as for a single instance.
  • Cardinality: can your data be broken down enough? Query isolation: query targeting to a specific shard. Reliability: shard outages. A good shard key can: optimize routing, minimize (unnecessary) traffic, allow best scaling.

    1. Tugdual Grall (@tgrall), Alain Hélaïli (@AlainHelaili). #MongoDBBasics @MongoDB. Building an application with MongoDB: schema design and application architecture
    2. Agenda • Working with documents • Application features • Schema design • 'myCMS' architecture and code examples • Q&A
    3. Vocabulary
         RDBMS          MongoDB
         Database     ➜ Database
         Table        ➜ Collection
         Row          ➜ Document
         Index        ➜ Index
         Join         ➜ Embedded Document
         Foreign Key  ➜ Reference
    4. Data modelling
    5. Example document
         {
           '_id': ObjectId(..),
           'title': 'Schema design in MongoDB',
           'author': 'mattbates',
           'text': 'Data in MongoDB has a flexible schema..',
           'date': ISODate(..),
           'tags': ['MongoDB', 'schema'],
           'comments': [ { 'text': 'Really useful..', 'ts': ISODate(..) } ]
         }
    6. 'myCMS' features • Different types of articles and categories. • Users can register, log in and out, and edit their profile. • Users can post articles and comment on them. • Usage statistics (publications, views, interactions) are collected and analysed for the site and the back office (analytics).
    7. 'myCMS' entities • Articles • Different types: blogs, galleries, surveys • Embedded multimedia (images, videos) • Tags • Users • Profiles • Interactions • Comments • Views
    8. Typical (relational) ERD
    9. Schema design… in code
         # Python dictionary (or object)
         >>> import datetime
         >>> from pymongo import MongoClient
         >>> db = MongoClient()['mycms']  # assumes a local mongod; the database name is illustrative
         >>> article = {'title': 'Schema design in MongoDB',
         ...            'author': 'mattbates',
         ...            'section': 'schema',
         ...            'slug': 'schema-design-in-mongodb',
         ...            'text': 'Data in MongoDB has a flexible schema..',
         ...            'date': datetime.datetime.utcnow(),
         ...            'tags': ['MongoDB', 'schema']}
         >>> db['articles'].insert_one(article)
    10. Let's add an image
         >>> from bson.binary import Binary
         >>> with open('article_img.jpg', 'rb') as f:  # binary mode for image bytes
         ...     img_data = Binary(f.read())
         >>> article = {'title': 'Schema design in MongoDB',
         ...            'author': 'mattbates',
         ...            'section': 'schema',
         ...            'slug': 'schema-design-in-mongodb',
         ...            'text': 'Data in MongoDB has a flexible schema..',
         ...            'date': datetime.datetime.utcnow(),
         ...            'tags': ['MongoDB', 'schema'],
         ...            'headline_img': {'img': img_data,
         ...                             'caption': 'A sample document at the shell'}}
         >>> db['articles'].insert_one(article)
    11. And different types of articles
         >>> article = {'title': 'Favourite web application framework',
         ...            'author': 'mattbates',
         ...            'section': 'web-dev',
         ...            'slug': 'web-app-frameworks',
         ...            'gallery': [
         ...                {'img_url': 'http://x.com/45rty', 'caption': 'Flask'},  # .. further fields elided
         ...                # .. further gallery items elided
         ...            ],
         ...            'date': datetime.datetime.utcnow(),
         ...            'tags': ['MongoDB', 'schema']}
         >>> db['articles'].insert_one(article)
    12. Users and profiles
         >>> user = {'user': 'mattbates',
         ...         'email': 'matt.bates@mongodb.com',
         ...         'password': 'xxxxxxxxxx',
         ...         'joined': datetime.datetime.utcnow(),
         ...         'location': {'city': 'London'}}
         >>> db['users'].insert_one(user)
    13. Modelling comments (1) • Two collections: articles and comments • A reference (i.e. a foreign key) links them • BUT.. N+1 queries to retrieve an article and its comments
         Article document:
         {
           '_id': ObjectId(..),
           'title': 'Schema design in MongoDB',
           'author': 'mattbates',
           'date': ISODate(..),
           'tags': ['MongoDB', 'schema'],
           'section': 'schema',
           'slug': 'schema-design-in-mongodb',
           'comments': [ ObjectId(..), … ]
         }
         Comment document:
         {
           '_id': ObjectId(..),
           'article_id': ObjectId(..),
           'text': 'A great article, helped me understand schema design',
           'date': ISODate(..),
           'author': 'johnsmith'
         }
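The N+1 cost of the two-collection model can be made concrete with a small in-memory sketch. Nothing here talks to MongoDB: `CountingCollection` and `load_article_with_comments` are hypothetical helpers, and the string ids stand in for ObjectIds. The sketch only counts how many lookups are needed when each comment reference is resolved one at a time.

```python
class CountingCollection:
    """Hypothetical stand-in for a collection: stores documents by _id
    and counts how many lookups were issued against it."""

    def __init__(self, docs):
        self._docs = {doc["_id"]: doc for doc in docs}
        self.queries = 0

    def find_one(self, doc_id):
        self.queries += 1
        return self._docs.get(doc_id)


def load_article_with_comments(articles, comments, article_id):
    """Resolve an article, then each referenced comment individually."""
    article = articles.find_one(article_id)                            # 1 query
    loaded = [comments.find_one(cid) for cid in article["comments"]]   # N queries
    return article, loaded


articles = CountingCollection([
    {"_id": "a1", "title": "Schema design in MongoDB",
     "comments": ["c1", "c2", "c3"]},
])
comments = CountingCollection([
    {"_id": cid, "article_id": "a1", "text": "comment " + cid}
    for cid in ("c1", "c2", "c3")
])

article, loaded = load_article_with_comments(articles, comments, "a1")
total_queries = articles.queries + comments.queries
print(total_queries)  # one article lookup plus one lookup per comment
```

Fetching all referenced comments with a single `$in` query would reduce this to two round trips, but the article and its comments still cannot be read in one query.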
    14. Modelling comments (2) • A single collection, articles, with comments embedded in the article documents • Pros • Single query, design optimised for reads • Locality (disk, shard) • Cons • Unbounded comments array; documents keep growing (reminder: 16 MB document limit)
         {
           '_id': ObjectId(..),
           'title': 'Schema design in MongoDB',
           'author': 'mattbates',
           'date': ISODate(..),
           'tags': ['MongoDB', 'schema'],
           …
           'comments': [
             {
               'text': 'A great article, helped me understand schema design',
               'date': ISODate(..),
               'author': 'johnsmith'
             },
             …
           ]
         }
    15. Modelling comments (3) • Another option: a hybrid of (1) and (2), embedding the top x comments (e.g. by date or popularity) in the article document • Fixed-size comments array (`$push` with `$slice`, a MongoDB 2.4 feature) • All other comments overflow into a 'comments' collection, in buckets • Pros • Document size is more stable, so fewer document moves • A single query serves most accesses • Full comment history remains available through queries and aggregation
    16. Modelling comments (3) • A comment counter is added: no counting at read time • Fixed-size comments array: the 10 most recent, sorted by date on insertion
         {
           '_id': ObjectId(..),
           'title': 'Schema design in MongoDB',
           'author': 'mattbates',
           'date': ISODate(..),
           'tags': ['MongoDB', 'schema'],
           …
           'comments_count': 45,
           'comments_pages': 1,
           'comments': [
             {
               'text': 'A great article, helped me understand schema design',
               'date': ISODate(..),
               'author': 'johnsmith'
             },
             …
           ]
         }
    17. Modelling comments (3) • 'Comment bucket' document holding up to 100 comments • Array of 100 comments
         {
           '_id': ObjectId(..),
           'article_id': ObjectId(..),
           'page': 1,
           'count': 42,
           'comments': [
             {
               'text': 'A great article, helped me understand schema design',
               'date': ISODate(..),
               'author': 'johnsmith'
             },
             …
           ]
         }
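The hybrid model can be sketched as plain Python over dictionaries, without a database, to show how the bucket documents and the embedded top-10 array evolve together. The function name `add_comment`, the constants, and the in-memory `buckets` dict are illustrative assumptions; the field names follow the slides. This sketch opens a new bucket as soon as the current one holds 100 comments.

```python
BUCKET_SIZE = 100   # comments per bucket document, as on the slides
EMBEDDED_TOP = 10   # most recent comments kept inside the article


def add_comment(article, buckets, comment):
    """Append the comment to the current bucket, then refresh the
    fixed-size array of recent comments embedded in the article."""
    page = article["comments_pages"]
    bucket = buckets.setdefault(page, {"page": page, "count": 0, "comments": []})
    bucket["comments"].append(comment)
    bucket["count"] += 1
    if bucket["count"] >= BUCKET_SIZE:
        article["comments_pages"] += 1   # the next comment opens a new bucket

    # In-memory equivalent of $push with $each, $sort by date and $slice -10
    by_date = sorted(article["comments"] + [comment], key=lambda c: c["date"])
    article["comments"] = by_date[-EMBEDDED_TOP:]
    article["comments_count"] += 1


article = {"title": "Schema design in MongoDB",
           "comments_count": 0, "comments_pages": 1, "comments": []}
buckets = {}
for i in range(250):
    add_comment(article, buckets, {"text": "comment %d" % i, "date": i})

print(article["comments_count"], sorted(buckets), len(article["comments"]))
```

Reading an article with its latest comments touches only the article document; the buckets are read only when a visitor pages back through the history.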
    18. Modelling interactions • Interactions • Articles viewed • Comments • (Social media sharing) • Requirements • Time series • Pre-aggregation to prepare for analytics
    19. Modelling interactions • One document per article per day ('bucketing') • A daily counter and one sub-document per hour for interactions • Bounded array (24 hours) • Single query, ready to be graphed
         {
           '_id': ObjectId(..),
           'article_id': ObjectId(..),
           'section': 'schema',
           'date': ISODate(..),
           'daily': { 'views': 45, 'comments': 150 },
           'hours': {
             0 : { 'views': 10 },
             1 : { 'views': 2 },
             …
             23 : { 'comments': 14, 'views': 10 }
           }
         }
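One way to see how a single interaction maps onto this daily bucket is to build the filter and the `$inc` update as plain dictionaries. The helper name `interaction_update` is hypothetical, and a string is used for `article_id` so the sketch runs without the `bson` package; a real application would pass an ObjectId and hand both dictionaries to an upserting update.

```python
import datetime


def interaction_update(article_id, kind, ts):
    """Build the (filter, update) pair for one interaction.

    kind is the counter name, e.g. 'views' or 'comments'; ts is the
    UTC time of the interaction.
    """
    day = datetime.datetime(ts.year, ts.month, ts.day)  # midnight of the bucket day
    query = {"article_id": article_id, "date": day}
    update = {"$inc": {
        "daily.{}".format(kind): 1,               # daily total
        "hours.{}.{}".format(ts.hour, kind): 1,   # counter for that hour
    }}
    return query, update


query, update = interaction_update(
    "52ed73a30bd031362b3c6bb3", "views",
    datetime.datetime(2014, 2, 1, 9, 52, 57))
print(query)
print(update)
```

Because `$inc` creates missing fields, the same update works for the first interaction of the day (with an upsert) and for every later one.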
    20. JSON and RESTful API. Real applications are not built at a shell, so let's build a RESTful API. Diagram: client side (e.g. AngularJS) exchanges JSON over HTTP(S) REST with a Python web app, which uses the PyMongo driver (BSON). Examples to follow: Python RESTful API using the Flask microframework
    21. myCMS REST endpoints
         Method  URI                               Action
         GET     /articles                         Retrieve all articles
         GET     /articles-by-tag/[tag]            Retrieve all articles by tag
         GET     /articles/[article_id]            Retrieve a specific article by article_id
         POST    /articles                         Add a new article
         GET     /articles/[article_id]/comments   Retrieve all article comments by article_id
         POST    /articles/[article_id]/comments   Add a new comment to an article
         POST    /users                            Register a user
         GET     /users/[username]                 Retrieve user's profile
         PUT     /users/[username]                 Update a user's profile
    22. Getting started with the skeleton code
         $ git clone http://www.github.com/mattbates/mycms-mongodb
         $ cd mycms-mongodb
         $ virtualenv venv
         $ source venv/bin/activate
         $ pip install -r requirements.txt
         ($ deactivate)
    23. RESTful API methods in Python + Flask
         import json
         from bson import json_util
         from flask import abort, jsonify

         @app.route('/cms/api/v1.0/articles', methods=['GET'])
         def get_articles():
             """Retrieves all articles in the collection
             sorted by date
             """
             # query all articles and return a cursor sorted by date, newest first
             cur = db['articles'].find().sort('date', -1)
             if not cur:
                 abort(400)
             # iterate the cursor and add docs to a list
             articles = [article for article in cur]
             return jsonify({'articles': json.dumps(articles, default=json_util.default)})
    24. RESTful API methods in Python + Flask
         from bson.objectid import ObjectId
         from pymongo import ReturnDocument

         @app.route('/cms/api/v1.0/articles/<string:article_id>/comments', methods=['POST'])
         def add_comment(article_id):
             """Adds a comment to the specified article and a
             bucket, as well as updating a view counter
             """
             # …
             # push the comment to the latest bucket and $inc the count
             page = db['comments'].find_one_and_update(
                 {'article_id': ObjectId(article_id), 'page': comments_pages},
                 {'$inc': {'count': 1}, '$push': {'comments': comment}},
                 projection={'count': 1},
                 upsert=True,
                 return_document=ReturnDocument.AFTER)
    25. RESTful API methods in Python + Flask
             # $inc the page count if bucket size (100) is exceeded
             if page['count'] > 100:
                 db['articles'].update_one(
                     {'_id': ObjectId(article_id),
                      'comments_pages': article['comments_pages']},
                     {'$inc': {'comments_pages': 1}})
             # let's also add to the article itself
             # most recent 10 comments only
             res = db['articles'].update_one(
                 {'_id': ObjectId(article_id)},
                 {'$push': {'comments': {'$each': [comment],
                                         '$sort': {'date': 1},
                                         '$slice': -10}},
                  '$inc': {'comments_count': 1}})
             # …
    26. RESTful API methods in Python + Flask
         import datetime
         from pymongo import WriteConcern

         def add_interaction(article_id, type):
             """Record the interaction (view/comment) for the
             specified article into the daily bucket and
             update an hourly counter
             """
             ts = datetime.datetime.utcnow()
             # $inc daily and hourly counters in the day/article stats bucket;
             # type is the counter name, e.g. 'views' or 'comments'
             # note the unacknowledged w=0 write concern for performance
             interactions = db['interactions'].with_options(
                 write_concern=WriteConcern(w=0))
             interactions.update_one(
                 {'article_id': ObjectId(article_id),
                  'date': datetime.datetime(ts.year, ts.month, ts.day)},
                 {'$inc': {'daily.{}'.format(type): 1,
                           'hours.{}.{}'.format(ts.hour, type): 1}},
                 upsert=True)
    27. Testing the API: retrieve articles
         $ curl -i http://localhost:5000/cms/api/v1.0/articles
         HTTP/1.0 200 OK
         Content-Type: application/json
         Content-Length: 20
         Server: Werkzeug/0.9.4 Python/2.7.6
         Date: Sat, 01 Feb 2014 09:52:57 GMT

         {
           "articles": "[{\"author\": \"mattbates\", \"title\": \"Schema design in MongoDB\", \"text\": \"Data in MongoDB has a flexible schema..\", \"tags\": [\"MongoDB\", \"schema\"], \"date\": {\"$date\": 1391293347408}, \"_id\": {\"$oid\": \"52ed73a30bd031362b3c6bb3\"}}]"
         }
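In the response above, the value of "articles" is itself a JSON string, so a client has to decode twice. The following is a minimal client-side sketch using only the standard library; `decode_articles` is a hypothetical helper, and the sample body is rebuilt locally (a shortened copy of the article shown on the slide) rather than fetched from a running server. It also unwraps the `$oid` and `$date` wrappers that appear in the output.

```python
import datetime
import json

# Rebuild a body with the same shape as the API response:
# an outer JSON object whose "articles" value is a JSON-encoded string.
inner = [{
    "author": "mattbates",
    "title": "Schema design in MongoDB",
    "date": {"$date": 1391293347408},
    "_id": {"$oid": "52ed73a30bd031362b3c6bb3"},
}]
body = json.dumps({"articles": json.dumps(inner)})


def decode_articles(body):
    """Decode the double-encoded payload and unwrap $oid / $date."""
    payload = json.loads(body)                 # outer object
    articles = json.loads(payload["articles"])  # inner JSON string
    for article in articles:
        article["_id"] = article["_id"]["$oid"]
        millis = article["date"]["$date"]       # milliseconds since the epoch
        article["date"] = datetime.datetime.fromtimestamp(
            millis / 1000, tz=datetime.timezone.utc)
    return articles


articles = decode_articles(body)
print(articles[0]["title"], articles[0]["date"].isoformat())
```

Returning the list directly instead of a pre-serialised string would spare clients the second decode, at the cost of handling the BSON types on the server side.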
    28. Testing the API: comment on an article
         $ curl -H "Content-Type: application/json" -X POST -d '{"text":"An interesting article and a great read."}' http://localhost:5000/cms/api/v1.0/articles/52ed73a30bd031362b3c6bb3/comments
         {
           "comment": "{\"date\": {\"$date\": 1391639269724}, \"text\": \"An interesting article and a great read.\"}"
         }
    29. Schema iteration. New feature in the backlog? Documents have a dynamic schema, so we just iterate the object schema.
         >>> user = {'username': 'matt',
         ...         'first': 'Matt',
         ...         'last': 'Bates',
         ...         'preferences': {'opt_out': True}}
         >>> db['users'].insert_one(user)
    30. Scale out with sharding
    31. Summary • Documents have a flexible schema and can embed rich, complex data structures • Different strategies are available to ensure performance • Schema design is driven by access patterns, not storage patterns • References give more flexibility • Keep horizontal distribution in mind (shard key)
    32. Further reading • 'myCMS' skeleton source code: http://www.github.com/mattbates/mycms-mongodb • Use case - metadata and asset management: http://docs.mongodb.org/ecosystem/use-cases/metadata-and-asset-management/ • Use case - storing comments: http://docs.mongodb.org/ecosystem/use-cases/storing-comments/
    33. Next session: 26 March • Interacting with the database • Query language (find & update) • Interactions between the application and the database • Code examples
    34. #MongoDBBasics Thank you. Q&A with the team
