Importing Wikipedia in Plone

Eric BREHAULT – Plone Conference 2013

  1. Importing Wikipedia in Plone
     Eric BREHAULT – Plone Conference 2013
  2. ZODB is good at storing objects
     ● Plone contents are objects,
     ● we store them in the ZODB,
     ● everything is fine, end of story.
  3. But what if...
     ...we want to store non-contentish records? Like polls, statistics, mailing-list subscribers, etc., or any business-specific structured data.
  4. Store them as contents anyway
     That is a powerful solution. But there are two major problems...
  5. Problem 1: You need to manage a secondary system
     ● you need to deploy it,
     ● you need to back it up,
     ● you need to secure it,
     ● etc.
  6. Problem 2: I hate SQL
     No explanation here.
  7. I think I just cannot digest it...
  8. How to store many records in the ZODB?
     ● Is the ZODB strong enough?
     ● Is the ZCatalog strong enough?
  9. My grandmother often told me
     "If you want to become stronger, you have to eat your soup."
  10. Where do we find a good soup for Plone?
      In a super souper!!!
  11. souper.plone and souper
      ● They provide both storage and indexing.
      ● A record can store any persistent, picklable data.
      ● Created by BlueDynamics.
      ● Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
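      Before a soup can be used, souper needs a catalog factory utility that
      defines the indexes. The following sketch follows the souper
      documentation; the 'mysoup' name and the indexed fields are chosen to
      match the examples on the next slides:

      >>> from zope.interface import implementer
      >>> from zope.component import provideUtility
      >>> from repoze.catalog.catalog import Catalog
      >>> from repoze.catalog.indexes.field import CatalogFieldIndex
      >>> from repoze.catalog.indexes.text import CatalogTextIndex
      >>> from repoze.catalog.indexes.keyword import CatalogKeywordIndex
      >>> from souper.interfaces import ICatalogFactory
      >>> from souper.soup import NodeAttributeIndexer

      >>> @implementer(ICatalogFactory)
      ... class MySoupCatalogFactory(object):
      ...     def __call__(self, context=None):
      ...         # one index per attribute we want to query
      ...         catalog = Catalog()
      ...         catalog[u'user'] = CatalogFieldIndex(NodeAttributeIndexer('user'))
      ...         catalog[u'text'] = CatalogTextIndex(NodeAttributeIndexer('text'))
      ...         catalog[u'keywords'] = CatalogKeywordIndex(NodeAttributeIndexer('keywords'))
      ...         return catalog

      >>> provideUtility(MySoupCatalogFactory(), name='mysoup')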
  12. Add a record
      >>> from souper.soup import get_soup
      >>> from souper.soup import Record
      >>> soup = get_soup('mysoup', context)
      >>> record = Record()
      >>> record.attrs['user'] = 'user1'
      >>> record.attrs['text'] = u'foo bar baz'
      >>> record.attrs['keywords'] = [u'1', u'2', u'ü']
      >>> record_id = soup.add(record)
  13. Record in record
      >>> record.attrs['homeaddress'] = Record()
      >>> record.attrs['homeaddress'].attrs['zip'] = '6020'
      >>> record.attrs['homeaddress'].attrs['town'] = 'Innsbruck'
      >>> record.attrs['homeaddress'].attrs['country'] = 'Austria'
  14. Access a record
      >>> from souper.soup import get_soup
      >>> soup = get_soup('mysoup', context)
      >>> record = soup.get(record_id)
  15. Query
      >>> from repoze.catalog.query import Eq, Contains
      >>> [r for r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))]
      [<Record object 'None' at ...>]

      Or using the CQE format:

      >>> [r for r in soup.query("user == 'user1' and 'foo' in text")]
      [<Record object 'None' at ...>]
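      The slides do not show updates and deletes. A short sketch, assuming
      souper's documented reindex and delete API (verify the exact signatures
      against the package documentation):

      >>> record.attrs['text'] = u'foo bar'
      >>> soup.reindex(records=[record])  # reindex just the changed record
      >>> del soup[record]                # remove the record from the soup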
  16. souper
      ● a soup container can be moved to a specific ZODB mount point,
      ● it can be shared across multiple independent Plone instances,
      ● souper works on Plone and Pyramid.
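      For the mount-point case, a dedicated ZODB can be declared in zope.conf
      and then mounted from the ZMI; the database name, file path, and mount
      path below are illustrative assumptions, not from the talk:

      <zodb_db soups>
          <filestorage>
              path /path/to/var/filestorage/soups.fs
          </filestorage>
          mount-point /plone/soups
      </zodb_db>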
  17. Plomino & souper
      ● we use Plomino to build non-content-oriented apps easily,
      ● we use souper to store huge amounts of application data.
  18. Plomino data storage
      Originally, documents (i.e. records) were ATFolders. Capacity: about 30,000.
  19. Plomino data storage
      Since 1.14, documents are pure CMF objects. Capacity: about 100,000. Usually the Plomino ZCatalog contains a lot of indexes.
  20. Plomino & souper
      With souper, documents are just soup records. Capacity: several million.
  21. Typical use case
      ● store 500,000 addresses,
      ● query them in full text and display the results on a map.
      Demo
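      Bulk-loading at this scale is mostly about batching the ZODB
      transaction. A sketch of a hypothetical import loop (the 'addresses'
      soup, the CSV file, and its columns are invented for illustration; the
      soup also needs its own catalog factory, as shown earlier):

      >>> import csv
      >>> import transaction
      >>> from souper.soup import get_soup, Record
      >>> soup = get_soup('addresses', context)
      >>> with open('addresses.csv') as f:
      ...     for i, row in enumerate(csv.reader(f)):
      ...         record = Record()
      ...         record.attrs['name'] = row[0]
      ...         record.attrs['address'] = row[1]
      ...         soup.add(record)
      ...         if i % 10000 == 0:
      ...             transaction.savepoint()  # flush periodically, keep memory bounded
      >>> transaction.commit()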
  22. What is the limit?
      Can we import Wikipedia in souper?
      Demo with 400,000 records
      Demo with 5.5 million records
  23. Conclusion
      ● usage performance is good,
      ● Plone performance is not impacted.
      Use it!
  24. Thoughts
      ● What about a REST API on top of it?
      ● Massive imports are slow and difficult; could they be improved?
  25. Makina Corpus
      For all questions related to this talk, please contact Éric Bréhault: eric.brehault@makina-corpus.com
      Tel: +33 534 566 958
      www.makina-corpus.com
