SlideShare a Scribd company logo
1 of 25
Download to read offline
Importing Wikipedia in Plone
Eric BREHAULT – Plone Conference 2013
ZODB is good at storing objects
● Plone contents are objects,
● we store them in the ZODB,
● everything is fine, end of the story.
But what if ...
... we want to store non-contentish records?
Like polls, statistics, mail-list subscribers, etc.,
or any business-specific structured data.
Store them as contents anyway
That is a powerfull solution.
But there are 2 major problems...
Problem 1: You need to manage a secondary
system
● you need to deploy it,
● you need to backup it,
● you need to secure it,
● etc.
Problem 2: I hate SQL
No explanation here.
I think I just cannot digest it...
How to store many records in the ZODB?
● Is the ZODB strong enough?
● Is the ZCatalog strong enough?
My grandmother often told me
"If you want to become stronger, you have to eat your soup."
Where do we find a good soup for Plone?
In a super souper!!!
souper.plone and souper
● It provides both storage and indexing.
● Record can store any persistent pickable data.
● Created by BlueDynamics.
● Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
Add a record
>>> soup = get_soup('mysoup', context)
>>> record = Record()
>>> record.attrs['user'] = 'user1'
>>> record.attrs['text'] = u'foo bar baz'
>>> record.attrs['keywords'] = [u'1', u'2', u'ü']
>>> record_id = soup.add(record)
Record in record
>>> record['homeaddress'] = Record()
>>> record['homeaddress'].attrs['zip'] = '6020'
>>> record['homeaddress'].attrs['town'] = 'Innsbruck'
>>> record['homeaddress'].attrs['country'] = 'Austria'
Access record
>>> from souper.soup import get_soup
>>> soup = get_soup('mysoup', context)
>>> record = soup.get(record_id)
Query
>>> from repoze.catalog.query import Eq, Contains
>>> [r for r in soup.query(Eq('user', 'user1')
& Contains('text', 'foo'))]
[<Record object 'None' at ...>]
or using CQE format
>>> [r for r in soup.query("user == 'user1' and 'foo' in text")]
[<Record object 'None' at ...>]
souper
● a Soup-container can be moved to a specific ZODB mount-
point,
● it can be shared across multiple independent Plone instances,
● souper works on Plone and Pyramid.
Plomino & souper
● we use Plomino to build non-content oriented apps easily,
● we use souper to store huge amount of application data.
Plomino data storage
Originally, documents (=record) were ATFolder.
Capacity about 30 000.
Plomino data storage
Since 1.14, documents are pure CMF.
Capacity about 100 000.
Usally the Plomino ZCatalog contains a lot of indexes.
Plomino & souper
With souper, documents are just soup records.
Capacity: several millions.
Typical use case
● Store 500 000 addresses,
● Be able to query them in full text and display the result on a map.
Demo
What is the limit?
Can we import Wikipedia in souper?
Demo with 400 000 records
Demo with 5,5 millions of records
Conclusion
● Usage performances are good,
● Plone performances are not impacted.
Use it!
Thoughts
● What about a REST API on top of it?
● Massive import is long and difficult, could it be improved?
Makina Corpus
For all questions related to this talk,
please contact Éric Bréhault
eric.brehault@makina-corpus.com
Tel : +33 534 566 958
www.makina-corpus.com

More Related Content

Similar to Importing wikipedia in Plone

Centralized logging system using mongoDB
Centralized logging system using mongoDBCentralized logging system using mongoDB
Centralized logging system using mongoDBVivek Parihar
 
Async IO and Multithreading explained
Async IO and Multithreading explainedAsync IO and Multithreading explained
Async IO and Multithreading explainedDirecti Group
 
Buildout: creating and deploying repeatable applications in python
Buildout: creating and deploying repeatable applications in pythonBuildout: creating and deploying repeatable applications in python
Buildout: creating and deploying repeatable applications in pythonCodeSyntax
 
python-160403194316.pdf
python-160403194316.pdfpython-160403194316.pdf
python-160403194316.pdfgmadhu8
 
MongoDB and AWS Best Practices
MongoDB and AWS Best PracticesMongoDB and AWS Best Practices
MongoDB and AWS Best PracticesMongoDB
 
Python Seminar PPT
Python Seminar PPTPython Seminar PPT
Python Seminar PPTShivam Gupta
 
Running a Plone product on Substance D
Running a Plone product on Substance DRunning a Plone product on Substance D
Running a Plone product on Substance DMakina Corpus
 
A novel approach to Undo
A novel approach to UndoA novel approach to Undo
A novel approach to UndoSean Upton
 
Linux multiplexing
Linux multiplexingLinux multiplexing
Linux multiplexingMark Veltzer
 
GSoC2014 - PGCon2015 Presentation June, 2015
GSoC2014 - PGCon2015 Presentation June, 2015GSoC2014 - PGCon2015 Presentation June, 2015
GSoC2014 - PGCon2015 Presentation June, 2015Fabrízio Mello
 
Python_Introduction&DataType.pptx
Python_Introduction&DataType.pptxPython_Introduction&DataType.pptx
Python_Introduction&DataType.pptxHaythamBarakeh1
 
Prototype4Production Presented at FOSSASIA2015 at Singapore
Prototype4Production Presented at FOSSASIA2015 at SingaporePrototype4Production Presented at FOSSASIA2015 at Singapore
Prototype4Production Presented at FOSSASIA2015 at SingaporeDhruv Gohil
 
AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...
AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...
AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...GeeksLab Odessa
 
python presntation 2.pptx
python presntation 2.pptxpython presntation 2.pptx
python presntation 2.pptxArpittripathi45
 
Python and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughPython and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughgabriellekuruvilla
 

Similar to Importing wikipedia in Plone (20)

Centralized logging system using mongoDB
Centralized logging system using mongoDBCentralized logging system using mongoDB
Centralized logging system using mongoDB
 
Async IO and Multithreading explained
Async IO and Multithreading explainedAsync IO and Multithreading explained
Async IO and Multithreading explained
 
Buildout: creating and deploying repeatable applications in python
Buildout: creating and deploying repeatable applications in pythonBuildout: creating and deploying repeatable applications in python
Buildout: creating and deploying repeatable applications in python
 
python-160403194316.pdf
python-160403194316.pdfpython-160403194316.pdf
python-160403194316.pdf
 
MongoDB and AWS Best Practices
MongoDB and AWS Best PracticesMongoDB and AWS Best Practices
MongoDB and AWS Best Practices
 
Data analysis with pandas
Data analysis with pandasData analysis with pandas
Data analysis with pandas
 
Data Analysis With Pandas
Data Analysis With PandasData Analysis With Pandas
Data Analysis With Pandas
 
Python
PythonPython
Python
 
Python Seminar PPT
Python Seminar PPTPython Seminar PPT
Python Seminar PPT
 
python into.pptx
python into.pptxpython into.pptx
python into.pptx
 
Running a Plone product on Substance D
Running a Plone product on Substance DRunning a Plone product on Substance D
Running a Plone product on Substance D
 
A novel approach to Undo
A novel approach to UndoA novel approach to Undo
A novel approach to Undo
 
Linux multiplexing
Linux multiplexingLinux multiplexing
Linux multiplexing
 
GSoC2014 - PGCon2015 Presentation June, 2015
GSoC2014 - PGCon2015 Presentation June, 2015GSoC2014 - PGCon2015 Presentation June, 2015
GSoC2014 - PGCon2015 Presentation June, 2015
 
Python_Introduction&DataType.pptx
Python_Introduction&DataType.pptxPython_Introduction&DataType.pptx
Python_Introduction&DataType.pptx
 
Prototype4Production Presented at FOSSASIA2015 at Singapore
Prototype4Production Presented at FOSSASIA2015 at SingaporePrototype4Production Presented at FOSSASIA2015 at Singapore
Prototype4Production Presented at FOSSASIA2015 at Singapore
 
AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...
AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...
AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...
 
python presntation 2.pptx
python presntation 2.pptxpython presntation 2.pptx
python presntation 2.pptx
 
Python and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughPython and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthrough
 
Conf orm - explain
Conf orm - explainConf orm - explain
Conf orm - explain
 

Recently uploaded

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 

Recently uploaded (20)

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Importing wikipedia in Plone

  • 1. Importing Wikipedia in Plone Eric BREHAULT – Plone Conference 2013
  • 2. ZODB is good at storing objects ● Plone contents are objects, ● we store them in the ZODB, ● everything is fine, end of the story.
  • 3. But what if ... ... we want to store non-contentish records? Like polls, statistics, mail-list subscribers, etc., or any business-specific structured data.
  • 4. Store them as contents anyway That is a powerfull solution. But there are 2 major problems...
  • 5. Problem 1: You need to manage a secondary system ● you need to deploy it, ● you need to backup it, ● you need to secure it, ● etc.
  • 6. Problem 2: I hate SQL No explanation here.
  • 7. I think I just cannot digest it...
  • 8. How to store many records in the ZODB? ● Is the ZODB strong enough? ● Is the ZCatalog strong enough?
  • 9. My grandmother often told me "If you want to become stronger, you have to eat your soup."
  • 10. Where do we find a good soup for Plone? In a super souper!!!
  • 11. souper.plone and souper ● It provides both storage and indexing. ● Record can store any persistent pickable data. ● Created by BlueDynamics. ● Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
  • 12. Add a record >>> soup = get_soup('mysoup', context) >>> record = Record() >>> record.attrs['user'] = 'user1' >>> record.attrs['text'] = u'foo bar baz' >>> record.attrs['keywords'] = [u'1', u'2', u'ü'] >>> record_id = soup.add(record)
  • 13. Record in record >>> record['homeaddress'] = Record() >>> record['homeaddress'].attrs['zip'] = '6020' >>> record['homeaddress'].attrs['town'] = 'Innsbruck' >>> record['homeaddress'].attrs['country'] = 'Austria'
  • 14. Access record >>> from souper.soup import get_soup >>> soup = get_soup('mysoup', context) >>> record = soup.get(record_id)
  • 15. Query >>> from repoze.catalog.query import Eq, Contains >>> [r for r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))] [<Record object 'None' at ...>] or using CQE format >>> [r for r in soup.query("user == 'user1' and 'foo' in text")] [<Record object 'None' at ...>]
  • 16. souper ● a Soup-container can be moved to a specific ZODB mount- point, ● it can be shared across multiple independent Plone instances, ● souper works on Plone and Pyramid.
  • 17. Plomino & souper ● we use Plomino to build non-content oriented apps easily, ● we use souper to store huge amount of application data.
  • 18. Plomino data storage Originally, documents (=record) were ATFolder. Capacity about 30 000.
  • 19. Plomino data storage Since 1.14, documents are pure CMF. Capacity about 100 000. Usally the Plomino ZCatalog contains a lot of indexes.
  • 20. Plomino & souper With souper, documents are just soup records. Capacity: several millions.
  • 21. Typical use case ● Store 500 000 addresses, ● Be able to query them in full text and display the result on a map. Demo
  • 22. What is the limit? Can we import Wikipedia in souper? Demo with 400 000 records Demo with 5,5 millions of records
  • 23. Conclusion ● Usage performances are good, ● Plone performances are not impacted. Use it!
  • 24. Thoughts ● What about a REST API on top of it? ● Massive import is long and difficult, could it be improved?
  • 25. Makina Corpus For all questions related to this talk, please contact Éric Bréhault eric.brehault@makina-corpus.com Tel : +33 534 566 958 www.makina-corpus.com