Large Files
                         Without the Trials

                          Aaron VanDerlip and Sally Kleinfeldt
                             Plone Symposium East 2010




Thursday, June 3, 2010
Acknowledgments
                    • Bioneers provides environmental education
                         and social connectivity through
                         conferences, radio and TV, books, and online
                         materials
                    • Engaged Jazkarta to build a file asset server
                         based on Plone to help them organize,
                         capture, and store multimedia and textual
                         content with files as large as 5 GB.


Acknowledgments


                    • Aaron VanDerlip - Project Manager
                    • Kapil Thangavelu - Developer


What is a Big File?


                    • Anything that makes you wait...


Plone Problems with
                               Big Files

                    1. Uploading/Downloading
                    2. Versioning



Uploading Big Files




                    • Both the user and a Zope thread are
                         waiting for the file transfer
Uploading Big Files

                    • Browser encodes file in multipart MIME
                         format
                    • Zope must undo this encoding
                    • CPU and memory intensive, and SLOW
                    • Zope thread is blocked during this process
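
To make the encoding step concrete, here is a minimal sketch of a multipart/form-data round trip using only the Python standard library. It is not how Zope parses requests; the boundary string, field name, and file name are made up for illustration. The point is that the server has to scan the whole body to find the boundaries before it can get at the file.

```python
from email.parser import BytesParser
from email.policy import HTTP

BOUNDARY = "example-boundary"  # hypothetical; browsers generate a random one


def encode_multipart(field, filename, payload):
    """Build a multipart/form-data body for a single file, as a browser would."""
    head = (
        f"--{BOUNDARY}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{BOUNDARY}--\r\n".encode("ascii")
    return head + payload + tail


def decode_multipart(body):
    """Undo the encoding; the parser reads the entire body to find the boundaries."""
    header = f"Content-Type: multipart/form-data; boundary={BOUNDARY}\r\n\r\n"
    message = BytesParser(policy=HTTP).parsebytes(header.encode("ascii") + body)
    part = next(message.iter_parts())
    return part.get_filename(), part.get_payload(decode=True)


payload = b"0123456789abcdef" * 1024          # 16 KiB stand-in for a big file
body = encode_multipart("file", "talk.mov", payload)
name, data = decode_multipart(body)
```

Here the whole file passes through application memory twice (once as `body`, once as `data`), which is the cost the slide describes.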

Downloading Big Files


                    • ...the same thing happens in reverse



Learning from Rails
                    • Get file encoding/decoding and read/write
                         operations out of Plone
                    • Web servers are really good at this -
                         Apache, Nginx, and Lighttpd
                    • Our implementation uses Apache
                    • Apache file streaming is fast and threads
                         are cheap


Learning from Rails

                    • Uploads: Apache plus mod_porter
                         http://therailsway.com/tags/porter
                    • Downloads: Apache plus mod_xsendfile
                         http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/
                    • ...and of course ZODB Blob storage

Mod Porter
                    • Parses the multipart MIME data
                    • Writes the file to disk
                    • Changes the Request to contain a pointer
                         to the temp file on disk
                    • All done efficiently in C code inside your
                         Apache process
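
The idea in the bullets above can be sketched in Python: stream the upload to a file on disk and hand the application a small pointer record instead of the bytes. This is a conceptual stand-in, not mod_porter's code (which is C inside Apache); the function name and the keys of the returned dict are hypothetical.

```python
import os
import tempfile


def spool_upload(filename, chunks, upload_dir):
    """Write an uploaded file to disk and return a pointer record for it.

    The dict keys are hypothetical; they only illustrate that the
    application receives a path, not the file contents.
    """
    fd, path = tempfile.mkstemp(dir=upload_dir)
    size = 0
    with os.fdopen(fd, "wb") as out:
        for chunk in chunks:              # one chunk at a time, never the whole file
            out.write(chunk)
            size += len(chunk)
    return {"filename": filename, "path": path, "size": size}


upload_dir = tempfile.mkdtemp()           # stands in for the PorterDir directory
pointer = spool_upload("talk.mov", [b"a" * 1024, b"b" * 1024], upload_dir)
```

The application thread only ever handles `pointer`, a few hundred bytes, regardless of how large the upload was.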


Apache Config for
                                 Mod Porter
                         LoadModule apreq_module /usr/lib/apache2/modules/mod_apreq2.so
                         LoadModule porter_module /usr/lib/apache2/modules/mod_porter.so

                         # libapreq2's default read limit is 64 MB; set it higher
                         APREQ2_ReadLimit 2G

                         ...

                         Porter On
                         # Files below this size will not be handled by mod_porter
                         PorterMinSize 14M
                         # Where the uploaded files are stored
                         PorterDir /mnt/uploads-Apache




X-Sendfile

                    • HTTP header
                    • Set an X-Sendfile header on your response,
                         with the path of a file as its value
                    • Apache does the rest


Apache Config for
                                  X-Sendfile
                         LoadModule xsendfile_module /usr/lib/apache2/modules/mod_xsendfile.so

                         ...

                         EnableSendfile On
                         XSendFile on
                         # Config to send file resources directly from blob storage
                         XSendFilePath /mnt/bioneers/var/blobstorage




Using X-Sendfile
                              from Python
                         def download(self, response, file_path):
                             # Only the header is set here; Apache streams the file itself
                             response.setHeader("X-Sendfile", file_path)
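
The same technique outside of Zope, as a minimal WSGI sketch. It assumes the application sits behind Apache with mod_xsendfile enabled; `make_download_app` and the example blob path are hypothetical names for illustration.

```python
def make_download_app(file_path):
    """Return a WSGI app that leaves the file transfer to the web server."""
    def app(environ, start_response):
        headers = [
            ("X-Sendfile", file_path),    # read by mod_xsendfile in front of the app
            ("Content-Type", "application/octet-stream"),
        ]
        start_response("200 OK", headers)
        return [b""]                       # empty body: the web server supplies the file
    return app


# Exercise the app without a server by capturing what it passes to start_response.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


app = make_download_app("/mnt/bioneers/var/blobstorage/example.blob")
body = b"".join(app({}, fake_start_response))
```

Without the module in front, the client would simply receive an empty response, so this only makes sense when the web server is configured to honor the header.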




Blob Storage
                    • Uploads
                         • Blob.consumeFile moves file from
                           Apache’s temp area to blob storage
                           (ZODB/blob.py)
                         • Uses os.rename, file never enters Plone
                    • Downloads
                         • Served directly from blob storage
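
Why os.rename matters can be shown with the standard library alone. The sketch below does not use ZODB; the two directories are temporary stand-ins for Apache's upload area and the blob directory. On a POSIX filesystem the file keeps its inode, which indicates the data was relinked rather than copied.

```python
import os
import tempfile

base = tempfile.mkdtemp()
upload_area = os.path.join(base, "uploads")      # stand-in for Apache's temp area
blob_area = os.path.join(base, "blobstorage")    # stand-in for the blob directory
os.makedirs(upload_area)
os.makedirs(blob_area)

source = os.path.join(upload_area, "upload.tmp")
with open(source, "wb") as f:
    f.write(b"x" * 4096)

inode_before = os.stat(source).st_ino
target = os.path.join(blob_area, "upload.blob")
os.rename(source, target)                         # directory entry changes, data stays put
same_inode = os.stat(target).st_ino == inode_before
```

This is only cheap when both directories are on the same filesystem; on POSIX systems os.rename raises OSError across filesystems, so the upload directory and blob storage are best kept on one volume.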
Upload Process




What About Really
                          Really Big Files?
                    • Use FTP
                    • Supports continuation and batching
                    • Handles files too large for browser limits
                    • Content editors use FTP to transfer files to
                         an upload directory
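
On the Plone side, the FTP workflow needs some way to see what has arrived in the upload directory. A minimal sketch, assuming a flat directory of regular files; `list_uploads` is a hypothetical helper, not part of ore.bigfile.

```python
import os
import tempfile


def list_uploads(upload_dir):
    """Return sorted (name, size_in_bytes) pairs for files in the upload directory."""
    entries = []
    with os.scandir(upload_dir) as it:
        for entry in it:
            # Skip subdirectories and dotfiles (some FTP clients leave these behind).
            if entry.is_file() and not entry.name.startswith("."):
                entries.append((entry.name, entry.stat().st_size))
    return sorted(entries)


# Small demonstration with a temporary directory standing in for the FTP target.
demo_dir = tempfile.mkdtemp()
for name, size in [("keynote.mov", 2048), ("panel.mp3", 512), (".partial", 10)]:
    with open(os.path.join(demo_dir, name), "wb") as f:
        f.write(b"\0" * size)

uploads = list_uploads(demo_dir)
```

A real importer would also need to tell finished transfers from ones still in progress; that is left out here.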



UI




Uploading with FTP




ore.bigfile
                    • Minimally intrusive, works with the grain of
                         Plone
                    • Provides Big File content type
                    • IFrontendFileServer interface defines two
                         methods that provide web server support
                         for upload and download
                    • Apache and Nginx implementations
                         provided
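
The shape of such an interface might look like the sketch below. It is not the actual IFrontendFileServer: the method names, the `path` form field, and the constructor arguments are assumptions, and `abc` is used instead of zope.interface to keep the example free of dependencies. X-Sendfile and X-Accel-Redirect are the real response headers used by mod_xsendfile and Nginx respectively.

```python
import posixpath
from abc import ABC, abstractmethod


class FrontendFileServer(ABC):
    """Hypothetical stand-in: one method for uploads, one for downloads."""

    @abstractmethod
    def uploaded_file_path(self, form):
        """Return the on-disk path of the file the web server already saved."""

    @abstractmethod
    def download_headers(self, file_path):
        """Return the response headers that make the web server send the file."""


class ApacheFileServer(FrontendFileServer):
    def uploaded_file_path(self, form):
        return form["path"]                      # hypothetical form field name

    def download_headers(self, file_path):
        return {"X-Sendfile": file_path}


class NginxFileServer(FrontendFileServer):
    def __init__(self, blob_root, internal_prefix):
        self.blob_root = blob_root               # directory holding the files
        self.internal_prefix = internal_prefix   # Nginx "internal" location mapped to it

    def uploaded_file_path(self, form):
        return form["path"]                      # hypothetical form field name

    def download_headers(self, file_path):
        relative = posixpath.relpath(file_path, self.blob_root)
        return {"X-Accel-Redirect": posixpath.join(self.internal_prefix, relative)}


apache = ApacheFileServer()
nginx = NginxFileServer("/mnt/blobs", "/protected")
apache_headers = apache.download_headers("/mnt/blobs/a/b.blob")
nginx_headers = nginx.download_headers("/mnt/blobs/a/b.blob")
```

The content type only talks to the interface, so switching web servers is a configuration choice rather than a code change.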

ore.bigfile
                                 Limitations

                    • Upload directory is hardcoded
                    • Possible errors when Mod Porter intercepts
                         very large image uploads




Versioning Big Files




Solution
                    • Bypass CMFEditions - no file size limitation
                    • Create a new version only when file
                         changes (not metadata)
                    • Allow old versions to be purged
                    • Version information stored on Big File
                         object using annotations
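
A rough sketch of this versioning policy, with a plain dict standing in for the object's annotations. The annotation key, function names, and record layout are all hypothetical; ore.bigfile's actual storage format is not shown in these slides.

```python
VERSIONS_KEY = "example.bigfile.versions"   # hypothetical annotation key


def record_edit(annotations, new_blob_path=None):
    """Add a version only when the edit replaced the file itself."""
    versions = annotations.setdefault(VERSIONS_KEY, [])
    if new_blob_path is None:               # metadata-only edit: nothing to version
        return False
    number = versions[-1]["number"] + 1 if versions else 1
    versions.append({"number": number, "blob_path": new_blob_path})
    return True


def purge_old_versions(annotations, keep=1):
    """Drop all but the newest `keep` versions and return the removed records."""
    versions = annotations.get(VERSIONS_KEY, [])
    cut = max(len(versions) - keep, 0)
    removed = versions[:cut]
    del versions[:cut]
    return removed


annotations = {}                             # stands in for the Big File's annotations
record_edit(annotations, "/blobs/v1")        # file uploaded
record_edit(annotations)                     # title changed only
record_edit(annotations, "/blobs/v2")        # file replaced
removed = purge_old_versions(annotations, keep=1)
```

Because a version record holds only a path, keeping or purging a version never requires reading multi-gigabyte file contents.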


UI




Conclusion
                    • ore.bigfile solves the Big File problem for a
                         particular use case; it is not feature complete
                    • It does so by taking advantage of mature
                         web server technology
                    • The code is minimally intrusive
                    • It provides a strategy for implementation
                         we can learn from as we improve Plone’s
                         Big File story

http://svn.objectrealms.net/view/public/browser/ore.bigfile

                          Questions

