Large Files
                         Without the Trials

                          Aaron VanDerlip and Sally Kleinfeldt
                             Plone Symposium East 2010




Thursday, June 3, 2010
Acknowledgments
                    • Bioneers provides environmental education
                         and social connectivity through
                         conferences, radio and TV, books, and online
                         materials
                    • Engaged Jazkarta to build a file asset server
                         based on Plone to help them organize,
                         capture, and store multimedia and textual
                         content with files as large as 5 GB.


Acknowledgments


                    • Aaron VanDerlip - Project Manager
                    • Kapil Thangavelu - Developer


What is a Big File?


                    • Anything that makes you wait...


Plone Problems with
                               Big Files

                    1. Uploading/Downloading
                    2. Versioning



Uploading Big Files




                    • Both the user and a Zope thread are
                         waiting for the file transfer
Uploading Big Files

                    • Browser encodes file in multipart MIME
                         format
                    • Zope must undo this encoding
                    • CPU and memory intensive, and SLOW
                    • Zope thread is blocked during this process
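
To make the cost of undoing the encoding concrete, here is a minimal sketch that builds a multipart body and then parses it back. It uses the Python standard library's email parser as a stand-in, not Zope's own request parser, and the names encode_upload, decode_upload, and BOUNDARY are hypothetical. The point it illustrates is that the whole body is held in memory and scanned for boundaries before the file is usable.

```python
from email import message_from_bytes

BOUNDARY = "----sketchboundary"  # hypothetical boundary string


def encode_upload(field, filename, payload):
    """Build a multipart/form-data body for a single file field."""
    head = (
        f"--{BOUNDARY}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{BOUNDARY}--\r\n".encode("ascii")
    return head + payload + tail


def decode_upload(body):
    """Undo the encoding: parse the entire body and return {filename: bytes}."""
    header = (
        f'Content-Type: multipart/form-data; boundary="{BOUNDARY}"\r\n\r\n'
    ).encode("ascii")
    message = message_from_bytes(header + body)
    files = {}
    for part in message.get_payload():
        files[part.get_filename()] = part.get_payload(decode=True)
    return files
```

With a multi-gigabyte payload, every byte passes through the application process this way, which is the work the rest of the talk moves into the web server.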

Downloading Big Files


                    • ...the same thing happens in reverse



Learning from Rails
                    • Move file encoding/decoding and read/write
                         operations out of Plone
                    • Web servers are really good at this -
                         Apache, Nginx, and Lighttpd
                    • Our implementation uses Apache
                    • Apache file streaming is fast and threads
                         are cheap


Learning from Rails

                    • Uploads: Apache plus mod_porter
                         http://therailsway.com/tags/porter
                    • Downloads: Apache plus mod_xsendfile
                         http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/
                    • ...and of course ZODB Blob storage

Mod Porter
                    • Parses the multipart MIME data
                    • Writes the file to disk
                    • Rewrites the request so that it contains a
                         pointer to the temp file on disk
                    • All done efficiently in C code inside your
                         Apache process


Mod Porter




Apache Config for
                                 Mod Porter
                         LoadModule apreq_module /usr/lib/Apache2/modules/mod_apreq2.so
                         LoadModule porter_module /usr/lib/Apache2/modules/mod_porter.so

                         # libapreq2 has a default read limit of 64 MB; raise it for large uploads
                         APREQ2_ReadLimit 2G

                         ...

                         Porter On
                         # Uploads below this size are not handled by mod_porter
                         PorterMinSize 14M
                         # Where the uploaded files are stored
                         PorterDir /mnt/uploads-Apache




X-Sendfile

                    • An HTTP response header
                    • Set an X-Sendfile header on your response,
                         with the path of the file as its value
                    • Apache does the rest


Apache Config for
                                  X-Sendfile
                         LoadModule xsendfile_module /usr/lib/Apache2/modules/mod_xsendfile.so

                         ...

                         EnableSendfile On
                         XSendFile on
                         # Config to send file resources directly from blob storage
                         XSendFilePath /mnt/bioneers/var/blobstorage




Using X-Sendfile
                              from Python
                         def download(self, response, file_path):
                             # Apache sees this header and streams the file itself
                             response.setHeader("X-Sendfile", file_path)
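
A slightly fuller sketch of the same idea follows. SketchResponse is a hypothetical stand-in that models only a setHeader method, and send_via_apache with its download_name parameter is an illustrative name, not part of ore.bigfile. It shows the usual companions to X-Sendfile: a content type, a download file name, and an empty body, since the web server supplies the content.

```python
import os


class SketchResponse:
    """Hypothetical stand-in for an HTTP response; only setHeader is modeled."""

    def __init__(self):
        self.headers = {}
        self.body = b""

    def setHeader(self, name, value):
        self.headers[name] = value


def send_via_apache(response, file_path, download_name,
                    content_type="application/octet-stream"):
    """Set headers and leave the body empty; the web server streams the file."""
    response.setHeader("X-Sendfile", os.path.abspath(file_path))
    response.setHeader("Content-Type", content_type)
    response.setHeader(
        "Content-Disposition", f'attachment; filename="{download_name}"'
    )
    return response
```

The application thread returns as soon as the headers are set, so it is not tied up for the duration of the transfer.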




Blob Storage
                    • Uploads
                         • Blob.consumeFile moves the file from
                           Apache’s temp area to blob storage
                           (ZODB/blob.py)
                         • Uses os.rename, so the file never enters Plone
                    • Downloads
                         • Served directly from blob storage
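
The upload half can be sketched with nothing but os.rename. This is not ZODB's code: consume_file and its parameters are hypothetical, and the sketch only illustrates why a rename is cheap, namely that the file's bytes are never read into Python.

```python
import os


def consume_file(temp_path, storage_dir, name):
    """Move an uploaded temp file into storage_dir without reading its contents."""
    os.makedirs(storage_dir, exist_ok=True)
    target = os.path.join(storage_dir, name)
    # os.rename only relinks the file when both paths are on the same
    # filesystem; across filesystems it raises OSError and a copy is needed.
    os.rename(temp_path, target)
    return target
```

For the rename to stay cheap, the web server's upload directory and the blob storage directory would need to live on the same filesystem.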
Upload Process




What About Really
                          Really Big Files?
                    • Use FTP
                    • Supports continuation and batching
                    • Handles files too large for browser limits
                    • Content editors use FTP to transfer files to
                         an upload directory
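
Once files arrive in the upload directory, something has to offer them for import. A minimal sketch of that listing step, assuming a flat directory, is below. list_pending_uploads is a hypothetical helper and not ore.bigfile's implementation.

```python
import os


def list_pending_uploads(upload_dir):
    """Return (name, size_in_bytes) for each regular file, sorted by name."""
    pending = []
    with os.scandir(upload_dir) as entries:
        for entry in entries:
            # Skip subdirectories and dot-files.
            if entry.is_file() and not entry.name.startswith("."):
                pending.append((entry.name, entry.stat().st_size))
    return sorted(pending)
```

Only names and sizes are read here; the file contents stay on disk until they are moved into blob storage.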



UI




Uploading with FTP




ore.bigfile
                    • Minimally intrusive, works with the grain of
                         Plone
                    • Provides Big File content type
                    • IFrontendFileServer interface defines two
                         methods that provide web server support
                         for upload and download
                    • Apache and Nginx implementations
                         provided
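
The shape of such a two-method contract can be sketched as follows. The slides do not name the interface's methods, so upload, the request key "upload_path", and the class names here are all assumptions; only the download(response, file_path) signature appears in the talk. Plain dicts stand in for the request and response, and abc stands in for zope.interface.

```python
from abc import ABC, abstractmethod


class FrontendFileServer(ABC):
    """Sketch of a two-method contract; method names are assumptions."""

    @abstractmethod
    def upload(self, request):
        """Return the path of the temp file the web server wrote."""

    @abstractmethod
    def download(self, response, file_path):
        """Ask the web server to stream file_path to the client."""


class ApacheSketch(FrontendFileServer):
    def upload(self, request):
        # Hypothetical key holding the temp file path set by the front end.
        return request["upload_path"]

    def download(self, response, file_path):
        response["X-Sendfile"] = file_path


class NginxSketch(FrontendFileServer):
    def upload(self, request):
        return request["upload_path"]

    def download(self, response, file_path):
        # Nginx uses X-Accel-Redirect, whose value is an internal URI
        # mapped to a location block rather than a raw filesystem path.
        response["X-Accel-Redirect"] = file_path
```

Keeping the web server behind one small interface is what lets the content type stay the same while the front end changes.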

ore.bigfile
                                 Limitations

                    • Upload directory is hardcoded
                    • Very large image uploads that Mod Porter
                         intercepts can cause errors




Versioning Big Files




Solution
                    • Bypass CMFEditions - no file size limitation
                    • Create a new version only when file
                         changes (not metadata)
                    • Allow old versions to be purged
                    • Version information stored on Big File
                         object using annotations
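
The versioning rules above can be sketched with a plain dict standing in for the object's annotations. Everything named here is hypothetical: the VERSIONS_KEY value, the record layout, and in particular the use of a content digest to detect a changed file, which the slides do not specify.

```python
import hashlib

VERSIONS_KEY = "sketch.bigfile.versions"  # hypothetical annotation key


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large file is never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(annotations, path):
    """Append a version only if the content differs from the latest one."""
    versions = annotations.setdefault(VERSIONS_KEY, [])
    digest = file_digest(path)
    if versions and versions[-1]["digest"] == digest:
        return False  # metadata-only edit: no new version
    versions.append({"digest": digest, "path": path})
    return True


def purge_old_versions(annotations, keep=1):
    """Drop all but the newest `keep` versions; return the removed records."""
    versions = annotations.get(VERSIONS_KEY, [])
    cut = max(len(versions) - keep, 0)
    removed = versions[:cut]
    del versions[:cut]
    return removed
```

A real implementation would also delete the purged files from storage; the sketch only returns their records.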


UI




Conclusion
                    • ore.bigfile solves the Big File problem for a
                         particular use case; it is not feature complete
                    • It does so by taking advantage of mature
                         web server technology
                    • The code is minimally intrusive
                    • It provides a strategy for implementation
                         we can learn from as we improve Plone’s
                         Big File story

http://svn.objectrealms.net/view/public/browser/ore.bigfile

                          Questions
