Ways to generate PDF from Python Web applications, Gaël Le Mignot

Pôle Systematic Paris-Region
Generating PDF from Python web applications
Gaël LE MIGNOT
Pilot Systems
June 6, 2017
Gaël LE MIGNOT Pilot Systems Generating PDF from Python web applications
Summary
1 Introduction
2 Tools
3 Tips, tricks and pitfalls
4 Conclusion
Introduction
Pilot Systems
Free Software service provider
Python Web application development and hosting
Using Zope/Plone (since 2000) and Django (since 0.96)
All kinds of customers (public/private, small/big, ...)
Generating PDFs
Very frequently asked
Different purposes require different tools
Several pitfalls to avoid
Weasyprint - presentation
What is weasyprint?
Free Software Python library
Converts HTML5 pages (using a print CSS) into PDF
Also available as a command-line tool
When to use it?
To convert an existing HTML document
Consistency: same templating engine, same language
For simple page layouts
Weasyprint - code details
Simple usage
from weasyprint import HTML, CSS
html = template()
data = HTML(string=html).write_pdf()
Some mangling with BeautifulSoup
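from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Stylesheets that must not reach the PDF renderer
bl = ('typography.com', 'logged-in.css')
for css in soup.find_all("link"):
    # .get() avoids a KeyError on <link> tags without an href
    href = css.get('href', '')
    if any(cssname in href for cssname in bl):
        css.extract()
html = str(soup)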
Weasyprint - code details
Add some page header/footer
@page {
    margin: 3cm 2cm;
    @bottom-right {
        content: "Page " counter(page);
    }
    @top-center {
        content: "Pilot Systems";
    }
}
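As a minimal sketch of how the print stylesheet above might be applied from Python: WeasyPrint's write_pdf() accepts a stylesheets list of CSS objects. The names PRINT_CSS and render_pdf are illustrative and not from the talk; the import is deferred so the module still loads where WeasyPrint is not installed.

```python
# Print stylesheet from the slide, kept as a Python string.
PRINT_CSS = """
@page {
    margin: 3cm 2cm;
    @bottom-right { content: "Page " counter(page); }
    @top-center { content: "Pilot Systems"; }
}
"""


def render_pdf(html):
    """Return the PDF bytes for an HTML string (hypothetical helper)."""
    # Deferred import: WeasyPrint depends on native libraries.
    from weasyprint import HTML, CSS
    return HTML(string=html).write_pdf(stylesheets=[CSS(string=PRINT_CSS)])
```

Keeping the page rules in a separate stylesheet means the HTML template itself can stay identical to the one served on screen.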
Reportlab - presentation
What is reportlab?
Python library for generating PDF and graphs
Powerful RML templating language
Template and story concepts
Versions and tools
Complicated licensing
Reportlab PDF toolkit: limited Free Software version
Reportlab PLUS: non-free complete version
trml2pdf: free software, third-party implementation of RML
RMLPageTemplate: Zope integration of trml2pdf
Reportlab - code example
Bright warning
<template>
  <fill color="yellow"/>
  <rect x="115mm" y="217mm"
        width="90mm" height="18mm"
        fill="yes" stroke="yes"/>
  <frame id="warning" x1="115mm" y1="213mm"
         width="90mm" height="24mm" />
</template>
<story>
  <para>
    TEMPORARY DOCUMENT - DO NOT PRINT
  </para>
</story>
pdftk
What is pdftk?
Toolbox to manipulate PDF
Performs operations such as extracting pages and concatenating files
Can also stamp a PDF on top of another
Command-line tool, so use subprocess
Use-case
Afdas - collects taxes and finances training
Companies make a yearly declaration
Take a background and fill cells
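Since pdftk is a command-line tool, calling it from Python mostly means building an argument list. The sketch below covers the two operations mentioned above (concatenate and stamp). The helper names are hypothetical, and pdftk is assumed to be on PATH.

```python
import subprocess

PDFTK = "pdftk"  # assumption: pdftk is installed and on PATH


def concat_cmd(inputs, output):
    """Command line that concatenates several PDFs into one."""
    return [PDFTK] + list(inputs) + ["cat", "output", output]


def stamp_cmd(base, overlay, output):
    """Command line that stamps the first page of `overlay` on top of
    every page of `base`."""
    return [PDFTK, base, "stamp", overlay, "output", output]


def run(cmd):
    """Run pdftk without a shell; raises CalledProcessError on failure."""
    subprocess.check_call(cmd)
```

For the declaration use-case, something like run(stamp_cmd("form.pdf", "cells.pdf", "out.pdf")) would lay the generated cell contents over the blank form. Passing a list rather than a shell string avoids quoting problems with user-supplied file names.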
LaTeX
What is LaTeX?
Very powerful document composition system
Used for scientific publishing, among others
Used for these slides, too
How to use it?
Generate a .tex file
Can use a template, or intermediate language (like rst)
Then execute pdflatex
When to use it?
Rich formatting
Table of contents, index, glossary, ...
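The "generate a .tex file, then execute pdflatex" steps can be sketched with the standard library alone. Everything named here (TexTemplate, render_tex, build_pdf, the template text) is an illustrative assumption, not the author's code. The point worth showing is that user-supplied text must be escaped before it reaches LaTeX.

```python
import os
import re
import subprocess
import tempfile
from string import Template

# Characters with a special meaning in LaTeX and their escaped forms.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}
_LATEX_RE = re.compile(r"[\\&%$#_{}~^]")


class TexTemplate(Template):
    # '@' as placeholder delimiter, because '$' is itself special in LaTeX.
    delimiter = "@"


TEMPLATE = TexTemplate(r"""\documentclass{article}
\begin{document}
\tableofcontents
\section{@title}
@body
\end{document}
""")


def latex_escape(text):
    """Escape user-supplied text in a single pass."""
    return _LATEX_RE.sub(lambda m: _LATEX_ESCAPES[m.group()], text)


def render_tex(title, body):
    """Fill the template with escaped values and return the .tex source."""
    return TEMPLATE.substitute(title=latex_escape(title),
                               body=latex_escape(body))


def build_pdf(tex_source):
    """Run pdflatex in a temporary directory and return the PDF bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "doc.tex"), "w", encoding="utf-8") as f:
            f.write(tex_source)
        # Two passes so the table of contents is resolved.
        for _ in range(2):
            subprocess.check_call(
                ["pdflatex", "-interaction=nonstopmode", "doc.tex"], cwd=tmp)
        with open(os.path.join(tmp, "doc.pdf"), "rb") as f:
            return f.read()
```

The escaping is done in one regular-expression pass so that the braces introduced by a replacement are not escaped a second time.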
Other tools
For the brave
Client-side rendering with JS libraries
Using LibreOffice with pyuno
Generate QR-code/datamatrix with elaphe
HTTP Headers
Don’t forget HTTP headers
Specify the content-type
Hint between displaying and downloading
Provide default filename
Code example
response.setHeader('Content-Type', 'application/pdf')
cd = 'attachment; filename="%s"' % filename
response.setHeader('Content-Disposition', cd)
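A framework-neutral sketch of the same headers, covering both the display/download hint and file names that contain quotes or non-ASCII characters. The function name pdf_headers and the "document.pdf" fallback are assumptions made for illustration; the filename* form follows RFC 6266.

```python
from urllib.parse import quote


def pdf_headers(filename, download=True):
    """Return (name, value) header pairs for a PDF response."""
    # 'attachment' asks the browser to download, 'inline' to display.
    disposition = "attachment" if download else "inline"
    # ASCII fallback: drop quotes, backslashes, control and non-ASCII chars.
    fallback = "".join(
        c for c in filename if 32 <= ord(c) < 127 and c not in '"\\'
    ) or "document.pdf"
    value = '%s; filename="%s"' % (disposition, fallback)
    if fallback != filename:
        # Extended parameter carrying the full UTF-8 name, percent-encoded.
        value += "; filename*=UTF-8''%s" % quote(filename, safe="")
    return [("Content-Type", "application/pdf"),
            ("Content-Disposition", value)]
```

Dropping control characters from the fallback also keeps a user-chosen file name from injecting extra header lines.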
Handling long generation times
The problem
Generate a PDF report of 500 pages
It takes 10 minutes
Timeout or users get angry
Solutions
Increase timeouts, inform users
Use fork or threads to generate asynchronously
Use a scheduler like Celery
Send the result by email, with a link
Cleaning
find /path/to/pdfs -mtime +14 -delete
Careful with search engines
Typical situation
Public website (using a CMS)
Button on each page to get a PDF version
A crawler comes... and boom.
Don’t panic
Use a robots.txt file, but its effect is limited
Have the button do a POST
Use a load balancer like haproxy and pin PDF requests
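A minimal WSGI sketch of the "have the button do a POST" advice: crawlers follow links with GET requests, so refusing anything but POST keeps them from triggering a generation. pdf_view and make_pdf are hypothetical names, and make_pdf is only a placeholder for the real generator.

```python
def make_pdf(environ):
    """Placeholder for the real (expensive) PDF generation."""
    return b"%PDF-1.4\n"


def pdf_view(environ, start_response):
    """WSGI application that only generates the PDF on POST."""
    if environ.get("REQUEST_METHOD") != "POST":
        # GET and HEAD requests, typical of crawlers, stop here cheaply.
        start_response("405 Method Not Allowed",
                       [("Allow", "POST"), ("Content-Type", "text/plain")])
        return [b"Use the button on the page to request the PDF.\n"]
    body = make_pdf(environ)
    start_response("200 OK", [("Content-Type", "application/pdf"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

On the page itself, the button would then be a small form with method="post" rather than a plain link.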
CPU and RAM usage
PDF generation is expensive
PDF generation can be heavy both in CPU and RAM
Always estimate your volume before deploying
Task schedulers (like Celery) are a great help
Be nice!
#!/bin/sh
PDFTK=/usr/bin/pdftk
exec nice -n 10 taskset -c 0 "$PDFTK" "$@"
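The same wrapper idea can be sketched in Python when the external tool is launched through subprocess anyway. The helper names are hypothetical; nice and taskset are assumed to be installed, and taskset is Linux-specific.

```python
import subprocess


def wrap_low_priority(cmd, niceness=10, cpus=None):
    """Prefix cmd with nice (and optionally taskset), like the shell wrapper."""
    wrapped = ["nice", "-n", str(niceness)]
    if cpus is not None:
        # taskset -c takes a CPU list such as "0" or "0,2-3".
        wrapped += ["taskset", "-c", cpus]
    return wrapped + list(cmd)


def run_low_priority(cmd, **kwargs):
    """Run cmd at reduced priority and return its standard output."""
    return subprocess.check_output(wrap_low_priority(cmd, **kwargs))
```

Only the child process is deprioritised this way, so the web process keeps its normal scheduling priority.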
Accessing external resources
The problem
Restricted access CSS and images
Common with weasyprint, but can also happen with other tools
Solutions
Reuse the user’s cookies in the sub-requests
Extract the resources to a temporary directory
Allow unprotected access from localhost (dangerous)
Accessing external resources
Cookie code example
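import urllib.request  # urllib2 on Python 2
# Rebuild a Cookie header from the cookies of the incoming request
cookies = ['%s=%s' % (k, v) for k, v in request.cookies.items()]
cookiestr = "; ".join(cookies)
# Strip line breaks so a cookie value cannot inject extra headers
cookiestr = cookiestr.replace('\r', '').replace('\n', '')
opener = urllib.request.build_opener()
opener.addheaders.append(('Cookie', cookiestr))
html = opener.open(resource_url).read()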
Encrypted PDFs
Typical use-case
Users submit a form with text fields and PDF attachments
At the end the answers are concatenated into a PDF
Or even all the answers of all users!
Use weasyprint + pdftk or LaTeX
What happens
It works most of the time
But on some PDFs it breaks in odd ways
The culprit: DRM (Digital Restrictions Management)
Encrypted PDFs
What to do?
Ensure the PDF is not DRM-protected
Use pdfinfo from poppler
Code example
import re
import subprocess
out = subprocess.check_output(['pdfinfo', pdffile])
# check_output returns bytes on Python 3, hence the bytes pattern
if re.search(br'Encrypted:.*yes', out):
    raise ValueError("DRM protected")
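A slightly more defensive variant of the check above, as a sketch: parse pdfinfo's "Key: value" lines and look only at the Encrypted field, so that a title or subject containing the word cannot trigger a false positive. The function names are hypothetical.

```python
import subprocess


def parse_pdfinfo(text):
    """Turn pdfinfo's 'Key:   value' lines into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_encrypted(text):
    """True if the Encrypted field of pdfinfo's output starts with 'yes'."""
    return parse_pdfinfo(text).get("Encrypted", "no").startswith("yes")


def check_pdf(path):
    """Raise ValueError if pdfinfo reports the file as encrypted."""
    out = subprocess.check_output(["pdfinfo", path])
    if is_encrypted(out.decode("utf-8", "replace")):
        raise ValueError("DRM protected: %s" % path)
```

Running check_pdf() on each attachment at upload time lets the form reject a protected file immediately, instead of failing later during concatenation.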
Conclusion
Thanks for listening!
Any questions?