Zentech Innovations Pvt. Ltd.
Hyderabad
Telangana
India
© 2010 Zentech Innovations Pvt. Ltd.
Page 1
Perl and PDF
Prabhakar Somu
psomu@yahoo.com
Zentech Innovations Pvt. Ltd.
Hyderabad, India
September 2, 2015
Page 2
What we do
Transactional Communications
• Our business is primarily involved in
creation, printing, dispatching,
emailing and web-presenting
transactional/financial documents.
• Large volume PDF document
production
• Variable Data, Statement
composition
Page 3
A brief history of PDF (Portable
Document Format)
• Created by Adobe in 1993
• Compact, device independent, cross
platform
• Based on a subset of the PostScript page
description language
• Font embedding/replacement/sub-setting
• Compression and structured (reusable)
component storage
Page 4
A brief history of PDF contd…
• Versions:
– PDF 1.0 (Acrobat 1.0) – 1993
– PDF 1.1 (Acrobat 2.0) – 1994
– PDF 1.2 (Acrobat 3.0) – 1996
– PDF 1.3 (Acrobat 4.0) – 1999
– PDF 1.4 (Acrobat 5.0) – 2001
– PDF 1.5 (Acrobat 6.0) – 2003
– PDF 1.6 (Acrobat 7.0) – 2005
– PDF 1.7 (Acrobat 8.0) – 2006
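Every PDF file declares its version in its first line, for example %PDF-1.4. A small sketch in plain Perl of reading that header follows. The sub names are hypothetical, and from PDF 1.4 onward a /Version entry in the document catalog may override the header, which this sketch ignores.

```perl
use strict;
use warnings;

# Return the version declared in a PDF header such as "%PDF-1.4",
# or undef if the data does not start with a PDF header.
sub pdf_header_version {
    my ($data) = @_;
    return $data =~ /\A%PDF-(\d+\.\d+)/ ? $1 : undef;
}

# Read only the first few bytes of a file; the header sits at the very start.
sub pdf_file_version {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    read $fh, my $head, 16;
    close $fh;
    return pdf_header_version($head // '');
}
```

Calling pdf_file_version on a file written with compatibility 1.7 would be expected to return the string 1.7.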
Page 5
PostScript and PDF
• PostScript is a page description language
and a programming language
• Has to be interpreted and imaged (ripped)
on a device
• Device specific
• PDF contains a subset of PostScript
elements without any control flow
• Self-Contained
Page 6
PostScript and PDF
• A page in PostScript has to be ‘ripped’ and
imaged before a subsequent page can be
imaged (graphics state needs to be
maintained)
• Any page of a PDF file can be displayed
without needing to display earlier pages
• Device independent
• Self contained, fonts embedded, identical
rendering on all platforms
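The random access described above comes from the cross-reference table at the end of a PDF file, which records the byte offset of every object. The following is a minimal sketch in plain Perl, not taken from the presentation: it hand-writes a blank two-page file and returns the offsets, so a reader could seek straight to the second page object. The sub name build_minimal_pdf and the page size are assumptions for illustration, and the output has not been checked against a validator.

```perl
use strict;
use warnings;

# Build a tiny two-page PDF by hand; return the bytes and the object offsets.
sub build_minimal_pdf {
    my @objects = (
        '<< /Type /Catalog /Pages 2 0 R >>',
        '<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>',
        '<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] >>',
        '<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] >>',
    );

    my $pdf = "%PDF-1.4\n";
    my @offsets;
    for my $i (0 .. $#objects) {
        push @offsets, length $pdf;    # byte offset of object number $i + 1
        $pdf .= sprintf "%d 0 obj\n%s\nendobj\n", $i + 1, $objects[$i];
    }

    # Cross-reference table: one 20-byte entry per object, plus free entry 0.
    my $xref_pos = length $pdf;
    my $count    = @objects + 1;
    $pdf .= "xref\n0 $count\n";
    $pdf .= "0000000000 65535 f \n";
    $pdf .= sprintf "%010d 00000 n \n", $_ for @offsets;
    $pdf .= "trailer\n<< /Size $count /Root 1 0 R >>\n";
    $pdf .= "startxref\n$xref_pos\n%%EOF\n";
    return ($pdf, \@offsets);
}
```

A viewer reads the number after startxref, jumps to the table, looks up the object for the page it wants and seeks to that offset, without parsing any earlier page.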
Page 7
PostScript and PDF
• Ideal in a Web environment – as any page
can be displayed at any time
• PDF files can be streamed
• Compact, reusable components within a
PDF file
• Identical rendering across devices
• No program interpretation required (PDF has no control flow)
Page 8
Perl and PDF
Several modules on CPAN
• PDF::Create
• CAM::PDF
• PDF::API2
• PDF::API3
• PDF::Extract
• PDF::Xtract
• PDF::GetImages
• PDF::Template
• PDF::Reuse
• PDF::ReportWriter
• PDF::Table
• PDF::Parse
• PDF::Report
Page 9
Perl and PDF (contd.)
• PDF::Core
• PDF::OCR2
• Fuse::PDF
• PDF::Burst
• PDF::Haru
• PDF::Imposition
• PDF::EasyPDF
• Image::Magick::Thumbnail::PDF
• PDF::Labels
• PDF::Tk
• PDF::Reuse::Barcode
• deletepdfpage.pl
• PDF::Boxer
• PDFlib
Page 10
Special Mention
• PDF::API2
• PDF::API3
• CAM::PDF
• PDF::Haru
• Quite extensive
Page 11
Approximate categorization of Perl
PDF Modules
• Creation
• Repurposing
• Extraction
• Miscellaneous
Page 12
• Creation: PDF::Create, PDF::Table, PDF::Report, PDF::Template, PDF::ReportWriter, PDF::Haru, PDF::EasyPDF, PDF::Labels, PDF::Boxer
• Repurposing: PDF::Reuse, PDF::Burst, Image::Magick::Thumbnail::PDF, PDF::Tk
• Content Extraction: PDF::Extract, PDF::Xtract, PDF::GetImages, PDF::OCR, PDF::OCR2
• Miscellaneous: PDF::Parse, PDF::Core, Fuse::PDF
• Other modules on the slide: PDF::API2, PDF::API3, CAM::PDF, pdflib, PDF::Reuse::Barcode
Page 13
Approximate categorization of Perl
PDF Modules - Creation
• PDF::Create
• PDF::Table
• PDF::Report
• PDF::Template
• PDF::ReportWriter
• PDF::Haru
• PDF::EasyPDF
• PDF::Labels
• PDF::Boxer
Page 14
Approximate categorization of Perl
PDF Modules - Repurposing
• PDF::Reuse
• PDF::Burst
• PDF::Imposition
• Image::Magick::Thumbnail::PDF
• PDF::Tk
Page 15
Approximate categorization of Perl
PDF Modules – Content Extraction
• PDF::Extract
• PDF::Xtract
• PDF::OCR
• PDF::OCR2
• PDF::GetImages
Page 16
Approximate categorization of Perl
PDF Modules - Miscellaneous
• PDF::Parse
• PDF::Core
• Fuse::PDF
Page 17
Approximate categorization of Perl
PDF Modules – General Purpose
• PDF::API2
• PDF::API3
• CAM::PDF
• pdflib
Page 18
We will use PDFlib to demonstrate
the various aspects
• From pdflib.com
• Commercial as well as open source
• Comprehensive, cross-platform, wide support for
versions, image formats, color spaces, text
rendering, graphics etc.
• PDFlib (for creating PDF files)
• PDI (for repurposing existing PDF files)
• TET (for extracting text)
• pCOS (for accessing non-page data)
• PLOP (for linearizing PDFs)
Page 19
Creating PDF files - PDFlib
use PDFlib::PDFlib 8.0;
my $p = PDFlib::PDFlib->new;
$p->set_parameter("compatibility", "1.7");
$p->set_parameter("license", "XYX");   # placeholder license key
$p->begin_document("output.pdf", "optimize");
# begin_page_ext takes width, height, then the option list
$p->begin_page_ext(0, 0, "width=a4.width height=a4.height");
# Create content here
$p->end_page_ext("");
$p->end_document("");
Page 20
Creating linearized PDF files - PDFlib
use PDFlib::PDFlib 8.0;
my $p = PDFlib::PDFlib->new;
$p->set_parameter("compatibility", "1.7");
$p->set_parameter("license", "XYX");   # placeholder license key
$p->begin_document("output.pdf", "optimize linearize");
# begin_page_ext takes width, height, then the option list
$p->begin_page_ext(0, 0, "width=a4.width height=a4.height");
# Create content here
$p->end_page_ext("");
$p->end_document("");
Page 21
Creating PDF files - PDFLib
Several Options when creating a document and page:
Document Options
• password, document open actions, openmode (bookmarks,
thumbnails etc.), optimize,
• permissions (noprint, nomodify etc.)
Page Options
• Specify artbox, cropbox etc.
• Width and height
• XMP Metadata
• and many more
Page 22
Fonts and PDF
# load_font takes font name, encoding, option list
my $font_handle1 = $p->load_font("Courier", "unicode", "");
my $font_handle2 = $p->load_font("Calibri", "unicode", "");
• Searches for fonts in the resource path
(set_parameter)
• Unicode fonts
Page 23
Laying out Text
$p->setfont($font_handle1, 32);
$p->fit_textline("Your Text Here", $xpos, $ypos, "");
$p->fit_textline("More text", $xpos2, $ypos2, "");
Page 24
Placing Images
# "options" stands for an option list in each call
my $image_handle = $p->load_image("auto", $image_file, "options");
$p->fit_image($image_handle, $x, $y, "options");
• Many image formats such as JPG, TIF, BMP are
automatically identified and loaded
• Multi-Page TIF files are handled as well
• Black and White, Grayscale, Color images handled
• Color profiles and many other parameters can be
set
• Several positioning and fitting options
Page 25
Barcodes in PDF files
• A number of barcode types are possible
• Two methods of placing barcodes:
– Font Based
– Image Based
• In Font based method, load a barcode font (like a
QRCode font) and place text in that font
• In the image based method, load an image (of the
barcode) and place the image on a page.
PDF-SamplesPunjabi.pdf
PDF-SamplesAssame.pdf
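For the font-based method, the text usually has to be prepared for the symbology before it is set in the barcode font. Here is a minimal sketch in plain Perl, assuming a Code 39 font (the slide does not name one); Code 39 encodes A-Z, 0-9, space and - . $ / + %, and uses * as its start and stop character. The sub name code39_text is hypothetical and no check digit is computed.

```perl
use strict;
use warnings;

# Prepare a string for a Code 39 barcode font (an assumed symbology):
# upper-case it, reject characters Code 39 cannot encode, and wrap it
# in the "*" start/stop characters.
sub code39_text {
    my ($data) = @_;
    $data = uc $data;
    die "Character not encodable in Code 39: $1\n"
        if $data =~ m{([^A-Z0-9 \-.\$/+%])};
    return "*$data*";
}

print code39_text('inv-1001'), "\n";   # prints *INV-1001*
```

The returned string is what would then be placed on the page with the barcode font selected.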
Page 26
Metadata in PDF files (XMP
Metadata)
• Concept of non-printable metadata in PDF
files
• Some information such as Author, Date of
Creation, Key Words can be placed using the
set_info call
• More extensive arbitrary data can be
injected using the ‘XMP Metadata’ channel
• Possible in TIFF, JPEG files as well
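Dates stored in the document information, such as the creation date, use the PDF date string format (D: followed by year, month, day, hour, minute, second and a time zone). A minimal sketch of producing one in plain Perl follows, assuming UTC; the sub name pdf_date is hypothetical and not part of any library.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time as a PDF date string in UTC ("Z" marks UTC).
sub pdf_date {
    my ($epoch) = @_;
    return strftime('D:%Y%m%d%H%M%SZ', gmtime $epoch);
}

print pdf_date(0), "\n";   # prints D:19700101000000Z
```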
Page 27
Metadata in PDF files (XMP
Metadata)
• XMP data can be placed at the document,
page or image level
PDF-Samplessimple.txt
$p->begin_document($output_pdf, "metadata={filename={simple.txt}}");
Page 28
Named Destinations – Navigating to
specific pages in a PDF
$p->begin_page_ext(0, 0, "width=a4.width height=a4.height");
$p->add_nameddest("Page1", "options");   # "options" stands for an option list
$p->end_page_ext("");
file:///C:/Output.pdf#nameddest=Page1
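Links like the one above can be generated rather than typed. Below is a small sketch in plain Perl that appends open parameters to a PDF URL. The nameddest parameter comes from the slide; the sub name pdf_open_url is hypothetical, and values are not URL-escaped here.

```perl
use strict;
use warnings;

# Append open parameters (for example nameddest) to a PDF URL as a
# fragment. Several parameters are joined with "&".
sub pdf_open_url {
    my ($base_url, %param) = @_;
    my @parts = map { "$_=$param{$_}" } sort keys %param;
    return @parts ? $base_url . '#' . join('&', @parts) : $base_url;
}

print pdf_open_url('file:///C:/Output.pdf', nameddest => 'Page1'), "\n";
# prints file:///C:/Output.pdf#nameddest=Page1
```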
Page 29
JavaScript in PDF files
• JavaScript can be embedded in PDF files
• Actions can be tied to JavaScript code
• For example, when a page is displayed
(opened) – execute a function
PDF-Samplesbarcode_field.pdf
PDF-Samplesbarcode_field.pl
Page 30
PDF/A
• Special version of PDF meant for long term
archival and retrieval
• Many interactive and dynamic elements not allowed
• No JavaScript, audio/video, encryption etc.
• An ISO standard (ISO 19005), supported by Adobe
• Applications in Library archival systems,
legal document archival and retrieval etc.
where long term compatibility of documents
is crucial
• PDFLib can create such documents
Page 31
Types of Page Boundaries
http://www.prepressure.com/pdf/basics/page-boxes
• Media Box – Specifies the width and height
of the media (paper size)
• Crop Box – Area to which page contents are
clipped (for display)
• Trim Box – Intended dimensions of the
finished page (by default = Crop Box)
• All of these can be specified in PDFlib as
page options (begin_page_ext)
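Page boxes are rectangles given in points (1/72 inch), measured from the lower left corner. Below is a small sketch in plain Perl of deriving a Trim Box from a Media Box. The job itself (A4 finished size with 3 mm of bleed) and the helper names are assumptions for illustration only.

```perl
use strict;
use warnings;

use constant PT_PER_MM => 72 / 25.4;   # 72 points per inch, 25.4 mm per inch

sub mm_to_pt { return $_[0] * PT_PER_MM }

# Inset a box [llx, lly, urx, ury] by the same margin on all four sides.
sub inset_box {
    my ($box, $margin) = @_;
    my ($llx, $lly, $urx, $ury) = @$box;
    return [ $llx + $margin, $lly + $margin, $urx - $margin, $ury - $margin ];
}

# Hypothetical job: A4 (210 x 297 mm) finished size, 3 mm bleed all round.
my $bleed     = mm_to_pt(3);
my $media_box = [ 0, 0, mm_to_pt(210) + 2 * $bleed, mm_to_pt(297) + 2 * $bleed ];
my $trim_box  = inset_box($media_box, $bleed);

printf "MediaBox: %.2f %.2f %.2f %.2f\n", @$media_box;
printf "TrimBox:  %.2f %.2f %.2f %.2f\n", @$trim_box;
```

The four numbers of each box are what a page option such as a box specification would carry.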
Page 32
Bookmarks, Thumbnails
• Bookmarks and thumbnails can be created using
PDFlib
$p->begin_page_ext(0, 0, "width=a4.width height=a4.height");
$p->create_bookmark("Bookmark Display Name", "destination={type=fitwindow}");
$p->add_thumbnail($thumbnail_image_handle);
$p->end_page_ext("");
Page 33
Controlling how a PDF file is first
displayed in Adobe (Acrobat or
Viewer)
$p->begin_document("output.pdf", "viewerpreferences=centerwindow");
$p->begin_document("output.pdf", "viewerpreferences=duplex");
Page 34
Dealing with encrypted PDF files
• Specify password in begin_document
Page 35
Disabling ability to Print,
Cut/Copy/Save etc.
# permissions take effect only together with a masterpassword option
$p->begin_document("output.pdf", "permissions={noprint nomodify}");
Page 36
Repurposing existing PDF files using
PDI
• An existing PDF file can be read in and its
content placed as is in an output file
$p->begin_document($output_file, "");
my $input_doc_handle = $p->open_pdi_document($input_file, "");
$p->begin_page_ext($width, $height, "");
my $page_handle = $p->open_pdi_page($input_doc_handle, $page_no, "");
# $boxsize holds the fitting option list (for example boxsize and fitmethod)
$p->fit_pdi_page($page_handle, 0, 0, $boxsize);
$p->close_pdi_page($page_handle);
$p->end_page_ext("");
$p->close_pdi_document($input_doc_handle);
$p->end_document("");
Page 37
PDF Files with embedded images
• Each page can have an image and nothing
else
• Scanned images are typically combined into
such PDF files
• Many options and possibilities to compress
such images in a PDF file (from Adobe
Acrobat as well as PDFLib)
• Text and other content can be overlaid on
such files as well using PDFLib’s graphic
operators
Page 38
Converting PDF pages into other
formats
• ImageMagick and PerlMagick
• Convert individual pages to images
• PDF::GetImages
Page 39
Text Extraction with TET
Using the Text Extraction Toolkit (TET), text in a PDF file
can be intelligently and reliably extracted
tet.exe --tetopt --pageopt="{{200 750 400 755}}" --xml line
(find text in the box 200,750,400,755, output as XML
and recognize lines)
• A Perl binding for TET exists
• Note that this does not perform OCR; it
intelligently queries the objects inside the PDF
Page 40
Non Printable data extraction using
pCOS
Using the pCOS tool, non printable data such as
number of pages, XMP metadata,
Author/Creator/Date information can be extracted.
• Extract/check for bookmarks
• Extract ICC Profiles
• Check for security problems
Page 41
Important take-aways from this
presentation
• Too many modules on CPAN (quite confusing)
• None are complete (in my humble opinion)
• PDFlib, commercial or its open source equivalent,
together with associated libraries such as TET and
pCOS, makes an ideal toolset.
• A lot more than text and graphics is possible with
PDF files
Page 42
Prabhakar Somu
+91 97048 71236 (Mobile India)
(908) 500 5902 (Mobile US)
Email: somup@zensys.com
Thanks for your attention!

Perl and PDF - YAPC::EU 2015 Presentation

  • 1. Zentech Innovations Pvt. Ltd. Hyderabad Telangana India © 2010 Zentech Innovations Pvt. Ltd. Page 1 Perl and PDF Prabhakar Somu psomu@yahoo.com Zentech Innovations Pvt. Ltd. Hyderabad, India September 2, 2015
  • 2. What we do: Transactional Communications • Our business is primarily involved in the creation, printing, dispatching, emailing and web presentation of transactional/financial documents. • Large-volume PDF document production • Variable data, statement composition
  • 3. A brief history of PDF (Portable Document Format) • Created by Adobe in 1993 • Compact, device-independent, cross-platform • A subset of the PostScript page description language • Font embedding/replacement/subsetting • Compression and structured (reusable) component storage
  • 4. A brief history of PDF (contd.) • Versions: - PDF 1.0 (Acrobat 1.0): 1993 - PDF 1.1 (Acrobat 2.0): 1994 - PDF 1.2 (Acrobat 3.0): 1996 - PDF 1.3 (Acrobat 4.0): 1999 - PDF 1.4 (Acrobat 5.0): 2001 - PDF 1.5 (Acrobat 6.0): 2003 - PDF 1.6 (Acrobat 7.0): 2005 - PDF 1.7 (Acrobat 8.0): 2006
  • 5. PostScript and PDF • PostScript is a page description language and a programming language • Has to be interpreted and imaged (ripped) on a device • Device-specific • PDF contains a subset of PostScript elements without any control flow • Self-contained
  • 6. PostScript and PDF • A page in PostScript has to be 'ripped' and imaged before a subsequent page can be imaged (the graphics state needs to be maintained) • Any page of a PDF file can be displayed without needing to display earlier pages • Device-independent • Self-contained, fonts embedded, identical rendering on all platforms
  • 7. PostScript and PDF • Ideal in a Web environment, as any page can be displayed at any time • PDF files can be streamed • Compact, reusable components within a PDF file • Identical rendering across devices • No interpretation required in PDF
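
The random page access described on slides 6 and 7 comes from the cross-reference (xref) table at the end of a PDF file, which records the byte offset of every object. The following is a minimal pure-Perl sketch, not taken from the presentation: the sub name `build_pdf`, the object numbering and the page text are all assumptions made for illustration, and the page text must not contain unescaped parentheses.

```perl
use strict;
use warnings;

# Build a tiny multi-page PDF by hand to expose the xref table.
# Object layout (an assumption of this sketch):
#   1 = catalog, 2 = page tree, 3 = font, then (page, content) pairs.
sub build_pdf {
    my @page_texts = @_;
    my $n_pages    = scalar @page_texts;
    my $kids       = join ' ', map { (4 + 2 * $_) . ' 0 R' } 0 .. $n_pages - 1;

    my @objects = (
        '<< /Type /Catalog /Pages 2 0 R >>',
        "<< /Type /Pages /Count $n_pages /Kids [ $kids ] >>",
        '<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>',
    );
    for my $i (0 .. $n_pages - 1) {
        my $content_num = 5 + 2 * $i;
        my $stream      = "BT /F1 24 Tf 72 720 Td ($page_texts[$i]) Tj ET";
        push @objects,
            '<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] '
          . "/Resources << /Font << /F1 3 0 R >> >> /Contents $content_num 0 R >>";
        push @objects,
            '<< /Length ' . length($stream) . " >>\nstream\n$stream\nendstream";
    }

    # Write the objects one after another, remembering where each one starts.
    my $pdf = "%PDF-1.4\n";
    my @offsets;
    for my $i (0 .. $#objects) {
        push @offsets, length $pdf;
        $pdf .= ($i + 1) . " 0 obj\n$objects[$i]\nendobj\n";
    }

    # The xref table lists one fixed-width (20-byte) entry per object.
    my $xref_pos = length $pdf;
    my $size     = @objects + 1;
    $pdf .= "xref\n0 $size\n";
    $pdf .= "0000000000 65535 f \n";
    $pdf .= sprintf("%010d 00000 n \n", $_) for @offsets;
    $pdf .= "trailer\n<< /Size $size /Root 1 0 R >>\nstartxref\n$xref_pos\n%%EOF\n";

    return ($pdf, \@offsets, $xref_pos);
}

my ($pdf, $offsets, $xref_pos) = build_pdf('Page one', 'Page two');

# Jump straight to the second page (object 6) using its recorded offset,
# without reading anything that precedes it.
print substr($pdf, $offsets->[5], 7), "\n";

# Optionally save the file: perl sketch.pl two_pages.pdf
if (@ARGV) {
    open my $fh, '>:raw', $ARGV[0] or die "Cannot write $ARGV[0]: $!";
    print {$fh} $pdf;
    close $fh;
}
```

A viewer reads `startxref` at the end of the file, loads the table, and can then seek directly to any page object. PostScript has no such index, which is why its pages have to be processed in order.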
  • 8. Perl and PDF: several modules on CPAN • PDF::Create • CAM::PDF • PDF::API2 • PDF::API3 • PDF::Extract • PDF::Xtract • PDF::GetImages • PDF::Template • PDF::Reuse • PDF::ReportWriter • PDF::Table • PDF::Parse • PDF::Report
  • 9. Perl and PDF (contd.) • PDF::Core • PDF::OCR2 • Fuse::PDF • PDF::Burst • PDF::Haru • PDF::Imposition • PDF::EasyPDF • Image::Magick::Thumbnail::PDF • PDF::Labels • PDF::Tk • PDF::Reuse::Barcode • deletepdfpage.pl • PDF::Boxer • PDFlib
  • 10. Special Mention • PDF::API2 • PDF::API3 • CAM::PDF • PDF::Haru • Quite extensive
  • 11. Approximate categorization of Perl PDF Modules • Creation • Repurposing • Extraction • Miscellaneous
  • 12. Creation: PDF::Create, PDF::Table, PDF::Report, PDF::Template, PDF::ReportWriter, PDF::Haru, PDF::EasyPDF, PDF::Labels, PDF::Boxer • Repurposing: PDF::Reuse, PDF::Burst, Image::Magick::Thumbnail::PDF, PDF::Tk • Content Extraction: PDF::Extract, PDF::Xtract, PDF::GetImages, PDF::OCR, PDF::OCR2 • Miscellaneous: PDF::Parse, PDF::Core, Fuse::PDF • Also shown: PDF::API2, PDF::API3, CAM::PDF, pdflib, PDF::Reuse::Barcode
  • 13. Approximate categorization of Perl PDF Modules - Creation • PDF::Create • PDF::Table • PDF::Report • PDF::Template • PDF::ReportWriter • PDF::Haru • PDF::EasyPDF • PDF::Labels • PDF::Boxer
  • 14. Approximate categorization of Perl PDF Modules - Repurposing • PDF::Reuse • PDF::Burst • PDF::Imposition • Image::Magick::Thumbnail::PDF • PDF::Tk
  • 15. Approximate categorization of Perl PDF Modules - Content Extraction • PDF::Extract • PDF::Xtract • PDF::OCR • PDF::OCR2 • PDF::GetImages
  • 16. Approximate categorization of Perl PDF Modules - Miscellaneous • PDF::Parse • PDF::Core • Fuse::PDF
  • 17. Approximate categorization of Perl PDF Modules - General Purpose • PDF::API2 • PDF::API3 • CAM::PDF • pdflib
  • 18. We will use PDFlib to demonstrate the various aspects • From pdflib.com • Commercial as well as open source • Comprehensive and cross-platform, with wide support for PDF versions, image formats, color spaces, text rendering, graphics etc. • PDFlib (for creating PDF files) • PDI (for repurposing existing PDF files) • TET (for extracting text) • pCOS (for accessing non-page data) • PLOP (for linearizing PDFs)
  • 19. Creating PDF files - PDFlib
      use PDFlib::PDFlib 8.0;
      my $p = new PDFlib::PDFlib;
      $p->set_parameter("compatibility", "1.7");
      $p->set_parameter("license", "XYX");
      $p->begin_document("output.pdf", "optimize");
      $p->begin_page_ext(0, 0, "width=a4.width height=a4.height");
      # Create content here
      $p->end_page_ext("");
      $p->end_document("");
  • 20. Creating linearized PDF files - PDFlib
      use PDFlib::PDFlib 8.0;
      my $p = new PDFlib::PDFlib;
      $p->set_parameter("compatibility", "1.7");
      $p->set_parameter("license", "XYX");
      $p->begin_document("output.pdf", "optimize linearize");
      $p->begin_page_ext(0, 0, "width=a4.width height=a4.height");
      # Create content here
      $p->end_page_ext("");
      $p->end_document("");
  • 21. Creating PDF files - PDFlib • Several options are available when creating a document and a page • Document options: password, document open actions, openmode (bookmarks, thumbnails etc.), optimize, permissions (noprint, nomodify etc.) • Page options: artbox, cropbox etc.; width and height; XMP metadata; and many more
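
The document and page options above are passed to PDFlib as option-list strings such as "width=a4.width height=a4.height". As the number of options grows, it can help to assemble these strings from a Perl data structure. The helper below is a hypothetical sketch (`build_optlist` is not part of PDFlib) that assumes the common "key=value" form with braces around lists, nested option lists and values containing spaces. Check the PDFlib documentation for the options each function actually accepts.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash into an option-list string of the form
#   key=value key={a b} key={sub=value}
# Keys are sorted so that the output is stable.
sub build_optlist {
    my ($opts) = @_;
    my @parts;
    for my $key (sort keys %$opts) {
        my $value = $opts->{$key};
        if (ref $value eq 'HASH') {
            # Nested option list
            push @parts, $key . '={' . build_optlist($value) . '}';
        }
        elsif (ref $value eq 'ARRAY') {
            # List of keywords
            push @parts, $key . '={' . join(' ', @$value) . '}';
        }
        elsif ($value =~ /\s/) {
            # A single value containing whitespace needs braces
            push @parts, $key . '={' . $value . '}';
        }
        else {
            push @parts, "$key=$value";
        }
    }
    return join ' ', @parts;
}

my $page_opts = build_optlist({ width => 'a4.width', height => 'a4.height' });
my $doc_opts  = build_optlist({
    openmode    => 'bookmarks',
    permissions => [qw(noprint nomodify)],
});

print "$page_opts\n";
print "$doc_opts\n";
```

The resulting strings would then be handed to calls such as begin_document or begin_page_ext in place of hand-written literals.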
  • 22. Fonts and PDF
      my $font_handle1 = $p->load_font("Courier", "", "");
      my $font_handle2 = $p->load_font("Calibri", "", "");
    • Searches for fonts in the resource path (set_parameter) • Unicode fonts
  • 23. Laying out Text
      $p->setfont($font_handle1, 32);
      $p->fit_textline("Your Text Here", $xpos, $ypos, "");
      $p->fit_textline("More text", $xpos2, $ypos2, "");
  • 24. Placing Images
      my $image_handle = $p->load_image("auto", $image_file, "options");
      $p->fit_image($image_handle, $x, $y, "options");
    • Many image formats such as JPG, TIF and BMP are automatically identified and loaded • Multi-page TIF files are handled as well • Black and white, grayscale and color images are handled • Color profiles and many other parameters can be set • Several positioning and fitting options
  • 25. Barcodes in PDF files • A number of barcode types are possible • Two methods of placing barcodes: font-based and image-based • In the font-based method, load a barcode font (like a QRCode font) and place text in that font • In the image-based method, load an image (of the barcode) and place the image on a page • Samples: PDF-SamplesPunjabi.pdf PDF-SamplesAssame.pdf
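
For the font-based method, the text usually has to be prepared for the barcode font before it is placed. The sketch below uses Code 39 as an assumed example (the slide mentions a QRCode font, which works differently): Code 39 fonts typically expect the data wrapped in `*` start/stop characters, optionally followed by a modulo-43 check character. The sub name `code39_string` is hypothetical.

```perl
use strict;
use warnings;

# Code 39 character set in checksum order: the value of a character
# is its index in this string (0-9 => 0..9, A-Z => 10..35, and so on).
my $CODE39_CHARS = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%';

# Return the string to render in a Code 39 font.
# Pass checksum => 1 to append the optional modulo-43 check character.
sub code39_string {
    my ($data, %args) = @_;
    $data = uc $data;    # Code 39 has no lowercase letters

    my $sum = 0;
    for my $ch (split //, $data) {
        my $value = index $CODE39_CHARS, $ch;
        die "Character '$ch' cannot be encoded in Code 39\n" if $value < 0;
        $sum += $value;
    }
    $data .= substr($CODE39_CHARS, $sum % 43, 1) if $args{checksum};

    return '*' . $data . '*';    # '*' is the start/stop character
}

print code39_string('INV-2015'), "\n";
print code39_string('code39', checksum => 1), "\n";
```

The returned string would then be placed on the page like any other text line, using a font handle loaded from the barcode font.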
  • 26. Metadata in PDF files (XMP Metadata) • Concept of non-printable metadata in PDF files • Some information such as Author, Date of Creation and Keywords can be placed using the set_info call • More extensive arbitrary data can be injected using the 'XMP Metadata' channel • Possible in TIFF and JPEG files as well
  • 27. Metadata in PDF files (XMP Metadata) • XMP data can be placed at the document, page or image level • Sample: PDF-Samplessimple.txt
      $p->begin_document($output_pdf, "metadata={filename={simple.txt}}");
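
The contents of the sample metadata file are not shown on the slides. As an illustration only, the following sketch generates a minimal XMP packet with two Dublin Core properties. The sub names and the chosen properties are assumptions, and the exact form PDFlib expects in the file (for example, with or without the xpacket wrapper) should be checked against its documentation.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build a minimal XMP packet carrying a title and a creator.
sub xmp_packet {
    my (%fields) = @_;
    my $title   = xml_escape($fields{title});
    my $creator = xml_escape($fields{creator});

    return <<"XMP";
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">$title</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>$creator</rdf:li></rdf:Seq></dc:creator>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
XMP
}

print xmp_packet(title => 'Perl and PDF', creator => 'Prabhakar Somu');
```

Redirecting the output to a file gives something that could be referenced from the metadata option. Custom namespaces can be added alongside `dc` to carry arbitrary application data.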
  • 28. Named Destinations - Navigating to specific pages in a PDF
      $p->begin_page_ext(0, 0, "width=a4.width height=a4.height");
      $p->add_nameddest("Page1", "options");
      $p->end_page_ext("");
    The destination can then be addressed from a URL: file:///C:/Output.pdf#nameddest=Page1
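
Links like the one above follow the PDF open-parameter convention, where parameters such as nameddest or page are appended after a `#` and separated by `&`. Here is a small sketch for building such links; the sub names are hypothetical, and values are assumed to be plain byte strings.

```perl
use strict;
use warnings;

# Percent-encode everything outside the URI "unreserved" character set.
sub uri_escape_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Append open parameters to a PDF URL. Parameters are given as a flat
# key/value list so that their order is preserved.
sub pdf_open_url {
    my ($base_url, @params) = @_;
    my @pairs;
    while (my ($key, $value) = splice @params, 0, 2) {
        push @pairs, "$key=" . uri_escape_value($value);
    }
    return @pairs ? $base_url . '#' . join('&', @pairs) : $base_url;
}

print pdf_open_url('file:///C:/Output.pdf', nameddest => 'Page1'), "\n";
print pdf_open_url('file:///C:/Output.pdf', page => 3, zoom => 150), "\n";
```

Whether a given viewer or browser plug-in honours these parameters varies, so links of this kind are best tested against the viewers in use.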
  • 29. JavaScript in PDF files • JavaScript can be embedded in PDF files • Actions can be tied to JavaScript code • For example, execute a function when a page is displayed (opened) • Samples: PDF-Samplesbarcode_field.pdf PDF-Samplesbarcode_field.pl
  • 30. PDF/A • Special version of PDF meant for long-term archival and retrieval • Many interactive elements are not allowed • No hyperlinks, forms etc. • Guaranteed to be supported by Adobe • Applications in library archival systems, legal document archival and retrieval etc., where long-term compatibility of documents is crucial • PDFlib can create such documents
  • 31. Types of Page Boundaries (http://www.prepressure.com/pdf/basics/page-boxes) • Media Box: specifies the width and height of the media (paper size) • Crop Box: area to which page contents are clipped (for display) • Trim Box: intended dimensions of the finished page (by default = Crop Box) • All of these can be specified in PDFlib as options in begin_page_ext
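
The relationship between the boxes can be worked through with a little arithmetic. The sketch below assumes an A4 finished page printed on a sheet with 3 mm of bleed on every side. The bleed value and the sub names are illustrative assumptions, not values from the presentation. Boxes are given as [llx, lly, urx, ury] in PDF points (1/72 inch).

```perl
use strict;
use warnings;

use constant PT_PER_MM => 72 / 25.4;    # 1 inch = 25.4 mm = 72 points

sub mm_to_pt { return $_[0] * PT_PER_MM }

# Shrink a box [llx, lly, urx, ury] by the same margin on all four sides.
sub inset_box {
    my ($box, $margin) = @_;
    return [
        $box->[0] + $margin, $box->[1] + $margin,
        $box->[2] - $margin, $box->[3] - $margin,
    ];
}

# True when $inner lies entirely within $outer.
sub box_contains {
    my ($outer, $inner) = @_;
    return $inner->[0] >= $outer->[0] && $inner->[1] >= $outer->[1]
        && $inner->[2] <= $outer->[2] && $inner->[3] <= $outer->[3];
}

my $bleed_mm  = 3;    # assumed bleed allowance
my $media_box = [ 0, 0, mm_to_pt(210 + 2 * $bleed_mm), mm_to_pt(297 + 2 * $bleed_mm) ];
my $crop_box  = $media_box;                                   # display the whole sheet
my $trim_box  = inset_box($media_box, mm_to_pt($bleed_mm));   # finished A4 page

for my $entry ([ MediaBox => $media_box ], [ CropBox => $crop_box ], [ TrimBox => $trim_box ]) {
    my ($name, $box) = @$entry;
    printf "%-8s [%s]\n", $name, join(' ', map { sprintf '%.2f', $_ } @$box);
}
```

The trim box computed this way has the dimensions of an A4 page and sits inside the media box, which is the containment a print workflow expects.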
  • 32. Bookmarks, Thumbnails • Bookmarks and thumbnails can be created using PDFlib
      $p->begin_page_ext(0, 0, "width=a4.width height=a4.height");
      $p->create_bookmark("Bookmark Display Name", "{type fitwindow}");
      $p->add_thumbnail($thumbnail_image_handle);
      $p->end_page_ext("");
  • 33. Controlling how a PDF file is first displayed in Adobe (Acrobat or Reader)
      $p->begin_document("output.pdf", "viewerpreferences=centerwindow");
      $p->begin_document("output.pdf", "viewerpreferences=duplex");
  • 34. Dealing with encrypted PDF files • Specify the password in begin_document
  • 35. Disabling the ability to Print, Cut/Copy/Save etc.
      $p->begin_document("output.pdf", "permissions={noprint nomodify}");
  • 36. Repurposing existing PDF files using PDI • An existing PDF file can be read in and its content placed as is in an output file
      my $input_doc_handle = $p->open_pdi($input_file, "", 0);
      $p->begin_document($output_file, "");
      $p->begin_page_ext($width, $height, "");
      my $page_handle = $p->open_pdi_page($input_doc_handle, $page_no, "");
      $p->fit_pdi_page($page_handle, 0, 0, $boxsize);
      $p->end_page_ext("");
      $p->close_pdi_page($page_handle);
      $p->end_document("");
  • 37. PDF Files with embedded images • Each page can have an image and nothing else • Scanned images are typically combined into such PDF files • Many options and possibilities exist to compress such images in a PDF file (from Adobe Acrobat as well as PDFlib) • Text and other content can be overlaid on such files as well using PDFlib's graphic operators
  • 38. Converting PDF pages into other formats • ImageMagick and PerlMagick • Convert individual pages to images • PDF::GetImages
  • 39. Text Extraction with TET • Using the Text Extraction Toolkit (TET), text in a PDF file can be intelligently and reliably extracted
      tet.exe --tetopt --pageopt="{{200 750 400 755}}" --xml line
    (find text in the box 200,750,400,755, output it as XML and recognize lines) • A Perl binding for TET exists • Note that this does not perform OCR; it intelligently queries the PDF nodes
  • 40. Non-printable data extraction using pCOS • Using the pCOS tool, non-printable data such as the number of pages, XMP metadata and Author/Creator/Date information can be extracted • Extract/check for bookmarks • Extract ICC profiles • Check for security problems
  • 41. Important take-aways from this presentation • Too many modules on CPAN (quite confusing) • None are complete (in my humble opinion) • The commercial or open source edition of PDFlib (and associated libraries such as TET and pCOS) makes an ideal toolset • A lot more than text and graphics is possible with PDF files
  • 42. Thanks for your attention! Prabhakar Somu +91 97048 71236 (Mobile India) (908) 500 5902 (Mobile US) Email: somup@zensys.com