Tag! Your PDF is It!
Alejandro Piñeiro and Joanmarie Diggs
GUADEC 2013
Tag! Your PDF is It! - Alejandro Piñeiro & Joanmarie Diggs - GUADEC 2013
Topics
● Tagged PDFs:
– What They Are
– Why We Want Them
– How to Make Them
● Current Status of the Project
● Getting the Code (and what you'll see when you do)
Tagged PDFs
Tagged PDF > PDF
• Meta-information about page content
• HTML-like tags and IDs for text spans
• Alternative text for images
• Replacement text for symbols
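The sketch below makes the bullets above concrete by modelling the information a single structure element carries. It is a minimal Python sketch. The fields correspond to real PDF structure-element keys: /S for the tag, /ID, /Alt and /ActualText. The `StructElem` class and the precedence rule in `spoken_text` are illustrative assumptions, not Poppler's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Toy stand-in for a PDF structure element; not Poppler's data type."""
    tag: str                           # structure type (/S): "H1", "P", "Figure", "Span", ...
    text: str = ""                     # text actually drawn on the page
    elem_id: Optional[str] = None      # element identifier (/ID)
    alt: Optional[str] = None          # alternative description (/Alt), e.g. for images
    actual_text: Optional[str] = None  # replacement text (/ActualText), e.g. for symbols
    children: List["StructElem"] = field(default_factory=list)


def spoken_text(elem: StructElem) -> str:
    """Pick what an assistive technology might present for one element.

    Assumed precedence: replacement text, then alternative text, then page text.
    """
    if elem.actual_text is not None:
        return elem.actual_text
    if elem.alt is not None:
        return elem.alt
    return elem.text


# Hypothetical example document
doc = StructElem("Document", children=[
    StructElem("H1", "Tagged PDFs", elem_id="intro"),
    StructElem("Figure", alt="GNOME logo"),                   # image with no page text
    StructElem("Span", "\u2192", actual_text="right arrow"),  # an arrow glyph
])
```

Here `[spoken_text(e) for e in doc.children]` yields `['Tagged PDFs', 'GNOME logo', 'right arrow']`. An untagged PDF could only offer the raw arrow glyph and nothing at all for the image.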
Why We Want Them
• Enhanced document accessibility
• Through exposure of the structural and semantic information associated with the tags
Thanks (again) Friends of GNOME!!!
Why We Want Them (cont.)
• Reflow functionality (e.g. for mobile devices)
• Export to other applications with format, layout, font data, etc.
• Copy and paste to other applications with some fundamental retention of content format
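Reflow depends on knowing where paragraphs begin and end, and an untagged PDF does not record that. The sketch below assumes the structure tree has already been flattened into `(tag, text)` blocks in reading order. It re-wraps paragraph text for a narrower screen with Python's `textwrap`. This illustrates the idea and is not how any particular viewer implements reflow.

```python
import textwrap
from typing import List, Tuple


def reflow(blocks: List[Tuple[str, str]], width: int) -> str:
    """Re-wrap tagged blocks for a screen `width` characters wide.

    Each block is (tag, text). 'P' paragraphs are re-wrapped; other
    blocks (e.g. headings) are kept on a single line.
    """
    out = []
    for tag, text in blocks:
        if tag == "P":
            out.append(textwrap.fill(text, width=width))
        else:
            out.append(text)
    return "\n\n".join(out)


result = reflow([("H1", "Why"), ("P", "Reflow functionality for mobile devices")], width=20)
```

Without tags, a viewer only has glyph positions, so it has to guess block boundaries before it can do anything like this.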
Making Tagged PDFs
✘ AbiWord: No
✘ Google Docs: No
✘ LaTeX: No
✘ Scribus: No
✘ PDF Studio: No
✘ python-pisa: No
✔ LibreOffice: Yes
(and it's easy!)
PDF/A-1a > Tagged PDF
• Objective: Search and repurpose document content
• Includes:
- PDF/A-1b: Reproduce document appearance
- Structure / Hierarchy
- Tagged PDF
- Unicode character maps
- Language specification
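Several items in the list above leave visible traces in a file. A tagged PDF has a /MarkInfo dictionary with /Marked true and a /StructTreeRoot in its catalog. The language is given by /Lang, and a PDF/A-1a claim appears in the XMP metadata as `pdfaid:part` and `pdfaid:conformance`.

The following is a rough, regex-based sketch that looks for those traces in raw bytes. It is an assumption-laden heuristic, not a validator:
- It misses indirect /MarkInfo references and anything inside compressed streams.
- It checks neither the PDF/A-1b appearance rules nor the Unicode character maps.

```python
import re


def quick_pdfa1a_hints(pdf_bytes: bytes) -> dict:
    """Regex heuristics over raw PDF bytes; NOT a PDF/A validator.

    Only sees uncompressed objects and direct dictionaries, so it can
    miss things (e.g. an indirect /MarkInfo reference).
    """
    return {
        # Tagged PDF: catalog /MarkInfo dictionary with /Marked true
        "tagged": re.search(rb"/MarkInfo\s*<<[^>]*/Marked\s+true", pdf_bytes) is not None,
        # Structure / hierarchy: the catalog points at a structure tree root
        "structure_tree": b"/StructTreeRoot" in pdf_bytes,
        # Language specification: a /Lang string (literal or hex)
        "language": re.search(rb"/Lang\s*[(<]", pdf_bytes) is not None,
        # PDF/A-1a claim in XMP metadata (attribute or element syntax)
        "declares_pdfa_1a": (
            re.search(rb'pdfaid:part(?:="1"|>1<)', pdf_bytes) is not None
            and re.search(rb'pdfaid:conformance(?:="A"|>A<)', pdf_bytes) is not None
        ),
    }


# Hypothetical minimal catalog, for illustration only
sample = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /MarkInfo << /Marked true >> "
          b"/StructTreeRoot 2 0 R /Lang (en-US) >>\nendobj\n")
hints = quick_pdfa1a_hints(sample)
```

For `sample`, the first three hints are true and `declares_pdfa_1a` is false, since it carries no XMP metadata.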
Current Status
Tagged PDF Support
✔ Parse the document structure tree: Poppler
✔ Expose the tree and attributes: Poppler GLib
✔ Provide tools to examine and verify result: Poppler
● Create parallel object tree with attributes: Evince
● (Expose object tree and attributes via ATK: Evince)
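The two remaining Evince items amount to mirroring the structure tree in a second tree of accessible objects. That second tree can then be exposed through ATK.

The following is a hedged Python sketch of that mirroring step only. `AccessibleNode`, `build_parallel_tree` and the tag-to-role table are hypothetical. The role strings are loosely named after ATK roles, and Evince's eventual mapping may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical mapping from standard structure types to accessible roles.
ROLE_FOR_TAG = {
    "Document": "document-frame",
    "H1": "heading",
    "H2": "heading",
    "P": "paragraph",
    "Figure": "image",
    "Table": "table",
    "L": "list",
    "LI": "list-item",
}


@dataclass
class AccessibleNode:
    """Hypothetical accessible object mirroring one structure element."""
    role: str
    attributes: Dict[str, str]
    children: List["AccessibleNode"] = field(default_factory=list)


# Input shape for one structure element: (tag, attributes, children)
StructNode = Tuple[str, Dict[str, str], list]


def build_parallel_tree(node: StructNode) -> AccessibleNode:
    """Recursively build an accessible tree with the same shape as the structure tree."""
    tag, attrs, kids = node
    acc = AccessibleNode(
        role=ROLE_FOR_TAG.get(tag, "unknown"),  # fall back when no mapping exists
        attributes={"tag": tag, **attrs},       # keep the original tag next to its attributes
    )
    acc.children = [build_parallel_tree(kid) for kid in kids]
    return acc


tree = ("Document", {"Lang": "en"}, [
    ("H1", {}, []),
    ("Figure", {"Alt": "Screenshot"}, []),
    ("Span", {}, []),
])
acc = build_parallel_tree(tree)
```

Keeping the original tag as an attribute lets a screen reader fall back on it when no role mapping exists.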
PDF/A-1a Support
? PDF/A-1b
✔ Tagged PDF
✔ Structure / Hierarchy
? Unicode character maps
✔ Language specification
What's Next?
• Create parallel object tree with attributes: Evince
• (Expose object tree and attributes via ATK: Evince)
? PDF/A-1b and Unicode character maps
? Adding support to LaTeX, et al.
Getting the Code
(and what you'll see when you do)
Credit Where Credit is Due
• Adrián Pérez: Document Parser Extraordinaire
• Carlos García Campos: Maintains Evince & Poppler
Thanks Guys!!!
Getting the Code
• git://git.freedesktop.org/git/poppler/poppler
• Today
- Branch: tagged-pdf
- Patches: freedesktop.org (fdo) bugs 64816 and 67710
• Soon: master branch
Getting the Code (cont.)
• Poppler:
10 files changed, 2309 insertions(+), 17 deletions(-)
• Poppler GLib:
16 files changed, 3011 insertions(+)
• Utils:
3 files changed, 661 insertions(+), 2 deletions(-)
Associated Output Tools: Before
• pdfinfo: author, editor, etc.
• pdftotext: content (plain text)
• pdftohtml: content (barely formatted text)
Associated Output Tools: After
● pdfstructtohtml: like pdftohtml but preserves tags
● pdfinfo's new options:
- print the structure hierarchy
- print the hierarchy along with the content of each element
● poppler-glib-demo: new option to display hierarchy
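The two new pdfinfo modes differ only in whether element content is printed next to each tag. The following is a minimal Python sketch of that idea over a hypothetical nested `(tag, text, children)` tuple. Its output format is an assumption, not pdfinfo's actual output. Later Poppler releases expose these modes as `pdfinfo -struct` and `pdfinfo -struct-text`; check `pdfinfo -help` for your version.

```python
from typing import List, Tuple

# Hypothetical node shape: (tag, text, children)
Node = Tuple[str, str, list]


def format_hierarchy(node: Node, with_text: bool = False, depth: int = 0) -> List[str]:
    """Return one indented line per structure element, depth-first."""
    tag, text, children = node
    line = "  " * depth + tag
    if with_text and text:
        line += f' "{text}"'  # second mode: append the element's content
    lines = [line]
    for child in children:
        lines.extend(format_hierarchy(child, with_text, depth + 1))
    return lines


doc = ("Document", "", [
    ("H1", "Topics", []),
    ("L", "", [("LI", "Tagged PDFs", [])]),
])
```

For this `doc`, `format_hierarchy(doc)` gives `['Document', '  H1', '  L', '    LI']`. Passing `with_text=True` adds `"Topics"` and `"Tagged PDFs"` to the H1 and LI lines.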
Questions?