Convert PDF to EPUB with SPiZone

Converting PDF to EPUB can be challenging without the right tools. After extensive R&D, SPi has developed a new approach for extracting text from searchable PDF inputs.

Usage Rights

© All Rights Reserved

Convert PDF to EPUB with SPiZone: Presentation Transcript

  • SPiZONE Presentation We inSPire success.
  • Challenges in Text Extraction from PDF
    • PDF is not a markup format, so extracting text from a PDF file is not easy.
    • When extracting text, we need to take care of fonts, encodings and sometimes font subsets.
    • The usual problems encountered when extracting text from PDF with conventional methods are:
      • Special characters are not extracted properly.
      • Formatting, including case changes, is lost.
      • Paragraphs are wrongly merged or split.
      • Content is extracted in the wrong order.
      • Text in multiple columns gets mixed up.
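The "wrong order" and "mixed-up columns" problems arise because a PDF stores positioned runs of text rather than a reading order. The sketch below is a generic illustration, not SPiZONE's algorithm: it assumes each run has already been reduced to a hypothetical `TextRun(x, y, text)` with `y` increasing down the page, and that the column boundary (`column_split`) is known. A naive top-to-bottom sort interleaves the two columns; a column-aware sort keeps them apart.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical model of one positioned text fragment on a PDF page."""
    x: float  # left edge in points
    y: float  # top edge in points; larger values are lower on the page
    text: str


def naive_order(runs):
    # Sort purely top-to-bottom, then left-to-right: lines from two
    # columns that sit at the same height end up interleaved.
    return [r.text for r in sorted(runs, key=lambda r: (r.y, r.x))]


def column_order(runs, column_split):
    # Assign each run to a column by its x position, then read the
    # whole left column before the right one.
    def key(r):
        column = 0 if r.x < column_split else 1
        return (column, r.y, r.x)

    return [r.text for r in sorted(runs, key=key)]


runs = [
    TextRun(300, 100, "Right column, line 1."),
    TextRun(50, 100, "Left column, line 1."),
    TextRun(50, 115, "Left column, line 2."),
    TextRun(300, 115, "Right column, line 2."),
]

print(naive_order(runs))
print(column_order(runs, column_split=250))
```

In practice the column boundary has to be detected or supplied by a person; in the SPiZONE workflow described in this deck, the user supplies that structure by drawing zones.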
  • Introduction
    • After extensive R&D, SPi has developed a new approach for extracting text from searchable PDF inputs.
    • The SPiZONE tool was developed to provide a generic workflow for OCR of raster PDFs and scanned images, and for text extraction from searchable PDFs.
    • The output of SPiZONE Verify is a short-tagged text file, which can be further converted into any output format, such as XML or ePub.
  • Product Highlights
    • Text extraction is possible for all languages.
    • Text accuracy is more than 99.95%.
    • Tables can be extracted, including column spanning and row spanning, based on user input.
    • Images can be extracted.
    • Text within zones can be marked as ‘Ignore Text’ so that it is not included in the output.
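To make the 99.95% figure concrete, here is a small worked calculation. It assumes accuracy is measured per character (the slide does not say whether it is per character or per word), so the error budget is 0.05%, i.e. at most 5 errors per 10,000 characters. The 2,000-character page and 300-page book are illustrative assumptions.

```python
def max_errors(total_chars: int, accuracy_percent: float) -> float:
    """Upper bound on wrong characters at a given per-character accuracy (assumed metric)."""
    error_rate = (100.0 - accuracy_percent) / 100.0
    return total_chars * error_rate


# Hypothetical volumes: one ~2,000-character page and a 300-page book.
per_page = max_errors(2_000, 99.95)
per_book = max_errors(2_000 * 300, 99.95)
print(f"per page: {per_page:.1f}, per book: {per_book:.0f}")
```

Under these assumptions that is roughly one error per page, or about 300 in the whole book, which is why a QA step such as SPiZONE Verify still matters.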
  • PDF to Text using SPiZONE: Quick Workflow
    • SZI Generator: SZI generation (server process)
    • SPiZONE Edit: styling and zoning
    • Extraction: PDF to HTML (server process)
    • SPiZONE Verify: content QA
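The four stages pass files to one another. The sketch below models them using the inputs and outputs listed on the SZI Generation, SPiZONE Edit, Text Extraction and SPiZONE Verify slides, and checks that each stage's inputs were produced by an earlier stage. The `STAGES` table and the `check_pipeline` helper are hypothetical illustrations, not part of SPiZONE.

```python
# Each stage: (name, required inputs, produced outputs), as listed on the slides.
STAGES = [
    ("SZI Generator", {"PDF"}, {"TIFF", "SZI"}),
    ("SPiZONE Edit", {"TIFF", "SZI"}, {"SZI"}),
    ("Extraction", {"PDF", "SZI"}, {"HTML", "SZD"}),
    ("SPiZONE Verify", {"HTML", "SZI", "TIFF"}, {"short-tagged text"}),
]


def check_pipeline(stages, initial):
    """Return all artifacts available at the end; raise if a stage lacks an input."""
    available = set(initial)
    for name, inputs, outputs in stages:
        missing = inputs - available
        if missing:
            raise ValueError(f"{name} is missing inputs: {sorted(missing)}")
        available |= outputs
    return available


print(sorted(check_pipeline(STAGES, initial={"PDF"})))
```

Starting from a PDF alone, every stage finds its inputs, and the run ends with a short-tagged text file. Skipping the SZI Generator step would make SPiZONE Edit fail immediately.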
  • SZI Generation
    • Server-based process
    • Input: PDF
    • Output: low-resolution TIFF and SZI
    • SZI: Styling and Zoning Information
  • SPiZONE Edit
    • Styling and zoning application
    • Input: TIFF and SZI
    • Output: SZI
    • The user identifies the text to be extracted by drawing zones. While drawing zones, a style name, a sequence number and other properties are assigned to each element.
    • These style names are used during post-extraction processing and during XML/ePub conversion.
    • The zone information is saved in the SZI file.
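Because every zone carries a sequence number and a style name, the reading order of the output can come from the sequence numbers rather than from the PDF's internal order, and the style names travel with the text into later conversion. A minimal sketch, assuming a hypothetical `Zone` record (the SZI file format is not described in this deck), hypothetical style names, and an `ignore` flag standing in for the ‘Ignore Text’ option:

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical stand-in for one zone drawn in SPiZONE Edit."""
    sequence: int          # reading-order number assigned while zoning
    style: str             # style name (hypothetical values below)
    text: str              # text captured inside the zone
    ignore: bool = False   # models the 'Ignore Text' option


def ordered_content(zones):
    """Return (style, text) pairs in sequence order, skipping ignored zones."""
    kept = [z for z in zones if not z.ignore]
    return [(z.style, z.text) for z in sorted(kept, key=lambda z: z.sequence)]


zones = [
    Zone(2, "para", "First paragraph of the chapter."),
    Zone(1, "chapter-title", "Chapter 1"),
    Zone(3, "running-head", "Chapter 1", ignore=True),
]

for style, text in ordered_content(zones):
    print(f"{style}: {text}")
```

The running head is drawn as a zone but excluded, and the title comes out first even though it was listed second.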
  • SPiZONE Edit: Demo
  • Text Extraction from PDF
    • Server-based process
    • Input: PDF and SZI
    • Output: HTML and SZD
    • SZD: SPiZONE Document, used for logging
    • Font details, uncertain spaces, soft hyphens, etc. are flagged in the extracted file, and SPiZONE Verify uses these flags.
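Soft hyphens show why flagging is safer than silently deciding. A hyphen at the end of a line may be a soft (line-break) hyphen that should disappear when lines are joined, or a hard hyphen that belongs to the word. A minimal sketch, assuming a hypothetical `join_lines` helper that tentatively removes every line-end hyphen but records where it did so for later review:

```python
def join_lines(lines):
    """Join extracted lines into one paragraph, flagging line-end hyphens.

    Every line-final '-' is tentatively treated as a soft hyphen and removed,
    but its position is recorded, because it may be a hard hyphen that
    belongs to the word (e.g. 'well-known' broken as 'well-' / 'known').
    """
    text = ""
    flags = []  # (offset in joined text, fragment as printed) for human review
    for i, raw in enumerate(lines):
        line = raw.strip()
        is_last = i == len(lines) - 1
        if line.endswith("-") and not is_last:
            text += line[:-1]
            flags.append((len(text), line.split()[-1]))
        elif is_last:
            text += line
        else:
            text += line + " "
    return text, flags


text, flags = join_lines(["The tool uses a well-", "known approach to extrac-", "tion."])
print(text)
print(flags)
```

The joined text gets "extraction" right but "wellknown" wrong. The recorded flags are what let a reviewer catch the second case.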
  • SPiZONE Verify
    • OCR/text extraction QA application
    • Input: extracted content in HTML format, SZI and low-resolution TIFF
    • Output: short-tagged files
    • With this application, the user performs regulated content checking on the extracted HTML files.
    • Font normalization is used to make sure all characters are extracted correctly. The user can correct any discrepancies.
    • Verify does not allow the user to create a short-tagged file until all fonts are normalized and all uncertain spaces and soft hyphens are checked.
    • To see how SPiZONE Verify works, open the video on the next slide.
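Font normalization and the export gate can be sketched together. The model below is hypothetical: `ExtractedDoc`, `normalize` and `export_short_tagged` are illustrative names, and plain lines stand in for the short-tagged format, which this deck does not specify. It shows two common cases: a ligature glyph (U+FB01) that should read as "fi", and text set in the Symbol font, whose encoding places Greek alpha at the position of 'a'. Export is refused until every font has a confirmed mapping and every flag has been checked.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedDoc:
    """Hypothetical model of extracted content awaiting QA."""
    runs: list  # (font name, raw extracted text) pairs
    font_maps: dict = field(default_factory=dict)  # font -> {char: replacement}, confirmed by the user
    unresolved_flags: int = 0  # uncertain spaces / soft hyphens not yet checked


def normalize(doc):
    """Apply each font's confirmed correction table to that font's runs."""
    return ["".join(doc.font_maps[font].get(ch, ch) for ch in text) for font, text in doc.runs]


def export_short_tagged(doc):
    """Refuse to export until every font is normalized and every flag is checked."""
    unmapped = sorted({font for font, _ in doc.runs} - set(doc.font_maps))
    if unmapped:
        raise RuntimeError(f"fonts not normalized: {unmapped}")
    if doc.unresolved_flags:
        raise RuntimeError(f"{doc.unresolved_flags} flag(s) still unchecked")
    # A real exporter would emit short tags; plain lines stand in for them here.
    return "\n".join(normalize(doc))


doc = ExtractedDoc(
    runs=[("SubsetSerif", "The \ufb01nal value"),  # U+FB01 LATIN SMALL LIGATURE FI
          ("Symbol", "a")],                         # Symbol font encodes Greek alpha at 'a'
    font_maps={"SubsetSerif": {"\ufb01": "fi"}},
    unresolved_flags=1,
)

try:
    export_short_tagged(doc)
except RuntimeError as err:
    print("blocked:", err)  # the Symbol font has no confirmed mapping yet

doc.font_maps["Symbol"] = {"a": "\u03b1"}  # user confirms 'a' -> Greek small alpha
doc.unresolved_flags = 0                   # user has checked the flagged spaces/hyphens
print(export_short_tagged(doc))
```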
  • SPiZONE Verify: Demo
  • Processing SPiZONE Output
    • The workflow from PDF to short-tagged text file is generic for all projects.
    • Short-tagged text files can be further converted into XML, ePub or any other format, as each project requires.
    • SPiZONE Structure is a customizable application used for conversion into formats such as (but not limited to) XML and ePub.
    • Structure applications can be built in a short time for any XML conversion project.
    • The SPiZONE ePub application accepts short-tagged files as input to create ePub 2/3.
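The SPiZONE ePub application itself is not described in detail here. As a rough illustration of the final packaging step only, the sketch below builds a one-chapter EPUB 3 file from hypothetical (style, text) pairs using Python's standard library. The style-to-element table, the file names and the input shape are assumptions. The container rules it follows come from the EPUB Open Container Format: `mimetype` must be the first ZIP entry, stored uncompressed, with the content `application/epub+zip`, and `META-INF/container.xml` must point to the package document. The sketch covers EPUB 3 only; ePub 2 packaging differs (e.g. it uses an NCX table of contents).

```python
import io
import uuid
import zipfile
from datetime import datetime, timezone
from html import escape

# Hypothetical mapping from zone style names to XHTML elements.
STYLE_TO_TAG = {"chapter-title": "h1", "para": "p"}

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def xhtml_page(title, body, extra_ns=""):
    """Wrap body markup in a minimal XHTML document."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html>\n'
        f'<html xmlns="http://www.w3.org/1999/xhtml"{extra_ns}>\n'
        f"<head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )


def chapter_xhtml(title, blocks):
    """Turn (style, text) pairs into one XHTML chapter."""
    lines = []
    for style, text in blocks:
        tag = STYLE_TO_TAG[style]
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return xhtml_page(title, "\n".join(lines))


def nav_xhtml(title):
    """EPUB 3 navigation document with a one-entry table of contents."""
    nav = (
        '<nav epub:type="toc"><ol>'
        f'<li><a href="chapter1.xhtml">{escape(title)}</a></li>'
        "</ol></nav>"
    )
    return xhtml_page("Contents", nav, ' xmlns:epub="http://www.idpf.org/2007/ops"')


def package_opf(title, lang):
    """Package document: required metadata, manifest and spine."""
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"""<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{lang}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""


def build_epub(title, blocks, lang="en"):
    """Return the bytes of a single-chapter EPUB 3 file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # OCF rule: 'mimetype' is the first entry and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", package_opf(title, lang))
        zf.writestr("OEBPS/nav.xhtml", nav_xhtml(title))
        zf.writestr("OEBPS/chapter1.xhtml", chapter_xhtml(title, blocks))
    return buf.getvalue()


epub = build_epub("Chapter 1", [("chapter-title", "Chapter 1"), ("para", "First paragraph.")])
print(zipfile.ZipFile(io.BytesIO(epub)).namelist())
```

A production converter would also handle multiple chapters, images, tables and styles, which is where project-specific structure applications come in.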
  • SPiZONE Edit Samples
  • SPiZONE Verify Samples
  • ePUB Output Samples
  • Learn more about PDF to ePUB conversion: http://www.spi-global.com/content-solutions/our-services/publishingsolutions/conversion/convert-pdf-epub