STEPS FOR CONVERSION
1) Extract the content from InDesign or PDF.
Client input: PDF
2) Arrange the text according to the project requirements.
Word file: Doc
3) Apply paragraph styles in the Word document (e.g. Heading 1, Heading 2,
Heading 3).
4) Apply auto tags in the Word document (e.g. <h1>, <h2>, <h3>, <p>,
<sup>, <b>, <i>).
Word file: Formatted Doc
5) Convert the Word file to XML through Developer > Schema.
Final output: HTML file
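Steps 3 and 4 pair each Word paragraph style with an HTML tag. The Python sketch below illustrates that pairing outside Word. The `STYLE_TO_TAG` table and the `auto_tag` function are hypothetical names, and mapping "Normal" to <p> (and unmapped styles to <p>) is an assumption; the source does not say how the auto tags are actually applied.

```python
# Hypothetical mapping from Word paragraph style names to the auto tags
# listed in step 4. "Normal" -> <p> is an assumption for this sketch.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Heading 3": "h3",
    "Normal": "p",
}


def auto_tag(paragraphs):
    """Wrap each (style, text) pair in the tag mapped to its style."""
    tagged = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "p")  # fall back to <p> for unmapped styles
        tagged.append(f"<{tag}>{text}</{tag}>")
    return tagged


doc = [("Heading 1", "Chapter 1"), ("Normal", "Body text.")]
print("\n".join(auto_tag(doc)))
# <h1>Chapter 1</h1>
# <p>Body text.</p>
```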
Convert Word to XML
1) Select the Developer tab from the menu bar.
2) Then select the Schema option and tick the checkbox shown in the
screenshot below:
3) Select the Templates option and attach the correct template:
4) Apply the root element tags <html> and <body>, then open the XML
Options and tick the "Save data only" option.
5) Go to File > Save As > Other Formats, then change Save as type from
Doc to Word 2003 XML Document (*.xml).
6) Open the XML file.
Find the &lt; and &gt; entities.
Replace them with literal < and > characters.
7) Arrange the HTML tags using regular expressions, then validate the
XML file in XML Copy Editor.
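Steps 6 and 7 can be approximated outside the editor. The Python sketch below assumes that the auto tags typed in Word come out of the XML export as `&lt;`/`&gt;` entities; it replaces only those two entities and then checks that the result is well-formed with `xml.etree.ElementTree`. The helper names `unescape_tags` and `check_well_formed` are hypothetical, the sample string is a simplified stand-in for real Word 2003 XML output, and the check covers well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def unescape_tags(xml_text: str) -> str:
    """Turn escaped tag delimiters back into literal < and > (step 6).

    Only &lt; and &gt; are converted; &amp; and other entities are left as-is.
    """
    return xml_text.replace("&lt;", "<").replace("&gt;", ">")


def check_well_formed(xml_text: str):
    """Return None if xml_text parses, otherwise the parser's error message.

    Well-formedness only; this does not validate against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Simplified sample: real Word 2003 XML contains much more markup.
saved = "<html><body>&lt;h1&gt;Chapter 1&lt;/h1&gt;&lt;p&gt;Text&lt;/p&gt;</body></html>"
cleaned = unescape_tags(saved)
print(cleaned)                     # <html><body><h1>Chapter 1</h1><p>Text</p></body></html>
print(check_well_formed(cleaned))  # None

broken = "<html><body><h1>Chapter 1</body></html>"
print(check_well_formed(broken) is not None)  # True
```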
Using regular expressions to apply tags and create links
Finding with Regex:
1) ([0-9]) Finds a single digit (0 to 9).
2) ([A-Za-z]) Finds a single letter, upper- or lowercase.
3) ([^>]*) Finds any run of characters up to the next ">"; used to
capture everything between a specific opening and closing tag.
4) ([^*]*) Finds any run of characters other than "*"; used to capture
the whole content of any paragraph in the file, up to the end of the line.
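Python's `re` module accepts the same four patterns, which makes it easy to check what each one captures. The sample strings are assumptions. Because a negated class such as `[^*]` also matches newlines in Python, the fourth pattern is applied one line at a time here, on the assumption that the editor searches line by line.

```python
import re

line = "<p>Chapter 12 starts here</p>"

# 1) ([0-9]) matches one digit; findall returns every digit separately.
digits = re.findall(r"([0-9])", line)
# 2) ([A-Za-z]) matches one ASCII letter, including letters inside tag names.
letters = re.findall(r"([A-Za-z])", line)
# 3) ([^>]*) matches a run of characters other than '>'; placed between a
#    specific opening and closing tag, it captures the tag's content.
inner = re.findall(r"<p>([^>]*)</p>", line)
# 4) ([^*]*) matches a run of characters other than '*'. Splitting into lines
#    first keeps each match inside one paragraph.
paragraphs = "First paragraph.\nSecond paragraph."
whole_lines = [re.match(r"([^*]*)", p).group(1) for p in paragraphs.splitlines()]

print(digits)       # ['1', '2']
print(inner)        # ['Chapter 12 starts here']
print(whole_lines)  # ['First paragraph.', 'Second paragraph.']
```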
Replacing with Regex:
1) \1 Used in the replacement text to insert whatever the first
parenthesized group of the Find expression matched.
2) \i Substitutes a sequence number:
Expression     Effect
\i             Replace with numbers starting from 1, incrementing by 1.
\i(10)         Replace with numbers starting from 10, incrementing by 1.
\i(0,10)       Replace with numbers starting from 0, incrementing by 10.
\i(100,-10)    Replace with numbers starting from 100, decrementing by 10.
3) We use the \i expression in our documents to link footnote callouts
to their footnotes and to generate a unique number for each footnote.
Example:
Content (element): <sup1>a</sup1>
Find:    <sup1>([^>]*)</sup1>
Replace: <a id="ntf\ia" href="Ch000.html#ntf\i"><sup1>[\i]</sup1></a>
Result:  <a id="ntf1a" href="Ch000.html#ntf1"><sup1>[1]</sup1></a>
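The replacement tokens above can be tried in Python. `re.sub` understands `\1` directly, but it has no sequence-number token, so this sketch emulates `\i` with a replacement function and `itertools.count`. The `sequence_sub` helper and its `{i}` placeholder are hypothetical stand-ins for `\i`, with `start` and `step` matching the forms in the table; renaming `<sup1>` to `<sup>` is only an illustration of `\1`.

```python
import re
from itertools import count

# \1: the replacement reuses what the first group captured.
renamed = re.sub(r"<sup1>([^>]*)</sup1>", r"<sup>\1</sup>", "Claim<sup1>a</sup1>")
print(renamed)  # Claim<sup>a</sup>


def sequence_sub(pattern, template, text, start=1, step=1):
    """Replace each match of `pattern` with `template`, writing a running
    number wherever the placeholder "{i}" appears (a stand-in for \\i).

    start/step follow the table: \\i -> (1, 1), \\i(10) -> (10, 1),
    \\i(0,10) -> (0, 10), \\i(100,-10) -> (100, -10).
    """
    numbers = count(start, step)
    # next() runs once per match, so every {i} in one replacement gets the same number.
    return re.sub(pattern, lambda m: template.replace("{i}", str(next(numbers))), text)


print(sequence_sub("x", "{i}", "x x x"))            # 1 2 3
print(sequence_sub("x", "{i}", "x x x", 10))        # 10 11 12
print(sequence_sub("x", "{i}", "x x x", 0, 10))     # 0 10 20
print(sequence_sub("x", "{i}", "x x x", 100, -10))  # 100 90 80

# The footnote example, with \i written as {i}.
FIND = r"<sup1>([^>]*)</sup1>"
REPLACE = '<a id="ntf{i}a" href="Ch000.html#ntf{i}"><sup1>[{i}]</sup1></a>'
chapter = "First claim<sup1>a</sup1>, second claim<sup1>b</sup1>."
print(sequence_sub(FIND, REPLACE, chapter))
```

Each callout gets its own number, so the `id` on the callout and the `#ntf…` target in `href` stay paired one to one.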