
Ripping your PDF files apart

iText Summit 2012 by Mark Stephens, CEO/Developer at IDRsolutions, explaining "What you really need to know about the guts of your PDF files"

Published in: Technology, Education

  1. RIPPING YOUR PDF FILES APART: What you need to know about what goes on inside your PDF files. Mark Stephens. Thursday, 29 March 2012
  11. Mark’s Bio
       • Working with Java and PDF since 1997
       • Founded IDRsolutions 1999
       • Speaker at Seybold, JavaOne, Business of Software
       • MA degree in Mediaeval History from St Andrews (how useless is that)
       • Ask me about Java, PDF, business or anything which happened before 1500 AD
  12. BUT FIRST SOME KITTENS... The support team at IDRsolutions are waiting for your call (maybe)
  13. The PDF reference guide
  16. Loading page 1124 of a file
       • Word: read pages 1-1123 (time passes, scroll bar shrinks); found it (eventually)
       • PDF: read the refs table(s), i.e. the cross-reference table (where do I find all the objects?); skip to page 1124
       • PDF (in detail):
          1. Read the refs table(s): where do I find all the objects?
          2. Read the Root object, which points to the Pages object
          3. Read the object for page 1124 (it tells me the linked font, image and content objects)
          4. Draw it
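
The lookup described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any particular PDF library does it: it assumes a classic, uncompressed cross-reference table (files using PDF 1.5 cross-reference streams need more work), and the helper names `build_tiny_pdf`, `find_startxref` and `read_xref_table` are made up for this example.

```python
import re


def build_tiny_pdf():
    """Build a hypothetical three-object PDF skeleton with a correct xref table."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n",
    ]
    body = b"%PDF-1.4\n"
    offsets = []
    for obj in objects:
        offsets.append(len(body))  # byte offset where this object starts
        body += obj
    xref_pos = len(body)
    body += b"xref\n0 %d\n" % (len(objects) + 1)
    body += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        body += b"%010d 00000 n \n" % off
    body += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    body += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return body


def find_startxref(pdf_bytes, tail_size=1024):
    """Read only the tail of the file and return the offset after 'startxref'."""
    tail = pdf_bytes[-tail_size:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref marker found near the end of the file")
    return int(matches[-1])


def read_xref_table(pdf_bytes, offset):
    """Parse a classic xref table into {object number: byte offset}."""
    lines = pdf_bytes[offset:].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("expected the 'xref' keyword at the given offset")
    table = {}
    i = 1
    while i < len(lines) and lines[i].strip() != b"trailer":
        first, count = map(int, lines[i].split())  # subsection header
        for n in range(count):
            fields = lines[i + 1 + n].split()
            if fields[2] == b"n":  # 'n' = in use, 'f' = free
                table[first + n] = int(fields[0])
        i += 1 + count
    return table


pdf = build_tiny_pdf()
table = read_xref_table(pdf, find_startxref(pdf))
page_offset = table[3]  # jump straight to object 3 without scanning objects 1 and 2
```

The point of the design is that the reader never scans the body: it reads the end of the file, follows one offset to the table, and then seeks directly to whichever object it needs.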
  17. Your PDF file is a tree: a root linked to all the branches
  21. The PDF reference guide: Like you have never seen it before... You can use vi or emacs if you prefer
  22. The PDF reference guide: End of the file
  26. The PDF root object: Like you have never seen it before...
  28. PDF files on the web: Isn’t having the marker at the end a problem?
  29. PDF files on the web: Not if you create it properly
  37. Key takeaways from the PDF structure
       • We do not need to load the whole file
       • It is equally fast to load any part of it
       • It is very easy to replace objects with new versions
       • There are certain key locations, like at the end of a file
       • You should not edit it in a text editor
       • If you want to use PDF files across the Internet, there is a special mode to make these load the most important parts first
       • Lots of features need you to set up the PDF file correctly
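
The takeaway about replacing objects follows from how PDF handles incremental updates: a new version of an object is appended to the end of the file together with a new cross-reference section, and the newer entry takes precedence over the older one. A minimal sketch of that precedence rule, assuming the sections have already been parsed into dictionaries (the offsets and the function name `latest_offsets` are invented for illustration):

```python
def latest_offsets(xref_sections):
    """Merge xref sections given oldest first; later entries replace earlier ones."""
    merged = {}
    for section in xref_sections:
        merged.update(section)
    return merged


# Hypothetical offsets: the original save, then an update that rewrote object 3.
original = {1: 15, 2: 64, 3: 123}
update = {3: 410}

resolved = latest_offsets([original, update])
# Object 3 now resolves to the appended copy; the old bytes stay in the file, unused.
```

This is also one reason hand-editing in a text editor is risky: changing the length of anything in the body shifts the byte offsets that the tables rely on.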
  38. Those PDF objects in more detail. All PDF objects have:
       1. An ID number
       2. (Optional) A set of dictionary key/value pairs
       3. (Optional) A block of binary data
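
The three parts listed on this slide can be pulled out of a raw object with a short Python sketch. It is deliberately naive: it assumes the stream length is written directly in the dictionary as `/Length <number>` (real files may use an indirect reference), it does not parse the dictionary itself, and `split_object` is a name invented for this example.

```python
import re
import zlib

HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\s*")  # "<number> <generation> obj"
STREAM = re.compile(rb"stream\r?\n")             # keyword that starts the binary data
LENGTH = re.compile(rb"/Length\s+(\d+)")


def split_object(raw):
    """Return (object number, raw dictionary bytes, stream bytes or None)."""
    header = HEADER.match(raw)
    if header is None:
        raise ValueError("not an indirect object")
    number = int(header.group(1))
    rest = raw[header.end():]

    marker = STREAM.search(rest)
    if marker is None:
        # No binary block: everything before 'endobj' is the dictionary.
        return number, rest.rsplit(b"endobj", 1)[0].strip(), None

    dictionary = rest[:marker.start()].strip()
    length = LENGTH.search(dictionary)
    if length is None:
        raise ValueError("this sketch needs a direct /Length entry")
    start = marker.end()
    return number, dictionary, rest[start:start + int(length.group(1))]


# A made-up object whose stream is compressed with FlateDecode (zlib/deflate).
payload = zlib.compress(b"BT (Hello) Tj ET")
raw = (b"7 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(payload)
       + payload + b"\nendstream\nendobj\n")

number, dictionary, data = split_object(raw)
text = zlib.decompress(data)
```

Reading exactly `/Length` bytes, rather than searching for `endstream`, matters because the binary data may happen to contain any byte sequence.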
  40. PDF images are not TIFF, PNG or JPEG
  42. A word on colour
  43. A word on colour: DeviceRGB, CalRGB, DeviceCMYK, ICC, Separation, DeviceN, DeviceGray, CalGray, Lab, Pattern
  56. PDF pages are ‘drawn’
       • 0 0 0 1 k: set CMYK colour of text to black
       • BT: start of some text
       • /T1_0 1 Tf: use the font defined as T1_0 elsewhere
       • 0 Tc 0 Tw 0 Ts 100 Tz 0 Tr: set other text properties
       • 7.5003 0 0 7.5003 272.1643 540.2979 Tm: position onscreen
       • (L*) Tj: draw the text L*
       • /T1_1 1 Tf: change font
       • 0.856 0 Td: move to a different location onscreen
       • ( = 100) Tj: draw the text = 100
       • -0.324 -1.133 Td: move to a different location onscreen
       • [(whit)6(e)] TJ: draw the text white (adjusting the spacing between t and e)
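
The commands on this slide use postfix notation: the operands come first and the operator comes last. A small Python sketch of how a reader might group them is shown below. It is a simplified tokenizer written for this example only (no nested parentheses, arrays kept as one raw token, keywords such as `true` not handled), and the names `TOKEN` and `parse_content_stream` are invented here.

```python
import re

TOKEN = re.compile(rb"""
    \((?:\\.|[^\\()])*\)        # literal string, e.g. (L*)
  | \[[^\]]*\]                  # array, kept as one raw token
  | /[^\s/\[\]()<>]+            # name, e.g. /T1_0
  | [-+]?(?:\d+\.?\d*|\.\d+)    # number
  | [A-Za-z'"*]+                # operator keyword, e.g. Tf, Tj, BT
""", re.VERBOSE)

# The drawing commands from the slide, with whitespace restored between tokens.
SAMPLE = b"""0 0 0 1 k
BT
/T1_0 1 Tf
0 Tc 0 Tw 0 Ts 100 Tz 0 Tr
7.5003 0 0 7.5003 272.1643 540.2979 Tm
(L*) Tj
/T1_1 1 Tf
0.856 0 Td
( = 100) Tj
-0.324 -1.133 Td
[(whit)6(e)] TJ
"""


def parse_content_stream(data):
    """Group tokens into (operator, operands) pairs in drawing order."""
    operations, operands = [], []
    for match in TOKEN.finditer(data):
        token = match.group().decode("latin-1")
        if token[0].isalpha() or token in ("'", '"'):
            operations.append((token, operands))  # operator closes the group
            operands = []
        else:
            operands.append(token)                # operand waits for its operator
    return operations


if __name__ == "__main__":
    for operator, operands in parse_content_stream(SAMPLE):
        print(operator, operands)
```

Because each operator only affects the current graphics or text state, a renderer can walk this list once from top to bottom and draw as it goes.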
  58. PDF myth - files are cross platform: Only if you create them properly...
  59. Obfuscation for idiots! No-one will be able to guess the secret password
  60. 20 seconds later... And the password is...
  61. Lastly a plea: Not all PDF creation tools are equal
  62. In summary
