2011 11-mozcamp

Presentation given at #mozcamp 2011 in Berlin about PDF.JS

Published in: Art & Photos, Technology

  1. 1. PDF.JS • Julian Viereck • @jviereck • jviereck.dev@gmail.com
  5. 5. Bespin, Skywriter, Ace • Firefox DevTools • ETH Zurich • ? • PDF.JS
  6. 6. Overview
      • What is PDF.JS
      • How PDF is structured
      • Processing in PDF.JS
      • Images & Fonts
      • Infrastructure
      • Problems & Todos
      • Demo
  12. 12. What is PDF.JS
      • building faithful & efficient PDF renderer
      • HTML5 technology experiment
      • no native code
      • secure (web sandbox)
      • Mozilla Labs Project - Open Source
  13. 13. Most vulnerable programs Source: http://www.csis.dk/en/csis/news/3321
  18. 18. How PDF is structured (PDF file)
      • Header: PDF version
      • Body [Objects]: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)
      • xRef Table: mapping objID ➟ byte offset
      • Trailer: root objID, xRef byte offset
      • root obj = ref to pages catalog
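The header and trailer described on this slide can be located with plain string matching. The following is a minimal sketch, assuming a classic (uncompressed) xRef table and a single trailer; `readPdfLayout` is a hypothetical helper for illustration, not a PDF.JS function.

```javascript
// Sketch only: readPdfLayout is a hypothetical helper, not part of PDF.JS.
function bytesToString(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

function readPdfLayout(bytes) {
  const text = bytesToString(bytes);

  // Header: "%PDF-<major>.<minor>" at the very start of the file.
  const header = /^%PDF-(\d+\.\d+)/.exec(text);
  if (!header) throw new Error('missing %PDF header');

  // The last "startxref" keyword is followed by the xRef table's byte offset.
  const pos = text.lastIndexOf('startxref');
  const offset = pos === -1 ? null : /^startxref\s+(\d+)/.exec(text.slice(pos));
  if (!offset) throw new Error('missing startxref');

  // Trailer dictionary: "/Root <objID> <generation> R" names the root object.
  const root = /\/Root\s+(\d+)\s+(\d+)\s+R/.exec(text);

  return {
    version: header[1],
    xrefOffset: parseInt(offset[1], 10),
    rootObjId: root ? parseInt(root[1], 10) : null
  };
}
```

With `xrefOffset` in hand, a reader can jump straight to the xRef table and from there to any object by ID, which is why a PDF is usually read from the end of the file rather than from the start.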
  29. 29. Processing in PDF.JS
      • get plain Uint8Array via XHR2, build Stream
      • new PDFDoc(stream): read xRef, root object
      • page = PDFDoc.getPage(N)
      • page.startRendering(graphics)
          • read & convert all PDF cmds ➟ IR (Intermediate Representation), done by PartialEvaluator
          • load required objects (fonts, images)
          • graphics.executeIR(IR), done by CanvasGraphics
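The first step above (bytes via XHR2, wrapped in a stream) can be sketched as follows. `ByteStream` and `fetchPdfBytes` are hypothetical names chosen for this example and are not claimed to match the PDF.JS `Stream` class; only `XMLHttpRequest` with `responseType = 'arraybuffer'` is a standard browser API.

```javascript
// Sketch only: ByteStream and fetchPdfBytes are hypothetical names for
// illustration; they are not claimed to match the PDF.JS Stream class.
class ByteStream {
  constructor(bytes) {
    this.bytes = bytes; // Uint8Array holding the whole file
    this.pos = 0;       // current read position
  }

  // Return the next byte, or -1 at the end of the data.
  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }

  // Return up to `length` bytes as a view on the same buffer (no copy).
  getBytes(length) {
    const end = Math.min(this.pos + length, this.bytes.length);
    const chunk = this.bytes.subarray(this.pos, end);
    this.pos = end;
    return chunk;
  }
}

// Browser-only part: XHR2 can deliver the response as an ArrayBuffer.
function fetchPdfBytes(url, onLoad) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.responseType = 'arraybuffer';
  xhr.onload = function () {
    onLoad(new ByteStream(new Uint8Array(xhr.response)));
  };
  xhr.send();
}
```

Keeping the data in a typed array means the parser reads numeric bytes directly instead of decoding characters from a string.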
  43. 43. Why IR? (diagram: Data, PartialEvaluator, Graphics)
      • “get page 2” ➟ PartialEvaluator
      • PartialEvaluator builds draw( obj#3, dict.x, dict.y )
      • drawing cmds ➟ Graphics
      • Graphics ➟ Data: obj#3? dict.x, .y?
      • Data ➟ Graphics: obj#3 = ”foo”, x = 20, y = 30
      • draw on canvas
  50. 50. Problem Processing
      • Extracting data slow (compressed)
      • Transform data (images) slow
      • Sometimes a lot of objects on page
      ➡ Freezes UI
      ➡ Use WebWorker
      ➡ :( no direct memory access, postMessage
  63. 63. Main Thread | Web Worker (diagram)
      • Main Thread: Data ➟ data ➟ Web Worker: Data
      • “get page 2” ➟ PartialEvaluator (in the Web Worker)
      • PartialEvaluator builds draw( obj#3, dict.x, dict.y ), resolved to draw( “foo”, 20, 30 ) = IR
      • IR cmds ➟ Graphics (on the Main Thread)
      • draw on canvas
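The point of this diagram is that the IR contains only plain values, so it can cross the postMessage boundary and be drawn without access to the document data. Below is a minimal sketch of that idea; `resolveRefs`, the data layout and the command shape are assumptions made for illustration, and a JSON round trip stands in for the copy that postMessage performs.

```javascript
// Sketch only: resolveRefs, the data layout and the command shape are
// illustrative assumptions, not the PDF.JS worker protocol.

// "Worker side": the document data, reachable only where the file was parsed.
const data = {
  objects: { 3: 'foo' },
  dict: { x: 20, y: 30 }
};

// Replace object references and dictionary lookups with plain values.
function resolveRefs(cmd, data) {
  const args = cmd.args.map(function (arg) {
    if (arg.ref !== undefined) return data.objects[arg.ref];
    if (arg.key !== undefined) return data.dict[arg.key];
    return arg;
  });
  return { fn: cmd.fn, args: args };
}

// draw( obj#3, dict.x, dict.y ) as built by the evaluator.
const unresolved = { fn: 'draw', args: [{ ref: 3 }, { key: 'x' }, { key: 'y' }] };
const ir = [resolveRefs(unresolved, data)];

// postMessage copies its argument; a JSON round trip stands in for that copy.
const received = JSON.parse(JSON.stringify(ir));

// "Main thread side": execute the IR without touching `data`.
function executeIR(ir, graphics) {
  ir.forEach(function (cmd) {
    graphics[cmd.fn].apply(graphics, cmd.args);
  });
}

const log = [];
executeIR(received, {
  draw: function (text, x, y) {
    log.push('draw ' + text + ' at ' + x + ',' + y);
  }
});
console.log(log[0]); // draw foo at 20,30
```

Because every argument is resolved before the message is sent, the main thread never has to ask the worker for `obj#3` or `dict.x` while it is drawing.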
  70. 70. PartialEvaluator (xRef, catalog, + resources): PDF cmds ➟ IR ➟ Graphics
      PDF cmds                 IR
      5 0 obj
      << /Length 8 0 R >>
      stream
      /GS1 gs                  setGState: [ LW: 10 ]
                               dependency: [ font0 ]
      /F0 12 Tf                setFont: font0, 12
      BT                       beginText
      100 700 Td               moveText: 100, 700
      (Hello World!) Tj        showText: “Hello World!”
      ET                       endText
      50 600 m                 moveTo: 50, 600
      400 600 l                lineTo: 400, 600
      S                        stroke
      endstream
      endobj
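The translation from postfix PDF operators to named IR commands can be sketched with a toy translator. This is an illustration under stated assumptions, not the PDF.JS PartialEvaluator: the `OPERATORS` table covers only the operators on the slide, `toIR` is a hypothetical name, string escapes are not handled, and resource names such as `GS1` and `F0` are passed through rather than resolved to `[ LW: 10 ]` or `font0`.

```javascript
// Sketch only: a toy translator for the operators shown on the slide. The
// operator table and toIR are illustrative, not the PDF.JS PartialEvaluator.
const OPERATORS = {
  gs: 'setGState', Tf: 'setFont', BT: 'beginText', Td: 'moveText',
  Tj: 'showText', ET: 'endText', m: 'moveTo', l: 'lineTo', S: 'stroke'
};

function toIR(content) {
  // Tokens: "(literal strings)", "/names", numbers and operator keywords.
  const tokens = content.match(/\([^)]*\)|\/[^\s()\/]+|[^\s()\/]+/g) || [];
  const ir = [];
  let args = [];
  for (const tok of tokens) {
    if (tok[0] === '(') {
      args.push(tok.slice(1, -1));       // string operand
    } else if (tok[0] === '/') {
      args.push(tok.slice(1));           // name operand
    } else if (/^[+-]?(\d+\.?\d*|\.\d+)$/.test(tok)) {
      args.push(parseFloat(tok));        // numeric operand
    } else if (Object.prototype.hasOwnProperty.call(OPERATORS, tok)) {
      ir.push([OPERATORS[tok], args]);   // operator consumes pending operands
      args = [];
    } else {
      throw new Error('unsupported operator: ' + tok);
    }
  }
  return ir;
}
```

PDF content streams are postfix: operands come first and the operator last, so the translator only needs to collect operands until it meets an operator keyword.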
  78. 78. Images
      • JPEG streams:
          • DOMImg.src = 'data:image/jpeg;base64,' + window.btoa(bytesToString(bytes));
      • If not JPEG stream:
          • read bytes, convert to colorspace
          • imgData = canvas.getImageData()
          • fillWithPixelData(bytes, imgData)
          • canvas.putImageData(imgData)
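Both image paths can be sketched as small pure functions. The function names `bytesToString` and `fillWithPixelData` come from the slide, but their bodies here are assumptions; `jpegDataUrl` and `toBase64` are hypothetical helpers, and the pixel copy assumes packed 8-bit RGB input.

```javascript
// Sketch only: function names follow the slide where possible; the bodies,
// jpegDataUrl and toBase64 are assumptions made for illustration.
function bytesToString(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// Use the browser's btoa when present; fall back to Buffer outside a browser.
function toBase64(binaryString) {
  return typeof btoa === 'function'
    ? btoa(binaryString)
    : Buffer.from(binaryString, 'binary').toString('base64');
}

// JPEG path: hand the untouched stream to the browser's decoder as a data URL.
function jpegDataUrl(bytes) {
  return 'data:image/jpeg;base64,' + toBase64(bytesToString(bytes));
}

// Non-JPEG path: copy packed RGB bytes into the RGBA layout of ImageData.data.
function fillWithPixelData(rgbBytes, imgData) {
  const out = imgData.data;
  for (let src = 0, dst = 0; src < rgbBytes.length; src += 3, dst += 4) {
    out[dst] = rgbBytes[src];
    out[dst + 1] = rgbBytes[src + 1];
    out[dst + 2] = rgbBytes[src + 2];
    out[dst + 3] = 255; // fully opaque
  }
}
```

In a browser, `imgData` would come from the 2D context, for example `ctx.getImageData(0, 0, width, height)`, and be written back with `ctx.putImageData(imgData, 0, 0)`.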
  84. 84. Jpeg, but...
      • no native support for CMYK Jpeg
          ➡ use JS implementation
      • no native support for Jpeg 2000
          ➡ use Emscripten: C-Lib ➟ JS
          ‣ works, but not that performant
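Once a CMYK image has been decoded in JavaScript, its pixels still have to be turned into RGB before they can go on a canvas. The sketch below uses the naive, profile-free formula and assumes 8-bit samples where 255 means full ink; `cmykToRgb` is a hypothetical helper, and the slides do not say which conversion PDF.JS uses.

```javascript
// Sketch only: the naive, profile-free CMYK to RGB formula. This is an
// illustrative assumption, not the conversion PDF.JS ships.
function cmykToRgb(cmyk) {
  const pixels = cmyk.length / 4;
  const rgb = new Uint8ClampedArray(pixels * 3);
  for (let i = 0; i < pixels; i++) {
    // Normalize each ink amount to the range 0..1.
    const c = cmyk[i * 4] / 255;
    const m = cmyk[i * 4 + 1] / 255;
    const y = cmyk[i * 4 + 2] / 255;
    const k = cmyk[i * 4 + 3] / 255;
    // Each RGB channel is what remains after its ink and black are removed.
    rgb[i * 3] = 255 * (1 - c) * (1 - k);
    rgb[i * 3 + 1] = 255 * (1 - m) * (1 - k);
    rgb[i * 3 + 2] = 255 * (1 - y) * (1 - k);
  }
  return rgb;
}
```

A loop like this runs once per pixel, which is one reason image transformation shows up among the slow steps listed earlier.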
  90. 90. Fonts
      • There are lots of different font formats!
          • fonts are converted to OpenType
          • use CSS: @font-face { font-family:font0; src:url(data:font/opentype;base64, ...) }
      • some fonts can’t be converted :(
          • paint them
  92. 92. Fonts
      • Type I: convert to Type II
      • Type II: “use directly” (still need to repair fonts!)
      • Type III: paint ourselves
      • CID: convert to Type II
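The @font-face approach from the Fonts slides amounts to building a CSS rule around the converted font bytes. Here is a minimal sketch, assuming the bytes are already valid OpenType; `fontFaceRule` and `toBase64` are hypothetical helpers, not PDF.JS functions.

```javascript
// Sketch only: fontFaceRule and toBase64 are hypothetical helpers.
function toBase64(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  // Use the browser's btoa when present; fall back to Buffer outside a browser.
  return typeof btoa === 'function'
    ? btoa(s)
    : Buffer.from(s, 'binary').toString('base64');
}

// Build the CSS rule that exposes the converted font under `family`.
function fontFaceRule(family, otfBytes) {
  return '@font-face { font-family:' + family + '; ' +
    'src:url(data:font/opentype;base64,' + toBase64(otfBytes) + '); }';
}
```

In a browser the rule could be attached with a `style` element (`style.textContent = rule; document.head.appendChild(style)`), after which `font0` is usable as a canvas font name once the browser has finished loading it.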
  101. 101. Infrastructure
      • Using GitHub
          • Issue Tracker
          • Pull Requests
          • Wiki
      • Update gh-pages on every push
      • Testing:
          • In Pull Request: “@pdfjsbot test”
          • Runs tests on EC2 instance
  107. 107. Infrastructure
      • AreWePdfYet?
          • Take top100 PDFs from Google
          • render the first 5 pages each
          • compare to Preview
          • http://people.mozilla.com/~bdahl/corpusreport/test/ref/
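Comparing a rendered page against a reference image comes down to counting pixels that differ. The slides do not say how AreWePdfYet scores a comparison, so the following is only a sketch: `countDifferentPixels` and its `tolerance` parameter are assumptions, and both inputs are taken to be RGBA byte arrays of equal size.

```javascript
// Sketch only: countDifferentPixels and the tolerance parameter are
// assumptions; the slides do not say how the comparison is scored.
function countDifferentPixels(a, b, tolerance) {
  if (a.length !== b.length) throw new Error('image sizes differ');
  let different = 0;
  for (let i = 0; i < a.length; i += 4) {
    // Compare the four RGBA channels of one pixel.
    for (let ch = 0; ch < 4; ch++) {
      if (Math.abs(a[i + ch] - b[i + ch]) > tolerance) {
        different++;
        break; // count each pixel at most once
      }
    }
  }
  return different;
}
```

A small tolerance absorbs anti-aliasing differences between renderers, so only visibly different pixels are counted.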
  108. 108. Todo = Help :)
  109. 109. Worker Canvas
  110. 110. Read-Only Memory Web Worker
  111. 111. Faster Canvas Rendering
  112. 112. CMYK Jpeg Jpeg2000
  113. 113. Font Load Event
  114. 114. WebPrint API
  115. 115. XHR Range Support
  116. 116. Font Support
  117. 117. Parallel Web Worker
  118. 118. SVG Backend (text selection [Gecko])
  119. 119. “HTML5” Backend
  120. 120. Search | Selection | Copy
  121. 121. Input Forms
  122. 122. More Parts Of Spec
  123. 123. Improve Viewer
  124. 124. Perf & Memory Analysis
  125. 125. Improve Test Infrastructure
  126. 126. More Testing!
  127. 127. More Testing
      • use PDF.JS extension!
      • http://mozilla.github.com/pdf.js/extensions/firefox/pdf.js.xpi
      • report broken PDFs!
      • help us categorize issues
  128. 128. Feedback Feature
  129. 129. Demo
  130. 130. Contact
      • GitHub: https://github.com/mozilla/pdf.js (Readme, Issues, Wiki)
      • Twitter: @pdfjs
      • Mailing List: https://groups.google.com/group/mozilla.dev.pdf-js/topics
      • IRC: irc.mozilla.org #pdfjs
      • Engineering Weekly Call: Thursday - 10:00am PDT, 17:00 UTC
  131. 131. Q & A
