PDF.JS at SwissJeese 2012


  1. 1. Julian Viereck @jviereck +julian.viereck
  2. 2. Overview
        • (5) What is PDF.JS about
        • (10) How PDF is structured & processing in PDF.JS
        • (15) “Why are you doing this?”
        • (5) Firefox Integration
        • (5) What’s next?
        • (15) Demo
        • (5) Q & A
  3. 3. About me: Bespin / Skywriter / Ace · PDF.JS · Firefox DevTools · ETH Zurich (Physics) · ?
  4. 4. PDF Viewer using Open Web Standards
  5. 5. What is PDF.JS
        • building a faithful & efficient PDF viewer
        • HTML5 technology experiment
        • no native code
        • secure (web sandbox)
        • Mozilla Labs project, open source (GitHub)
  6. 6. What is PDF.JS
        • not Firefox-specific: works in all modern browsers
        • 1.3 MB uncompressed JS
        • ~33,000 lines of code
        • viewer in different languages
        • async API
  7. 7. How PDF is structured
        • Header: PDF version
        • Body: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)
        • xRef Table: mapping objID ➟ byte offset
        • Trailer: root objID, xRef byte offset
        • PDF file root obj = ref to pages catalog
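
The four parts listed on this slide can be illustrated with a very small, hand-built file. The sketch below is not PDF.JS code: `buildTinyPdf` and `readStructure` are hypothetical helpers, and the reader assumes a single classic xRef section with one subsection (real files may use several subsections, xRef streams, or incremental updates). It reads the trailer's xRef byte offset, walks the xRef table, and follows the root objID to the catalog object.

```javascript
// Build a tiny PDF-like file so that the byte offsets in the xRef table are exact.
// Everything is ASCII, so string indices equal byte offsets.
function buildTinyPdf() {
  const objects = [
    '1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n',
    '2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n',
  ];
  let file = '%PDF-1.4\n'; // Header: PDF version
  const offsets = [];
  for (const obj of objects) { // Body: sequence of objects
    offsets.push(file.length);
    file += obj;
  }
  const xrefOffset = file.length;
  file += 'xref\n0 ' + (objects.length + 1) + '\n'; // xRef table: objID -> byte offset
  file += '0000000000 65535 f \n';
  for (const off of offsets) {
    file += String(off).padStart(10, '0') + ' 00000 n \n';
  }
  // Trailer: root objID and the xRef byte offset
  file += 'trailer\n<< /Size ' + (objects.length + 1) + ' /Root 1 0 R >>\n';
  file += 'startxref\n' + xrefOffset + '\n%%EOF\n';
  return file;
}

// Simplified reader: assumes one xRef section with a single subsection.
function readStructure(file) {
  const version = /^%PDF-(\d\.\d)/.exec(file)[1];
  const xrefOffset = Number(/startxref\s+(\d+)\s+%%EOF\s*$/.exec(file)[1]);
  const lines = file.slice(xrefOffset).split('\n');
  // lines[0] is 'xref', lines[1] is '<first objID> <entry count>'
  const [first, count] = lines[1].split(' ').map(Number);
  const table = new Map();
  for (let i = 0; i < count; i++) {
    const [offset, , flag] = lines[2 + i].trim().split(' ');
    if (flag === 'n') table.set(first + i, Number(offset)); // skip free entries
  }
  const rootId = Number(/\/Root\s+(\d+)\s+\d+\s+R/.exec(file.slice(xrefOffset))[1]);
  const rootOffset = table.get(rootId);
  const rootObj = file.slice(rootOffset, file.indexOf('endobj', rootOffset));
  return { version, xrefOffset, table, rootId, rootObj };
}

const pdf = buildTinyPdf();
const info = readStructure(pdf);
console.log(info.version, info.rootId, info.table.get(info.rootId));
console.log(info.rootObj);
```

Reading from the end of the file first is the point of the design: the trailer says where the xRef table is, and the table says where each object starts, so a reader never has to scan the whole body.
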
  8. 8. Let’s look at it
  9. 9. Processing in PDF.JS
        • get plain Uint8Array via XHR2, build Stream
        • new PDFDoc(stream): read xRef, root object
        • page = PDFDoc.getPage(N)
        • page.startRendering(graphics)
            • read & convert all PDF cmds ➟ Operation List (OL) [PartialEvaluator]
            • load required objects (fonts, images)
            • graphics.executeOperatorList(OL) [CanvasGraphics]
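
The last step, `graphics.executeOperatorList(OL)`, can be sketched as a simple dispatch loop. This is an illustration only, not the PDF.JS implementation: `SketchGraphics`, `RecordingContext`, and the `{ fn, args }` entry shape are assumptions made for the example, and the coordinate flip between PDF space and canvas space is left out. The recording context stands in for a real 2D canvas context so the sketch runs without a browser.

```javascript
// Stand-in for a 2D canvas context that only records the calls it receives.
class RecordingContext {
  constructor() { this.calls = []; }
  moveTo(x, y) { this.calls.push(['moveTo', x, y]); }
  lineTo(x, y) { this.calls.push(['lineTo', x, y]); }
  stroke() { this.calls.push(['stroke']); }
  fillText(text, x, y) { this.calls.push(['fillText', text, x, y]); }
}

// Hypothetical graphics backend: maps operator names to drawing calls.
class SketchGraphics {
  constructor(ctx) {
    this.ctx = ctx;
    this.textX = 0;
    this.textY = 0;
    this.operators = {
      beginText: () => { this.textX = 0; this.textY = 0; },
      moveText: (dx, dy) => { this.textX += dx; this.textY += dy; },
      showText: (text) => this.ctx.fillText(text, this.textX, this.textY),
      endText: () => {},
      moveTo: (x, y) => this.ctx.moveTo(x, y),
      lineTo: (x, y) => this.ctx.lineTo(x, y),
      stroke: () => this.ctx.stroke(),
    };
  }

  // Walk the operator list in order and dispatch each entry to its handler.
  executeOperatorList(operatorList) {
    for (const { fn, args } of operatorList) {
      if (!Object.prototype.hasOwnProperty.call(this.operators, fn)) {
        throw new Error('Unknown operator: ' + fn);
      }
      this.operators[fn](...args);
    }
  }
}

const ctx = new RecordingContext();
const graphics = new SketchGraphics(ctx);
graphics.executeOperatorList([
  { fn: 'beginText', args: [] },
  { fn: 'moveText', args: [100, 700] },
  { fn: 'showText', args: ['Hello World!'] },
  { fn: 'endText', args: [] },
  { fn: 'moveTo', args: [50, 600] },
  { fn: 'lineTo', args: [400, 600] },
  { fn: 'stroke', args: [] },
]);
console.log(ctx.calls);
```

Keeping the operator list as plain data is what makes the later worker split possible: a list of names and arguments can be produced in one place and executed in another.
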
  10. 10. Execution Example
          • “get page 2”
          • PartialEvaluator builds drawing cmds: draw(obj#3, dict.x, dict.y)
          • Data lookups: obj#3? ➟ obj#3 = “foo”; dict.x, .y? ➟ x = 20, y = 30
          • Graphics: draw on canvas
  11. 11. Problem: Processing
          • Extracting data is slow (compressed)
          • Transforming data (images) is slow
          • Sometimes a lot of objects on a page
          ➡ Freezes UI
          ➡ Use WebWorker
          ➡ :( no direct memory access, only postMessage
  12. 12. Main Web Thread | Worker
          • Main Web Thread: “get page 2” ➟ Worker
          • Worker: PartialEvaluator reads Data, builds Operation List: draw(obj#3, dict.x, dict.y)
          • Worker ➟ Main Web Thread: Op List + Data: draw(“foo”, 20, 30)
          • Main Web Thread: Graphics draws on canvas
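
A minimal sketch of the split on this slide, under stated assumptions: the "worker" is an ordinary function here, and `postMessageCopy` imitates `postMessage` with a JSON round trip (the real mechanism is the structured clone algorithm). The names `documentData`, `resolve`, and `workerHandleMessage` are hypothetical. The point is that the worker resolves every reference before replying, because the main thread cannot reach into the worker's memory.

```javascript
// "Worker" side: owns the parsed document data.
const documentData = { 'obj#3': 'foo', dict: { x: 20, y: 30 } };

// 'dict.x' -> documentData.dict.x, 'obj#3' -> documentData['obj#3']
function resolve(ref) {
  return ref.split('.').reduce((value, key) => value[key], documentData);
}

function workerHandleMessage(message) {
  if (message.type !== 'getPage') {
    throw new Error('Unexpected message: ' + message.type);
  }
  // Unresolved form, as on the slide: draw(obj#3, dict.x, dict.y)
  const unresolved = [{ fn: 'draw', refs: ['obj#3', 'dict.x', 'dict.y'] }];
  // Resolved form sent back: draw("foo", 20, 30)
  const operatorList = unresolved.map((op) => ({
    fn: op.fn,
    args: op.refs.map(resolve),
  }));
  return { type: 'operatorList', page: message.page, operatorList };
}

// Stand-in for postMessage: the receiver only ever gets a copy of the message.
function postMessageCopy(message) {
  return JSON.parse(JSON.stringify(message));
}

// "Main thread" side: only draws, never touches documentData.
const drawn = [];
const graphics = { draw: (text, x, y) => drawn.push({ text, x, y }) };

const request = postMessageCopy({ type: 'getPage', page: 2 });
const reply = postMessageCopy(workerHandleMessage(request));
for (const { fn, args } of reply.operatorList) {
  graphics[fn](...args);
}
console.log(drawn);
```
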
  13. 13. xRef, catalog ➟ PartialEvaluator ➟ OL + resources ➟ Graphics
          5 0 obj
          << /Length 8 0 R >>
          stream
            /GS1 gs             ➟ setGState: [ LW: 10 ]
            /F0 12 Tf           ➟ dependency: [ font0 ]
                                  setFont: font0, 12
            BT                  ➟ beginText
            100 700 Td          ➟ moveText: 100, 700
            (Hello World!) Tj   ➟ showText: “Hello World!”
            ET                  ➟ endText
            50 600 m            ➟ moveTo: 50, 600
            400 600 l           ➟ lineTo: 400, 600
            S                   ➟ stroke
          endstream
          endobj
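
The mapping on this slide can be reproduced with a small tokenizer. This is a sketch, not the real PartialEvaluator: `tokenize`, `evaluate`, and the shape of `resources` are assumptions for illustration, string literals are handled without escapes or nested parentheses, and only the nine operators from the slide are known.

```javascript
// PDF content stream operators from the slide and the names used in the OL.
const OPERATOR_NAMES = {
  gs: 'setGState', Tf: 'setFont', BT: 'beginText', Td: 'moveText',
  Tj: 'showText', ET: 'endText', m: 'moveTo', l: 'lineTo', S: 'stroke',
};

// Split a content stream into strings, names, numbers, and operators.
function tokenize(stream) {
  const re = /\(([^)]*)\)|\/([^\s/()]+)|(-?\d*\.?\d+)|([A-Za-z*'"]+)/g;
  const tokens = [];
  let m;
  while ((m = re.exec(stream)) !== null) {
    if (m[1] !== undefined) tokens.push({ type: 'string', value: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: 'name', value: m[2] });
    else if (m[3] !== undefined) tokens.push({ type: 'number', value: Number(m[3]) });
    else tokens.push({ type: 'operator', value: m[4] });
  }
  return tokens;
}

// Operands precede their operator; collect them until an operator shows up.
function evaluate(stream, resources) {
  const operatorList = [];
  let operands = [];
  for (const token of tokenize(stream)) {
    if (token.type !== 'operator') {
      operands.push(token.value);
      continue;
    }
    const fn = OPERATOR_NAMES[token.value];
    if (!fn) throw new Error('Unsupported operator: ' + token.value);
    let args = operands;
    if (fn === 'setGState') {
      args = [resources.ExtGState[operands[0]]]; // name -> graphics state dict
    } else if (fn === 'setFont') {
      const fontId = resources.Font[operands[0]]; // name -> loaded font id
      operatorList.push({ fn: 'dependency', args: [fontId] });
      args = [fontId, operands[1]];
    }
    operatorList.push({ fn, args });
    operands = [];
  }
  return operatorList;
}

const contentStream = [
  '/GS1 gs', '/F0 12 Tf', 'BT', '100 700 Td',
  '(Hello World!) Tj', 'ET', '50 600 m', '400 600 l', 'S',
].join('\n');

// Assumed resource dictionary, matching the values shown on the slide.
const resources = { ExtGState: { GS1: { LW: 10 } }, Font: { F0: 'font0' } };

const operatorList = evaluate(contentStream, resources);
for (const { fn, args } of operatorList) {
  console.log(fn + ': ' + JSON.stringify(args));
}
```
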
  14. 14. Images
          • JPEG streams:
              • DOMImg.src = 'data:image/jpeg;base64,' + window.btoa(bytesToString(bytes));
          • If not a JPEG stream:
              • read bytes, convert to colorspace
              • imgData = canvas.getImageData()
              • fillWithPixelData(bytes, imgData)
              • canvas.putImageData(imgData)
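
Both paths on this slide can be sketched without a browser. The helper bodies below are assumptions, not the PDF.JS versions: `bytesToString` and `fillWithPixelData` are named on the slide but their implementation is not shown, and the `imgData` object is a plain stand-in for what `getImageData()` returns. The pixel path assumes the bytes are already converted to 8-bit RGB. `btoa` is available in browsers and as a global in recent Node.js versions.

```javascript
// Turn raw bytes into a "binary string" (one char per byte), as btoa expects.
function bytesToString(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

// JPEG path: hand the untouched stream to the browser's own decoder.
function jpegDataUrl(bytes) {
  return 'data:image/jpeg;base64,' + btoa(bytesToString(bytes));
}

// Non-JPEG path: copy decoded RGB bytes into the RGBA layout of ImageData.
function fillWithPixelData(rgbBytes, imgData) {
  const out = imgData.data;
  for (let src = 0, dst = 0; src < rgbBytes.length; src += 3, dst += 4) {
    out[dst] = rgbBytes[src];         // red
    out[dst + 1] = rgbBytes[src + 1]; // green
    out[dst + 2] = rgbBytes[src + 2]; // blue
    out[dst + 3] = 255;               // fully opaque
  }
}

// First bytes of a JPEG stream (the start-of-image marker plus one more byte).
const url = jpegDataUrl(new Uint8Array([0xff, 0xd8, 0xff]));
console.log(url);

// Stand-in for canvas image data: a 2x1 image, 4 bytes per pixel.
const imgData = { width: 2, height: 1, data: new Uint8ClampedArray(2 * 1 * 4) };
fillWithPixelData(new Uint8Array([255, 0, 0, 0, 0, 255]), imgData);
console.log(Array.from(imgData.data));
```

In a page, the first result would be assigned to an image element's `src`, and the second would be followed by a `putImageData` call on the canvas context.
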
  15. 15. JPEG, but...
          • no native support for JPEG 2000, CMYK
          ➡ use JS implementation
          ‣ works, not that performant but good enough
  16. 16. Fonts
          • There are lots of different font formats!
              • fonts are converted to OpenType
              • use CSS for loading: @font-face { font-family:font0; src:url(data:font/opentype;base64, ...) }
          • Fonts are sanitized by the browser
              • Need to rebuild malformed fonts :/
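
The CSS loading trick can be sketched as a function that builds the `@font-face` rule text from converted font bytes. `fontFaceRule` is a hypothetical helper, and the four bytes used below are only the `OTTO` signature that starts a CFF-flavoured OpenType file, not a usable font.

```javascript
// Build an @font-face rule that embeds the font as a base64 data URL.
function fontFaceRule(fontFamily, openTypeBytes) {
  let binary = '';
  for (const byte of openTypeBytes) {
    binary += String.fromCharCode(byte);
  }
  return '@font-face { font-family:' + fontFamily + '; ' +
         'src:url(data:font/opentype;base64,' + btoa(binary) + '); }';
}

// Placeholder bytes: 'OTTO', the signature of a CFF-based OpenType font.
const rule = fontFaceRule('font0', new Uint8Array([0x4f, 0x54, 0x54, 0x4f]));
console.log(rule);
```

In a browser, the resulting text could be added to a style sheet (for example with `CSSStyleSheet.insertRule`), after which canvas text drawn with `font0` uses the embedded font once the browser has accepted it.
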
  17. 17. “Why are you doing this?” aka. “There are C/C++ libraries. Isn’t that faster?”
  18. 18. “Performance is not the only measure”
  19. 19. 1. Security
  20. 20. Most vulnerable programs (Source: http://www.csis.dk/en/csis/news/3321)
  21. 21. ~25% of crashes in Firefox are plugin-related
  22. 22. 2. Web-Specific Viewer
  23. 23. 3. Drive Innovation
  24. 24. 4. Speed
  25. 25. 4. Speed
          • Rendering slower than C/C++
          • BUT
              • Partial downloading
              • Render page in background
              • Make slow become faster
              • Mostly: Good enough
  26. 26. 5. Can do better
  27. 27. 6. Push the Web Platform
  28. 28. B2G aka. Boot2Gecko
  29. 29. New API: Printing
          • Printing is very limited on the web right now
          • no way to achieve a native printing experience
          • NEED: New API for printing
              • mozPrintCallback
              • define canvas content during printing
              • send drawing commands directly to printer
  30. 30. Print · Web Page · Single Pages
  31. 31. Page 1
          • Find print canvas on page
          • Execute printCallback
          • All canvas done ➠ print page
          Page 2
  32. 32. canvas.mozPrintCallback
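
A rough sketch of the flow on the previous slides, under loud assumptions: `mozPrintCallback` is a non-standard Firefox API, and the callback shape used here (an argument with a `context` to draw on and a `done()` method to signal completion) is how it is remembered from Mozilla's documentation, so treat it as unverified. The `canvas` object and `simulatePrint` are hypothetical stand-ins so the sketch runs outside a browser; in Firefox the browser itself would call the callback while printing.

```javascript
// Hypothetical stand-in for a <canvas> element on the page.
const canvas = { width: 200, height: 100 };

// Define the canvas content at print time instead of at screen resolution.
canvas.mozPrintCallback = function (printState) {
  const ctx = printState.context; // context that targets the printer
  ctx.fillText('Page 1', 10, 20);
  printState.done();              // tell the browser this canvas is finished
};

// Simulated browser side: run every print canvas, print once all are done.
function simulatePrint(canvases) {
  const log = [];
  let pending = canvases.length;
  let printed = false;
  for (const c of canvases) {
    const printState = {
      context: { fillText: (text, x, y) => log.push(['fillText', text, x, y]) },
      done: () => {
        pending -= 1;
        if (pending === 0) printed = true; // "All canvas done -> print page"
      },
    };
    c.mozPrintCallback(printState);
  }
  return { log, printed };
}

const result = simulatePrint([canvas]);
console.log(result.printed, result.log);
```
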
  33. 33. Firefox Integration
  34. 34. Firefox Integration
          • PDF.JS as bundled add-on in Firefox Nightly
          • Getting into the Release Channel is hard
              • 400M users have expectations
              • more testing coverage
              • accessibility
              • match UX expectations
              • fallback if something is not working
  35. 35. Firefox Integration
          • Try to make it by the Aurora merge (6/5)
          • Firefox-specific, BUT
              • improving quality is browser-independent
              • only small parts are Firefox-specific
  36. 36. What’s next
          • Fix broken PDFs
          • Improve performance
          • Improve text selection
          • Text search
          • Form support
          • Printing support
  37. 37. Demo
  38. 38. Contributing
          • Lots of areas
              • Translation
              • Writing code (embeddable viewer?)
              • Testing (Firefox auto-update add-on)
  39. 39. GitHub: https://github.com/mozilla/pdf.js (Readme, Issues, Wiki)
          Twitter: @pdfjs
          Mailing List: https://groups.google.com/group/mozilla.dev.pdf-js/topics
          IRC: irc.mozilla.org #pdfjs
          Engineering Weekly Call: Thursday, 10:00am PDT
  40. 40. Q & A
