2011 11-mozcamp
 


Presentation given at #mozcamp 2011 in Berlin about PDF.JS

Usage Rights

© All Rights Reserved


    Presentation Transcript

    • PDF.JS: Julian Viereck, @jviereck, jviereck.dev@gmail.com
    • Bespin ➟ Skywriter ➟ Ace | Firefox DevTools | ETH Zurich | ? ➟ PDF.JS
    • Overview
      • What is PDF.JS
      • How PDF is structured
      • Processing in PDF.JS
      • Images & Fonts
      • Infrastructure
      • Problems & Todos
      • Demo
    • What is PDF.JS
      • building a faithful & efficient PDF renderer
      • HTML5 technology experiment
      • no native code
      • secure (web sandbox)
      • Mozilla Labs project, open source
    • Most vulnerable programs (source: http://www.csis.dk/en/csis/news/3321)
    • How PDF is structured (PDF file)
      • Header: PDF version
      • Body [Objects]: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)
      • xRef Table: mapping objID ➟ byte offset
      • Trailer: root objID, xRef byte offset (root obj = ref to pages catalog)
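
To make the xRef/trailer relationship concrete, here is a minimal sketch that finds `startxref` at the end of a file, jumps to the cross-reference table it points to, and builds the objID ➟ byte offset map. It assumes a single, uncompressed xref section with well-formed entries; real files can also use xref streams, incremental updates and damaged tables, which this ignores. `parseXRef` is a hypothetical helper, not the PDF.JS API.

```javascript
// Hypothetical helper (not the PDF.JS API): parse a classic, uncompressed xref table.
// Assumes one xref section and an ASCII/Latin-1 string, so char index == byte offset.
function parseXRef(text) {
  // The file ends with "startxref\n<byte offset of xref>\n%%EOF".
  const start = text.lastIndexOf("startxref");
  if (start < 0) throw new Error("no startxref found");
  const xrefOffset = parseInt(text.slice(start + "startxref".length).trim(), 10);

  // At that offset the table begins with the keyword "xref".
  const lines = text.slice(xrefOffset).split(/\r\n|\r|\n/);
  if (lines[0].trim() !== "xref") throw new Error("xref keyword expected");

  const offsets = new Map(); // objID -> byte offset
  let i = 1;
  // Each subsection starts with "<first objID> <count>", followed by <count> entries
  // of the form "<10-digit offset> <5-digit generation> <n|f>".
  while (i < lines.length && /^\d+ \d+$/.test(lines[i].trim())) {
    const [first, count] = lines[i].trim().split(" ").map(Number);
    i++;
    for (let k = 0; k < count; k++, i++) {
      const [offset, , type] = lines[i].trim().split(" ");
      if (type === "n") offsets.set(first + k, Number(offset)); // "f" marks a free entry
    }
  }
  return offsets;
}

// Build a tiny PDF in memory so the byte offsets are exact.
const objs = [
  "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
  "2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
];
let pdf = "%PDF-1.4\n";
const objOffsets = objs.map((o) => {
  const off = pdf.length;
  pdf += o;
  return off;
});
const xrefStart = pdf.length;
pdf +=
  "xref\n0 3\n0000000000 65535 f \n" +
  objOffsets.map((o) => String(o).padStart(10, "0") + " 00000 n \n").join("") +
  "trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n" + xrefStart + "\n%%EOF\n";

const xref = parseXRef(pdf);
console.log(xref.get(1)); // 9, right after the 9-byte "%PDF-1.4\n" header
```

The trailer's `/Root 1 0 R` then names the catalog object, whose offset the map provides.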
    • Processing in PDF.JS
      • get plain Uint8Array via XHR2, build Stream
      • new PDFDoc(stream): read xRef, root object
      • page = PDFDoc.getPage(N)
      • page.startRendering(graphics)
        • read & convert all PDF cmds ➟ IR, the Intermediate Representation (PartialEvaluator)
        • load required objects (fonts, images)
        • graphics.executeIR(IR) (CanvasGraphics)
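
The first step wraps the downloaded bytes in a Stream that the parser reads from. A minimal sketch, assuming the bytes are already in a Uint8Array (in a browser, from an XHR with `responseType = "arraybuffer"`); the `ByteStream` class and its method names are illustrative, not the actual PDF.JS Stream class.

```javascript
// Hypothetical minimal byte stream over a Uint8Array; illustrative, not PDF.JS's Stream class.
// In a 2011-era browser the bytes would come from XHR2, e.g.:
//   xhr.responseType = "arraybuffer";
//   xhr.onload = () => { const bytes = new Uint8Array(xhr.response); /* ... */ };
class ByteStream {
  constructor(bytes, start = 0, length = bytes.length - start) {
    this.bytes = bytes;
    this.start = start;
    this.end = start + length;
    this.pos = start;
  }
  // Next byte, or -1 at the end of the stream.
  getByte() {
    return this.pos < this.end ? this.bytes[this.pos++] : -1;
  }
  // Next byte without consuming it.
  peekByte() {
    return this.pos < this.end ? this.bytes[this.pos] : -1;
  }
  // Up to n bytes as a subarray view (no copy).
  getBytes(n) {
    const stop = Math.min(this.pos + n, this.end);
    const out = this.bytes.subarray(this.pos, stop);
    this.pos = stop;
    return out;
  }
  // Jump to an offset relative to the stream start, e.g. the xRef offset from the trailer.
  moveTo(offset) {
    this.pos = this.start + offset;
  }
  // A view onto part of the same buffer, e.g. one object's data; nothing is copied.
  makeSubStream(offset, length) {
    return new ByteStream(this.bytes, this.start + offset, length);
  }
}

const bytes = new TextEncoder().encode("%PDF-1.4\n1 0 obj");
const stream = new ByteStream(bytes);
console.log(String.fromCharCode(...stream.getBytes(8))); // %PDF-1.4
```

Sub-streams share the underlying buffer, so parsing individual objects does not copy the file.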
    • Why IR? (Data, Partial Evaluator, Graphics)
      • “get page 2” ➟ Partial Evaluator builds draw( obj#3, dict.x, dict.y ) ➟ drawing cmds for Graphics
      • Graphics asks the Data: obj#3? dict.x, .y? ➟ obj#3 = “foo”, x = 20, y = 30
      • draw on canvas
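
The point of the IR is that every reference is resolved before Graphics sees a command, so Graphics never has to query the PDF data. A minimal sketch under simplified assumptions: the "data" is a plain object, a reference is modeled as `{ ref: n }` and a dict lookup as `{ dictKey: name }`; these shapes and the function names are hypothetical, not PDF.JS's.

```javascript
// Hypothetical shapes: { ref: 3 } stands for "obj#3", { dictKey: "x" } for "dict.x".
const data = {
  objects: { 3: "foo" },
  dict: { x: 20, y: 30 },
};

// Partial evaluation: replace every reference by its value.
function resolveArg(arg, data) {
  if (arg && typeof arg === "object" && "ref" in arg) return data.objects[arg.ref];
  if (arg && typeof arg === "object" && "dictKey" in arg) return data.dict[arg.dictKey];
  return arg; // literals pass through unchanged
}

// Build plain IR: operator names plus fully resolved arguments.
function buildIR(commands, data) {
  return commands.map(({ op, args }) => ({ op, args: args.map((a) => resolveArg(a, data)) }));
}

// Graphics only interprets IR and never touches `data`.
// Appending to `log` stands in for drawing on a canvas.
function executeIR(ir, log) {
  for (const { op, args } of ir) {
    log.push(`${op}(${args.map((a) => JSON.stringify(a)).join(", ")})`);
  }
}

const page2 = [{ op: "draw", args: [{ ref: 3 }, { dictKey: "x" }, { dictKey: "y" }] }];
const ir = buildIR(page2, data);
const drawn = [];
executeIR(ir, drawn);
console.log(drawn[0]); // draw("foo", 20, 30)
```

Because `ir` is plain data, it can be built in one place and executed in another, which is what the worker split relies on.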
    • Problem: Processing
      • extracting data is slow (compressed)
      • transforming data (images) is slow
      • sometimes there are a lot of objects on a page
      ➡ freezes the UI
      ➡ use a WebWorker
      ➡ :( no direct memory access, only postMessage
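
Since a worker shares no memory with the page, everything crosses the boundary through postMessage, which copies its argument using the structured clone algorithm, or moves ArrayBuffers listed in its transfer list. The sketch below uses `structuredClone()`, which exposes the same algorithm synchronously (modern browsers, Node 17+), to show both behaviours without starting a worker; the IR object and buffer size are illustrative.

```javascript
// structuredClone() runs the same structured clone algorithm that postMessage uses.

// 1) Copy semantics: the receiver gets an independent copy of the IR.
const ir = [{ op: "showText", args: ["Hello World!"] }];
const received = structuredClone(ir); // like worker.postMessage(ir)
ir[0].args[0] = "changed on the sender side";
console.log(received[0].args[0]); // Hello World!

// 2) Transfer semantics: an ArrayBuffer in the transfer list is moved, not copied.
const pixels = new Uint8Array(4 * 1024 * 1024); // e.g. decoded image data, 4 MiB
const moved = structuredClone(pixels, { transfer: [pixels.buffer] });
// Browser equivalent: worker.postMessage(pixels, [pixels.buffer]);
console.log(moved.byteLength);  // 4194304: the receiver owns the bytes now
console.log(pixels.byteLength); // 0: the sender's view is detached
```

Copying is what makes large payloads expensive; transferring avoids the copy but the sender loses access to the buffer.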
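    • Main Web Thread vs. Worker
      • data is sent from the Main Web Thread to the Worker
      • “get page 2” ➟ Partial Evaluator (Worker) builds draw( obj#3, dict.x, dict.y ) and resolves it into the IR draw( “foo”, 20, 30 )
      • IR cmds are sent to Graphics (Main Web Thread) ➟ draw on canvas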
    • From PDF to IR: PartialEvaluator (uses xRef, catalog + resources) ➟ IR ➟ Graphics
        PDF object 5                IR
        5 0 obj
        << /Length 8 0 R >>
        stream
        /GS1 gs                     setGState: [ LW: 10 ]
                                    dependency: [ font0 ]
        /F0 12 Tf                   setFont: font0, 12
        BT                          beginText
        100 700 Td                  moveText: 100, 700
        (Hello World!) Tj           showText: “Hello World!”
        ET                          endText
        50 600 m                    moveTo: 50, 600
        400 600 l                   lineTo: 400, 600
        S                           stroke
        endstream
        endobj
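
The slide pairs each content-stream operator with an IR command. A toy version of that translation, assuming only the operators in this example, whitespace-separated tokens, and `(...)` strings without escapes; `opNames` uses the slide's IR names, while `resources` is a hypothetical stand-in for the page resources that give /GS1 and /F0 their meaning. This is not PDF.JS's parser.

```javascript
// Toy content-stream ➟ IR translation using the slide's IR names (not PDF.JS code).
// Assumes whitespace-separated tokens and (...) strings without escapes or nesting.
const opNames = {
  gs: "setGState", Tf: "setFont", BT: "beginText", Td: "moveText",
  Tj: "showText", ET: "endText", m: "moveTo", l: "lineTo", S: "stroke",
};

// Hypothetical page resources: what /GS1 and /F0 refer to.
const resources = {
  ExtGState: { GS1: { LW: 10 } },
  Font: { F0: "font0" },
};

// Split the stream into operands ((strings), /Names, numbers) and operators.
function tokenize(src) {
  const re = /\(([^)]*)\)|\/(\S+)|(-?\d*\.?\d+)|([A-Za-z*'"]+)/g;
  const tokens = [];
  for (const [, str, name, num, op] of src.matchAll(re)) {
    if (str !== undefined) tokens.push({ str });
    else if (name !== undefined) tokens.push({ name });
    else if (num !== undefined) tokens.push(Number(num));
    else tokens.push({ op });
  }
  return tokens;
}

// Operands are collected until an operator arrives, then one IR command is emitted.
function toIR(src, resources) {
  const ir = [];
  let operands = [];
  for (const t of tokenize(src)) {
    if (typeof t === "number" || !("op" in t)) {
      operands.push(t);
      continue;
    }
    const name = opNames[t.op];
    if (!name) throw new Error(`unsupported operator ${t.op}`);
    let args = operands.map((o) => (typeof o === "number" ? o : o.str ?? o.name));
    if (t.op === "gs") args = [resources.ExtGState[args[0]]]; // resolve /GS1
    if (t.op === "Tf") {
      const font = resources.Font[args[0]]; // resolve /F0
      ir.push({ op: "dependency", args: [font] }); // the font must be loaded first
      args = [font, args[1]];
    }
    ir.push({ op: name, args });
    operands = [];
  }
  return ir;
}

const stream = "/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET 50 600 m 400 600 l S";
for (const { op, args } of toIR(stream, resources)) console.log(op, JSON.stringify(args));
```

Emitting `dependency` before `setFont` lets the main thread wait for the font before executing the text commands.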
    • Images
      • JPEG streams:
        • DOMImg.src = "data:image/jpeg;base64," + window.btoa(bytesToString(bytes));
      • If not a JPEG stream:
        • read bytes, convert from the image's colorspace to RGB
        • imgData = ctx.getImageData(0, 0, width, height)   (ctx = canvas.getContext("2d"))
        • fillWithPixelData(bytes, imgData)
        • ctx.putImageData(imgData, 0, 0)
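
Both image paths fit in a few lines. A sketch, assuming 8-bit DeviceRGB samples for the non-JPEG case; `bytesToString` and `fillWithPixelData` are named after the slide, but their bodies (and `jpegDataURL`) are illustrative, not PDF.JS's code. In a browser, `target` would be `imgData.data` from `getImageData`; here a plain Uint8ClampedArray stands in so the sketch runs anywhere.

```javascript
// Turn raw bytes into a "binary string" (one char per byte) for btoa().
function bytesToString(bytes) {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// JPEG path: let the browser decode the stream via a data: URL.
function jpegDataURL(bytes) {
  return "data:image/jpeg;base64," + btoa(bytesToString(bytes));
}

// Non-JPEG path: copy 8-bit RGB samples into RGBA pixel data.
// `target` has the layout of ImageData.data: width * height * 4 bytes.
function fillWithPixelData(rgb, target) {
  for (let src = 0, dst = 0; src < rgb.length; src += 3, dst += 4) {
    target[dst] = rgb[src];         // R
    target[dst + 1] = rgb[src + 1]; // G
    target[dst + 2] = rgb[src + 2]; // B
    target[dst + 3] = 255;          // fully opaque
  }
}

// Browser usage (not run here):
//   const imgData = ctx.getImageData(0, 0, width, height);
//   fillWithPixelData(rgbBytes, imgData.data);
//   ctx.putImageData(imgData, 0, 0);

console.log(jpegDataURL(new Uint8Array([0xff, 0xd8]))); // data:image/jpeg;base64,/9g=
const pixels = new Uint8ClampedArray(2 * 4); // 2 pixels
fillWithPixelData(new Uint8Array([255, 0, 0, 0, 0, 255]), pixels);
console.log(Array.from(pixels).join(",")); // 255,0,0,255,0,0,255,255
```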
    • JPEG, but...
      • no native support for CMYK JPEG
        ➡ use a JS implementation
      • no native support for JPEG 2000
        ➡ use Emscripten: C lib ➟ JS
        ‣ works, but not very fast
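
After a JS JPEG decoder has produced CMYK samples, they still need converting to the RGBA a canvas understands. A minimal sketch using the simple, non-color-managed formula R = (255 - C) * (255 - K) / 255 (likewise for G and B); that formula is an assumption for illustration, real renderers typically use more accurate conversions, and `cmykToRgba` is a hypothetical helper.

```javascript
// Hypothetical helper: naive CMYK -> RGBA conversion (no ICC color management).
// Input: 4 bytes per pixel (C, M, Y, K); 0 = no ink, 255 = full ink.
function cmykToRgba(cmyk) {
  const out = new Uint8ClampedArray(cmyk.length); // 4 bytes in, 4 bytes out per pixel
  for (let i = 0; i < cmyk.length; i += 4) {
    const k = 255 - cmyk[i + 3];                  // light left after the black ink
    out[i] = ((255 - cmyk[i]) * k) / 255;         // R
    out[i + 1] = ((255 - cmyk[i + 1]) * k) / 255; // G
    out[i + 2] = ((255 - cmyk[i + 2]) * k) / 255; // B
    out[i + 3] = 255;                             // opaque
  }
  return out;
}

// White (no ink), pure cyan, and full black:
const rgba = cmykToRgba(new Uint8Array([0, 0, 0, 0, 255, 0, 0, 0, 0, 0, 0, 255]));
console.log(Array.from(rgba).join(",")); // 255,255,255,255,0,255,255,255,0,0,0,255
```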
    • Fonts
      • There are lots of different font formats!
        • fonts are converted to OpenType
        • use CSS: @font-face { font-family: font0; src: url(data:font/opentype;base64, ...) }
      • some fonts can’t be converted :(
        • paint them
    • Fonts
      • Type I: convert to Type II
      • Type II: “use directly” (still need to repair fonts!)
      • Type III: paint ourselves
      • CID: convert to Type II
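
Once a font has been converted to OpenType, loading it amounts to emitting the `@font-face` rule with a data: URL from the Fonts slide. A sketch, assuming the converted font is already a Uint8Array; `fontFaceRule` is a hypothetical helper, and in a browser the string would be placed in a `<style>` element.

```javascript
// Hypothetical helper: build the @font-face rule for a converted OpenType font.
function fontFaceRule(fontName, otfBytes) {
  let binary = "";
  for (const b of otfBytes) binary += String.fromCharCode(b); // one char per byte for btoa()
  const url = "data:font/opentype;base64," + btoa(binary);
  return `@font-face { font-family: "${fontName}"; src: url(${url}); }`;
}

// Browser usage (not run here):
//   const style = document.createElement("style");
//   style.textContent = fontFaceRule("font0", convertedBytes);
//   document.head.appendChild(style);
//   ctx.font = '12px "font0"'; // fillText() can then use the embedded glyphs

// 0x4f 0x54 0x54 0x4f is "OTTO", the tag that starts a CFF-flavored OpenType file.
console.log(fontFaceRule("font0", new Uint8Array([0x4f, 0x54, 0x54, 0x4f])));
// @font-face { font-family: "font0"; src: url(data:font/opentype;base64,T1RUTw==); }
```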
    • Infrastructure
      • Using GitHub
        • Issue Tracker
        • Pull Requests
        • Wiki
      • Update gh-pages on every push
      • Testing:
        • In a Pull Request: “@pdfjsbot test”
        • Runs tests on an EC2 instance
    • Infrastructure
      • AreWePdfYet?
        • take the top 100 PDFs from Google
        • render the first 5 pages of each
        • compare to Preview
        • http://people.mozilla.com/~bdahl/corpusreport/test/ref/
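
Comparing a rendering against a reference image can be as simple as counting pixels that differ by more than a tolerance. A sketch, assuming both renderings are RGBA buffers of the same size; `diffPixels` and its tolerance parameter are illustrative assumptions, not how the AreWePdfYet report actually scores pages.

```javascript
// Hypothetical reference-test helper: count pixels where any channel
// differs by more than `tolerance` between two same-sized RGBA buffers.
function diffPixels(actual, expected, tolerance = 0) {
  if (actual.length !== expected.length) throw new Error("size mismatch");
  let differing = 0;
  for (let i = 0; i < actual.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(actual[i + c] - expected[i + c]) > tolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing;
}

const reference = new Uint8ClampedArray([255, 255, 255, 255, 0, 0, 0, 255]);
const rendered = new Uint8ClampedArray([250, 255, 255, 255, 40, 0, 0, 255]);
console.log(diffPixels(rendered, reference, 8)); // 1: only the second pixel is off by more than 8
```

A small tolerance keeps anti-aliasing noise from flagging every page, while real regressions still show up as differing pixels.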
    • Todo = Help :)
    • Worker Canvas
    • Read-Only Memory Web Worker
    • Faster Canvas Rendering
    • CMYK JPEG, JPEG 2000
    • Font Load Event
    • WebPrint API
    • XHR Range Support
    • Font Support
    • Parallel Web Worker
    • SVG Backend (text selection [Gecko])
    • “HTML5” Backend
    • Search | Selection | Copy
    • Input Forms
    • More Parts Of Spec
    • Improve Viewer
    • Perf & Memory Analysis
    • Improve Test Infrastructure
    • More Testing!
      • use the PDF.JS extension!
        • http://mozilla.github.com/pdf.js/extensions/firefox/pdf.js.xpi
      • report broken PDFs!
      • help us categorize issues
    • Feedback Feature
    • Demo
    • GitHub: https://github.com/mozilla/pdf.js (Readme, Issues, Wiki)
      Twitter: @pdfjs
      Mailing List: https://groups.google.com/group/mozilla.dev.pdf-js/topics
      IRC: irc.mozilla.org #pdfjs
      Engineering Weekly Call: Thursday, 10:00am PDT (17:00 UTC)
    • Q & A