PDF in Smalltalk

ESUG 2011, Edinburgh

Transcript of "PDF in Smalltalk"

  1. PDF in Smalltalk
     Christian Haider
  2. Introduction
     • PDF is
       – a graphics Model
       – a document Format
  3. Graphics
     • 2D Vector Graphics
     • Mathematical
       – Paths
       – Coordinate transformations
     • Dominant Model
       – PostScript, SVG, …
     • Advanced
       – Transparency
  4. Documents
     • Faithful Reproduction
       – Abstracts from OSs and Printers
       – Fonts are embedded
     • Elaborate Object Model for Documents
       – Interactive
       – Linkable graphics Content
     • No execution Model
       – no programming like PostScript
  5. Standard
     • ISO 32000-1:2008 Standard
       – PDF 1.7 (Acrobat 8)
       – Last Standard; progress through extensions
     • ~ 750 Pages
       – 79 indispensable References
     • Well written
       – a must-have for doing anything with PDF
  6. Open Source
     • PDF is important
     • PDF is there
     • PDF is big
     • PDF is free: MIT Licence
  7. Overview
     • File format
       – Updates
     • Object Model
       – Object Types
       – Document Structure
     • Graphics
       – Vector Graphics
       – Text and Fonts
       – Transparency
  8. File Structure
     • Header
     • List of Objects
     • Reference Table
       – File Position of each Object
     • Trailer
       – Reference Table Size and Location
       – /Root

         %PDF-1.4
         endobj
         5 0 obj
           (A String)
         endobj
         6 0 obj
         …
         0000000081 00000 n
         0000000248 00001 n
         0000000000 00000 f
         trailer
         << /Size 22
            /Root 1 0 R >>
         startxref
         18799
         %%EOF

     [Show minimal]
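The four parts named on this slide can be sketched in a few lines. The following is a minimal illustration in Python, not the Smalltalk library from the talk; `build_pdf`, `xref_position` and the three object bodies are hypothetical names and values chosen for the example. It writes a header, the objects, one reference table with the file position of each object, and a trailer, then reads the table position back from `startxref`. Object 0 gets generation 65535, the usual value for the head of the free list.

```python
def build_pdf(objects, root=1):
    """Serialize object bodies (numbered from 1) into a file with one xref table."""
    out = bytearray(b"%PDF-1.4\n")                      # header
    offsets = []
    for number, body in enumerate(objects, start=1):    # list of objects
        offsets.append(len(out))                        # file position of this object
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    table_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)         # reference table
    out += b"0000000000 65535 f \n"                     # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset             # 20 bytes per entry
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (len(objects) + 1, root)
    out += b"startxref\n%d\n%%%%EOF\n" % table_position
    return bytes(out)


def xref_position(pdf):
    """Read the number that follows the last 'startxref' keyword."""
    tail = pdf[pdf.rindex(b"startxref") + len(b"startxref"):]
    return int(tail.split()[0])


# Hypothetical minimal document: catalog, page tree, one empty A4 page.
minimal = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] >>",
])
```

A reader works backwards: it finds `startxref` at the end of the file, jumps to the table, and uses the offsets to load objects on demand.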
  9. Updates
     • Original stays unchanged
       – Can be signed
     • New Objects are appended
     • Objects can be overwritten
       – Versions
     • New XRef Table for new Objects
     • Can be Many

     File layout (diagram), top to bottom:
       Original PDF
       New/changed Objects
       New XRef Table
       New/changed Objects
       New XRef Table
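As a rough illustration of the append-only scheme on this slide, the Python sketch below adds changed objects and a new reference table behind an existing file without touching its bytes. It is an assumption-laden example, not the talk's library: `append_update` is a hypothetical helper, `original` is a stand-in byte string rather than a real PDF, and the values 18799 and 22 are placeholders for the previous table position and size. The `/Prev` entry in the new trailer is what links the new table to the older one.

```python
def append_update(original, changed, prev_xref, size, root=1):
    """Append new or overwritten objects plus a new xref section to a file."""
    out = bytearray(original)                 # the original bytes stay unchanged
    offsets = {}
    for number in sorted(changed):
        offsets[number] = len(out)
        out += b"%d 0 obj\n" % number + changed[number] + b"\nendobj\n"
    table_position = len(out)
    out += b"xref\n"
    for number, offset in offsets.items():
        # one subsection per object: "<first number> <count>", then the entry
        out += b"%d 1\n%010d 00000 n \n" % (number, offset)
    new_size = max(size, max(changed) + 1)
    out += b"trailer\n<< /Size %d /Root %d 0 R /Prev %d >>\n" % (
        new_size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % table_position
    return bytes(out)


# Stand-in for an existing file; the offsets below are placeholders.
original = b"%PDF-1.4\n% ... objects, table and trailer of the original ...\n%%EOF\n"
updated = append_update(original, {5: b"(A new String)"}, prev_xref=18799, size=22)
```

A reader starts at the last `startxref`, looks an object up in the newest table first, and follows `/Prev` to older tables only when the object is not listed there.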
  10. + / -
      • Can
        – Reading any valid PDF
          • Updated PDFs (many XRef tables)
        – Writing Objects as new File
          • Only 1 XRef Table
      • Can't do
        – Recreating XRef Table
        – Updating PDFs with incremental Changes
        – Linearizing for the Web
  11. Object Model
      • Basic Values
        – null, true, false
        – Numbers: 42 3.14 +7.5 -.3
          • Integer or Real; only decimal, no exponents
        – Strings: (a String) (with \n new Line) (with char \245) <901FA3>
          • Encoding: PDFDoc, Font, Unicode
          • Date (UTC String): (D:201108241030+0200)
        – Names: /Root /with#20space
          • Like Smalltalk Symbols
      • Arrays: [3.14 (Pi) [/Math]]
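Two of the value forms on this slide need a little decoding before they are usable. The Python sketch below shows one way to do it; `decode_name` and `parse_date` are hypothetical helpers, and `parse_date` only accepts the short form shown on the slide (minutes precision and a `+HHmm` or `-HHmm` offset), not every variant the specification allows.

```python
import re
from datetime import datetime, timedelta, timezone


def decode_name(token):
    """Turn a name token such as '/with#20space' into its text."""
    if not token.startswith("/"):
        raise ValueError("a PDF name starts with '/'")
    # '#' followed by two hex digits stands for one character code
    return re.sub(r"#([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), token[1:])


def parse_date(text):
    """Parse 'D:YYYYMMDDHHmm' followed by '+HHmm' or '-HHmm'."""
    match = re.fullmatch(
        r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})([+-])(\d{2})(\d{2})", text)
    if match is None:
        raise ValueError(f"unsupported date string: {text!r}")
    year, month, day, hour, minute = (int(g) for g in match.groups()[:5])
    sign = 1 if match.group(6) == "+" else -1
    offset = sign * timedelta(hours=int(match.group(7)),
                              minutes=int(match.group(8)))
    return datetime(year, month, day, hour, minute, tzinfo=timezone(offset))


name = decode_name("/with#20space")
stamp = parse_date("D:201108241030+0200")
```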
  12. Dictionaries

          << /name (a String)
             /id 12345
             /properties << /active 6 0 R >>
          >>

      • Unordered collection of Associations
      • Unique Names as Keys
      • Values are either Objects or References
      • Null cannot be a Value (same as absent Key)
      • The Root of all other object Types
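The rules on this slide translate into a short serializer. The sketch below is Python and purely illustrative; `Name`, `Ref`, `num` and `write` are hypothetical names, not the classes of the talk's library. Python strings become PDF strings, dictionary keys become names, and a `None` value is dropped from a dictionary because null is the same as an absent key. Escaping of special characters inside names is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    text: str


@dataclass(frozen=True)
class Ref:
    number: int
    generation: int = 0


def num(value):
    """Format a number in plain decimal notation, without exponent."""
    if isinstance(value, int):
        return str(value)
    return f"{value:.4f}".rstrip("0").rstrip(".")


def write(value):
    """Serialize a Python value into PDF object syntax."""
    if value is None:
        return "null"
    if isinstance(value, bool):               # test bool first: True is also an int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return num(value)
    if isinstance(value, Name):
        return "/" + value.text
    if isinstance(value, Ref):
        return f"{value.number} {value.generation} R"
    if isinstance(value, str):
        escaped = (value.replace("\\", "\\\\")
                        .replace("(", "\\(").replace(")", "\\)"))
        return f"({escaped})"
    if isinstance(value, list):
        return "[" + " ".join(write(item) for item in value) + "]"
    if isinstance(value, dict):
        pairs = [f"/{key} {write(item)}"
                 for key, item in value.items() if item is not None]
        return "<< " + " ".join(pairs) + " >>"
    raise TypeError(f"no PDF form for {type(value).__name__}")


example = write({"name": "a String", "id": 12345,
                 "properties": {"active": Ref(6)}, "gone": None})
```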
  13. Streams
      • Dictionary with arbitrary data
        – Dictionary must be direct
        – Unlimited data
        – Must be indirect

          << /Length 10 >>
          stream
          (a String)
          endstream

      • Can have Filters to compress or encrypt
        – Cascaded -> [/FlateDecode /Crypt]

          << /Length 1835
             /Filter /FlateDecode >>
          stream
          …Binary content…
          endstream

      • XRefStreams
        – Replaces XRef Tables
        – Very compact

          << /Type /XRef
             /Size …
             /Root … >>

      • Object Streams
  14. Stream Filter
      • Compression
        – /FlateDecode       % zlib (smaller), everywhere, Predictor
        – /LZWDecode         % LZW (faster), Predictor
        – /RunLengthDecode
        – /CCITTFaxDecode    % B/W Pictures
        – /JBIG2Decode       % B/W Pictures
        – /DCTDecode         % JPEG (approximates)
        – /JPXDecode         % JPEG2000 (can be lossless)
      • /Crypt
      • Development
        – /ASCIIHexDecode
        – /ASCII85Decode
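For the most common filter, /FlateDecode, a round trip can be sketched with the zlib module of the Python standard library. The helper names `flate_stream` and `read_flate_stream` are made up for this example, and the reader is deliberately naive: it assumes a direct `/Length` value and a single filter, and it ignores predictors.

```python
import zlib


def flate_stream(data):
    """Wrap data in a stream whose dictionary declares /FlateDecode."""
    packed = zlib.compress(data)
    header = b"<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
    return header + packed + b"\nendstream"


def read_flate_stream(obj):
    """Undo flate_stream: read /Length, cut the data out, decompress it."""
    header, rest = obj.split(b"stream\n", 1)      # first 'stream' is the keyword
    length = int(header.split(b"/Length")[1].split()[0])
    return zlib.decompress(rest[:length])         # /Length counts the packed bytes


content = b"BT /F13 12 Tf 288 720 Td (Hello World) Tj ET\n" * 20
stream = flate_stream(content)
```

Note that `/Length` is the size of the encoded data in the file, not of the decoded content, which is why the reader slices before it decompresses.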
  15. Implementation
      • PDF Classes in Smalltalk
        – PDF Objects implement #content
        – Smalltalk Objects implement #asPDF
        – In separate namespace PDF
        – Same names as in the spec (if possible)
          • Dictionary, Array, String, Date etc.
        – Some Classes may be aliased
          • Name, Number, Boolean, null
        – Can be confusing
  16. + / -
      • Can
        – Read all object Types
        – Write any Object
        – Can use /FlateDecode for Reading and Writing
      • Cannot
        – No picture oriented stream filters
  17. Speaking PDF
      • With this, we can read any PDF
      • We can use PDF instead of Smalltalk
        – Would be cool to have that in Smalltalk…
      • We can specify the PDFs by configuring the Dictionaries
      • Domain Language PDF
  18. Object Model: Documents
      • /Root
        – /Type /Catalog    % required
        – /Pages
        – /Outlines
        – /StructTreeRoot
        – /Metadata         % XML
        – /Names
        – …
      • /Page(s)
        – /MediaBox [0 0 595 842]
        – /Contents         % Stream of graphics Operators
        – /Resources        % Fonts, Images, Color Spaces
      [create minimal]
  19. Domain Objects
      • Subclass of Dictionary or Stream
        – May be typed explicitly with /Type
          • TypedDictionary and TypedStream
        – Has Version
        – Has Documentation
      • Typed Attributes
        – Type(s)
        – direct or indirect
        – required/optional
        – Version
        – Documentation
  20. Typing
      • Explicit with /Type
      • Implied by attribute Type
        – specialized when assigning to an Attribute
      • Checks when reading
        – Checks compatibility => Error
        – Specializes Objects
      • Reads lazy
  21. PDF Explorer
      • A good Writer needs a good Reader
        – and vice versa
      • Shows the Contents of a PDF on the object Level
      • Uses meta Data about Attributes (Version, Doc, required etc.)
      [Show PDFExplorer]
  22. + / -
      • Can
        – Infer the implemented Types
        – Detect type Errors
        – Infer Version
        – Show Documentation
      • Cannot
        – Not all type restrictions are implemented
        – edit
      [Time: 30 min]
  23. Graphics
      • Stream of Operators with Parameters
      • Executed in sequence to produce Graphics
      • /GraphicsState
        – holds all (28) Attributes for the current Operation
        – Can be stacked (nested)
      • Operations (73)
        – 15 groups of Functionality
          • GraphicsState, Color, Marking…
          • Paths, clipping, Text, painting…
  24. Lines and Paths
      • Line

          0 0.5 0.5 0 K
          3 w
          10 100 m
          300 500 l
          S

      • Filled Rectangle

          0.5 0 0 0.5 k
          20 40 m
          20 80 l
          40 80 l
          40 40 l
          f

      [Create Graphics]
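Operator sequences like the two on this slide are easy to generate. The Python sketch below is only an illustration; `num`, `line` and `filled_rect` are hypothetical helpers and do not mirror the talk's Smalltalk API. Operands come first and the operator last: `K` and `k` set the stroke and fill colour in /DeviceCMYK, `w` the line width, `m` and `l` build the path, `S` strokes it and `f` fills it.

```python
def num(value):
    """Format a number in plain decimal notation, without exponent."""
    if isinstance(value, int):
        return str(value)
    return f"{value:.4f}".rstrip("0").rstrip(".")


def op(operator, *operands):
    """One line of a content stream: operands first, operator last."""
    return " ".join([num(operand) for operand in operands] + [operator])


def line(x0, y0, x1, y1, width, cmyk):
    """Stroke a straight line with the given width and CMYK stroke colour."""
    return "\n".join([
        op("K", *cmyk),
        op("w", width),
        op("m", x0, y0),
        op("l", x1, y1),
        "S",
    ])


def filled_rect(x, y, width, height, cmyk):
    """Fill a rectangle built from one move and three line segments."""
    return "\n".join([
        op("k", *cmyk),
        op("m", x, y),
        op("l", x, y + height),
        op("l", x + width, y + height),
        op("l", x + width, y),
        "f",                                  # filling closes the path implicitly
    ])


page_content = line(10, 100, 300, 500, 3, (0, 0.5, 0.5, 0)) + "\n" + \
    filled_rect(20, 40, 20, 40, (0.5, 0, 0, 0.5))
```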
  25. + / -
      • Have
        – Read and write Operations with Parameters
        – Bare Metal
        – Only /DeviceCMYK and /DeviceGray
      • Don't have
        – GraphicsState
        – Enforcing correct order of Operations
          • Examples: marking, text…
        – No /DeviceRGB or any other colour Spaces
        – Higher Abstractions (publicly)
          • Graphical Objects
          • Text Objects
  26. Text
      • Paints Chars from a Font

          BT
          /F13 12 Tf
          288 720 Td
          (Hello World) Tj
          ET

      • Needs /Font Resource
        – Type-1
        – TrueType
        – OpenType

          /Resources << /Font << /F13 23 0 R >> >>

          23 0 obj
          << /Type /Font
             /Subtype /Type1
             /BaseFont /Helvetica >>
          endobj

      [Create Text]
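The text object on this slide can be produced with a small helper. This is a Python sketch with hypothetical names (`pdf_string`, `show_text`), not the library's interface. It assumes the page's /Resources dictionary already maps the font name to a font object, and it escapes the three characters that would otherwise break a literal string.

```python
def pdf_string(text):
    """Write text as a literal string, escaping backslash and parentheses."""
    escaped = (text.replace("\\", "\\\\")
                   .replace("(", "\\(").replace(")", "\\)"))
    return f"({escaped})"


def show_text(font, size, x, y, text):
    """Text object: select font and size, position the text, paint the chars."""
    return "\n".join([
        "BT",
        f"/{font} {size} Tf",      # font is a key in /Resources /Font
        f"{x} {y} Td",
        f"{pdf_string(text)} Tj",
        "ET",
    ])


hello = show_text("F13", 12, 288, 720, "Hello World")
```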
  27. About Fonts
      • Occupied me last Year
      • Varieties of vector Fonts
        – PostScript Type 1
        – TrueType
        – OpenType (PS / TT)
      • 14 PDF Standard Fonts (Type 1)
  28. • Font

          << /Type /Font
             /Subtype /Type1
             /BaseFont /DDPEFM+Tahoma
             /FirstChar 32
             /LastChar 169
             /Widths [278 …]
             /FontDescriptor 4 0 R
             /Encoding /WinAnsiEncoding >>

      • Descriptor

          4 0 obj
          << /Type /FontDescriptor
             /FontName /DDPEFM+Tahoma
             /Flags 32
             /FontBBox [-166 -225 1000 931]
             /ItalicAngle 0
             /Ascent 718
             /Descent -207
             /CapHeight 718
             /StemV 88
             /FontFile3 5 0 R >>

      • File

          5 0 obj
          << /Length 3723 /Subtype /Type1C >>
          stream
          …
          endstream

      [Create Text]
  29. + / -
      • Have
        – Font Explorer
        – OpenType (PostScript kind)
        – Type-1 (last minute implementation :-)
          • Standard 14 Fonts
          • Custom (one free example Font is included)
        – Tabular Glyphs
      • Don't have
        – TrueType, OpenType (TT)
        – Subsetting
          • Allows publishing custom graphics
        – Kerning, Ligatures
        – General way to access alternative Glyphs
        – Advanced Typography (as possible with OpenType)
      [Show FontExplorer]
  30. Transparency
      • More and more useful: Gradients, Shadows… and everywhere
      • Approach
        – Combine the colors from different layers
        – Usually done on pixel level
        – PDF on the graphics Level
      • How to?
        – Create Graphics with own contents stream
        – Paint Graphics onto another Graphics using the right attributes
  31. Implementation
      • Graphic Editor needs Screen Output
        – Fonts
        – Transparency
      • VisualWorks 7.8
      • Directly implemented in Windows GDI(+)
        – Text output with pixel level adjustments
        – Graphics (planned)
        – Only Windows
  32. + / -
      • Have
        – Font support for Windows
      • Don't have
        – Transparency
        – Font support for
          • TrueType
          • non-Windows platforms
  33. Documentation
      • Class Documentation from the Spec
      • Attribute Documentation from the Spec
      • Extracted Properties of Attributes and made them operational
      • Docuware – tight connection between doc and code
  34. Extending
      • Subclass (Typed)Dictionary or (Typed)Stream
        – Use name from the Spec
        – Add PDF Documentation to the class comment
      • Add Attributes
        – Add class method named with attribute Name
        – Add PDF Documentation as comment
        – Extract Pragmas from docu
        – Implement the access (with or without Default)
        – Add your Logic

          Pages
              <typeIndirect: #Pages>
              <required>
              <attribute: 4 documentation: 'The page tree node that shall be the root of the document’s page tree.'>
              ^self objectAt: #Pages

      [Show code]
  35. + / -
      • Have
        – Good places for Doc
        – Good operational Annotations
        – Easy to extend
      • Don't have
        – No class doc
        – No PDF Reference link
        – Not all dependencies are implemented
          • requiredIf: version = x and: attribute /y notNil
  36. Package Structure – load Order
      • Fonts
        – (Fonts for Windows)
      • PDF
      • Prerequisites
        – Values
  37. To do
      • Support porting
        – To Pharo, Squeak, VA, Smalltalk/X, Dolphin …
        – Problem with Namespaces, Pragmas?
      • Fonts
        – Subsetting, Kerning, Ligatures
      • PostScript Interpreter
      • GraphicsState
      • Smalltalk source parser for PDF
  38. Summary
  39. What do I have?
      • Writer for smallCharts
        – Driven by customer Demand
        – Vector Graphics with custom Fonts
      • Bare metal implementation
        – Strictly implementing the Spec
        – Object Model
      • Implementation in VisualWorks 7.8
        – On Windows
  40. What I don't have
      • Relaxed Reader
        – Not error tolerant at all (unlike Acrobat)
      • No Bitmaps, no Reports, no Tables
      • No Encryption, no signing
      • No non-latin Languages
      • No pluggable GraphicsContext
      • No rendering/painting
        – Acrobat
        – Ghostscript
      • No screen support for other Platforms
      • Ports to other Smalltalks
  41. Projects – What to do with it?
      • Vector graphics Editor
      • Online PDF Generation
      • PDF Tools and Verifier
      • Renderer
      • Embedding Viewer
        – Ghostscript / Acrobat
  42. References
      • PDF Specification
        http://www.adobe.com/devnet/pdf/pdf_reference.html
      • Project Page (Docs, Forum, FileOuts…)
        http://pdf4smalltalk.origo.ethz.ch/
      • Cincom Public Store
        http://www.cincomsmalltalk.com/CincomSmalltalkWiki/PostgreSQL+Access+Page