How to make Google Books at home

Published in: Technology

  1. 1. How to make Google Books at home
  2. 2. in Perl
  3. 3. at home
  4. 4. not to beat Google
  5. 5. What do we have on the Internet today?
  6. 6. Find a word
  7. 7. Find the words
  8. 8. Google
  9. 9. Show the page
  10. 10. Show the page and highlight the words
  11. 11. ... Russia ... ece yapc/coe ...
  12. 12. ... Russia ... ece yapc/coe ... Pushkin?
  13. 13. ... Russia ... ece yapc/coe ... YAPC?
  14. 14. ... Russia ... ece yapc/coe ... XIX?
  15. 15. ... Russia ... ece yapc/coe ... WTF?
  16. 16. ece yapc/coe
  17. 17. ece yapc/coe все царское ("all things tsarist")
  18. 18. все царское ece yapc/coe
  19. 19. Amazon
  20. 20. Guess the next screen
  21. 21. Text archive
  22. 22. Berlin
  23. 23. How to make it
  24. 24. PDF
  25. 25. PDF
  26. 26. (Black box) PDF
  27. 27. WEB PDF
  28. 28. Sample PDF use.perl.org/~andy.sh/journal
  29. 29. Work with PDF?
  30. 30. Work with PDF? No.
  31. 31. SVG
  32. 32. SVG Scalable vector graphics
  33. 33. SVG Scalable vector graphics http://www.w3.org/Graphics/SVG/
  34. 34. SVG is XML
  35. 35. SVG is XML XML::LibXML
  36. 36. SVG is XML XML::LibXML XPath
  37. 37. SVG is XML XML::LibXML XPath XSLT
  38. 38. PDF → http://www.pdftron.com/pdf2svg/ → SVG
  39. 39. $ ./pdf2svg book.pdf book.svg
  40. 40. Structure
  41. 41. Geometry
  42. 42. <g> </g>
  43. 43. <g>     <g>     </g> </g>
  44. 44. <g>     <g>     </g>     <g>     </g> </g>
  45. 45. <g>     <g>         <text>         </text>     </g>     <g>     </g> </g>
  46. 46. <g>     <g>         <text>         </text>         <text>         </text>     </g>     <g>     </g> </g>
  47. 47. <g>     <g>         <text>           <tspan>           </tspan>         </text>         <text>         </text>     </g>     <g>     </g> </g>
  48. 48. <g>
  49. 49. <text>
  50. 50. <text transform=...>
  51. 51. <text transform="matrix(1 0 0 -1 10 584)">
  52. 52. Page
  53. 53. Page g
  54. 54. Page g text
  55. 55. Page g text + transform
  56. 56. <tspan>
  57. 57. Page g text + transform tspan
  58. 58.
      my $transform = $node->findvalue('@transform');
      if ($transform =~ /matrix/) {
          # matrix(a b c d e f): capture a and d (scale), e and f (translation)
          my ($sx, $sy, $tx, $ty) = $transform =~ /matrix\((-?\d+(?:\.\d+)?) -?\d+(?:\.\d+)? -?\d+(?:\.\d+)? (-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)/;
          print "($sx, $sy, $tx, $ty)";
          $pos{x} = $sx * $tx;
          $pos{x} += $pos{pagew} if $sx < 0;   # mirrored horizontally
          $pos{y} = $sy * $ty;
          $pos{y} += $pos{pageh} if $sy < 0;   # mirrored vertically
          print " [$pos{x}, $pos{y}]";
      }
  59. 59. <tspan x="0,16.875,26.258,34.695,40.314,44.533,49.224,55.789,60.008,64.699" y="-0" class="ps00 ps23">What is it</tspan>
  60. 60. <tspan x="0,16.875,26.258,34.695,40.314,44.533,49.224,55.789,60.008,64.699" y="-0" class="ps00 ps23">What is it</tspan>
  61. 61. <tspan x="0,16.875,26.258,34.695,40.314,44.533,49.224,55.789,60.008,64.699" y="-0" class="ps00 ps23">What is it</tspan>
  62. 62. <tspan x="0,16.875,26.258,34.695,40.314,44.533,49.224,55.789,60.008,64.699" y="-0" class="ps00 ps23">What is it</tspan>
  63. 63. <tspan x="0,16.875,26.258,34.695,40.314,44.533,49.224,55.789,60.008,64.699" y="-0" class="ps00 ps23">What is it</tspan>
  64. 64. <tspan x="0,16.875,26.258,34.695,40.314,44.533,49.224,55.789,60.008,64.699" y="-0" class="ps00 ps23">What is it</tspan>
  65. 65. <tspan x="0,16.875,26.258,34.695,40.314,44.533,49.224,55.789,60.008,64.699" y="-0" class="ps00 ps23">What is it</tspan>
  66. 66. YAPC
  67. 67. <tspan>YAPC</tspan>
  68. 68. Y APC
  69. 69. Y APC
  70. 70. <tspan>Y</tspan> <tspan>APC</tspan>
  71. 71. Dictionary
  72. 72. mysql> select * from base where base like 'seek';
      +--------+------+-------+---------+
      | id     | base | rules | grammar |
      +--------+------+-------+---------+
      | 189785 | seek | GRSZ  |         |
      +--------+------+-------+---------+
  73. 73. mysql> select * from word where ref = 189785;
      +--------+---------+
      | ref    | word    |
      +--------+---------+
      | 189785 | seek    |
      | 189785 | seeking |
      | 189785 | seeker  |
      | 189785 | seeks   |
      | 189785 | seekers |
      +--------+---------+
  74. 74. YAPC attendee seeks where to drink after the evening talk.
  75. 75. Morphology YAPC attendee seeks where to drink after the evening talk.
  76. 76. Stop words YAPC attendee seeks where to drink after the evening talk.
  77. 77. YAPC attendee seeks where to drink after the evening talk.
  78. 78. yapc attendee seeks where to drink after the evening talk.
  79. 79. DEMO live demonstration at http://booksearch.andy.sh
  80. 80. __END__ Andrew Shitov http://andy.sh
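
Slides 59 to 65 show a tspan whose x attribute carries one offset per character of "What is it" (ten offsets, ten characters). A minimal sketch of how those offsets could be turned into per-word extents for highlighting follows. The helper names word_spans and _span are hypothetical and not from the talk, the fifth offset is read as 40.314, and in a full pipeline the two strings would presumably come from the SVG via XML::LibXML rather than from literals.

```perl
use strict;
use warnings;

# Hypothetical helper: split a tspan's text into words and pair each word
# with the x offsets of its first and last characters.
sub word_spans {
    my ($text, $x_attr) = @_;
    my @x     = split /\s*,\s*/, $x_attr;    # one offset per character
    my @chars = split //, $text;
    die "offset count does not match character count\n" unless @x == @chars;

    my @spans;
    my $start;                                # index where the current word begins
    for my $i (0 .. $#chars) {
        if ($chars[$i] =~ /\s/) {
            # Whitespace closes the word in progress, if there is one.
            push @spans, _span($text, \@x, $start, $i - 1) if defined $start;
            undef $start;
        }
        elsif (!defined $start) {
            $start = $i;
        }
    }
    push @spans, _span($text, \@x, $start, $#chars) if defined $start;
    return @spans;
}

# Hypothetical helper: build one record for the characters $from .. $to.
sub _span {
    my ($text, $x, $from, $to) = @_;
    return {
        word    => substr($text, $from, $to - $from + 1),
        x_start => $x->[$from],
        x_last  => $x->[$to],    # where the last glyph starts, not where it ends
    };
}

my @spans = word_spans(
    'What is it',
    '0,16.875,26.258,34.695,40.314,44.533,49.224,55.789,60.008,64.699',
);
printf "%-5s %s .. %s\n", @{$_}{qw(word x_start x_last)} for @spans;
```

The offsets are relative to the enclosing text element, so the position derived from its transform (slide 58) would still have to be added to place a highlight on the page. The width of the last glyph is not available from the x list alone.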

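Slides 71 to 78 reduce a sentence to index terms using a dictionary of base forms, a stop-word list, and lowercasing. The sketch below illustrates that idea under stated assumptions: the %base_of hash stands in for the MySQL word and base tables, the stop-word list is invented for the example, and index_terms is a hypothetical name.

```perl
use strict;
use warnings;

# Stand-in for the `word` table on slide 73: inflected form => base form.
my %base_of = map { $_ => 'seek' } qw(seek seeking seeker seeks seekers);

# Hypothetical stop-word list; the talk does not say which words it drops.
my %stop = map { $_ => 1 } qw(where to after the);

# Hypothetical helper: lowercase, tokenize, drop stop words,
# and replace each remaining word with its base form when one is known.
sub index_terms {
    my ($sentence) = @_;
    my @terms;
    for my $word (split /\W+/, lc $sentence) {
        next if $word eq '' or $stop{$word};
        push @terms, $base_of{$word} // $word;
    }
    return @terms;
}

print join(' ', index_terms('YAPC attendee seeks where to drink after the evening talk.')), "\n";
# prints: yapc attendee seek drink evening talk
```

Lowercasing happens first here so that both lookups can use lowercase keys. The slides list the steps in a different order, and the order the original code used is not stated.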