Cleaning plain text books with Text::Perfide::BookCleaner

Slides from a presentation about Text::Perfide::BookCleaner given at PtPW2011. T::P::BC is a Perl module created to clean books in plain text format, making them suitable for further automatic text processing activities.

Published in: Education, Technology
Cleaning plain text books with Text::Perfide::BookCleaner

  1. Cleaning plain text books with Text::Perfide::BookCleaner. André Santos (andrefs@cpan.org), September 23, 2011.
  2. Outline: 1. Introduction (Per-Fide, text alignment, books); 2. Text::Perfide::BookCleaner; 3. Conclusions, wish list and future work.
  6. Project Per-Fide
      • Joint venture between the Computer Science Department and the School of Humanities of the University of Minho
      • Portuguese in parallel with six languages: Español, Russian, Français, Italiano, Deutsch, English
      • Build parallel corpora that will establish a relation between Portuguese and the other six languages
  7. [Parallel] corpora
      • Corpora: collection of natural language texts
      • Parallel corpora: collection of natural language bitexts
      • Bitext: pair formed by a text in a given language and its translation in another language, frequently aligned
      • Alignment: mapping between the sentences/paragraphs/words of one text and the other
  12. Project Per-Fide
      • Original texts in the seven languages and their translations
      • Two main genres: contemporary fiction and non-fiction
      • Non-fiction: judicial, journalistic, religious, technical, ...
      • Fiction: contemporary novels and short stories
      • per-fide.di.uminho.pt
  18. Text alignment
      • Manual or automatic
      • Paragraph/sentence/word level
      • Automatic alignment tools/algorithms generally fall into three categories:
          • length based: “when two sentences correspond, the words in them also correspond”
          • lexical/dictionary based: relies on lexical information or dictionaries to perform the alignment
          • partial similarity (cognates) based: relies on occurrences of tokens graphically or otherwise identical (cognates)
  19. Text alignment, example. Table: extract of sentence-level alignment performed using Portuguese and Russian subtitles from the movie Tron.
  23. Books
      • Obtained directly from publishers or, if in the public domain, from Project Gutenberg and similar projects
      • Large variety of formats: PDF, MS Word, HTML, ebook formats, ...
      • If not already in plain text, they need to be converted before the alignment
      • This is where all the trouble starts!
  24. Book alignment problems
      • pagination: page numbers, headers, footers, ...
      • previous text formatting: sub/superscript, bold, italics, ...
      • sections
      • paragraphs
      • translineations and transpaginations
      • footnotes
      • text encoding
      • ...
  25. Book alignment problems, example (La Jangada, Jules Verne):
      (...) gaiement. Sur le devant s<92>’ouvrait la porte d<92>’entrée, donnant accès dans la salle commune.
      Une légère véranda, qui en proté-
      <96>- 86 <96>-
      ^L
      geait la partie antérieure contre l<92>’action des rayons solaires, reposait sur de sveltes bambous.
      Le tout était peint d<92>’une fraîche (...)
  26. Part 2: Text::Perfide::BookCleaner
  28. First approach: RegExp + Find & Replace
  30. First approach. Well-intentioned, but too naïve and a big mess. A more sophisticated approach was needed!
  32. Architecture. Build a pipeline; each step handles a specific set of problems:
      1. pages
      2. sections
      3. paragraphs
      4. footnotes
      5. chars
      6. ...
      7. commit
  33. Architecture
  34. Architecture. Whenever possible, use ontologies and DSLs:
      • they help organize things
      • they make it possible to abstract from the code and discuss details at a higher level (even with people from other areas)
  35. Pages. Goal: identify and remove from the text the elements related to book pagination:
      • page numbers
      • headers
      • footers
      • page breaks
      These elements often degrade the performance of the aligner.
  36. Pages, example (La Maison à Vapeur, Jules Verne):
      est vrai qu’il fallait être assez chanceux pour rencontrer le nabab, et assez audacieux pour s’emparer de sa personne.
      Page 3
      ^L
      La maison à vapeur
      Jules Verne
      Le faquir, - évidemment le seul entre tous que ne surexcitât pas l’espoir de gagner la prime, - filait au milieu des groupes, s’arrêtant
  37. Pages, algorithm:
      1. identify page breaks (e.g., ^L)
      2. the lines near each page break are candidates for headers and footers
      3. count the occurrences of each normalized candidate
      4. headers and footers are the candidates which occur more than a threshold number of times
      5. replace everything with a custom mark
      6. move all the necessary information to a standoff file
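The six steps above can be sketched in Python roughly as follows. This is an illustration only, not the module's Perl code: the names `normalize` and `clean_pages`, the `window` and `threshold` parameters, and the dictionary used as a stand-in for the standoff file are all assumptions.

```python
import re
from collections import Counter

FORM_FEED = "\f"  # the ^L page-break character


def normalize(line):
    """Collapse whitespace, lowercase, and mask digits so 'Page 3' equals 'Page 12'."""
    return re.sub(r"\d+", "#", " ".join(line.split())).lower()


def clean_pages(text, threshold=1, window=2):
    """Return (cleaned_text, standoff). `threshold` and `window` are illustrative knobs."""
    pages = [page.splitlines() for page in text.split(FORM_FEED)]
    last = len(pages) - 1

    # Steps 1-3: lines next to a page break are candidates; count each normalized form.
    counts = Counter()
    for i, lines in enumerate(pages):
        nonblank = [line for line in lines if line.strip()]
        candidates = set()
        if i > 0:  # top of a page that follows a break: header candidates
            candidates.update(normalize(line) for line in nonblank[:window])
        if i < last:  # bottom of a page that precedes a break: footer candidates
            candidates.update(normalize(line) for line in nonblank[-window:])
        counts.update(candidates)

    # Step 4: keep the candidates that occur more than `threshold` times.
    frequent = {cand for cand, n in counts.items() if n > threshold}

    def is_page_noise(line):
        return not line.strip() or normalize(line) in frequent

    # Steps 5-6: strip headers/footers around each break, remember them in the
    # standoff dictionary, and leave a single _pbN_ mark where the break was.
    standoff = {}
    bodies = []
    for i, lines in enumerate(pages):
        lines = list(lines)
        if i > 0:
            while lines and is_page_noise(lines[0]):
                standoff.setdefault(f"_pb{i}_", []).append(lines.pop(0))
        if i < last:
            while lines and is_page_noise(lines[-1]):
                standoff.setdefault(f"_pb{i + 1}_", []).append(lines.pop())
        bodies.append("\n".join(lines))

    cleaned = bodies[0]
    for i, body in enumerate(bodies[1:], start=1):
        cleaned += f"\n_pb{i}_\n" + body
    return cleaned, standoff
```

Masking digits during normalization is what lets "Page 3" and "Page 4" count as the same footer; a real implementation would probably need a fuzzier comparison for headers that alternate between title and author.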
  39. Pages, example after cleaning (La Maison à Vapeur, Jules Verne):
      est vrai qu’il fallait être assez chanceux pour rencontrer le nabab, et assez audacieux pour s’emparer de sa personne.
      _pb2_
      Le faquir, - évidemment le seul entre tous que ne surexcitât pas l’espoir de gagner la prime, - filait au milieu des groupes, s’arrêtant
  40. Sections. Goal: identify and normalize the divisions between the several sections of a book (parts, chapters, acts, scenes, epilogue, afterword, ...). An ontology was created, containing types of divisions and subdivisions, in several languages.
  41. Sections, ontology example:
      cap
          PT capítulo, cap, capitulo
          FR chapitre, chap
          EN chapter, chap
          NT sec
      fim
          PT fim
          FR fin
          EN the_end
          BT _alone
      This ontology is used to automatically generate part of the code.
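One way such an ontology could drive generated code is sketched below in Python. The dictionary layout, the function name `build_classifier`, and the reading of `the_end` as "the end" are assumptions made for illustration; the NT/BT relations are carried in the data but not used by this sketch.

```python
import re

# Hypothetical in-memory form of the two ontology entries shown on the slide.
ONTOLOGY = {
    "cap": {
        "PT": ["capítulo", "cap", "capitulo"],
        "FR": ["chapitre", "chap"],
        "EN": ["chapter", "chap"],
        "NT": ["sec"],
    },
    "fim": {"PT": ["fim"], "FR": ["fin"], "EN": ["the_end"], "BT": ["_alone"]},
}
LANGUAGES = ("PT", "FR", "EN")


def build_classifier(ontology):
    """Generate a line classifier (line -> concept or None) from the ontology terms."""
    term_to_concept = {}
    for concept, entry in ontology.items():
        for lang in LANGUAGES:
            for term in entry.get(lang, []):
                # Assumption: underscores in terms stand for spaces.
                term_to_concept[term.replace("_", " ").lower()] = concept

    # Longest terms first, so that "capítulo" is tried before "cap".
    alternatives = sorted(term_to_concept, key=len, reverse=True)
    pattern = re.compile(
        r"^\s*(" + "|".join(re.escape(term) for term in alternatives) + r")\b\.?",
        re.IGNORECASE,
    )

    def classify(line):
        match = pattern.match(line)
        return term_to_concept[match.group(1).lower()] if match else None

    return classify
```

Adding a language or a new division type then only means editing the data, which is the benefit the slide attributes to ontologies.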
  42. Sections, example (Os Miseráveis, Vitor Hugo):
      PRIMEIRA PARTE
      FANTINE
      ^L
      LIVRO PRIMEIRO
      UM JUSTO
      O abade Myriel
      Em 1815, era bispo de Digne, o reverendo Carlos Francisco Bemvindo Myriel, o qual contava setenta e
  43. Sections, algorithm:
      1. Search for potential section divisions:
          • lines with keywords: capítulo, chapter, Chap., Appendix, Table des Matières, ...
          • pages or lines containing only numbers
          • roman numbering
          • ...
      2. Insert a custom mark immediately before the section identified
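A minimal Python sketch of these two steps follows. The keyword list, the four-word limit for heading lines, and the simplified `_sec:KIND_` mark are assumptions; the marks shown on the slides (such as `_sec+O:PARTE=PRIMEIRA_`) carry more information than this sketch records.

```python
import re

# Illustrative keyword list; the real one would be generated from the ontology.
KEYWORD = re.compile(
    r"\b(?:cap[ií]tulo|chapter|chapitre|chap|appendix|table des mati[èe]res|parte|livro)\b",
    re.IGNORECASE,
)
# Upper-case roman numeral, optionally followed by a period.
ROMAN = re.compile(
    r"(?=[IVXLCDM])M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})\.?"
)
MAX_HEADING_WORDS = 4  # assumption: section headings are short lines


def classify_line(line):
    """Return 'number', 'roman', 'keyword', or None for an ordinary line."""
    stripped = line.strip()
    if not stripped:
        return None
    if stripped.isdigit():
        return "number"
    if ROMAN.fullmatch(stripped):
        return "roman"
    if len(stripped.split()) <= MAX_HEADING_WORDS and KEYWORD.search(stripped):
        return "keyword"
    return None


def mark_sections(text):
    """Insert a mark line immediately before every line that looks like a division."""
    out = []
    for line in text.split("\n"):
        kind = classify_line(line)
        if kind is not None:
            out.append(f"_sec:{kind}_")
        out.append(line)
    return "\n".join(out)
```

The text is split on "\n" rather than with `str.splitlines()`, because the latter also treats the form feed (^L) as a line boundary and would silently drop page breaks.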
  45. Sections, example after marking (Os Miseráveis, Vitor Hugo):
      _sec+O:PARTE=PRIMEIRA_
      FANTINE
      _sec+O:LIVRO=PRIMEIRO_
      UM JUSTO
      O abade Myriel
      Em 1815, era bispo de Digne, o reverendo Carlos Francisco Bemvindo Myriel, o qual contava setenta e
  46. Sections. Identifying the different parts within a bitext:
      • makes it possible to subsequently compare the two versions and remove parts which can only be found in one of them
      • makes it possible to perform a structural alignment (see Text::Perfide::BookSync)
  47. Paragraphs. Goal: identify and normalize paragraph notation, direct speech, etc.
  48. Paragraphs, example:
      L’hôtesse prit la défense de son curé:
      - D’ailleurs, il en plierait quatre comme vous sur son genou. Il a, l’année dernière, aidé nos gens à rentrer la paille; il en portait jusqu’à six bottes à la fois, tant il est fort!
      - Bravo! dit le pharmacien. Envoyez donc vos filles en confesse à des gaillards d’un tempérament pareil! Moi, si j’étais le gouvernement, je voudrais qu’on saignât les prêtres une fois par mois.
  49. Paragraphs, example after normalization:
      L’hôtesse prit la défense de son curé:
      "D’ailleurs, il en plierait quatre comme vous sur son genou. Il a, l’année dernière, aidé nos gens à rentrer la paille; il en portait jusqu’à six bottes à la fois, tant il est fort! "
      "Bravo!" dit le pharmacien. "Envoyez donc vos filles en confesse à des gaillards d’un tempérament pareil! Moi, si j’étais le gouvernement, je voudrais qu’on saignât les prêtres une fois par mois."
  50. Paragraphs, algorithm:
      • paragraph identification is performed by calculating metrics based on the number of blank lines and indentation
      • identification and normalization of direct speech:
          • punctuation, paragraph, dash
          • text in quotes
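The slides do not say which metrics are computed, so the following Python sketch is only one plausible reading of the first bullet: count blank lines and indented lines, assume the more frequent signal is the book's paragraph separator, and split accordingly. The function names and the decision rule are assumptions.

```python
def detect_paragraph_style(lines):
    """Guess whether paragraphs are separated by blank lines or by indentation."""
    nonblank = [line for line in lines if line.strip()]
    blank_count = len(lines) - len(nonblank)
    indented_count = sum(1 for line in nonblank if line.startswith((" ", "\t")))
    # Assumed rule: the more frequent signal is the paragraph separator.
    return "blank" if blank_count >= indented_count else "indent"


def split_paragraphs(text):
    """Return (style, paragraphs), each paragraph joined onto a single line."""
    lines = text.splitlines()
    style = detect_paragraph_style(lines)
    paragraphs, current = [], []
    for line in lines:
        if style == "blank":
            starts_new = not line.strip()
        else:
            starts_new = line.startswith((" ", "\t"))
        if starts_new and current:
            paragraphs.append(" ".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return style, paragraphs
```

Direct speech normalization (dash versus quotes) is left out here because the slides give only the before/after example, not the rules.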
  51. Footnotes. Goal: identify and remove footnote call marks and footnote expansions.
  52. Footnotes, example (Oeuvres de Voltaire, Voltaire):
      On fit un inventaire de son argent comptant, et on le mena dans le château que fit construire le roi Charles V, fils de Jean II, auprès de la rue Saint-Antoine, à la porte des Tournelles[1].
      [1] La Bastille, qui fut prise par le peuple de Paris, le 14 juillet 1789, puis démolie. B.
      ^L
      Quel était en chemin l’étonnement de l’Ingénu! je vous le laisse à penser. Il crut d’abord que c’était un rêve.
  53. Footnotes, algorithm:
      1. Search for footnote expansions (lines beginning with <<1>>, [2], ^3, ...)
      2. Replace with custom mark
      3. Only footnote call marks are left
      4. Search again for the same patterns in the middle of the text
      5. Replace with custom mark
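The two passes can be sketched in Python as below. The mark names `_fneN_` and `_fnrN_` follow the slides' example; the function name, the rule that an expansion ends at a blank line or page break, and the dictionary standing in for the standoff file are assumptions.

```python
import re

# The three notations named on the slide: [2], <<1>>, ^3
CALL = re.compile(r"\[\d+\]|<<\d+>>|\^\d+")
EXPANSION_START = re.compile(r"^[ \t]*(?:\[\d+\]|<<\d+>>|\^\d+)")


def mark_footnotes(text):
    """Return (marked_text, standoff) after replacing expansions, then call marks."""
    standoff = {}
    out = []
    expansion_mark = None  # mark of the expansion currently being consumed, if any
    expansions = 0

    # Steps 1-2: a line starting with a footnote pattern opens an expansion, which
    # is assumed to run until a blank line or a page break (form feed).
    for line in text.split("\n"):
        if EXPANSION_START.match(line):
            expansions += 1
            expansion_mark = f"_fne{expansions}_"
            standoff[expansion_mark] = [line]
            out.append(expansion_mark)
        elif expansion_mark and line.strip() and "\f" not in line:
            standoff[expansion_mark].append(line)
        else:
            expansion_mark = None
            out.append(line)

    # Steps 3-5: every remaining occurrence of the patterns is a call mark.
    calls = 0

    def replace_call(match):
        nonlocal calls
        calls += 1
        mark = f"_fnr{calls}_"
        standoff[mark] = match.group(0)
        return mark

    return CALL.sub(replace_call, "\n".join(out)), standoff
```

Handling expansions first is what makes the second pass safe: once the expansion lines are gone, any remaining `[1]` in the text can only be a call mark.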
  55. Footnotes, example after marking (Oeuvres de Voltaire, Voltaire):
      On fit un inventaire de son argent comptant, et on le mena dans le château que fit construire le roi Charles V, fils de Jean II, auprès de la rue Saint-Antoine, à la porte des Tournelles_fnr29_.
      _fne8_
      ^L
      Quel était en chemin l’étonnement de l’Ingénu! je vous le laisse à penser. Il crut d’abord que c’était un rêve.
  56. Words and characters:
      • translineations
      • text encoding
      • ...
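The slide gives no algorithm for translineations (words hyphenated across a line break), so the Python sketch below shows only a naive heuristic, under the assumption that a hyphen at the end of a line followed by a lowercase letter is a split word. The function name is hypothetical.

```python
import re

# A word character, a hyphen at the end of a line, then the next line's first word character.
TRANSLINEATION = re.compile(r"(\w)-[ \t]*\n[ \t]*(\w)")


def join_translineations(text):
    """Rejoin words split across lines; keep the hyphen when the next line starts uppercase."""

    def rejoin(match):
        if match.group(2).islower():
            return match.group(1) + match.group(2)
        return match.group(0)  # e.g. "Saint-\nAntoine" is left untouched

    return TRANSLINEATION.sub(rejoin, text)
```

This heuristic will wrongly join genuine hyphenated compounds that happen to break at the hyphen; a dictionary lookup on the joined form would be a natural refinement.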
  57. Report:
      • Previous steps produce a report
      • It summarizes what was found, what was assumed and what was done
      • Its main goal is to support a diagnosis of what the program did, so that anything wrong can be corrected manually
  58. Report, example:
      livros/_FR_15.pdf.txt:
      footers=[’( Page) = 241’]
      headers=[ "(La maison x{e0} vapeur Jules Verne) = 241"]
      ctrL=1; pagnum_ctrL=241; sectionsO=2; sectionsN=30; word_tr=58; words=118036;
  59. Commit:
      • Final and irreversible step which removes all the custom marks added by the previous steps
      • Outputs a cleaned copy of the document
      • This is the last stage before the alignment (or any other further processing)
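A commit step along these lines could be sketched in Python as follows. The regular expression covers only the mark shapes that appear in the slides' examples (`_pb2_`, `_fnr29_`, `_fne8_`, `_sec+O:PARTE=PRIMEIRA_`); the function name and the choice to drop lines that contained nothing but marks are assumptions.

```python
import re

# Mark shapes seen in the examples: page breaks, footnote calls/expansions, sections.
MARK = re.compile(r"_(?:pb\d+|fn[re]\d+|sec[^_\s]*)_")


def commit(text):
    """Remove every custom mark; lines that held only marks are dropped entirely."""
    cleaned = []
    for line in text.split("\n"):
        without_marks = MARK.sub("", line)
        if line.strip() and not without_marks.strip():
            continue  # the line consisted of marks only
        cleaned.append(without_marks)
    return "\n".join(cleaned)
```

Because the marks are discarded here, anything that still needs the removed material has to read it from the standoff data before this step runs.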
  60. Part 3: Conclusions, wish list and future work
  62. Conclusions and wish list:
      • There is no de facto standard format for plain text books (documents?)
      • Documents are very heterogeneous (provenance, type and quantity, notation formats, ...)
      • Hurrah for regular expressions!
      • 20/80 rule applies
  63. Conclusions and wish list:
      • Ontologies and DSLs lead to a better structure
      • Common pattern: search text, calculate metrics, perform action accordingly
      • The report generated at the end should present a smart summary of what was found and done
  67. Related ongoing work:
      • Text::Perfide::BookPairs: find repeated books and pairs of books (same book in different languages) within a collection
      • Text::Perfide::BookSync: uses the section delimitation made by T::P::BC to make a structural alignment
      • Text::Perfide::CorporaFlow: uses a DSL to guide the corpora preparation workflow (to be done)
      • Text::Perfide::SciPaperCleaner: cleaner for scientific papers (to be done)
  68. Future work:
      • Standoff annotation: no changes in the original file until commit
      • Export to ebook formats: .fb2, .epub, ...
      • ...
  70. CPAN. Is it on CPAN yet? No, but it will be really, really soon! Missing:
      • more and better documentation
      • more and better tests
  71. Questions o/