Bień, Janusz S. (2012) Scanned publications in digital libraries: new Open Source DjVu tools. In: the Library 2.012 worldwide virtual conference, October 3 - 5, 2012, Internet. (Unpublished). Formerly at http://bc.klf.uw.edu.pl/298/.
Scanned publications in digital libraries: new Open Source DjVu tools.
1. Scanned publications in digital libraries: new Open Source DjVu tools
Janusz S. Bień
Formal Linguistics Department, University of Warsaw
The Library 2.012 Worldwide Virtual Conference
October 3 - 5, 2012
http://bc.klf.uw.edu.pl/298/
2. Introduction
General information
Grant “Digitalization tools for philological research”, 2009-2012
The tools were developed under a grant from the Ministry of Science and Higher Education (no. N N519 384036), directed by the present author.
Some links
The project site: https://bitbucket.org/jsbien/ndt
Our digital library: http://bc.klf.uw.edu.pl/
Mailing lists
The announcement list: http://lists.mimuw.edu.pl/listinfo/nmpt-ann
The discussion and support list: http://lists.mimuw.edu.pl/listinfo/nmpt-l
3. Introduction
Grant results
A DjVu search engine (client-server architecture)
Poliqarp for DjVu: the Poliqarp server extension by Jakub Wilk
marasca: the WWW client by Jakub Wilk, cf. http://poliqarp.wbl.klf.uw.edu.pl/en/
djview4poliqarp: the remote client for Debian/Ubuntu and MS Windows by Michał Rudolf, cf. https://bitbucket.org/mrudolf/djview-poliqarp/downloads
DjVu utilities
pdf2djvu, didjvu, ocrodjvu, djvusmooth by Jakub Wilk
some experimental tools by Tomasz Olejniczak, Michał Rudolf and Piotr Sikora
4. Introduction
An example: searching a geographical gazetteer
5. Why DjVu?
DjVu and DjVuLibre
Yann Le Cun, Léon Bottou, Patrick Haffner, and Paul G. Howard, 1996
What is DjVu? More than just a format for scans…
an image compression technique, a document format, and a software platform for delivering documents images over the Internet
OCR, searching and indexing
DjVu pages can contain a “hidden text” chunk which includes the recognized text as well as the coordinates of each word on the page in a compressed form.
Quoted from: http://leon.bottou.org/papers/lecun-2001
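The hidden text chunk described above can be dumped with DjVuLibre's djvused (for example `djvused -e print-txt file.djvu`), which prints it as nested S-expressions. The following is only a minimal sketch in Python, assuming leaf zones of the form `(word xmin ymin xmax ymax "text")`; the sample string and the helper name `extract_words` are made up for illustration and do not come from the presentation.

```python
import re

# Matches leaf zones of the assumed form: (word xmin ymin xmax ymax "text")
WORD_RE = re.compile(
    r'\(word\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"((?:[^"\\]|\\.)*)"\)'
)


def extract_words(sexpr):
    """Return a list of (text, (xmin, ymin, xmax, ymax)) for each word zone."""
    words = []
    for match in WORD_RE.finditer(sexpr):
        xmin, ymin, xmax, ymax = (int(g) for g in match.groups()[:4])
        words.append((match.group(5), (xmin, ymin, xmax, ymax)))
    return words


# Hypothetical sample, not taken from a real document.
SAMPLE = '''(page 0 0 2550 3300
 (line 300 3000 900 3060
  (word 300 3000 520 3060 "Scanned")
  (word 540 3000 900 3060 "publications")))'''

for text, box in extract_words(SAMPLE):
    print(text, box)
```

Non-ASCII characters are typically escaped in the djvused output, so a real script would also need to decode those escapes; the sketch leaves the text as printed.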
6. Why DjVu?
DjVu and DjVuLibre
Some design principles (for remote access)
Action               Real-world equivalent   Acceptable delay
Zooming/Panning      Moving the eyes         Immediate
Next/Previous Page   Turning a page          < 1 second
Random Page access   Finding a page          < 3 seconds
Quoted from: http://leon.bottou.org/papers/lecun-2001
Unbundled DjVu documents
Every page can be stored and served as a separate file!
7. Why DjVu?
DjVu metadata (also eXtensible Metadata Platform)
8. Why DjVu?
DjVu outlines
9. Why DjVu?
DjVu annotations
10. Why DjVu?
DjVu annotations
11. Why DjVu?
DjVu external hyperlinks
12. Why DjVu?
DjVu internal hyperlinks
13. Why DjVu?
DjVu internal hyperlinks
14. Why DjVu?
Referencing DjVu documents (the first page)
15. Why DjVu?
Referencing DjVu documents (a specific page)
16. Why DjVu?
Referencing DjVu documents (a view)
17. Why DjVu?
Referencing DjVu documents (highlightings)
18. Why DjVu?
URLs for DjVu documents
http://triggs.djvu.org/century-dictionary.com/04/index04.djvu
19. Why DjVu?
URLs for DjVu documents
http://triggs.djvu.org/century-dictionary.com/04/index04.djvu?djvuopts=&page=p2719.djvu
20. Why DjVu?
URLs for DjVu documents
http://triggs.djvu.org/century-dictionary.com/04/index04.djvu?djvuopts=&page=p2719.djvu&zoom=556&showposition=0.49,0.22
21. Why DjVu?
URLs for DjVu documents
http://triggs.djvu.org/century-dictionary.com/04/index04.djvu?djvuopts=&page=p2719.djvu&zoom=556&showposition=0.49,0.22&highlight=1100,3735,217,46&highlight=1284,3538,166,35&highlight=1640,3538,166,35&highlight=1901,3288,168,35
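The URLs on the preceding slides are built up step by step: the document address, then `?djvuopts=`, then the page, the view (zoom and position) and finally the highlighted rectangles. A minimal Python sketch of that composition follows; the function name `djvu_url` and its parameters are hypothetical, and reading each highlight as `x,y,width,height` is an assumption based on the values shown.

```python
def djvu_url(base, page=None, zoom=None, showposition=None, highlights=()):
    """Compose a DjVu viewer URL from the options shown on the slides."""
    parts = ["djvuopts="]
    if page is not None:
        parts.append(f"page={page}")
    if zoom is not None:
        parts.append(f"zoom={zoom}")
    if showposition is not None:
        x, y = showposition
        parts.append(f"showposition={x},{y}")
    # Each highlight is assumed to be an (x, y, width, height) rectangle.
    for x, y, w, h in highlights:
        parts.append(f"highlight={x},{y},{w},{h}")
    if len(parts) == 1:
        # No options requested: plain reference to the first page.
        return base
    return base + "?" + "&".join(parts)


BASE = "http://triggs.djvu.org/century-dictionary.com/04/index04.djvu"

print(djvu_url(BASE))
print(djvu_url(BASE, page="p2719.djvu"))
print(djvu_url(BASE, page="p2719.djvu", zoom=556,
               showposition=(0.49, 0.22),
               highlights=[(1100, 3735, 217, 46), (1284, 3538, 166, 35)]))
```

The commas are deliberately left unescaped, since the URLs on the slides use them literally.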
22. Why DjVu?
Creating URLs with djview
23. Why DjVu?
Creating URLs with djview
24. Why DjVu?
Creating URLs with djview
25. Why DjVu?
Layers of DjVu documents
Graphic layers:
Stencil
Background (usually in lower resolution)
Foreground (encoded using shape dictionaries)
Hidden text layer (encoded in Unicode)
26. Why DjVu?
The legal status of the DjVu technology
a patent (or more?) granted
some patents pending (?)
crucial code (DjVuLibre, djview) available under the GNU General Public License and maintained by the inventors of DjVu: http://djvu.sourceforge.net/
new software available under the GNU General Public License
27. Why DjVu?
GNU GPL
GNU General Public License
4 freedoms (http://www.gnu.org/philosophy/free-sw.html):
The freedom to run the program, for any purpose.
The freedom to study how the program works, and adapt it to your needs.
The freedom to redistribute copies so you can help your neighbor.
The freedom to improve the program, and release your improvements to the public, so that the whole community benefits.
28. Sample input files
Tagged Image File Format
ABBYY FineReader 11
29. Sample input files
ABBYY FineReader 11: Optical Character Recognition
30. Sample input files
ABBYY FineReader 11 OCR output: text under image
Formats and sizes
50M    Parkosz4demoFR11pdf.pdf
11M    Parkosz4demoFR11pdfMRC.pdf
2.2M   Parkosz4demoFR11pdfaMRC.pdf
779K   Parkosz4demoFR11djvu.djvu
MRC: Mixed Raster Content
A time-consuming compression method with a high compression ratio
PDF/A: a variant of Portable Document Format for archiving
ISO 19005-1:2005, Document management: Electronic document file format for long-term preservation …
…
31. Sample input files
The FineReader 11 PDF output: outline and metadata
32. Sample input files
The FineReader 11 DjVu output: outline and metadata
33. Sample input files
The FineReader 11 DjVu output: foreground and hidden text
34. Jakub Wilk’s utilities
pdf2djvu
http://jwilk.net/software/pdf2djvu
Developed since 2007, current version 0.7.14 (released on 2012-09-18)
Platforms: included in the following Unix distributions: Debian, Ubuntu, openSUSE, FreeBSD, Digitlab (http://dl.psnc.pl/2012/09/23/digitlab/); Win32 (with a GUI: http://www.trustfm.net/GeneralTools/SoftwarePdfToDjvuGUI.php)
Language versions: English, German, Polish, Russian, Ukrainian
Demonstration: Open Virtual Appliance http://fleksem.klf.uw.edu.pl/ndt/ubuntu4poliqarp/NDT_ubuntu4poliqarp1.ova
35. Jakub Wilk’s utilities
pdf2djvu
pdf2djvu users
Debian/Ubuntu popularity contest
Installed/votes: ∼ 45 000/1000. Debian only, actual use: [chart]
36. Jakub Wilk’s utilities
pdf2djvu
pdf2djvu simple use example
pdf2djvu -d 600 Parkosz4demoFR11pdf.pdf -o Parkosz4demoFR11pdf_p2d600.djvu
Parkosz4demoFR11pdf.pdf:
- page #1 -> #1
- page #2 -> #2
- page #3 -> #3
- page #4 -> #4
- page #5 -> #5
- page #6 -> #6
- page #7 -> #7
- Warning: metadata[CreationDate] is not a valid date
- Warning: metadata[ModDate] is not a valid date
0,091 bits/pixel; 42,890:1, 97,67% saved, 51802361 bytes in, 1207797 bytes out
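The ratio and the percentage in the last line of the pdf2djvu log follow directly from the two byte counts (the log uses decimal commas). A small Python check, offered only as a worked example; the helper name `compression_stats` is made up here.

```python
def compression_stats(bytes_in, bytes_out):
    """Return (compression ratio, percentage of space saved)."""
    ratio = bytes_in / bytes_out
    saved_percent = 100.0 * (1.0 - bytes_out / bytes_in)
    return ratio, saved_percent


# Byte counts taken from the pdf2djvu log above.
ratio, saved = compression_stats(51802361, 1207797)
print(f"{ratio:.3f}:1, {saved:.2f}% saved")
```

The bits-per-pixel figure cannot be recomputed this way, because it also depends on the pixel dimensions of the pages, which the log does not show.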
37. Jakub Wilk’s utilities
pdf2djvu
pdf2djvu output size
50M Parkosz4demoFR11pdf.pdf
11M Parkosz4demoFR11pdfMRC.pdf
2.2M Parkosz4demoFR11pdfaMRC.pdf
1.4M Parkosz4demoFR11djvuMRC_p2d600.djvu
1.3M Parkosz4demoFR11djvuaMRC_p2d600.djvu
1.2M Parkosz4demoFR11pdf_p2d600.djvu
779K Parkosz4demoFR11djvu.djvu
38. Jakub Wilk’s utilities
pdf2djvu
pdf2djvu output: foreground and hidden text
39. Jakub Wilk’s utilities
pdf2djvu
pdf2djvu output: outline and metadata
41. Jakub Wilk’s utilities
didjvu
didjvu uses the Gamera framework to separate foreground/background layers, which can then be encoded into a DjVu file.
http://jwilk.net/software/didjvu
http://gamera.informatik.hsnr.de/
(http://minidjvu.sourceforge.net/)
Developed since 2009, current version 0.2.6 (released on 2012-05-15)
Platforms: Linux distributions Debian, Ubuntu
Debian+Ubuntu popcon installed/votes: ∼ 200/20
Language versions: English
Demonstration: Open Virtual Appliance http://fleksem.klf.uw.edu.pl/ndt/ubuntu4poliqarp/NDT_ubuntu4poliqarp1.ova
42. Jakub Wilk’s utilities
didjvu
didjvu simple use example
didjvu bundle -d 600 -o Parkosz_di600.djvu *.tif
Parkosz_0003.tif:
- reading image
- converting to DjVu
- 0.029 bits/pixel; 275.856:1, 99.64% saved, 15078290 bytes in, 54660 bytes out
Parkosz_0004.tif:
- reading image
- converting to DjVu
- 0.054 bits/pixel; 147.387:1, 99.32% saved, 14263346 bytes in, 96775 bytes out
...
bundling
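When the same didjvu invocation has to be repeated for many scanned volumes, it can be wrapped in a script. A minimal Python sketch, using only the `bundle`, `-d` and `-o` arguments that appear in the example above; the function names and the default output file name are illustrative assumptions.

```python
import glob
import shutil
import subprocess


def didjvu_bundle_command(images, output, dpi=600):
    """Build the argument list for 'didjvu bundle' as used in the example."""
    if not images:
        raise ValueError("no input images given")
    return ["didjvu", "bundle", "-d", str(dpi), "-o", output, *images]


def bundle_directory(pattern="*.tif", output="Parkosz_di600.djvu"):
    """Run didjvu on all files matching pattern (hypothetical helper)."""
    images = sorted(glob.glob(pattern))
    if shutil.which("didjvu") is None:
        raise RuntimeError("didjvu is not installed or not on PATH")
    subprocess.run(didjvu_bundle_command(images, output), check=True)


print(" ".join(didjvu_bundle_command(["Parkosz_0003.tif", "Parkosz_0004.tif"],
                                     "Parkosz_di600.djvu")))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names; the shell wildcard from the slide is replaced by an explicit, sorted file list.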
43. Jakub Wilk’s utilities
didjvu
didjvu output size
50M Parkosz4demoFR11pdf.pdf
11M Parkosz4demoFR11pdfMRC.pdf
2.2M Parkosz4demoFR11pdfaMRC.pdf
1.4M Parkosz4demoFR11djvuMRC_p2d600.djvu
1.3M Parkosz4demoFR11djvuaMRC_p2d600.djvu
1.2M Parkosz4demoFR11pdf_p2d600.djvu
779K Parkosz4demoFR11djvu.djvu
668K Parkosz_di600.djvu
44. Jakub Wilk’s utilities
didjvu
didjvu output: foreground and background
45. Jakub Wilk’s utilities
didjvu
didjvu output: foreground shape structures
46. Jakub Wilk’s utilities
didjvu
didjvu advanced usage
didjvu --help
usage: didjvu [-h] [--version] {separate,encode,bundle} ...
positional arguments:
  {separate,encode,bundle}
    separate    generate masks for images
    encode      convert images to single-page DjVu documents
    bundle      convert images to bundled multi-page DjVu document
optional arguments:
  -h, --help    show this help message and exit
  --version     show version information and exit
more help:
  didjvu separate --help
  didjvu encode --help
  didjvu bundle --help
47. Jakub Wilk’s utilities
didjvu
didjvu advanced usage
didjvu bundle --help
usage: didjvu bundle [-h] [-o FILE] [--pageid-template TEMPLATE]
                     [--loss-level N] [--lossless] [--clean] [--lossy]
                     [--masks MASK [MASK ...]] [--mask MASK] [--fg-slices N]
                     [--fg-crcb {normal,half,full,none}] [--fg-subsample N]
                     [--bg-slices N+...+N] [--bg-crcb {normal,half,full,none}]
                     [--bg-subsample N] [-d N] [-p N]
                     [-m {bernsen,tsai,white_rohrer,gatos,abutaleb,otsu,djvu,sauvola,niblack}]
                     [-v] [-q]
                     <input-image> [<input-image> ...]
positional arguments:
  <input-image>
optional arguments:
  -h, --help                  show this help message and exit
  -o FILE, --output FILE      output filename
  --pageid-template TEMPLATE  naming scheme for page identifiers
  --loss-level N              aggressiveness of lossy compression
  --lossless                  lossless compression
  --clean                     lossy compression: remove flyspecks
  --lossy                     lossy compression: substitute patterns with small variations
  --masks MASK [MASK ...]     use pre-generated masks
  --mask MASK                 use a pre-generated mask
  ...
48. Jakub Wilk’s utilities
ocrodjvu
ocrodjvu is a wrapper for OCR systems that allows you to perform OCR on DjVu files.
http://jwilk.net/software/ocrodjvu
http://en.wikipedia.org/wiki/OCRopus
http://en.wikipedia.org/wiki/Tesseract_(software)
http://en.wikipedia.org/wiki/CuneiForm_(software)
http://en.wikipedia.org/wiki/Ocrad
http://en.wikipedia.org/wiki/GOCR
Developed since 2008, current version 0.7.12 (released on 2012-08-15)
Platforms: Linux distributions Debian, Ubuntu, openSUSE
Debian+Ubuntu popcon installed/votes: ∼ 2 000/100
Language versions: English
Demonstration: Open Virtual Appliance http://fleksem.klf.uw.edu.pl/ndt/sid4ocr, […] squeeze4ocropus
49. Jakub Wilk’s utilities
ocrodjvu
ocrodjvu simple use example
ocrodjvu -D -e tesseract -l pol Parkosz_di600.djvu -o Parkosz_di600t.djvu
Processing '../Parkosz4demo/di/Parkosz_di600.djvu':
- Page #1
- Page #2
- Page #3
- Page #4
- Page #5
- Page #6
- Page #7
Intermediate files were left in the '/tmp/ocrodjvu.ueIzif' directory.
50. Jakub Wilk’s utilities
ocrodjvu
Tesseract 3.02 & ocrodjvu output: hidden text
51. Jakub Wilk’s utilities
ocrodjvu
ocrodjvu advanced usage
ocrodjvu --help
usage: ocrodjvu [options] FILE
positional arguments:
  FILE                              DjVu file to process
optional arguments:
  -h, --help                        show this help message and exit
  -v, --version                     show version information and exit
  -e ENGINE, --engine ENGINE        OCR engine to use
  --list-engines                    print list of available OCR engines
  --ocr-only                        don't save pages without OCR
  --clear-text                      remove existing hidden text
  -l LANGUAGE, --language LANGUAGE  set recognition language
  --list-languages                  print list of available languages
  --render {foreground,all,mask}    image layers to render
  ...
advanced options:
  -D, --debug                       don't delete intermediate files
  -X KEY=VALUE                      set an engine-specific property
  --on-error {abort,resume}         error handling strategy
  --html5                           use HTML5 parser
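The options listed in the help output can be assembled programmatically, with the fixed choices checked before ocrodjvu is started. A minimal Python sketch, assuming only the flags shown on the slides (`-e`, `-l`, `--render`, `--on-error`, `-D`, and the `-o` output option from the earlier usage example); the function name and its defaults are hypothetical.

```python
# Allowed values copied from the help output above.
RENDER_CHOICES = ("foreground", "all", "mask")
ON_ERROR_CHOICES = ("abort", "resume")


def ocrodjvu_command(source, output, engine="tesseract", language="pol",
                     render=None, on_error=None, debug=False):
    """Build an ocrodjvu argument list from the options in the help text."""
    cmd = ["ocrodjvu", "-e", engine, "-l", language]
    if render is not None:
        if render not in RENDER_CHOICES:
            raise ValueError(f"render must be one of {RENDER_CHOICES}")
        cmd += ["--render", render]
    if on_error is not None:
        if on_error not in ON_ERROR_CHOICES:
            raise ValueError(f"on_error must be one of {ON_ERROR_CHOICES}")
        cmd += ["--on-error", on_error]
    if debug:
        cmd.append("-D")  # keep intermediate files
    cmd += ["-o", output, source]
    return cmd


print(" ".join(ocrodjvu_command("Parkosz_di600.djvu", "Parkosz_di600t.djvu",
                                debug=True)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where ocrodjvu and the chosen OCR engine are installed.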
52. Jakub Wilk’s utilities
ocrodjvu
Open Source vs commercial OCR
http://lib.psnc.pl/publication/428
53. Jakub Wilk’s utilities
djvusmooth
djvusmooth is a graphical editor for DjVu documents
http://jwilk.net/software/djvusmooth
Developed since 2008, current version 0.2.13 (released on 2012-10-02)
Platforms: Linux distributions Debian, Ubuntu, openSUSE
Debian+Ubuntu popcon installed/votes: ∼ 1 500/80
Language versions: English, Russian, Spanish
Demonstration: Open Virtual Appliance http://fleksem.klf.uw.edu.pl/ndt/ubuntu4poliqarp/
54. Jakub Wilk’s utilities
djvusmooth
djvusmooth: editing hidden text
55. Jakub Wilk’s utilities
djvusmooth
djvusmooth: using external text editor
56. Closing remarks
Thank you for your attention!
Any questions?