3. Vision
o D_ISRAELI: build an innovative Digital Library (DL) model
• Advanced features enhancing content management and search
o Two major issues are addressed:
Use of the information contained in the digital objects
Semantic-based indexing of contents for more powerful search engines
4. DLMS Characteristics
o Scientific papers comparing the most popular Open Source DLMSs indicate DSpace as the most complete one for reaching our goals:
Supports various metadata standards and protocols for interoperability
Based on a single programming language (Java and related technologies) and a three-layer architecture
Provides rich documentation
o Additional components are needed at each level of the three-layer architecture:
Presentation
Logic
Data
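One concrete form of this interoperability is that DSpace exposes item metadata as Dublin Core through the OAI-PMH harvesting protocol. The Python sketch below parses an abbreviated, hand-written OAI-PMH `ListRecords` response with the standard library and groups the Dublin Core elements of each record. The sample identifier and field values are hypothetical, and a real harvester would fetch the XML from a repository's OAI endpoint and follow resumption tokens.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, abbreviated ListRecords response containing a single record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123456789/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample manuscript</dc:title>
          <dc:creator>Unknown scribe</dc:creator>
          <dc:type>Text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_dc_records(xml_text):
    """Return a list of (identifier, {dc_element: [values]}) pairs."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for element in dc:
                # element.tag looks like "{http://purl.org/dc/elements/1.1/}title"
                name = element.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append((identifier, fields))
    return records


for identifier, fields in parse_dc_records(SAMPLE):
    print(identifier, fields)
```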
5. Major Proposed Improvements
o Starting point: the Open Source Digital Library Management System DSpace
o Major improvement 1: integration of innovative tools
ICRPad platform for recognition, graphic matching and text extraction from digital objects
• Information extraction can also be applied to historical material
Semantic-based indexing by @DOMINUS
• Search capabilities based on the content of the digital objects
o Major improvement 2: support for different digital object formats, in particular:
FITS (Flexible Image Transport System), an open standard defining a digital file format for the storage, transmission and processing of scientific and other images
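As a minimal sketch of what FITS support involves, the Python code below reads a FITS primary header with only the standard library. A FITS file is organized in 2880-byte blocks, and its header is a sequence of 80-character ASCII cards (keyword in columns 1 to 8, value indicator `= ` in columns 9 and 10, value after that) terminated by an `END` card. The parser is deliberately simplified (no long-string continuation, no `D` exponents, no extensions), the sample header is made up, and a production system would more likely rely on a dedicated library such as `astropy.io.fits`.

```python
BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header card is 80 ASCII characters


def build_header(cards):
    """Pad cards to 80 chars, append END and pad to a whole block (test helper)."""
    text = "".join(c.ljust(CARD) for c in cards + ["END"])
    return (text + " " * ((-len(text)) % BLOCK)).encode("ascii")


def parse_value(field):
    """Parse a simplified card value: quoted string, logical T/F, int or float."""
    field = field.strip()
    if field.startswith("'"):
        end = 1
        while True:
            end = field.index("'", end)
            if field[end + 1:end + 2] == "'":  # '' is an escaped quote
                end += 2
                continue
            break
        return field[1:end].replace("''", "'").rstrip()
    value = field.split("/", 1)[0].strip()  # drop the trailing comment
    if value in ("T", "F"):
        return value == "T"
    try:
        return int(value)
    except ValueError:
        return float(value)


def parse_header(data):
    """Return (keyword -> value dict, header length in bytes incl. padding)."""
    header, offset = {}, 0
    while offset + CARD <= len(data):
        card = data[offset:offset + CARD].decode("ascii")
        offset += CARD
        keyword = card[:8].strip()
        if keyword == "END":
            return header, -(-offset // BLOCK) * BLOCK  # round up to a block
        if card[8:10] == "= ":  # COMMENT/HISTORY/blank cards carry no value
            header[keyword] = parse_value(card[10:])
    raise ValueError("no END card found")


def data_size(header):
    """Bytes in the primary data array: |BITPIX|/8 * NAXIS1 * ... * NAXISn."""
    naxis = header["NAXIS"]
    if naxis == 0:
        return 0
    count = 1
    for i in range(1, naxis + 1):
        count *= header[f"NAXIS{i}"]
    return abs(header["BITPIX"]) // 8 * count


raw = build_header([
    "SIMPLE  =                    T / conforms to the FITS standard",
    "BITPIX  =                   16 / 16-bit integer pixels",
    "NAXIS   =                    2 / two-dimensional image",
    "NAXIS1  =                  100",
    "NAXIS2  =                   50",
    "OBJECT  = 'M31     '           / hypothetical target name",
])
header, header_len = parse_header(raw)
print(header, header_len, data_size(header))
```

The data array starts right after the padded header, so `header_len` and `data_size` together locate the image bytes in the file.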
6. ICRPad Platform
o Two modules:
ICR/IWR/OCR: the algorithm segments images into regions of interest (single characters or words) for the recognition process. Text recognition is based on reference models of characters or words, learned through a supervised learning process from pre-classified training samples
Graphic Matching: the matching algorithm represents, recognizes and describes an object by its contour shape, which is extracted by selecting the contour points whose neighboring contrast exceeds a fixed threshold. Images are processed at several resolution levels, and the extracted shape is matched at different levels (pyramid levels). This module has been implemented for the Vatican Library
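The slide does not specify how ICRPad segments images or which reference models it learns. As a hedged illustration of the general idea only, the sketch below splits a binary text-line bitmap into character regions at blank columns and recognizes each region with a nearest-centroid classifier, whose reference models are the mean feature vectors of pre-classified training glyphs. The 3x3 glyphs, `WIDTH` and all function names are hypothetical.

```python
WIDTH = 3  # hypothetical width every glyph is padded/trimmed to


def segment(line):
    """Split a text-line bitmap ('#' = ink) into regions at blank columns."""
    ink = [any(row[c] == "#" for row in line) for c in range(len(line[0]))]
    regions, start = [], None
    for c, has_ink in enumerate(ink + [False]):  # sentinel closes the last region
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            regions.append([row[start:c] for row in line])
            start = None
    return regions


def to_vector(glyph):
    """Pad/trim rows to WIDTH and flatten the bitmap into 0/1 features."""
    return [1 if ch == "#" else 0
            for row in glyph for ch in row.ljust(WIDTH, ".")[:WIDTH]]


def train(samples):
    """Learn one reference model (mean feature vector) per label."""
    sums, counts = {}, {}
    for glyph, label in samples:
        vec = to_vector(glyph)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def recognize(glyph, models):
    """Return the label whose reference model is nearest (squared distance)."""
    vec = to_vector(glyph)
    return min(models, key=lambda label: sum((a - b) ** 2
                                             for a, b in zip(vec, models[label])))


# Pre-classified training samples (hypothetical 3x3 glyphs)
L = ["#..", "#..", "###"]
T = ["###", ".#.", ".#."]
O = ["###", "#.#", "###"]
models = train([(L, "L"), (T, "T"), (O, "O")])

line = ["#...###.###",
        "#....#..#.#",
        "###..#..###"]
print("".join(recognize(g, models) for g in segment(line)))
```

With more training samples per label, each reference model becomes an average shape, so slightly noisy glyphs still fall closest to the right class.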
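The Graphic Matching description can be illustrated with contour-based matching over a resolution pyramid. The sketch assumes grayscale images stored as lists of rows, defines contour points as pixels whose contrast with any 4-neighbour exceeds a fixed threshold, builds the pyramid by 2x2 averaging, and searches only for a translation: exhaustively at the coarsest level, then in a small window around the up-scaled estimate at each finer level. The scoring rule (fraction of template contour points landing on image contour points) and every name in the code are assumptions, not ICRPad's actual algorithm, which the slide does not detail.

```python
NEIGHBOURS = ((0, 1), (1, 0), (0, -1), (-1, 0))


def contour_points(img, threshold):
    """Pixels whose contrast with at least one 4-neighbour exceeds threshold."""
    h, w = len(img), len(img[0])
    points = set()
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and abs(img[y][x] - img[ny][nx]) > threshold:
                    points.add((y, x))
                    break
    return points


def downsample(img):
    """One pyramid step: halve the resolution by averaging 2x2 blocks."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]


def pyramid(img, levels):
    """List of images from full resolution (index 0) to the coarsest level."""
    out = [img]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out


def score(tpl_points, img_points, dy, dx):
    """Fraction of template contour points that land on image contour points."""
    hits = sum((y + dy, x + dx) in img_points for y, x in tpl_points)
    return hits / max(len(tpl_points), 1)


def match(image, template, threshold, levels=2):
    """Coarse-to-fine search for the best (dy, dx) translation of the template."""
    img_pyr, tpl_pyr = pyramid(image, levels), pyramid(template, levels)
    best = None
    for lvl in range(levels - 1, -1, -1):
        img_pts = contour_points(img_pyr[lvl], threshold)
        tpl_pts = contour_points(tpl_pyr[lvl], threshold)
        max_dy = len(img_pyr[lvl]) - len(tpl_pyr[lvl])
        max_dx = len(img_pyr[lvl][0]) - len(tpl_pyr[lvl][0])
        if best is None:  # exhaustive search at the coarsest level
            cands = [(dy, dx) for dy in range(max_dy + 1) for dx in range(max_dx + 1)]
        else:  # refine around the estimate from the coarser level
            cy, cx = 2 * best[0], 2 * best[1]
            cands = [(dy, dx) for dy in range(cy - 1, cy + 2) for dx in range(cx - 1, cx + 2)
                     if 0 <= dy <= max_dy and 0 <= dx <= max_dx]
        best = max(cands, key=lambda o: score(tpl_pts, img_pts, *o))
    return best


def square(h, w, top, left, size):
    """Synthetic test image: a bright square on a dark background."""
    return [[1.0 if top <= y < top + size and left <= x < left + size else 0.0
             for x in range(w)] for y in range(h)]


template = square(8, 8, 1, 1, 6)
image = square(16, 16, 7, 5, 6)  # same square, shifted by (6, 4)
print(match(image, template, threshold=0.3))
```

Searching exhaustively only at the coarsest level keeps the cost low; finer levels just correct the estimate by at most one pixel in each direction.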
7. @DOMINUS
o The Artificial Brain DOcument Management INtelligent Universal System is a general framework that processes digital documents through a pipeline of several steps, aimed at acquiring increasingly abstract information from each incoming document:
Acquisition, Layout Analysis, Document Image Interpretation, Text Analysis, Categorization, Information Extraction
o Advanced semantic-based indexing techniques will be used in D_ISRAELI:
Enhanced versions of Latent Semantic Indexing (LSI) and Concept Indexing (CI) will be applied
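The @DOMINUS pipeline can be pictured as a chain of stages, each consuming the output of the previous one. The Python sketch below uses the six stage names listed for @DOMINUS, but their bodies are placeholders that only record their execution; the dictionary-based document representation and the function names are hypothetical.

```python
# Stage names from the @DOMINUS description; the bodies are placeholders.
STAGES = [
    "acquisition",
    "layout_analysis",
    "document_image_interpretation",
    "text_analysis",
    "categorization",
    "information_extraction",
]


def make_stage(name):
    """Build a placeholder stage that only records that it has run."""
    def stage(doc):
        # A real stage would add new, more abstract information to the document.
        return {**doc, "trace": list(doc.get("trace", [])) + [name]}
    return stage


def run_pipeline(doc, stages):
    """Feed the document through the stages in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline({"source": "scan_001.tif"}, [make_stage(n) for n in STAGES])
print(result["trace"])
```

Because each stage returns a new dictionary instead of mutating its input, intermediate results can be kept for inspection or re-run from any stage.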
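The enhanced LSI variant planned for D_ISRAELI is not described here, so the sketch below shows only standard Latent Semantic Indexing with NumPy. A term-document count matrix `A` is factored by truncated SVD, `A ≈ U_k S_k V_k^T`; a query vector `q` is folded into the k-dimensional concept space as `q_k = S_k^-1 U_k^T q`; and documents (rows of `V_k`) are ranked by cosine similarity. The toy corpus and function names are invented for illustration.

```python
import numpy as np


def term_document_matrix(docs):
    """Raw term counts, terms as rows and documents as columns."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    return A, index


def lsi_rank(query, docs, k=2):
    """Rank documents for a query in a k-dimensional LSI concept space."""
    A, index = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T  # rows of V_k = documents
    q = np.zeros(A.shape[0])
    for w in query.split():
        if w in index:  # terms unknown to the collection are ignored
            q[index[w]] += 1.0
    q_k = (U_k.T @ q) / s_k  # fold-in: q_k = S_k^-1 U_k^T q
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k)
    sims = (V_k @ q_k) / np.where(norms == 0, 1.0, norms)  # cosine similarity
    return np.argsort(-sims), sims


DOCS = [
    "digital library search",
    "library catalogue search engine",
    "star image telescope",
    "telescope image processing",
]
order, sims = lsi_rank("search engine", DOCS)
print(order, np.round(sims, 3))
```

Real collections would normally apply tf-idf weighting before the SVD and use a much larger k; raw counts keep the example small.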
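Concept Indexing, as commonly described, uses the centroids of groups of similar documents as concept vectors and represents each document by its similarity to those centroids, reducing the dimensionality from the number of terms to the number of concepts. The sketch below is a simplified version that assumes the document groups are already known (from a prior clustering step or from category labels); the matrix, group names and functions are hypothetical, and it is not the enhanced CI variant planned for D_ISRAELI.

```python
import numpy as np


def normalize_rows(M):
    """Scale each row to unit Euclidean length (zero rows are left as zero)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.where(norms == 0, 1.0, norms)


def concept_vectors(X, groups):
    """One unit-length centroid (concept vector) per group of documents."""
    Xn = normalize_rows(X)
    labels = sorted(set(groups))
    centroids = [Xn[[i for i, g in enumerate(groups) if g == lab]].mean(axis=0)
                 for lab in labels]
    return normalize_rows(np.array(centroids)), labels


def project(X, C):
    """Represent each document by its cosine similarity to every concept vector."""
    return normalize_rows(X) @ C.T


# Hypothetical documents x terms counts; terms: library, search, telescope, image
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
C, labels = concept_vectors(X, ["dl", "dl", "astro", "astro"])
reduced = project(X, C)  # 4 documents x 2 concepts instead of 4 terms
query = np.array([[0, 1, 0, 0]], dtype=float)  # the single term "search"
print(labels, reduced.round(3), labels[int(np.argmax(project(query, C)))])
```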