Integration of an Automatic Indexing System within the Document Flow of a Grey Literature Repository

Slides from the talk at the Grey Literature 12 conference.

  1. Integration of an Automatic Indexing System within the Document Flow of a Grey Literature Repository
     Jindřich Mynarz, Ctibor Škuta
     National Technical Library
     Grey Literature 12 Conference, 7.12.2010
  2. Indexing of Grey Literature
     • self-publishing, self-indexing
     • the Web made publishing easier, can it make indexing easier as well?
     • make non-professional indexing better through technology
     • increase grey literature visibility and support navigation interfaces
  3. Automatic Indexing
     • conditional on full-text availability
     • machine learning based on analysis of language corpora
     • automatic term assignment
     • automatic suggestions of indexing terms lessen the cognitive overhead involved in indexing
     • human feedback to correct the obvious mistakes
  4. Implementation
     • re-use of existing components
       o combination and extension
     • open source, open formats
     subject headings system + digital repository + automatic indexer + text corpus + glue code = automatic indexing system
  5. Subject Heading System
     • Polythematic Structured Subject Headings System
       o universal Czech-English controlled vocabulary managed and used at the National Technical Library
       o expressed in RDF data format via SKOS vocabulary
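
As an illustration of what "expressed in RDF via SKOS" can mean in practice, here is a minimal sketch that reads the preferred labels of one bilingual heading from an RDF/XML snippet using only the Python standard library. The concept URI and both labels are made-up placeholders, not actual PSH data, and a real deployment would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

# Hypothetical SKOS concept; the URI and labels are illustrative, not real PSH data.
SKOS_SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/psh/0001">
    <skos:prefLabel xml:lang="cs">informatika</skos:prefLabel>
    <skos:prefLabel xml:lang="en">informatics</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>
"""

NS = {"skos": "http://www.w3.org/2004/02/skos/core#"}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def load_pref_labels(rdf_xml):
    """Map each concept URI to its preferred labels, keyed by language tag."""
    root = ET.fromstring(rdf_xml.strip())
    labels = {}
    for concept in root.findall("skos:Concept", NS):
        uri = concept.get(RDF_ABOUT)
        labels[uri] = {
            label.get(XML_LANG): label.text
            for label in concept.findall("skos:prefLabel", NS)
        }
    return labels


pref_labels = load_pref_labels(SKOS_SAMPLE)
```

Keeping both language labels under one concept URI is what lets the same heading serve Czech and English interfaces.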
  6. Digital Repository
     • CDS Invenio
       o open source, modular architecture
       o extensions to the interface for entering new documents and the search interface
  7. Automatic Indexer
     • Maui Indexer
       o automatic term assignment with a controlled vocabulary
       o extensions for Czech language (stemmer, stopwords)
       o indexing model for Czech language with usage of PSH
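
The idea of term assignment against a controlled vocabulary can be sketched in a few lines. This is not Maui's algorithm: the slides describe a trained indexing model, whereas the sketch below simply counts how often each normalized heading occurs in the text. The stopword list, the suffix-stripping "stemmer", and the English sample data are all toy assumptions standing in for the Czech components mentioned above.

```python
import re
from collections import Counter

# Toy English resources; the real system uses a Czech stemmer and stopword list.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "with"}
SUFFIXES = ("ing", "es", "s")  # crude stand-in for a real stemmer


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize, drop stopwords, and stem the remaining tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return [stem(token) for token in tokens if token not in STOPWORDS]


def suggest_headings(text, vocabulary, limit=5):
    """Return vocabulary headings that occur in the text, most frequent first."""
    tokens = normalize(text)
    counts = Counter()
    for heading in vocabulary:
        key = normalize(heading)
        size = len(key)
        if size == 0:
            continue
        # Count positions where the normalized heading matches a token window.
        hits = sum(
            1
            for start in range(len(tokens) - size + 1)
            if tokens[start : start + size] == key
        )
        if hits:
            counts[heading] = hits
    return [heading for heading, _ in counts.most_common(limit)]


sample_text = (
    "Indexing of grey literature: automatic indexing helps "
    "digital libraries expose metadata."
)
sample_vocabulary = ["digital libraries", "indexing", "metadata", "chemistry"]
suggestions = suggest_headings(sample_text, sample_vocabulary)
```

Normalizing both the document and the headings with the same stemmer and stopword list is the step that makes language-specific extensions necessary.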
  8. Text Corpus
     • National Repository of Grey Literature
       o maintained by the National Technical Library
       o aggregates documents from partner institutions
       o in some cases, metadata are created by the users
  9. Glue Code
     • code to tie all pieces together
     • web services
       o loose coupling
       o re-use of existing code
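
One way to picture loose coupling through web services is a small HTTP endpoint that hides the indexer behind a JSON interface, so the repository only needs to know a URL and a message shape. The sketch below uses Python's standard library; the `/suggest` path, the JSON fields, and the placeholder `suggest` function are assumptions made for illustration and do not describe the authors' actual services.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def suggest(text):
    """Placeholder for the real indexer call (hypothetical headings)."""
    known = ["indexing", "metadata", "repositories"]
    lowered = text.lower()
    return [heading for heading in known if heading in lowered]


def handle_suggest(body):
    """Turn a JSON request body into an (HTTP status, JSON response) pair."""
    try:
        text = json.loads(body)["text"]
    except (ValueError, KeyError, TypeError):
        text = None
    if not isinstance(text, str):
        return 400, json.dumps({"error": "expected a JSON object with a 'text' string"})
    return 200, json.dumps({"headings": suggest(text)})


class SuggestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/suggest":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_suggest(self.rfile.read(length))
        data = response.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the service; blocks until interrupted."""
    HTTPServer(("localhost", port), SuggestHandler).serve_forever()
```

Calling `serve()` starts the service locally. Because request handling lives in the plain function `handle_suggest`, the indexer behind it can be swapped without touching the repository side.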
  10. User Interface Design Considerations
      • opt-in indexing procedure
      • suggest indexing headings
      • autocomplete heading fragments
      • learn by example: show example documents indexed with the heading in question
      • extending search interface
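
Autocompletion of heading fragments can be sketched as a prefix lookup over a sorted list of headings. The sketch assumes that a "fragment" is the beginning of a heading and that matching is case-insensitive; the class name and the sample headings are hypothetical.

```python
from bisect import bisect_left


class HeadingAutocomplete:
    """Case-insensitive prefix completion over a fixed list of headings."""

    def __init__(self, headings):
        # Sort by case-folded form so that all prefix matches are contiguous.
        self._entries = sorted((heading.casefold(), heading) for heading in headings)
        self._keys = [key for key, _ in self._entries]

    def complete(self, fragment, limit=10):
        """Return up to `limit` headings that start with the typed fragment."""
        prefix = fragment.casefold()
        if not prefix:
            return []
        start = bisect_left(self._keys, prefix)
        matches = []
        for key, heading in self._entries[start:]:
            if len(matches) >= limit or not key.startswith(prefix):
                break
            matches.append(heading)
        return matches


completer = HeadingAutocomplete(
    ["Indexing", "Informatics", "Information retrieval", "Chemistry"]
)
```

For example, `completer.complete("inf")` returns the headings beginning with "inf" in alphabetical order, which an entry form could show as a drop-down while the user types.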
  11. Further Possibilities and Challenges
      • indexing must be reflected in end-user interfaces
      • continuous enhancements of the individual parts of the document processing pipeline
      • user-generated indexing
      • feeding back into the development of the subject headings system
  12. Thank you for your attention!
      <mailto:jindrich.mynarz@techlib.cz>
      <mailto:ctibor.skuta@techlib.cz>
      <http://www.techlib.cz/en/>
