Project Gutenberg as an
Information Retrieval System
Kai Li
IST616 Final Assignment
2012.11

Project Gutenberg as Information Retrieval System


Published in: Technology, Education
  • The project accepts eBooks uploaded by members, provided the works are not protected by US copyright law.
  • Because this website is also the main page of the whole project, the audience includes not only people who want to get the eBooks but also people who are interested in the project itself.
  • The indexing system is actually very confusing. This slide lists some of the problems.
  • The search result page: related bookshelves and subjects are displayed ahead of all the books; books are ranked by popularity (number of downloads), but one can also sort alphabetically or by release date.
  • The interface was very unintuitive for me when I first used it. If a book does not rank high alphabetically, by popularity, or by release date, and the result set is large, it is almost impossible to find that specific book. Like traditional library catalogs, this interface does not support finding an unknown book very well.
  • A string-matching method cannot resolve the issues of one word with multiple meanings or of different words bearing the same meaning.
  • Methods: by author; by title; by language; by recently added; by popularity. One can also browse the website by LC classification (as well as LCSH). However, these are not listed on this page; LC classification can be found only from the book pages.
  • Not all bookshelves can be linked with a subcategory. Moreover, some bookshelves contain materials in other languages that fall outside the above system, which indicates that the English classification scheme may not cover all the resources on the website.
  • Many libraries and other parties have imported the metadata of Gutenberg eBooks into their local systems, which makes the issues of this website less important. But this is still a problem!
  • Project Gutenberg as Information Retrieval System

    1. Project Gutenberg as an Information Retrieval System. Kai Li, IST616 Final Assignment, 2012.11
    2. Introduction to Project Gutenberg
       • The first digital library project in the world, initiated by the late Michael Hart in 1971.
       • Project Gutenberg currently offers more than 41,000 public domain eBooks (in more than 50 languages) as well as other resources (like scientific data).
       • Website: http://www.gutenberg.org/
    3. Intended Audience and Functionalities
       • Intended audience: eBook readers and general users.
       • Functionalities: portal of the project, eBook repository and discovery system.
    4. Mobile Site
       • The website offers two kinds of interfaces, depending on the device one uses. Only the traditional non-mobile interface is examined in this presentation, due to the limited scope of the assignment.
    5. Indexing System
    6. Issues of Indexing/Tag System
       • There is a search box as well as a tag called “Search Catalog”:
         – The search box is too small to be noticed;
         – The tag “Search Catalog” actually leads users to a page that has no search box, only some browsing options.
       • A number of tags are duplicated on the left-hand bar and at the top of the page:
         – For example, the tag “Book Categories”.
    7. Means To Find a Book
       • Searching
       • Browsing
         – By categories
    8. Searching
    9. Issues of Searching
       • The display differs from most interfaces one sees on the Internet, which may cause some difficulties for new users;
       • Because there is no navigation mechanism and no way to refine the result by facets, it is extremely inconvenient to locate a resource when the result set is large.
    10. Precision and Recall
       • The retrieval method used by this website is string matching: the string entered by the user is matched against the full text of all the resources.
         – An “or” relationship is used for multiple words.
       • Because the index covers the full text, recall is higher than in traditional library catalogs; however, since it is still a string-matching method, precision is still not very good.
    11. Browsing
    12. Issues of Browsing
       • Three searching tools are offered on this page; they should have been offered on the searching page instead.
       • Only one criterion can be used to limit the resources at a time, and after one chooses a criterion there is no way to limit the result further.
    13. Categories/Classification
       • There are two tiers of “classification” on this website:
         – Subcategories: 23
           • These subcategories are called “bookshelf” too, which is confusing.
         – Bookshelves: 133
           • These can be seen as a lower level than subcategories. However, not all bookshelves are linked to a subcategory.
    14. Overall Evaluation
       • Advantages:
         – Mobile functionalities:
           • Mobile site
           • QR codes
       • Disadvantages:
         – Poorly organized and designed;
         – Fails to display the full richness of the metadata on the website:
           • LoC classification and subject headings
         – The interface lacks communication with users.
    15. Thanks!
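
The precision and recall argument on the "Precision and Recall" slide can be illustrated with a small sketch. This is not the website's actual implementation: the four-document collection, the relevance judgments, and the function names `or_search` and `precision_recall` are all hypothetical, and matching is done on whole lowercase words as a simplification of string matching. It shows how an "or" match over full text pulls in an off-topic document (hurting precision) and misses a relevant document that uses a different word (hurting recall).

```python
def tokenize(text):
    """Split text into a set of lowercase words (a simplifying assumption)."""
    return set(text.lower().split())


def or_search(query, docs):
    """Return ids of documents whose text contains at least one query word."""
    terms = tokenize(query)
    return {doc_id for doc_id, text in docs.items() if terms & tokenize(text)}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection standing in for full texts.
DOCS = {
    "a": "the whale hunt at sea",
    "b": "a treatise on sea birds",       # matches "sea" but is off topic
    "c": "the leviathan of the deep",     # on topic, but uses a synonym
    "d": "cooking for beginners",
}
# Hypothetical relevance judgments for the information need "whales".
RELEVANT = {"a", "c"}

retrieved = or_search("whale sea", DOCS)
precision, recall = precision_recall(retrieved, RELEVANT)
print(sorted(retrieved), precision, recall)
```

In this toy case the query retrieves documents "a" and "b", so half of the retrieved documents are relevant and half of the relevant documents are retrieved.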

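
The complaint on the "Issues of Browsing" and "Issues of Searching" slides, that only one criterion can limit the result at a time, can be contrasted with a minimal sketch of faceted refinement. Everything here is an assumption for illustration: the `Book` record, the `refine` helper, and the download counts are hypothetical and do not describe how the Project Gutenberg site is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    language: str
    downloads: int  # hypothetical popularity figure


def refine(books, **facets):
    """Keep only books that match every given facet value."""
    return [
        book for book in books
        if all(getattr(book, name) == value for name, value in facets.items())
    ]


# Hypothetical catalog records; the download counts are made up.
BOOKS = [
    Book("Pride and Prejudice", "Jane Austen", "en", 900),
    Book("Emma", "Jane Austen", "en", 300),
    Book("Moby Dick", "Herman Melville", "en", 700),
    Book("Les Misérables", "Victor Hugo", "fr", 500),
]

# One criterion, as the browsing page allows.
english = refine(BOOKS, language="en")

# Two criteria combined, which the slides say is not possible on the site.
austen_english = refine(BOOKS, language="en", author="Jane Austen")

# Ranking by popularity can then be applied to the narrowed result.
ranked = sorted(austen_english, key=lambda book: book.downloads, reverse=True)
print([book.title for book in ranked])
```

Combining facets this way narrows a large result set step by step, which is the capability the evaluation finds missing.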