Constructing a Thesaurus of Irish Folklore
Using Facet Analysis
The MoTIF Project

LAI CMG Annual Seminar, November 8 2013
The MoTIF Project: Constructing a Pilot Thesaurus of Irish Folklore Using Facet Analysis - Catherine Ryan

Presented to the Annual Seminar of the Library Association of Ireland’s Cataloguing and Metadata Group, 8th of November, 2013.

  • MoTIF is a collaborative project undertaken by the Digital Repository of Ireland and the National Library of Ireland. It produced a set of guidelines on thesaurus construction as a resource for librarians, archivists and other information professionals who wish to organise and annotate their content for improved search and retrieval. The guidelines are accompanied by a pilot thesaurus of Irish folklore, which acts as an illustrative example, a visual demonstration of the principles and processes outlined in the guidelines. Both guidelines and pilot have been submitted for review and will be published in December 2013.
  • Controlled vocabularies are restricted lists of terms used to provide consistency across search, remove any ambiguity between terms and improve search precision. They may contain equivalence relationships such as USE, Used For (UF) and See references, but they don’t have to.
  • Taxonomies are controlled vocabularies with hierarchical relationships, which can be used for browsing up and down a tree or navigating a website.
  • Thesauri are controlled vocabularies with hierarchical, associative and equivalence relationships. They offer all the benefits previously described, but they can also make more connections between terms using associative relationships, allow search over non-preferred terms using equivalence relationships, and clarify the meaning of terms using definitions and scope notes. These definitions can overlap, but that, broadly speaking, is how the three types work.
  • Following the Digital Archiving in Ireland DRI report, an opportunity was identified to produce guidelines which would give professionals the advice they need to improve their own data practices by adhering to international standards and best practices.
  • The guidelines act as a basic introduction to thesauri, including the main elements of a thesaurus, facet analysis, and the construction process. They also briefly cover planning, multilingual thesauri, mapping thesauri and the relationship between thesauri and the Semantic Web.
  • The construction process in brief:
  • Step 1: selecting and recording terms
  • Step 2: determining the structure and display of the thesaurus, that is, whether it will be organised by subject or by facet
  • Step 3: the facet analysis itself
  • Step 4: creating relationships and notes within the now complete structure
  • Step 5: creating an alphabetical list (if desired)
  • Step 6: expert review
  • Step 7: documentation
  • The result is the thesaurus.
  • The correct form of these terms was then chosen. ISO 25964-1 sets out guidelines on the form that terms should take when entered into the thesaurus. Nouns are the most common form you will encounter. Verbs are the next most common and take the gerund or verbal forms (usually ending in -ing). Adjectives, adverbs (usually ending in -ly) and articles are to be avoided. Some adjectives were included in the pilot thesaurus because they came up as significant in the literature. For example, the significance of wearing red on a particular day might be discussed as part of the lore of a particular area.
  • The initial list of terms is all over the place and a systematic structure needs to be developed to order them in a logical way.
  • A systematic structure can be thought of as a hierarchical or classified display with subjects or facets at the top of the hierarchies. At this stage a hierarchical display was chosen, with fundamental facets as the main divisions, or top concepts, because this structure is more easily updated and is a good demonstration of the ISO standard rules for hierarchical arrangement. Under those rules, broader and narrower terms should stand in one of three types of relationship: a thing/kind relationship (all concepts in the Objects facet will be a kind or type of object); a whole/part relationship (the human anatomy will have hands, heads and so on); or an instance relationship, where the narrower term is an instance of the broader term (for example, the class ‘dogs’ would have a narrower term ‘Spot’). It’s important to emphasise that we didn’t structure the thesaurus at this stage; we only made the decision on the design of the scaffolding. More detailed elements of the systematic structure were only determined during the vocabulary analysis.
  • We used the method of facet analysis, which is the analysis of a subject area into its constituent concepts, which are then grouped into facets. The ISO thesaurus standard defines a facet as a ‘grouping of concepts of the same inherent category’. Objects, materials, people, places and so on are known as fundamental facets. These fundamental categories of facets were first devised by Ranganathan as part of a library classification scheme in the 1920s and 1930s. Ranganathan proposed five categories, Personality, Matter, Energy, Space and Time, or PMEST, which could cover all aspects of a discipline or subject. These were later expanded by Brian Vickery for the Classification Research Group (CRG), based on the Aristotelian fundamental categories: thing, kind, part, property, material, process, operation, agent, patient, product, by-product, space and time. The CRG went on to state that these categories act as guides to analysis and should not be imposed on subjects. Ultimately the choice of facets will depend on the subject matter and what is most practical.
  • The facets listed on the slide were what was most practical for the pilot. The question was how to organise...
  • ...a jumble of words into...
  • ...lists of basic coherent facets. For example, in the literature, an agent is a person or piece of equipment which carries out transitive actions, i.e. actions that require a direct object. Following this, animals, fish and people were placed under the Agents facet, as these are living creatures which can perform actions and can have an effect on the environment around them. The category also includes supernatural beings and creatures. Other living organisms were originally located under a separate Living Entities facet. In the end, the decision was taken to include all living organisms, from people through to mythical beings and plants, under the Agents facet, as it made more sense to keep all these living entities together. It is also arguable that, in folklore, some plants, trees and other such living entities have the potential to perform actions or have an effect on others. So that made sense in the context of folklore. It may not in another. Rather than confuse people, equipment was then put into Objects. The guidelines go through a few more tricky decisions, and they also outline the scope of each fundamental facet as defined for the pilot thesaurus.
  • Facet analysis II: Once the initial analysis had been completed and all terms grouped, the facets were divided into narrower groupings, using node labels to split the facets into sub-facets and to organise them according to the principles, or characteristics, of division. In the example on the slide, the Agents facet has the sub-facets people, animals, other living organisms and supernatural beings. The animals sub-facet is then organised according to its characteristics of division, in this case animals by function, by species and so on. This is exactly the kind of division you would see on, say, a fashion website where shirts are organised by size, by colour and so on.
  • Once the hierarchies were completed and input into the thesaurus management software, associative relationships were added. This is the process recommended by the ISO standard, as the most useful associative relationships are usually across hierarchies and so this is easier to do once those hierarchies have been established. These are examples of the most common types of relationship created across hierarchies, so we have agents relating to their activities, materials referring to their products, objects with parts, etc. It should also be noted that these relationships are reciprocal, so the two terms refer to each other.
  • After that, example scope notes were added to the pilot thesaurus to explain concepts. Like the relationships, the scope notes present in the pilot thesaurus should be considered as illustrative examples as this was as much as the time frame of the project would allow.
  • Two lists, alphabetical and hierarchical, were then generated within the software and exported. These formed the basis of the print version of the thesaurus. An electronic version of the thesaurus also exists, and it contains both hierarchical and alphabetical displays which can be browsed. It can also be searched by keyword.
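The reciprocal relationships and the alphabetical list described in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Thesaurus` class, its method names and the sample terms are hypothetical, and it is not the thesaurus management software used in the project.

```python
from collections import defaultdict

# Each relationship tag and the tag recorded on the other term.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}
DISPLAY_ORDER = ("USE", "UF", "BT", "NT", "RT")


class Thesaurus:
    """Hypothetical in-memory thesaurus: terms, relationships, scope notes."""

    def __init__(self):
        # term -> relationship tag -> set of related terms
        self.relations = defaultdict(lambda: defaultdict(set))
        self.scope_notes = {}

    def relate(self, source, tag, target):
        """Record a relationship and its reciprocal in one step."""
        self.relations[source][tag].add(target)
        self.relations[target][RECIPROCAL[tag]].add(source)

    def preferred(self, term):
        """Follow a USE reference from a non-preferred term, if there is one."""
        targets = self.relations.get(term, {}).get("USE")
        return min(targets) if targets else term

    def alphabetical_display(self):
        """List every term A to Z with its scope note and relationships."""
        lines = []
        for term in sorted(self.relations, key=str.lower):
            lines.append(term)
            if term in self.scope_notes:
                lines.append(f"  SN {self.scope_notes[term]}")
            for tag in DISPLAY_ORDER:
                related = self.relations[term].get(tag, ())
                for target in sorted(related, key=str.lower):
                    lines.append(f"  {tag} {target}")
        return "\n".join(lines)


# Invented sample data, not taken from the pilot thesaurus.
thesaurus = Thesaurus()
thesaurus.relate("dogs", "BT", "animals")         # hierarchical
thesaurus.relate("canines", "USE", "dogs")        # equivalence
thesaurus.relate("blacksmiths", "RT", "forging")  # associative: agent and activity
thesaurus.scope_notes["forging"] = "Illustrative note: shaping heated metal."
print(thesaurus.alphabetical_display())
```

Storing the reciprocal at the moment a relationship is added means the alphabetical display never has to infer the other direction.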
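The division of a facet into sub-facets by node labels, as described in the facet analysis notes, can also be sketched in Python. The assumptions here are loud ones: the nested-dictionary layout, the parenthesised form of the node labels and the individual terms are invented for illustration, and only the Agents facet and its sub-facet names come from the talk. The point of the sketch is that node labels organise the display but are not terms, so they are skipped when broader-term links are derived.

```python
# Hypothetical sample: facet -> sub-facet -> node label -> terms.
TREE = {
    "Agents": {
        "Animals": {
            "(animals by function)": ["guard dogs", "pack animals"],
            "(animals by species)": ["cows", "dogs"],
        },
        "People": ["blacksmiths", "midwives"],
        "Supernatural beings": ["banshees", "fairies"],
    },
}


def is_node_label(name):
    """Node labels are written in parentheses in this sketch (an assumption)."""
    return name.startswith("(") and name.endswith(")")


def hierarchical_display(tree, depth=0):
    """Return indented lines for the hierarchy, node labels included."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        child = tree[name]
        if isinstance(child, dict):
            lines.extend(hierarchical_display(child, depth + 1))
        else:
            lines.extend("  " * (depth + 1) + term for term in sorted(child))
    return lines


def broader_pairs(tree, parent=None):
    """Return (term, broader term) pairs, skipping node labels."""
    pairs = []
    for name, child in tree.items():
        if is_node_label(name):
            nearest = parent  # a node label is not a term
        else:
            if parent is not None:
                pairs.append((name, parent))
            nearest = name
        if isinstance(child, dict):
            pairs.extend(broader_pairs(child, nearest))
        else:
            pairs.extend((term, nearest) for term in child)
    return pairs


print("\n".join(hierarchical_display(TREE)))
print(broader_pairs(TREE))
```

In this layout the term "cows" sits under the node label "(animals by species)" in the display, while its broader term is "Animals".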

    1. Constructing a Thesaurus of Irish Folklore Using Facet Analysis. The MoTIF Project. LAI CMG Annual Seminar, November 8 2013
    2. Project Aims: Thesaurus Construction Guidelines; MoTIF: Pilot Thesaurus of Irish Folklore
    3. Controlled Vocabularies: Restricted List of Terms. Consistency, remove ambiguity, improve precision
    4. Taxonomies: Restricted List of Terms. Consistency, remove ambiguity, improve precision. Browsing, navigating. Hierarchical relationships
    5. Thesauri: Restricted List of Terms. Consistency, remove ambiguity, improve precision. Browsing, navigating. Synonyms, antonyms, making connections, definitions, scope. Hierarchical relationships. Equivalence Relationships, Associative Relationships, Scope Notes
    6. Custom vocabularies. Adapted vocabularies. International standards and best practice
    7-10. Guidelines: Literature review. Main elements of a thesaurus. Terms and concepts. Relationships (USE, UF, BT, NT, RT). Notes, node labels and arrays. Facet analysis. Construction process
    11-19. MoTIF Construction Process: 1. Selection of terms; 2. Structure; 3. Facet Analysis; 4. Relationships; 5. Alphabetical List; 6. Expert Review; 7. Documentation; = Thesaurus
    20. Term Selection I: Vocabulary Resources. A Handbook of Irish Folklore by Seán Ó Súilleabháin. Béaloideas: Journal of the Folklore of Ireland Society
    21. Term Selection II: Form of Entry. Nouns: count nouns (cows, dogs) in the plural, non-count (livestock, milk) in the singular. Verbs: gerund or verbal, no infinitive. Adjectives: avoid unless significant. Adverbs: avoid. Articles (the, a): avoid.
    23. Systematic Structure: Hierarchical (facet) or classified (subject) display. Fundamental facets as top concepts (TT). Easily updated structure. Good demonstration of the ISO 25964 hierarchy rules: Thing/kind; Whole/part; Particular instances of a class
    24. Facet Analysis: ISO: “grouping of concepts of the same inherent category”. Objects, materials, people, places, etc. Ranganathan, 1920s and 1930s: Personality, Matter, Energy, Space, Time. Classification Research Group, 1960s: Thing, kind, part, property, material, process, operation, agent, patient, product, by-product, space and time
    25. Time; Place / Space / Environment; Products; Activities; Processes and Phenomena; Events; Agents; Objects; Materials; Attributes and Properties; Parts; Genre; Abstract Entities and Concepts
    32. Current and Future Work: Expansion of the pilot thesaurus to approximately 2,000 preferred terms. Feasibility study into representation in SKOS. Potential Future Work: Multilingual thesaurus (Irish and English). Representation in SKOS. Mapping to other vocabularies
    33. Thank you! UF Thanks! UF Cheers! UF Ta!
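The SKOS representation listed under future work could look roughly like the Python sketch below. It is a guess at what such a study would explore, not the project's output. It uses the usual correspondence between thesaurus tags and SKOS properties (UF to `skos:altLabel`, BT to `skos:broader`, NT to `skos:narrower`, RT to `skos:related`, SN to `skos:scopeNote`). The `http://example.org/motif/` namespace, the `slug` helper and the sample concept are placeholders, and labels containing quotation marks are not escaped.

```python
# Placeholder namespace for illustration; not a real MoTIF URI.
BASE = "http://example.org/motif/"


def slug(label):
    """Turn a label into a simple local name (hypothetical helper)."""
    return label.lower().replace(" ", "-")


def to_turtle(concepts):
    """Serialise {label: {"UF": [...], "BT": [...], ...}} as SKOS in Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{BASE}> .",
        "",
    ]
    for label, data in concepts.items():
        props = ["a skos:Concept", f'skos:prefLabel "{label}"@en']
        props += [f'skos:altLabel "{alt}"@en' for alt in data.get("UF", [])]
        props += [f"skos:broader ex:{slug(bt)}" for bt in data.get("BT", [])]
        props += [f"skos:narrower ex:{slug(nt)}" for nt in data.get("NT", [])]
        props += [f"skos:related ex:{slug(rt)}" for rt in data.get("RT", [])]
        if "SN" in data:
            props.append(f'skos:scopeNote "{data["SN"]}"@en')
        lines.append(f"ex:{slug(label)} " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


# Invented sample concept, not taken from the pilot thesaurus.
print(to_turtle({"dogs": {"UF": ["canines"], "BT": ["animals"]}}))
```

Non-preferred terms become alternative labels on the concept here, which is how SKOS normally models the USE/UF pair.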
