Improving Subject Coding



This presentation by a Bowker subject analyst discusses the need for subject codes versus keywords and provides best practices for applying subject codes. It also references materials available from the Book Industry Study Group (BISG).

  • Codes are three letters followed by six numbers

    The codes are “dumb” in that they don’t really convey any information except for the fact that the first three characters reflect the section and the first code in each section ends with 000000 and has the subheading General (except the standalone code NON000000).

    This is because codes persist even if the text is changed (unless a subject is moved to another section…in which case one code is inactivated and a new one is added).

    Subjects are hierarchical based on the text (we’ll see a sample page next).

    Literal is the section name followed by one or more subheadings separated by /

    The full BISAC document contains both general and specific usage notes, highlights the changes made for the new edition, lists all codes that have been inactivated (ever) with instructions on recoding, and also includes two separate lists: merchandising themes and regional themes.

    Merchandising themes (MT) and regional themes (RT) are not used as often as subjects but are very helpful for providing information not covered by subjects.

    BISAC subjects are not as specific as LC subjects, but the benefit is that they are easier to learn and they are an excellent way to group books together and track industry trends.

    For example, there aren’t enough books published on the Sistine Chapel to really spot a trend vs. a blip…but BISAC subjects allow you to do this on various levels (either by sections like Fiction, groupings within a section like romance fiction, or specific subject like paranormal romance).
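The code format described in these notes can be checked mechanically. The following is a minimal sketch, not an official BISG validator: the function names `parse_code` and `split_literal` are hypothetical, and it assumes codes are uppercase and that subheadings in a literal are separated by " / " with spaces on both sides.

```python
import re

# Pattern taken from the notes: three letters followed by six digits.
# Assumption: letters are always uppercase.
BISAC_CODE = re.compile(r"[A-Z]{3}\d{6}")


def parse_code(code):
    """Return (section_prefix, is_general) for a well-formed code, else None.

    The only information carried by the code itself is the section prefix
    and whether it is the first code of the section (ends with 000000).
    NON000000 also ends with 000000 but is the standalone code without
    the General subheading.
    """
    if not BISAC_CODE.fullmatch(code):
        return None
    return code[:3], code.endswith("000000")


def split_literal(literal):
    """Split a literal into the section name and its subheadings."""
    return [part.strip() for part in literal.split(" / ")]


if __name__ == "__main__":
    print(parse_code("ARC016000"))
    print(parse_code("ARC000000"))
    print(split_literal("ARCHITECTURE / Buildings / Religious"))
```

Everything else about a subject (its place in the hierarchy, its wording) has to come from the literal, which is why the two helpers are kept separate.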

  • Here’s a sample of the beginning of the ARC section.
    All codes begin with ARC
    000000 first
    ARC001000 way down the list
    Effectively random codes otherwise
    Trees based on text
    Xrefs within section and outside of section (in this case subject was moved)
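Because the tree comes from the text rather than from the code numbers, the hierarchy can be rebuilt by splitting each literal. This is an illustrative sketch using a few rows from the ARC sample; `build_tree` and the `"_code"` key are hypothetical names, and cross-reference ("see") entries are left out.

```python
# A few rows from the ARC sample page: (code, literal).
SAMPLE = [
    ("ARC000000", "ARCHITECTURE / General"),
    ("ARC024000", "ARCHITECTURE / Buildings / General"),
    ("ARC024010", "ARCHITECTURE / Buildings / Landmarks & Monuments"),
    ("ARC016000", "ARCHITECTURE / Buildings / Religious"),
    ("ARC001000", "ARCHITECTURE / Criticism"),
]


def build_tree(rows):
    """Nest subjects by the parts of their literal, not by code number."""
    tree = {}
    for code, literal in rows:
        node = tree
        for part in literal.split(" / "):
            node = node.setdefault(part, {})
        node["_code"] = code  # hypothetical key marking the code at this node
    return tree


if __name__ == "__main__":
    tree = build_tree(SAMPLE)
    # Children of ARCHITECTURE / Buildings, found through the text alone.
    print(sorted(tree["ARCHITECTURE"]["Buildings"]))
    # Sorting by code puts ARC001000 (Criticism) second, although it
    # appears well down the list when the section is ordered by text.
    print(sorted(code for code, _ in SAMPLE))
```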
  • 79 merchandising theme codes.

    The purpose of this field is to provide a list of terms representing frequently requested merchandising themes and topics so that various organizations within the book industry can assemble materials appropriate to that theme or topic.

    Merchandising themes can be used in addition to subject codes to denote:
    An audience to which a work may be of particular appeal.
    A time of year or event for which a work may be especially appropriate.
    A frequently requested topic.

    In addition to grouping together works that can be classified under a variety of subjects, this field can be used to further describe fictional works that have been subject-coded by genre.
  • 466 regional theme codes.

    The BISAC Regional Themes are optional codes that can be used in conjunction with a Subject Code or with a Subject Code and a Merchandising Theme, but should not be used on their own, e.g., in lieu of a Subject Code. The codes can be used for fiction or non-fiction. When selecting a code, the editor should use the most general applicable code without using multiple specific codes. For example, if a book focuses on New England, it should be assigned New England rather than a code for each of the six states that comprise New England. A general rule is to use the most inclusive code. However, the BISAC Subject Codes Committee makes no recommendation as to how many codes should be used. While a book does not need a separate code for continent, country, state and city, if the desired city is included on the list, it should be coded for each location appropriate to the book.

    Level 1 – Continents
    Level 2 – Subcontinents
    Level 3 – Countries
    Level 4 – Subcountry regions
    Level 5 – States, Provinces, Counties
    Level 6 – City, Town, Area
    Level 7 – Borough, Neighborhood, District
  • Not getting into these but the websites are given
  • Statistics just based on data Bowker has received from publishers.

    We received 1.66 million ISBNs with a USA publication date of 2013 (print or ebooks).
    1.27 million had BISAC code provided by publisher
    1.25 codes per title
    742K contained a general code
    674K only contained a general code
    2,502 contained NON000000
    2,331 were only assigned NON000000
    4,326 contained a code that is now inactive
    13K had miscellaneous bad stuff (NA, 8-character codes, text instead of codes, etc.)
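To put these counts in proportion, the shares can be worked out directly. This is a rough sketch: the inputs are the rounded figures quoted above, so the percentages are approximate, and taking the 1.27 million coded titles as the denominator for the general-code shares is an assumption. The variable names and the `pct` helper are illustrative.

```python
# Figures as quoted in the notes (rounded by the presenter).
total_isbns = 1_660_000    # ISBNs received with a 2013 USA publication date
with_bisac = 1_270_000     # had a BISAC code provided by the publisher
general_any = 742_000      # contained a general code
general_only = 674_000     # contained only a general code
non_any = 2_502            # contained NON000000
non_only = 2_331           # were assigned only NON000000


def pct(part, whole):
    """Percentage of whole represented by part, to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print("with a BISAC code:", pct(with_bisac, total_isbns))           # 76.5
    print("coded, with a general code:", pct(general_any, with_bisac))  # 58.4
    print("coded, general code only:", pct(general_only, with_bisac))   # 53.1
    print("general-code titles with nothing more specific:",
          pct(general_only, general_any))                               # 90.8
    print("NON000000 titles with no other code:", pct(non_only, non_any))  # 93.2
```

On these figures, roughly three quarters of the ISBNs arrived with a BISAC code, and a little over half of those carried nothing more specific than a general code.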
  • At least one valid active code.

    The most specific code(s) applicable to a product and generally speaking up to three codes (if appropriate).

    Fielded or coded correctly depending on the format that is used.

    More than three codes only for those cases where it is necessary to convey the multidisciplinary aspect of a book.

    The same code(s) for each format of a title (i.e., where the same content is provided in a different delivery method – if something is a graphic novelization or children’s adaptation of a novel, then the subjects should change accordingly). Editions too!

    I am not telling people how to assign subjects to their own books, but there is “wrong” (a basketball book with a baseball subject), there is “could be better”, and there is “depends on your preference”, particularly as to the order of the subjects. A book on bed and breakfasts in New England needs two subjects, and I wouldn’t want to dictate the order.
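Several of these points can be screened automatically before data is sent out. The sketch below is one possible check and not a Bowker or BISG tool: `review_codes` is a hypothetical name, the `INACTIVE` set is a stand-in for the inactivated-code list in the full BISAC document (seeded only with the two inactivated codes named in the samples), and "general" is detected only at section level (codes ending in 000000).

```python
import re

CODE_RE = re.compile(r"[A-Z]{3}\d{6}")

# Hypothetical stand-in for the full inactivated-code list.
INACTIVE = {"SEL022000", "FAM019000"}


def review_codes(codes, inactive=INACTIVE):
    """Return advisory notes for the subject codes supplied for one title."""
    notes = []
    valid = [c for c in codes if CODE_RE.fullmatch(c)]
    if len(valid) < len(codes):
        notes.append("malformed code")
    if len(set(valid)) < len(valid):
        notes.append("duplicate code")
    unique = list(dict.fromkeys(valid))  # drop duplicates, keep order
    active = [c for c in unique if c not in inactive]
    if len(active) < len(unique):
        notes.append("inactive code")
    if not active:
        notes.append("no valid active code")
    if "NON000000" in active:
        notes.append("NON000000 used")
    general = [c for c in active if c.endswith("000000") and c != "NON000000"]
    specific = [c for c in active if not c.endswith("000000")]
    if general and not specific:
        notes.append("general codes only")
    # Assumption: a section-level general code is redundant when a more
    # specific code from the same section is also present.
    if any(s[:3] == g[:3] for g in general for s in specific):
        notes.append("general code alongside a more specific code")
    if len(active) > 3:
        notes.append("more than three codes")
    return notes


if __name__ == "__main__":
    print(review_codes(["TRV010000", "TRV001000", "TRV000000"]))
    print(review_codes(["SEL022000"]))
    print(review_codes(["FIC019000", "ART016010"]))
```

A check like this only catches the mechanical problems. Whether a code is the right subject for the book, or whether the order is the best one, still needs a person.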
  • Improving Subject Coding

    1. Improving Subject Coding. Presented By Michael Olenick
    2. Agenda • Why Subjects? • Why Keywords? • Subjects vs. Keywords • What Are BISAC Subjects? • Advice for BISAC Coding • Samples • Where Can I Learn More?
    3. 11K Searches Books In Print®
    4. Why Subjects? • Uniformity of usage • One place for all books on topic • Specificity depends on the list • Grouping data together for – Analytical purposes – Data retrieval – Shelving
    5. The Goal: Assign the most specific non-redundant subject(s) from the list you are using. BISAC: • ART016000 ART / Individual Artists / General • ART015080 ART / History / Renaissance • ARC016000 ARCHITECTURE / Buildings / Religious. Library of Congress: • Michelangelo Buonarroti, 1475-1564--Criticism and interpretation. • Cappella Sistina (Vatican Palace, Vatican City) • Bible. Old Testament--Illustrations. • Mural painting and decoration, Italian--Vatican City. • Mural painting and decoration, Renaissance--Vatican City.
    6. Why Keywords? • Immediate access • No need for a book to be manually classified • Keywords are “free” if derived from title, description, reviews, full text, etc. • No need to know the specific phrasing and rules for a subject list
    7. User Preferences • How tolerant are people for wading through a lot of results? • People are “Googlized” • You only get the books you want with subjects…you get more results with keywords • Subjects feel valuable…but sometimes it is hard to monetize this and subjects “cost” more
    8. Keyword – 1307 Results on “mantle”
    9. Using Link on the Subject
    10. Gets 57 Results on That Subject
    11. Subjects vs. Keywords (each row: Subjects | Keywords)
        • fewer results | more results
        • higher match to relevant titles | includes false hits
        • could assign incorrect subjects | even if false hit, the keyword is in the metadata so it’s never “wrong”
        • better for conveying genre fiction information | easy to identify characters and places within the work
        • some headings are not intuitive | no need to cross reference or know particular terminology
        • meets needs of resellers & discovery engines (shelf classification & website organization) | meets needs of end user searching in small text snippets
    12. What Are BISAC Subjects? • North American book industry standard for subject classification • Maintained by the BISAC Subject Committee of BISG with new edition issued annually in the Fall • BISG offers training on subjects and other topics via webcasts and sponsored events • 3946 codes/subjects grouped under 52 main sections • The subject list is available on the BISG site – No cost for lookup purposes – Licensing the list allows you to download versions in Excel, PDF and Word for unlimited use in internal systems
    13. What Are BISAC Subjects? • Codes are three letters followed by six numbers • “Dumb” codes except that: – First three characters reflect the section – First code in each section ends with 000000 • Subjects are hierarchical based on text • Literal is the section name followed by one or more subheadings separated by / • Usage notes, edition differences, inactivated codes • Merchandising Themes & Regional Themes • Not as specific as LC subjects but this allows for statistically significant groupings
    14. BISAC Subjects
        ARC000000 ARCHITECTURE / General
        ARC022000 ARCHITECTURE / Adaptive Reuse & Renovation
        ARC023000 ARCHITECTURE / Annuals
        ARC024000 ARCHITECTURE / Buildings / General
        ARC024010 ARCHITECTURE / Buildings / Landmarks & Monuments
        ARC011000 ARCHITECTURE / Buildings / Public, Commercial & Industrial
        ARC016000 ARCHITECTURE / Buildings / Religious
        ARC003000 ARCHITECTURE / Buildings / Residential
        ARCHITECTURE / CAD (Computer Aided Design) see Design, Drafting, Drawing & Presentation
        ARC019000 ARCHITECTURE / Codes & Standards
        ARC001000 ARCHITECTURE / Criticism
        ARC002000 ARCHITECTURE / Decoration & Ornament
        ARC004000 ARCHITECTURE / Design, Drafting, Drawing & Presentation
        ARCHITECTURE / Feng Shui see BODY, MIND & SPIRIT / Feng Shui
    15. Merchandising Themes [non-consecutive samples]
        ET010 CULTURAL HERITAGE / African
        ET020 CULTURAL HERITAGE / African American
        ET022 CULTURAL HERITAGE / Asian / General
        ET040 CULTURAL HERITAGE / Asian / Chinese
        ET110 CULTURAL HERITAGE / Asian / Japanese
        ET130 CULTURAL HERITAGE / Asian / Korean
        ET220 CULTURAL HERITAGE / Asian / Vietnamese
        EV010 EVENT / Anniversary
        EV020 EVENT / Back to School
        EV030 EVENT / Baptism
        HL005 HOLIDAY / Chinese New Year
        HL010 HOLIDAY / Christmas
        TP020 TOPICAL / Black History
        TP024 TOPICAL / Blank Books, Journals
        TP026 TOPICAL / Boy's Interest
    16. Regional Themes (each row: Code Level | Region)
        LEVEL_1 | Europe
        LEVEL_2 | British Isles
        LEVEL_3 | Ireland
        LEVEL_6 | Dublin
        LEVEL_3 | United Kingdom, Great Britain
        LEVEL_4 | Channel Islands
        LEVEL_4 | England
        LEVEL_6 | London
        LEVEL_6 | Manchester
        LEVEL_4 | Isle of Man
        LEVEL_4 | Northern Ireland
        LEVEL_4 | Scotland
        LEVEL_6 | Edinburgh
    17. Other Schemas • BIC – roughly the UK equivalent of BISAC, maintained by Book Industry Communication • Thema – roughly an internationalized version of BIC, maintained via a global steering committee under EDItEUR – not officially a replacement for BIC but if the uptake is strong, it effectively will be (but not a replacement for BISAC) – recently introduced
    18. Statistics for 2013 Publications • 1.66 million ISBNs received (USA print & ebooks) • 1.27 million had BISAC code provided by publisher • 1.25 codes per title • 742K contained a general code – 674K only contained a general code • 2,502 contained NON000000 – 2,331 were only assigned NON000000 • 4,326 contained a code that is now inactive • 13K had miscellaneous bad stuff (NA, 8-character codes, spaces, text instead of codes, etc.)
    19. What Publishers Should Provide • At least 1 valid active code • Most specific code(s) applicable • Up to 3 codes (if appropriate) • > 3 codes only for rare cases where it is necessary to convey the multidisciplinary aspect of a work • All formats & editions should have same codes (assuming same content) • There is “wrong”, there is “could be better”, and there is “depends on your preference”
    20. Additional Tips • Check the latest version for newly added codes • Use data to help you subject code – Descriptions – Table of Contents • Match subject codes to other metadata fields – Age – Grade – Audience – Reading Level – Keyword – Themes
    21. User Advisories • Duplicate codes • Inactive codes • NON000000 NON-CLASSIFIABLE • General codes only • General codes along with more specific codes • Mixing codes – Fiction & nonfiction – Adult & juvenile
    22. How Many Subjects Does an ISBN Need? • No correct number • Do not have to force multiple codes • But new information is valuable • Primary code first
    23. Samples from Publisher Data • Good – several MED, SOC and PSY subjects • New subject could be added – PSY045070 PSYCHOLOGY / Movements / Cognitive Behavioral Therapy (CBT)
    24. Samples from Publisher Data • Good - JUV043000 JUVENILE FICTION / Readers / Beginner • But add also - JUV002200 JUVENILE FICTION / Animals / Pigs
    25. Samples from Publisher Data • Good - TRV010000 TRAVEL / Essays & Travelogues and TRV001000 TRAVEL / Special Interest / Adventure • Not Needed - TRV000000 TRAVEL / General
    26. Samples from Publisher Data • Only code assigned was inactivated in 2001 - SEL022000 SELF-HELP / Recovery
    27. Samples from Publisher Data • Could be better - FIC032000 FICTION / War & Military • Inactivated & incorrect - FAM019000 FAMILY & RELATIONSHIPS / Family Relationships
    28. Samples from Publisher Data • First subject better as second - JNF048000 JUVENILE NONFICTION / Reference / General • Second subject better as first - JNF051050 JUVENILE NONFICTION / Science & Nature / Biology
    29. Samples from Publisher Data • Correct - FIC019000 FICTION / Literary • Correctish - ART016010 ART / Individual Artists / Artists' Books
    30. Where Can I Learn More? • Latest Version of BISAC Subjects (2013 edition) • FAQ for BISAC Subjects • BISG Best Practices for Product Metadata • BISG Best Practices for Keywords in Metadata
    31. How Can I Participate in BISG? • If your company is a member of BISG, you can participate in committee meetings (including the BISAC Subject Committee). • The BISG site lists all the committees and working groups. • However, even if you are not a member you can e-mail suggestions through the BISG site.
    33. About Bowker: Bowker is the world’s leading provider of bibliographic information and management solutions designed to help publishers, booksellers, and libraries better serve their customers. Creators of products and services that make books easier for people to discover, evaluate, order, and experience, the company also generates research and resources for publishers, helping them understand and meet the interests of readers worldwide. Bowker, a ProQuest affiliate, is the official ISBN Agency for the United States and its territories and is headquartered in New Providence, New Jersey with additional operations in England and Australia.