American Chemical Society
Lessons Learned From
Building a Taxonomy and
Indexing 140+ years of
Content
Michael Darr
Columbus, OH
DHUG 2021
February 10
© 2021 American Chemical Society
Who is the American Chemical
Society?
A non-profit scientific organization with
more than 140 years’ experience, we are a
champion for chemistry, its practitioners
and our global community of members.
ACS Family: ACS Publications, C&EN
news, CAS, AACT (American Association
of Chemistry Teachers)
ACS Publications is recognized as a
leading publisher of authoritative scientific
information. Our 60+ peer-reviewed
journals are ranked the “most-trusted,
most-cited and most-read”.
ACS Publications Products
ACS publishes across the full spectrum of chemistry
and related sciences and in every print medium.
We’ve published more than
• 1.3 million research articles across more than 60
journals
• 100,000 news stories in award-winning C&EN
magazine
• 35,000 book chapters across more than 1,600
books
• 1,000 references and standards in ACS Reagent
Chemicals
Where were we starting from?
• In 2016, in partnership with CAS (a sister division of ACS), we
developed an initial taxonomy for use with ACS Omega, our new
multidisciplinary open access journal
• Content was indexed manually by CAS scientists during an article’s
production lifecycle
• Terms were typically available just in time for publication, which at
the time covered a relatively small set of content
• Assigned terms were uploaded to our delivery system where they
were displayed on the article page and used to provide a taxonomy-
driven navigation for the journal
Where did we need to go?
• Needed a taxonomy that was more customized for ACS
Publications’ needs
• Classify all published content
• Be able to handle processing 60,000+ articles a year in a timely
fashion
• Integrate display into a newly redesigned website
• Lay the groundwork to allow for expanding opportunities for new
non-journal products
Lessons From
Building a Taxonomy
Lessons From Building a Taxonomy
• Gather information on best practices and others’ experiences
• Get agreement early on from all the business owners on the
requirements for building the taxonomy
– Content domain experts and UI/UX engineers may have differing views
of what the customer and product needs are; establish clear
decision-making roles.
• Be aware of complications due to polyhierarchy
– Makes content discoverable under a subject area to which it may not
pertain
– For a publisher, prospective authors may try to use it to justify why their
submitted article fits the scope of a journal
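
The polyhierarchy complication above can be illustrated with a minimal Python sketch. The term names, the `PARENTS` map, and both helper functions are hypothetical and are not taken from the ACS taxonomy; the sketch only shows how one term with two parents makes an article browsable under every ancestor, including areas it may not really belong to.

```python
from collections import defaultdict

# Hypothetical child -> parents map. "Electrochemistry" has two parents,
# which is what makes this small taxonomy polyhierarchical.
PARENTS = {
    "Batteries": ["Electrochemistry"],
    "Electrochemistry": ["Physical Chemistry", "Energy"],
    "Physical Chemistry": [],
    "Energy": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by walking up from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def browse_index(article_terms, parents=PARENTS):
    """Map each subject area to the articles discoverable under it."""
    index = defaultdict(set)
    for article, terms in article_terms.items():
        for term in terms:
            # An article is listed under its own term and all ancestors.
            for area in {term} | ancestors(term, parents):
                index[area].add(article)
    return index


# One article indexed with a single term ends up under four subject areas.
index = browse_index({"article-1": ["Batteries"]})
```

Whether an article tagged "Batteries" should surface under "Physical Chemistry" is the kind of decision worth settling before the hierarchy drives navigation or journal-scope arguments.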
Lessons From Building a Taxonomy
• Ensure enough time and budget to enable sufficient
collaboration between your taxonomy consultants and
your internal content subject matter experts
• Establish live documents for more interactive
collaboration
• Ensure random sampling of content still includes an
appropriate percentage of research content and
high-value content
• Give more time to building content for customer
research
– Dependent on tools being used to facilitate customer
interaction
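
One way to keep a random sample from under-representing research and high-value content is stratified sampling with a fixed quota per content type. The sketch below assumes this approach; the function name, the `type` field, the quotas, and the toy corpus are all hypothetical and do not describe the process ACS used.

```python
import random


def stratified_sample(items, key, quotas, seed=0):
    """Draw a random sample with a fixed count per content type.

    items:  list of dicts describing content
    key:    dict field that names the stratum (content type)
    quotas: mapping of stratum -> number of items to draw
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = []
    for stratum, count in quotas.items():
        pool = [item for item in items if item[key] == stratum]
        # Never ask for more items than the stratum holds.
        sample.extend(rng.sample(pool, min(count, len(pool))))
    return sample


# Hypothetical corpus in which news outnumbers research content.
corpus = (
    [{"id": f"news-{i}", "type": "news"} for i in range(50)]
    + [{"id": f"research-{i}", "type": "research"} for i in range(20)]
    + [{"id": f"book-{i}", "type": "book_chapter"} for i in range(10)]
)
picked = stratified_sample(
    corpus, "type", {"research": 10, "news": 5, "book_chapter": 5}
)
```

A purely uniform draw from this corpus would be dominated by news items; the quotas guarantee research content makes up half the sample regardless of corpus proportions.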
Actions We Took
• Chose to have a “full taxonomy” and a “visible taxonomy”
– The full taxonomy is what is needed to accurately classify the
content
– The visible taxonomy is a subset of the full taxonomy, including only the
top levels and specific terms in those levels to display on our platforms
• Engaged in customer focus research testing two different visible
taxonomies
– Found in individual testing that testers didn’t have any real preference
on the structure (note: the final versions were not hugely dissimilar)
– Found in A/B testing on our platform that the data captured on user
interactions didn’t show a clear customer preference
Visible Taxonomy Display
Lessons From
Classifying Content
Lessons From Classifying Content
• If you have PDF content, evaluate as early as possible how accurate
automated classification of the content will be
– 120 years of PDF-only content made it difficult to programmatically
identify content consistently
– A common issue was skewed indexing results caused by text from the
preceding and following articles, because the content was generated
from scans of the original text
• Engage platform architects early to fully understand all existing
capabilities and limitations for applying and leveraging the terms
• Consider weighting the text of the article for more accurate results
– Example: Title (8), Abstract (8), Experimental Section (4)
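
The section weighting in the example above can be sketched as a weighted phrase count. The weights (8, 8, 4) come from the slide; everything else is an assumption for illustration: the function name, the plain phrase matching, the default weight of 1 for unlisted sections, and the sample article. A production indexing tool would apply its own rules.

```python
import re
from collections import Counter

# Section weights taken from the slide's example.
SECTION_WEIGHTS = {"title": 8, "abstract": 8, "experimental": 4}


def score_terms(sections, vocabulary, weights=SECTION_WEIGHTS):
    """Score each taxonomy term by weighted phrase counts across sections."""
    scores = Counter()
    for name, text in sections.items():
        weight = weights.get(name, 1)  # assumption: unlisted sections weigh 1
        lowered = text.lower()
        for term in vocabulary:
            # Whole-word, case-insensitive match of the term.
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            hits = len(re.findall(pattern, lowered))
            if hits:
                scores[term] += weight * hits
    return scores


# Hypothetical article text, split by section.
article = {
    "title": "Catalysis in polymer synthesis",
    "abstract": "We report a catalysis route to a new polymer.",
    "experimental": "The polymer was dried; the polymer was weighed.",
}
scores = score_terms(article, ["catalysis", "polymer", "spectroscopy"])
# scores["catalysis"] == 16, scores["polymer"] == 24
```

Here one mention in the title counts as much as two mentions in the experimental section, so terms that describe what the article is about outrank terms that only appear in procedural text.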
Actions We Took
• We developed an internal automated process to derive the visible
taxonomy from the full taxonomy by determining the top 5 terms
• Validated indexing results at both the granular and visible levels
– Using internal Subject Matter Experts to ensure we consistently
hit 85% or better accuracy
– Using external customers to verify accuracy of terms displayed
with the article
• Created a process for making adjustments to the visible terms
applied to the content
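
The slides do not describe how the top 5 terms are determined, so the following is only one plausible sketch: roll each article's full-taxonomy term scores up to a visible ancestor, sum them, and keep the five highest. The function name, the `visible_ancestor` mapping, the scores, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter


def visible_terms(scored_terms, visible_ancestor, limit=5):
    """Roll full-taxonomy scores up to visible terms and keep the top `limit`."""
    totals = Counter()
    for term, score in scored_terms.items():
        target = visible_ancestor.get(term)
        if target is not None:  # skip terms with no visible ancestor
            totals[target] += score
    # Sort by score descending, then by name, so ties resolve deterministically.
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:limit]]


# Hypothetical mapping from full-taxonomy terms to visible terms.
mapping = {
    "Lithium-ion batteries": "Energy",
    "Solid electrolytes": "Energy",
    "Cyclic voltammetry": "Electrochemistry",
}
# Hypothetical indexing scores for one article.
scored = {
    "Lithium-ion batteries": 24,
    "Solid electrolytes": 8,
    "Cyclic voltammetry": 16,
    "Unmapped term": 40,
}
result = visible_terms(scored, mapping)
# result == ["Energy", "Electrochemistry"]
```

Keeping the roll-up as a separate step means the visible terms can be adjusted or regenerated without re-indexing content against the full taxonomy.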
Thank You!
Michael Darr
IT Project Manager
Publications Production Operations
American Chemical Society
2540 Olentangy River Rd
Columbus, OH
mdarr@acs.org

Editor's Notes

  • #12 Note we still clash on whether Subject Areas should be organized alphabetically or by article count