Taxonomy:
Do I need one?


Leigh White
ElementalSource, LLC
Yes
What I’ll talk about

•   What happens without a taxonomy
•   What a taxonomy is and does
•   Why a taxonomy is important
•   A few first development steps
What I won’t talk about

• All the different kinds of taxonomies
• Details about development
• Tools for development
  – except DITA subjectScheme (briefly!)
A little history
What the he** IS that???
Oh, let’s call it a…

• Use the native name
• Name it after something familiar
  that it’s kind of “like”
• “Like” is murky; you have to define
  “like”
  – How it looks? Shape? Color? Size?
  – How it tastes?
  – How it acts?
Earth apples, anyone?

• aardappel (Dutch)
• pomme de terre (French)
*not apples
We know this because

• We have a taxonomy (Linnaean
  classification) that specifies degrees
  of relationship between living things
Distant cousins, at best

Rank      Apple          Potato
Kingdom   Plantae        Plantae
Phylum    Anthophyta     Anthophyta
Class     Eudicots       Eudicots
Order     Rosales        Solanales
Family    Rosaceae       Solanaceae
Genus     Malus          Solanum
Species   M. domestica   S. tuberosum
So, a taxonomy is

• A way of defining “like”
• A way of expressing relationships
  between things
  – We might already be instinctively
    aware of these relationships but need
    to formalize them
• A way of discovering relationships
  between things
• An information model
Taxonomies are

• typically organized by parent-child
  relationships
• typically expressed with the phrase 'is
  a kind of' or 'is a subtype of'
• structured so that a subtype has the
  same properties, behaviors, and
  constraints as its supertype, plus
  one or more additional properties,
  behaviors, or constraints
Uhh…what?

• For example: car is a kind of
  vehicle, so any car is also a vehicle,
  but not every vehicle is a car
• The level “car” is more constrained
  than the level “vehicle”
• A car has all the properties of a
  vehicle plus some other properties
  specific to a car
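A minimal sketch of the "is a kind of" relationship, written as Python classes. The class names, attributes, and the four-wheel constraint are illustrative assumptions, not part of any real taxonomy.

```python
class Vehicle:
    """Supertype: properties that every vehicle has (hypothetical attributes)."""

    def __init__(self, wheels: int, max_passengers: int) -> None:
        self.wheels = wheels
        self.max_passengers = max_passengers


class Car(Vehicle):
    """Subtype: a car is a kind of vehicle, with one constraint and one extra property."""

    def __init__(self, max_passengers: int, body_style: str) -> None:
        # Assumed constraint for this sketch: a car always has four wheels.
        super().__init__(wheels=4, max_passengers=max_passengers)
        # Additional property that only makes sense at the "car" level.
        self.body_style = body_style


sedan = Car(max_passengers=5, body_style="sedan")
bicycle = Vehicle(wheels=2, max_passengers=1)

print(isinstance(sedan, Vehicle))  # any car is also a vehicle
print(isinstance(bicycle, Car))    # but not every vehicle is a car
```

The subtype narrows what the supertype allows (wheel count) and adds something of its own (body style), which is the same pattern a taxonomy level follows.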
Taxonomies are all around us

• It’s our nature to classify
• Many of these taxonomies are
  internal, arbitrary and personal
• A true taxonomy must be uniform
  and unambiguous
Other familiar taxonomies

• Dewey Decimal System
• Library of Congress System
• ICD-9/10 codes
• computer folder system
  – probably most common
    taxonomy in tech comm
And one I especially dig

• A taxonomy of wrongness!
  – www.fallacyfiles.org/taxonomy.html
We have metadata…why do we need
a taxonomy too?

• Where did that metadata come
  from?
  – You must have had some idea of how
    your content should be classified
  – If so, then you already have the
    beginnings of a taxonomy, at least in
    your head
  – So take it a step further
Metadata complements taxonomy
and vice versa
• Metadata describes an individual piece of
  content but doesn’t capture relationships
  very well.
• Metadata is part of content so updates
  can be unwieldy; better to maintain the
  model outside the content
• A taxonomy serves as a roadmap…it both
  describes current content and predicts
  future content
• A taxonomy highlights similarities (and
  differences) across products
• Metadata can pick up where taxonomy
  leaves off
What else are taxonomies good for?

• Controlled vocabularies
  – indexing
  – keywords
  – glossaries

• Searching/browsing/filtering
  – Faceted search
  – Filtering for custom doc publishing

• Content reuse
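To make faceted filtering concrete, here is a rough sketch, assuming each piece of content carries one value per facet. The topic titles, facet names, and the `filter_by_facets` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical content inventory: each topic is tagged with one value per facet.
TOPICS: List[Dict[str, str]] = [
    {"title": "Configure medications", "product": "Widget", "audience": "admin"},
    {"title": "Order lab tests", "product": "Gadget", "audience": "clinician"},
    {"title": "Manage user accounts", "product": "Gadget", "audience": "admin"},
]


def filter_by_facets(topics: List[Dict[str, str]], **facets: str) -> List[str]:
    """Return titles of topics that match every requested facet value."""
    return [
        topic["title"]
        for topic in topics
        if all(topic.get(name) == value for name, value in facets.items())
    ]


print(filter_by_facets(TOPICS, audience="admin"))
print(filter_by_facets(TOPICS, product="Gadget", audience="admin"))
```

Each added facet narrows the result set, which is the same mechanism behind filtering topics for a custom doc build.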
Amazon.com
So far…

• we’ve looked at hierarchical
  taxonomies
When hierarchy isn’t enough

• A Cockapoo is a kind of dog. It’s the
  product of a poodle and a Cocker
  Spaniel. A hierarchy cannot capture
  all these relationships.
There’s an alternative (polyarchical)
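One way to picture a polyarchy is as a structure where a term may have more than one parent. The sketch below assumes a simple dictionary of parent links and a hypothetical `ancestors` helper; it treats "product of" and "kind of" as the same kind of link, which is a simplification.

```python
from typing import Dict, List, Set

# Each term maps to its parents. More than one parent makes this a polyarchy.
PARENTS: Dict[str, List[str]] = {
    "Cockapoo": ["Poodle", "Cocker Spaniel"],
    "Poodle": ["Dog"],
    "Cocker Spaniel": ["Dog"],
    "Dog": [],
}


def ancestors(term: str, parents: Dict[str, List[str]] = PARENTS) -> Set[str]:
    """Collect every term reachable by following parent links upward."""
    found: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


print(sorted(ancestors("Cockapoo")))
```

In a strict hierarchy every list in `PARENTS` would hold at most one entry; allowing two is what lets the Cockapoo keep both breeds.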
Purists might say…

• that you need different notations to
  express different kinds of
  relationships
• or that you must express the
  relationships uniformly
Maybe, maybe not

• You need what you need to capture
  the relationships you need to
  express
• No more, no less - KISS
• The relationships already exist; you
  are just using the taxonomy to
  express them
Decisions to make

• What kind of taxonomy:
  – hierarchical, polyarchical, something
    else?
• If hierarchical, how many levels?
• If polyarchical, what kinds of
  relationships and how designated?
• Tool to use? (meh)
• How to associate content with
  taxonomy?
Questions to ask
• What will the taxonomy be used for?
  – indexing, search, etc.
• Who are the users?
  – content creators, clients, SMEs, support, etc.
• What content will the taxonomy cover?
  – topics, images, demos, videos, etc.
• What are the scope and limits?
  – handling off-topic content—what to
    include/exclude
• What are the resources and constraints?
  – skills/expertise, timing, technology, funding,
    stakeholder roles, etc.
More questions to ask

• Who is responsible for development?
• What are secondary/contributor
  roles?
• How does taxonomy fit in with other
  metadata?
• How to handle ongoing support and
  maintenance?
Some first steps
• Start small—maybe just one small product
• Do content audit of everything the
  taxonomy will categorize
• Compare TOCs of existing deliverables
  – Find commonalities, differences
• Compare indexes of existing deliverables
  – Discover terms already in use
• Use folder structure
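The TOC and index comparison can be sketched with plain sets. The deliverable names, the terms, and the `compare_terms` helper below are made up for illustration; a real audit would pull the terms from the actual deliverables.

```python
from typing import Dict, Set, Tuple

# Hypothetical index terms harvested from two existing deliverables.
INDEX_TERMS: Dict[str, Set[str]] = {
    "Widget User Guide": {"login", "medications", "reports", "user accounts"},
    "Gadget User Guide": {"lab orders", "login", "reports", "user accounts"},
}


def compare_terms(
    terms_by_deliverable: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return terms shared by all deliverables and, per deliverable, the rest."""
    common = set.intersection(*terms_by_deliverable.values())
    not_shared = {
        name: terms - common for name, terms in terms_by_deliverable.items()
    }
    return common, not_shared


common, not_shared = compare_terms(INDEX_TERMS)
print(sorted(common))
print({name: sorted(terms) for name, terms in not_shared.items()})
```

Shared terms are candidates for top-level categories; the leftovers hint at product-specific branches.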
More first steps
• Assemble starting list of categories
  that cover existing content based on
  TOC, index and content audit
• Place existing content within
  taxonomy (on paper)
• Create taxonomy task force to
  review and refine
  – Avoid too many cooks
DITA Classification and Subject
Scheme
• Subject scheme
  – Defines controlled values (“buckets”)
    for classifying content
  – Defines relationships between those
    buckets
• Classification
  – Groups content into appropriate
    buckets
Subject classification scheme
subjectScheme map
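<subjectScheme>
   <hasInstance>
      <subjectdef keys="product">
         <!-- Widget and Gadget contain child subjects, so they cannot be self-closing -->
         <subjectdef keys="Widget">
            <!-- Note: the key "module" also appears under Gadget; a real scheme would likely need a distinct key for each -->
            <subjectdef keys="module">
               <subjectdef keys="Meds"/>
               <subjectdef keys="AdminW"/>
            </subjectdef>
         </subjectdef>
         <subjectdef keys="Gadget">
            <subjectdef keys="module">
               <subjectdef keys="AdminG"/>
               <subjectdef keys="Labs"/>
            </subjectdef>
         </subjectdef>
      </subjectdef>
   </hasInstance>
</subjectScheme>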
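To see the parent-child relationships a subject scheme encodes, the following sketch reads a scheme with Python's standard `xml.etree.ElementTree` and builds a child-to-parent lookup. It assumes a simplified version of the scheme above that leaves out the intermediate "module" level so that every key is unique; the `child_to_parent` helper is hypothetical and is not part of any DITA toolkit.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Simplified scheme (assumption: "module" level omitted so keys are unique).
SCHEME = """
<subjectScheme>
  <hasInstance>
    <subjectdef keys="product">
      <subjectdef keys="Widget">
        <subjectdef keys="Meds"/>
        <subjectdef keys="AdminW"/>
      </subjectdef>
      <subjectdef keys="Gadget">
        <subjectdef keys="AdminG"/>
        <subjectdef keys="Labs"/>
      </subjectdef>
    </subjectdef>
  </hasInstance>
</subjectScheme>
"""


def child_to_parent(scheme_xml: str) -> Dict[str, Optional[str]]:
    """Map each subject key to the key of the subjectdef that contains it."""
    parents: Dict[str, Optional[str]] = {}

    def walk(element: ET.Element, parent_key: Optional[str]) -> None:
        for child in element:
            if child.tag == "subjectdef":
                key = child.get("keys")
                parents[key] = parent_key
                walk(child, key)
            else:
                # Wrapper elements such as hasInstance do not define a subject.
                walk(child, parent_key)

    walk(ET.fromstring(scheme_xml), None)
    return parents


for key, parent in child_to_parent(SCHEME).items():
    print(f"{key} -> {parent}")
```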
Associate topics with subjects
<map>
   <topicref href="t_configure_med.xml">
      <topicsubject>
         <!-- keyref points at subjects defined in the scheme; keys would define new keys -->
         <subjectref keyref="Meds"/>
         <subjectref keyref="AdminW"/>
         <subjectref keyref="AdminG"/>
      </topicsubject>
   </topicref>
</map>
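A classification is only useful if every subject it points to exists in the scheme. As a rough illustration, the sketch below checks a classification map against a set of defined keys using the standard `xml.etree.ElementTree` module. The `unknown_subjects` helper and the "Pharmacy" subject are hypothetical, added only to show a failing reference. The helper reads the subject from `keyref` and falls back to `keys`, on the assumption that either might appear in a map.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Keys assumed to be defined by the subject scheme.
DEFINED_KEYS: Set[str] = {
    "product", "Widget", "Gadget", "module", "Meds", "AdminW", "AdminG", "Labs",
}

# "Pharmacy" is a made-up subject that the scheme does not define.
CLASSIFICATION_MAP = """
<map>
  <topicref href="t_configure_med.xml">
    <topicsubject>
      <subjectref keyref="Meds"/>
      <subjectref keyref="AdminW"/>
      <subjectref keyref="Pharmacy"/>
    </topicsubject>
  </topicref>
</map>
"""


def unknown_subjects(map_xml: str, defined_keys: Set[str]) -> Dict[str, List[str]]:
    """Return, per topic href, the referenced subjects missing from the scheme."""
    problems: Dict[str, List[str]] = {}
    for topicref in ET.fromstring(map_xml).iter("topicref"):
        # Only look at subject references that belong directly to this topicref.
        for ref in topicref.findall("topicsubject/subjectref"):
            key = ref.get("keyref") or ref.get("keys")
            if key not in defined_keys:
                problems.setdefault(topicref.get("href"), []).append(key)
    return problems


print(unknown_subjects(CLASSIFICATION_MAP, DEFINED_KEYS))
```

A check like this could run before publishing so that a typo in a subject name does not silently drop a topic from a filtered build.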
Recommended reading/viewing

• The Accidental Taxonomist, Heather
  Hedden
• Organising Knowledge: Taxonomies,
  Knowledge, and Organisational
  Effectiveness, Patrick Lambe
• Joe Gelb’s presentation on
  subjectScheme:
  http://svdig.ditamap.com/videos/svdig-2011-05-11.htm
Contact me



Leigh White
ElementalSource, LLC
elementalsource@gmail.com
678.467.7706
