The Dynamic Potential of
Semantic Enrichment
  or, Everything You Always Wanted to
  Know About Semantic Enrichment
         OK, not everything.
         Not even most things.
         Just some things you probably should be aware of.

  Allen Press
  Emerging Trends in Scholarly Publishing™ Seminar
  14 April 2011
     Pam Harley
      VP, Product & Market Development
      Semedica™ A DIVISION OF SILVERCHAIR
      pamh@semedica.com
      (434) 296-6333 x372
Why me?
   Me
       20+ years in STM publishing, many hats worn
           print, digital
           books, journals, news, continuing education…
           editorial, production, product development

   Silverchair
       10+ years working with STM publishers to build products
        and features from semantically tagged content




                                                                  2
Here’s the plan
   WHAT is semantic enrichment
   WHY you should care (benefits)
   HOW to get started
    (with a few side trips to make sure we’re all on the
    same page re: lingo)




                                                           3
First…

        DON’T
do what I’m about to do
   Don’t start by exploring technology
   (Hint: Start with user stories)




                                          4
What’s a user story?
A user story captures what the user wants
  to achieve: who wants the functionality
    and how it helps that user accomplish
              something useful




                                             5
Creating user stories
   Focus your tagging strategy on user stories—how
    people want to use your content:
       What tasks are they trying to do when they use your
        product? What answers are they looking for? At what point
        in their workflow is your product used?

   Almost all information sites have multiple user
    stories. Know them for each of your products
   Remember that your organization is also a key
    user of your product
                                                                    6
WHAT
is semantic…
   enrichment
     tagging
      markup
    indexing
 fingerprinting
 classification
 categorization
       ?
                  7
Semantics are about
meaning
   Content is currently written for human
    understanding, not for computers
   Semantic enrichment adds a layer of meaning to your
    content so that computers can make sense of it
    and build connections to it
   Semantic metadata answers the most important
    question of all for content producers and users:
             What is this content about?
    captured in a way that computers can process
                                                       8
“Atomizing” information
   A semantic approach requires you to go beyond
    documents and think of your content as data
   Semantic markup allows knowledge in your
    publications to be acted on as distinct bits of data
          For example:
          1 practice guideline = 1 document
                              OR
          1 practice guideline = 312 distinct pieces of data



                                                               9
Taxonomy is the
semantic foundation
   Taxonomy is the framework for the semantic layer
    and semantic tagging
   It allows…
       Normalization
       Consistency in tagging
       Concept grouping and hierarchical relationships
       Integrations/interoperability (internal and external)
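To make "framework for the semantic layer" concrete, here is a minimal sketch of what a taxonomy has to hold: a stable concept ID, a preferred label, equivalent terms (used for normalization), and a broader (parent) link that supports concept grouping and hierarchy. The `Concept` and `Taxonomy` classes, the IDs `C1`–`C3`, and the sample labels are all hypothetical; production taxonomies normally live in a dedicated management tool rather than in application code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One taxonomy concept: stable ID, preferred label, equivalents, parent."""
    concept_id: str
    preferred: str
    equivalents: set[str] = field(default_factory=set)
    broader: str | None = None  # ID of the parent concept, if any


class Taxonomy:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}

    def add(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept

    def ancestors(self, concept_id: str) -> list[str]:
        """Follow broader links upward (hierarchical relationships)."""
        chain = []
        parent = self.concepts[concept_id].broader
        while parent is not None:
            chain.append(parent)
            parent = self.concepts[parent].broader
        return chain

    def narrower(self, concept_id: str) -> list[str]:
        """Every concept below concept_id at any depth (concept grouping)."""
        return [cid for cid in self.concepts if concept_id in self.ancestors(cid)]


tax = Taxonomy()
tax.add(Concept("C1", "infectious diseases"))
tax.add(Concept("C2", "bacterial infections", broader="C1"))
tax.add(Concept("C3", "Clostridium difficile infection",
                {"c diff", "C. difficile infection"}, broader="C2"))
print(tax.ancestors("C3"))  # ['C2', 'C1']
print(tax.narrower("C1"))   # ['C2', 'C3']
```

Because every tag points at a concept ID rather than a string, grouping content under "infectious diseases" picks up anything tagged with a narrower concept too.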




                                                                10
Equivalent relationships are
critical
   Synonyms, abbreviations, jargon, misspellings, and
    codes are a critical component of a taxonomy
   Necessary to normalize the natural and constantly
    evolving variations in the language that authors use
    to describe concepts and searchers use to find them
   Vastly improve performance of autotagging systems
       Precise strings are easier to match programmatically, and a
        thesaurus magnifies the number of strings available to match to
        a given concept
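A simple way to see why a thesaurus helps autotagging is a dictionary-based matcher: every equivalent string becomes one more pattern that resolves to the same concept. The sketch below is an illustration only, not how any particular commercial autotagger works; the `THESAURUS` contents and concept IDs are hypothetical, and real systems add stemming, disambiguation, and grammar rules on top of string matching.

```python
from __future__ import annotations

import re

# Hypothetical thesaurus: concept ID -> preferred term plus equivalents
# (synonyms, abbreviations, shorthand).
THESAURUS = {
    "C100": ["neonate", "newborn", "newborn infant"],
    "C200": ["gamma-hydroxybutyrate", "gamma hydroxybutyrate", "GHB"],
    "C300": ["Clostridium difficile", "C. difficile", "c diff"],
}


def build_matcher(thesaurus: dict[str, list[str]]) -> tuple[re.Pattern, dict[str, str]]:
    """Compile one case-insensitive pattern covering every string in the thesaurus."""
    term_to_id = {}
    for concept_id, terms in thesaurus.items():
        for term in terms:
            term_to_id[term.lower()] = concept_id
    # Try longer terms first so "newborn infant" wins over "newborn".
    alternatives = sorted(term_to_id, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, term_to_id


def autotag(text: str, thesaurus: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (matched text, concept ID) pairs in document order."""
    pattern, term_to_id = build_matcher(thesaurus)
    return [(m.group(0), term_to_id[m.group(0).lower()]) for m in pattern.finditer(text)]


sample = "Treat the newborn infant with C. difficile; screen adults for GHB use."
print(autotag(sample, THESAURUS))
# [('newborn infant', 'C100'), ('C. difficile', 'C300'), ('GHB', 'C200')]
```

Each variant added to the thesaurus widens what the matcher can catch without changing the code.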


                                                                          11
Normalization
   Authors use different terminology to represent the
    same topics
    Examples:
    Synonyms (newborn = neonate)
    Abbreviations (GHB = gamma-hydroxybutyrate)
    Shorthand (c diff = Clostridium difficile)

   Searches for these language variations produce
    different results
   A semantic layer controlled by a taxonomy/
    thesaurus normalizes these variations
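On the search side, normalization means resolving whatever the searcher typed to a concept before looking anything up, so variant wordings return the same results. This is a minimal sketch assuming a hypothetical `VARIANTS` table (which a taxonomy/thesaurus would supply) and a hypothetical `CONCEPT_INDEX` built by the tagging process; the document IDs are made up.

```python
from __future__ import annotations

# Hypothetical equivalence table (variant -> concept ID) and a tiny concept
# index (concept ID -> documents tagged with that concept).
VARIANTS = {
    "newborn": "C100",
    "neonate": "C100",
    "ghb": "C200",
    "gamma-hydroxybutyrate": "C200",
    "gamma hydroxybutyrate": "C200",
    "c diff": "C300",
    "clostridium difficile": "C300",
}
CONCEPT_INDEX = {
    "C100": {"doc-12", "doc-31"},
    "C200": {"doc-07"},
    "C300": {"doc-31", "doc-44"},
}


def normalize(query: str) -> str | None:
    """Map a searcher's wording onto a concept ID, ignoring case and extra spaces."""
    key = " ".join(query.lower().split())
    return VARIANTS.get(key)


def search(query: str) -> set[str]:
    """Return documents tagged with the query's concept, whatever wording was used."""
    concept_id = normalize(query)
    if concept_id is None:
        return set()  # a real system would fall back to ordinary text search here
    return set(CONCEPT_INDEX.get(concept_id, set()))


print(sorted(search("Newborn")), sorted(search("neonate")))
```

Both searches print `['doc-12', 'doc-31']`, which is exactly the "same topic, same results" behavior the slide describes.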


                                                         12
Normalization in action at McGraw-Hill’s
AccessEmergency Medicine




                                           13
Consistency in tagging




                         14
Dynamic concept grouping and
hierarchical relationships




                               15
Hooks for integrations/
interoperability




                          16
Where does a taxonomy
come from?
   Your content collection
   Inputs from your users (e.g., author keywords,
    search logs)
   Subject matter expert consultation
   Industry standard terminologies
       Source for concepts, equivalents, guidance on hierarchy




                                                                  17
The importance of industry
standard terminologies
   Your taxonomy must be able to interact with
    standards of your domain to forge meaningful
    external integrations
   Many terminologies are in use in different scientific
    domains (UMLS, ACS, ACM, AIP, IEEE, OSA, EPA,
    NASA, USGS…). Investigate what’s available
       Great case example for domain-level taxonomy:
        For medical content, the UMLS Metathesaurus maps together 100+
        constituent health care vocabularies (MeSH, SNOMED, ICD,
        RxNorm…) to support health care interoperability

                                                                     18
Don’t reinvent the wheel!
   If there’s a taxonomy available that’s a good fit, use it
   BUT make sure you have a mechanism for adapting
    it to meet the needs of
       your content
       your users
       the pace of change/new concepts in your field

        [Note to STM publishers in cutting-edge areas: You can’t
        wait for the standards to catch up to your research output—
        you’ll need to be able to add concepts at the time of
        publication]
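One way to "adopt a standard but adapt it" is to keep a local extension layer beside the standard vocabulary: new concepts get provisional local IDs at publication time and are later mapped to the standard's ID once it catches up, so content tagged early never has to be retagged. The sketch below assumes that design; `AdaptedTaxonomy`, the `LOCAL-` ID scheme, and the `STD-` IDs are hypothetical and are not taken from UMLS or any other real standard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Term:
    concept_id: str
    label: str
    source: str                 # "standard" or "local"
    maps_to: str | None = None  # standard ID, once the standard catches up


class AdaptedTaxonomy:
    """A standard vocabulary plus a local extension layer (hypothetical design)."""

    def __init__(self, standard: dict[str, str]) -> None:
        self.terms = {cid: Term(cid, label, "standard") for cid, label in standard.items()}
        self._next_local = 1

    def add_local(self, label: str) -> str:
        """Add a concept at publication time, before any standard defines it."""
        cid = f"LOCAL-{self._next_local:04d}"
        self._next_local += 1
        self.terms[cid] = Term(cid, label, "local")
        return cid

    def update_standard(self, new_terms: dict[str, str]) -> None:
        """Load concepts from a newer release of the standard vocabulary."""
        for cid, label in new_terms.items():
            self.terms[cid] = Term(cid, label, "standard")

    def reconcile(self, local_id: str, standard_id: str) -> None:
        """Point a local concept at its new standard equivalent; old tags still resolve."""
        if self.terms[standard_id].source != "standard":
            raise ValueError(f"{standard_id} is not a standard concept")
        self.terms[local_id].maps_to = standard_id

    def resolve(self, concept_id: str) -> str:
        """The ID that external integrations should use for a tag."""
        term = self.terms[concept_id]
        return term.maps_to or term.concept_id


tax = AdaptedTaxonomy({"STD-1": "regulatory T cells"})
new_id = tax.add_local("novel biomarker X")  # tagged at publication time
tax.update_standard({"STD-2": "novel biomarker X"})
tax.reconcile(new_id, "STD-2")
print(new_id, "->", tax.resolve(new_id))     # LOCAL-0001 -> STD-2
```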


                                                                      19
Ongoing taxonomy management
   Taxonomies must be continually enhanced as
    your domain evolves, your content set grows,
    and your user needs and expectations change
   Make sure it is easy to update your taxonomy and
    make it available to your systems (tagging, web
    applications), ideally in real time
             Taxonomies should always be
           considered a work in progress!


                                                       20
Application of taxonomy to
content—semantic tagging
   Semantic tagging is the insertion of semantic
    information at the level of XML elements
        Example: <root-term termID="47521">t cells, regulatory</root-term>

   Tagging can be embedded directly in XML,
    provided as separate reference files, or placed
    in database tables that reference elements
   If the content itself can’t be tagged inline (e.g., images,
    videos, PDFs), tags can be placed in header
    files
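The difference between embedded tags and tags stored separately can be shown with the slide's own `<root-term termID="47521">` example. This sketch parses the embedded form with Python's standard `xml.etree.ElementTree`, turns it into rows that reference elements by ID, and stores those rows in an in-memory SQLite table. The `<sec>`/`<p>` wrappers, their `id` attributes, and the `tags` table layout are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Embedded tagging: the term is marked up inline, as in the slide's example.
ARTICLE = """
<sec id="s1">
  <p id="p1">Depletion of <root-term termID="47521">t cells, regulatory</root-term> was observed.</p>
  <p id="p2">No further findings.</p>
</sec>
"""


def standoff_records(xml_text: str) -> list[tuple[str, str, str]]:
    """Turn inline tags into (element id, term ID, tagged text) rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for para in root.iter("p"):
        for term in para.iter("root-term"):
            rows.append((para.get("id"), term.get("termID"), term.text))
    return rows


# Database tagging: the same information kept in a table that references
# elements by ID, leaving the content itself untouched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (element_id TEXT, term_id TEXT, surface TEXT)")
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", standoff_records(ARTICLE))
print(conn.execute("SELECT element_id FROM tags WHERE term_id = '47521'").fetchall())
# [('p1',)]
```

The same rows could just as easily be written to a separate reference file; the point is that the term ID, not the surface text, is what downstream applications query.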
                                                                             21
Who/what tags?
   Automated tagging—software analyzes content, adds tags
    based on concept matching, patterns, grammar
        Pros: Highly scalable, good at finding trends in large bodies of content. Sometimes the
              only option for very large data sets
        Cons: False positives, missed concepts

   Manual tagging—humans with appropriate expertise
    (sometimes called Subject Matter Experts, or SMEs) read the
    content and apply tags
        Pros: Precise, ideal when clinical judgment is required
        Cons: Cost-prohibitive for large volumes of content, hard to scale, inconsistent
              (humans make subjective choices!)

   Hybrid—automated process followed by manual
    review/modification
        For high-value, specialized sites (such as clinical decision support tools that require “one best
         answer” results), this extra human touch can be necessary
        Some content types aren’t accessible to automated systems (multimedia)
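A hybrid workflow usually needs a rule for deciding which automated tags a person should look at. A common pattern, sketched here under assumptions, is to accept suggestions above a confidence threshold, queue the rest for SME review, and send content with no machine-readable text straight to manual tagging. The `Suggestion` fields, the confidence scores, and the 0.85 threshold are hypothetical; real autotaggers differ in whether and how they report confidence.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    content_id: str
    concept_id: str
    confidence: float  # 0.0-1.0, as reported by a hypothetical autotagger


def triage(suggestions: list[Suggestion],
           threshold: float = 0.85) -> tuple[list[Suggestion], list[Suggestion]]:
    """Split automated suggestions into auto-accepted tags and an SME review queue."""
    accepted = [s for s in suggestions if s.confidence >= threshold]
    review = [s for s in suggestions if s.confidence < threshold]
    return accepted, review


def needs_manual_tagging(items: dict[str, str]) -> list[str]:
    """Content with no machine-readable text (e.g. images, video) skips the autotagger."""
    return [cid for cid, kind in items.items() if kind in {"image", "video"}]


batch = [
    Suggestion("art-1", "C100", 0.97),
    Suggestion("art-1", "C300", 0.62),
    Suggestion("art-2", "C200", 0.85),
]
ok, queue = triage(batch)
print([s.concept_id for s in ok], [s.concept_id for s in queue])  # ['C100', 'C200'] ['C300']
```

Raising the threshold shifts work toward reviewers, which may be the right trade for "one best answer" products.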
                                                                                                       22
Tagging for different uses
[Figure: mock-up of a tagged article (title "Disease"; Diagnosis, Treatment, and References sections; a table and a figure), showing collection tags on the whole object, <summary> tags on each section, table, and figure, and entity tags on individual phrases]

<Collections> What “buckets” does this content object belong in?
   Assignment of content into topical collections for major site navigation or
    product definition
   Uses: topic collections; microsites; virtual journals…

<Section Summaries> What is this section/article/chapter about?
   Most significant topics discussed at the article/chapter/section (wrapper) level
   Uses: answers to clinical questions; review; skills assessment…

<Entities> What is this thing?
   Important concepts at the paragraph/list/table/figure (granular) level
   Uses: complex search queries; concept overlap analysis; specific entity types
    like drugs, genes, clinical trials, manufacturers…
                                                                               23
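The three levels can feed one another. As a rough sketch, granular entity tags can be rolled up into a section summary, and section summaries can drive collection assignment. Here "most significant" is approximated by simple frequency, which is an assumption made for illustration rather than how any particular system ranks topics; `ENTITY_TAGS`, `COLLECTIONS`, and the concept IDs are hypothetical.

```python
from collections import Counter

# Hypothetical entity tags found at the paragraph/table/figure level, per section.
ENTITY_TAGS = {
    "Diagnosis": ["C300", "C300", "C410", "C300"],
    "Treatment": ["C520", "C300", "C520"],
}
# Hypothetical mapping from concepts to navigation "buckets".
COLLECTIONS = {"C300": "Infectious Disease", "C520": "Pharmacology"}


def section_summary(entity_tags: list[str], top_n: int = 2) -> list[str]:
    """Approximate a section's most significant topics by entity frequency."""
    return [concept for concept, _ in Counter(entity_tags).most_common(top_n)]


def collections_for(sections: dict[str, list[str]]) -> set[str]:
    """Assign the whole content object to every collection its summaries point to."""
    buckets = set()
    for tags in sections.values():
        for concept in section_summary(tags):
            if concept in COLLECTIONS:
                buckets.add(COLLECTIONS[concept])
    return buckets


print(section_summary(ENTITY_TAGS["Treatment"]))  # ['C520', 'C300']
print(sorted(collections_for(ENTITY_TAGS)))       # ['Infectious Disease', 'Pharmacology']
```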
WHY
    should you care
(What are the benefits?)



                           24
Failure of the status quo
   Information scarcity is no longer the issue.
    Attention scarcity is the problem.
   The publisher’s role in information curation and
    filtering has never been more important.
    However, the tools for curating and filtering are changing.

           “Information is a source of learning. But unless
           it is organized, processed, and available to the
           right people in a format for decision making, it
           is a burden, not a benefit.”– William Pollard, Physicist

                                                                      25
Search accuracy, precision
   Faster, more accurate and reliable answers to
    questions enhance user productivity and thus
    improve your application’s usability and user
    satisfaction ratings.
       The accuracy threshold for STM information is very high!
        Users increasingly will not tolerate ambiguous results.
       Time-strapped users are struggling with information
        overload—fewer, better answers are often preferred.
       Tagging exposes hard-to-find media such as images and
        videos.


                                                                     26
“Which did you mean?” at McGraw-Hill’s
AccessMedicine




                                         27
28
Pathways to related content

 Related search terms
 Links to related content within and across
  resources
 Dynamically generated as new content is
  added
  Goal: increase serendipitous discovery, site
   stickiness, and usage metrics such as number of
   page views and time on site
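Related-content links can be generated directly from tag overlap, which is why they update themselves as new content is tagged. The sketch below scores items with Jaccard similarity (shared tags divided by all tags), one simple choice among many; the catalog entries, `k`, and `min_score` values are hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Tag overlap: shared concepts divided by all concepts in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(target: str, catalog: dict[str, set[str]],
            k: int = 3, min_score: float = 0.2) -> list[tuple[str, float]]:
    """Rank other items by tag overlap with target. Because it reads the current
    catalog, newly tagged content is picked up the next time it runs."""
    scored = [
        (item, jaccard(catalog[target], tags))
        for item, tags in catalog.items()
        if item != target
    ]
    kept = [(item, score) for item, score in scored if score >= min_score]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:k]


# Hypothetical catalog spanning several resources and content types.
catalog = {
    "book/chapter-4": {"C100", "C300", "C520"},
    "journal/article-17": {"C300", "C520"},
    "video/clip-2": {"C300"},
    "image/fig-88": {"C900"},
}
print([item for item, _ in related("book/chapter-4", catalog)])
# ['journal/article-17', 'video/clip-2']
```

Because links cross the book, journal, and video entries, the same scoring serves "within and across resources" without separate rules per product.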

                                                  29
30
31
Contextual integrations
   Internally—across titles and content types
    (journals, books, videos, images, e-learning…)
   Externally—with partners and external data sets
   Increasingly important to integrate content into
    customer workflows—to bring content to them
    in context as they do their daily work
       clinicians at point of care
       students as they prepare for exams



                                                       32
New products
   Content recycling: Create new products from
    content you already have
       Image collections
       Mashup and micro products that serve specialized audiences and
        fit specific workflows
       Topically constructed objects like virtual journals, knowledge
        environments, coursepacks, learning objects

               You can cost-effectively create
             niche products not possible before
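A topically constructed product such as a virtual journal can, in principle, be a query over existing tags: select everything tagged with a topic or any concept beneath it in the hierarchy. This is a minimal sketch under that assumption; the `BROADER` hierarchy, the `CONTENT` items, and their tags are hypothetical, and a real product would add editorial review and date ranges.

```python
from __future__ import annotations

# Hypothetical hierarchy (concept ID -> broader concept ID) and hypothetical
# tagged content drawn from several existing titles.
BROADER = {"C200": "C1", "C300": "C200", "C310": "C300"}
CONTENT = {
    "journal-A/art-5": {"C310"},
    "book-B/ch-12": {"C300", "C900"},
    "journal-C/art-9": {"C900"},
}


def is_within(concept: str | None, topic: str) -> bool:
    """True if concept is topic itself or sits anywhere below it in the hierarchy."""
    while concept is not None:
        if concept == topic:
            return True
        concept = BROADER.get(concept)
    return False


def virtual_collection(topic: str) -> list[str]:
    """Assemble a topical product (virtual journal, coursepack...) from existing content."""
    return sorted(
        item for item, tags in CONTENT.items()
        if any(is_within(tag, topic) for tag in tags)
    )


print(virtual_collection("C300"))  # ['book-B/ch-12', 'journal-A/art-5']
```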


                                                                         33
AIP/APS virtual journals




                           34
Search engine optimization
   Granular topic exposure leads to better
    ranking in major search engines
   Next wave of discovery tools (intelligent agents,
    virtual research assistants) will give greater weight
    to content they can understand
   Tags can also be exposed to help create auto-
    extracts for content that doesn’t have abstracts
    (like book chapters)

                                                            35
36
Semantic users
   As users search and navigate semantic content, you can attach the
    tags on that content to their user profiles

   A semantic profile for a user can be created from his/her site
    activity
       What topics are they interested in?
       How are their interests evolving?

   Use this information to create personalized information services

   Excellent method for encouraging anonymous institutional users to
    register/log in

   Use topical affinities between users to create communities of
    practice—groups of people who share a passion for something they
    do and learn how to do it better through social interaction
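A semantic profile can be as simple as a running count of the tags on content a user has viewed, plus a short history to see how interests shift. The sketch below assumes that design; `SemanticProfile`, `shared_interests`, and the concept IDs are hypothetical, and any real deployment would also need to respect privacy and consent rules for tracking activity.

```python
from __future__ import annotations

from collections import Counter


class SemanticProfile:
    """Accumulates the tags of content a user views (hypothetical design)."""

    def __init__(self) -> None:
        self.interests: Counter[str] = Counter()
        self.history: list[set[str]] = []

    def record_view(self, content_tags: set[str]) -> None:
        self.interests.update(content_tags)
        self.history.append(content_tags)

    def top_interests(self, n: int = 3) -> list[str]:
        """What topics is this user interested in?"""
        return [concept for concept, _ in self.interests.most_common(n)]

    def recent_interests(self, last: int = 5) -> set[str]:
        """Tags from the most recent views, to see how interests are evolving."""
        recent: set[str] = set()
        for tags in self.history[-last:]:
            recent |= tags
        return recent


def shared_interests(a: SemanticProfile, b: SemanticProfile, n: int = 5) -> set[str]:
    """Topical affinity between two users, a possible seed for a community of practice."""
    return set(a.top_interests(n)) & set(b.top_interests(n))


alice, bob = SemanticProfile(), SemanticProfile()
alice.record_view({"C300"})
alice.record_view({"C300", "C520"})
bob.record_view({"C520", "C900"})
print(alice.top_interests(1), shared_interests(alice, bob))  # ['C300'] {'C520'}
```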
                                                                        37
Contextual advertising
   Match article and ad semantic tags to precisely
    target ads based on topic
   OR, block ads from appearing next to articles on
    related topics
   OR (even better): Alternative advertising method
       Advertising can be targeted to the user profile, not just the article
       Avoid targeting editorially sensitive pages but still deliver ads
        that match that user’s interests on neutral pages or alerts
       For classified/job ad targeting, user interests can be matched up
        with demographics like location
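The three advertising options above reduce to set operations on tags. In this sketch an ad is eligible for a page when its target tags overlap the page's tags and its blocked tags do not; editorially sensitive pages get no ads, and profile-based targeting matches the user's interests instead of the page. The `Ad` fields and the `sensitive` flag are hypothetical, and location matching for classified/job ads is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Ad:
    ad_id: str
    target_tags: set[str]                                # topics the advertiser wants
    blocked_tags: set[str] = field(default_factory=set)  # topics it must never appear next to


def eligible_ads(page_tags: set[str], ads: list[Ad], sensitive: bool = False) -> list[str]:
    """Topic-targeted ads for a page, honoring block lists; none on sensitive pages."""
    if sensitive:
        return []
    return [
        ad.ad_id for ad in ads
        if (ad.target_tags & page_tags) and not (ad.blocked_tags & page_tags)
    ]


def ads_for_user(user_interests: set[str], ads: list[Ad]) -> list[str]:
    """Profile-based targeting, usable on neutral pages or in alerts."""
    return [ad.ad_id for ad in ads if ad.target_tags & user_interests]


ads = [Ad("ad-antibiotic", {"C300"}), Ad("ad-formula", {"C100"}, blocked_tags={"C999"})]
print(eligible_ads({"C300", "C410"}, ads))  # ['ad-antibiotic']
```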

                                                                                38
What about mobile?

   Reduction in number of
    clicks!
   Precision in search
   Quick links to what
    most users need
   Targeted navigation that
    leads to content most
    important (answers to
    clinical questions)
                               39
HOW
to get started




                 40
Questions for you and your
application/hosting providers
   What are your user stories/use cases?
   What are the business benefits/ROI for your
    organization?
   What content do you need to tag, how is that content
    delivered, and can those delivery systems/platforms use
    taxonomy and tagging in a way that supports your user
    needs?
   What’s your plan for keeping your taxonomy up to date?
   Can your “living” taxonomy be integrated into your
    applications? In real time as you make updates?
                                                              41
Questions for semantic tech
providers
   Does the technology support your user stories/
    use cases?
   Does it offer/integrate with a constantly evolving
    taxonomy?
   Does it meet the accuracy threshold for your users
    and your content?
   Can it tag at the depth—both granular and summary
    level—necessary? Figures and tables? Top-level
    collections?

                                                         42
The semantic user story

I am specifically identifying --------------
because -------------------- is very important
to my ------------------- users

when they are ------------------.




                                                 43
The semantic user story
I am specifically identifying concise disease
treatment content because immediate access to
treatment options is very important to my
emergency physician users when they are seeing
20 patients an hour.



                                                 44
McGraw-Hill: metadata targeted to deliver
fast, concise treatment info to ER doc




                                            45
The semantic user story
I am specifically identifying skin disorder images
on all body locations and all types of skin because
visual diagnosis is very important to my family
physician users.




                                                      46
Derm101: images displayed in diagnosis
search results




                                         47
What are your user
stories?
   Problems/needs to solve for your users
       Delivering top quality care under serious time constraints
       Explosion of new research to keep up with and integrate into
        practice
       Need to pass a licensing exam
   Problems/needs to solve for your
    organization
       Creating new products that grow and diversify revenue
       Creating more value from advertising
       Gaining insight into users

                                                                       48
Thank you!

   Pam Harley
    VP, Product & Market Development
    Semedica™ A DIVISION OF SILVERCHAIR
    pamh@semedica.com
    (434) 296-6333 x372
    www.silverchair.com
    www.semedica.com

           “Organizing is what you do before you do something,
           so that when you do it, it is not all mixed up.”
                                                  –A. A. Milne

                                                                    49

More Related Content

Similar to Dynamic Potential of Semantic Enrichment

Linked data vocabularies ala lldig 20140126
Linked data vocabularies   ala lldig 20140126Linked data vocabularies   ala lldig 20140126
Linked data vocabularies ala lldig 20140126James R. Morris
 
Vocabulary interoperability in the semantic web james r morris
Vocabulary interoperability in the semantic web   james r morrisVocabulary interoperability in the semantic web   james r morris
Vocabulary interoperability in the semantic web james r morrisJames R. Morris
 
Catégorisation automatisée de contenus documentaires : la ...
Catégorisation automatisée de contenus documentaires : la ...Catégorisation automatisée de contenus documentaires : la ...
Catégorisation automatisée de contenus documentaires : la ...butest
 
Technologies for startup
Technologies for startupTechnologies for startup
Technologies for startupDzung Nguyen
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Jenn Riley
 
Connecting Publications and Data
Connecting Publications and DataConnecting Publications and Data
Connecting Publications and DataMichael Habib
 
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...Karen Thompson
 
Modular Documentation Joe Gelb Techshoret 2009
Modular Documentation Joe Gelb Techshoret 2009Modular Documentation Joe Gelb Techshoret 2009
Modular Documentation Joe Gelb Techshoret 2009Suite Solutions
 
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?TCUK Conference
 
Streamlining Your Path to Metadata Charlotte Robidoux Stacey Swart
Streamlining Your Path to Metadata Charlotte Robidoux Stacey SwartStreamlining Your Path to Metadata Charlotte Robidoux Stacey Swart
Streamlining Your Path to Metadata Charlotte Robidoux Stacey SwartHewlett Packard Enterprise Services
 
Topic based and structured authoring - slides
Topic based and structured authoring - slidesTopic based and structured authoring - slides
Topic based and structured authoring - slidesNeil Perlin
 
Topic based and structured authoring - slides
Topic based and structured authoring - slidesTopic based and structured authoring - slides
Topic based and structured authoring - slidesNeil Perlin
 
Structured design: Modular style for modern content
Structured design: Modular style for modern contentStructured design: Modular style for modern content
Structured design: Modular style for modern contentChristopher Hess
 
Writing good C# code for good cloud applications - Draft Oct 20, 2014
Writing good C# code for good cloud applications - Draft Oct 20, 2014Writing good C# code for good cloud applications - Draft Oct 20, 2014
Writing good C# code for good cloud applications - Draft Oct 20, 2014Marco Parenzan
 
LOR Characteristics and Considerations
LOR Characteristics and ConsiderationsLOR Characteristics and Considerations
LOR Characteristics and ConsiderationsScott Leslie
 
Scalable architectures for phenotype libraries
Scalable architectures for phenotype librariesScalable architectures for phenotype libraries
Scalable architectures for phenotype librariesMartin Chapman
 
Terminology Management
Terminology ManagementTerminology Management
Terminology ManagementUwe Muegge
 

Similar to Dynamic Potential of Semantic Enrichment (20)

Linked data vocabularies ala lldig 20140126
Linked data vocabularies   ala lldig 20140126Linked data vocabularies   ala lldig 20140126
Linked data vocabularies ala lldig 20140126
 
Vocabulary interoperability in the semantic web james r morris
Vocabulary interoperability in the semantic web   james r morrisVocabulary interoperability in the semantic web   james r morris
Vocabulary interoperability in the semantic web james r morris
 
Catégorisation automatisée de contenus documentaires : la ...
Catégorisation automatisée de contenus documentaires : la ...Catégorisation automatisée de contenus documentaires : la ...
Catégorisation automatisée de contenus documentaires : la ...
 
Technologies for startup
Technologies for startupTechnologies for startup
Technologies for startup
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
 
Connecting Publications and Data
Connecting Publications and DataConnecting Publications and Data
Connecting Publications and Data
 
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
 
"Paradigm Shifting" Presentation
"Paradigm Shifting" Presentation"Paradigm Shifting" Presentation
"Paradigm Shifting" Presentation
 
Modular Documentation Joe Gelb Techshoret 2009
Modular Documentation Joe Gelb Techshoret 2009Modular Documentation Joe Gelb Techshoret 2009
Modular Documentation Joe Gelb Techshoret 2009
 
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
 
Streamlining Your Path to Metadata Charlotte Robidoux Stacey Swart
Streamlining Your Path to Metadata Charlotte Robidoux Stacey SwartStreamlining Your Path to Metadata Charlotte Robidoux Stacey Swart
Streamlining Your Path to Metadata Charlotte Robidoux Stacey Swart
 
Topic based and structured authoring - slides
Topic based and structured authoring - slidesTopic based and structured authoring - slides
Topic based and structured authoring - slides
 
Topic based and structured authoring - slides
Topic based and structured authoring - slidesTopic based and structured authoring - slides
Topic based and structured authoring - slides
 
Annotation Of
Annotation OfAnnotation Of
Annotation Of
 
Structured design: Modular style for modern content
Structured design: Modular style for modern contentStructured design: Modular style for modern content
Structured design: Modular style for modern content
 
Writing good C# code for good cloud applications - Draft Oct 20, 2014
Writing good C# code for good cloud applications - Draft Oct 20, 2014Writing good C# code for good cloud applications - Draft Oct 20, 2014
Writing good C# code for good cloud applications - Draft Oct 20, 2014
 
KMA Taxonomy TBC2010
KMA Taxonomy TBC2010KMA Taxonomy TBC2010
KMA Taxonomy TBC2010
 
LOR Characteristics and Considerations
LOR Characteristics and ConsiderationsLOR Characteristics and Considerations
LOR Characteristics and Considerations
 
Scalable architectures for phenotype libraries
Scalable architectures for phenotype librariesScalable architectures for phenotype libraries
Scalable architectures for phenotype libraries
 
Terminology Management
Terminology ManagementTerminology Management
Terminology Management
 

Dynamic Potential of Semantic Enrichment

  • 1. The Dynamic Potential of Semantic Enrichment or, Everything You Always Wanted to Know About Semantic Enrichment OK, not everything. Not even most things. Just some things you probably should be aware of. Allen Press Emerging Trends in Scholarly Publishing™ Seminar 14 April 2011  Pam Harley VP, Product & Market Development SemedicaTM A DIVISION OF SILVERCHAIR pamh@semedica.com (434) 296-6333 x372
  • 2. Why me?  Me  20+ years in STM publishing, many hats worn  print, digital  books, journals, news, continuing education…  editorial, production, product development  Silverchair  10+ years working with STM publishers to build products and features from semantically tagged content 2
  • 3. Here’s the plan  WHAT is semantic enrichment  WHY you should care (benefits)  HOW to get started (with a few side trips to make sure we’re all on the same page re: lingo) 3
  • 4. First… DON’T do what I’m about to do  Don’t start by exploring technology  (Hint: Start with user stories) 4
  • 5. What’s a user story? a user story captures what the user wants to achieve—who wants the functionality and why it allows that user to achieve something useful 5
  • 6. Creating user stories  Focus your tagging strategy on user stories—how people want to use your content:  What tasks are they trying to do when they use your product? What answers are they looking for? At what point in their workflow is your product used?  Almost all information sites have multiple user stories. Know them for your products  Remember that your organization is also a key user of your product 6
  • 7. WHAT is semantic… enrichment tagging markup indexing fingerprinting classification categorization ? 7
  • 8. Semantics are about meaning  The meaning of content is currently written for human understanding, not computers  Semantics adds a layer of meaning to your content, so that computers can make sense of it and build connections to it  Semantic metadata answers the most important question of all for content producers and users: What is this content about? captured in a way that computers can process 8
  • 9. “Atomizing” information  A semantic approach requires you to go beyond documents and think of your content as data  Semantic markup allows knowledge in your publications to be acted on as distinct bits of data For example: 1 practice guideline = 1 document OR 1 practice guideline = 312 distinct pieces of data 9
  • 10. Taxonomy is the semantic foundation  Taxonomy is the framework for the semantic layer and semantic tagging  It allows…  Normalization  Consistency in tagging  Concept grouping and hierarchical relationships  Integrations/interoperability (internal and external) 10
  • 11. Equivalent relationships are critical  Synonyms, abbreviations, jargon, misspellings, codes are a critical component  Necessary to normalize the natural and constantly evolving variations in the language that authors use to describe concepts and searchers use to find them  Vastly improve performance of autotagging systems  Precise strings are easier to match programmatically, and a thesaurus magnifies the number of strings available to match to a given concept 11
  • 12. Normalization  Authors use different terminology to represent the same topics Examples: Synonyms (newborn = neonate) Abbreviations (GHB = gamma hydroxybutyrate) Shorthand (c diff = clostridium difficile)  Searches for these language variations produce different results  A semantic layer controlled by a taxonomy/ thesaurus normalizes these variations 12
  • 13. Normalization in action at McGraw-Hill’s AccessEmergency Medicine 13
  • 15. Dynamic concept grouping and hierarchical relationships 15
  • 17. Where does a taxonomy come from?  Your content collection  Inputs from your users (e.g., author keywords, search logs)  Subject matter expert consultation  Industry standard terminologies  Source for concepts, equivalents, guidance on hierarchy 17
  • 18. The importance of industry standard terminologies  Your taxonomy must be able to interact with standards of your domain to forge meaningful external integrations  Many terminologies are in use in different scientific domains (UMLS, ACS, ACM, AIP, IEEE, OSA, EPA, NASA, USGS…). Investigate what’s available  Great case example for domain-level taxonomy: For medical content, UMLS metathesaurus maps together 100+ constituent health care vocabularies (MeSH, SNOMED, ICD, RxNorm…) to support health care interoperability 18
  • 19. Don’t reinvent the wheel!  If there’s a taxonomy available that’s a good fit, use it  BUT make sure you have a mechanism for adapting it to meet the needs of  your content  your users  the pace of change/new concepts in your field [Note to STM publishers in cutting-edge areas: You can’t wait for the standards to catch up to your research output— you’ll need to be able to add concepts at the time of publication] 19
• 20. Ongoing taxonomy management
  - Taxonomies must be continually enhanced as your domain evolves, your content set grows, and your users’ needs and expectations change
  - Make sure it is easy to update your taxonomy and make it available to your systems (tagging, web applications), ideally in real time
  - Taxonomies should always be considered a work in progress!
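Below is a minimal sketch of a "living" taxonomy built on a simple in-process publish/subscribe design. Local concepts and synonyms are added on top of a standard base vocabulary, and each change is pushed to registered systems, such as a tagger or a web application, as it happens. The `Taxonomy` class, its methods, and the concept IDs are hypothetical. A real deployment would more likely use a database plus a message queue or API.

```python
class Taxonomy:
    """Hypothetical taxonomy store: a standard base vocabulary plus local additions."""

    def __init__(self, base_concepts):
        # Copy the base so local edits never mutate the standard vocabulary.
        # Base labels are assumed to be lowercase already.
        self.concepts = {cid: set(labels) for cid, labels in base_concepts.items()}
        self.version = 1
        self.subscribers = []  # callbacks registered by taggers, web apps, etc.

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _publish(self):
        """Bump the version and push an immutable snapshot to every subscriber."""
        self.version += 1
        snapshot = {cid: frozenset(labels) for cid, labels in self.concepts.items()}
        for callback in self.subscribers:
            callback(self.version, snapshot)

    def add_concept(self, concept_id, labels):
        """Add a new local concept, e.g. at the time of publication."""
        if concept_id in self.concepts:
            raise ValueError(f"{concept_id} already exists")
        self.concepts[concept_id] = {label.lower() for label in labels}
        self._publish()

    def add_synonym(self, concept_id, label):
        """Attach a new equivalent term to an existing concept."""
        self.concepts[concept_id].add(label.lower())
        self._publish()


taxonomy = Taxonomy({"STD:0001": {"neonate", "newborn"}})
updates = []
taxonomy.subscribe(lambda version, concepts: updates.append((version, sorted(concepts))))
taxonomy.add_concept("LOCAL:0001", ["Novel Marker X"])  # hypothetical new concept
taxonomy.add_synonym("STD:0001", "Neonates")
print(updates)  # [(2, ['LOCAL:0001', 'STD:0001']), (3, ['LOCAL:0001', 'STD:0001'])]
```

The version number lets downstream systems record which taxonomy release a given set of tags was produced with.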
• 21. Application of taxonomy to content: semantic tagging
  - Semantic tagging is the insertion of semantic information at the level of XML elements. Example: <root-term termID="47521">t cells, regulatory</root-term>
  - Tagging can be embedded directly in XML, provided as separate reference files, or placed in database tables that reference elements
  - If the content is inaccessible (e.g., images, videos, PDFs), tagging can be placed in header files
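This sketch illustrates the "separate reference files" option. It leaves the XML untouched and emits standoff records that point at elements by their `id` attribute. It assumes every taggable element carries an `id`. Apart from termID 47521 (taken from the example above), the term IDs and the sample content are invented. Only Python's standard `xml.etree.ElementTree` and `re` modules are used. For simplicity the code scans only each element's leading text (`elem.text`), not any text that follows child elements.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical article content.
ARTICLE = """<article>
  <p id="p1">Regulatory T cells suppress immune responses.</p>
  <p id="p2">Depletion of regulatory T cells in newborn mice.</p>
</article>"""

# Hypothetical surface form -> (termID, preferred label).
TERMS = {
    "regulatory t cells": ("47521", "t cells, regulatory"),
    "newborn": ("10001", "neonate"),
}

def standoff_tags(xml_text, terms):
    """Return tag records that reference elements by id instead of editing the XML."""
    root = ET.fromstring(xml_text)
    records = []
    for elem in root.iter():
        elem_id = elem.get("id")
        if elem_id is None or not elem.text:
            continue  # only elements with an id and leading text are tagged here
        text = elem.text.lower()
        for surface, (term_id, preferred) in terms.items():
            # Word-boundary match; character offsets are relative to elem.text.
            for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
                records.append({"element": elem_id, "termID": term_id,
                                "preferred": preferred,
                                "start": match.start(), "end": match.end()})
    return records

for record in standoff_tags(ARTICLE, TERMS):
    print(record)
```

These records could be written to a reference file or loaded into a database table keyed by element ID. Both options leave the source XML unchanged.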
• 22. Who/what tags?
  - Automated tagging: software analyzes content and adds tags based on concept matching, patterns, and grammar
    - Pros: highly scalable; good at finding trends in large bodies of content; sometimes the only option for very large data sets
    - Cons: false positives, missed concepts
  - Manual tagging: humans with appropriate expertise (sometimes called subject matter experts, or SMEs) read the content and apply tags
    - Pros: precise; ideal when clinical judgment is required
    - Cons: cost-prohibitive for large volumes of content, hard to scale, inconsistent (humans make subjective choices!)
  - Hybrid: an automated process followed by manual review/modification
    - For high-value, specialized sites (such as clinical decision support tools that require “one best answer” results), this extra human touch can be necessary
    - Some content types (e.g., multimedia) aren’t accessible to automated systems
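The hybrid approach can be sketched as a routing step that runs after automated tagging. This assumes the automated tagger returns a confidence score for each suggested tag. The `Suggestion` type and the 0.9/0.5 thresholds are hypothetical and would need tuning against your own accuracy requirements.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    """One automatically suggested tag (all values here are hypothetical)."""
    doc_id: str
    term_id: str
    confidence: float  # score from 0.0 to 1.0 reported by the automated tagger

def route(suggestions, accept_at=0.9, reject_below=0.5):
    """Split suggestions into auto-accepted, an SME review queue, and discarded."""
    accepted, review, rejected = [], [], []
    for s in suggestions:
        if s.confidence >= accept_at:
            accepted.append(s)
        elif s.confidence >= reject_below:
            review.append(s)  # a human expert confirms or corrects these
        else:
            rejected.append(s)
    return accepted, review, rejected

batch = [Suggestion("doc1", "47521", 0.97),
         Suggestion("doc1", "10001", 0.62),
         Suggestion("doc2", "10002", 0.31)]
accepted, review, rejected = route(batch)
print([s.term_id for s in review])  # ['10001']
```

Raising `accept_at` sends more tags to human review. That trades throughput for precision, which is the balance the slide describes for "one best answer" sites.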
• 23. Tagging for different uses
  - Collections: What “buckets” does this content object belong in?
    - Assignment of content into topical collections for major site navigation or product definition
    - Uses: topic collections; microsites; virtual journals…
  - Section summaries: What is this section/article/chapter about?
    - Most significant topics discussed at the article/chapter/section (wrapper) level
    - Uses: answers to clinical questions; review; skills assessment…
  - Entities: What is this thing?
    - Important concepts at the paragraph/list/table/figure (granular) level
    - Uses: complex search queries; concept overlap analysis; specific entity types like drugs, genes, clinical trials, manufacturers…
  [Figure: mock-up of a tagged disease article (Diagnosis and Treatment sections, a table, a figure, and references) marked up with <collection1, collection2>, <summary>, and inline entity tags]
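One simple way to derive section-summary tags from entity-level tags is to roll them up by frequency, as sketched below. Frequency-based roll-up is an assumption made for illustration, not a method taken from the slides. Real systems may also weight headings, position in the text, or editorial input. The paragraph tags are hypothetical.

```python
from collections import Counter

# Hypothetical paragraph-level (entity) tags for one section.
PARAGRAPH_TAGS = {
    "p1": ["influenza", "oseltamivir", "fever"],
    "p2": ["influenza", "oseltamivir"],
    "p3": ["influenza", "vaccination"],
}

def section_summary(paragraph_tags, top_n=2):
    """Pick the most frequently tagged concepts as the section's summary tags."""
    counts = Counter(tag for tags in paragraph_tags.values() for tag in tags)
    return [tag for tag, _ in counts.most_common(top_n)]

print(section_summary(PARAGRAPH_TAGS))  # ['influenza', 'oseltamivir']
```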
• 24. WHY should you care? (What are the benefits?)
• 25. Failure of the status quo
  - Information scarcity is no longer the issue. Attention scarcity is the problem.
  - The publisher’s role in information curation and filtering has never been more important. However, the tools for doing that work are changing.
  “Information is a source of learning. But unless it is organized, processed, and available to the right people in a format for decision making, it is a burden, not a benefit.” – William Pollard, Physicist
• 26. Search accuracy, precision
  - Faster, more accurate, and more reliable answers to questions enhance user productivity and thus improve your application’s usability and user satisfaction ratings.
  - The accuracy threshold for STM information is very high! Users increasingly will not tolerate ambiguous results.
  - Time-strapped users are struggling with information overload; fewer, better answers are often preferred.
  - Tagging allows exposure of hard-to-find media such as images and videos.
• 27. “Which did you mean?” at McGraw-Hill’s AccessMedicine
• 29. Pathways to related content
  - Related search terms
  - Links to related content within and across resources
  - Dynamically generated as new content is added
  - Goal: increase serendipitous discovery, site stickiness, and usage metrics such as page views and time on site
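Related-content links can be computed from tag overlap. The sketch below ranks items by the Jaccard similarity of their tag sets, |A ∩ B| / |A ∪ B|. This is a common measure chosen here for illustration, not one named on the slide. The collection and item IDs are hypothetical. Because the ranking is computed from tags, newly tagged content appears among the links without any manual linking.

```python
def jaccard(a, b):
    """Overlap between two tag sets: |A ∩ B| / |A ∪ B| (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag sets across content types.
DOCS = {
    "article-1": {"influenza", "oseltamivir", "fever"},
    "chapter-7": {"influenza", "vaccination"},
    "video-3": {"influenza", "oseltamivir"},
    "image-9": {"asthma"},
}

def related(doc_id, docs, limit=2):
    """Rank other items by tag overlap, best first; items with no overlap are skipped."""
    tags = docs[doc_id]
    scored = [(jaccard(tags, other_tags), other)
              for other, other_tags in docs.items() if other != doc_id]
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by ID
    return [other for _, other in scored[:limit]]

print(related("article-1", DOCS))  # ['video-3', 'chapter-7']
```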
• 32. Contextual integrations
  - Internally: across titles and content types (journals, books, videos, images, e-learning…)
  - Externally: with partners and external data sets
  - It is increasingly important to integrate content into customer workflows, bringing content to users in context as they do their daily work:
    - clinicians at the point of care
    - students as they prepare for exams
• 33. New products
  - Content recycling: create new products from content you already have
    - Image collections
    - Mashup and micro products that serve specialized audiences and fit specific workflows
    - Topically constructed objects like virtual journals, knowledge environments, coursepacks, learning objects
  - You can cost-effectively create niche products that weren’t possible before
• 35. Search engine optimization
  - Granular topic exposure leads to better ranking in major search engines
  - The next wave of discovery tools (intelligent agents, virtual research assistants) will give greater weight to content they can understand
  - Tags can also be exposed to help create auto-extracts for content that doesn’t have abstracts (like book chapters)
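Here is a rough sketch of a tag-driven auto-extract. It assumes that the sentences mentioning the most tagged concepts make a reasonable stand-in for an abstract. The sample text, the concept list, and the naive substring matching are illustrative only. In practice, the tagger would supply concept positions directly.

```python
import re

def auto_extract(text, concept_terms, max_sentences=2):
    """Pick the sentences mentioning the most tagged concepts, kept in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    def score(sentence):
        lowered = sentence.lower()
        # Naive substring check; a real system would use the tagger's offsets.
        return sum(1 for term in concept_terms if term in lowered)

    # sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

CHAPTER = ("Influenza is a common respiratory illness. It spreads easily in winter. "
           "Oseltamivir shortens influenza symptoms when started early. "
           "Annual vaccination reduces influenza risk.")
print(auto_extract(CHAPTER, {"influenza", "oseltamivir", "vaccination"}))
# Oseltamivir shortens influenza symptoms when started early. Annual vaccination reduces influenza risk.
```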
• 37. Semantic users
  - As users search and navigate semantic content, you can attach the tags on that content to them
  - A semantic profile for a user can be created from his/her site activity:
    - What topics are they interested in?
    - How are their interests evolving?
  - Use this information to create personalized information services
  - An excellent method for encouraging anonymous institutional users to register/log in
  - Use topical affinities between users to create communities of practice: groups of people who share a passion for something they do and learn how to do it better through social interaction
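A semantic user profile can be sketched as a running score per tag. This version applies an exponential decay (the hypothetical `decay=0.9` parameter) so that older interests fade and the profile tracks how interests evolve. The session data is made up.

```python
from collections import defaultdict

def update_profile(profile, page_tags, decay=0.9):
    """Fade existing interests, then credit each tag on the page just viewed."""
    for tag in profile:
        profile[tag] *= decay
    for tag in page_tags:
        profile[tag] += 1.0
    return profile

profile = defaultdict(float)
# Hypothetical session: each set holds the tags of one viewed page, oldest first.
for page_tags in [{"asthma"}, {"asthma", "inhaled corticosteroids"}, {"influenza"}]:
    update_profile(profile, page_tags)

top = sorted(profile.items(), key=lambda item: -item[1])
print([(tag, round(score, 2)) for tag, score in top])
# [('asthma', 1.71), ('influenza', 1.0), ('inhaled corticosteroids', 0.9)]
```

The top-scoring tags could then drive personalized alerts. Comparing profiles across users is one possible way to find the topical affinities mentioned above.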
• 38. Contextual advertising
  - Match article and ad semantic tags to precisely target ads based on topic
  - OR, block ads from appearing next to articles on related topics
  - OR (even better), use an alternative advertising method:
    - Advertising can be targeted to the user profile, not just the article
    - Avoid targeting editorially sensitive pages, but still deliver ads that match the user’s interests on neutral pages or alerts
    - For classified/job ad targeting, user interests can be matched up with demographics like location
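The advertising options can be sketched as a scoring function. It combines article tags with the reader's profile tags and suppresses targeted ads on editorially sensitive pages. The ad inventory, the sensitive-topic set, and the simple overlap score are hypothetical.

```python
# Hypothetical ad inventory (ad ID -> semantic tags) and editorial rule set.
ADS = {
    "ad-inhaler": {"asthma", "inhaled corticosteroids"},
    "ad-vaccine": {"influenza", "vaccination"},
}
SENSITIVE_TOPICS = {"drug safety alert"}

def choose_ad(article_tags, user_tags, ads=ADS, sensitive=SENSITIVE_TOPICS):
    """Pick the ad whose tags best match the page topic plus the reader's interests."""
    if article_tags & sensitive:
        return None  # editorially sensitive page: serve no targeted ad here
    best, best_score = None, 0
    for ad_id in sorted(ads):  # sorted for a deterministic tie-break
        ad_tags = ads[ad_id]
        # One point per tag shared with the page, one per tag shared with the profile.
        score = len(ad_tags & article_tags) + len(ad_tags & user_tags)
        if score > best_score:
            best, best_score = ad_id, score
    return best

# Neutral influenza page, reader with an asthma-focused profile.
print(choose_ad({"influenza"}, {"asthma", "inhaled corticosteroids"}))  # ad-inhaler
# Same reader on a sensitive page: no targeted ad.
print(choose_ad({"asthma", "drug safety alert"}, {"asthma"}))  # None
```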
• 39. What about mobile?
  - Reduction in number of clicks!
  - Precision in search
  - Quick links to what most users need
  - Targeted navigation that leads to the most important content (e.g., answers to clinical questions)
• 41. Questions for you and your application/hosting providers
  - What are your user stories/use cases?
  - What are the business benefits/ROI for your organization?
  - What content do you need to tag, how is that content delivered, and can those delivery systems/platforms use taxonomy and tagging in a way that supports your users’ needs?
  - What’s your plan for keeping your taxonomy up to date?
  - Can your “living” taxonomy be integrated into your applications? In real time as you make updates?
• 42. Questions for semantic tech providers
  - Does the technology support your user stories/use cases?
  - Does it offer/integrate with a constantly evolving taxonomy?
  - Does it meet the accuracy threshold for your users and your content?
  - Can it tag at the necessary depth, both granular and summary level? Figures and tables? Top-level collections?
• 43. The semantic user story: I am specifically identifying ________ because ________ is very important to my ________ users when they are ________.
• 44. The semantic user story: I am specifically identifying concise disease treatment content because immediate access to treatment options is very important to my emergency physician users when they are seeing 20 patients an hour.
• 45. McGraw-Hill: metadata targeted to deliver fast, concise treatment information to the ER doctor
• 46. The semantic user story: I am specifically identifying skin disorder images on all body locations and all types of skin because visual diagnosis is very important to my family physician users.
• 47. Derm101: images displayed in diagnosis search results
• 48. What are your user stories?
  - Problems/needs to solve for your users:
    - Delivering top-quality care under serious time constraints
    - An explosion of new research to keep up with and integrate into practice
    - The need to pass a licensing exam
  - Problems/needs to solve for your organization:
    - Creating new products that grow and diversify revenue
    - Creating more value from advertising
    - Gaining insight into users
• 49. Thank you!
  “Organizing is what you do before you do something, so that when you do it, it is not all mixed up.” – A. A. Milne
  Pam Harley
  VP, Product & Market Development
  Semedica™, A DIVISION OF SILVERCHAIR
  pamh@semedica.com
  (434) 296-6333 x372
  www.silverchair.com
  www.semedica.com