AAUP 2008: Making XML Work (T. Kerner) Presentation Transcript
The Next Wave of Content Technology:
Thane Kerner, President & CEO, Silverchair
What are Semantics and the Semantic Web?
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.
--W3C Semantic Web Activity Definition
The Larger Context
Web 1.0: The web of documents
Web 2.0: The web of people
Next: The web of data (Semantic Web)
These are additive Internet-wide movements. Though there are fads and uneven progress in each step, the overall trajectory follows this basic template.
The Semantic Web requires us to go beyond documents and think of our content as data.
1 practice guideline = 1 document
1 practice guideline = 312 distinct pieces of data
This comes more naturally to industries that have traditionally dealt with uniform data (finance, travel)
If the airlines treated their data like publishers…
This Week’s Departures (PDF, 45K)
This Week’s Arrivals (PDF, 52K)
Your Content is a Database!
Structural XML was a great start for scholarly and professional publishers. XML removes the form factor as a limiting factor and presents the data in logical form.
Your Database is Not Alone
Anybody making real decisions uses data from many sources, produced by many sorts of organizations, and we're stymied … In a way, the Semantic Web is a bit like having all the databases out there as one big database. It's difficult to imagine the power that you're going to have when so many different sorts of data are available.
Why Are Semantics Essential To Scholarly Publishers?
The Failures of the Status Quo
Information scarcity is becoming less of an issue as attention scarcity becomes more of one.
Information systems are increasingly specialized and disconnected.
Retrieval relies on search phrasing (a highly variable input method produces highly variable results).
Science is Synthesis
Breakthroughs are built on unique understanding and synthesis of many types of existing information. In scholarship, this is the concept of standing on the shoulders of giants. But before synthesis can occur, lower cognitive functions must take place: identification, categorization, relevance.
Who Are Your Most Important Readers?
A. People B. Computers
But Can Both Understand It?
The meaning of content is currently written for human understanding, not for computers. The Semantic Web requires a descriptive data layer that other computer applications (intelligent agents) can understand. Therefore, a new semantic layer is needed for your data to truly join the Semantic Web.
The Semantic Layer
The semantic layer is an evolution of traditional web <meta> data. It is a consistent, rules-based information layer for computer logic parsing. It is a method for exposing the meaning of your content data so the computer and the network can perform more sophisticated cognitive tasks.
For Humans: The Narrative Layer
Chapter 23: Numbness, Tingling, and Sensory Loss
Normal somatic sensation reflects a continuous monitoring process, little of which reaches consciousness under ordinary conditions. By contrast, disordered sensation, particularly when experienced as painful, is alarming and …

For Computers: The Semantic Layer
<semantics controlvocab="UMLS">
  <tag>
    <root-term termID="28648">sensation disorders</root-term>
    <sub-term termID="180">classification</sub-term>
    <sub-term termID="6138">terminology</sub-term>
  </tag>
  <tag>
    <root-term termID="39923">sensory testing</root-term>
  </tag>
</semantics>
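To make the idea of a machine-readable layer concrete, here is a minimal sketch of how an application might read the semantic layer shown on the slide. It uses Python's standard `xml.etree.ElementTree`; the function name `read_tags` and the shape of the returned data are assumptions made for illustration, not part of any Silverchair system.

```python
import xml.etree.ElementTree as ET

# The semantic layer from the slide, with straight quotes so it parses as XML.
SEMANTIC_LAYER = """
<semantics controlvocab="UMLS">
  <tag>
    <root-term termID="28648">sensation disorders</root-term>
    <sub-term termID="180">classification</sub-term>
    <sub-term termID="6138">terminology</sub-term>
  </tag>
  <tag>
    <root-term termID="39923">sensory testing</root-term>
  </tag>
</semantics>
"""


def read_tags(xml_text):
    """Return (vocabulary, tags); read_tags is a hypothetical helper name."""
    root = ET.fromstring(xml_text)
    vocabulary = root.get("controlvocab")
    tags = []
    for tag in root.findall("tag"):
        root_term = tag.find("root-term")
        tags.append({
            # Each term is kept as a (termID, label) pair.
            "root": (root_term.get("termID"), root_term.text.strip()),
            "subs": [(s.get("termID"), s.text.strip())
                     for s in tag.findall("sub-term")],
        })
    return vocabulary, tags


vocabulary, tags = read_tags(SEMANTIC_LAYER)
for t in tags:
    print(vocabulary, t["root"], t["subs"])
```

The application never has to interpret the narrative text: it works only with term IDs from the named vocabulary.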
Immediate Benefits of Semantics
The “Sort of…” Problem
As proposed by comedian Demetri Martin
I love you.
I love you. You’re going to live.
I love you. You’re going to live. Here’s the information you need.
I love you. You’re going to live. Here’s the information you need. … sort of
Precision in Discovery!
Precision in answering user queries is a key component of an application’s usability and user satisfaction rating. The semantic layer provides your application with a concise guide to the content in a language it can understand. It can now provide more accurate results (and fewer!).
A user wants to know about the mortality of necrotizing fasciitis.
Authors use different terminology in different books, journal articles, and even in the same book. A semantic layer with a controlled vocabulary will normalize these differences and make your user-data connections smarter. This is especially pertinent in health care.
From a Previous Example
For Humans
Chapter 23: Numbness, Tingling, and Sensory Loss
Normal somatic sensation reflects a continuous monitoring process, little of which reaches consciousness under ordinary conditions. By contrast, disordered sensation, particularly when experienced as painful, is alarming and …

For Computers
<semantics controlvocab="UMLS">
  <tag>
    <root-term termID="28648">sensation disorders</root-term>
    …

“disordered sensation” = 215 PubMed results
“sensation disorders” = 112,577 PubMed results (raw search)
                      = 76,826 PubMed results (MeSH major topic search)
More Need for Normalization
Synonyms (newborn = neonate)
Acronyms (GHB = gamma hydroxybutyrate)
Shorthand (c diff = clostridium difficile)
Bonus: You can use a semantic normalization web service in your search without tagging your content.
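The normalization step can be sketched in a few lines. This is only an illustration: the `PREFERRED` table is a hand-built stand-in holding the three examples from the slide, where a real system would draw on a controlled vocabulary or a normalization web service, and `normalize_query` is a hypothetical name.

```python
import re

# Hypothetical variant -> preferred-term table, built from the slide's examples.
# A production system would load this from a controlled vocabulary instead.
PREFERRED = {
    "newborn": "neonate",
    "ghb": "gamma hydroxybutyrate",
    "c diff": "clostridium difficile",
}


def normalize_query(query):
    """Lowercase the query, collapse whitespace, and swap in preferred terms."""
    text = " ".join(query.lower().split())
    # Replace longer variants first so multi-word shorthand is matched whole.
    for variant in sorted(PREFERRED, key=len, reverse=True):
        pattern = r"\b" + re.escape(variant) + r"\b"
        text = re.sub(pattern, PREFERRED[variant], text)
    return text


print(normalize_query("GHB overdose in a newborn"))
print(normalize_query("C  diff treatment"))
```

Because the same function can be applied to the user's query and to the terms used for tagging, both sides of the search meet on the preferred term.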
By using a shared vocabulary or taxonomy, you can more easily integrate your varied content (journals, books, videos, images, training). Current taxonomies in health care include MeSH, SNOMED, ICD-10, Read Codes, and about 100 more. The Unified Medical Language System (UMLS) is a place to start for health care integrations.
Machine-Made Context Links
Create a rich matrix of contextual linking for your users using the semantic layer. These links never have to be updated by a person — when new content is added it immediately flows in.
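One way such links could be generated is by connecting any two pieces of content that share a term ID. The sketch below assumes each document has already been tagged; the document IDs and the `build_context_links` function are invented for illustration, and the term ID "999" is a placeholder rather than a real vocabulary entry.

```python
from collections import defaultdict


def build_context_links(doc_tags):
    """Map each document to related documents, most shared terms first.

    doc_tags: dict of document ID -> set of term IDs (hypothetical input shape).
    """
    # Invert the tagging: which documents carry each term?
    docs_by_term = defaultdict(set)
    for doc, terms in doc_tags.items():
        for term in terms:
            docs_by_term[term].add(doc)

    links = {}
    for doc, terms in doc_tags.items():
        shared = defaultdict(int)
        for term in terms:
            for other in docs_by_term[term]:
                if other != doc:
                    shared[other] += 1
        # Sort by number of shared terms (descending), then by ID for stability.
        links[doc] = sorted(shared, key=lambda d: (-shared[d], d))
    return links


docs = {
    "book-ch23": {"28648", "39923"},
    "journal-a": {"28648"},
    "video-7": {"28648", "39923"},
    "journal-b": {"999"},
}
links = build_context_links(docs)
print(links["book-ch23"])
```

Adding a newly tagged document to `docs` and rebuilding the links is all it takes for the new content to appear alongside related material; nobody edits a link by hand.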
Semantic User Profiles
As users navigate and search semantic content, they can inherit the properties of that content. A very simple algorithm can be constructed to attach a semantic profile to a user solely from their site activity – what topics are they interested in? What do they search for? This sounds creepy, but is commonplace.
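A "very simple algorithm" of this kind might look like the sketch below: count the terms attached to the pages a user views, add the terms they search for, and keep the most frequent. The event format, the `build_profile` name, and the assumption that search queries have already been normalized to vocabulary terms are all illustrative choices, not a description of a specific product.

```python
from collections import Counter


def build_profile(events, page_tags, top_n=3):
    """Return the user's top_n terms, inferred from site activity alone.

    events: list of ("view", page) or ("search", term) pairs (hypothetical format).
    page_tags: dict of page -> list of semantic terms on that page.
    """
    counts = Counter()
    for kind, value in events:
        if kind == "view":
            # The user inherits the terms of each page they read.
            counts.update(page_tags.get(value, []))
        elif kind == "search":
            # Assumes the query was already normalized to a vocabulary term.
            counts[value] += 1
    # Most frequent first; ties broken alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


page_tags = {
    "/ch23": ["sensation disorders", "sensory testing"],
    "/ch24": ["sensation disorders"],
}
events = [
    ("view", "/ch23"),
    ("view", "/ch24"),
    ("search", "sensation disorders"),
    ("search", "necrotizing fasciitis"),
]
print(build_profile(events, page_tags, top_n=2))
```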
Are 10,000 semantic users more valuable than 10,000 generic users? Yes! Marketing efforts can be focused to particular subgroups. Alerting features for new content can target precise groups that will be more likely to read the message and come to the site.
Content: Where are the topic gaps in your content? Where is your content complete? Semantic reports give a unified view of integrated sites and can lead new content development and guide author teams.
Trends: How are certain topics trending among your user groups? Is there a secondary market for your semantic trend reports? Possibly.
Next Wave of SEO
The next generation of discovery tools (intelligent agents, virtual research assistants) will give greater weight to content they can understand. Don’t let your database be part of the “dark web”: expose it through your semantic layer (without giving away your content!).
Semantic abstracts increased Google referrals to one site by 400% in 2 months.
Content Moving Into Workflow
E.g., EHR/HIT Integrations
Hospital systems are a new outlet for content sales. A requirement of this space is to provide short, focused information (not entire journal articles or book chapters). More importantly, relevance to the current situation and patient brings high value. Semantic content is ready to deliver precisely the paragraph relevant to the current case.
How Do You Create and Use Semantics?
1. Start With Vocabularies, Taxonomies, Ontologies
Order of Complexity
From less complex to more complex:
Term list: Simple set of words used in text
Controlled vocabulary: Uses only approved terms; may include thesaurus, synonyms, sound-alikes, abbreviations, and jargon
Taxonomy: Includes structural hierarchy (parent/child)
Ontology: Limitless relationship types defined in system
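The four levels can be pictured as progressively richer data structures. The sketch below is purely illustrative: the hierarchy placements and the `assesses` relationship are invented for the example and are not taken from MeSH, UMLS, or any other real vocabulary.

```python
# Term list: just the words that appear in the text.
term_list = ["newborn", "neonate", "numbness", "sensation disorders"]

# Controlled vocabulary: one approved term per concept, plus its variants.
controlled_vocabulary = {
    "neonate": ["newborn"],
    "clostridium difficile": ["c diff"],
}

# Taxonomy: adds a parent/child hierarchy (child -> parent).
# These placements are hypothetical.
taxonomy_parent = {
    "numbness": "sensation disorders",
    "tingling": "sensation disorders",
    "sensation disorders": None,
}

# Ontology: any number of named relationship types, stored as triples.
ontology = [
    ("numbness", "is_a", "sensation disorders"),
    ("sensory testing", "assesses", "sensation disorders"),
]


def ancestors(term):
    """Walk up the taxonomy from a term to the top of its branch."""
    chain = []
    parent = taxonomy_parent.get(term)
    while parent is not None:
        chain.append(parent)
        parent = taxonomy_parent.get(parent)
    return chain


def related(term, relation):
    """Follow one relationship type in the ontology."""
    return [obj for subj, rel, obj in ontology
            if subj == term and rel == relation]


print(ancestors("numbness"))
print(related("sensory testing", "assesses"))
```

Each step up adds a question the system can answer: which term is approved, what a term belongs to, and finally how two terms relate.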
Taxonomy as Semantic Foundation
The taxonomy you choose will be the framework for your semantic layer and semantic tagging (be it by computer or human).
To facilitate integration, choose (or create) an industry-standard taxonomy when possible.
Taxonomies are living creatures: they should be actively managed by an expert team (our medical taxonomy is updated every day).
Silverchair’s TOTEM Taxonomy Platform
2. Tag Your Data
Tagging is the insertion of semantic information into the XML; the smallest unit of that information is called a tag. Tags can also be placed in database tables and header files when the content itself cannot carry them (such as images and videos). Tagging should be done at the smallest “atomic” level of data possible.
Human indexers are the most accurate taggers for high-value content, but computer routines can assist them, or tag extremely formulaic content on their own. At Silverchair, we run an automated routine to place obvious tags and medical editors apply the rest. Community tagging/author tagging seems attractive, but can be risky due to inconsistency.
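As a rough illustration of the split between an automated first pass and human editors, the sketch below places a tag only when a known label appears verbatim in the paragraph and flags everything else for review. It is not Silverchair's routine: the `VOCAB` table, the `auto_tag` name, and the decision to treat "disordered sensation" as a variant of term 28648 are assumptions made for the example.

```python
import re

# Hypothetical slice of a vocabulary: term ID -> labels that count as a match.
VOCAB = {
    "28648": ["sensation disorders", "disordered sensation"],
    "39923": ["sensory testing"],
}


def auto_tag(paragraph):
    """Place only the obvious tags; leave the rest to a human editor."""
    text = paragraph.lower()
    found = []
    for term_id, labels in VOCAB.items():
        # A tag is "obvious" when one of its labels appears as a whole phrase.
        if any(re.search(r"\b" + re.escape(label) + r"\b", text)
               for label in labels):
            found.append(term_id)
    return {"tags": found, "needs_editor_review": not found}


print(auto_tag("By contrast, disordered sensation, particularly when "
               "experienced as painful, is alarming."))
print(auto_tag("Normal somatic sensation reflects a continuous "
               "monitoring process."))
```

Running this at paragraph level keeps the tags close to the "atomic" unit of content, and the review flag gives editors a queue of the passages the routine could not handle.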
Add Tagging to Workflows
The biggest key to operationalizing semantics is to put the creation of the semantic layer in your current production workflows. Make it part of the editorial process for all new content. The skill set for creating semantics is analogous to content indexing (with a few extra complexities). Remember: We’ve done this before with XML!
Silverchair’s TagMaster Tagging Platform
Welcome to the Semantic Web!
Your content is now ready for the next generation of scholarly information systems. Start writing new features and create new businesses from your semantic layer. There are many more opportunities than those I’ve discussed.
Thane Kerner, President & CEO, Silverchair www.silverchair.com