Semantic Analysis in IA


English is a messy and chaotic language, with exceptions to rules, different styles of writing, and a multitude of ways to write about the same thing. This chaos makes analysing and categorising content, and building a corporate taxonomy, very time-consuming, even if it’s just for the navigation of the local intranet or internet website.

This is my presentation at Oz-IA about my recent experience in turning ‘scary-bad’ medical restrictions text into something machine-usable. It introduces the concept of semantic analysis, the methodology I used to investigate the linguistic patterns in the text, and how this facilitated the classification and codification of content.

Published in: Technology, Business

    1. 1. Semantic Analysis in IA. Matthew Hodgson, ACT regional-lead, Web and Information Management. 23 Sept 2007
    2. 4. Jeffrey Veen on analysing content
        • “a mind-numbingly detailed odyssey through your web site...
        • …this process…is a relatively straightforward process of clicking through your web site and recording what you find.”
        Source: http://www.adaptivepath.com/ideas/essays/archives/000040.php
    3. 6. Content overview – first take
        • Medical restrictions text
            • Free-text built in Word and hand-crafted (*grrr*)
            • Unclassified
            • Varied consistency within and between texts
            • Highly complex sentence structures in pseudo-legalese
            • Style reflects the author rather than the meaning in the communication
        • Content needed for re-use
            • Content output was needed for re-use by others
            • Multiple audiences
            • Multiple purposes for re-use
        • Codification
            • Codification (after authoring) takes too long
            • Need to reduce timeframes!
    4. 7. The task ... analyse and codify
    5. 8. Linguistics: … a whole discipline devoted to the study of language
    6. 9. “You’re joking!?”
        • All language has structure – even someone’s pseudo-legal English
        • Analysing language is actually easier than you might think
    7. 10. The approach
        • Analyse semantics of content
            • There is a predictable structure
            • It’s all just Lego™ building blocks (nouns, verbs, adjectives, etc)
            • Implied meaning can be made overt
        • New tools for IAs to play with!
            • Understand semantics, the structure of sentences, and you can analyse, categorise and codify English!
    8. 11. Language as Lego™
        • Building blocks
            • Subject (S)
            • Verb (V)
            • Object (O)
        • Order of blocks
            • Differs depending on the language
    9. 12. Order from chaos
        • SVO languages: English, French, Chinese, Bulgarian, Swahili
        • SOV: Japanese, Turkish, Korean
        • VSO: Classical Arabic, Celtic and Hawaiian
        • VOS: Fijian, Yoda’s amusing phrases
    10. 13. Subjects, verbs and objects
        • Sometimes, though, the SVO structure is hidden:
            • The apple is red, or
            • The apple is a red apple?
        • Uncovering the hidden structure helps to differentiate between the subject and the object and identify the who and what
    11. 14. Sentences as (apple) trees
    12. 15. Semantic analysis
        • Medical restrictions wording:
            • Restricted benefit: Gastro-oesophageal reflux disease; Scleroderma oesophagus;
            • Authority required: Peptic ulcer
    13. 16. Semantic analysis (cont.)
        • Actual sentence
            • Peptic ulcer
        • Implied sentence
            • The prescription of medicine is restricted to the initial treatment of patients with peptic ulcer
    14. 19. “Who Treated” semantic model
    15. 20. “Authority Action” semantic model
    16. 21. High-level semantic overview
    17. 22. How did the ‘trees’ help?
        • Inferred
            • How people think about and structure content
        • Described
            • Business processes that produce content
        • Identified
            • Where content quality is poor so it can be improved
            • Critical components of the sentence for codification
        • Designed
            • Taxonomies, and described folk taxonomies
        • Built
            • Systems to help bring some structure to content authoring
    18. 23. How can I do this stuff too?! (a side-step)
        • Theory is important
            • An understanding of semantics: sentence trees and grammar
            • Textbooks by authors like Fromkin and Rodman can help through the tricky bits
        • Need good tools
            • Connexor: www.conexor.fi/demo/syntax
            • Big sheets of paper (and an electronic whiteboard)
            • Visio (not PowerPoint!)
    19. 24. Demo
        • Connexor
        • www.conexor.fi/demo/syntax
    20. 25. Introducing ways to codify restrictions
        • How are we actually going to codify the stuff?!
            • Give people Lego™ or ‘fridge-magnets’ to build sentences
            • Build a prototype to explore and demonstrate conceptual design
        • Communicate
            • Talk about ideas with business owners
            • Explore possibilities with end-users
            • Build ‘no surprises’ into change management
        • Iterate
            • Iterate and refine concepts and design before anything is built
        • Inform
            • Developers of intent and requirements
            • The building of a ‘tool’ for codifying content (hooray for Axure!)
    21. 26. Demo
        • Prototyping with Axure
    22. 35. Why should I care about this?
        • Google uses semantic analysis to index content
        • Translation software uses semantic analysis to identify ‘components’ for translation
        • Good sentence structure equals:
            • Accurate indexing
            • Higher rank relevance of content
            • Happy people (they find what they’re looking for)
    23. 36. Summing up
        • Content is still king, but:
            • Is its quality any good?
            • Does it match your website’s categories?
            • Is your metadata ok?
            • Can people find the content they need?
            • Do you need to understand your content better?
        • Semantic analysis can:
            • Make your content audits more objective
            • Inform processes to improve the quality of the content
            • Inform processes to improve search engine indexing
            • Inform metadata creation
            • Improve website navigation design
    24. 37. Please Sir, can I have some more…?
        • email: [email_address]
        • web: www.smsmt.com
        • blog: magia3e.wordpress.com
        • twitter: magia3e
        • community: iacanberra.org
        • cartoons: © Gary Larson
    25. 38. Fin
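
The ‘Lego’ idea from slides 11 and 12 (subject, verb and object blocks whose order depends on the language) can be sketched in a few lines of Python. This is only an illustration: the function name `arrange` and the example words are made up here, and real languages involve far more than reordering three blocks.

```python
def arrange(subject: str, verb: str, obj: str, order: str = "SVO") -> str:
    """Lay out the three blocks in the given order (e.g. "SVO", "SOV", "VSO")."""
    # The order string must use each of S, V and O exactly once.
    if sorted(order) != ["O", "S", "V"]:
        raise ValueError(f"order must be a permutation of 'SVO', got {order!r}")
    blocks = {"S": subject, "V": verb, "O": obj}
    return " ".join(blocks[slot] for slot in order)


if __name__ == "__main__":
    for order in ("SVO", "SOV", "VSO", "VOS"):
        print(order, "->", arrange("the patient", "takes", "the medicine", order))
```

The same three blocks are reused for every ordering, which is the point of the analogy: once the blocks are identified, the arrangement is a separate decision.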

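
Slides 15 and 16 show terse restriction wording being expanded into an overt sentence. A minimal Python sketch of that step follows, assuming the wording always starts with one of the two category labels seen on slide 15 and separates conditions with semicolons. The names `Restriction`, `parse_restrictions` and `implied_sentence` are hypothetical, and the sentence template is a simplification of the slide 16 example: it omits "initial", which cannot be recovered from the terse text alone. It is not the tool built in the project.

```python
import re
from dataclasses import dataclass

# Category labels taken from slide 15; treating them as the full set is an assumption.
CATEGORIES = ("Restricted benefit", "Authority required")


@dataclass(frozen=True)
class Restriction:
    category: str
    condition: str


def parse_restrictions(text: str) -> list[Restriction]:
    """Split terse restriction wording into (category, condition) records."""
    pattern = "(" + "|".join(re.escape(c) for c in CATEGORIES) + ")"
    # With a capturing group, re.split returns [prefix, cat1, body1, cat2, body2, ...].
    parts = re.split(pattern, text)
    records = []
    for category, body in zip(parts[1::2], parts[2::2]):
        for condition in body.split(";"):
            condition = condition.strip()
            if condition:
                records.append(Restriction(category, condition))
    return records


def implied_sentence(r: Restriction) -> str:
    """Make the implied meaning overt using a hypothetical sentence template."""
    condition = r.condition[0].lower() + r.condition[1:]
    return ("The prescription of medicine is restricted to the treatment "
            f"of patients with {condition}")


if __name__ == "__main__":
    wording = ("Restricted benefit Gastro-oesophageal reflux disease; "
               "Scleroderma oesophagus; Authority required Peptic ulcer")
    for record in parse_restrictions(wording):
        print(record.category, "|", implied_sentence(record))
```

Keeping the category and the condition as separate fields is what makes the text machine-usable: the sentence can be regenerated from the record, but the record can also be indexed or validated on its own.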