Semantic Analysis in IA

English is a messy and chaotic language, with exceptions to rules, different styles of writing, and a multitude of different ways to write about the same thing. This chaos means that analysis, categorisation and building a corporate taxonomy are very time-consuming tasks, even if it’s just for the navigation of the local intranet or internet website.

This is my presentation at Oz-IA about my recent experience in turning ‘scary-bad’ medical restrictions text into something machine-usable. It introduces the concept of Semantic Analysis, the methodology I used to investigate the linguistic patterns in the text, and how this facilitated information classification and codification of content.

Published in: Technology, Business
  • Transcript

    • 1. Semantic Analysis in IA Matthew Hodgson ACT regional-lead, Web and Information Management 23 Sept 2007
    • 4. Jeffrey Veen on analysing content
        • “a mind-numbingly detailed odyssey through your web site...
        • …this process…is a relatively straightforward process of clicking through your web site and recording what you find.”
        • Source: http://www.adaptivepath.com/ideas/essays/archives/000040.php
    • 6. Content overview – first take
        • Medical restrictions text
            • Free-text built in Word and hand-crafted (*grrr*)
            • Unclassified
            • Varied consistency within and between texts
            • Highly complex sentence structures in pseudo-legalese
            • Style reflects the author rather than the meaning in the communication
        • Content needed for re-use
            • Content output was needed for re-use by others
            • Multiple audiences
            • Multiple purposes for re-use
        • Codification
            • Codification (after authoring) takes too long
            • Need to reduce timeframes!
    • 7. The task ... analyse and codify
    • 8. Linguistics … a whole discipline devoted to the study of language
    • 9. “You’re joking!?”
        • All language has structure – even someone’s pseudo-legal English
        • Analysing language is actually easier than you might think
    • 10. The approach
        • Analyse semantics of content
        • There is a predictable structure
        • It’s all just Lego™ building blocks (nouns, verbs, adjectives, etc)
        • Implied meaning can be made overt
        • New tools for IAs to play with!
        • Understand semantics, the structure of sentences, and you can analyse, categorise and codify English!
    • 11. Language as Lego™
        • Building blocks
            • Subject (S)
            • Verb (V)
            • Object (O)
        • Order of blocks
            • Differs depending on the language
    • 12. Order from chaos
        • SVO languages
            • English, French, Chinese, Bulgarian, Swahili
        • SOV
            • Japanese, Turkish, Korean
        • VSO
            • Classical Arabic, Celtic and Hawaiian
        • VOS
            • Fijian, Yoda’s amusing phrases
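As a rough illustration of the word orders listed on this slide, the sketch below arranges the same three building blocks in each order. The function name `arrange` and the example words are assumptions made for illustration only; they are not from the talk.

```python
# Hypothetical example blocks: one subject, one verb, one object.
BLOCKS = {"S": "the doctor", "V": "treats", "O": "the patient"}


def arrange(order: str, blocks: dict) -> str:
    """Join the S, V and O blocks in the order given, e.g. 'SOV'."""
    if sorted(order) != ["O", "S", "V"]:
        raise ValueError("order must use each of S, V and O exactly once")
    return " ".join(blocks[slot] for slot in order)


# Print the same three blocks in each word order named on the slide.
for order in ("SVO", "SOV", "VSO", "VOS"):
    print(order, "->", arrange(order, BLOCKS))
```

Only the SVO line reads as normal English; the other lines show how the same blocks would be sequenced under a different word order, not a real translation.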
    • 13. Subjects, verbs and objects
        • Sometimes, though, the SVO structure is hidden:
            • The apple is red or
            • The apple is a red apple?
        • Uncovering the hidden structure helps to differentiate between the subject and the object and identify the who and what
    • 14. Sentences as (apple) trees
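One way to picture a sentence as a tree is sketched below for "The apple is red". The node labels (S, NP, VP, Det, N, V, Adj) follow common textbook conventions and are an assumption here; the diagram on the original slide is not available in this transcript.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node: either a phrase with children or a leaf holding a word."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None


def leaf(label: str, word: str) -> Node:
    return Node(label, word=word)


def words(node: Node) -> List[str]:
    """Read the sentence back by collecting leaf words left to right."""
    if node.word is not None:
        return [node.word]
    collected: List[str] = []
    for child in node.children:
        collected.extend(words(child))
    return collected


def bracket(node: Node) -> str:
    """Render the tree in labelled-bracket notation."""
    if node.word is not None:
        return f"[{node.label} {node.word}]"
    inner = " ".join(bracket(child) for child in node.children)
    return f"[{node.label} {inner}]"


tree = Node("S", [
    Node("NP", [leaf("Det", "The"), leaf("N", "apple")]),
    Node("VP", [leaf("V", "is"), leaf("Adj", "red")]),
])

print(" ".join(words(tree)))
print(bracket(tree))
```

Reading the leaves left to right gives back the sentence, while the bracketed form makes the grouping into subject phrase and verb phrase explicit.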
    • 15. Semantic analysis
        • Medical restrictions wording:
            • Restricted benefit Gastro-oesophageal reflux disease; Scleroderma oesophagus;
            • Authority required Peptic ulcer
    • 16. Semantic analysis (cont.)
        • Actual sentence
            • Peptic ulcer
        • Implied sentence
            • The prescription of medicine is restricted to the initial treatment of patients with peptic ulcer
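The step from the actual wording to the implied sentence can be sketched as a simple template fill. This is a minimal illustration that assumes every restriction implies the same sentence frame as the slide's example; the names `IMPLIED_TEMPLATE` and `expand_restriction` are hypothetical, and the talk does not say the expansion was done this way.

```python
# Sentence frame taken from the slide's example; treating it as a
# reusable template is an assumption made for illustration.
IMPLIED_TEMPLATE = (
    "The prescription of medicine is restricted to the "
    "{stage} treatment of patients with {condition}"
)


def expand_restriction(condition: str, stage: str = "initial") -> str:
    """Turn a terse restriction such as 'Peptic ulcer' into a full sentence."""
    condition = condition.strip()
    if not condition:
        raise ValueError("condition must not be empty")
    # Lower-case the first letter so the condition reads naturally mid-sentence.
    lowered = condition[0].lower() + condition[1:]
    return IMPLIED_TEMPLATE.format(stage=stage, condition=lowered)


print(expand_restriction("Peptic ulcer"))
```

Making the implied words explicit in this way is what exposes the separate components (who is treated, with what, at which stage) that can later be codified.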
    • 19. “Who Treated” semantic model
    • 20. “Authority Action” semantic model
    • 21. High-level semantic overview
    • 22. How did the ‘trees’ help?
        • Inferred
            • How people think about and structure content
        • Described
            • Business processes that produce content
        • Identified
            • Where content quality is poor so it can be improved
            • Critical components of the sentence for codification
        • Designed
            • Taxonomies, and described folk taxonomies
        • Built
            • Systems to help bring some structure to content authoring
    • 23. How can I do this stuff too?! (a side-step)
        • Theory is important
            • An understanding of semantics - sentence trees and grammar
            • Textbooks by authors like Fromkin and Rodman can help through the tricky bits
        • Need good tools
            • Connexor: www.conexor.fi/demo/syntax
            • Big sheets of paper (and an electronic whiteboard)
            • Visio (not PowerPoint!)
    • 24. Demo
        • Connexor
        • www.conexor.fi/demo/syntax
    • 25. Introducing ways to codify restrictions
        • How are we actually going to codify the stuff?!
            • Give people Lego™ or ‘fridge-magnets’ to build sentences
            • Build a prototype to explore and demonstrate conceptual design
        • Communicate
            • Talk about ideas with business owners
            • Explore possibilities with end-users
            • Build ‘no surprises’ into change management
        • Iterate
            • Iterate and refine concepts and design before it is built
        • Inform
            • Developers of intent and requirements
            • The building of a ‘tool’ for codifying content (hooray for Axure!)
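The ‘fridge-magnet’ idea can be sketched as picking pieces from fixed vocabularies and storing the result as a coded record. The vocabularies below reuse only the wording shown earlier in the talk; the function `codify` and the record layout are assumptions for illustration and do not describe the tool that was actually built.

```python
# Controlled vocabularies, limited to terms that appear on the earlier slides.
RESTRICTION_TYPES = ("Restricted benefit", "Authority required")
CONDITIONS = (
    "Gastro-oesophageal reflux disease",
    "Scleroderma oesophagus",
    "Peptic ulcer",
)


def codify(restriction_type: str, condition: str) -> dict:
    """Accept only known 'magnets' and return a machine-usable record."""
    if restriction_type not in RESTRICTION_TYPES:
        raise ValueError(f"unknown restriction type: {restriction_type!r}")
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    return {"restriction_type": restriction_type, "condition": condition}


record = codify("Authority required", "Peptic ulcer")
print(record)
```

Because authors can only choose from the fixed lists, free-text variation is removed at authoring time instead of being cleaned up afterwards.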
    • 26. Demo
        • Prototyping with Axure
    • 35. Why should I care about this?
        • Google uses semantic analysis to index content
        • Translation software uses semantic analysis to identify ‘components’ for translation
        • Good sentence structure equals:
            • Accurate indexing
            • Higher rank relevance of content
            • Happy people (they find what they’re looking for)
    • 36. Summing up
        • Content is still king, but:
            • Is its quality any good?
            • Does it match your website’s categories?
            • Is your metadata ok?
            • Can people find the content they need?
            • Do you need to understand your content better?
        • Semantic analysis can:
            • Make your content audits more objective
            • Inform processes to improve the quality of the content
            • Inform processes to improve search engine indexing
            • Inform metadata creation
            • Improve website navigation design
    • 37. Please Sir, can I have some more…?
        • email: [email_address]
        • web: www.smsmt.com
        • blog: magia3e.wordpress.com
        • twitter: magia3e
        • community: iacanberra.org
        • cartoons: © Gary Larson
    • 38. Fin