IAs, Language and Lego -- an introduction to Semantic Analysis

This presentation introduces Semantic Analysis – a way of analysing and classifying content through its linguistic basis rather than its overt meaning. It uses Lego as a metaphor for language and demonstrates that, by examining the building blocks of language, a deeper understanding of content can be gained.

Transcript

  • 1. IAs, Language and Lego™ – an Introduction to Semantic Analysis Matthew Hodgson Regional-lead, Web and Information Management, Canberra Australia 12 April 2008
  • 4. IA Tools for understanding content
  • 5. Content analysis…
  • 6. Information: we all think differently …
    • We all:
      • Think about information in different ways
      • Write about information in different ways
  • 7. … we all even write differently …
  • 8. Jeffrey Veen on analysing content
    • “a mind-numbingly detailed odyssey through your web site...
    • … this process…is a relatively straightforward process of clicking through your web site and recording what you find.”
    Source: http://www.adaptivepath.com/ideas/essays/archives/000040.php
  • 9. When analysing content …
  • 10. An extract of medical restrictions text
  • 11. What is this content?!
    • Medical restrictions text
      • Free-text built in Word and hand-crafted (*grrr*)
      • Unclassified
      • Varied consistency within and between texts
      • Highly complex sentence structures in pseudo-legalese
      • Style reflects the author rather than the meaning in the communication
    • Content needed for re-use
      • Content output was needed for re-use by others
      • Multiple audiences
      • Multiple purposes for re-use
    • Codification
      • Codification by 3rd parties (after authoring) takes too long
      • Need to reduce timeframes!
  • 12. The task … analyse and codify
  • 13. What tools would be appropriate?
    • ?
  • 14.
    • Linguistics
    • … a whole discipline devoted to the study of language…
    • All language has structure: preposition, verb, adjective, noun, determiner, subject, object, conjunction, semantics, sentence structure
  • 15. Language is like Lego™
    • Building blocks
      • Subject (S)
      • Verb (V)
      • Object (O)
    • Order of blocks
      • Differs depending on the language
  • 16. Language is like Lego™
    • SVO languages
      • English, French, Chinese, Bulgarian, Swahili
    • SOV
      • Japanese, Turkish, Korean
    • VSO
      • Classical Arabic, Celtic and Hawaiian
    • VOS
      • Fijian, Yoda’s amusing phrases
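
The block-order idea on slides 15 and 16 can be sketched in a few lines of Python. This is only an illustration, not something from the talk: the `arrange` helper, the `WORD_ORDERS` table (one language per order, taken from the slide) and the sample sentence are all assumptions, and the output shows block order only, not real translations.

```python
from typing import Dict

# One example language per word order, taken from slide 16.
WORD_ORDERS: Dict[str, str] = {
    "English": "SVO",
    "Japanese": "SOV",
    "Classical Arabic": "VSO",
    "Fijian": "VOS",
}


def arrange(blocks: Dict[str, str], order: str) -> str:
    """Lay out the same three blocks in the order given, e.g. "SOV"."""
    return " ".join(blocks[role] for role in order)


# Hypothetical sample sentence, split into its three building blocks.
blocks = {"S": "the child", "V": "builds", "O": "a tower"}

for language, order in WORD_ORDERS.items():
    print(f"{order} ({language}): {arrange(blocks, order)}")
```

The same three bricks are reused each time; only the order string changes, which is the point of the Lego metaphor.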
  • 17. Lego bricks: subjects, verbs and objects
    • Sometimes, though, the SVO structure is hidden:
      • “The Lego is red” or
      • “Those Lego bricks are [some] red Lego bricks”?
    • Uncovering the hidden structure helps to differentiate between the subject and the object and identify the who and what
  • 18. Uncovering hidden meaning
    • If the LEGO trademark is used at all, it should always be used as an adjective, not as a noun.
    • For example, say
    • "MODELS BUILT OF LEGO BRICKS".
    • Never say
    • "MODELS BUILT OF LEGOs".
    • Source: http://everything2.com/title/legOS
  • 19. Lego trees…
  • 20. Semantic analysis
    • Medical restrictions wording:
      • Restricted benefit Gastro-oesophageal reflux disease; Scleroderma oesophagus;
      • Authority required Peptic ulcer
  • 21. Semantic analysis (cont.)
    • Actual sentence
      • Peptic ulcer
    • Implied sentence
      • The prescription of medicine is restricted to the initial treatment of patients with peptic ulcer
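
One way to picture the step from actual to implied sentence is as a set of slots whose values are normally left unsaid. The sketch below is an assumption-laden illustration, not the model from the talk: the class and slot names (`subject`, `verb`, `stage`, `who`) are hypothetical, and the default values are simply the words of the implied sentence above. Whether those defaults hold for any condition other than peptic ulcer is not stated in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpliedSentence:
    # Only the condition is actually written in the restriction text.
    condition: str
    # Hypothetical slot names; defaults copied from the slide's implied sentence.
    subject: str = "The prescription of medicine"
    verb: str = "is restricted to"
    stage: str = "the initial treatment of"
    who: str = "patients with"

    def text(self) -> str:
        """Join the unsaid slots and the written condition into one sentence."""
        condition = self.condition[:1].lower() + self.condition[1:]
        return " ".join([self.subject, self.verb, self.stage, self.who, condition])


print(ImpliedSentence("Peptic ulcer").text())
```

Making each slot explicit is what lets the hidden subject, verb and object be inspected and, later, codified.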
  • 22. Semantic structure of ‘peptic ulcer’
  • 23. Semantic model for restrictions text
  • 24. Semantics describing “Who Treated”
  • 25. Semantics describing “Authority Action”
  • 26. High-level semantic overview
  • 27. Yes, it can be codified!
    • Medical restrictions:
      • Did have structure
      • Did have underlying logic
      • Were based on repeatable business processes
      • Could be codified
    • Could we make a ‘system’ to reinforce the structure at the point of authoring?
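
As a rough illustration of reinforcing structure at the point of authoring, the sketch below stores a restriction as a validated record instead of free text. It is not the system demonstrated in the talk: the `CodedRestriction` class and its checks are hypothetical, and the two allowed restriction types are just the ones visible in the slide 20 wording.

```python
from dataclasses import dataclass
from typing import Tuple

# The two restriction types that appear in the slide 20 wording.
RESTRICTION_TYPES = ("Restricted benefit", "Authority required")


@dataclass(frozen=True)
class CodedRestriction:
    restriction_type: str
    conditions: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Reject anything outside the controlled vocabulary at authoring time.
        if self.restriction_type not in RESTRICTION_TYPES:
            raise ValueError(f"unknown restriction type: {self.restriction_type!r}")
        if not self.conditions or any(not c.strip() for c in self.conditions):
            raise ValueError("at least one non-empty condition is required")

    def wording(self) -> str:
        """Render the record back into the terse restriction wording."""
        return f"{self.restriction_type} " + "; ".join(self.conditions)


restriction = CodedRestriction("Authority required", ("Peptic ulcer",))
print(restriction.wording())
```

Because the record is built from fixed parts, the wording is generated rather than hand-crafted, so no third party has to codify it after authoring.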
  • 28. Demo
    • Putting it together in a system:
      • Supporting building of content restrictions in a codified way
      • Prototyping with Axure
  • 37. The semantic analysis advantage
    • Content analysis identifies:
      • Themes in content
    • Semantic analysis identifies:
      • Themes in content
      • Work processes
      • Folk taxonomies used
      • ‘Things’ written about
  • 38. What else could you use it for?
    • When you need to understand:
      • Business processes that create content
    • When you want to disassemble content for:
      • FAQs
      • A-Z indexes
      • Help files
  • 39. How can I add this to my toolbox??!
    • Theory is important
      • An understanding of semantics: sentence trees and grammar
      • Textbooks by authors like Fromkin and Rodman can help with the tricky bits
    • Need good tools
      • Connexor: http://www.connexor.eu/technology/machinese/demo/
      • Big sheets of paper (and an electronic whiteboard)
      • Visio (not PowerPoint!)
  • 40. Demo
    • Connexor:
    • http://www.connexor.eu/technology/machinese/demo/
  • 41. Connexor
  • 42. Connexor – machine tagger
  • 43. Connexor – machine syntax
  • 44. Why should I care about this?
    • Google uses semantic analysis to index content
    • Translation software uses semantic analysis to identify ‘components’ for translation
    • Good sentence structure equals:
      • Accurate indexing
      • Higher rank relevance of content
      • Happy people (they find what they’re looking for)
  • 45. Why should I care about this?
  • 46. ‘Calais’ by Reuters
  • 47. Summing up
    • Content is still king!
    • But how can you tell if your content:
      • Is of good quality?
      • Matches your website’s categories?
      • Accurately reflects your metadata?
      • Can be found by people?
    • Semantic analysis can:
      • Make your content audits more objective
      • Inform processes to improve the quality of the content
      • Inform processes to improve search engine indexing
      • Inform metadata creation
      • Inform choice of taxonomy
  • 48. Take-home message
    • Semantic analysis can help IAs:
      • Infer
        • How people think about, and structure, their information
      • Describe
        • Business processes that produce content
      • Identify
        • Where content quality is poor so it can be improved
        • Critical components of the sentence for codification
      • Design
        • Taxonomies and describe folk taxonomies
      • Build
        • Systems to help bring some structure to content authoring
  • 49.
    • Fin
  • 51. by Matthew Hodgson Regional-lead, Web and Information Management SMS Management & Technology Canberra Australia
  • 52. by Matthew Hodgson Email [email_address] Blog magia3e.wordpress.com Slideshare www.slideshare.net/magia3e Twitter magia3e