What is IOA?



This presentation provides you with an overview of techniques and technologies for information organisation and access. The slides are from AIIM's IOA Certificate Program.

Published in: Technology, Education
  • IOA Practitioner Copyright AIIM
  • What is IOA?

    1. 1. What is Information Organization and Access?
    2. 2. www.aiim.org/training
    3. 3. How to find information? <ul><li>Option 1: </li></ul><ul><li>Search </li></ul><ul><li>65,400,000 hits when searching for “good red wine” </li></ul><ul><li>Little or no metadata / taxonomy </li></ul>
    4. 4. How to find information? <ul><li>Option 2: </li></ul><ul><li>Browsing </li></ul><ul><li>Ability to find the wine you need via 9 different categories </li></ul><ul><li>Requires metadata / taxonomy </li></ul>
    5. 5. What is IOA? <ul><li>IOA, or Information Organization and Access, consists of a content preparation process and a content search and access process </li></ul><ul><li>During the preparation process, content is captured, prepared, enriched, and indexed </li></ul><ul><li>During the access process, someone searches for and accesses content </li></ul>
    6. 6. Parts of IOA <ul><li>Information Organization </li></ul><ul><li>Content Architecture </li></ul><ul><ul><li>Structure and composition of a repository, information collection, or individual document </li></ul></ul><ul><li>Content Intelligence </li></ul><ul><ul><li>Enriching content with additional information </li></ul></ul><ul><li>Information Access </li></ul><ul><li>Search and Retrieval </li></ul><ul><ul><li>Querying information sets and obtaining documents </li></ul></ul><ul><li>Findability </li></ul><ul><ul><li>Enhancing access to the right information </li></ul></ul>
    7. 7. How to organize information to improve access?
    8. 8. What is Content Intelligence? <ul><li>Adding “meaning” to information by structuring, classifying, and/or labeling the content so it is more findable by both people and technology </li></ul><ul><li>In short, enriching the content </li></ul><ul><ul><li>Metadata </li></ul></ul><ul><ul><ul><li>“Data about the data” </li></ul></ul></ul><ul><ul><ul><li>Usually a discrete component </li></ul></ul></ul><ul><ul><li>Classification of content </li></ul></ul><ul><ul><li>Taxonomy </li></ul></ul><ul><ul><ul><li>Law for categorizing information </li></ul></ul></ul>
    9. 9. Metadata Fundamentals <ul><ul><li>Metadata consists of statements we make about resources to help us find, identify, use, manage, evaluate, and preserve them -- and perhaps dispose of them </li></ul></ul><ul><ul><li>Metadata building blocks </li></ul></ul><ul><ul><li>The basic unit of metadata is a statement </li></ul></ul><ul><ul><li>A statement consists of a property (aka, element) and a value </li></ul></ul><ul><ul><ul><li>e.g., The shirt has a color (property), which is blue (value) </li></ul></ul></ul><ul><ul><li>Metadata statements describe resources that can be used by content technologies </li></ul></ul><ul><ul><ul><li>e.g., Display all information that is about blue products </li></ul></ul></ul>
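The property/value model on this slide can be sketched in a few lines of Python. This is only an illustration: the `Resource` class, the `find_by_statement` helper, and the sample catalogue are assumptions made for the example and do not come from the AIIM material.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource described by metadata statements (property -> value)."""
    name: str
    statements: dict[str, str] = field(default_factory=dict)


def find_by_statement(resources, prop, value):
    """Return the resources that carry the statement `prop == value`."""
    return [r for r in resources if r.statements.get(prop) == value]


# Hypothetical catalogue; names and values are illustrative only.
catalogue = [
    Resource("shirt", {"color": "blue", "type": "clothing"}),
    Resource("mug", {"color": "red", "type": "kitchenware"}),
    Resource("notebook", {"color": "blue", "type": "stationery"}),
]

# "Display all information that is about blue products"
blue_products = find_by_statement(catalogue, "color", "blue")
print([r.name for r in blue_products])
```

Each dictionary entry is one statement, so a content technology can answer the slide's "blue products" request with a plain filter over properties and values.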
    10. 10. Categorizing Metadata <ul><li>Asset metadata – Who: Creator, Publisher, Contributor, Type, Format, Identifier </li></ul><ul><li>Subject metadata – What, Where & Why: Subject, Title, Description, Coverage </li></ul><ul><li>Relational metadata – Links between and to: Source, Relation </li></ul><ul><li>Use metadata – When & How: Date, Language, Rights </li></ul>Source: Taxonomy Strategies, LLC
    11. 11. Where Value Emerges <ul><li>Asset metadata – Who: Creator, Publisher, Contributor, Type, Format, Identifier </li></ul><ul><li>Subject metadata – What, Where & Why: Subject, Title, Description, Coverage </li></ul><ul><li>Relational metadata – Links between and to: Source, Relation </li></ul><ul><li>Use metadata – When & How: Date, Language, Rights </li></ul><ul><li>More efficient content processing </li></ul><ul><li>Better navigation & discovery </li></ul>Source: Taxonomy Strategies, LLC
    12. 12. Taxonomy and Content Management <ul><li>Taxonomies often act as a “great unifier” in the area of content technologies and enable them to work together </li></ul><ul><li>Many content management systems depend on solid metadata and taxonomy in order to add significant value </li></ul><ul><li>Taxonomy is a key enabler for ECM </li></ul><ul><ul><li>Essential for organizing any large content corpus </li></ul></ul><ul><ul><li>Required for meaningful records management </li></ul></ul><ul><ul><li>Critical to effective findability </li></ul></ul><ul><ul><li>Ideal way to represent logical hierarchy </li></ul></ul><ul><li>How you choose to design the taxonomy in the repository, and how the system you choose can use a taxonomy, greatly influence the business value you can realize </li></ul>
    13. 13. Understanding taxonomies <ul><li>A taxonomy is a classification scheme </li></ul><ul><ul><li>Such as the way that an individual classifies the content of their e-mail inbox, a personal CD collection, or the contents of an iPod </li></ul></ul><ul><li>A taxonomy is a knowledge map </li></ul><ul><ul><li>Reflects how its owner conceives a given body of content (a knowledge domain), for purposes of browsing, navigating, discovering, and sharing that information </li></ul></ul><ul><li>A taxonomy is semantic </li></ul><ul><ul><li>Indicating the relationships between concepts, such as the relationship between a car and a steering wheel, in that the steering wheel is a “part of” a car </li></ul></ul>Source: Organising Knowledge (Patrick Lambe, 2007)
    14. 14. Representations of taxonomies <ul><li>Lists </li></ul><ul><li>Trees </li></ul><ul><li>Hierarchies </li></ul><ul><li>Polyhierarchies </li></ul><ul><li>Matrices </li></ul><ul><li>Facets </li></ul><ul><li>System Maps </li></ul>© AIIM | All rights reserved Source: Organising Knowledge (Patrick Lambe, 2007)
    15. 15. What is a Vocabulary? <ul><li>Vocabularies represent potential metadata values </li></ul><ul><li>Vocabularies can be controlled or uncontrolled </li></ul><ul><ul><li>Controlled vocabularies: metadata must come from a set list (e.g. “Province”) </li></ul></ul><ul><ul><li>Uncontrolled vocabularies: metadata can be applied free-form (e.g. “Town”) </li></ul></ul><ul><li>“Taxonomies” are a particular type of controlled vocabulary </li></ul><ul><ul><li>But not all controlled vocabularies are taxonomies </li></ul></ul><ul><ul><li>We’ll discuss taxonomies in the next module </li></ul></ul>
    16. 16. Why Use Controlled Vocabularies? <ul><li>It’s important to control vocabulary so your searchers don’t have to </li></ul><ul><li>Standards need to be set to minimize confusion among taggers/indexers </li></ul><ul><ul><li>Enforces terminological consistency </li></ul></ul><ul><ul><li>Reduces spelling mistakes </li></ul></ul><ul><ul><li>Enables interoperability </li></ul></ul>
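A controlled vocabulary can be enforced at tagging time with a simple lookup. The sketch below is a minimal illustration, assuming a hypothetical `PROVINCES` list and a hypothetical `validate_tag` helper; it shows how a set list keeps terminology consistent regardless of how the tagger types the value.

```python
# Hypothetical controlled vocabulary for a "Province" property.
PROVINCES = {"Ontario", "Quebec", "Alberta"}


def validate_tag(value, vocabulary):
    """Return the canonical vocabulary term for `value`, or raise ValueError.

    Matching ignores case and surrounding whitespace, so taggers cannot
    introduce variant spellings of the same term.
    """
    lookup = {term.casefold(): term for term in vocabulary}
    key = value.strip().casefold()
    if key not in lookup:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return lookup[key]


print(validate_tag(" ontario ", PROVINCES))
```

An uncontrolled property such as “Town” would skip this check and accept any free-form value.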
    17. 17. What is a Thesaurus? <ul><li>Thesaurus: a networked collection of controlled vocabulary terms, using associative rather than strict hierarchical relationships </li></ul><ul><ul><li>Often used to control synonyms across vocabularies or taxonomies </li></ul></ul><ul><ul><li>But more generally can identify the relationships among terms </li></ul></ul><ul><ul><ul><li>E.g. Equal to, Related to, Opposite of </li></ul></ul></ul><ul><li>Some examples from a hypothetical domain </li></ul><ul><ul><li>Lettuce = Greens = Frisée (a.k.a. a “synonym ring”) </li></ul></ul><ul><ul><li>Coriander is related to Cilantro </li></ul></ul><ul><li>Thesauri can be enormously useful in an enterprise setting </li></ul><ul><ul><li>When different units have different taxonomies and systems need to cross-walk between them </li></ul></ul><ul><ul><li>When the enterprise cannot agree on a common vocabulary </li></ul></ul>
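One common use of a thesaurus is query expansion: a search for one term also matches its synonyms. The following is a rough sketch using the slide's lettuce and coriander examples; the data structures and the `expand_query` function are assumptions for illustration, not part of any particular search product.

```python
# Hypothetical synonym ring taken from the slide's example domain.
SYNONYM_RINGS = [
    {"lettuce", "greens", "frisée"},
]

# Associative ("related to") links, stored in both directions.
RELATED = {
    "coriander": {"cilantro"},
    "cilantro": {"coriander"},
}


def expand_query(term, include_related=False):
    """Return the term plus its synonyms (and optionally related terms)."""
    term = term.lower()
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring  # every member of a ring is treated as equal
    if include_related:
        expanded |= RELATED.get(term, set())
    return sorted(expanded)


print(expand_query("Greens"))
print(expand_query("coriander", include_related=True))
```

Keeping "equal to" and "related to" in separate structures lets a search system decide whether looser associations should widen the query.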
    18. 18. What is an Ontology? <ul><li>The formal definition of ontology is “the specification of one's conceptualization of a knowledge domain” </li></ul><ul><li>Semantic technologies are typically centered around ontologies </li></ul><ul><li>Ontologies: </li></ul><ul><ul><li>Resemble faceted taxonomies and often subsume thesauri, but employ richer semantic relationships among terms and attributes </li></ul></ul><ul><ul><li>Apply rules specifying terms and relationships </li></ul></ul><ul><ul><li>Do more than just control vocabulary </li></ul></ul><ul><ul><li>Are a knowledge representation </li></ul></ul><ul><li>Thus, an ontology for salad would contain the structure for how it relates to everything, from ingredients to growers to the rodents that might eat it, and how a salad is different in Japan vs. Italy </li></ul>
    19. 19. Ontology Example Capturing all the uses of ice cream… A complete ontology would account for more relationships and properties. Source: Roz Chast, The New Yorker
    20. 20. What’s a Topic Map? <ul><li>A topic map is a visual representation of a knowledge domain </li></ul><ul><li>Topic maps are an ISO standard for the representation and interchange of knowledge, with an emphasis on the findability of information. The standard is formally known as ISO/IEC 13250:2003 </li></ul><ul><li>The topics that populate the map are an ontology – topic maps are thus ontology-driven – like the ice cream example </li></ul>
    21. 21. What is a Folksonomy? <ul><li>Folksonomy: the anti-controlled vocabulary. Collaborative vocabularies for tagging content, rarely with any sort of control </li></ul><ul><li>Relevance between metadata and content may be determined by users in a democratic fashion </li></ul><ul><ul><li>four users define an object as being “green” </li></ul></ul><ul><ul><li>one user defines an object as being “aqua” </li></ul></ul><ul><ul><li>relevance can be defined as “more green than aqua” </li></ul></ul><ul><li>Over time, clusters emerge and communities typically self-organize around them </li></ul><ul><ul><li>“Wisdom of the crowd” </li></ul></ul><ul><li>Typically arise in Web-based communities where individuals share content, then create and use tags (e.g., blogs) </li></ul><ul><li>Applied to enterprise use cases when there is a critical mass of taggers to make it worthwhile </li></ul><ul><ul><li>Can be a useful “bottom-up” approach to developing taxonomies </li></ul></ul>
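The "democratic" relevance described on this slide amounts to counting votes per tag. Here is a small sketch under that assumption; the `tag_relevance` function and the share-of-votes weighting are illustrative choices, not a method prescribed by the slides.

```python
from collections import Counter


def tag_relevance(tags):
    """Map each tag to its share of all tags applied to one object."""
    counts = Counter(t.strip().lower() for t in tags)
    total = sum(counts.values())
    # most_common() orders tags from most to least frequently applied
    return {tag: n / total for tag, n in counts.most_common()}


# Four users tag the object "green", one tags it "aqua".
votes = ["green", "green", "Green", "aqua", "green"]
relevance = tag_relevance(votes)
print(relevance)
```

With these votes the object comes out "more green than aqua", and a search for either tag can still find it.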
    22. 22. Folksonomy Example Source: flickr.com
    23. 23. The importance of Findability
    24. 24. What is Findability? <ul><li>Findability is the quality of being locatable or navigable </li></ul><ul><li>At the core of IOA is the findability of information. Information should be easy to discover or locate </li></ul><ul><li>Information access is about helping users find documents that satisfy their information needs </li></ul><ul><li>Remember, someone may be looking for something they’ve never seen or touched before </li></ul><ul><li>Advanced information organization techniques can support findability </li></ul><ul><ul><li>Thesauri, Ontologies, Topic Maps and Semantic Networks </li></ul></ul><ul><ul><li>Faceted search and navigation </li></ul></ul>
    25. 25. Access via Browse <ul><li>Browsing is usually the first option for users seeking information or documents </li></ul><ul><ul><li>Desktop and enterprise file systems </li></ul></ul><ul><ul><li>Content management system repositories </li></ul></ul><ul><ul><li>Intranets and Websites </li></ul></ul><ul><li>If users can’t find via browse, then they resort to search </li></ul><ul><li>Some users will go straight to search </li></ul><ul><ul><li>This is partly generational </li></ul></ul>
    26. 26. Effective Browsing <ul><li>Browsing effectiveness is highly dependent on </li></ul><ul><ul><li>navigational structure </li></ul></ul><ul><ul><li>folder labeling </li></ul></ul><ul><ul><li>the location of the content </li></ul></ul><ul><ul><li>In short: depends on how organized the content is… </li></ul></ul><ul><li>Content technologies typically use “virtual folders” to represent different classifications </li></ul><ul><ul><li>These allow for multiple paths to the same content </li></ul></ul><ul><ul><li>In contrast: physical file system forces documents to a single “place” </li></ul></ul><ul><ul><li>Ideally content should be cross-referenced , but not duplicated </li></ul></ul>
    27. 27. Access via Search <ul><li>Search is an application or tool for finding information via search term </li></ul><ul><li>Search is omnipresent, and essential </li></ul><ul><ul><li>But: there is much ignorance about how search engines work </li></ul></ul><ul><ul><li>Most end-users shouldn’t need to know; they just assume “magic” </li></ul></ul><ul><li>Advanced display techniques can blur the line between search and browse </li></ul><ul><li>Search is not a magic bullet or effective panacea for lack of information organization </li></ul><ul><ul><li>Better-organized information will yield more effective search results </li></ul></ul>
    28. 28. How Enterprise Subsystems Work Together Source: CMS Watch
    29. 29. What Is An Effective Search Result? <ul><li>When a user finds what they are seeking </li></ul><ul><ul><li>Or not… </li></ul></ul><ul><ul><li>Seekers may find more than one answer </li></ul></ul><ul><li>Two ways to measure results effectiveness: </li></ul><ul><ul><li>Precision: percentage of all returns in a results set that are relevant to the query </li></ul></ul><ul><ul><li>Recall: percentage of all relevant documents that were actually returned in the results set </li></ul></ul><ul><li>Precision and recall are frequently traded off in actual search implementations </li></ul><ul><ul><li>“Tuning” for one can reduce the other… </li></ul></ul>
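Precision and recall as defined on this slide can be computed directly from two sets of document identifiers. The sketch below assumes relevance judgments are already known for the query; the function name and the sample document IDs are made up for the example.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query.

    precision = relevant documents returned / all documents returned
    recall    = relevant documents returned / all relevant documents
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 documents are truly relevant.
returned = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5", "d6", "d7"]
precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

Returning more documents tends to raise recall while lowering precision, which is the trade-off that tuning has to balance.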
    30. 30. The Myriad of Search Choices <ul><li>Vendors recognise the importance of search </li></ul><ul><ul><li>Beware of how they push enterprise search as the answer to an organization’s need for a single, unified window into everything the organization knows at any point in time </li></ul></ul><ul><li>The ultimate knowledge management machine simply does not exist: the typical enterprise search system does not contain “all” the organization's content </li></ul><ul><li>Limitations on available information include: </li></ul><ul><ul><li>Security considerations </li></ul></ul><ul><ul><li>Inability to integrate specialized content </li></ul></ul><ul><ul><li>Difficulty reconciling structured and unstructured content </li></ul></ul><ul><ul><li>Cost, time, and difficulty required to incorporate diverse content repositories </li></ul></ul>
    31. 31. Current Trends in Search As the search sector changes, distinctions among different “flavors” of search technology, features, and functions become more difficult to make. Source: CMS Watch
    32. 32. Federated Search <ul><li>The modern enterprise is not a monolith </li></ul><ul><li>Multiple information repositories </li></ul><ul><li>Multiple search engines </li></ul><ul><li>Need to search across information domains from a single query interface </li></ul><ul><ul><li>Federated search approaches are designed to accomplish this </li></ul></ul><ul><ul><li>Sometimes called “meta search” </li></ul></ul><ul><li>Two approaches to federated search </li></ul><ul><ul><li>Use the same search technology across information sets, but create separate indexes and merge results </li></ul></ul><ul><ul><li>Use multiple search technologies, passing query over heterogeneous indexes, and synthesizing multiple result sets (more common) </li></ul></ul>
    33. 33. Federated Search Example <ul><li>Has seen success on the public web </li></ul><ul><li>No security issues around public info </li></ul><ul><li>Limited set of file types </li></ul><ul><li>Better metadata can improve results merging </li></ul><ul><li>Example: “Merlot” </li></ul><ul><li>Meta search engine for education resources on the public web ( www.merlot.org ) </li></ul>
    34. 34. Challenges of Federated Search <ul><li>Federated search within the enterprise tends to be much harder </li></ul><ul><li>Multiple indexes mean multiple security systems to resolve </li></ul><ul><li>Different index and query approaches across search systems may skew results </li></ul><ul><li>Often prohibitive performance problems </li></ul><ul><ul><li>Results must be de-duped, transferred, merged, and ranked </li></ul></ul>
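The de-duplicate, merge, and rank step can be illustrated with a small sketch. It assumes each engine returns `(doc_id, score)` pairs and that the same document carries the same ID in every engine. Min-max normalisation is one simple way to make scores comparable, chosen here for illustration rather than taken from the slides. Security trimming and result transfer are left out.

```python
def merge_results(result_sets):
    """Merge per-engine result lists into one de-duplicated ranking.

    Each result set is a list of (doc_id, score) pairs. Raw scores from
    different engines are not comparable, so each set is min-max
    normalised to the range 0..1 before merging.
    """
    best = {}
    for results in result_sets:
        if not results:
            continue
        scores = [score for _, score in results]
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for doc_id, score in results:
            norm = (score - lo) / span if span else 1.0
            # De-duplicate: keep the best normalised score per document.
            best[doc_id] = max(norm, best.get(doc_id, 0.0))
    # Rank by score (descending), breaking ties by document ID.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Two hypothetical engines with different scoring scales.
engine_a = [("doc1", 10.0), ("doc2", 5.0), ("doc3", 0.0)]
engine_b = [("doc2", 8.0), ("doc4", 5.0), ("doc5", 2.0)]
merged = merge_results([engine_a, engine_b])
print([doc_id for doc_id, _ in merged])
```

Even this toy version shows why different index and query approaches can skew results: the final order depends heavily on how each engine's scores are normalised.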
    35. 35. The Case for Text Mining <ul><li>Enterprises looking for better findability face two vexing challenges: </li></ul><ul><ul><li>How to yield metadata from large quantities of information? </li></ul></ul><ul><ul><li>How to turn “search” into more powerful navigation and discovery? </li></ul></ul><ul><li>Text Mining offers one answer </li></ul><ul><ul><li>Text mining is partly a more attractive marketing term for auto-classification – a term that aligns with the concept of “data mining” </li></ul></ul><ul><ul><li>But text mining takes auto-classification one step further through the discovery of more sophisticated patterns in text </li></ul></ul><ul><ul><li>However, there are many different approaches to text mining </li></ul></ul><ul><ul><li>Text mining is sometimes called “text analytics” or “content intelligence” </li></ul></ul>
    36. 36. How Text Mining Works <ul><li>Prior to indexing content, information is “discovered” or derived from a corpus of content </li></ul><ul><ul><li>The goal of text mining is to glean information from data, find patterns, and “separate signal from noise” </li></ul></ul><ul><ul><li>It does this by attempting to extract “entities” and “relationships” from text </li></ul></ul><ul><ul><li>Relevant information is usually derived through the divining of patterns and trends </li></ul></ul><ul><li>Text is then parsed (sometimes adding and removing certain pieces of text for the purposes of an index) </li></ul><ul><li>Typical text mining tasks can include auto-classification, clustering, concept/entity extraction, auto-categorization (production of taxonomies), document summarization, and entity relation modeling (i.e., learning relations between entities, as in an ontology) </li></ul><ul><ul><li>Different text mining tools tend to excel at one or just a handful of these approaches </li></ul></ul>
    37. 37. A Clustering Example Source: cwi.nl & Inxight
    38. 38. A Clustering Example Source: London Natural History Museum & Inxight
    39. 39. IOA Strategy IOA as a Practice IOA as a Project IOA Master For more information: AIIM IOA Certificate Program www.aiim.org/training
    40. 40. Find, Inventory, Analyze Content Metadata Taxonomy Ontologies and Topic Maps Content Modelling Introduction to Access Search Techniques Topics in Findability User Experience Of IOA Parts of IOA For more information: AIIM IOA Practitioner www.aiim.org/training