Taxonomy Assessments - Part Two

Part two of a two-part series on assessing your taxonomy through indexing. Presented by Dr. Jay Ven Eman at the 2012 Data Harmony User Group meeting on February 9, 2012 at the Access Innovations, Inc. offices.

Published in: Education, Technology
Notes
  • Post-processing: “labels” the content item, but also classifies the author.
  • Thanks to Helen Atkins of AACR for this illustration. The real power of this is that the links can go in all directions, so we take advantage of having the user’s attention regardless of how they step into our “web”. CME: Continuing Medical Education.
  • Johnny Carson
  • Transcript

    • 1. Taxonomy Assessments - Part Two. February 9, 2012. Access Innovations, Inc.: Leveraging Your Content Semantically. Jay Ven Eman, Ph.D., CEO. j_ven_eman@accessinn.com, www.accessinn.com, www.dataharmony.com, +1.505.998.0800, Albuquerque, NM. © 2012. Access Innovations, Inc. All rights reserved.
    • 2. Indexing Subject term assignment Permanent meta-data to indexed object Used for retrieval and evaluation Processes • Manual • Publisher • 3rd party aggregators • Authors • Automated methods © 2011. Access Innovations, Inc. All rights reserved.
    • 3. Integration / workflow
      [Diagram labels: APIs, client/server, web services, HTTP-TCP/IP; author submission system; books, conference proceedings, etc.; content repository “A” or intermediate processes; content repository “B”, etc.; Thesaurus Master; M.A.I.; Data Harmony MAIstro; server; classification system; web sites]
    • 4. Select the document collection (CMS). Please select the database and the document directory to load.
    • 5. CMS
    • 6. Sample unstructured document
    • 7. Run the documents through a metadata extraction process to create well-formed, rich XML
      • Automatic (per document template)
      • E.g. Dublin Core metadata
      • Bibliographic citation
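As a rough illustration of the extraction step on slide 7, the sketch below wraps a few already-extracted fields in Dublin Core elements using only Python's standard library. The function name `to_dublin_core`, the `record` wrapper element, and the ISO-style date are assumptions made for the example; the slides do not show which extractor or record layout Data Harmony uses.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields):
    """Build a small XML record from a dict of Dublin Core element names to values.

    A value may be a single string or a list of strings (repeated element).
    """
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            element.text = v
    return record


# Sample values borrowed from the document shown on slide 9;
# the date is written in ISO form here as an assumption.
record = to_dublin_core({
    "title": "Solving the Challenge",
    "creator": "Jay Ven Eman",
    "date": "2011-09-14",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the record is built through an XML library rather than by string concatenation, the output is well-formed by construction, which is the property the slide asks for.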
    • 8. Automatically add the taxonomy terms
      • Entity extraction: people, places, things
      • Conceptual indexing: using the taxonomy
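To make the idea of conceptual indexing against a taxonomy concrete, here is a minimal sketch that maps trigger phrases found in the text to preferred terms. The `RULES` table and the `suggest_terms` function are hypothetical, and this is plain whole-word matching, not the M.A.I. rule syntax, which the slides do not show.

```python
import re

# Hypothetical mini rule base: text trigger -> preferred taxonomy term.
RULES = {
    "indexing": "Indexing",
    "thesaurus": "Thesauri",
    "thesauri": "Thesauri",
    "controlled vocabulary": "Thesauri",
    "classification": "Classification",
}


def suggest_terms(text, rules=RULES):
    """Return the sorted preferred terms whose triggers occur as whole words in text."""
    found = set()
    for trigger, term in rules.items():
        pattern = r"\b" + re.escape(trigger) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return sorted(found)


terms = suggest_terms("The process of indexing a content object begins with a thesaurus.")
print(terms)  # ['Indexing', 'Thesauri']
```

Several triggers can point at one preferred term, which is how synonyms collapse onto a single taxonomy entry. Entity extraction (people, places, things) would need a separate approach and is not covered by this sketch.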
    • 9. Classification Process or Assigned Indexing
      • Unstructured: 09-14-11. “Solving the Challenge”. By Jay Ven Eman. The process of indexing a content object begins with…
      • Structured: <Anchor> <Date>09-14-11</Date> <TI>“Solving the Challenge”</TI> <BLH>By</BLH> <Author> <AU_FN>Jay</AU_FN> <AU_MI></AU_MI> <AU_LN>Ven Eman</AU_LN> </Author> <Body>The process of indexing a content object begins with…</Body> <Subject>Indexing</Subject> <Subject>Thesauri</Subject> <Subject>Standards</Subject> <Subject>Classification</Subject> </Anchor>
      • [Diagram labels: Thesaurus Master; M.A.I.; Data Harmony MAIstro; server; classification system; content repository, e.g. database]
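The step from the structured record to the indexed record on slide 9 amounts to appending `<Subject>` elements. The sketch below shows one way to do that with the standard library, assuming the element names from the slide; `add_subjects` is a hypothetical helper, not a Data Harmony call.

```python
import xml.etree.ElementTree as ET

# Structured record as shown on slide 9, before subject terms are assigned.
RECORD = """<Anchor>
  <Date>09-14-11</Date>
  <TI>“Solving the Challenge”</TI>
  <BLH>By</BLH>
  <Author><AU_FN>Jay</AU_FN><AU_MI></AU_MI><AU_LN>Ven Eman</AU_LN></Author>
  <Body>The process of indexing a content object begins with…</Body>
</Anchor>"""


def add_subjects(xml_text, subjects):
    """Append one <Subject> element per term, skipping terms already present."""
    root = ET.fromstring(xml_text)
    existing = {element.text for element in root.findall("Subject")}
    for term in subjects:
        if term not in existing:
            ET.SubElement(root, "Subject").text = term
            existing.add(term)
    return root


root = add_subjects(
    RECORD,
    ["Indexing", "Thesauri", "Standards", "Classification", "Indexing"],
)
subjects = [element.text for element in root.findall("Subject")]
print(subjects)
```

The duplicate "Indexing" in the input is dropped, so running the indexer twice over the same record does not pile up repeated subject terms.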
    • 10. Indexing Indexing measures • Indexing experts • Subject matter experts (SME) • Hits, misses, & noise • 85% hits In conjunction with taxonomy measures • Over & under used terms • Over & under indexed content © 2011. Access Innovations, Inc. All rights reserved.
    • 11. Indexing & Search Metrics
      • Hit, miss, noise
      • Subjective: relevance; aboutness
      • Statistical: precision; recall; level of effort
    • 12. Hit, Miss, Noise
      • Hit: exactly what a human indexer would use
      • Miss: human indexer would use, but system did not assign
      • Noise: system assigned, but human did not
        • Relevant noise: could have been assigned
        • Irrelevant noise: just plain wrong
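The hit, miss, and noise definitions translate directly into set operations on the human-assigned and system-assigned terms for a document. The sketch below also derives precision and recall from those counts using the usual information-retrieval definitions (precision = hits / system-assigned, recall = hits / human-assigned); that mapping is an assumption, since the slides name the metrics without giving formulas. `score_indexing` and the sample terms are hypothetical.

```python
def score_indexing(human_terms, system_terms):
    """Compare system-assigned terms with a human indexer's terms for one document."""
    human, system = set(human_terms), set(system_terms)
    hits = human & system      # assigned by both
    misses = human - system    # human would use, system did not assign
    noise = system - human     # system assigned, human did not
    return {
        "hits": sorted(hits),
        "misses": sorted(misses),
        "noise": sorted(noise),
        "precision": len(hits) / len(system) if system else 0.0,
        "recall": len(hits) / len(human) if human else 0.0,
    }


result = score_indexing(
    human_terms=["Indexing", "Thesauri", "Standards", "Classification"],
    system_terms=["Indexing", "Thesauri", "Standards", "Taxonomies"],
)
print(result)
```

Here three of four terms are hits, "Classification" is a miss, and "Taxonomies" is noise, giving precision and recall of 0.75 each. Whether a noise term is relevant or irrelevant still needs a human judgment; the code cannot decide that.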
    • 13. Subjective Relevance • Reflects how akin it is to the users request “Aboutness” • Reflects the topical match between the document content and the term • How well the topic describes what the document is about Varies with level of conceptual terms vs. factual terms in the thesaurus © 2011. Access Innovations, Inc. All rights reserved.
    • 14. Indexing All content types & sources • Inventory control • Everything in, everything out Document types • Articles • Proceedings • Corporate © 2011. Access Innovations, Inc. All rights reserved.
    • 15. Link to Community Resources (Source: Helen Atkins, AACR)
      [Diagram nodes: CME Activity on Topic A; Upcoming Conference on Topic A; Other Journal Articles on Topic A; Job Posting for Expert on Topic A; Journal Article on Topic A; Grant Available for Researchers Working on Topic A; Podcast Interview with Researcher Working on Topic A; Author Networks; Social Networking; SME – Topic A]
    • 16. Indexing with Data Harmony® M.A.I.™
      • Rule base development
        • 80/20 rule
        • Indexing objectives
      • GUI
      • Time-to-market
        • Level of effort to build
        • Level of effort to maintain
        • Less than all other alternatives when indexing for high precision & recall
    • 17. Updating Rule Base
      • Automatic for matching rules when using Data Harmony MAIstro™
      • 80/20 rule
      • Re-index when 5% to 10% of the taxonomy has changed. Arbitrary ranges:
        • Monthly with small databases (5k to 20k)
        • Quarterly with medium (20k to 1 million)
        • Annual with large (greater than 1 million)
      • Depends on search software, too
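The re-indexing guidance on slide 17 can be written down as two small checks. This is a sketch of the slide's stated (and self-described arbitrary) ranges only; the function names are hypothetical, the 5% default is the lower end of the slide's 5% to 10% range, and the handling of the boundaries (20k counted as small, 1 million as medium, anything under 5k as small) is an assumption because the slide leaves them open.

```python
def reindex_interval(record_count):
    """Suggested re-indexing cadence by database size, per the slide's ranges."""
    if record_count <= 20_000:       # small: 5k to 20k (smaller treated the same)
        return "monthly"
    if record_count <= 1_000_000:    # medium: 20k to 1 million
        return "quarterly"
    return "annual"                  # large: greater than 1 million


def should_reindex(changed_terms, total_terms, threshold=0.05):
    """True once the share of changed taxonomy terms reaches the threshold."""
    if total_terms == 0:
        return False
    return changed_terms / total_terms >= threshold


print(reindex_interval(10_000), reindex_interval(500_000), reindex_interval(2_000_000))
print(should_reindex(changed_terms=60, total_terms=1_000))
```

As the slide notes, the right cadence also depends on the search software, so these values are starting points rather than rules.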
    • 18. NAMES
    • 19. What’s in a name? Juliet: "What's in a name? That which we call a rose / By any other name would smell as sweet." Romeo and Juliet (II, ii, 1-2)
    • 20. © 2012. Access Innovations, Inc. All rights reserved.
    • 21. Magnitude of the Problem: Facebook, 700 million users projected for 2011 (Open-First). 700 million names. How will your boss, peers, anyone ever find you?
    • 22. What’s in a name? My name Jay Ven Eman Ven Eman, Jay <First_Name>Jay</First_Name> <Last_Name>Ven Eman</Last_Name> Name variants  Aliases Jay Von Eman William Henry McCarty Jay Van Eman Henry Antrim Jay van Eman William H. Bonney Jay ven Eman Billy the Kid Jay Veneman  National & Cultural Jay Venema Conventions © 2011. Access Innovations, Inc. All rights reserved.
    • 23. Names Computationally & editorially intense Author submissions Membership records & the like Industry initiatives – ORCID, VIVO Subject term disambiguation Inventory control basics apply here, too Difficulty level is high Constance maintenance needed © 2011. Access Innovations, Inc. All rights reserved.
    • 24. Taxonomy Assessments - Part Two. February 9, 2012. Thank you! Questions? Access Innovations, Inc.: Leveraging Your Content Semantically. Jay Ven Eman, Ph.D., CEO. j_ven_eman@accessinn.com, www.accessinn.com, www.dataharmony.com, +1.505.998.0800, Albuquerque, NM.
