I Don’t Have Time for Metadata!

Presented by Bob Kasenchak of Access Innovations, Inc. at the 2014 Special Libraries Association (SLA) annual meeting in Vancouver, British Columbia on June 7, 2014.


  1. Bob Kasenchak, Project Coordinator, Access Innovations, bob_kasenchak@accessinn.com, @taxobob
  2. DISCLAIMER
  3. OUTLINE
     • Data
       • Structured Data
       • Unstructured Data
     • Metadata
       • Subject Metadata
       • Entity (author, institution) Metadata
       • Document Type Metadata
     • Automating Metadata
       • Heuristic/Statistical/Inferential
       • Rule-based
  4. CASE STUDIES
  5. STRUCTURED VS. UNSTRUCTURED DATA
     Present different problems – and possible solutions – for automatically adding metadata
  6. STRUCTURED VS. UNSTRUCTURED DATA
     Association, in view of abuses and lack of consistency in published reports, has asserted that the all-inclusive income statement, containing all income items recognized as determinants of net income, is the answer to these questions.2 The Securities and Exchange Commission has also strongly favored this solution.3 On the
     1 Committee on Accounting Procedure, American Institute of Accountants, "Income and Earned Surplus," Accounting Research Bulletin No. 32 (December, 1947).
     2 (1) "A Tentative Statement of Accounting Principles Affecting Corporate Reports," THE ACCOUNTING REVIEW, June, 1936, pp. 187-191; (2) Accounting
  7. STRUCTURED VS. UNSTRUCTURED DATA
     <volume>325</volume> <issue>5945</issue> <fpage seq="c">1206</fpage> <lpage>1206</lpage>
     <history><date date-type="received"><day>26</day><month>02</month><year>2009</year></date><date date-type="accepted"><day>11</day><month>08</month><year>2009</year></date></history>
     <permissions> <copyright-statement>Copyright © 2009</copyright-statement> <copyright-year>2009</copyright-year> <copyright-holder>Your name here</copyright-holder> </permissions>
     <abstract> <p>Our extended ontogenetic growth model is a theoretical model based on conservation of energy and general biological mechanisms underlying ontogenetic growth. We do not believe that the comments of Makarieva <italic>et al</italic>. and Sousa <italic>et al</italic>. expose substantive problems with our model. Nevertheless, they raise interesting, still unresolved questions and point to philosophical differences about the role of theory and of simple, general models as opposed to complicated, specific models.</p> </abstract>
  8. STRUCTURED VS. UNSTRUCTURED DATA
     • Just extracting basic information
       • Author
       • Institution
       • Title
       • Document type
       • Accession number(s)
     …can be a challenge. However…
  9. STRUCTURED VS. UNSTRUCTURED DATA
     • Predictability
     • Positionality
     (Labels on the annotated page image: Journal name/Issue/Vol./etc.; Article Title; Copyright info; Author info; Abstract)
  10. UNSTRUCTURED DATA => STRUCTURED DATA!
     <journal>Transactions on Vehicular Technology</journal>
     <article-title>Relationship of Average Transmitted and Received Energies in Adaptive Transmission</article-title>
     <authors><author-surname>Kotelba</author-surname><author-firstname>Adrian</author-firstname><affiliation>Member, IEEE</affiliation></authors>
     <copyright-info><copyright-date>2009</copyright-date></copyright-info>
     <abstract><p>This paper studies the…</p></abstract>
     NOTE: Some cleanup may be required
  11. STRUCTURED VS. UNSTRUCTURED DATA
     • Basic information already tagged, labeled, and easy to extract
       • Author info
       • Title
       • Journal/Volume/Issue etc.
     • We can add semantic (or subject) metadata
     • Targeting only those parts of the text we require
       • Title
       • Abstract
       • Full text body
       • Exclude references, etc.
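As a rough illustration of targeting only the parts of a tagged record that should be indexed, here is a minimal Python sketch. The record, the `<article>` wrapper, the `<references>` element, and the function name `text_for_indexing` are all assumptions made for this example; only `article-title` and `abstract` echo tag names shown on the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the wrapper, the references element, and all text
# are invented for illustration.
SAMPLE = """<article>
  <article-title>A Sample Article Title</article-title>
  <abstract><p>Placeholder abstract about <italic>adaptive</italic> transmission.</p></abstract>
  <references><ref>Some cited work that should not be indexed.</ref></references>
</article>"""


def text_for_indexing(xml_string, wanted=("article-title", "abstract")):
    """Return only the text of the wanted elements, one element per line."""
    root = ET.fromstring(xml_string)
    parts = []
    for tag in wanted:
        for elem in root.iter(tag):
            # itertext() also collects text inside nested tags such as <p> or <italic>
            raw = "".join(elem.itertext())
            parts.append(" ".join(raw.split()))  # collapse runs of whitespace
    return "\n".join(parts)


print(text_for_indexing(SAMPLE))
```

Because references are simply never requested, cited titles cannot leak into the subject terms assigned to the article.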
  12. SEMANTIC METADATA
     • Uncontrolled
       • Automatic keyword extraction
       • Crowdsourced/folksonomic tags
     • Controlled – from a Thesaurus (or Taxonomy…)
       • Inferential (Heuristic; Statistical)
       • Rule-based
  13. SEMANTIC METADATA: HOW?
     • Controlled – from a Thesaurus (or Taxonomy…)
       • Inferential (Heuristic; Statistical)
       • Rule-based
     • Manual tagging
     • Automatic tagging
  14. SEMANTIC METADATA: MANUAL ENTRY
  15. SEMANTIC METADATA: MANUAL ENTRY
     A Thought Experiment
     • Let’s say a manual indexer can index 10 records/hour
     • Let’s say the manual indexers are perfectly consistent (they’re not)
     • Let’s say your manual indexers are paid $10/hour (good luck with that)
     If you have 10,000 articles/pieces of content: it would take a manual indexer 1,000 hours (25 weeks at 40 hours/week) and cost $10,000
     If you have 100,000 articles: it would take a manual indexer 10,000 hours (250 weeks, or almost 5 years) and cost $100,000
     If you have 1,000,000 articles: it would take a manual indexer 100,000 hours (~48 years) and cost $1,000,000
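The arithmetic in the thought experiment can be checked with a few lines of Python. The 10 records/hour and $10/hour figures come from the slide; the 40-hour week and 52-week year are assumptions (they reproduce the slide's "25 weeks" and "~48 years"), and the function name is made up for this sketch.

```python
def manual_indexing_cost(n_records, records_per_hour=10, hourly_rate=10.0,
                         hours_per_week=40, weeks_per_year=52):
    """Estimate the effort and cost of indexing n_records by hand."""
    hours = n_records / records_per_hour
    weeks = hours / hours_per_week          # assumes a 40-hour work week
    return {
        "hours": hours,
        "weeks": weeks,
        "years": weeks / weeks_per_year,    # assumes 52 working weeks per year
        "cost": hours * hourly_rate,
    }


for n in (10_000, 100_000, 1_000_000):
    est = manual_indexing_cost(n)
    print(f"{n:>9,} records: {est['hours']:,.0f} hours, "
          f"{est['weeks']:,.0f} weeks, {est['years']:.1f} years, ${est['cost']:,.0f}")
```

Changing `records_per_hour` or `hourly_rate` to more realistic values only makes the case for automation stronger or weaker by a constant factor; the linear growth with collection size stays the same.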
  16. SEMANTIC METADATA: AUTOMATED
  17. SEMANTIC METADATA: WHY?
     • Disambiguate the ambiguous
     • Specify most specific topics
     • Improve information retrieval
       • Search
       • Browse
     • Enable advanced analytics
  18. SEMANTIC METADATA: DISAMBIGUATION
     “Mercury”
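One way a rule-based approach might disambiguate a word like "Mercury" is to look for cue words elsewhere in the text. The sketch below is only an illustration: the three senses, the cue words, and the function name `disambiguate` are invented here and are not taken from the presentation or from any particular product.

```python
import re

# Hypothetical rules: each candidate term lists context words that support it.
RULES = {
    "Mercury (planet)": {"planet", "orbit", "solar", "venus"},
    "Mercury (element)": {"toxic", "thermometer", "metal", "poisoning"},
    "Mercury (mythology)": {"god", "roman", "myth", "messenger"},
}


def disambiguate(text, ambiguous="mercury", rules=RULES):
    """Return the candidate terms whose cue words appear alongside the ambiguous word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if ambiguous not in words:
        return []  # the ambiguous word is absent, so no rule fires
    return [term for term, cues in rules.items() if words & cues]


print(disambiguate("Mercury has the shortest orbit of any planet."))
print(disambiguate("Mercury poisoning from a broken thermometer."))
```

Real rule bases are usually richer (proximity windows, negative conditions, weighting), but the principle is the same: the surrounding context, not the string itself, selects the term.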
  19. SEMANTIC METADATA: SPECIFICATION
     Beyond exact string matches: Synonymy
     • Fiber optic gyroscopes
     • Fiber optic gyros
     • Fiber-optic gyroscopes
     • Fiber-optic gyros
     • Fibre optic gyroscopes
     • Fibre optic gyros
     • Fibre-optic gyroscopes
     • Fibre-optic gyros
     • Fiberoptic gyroscopes
     • Fiberoptic gyros
     • Optical fiber gyroscopes
     • Optical fiber gyros
     • Optical fibre gyroscopes
     • Optical fibre gyros
     • FOGs
     • FOG’s
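The synonym list above can be collapsed onto a single tag with a small lookup. In this sketch the variants are the ones listed on the slide, but treating "Fiber optic gyroscopes" as the preferred term is an assumption, as are the function name and the choice to match the acronyms case-sensitively.

```python
import re

PREFERRED = "Fiber optic gyroscopes"  # assumed preferred term

# Spelled-out variants from the slide; matched case-insensitively.
PHRASES = [
    "Fiber optic gyroscopes", "Fiber optic gyros",
    "Fiber-optic gyroscopes", "Fiber-optic gyros",
    "Fibre optic gyroscopes", "Fibre optic gyros",
    "Fibre-optic gyroscopes", "Fibre-optic gyros",
    "Fiberoptic gyroscopes", "Fiberoptic gyros",
    "Optical fiber gyroscopes", "Optical fiber gyros",
    "Optical fibre gyroscopes", "Optical fibre gyros",
]
# Acronyms are matched case-sensitively so that ordinary "fogs" is not tagged.
ACRONYMS = ["FOGs", "FOG's", "FOG’s"]


def _alternation(variants):
    return r"\b(?:%s)\b" % "|".join(re.escape(v) for v in variants)


PHRASE_RE = re.compile(_alternation(PHRASES), re.IGNORECASE)
ACRONYM_RE = re.compile(_alternation(ACRONYMS))


def tag_synonyms(text):
    """Return the preferred term if any known variant occurs in the text."""
    if PHRASE_RE.search(text) or ACRONYM_RE.search(text):
        return [PREFERRED]
    return []


print(tag_synonyms("We tested fibre-optic gyros in flight."))
print(tag_synonyms("Morning fogs delayed the flight."))
```

Whatever spelling an author uses, the record ends up with the same tag, which is what makes retrieval consistent.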
  20. SEMANTIC METADATA: SPECIFICATION
     Beyond exact string matches: Context. Matters.
     • Indexing to most specific term
       - Microscopes
         - Electron microscopes
           - Scanning electron microscopes
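Indexing to the most specific term can be sketched as: find every thesaurus term that matches, then drop any term that is a broader term of another match. The three-term hierarchy below is the one on the slide; the naive substring matching and the names `BROADER`, `ancestors`, and `most_specific` are simplifying assumptions for illustration.

```python
# Each term maps to its broader term (None at the top of the branch).
BROADER = {
    "Scanning electron microscopes": "Electron microscopes",
    "Electron microscopes": "Microscopes",
    "Microscopes": None,
}


def ancestors(term):
    """Collect every broader term above the given term."""
    found = set()
    parent = BROADER.get(term)
    while parent is not None:
        found.add(parent)
        parent = BROADER.get(parent)
    return found


def most_specific(text):
    """Return matched terms, omitting any that are broader than another match."""
    lowered = text.lower()
    # Naive substring match; a production indexer would use rules or statistics.
    matched = [term for term in BROADER if term.lower() in lowered]
    covered = set()
    for term in matched:
        covered |= ancestors(term)
    return [term for term in matched if term not in covered]


print(most_specific("Images were taken with scanning electron microscopes."))
print(most_specific("Two microscopes were compared."))
```

A plain string match would tag the first sentence with all three terms; filtering out the broader ones leaves only the term that actually describes the content.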
  21. SEMANTIC METADATA: WHY?
     Improving information retrieval (Search, Browse)
     SEARCH ≠ BROWSE
  22. SEMANTIC METADATA: WHY?
     Improving information retrieval: Search
     • Allows user to search by tags
     • Ensures consistent and reliable retrieval
     • Speeds electronic search
  23. SEMANTIC METADATA: WHY?
     Improving information retrieval: Search
     (Screenshot callout: Subject Metadata)
  24. SEMANTIC METADATA: WHY?
     Improving information retrieval: Search
     (Screenshot callouts: Metadata-based Search; Results Based on metadata)
  25. SEMANTIC METADATA: WHY?
     Improving information retrieval: Browse
     (Screenshot callouts: Taxonomy browse; Results Based on metadata)
  26. SEMANTIC METADATA: WHY?
     Improving information retrieval: Browse
     (Screenshot callouts: Taxonomy browse; Additional Search filters)
  27. SEMANTIC METADATA: WHY?
     Improving information retrieval: Analytics
     • Combine subject metadata with metadata about
       • Authors
       • Institutions
       • Publications (Journals, Magazines, etc.)
       • Publication Types
     …to create detailed informatics about your data, users, authors, and whatever else is relevant or useful
  28. SEMANTIC METADATA: WHY?
     Improving information retrieval: Analytics
     (Screenshot callouts: Taxonomy term; Narrower terms; Broader Term(s); Authors who publish on this topic)
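A view such as "authors who publish on this topic" falls out of combining subject tags with author metadata. The following Python sketch assumes each record is a dictionary with `authors` and `subjects` lists; the records, the author names, and the function name are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical tagged records.
RECORDS = [
    {"authors": ["A. Author", "B. Writer"], "subjects": ["Electron microscopes"]},
    {"authors": ["A. Author"], "subjects": ["Electron microscopes", "Fiber optic gyroscopes"]},
    {"authors": ["C. Scholar"], "subjects": ["Fiber optic gyroscopes"]},
]


def authors_by_subject(records):
    """Count, for every subject term, how many records each author has tagged with it."""
    table = defaultdict(Counter)
    for record in records:
        for subject in record["subjects"]:
            table[subject].update(record["authors"])
    return table


table = authors_by_subject(RECORDS)
for subject, counts in table.items():
    print(subject, counts.most_common())
```

The same grouping works for institutions, publications, or publication types by swapping the field that is counted.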
  29. I DON’T HAVE TIME FOR METADATA!
     Since metadata allows you to do things you already have, want, or need to do:
     It’s always time for metadata.
  30. Thank you!
     Bob Kasenchak, Project Coordinator, Access Innovations, bob_kasenchak@accessinn.com, @taxobob
