Tales From the Field: Implementing Information Technology

Presented by Marjorie Hlava, president of Access Innovations, Inc., at the American Society for Information Science and Technology's 23rd Annual SIG/CR Classification Research Workshop on October 26, 2012.

Transcript of "Tales From the Field: Implementing Information Technology"

  1. Tales from the Field: Implementing Information Theory. SIG CR - 2012. Marjorie Hlava, President, Access Innovations, Inc. www.accessinn.com
  2. Implementing Information Theory: The case of the missing abstracts; Russian information; US PTO; Getty adventures; Vatican bibles; Past basics; Thoughts on directions
  3. The Bleeding Edge: Figure out the client needs; Figure out the specifications; Get approval on the specifications; Figure out how to deliver the data following the specs; Quality control the data delivery; … But then life happens
  4. The Case of Missing Abstracts: Tests showed that just searching the indexing did not provide the full answers users wanted; Searching the titles and abstracts as well would improve search; Enough space could be found on servers if the data was moved in-house from Dialog and Orbit; New platform going into production; New format – Messenger; Specifications written, test file approved
  5. Specifications: Need 99.998% accuracy for user acceptance; Left tagged ASCII; Office in Mexico City – Access de Mexico; Triple key - double proof; Two sets of volumes; 792,000 abstract tapes destroyed; 1970 – 1982 data
  6. Access de Mexico: 7:17 AM; Shift change; September 19, 1985; 8.7 earthquake
  7. CAS to Philippines: Limo from the airport with the remaining volumes; Typhoon Dot; October 12, 1985; Clark Air Force Base evacuated; Power out for weeks
  8. Jamaica: Hurricane Kate; November 1985; 4 inches of water in the computer room; No power on the island
  9. Beijing, China: November 1985; NOTHING HAPPENED; Finished; On time; Under budget; At promised accuracy level; Client said, “When I read your contract I thought you had an unusual level of detail on the Acts of God clauses… But I didn’t expect you to use every one of them!”
  10. Russian Information
  11. Implementing Information Theory: VINITI; Maxwell; Information map; PDP-8’s; Microfilm machines – no batteries; Glasnost – open but no trust
  12. Payments in cash in our shoes
  13. Puzzles, Keys, and Digitization: Photocomposition keys; Science typographers; Puzzles – SGML; Encyclopaedia Britannica; Marquis Who’s Who; Designing the Chicago Research and Trading “desks”
  14. US PTO Conversions: Scan at 300 dpi; OCR to 97%; 5,400,000 patents; Create the machines; Testy QC algorithms; Display image; Search dirty OCR; Spell right once in 30 pages = findable
  15. Perugia Bible: 12” VideoDisc
  16. British Library Map Collection: 225,000 maps pre-1850; From printed catalog to digital catalog
  17. Getty AAT to AATA
  18. Success - Failure - Future. Successes: Chemical Abstracts; USPTO; Getty AATA; British Map Collection. Failures: Access Russia; Ipsoa Video Disk; MAI Mail
  19. All projects use classification: To organize the job; To organize the information; To allow the finding of the items once digital; Apply term tags (thesaurus and controlled); Apply notation (not necessarily classification; just reflects the content); The classification is NEVER done (needs to reflect the ever-changing data)
  20. Theoretical Underpinnings. Outlines of Knowledge: Thomas Aquinas; John Knox (Bacon); Morton Taube - Encyclopaedia Britannica. Organization of Knowledge: Cutter – 1896; COSATI – 1964; Alvin Weinberg; Cranfield Institute papers; Cleverdon, Aitchison, Vickery
  21. Theory of knowledge… began early. Plato et al. - BC: Knowledge of reality is philosophy. Realism: St. Augustine 354 - 430 AD; St. Thomas Aquinas 1225 - 1274 AD; Characteristics common in particulars; Not the same object without them. © 2010. Access Innovations, Inc. All Rights Reserved.
  22. Theory of knowledge: William of Occam (or Ockham), c. 1288 – c. 1348; Nominalism - Universals are represented by words; Conceptualism - Universals are general concepts, mind dependent, formed by extraction from particular experiences
  23. Theory of knowledge: The Knower (Subject); The Known (Object); Knowing (a subjective process); An act, a process, or a concept; Facts or perception?; Yes or no answers
  24. The basis of knowledge: René Descartes 1596 - 1650; Separate what is known - philosophy; From new knowledge - science; Conditions of reason, suspension of belief; Je pense donc je suis; Cogito, ergo sum (from Socrates); I think, therefore I am; Cartesian
  25. Conditions for knowledge: John Locke 1632 - 1704; “A sailor needs to know the length of a line he has available before he goes out to sound the ocean with it.” - J. Locke; Acquire knowledge of reality; Establish the conditions needed to acquire knowledge; Establish possible extent and limitations of knowledge
  26. John Locke 1632 - 1704: Classification of kinds of knowledge; Some Thoughts Concerning Education
  27. Outlines of knowledge. Carl Linnaeus 1707 – 1778: Placed plants in categories; Systematized the three kingdoms of nature; Replaced “natural systems” classification. Immanuel Kant 1724 - 1804: A posteriori and a priori judgments; A posteriori and a priori concepts; Outline of knowledge. The nature of this distinction has been disputed by various philosophers; however, the terms may be roughly defined as follows: A priori knowledge is knowledge that is known independently of experience (that is, it is non-empirical, or arrived at beforehand, usually by reason). A posteriori knowledge is knowledge that is known by experience (that is, it is empirical, or arrived at afterward).
  28. Epistemology: James Frederick Ferrier 1808 - 1864; Analyzing the nature of knowledge; How it relates to connected notions (truth, belief, justification); The means of production of knowledge; Skepticism about different knowledge claims; http://en.wikipedia.org/wiki/Epistemology
  29. Personification of knowledge (Greek Επιστημη, Episteme) in Celsus Library in Ephesus, Turkey. Epistemology (from Greek ἐπιστήμη – epistēmē, "knowledge, science" + λόγος, "logos") or theory of knowledge is the branch of philosophy concerned with the nature and scope (limitations) of knowledge. It addresses the questions: What is knowledge? How is knowledge acquired? How do we know what we know?
  30. Philosophy of knowledge divides 20th century thought: Memory; Perception and memory; Religion; Linguistic analysis; Classification of knowledge; Vocabulary control; Linguistic analysis
  31. Rise of Classification: Charles Ammi Cutter 1837 - 1903 (Cutter Classification System); Melville Dewey 1851 - 1931 (Dewey Decimal Classification); Vladimir Lenin 1870 – 1924 (Rubricon - Russia; Rubricator); S. R. Ranganathan – India, 1892 – 1972 (Faceted Classification System; Colonicity)
  32. Charles Ammi Cutter. Harvard College, index catalog: using cards instead of published volumes; an author index; and a “classed catalog” or subject index. Expansive Classification System (Cutter): seven levels of classification; each with increasing specificity; use lower levels and still be specific
  33. Thesauri: Philo of Byblos (Herennius Philon), c. 64 - 141 AD; Sanskrit, the Amarakosha, 4th century verse; Roget’s Thesaurus, 1805, by Peter Mark Roget, and published in 1852; COSATI - 1964; TEST - 1967
  34. Points of knowledge. Single point of knowledge: Eve and the apple; First organism; All science. Examples: Linnaean system; Rubricator; Locke system; Dewey
  35. Points of knowledge. Multiple points of origin: Several fields come together; Top terms; Should they be captured separately or together?; Facets or different views?; Anarchy in the universe. Examples: Physical biochemistry; NICEM; Engineering; Cutter, COSATI, Ranganathan
  36. Information access is changing: Teletype; Fax; Online; CD-ROM; Downloading; Internet
  37. The players are changing: Standalone publishers; Aggregators; Serials and book vendors; Hosting services; Cloud; Disaggregation; Everyone is an author; Loss of quality, accuracy, review
  38. The formats are changing: Handwritten; Gutenberg; Linotype; Web presses; Photocomposition; Digital layout; Desktop publishing; Web publishing
  39. Search is (finally) changing. Approaches: Online search; Boolean search; Cached search; String search; Bayesian; Co-occurrence; Neural nets; Machine learning; Faceted (fielded); Rules systems. Systems: Stairs; Elhill; Orbit; Verity; Fast; Lucene; Muse Global; Perfect Search
  40. Tagging is still debated: Permuted indexes (Chem abs; Bio abs; Portals); Permuterm indexes (IFI Predicasts; Classification systems LC; Thesauri); Inverted files; Triples
  41. Horizons are more complicated: Field formatted data; Relational and SQL databases; Object oriented systems; Semantic web; Linked data
  42. Formats just keep being added: Photocomposition markup; SGML; XML; JSON calls. Storage keeps changing: Big iron; Server farms; Cloud farms
  43. Telecommunications tries to keep up: Party lines; Direct connect lines; Trunk lines; Fiber optic; Cell towers; Wireless
  44. Media: Punch cards; 9 track tapes; Mountain tapes; Removable drives; Diskettes (8”, 5.25”, 3.5”); Flash drives; Chips
  45. Indexes: Pre-coordinate (Back of the book; Subject headings); Post-coordinate; Bayesian; Co-occurrence; Neural nets; Machine learning; Rules systems
  46. Now: Changing the way we learn; Changing the way we find things; Easier to manipulate what we know (http://www.youtube.com/watch?v=B8ofWFx525s); Comprehensive information / invasive (http://www.youtube.com/watch?v=RNJl9EEcsoE); People now know what search is.
  47. Future: Information any place, any time; A great big mess - unless we corral it (Tag it; Clean it; Weed it; Curate it); Everyone is creating content
  48. The information explosion has just begun
  49. We should all be part of it. Questions? Marjorie M.K. Hlava, President, Access Innovations, Inc. Mhlava@accessinn.com 505-998-0800
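
Slide 5 asks for 99.998% accuracy through "triple key - double proof". If that figure is measured per character, it allows at most 2 errors per 100,000 characters. The talk does not say how the three keyings were reconciled. The sketch below assumes one plausible approach: a per-character majority vote across three independent keyings, with positions that have no majority flagged for a human proofreader. The function names are hypothetical, and the sketch assumes the three keyings have equal length (substitution errors only).

```python
from collections import Counter


def reconcile_keyings(a, b, c):
    """Majority-vote three independent keyings of the same text.

    Returns (merged_text, disputed_positions). Positions where all three
    keyings disagree are marked "?" and listed for manual proofing.
    Assumption: equal-length inputs, i.e. substitution errors only.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("keyings must have equal length for this sketch")
    merged = []
    disputed = []
    for i, chars in enumerate(zip(a, b, c)):
        char, votes = Counter(chars).most_common(1)[0]
        if votes >= 2:
            merged.append(char)      # at least two keyers agree
        else:
            merged.append("?")       # no majority: send to a proofreader
            disputed.append(i)
    return "".join(merged), disputed


def accuracy(candidate, reference):
    """Fraction of positions where candidate matches the reference text."""
    matches = sum(x == y for x, y in zip(candidate, reference))
    return matches / len(reference)


merged, disputed = reconcile_keyings("benzene", "benzine", "benzene")
print(merged, disputed)
```

A single keyer's slip is outvoted by the other two, so only positions where the keyers disagree three ways need a human decision.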

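Slide 14 pairs "OCR to 97%" with "Spell right once in 30 pages = findable": a term only has to be recognized correctly once somewhere in a patent for a search to retrieve it. The sketch below works through that reasoning under two assumptions that are not in the talk: that 97% is a per-character accuracy, and that character errors are independent. The function names are hypothetical.

```python
def word_accuracy(char_accuracy, word_length):
    """Chance that every character of one occurrence is recognized correctly."""
    return char_accuracy ** word_length


def findable_probability(char_accuracy, word_length, occurrences):
    """Chance that at least one of `occurrences` copies of the word is clean."""
    miss = 1.0 - word_accuracy(char_accuracy, word_length)
    return 1.0 - miss ** occurrences


# Assumed example: an eight-letter term at 97% per-character accuracy.
for n in (1, 5, 30):
    p = findable_probability(0.97, 8, n)
    print(f"{n:>2} occurrences: {p:.4f}")
```

Under these assumptions an eight-letter term that appears once survives OCR about 78% of the time, while five occurrences push the chance of at least one clean copy above 99.9%.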