Tales from the Field:
    Implementing
 Information Theory
   SIG CR - 2012


Marjorie Hlava, President
Access Innovations, Inc.
      www.accessinn.com
Implementing Information Theory

   The case of the missing abstracts
   Russian information
   US PTO
   Getty adventures
   Vatican bibles
   Past basics
   Thoughts on directions
The Bleeding Edge
   Figure out the client's needs
   Figure out the specifications
   Get approval on the specifications
   Figure out how to deliver the data
    following the specs
   Quality control the data delivery

   …. But then life happens
The Case of Missing Abstracts
 Tests showed that searching the indexing alone
   did not give users the full answers they
   wanted. Searching the titles and abstracts as
   well would improve search.
 Enough space could be found on servers if the
   data was moved in-house from Dialog and
   Orbit.
 New platform going into production
 New format – Messenger
 Specifications written, test file approved
Specifications


 Need 99.998% accuracy for user acceptance
 Left tagged ASCII
 Office in Mexico City – Access de Mexico
 Triple key - double proof
 Two sets of volumes
 792,000 abstract tapes destroyed
 1970 – 1982 data
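The accuracy target and the triple-key approach can be illustrated with a short sketch. It is not the process actually used on the project: the names `max_errors` and `reconcile` are hypothetical, majority voting between three independently keyed copies is an assumed reading of "triple key", and the copies are assumed to have equal length (real keying would also need alignment for dropped or extra characters).

```python
from collections import Counter


def max_errors(total_chars, accuracy=0.99998):
    """Largest number of wrong characters allowed at the given accuracy."""
    # round() removes floating-point noise before truncating to an integer
    return int(round(total_chars * (1 - accuracy), 6))


def reconcile(a, b, c):
    """Majority-vote three keyed copies of the same text (assumed equal length).

    Returns the merged text and the positions where all three copies
    disagree, which would have to go to a human proofreader.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("keyed copies must have the same length")
    merged, flagged = [], []
    for i, chars in enumerate(zip(a, b, c)):
        char, votes = Counter(chars).most_common(1)[0]
        if votes < 2:
            flagged.append(i)  # no majority: needs proofreading
        merged.append(char)
    return "".join(merged), flagged


# 99.998% accuracy allows 2 wrong characters per 100,000 keyed
print(max_errors(100_000))
print(reconcile("benzene", "bensene", "benzenc"))
```

At 99.998%, the allowance is two wrong characters in every 100,000, which is why a single keying pass would not be enough.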
Access de Mexico




7:17 AM shift change
September 19, 1985
8.7 earthquake
CAS to Philippines



        Limo from the airport with the remaining volumes



Typhoon Dot
October 12, 1985
Clark Air Force base evacuated
Power out for weeks
Jamaica


 Hurricane Kate November 1985
 4 inches of water in the
 computer room
 No power on the island
Beijing China November 1985
   NOTHING HAPPENED
   Finished
   On time
   Under budget
   At promised accuracy level
   Client said: “When I read your contract I
    thought you had an unusual level of detail
    on the Acts of God clauses …
    but I didn’t expect you to use every one of
    them!”
Russian Information
Implementing Information Theory
 VINITI, Maxwell
 Information map
 PDP-8’s
 Microfilm machines – no batteries
 Glasnost – open but no trust
Payments in cash in our shoes
Puzzles, Keys, and Digitization

 Photocomposition keys
   Science typographers
   Puzzles – SGML
 Encyclopaedia Britannica
 Marquis Who’s Who
 Designing the Chicago Research and Trading
   “desks”
US PTO Conversions
   Scan at 300 dpi
   OCR to 97%
   5,400,000 patents
   Create the machines
   Testy
   QC algorithms
   Display image
   Search dirty OCR
   Spell right once in 30 pages = findable
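The reasoning behind "spell right once = findable" can be sketched as a probability estimate. This is an illustration under stated assumptions, not the QC algorithm used on the project: the 97% figure is read as per-character accuracy, character errors are assumed independent, and the function names are hypothetical.

```python
def p_word_correct(word_len, char_accuracy=0.97):
    """Chance that every character of one occurrence of a word is read correctly."""
    return char_accuracy ** word_len


def p_findable(word_len, occurrences, char_accuracy=0.97):
    """Chance that at least one of several occurrences is spelled correctly."""
    p_miss = 1.0 - p_word_correct(word_len, char_accuracy)
    return 1.0 - p_miss ** occurrences


# A 10-letter term: a single occurrence versus five occurrences in a patent
print(round(p_word_correct(10), 3))
print(round(p_findable(10, 5), 3))
```

Under these assumptions a 10-letter term survives OCR intact about 74% of the time in any one place, but if it occurs five times in a long patent the chance that at least one copy is clean is above 99%, so an exact-match search can still find the document.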
Perugia Bible 12” VideoDisc
British Library Map Collection
225,000 maps pre-1850
From printed catalog to
digital catalog
Getty AAT to AATA
Success - Failure - Future
   Successes
    •   Chemical Abstracts
    •   USPTO
    •   Getty AATA
    •   British Map Collection
   Failures
    •   Access Russia
    •   Ipsoa Video Disk
    •   MAI Mail
All projects use classification
   To organize the job
   To organize the information
   To allow the finding of the items once digital
   Apply term tags
    •   thesaurus and controlled vocabulary
   Apply notation
    •   Not necessarily classification
    •   Just reflects the content
   The classification is NEVER done
    •   Needs to reflect the ever-changing data
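Applying term tags from a controlled vocabulary can be sketched as a lookup from the words found in a text to preferred terms. The vocabulary below is invented for illustration and `tag` is a hypothetical helper; a production thesaurus would also handle multi-word terms, hierarchy, and rules.

```python
import re

# Hypothetical mini-thesaurus: entry word -> preferred term
THESAURUS = {
    "map": "Maps",
    "maps": "Maps",
    "chart": "Maps",
    "patent": "Patents",
    "patents": "Patents",
    "abstract": "Abstracts",
    "abstracts": "Abstracts",
}


def tag(text):
    """Return the sorted preferred terms whose entry words occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({THESAURUS[w] for w in words if w in THESAURUS})


print(tag("A chart and two patents"))
```

Because the mapping is just data, it can be edited as the content changes, which is the practical meaning of "the classification is never done".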
Theoretical Underpinnings
   Outlines of Knowledge
    •   Thomas Aquinas
    •   John Knox (Bacon)
    •   Morton Taube - Encyclopaedia Britannica
   Organization of Knowledge
    •   Cutter – 1896
    •   COSATI – 1964
        •   Alvin Weinberg
    •   Cranfield Institute papers
        •   Cleverdon, Aitchison, Vickery
Theory of knowledge
…. began early
   Plato et al. - BC
       Knowledge of reality is philosophy
   Realism
       St. Augustine 354 - 430 AD
       St. Thomas Aquinas 1225 -1274 AD
       Characteristics common in particulars
       Not the same object without them


        © 2010. Access Innovations, Inc. All Rights Reserved.
Theory of knowledge
   William of Occam (or Ockham) –
       c. 1288 – c. 1348
   Nominalism - Universals are
    represented by words
   Conceptualism - Universals are general
    concepts, mind dependent, formed by
    extraction from particular experiences

Theory of knowledge
   The Knower (Subject)
   The Known (Object)
   Knowing (a subjective process)
   An act, a process, or a concept
   Facts or perception?
   Yes or no answers


The basis of knowledge
   René Descartes 1596 - 1650
        Separate what is known - philosophy
        From new knowledge - science
        Conditions of reason, suspension of belief
        Je pense, donc je suis
        Cogito, ergo sum
        I think, therefore I am
        Cartesian


Conditions for knowledge
        John Locke - 1632 - 1704
          “A sailor needs to know the length of
            a line he has available before he
            goes out to sound the ocean with it.”
            - J. Locke
        Acquire knowledge of reality
        Establish the conditions needed to
         acquire knowledge
        Establish possible extent and
         limitations of knowledge
John Locke 1632 - 1704
                               Classification of kinds
                               of knowledge

                                      Some Thoughts
                                      Concerning
                                      Education

Outlines of knowledge
   Carl Linnaeus 1707 – 1778
       Placed plants in categories
       Systematized the three kingdoms of nature
       Replaced “natural systems” classification
   Immanuel Kant 1724 - 1804
       A posteriori and a priori judgments
       A posteriori and a priori concepts
       Outline of knowledge
   The nature of this distinction has been disputed by various philosophers; however, the
    terms may be roughly defined as follows:
   A priori knowledge is knowledge that is known independently of experience (that is, it
    is non-empirical, or arrived at beforehand, usually by reason).
   A posteriori knowledge is knowledge that is known by experience (that is, it is
    empirical, or arrived at afterward).
Epistemology
   James Frederick Ferrier 1808 - 1864
   Analyzing the nature of knowledge
   How it relates to connected notions
       truth, belief, justification
   The means of production of knowledge
   Skepticism about different knowledge
    claims
   http://en.wikipedia.org/wiki/Epistemology
Personification of knowledge
                                                    (Greek Επιστημη, Episteme)
                                                    in Celsus Library in
                                                    Ephesus, Turkey.


                                                    Epistemology
                                                    (from Greek ἐπιστήμη, epistēmē,
                                                    "knowledge, science" + λόγος, "logos")

                                                    or theory of knowledge
                                                    is the branch of philosophy
                                                    concerned with the nature and scope
                                                    (limitations) of knowledge.
                                                    It addresses the questions:
                                                              What is knowledge?
                                                              How is knowledge acquired?
                                                              How do we know what we know?
Philosophy of knowledge divides
   20th century thought
       Memory
       Perception and memory
       Religion
       Linguistic analysis
       Classification of knowledge
              Vocabulary control
              Linguistic analysis
Rise of Classification
   Charles Ammi Cutter 1837 - 1903
            Cutter Classification System
   Melvil Dewey 1851 - 1931
            Dewey Decimal Classification
   Vladimir Lenin 1870 – 1924
            Rubricon - Russia
            Rubricator
   S. R. Ranganathan – India, 1892 – 1972
            Faceted Classification System
            Colon Classification
Charles Ammi Cutter
   Harvard College,
   index catalog,
       using cards instead of published volumes,
       an author index
       and a “classed catalog” or subject index.
   Expansive Classification System (Cutter)
       seven levels of classification,
       each with increasing specificity
       use lower levels and still be specific
Thesauri
   Philo of Byblos (Herennius Philon), c. 64-141 AD
   Sanskrit: the Amarakosha, 4th century,
    in verse
   Roget's Thesaurus
       compiled in 1805 by Peter Mark Roget and published in 1852
   COSATI - 1964
       TEST - 1967
Points of knowledge
   Single point of knowledge
       Eve and the apple
       First organism
       All science
       Examples
              Linnaean system
              Rubricator
              Locke system
              Dewey
Points of knowledge
   Multiple points of origin
       Several fields come together
       Top terms
       Should they be captured separately or together?
       Facets or different views?
       Anarchy in the universe
       Examples
               Physical biochemistry
               NICEM
               Engineering
       Cutter, COSATI, Ranganathan

Information access is changing
   Teletype
   Fax
   Online
   CD-ROM
   Downloading
   Internet
The players are changing
   Standalone publishers
   Aggregators
   Serials and book vendors
   Hosting services
   Cloud
   Disaggregation
   Everyone is an author
   Loss of quality, accuracy, review
The formats are changing
    Handwritten
    Gutenberg
    Linotype
    Web presses
     •   Photocomposition
    Digital layout
    Desktop publishing
    Web publishing
Search is (finally) changing
   Online search
   Boolean search
   Cached search
   Bayesian
       Co-occurrence
       Neural nets
       Machine learning
   Faceted (fielded)
   Rules systems

   Stairs
   Elhill
   Orbit
   String search
   Verity
   Fast
   Lucene
   Muse Global
   Perfect Search
Tagging is still debated
   Permuted Indexes
    •   Chem abs
    •   Bio abs
    •   Portals
   Permaterm indexes
    •   IFI Predicasts
    •   Classification systems LC
    •   Thesauri
   Inverted files
   Triples
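An inverted file maps each term to the set of records that contain it, and a Boolean AND search is then an intersection of those sets. The sketch below is a minimal illustration with made-up records and hypothetical function names; it is not the file structure of any of the systems named in this talk.

```python
from collections import defaultdict


def build_inverted_file(records):
    """Map each lower-cased word to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index


def boolean_and(index, *terms):
    """Return the ids of records containing every one of the given terms."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


# Made-up records for illustration only
records = {
    1: "chemical abstracts index",
    2: "patent abstracts",
    3: "map catalog",
}
idx = build_inverted_file(records)
print(boolean_and(idx, "abstracts", "patent"))
```

Combining terms at search time in this way is what post-coordinate indexing relies on, in contrast to pre-coordinated subject headings.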
Horizons are more complicated
   Field formatted data
   Relational and SQL databases
   Object oriented systems
   Semantic web
   Linked data
Formats just keep being added
   Photocomposition markup
   SGML
   XML
   JSON Calls

Storage keeps changing
   Big iron
   Server farms
   Cloud farms
Telecommunications tries to keep up
   Party lines
   Direct connect lines
   Trunk lines
   Fiber optic
   Cell towers
   Wireless
Media
   Punch cards
   9 track tapes
   Mountain tapes
   Removable drives
   Diskettes
    •   8”
    •   5.25”
    •   3.5”
   Flash drives
   Chips
Indexes
   Pre-coordinate
    •   Back of the book
    •   Subject headings
   Post-coordinate
   Bayesian
   Co-occurrence
   Neural nets
   Machine learning
   Rules systems
Now
   Changing the way we learn

   Changing the way we find things
   Easier to manipulate what we know
    •   http://www.youtube.com/watch?v=B8ofWFx525s
   Comprehensive information / invasive
    •   http://www.youtube.com/watch?v=RNJl9EEcsoE
   People now know what search is.
Future
   Information any place, any time
   A great big mess - Unless we corral it.
    •   Tag it,
    •   Clean it,
    •   Weed it
    •   Curate it
   Everyone is creating content
The information explosion has just begun


We should all be part of it
Questions?

Marjorie M.K. Hlava
President
Access Innovations, Inc.
Mhlava@accessinn.com
505-998-0800

More Related Content

Viewers also liked (9)

Xperentia Company Profile
Xperentia Company ProfileXperentia Company Profile
Xperentia Company Profile
 
Anahata as Heart-centered Consciousness
Anahata as Heart-centered ConsciousnessAnahata as Heart-centered Consciousness
Anahata as Heart-centered Consciousness
 
Kansas Food Bank Can Do
Kansas Food Bank Can DoKansas Food Bank Can Do
Kansas Food Bank Can Do
 
gUILLERMO SLEVA
gUILLERMO SLEVAgUILLERMO SLEVA
gUILLERMO SLEVA
 
Plan de bienestar andrea....
Plan de bienestar andrea....Plan de bienestar andrea....
Plan de bienestar andrea....
 
Analytical Writing Sample #1
Analytical Writing Sample #1Analytical Writing Sample #1
Analytical Writing Sample #1
 
50 tech tips 2016 fin
50 tech tips 2016 fin50 tech tips 2016 fin
50 tech tips 2016 fin
 
Impact of Disruptive Technology in Businesses
Impact of Disruptive Technology in BusinessesImpact of Disruptive Technology in Businesses
Impact of Disruptive Technology in Businesses
 
Centros Comerciales
Centros ComercialesCentros Comerciales
Centros Comerciales
 

Similar to Tales From the Field: Implementing Information Technology

Epistemology, technology and knowledge growth - Meetup session 4
Epistemology, technology and knowledge growth - Meetup session 4Epistemology, technology and knowledge growth - Meetup session 4
Epistemology, technology and knowledge growth - Meetup session 4William Hall
 
Class4 - The Scientific Method to Psycholinguistics
Class4 - The Scientific Method to PsycholinguisticsClass4 - The Scientific Method to Psycholinguistics
Class4 - The Scientific Method to PsycholinguisticsNathacia Lucena
 
Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org...
Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org...Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org...
Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org...BillHall
 
Dacota_blue: What is science?
Dacota_blue: What is science?Dacota_blue: What is science?
Dacota_blue: What is science?Daniel Tabinga
 
Phil – 10 into to philosophy lecture 12 - empiricism
Phil – 10 into to philosophy   lecture 12 - empiricismPhil – 10 into to philosophy   lecture 12 - empiricism
Phil – 10 into to philosophy lecture 12 - empiricismWilliamParkhurst
 
Knowledge and Life: What does it mean to be living?
Knowledge and Life: What does it mean to be living?Knowledge and Life: What does it mean to be living?
Knowledge and Life: What does it mean to be living?William Hall
 
Philosophy of science 3 knowledge, theory, communication
Philosophy of science 3 knowledge, theory, communicationPhilosophy of science 3 knowledge, theory, communication
Philosophy of science 3 knowledge, theory, communicationDavid Engelby
 
NatSci - BQ5 exploration points (2022-23).docx
NatSci - BQ5 exploration points (2022-23).docxNatSci - BQ5 exploration points (2022-23).docx
NatSci - BQ5 exploration points (2022-23).docxShruthiThyagarajan2
 
Mystical Claims and Embodied Knowledge -- 2013 itc slides tom murray
Mystical Claims and Embodied Knowledge -- 2013 itc slides tom murrayMystical Claims and Embodied Knowledge -- 2013 itc slides tom murray
Mystical Claims and Embodied Knowledge -- 2013 itc slides tom murray2013 ITC Integral Theory Conference
 
Philosophy of science for icp
Philosophy of science for icpPhilosophy of science for icp
Philosophy of science for icpArief El Hakim
 
Video lecture for b.tech
Video lecture for b.techVideo lecture for b.tech
Video lecture for b.techEdhole.com
 
Philosophy -transition__2017_
Philosophy  -transition__2017_Philosophy  -transition__2017_
Philosophy -transition__2017_Anne Fraser
 
Philosophy of science summary presentation engelby
Philosophy of science summary presentation engelbyPhilosophy of science summary presentation engelby
Philosophy of science summary presentation engelbyDavid Engelby
 
You are-all-crazy-subjectivaly-speaking-uploaded-1224441527362216-8
You are-all-crazy-subjectivaly-speaking-uploaded-1224441527362216-8You are-all-crazy-subjectivaly-speaking-uploaded-1224441527362216-8
You are-all-crazy-subjectivaly-speaking-uploaded-1224441527362216-8Manuela Pestana
 
2. logic and epistemology, chs. 7 8, p. 94-132
2. logic and epistemology, chs. 7 8, p. 94-1322. logic and epistemology, chs. 7 8, p. 94-132
2. logic and epistemology, chs. 7 8, p. 94-132Justin Morris
 

Similar to Tales From the Field: Implementing Information Technology (20)

Epistemology, technology and knowledge growth - Meetup session 4
Epistemology, technology and knowledge growth - Meetup session 4Epistemology, technology and knowledge growth - Meetup session 4
Epistemology, technology and knowledge growth - Meetup session 4
 
Lesson 1 what is philosophy
Lesson 1 what is philosophyLesson 1 what is philosophy
Lesson 1 what is philosophy
 
Class4 - The Scientific Method to Psycholinguistics
Class4 - The Scientific Method to PsycholinguisticsClass4 - The Scientific Method to Psycholinguistics
Class4 - The Scientific Method to Psycholinguistics
 
Epistemology_APH 214
Epistemology_APH 214Epistemology_APH 214
Epistemology_APH 214
 
Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org...
Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org...Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org...
Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org...
 
Dacota_blue: What is science?
Dacota_blue: What is science?Dacota_blue: What is science?
Dacota_blue: What is science?
 
Phil – 10 into to philosophy lecture 12 - empiricism
Phil – 10 into to philosophy   lecture 12 - empiricismPhil – 10 into to philosophy   lecture 12 - empiricism
Phil – 10 into to philosophy lecture 12 - empiricism
 
Knowledge and Life: What does it mean to be living?
Knowledge and Life: What does it mean to be living?Knowledge and Life: What does it mean to be living?
Knowledge and Life: What does it mean to be living?
 
Philosophy of science 3 knowledge, theory, communication
Philosophy of science 3 knowledge, theory, communicationPhilosophy of science 3 knowledge, theory, communication
Philosophy of science 3 knowledge, theory, communication
 
NatSci - BQ5 exploration points (2022-23).docx
NatSci - BQ5 exploration points (2022-23).docxNatSci - BQ5 exploration points (2022-23).docx
NatSci - BQ5 exploration points (2022-23).docx
 
Philosophy lecture rpc
Philosophy lecture  rpcPhilosophy lecture  rpc
Philosophy lecture rpc
 
Mystical Claims and Embodied Knowledge -- 2013 itc slides tom murray
Mystical Claims and Embodied Knowledge -- 2013 itc slides tom murrayMystical Claims and Embodied Knowledge -- 2013 itc slides tom murray
Mystical Claims and Embodied Knowledge -- 2013 itc slides tom murray
 
Philosophy of science for icp
Philosophy of science for icpPhilosophy of science for icp
Philosophy of science for icp
 
Philosophy lecture rpc
Philosophy lecture  rpcPhilosophy lecture  rpc
Philosophy lecture rpc
 
Video lecture for b.tech
Video lecture for b.techVideo lecture for b.tech
Video lecture for b.tech
 
Philosophy -transition__2017_
Philosophy  -transition__2017_Philosophy  -transition__2017_
Philosophy -transition__2017_
 
Philosophy of science summary presentation engelby
Philosophy of science summary presentation engelbyPhilosophy of science summary presentation engelby
Philosophy of science summary presentation engelby
 
You are-all-crazy-subjectivaly-speaking-uploaded-1224441527362216-8
You are-all-crazy-subjectivaly-speaking-uploaded-1224441527362216-8You are-all-crazy-subjectivaly-speaking-uploaded-1224441527362216-8
You are-all-crazy-subjectivaly-speaking-uploaded-1224441527362216-8
 
A Priori A Posteriori
A Priori A PosterioriA Priori A Posteriori
A Priori A Posteriori
 
2. logic and epistemology, chs. 7 8, p. 94-132
2. logic and epistemology, chs. 7 8, p. 94-1322. logic and epistemology, chs. 7 8, p. 94-132
2. logic and epistemology, chs. 7 8, p. 94-132
 

More from Access Innovations, Inc.

Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy ResultsMaking AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy ResultsAccess Innovations, Inc.
 
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8Access Innovations, Inc.
 
Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)Access Innovations, Inc.
 
Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021Access Innovations, Inc.
 
Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)Access Innovations, Inc.
 
Tagging overview - Why Keywords Don't Cut It
Tagging overview  - Why Keywords Don't Cut ItTagging overview  - Why Keywords Don't Cut It
Tagging overview - Why Keywords Don't Cut ItAccess Innovations, Inc.
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityAccess Innovations, Inc.
 
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedDHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedAccess Innovations, Inc.
 

More from Access Innovations, Inc. (20)

Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy ResultsMaking AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
 
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
 
Smart submit
Smart submitSmart submit
Smart submit
 
Plos taxonomy beyond search dhug 2021
Plos taxonomy beyond search   dhug 2021Plos taxonomy beyond search   dhug 2021
Plos taxonomy beyond search dhug 2021
 
Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)
 
Data harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacingData harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacing
 
Data harmony update 2021
Data harmony update 2021 Data harmony update 2021
Data harmony update 2021
 
Atypon dhug2021
Atypon dhug2021Atypon dhug2021
Atypon dhug2021
 
Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021
 
Asce more than just topic taxonomies
Asce more than just topic taxonomiesAsce more than just topic taxonomies
Asce more than just topic taxonomies
 
Acs discoverability-dhug2021
Acs discoverability-dhug2021Acs discoverability-dhug2021
Acs discoverability-dhug2021
 
Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)
 
Tagging overview - Why Keywords Don't Cut It
Tagging overview  - Why Keywords Don't Cut ItTagging overview  - Why Keywords Don't Cut It
Tagging overview - Why Keywords Don't Cut It
 
Health Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut ItHealth Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut It
 
Why Keywords Don't Cut It
Why Keywords Don't Cut ItWhy Keywords Don't Cut It
Why Keywords Don't Cut It
 
Data Harmony update 2020 final
Data Harmony update 2020 finalData Harmony update 2020 final
Data Harmony update 2020 final
 
Data Harmony Update 2020 final
Data Harmony Update 2020 finalData Harmony Update 2020 final
Data Harmony Update 2020 final
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository Interoperability
 
DHUG 2018 - Florida Thesis OCR
DHUG 2018 - Florida Thesis OCRDHUG 2018 - Florida Thesis OCR
DHUG 2018 - Florida Thesis OCR
 
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedDHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
 

Tales From the Field: Implementing Information Technology

  • 1. Tales from the Field: Implementing Information Theory SIG CR - 2012 Marjorie Hlava, President Access Innovations, Inc. www.accessinn.com
  • 2. Implementing Information Theory  The case of the missing abstracts  Russian information  US PTO  Getty adventures  Vatican bibles  Past basics  Thoughts on directions
  • 3. The Bleeding Edge  Figure Out the client needs  Figure out the specifications  Get approval on the specifications  Figure out how to deliver the data following the specs  Quality control the data delivery  …. But then life happens
  • 4. The Case of Missing Abstracts Tests showed that just searching the indexing did not provide the full answers users wanted. Searching the titles and abstracts as well would improve search Enough space could be found on servers if the data was moved to in-house from Dialog and Orbit. New platform going into production New format – Messenger Specifications written, test file approved
  • 5. Specifications Need 99.998% accuracy for user acceptance Left tagged ASCII Office in Mexico City – Access de Mexico Triple key - double proof Two sets of volumes 792,000 abstract tapes destroyed 1970 – 1982 data
  • 6. Access de Mexico 7:17 Am Shift change September 19,1985 8.7 earthquake
  • 7. CAS to Philippines Limo from the airport with the remaining volumes Typhoon Dot October 12, 1985 Clark Air Force base evacuated Power out for weeks
  • 8. Jamaica Hurricane Kate November 1985 4 inches of water in the computer room No power on the island
  • 9. Beijing China November 1985  NOTHING HAPPENED  Finished  On time  Under budget  At promised accuracy level  Client said “ when I read your contract I thought you had an unusual level of detail on the Acts of God clauses….  But I didn’t expect you to use every one of them!”
  • 11. Implementing Information Theory Viniti Maxwell Information map PDP-8’s Microfilm machines – no batteries Glastnof – open but no trust
  • 12.
  • 13. Payments in cash in our shoes
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21. Puzzles, Keys, and Digitization Photocomposition keys Science typographers Puzzles – SGML Encyclopaedia Britannica Marquis Who’s Who Designing the Chicago Research and trading “desks”
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28. US PTO Conversions  Scan at 300 dpi  OCR to 97%  5,400,000 patents  Create the machines  Testy  QC algorithms  Display image  Search dirty OCR  Spell right once in 30 pages = findable
  • 29. Perugia Bible 12” VideoDisc
  • 30.
  • 31.
  • 32.
  • 33. British Library Map Collection 225,000 maps pre-1850 From printed catalog to digital catalog
  • 34. Getty AAT to AATA
  • 35. Success - Failure - Future  Successes • Chemical Abstracts • USPTO • Getty AATA • British Map Collection  Failures • Access Russia • Ipsoa Video Disk • MAI Mail
  • 36. All projects use classification  To organize the job  To organize the information  To allow the finding of the items once digital  Apply term tags • thesaurus and controlled  Apply notation • Not necessarily classification • Just reflects the content  The classification is NEVER done • Needs to reflect the ever-changing data
  • 37. Theoretical Underpinnings  Outlines of Knowledge • Thomas Aquinas • John Knox (Bacon) • Morton Taube - Encyclopaedia Britannica  Organization of Knowledge • Cutter – 1896 • COSATI – 1964 • Alvin Weinberg • Cranfield Institute papers • Cleverton, Aitcheson, Vickery
  • 38. Theory of knowledge …. began early  Plato et al. - BC  Knowledge of reality is philosophy  Realism  St. Augustine 354 - 430 AD  St. Thomas Aquinas 1225 -1274 AD  Characteristics common in particulars  Not the same object without them 38 © 2010. Access Innovations, Inc. All Rights Reserved.
  • 39. Theory of knowledge  William of Occam (or Ockham) –  c. 1288 – c. 1348  Nominalism - Universals are represented by words  Conceptualism - Universals are general concepts, mind dependent, formed by extraction from particular experiences
  • 40. Theory of knowledge  The Knower (Subject)  The Known (Object)  Knowing (a subjective process)  An act, a process, or a concept  Facts or perception?  Yes or no answers
  • 41. The basis of knowledge  René Descartes 1596 - 1650  Separate what is known - philosophy  From new knowledge - science  Conditions of reason, suspension of belief  Je pense donc je suis  Cogito, ergo sum (from Socrates)  I think, therefore I am  Cartesian
  • 42. Conditions for knowledge  John Locke - 1632 - 1704  “A sailor needs to know the length of a line he has available before he goes out to sound the ocean with it.” - J. Locke  Acquire knowledge of reality  Establish the conditions needed to acquire knowledge  Establish possible extent and limitations of knowledge
  • 43. John Locke 1632 - 1704 Classification of kinds of knowledge Some Thoughts Concerning Education
  • 44. Outlines of knowledge  Carl Linnaeus 1707 – 1778  Placed plants in categories  Systematized the three kingdoms of nature  Replaced “natural systems” classification  Immanuel Kant 1724 - 1804  A posteriori and a priori judgments  A posteriori and a priori concepts  Outline of knowledge  The nature of this distinction has been disputed by various philosophers; however, the terms may be roughly defined as follows:  A priori knowledge is knowledge that is known independently of experience (that is, it is non-empirical, or arrived at beforehand, usually by reason).  A posteriori knowledge is knowledge that is known by experience (that is, it is empirical, or arrived at afterward).
  • 45. Epistemology  James Frederick Ferrier 1808 - 1864  Analyzing the nature of knowledge  How it relates to connected notions  truth, belief, justification  The means of production of knowledge  Skepticism about different knowledge claims  http://en.wikipedia.org/wiki/Epistemology
  • 46. Personification of knowledge (Greek Επιστημη, Episteme) in Celsus Library in Ephesus, Turkey. Epistemology (from Greek ἐπιστήμη – epistēmē, "knowledge, science" + λόγος, "logos") or theory of knowledge is the branch of philosophy concerned with the nature and scope (limitations) of knowledge. It addresses the questions: What is knowledge? How is knowledge acquired? How do we know what we know?
  • 47. Philosophy of knowledge divides  20th century thought  Memory  Perception and memory  Religion  Linguistic analysis  Classification of knowledge  Vocabulary control  Linguistic analysis
  • 48. Rise of Classification  Charles Ammi Cutter 1837 - 1903  Cutter Classification System  Melvil Dewey 1851 - 1931  Dewey Decimal Classification  Vladimir Lenin 1870 – 1924  Rubricon - Russia  Rubricator  S. R. Ranganathan – India, 1892 – 1972  Faceted Classification System  Colon Classification
  • 49. Charles Ammi Cutter  Harvard College,  index catalog,  using cards instead of published volumes,  an author index  and a “classed catalog” or subject index.  Expansive Classification System (Cutter)  seven levels of classification,  each with increasing specificity  use lower levels and still be specific
  • 50. Thesauri  Philo of Byblos (Herennius Philon), c. 64 - 141 AD  Sanskrit, the Amarakosha, 4th century verse  Roget's Thesaurus, compiled in 1805 by Peter Mark Roget and published in 1852  COSATI - 1964  TEST - 1967
  • 51. Points of knowledge  Single point of knowledge  Eve and the apple  First organism  All science  Examples  Linnaean system  Rubricator  Locke system  Dewey
  • 52. Points of knowledge  Multiple points of origin  Several fields come together  Top terms  Should they be captured separately or together?  Facets or different views?  Anarchy in the universe  Examples  Physical biochemistry  NICEM  Engineering  Cutter, COSATI, Ranganathan
  • 53. Information access is changing  Teletype  Fax  Online  CD-ROM  Downloading  Internet
  • 54. The players are changing  Standalone publishers  Aggregators  Serials and book vendors  Hosting services  Cloud  Disaggregation  Everyone is an author  Loss of quality, accuracy, review
  • 55. The formats are changing  Handwritten  Gutenberg  Linotype  Web Presses  Photocomposition  Digital layout  Desktop publishing  Web publishing
  • 56. Search is (finally) changing  Stairs  Online search  Elhill  Boolean search  Orbit  Cached search  String search  Bayesian  Verity  Co-occurrence  Neural nets  Fast  Machine learning  Lucene  Faceted (fielded)  Muse Global  Rules systems  Perfect Search
  • 57. Tagging is still debated  Permuted Indexes • Chem abs • Bio abs • Portals  Permuterm indexes • IFI Predicasts • Classification systems LC • Thesauri  Inverted files  Triples
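Two of the structures named on this slide, the permuted index and the inverted file, are easy to picture with a toy example. The sketch below is illustrative only: the titles, the stopword list, and the function names are assumptions, not taken from any of the systems mentioned.

```python
from collections import defaultdict

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "of", "the", "in", "for", "to"}


def inverted_file(docs):
    """Map each significant word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for word in title.lower().split():
            if word not in STOPWORDS:
                index[word].add(doc_id)
    return dict(index)


def permuted_index(docs):
    """Build keyword-in-context style entries by rotating each title.

    Every significant word is rotated to the front, with "/" marking
    where the original title wrapped around.
    """
    entries = []
    for doc_id, title in docs.items():
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue
            rotated = " ".join(words[i:] + ["/"] + words[:i])
            entries.append((rotated, doc_id))
    return sorted(entries, key=lambda entry: entry[0].lower())


if __name__ == "__main__":
    # Hypothetical titles for demonstration.
    docs = {
        1: "Classification of Chemical Patents",
        2: "Chemical Abstracts Indexing",
    }
    for entry, doc_id in permuted_index(docs):
        print(f"{entry}  [{doc_id}]")
    print(inverted_file(docs)["chemical"])
```

The permuted index is built once and read by eye in alphabetical order; the inverted file is what a search engine consults at query time.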
  • 58. Horizons are more complicated  Field formatted data  Relational and SQL databases  Object oriented systems  Semantic web  Linked data
  • 59. Formats just keep being added  Photocomposition markup  SGML  XML  JSON Calls  Storage keeps changing  Big iron  Server farms  Cloud farms
  • 60. Telecommunications tries to keep up  Party lines  Direct connect lines  Trunk lines  Fiber optic  Cell towers  Wireless
  • 61. Media  Punch cards  9 track tapes  Mountain tapes  Removable drives  Diskettes • 8” • 5.25” • 3.5”  Flash drives  Chips
  • 62. Indexes  Pre-coordinate • Back of the book • Subject headings  Post-coordinate  Bayesian  Co-occurrence  Neural nets  Machine learning  Rules systems
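The pre-coordinate versus post-coordinate distinction can be shown in a few lines. This is a minimal sketch with hypothetical records and term names: a pre-coordinate index combines terms into a fixed heading when the item is indexed, while a post-coordinate index stores single terms and lets the searcher combine them at query time.

```python
# Hypothetical records: document id -> set of assigned index terms.
RECORDS = {
    "doc1": {"patents", "chemistry"},
    "doc2": {"patents", "maps"},
    "doc3": {"chemistry", "abstracts"},
}


def pre_coordinate_headings(records, separator="--"):
    """Combine each record's terms into one fixed heading at indexing time."""
    return {
        doc_id: separator.join(sorted(terms))
        for doc_id, terms in records.items()
    }


def post_coordinate_search(records, all_of=(), none_of=()):
    """Combine single terms at search time (Boolean AND / NOT)."""
    required = set(all_of)
    excluded = set(none_of)
    return sorted(
        doc_id
        for doc_id, terms in records.items()
        if required <= terms and not (excluded & terms)
    )


if __name__ == "__main__":
    print(pre_coordinate_headings(RECORDS))
    print(post_coordinate_search(RECORDS, all_of=["patents"], none_of=["maps"]))
```

The trade-off in this sketch is the usual one: fixed headings are easy to browse but only answer the combinations the indexer anticipated, whereas single terms answer any combination at the cost of doing the coordination at search time.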
  • 63. Now  Changing the way we learn  Changing the way we find things  Easier to manipulate what we know • http://www.youtube.com/watch?v=B8ofWFx525s  Comprehensive information / invasive • http://www.youtube.com/watch?v=RNJl9EEcsoE  People now know what search is.
  • 64. Future  Information any place, any time  A great big mess - Unless we corral it. • Tag it, • Clean it, • Weed it • Curate it  Everyone is creating content
  • 65. The information explosion has just begun
  • 66. We should all be part of it Questions? Marjorie M.K. Hlava President Access Innovations, Inc. Mhlava@accessinn.com 505-998-0800