NFAIS 2014 Miles Conrad Award Lecture, Presented by Marjorie M.K. Hlava

Marjorie M.K. Hlava, President and founder of Access Innovations, Inc. and the Data Harmony suite of indexing software, gives the Miles Conrad Memorial Lecture at the 2014 Annual Conference for the National Federation of Advanced Information Services (NFAIS).

The Miles Conrad Award and accompanying lecture were established in 1965 in commemoration of NFAIS founder G. Miles Conrad. Hlava earned the Miles Conrad Award this January for her past and continuing services to NFAIS and the information and knowledge management industries.

Published in Technology

Transcript

  • 1. Tales from the Field: Our Changing World Marjorie Hlava, President Access Innovations, Inc. www.accessinn.com
  • 2. Miles Conrad Presentation - 2014 • Tales from the field –The case of the missing abstracts –Russian information –USPTO –Getty adventures –Vatican bibles • Where are we now? • Future directions
  • 3. The Cutting Edge • Figure out the client needs • Figure out the specifications • Get approval on the specifications • Figure out how to deliver the data following the specs • Quality control the data delivery • …. But then life happens
  • 4. The Case of the Missing Abstracts • Tests showed that just searching the indexing did not provide the full answers users wanted. Searching the titles and abstracts as well would improve search. • Enough space could be found on servers if the data were moved in-house from Dialog and Orbit. • New platform going into production • New format – Messenger • Specifications written, test file approved
  • 5. Specifications Need 99.998% accuracy for user acceptance Left tagged ASCII Office in Mexico City – Access de Mexico Triple key - double proof Two sets of volumes 792,000 abstract tapes destroyed 1970 – 1982 data
  • 6. Access de Mexico 7:17 AM Shift change September 19, 1985 8.7 earthquake
  • 7. CAS to Philippines Limo from the airport with the remaining volumes Typhoon Dot October 12, 1985 Clark Air Force base evacuated Power out for weeks
  • 8. Jamaica Hurricane Kate November 1985 4 inches of water in the computer room No power on the island
  • 9. Beijing China November 1985 • NOTHING HAPPENED • Finished • On time • Under budget • At promised accuracy level • Client said “When I read your contract I thought you had an unusual level of detail on the Acts of God clauses… • But I didn’t expect you to use every one of them!”
  • 10. Russian Information
  • 11. Implementing Information Theory VINITI Maxwell Information map PDP-8’s Microfilm machines – no batteries Glasnost – open but no trust
  • 12. Payroll in cash … in our shoes …
  • 13. This is an office???
  • 14. Three flights down metal stairs… A full well equipped office
  • 15. Puzzles, Keys, and Digitization • Photocomposition keys • Science typographers • Puzzles – SGML • Encyclopedia Britannica • Marquis Who’s Who • Designing the Chicago research and trading “desks”
  • 16. One mile back in the cave
  • 17. US Patents staging for scanning
  • 18. Fragile pages – air fed – double sided
  • 19. USPTO Conversions • Scan at 300 dpi • OCR to 97% • 5,400,000 patents • Create the machines • Test • QC algorithms • Display image • Search dirty OCR • Spell right once in 30 pages = findable
  • 20. Perugia Bible 12” VideoDisc
  • 21. British Library Map Collection 225,000 maps pre-1850 From printed catalog to digital catalog
  • 22. Getty AAT to AATA
  • 23. All projects use classification • To organize the job • To organize the information • To allow the finding of the items once digital • Apply term tags –Thesaurus-based and controlled • Apply notation –Not necessarily classification –Just reflects the content • The classification is NEVER done –Needs to reflect the ever-changing data
  • 24. Theoretical Underpinnings • Outlines of Knowledge –Thomas Aquinas –John Knox (Bacon) –Morton Taube - Encyclopedia Britannica • Organization of Knowledge –Cutter – 1896 –COSATI – 1964 • Alvin Weinberg –Cranfield Institute papers • Cleverdon, Aitchison, Vickery
  • 25. Theory of knowledge …. began early • Plato et al. - BC –Knowledge of reality is philosophy • Realism –St. Augustine 354 - 430 AD –St. Thomas Aquinas 1225 - 1274 AD –Characteristics common in particulars –Not the same object without them
  • 26. Theory of Knowledge • William of Occam (or Ockham) –c. 1288 – c. 1348 • Nominalism - Universals are represented by words • Conceptualism - Universals are general concepts, mind dependent, formed by extraction from particular experiences
  • 27. Theory of Knowledge • The Knower (Subject) • The Known (Object) • Knowing (a subjective process) • An act, a process, or a concept • Facts or perception? • Yes or no answers
  • 28. The Basis of Knowledge • René Descartes 1596 - 1650 –Separate what is known - philosophy –From new knowledge - science –Conditions of reason, suspension of belief –Je pense donc je suis –Cogito, ergo sum (from Socrates) –I think, therefore I am –Cartesian
  • 29. Conditions for knowledge • John Locke - 1632 - 1704 –“A sailor needs to know the length of a line he has available before he goes out to sound the ocean with it.” - J. Locke • Acquire knowledge of reality • Establish the conditions needed to acquire knowledge • Establish possible extent and limitations of knowledge
  • 30. John Locke 1632 - 1704 Classification of kinds of knowledge Some Thoughts Concerning Education
  • 31. Philosophy of knowledge divides • 20th century thought –Memory –Perception and memory –Religion –Linguistic analysis –Classification of knowledge • Vocabulary control • Linguistic analysis
  • 32. Rise of Classification • Charles Ammi Cutter 1837 - 1903 –Cutter Classification System • Melvil Dewey 1851 - 1931 –Dewey Decimal Classification • Vladimir Lenin 1870 – 1924 –Rubricon - Russia –Rubricator • S. R. Ranganathan – India, 1892 – 1972 –Faceted Classification System –Colonicity
  • 33. Charles Ammi Cutter • Harvard College, • index catalog, –using cards instead of published volumes, –an author index –and a “classed catalog” or subject index. • Expansive Classification System (Cutter) –seven levels of classification, –each with increasing specificity –use lower levels and still be specific
  • 34. Thesauri • Philo of Byblos Herennius Philon; c. 64-141 AD • Sanskrit, the Amarakosha 4th century verse • Roget's Thesaurus, 1805 –by Peter Mark Roget, and published in 1852 • COSATI - 1964 –TEST - 1967
  • 35. Points of knowledge • Single point of knowledge –Eve and the apple –First organism –All science –Examples • Linnean system • Rubricator • Locke system • Dewey
  • 36. Points of knowledge • Multiple points of origin –Several fields come together - Top terms –Should they be captured separately or together? –Facets or different views? –Anarchy in the universe –Examples • Physical biochemistry • NICEM • Engineering –Supporters = Cutter, COSATI, Ranganathan
  • 37. Information access is changing • Teletype • Fax • Online • CD-ROM • Downloading
  • 38. The players are changing • Standalone publishers • Aggregators • Serials and book vendors • Hosting services • Cloud • Disaggregation • Everyone is an author • Loss of quality, accuracy, review
  • 39. Funding for clients is changing • The US Government spends $17 BILLION a day more than it brings in • 50% of the Republicans in the House have served 3 years or less (WSJ 9/23/2013) • Public university funding - Illinois –State appropriated funds 18.9% • Decreasing at 9.4% from 4 years previous –Tuition revenue 24.7% –Governmental grants and contracts 17.9% –Hospital income 12.4% – http://www.ibhe.org/Fiscal%20Affairs/PDF/FY12PublicRevExpRpt.pdf
  • 40. University of California
  • 41. The formats are changing • Handwritten • Gutenberg • Linotype • Web presses • Photocomposition • Digital layout • Desktop publishing • Web publishing
  • 42. Storage is changing • iPhone has 240,000 times the memory power of the Voyager 1 – 12 billion miles from earth (NYTimes 9/13/2013)
  • 43. Search is (finally) changing • Online search • Boolean search • Cached search • Bayesian – Co-occurrence – Neural nets – Machine learning • Faceted (fielded) • Rules systems • STAIRS • ELHILL • Orbit • String search • Verity • FAST • Lucene • MuseGlobal • Perfect Search
  • 44. Tagging is still debated • Permuted indexes –Chem abs –Bio abs –Portals • Permuterm indexes –IFI Predicasts –Classification systems LC –Thesauri • Inverted files • Triples
  • 45. Horizons are more complicated • Field formatted data • Relational and SQL databases • Object oriented systems • Semantic web • Linked data
  • 46. Formats just keep being added • Photocomposition markup • SGML • XML • JSON calls Storage keeps changing • Big iron • Server farms • Cloud farms
  • 47. Telecommunications tries to keep up • Party lines • Direct connect lines • Trunk lines • Fiber optics • Cell towers • Wireless
  • 48. Media • Punch cards • 9 track tapes • Mountain tapes • Removable drives • Diskettes –8” –5.25” –3.5” • Flash drives • Chips 72% of online Americans used social networking in 2013, up from 8% in 2005. Few differences by educational attainment: 67% of HS dropouts, 72% of college graduates (Pew Internet & American Life Project)
  • 49. Indexes • Pre-coordinate –Back of the book –Subject headings • Post-coordinate • Bayesian • Co-occurrence • Neural nets • Machine learning • Rules systems
  • 50. Our Cozy World • Has CHANGED • The landscape is shifting in profound ways • Funding models are changing • Who will pay is changing • Systems manipulating knowledge • User needs and wants are changing
  • 51. Challenge old assumptions • In the olden days – companies had “industries” that they worked within – “markets” that they sold into, and – “business models” that they pursued – assumptions that drove their decisions – and associations that represented them in – a world that moved relatively slowly • Now…every single assumption needs to be challenged – rapid change in future trends – innovation is constant – we need to find the growth opportunities
  • 52. Doing Business at the speed of thought • Communicating in 140 characters –Twitter –Text • Small screens • Small thoughts • Always connected • Social networking
  • 53. Where are we headed? • An Australian study predicted that 65 percent of preschoolers would eventually work in jobs and careers that do not currently exist. • “It’s going to be a move from a bad economy to the next economy” - Mike Fleming, American Chamber of Commerce
  • 54. Technology • Cell phones • Netflix • iPod • Tablets • iPad • Digital journal collection bundles • e-books • Google • Linked data • Blackberry • E-mail • Webex • Skype • Blogs • Twitter • LinkedIn • Facebook • YouTube • MySpace
  • 55. The Landscape is Changing • New fields driven by technology –Information architects, KM, KOSs • New associations, new companies • Interests and focus - associations merge, fold, or morph
  • 56. Systems Manipulating Knowledge • Search domination –Steering the Bayesian engines –Tricking the search systems • Google's personalization algorithms affect search results. • Yahoo and Microsoft do the same thing
  • 57. Filter Bubbles
  • 58. Systems Manipulating Knowledge • Knowledge is Power http://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles.html "The Filter Bubble: What The Internet Is Hiding From You" – Eli Pariser – Chair of MoveOn.org
  • 59. Demographics • World Population = 3,031,720,300 – July 1980 • World Population = 6,775,235,300 – July 2009 • World Population = 7,098,978,020 – February 2013 • World Population = 7,214,213,389 – February 2014 • U.S.A. Population = 180,671,000 – July 1960 • U.S.A. Population = 226,542,250 – July 1980 • U.S.A. Population = 317,599,000 – February 2014* • U.S.A. Population = 321,666,147– February 2014* • New Mexico Population = 1,303,303 – July 1980 • New Mexico Population = 2,085,287 – 2013 estimate • *Accounting for births, deaths, and net immigration, the population is expected to tick up by another person every 15 seconds.
  • 60. Gross Domestic Product - USA • 25% of global economy • 20% of global manufacturing • Government added 14% of USA GDP • Not-for-profits added 5.5% of the USA GDP • http://en.wikipedia.org/wiki/Economy_of_the_United_States
  • 61. Public finance - USA • Public debt $17.322 trillion (2014) • Revenues $2.5 trillion (2012) • Expenses $3.54 trillion (2012) • Economic aid (ODA) $30.7 billion (2011) • Public and private US debt = 3.5 x GDP • Public debt is growing at $2.44B each day
  • 62. Total US revenue http://www.usgovernmentrevenue.com/united_states_total_revenue_pie_chart
  • 63. Federal only US revenue http://www.usgovernmentrevenue.com/piechart_2014_US_fed
  • 64. Our customers’ revenue
  • 65. Who will pay? - Academic • Demographics are increasingly challenging • We cannot continue to use eminent domain as a driver – Build more – Consume more – Add more people – Increase the GDP – Increase taxes • Research and universities dependent primarily on tax revenue – Grant funding – Government contracts – Donations – UNM: 25% from State for 2011-12, 12% for 2013-14 – Funding models facing pressure to change
  • 66. Who will pay? - Associations • When the company pays for membership – Most let membership continue indefinitely – One member, who first used his boss’s membership and then decided to pay for his own, didn’t think the $100 he paid was worth it. • Companies are changing association membership • from company-paid to personal-paid • causing problems for some of the biggest trade groups and membership societies.
  • 67. Open Source Issues • But you know that discussion! • …..
  • 68. Landscape forms • More publisher consolidation • E-book self-publishing is surging • Publishers will use metadata in more sophisticated ways • Expansion of peer power • Less peer review, more self-publishing
  • 69. Publishing today http://www.worldometers.info/
  • 70. Landscape Forms - Technology • Increased content in the cloud • Bring Your Own Device (BYOD) –iPad Air, iPhone 6 –Widespread adoption of Android • Increasing demand for mobile apps and websites • Increase in software-as-a-service (SaaS) options –Ubiquitous communications • Increased implementation of cloud computing • Widespread adoption of 3D technology
  • 71. Landscape Forms - Apps • Applications being “socialized” • Enhanced/interactive/portable e-books • Many options for low-priced standalone e-readers • More legal disputes and patent wars • More interest in open and linked data • Search analytics • Gradual adoption of HTML5 • Streaming content • Information prices are rising, – but content budgets aren’t keeping pace
  • 72. Landscape Forms – Data • Smartphone adoption • Blending of offline-online worlds • Gesture-based computing • Increased geo-tagging of information • Voice interfaces • Interactive Learning • MOOCs—Massive Open Online Courses
  • 73. Landscape Forms – Data • Data analytics • More Big Data –Underneath, there are compelling applications to be implemented –Enterprise search and business insight technologies • Intelligent objects—The Internet of Things • Work anywhere • More predictive personalization
  • 74. Landscape Forms - Libraries • Increased engagement with social media • Growing popularity of e-singles • Web-scale discovery for library collections • Facilitation roles for librarians • Monograph e-platforms ascendant – Books at JSTOR, University Publishing Online, University Press Scholarship Online, University Press Content Consortium Book Collections of Project MUSE
  • 75. Landscape forms - Politics • WikiLeaks • Net neutrality battles • Government tries to regulate the internet (SOPA and PIPA, ACTA, WCIT) • Privacy and security concerns dominate policy discussions • Developments in discovery tools • New legal platforms • Ongoing focus on security/privacy issues—online • Newspapers and magazines online only • Information access regulation
  • 76. Giants are moving • Battling on hardware and search –Apple, Google, Facebook, Amazon • Battle of the mobile devices –Apple iOS, Google’s Android, Microsoft Windows • Growth in open source innovation/problem solving • More installations of Solr • Power outages are much more a problem than data breaches
  • 77. Universe of Options What should we do?
  • 78. Publishers • “publishers have behaved a bit like hunter-gatherers of research” * • Become information providers • Invest in search & data management tools • Make sure relevant articles find their way to researchers • Open article databases to software developers • Build community apps on top of their content • Drive readers to obscure content • Open Access initiatives • *(Economist, “One of the best media businesses is also one of the most resented,” Http://www.economist.com/node/18744177/)
  • 79. “Ask an Expert” Sites • 2005 – 15 % used such sites • 2009 – 43% used them • 2013 – 83% used them • “librarian” usage is flat • 83% of those who used librarian perceived value • Embed ourselves in the expert sites! • Add a taxonomy to leverage search • http://www.libraryjournal.com/lj/home/889752-264/stuck_in_the_past_.html.csp
  • 80. New kinds of customers • Publishers • Information professionals • CTO, CIO, Librarian • Researchers • Information consultants • Information architects • Taxonomists • Library technical assistants • Business owners • Planners • Work inside libraries • Researchers • Information consultants • Knowledge managers • Records managers • User experience specialists • Indexers Where do we find such a diverse set of customers?
  • 81. Thirst for knowledge • “The reality of the future of meetings is that learning is what most people will do for a living in the 21st century. There will be a requirement to constantly replenish that knowledge, and a huge focus on knowledge delivery.” -- Jim Carroll, Futurist
  • 82. What information format? • Symposia • Articles • Databases • Meet-ups • Tweet-ups • Virtual networking • Short seminars • Continuing education • Workshops • Webinars • Blogs • Free vs fee • 12 x per year 4 x per year • All day or two hours
  • 83. Need trusted sources • Semantic enrichment • Make data findable • Ensure trustworthy data • Replicable search results – Discovery – Precision – Recall • Aiding the human brain • Automated efficient processing
  • 84. Now • Changing the way we learn • Changing the way we find things • Easier to manipulate what we know – http://www.youtube.com/watch?v=B8ofWFx525s • Comprehensive information / invasive – http://www.youtube.com/watch?v=RNJl9EEcsoE • People now know what search is.
  • 85. Our Path • Metadata 2.0 – Taxonomies will lead innovative search – Publishers will be using metadata in increasingly sophisticated ways • Production will shift to web ready data – Publishers already had big data! – Accelerated repackaging and combining of existing resources – Channeling of data will become more important – Semantic tagging will leverage product development – Selling more directly to the consumer
  • 86. Our Path • Markets –Publishers and enterprises will embrace the web for distribution –Government funded sales will decrease –Enterprise sales will increase • Reaching customers –More personal approach –Fewer will attend conferences
  • 87. Future • Information any place, any time • A great big mess - unless we corral it. –Tag it, –Clean it, –Weed it, –Curate it. • Everyone is creating content
  • 88. The information explosion has just begun
  • 89. We should all be part of it! Marjorie M. K. Hlava Access Innovations / Data Harmony mhlava@accessinn.com +1-505-998-0800 Thank you
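
Editor's illustration (not part of the lecture): slide 83 names precision and recall as the measures of replicable search, and slide 19 claims that with OCR at 97% a term only needs to be spelled right once in 30 pages to be findable. The Python sketch below shows how both ideas could be computed. The function names are hypothetical, and the OCR model rests on two assumptions the slides do not state: that 97% is a per-character accuracy, and that character errors are independent of each other.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def chance_findable(char_accuracy: float, word_length: int, occurrences: int) -> float:
    """Probability that at least one occurrence of a word survives OCR intact.

    Assumption (not from the slides): each character is recognized correctly
    with probability char_accuracy, independently of every other character.
    """
    p_word_ok = char_accuracy ** word_length  # one occurrence comes through clean
    return 1.0 - (1.0 - p_word_ok) ** occurrences  # at least one clean occurrence


if __name__ == "__main__":
    # Hypothetical document identifiers, for illustration only.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
    print(f"precision={p:.2f} recall={r:.2f}")

    # A 10-letter term at 97% character accuracy, appearing 1 to 5 times.
    for n in range(1, 6):
        print(n, round(chance_findable(0.97, 10, n), 4))
```

Under these assumptions the chance of finding a term rises quickly with the number of times it appears in a document, which is one way to read the "spell right once" rule of thumb for searching uncorrected OCR.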