
Machines are people too

Keynote for Theory and Practice of Digital Libraries 2017

The theory and practice of digital libraries provides a long history of thought on how to manage knowledge, ranging from collection development to cataloging and resource description. These tools were all designed to make knowledge findable and accessible to people. Even technical progress in information retrieval and question answering is targeted at helping answer a human’s information need.

However, demand is increasingly for data: data that is needed not for people’s consumption but to drive machines. As an example of this demand, there has been explosive growth in job openings for Data Engineers, professionals who prepare data for machine consumption. In this talk, I give an overview of the information needs of machine intelligence and ask the question: Are our knowledge management techniques applicable for serving this new consumer?

  1. 1. MACHINES ARE PEOPLE TOO Dr. Paul Groth | @pgroth | pgroth.com Disruptive Technology Director Elsevier Labs | @elsevierlabs Theory and Practice of Digital Libraries 2017
  2. 2. THANKS FOR CONVERSATION & SLIDES! Riffing off of Brad’s Dublin Core 2016 keynote https://www.slideshare.net/bpa777/dc2016-keynote-20161013-67164305
  3. 3. THE SUCCESS OF DIGITAL LIBRARIES “Live every day like it's NBER day”
  4. 4. THE SUCCESS OF DIGITAL LIBRARIES
  5. 5. THE SUCCESS OF DIGITAL LIBRARIES
  6. 6. THE SUCCESS OF DIGITAL LIBRARIES
  7. 7. THE SUCCESS OF DIGITAL LIBRARIES
  8. 8. THE NEXT MEDIA: DATA
  9. 9. FAIR EVERYWHERE
  10. 10. RESEARCH DATA MANAGEMENT
  11. 11. DATA SEARCH Antony Scerri, John Kuriakose, Amit Ajit Deshmane, Mark Stanger, Peter Cotroneo, Rebekah Moore, Raj Naik, Anita de Waard; Elsevier’s approach to the bioCADDIE 2016 Dataset Retrieval Challenge, Database, Volume 2017, 1 January 2017, bax056, https://doi.org/10.1093/database/bax056
  12. 12. THE CENTRALITY OF THE USER
  13. 13. HOW DO RESEARCHERS SEARCH FOR DATA? Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017). Searching Data: A Review of Observational Data Retrieval Practices. arXiv preprint arXiv:1707.06937. Some observations from @gregory_km survey: 1. The needs and behaviours of specific user groups (e.g. early career researchers, policy makers, students) are not well documented. 2. Background uses of observational data are better documented than foreground uses. 3. Reconstructing data tables from journal articles, using general search engines, and making direct data requests are common.
  14. 14. BUT ARE WE MISSING A USER?
  15. 15. WHY MACHINES?
  16. 16. ELSEVIER’S BUSINESS: PROVIDING ANSWERS FOR RESEARCHERS, DOCTORS AND NURSES My work is moving towards a new field; what should I know? • Journal articles, reference works, profiles of researchers, funders & institutions • Recommendations of people to connect with, reading lists, topic pages How should I treat my patient given her condition & history? • Journal articles, reference works, medical guidelines, electronic health records • Treatment plan with alternatives personalized for the patient How can I master the subject matter of the course I am taking? • Course syllabus, reference works, course objectives, student history • Quiz plan based on the student’s history and course objectives
  17. 17. INFORMATION OVERLOAD
  18. 18. WHAT CAN MACHINE INTELLIGENCE DO TODAY? If there’s a task that a normal person can do with less than one second of thinking, there’s a very good chance we can automate it with deep learning. Andrew Ng, Chief Scientist, Baidu (lecture at Bay Area Deep Learning School, Stanford, CA, September 24, 2016)
  19. 19. HUMAN SPEECH RECOGNITION Word error rate: 4.9% in 2017 (per the linked article); was 23% in 2013, and over 35% in 2012. https://venturebeat.com/2017/05/17/googles-speech-recognition-technology-now-has-a-4-9-word-error-rate/
  20. 20. IMAGE RECOGNITION https://devblogs.nvidia.com/parallelforall/author/czhang/
  21. 21. THESE RESULTS ARE DRIVEN BY DATA “The paradigm shift of the ImageNet thinking is that while a lot of people are paying attention to models, let’s pay attention to data, …” – Prof. Fei-Fei Li [1] [1] The data that transformed AI research—and possibly the world https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
  22. 22. THE GROWTH IN DATA ENGINEERS https://www.stitchdata.com/resources/reports/the-state-of-data-engineering
  23. 23. BUT DO DIGITAL LIBRARIES HELP MACHINES? • Machines’ proficiency in learning to answer questions from text, audio, images and video will depend on our ability to train them effectively to read information from the Web • How machines read the Web today • Crawling and indexing Web resources, possibly semantically tagged (e.g. using schema.org) • Find-and-follow crawling of open linked data resources for ontology and data sharing and reuse • Programmatic access to APIs mediated through HTTP/S and other Internet protocols
  24. 24. DIGITAL LIBRARIES & LINKED DATA STANDARDS
  25. 25. THE SEMANTIC WEB WAS INTENDED FOR MACHINE READING … that’s the real idea behind the Semantic Web: letting software use the vast collective genius embedded in its published pages. Swartz, A. (2013). Aaron Swartz's A programmable Web: An unfinished work. San Rafael, Calif.: Morgan & Claypool Publishers.
  26. 26. BUT THE SEMANTIC WEB IS BUILT FOR PEOPLE, NOT MACHINES • The Semantic Web is largely a logicist take on the way knowledge is to be represented • The latest advances in machine intelligence are based on a connectionist approach to knowledge representation • There is a gap between how knowledge is represented in the Semantic Web and what deep learning is exploiting to such good effect • The Semantic Web is silent about how machines can become better readers, and hence better partners in the second machine age • How will we evolve metadata standards to better accommodate machines?
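  27. 27. MACHINE READING IS ENABLED BY MACHINE LEARNING [Diagram contrasting two pipelines. Programming: input and algorithm produce output (CPU). Machine learning: data and a learning architecture produce a model (GPU); input and the model then produce output (CPU).]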
  28. 28. MACHINES SEE THINGS DIFFERENTLY THAN PEOPLE From: Alain, G. and Bengio, Y. (2016). Understanding intermediate layers using linear classifier probes. arXiv:1610.01644v1.
  29. 29. MACHINES LEARN THINGS DIFFERENTLY THAN PEOPLE
  30. 30. VOCABULARIES ARE SETS OF VECTOR EMBEDDINGS From: Eisner, B., Rocktäschel, T., Augenstein, I., Bošnjak, M. and Riedel, S. (2016). Emoji2vec: learning emoji representations from their description. arXiv:1609.08359v1.
  31. 31. TRAINING DATASETS ARE GROWING IN VOLUME AND COVERAGE From: Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B. and Vijayanarasimhan, S. YouTube-8M: a large-scale video classification benchmark. arXiv:1609.08675.
  32. 32. MODELS ARE BECOMING REUSABLE DATA RESOURCES Check out: sujitpal.blogspot.com for more
  33. 33. MACHINE LEARNING DATASETS AND MODELS ARE BECOMING PART OF THE WEB • Machines need lots and lots of data to learn how to read • Datasets with ad-hoc formats are being made openly available • Open Images: “~9 million URLs to images that have been annotated with labels spanning over 6000 categories” (The Open Images Dataset. (n.d.). Retrieved September 29, 2016, from https://github.com/openimages/dataset.) • YouTube-8M: “8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4800 Knowledge Graph entities” (Vijayanarasimhan S. and Natsev, P. (2016). Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research. Retrieved September 29, 2016, https://research.googleblog.com/2016/09/announcing-youtube-8m-large-and-diverse.html.) • Stanford Natural Language Inference: “570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference” (The Stanford Natural Language Inference (SNLI) Corpus. (n.d.). Retrieved September 29, 2016, from http://nlp.stanford.edu/projects/snli/.) • Standard architectures for machine (deep) learning are being released as open source • Dense neural networks for classification • Convolutional neural networks for image, audio and video recognition • Recurrent neural networks for sequence processing and generation • Advances in the field are being published quickly and transferred to industrial application just as quickly
  34. 34. THE OPPORTUNITY FOR LIBRARIANS AND PUBLISHERS As machines become increasingly capable of general-purpose language understanding, the burden of effort in building machine intelligences will shift from software engineering to the acquisition, organization and curation of training content and data.
  35. 35. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER SAVE THE TIME OF THE MACHINE READER Perhaps this law is not so self-evident as the others. None the less, it has been responsible for many reforms in library administration and has a great potentiality for effecting many more reforms in the future. Ranganathan, S.R. (1931). The five laws of library science. Madras: The Madras Library Association.
  36. 36. WHAT DOES IT LOOK LIKE TO HAVE MACHINES AS LIBRARY PATRONS? Tasks 1. Dataset / Model / Vocabulary Curation 2. Combating Bias 3. Explanation 4. Interoperability 5. Data Narratives IMAGE SOURCE: HTTP://WESTPORTLIBRARY.ORG/ABOUT/NEWS/ROBOTS-ARRIVE-WESTPORT-LIBRARY
  37. 37. DATASET CURATION
  38. 38. MODEL CURATION
  39. 39. VOCABULARY CURATION
  40. 40. BATTLING BIAS
  41. 41. BATTLING BIAS: ALGORITHMIC LITERACY “Algorithms all have their own ideologies. As computational methods and data science become more and more a part of every aspect of our lives, it is essential that work begin to ensure there is a broader literacy about these techniques and that there is an expansive and deep engagement in the ethical issues surrounding them.” – Trevor Owens (Library of Congress / Former IMLS) http://www.pewinternet.org/2017/02/08/theme-7-the-need-grows-for-algorithmic-literacy-transparency-and-oversight/
  42. 42. THE RIGHT TO AN EXPLANATION “The data subject shall have the right to obtain … the existence of automated decision-making, including profiling … meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.” EU General Data Protection Regulation, Chapter 3, Article 15
  43. 43. PROVENANCE FOR EXPLANATION Credits: Curt Tilmes, Peter Fox. Tilmes, C.; Fox, P.; Ma, X.; McGuinness, D.L.; Privette, A.P.; Smith, A.; Waple, A.; Zednik, S.; Zheng, J.G., "Provenance Representation for the National Climate Assessment in the Global Change Information System," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 11, pp. 5160-5168, Nov. 2013
  44. 44. NATIONAL CLIMATE CHANGE ASSESSMENT PROVENANCE
  45. 45. INTEROPERABILITY
  46. 46. DATA NARRATIVE GENERATION Towards Automating Data Narratives. Gil, Y.; and Garijo, D. In Proceedings of the Twenty-Second ACM International Conference on Intelligent User Interfaces (IUI-17), Limassol, Cyprus, 2017.
  47. 47. THE CHALLENGE: DIGITAL LIBRARIES FOR MACHINES • Digital Libraries have made tremendous strides in making media available • The investment in Linked Data and APIs has made integration and building applications easier and can help machine reader use cases • But a new user needs new support: • new forms of media (models, data) • new vocabulary representations • new forms of transparency • new ways to interoperate • new mechanisms to communicate • ….
  48. 48. THANK YOU Dr. Paul Groth | @pgroth | pgroth.com labs.elsevier.com
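
Slide 27 contrasts programming, where a person writes the algorithm, with machine learning, where a model is derived from data. The sketch below illustrates that contrast on a toy task and is not taken from the talk: the task (flagging "long" slide titles), the hand-picked threshold, the function names, and the training examples are all assumptions made for illustration, and the learner is a deliberately simple one-feature threshold classifier (a decision stump), not deep learning.

```python
# Programming: a person writes the rule (the "algorithm") by hand.
def is_long_title_programmed(title: str) -> bool:
    return len(title.split()) > 8  # hand-picked threshold (an assumption)


# Machine learning: the rule (the "model") is derived from labeled data.
def learn_threshold(examples: list[tuple[str, bool]]) -> int:
    """Return the word-count threshold that classifies the most examples correctly."""
    counts = [(len(text.split()), label) for text, label in examples]
    best_threshold, best_correct = 0, -1
    for threshold in sorted({n for n, _ in counts}):
        correct = sum((n > threshold) == label for n, label in counts)
        if correct > best_correct:  # strict '>' keeps the smallest best threshold
            best_threshold, best_correct = threshold, correct
    return best_threshold


def is_long_title_learned(title: str, threshold: int) -> bool:
    return len(title.split()) > threshold


# Hypothetical training data: (title, "is it long?") pairs labeled by a person.
TRAINING_DATA = [
    ("Data search", False),
    ("Why machines", False),
    ("The next media data", False),
    ("Training datasets are growing in volume and coverage", True),
    ("But the semantic web is built for people not machines", True),
    ("Machine learning datasets and models are becoming part of the web", True),
]

model = learn_threshold(TRAINING_DATA)
print(model)
print(is_long_title_learned("Models are becoming reusable data resources", model))
```

In the programmed version the curation effort goes into the rule; in the learned version it goes into `TRAINING_DATA`, which is the shift in effort that slide 34 describes.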

Editor's Notes

  • 8800 facebook group
    print
  • Media
  • 115 organizations
  • Work with DANS
    Reviewed 400 papers; deep dive on 114
  • Sundar Pichai
  • These laws are:
    Books are for use.
    Every reader his / her book.
    Every book its reader.
    Save the time of the reader.
    The library is a growing organism.
  • Obviously, this is facetious. The “patron” is the machine learning faculty, not the machine itself.
  • Identifying and document
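
Slide 30 describes vocabularies as sets of vector embeddings. As a rough illustration of what that means for a machine reader, the sketch below represents three terms as vectors and finds the closest term by cosine similarity. The terms and their three-dimensional vectors are made up for this example; real embeddings are learned from data and have many more dimensions.

```python
import math

# Hypothetical embeddings, invented for illustration only.
EMBEDDINGS = {
    "library": [0.9, 0.1, 0.2],
    "archive": [0.8, 0.2, 0.3],
    "robot": [0.1, 0.9, 0.4],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(term: str) -> str:
    """Return the other vocabulary term whose vector is most similar to `term`."""
    return max(
        (other for other in EMBEDDINGS if other != term),
        key=lambda other: cosine_similarity(EMBEDDINGS[term], EMBEDDINGS[other]),
    )


print(nearest("library"))
```

With these made-up vectors, "library" sits closer to "archive" than to "robot"; relatedness is read off geometry rather than from an explicit broader/narrower link in a controlled vocabulary.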
