MACHINES ARE PEOPLE TOO
Dr. Paul Groth | @pgroth | pgroth.com
Disruptive Technology Director
Elsevier Labs | @elsevierlabs
Theory and Practice of Digital Libraries 2017
THANKS FOR CONVERSATION & SLIDES!
Riffing off of Brad’s Dublin Core 2016 keynote:
https://www.slideshare.net/bpa777/dc2016-keynote-20161013-67164305
THE SUCCESS OF DIGITAL LIBRARIES
“Live every day like it's NBER day”
THE NEXT MEDIA: DATA
FAIR EVERYWHERE
RESEARCH DATA MANAGEMENT
DATA SEARCH
Antony Scerri, John Kuriakose, Amit Ajit Deshmane, Mark Stanger, Peter Cotroneo, Rebekah Moore, Raj Naik, Anita de Waard;
Elsevier’s approach to the bioCADDIE 2016 Dataset Retrieval Challenge, Database, Volume 2017, 1 January 2017,
bax056, https://doi.org/10.1093/database/bax056
THE CENTRALITY OF THE USER
HOW DO RESEARCHERS SEARCH FOR DATA?
Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A.,
& Wyatt, S. (2017). Searching Data: A Review of
Observational Data Retrieval Practices. arXiv
preprint arXiv:1707.06937.
Some observations from @gregory_km’s survey:
1. The needs and behaviours of specific user groups
(e.g. early career researchers, policy makers,
students) are not well documented.
2. Background uses of observational data are better
documented than foreground uses.
3. Reconstructing data tables from journal articles,
using general search engines, and making direct data
requests are common.
BUT ARE WE MISSING A USER?
WHY MACHINES?
ELSEVIER’S BUSINESS: PROVIDING ANSWERS FOR
RESEARCHERS, DOCTORS AND NURSES
My work is moving towards a new field; what should I know?
• Journal articles, reference works, profiles of researchers, funders &
institutions
• Recommendations of people to connect with, reading lists, topic pages
How should I treat my patient given her condition & history?
• Journal articles, reference works, medical guidelines, electronic health
records
• Treatment plan with alternatives personalized for the patient
How can I master the subject matter of the course I am taking?
• Course syllabus, reference works, course objectives, student history
• Quiz plan based on the student’s history and course objectives
INFORMATION OVERLOAD
WHAT CAN MACHINE INTELLIGENCE DO TODAY?
If there’s a task that a normal person can do with
less than one second of thinking, there’s a very
good chance we can automate it with deep
learning.
Andrew Ng, Chief Scientist, Baidu (lecture at Bay Area Deep Learning
School, Stanford, CA, September 24, 2016)
HUMAN SPEECH RECOGNITION
Word error rate was 23% in 2013, and over 35% in 2012.
https://venturebeat.com/2017/05/17/googles-speech-recognition-technology-now-has-a-4-9-word-error-rate/
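The figures on this slide are word error rates, as the title of the linked article indicates. A minimal sketch of how that metric is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The function name and the sample sentences are illustrative assumptions, not taken from the talk or from any particular speech system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one of six reference words is dropped.
    print(word_error_rate("save the time of the reader",
                          "save the time of reader"))
```

Because insertions count as errors, the rate can exceed 100% when the output is much longer than the reference.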
IMAGE RECOGNITION
https://devblogs.nvidia.com/parallelforall/author/czhang/
THESE RESULTS ARE DRIVEN BY DATA
“The paradigm shift of the ImageNet
thinking is that while a lot of people
are paying attention to models, let’s
pay attention to data, …”
– Prof. Fei-Fei Li [1]
[1] The data that transformed AI research—and possibly the world
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
THE GROWTH IN DATA ENGINEERS
https://www.stitchdata.com/resources/reports/the-state-of-data-engineering
BUT DO DIGITAL LIBRARIES HELP MACHINES?
• Machines’ proficiency in learning to answer questions from text, audio,
images and video will depend on our ability to train them effectively to read
information from the Web
• How machines read the Web today
• Crawling and indexing Web resources, possibly semantically tagged
(e.g. using schema.org)
• Find-and-follow crawling of open linked data resources for ontology and
data sharing and reuse
• Programmatic access to APIs mediated through HTTP/S and other
Internet protocols
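As a rough illustration of the first route above (crawling pages that carry schema.org tagging), the sketch below pulls JSON-LD metadata out of an HTML page using only the Python standard library. The sample page, the class name and the helper name are assumptions made up for this example; a real crawler would also need fetching, politeness rules and support for other embedding syntaxes such as Microdata or RDFa.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed metadata rather than fail the crawl


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page describing a dataset with schema.org terms.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example observations", "license": "https://example.org/license"}
</script></head><body><h1>Example observations</h1></body></html>"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_PAGE):
        print(item["@type"], "-", item["name"])
```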
DIGITAL LIBRARIES & LINKED DATA STANDARDS
THE SEMANTIC WEB WAS INTENDED FOR MACHINE READING
… that’s the real idea behind the Semantic Web:
letting software use the vast collective genius
embedded in its published pages.
Swartz, A. (2013). Aaron Swartz's A programmable Web: An unfinished
work. San Rafael, Calif.: Morgan & Claypool Publishers.
BUT THE SEMANTIC WEB IS BUILT FOR PEOPLE, NOT MACHINES
• The Semantic Web is largely a logicist take on the way knowledge is to be
represented
• The latest advances in machine intelligence are based on a connectionist
approach to knowledge representation
• There is a gap between how knowledge is represented in the Semantic Web
and what deep learning is exploiting to such good effect
• The Semantic Web is silent about how machines can become better
readers, and hence better partners in the second machine age
• How will we evolve metadata standards to better accommodate machines?
MACHINE READING IS ENABLED BY MACHINE LEARNING
[Diagram: programming versus machine learning. Programming: input, algorithm, output. Machine learning: data, learning architecture, model, input, output. Hardware labels: GPU, CPU.]
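The contrast on this slide can be sketched in a few lines. In the programmed version a person writes the rule; in the learned version the rule (the model) is derived from labeled data and then applied to new input. Everything here is a toy assumption chosen for illustration: the temperature task, the example data and the brute-force threshold search stand in for the far larger datasets and learning architectures the talk has in mind.

```python
def is_warm_programmed(temp_c: float) -> bool:
    # Programming: the algorithm (a threshold of 20 degrees) is written by hand.
    return temp_c > 20.0


def learn_threshold(examples):
    """Machine learning, in miniature: pick the threshold that
    misclassifies the fewest labeled (value, label) examples."""
    values = sorted({x for x, _ in examples})
    if len(values) < 2:
        raise ValueError("need at least two distinct values to learn from")
    # Candidate thresholds are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((x > threshold) != label for x, label in examples)

    return min(candidates, key=errors)


# Hypothetical training data: (temperature in Celsius, "is it warm?").
EXAMPLES = [(5, False), (12, False), (18, False),
            (22, True), (27, True), (31, True)]

if __name__ == "__main__":
    threshold = learn_threshold(EXAMPLES)  # the learned "model"
    print("learned threshold:", threshold)
    print("prediction for 25:", 25 > threshold)
```

The point of the sketch is where the effort goes: improving the second version means improving `EXAMPLES`, not editing the code.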
MACHINES SEE THINGS DIFFERENTLY THAN PEOPLE
From: Alain, G. and Bengio, Y. (2016). Understanding intermediate layers using linear classifier probes. arXiv:1610.01644v1.
MACHINES LEARN THINGS DIFFERENTLY THAN PEOPLE
VOCABULARIES ARE SETS OF VECTOR EMBEDDINGS
From: Eisner, B., Rocktäschel, T., Augenstein, I., Bošnjak, M. and Riedel, S. (2016). Emoji2vec: learning emoji representations from their description. arXiv:1609.08359v1.
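To make the slide title concrete: when a vocabulary is a set of vectors, "related term" becomes a geometric question, typically answered with cosine similarity. The sketch below uses three made-up, three-dimensional vectors purely as an assumption for illustration; real embeddings such as those in the cited paper are learned from data and have many more dimensions.

```python
import math

# Made-up vectors for illustration only; not learned from any corpus.
EMBEDDINGS = {
    "library": [0.9, 0.1, 0.3],
    "archive": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(term, embeddings):
    """Return the other vocabulary term whose vector is most similar."""
    return max(
        (other for other in embeddings if other != term),
        key=lambda other: cosine(embeddings[term], embeddings[other]),
    )


if __name__ == "__main__":
    print(nearest("library", EMBEDDINGS))
```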
TRAINING DATASETS ARE GROWING IN VOLUME AND COVERAGE
From: Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B. and Vijayanarasimhan, S. YouTube-8M: a large-scale video classification benchmark. arXiv:1609.08675.
MODELS ARE BECOMING REUSABLE DATA RESOURCES
Check out: sujitpal.blogspot.com for more
MACHINE LEARNING DATASETS AND MODELS ARE BECOMING
PART OF THE WEB
• Machines need lots and lots of data to learn how to read
• Datasets with ad-hoc formats are being made openly available
• Open Images: “~9 million URLs to images that have been annotated with labels spanning over 6000 categories” (The Open Images Dataset. (n.d.). Retrieved September 29, 2016, from https://github.com/openimages/dataset.)
• YouTube-8M: “8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4800 Knowledge Graph entities” (Vijayanarasimhan, S. and Natsev, P. (2016). Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research. Retrieved September 29, 2016, from https://research.googleblog.com/2016/09/announcing-youtube-8m-large-and-diverse.html.)
• Stanford Natural Language Inference: “570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference” (The Stanford Natural Language Inference (SNLI) Corpus. (n.d.). Retrieved September 29, 2016, from http://nlp.stanford.edu/projects/snli/.)
• Standard architectures for machine (deep) learning are being released as open source
• Dense neural networks for classification
• Convolutional neural networks for image, audio and video recognition
• Recurrent neural networks for sequence processing and generation
• Advances in the field are being published quickly and transferred to industrial application just as
quickly
THE OPPORTUNITY FOR LIBRARIANS AND PUBLISHERS
As machines become increasingly capable of general-purpose
language understanding, the burden of effort in
building machine intelligences will shift from software
engineering to the acquisition, organization and curation
of training content and data.
THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
SAVE THE TIME OF THE MACHINE READER
Perhaps this law is not so self-evident as the others.
None the less, it has been responsible for many
reforms in library administration and has a great
potentiality for effecting many more reforms in the
future.
Ranganathan, S.R. (1931). The five laws of library science. Madras: The
Madras Library Association.
IMAGE SOURCE: HTTP://WESTPORTLIBRARY.ORG/ABOUT/NEWS/ROBOTS-ARRIVE-WESTPORT-LIBRARY
WHAT DOES IT LOOK LIKE TO HAVE MACHINES AS
LIBRARY PATRONS?
Tasks
1. Dataset / Model / Vocabulary Curation
2. Combating Bias
3. Explanation
4. Interoperability
5. Data Narratives
DATASET CURATION
MODEL CURATION
VOCABULARY CURATION
BATTLING BIAS
BATTLING BIAS: ALGORITHMIC LITERACY
“Algorithms all have their own ideologies. As computational
methods and data science become more and more a part of
every aspect of our lives, it is essential that work begin to ensure
there is a broader literacy about these techniques and that
there is an expansive and deep engagement in the ethical
issues surrounding them.”
– Trevor Owens (Library of Congress / Former IMLS)
http://www.pewinternet.org/2017/02/08/theme-7-the-need-grows-for-algorithmic-literacy-transparency-and-oversight/
THE RIGHT TO AN EXPLANATION
“The data subject shall have the right to obtain … the
existence of automated decision-making, including profiling
… meaningful information about the logic involved, as
well as the significance and the envisaged consequences
of such processing for the data subject.”
EU General Data Protection Regulation (GDPR), Chapter 3, Article 15
PROVENANCE FOR EXPLANATION
Credits: Curt Tilmes, Peter Fox
Tilmes, C.; Fox, P.; Ma, X.; McGuinness, D.L.; Privette, A.P.; Smith, A.; Waple, A.; Zednik, S.; Zheng, J.G.,
"Provenance Representation for the National Climate Assessment in the Global Change Information System,"
Geoscience and Remote Sensing, IEEE Transactions on , vol.51, no.11, pp.5160,5168, Nov. 2013
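A minimal sketch of how recorded provenance can back an explanation: each item stores what generated it and what that step used, and an explanation is a walk back to the sources. The record layout, the item names and the `explain` helper are hypothetical, chosen for this example; the field names only loosely echo W3C PROV terms ("was generated by", "used") and do not reproduce the representation in the cited paper. The walk assumes the records contain no cycles.

```python
# Hypothetical provenance records: item -> generating step and its inputs.
PROVENANCE = {
    "figure": {"generated_by": "plotting", "used": ["dataset"]},
    "dataset": {"generated_by": "quality-control", "used": ["raw-observations"]},
    "raw-observations": {"generated_by": None, "used": []},
}


def explain(entity, records, depth=0):
    """Return indented lines tracing an entity back to its sources."""
    record = records[entity]
    indent = "  " * depth
    if record["generated_by"] is None:
        lines = [f"{indent}{entity} (source)"]
    else:
        lines = [f"{indent}{entity} was generated by {record['generated_by']}"]
    # Recurse into everything the generating step used.
    for source in record["used"]:
        lines.extend(explain(source, records, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(explain("figure", PROVENANCE)))
```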
NATIONAL CLIMATE CHANGE ASSESSMENT
PROVENANCE
INTEROPERABILITY
DATA NARRATIVE GENERATION
Towards Automating Data Narratives.
Gil, Y.; and Garijo, D. In Proceedings of the
Twenty-Second ACM International Conference
on Intelligent User Interfaces (IUI-17),
Limassol, Cyprus, 2017.
THE CHALLENGE: DIGITAL LIBRARIES FOR MACHINES
• Digital Libraries have made tremendous strides in making media available
• The investment in Linked Data and APIs has made integration and building
applications easier and can help machine reader use cases
• But a new user needs new support:
• new forms of media (models, data)
• new vocabulary representations
• new forms of transparency
• new ways to interoperate
• new mechanisms to communicate
• ….
THANK YOU
Dr. Paul Groth | @pgroth | pgroth.com
labs.elsevier.com

More Related Content

What's hot

Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?Paul Groth
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of DataPaul Groth
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningPaul Groth
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data ShowcasingPaul Groth
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionUniversity of Washington
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkPaul Groth
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps. Richard Layton
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Carole Goble
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsCarole Goble
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
Data science and privacy regulation
Data science and privacy regulationData science and privacy regulation
Data science and privacy regulationblogzilla
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphPaul Groth
 
Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceUniversity of Washington
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityJames Hendler
 

What's hot (20)

Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learning
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data Interaction
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research Objects
 
Science Data, Responsibly
Science Data, ResponsiblyScience Data, Responsibly
Science Data, Responsibly
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Data science and privacy regulation
Data science and privacy regulationData science and privacy regulation
Data science and privacy regulation
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data Science
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/Interoperability
 

Similar to Machines are people too

Discoverability and Web-Enabled Science - #ScholarAfrica
Discoverability and Web-Enabled Science - #ScholarAfricaDiscoverability and Web-Enabled Science - #ScholarAfrica
Discoverability and Web-Enabled Science - #ScholarAfricaKaitlin Thaney
 
Social Machines of Scholarly Collaboration
Social Machines of Scholarly CollaborationSocial Machines of Scholarly Collaboration
Social Machines of Scholarly CollaborationDavid De Roure
 
Semantic Technologies in Learning Environments
Semantic Technologies in Learning EnvironmentsSemantic Technologies in Learning Environments
Semantic Technologies in Learning EnvironmentsDragan Gasevic
 
DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13Bradley Allen
 
Social Machines of Science and Scholarship
Social Machines of Science and ScholarshipSocial Machines of Science and Scholarship
Social Machines of Science and ScholarshipDavid De Roure
 
Scholarly Social Machines
Scholarly Social MachinesScholarly Social Machines
Scholarly Social MachinesDavid De Roure
 
Leveraging the power of the web - Open Repositories 2015
Leveraging the power of the web - Open Repositories 2015Leveraging the power of the web - Open Repositories 2015
Leveraging the power of the web - Open Repositories 2015Kaitlin Thaney
 
Engineering a Data Scientist
Engineering a Data ScientistEngineering a Data Scientist
Engineering a Data ScientistAron Ahmadia
 
From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...Ig Bittencourt
 
myExperiment and the Rise of Social Machines
myExperiment and the Rise of Social MachinesmyExperiment and the Rise of Social Machines
myExperiment and the Rise of Social MachinesDavid De Roure
 
Making the web work for science - RIT Dean's Lecture Series
Making the web work for science - RIT Dean's Lecture SeriesMaking the web work for science - RIT Dean's Lecture Series
Making the web work for science - RIT Dean's Lecture SeriesKaitlin Thaney
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsAnita de Waard
 
Gridforum David De Roure Newe Science 20080402
Gridforum David De Roure Newe Science 20080402Gridforum David De Roure Newe Science 20080402
Gridforum David De Roure Newe Science 20080402vrij
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Carole Goble
 
LIS Game Changer Trends and Profession Motivation by Muhammad Shafiq Rana
LIS Game Changer Trends and Profession Motivation by Muhammad Shafiq RanaLIS Game Changer Trends and Profession Motivation by Muhammad Shafiq Rana
LIS Game Changer Trends and Profession Motivation by Muhammad Shafiq RanaAta Rehman
 
New e-Science Edinburgh Late Edition
New e-Science Edinburgh Late EditionNew e-Science Edinburgh Late Edition
New e-Science Edinburgh Late EditionDavid De Roure
 

Similar to Machines are people too (20)

Discoverability and Web-Enabled Science - #ScholarAfrica
Discoverability and Web-Enabled Science - #ScholarAfricaDiscoverability and Web-Enabled Science - #ScholarAfrica
Discoverability and Web-Enabled Science - #ScholarAfrica
 
Social Machines of Scholarly Collaboration
Social Machines of Scholarly CollaborationSocial Machines of Scholarly Collaboration
Social Machines of Scholarly Collaboration
 
Semantic Technologies in Learning Environments
Semantic Technologies in Learning EnvironmentsSemantic Technologies in Learning Environments
Semantic Technologies in Learning Environments
 
DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13
 
Social Machines of Science and Scholarship
Social Machines of Science and ScholarshipSocial Machines of Science and Scholarship
Social Machines of Science and Scholarship
 
Scholarly Social Machines
Scholarly Social MachinesScholarly Social Machines
Scholarly Social Machines
 
Leveraging the power of the web - Open Repositories 2015
Leveraging the power of the web - Open Repositories 2015Leveraging the power of the web - Open Repositories 2015
Leveraging the power of the web - Open Repositories 2015
 
Engineering a Data Scientist
Engineering a Data ScientistEngineering a Data Scientist
Engineering a Data Scientist
 
Final Johnson Research Libraries and Computational Research
Final Johnson Research Libraries and Computational ResearchFinal Johnson Research Libraries and Computational Research
Final Johnson Research Libraries and Computational Research
 
Information entanglement
Information entanglementInformation entanglement
Information entanglement
 
2014_WWW_BTOR
2014_WWW_BTOR2014_WWW_BTOR
2014_WWW_BTOR
 
From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...
 
myExperiment and the Rise of Social Machines
myExperiment and the Rise of Social MachinesmyExperiment and the Rise of Social Machines
myExperiment and the Rise of Social Machines
 
Making the web work for science - RIT Dean's Lecture Series
Making the web work for science - RIT Dean's Lecture SeriesMaking the web work for science - RIT Dean's Lecture Series
Making the web work for science - RIT Dean's Lecture Series
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Gridforum David De Roure Newe Science 20080402
Gridforum David De Roure Newe Science 20080402Gridforum David De Roure Newe Science 20080402
Gridforum David De Roure Newe Science 20080402
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
 
LIS Game Changer Trends and Profession Motivation by Muhammad Shafiq Rana
LIS Game Changer Trends and Profession Motivation by Muhammad Shafiq RanaLIS Game Changer Trends and Profession Motivation by Muhammad Shafiq Rana
LIS Game Changer Trends and Profession Motivation by Muhammad Shafiq Rana
 
New e-Science Edinburgh Late Edition
New e-Science Edinburgh Late EditionNew e-Science Edinburgh Late Edition
New e-Science Edinburgh Late Edition
 
Bibliotheek & Onderzoek 2.0?
Bibliotheek & Onderzoek 2.0?Bibliotheek & Onderzoek 2.0?
Bibliotheek & Onderzoek 2.0?
 

More from Paul Groth

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIPaul Groth
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph FuturesPaul Groth
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationPaul Groth
 
Are we finally ready for transclusion?*
Are we finally ready for transclusion?*Are we finally ready for transclusion?*
Are we finally ready for transclusion?*Paul Groth
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialPaul Groth
 
Data for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersData for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersPaul Groth
 
Tradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CaptureTradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CapturePaul Groth
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaPaul Groth
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at ElsevierPaul Groth
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 

More from Paul Groth (11)

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AI
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computation
 
Are we finally ready for transclusion?*
Are we finally ready for transclusion?*Are we finally ready for transclusion?*
Are we finally ready for transclusion?*
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational Material
 
Data for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersData for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchers
 
Tradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CaptureTradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance Capture
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPedia
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at Elsevier
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 

Machines are people too

  • 1. MACHINES ARE PEOPLE TOO Dr. Paul Groth | @pgroth | pgroth.com Disruptive Technology Director Elsevier Labs | @elsevierlabs Theory and Practice of Digital Libraries 2017
  • 2. THANKS FOR CONVERSATION & SLIDES! Riffing off of Brad’s Dublin Core 2016 keynote https://www.slideshare.net/bpa777/dc2016-keynote-20161013-67164305
  • 3. THE SUCCESS OF DIGITAL LIBRARIES “Live every day like it's NBER day”
  • 4. THE SUCCESS OF DIGITAL LIBRARIES
  • 5. THE SUCCESS OF DIGITAL LIBRARIES
  • 6. THE SUCCESS OF DIGITAL LIBRARIES
  • 7. THE SUCCESS OF DIGITAL LIBRARIES
  • 13. DATA SEARCH Antony Scerri, John Kuriakose, Amit Ajit Deshmane, Mark Stanger, Peter Cotroneo, Rebekah Moore, Raj Naik, Anita de Waard; Elsevier’s approach to the bioCADDIE 2016 Dataset Retrieval Challenge, Database, Volume 2017, 1 January 2017, bax056, https://doi.org/10.1093/database/bax056
  • 14. THE CENTRALITY OF THE USER
  • 15. HOW DO RESEARCHERS SEARCH FOR DATA? Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017). Searching Data: A Review of Observational Data Retrieval Practices. arXiv preprint arXiv:1707.06937. Some observations from @gregory_km survey: 1. The needs and behaviours of specific user groups (e.g. early career researchers, policy makers, students) are not well documented. 2. Background uses of observational data are better documented than foreground uses. 3. Reconstructing data tables from journal articles, using general search engines, and making direct data requests are common.
  • 16. BUT ARE WE MISSING A USER?
  • 18. ELSEVIER’S BUSINESS: PROVIDING ANSWERS FOR RESEARCHERS, DOCTORS AND NURSES My work is moving towards a new field; what should I know? • Journal articles, reference works, profiles of researchers, funders & institutions • Recommendations of people to connect with, reading lists, topic pages How should I treat my patient given her condition & history? • Journal articles, reference works, medical guidelines, electronic health records • Treatment plan with alternatives personalized for the patient How can I master the subject matter of the course I am taking? • Course syllabus, reference works, course objectives, student history • Quiz plan based on the student’s history and course objectives
  • 20. WHAT CAN MACHINE INTELLIGENCE DO TODAY? If there’s a task that a normal person can do with less than one second of thinking, there’s a very good chance we can automate it with deep learning. Andrew Ng, Chief Scientist, Baidu (lecture at Bay Area Deep Learning School, Stanford, CA, September 24, 2016)
  • 21. HUMAN SPEECH RECOGNITION Word error rate was 23% in 2013, and over 35% in 2012. https://venturebeat.com/2017/05/17/googles-speech-recognition-technology-now-has-a-4-9-word-error-rate/
  • 23. THESE RESULTS ARE DRIVEN BY DATA “The paradigm shift of the ImageNet thinking is that while a lot of people are paying attention to models, let’s pay attention to data, …” – Prof. Fei-Fei Li [1] [1] The data that transformed AI research—and possibly the world https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
  • 24. THE GROWTH IN DATA ENGINEERS https://www.stitchdata.com/resources/reports/the-state-of-data-engineering
  • 25. BUT DO DIGITAL LIBRARIES HELP MACHINES? • Machines’ proficiency in learning to answer questions from text, audio, images and video will depend on our ability to train them effectively to read information from the Web • How machines read the Web today • Crawling and indexing Web resources, possibly semantically tagged (e.g. using schema.org) • Find-and-follow crawling of open linked data resources for ontology and data sharing and reuse • Programmatic access to APIs mediated through HTTP/S and other Internet protocols
  • 26. DIGITAL LIBRARIES & LINKED DATA STANDARDS
  • 27. THE SEMANTIC WEB WAS INTENDED FOR MACHINE READING … that’s the real idea behind the Semantic Web: letting software use the vast collective genius embedded in its published pages. Swartz, A. (2013). Aaron Swartz's A programmable Web: An unfinished work. San Rafael, Calif.: Morgan & Claypool Publishers.
  • 28. BUT THE SEMANTIC WEB IS BUILT FOR PEOPLE, NOT MACHINES • The Semantic Web is largely a logicist take on the way knowledge is to be represented • The latest advances in machine intelligence are based on a connectionist approach to knowledge representation • There is a gap between how knowledge is represented in the Semantic Web and what deep learning is exploiting to such good effect • The Semantic Web is silent about how machines can become better readers, and hence better partners in the second machine age • How will we evolve metadata standards to better accommodate machines?
  • 29. MACHINE READING IS ENABLED BY MACHINE LEARNING input output algorithm input output model learning architecture data Programming Machine learning GPU CPU CPU
  • 30. MACHINES SEE THINGS DIFFERENTLY THAN PEOPLE From: Alain, G. and Bengio, Y. (2016). Understanding intermediate layers using linear classifier probes. arXiv:1610.01644v1.
  • 31. MACHINES LEARN THINGS DIFFERENTLY THAN PEOPLE
  • 32. VOCABULARIES ARE SETS OF VECTOR EMBEDDINGS From: Eisner, B., Rocktäschel, T., Augenstein, I., Bošnjak, M. and Riedel, S. (2016). Emoji2vec: learning emoji representations from their description. arXiv:1609.08359v1.
  • 33. TRAINING DATASETS ARE GROWING IN VOLUME AND COVERAGE From: Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B. and Vijayanarasimhan, S. YouTube-8M: a large-scale video classification benchmark. arXiv:1609.08675.
  • 34. MODELS ARE BECOMING REUSABLE DATA RESOURCES Check out: sujitpal.blogspot.com for more
  • 35. MACHINE LEARNING DATASETS AND MODELS ARE BECOMING PART OF THE WEB • Machines need lots and lots of data to learn how to read • Datasets with ad-hoc formats are being made openly available • Open Images: “~9 million URLs to images that have been annotated with labels spanning over 6000 categories” (The Open Images Dataset. (n.d.). Retrieved September 29, 2016, from https://github.com/openimages/dataset.) • YouTube-8M: “8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4800 Knowledge Graph entities” (Vijayanarasimhan S. and Natsev, P. (2016). Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research. Retrieved September 29, 2016, https://research.googleblog.com/2016/09/announcing-youtube-8m-large-and-diverse.html.) • Stanford Natural Language Inference: “570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference” (The Stanford Natural Language Inference (SNLI) Corpus. (n.d.). Retrieved September 29, 2016, from http://nlp.stanford.edu/projects/snli/.) • Standard architectures for machine (deep) learning are being released as open source • Dense neural networks for classification • Convolutional neural networks for image, audio and video recognition • Recurrent neural networks for sequence processing and generation • Advances in the field are being published quickly and transferred to industrial application just as quickly
  • 36. THE OPPORTUNITY FOR LIBRARIANS AND PUBLISHERS As machines become increasingly capable of general-purpose language understanding, the burden of effort in building machine intelligences will shift from software engineering to the acquisition, organization and curation of training content and data.
  • 37. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER SAVE THE TIME OF THE MACHINE READER Perhaps this law is not so self-evident as the others. None the less, it has been responsible for many reforms in library administration and has a great potentiality for effecting many more reforms in the future. Ranganathan, S.R. (1931). The five laws of library science. Madras: The Madras Library Association.
  • 38. IMAGE SOURCE: HTTP://WESTPORTLIBRARY.ORG/ABOUT/NEWS/ROBOTS-ARRIVE-WESTPORT-LIBRARY WHAT DOES IT LOOK LIKE TO HAVE MACHINES AS LIBRARY PATRONS? Tasks 1. Dataset / Model / Vocabulary Curation 2. Combating Bias 3. Explanation 4. Interoperability 5. Data Narratives
  • 43. BATTLING BIAS: ALGORITHMIC LITERACY “Algorithms all have their own ideologies. As computational methods and data science become more and more a part of every aspect of our lives, it is essential that work begin to ensure there is a broader literacy about these techniques and that there is an expansive and deep engagement in the ethical issues surrounding them.” – Trevor Owens (Library of Congress / Former IMLS) http://www.pewinternet.org/2017/02/08/theme-7-the-need-grows-for-algorithmic-literacy-transparency-and-oversight/
  • 44. THE RIGHT TO AN EXPLANATION “The data subject shall have the right to obtain … the existence of automated decision-making, including profiling … meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.” EU General Data Protection Regulation, Chapter 3, Article 15
  • 45. PROVENANCE FOR EXPLANATION Credits: Curt Tilmes, Peter Fox Tilmes, C.; Fox, P.; Ma, X.; McGuinness, D.L.; Privette, A.P.; Smith, A.; Waple, A.; Zednik, S.; Zheng, J.G., "Provenance Representation for the National Climate Assessment in the Global Change Information System," Geoscience and Remote Sensing, IEEE Transactions on , vol.51, no.11, pp.5160,5168, Nov. 2013
  • 46. NATIONAL CLIMATE CHANGE ASSESSMENT PROVENANCE
  • 48. DATA NARRATIVE GENERATION Towards Automating Data Narratives. Gil, Y.; and Garijo, D. In Proceedings of the Twenty-Second ACM International Conference on Intelligent User Interfaces (IUI-17), Limassol, Cyprus, 2017.
  • 49. THE CHALLENGE: DIGITAL LIBRARIES FOR MACHINES • Digital Libraries have made tremendous strides in making media available • The investment in Linked Data and APIs has made integration and building applications easier and can help machine reader use cases • But a new user needs new support: • new forms of media (models, data) • new vocabulary representations • new forms of transparency • new ways to interoperate • new mechanisms to communicate • ….
  • 50. THANK YOU Dr. Paul Groth | @pgroth | pgroth.com labs.elsevier.com
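
Slide 25 lists crawling of Web resources that are semantically tagged (e.g. using schema.org) as one of the ways machines read the Web today. The following is a minimal sketch of that step, not the approach used in the talk: it assumes the landing page embeds schema.org metadata as JSON-LD in a `<script type="application/ld+json">` element, and the names `JsonLdExtractor`, `extract_datasets` and `SAMPLE_PAGE` are hypothetical, introduced only for illustration. It uses only the Python standard library.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script element opens.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than fail the whole page


def extract_datasets(html):
    """Return the JSON-LD objects on a page whose @type is schema.org Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [item for item in parser.items
            if isinstance(item, dict) and item.get("@type") == "Dataset"]


# Hypothetical landing page for an invented dataset (illustration only).
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example observations",
 "license": "https://creativecommons.org/licenses/by/4.0/"}
</script>
</head><body><p>Landing page for a hypothetical dataset.</p></body></html>
"""

datasets = extract_datasets(SAMPLE_PAGE)
print(datasets[0]["name"])
```

A real crawler would also need to fetch pages, respect robots.txt, and handle JSON-LD that wraps several entities in a list or `@graph`; the sketch leaves those out to keep the machine-reading step itself visible.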

Editor's Notes

  1. 8800 facebook group print
  2. Media
  3. 115 organizations
  4. Work with DANS. Reviewed 400 papers; deep dive on 114.
  5. Sundar Pichai
  6. These laws are: Books are for use. Every reader his / her book. Every book its reader. Save the time of the reader. The library is a growing organism.
  7. Obviously, this is facetious. The “patron” is the machine learning faculty, not the machine itself.
  8. Identifying and document