Analytics Beyond
Usage Numbers
Presented for NISO, September 12, 2018
Corey A Harper (@chrpr)
Applying analytics to metadata,
content, and research
Many thanks for contributions from Brad Allen, Jessica Cox,
Ron Daniel, Helena Deus, Paul Groth, Darin McBeath, and
Tony Scerri
September 12, 2018
Analytics Beyond Usage Numbers
• Introduction to analytics
• Metadata analytics – a DPLA case study
- Metadata visualization
- Metadata completeness & effects on usage
• Information analytics
- “A global information analytics business”
- To support linked data & knowledge graphs
- Examples
• Tools, recommended practice, and conclusion
Books, presentations, and systems
Invokes library assessment
Metadata Analytics
Quantifying metadata – a case study
• “Record completeness”
• Field distributions and statistics
• Usage data and query language
• Natural language processing
- Query language
- Record full text
- Field full text
• Term and bi-gram frequency
• Topic modeling
• Metadata impact on usage
https://journal.code4lib.org/articles/11752
Caveat: data and graphics are from 2016
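A minimal sketch of how "record completeness" and the per-provider subject statistics could be computed. The record layout, field names, and sample values are hypothetical and much simpler than real DPLA records; this is an illustration, not the method used in the case study.

```python
from collections import defaultdict

# Hypothetical, simplified records (real aggregated metadata is far richer).
RECORDS = [
    {"provider": "A", "title": "Map of Texas", "subject": ["Maps", "Texas"], "description": "A map."},
    {"provider": "A", "title": "Portrait", "subject": [], "description": ""},
    {"provider": "B", "title": "Letter", "subject": ["Correspondence"], "description": None},
]
FIELDS = ["title", "subject", "description"]  # assumed set of fields to score


def completeness(record, fields=FIELDS):
    """Fraction of the chosen fields that hold a non-empty value."""
    filled = sum(1 for f in fields if record.get(f))
    return filled / len(fields)


def subject_stats_by_provider(records):
    """Per provider: average subject count and percentage of records with at least one subject."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["provider"]].append(len(r.get("subject") or []))
    return {
        provider: {
            "avg_subjects": sum(counts) / len(counts),
            "pct_with_subject": 100 * sum(1 for c in counts if c > 0) / len(counts),
        }
        for provider, counts in grouped.items()
    }
```

Treating every field as equally important is a simplifying assumption; a weighted score is an equally plausible design.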
Average # of subjects by provider
Percentage of records with subject
Star plots in D3
Average Field Count
Percentage with at least 1
Univ. of N. Texas Metadata Quality Interface
http://dublincore.org/conference/2018/abstracts/#564
Term frequency distributions
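Term and bi-gram frequencies of the kind charted here can be sketched with the standard library alone. The tokenizer below is deliberately naive and the function names are illustrative assumptions, not the tooling behind these slides.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a naive tokenizer chosen for brevity."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_and_bigram_counts(texts):
    """Count single terms and adjacent word pairs across a collection of field values."""
    terms, bigrams = Counter(), Counter()
    for text in texts:
        toks = tokens(text)
        terms.update(toks)
        bigrams.update(zip(toks, toks[1:]))  # pairs never span two field values
    return terms, bigrams
```

Calling `terms.most_common(20)` on the result gives the head of the distribution; boilerplate text repeated across many records tends to dominate it.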
More than ¼ of words are rights statements!
DPLA Google searches
Percent of items with at least 1 view
Caveat: skewed usage data
Predicting usage
Decision tree results
Information Analytics
“A global information analytics business”
To help customers answer questions at point of need:
Elsevier combines content with technology
to provide actionable knowledge
Operational Excellence
Content
• Chemistry database: 500m published experimental facts
• User queries: 13m monthly users on ScienceDirect
• Books: 35,000 published books
• Drug database: 100% of drug information from pharmaceutical companies updated daily
• Research: 16% of the world’s research data and articles published by Elsevier
Technology
• 1,000 technologists employed by Elsevier
• Machine learning: over 1,000 predictive models trained on 1.5 billion electronic health care events
• Machine reading: 475m facts extracted from ScienceDirect
• Collaborative filtering: 1bn scientific articles added by 2.5m researchers analyzed daily to generate over 250m article recommendations
• Semantic enhancement: knowledge on 50m chemicals captured as 11B facts
Elsevier Labs
• Reports into Architecture / Technology
• Mix of researchers and very experienced software developers
• Two main modes of work
- Targeted Research, primarily stuff that’s still 2-3 years out
- Accelerated Development, in partnership with and support of product groups
• Applying state-of-the-art research to the medical and scientific domains
Differences in citation language
Introduction:
Researchers have successfully reprogrammed somatic cells into stem-like cells – known as induced pluripotent stem cells (iPSCs) – which share many of the characteristics of ESCs (Takahashi and Yamanaka, 2006).
Materials and Methods:
Human nephron progenitors were induced from iPSCs (201B7) (Takahashi and Yamanaka, 2006), based on the protocol that we previously established (Taguchi et al., 2014).
Citation language pre- & post- Nobel Prize
https://openaccess.leidenuniv.nl/handle/1887/65351
AnnotationQuery
This allows us to search for:
• a <sentence>
• in the <methods_section>
• that contains
• a citation to
• a Nobel Prize paper (“Nobel_papers.txt”)
https://github.com/elsevierlabs-os/AnnotationQuery
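The query above is essentially a containment test over stand-off annotations. The plain-Python sketch below illustrates that idea only. It is not the AnnotationQuery API, and the `Annotation` type, the annotation kinds, and the citation identifiers are hypothetical.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    kind: str        # e.g. "sentence", "methods_section", "citation"
    start: int       # character offset, inclusive
    end: int         # character offset, exclusive
    value: str = ""  # for citations: an identifier of the cited paper


def contained_in(inner, outer):
    """True when `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def sentences_citing(annotations, section_kind, cited_ids):
    """Sentences inside a `section_kind` section that contain a citation to any of `cited_ids`."""
    sections = [a for a in annotations if a.kind == section_kind]
    citations = [a for a in annotations if a.kind == "citation" and a.value in cited_ids]
    hits = []
    for sentence in (a for a in annotations if a.kind == "sentence"):
        in_section = any(contained_in(sentence, sec) for sec in sections)
        has_citation = any(contained_in(c, sentence) for c in citations)
        if in_section and has_citation:
            hits.append(sentence)
    return hits
```

In this sketch the list of Nobel Prize papers would be loaded into `cited_ids` as a set of identifiers. The nested loops are fine for a single document but would not scale to a corpus.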
Building blocks for text analytics
• Original Markup Annotations
• Part of Speech Annotations
• Sentences, Paragraphs, Noun Phrases, Verb Phrases
• Dependency and Constituency Parse Trees
• Query Across Annotation Sets!
Additional use cases
Units and Measurements
• Find a <numeric pattern>
- e.g. 12 ± 3, 53–55, 0.245
• Followed by a <unit of measurement>
- e.g. °C, μM, hours, h, MPa
Temperatures
• Find a <U&M>
• Of type <temperature>
• With the word <housing>
• in the same <sentence>
• in the <methods_section>
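The two patterns above can be approximated with regular expressions. This is a rough sketch under stated assumptions: the unit list is just the handful of examples from the slide, sentence splitting is naive, and the restriction to the methods section is not modeled. The production pipeline works over annotations rather than raw regexes.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
# A number, optionally followed by "± n" or a range such as "53–55".
NUMERIC = rf"{NUMBER}(?:\s*(?:±|–|-)\s*{NUMBER})?"
UNITS = ["°C", "μM", "hours", "h", "MPa"]  # illustrative subset only
# Longest units first so "hours" is tried before "h".
UNIT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
# The lookahead stops "h" from matching the start of an ordinary word.
MEASUREMENT = re.compile(rf"(?P<value>{NUMERIC})\s*(?P<unit>{UNIT})(?![A-Za-z])")


def housing_temperatures(text):
    """Temperature values from sentences that mention the word 'housing'."""
    found = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if re.search(r"\bhousing\b", sentence, re.IGNORECASE):
            found += [
                m.group("value")
                for m in MEASUREMENT.finditer(sentence)
                if m.group("unit") == "°C"
            ]
    return found
```

Matching on the literal word "housing" follows the slide; a real extractor would likely also need variants such as "housed".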
Units and measurements
• Nanoamperes (nA) for neural cell rheobase values
• Megapascals (MPa) for compressive strength of concrete
• Milligrams per kilogram (mg/kg) for administered drug dosages
Cold Mice Problem
https://ieeexplore.ieee.org/abstract/document/8258456/
Additional parameters
Visualizing data from tables
Information analytics enables:
• Datasets aggregated across the literature
• Knowledge Graphs for specific domains
• Databases of experimental results
• Decision support and question answering systems
This kind of information extraction can be a key
component of realizing the library community’s vision for
linked data in cultural heritage & scholarly research.
A tour of analytics
• From library analytics and metrics,
• To metadata analytics,
• To information analytics and knowledge graph extraction
• Heterogeneous data streams:
- Combined in interesting ways,
- Made queryable and recombinant,
- For use in question answering, visualization, and more.
• Heterogeneous storage
- Databases
- Graphs
- Columnar data formats
- Cloud object storage
• Heterogeneous tools and systems
- Spark and Kafka
- Tableau, D3, Seaborn
- Notebooks (Jupyter, Databricks)
Design guidance
https://dataintensive.net/
Thank you
Corey A Harper
Sr. Technology Researcher
Elsevier Labs
c.harper@elsevier.com
@chrpr
Backup
Implicit metadata
• Term & N-gram Frequencies
• Topic Maps
• Query Language
• Click & Usage Data
• Referral Patterns
Citation Analysis
Library analytics
• Data-informed decision making
• Use cases around:
- Library instruction
- Personalized recommendations
- E-resource cost per use
- Physical collections & space
- Digital collections
• Tying library programs to student GPA
• Building personas from data
• Service point staffing and use
All of this requires
• Data collection and integration:
- University data warehouses
- Library systems
- Subscription vendors
• Data management policies
• Data analysis tools and expertise
Frequency Distributions
Answers are about things, not just Works
Why shouldn’t a search on an author
return information about the author,
including the author’s works? Where
was the author born, when did she
live, what is she known for? … All of
this is possible, but only if we can
make some fundamental changes in
our approach to bibliographic
description. ... The challenge for us
lies in transforming what we can of our
data into interrelated “things” without
overindulging that metaphor.
Coyle, K. (2016). FRBR, before and after: a look at our
bibliographical models. Chicago: ALA Editions.
Building Knowledge Graphs
Plus LAWDI, LOD-LAM, LD4L-Labs, & Many More
https://zepheira.com/ – https://linkedjazz.org/network/ – http://vivo.cornell.edu/
Algorithmic Bias

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptxSolid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdfDanh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
 
Salient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxSalient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptx
 
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdfINU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
Accounting and finance exit exam 2016 E.C.pdf
Accounting and finance exit exam 2016 E.C.pdfAccounting and finance exit exam 2016 E.C.pdf
Accounting and finance exit exam 2016 E.C.pdf
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
NCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdfNCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdf
 
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptBasic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 

Harper Analytics Beyond Usage Numbers

  • 1. Analytics Beyond Usage Numbers Presented for NISO, September 12, 2018 Corey A Harper (@chrpr) Applying analytics to metadata, content, and research Many thanks for contributions from Brad Allen, Jessica Cox, Ron Daniel, Helena Deus, Paul Groth, Darin McBeath, and Tony Scerri
  • 2. • Introduction to analytics • Metadata analytics – a DPLA case study - Metadata visualization - Metadata completeness & effects on usage • Information analytics - “A global information analytics company” - To support linked data & knowledge graphs - Examples • Tools, recommended practice, and conclusion
  • 3. Books, presentations, and systems
  • 4. Invokes library assessment
  • 5. Metadata Analytics
  • 6. Quantifying metadata – a case study • “Record completeness” • Field distributions and statistics • Usage data and query language • Natural language processing - Query language - Record full text - Field full text • Term and bi-gram frequency • Topic modeling • Metadata impact on usage https://journal.code4lib.org/articles/11752 Caveat: data and graphics are from 2016
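A minimal sketch of three of the measures listed on slide 6 (record completeness, percentage of records with a field, and bi-gram frequency). The sample records, the field list, and the function names are all hypothetical; they are not DPLA's schema or the code behind the case study.

```python
from collections import Counter

# Hypothetical sample records; field names are illustrative, not DPLA's schema.
RECORDS = [
    {"title": "Map of Texas", "subject": ["Maps", "Texas"], "rights": "No known copyright"},
    {"title": "Portrait of a woman", "subject": [], "rights": "No known copyright"},
    {"title": "Letter to a friend", "description": "Handwritten letter"},
]
FIELDS = ["title", "subject", "description", "rights"]


def completeness(record, fields=FIELDS):
    """Fraction of the chosen fields that hold a non-empty value."""
    filled = sum(1 for f in fields if record.get(f))
    return filled / len(fields)


def percent_with_field(records, field):
    """Percentage of records with at least one value in `field`."""
    return 100.0 * sum(1 for r in records if r.get(field)) / len(records)


def bigram_counts(texts):
    """Count adjacent lowercase word pairs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


scores = [completeness(r) for r in RECORDS]
subject_pct = percent_with_field(RECORDS, "subject")
rights_bigrams = bigram_counts(r["rights"] for r in RECORDS if "rights" in r)
```

Treating an empty list as "missing" is a design choice here: a record that carries a subject field with no values counts the same as one without the field at all.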
  • 7. Average # of subjects by provider
  • 8. Percentage of records with subject
  • 9. [No slide text]
  • 10. Star plots in D3 (panels: Average Field Count; Percentage with at least 1)
  • 11. Univ. of N. Texas Metadata Quality Interface http://dublincore.org/conference/2018/abstracts/#564
  • 12. Term frequency distributions
  • 13. More than ¼ of words are rights statements!
  • 14. [No slide text]
  • 15. DPLA Google searches
  • 16. Percent of items with at least 1 view
  • 17. Caveat: skewed usage data
  • 18. Predicting usage
  • 19. Decision tree results
  • 20. Information Analytics
  • 21. “A global information analytics business”
  • 22. To help customers answer questions at point of need: Elsevier combines content with technology to provide actionable knowledge. Operational Excellence, Content, Technology. Chemistry database: 500m published experimental facts. User queries: 13m monthly users on ScienceDirect. Books: 35,000 published books. Drug database: 100% of drug information from pharmaceutical companies updated daily. Research: 16% of the world’s research data and articles published by Elsevier. 1,000 technologists employed by Elsevier. Machine learning: over 1,000 predictive models trained on 1.5 billion electronic health care events. Machine reading: 475m facts extracted from ScienceDirect. Collaborative filtering: 1bn scientific articles added by 2.5m researchers analyzed daily to generate over 250m article recommendations. Semantic enhancement: knowledge on 50m chemicals captured as 11B facts.
  • 23. Elsevier Labs • Reports into Architecture / Technology • Mix of Researchers and very experienced software developers • Two main modes of work - Targeted Research, primarily stuff that’s still 2-3 years out - Accelerated Development, in partnership with and support of product groups • Applying state of the art research to medical and scientific domain
  • 24. Differences in citation language • Introduction: “Researchers have successfully reprogrammed somatic cells into stem-like cells – known as induced pluripotent stem cells (iPSCs) – which share many of the characteristics of ESCs (Takahashi and Yamanaka, 2006).” • Materials and Methods: “Human nephron progenitors were induced from iPSCs (201B7) (Takahashi and Yamanaka, 2006), based on the protocol that we previously established (Taguchi et al., 2014).”
  • 25. Citation language pre- & post- Nobel Prize https://openaccess.leidenuniv.nl/handle/1887/65351
  • 26. AnnotationQuery This allows us to search for: • a <sentence> • in the <methods_section> • that contains • a citation to • a Nobel Prize paper (“Nobel_papers.txt”) https://github.com/elsevierlabs-os/AnnotationQuery
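The search described on slide 26 can be illustrated with plain containment tests over stand-off annotations. This is a sketch in ordinary Python, not the AnnotationQuery API; the `Annotation` class, the helper names, the offsets, and the reference keys are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    kind: str        # e.g. "sentence", "methods_section", "citation"
    start: int       # character offset where the span begins
    end: int         # character offset where the span ends (exclusive)
    value: str = ""  # e.g. the key of the cited reference


def of_kind(annotations, kind):
    """Select annotations of one type."""
    return [a for a in annotations if a.kind == kind]


def contained_in(inner, outer):
    """Keep annotations from `inner` that lie entirely inside some `outer` span."""
    return [a for a in inner
            if any(o.start <= a.start and a.end <= o.end for o in outer)]


def containing(outer, inner):
    """Keep annotations from `outer` that fully enclose at least one `inner` span."""
    return [o for o in outer
            if any(o.start <= a.start and a.end <= o.end for a in inner)]


# Hypothetical annotations over a single document.
annotations = [
    Annotation("methods_section", 100, 400),
    Annotation("sentence", 10, 60),
    Annotation("sentence", 120, 200),
    Annotation("sentence", 210, 300),
    Annotation("citation", 30, 55, "takahashi2006"),
    Annotation("citation", 150, 175, "takahashi2006"),
    Annotation("citation", 250, 270, "taguchi2014"),
]
nobel_papers = {"takahashi2006"}  # stand-in for the contents of Nobel_papers.txt

nobel_citations = [c for c in of_kind(annotations, "citation")
                   if c.value in nobel_papers]
methods_sentences = contained_in(of_kind(annotations, "sentence"),
                                 of_kind(annotations, "methods_section"))
hits = containing(methods_sentences, nobel_citations)
```

Only the sentence at offsets 120 to 200 qualifies: it sits inside the methods section and encloses a citation to a listed paper. The nested loops are quadratic, which is fine for a sketch but would likely need an interval index at corpus scale.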
  • 27. Building blocks for text analytics • Original Markup Annotations • Part of Speech Annotations • Sentences, Paragraphs, Noun Phrases, Verb Phrases • Dependency and Constituency Parse Trees • Query Across Annotation Sets!
  • 28. Additional use cases: Units and Measurements • Find a <numeric pattern> (12 ± 3, 53–55, 0.245) • Followed by a <unit of measurement> (°C, μM, hours, h, MPa) Temperatures • Find a <U&M> • Of type <temperature> • With the word <housing> • in the same <sentence> • in the <methods_section>
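The "numeric pattern followed by a unit of measurement" idea from slide 28 can be sketched with a regular expression. The unit list below holds only the examples shown on the slide, and the pattern is an illustrative assumption, not the extraction method used in the work presented.

```python
import re

# Numeric pattern: a number, optionally followed by "± number" or a range.
NUMBER = r"\d+(?:\.\d+)?"
NUMERIC = rf"{NUMBER}(?:\s*(?:±|–|-)\s*{NUMBER})?"
# Small illustrative unit list; longer alternatives come first so "hours" wins over "h".
UNITS = ["hours", "MPa", "°C", "μM", "h"]
UNIT = "|".join(re.escape(u) for u in UNITS)
# Lookarounds are used instead of \b because "°" is not a word character.
MEASUREMENT = re.compile(rf"(?<![\w.])({NUMERIC})\s*({UNIT})(?!\w)")


def find_measurements(text):
    """Return (value, unit) pairs for each numeric pattern followed by a unit."""
    return [(m.group(1), m.group(2)) for m in MEASUREMENT.finditer(text)]


sample = ("Mice were housed at 22 ± 2 °C for 24 h. "
          "Samples reached 53–55 MPa with 0.245 μM inhibitor.")
found = find_measurements(sample)
```

A fixed unit list keeps the sketch short; a real pipeline would presumably draw on a much larger unit vocabulary and then restrict matches by sentence and section, as the "Temperatures" query on the slide describes.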
  • 29. Units and measurements • Nanoamperes (nA) for neural cell rheobase values • Megapascals (MPa) for compressive strength of concrete • Milligrams per kilogram (mg/kg) for administered drug dosages
  • 30. Cold Mice Problem
  • 31. https://ieeexplore.ieee.org/abstract/document/8258456/
  • 32. Additional parameters
  • 33. Visualizing data from tables
  • 34. Information analytics enables: • Datasets aggregated across the literature • Knowledge Graphs for specific domains • Databases of experimental results • Decision support and question answering systems This kind of information extraction can be a key component of realizing the library community’s vision for linked data in cultural heritage & scholarly research.
  • 35. A tour of analytics • From library analytics and metrics, • To metadata analytics, • To information analytics and knowledge graph extraction • Heterogeneous data streams: - Combined in interesting ways, - Made queryable and recombinant, - For use in question answering, visualization, and more.
  • 36. • Heterogeneous storage - Databases - Graphs - Columnar data formats - Cloud object storage • Heterogeneous tools and systems - Spark and Kafka - Tableau, D3, Seaborn - Notebooks (Jupyter, Databricks) Design guidance: https://dataintensive.net/
  • 37. Thank you Corey A Harper Sr. Technology Researcher Elsevier Labs c.harper@elsevier.com @chrpr
  • 39. Implicit metadata • Term & N-gram Frequencies • Topic Maps • Query Language • Click & Usage Data • Referral Patterns
  • 40. Citation Analysis
  • 41. Library analytics • Data informed decision making • Use cases around: - Library instruction - Personalized recommendations - E-resource cost per use - Physical collections & space - Digital collections • Tying library programs to student GPA • Building personas from data • Service point staffing and use All of this requires • Data collection and integration: - University data warehouses - Library systems - Subscription vendors • Data management policies • Data analysis tools and expertise
  • 42. [No slide text]
  • 43. Frequency Distributions
  • 44. Answers are about things, not just Works “Why shouldn’t a search on an author return information about the author, including the author’s works? Where was the author born, when did she live, what is she known for? … All of this is possible, but only if we can make some fundamental changes in our approach to bibliographic description. ... The challenge for us lies in transforming what we can of our data into interrelated ‘things’ without overindulging that metaphor.” Coyle, K. (2016). FRBR, before and after: a look at our bibliographical models. Chicago: ALA Editions.
  • 45. Building Knowledge Graphs Plus LAWDI, LOD-LAM, LD4L-Labs, & Many More https://zepheira.com/ – https://linkedjazz.org/network/ – http://vivo.cornell.edu/
  • 46. Algorithmic Bias

Editor's Notes

  1. Script: We’ve been working on combining our vast quantities of structured data with technology, supported by our operational expertise. You may know us for content from articles and books, but, for example, we hold 500m experimental facts in our chemistry databases. We collect 13m user queries on ScienceDirect every month. Elsevier is sitting atop a trove of “big data”, and we’ve built the technology muscle to process that data. We employ over 1,000 technologists. We’re using artificial intelligence such as machine reading and machine learning. As an example of scale on ScienceDirect, we employ collaborative filtering, analyzing 1bn articles from 2.5m researchers daily, to generate 250m article recommendations.