This presentation was given by guest lecturer Dr. Hélène Draux of Digital Science Consultancy, during the fourth session of the NISO Spring training series "Working with Scholarly APIs." Session Four, Digital Science Dimensions, was moderated by Phill Jones of MoreBrains Cooperative and held on May 19, 2022.
4. We capture a more comprehensive view of the research lifecycle
Pre-publication (1-5 years from grant to publication):
● Grants
● Research
● Preprints
● Data sets
Publications
Post publication (timeline labels, in slide order: immediate, 2-3 years, years, years, decades):
● Tweets, blogs
● Citations
● Clinical trials
● Patents
● Policy docs
5. We enrich and organize the data
Incoming data
Enrichment:
● Categorization
● Concept extraction
● Reference extraction
● Institution identification
● Researcher disambiguation
Dimensions enriched metadata:
● Concepts
● Researchers
● Organizations
● Classifications
● References
● Metrics
Content types shown: Publications, Policy docs, Patents, Altmetric, Datasets, Grants, Clinical trials
6. We make the connections
Status: Jan 2022
● 124m Publications (improved metadata of 88m)
● 142m Patents
● 6.1m Grants ($1.9 trillion in funding)
● 743k Policy documents
● 681k Clinical trials
● 200m Altmetric data points
● 11m Datasets
Link counts shown between content types: 1.5bn links, 402m links, 1.8m links, 448k links
7. Web App, API, Google BigQuery
Web App (for everyone)
● Search & discovery; top analytical use cases
● Dedicated UI, inbuilt visualizations
● In the browser, no specialized knowledge required
API (for API users + data & analytics teams)
● Ad hoc analysis & single data-type analyses
● Full-text search & special functions e.g. affiliation extraction
● Product integrations e.g. CRIS
Google BigQuery (for data & analytics teams + dashboards for everyone)
● Fast, large scale analyses; dynamic dashboards
● Direct integration with BI tools e.g. Tableau, Qlik, PowerBI
● Join with private & public data
9. Dimensions Search Language (DSL)
A powerful query language built around a simple syntax, so that users with varying levels of technical skill can use it.
● A basic query is built from two phrases: search and return
● More complexity can be included by adding filters
● Returned data can be specified using custom fieldsets
search publications
return publications
10. Dimensions Search Language (DSL)
Sources:
● Publications
● Grants
● Patents
● Clinical Trials
● Policy Documents
● Datasets
● Source Titles
● Reports
● Researchers
● Organizations
11. Dimensions Search Language (DSL)
search publications
for "leukemia"
where year in [2015:2017]
return publications
12. Dimensions Search Language (DSL)
search publications
for "leukemia"
where year in [2015:2017]
return publications [title+doi]
return researchers [id+last_name]
return journal [title]
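The phrases on these slides (search, for, where, return with a fieldset) can be assembled programmatically. The following is a minimal sketch, not part of the presentation: `build_dsl_query` is a hypothetical helper, and it writes each fieldset directly after the source name (`publications[title+doi]`), which is assumed to be equivalent to the spaced form shown on the slide.

```python
from typing import Optional, Sequence, Tuple


def build_dsl_query(
    source: str,
    search_term: Optional[str] = None,
    where: Optional[str] = None,
    returns: Sequence[Tuple[str, Sequence[str]]] = (),
) -> str:
    """Assemble a DSL query string from its phrases (hypothetical helper)."""
    parts = [f"search {source}"]
    if search_term is not None:
        # Escape embedded double quotes so the term stays one string literal.
        escaped = search_term.replace('"', '\\"')
        parts.append(f'for "{escaped}"')
    if where:
        # The filter is passed through as written, e.g. 'year in [2015:2017]'.
        parts.append(f"where {where}")
    if not returns:
        # With no explicit return phrase, return the searched source itself.
        returns = [(source, ())]
    for name, fields in returns:
        fieldset = "[" + "+".join(fields) + "]" if fields else ""
        parts.append(f"return {name}{fieldset}")
    return " ".join(parts)


query = build_dsl_query(
    "publications",
    search_term="leukemia",
    where="year in [2015:2017]",
    returns=[
        ("publications", ["title", "doi"]),
        ("researchers", ["id", "last_name"]),
        ("journal", ["title"]),
    ],
)
print(query)
```

The helper only builds the query text; sending it to the API requires a Dimensions subscription and a client such as Dimcli.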
13. Dimensions Search Language (DSL): Special Functions
● Affiliations matching
○ from strings to GRID IDs (https://grid.ac/)
○ extract_affiliations(affiliation="university of oxford, uk")
● Concepts extraction
○ from text to keywords
○ extract_concepts("text of abstract")
● Classification service
○ from text to ontologies (FOR, UOA, HRA, RCDC, HRCS_HC, HRCS_RAC, SDG, BRA, ICRP_CSO and ICRP_CT)
○ classify(title="..text..", abstract="..text..", system="FOR")
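When the input text comes from your own records, the function calls above have to be built as strings with the text safely quoted. A minimal sketch follows; `dsl_function_call` is a hypothetical helper, and the backslash-escaping rule for quotes is an assumption that should be checked against the DSL documentation.

```python
def dsl_function_call(name, *positional, **keyword):
    """Format a DSL special-function call as a string (hypothetical helper)."""

    def quote(value):
        # Assumption: backslashes and double quotes are backslash-escaped.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return '"' + text + '"'

    # Positional arguments first, then keyword arguments in the order given.
    args = [quote(value) for value in positional]
    args += [key + "=" + quote(value) for key, value in keyword.items()]
    return name + "(" + ", ".join(args) + ")"


affiliation_call = dsl_function_call(
    "extract_affiliations", affiliation="university of oxford, uk"
)
concepts_call = dsl_function_call("extract_concepts", "text of abstract")
classify_call = dsl_function_call(
    "classify", title="..text..", abstract="..text..", system="FOR"
)

print(affiliation_call)
print(concepts_call)
print(classify_call)
```

Each resulting string reproduces the corresponding call shown on the slide and can be submitted to the API in place of a search query.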
14. Dimcli
Python library, developed by Dimensions.
● Simplifies common API operations
○ Log in, log out
○ Querying, iterative queries
○ Extracting and transforming data e.g. to dataframes
● Includes a Command Line Interface (CLI)
○ Like a ‘query console’ with autocomplete and other handy functionalities
https://github.com/digital-science/dimcli
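The log in, query, and dataframe steps listed above can be sketched as one small function. This is an illustration under assumptions, not code from the presentation: `query_to_dataframe`, `FakeDsl` and `FakeResult` are hypothetical names, and the sketch assumes a Dimcli version that provides `dimcli.login()`, `dimcli.Dsl().query()` and `as_dataframe()` on the result, with credentials already stored in Dimcli's configuration file.

```python
def query_to_dataframe(query, dsl=None):
    """Run a DSL query and return the result in tabular form."""
    if dsl is None:
        # Imported lazily so the helper can be tried without dimcli installed.
        import dimcli

        # Assumption: credentials are already configured for dimcli;
        # otherwise pass key=... and endpoint=... to dimcli.login().
        dimcli.login()
        dsl = dimcli.Dsl()
    result = dsl.query(query)
    return result.as_dataframe()


class FakeResult:
    """Offline stand-in for a query result (hypothetical, for illustration)."""

    def __init__(self, rows):
        self.rows = rows

    def as_dataframe(self):
        # The real client returns a pandas DataFrame; a list of dicts is
        # used here so the sketch has no third-party dependencies.
        return self.rows


class FakeDsl:
    """Offline stand-in for a DSL client that records the queries it receives."""

    def __init__(self):
        self.queries = []

    def query(self, query):
        self.queries.append(query)
        return FakeResult([{"title": "Example title", "doi": "10.0000/example"}])


fake = FakeDsl()
rows = query_to_dataframe(
    'search publications for "leukemia" return publications[title+doi]', dsl=fake
)
print(rows)
```

With a real subscription, calling `query_to_dataframe(query)` without the `dsl` argument would use Dimcli itself. For result sets too large for a single request, Dimcli's `query_iterative` is the paging counterpart to `query`.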
18. Dimensions Analytics web app
Online search & discovery platform:
● Explore the data with customizable and exportable visualizations.
● Search the full text to gain more relevant results. Plus, search via abstract or concept.
● Filter with sophisticated research classification systems and content-specific options e.g. Inventor for patents.
● Identify trends and opportunities: grants, research topics, open access, compliance and more.
● Use directly from the browser with no specialised knowledge required.
19. Dimensions APIs
Powerful APIs - designed to allow flexible use of the enriched data:
● Integrate data into applications outside of the web app, e.g. admin systems.
● Enrich your own data/content using special functions like affiliation extraction and concept extraction.
● Query using full-text search. Ideal for ad hoc analytical queries going a step further than the web app.
● Use without constraints for internal purposes.
● Easy-to-learn, human-readable querying language.
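For integrations that cannot use a client library, the same queries can be sent over plain HTTP. The sketch below only constructs the two requests and sends nothing. Everything specific in it is an assumption to verify against the current API documentation: the base URL `https://app.dimensions.ai/api`, the `auth.json` and `dsl/v2` paths, the `JWT` authorization scheme, and the helper names, which are hypothetical.

```python
import json
import urllib.request

# Assumption: default public endpoint; institutional endpoints may differ.
API_BASE = "https://app.dimensions.ai/api"


def build_auth_request(api_key):
    """Build the request that exchanges an API key for a session token."""
    body = json.dumps({"key": api_key}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/auth.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(token, query):
    """Build the request that submits a DSL query as the raw request body."""
    return urllib.request.Request(
        API_BASE + "/dsl/v2",
        data=query.encode("utf-8"),
        headers={"Authorization": "JWT " + token},
        method="POST",
    )


# Placeholder values; a real key comes with a Dimensions subscription.
auth_request = build_auth_request("my-api-key")
query_request = build_query_request(
    "my-token", 'search publications for "leukemia" return publications'
)
print(auth_request.full_url, query_request.full_url)
```

Passing either request to `urllib.request.urlopen` would send it; the token for the second request is assumed to be returned in the JSON body of the first.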
20. Dimensions on Google BigQuery
Access to the full Dimensions database, via a direct integration with BigQuery - Google’s cloud data warehouse.
● Integrate Dimensions data into your existing reporting and analysis infrastructure quickly, using out-of-the-box connectors.
● Analyze the full dataset with complete flexibility to support multiple decision-making processes.
● Join Dimensions data to your own private data to provide a global research context for benchmarking.
● Create custom dashboards and automated reports on different topics e.g. funding opportunities, collaboration.
● Share these analyses and dashboards with stakeholders across the organization so that the relevant insights are always at their fingertips.
This Covid-19 topic dashboard took 15 minutes to build
22. Dimensions uses an ‘inclusive approach’
● We do not exclude research outputs based on “behind closed doors” content selection processes
● We believe the user should be able to make inclusion/exclusion decisions based on their own needs
● We provide the user with as much information and as many tools as possible in order to do so
Dimensions empowers our users
23. Publications
● Journal articles, pre-prints, conference proceedings, books/book chapters
● Full text searching available for ~70% of publications
● 100M+ records based on metadata
● Metadata derived from multiple available databases
● Highly contextualized - related grants, publication references, citing publications, related trials, related patents, related policy documents, Altmetric attention
● OA tagged
[Figure: publication source logos, grouped as journals / books and pre-print / OA, "...and many more!"]
24. Datasets
● More than 8 million datasets
● Sourced from DataCite and Figshare
● Linked to publications, supporting grants and funders
● Filters for research organizations, funders, researchers and more
[Figure: dataset source logos, grouped as DataCite and Figshare & Figshare hosted repositories, labelled "800+ more" and "70+ more"]
25. Grants data
● Project funding
● 6 million grants from 600+ funders globally
● $2 trillion of funding
● Not limited to federal/national funding
● Sourcing
○ Direct relationships with funders
○ Data available via APIs
○ Data available via websites which we crawl
26. Patents data
● 134 million+ patent documents
● Global coverage
● 100+ jurisdictions, including but not limited to:
○ China
○ Japan
○ United States
○ Germany
○ European Union
○ South Korea
28. Policy documents data
Over 700,000 policy document records, linked to publications
Including but not limited to:
● World Health Organization
● World Bank
● Centers for Disease Control & Prevention
● Government of the United Kingdom
● National Bureau of Economic Research