STI 2022 - Generating large-scale network analyses of scientific landscapes in seconds using Dimensions on Google BigQuery

The growth of large, programmatically accessible bibliometrics databases presents new opportunities for complex analyses of publication metadata. In addition to providing a wealth of information about authors and institutions, databases such as those provided by Dimensions also provide conceptual information and links to entities such as grants, funders and patents. However, obtaining data is not the only challenge in evaluating patterns in scholarly work: these large datasets can be difficult to integrate, particularly for those unfamiliar with the complex schemas needed to accommodate such heterogeneous information, and those most comfortable with data mining may not be as experienced in data visualisation. Here, we present an open-source Python library that streamlines the process of accessing and diagramming subsets of the Dimensions on Google BigQuery database, and demonstrate its use on the freely available Dimensions COVID-19 dataset. We are optimistic that this tool will expand access to this valuable information by streamlining what would otherwise be multiple complex technical tasks, enabling more researchers to examine patterns in research focus and collaboration over time.

Generating large-scale network analyses of scientific landscapes in seconds using Dimensions on Google BigQuery
2022-09 STI / Granada

Michele Pasin, Head of Dimensions Data Solutions, m.pasin@digital-science.com
Richard J. Abdill, Data Scientist, richard.abdill@pennmedicine.upenn.edu
What is Dimensions?
Dimensions delivers an array of search and discovery, analytical, and research management tools, all in a single platform.

● Moving beyond citations to better capture the research lifecycle: the links among publications, grants, policy documents, clinical trials, datasets, patents, and researchers are the key to understanding this landscape.
https://www.dimensions.ai

More than a citations database
[Figure: Dimensions content types and the links among them. Status: July 2022]
● 129m Publications (improved metadata of 92m)
● 147m Patents
● 6.3m Grants ($2.2 trillion in funding)
● 902k Policy documents
● 719k Clinical trials
● 12m Datasets
● 224m Altmetric data points
Link counts shown in the figure range from 27k to 1.6bn; links to funders range from 290k to 29m.

Dimensions ‘flavors’

Web App (for everyone)
● Search & discovery; top analytical use cases
● Dedicated UI, inbuilt visualizations
● In the browser, no specialized knowledge required

API (for API users + data & analytics teams)
● Ad hoc analysis & single data-type analyses
● Full-text search & special functions, e.g. affiliation extraction
● Product integrations, e.g. CRIS

Google BigQuery (for data & analytics teams + dashboards for everyone)
● Fast, large-scale analyses; dynamic dashboards
● Direct integration with BI tools, e.g. Tableau, Qlik, PowerBI
● Join with private & public data

  • 7. The main argument
    THE CHALLENGE (What is our motivation?)
    ● Cloud databases are powerful, but can be hard to operate
    ● Performing large-scale scientometric analyses requires a lot of technical knowledge, e.g. building a network analysis
    THE OPPORTUNITY (What action are we taking?)
    ● Can we automate parts of these tasks?
    ● Can we empower more researchers who have domain knowledge (but lack time / technical skills)?
    THE RESULT (What changes can we expect to see?)
    ● Approach for generating scientific network visualizations from simpler SQL queries
    ● Uses GBQ + Python + VOSviewer, but core ideas applicable elsewhere
  • 8. Cloud-based scientometrics databases 1/2: performance ● How they work ○ Cost-efficient storage ○ Distribute access control ○ Perform complex calculations with an on-demand infrastructure ● Case study: “centre of mass of research” ○ authors' affiliation locations, weighted by citations to outputs Hook, D. W., Porter, S. J. (2021). Scaling Scientometrics: Dimensions on Google BigQuery as an Infrastructure for Large-Scale Analysis. Front. Res. Metr. Anal. 6.
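The "centre of mass" case study can be read as a citation-weighted average of affiliation locations. The sketch below is one plausible reading, not necessarily the method used by Hook and Porter: it assumes each affiliation is given as latitude, longitude and a citation weight, averages the corresponding 3D unit vectors, and projects the result back to the globe. The function name `weighted_centre` is hypothetical.

```python
import math


def weighted_centre(points):
    """Citation-weighted centre of a set of locations.

    points: iterable of (lat_deg, lon_deg, weight) tuples (assumed input shape).
    Returns (lat_deg, lon_deg) of the weighted mean direction.
    """
    x = y = z = total = 0.0
    for lat, lon, weight in points:
        phi, lam = math.radians(lat), math.radians(lon)
        # Convert each location to a unit vector and scale it by its weight.
        x += weight * math.cos(phi) * math.cos(lam)
        y += weight * math.cos(phi) * math.sin(lam)
        z += weight * math.sin(phi)
        total += weight
    if total <= 0:
        raise ValueError("total weight must be positive")
    x, y, z = x / total, y / total, z / total
    # Project the averaged vector back to latitude and longitude.
    lon = math.degrees(math.atan2(y, x))
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return lat, lon


# Two equally cited affiliations on the equator, 90 degrees apart.
centre = weighted_centre([(0.0, 0.0, 10), (0.0, 90.0, 10)])
```

Averaging in 3D avoids the wrap-around problem that a plain mean of longitudes has near the date line.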
  • 9. Cloud-based scientometrics databases 2/2: linked data ● How they work ○ Cost-efficient storage ○ Distribute access control ○ Perform complex calculations with an on-demand infrastructure ● Case study: “centre of mass of research” ○ authors' affiliation locations, weighted by citations to outputs ● Case study: “connecting scientometrics” ○ GRID + World Bank Data ○ Collaborations between research organizations in high- and low-income countries Porter, S. J., Hook, D. W. (2022). Connecting Scientometrics: Dimensions as a Route to Broadening Context for Analyses. Front. Res. Metr. Anal.
  • 10. Technical challenges 1. DB structure can be complex 2. SQL queries can be complex 3. Cloud systems have their own peculiarities, e.g. nested fields
  • 11. Technical challenges 1. DB structure can be complex 2. SQL queries can be complex 3. Cloud systems have their own peculiarities, e.g. nested fields
  • 12. Technical challenges 1. SQL queries can be complex 2. DB structure can be complex 3. Cloud systems have their own peculiarities, e.g. nested fields Dimensions BigQuery Lab (2022). Working with nested and repeated fields - https://bigquery-lab.dimensions.ai/tutorials/04-nested/
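To make the "nested fields" peculiarity concrete, here is a minimal sketch of how a repeated field is flattened. The SQL string uses BigQuery's `CROSS JOIN UNNEST`; the table name `covid-19-dimensions-ai.data.publications` and the field `research_orgs` are assumptions about the public COVID-19 dataset and should be checked against the schema. The `unnest` helper is hypothetical and only mimics the same behaviour on plain Python dictionaries.

```python
# Assumed table and field names; verify them against the dataset schema.
FLATTEN_ORGS_SQL = """
SELECT p.id, org_id
FROM `covid-19-dimensions-ai.data.publications` AS p
CROSS JOIN UNNEST(p.research_orgs) AS org_id
"""


def unnest(rows, field):
    """Mimic CROSS JOIN UNNEST: emit one output row per element of a repeated field.

    Rows whose repeated field is empty or missing produce no output,
    which matches the behaviour of a cross join against an empty array.
    """
    for row in rows:
        for value in row.get(field) or []:
            yield {"id": row["id"], field: value}


publications = [
    {"id": "pub.1", "research_orgs": ["grid.a", "grid.b"]},
    {"id": "pub.2", "research_orgs": []},
]
flat = list(unnest(publications, "research_orgs"))
```

One publication row with two organisations becomes two flat rows, which is the shape that joins and aggregations expect.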
  • 13. Our Prototype: main ideas ● Goal: simplified workflow for GBQ-based analytics ● App that allows users to generate network visualizations of a scientific topic ○ Organization collaborations ○ Concepts co-occurrence https://digital-science.github.io/dimensions-network-gen
  • 14. Our Prototype: main ideas ● Goal: simplified workflow for GBQ-based analytics ● App that allows users to generate network visualizations of a scientific topic ○ Organization collaborations ○ Concepts co-occurrence ● Data: Dimensions COVID-19 on GBQ (public domain) ○ Publications: 1,031,972 ○ Clinical Trials: 14,723 ○ Grants: 16,703 ○ Patents: 41,473 ○ Datasets: 32,784 ○ Organizations: 36,670 https://console.cloud.google.com/marketplace/product/digitalscience-public/covid-19-dataset-dimensions
  • 15. Implementation: three steps 1. SQL input 2. Network generation 3. JSON / dataviz output
  • 16. Implementation 1/3: SQL input 1. SQL input 2. Network generation 3. Dataviz output (from JSON) Users define a ‘document set’ using a simple SQL query that returns document IDs. These IDs can be used later as part of a more complex query. $> python dimensions-networks-gen topics/query.sql
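A document-set query of the kind described on this slide might look like the sketch below. The table name and the `year` field are assumptions about the public COVID-19 dataset, and `prepare_subquery` is a hypothetical helper (not part of the prototype) that cleans the user's SQL so it can later be wrapped in a larger query.

```python
# Hypothetical document-set query: it only needs to return a column of IDs.
DOCUMENT_SET_SQL = """
SELECT id
FROM `covid-19-dimensions-ai.data.publications`
WHERE year = 2021;
"""


def prepare_subquery(sql: str) -> str:
    """Strip whitespace and trailing semicolons so the SQL can be used as a subquery."""
    cleaned = sql.strip().rstrip(";\t\n\r ")
    if not cleaned:
        raise ValueError("the document-set query is empty")
    if not cleaned.lower().startswith(("select", "with")):
        raise ValueError("expected a SELECT (or WITH ... SELECT) statement")
    return cleaned


subquery = prepare_subquery(DOCUMENT_SET_SQL)
```

Keeping the user-facing query this small is the point of the design: the topic definition stays readable while the network logic lives elsewhere.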
  • 17. Implementation 2/3 1. SQL input 2. Network generation 3. Dataviz output (from JSON) The user query is automatically embedded in a more complex ‘network’ query ● Concept co-occurrence network ○ how many publications are shared between these concepts ● Organisation network ○ how many publications are shared between these organisations
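The embedding step can be sketched as a template that wraps the user's SQL in a common table expression and counts publications shared by each pair of organisations. This is an illustration rather than the prototype's actual query: the table and field names are assumptions, and `build_network_sql` and `count_shared` are hypothetical helpers. The second function applies the same pair-counting logic to in-memory data so that it can be checked without BigQuery access.

```python
from collections import Counter
from itertools import combinations

# Assumed table/field names; the {user_sql} slot receives the document-set query.
NETWORK_TEMPLATE = """
WITH docs AS ({user_sql}),
pub_orgs AS (
  SELECT p.id AS pub_id, org_id
  FROM `covid-19-dimensions-ai.data.publications` AS p
  CROSS JOIN UNNEST(p.research_orgs) AS org_id
  WHERE p.id IN (SELECT id FROM docs)
)
SELECT a.org_id AS source, b.org_id AS target,
       COUNT(DISTINCT a.pub_id) AS shared_pubs
FROM pub_orgs AS a
JOIN pub_orgs AS b
  ON a.pub_id = b.pub_id AND a.org_id < b.org_id
GROUP BY source, target
ORDER BY shared_pubs DESC
"""


def build_network_sql(user_sql: str) -> str:
    """Embed a document-set query in the organisation-network query."""
    return NETWORK_TEMPLATE.format(user_sql=user_sql.strip().rstrip(";")).strip()


def count_shared(pubs):
    """Count shared publications per organisation pair.

    pubs: mapping of publication ID -> iterable of organisation IDs.
    """
    edges = Counter()
    for orgs in pubs.values():
        # Each unordered pair of distinct organisations on a paper is one co-occurrence.
        for a, b in combinations(sorted(set(orgs)), 2):
            edges[(a, b)] += 1
    return edges


edges = count_shared({"p1": ["A", "B", "C"], "p2": ["B", "A"], "p3": ["C"]})
```

A concept co-occurrence network would follow the same pattern, with concepts in place of organisation IDs.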
  • 18. Implementation 3/3: Data Visualization 1. SQL input 2. Network generation 3. Dataviz output (from JSON) Network data is processed to be compatible with the VOSviewer JSON format. Static HTML pages are then generated.
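As a rough sketch of this conversion step, weighted edges can be turned into a dictionary with a `network` object holding `items` and `links`. The field names used here (`id`, `label`, `source_id`, `target_id`, `strength`) follow a common reading of the VOSviewer JSON format and should be verified against the VOSviewer documentation; `to_vosviewer` is a hypothetical helper, not the prototype's code.

```python
import json


def to_vosviewer(edges, labels=None):
    """Convert {(source, target): weight} into a VOSviewer-style network dict.

    labels: optional mapping of node ID -> display label (defaults to the ID).
    """
    labels = labels or {}
    node_ids = sorted({node for pair in edges for node in pair})
    items = [{"id": node, "label": labels.get(node, node)} for node in node_ids]
    links = [
        {"source_id": source, "target_id": target, "strength": weight}
        for (source, target), weight in sorted(edges.items())
    ]
    return {"network": {"items": items, "links": links}}


network = to_vosviewer(
    {("grid.a", "grid.b"): 2, ("grid.a", "grid.c"): 1},
    labels={"grid.a": "Org A"},
)
payload = json.dumps(network, indent=2)  # text that could be written to a .json file
```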
  • 19. Results Try it out at: https://digital-science.github.io/dimensions-network-gen
  • 20. Wrapping up ● Introducing an open-source prototype using Python and VOSviewer ● Simplifies the process of generating scientific networks ● Uses Dimensions’ freely available COVID-19 dataset on Google BigQuery ● Shows challenges and proposes a solution for cloud-based analytics generation ● Audience: analysts, researchers and anyone who deals with scientometrics data processing https://digital-science.github.io/dimensions-network-gen
  • 21. Questions? Next steps ● Adding more network generation queries ● Adding more output visualizations ● Supporting Jupyter & Colab