Data4Impact Methodology
& Indicators
Brussels, June 2019
Introduction to Data4Impact
Data4Impact: the basics
• Call: CO-CREATION-08-2016-2017: Better integration of evidence on the impact of research
and innovation in policy making
• Expected impacts:
- Improved monitoring of R&I activities: new indicators for assessing research and innovation
performance, including the impact of research and innovation policies
- Prove value to society: determining the societal impact of research and innovation funding in order
to better justify research and innovation spending
Data4Impact addresses key challenges and expected impacts of CO-CREATION-08-2016-2017
through a data driven approach
What is big data?
Definition of Big Data:
"Big Data is high-volume, high-velocity and/or high-variety information assets that demand
cost-effective, innovative forms of information processing that enable enhanced insight, decision
making, and process automation."
Key properties of Big Data:
- Volume, i.e. no sampling is generally applied
- Variety, i.e. structured and unstructured data from various sources, in different formats
- Velocity, i.e. real-time/rapid data
- Veracity, i.e. variations in data quality, cleaning, processing, etc.
Non-intrusiveness -> Big Data is a byproduct of digital interaction and communication
Key objective: make Big Data small!
Big data versus traditional methods: pros and cons
Pros:
- No sampling, bottom-up, scalable
- Low administrative burden
- Short/no data lags
- New data and indicators
Cons:
- Risk of misidentification
- Data veracity
- Lack of persistent identifiers
- Data mishandling, ethics
Where? Start with an individual
Individual level
Who participated in the programme?
Who were members of the extended team?
Organisation/team level
Research teams in universities & research centres;
Small companies and large enterprises
Project/programme level
Data aggregated at project or programme level
Analytical dimensions
Within researchers themselves; between researchers;
between researchers and organisations; between
organisations; between projects; between programmes
Key questions:
- Whom exactly did the programme attract?
- What happened during and after the projects?
- What was the impact?
How? Build a Knowledge Graph, Integrate Data
Why/what? Answer questions that matter to funders without
ever asking a beneficiary
1. Outputs, products and interventions
- Outputs, products and interventions
- Collaborations
- Scientific publications
- Intellectual Property Rights
- Scientific prizes
2. Outcome-level indicators
- Innovations
- Dissemination activities
- Further funding/investment
- Next destinations
- Effects on the company/private sector
- New companies/organizations created
3. Impact-level indicators
- Impact on health and welfare / health and environmental impacts
- Impacts on creativity, culture & society / social, economic, capability and cultural impact
- Influence on policy making / political impact
Ask less, know more
Evaluating Planning Storytelling
Tracking individual researchers
Organisation news/public relations
Tracking organisations
Tracking projects
Key facts about Data4Impact: data sources
Input data: EC monitoring data (Health & SC1 projects, health related); PubMed data
Data sources, output level indicators: EC monitoring data (Cordis); OpenAIRE; Europe PMC (incl. full text data); PATSTAT (incl. abstracts & full texts); Lens.org data
Data sources, result and impact level indicators: company websites; social media (Twitter); clinical guidelines repositories; EC monitoring data; EMA data on human medicinal products & orphan medicines; DrugBank data; news/media sites
Key facts about Data4Impact
Project dimension | Coverage
Levels of data collection | Organisation; Project (for EU FP programmes only); Programme
Programmes covered | Over 40 health funders in Europe + EU FPs
Data collection | Yes (strong effort)
Data integration | Yes (moderate effort)
Machine learning, NLP, entity recognition | Yes (strong effort)
Topic modelling | Yes (strong effort)
Project duration & budget | 2 years, EUR 1.5 million
Key objectives and results
Data4Impact
Data4Impact: objectives
Objective 1: define, develop, analyse new indicators for assessing the
performance of EU and national R&I systems.
Objectives 2+3: gather data at input, throughput, output and impact levels,
derive facts and understand impact on health-related challenges
Objectives 4+5: perform community-driven validation and develop user-
centered tools
Key achievements
Data4Impact offers unique coverage of data sources, with an aim to link them through
specific entities
Data4Impact covers all key stages of the R&I lifecycle in the health domain, i.e. basic
research -> translational & applied research -> innovation & uptake on the market ->
clinical practice & public health
New indicators and line of thinking investigated on academic impact
- The funder and society perspective: funding timely and relevant research? Do the ‘right
thing’ by funding rare topics?
- If a funder enters an area where few others invest, does this imply stronger impact?
- How does this interact with the researcher/organization perspective?
Data4Impact was first/one of the first to track data to medium- and long-term economic and
societal/health impacts, i.e. link previous project activities to events that happened recently
Conceptual framework
What is societal impact?
Societal Impact as a...
"demonstrable contribution that excellent research makes to
society and the economy. This can involve academic impact,
economic and societal impact [...]"
(Economic and Social Research Council, ESRC)
General requirements
• Take the temporality, multicausality, and multifacetedness of societal impact into
account
• Consider the academic, economic, and societal impact dimension and illustrate
respective impact generation paths
• Take advantage of all the data/data sources covered by Data4Impact
• For analytical purposes we distinguish between:
1) Academic Impact (effects on academic system and scientific practice)
2) Economic Impact (effects on the economy and value creation)
3) Societal Impact (effects on policy, society, and culture as well as on individual
behaviour, subjective wellbeing, and life satisfaction)
Simplified linear logic I
Input -> Throughput -> Output -> Impact
Input: processes before any R&I activity starts as well as the resources that are needed.
Throughput: intermediate results of R&I activities, i.e. documented knowledge.
Output: further processing of knowledge generated during R&I activities.
Impact: demonstrable contribution to academia, economy, and society.
Simplified linear logic II
Tracking Research Activities
Input Throughput Output Impact
Keeping Track of the Whole Process
- automated
- granular
- scalable
- applicable to other settings
Tracking Research Activities
Funding
Input - FP7/H2020 Projects
Number of research projects of the EU Framework Programme
Programme | Core Set | Extended Set
FP7 | 998 | 8832
H2020 | 669 | 2253
[eHealth] [SC 1] [> 20% PubMed]
Input - FP7/H2020 Projects – DATA
CORDIS
● Call document
● Project description
● Final or periodic project reports (project summary)
● Scholarly publications deriving from the project
● Patents
● Results in Brief – Expected Impact
Automatic extraction of pertinent info from associated documents (AI) and metadata
Input - FP7/H2020 Projects
EC Contribution for the FP7-Core and
H2020-Core Projects
Topics in the Health Sector
ICD - 10 Chapters
• International statistical Classification of Diseases and
related health problems
• international standard for reporting diseases and health
conditions
• diagnostic standard for all clinical and research purposes
• ICD classes associated with every project
Topics in the Health Sector
ICD - 10 Example: Neoplasms
Malignant neoplasms, stated or presumed to be primary, of specified sites,
except of lymphoid, haematopoietic and related tissue C00-C75
Malignant neoplasms of ill-defined, secondary and unspecified sites C76-C80
Malignant neoplasms, stated or presumed to be primary, of lymphoid,
haematopoietic and related tissue C81-C96
Malignant neoplasms of independent (primary) multiple sites C97-C97
In situ neoplasms D00-D09
Benign neoplasms D10-D36
Neoplasms of uncertain or unknown behaviour D37-D48
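The block ranges above can be turned into a simple lookup. The following is a minimal sketch, not the project's actual classifier: the function name `neoplasm_block` and the input handling are assumptions made for illustration, and only the ranges and titles come from the slide.

```python
import re

# ICD-10 blocks of the Neoplasms chapter, copied from the list above.
NEOPLASM_BLOCKS = [
    ("C00", "C75", "Malignant neoplasms, stated or presumed to be primary, of specified sites, "
                   "except of lymphoid, haematopoietic and related tissue"),
    ("C76", "C80", "Malignant neoplasms of ill-defined, secondary and unspecified sites"),
    ("C81", "C96", "Malignant neoplasms, stated or presumed to be primary, of lymphoid, "
                   "haematopoietic and related tissue"),
    ("C97", "C97", "Malignant neoplasms of independent (primary) multiple sites"),
    ("D00", "D09", "In situ neoplasms"),
    ("D10", "D36", "Benign neoplasms"),
    ("D37", "D48", "Neoplasms of uncertain or unknown behaviour"),
]


def neoplasm_block(code):
    """Return the block title for an ICD-10 code, or None if it is outside C00-D48."""
    category = code.strip().upper()[:3]          # "C50.9" -> "C50"
    if not re.fullmatch(r"[A-Z]\d{2}", category):
        return None                              # not a well-formed three-character category
    for start, end, title in NEOPLASM_BLOCKS:
        # Letter + two digits compares correctly as plain strings.
        if start <= category <= end:
            return title
    return None


print(neoplasm_block("C50.9"))
print(neoplasm_block("D05"))
```

In this sketch a project tagged with several ICD codes would simply call the lookup once per code.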
Input (Funding) & Topics
H2020-Core, FP7-Core
Input
Project-level data
Funding allocation by
- Organizations: type (private, public), geographic location
- Funder
- ICD chapters
Comparisons over time.
Tracking Research Activities
Publications
Patents cited in PP
Other
Innovations
Throughput & Output
Innovation “Insights” from Project Portfolios
• Diagnostic Tools
• Treatment
• Drug
• Protocol
• Biomarker
• Biorepository
• Gene
• Metabolite
• Clinical Trial
• Method
• Patent
• Device
• Material
• Infrastructure
• Software
• System
• Prototype
• Study
• Publication
• Company
• Education
• Employment
• Dissemination
• *Impact
• *Outcome
Throughput - Publications
Documents | FP7 Core | FP7 Extended | H2020 Core | H2020 Extended
Rest/Other Pubs | 4205 | 42916 | 500 | 5657
Pubs in PubMed | 25980 | 68521 | 1324 | 8590
Throughput – # Patents by ICD Class (FP7 Extended)
Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified; 6
Certain infectious and parasitic diseases; 37
Congenital malformations, deformations and chromosomal abnormalities; 11
Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism; 18
Mental and behavioural disorders; 9
Diseases of the digestive system; 22
Injury, poisoning and certain other consequences of external causes; 21
Diseases of the eye and adnexa; 41
Endocrine, nutritional and metabolic diseases; 46
Diseases of the skin and subcutaneous tissue; 30
Diseases of the genitourinary system; 74
Diseases of the respiratory system; 64
Diseases of the musculoskeletal system and connective tissue; 25
Diseases of the nervous system; 68
Diseases of the circulatory system; 63
Neoplasms; 132
Output – Innovations - FP7-Extended
Bar chart of the number of innovation "insights" per category (axis from 0 to 40000). Categories shown: Treatment, Standard, Publication, Prototype, Protocol, Protein, Metabolite, Material, Gene, Employment, Education, Drug, Dissemination, Diagnostic Tool, Clinical Trial, Biorepository, Biomarker, Device, Infrastructure, Method, Software, System, Study
Output – Pubs in Patents
Funder | Number of publications analysed | Share of publications cited in patents at least once
National Institutes of Health (US) | 397886 | 4,4%
Wellcome Trust (UK) | 97434 | 6,8%
European Commission | 84038 | 5,5%
National Science Foundation (US) | 52366 | 4,5%
Medical Research Council (UK)* | 45246 | 10,0%
Research Councils UK* | 39214 | 2,9%
Biotechnology and Biological Sciences Research Council (UK)* | 22260 | 9,8%
National Health and Medical Research Council (Australia) | 21181 | 2,3%
Swiss National Science Foundation (Switzerland) | 15961 | 5,3%
Austrian Science Fund (Austria) | 13816 | 5,6%
Output – Creation of New Companies
● 430 newly created companies in FP7
● 51 of which in FP7-Core
● Sample of FP7-Core projects with 2 or more new companies formed
Project Number Project Acronym # Spin-offs
201924 EDICT 3
223744 DOPAMINET 2
201418 READNA 2
278832 hiPAD 2
279039 ComplexINC 2
Collaboration Networks
ICD Ch9 Diseases of the Circulatory System
Technological Diffusion - Organization networks
(public vs private, geographic location, etc):
size, density, key bridge organizations,
across fields, fine detail within a subfield
Tracking Research Activities
Project portfolios, PubMed, Lens.org, etc.
Insight extractors & other NLP algorithms
Input Throughput Output Impact
Linking across
- funders/programs
- organization (type, location, etc)
- ICD class
- Time
-> IMPACT
Impact
Academic, Economic, Societal Impact
Academic Impact
Topic Modelling
Preliminary Results
Topic Modelling
Publications
• > 5 million
• H2020, FP7
• 20% of sample from 40+
funders of D4I
Deep Learning
NLP
Expert
442 Topics
9 major categories
Linked to funders,
organizations, authors
countries, etc.
Citations
Clinicopathologic and 11C-Pittsburgh compound B implications
of Thal amyloid phase across the Alzheimer’s disease spectrum
An autoradiographic evaluation of AV-1451 Tau PET in dementia
Deciphering Interactions of Acquired Risk Factors and ApoE-mediated Pathways in Alzheimer's Disease
What is normal in normal aging? Effects of aging, amyloid and
Alzheimer's disease on the cerebral cortex and the hippocampus
Soluble apoE complex: mechanism and therapeutic target for
APOE4-induced AD risk
Role of genes linked to sporadic Alzheimer's disease risk in the
production of β-amyloid peptides
Proteolytic Cleavage of Apolipoprotein E4 as the Keystone for
the Heightened Risk Associated with Alzheimer’s Disease
MeSH
alzheimer disease
amyloid beta peptides
amyloid
neurodegenerative diseases
Brain
apolipoprotein e4
amyloidosis
Text
Amyloid
Alzheimer
Apoe
Neurodegeneration
Neurodegenerative
Abeta
Brain
Dementia
Aggregation
Fibrils
Tau
Cognitive
Pathology
Plaques
Deposition
impairment
aging
Phrases
alzheimer disease
neurodegenerative diseases
amyloid fibrils
amyloid deposition
Keywords
alzheimer disease
neurodegeneration
amyloid
dementia
geriatrics
Wikipedia terms
Alzheimer's_disease
Neurodegeneration
Apolipoprotein_E
Amyloid
Neuropathology
What is this Topic about??
Alzheimer’s disease
Topic Modelling
Identifying Topics
Topic Modelling – What for? (1/2)
• identify active areas of research: discover hidden themes (topics)
• understand what is actually produced: calc topic distributions per
document / project(grant) / funder
• analyze active research areas on several dimensions (e.g.,
geographic regions, funders, etc.)
• discover clusters and communities, assess research collaboration:
topic based similarity analysis
• identify emerging research areas: topic based trend analysis
• assess coverage, identify gaps or new challenges: compare funded
research
• assess the relevance and impact of research in the society using
new indicators
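The "topic distributions per document / project / funder" step can be sketched as a simple aggregation. This is an illustration under assumptions, not the Data4Impact pipeline: the function name, the plain averaging rule, and the sample documents and funders ("Funder A", "Funder B") are all hypothetical.

```python
from collections import defaultdict


def funder_topic_shares(doc_topics, doc_funders):
    """Average per-document topic distributions over each funder's documents.

    doc_topics:  {doc_id: {topic: weight}}, weights of one document sum to 1
    doc_funders: {doc_id: [funder, ...]}
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for doc_id, topics in doc_topics.items():
        for funder in doc_funders.get(doc_id, []):
            counts[funder] += 1
            for topic, weight in topics.items():
                totals[funder][topic] += weight
    return {
        funder: {topic: w / counts[funder] for topic, w in topic_sums.items()}
        for funder, topic_sums in totals.items()
    }


# Hypothetical sample data, for illustration only.
doc_topics = {
    "doc1": {"alzheimer disease": 0.8, "brain function": 0.2},
    "doc2": {"breast cancer": 1.0},
    "doc3": {"alzheimer disease": 0.5, "breast cancer": 0.5},
}
doc_funders = {"doc1": ["Funder A"], "doc2": ["Funder A", "Funder B"], "doc3": ["Funder B"]}

shares = funder_topic_shares(doc_topics, doc_funders)
for funder, dist in shares.items():
    print(funder, dist)
```

The same function works at project level if grant identifiers are passed in place of funder names.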
Topic Modelling – What for? (2/2)
Topic Modelling & ICD Chapters
Topic Modelling
• automation
• granularity
• bottom up
• process is not field-related
• changes in set of topics over time
• ICD Chapters provide another piece of information
Topic Modelling – Major Categories
Topic category | Estimated share of research output in PubMed | # Research topics in the Data4Impact topic model
1. Infectious Diseases | 7,2% | 34
2. Non-Communicable Diseases | 18,6% | 86
3. Health systems, public health & epidemiology | 14,5% | 63
4. Diagnostics, treatment development, surgery | 6,4% | 26
5. Molecular cell biology | 26,1% | 118
6. Methods, models, technologies, databases | 11,5% | 46
7. Physiology | 3,2% | 15
8. Cognition and behaviour | 4,6% | 18
9. Other | 7,9% | 36
Total | 100,0% | 442
Distributed (Big) Data analytics
HCI design & user experience
GPU
Topic Modelling
Identify topic trends
Distributed (Big) Data analytics
HCI design & user experience
GPU
Topic Modelling
Trendy Topics
Topic Modelling
Old-fashioned Topics
Relational DBs
Programming
Topic Modelling
Important but declining (?)
Genetic algorithms
P2P networks & content distribution
topicid title
318 protein interaction / binding
365 molecular dynamics & protein structure
275 gene expression analysis
69 brain function
111 snps & genetic association
209 Diabetes
315 depression & anxiety
68 genome sequencing
470 hiv epidemiology
284 breast cancer
319 cardiovascular disease (risk)
109 smoking and public health
48 kidney disease
403 genetics (mutation, disease)
351 escherichia coli infections
226 graphene & nanotechnology
121 obesity
312 lung / pulmonary disease
Academic Impact – Common Topics
topicid title
123 eating disorders
306 arsenic exposure & public health
397 ovarian cancer
164 gastric cancer
465 glioblastoma
269 genomics & exome sequencing
248 psoriasis
117 mosquitoes & public health
462 hepatitis B infection (hbv)
47 lung cancer
212 oral / dental health
327 thyroid disease, hormone, cancer
296 hodgkin lymphoma
71 clinical biomarkers & diagnosis
11 multiple sclerosis
490 pet imaging
489 pharmacokinetics
101 epilepsy
Academic Impact – Rare Topics
Academic Impact – Timeliness of Research
Funder | Share of research output in top-10% fastest growing research topics
National Health and Medical Research Council (Australia) | 24,7%
Research Councils UK* | 23,5%
European Commission | 19,5%
National Institutes of Health (US) | 16,7%
Swiss National Science Foundation (Switzerland) | 16,2%
Wellcome Trust (UK) | 14,5%
Biotechnology and Biological Sciences Research Council (UK)* | 11,2%
Medical Research Council (UK)* | 11,1%
Total PubMed | 9,9%
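A rough sketch of how a "share of output in the top-10% fastest growing topics" indicator could be computed. The slides do not say how growth is measured, so the late/early count ratio used here is an assumption, as are the function names and the toy topic identifiers.

```python
def fastest_growing_topics(early_counts, late_counts, top_fraction=0.10):
    """Return the topics with the highest late/early publication ratio (assumed growth measure)."""
    growth = {
        topic: late_counts.get(topic, 0) / early
        for topic, early in early_counts.items()
        if early > 0
    }
    n_top = max(1, round(top_fraction * len(growth)))
    ranked = sorted(growth, key=growth.get, reverse=True)
    return set(ranked[:n_top])


def share_in_topics(funder_topic_counts, topics):
    """Fraction of a funder's publications that fall in the given set of topics."""
    total = sum(funder_topic_counts.values())
    if total == 0:
        return 0.0
    return sum(n for t, n in funder_topic_counts.items() if t in topics) / total


# Toy data: ten topics, only "t0" grows strongly between the two periods.
early = {f"t{i}": 100 for i in range(10)}
late = dict(early, t0=300)
top = fastest_growing_topics(early, late)

funder_counts = {"t0": 30, "t1": 70}
funder_share = share_in_topics(funder_counts, top)
print(top, funder_share)
```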
Academic Impact – Timeliness of Investment
Topic name (fast-growing topics) | Estimated share of research output in the EU Framework Programmes | Estimated share of research output in PubMed
Copy number variations (genome) | 0,5% | 0,2%
Graphene & nanotechnology | 1,3% | 0,4%
Complement activation | 0,9% | 0,2%
DNA sequence processing | 0,3% | 0,2%
Cleft palate | <0,1% | 0,3%
Gut microbiota | 0,4% | 0,2%
topicid title
226 graphene & nanotechnology
69 brain function
111 snps & genetic association
318 protein interaction / binding
351 escherichia coli infections
228 proteomics & mass spectrometry
433 climate change
68 genome sequencing
365 molecular dynamics & protein structure
400 influenza virus
272 vaccination & immunization
275 gene expression analysis
266 hiv infection
258 embryonic stem cells
71 clinical biomarkers & diagnosis
117 mosquitoes & public health
254 alzheimer disease
403 genetics (mutation, disease)
Academic Impact – EC Funded Topics
Academic Impact
Topic View: Cardiovascular Diseases
Funder Rank
National Institutes of Health (US) 1
Medical Research Council (UK)* 2
European Commission 3
Wellcome Trust (UK) 4
British Heart Foundation (UK) 5
National Health and Medical
Research Council (Australia) 6
Research Councils UK* 7
Swedish Research Council (Sweden) 8
Chief Scientist Office (UK) 9
Cancer Research UK 10
Topic Size: large
- twice the size of the average topic in PubMed
Topic Trend: growing
- 1.25 times larger in 2012-18 than in 2005-11
Topic Exclusivity: low
- many funders investing in the topic
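The size and trend figures in the topic view are simple ratios. The sketch below assumes they are computed from publication counts; the function names and the sample counts are hypothetical and chosen only to reproduce the "x2" and "1.25 times" readings.

```python
def topic_size_ratio(topic_count, all_topic_counts):
    """Size of one topic relative to the average topic."""
    average = sum(all_topic_counts) / len(all_topic_counts)
    return topic_count / average


def topic_trend_ratio(count_recent, count_earlier):
    """Publications in the recent period divided by publications in the earlier period."""
    return count_recent / count_earlier


# Hypothetical counts, for illustration only.
size = topic_size_ratio(200, [200, 50, 50, 100])   # average topic has 100 publications
trend = topic_trend_ratio(1250, 1000)              # e.g. 2012-18 versus 2005-11
print(size, trend)
```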
Academic Impact – Summary
Topic modelling
1. Automated
2. Granular
3. Bottom up
4. Not field related
Publication Links + Topics & Trends allow for comparisons across:
- Funders
- Projects
- Authors/Organizations
- Geographic locations
- Over time
Social media impact
Pipeline: topic models -> topic searches (search queries) -> search results -> indicators
Sources searched: News, Blogs, Fora, Twitter
Search Queries
Most discussed topics
Indicator: Topic Buzz, rank topics by the number of mentions
We show the top-20 topics
Dates: 13 January – 31 March 2019
Topics’ Engagement
Indicator: Virality, engaging articles
Meaning: The % of articles of each
topic that have more than 5
interactions on Facebook.
Dates: 13 January – 31 March 2019
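The Topic Buzz and Virality indicators can be sketched as follows. The threshold of more than 5 interactions comes from the slide; the function names, the input layout and the sample records are assumptions.

```python
from collections import Counter


def topic_buzz(mentions):
    """Rank topics by number of mentions; `mentions` holds one topic name per mention."""
    return Counter(mentions).most_common()


def virality(articles, min_interactions=5):
    """Percentage of articles per topic with more than `min_interactions` interactions.

    articles: iterable of (topic, facebook_interactions) pairs
    """
    totals, engaging = Counter(), Counter()
    for topic, interactions in articles:
        totals[topic] += 1
        if interactions > min_interactions:
            engaging[topic] += 1
    return {topic: 100.0 * engaging[topic] / totals[topic] for topic in totals}


# Hypothetical records, for illustration only.
buzz = topic_buzz(["flu", "flu", "asthma"])
viral = virality([("flu", 12), ("flu", 0), ("flu", 6), ("flu", 5), ("asthma", 1)])
print(buzz, viral)
```

Note that an article with exactly 5 interactions does not count as engaging under the "more than 5" wording.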
Flu
Indicator: Buzz trend, number of
daily mentions
Dates: 13 January – 15 April 2019
Cardiovascular risk factors
Indicator: risk factors of
cardiovascular diseases
We show the Share-of-Voice for
each factor
Dates: 13 January – 15 April 2019
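Share-of-Voice is each factor's share of all mentions in the comparison set. A minimal sketch, assuming mention counts per factor are already available; the factor names and counts below are placeholders, since the slide does not list them.

```python
def share_of_voice(mention_counts):
    """Return each item's percentage of the total number of mentions."""
    total = sum(mention_counts.values())
    if total == 0:
        return {name: 0.0 for name in mention_counts}
    return {name: 100.0 * count / total for name, count in mention_counts.items()}


# Placeholder risk factors and counts, for illustration only.
sov = share_of_voice({"smoking": 50, "obesity": 30, "hypertension": 20})
print(sov)
```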
Economic impact
Tracking of data from company
websites
Why?
Current methodologies affected by low and dropping response rates, relatively
high running costs and substantial data lags
Big data offers data scalability, completeness and shorter data lags
Growing interest in big data, e.g. future editions of the European Innovation
Scoreboard to contain data derived from big data approaches
Classification of innovations (what?)
Input data: company URL link
Innovation type:
- Innovation output: product innovation; service, process, other innovation
- Innovation activity: licensing activities; private/public funding attracted; certification & standardisation; M&A
+ Extraction of entities (product names, trademarks, copyright) associated with innovation outputs and activities
Key results: FP7-Core set
Key results: 2097 FP7 & H2020 companies analysed in total, over 1.5 million URL links
harvested, over 15,000 innovation texts identified
Key results: FP7-Core Set
Indicator | Indicator value (FP7-Core projects)
Number of companies analysed in the FP7-Core set | 1395
Estimated share of enterprises with evidence of innovation activities | 46.0%
Average number of innovation outputs and activities identified per company | 16.1
Estimated share of highly innovative enterprises | 7.4%
Estimated share of enterprises with evidence of licensing activities (incl. patent/trademark license agreements) | 9.3%
Estimated share of enterprises involved in activities related to acquisitions | 20.0%
Estimated share of enterprises with evidence of private investment/capital attracted | 8.0%
Examples of innovations identified
2019-06-26
Uptake of R&I by companies
Estimated uptake of innovation outputs and activities in FP7-Core projects, by ICD class
Uptake of R&I activities: targeted approach
Aiming at a simple but powerful first-line screening tool for HRF mutations, we have developed and validated a reverse-hybridization assay (HRF
StripAssay) for the rapid and simultaneous detection of 22 most common HRF mutations: H20N, H20P, I268T, V377I (HIDS); R260W, D303N, L305P,
T348M, L353P, Y570C (CAPS); C30R, C33Y, D42Del, T50M, C70R, C73W, R92Q (TRAPS); M680I(G/A), M680I(G/C), M694I(G/A), V726A (FMF).
Reliable genotyping of recombinant mutant clones and a selection of reference DNA samples was achieved by means of teststrips presenting
parallel arrays of allele-specific oligonucleotides. We demonstrated that the prototype HRF StripAssay is capable of detecting all 22 mutions, as
well as identifying homozygotes by the absence of the corresponding wild-type signal.
Summary
Company websites proved to be a rich source of data for innovation outputs and
activities
State-of-the-art web scraper and NLP model developed, approach is scalable and can
handle multiple languages
New data and indicators which can be reproduced in frequent batches
Summary
Useful for:
- Monitoring and ex-post evaluation: first use cases for the EIS built; possible to
link company innovations to previous research activities
- Storytelling: rich source of data for innovation success stories and case studies
- Proposal evaluation: innovation track record, previous commercialisation
activities, investment attracted, etc.
Caveats, weaknesses and areas for further work:
- Process and service innovations captured to a lesser degree
- Eudamed (EU database for CE marked medical devices and technologies,
opening in 2020) offers a rich source of data for further work
Societal and health impact
Linking medicines to R&I
Why?
No data currently tracked in a systematic way on the contributions of R&I to
new products on the market
Large investments made in translational medicine and close-to-market research,
but little known about the uptake
New products on the market are a proxy for economic impact, but also for
health/societal impact, e.g. orphan medicines, new non-generic medicines,
medicines treating highly resistant pathogens
Linking diagram
Entities: Project, Publication, Clinical trial, Sponsor, Medicine and/or active substance, New medicine
Relations: mentions, linked to, develops
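One way to picture the linking of projects to medicines is a small graph traversal. The sketch below is an assumption-heavy illustration: the edge directions, the relation name "produced", the `KnowledgeGraph` class and all node identifiers are hypothetical; only the entity types and the "mentions" and "linked to" relations appear on the slide.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal directed graph with labelled edges."""

    def __init__(self):
        self.edges = defaultdict(set)  # (source, relation) -> set of targets

    def add(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def targets(self, source, relation):
        return self.edges.get((source, relation), set())


def medicines_linked_to_project(graph, project):
    """Collect medicines reached via publications (mentions) or clinical trials (linked_to)."""
    medicines = set()
    for publication in graph.targets(project, "produced"):       # hypothetical relation
        medicines |= graph.targets(publication, "mentions")
    for trial in graph.targets(project, "linked_to"):
        medicines |= graph.targets(trial, "linked_to")
    return medicines


# Placeholder nodes, for illustration only.
kg = KnowledgeGraph()
kg.add("project:P1", "produced", "publication:X")
kg.add("publication:X", "mentions", "medicine:M1")
kg.add("project:P1", "linked_to", "trial:T1")
kg.add("trial:T1", "linked_to", "medicine:M2")

linked = medicines_linked_to_project(kg, "project:P1")
print(sorted(linked))
```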
Key results: human medicinal
products authorized by the EMA
Selected results: top-5 medicines with
the strongest links to FP7
Medicine name | Active substance | Marketing authorisation holder | Total number of mentions of medicine name & active substance
Orfadin | Nitisinone | Swedish Orphan Biovitrum International AB | 4290
Alkindi | Hydrocortisone | Diurnal Europe B.V. | 3144
Ferriprox | Deferiprone | Apotex Europe BV | 2789
Herceptin | Trastuzumab | Roche Registration GmbH | 1210
Aplidin | Plitidepsin | Pharma Mar, S.A. | 650
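The "total number of mentions" column can be approximated by counting name matches in project-related texts. This is a sketch under assumptions: the slides do not state which texts were searched or how matching was done, so the whole-word, case-insensitive regular expression and the sample sentences are illustrative only.

```python
import re


def count_mentions(texts, medicine_name, active_substance):
    """Count case-insensitive whole-word mentions of a medicine name or its active substance."""
    pattern = re.compile(
        r"\b(?:%s|%s)\b" % (re.escape(medicine_name), re.escape(active_substance)),
        re.IGNORECASE,
    )
    return sum(len(pattern.findall(text)) for text in texts)


# Made-up sentences, for illustration only.
sample_texts = [
    "Nitisinone (Orfadin) was evaluated in patients.",
    "Long-term nitisinone treatment was well tolerated.",
    "No relevant medicine is named here.",
]
mentions = count_mentions(sample_texts, "Orfadin", "Nitisinone")
print(mentions)
```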
Example: Orfadin
Example: Alkindi
Example: Olaparib
Example: Olaparib (breast, ovarian cancer)
Summary
To the best of our knowledge, Data4Impact is the first project to systematically link
medicinal products & clinical trials to R&I activities
Data highly useful for storytelling and impact stories, as well as monitoring and ex-
post evaluation
Eventually, big data will cover all key stages of the R&I lifecycle:
• Basic research: throughput/output data + measures of academic impact
• Translational research: clinical trials
• Applied research, close-to-market research: EMA data on medicines, Eudamed data
on medical devices & technologies
• Impact: HTAs, Cochrane reviews
Clinical guidelines
• Clinical guidelines, systematic reviews and treatment
recommendation documents provide traces of clinical and
professional practice
• Proprietary data from Minso Solutions AB, which maintains the
Clinical Impact database (CI:TM) (except WHO, Cochrane and
NICE documents, which are also available in PubMed)
• The coverage is nearly complete at the government level
for Sweden, Denmark, Norway, Germany (at the S3 level),
and the UK (NICE and SIGN guidelines), as well as good
coverage of WHO guideline documents and Cochrane
Systematic Reviews.
• In total, 855 clinical guidelines contained 3684 references (2,073
fractional) that were matched to 1781 publications found in
the D4I database.
Indicators
• Traditional bibliometric indicators based on clinical guideline citations
1. E.g. fractionalization at funder level, normalization of publication and citation
counts to comparable research and, if needed, usage of time averages.
• Combined citation and text-based metrics
2. Subject classification of clinical guideline docs
- Vector space embedding of references (based on reference/text combination)*
- Conceptual embeddings of references (based on MESH terms of references)**
3. Reference weight in the text of CGs (identification and categorization of named entities
within the clinical guidelines)***
Together with citation metrics, these modes of analysis aim to identify significant
relationships between cited references, named entities, topics, and reference functions.
* Eklund, J. (2018). The importance of scientific references in their contexts. Poster presented at the 23rd Nordic Workshop on Bibliometrics and
Research Policy 2018, Borås, 7-9 November.
** Eklund, J., Gunnarsson Lorentzen, D. & Nelhans, G. (2019). MESH classification of clinical guidelines using conceptual embeddings of
references. Manuscript accepted to ISSI, 17th International Society of Scientometrics and Informetrics Conference, Rome, 2-5 September.
*** Manuscript in preparation
Funder (EC breakdown)
Funder_type | Number (full) | Number (fract.)
EC_funder (FP7/H2020) | 115 | 78.2
European nat’l funders | 1,859 | 1,317.9
International funders | 1,710 | 676.9
Total sum | 3,684 | 2,073.0
Funder | Number (full) | Number (fract.)
EC_FP7-CORE | 74 | 49.9
EC_FP7-EXTENDED | 28 | 18.2
EC_H2020-EXTENDED | 1 | 0.1
EC_other | 12 | 10.0
Total sum | 115 | 78.2
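The "full" and "fract." columns can be read as whole and fractional counting of references. The sketch below assumes the common scheme in which a reference acknowledging n funders gives each of them a full count of 1 and a fractional count of 1/n; the slides do not spell out the exact rule, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def count_references(reference_funders):
    """Return (full, fractional) reference counts per funder.

    reference_funders: one list of funder names per matched reference
    """
    full = defaultdict(int)
    fractional = defaultdict(float)
    for funders in reference_funders:
        unique = set(funders)
        for funder in unique:
            full[funder] += 1                       # whole count: 1 per funder
            fractional[funder] += 1.0 / len(unique)  # fractional count: 1/n per funder
    return dict(full), dict(fractional)


# Hypothetical references, for illustration only.
references = [
    ["NIH", "MRC"],
    ["NIH"],
    ["EC", "NIH", "Wellcome Trust"],
    ["MRC"],
]
full, fract = count_references(references)
print(full)
print(fract)
```

Under this scheme the fractional counts always sum to the number of references, while the full counts sum to the number of reference-funder pairs.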
Funders (top 20)
Funder_full | Funder_country | Number (full) | Number (fract)
National Institutes of Health | US | 1,645 | 624.6
Medical Research Council | UK | 585 | 452.4
Wellcome Trust | UK | 555 | 416.9
NHMRC - National Health and Medical Research | Australia | 156 | 85.5
Cancer Research UK | UK | 122 | 85.6
RCUK - Research Councils UK | UK | 85 | 37.7
Chief Scientist Office | UK | 82 | 66.5
EC_FP7-CORE | EU | 74 | 49.9
British Heart Foundation | UK | 69 | 34.6
Swiss National Science Foundation | Switzerland | 64 | 41.9
Arthritis Research UK | UK | 29 | 27.5
World Health Organization | International | 29 | 27.3
EC_FP7-EXTENDED | EU | 28 | 18.2
AKA - Academy of Finland | FIN | 27 | 9.8
Biotechnology and Biological Sciences Research Council | UK | 15 | 9.6
EC_other | EU | 12 | 10.0
NWO - Netherlands Organisation for Scientific Research | Netherlands | 12 | 6.6
Austrian Science Fund FWF | Austria | 11 | 9.1
ARC - Australian Research Council | Australia | 10 | 5.1
Other (N=26 funders) | - | 74 | 54
Sum | - | 3,684 | 2,073
Guideline providers: EC projects matched with guideline citations (n=115)
Bar chart (axis from 0 to 30) covering the following providers:
- SoS Nationella Riktlinjer (SE)
- SIGN Guidelines (SC)
- SST | Sundhedsstyrelsen (DK)
- Läkemedelsverkets treatm. Recomm. (SE)
- SBU Utvärdering (SE)
- Helsedirektoratet (NO)
- Am. Acad. Neur. Practice (US)
- AWMF (DE)
- Folkhälsomyndigheten (SE)
- NICE Guidelines (EN)
- Cochrane - Reviews
- WHO International
MESH terms for funded research
EC
HIV Infections 13 1.97%
Antitubercular Agents 8 1.21%
Mycobacterium tuberculosis 8 1.21%
Stroke 6 0.91%
Antibodies, Monoclonal 5 0.76%
Colorectal Neoplasms 5 0.76%
ErbB Receptors 5 0.76%
Microbial Sensitivity Tests 5 0.76%
ras Proteins 4 0.61%
HIV-1 4 0.61%
Tuberculosis 4 0.61%
Diabetes Mellitus, Type 1 4 0.61%
Europe 4 0.61%
Medical Research Council
HIV Infections 62 2.45%
Stroke 27 1.07%
Anti-HIV Agents 26 1.03%
United Kingdom 26 1.03%
England 18 0.71%
Diabetes Mellitus, Type 2 17 0.67%
Brain 15 0.59%
Primary Health Care 15 0.59%
Smoking 15 0.59%
Smoking Cessation 14 0.55%
Cardiovascular Diseases 13 0.51%
Obesity 13 0.51%
Breast Neoplasms 12 0.47%
Bipolar Disorder 12 0.47%
HIV Seropositivity 12 0.47%
Depression 12 0.47%
Wellcome Trust
HIV Infections 104 4.01%
Antimalarials 60 2.32%
Malaria, Falciparum 48 1.85%
Artemisinins 44 1.70%
Tuberculosis 28 1.08%
Anti-HIV Agents 24 0.93%
Malaria, Vivax 23 0.89%
Malaria 22 0.85%
Plasmodium falciparum 22 0.85%
South Africa 21 0.81%
Pregnancy Complications, Parasitic 19 0.73%
Primaquine 17 0.66%
Quinolines 17 0.66%
MESH terms in referred works
fastText algorithm
Topical analysis of reference contexts
congue risus feugiat ref264 tincidunt lorem nullam
In the generated topic model, each word is associated
with a probability distribution of topics
For each reference, a symmetric context window of
size k is used as a pseudo-document, and the most
probable topic is calculated for that context window
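The context-window step can be sketched as below. Several points are assumptions rather than the authors' method: the window keeps the reference marker itself, the window's topic is chosen by averaging the per-word topic distributions, and the word-topic probabilities in the example are invented for illustration.

```python
def context_window(tokens, index, k):
    """Return the symmetric window of up to k tokens on each side of tokens[index]."""
    return tokens[max(0, index - k): index + k + 1]


def most_probable_topic(window, word_topics):
    """Average the topic distributions of known words and return the top topic (or None)."""
    known = [word for word in window if word in word_topics]
    scores = {}
    for word in known:
        for topic, probability in word_topics[word].items():
            scores[topic] = scores.get(topic, 0.0) + probability / len(known)
    if not scores:
        return None
    return max(scores, key=scores.get)


tokens = "congue risus feugiat ref264 tincidunt lorem nullam".split()
window = context_window(tokens, tokens.index("ref264"), k=2)
print(window)

# Invented word-topic probabilities, for illustration only.
word_topics = {
    "asthma": {346: 0.9, 78: 0.1},
    "lungs": {346: 0.3, 78: 0.7},
    "airway": {346: 0.8, 78: 0.2},
}
best_topic = most_probable_topic(["asthma", "lungs", "airway", "aref"], word_topics)
print(best_topic)
```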
Asthma, a chronic respiratory condition
affecting 300 million people globally (
aref15080825 ), causes inflammation of the lungs
as well as structural and functional remodelling
of the airways. It is characterised by recurrent
attacks of breathlessness and wheezing with
varying degrees of frequency and severity, which
is caused by swelling of the bronchial tubes
resulting in airflow limitation (WHO 2011).
Although the causes of asthma are not completely
understood, risk factors are known to include
inhaling asthma triggers such as allergens,
tobacco smoke and chemical irritants. Asthma is
incurable and the prevalence is increasing,
particularly in children and young adults (
aref22157151 ), however appropriate management
can control the disorder and enable people to
enjoy a high quality of life (WHO 2011).
https://doi.org/10.1002/14651858.CD001116.pub4
asthma a chronic respiratory condition affecting million people globally aref causes inflammation of
the lungs as well as structural and functional remodelling of the airways
Topic 346 (0.8149): asthma, copd, allergic, airway, disease, fev, ige, respiratory, lung, symptoms
Topic 78 (0.0689): pressure, lung, pulmonary, respiratory, gas, lungs, ventilation, volume, breathing,
alveolar
Topical coherence
Using distance measures defined on spaces of probability distributions, such as the Bhattacharyya distance and the Hellinger distance, we measure the divergence between the topics assigned to the same reference in different contexts, as well as between the topics assigned to context windows of different sizes for a specific in-text citation.
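As a small sketch of the two distance measures named above, assuming each topic assignment is available as a discrete probability vector over the same topics (the function names are illustrative, not taken from the project):

```python
import math

def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_i sqrt(p_i * q_i); equals 1 for identical distributions."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln BC(p, q); infinite when the distributions do not overlap."""
    bc = min(1.0, bhattacharyya_coefficient(p, q))  # clamp rounding error
    return math.inf if bc == 0 else -math.log(bc)

def hellinger_distance(p, q):
    """H(p, q) = sqrt(1 - BC(p, q)); bounded between 0 and 1."""
    bc = min(1.0, bhattacharyya_coefficient(p, q))
    return math.sqrt(1.0 - bc)
```

The Hellinger distance is bounded, which makes it easier to compare across references; the Bhattacharyya distance grows without bound as the overlap between the two topic distributions shrinks.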
Clinical guideline impact
• Professional impact – One step closer to the implementation of research
within the clinic
• Case: References in context:
 Generic method for academic citations
In Data4Impact:
1. Subject classification of the citing document based on the cited documents' MeSH terms
2. Distinguishing between kinds of references in guideline documents
3. Establishing the "topicality" of each reference based on a model trained on EuroPMC articles
Architecture
[Diagram labels: WP4: 500 topic models; H2020/FP7 project topics; web lists of diseases; human expert: manual selection; WP5.4: 138 topic searches; sources: News, Blogs, Fora, Twitter; outputs: Mentions, Indicators]
• Monthly releases
• ~1.5M documents per release (news, blogs, fora); expected total size ~5M documents
• ~10M tweets per release; total size ~30M tweets
• 138 topics searched -> 1 dataset per topic
Top-20 Twitter topics (n:~31M tweets)
[Bar chart: number of tweets for each of the 20 topics listed in the table below]
Topic Topic name Num tweets
433 climate change 9,949,906
272 vaccination 1,760,780
175 measles and newborn screening 1,457,110
245 stress disorders 898,758
209 diabetes mellitus 858,118
294 adhd 706,055
315 depression 703,844
348 transplantation 699,582
121 weight loss and obesity 696,612
319 cardiovascular risk factors 647,843
254 alzheimer disease 637,668
362 cancer therapy 570,636
123 eating disorders 513,989
240 hypertension and blood pressure 452,499
302 myocardium and heart failure 445,434
284 breast cancer 415,986
366 schizophrenia and bipolar disorder 407,553
344 dendritic cells and immunity 397,980
169 asthma 383,321
373 env. exposure and air pollution 381,212
Topic fluctuation Jan-Feb
[Line chart: tweet counts over time (y-axis 0 to 70,000) for topics 3: anorexia, 123: bulimia, 175: measles, 254: Alzheimer, 272: vaccination, 362: cancer]
Virality
For ten prominent topics by virality: the most retweeted tweet and its URL.
ID   Topic                           Retweets  URL
47   lung cancer                     145,421   https://t.co/nAtqnmKCqW
3    psychometrics                   35,353    https://t.co/xht4elJZ6w
450  iron deficiency and anemia      4,401     https://t.co/jBisW7YcRI
491  acute lymphoblastic leukemia    11,338    https://t.co/zc4qFt6fy5
324  embryonic development           3,534     https://t.co/xHd1kadSIf
433  climate change                  47,547    https://t.co/zxzAlorA3O
175  measles and newborn screening   15,561    https://t.co/HjMoUva4nN
272  vaccination                     11,923    https://t.co/d6l8vfmBVW
348  transplantation                 60,692    https://t.co/FSmETQpSkm
362  cancer therapy                  5,031     https://t.co/Qnvo8hTtdE
[Images: 47 lung cancer, 491 leukemia, 433 climate change, 272 vaccination, 348 transplantation]
Task 5.4.3 Twitter conversation analysis
• Builds on other WP5.4 activities, but takes a somewhat different approach to collecting data.
 Focuses on relationships between social media posts (retweets, @tweets, #tweets)
 Possible to construct meaningful tests as "scripted dialogs"
 Helps weed out spam
 Amenable to content-based text analysis at the conversation level (e.g. sentiment analysis, topic modelling)
Referring to research in thread
First collected tweet in thread:
-[tweet id='13441' replyto='14018'] Independent research has shown that individuals who were
vaccinated for the flu had 5.5 times more respiratory illness than those who were not
vaccinated. [/tweet]
- (A number of replies omitted; thread length: 313)
- [tweet id='216387' replyto='216418'] In the light of new info, why not? It happens all the
time.[/tweet]
- (Replies omitted, showing those with reference)
- [tweet id='216302' replyto='216387'] which is???DOI:10.1371/journal.pntd.0005179
[/tweet]
- [tweet id='216261' replyto='216387'] 'Analysis of year 3 results of phase III trials
of Dengvaxia suggest high rates of protection of vaccinated partial dengue immunes
but high rates of hospitalizations during breakthrough dengue infections of persons
who were vaccinated when seronegative...'DOI:10.1371/journal.pntd.0005179
[/tweet]
-- [tweet id='216241' replyto='216387'] Phase III Trials, among our 9-year olds!
FACT. DOI:10.1371/journal.pntd.0005179 [/tweet]
--- [tweet id='215757' replyto='216241'] Phase 2 was all that is required for release
Phase 3 was 'extra' 'Extra' studies are always done throughout the commercial
lifetimes of drugs & vaccines Consequences of phase 3 results are nowhere near
what group wud have us believe DOI:10.1371/journal.pntd.0005179 [/tweet]
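A thread in the bracketed notation shown above could be turned into a reply structure along these lines. The sketch assumes that the `[tweet id='…' replyto='…'] … [/tweet]` notation is the actual serialisation, which the slides do not confirm; the regular expressions and function names are illustrative only.

```python
import re
from collections import defaultdict

# Assumed serialisation, copied from the example thread in the slides.
TWEET_RE = re.compile(r"\[tweet id='(\d+)' replyto='(\d+)'\](.*?)\[/tweet\]", re.DOTALL)
# Loose DOI pattern: "10.", a registrant code, a slash, then non-space characters.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s'\"]+")

def parse_thread(raw):
    """Return (tweets, replies): tweet records by id, and child ids grouped by parent id."""
    tweets = {}
    replies = defaultdict(list)
    for tweet_id, parent_id, text in TWEET_RE.findall(raw):
        tweets[tweet_id] = {"replyto": parent_id, "text": " ".join(text.split())}
        replies[parent_id].append(tweet_id)
    return tweets, replies

def tweets_citing_research(tweets):
    """Return {tweet_id: [DOIs]} for tweets whose text contains at least one DOI."""
    found = {}
    for tweet_id, tweet in tweets.items():
        dois = DOI_RE.findall(tweet["text"])
        if dois:
            found[tweet_id] = dois
    return found
```

Working at the level of the reply structure rather than single tweets is what allows references to research to be read in the context of the discussion they appear in.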
Vaccination on Twitter
Topic bursts, user behaviour and referring to research in discussions
Topic burst
• Identify a day when activity is more than 50% above the daily average
• The burst extends up to the next day with activity below the average
• This period is compared to previous and following periods of equal length
• This example: a 4-day burst in topic 272 (vaccination)
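The burst rule in the bullets above can be sketched as follows. Two points are assumptions rather than statements from the slides: the daily average is taken over the whole observed series, and the first below-average day is treated as lying outside the burst. Function and parameter names are illustrative.

```python
def find_bursts(daily_counts, threshold=1.5):
    """Return bursts as (start, end) day indices, end exclusive.

    A burst starts on a day with activity above threshold * daily average
    and runs until the next day with activity below the average.
    """
    mean = sum(daily_counts) / len(daily_counts)
    bursts = []
    i = 0
    while i < len(daily_counts):
        if daily_counts[i] > threshold * mean:
            end = i + 1
            while end < len(daily_counts) and daily_counts[end] >= mean:
                end += 1
            bursts.append((i, end))
            i = end
        else:
            i += 1
    return bursts

def comparison_periods(start, end, n_days):
    """Return the previous and following periods of equal length (None if out of range)."""
    length = end - start
    previous = (start - length, start) if start - length >= 0 else None
    following = (end, end + length) if end + length <= n_days else None
    return previous, following
```

For a series of daily tweet counts for one topic, `find_bursts` yields the burst periods and `comparison_periods` the two reference periods against which each burst is compared.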
[Line chart: tweet counts per day for topics 3, 123, 175, 254, 272 and 362, 14 Jan to 10 Feb; y-axis 0 to 70,000]
RT networks (similar structures, amount of RTs increases when activity is high)
Word clouds based on hashtags (seemingly a topical shift during burst)
48% RTs, 55% RTs, 42.5% RTs
User groups and their relative activity (share of tweets, %)
User group                                Previous (144,869 tweets)  Burst (194,712)  Next (115,557)
Top 1% most active share (overall: 16%)   12                         12               19
Next 9% share (overall: 17%)              20                         18               18
90% least active share (overall: 67%)     68                         70               63
The least active user group is more prominent when general activity is high, while the most active user group is more prominent when activity is low.
RT and coupled hashtag networks from burst period.
"Deniers" (measles, vaxxed, mmr, autism, study, flu, hpv, informedconsent, vaxwoke, cdc, vaccineinjury, learntherisk, maga, gardasil, vaccineskill)
"Non-deniers 1" (measles, vaccineswork, flu, hpv, antivax, vaxfactsfebruary, vaccinessavelives, immunization, antivaxxers, mumps, rotavirus, ethiopia, law, ebola)
"Non-deniers 2" (measles, vaccineswork, publichealth, science, humanitariancrisis, scientificreport, antivax, vaccinessavelives, venezuela, crisis, humanitarianaid, help, antivaxxers, vaccinesaresafe, misinformation, scicomm, itrustvaccines, mmr, factsmatter)
9,647 plain-text biographies from Twitter profiles classified using a rule-based method: 30% matched as professionals:
Academic: 27%
Academically trained: 11%
Other professional: 23%
Media: 38%
Policy/decision maker: 1%
Class                          Keyword example
Science student                student, studying
Graduated                      MS, MA, graduate
University faculty             lectur, prof., professor
Other scientist                technician, lab manager, -ologist
Education and outreach         curator, teacher, librarian
Applied science organization   nonprofit, philantropy
Other professional             recruiter, entrepreneur, manager
Media professional             journalis, publisher
Policy/decision maker          congressman, senator, parliament
Ekström, B. (2019): Developing a rule-based method for identifying researchers on Twitter: The case of vaccine discussions
Poster accepted to ISSI, 17th International Society of Scientometrics and Informetrics Conference, Rome, 2-5 September.
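A rule-based classifier of this kind might look roughly like the sketch below. It only uses the keyword examples shown on the slide, so it is far from the full rule set in Ekström (2019); the regular expressions, the first-match-wins ordering and the case-sensitive treatment of "MS" and "MA" are assumptions made for the illustration.

```python
import re

# (class, pattern) pairs built from the keyword examples on the slide.
# Stems such as "lectur" and "journalis" are treated as prefixes, "-ologist" as a suffix.
RULES = [
    ("Science student", r"(?i:\b(?:student|studying)\b)"),
    ("Graduated", r"\b(?:MS|MA)\b|(?i:\bgraduate\b)"),  # degree abbreviations kept case-sensitive
    ("University faculty", r"(?i:\blectur\w*|\bprof\.|\bprofessor\b)"),
    ("Other scientist", r"(?i:\btechnician\b|\blab manager\b|\w+ologist\b)"),
    ("Education and outreach", r"(?i:\b(?:curator|teacher|librarian)\b)"),
    # Matches both the slide's spelling "philantropy" and "philanthropy".
    ("Applied science organization", r"(?i:\b(?:nonprofit|philanth?ropy)\b)"),
    ("Other professional", r"(?i:\b(?:recruiter|entrepreneur|manager)\b)"),
    ("Media professional", r"(?i:\bjournalis\w*|\bpublisher\b)"),
    ("Policy/decision maker", r"(?i:\b(?:congressman|senator|parliament)\b)"),
]
COMPILED_RULES = [(label, re.compile(pattern)) for label, pattern in RULES]

def classify_bio(bio):
    """Return the first class whose pattern matches the biography, or None if none match."""
    for label, pattern in COMPILED_RULES:
        if pattern.search(bio):
            return label
    return None
```

Because the first matching rule wins, the order of the rules matters: "lab manager" is caught by "Other scientist" before the more general "manager" keyword under "Other professional".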
How can we use Twitter-bio personas?
- Retweet data
- Conversation data
Data4Impact has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No 770531.
Thank you for your attention!
The Data4Impact Consortium
Visit our website:
www.data4impact.eu
Follow us on Twitter and SlideShare:
@Data4Impact

B2 Creative Industry Response Evaluation.docx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 

Presentation on Data4Impact methodology and results (Workshop in Brussels)

  • 3. Data4Impact: the basics
    Call: CO-CREATION-08-2016-2017: Better integration of evidence on the impact of research and innovation in policy making
    Expected impacts:
    - Improved monitoring of R&I activities: new indicators for assessing research and innovation performance, including the impact of research and innovation policies
    - Prove value to the society: determining the societal impact of research and innovation funding in order better to justify research and innovation spending
    Data4Impact addresses key challenges and expected impacts of CO-CREATION-08-2016-2017 through a data driven approach
  • 5. What is big data?
    Definition of Big Data: "Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."
    Key properties of Big Data:
    - Volume, i.e. no sampling is generally applied
    - Variety, i.e. structured and unstructured data from various sources, in different formats
    - Velocity, i.e. real-time/rapid data
    - Veracity, i.e. variations in data quality, cleaning, processing, etc.
    Non-intrusiveness -> Big Data is a byproduct of digital interaction and communication
    Key objective: make Big Data small!
  • 6. Big data versus traditional methods: pros and cons No sampling, bottom-up, scalable Low administrative burden Short/no data lags New data and indicators Risk of misidentification Data veracity Lack of persistent identifiers Data mishandling, ethics
  • 7. Where? Start with an individual Individual level Who participated in the programme? Who were members of the extended team? Organisation/team level Research teams in universities & research centres; Small companies and large enterprises Project/programme level Data aggregated at project or programme level Analytical dimensions Within researchers themselves; between researchers; between researchers and organisations; between organisations; between projects; between programmes Key questions: - Whom exactly did the programme attract? - What happened during and after the projects? - What was the impact?
  • 8. How? Build a Knowledge Graph, Integrate Data
  • 9. Why/what? Answer questions that matter to funders without ever asking a beneficiary
    1. Outputs, products and interventions: Outputs, products and interventions; Collaborations; Scientific publications; Intellectual Property Rights; Scientific prizes
    2. Outcome-level indicators: Innovations; Dissemination activities; Further funding/investment; Next destinations; Effects on the company/private sector; New companies/organizations created
    3. Impact-level indicators: Impact on health and welfare/Health and environmental impacts; Impacts on creativity, culture & society/Social, economic, capability and cultural impact; Influence on policy making/political impact
  • 10. Ask less, know more Evaluating Planning Storytelling
  • 19. Key facts about Data4Impact: data sources Input data EC monitoring data (Health & SC1 projects, health related), PubMed data Data sources: output level indicators EC monitoring data (Cordis) OpenAIRE Europe PMC (incl. full text data) PATSTAT (incl. abstracts & full texts) Lens.org data Data sources: result and impact level indicators Company websites Social media (Twitter) Clinical guidelines repositories EC monitoring data EMA data on human medicinal products & orphan medicines DrugBank data Company websites Social media (Twitter) News/media sites
  • 20. Key facts about Data4Impact
    Project dimension | Coverage
    Levels of data collection | Organisation; Project (for EU FP programmes only); Programme
    Programmes covered | Over 40 health funders in Europe + EU FPs
    Data collection | Yes (strong effort)
    Data integration | Yes (moderate effort)
    Machine learning, NLP, entity recognition | Yes (strong effort)
    Topic modelling | Yes (strong effort)
    Project duration & budget | 2 years, EUR 1.5 million
  • 21. Key objectives and results Data4Impact
  • 22. Data4Impact: objectives Objective 1: define, develop, analyse new indicators for assessing the performance of EU and national R&I systems.
  • 23. Data4Impact: objectives Objectives 2+3: gather data at input, throughput, output and impact levels, derive facts and understand impact on health-related challenges Objectives 4+5: perform community-driven validation and develop user- centered tools
  • 24. Key achievements
    Data4Impact offers unique coverage of data sources, with an aim to link them through specific entities
    Data4Impact covers all key stages of the R&I lifecycle in the health domain, i.e. basic research -> translational & applied research -> innovation & uptake on the market -> clinical practice & public health
    New indicators and line of thinking investigated on academic impact:
    - The funder and society perspective: funding timely and relevant research? Do the 'right thing' by funding rare topics?
    - If a funder enters an area where few others invest, does this imply stronger impact?
    - How does this interact with the researcher/organization perspective?
    Data4Impact was first/one of the first to track data to medium- and long-term economic and societal/health impacts, i.e. link previous project activities to events that happened recently
  • 26. What is societal impact? Societal impact as a "demonstrable contribution that excellent research makes to society and the economy. This can involve academic impact, economic and societal impact [...]" (Economic and Social Research Council, ESRC)
  • 27. General requirements • Take the temporality, multicausality, and multifacetedness of societal impact into account • Consider the academic, economic, and societal impact dimensions and illustrate the respective impact generation paths • Take advantage of all the data/data sources covered by Data4Impact • For analytical purposes we distinguish between: 1) Academic Impact (effects on the academic system and scientific practice) 2) Economic Impact (effects on the economy and value creation) 3) Societal Impact (effects on policy, society, and culture as well as on individual behaviour, subjective wellbeing, and life satisfaction)
  • 28. Simplified linear logic I Input Throughput Output Impact Processes before any R&I activity starts as well as the resources that are needed. Intermediate results of R&I activities, i.e. documented knowledge. Further processing of knowledge generated during R&I activities Demonstrable contribution to academia, economy, and society
  • 31. Input Throughput Output Impact Keeping Track of the Whole Process - automated - granular - scalable - applicable to other settings Tracking Research Activities
  • 32. Input Throughput Output Impact Keeping Track of the Whole Process - automated - granular - scalable - applicable to other settings Tracking Research Activities Funding
  • 33. Input - FP7/H2020 Projects: number of research projects of the EU Framework Programme
    Programme | Core Set | Extended Set
    FP7 | 998 | 8832
    H2020 | 669 | 2253
    [eHealth] [SC 1] [> 20% PubMed]
  • 34. Input - FP7/H2020 Projects – DATA CORDIS ● Call document ● Project description ● Final or periodic project reports (project summary) ● Scholarly publications deriving from the project ● Patents ● Results in Brief – Expected Impact automatic extraction of pertinent info from associated documents (AI) and metadata
  • 35. Input - FP7/H2020 Projects EC Contribution for the FP7-Core and H2020-Core Projects
  • 36. Topics in the Health Sector ICD - 10 Chapters • International statistical Classification of Diseases and related health problems • international standard for reporting diseases and health conditions • diagnostic standard for all clinical and research purposes • ICD classes associated with every project
  • 37. Topics in the Health Sector ICD - 10 Example: Neoplasms Malignant neoplasms, stated or presumed to be primary, of specified sites, except of lymphoid, haematopoietic and related tissue C00-C75 Malignant neoplasms of ill-defined, secondary and unspecified sites C76-C80 Malignant neoplasms, stated or presumed to be primary, of lymphoid, haematopoietic and related tissue C81-C96 Malignant neoplasms of independent (primary) multiple sites C97-C97 In situ neoplasms D00-D09 Benign neoplasms D10-D36 Neoplasms of uncertain or unknown behaviour D37-D48
  • 38. Input (Funding) & Topics: H2020-Core, FP7-Core
  • 39. Input Project-level data Funding allocation by - Organizations: type (private, public), geographic location - Funder - ICD chapters Comparisons over time.
  • 40. Input Throughput Output Impact Keeping Track of the Whole Process - automated - granular - scalable - applicable to other settings Tracking Research Activities Publications Patents cited in PP Other Innovations
  • 41. Input - FP7/H2020 Projects – DATA CORDIS ● Call document ● Project description ● Final or periodic project reports (project summary) ● Scholarly publications deriving from the project ● Patents ● Results in Brief – Expected Impact automatic extraction of pertinent info from associated documents (AI) and metadata
  • 42. Throughput & Output Innovation “Insights” from Project Portfolios • Diagnostic Tools • Treatment • Drug • Protocol • Biomarker • Biorepository • Gene • Metabolite • Clinical Trial • Method • Patent • Device • Material • Infrastructure • Software • System • Prototype • Study • Publication • Company • Education • Employment • Dissemination • *Impact • *Outcome
  • 43. Throughput - Publications
    Documents | FP7 Core | FP7 Extended | H2020 Core | H2020 Extended
    Rest/Other Pubs | 4205 | 42916 | 500 | 5657
    Pubs in PubMed | 25980 | 68521 | 1324 | 8590
  • 44. Throughput – # Patents by ICD Class, FP7 Extended
    ICD chapter | # Patents
    Neoplasms | 132
    Diseases of the genitourinary system | 74
    Diseases of the nervous system | 68
    Diseases of the respiratory system | 64
    Diseases of the circulatory system | 63
    Endocrine, nutritional and metabolic diseases | 46
    Diseases of the eye and adnexa | 41
    Certain infectious and parasitic diseases | 37
    Diseases of the skin and subcutaneous tissue | 30
    Diseases of the musculoskeletal system and connective tissue | 25
    Diseases of the digestive system | 22
    Injury, poisoning and certain other consequences of external causes | 21
    Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism | 18
    Congenital malformations, deformations and chromosomal abnormalities | 11
    Mental and behavioural disorders | 9
    Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified | 6
  • 45. Output – Innovations – FP7-Extended [Bar chart: counts by innovation type, axis from 0 to 40000. Types shown: Treatment, Standard, Publication, Prototype, Protocol, Protein, Metabolite, Material, Gene, Employment, Education, Drug, Dissemination, Diagnostic Tool, Clinical Trial, Biorepository, Biomarker, Device, Infrastructure, Method, Software, System, Study]
  • 46. Output – Pubs in Patents
    Funder | Number of publications analysed | Share of publications cited in patents at least once
    National Institutes of Health (US) | 397886 | 4,4%
    Wellcome Trust (UK) | 97434 | 6,8%
    European Commission | 84038 | 5,5%
    National Science Foundation (US) | 52366 | 4,5%
    Medical Research Council (UK)* | 45246 | 10,0%
    Research Councils UK* | 39214 | 2,9%
    Biotechnology and Biological Sciences Research Council (UK)* | 22260 | 9,8%
    National Health and Medical Research Council (Australia) | 21181 | 2,3%
    Swiss National Science Foundation (Switzerland) | 15961 | 5,3%
    Austrian Science Fund (Austria) | 13816 | 5,6%
  • 47. Output – Creation of New Companies
    ● 430 newly created companies in FP7
    ● 51 of which in FP7-Core
    ● Sample of FP7-Core projects with 2 or more new companies formed:
    Project Number | Project Acronym | # Spin-offs
    201924 | EDICT | 3
    223744 | DOPAMINET | 2
    201418 | READNA | 2
    278832 | hiPAD | 2
    279039 | ComplexINC | 2
  • 48. Collaboration Networks ICD Ch9 Diseases of the Circulatory System Technological Diffusion - Organization networks (public vs private, geographic location, etc): size, density, key bridge organizations, across fields, fine detail within a subfield
  • 49. Input Throughput Output Impact Keeping Track of the Whole Process - automated - granular - scalable - applicable to other settings Tracking Research Activities Project portfolios, PubMed, Lens.org, etc Insight extractors & other NLP algorithms
  • 50. Input Throughput Output Impact Linking across - funders/programs - organization (type, location, etc) - ICD class - Time -> IMPACT Tracking Research Activities Project portfolios, PubMed, Lens.org, etc Insight extractors & other NLP algorithms
  • 52. Input Throughput Output Impact Keeping Track of the Whole Process - automated - granular - scalable - applicable to other settings Academic, Economic, Societal Impact
  • 54. Topic Modelling Publications • > 5 million • H2020, FP7 • 20% of sample from 40+ funders of D4I Deep Learning NLP Expert 442 Topics 9 major categories Linked to funders, organizations, authors countries, etc.
  • 55. Topic Modelling: Identifying Topics
    Citations:
    - Clinicopathologic and 11C-Pittsburgh compound B implications of Thal amyloid phase across the Alzheimer's disease spectrum
    - An autoradiographic evaluation of AV-1451 Tau PET in dementia
    - Deciphering Interactions of Acquired Risk Factors and ApoE-mediated Pathways in Alzheimer's Disease
    - What is normal in normal aging? Effects of aging, amyloid and Alzheimer's disease on the cerebral cortex and the hippocampus
    - Soluble apoE complex: mechanism and therapeutic target for APOE4-induced AD risk
    - Role of genes linked to sporadic Alzheimer's disease risk in the production of β-amyloid peptides
    - Proteolytic Cleavage of Apolipoprotein E4 as the Keystone for the Heightened Risk Associated with Alzheimer's Disease
    MeSH: alzheimer disease; amyloid beta peptides; amyloid; neurodegenerative diseases; brain; apolipoprotein e4; amyloidosis
    Text: Amyloid, Alzheimer, Apoe, Neurodegeneration, Neurodegenerative, Abeta, Brain, Dementia, Aggregation, Fibrils, Tau, Cognitive, Pathology, Plaques, Deposition, impairment, aging
    Phrases: alzheimer disease; neurodegenerative diseases; amyloid fibrils; amyloid deposition
    Keywords: alzheimer disease; neurodegeneration; amyloid; dementia; geriatrics
    Wikipedia terms: Alzheimer's_disease; Neurodegeneration; Apolipoprotein_E; Amyloid; Neuropathology
    What is this topic about? Alzheimer's disease
  • 56. Topic Modelling – What for? (1/2) • identify active areas of research: discover hidden themes (topics) • understand what is actually produced: calc topic distributions per document / project(grant) / funder • analyze active research areas on several dimensions (e.g., geographic regions, funders, etc.)
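A minimal sketch of the "topic distributions per document / project / funder" idea on slide 56, assuming each publication already has a topic distribution from some topic model and that a funder-level profile is the plain average of its publications' distributions. The function name, the sample data, and the averaging rule are illustrative assumptions, not the Data4Impact implementation.

```python
from collections import defaultdict


def funder_topic_shares(doc_topics, doc_funder):
    """Average per-document topic distributions within each funder.

    doc_topics: {doc_id: {topic: weight}}, weights summing to 1 per document.
    doc_funder: {doc_id: funder name} (hypothetical one-funder-per-document link).
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for doc_id, dist in doc_topics.items():
        funder = doc_funder[doc_id]
        counts[funder] += 1
        for topic, weight in dist.items():
            sums[funder][topic] += weight
    return {
        funder: {topic: w / counts[funder] for topic, w in topics.items()}
        for funder, topics in sums.items()
    }


# Made-up example data, for illustration only.
doc_topics = {
    "pub1": {"asthma": 0.8, "lung": 0.2},
    "pub2": {"asthma": 0.4, "hiv": 0.6},
    "pub3": {"hiv": 1.0},
}
doc_funder = {"pub1": "Funder A", "pub2": "Funder A", "pub3": "Funder B"}
shares = funder_topic_shares(doc_topics, doc_funder)
```

The same aggregation could be keyed by project, organisation, or country instead of funder, which is what makes the comparisons across dimensions possible.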
  • 57. • discover clusters and communities, assess research collaboration: topic based similarity analysis • identify emerging research areas: topic based trend analysis • assess coverage, identify gaps or new challenges: compare funded research • assess the relevance and impact of research in the society using new indicators Topic Modelling – What for? (2/2)
  • 58. Topic Modelling & ICD Chapters Topic Modelling • automation • granularity • bottom up • process is not field-related • changes in set of topics over time • ICD Chapters provide another piece of information
  • 59. Topic Modelling – Major Categories
    Topic category | Estimated share of research output in PubMed | # Research topics in the Data4Impact topic model
    1. Infectious Diseases | 7,2% | 34
    2. Non-Communicable Diseases | 18,6% | 86
    3. Health systems, public health & epidemiology | 14,5% | 63
    4. Diagnostics, treatment development, surgery | 6,4% | 26
    5. Molecular cell biology | 26,1% | 118
    6. Methods, models, technologies, databases | 11,5% | 46
    7. Physiology | 3,2% | 15
    8. Cognition and behaviour | 4,6% | 18
    9. Other | 7,9% | 36
    Total | 100,0% | 442
  • 60. Distributed (Big) Data analytics HCI design & user experience GPU Topic Modelling Identify topic trends
  • 61. Distributed (Big) Data analytics HCI design & user experience GPU Topic Modelling Trendy Topics
  • 63. Topic Modelling Important but declining (?) Genetic algorithms P2P networks & content distribution
  • 64. Academic Impact – Common Topics
    topicid | title
    318 | protein interaction / binding
    365 | molecular dynamics & protein structure
    275 | gene expression analysis
    69 | brain function
    111 | snps & genetic association
    209 | Diabetes
    315 | depression & anxiety
    68 | genome sequencing
    470 | hiv epidemiology
    284 | breast cancer
    319 | cardiovascular disease (risk)
    109 | smoking and public health
    48 | kidney disease
    403 | genetics (mutation, disease)
    351 | escherichia coli infections
    226 | graphene & nanotechnology
    121 | obesity
    312 | lung / pulmonary disease
  • 65. Academic Impact – Rare Topics
    topicid | title
    123 | eating disorders
    306 | arsenic exposure & public health
    397 | ovarian cancer
    164 | gastric cancer
    465 | glioblastoma
    269 | genomics & exome sequencing
    248 | psoriasis
    117 | mosquitoes & public health
    462 | hepatitis B infection (hbv)
    47 | lung cancer
    212 | oral / dental health
    327 | thyroid disease, hormone, cancer
    296 | hodgkin lymphoma
    71 | clinical biomarkers & diagnosis
    11 | multiple sclerosis
    490 | pet imaging
    489 | pharmacokinetics
    101 | epilepsy
  • 66. Academic Impact – Timeliness of Research
    Funder | Share of research output in top-10% fastest growing research topics
    National Health and Medical Research Council (Australia) | 24,7%
    Research Councils UK* | 23,5%
    European Commission | 19,5%
    National Institutes of Health (US) | 16,7%
    Swiss National Science Foundation (Switzerland) | 16,2%
    Wellcome Trust (UK) | 14,5%
    Biotechnology and Biological Sciences Research Council (UK)* | 11,2%
    Medical Research Council (UK)* | 11,1%
    Total PubMed | 9,9%
  • 67. Academic Impact – Timeliness of Investment (fast-growing topics)
    Topic name | Estimated share of research output in the EU Framework Programmes | Estimated share of research output in PubMed
    Copy number variations (genome) | 0,5% | 0,2%
    Graphene & nanotechnology | 1,3% | 0,4%
    Complement activation | 0,9% | 0,2%
    DNA sequence processing | 0,3% | 0,2%
    Cleft palate | <0,1% | 0,3%
    Gut microbiota | 0,4% | 0,2%
  • 68. Academic Impact – EC Funded Topics
    topicid | title
    226 | graphene & nanotechnology
    69 | brain function
    111 | snps & genetic association
    318 | protein interaction / binding
    351 | escherichia coli infections
    228 | proteomics & mass spectrometry
    433 | climate change
    68 | genome sequencing
    365 | molecular dynamics & protein structure
    400 | influenza virus
    272 | vaccination & immunization
    275 | gene expression analysis
    266 | hiv infection
    258 | embryonic stem cells
    71 | clinical biomarkers & diagnosis
    117 | mosquitoes & public health
    254 | alzheimer disease
    403 | genetics (mutation, disease)
  • 69. Academic Impact – Topic View: Cardiovascular Diseases
    Funder | Rank
    National Institutes of Health (US) | 1
    Medical Research Council (UK)* | 2
    European Commission | 3
    Wellcome Trust (UK) | 4
    British Heart Foundation (UK) | 5
    National Health and Medical Research Council (Australia) | 6
    Research Councils UK* | 7
    Swedish Research Council (Sweden) | 8
    Chief Scientist Office (UK) | 9
    Cancer Research UK | 10
    Topic Size: large (twice the size of the average topic in PubMed)
    Topic Trend: growing (1.25 times larger in 2012-18 than in 2005-11)
    Topic Exclusivity: low (many funders investing in the topic)
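The topic size and topic trend figures on slide 69 can be illustrated with a small sketch. It assumes that size is a topic's publication count relative to the average topic, and that trend compares the topic's share of all publications in the later period with its share in the earlier period; the slides do not state the exact formulas, so both definitions and all numbers below are assumptions.

```python
def topic_size_ratio(topic_count, all_topic_counts):
    """Topic size relative to the average topic (2.0 means twice the average)."""
    average = sum(all_topic_counts) / len(all_topic_counts)
    return topic_count / average


def topic_trend(count_early, total_early, count_late, total_late):
    """Ratio of the topic's share of output in the late vs. the early period.

    Values above 1 indicate a growing topic. Using shares rather than raw
    counts (an assumption) corrects for overall growth in publication volume.
    """
    return (count_late / total_late) / (count_early / total_early)


# Made-up counts chosen to reproduce the magnitudes quoted on the slide.
size = topic_size_ratio(200, [200, 100, 50, 50])   # average topic has 100 publications
trend = topic_trend(count_early=40, total_early=1000,
                    count_late=75, total_late=1500)
```

With these inputs the topic is twice the average size and about 1.25 times larger in the later period, matching the "large" and "growing" labels in form only.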
  • 70. Academic Impact – Summary Topic modelling 1. Automated 2. Granular 3. Bottom up 4. Not field related Publication Links + Topics & Trends allow for comparisons across: - Funders - Projects - Authors/Organizations - Geographic locations - Over time
  • 73. Most discussed topics. Indicator: Topic Buzz, rank topics by the number of mentions. We show the top-20 topics. Dates: 13 January – 31 March 2019 [Bar chart: number of mentions per topic, axis from 0 to 450000]
  • 74. Topics’ Engagement Indicator: Virality, engaging articles Meaning: The % of articles of each topic that have more than 5 interactions on Facebook. Dates: 13 January – 31 March 2019
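The virality indicator on slide 74 is defined precisely enough to sketch: the percentage of a topic's articles with more than 5 Facebook interactions. The input format (a list of topic and interaction-count pairs) and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def virality(articles, threshold=5):
    """Percentage of articles per topic with more than `threshold` interactions.

    articles: iterable of (topic, interaction_count) pairs (hypothetical format).
    """
    total = defaultdict(int)
    engaging = defaultdict(int)
    for topic, interactions in articles:
        total[topic] += 1
        if interactions > threshold:  # strictly more than 5, as on the slide
            engaging[topic] += 1
    return {topic: 100.0 * engaging[topic] / total[topic] for topic in total}


# Made-up sample: 2 of 4 flu articles and 1 of 3 obesity articles are engaging.
articles = [
    ("flu", 12), ("flu", 3), ("flu", 6), ("flu", 0),
    ("obesity", 5), ("obesity", 40), ("obesity", 2),
]
scores = virality(articles)
```

Note that an article with exactly 5 interactions does not count as engaging under the "more than 5" wording.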
  • 75. Flu Indicator: Buzz trend, number of daily mentions Dates: 13 January – 15 April 2019
  • 76. Cardiovascular risk factors Indicator: risk factors of cardiovascular diseases We show the Share-of-Voice for each factor Dates: 13 January – 15 April 2019
  • 78. Tracking of data from company websites Why? Current methodologies affected by low and dropping response rates, relatively high running costs and substantial data lags Big data offers data scalability, completeness and shorter data lags Growing interest in the big data, e.g. future editions of the European Innovation Scoreboard to contain data derived from big data approaches
  • 79. Classification of innovations (what?) Innovations Innovation type Input data Company URL link Innovation output Product innovation Service, process, other innovation Innovation activity Licensing activities Private/public funding attracted Certification & standardisation M&A + Extraction of entities (product names, trademarks, copyright) associated with innovation outputs and activities
  • 80. Key results: FP7-Core set Key results: 2097 FP7 & H2020 companies analysed in total, over 1.5 million URL links harvested, over 15,000 innovation texts identified
  • 81. Key results: FP7-Core Set
    Indicator | Indicator value (FP7-Core projects)
    Number of companies analysed in the FP7-Core set | 1395
    Estimated share of enterprises with evidence of innovation activities | 46.0%
    Average number of innovation outputs and activities identified per company | 16.1
    Estimated share of highly innovative enterprises | 7.4%
    Estimated share of enterprises with evidence of licensing activities (incl. patent/trademark license agreements) | 9.3%
    Estimated share of enterprises involved in activities related to acquisitions | 20.0%
    Estimated share of enterprises with evidence of private investment/capital attracted | 8.0%
  • 82. Examples of innovations identified
  • 83. Examples of innovations identified
  • 84. Uptake of R&I by companies Estimated uptake of innovation outputs and activities in FP7-Core projects, by ICD class
  • 85. Uptake of R&I activities: targeted approach Aiming at a simple but powerful first-line screening tool for HRF mutations, we have developed and validated a reverse-hybridization assay (HRF StripAssay) for the rapid and simultaneous detection of 22 most common HRF mutations: H20N, H20P, I268T, V377I (HIDS); R260W, D303N, L305P, T348M, L353P, Y570C (CAPS); C30R, C33Y, D42Del, T50M, C70R, C73W, R92Q (TRAPS); M680I(G/A), M680I(G/C), M694I(G/A), V726A (FMF). Reliable genotyping of recombinant mutant clones and a selection of reference DNA samples was achieved by means of teststrips presenting parallel arrays of allele-specific oligonucleotides. We demonstrated that the prototype HRF StripAssay is capable of detecting all 22 mutations, as well as identifying homozygotes by the absence of the corresponding wild-type signal.
  • 86. Summary Company websites proved to be a rich source of data for innovation outputs and activities State-of-the-art web scraper and NLP model developed, approach is scalable and can handle multiple languages New data and indicators which can be reproduced in frequent batches
  • 87. Summary Useful for: - Monitoring and ex-post evaluation: first use cases for the EIS built; possible to link company innovations to previous research activities - Storytelling: rich source of data for innovation success stories and case studies - Proposal evaluation: innovation track record, previous commercialisation activities, investment attracted, etc. Caveats, weaknesses and areas for further work: - Process and service innovations captured to a lesser degree - Eudamed (EU database for CE marked medical devices and technologies, opening in 2020) offers a rich source of data for further work
  • 89. Linking medicines to R&I Why? No data currently tracked in a systematic way on the contributions of R&I to new products on the market Large investments made in translational medicine and close-to-market research, but little known about the uptake New products on the market is a proxy for economic impact, but also health/societal impact, e.g. orphan medicines, new non-generic medicines, medicines treating highly resistant pathogens
  • 90. Project Medicine Medicine and/or active substance Clinical trial Publication mentions Sponsor Linked to Clinical trials New medicine Linked to develops
  • 91. Key results: human medicinal products authorized by the EMA
  • 92. Selected results: top-5 medicines with the strongest links to FP7
    Medicine name | Active substance | Marketing authorisation holder | Total number of mentions of medicine name & active substance
    Orfadin | Nitisinone | Swedish Orphan Biovitrum International AB | 4290
    Alkindi | Hydrocortisone | Diurnal Europe B.V. | 3144
    Ferriprox | Deferiprone | Apotex Europe BV | 2789
    Herceptin | Trastuzumab | Roche Registration GmbH | 1210
    Aplidin | Plitidepsin | Pharma Mar, S.A. | 650
  • 97. Example: Olaparib (breast, ovarian cancer)
  • 98. Example: Olaparib (breast, ovarian cancer)
  • 99. Key results: human medicinal products authorized by the EMA
  • 100. Summary To the best of our knowledge, Data4Impact is the first project to systematically link medicinal products & clinical trials to R&I activities Data highly useful for storytelling and impact stories, as well as monitoring and ex- post evaluation Eventually, big data will cover all key stages of the R&I lifecycle: • Basic research: throughput/output data + measures of academic impact • Translational research: clinical trials • Applied research, close-to-market research: EMA data on medicines, Eudamed data on medical devices & technologies • Impact: HTAs, Cochrane reviews
  • 101. Clinical guidelines • Clinical guidelines, systematic reviews and treatment recommendation documents provide traces of clinical and professional practice • Proprietary data from Minso Solutions AB, which maintains a database, Clinical Impact (CI:TM) (except WHO, Cochrane and NICE documents, which are also available in PubMed) • The coverage is nearly complete at the government level for Sweden, Denmark, Norway, Germany (at the S3 level), and the UK (NICE and SIGN guidelines), with good coverage of WHO guideline documents and Cochrane Systematic Reviews • In total, 855 clinical guidelines had 3,684 (2,073 fractional) references that were matched to 1,781 publications found in the D4I database.
  • 102. Indicators
    1. Traditional bibliometric indicators based on clinical guideline citations, e.g. fractionalization at funder level, normalization of publication and citation counts to comparable research and, if needed, usage of time averages.
    2. Combined citation and text based metrics:
       - Subject classification of clinical guideline docs
       - Vector space embedding of references (based on reference/text combination)*
       - Conceptual embeddings of references (based on MESH terms of references)**
    3. Reference weight in text (CG:s): identification and categorization of named entities within the clinical guidelines.***
    Together with citation metrics, by using these modes of analysis we aim to identify significant relationships between cited references, named entities, topics, and reference functions.
    * Eklund, J. (2018). The importance of scientific references in their contexts. Poster presented at the 23rd Nordic Workshop on Bibliometrics and Research Policy 2018, Borås, 7-9 November.
    ** Eklund, J., Gunnarsson Lorentzen, D. & Nelhans, G. (2019). MESH classification of clinical guidelines using conceptual embeddings of references. Manuscript accepted to ISSI, 17th International Society of Scientometrics and Informetrics Conference, Rome, 2-5 September.
    *** Manuscript in preparation
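To illustrate "fractionalization at funder level", here is a minimal sketch of full versus fractional counting. It assumes that a guideline reference acknowledging n distinct funders gives each funder a full count of 1 and a fractional count of 1/n; the slides do not spell out the exact rule, so this is one common bibliometric convention rather than the confirmed Data4Impact method. Funder labels and data are made up.

```python
from collections import defaultdict


def count_by_funder(references):
    """Return (full, fractional) counts per funder.

    references: list of funder lists, one per matched guideline reference
    (hypothetical input format).
    """
    full = defaultdict(int)
    fractional = defaultdict(float)
    for funders in references:
        unique = set(funders)
        if not unique:
            continue  # reference without a known funder contributes nothing
        for funder in unique:
            full[funder] += 1
            fractional[funder] += 1.0 / len(unique)  # equal split (assumption)
    return dict(full), dict(fractional)


refs = [["NIH", "MRC"], ["NIH"], ["NIH", "MRC", "EC"], ["EC"]]
full, frac = count_by_funder(refs)
```

Under this convention the fractional counts sum to the number of references, while the full counts sum to the number of reference-funder pairs, which is why a fractional total is always at most the full total.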
  • 103. Funder (EC breakdown)
    Funder_type | Number (full) | Number (fract.)
    EC_funder (FP7/H2020) | 115 | 78.2
    European nat'l funders | 1,859 | 1,317.9
    International funders | 1,710 | 676.9
    Total sum | 3,684 | 2,073.0

    Funder | Number (full) | Number (fract.)
    EC_FP7-CORE | 74 | 49.9
    EC_FP7-EXTENDED | 28 | 18.2
    EC_H2020-EXTENDED | 1 | 0.1
    EC_other | 12 | 10.0
    Total sum | 115 | 78.2
  • 104. Funders (top 20)
    Funder_full | Funder_country | Number (full) | Number (fract.)
    National Institutes of Health | US | 1,645 | 624.6
    Medical Research Council | UK | 585 | 452.4
    Wellcome Trust | UK | 555 | 416.9
    NHMRC - National Health and Medical Research | Australia | 156 | 85.5
    Cancer Research UK | UK | 122 | 85.6
    RCUK - Research Councils UK | UK | 85 | 37.7
    Chief Scientist Office | UK | 82 | 66.5
    EC_FP7-CORE | EU | 74 | 49.9
    British Heart Foundation | UK | 69 | 34.6
    Swiss National Science Foundation | Switzerland | 64 | 41.9
    Arthritis Research UK | UK | 29 | 27.5
    World Health Organization | International | 29 | 27.3
    EC_FP7-EXTENDED | EU | 28 | 18.2
    AKA - Academy of Finland | FIN | 27 | 9.8
    Biotechnology and Biological Sciences Research Council | UK | 15 | 9.6
    EC_other | EU | 12 | 10.0
    NWO - Netherlands Organisation for Scientific Research | Netherlands | 12 | 6.6
    Austrian Science Fund FWF | Austria | 11 | 9.1
    ARC - Australian Research Council | Australia | 10 | 5.1
    Other (N=26 funders) | - | 74 | 54
    Sum | - | 3,684 | 2,073
  • 105. Guideline providers: EC projects matched with guideline citations (n=115) [Bar chart, axis from 0 to 30. Providers shown: SoS Nationella Riktlinjer (SE); SIGN Guidelines (SC); SST | Sundhedsstyrelsen (DK); Läkemedelsverkets treatm. Recomm. (SE); SBU Utvärdering (SE); Helsedirektoratet (NO); Am. Acad. Neur. Practice (US); AWMF (DE); Folkhälsomyndigheten (SE); NICE Guidelines (EN); Cochrane - Reviews; WHO International]
  • 106. MESH terms for funded research, as term (count, share). EC: HIV Infections (13, 1.97%); Antitubercular Agents (8, 1.21%); Mycobacterium tuberculosis (8, 1.21%); Stroke (6, 0.91%); Antibodies, Monoclonal (5, 0.76%); Colorectal Neoplasms (5, 0.76%); ErbB Receptors (5, 0.76%); Microbial Sensitivity Tests (5, 0.76%); ras Proteins (4, 0.61%); HIV-1 (4, 0.61%); Tuberculosis (4, 0.61%); Diabetes Mellitus, Type 1 (4, 0.61%); Europe (4, 0.61%). Medical Research Council: HIV Infections (62, 2.45%); Stroke (27, 1.07%); Anti-HIV Agents (26, 1.03%); United Kingdom (26, 1.03%); England (18, 0.71%); Diabetes Mellitus, Type 2 (17, 0.67%); Brain (15, 0.59%); Primary Health Care (15, 0.59%); Smoking (15, 0.59%); Smoking Cessation (14, 0.55%); Cardiovascular Diseases (13, 0.51%); Obesity (13, 0.51%); Breast Neoplasms (12, 0.47%); Bipolar Disorder (12, 0.47%); HIV Seropositivity (12, 0.47%); Depression (12, 0.47%). Wellcome Trust: HIV Infections (104, 4.01%); Antimalarials (60, 2.32%); Malaria, Falciparum (48, 1.85%); Artemisinins (44, 1.70%); Tuberculosis (28, 1.08%); Anti-HIV Agents (24, 0.93%); Malaria, Vivax (23, 0.89%); Malaria (22, 0.85%); Plasmodium falciparum (22, 0.85%); South Africa (21, 0.81%); Pregnancy Complications, Parasitic (19, 0.73%); Primaquine (17, 0.66%); Quinolines (17, 0.66%).
  • 107. MESH terms in referred works fastText algorithm
  • 108. Topical analysis of reference contexts. In the generated topic model, each word is associated with a probability distribution of topics. For each reference, a symmetric context window of size k is used as a pseudo-document, and the most probable topic is calculated for that context window. Placeholder example of a reference in its context: congue risus feugiat ref264 tincidunt lorem nullam
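The context-window step can be sketched as follows. This is a minimal illustration, not the project's implementation: the whitespace tokenisation, the marker pattern (`ref264`, `aref15080825`) and the function name are assumptions, and the resulting pseudo-documents would still have to be passed to a trained topic model to obtain the most probable topic.

```python
import re

# Assumed marker format for in-text references, e.g. "ref264" or "aref15080825".
REF_PATTERN = re.compile(r"^a?ref\d+$")


def context_windows(text, k):
    """Collect a symmetric window of up to k tokens on each side of every
    reference marker. Returns {marker: [list of context words, ...]}."""
    tokens = text.lower().split()
    windows = {}
    for i, token in enumerate(tokens):
        if not REF_PATTERN.match(token):
            continue
        left = tokens[max(0, i - k):i]
        right = tokens[i + 1:i + 1 + k]
        # Other reference markers inside the window carry no topical content.
        words = [t for t in left + right if not REF_PATTERN.match(t)]
        windows.setdefault(token, []).append(words)
    return windows


placeholder = "congue risus feugiat ref264 tincidunt lorem nullam"
print(context_windows(placeholder, 2))
```

A reference cited several times yields several windows, one per in-text citation, so each citation context can be assigned its own topic.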
  • 109. Asthma, a chronic respiratory condition affecting 300 million people globally ( aref15080825 ), causes inflammation of the lungs as well as structural and functional remodelling of the airways. It is characterised by recurrent attacks of breathlessness and wheezing with varying degrees of frequency and severity, which is caused by swelling of the bronchial tubes resulting in airflow limitation (WHO 2011). Although the causes of asthma are not completely understood, risk factors are known to include inhaling asthma triggers such as allergens, tobacco smoke and chemical irritants. Asthma is incurable and the prevalence is increasing, particularly in children and young adults ( aref22157151 ), however appropriate management can control the disorder and enable people to enjoy a high quality of life (WHO 2011). https://doi.org/10.1002/14651858.CD001116.pub4 asthma a chronic respiratory condition affecting million people globally aref causes inflammation of the lungs as well as structural and functional remodelling of the airways Topic 346 (0.8149): asthma, copd, allergic, airway, disease, fev, ige, respiratory, lung, symptoms Topic 78 (0.0689): pressure, lung, pulmonary, respiratory, gas, lungs, ventilation, volume, breathing, alveolar
  • 110. Topical coherence. Using distance measures defined on spaces of probability distributions, such as the Bhattacharyya distance and the Hellinger distance, we measure the divergence between the topics assigned to the same reference in different contexts, as well as between the topics assigned to context windows of different sizes for a specific in-text citation.
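Both distances have standard definitions for discrete distributions, based on the Bhattacharyya coefficient BC(p, q) = Σ sqrt(p_i · q_i): the Bhattacharyya distance is -ln BC and the Hellinger distance is sqrt(1 - BC). A minimal sketch, assuming the topic distributions are given as equal-length lists of probabilities that sum to 1 (the function names are illustrative):

```python
import math


def bhattacharyya_coefficient(p, q):
    """Overlap between two discrete distributions (1 = identical, 0 = disjoint)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p, q):
    """-ln(BC); infinite when the distributions share no support."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0 else -math.log(bc)


def hellinger_distance(p, q):
    """sqrt(1 - BC), bounded between 0 and 1."""
    # Clamp to guard against floating-point sums slightly above 1.
    bc = min(1.0, bhattacharyya_coefficient(p, q))
    return math.sqrt(1.0 - bc)


# Made-up topic distributions for the same reference in two contexts.
context_a = [0.8, 0.1, 0.1]
context_b = [0.1, 0.8, 0.1]
print(hellinger_distance(context_a, context_b))
print(bhattacharyya_distance(context_a, context_b))
```

The Hellinger distance is bounded, which makes it convenient for comparing many pairs of windows, while the Bhattacharyya distance grows without bound as the overlap vanishes.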
  • 111. Clinical guideline impact • Professional impact – One step closer to the implementation of research within the clinic • Case: References in context: - Generic method for academic citations. In Data4Impact: 1. Subject classification of the citing document based on the cited documents' MESH terms 2. Distinguishing between reference kinds in guideline documents 3. Establishing the "topicality" of each reference based on a model trained on EuroPMC articles.
  • 112. Architecture. Diagram labels: WP4, 500 topic models; WP5.4, 138 topic searches; H2020/FP7 project topics; human expert; web lists of diseases; manual selection; News; Blogs; Fora; Twitter; Mentions; Indicators. • Monthly releases • ~1.5M documents per release: news, blogs, fora. Expected total size ~5M documents • ~10M tweets per release; total size ~30M tweets • 138 topics searched -> 1 dataset per topic
  • 113. Top-20 Twitter topics (n: ~31M tweets). Bar chart of the number of tweets per topic (scale 0 to 2,000,000). Topic ID, topic name: number of tweets. 433 climate change: 9,949,906; 272 vaccination: 1,760,780; 175 measles and newborn screening: 1,457,110; 245 stress disorders: 898,758; 209 diabetes mellitus: 858,118; 294 adhd: 706,055; 315 depression: 703,844; 348 transplantation: 699,582; 121 weight loss and obesity: 696,612; 319 cardiovascular risk factors: 647,843; 254 alzheimer disease: 637,668; 362 cancer therapy: 570,636; 123 eating disorders: 513,989; 240 hypertension and blood pressure: 452,499; 302 myocardium and heart failure: 445,434; 284 breast cancer: 415,986; 366 schizophrenia and bipolar disorder: 407,553; 344 dendritic cells and immunity: 397,980; 169 asthma: 383,321; 373 env. exposure and air pollution: 381,212.
  • 114. Topic fluctuation, Jan-Feb. Chart (scale 0 to 70,000) for topics 3: anorexia, 123: bulimia, 175: measles, 254: Alzheimer, 272: vaccination, 362: cancer
  • 115. Virality. From ten prominent topics according to virality, the most retweeted tweet together with its URL. ID, topic: retweets, URL. 47 lung cancer: 145,421, https://t.co/nAtqnmKCqW; 3 psychometrics: 35,353, https://t.co/xht4elJZ6w; 450 iron deficiency and anemia: 4,401, https://t.co/jBisW7YcRI; 491 acute lymphoblastic leukemia: 11,338, https://t.co/zc4qFt6fy5; 324 embryonic development: 3,534, https://t.co/xHd1kadSIf; 433 climate change: 47,547, https://t.co/zxzAlorA3O; 175 measles and newborn screening: 15,561, https://t.co/HjMoUva4nN; 272 vaccination: 11,923, https://t.co/d6l8vfmBVW; 348 transplantation: 60,692, https://t.co/FSmETQpSkm; 362 cancer therapy: 5,031, https://t.co/Qnvo8hTtdE. Examples shown: 47 lung cancer; 491 leukemia; 433 climate change; 272 vaccination; 348 transplantation.
  • 116. Task 5.4.3 Twitter conversation analysis • Builds on other WP5.4 activities, but takes a somewhat different approach to collecting data. - Focuses on relationships between social media posts (retweets, @tweets, #tweets) - Possible to construct meaningful tests as "scripted dialogs" - Helps weed out spam - Amenable to content-based text analysis at the conversation level (e.g. sentiment analysis, topic modelling)
  • 117. Referring to research in thread First collected tweet in thread: -[tweet id='13441' replyto='14018'] Independent research has shown that individuals who were vaccinated for the flu had 5.5 times more respiratory illness than those who were not vaccinated. [/tweet] - (A number of replies omitted; thread length: 313) - [tweet id='216387' replyto='216418'] In the light of new info, why not? It happens all the time.[/tweet] - (Replies omitted, showing those with reference) - [tweet id='216302' replyto='216387'] which is???DOI:10.1371/journal.pntd.0005179 [/tweet] - [tweet id='216261' replyto='216387'] 'Analysis of year 3 results of phase III trials of Dengvaxia suggest high rates of protection of vaccinated partial dengue immunes but high rates of hospitalizations during breakthrough dengue infections of persons who were vaccinated when seronegative...'DOI:10.1371/journal.pntd.0005179 [/tweet] -- [tweet id='216241' replyto='216387'] Phase III Trials, among our 9-year olds! FACT. DOI:10.1371/journal.pntd.0005179 [/tweet] --- [tweet id='215757' replyto='216241'] Phase 2 was all that is required for release Phase 3 was 'extra' 'Extra' studies are always done throughout the commercial lifetimes of drugs & vaccines Consequences of phase 3 results are nowhere near what group wud have us believe DOI:10.1371/journal.pntd.0005179 [/tweet]
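The reply structure in the excerpt above lends itself to a small sketch: parse the bracketed tweet records, follow `replyto` links back towards the start of the thread, and pick out the tweets that cite a DOI. The tag format is taken from the slide; the regular expressions, function names and shortened sample texts are assumptions for illustration only.

```python
import re

# Record format as shown on the slide: [tweet id='..' replyto='..'] text [/tweet]
TWEET = re.compile(r"\[tweet id='(\d+)' replyto='(\d+)'\](.*?)\[/tweet\]", re.S)
# Simplified DOI pattern (assumption; not a full DOI grammar).
DOI = re.compile(r"10\.\d{4,9}/[^\s\]]+")


def parse_thread(raw):
    """Map tweet id -> {'replyto': parent id, 'text': tweet text}."""
    return {tid: {"replyto": parent, "text": text.strip()}
            for tid, parent, text in TWEET.findall(raw)}


def path_to_root(tweets, tid):
    """Follow replyto links until the parent is not in the collected set."""
    path, seen = [tid], {tid}
    while tweets[path[-1]]["replyto"] in tweets:
        parent = tweets[path[-1]]["replyto"]
        if parent in seen:  # guard against malformed, cyclic data
            break
        path.append(parent)
        seen.add(parent)
    return path


def tweets_citing_research(tweets):
    """Map tweet id -> list of DOIs found in its text."""
    return {tid: DOI.findall(t["text"])
            for tid, t in tweets.items() if DOI.search(t["text"])}


raw = (
    "[tweet id='216387' replyto='216418'] In the light of new info, why not? [/tweet] "
    "[tweet id='216241' replyto='216387'] Phase III Trials. DOI:10.1371/journal.pntd.0005179 [/tweet] "
    "[tweet id='215757' replyto='216241'] Phase 2 was all that is required DOI:10.1371/journal.pntd.0005179 [/tweet]"
)
tweets = parse_thread(raw)
print(path_to_root(tweets, "215757"))
print(tweets_citing_research(tweets))
```

Working at the level of reply paths rather than single tweets is what makes conversation-level analysis possible, for example checking at which depth of a thread a research reference first appears.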
  • 118. Vaccination on Twitter Topic bursts, user behaviour and referring to research in discussions
  • 119. Topic burst • Identify a day when activity is more than 50% above the daily average • The burst extends up to the next day with activity below the average • This period is compared to previous and following periods of equal length • This example: 4-day-long burst in topic 272 (vaccination). Chart (scale 0 to 70,000) for topics 3, 123, 175, 254, 272 and 362, daily from 14 Jan to 10 Feb.
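The burst rule above can be sketched in a few lines. Two points are assumptions rather than statements from the slide: the daily average is taken over the whole observed series, and the first below-average day is treated as lying outside the burst. The counts in the example are made up.

```python
def find_bursts(daily_counts, factor=1.5):
    """Return bursts as (start, end) index pairs, end exclusive.

    A burst starts on a day with activity above factor * daily average and
    runs until the next day whose activity falls below the average.
    """
    mean = sum(daily_counts) / len(daily_counts)
    bursts, i, n = [], 0, len(daily_counts)
    while i < n:
        if daily_counts[i] > factor * mean:
            end = i + 1
            while end < n and daily_counts[end] >= mean:
                end += 1
            bursts.append((i, end))
            i = end
        else:
            i += 1
    return bursts


def comparison_periods(burst, n_days):
    """Previous and following periods of the same length as the burst,
    or None where the series is too short."""
    start, end = burst
    length = end - start
    previous = (start - length, start) if start - length >= 0 else None
    following = (end, end + length) if end + length <= n_days else None
    return previous, following


# Made-up daily tweet counts for one topic.
counts = [10, 10, 10, 10, 40, 30, 25, 20, 10, 10, 10, 10, 10, 10]
bursts = find_bursts(counts)
print(bursts)
print(comparison_periods(bursts[0], len(counts)))
```

In this toy series the burst covers four days, and the comparison periods are the four days immediately before and after it.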
  • 120. RT networks (similar structures; the amount of RTs increases when activity is high). Word clouds based on hashtags (seemingly a topical shift during the burst). RT shares shown: 48%, 55%, 42.5%. User groups and their relative activity, share (%) in the Previous (144,869 tweets) / Burst (194,712) / Next (115,557) periods: Top 1% most active (overall: 16%): 12 / 12 / 19; Next 9% (overall: 17%): 20 / 18 / 18; 90% least active (overall: 67%): 68 / 70 / 63. The least active user group is more prominent when general activity is high, while the most active user group is more prominent when activity is low.
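One way to obtain such group shares is sketched below. The slide does not say how the groups were formed, so ranking users by their overall tweet count and cutting at the top 1% and top 10% is an assumption, as are the function names and the toy data.

```python
from collections import Counter


def assign_groups(overall_counts):
    """Rank users by overall tweet count and label them by activity group."""
    ranked = sorted(overall_counts, key=lambda u: (-overall_counts[u], u))
    n = len(ranked)
    top_cut = max(1, round(n * 0.01))
    next_cut = max(top_cut, round(n * 0.10))
    groups = {}
    for i, user in enumerate(ranked):
        if i < top_cut:
            groups[user] = "top 1%"
        elif i < next_cut:
            groups[user] = "next 9%"
        else:
            groups[user] = "least active 90%"
    return groups


def group_shares(period_authors, groups):
    """Percentage of a period's tweets written by each activity group.

    `period_authors` holds one author id per tweet in the period."""
    tally = Counter(groups[a] for a in period_authors if a in groups)
    total = sum(tally.values())
    return {g: 100.0 * c / total for g, c in tally.items()} if total else {}


# Toy data: 100 users, one very active, nine moderately active.
overall = {"u0": 50}
overall.update({f"u{i}": 5 for i in range(1, 10)})
overall.update({f"u{i}": 1 for i in range(10, 100)})
groups = assign_groups(overall)
shares = group_shares(["u0"] * 5 + ["u1"] * 3 + ["u50"] * 2, groups)
print(shares)
```

Computing the same shares for the previous, burst and following periods gives a table of the kind shown on the slide.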
  • 121. ”Deniers” (measles, vaxxed, mmr, autism, study, flu, hpv, informedconsent, vaxwoke, cdc, vaccineinjury, learntherisk, maga, gardasil, vaccineskill) ”Non-deniers 2” (measles, vaccineswork, publichealth, science, humanitariancrisis, scientificreport, antivax, vaccinessavelives, venezuela, crisis, humanitarianaid, help, antivaxxers, vaccinesaresafe, misinformation, scicomm, itrustvaccines, mmr, factsmatter) ”Non-deniers 1” (measles, vaccineswork, flu, hpv, antivax, vaxfactsfebruary, vaccinessavelives, immunization, antivaxxers, mumps, rotavirus, ethiopia, law, ebola) RT and coupled hashtag networks from burst period.
  • 122. Academic 27%; Academically trained 11%; Other Professional 23%; Media 38%; Policy/decision maker 1%. 9,647 plain-text biographies from Twitter profiles classified using a rule-based method; 30% matched as professionals. Class: keyword examples. Science student: student, studying; Graduated: MS, MA, graduate; University faculty: lectur, prof., professor; Other scientist: technician, lab manager, -ologist; Education and outreach: curator, teacher, librarian; Applied science organization: nonprofit, philantropy; Other professional: recruiter, entrepreneur, manager; Media professional: journalis, publisher; Policy/decision maker: congressman, senator, parliament. Ekström, B. (2019). Developing a rule-based method for identifying researchers on Twitter: The case of vaccine discussions. Poster accepted to ISSI, 17th International Society of Scientometrics and Informetrics Conference, Rome, 2-5 September.
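A rule-based classifier of this kind can be sketched with ordered keyword patterns. The class names and keyword examples come from the slide; everything else is an assumption made for illustration: the regular expressions, the first-match-wins order, treating "MS"/"MA" as case-sensitive tokens, and reading "lectur", "journalis" and "-ologist" as stems. The actual rules of the cited poster may differ.

```python
import re

I = re.IGNORECASE
# Ordered rules: the first matching class wins (assumed behaviour).
RULES = [
    ("Science student", re.compile(r"\b(student|studying)\b", I)),
    ("Graduated", re.compile(r"\b(MS|MA)\b|\b[Gg]raduate\b")),
    ("University faculty", re.compile(r"\blectur\w*|\bprof\.|\bprofessor\b", I)),
    ("Other scientist", re.compile(r"\btechnician\b|\blab manager\b|\w+ologist\b", I)),
    ("Education and outreach", re.compile(r"\b(curator|teacher|librarian)\b", I)),
    ("Applied science organization", re.compile(r"\bnonprofit\b|\bphilant\w*", I)),
    ("Other professional", re.compile(r"\b(recruiter|entrepreneur|manager)\b", I)),
    ("Media professional", re.compile(r"\bjournalis\w*|\bpublisher\b", I)),
    ("Policy/decision maker", re.compile(r"\b(congressman|senator|parliament)\b", I)),
]


def classify_bio(bio):
    """Return the first matching class for a profile biography, or None."""
    for label, pattern in RULES:
        if pattern.search(bio):
            return label
    return None


# Made-up biographies for illustration.
for bio in ["Professor of immunology, views my own",
            "Freelance journalist covering health",
            "Lab manager and epidemiologist",
            "Dog lover, coffee addict"]:
    print(bio, "->", classify_bio(bio))
```

Rule order matters here: "lab manager" is caught by the scientist rule before the more general "manager" rule for other professionals.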
  • 123. How can we use Twitter-bio personas? - Retweet data
  • 124. How can we use Twitter-bio personas? Conversation data ?
  • 125. Data4Impact has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 770531. Thank you for your attention! The Data4Impact Consortium. Visit our website: www.data4impact.eu Follow us on Twitter and SlideShare: @Data4Impact