Data4Impact booklet overview of results

Data4Impact booklet demonstrates linkages between data collected across the three dimensions of impact, including academic, economic and societal impact

Published in: Data & Analytics

  1. 1. Overview of Data4Impact results: Big Data approaches for improved assessment of societal impact in the Health, Demographic Change and Wellbeing Societal Challenge
  2. 2. Data4Impact Analytical Framework

Data4Impact’s Analytical Model Of Societal Impact Assessment (AMOSIA) is structured around four distinct phases of the research lifecycle: input, throughput, output and impact. Relying on novel big data techniques such as web scraping, crawling and mining, as well as text analysis methods such as Natural Language Processing and deep learning, we have gathered data for each analytical phase. Data4Impact builds on data harvested from PubMed, OpenAIRE, Lens.org, PATSTAT, clinical guidelines repositories, company websites, social media and media platforms, EC monitoring data and other databases. The booklet demonstrates linkages between data collected across three dimensions of impact: academic, economic and societal.

[Figure: framework diagram. Phases: Input, Throughput, Output, Impact. Elements: EU R&I activities; translational activities; direct dissemination activities; sci-tech output; economic output; sci-tech impact; economic impact; societal impact; policy feedback. Paths: scientific-technological impact path; economic impact path. Source: Data4Impact]

Programme inputs: Data4Impact covered almost 13,000 projects funded under the EU Framework Programmes. For analytical purposes, the projects were split into core and extended sets. The FP7 Core and H2020 Core sets correspond to projects funded under FP7 Health and Societal Challenge 1, respectively. Projects from other parts of FP7 and H2020 were included in the extended set if at least 20% of their publication output was found in PubMed.
Number of research projects analysed by Data4Impact, by EU Framework Programme

Framework Programme | Core Set | Extended Set
FP7 | 998 | 8,332
H2020 | 669 | 2,253
Total | 1,667 | 10,585

Sources: Cordis, Data4Impact

[Figure: EC contribution for the FP7-Core and H2020-Core projects. Bar chart of total budget (billion EUR) and EC contribution (billion EUR) for FP7 and H2020. Data labels as extracted: 6,405; 4,747; 2,974; 2,357. Sources: Cordis, Data4Impact]

Selected Throughputs and Outputs

Data4Impact extracted data on innovation outputs from unstructured text in the publicly available EC monitoring data, using a range of text processing and entity recognition techniques. As a result, data on the key innovation outputs produced became structured. Numbers are lower for the H2020-Core set because many of the analysed H2020 projects were still ongoing at the time of analysis.

[Figure: Key innovation outputs produced by the FP7-Core (left) and H2020-Core (right) projects. FP7-Core categories: Biomarker, Biorepository, Clinical trial, Device, Diagnostic tool, Dissemination, Drug, Education, Employment, Gene, Infrastructure, Material, Method, Protein, Protocol, Prototype, Publication, Software, Standard, Study, System, Treatment. H2020-Core categories: Biomarker, Biorepository, Clinical trial, Device, Diagnostic tool, Dissemination, Drug, Gene, Infrastructure, Material, Metabolite, Method, Protocol, Prototype, Software, Standard, Study, System, Treatment. Source: Data4Impact based on analysis of European Commission monitoring data]

Data4Impact linked PubMed data to Lens.org, which produced new insights on the technological value of the research performed. Of particular relevance might be publications which were cited in multiple patents or patents which had high technological value themselves. Data were produced for the 40+ funders covered in the project.
Top-10 funders by publication output in PubMed, based on citations of publications in patents

Funder | Number of publications analysed | Share of publications cited in patents at least once | Share of publications cited in patents at least 5 times
National Institutes of Health (US) | 397,886 | 4.4% | 0.8%
Wellcome Trust (UK) | 97,434 | 6.8% | 1.2%
European Commission | 84,038 | 5.5% | 0.7%
National Science Foundation (US) | 52,366 | 4.5% | 0.7%
Medical Research Council (UK)* | 45,246 | 10.0% | 1.7%
Research Councils UK* | 39,214 | 2.9% | 0.3%
Biotechnology and Biological Sciences Research Council (UK)* | 22,260 | 9.8% | 1.5%
National Health and Medical Research Council (Australia) | 21,181 | 2.3% | 0.2%
Swiss National Science Foundation (Switzerland) | 15,961 | 5.3% | 0.8%
Austrian Science Fund (Austria) | 13,816 | 5.6% | 0.7%

* Note: Medical Research Council, Research Councils UK and Biotechnology and Biological Sciences Research Council transitioned into UK Research and Innovation, which was created as a result of the Higher Education and Research Act (HERA) in 2017.

Sources: OpenAIRE, PubMed/Europe PMC, Lens.org

[Figure caption: Data4Impact analytical framework AMOSIA]
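The two patent-citation shares in the table above can be illustrated with a minimal sketch. It assumes that each publication of a funder already has a count of citing patents (for example after linking PubMed records to Lens.org); the function name and the input layout are hypothetical and are not taken from the Data4Impact pipeline.

```python
def patent_citation_shares(patent_citation_counts, high_threshold=5):
    """Return the share of publications cited in patents at least once
    and at least `high_threshold` times.

    `patent_citation_counts` holds one integer per publication: the number
    of patents citing it (hypothetical input layout).
    """
    total = len(patent_citation_counts)
    if total == 0:
        return {"at_least_once": 0.0, "at_least_threshold": 0.0}
    cited_once = sum(1 for count in patent_citation_counts if count >= 1)
    cited_high = sum(1 for count in patent_citation_counts if count >= high_threshold)
    return {
        "at_least_once": cited_once / total,
        "at_least_threshold": cited_high / total,
    }


# Illustrative counts for ten publications (made-up numbers).
example_counts = [0, 0, 0, 0, 0, 0, 1, 2, 5, 7]
shares = patent_citation_shares(example_counts)
print(shares)
```

With these made-up counts, four of ten publications are cited at least once and two of ten at least five times.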
  3. 3. Selected Throughputs and Outputs / Academic Impact and Societal Relevance of Research

Creation of new companies: in total, Data4Impact found direct evidence for 430 newly created companies in FP7, of which 51 new companies/start-ups were created in the FP7-Core projects. Below we show a selection of projects from the FP7-Core set which created two or more new companies. The identified new companies can be matched to Orbis and multiple product databases. Their innovation performance can also be tracked by analysing data from their websites (see the Economic impact section for the types of data analysed).

Selected projects with two or more newly created companies in the FP7-Core set

Project number | Project acronym | Number of spin-offs
201924 | EDICT | 3
223744 | DOPAMINET | 2
201418 | READNA | 2
278832 | hiPAD | 2
279039 | ComplexINC | 2

Source: Data4Impact based on analysis of European Commission monitoring data

Data4Impact applied text analysis techniques to assign projects to major diseases under the International Classification of Diseases (ICD-11). The approach was bottom-up, i.e. projects were assigned to clusters based on the actual research performed and not on the programmatic structure. Similar data can be produced for other funders. Additional analyses can be performed with the data, e.g. collaboration networks can be analysed in each ICD class to assess central actors in the communities or the level of interdisciplinarity.

[Figure: Estimated share of EU funding allocated to the ICD-11 class in the FP7-Core (left) and H2020-Core (right) projects. Classes shown: 1. Certain infectious and parasitic diseases; 2. Neoplasms; 4. Endocrine, nutritional and metabolic diseases; 5. Mental and behavioral disorders; 6. Diseases of the nervous system; 9. Diseases of the circulatory system; 10. Diseases of the respiratory system; 11. Diseases of the digestive system; 13. Diseases of the musculoskeletal system and connective tissue; 14. Diseases of the genitourinary system. Data labels in extraction order, first chart: 10%, 29%, 10%, 5%, 14%, 11%, 7%, 6%, 3%, 5%; second chart: 15%, 20%, 13%, 3%, 13%, 11%, 6%, 8%, 5%, 6%. Source: Data4Impact based on analysis of European Commission monitoring data]

[Figure: Collaboration network in ICD class 9 (Diseases of the Circulatory System). Node labels: Demyelinating diseases of the central nervous system (0.74%); Malignant neoplasms (1.49%); Cerebrovascular diseases (19.23%); Hypertensive diseases (0.12%); Ischaemic heart diseases (48.01%); Diseases of arteries, arterioles and capillaries (2.85%); Visual disturbances and blindness (0.25%); Other forms of heart diseases (27.3%). Source: Data4Impact based on analysis of European Commission monitoring data]

Data4Impact clustered over 5 million publications in PubMed into 442 research topics and 9 major topic categories using Natural Language Processing and deep learning techniques. This resulted in a bottom-up mapping of the research performed in the health domain. About 20% of publications in the sample were funded by the 40+ funders covered by Data4Impact.

Mapping of research in PubMed

Topic category | Estimated share of research output in PubMed | Number of research topics in the Data4Impact topic model
1. Infectious Diseases | 7.2% | 34
2. Non-Communicable Diseases | 18.6% | 86
3. Health systems, public health & epidemiology | 14.5% | 63
4. Diagnostics, treatment development, surgery | 6.4% | 26
5. Molecular cell biology | 26.1% | 118
6. Methods, models, technologies, databases | 11.5% | 46
7. Physiology | 3.2% | 15
8. Cognition and behaviour | 4.6% | 18
9. Other | 7.9% | 36
Total | 100.0% | 442

Note: Data are preliminary and subject to change after further development of the Data4Impact topic model

Sources: PubMed/Europe PMC, Data4Impact

Trend data were produced for each topic in the Data4Impact topic model. This led to the identification of the fastest growing research topics. As the model links data to over 40 funders, one can produce indicators on the timeliness of the research performed in different research programmes.
Our preferred approach was to compute indicator values using the top 10% of fastest growing research topics; however, other approaches can be applied.

Share of research output in top-10% fastest growing topics, selected funders

Funder | Share of research output in top-10% fastest growing research topics
National Health and Medical Research Council (Australia) | 24.7%
Research Councils UK* | 23.5%
European Commission | 19.5%
National Institutes of Health (US) | 16.7%
Swiss National Science Foundation (Switzerland) | 16.2%
Wellcome Trust (UK) | 14.5%
Biotechnology and Biological Sciences Research Council (UK)* | 11.2%
Medical Research Council (UK)* | 11.1%
Total PubMed | 9.9%

Note: Data are preliminary and subject to change after further development of the Data4Impact topic model

* Note: Medical Research Council, Research Councils UK and Biotechnology and Biological Sciences Research Council transitioned into UK Research and Innovation, which was created as a result of the Higher Education and Research Act (HERA) in 2017.

Sources: PubMed, OpenAIRE, Data4Impact
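As a rough illustration of the timeliness indicator described above, the sketch below first selects the top 10% of topics by growth and then computes the share of a funder's output that falls into them. The growth values, topic identifiers and function names are hypothetical; the booklet does not describe how Data4Impact measured growth or handled ties, so this is one possible reading rather than the project's method.

```python
import math


def fastest_growing_topics(growth_by_topic, top_fraction=0.10):
    """Return the set of topics in the top `top_fraction` by growth value."""
    # Assumption: round the cut-off up, so at least one topic is selected.
    k = max(1, math.ceil(len(growth_by_topic) * top_fraction))
    ranked = sorted(growth_by_topic, key=growth_by_topic.get, reverse=True)
    return set(ranked[:k])


def share_in_fast_topics(funder_output_by_topic, fast_topics):
    """Share of a funder's publications that belong to the fast-growing topics."""
    total = sum(funder_output_by_topic.values())
    if total == 0:
        return 0.0
    in_fast = sum(n for topic, n in funder_output_by_topic.items() if topic in fast_topics)
    return in_fast / total


# Ten hypothetical topics with increasing growth values: t9 grows fastest.
growth = {f"t{i}": 1.0 + 0.1 * i for i in range(10)}
fast = fastest_growing_topics(growth)

# A hypothetical funder with 30 of its 100 publications in topic t9.
funder_output = {"t0": 70, "t9": 30}
share = share_in_fast_topics(funder_output, fast)
print(fast, share)
```

Other cut-offs (for example the top 5% or top 20% of topics) only require a different `top_fraction`.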
  4. 4. Academic Impact and Societal Relevance of Research / Topic view: Cardiovascular Diseases

One can go to the topic level and identify the share of output allocated to each topic within each programme. The data suggest that the EU Framework Programmes invested more in several of the fast-growing research topics shown below than would be expected from their share in PubMed. However, the topic of cleft palate was covered to a lesser extent in the EU Framework Programmes despite being a fast-growing topic. This represents a potential missed opportunity for the programmes. Additional insights on missed opportunities can be gained once data are analysed at the topic level across multiple funders.

Investment in fast-growing research topics by EU Framework Programmes, selected topics

Topic name | Estimated share of research output in the EU Framework Programmes | Estimated share of research output in PubMed
Copy number variations (genome) | 0.5% | 0.2%
Graphene & nanotechnology | 1.3% | 0.4%
Complement activation | 0.9% | 0.2%
DNA sequence processing | 0.3% | 0.2%
Cleft palate | <0.1% | 0.3%
Gut microbiota | 0.4% | 0.2%

Note: Data are preliminary and subject to change after further development of the Data4Impact topic model

Sources: PubMed, OpenAIRE, Data4Impact

By addressing major societal challenges, research funders are ultimately accountable to patients and to society in general. In addition to funding timely research, it may be important to know which topics correspond to the most pressing societal needs. Data4Impact analysed the incidence and popularity of the research topics in social media and media data to identify the most discussed research topics. These proxy data were compared to actual spending on the topics. In the case of the EU Framework Programmes, for example, there is a very strong match between the estimated awareness/needs and actual spending. Similar data can be produced for other funders.
Ranking of most discussed topics by funding provided by EU Framework Programmes, selected topics

Topic name | Societal buzz score | Topic rank based on societal buzz | Number of EU projects related to the topic | EU funding (EUR) | Topic rank based on investment made in the EU FPs
Vaccines | 53,583 | 1 | 50 | 221,130,874 | 1
Cardiovascular diseases | 23,962 | 2 | 25 | 83,811,729 | 7
Regenerative medicine | 15,368 | 3 | 22 | 93,625,035 | 3
Stem cells | 13,707 | 4 | 22 | 89,136,591 | 6
Gene therapy | 11,279 | 5 | 29 | 150,682,145 | 2
Alzheimer’s disease | 10,563 | 6 | 18 | 89,635,114 | 5
Proteins | 8,433 | 7 | 27 | 82,118,948 | 8
Preterm neonates, pediatric medicine | 3,996 | 8 | 26 | 77,561,258 | 9
Tuberculosis | 3,547 | 9 | 17 | 91,974,649 | 4

Note: Data are preliminary and subject to change after further development of the Data4Impact topic model

Source: Data4Impact

This page shows selected indicator values for the cardiovascular diseases research topic. The data presented have been derived from the Data4Impact topic model. The data cover all the 40+ funders analysed in Data4Impact.

Topic size: large (2 times larger than an average topic in PubMed)
Topic trend: moderately growing (output 1.25 times larger in 2012-2018 than in 2005-2011)
Topic funding exclusivity: low (many funders investing in the topic)

The table below shows the top-10 funders in the cardiovascular diseases research topic. In addition to producing data on key funders per topic, we can also take an inverse look at the data and produce a list of key research topics by size for each funder.
Top-10 funders in the topic by share of research output, ranked

Funder | Rank
National Institutes of Health (US) | 1
Medical Research Council (UK)* | 2
European Commission | 3
Wellcome Trust (UK) | 4
British Heart Foundation (UK) | 5
National Health and Medical Research Council (Australia) | 6
Research Councils UK* | 7
Swedish Research Council (Sweden) | 8
Chief Scientist Office (UK) | 9
Cancer Research UK | 10

* Note: Medical Research Council, Research Councils UK and Biotechnology and Biological Sciences Research Council transitioned into UK Research and Innovation, which was created as a result of the Higher Education and Research Act (HERA) in 2017.

Source: Data4Impact

[Figure: Cardiovascular risk factors most actively discussed by citizens and the media. Categories: Physical inactivity, Smoking, Obesity, Alcohol, Hypertension, Diabetes, Family history, Blood pressure, Blood liquids. Data labels in extraction order: 7%, 10%, 19%, 8%, 6%, 26%, 6%, 16%, 2%. Source: Data4Impact based on analysis of media and social media data]
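The topic trend value quoted on this slide (output 1.25 times larger in 2012-2018 than in 2005-2011) is a ratio of publication output between two periods. A minimal sketch of that ratio follows; the per-year counts are invented for illustration and the function name is hypothetical.

```python
def topic_trend_ratio(output_by_year, early=(2005, 2011), late=(2012, 2018)):
    """Ratio of a topic's publication output in the late period to the early period.

    `output_by_year` maps a publication year to the number of publications
    assigned to the topic in that year. Period bounds are inclusive.
    """
    early_total = sum(n for year, n in output_by_year.items() if early[0] <= year <= early[1])
    late_total = sum(n for year, n in output_by_year.items() if late[0] <= year <= late[1])
    if early_total == 0:
        raise ValueError("no output in the early period; ratio is undefined")
    return late_total / early_total


# Invented counts: 100 publications per year in 2005-2011, 125 per year in 2012-2018.
counts = {year: 100 for year in range(2005, 2012)}
counts.update({year: 125 for year in range(2012, 2019)})

ratio = topic_trend_ratio(counts)
print(ratio)  # 875 / 700 = 1.25
```

Both periods span seven years, so the ratio compares like with like; unequal periods would need per-year averages instead of totals.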
  5. 5. Economic impact

Data4Impact developed a state-of-the-art web scraper and Natural Language Processing model to identify and classify innovation data on company websites. In total, the research team analysed 2,097 FP7 and H2020 companies, harvested over 1.5 million URL links and identified over 15,000 mentions of innovations on company websites. For illustrative purposes, the results shown in this section correspond to companies which participated in the FP7 Core projects (i.e. FP7 Health). Similar results can be produced for the FP7 Extended, H2020 Core and H2020 Extended sets.

[Figure: Innovations linked to the FP7-Core companies, by innovation type. Chart 1 categories: Innovation outputs, Innovation activities (data labels as extracted: 53%, 47%). Chart 2 categories: Product innovations; Process, service and other types of innovation (data labels as extracted: 70%, 30%). Chart 3 categories: IPR and licensing activities, Acquisitions, Private funding attracted, Public funding attracted, Other/Unassigned (data labels as extracted: 28%, 32%, 4%, 7%, 29%). Source: Data4Impact based on analysis of data extracted from company websites]

Summary of innovation profiles of companies in the FP7 Core projects

Indicator | Indicator value (FP7-Core projects)
Estimated share of enterprises with evidence of innovation activities | 46.0%
Average number of innovation outputs and activities identified per company | 16.1%
Estimated share of highly innovative enterprises | 7.4%
Estimated share of enterprises with evidence of licensing activities (incl. patent/trademark license agreements) | 9.3%
Estimated share of enterprises involved in activities related to acquisitions | 20.0%
Estimated share of enterprises with evidence of private investment/capital attracted | 8.0%

Source: Data4Impact based on analysis of data extracted from company websites

[Figure: Estimated overlap between project activities in FP7 & identified company innovations, by ICD class. Horizontal axis from 0 to 1. Categories: Total FP7-Core; 17. Congenital malformations, deformations and chromosomal abnormalities; 14. Diseases of the genitourinary system; 13. Diseases of the musculoskeletal system and connective tissue; 11. Diseases of the digestive system; 10. Diseases of the respiratory system; 9. Diseases of the circulatory system; 7. Diseases of the eye and adnexa; 6. Diseases of the nervous system; 5. Mental and behavioural disorders; 4. Endocrine, nutritional and metabolic diseases; 3. Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism; 2. Neoplasms; 1. Certain infectious and parasitic diseases. Source: Data4Impact based on analysis of European Commission monitoring data and data extracted from company websites]

Societal and Health Impact

Uptake of health research in clinical guidelines: if a publication is cited in clinical guidelines, it can be regarded as a contributor to clinical practice in certain health areas. Citations of health research in clinical guidelines can thus be regarded as a proxy indicator for the impact of R&I activities on public health. Data4Impact analysed a set of national and international clinical guidelines and extracted their citation data. The citations were then linked to publications funded by the 40+ funders covered in Data4Impact. The figure below shows data for the EU Framework Programmes. Similar data can be produced for other funders.
Number of guidelines citing EU-funded publications

Guideline source | Number of guidelines
WHO International | 25
Cochrane - Reviews | 20
NICE Guidelines | 17
Folkhälsomyndighetens | 16
AWMF | 10
American Academy of Neurology Practice Guidelines | 5
Helsedirektoratet | 5
SBU Utvärdering | 5
Läkemedelsverket behandlingsrekommendationer | 3
SST Sundhedsstyrelsen | 3
SIGN Guidelines | 2

Source: Data4Impact

Contribution of R&I activities to the development of new medicines and products: research funders increasingly invest in translational, applied and close-to-market R&I activities to bring new medicines, devices and technologies to the market. Tracking new medicines and linking them to projects is a challenge because of the time lag and because projects are usually no longer monitored after the end of research funding. To the best of our knowledge, Data4Impact is the first project to ‘travel in time’ and link research activities to product databases provided by the European Medicines Agency. The team identified multiple medicinal products and orphan medicines which were linked to previous research activities in the EU Framework Programmes.

Number of human medicinal products directly linked to EU Framework Programmes

Category | Number of medicines
Total number of medicines analysed | 1562
Medicines or substances mentioned in EU Framework Programme projects | 962
Medicines linked to companies which participated in EU Framework Programmes | 612
Medicines analysed in the same projects where the sponsors participated | 62

Source: Data4Impact based on analysis of EMA data
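The guideline-uptake indicator on this slide rests on linking the references cited in each guideline to the set of publications funded by a given funder. The sketch below shows one way such a link could be counted, assuming both sides have already been resolved to PubMed identifiers (PMIDs). The record layout, the function name and all identifiers are hypothetical; the booklet does not describe the actual matching procedure.

```python
from collections import Counter


def count_guidelines_citing(guidelines, funded_pmids):
    """Count, per guideline source, the guidelines that cite at least one
    publication from `funded_pmids`.

    `guidelines` is an iterable of (source, guideline_id, cited_pmids) tuples
    (hypothetical layout).
    """
    funded = set(funded_pmids)
    counts = Counter()
    for source, _guideline_id, cited_pmids in guidelines:
        # A guideline counts once, however many funded publications it cites.
        if funded & set(cited_pmids):
            counts[source] += 1
    return dict(counts)


# Made-up guideline records and PMIDs, for illustration only.
example_guidelines = [
    ("Source A", "g1", ["111", "222"]),
    ("Source A", "g2", ["333"]),
    ("Source B", "g3", ["222", "444"]),
    ("Source B", "g4", []),
]
example_funded = {"222", "555"}

guideline_counts = count_guidelines_citing(example_guidelines, example_funded)
print(guideline_counts)
```

In practice, reference strings in guidelines rarely carry a clean identifier, so a real pipeline would need a resolution step (for example matching on DOI or title) before a set intersection like this one is meaningful.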
  6. 6. Societal and Health Impact

Top-5 human medicinal products most cited in FP7 projects

Medicine name | Active substance | Marketing authorisation holder | Total number of mentions of medicine name & active substance
Orfadin | Nitisinone | Swedish Orphan Biovitrum International AB | 4,290
Alkindi | Hydrocortisone | Diurnal Europe B.V. | 3,144
Ferriprox | Deferiprone | Apotex Europe BV | 2,789
Herceptin | Trastuzumab | Roche Registration GmbH | 1,210
Aplidin | Plitidepsin | Pharma Mar, S.A. | 650

Source: Data4Impact based on analysis of EMA data

Number of orphan medicines directly linked to EU Framework Programmes

Category | Number of orphan medicines
Total number of orphan medicines analysed | 1316
Orphan medicines or substances mentioned in EU Framework Programme projects | 190
Orphan medicines linked to companies which participated in EU Framework Programmes | 93
Orphan medicines analysed in the same projects where the sponsors participated | 21

Top-5 orphan medicines most cited in FP7 projects

Orphan medicine name or active substance | Marketing authorisation holder | Total number of mentions of medicine name & active substance
Givinostat | ITALFARMACO SPA | 158
Sapacitabine | CYCLACEL LIMITED | 96
Polihexanide | S.I.F.I. SOCIETA INDUSTRIA FARMACEUTICA ITALIANA SPA | 64
Cannabidivarin | GW PHARMA LIMITED | 36
Cannabidiol | GW PHARMA LIMITED | 21

Source: Data4Impact based on analysis of EMA data

Data4Impact Indicators

The following table summarises key Data4Impact indicators.

Level | Indicator | Description
Input level indicators | Funding volume | Monetary expenditure on R&I activities
Throughput and output level indicators | Publications | Number of publications produced by programme/funder cited in patents; number of highly cited publications in patents
Throughput and output level indicators | Patents | Number of patents produced
Throughput and output level indicators | Innovation outputs produced by EU FPs projects | Number of health-specific innovation outputs produced in projects funded by the EU Framework Programmes
Throughput and output level indicators | New companies | Number of new companies/start-ups created in the EU Framework Programmes
Throughput and output level indicators | Innovation outputs produced by companies participating in R&I activities | Number of product innovations; or process, service or other innovations announced by companies
Throughput and output level indicators | Innovation activities carried out by companies participating in R&I activities | Number of licensing agreements; or cases of acquisitions; or cases of private funding attracted; or cases of public funding attracted by companies; or cases of newly CE-marked medical devices or technologies
Academic impact | Funding priorities | Topic size in PubMed (absolute and normalized); distribution of topics per funder (normalized); distribution of funders per topic (absolute)
Academic impact | Timeliness of research performed | Rate of topic growth in 2012-2018 compared to 2005-2011; share of funding allocated to top-10% fastest growing topics
Academic impact | Funding exclusivity | Share of funding allocated to the top-10% smallest research topics by size (i.e. investment in small-niche topics); number of funders per topic whose output share exceeds 3% globally; share of funding allocated to research topics with fewer than 5 funders whose share exceeds the 3% mark (i.e. investment in topics where few other funders invest)
Academic impact | Technological value/significance of patents | Analysis based on the commonly used measure of patent forward citations, i.e. citations a patent receives from subsequent patent filings
Economic impact | Economic and innovation performance of companies | Estimated share of enterprises with evidence of innovation activities; estimated share of highly innovative enterprises; estimated share of enterprises with evidence of licensing activities (incl. patent/trademark license agreements); estimated share of enterprises involved in activities related to acquisitions; estimated share of enterprises with evidence of private investment/capital attracted
Economic impact | Continuity of innovation activities | Estimated overlap between project activities in FP7 & identified company innovations; number of newly CE-marked devices and medical technologies that could be directly linked to R&I activities in the EU Framework Programmes
Societal/health impact | Impact on public health | Citations of publications in clinical guidelines
Societal/health impact | Societal awareness/relevance of research | Rank of research topic based on number of news articles, blogs, posts, tweets, etc. discussing a given topic
Societal/health impact | Congruence of research funding with societal priorities | Rank similarity of most discussed research topics versus actual spending in the topics
Societal/health impact | Newly launched medicines and medicinal products | Number of human medicinal products or orphan medicines that could be directly linked to R&I activities in the EU Framework Programmes; strength of link based on the number of mentions of product names and their active substances in EC monitoring data
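The indicator "rank similarity of most discussed research topics versus actual spending" can be made concrete with a rank correlation. The booklet does not name the measure used, so the sketch below assumes Spearman's rank correlation as one plausible choice, using the shortcut formula that is valid when there are no tied ranks. The two rank lists are the societal buzz and investment ranks of the nine selected topics in the booklet's table of most discussed topics.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings without ties.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the
    difference between the two ranks of each item.
    """
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Ranks of the nine selected topics, in table order (Vaccines ... Tuberculosis).
buzz_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9]
investment_rank = [1, 7, 3, 6, 2, 5, 8, 9, 4]

rho = spearman_rho(buzz_rank, investment_rank)
print(round(rho, 2))
```

Because the table covers selected topics only, a value computed this way is illustrative and should not be read as the project's reported result.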
  7. 7. Who we are

Data4Impact is a Horizon 2020 project funded by the European Commission. We pioneer big data techniques and develop pilot approaches which track the legacy and impact of research activities after the end of public funding. We have developed a series of indicators on the performance and societal impact of 40+ research programmes in the health domain.

Visit our website: www.data4impact.eu
Follow us on Twitter: @Data4Impact
Discover our presentations on SlideShare: @Data4Impact

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 770531.
