Using Linked Data to Evaluate the Impact of Research and Development in Europe: A Structural Equation Model
Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon
LOD-SEM

"Using Linked Data to Evaluate the Impact of Research and Development in Europe: A Structural Equation Model" presented at ISWC 2013 (http://link.springer.com/chapter/10.1007/978-3-642-41338-4_16)

Published in: Education, Technology
  1. Using Linked Data to Evaluate the Impact of Research and Development in Europe: A Structural Equation Model
     Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon
  2. Outline
     • Research Question
     • Datasets
     • Data Extraction
     • Structural Equation Modeling
     • Result
     • Conclusion & Future Work
  3. Research Question
     • Research and Development (R&D) has a direct effect on
       • Economic Performance (GDP) - EcoP
       • Healthcare Performance (birth rate, death rate) - Hcare
       • Education Performance (public spending on education) - EduP
  4. Our Approach
     • Identify relevant statistical datasets
     • Choose & extract appropriate variables
     • Feed variables into a Structural Equation Model (SEM)
     • Exclude variables with low data quality
     • Exclude variables that do not covary with the others
     • Obtain a stable model aligned with the hypothesis
  5. Datasets: World Bank
     • International financial institution; collects and processes large amounts of data on the basis of economic models and makes them openly available
     • Published as RDF
       • http://worldbank.270a.info/
  6. Datasets: World Bank
     • Adolescent fertility rate
     • Immunization, DPT, measles
     • High technology exports
     • R&D expenditure
     • Birth rate
     • Death rate
     • Fertility rate
     • GDP
     • Incidence of Tuberculosis
     • Public spending on education
     • Mortality rate, infant
     • Researchers in R&D
     • Health expenditure public
  7. Datasets: World Bank (variables by group, in slide order)
     Healthcare
     • Adolescent fertility rate
     • Birth rate
     • Death rate
     • Fertility rate
     • Mortality rate, infant
     • Immunization, DPT, measles
     • Incidence of Tuberculosis
     R&D
     • R&D expenditure
     • Researchers in R&D
     • Health expenditure public
     Economic Performance
     • GDP
     • High technology exports
     • Public spending on education
  8. Datasets: Eurostat
     • Statistical office of the European Union (EU)
       • provides statistical information to the institutions of the EU to promote harmonization of statistical methods across its member states and candidates for accession as well as European Free Trade Association (EFTA) countries
     • Published as RDF
       • http://eurostat.linked-statistics.org/
  9. Datasets: Eurostat
     • Economically active population by sex, age and NUTS2 regions
     • Annual expenditure on public & private educational institutions per pupil/student
     • Biotechnology patent applications to EPO by priority year, country and metropolitan regions
     • Financial aid to students
  10. Datasets: Eurostat (variables by group, in slide order)
      Educational Performance
      • Annual expenditure on public & private educational institutions per pupil/student
      • Financial aid to students
      • Biotechnology patent applications to EPO by priority year, country and metropolitan regions
      Economic Performance
      • Economically active population by sex, age and NUTS2 regions
  11. Data Extraction
      • SPARQL
      • Advantages of LD:
        • Discovering relevant datasets
        • Data available in a single standardized structured format (RDF)
          • avoiding heterogeneity of similar kinds of data (measures and their units)
        • Supported query mechanism to acquire data
  12. Structural Equation Modeling
      • Statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions
      • Latent variables
      • Observed variables
      Measured by
      • Exploratory Factor Analysis (EFA)
      • Confirmatory Factor Analysis (CFA)
  13. Structural Equation Modeling
      • Step 1: Specify latent variables through a sequence of CFA and EFA
        • EFA to detect latent variables
        • CFA to confirm the structure
      • Step 2: Specify & identify the SEM based on the research question by inserting one variable at a time
      • Statistical analysis done in R*
      * http://www.r-project.org/
  14. Result
      • 4 latent variables
      • 12 (of 18) observed variables
      • 20 (of 28) EUR countries
      • 10 years: 1999 - 2009
  15. Result: SEM Model
      [Figure: SEM path diagram. Legend: effect weights, factor loadings, measurement errors. Labelled construct: R&D.]
  16. Conclusion
      • Using LD to evaluate the impact of R&D in Europe, backed by robust statistical analysis
      • Complex data analysis on LD can lead to important & meaningful insights on publicly available data
  17. Future Work
      • R + SPARQL*
      • Streamlined process
      • Application of dynamic systems modeling
      • More variables, more datasets
      • More countries, regional level analysis
      * http://cran.r-project.org/web/packages/SPARQL/
  18. Thank You
      Questions?
      zaveri@informatik.uni-leipzig.de
      http://aksw.org/AmrapaliZaveri
      @amrapaliz
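
The SPARQL-based extraction step (slide 11) could look roughly like the sketch below. It only builds a query and a request URL; it does not contact any server. The endpoint path, the example dataset URI and the choice of RDF Data Cube / SDMX properties are assumptions made for illustration and are not taken from the slides. The authors did their analysis in R, so this Python version is a stand-in, not their code.

```python
from urllib.parse import urlencode

# Assumption: the endpoint lives under /sparql on the host named on slide 5.
ENDPOINT = "http://worldbank.270a.info/sparql"

# Standard RDF Data Cube and SDMX vocabulary prefixes.
PREFIXES = """\
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>
"""


def build_observation_query(dataset_uri):
    """Return a SELECT query for all observations of one statistical dataset.

    Assumes the dataset is modelled as a qb:DataSet whose observations carry
    a reference area, a reference period and a single observed value.
    """
    return PREFIXES + f"""\
SELECT ?area ?period ?value WHERE {{
  ?obs a qb:Observation ;
       qb:dataSet <{dataset_uri}> ;
       sdmx-dimension:refArea ?area ;
       sdmx-dimension:refPeriod ?period ;
       sdmx-measure:obsValue ?value .
}}
ORDER BY ?area ?period
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as a SPARQL protocol GET request URL (not sent here)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Hypothetical dataset URI, used only to show the shape of the query.
    query = build_observation_query("http://example.org/dataset/rd-expenditure")
    print(query)
    print(build_request_url(query))
```

Because every dataset is queried through the same three properties, the same function can in principle be reused per indicator, which is the "single standardized structured format" advantage the slide lists.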
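
The two exclusion steps from slide 4 (drop variables with low data quality, drop variables that do not covary with the others) can be illustrated with a small screening routine. The slides do not say how the authors measured data quality or covariation, so the missing-value share, the Pearson correlation and both thresholds below are assumptions, and the toy numbers are invented purely for the example.

```python
from math import sqrt


def missing_share(values):
    """Fraction of observations that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_variables(data, max_missing=0.2, min_abs_r=0.3):
    """Return the names of variables that pass both screening steps.

    Both thresholds are hypothetical defaults, not values from the slides.
    """
    # Step 1: drop variables with too many missing observations.
    kept = {name: vals for name, vals in data.items()
            if missing_share(vals) <= max_missing}
    # Step 2: drop variables that correlate with no other remaining variable.
    result = []
    for name, vals in kept.items():
        rs = [abs(pearson(vals, other_vals))
              for other, other_vals in kept.items() if other != name]
        if rs and max(rs) >= min_abs_r:
            result.append(name)
    return result


if __name__ == "__main__":
    # Invented toy data: one value per year for a single country.
    toy = {
        "rd_expenditure": [1.0, 2.0, 3.0, 4.0, 5.0],
        "gdp": [2.1, 3.9, 6.2, 8.0, 9.9],
        "sparse": [1.0, None, None, None, 2.0],
        "noise": [1.0, -1.0, 0.0, -1.0, 1.0],
    }
    print(screen_variables(toy))
```

A screen of this kind would explain why only a subset of the candidate variables and countries survives into the final model, although the actual criteria used are not given in the deck.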

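
For readers unfamiliar with SEM, the terms on slides 12 and 15 (latent and observed variables, factor loadings, effect weights, measurement errors) map onto the usual textbook formulation. The block below uses common LISREL-style notation; the slides do not state which parameterization the authors used, so treat the symbols as an assumption.

```latex
\begin{aligned}
x    &= \Lambda_x \, \xi + \delta        && \text{(measurement model, exogenous latent variables)} \\
y    &= \Lambda_y \, \eta + \varepsilon  && \text{(measurement model, endogenous latent variables)} \\
\eta &= B \, \eta + \Gamma \, \xi + \zeta && \text{(structural model)}
\end{aligned}
```

Here $x$ and $y$ are observed variables, $\xi$ and $\eta$ are latent variables, $\Lambda_x$ and $\Lambda_y$ hold the factor loadings, $B$ and $\Gamma$ hold the effect weights between latent variables, and $\delta$, $\varepsilon$ are measurement errors. Under the research question on slide 3, R&D would plausibly play the role of $\xi$ and the three performance constructs (EcoP, Hcare, EduP) the role of $\eta$, but that assignment is an interpretation and is not spelled out in the deck.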