The RDFIndex-MTSR 2013

  • 176 views
Uploaded on

The presentation of the RDFIndex/Computex vocabulary at MTSR2013.

The presentation of the RDFIndex/Computex vocabulary at MTSR2013.

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
176
On Slideshare
0
From Embeds
0
Number of Embeds
1

Actions

Shares
Downloads
4
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. The RDFIndex approach | MTSR 2013 Leveraging semantics to represent and compute quantitative indexes. The RDFIndex approach. Michalis Vafopoulos (speaker) and {Jose María Álvarez-Rodríguez, Patricia Ordoñez De Pablos and Jose Emilio Labra-Gayo} MTSR 2013 | 7th Metadata and Semantics Research Conference Main Track 1 / 43
  • 2. The RDFIndex approach | MTSR 2013 1 Introduction 2 Related Work 3 Main Contributions 4 The RDFIndex approach 5 Evaluation and Discussion 6 Conclusions and Future Work 7 Metadata and Information 2 / 43
  • 3. The RDFIndex approach | MTSR 2013 Introduction The Motivating example... Let’s suppose that... 1 We want to create a “Health index”... 2 ...to know which is the most “healthy” country. 3 ...to know which is the performance of the health expenditure . 4 ...to collect in just one value a set of indicators. 5 ...to make some new policy. 3 / 43
  • 4. The RDFIndex approach | MTSR 2013 Introduction The Motivating example... It is not a big deal because... We have the “Health Index” by an international network of physicians and researchers (http://www.healthindex.com/). ...or the “Health Index” by the United Nations (http://hdrstats.undp.org/en/indicators/72206.html). ...or the “Ocean Health Index” by an international collaborative effort (http://www.oceanhealthindex.org/). ...or the indicators in the World Bank (http://data.worldbank.org/topic/health). ... 4 / 43
  • 5. The RDFIndex approach | MTSR 2013 Introduction The Motivating example... Benefits of using an index... 1 Creation of valuable data and information. 2 Generation of know-how to make some policy. 3 Re-use of a great effort and commitment by experts in some area. 4 Rank entities according to a quantitative value. 5 ... 5 / 43
  • 6. The RDFIndex approach | MTSR 2013 Introduction The Motivating example... Drawbacks of existing indexes... 1 Data heterogeneity: different datasources, formats and access protocols. 2 Structure: math models to aggregate some indicators that can change over time. 3 Computation process: observations are gathered and processed, somehow, generating a final value. 4 Documentation: multilingual and multicultural character of information. 5 ... ..that imply the necessity of... 1 Accessing data and information under a common and shared data model. 2 Representing the evolving structure of the index. 3 Computing the index to improve transparency. 4 Providing context-aware documentation: user-profile. 5 ... Exploiting valuable data and metadata. 6 / 43
  • 7. The RDFIndex approach | MTSR 2013 Introduction The Motivating example... Drawbacks of existing indexes... 1 Data heterogeneity: different datasources, formats and access protocols. 2 Structure: math models to aggregate some indicators that can change over time. 3 Computation process: observations are gathered and processed, somehow, generating a final value. 4 Documentation: multilingual and multicultural character of information. 5 ... ..that imply the necessity of... 1 Accessing data and information under a common and shared data model. 2 Representing the evolving structure of the index. 3 Computing the index to improve transparency. 4 Providing context-aware documentation: user-profile. 5 ... Exploiting valuable data and metadata. 7 / 43
  • 8. The RDFIndex approach | MTSR 2013 Introduction ...but...Is it a common problem? ...some indexes (per domain)... 1 Bibliography: the JCR and JSR, etc. 2 Government: the GDP, etc. 3 Web: the Webindex, etc. 4 Health: (the aforementioned ones). 5 Cloud: the CSC Cloud Usage Index, the VMWare index, the SMI index, etc. 6 ...to name a few per domain and creators. 8 / 43
  • 9. The RDFIndex approach | MTSR 2013 Introduction A quantitative index is... 1 It is technique to collect in just one value a set of indicators. 2 It can be divided into: index, component and indicator. 3 An index is calculated by aggregating n components using an OWA operator 1 . 4 A component is also calculated by aggregating n indicators using an OWA operator. 5 Observations are aligned to an indicator including some metadata such as: location, measure, value, etc. 1 Ordered weighted averaging operator: aggregated element ai . n i =1 wi ai , where wi is the weight of the 9 / 43
  • 10. The RDFIndex approach | MTSR 2013 Introduction Graphical view of a quantitative index... 10 / 43
  • 11. The RDFIndex approach | MTSR 2013 Related Work Statistics and the Web of Data Vocabularies 1 The Statistical Core Vocabulary [5] (SCOVO), a former standard to describe statistical information in the Web of Data (2009). 2 The RDF Data Cube Vocabulary [2], an adaptation of the ISO standard SDMX (Statistical Data and Metadata Exchange Vocabulary) (2013). 3 “DDI-RDF discovery vocabulary. a metadata vocabulary for documenting research and survey data.” (2013) [1] . Preliminary evaluation... Existing RDF-based vocabularies enable us the possibility of modelling and representing statistical data. 11 / 43
  • 12. The RDFIndex approach | MTSR 2013 Related Work Statistics and the Web of Data Vocabularies 1 The Statistical Core Vocabulary [5] (SCOVO), a former standard to describe statistical information in the Web of Data (2009). 2 The RDF Data Cube Vocabulary [2], an adaptation of the ISO standard SDMX (Statistical Data and Metadata Exchange Vocabulary) (2013). 3 “DDI-RDF discovery vocabulary. a metadata vocabulary for documenting research and survey data.” (2013) [1] . Preliminary evaluation... Existing RDF-based vocabularies enable us the possibility of modelling and representing statistical data. 12 / 43
  • 13. The RDFIndex approach | MTSR 2013 Related Work Statistics and the Web of Data Statistics and Linked Data 1 “Defining and Executing Assessment Tests on Linked Data for Statistical Analysis” (2011) [8]. 2 “Publishing Statistical Data on the Web” (2012) [7]. 3 “Publishing open statistical data: the Spanish census” (2011) [4]. 4 “Publishing Statistical Data following the Linked Open Data Principles: The Web Index Project” (2012) [6]. 5 “Linked Open Data Statistics: Collection and Exploitation” (2013) [3]. 6 Some works in the “RDF Validation Workshop 2013” (http://www.w3.org/2012/12/rdf-val/). Preliminary evaluation... All of the approaches are/were focused on data publishing/consumption...but... Validation of statistical data and/or structure...and the Computation process are still open issues. 13 / 43
  • 14. The RDFIndex approach | MTSR 2013 Related Work Statistics and the Web of Data Statistics and Linked Data 1 “Defining and Executing Assessment Tests on Linked Data for Statistical Analysis” (2011) [8]. 2 “Publishing Statistical Data on the Web” (2012) [7]. 3 “Publishing open statistical data: the Spanish census” (2011) [4]. 4 “Publishing Statistical Data following the Linked Open Data Principles: The Web Index Project” (2012) [6]. 5 “Linked Open Data Statistics: Collection and Exploitation” (2013) [3]. 6 Some works in the “RDF Validation Workshop 2013” (http://www.w3.org/2012/12/rdf-val/). Preliminary evaluation... All of the approaches are/were focused on data publishing/consumption...but... Validation of statistical data and/or structure...and the Computation process are still open issues. 14 / 43
  • 15. The RDFIndex approach | MTSR 2013 Main Contributions Main Contributions 1-Representation A high-level model on top of the RDF Data Cube Vocabulary for representing quantitative indexes. 2-Computation A Java-SPARQL based processor to exploit the meta-information, validate and compute the new index values. 15 / 43
  • 16. The RDFIndex approach | MTSR 2013 Main Contributions Main Contributions 1-Representation A high-level model on top of the RDF Data Cube Vocabulary for representing quantitative indexes. 2-Computation A Java-SPARQL based processor to exploit the meta-information, validate and compute the new index values. 16 / 43
  • 17. The RDFIndex approach | MTSR 2013 The RDFIndex approach Example: Building the “The World Bank Naive Index”. Components: “Aid Efectiveness” (c1 ) and “Health” (c2 ). Indicators: “Life Expectancy” (in1 ) and “Health expenditure, total ( %) of GDP” (in2 ). The index, i, is calculated through the ordered weighted averaging (OWA) operator: n wi ci , where wi is the weight of the component i=1 ci All observations must be normalized using the z-score before computing intermediate and final values for each indicator, component and index. It is necessary to populate the average age by country and year to create a new “derivate” indicator of “Life Expectancy” without the “sex” dimension. 17 / 43
  • 18. The RDFIndex approach | MTSR 2013 The RDFIndex approach Example of indicator observations from the WorldBank. Description Year Country Value Status Life Expectancy Male 2010 Spain 79 Normal Life Expectancy Male 2011 Spain 79 Normal Life Expectancy Female 2010 Spain 85 Normal Life Expectancy Female 2011 Spain 85 Normal Life Expectancy Male 2010 Greece 78 Normal Life Expectancy Male 2011 Greece 79 Normal Life Expectancy Female 2010 Greece 83 Normal Life Expectancy Female 2011 Greece 83 Normal Health expenditure, total ( % of GDP) 2010 Spain 74,2 Normal Health expenditure, total ( % of GDP) 2011 Spain 73,6 Normal Health expenditure, total ( % of GDP) 2010 Spain 61,5 Normal Health expenditure, total ( % of GDP) 2011 Spain 61,2 Normal 18 / 43
  • 19. The RDFIndex approach | MTSR 2013 The RDFIndex approach Representing and computing data with the RDFIndex (summary) Steps 1 2 Define the structure and computation of the index with the RDFIndex vocabulary. Run the Java SPARQL based processor: Load the RDFIndex ontology to have access to common definitions. Create a kind of Abstract Syntax Tree (AST) containing the defined meta-data (our index). Validate the structure with an AST walker (structure and RDF Data Cube normalization). Create the SPARQL queries to compute the index (other AST walker). Load the observations and run the SPARQL queries to generate new observations. 19 / 43
  • 20. The RDFIndex approach | MTSR 2013 The RDFIndex approach Graphical view of the RDFIndex workflow... 20 / 43
  • 21. The RDFIndex approach | MTSR 2013 The RDFIndex approach Underlying definitions Observation-o It is a tuple {v , m, s}, where v is a numerical value for the measure m with an status s that belongs to only one dataset of observations O. Dataset-q It is a tuple {O, m, D, A, T } where O is a set of observations for only one measure m that is described under a set of dimensions D and a set of annotations A. Additionally, some attributes can be defined in the set T for structure enrichment. Aggregated dataset-aq It is an aggregation of n datasets qi (identified by the set Q) which set of observations O is derivate by applying an OWA operator p to the observations Oqi . 21 / 43
  • 22. The RDFIndex approach | MTSR 2013 The RDFIndex approach Scope and Consequences Necessary condition for the computation process An aggregated dataset aq defined by means of the set of dimensions Daq can be computed iif ∀qj ∈ Q : Daq ⊆ Dqj . Furthermore the OWA operator p can only aggregate values belonging to the same measure m. The set of dimensions D, annotations A and attributes T for a given dataset Q is always the same with the aim of describing all observations under the same context. An index i and a component c are aggregated datasets. Nevertheless this restriction is relaxed if observations can be directly mapped to these elements without any computation process. An indicator in can be both dataset or aggregated dataset. All elements in definitions must be uniquely identified. An aggregated dataset is also a dataset. 22 / 43
  • 23. The RDFIndex approach | MTSR 2013 The RDFIndex approach Mapping the RDFIndex to the RDF Data Cube Vocabulary Concept Vocabulary element Comments Observation o qb:Observation Enrichment through annotations Numerical value v xsd:double Restriction to numerical values Measure m qb:MeasureProperty sdmx-measure:obsValue Restriction to one measure Status s sdmx-concept:obsStatus Defined by the SDMX-RDF vocabulary Dataset q qb:dataSet and qb:qb:DataStructureDefinition Metadata of the qb:dataSet Dimension di ∈ D qb:DimensionProperty Context of observations Annotation ai ∈ A owl:AnnotationProperty Intensive use of Dublin Core Attribute ati ∈ T qb:AttributeProperty Link to existing datasets such as DBPedia OWA operator p SPARQL 1.1 aggregation operators Other extensions depend on the RDF repository Index, Component and Indicator skos:Concept SKOS taxonomy (logical structure) 23 / 43
  • 24. The RDFIndex approach | MTSR 2013 The RDFIndex approach Example of the “World Bank Naive index” structure in RDF. § @prefix @prefix @prefix ¤ r d f i n d e x : < h t t p : // p u r l . o r g / r d f i n d e x / o n t o l o g y /> . r d f i n d e x −wb: < h t t p : // p u r l . o r g / r d f i n d e x /wb/ r e s o u r c e /> . r d f i n d e x −w b o n t : < h t t p : // p u r l . o r g / r d f i n d e x /wb/ o n t o l o g y /> . r d f i n d e x −w b : T h e W o r l d B a n k N a i v e I n d e x a rdfindex:Index ; r d f s : l a b e l " The ␣ World ␣ Bank ␣ N a i v e ␣ I n d e x " @en ; rdfindex:type rdfindex:Quantitative ; rdfindex:aggregates [ r d f i n d e x : a g g r e g a t i o n −o p e r a t o r r d f i n d e x : O W A ; r d f i n d e x : p a r t −o f [ r d f i n d e x : d a t a s e t r d f i n d e x −w b : A i d E f f e c t i v e n e s s ; rdfindex:weight 0.4]; r d f i n d e x : p a r t −o f [ r d f i n d e x : d a t a s e t r d f i n d e x −w b : H e a l t h ; rdfindex:weight 0.6]; ]; #More meta−d a t a p r o p e r t i e s . . . qb:structure r d f i n d e x −wb:TheWorldBankNaiveIndexDSD ; . r d f i n d e x −wb:TheWorldBankNaiveIndexDSD a qb:DataStructureDefinition ; qb:component [ q b : d i m e n s i o n r d f i n d e x −w b o n t : r e f −a r e a ] , [ q b : d i m e n s i o n r d f i n d e x −w b o n t : r e f −y e a r ] , [ qb:measure rdfindex:value ] , [ q b : a t t r i b u t e sdmx−a t t r i b u t e : u n i t M e a s u r e ] ; #More meta−d a t a p r o p e r t i e s . . . . ¦  ¥ 24 / 43
  • 25. The RDFIndex approach | MTSR 2013 The RDFIndex approach SPARQL template for building aggregated observations. § ¤ SELECT ( di ∈ D ) [ ( sum ( ?w∗? m e a s u r e ) a s ? n e w v a l u e ) | OWA( ? m e a s u r e ) ] WHERE{ q rdfindex : aggregates ? parts . ? p a r t s r d f i n d e x : p a r t −o f ? p a r t o f . ? p a r t o f r d f i n d e x : d a t a s e t qi . FILTER ( ?partof ∈ Q ) . ? o b s e r v a t i o n r d f : t y p e qb : O b s e r v a t i o n . ? part r d f i n d e x : weight ? defaultw . OPTIONAL {? p a r t o f r d f i n d e x : w e i g h t ? a g g r e g a t i o n w . } . BIND ( i f ( BOUND( ? a g g r e g a t i o n w ) , ? a g g r e g a t i o n w , ? d e f a u l t w ) AS ?w) ? o b s e r v a t i o n m ? measure . ? o b s e r v a t i o n ? dim ? dimR ef . FILTER ( ?dim ∈ D ) . }GROUP BY ( di ∈ D ) ¦  ¥ 25 / 43
  • 26. The RDFIndex approach | MTSR 2013 The RDFIndex approach Example of generated SPARQL query. § ¤ p r e f i x r d f i n d e x : <h t t p : / / p u r l . o r g / r d f i n d e x / o n t o l o g y /> SELECT ? dim0 ? dim1 ( sum ( ?w∗? m e a s u r e ) a s ? n e w v a l u e ) WHERE{ r d f i n d e x −wb : T h e W o r l d B a n k N a i v e I n d e x rdfindex : aggregates ? parts . ? p a r t s r d f i n d e x : p a r t −o f ? p a r t o f . ? partof rdfindex : dataset ? part . FILTER ( ( ? p a r t =r d f i n d e x −wb : A i d E f f e c t i v e n e s s ) | | ( ? p a r t =r d f i n d e x −wb : H e a l t h ) ) . ? o b s e r v a t i o n qb : d a t a S e t ? p a r t . ? part r d f i n d e x : weight ? defaultw . OPTIONAL {? p a r t o f r d f i n d e x : w e i g h t ? a g g r e g a t i o n w . } . BIND ( i f ( BOUND( ? a g g r e g a t i o n w ) , ? a g g r e g a t i o n w , ? d e f a u l t w ) AS ?w) ? o b s e r v a t i o n r d f i n d e x : v a l u e ? measure . ? o b s e r v a t i o n r d f i n d e x −wbont : r e f −a r e a ? dim0 . ? o b s e r v a t i o n r d f i n d e x −wbont : r e f −y e a r ? dim1 . } GROUP BY ? dim0 ? dim1 ¦  ¥ 26 / 43
  • 27. The RDFIndex approach | MTSR 2013 The RDFIndex approach Partial example of a populated observation for “The World Bank Naive Index”. § r d f i n d e x −wb : o680 810085 1579 a qb : o b s e r v a t i o n ; qb : d a t a S e t r d f i n d e x −wb : T h e W o r l d B a n k N a i v e I n d e x ; r d f i n d e x −wbont : r e f −a r e a d b p e d i a −r e s : S p a i n ; r d f i n d e x −wbont : r e f −y e a r <h t t p : / / r e f e r e n c e . d a t a . gov . uk / i d / g r e g o r i a n − i n t e r v a l /2010−01−01T00 : 0 0 : 0 0 / P1Y> ; ... #r d f s : { l a b e l , comment} { l i t e r a l } ; #d c t e r m s : { i s s u e d , d a t e , c o n t r i b u t o r , a u t h o r , #p u b l i s h e r , i d e n t i f i e r } #{ r e s o u r c e , l i t e r a l } ; r d f i n d e x : md5−h a s h " 6 c d d a 7 6 0 8 8 c d 1 6 1 d 7 6 6 8 0 9 d 6 a 7 8 d 3 5 f 6 " ; sdmx−c o n c e p t : o b s S t a t u s sdmx−c o d e : o b s S t a t u s −E ; r d f i n d e x : v a l u e " 0 . 7 0 7 "^^ x s d : d o u b l e . ¦  ¤ ¥ 27 / 43
  • 28. The RDFIndex approach | MTSR 2013 The RDFIndex approach Evolution: joint effort between the RDFIndex and Computex Computex 1 Vocabulary for statistical computations. 2 Extends RDF Data Cube. 3 Ontology at: http://purl.org/weso/computex. 4 Data and structure validation using SPARQL Queries. 5 Some terms: cex:Concept cex:Indicator cex:Computation cex:WeightSchema qb:Observation 6 Learn more: http://computex.herokuapp.com/ Preliminary Evaluation... Similar approaches that have been merged between July and September, 2013. 28 / 43
  • 29. The RDFIndex approach | MTSR 2013 The RDFIndex approach Demo of the RDFIndex/Computex... http://computex.herokuapp.com/ 29 / 43
  • 30. The RDFIndex approach | MTSR 2013 The RDFIndex approach On-going working examples The Cloud Index Compilation of Key Performance Indicators for cloud services. Creation of a Cloud Index to measure Quality of Service of cloud providers. Funded by the RELATE-ITN (FP7-PEOPLE-2010-ITN, 264840) project and developed in the subproject “Quality Management in Service-based Systems and Cloud Applications”. Demo: http://cloudindex-doc.herokuapp.com/ The Web Index 2012 and 2013 Compilation of several indicators (85) from 61 countries (100 in 2013). Creation of the Web Index to measure the web impact over the world. Funded by The Webfoundation. Demo: http://thewebindex.org 30 / 43
  • 31. The RDFIndex approach | MTSR 2013 The RDFIndex approach On-going working examples The Cloud Index Compilation of Key Performance Indicators for cloud services. Creation of a Cloud Index to measure Quality of Service of cloud providers. Funded by the RELATE-ITN (FP7-PEOPLE-2010-ITN, 264840) project and developed in the subproject “Quality Management in Service-based Systems and Cloud Applications”. Demo: http://cloudindex-doc.herokuapp.com/ The Web Index 2012 and 2013 Compilation of several indicators (85) from 61 countries (100 in 2013). Creation of the Web Index to measure the web impact over the world. Funded by The Webfoundation. Demo: http://thewebindex.org 31 / 43
  • 32. The RDFIndex approach | MTSR 2013 Evaluation and Discussion Advantages Data Sources Data management applying the Linked Data principles. Structure Automatic validation and verification of the structure and metadata of quantitative indexes. Computation process A native approach using SPARQL queries that can help to boost transparency. Documentation Implicit multilingual and multicultural support. 32 / 43
  • 33. The RDFIndex approach | MTSR 2013 Evaluation and Discussion Advantages Data Sources Data management applying the Linked Data principles. Structure Automatic validation and verification of the structure and metadata of quantitative indexes. Computation process A native approach using SPARQL queries that can help to boost transparency. Documentation Implicit multilingual and multicultural support. 33 / 43
  • 34. The RDFIndex approach | MTSR 2013 Evaluation and Discussion Advantages Data Sources Data management applying the Linked Data principles. Structure Automatic validation and verification of the structure and metadata of quantitative indexes. Computation process A native approach using SPARQL queries that can help to boost transparency. Documentation Implicit multilingual and multicultural support. 34 / 43
  • 35. The RDFIndex approach | MTSR 2013 Evaluation and Discussion Advantages Data Sources Data management applying the Linked Data principles. Structure Automatic validation and verification of the structure and metadata of quantitative indexes. Computation process A native approach using SPARQL queries that can help to boost transparency. Documentation Implicit multilingual and multicultural support. 35 / 43
  • 36. The RDFIndex approach | MTSR 2013 Evaluation and Discussion Detailed advantages (I) Feature Main advantages Data sources Common and shared data model, RDF. Description of data providers (provenance and trust). A formal query language to query data, SPARQL. Use of Internet protocols, HTTP. Data enrichment and validation (domain and range). Unique identification of entities, concepts, etc. through (HTTP) URIs. Possibility of publishing new data under the aforementioned characteristics. Standardization and integration of data sources. Structure Meta-description of index structure (validation). Re-use of existing semantic web vocabularies. Re-use of existing datasets to enrich meta-data. Context-aware definitions. Underlying logic formalism. Orthogonal and flexible. 36 / 43
  • 37. The RDFIndex approach | MTSR 2013 Evaluation and Discussion Detailed advantages (II) Computation cess proMeta-description of datasets aggregation. Validation of composed datasets. OWA operators support. Direct translation to SPARQL queries. Documentation Multilingual support to describe datasets, etc. Easy generation with existing tools. Cross-Domain Features Separation of concerns and responsibilities: data and meta-data (structure and computation). Standardization (put in action specs from organisms such as W3C). Declarative and adaptive approach. Non-vendor lock-in (format, access and computation). Integration, Interoperability and Transparency. Align to existing trend (data management: quality and filtering). Easy integration with third-party services such as visualization. Contribution to the Web of Data. 37 / 43
  • 38. The RDFIndex approach | MTSR 2013 Evaluation and Discussion Restrictions SPARQL 1.1 support 1 Some built-in functions are not standardized: Example: z-score employs standard deviation. It requires built-in function: sqrt (only available in some SPARQL implementations). 2 Ranking of values is not obvious: GROUP_CONCAT Check a value against all the other values. ...neither solution is efficient. Limitations of the RDFIndex expressivity The working examples show its applicability but some lack of terms/relationships should be expected. 38 / 43
  • 39. The RDFIndex approach | MTSR 2013 Evaluation and Discussion Restrictions SPARQL 1.1 support 1 Some built-in functions are not standardized: Example: z-score employs standard deviation. It requires built-in function: sqrt (only available in some SPARQL implementations). 2 Ranking of values is not obvious: GROUP_CONCAT Check a value against all the other values. ...neither solution is efficient. Limitations of the RDFIndex expressivity The working examples show its applicability but some lack of terms/relationships should be expected. 39 / 43
  • 40. The RDFIndex approach | MTSR 2013 Conclusions and Future Work Conclusions 1 The use of quantitative indexes is a common practice to compile and rank indicators for making decisions. 2 Existing indexes are completely opaque (PDF, HTML, etc.) 3 It is necessary to improve the access to data/metadata and to boost transparency. 4 The RDFIndex/Computex vocabulary seeks for providing the appropriate building blocks to represent and compute indexes. However: 5 SPARQL limitations avoid a fully development of “pedantic” Linked Data approach. The vocabulary likely does not cover all required elements to represent any index. 40 / 43
  • 41. The RDFIndex approach | MTSR 2013 Conclusions and Future Work Future Work 1 Extend the RDFIndex/Computex vocabulary. 2 Externalize the computation with a hybrid approach R+SPARQL. 3 Parallelization of the computation process. 41 / 43
  • 42. The RDFIndex approach | MTSR 2013 End of the presentation... 42 / 43
  • 43. The RDFIndex approach | MTSR 2013 Metadata and Information Roster... Dr. Jose María Alvarez-Rodríguez SEERC (until August, 2013) and Carlos III University of Madrid, Spain E-mail: josemaria.alvarez@uc3m.es WWW: http://www.josemalvarez.es Prof. Dr. Patricia Ordoñez De Pablos University of Oviedo, Spain E-mail: patriop@uniovi.es Prof. Dr. Jose Emilio Labra-Gayo University of Oviedo, Spain E-mail: labra@uniovi.es WWW: http://www.di.uniovi.es/~labra 43 / 43
  • 44. The RDFIndex approach | MTSR 2013 Some SPARQL queries Z-score normalization using SPARQL... § ¤ prefix afn : < http :// jena . hpl . hp . com / ARQ / function # > SELECT ( (? measure -? mean )/? stddev as ? zscore ) WHERE { ... ? observation rdfindex : value ? measure { SELECT ? mean ( afn : sqrt (( SUM ((? measure -? mean )* (? measure -? mean ))/? count )) as ? stddev ) WHERE { ? observation rdfindex : value ? measure { SELECT ( COUNT (? measure ) as ? count ) ( AVG (? measure ) as ? mean ) WHERE { ? observation rdfindex : value ? measure } GROUP BY ? count ? mean LIMIT 1 } } GROUP BY ? mean ? count LIMIT 1 } } ¦  ¥ 44 / 43
  • 45. The RDFIndex approach | MTSR 2013 References T. Bosch, R. Cyganiak, A. Gregory, and J. Wackerow. DDI-RDF discovery vocabulary. a metadata vocabulary for documenting research and survey data. In 6th Workshop on Linked Data on the Web (LDOW2013), 2013. R. Cyganiak and D. Reynolds. The RDF Data Cube Vocabulary. Working Draft, W3C, 2013. http://www.w3.org/TR/vocab-data-cube/. I. Ermilov, M. Martin, J. Lehmann, and S. Auer. Linked Open Data Statistics: Collection and Exploitation. In KESW, pages 242–249, 2013. J. D. Fernández, M. A. Martínez-Prieto, and C. Gutiérrez. Publishing open statistical data: the spanish census. In DG.O, pages 20–25, 2011. M. Hausenblas, W. Halb, Y. Raimond, L. Feigenbaum, and D. Ayers. SCOVO: Using Statistics on the Web of Data. In L. A. et al., editor, The Semantic Web: Research and Applications, volume 5554 of Lecture Notes in Computer Science, pages 708–722. Springer Berlin Heidelberg, 2009. J. M. A. Rodríguez, J. Clement, J. E. L. Gayo, H. Farhan, and P. Ordoñez De Pablos. Publishing Statistical Data following the Linked Open Data Principles: The Web Index Project, pages 199–226. IGI Global, 2013. P. E. R. Salas, F. M. D. Mota, K. K. Breitman, M. A. Casanova, M. Martin, and S. Auer. Publishing Statistical Data on the Web. Int. J. Semantic Computing, 6(4):373–388, 2012. 45 / 43
  • 46. The RDFIndex approach | MTSR 2013 References B. Zapilko and B. Mathiak. Defining and Executing Assessment Tests on Linked Data for Statistical Analysis. In COLD, 2011. 46 / 43
  • 47. The RDFIndex approach | MTSR 2013 References Leveraging semantics to represent and compute quantitative indexes. The RDFIndex approach. Michalis Vafopoulos (speaker) and {Jose María Álvarez-Rodríguez, Patricia Ordoñez De Pablos and Jose Emilio Labra-Gayo} MTSR 2013 | 7th Metadata and Semantics Research Conference Main Track 47 / 43