Semiotics in spreadsheets

Published in: Education

  1. 1. Semiotics in Spreadsheets: Enhancing Semantic Interoperability - Ivelize Rocha Bernardo, André Santanchè
  2. 2. Outline • Motivation • Research Problems • Related Work • What I did in my Master's Degree • Limitations of the Master's Degree Proposal • What are the plans for the PhD
  4. 4. Motivation Large amount of information in spreadsheets [Syed et al., 2010] Why? •They are intuitive •They have high flexibility -> diverse needs
  5. 5. Motivation However, they were designed for: •Isolated use •Human reading
  6. 6. Research Goal: The main goal of our research is to promote richer semantic interoperability among spreadsheets
  8. 8. Interoperability • (Ouksel & Sheth 1999): system interoperability, syntactic interoperability, structural interoperability, semantic interoperability • (Tolk 2006): no interoperability, technical interoperability, syntactic interoperability, semantic interoperability, pragmatic interoperability, dynamic interoperability, conceptual interoperability
  10. 10. Interoperability • Levels grouped under Data Interpretation: semantic interoperability (Ouksel & Sheth 1999); semantic, pragmatic, dynamic, and conceptual interoperability (Tolk 2006)
  12. 12. Which elements must be considered in this interpretation process? Unity Interpretation
  13. 13. Related Work: isolated label • (Han et al., 2008) - RDF123: from spreadsheets to RDF, The Semantic Web, Lecture Notes in Computer Science, vol. 5318, Springer • (Langegger & Wolfram, 2009) - XLWrap Querying and Integrating Arbitrary Spreadsheets with SPARQL, The Semantic Web, Lecture Notes in Computer Science, vol. 5823, Springer
  14. 14. Related Work: template • (Abraham & Erwig, 2006) - Inferring Templates from Spreadsheets, Proceedings of the International Conference on Software Engineering
  15. 15. Related Work: instances • (Zhao et al., 2010) - A spreadsheet system based on data semantic object, IEEE International Conference on Information Management and Engineering
  16. 16. Related Work: isolated label associated to linked data • (Syed et al., 2010) - Exploiting a Web of Semantic Data for Interpreting Tables, Proceedings of the Web Science Conference
  17. 17. Related Work: correlation of labels associated to linked data • (Venetis et al., 2011) - Recovering Semantics of Tables on the Web, Proceedings of the VLDB Endowment • (Mulwad et al., 2010) - Using linked data to interpret tables, Proceedings of the International Workshop on Consuming Linked Data
  18. 18. Related Work: correlation between several spreadsheet elements associated to linked data • (Limaye et al., 2010) - Annotating and Searching Web Tables Using Entities, Proceedings of the VLDB Endowment
  19. 19. How far can a system interpret, considering labels and their correlations?
  20. 20. How different are they in fact?
  24. 24. What I did in my Master's Degree
  25. 25. Research Strategy 1. To identify construction patterns followed by biologists during the creation of these spreadsheets 2. To verify whether these construction patterns could lead us to recognize the spreadsheet's purpose 3. To achieve semantic interoperability among these spreadsheets
  31. 31. How to identify Construction Patterns * what, what, where, when
  35. 35. Construction Patterns * catalogue, collection
  37. 37. SciSpread System
  38. 38. Architecture Evaluation • Automatic analysis of 11,150 spreadsheets • The system recognized 1,151 spreadsheets • 806 spreadsheets were classified as catalogue • 345 spreadsheets were classified as collection • Total: 748,459 records analyzed *
  39. 39. Architecture Evaluation - Results • Random subset of 1,203 spreadsheets was selected to evaluate precision/recall – Precision: 0.84 – Recall: 0.76 – Specificity: 0.95 *
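As a reminder of how the three reported measures relate to a confusion matrix, here is a minimal Python sketch. The counts in the usage line are hypothetical placeholders chosen only to exercise the formulas; they are not the evaluation data behind this slide, and the class and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for a binary 'recognized / not recognized' decision."""
    tp: int  # spreadsheets correctly recognized
    fp: int  # spreadsheets wrongly recognized
    fn: int  # relevant spreadsheets the system missed
    tn: int  # irrelevant spreadsheets correctly rejected


def precision(c: Confusion) -> float:
    # Share of recognized spreadsheets that were truly relevant.
    return c.tp / (c.tp + c.fp)


def recall(c: Confusion) -> float:
    # Share of relevant spreadsheets that were recognized.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Share of irrelevant spreadsheets that were correctly rejected.
    return c.tn / (c.tn + c.fp)


# Hypothetical counts, NOT taken from the evaluation on the slide.
example = Confusion(tp=8, fp=2, fn=4, tn=36)
```

A high specificity together with a lower recall, as on the slide, would correspond to a system that rarely accepts an irrelevant spreadsheet but misses some relevant ones.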
  40. 40. Limitations of the Master's Degree Proposal
  41. 41. Main Limitations ● Single Domain ○ Specific spreadsheets (catalogue and collection) ● Lack of a Model to represent construction patterns ○ later, a model in which construction patterns are isolated from each other ● Linking labels to ontologies ○ not able to aggregate different labels belonging to the same concept ○ the ontology was selected by us; it is not necessarily the best representation for the spreadsheets' data
  42. 42. ● Multiple Domains ● Model as an association network ○ relates elements and concepts of several spreadsheets ● Linking spreadsheet structure to ontologies ○ the link is made between concepts
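One way to picture an association network that relates labels from several spreadsheets to shared concepts is sketched below in Python. This is only an illustration under stated assumptions: the class name, the concept names, and the label "species" are hypothetical, and the sketch is not the model proposed in the talk.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class AssociationNetwork:
    """Toy network: concept -> set of (spreadsheet, label) pairs linked to it."""

    def __init__(self) -> None:
        self._by_concept: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def link(self, spreadsheet: str, label: str, concept: str) -> None:
        # Different labels may be linked to the same concept.
        self._by_concept[concept].add((spreadsheet, label))

    def labels_for(self, concept: str) -> Set[Tuple[str, str]]:
        return set(self._by_concept.get(concept, set()))

    def shared_concepts(self, a: str, b: str) -> Set[str]:
        """Concepts that both spreadsheets reach through some label."""
        return {
            concept
            for concept, pairs in self._by_concept.items()
            if {a, b} <= {sheet for sheet, _ in pairs}
        }


net = AssociationNetwork()
net.link("SEEK", "org.", "organism")    # hypothetical label-to-concept link
net.link("XYZ", "species", "organism")  # hypothetical label in another sheet
net.link("SEEK", "stra.", "strain")     # hypothetical label-to-concept link
```

Here `net.shared_concepts("SEEK", "XYZ")` returns the concepts that both spreadsheets touch, which is the kind of aggregation of different labels under one concept that the limitations list says was missing.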
  43. 43. What are the plans for my PhD
  54. 54. [Diagram, built up over slides 44-54] Start; SEEK; labels: proj., title, nam., org., NCBI ID, stra., gene nam., Mod. type, phe., com., tre1., ph, tre2., tem.; End; two groups of tre., val., SD, Unit. Example values: MOSES, M_MZ_sample1, ura, Saccharomyces_cerevisiae, 4932, CEN.PK-113-7D, ura3, 6,5, 0,1, 37, 0,5, oC
  55. 55. Semantic Interoperability among Spreadsheets
  73. 73. [Diagram, built up over slides 56-73] Start; SEEK; labels: proj., title, nam., org., NCBI ID, stra., gene nam., Mod. type, phe., com., tre1., ph, tre2., tem.; End; two groups of tre., val., SD, Unit; further labels added step by step: ID, time, rel. glu., geno type, trea. Annotations: Spreadsheet Purpose, Spreadsheet Domain
  76. 76. Data Model • Spreadsheets / Semiotic Sign • signifier: structural form • signified: spreadsheet purpose + semantic spreadsheet data
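The mapping on this slide can be read as a small data structure. The Python sketch below is an illustration only: the field names, the simplification of the structural form to an ordered tuple of labels, and the example purpose and concept strings are all hypothetical, not the authors' data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SpreadsheetSign:
    """A spreadsheet viewed as a semiotic sign (illustrative only)."""
    # Signifier: the structural form, simplified here to the ordered labels.
    structural_form: Tuple[str, ...]
    # Signified, part 1: what the spreadsheet is for.
    purpose: str
    # Signified, part 2: semantic data, simplified to a label -> concept map.
    semantics: Dict[str, str] = field(default_factory=dict)

    def same_signifier(self, other: "SpreadsheetSign") -> bool:
        """True when two spreadsheets share the same structural form."""
        return self.structural_form == other.structural_form


# Hypothetical instances; the purpose and concept strings are placeholders.
a = SpreadsheetSign(("ID", "time", "rel. glu."), purpose="hypothetical purpose")
b = SpreadsheetSign(
    ("ID", "time", "rel. glu."),
    purpose="hypothetical purpose",
    semantics={"time": "hypothetical:Time"},
)
```

Keeping signifier and signified in separate fields makes it possible to compare two spreadsheets by form alone (`a.same_signifier(b)`) while still treating them as different signs when their semantic data differs.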
  77. 77. Architecture
  78. 78. Research Challenge: How to devise different domains when the networks are interconnected? [Diagram: the SEEK network (Start; SEEK; proj., title, nam., org., NCBI ID, stra., gene nam., Mod. type, phe., com., tre1., ph, tre2., tem.; End; tre., val., SD, Unit; ID, time, rel. glu., geno type, trea.) with its Spreadsheet Purpose and Spreadsheet Domain, shown together with a second network, Start XYZ, with its own Spreadsheet Domain and Spreadsheet Purpose]
  79. 79. Research Questions • When can spreadsheets be considered to have the same purpose? • Is there a canonical representation among spreadsheets of the same purpose? • Is it possible to define a canonical representation for a spreadsheet group? • Can this representation be used to predict spreadsheets of a given purpose?
  80. 80. Acknowledgements ● Laboratory of Information Systems (LIS) ● UNICAMP ● FAPESP ● Microsoft Research FAPESP Virtual Institute (NavScales project) ● CNPq (MuZOO Project and PRONEX-FAPESP) ● INCT in Web Science (CNPq 557.128/2009-9) ● CAPES
  81. 81. Thank you for your attention!
