Semiotics in spreadsheets
Published in Education
Transcript

  • 1. Semiotics in Spreadsheets: Enhancing Semantic Interoperability. Ivelize Rocha Bernardo, André Santanchè
  • 2. Outline • Motivation • Research Problems • Related Work • What I did in my Master's Degree • Limitations of the Master's Degree Proposal • Plans for the PhD
  • 3. Motivation Large amount of information in spreadsheets [Syed et al., 2010]
  • 4. Motivation Large amount of information in spreadsheets [Syed et al., 2010] Why? • They are intuitive • They are highly flexible and serve diverse needs
  • 5. Motivation However, they were designed for: • Isolated use • Human reading
  • 6. Research Goal The main goal of our research is to promote a richer semantic interoperability among spreadsheets
  • 7. Interoperability (Ouksel & Sheth 1999): system interoperability; syntactic interoperability; structural interoperability; semantic interoperability
  • 8-9. Interoperability (Ouksel & Sheth 1999): system interoperability; syntactic interoperability; structural interoperability; semantic interoperability. (Tolk 2006): no interoperability; technical interoperability; syntactic interoperability; semantic interoperability; pragmatic interoperability; dynamic interoperability; conceptual interoperability
  • 10. Interoperability. Data Interpretation: semantic interoperability (Ouksel & Sheth 1999); semantic, pragmatic, dynamic and conceptual interoperability (Tolk 2006)
  • 11. Which elements must be considered in this interpretation process?
  • 12. Which elements must be considered in this interpretation process? Unity Interpretation
  • 13. Related Work: isolated label. (Han et al., 2008) RDF123: from spreadsheets to RDF, The Semantic Web, Lecture Notes in Computer Science, vol. 5318, Springer. (Langegger & Wolfram, 2009) XLWrap: Querying and Integrating Arbitrary Spreadsheets with SPARQL, The Semantic Web, Lecture Notes in Computer Science, vol. 5823, Springer.
  • 14. Related Work: template. (Abraham & Erwig, 2006) Inferring Templates from Spreadsheets, Proceedings of the International Conference on Software Engineering.
  • 15. Related Work: instances. (Zhao et al., 2010) A spreadsheet system based on data semantic object, IEEE International Conference on Information Management and Engineering.
  • 16. Related Work: isolated label associated to linked data. (Syed et al., 2010) Exploiting a Web of Semantic Data for Interpreting Tables, Proceedings of the Web Science Conference.
  • 17. Related Work: correlation of labels associated to linked data. (Venetis et al., 2011) Recovering Semantics of Tables on the Web, Proceedings of the VLDB Endowment. (Mulwad et al., 2010) Using linked data to interpret tables, Proceedings of the International Workshop on Consuming Linked Data.
  • 18. Related Work: correlation between several spreadsheet elements associated to linked data. (Limaye, 2010) Annotating and Searching Web Tables Using Entities, Proceedings of the VLDB Endowment.
  • 19. How far can a system interpret a spreadsheet, considering only labels and their correlations?
  • 20-23. How different are they in fact?
  • 24. What I did in my Master's Degree
  • 25. Research Strategy 1. Identify the construction patterns that biologists follow when creating these spreadsheets 2. Verify whether these construction patterns can lead to recognition of the spreadsheet's purpose 3. Achieve semantic interoperability among these spreadsheets
  • 26. How to identify Construction Patterns *
  • 27. * How to identify Construction Patterns what
  • 28. * How to identify Construction Patterns what
  • 29. * How to identify Construction Patterns what what
  • 30. * How to identify Construction Patterns what what when
  • 31. * How to identify Construction Patterns what what where when
  • 32. Construction Patterns *
  • 33-34. Construction Patterns * catalogue
  • 35-36. Construction Patterns * catalogue collection
  • 37. SciSpread System
  • 38. Architecture Evaluation. Automatic analysis of 11,150 spreadsheets: the system recognized 1,151 spreadsheets, of which 806 were classified as catalogue and 345 as collection. Total: 748,459 records analyzed *
  • 39. Architecture Evaluation - Results • A random subset of 1,203 spreadsheets was selected to evaluate precision and recall: Precision: 0.84; Recall: 0.76; Specificity: 0.95 *
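
The three measures reported on slide 39 are standard ratios over a confusion matrix. The sketch below shows how they are defined. The class name, helper names and the counts are hypothetical and chosen only for illustration; they are not the counts behind the reported 0.84 / 0.76 / 0.95.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for one evaluation run (hypothetical structure)."""
    tp: int  # spreadsheets of the target kind that were recognized
    fp: int  # spreadsheets wrongly recognized as the target kind
    tn: int  # other spreadsheets correctly left unrecognized
    fn: int  # spreadsheets of the target kind that were missed


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    # Of everything the system recognized, how much was correct?
    return ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Of everything that should have been recognized, how much was found?
    return ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Of everything that should have been rejected, how much was rejected?
    return ratio(c.tn, c.tn + c.fp)


# Hypothetical counts, NOT taken from the evaluation in the slides.
example = ConfusionCounts(tp=8, fp=2, tn=18, fn=2)
print(precision(example), recall(example), specificity(example))
```

A high specificity next to a lower recall, as in the reported figures, would mean the system rarely accepts a spreadsheet it should reject but misses some that it should accept.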
  • 40. Limitations of the Master's Degree Proposal
  • 41. Main Limitations ● Single Domain ○ Specific spreadsheets (catalogue and collection) ● Lack of a Model to represent construction patterns ○ later, models for construction patterns isolated from each other ● Linking labels to ontologies ○ not able to aggregate different labels belonging to the same concept ○ the ontology was selected by us; it is not necessarily the best representation for the spreadsheets' data
  • 42. ● Single Domain ○ Specific spreadsheets (catalogue and collection) ● Lack of a Model to represent construction patterns ○ later, models for construction patterns isolated from each other ● Linking labels to ontologies ○ not able to aggregate different labels belonging to the same concept ○ the ontology was selected by us; it is not necessarily the best representation for the spreadsheets' data ● Multiple Domains ● Model as an association network ○ relates elements and concepts of several spreadsheets ● Linking spreadsheet structure to ontologies ○ the link is made between concepts
  • 43. What are the plans for my PhD
  • 44-45. [Diagram, first build steps] Start SEEK proj.
  • 46. [Diagram, label network] Start SEEK proj. title nam. org. NCBI ID stra. gene nam. Mod. type phe. com. tre1. ph tre2. tem. End tre. val. SD Unit tre. val. SD Unit
  • 47-54. [Same diagram, filled step by step with one example record] MOSES M_MZ_sample1 ura Saccharomyces_cerevisiae 4932 CEN.PK-113-7D ura3 6,5 0,1 37 0,5 °C
  • 55. Semantic Interoperability among Spreadsheets
  • 56-57. [Diagram, label network] Start SEEK proj. title nam. org. NCBI ID stra. gene nam. Mod. type phe. com. tre1. ph tre2. tem. End tre. val. SD Unit tre. val. SD Unit
  • 58-71. [Same diagram, further labels added step by step] ID time rel. glu. geno type trea.
  • 72-73. [Same diagram, annotated] Spreadsheet Purpose Spreadsheet Domain
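
One way to read the diagram on slides 44-73 is as an association network in which each spreadsheet contributes a path of labels between Start and End, and labels shared by several spreadsheets become shared nodes. The sketch below illustrates that reading. The function name, the edge-count representation and the shortened label sequences are assumptions made for illustration; the slides do not specify how the network is stored or the exact order of the labels.

```python
from collections import Counter
from typing import Iterable, Tuple

Edge = Tuple[str, str]


def add_label_path(network: Counter, labels: Iterable[str]) -> None:
    """Add one spreadsheet's label sequence as a Start..End path.

    Each consecutive pair of labels becomes a directed edge, and the
    counter records how many spreadsheets contributed that edge.
    """
    path = ["Start", *labels, "End"]
    for source, target in zip(path, path[1:]):
        network[(source, target)] += 1


network: Counter = Counter()

# Shortened, assumed label orders based on the labels visible in the diagram.
add_label_path(network, ["SEEK", "proj.", "title", "org.", "NCBI ID"])
add_label_path(network, ["SEEK", "proj.", "ID", "time rel.", "glu."])

# Edges contributed by more than one spreadsheet are where the paths overlap.
shared_edges = [edge for edge, count in network.items() if count > 1]
print(shared_edges)
```

Edge counts are only one possible weighting. They are used here because they make the overlap between spreadsheets easy to inspect.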
  • 74. Data Model Spreadsheets Semiotic Sign
  • 75. Data Model Spreadsheets Semiotic Sign signifier: structural form
  • 76. Data Model Spreadsheets Semiotic Sign signifier: structural form; signified: spreadsheet purpose + semantic spreadsheet data
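
The data model on slides 74-76 pairs a signifier (the structural form) with a signified (the spreadsheet purpose plus the semantics of its data). A minimal sketch of that pairing as plain data classes follows. All class and field names are hypothetical, and the label-to-concept mapping in the example is an assumption, not a mapping taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Signifier:
    """Structural form: the labels as they appear in the spreadsheet."""
    labels: Tuple[str, ...]


@dataclass
class Signified:
    """Meaning: the spreadsheet purpose plus the semantics of its data."""
    purpose: str
    # Maps a label to the concept it is taken to denote (assumed shape).
    concepts: Dict[str, str] = field(default_factory=dict)


@dataclass
class SpreadsheetSign:
    """A spreadsheet viewed as a semiotic sign."""
    signifier: Signifier
    signified: Signified


# Hypothetical example: purpose name from the slides, mapping assumed.
sign = SpreadsheetSign(
    signifier=Signifier(labels=("org.", "NCBI ID")),
    signified=Signified(purpose="catalogue", concepts={"org.": "organism"}),
)
print(sign.signified.purpose, sign.signifier.labels)
```

Keeping the two halves as separate types makes it possible to compare spreadsheets by form alone, by meaning alone, or by both.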
  • 77. Architecture
  • 78. [Diagram: the label network annotated with Spreadsheet Purpose and Spreadsheet Domain, next to a second network beginning at Start XYZ with its own Spreadsheet Domain and Spreadsheet Purpose] Research Challenge: How to separate different domains when the networks are interconnected?
  • 79. Research Questions • When can spreadsheets be considered to have the same purpose? • Is there a canonical representation among spreadsheets of the same purpose? • Is it possible to define a canonical representation for a spreadsheet group? • Can this representation be used to predict spreadsheets of a given purpose?
  • 80. Acknowledgements ● Laboratory of Information Systems (LIS) ● UNICAMP ● FAPESP ● Microsoft Research FAPESP Virtual Institute (NavScales project) ● CNPq (MuZOO Project and PRONEX-FAPESP) ● INCT in Web Science (CNPq 557.128/2009-9) ● CAPES
  • 81. Thank you for your attention!