Spreadsheets are graphs too!
Felienne Hermans (@felienne)
Spreadsheets are graphs too: Using Neo4J as backend to store spreadsheet information

This presentation explains how I use Neo4J as a database for a tool that calculates spreadsheet metrics.

Published in: Technology

  1. Spreadsheets are graphs too! Felienne Hermans (@felienne)
  2. Spreadsheets are graphs too! Felienne Hermans (@felienne). In this slide deck I explain how I used Neo4J to store information about spreadsheets.
  3. Ehm... spreadsheets? They are so tably? Are you sure they are fit for a graph database?
  4. Spreadsheets are mislabeled
  5. Spreadsheets are mislabeled. People often think of spreadsheets as data, but...
  6. Spreadsheets are code
  7. Spreadsheets are code. I have made it my life’s work to spread the happy word: “Spreadsheets are code!”
  8. Spreadsheets are code. I have made it my life’s work to spread the happy word: “Spreadsheets are code!” If you don’t immediately believe me, I have three reasons.* *If you do believe me, skip the next 10 slides ;)
  9. 1) Used for similar problems
  10. This tool (for stock price computation) could have been built in any language: C, JavaScript, COBOL, or Excel. The problems Excel is used for are often (not always) similar to problems solved in other languages.
  11. 2) Formulas are Turing complete
  12. 2) Formulas are Turing complete. I go to great lengths to make my point. To such great lengths that I built a Turing machine in Excel, using formulas only.
  13. Here you see it in action. Every row shows the tape at a consecutive step. This makes it not only a proof that formulas are Turing complete, but also a nice visualization of a Turing machine.
  14. 3) They suffer from the same problems
  15. 3) They suffer from the same problems
  16. 3) They suffer from the same problems
  17. 3) They suffer from the same problems
  18. In summary: the activities, the complexity and the problems are the same.
  19. So if spreadsheets are code, can we apply software engineering methods?
  20. In my dissertation, I defined smells for spreadsheet formulas.
  21. Turns out, Fowler’s code smells are easily transferable to spreadsheets.
  22. Pop quiz: what smell is this?
  23. It is the ‘feature envy’ smell.
  24. See how easily this applies to spreadsheets.
  25. To analyze smells, we save spreadsheet info to a database.
  26. This is the data model that I am storing in the database. The basics are pretty simple.
  27. This is the data model that I am storing in the database. The basics are pretty simple. But cells can refer to each other, either directly (e.g. =A7+A9).
  28. This is the data model that I am storing in the database. The basics are pretty simple. But cells can refer to each other, either directly [=A7+A9] or through a range [=SUM(A1:A5)].
  29. This is the data model that I am storing in the database. The basics are pretty simple. But cells can refer to each other, either directly [=A7+A9] or through a range [=SUM(A1:A5)]. In the case of a range, the range itself points to the cells it contains (the range A1:A5 points to cells A1 through A5).
  30. You know the saying that if all you have is a hammer, everything is a nail to you. This is what happened to me. I did not think about what type of database to use.
  31. SQL. You know the saying that if all you have is a hammer, everything is a nail to you. This is what happened to me. I did not think about what type of database to use. I just started banging away with the good ol’ SQL hammer I had been using forever.
  32. Number of worksheets in a spreadsheet. This started out just fine!
  33. Number of cells in a spreadsheet. Still pretty okay.
  34. Number of connected cells for a cell. But to calculate the ‘feature envy’ smell, we need the total number of connected cells: both the cells referenced directly and those referenced through a range.
  35. Number of connected cells for a cell. But to calculate the ‘feature envy’ smell, we need the total number of connected cells: both the cells referenced directly and those referenced through a range. Let’s start with the direct ones.
  36. Number of connected cells for a cell
  37. Number of connected cells for a cell. But to calculate the ‘feature envy’ smell, we need the total number of connected cells: both the cells referenced directly and those referenced through a range. Let’s start with the direct ones. Now look at the range part.
  38. Number of connected cells for a cell
  39. Number of connected cells for a cell
  40. Number of connected cells for a cell. Things start to get iffy when we combine these two query parts.
  41. Number of connected cells for a cell
  42. Number of connected cells for a cell. Things start to get iffy when we combine these two query parts. Not only is the query quite big, but this also happens.
  43. Number of connected cells for a cell
  44. If your tools reach their limits, this has to tell you something. So I started thinking.
  45. Maybe this is not a nail…
  46. Maybe I need a different tool
  47. Maybe I need a different tool. It was at this time that I attended a talk about Neo4J. And the strange thing is, I had seen a few talks about Neo before. But this time it ‘clicked’, because I was suffering from exactly the problem that Neo could solve.
  48. So I ended up with this model. Still spreadsheets, worksheets, cells and links.
  49. So I ended up with this model. Still spreadsheets, worksheets, cells and links. But the ‘prec’ relation can now refer to either cells or ranges.
  50. Turning this
  51. Turning this into this.
  52. Turning this into this. I wouldn’t say this is the power of Neo at work. It is the power of the right tool for the job. There are scenarios, for sure, where the situation is the other way around. But for my goal, Neo was a great fit.
  53. Also, to be honest with you, I did not immediately write such super succinct Cypher queries. My first attempt was something like this:
  54. Also, to be honest with you, I did not immediately write such super succinct Cypher queries. My first attempt was something like this:
  55. Also, to be honest with you, I did not immediately write such super succinct Cypher queries. My first attempt was something like this: This is basically a one-to-one translation from SQL to Neo, still with the two different ways of connecting. It took me a while to understand the power of traversal queries. Here’s another example:
  56. Number of cells in a spreadsheet
  57. Number of cells in a spreadsheet. First Cypher attempt: still very SQLy.
  58. Number of cells in a spreadsheet. Second (okay, probably more like fifth) attempt. No more WHERE, directly matching a graph pattern. The power of Cypher :)
  59. That’s all folks. Spreadsheets are code.
  60. That’s all folks. Spreadsheets are code. Don’t just hit things with the one hammer you know.
  61. That’s all folks. Spreadsheets are code. Don’t just hit things with the one hammer you know. Neo is cool for graph-like structures.
  62. That’s all folks. Spreadsheets are code. Don’t just hit things with the one hammer you know. Neo is cool for graph-like structures. It makes queries easier.
  63. That’s all folks. Spreadsheets are code. Don’t just hit things with the one hammer you know. Neo is cool for graph-like structures. It makes queries easier. But it takes some getting used to for SQL-minded brains.
  64. Spreadsheets are graphs too! Felienne Hermans (@felienne). That’s all folks. Spreadsheets are code. Don’t just hit things with the one hammer you know. Neo is cool for graph-like structures. It makes queries easier. But it takes some getting used to for SQL-minded brains. Liked this talk? Visit my site for more.
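
The ‘feature envy’ smell from the pop quiz can be made concrete with a small Python sketch. It assumes a formula is summarized by the worksheets of the cells it references. It flags the formula when it uses more cells from some other worksheet than from its own. The function name `envied_worksheet` and the simple majority rule are illustrative assumptions, not the detection rule from the dissertation.

```python
from collections import Counter


def envied_worksheet(own_sheet, referenced_cells):
    """Return the worksheet a formula 'envies', or None if there is none.

    `referenced_cells` lists (worksheet, address) pairs for every cell the
    formula refers to. The majority rule below is a simplified, hypothetical
    stand-in for the thresholds a real smell detector would use.
    """
    per_sheet = Counter(sheet for sheet, _ in referenced_cells)
    own_count = per_sheet.pop(own_sheet, 0)  # references to the formula's own sheet
    if not per_sheet:
        return None  # only local references, so no envy
    other_sheet, other_count = per_sheet.most_common(1)[0]
    return other_sheet if other_count > own_count else None


# A formula on 'Summary' such as =Data!A1+Data!A2+Data!A3+B1
refs = [("Data", "A1"), ("Data", "A2"), ("Data", "A3"), ("Summary", "B1")]
print(envied_worksheet("Summary", refs))  # Data
```

A real detector would also need the total number of cells a formula is connected to, directly or through ranges. That is exactly the count the SQL queries in the talk struggle with.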
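
The data-model slides show the relational model only as a picture. The sketch below is a hypothetical SQLite version of it. Every table and column name (`cell_ref`, `range_ref`, `range_member` and so on) is an assumption, and SQLite is used only because it ships with Python. Note that a cell can point at other cells in two different ways: directly, or through a range that in turn points to its member cells.

```python
import sqlite3

# Hypothetical relational version of the slide's data model. Table and
# column names are illustrative assumptions, not the original schema.
SCHEMA = """
CREATE TABLE spreadsheet (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE worksheet (id INTEGER PRIMARY KEY, spreadsheet_id INTEGER, name TEXT);
CREATE TABLE cell (id INTEGER PRIMARY KEY, worksheet_id INTEGER, address TEXT, formula TEXT);
CREATE TABLE cell_range (id INTEGER PRIMARY KEY, worksheet_id INTEGER, address TEXT);
-- direct reference from a formula cell to another cell, e.g. =A7+A9
CREATE TABLE cell_ref (from_cell INTEGER, to_cell INTEGER);
-- reference from a formula cell to a range, e.g. =SUM(A1:A5)
CREATE TABLE range_ref (from_cell INTEGER, to_range INTEGER);
-- a range points to every cell it contains, e.g. A1:A5 -> A1..A5
CREATE TABLE range_member (range_id INTEGER, cell_id INTEGER);
"""


def build_example():
    """Create an in-memory database holding one worksheet with two formulas."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO spreadsheet VALUES (1, 'example.xlsx')")
    db.execute("INSERT INTO worksheet VALUES (1, 1, 'Sheet1')")
    # Plain value cells A1..A9 get ids 1..9.
    for row in range(1, 10):
        db.execute("INSERT INTO cell VALUES (?, 1, ?, NULL)", (row, f"A{row}"))
    # B1 (id 10) holds =A7+A9: two direct references.
    db.execute("INSERT INTO cell VALUES (10, 1, 'B1', '=A7+A9')")
    db.executemany("INSERT INTO cell_ref VALUES (10, ?)", [(7,), (9,)])
    # B2 (id 11) holds =SUM(A1:A5): one reference to a range of five cells.
    db.execute("INSERT INTO cell VALUES (11, 1, 'B2', '=SUM(A1:A5)')")
    db.execute("INSERT INTO cell_range VALUES (1, 1, 'A1:A5')")
    db.execute("INSERT INTO range_ref VALUES (11, 1)")
    db.executemany("INSERT INTO range_member VALUES (1, ?)", [(i,) for i in range(1, 6)])
    return db


db = build_example()
print(db.execute("SELECT COUNT(*) FROM cell").fetchone()[0])  # 11
```

Splitting the references into `cell_ref` and `range_ref` mirrors the two kinds of link in the slides. This split is what later forces every "connected cells" query to have two parts.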
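
The first two SQL queries in the talk, ‘Number of worksheets in a spreadsheet’ and ‘Number of cells in a spreadsheet’, survive only as slide images. The following is a hedged sketch of what such queries typically look like, against a minimal hypothetical schema with assumed table names. Counting worksheets is a single-table filter, and counting cells needs one join.

```python
import sqlite3

# Minimal hypothetical schema: only the containment tables are needed here.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE worksheet (id INTEGER PRIMARY KEY, spreadsheet_id INTEGER, name TEXT);
CREATE TABLE cell (id INTEGER PRIMARY KEY, worksheet_id INTEGER, address TEXT);
INSERT INTO worksheet VALUES (1, 1, 'Sheet1'), (2, 1, 'Sheet2'), (3, 2, 'Other');
INSERT INTO cell VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'A1'), (4, 3, 'A1');
""")

# Number of worksheets in spreadsheet 1: a single-table filter.
worksheets = db.execute(
    "SELECT COUNT(*) FROM worksheet WHERE spreadsheet_id = ?", (1,)
).fetchone()[0]

# Number of cells in spreadsheet 1: one join from cells to their worksheet.
cells = db.execute(
    """SELECT COUNT(*) FROM cell
       JOIN worksheet ON cell.worksheet_id = worksheet.id
       WHERE worksheet.spreadsheet_id = ?""",
    (1,),
).fetchone()[0]

print(worksheets, cells)  # 2 3
```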
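
The ‘Number of connected cells for a cell’ slides combine two query parts. The original SQL only survives as slide images, so the code below is a hedged reconstruction of the idea against a hypothetical schema (`cell_ref`, `range_ref` and `range_member` are assumed names). One SELECT handles direct references. A second SELECT, with an extra join, handles references through a range. The two are merged with UNION, so a cell reached both ways is counted once.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE cell_ref (from_cell INTEGER, to_cell INTEGER);
CREATE TABLE range_ref (from_cell INTEGER, to_range INTEGER);
CREATE TABLE range_member (range_id INTEGER, cell_id INTEGER);
-- cell 10 holds =A7+A9+SUM(A1:A5): direct refs to cells 7 and 9, plus range 1
INSERT INTO cell_ref VALUES (10, 7), (10, 9);
INSERT INTO range_ref VALUES (10, 1);
INSERT INTO range_member VALUES (1, 1), (1, 2), (1, 3), (1, 4), (1, 5);
""")

CONNECTED_CELLS = """
SELECT COUNT(*) FROM (
    -- part 1: cells referenced directly
    SELECT to_cell AS cell_id FROM cell_ref WHERE from_cell = :cell
    UNION
    -- part 2: cells referenced through a range, which needs an extra join
    SELECT range_member.cell_id
    FROM range_ref
    JOIN range_member ON range_member.range_id = range_ref.to_range
    WHERE range_ref.from_cell = :cell
) AS connected
"""

print(db.execute(CONNECTED_CELLS, {"cell": 10}).fetchone()[0])  # 7
```

Even in this toy form, both branches must be kept in sync with the schema. Every new kind of reference would add yet another UNION branch.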
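
With the ‘prec’ relation allowed to point at either a cell or a range, counting connected cells becomes a single walk from the formula cell. The Python sketch below imitates that walk over an in-memory edge list. The node names, the `Range` label and the `contains` relation from a range to its cells are assumptions for illustration. In Neo4J this walk would be expressed as a Cypher pattern rather than a hand-written loop.

```python
# Hypothetical edge list mirroring the graph model: (source, relation, target).
# 'prec' goes from a formula cell to anything it refers to, cell or range;
# 'contains' (an assumed name) goes from a range to each of its cells.
EDGES = [
    ("B1", "prec", "A7"),
    ("B1", "prec", "A9"),
    ("B1", "prec", "A1:A5"),
] + [("A1:A5", "contains", f"A{i}") for i in range(1, 6)]

LABELS = {"A1:A5": "Range"}  # every node not listed here is a Cell


def neighbours(node, rel):
    """Targets of all `rel` edges leaving `node`."""
    return [dst for src, r, dst in EDGES if src == node and r == rel]


def connected_cells(cell):
    """Follow 'prec' once; when it lands on a range, expand to the range's cells."""
    result = set()
    for target in neighbours(cell, "prec"):
        if LABELS.get(target, "Cell") == "Range":
            result.update(neighbours(target, "contains"))
        else:
            result.add(target)
    return result


print(len(connected_cells("B1")))  # 7
```

There is one starting point and one relation to follow, with no separate query branch for ranges. That is the simplification the graph model buys.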
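
The final ‘Number of cells in a spreadsheet’ query matches a path pattern instead of filtering with WHERE. That idea can be imitated in plain Python. This sketch assumes a containment graph in which both the spreadsheet-to-worksheet and the worksheet-to-cell edges are called `contains` (a hypothetical name). The `match_path` function follows a sequence of relation names from a start node, roughly as a Cypher path pattern does.

```python
# Hypothetical containment graph: spreadsheet -> worksheets -> cells.
EDGES = [
    ("book", "contains", "Sheet1"),
    ("book", "contains", "Sheet2"),
    ("Sheet1", "contains", "Sheet1!A1"),
    ("Sheet1", "contains", "Sheet1!A2"),
    ("Sheet2", "contains", "Sheet2!A1"),
]


def match_path(start, rels):
    """Return the nodes at the end of every path from `start` whose edges
    carry the relation names in `rels`, in order."""
    frontier = {start}
    for rel in rels:
        frontier = {dst for src, r, dst in EDGES if src in frontier and r == rel}
    return frontier


# Roughly (spreadsheet)-[:contains]->(worksheet)-[:contains]->(cell),
# with hypothetical relation names.
print(len(match_path("book", ["contains", "contains"])))  # 3
```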