An overview of my PhD research
 

  • @andrewchan520 It would be great if you wrote a piece on this! Let me know if you have any questions; my contact info is on the final slide.
  • A nice one! I have been interested in the 'philosophy' of spreadsheet designs, and it's nice to see some research on it. For the spreadsheet problems, what will be the (most important) root cause? Is it Microsoft on the spreadsheet design or the users who design it? Do you have any suggestions on good spreadsheet design and maintenance principles?
  • Like to write an article for CompAct, an online newsletter from Society of Actuaries Technology Section.

    Here is a link to CompAct:

    http://www.soa.org/News-and-Publications/Newsletters/Compact/2013/april/com-2013-iss47.aspx
  • Hi Nicola,

    Thanks for your comment. We did not investigate BI tools; we only looked at the applicability of software engineering methods to spreadsheets. We might study BI tools in the future. Did you have specific tools in mind?
  • Thank you! I liked the introduction on spreadsheets very much.
    In the second part you outline an interesting approach to spreadsheet issues. I was wondering if you also took Business Intelligence tools into consideration in your research?

    thanks,
    Nicola

    Presentation Transcript

    • Analyzing & visualizing spreadsheets. Felienne Hermans (@felienne). In this slide deck I present an overview of my PhD research. I recently defended my dissertation, titled ‘Analyzing and visualizing Spreadsheets’.
    • Bridging the gap. Funny story: I wasn’t hired to research spreadsheets at all. When I started my PhD project, I was supposed to research the gap between business users and programmers. [Figure labels: Users, Programmers]
    • To research this gap, I started by studying business in practice.
    • What surprised me is that this gap wasn’t that big; it was more like a small creek than a huge cliff. Some programmers were heavily involved in business, and even more interesting: some business guys were doing serious programming. In Excel! So I looked into some previous work on the impact of spreadsheets on business. [Figure labels: Programmers, Users]
    • 95% of all U.S. firms use spreadsheets for financial reporting
    • 90% of all analysts in industry perform calculations in spreadsheets
    • 50% of spreadsheets form the basis for decisions
    • Importance can grow over time. When studying the impact of spreadsheets, we found that they do not become important overnight. As processes change, spreadsheets can become key company assets over time. Nobody sets out to create a mission-critical spreadsheet; they “just happen”.
    • This is a simple spreadsheet for many users. Furthermore, spreadsheets can become surprisingly complex.
    • And spreadsheets exist ‘under the radar’. Another interesting property of spreadsheets is that they often live ‘under the radar’: there is no list of spreadsheets, no one keeps track of which sheets are needed for which report, and some spreadsheets do not have a clear owner.
    • Only 33% of spreadsheets have a manual. Finally, spreadsheets lack documentation. In only one third of the spreadsheets did we find ‘documentation’ (i.e. some sort of explanation of how to use the spreadsheet). Technical documentation, explaining why a spreadsheet was designed as it is, was hardly ever found.
    • Complex spreadsheets without documentation can lead to serious errors. You can imagine that the combination of all the above facts (spreadsheets are important, they are complex, and they lack documentation) is a potential recipe for disaster. And indeed, those errors happen.
    • The European Spreadsheet Risk Interest Group (Eusprig.org) collects horror stories
    • Estimated loss: 10 billion dollars a year
    • We interviewed spreadsheet professionals. Once I had studied related spreadsheet work and the horror stories from Eusprig, I wanted to gain a deeper understanding of spreadsheet problems in practice. So I interviewed 27 spreadsheet professionals at the Dutch Robeco bank. I asked only two questions (a semi-structured interview) to obtain an overall view of spreadsheet problems:
    • What annoys you?
    • And what makes you happy?
    • From the interviews, we learned the following facts: financial professionals spend 2 days a week working with Excel
    • Spreadsheets can have a long life, 5 years on average
    • The average sheet is used by 12 different people
    • There is a gap between importance and treatment! I concluded that there is an interesting gap that needs bridging: the gap between how important spreadsheets are and how well they are treated. So how could this gap be bridged?
    • It looks like software in the 70s! Let’s summarize the problems around spreadsheets again: they lack documentation; they contain errors; they stay alive for several years and are used by several people; they are complex. Does this remind you of something? It reminded me of the problems in the early days of software.
    • Hence, we tried to bridge this gap with methods from software engineering.
    • Spreadsheet users lack great tool support. If you compare the tooling of spreadsheet developers with that of software developers, the difference is clear.
    • Modern IDEs (like Visual Studio) have all kinds of built-in tools to help you build software in a responsible way: debugging, testing, analyzing and visualizing are accessible at the click of a button.
    • Compare this to a spreadsheet environment, like Excel. Lots of support to create a spreadsheet, with fonts and colors and borders, but none of the helpful tools to build a maintainable spreadsheet.
    • We did not start coding immediately. However tempting, we did not start to build a spreadsheet IDE immediately. Instead, we looked at the results of the interviews to find the most pressing information need that spreadsheet users had.
    • Most important problem: support for understanding spreadsheets was missing
    • To address this information need specifically, we developed our tool Breviz. This tool visualizes the dependencies among worksheets, depicted as rectangles with arrows drawn between them. The thicker the arrow, the more connections there are. Example: worksheet ‘POAProject’ contains formulas that refer to cells in ‘ProjectTeam’.
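
The idea of worksheets as boxes and formula references as weighted arrows can be sketched in a few lines of Python. This is only an illustration and not how Breviz itself is implemented: the input format (a dictionary of worksheet names to formula strings), the regular expression, and the function name `worksheet_dependencies` are all assumptions made for this example, and the formulas are made up.

```python
import re
from collections import Counter

# Matches cross-sheet references such as ProjectTeam!B2 or 'find zone'!W3.
# Simplified on purpose: ranges, named ranges and external workbooks are ignored.
SHEET_REF = re.compile(r"(?:'([^']+)'|([A-Za-z_][A-Za-z0-9_]*))!\$?[A-Z]{1,3}\$?\d+")


def worksheet_dependencies(formulas):
    """Count formula references between worksheets.

    `formulas` maps a worksheet name to a list of formula strings
    (a hypothetical input format). The result maps
    (referring_sheet, referenced_sheet) to the number of references,
    which would correspond to the thickness of the arrow.
    """
    edges = Counter()
    for sheet, sheet_formulas in formulas.items():
        for formula in sheet_formulas:
            for quoted, bare in SHEET_REF.findall(formula):
                target = quoted or bare
                if target != sheet:  # references within one sheet are not arrows
                    edges[(sheet, target)] += 1
    return edges


# Made-up formulas, using the worksheet names from the slide.
example = {
    "POAProject": ["='ProjectTeam'!B2*2", "=SUM(ProjectTeam!C1,ProjectTeam!C2)"],
    "ProjectTeam": ["=A1+A2"],
}
edges = worksheet_dependencies(example)
for (source, target), weight in edges.items():
    print(f"{source} -> {target} (weight {weight})")
```

The resulting edge weights could then be handed to any graph drawing library to produce a diagram of rectangles and arrows.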
    • We went back to practice. With our tool, we went back to practice to see whether it really supported spreadsheet users.
    • Turned out, it did. Some of the responses of users: “This diagram reminds me of what I had in mind when building.” This remark is interesting: apparently, this spreadsheet user did do some modeling before building a spreadsheet. “This makes my job 10 times easier.” A clear sign that we were on the right track!
    • This work was published at ICSE 2011
    • However, unexpected things also happened. Not all spreadsheets looked as well structured as this one. Let’s look at some of them:
    • Here, pink blocks represent worksheets outside of the spreadsheet. So this spreadsheet gathers information from over 20 other worksheets and combines this information.
    • Users diagnosed with the diagrams. We found that, due to the diversity of the diagrams, users started to judge spreadsheets based on their dataflow diagrams. We therefore formalized this feeling users had into ‘smells’ at the design level. These spreadsheet smells turned out to be very similar to code smells as defined by Fowler.
    • Consider for instance the ‘feature envy’ smell. This occurs when a method from class B refers to many fields outside its own class, for example in class A. This method envies all the cool fields that A has, hence the name. It is easy to see how this smell could be defined on spreadsheets, where a formula in worksheet B could be overly interested in cells on worksheet A.
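
A rough sketch of what ‘feature envy’ could look like for a single formula: count how often the formula refers to each other worksheet and flag the worksheets that exceed a threshold. The threshold of 3, the function name `feature_envy` and the example formula are assumptions for illustration; the metric and threshold actually used in the research are not given in these slides.

```python
import re
from collections import Counter

# Matches cross-sheet references such as A!B2 or 'find zone'!W3 (simplified).
SHEET_REF = re.compile(r"(?:'([^']+)'|([A-Za-z_][A-Za-z0-9_]*))!\$?[A-Z]{1,3}\$?\d+")


def feature_envy(sheet, formula, threshold=3):
    """Return the other worksheets that `formula` refers to at least `threshold` times.

    `sheet` is the worksheet that contains the formula. The threshold is a
    hypothetical value chosen for this example.
    """
    counts = Counter(quoted or bare for quoted, bare in SHEET_REF.findall(formula))
    counts.pop(sheet, None)  # references to the formula's own sheet do not count
    return {name: n for name, n in counts.items() if n >= threshold}


# A formula on worksheet B that is mostly interested in cells on worksheet A.
print(feature_envy("B", "=A!A1+A!A2+A!A3+C1"))
```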
    • We added support in Breviz for detecting and visualizing these inter-worksheet code smells.
    • We went back to practice. Next, of course, we went back to practice to see how users felt about the detected smells.
    • “That should be improved.” “This must be confusing for others.” Results showed that users understood why certain constructions were qualified as smelly.
    • Published at ICSE 2012
    • However, new problems were to be discovered. We found that, once the structure of the spreadsheets had been understood and validated, complex formulas still got in the way of understanding spreadsheets.
    • This led us to the idea of formula smells
    • Again, we took our inspiration from the smells that Fowler defines in his canonical book on refactoring.
    • Published at ICSM 2012
    • In a recent extension of the paper, we also suggest refactorings corresponding to smells. This formula, for instance, contains the same subformula twice. Extracting this subformula into a separate cell will improve readability.
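
The formula on the slide is not part of this transcript, so the sketch below uses a made-up formula to show how a repeated subformula could be spotted. It simply collects every function call with balanced parentheses and reports the ones that occur more than once. The function name `repeated_subformulas` and the minimum length are assumptions, and the approach is deliberately naive (for instance, it ignores function names that contain digits).

```python
import re
from collections import Counter


def repeated_subformulas(formula, min_length=6):
    """Return function-call subexpressions that occur more than once in `formula`."""
    calls = Counter()
    for match in re.finditer(r"[A-Z]+\(", formula):
        depth = 0
        # Walk forward from the opening parenthesis to its matching close.
        for end in range(match.end() - 1, len(formula)):
            if formula[end] == "(":
                depth += 1
            elif formula[end] == ")":
                depth -= 1
                if depth == 0:
                    text = formula[match.start():end + 1]
                    if len(text) >= min_length:
                        calls[text] += 1
                    break
    return [text for text, count in calls.items() if count > 1]


# Made-up example: SUM(A1:A10) is computed twice.
formula = "=IF(SUM(A1:A10)>100,SUM(A1:A10)*0.9,0)"
print(repeated_subformulas(formula))
```

A refactoring in the spirit of the slide would move `SUM(A1:A10)` into its own cell, say `B1`, and rewrite the formula as `=IF(B1>100,B1*0.9,0)`.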
    • We went back to practice. And again... a look in practice.
    • We found that cloning (i.e. copy-pasting) in spreadsheets was a problem. If data is copy-pasted, updates will not be propagated to the copies, and that might lead to errors. Based on existing work on clone detection in source code, we developed an algorithm to detect clones.
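
To make the notion of a clone concrete, here is a heavily simplified sketch. It is not the algorithm from the paper, whose details are not given in these slides: it only compares one range of formula results with one equally long range of pasted constants and classifies the pair by the fraction of matching values. The function name `classify_clone`, the threshold of 0.75 and the numbers are all assumptions.

```python
def classify_clone(formula_results, pasted_values, threshold=0.75):
    """Compare a range of formula results with an equally long range of constants.

    Returns "clone" if every value matches, "near-miss clone" if at least
    `threshold` of the values match, and "no clone" otherwise.
    """
    if not formula_results or len(formula_results) != len(pasted_values):
        return "no clone"
    matches = sum(1 for a, b in zip(formula_results, pasted_values) if a == b)
    ratio = matches / len(formula_results)
    if ratio == 1.0:
        return "clone"
    if ratio >= threshold:
        return "near-miss clone"
    return "no clone"


# Made-up data: the fourth value was changed after the copy was made.
computed = [120, 85, 40, 310, 95]
copied = [120, 85, 40, 300, 95]
print(classify_clone(computed, copied))
```

Near-miss clones are the interesting case: a copy that almost matches its source suggests that one of the two was updated and the other was forgotten.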
    • Clone visualization was added to our visualization, indicated with a dashed arrow. After all, when data is copy-pasted between worksheets, there is a dependency between those worksheets (albeit a different one than a formula link).
    • To validate our algorithm, we performed a case study at the distribution centre of the South Dutch food bank. There, they process 100,000 kilos of food per month and keep track of that with spreadsheets. We were able to detect 61 near-miss clones, of which 25 were actual errors. Because of our analysis, this distribution centre is now running error-free spreadsheets!
    • To be published at ICSE 2013
    • And this paper concluded my PhD thesis. I will continue to work on spreadsheet analysis for at least five more years at Delft University of Technology, so in the remaining few slides, I’ll outline what I will be working on in the future.
    • Remember that spreadsheets stay in business for 5 years and are used by 12 people during their life span? This makes it interesting to consider ‘spreadsheet evolution’ and study how spreadsheets are created.
    • Visual Basic Analysis. In our current visualization and analysis technique, we only consider formulas. However, spreadsheets also allow for code to interact with data and formulas (VBA code in Excel). By analyzing this, we could make our analysis more complete and interesting.
    • Spreadsheet testing. Finally, we want to research how spreadsheet users test. One might think that spreadsheet users do not test, but this is not true.
    • In our previous studies, we often saw formulas like this one. Here, nothing is really calculated. Instead, some sort of validation is performed: if ‘find zone’!W3 is smaller than 0, we are not interested in the value. If we could extract these types of formulas, we could use them to test the spreadsheet.
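
As a sketch of what extracting such validation formulas might look like: the snippet below recognizes formulas of the shape `=IF(<cell> <operator> <number>, ...)` and returns the checked cell, the operator and the bound, which could serve as the starting point for a test. The exact formula on the slide is not in this transcript, so the example formula, the pattern and the function name `extract_check` are assumptions.

```python
import re

# Recognizes =IF(<cell> <op> <number>, ...) where <cell> may carry a sheet name.
CHECK = re.compile(
    r"""^=IF\(\s*
        (?P<ref>(?:'[^']+'|[A-Za-z_]\w*)!\$?[A-Z]{1,3}\$?\d+|\$?[A-Z]{1,3}\$?\d+)
        \s*(?P<op><=|>=|<>|<|>|=)\s*
        (?P<value>-?\d+(?:\.\d+)?)
        \s*,""",
    re.VERBOSE,
)


def extract_check(formula):
    """Return (cell, operator, bound) if `formula` starts with a simple check, else None."""
    match = CHECK.match(formula)
    if match is None:
        return None
    return match.group("ref"), match.group("op"), float(match.group("value"))


# Made-up formula in the spirit of the slide: negative values are not of interest.
print(extract_check("=IF('find zone'!W3<0,\"n/a\",'find zone'!W3)"))
```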
    • Analyzing and visualizing spreadsheets. Felienne Hermans. Thanks for reading about the research adventure I have been enjoying for the past 4 years! If you want to know more, have a look at my blog: www.felienne.com. If you are interested in collaborating, please send me an email (f.f.j.hermans@tudelft.nl) or a tweet (@felienne).