Spreadsheet Guardian: An Approach to Protecting Semantic Correctness throughout the Evolution of Spreadsheets

Spreadsheets are everywhere in the corporate world. They have become a standard tool in a variety of professions for tasks like calculating budgets or predicting risks. Hence, decisions worth millions rely on the correctness of these spreadsheets.

Yet, a spreadsheet is not just an innocent document like a Word document. It is essentially a program. And as such, when defects can have big consequences, we need to treat its quality as we would that of any other software. The problem is: spreadsheets are usually built by end users, not professional software engineers. They don't know and probably don't much care about software quality assurance. So how can we approach this problem?

My doctoral student Daniel Kulesz is working towards an approach and tool that can help spreadsheet users protect the semantic correctness of their spreadsheets. He calls it Spreadsheet Guardian. The main ideas are the following:

  • Bring the ideas of static analysis and testing over from conventional software development to spreadsheets.
  • To avoid overwhelming normal spreadsheet users, distinguish between advanced users, who specify test rules, and common users, who just run the tests to see if they broke something.

In Spreadsheet Guardian, as an advanced user, you specify a test rule. For example: if cell A3 contains the number 213, then A4 should contain the number 426. If other people use and start changing the spreadsheet, Spreadsheet Guardian notifies them as soon as they break this rule. This ensures that they do not inadvertently break the spreadsheet.
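As a rough illustration, such a test rule can be thought of as a predicate over cell values. The following sketch is hypothetical Python, not the actual Spreadsheet Inspection Framework API; all names and the data structure are made up for illustration:

```python
# Hypothetical sketch: a test rule as a predicate over a dict of cell values.
# The real Spreadsheet Inspection Framework works inside Excel; this only
# illustrates the idea of rules that other users' edits are checked against.

def rule_a4_doubles_a3(cells):
    """If A3 contains 213, then A4 should contain 426."""
    if cells.get("A3") == 213:
        return cells.get("A4") == 426
    return True  # the rule only applies when its precondition holds

def check_spreadsheet(cells, rules):
    """Run all test rules; return the names of violated rules."""
    return [rule.__name__ for rule in rules if not rule(cells)]

# A later user edits the spreadsheet and breaks the rule:
cells = {"A3": 213, "A4": 400}
print(check_spreadsheet(cells, [rule_a4_doubles_a3]))
# ['rule_a4_doubles_a3'] -- the user would be notified of this violation
```

The point is that the advanced user writes the rule once, and common users get checked automatically on every change.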

There is an open-source implementation of Spreadsheet Guardian in the form of a plugin for Microsoft Excel called the Spreadsheet Inspection Framework. It allows you to specify and execute test rules.

To evaluate whether this approach really helps spreadsheet users, we conducted a controlled experiment with 48 participants. We found that the vast majority of the participants were able to specify test rules correctly after two short tutorials. We did not see a clear improvement in correctness in our example when using the Spreadsheet Guardian approach. We noticed, however, that users suffered far less from overconfidence in the correctness of their spreadsheets. The approach seems to allow spreadsheet users to judge the actual correctness of their spreadsheets more reasonably.

In case you are interested in more details, the whole approach together with the study is available as an open-access article in the Journal of Software: Evolution and Process, as well as on arXiv. Besides Daniel Kulesz, Verena Käfer did a lot of work on the experiment.

Coupled change suggestions lead to better perfective software maintenance

One important part of software maintenance is perfective maintenance, which I understand here as adding new features and functionality to a software system. Especially for developers new to a system or for beginners, adding a feature to a large, long-lived software system can be a challenge.

While looking through hundreds of thousands of lines of code, it is hard to keep track of where the various parts of the system are and should be located. Architecture and design documents are often out of date or missing entirely. So one classic question in software engineering is: Where in the source code do I have to make changes?

Several researchers have worked on this problem over the years. One direction in this research uses the idea of the suggestions we get in web shops: "Customers who bought diapers also bought romper suits." In our context: developers who changed this file also changed these three files. Gall, Hajek and Jazayeri published this idea as early as 1998. Over the years, there have been many proposals of methods and tools for coupled change analysis as well as applications to existing data sets.
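The core of coupled change analysis can be sketched in a few lines: count how often pairs of files change in the same commit, then suggest frequent co-change partners. This is a minimal illustration with toy data, not any particular published tool:

```python
from collections import Counter
from itertools import combinations

# Toy commit history: each commit is the set of files changed together.
commits = [
    {"parser.c", "lexer.c", "tokens.h"},
    {"parser.c", "tokens.h"},
    {"parser.c", "lexer.c"},
    {"main.c"},
]

# Count how often each pair of files changed in the same commit.
co_changes = Counter()
for files in commits:
    for a, b in combinations(sorted(files), 2):
        co_changes[(a, b)] += 1

def suggest(changed_file, min_support=2):
    """Files that frequently changed together with `changed_file`."""
    suggestions = []
    for (a, b), count in co_changes.items():
        if count >= min_support:
            if a == changed_file:
                suggestions.append((b, count))
            elif b == changed_file:
                suggestions.append((a, count))
    return sorted(suggestions, key=lambda s: -s[1])

print(suggest("parser.c"))  # [('lexer.c', 2), ('tokens.h', 2)]
```

Real approaches refine this with confidence thresholds, time windows, and finer-grained entities than files, but the "who changed X also changed Y" intuition is the same.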

What has been missing so far, however, is experimental validation that these approaches actually help developers. Therefore, we conducted an experiment comparing two groups of students performing perfective maintenance tasks. We measured the correctness of the feature implementation as well as the time needed to complete the tasks. We found almost no difference in the time needed, but a statistically significant difference in correctness. In short: students who had suggestions from coupled change analysis produced more correct implementations.

Of course, our first- and second-year students are not representative of all kinds of developers, but software engineering beginners in particular have similar characteristics. In the end, this is only one experiment, but it is at least first solid evidence of the usefulness of coupled change suggestions.

You can find all the details of our study openly accessible in the journal PeerJ Computer Science.

The Terms "Model-Based" and "Model-Driven" Considered Harmful

I truly believe that abstraction is at the core of computer science and software engineering. Abstraction is essential to cope with complexity and allows us to build the tremendously complex systems we build today. Yet, it is hard to clearly define what abstraction is. Florian Deissenboeck and I attempted this almost 10 years ago in an article published at a workshop on the role of abstraction in software engineering. I think we found some interesting ways to think about it, but it was extremely hard to wrap one's head around the concept as such.

Similarly, models, which I would define in our context as abstractions of systems that have a specific purpose, are essential for software engineering, maybe for engineering in general. Especially in software engineering, almost everything we deal with is a model. Written-down requirements are a model of a part of a system to be built. Even the short sentence in a typical user story is a model. The sketch with boxes and arrows on the whiteboard indicating an aspect of a software architecture is a model of the software. The Java source code is a model of the software abstracting, for example, the details of the execution on the machine. I could go on and on. Models are essential for software engineering.

So what does it mean to call something "model-based software engineering" (MBSE) or "model-driven software engineering" (MDSE)? I would argue without models there is no software engineering. Yet, there is a large research community working on MBSE/MDSE. To this day, I have not fully understood what that is, although I have been working on things that were called "model-based" myself.

My first research project in 2002 was a collaboration with BMW in which we tried model-based testing of their MOST network master. We used the approach my valued colleague Alex Pretschner, now professor at TU Munich, built in his PhD project. We invested a lot of time discussing with the BMW engineers to build a detailed model in AutoFOCUS. This model was then quite suitable for generating test cases to be run against the actual network master. Interestingly, many of the defects were found during the modelling itself. My personal observation was that the model became so detailed it was almost a reimplementation of the network master software. Was there really a conceptual difference between our "model" and the code?

Over the years, I have thought about this a lot. This post is the current status of what I think about MBSE/MDSE. I do not want to convey that everyone working in that field is stupid and doing nonsense research. Quite to the contrary, I do think there is a lot of interesting work going on. My hypothesis, however, is the following:

Using the terms "model-based" or "model-driven" in combination with software engineering or software engineering techniques obscures what these techniques are actually about and what they are actually capable of. Progress in software engineering is hindered by this division into "model-based" and "code-based".

I sincerely hope that this might start a discussion in our community. To substantiate why this hypothesis may be true, I collected the following five observations:

Nobody knows what MBSE/MDSE really is. There is great confusion in research and especially in practice about what this MBSE or MDSE should be. For many, it is working with diagrams instead of textual code. For example, a UML sequence diagram would be a model-based technique, but text in a programming language describing the sequence of messages might not. For others, it needs to be mathematically formal. For others still, it is the name for using Simulink to program. A recent study by Andreas Vogelsang et al. showed this perfectly.

Practitioners don't adopt MBSE/MDSE. This point is a bit hard to discuss given the first one. Oftentimes, when practitioners state that they do MBSE/MDSE, they apply a certain tool such as Simulink. Or they have some UML diagrams of the system lying around (usually out of date). In the same study by Vogelsang et al., the authors investigated drivers and barriers for the adoption of MBSE. They found that "Forces that prevent MBSE adoption mainly relate to immature tooling, uncertainty about the return-on-investment, and fears on migrating existing data and processes. On the other hand, MBSE adoption also has strong drivers and participants have high expectations mainly with respect to managing complexity, adhering to new regulations, and reducing costs." So practitioners think the tools are immature and the whole thing might have a negative ROI. But there is hope that it might help to manage complexity and reduce costs. This does not seem like a methodology I would like to invest in.

MBSE/MDSE is Formal Methods 2.0. I don't think formalisation and formal analysis are useless. They have their benefits and surely a place in the development of software systems. Yet, formal methods are not the holy grail they were often sold to be. If I play the devil's advocate, it feels like after formal methods failed to be broadly adopted in industry, they were simply renamed as model-based methods. Yet, putting some nice diagrams on top of a formal analysis most of the time won't make it easier to understand. In contrast, I love the work by people like Daniel Ratiu or Markus Völter on integrating formal verification into DSLs or common programming languages (see e.g. here). Is this "model-based"? Does it really matter whether it is?

Models are positioned as replacing code. I hear this less nowadays, but it is still out there. The story line was that the success of software engineering has always been based on reaching higher levels of abstraction. We went from machine code to assembly to C and Java. Models are supposedly the next step, in which we can abstract away all these technical details and everything will be easier. I don't believe that at all, except for very specific instances of "model". Abstraction comes at a cost. What we abstract away can come back and haunt us. For example, although Java hides pointers from us most of the time, sometimes we suddenly see a null pointer exception pop up. It breaks the abstraction and suddenly makes the underlying details visible. As we are not used to dealing with pointers in Java, this might be even worse than dealing with them directly in the first place. Furthermore, there are many arguments to be made that there is a huge tool chain supporting us in dealing with source code that is not directly available for various kinds of models. Finally, I fail to see how it is easier to work with graphical diagrams than with plain text.

The research community is split in (at least) two. Again, I will play the devil's advocate and exaggerate to make the point: On the one side, we have the MBSE/MDSE people claiming that models are the future and that everybody still working on code just hasn't realised it. On the other side, we have the code people (e.g. in the maintenance community) who only analyse source code and, for the most part, ignore what the MBSE/MDSE people are doing. In the end, I believe this leads to duplication of effort, misunderstandings in reviews of papers and project proposals, as well as a very confusing picture for practitioners. I don't think we project a consistent vision of how software engineering should be done in the future.

In summary, I believe that the extensive use of the terms "model-based" and "model-driven" in the software engineering community has done more harm than good. When I read that a method is model-based, I don't know whether that means it uses graphical diagrams, additional artefacts, or formalisations. I would be happy if my little rant here served as a starting point for our community to bring the two factions together so that we can work jointly on the next generation of software engineering methods, techniques and tools.

Is there a future for empirical software engineering? Claes Wohlin's keynote at #ESEIW2016

Claes Wohlin gave a great keynote last week at the International Symposium on Empirical Software Engineering in Ciudad Real, Spain. He emphasised that we need to get better at creating empirical results that are puzzle pieces we can try to fit together. That doesn't mean there can be no conflicting results, but that conflicts should be easily identifiable. We need to build more theories, add and understand context, and document our results in a way that helps in meta-analysis and generalisation.

My sketch notes:

Sketchnotes for talk by @alexbergel on evaluating software visualisations @VIS_VISUS

Alexandre Bergel, professor at the University of Chile, gave a talk yesterday at the Visualisation Institute of the University of Stuttgart. He presented the application of two innovative visualisation techniques to software engineering data. He closed the talk with a demonstration of a visualisation tool they are developing.

I believe there is still a lot to explore in how best to visualise the data we have for the tasks performed during a software project. Yet, I also think we still don't know enough about the information needs. Maybe Daniel Graziotin will be able to contribute to this (doi: 10.3897/rio.2.e8865).

Here are my sketch notes of the talk:

Agile Hardware – Sketchnotes from Nancy Schooenderwoert's workshop keynote #XP2016

At XP 2016, I also attended the Second International Workshop on Agile Development of Safety-critical Software. It was also a very interesting workshop with – from my point of view – very timely topics. Closely related to agile and safety is the agile development of hardware, which I'm also interested in.

Nancy Schooenderwoert gave an interesting keynote on her experiences. The sketch notes are below:

There must be more than ICSE for academic software engineering research

The International Conference on Software Engineering (ICSE) is the best-known conference for academic software engineering research. It gets more and more submissions, and the PC chairs are trying different things to deal with that. For this year, Willem Visser and Laurie Williams tried a Program Board approach, which worked reasonably well. But it does not fully solve the problem.

Recently, the PC chairs of 2017, Alessandro Orso and Martin Robillard, announced the idea of limiting submissions to three per author.


Sketchnotes for "Cloning Considered Harmful" Considered Harmful - A look back

Mike Godfrey gave a talk at SANER 2016 looking back at the 10-year-old paper "Cloning Considered Harmful" Considered Harmful, which received the Most Influential Paper award. It has definitely influenced the discussion on cloning, emphasising that there are valid reasons to clone. The discussion is not over, however. We are still working to understand the factors influencing the impact of cloning.

Here are my sketch notes of his talk:

How are functionally similar code clones syntactically different? An empirical study and a benchmark

I blogged previously about this study when we made it available as a pre-print. Now I'm happy to report that it has passed peer review and is published in PeerJ Computer Science.

We used solutions to programming problems from Google Code Jam. The solutions to a programming problem have to be functionally similar. Therefore, we can consider them functionally similar clones (i.e. the code fragments provide similar functionality but are not necessarily syntactically similar). Clones with similar syntax, usually the result of copy & paste, can be found reliably with contemporary clone detection tools. Functionally similar clones, however, are still a large challenge.

We used the clone detection functionality in ConQAT and DECKARD to investigate how much syntactic similarity there is in these functionally similar code clones. They have probably not been created by copy & paste. We found that the syntactic similarity is below 16 %; often it is much smaller. To better understand the remaining differences, we categorised them manually. We found differences in the algorithms, data structures, OO design, I/O and the usage of libraries. While this classification may seem trivial at first sight, it really does capture well what has to be addressed in detecting such clones.
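To give an intuition of what low syntactic similarity means, here is a minimal sketch of one possible token-based similarity measure (Jaccard similarity over sets of tokens). This is an illustrative simplification, not the actual metric used by ConQAT or DECKARD:

```python
import re

def tokens(code):
    """Very rough tokeniser: identifiers, numbers, and single punctuation."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def jaccard_similarity(code_a, code_b):
    """Share of distinct tokens the two fragments have in common."""
    a, b = set(tokens(code_a)), set(tokens(code_b))
    return len(a & b) / len(a | b)

# Two functionally similar fragments with different syntax:
iterative = "def fact(n):\n    r = 1\n    for i in range(2, n + 1):\n        r *= i\n    return r"
recursive = "def fact(n):\n    return 1 if n < 2 else n * fact(n - 1)"

print(jaccard_similarity(iterative, recursive))
```

Both fragments compute a factorial, yet their token overlap is well below 1.0; real clone detectors use more robust normalisations and sequence-based comparisons, but the basic observation is the same: functional similarity does not imply syntactic similarity.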

Based on this categorisation, we provide a freely available benchmark representing these differences, along with all the data used in the study. We hope this helps the clone detection research community to move the detection of functionally similar code clones forward.

We plan to use the benchmark ourselves now to work on a new detection approach for functionally similar code clones. If you would like to collaborate, please get in touch!

Also: If you are a practitioner and have seen functionally similar clones in your code, please get in touch as well. We are planning to conduct a study on the relevance of such clones in practice.

We are looking forward to your comments and feedback!

Source: https://peerj.com/articles/cs-49/?td=bl