Friday, 13 April 2012

Indicators, Testing & Wine Tasting

The other day I was discussing a problem with a developer. Essentially, it was about how to judge feedback from a continuous integration system alongside the other information that the testers in the team were producing. The reason I'd started asking questions was that, with the myriad of information available, there was a risk (as I saw it) of ignoring some information - or of cherry-picking the information to use...

Indicators

This led me to describe how the different pieces of information contributed to the picture of the software being developed, and that, as such, they were indicators.

Some people mistake test results for absolute truth. "It worked", "it passed", "it x'ed" - all past tenses... Getting from a past result to a future prediction of performance isn't easy - unless you're demonstrating on the customer's equipment or some identical configuration.

Test results can be important markers or references - especially if a customer remembers one from a demonstration. Then there is an implicit expectation (assuming the customer was happy with the demonstration), and that particular test case could be thought of as part of the acceptance criteria.

A test result can't be understood in a meaningful way if it is taken out of context, and many results from automated suites or continuous feedback systems are inevitably context-reduced. That doesn't mean they're not usable, but for me it means they are indicators of current performance and of the likelihood of future behaviour. Also, although automated sets of tests can be very powerful, they also leave out information (or the people interpreting the results leave out information) - the so-called silent evidence of testing, ref [1].

How we report and discuss these results adds the context and makes them context-specific. Getting from a test result (or results) to "it works" or "it meets customer expectations" is not as simple a task as a stakeholder (or developer) might wish. More on context-free reporting in another post...

Thought experiment - illustration

Of all the possible tests that could be executed on this system, I have a set (no matter how comprehensive) that I think of as good enough. Now suppose one (1) test failed - and only one test. Would you:

  • Report the problem (or sit with a developer to localize the problem),
  • Wonder what other potential problems might be connected that haven't been observed,
  • Wonder if the information from the failed test is sufficient (extra testing or extra tests),
  • Wonder what information this failed test gives in the context of other/previous testing done (or ongoing),
  • Investigate the significance of the test that failed - maybe even see if there was any connection to recent changes in the system (under test), environment, test framework or other parameterization.

Note #1, hopefully you chose all of the above!
Note #2, this problem expands if you ever have a subset of tests (say an automated set of tests used as a sanity check) and 1 test (or even X tests) in your sample (subset) fail.

The example here is to show that a result doesn't stand on its own - without some situational context. Sometimes the investigation around the problem is about adding that situational context so that you (or someone else) can make a judgement about it.
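To make that a little more concrete: one way of adding situational context is to bundle the bare "FAIL" with the information needed to judge it - recent changes to the system under test, the environment, the test framework and any related results. A minimal sketch in Python (all names here are hypothetical, not from any particular tool):

```python
# A minimal sketch (hypothetical names) of bundling a failed test result with
# the situational context needed to judge it - the result alone is just an
# indicator; the context is what supports a judgement.
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureContext:
    test_name: str
    recent_code_changes: List[str] = field(default_factory=list)   # changes to the system under test
    environment_changes: List[str] = field(default_factory=list)   # OS patches, network, test rig glitches
    framework_changes: List[str] = field(default_factory=list)     # test framework or parameterization changes
    related_results: List[str] = field(default_factory=list)       # other/previous testing touching the same area


def triage_questions(ctx: FailureContext) -> List[str]:
    """Turn a context-reduced 'FAIL' into the questions worth investigating."""
    questions = [f"Report or pair with a developer to localize '{ctx.test_name}'?"]
    if ctx.recent_code_changes:
        questions.append(f"Connected to recent changes: {ctx.recent_code_changes}?")
    if ctx.environment_changes:
        questions.append(f"Environment factor rather than product fault: {ctx.environment_changes}?")
    if ctx.framework_changes:
        questions.append(f"Test framework/parameterization change involved: {ctx.framework_changes}?")
    if not ctx.related_results:
        questions.append("No related results recorded - is extra testing needed around this area?")
    return questions


if __name__ == "__main__":
    ctx = FailureContext(
        test_name="login_timeout_check",
        recent_code_changes=["session-handling refactor"],
        environment_changes=["CI agent OS patch last week"],
    )
    for q in triage_questions(ctx):
        print("-", q)
```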

Flaky Feedback?

One of the problems for the team was mistrust of the feedback they were getting from the continuous integration system. There had been a glitch with the environment in the previous week which had thrown up some warning signs.

But,

  • Warning signs are exactly that - they're not judgements - they are "items for attention" and always need more analysis.
  • An environmental issue is good feedback - oops, will this work in production, rather than just on our machines?
  • Why only mistrust bad results or warning signs? Doesn't the same logic apply to "good" results or "lack of warning signs"?

Mistrust can be an excuse not to analyse - and when overloaded it's easy to say we'll down-prioritize analysis. Mistrust can also result from a system that becomes too flaky or unreliable - in which case you need to consider (1) can I use that system, (2) what should I use to get the information I need, (3) do I need external help?
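One small, practical aid before deciding whether to mistrust a whole system is to qualify a suspect failure - for example by re-running it a few times to separate consistent failures from intermittent ones. A sketch only, with hypothetical names; it doesn't replace analysis, it just suggests which kind of analysis to start with:

```python
# A minimal sketch (hypothetical names) of one way to qualify 'flaky feedback':
# re-run a failing test a few times and separate consistent failures (worth
# immediate product-focused analysis) from intermittent ones (worth looking at
# the environment, test rig or timing). An aid to analysis, not a substitute.
import random
from typing import Callable


def classify_failure(run_test: Callable[[], bool], reruns: int = 3) -> str:
    """run_test returns True on pass; classify the original failure by re-running."""
    results = [run_test() for _ in range(reruns)]
    if not any(results):
        return "consistent-failure"   # fails every time - likely a product or persistent setup problem
    if all(results):
        return "passed-on-rerun"      # failure not reproduced - environment glitch? timing?
    return "intermittent"             # mixed results - flaky test, rig or race condition to analyse


if __name__ == "__main__":
    flaky_test = lambda: random.random() > 0.5   # stand-in for a real test invocation
    print(classify_failure(flaky_test))
```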

Another question for these team set-ups (especially as/where continuous integration is being introduced) - is the team itself ready for this feedback? I'll explore this in another post.

Wine Tasting

After I commented on the different sources of information, how they all added to the big picture (the aspect of adding context to a result) and that our knowledge of that "big picture" was constantly changing, the remark came back: "this is more like wine tasting than software engineering..."

In a way I couldn't agree more. There are a lot of similarities between "good" software testing and expert wine tasting - both require skill, constant learning and practice. Descriptions are both objective and subjective - knowing how to distinguish between the two, and give each the right tone, is tricky. Both are providing a service. Neither the software tester nor the wine taster is the stakeholder (end user or their sole representative).

But this raises an interesting question - in a world of multitudes of information (and nuances in that information):

  • Is the team ready for this? The sheer amount of information, and the strategies for tackling or prioritizing it. Topic for a different post...
  • Is this a simplistic view of testing (and even software engineering)? Testing is not a true/false, go/no-go or black/white result; it is not a criterion or quality gate (in itself).

Software Engineering?

I think we can be misled by this term. When one thinks of engineering it might be in terms of designing and constructing machines, bridges or buildings. The word 'engineering' has primed us to think of associations with engineering problems, many of which have an element of precision in them - often detailed blueprints and plans. But not always...

I occasionally watch a program, Grand Designs, that follows people building their own homes - whether via architects and sub-contractors or totally alone. A common factor of all these builds is that they are unique (even where a blueprint exists), as usually some problem occurs along the way: (1) a specific material can't be obtained in time so a replacement needs to be sought, (2) money runs out during the project so elements are cut or reduced, (3) the customer changes their mind about something (changing requirements), (4) some authority/bureaucracy is slower than hoped for, delaying or changing the project - so very little is completely predictable or goes to plan. A common factor: where there is human involvement/interaction, plans change!

Software - its conception, development, use and maintenance - is inherently a "knowledge-based" activity. The testing part of it is inevitably entwined with eliciting and making visible assumptions about its use and purpose, as well as giving the risks associated with the information uncovered, investigated or not touched. So, I'd like people to get away from the idea (frame) that software engineering is solely a blueprint-driven, planned, right/wrong or black/white activity.

Using the term "software engineering" is fine - but put it in context: "software engineering is software development with social interactions (that may have an unpredictable tendency to change)".

And finally...

  • Don't assume bad results are not useful or useable.
  • Don't assume good results tell the whole story.
  • Product Risk tends to increase where analysis of results and their context doesn't happen.

References

[1] The Tester's Headache: Silent Evidence in Testing

Let's Test Conference Carnival #1

It's now just over three weeks to Let's Test, the first conference in Europe focused on context-driven testing, for testers, by testers -> yes, the organizers are all testers! There is a buzz building around the conference and so I thought a carnival was overdue...

Kick-off

Ola Hyltén got the ball rolling with a post about the main attraction of the conference - conferring and peer interaction, here.

Interviews

Markus Gärtner has done a sterling job with a bunch of interviews with participants:


Participants

Some thoughts from some of the participants and why they think it will be a special conference:


More than conferring?

Henrik Emilsson wrote about the test lab, here, and other evening sessions, here.

Starting the Peer-Workshopping Early?

James Lyndsay has proposed a LEWT-model peer workshop for the day before the conference starts. For more details read here.

Oh, why #1 in the title?
I expect there will be a bunch of posts during and after the conference for installment #2...