Monday 21 April 2014

On Test Results and Decisions about Test Results

A Test Result != Result of a Test
An Observation != Result of an Observation

It’s not the test result that matters, but the decision about the test result!!

Pass-Fail-Doesn’t Start-Inconclusive-Can’t Execute

How you come to these results (and there are more) is interesting. But from a testing viewpoint, what you do with them - the actions you take based on such results - is very interesting.
“If a tree falls in a forest and no one hears it, does it make a noise?”
Or
“If a test result is obtained and not used or considered, was it worth it?”
Used?
Did it confirm an expectation?
Did it contribute to a body of evidence that a feature, product or system is behaving within bounds?
Did it help understand risks with the system?

If the answer is no, then it might be time to take a good long hard look at what you’re doing…

Body of Evidence / Evidence
Not all test observations and test results are equal. They contribute to the picture of a product in different ways. But that picture is not necessarily a paint-by-numbers book. It’s not something where you can necessarily think, “I’ve ticked all the boxes, I’m finished.”
Note: In many cases, Testing is not painting by numbers! 
Unless you’re a doctor finding no pulse and rigor mortis has set in!

It’s about making sense of the observations.

The 1% Problem Soundbite
Suppose 1% of your tests fail. Suppose you’ve seen a problem that 1% of your customers will experience.

The 50% Problem Soundbite
Suppose 50% of your tests fail. Suppose you’ve seen a problem that 50% of your customers will experience.


Based on this information is it possible to say anything about the product?

No. But you might have something that says, “need more information”.

These are what I think of as context-free reports.

From those soundbites, you don’t know anything about the nature of the problems, product, market that the product might be used in, circumstances for usage, etc, etc.

Suppose the 1% problem is a corner case - not allowing installation in a geographical location if some other feature is active - that might affect how the product is launched in that market. Suppose it’s something “cosmetic” that might annoy 1%, but not prevent the product being used. These two different types of observations (results) might lead to totally different decisions.

The 50% case - is this a break in legacy, some feature that customers are using and that now needs to be operated differently; or is it a new feature interacting differently with existing features (e.g. flagged by some automated scripts) that hasn’t been launched yet and might be only for trial/“friendly users”? Again these observations (of essentially first order feedback, after Weinberg, ref [3]) might lead to totally different decisions.
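
To make the “context-free report” idea concrete, here is a minimal Python sketch. The `Report` class, its field names and the `decision_input` function are assumptions invented for illustration; they are not part of any tool or of the models referenced in this post. All it tries to show is that a failure percentage on its own can only produce “need more information”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    """A test report; every field except failure_rate is optional context."""
    failure_rate: float                      # e.g. 0.01 for the 1% soundbite
    nature_of_problem: Optional[str] = None  # e.g. "cosmetic", "blocks installation"
    affected_market: Optional[str] = None    # where the product would be used
    feature_status: Optional[str] = None     # e.g. "legacy", "trial only"


def decision_input(report: Report) -> str:
    """Return what the report can contribute to a product decision."""
    context = [report.nature_of_problem, report.affected_market, report.feature_status]
    missing = sum(1 for item in context if item is None)
    if missing == len(context):
        # Only a number is available: the soundbite case.
        return "need more information"
    if missing:
        return "partial picture: ask about the missing context"
    return "ready for discussion with the decision maker"


# Hypothetical reports mirroring the soundbites and the corner case above.
one_percent = Report(failure_rate=0.01)
fifty_percent = Report(failure_rate=0.50)
corner_case = Report(0.01, "blocks installation", "one geographical market", "new feature")

for report in (one_percent, fifty_percent, corner_case):
    print(report.failure_rate, "->", decision_input(report))
```

Note that the 1% and 50% soundbites get the same answer: without context, the size of the number does not change what can be said about the product.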


Decisions and Supporting Data

1. Are you a tester that decides when testing is finished and a product can ship?
2. Are you a tester that tries to give your boss, project manager, stakeholder a good account of the testing done on a product? Might you even give some insight and feedback about what that testing might mean?

Ok, assuming you’re in the second category…

What does your test observation tell you?
What do your test observations tell you?

I’m reminded of a model I’ve helped use in the past about understanding test results, ref [1]. But now I’m looking at the flip-side, not what a “pass” means but what a “not-pass” might mean. A simple version of a test result / observation might look like:




Note, this is a typical interpretation when a test result is deemed to be of the form OK / NOK (including inconclusive, don’t know etc.). The implication of this is that when the desired test result (OK, pass) is obtained then that “test case” is done, i.e. it is static and is unchanged by environment or circumstance. This might be typical in conformance testing, where responses within a desired range are expected, usually when the test subject has a safety component (more on this in another post).

But, if the results are not black-and-white or can be open to interpretation (as in the 1% problem soundbite) then a different model might be useful.




This model emphasises the project, product and testing elements to consider - these may change over time (within the same project), between product versions and between testing cycles (or sprints). I’ve drawn the line between product owner and product decision as a thick black line to highlight that this is the deciding input.
More on testing and silent evidence can be found in ref [2].

A larger version of this picture can be found here.


References
[3] Quality Software Management: Vol 2 (Weinberg; 1993, Dorset House)

Sunday 20 April 2014

On Thinking about Heuristic Discovery

How do you spot new heuristics?
How do you identify new heuristics?

These were questions that were “touched on” during parts of SWET6* (a small part in the open season of James Bach’s discussion and during a car journey from SWET6).

“Pause & Reflect” / “System 2 Re-Insertion”

During the open season of James Bach’s topic on documenting heuristics from a test activity I remarked that I’d spotted a heuristic that he didn’t appear to notice. It was the activity of pause and reflection: putting down the work for a while, revisiting, correcting and re-working. This can cycle through after bouncing ideas around with colleagues or getting their feedback, followed by further ‘pause and reflection’ cycles until the result is deemed good enough.

Reflection
Due to the nature of how the activity and report had been made, it incurred a series of pauses (interruptions or breaks) which then (I assert) meant that some of the parameters of the context (or the frame through which you look at the work) have to be remembered, reviewed or picked up again - this can mean that the frame gets slightly altered, i.e. “you look at it with a fresh pair of eyes” and, hey presto, see something different or new.

I do not claim to be the first to spot the power of this activity - but I think of it as re-activating the System 2 mode of thought (according to Kahneman, ref [1], this is more deliberate and takes conscious effort to use). The action of putting the work (report) aside for a time and then revisiting means that some of the context is forgotten and so has to be remembered when the work is picked up again. This is the re-analysis in the System 2 mode of thought.

Noticing New Heuristics

The pause and reflection about the observations is very important. It gives the basis for new pattern recognition. Either you notice something that you don’t recognise or something that is a little different. Typically this might be a procedure that you followed, for example:
  • I tend to find problems of type X when I do A, B & C.
  • I tend to find race condition problems when I alter (shorten and lengthen) the timing between certain test steps.
  • I tend to notice new/different patterns when I pause and reflect on a set of actions and compare with previous times I did some similar action.
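
The second example, altering the timing between test steps, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `timing_variants`, `run_with_delays`, the scaling factors and the three step names are all hypothetical, and a real test would pass `time.sleep` as the `sleep` argument instead of the recording stand-in used here.

```python
def timing_variants(baseline_seconds, factors=(0.1, 0.5, 1.0, 2.0, 10.0)):
    """Return shortened and lengthened delays derived from a baseline delay."""
    if baseline_seconds < 0:
        raise ValueError("baseline_seconds must be non-negative")
    return [baseline_seconds * factor for factor in factors]


def run_with_delays(steps, delay_seconds, sleep):
    """Run each step in order, pausing delay_seconds between consecutive steps."""
    results = []
    for index, step in enumerate(steps):
        if index > 0:
            sleep(delay_seconds)
        results.append(step())
    return results


# Hypothetical test steps; each returns a label instead of touching a real system.
steps = [lambda: "connect", lambda: "send", lambda: "close"]

# Record the requested pauses instead of really sleeping, so the sketch runs instantly.
recorded_pauses = []
for delay in timing_variants(1.0):
    run_with_delays(steps, delay, sleep=recorded_pauses.append)

print(recorded_pauses)
```

The point of the sketch is only that the same steps are repeated with shorter and longer gaps; noticing that problems cluster at particular gaps is still the tester’s job.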
Blink Comparator / Trawl and Compare

The activity of reflection and noticing new patterns is something I think of as a blink comparator test. A blink comparator test is something astronomers used to use to find new stars. A reference pattern is used to compare with a current observation. It’s like trawling through previous experience (observations) and comparing with the latest experience. In the example above it might be:
  • When I find problems of type X (in the past) what was the same/similar to the current actions (A, B & C)?
  • When I have found race conditions in the past were there any timing differences in the actions I made in the test steps? (When I change the timing between steps - for the same types of steps - I find race conditions more often.)
  • When I notice new patterns is there some key step that is common? (Pause and reflect.)
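
As a rough illustration of the trawl-and-compare idea, the Python sketch below flips between a reference observation and a current one and reports only what differs. The `blink_compare` function and the observation fields (`step_gap`, `system_state`, `outcome`) are assumptions made up for this example; real observations are rarely this tidy.

```python
def blink_compare(reference, current):
    """Compare two observation records (dicts) and return the fields that differ."""
    differences = {}
    for key in sorted(set(reference) | set(current)):
        before = reference.get(key, "<absent>")
        after = current.get(key, "<absent>")
        if before != after:
            differences[key] = (before, after)
    return differences


# Hypothetical observations: an earlier run and the latest run of similar steps.
earlier = {"step_gap": "normal", "system_state": "idle", "outcome": "pass"}
latest = {"step_gap": "shortened", "system_state": "idle", "outcome": "race condition"}

for field, (before, after) in blink_compare(earlier, latest).items():
    print(f"{field}: {before} -> {after}")
```

Whatever survives the comparison is only a candidate pattern; deciding whether it is a useful rule of thumb still needs the reflection described above.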
Visualization

Sometime in January I tried to sketch this connection between Pause & Reflect and how I noticed new heuristics. This is my latest version that I think reflects the process I tend to follow:





A larger version of this model can be viewed here.

"Pause & Reflect Heuristic"
The model is best read entering the “Pause & Reflect Heuristic” box. 

Here observations of actions (or reviews of a test report) are made - with a pause & reflection in-between - so the two “Solution / Result” clouds may be different. The “Evaluate” cloud is where a difference is noticed. In some cases this feeds back as re-work, in other cases it raises a question which then feeds into the “Proto-pattern” cloud.

Blink Comparator Test / Trawl and Compare
This is the step where a search for the difference (comparator item) is made. The search might be a re-analysis of similar experiences to understand what caused the different result. The result / assessment might be that it was a random action, or something outside my control, that caused the difference**. At other times it might be, “step X was made when the system was in state Y” - that might be enough to give me a new useful rule of thumb (heuristic).

The next step would be to check whether it is something already in use elsewhere or already known. If new, then classify (name and describe) and publish (talk about or discuss it).

Question, Test, Evaluate
But the work doesn’t stop there. When using this heuristic, new observations about its use should be gathered. Does the heuristic work as before? Is there a new (maybe more subtle) pattern or action that makes it useful? This might result in a new heuristic, a refined heuristic or a restriction to fewer applications. This is the “Question, Test, Evaluate” yellow box. It is analogous to the scientific method where the heuristic is the object (hypothesis) tested.

Testing the model

I’m currently collecting new observations of noticing new patterns and trying to construct ways I can test this model.

In the meantime, comments and suggestions are very welcome.

References
[1] Thinking, Fast and Slow [Kahneman; 2012; Farrar]

*SWET6 was at Hönö Hotell, Öckerö, Sweden, 19-20 October 2013, Attendees: Martin Jansson, Steve Öberg, Saam Koroorian, Mikael Jönsson, Anders Bjelkfelt, Marcus Möllenborg, Klaus Nohlås, Simon Morley, Henrik Emilsson, James Bach

**Note, a “different result” might be, when I tackled the problem now I got a different result than previously. What made the difference? This can apply to systems of people interaction too.


Saturday 19 April 2014

Simplistic Views, Complex Subtexts and Whaa?

Recently I read an opinion piece, ref [1], from a Technical Evangelist (more on that label later). My first impression of it was a muddled message. Which point was the piece trying to make?

So, I made a quick close analysis of the text and posted some questions, which basically amounted to, “do you have data/evidence to back this up” and “can you clarify what you mean”, ref [2]. I’ll be happy if I get a response.

Close Analysis

Close analysis (after ref [3]) is a form of critical analysis where the text is dissected into argument markers, assuring and guarding terms, discounting and evaluative terms and rhetorical devices. The aim is to analyse a text sympathetically - i.e. to try and make the person’s argument as good as possible - to get to the real meaning of what they are trying to say.

This is my usual first step when I need to pay close attention to a text. In the case above I found it unclear and bordering on incoherent hand-waving. Sometimes, where the people are responsive, there might be a discussion that starts with, “can I ask some questions?” (see context-free questions, ref [4]).

Yes, it’s time to stop, think and ask questions - what is the message, where is the supporting analysis or evidence?

I will probably be doing this more and more. I’m currently making a close inspection of Taylor’s Principles of Scientific Management. I’ve also recently had reason to look at Program Test Methods and I’ll be revisiting that to provide a commentary on it. I started looking at these because they either get referenced in several places or are the basis for further work. I may end up revisiting more of my software testing library and other literature (as I have done with “regression testing” in Wikipedia and the ISTQB glossary, ref [5]).

Some Useful Heuristics for Spotting Potentially Problematic Articles

Evidence or Data?
If a claim is made, is there a reference to some analysis that you can read, or an explanation of how that conclusion was reached?
Beware of data being retrofitted (cherry picked) to support a claim.
Do the claims appear to be anecdotal or have a single source?

Meaning
Is the meaning clear?
Does the conclusion follow from the argument? 
 - These two are the main parts of close analysis.
Why was the piece written? Was it an opinion or were they trying to influence opinion?
How was the reaction to the piece? Peer review, open comments, discussion or other commentary?

Labels
The labels people use are important. Evangelist has many connotations (to me) of preaching someone else’s thoughts, words and ideas. Ok, so if you don’t think for yourself should I go to the source?
BTW, the same (to me) applies when I hear someone describe themselves as a “passionate <whatever>” - I wonder does that mean they’re enthusiastic, irrational, emotional or irascible.
Labels like expert and authority should be distinguished by whether they are self-appointed or peer-recognised.

And finally….
I know this won’t stop articles and books where “I only have a few minor comments”, but it’s important to distinguish a simplistic or nonsensical view from a complex subtext or just plain BS.

If you don’t understand it, or if a claim is made without evidence, call it out!

References

[3] Understanding Arguments, 8th edition, chapter 4 [Sinnott-Armstrong, Fogelin; 2009, Wadsworth]
[4] Exploring Requirements: Quality Before Design (Gause, Weinberg; 1989, Dorset House)
[5] The Tester’s Headache: Another Regression Test Trap http://testers-headache.blogspot.se/2013/03/another-regression-test-trap.html