Sunday, 25 March 2012

Silent Evidence in Testing

A: All the test cases passed!
B: But the feature does not work for the customer...

I expect this exchange, or something like it, is not new to many people. Were the test cases sufficient? Was there even any connection between the test cases and the feature? Was some part of the scenario/system/network simulated and not accounted for? The possible reasons are many!

Silent evidence is a concept that highlights that we may have been working with limited information, or that the information we gather may have limited scope for use. It acts as a reminder either to (1) re-evaluate the basis of some of the decisions, or (2) raise a warning flag that the information is more limited than we want, and so to warn about how such information is used.

When I started reading Taleb's Black Swan, ref [1], at the end of 2009, I saw many similarities between the problems he describes and my experiences with software testing. One of the key problems that jumped out concerned silent evidence, and from there I started referring to silent evidence in testing.

Silent Evidence
Taleb describes this as the bias of focussing on what is presented or visible. He cites a story from Cicero:
Diagoras, a nonbeliever in the gods, was shown painted tablets bearing the portraits of some worshippers who prayed, then survived a subsequent shipwreck. The implication was that praying protects you from drowning. 
Diagoras asked, “Where are the pictures of those who prayed, then drowned?”
Recently, whilst reading Kahneman's latest book, ref [2], I found that his description of WYSIATI extends the idea of silent evidence. WYSIATI stands for "what you see is all there is", and it deals with making decisions on the available information. Many people make decisions on the available information - the distinction here is that the information is assumed to be good enough, without necessarily investigating whether it is. This manifests in associated problems:
  • Overconfidence in the quality of the information present - the story is constructed from the information available. From a testing perspective - think about the elements of the testing not done and what that might mean for product risk. Do you consider any product risks from this perspective?
  • Framing effects due to the way in which the information is presented. If a stakeholder associates a high pass rate as something "good" (and being the only piece of information to listen to) then starting the testing story with that information may be misleading for the whole story. See ref [3] for other examples of framing effects and silent evidence.
  • Base-rate neglect: This is the weighting of information based on what is visible. Think about counting fault/bug reports and using them as a basis for judging testing or product quality. Without context the numbers give little useful information. Within context, they can be useful, but the point is how they are used and to what extent. See ref [3] for more examples. 
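As a toy illustration (the numbers and run names here are my own invention, not from any real project), the same pass rate can tell very different stories once the silent part - the scope that was never executed - is made visible:

```python
# Two hypothetical test runs that both report a 95% pass rate.
def pass_rate(passed, executed):
    """Share of executed test cases that passed."""
    return passed / executed

run_a = {"planned": 200, "executed": 200, "passed": 190}
run_b = {"planned": 200, "executed": 40, "passed": 38}

for name, run in (("A", run_a), ("B", run_b)):
    rate = pass_rate(run["passed"], run["executed"])
    coverage = run["executed"] / run["planned"]
    print(f"Run {name}: pass rate {rate:.0%}, "
          f"executed {coverage:.0%} of planned scope")

# Run B leaves 80% of the planned scope unexecuted - silent
# evidence if only the pass rate is reported to stakeholders.
```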
Silent Evidence: ENN
I use the term silent evidence from an analysis viewpoint, typically connected with evaluation of all the testing, or even with root cause analysis - to examine the assumptions that gave rise to the actions taken. (I use this not just for "testing problems" but also for process and organizational analysis.) It is useful for finding patterns in testing decisions and how they might relate to project and stakeholder decisions.

I use the acronym ENN to remind me of the main aspects. They fall into the "before" and "after" aspects:
Excluded (before)
Not Covered (after)
Not Considered (after)
E: Excluded
This is the fact-finding, data part of the report on the information available and decisions taken before test execution. It considers the test scope, or parts of areas, that we rule out or down-prioritize during analysis of the test area. It can also include areas that will be tested by someone else, by some other group, or at some other stage.

Picture a functional map of a feature being modified: I might conclude that another feature which is mutually exclusive with the one being modified (they can't be used or configured simultaneously) deserves much less attention. I might decide a sanity test of that area is sufficient - test that the excluded interaction can't happen and leave it at that. (Remember, at this point I'm not ruling out expanding the test scope - that might happen, depending upon the sanity, interaction or other testing.)

An extreme case: Testing mobile terminals for electromagnetic compatibility, or testing equipment cabinets that can isolate the effects of a nuclear pinch, is not done every time new software is loaded. Excluding this type of testing is not usually controversial - but do you know whether it is relevant for your product?

A more usual case: A third-party product (3PP) that is part of the system is not available for the testing of your product. Whatever we say about the result of the testing, we should make the lack of the 3PP visible: how we handled it (did we simulate something or restrict scope?), what risks the information might leave, and whether there are any parts to follow up.

It's never as simple as "the testing was successful" or "the test cases passed". That is drifting into the dangerous territory of "context-free reporting".

It's not just applicable to software test execution. Think about a document review: a diagram has been added, and I might consider unrelated parts of the document to deserve only a quick look through. Should I really check all the links in the references if nothing has changed? This is a scope setting to agree with the author - they might check those parts themselves.

N: Not considered
This is the retrospective part of the report or analysis. What was missed or overlooked that should have had more of the available attention? What part of the product story wasn't presented well?

This is very much a "learning from experience" activity - OK, how did we get here and, just as importantly, does it matter? Some problems are more fundamental than others; some are organizational (a certain process is always started after another, when maybe it should start in parallel or be merged with the first); some are slips that can't be pinpointed. The key is to look for major patterns. We are all human, so the point is not to spend time rooting out every little problem (that's neither practical nor efficient), but to see whether any assumptions are highlighted that we should be aware of next time.

Example: A customer experiences problems with configuration and provisioning during an update procedure. The problem here is not that this wasn't tested to some adequate level - it was tested, and the analysis resulted in a recommendation to stop/prevent such configuration and provisioning during the update. But this information didn't make it into the related instruction for the update (which might even have included a mechanism to prevent configuration).

In the example, the part not considered was that the documentation update was neither tested nor highlighted to show the potential problems with the update, so that a decision about how to handle customer updates could be made.

N: Not covered
This is similar to the case of excluded testing - but here we had usually intended this part of the scope to be included. Something happened to change the scope along the way. It is very common for a time, tool or configuration constraint to fall into this category.

The not-covered cases usually do get reported. However, it is sometimes hard to remember the circumstances in longer, drawn-out projects - so it is important to record and describe why scope decisions are made. Record them along the way, so that they are visible in any reporting and for any follow-up retrospective.
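One way to keep such decisions visible is to log each one as it is made. A minimal sketch - all names, fields and entries below are my own invention, not anything prescribed here:

```python
# Sketch of a scope-decision log: each excluded or dropped area is
# recorded with its reason and date, so it stays visible in reporting.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ScopeDecision:
    area: str
    status: str      # e.g. "excluded" (before) or "not covered" (after)
    reason: str
    decided_on: date

@dataclass
class TestReport:
    decisions: list = field(default_factory=list)

    def record(self, area, status, reason, decided_on):
        self.decisions.append(ScopeDecision(area, status, reason, decided_on))

    def silent_evidence(self):
        """Everything ruled out or dropped - the part of the story
        a bare pass rate keeps silent."""
        return [d for d in self.decisions
                if d.status in ("excluded", "not covered")]

report = TestReport()
report.record("3PP integration", "not covered",
              "third-party system unavailable; simulated instead",
              date(2012, 3, 1))
report.record("EMC testing", "excluded",
              "hardware unchanged in this release",
              date(2012, 2, 15))

for d in report.silent_evidence():
    print(f"{d.decided_on}: {d.area} ({d.status}) - {d.reason}")
```

Even a plain list in the test report serves the same purpose; the point is that the record is made at decision time, not reconstructed at the retrospective.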

The problem occurs even in shorter projects, where the change happens early on and then gets "remembered" as having always been there.

Example: Think of a particular type of tool needed to fulfill the scope of the testing for a product. For some reason it can't be made available to the project (and this becomes visible early on). The project then starts to create a tool that covers part of the functionality the first tool would have provided. Everyone is very happy by the end of the project. But what about the 'hole' in functionality that couldn't be covered? Does it still affect the overall scope of the project, and is this information being made visible?

And finally
Thinking about silent evidence helps me think about these traps, and maybe even avoid some of them the next time around. Being aware of traps doesn't mean you won't fall for them, but it does help.

So, use the concept of silent evidence (or even WYSIATI) to keep your testing and product story more context-specific and less context-free.

[1] The Black Swan: The Impact of the Highly Improbable (Taleb; 2008, Penguin)
[2] Thinking, Fast and Slow (Kahneman; 2011, FSG)
[3] Test Reporting to Non-Testers (Morley; 2010, Iqnite Nordic)


  1. Nice post Simon - I like how you're bringing your readings into your everyday work. I'm hoping to do the same, just putting the pieces together at the moment.

    Nice model as well to help you tell the testing story. That said, it's all too easy to lead the stakeholders down the path we want them to...

    1. Hi Duncan, if you've got a way to easily lead stakeholders down a certain path, I'd love to hear about it :)