Sunday, 25 March 2012

Silent Evidence in Testing

A: All the test cases passed!
B: But the feature does not work for the customer...

I expect this, or something similar, is not a totally new type of statement to many people. Were the test cases sufficient, was there even any connection between the test cases and the feature, was some part of the scenario/system/network simulated and wasn't accounted for, etc, etc. The possible reasons are many!

Silent Evidence is a concept to highlight that we may have been working with limited information or that the information we gather may have limited scope for use. The effect is to act as a reminder to either (1) re-evaluate the basis of some of the decisions or (2) to act as a warning flag that the information is more limited than we want and so to warn about the usage of such information.

When I started reading Taleb's Black Swan, ref [1], at the end of 2009, I started seeing many similarities between the problems described and my experiences with software testing. One of the key problems that jumped out was to do with silent evidence, from which I started referring to silent evidence in testing.

Silent Evidence
Taleb describes this as the bias of focussing on what is presented or visible. He cites a story from Cicero:
Diagoras, a nonbeliever in the gods, was shown painted tablets bearing the portraits of some worshippers who prayed, then survived a subsequent shipwreck. The implication was that praying protects you from drowning. 
Diagoras asked, “Where are the pictures of those who prayed, then drowned?”
Recently, whilst reading Kahneman's latest book, ref [2], I found that his description of WYSIATI extends the idea of silent evidence. WYSIATI stands for "what you see is all there is" - and this deals with making a decision on the available information. Many people make decisions on the available information - the distinction here is that it becomes an assumption that the information is good-enough, and not necessarily investigating if it is good-enough. This is manifested by associated problems:
  • Overconfidence in the quality of the information present - the story is constructed from the information available. From a testing perspective - think about the elements of the testing not done and what that might mean for product risk. Do you consider any product risks from this perspective?
  • Framing effects due to the way in which the information is presented. If a stakeholder associates a high pass rate as something "good" (and being the only piece of information to listen to) then starting the testing story with that information may be misleading for the whole story. See ref [3] for other examples of framing effects and silent evidence.
  • Base-rate neglect: This is the weighting of information based on what is visible. Think about counting fault/bug reports and using them as a basis for judging testing or product quality. Without context the numbers give little useful information. Within context, they can be useful, but the point is how they are used and to what extent. See ref [3] for more examples. 
Silent Evidence: ENN
I use the term silent evidence from an analysis viewpoint, typically connected with evaluation of all the testing or even connected with root cause analysis - to examine assumptions that gave rise to the actions taken. (I use this not just for "testing problems" but also for process and organizational analysis.) This is useful to find patterns in testing decisions and how they might relate to project and stakeholder decisions.

I use the acronym ENN to remind me of the main aspects. They fall into the "before" and "after" aspects:
Excluded (before)
Not Covered (after)
Not Considered (after)
E: Excluded
This is the fact-finding, data part of the report on the information available and decisions taken before test execution. This considers the test scope or parts of areas that we rule out, or down-prioritize during an analysis of the test area. It can also include areas that will be tested by someone else, some other grouping or at some other stage.

Picture a functional map that modifies a feature, I might conclude that another feature which is mutually exclusive to the one being modified (can't be used/configured simultaneously), deserves much less attention. I might decide a sanity test of that area sufficient, test that the excluded interaction can't happen and leave it at that. (Remember, at this point I'm not excluding expanding the test scope - this might happen, depending upon the sanity, interaction or other testing.)

An extreme case: Testing mobile terminals for electromagnetic compatibility or equipment cabinets that can isolate the effects of a nuclear pinch are not done every time new software is loaded. Excluding this type of testing is not usually controversial - but do you know if it is relevant for the product?

A more usual case: There is a third-party-product (3PP) part of the system that is not available for the testing of your product. Whatever we say about the result of the testing we should make the lack of the 3PP visible, how we handled it (did we simulate something or restrict scope?), what risks the information might leave and if there are any parts to follow-up.

It's never as simple as, "the testing was successful"or "the test cases passed". This is drifting into the dangerous territory of "context-free reporting".

It's not just applicable to software test execution - think about a document review - a diagram has been added and I might consider unrelated parts of the document to only deserve a quick look through. Should I really check all the links in the references if nothing has changed? This is a scope setting to do with the author - they might check this themselves.

N: Not considered
This is the retrospective part of the report or analysis. What was missed or overlooked that should have had more of the available attention? What part of the product story didn't get presented in a good way?

This is very much a "learning from experience activity" - ok, how did we get here and, just as importantly, does it matter? Some problems are more fundamental than others, some are organizational (a certain process is always started after another, when maybe it should start in parallel or be joined with the first), some are slips that can't be pinpointed. The key is to look for major patterns - we are all human, so it's not to spend so much time to root out every little problem (that's not practical or efficient) - but to see if some assumptions get highlighted that we should be aware of next time.

Example: A customer experiences problems with configuration and provisioning during an update procedure. The problem here is not that this wasn't tested to some adequate level, but that it was tested and the information from that analysis resulted in a recommendation to stop/prevent such configuration and provisioning during upgrade. This information didn't make it into the related instruction for the update (and maybe even including a mechanism to prevent configuration).

In the example, the part not considered was either that the documentation update wasn't tested or highlighted to show potential problems with update so that a decision about how to handle the customer updates could be made.

N: Not covered
This is similar to the case for excluded testing - but it is usually that we had intended to have this part of the scope included. Something happened to change the scope along the way. It is very common for a time, tool or configuration constraint to fall into this category.

The not covered cases usually do get reported. However, it is sometimes hard to remember the circumstances in longer, drawn-out projects - so it is important to record and describe why testing decisions in scope are made. Record them along the way, so that they are visible in any reporting and for any follow-up retrospective.

The problem even occurs in shorter projects where the change happens early on, then the change gets "remembered" as always being there.

Example: Think of a particular type of tool needed to fulfill the scope of the testing for a product. For some reason this can't be available in the project (becomes visible early in the project). The project then starts to create a tool that can do part of the required functionality that the first tool would've covered. Everyone is very happy by the end of the project. But what about the 'hole' in functionality that couldn't be covered? Does this still affect the overall scope of the project and is this information being made visible?

And finally
Thinking about silent evidence helps me think about these traps, and maybe to even avoid some the next time around. Being aware of traps doesn't mean you won't fall for them, but it does help.

So, use the concept of silent evidence (or even WYSIATI) to keep your testing and product story more context-specific and less context-free.

[1] The Black Swan: The Impact of the Highly Improbable (Taleb; 2008, Penguin)
[2] Thinking, Fast and Slow (Kahneman; 2011, FSG)
[3] Test Reporting to Non-Testers (Morley; 2010, Iqnite Nordic)

Friday, 9 March 2012

The Linear Filter Trap

Or: Illustrating Systems Thinking with Proximate and Distal Analysis

I was having a discussion the other day where we were looking at some problems and discussing whether it was sufficient to treat symptoms... I remarked that by treating symptoms, rather than root causes [notes n1], and not allowing time to find the real problems then we would, in many cases, be fooling ourselves [notes n2].

Why? (I was asked)
People like to solve problems that they can see - which means we have a tendency to "fix" the problem we see in front of us. This occurs more often if the problem appears to have a "straightforward" fix -> I think of this as a form of cognitive ease in action. Digging for root causes is a challenging activity, and we sometimes want to believe that the cause we identify is good enough to fix. For another example of cognitive ease, with best practices, see ref [2].

An illustration of this - that I have seen in one form or another - is that we settle for the first solution without understanding (or trying to) the root cause. There is no guarantee that fixing a symptom will make the problem better. Many times, the problem improves for a while, but then re-occurs in another form. Now there is a "new" problem to solve, which usually has the same (or similar) root cause, so from the system perspective it's ineffective [notes n3].

Ineffective, because many problems in processes and organisations are often non-linear, but we often try to solve "linear" problems, and...

Linear Filter
I expanded, that I think of this from a systems thinking perspective as applying a "linear filter" to a "non-linear system".

What? (I was asked again) Linear vs Non-Linear?
Non-linear -> multiple interactions affect the output vs linear -> the output is directly proportional to the input. So the application here is that there are usually multiple causes for a problem -> when I perform an assessment after a root cause analysis (RCA) activity I usually take them root cause by root cause, in the order that we think will have the biggest impact on improvement.

Ok, time for a....

A Real-life example
A fault (bug) report was written by a customer -> initial RCA shows that some "test" was not executed that would (in theory) have caught the problem. This is a symptom and a "linear" view of the problem. The "linear filter trap" is to then consider this as the root (or most important) problem. [notes n6]

Digging deeper shows that the team had it's priorities changed during execution, to make an early drop (resulting in some negative and alternative use-cases being delivered later). This, in itself, is not a problem but the communication that was associated with the "early drop" didn't reflect the change.

In this case, some of the underlying problems were a set of communication issues:

  • Partly in the team connected with their story [notes n4] (especially their testing silent evidence [notes n5]), 
  • Partly connected with the stakeholder that changed priorities and may have had a duty to follow the change of expectations through the delivery chain and what that might mean at those different stages, and 
  • Communication with the customer to ensure that they are aware of the staged delivery, and what means to them.

Another example of a root cause analysis can be found in ref [3].

And finally
Tackling and fixing symptoms is a very natural activity - very human. But it is not always enough. Sometimes it is enough - it depends, of course, on the scale of the problem and the cost of investigating the underlying problems and tackling those. Sometimes, the underlying problems cannot be fixed and it is sufficient with easing the symptoms.

But I believe, as good testers, it is important to understand the difference between symptoms and root causes, especially where it affects either the testing we do or affects the perception of the testing we do. This is important where a perception of "testing or testers missed something"... So,

Be aware of the linear filter trap!

[n1] In philosophy and sociology, root causes and symptoms are usually referred to as distal and proximate causes, see ref [1] for more background.

[n2] Slightly naughty of me, playing on the fact that most people don't like to think that they are fooling themselves, but that's a different story...

[n3] The times when it might be effective are when we (some stakeholder) is prepared to take the cost of fixing some problem now. A problem here is that stakeholders that are project-driven have, by the nature of the task, a propensity to see only as far as the end of the project. A product owner may have a different perspective - bear this is mind when someone is deciding whether it's a project problem or a product problem - or even a line organisation problem.

[n4] Story here means the story about the product and the story about the testing of the product.

[n5] Here, testing silent evidence refers to the elements not tested and thus not reported - their significance is assumed to be not important. For further background see ref [4].

[n6] I should add that the problem with the trap in this example is that I have seen this in the past trigger one of two responses: (1) A perception that the testers are at fault, which becomes a myth/rumour with a life of its own ; (2) A knee-jerk reaction to implement some extra oversight of the test displicine or team as a whole -> in the worst case it becomes a desire to introduce some additional "quality gate". This is a good example of when reacting to the perceived symptom is both ineffective and counter-productive for the organisation.

[1] Wikipedia: Proximate and ultimate causation
[2] The Tester's Headache: Best Practices - Smoke and Mirrors or Cognitive Fluency?
[3] The Tester's Headache: Problem Analysis - Mind Maps and Thinking
[4] The Tester's Headache: Mind The Information Gap

Thursday, 1 March 2012

The Hindsight Trap

Root cause analysis (RCA) activities are an important part of many retrospectives - whether as part of  a team activity, analysis of test reports, analysis of fault (bug) reports or as standalone activities to investigate organizational and process problems.

I read Rikard Edgren's post the other day, on some good ISTQB definitions, here, and it reminded me of a common trap that people sometimes fall into with documented claims or requirements. (I hasten to add that Rikard didn't appear to fall into this trap!)

The most common form of this trap in testing circles (where documentation is involved) usually crops up as a question, "why didn't you/someone spot this problem earlier (as it's specified in document X)", or statement, "this requirement should been tested because it's specified in document X".

The Trap
The problem with these types of statements and questions is that there is an important assumption being made: They are using documented specifications or claims in their current form and assuming that was the same form that was an "input" into some tester's/team's mind.

In my experience this assumption occurs more often when:

  • The one making the analysis doesn't understand the problems (challenges) that testers face with making documented and undocumented requirements visible.
  • The one making the analysis assumes that testers have the same information as everyone else.
  • The one making the analysis assumes that test problems are "owned" solely by testers. 

Trap Avoidance
Ask yourself at least one question (but preferably more) about documented claims / specifications:

  • Was this information available to the tester at the time of testing?
  • Silent evidence? Was this information available but not used/tested for a particular reason?

In the first bullet the onus is on the one making the analysis. In the second bullet the onus is on good test reporting (story-telling) - highlighting areas not done (for whatever reason),  more on silent evidence in ref [1].

Further Analysis
Potential for these problems exist where code design and test are separated in different teams. But, more generally, they exist where communication problems (or lack of transparency) exists - I see these most often where parts of teams receive external requests that isn't "hand-shaken" within the team.

One way to help make these problems visible (whether from an RCA or team retrospective viewpoint) is to consider the frames that different roles have with respect to the problem, how does the developer look at the documented claim and how does the tester - any external influence that changes their priorities? More in ref [2].

So, being knowledgable after the fact (with hindsight) usually misses potentially significant parts of the story. And remember, a statement is very often not the whole story -> good story-telling takes effort!

Hindsight trap - sound familiar?


[1] Mind the Information Gap
[2] Framing: Some Decision and Analysis Frames in Testing