Sunday, 15 May 2016

A Thought Experiment on Definitions

Thought experiments are a very powerful tool. You probably use them a lot without realising! Any time you wonder "what would happen if..." or "what is a possible consequence of that?", you're running your own mini thought experiment.

Einstein used them to develop his ideas around relativity. There's a recommended documentary on this here.

I recently posed a thought experiment on Twitter:

thought experiment: what happens when one agrees on purpose behind a definition but not agree on it's usage..

There are a number of layers to this question.
  • Purpose
  • Definition
  • Agreement
  • Usage
Purpose: this is the "why?" question. What problem are you trying to solve, and do the definition and its usage examples help solve it?
So, what might happen if a usage of a definition doesn't appear to agree with its intent, i.e. the two are not congruous?

Definition: this is getting into the area of correctness and relevance. Is the definition too narrow or too broad? Is it circular? Is it complex to understand? Is there some guidance to help understanding?

Agreement: complex or obscure definitions may be harder to agree with. Is the definition accessible, usable and congruous? Is there controversy or disagreement, and is that due to the purpose, definition and usage parts not being in sync? Is the definition generally accepted, i.e. is there de facto agreement?

Usage: is it clear how such a definition would and wouldn't be used? Are there any examples, or patterns and anti-patterns of usage somewhere, or indeed any guidance at all?

It's not necessary for a definition to have usage examples or guidance. But it might help the case. Think about dictionaries - do they often, usually or seldom include examples of usage or guidance notes? (I think the answer would, of course, vary with the dictionary used.) This question would seem to be more relevant if the definition is complex or is difficult to accept.

What might symptoms of non-agreement between definition and usage look like?
  • Dislike of the definition (fit for purpose? relevance?)
  • Aversion or uneasiness with the definition (understanding, clarity?)
  • Misuse of the definition (understanding, clarity?)
  • Non-use (relevance, clarity, understanding?)
To me, there are a number of possible explanations if such a contradiction crops up between usage and definition:
  • The definition is not clear or complete.
  • The usage of the definition is not clear or illustrated.
  • The definition is misunderstood.
  • The definition is communicated in a way that doesn't align with the definition.
  • There is resistance to the definition and/or usage - emotional response.
  • There is resistance to the definition and/or usage - different paradigm.
  • There is resistance to the definition and/or usage - different dictionary references.
  • There is resistance to the definition and/or usage - frames of reference.
  • There is resistance to the definition and/or usage - little value add visible.
  • A combination of the above or even something else.
So, good definitions are generally robust. Unfortunately, in the world of software testing, many definitions would fail a lot of the tests above. Go look in the ISTQB Standard Glossary of Terms used in Software Testing and try it. Do you find any terms that "don't add value"?

OK, so if I wanted to play the schoolyard bully and pick on the weak, I'd start with the ISTQB glossary, but I have higher intellectual ambitions, so...

I've been thinking about checking recently, so let's try there.

I would say I have had a certain uneasiness with the definition - for reasons I don't think I've always been able to articulate. This could boil down to my understanding or the clarity of the definition or something else.

It could be that this feeling is also reflected elsewhere, as recently appeared on the Software Testing Club. The reasons others may give for their "Icky feeling" may be unconnected to my observations, but it would be interesting to hear their reasons.

OK, so let's take the definition of checking from RST (Rapid Software Testing):
Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.
  • “evaluations” as a noun refers to the product of the evaluation, which in the context of checking is going to be an artifact of some kind; a string of bits.
  • “algorithmic” means that it can be expressed explicitly in a way that a tool could perform.
  • “specific observations” means that the observation process results in a string of bits (otherwise, the algorithmic decision rules could not operate on them).
Now let's apply it to a stochastic process, e.g. speech recognition.

According to the definition, I can make specific observations (samples of audio) and apply an algorithm to them (for example, a speech recognition algorithm). The interesting thing here is that the result is non-deterministic (due to variation in speech, accent and pronunciation, which makes the test data design problem difficult), and it is going to need my engagement, both in setting the input threshold parameters and in analysing the output. I might get a boolean output (match/no match) or I might get a range (78% match), and that is a function of the input parameters and the specific observations I ran the algorithm with.

Now the actual algorithm that is making the comparisons is the "checking" part of the process. But this becomes a very small part of the whole - because I need to put effort (more effort and time than the algorithm takes) beforehand and afterwards.

To make this example fit the current definition, I'd have to have all possible samples for certain speech snippets (an infinite set), or I'd have to define the sample population (this is the test design part of the process and, by implication, part of "testing"). (I won't get into the problems with the sampling mechanism I use.) So, I'm narrowing the checking part of the whole even more.

So, the question becomes (for me): should I only use checks where I am certain of the wanted outcome, i.e. a binary answer (which might be "yes/no", "pass/fail" or "above 78% threshold/not above 78% threshold")? And here's the problem: I'm quite happy to use scripts as change checkers, or as early/leading indicators. They are a mechanism to draw my attention to a result and then ask a question: "Should I investigate more, or what does this result tell me?" As soon as I am paying attention to the result or thinking about it, I am not checking anymore; that's testing.
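To make the binary-answer case concrete, here is a minimal Python sketch of what the "check" part might look like once someone has already chosen the samples and the threshold. It is an illustration under stated assumptions, not a real speech recognition test: the recogniser is not modelled at all, the match scores are hypothetical numbers standing in for its output, and the names `threshold_check` and `run_checks` are made up for this example. The algorithmic decision rule is just the comparison against 78%.

```python
from typing import Dict

# Hypothetical match scores in [0, 1], standing in for a speech recogniser's output.
# Choosing these samples (and the threshold below) is test design, not checking.
SCORES: Dict[str, float] = {
    "sample_a": 0.91,
    "sample_b": 0.78,
    "sample_c": 0.40,
}


def threshold_check(score: float, threshold: float = 0.78) -> bool:
    """The algorithmic decision rule: True only if the score is strictly above the threshold."""
    return score > threshold


def run_checks(scores: Dict[str, float], threshold: float = 0.78) -> Dict[str, str]:
    """Apply the decision rule to every observation and reduce each one to pass/fail."""
    return {
        name: "pass" if threshold_check(score, threshold) else "fail"
        for name, score in scores.items()
    }


if __name__ == "__main__":
    # Interpreting these results (why did sample_b fail?) is where testing starts again.
    print(run_checks(SCORES))
```

Running it prints `{'sample_a': 'pass', 'sample_b': 'fail', 'sample_c': 'fail'}`. Note how little of the work lives in these lines: picking the audio samples, settling on 0.78 rather than some other cut-off, and deciding what a "fail" on sample_b (exactly 78%, so not above the threshold) actually tells you all happen outside the check.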

In this example, checking becomes a very small part of the whole - compared with all the other parts of requirement and test analysis, test design, test set-up and result analysis that make up testing. Then I wonder what value it really adds.

Am I using the definition incorrectly? I don't see any usage examples anywhere, so maybe the definition is incomplete. Or maybe guidance is incomplete. Or maybe the terminology is just not useful for me.

Divergent thought: in the definition of checking, it's not clear to me whether the algorithm can be a non-deterministic algorithm. It could be read that way, and then here's another thought experiment: what would the consequences of that be?

If I were to revisit the purpose and intent behind this definition, I'm not sure that it achieves what it set out to do. The checking part is quite small, and because the other activities in testing are not described, the importance of checking seems to be artificially increased. This is a problem! To me, it would be better to list different tactics of test execution and highlight that checking is one of them.

So, in this example, the "checking" is a very small part of the whole and falls into (for me) a very narrow definition, with a certain amount of ambiguity. (It's narrow because it is contrasted with testing. This is analogous to a "testing vs test design" post.) The definition is incomplete and/or incongruous (there are no usage examples, and it generates confusion and discussion), and it fails to add value (as it seems to artificially inflate the importance of checking in relation to other testing activities).

Note, it's taken me quite a while to come to this conclusion; I have needed to spend a fair amount of time thinking about this. It's certainly not an obvious conclusion. I can also understand if others don't have the time, energy or inclination for this type of thought journey and treat the term as a heuristic to help in their communication. And I also understand that this term is helpful for some people and that they have success using it with their stakeholders; again, if this heuristic communication works for you, fine.

Final word
It seems to me that there are many definitions around in the testing and software testing community that could benefit from this type of approach. Do you agree? Which would you try it on first?
