Sunday, 14 September 2014

ISO 29119 Questions: Part 2

Content questions

There is an ISO software testing standard (ISO/IEC/IEEE 29119): parts 1-3 are published and parts 4-5 are in draft, ref [1-4]. I have read parts 1-3 and a draft version of part 4, and have been forming several questions for some time. Some questions might be regarded as problems, some as issues, and others as a need for clarification.

I will use a series of 3 posts to document them. 

Part 1: Looks at the reasoning for the standard - based on the information publicly available, ref [8].
Part 2: Looks at some of the content.
Part 3: Looks at the output and usage of the standard and some implications.

This is a snapshot of some items analysed in 29119, it is not a full review - but the comments are representative of what I have found in parts 1-4.
It looks at the validity of the model, the aspect of compliance/conformance to 29119 and some of the language and descriptions used.

Process Model

Process Model Validity?
The standard presents a set of definitions, models for processes and templates for artefacts. The implication (I assume) is that these are needed, required or observed where “good” software testing happens. I make this assumption because hardly anyone would take the effort to standardise approaches that are ineffective, would they?

For these process models to have any validity I’d expect external and internal validity to be demonstrated, or the basis on which they are claimed to be shown. In fact, you’d expect research into which model types work, and for which populations (organisations, products and project specifics), to be the basis for a standardisation effort.

Internal process model validity
This addresses the question of cause and effect. By following this process model, something is produced. Good software, happy customers, good testing? It isn’t really stated. A question I have with a process model - one that is presented as a standard - is where the relation is between the model (construct) and the result/interpretation (“good” software, or “good” software testing). So what is the purpose of the model, and what will happen if I follow it? The effects are not stated. There is no evidence of it being applied (tried or tested). Therefore, internal validity of the process model cannot be assumed. I would suggest this is needed for standardisation. 

External process model validity
This addresses the question of how generally the process model can be applied - across which population of products, projects and organisations. Again, there is no indication (evidence) of the range of products and organisations to which it can be reliably applied. It is claimed that it can be applied to any organisation and any software testing. This claim is not supported by evidence - by reference to the input cases for which the claim is made. Therefore, the claim of applicability (any software testing and any organisation) is unsupported. Rhetoric and guesswork.

Process Model - Default Form: Waterfall
It is striking, looking at the process model flows, how waterfall-like they are. In 29119-2, 7.2.1, the test planning process is shown. It is also visible here, from ref [6]. There is a note saying that it may be done iteratively. I think leaving the default model as a form that Royce, in 1970, said “is risky and invites failure”, ref [9], is poor work. It seems that real experience “that can be applied anywhere” is not the basis of this flow. 

The last part of “Gain consensus of test plan” (which is labelled TP8 in 29119-2) is that approval is gained from the stakeholders. The last part of “Publish Test Plan” (labelled TP9 in 29119-2) is to communicate the availability of the test plan. Wait, the test plan your stakeholders have approved in the previous step - you’re now going to tell them that it’s been approved? Do you think this might be annoying, appear wasteful or potentially just incompetent? I think there’s a risk for that - if you follow it by rote.

Test status and reporting is only described within the context of a completion process. Communication is framed as a handover (or even an asset) rather than something that starts early and is continuous. This seems to be framed as a contractual type of reporting. The lack of integration with a software development process model is striking - there is no linkage to software development other than “design then test” - so this seems to be aimed at non-agile shops, without incremental or iterative development. So who is it aimed at? Testing and test results can and should affect decisions within software development (including requirement refinement and discovery), but this seems to be absent in 29119 - possibly, I hypothesise, because it is a pure waterfall, test-separate-from-development model.

The process model is a minefield of contradictions and potential question marks (for me) - which is one reason, I think, that it hasn’t been tested or used in real life. Note: I have read that a draft of 29119 has been used in a study - this will be the subject of a future post. 

Process Model Conclusion 
The problem for me is that definitions are stated and process models are stated. But the validity of the process models producing good (meaning to me: efficient, effective and reasonable) software testing is not stated, linked, demonstrated or explained.

I’d like to see - and expect to be demonstrated - in a model that is the basis of a standard, a linkage between theory (what the output of such a model is intended to be) and the process model, and between the study of practice (evidence to support the model that is intended to be standardised) and the process model. Where is the linkage, or evidence of this linkage? I can’t find it in 29119. Therefore, I cannot see any basis for this process model, or any internal or external model validity. Conclusion: speculation, gamble.

The standard appears to lay the groundwork for a paper trail between test requirement/inception and final test report. However, the rigour that 29119 might want to put into this paper trail is strikingly absent in its own “trail” (or line of reasoning) between “test need”, “test output” and “test effectiveness and efficiency” - i.e. it omits the evidence for its claim that this is a model that facilitates “effective and efficient” testing, or is even a useful model. How did that happen? It seems to be a “don’t do as we do, do as we say” approach - and I can’t decide if it’s sloppiness, lack of rigour, or some oversight on the part of the WG or ISO, or both. I do wonder if the result might have been different if ISO had recruited (with interviews) people to draft a standard rather than leaving it to volunteers.

Perhaps the standard - and any accompanying audit according to the standard - is meant to facilitate demonstration by the tester (or organisation) that they have formed a traceable approach to their work and can connect the meaning of the work and its results to a stakeholder’s needs. This is a potentially valid and useful aim. But the standard in itself will not necessarily facilitate this - an organisation could produce masses of documentation according to the standard and still have work that isn’t effective, efficient or even repeatable. I will dig more into this question in part 3 of this post series.

An organisation can claim full or tailored conformance to 29119.

In 29119-2, section 2, the processes (meaning - I think - activities) involved are described as the basis for conformance (or not). Given the concerns above (with model validity), this could lead to a number of scenarios when an organisation is asked to state its conformance to 29119.

1. Full conformance is claimed (but there is no understanding of the implications or the organisation hasn’t spotted the problems in the model).
2. Tailored conformance is claimed (but there is no understanding of the implications or the organisation hasn’t spotted the problems in the model).
3. Tailored conformance is claimed (and no part of the process model is followed). The implication here is that the organisation understands (and can demonstrate) what they are doing and maybe even sees 29119 as an obstacle to efficient and effective testing.
4. Non-conformance is claimed, and the organisation understands (and can demonstrate) what they are doing and maybe even sees 29119 as an obstacle to efficient and effective testing.
5. Some combination of the above.

So, a statement of conformance potentially doesn’t say much.

Language and Claims
I don’t know whether the editing, the writing style or the understanding of the topics is at fault, but I know that poorly formulated or illogical statements don’t help in understanding 29119. This is meant to be a standard, developed and reviewed over quite some time - not a soundbite or a copy/paste/hope-it-makes-sense effort. 

For instance, the notes for the test strategy (29119-2) state that an estimate of “elapsed time” should be made. This is like saying, “estimate the actual time it takes rather than the time you think it might take…”

- Context, Understanding
In the part about understanding context (29119-2), it’s not actually stated what is meant by “context” - so a statement such as “understanding of context and software testing requirements is obtained” is simplistic, misleading and poorly written. On “understanding … is obtained” - how should that be measured? The rest of the statement is “to support the test plan”. So the implication (as I read it, since there is no other guidance on interpretation) is that once there is a scope in a test plan, “understanding is obtained”. As someone who talks to plenty of non-testers, I know you cannot point to a document and claim that understanding has been “obtained”.

Ah, but wait, point (b) in the clause might help. You should identify and “interact” with “relevant stakeholders”. Oh, not any old stakeholder (which would be an oxymoron, I think) but relevant stakeholders. So it appears the standard is using a clumsy, convoluted way of saying, “talk to the right people”. Simplistic and obtuse.

Section 29119-1, 5.2, does give some words on project and organisation context. However, this is framed in terms of the model of 29119-2. That model has a number of validity issues (see above) and is, in essence, a circular argument. If you adopt the process model (from 29119-2) and then form an understanding of the organisational and project context, then the process model (in 29119-2) can be followed. But the context understanding is framed by the process model, and the context understanding should be formed somewhat separately (independently) of the process model. Or, starting from 29119-1 (5.2): to understand the project context you need to adopt the process model frame from 29119-2 - the same circular dependency. Circular reasoning -> a faulty basis for usage, and potentially misleading.

- Stakeholders
References to “stakeholders” or “the stakeholders” occur quite often in parts 1 & 2 without any suggestion of who they might be. I think putting a lot of time into describing a model of a process without describing the actors in the model is a serious flaw. It means the model is so generic that it can be applied (or maybe really overlaid) anywhere. It also suggests the validity of the model, or its actual usage, was never considered. Unclear, and with potential for misunderstanding.

- Engineering Heuristics
In 29119-1, clause 5.1.3, testing is described as a heuristic approach, although it is described in a muddled fashion. It correctly states that heuristics are fallible, but then says that knowing this allows multiple test strategies to be used to reduce the risk of using an ineffective test strategy.

  • Why would you knowingly use an ineffective test strategy (I can think of a reason, but the impression is that you shouldn’t use one, so the implication is that an ineffective test strategy could be unwittingly used…)? 
  • Or does this mean that multiple test strategies should always be used because you don’t know which are effective or ineffective? The implication is that we don’t know the consequence, limitations and benefits of any test strategy - and that I’m not sure I can agree with.
  • Of course, the standard gives no guidance on what is effective or not. So the purpose of the standard is what exactly? 

It seems to be advocating: do multiple things because you might not know what you’re doing. I take the view that if you don’t know what you’re doing then you need help, training, practice and even coaching - not a blunderbuss approach that may do more harm than good. Clumsy and reckless.

- Exploratory Testing Definition
29119-2 (clause 4.9) describes this type of testing as spontaneous and unscripted. It seems the authors couldn’t imagine a case where investigation, deliberation, preparation and even scripting might be needed to perform testing; a case where the results of one test might dictate the next steps. This, in itself, is not unusual, as I think many people equate “exploratory testing” with little preparation rather than with using results to guide future tests. I have prepared and guided this type of approach in the past. Therefore the “definition” in the standard is erroneous and unhelpful - to me as a tester, test strategist or test communicator.
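As a concrete illustration of result-guided (rather than purely spontaneous) testing, here is a minimal sketch in which the outcome of each test dictates the next input to try. The system under test and its hidden limit are entirely hypothetical:

```python
# Sketch of result-guided testing: each observation chooses the next probe,
# narrowing in on the boundary where the system's behaviour changes.
# The "system under test" (which secretly rejects inputs above 4096)
# is a hypothetical stand-in for a real interface.

def system_under_test(size):
    # Pretend call to the real system: accepts sizes up to a hidden limit.
    return size <= 4096

def find_rejection_boundary(low=0, high=1_000_000):
    """Deliberate, prepared exploration: a binary search in which the
    result of one test dictates the next step - the opposite of
    'spontaneous and unscripted'."""
    while low + 1 < high:
        probe = (low + high) // 2
        if system_under_test(probe):
            low = probe    # accepted: the boundary lies higher
        else:
            high = probe   # rejected: the boundary lies lower
    return low             # largest input observed to be accepted

print(find_rejection_boundary())  # → 4096
```

The point is not the algorithm but the structure: preparation (choosing the search space), deliberation (interpreting each result) and a next step dictated by the previous result - none of which fits “spontaneous and unscripted”.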

- Standard, Dictionary or Constraint?
By attempting to define (describe) all types of testing (I won’t say approaches, as I don’t think that was considered), the standard limits my potential testing. If I discover, develop or talk about a new technique or way of testing (or a variant of one), then I am (by default) non-compliant with the standard. So the standard is not a guide to good, appropriate or effective ways to test. Erroneous.

In the draft of part 4, 29119-4, the conformance section states that techniques used that are not described in 29119-4 should be described. This seems to be a way of not having to capture everything in the standard - and a way of avoiding debatable definitions and muddled language.

It seems to me that the standard is really attempting to be an encyclopaedia of testing - but when I see muddled language, claims without support, circular reasoning and what appears to be untested models it makes faith (and trust) in definitions a little difficult. An encyclopaedia is not the same as a standard - so I think the intention behind the standard (and the result) is muddled.

- Informative?
29119-1 is informative. This means (according to 29119-1, clause 2) that conformance (agreement or compliance) is not needed. This means it is optional, or open to interpretation. One consequence is that having this as an input to part 2 - a generic model - means that the model is open to interpretation. Another consequence is that it’s like a poor dictionary - at best a guide to current usage, at worst misleading. Superficial.

- Process?
I think there’s probably some confusion about what “process” means. It’s what happens (according to Merriam-Webster’s dictionary and the OED). How does that fit into a process model? A model of things that happen? OK, and this is a generic model. To produce what? That’s not defined. So it’s trying to describe a generic set of actions that happen in software testing, without any description of an output. Why would you do that? I can hypothesise that (i) you might perform good testing already (which your organisation might regard as efficient and effective) - then 29119 is of no use and may even be detrimental; or (ii) you’re a start-up, a new organisation, or an organisation without any knowledge of how to approach testing - then 29119 might be used to produce some documentation and activities, but as there is no construct validity in 29119 that may also be a totally inappropriate approach. So what use are generic models with generic activities? Misleading and lacking evidence of validity.

- Repeatable?
There is a claim that businesses have an interest in developing and using repeatable, effective and efficient processes (29119-1, 5.2). It seems natural to want your activities (processes) to be effective and efficient. But repeatable - in which scenarios is repeatability desirable for the way testing is carried out? Does this mean repeatable test cases in scenarios where test scripts are re-executed as some safety net (e.g. in a continuous integration loop)? Fine. But test analysis, planning, design and reporting - should these be done in a repeatable way? The case isn’t made. Rhetoric.
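For contrast, this is the kind of artefact where repeatability clearly is desirable - an automated check re-executed on every commit as a safety net. The function under test and its expected values are invented for illustration:

```python
# A deterministic regression check - the kind of repeatable artefact a CI
# loop re-executes on every commit. The function under test (a toy price
# parser) and its expected values are hypothetical.

def parse_price(text):
    """Toy function under test: parse a price string into whole cents."""
    whole, _, frac = text.strip().lstrip("$").partition(".")
    return int(whole) * 100 + int(frac.ljust(2, "0")[:2])

def test_parse_price():
    # Fixed inputs, fixed expected outputs: every run is identical,
    # which is exactly what a re-executed safety net needs.
    assert parse_price("$3.50") == 350
    assert parse_price("12") == 1200

test_parse_price()
print("regression check passed")  # prints only if all assertions hold
```

Whether the analysis, planning and reporting around such checks should themselves be “repeatable” is the question the standard leaves unanswered.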

- Scripted?
In 29119-1, 5.6.6, the advantages and disadvantages of scripted and unscripted testing are described. It is claimed that scripting (in this case, a tester following a script) makes the testing more repeatable. Here, “repeatable” is debatable to me - I’ve seen people following a script who don’t do what the script says: an extra step inserted, an extra observation made, or a step hopped over, so the script isn’t strictly followed. So, I think these advantages and disadvantages haven’t been challenged or compared with real experiences. Erroneous.

- Test policies and strategies
In 29119-1, 5.2, it is claimed that where formal organisational test policies and strategies are absent, the testing is “typically” less effective and efficient. This is an interesting claim - and could be read by some as meaning such documents must be in place. There are three problems with this claim:
1. The connection (or correlation) between cases where such “documents” are in place and the organisation/company performing effectively and efficiently is not demonstrated. (I.e. there is no study of “effective and efficient” testing or “effective and efficient” test organisations, and of what might affect their output.)
2. There is no study or evidence to show that the presence of such a formal item (it is probably present even where not formally documented - as part of the test culture) directs the testing in an organisation - cases where it is present and has no effect, or where it is absent and the testing is deemed “efficient and effective” anyway. 
3. Where such a formal item is in place, it is not clear whether it comes afterwards - i.e. is not the cause of “effective and efficient” testing, but a byproduct (related to #1).
No evidence, rhetoric.

Examination of Claims and Language 
Some might think I’m being “picky” when I read 29119 - or being nit-picky. Actually, this is a tester reading unsubstantiated claims - or rather, noticing claims that are unsubstantiated. That’s not being picky; it’s calling out shoddy work, highlighting issues and raising a warning flag. It’s a basic understanding of cause and effect - and being able to distinguish between them. Why is this important? Because if one reads 29119 and is not able to understand a claim - what it implies and what it doesn’t say - then there’s a strong risk that it is followed by rote. And as the output of the model is not described, following it without understanding its pitfalls is potentially harmful.

A generic model?
It is not stated, but I suspect 29119 is trying to be an encyclopaedia of testing, presenting a menu from which to select activities, and perhaps a way to structure them. However, the output is not defined - the aim of the testing - nor is there an assessment of what may help or hinder those results. The process models have no demonstrated validity - meaning it is open to interpretation what they will produce and, more seriously, what readers of 29119 think they might produce, how they might be applied, and how to observe whether the models are relevant or contribute anything to an organisation’s testing effectiveness or efficiency. Therefore, the generic nature of the process models, combined with the idea of conformance, is really dangerous. One might conclude that 29119 (in its current form) is a clear and present danger to an organisation’s testing if a question of conformance is raised.

In this post I’ve mainly focussed on parts related to 29119-1 & 29119-2. 

The lack of output description and consideration (of the process model) is serious.
The lack of process model validity - any evidence of validity or applicability (or not) of generic models of activities - is serious. It verges on being dangerous and misleading.
The muddled language is serious.
The lack of apparent rigour in what it is trying to describe - whether the model, the definitions or the process model’s applicability - is serious.
The notion of conformance is, on my reading of 29119, not possible - partly because part 1 is defined as informative, partly because of the lack of rigour in the models, partly because of the apparent waterfall approach to modelling, and partly because of the muddled language. This means 29119 cannot be used or applied in a repeatable, efficient or effective way - and that it would be misleading to claim conformance to it. Claiming conformance is a potential warning sign.
The amount of rhetoric and unsupported claims is potentially confusing to the reader.

I get the impression that the result (29119) is due to a number of factors:

1. It’s a result of the underlying reasoning and motivation for 29119 not being clear, ref [8].
2. It’s a result of the working group’s interpretation of said motivations and reasoning - and maybe of not being able to clarify or communicate that.
3. It’s a result of some underlying assumptions (or beliefs) that the working group haven’t declared.
4. It’s possible that the underlying beliefs were not visible in the working group, or had different interpretations (because they were not visible).
5. It’s a result of ignorance of processes, of how to observe processes, and of how to make hypotheses based on observation.
6. It’s a result of ignorance of model validity, experimental design, and their limitations.

The result? An Example
When I read 29119, I get the impression that Royce’s paper, ref [9], was read to page 2 - and crucially not to page 3 or the conclusion - because 29119 seems to model what later became known as waterfall, and ignores any iterative corrections.

Royce warned of the dangers with the model - the type of model displayed in 29119 - over forty years ago. Why is there a belief that the type of model in 29119 “works”? My guess would be that it’s a result of poor observation amongst the other reasons above.

For example

1. Project X (in company Y, under circumstances Z) “worked”, and we followed Royce’s page-2 model (waterfall), ref [9]
2. Project A (in company B, under circumstances C) “worked” and we produced lots of documentation

Someone could conclude that it was the documentation that produced the good result in A, and others that it was the waterfall model that produced the good result in X. Someone else could conclude that combining “documentation” and “waterfall” would produce a good result in a new project. Of course, this “hypothesis” is dangerous and reckless. It’s not even certain (or there may be a lot of doubt) that the reasoning for X & A “working” was correct, and it’s very probable that we can’t, couldn’t or didn’t observe the most important factors that contributed to their results (meaning our understanding of Y & B, or Z & C, is not what we think it is). Projects X & A might’ve worked in spite of “waterfall” and “documentation”.

This is why generalisation - without close study of the underlying factors of “processes”, and without establishing the validity of a model - is dangerous. Not being able to connect an observation to a symptom and a cause is dangerous. Therefore, I think the reasoning (or absence of reasoning), the support and the conclusions in 29119 are dangerous. There is no evidence that this model “works”, or will produce anything that a customer can use.

I suspect the intention of 29119 was based on experiences that have worked for the contributors, but the case for internal and external model validity is not made. So, advocating these models beyond the people who have used them is not supported.

Locked-in Planning and Feedback?
Fred Brooks, in chapter 11 of ref [7], wrote in 1975 about the need to plan to throw one away - that’s about understanding the limitations of planning and process models, and not locking into a waterfall model. Cosgrove (cited in Brooks, chapter 11, ref [7]) claimed in 1971 that programmers (and organisations) deliver satisfaction of a user need, and that this perception of need changes as software is developed and tested. There is no clear consideration of user need, satisfaction or change in 29119 - i.e. there is little connection to software development in 29119, especially to agile, incremental or iterative ways of working.

I get the impression it’s a contractual model - if you structure your testing along these lines and your stakeholders sign off on it, then you’re “protected” from blame. The model is not directed at producing something a customer will be happy with, but rather something a customer potentially signs off on before any prototype is produced.

It seems to me that there is no result-orientation - it’s about standardising locked-in paper trails early rather than working software. There is no guidance on how to square that circle with agile approaches. In fact, if you work in an agile shop and someone starts talking about the process model from 29119 and how to adopt it, I’d be worried - that’s a “bad smell”.

Practice, and Consolidation in a Standard?
There is no evidence of study or observation of test processes in practice, or support for claims that “X” is the factor that makes this process effective and efficient. That would require a social-science type of study of people working (process), of organisational structures, of project set-up and of outcomes. And it seems to be missing even at the smallest scale - which would be a case study to support a claim about a specific process.

I get the impression that the working group didn’t make (or search for) any study of the factors that contribute to “effective and efficient” testing or testing results. To use the terminology of 29119-1, this is “error guessing”. However, as there is no assessment of “errors” (problems and pitfalls to avoid), I think of it as just plain guessing. Rhetoric, guesswork and superstition.

I can’t work out why no one thought of this in the six years 29119 was under development - because if they had, I might not be having these questions now.

And finally…
I can borrow from “Yes, Minister”, ref [5], when I think about the connection between a process model and reality as conveyed in 29119, and about the style of clarity I get from 29119: 
“the precise correlation between the information … communicated and the facts, insofar as they can be determined and demonstrated, is such as to cause epistemological problems, of sufficient magnitude as to lay upon the logical and semantic resources of the English language a heavier burden than they can reasonably be expected to bear.” 
I.e. no evidence of the validity of the claims made. So yes, the overwhelming impressions I take from 29119 are of unsubstantiated claims and rhetoric.

[1] 29119-1: ISO/IEC/IEEE 29119-1:2013 Software and systems engineering -- Software testing -- Part 1: Concepts and definitions
[2] 29119-2: ISO/IEC/IEEE 29119-2:2013 Software and systems engineering -- Software testing -- Part 2: Test processes
[3] 29119-3: ISO/IEC/IEEE 29119-3:2013 Software and systems engineering -- Software testing -- Part 3: Test documentation
[4] 29119-4: ISO/IEC/IEEE DIS 29119-4.2 Software and systems engineering -- Software testing -- Part 4: Test techniques
[7] The Mythical Man-Month: Essays on Software Engineering [F.P.Brooks, 1975]
[8] The Tester’s Headache: ISO 29119 Questions: Part 1
[9] Managing the Development of Large Software Systems [Winston Royce, 1970]

Sunday, 31 August 2014

ISO 29119 Questions: Part 1

Reason & Motivations?

There is an ISO software testing standard (ISO/IEC/IEEE 29119). Currently parts 1-3 are published, and parts 4-5 are in draft. I have read parts 1-3 and the draft of part 4, and have been forming several questions. Some questions might be regarded as problems, some as issues and others as a need for clarification.

I will use a series of 3 posts to document them. 

Part 1: Looks at the reasoning and motivation for the standard - based on the information publicly available.
Part 2: Looks at some of the content.
Part 3: Looks at the output and usage of the standard and some implications.

Note: in this post I’ll only refer to publicly available versions and parts of the standard - the part I am considering here is the introduction, which is viewable in the preview on the IEC webstore; see references.

Whenever someone comes to me with a new idea, I usually hear about all the reasons why it’s a great idea. What usually isn’t made clear is why the change is needed - or what problem is being solved.

And so I had the same question in my head with 29119. I wanted to know the reasons and motivations behind it - especially ones that might support its usage or adoption.

According to 6.1.4 of the ISO Guide for Standards, ref [3], the introduction should state:
The introduction is a conditional preliminary element used, if required, to give specific information or commentary about the technical content of the document, and about the reasons prompting its preparation.
So, there should be reasons in the introduction of 29119. Looking at the introduction, there are three potential reasons given.

Reason 1?
The purpose of the ISO/IEC/IEEE 29119 series of software testing standards is to define an internationally-agreed to set of standards for software testing that can be used by any organization when performing any form of software testing [see refs 1 & 3]
Looking more closely at that:
  • “internationally”: ISO implies this already - so this part is redundant
  • “agreed”: the drafting of a standard requires the consensus/agreement of 75% of the drafting members, so this is also redundant once the standard is published.

Note: One could ask questions about the representativeness of those drafting 29119, but that’s not a question I’m looking at here.
So “internationally-agreed” is redundant.
  • “set of standards for software testing” -> seems to be a duplication of “software testing”, redundant.

  • “can be used”: possible - this probably relates to how the standard can be tailored (more about this in part 2 of this post). 
  • “by any organisation”: quite a bold claim, but linked with the previous point on tailoring this could be read as any company can tailor the standard.
So, “can be used by any organisation” could be generously interpreted as “can be tailored”. But this is still problematic. 

Standards application and conformance are usually full compliance, partial (or tailored) compliance or non-compliance. Therefore, the fact that you can declare conformance or not goes with the territory of a standard. Therefore, this is redundant information - in terms of reasoning for a standard. 

An alternative reading could be that the standard is “useable” by any organisation. Again, this is redundant information - either the standard is adopted and conformance declared against it, or not. And why would effort be placed on a non-useable standard? Therefore, this is redundant information.
  • “when performing any type of software testing”: another bold claim. But part 1 of the standard “defines” the types of testing it is talking about. The “performing” part is also implied - why else would you use the standard if not to perform software testing? So these parts are also redundant: by implication, if you’re following the standard you’re following the definitions of the standard. (More about the definitions in part 2 of this post.)
So, striking out the redundant parts and correcting the grammar, the paragraph becomes:
The purpose of the ISO/IEC/IEEE 29119 series of software testing standards is to define [a] … set of standards
Reason 1: Verdict?

So, when you strip out the redundant parts of the statement, the purpose of the standard is to create a standard. That, in logic, is called a tautology -> redundant information. That’s not much of a purpose - it still does not tell me why it was produced.

Reason 2?
This series of international standards can support testing in many different contexts.[see ref 1]
  • “can” - I think this claim is a moot point - I suspect the reason is that, “if you claim conformance then it’s supporting testing” - so it’s not actually a reason /for/ the standard. It’s like saying, “if you follow the standard then you are standard-compliant”.

  • “support testing” - well, as part 1 is attempting to define testing and part 2 is attempting to define a process, then in a way it could be claimed to support testing. On the other hand, it could be read that if you perform testing in accordance with parts 1-3 then the standard is supporting your testing - but actually this is wrong, because the standard has defined what “your testing” is and potentially “what is excluded” - in this sense it’s not supporting at all. 

Reason 2: Verdict

My reading of this sentence is, “if you follow the standard then the standard will support your testing”. Unfortunately, this is a circular argument, ref [4], i.e. not really a supporting reason for the standard at all.

Reason 3?
Together, this series of international standards aims to provide stakeholders with the ability to manage and perform software testing in any organization.
  • “stakeholders” - it is ambiguous who is meant here. I’ll be generous and assume the user of the standard is meant.

  • “ability to manage … software testing in any organisation”
  • “ability to … perform software testing in any organisation”
Interesting. This could be interpreted as anyone - absolutely anyone - by following this standard, can perform (and/or manage) software testing in any organisation. In fact, I’m not sure how to interpret it in a different way - there is no guidance (clarification) in the text.

Ok - I have seen (quite a few) people in my time that (i) can’t manage software testing and (ii) are pretty mediocre at software testing. In my judgement I have a hard time understanding how this standard would give the ability for “anyone” to become “testing managers” or “good testers” - except by creating a lowest common denominator. 

I can almost hear the explanation, “Find the least able person and we’ll calibrate to that person.” Really?

I have met and worked with - and I’m sure others will have similar experience - people (managers and testers) in various companies that are “going through the motions” - they tend to work /within/ a box (set of definitions or practices) and are not the ones who think or challenge assumptions (i.e. the ones who “think on their feet”). These are the people that spend more effort defining boundaries than communicating - they are plodders. If the aim of the standard is to produce a set of test managers or testers that either (a) think inside the box, (b) don’t think about or question what they’re doing, (c) plod along; then it is not something that can be associated with good (or efficient) work.

A Possible Reason 3 Verdict

The statement is telling - and probably says more than it really should.

Come on! If the aim of 29119 is to set the bar so low that anyone can look as though they are performing good work - BECAUSE they appear to be following 29119 - then, Mr Regulator, Mr Customer, Mr Potential-Stakeholder, I repeat: that is a “bad smell” - it’s a sign the company claiming conformance hasn’t got their eye on the ball! In a sense, you’d want to be extremely careful of anyone claiming conformance…

Introduction verdict?

Judged against 6.1.4, ref [2], I didn’t find any reason given for the standard.

The skills I used to analyse the text were reasoning, logical and critical thinking - basic skills in any good tester’s toolkit - they come into play (in my experience) when discussing requirements, understanding needs from stakeholders and reacting to and understanding conflicting needs in a project situation. Or even trying to understand standards. Actually, it’s difficult to do well and takes practice and concentration.

I had the impression that a number of testing experts were involved in the drafting and review of the standard. I think the resultant communication in the introduction has room for improvement - and doesn’t bode well for the rest of the documents. I’ll leave it to the reader to judge if the people involved in drafting the introduction did a good job of explaining the reasons and motivation for 29119.

Other Sources

Having found no reasoning for the standard in the standard itself, let’s try some other sources.

Web #1
If I look at the “softwaretestingstandards” web page, ref [5], I see this:
By implementing these standards, you will be adopting the only internationally-recognised and agreed standards for software testing, which will provide your organisation with a high-quality approach to testing that can be communicated throughout the world.
Ok, the part about “adopting the only internationally-recognised and agreed standards for software testing” is covered above - it’s the same logic - and reduces to a redundant phrase (says nothing in terms of motivation).

The next part is interesting though, “which will provide your organisation with a high-quality approach to testing that can be communicated throughout the world”.

I’m interested that “high-quality approach to testing” is used - this is not stated in 29119, either as a motivation for using the standard or as a result of using the standard. As far as I can see there is no evidence (whether as case-study, or something else) to support this claim. Therefore, it is rhetoric - an attempt to influence without any evidence (proof).

Web #2

A presentation given at the BCS, page 8, ref [6], states the motivation for 29119 as:

• Demand for existing 'standards’
• Conflicts in current definitions and processes
• Gaps in the current standards provision
• A Baseline for the Testing Discipline
• Current industry practice is lacking
• Buyers unclear on what is 'good test practice'

“Demand for existing 'standards’” - it’s not stated (or referenced) in the presentation where or what this demand is, therefore it’s an unsupported claim. Rhetoric.

“Conflicts in current definitions and processes” - 29119 replaces some existing standards, so there is scope to believe that this replacement is reducing conflict between definitions and processes. However, the case for why this needs to happen is not made - it’s not demonstrated (in this presentation or elsewhere) what this conflict looks like or what problems it causes. Therefore, this seems to be a pretty weak argument. In terms of support in the presentation this claim is unsupported. Therefore, in the scope of the presentation, it is rhetoric.

“Gaps in the current standards provision” - it’s not stated (or referenced) in the presentation where or what these gaps are, or if or why they are relevant as motivation for a new standard. Rhetoric.

“A Baseline for the Testing Discipline” - it’s not stated (or referenced) in the presentation where or what this need is, therefore it’s an unsupported claim. Rhetoric.

“Current industry practice is lacking” - I could probably agree with this (but maybe for completely different reasons!) Without more information it’s not clear what this means - and it isn’t stated (or referenced) in the presentation. Therefore, the claim is unclear and appears to lack any supporting evidence. Rhetoric.

“Buyers unclear on what is 'good test practice'” - I like the idea of presenting and distinguishing good test practices. However, the argument about buyers is not supported in the presentation (or referenced) with evidence. Therefore it’s an unsupported claim. Rhetoric.

Original Proposal for the Standard & ISO Directive #1

According to Annex C of ISO Directive 1, ref [7], a proposal for a new standard must have a justification. Specifically:
C.3.2 The documentation justifying new work in ISO and IEC shall make a substantial case for the market relevance of the proposal.
C.3.3 The documentation justifying new work in ISO and IEC shall provide solid information as a foundation for informed ISO or IEC national body voting.
Now to documentation proposing the work for the new standard, ref [8], produced in 2007:

The market requirement section states:
The national standards on which the new international standard is to be based are widely used both as the basis for commercial contracts and international qualification schemes. However, their national origin often results in them being ignored by potential users in some countries. At present there are a number of gaps (and overlaps) in the coverage they provide. A coherent set of international standards for the complete life cycle is required.
The first two sentences state that some national standards are used, but not widely. The third sentence states there are gaps/overlaps between existing standards. 

The fourth sentence is interesting: “A coherent set of international standards for the complete life cycle is required.” But where is the case made according to C.3.2 & C.3.3? Against point C.3.3 there is no evidence to support this point.

Ok, no help there.

In the “purpose and justification” section it does state:
The purpose is to produce an integrated set of international standards to cover the software testing process throughout the development and maintenance of a software product or system.
In overall terms, the purpose of the project is to unify and integrate the currently fragmented corpus of normative literature regarding testing that is currently offered by three distinct standards-makers: BSI, IEEE, and ISO/IEC JTC 1/SC 7. The result of the project will be a consistent, unified treatment adopted by all three organizations.
These two purposes seem complementary, but there are problems. 

The first says that “the software testing process” will be covered - but this seems undefined. Part 1 of 29119 outlines concepts and definitions, doesn’t it? Well, there’s a problem there - which I’ll cover in part 2 of this post. To summarise here: no, parts 1-3 do not define “the software testing process” - part 1 is informative, meaning it has examples and not definitions - there is nothing to stop someone else creating their own definition. Part 2 has problems with conformance, meaning that it can’t (and doesn’t) define THE software testing process. More details in part 2 of this post.

So this purpose might be real, but it’s also unrealistic. Unjustified.

The second purpose - to integrate information from different sources (BSI, IEEE & ISO) might seem reasonable - but this is more of a paper exercise than anything to do with standardising. There is no justification why this should be done or what problem is caused to the users of those individual sources. So, this is an unjustified purpose.


I wanted to understand the motivations and justification behind 29119. 

I looked in the introduction of 29119-1, where it is supposed to be stated according to ISO’s directives. I didn’t find anything that didn’t reduce to tautology and rhetoric. Now, I don’t know personally any of the members of the workgroup that produced 29119. But they are supposed to be experts in their field, and yet they produced an introduction to describe the motivation of 29119 that does not stand up to scrutiny. 

Did they all have an off-day - that lasted 6 years? Groupthink? Or what was it?

I looked elsewhere - presentations given to the BCS and the softwaretestingstandards site. There were a lot of claims without supporting evidence or references. This is also known as rhetoric.

Then I searched in the original proposal for the workgroup that would produce the standard. There, I found claims without evidence, some were unrealistic and some were almost a wish list.

After reading this I wondered about the people that reviewed the proposal. I’m sure there’s a bunch of intelligent people that looked at this, and yet they seem to have missed a lot. Was it some form of groupthink? I don’t know.

Well, I've been looking for reasoning and motivations for 29119 that will stand up to scrutiny - so far without success. In the next part I'll look at some of the content of 29119 in more detail.

[2] ISO/IEC Directives, Part 2
[4] Wikipedia: Circular Reasoning
[5] Webpage: ISO/IEC/IEEE 29119 Software Testing Standard
[6] BCS - British Computer Society presentation, page 8
[8] ISO/IEC JTC1/SC7 3701

Thursday, 29 May 2014

Standards, Straw Man Arguments and Superstition

I listened to a webinar in November 2013, ref [1], about the testing standard, "ISO 29119".

At the time I made some notes on the recording, but there was one item that particularly struck me as slightly intimidatory. However I happily assigned it to my mental trash bin and just remembered the meta-labels, "Straw man, scare tactic".

I had forgotten about the recording... until today when I saw this on twitter from James Christie:

@michaelbolton What happens when the press asks your CEO, "did you follow standards?" when there's been a fiasco?
2014-05-28 15:33
Note: These were not James' words.
Gut feeling: Straw man argument!


1. Why would the press be asking about a software testing standard when there is no evidence of the testing standard “working” or preventing similar “fiascos”? It seems equally plausible for the press to ask if the omens in the CEO's tea leaves were followed!

2. If you have a CEO that's worth his salt1, and there has been a fiasco (presumably a software catastrophe), then I'd be very surprised if the thing that's highest on his mind is some internal witch-hunt.

He's probably gonna:
1. Put people on fixing the issue
2. Put people/teams on liaising with the customer (if it's one big and, understandably, angry one)
3. Put people on understanding what went wrong
4. Understand if the company did what they thought was right / good enough

I'd be surprised if a company that used technically competent people would find the root cause to be that a template (or group of templates / standard) was not followed - especially when there is no empirical study (evidence) to demonstrate how the standard (template) might avert such a fiasco.3
If a catastrophic failure has been found to be caused2 by not following a template then I would love to hear about it.
I've been in those discussions with responsible managers and execs - and (maybe this is my good fortune of dealing with competent managers)4 they're more interested in that the technical influential/responsible people do what is right / good enough for the needs of the business, rather than be concerned about being fully or partially compliant to a particular template.5

Yes, I'm not naive, I know some people, projects and company cultures follow some template as an ass-covering exercise. I've been there.1

So... The straw man argument is constructing a need to explain whether or not a particular template is used, when the need for that template is not established.

Reasoned argument?

Ok, so it was time for me to revisit the webinar, where at timestamp ~33-34mins, one hears:
"If you are still doubtful about using these standards, then I have a question for you.  
Can you afford not to? 
Imagine you're responsible for the testing of an important application. It could even be business critical, or safety critical, and something goes noticeably wrong with the application in use.  
Even if really good testing could’ve missed the bug, how easy will you find it to explain to the business your testing missed the bug and no, your test processes do not comply with international testing standards; that they are the ones we've used for years and no, you don't have them fully documented and justified.  
So, can you afford not to use them?"

First impression: You should follow a template.... just in case.

Second impression: Slightly sinister tone. 

Ok, let's take a closer look. Closer inspection:

1. The first two sentences are plain rhetoric, trying to establish doubt in the listener - doubt due to not following a standard. This is "rhetoric" because there's no argument/justification about why a standard would help you avoid this situation.

2. "if really good testing could’ve missed the bug" - good testing can miss bugs. This is not necessarily anything to do with a process. Good processes can miss bugs.

3. "how easy will you find it to explain to the business your testing missed the bug and no, your test processes do not comply with international testing standards" - this statement is intending to sow doubt. But the international testing standard has no evidence of being useful, appropriate or a proven track record.

I could go on and pick it apart, displaying the parts which are negative evaluations, argument precursors etc, but you get the picture. It says very little more than an innuendo of repercussions if you don't use the standard.

In Summary

  • If someone tries to scare you, or generate doubt and worry, ask to see the evidence, the track record, the money. Look out for straw man arguments, misleading and off-the-point comments and allusions to superstition or unjustified belief.
  • Successful CEOs don't usually look for scapegoats in their staff, and don't usually follow checklists without a good business reason. 
  • For me the jury is still partially out on the new software testing standard - I will publish some thoughts on its content at a later stage. But, just because it's been labelled a standard does not make it comparable to other ISO or IEEE standards, where interoperability or conformance to a standardised result is usually the goal - here the goal seemed to be to create a standard (job done! Is that the business case? WTF!)

1. If you have a CEO (or responsible manager) that doesn't trust the technical competence of the organisation then either (1) change the organisation, or (2) change organisation. And Yes! I have done both in the past!

2. Note, if a template is intended to replace competence then this should be declared also. (This puts an additional burden on the template, guidelines and usage of the template.)

3. The existence of a standard, template or checklist does not prevent mistakes being made in following it.

4. Competent managers (in my experience) usually do not chase paper trails to a template, and usually trust their staff. I'll give some tips on how to deal with managers that are tempted by reliance on checklists over competent staff in a separate post.

5. Businesses tend to adopt (or use) the standards that make business sense, rather than adopting a standard purely because there is a standard.

[1] Eurostar webinar: Webinar 75: ISO 29119 - The New Set of International Standards on Software Testing

Monday, 21 April 2014

On Test Results and Decisions about Test Results

A Test Result != Result of a Test
An Observation != Result of an Observation

It’s not the test result that matters, but the decision about the test result!!

Pass - Fail - Doesn’t Start - Inconclusive - Can’t Execute

How you come to these results (and there are more) is interesting. But from a testing viewpoint what you do, what actions you take based on such results is very interesting.
“If a tree falls in a forest and no one hears it, does it make a noise?”
“If a test result is obtained and not used or considered, was it worth it?”
Did it confirm an expectation?
Did it contribute to a body of evidence that a feature, product or system is behaving within bounds?
Did it help understand risks with the system?

If the answer is no, then it might be time to take a good long hard look at what you’re doing…
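As a hypothetical sketch of this idea (the names, fields and categories are mine, not from 29119 or any other standard), the point that a verdict only earns its keep through what it feeds might look like:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Verdict(Enum):
    """The usual verdict taxonomy - there are more in practice."""
    PASS = auto()
    FAIL = auto()
    DOESNT_START = auto()
    INCONCLUSIVE = auto()
    CANT_EXECUTE = auto()

@dataclass
class Observation:
    verdict: Verdict
    confirmed_expectation: bool = False  # did it confirm an expectation?
    added_evidence: bool = False         # did it add to the body of evidence?
    informed_risk: bool = False          # did it help understand a risk?

def was_it_worth_it(obs: Observation) -> bool:
    """A result matters only if someone can use or act on it -
    the verdict alone is not the interesting part."""
    return obs.confirmed_expectation or obs.added_evidence or obs.informed_risk

# A failing result that feeds a risk discussion is valuable...
useful = Observation(Verdict.FAIL, informed_risk=True)
# ...while a passing result nobody considers is just the tree
# falling unheard in the forest.
ignored = Observation(Verdict.PASS)

assert was_it_worth_it(useful)
assert not was_it_worth_it(ignored)
```

The sketch is deliberately crude: the three boolean questions are the ones asked above, and a result answering "no" to all of them is the one that should make you take that good long hard look.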

Body of Evidence / Evidence
All test observations and test results are not equal. They contribute to the picture of a product in different ways. But that picture is not necessarily a paint-by-numbers book - it’s not something where you can tick all the boxes and think, “I’m finished.”
Note: In many cases, Testing is not painting by numbers! 
Unless you’re a doctor finding no pulse and rigor mortis has set in!

It’s about making sense of the observations.

The 1% Problem Soundbite
Suppose 1% of your tests fail. Suppose you’ve seen a problem that 1% of your customers will experience.

The 50% Problem Soundbite
Suppose 50% of your tests fail. Suppose you’ve seen a problem that 50% of your customers will experience.

Based on this information is it possible to say anything about the product?

No. But you might have something that says, “need more information”.

These are what I think of as context-free reports.

From those soundbites, you don’t know anything about the nature of the problems, product, market that the product might be used in, circumstances for usage, etc, etc.

Suppose the 1% problem is a corner case - not allowing installation in a geographical location if some other feature is active - that might affect how the product is launched in that market. Or suppose it’s something “cosmetic” that might annoy 1%, but not prevent the product being used. These two different types of observations (results) might lead to totally different decisions.

The 50% case - is this a break in legacy, some feature that customers are using that now needs to be operated differently; or is it a new feature interacting differently with existing features (e.g. flagged by some automated scripts) that hasn’t been launched yet and might be only for trial/“friendly” users? Again, these observations (essentially first-order feedback, after Weinberg, ref [3]) might lead to totally different decisions.
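To make the same point as a toy sketch (the decision rule, impact labels and outcomes are all illustrative inventions of mine, not from any standard or the post's model): the failure rate alone never decides anything; context does.

```python
def decide(failure_rate: float, impact: str, launched: bool) -> str:
    """Toy decision rule: the same rate yields different decisions
    once impact and feature maturity are known."""
    if impact == "cosmetic":
        return "log and ship"        # annoying, but not blocking usage
    if not launched:
        return "fix before launch"   # only trial/"friendly" users affected
    if impact == "blocks-usage":
        return "hold the release"
    return "need more information"   # a context-free report on its own

# 1% of tests fail, but the problem is cosmetic -> ship anyway
assert decide(0.01, "cosmetic", launched=True) == "log and ship"
# 50% fail, but the feature isn't launched yet -> fix quietly
assert decide(0.50, "blocks-usage", launched=False) == "fix before launch"
# 50% fail in a launched, blocking feature -> stop
assert decide(0.50, "blocks-usage", launched=True) == "hold the release"
```

Note how `failure_rate` is never consulted on its own: passed without the other arguments, the soundbites of the previous section only ever reach "need more information".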

Decisions and Supporting Data

1. Are you a tester that decides when testing is finished and a product can ship?
2. Are you a tester that tries to give your boss, project manager, stakeholder a good account of the testing done on a product? Might you even give some insight and feedback about what that testing might mean?

Ok, assuming you’re in the second category…

What does your test observation tell you?
What do your test observations tell you?

I’m reminded of a model I’ve helped use in the past about understanding test results, ref [1]. But now I’m looking at the flip-side, not what a “pass” means but what a “not-pass” might mean. A simple version of a test result / observation might look like:

Note, this is a typical interpretation when a test result is deemed to be of the form OK / NOK (including inconclusive, don’t know etc.). The implication of this is that when the desired test result (OK, pass) is obtained then that “test case” is done, i.e. it is static and is unchanged by environment or circumstance. This might be typical in conformance testing, where responses within a desired range are expected, usually when the test subject has a safety component (more on this in another post).

But, if the results are not black-and-white or can be open to interpretation (as in the 1% problem soundbite) then a different model might be useful.

This model emphasises the project, product and testing elements to consider - these may change over time (within the same project), between product versions and between testing cycles (or sprints). I’ve drawn the line between product owner and product decision as a thick black line to highlight that this is the deciding input.
More on testing and silent evidence can be found in ref [2].


[3] Quality Software Management: Vol 2 (Weinberg; 1993, Dorset House)