Sunday 30 December 2012

Mapping Information Value in Testing

This has been a year where I’ve been thinking a lot about how testing information is communicated, some of the associated traps and some models to help explore this.

For SWET4 (Swedish Workshop on Exploratory Testing #4) I had prepared to talk about modelling information flow in testing and how I had used it. I’d called the topic “Modeling: Information-driven Testing VS Activity and Artifact-driven Testing”.

The thrust of the proposed talk/discussion was about using models to illustrate traps associated with many projects - especially large projects, projects with large product bases or legacy systems. It comprised two illustrative models.


The first:


Model #1

In the diagram, notice the flow of customer requirements that devolve into a project which, from a testing perspective, turns into some form of analysis, design, execution and reporting. Each of these activities typically produces some artifact, and together they produce part of the “answers” that the project and product are looking for (some of these “answers” may also come in non-artifact form, e.g. verbal reporting).

I assert that this is the view of the non-tester in many situations - whether from a project or product (owner) perspective. This is no bad thing in itself, but... After a number of repetitions it can happen that this becomes /the/ model for delivering on requirements -> a project is set up, we turn the handle on a number of given activities, produce a number of given artifacts and then deliver to the customer.

Some problems with this:

  • The focus is on the containers - the activities and artifacts - rather than the contents. The contents (or content needs) should dictate the needed activities and artifacts, not the other way around*.

  • The traps associated with Taylorism - splitting activities into unitary items (to optimize and measure) becomes a goal in itself. These items look like targets, but can lead to local optimizations and process gaming (skewing) due to measurement goals.

  • The model is used as an exemplar because it has already succeeded with one or more projects.

  • The model (system) comes to be thought of as a closed system - meaning its processes and flows are defined and known to the extent that they are considered (by their practitioners) complete.
    • The previous two points contribute to the model becoming a best practice.


So, is this model wrong?

It depends... on how the model is used.

The activities and artifacts are not wrong in themselves. I am not saying these artifacts cannot be produced or that these activities cannot be done.

Its usage can be illustrative - informative rather than normative - to show what a “general” flow might look like. But I think there are several alternatives. It’s important to start at the needs end of the equation and build components that are congruent with those needs.


Mapping Information Flow


The second model:


Model #2




When I first presented the above model (#1) and contrasted it with an alternative (#2), in Sept 2012 to a group of my peers, I drew them on a large whiteboard with one displayed above the other. I did this to point out that the activities and artifacts of model #1 could be mapped into the second model.

The second model does not start by encapsulating customer requirements, but tries to work with (analyse) what information the customer needs about the product they will receive. Part of this can be traditional requirement labelling, but it can also mean thinking at a very early stage, “how will I demonstrate or show this to the customer?”. This breaks down into:

  • Needs picture. This might be something close to a traditional requirement description - the items on a delivery checklist to a customer.

  • Risk picture. What risks are associated with this product, due to its composition, development and its context (including legacy systems)?

  • Information picture. What elements are important to tell the story to the customer and how?

Phasing: Sometimes products have dependencies on other products or third-party components, and so phasing in time can be needed (from a project perspective).

The needs picture can be decomposed into information elements related to the new feature, the new system and the aspects of the legacy system.

The important thing to bear in mind with these elements is that they must be congruent to add to the bigger picture - that they add value in themselves (e.g. a test of a feature in isolation), add value to the whole (the new system) and add value in their context (to the product that the customer will use - which might be different from a generic system).

This model also introduces elements of design thinking (see references) to move from the first model (how -> activity and artifact based) towards a focus on the why and on what value elements help the needs pictures (of the project and the customer).


Usefulness of this model?

The information model has the purpose of identifying value to the business. We work with the information goals and elements that add value to the business. In this way, we align the development activities (and artifacts) to business goals. Then, maybe, we devote less attention to activities (and artifacts) that don’t contribute towards a business goal (i.e. they add no information that is needed).
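As a rough sketch of what this alignment could look like in practice (my own illustration - the goal and activity names below are hypothetical, not part of the model), planned activities and artifacts can be traced to the information goals they serve, which makes visible any activity that serves none:

# Hypothetical sketch: trace planned testing activities/artifacts to the
# business information goals (needs / risk / information pictures) they serve.
information_goals = {
    "show feature X works in the customer's configuration",
    "show the legacy upgrade path is not degraded",
    "show third-party component versions are compatible",
}

planned_activities = {
    "feature X tests in a customer-like environment": {
        "show feature X works in the customer's configuration"},
    "legacy regression suite": {"show the legacy upgrade path is not degraded"},
    "test case count report": set(),   # produces an artifact, but which goal does it serve?
}

for activity, goals in planned_activities.items():
    if not goals:
        print(f"'{activity}' adds no needed information - question it")
    elif goals - information_goals:
        print(f"'{activity}' claims goals nobody asked for: {goals - information_goals}")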

From a tester and testing perspective:
An aim with this model is to focus the daily activity on answering the question: “What information (and so business value) will my current task add for the customer?”**

It forms a chain between daily activities and the end goal. This could make reporting about testing (especially status reporting) easier or more accessible to stakeholders. The stakeholders know which thread of business value we are working on and how it contributes to the end goal.

It focusses reporting on information (business) value rather than something that is more abstract, eg test cases.

From a business and organisational perspective:
This is helpful in illustrating the difficulties and complexities of good testing: that requirements are not “things” in isolation, and that the value chain in the whole development of the product should be visible, to illustrate why something is being done or what omitting something might mean. It’s also a useful tool to show why “just test that” isn’t necessarily a simple task.


Background

Communication about testing. Since early 2010 I’ve been looking at different problems (traps) associated with communicating about testing. See references.

Before Let’s Test (May 2012) I had been thinking about communication about testing, especially to stakeholders and other interested parties. During Let’s Test I asked a number of questions related to how testers should present information between different layers in the organisation -> i.e. models for communication and analysis don’t translate so well between different layers of an organisation (it’s not their purpose, so how do we tackle that?)

In May 2012 I took a course on Value Communication - whilst the focus of it was about communicating key points (especially for execs) I started thinking about information flows in testing.


Work in Progress and Feedback

I presented these two models to James Bach as we sat on a train to Stockholm after SWET4. After some discussion about which problems I was trying to look at he commented that I was debugging the information flow. Yes, that was definitely a major aspect, especially when considering the perspectives of the non-testers in this flow, ie how non-testers perceive and interpret information flow to/from testers.

This model is a work in progress - both its elements and its applications. All comments and feedback are very welcome.


Notes

[*] Contrast this activity model with the idea of a test case - which can be thought of as an instantiation of a test idea with a given set of input parameters (which include expectations) - then when the test case is used as an exemplar it becomes a test idea constraint! In this sense the model (of activities and artifacts) constrains the development process. From a testing perspective it constrains the testing. 

[**] Customer, in this model, can be an external or internal customer. It could also be a stakeholder making a release decision.


References
[] Test Reporting to Non-Testers http://www.slideshare.net/YorkyAbroad/test-reporting-tonontesters2010
[] Challenges with Communicating Models http://testers-headache.blogspot.com/2012/07/challenges-with-communicating-models.html
[] Challenges with Communicating Models II http://testers-headache.blogspot.com/2012/07/challenges-with-communicating-models-ii.html
[] Silent Evidence in Testing http://testers-headache.blogspot.com/2012/03/silent-evidence-in-testing.html
[] Taylorism and Testing http://testers-headache.blogspot.com/2011/08/taylorism-and-testing.html
[] Wikipedia: Design Thinking http://en.wikipedia.org/wiki/Design_thinking

Saturday 27 October 2012

Carnival for a Tester - Remembering Ola

I saw the news late on Wednesday evening about the tragic death of Ola Hyltén. I wanted to write something as a tribute - but actually found it too painful. I think Henrik and Pradeep did admirable jobs, here and here (read the comments too).

Many people have fond memories of Ola, and I thought it was worth taking the angle of re-visiting some of his blog posts as a tribute.


Ola's first step into blogging about testing was kickstarted by his experiences at Øredev and on Rapid Software Testing. From these he developed the motto "I think for myself" - which I used to see recurring through his online presence (google & gmail). This was and is a great approach by any tester wanting to learn and know more - something I recognized in him whenever we "talked test" (as he would say).

An exploratory approach was always in his ideas about testing, as seen here. This post was evidence that he might take one starting approach but was always open to discussion and new ideas (something that made it fun talking to him about testing.)

He had that young-kid excitement about new things he'd learnt, which shone through when he'd found a bug, here.

Awareness that good testing starts with thinking was always with him. Here you can see again his excitement about new ideas and that he used that to remind himself against complacency.

SWET#2 was Ola's first SWET attendance (he was at SWET#3 also), and I think he found the experience inspiring.

Taking his "think for myself" approach with him, he attended an ISTQB course, and wrote about his thoughts, here. He saw danger (as I think most good testers do) in following something without thinking. He didn't just see the "think for myself" need as something specific to testing, as demonstrated here. That's another mark of someone who has a good approach to testing.

In all of this we cannot forget Ola's big contribution to Let's Test. He was there at the first discussions about starting a test conference (during SWET#2). He took the peer conference approach to a test conference, as shown here. He became El Commendante and he opened Let's Test 2012.


This week we lost a good thinker, a good test enthusiast, a part of SWET, a part of Let's Test, and a thoroughly nice bloke!

To Ola - cheers mate!

Tuesday 7 August 2012

Testing Chain of Inference: A Model


Pass? What's a pass?

I've heard these questions recently and wonder how many think about this. Think about automated test suites: isn't a pass, a pass, a pass? Think also about regression testing - especially where tests are repeated - or where they are repeated in a scripted fashion. 

The following is an exploration of traps in thinking about test observations and test results. To help think about them I tried to model the process.  

Gigerenzer described a chain of inference model, ref [1], which he used to illustrate potential mistakes in thinking when considering DNA evidence in legal trials.


Observed Pass -> Confirmed Pass Model
I used the chain of inference idea to apply to a "test pass":





Elements in the Model

Reported Pass. This is one piece of information on the road to making a decision about the test. It could be the output of an automated suite, or a scripted result. 

Product Consistency. Think: What product are you testing and on what level? This influences how much you will know about the product (or feature, or component, etc). Which features or services are blocked or available? Third-party products in the build - are their versions correct? Is any part of the system or product being simulated or emulated, and how does this reflect what you know about the product? What provisioned data is being used, and does that affect what you know about the product? What risk picture do you have of the product, and has the test information changed that?

Test Consistency. Think: Has my test framework or harness changed, and does this affect what I know about the product? Has the product changed in a way that may affect the testing I do? Has there been any change in typical user behavior, and if so, does this affect the information you think you're getting from the test (test observations)? Is any behavior simulated, or not known, in the test?

Confirmed Pass. Think: Taking into account the test observations, the information about the product and the test consistency, can the result be deemed "good enough"? If not, is more information needed - maybe testing with different data, behavior or product components? (Note, "good enough" here means that the information from the test is sufficient - the test/s with the data, format, components and environment. It is not a template for an absolute guide (oracle) to a future "pass" - that would be falling for the representation error…)

Representation Error. A short cut in the evaluation of the result is made, specifically excluding information about product or test consistency. Think: This is equivalent to telling only a partial story about the product or testing.

Ludic Fallacy. Ref [2]. This is the simplification of the decision analysis (about the reported pass) to "a pass, is a pass, is a pass". Think: An automated suite, or a script, that produces a pass - that's a pass, right? The assumption that an observed "pass" is a real "pass" is a simplification of the judgement about whether a test result might be good enough or not.

Proxy Stakeholder Trap. Think: "Pass? That means the product is ok (to deliver, release, ship), right?" It is quite fine for a stakeholder to set their own "gate" about how to use a "pass". The trap here is for the tester who makes the jump and says, "the pass means the product is ok to…" - this is the trap of reading too much into the observation/s and transforming into a proxy stakeholder (whether by desire or accident).

This model helps visualize some significant ideas.
  1. A reported pass is not the same as a confirmed pass.
  2. The labelled lines show traps in thinking about a test - shortcuts in thinking that may be fallible.
  3. There is no direct/immediate link between a test being "OK" and a product being "OK".
  4. Observing an automated/scripted "pass" implies that there needs to be a "decision or judgement" about a "pass" - to decide whether it is good enough or not.
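As a minimal illustration of that judgement step (my own sketch in Python - the names ConsistencyCheck and judge_reported_pass are hypothetical, and this is not part of Gigerenzer's model), a reported pass is only an input to a decision, not the decision itself:

from dataclasses import dataclass, field

@dataclass
class ConsistencyCheck:
    """One product or test consistency question, e.g. 'was any part simulated?'"""
    question: str
    answered: bool = False   # has anyone actually considered this question?
    concern: bool = False    # does the answer change what we know?

@dataclass
class ReportedPass:
    test_id: str
    product_checks: list = field(default_factory=list)
    test_checks: list = field(default_factory=list)

def judge_reported_pass(reported: ReportedPass) -> str:
    """A reported pass only becomes a confirmed pass after a judgement
    that accounts for product and test consistency."""
    checks = reported.product_checks + reported.test_checks
    if not checks or any(not c.answered for c in checks):
        # Skipping the consistency questions is the representation error /
        # ludic fallacy shortcut: "a pass is a pass is a pass".
        return "unconfirmed: consistency not (fully) considered"
    if any(c.concern for c in checks):
        return "unconfirmed: more information needed"
    return "confirmed (good enough for now - not a promise about the product)"

# Example: an automated suite reports a pass, but part of the system was simulated.
check = ConsistencyCheck("Is any part of the system simulated?", answered=True, concern=True)
print(judge_reported_pass(ReportedPass("TC-42", product_checks=[check])))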


Q: Are all reported passes confirmed in your environment? 


Test Observation -> Test Result Model
The above model can be used as a basis for thinking about the chain of inference in test results, reports and extending this thinking to exploratory approaches.



Elements in the Model

Test Observations. These are the notes, videos, memories or log files made during the testing of the product. They may cover the product, the test environment, the test data used, the procedures and routines applied, feelings about the product and the testing, anomalies and questions.

Product Elements. Think: Putting the components of the product into perspective. This is the product lens (view or frame, ref [3]) of the test observations. Are the test observations consistent with the product needs / mission / objectives?

Testing Elements. Think: The parts of the testing story come together - this is the test lens (frame) on the test observations. Are the test observations consistent with any of the models of test coverage? Do they raise new questions to investigate? Are there parts that can't be tested, or can only be tested in a limited way?

Project Elements. The aims of the project should ultimately align with those of the product, but it is not always the case. Therefore, the project lens (frame) needs to be applied to the test observations. Are they consistent with the immediate needs of the project?

Test Results. Think: Pulling the test observations together - both raw and with the perspectives of the project, product and testing elements - compiling and reflecting on these. Is there consensus on the results and their meaning or implication? Has the silent evidence, ref [4], been highlighted? What has been learnt? Are the results to be presented in a way tailored for the receivers?

Context-free Reporting. Think: When the thinking about the test observations and results is not consistent with project, product or testing elements, then the context of the result is not involved. The result becomes context-agnostic - a context-free report.
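A minimal sketch of the same chain in code form (my own illustration with hypothetical names - not a tool or notation from the model): a test observation only becomes a context-specific result once the product, testing and project lenses have been applied; skip a lens and you drift towards a context-free report.

# Hypothetical sketch: raw observations become a result via product/testing/project lenses.
def compile_result(observations, lenses):
    notes = {name: lens(observations) for name, lens in lenses.items()}
    missing = [name for name, note in notes.items() if note is None]
    if missing:
        return f"context-free report (lenses not applied: {', '.join(missing)})"
    return {"observations": observations, "interpretation": notes}

lenses = {
    "product": lambda obs: "consistent with product mission" if obs else None,
    "testing": lambda obs: "coverage model updated; two new questions raised",
    "project": lambda obs: None,   # nobody asked how this relates to project needs
}
print(compile_result(["log excerpt", "screenshot", "session notes"], lenses))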


Individual -> Multiple Test Observation Model
Combining the above two models gives:




Summary
This has been a look at some traps that can occur when thinking about test observations and test results, and when implicitly making decisions about them.
  1. A test observation and a judgement about a test observation are different.
  2. A test result and a decision about a test result are different.
  3. A test result and a feeling about a test result are different.



References
[1] Calculated Risks: How to know when numbers deceive you (Simon & Schuster, 2002, Gigerenzer)
[2] Wikipedia: Ludic Fallacy: http://en.wikipedia.org/wiki/Ludic_fallacy
[3] The Tester's Headache: Framing: Some Decision and Analysis Frames in Testing
[4] The Tester's Headache: Silent Evidence in Testing

Friday 27 July 2012

Challenges with Communicating Models III


The previous posts, [1] and [2], have looked at some of the problems and traps with the mental models we create and how they might (or might not) show up when we try and communicate (use) those models.

Then I was reminded of a passage in a book that might help...

A Heuristic For Communication?
George Polya wrote a marvelous book, ref [3], which addressed heuristic reasoning in mathematical problems. Much of it is applicable to software testing - indeed the book could be treated as an analogy for heuristic approaches in software testing, and deserves separate treatment.

However, just as useful in the specific edition of the book I reference is the foreword by John Conway, a mathematician, ref [4]. He made some great observations about Polya's work, and I will raise one of them here:
"It is a paradoxical truth that to teach mathematics well, one must also know how to misunderstand it at least to the extent one's students do! If a teacher's statement can be parsed in two or more ways, it goes without saying that some students will understand it one way and others another..." 
And from this I derive something I've been calling the Conway Heuristic, ref [5]:
"To communicate something well (effectively), one must be able to misunderstand the information (report, result, interpretation, explanation) in as many ways as the other participants in the communication (discussion or dialogue) process."
The beauty of this is that it reminds me that no matter how well practiced my method is, how well-polished my talk or presentation is, there is likely to be someone in the crowd, stakeholder grouping, or distribution list that "doesn't get it". The chance of that is greater if I haven't spoken with them before or they are unfamiliar with any of the used methods and procedures for the communication.

This is a difficult heuristic to apply - it requires effort, training and awareness to do it consistently and successfully. I think more training on how we report testing information (and stories) is needed with emphasis on their clarity, devil's advocacy role-playing and awareness of rat-holes and how to handle them.

To be continued...

References
[1] The Tester's Headache Challenges with Communicating Models 
[2] The Tester's Headache Challenges with Communicating Models II 
[3] How to Solve It (Polya, 2004, Expanded Princeton Science Library)
[4] Wikipedia http://en.wikipedia.org/wiki/John_Horton_Conway

Tuesday 24 July 2012

Challenges with Communicating Models II


Anders Dinsen posted some great comments and questions to my previous post (here). One comment was, "if I get your model", i.e. if I understand you correctly. I loved the irony here - I was writing a post about potential problems with communication (or trying to) - and I was getting questions about what I meant. 

I thought the comment exchange was great. Anders was doing what good testers do - not sure if he understood me, he posed questions and stated the assumptions he could see. 

A Recent Example
I'm reminded of a course I took a couple of months ago - it was called Communicating Value - and covered a range of topics and techniques, including presentation techniques and content, presenter style and negotiation techniques. One of the exercises was to give a five-minute presentation where you were trying to sell something or get "buy-in" for an idea. I chose the horrendously difficult task of trying to sell the increased use of qualitative information to execs in my organization (which is quite large) and do this in 5 minutes. This was a challenge for me because (i) the topic goes against a lot of "learning"/reliance/intuition about only quantitative methods, (ii) the topic is quite complex to explain, (iii) my own explanation techniques can be quite "wordy" and abstract, (iv) execs are used to executive summaries - so introducing the topic and content and getting agreement in five minutes was a challenge in itself.

For the exercise, I modeled my presentation on a group of execs (that I knew) and on information that I knew they were familiar with, and targeted the goal of getting buy-in for follow-up discussions leading to a trial. In my experience this model worked with people familiar with the topic, the content and some of the related problems. For those not so familiar it became more abstract and they lost touch with the reality behind it.

Lessons
  1. Models of communication (or content) do not translate automatically. 
  2. Some models need more time to digest or understand. 
  3. Some models need more familiarity than others.
  4. No model is perfect, and so facilitation and guidance are needed to aid communication and dialogue.


A Model for Communication
When I started thinking more about this I thought of this relation:

Communication = Artifact (model) + (Discussion and/or Dialogue)


Thinking about the example above, the good exchange with Anders and the previous post, I attempted to jot down some thoughts about modeling as a mind map, below. It might be useful, but as I've been saying, I also expect it to be fallible….






Thursday 12 July 2012

Challenges with Communicating Models

This train of thought started at Let's Test 2012, a fabulous conference that has been written about in many places, ref [1].

A theme I identified during the conference, and even before in the LEWT-style peer conference, was the discussion of models, mainly mental. These are models that are used for a variety of purposes:
  • Trying to understand the testing problem space (mission)
  • Trying to understand the procedures and processes at use within a project
  • Trying to understand and document the approach to the testing problem
  • Trying to understand progress and map coverage of the testing space and of the product space
  • Trying to communicate information about the testing and about the product
Some of these models are implicit, undiscovered, unknown or tacit - or a combination of these. Some models are understood to different levels by the user, some are communicated to different levels of accuracy and some are understood to different levels of accuracy by the receiver.

Some people translate (and communicate) these models in mind maps, some in tabular format, some in plain text format, some verbal and some in a combination of these.


Problem?
All models are wrong, but some are useful. Ref [2].

Another way to think of this - all models leave out elements of information. But, I think the inherent challenge with models (mental or otherwise) is how we communicate them. My frames of reference for a problem, ref [3], may be different from those of my stakeholders and my stakeholders' stakeholders.

At Let's Test there was much discussion about the use and applicability of models, but not so much about the translation and communication of them, IMO. It's this translation of models between people or levels that is an interesting, and perhaps underrated problem.

If you have a model that is communicated and understood by only some of your stakeholders then there may be a problem with the model, the communication or both. Models often start out as a personal tool or don't capture the frames of all those involved in the information flow.

My questions in the keynotes of Michael Bolton and Scott Barber, and in Henrik Emilsson's session, at Let's Test 2012 were along the lines of "how do we overcome the translation problems with models between different layers in an organisation or between different groupings?"


Recently someone showed me a representation (pictorial) of a complex set of data, followed by a question, "do you see what I see?" To which I replied, "I'm sure I do, but have no idea if I interpret what you do."


Trap?
The trap that I see is that we often put a lot of effort into the capture and representation of data and information. But the effort in the communication and dialogue that must accompany them isn't always considered, or not to the same degree.

The trap is that we start to think of the model as an artifact and not a tool (or enabler) for communication.

I often refer back to the SATIR interaction model (that I discovered via Jerry Weinberg), an online example is in ref [5]. If we're missing the significance and response parts then there's a danger that our intended message won't get across.


Examples
Ok, this all sounds theoretical, so time for some examples in my usage of such models.

Mindmaps. I use mind mapping tools (and pen and paper) for lots of different activities - I don't use them as artifacts, and that's an important distinction.

I have a couple of A3 mindmaps pinned up by my desk, variants of ref [6] & [7], and occasionally someone will ask about them, "oh, that's interesting, what does it show?" There will usually follow an interesting discussion about the purpose and content, my reasoning behind them and potential future content. But it's the discussion and dialogue that makes the communication - as there usually will be comments such as, "I don't understand this", or "why do you have this and not that in a certain position?", or, "aren't you forgetting about..." - some will be explained by me and not the piece of paper, and some will be taken onboard for future amendment.

But, it's the information I leave out that NEEDS my explanation that makes the communication work - and hopefully succeed.

Presentation material. I purposely write presentation material to be presented - rather than writing a document in presentation format. This means it can be quite de-cluttered, empty or abstract - because these are meant to be items that the audience can attach to and hear about the story or meaning behind them.

The presentation material is only an enabler for a presentation (and hopefully dialogue) - it is not the presentation. In my day-to-day work I occasionally get asked for presentation material from people who missed it - they may get something, but not everything I'd intended. So I usually supply the standard health warning about the material not being the presentation.

How, and how well, I present the story is another matter....

Dashboards and kanban boards. I like low-tech dashboards, see ref [4] for an example, and kanban boards are tremendously useful. But don't mistake a status board/chart for a status report - it's the people describing elements of the charts who are reporting. It's those elements that allow the others/audience/receivers to grasp (and check) the significance of the information with the report presenter.

Test analysis. I work with many teams on a large and complex system. It's very typical that I will look at a problem from a slightly different angle than they do - they take a predominantly inside-out approach whilst I tend to look outside-in. That approach helps cover most of the angles needed.

Discussions occasionally happen around the team's or my own approach and understanding of the problem. "Why are only feature aspects considered and not the wider system impacts?", or "why are we so worried about this system impact for this particular feature?" These are symptoms that the models we're using to analyse the problem are not visible, transparent or communicated and understood by all involved. If the team is not familiar with my approach then I should be describing, "these are the areas I've been looking at because..."

Test case counting. Sometimes stakeholders want to see test case numbers or defect counts. I then usually start asking questions about what information they really want and why. I might throw in examples of how really good or bad figures might be totally misleading, ref [8]. Their model for using such figures is not usually apparent - sometimes they think they will get some meaning and significance from such figures that can't really be deduced. It might be that they really need some defect count (for their stakeholders), but then it's my duty to see that the relevant "health warnings" about the figures and the limitations within which they can be used (if any) are understood by the receiver - for further communication up the chain.


Way forward?
Awareness of the problem is the first step. Referring back to the SATIR interaction model, think about whether all parts of the model are considered. The significance and response parts are the most commonly forgotten - are you and all parties understanding and feeling the same way about the information? If not, then there might be a problem. Time to check.

Communicating models through different layers of an organisation or between groups? In some cases the models will only be applicable to certain groups or groupings of people - it's important to know how and where the information is to be used. In certain cases this may mean making "health warnings" or "limitations of use" available (and understood) with the information.

I think there will be more thoughts to come in this area...


References
[1] Let's Test: Let's Test 2012 Re-cap
[2] Wikiquotes: Quote by George E. P. Box
[3] The Tester's Headache: Framing: Some Decision and Analysis Frames in Testing
[4] James Bach: A Low Tech Testing Dashboard
[5] Judy Bamberger: The SATIR Interaction Model
[6] The Tester's Headache: Communication Mindmap
[7] The Tester's Headache: Book Frames and Threads Updated
[8] Slideshare: Test Reporting to Non-Testers

Wednesday 11 July 2012

Communication Mindmap

About 2 years ago (May 2010) I started preparing a presentation for Iqnite Nordic 2010. In the preparation for this I'd described how I'd prepared a mindmap that I printed out in A3 format - to be able to read it.

I recently had a very good question (thanks Joshua) about the availability of this mindmap - well, I'd forgotten about the source as I generally just have the paper copy for reference. But I dug it out, and found that it is all still very relevant (to me).

I still give the presentation, "Test Reporting to Non-Testers", fairly regularly, ref [1] - it goes to some of the points about metric obsession and some of the cognitive biases involved in why stakeholders might like them - such as cognitive ease and availability biases. Some stakeholders find it a useful eye-opener. I may try and record a future session and make that available.

For an example of cognitive ease in software development and testing see ref [2].

The picture of the mindmap is below; I'll try to make the source files available in the future. There is the usual health warning with all mindmaps and presentation material - they only give my perspectives and are only fully unambiguous when accompanied by dialogue and discussion - but they have a certain amount of value as they are, so enjoy!




References
[1] Test Reporting to Non-Testers
[2] Best Practices - Smoke and Mirrors or Cognitive Fluency?










Wednesday 6 June 2012

The Documented Process Trap

"I can't do this work well without a way of working..."


"If only we had a document describing how people should work they'd be much better...."


"It's very difficult to do this without a documented process..."

Any of these sound familiar? The third one might be true, but that says more about the person (or the conditioning they've been exposed to) than the process. It can be easy to get conditioned into this way of thinking.

Trap?
This is the belief that something can't be done unless it's documented. But where does this belief come from?

Beware, it's a trap.

It's a self-fulfilling trap. The more people hear it, the more some tend to believe it; then it disables them so that they can't operate unless they have a documented process in front of them.

It can even become contagious.

Once you have a document (sometimes called a process description or way of working) telling you how to do things or what to do in what order, you start thinking about other aspects of your work that need a document describing them...

Beware of this trap.

Note, sometimes process-critical steps need ordering and documenting - this trap is not saying anything documented is bad, it's a warning about approaching new or original problems.

A logical extension of this is that test ideas, sometimes called test cases, are described in advance - so that we know what to do. Sound familiar?

Bad to document test ideas?
No. Awareness of this trap is stating that it is dangerous to pre-determine all steps to a solution before the solution is known. It can be very useful to write down a set of ideas in advance, but don't let the problem solving approach stop there - before you know if you're on the right track.


What about decisions or criteria for decisions? Should they be documented in advance so that you don't make mistakes?

Why document things in advance?
As aide-memoire, as checklists, or as decision tools? All of these are fine when used in the right spirit - as a help, but not necessarily definitive. Unfortunately, the move to something being definitive can be quite short. From aide to "standard" to "standard practice", or even "best practice". This is connected to the way we create norms, and then the dreaded cognitive ease (which we're all susceptible to).

Kahneman, ref [1], states that norms are built by repetition - when we see something "work", we start to think of it as "how it works". This is complicated by something called cognitive ease - once we start to "see how it works" it can become a natural default as it is much more difficult to think each time, "is that really how it works?".


Hypothesis
I proposed a hypothesis recently:

Many people naturally approach new problems in a heuristic manner, but due to repetition or familiarity this becomes a "standard" and the heuristic approach is lost or forgotten.

When we look at new or original problems we often don't know how to tackle them - we might take out our "toolkit" of approaches and try out some different ideas, we might think through different solutions depending on time, customer needs and available tools and people. We evaluate the results of some of those ideas, and either change or continue. There is rarely a pre-defined way to approach a problem.

Polya, ref [2], warns not to mix up heuristic reasoning with rigorous proof.


Awareness
Awareness of a trap won't stop anyone falling into it. However, it is the first step in progressing away from unconscious incompetence, ref [3].

The next time you are presented with a new or original problem, ask yourself: "Am I thinking originally about this, or am I following a 'standard' (or accepted practice), and should I question that?"


References
[1] Thinking, Fast and Slow (Kahneman, 2011, FSG)
[2] How to Solve It: A New Aspect of Mathematical Method (Polya, 2004, Princeton University Press)
[3] Wikipedia: http://en.wikipedia.org/wiki/Four_stages_of_competence

Wednesday 16 May 2012

Thoughts from Let's Test 2012

Date: 7-9 May 2012
Location: Runö, Sweden
This is a brain dump of thoughts (partial in content and with more analysis later)......

Conference T-1
I arrived just after breakfast on the day before the start of the conference for facilitation training from Paul Holland. Although I'd seen facilitation up close at SWET conferences, it was good to hear some of the theory and problems behind the method. We discussed a little about the decision points that facilitators need to weigh up - of course, Paul makes it look simple - and that's the result of practice and judging the dynamic in the room. I was really happy to get that opportunity - learning different aspects from each of the sessions (2 conference + 1 ?EWT). Cool!

After a facilitation briefing we then joined a LEWT-style peer conference - initiated by James Lyndsay - I will jot some notes on this later. Here I was able to practice my first facilitation, picked up some good feedback and had a great day!

Tutorial Day
Attended Fiona Charles' tutorial on Test Leadership. Didn't know what to expect, went with an open mind - ready to hoover up ideas. We seemed to be thrown out of balance (probably intentionally) by the exercises - which then triggered different approaches. I enjoyed the experience - problem solving practice, observing others solve problems and debriefs. Fabulous. I think the lessons from this will bubble up for a while....

Mingling
Met what seemed like half of DEWT - very cool guys (Ray, Peter & Huib).

Met 2 very interesting testers from Switzerland (Ilari & Mischa) - met both(?) Belgian testers (Pete & Zeger) - please if anyone else saw a yellow parachute/paraglider down by the water before dinner on the first day, let them know :)

Test lab
Thoroughly enjoyed. It seemed to be an exploration of collaboration with several scenario experiments injected. I found that fascinating. On the first evening we worked with test planning and execution - despite lots of issues with the environment and some early disorganization, within 90 mins our group (Huib, Peter 'Simon', Modesty & Joep) actually gelled and generated useful information about the applications.


On the second evening I joined the test lab late (after attending the lightning talks) and sat with Rob Sabourin. Fabulous - soon after joining we found a bug for which we then proceeded to, in his words, gather 'compelling evidence'. We gelled well on that exercise and formed hypotheses to discount along the way, using good lab notes. We continued our debrief afterwards - Rob has the skill of forming a helicopter view of the situation. I'm now in proud possession of his book "I Am A Bug".

Evening events
I think there was something for everyone, and more than I could choose - I could easily have cloned myself to attend the talks (Michael Hunter and Alan Richardson), BoF, art walk, crash test party.

I was in the testlab and listened in to the lightning talks. A talk on some ideas of visualizing test coverage from Kristoffer Ankarberg, a look at "JAM and ACT" testing in an agile context (I didn't catch the presenter's name) and Duncan Nisbet (the "bloke in shorts" ;) ) giving a briefing about NW tester gathering (in England) and aspects of community and learning (just like at a peer conference).

Sessions
I haven't re-read through my notes yet, but I found thought-provoking ideas or "I should research more into that" moments in all the ones I attended:

Anders Dinsen "Testing in the Black Swan Domain" (I also facilitated)
Rikard Edgren "Curing our Binary Disease"
Christin Wiedemann "You're a Scientist - Embracing the Scientific Method in Software Testing" (I also facilitated)
Henrik Emilsson "Do you trust your mental models"

I enjoyed them all - and will refer to them in future posts! Actually the choice of sessions was very good, making choosing not an easy task!


And finally (for now)...
Stumbled on the "Just?" question with Martin Jansson - more on that later...

Was it fate? I had #1 on my K-cards and I asked the first facilitated question at the first Let's Test.... (The answer is no, of course! Or was it? ;) )

I'm still processing the experiences, thoughts and discussions from the conference. During and after I formulated several hypotheses and started seeing threads of new ideas - sign of a great conference!

This was a conference that embraced many elements of a peer conference: peers discussing testing with one another, aided by great location, great facilities and great organizers - yes, Ola, Henrik, Henke, Tobbe, Johan - that's you!

Friday 13 April 2012

Indicators, Testing & Wine Tasting

The other day I was discussing a problem with a developer. Essentially, it was about how to judge feedback from a continuous integration system next to the other information that the testers in the team were producing. The reason I'd started asking questions was that with the myriad of information available there was a risk (as I saw it) of ignoring some information - or cherry-picking the information to use...

Indicators

This led me to describe how the different pieces of information contributed to the picture of the software being developed, and as such they were indicators.

Some people mistake test results for absolute truth. "It worked", "it passed", "it x'ed" - all past tenses.... Getting from a past result to a future prediction of performance isn't easy - unless you're demonstrating on the customer's equipment or some identical configuration.

Test results can be important markers or references - especially if a customer remembers one from a demonstration. Then there is an implicit expectation (assuming the customer was happy with the demonstration), and that particular test case could be thought of as part of the acceptance criteria.

A test result can't be understood in a meaningful way if it is taken out of context, and many results from automated suites or continuous feedback systems are inevitably context-reduced. That doesn't mean they're not useable, but for me it means they are indicators of current performance and of the likelihood of future behaviour. Also, although automated sets of tests can be very powerful, they also leave out information (or the people interpreting the results leave out information) - the so-called silent evidence of testing, ref [1].

How we report and discuss these results adds the context and makes them context-specific. Getting from a test result (or results) to "it works" or "it meets customer expectations" is not as simple a task as a stakeholder (or developer) might wish. More on context-free reporting in another post...

Thought experiment - illustration

Of all the possible tests that could be executed on this system, I have a set (no matter how comprehensive) that I think of as good enough. Now suppose one (1) test failed - and only one test. Would you:

  • Report the problem (or sit with a developer to localize the problem),
  • Wonder what other potential problems might be connected that haven't been observed,
  • Wonder if the information from the failed test is sufficient (extra testing or extra tests),
  • Wonder what information this failed test is saying in context of other/previous testing done (or ongoing),
  • Investigate the significance of the test that failed - maybe even see if there was any connection to recent changes in the system (under test), environment, test framework or other parameterization.

Note #1, hopefully you chose all of the above!
Note #2, this problem expands if you ever have a subset of tests (say an automated set of tests used as a sanity check) and 1 test (or even X tests) in your sample (subset) fail.

The example here is to show that a result doesn't stand on its own - without some situational context. Sometimes the investigation around the problem is about adding that situational context so that you (or someone else) can make a judgement about it.

Flaky Feedback?

One of the problems for the team was a mistrust with the feedback they were getting from the continuous integration system. There had been a glitch in the previous week with the environment which had thrown up some warning signs.

But,

  • Warning signs are exactly that - they're not judgements - they are "items for attention" and always need more analysis.
  • An environmental issue is good feedback - oops, will this work in production, rather than just on our machines?
  • Why only mistrust bad results or warning signs? Doesn't the same logic apply to "good" results or "lack of warning signs"?

Mistrust can be an excuse not to analyse - or, when overloaded, it's easy to say we'll down-prioritize analysis. Mistrust can also result from a system that becomes too flaky or unreliable - in which case you need to consider (1) can I use that system, (2) what should I use to get the information I need, (3) do I need external help?

Another question for these team set-ups (especially as/where continuous integration is being introduced) - is the team itself ready for this feedback? I'll explore this in another post.

Wine Tasting

After I commented on the different sources of information, how they all added to the big picture (the aspect of adding context to a result) and that our knowledge of that "big picture" was constantly changing, the remark came back, "this is more like wine tasting than software engineering..."

In a way I couldn't agree more. There are a lot of similarities between "good" software testing and an expert wine taster - they both require skill, constant learning and practice. Descriptions are both objective and subjective - knowing how to distinguish between the two and give each the right tone is tricky. Both are providing a service. Neither the software tester nor the wine taster is the stakeholder (end user or their sole representative).

But this raises an interesting question - in a world of multitudes of information (and nuances in that information):

  • Is the team ready for this? The sheer amount of information, and the strategies for tackling or prioritizing it, is a topic for a different post...
  • Is this a simplistic view of testing (and even software engineering)? Testing is not a true/false, go/no-go or black/white result; it is not a criterion or quality gate (in itself).

Software Engineering?

I think we can be misled by this term. When one thinks of engineering it might be in terms of designing and constructing machines, bridges or buildings. The word 'engineering' has primed us to think of associations to engineering problems, many of which have an element of precision in them - often detailed blueprints and plans. But not always....

I occasionally watch a program, Grand Designs, that follows people building their own homes - whether via architects and sub-contractors or totally alone. A common factor of all these builds is that they are unique (even where a blueprint exists), as usually some problem occurs on the way: (1) a specific material can't be obtained in time so a replacement needs to be sought, (2) money runs out during the project so elements are cut or reduced, (3) the customer changes their mind about something (changing requirements), (4) some authority/bureaucracy is slower than hoped for, delaying or changing the project - so very little is completely predictable or goes to plan. A common factor: where there is human involvement/interaction, plans change!

Software - its conception, development, use and maintenance - is inherently a "knowledge-based" activity. The testing part of it is inevitably entwined with eliciting and making visible assumptions about its use and purpose, as well as giving the risks associated with the information uncovered, investigated or not touched. So, I'd like people to get away from the idea (frame) that software engineering is solely a blueprint-driven, planned, right/wrong or black/white activity.

Using the term "software engineering" is fine - but put it in context: "software engineering is software development with social interactions (that may have an unpredictable tendency to change)".

And finally...

  • Don't assume bad results are not useful or useable.
  • Don't assume good results tell the whole story.
  • Product Risk tends to increase where analysis of results and their context doesn't happen.

References

[1] The Tester's Headache: Silent Evidence in Testing

Let's Test Conference Carnival #1

It's now just over three weeks to Let's Test, a first in Europe: a conference with a focus on context-driven testing, for testers, by testers -> yes, the organizers are all testers! There is a buzz building around the conference and so I thought a carnival was overdue....

Kick-off

Ola Hyltén got the ball rolling with a post about the main attraction of the conference - conferring and peer interaction, here.

Interviews

Markus Gärtner has done a sterling job with a bunch of interviews with participants:


Participants

Some thoughts from some of the participants and why they think it will be a special conference:


More than conferring?

Henrik Emilsson wrote about the test lab, here, and other evening sessions, here.

Starting the Peer-Workshopping Early?

James Lyndsay has proposed a LEWT-model peer workshop for the day before the conference start. For more details read here.

Oh, why #1 in the title?
I expect there will be a bunch of posts during and after the conference for installment #2...

Sunday 25 March 2012

Silent Evidence in Testing

A: All the test cases passed!
B: But the feature does not work for the customer...

I expect this, or something similar, is not a totally new type of statement to many people. Were the test cases sufficient, was there even any connection between the test cases and the feature, was some part of the scenario/system/network simulated and not accounted for, etc, etc. The possible reasons are many!

Silent Evidence is a concept to highlight that we may have been working with limited information or that the information we gather may have limited scope for use. The effect is to act as a reminder to either (1) re-evaluate the basis of some of the decisions or (2) to act as a warning flag that the information is more limited than we want and so to warn about the usage of such information.

When I started reading Taleb's Black Swan, ref [1], at the end of 2009, I started seeing many similarities between the problems described and my experiences with software testing. One of the key problems that jumped out was to do with silent evidence, from which I started referring to silent evidence in testing.

Silent Evidence
Taleb describes this as the bias of focussing on what is presented or visible. He cites a story from Cicero:
Diagoras, a nonbeliever in the gods, was shown painted tablets bearing the portraits of some worshippers who prayed, then survived a subsequent shipwreck. The implication was that praying protects you from drowning. 
Diagoras asked, “Where are the pictures of those who prayed, then drowned?”
WYSIATI
Recently, whilst reading Kahneman's latest book, ref [2], I found that his description of WYSIATI extends the idea of silent evidence. WYSIATI stands for "what you see is all there is" - and this deals with making a decision on the available information. Many people make decisions on the available information - the distinction here is that they assume the information is good enough, rather than investigating whether it really is. This is manifested by associated problems:
  • Overconfidence in the quality of the information present - the story is constructed from the information available. From a testing perspective - think about the elements of the testing not done and what that might mean for product risk. Do you consider any product risks from this perspective?
  • Framing effects due to the way in which the information is presented. If a stakeholder associates a high pass rate with something "good" (and treats it as the only piece of information to listen to) then starting the testing story with that information may be misleading for the whole story. See ref [3] for other examples of framing effects and silent evidence.
  • Base-rate neglect: This is the weighting of information based on what is visible. Think about counting fault/bug reports and using them as a basis for judging testing or product quality. Without context the numbers give little useful information. Within context, they can be useful, but the point is how they are used and to what extent. See ref [3] for more examples. 
Silent Evidence: ENN
I use the term silent evidence from an analysis viewpoint, typically connected with evaluation of all the testing or even connected with root cause analysis - to examine assumptions that gave rise to the actions taken. (I use this not just for "testing problems" but also for process and organizational analysis.) This is useful to find patterns in testing decisions and how they might relate to project and stakeholder decisions.

I use the acronym ENN to remind me of the main aspects. They fall into the "before" and "after" aspects:
Excluded (before)
Not Covered (after)
Not Considered (after)
E: Excluded
This is the fact-finding, data part of the report on the information available and decisions taken before test execution. This considers the test scope or parts of areas that we rule out, or down-prioritize during an analysis of the test area. It can also include areas that will be tested by someone else, some other grouping or at some other stage.

Picture a functional map where a feature is being modified: I might conclude that another feature, which is mutually exclusive with the one being modified (they can't be used/configured simultaneously), deserves much less attention. I might decide a sanity test of that area is sufficient, test that the excluded interaction can't happen and leave it at that. (Remember, at this point I'm not excluding expanding the test scope - this might happen, depending upon the sanity, interaction or other testing.)

An extreme case: Testing mobile terminals for electromagnetic compatibility, or testing equipment cabinets that can isolate the effects of a nuclear pinch, is not done every time new software is loaded. Excluding this type of testing is not usually controversial - but do you know if it is relevant for the product?

A more usual case: There is a third-party-product (3PP) part of the system that is not available for the testing of your product. Whatever we say about the result of the testing we should make the lack of the 3PP visible, how we handled it (did we simulate something or restrict scope?), what risks the information might leave and if there are any parts to follow-up.

It's never as simple as "the testing was successful" or "the test cases passed". This is drifting into the dangerous territory of "context-free reporting".

It's not just applicable to software test execution - think about a document review: a diagram has been added and I might consider that unrelated parts of the document only deserve a quick look through. Should I really check all the links in the references if nothing has changed? This is a scope setting to do with the author - they might check this themselves.

N: Not considered
This is the retrospective part of the report or analysis. What was missed or overlooked that should have had more of the available attention? What part of the product story didn't get presented in a good way?

This is very much a "learning from experience" activity - ok, how did we get here and, just as importantly, does it matter? Some problems are more fundamental than others, some are organizational (a certain process is always started after another, when maybe it should start in parallel or be joined with the first), some are slips that can't be pinpointed. The key is to look for major patterns - we are all human, so the point is not to spend so much time rooting out every little problem (that's not practical or efficient), but to see if some assumptions get highlighted that we should be aware of next time.

Example: A customer experiences problems with configuration and provisioning during an update procedure. The problem here is not that this wasn't tested to an adequate level - it was tested, and the information from that analysis resulted in a recommendation to stop/prevent such configuration and provisioning during upgrade. But this information didn't make it into the related instruction for the update (which might even have included a mechanism to prevent such configuration).

In the example, the part not considered was that the documentation update was neither tested nor highlighted to show the potential problems with the update, so that a decision about how to handle customer updates could have been made.

N: Not covered
This is similar to the case of excluded testing - but here we had intended this part of the scope to be included, and something happened to change the scope along the way. It is very common for a time, tool or configuration constraint to fall into this category.

The not-covered cases usually do get reported. However, it is sometimes hard to remember the circumstances in longer, drawn-out projects - so it is important to record and describe why scope decisions are made. Record them along the way, so that they are visible in any reporting and for any follow-up retrospective.
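
As a sketch of what such a running record might look like - the structure and field names below are purely my own illustration, not a prescribed format - something as simple as this is often enough:

    # A minimal, hypothetical record of scope decisions, grouped by the ENN
    # categories. Field names and the example entry are illustrative only.
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class ScopeDecision:
        area: str            # the part of the product or test scope concerned
        category: str        # "excluded", "not covered" or "not considered"
        reason: str          # why the decision was taken, or how the gap arose
        decided_with: str    # who was involved or informed
        when: date
        follow_up: str = ""  # remaining risk or planned follow-up, if any

    log = []
    log.append(ScopeDecision(
        area="3PP billing interface",
        category="not covered",
        reason="3PP test system unavailable; interactions simulated with stubs only",
        decided_with="test lead and project manager",
        when=date(2012, 12, 1),
        follow_up="re-test against the real 3PP before customer acceptance",
    ))

    # At reporting (or retrospective) time the log makes the silent evidence visible:
    for d in log:
        print(f"[{d.category}] {d.area}: {d.reason} (follow-up: {d.follow_up})")

The value is not in the tooling - a wiki page or a column in the test plan works just as well - but in capturing the reason and the follow-up at the time the decision is made.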

The problem occurs even in shorter projects, where the change happens early on and then gets "remembered" as having always been there.

Example: Think of a particular type of tool needed to fulfill the scope of the testing for a product. For some reason it isn't available to the project (and this becomes visible early on). The project then starts to create a tool that can provide part of the functionality the first tool would have covered. Everyone is very happy by the end of the project. But what about the 'hole' in functionality that couldn't be covered? Does it still affect the overall scope of the project, and is this information being made visible?

And finally
Thinking about silent evidence helps me think about these traps, and maybe even avoid some of them the next time around. Being aware of traps doesn't mean you won't fall for them, but it does help.

So, use the concept of silent evidence (or even WYSIATI) to keep your testing and product story more context-specific and less context-free.

References
[1] The Black Swan: The Impact of the Highly Improbable (Taleb; 2008, Penguin)
[2] Thinking, Fast and Slow (Kahneman; 2011, FSG)
[3] Test Reporting to Non-Testers (Morley; 2010, Iqnite Nordic)

Friday 9 March 2012

The Linear Filter Trap

Or: Illustrating Systems Thinking with Proximate and Distal Analysis

I was having a discussion the other day where we were looking at some problems and discussing whether it was sufficient to treat symptoms... I remarked that by treating symptoms rather than root causes [notes n1], and not allowing time to find the real problems, we would in many cases be fooling ourselves [notes n2].

Why? (I was asked)
People like to solve problems that they can see - which means we have a tendency to "fix" the problem we see in front of us. This occurs more often if the problem appears to have a "straightforward" fix -> I think of this as a form of cognitive ease in action. Digging for root causes is a challenging activity, and we sometimes want to believe that the cause we identify is good enough to fix. For another example of cognitive ease, with best practices, see ref [2].

An illustration of this - one I have seen in one form or another - is that we settle for the first solution without understanding (or trying to understand) the root cause. There is no guarantee that fixing a symptom will make the problem better. Many times the problem improves for a while, but then re-occurs in another form. Now there is a "new" problem to solve, which usually has the same (or a similar) root cause, so from the system perspective it's ineffective [notes n3].

Ineffective, because problems in processes and organisations are often non-linear, yet we tend to treat them as "linear" problems, and...


Linear Filter
I expanded: I think of this, from a systems thinking perspective, as applying a "linear filter" to a "non-linear system".

What? (I was asked again) Linear vs Non-Linear?
Non-linear -> multiple interacting causes affect the output; linear -> the output is directly proportional to the input. The application here is that there are usually multiple causes for a problem -> when I perform an assessment after a root cause analysis (RCA) activity, I take them root cause by root cause, in the order we think will have the biggest impact on improvement.
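
As a rough, artificial sketch of the difference - the functions below are invented purely to illustrate the terms and model nothing from the example that follows - a linear problem responds in proportion to the effort spent on one cause, while a non-linear one includes interaction between causes:

    # Artificial illustration of linear vs. non-linear responses.
    # Lower "symptom level" is better; all coefficients are made up.

    def linear_symptom(effort_a):
        # Output directly proportional to the input: each unit of effort on
        # cause A removes a fixed amount of the symptom.
        return 100 - 2 * effort_a

    def non_linear_symptom(effort_a, effort_b):
        # Multiple interacting causes: the benefit of working on cause A
        # depends on how much cause B has been addressed (interaction term).
        return 100 - 2 * effort_a - 1 * effort_b - 0.05 * effort_a * effort_b

    print(linear_symptom(10))           # 80   - proportional improvement
    print(non_linear_symptom(10, 0))    # 80.0 - addressing cause A alone
    print(non_linear_symptom(0, 10))    # 90.0 - addressing cause B alone
    print(non_linear_symptom(10, 10))   # 65.0 - together: more improvement than
                                        #        the two separate effects add up to

Applying a "linear filter" to the non-linear case means fixing one visible cause and expecting a proportional, lasting improvement - which is exactly where the trap lies.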

Ok, time for a....

A Real-life example
A fault (bug) report was written by a customer -> initial RCA shows that some "test" was not executed that would (in theory) have caught the problem. This is a symptom and a "linear" view of the problem. The "linear filter trap" is to then consider this as the root (or most important) problem. [notes n6]

Digging deeper shows that the team had its priorities changed during execution, to make an early drop (resulting in some negative and alternative use-cases being delivered later). This, in itself, is not a problem - but the communication associated with the "early drop" didn't reflect the change.

In this case, some of the underlying problems were a set of communication issues:

  • Partly within the team, connected with their story [notes n4] (especially their testing silent evidence [notes n5]), 
  • Partly with the stakeholder who changed the priorities, and who may have had a duty to follow the change of expectations through the delivery chain and consider what that might mean at the different stages, and 
  • Partly in the communication with the customer, to ensure they were aware of the staged delivery and what it means to them.

Another example of a root cause analysis can be found in ref [3].

And finally
Tackling and fixing symptoms is a very natural activity - very human. But it is not always enough. Sometimes it is enough - it depends, of course, on the scale of the problem and on the cost of investigating the underlying problems and tackling those. Sometimes the underlying problems cannot be fixed, and it is sufficient to ease the symptoms.

But I believe that, as good testers, it is important to understand the difference between symptoms and root causes, especially where it affects either the testing we do or the perception of the testing we do. This is particularly important where there is a perception that "testing or testers missed something"... So,

Be aware of the linear filter trap!

Notes
[n1] In philosophy and sociology, root causes and symptoms are usually referred to as distal and proximate causes; see ref [1] for more background.

[n2] Slightly naughty of me, playing on the fact that most people don't like to think that they are fooling themselves, but that's a different story...

[n3] The times when it might be effective are when we (some stakeholder) are prepared to take the cost of fixing the problem now. A complication here is that project-driven stakeholders have, by the nature of the task, a propensity to see only as far as the end of the project. A product owner may have a different perspective - bear this in mind when someone is deciding whether it's a project problem or a product problem - or even a line organisation problem.

[n4] Story here means the story about the product and the story about the testing of the product.

[n5] Here, testing silent evidence refers to the elements not tested and thus not reported - their significance is assumed to be unimportant. For further background see ref [4].

-->edit-->
[n6] I should add that the problem with the trap in this example is that I have seen it trigger one of two responses in the past: (1) a perception that the testers are at fault, which becomes a myth/rumour with a life of its own; (2) a knee-jerk reaction to implement some extra oversight of the test discipline or the team as a whole -> in the worst case this becomes a desire to introduce an additional "quality gate". This is a good example of where reacting to the perceived symptom is both ineffective and counter-productive for the organisation.

References
[1] Wikipedia: Proximate and ultimate causation
[2] The Tester's Headache: Best Practices - Smoke and Mirrors or Cognitive Fluency?
[3] The Tester's Headache: Problem Analysis - Mind Maps and Thinking
[4] The Tester's Headache: Mind The Information Gap