Reader Gianni Pucciani has another good question in his review of the Advanced Software Testing: Volume 2 book. He asks:
My doubt is on question 4 of chapter 10 (People Skills and Team Composition).
The correct answer is B, and I had chosen B by excluding all the others which were for sure wrong.
However, my question is: how do you know that your team found 90% of defects by the time you need to give bonuses?
You know for sure the number of defects found prior to release, but how do you know the total number of defects if not after an agreed period (1 year?) of production use?
How would you implement this approach in a real life situation?
Here's the question from the book:
You are a test manager in charge of system testing on a project to update a cruise-control module for a new model of a car. The goal of the cruise-control software update is to make the car more fuel efficient. Assume that management has granted you the time, people, and resources required for your test effort, based on your estimate. Which of the following is an example of a motivational technique for testers that will work properly and is based on the concept of adequate rewards as discussed in the Advanced syllabus?
A. Bonuses for the test team based on improving fuel efficiency by 20% or more
B. Bonuses for the test team based on detecting 90% of defects prior to release
C. Bonuses for individual testers based on finding the largest number of defects
D. Criticism of individual testers at team meetings when someone makes a mistake
Gianni is of course right: the answer is B. He is also right that some lag time after release is required before you can calculate the defect detection effectiveness. Defect detection effectiveness is calculated as
DDE = (defects detected)/(defects present).
In the case of the final stage of testing, you can calculate this as
DDE = (defects detected in testing)/(defects detected in testing + defects detected in production).
The denominator of that equation is a reasonably good approximation for "defects present" if you wait long enough.
So, how long is "long enough"? Most of our clients find that they can determine the typical period of time in which 90% of the defects will be reported on a given release, usually through analysis of the field failure information. In some organizations, this is as short as 30 days, though 90 days seems a more typical number.
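As a minimal sketch of the calculation above, here is the final-test-stage formula in Python. The function name and the defect counts are hypothetical, chosen only for illustration; the production count is assumed to be taken after the waiting period has elapsed.

```python
def defect_detection_effectiveness(found_in_testing: int, found_in_production: int) -> float:
    """DDE = defects found in testing / (defects found in testing + in production)."""
    total = found_in_testing + found_in_production
    if total == 0:
        raise ValueError("no defects recorded, so DDE is undefined")
    return found_in_testing / total


# Hypothetical counts, taken once the post-release waiting period is over.
dde = defect_detection_effectiveness(found_in_testing=450, found_in_production=50)
print(f"DDE = {dde:.0%}")  # prints "DDE = 90%"
```

If the production count is taken too early, the denominator is too small and the result overstates the team's effectiveness, which is why the waiting period matters.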
In the last decade, outsourcing became a powerful force in the software industry. Motivations behind outsourcing vary, but the reason our clients mention most is that of cost savings. Unfortunately, all too often our clients also mention that previous attempts at outsourcing failed to deliver the desired efficiencies, or perhaps failed to deliver anything at all.
So, is outsourcing some siren on rocky project shores, luring to doom the captains of IT who dare to listen to the siren’s song? Not at all, but outsourcing is not without its risks. Over the last twenty years, I’ve worked on both sides of the outsourced IT relationship, and have seen it work. Let’s examine what successful outsourced efforts have in common.
Successful outsourcing involves planning and handling the unique logistical details of outsourcing. For example, e-mail and intranet communication, synchronized software lifecycles, procedures for file transfer, effective configuration management, support for development, test, and staging environments, sufficient test data, common tool usage, and compliance with applicable standards are necessary for success on many software projects. Project teams must understand the tactical details of how the work will get done, day-by-day, person-by-person, and resolve in advance any logistical obstacles that could occur. Good project logistics are like air and water: You don’t notice them until they’re bad or, worse yet, completely missing. However, because outsourcing logistics are complex and often span organizational areas of responsibility (or even fall into gaps between areas of responsibility), problems happen often and cause many outsourcing difficulties and failures.
Successful outsourcing also involves good working relationships with mutual trust and open communication. Studies show that simply locating people on separate floors in the same building can dramatically reduce communication and relationship building. Having people located thousands of miles and half a dozen or more time zones away is even harder on relationship building and maintenance. However, successful outsourcing requires that people actively nurture good working relationships across the organizational and geographical boundaries. If relationships are weak, trust is missing, and communication is infrequent, every project challenge becomes harder to deal with. In the long term, relationships sour and morale suffers. In addition to creating an emotionally unpleasant working situation for everyone, this causes quality and efficiency both to decrease.
Successful outsourcing requires understanding what CMMI does—and doesn’t—tell you about an outsource vendor’s capabilities. Properly applied, CMMI will lead to more orderly, consistent practices, which can increase quality and efficiency. We have clients who use CMMI to improve their processes, reduce costs, and deliver better software. That said, the jury is still out on whether there is a statistically valid and reliable correlation between CMMI levels and: 1) the cost per delivered KLOC (or Function Point); or, 2) the reliability or defect density of the delivered software. If that seems to contradict what I said about some of our clients, my point is that it is a logical fallacy to say that, since some companies have success with CMMI, therefore every company that achieves a high level of CMMI maturity will produce better, cheaper software than another company with a lower level of maturity. In addition, even Bill Curtis of the Software Engineering Institute, one of the fathers of CMMI, admitted (at the ASM/SM 2002 conference) that, when used purely as a marketing device, CMMI does not significantly improve quality or efficiency. So, if an organization says they are CMMI accredited (at whatever level), dig further to see exactly what that means in terms of their daily practices, and look at solid metrics for efficiency and quality.
This brings us to the final factor for successful outsourcing: selecting the right outsource service provider. As I mentioned above, just looking at CMMI levels won’t suffice, but even if you satisfy yourself that a vendor is mature, efficient, and delivering quality, remember, as an investment prospectus would say, past results are not necessarily an indicator of future performance. In other words, just because a vendor has had good results on past projects doesn’t mean they will succeed on your projects. Here are some other questions to consider:
If all this seems difficult and complex, keep two things in mind. First, even relatively small projects can have significant costs, especially opportunity costs, if they fail, so outsourcing is always a decision to be made with care. Second, in most cases the real efficiencies of outsourcing will only kick in after a few projects, so organizing for outsourcing success is worth doing well, because, if you do it right, you only have to do it once. Once you have established a successful working relationship with an outsourcing vendor, you will find yourself reaping the benefits, project after project.
Many of us got into the computer business because we were fascinated by the prospect of using computers to build better ways to get work done. (That and the almost magical way we could command a complex machine to do something simply through the force of words coming off our fingers, into a keyboard, and onto a screen.) Ultimately, those of us who consider ourselves software engineers, like all engineers, are in the business of building useful things.
Of course, engineers need tools. Civil engineers have dump trucks, trenching machines, and graders. Mechanical engineers have CAD/CAM software. And we have integrated development environments (IDEs), configuration management tools, automated unit testing and functional regression testing tools, and more. Many great software testing tools are available, and some of them are even free. But just because you can get a tool doesn’t mean that you need the tool.
When you get beyond the geek-factor on some tool, you come to the practical questions: What is the business case for using a tool? There are so many options, but how do I pick one? How should I introduce and deploy the tool? How can I measure the return on investment for the tool? This article will help you uncover answers to these questions as you contemplate tools.
Let’s start with the business case. Remember: without a business case, it’s not a tool, it’s a toy. Often, the business case comes down to one or more of the following:
There can be other business cases, but one or more of these will frequently apply. Sometimes the business case masquerades as something else, such as improving consistency of tasks or reducing repetitive work, but notice that these two are actually the first and last bullet items above, respectively, if you consider them carefully.
Once you’ve established a business case, you can select a tool. With the internet, it is easy to find candidate tools. Before you start that, consider the fact that you are going to live with the tool you select for a long time—if it works—and potentially spend a lot of money on it. So, I recommend that you consider tool selection as a special project, and manage it that way. Form a team to carry out a tool selection. Identify requirements, constraints, and limitations. At this point, start searching the Internet to prepare an inventory of suitable tools. If you can’t find any, then perhaps you can find some open source or freeware constituent pieces that could be used to build the tool you need? Assuming you do find some candidate tools, you should perform an evaluation and, ideally, have a proof-of-concept with your actual business problem. (Remember, the vendor’s demo will always work, but you don’t learn much from a demo about how the tool will solve your problems.) With that information in hand, you’re ready to choose a tool.
Once you’ve chosen the tool, it’s time to pilot the tool and then deploy it. In the pilot, select a project that can absorb the risk associated with the piloting of a tool. Your goals for the pilot should include the following:
Based on what you learned from the pilot, you’ll want to make some adjustments. Once those adjustments are in place, you’ll want to proceed to deployment of the tool. Here are some important ideas to remember for deployment:
Finally, let’s address this question of return on investment (ROI). For process improvements (including introduction of tools), we can define ROI as follows:
ROI = (net benefit of improvement)/(cost of improvement)
This question of net benefit returns us to where we started: business objectives. Any meaningful measure of return on investment has a strong relationship with the objectives initially established for the tool. Let’s look at an example. Suppose you have developers who currently use manual approaches for code integration and unit testing. This consumes 5,000 person-hours per year. With the tool, one developer will spend 50% of their time as integration/test toolsmith, using Hudson and other associated tools to automate the process. By doing so, developer effort for this process will shrink to 500 person-hours (plus the 50% of the person-year for the toolsmith). So, ROI is:
ROI = (net benefit of improvement)/(cost of improvement) = (5000 - (500 + 1000))/1000 = 350%
Notice that, in this case, since the tools are free, I did the calculation entirely using person hours. Sometimes, with commercial tools, you have to perform this whole calculation in dollars or whatever your local currency is.
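The arithmetic in this example can be sketched in a few lines of Python. The function name is hypothetical, and the 2,000-hour person-year is an assumption inferred from the example (50% of the toolsmith's year being counted as 1,000 hours).

```python
def roi(baseline_hours: float, remaining_hours: float, tool_hours: float) -> float:
    """ROI = (net benefit of improvement) / (cost of improvement), all in person-hours."""
    # Net benefit: the old effort minus everything still spent after the change.
    net_benefit = baseline_hours - (remaining_hours + tool_hours)
    return net_benefit / tool_hours


HOURS_PER_PERSON_YEAR = 2000  # assumption: implied by "50% of a person-year" = 1,000 hours
toolsmith_hours = 0.5 * HOURS_PER_PERSON_YEAR

result = roi(baseline_hours=5000, remaining_hours=500, tool_hours=toolsmith_hours)
print(f"ROI = {result:.0%}")  # prints "ROI = 350%"
```

For a commercial tool, the same function works if every input is first converted to money, with license and support fees added to the cost term.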
As software engineers, we want to build useful things, and tools can make us more effective and efficient in doing so. Before we start to use a tool, we should understand the business objectives the tool will promote. Understanding the business case will allow us to properly select a tool. With the tool selected we can then go through one or more pilot projects with the tool, followed by a wider deployment of the tool. As we deploy—and after we deploy—we should plan to measure the return on investment, based on the business case. By following this simple process, you can not only achieve success with tools—you can prove it, using solid ROI numbers.
A quick follow-up related to my earlier post on evidence. As some readers may know, avionics software that controls flight on airplanes (e.g., cockpit software) is subject to a test coverage standard, FAA DO-178B. That standard applies lower standards of test coverage to software that is not safety critical.
So far, so good.
Here's an example of why such standards are useful. During my flight from the US to China today, I managed to crash the entertainment software running at my seat not once but three times. I did this by pausing, rewinding, and resuming play when the flight attendants were taking my dinner orders (i.e., not by unusual actions). I was ultimately able to get it working again, thanks to a series of hard reboots by a flight attendant. One of my fellow passengers wasn't so lucky, as his system never recovered.
Okay, that's just entertainment, and anyone who travels regularly knows they should bring a book or plan to winnow down their sleep deprivation balance on long flights.
However, what if the flight control software were as easy to crash? Who would want to hear a cockpit announcement along the lines of the following: "Our entire flight control system just crashed. This enormous airliner is now essentially an unpowered and uncontrolled glider. We'll reboot the system until we get it working again, or until we have an uncontrolled encounter with terrain"?
Personally, I want people testing the more safety-critical aspects of avionics software to adhere to higher standards of coverage, and to be able to provide evidence of the same.
I received another interesting e-mail from a colleague a few weeks ago. Sorry about the delay in response, Simon, but here are my thoughts. First, Simon's e-mail:
I have been reading the Advanced Test Manager book & have been discussing the possibility of adopting an informal risk based approach in my test team, but I am encountering some resistance, which has also got me thinking. You have covered (in several places) the topic of gaps in risk analysis from a breadth point of view, but how about the issue of disparity in 'depth' for identified risk items? For example in your ‘Basic-Sumatra’ Spreadsheet there is a huge variation in depth between, for example the risk item ‘Can’t cancel incomplete actions using cancel or back.' (A functional item that has a risk score) and 'Regression of existing Speedy Writer features.' (This is also a functional item, but may constitute several hundred test cases).
In my case an experienced tester is against the idea of informal risk analysis due to the effort involved. The scenario is one where a regression 'plan' (set of test cases) is already in place for an enterprise scale solution with 10 main components deployable in both a Web & Windows client manner. So the usual regression test execution 'plan' requires executing a complex test procedure 10x2 times. In total there is several hundred test cases to execute (some components have approx 100 test cases).
When I suggested an informal (PRAM) style risk identification to each new project the response was:-
The effort of establishing such a 'test plan' seems to be enormous considering that the whole thing has to be performed per application component for each Win and Web client (i.e. 10 x 2 times). I estimate that the number of items requiring risk scoring will be approx 100 for each of the bigger components let alone the whole of the application.
In response to this I pointed out that we could have a 'coarse grained' risk item identification & score - perhaps 20 lines on the risk assessment spreadsheet- 1 for each component\deployment combination.
The response to that was:-
If each of these 20 lines has got an RPN and all the test cases assigned to it just inherited this RPN, this would mean that we would perform an 8 hour test on ‘Securities Win client’ before even beginning with the test of another component, which has got a lower RPN. Further, this could mean that low-priority components might not be tested at all in a tight time schedule. This cannot be the desired test procedure. It must be ensured that each component is at least tested basically on Win and Web … which would again lead us to scoring risk items at the test case level within each component for Windows and Web & that has the problem of the effort involved.
Do you have any suggestions for handling this depth of risk identification issue?
This is an important question, Simon, that brings up three important points.
First, the amount of effort invested must be considered. We usually find that the risk analysis can be completed within a week. The time involved depends on the approach used. If you use the group brainstorm approach, then each participant must invest an entire day, with the leader of the risk analysis typically investing a couple days in addition on preparation, creating the analysis, doing follow-up, etc. If you use the sequential interview approach, then each participant invests about three hours, with 90 minutes in the initial interview and 90 minutes in the review/approval process for the document, with the leader of the risk analysis again investing about three days of effort.
Second, the question of granularity of the risk analysis is also important. The granularity must be fine-grained enough to allow unambiguous assignment of likelihood and impact scores. However, if you get too fine-grained then the effort goes up to an unacceptable level. A proper balance must be struck.
Third, the question of whether we might not test certain important areas at all because they are seen as low risk is indeed a problem. What we typically suggest is what's called a "breadth-first" approach, which means that to some extent the risk-order execution of tests is modified to ensure that all major areas of the software are tested. These areas are tested in a risk-based fashion, but every area gets at least some amount of testing.
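One way to picture the breadth-first idea is as a two-pass ordering of the test set. The sketch below is only an illustration under stated assumptions, not a prescribed procedure: the function name, the test names, and the areas are hypothetical, and a higher risk score is assumed to mean higher priority. The first pass takes the riskiest test (or few tests) from every area; the second pass runs everything else in plain risk order.

```python
from collections import defaultdict


def breadth_first_order(tests, per_area=1):
    """Order (name, area, risk) tuples so every area is touched early.

    Assumption: a higher risk score means the test should run sooner.
    """
    by_risk = sorted(tests, key=lambda t: t[2], reverse=True)
    taken = defaultdict(int)  # how many tests each area has in the first pass
    first_pass, rest = [], []
    for test in by_risk:
        area = test[1]
        if taken[area] < per_area:
            taken[area] += 1
            first_pass.append(test)
        else:
            rest.append(test)
    # Both lists are already in descending risk order.
    return first_pass + rest


# Hypothetical test cases for three component/client combinations.
tests = [
    ("sec-win-1", "Securities/Win", 20),
    ("sec-win-2", "Securities/Win", 18),
    ("sec-web-1", "Securities/Web", 15),
    ("rep-win-1", "Reports/Win", 5),
    ("rep-win-2", "Reports/Win", 4),
]
order = [name for name, _, _ in breadth_first_order(tests)]
print(order)
# prints ['sec-win-1', 'sec-web-1', 'rep-win-1', 'sec-win-2', 'rep-win-2']
```

With a pure risk ordering, the low-risk "Reports/Win" area would run last and might be dropped under schedule pressure. Here it gets its most important test before the second Securities test starts, and the `per_area` parameter controls how deep that first pass goes.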
Many of these topics are addressed in the sequence of videos on risk based testing that you can find on our digital library. I'd encourage interested readers to take a look at those brief videos for more ideas on these topics.
I recently received an interesting e-mail from a colleague:
To Whom It May Concern-
Do you have any articles on the value of collecting/capturing detailed test evidence (e.g., screenshots attached to test cases)?
In my opinion, for mature systems with experienced, veteran testers, the need for an abundance of test evidence in the form of screenshots attached to test runs in QC is overkill and unnecessary, and it adds more time to release cycles. The justification for this is always "For Audit" as opposed to "Improves Quality". I looked in several articles on this fantastic site, and couldn't find anything pertaining to test evidence. Do you have any articles that provide evidence that an abundance of test evidence improves quality (even if it's just a correlation and not necessarily causation)?
We have clients that do need to retain such detailed software testing evidence; e.g., clients working in safety critical systems (such as medical systems) who must satisfy outside regulators that all necessary tests have been run and have passed. For them, retaining such evidence is a best practice, as not doing so can result in otherwise-worthy systems being barred from the market due to the lack of adequate paperwork.
As someone who relies on such systems to work--indeed, as we all do--I appreciate these regulations and would not want to see software held to a lesser standard. However, Erik makes a very valid point in terms of the trade-off. As time is spent on these audit-trail activities, that is time not spent doing other tasks that would perhaps result in a higher level of quality. Of course, these audit-trail activities are designed to ensure that all critical quality risks are addressed. So, the key question is how should organizations balance the risk of failing to test certain critical quality attributes against the reduction in breadth of quality risk coverage?
I'd be interested in hearing from other readers of this blog on their thoughts. Erik, if you have further comments on this matter, I'm sure the readers of this blog would benefit from those ideas, as this is clearly an important area to consider. I certainly agree it's an interesting topic for an article, and this blog discussion may well inspire me to collaborate with you and other respondents to write one.
I had an interesting set of questions from a reader arrive in my inbox today. I've interleaved my answers with his questions, with "RB:" in front.
Dear Mr. Black,
Would you please comment on the following three questions, or perhaps direct me to where I might gain some meaningful information that addresses them?
What is today’s trend in pricing for the software testing industry i.e. is it increasing, decreasing, stable, etc.?
RB: There certainly are what marketers refer to as "value customers" who make service purchase decisions solely on price, and these customers continue to drive down pricing on average. However, at the top end, especially for clients that need and value senior consultants, we have managed to resist that.
Is the service looked at as value added or a commodity, with pricing accordingly?
RB: For the "value customer" mentioned above, it's a commodity. For other customers, it's really a matter of doing a good job of connecting what is happening in testing with strategic business objectives. I talked about this in my chapter in the book, Beautiful Testing. To the extent that testing is very tactical and inward focused--especially when the focus is almost entirely on finding a large number of potentially unimportant bugs--it will be seen as a commodity.
Given that much of the labor is offshore in India and China, and subject to increase as these countries develop, will market be receptive to required price increases to allow a reasonable margin?
RB: The "value customer" will not be receptive to such price increases, because price is all that matters. The value customer will try to have their cake and eat it, too, by raising the minimum bar of qualifications while not allowing price to rise. Because there are billions of under-utilized human brains in the world, and because technology has almost eliminated barriers to entry for using those brains as commodity software testers, the value customer will get to have their cake and eat it, too.
SGS Consumer Testing Services
Randy, thanks for the questions. I talked about some matters relevant to these questions in my webinar on the Future of Test Management, which you can view here.
I'd be interested in other people's comments. What do you think about these questions?
One of the topics I find very interesting and useful for our clients is the proper use of metrics. We do a lot of metrics-related engagements, and in fact, this morning I'll be talking with a client about some US$ 100 million in defect-related waste that we've found in their software development process. I've written a lot on the topic, including in my books and in various articles.
Regular blog reader Gianni Pucciani asks an interesting metrics question in an e-mail:
The question is: how can you give a bonus to your test team, to motivate it, based on 90% of bugs found before the release to production date? How can you know that you found 90% of the bugs at the time you release the software?
Gianni is referring to a metric called defect detection effectiveness or defect detection percentage. This is a metric I've discussed quite a bit in my books, especially Managing the Testing Process.
Defect detection effectiveness is a very useful metric for measuring the effectiveness of a test process at defect detection. Most testing processes have defect detection as a primary objective, and we certainly should have effectiveness and efficiency metrics for objectives.
That said, it is a retrospective metric that can only be calculated some time after a release, if you intend to calculate it on a release-by-release basis. (Some of our clients calculate it on an annual basis, aggregating all their projects together, which also works.) It's typical to wait 90 days after a release to calculate defect detection effectiveness, though you really should verify what time period is required to have, say, 80% or more of the field-reported defects in hand.
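To verify the waiting period, one option is to look at an older release whose field reports have effectively stopped arriving and ask how many days it took to accumulate the target share of them. The following Python sketch assumes you can export, for each field-reported defect, the number of days after release on which it was reported; the function name and the sample data are hypothetical.

```python
import math


def days_to_reach(report_days, target=0.8):
    """Smallest number of days after release by which at least `target`
    (a fraction between 0 and 1) of the field-reported defects had arrived."""
    if not report_days:
        raise ValueError("no field-reported defects to analyze")
    ordered = sorted(report_days)
    # Number of reports needed to reach the target share; the small epsilon
    # guards against floating-point noise in the multiplication.
    needed = max(1, math.ceil(target * len(ordered) - 1e-9))
    return ordered[needed - 1]


# Hypothetical data: days after release on which each field defect was reported.
days = [2, 5, 9, 14, 21, 30, 45, 60, 88, 150]
print(days_to_reach(days, target=0.8))  # prints 60
```

On this made-up data, 80% of the field reports were in hand by day 60, so a 90-day wait would be comfortably long enough for that release.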
I could go on for days about this metric, but, since it's a blog and since Gianni asked a specific question, I'll address the other point he brought up, which is the use of this metric for bonuses. Defect detection effectiveness is a process metric, which is not the same as a metric of individual or collective performance. Many things are required to enable good defect detection effectiveness, including good testers, and many things can reduce defect detection effectiveness, some of which are beyond the control of testers. I'd encourage a web search on the string "Deming red bead black bead experiment" for a discussion on the risks of rewarding or punishing based on metrics that might not be entirely in the individuals' control.
In addition, while defect detection is typically a primary objective of testing, it's not the only objective, and defect detection effectiveness is only an effectiveness metric. It doesn't measure the efficiency or the elegance with which the test team detects defects. A test process should have a fully articulated set of objectives, with effectiveness, efficiency, and (ideally) elegance metrics for each objective, rather than a single unidimensional metric by which it is measured.
For further information on defect detection effectiveness, I'd refer people to my book Managing the Testing Process, 3e. My colleague Capers Jones also contributed an article to our web site on a couple related defect metrics that readers might find interesting.
Reader Gianni Pucciani has another good question about the Advanced Software Testing: Volume 2 book. Specifically, he's concerned with question 2 from Chapter 8:
Which of the following is a best practice for retrospective meetings that will lead to process improvement?
A. Ensuring management commitment to implement improvements
B. Allowing retrospective participants to rely exclusively on subjective assessment
C. Requiring that every project include a retrospective meeting in its closure activities
D. Prohibiting any management staff from attending the retrospective meeting
Gianni writes, "I had marked A, but also C. Where is the mistake? I have a feeling on it, but I would like you to confirm. Is C not correct because it is an organizational best practice, and not a best practice for retrospective meetings. A logic trick basically :), is that correct?"
Actually, Gianni, the reason C is not correct is that merely having retrospectives does not guarantee process improvements. In fact, I've encountered a few situations where organizations were good about having retrospectives, but not so good about management commitment, and thus no improvements occurred.
The Financial Times today featured an article on how a software bug--abysmally handled--in a financial application cost the company US$ 242,000,000:
Because I don't know how long that link will live, here's the summary.
Axa Rosenberg Group had some quantitative analysis software that it used to service its clients' accounts. Axa Rosenberg Group manages money for other people, and the software is an internal application, albeit one they touted as a key differentiator, apparently--and indeed it did turn out to be, though not in a happy way.
The software had a bug that disabled a key risk-management component of the software, which was released to production in 2007. Apparently management found out about the bug in November 2009. However, rather than fix the problem, they tried to cover up the reasons for the poor performance of their funds.
Over one third of their customers were affected by the bug.
A wee bit of analysis from yours truly: I have clients in the financial world, and I know how hard it can be to test these kinds of applications. When a calculation is wrong, it can be wrong in a way that is beyond the ability of a human tester to detect. However, Axa Rosenberg Group's handling of the bug after they found out about it is truly a textbook illustration of how not to handle a software quality problem.