Failure Propagation

This year our customer wanted some complicated new changes to our system. The boss put me on the job. We actually have a requirements analysis team that is tasked with gathering and documenting the requirements.

As with all complex changes, the original requirements documentation did not contain enough information. Since we had a team that handled this, I peppered them with all sorts of questions to figure out exactly what the users wanted. They were very responsive in getting answers for me from the customer, which is no small feat. Unfortunately, it seems that not all of the answers got recorded in the documentation. I did not care too much, since I got all the answers I needed.

I went off and coded a solution. I ran a bunch of unit tests, and had to manually set up a lot of data to test the different scenarios. When my code passed all my unit tests, I shipped it to our in-house test team. This is where we started to have some problems. The test team could not decipher the requirements documentation. So they just came over and asked me what the software was supposed to do. Then they asked how I performed my unit tests. Then the test team proceeded to run the same unit tests and counted them as their independent tests.

You can see where this is going. But wait. There is more. After our internal test team passed my software, it got delivered to the customer. The customers themselves have a huge acceptance test team. I got deja vu once I got the call from the customer acceptance test team. They too could not decipher the requirements for my changes. So I told them the same thing I told our own internal test team. I explained the requirements as I understood them from the numerous questions I had posed and the answers I had gotten back. I also answered the acceptance test team's questions on how I went about unit testing my changes.

So it came time for the real users to experience my new changes. And it turned out that I had made a coding error in the middle of my changes. I had made some assumptions about other parts of the system that were not accurate. The result was that my processing did not work according to plan. My unit tests masked this error, since I set up the test data the way I assumed the other parts of the system would produce it for real. It turns out that our internal test team and the customer acceptance test team did the same. These multiple levels of supposedly independent tests did not catch the problem.
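
To make the masking effect concrete, here is a minimal Python sketch. It is not the code from this project. Every name in it (upstream_export, count_active, the status field) is invented for illustration, and the lowercase-versus-uppercase mismatch is just a stand-in for whatever the real wrong assumption was. The point is only that hand-built test data repeats the coder's assumption, while data produced by the real upstream step does not.

```python
# Hypothetical illustration only: all names and data here are invented.

def upstream_export(record):
    """Stand-in for the other part of the system; it really emits lowercase status."""
    return {"id": record["id"], "status": record["status"].lower()}


def count_active(rows):
    """The new change: wrongly assumes upstream sends status as 'ACTIVE'."""
    return sum(1 for row in rows if row["status"] == "ACTIVE")


def hand_built_rows():
    """Test data set up by hand, shaped the way the coder assumed upstream works."""
    return [{"id": 1, "status": "ACTIVE"}, {"id": 2, "status": "CLOSED"}]


def rows_from_upstream():
    """Test data produced by actually running the upstream step."""
    raw = [{"id": 1, "status": "Active"}, {"id": 2, "status": "Closed"}]
    return [upstream_export(r) for r in raw]


# Looks correct, but only because the fixture repeats the code's own assumption.
masked_result = count_active(hand_built_rows())

# One record is active, yet the count is wrong once real upstream output is used.
exposed_result = count_active(rows_from_upstream())

print(masked_result, exposed_result)
```

If every test team copies the hand-built data, every level of testing sees the first result and never the second.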

I will admit that I was part of the problem here. It is tough to turn down requests for information from overworked testers who do not have what they need to do their job. However, the true solution is not to provide them with details on how I do my job. I think we need to get to the root cause of why they do not have sufficient information, and correct that problem. Then we can avoid situations like the one we are in now. The real gauge of whether we learned our lesson is how these test teams handle testing my latest changes to fix the problem. Are they just going to come to me and ask what went wrong and how I fixed it? If they once again just repeat my unit tests, we will have learned nothing. Let's hope we can be strong here.