I took part in a rather sad conference call today with our customer. It might be the last one we do on this contract. Another contractor has won the maintenance contractor. So this is our last week. That was not the reason for the sadness however. It was the fact that I had to project that my team was most likely not going to fix any of the remaining software defects that had been reported by the customer.
The real shame is that the next contractor will require a lot of ramp up time to set up their infrastructure, learn the business rules, and make any progress towards fixing problems. So I could hear the desperation in our customer's voice. I piped in that we would continue to try to research and resolve the problems. But given the progress rate, and the difficulty of the remaining issues, it was counterproductive to raise any body's hopes. After this depressing conference call I went out to an expensive steak house and chowed down. I needed a little break.
When I got back to work, I eliminated most of the distractions. And then I got down to work. The problem I was working on was most challenging. The users were stating that the system was not behaving properly. Work was all queuing up for just one user. I identified a root cause of the problem being the actions of one particular user. So I had a sys admin go find out what the user was doing. The user reported that they were following the correct procedure. So I took that at face value. But as I went through the code again and again, both running it and inspecting the heck out of it, something was not adding up. The results they were getting could not have been produced by the code I was looking at. I even got a DBA to pull the version of all the PL/SQL code compiled into our Oracle database. Still no luck.
A key breakthrough happened when the sys admin said that whenever she looked over the shoulders of the user in question, the user followed the correct procedures, and there was never a problem. That made me retrace a key assumption I had been making all along - that the user was following the correct procedure. Silly me. Then I started running a barrage of tests, simulating all kinds of bad behavior by the user. I was finally able to recreate the problem. Then I narrowed down the actions to the minimal set of steps to recreate the problem. Armed with this most valuable information, I traced down to the code that was causing the problem. Yes the user was not following the correct procedure. But our software should handle this in an appropriate manner and not degrade the system work flow.
Some analysis revealed to me that this problem was introduced in the past year when some business rules were changed. The problem was that the rules were not uniformly changed in all the paths through the code. I got the approval to push out the software fix to our users. Normally I go through our normal procedure and get my code changes peer reviewed. But we are at the end of our contract and are literally locking down our source code repository. So I opted to schedule a peer review after the fact. I went ahead and initiated a software release with the bug that I had fixed. But I made sure I did not cut any corners during my unit tests. I documented everything. Made sure I did some critical regression testing. I have a feeling that our customer is going to be happy that they are getting one last release from us that fixes at least one of their problems.
Mysterious Double Instance Hampering Performance - I study the existing code base. Confer with a colleague. Then I determine the optimal plan to change the functionality to load only a slice of all the dat...