Maintenance Trouble

Our customer has a system acceptance test team. They reported a problem they found in our system. Our internal test team could not figure out what they were talking about. Unfortunately the responsibility to test our fix went to the newest guy on our internal test team. This guy had no chance of figuring out what to do. He sent me an email and left me a voice mail asking for help. All he could figure out was that the DBA Team had given them something to test. And he was truly clueless as to what to do next.

I don’t want to spend my life on this project doing other people’s jobs. So I knew I should not just give the tester all the answers. I recommended he start at the beginning and see if he could understand and duplicate the problem that the system acceptance team found. He said he tried running the applications, but could not see where they were finding any discrepancies. I proceeded to spend the next hour or two going over how the system works, how it is supposed to work, and how he could experience the problem in his own environment.

Previously I had assigned this problem to the DBA Team. Their yearly process had deleted some data that was required to be kept around for a couple years. The DBA Team lead told me he was never informed of that requirement. I responded that this was his official notification. As he got into working the solution for the problem, he realized that the fix to restore the data was a programming nightmare. I would have loved to have written a bunch of PL/SQL code to do the work. However I was tied up with other duties.

Later the DBA Team lead came up with a solution that would make his job easier. However it would require adding some new tables to the schema, and also some changes on the application development side of the house. He asked me to negotiate the new tables with the data architect on the project. So I called her up and let her know the situation, and where we wanted to go with the solution. She had some suggestions but was quite flexible. Then I came in and wrote some new code in one of our PL/SQL packages. That was really fun. I updated a couple database triggers, and gave the code to the DBA to promote.

Here is what I have learned from this problem. You really need to know the business of the system to understand the complex issues. Most of the work in resolving system problems at this level does not involve sitting down and writing code. I have some other lessons learned about database design. However I will save those for a future post.

Fixing the Release


Last week we had a big software release due. The build went to our internal test by the end of the week. The test team found one application blowing up due to an Oracle exception. I told our team lead that we had better get to the bottom of this problem. The lead thought that this was just some discrepancy between the expected and actual database version. He thought a database change would resolve the problem. With that in mind, I left for the weekend.

It turns out the database change did not fix the problem. A bunch of people on the team got together late Friday night to try to figure out the problem. They left me some emails and voice messages. But by that time I was long gone. When I got back to work on Monday, we were in a state of emergency. My team lead said our company was losing money because we were late on the software release.

Apparently my team lead had spent the weekend trying to determine the cause of the problem. He still thought it had something to do with recent database changes. That did not seem encouraging. In software development you cannot think. You have to know. Thus we were nowhere with the problem. I got assigned the task to figure this out. I tried to duplicate the problem by running equivalent SQL against the database. But I had no luck.

I started applying my normal techniques. The next thing I tried was to run the application against an old version of the database. It had the same problems. At that point I eliminated any new database changes as the issue. Finally I started reviewing the history of the files that had the code that was crashing. A developer recently tried to fix a problem in that file. I rolled back those changes and found the source of the problem. At that point we were able to continue with the software release.

It turned out we were only one day late. That is still not a good thing. I did not lose any sleep over this problem. What could we have done to avoid this in the first place? We could have eliminated rushed last minute changes to the application. Or at least we could have run sufficient regression testing on the late changes. Better yet we should have done a better peer review on those changes. Let’s see if our project learns anything from these mistakes.

Stumbling Blocks

A production problem got assigned to me. The fix was due in three weeks. It was a bug that was hard to analyze. A big problem I had was existing commitments of my time. Currently I spend about a third of my time helping our requirements team. I am also supposed to spend half of my time helping another development team. Normally I also find myself spending a third of my time dealing with the emergency of the day. These tasks alone overbook me.

So if I were to do all that I am currently asked, I would be spending well over 40 hours a week meeting these duties. Now I am tasked with an extra difficult problem to solve. Once you look at it in this light, you can see why three weeks is not enough. If this were a trivial problem, I could knock it out real quick and there would not be any fuss. But this one I got assigned is no simple case.
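Here is a rough sketch of that arithmetic. The fractions are the ones I listed above. The 40 hour baseline and the variable names are my own assumptions for illustration.

```python
from fractions import Fraction

# Shares of my time already promised, as described above.
commitments = {
    "requirements team": Fraction(1, 3),
    "other development team": Fraction(1, 2),
    "emergency of the day": Fraction(1, 3),
}

BASE_WEEK_HOURS = 40  # assumption: a nominal 40 hour week

# Add up the promised shares and convert them to hours.
total_share = sum(commitments.values())
hours_needed = float(total_share * BASE_WEEK_HOURS)

print(f"Committed share of my week: {float(total_share):.0%}")
print(f"Hours needed before any bug work: {hours_needed:.1f}")
```

The shares add up to seven sixths of a week. That is roughly 47 hours before I spend a single minute on the new bug.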

Here is a review of the difficulties I encountered researching this problem. I could not build a debug or release version of the application that is having the problem. After some research I found that this project assumes that I also have the source code for another project on my system. Then I found the application was throwing all kinds of assertions when I ran it. Some more research showed the test data in my development environment was not good.

Finally I got to the heart of the problem. There were two sets of nearly identical code. The production release was using one version of the code which had the problem. The development release was using another version of the code that had this problem fixed. So in this case, we need to make sure our configuration management is up to snuff to avoid more problems like this. However my real beef is that I was over committed by management.

How did I resolve my problem? I told my boss that I was over committed. And I pitched some alternatives to resolve the problem. I said that they should stop loaning me out to other projects when I am too busy with my own. Another tough decision was to stop assisting our requirements team. That will cause some long term pain as the requirements will be no good. I also need to find good ways to stop being tricked into spending time on requirements. But that is a story for a future post.

Software Engineering Stats

I have read about some interesting statistics related to software engineering. The exact numbers are not of extreme importance. However the trends themselves were eye opening. Some numbers are not what you would think. I am going to go over some of them here to provoke thought.

The first is one I have heard before. A good programmer is 30 times better than a mediocre one. That seems massive. But I know there is a huge gap between the great programmers and the average ones. There is an ongoing debate over how great this difference is. This is just saying that it is huge (30 times equates to 3000 percent).

In comparison to great developers, great tools only provide a 5 to 30 percent increase in developer productivity. There is a lot of hype about how much of a productivity gain a tool provides. The root source of this hype is the companies that are selling the tools. That would be expected. What is not normally discussed is that there is an initial decrease in productivity when developers learn how to use a tool.

Software maintenance takes up 40 to 80 percent of the cost for a project. However this maintenance is not all just fixing bugs. As much as 60 percent of this maintenance is for enhancements to the code.
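Combining those two figures gives a feel for how much of the whole budget goes to enhancements. This is only a sketch. It uses the percentages quoted above, and the variable and function names are my own.

```python
# Figures quoted above, expressed as fractions.
maintenance_share_low, maintenance_share_high = 0.40, 0.80
enhancement_share_of_maintenance = 0.60  # "as much as", so an upper bound


def enhancement_share_of_total(maintenance_share):
    """Share of total project cost spent on enhancements."""
    return maintenance_share * enhancement_share_of_maintenance


low = enhancement_share_of_total(maintenance_share_low)
high = enhancement_share_of_total(maintenance_share_high)
print(f"Enhancements: up to {low:.0%} to {high:.0%} of total project cost")
```

In other words, enhancements alone could account for somewhere between a quarter and a half of what a project costs.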

There are many causes for runaway projects. Unstable requirements are one of the causes. Another cause is optimistic estimation. You are going to be in trouble if the estimates are provided by the management or marketing teams. Schedule pressure in general can spell doom for a project.

Developers do conduct unit testing. However they normally cover 60 percent of the possible paths at most. Code reviews can eliminate 90 percent of software errors. Unfortunately rigorous reviews are skipped by most developers.

Finally, a high order language can achieve 90 percent of the speed reached by pure assembly language. This assumes that you turn optimization on in the compiler. The moral is that you do not need to step down to assembly language for most applications to achieve good speed. Moreover, you can get great gains by choosing the correct design.

Code Review Tool

Previously I had watched a video about the code review tool used at Google. Recently I read a review of a number of open source code review tools. It mentioned the one used at Google. This has prompted me to want to do some more research. We used to have a lot of code reviews on my project. Right now we don’t really do them any more. Part of the problem may be that it is a highly manual process. An easy to use review tool may help us get back on track.

The Google tool is called Rietveld. It was written by Guido van Rossum. Of course Guido chose to write this in Python. You only need a Google account to use this tool. But it is implemented to work for the Subversion code repository. The review said that this was a bare bones tool. I would not mind using the Google tool. However we don’t use Subversion here on my project.

My reservations have been nullified by another tool called Code Striker. It was created by David Sitsky. This tool was written in Perl. It is also open source, and it is a web application. The beauty of this tool is that it works with Clearcase. That is what we use on our project. It allows you to review code diffs. Like any good diff tools, it highlights differences in color.

The goal of Code Striker was to minimize paperwork done with reviews. It also has the benefit of recording comments and issues in a database. I checked out some screen shots online at the Code Striker web site. It was funny to see some sample comments on a fictitious review. Some guys were saying things like “excellent work”. Maybe the comments were directed towards the Code Striker tool.

There are some other open source alternatives in the code review market. However I think I am going to propose we start using Code Striker on my project. I will let you know how it goes.

Developer DBAs

A magazine I am starting to read more and more is Redmond News. It reminds me of the days when I was trying to keep up with the rapid pace of Microsoft technologies being released. The last issue discussed how normal developers are taking on database administrator roles. This is in addition to their normal development duties.

The article did agree that DBAs are still required in the enterprise. However there is a general drive to do more with less. Agile developers are taking over traditional DBA roles. In addition, past DBAs are starting to do development work. This is because there are fewer and fewer hard core DBA jobs out there.

Microsoft is releasing the next version of SQL Server in the first half of 2010. It is code named Kilimanjaro. There has been some talk in the SQL Server community about the desire to integrate SQL Server management tools into Visual Studio. SQL Server 2005 had the Visual Studio shell for SQL Server management.

On our own team, we used to have a number of full time DBAs. Now we are down to just one. And he is a subcontractor. There are a couple other individuals who pitch in with DBA work on an as-needed basis.

Definitive Answer

Our team is considering an upgrade to our reports technology. We are still stuck using Oracle Reports 6i. This is a client server version of Oracle Reports that is getting really old. We want to move to a web based approach. To determine the benefit of this upgrade, our manager wanted to know how many of the 70+ reports in our system are actually being used by the end user.

So our manager called a meeting of the technical staff. One DBA said he thought the application may be logging this somewhere in the database. The reports developer said that we might be writing this information in a database event log table. Nobody could say for sure. That did not put the manager in a good position to speak intelligently with the customer.

When I got back from lunch, my manager asked me if we had the ability to see what reports the users were actually running. I told him it would take me a few minutes, but after that I could tell him for sure. I also thought we might audit this information. So I scanned all database tables until I found one that looked promising. I checked the code but found only one application actually used that audit table. Then I ran a test to verify that we are actually using that table for this application.

Then I went into each of the other applications, and found that one other application logs the reports run to a file on the local disk. The other application does not log the reports anywhere. I got back to my manager. He was excited to know that at least some of the reports get logged in the database. He had me run a query to find out exactly which reports were being run, and how often.
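The query itself is nothing fancy. Here is a minimal sketch of the idea using an in-memory SQLite database. The table name, column names, and sample rows are all made up for illustration. Our real audit table lives in Oracle and looks different.

```python
import sqlite3

# Hypothetical stand-in for the audit table. The names and rows below
# are invented for this example and do not come from our system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report_audit (report_name TEXT, run_date TEXT)")
conn.executemany(
    "INSERT INTO report_audit VALUES (?, ?)",
    [
        ("Monthly Summary", "2009-01-05"),
        ("Monthly Summary", "2009-02-03"),
        ("Exception List", "2009-01-17"),
    ],
)

# Count how often each report was run, most used first.
usage = conn.execute(
    """
    SELECT report_name, COUNT(*) AS times_run
    FROM report_audit
    GROUP BY report_name
    ORDER BY times_run DESC, report_name
    """
).fetchall()

for report_name, times_run in usage:
    print(f"{report_name}: {times_run}")

conn.close()
```

Keep in mind that a query like this only shows reports that were run at least once. Reports that never appear in the audit table have to be found by comparing the results against the full list of reports.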

It is a bit disturbing that many members of the technical team are unable to provide authoritative answers to questions about our system. These are people that have been working on the project for a long time. I know that our project is a bit complicated because it is big. However that is no excuse. Perhaps it is time for a chalk talk with the team to let them know how to fully research questions such as this. The first step will be for me to have them read this blog post. If you are one of my team reading this, it is time to gain the ability to perform due diligence.

So So Workers

I read an article in the Wall Street Journal entitled Slacker Nation. It discussed the new trend found in younger workers in Japan. They did not want promotions. Therefore they were skipping career advancement exams which were required to move ahead. This is an unusual development. Previously the Japanese were known as workaholics and wanted to get ahead.

The stereotype of a promotion in Japan was a higher salary and better title. However this often came with a requirement to work late daily. These days the promotions may not mean that much more in salary. Regardless of pay, many young workers do not want to get the promotion. As such they are not putting in a full effort at work.

Companies are trying to figure out this newer generation. They are calling this new breed of worker the “so-so folks”. These individuals want to forget about goals. They value staying true to yourself. They think that not everybody needs to be a leader.

I find this thinking quite entertaining. It seems nice that somebody has figured out that killing yourself at work is just not worth it. When I first got out of school I worked hard. Then I started burning out and got laid off. At that point, I felt just like these Japanese workers. I played a lot of video games. And I did not feel the need to jump back into the work force.

Hey. I know some people want extra responsibilities and more money. That’s fine for them. But can’t a less ambitious individual do his time at work and then go home? There is more to life than working. Perhaps the sooner we figure this out, the better off we will be in the long run.

Staying Focused

The customer had asked us to make a change to our system for next year. I had a lot of questions about the details of the change. I asked our requirements team a lot of questions. They did not have the answers, so they scheduled a meeting with the customer. A lot of people chimed in with their ideas before the meeting. Some people were trying to do some database design. I ignored most of this since we had not nailed down the customer requirements yet.

I was pleased when the technical advisor that works for the customer got on the conference call. He said we don’t have a lot of time in the schedule. And so he wanted us to make the changes, but to implement them all in code. He did not want any new database columns being created to hold intermediate values. He also did not want any existing database columns to be populated any differently.

During the meeting, the customer spelled out the requirements clearly. The technical advisor essentially made the impact to my team negligible. Only one back end developer needed to do the work. That’s the way project management should be. Without this guy’s direction, there would have been all kinds of database changes. And my team would have been required to implement them all. We are way too busy for any of that.

I am not sure what we are going to do when this customer technical advisor retires next year. Maybe we can hire him to work for our company. He consistently helps cut the overhead and wasted efforts from our software development schedule. He also has the clout to make his decisions stick.

Getting it Right

This past year our customer gave us a list of new features they needed in the applications. I was initially not a part of gathering the requirements since I had not joined the company yet. However a requirements analysis team had produced a requirements document. There was one requirement that was suspiciously vague. I asked the requirements team a bunch of questions about what was needed. They could not answer any of the questions definitively.

The requirements team set up a conference call with the customer. I voiced my concerns that we did not understand one of the requirements at all. It was a dreaded one liner that did not mean anything to development. The customer quickly explained what they needed. I then tried to confirm my understanding. They concurred this is what they wanted. I then proceeded to design and implement a solution.

After we had delivered our changes to internal test, we scheduled a design review with the customer. I walked them through most of the design for the new changes. When we got to the design for the requirement that had previously been a mystery to me, the customer said we got it wrong. I said that I was sure we followed what we discussed at the conference call where we clarified the requirement. The customer was adamant that this was nothing like what they had previously discussed before I joined the project.

Once again we had to go back to the drawing board. This time around, I told the requirements team that we needed to write everything down in detail about the new requirements. And we needed to get the customer to agree in advance before we spend a lot of time redesigning and rewriting the application. Another senior developer and I worked closely with the requirements team. We thought up all kinds of questions that we then proceeded to hash out with the customer. That’s what requirements gathering and analysis are all about.

Currently we have a draft set of new requirements for this piece. Development is going to wait until these detailed requirements get customer sign off. Then we need to figure out how to schedule all these changes this late in the year before we go live in production. That story is one for a subsequent post.

Tough Bug

I have been assigned a trouble ticket that has been extremely difficult to diagnose. This problem had been left over from the previous contractor that maintained the software. They were unable to resolve the problem. Now we were on the hook to fix it. Like most problems, I attacked this one head on by checking the production audits of the problem items. At first I could not make heads or tails of the situation.

In general I refuse to let any problems get the better of me. So I started looking more broadly at related items that we audited in production. Then I found something that was of interest. The user seemed to make an unexpected change in some of the data right before the problem happened. This was exciting. It seemed like this was the source of the problem. I tried to duplicate this problem in the development environment. I was disappointed to find out that I could not make the problem happen. Still I thought I had been on to something.

Sometimes you need to try out a couple things to get to the bottom of the matter. I decided to install the version of the application that the users were running. However I instead pointed the application to my development database. That’s when I first made the problem happen myself. Usually this is the point where a fix comes quickly. I was still perplexed why I could not make the problem happen when I used my debug version. Oh well. I tried a release version that I built. I still could not make this problem happen on anything other than our official release.

This troubled me. However like I said before, I am not a quitter. So I set up my virtual machine to be a build machine. I did a build just like our configuration management team does. Now I was going crazy, because my build would not make the problem happen. This is where I broke down and asked the configuration management team if I could borrow their build machine. Wouldn’t you know it? On their machine, the application uses a second copy of our code which is ever so slightly modified. The small modification was causing the problem.
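Once you suspect there are two copies of the same code, a quick diff settles it. Here is a small sketch using the difflib module from the Python standard library. The file names and code snippets are invented for illustration. They are not our actual source.

```python
import difflib

# Two made-up snippets standing in for the "same" source file as found
# in the development tree and on the build machine.
dev_copy = """\
total = 0
for item in items:
    total += item.amount
""".splitlines(keepends=True)

build_copy = """\
total = 0
for item in items:
    total += item.quantity
""".splitlines(keepends=True)

# Produce a unified diff; an empty result means the copies match.
diff = list(
    difflib.unified_diff(
        dev_copy, build_copy, fromfile="dev/calc.py", tofile="build/calc.py"
    )
)

print("".join(diff) if diff else "The two copies are identical.")
```

Running a comparison like this against every duplicated file would have pointed at the modified line long before I had to borrow the build machine.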

Development is certainly partially responsible for this. Why do we have two copies of the same code, but with subtle differences that cause bugs? I plan to get to the bottom of this. However I am now at the point where the fix is trivial. That is a bonus because this was causing me to lose sleep this weekend. I crush bugs. They don’t crush me.

Review Time

I read a reader entry on one of my favorite software blogs. It was entitled “Review Time Again”. The author had a performance review. And despite having a stellar year, he got a rating of meets expectations. His question is what’s a developer to do in such situations? The great part about this post was the great feedback from other readers.

There was a general sense that performance reviews are themselves bad. They pit employees against each other. Sometimes a little competition can be healthy. But if I need to do better than my coworkers to get that raise, why should I help anybody else? Right?

One reader pointed out that reviews are not about how much effort you have expended. They are, or should be, about what kind of results you have generated. Then again, another reader recommended you just threaten to quit your position. You might magically receive an “exceeds expectations” rating sooner than you think.

There are many times in the Dilbert world where the pointy haired boss does you wrong. However in the case of performance reviews, the boss may not have very much flexibility. The manager may have to grade on the curve. The majority of the team is expected to have a rating of meets expectations. Maybe 10% can get an exceeds expectations, while another 10% must get a poor rating.

One thing is for sure. Receiving a meets expectations rating hurts developer morale. And it could very well be the case you are working at a place where management is just bad. Sometimes you just need to move on. You often do get a pay raise when you jump ship. At other companies there is no set pay increase other than a cost of living adjustment each year.

The last idea, which is one that is implemented in my own company, is that of a 360 degree review. You don’t just get a review from your manager. You get it from your peers as well as people under you. I just received one of these. Or at least I was supposed to. I never got the results back from my manager. Am I unhappy? No. I was already told it would not impact my pay because I am new here. So there was really no point.