Utter Failure

We are getting our big releases ready for the next year. There are a total of six applications. Each has its own install. One application release got tripped up because its official number was revoked, and a new number was assigned to it.

One would think you could do a new build with the new official number, update some docs, and be on your way. No such luck here. The real crime is that the release went through peer review without catching any major problems.

Lucky for us, we have a diligent test team. They called me over many times today to point out the problems. I told them to just annotate all the issues. I dove into the situation when the release was returned to application development.

The major issue was that the build had failed. The build scripts just packaged up the old components. Everybody thought the build worked. We can fix this by designing and implementing better error checking in the build. But this was just the beginning of the problems with this release.
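A minimal sketch of the kind of error checking that would have caught this, written in Python rather than the real build scripts. It assumes the build records a start time before compiling and knows which components it expects to produce. The function names and the packaging hand-off are hypothetical.

```python
import os


def find_stale_artifacts(artifact_paths, build_start):
    """Return components that are missing or were last modified before this build began."""
    stale = []
    for path in artifact_paths:
        if not os.path.exists(path) or os.path.getmtime(path) < build_start:
            stale.append(path)
    return stale


def package_if_fresh(artifact_paths, build_start):
    """Refuse to package when any component was not rebuilt by this run."""
    stale = find_stale_artifacts(artifact_paths, build_start)
    if stale:
        raise RuntimeError("Build failed; stale or missing components: " + ", ".join(stale))
    # Only now is it safe to hand the list to the (hypothetical) packaging step.
    return list(artifact_paths)
```

The idea is to capture time.time() right before the compile step and pass it in as build_start. A leftover component from an earlier build then stops the packaging instead of quietly ending up in the new install.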

I do not know how to fix the true source for most of these problems. Perhaps we just need better people on the job. But you would think that you could put in a process that allows even mediocre employees to produce adequate software. Once again the test team saved the day by catching the problems before we shipped the junk out the door.

Falling Down

Last week I got a message from the testers that the application was broken. There seemed to be a lot of excitement over this problem. At first some people thought it might be due to missing database access on some objects.

I dug into the problem. Found that some new SQL in the application did an improper join. So now queries were always returning No Data Found. I provided the responsible developer with the code that would fix the problem. And I told them to release the fix to test.

The following week test reported that their progress was halted because they never received the fix. A manager asked if I could step in. So I searched for the release that fixed the problem. Turns out it was never done. Even worse, it was scheduled, and some Clearcase labels were applied to the code. But no executable. And no documentation.

The configuration management team had left for the day. So I resorted to emergency procedures. I generated a build myself. Created the documentation. Did a smoke test on the fix I had pointed out originally. Luckily there was one developer left at work for peer review. The show will go on. But something is seriously wrong here.

Password Mania

We have a couple changes that modify our system password policies to meet client requirements. One would think this is no big deal. The challenge is that we turned the code over to our Tiger Test Team.

I guess the testers never paid much attention to password change functionality. Well they are now. And they are finding all kinds of strange behavior. My guidance to them is to make sure they can replicate any problems. Then throw them over to development to resolve.

Our overall end goal is to (1) make sure we meet the client's requirements, and (2) ensure that error messages to the user are informative and useful. This is no small feat. It does not help that previously any error would result in the big dialog box that listed all the rules. I consider that poor.
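To illustrate the difference between one big dialog listing every rule and a message that names the actual problem, here is a rough sketch. The three rules and the function name are made up for illustration. They are not the client's real policy.

```python
def password_problems(password, min_length=8):
    """Return one message per rule the candidate password breaks (hypothetical rules)."""
    problems = []
    if len(password) < min_length:
        problems.append(f"Password must be at least {min_length} characters long.")
    if not any(ch.isdigit() for ch in password):
        problems.append("Password must contain at least one digit.")
    if not any(ch.isupper() for ch in password):
        problems.append("Password must contain at least one uppercase letter.")
    return problems
```

An empty list means the password passes. Otherwise the user sees only the rules they actually broke.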

I could go on about this subject. But right now I got a new trouble ticket based on a remote scenario. If your password expires, and my app forces you to change your password, but you type in your old password incorrectly, you get an unusual error message.

Revenge of the Build

We are trying to release new builds of all applications for the new year. For some reason they let the build guy take a vacation. And now we are falling behind schedule.

Some of our problems are expected. We just branched out in Clearcase for our baseline code. This version of code does not have any new features that are scheduled for a later release. Every time we branch out there are some build pains. It is problematic now because we did the branch right before the big release time.

I had a feeling things were falling apart when I kept getting calls from the configuration management guys. They seemed to keep getting confused about which branch they should be getting the code from. I tried to help out as best as I could. But CM is supposed to dictate this to me, not the other way around.

A number of other delays are due to improper setup. If you are a developer, and you need to do a build, you cannot wait until the day of the build to try it for the first time. I can guarantee that it is not going to work. For now I am sitting back. If I swoop in and solve all the problems, it will only encourage future problems and more work for me. You have to feel some pain before you can help prevent pain.

Almost Foiled

At this point we are supporting multiple baselines of functionality. We are about to deliver the most basic functionality to production. So we split out a special branch in Clearcase that does not have any of the latest changes we are working on.

To get ready for the release to production, we needed to create new build scripts which access the new Clearcase branch. You would think this was a simple task. It got assigned to the developer who created the build scripts in the first place.

I knew we were in for a rough ride when this person said their Label View in Clearcase was not working. Here is how our build works. It creates a Clearcase label, applies this label to the latest code, then modifies a special label view to see just the code that has the label. The build then uses this view to get a copy of the code to build.

The problem is that the configuration management team does our builds now. And an ambitious system administration team removes your Clearcase views if you do not access them every 30 days. Turns out our developer's view had been deactivated. Today is the day after Christmas. You think we can get this view problem fixed today? Unlikely.

Luckily I had an Outlook reminder that pops up each week. It tells me to run my "Keep Alive" program that touches each of my views. That way the sys admins do not delete the views I need. I was able to help the developer test out the changes to the script so we can build off the new branch. Saved.
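A keep-alive of this sort can be tiny. The sketch below is not the actual program. It assumes each view shows up as a directory on disk and that listing its root is enough to count as an access under the sys admins' policy. Both of those are assumptions.

```python
import os


def touch_views(view_roots):
    """List each view root so it registers as accessed; return the roots that could not be read."""
    unreachable = []
    for root in view_roots:
        try:
            os.listdir(root)  # the listing itself is the "touch"
        except OSError:
            unreachable.append(root)
    return unreachable
```

Anything that comes back in the unreachable list is a view that may already be gone, which is worth knowing before build day.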

Operator Error

A tester officially informed us that our latest release was rejected due to some problems. Another developer's instinct was that it was a database problem. We had the tester come over and run some queries. And yes, her database account could not even see some required database tables. This was a job for the DBA Team.

But another tester came over and said some problems he found were not fixed. I asked if these problems would prevent the release from going out. He said absolutely so. I got the feeling he wanted somebody to pay more attention to the minor bugs he found. But we were on a roll knocking out problems that day so I indulged him.

On one query screen he saw the word "Group". So he put the group number in that field. But the software did not find the record for that particular group number. I looked up the record in his database. Sure enough it was there. So I got a login to the test database and ran some queries with the app. Lucky for me the app dumps out the exact SQL executed to a log file. When entering the group number, it looked like the app was querying against the group name column. I dug into the resource and found the full word on the screen should have been "Group Name". The tester had a lower screen resolution so the word got clipped.

I went to the tester and explained that this was the Group Name field. When he put the group name in the field, the record was found. Chalk this one up to operator error. Lesson 1: run the application with the required screen resolution. Lesson 2: not all trouble tickets are mission critical.

Contract End

Another company has won the maintenance contract for my project. So in two months I may be out of a job. This has happened before. Last time I just moved from one company to another, staying on the same project. We shall see if this happens again this time.

Normally government contracts are steady work. The work is done by contractors that have long (~5 year) contracts. This leads to more stability for both the client and the contractors. However the contracts get put up for rebid approximately every 5 years as well.

Nobody can say for sure why my company has lost the recompete for the contract. Maybe our bid cost too much for the client. Or maybe it was because during the current contract we botched a port to a web version of our client server application. A buddy told me sometimes the government changes contractors just to shake things up. The bottom line is I need to make sure I still have a steady paycheck coming in 2 months from now.

HR Clusterfaq

My company has a policy where you get a prize for every 5 years you stay with the company. One of my coworkers said he got a FAX machine after his 5th year. That sounded useful to me.

Now I have been working for my company for 4 1/2 years. But I previously worked for the company in the 90s. So I found that I could bridge the experience to count towards benefits. However for each benefit I needed to fight with the Human Resources machine to make it happen.

Getting my 5 year gift was no different. I called up HR and submitted a call ticket. All I got was an e-mail stating I did not have 5 years and the ticket was closed. After replying that I had 7 years, just split between two jobs, I was informed I needed to submit another call ticket.

I should have known then it was going to be a rough ride. But I agreed to the pain, and submitted another ticket. Got a call saying my dates would be updated and the request forwarded the following Monday. Couple weeks later no gift. So I called the HR person back. Left a message. Waited a week. Repeated this for a few weeks until I got tired of this. Finally I got a call a month later saying my dates were updated and the request would be forwarded to the person who does 5 year gifts.

Once again I should have known I was getting the shaft. No responses to my e-mails. I called up the HR help desk and found my second call ticket got closed as well. The solution? Open a new call ticket. In programming we call this the infinite loop. I don't like to waste my Project Manager's time with stuff like this. But I was getting nowhere fast. He said he would call the local HR rep. No luck just yet.

In all of this, I have tried to walk away with some lessons learned. Most of our work consists of maintaining a suite of legacy software applications. I get assigned trouble tickets on my software all the time. Sometimes we developers are in a rush to close out these tickets. Sometimes the tickets get reassigned to other developers, teams, or even other projects. I need to stay aware of the non-service that my company HR team has given me. I try to be diligent, keep customers informed of trouble ticket progress, and work hard on the hand-off to others. This won't fix my HR problems. But it makes for a less disgruntled customer when the application breaks down on them. Hey. Somebody has to pay it forward.

Test Saves the Day

After a number of problems getting our FTP Server accessible, I turned our internal company test team loose on our software. Earlier I had provided details on all the software changes that were made.

I got a call from a tester about some strange behavior. This did not sound right. Unfortunately I was busy waiting for another call. So the tester volunteered to come to my desk to show me.

The first problem was simple. Although the software appeared to disable some fields at the wrong time, this was operator error. The tester was not carefully selecting the right choices in a drop down.

The second problem was more serious. The software would not behave the way I described in my change notes. I verified that it was not working right. Still had to stay by my phone. So I called over another developer whose code I had included in the build. Assigned them the task to track down what was wrong.

Here is the crazy answer. The configuration management team ran our scripts to do a build. This resulted in a new executable that looked fine. Even had a new version number built into it. However the build failed, and the build process just took the components from a previous build and packaged them up into a new install program.

Shame shame. The real kicker is that our team rewrote the build scripts. This did not use to happen with the old evil Korn shell build scripts. Luckily we have a diligent test team. Note to development: Fix your build scripts.

DNS Problems

I needed to send out a software update. Got the files checked. Had the CM Team do a build. Thought I was ahead of schedule.

Our test team called up and said they could not access the FTP server where our installs are located. Sure enough. I could not access it either. First line of defense was to call our company's Help Desk.

The company Help Desk asked me a number of questions. They stated the FTP server was hosted on a customer machine. So I needed to call the customer help desk.

I am working to get my release out the door. So I call the customer's help desk. They want to make sure it is not my company's problem. I told them I just got off the phone with the company help desk. To be certain, the customer's help desk conferences in my company help desk. Everyone agrees the customer troubleshooting team should work this problem.

We have 50 people on our team. I told them this was affecting all 50 people. The trouble ticket got assigned the highest priority. That's when the real fun began. I started to get a lot of calls at my desk phone. When a trouble ticket of the highest priority is opened, all kinds of heat is applied. I should know. Sometimes the trouble is with our software.

By the end of the day I am back in touch with my company's help desk. Apparently somebody in the networks division decided to rename the domain name for our FTP Server. I wanted to track down the perpetrator and give them the beat down. Unfortunately I had my hands full with other software releases that now had instructions to use the wrong domain name to download the software. Nice.

Bored to Snoredom

Things are slowing down as we are completing our software development tasks for the year. This is not to say that there is no work to be done. But we are not knee deep in coding.

So I walk by the cubicle of two developers that are on my team. One of them has their head leaning back due to sleep. People get tired. I get it. But when I go back to my desk I hear the developer snoring loudly. I felt a little sorry for the other person in the cubicle.

Come on now. You got to stay awake. Drink some caffeine. Read something interesting. Work on a side project. Or do the work to make sure the little we have to do actually gets done already. I think our team lead went over and made some noise in the area for a wake up call.

Happy Holidays.

Development Planning

At today's dev team meeting I inquired about plans to release different versions of our software. The scheduled due dates were not quite clear to me. And it was also not clear which functionality was going to be released in which builds.

Normally I am quite concerned about the development release schedule. But I am just a developer on my project. If I do not know what the plan is, chances are nobody else does either.

Sometimes it is a developer's job to bring such mysteries to the surface and force resolution of the problem. I ended up in our project manager's office. We got the due dates for the different pieces we need to deliver. We got the mechanism for how each piece was going to be delivered.

I went back to my desk and started scheduling the builds to get the software delivered to the interested parties. Things are very easy when your developers know the plan. It also helps if there is a plan, and an associated schedule. Otherwise, you will have pandemonium.

Blackmailed

I read a story about a commercial software developer that received a blackmail e-mail. Here were the details of the e-mail.
  • name is Gattoussi Ramzi
  • lives in Tunisia
  • has no copyright law in his country
  • is a good programmer
  • wants a job at your company
  • has decompiled and reconstructed your source code
  • can provide evidence of such code
  • will build web site to sell your software if you do not cooperate

Luckily I have never received such an e-mail threat. I work on specialized systems whose source code normally only goes to one client. If somebody wants to take our source code and sell it via web site, I would wish them good luck.

But I could imagine how this would be detrimental to a company that sells commercial software. I am not even sure how one would combat such evil tactics.

Build Problems Redux

So it was late on a Thursday night. I got the call from the CM team. Their build did not work. Turns out they did not pick up 10 files that I had labelled in Clearcase.

At least the CM Team knew what was wrong. I did not label the directories in Clearcase. You need to do this if you add new files to the directory. I knew this. But I did not know there were new files involved. I set up the build on behalf of another developer that was out sick.

Luckily I had just learned about WinDiff earlier in the day. I compared the CM branch with the development branch in Clearcase. Located all the directories that had new files. Applied a label and let the CM team do a rebuild.
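The same comparison can be scripted. Here is a rough sketch using Python's standard filecmp module instead of WinDiff. It assumes both branches are visible as ordinary directory trees, and the function name is hypothetical.

```python
import filecmp


def dirs_with_new_files(dev_root, cm_root):
    """Return dev-side directories holding entries that the CM tree lacks."""
    found = []

    def walk(node):
        if node.left_only:  # names that exist only under the dev tree
            found.append(node.left)
        # Only directories present on both sides are descended into; a brand
        # new directory is reported through its parent instead.
        for sub in node.subdirs.values():
            walk(sub)

    walk(filecmp.dircmp(dev_root, cm_root))
    return sorted(found)
```

Every directory it returns is a candidate for a label, since it contains something the CM side has never seen.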

The normal process is for application development to peer review the result of the CM build. I did some review and corrected a number of problems. But since I initiated this release, I could not peer review my own work. The problem was that at 8PM there were no developers left. So I put out a memo to the development team, pretty much saying that the first one in the next day had to do the peer review.

I have a good feeling we are going to make our delivery date. The cost? Me and the CM team had to stay at work until 8PM. Lucky for me I work the late shift normally anyway.

WinDiff Saves

I am running late this morning. And we had the company holiday party today too. So I did not go into work first, but proceeded straight to the holiday party. Nice food. But I did not win any prizes.

When I got to the office I had a voice mail from the boss. Turns out our customer needs a new build by tomorrow. And I am supposed to do the build. The problem is that this build needs all the new stuff that got put in. And the guy that put it in was out sick today.

So I called up the guy and he told me that the WinDiff utility could show me the files I needed to include. I knew the folder where configuration management keeps the source files they have so far. I used WinDiff to compare that directory with our development directory.

It did take some work. But I quickly identified the files I think constitute the "new stuff". This would have taken forever if I did not have WinDiff. That's a real life saver. They are doing the official build for the customer right now. I got my fingers crossed.

Pricey Tools

We use a lot of legacy tools at work. But we also use the latest tools from Rational Software. I got a developer catalog in the mail today. Seems some of the tools we use cost a lot of money.

Here are some of the tools I have installed on my workstation.

  • $1,744 Rational Clearcase
  • $1,645 Rational Clearquest
  • $2,148 Rational RequisitePro

Although I don't use RequisitePro, I might have to start it up once now that I know how much it costs. I think our client pays for huge licenses so they get a big discount. Still that's a lot of bread.

Multiple Failures

Today was a busy day. I had initiated the release of two of our apps to the customer. We funnel all build requests through the configuration management team. While I was away from my desk, they had troubles with the first build. Another developer stepped in and set them straight.

This developer who helped out was the guy who developed the build scripts. Since the first build went wrong, he kept an eye on the second build done by the configuration management team. It is easy for him to do this since he coded the build scripts to e-mail him every time a build completed.

He came by to deliver the bad news. The second build got screwed up too. The problem stemmed from the fact that I update a configuration file that is used by the build. I put a Clearcase label on that version of the configuration file. The configuration management team then tries to use Clearcase to merge this version of the file into their branch. This merge did not work with either build today.

Using the current process, it seems they had a 50% chance of getting the merge right. But I guess they did not understand what they were merging. I brought the author of the build scripts upstairs to sort this out with our configuration management team. I told them that I set up all the configuration data for each build I request. We agreed that they would stop trying to merge these changes in, and just cut-and-paste the file I label.

Sometimes you just got to sit back and laugh.

Search for Error

A member of our development team was unable to determine the cause of a pesky warning message that kept popping up in our code. The developer thought it would take weeks to track it down. So it got passed to me.

I thought this would be a trivial task. Just duplicate the problem, break into the code in debug mode, and find out why it was happening. Easier said than done.

Part of the difficulty was that the text of the error message did not make a lot of sense given its context. A bigger problem was that the code was very complex. There were related classes involved that had data members with the same name. Ouch.

So I reverted back to the techniques I have learned over the years. Set some breakpoints to find the code closest to the problem. Use the "find in files" (aka grep) technique to try to search for where important variables are set. In the end I traced the problem back to a missing item in an array which described all the database columns our query was retrieving.
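For anyone without a handy "find in files" tool, the search for where a variable gets set can be approximated in a few lines. This is only a sketch. The file extensions are a guess at a C++ code base, and the regular expression catches simple "name = value" assignments, not every way a member can be modified.

```python
import os
import re


def find_assignments(root, variable, extensions=(".cpp", ".h")):
    """Yield (path, line_number, text) for lines that appear to assign to `variable`."""
    # Match "variable =" but not "variable ==" (a comparison).
    pattern = re.compile(r"\b" + re.escape(variable) + r"\s*=(?!=)")
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(extensions):
                continue
            path = os.path.join(folder, name)
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, number, line.rstrip()
```

Each hit is a good place for a breakpoint, which is how the two techniques feed each other.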

Sometimes you just need experience to plow through someone else's code and debug problems.

Failure to Communicate

I got into work and immediately started going through my e-mails. One such e-mail caught my eye and troubled me. The test team manager blasted the application development team for sneaking in some code and breaking the application.

When at work I try to maintain a professional attitude. However that does not mean I let e-mails like this go unchecked. So I scheduled a meeting with the test manager, project manager, and our whole dev team.

I opened the meeting by stating that I was offended that I was accused of not following our software process. This was just not true. However I gave the test manager an opportunity to explain further. Apparently this was just a reply-to-all e-mail which was sent based on a statement made by a tester. The e-mail was not necessarily directed towards myself.

As soon as I determined the intent, I was relieved. We spent a little more time discussing how we would resolve the problems found. Apparently this had all been a failure to communicate. Having a sit-down meeting was a good venue to resolve the issue. Starting off an e-mail flame war would have been a step in the wrong direction.

Access Denied

I made some code changes and tried to build our application. Got a lot of strange compile errors. Figured this must be due to the fact that the code is in a state of flux. So I thought I should refresh my local copy of the code with the latest baseline code.

The first step was to blow away all the code on my local machine. Tried to use Windows Explorer to delete the whole folder. Kept getting an error that access was denied. So I closed all of my Windows applications and tried again. Still no luck. Access denied.

I figured that some process must have had a lock on a file in the code folder. So I fired up Windows Task Manager. To my surprise I saw a lot of strange process names which were running under my login ID. Here are some of the process names: ccApp.exe, ctfmon.exe, direct.exe, lcfep.exe, MDM.EXE, wowexec.exe, PAPIHost.exe, rwmts60.exe, VPTray.exe, hkcmd.exe, and igfxtray.exe.

Of all these processes, I only knew that "rwmts60.exe" was my Oracle Reports Server that I always launch on login. So I killed all the rest of the processes and still could not delete my code folder. It was time to get hard core. So I proceeded to kill "explorer.exe". That hosed the Windows user interface. Luckily Task Manager let me start up another instance of explorer. Finally I could delete the code folder. Go figure.

Monday Again

I had a lot of work to get done today. So I got in early. It was a treacherous morning. I seem to be spending a lot of time trying to get software application builds out the door.

Just as I was about to go to lunch, somebody needed another build. Each developer on our team is supposed to know how to run a build. But it seems only the chosen few actually do them. So I kicked off a build and ran out the door.

Prior to leaving I had the option to sign off that I peer reviewed the build documentation. The way this is supposed to work is I read the docs, mark them up, and sign when they are ready for release. But since this would require me skipping lunch, I just signed them. Hey. I was hungry.

The last pain of the day was when a fix I did came back to me as broken everywhere. I installed the version of the application that everyone had. Could not duplicate the problem. So I pinned down one of the people who found the problem. Seems they were using another part of the application. But I suppose my code broke that part as well. This seems to be a case of poor unit test coverage.

Yep. It was a Monday.