Punish the Tricksters

In my down time, I like to read programming articles on Reddit. One post caught my eye. It read "Torvalds: ‘there is no open source community’ ". This headline seemed to be an oxymoron. So I went to read the article.

Turns out it was an interview with Linus Torvalds. However Linus said there is no one open source community. What a cheap ploy for a Reddit title. Hey. But it got me to come and read. At first I thought I would leave a comment on the blog saying thanks for the trick. But I then decided I would not even give this author credit by adding a comment. Note that I did not link to the bait and switch article.

So I did what any responsible Reddit user would do. I signed up for Reddit (finally). And I voted down the link by one. If enough people do this, the article will float down to oblivion where it belongs. I don't mind a sensational title to drum up some interest. Just don't try to trick me. It will only hurt you in the end.

P.S. The link took me to a blog. I tried finding out more by clicking the About link. Got a File Not Found (404 Error). It figures a site using weak tactics like this would be broken down.

Showing Initiative

Last week a developer on a related project thought he found some problems in our data. So I asked him a few questions about his findings. Then I told him how the system was supposed to work. And I sent him off to do some more research.

When the developer presented his findings, it appeared that there was indeed something wrong. So I did a little digging. I found that the developer was interpreting the data wrong. It was an easy enough mistake to make. He was assuming a certain identifier was unique. Normally this assumption holds true. But the numbers do recycle after many years. This was the case here.
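The trap is easy to reproduce. Here is a minimal Python sketch, assuming made-up records with an `ident` and a `year` field (these names and values are hypothetical, not our real schema). Keying on the identifier alone quietly merges two unrelated records once a number recycles. Keying on the identifier plus the year keeps them apart.

```python
from collections import defaultdict

# Hypothetical records: identifier 1001 is handed out again many years later.
records = [
    {"ident": 1001, "year": 1995, "owner": "A"},
    {"ident": 1002, "year": 1995, "owner": "B"},
    {"ident": 1001, "year": 2008, "owner": "C"},
]

def group_by(records, key_fields):
    """Group owners under a key built from the given fields."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        groups[key].append(rec["owner"])
    return dict(groups)

# Assuming the identifier is unique: owners A and C collapse into one group.
by_ident = group_by(records, ["ident"])

# Adding the year to the key keeps the recycled number separate.
by_ident_year = group_by(records, ["ident", "year"])

for key, owners in by_ident.items():
    if len(owners) > 1:
        print("Identifier", key[0], "is shared by owners", owners)
```

Whether the year is the right second field depends on how the numbers actually recycle. The point is only that the identifier by itself is not a safe key.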

So I set the developer straight. He has only been on the project for about a year. So he would not know about this kind of thing. And I could have left it at that. But I figured this could and will most likely happen again. I went to a manager and explained the problem. The manager told me to take this up with the customer and get a decision, then fix the problem.

Now the reason I got this direction was because the manager knew what I was capable of doing. And in fact I put together a small writeup on the problem, outlined a couple solutions, and scheduled a time when we could discuss this with the customer. This is business as usual for me. But how can we get a normal developer to act like this? You would like any developer on your team to follow through and see issues to closure.

I don't see this type of initiative on my team though. I would like to. It might not be my job to ensure this type of ambition and drive is spread throughout the team. I am not a manager myself. And I do not even lead a team. I am just a senior developer. But I do care. I do have the drive myself. I would like to see it spread. This is, I am sure, no easy task. Might require some meditating to come up with the answer.

Which Bug to Fix First

We just delivered our initial software release for the year. This is the busy season for our customer. And when they are busy, they always tend to discover bugs in our code. A lot of problem reports are coming in. I am the guy who has been on the project the longest. So a lot of these bugs come to me for fixing. The challenging part is that other developers can fix the obvious problems. I tend to get the ones that are hard to debug.

So I had a stack of trouble tickets on my desk. The users assign priorities to these tickets based on some general guidelines. It is easy to pick the problem to work on first. I just take the one with the highest priority. However now I have worked my way down to a bunch that have the same priority. What is a guy to do?

I try to work a couple problems at a time. That way I can multitask and get more done. When I need to wait on one ticket, I can jump to another and keep productive. But there are other factors involved. Users tend to make more noise for the problems that impact them the most. This feedback makes its way to me. I also try to gauge which fixes would give the users more bang for their buck. Those I fix first. The funny thing is that I actually do get to decide what to work on.
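If I had to write down the way I break ties, it would look something like this rough Python sketch. The `Ticket` fields and the numbers are hypothetical. Our real ticket system does not track "complaints" as a number. The idea is just to sort by the assigned priority first, then by how much noise the users are making, then by how many users the fix would help.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    ticket_id: str
    priority: int        # assigned by the users; 1 is the most urgent
    complaints: int      # hypothetical count of follow-ups from users
    users_affected: int  # hypothetical estimate of who benefits from the fix

def work_order(tickets):
    """Sort by priority, then loudest complaints, then widest impact."""
    return sorted(
        tickets,
        key=lambda t: (t.priority, -t.complaints, -t.users_affected),
    )

stack = [
    Ticket("T-1", priority=2, complaints=1, users_affected=5),
    Ticket("T-2", priority=2, complaints=4, users_affected=3),
    Ticket("T-3", priority=1, complaints=0, users_affected=1),
    Ticket("T-4", priority=2, complaints=4, users_affected=40),
]

ordered_ids = [t.ticket_id for t in work_order(stack)]
print(ordered_ids)
```

In practice the last two factors are a gut call rather than numbers. But the ordering is the same.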

Now since I have a lot of bugs to squash, it is time to get back to them.

Avoiding the Axe

I got an official memo today stating that I would be picked up by another project after my current one ends. This is good news. Makes my life simpler. I have been working on my current project for a long time. Heck. I even joined my company because they won the contract for this project.

For a while I considered joining the company that has now won the contract for my project. However discussions with that company have remained fruitless. Like most other companies, they balked at my salary requirements. This is OK. Due to the nature of my project at work, I think the new company is in for some serious pain.

My experience is in sharp contrast to a contractor buddy of mine. He took an extended vacation at Christmas time. When he got back to work, he got notice that his contract was ending immediately. Part of the reason is that he wants to remain a contractor (and get paid the big $$$). I imagine they would have a position for him if he was willing to join as a full time employee. But given that he did not want to, his job has been terminated. Lucky for my friend his skills are in high demand. He starts a new job tomorrow. The downside is that he has to move to the state where the job is. This is the life of a contractor.

There are a few benefits to being a full time employee of the company. It appears they will be taking care of me when my project's contract ends. Did not even have to take a pay cut. And I get to keep the toys my company has provided me. I will keep you posted on how the new project goes.

Work Takes Its Toll

I have not taken a vacation in a while. But a number of people on my team have. And when a team mate is out and the software breaks down, I get the call. This is my job. However it takes a toll when you do it day after day without any breaks.

The way I have been dealing with this is sleeping in each day. Luckily I have a nice flexible schedule at work. I actually get a lot more done when I come in late and stay late. It is normally just me and the cleaning lady after 6:00PM.

Last week even coming in late did not cure the pain. So by the end of the week I decided it was time to take it easy. Called in sick a couple days. This is much harder than it sounds. Normally I have lots of meetings scheduled, and am working on all kinds of tasks that need day-to-day attention. I am hoping the rest will have recharged me by next week. If not, it might be time for that well deserved vacation.

Priority Problem Alert

Our client has a dedicated testing team. They discovered a critical bug in our latest release of new functionality. They declared that they were unable to proceed with their tests. And they issued a trouble ticket of the highest priority.

At 3:00 PM today the project manager said we needed to ship a fix to this problem today. This was problematic since it normally takes about 8 hours to release software for us even when we know what code changes need to be sent out.

The development team went into emergency mode. Some of us discussed ways to just hack together a fix to remove the high priority problem at the expense of breaking other things in the application. That felt like a waste of time because they would discover the other problems soon enough. Luckily the developer whose code was having trouble had a breakthrough. We did have to remove some of the new functionality. But at least we did not have to break anything.

I tried to stay clear of this problem because I was not in the mood to be at work until 10 or 11 PM. To my amazement, the fix got released in just over 3 hours. It was only later that I realized how this acceleration occurred - they cut corners. Normally all code changes go through a peer review. Skipped. And all new deliveries normally go through independent testing. Skipped. The documentation with the new delivery gets an editorial review normally. Guess what? Skipped. In the end, these shortcuts will have proved worthy if the fix makes it out in time and nothing blows up. I am glad this delivery was not my responsibility. We shall see tomorrow.

Project Management

Last Friday I was informed we needed to ship a fix on that day. That was a tight schedule given the long process we have for releasing software. But this was to fix a high priority problem. So I forged ahead.

The project manager left for the day before we could get the release ready. So he asked the software development team lead to release it instead. The dev team lead wanted to leave too. So this task got delegated to me. I agreed since this was a high priority problem.

The rush to ship out that fix caused delays for shipping this week's planned release. I tried as best I could to keep things moving through our process. However this time when the project manager decided to leave before the release was ready, I said I needed to go too because I worked this weekend.

As I left I assumed the dev team lead would release today's delivery, but I could not be sure. It is one thing to make an exception and take over project management responsibility in emergencies. But I did not want to make a habit of it. Unfortunately the dev team lead got stuck with releasing a whole lot of other software today too. Let's hope he did not have to pull an all nighter.

Too Good

When you prove to be an excellent developer, you may have negative consequences. You become the go-to guy. People learn that they can depend on you. So when things go haywire, you get the call. Unfortunately that call may come in the middle of the night or on the weekend.

Word travels fast when you are competent. The customer will want to talk with you. They will be calling your phone. You may be required to be in on the teleconference calls.

I have heard of people fearing this situation. In fact, sometimes developers intentionally screw up to avoid being seen as the go-to person. They say you can't do too well or it will come back to haunt you.

The jury is still out whether it is worth it to excel and then get additional responsibilities. For now I am doing my best. Let's see where this takes me. Like I mentioned earlier, however, I get the calls on my home phone from people who are in trouble.

Schedule Surprise


Yesterday I got assigned a trouble ticket from our client's acceptance test organization. The customer requested an estimated delivery date for the fix. Somebody informed them it would be done within 1 week. So I figured I had a little time to get the fix done.

Being the take charge type of guy that I am, and knowing that I am going to be busy during the upcoming week, I knocked out a fix in one day. I pushed to get the fix peer reviewed. Then I sent the fix to our own internal test team for verification.

Today I got a big surprise when I got in. Apparently the customer said that 1 week was way too long for them to get the fix. So the Project Manager said we would give it to them in 1 day instead. I am to blame for part of this problem. I come in late usually so I can stay late and get things done. So I guess the PM came to my desk, found I wasn't here, and decided I could meet whatever schedule was dictated.

Luckily I had not slacked off, even when given an initial slack schedule for release. So today I just checked with our testers whether the fix was good. And I scheduled a release to our customer. However I think I had better manage everyone's expectations. We got lucky on this one because I could duplicate the problem and was able to code a fix quickly. This is not the norm for trouble ticket resolution.

The Need for Review


We have a process at work where all documentation goes through peer review. I was going to release a bug fix to our internal test team. I noticed another developer was planning to do the same. So I waited until he was done to bundle our changes in the same delivery.

My coworker finally finished his task. So I created a build to give to internal test. The documentation that accompanied the build did not get done until late. Luckily there was just one other developer on the team left at work.

Unfortunately the other developer was busy resolving customer problems. He had already worked a long day and was not done. He told me he would sign off on my documentation, trusting I had done a good job. Now don't get me wrong. I always proofread all my docs to make sure I produce the highest quality possible. However peer review often finds problems in my work. I was tired too. But to keep things moving I let this pass.

It is a difficult line to walk. You want to follow the process because it is there to ensure high quality. However there are times you need to bend the rules to get things done and be able to go home (or do more work). Where should we draw the line?

Configuration Confusion

I needed to release a crucial fix to our customer. So I checked in the code changes and submitted a request for a build. The request came back. Upon review, I found that the configuration management team had produced an executable that was 10% smaller in size than normal.

So I asked the CM team if they knew what had happened. Then a long story ensued. Apparently their first build bombed. That's when they realized that their build script recently got deleted. And guess what? The working build script was not under source code control.

To recover from the deleted file, the CM Team enlisted the help of the build script author. They reassembled the script. But the result produced a smaller executable in the end. When installed, the application appeared to behave normally. However I was not about to send out a fix that had suspicious properties.

I spent a lot of time going around in circles with the CM team, researching the differences in their build script with similar ones that worked in other environments. Then I canned that effort. Instead I started to focus on any differences in output that I could detect. Build logs were the same as before. So I made sure all EXEs and DLLs were accounted for. Check. Then I verified all the EXE and DLL file sizes. They were all good.

Finally I discovered that a whole folder of report files was missing from the install. We traced this back to the folder being invisible during the build. It was invisible because the folder had not received a label in ClearCase. The reason? Another CM guy had the file checked out. This was wrong on many levels. Why did a CM guy check out the file and keep it locked? More importantly, why did the build go on and not flag this as a big problem?
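A cheap guard against this kind of silent omission is to compare the finished install against a list of what should be in it, and fail loudly if anything is absent. Here is a small Python sketch of that idea. The paths (`bin/app.exe`, `reports`) and the `missing_entries` helper are invented for illustration. This is not our CM team's build script, and the demo runs against a throwaway temp folder rather than a real install.

```python
import tempfile
from pathlib import Path

def missing_entries(install_root, expected):
    """Return the expected relative paths that are absent from the install."""
    root = Path(install_root)
    return [rel for rel in expected if not (root / rel).exists()]

# Demo: fake an install that has the executable but no reports folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "bin").mkdir()
    (Path(tmp) / "bin" / "app.exe").write_bytes(b"")
    missing = missing_entries(tmp, ["bin/app.exe", "reports"])

if missing:
    # A real build step would stop here instead of just printing.
    print("Build incomplete, missing:", missing)
```

A check like this would not have explained the ClearCase label problem. But it would have flagged the missing folder before anyone had to compare file sizes by hand.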

I am hoping that a rebuild now will fix the problem.

NetMeeting Assist

I was having trouble identifying exactly what users were doing when our application bombed on them. There were a number of written accounts of the steps. They all seemed related. So I tried asking for a reliable set of steps that always results in the problem.

The response I got back was that the problem was happening at random. This was not getting me anywhere. So I called up an administrator at the site. I asked him if he could log into the application and make it happen. He did. But he had trouble communicating what exactly he was doing.

So I went over the situation again with him. I tried to slow him down on some steps that were unclear. Then the administrator recommended we use NetMeeting so I could see for myself what he was doing. Why didn't I think of that? At first I could not dial him up. But I gave him my IP address and he could dial me.

The administrator said he was set up so I could see full 32-bit color. But the screen refresh on my side of the NetMeeting just could not keep up with all the painting. So he turned down the color depth. It helped that I had him on a land line phone. I had to keep asking him to slow down until I could follow what he was doing.

Seeing is believing. Turns out that once I was watching over his shoulder, we could find a minimal set of steps to duplicate the problem. Once I had these, I was able to locate the source of the problem in an hour. Some metadata tables had incorrect data in their training environment. I gave the DBAs a set of SQL statements to regenerate these tables. Problem solved with an assist from Microsoft NetMeeting.

Problem Duplication

We have an environment where users can train with real data. Each year our DBAs copy some interesting production data to the training database. This is a good place for users to go over the new changes in the software.

Users complained about the application aborting with Oracle errors in the training environment. A couple people chimed in with ideas on what could be wrong. Some consultants said it might be invalid database objects. I gave them a little help, but was focusing on other issues.

The DBA manager took a look at the problem and said it must be an application problem. This guy has been on the project a long time. I agreed with his diagnosis and told him I would get on it. As with all trouble tickets, I asked the customers what they were doing when the problem happened. I wanted to find the pattern. I got back a message that the problem was occurring randomly.

Sometimes "random" to a user is not so random for a developer. So I contacted an administrator at the training facility. I got him to go over the scenarios when the application aborted. Then I asked him to try and go make it happen and report back. Each time I spoke with him I got more clues. And when I asked him to repeat some steps, it turns out that this problem does have a broad pattern. The error only happens with some specific data.

That was all I needed. I keyed in on the trouble data. And it turned out to not be this data itself. But there were some metadata tables which were not in sync with this data. Our application depends on the metadata being correct. Otherwise all kinds of bad things happen. Our DBA manager chided me for having software that degraded so poorly with bad data. I told him this was a drawback of having a ton of data that needs to be accessed quickly. But at least we got to the bottom of the problem. Fix coming out shortly.
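A sanity check for this sort of drift can be as simple as looking for data rows whose metadata entry does not exist. The sketch below uses Python with an in-memory SQLite database only so that it runs anywhere. Our real system is on Oracle, and the table and column names here (`cases`, `case_type_meta`, `case_type`) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (case_id INTEGER PRIMARY KEY, case_type TEXT);
    CREATE TABLE case_type_meta (case_type TEXT PRIMARY KEY, label TEXT);
    INSERT INTO cases VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- Type 'C' was copied into the data but never made it into the metadata.
    INSERT INTO case_type_meta VALUES ('A', 'Type A'), ('B', 'Type B');
""")

def orphaned_types(conn):
    """Return case types present in the data but absent from the metadata."""
    rows = conn.execute("""
        SELECT DISTINCT c.case_type
        FROM cases c
        LEFT JOIN case_type_meta m ON m.case_type = c.case_type
        WHERE m.case_type IS NULL
        ORDER BY c.case_type
    """).fetchall()
    return [row[0] for row in rows]

out_of_sync = orphaned_types(conn)
print("Types with no metadata:", out_of_sync)
```

Running something like this right after the yearly copy to the training database would surface the mismatch before the users do.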

Process Overhead

So we have the code for a fix to release. It is a long road of processes before this fix can be experienced by the customer. Some of this process is to ensure a high quality result. But the end result usually means delays in pushing out even the highest of priority fixes.

The code changes have to be checked into source control. Then we need to do a build for internal test. This build needs to go through configuration management. The internal testers verify the fix. Then we do a build for customer release. This build then goes through 2 levels of configuration management - our internal one and the customer one. Finally the fix can be staged for implementation on a customer machine.

The real pain comes when we have done the hard work to figure out how to fix a problem, but the problem is high priority. In that case we get lots of attention and need to provide constant status updates. But most of the time the factors delaying release are beyond development control.

Miscommunication

Yesterday a high priority problem was reported by our customer. My manager asked me to step in and work the problem because the responsible parties were out of the office. I have a good rapport with the customer. So I spoke to them and established expected software behavior.

Then I got down to work querying the database. Turns out most of the problems were misunderstandings of how the software was supposed to work, or what certain settings mean. However there were two issues that seemed like they might have been bugs.

I kicked some specific questions back to our requirements team. They are responsible for eliciting customer requirements, generating system requirements, and confirming requirements with the customer. The analyst I spoke with had to defer to a subject matter expert to get the answer. But in the end it turns out the software had some bugs.

So I identified the files and the functions in those files that were broken. Then I forwarded the details to the team that handles that portion of the application. When everybody came back to work the next day, I found out more information as to why these problems popped up in the first place. Apparently development had spoken to the customer and got direction to code up the software that way. In complex systems such as ours, where there are documented requirements and multiple levels of test, this informal requirements gathering just does not work.

Right now we are in the process of putting together a fix for these problems. Since they are high priority, there is a lot of pressure for status updates around the clock. Somehow I think these problems could have been avoided altogether. But implementing a process to ensure this is much work. And it is pretty much too late now.