Software Maintenance: June 2008

The Oracle Client

Our team maintains an old suite of applications. Some of the code that accesses the database is 10 or more years old. This code uses Pro*C, which gets compiled with the Pro*C compiler shipped with the Oracle client. The current production code makes use of the Oracle 8 client. Support for this version of the client has long since been dropped by Oracle Corporation. So our team has just completed a migration to the Oracle 10g client. There was some pain initially in development as we did not standardize on a location for the 10g client installation. Whenever developers passed code to each other, a number of project configuration paths for the Oracle components needed to be changed. We have since standardized on a location for the client on development workstations. This has relieved some stress. But the true pain was only just beginning.

The next set of issues with the Oracle client surrounded the testing of the changes. The internal test team wanted to maintain the ability to test changes to the production code which uses the Oracle 8i client. They also needed to start testing the version that was ported to use the Oracle 10g client. There were all kinds of ideas on how to accomplish this. Here are some of the top ideas that were considered:

Borrow separate machines to do the 10g client testing
Install the 10g client in a virtual machine on their boxes
Install a version of the 10g client which is hacked to also support the Oracle 8 client

In the end, a decision was made to leave the Oracle 8 client on just one test machine. All the other test boxes would have the Oracle 8 client replaced with the 10g client. You would think that this was the end of the matter. We were not that lucky.

The test team needed detailed instructions on how to install the 10g client on their machines. This was a reasonable request. Development just ended up sending a competent Oracle DBA to do the install for the test team. This DBA decided to install just the run-time type of 10g client on the testers' machines. This also seemed the reasonable and appropriate course of action. However the testers were unable to connect to the database using the applications with this setup. So the DBA went back and installed the administrator type of 10g client on the test machines. This works for now. But it does not seem like the optimal approach. Right now a developer is trying to understand exactly which components from the administrator version of the client are required for the application.

Our port to the latest version of the tools has been plagued with issues like these Oracle client version problems. We have had a number of problems with the version of the Microsoft Data Access Components (MDAC) install as well. But that is a story for another post.

Reformat Tool

All the source code for our project is stored in Rational Clearcase. The servers are UNIX boxes. But our Clearcase clients run on Microsoft Windows boxes. As most cross platform programmers already know, the UNIX and Microsoft Windows systems have different standards as to how lines in text files are terminated. On Windows lines are terminated with a carriage return and line feed. However on UNIX lines are only terminated with a line feed.

When you transfer text files between UNIX and Microsoft Windows systems, the transfer software usually takes care of these differences and converts end-of-line to the correct target platform format. Clearcase will also manage this correctly if you set it up properly. We unfortunately let each developer set up their own Clearcase client. And sometimes this is not done right. The result is that people are sometimes looking at code with the wrong termination character(s). So depending on where the file has been, you may be missing characters or see extra characters at the end of the line.
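
To make the conversion concrete, here is a minimal sketch in standard C++ of what such a tool does when it moves a file from Windows to UNIX. The function name crlfToLf is my own invention for illustration. It is not how Clearcase or any particular transfer program implements the conversion.

```cpp
#include <cstddef>
#include <string>

// Convert Windows line endings (CR LF) to UNIX line endings (LF).
// Hypothetical helper for illustration only.
std::string crlfToLf(const std::string& text)
{
    std::string result;
    result.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        // Drop a carriage return only when a line feed follows it directly.
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
            continue;
        result += text[i];
    }
    return result;
}
```

A carriage return that is not part of a CR LF pair is left alone here. A real tool has to decide what to do with those, and that is one place where files can end up mangled.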

Normally what developers do on their own machine does not affect me. But when they take a file, check it out of Clearcase and mangle the text line termination, then check the resulting file back in, I encounter pain. Some developers have taken it on themselves to fix the files when they see them in Clearcase. I was forced to do this myself on some files I encountered recently. It turns out the files got mangled so there were extra line feed characters at the end of each line. The result was code that appeared as if it were formatted with double spacing. This style of code is difficult to read.

Instead of getting mad at the situation, I decided to have some fun and write a utility to strip out the extra line feeds. This made the code more readable. But then I realized that there are some instances where it is beneficial to have white space between lines. For example you normally want a blank line between different function definitions. So I had to code this logic into my little application that fixes the corrupted files. Here is the main code snippet for my application. Essentially it writes out every line that is not blank. For a blank line, it determines whether the previous line ended a function or class. If so, it preserves the blank line as white space. Here is the code:

CString strLine;
CString strLastLine = "";
BOOL bMoreData;
bMoreData = inputFile.ReadString(strLine);
while (bMoreData)
{
    // Reformat: keep every non-empty line, and keep an empty line only
    // when the previous line closed a function ("}") or a class ("};")
    if (!strLine.IsEmpty() ||
        (strLastLine == "}") ||
        (strLastLine == "};"))
    {
        strLine += "\n";
        outputFile.WriteString(strLine);
    }

    // Remember the trimmed line for the next iteration
    strLastLine = strLine;
    strLastLine.TrimLeft();
    strLastLine.TrimRight();

    bMoreData = inputFile.ReadString(strLine);
}

The code uses Microsoft Foundation Class (MFC) classes to get the job done quickly. However the algorithm is easily understandable without knowing the specifics of MFC. I was quite pleased that the algorithm was this simple. I can't believe I even thought about fixing corrupted files by hand. We have some pretty large source files in our project. Now I am ready with my home grown reformatter program in case somebody checks in corrupted code.
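
For readers without MFC, the same idea can be sketched in standard C++. This is a rough equivalent and not the code from my utility. The names trim, stripExtraBlankLines and stripExtraBlankLinesFromText are made up for this sketch, and std::getline stands in for ReadString.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Remove leading and trailing whitespace (spaces, tabs, CR, LF).
std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Copy input to output. Empty lines are dropped unless the previous
// line closed a function ("}") or a class ("};").
void stripExtraBlankLines(std::istream& in, std::ostream& out)
{
    std::string line;
    std::string lastLine;
    while (std::getline(in, line))
    {
        if (!line.empty() || lastLine == "}" || lastLine == "};")
            out << line << '\n';
        lastLine = trim(line);
    }
}

// Convenience wrapper for text that is already in memory.
std::string stripExtraBlankLinesFromText(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    stripExtraBlankLines(in, out);
    return out.str();
}
```

Like the original, the sketch only checks whether the line is completely empty. A line holding nothing but spaces or tabs is written out unchanged.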

Missing Table

We have a whole subsystem that loads data into our system on a daily basis. The loads code runs on a UNIX platform. From a high level, the requirements for the load are not overly complex. We have a number of file types with well defined formats that need to be inserted into our database each day. And there is some post load processing that occurs. The implementation of these requirements is a bit complex though. And our team has a new developer that is responsible for the loads code.

The developer told me the load was failing in a test environment. So I recommended he load a file that has a known good format through the system. That could help him determine if it was a data problem or a software problem. He did some more investigation and found that one process in the load kept failing. Apparently a PL/SQL package would not compile. The package rang a bell with me. I told the developer that this package had a custom install script. The package depends on synonyms which are only created at run time for the loads program. So there is an install program which emulates the loads program by creating the synonyms first before compiling the package.

After getting information from me, the developer went on to find more problems with the PL/SQL package. I tracked down the guy who wrote the package and asked him about the problems. They did not ring a bell with him. So this was not an easy PL/SQL package to deal with. In the end it appeared as if the developer was looking at an outdated version of the code for the package. Still it felt as if something were terribly wrong with the design of the database package. Normally you should be able to manually compile a database package by executing its source. If it is more complicated than that, the design may be at fault.

The new developer for the loads software has a lot of ideas on how to improve the loads subsystem. I expect this weird PL/SQL package may be part of the list. Any software that requires you to know the tricks is generally bad. Software should be written with maintainability in mind. The system we work on is 15 years old. As you might imagine, most of those 15 years were spent in maintenance mode. So it is of the utmost importance to do the software right in the first place. The hectic pace of software development does not always lend itself to this goal though.

Build Script Tools

I work on an application that was developed 15 years ago. The target system was initially some UNIX workstations. About 10 years ago the code was ported to work on the Windows NT platform. This is essentially the same version of software that runs today. When the port was complete, the development team looked to the configuration management team to put together some scripts to automatically build the software releases. This task fell to a CM team member that had a UNIX background. So he felt the only logical choice was to write the build scripts using the Korn shell.

Unfortunately the target and development platforms were Windows NT. And somehow he got the customer to purchase a site license for MKS Toolkit. This is a software package which emulates UNIX on Windows machines. It has a full Korn shell. Thus the build scripts were written in the UNIX Korn shell running on Windows NT using the MKS Toolkit. About a week after I started on this project, the CM guy that developed these scripts left the project. So the CM Team turned the scripts over to the development team for maintenance.

Just about all of the development team staff are Windows programmers. Now as programmers we can hack any scripts written in any language. But we do not prefer UNIX scripting languages. The changes to the build script showed this. Whenever something in the build script broke or required modification, the changes were hacked in. The result was a very unruly set of Korn shell scripts that accomplished the builds for releases. Now this is not entirely the fault of the development team. The scripts were a bit unstable to start with. They worked as long as the original developer was there to run them and tweak them and work on them as we needed builds. Then things only got worse when this guy left.

A few years ago the team got a lot of Java developers added to it. The plan was to port the system from C++ client server to a Java based web version. This port was a disaster. Most of the Java developers left after the port failed. But a few stuck around. They grew tired of maintaining the C++ client server code. So they were unleashed on rewriting the build scripts. Being Java developers, they chose Ant as the build tool of choice. The problem is that they did not entirely uncouple the dependence on the MKS Toolkit. So the current build is a mix of old MKS Toolkit utilities and Ant built in functionality.

The final twist is that we have upgraded the tools used to build the application and connect to the database. As a result, our build scripts need to be updated. All of the Java developers have left the project. And we have assigned a Visual Basic developer to make some of the required modifications to the build script. This developer had already written a couple small VB apps to use in our system. But I think he is trying to avoid this for his build modifications. The team is considering a replacement of the dependence on the MKS Toolkit. If we truly need UNIX functionality, we plan to fall back on the Microsoft Windows Services for UNIX which comes pre-installed on all our development workstations. I surely hope the build scripts do not get more complex as a result of our latest change.

Being Supplied

Upon joining my company, I was issued a company laptop. It is only used to send company e-mail and enter my time weekly. But I thought it was a nice gesture anyway. I could have done all those functions using my home computer. The company hooked me up.

The client I work for is also pretty generous. They issued me another laptop to do all the work on their project. The laptop is pretty sweet. If needed I can take the laptop with me to do some work at home. I have only done this once as I really do not care to work from home. But it still makes a positive difference.

I get paid a good salary. And the company is a consulting company. So at first I started bringing in my own supplies. I only truly needed pads of paper and pens at first. But I found myself lacking some other needed office supplies. I had a stash collected from previous assignments at prior jobs. So it did not cost me much money. I was however pleasantly surprised when a coworker told me they had a big box of new hire supplies waiting for me at the main office. I had a buddy who works at the main office sign for my box. And another coworker transported the box from the main office to the site where I work.

Now I do not think this box of supplies cost my company more than $100. However the effort was appreciated. I have this big box in my study at home. I don't think they expect me to haul this big box into my client's location where I work. But the plan is to move us out to a separate office some time soon. At that point I shall bring my box into my new office. It appears I am all set up from the administrative side. Now they just need to buy me the software tools I need, and we shall be off to the races.

Stack Overflow

One of the legacy applications my team maintains uses Pro*C to get to the Oracle database. We recently ported the code from Visual Studio 6 and the Oracle 8 client to Visual Studio 2005 and the Oracle 10g client. One particular Pro*C function kept aborting the application mysteriously after the port. The application just crashed hard and disappeared in release mode.

At first glance, the code in the Pro*C function looked fine. The function was a bit long. But that should not cause a crash. Upon further inspection, I found that this function allocated a huge amount of data on the stack. I wondered whether this was related to the crash. So I slowly started converting the memory on the stack to use the heap instead. After a while the function stopped crashing.

Now I guess there is some way to get the compiler to allocate more memory for the stack in general. Or maybe I could have coded in some directive to allocate a larger stack size for just that function. And perhaps the Pro*C compiler had some Oracle directives to do the same. However this would have just masked the true problem. You should not allocate huge amounts of memory on the stack. That reminds me of a junior programmer who once had a lot of trouble with declarations like char szBigBuff[10000]. She did not seem to even know the difference between the stack and heap. Thus trouble ensued.
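
Here is a small sketch of the kind of conversion I mean, written in plain C++ rather than Pro*C. The function fillBuffer is hypothetical and only exists to show the pattern: the big buffer becomes a std::vector, so its storage comes from the heap and is freed automatically when the function returns.

```cpp
#include <cstddef>
#include <vector>

// Before (stack): a local array such as  char szBigBuff[10000];
// After (heap):   a std::vector<char> of the same size.
// Only the small vector object lives on the stack. Its storage is on the
// heap and is released when the vector goes out of scope.
std::size_t fillBuffer(std::size_t size, char fill)
{
    std::vector<char> bigBuff(size, fill);  // heap-backed storage

    // Touch every byte so the buffer is really used.
    std::size_t count = 0;
    for (std::size_t i = 0; i < bigBuff.size(); ++i)
    {
        if (bigBuff[i] == fill)
            ++count;
    }
    return count;
}
```

A call like fillBuffer(10 * 1024 * 1024, 'x') asks for 10 MB. As a local array that would not fit in the 1 MB stack that the Microsoft linker reserves by default. The linker does have a /STACK option to raise that limit, but as I said, that would only mask the problem.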

Our team is pretty close to completing the port to Visual Studio 2005 and the Oracle 10g client. We have ported all the big applications in our suite. Now we have some smaller ones to attend to. Then we move on to the build scripts. Things are looking pretty positive now.

Microsoft Support

Our team ported some applications from Visual C++ 6.0 to Visual Studio 2005. We were able to get through most of the upgrade issues. However there was one tough problem that eluded us. The empty string ("") got corrupted and was showing up as "0501" everywhere in our application. In the end we turned to Microsoft support to help us out.

It was a bit tricky. Another project loaned us their extra Visual Studio 2005 licenses until we could get our customer to buy us our own. However they were reluctant to let us use up their paid Microsoft support. They get charged by the incident. Our manager expressed our extreme need. So they agreed to let us make one call to Microsoft. It was difficult to help Microsoft assist us. We are not at liberty to give out the code to our project. It is sensitive government stuff. Therefore we had to create small snippets or pass along core dumps.

In the end a developer was able to trace back the problem to the source. I think the help from Microsoft assisted him. Turns out somebody was getting a constant pointer to the internal C string held in a CString object. Then they were explicitly casting away the const and modifying the string. This is pure evil. It just so happens that the Visual C++ 6.0 compiler was masking the problem. But this was not the case in Visual Studio 2005. The developer who fixed this is worried that there are other hidden errors like this lurking in the code. I guess we shall find out sooner or later.
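
I cannot show the real code, so here is a sketch of the pattern using std::string as a stand-in for CString. The risky version appears only in a comment. The function upperCased is a hypothetical example of the safer approach, which is to copy the characters into a buffer you own and modify the copy.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Risky pattern (comment only, deliberately not executed):
//     char* p = const_cast<char*>(s.c_str());
//     p[0] = 'X';   // writes through a pointer that was handed out read-only
//
// Safer pattern: copy the characters, modify the copy, return a new string.
std::string upperCased(const std::string& s)
{
    std::vector<char> buf(s.begin(), s.end());  // private, writable copy
    for (std::size_t i = 0; i < buf.size(); ++i)
    {
        buf[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(buf[i])));
    }
    return std::string(buf.begin(), buf.end());
}
```

In MFC itself, CString offers GetBuffer and ReleaseBuffer for the cases where you really need writable access to the characters. I have not checked whether those would have been the right fix for our code.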

Lack of Requirements

At work our client frequently provides us with written work requests. These requests essentially contain basic business rules that they want implemented. Our job is to initially cost the full implementation of the requests. This is not too difficult if you know the existing system well.

We have a separate requirements team. I think it is their job to read and understand the business requirements. Then they are to perform analysis to arrive at a list of system requirements. The development team will then design and implement a solution to these system requirements.

I am listed as the guy who needs to do most of the design for the new stuff. And I have been noticing that some of the system requirements seemed sketchy. So I have been asking the requirements team about them. Here is where the true surprise comes in. For most of the questions I pose, the requirements team does not know what the system requirement means. Apparently they have been just taking the business requirements and reformatting them to produce a system requirements list.

Now I find this a bit disappointing. I can read the initial work requests that contain the business requirements. If there is no added value in the system requirements that are produced, I might as well stick with the original material. The authors of those documents know exactly what they want. But this should not be how our team works. Might be time to call in the big dogs and shed some light on this defective arrangement. Luckily I have found over the years that it is rare to find folks who excel at requirements analysis. So I do not expect much. We shall see if there can be any improvement in the current team.

The Debug and Release

It took about a month to port a big application from Visual Studio 6 and the Oracle 8i client to Visual Studio 2005 and the Oracle 10g client. There were some show stoppers that had to be overcome along the way. One major problem was database calls between different threads colliding with each other. This got resolved using Pro*C directives which explicitly manage database contexts between threads.

This past week I wrapped up unit tests of all the functionality in the application. I corrected the last couple bugs and told my team lead that things were looking good. The tests were all conducted using the debug version of the application. If the release version worked just as well, we would actually come out ahead of schedule. A coworker started working on the build script that would bundle the install for my application. He wanted the code for a release version of the application so he could use the real code to test the build script modifications. I spent less than a day getting the release configuration of the application to correctly compile and link.

I started to do some smoke tests on the release version of the application. That's when the horror began. All kinds of functionality that worked fine in debug mode was hosed in release mode. The application is now aborting in a number of places. It is disturbing since, when in release mode, the application just disappears sometimes when it aborts. And there are a lot of controls which are showing garbage in release mode. I broke the bad news to my team lead, who in turn passed the surprise on to upper management. The only good news is that some of the problem symptoms in release mode appear to be similar. I am hoping that I can find the cause of one of these common problems, and apply the fix everywhere to quickly reduce the number of bugs. The bottom line is that I have about a week to knock out these problems. However problems are harder to isolate when you are running the release version of the code. We will need all the luck we can get.

Trouble with Threads

Our application suite frequently has code that spawns a new worker thread. Most of the time the worker thread does a lot of database retrieval. I just ported one such block of code to Visual C++ 2005 and the Oracle 10g client. Initially there were problems with the worker and main threads colliding on the database connection. I fixed this by explicitly using Oracle Pro*C directives to manage separate database contexts.

There was one operation that gave me a lot of trouble. Part of the problem was that the worker thread interacted heavily with the main thread. At first I just implemented the brute force fix to put all the database calls in a separate context. However this resulted in a lot of Oracle run time errors. For a while this stumped me. Then I realized that some of the database work was actually done in the main thread. So I needed to modify some of the calls to use the database context for the main thread. Perhaps the problems stem from a tricky design that passes control back and forth between main and worker thread. Or maybe I was not paying close enough attention to the thread that was executing database code. I am just happy that this final problem has been resolved.
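
The rule I ended up with is that each database call must use the context that belongs to the thread executing it. Here is a toy model of that rule in standard C++. It is not Pro*C and it does not talk to a database. DbContext, currentContext, runStatement and runOnWorker are all hypothetical names. In Pro*C the context would be a sql_context handle selected with EXEC SQL CONTEXT USE.

```cpp
#include <thread>

// Hypothetical stand-in for a per-thread database context.
struct DbContext
{
    explicit DbContext(std::thread::id id) : owner(id), statementsRun(0) {}
    std::thread::id owner;   // thread that owns this context
    int statementsRun;       // how many calls went through this context
};

// Each thread gets its own context the first time it asks for one.
// No context is ever shared between threads.
DbContext& currentContext()
{
    thread_local DbContext ctx(std::this_thread::get_id());
    return ctx;
}

// Simulated database call: always uses the executing thread's context.
void runStatement()
{
    ++currentContext().statementsRun;
}

// Run n statements on a worker thread and report how many the
// worker's own context recorded.
int runOnWorker(int n)
{
    int recorded = 0;
    std::thread worker([&recorded, n]() {
        for (int i = 0; i < n; ++i)
            runStatement();
        recorded = currentContext().statementsRun;
    });
    worker.join();
    return recorded;
}
```

If the main thread calls runStatement once and then runOnWorker(3), the worker's context records three statements and the main thread's context still records one. My bug was the equivalent of charging main thread calls to the worker's context.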

Now I need to get a release version of the code to build and run. But that is a story for another post.

Hack Versus Engineering

Our legacy software has a directory chooser in it. The code makes use of the Open File dialog. However there are some modifications to the Open File dialog to make it select directories instead. This worked in Visual Studio 6. However the code aborts when compiled under Visual Studio 2005. I passed this problem to the guy responsible for the module.

The assigned developer started to research a solution to the problem. But he found himself encountering a lot of problems that might put us behind schedule. He escalated this situation to our team lead. So now I have been assigned the task of making the directory chooser work. It was recommended that I swap the code for a well designed implementation of a directory chooser. But like the other developer, I have a lot of tasks on my plate that need to get done soon.

Apparently here is what has happened. Some controls on the open file dialog have been changed. Our system's modifications assume that the controls will not change. So our code is trying to reference a control that no longer exists. Changing our code to reflect the new control stops the program from aborting. However the directory name is strangely right aligned in the edit control that our code adds to the dialog. It looks like I just need to modify this window's style. And wouldn't you know it. I could not knock out some code to reset a bit in the style DWORD. At least I was not able to do it in a couple minutes. I want to make sure the resultant code is easy to read and understand by someone who might look at it in the future. And I do not want it to be some cheap hack. I am looking to do this right. Hopefully after a good night's sleep I can get it done.
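
For the record, the bit manipulation itself looks something like the sketch below. It is plain C++ with no Windows headers. The constant kStyleRightAlign is a stand-in whose value mirrors the Win32 ES_RIGHT edit control style (0x0002), and clearStyleBits and hasStyleBits are hypothetical helper names.

```cpp
#include <cstdint>

// Stand-in for a window style bit. The value mirrors Win32 ES_RIGHT,
// but this sketch does not include <windows.h>.
const std::uint32_t kStyleRightAlign = 0x0002;

// Return the style with the given bits cleared. All other bits are untouched.
std::uint32_t clearStyleBits(std::uint32_t style, std::uint32_t bitsToClear)
{
    return style & ~bitsToClear;
}

// True when every one of the given bits is set in the style.
bool hasStyleBits(std::uint32_t style, std::uint32_t bits)
{
    return (style & bits) == bits;
}
```

In the real dialog the style value would come from GetWindowLong with GWL_STYLE and go back with SetWindowLong, or through MFC's CWnd::ModifyStyle. Whether the edit control actually honors an alignment change after it has been created is something I still need to verify.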
