One would think you could do a new build with the new official number, update some docs, and be on your way. No such luck here. The real crime is that the release went through peer review without catching any major problems.
Lucky for us, we have a diligent test team. They called me over many times today to point out the problems. I told them to just annotate all the issues. I dove into the situation when the release was returned to application development.
The major issue was that the build had failed. The build scripts just packaged up the old components. Everybody thought the build worked. We can fix this by designing and implementing better error checking in the build. But this was just the beginning of the problems with this release.
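A minimal sketch of the kind of error checking that would have caught this, assuming the build can report the compile step's exit code and the time the build started. The function names and parameters are hypothetical and are not taken from the actual build scripts. The idea is simply to refuse to package anything unless the compile step succeeded and every component is newer than the start of the build.

```python
import os


def stale_artifacts(paths, build_started_at):
    """Return artifacts that are missing or older than the build start time."""
    stale = []
    for path in paths:
        # A leftover component from a previous build has an older mtime.
        if not os.path.exists(path) or os.path.getmtime(path) < build_started_at:
            stale.append(path)
    return stale


def package_if_fresh(compile_exit_code, paths, build_started_at):
    """Refuse to package unless the compile step succeeded and produced new files."""
    if compile_exit_code != 0:
        raise RuntimeError(f"compile step failed with exit code {compile_exit_code}")
    stale = stale_artifacts(paths, build_started_at)
    if stale:
        raise RuntimeError(f"refusing to package stale or missing components: {stale}")
    # Hypothetical hand-off point: the caller packages exactly these files.
    return list(paths)
```

With a check like this, a failed build stops with an error instead of quietly producing an installer full of old components.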
I do not know how to fix the true source for most of these problems. Perhaps we just need better people on the job. But you would think that you could put in a process that allows even mediocre employees to produce adequate software. Once again the test team saved the day by catching the problems before we shipped the junk out the door.
I dug into the problem. Found that some new SQL in the application did an improper join. So now queries were always returning No Data Found. I provided the responsible developer with the code that would fix the problem. And I told them to release the fix to test.
The following week test reported that their progress was halted because they never received the fix. A manager asked if I could step in. So I searched for the release that fixed the problem. Turns out it was never done. Even worse, it was scheduled, and some Clearcase labels were applied to the code. But no executable. And no documentation.
The configuration management team had left for the day. So I resorted to emergency procedures. I generated a build myself. Created the documentation. Did a smoke test on the fix I had pointed out originally. Luckily there was one developer left at work for peer review. The show will go on. But something is seriously wrong here.
I guess the testers never paid much attention to password change functionality. Well they are now. And they are finding all kinds of strange behavior. My guidance to them is to make sure they can replicate any problems. Then throw them over to development to resolve.
Our overall end goal is to (1) make sure we meet the client's requirements, and (2) ensure that error messages to the user are informative and useful. This is no small feat. It does not help that previously any error would result in the big dialog box that listed all the rules. I consider that poor.
I could go on about this subject. But right now I got a new trouble ticket based on a remote scenario. If your password expires, and my app forces you to change your password, but you type in your old password incorrectly, you get an unusual error message.
Some of our problems are expected. We just branched out in Clearcase for our baseline code. This version of code does not have any new features that are scheduled for a later release. Every time we branch out there are some build pains. It is problematic now because we did the branch right before the big release time.
I had a feeling things were falling apart when I kept getting calls from the configuration management guys. They seemed to keep getting confused about which branch they should be getting the code from. I tried to help out as best as I could. But CM is supposed to dictate this to me, not the other way around.
A number of other delays are due to improper setup. If you are a developer, and you need to do a build, you cannot wait until the day of the build to try it for the first time. I can guarantee that it is not going to work. For now I am sitting back. If I swoop in and solve all the problems, it will only encourage future problems and more work for me. You have to feel some pain before you can help prevent pain.
To get ready for the release to production, we needed to create new build scripts which access the new Clearcase branch. You would think this was a simple task. It got assigned to the developer who created the build scripts in the first place.
I knew we were in for a rough ride when this person said their Label View in Clearcase was not working. Here is how our build works. It creates a Clearcase label, applies this label to the latest code, then modifies a special label view to see just the code that has the label. The build then uses this view to get a copy of the code to build.
The problem is that the configuration management team does our builds now. And an ambitious system administration team removes your Clearcase views if you do not access them every 30 days. Turns out our developer's view had been deactivated. Today is the day after Christmas. You think we can get this view problem fixed today? Unlikely.
Luckily I had an Outlook reminder that pops up each week. It tells me to run my "Keep Alive" program that touches each of my views. That way the sys admins do not delete the views I need. I was able to help the developer test out the changes to the script so we can build off the new branch. Saved.
But another tester came over and said some problems he found were not fixed. I asked if these problems would prevent the release from going out. He said absolutely so. I got the feeling he wanted somebody to pay more attention to the minor bugs he found. But we were on a roll knocking out problems that day so I indulged him.
On one query screen he saw the word "Group". So he put the group number in that field. But the software did not find the record for that particular group number. I looked up the record in his database. Sure enough it was there. So I got a login to the test database and ran some queries with the app. Lucky for me the app dumps out the exact SQL executed to a log file. When entering the group number, it looked like the app was querying against the group name column. I dug into the resource and found the full word on the screen should have been "Group Name". The tester had a lower screen resolution so the word got clipped.
I went to the tester and explained that this was the Group Name field. When he put the group name in the field, the record was found. Chalk this one up to operator error. Lesson 1: run the application with the required screen resolution. Lesson 2: not all trouble tickets are mission critical.
Normally government contracts are steady work. The work is done by contractors that have long (~5 year) contracts. This leads to more stability for both the client and the contractors. However the contracts get put up for rebid approximately every 5 years as well.
Nobody can say for sure why my company has lost the recompete for the contract. Maybe our bid cost too much for the client. Or maybe it was because during the current contract we botched a port to a web version of our client server application. A buddy told me sometimes the government changes contractors just to shake things up. The bottom line is I need to make sure I still have a steady paycheck coming in 2 months from now.
Now I have been working for my company for 4 1/2 years. But I previously worked for the company in the 90s. So I found that I could bridge the experience to count towards benefits. However for each benefit I needed to fight with the Human Resources machine to make it happen.
Getting my 5 year gift was no different. I called up HR and submitted a call ticket. All I got was an e-mail stating I did not have 5 years and the ticket was closed. After replying that I had 7 years, just split between two jobs, I was informed I needed to submit another call ticket.
I should have known then it was going to be a rough ride. But I agreed to the pain, and submitted another ticket. Got a call saying my dates would be updated and the request forwarded the following Monday. Couple weeks later no gift. So I called the HR person back. Left a message. Waited a week. Repeated this for a few weeks until I got tired of this. Finally I got a call a month later saying my dates were updated and the request would be forwarded to the person who does 5 year gifts.
Once again I should have known I was getting the shaft. No responses to my e-mails. I called up the HR help desk and found my second call ticket got closed as well. The solution? Open a new call ticket. In programming we call this the infinite loop. I don't like to waste my Project Manager's time with stuff like this. But I was getting nowhere fast. He said he would call the local HR rep. No luck just yet.
In all of this, I have tried to walk away with some lessons learned. Most of our work consists of maintaining a suite of legacy software applications. I get assigned trouble tickets on my software all the time. Sometimes we developers are in a rush to close out these tickets. Sometimes the tickets get reassigned to other developers, teams, or even other projects. I need to keep in mind the non-service that my company HR team has given me. I try to be diligent, keep customers informed of trouble ticket progress, and work hard on the hand-off to others. This won't fix my HR problems. But it makes for a less disgruntled customer when the application breaks down on them. Hey. Somebody has to pay it forward.
I got a call from a tester about some strange behavior. This did not sound right. Unfortunately I was busy waiting for another call. So the tester volunteered to come to my desk to show me.
The first problem was simple. Although the software appeared to disable some fields at the wrong time, this was operator error. The tester was not carefully selecting the right choices in a drop down.
The second problem was more serious. The software would not behave the way I described in my change notes. I verified that it was not working right. Still had to stay by my phone. So I called over another developer whose code I had included in the build. Assigned them the task to track down what was wrong.
Here is the crazy answer. The configuration management team ran our scripts to do a build. This resulted in a new executable that looked fine. Even had a new version number built into it. However the build failed, and the build process just took the components from a previous build and packaged them up into a new install program.
Shame shame. The real kicker is that our team rewrote the build scripts. This did not use to happen with the old evil Korn shell build scripts. Luckily we have a diligent test team. Note to development: Fix your build scripts.
Our test team called up and said they could not access the FTP server where our installs are located. Sure enough. I could not access it either. First line of defense was to call our company's Help Desk.
The company Help Desk asked me a number of questions. They stated the FTP server was hosted on a customer machine. So I needed to call the customer help desk.
I am working to get my release out the door. So I call the customer's help desk. They want to make sure it is not my company's problem. I told them I just got off the phone with the company help desk. To be certain, the customer's help desk conferences in my company help desk. Everyone agrees the customer troubleshooting team should work this problem.
We have 50 people on our team. I told them this was affecting all 50 people. The trouble ticket got assigned the highest priority. That's when the real fun began. I started to get a lot of calls at my desk phone. When a trouble ticket of the highest priority is opened, all kinds of heat is applied. I should know. Sometimes the trouble is with our software.
By the end of the day I am back in touch with my company's help desk. Apparently somebody in the networks division decided to rename the domain name for our FTP Server. I wanted to track down the perpetrator and give them the beat down. Unfortunately I had my hands full with other software releases that now had instructions to use the wrong domain name to download the software. Nice.
So I walk by the cubicle of two developers that are on my team. One of them has their head leaning back due to sleep. People get tired. I get it. But when I go back to my desk I hear the developer snoring loudly. I felt a little sorry for the other person in the cubicle.
Come on now. You got to stay awake. Drink some caffeine. Read something interesting. Work on a side project. Or do the work to make sure the little we have to do actually gets done already. I think our team lead went over and made some noise in the area for a wake up call.
Normally I am quite concerned about the development release schedule. But I am just a developer on my project. If I do not know what the plan is, chances are nobody else does either.
Sometimes it is a developer's job to bring such mysteries to the surface and force resolution of the problem. I ended up in our project manager's office. We got the due dates for the different pieces we need to deliver. We got the mechanism on how each piece was going to be delivered.
I went back to my desk and started scheduling the builds to get the software delivered to the interested parties. Things are very easy when your developers know the plan. It also helps if there is a plan, and an associated schedule. Otherwise, you will have pandemonium.
- name is Gattoussi Ramzi
- lives in Tunisia
- has no copyright law in his country
- is a good programmer
- wants a job at your company
- has decompiled and reconstructed your source code
- can provide evidence of such code
- will build web site to sell your software if you do not cooperate
Luckily I have never received such an e-mail threat. I work on specialized systems whose source code normally only goes to one client. If somebody wants to take our source code and sell it via web site, I would wish them good luck.
But I could imagine how this would be detrimental to a company that sells commercial software. I am not even sure how one would combat such evil tactics.
At least the CM Team knew what was wrong. I did not label the directories in Clearcase. You need to do this if you add new files to the directory. I knew this. But I did not know there were new files involved. I set up the build on behalf of another developer that was out sick.
Luckily I had just learned about WinDiff earlier in the day. I compared the CM branch with the development branch in Clearcase. Located all the directories that had new files. Applied a label and let the CM team do a rebuild.
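The same comparison can be scripted. Here is a rough sketch using Python's standard `filecmp.dircmp`, assuming both branches are available as ordinary directory trees on disk. The function name and the two root arguments are hypothetical. This is not what WinDiff does internally, just a way to get a similar list of directories that contain new entries.

```python
import filecmp
import os


def dirs_with_new_files(dev_root, cm_root):
    """Return (directory, names) pairs for entries in dev_root that cm_root lacks."""
    found = []

    def walk(cmp, rel):
        # left_only holds names present under dev_root but absent under cm_root.
        if cmp.left_only:
            found.append((rel or ".", sorted(cmp.left_only)))
        # Only directories that exist on both sides are descended into.
        for name, sub in sorted(cmp.subdirs.items()):
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(dev_root, cm_root), "")
    return found
```

Each directory in the result is one that would need a label before the rebuild. Note that a brand new directory shows up as a single entry under its parent rather than being listed file by file.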
The normal process is for application development to peer review the result of the CM build. I did some review and corrected a number of problems. But since I initiated this release, I could not peer review my own work. The problem was that at 8PM there were no developers left. So I put out a memo to the development team, pretty much saying that the first one in the next day had to do the peer review.
I have a good feeling we are going to make our delivery date. The cost? Me and the CM team had to stay at work until 8PM. Lucky for me I work the late shift normally anyway.
When I got to the office I had a voice mail from the boss. Turns out our customer needs a new build by tomorrow. And I am supposed to do the build. The problem is that this build needs all the new stuff that got put in. And the guy that put it in was out sick today.
So I called up the guy and he told me that the WinDiff utility could show me the files I needed to include. I knew the folder where configuration management keeps the source files they have so far. I used WinDiff to compare that directory with our development directory.
It did take some work. But I quickly identified the files I think constitute the "new stuff". This would have taken forever if I did not have WinDiff. That's a real life saver. They are doing the official build for the customer right now. I got my fingers crossed.
Here are some of the tools I have installed on my workstation.
- $1,744 Rational Clearcase
- $1,645 Rational Clearquest
- $2,148 Rational RequisitePro
Although I don't use RequisitePro, I might have to start it up once now that I know how much it costs. I think our client pays for huge licenses so they get a big discount. Still that's a lot of bread.
This developer who helped out was the guy who developed the build scripts. Since the first build went wrong, he kept an eye on the second build done by the configuration management team. It is easy for him to do this since he coded the build scripts to e-mail him every time a build completed.
He came by to deliver the bad news. The second build got screwed up too. The problem stemmed from the fact that I update a configuration file that is used by the build. I put a Clearcase label on that version of the configuration file. The configuration management team then tries to use Clearcase to merge this version of the file into their branch. This merge did not work with either build today.
Using the current process, it seems they had a 50% chance of getting the merge right. But I guess they did not understand what they were merging. I brought the author of the build scripts upstairs to sort this out with our configuration management team. I told them that I set up all the configuration data for each build I request. We agreed that they would stop trying to merge these changes in, and just cut-and-paste the file I label.
Sometimes you just got to sit back and laugh.
I thought this would be a trivial task. Just duplicate the problem, break into the code in debug mode, and find out why it was happening. Easier said than done.
Part of the difficulty was that the text of the error message did not make a lot of sense given its context. A bigger problem was that the code was very complex. There were related classes involved that had data members with the same name. Ouch.
So I reverted back to the techniques I have learned over the years. Set some breakpoints to find the code closest to the problem. Use the "find in files" (aka grep) technique to try to search for where important variables are set. In the end I traced the problem back to a missing item in an array which described all the database columns our query was retrieving.
Sometimes you just need experience to plow through someone else's code and debug problems.
When at work I try to maintain a professional attitude. However that does not mean I let e-mails like this go unchecked. So I scheduled a meeting with the test manager, project manager, and our whole dev team.
I opened the meeting by stating that I was offended that I was accused of not following our software process. This was just not true. However I gave the test manager an opportunity to explain further. Apparently this was just a reply-to-all e-mail which was sent based on a statement made by a tester. The e-mail was not necessarily directed towards me.
As soon as I determined the intent, I was relieved. We spent a little more time discussing how we would resolve the problems found. Apparently this had all been a failure to communicate. Having a sit-down meeting was a good venue to resolve the issue. Starting off an e-mail flame war would have been a step in the wrong direction.
Just as I was about to go to lunch, somebody needed another build. Each developer on our team is supposed to know how to run a build. But it seems only the chosen few actually do them. So I kicked off a build and ran out the door.
Prior to leaving I had the option to sign off that I peer reviewed the build documentation. The way this is supposed to work is I read the docs, mark them up, and sign when they are ready for release. But since this would require me skipping lunch, I just signed them. Hey. I was hungry.
The last pain of the day was when a fix I did came back to me as broken everywhere. I installed the version of the application that everyone had. Could not duplicate the problem. So I pinned down one of the people who found the problem. Seems they were using another part of the application. But I suppose my code broke that part as well. This seems to be a case of poor unit test coverage.
Yep. It was a Monday.
On the way back from lunch, they close all lanes on the highway. It takes me forever to make it to the next exit to get off the road. By then I knew this was going to be a very bad day.
When it rains, it pours. All the networks go down an hour after I make it back to the office. And they remain down for the rest of the afternoon. Help Desk said we were warned.
Is there any moral to this story? Could be that the early bird gets the worm. Note to self - get in early next week.
Normally I am all for mixing it up to stand out. However the example provided was a bit unusual. The employer said he got a cover letter from a girl. And the girl included interests like "long walks on the beach" and "romantic candlelight dinners" in her cover letter. WTF?
Who knows? Maybe if I were hiring and a supermodel were applying, it would help her to put unusual phrases like that in her cover letter. But why bother with that?
If you are a supermodel, you can just attach your picture to your resume. No need to waste time. I probably wouldn't penalize a supermodel for spelling mistakes either.
The more I thought about this rollback script, the more I felt like it was cheating our users. It takes a lot of customer effort to start up the new year. Why make them go through the pain twice next year? I could blame it on the schedule. But that would be irresponsible.
So I asked a member of our requirements team to set up a meeting with our client. I wanted to make sure they understood the prior plan to execute my script, undoing a month's worth of startup work. The goal was to discuss a better way to handle the task.
Our project manager likes getting carbon copied on important e-mails like this. He responded with some ideas that did not make sense. So I went to have a talk with him. He threw out some ideas on how to solve the problem. I provided him with the pros and cons of each idea. In the end we agreed upon a solution that does not create a lot of work for me, and saves much time for the customer.
I find it refreshing to work under a project manager who actually started out in development. They can slip back in developer mode and understand the technical issues I am dealing with.
So I got the latest copy of the code, but could not duplicate the problem. Next I went and retrieved an exact copy of the code that CM used for their official build. We actually have configuration management practices that let me trace the version of every file used in the build. But even with this version of code, I could not make the problem happen in Debug or Release mode.
The CM Team recommended that I let them try to do a build again, in case this was a one-time fluke. No such luck. The problem persisted even in the new build. So I went to the build machine and ran the application built there from Visual Studio. Could not make the problem happen. At this point I knew we had a very strange problem on our hands.
So I went through our whole application suite, trying each executable and DLL one at a time. I still could not make the problem happen after copying all our target files to the development folder. So for kicks I tried copying some of the system files we deploy during install. And this is where I stumbled upon the problem. Turns out the MFC DLL we ship was not getting properly installed into the Windows system directory.
Since I was in a hurry, I just changed our install code to copy a local copy of the MFC DLLs to our application directory. This works but is not elegant. Unfortunately it may have to stay this way.
At first I thought I had better investigate. Then I decided against it. I already had a lot to do today. So I just chugged along. I did feel sorry for the person who had to share a cubicle with snake-woman. For you Harry Potter fans, Salazar Slytherin would be proud.
With it being Thanksgiving, we have some family visiting us from out of town. Grandpa wanted to play some PC games with the kids. He is a computer tinkerer. And he was able to temporarily bypass the Vista restrictions and install a game rated M for Mature on our PC.
Here is where the funny part comes in. Gramps either logged off then back on, or rebooted the machine. He then found out that Vista located the game that violated the ratings policy. Vista then proceeded to uninstall the game on him. LOL. He came and asked us for help in turning the Vista security off for good. Nice try guy.
New guy did not make much noise. So I figured he was able to do the build successfully. Luckily our team leader kept checking in with new guy. Seems more like the new guy kept quiet because nothing was working.
Our team lead got fed up and started to worry that the build would not be fixed in time. So he sent me in to help out. First problem was missing custom build instructions for a Pro*C file. Second problem was missing include directory options. Final problem was harder. But in the end we found a developer was removing files from the source code repository that the build expected.
Nobody expects a new guy to step up and figure out tough problems on his own. But we do expect you to dig in and do the hard research. More importantly, you need to speak up when you need help. Otherwise we think everything is OK and find out the truth too late.
Apparently a novice programmer got a job as a technician in a PC repair shop. In his down time, he coded up a customer work order system. This replaced a piece of garbage system that cost the owner $1500.
Then this novice programmer wrote a Computer Cleaner application from scratch. The app was done in time to market for the Xmas season. Preorders alone brought in $50k of revenue. Altogether the work this novice programmer did increased revenues from $20k to $350k in one year.
Turns out the not-so-novice programmer asked for a raise from his paltry $22k a year. In the end he got fired. I consider that a good thing. If this guy could rocket revenues up to $350k a year, he needs to start his own software business. That way he can pocket the big earnings for himself.
Our goal was to keep the new change branch a superset of the baseline branch. Any time we made a change to baseline, we were supposed to also make the exact same change to the new change branch. At some point in the future we plan to fold the new change branch into baseline, making a new baseline.
Today I was trying to promote some baseline fixes to the new change view. The changes were significant. So I asked our developer most familiar with Clearcase whether I could just copy and paste the baseline version to the new change view. The Clearcase expert told me to just graphically merge the files using Clearcase. I felt a bit uneasy about this. But I thought I might as well give it a try.
At first I could not see the right branches to select where to merge to and from. Seems that I first have to check out the file on the destination branch before seeing it. But even after that I could not check in the resulting changes from the merge. Apparently the act of checking out the file caused the new version to "move" from the baseline to the new change branch.
After some meditation I realized my visual model of the Clearcase branches was not correct. I had assumed that when we branched out, a snapshot of all the versions in baseline at that time would be the starting point for the new change branch. But this was not so. The new change branch equals the baseline branch for every file you have not started branching out new changes into. Who would have thunk it?
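Until a picture comes along, the corrected mental model can be written as a toy lookup. This is only a sketch of the behavior described above, not Clearcase's actual implementation. The branch name `new_change`, the version strings, and the function are made up for illustration.

```python
def visible_version(versions, branch, fallback="baseline"):
    """Pick the version a view on `branch` would show for one file.

    `versions` maps a branch name to the list of versions checked in on it.
    """
    if versions.get(branch):
        # The file has been branched, so the view sticks to the branch's latest.
        return branch, versions[branch][-1]
    # Not branched yet: there is no snapshot, the view shows the latest baseline.
    return fallback, versions[fallback][-1]


history = {"baseline": ["v1", "v2"]}
print(visible_version(history, "new_change"))  # ('baseline', 'v2')

history["baseline"].append("v3")               # a later baseline fix
print(visible_version(history, "new_change"))  # ('baseline', 'v3')

history["new_change"] = ["n1"]                 # first checkout on the branch
history["baseline"].append("v4")
print(visible_version(history, "new_change"))  # ('new_change', 'n1')
```

The middle call is the surprise: a baseline fix shows up in the new change view automatically, but only for files nobody has branched yet.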
Maybe I will figure out a way to describe this version control configuration in a picture. For now I hope my words suffice. And as usual, you should be careful which abstractions and assumptions you make.
Here are some of the problems I found in the software release documentation:
- Wrong build number listed
- Wrong file timestamp listed
- Not documenting special circumstances of release
- Filling out sections of the document that our customer reserves for their use
- Filling out other sections incorrectly, violating customer policy
These are the basics that we should always get right. Perhaps developers and configuration management know they can be sloppy since we have a process of review that catches most of these problems. Or maybe people have Thanksgiving fever and just want to slap something together and go on vacation.
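The first two items on that list are mechanical enough to check with a script. A minimal sketch follows, assuming the release document's build number and file timestamp have already been pulled out by hand or by some other tool. The function and its parameters are hypothetical and nothing like this exists in our process today.

```python
import os
from datetime import datetime


def release_doc_problems(doc_build, doc_timestamp, actual_build, artifact_path):
    """Compare what the release document claims against the built artifact."""
    problems = []
    if doc_build != actual_build:
        problems.append(f"build number: doc says {doc_build}, build is {actual_build}")
    # Compare to the second; documents rarely list finer resolution than that.
    actual = datetime.fromtimestamp(os.path.getmtime(artifact_path)).replace(microsecond=0)
    if doc_timestamp != actual:
        problems.append(f"timestamp: doc says {doc_timestamp}, file is {actual}")
    return problems
```

An empty list means the document and the artifact agree on those two points. The remaining items on the list still need a human reviewer.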
I know I personally perform quality control on documents I generate before I pass them on to others for review. But that's just me. Maybe that's what our process is really all about. No matter what the skill or level of effort performed by the staff, the process still is set up to identify and resolve problems before we ship stuff out to our customers. Now this usually only works on big projects with big budgets. If I worked at a startup, I imagine there would be minimal process and more individual responsibility to get things right the first time.
This is where I come into the picture. The developer asked me to do a peer review on the documentation for the release. So like any good reviewer, I pretended like I was the customer and followed all instructions in the release document. Found a couple clerical problems that could be fixed real quick. I also found a show-stopper: the install program did not install the application. So I informally used my veto power and held up the release.
When I went to discuss this main issue with the developer, he said he also found that the install program did not actually install the application on his computer. But he said it worked on another machine. At this point, the install appeared to only be working 1 out of every 3 times. I don't like these odds.
Due to the fact that this was a critical release, I volunteered to dig in and find out why the install program was not working. Our build scripts are written with Apache Ant. The scripts call Visual C++ to produce the EXEs and DLLs. The scripts also call Installshield to convert these into install files that we deploy. I think the scripts also use WinZip and/or Ant to turn the final set of files into one self-extracting executable that we deploy.
So I started by manually extracting the files. Everything looked good. Then I ran the install in verbose mode. No errors seemed to come up. I tried closing out all other Windows apps before running the install. No luck. I tried uninstalling a lot of other applications first. Still no clues. Finally I ran the install in verbose mode one more time and looked for anything unusual. Even though the install went by fast, I saw some of the files it was unpacking and installing. These files were not part of the application that my coworker was trying to release. These were from another application in our suite. That was it.
Turns out somebody took the install executable from one of our other applications, renamed it to look like the latest release we needed, and passed it on to development. This in and of itself was a heinous act. But the real crime would have been if we allowed this release to go out even after crucial problems were detected during peer review. Luckily our process saved us.
Apparently the one combo box that intially gets the focus kept changing when the user scrolled the mouse wheel. This even happened after the user clicked on the vertical scrollbar.
The reaction from most people on the project was to say thats how the control with the focus behaves. However I take all trouble tickets seriously and did not want to blow off the concern. So I consulted what Microsoft Word does in this scenario. And sure enough - Word will not scroll the control in focus if you click on the scrollbar and scroll the mouse wheel. Microsoft Word scrolls the whole dialog screen.
So far I have only started looking at ways to quickly fix this problem. At first I hacked in a handler for WM_LBUTTONDOWN on the main dialog. The handler tried to to send a WM_KILLFOCUS to the control that had focus. Like most hacks this did not work. And what did I do? Make a more complex hack. I created separate hidden button to which I switched the focus on WM_LBUTTONDOWN.
The hacks only worked for some left mouse clicks on the dialog. If you click on the scroll bar area, you need to handle WM_NCLBUTTONDOWN because that is a non-client area of the dialog. This hackology was getting too deep without consistent results. So now it's time to go back to the drawing board to fix this problem right. Any ideas?
I like reading the latest posts of programmers links. A recent link brought me to a page about managing complexity. The page was apparently written by a dude named Phil Haack. But this was only his handle, not his real name. He claimed to be a senior project manager at Microsoft.
The more I read this guy's web page, the more I doubted he really worked for Microsoft. I mean the page was littered with Google ads. And he had a gmail e-mail address. LOL! What Redmond manager worth his salt supports Google publicly? The guy posted his resume and surprise: Microsoft was not listed on it.
I believe in doing due diligence. Since this Phil Haack piqued my interest, I decided to dig a little deeper. Then I found the info that cleared things up. "Phil" only recently got a job offer from Microsoft. He was going to be a senior project manager at Microsoft.
It will be fun to see whether this Haacker changes his public image after joining The Borg.
Now we are at the point where we are supposed to deliver the first additional change to testing. Surprise. The software is not ready. The delay is not entirely due to the late addition of the feature, but mainly because it took a long time to agree upon the requirements and design. Apparently a lot is riding on this first delivery. The customer community at large is using this delivery as a milestone to determine the confidence they have in development.
Some of the comedy in this situation is that I heard even the week before the delivery there was new customer talk as to how the software should work. Luckily I have little to do with the first major feature. Those guys doing that development are putting in the overtime. At least they are getting paid extra.
Another funny thing about the situation is that you need to cut corners when you are behind. Instead of putting a spreadsheet control in the application, they are just launching Microsoft Excel. Sounds like a valid shortcut. But for a while there were even some problems doing this programmatically. There was some weird green triangle showing up in the Excel title bar. I don't know much of the detail behind this because I am trying to stay out of it. There is just something hilarious about the whole ordeal.
A couple things happened on the way downhill. At first our documentation team got cut, leaving us with contract documentation support when needed. Now we don't have anybody. So we have to sacrifice one of our programmers to be the documentation guy.
Another thing that happened was we went to Rational Rose. This in and of itself is not a bad thing. The theory actually sounded quite good. Do all of your design in Rational Rose. Then run a report to extract the design info to Microsoft Word output. The problem is that this report process doesn't really work too well. So now when we need to get documentation updates, we make the change manually in the Microsoft Word file AND update the Rose source.
Something is very wrong with this process. It is no wonder I have little desire to add all the good design information to the design docs. Somewhere along the way to having the system fully documented in a nice tool like Rose, we got derailed into a documentation nightmare. So much for documentation being the power of information.
Our installs do not do anything out of the ordinary. They unpack some DLLs and register them. Set up some keys in the Windows registry. Put some icons on the desktop and add links to the Start menu.
We have been doing fine with our original version of Installshield for so long that I did not even know that Installshield the company was bought out by Macrovision way back in 2004. This might be a testimony to how well this package does its job.
I have heard about some installation/scripting packages from Microsoft. Perhaps it may be time to look at them further if we ever get around to moving our apps to dot Net. But since we have a good size base of around 600 users, and plan to get a whole lot more next year, there is a management desire to go to the web. System administrators don't want to worry about pushing our apps out to the desktop when they can just update one web server.
Our normal build script gets the code to build from the main Clearcase view. I asked the author of the build scripts to give me a version that would build from the new view. With the new script in hand, I tried to get a build done. Compiler kept complaining about missing files. The way the scripts work is that they label the latest code, update a build view to access all labelled files, then copy the files from this view locally to build.
At first I thought the missing files were not getting labelled. A quick check in Clearcase showed some missing files were checked out by another developer. After a quick team pow wow, I got the developer to unreserve the files. Build still did not work. Then I saw that the label was getting applied correctly. Time to take a closer look at the build scripts. The evidence pointed to the scripts not updating the view to look at the correct branch in Clearcase. Got the script author to fix this and the build was a success.
Part of problem solving is to make hypotheses about observed events. The key is to verify that all assumptions are true before treating the hypotheses as fact. It is all too easy to skip this step. The result is that you travel down wrong paths and don't know where to turn.
There was an opportunity to move to a more recent IDE when the system was being re-engineered a couple years ago. Unfortunately the powers that be decided to switch to Java. The price was right since the Java IDE was free. However the re-engineering effort failed and we had to resurrect the Visual Studio 6 version of our code.
Currently there is not a huge business case to get the client to invest in Visual Studio 2005. That will require money to purchase the licenses for our entire team. A higher cost will be the work to port the current code to the new compiler. All of this will provide little to no benefit to the business users of our application suite.
I am thinking that, as some of the developers gain subject matter expertise, a good reason to upgrade is to keep the developers happy. It is a no brainer that hiring replacement employees is expensive. If we can tie the upgrade to a crucial business enhancement as well, we may have a slam dunk proposal.
Recently this team submitted a trouble ticket. I could not duplicate the problem. The trouble ticket had lots of information. But I suspected part of the problem was their data.
In the old days, I was on good terms with this team. They let me log into their database. Now there is a whole new test team. And they are reluctant to give me a login. Therefore I have to "Debug by Phone" to resolve problems.
Try to picture the difficulty of running queries by talking to someone on the phone who does not know SQL*Plus. Bottom line is that I cannot solve their problems quickly. This is not good because I have a lot on my plate. On that note, time to get back to coding.
The tester got back to me and said the script was not working. He tried to run it a couple times. But it kept aborting after 10,000 rows were added. I got his database login and tried running the script myself. Same problem. This was strange because it worked every time in my development database.
The weird thing about the problem was that the error was not occurring directly in my script. Instead it was failing in an audit database trigger. The problem always happened after the 10,000th row was inserted. I looked at the source code of the trigger. The code obtained a unique number using a database sequence. It then used the unique number as the key to insert records into the audit table.
It took a while to come up with the test that exposed the source of the problem. I checked all existing keys in the audit table. And wouldn't you know it? Some of these were higher than the numbers coming back from the database sequence (which was supposed to provide new unique numbers). After cycling the sequence past the duplicate keys, the script ran fine.
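To make the failure mode concrete, here is a small Python simulation. It is only a sketch, not Oracle code and not the real trigger. The `Sequence` class, the function names, and the tiny key values are all made up; the point is that a sequence lagging behind keys already in the table fails as soon as it catches up to them, and cycling it past the highest key clears the problem.

```python
class Sequence:
    """Tiny stand-in for a database sequence: hands out increasing numbers."""

    def __init__(self, start=1):
        self._next = start

    def peek(self):
        return self._next

    def nextval(self):
        value = self._next
        self._next += 1
        return value


def insert_audit_row(audit_keys, seq):
    """Mimic the trigger: take a number from the sequence and use it as the key."""
    key = seq.nextval()
    if key in audit_keys:
        raise ValueError(f"duplicate audit key {key}")
    audit_keys.add(key)
    return key


def advance_past(seq, audit_keys):
    """Cycle the sequence until its next value is above every existing key."""
    highest = max(audit_keys, default=0)
    cycled = 0
    while seq.peek() <= highest:
        seq.nextval()
        cycled += 1
    return cycled


# Hypothetical state: keys 5..7 already exist, but the sequence is still at 1.
audit_keys = {5, 6, 7}
seq = Sequence(start=1)
for _ in range(4):
    insert_audit_row(audit_keys, seq)   # keys 1..4 insert fine
try:
    insert_audit_row(audit_keys, seq)   # key 5 collides with an existing row
except ValueError as err:
    print("insert failed:", err)

advance_past(seq, audit_keys)
print("next key after the fix:", insert_audit_row(audit_keys, seq))
```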
As a follow-up, I had a talk with our DBA Team Lead. Wanted to ensure this could not happen in Production. He told me I should have sent the problem to him in the first place. He knows I have a lot of important things to do other than debugging database issues. I will take him up on the offer soon.
The first chore was to locate the instructions to build the current year software. It has been a long time since we did this. Luckily I keep hard-copies of important documents like the build instructions. So I started following the instructions. I fail early by not being able to remotely connect to the Build Machine. I ask the configuration management team for help. They tell me to come on up to their floor.
At the CM team's cubicle, I find a row of workstations all controlled by a single keyboard. I told them I needed to find the current year Build Machine. Figured the thing might have been powered down. The CM guy did not know we have separate machines for the current and future year software. I ask myself why we let them control our machines.
Since the CM guys are getting me nowhere, I track down the guy that wrote the build scripts. He said the System Administrators renamed the Build Machine. I was able to piece together the new network name based on some rules the SAs used. With that I was able to log into the Build Machine.
Next step is to kick off a build. The build script errors out fast. Apparently it is trying to access a Clearcase view that no longer exists. I check all my views. The missing one does not seem like one I remember using. Had to speak again with the author of the build script. He suspected the CM team changed the view name on us. I sense a pattern with the problems.
When I finally got the darn app to build, I updated the documents that show how to do a Production build. I also check in an updated script with the correct Clearcase view. No reason any other developer should have to go through this pain.
There are 2 main documents generated by a developer for peer review: (1) Code Diff and (2) Unit Test Plan. The Code Diff tells a reviewer what you changed and why you changed it. The Unit Test Plan is an outline of how you debugged and verified your changes. Sometimes the act of actually writing your unit tests down improves their quality.
Recently I was asked to peer review a change in the sorting for a spreadsheet on one of our screens. The sorting was multi-column, along with some non-trivial rules for the sort order. Luckily I had a lot of domain knowledge for this part of the app. Here are the things I looked for when doing the review:
- Were the variable names chosen to provide meaning?
- Are there any "magic numbers" in the code?
- Are tricky pieces documented with meaningful comments?
- Was the code written so that it could be easily maintained?
In the end I had to go through two passes of peer review with the developer who coded the changes. The implementation worked. But the first cut was a maintenance nightmare. I tried to focus on the work product and not the developer. And I made sure the atmosphere was truly that of a peer assisting with quality improvement.
In the end, I think the routine turned out to be a solid piece of code. Only time will tell for sure. But I bet a new guy could come in, read the code, and understand what is going on without ever talking to the original developer.
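To show what I mean by those review questions, here is a minimal Python sketch of a multi-column sort. It is not the code from the review. The column names, the status ranking, and the sort rules are all invented for illustration. The idea is that the ordering rules live in one named table and one documented key function, so there are no magic numbers scattered through a comparison routine.

```python
# Hypothetical ranking of a status column; lower rank sorts first.
STATUS_RANK = {"open": 0, "pending": 1, "closed": 2}


def sort_key(row):
    """Order rows by status rank, then largest amount first, then name (A to Z)."""
    return (
        STATUS_RANK[row["status"]],
        -row["amount"],          # negate so that bigger amounts come first
        row["name"].lower(),     # case-insensitive tie breaker
    )


def sort_rows(rows):
    """Return a new list of rows in display order."""
    return sorted(rows, key=sort_key)


# Made-up rows for illustration only.
rows = [
    {"name": "b", "status": "closed", "amount": 5},
    {"name": "a", "status": "open", "amount": 1},
    {"name": "c", "status": "open", "amount": 9},
]
print([row["name"] for row in sort_rows(rows)])
```

A new rule means a new element in the tuple, which is the kind of change a maintainer can make without untangling nested comparisons.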
Some time in the last year, our company's system administration group implemented a new policy on views:
- Views not used in 30 days are made unavailable
- Views not used in 60 days are deleted
I can understand the rationale (no pun) of this policy. The administrators are tired of having lots of unused views taking up resources. Fine with me. But my problems started when I needed some infrequently used views. Since they had not been accessed in a while, they did not initially work. I tried to recreate them, but all I got were errors. WTF?
After submitting a system administration trouble ticket, and escalating the issue up the management chain, I got a little help. Apparently the automatic view deletion script is buggy. They left my retired views in a permanently unusable state. Great.
Here is my plan to combat this unfortunate set of circumstances: Write a program to make sure none of my views get stale. I am a programmer after all. Muhahaha. I bet the other guys on the development team will find my prog handy. I shall call it "Keep Alive".
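Here is a rough Python sketch of the decision logic "Keep Alive" would need. It is a plan, not the finished tool. The 30-day limit comes from the policy above; the 7-day safety margin, the view names, and the `touch` callback are my assumptions. What it actually takes to make Clearcase count a view as "used" is deliberately left to that callback.

```python
from datetime import datetime, timedelta

UNAVAILABLE_AFTER = timedelta(days=30)  # from the administrators' policy
SAFETY_MARGIN = timedelta(days=7)       # assumption: refresh a week early


def views_needing_touch(last_used, now,
                        limit=UNAVAILABLE_AFTER, margin=SAFETY_MARGIN):
    """Return names of views whose last use is close to the policy limit."""
    cutoff = now - (limit - margin)
    return sorted(name for name, used in last_used.items() if used <= cutoff)


def keep_alive(last_used, now, touch):
    """Call touch(name) for each stale view and record the new access time."""
    touched = []
    for name in views_needing_touch(last_used, now):
        touch(name)              # hypothetical hook that really accesses the view
        last_used[name] = now
        touched.append(name)
    return touched


# Made-up views and dates for illustration only.
now = datetime(2007, 1, 31)
last_used = {
    "dev_main": now - timedelta(days=2),
    "prod_fix": now - timedelta(days=25),
}
print(keep_alive(last_used, now, touch=lambda name: print("touching", name)))
```

Run from a daily scheduled job, something like this would refresh a view well before it reaches the 30-day mark.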
The description in the trouble ticket did not make a whole lot of sense. Rather than waste time guessing about this, I called the user directly. During our chat I determined which user operations were slow, and how long each one took.
Initially I was unable to replicate the problem in a development environment. So I queried the Production database to check the volume. Aha! Production had over 50,000 records. My test set only had 1,000. So I wrote a small PL/SQL script to generate test data that matched the Production volume.
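The real script was PL/SQL against our development database. As a stand-in, here is a minimal Python sketch of the same idea using an in-memory SQLite database. The table name, the columns, and the generated values are made up. The only number taken from the story is the volume of roughly 50,000 rows.

```python
import sqlite3

PRODUCTION_ROWS = 50_000  # approximate Production volume from the query above


def build_test_data(conn, row_count):
    """Create a hypothetical table and fill it with row_count generated rows."""
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
    rows = ((i, f"test record {i}") for i in range(1, row_count + 1))
    conn.executemany("INSERT INTO records (id, payload) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
build_test_data(conn, PRODUCTION_ROWS)
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print("rows generated:", count)
```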
Now I was able to experience the performance problems in development. Next I did some profiling to figure out where the app was spending all the time. Turns out that adding, deleting, and editing records were very slow. But the delay was not in the SQL code. The problem stemmed from a poor design. After each add/delete/edit, the whole data set was reloaded and reprocessed from the database.
I am currently regression testing smarter add, delete, and edit operations. Got to make sure I didn't break any functionality with the fix. So far a peer review of my first cut has shown that a couple things are broken.
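The difference between the two designs can be sketched in a few lines of Python. This is an illustration of the idea only, not the application code. The `RecordStore` class is hypothetical and a plain dict stands in for the database table. The slow path reloads everything after a change; the smarter path writes the change and patches only the affected row in memory.

```python
class RecordStore:
    """In-memory copy of a table, with a counter for full reloads."""

    def __init__(self, database):
        self.database = database   # dict standing in for the real table
        self.cache = {}
        self.full_reloads = 0
        self.reload_all()

    def reload_all(self):
        """Re-read and reprocess every row (the expensive operation)."""
        self.cache = dict(self.database)
        self.full_reloads += 1

    def edit_slow(self, key, value):
        """Old design: write one row, then reload the whole data set."""
        self.database[key] = value
        self.reload_all()

    def add(self, key, value):
        """Smarter design: write one row and patch only that row in memory."""
        self.database[key] = value
        self.cache[key] = value

    edit = add  # an edit touches a single row in exactly the same way

    def delete(self, key):
        del self.database[key]
        del self.cache[key]


# Made-up data for illustration only.
store = RecordStore({i: f"row {i}" for i in range(1000)})
store.edit(1, "changed")
store.add(2000, "new row")
store.delete(2)
print("full reloads after three smart operations:", store.full_reloads)
```

The catch, and the reason for the regression testing, is that the in-memory copy must stay identical to what a full reload would have produced.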
error LNK2001: unresolved external symbol
I was asked to help out. At first I did not know why he got the linker errors. But I asked some questions to get to the root cause of the problem.
- Did he get a linker problem for all his functions?
- How was the new code accessed?
- What exactly does this specific linker error mean?
- What other info did the linker provide on the errors?
It appeared that the function names in his new class were getting mangled. The new code was part of a DLL. He was trying to use the code in an EXE. I asked if he made the EXE depend on the DLL. He did, so that was not the problem.
I recommended the developer look up exporting C++ classes from DLLs in the MSDN help. He did not have MSDN installed locally. And wouldn't you know it? He could not access MSDN online either. So we googled "DLL export classes". First hit was DLLs Made Simple.
After putting __declspec(dllexport) in front of the class declaration, the linker problems went away. Maybe using DLLs really is simple. Asking the right questions and getting good technical information can make hard problems seem easy in the end.