I babysat the job this week. We ran it one day early to give us time in case we encountered errors. Of course, the job ran fine when I ran it during the day. So I declared that the next order of business was to replicate the problem in development. My old architect buddy thought it could be done. Maybe not. But I was going to give it the old college try.
I worked with a developer from the team that introduced the new performance job that runs at night. We both ran our scripts at the same time. After a couple of tries, I still had not replicated the problem. I told the dude to give me a copy of his code. Then I went to work. I spent 30 minutes reading up on the UNIX Korn shell. Then I wrote a Korn shell script to (1) generate new data, (2) run my job in the background, (3) wait a bit, and (4) run the new job.
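For the curious, one pass through those four steps might look roughly like the sketch below. This is not my actual script. The names generate_data, my_job, and new_job are hypothetical stand-ins that just print a line, and the syntax is plain POSIX shell that the Korn shell accepts.

```shell
# Sketch of a single trial. generate_data, my_job, and new_job are
# hypothetical placeholders; swap in the real commands.

generate_data() { echo "generated $1 records"; }
my_job()        { echo "my job ran"; }
new_job()       { echo "new job ran"; }

run_trial() {
    records=$1                 # how many new records to create
    delay=$2                   # seconds to wait before starting the new job

    generate_data "$records"   # (1) generate new data
    my_job &                   # (2) run my job in the background
    job_pid=$!                 # remember its process id
    sleep "$delay"             # (3) wait a bit
    new_job                    # (4) run the new job
    wait "$job_pid"            # let the background job finish before returning
}

run_trial 1 0
```

The trailing wait matters: without it, one trial could still be running when the next one starts, and the log would not show which run caused what.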
Next I varied the amount of new data added each time, from 1 record to 300 records. I also varied the wait time, from 1 second to 5 minutes. I made sure to log everything. And since I launched this thing from an interactive shell, I prefixed the command with "nohup" and ran my main script in the background too, so it would survive my logging out. I estimated that, best case, this thing would finish in 12 hours. However, it might run for as long as 24 hours. Luckily it is Friday. I imagine nobody is working in the development environment on a weekend.
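The sweep itself is just two nested loops. Here is a rough sketch, with assumptions flagged: the specific record counts and delays are made up (only the ranges above are real), run_trial is a stand-in for the real trial, and sweep.ksh is a hypothetical file name. It defaults to a dry run that only prints the plan.

```shell
# To launch detached from the terminal, with everything logged
# (sweep.ksh is a hypothetical file name for this script):
#   nohup ksh sweep.ksh > sweep.log 2>&1 &

# Hypothetical sweep values covering 1 to 300 records and 1 second to 5 minutes.
RECORD_COUNTS="1 10 100 300"
DELAYS="1 30 60 300"           # seconds

# DRY_RUN=1 prints the plan only; set DRY_RUN=0 to run the trials.
DRY_RUN=${DRY_RUN:-1}

# Stand-in for the real trial.
run_trial() { echo "trial records=$1 delay=$2"; }

sweep() {
    for records in $RECORD_COUNTS; do
        for delay in $DELAYS; do
            # Timestamp each combination so the log shows when it started.
            echo "$(date '+%Y-%m-%d %H:%M:%S') records=$records delay=${delay}s"
            if [ "$DRY_RUN" -eq 0 ]; then
                run_trial "$records" "$delay"
            fi
        done
    done
}

sweep
```

A dry run first is cheap insurance: it shows how many combinations are queued up before committing the environment for a whole weekend.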