We have an environment where users can train with real data. Each year our DBAs copy some interesting production data to the training database. This is a good place for users to go over the new changes in the software.
Users complained about the application aborting with Oracle errors in the training environment. A couple of people chimed in with ideas on what could be wrong. Some consultants said it might be invalid database objects. I gave them a little help, but I was focusing on other issues.
The DBA manager took a look at the problem and said it must be an application problem. This guy has been on the project a long time. I agreed with his diagnosis and told him I would get on it. As with all trouble tickets, I asked the customers what they were doing when the problem happened. I wanted to find the pattern. I got back a message that the problem was occurring randomly.
Sometimes "random" to a user is not so random to a developer. So I contacted an administrator at the training facility. I got him to go over the scenarios in which the application aborted. Then I asked him to try to make it happen and report back. Each time I spoke with him I got more clues. And when I asked him to repeat some steps, it turned out that the problem did have a broad pattern. The error only happened with some specific data.
That was all I needed. I keyed in on the trouble data. And it turned out not to be the data itself. Instead, there were some metadata tables that were not in sync with this data. Our application depends on the metadata being correct; otherwise all kinds of bad things happen. Our DBA manager chided me for having software that degraded so poorly with bad data. I told him this was a drawback of having a ton of data that needs to be accessed quickly. But at least we got to the bottom of the problem. Fix coming out shortly.
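For anyone chasing a similar problem, here is a rough sketch of the kind of check that can flag data rows with no matching metadata. It is not our actual schema or fix: the table names (training_data, training_metadata), the columns, and the use of an in-memory SQLite database instead of Oracle are all assumptions made up for illustration.

```python
import sqlite3


def find_unsynced_rows(conn):
    """Return ids of data rows that have no matching metadata row."""
    # Hypothetical tables and columns; a real schema will differ.
    query = """
        SELECT d.id
        FROM training_data d
        LEFT JOIN training_metadata m ON m.data_id = d.id
        WHERE m.data_id IS NULL
        ORDER BY d.id
    """
    return [row[0] for row in conn.execute(query)]


def build_demo_db():
    """Build a tiny in-memory database where row 2 lacks metadata."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE training_data (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE training_metadata (data_id INTEGER, kind TEXT);
        INSERT INTO training_data VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO training_metadata VALUES (1, 'x'), (3, 'y');
    """)
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print("Rows missing metadata:", find_unsynced_rows(demo))
```

Running something like this right after the yearly copy from production could catch out-of-sync rows before users in training ever hit them.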
A Little Bit of Crypto - I have been trying to figure out how "collision resistant" some of these standard hash functions are. It is a tough concept to get my head around. I figure...