Check Your Subroutines

We are delivering our latest release to internal test today. Had a code review yesterday. Many issues were found. We are fixing the highest priority problems. Before shipping to test, thought we would do one final smoke test. Ran all the functions. Checked to see if data was loaded. Most looked good. One target table was not getting any records.

Very odd. Called the function in the same way. The template for the function matched the others we were using. The routine even spit out some debug messages which confirmed each of the select statements found a lot of records. Me and my buddy ran this thing a total of four times.

The clue that got me on track to a solution was that a table with the same name in another database schema kept getting more records. How was that possible? We were setting up the PostgreSQL search path explicitly at the top of the function. But by the end of the function, it seemed the search path was changed.

Initially I thought maybe some weirdness was going on because the DBAs recently dropped the tables and recreated the tables in our target schema. However that DDL should have been instantaneous. It was just suspicious because they cloned the tables from a prior schema (the one now getting all the data).

Finally I had an epiphany. Maybe our big routine was calling some helper functions that were doing something naughty. I searched for the word "perform" which is how we call subroutines. Bam. There was a helper function resetting the search path to the wrong database schema. That was a very tough bug indeed.