Data Generation Scripts

Our project had a big meeting to review what went right and wrong in our latest release. For some reason we spent a lot of time discussing things without getting a full list of problems. One issue that did get discussed was a longstanding one: the test data we use for development and internal testing is outdated. This has been talked about before. However, we have not solved the problem yet.

After going home, I read a magazine article about writing scripts to generate test data. A lot of the ideas resonated with the situation on my own project. Writing data generation scripts takes real skill. You almost need some development background. That’s why it never works when we tell the testers to create their own data.

A good way to plot out your data generation strategy is to create scenarios first. You should organize these scenarios around the flow a regular user would take in the real application you are testing. Once the script is written, it should be able to clean up the test data it creates. It should also log what it does.
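Here is a rough sketch of how I picture a scenario-driven script with cleanup and logging. Everything in it is my own invention for illustration: the ScenarioData class, the new_customer_places_order scenario, and the plain dictionary standing in for a real database are all hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("datagen")


class ScenarioData:
    """Creates records for one user scenario and remembers them for cleanup."""

    def __init__(self, store):
        self.store = store   # hypothetical stand-in for the real database
        self.created = []    # keys of every record this run inserted

    def create(self, key, record):
        self.store[key] = record
        self.created.append(key)
        log.info("created %s: %s", key, record)

    def cleanup(self):
        # Remove in reverse order so dependent records go first.
        for key in reversed(self.created):
            self.store.pop(key, None)
            log.info("removed %s", key)
        self.created.clear()


def new_customer_places_order(data):
    """Scenario: a user registers, then places a first order."""
    data.create("user:1", {"name": "Test User"})
    data.create("order:1", {"user": "user:1", "total": 19.99})


store = {}
data = ScenarioData(store)
new_customer_places_order(data)
```

Calling data.cleanup() afterwards removes only the records this run created, so anything else already in the store is left alone.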

Some best practices for writing data generation scripts include liberally using asserts. You are making a number of assumptions when creating the data. Why not go the full distance and verify those assumptions? I read that you should use regular expressions when adding asserts to your scripts. I know I need to brush up on my regular expression knowledge for that.
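To make the assert idea concrete for myself, I tried a small sketch that checks generated values against regular expressions. The order ID format, the deliberately loose email pattern, and the make_order helper are assumptions I made up for this example, not anything from the article.

```python
import re

# Hypothetical formats; a real project would use its own rules.
ORDER_ID_RE = re.compile(r"ORD-\d{6}")
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # loose check, not full validation


def make_order(seq, email):
    """Build one order record and verify the assumptions behind it."""
    order = {"id": f"ORD-{seq:06d}", "email": email}
    # fullmatch requires the whole string to fit the pattern.
    assert ORDER_ID_RE.fullmatch(order["id"]), f"bad order id: {order['id']}"
    assert EMAIL_RE.fullmatch(order["email"]), f"bad email: {order['email']}"
    return order


order = make_order(42, "tester@example.com")
```

If an assumption is wrong, say an email with no @ sign, the script stops with an AssertionError instead of quietly loading junk. One caveat: Python skips assert statements when run with the -O flag, so a script that relies on them should not be run that way.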

Data generation scripts should enter some bad data to try to break the system. The scripts should accept a parameter that indicates which environment you are populating. For example, this may be development or internal testing. Data generation scripts, like any other source code, should be kept under source code control. Check those scripts in.
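The environment parameter and the bad data could look something like the sketch below. The environment names, the host names, the --include-bad-data flag, and the sample values are all placeholders I invented. Only argparse itself is real.

```python
import argparse

# Hypothetical environments and the hosts they would point at.
ENVIRONMENTS = {
    "dev": "dev-db.example.internal",
    "test": "test-db.example.internal",
}

# Deliberately awkward values meant to stress input handling.
BAD_NAMES = ["", " " * 10, "x" * 10_000, "Robert'); DROP TABLE users;--"]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Populate test data")
    parser.add_argument("--env", choices=sorted(ENVIRONMENTS), required=True)
    parser.add_argument("--include-bad-data", action="store_true")
    return parser.parse_args(argv)


def build_names(include_bad):
    """Return the names to load, optionally mixing in the bad ones."""
    names = ["Alice Example", "Bob Example"]
    if include_bad:
        names += BAD_NAMES
    return names


# An explicit argument list keeps the example self-contained; a real
# script would call parse_args() with no arguments to read sys.argv.
args = parse_args(["--env", "dev", "--include-bad-data"])
target = ENVIRONMENTS[args.env]
names = build_names(args.include_bad_data)
```

Restricting --env to a fixed list of choices means a typo in the environment name fails right away rather than populating the wrong place.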

Once the scripts are written, you should run the tests regularly. You never know when developers will break existing code. Finally, you should send the output of your automated tests to as many interested parties as possible. I know we have a lot of data generation needs on my project. I think I am going to get involved with the solution in a big way.