Automatically exported from code.google.com/p/fake-data-generator
Testing data mining algorithms is hard. If all you have are real-world data sets, you don't actually know what patterns they contain, so you can't tell whether an algorithm found the right ones. If you did know, you probably wouldn't be using a data mining algorithm.
This tool set is designed to create data sets with known properties and relations of varying complexity. It produces a full listing of the model it used to generate each data set, so the model that a machine learning algorithm learns from the data can be compared against the true generating model. Varying amounts of noise and bias can be introduced to simulate the imperfections of real-world data.
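The idea can be illustrated with a minimal Python sketch. It is not this project's code or interface: the function names (`generate`, `fit_line`), the linear model, and the parameters `noise_sd` and `bias` are all assumptions chosen for illustration. It generates data from a known model, returns that model alongside the data, and then checks how closely a simple least-squares fit recovers it.

```python
import random


def generate(n, slope, intercept, noise_sd=0.0, bias=0.0, seed=0):
    """Return (model, rows): the known model and n (x, y) samples drawn from it.

    Hypothetical helper: y = slope * x + intercept, plus a constant
    measurement bias and Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)  # seeded so a data set can be regenerated
    model = {
        "slope": slope,
        "intercept": intercept,
        "noise_sd": noise_sd,
        "bias": bias,
    }
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = slope * x + intercept + bias + rng.gauss(0.0, noise_sd)
        rows.append((x, y))
    return model, rows


def fit_line(rows):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    model, rows = generate(1000, slope=2.0, intercept=1.0, noise_sd=0.5)
    est_slope, est_intercept = fit_line(rows)
    print("true model:", model)
    print("slope error:", abs(est_slope - model["slope"]))
    print("intercept error:", abs(est_intercept - model["intercept"]))
```

Because the generating model is returned with the data, the error of the learned model can be measured directly. In this sketch a nonzero `bias` shifts the recovered intercept by that amount, and a larger `noise_sd` widens the spread of the recovered slope.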
License: GNU GPL v2