Validation of Tax-Calculator

The Tax-Calculator computes federal income taxes and FICA taxes for a sample of tax filing units in years beginning with 2013. The Python code that performs the tax calculations has been validated in a number of ways. First, Tax-Calculator results for a number of tax filing units have been compared to hand calculations performed using IRS tax forms. Second, Tax-Calculator results for a large sample of tax filing units have been compared to results for the same sample generated by a detailed SAS program developed by Dan Feenberg and Ina Shapiro of NBER. And third, tools in this directory provide the ability to conduct cross-model validation work using any two tax models that read input formatted as expected by the Internet-TAXSIM model and write output formatted as written by the Internet-TAXSIM model. The ability to read and write tax information in Internet-TAXSIM format is provided by the Tax-Calculator SimpleTaxIO class and by simtax.py, which is a Python program that provides a command-line interface to the SimpleTaxIO class.

The premise behind cross-model validation work is that independently developed tax simulation models are unlikely to contain the same bug, which means looking for differences between the output from two models is an effective way to locate bugs in the tax calculation logic.

The tools included in this directory support the following validation work flow:

Generate a random sample of tax filing units (INPUT).
Generate OUTPUT from INPUT using simtax.py.
Generate OUTPUT from INPUT using Internet-TAXSIM.
Generate tax differences by comparing the two OUTPUT files.

Installing Validation Tools

The current version of the validation tools in this directory should work on Linux or Mac OS X without any changes and without adding any extra software. Those who want to use these validation tools on Windows will have to do three things: (a) install an AWK interpreter, (b) install a Tcl interpreter, and (c) translate the tests bash script into a Windows batch file (tests.bat). The Free Software Foundation provides a free AWK interpreter for Windows (gawk.exe) and ActiveState provides a free Tcl interpreter for Windows (tclsh.exe).

Using Validation Tools

Here is an overview of how the tools support the four-step work flow described above. The validation tools provide additional help from the command line. This overview consists of examples of tool use and assumes that the current working directory is taxcalc/validation/.

Generating an INPUT file

(1) Generate a random sample of 100,000 tax filing units for 2014 that have no itemized-deduction expenses and have no child-care expenses.

tclsh make-in.tcl 2014 a > a2014.in

(2) Generate a different random sample of 100,000 tax filing units for 2014 that have no itemized-deduction expenses and have no child-care expenses.

tclsh make-in.tcl 2014 a 1 > a2014-1.in

When not using the optional third parameter, the random-number-generator seed offset is set to zero. In the example above, the offset is one, which generates a sample in the a2014-1.in file that is completely different from the sample in the a2014.in file. The third parameter can have any value in the [0,1000] range, which allows generation of as many as one thousand alternative samples.
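The seed-offset idea can be illustrated with a minimal Python analog. This is a sketch only: the make-in.tcl program is written in Tcl, and its actual seeding logic and variable distributions are not shown here. The BASE_SEED value, the sample_incomes function name, and the income range are all hypothetical.

```python
import random

BASE_SEED = 123456  # hypothetical base seed, not the value used by make-in.tcl


def sample_incomes(offset=0, size=5):
    """Draw a small reproducible sample whose seed depends on the offset."""
    if not 0 <= offset <= 1000:
        raise ValueError("offset must be in the [0,1000] range")
    # Same offset gives the same sample; a different offset gives a new sample.
    rng = random.Random(BASE_SEED + offset)
    return [rng.randrange(0, 500001) for _ in range(size)]


default_sample = sample_incomes()            # offset defaults to zero
alternative_sample = sample_incomes(offset=1)
```

Calling sample_incomes() twice returns identical lists, which is what makes a stored sample reproducible, while changing the offset produces an unrelated sample.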

(3) Generate a random sample of 100,000 tax filing units for 2013 that have itemized-deduction expenses but no child-care expenses.

tclsh make-in.tcl 2013 b > b2013.in

(4) Generate a random sample of 100,000 tax filing units for 2013 that have both itemized-deduction expenses and child-care expenses.

tclsh make-in.tcl 2013 c > c2013.in

(5) The Tax-Calculator versus Internet-TAXSIM validation tests use six samples of 100,000 tax filing units generated by the make-in.tcl program. There are a, b, and c samples (see make-in.tcl for details) for 2013 and for 2014. Altogether, these samples contain 600,000 randomly-generated tax filing units.

Generating an OUTPUT file with simtax.py

(1) Generate Tax-Calculator OUTPUT (assuming current-law tax policy) for the c2013.in INPUT file described in the prior section's item (4).

python ../../simtax.py c2013.in

The resulting OUTPUT file is in the same directory as the INPUT file and is called c2013.in.out-simtax.

(2) Do the same thing as in item (1), except emulate Internet-TAXSIM logic regarding the approximation of Form 2441 qualified persons for whom child-care expenses are incurred.

python ../../simtax.py --taxsim2441 c2013.in

The above command produces an OUTPUT file that can be compared with OUTPUT generated by Internet-TAXSIM. The Internet-TAXSIM approximation assumes the number of Form 2441 qualified persons is equal to the total number of dependents of all ages, while the default simtax.py approximation uses the number of dependents under age 17.
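The difference between the two approximations can be written as a small Python sketch. The function and argument names here are hypothetical and are not taken from the SimpleTaxIO class; the sketch only restates the rule described above.

```python
def form2441_qualified_persons(total_dependents, dependents_under_17,
                               taxsim2441=False):
    """Approximate the number of Form 2441 qualified persons.

    With taxsim2441=True, use the Internet-TAXSIM approximation (all
    dependents of any age); otherwise use the default simtax.py
    approximation (dependents under age 17).
    """
    if taxsim2441:
        return total_dependents
    return dependents_under_17


# A filing unit with three dependents, one of whom is under age 17.
default_count = form2441_qualified_persons(3, 1)
taxsim_count = form2441_qualified_persons(3, 1, taxsim2441=True)
```

For filing units whose dependents are all under age 17, the two approximations give the same count, so the option matters only for units with older dependents.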

(3) Generate extra OUTPUT variables that are useful in debugging Tax-Calculator logic.

Temporarily edit the DVAR_NAMES list in the simpletaxio.py file.

Then execute a single test and save the INPUT and OUTPUT files. For example, to use the c14 INPUT sample under current-law policy, do this:

bash test c14 . save

After executing this command, the following files will not be deleted after the test: c14.in, c14.in.out-simtax, and c14.in.out-taxsim.

(4) Do the same thing as in item (1), except instead of current-law policy assume a policy reform that increases the top regular income tax rate by five percentage points and raises the social security maximum taxable earnings level by 100,000 dollars in 2013. Do this by preparing a text file that contains valid JSON describing the reform, and then specify the name of that file using the --reform FILENAME command-line simtax.py option. (There is an example of a more complex reform in the REFORM_CONTENTS section of the test_simpletaxio.py file.)

cat reform-example.json   
// Example of JSON suitable for use as an optional SimpleTaxIO reform file.   
// This JSON file can contain any number of trailing //-style comments,   
// which will be removed before the JSON to Python dictionary conversion.  
// The primary keys are policy parameters and secondary keys are years.  
// Both the primary and secondary key values must be enclosed in quotes (").  
// Boolean variables are either true or false (no quotes; all lowercase).  
// Note the square brackets around single parameter values; double square  
// brackets are required when a parameter is "two-dimensional" (that is,  
// indexed by year and another variable like filing unit type).   
{   
  "_II_rt7": {   
    "2013": [0.446] // up from current-law 0.396 (TAXSIM: "18 0.05" option)   
  },   
  "_SS_Earnings_c": { // social security (OASDI) maximum taxable earnings  
    "2013": [213700] // up from current-law 113700 (TAXSIM: no option)  
  }  
}

python ../../simtax.py --reform reform-example.json c2013.in

Note that the above command will generate an output file named c2013.in.out-simtax-reform-example and its contents will be significantly different than those in c2013.in.out-simtax for tax filing units with high earnings and/or high taxable income.
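The reform file's comments say that trailing //-style comments are removed before the JSON is converted to a Python dictionary. A minimal sketch of that kind of preprocessing follows. It is not the SimpleTaxIO implementation: the load_reform function name is hypothetical, and the sketch assumes that no JSON string value contains the // character sequence (which holds for the example file).

```python
import json
import re


def load_reform(text):
    """Strip trailing //-style comments, then parse the rest as JSON.

    Assumes no JSON string value contains "//".
    """
    stripped_lines = [re.sub(r"//.*$", "", line) for line in text.splitlines()]
    return json.loads("\n".join(stripped_lines))


EXAMPLE = """
// Example of JSON suitable for use as a reform file.
{
  "_II_rt7": {
    "2013": [0.446] // up from current-law 0.396
  },
  "_SS_Earnings_c": { // social security (OASDI) maximum taxable earnings
    "2013": [213700] // up from current-law 113700
  }
}
"""

reform = load_reform(EXAMPLE)
print(reform["_II_rt7"]["2013"])
```

The result is a dictionary whose primary keys are policy parameter names and whose secondary keys are year strings, matching the structure described in the file's comments.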

Generating an OUTPUT file with Internet-TAXSIM

Generate Internet-TAXSIM OUTPUT for the c2013.in INPUT file described above.

(1) Browse Internet-TAXSIM model home page.

(2) Just under the heading Upload a file with TAXSIM data, do four things: (a) choose c2013.in as the file to upload, (b) click the On radio button to show detailed intermediate calculations, (c) enter 56 1 in the optional tax plan box to suppress property income smoothing in the calculation of the EITC, and (d) click on the button labeled "calculate using this file's data".

(3) After a few seconds a new browser page is opened automatically and the Internet-TAXSIM output results start to fill up that page.

(4) After results for all 100,000 tax filing units have been written to that browser page, save them to a file called, in this example, c2013.in.out-taxsim. Do this as follows: (a) in the new browser page, "Select All" and then "Copy" to put the contents of the new browser page into the computer's clipboard, and (b) use a text editor to "Paste" the contents of the clipboard into an editor buffer, but before saving the buffer to a file be sure to remove any blank lines and the Internet-TAXSIM message about using the 56 1 option (which is usually at the top of the results).

Generating tax-difference results

Continuing the above example that uses c2013.in INPUT, generate a summary of differences in intermediate and final tax OUTPUT variables and write those summary results to a file called c2013.taxdiffs.

tclsh taxdiffs.tcl c2013.in.out-simtax c2013.in.out-taxsim > c2013.taxdiffs

Reading tax-difference results

The tax-difference tools work with the same philosophy as the Unix diff command: no results are shown unless there are differences between the two OUTPUT files being compared. But instead of showing all the tax filing units whose OUTPUT differs, the tax-difference tools show a summary of the differences. The summary information for a particular OUTPUT variable includes:

ovar: the number of the OUTPUT variable whose differences are being summarized,
#diffs: the number of filing units that have a difference in the value of this OUTPUT variable between the two OUTPUT files being compared,
#1cdiffs: the number of filing units for which the absolute value of this difference is no more than one cent,
maxdiff: the signed value of the difference that is largest in absolute value, and
[id]: the id (ivar[1] and ovar[1]) of the filing unit with the largest difference in absolute value. In other words, the filing unit with the given id is the filing unit with the maxdiff.

When appropriate, a second line of summary results is shown. This second line contains the number of filing units with an OUTPUT variable difference greater than one cent (that is, is not included in #1cdiffs on the main summary line) and a total tax liability (ovar[4]) that is greater than one cent in absolute value. If there are no filing units that meet both criteria, the second line is not written.
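The main summary line for one OUTPUT variable can be sketched in Python as follows. This is an illustration of the statistics described above, not a translation of taxdiffs.tcl. The summarize_differences function name, the dictionary-based inputs, and the sign convention (first file minus second file) are assumptions. Values are held as Decimal so that the one-cent comparison is exact.

```python
from decimal import Decimal

ONE_CENT = Decimal("0.01")


def summarize_differences(values1, values2):
    """Summarize differences in one OUTPUT variable between two files.

    values1 and values2 map filing-unit id to the variable's value.
    Returns None when the two files agree, in the spirit of Unix diff.
    """
    num_diffs = 0          # units with any difference (#diffs)
    num_small_diffs = 0    # units with |difference| <= one cent (#1cdiffs)
    max_diff = Decimal("0")
    max_id = None
    for unit_id, value1 in values1.items():
        diff = value1 - values2[unit_id]  # assumed sign convention
        if diff == 0:
            continue
        num_diffs += 1
        if abs(diff) <= ONE_CENT:
            num_small_diffs += 1
        if abs(diff) > abs(max_diff):
            max_diff = diff
            max_id = unit_id
    if num_diffs == 0:
        return None
    return {"#diffs": num_diffs, "#1cdiffs": num_small_diffs,
            "maxdiff": max_diff, "id": max_id}


simtax = {1: Decimal("100.00"), 2: Decimal("50.01"), 3: Decimal("10.00")}
taxsim = {1: Decimal("100.00"), 2: Decimal("50.00"), 3: Decimal("12.50")}
summary = summarize_differences(simtax, taxsim)
```

In this made-up data, unit 2 differs by one cent and unit 3 differs by 2.50 dollars, so the summary reports two differences, one of which is a one-cent difference, with unit 3 having the maxdiff.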

Automating Validation Process

Currently six samples, each containing 100,000 randomly-generated tax filing units, have been used to generate Internet-TAXSIM OUTPUT files that are stored in the out-taxsim.zip file. The tests bash script automates the four-step validation process described above. Without the all command-line argument, the tests script conducts current-law validation for the c13 and c14 INPUT files. This abbreviated set of tests takes about the same time to execute (roughly two minutes) as all the unit tests in the taxcalc/tests directory. Executing ./tests all in the validation directory causes all six of the samples to be used with current-law policy and the c13 and c14 samples each to be used with four different policy reforms. It takes roughly nine minutes to execute the 14 validation tests. These validation tests are a good complement to the unit tests because they provide a much stronger test than the unit tests for detecting whether changes in Tax-Calculator code have introduced unintended regressions in Tax-Calculator logic.

Current Validation Results

Since early November, 2015, there have been no FICA tax liability and no federal income tax liability differences (of more than one cent in absolute value) between Tax-Calculator results and Internet-TAXSIM results for the 600,000 randomly-generated tax filing units described above. There are some (more than one cent) differences in intermediate results and they will be investigated in the future.

Among these 600,000 filing units there are only a handful of marginal tax rate differences (of more than one basis point).

There are 49 filing units for which Tax-Calculator generates a marginal FICA tax rate of 3.80 percent while Internet-TAXSIM generates a rate of 2.90 percent. All of these units are exactly at the 200,000 (or 250,000 for couples) dollar threshold for the additional Medicare tax on high earnings. The derivative of the total FICA tax function at this point is not well defined: Tax-Calculator generates 3.80 because it approximates the derivative using a one-cent increase in earnings, while Internet-TAXSIM appears to be focusing on decreases in earnings. But the main point here is that the only significant differences in marginal FICA tax rates are completely understandable.
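The kink at the threshold can be illustrated with a simplified Python sketch. It is not Tax-Calculator or Internet-TAXSIM code: the fica_tax and marginal_rate function names are hypothetical, and the formula assumes a single filer with wage earnings only, combined employee and employer rates of 12.4 percent (OASDI) and 2.9 percent (Medicare), the 113,700 dollar OASDI maximum for 2013, and the 0.9 percent additional Medicare tax above 200,000 dollars.

```python
from decimal import Decimal

OASDI_RATE = Decimal("0.124")      # combined employee and employer rate
OASDI_MAX = Decimal("113700")      # 2013 maximum taxable earnings
MEDICARE_RATE = Decimal("0.029")   # combined employee and employer rate
EXTRA_RATE = Decimal("0.009")      # additional Medicare tax on high earnings
EXTRA_THRESHOLD = Decimal("200000")  # single-filer threshold
ONE_CENT = Decimal("0.01")


def fica_tax(earnings):
    """Simplified total FICA tax for a single filer with wage earnings."""
    oasdi = OASDI_RATE * min(earnings, OASDI_MAX)
    medicare = MEDICARE_RATE * earnings
    extra = EXTRA_RATE * max(Decimal("0"), earnings - EXTRA_THRESHOLD)
    return oasdi + medicare + extra


def marginal_rate(earnings, step):
    """Finite-difference marginal rate; a negative step looks at a decrease."""
    return (fica_tax(earnings + step) - fica_tax(earnings)) / step


rate_up = marginal_rate(EXTRA_THRESHOLD, ONE_CENT)     # one-cent increase
rate_down = marginal_rate(EXTRA_THRESHOLD, -ONE_CENT)  # one-cent decrease
```

Exactly at the threshold, the one-cent increase gives 0.038 and the one-cent decrease gives 0.029, which matches the 3.80 and 2.90 percent rates reported above. Away from the threshold, the two finite differences agree.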

And there are 3 filing units for which Tax-Calculator generates a marginal federal income tax rate of 35.00 percent while Internet-TAXSIM generates rates of 35.77, 35.52, and 35.88 percent. The reasons for these three modest differences have not been determined.

The validation tests also include four simple income tax policy reforms that are simulated with the c13 and c14 samples. The 100,000 Tax-Calculator and Internet-TAXSIM income tax liabilities in each of these eight tests differ by no more than one cent.
