CPS File Progress Report #90

andersonfrailey · 2017-05-11T14:08:21Z

This issue is just an overview of the progress we've made in preparing the CPS-based file for use in Tax-Calculator.

John gave me the files needed to create the CPS file along with an associated weights file that covers the years 2015-2027. The SAS scripts create tax-units from the CPS in the same manner used to create the CPS tax-units that are then merged with the 2009 IRS-PUF file to create the final PUF currently used. After that, the following variables are adjusted for top-coding:

Wages and salaries
Taxable interest income
Dividends
Alimony
Business income/loss
Pensions
Rents
Farm income/loss

Then the following are imputed:

Capital gains
Taxable IRA distributions
Adjusted IRA Contributions
KEOGH/SEP plan contributions
Self-employed health insurance deduction
Student loan interest deduction
Charitable contributions
Miscellaneous deductions
Child care expenses
Medical expenses deduction
Home mortgage interest expense
Real estate taxes
Domestic production activity deduction

Finally, the following are targeted at a state level:

Wages
Interest income
Dividends
Business income/loss
Capital gains
Taxable IRA distributions
Pensions
Unemployment
KEOGH
Self-employment health insurance
IRA contribution
Student loan interest
Domestic production activity deduction
Schedule E income

There are some variables that are currently missing from the file as well:

Variable	Description
p23250	Sch D: Net long-term capital gains/losses
p25470	Sch E: Royalty depletion and/or rental depreciation
e09800	Unreported payroll taxes from Form 4137 or 8919
e02000	Sch E rental, royalty, S-corp, etc, income/loss
e62900	Alternative Minimum Tax foreign tax credit from Form 6251
p08000	Other tax credits (but not including Sch R credit)
e58990	Investment income elected amount from Form 4952
e00700	Taxable refunds of state and local income taxes
e03290	Health savings account deduction from Form 8889
e07240	Retirement savings contributions credit from Form 8880
agi_bin	Historical AGI category used in data extrapolation
e19200	Sch A: Interest paid
e27200	Sch E: Farm rent net income or loss
e01200	Other net gain/loss from Form 4797
e03500	Alimony paid
n1821	Number of people over 18 and under 21 years old in the filing unit
e07260	Residential energy credit from Form 5695
blind_spouse	1 if spouse is blind; otherwise 0
p22250	Sch D: Net short-term capital gains/losses
e03220	Educator expenses
e07400	General business credit from Form 3800
f2441	number of child/dependent-care qualifying persons
nu18	Number of people under 18 years old in the filing unit
f6251	1 if Form 6251 (AMT) attached to return; otherwise 0
blind_head	1 if taxpayer is blind; otherwise 0
e03230	Tuition and fees from Form 8917
e03400	Penalty on early withdrawal of savings
e07300	Foreign tax credit from Form 1116
e11200	Excess payroll (FICA/RRTA) tax withheld
e24518	Sch D: 28% Rate Gain or Loss
EIC	number of EIC qualifying children (range: 0 to 3)
e09700	Recapture of Investment Credit
p87521	Total tentative AmOppCredit amount for all students
e09900	Penalty tax on qualified retirement plans
e24515	Sch D: Un-Recaptured Section 1250 Gain
e01500	Pensions and annuities
e87530	Adjusted qualified lifetime learning expenses for all students
e26270	Sch E: Combined partnership and S-corporation net income/loss
e07600	Prior year minimum tax credit from Form 8801
cmbtp	Estimate of income on (AMT) Form 6251 but not in AGI
n21	Number of people 21 years old or older in the filing unit
e18400	Sch A: State and local income/sales taxes
e20500	Sch A: Net casualty or theft loss
MIDR	1 if separately filing spouse itemizes; otherwise 0

nu18, n1821, and n21 can be found, and I'm editing the SAS files to do so. We're waiting for John to get us imputations for state and local taxes as well. I'm also digging more into the CPS to see if there are any other variables that can be found.
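Since the CPS records every person's age, the three age-count variables can be derived by counting household members in each age band. Here is a minimal sketch in plain Python (the real work happens in the SAS scripts, so the function name and the input layout here are hypothetical). It assumes 18-year-olds belong in n1821, which is one reading of the "over 18 and under 21" description in the table above.

```python
def age_counts(ages):
    """Return (nu18, n1821, n21) for one filing unit, given its members' ages.

    Assumption: n1821 covers ages 18 through 20 inclusive.
    """
    nu18 = sum(1 for age in ages if age < 18)
    n1821 = sum(1 for age in ages if 18 <= age < 21)
    n21 = sum(1 for age in ages if age >= 21)
    return nu18, n1821, n21


# Hypothetical filing unit: two adults, one 19-year-old, two children.
unit_ages = [45, 43, 19, 16, 9]
nu18, n1821, n21 = age_counts(unit_ages)
print(nu18, n1821, n21)
```

The three counts always add up to the number of people in the unit, which makes for an easy sanity check on the final file.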

@Amy-Xu and I are analyzing the final files to make sure the results from using them in tax-calc make sense.

I will use this issue to post updates as more progress is made.

@martinholmer @MattHJensen @codykallen

martinholmer · 2017-05-11T15:27:30Z

@andersonfrailey said in taxdata issue #90 wrt the being-developed CPS data file:

[The currently missing variables] nu18, n1821, and n21 can be found, and I'm editing the SAS files to do so.

Thanks for the progress report! Sounds like things are progressing nicely.

I always thought one of the big advantages of CPS data is that you know everybody's age, right?
If so, then why can't you make a pretty good estimate of the value of the following variables for each filing unit?

EIC
f2441

Without these variables (especially without the EIC variable) the tax estimates generated by this CPS file are going to be way off.

Because they are not on the missing variable list, I'm assuming you have values for the following variables, right?

n24
nu05
nu13
nu18

@MattHJensen @Amy-Xu

andersonfrailey · 2017-05-12T12:39:30Z

@martinholmer, you are correct that a big advantage of the CPS is knowing everybody's age. The two variables you listed first, EIC, f2441 are two that I'll be turning my attention to next.

And we do have the values for the second set listed.

martinholmer · 2017-05-12T12:50:38Z

@andersonfrailey said:

you are correct that a big advantage of the CPS is knowing everybody's age. The two variables you listed first, EIC, f2441 are two that I'll be turning my attention to next.

And we do have the values for the second set listed.

Great!

martinholmer · 2017-05-12T16:48:23Z

For earlier discussion (from 03-Nov-2016 to 02-Dec-2016) of issues related to the development of this CPS input file for Tax-Calculator, see Tax-Calculator issue 1030.

andersonfrailey · 2017-05-16T17:49:02Z

Quick update on the CPS file. Here is a notebook where you can see some comparisons between the CPS and PUF file after running Tax-Calc with each for 2017.

Some notable differences between the two can be seen in regular tax and income tax liability. That could be partially due to a couple of missing income items such as investment income.

You can also see that itemized deductions are significantly lower in the CPS. As noted before, John is imputing state and local tax deductions so once we have that the difference should close some, but we will still be missing a few deductions.

I also want to add farm income (e02100) and capital gain distributions (e01100) to the missing variables list. After looking at the aggregates for each, they are significantly higher in the CPS than the PUF. This may simply be due to a mislabeling when prepping the CPS. I'm looking into it further. That being said, I am a little more concerned about the drop in tax liabilities given that these two income sources were so much higher than they should have been and yet liabilities still fall.

If there is anything specific you would like to see in my next update, please let me know.

@martinholmer @MattHJensen @Amy-Xu

andersonfrailey · 2017-05-17T20:30:36Z

I've added a few distribution plots for the CPS data to the notebook. It looks to me like significant stage 3 adjustment will be needed.

andersonfrailey · 2017-06-07T19:34:36Z

John sent me a preliminary version of state and local tax deduction imputation. The notebook has been updated to include a chart comparing totals for a number of itemized deductions.

Amy-Xu · 2017-06-09T16:04:35Z

@andersonfrailey Just to confirm, all blow-up factors used for the CPS-Tax-Unit file are the same ones used for the PUF, is that right? Same for the stage-I factors used to calculate weights?

If that's the case, I think we might need to develop a separate stage-I factor, or adjust the base-year file before feeding it to TC due to the large gap for several income/expense items.

martinholmer · 2017-06-09T16:20:40Z

@Amy-Xu asked:

Just to confirm, all blow-up factors used for the CPS-Tax-Unit file are the same ones used for the PUF, is that right? Same for the stage-I factors used to calculate weights?

Unless I'm confused, "blow-up factors" and "stage-I factors" are two different names for the same thing. And yes, they affect the baseline (CBO-derived) projection, so they must be the same for all micro datasets, by definition.

@Amy-Xu continued:

If that's the case, I think we might need to develop separate stage-I factors [for the CPS and the PUF datasets].

This is not possible given the logic above.

The stage2 weights can differ for the CPS and PUF datasets and the stage3 adjustment factors can be different for CPS and PUF, but not stage1 factors (otherwise the two datasets would be using different CBO baseline projections, which doesn't make any sense).

Amy-Xu · 2017-06-09T17:11:06Z

@martinholmer said

Unless I'm confused, "blow-up factors" and "stage-I factors" are two different names for the same thing.

Right I'm aware of that so I said 'just to confirm'.

And yes, they affect the baseline (CBO-derived) projection, so they must be the same for all micro datasets, by definition.

For this part, it's true that we derive most factors to match the CBO baseline, but not every factor is derived from a CBO projection. Instead, some of the factors are tuned to fit PUF data. For example, we originally planned to apply the personal income factor (ATXPY) to most PUF variables outside AGI, which is a very reasonable assumption. But then we realized that applying ATXPY is not proper for e19200, which is interest paid and is used to calculate itemized deductions. It's not proper since the growth rate of its major component -- home mortgage interest deduction -- deviates from the personal income growth rate. So we added one extra factor (AIPD) for this variable only, to blow up this variable at the rate given by SOI tables prior to 2014, and then apply ATXPY after 2014. This particular factor yields better itemized deductions, which are not predicted by CBO explicitly, and therefore makes our results closer to the CBO projected baseline.
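To make the AIPD idea concrete, here is a rough sketch of how a variable-specific factor could be pieced together from two growth-rate series. The rates below are made up for illustration, the function name is hypothetical, and treating 2014 itself as an SOI-based year is an assumption.

```python
def cumulative_factors(base_year, last_year, soi_growth, atxpy_growth,
                       switch_year=2014):
    """Chain annual growth rates into cumulative blow-up factors.

    Years up to and including switch_year use the SOI-based rate;
    later years fall back to the personal income rate (ATXPY).
    """
    factors = {base_year: 1.0}
    for year in range(base_year + 1, last_year + 1):
        if year <= switch_year:
            rate = soi_growth[year]
        else:
            rate = atxpy_growth[year]
        factors[year] = factors[year - 1] * rate
    return factors


# Hypothetical annual growth rates, NOT actual SOI or CBO numbers.
soi_growth = {2010: 0.95, 2011: 0.95, 2012: 0.96, 2013: 0.97, 2014: 0.98}
atxpy_growth = {2015: 1.04, 2016: 1.04}

aipd = cumulative_factors(2009, 2016, soi_growth, atxpy_growth)
for year in sorted(aipd):
    print(year, round(aipd[year], 4))
```

The point is only that the factor for e19200 follows interest paid through 2014 and then grows with personal income.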

When it comes to CPS data, as you can see in Anderson's notebook, home mortgage interest differs from the PUF total by about 200 billion (total at ~350 billion in PUF), which means if we use the exact same factor (originally derived from SOI tables) for CPS, we will significantly overestimate the interest paid deduction, and therefore overestimate the total number of itemizers, which would drag our CPS-tax-unit results away from the CBO baseline. The same would apply to the state and local deduction.

Yes I'm aware that having two sets of factors merely to adjust itemize deduction is too much work and may or may not be efficient for the goal (adjusting the baseline to resemble CBO projection) we want to achieve. So alternatively I proposed,

or adjust the base-year file before feeding it to TC

which I consider as a more doable way for adjusting itemized deduction and related expense projection. Does it make some sense?

andersonfrailey · 2017-06-12T13:08:39Z

@Amy-Xu

or adjust the base-year file before feeding it to TC

As in perform an adjustment similar to what we do in stage 3? Or did you have something else in mind?

Amy-Xu · 2017-06-12T13:40:52Z

@andersonfrailey I was thinking of tweaking stage I factors a bit, like AIPD, but I'm perfectly fine with a Stage III adjustment as well. The goal is to see whether adjusting the itemized deductions would fix the itemizer/standard deductor numbers, and further help the AMT numbers.

martinholmer · 2017-06-12T14:11:14Z

@Amy-Xu said in issue #90 wrt the CPS dataset:

For this part, it's true that we derive most factors to match the CBO baseline, but not every factor is derived from a CBO projection. Instead, some of the factors are tuned to fit PUF data. For example, we originally planned to apply the personal income factor (ATXPY) to most PUF variables outside AGI, which is a very reasonable assumption. But then we realized that applying ATXPY is not proper for e19200, which is interest paid and is used to calculate itemized deductions. It's not proper since the growth rate of its major component -- home mortgage interest deduction -- deviates from the personal income growth rate. So we added one extra factor (AIPD) for this variable only, to blow up this variable at the rate given by SOI tables prior to 2014, and then apply ATXPY after 2014. This particular factor yields better itemized deductions, which are not predicted by CBO explicitly, and therefore makes our results closer to the CBO projected baseline.

When it comes to CPS data, as you can see in Anderson's notebook, home mortgage interest differs from the PUF total by about 200 billion (total at ~350 billion in PUF), which means if we use the exact same factor (originally derived from SOI tables) for CPS, we will significantly overestimate the interest paid deduction, and therefore overestimate the total number of itemizers, which would drag our CPS-tax-unit results away from the CBO baseline. The same would apply to the state and local deduction.

Yes I'm aware that having two sets of factors merely to adjust itemize deduction is too much work and may or may not be efficient for the goal (adjusting the baseline to resemble CBO projection) we want to achieve. So alternatively I proposed,

or adjust the base-year file before feeding it to TC

which I consider as a more doable way for adjusting itemized deduction and related expense projection. Does it make some sense?

Yes, it is true that not all of the stage1 growfactors are taken from a CBO projection. The story you tell about why the mortgage interest needed its own growfactor (because of the sharp drop in mortgage interest rates following the 2008-2009 financial crash and house-price deflation) is true. But that growfactor reflects the reality of the macroeconomic situation in the years after 2009. And therefore, it is not a logical choice for changing across micro datasets.

And we seem to be in agreement on that point because you say:

So alternatively I proposed,

or adjust the base-year file before feeding it to TC

which I consider as a more doable way for adjusting itemized deduction and related expense projection.

Perhaps @andersonfrailey could do a one-time adjustment (to close the gap) at the end of the python code that creates the raw CPS dataset. And then we can see if the gap remains closed in all projection years. If the gap becomes too large, @andersonfrailey could use the stage3 adjustment methodology he created to manage the gap in subsequent years. Or can stage3 methods be used to close the gap in all years? I can't remember whether stage3 can deal with the first year.

Does this seem like a sensible way to proceed in the effort to make the CPS dataset generate results that are reasonably close to those generated by the puf.csv dataset?

andersonfrailey · 2017-06-13T17:25:07Z

@Amy-Xu @martinholmer, Stage 3 adjustments wouldn't be the best way to go about this because all stage 3 does is adjust the distribution of the variable, not the total amount.

In my opinion, a one time adjustment like @martinholmer proposed would work best. To be clear, this CPS file is being produced by SAS files provided by John, but this adjustment could be added at the end of the CPS version of finalprep.py
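For anyone following along, the one-time adjustment could look roughly like the sketch below: every record's value is multiplied by the same ratio so the weighted total hits a target, while each record's share of the total stays the same. This is plain Python for illustration, not the actual finalprep.py code, and the function name and numbers are hypothetical.

```python
def scale_to_target(values, weights, target_total):
    """Scale values by one ratio so their weighted sum equals target_total.

    Returns the scaled values and the ratio that was applied.
    """
    weighted_total = sum(v * w for v, w in zip(values, weights))
    if weighted_total == 0:
        # Nothing to scale; leave the values alone.
        return list(values), 1.0
    ratio = target_total / weighted_total
    return [v * ratio for v in values], ratio


# Hypothetical records: deduction amounts and sampling weights.
values = [1000.0, 0.0, 4000.0]
weights = [100.0, 200.0, 50.0]

scaled, ratio = scale_to_target(values, weights, target_total=240000.0)
print(ratio)
print(scaled)
```

This changes the total without touching the relative distribution, which is the opposite of what stage 3 does.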

martinholmer · 2017-06-13T17:30:30Z

@andersonfrailey said in issue #90 wrt the new CPS-only input dataset:

Stage 3 adjustments wouldn't be the best way to go about this because all stage 3 does is adjust the distribution of the variable [across AGI bins], not the total amount.

In my opinion, a one time adjustment like @martinholmer proposed would work best. To be clear, this CPS file is being produced by SAS files provided by John, but this adjustment could be added at the end of the CPS version of finalprep.py

Sounds sensible to me. What do you think, @Amy-Xu ?

Amy-Xu · 2017-06-13T18:04:42Z

@andersonfrailey said:

In my opinion, a one time adjustment like @martinholmer proposed would work best. To be clear, this CPS file is being produced by SAS files provided by John, but this adjustment could be added at the end of the CPS version of finalprep.py

@martinholmer followed with:

Sounds sensible to me. What do you think, @Amy-Xu ?

Yes that sounds good to me!

Amy-Xu · 2017-06-13T18:10:06Z

@andersonfrailey I personally feel it would be valuable to ask John whether this level (we're talking about ~150b difference for both state & local and home mortgage) of discrepancy is expected while implementing this adjustment.

andersonfrailey · 2017-06-13T18:47:39Z

@Amy-Xu I agree. I'll shoot him an email.

andersonfrailey · 2017-06-15T13:17:22Z

In response to my email John pointed out that he did the imputations for every tax unit, where the matched file would only have values for the records that had filed so I was comparing two things that weren't really the same. I've updated the notebook to account for this and the state and local deduction looks much better.

andersonfrailey · 2017-06-16T18:28:07Z

As mentioned previously, many of the income variables in the CPS-PUF will need substantial adjustments to get their distribution right. Unlike the PUF file, the CPS-PUF contains records from the 2013, 2014, and 2015 CPS, which correspond to 2012, 2013, and 2014 tax years. The 2013 and 2014 files are aged to match the 2015 file (2014 tax year).

Because so many of the variables need adjusting and they're already equivalent to the 2014 tax year, I'm considering adding a one time adjustment to the finalprep.py file. My reasoning is, in stage 3 we assume no change in the distribution of adjusted variables after 2014 and the start year for the CPS file will need to be set at 2014. Therefore, in my opinion, we might as well adjust the variables in the prep stages rather than have to modify tax-calc to account for all of the additional variables that will need adjusting.

Thoughts?

@martinholmer @Amy-Xu @MattHJensen

Amy-Xu · 2017-06-16T19:02:26Z

By adjustment, do you mean augment or shrink the total to match PUF? How many variables do we need to adjust? @andersonfrailey

andersonfrailey · 2017-06-16T19:15:58Z

@Amy-Xu, neither. In this case I'm just talking about fixing the distribution of the variables. Specifically I'd say interest income and ordinary and qualified dividends definitely need adjustments and business income could benefit as well.

The actual totals for the income variables are actually relatively close to the PUF already.

Amy-Xu · 2017-06-16T20:48:17Z

I see, that sounds valuable. At the same time, it might also be helpful to investigate what is the major source dragging individual income tax liability off by almost 40%. The distribution of those income variables could be a reason, but not sure whether they're big problems given totals are close.

andersonfrailey · 2017-06-20T21:08:35Z

I spent the day working on adjusting the distribution of interest and business income and ordinary and qualified dividends. I don't believe it's possible to get the distribution as accurate as we were able to with the PUF, but there is improvement.

AGI is reported in the CPS, but I quickly realized that it would not be very helpful. The highest reported AGI was just over $2 million, which would put nobody in the top two income bins, as defined in the IRS SOI data mentioned previously in this thread. I instead added up wages, interest income, dividends, alimony, business income, pensions, rental income, farm income, unemployment compensation, and social security income.
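Roughly, the stand-in income measure and the bin assignment work like the sketch below. The component names and the bin edges are placeholders (the real edges come from the SOI tables), and the function names are hypothetical.

```python
from bisect import bisect_right

# Components summed in place of the reported AGI (placeholder field names).
INCOME_COMPONENTS = [
    "wages", "interest", "dividends", "alimony", "business",
    "pensions", "rents", "farm", "unemployment", "social_security",
]

# Hypothetical lower bounds of the income bins, in dollars.
BIN_EDGES = [0, 25_000, 50_000, 100_000, 500_000, 1_000_000]


def income_measure(record):
    """Sum the income components present on a record; missing ones count as 0."""
    return sum(record.get(name, 0.0) for name in INCOME_COMPONENTS)


def income_bin(income, edges=BIN_EDGES):
    """Return the bin index: 0 is below the first edge, len(edges) is the top bin."""
    return bisect_right(edges, income)


record = {"wages": 60_000.0, "interest": 500.0, "dividends": 1_200.0}
total = income_measure(record)
print(total, income_bin(total))
```

Because the measure is a sum of components rather than the reported AGI, records can land in the top bins even though reported AGI tops out just over $2 million.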

The new distributions can be seen in the updated notebook. There was a slight improvement in individual income tax liability. It is now short by about 30% rather than 40%

Amy-Xu · 2017-06-21T13:04:43Z

The number of itemizers and standard deductors look much much better, but it seems from the last chart for itemized deduction, interest paid in CPS is gone in this version? @andersonfrailey

andersonfrailey · 2017-06-21T13:28:41Z

@Amy-Xu that's because I'm working on slightly tweaking the final prep scripts and left HMIE, which I originally thought was equivalent to interest paid, out. HMIE is just home mortgage interest expense, part of e19200, but not all of it. It's still much higher than e19200 in the PUF in sum though.

I'm reading through the IRS documentation and there are a couple of instances where HMIE is not fully deductible, so I'm working that into final prep to see the effects on the aggregate total.

Amy-Xu · 2017-06-21T13:33:34Z

@andersonfrailey Sounds great. Thanks!

andersonfrailey · 2017-06-21T14:14:35Z

Update on above comment. All of the instances where home mortgage interest isn't deductible are related to when you took out the mortgage and total home equity so it doesn't look like I'll be able to adjust HMIE based on that. Right now it's a little over $100 billion higher than e19200 in the PUF.

Amy-Xu · 2017-06-21T14:33:48Z

@andersonfrailey along the same lines, I was thinking that if HMIE wasn't imputed for the purpose of itemized deductions, it might include the interest of standard deductors, although this item is not deductible for them. So the total of HMIE might not reflect how much is taken into account in actual itemized deductions. Do you see what I mean? Have you looked at the HMIE total for just the records that take the itemized deduction?

andersonfrailey · 2017-07-10T17:41:19Z

Updated notebook shows the result of taking the HMIE variable in the CPS and shrinking it for everyone by the ratio of home interest in table 2.1 of the SOI stats to HMIE.

@Amy-Xu @MattHJensen

Amy-Xu · 2017-07-10T17:47:03Z

@andersonfrailey Looking at cell 20 in your notebook, it seems the adjusted interest paid deduction is quite a bit lower than PUF or SOI number. Did you use the total of HMIE or the total of itemizers' HMIE to create the ratio?

andersonfrailey · 2017-07-10T17:58:33Z

@Amy-Xu I used the ratio of the HMIE portion of the Interest Paid deduction reported by SOI to total HMIE in the CPS. I'll use the HMIE of just itemizers in the CPS and see what happens.

Amy-Xu · 2017-07-10T18:00:38Z

I guess this version confuses me in that CPS total itemized deduction is lower than PUF, but somehow CPS data has more itemizers. In a previous version where CPS has higher aggregates on both interest paid and state and local income taxes, it makes sense that CPS would end up with more itemizers than it should be. But in this version, the total itemized deduction in CPS is lower than PUF, but still end up with more itemizers?

Amy-Xu · 2017-07-10T18:04:52Z

@andersonfrailey Thanks!

andersonfrailey · 2017-07-11T14:12:01Z

@Amy-Xu, using SOI HMIE / HMIE[CPS|Itemizer] results in the total for interest paid deduction being about 28% too high. Number of itemizers is 43% too high, but total itemized deductions gets to within 6% of both the PUF and SOI totals.
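The two ratios being compared can be written out as in the sketch below. The records and the SOI total are made-up numbers, and the field names and function name are hypothetical; it only shows how restricting the denominator to itemizers gives a larger ratio, and therefore less shrinkage.

```python
def hmie_ratio(records, soi_hmie_total, itemizers_only):
    """Ratio of the SOI home mortgage interest total to the weighted CPS HMIE total.

    With itemizers_only=True the CPS total covers only records flagged as itemizers.
    """
    cps_total = sum(
        r["weight"] * r["hmie"]
        for r in records
        if r["itemizer"] or not itemizers_only
    )
    return soi_hmie_total / cps_total


# Hypothetical CPS tax units.
records = [
    {"weight": 100, "hmie": 12_000, "itemizer": True},
    {"weight": 200, "hmie": 6_000, "itemizer": False},
    {"weight": 50, "hmie": 20_000, "itemizer": True},
]
SOI_TOTAL = 1_760_000.0  # made-up target, not the actual SOI figure

ratio_all = hmie_ratio(records, SOI_TOTAL, itemizers_only=False)
ratio_itemizers = hmie_ratio(records, SOI_TOTAL, itemizers_only=True)
scaled_hmie = [r["hmie"] * ratio_itemizers for r in records]
print(round(ratio_all, 4), round(ratio_itemizers, 4))
```

Note that who counts as an itemizer can itself change once HMIE is rescaled, so the flag in this sketch is only a snapshot from before the adjustment.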

Amy-Xu · 2017-07-11T15:18:29Z

@andersonfrailey I see. My whole point is just to see whether there is a way to get the interest paid deduction close to SOI, and theoretically the number of itemizers will be closer as a result. If the ratio of SOI HMIE / HMIE[CPS|Itemizer] doesn't work, which I guess is because people identified as itemizers are not itemizers anymore after their deductions get scaled down, do you think there is a way to get an interest paid total similar to SOI? An alternative question, if this is not feasible, is what would be the best adjustment to minimize the sum of (%itemizers diff + %itemized deduction diff + % standard deduction diff + % standard deductors diff). Does it make sense?
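One way to write that objective down is sketched below. Taking absolute values, so that an overshoot in one measure cannot cancel an undershoot in another, is an assumption on top of what is written above. The key names, function names, and numbers are all hypothetical.

```python
def pct_diff(simulated, target):
    """Absolute difference between simulated and target, as a share of target."""
    return abs(simulated - target) / abs(target)


def total_gap(simulated, targets):
    """Sum of the absolute percentage differences over every targeted measure."""
    return sum(pct_diff(simulated[key], targets[key]) for key in targets)


# Made-up totals, indexed so that each target equals 100.
targets = {
    "itemizers": 100.0,
    "itemized_deduction": 100.0,
    "standard_deduction": 100.0,
    "standard_deductors": 100.0,
}
simulated = {
    "itemizers": 143.0,
    "itemized_deduction": 106.0,
    "standard_deduction": 90.0,
    "standard_deductors": 80.0,
}

gap = total_gap(simulated, targets)
print(round(gap, 4))
```

Each candidate adjustment could then be scored by running Tax-Calculator with it and comparing the resulting gap.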

andersonfrailey · 2017-07-11T17:31:02Z

@Amy-Xu, I'm sure there's a way to get HMIE close to SOI interest paid. In theory this would help minimize %itemizers diff + %itemized deduction diff + % standard deduction diff + % standard deductors diff as well. I'll play around with a few ideas and let you know what I can get.

andersonfrailey · 2017-07-12T15:56:06Z

After playing with the interest paid deduction for awhile, I've updated the notebook to show the results of using HMIE * .8 as e19200. I used .8 as a scalar because the other ratios discussed above resulted in e19200 still being too high. These ratios were around .9 so I decided to give .8 a shot and the resulting e19200 was pretty much spot on. That being said, home mortgage interest is not the only component of the interest paid deduction, though it does comprise roughly 93% of total reported according to SOI stats, so it might be more accurate to have e19200 be a little lower in the CPS file than the PUF.

Despite e19200 being more accurate, the total number of itemizers is still much higher than in the PUF or the SOI stats. I've tried a number of other ratios to adjust HMIE in the CPS file, but haven't had any luck in getting that number down. I'm currently working on plotting out the distribution of itemized deductions and the itemizers themselves. @Amy-Xu suggested the issue may be that more people have just enough in itemized deductions to exceed the standard deduction.

I'll post more as I work on this.

cc @MattHJensen

Amy-Xu · 2017-07-12T16:12:28Z

I suspect the reason why the number of itemizers in CPS outruns PUF is that CPS has more people w/ smaller amount of itemized deduction than PUF, and less people with large amount of itemized deduction. One way to verify is to tabulate itemizer number by itemize deduction amount @andersonfrailey. If this is true, it means we might need to apply different scalers to people w/ different itemize deduction amount. For example, disqualify some itemizers by reducing their deduction to levels below 6k, and augment the others itemize deduction to match the total admin itemized deduction number. Even though this approach might get us a better total liability number at some point, it seems to me this approach will take longer and is susceptible to accusations of being too manipulative.
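The tabulation I have in mind is a weighted count of itemizers in each deduction-amount bin, run once on the CPS file and once on the PUF. A quick sketch in plain Python, where the bin edges, sample data, and function name are all hypothetical:

```python
from bisect import bisect_right


def tabulate_itemizers(amounts, weights, edges):
    """Weighted number of itemizers in each deduction-amount bin.

    Bin 0 is below edges[0]; the last bin is at or above edges[-1].
    """
    counts = [0.0] * (len(edges) + 1)
    for amount, weight in zip(amounts, weights):
        counts[bisect_right(edges, amount)] += weight
    return counts


# Hypothetical bin edges (dollars) and itemizer records.
edges = [10_000, 20_000, 50_000]
amounts = [8_000, 15_000, 15_000, 60_000]
weights = [10, 20, 5, 1]

counts = tabulate_itemizers(amounts, weights, edges)
print(counts)
```

If the CPS counts pile up in the lowest bins relative to the PUF, that would support the explanation above.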

@MattHJensen What do you think the best way to deal with itemized deduction at this point?

hdoupe · 2017-07-12T17:12:42Z

@Amy-Xu said

I suspect the reason why the number of itemizers in CPS outruns PUF is that CPS has more people w/ smaller amount of itemized deduction than PUF, and less people with large amount of itemized deduction. One way to verify is to tabulate itemizer number by itemize deduction amount @andersonfrailey. If this is true, it means we might need to apply different scalers to people w/ different itemize deduction amount. For example, disqualify some itemizers by reducing their deduction to levels below 6k, and augment the others itemize deduction to match the total admin itemized deduction number

How do the distributions of the components (medical expenses, interest paid, etc.) of the itemized deduction amount in the CPS compare to those of the PUF? To me, it would make more sense to scale those components than to scale the full itemized deduction amount.

Amy-Xu · 2017-07-12T17:28:13Z

@hdoupe Take a look at Cell 22 in the latest notebook Anderson posted in his comment. I believe what Anderson has been doing is just scaling the interest paid section of itemized deduction, instead of the full itemized deduction. If you trace back through a few previous comments by Anderson, you can see that scaling back one item will increase the number of total itemizers beyond the desired level -- at least that's my understanding. The increased itemizer number decreases the total number of standard deductors and the total standard deduction, which drags individual income tax liability further away from where it's supposed to be. Does that make sense?

feenberg · 2017-07-12T17:30:40Z

On Wed, 12 Jul 2017, Henry Doupe wrote:

@Amy-Xu said

I suspect the reason why the number of itemizers in CPS outruns PUF is that CPS has more people w/ smaller amount of itemized deduction than PUF, and less people with large amount of itemized deduction. One way to verify is to tabulate itemizer number by itemize deduction amount @andersonfrailey. If this is true, it means we might need to apply different scalers to people w/ different itemize deduction amount. For example, disqualify some itemizers by reducing their deduction to levels below 6k, and augment the others itemize deduction to match the total admin itemized deduction number

How do the distributions of the components (medical expenses, interest paid, etc.) of the itemized deduction amount in the CPS compare to those of the PUF? To me, it would make more sense to scale those components than to scale the full itemized deduction amount.

It would help to know where the itemized deduction imputation comes from. Is it the CEX? dan

hdoupe · 2017-07-12T17:31:19Z

@Amy-Xu Ah, ok thanks. That makes sense. Sorry, I should've looked before I commented.

Amy-Xu · 2017-07-12T17:44:57Z

@andersonfrailey Since we have been applying a scalar, the adjustment might already be accused of data manipulation. One brute-force way to solve this problem, I think, is to apply one scalar to get total itemizers in the right place, and then apply another scalar to augment these itemizers' interest paid and state and local deductions to match the total itemized deduction. But it's Matt's call whether to do this or not.

@hdoupe no worries.

Amy-Xu · 2017-07-12T17:52:43Z

@feenberg Dan, John has this documentation for CPS tax unit online at: http://www.quantria.com/assets/img/TechnicalDocumentationV4-2.pdf

This documentation (on page 14) seems to indicate two biggest itemized deductions -- home mortgage interest and state and local income taxes -- are respectively imputed from Survey of Consumer Finance (Federal Reserve Board) and a proprietary state calculator. Other items are matched from SOI etc.

Do you think this mortgage interest issue might be rooted in original CPS income distribution? CPS doesn't have enough high income earners so the mortgage interests matched from SCF are tilted toward lower end.

andersonfrailey · 2017-07-12T17:59:07Z

@Amy-Xu it looks like you're right about there being more itemizers, but smaller totals, as can be seen in the plots below.

MattHJensen · 2017-07-12T21:20:02Z

@Amy-Xu asked:

@MattHJensen What do you think the best way to deal with itemized deduction at this point?

My view is that we should leave everything as is and focus on documenting the file and how to recreate the file from scratch from initial datasets (CPS, SCF, etc.), including all imputations. Only after the file is documented and we have scripts to create it from scratch should we focus on tuning things. At that point we'll have a much better idea of whether the imputations themselves should be adjusted or whether the file should receive 'post processing' adjustments like those discussed above.

Amy-Xu · 2017-07-12T21:28:52Z

@MattHJensen said:

My view is that we should leave everything as is

What do you mean by 'as is'? You asked to include HMIE as e19200 in this comment. Without any scaling adjustment, this HMIE/e19200 will increase the difference in individual income tax liability between SOI and CPS to more than 30%. Do you mean to include HMIE or not?

MattHJensen · 2017-07-12T22:00:02Z

@Amy-Xu asked:

Do you mean to include HMIE or not?

If others agree with me that home mortgage interest expense is a component of "total interest paid", then I think we should include HMIE in e19200. If others don't, then I'd like to discuss that further.

What do you mean by 'as is'?

My point is that we should make correspondences between as many of the variables from the CPS file and variables from the PUF as we can, and then wait on doing further adjustments until we have documented and can reproduce the CPS file.

Otherwise we will run into more conversations like this:

q. Why did you adjust the interest deduction amount in post processing?
a. Because the number wasn't close to SOI.
q. Why wasn't the number close to SOI?
a. I'm not quite sure, something to do with the imputation.
q. How does the imputation work?
a. I'm not quite sure, here's a link to John O'Hare's documentation.

codykallen · 2017-07-12T22:05:10Z

@MattHJensen said:

If others agree with me that home mortgage interest expense is a component of "total interest paid", then I think we should include HMIE in e19200. If others don't, then I'd like to discuss that further.

It is a component of total interest paid, and the most important one. I would love it if we could get the other components as well, but I could settle for at least being able to distinguish between HMIE and non-HMIE deductible interest expense.

Amy-Xu · 2017-07-12T22:09:51Z

@andersonfrailey Could you explain again why you thought HMIE is not equivalent to e19200 so everyone can see?

andersonfrailey · 2017-07-13T13:06:20Z

@Amy-Xu I came to that conclusion by talking to John about it. It is a component of e19200 as @codykallen said, but not a direct map like other variables are.

I generally agree with the points @MattHJensen and @Amy-Xu have made about excessive data manipulation. I suppose it would be best to leave it as is and make a note in the documentation of the dataset about it and the effects it may have on the results coming out of tax-calculator.

andersonfrailey · 2017-07-18T21:16:59Z

Updated notebook just includes some additional information requested by @Amy-Xu on the number of benefits participants.

martinholmer · 2017-07-18T21:45:30Z

@andersonfrailey said:

Updated notebook just includes some additional information requested by @Amy-Xu on the number of benefits participants [and average benefits by wages among participants].

Thanks for the additional tabulations.

I'm puzzled by the distribution of benefits by wage percentile among participants. For means-tested programs, I would expect to see benefits decline as wages rise. But what we see for SSI is nothing like that. And we see just the opposite for Social Security, which has an individual level "earnings test" (that reduces benefits as wages rise). I do not expect to see such a decline for benefits that are not means-tested, such as Medicare. What am I missing? Is there some explanation that makes sense out of what is puzzling me?

@MattHJensen @Amy-Xu

Amy-Xu · 2017-07-19T16:49:43Z

@martinholmer Thanks for looking into the charts. At this point, it might be hard to explain the trends in the participation and benefit charts because they are not the final versions yet; we might still need to tweak things here and there. That being said, I'll try to explain what looks sensible to me for each of your three observations on SSI, Social Security and Medicare. Much of this is speculation on my part. I'm happy to continue the discussion and hear more feedback, since there's no official tax-unit distribution we could compare with.

SSI: I believe the average benefit chart that shows little decline is plotted for participants only. A very small number of high-income taxpayers received SSI for themselves or their families during part of the year, when they might have been laid off or lost their jobs for other reasons. They might receive the benefit at the same level as other participants, which makes the chart of average benefit among participants look flat. But if you look at the average benefit across the entire population, the decline is still quite significant, since the participation rate of high-income groups is very low.
Social Security: Although the Social Security benefit is subject to an earnings test, I think that test applies to people under normal retirement age, who I suspect are not the majority of beneficiaries included in the chart. Also, the SS benefit declines as earnings go up, but those earnings are historical earnings rather than current wages. So I feel it's quite hard to tell what the chart should look like when most beneficiaries probably have little current wage income.
Medicare: This is something I'm not quite sure about, but I suspect higher-income beneficiaries might have relied on private insurance more than lower-income beneficiaries, as suggested in this BLS report. Additionally, in the raw CPS imputation, as you can see in the C-TAM documentation, the top quintile does have a lower average benefit than the other four quintiles. Last but not least, the Medicare average benefit chart has most beneficiaries between $8000 and $16000 in benefits. The sharp decline may not look so sharp if the y-axis of the chart runs from $0 to $16000 instead.
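To make the SSI point concrete, here is a minimal sketch of how the two averages can diverge. Every number and group name below is hypothetical and chosen only for illustration; none of them comes from the CPS file or the notebook. The sketch assumes each wage group has the same number of tax units and that every participant receives the same benefit.

```python
# Hypothetical illustration: all figures are made up, not taken from the CPS file.
groups = {
    # wage group: (tax units, participants, total benefits paid in dollars)
    "low wage": (1000, 200, 1_600_000),
    "middle wage": (1000, 50, 400_000),
    "high wage": (1000, 5, 40_000),
}


def averages(units, participants, total_benefits):
    """Return (average benefit among participants, average benefit among all units)."""
    avg_participants = total_benefits / participants if participants else 0.0
    avg_population = total_benefits / units if units else 0.0
    return avg_participants, avg_population


results = {name: averages(*values) for name, values in groups.items()}

for name, (avg_part, avg_pop) in results.items():
    print(f"{name:12s} participants: ${avg_part:,.0f}  population: ${avg_pop:,.0f}")
```

In this made-up case the average among participants is flat at $8,000 in every group, while the average over all tax units falls from $1,600 to $40 as wages rise, purely because participation falls.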

andersonfrailey · 2017-07-25T14:49:44Z

The CPS files and documentation have been merged into master so I am closing this issue.

Amy-Xu mentioned this issue Jun 12, 2017

Use TC to do UBI reforms with CPS tax unit dataset (3 months) PSLmodels/UBI-examples#9

Open

4 tasks

martinholmer mentioned this issue Jul 18, 2017

[WIP] Corporate Income Tax Distribution PSLmodels/Tax-Calculator#1482

Closed

andersonfrailey mentioned this issue Jul 20, 2017

Revamp CPS final prep #110

Merged

andersonfrailey closed this as completed Jul 25, 2017
