
Scaling Query #1

Open
andrewwaites opened this issue Oct 8, 2017 · 2 comments


@andrewwaites

Hi

This is excellent code. Many thanks, as it aligns with some research I am doing and also helps with my Python.

I have a query regarding the scaling. Am I correct that this code includes the test data in the scaling fit? Should the test set be excluded from that process? That is, should the data not be split into train/test first, with fit_transform applied to the training set and only transform applied to the test set?

@dafrie
Owner

dafrie commented Oct 8, 2017

Hi Andrew,

Glad the code is helping you! This is my first shot at RNNs (so take everything with caution). I also profited massively from other projects and blog posts, so I had to share this...

Regarding the scaling on test data:
Thanks for raising this point. You are quite right: the standardization should not be based on the whole sample but only on the training data, and the parameters estimated there should then also be applied to the test data. Otherwise the model gets an "illegal" glimpse at the test data.
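
A minimal sketch of that procedure, not the repository's actual code: split first, estimate the mean and standard deviation on the training part only, then reuse those parameters for the test part. The helper names `fit_scaler` and `transform` and the toy series are hypothetical. With scikit-learn the same pattern would be `fit_transform` on the training set followed by `transform` on the test set, as described in the question.

```python
from statistics import fmean, pstdev

def fit_scaler(values):
    """Estimate standardization parameters from the given values only."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation
    if std == 0:
        std = 1.0  # avoid division by zero for a constant series
    return mean, std

def transform(values, mean, std):
    """Standardize values with previously estimated parameters."""
    return [(v - mean) / std for v in values]

# Hypothetical toy series in time order (illustration only).
series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0, 30.0, 32.0]

# Split chronologically BEFORE any scaling: first 80% train, rest test.
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]

# Fit on the training data only, then apply the same parameters to both parts.
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)
```

The test values never influence `mean` or `std`, so nothing about the test set leaks into the inputs the model is trained on.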

I noticed this flaw only after running the generated models (which took many hours...), and as the hand-in date for the paper was fast approaching, I didn't have time to correct it and rerun the models. In the paper I argued that, since the series seems to be stationary (once the seasonality is taken out) and the distributions of the training and test data are similar, the results should not really be affected...
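
One way to check that argument on a given data set, sketched under assumptions: compare the parameters estimated on the training data alone with those estimated on the full sample. The function `scaling_shift` and the toy numbers below are hypothetical and are not results from the paper.

```python
from statistics import fmean, pstdev

def scaling_shift(train, test):
    """Compare full-sample scaling parameters with train-only ones.

    Returns (mean shift in units of the train std, ratio of full std to
    train std). Values near (0, 1) mean that fitting on the full sample
    would have produced almost the same scaling. Assumes the training
    data is not constant, so its standard deviation is non-zero.
    """
    full = train + test
    mean_train, std_train = fmean(train), pstdev(train)
    mean_full, std_full = fmean(full), pstdev(full)
    return (mean_full - mean_train) / std_train, std_full / std_train

# Hypothetical toy data (illustration only).
train = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
test_similar = [2.0, 3.0]    # same range as the training data
test_shifted = [20.0, 30.0]  # very different level

shift_similar, ratio_similar = scaling_shift(train, test_similar)
shift_shifted, ratio_shifted = scaling_shift(train, test_shifted)
```

When the test data resembles the training data, the shift stays close to zero and the ratio close to one. A test set at a very different level moves both noticeably, and in that case the leak could matter.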

@andrewwaites
Author

Hi Dafrie

This was more a sanity check than anything critical, as I agree the scaling would have been very similar either way.

Thanks again
