Misleading date interval in splitting.py file #63

Open
vultor33 opened this issue May 13, 2019 · 0 comments
Labels
bug Something isn't working
@vultor33
Contributor

Instructions

The date interval code in splitting.py could lead to errors.
Path: .\fklearn\src\fklearn\preprocessing\splitting.py

Code sample

In splitting.py we have:

train_period = dataset[
    (dataset[time_column] >= train_start_date) & (dataset[time_column] < train_end_date)]
outime_inspace_hdout = dataset[
    (dataset[time_column] >= train_end_date) & (dataset[time_column] < holdout_end_date)]

Problem description

We can see from the source that "holdout_end_date" is not included in the "outime_inspace_hdout" dataset.
This could mislead users.
For example, in your regression.ipynb notebook, the date "2016-12-31" vanished completely when the data was split.

Also, looking at the source, "train_end_date" is not included in the "train_period" dataset.
This is not wrong, but it is not what we would expect.
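
A minimal sketch reproducing the behaviour described above, assuming a small made-up daily dataset (the column names `date` and `target` and the specific dates are illustrative, not taken from the notebook). The two filters are copied from splitting.py as quoted in this issue:

```python
import pandas as pd

# Hypothetical daily data covering the last four days of 2016.
dataset = pd.DataFrame({
    "date": pd.date_range("2016-12-28", "2016-12-31", freq="D"),
    "target": [1.0, 2.0, 3.0, 4.0],
})
time_column = "date"
train_start_date = pd.Timestamp("2016-12-28")
train_end_date = pd.Timestamp("2016-12-30")
holdout_end_date = pd.Timestamp("2016-12-31")

# Filters as quoted from splitting.py: both upper bounds are exclusive.
train_period = dataset[
    (dataset[time_column] >= train_start_date) & (dataset[time_column] < train_end_date)]
outime_inspace_hdout = dataset[
    (dataset[time_column] >= train_end_date) & (dataset[time_column] < holdout_end_date)]

# Rows on holdout_end_date end up in neither split.
dropped = dataset[dataset[time_column] >= holdout_end_date]
print(len(train_period), len(outime_inspace_hdout), len(dropped))
```

With these inputs, the row dated 2016-12-31 lands in neither `train_period` nor `outime_inspace_hdout`, and the row dated 2016-12-30 (the `train_end_date`) goes to the holdout set rather than to training.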

Possible solutions

I would suggest redefining these limits:

train_period = dataset[
    (dataset[time_column] >= train_start_date) & (dataset[time_column] <= train_end_date)]
outime_inspace_hdout = dataset[
    (dataset[time_column] > train_end_date) & (dataset[time_column] <= holdout_end_date)]

In the same file (splitting.py), the function "time_split_dataset" has similar behaviour.
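
To illustrate the suggested limits, here is a sketch that wraps them in a hypothetical helper, `split_inclusive` (this name is made up for the example and is not part of fklearn), and applies it to an illustrative four-day dataset:

```python
import pandas as pd

def split_inclusive(dataset, time_column, train_start_date, train_end_date, holdout_end_date):
    """Hypothetical helper (not fklearn API) using the inclusive bounds suggested above."""
    train_period = dataset[
        (dataset[time_column] >= train_start_date) & (dataset[time_column] <= train_end_date)]
    outime_inspace_hdout = dataset[
        (dataset[time_column] > train_end_date) & (dataset[time_column] <= holdout_end_date)]
    return train_period, outime_inspace_hdout

# Illustrative data: one row per day, 2016-12-28 through 2016-12-31.
dataset = pd.DataFrame({"date": pd.date_range("2016-12-28", "2016-12-31", freq="D")})

train, holdout = split_inclusive(
    dataset, "date",
    pd.Timestamp("2016-12-28"), pd.Timestamp("2016-12-30"), pd.Timestamp("2016-12-31"))

# Every row falls in exactly one split: no gap and no overlap.
print(len(train), len(holdout))
```

One caveat with inclusive upper bounds: if the time column carries intra-day timestamps, `<= pd.Timestamp("2016-12-31")` only matches midnight of that day, so later rows on the same day would still be excluded.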

@vultor33 vultor33 added the bug Something isn't working label May 13, 2019
@caique-lima caique-lima added this to the 1.16.x milestone Sep 2, 2019