Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

why shuffling data? #7

Open
ochoch opened this issue Apr 20, 2019 · 3 comments
Open

why shuffling data? #7

ochoch opened this issue Apr 20, 2019 · 3 comments

Comments

@ochoch
Copy link

ochoch commented Apr 20, 2019

Hello,
Nice and interresting work, I learned a lot.
During train and testing dataset building process, why are you shuffling data? I though that regarding time serie we should not shuffling data.

data_utils.py

def split_dataset(dataset, ratio=None):
size = dataset.size
if ratio is None:
ratio = _choose_optimal_train_ratio(size)

mask = np.zeros(size, dtype=np.bool_)
train_size = int(size * ratio)
mask[:train_size] = True
np.random.shuffle(mask)

train_x = dataset.x[mask, :]
train_y = dataset.y[mask]

mask = np.invert(mask)
test_x = dataset.x[mask, :]
test_y = dataset.y[mask]

return DataSet(train_x, train_y), DataSet(test_x, test_y)

Regards,

@maxim5
Copy link
Owner

maxim5 commented Apr 20, 2019

Hi @ochoch I think you're right. At that time I thought it was a good idea to shuffle the data, but I now I'd say it leads to overfitting and forward-looking bias.

@ochoch
Copy link
Author

ochoch commented Apr 24, 2019 via email

@maxim5
Copy link
Owner

maxim5 commented May 16, 2019

Hi @ochoch sorry for the delay.

Unfortunately that's the way it is: there is so much noise and so little signal in financial data. If you are able to find a reliable signal more than 50% accurate, it's good enough and you can make money.

In terms of features: that's the key question. All ML algorithms that make money boil down to features. I haven't worked much on crypto data since then. Do you have any ideas in mind?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants