Why shuffle the data? #7
Hi @ochoch, I think you're right. At the time I thought it was a good idea to shuffle the data, but now I'd say it leads to overfitting and forward-looking bias.
Hi Maxim,

Thanks for your reply. I played a bit with your implementation and added a provider (FXCM) using fxcmpy (https://github.com/fxcm/RestAPI/tree/master/fxcmpy).

In the end, since connecting to the FXCM servers is time-consuming and they don't deliver the latest bar(!), I integrated your Python scripts with MT4. On each tick I export the latest data to a CSV file (replacing the `get_latest_data` method) and load the `raw_df` dataframe with `read_csv` instead.

Then I run `predict.py`, get a prediction for the next bar, and draw the result on a chart...
At this stage, I am also calculating the accuracy... and to be honest, it is quite hard to get tradable predictions.

I get roughly the following accuracy in forward testing:
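
| Timeframe | High accuracy (%) | Low accuracy (%) |
|-----------|-------------------|------------------|
| m15       | 57.25             | 56.29            |
| H4        | 56.25             | 63.55            |
| D1        | 65.63             | 57.29            |
| W1        | 52.08             | 58.33            |
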
Maybe we should add some additional features and use a feature selection algorithm. Any insights?
Regards,
och
On Sat, 20 Apr 2019 at 11:19, Maxim Podkolzine <notifications@github.com> wrote:
Hi @ochoch, sorry for the delay. Unfortunately, that's the way it is: there is a lot of noise and very little signal in financial data. If you can find a reliable signal that is more than 50% accurate, that's good enough to make money. As for features, that's the key question: every ML algorithm that makes money ultimately comes down to its features. I haven't worked much with crypto data since then. Do you have any ideas in mind?
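To make "a reliable signal more than 50% accurate" concrete, a one-sided exact binomial test gives the probability that a coin-flipping predictor would score at least as well on the same number of predictions. This is a minimal sketch using only the Python standard library; `binomial_p_value` is a hypothetical helper, not part of this project, and it assumes the predictions are independent trials, which overlapping or autocorrelated bars may not satisfy.

```python
from math import comb


def binomial_p_value(correct: int, total: int) -> float:
    """One-sided exact binomial test against random guessing (p = 0.5).

    Returns P(X >= correct) for X ~ Binomial(total, 0.5): the probability
    that a coin-flipping predictor gets at least `correct` hits out of
    `total`. Assumes independent predictions (hypothetical helper).
    """
    if not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total")
    # Count every outcome with at least `correct` hits; with p = 0.5 each of
    # the 2**total outcomes is equally likely. Exact integers avoid overflow.
    tail = sum(comb(total, k) for k in range(correct, total + 1))
    return tail / 2**total


if __name__ == "__main__":
    # The same 57% hit rate measured over two different sample sizes.
    print(binomial_p_value(57, 100))
    print(binomial_p_value(570, 1000))
```

The same hit rate is far more convincing over 1,000 predictions than over 100, so an accuracy slightly above 50% needs a large forward-test sample before it can be called reliable.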
Hello,

Nice and interesting work; I learned a lot.

When building the training and test datasets, why do you shuffle the data? I thought that with time series we should not shuffle the data.
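In `data_utils.py`:

```python
import numpy as np

# DataSet and _choose_optimal_train_ratio are defined elsewhere in the project.
def split_dataset(dataset, ratio=None):
    size = dataset.size
    if ratio is None:
        ratio = _choose_optimal_train_ratio(size)
    # Mark the first train_size positions as training rows...
    mask = np.zeros(size, dtype=np.bool_)
    train_size = int(size * ratio)
    mask[:train_size] = True
    # ...then shuffle the mask, so training rows are drawn at random from
    # the whole time range (this is the step in question).
    np.random.shuffle(mask)
    train_x = dataset.x[mask, :]
    train_y = dataset.y[mask]
    # Every row not selected for training goes to the test set.
    mask = np.invert(mask)
    test_x = dataset.x[mask, :]
    test_y = dataset.y[mask]
    return DataSet(train_x, train_y), DataSet(test_x, test_y)
```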
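For comparison, here is a minimal sketch of a chronological split, assuming the rows of `dataset.x` and `dataset.y` are stored oldest-first: the training set is the earliest `ratio` share of rows and the test set is everything after it, so no test sample precedes a training sample. `split_dataset_chronological`, its fixed default `ratio=0.8`, and the `DataSet` namedtuple are hypothetical stand-ins, not the project's actual code.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the project's DataSet class.
DataSet = namedtuple("DataSet", ["x", "y"])


def split_dataset_chronological(dataset, ratio=0.8):
    """Split rows in time order: the first `ratio` share trains, the rest tests.

    Assumes rows are sorted oldest-first, so every test row comes after
    every training row and no future bars leak into training.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be strictly between 0 and 1")
    size = len(dataset.y)
    train_size = int(size * ratio)
    train = DataSet(dataset.x[:train_size], dataset.y[:train_size])
    test = DataSet(dataset.x[train_size:], dataset.y[train_size:])
    return train, test


if __name__ == "__main__":
    x = np.arange(16).reshape(8, 2)  # 8 samples with 2 features, oldest first
    y = np.arange(8)                 # labels double as time indices here
    train, test = split_dataset_chronological(DataSet(x, y), ratio=0.75)
    print(train.y, test.y)           # all test labels follow all train labels
```

Unlike the shuffled mask, this keeps later bars out of training. For repeated evaluation, a walk-forward scheme such as scikit-learn's `TimeSeriesSplit` applies the same idea over several folds.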
Regards,