Data leakage with classification models on small dataset #739
Replies: 2 comments
-
Hi @georgemossad,
-
Hi @oguiza, so this approach might lead to data leakage: the model might overfit the training data and then, because of the leakage, also score highly on the test set. Do you have any suggestions for dealing with this issue and getting truly representative results? For example, should I feed the original signals (4096 points each) to the model directly, or use ), or split by order? When I try to split by order and set stratify=True, I get an error telling me that I need to shuffle the data to use stratify. How can I deal with that? Many thanks for your time.
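On the `stratify` error: scikit-learn's `train_test_split`, for example, raises a `ValueError` when `stratify` is combined with `shuffle=False`. Shuffling is not what causes leakage, though; shuffling *windows* is. A minimal sketch, assuming each recording has a single class label and that windows are stored contiguously (`n_win` consecutive windows per recording, in recording order), splits the *recordings* with stratification and shuffling, then expands each recording into its window indices. `split_by_recording` is a hypothetical helper, and the sketch uses scikit-learn rather than the library's own splitting function.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_by_recording(rec_labels, n_win, valid_size=0.2, seed=0):
    """Stratified, shuffled split done on recordings, then expanded to windows.

    Hypothetical layout assumption: window i was cut from recording i // n_win,
    i.e. every recording contributed exactly n_win consecutive windows.
    rec_labels holds one (single) class label per recording.
    """
    rec_ids = np.arange(len(rec_labels))
    # Shuffling is fine here: whole recordings are shuffled, never windows,
    # so windows of one recording cannot end up on both sides of the split.
    train_rec, valid_rec = train_test_split(
        rec_ids,
        test_size=valid_size,
        stratify=rec_labels,
        shuffle=True,
        random_state=seed,
    )

    def to_windows(recs):
        # Recording r owns windows r*n_win ... r*n_win + n_win - 1.
        return (recs[:, None] * n_win + np.arange(n_win)).ravel()

    return np.sort(to_windows(train_rec)), np.sort(to_windows(valid_rec))


# Usage with made-up labels: 30 recordings, 3 classes, 8 windows per recording.
rec_labels = np.repeat([0, 1, 2], 10)
train_idx, valid_idx = split_by_recording(rec_labels, n_win=8)
print(len(train_idx), len(valid_idx))
```

Because each recording lands entirely on one side, the validation score estimates performance on unseen recordings. With only a few recordings per class the result can vary a lot with `seed`, so repeating the split over several seeds, or using a grouped K-fold such as scikit-learn's `StratifiedGroupKFold` (available since scikit-learn 1.0), may give a steadier estimate.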
-
Hello everyone,
I am working with a small multivariate, multi-label dataset: X has shape (468, 12, 512), and there are 14 classes. I am using the library's classification models to train and test on my data, and I get very high results on all metrics (almost 100%) on both the validation and test sets. I am worried that the models are overfitting, because I divided my original time-series readings into smaller samples and then used a random split in the get_splits function. Could a random split cause data leakage here?
P.S. I can't use an order-based split because my data is imbalanced.
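A random split over windows can leak: windows cut from the same recording share its sensor, subject and session characteristics, so a model can score well by recognising the recording rather than the class. One way to check is to keep a group id (the source recording) for every window and count how many recordings end up on both sides of the split. This is a minimal sketch under stated assumptions: recordings are NumPy arrays of shape (n_vars, n_steps), windows are non-overlapping, the data is synthetic, `make_windows` and `leaked_groups` are hypothetical helpers, and scikit-learn's `train_test_split` stands in for a random window-level split.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def make_windows(recordings, labels, win_len):
    """Cut each recording (n_vars, n_steps) into non-overlapping windows.

    Each window inherits its recording's label and gets a group id equal to
    the recording's index, so the origin of every window stays traceable.
    """
    X, y, groups = [], [], []
    for rec_id, (rec, lab) in enumerate(zip(recordings, labels)):
        n_win = rec.shape[-1] // win_len  # any leftover tail is dropped
        for w in range(n_win):
            X.append(rec[:, w * win_len:(w + 1) * win_len])
            y.append(lab)
            groups.append(rec_id)
    return np.stack(X), np.array(y), np.array(groups)


def leaked_groups(train_idx, valid_idx, groups):
    """Return the recordings that have windows in both train and valid."""
    return set(groups[train_idx]) & set(groups[valid_idx])


# Synthetic stand-in data (NOT the poster's data): 30 recordings of
# 12 variables x 4096 steps, 3 classes, cut into 512-step windows.
rng = np.random.default_rng(0)
recordings = [rng.standard_normal((12, 4096)).astype(np.float32) for _ in range(30)]
labels = np.repeat([0, 1, 2], 10)
X, y, groups = make_windows(recordings, labels, win_len=512)

# A random, stratified split over windows, i.e. the setup described above.
idx = np.arange(len(y))
train_idx, valid_idx = train_test_split(idx, test_size=0.2, stratify=y, random_state=0)
print(X.shape, "recordings in both sets:", len(leaked_groups(train_idx, valid_idx, groups)))
```

If the count is non-zero, the validation and test scores partly measure how well the model recognises recordings it has already seen. Splitting on `groups` (all windows of a recording on the same side) removes that overlap while still allowing stratification by class at the recording level.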