run_train infinite loop? #2
Silly me. I did not see the …
Hi @flamby First of all, you can examine the logs and check the best scores that the RNN or DNN models are showing (by the way, there's also verbose logging, but as far as I remember I didn't add a command-line option for it). They are probably much worse, which means that there is too much noise in the chart and very little signal. In this case, the simplest possible model will often win. There is a small chance that tweaking the ranges of the hyper-parameters will help, but unfortunately there's not much else you can do, other than choosing a different period/target. I have seen pairs like this, and in my opinion the best thing to do is to select another pair that demonstrates a more vivid pattern in the data and is thus easier to train. Luckily there is a big choice.
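For example, here is a quick way to see which model families have actually saved results. This is a minimal sketch that assumes only the _zoo/<ticker>_<period> layout and the low_/high_ directory prefixes visible in this thread; the naming after the prefix is an assumption.

import os
from collections import Counter

def count_saved_models(base_dir='_zoo/BTC_ETH_day'):
    counts = Counter()
    for d in os.listdir(base_dir):
        path = os.path.join(base_dir, d)
        # Assumed naming: '<target>_<ModelName>...', e.g. 'low_LinearModel...'
        if os.path.isdir(path) and d.startswith(('low_', 'high_')):
            parts = d.split('_')
            counts[parts[1] if len(parts) > 1 else d] += 1
    return counts

print(count_saved_models())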
Hi @maxim5 Thanks for the clarification. I like the idea of having an ensemble of different ML algorithms instead of an ensemble of five LinearModels. So I came up with this:

import os
import shutil

import numpy as np

# util, JobInfo, JobRunner and the iterate_* functions are assumed to be
# imported from this project's modules, as in run_train.py.

def main():
    tickers, periods, targets = util.parse_command_line(default_periods=['day'],
                                                        default_targets=['high'])
    # Change me
    _models = [
        {'func': iterate_linear, 'max_iteration': 2},
        {'func': iterate_neural, 'max_iteration': 2},
        {'func': iterate_rnn, 'max_iteration': 1},
        {'func': iterate_cnn, 'max_iteration': 1},
        # {'func': iterate_xgb, 'max_iteration': 1},
    ]
    for _model in _models:
        # Run a bounded number of tuning rounds per model family.
        for _ in range(_model['max_iteration']):
            for ticker in tickers:
                for period in periods:
                    for target in targets:
                        job_info = JobInfo('_data', '_zoo',
                                           name='%s_%s' % (ticker, period),
                                           target=target)
                        job_runner = JobRunner(job_info, limit=np.median)
                        _model['func'](job_info, job_runner)
                        job_runner.print_result()
        # Move the stored models of this family into a sub-directory named
        # after the iterate_* function, for every ticker/period pair.
        for ticker in tickers:
            for period in periods:
                base_dir = '_zoo/%s_%s/' % (ticker, period)
                temp_dir = os.path.join(base_dir, _model['func'].__name__)
                os.makedirs(temp_dir, exist_ok=True)
                to_move = [os.path.join(base_dir, d) for d in os.listdir(base_dir)
                           if os.path.isdir(os.path.join(base_dir, d))
                           and (d.startswith('low_') or d.startswith('high_'))]
                print('** Moving %s models to %s directory for manual selection **'
                      % (_model['func'].__name__, temp_dir))
                for d in to_move:
                    shutil.move(d, temp_dir)

It moves all stored models into their respective directory, named after the function used (iterate_linear, iterate_neural, etc.). The rationale came to me when reading François Chollet's (the Keras author) "Deep Learning with Python" book.

Regarding pairs with less noise, I found that ETC_ETH has good patterns for ML.

I'll try to plug a WaveNet model into your code, as it seems to be very good at detecting patterns. I guess I'll need at least two other exchange connectors for that. New activation functions like SWISH could also help.
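Something like the following is what I have in mind for the WaveNet part: a minimal Keras sketch of a stack of dilated causal convolutions, not wired into the iterate_* API; the window size, filter count and dilation rates are illustrative choices, not values from this repo.

from tensorflow.keras import layers, models

def build_wavenet_like(window=64, channels=1, filters=32,
                       dilations=(1, 2, 4, 8, 16)):
    inputs = layers.Input(shape=(window, channels))
    x = inputs
    for rate in dilations:
        # Causal dilated convolutions grow the receptive field
        # exponentially with depth (the core WaveNet idea).
        x = layers.Conv1D(filters, kernel_size=2, dilation_rate=rate,
                          padding='causal', activation='relu')(x)
    x = layers.GlobalAveragePooling1D()(x)
    output = layers.Dense(1)(x)  # e.g. the next period's 'high' or 'low'
    return models.Model(inputs, output)

model = build_wavenet_like()
model.compile(optimizer='adam', loss='mae')

Swapping activation='relu' for 'swish' (available as a built-in activation in recent TensorFlow releases) would be the SWISH experiment mentioned above.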
Hi @flamby I see what you mean. What I usually do is let all the models learn and then drop the similar ones from the ensemble. Note that models with different hyper-parameters can also be combined. My concern was specifically about the situation when the linear model performs much better than any other, more complex model: this usually indicates that only simple inference is possible, due to the nature of the data. It would be interesting to know if your approach leads to a significant improvement over a linear model alone. Can you share how you'd like to plug the WaveNet in?
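As a minimal sketch of the combining step (models and .predict here are hypothetical, not this project's JobRunner API), one could take the element-wise median across the surviving heterogeneous models, echoing the np.median already used above:

import numpy as np

def ensemble_predict(models, x):
    # `models` is a list of trained models exposing a hypothetical
    # .predict(x) -> np.ndarray method.
    predictions = np.stack([m.predict(x) for m in models])
    # The median is robust to a single model family going off the rails.
    return np.median(predictions, axis=0)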
Dear maxim, I have the same concern as flamby: since run_train.py has a while True loop, how can I know when I should stop the training and move on to run_predict.py? And do I need to run training every time before running prediction? Thanks for your hard work!
@bautroibaola This is a common problem in ML: there is no way to tell when the model is ready. What people usually do is train for as long as they have time and simply take the best models. That's why there's an endless loop. However, feel free to replace it with some limit.
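For example, a sketch of such a limit (MAX_ITERATIONS, MAX_SECONDS and the loop shown are illustrative, not existing run_train.py options):

import time

MAX_ITERATIONS = 50          # stop after this many tuning rounds
MAX_SECONDS = 6 * 60 * 60    # ... or after six hours, whichever comes first

start = time.time()
iteration = 0
while iteration < MAX_ITERATIONS and time.time() - start < MAX_SECONDS:
    # the body of run_train.py's original `while True:` loop goes here
    iteration += 1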
Hi,
First of all, congrats on this project; it appears to be very promising.
I ran run_train like this:
./run_train.py --target=low BTC_ETH --period=day
and 2 days later it's still running, with around 77 _zoo/BTC_ETH sub-folders, all of them LinearModel. Could that be the reason the training is still ongoing, i.e. it keeps trying to find other models with good results, without success?
I did not find where to configure the limit.
Thanks, and keep up the good work!