run_train infinite loop? #2

flamby · 2018-04-04T09:48:33Z

Hi,

First of all, congrats for this project, it appears to be very promising.

I ran run_train like this ./run_train.py --target=low BTC_ETH --period=day and 2 days later, it's still running with around 77 _zoo/BTC_ETH sub folders, with only LinearModel.

Could it be the reason that the training is still ongoing? i.e. in an attempt to find other models with good results, without success?
I did not find where to configure the limit.

Thanks and keep the good work!

The text was updated successfully, but these errors were encountered:

flamby · 2018-04-05T07:49:59Z

silly me. i did not see the while True
but anyway, no other model than LinearModel are stored...

maxim5 · 2018-04-05T08:01:34Z

Hi @flamby

First of all, you can examine the logs and check the best scores that RNN or DNN models are showing (by the way there's also verbose logging, but as far as I remember I didn't do a command-line option for it). Probably they are much worse, which means that there is too much noise in the chart and very little signal. In this case, the simplest possible model will often win. There is a small chance that tweaking the ranges of hyper-parameters will help, but unfortunately there's not much you can do, other than choosing a different period/target.

I have seen pairs like and in my opinion the best thing to do is to select another pair, which demonstrates more vivid pattern in the data, thus easier to train. Luckily there is a big choice.

flamby · 2018-04-05T11:16:19Z

Hi @maxim5

Thanks for the clarification.

I like the idea of having an ensemble of different ML algorithm, instead of an ensemble of 5 LinearModel.

So I came to this run_predict change. Instead of one infinite while with all models, i have now 4 consecutives (xgboost is not getting any result, need to dig into this) run of models (w/ capped iteration each). Then I do manual selection based on Sign accuracy. It could be automated also I guess.

def main():
  tickers, periods, targets = util.parse_command_line(default_periods=['day'],
                                                      default_targets=['high'])
  # Change me
  _models = [
             {'func': iterate_linear, 'max_iteration': 2},
             {'func': iterate_neural, 'max_iteration': 2},
             {'func': iterate_rnn,    'max_iteration': 1},
             {'func': iterate_cnn,    'max_iteration': 1}
             # {'func': iterate_xgb,    'max_iteration': 1}
            ]

  for _model in _models:
    i = 0
    while i < _model['max_iteration']:
      for ticker in tickers:
        for period in periods:
          BASE_DIR = "_zoo/%s_%s/" % (ticker, period)
          for target in targets:
            job_info = JobInfo('_data', '_zoo', name='%s_%s' % (ticker, period), target=target)
            job_runner = JobRunner(job_info, limit=np.median)
            _model['func'](job_info, job_runner)
            job_runner.print_result()
            i += 1
          TEMP_DIR = os.path.join(BASE_DIR, _model['func'].__name__)
          os.makedirs(TEMP_DIR, exist_ok=True)
          to_move = [os.path.join(BASE_DIR, d) for d in os.listdir(BASE_DIR) 
                     if os.path.isdir(os.path.join(BASE_DIR, d))
                     and (d.startswith("low_") or d.startswith("high_"))]
          print("** Moving %s models to %s directory for manual selection **" 
                % (_model['func'].__name__, TEMP_DIR))
          for d in to_move:
            shutil.move(d, TEMP_DIR)

It moves all stored models into their respective directory, named after the function used (iterate_linear, iterate_neural, etc.)

The rational came to me when reading François Chollet (Keras author) "Deep Learning with Python" book :

Regarding pair w/ less noise, I found ETC_ETH having good patterns for ML.

I'll try to plug WaveNet model into your code as it seems to be very performant to detect patterns. I'll need at least 2 other exchanges connector I guess for that.

New activation functions like SWISH could also help.

maxim5 · 2018-04-12T12:15:20Z

Hi @flamby

I see what you mean. What I usually do is let all models to learn and then drop the similar ones from the ensemble. Note that combining models with different hyperparameters (e.g., k - the window size, a very important parameter) is actually fine and it doesn't contradict François's idea. But I like your approach as well.

My concern was specifically about a situation when the linear model performs much better than any other complex model: this usually indicates that only a simple inference is possible, due to the nature of the data. It would be interesting to know if your approach leads to significant improvement over a linear model alone.

Can you share how you'd like to plug the wave net in?

bautroibaola · 2018-09-07T23:55:53Z

Dear maxim,

I also have same concern with flamby:

Because in run_train.py has while True loop, so how can I know when I should stop the training and move to run_predict.py? And need I run train every time before run predict?

Thanks for your hard work!

maxim5 · 2018-09-08T10:27:05Z

@bautroibaola This is a common problem in ML: there is no way to tell the model is ready. What people usually do is train it as long as they have time and simply take the best models. That's why there's an endless loop. However, feel free to replace it with some limit.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

run_train infinite loop? #2

run_train infinite loop? #2

flamby commented Apr 4, 2018

flamby commented Apr 5, 2018

maxim5 commented Apr 5, 2018

flamby commented Apr 5, 2018

maxim5 commented Apr 12, 2018

bautroibaola commented Sep 7, 2018 •

edited

Loading

maxim5 commented Sep 8, 2018

run_train infinite loop? #2

run_train infinite loop? #2

Comments

flamby commented Apr 4, 2018

flamby commented Apr 5, 2018

maxim5 commented Apr 5, 2018

flamby commented Apr 5, 2018

maxim5 commented Apr 12, 2018

bautroibaola commented Sep 7, 2018 • edited Loading

maxim5 commented Sep 8, 2018

bautroibaola commented Sep 7, 2018 •

edited

Loading