how to maximize CPU usage in distributed training? #603
My cluster consists of 4 nodes, each with 88 CPU cores. When I run distributed linear regression training using the command below, I see ~23% CPU usage on each node. That matches the expectation (20/88 ≈ 23%). I would like to make full use of the hardware resources in my cluster, so I added the nthreads parameter to daalinit() in linear_regression_spmd.py as shown below.
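For reference, a minimal sketch of the change being described, assuming daal4py's SPMD mode and its `daalinit(nthreads=...)` parameter as reported by the poster; the CSV path, column layout, and helper name are placeholders, not from the original script:

```python
# Sketch of an SPMD linear regression run that sets the thread count
# via daalinit(nthreads=...), as described in the question above.
# The file path and feature/label split are placeholder assumptions.

def train_distributed(csv_path, nthreads=88):
    # Imports kept inside the function so the sketch can be read
    # (and its structure checked) without daal4py/pandas installed.
    import daal4py as d4p
    import pandas as pd

    # Initialize the SPMD (MPI) backend; the poster passes nthreads
    # here hoping to raise per-node CPU usage from ~23% toward 100%.
    d4p.daalinit(nthreads=nthreads)

    data = pd.read_csv(csv_path)
    X = data.iloc[:, :-1]   # all columns but the last as features
    y = data.iloc[:, -1:]   # last column as the response

    # distributed=True runs the training step across all MPI ranks.
    algo = d4p.linear_regression_training(distributed=True)
    result = algo.compute(X, y)

    d4p.daalfini()
    return result
```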
Replies: 1 comment
Update:
By adding an MPI barrier between data loading and training, I see the expected CPU usage (~90%) in the training stage. The ~23% CPU usage appears in the data loading stage.
So I guess multithreading doesn't apply to daal4py's data loading implementation.
The initial question is no longer valid.
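A sketch of the barrier placement described above, using mpi4py. This assumes daal4py's SPMD backend runs under MPI so that `MPI.COMM_WORLD` is the shared communicator; the `load_fn`/`train_fn` callables are hypothetical stand-ins for the loading and training code:

```python
# Sketch: synchronize all ranks between data loading and training so
# that the single-threaded loading stage (~23% CPU) is not measured
# together with the multi-threaded training stage (~90% CPU).
# Assumption: mpi4py is available and shares MPI_COMM_WORLD with
# daal4py's SPMD backend.

def load_then_train(load_fn, train_fn):
    # Import inside the function so the sketch can be inspected
    # without an MPI environment.
    from mpi4py import MPI

    data = load_fn()          # loading stage: appears single-threaded

    MPI.COMM_WORLD.Barrier()  # wait until every rank finished loading

    return train_fn(data)     # training stage: uses the full thread pool
```

Placing the barrier here also makes per-stage CPU measurements meaningful, since no rank starts training while another is still loading.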