how to maximize CPU usage in distributed training? #603
My cluster consists of 4 nodes, each with 88 CPU cores. When I run distributed linear regression training using the command below, I see ~23% CPU usage on each node. That matches the expectation (20/88 ≈ 23%). I would like to make full use of the hardware resources in my cluster, so I added the nthreads parameter to daalinit() in linear_regression_spmd.py as shown below.
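For reference, a minimal sketch of the change being described, assuming daal4py's SPMD mode and its `daalinit(nthreads=...)` parameter as reported by the poster; the CSV path, column layout, and helper name are placeholders, not from the original script:

```python
# Sketch of an SPMD linear regression run that sets the thread count
# via daalinit(nthreads=...), as described in the question above.
# The file path and feature/label split are placeholder assumptions.

def train_distributed(csv_path, nthreads=88):
    # Imports kept inside the function so the sketch can be read
    # (and its structure checked) without daal4py/pandas installed.
    import daal4py as d4p
    import pandas as pd

    # Initialize the SPMD (MPI) backend; the poster passes nthreads
    # here hoping to raise per-node CPU usage from ~23% toward 100%.
    d4p.daalinit(nthreads=nthreads)

    data = pd.read_csv(csv_path)
    X = data.iloc[:, :-1]   # all columns but the last as features
    y = data.iloc[:, -1:]   # last column as the response

    # distributed=True runs the training step across all MPI ranks.
    algo = d4p.linear_regression_training(distributed=True)
    result = algo.compute(X, y)

    d4p.daalfini()
    return result
```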
Replies: 1 comment
Update:
By adding an MPI barrier between data loading and training, I see the expected CPU usage (~90%) in the training stage. The ~23% CPU usage appears in the data loading stage.
So I guess multithreading doesn't apply to daal4py's data loading implementation.
The initial question is no longer valid.
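A sketch of the barrier placement described above, using mpi4py. This assumes daal4py's SPMD backend runs under MPI so that `MPI.COMM_WORLD` is the shared communicator; the `load_fn`/`train_fn` callables are hypothetical stand-ins for the loading and training code:

```python
# Sketch: synchronize all ranks between data loading and training so
# that the single-threaded loading stage (~23% CPU) is not measured
# together with the multi-threaded training stage (~90% CPU).
# Assumption: mpi4py is available and shares MPI_COMM_WORLD with
# daal4py's SPMD backend.

def load_then_train(load_fn, train_fn):
    # Import inside the function so the sketch can be inspected
    # without an MPI environment.
    from mpi4py import MPI

    data = load_fn()          # loading stage: appears single-threaded

    MPI.COMM_WORLD.Barrier()  # wait until every rank finished loading

    return train_fn(data)     # training stage: uses the full thread pool
```

Placing the barrier here also makes per-stage CPU measurements meaningful, since no rank starts training while another is still loading.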