
Dnn ops #734

Draft · wants to merge 20 commits into main
Conversation

corepointer (Collaborator)
This is a dirty and buggy snapshot of the progress on integrating all the ops needed for DNN support.

Also contains fixes for some of the bugs :)

  • CUDA versions of the DNN ops are mostly implemented (some backward passes are still missing)
  • The CUDA convolution keeps crashing :-/
  • Script-level alternatives for the operations are there to replace the CUDA ops
  • The script-level pooling operation is crashing :-/
  • Contains some implementations for shape/type inference, but these are not in a solid state yet
  • In some places in the scripts a type indicator was needed to make things work, so all these ops currently rely on that, which might not be ideal
  • Contains an example LeNet implementation for MNIST character classification (ported from SystemDS)

@corepointer (Collaborator, Author)
This PR is now based on #758

* This commit introduces the metadata object to the CSR data type

* Memory pinning

  To prevent excessive allocation ID lookups in the hot path when using --vec, this change "pins" memory based on the allocation type of previous accesses.
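As an illustration only, a minimal sketch of the pinning idea under assumed names (`AllocationType`, `Allocation`, `getValues` are hypothetical stand-ins, not DAPHNE's actual data structures): remember which allocation served the previous access and return it directly, skipping the allocation ID lookup when the same allocation type is requested again.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical allocation kinds; stand-ins for host/device allocations.
enum class AllocationType { Host, CUDA };

struct Allocation { void* ptr; AllocationType type; };

class Structure {
    std::unordered_map<uint64_t, Allocation> allocations; // keyed by allocation ID
    Allocation* pinned = nullptr;   // allocation used by the previous access
    AllocationType pinnedType{};    // its allocation type

public:
    // Returns the raw pointer for the requested allocation type.
    // If the previous access already used this type, the ID lookup
    // in the hot path is skipped ("pinning").
    void* getValues(AllocationType type, uint64_t allocId) {
        if (pinned && pinnedType == type)
            return pinned->ptr;
        auto it = allocations.find(allocId);
        if (it == allocations.end())
            return nullptr;
        pinned = &it->second;
        pinnedType = type;
        return pinned->ptr;
    }
};
```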
* Due to the use of a pointer to a local variable, the distributed (GRPC_SYNC) mode crashed in test cases. This patch fixes that by using std::unique_ptr appropriately.
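A minimal sketch of the bug class being fixed (the `WorkerArgs` type and function names are hypothetical, not the actual distributed-runtime code): keeping the address of a local variable beyond its lifetime versus handing ownership to a std::unique_ptr.

```cpp
#include <memory>

struct WorkerArgs { int port; };

// Buggy pattern: the address of a local is returned and later dereferenced,
// but the local is destroyed as soon as the function returns.
WorkerArgs* makeArgsDangling() {
    WorkerArgs args{50051};
    return &args; // dangling pointer
}

// Fixed pattern: the object is heap-allocated and owned by a unique_ptr,
// so it stays alive as long as the caller holds the pointer.
std::unique_ptr<WorkerArgs> makeArgsOwned() {
    return std::make_unique<WorkerArgs>(WorkerArgs{50051});
}
```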
* batch_norm2d shape & type inference

* affine shape & type inference

* softmax namespace, shape & type inference

* "Fix" shape inference of some DNN ops by returning -1 instead of trying to calculate proper dimensions based on data that is sometimes just not there :-/

  * getShape(getInput()) returns dimensions when the input comes from readMatrix(), but not if it comes from rand() ?!
  * getInputHeight() and getInputWidth() do not return proper dimensions
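A sketch of the fallback pattern described above, with hypothetical names (`Shape`, `inferPoolingShape`; this is not DAPHNE's actual inference interface): output dimensions are only computed when the input dimensions are actually known, otherwise -1 marks them as unknown instead of computing garbage.

```cpp
#include <cstdint>

// Hypothetical shape record: -1 marks an unknown dimension.
struct Shape { int64_t rows; int64_t cols; };

// Compute pooling output dimensions (no padding) only if the input
// dimensions are known, e.g. because the input came from readMatrix();
// otherwise fall back to -1, e.g. for inputs produced by rand().
Shape inferPoolingShape(const Shape& in, int64_t poolH, int64_t poolW,
                        int64_t strideH, int64_t strideW) {
    if (in.rows < 0 || in.cols < 0)
        return {-1, -1}; // unknown at compile time
    return {(in.rows - poolH) / strideH + 1,
            (in.cols - poolW) / strideW + 1};
}
```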
* This commit adds the necessary code changes to call cuDNN's activation backward function with the ReLU activation type. No tests yet.
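For reference, the cuDNN call involved looks roughly like this; the wrapper function, descriptor handling, and the assumption that x, y, dy, and dx share one tensor descriptor are placeholders for illustration, not the actual DAPHNE kernel code.

```cpp
#include <cudnn.h>

// Sketch of a ReLU backward pass via cuDNN. `handle`, the tensor
// descriptor and the device pointers are assumed to be set up by the caller.
cudnnStatus_t reluBackward(cudnnHandle_t handle,
                           cudnnTensorDescriptor_t desc, // shared by x, y, dy, dx
                           const float* y, const float* dy,
                           const float* x, float* dx) {
    cudnnActivationDescriptor_t actDesc;
    cudnnCreateActivationDescriptor(&actDesc);
    cudnnSetActivationDescriptor(actDesc, CUDNN_ACTIVATION_RELU,
                                 CUDNN_PROPAGATE_NAN, /*coef=*/0.0);
    const float alpha = 1.0f, beta = 0.0f;
    cudnnStatus_t status = cudnnActivationBackward(
        handle, actDesc,
        &alpha, desc, y, desc, dy, desc, x,
        &beta, desc, dx);
    cudnnDestroyActivationDescriptor(actDesc);
    return status;
}
```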
* Contains ports from SystemDS:

  * script-level alternatives for pooling, convolution, etc.
  * wrapper scripts for DAPHNE builtins (conv2d() -> conv2d.forward(), etc.)
  * the script path in the default UserConfig.json

  Currently supported: relu & conv2d. Tests are failing atm.
* This change makes the DaphneContext object global to avoid creation/destruction in every UDF. The global context is passed as an int64-cast pointer through the UserConfig.
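A minimal sketch of the pointer-through-config trick (the field and function names are hypothetical, not the actual UserConfig layout): the context pointer is cast to int64_t when stored in the config and cast back inside the UDF.

```cpp
#include <cstdint>

struct DaphneContext { /* runtime resources, e.g. CUDA handles */ };

// Hypothetical config carrying the context pointer as a plain integer,
// so it can travel through a configuration object that only holds numbers.
struct UserConfig { int64_t context_ptr = 0; };

void storeContext(UserConfig& cfg, DaphneContext* ctx) {
    cfg.context_ptr = reinterpret_cast<int64_t>(ctx);
}

DaphneContext* loadContext(const UserConfig& cfg) {
    return reinterpret_cast<DaphneContext*>(cfg.context_ptr);
}
```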
* agg-all CUDA launch config bugfix

* Fix all-agg CUDA launch configs (now looping)

* aggall log

* Added more operators to apply elementwise. The handling of the 1x1 matrix case should not be needed anymore once the compiler is fixed to call EwBinaryObjSca.
Labels: Accelerators, feature (missing/requested features)