Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance improvements #281

Open
wants to merge 3 commits into
base: dev
Choose a base branch
from

Conversation

tiemvanderdeure
Copy link
Contributor

recode has a specialized dispatch on categorical arrays, so calling unwrap on them has huge overhead

@tiemvanderdeure
Copy link
Contributor Author

tiemvanderdeure commented Oct 31, 2024

Before and after profviews:
Before:
billede
After:
billede

I'm running this and am getting a 10 times speedup

N = 1000
X = (x1 = rand(Float32, N), x2 = randn(Float32, N), x3 = categorical(rand('a':'c', N)))
y = categorical(bitrand(N))

model = MLJFlux.NeuralNetworkBinaryClassifier(epochs = 10, builder=MLJFlux.MLP(; hidden=(5,4)), batch_size = 100)
mach = machine(model, X, y)
fit!(mach)

@profview for i in 1:100 predict(mach, X) end

@tiemvanderdeure tiemvanderdeure changed the title do not unwrap categorical column when encoding performance improvements Oct 31, 2024
@tiemvanderdeure
Copy link
Contributor Author

tiemvanderdeure commented Oct 31, 2024

Added a commit to change matrices to Float32 before passing them to the chain - otherwise this is done inside the chain anyway, but this throws a warning which has a small overhead, even though it is not printed for some reason. I think we just forgot to do this in #272

@tiemvanderdeure
Copy link
Contributor Author

The test failure is unrelated, I think?

@ablaom
Copy link
Collaborator

ablaom commented Oct 31, 2024

Great work @tiemvanderdeure identifying the performance bottleneck, @tiemvanderdeure. I just want to confirm that we are not returning to the problem reported here. Can you please check?

@ablaom
Copy link
Collaborator

ablaom commented Oct 31, 2024

Yes, it appears MLJFlux is failing on julia 1.11

@tiemvanderdeure
Copy link
Contributor Author

I just want to confirm that we are not returning to the problem reported here.

I did reintroduce that problem (although I don't think it affects the matrix in the end). I fixed it by calling unwrap after recoding.

The real problem here is that recode is type unstable in CategoricalArrays for normal Array types.

@tiemvanderdeure
Copy link
Contributor Author

Okay just made a PR to CategoricalArrays.jl that also fixes this JuliaData/CategoricalArrays.jl#407

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants