Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing boundary checks on masked grids #114

Merged
merged 41 commits into from
Jul 9, 2024

Conversation

nicolasmerino41
Copy link

@nicolasmerino41 nicolasmerino41 commented Jun 13, 2024

Hey there again :)

I opened the PR to discuss a "small?" implementation of masked boundary checks in OutwardsDispersal. I haven't had much success yet but we're on it :) I've tried to modify OutwardsDispersal such that it can hold an optional mask:

struct OutwardsDispersal{R,W,S<:Stencils.AbstractKernelStencil} <: SetNeighborhoodRule{R,W}
    stencil::S
    mask::Union{Nothing, Array{Bool}}  # Add an optional mask parameter
end

And then modified its applyrule! so it discards cells outside of the mask.

Perhaps I instead should create a brand new OutwardsDispersalWithMask to keep them separate but since I don't know what other parts of the "family" packages are related with these rules, this feels more tedious to build.

Just for the sake of understanding the package behaviour:

  • How come InwardsDispersal works fine while not being defined anywhere in the package nor DG/Stencils/etc? Like this is all it's written:
struct InwardsDispersal{R,W,S<:Stencils.AbstractKernelStencil} <: NeighborhoodRule{R,W}
    stencil::S
end
function InwardsDispersal{R,W}(; kw...) where {R,W}
    InwardsDispersal{R,W}(DispersalKernel(; kw...))
end

@inline applyrule(data, rule::InwardsDispersal, N, I) = kernelproduct(rule)

I know what kernelproduct() is used for, but where did you define its method for an InwardsDispersal rule?

Edit: Btw, just to know. Are tests passing for you? Because I could not pass them even before making any modification.

src/kernel/outwards.jl Outdated Show resolved Hide resolved
Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>
src/kernel/outwards.jl Outdated Show resolved Hide resolved
Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>
sum += propagules
target = I .+ offset
# Check if the target cell is within bounds and not masked
(target_mod, inbounds) = _inbounds(Reflect(), size(data[1]), target...)
Copy link
Member

@rafaqz rafaqz Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think inbounds(data, i) does exactly this for you

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True! And it's better cause I don't need to specify the boundary condition

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we try to keep anything related to boundary conditions, aux data, masks etc out of rules and in the data object. Then we can swap them around without changing all our code.

target = I .+ offset
# Check if the target cell is within bounds and not masked
(target_mod, inbounds) = _inbounds(Reflect(), size(data[1]), target...)
if inbounds && (rule.mask === nothing || rule.mask[target_mod...])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if inbounds && (rule.mask === nothing || rule.mask[target_mod...])
if inbounds && (isnothing(rule.mask) || mask(data)[target_mod...])

@rafaqz
Copy link
Member

rafaqz commented Jun 13, 2024

Ok instead of attaching the mask to the rule, we can just get it from data with mask(data).

We can have the rule field as nothing and some other value? maybe a singleton _Mask()? Then in the constructor for the rule we switch true/false to _Mask()/nothing.

This is better than the field being true or false, because the compiler cant remove the check for true/false, but it can for two different types.

@rafaqz
Copy link
Member

rafaqz commented Jun 13, 2024

kernelproduct is in Stencils.jl now (try @which kernelproduct).

Thats available to Dispersal.jl, so it just works.

(we need kernelproduct because Base.dot is recursive, which is awful)

And maybe you wonder how the constructor for InwardsDispersal works?

There are default constructors, see here:
https://github.com/cesaraustralia/DynamicGrids.jl/blob/main/src/rules.jl#L96-L114

@nicolasmerino41
Copy link
Author

But how will mask(data) work when it's not missingval's what's in the masked area? And I don't think using the with= keyword would work either cause it'd need to be specified before.

@nicolasmerino41
Copy link
Author

Well, I'll leave it for today. I don't even know what I'm doing at this point jajaja. I created what I think it's a singleton struct _Mask end so it can be specified when using OutwardsDispersal, otherwise mask_flag is nothing by default.

Then I added the mask(data) approach within applyrule! but still don't understand why it should work, which it doesn't so far :)

@inline function applyrule!(data, rule::OutwardsDispersal{R,W}, N, I) where {R,W}
    N == zero(N) && return nothing
    
    # Check if the current cell is masked, skip if it is
    mask_data = if rule.mask_flag === NoMask nothing else mask(data) end
    if !isnothing(mask_data) && !mask_data[I...]
        return nothing
    end
    
    sum = zero(N)
    for (offset, k) in zip(offsets(rule), kernel(rule))
        target = I .+ offset
        (target_mod, inbounds) = inbounds(data, target)
        if inbounds && (isnothing(mask_data) || mask_data[target_mod...])
            @inbounds propagules = N * k  
            @inbounds add!(data[W], propagules, target_mod...)  
            sum += propagules
        end
    end
    @inbounds sub!(data[W], sum, I...)
    return nothing
end

src/kernel/outwards.jl Outdated Show resolved Hide resolved
@@ -32,21 +32,90 @@ is occupied.

Pass grid name `Symbol`s to `R` and `W` type parameters to use specific grids.
"""
struct OutwardsDispersal{R,W,S<:Stencils.AbstractKernelStencil} <: SetNeighborhoodRule{R,W}
struct _Mask end
const NoMask = nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused by what NoMask was below. I would just use nothing, or make NoMask a real singleton like _Mask.

You can even make Mask and NoMask singletons, then users could even use them directly. IDK

outdisp_with_mask = OutwardsDispersal(
formulation=ExponentialKernel(λ=0.0125),
distancemethod=AreaToArea(30),
mask_flag=_Mask()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to use it directly like this I would just make it Mask, underscore is if its internal

@rafaqz
Copy link
Member

rafaqz commented Jun 13, 2024

I mean DynamicGrids.mask! That's what you will use inside DynamicGrids.jl, as theres no Rasters.jl dep.

Its unfortunate the name is the same, its from before Rasters existed. It just gets the mask object that you pass in to the output.

@nicolasmerino41
Copy link
Author

nicolasmerino41 commented Jun 13, 2024 via email

Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>
@nicolasmerino41
Copy link
Author

Ok, for perspective here's a full benchmarking for the original package (O), my script (M) and your suggestions(S):
With mask but OutwardsDispersal NoMask(): O:75ms, M:190ms, S:197ms
Without mask and OutwardsDispersal NoMask(): O:66ms, M:186, S:193ms
With mask and OutwardsDispersal Mask(): O:NoFeature, M:260ms, S:260ms
Without mask and InwardsDispersal: 20ms
With mask and InwardsDispersal: 50ms
I guess if you'd like to go back to original performance we'd have to take the route of creating a completely different function, like OutwardsDispersalWithMask? so you can keep the performance of the original OutwardsDispersal through not checking any mask at all

@inbounds add!(data[W], propagules, I .+ offset...)
sum += propagules
target = I .+ offset
(target_mod, inbounds) = inbounds(data, target)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(target_mod, inbounds) = inbounds(data, target)
(target_mod, inbounds) = inbounds(data, target)

This line is still running in all cases... this is the problem ;)

It doesn't need to, it can be in the conditional below.

Copy link
Author

@nicolasmerino41 nicolasmerino41 Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you're reviewing the old code again because I went back to it after checking your idea. If you check the previous commit 607885d you'd see it only runs (target_mod, inbounds) = inbounds(data, target) when mask.flag === Mask() I believe.

But I think I know what you mean, I'll try to implement it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the code that's running now... The rest is commented out

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented out part is actually your conditional idea that I benchmarked and didn't get better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, hard to assess it clearly without being able to see the diff

@rafaqz
Copy link
Member

rafaqz commented Jun 17, 2024

I don't think we need a different function, as long as we let the compiler see the differences it will just delete them completely.

That line I've highlighted needs to go inside a conditional that depends on the type of mask_flag. Then it wont run. I can rewrite this with a PR to this PR if thats easier for you.

This should work and have the original performance when mask_data is nothing

  sum = zero(N)
    for (offset, k) in zip(offsets(rule), kernel(rule))
        target = I .+ offset
        inbounds = if isnothing(mask_data)    
            true
        else        
            (target_mod, inbounds) = inbounds(data, target)
            mask_data[target_mod...])
        end
        if inbounds
            propagules = N * k  
            @inbounds add!(data[W], propagules, target_mod...)  
            sum += propagules
        end
    end

@nicolasmerino41
Copy link
Author

Ok, kind of fixed?
Times now are (N), compared to original (O):
With mask but OutwardsDispersal NoMask(): N: 76ms, O:76ms
Without mask and OutwardsDispersal NoMask(): N:66ms, O:66ms
With mask and OutwardsDispersal Mask(): N:278ms, O:NoFeature

However, now I'm getting losses on real boundaries, which didn't use to happen. For example:
A non_masked grid with non-mask OutwardsDispersal causes losses.
A masked_grid with masked OutwardsDispersal but with "some" mask borders equal to real edges, also cause losses on those edges (e.g. an L shape mask). If no mask edge touches a real edge, then it works fine.
I guess we're missing some bounds-check with this new code?

@inline function applyrule!(data, rule::OutwardsDispersal{R,W}, N, I) where {R,W}
    N == zero(N) && return nothing

    # Check if the current cell is masked, skip if it is
    mask_data = if rule.mask_flag === NoMask() nothing else DynamicGrids.mask(data) end
    if !isnothing(mask_data) && !mask_data[I...]
        return nothing
    end
    
    sum = zero(N)
    for (offset, k) in zip(offsets(rule), kernel(rule))
        target = I .+ offset
        inbounds = if isnothing(mask_data)    
            true
        else        
            (target_mod, inbounds) = DynamicGrids.inbounds(data, target)
            mask_data[target_mod...]
        end
        if inbounds
            @inbounds propagules = N * k  
            @inbounds add!(data[W], propagules, target...)  
            sum += propagules
        end
    end
    @inbounds sub!(data[W], sum, I...)
    return nothing
end

@rafaqz
Copy link
Member

rafaqz commented Jun 18, 2024

Nice timings!

A non_masked grid with non-mask OutwardsDispersal causes losses

The original algorithm should be identical to what it was. It looks identical to me?

I assumed there was always some losses at the edges for non-masked grids with Remove boundary conditions?

(And bounds checks are pretty expensive, as you can see in your algorithm timings. We could add another singleton like Mask to opt into them?)

@nicolasmerino41
Copy link
Author

I don't know, I'm quite confused now, because the tests I wrote actually checked that when there was no mask and NoMask() OutwardsDispersal there should only be floating error loss (which is below 1.0) and I was passing them (when using Wrap or Reflect of course). But now I don't pass them, neither I pass them using the original Dispersal#dev. InwardsDispersal only has floating error loss when not using a mask. I don't think there should be that large loss when not using a mask (and using Wrap or Reflect), I find it quite inconvenient because if you really need to shut down losses you'll have to create a buffer mask. It doesn't personally affect because my masks barely touch the grid edges, but seems suboptimal to me :)

We could add another singleton like Mask to opt into them?
I don't really follow you. Like, if you don't specify mask_flag in OutwardsDispersal it will not use the feature, not bound-check and perform as fast as usual, no?

@rafaqz
Copy link
Member

rafaqz commented Jun 18, 2024

Yes having to use a fake mask sounds annoying. Personally I always have a mask if I use Remove, otherwise I would use Wrap (shouldn't Wrap be loss-free? Don't the writes get wrapped around? maybe thats another bug lol). But others may not.

I don't really follow you. Like, if you don't specify mask_flag in OutwardsDispersal it will not use the feature, not bound-check and perform as fast as usual, no?

Ahh I think I get you, yes we can just use the same Mask singleton when there is no mask array just to mean no out of bounds losses. Maybe we should change the name to Lossless or something to cover both of these cases?

(We will also need to add a docstring to describe what these things do and why you would choose to use them or not)

@nicolasmerino41
Copy link
Author

nicolasmerino41 commented Jun 18, 2024

shouldn't Wrap be loss-free?
Exactly! I think it should but it's not now unless there's a full mask around borders. I feel that bug didn't use to be there.

we can just use the same Mask singleton when there is no mask array just to mean no out of bounds losses
That's a good idea, but it doesn't work because if there's no mask(data), the function won't access the bound-checking part of the code and lose individuals anyway. This for example gives a false:

output_without_mask_and_masked_rule = ArrayOutput(init; tspan=1:1000)

r = sim!(output_without_mask_and_masked_rule, Ruleset(outdisp_with_mask; boundary=Reflect()))
sum(r[1]) ≈ sum(r[1000]) #FALSE

Because it's basically the same problem, if a mask doesn't cover all grid borders, it'll lose individuals.

We will also need to add a docstring to describe what these things do and why you would choose to use them or not.
Consider it done :)

@rafaqz
Copy link
Member

rafaqz commented Jun 18, 2024

Yes I meant we would have to adjust the code so that the Mask flag will also trigger checking for bounds if there is no Mask object. And for that reason we would need to change the name too so its not connected to mask.

For Wrap and Reflect to work at the boundaries we need to use the target_mod indices that have been fixed to be wrapped or reflected, while for Remove we just don't do the check at all. So we would need another check, probably on the boundary condition being Remove, or anything else.

Or, we just always do the inbounds check for all cases and take the performance hit to keep the algorithm simple... so only write if we are inbounds and if we do then write to the modified indices.

Maybe that's just simpler and better?

(you can probably see why this was simple but a bit buggy... its pretty hard to get all of these things right and have good performance and simple code all at the same time)

@nicolasmerino41
Copy link
Author

I mean, that's your call. If you're ok with a decrease in performance I'm fine with that. And InwardsDispersal is always there in need of efficiency.

And yes, I see the tradeoff :)

@rafaqz
Copy link
Member

rafaqz commented Jun 18, 2024

Ok lets just make it correct for now :)

I can go back in an retrieve some Remove boundaries performance later if need be.

@nicolasmerino41
Copy link
Author

Ok, so I brought the code back to always bound-check. We're back to the not-so-good timings we discussed the other day but the tests all pass.

I also added a more detailed description in Outwards.jl so the user clearly knows what mask_flag is for. If there is any other place or any other info you'd like to specifiy, let me know :)

output_without_mask = ArrayOutput(init; tspan=1:1000)
b = sim!(output_without_mask, rule_without_mask)

@test sum(b[1]) ≈ sum(b[1000]) # Floating error should be smaller than 1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@test sum(b[1]) sum(b[1000]) # Floating error should be smaller than 1.0
@test sum(b[1]) sum(b[1000]) # Floating error should be close to zero

output_with_mask = ArrayOutput(init; tspan=1:1000, mask=mask_data)
a = sim!(output_with_mask, rule_with_mask)

@test sum(a[1]) ≈ sum(a[1000]) # Floating error should be smaller than 1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@test sum(a[1]) sum(a[1000]) # Floating error should be smaller than 1.0
@test sum(a[1]) sum(a[1000]) # Floating error should be close to zero

Comment on lines 40 to 41
struct Mask end
struct NoMask end
Copy link
Member

@rafaqz rafaqz Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
struct Mask end
struct NoMask end
struct AvoidMasked end
struct LossToMasked end

Ideas for more descriptive names, now that everything works

stencil::S
mask_flag::M
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mask_flag::M
maskbehavior::M

@nicolasmerino41
Copy link
Author

Cool, all changes done.
I guess by AvoidMasked you meant that masked area is out of bounds so you avoid it, but also sounds like you avoid using the mask? Which is the opposite... jeje
So, if it's ok with you, I used CheckMaskEdges and IgnoreMaskEdges instead.

@rafaqz
Copy link
Member

rafaqz commented Jun 19, 2024

Yes it's hard to not get a double negative there!

But those names sound good to me.

@nicolasmerino41
Copy link
Author

Hey Raf,

Hope you're doing good :)
Is there any chance this PR could be merged any time soon? I'm working with a remote platform and I can't install packages from local so I'm not able to use this new implementation :/

I'll figure something out if you're busy, no worries :)
Nico

@rafaqz rafaqz merged commit 2a72555 into cesaraustralia:dev Jul 9, 2024
0 of 3 checks passed
@rafaqz
Copy link
Member

rafaqz commented Jul 9, 2024

Sorry, holidays

I've merged will fix the tests later

(You will still have to use the main branch it's not registered yet)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants