Error in the docs and in combining layers #329

Open · marcobonici opened this issue Sep 21, 2024 · 1 comment

marcobonici commented Sep 21, 2024

I need to train a Normalizing flow on some samples and then use it as a distribution.
@Red-Portal suggested I use Bijectors.jl. However, when trying to follow the example in the documentation, I noticed that there is an error in it.

julia> sb = stack(ibs...) # == Stacked(ibs) == Stacked(ibs, [i:i for i = 1:length(ibs)]
ERROR: MethodError: no method matching length(::Inverse{Bijectors.Logit{Float64, Float64}})

Closest candidates are:
  length(!Matched::LibGit2.GitBlob)
   @ LibGit2 /opt/hostedtoolcache/julia/1.10.5/x64/share/julia/stdlib/v1.10/LibGit2/src/blob.jl:3
  length(!Matched::LibGit2.GitStatus)
   @ LibGit2 /opt/hostedtoolcache/julia/1.10.5/x64/share/julia/stdlib/v1.10/LibGit2/src/status.jl:21
  length(!Matched::DataStructures.DiBitVector)
   @ DataStructures ~/.julia/packages/DataStructures/95DJa/src/dibit_vector.jl:40

For my use case, that is not important, but I thought it was worth mentioning.

So I tried a PlanarLayer for my use case, but it did not work well: the training ran, yet the result was not satisfying. Then, as suggested by the tutorial, I tried composing a couple of layers to see whether that would improve performance... and I got this error when I reached the training step:

ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved

Stacktrace:
 [1] broadcastable(::@NamedTuple{w::Vector{Float64}, u::Vector{Float64}, b::Vector{Float64}})
   @ Base.Broadcast ./broadcast.jl:744
 [2] broadcasted
   @ ./broadcast.jl:1345 [inlined]
 [3] (::var"#1#2")(θ::PlanarLayer{Vector{Float64}, Vector{Float64}}, ∇::@NamedTuple{w::Vector{Float64}, u::Vector{Float64}, b::Vector{Float64}})
   @ Main ./In[2]:31
 [4] (::Base.var"#4#5"{var"#1#2"})(a::Tuple{PlanarLayer{Vector{Float64}, Vector{Float64}}, @NamedTuple{w::Vector{Float64}, u::Vector{Float64}, b::Vector{Float64}}})
   @ Base ./generator.jl:36
 [5] iterate(::Base.Generator{Vector{Any}, IRTools.Inner.var"#52#53"{IRTools.Inner.var"#54#55"{IRTools.Inner.Block}}})
   @ Base ./generator.jl:47 [inlined]
 [6] collect(itr::Base.Generator{Base.Iterators.Zip{Tuple{@NamedTuple{outer::PlanarLayer{Vector{Float64}, Vector{Float64}}, inner::PlanarLayer{Vector{Float64}, Vector{Float64}}}, Tuple{@NamedTuple{w::Vector{Float64}, u::Vector{Float64}, b::Vector{Float64}}, @NamedTuple{w::Vector{Float64}, u::Vector{Float64}, b::Vector{Float64}}}}}, Base.var"#4#5"{var"#1#2"}})
   @ Base ./array.jl:834
 [7] map(::Function, ::@NamedTuple{outer::PlanarLayer{Vector{Float64}, Vector{Float64}}, inner::PlanarLayer{Vector{Float64}, Vector{Float64}}}, ::Tuple{@NamedTuple{w::Vector{Float64}, u::Vector{Float64}, b::Vector{Float64}}, @NamedTuple{w::Vector{Float64}, u::Vector{Float64}, b::Vector{Float64}}})
   @ Base ./abstractarray.jl:3406
 [8] top-level scope
   @ In[2]:30

Below is the MWE to reproduce the error. I am doing what is suggested in the docs, but maybe I misinterpreted something? Thanks in advance for your help!

using Zygote
using Bijectors
using Functors
b = PlanarLayer(2) ∘ PlanarLayer(2)
θs, reconstruct = Functors.functor(b);
       
struct NLLObjective{R,D,T}
    reconstruct::R
    basedist::D
    data::T
end


function (obj::NLLObjective)(θs...)
    transformed_dist = transformed(obj.basedist, obj.reconstruct(θs))
    return -sum(Base.Fix1(logpdf, transformed_dist), eachcol(obj.data))
end

xs = randn(2, 1000);

f = NLLObjective(reconstruct, MvNormal(2, 1), xs);
       
@info "Initial loss: $(f(θs...))"

ε = 1e-3;

for i in 1:100
    ∇s = Zygote.gradient(f, θs...)
    θs = map(θs, ∇s) do θ, ∇
        θ - ε .* ∇
    end
end
@info "Finall loss: $(f(θs...))"
       
samples = rand(transformed(f.basedist, f.reconstruct(θs)), 1000);

mean(eachcol(samples)) # ≈ [0, 0]

cov(samples; dims=2)   # ≈ I
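
For context on the MWE above: `Functors.functor(b)` splits the flow into its parameter NamedTuples plus a closure that rebuilds the flow from updated parameters. A minimal sketch of that pattern with a made-up `Affine` struct (not a Bijectors layer):

using Functors

struct Affine
    w::Vector{Float64}
    b::Vector{Float64}
end
@functor Affine

a = Affine([1.0, 2.0], [0.0, 0.0])
params, re = Functors.functor(a)
params                                      # (w = [1.0, 2.0], b = [0.0, 0.0])
a2 = re((w = [3.0, 4.0], b = [1.0, 1.0]))   # rebuild an Affine with new parameters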
torfjelde (Member) commented Sep 23, 2024

Thanks for raising these, @marcobonici! Both are now addressed in #330.

For the first issue, this is a mistake in the docs: `stack` has been removed from Bijectors (Julia's `Base` introduced its own `stack`, so our implementation ended up being type piracy).
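
For reference, the fix in the docs is to construct `Stacked` directly instead of calling the removed `stack`. A quick sketch of what that looks like (the particular distributions below are just for illustration, not necessarily the ones from the docs example):

using Bijectors, Distributions

# A few univariate distributions, their bijectors (to unconstrained space),
# and the corresponding inverse bijectors:
dists = (Beta(2, 2), InverseGamma(2, 3))
bs = map(bijector, dists)
ibs = map(inverse, bs)

# `stack(ibs...)` no longer exists; construct the `Stacked` bijector directly:
sb = Stacked(ibs)                                # default ranges [1:1, 2:2]
# or, with the component ranges spelled out explicitly:
sb = Stacked(ibs, [i:i for i in 1:length(ibs)])

x = randn(2)
y = sb(x)   # ibs[1] maps x[1] into (0, 1), ibs[2] maps x[2] into (0, ∞)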

For your second issue, the following works just fine :)

using Zygote, Bijectors, Functors
b = PlanarLayer(2) ∘ PlanarLayer(2)

θs, reconstruct = Functors.functor(b);
       
struct NLLObjective{R,D,T}
    reconstruct::R
    basedist::D
    data::T
end


# CHANGED: `θs...` -> `θs` so we maintain the same container-structure.
function (obj::NLLObjective)(θs)
    transformed_dist = transformed(obj.basedist, obj.reconstruct(θs))
    return -sum(Base.Fix1(logpdf, transformed_dist), eachcol(obj.data))
end

xs = randn(2, 1000);

f = NLLObjective(reconstruct, MvNormal(2, 1), xs);
       
@info "Initial loss: $(f(θs))"

# CHANGED: `1e-3` -> `1e-4` because I ran into divergences when using `1e-3`.
ε = 1e-4;

for i in 1:200
    # CHANGED: `θs... -> θs` again due to above.
    # CHANGED: `Zygote.gradient` returns a tuple of length equal to number of args,
    #   so we need to unpack the first component to get the gradient of `θs`.
    (∇s,) = Zygote.gradient(f, θs)
    # CHANGED: `map` -> `fmap` so we can map over nested `NamedTuple`s, etc.
    θs = fmap(θs, ∇s) do θ, ∇
        θ - ε .* ∇
    end
end
@info "Finall loss: $(f(θs))"
       
samples = rand(transformed(f.basedist, f.reconstruct(θs)), 1000);

mean(eachcol(samples)) # ≈ [0, 0]

cov(samples; dims=2)   # ≈ I

Basically, the example in the docs is not really the correct way to do things, but it just happened to work out 😬

To be compatible with nested transformations, etc., we need to use `fmap` instead of `map`. This wasn't an issue before because we splatted the NamedTuple (`θs...`) when passing it to the loss, and it was a single layer, so the resulting gradient was a flat NamedTuple; now that we compose transformations, this is no longer the case.
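
A toy illustration of the difference (not Bijectors-specific; the structures below are made up to mirror a composed flow):

using Functors

# Parameters/gradients of a composed flow are nested NamedTuples:
θs = (outer = (w = [1.0, 2.0], b = [0.5]), inner = (w = [3.0, 4.0], b = [0.1]))
∇s = (outer = (w = [0.1, 0.2], b = [0.1]), inner = (w = [0.1, 0.2], b = [0.1]))
ε = 1e-2

# `map(θs, ∇s) do θ, ∇ ... end` would hand the do-block the *inner NamedTuples*,
# and broadcasting over a NamedTuple is exactly the ArgumentError above.
# `fmap` recurses down to the array leaves, so the update is applied per-array:
θs_new = fmap(θs, ∇s) do θ, ∇
    θ - ε .* ∇
end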

torfjelde added a commit that referenced this issue Sep 23, 2024
* Fixed bug in docs as reported in #329

* Removed usage of removed `stack` in favour of `Stacked` in docs
example as reported in #329