Etymology of "Normalizing Flows" #57

chester-tan · 2024-08-16T11:22:02Z

chester-tan
Aug 16, 2024

In the "Learn the basics" tutorial, the etymology of the term "normalizing" in "normalizing flows" is explained as: "normalizing refers to the fact that the base distribution is often a (standard) normal distribution".

https://github.com/probabilists/zuko/blob/master/docs/tutorials/basics.ipynb

However, in the prototypical(?) paper "Variational Inference with Normalizing Flows", it is explained that "By repeatedly applying the rule for change of variables, the initial density ‘flows’ through the sequence of invertible mappings. At the end of this sequence we obtain a valid probability distribution and hence this type of flow is referred to as a normalizing flow."

https://arxiv.org/abs/1505.05770
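To make the quoted passage concrete, here is a minimal sketch of a density "flowing" through a sequence of invertible maps. It is not taken from the paper or from Zuko: it assumes a 1D standard normal base density and two affine maps chosen arbitrarily, and the names `affine` and `flow_log_density` are hypothetical.

```python
import math
from statistics import NormalDist


def affine(scale, shift):
    """Hypothetical invertible 1D map z -> scale * z + shift.

    The returned function gives the mapped value and log |d(output)/d(input)|.
    """
    def forward(z):
        return scale * z + shift, math.log(abs(scale))
    return forward


def flow_log_density(z0, maps, base=NormalDist(0.0, 1.0)):
    """Push z0 through the maps and track the log-density of the result."""
    z = z0
    log_q = math.log(base.pdf(z0))
    for f in maps:
        z, log_det = f(z)
        # Change of variables: subtract log |det Jacobian| at every step.
        log_q -= log_det
    return z, log_q


# Two arbitrary affine maps (an assumption made for this sketch).
maps = [affine(2.0, 1.0), affine(1.5, -3.0)]
z_k, log_q = flow_log_density(0.3, maps)

# The composition is z -> 1.5 * (2 z + 1) - 3 = 3 z - 1.5, so z_K ~ N(-1.5, 3^2).
expected = math.log(NormalDist(-1.5, 3.0).pdf(z_k))
```

In this toy case the tracked log-density agrees with the closed-form Gaussian, which is the sense in which the result is still a valid, normalized density after every step.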

Is the etymology of the term "normalizing" in "normalizing" flows then more accurately explained as derived from the ability of the method to produce a "normalized" valid probability distribution instead of being derived from common use of "normal" distributions in the method?

Answered by francois-rozet

francois-rozet · 2024-08-16T12:48:41Z

francois-rozet
Aug 16, 2024
Maintainer

Hello @chester-tan 👋

I think there is indeed a bit of ambiguity here. The term "normalizing" is defined by Tabak et al. (2013) as

Normalizing the data $x_j$ is finding a map $y(x)$ such that the $y_j = y(x_j)$ have a prescribed distribution $\mu(y)$, for which we shall adopt here the isotropic Gaussian [...]

It is not clear to me whether they mean that normalizing means finding a transformation that matches any distribution $\mu(y)$ (with normalized density) or only an isotropic Gaussian (a normal distribution).

However, they later say

There is more than semantics to this rephrasing: normalizing the data is often a goal per se. It allows us, for instance, to compare observations from different datasets, to define robust metrics in phase-space, and to use standard statistical tools, often applicable only to normal distributions.

I therefore prefer the interpretation that normalizing refers to transforming data to a normal distribution. In addition, any distribution has a normalized probability density, even though it might be unknown. In this regard, the data distribution is not less "normalized" than the target distribution. I therefore find this definition confusing.
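As a rough illustration of the "normalizing means Gaussianizing" reading, the sketch below maps toy data to a standard normal. It is an assumption-laden example rather than code from Tabak et al. or from Zuko: the data are drawn from an Exponential(1) distribution whose CDF is known in closed form, and the helper name `normalize` is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
std_normal = NormalDist(0.0, 1.0)

# Toy data x_j from an Exponential(1) distribution (an assumption for this sketch).
xs = [random.expovariate(1.0) for _ in range(10_000)]


def normalize(x):
    """Map y(x) = Phi^{-1}(F(x)), with F the Exponential(1) CDF."""
    u = 1.0 - math.exp(-x)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return std_normal.inv_cdf(u)


ys = [normalize(x) for x in xs]
y_mean, y_std = mean(ys), stdev(ys)  # should be close to 0 and 1
```

Here the map is known analytically; a flow would instead learn such a map from the data.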

chester-tan · 2024-08-16T13:31:42Z

chester-tan
Aug 16, 2024
Author

Hi @francois-rozet 👋

Thanks so much for the very helpful reply! 😊

Its etymology does seem a little ambiguous (I guess with many ML terms haha) and it's great to hear your take on it and why it's explained in the (instructive!) tutorials that way.

francois-rozet · 2024-08-16T13:48:51Z

francois-rozet
Aug 16, 2024
Maintainer

You are welcome! I will convert this issue to a discussion as it is more appropriate. Feel free to ask more questions 😁

stevenwalton · 2024-08-22T07:54:32Z

stevenwalton
Aug 22, 2024

I saw this and I'm not sure the term comes from the normal distribution, though it might... I do agree that there's a lot of ambiguity, but to me the clearest reading (even if it means rewriting history) is to see the normalization aspect as related to the normalizing constant (e.g. the marginal distribution in Bayes' rule). That constant is what is typically intractable, and it is the "normalizing" part of a distribution. There also seem to be references and hints to it throughout the history of flows, as well as diffusion and many others. This all gets extremely convoluted and I think there are a lot more similarities than we often admit.

I wrote a survey (intend to get this turned into a journal pub at some point) and I highly recommend Papamakarios et al's Normalizing Flows for Probabilistic Modeling and Inference.

The term is a bit convoluted and unfortunately messy. I mention my work because I try to make some clarification around this. If we go through the literature, we find that people use the same methods to convert to any probability distribution, which should be natural when we think of the process as composable bijections.

Papamakarios et al. mention "The minimal transformation to orthonormality" and "Exploratory Projection Pursuit", but I believe "Gaussianization" probably has the clearest root. From there we see the equation

$$ p(x) = \phi(T(x)) \left| \det \frac{\partial T(x)}{\partial x} \right| $$
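A quick numerical sanity check of this formula, as a sketch under stated assumptions: the data are 1D and positive, $\phi$ is the standard normal density, and the map $T(x) = \log x$ is a hypothetical choice made only for this example. In one dimension the Jacobian term reduces to $|dT/dx|$.

```python
import math
from statistics import NormalDist

phi = NormalDist(0.0, 1.0).pdf  # base density


def T(x):
    """Hypothetical normalizing map for positive data: T(x) = log(x)."""
    return math.log(x)


def dT_dx(x):
    """Derivative of T, i.e. the 1D Jacobian."""
    return 1.0 / x


def p(x):
    """Density of x implied by p(x) = phi(T(x)) |dT/dx|."""
    return phi(T(x)) * abs(dT_dx(x))


# Crude midpoint-rule check that p integrates to roughly 1 over (0, 100).
n, upper = 100_000, 100.0
h = upper / n
total = sum(p((i + 0.5) * h) for i in range(n)) * h
```

With this choice of $T$, `p` is the standard log-normal density, so the Jacobian factor is exactly what keeps the transformed density normalized.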

There are a handful of works that use the diagonal Normal ($\mathcal{N}(0, \mathbb{I})$), and I remember seeing plenty of other distributions like Gamma.

It gets a bit weirder as the history progresses because Residual Flows still uses the term but is the beginning of the branch of Neural ODEs (NODEs).

Personally, I'd advocate for a class of models called Isomorphic Flows, as categorically we would find all such classes of models to be equivalent. That is, we can draw a bijection between them. Then we could have clarity about calling these "Jacobian Flows", "COV Flows", or something along those lines. I believe this also helps remind us that there's much more power to these things and we can convert to any distribution.

3 replies

francois-rozet Aug 22, 2024
Maintainer

Hi @stevenwalton, thank you for your perspective!

I agree, the term "Normalizing Flow" was, in hindsight, not appropriate. If we could change it, "Isomorphic Flow" would be nice, or even "Diffeomorphic Flow" as we need a differentiable isomorphism. Unfortunately, I think it is a bit late to change the name, but it is worth mentioning the confusion in a survey.

For our tutorial, which is aimed at beginners, I think it is best to give a simple explanation. I find the interpretation with the normalization constant to be inappropriate, as we never compute it, nor mention it. That is why I like to interpret "normalization" as "Gaussianization". Note that in the tutorial we say that the base distribution is often (not always) a normal distribution.

stevenwalton Aug 22, 2024

Oh I totally understand, and I think it is correct to keep the focus on a Gaussian distribution and not discuss the normalizing part in an intro tutorial. There's already too high of a barrier to entry for this class of models. I only wanted to mention the other parts given the specific question at hand.

As for the naming, I'm not convinced it is too late. There are so few people working with these architectures that I think we could make the push for the change. Plus, terms frequently change in ML, and I think this would at least be cleaner than many other changes lol. I think we'd have to get the Bayesians (nflows) group on board, but I'm under the impression that many feel a name change could be good due to the confusion many face with other similarly named architectures and methods. The way I'd like to classify things is that "Isomorphic Flows" is the broader class (it includes more than NFs), and I think you're right that "Diffeomorphic Flows" could be a good alternative to "Normalizing Flows". The only reason I'd prefer "Jacobian Flows" is that (imo) "diffeomorphic" sounds scarier.

Also, I've been following this project for a while (I was actually building a similar project myself around the time you started). If you ever want help, I'd be happy to lend a hand.

francois-rozet Aug 23, 2024
Maintainer

I am glad that you like the project! I hope you find it useful for your research 😁

The project has been fairly stable for quite some time now, but there are always new features that can be added. There is an issue (#28) with a list of requested flow architectures. This is a great place to start contributing to Zuko.

By the way, I recently created Azula, which aims to be the same as Zuko, but for diffusion models. It is still in early stages, so there is a lot more to do over there.
