Etymology of "Normalizing Flows" #57
-
In the "Learn the basics" tutorial (https://github.com/probabilists/zuko/blob/master/docs/tutorials/basics.ipynb), the etymology of the term "normalizing" in "normalizing flows" is explained as: "normalizing refers to the fact that the base distribution is often a (standard) normal distribution."

However, in the prototypical(?) paper "Variational Inference with Normalizing Flows" (https://arxiv.org/abs/1505.05770), it is explained that "By repeatedly applying the rule for change of variables, the initial density ‘flows’ through the sequence of invertible mappings. At the end of this sequence we obtain a valid probability distribution and hence this type of flow is referred to as a normalizing flow."

Is the etymology of "normalizing" then more accurately explained as deriving from the method's ability to produce a "normalized", valid probability distribution, rather than from the common use of "normal" distributions in the method?
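The "valid probability distribution" reading in the quoted passage can be checked numerically. The following is a minimal sketch, not taken from either source: it assumes a hypothetical two-layer flow (an affine map followed by `sinh`) on a standard normal base, applies the change-of-variables rule through the inverse of each layer, and integrates the resulting density to confirm that it is still normalized.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


# Each layer is given by its inverse, which returns the pre-image together
# with log|d(inverse)/dx|. Both layers are illustrative choices only.
def affine_inverse(x, scale=1.0, shift=0.5):
    return (x - shift) / scale, -math.log(abs(scale))


def sinh_inverse(x):
    # d/dx asinh(x) = 1 / sqrt(1 + x^2)
    return math.asinh(x), -0.5 * math.log1p(x * x)


def flow_logpdf(x):
    """Log-density of x = sinh(scale * z + shift) with z ~ N(0, 1)."""
    log_abs_det = 0.0
    # Undo the layers in reverse order, accumulating the Jacobian terms.
    for inverse in (sinh_inverse, affine_inverse):
        x, term = inverse(x)
        log_abs_det += term
    return standard_normal_logpdf(x) + log_abs_det


def trapezoid(f, lo, hi, n):
    """Composite trapezoid rule with n sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# The density has heavy tails, so integrate over a wide interval.
total_mass = trapezoid(lambda x: math.exp(flow_logpdf(x)), -200.0, 200.0, 400_000)
print(f"integral of the transformed density: {total_mass:.6f}")
```

The integral stays close to one however many invertible layers are stacked, which is the sense in which the change-of-variables rule keeps the density "normalized".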
-
Hello @chester-tan 👋 I think there is indeed a bit of ambiguity here. The term "normalizing" is defined by Tabak et al. (2013) as
It is not clear to me whether they mean that normalizing means finding a transformation that matches any distribution $\mu(y)$ (with normalized density) or only an isotropic Gaussian (a normal distribution).
However, they later say
I therefore prefer the interpretation that normalizing refers to transforming data to a normal distribution. In addition, any distribution has a normalized probability density, even though it might be unknown. In this regard, the data distribution is no less "normalized" than the target distribution. I therefore find this definition confusing.
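To make the "transforming data to a normal distribution" reading concrete, here is a small sketch of one-dimensional Gaussianization. It is an illustration under stated assumptions, not code from Tabak et al. or from Zuko: the data are assumed to be exponentially distributed with rate 1, so their CDF is known in closed form, and the helper name `gaussianize` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

random.seed(0)
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Toy "data": samples from an exponential distribution with rate 1.
samples = [random.expovariate(1.0) for _ in range(20_000)]


def gaussianize(x, eps=1e-12):
    """Map an Exponential(1) sample to a standard normal one."""
    u = 1.0 - math.exp(-x)           # exponential CDF, u is uniform on (0, 1)
    u = min(max(u, eps), 1.0 - eps)  # keep the probit finite at the edges
    return standard_normal.inv_cdf(u)  # probit (inverse normal CDF)


latents = [gaussianize(x) for x in samples]
mean, std = fmean(latents), pstdev(latents)
print(f"mean = {mean:.3f}, std = {std:.3f}")
```

The map is invertible, and the transformed samples have roughly zero mean and unit standard deviation, so the data have been "normalized" in the sense of being made normal.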
-
Hi @francois-rozet 👋 Thanks so much for the very helpful reply! 😊 The etymology does seem a little ambiguous (as I guess is the case with many ML terms haha), and it's great to hear your take on it and why the (instructive!) tutorials explain it that way.
-
You are welcome! I will convert this issue to a discussion as it is more appropriate. Feel free to ask more questions 😁
-
I saw this, and I'm not sure the term comes from the normal distribution, though it might. I do agree that there's a lot of ambiguity, but to me the clearest reading (even if it means rewriting history) is to see the normalization aspect as related to the normalizing constant (e.g. the marginal distribution in Bayes' rule). That constant is what is typically intractable, and it is the "normalizing" part of a distribution. There also seem to be references and hints to this throughout the history of flows, as well as diffusion and many others. This all gets extremely convoluted, and I think there are a lot more similarities than we often admit.

I wrote a survey (I intend to turn it into a journal publication at some point), and I highly recommend Papamakarios et al.'s "Normalizing Flows for Probabilistic Modeling and Inference". The term is a bit convoluted and unfortunately messy. I mention my work because I try to clarify some of this there.

If we go through the literature, we find that people use the same methods to convert to any probability distribution, which should be natural when we think of the process as composable bijections. Papamakarios et al. mention "The minimal transformation to orthonormality" and "Exploratory Projection Pursuit", but I believe Gaussianization probably has the clearest root. From there we see the equation

There are a handful of works that use the diagonal Normal (

It gets a bit weirder as the history progresses, because Residual Flows still uses the term but is the beginning of the branch of Neural ODEs (NODEs). Personally, I'd advocate for a class of models called Isomorphic Flows, as categorically we would find all such classes of models to be equivalent; that is, we can draw a bijection between them. Then we could have clarity about calling these "Jacobian Flows", "COV Flows", or something along those lines. I believe this also helps remind us that there's much more power to these things and that we can convert to any distribution.
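For readers who want the "normalizing constant" reading spelled out, here is a sketch in standard textbook notation. The symbols $\theta$, $x$, $z$ and $f$ are assumptions chosen for illustration, and this is not a reconstruction of the equation missing from the comment above. In Bayes' rule the marginal $p(x)$ is the normalizing constant:

$$
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}, \qquad p(x) = \int p(x \mid \theta')\, p(\theta')\, d\theta'
$$

That integral is the part that is typically intractable. A flow sidesteps it: for a bijection $f$ with $x = f(z)$ and base density $p_Z$, the change of variables gives

$$
p_X(x) = p_Z\big(f^{-1}(x)\big) \left| \det \frac{\partial f^{-1}(x)}{\partial x} \right|
$$

which integrates to one by construction whenever $p_Z$ does, whether or not $p_Z$ is a normal distribution.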