Pre-train based on existing simulation data? #981

earlbellinger · 2024-03-08T06:36:59Z

I have a fairly large number of objects that I want to characterize individually using sbi with a relatively high-dimensional and expensive simulator.

I have run my simulator with quasi-random (Sobol) inputs and generated a fairly large grid of data.

Is it possible/convenient in sbi to train the neural network on this existing simulation data first, before entering the active learning phase?

Relatedly, I would like the final inference for each of the objects to come from the same trained model. Is there a best practice for achieving this? I suppose I would want to iteratively apply the method until the posteriors for each don't change within some tolerance?

michaeldeistler · 2024-03-10T12:07:28Z

Maintainer

Hi there,

Generally, the active learning phase (multi-round inference) is entirely optional. You can train the sbi inference network on pre-existing simulations using the flexible interface.

If you want to perform inference for many observations (which I think you call "objects"), then I would not recommend entering the active learning phase at all.

However, "high-dimensional and expensive" sounds like a difficult combination ;)

Best wishes
Michael

earlbellinger · 2024-03-10T18:05:42Z

Author

Many thanks for your helpful reply. This looks like precisely what I want.

Can you expand a bit more on why active learning is not recommended when there are many observations? My grid of pre-computed simulations provides good coverage across the parameter space, but there are likely areas of parameter space that require a higher density of models than I have computed in order to accurately model any corresponding observations, so I was hoping the active learning component would help there.

I am actually hoping to do two things: 1) do simulation-based inference on each observation separately, which I believe is the normal case; and 2) do simulation-based inference on all the observations simultaneously, with each observation having 2a) some parameters that are unique to that observation, 2b) some that are shared among all the observations, and 2c) some that are shared but should vary smoothly as a function of the parameters that are unique to each observation. Do you happen to know of a pre-existing example of the second case being carried out? This of course explodes the dimensionality of the problem a bit, but my impression is that this should be possible when replacing the simulator with a sufficiently trained deep neural network.

Yes, high-dimensional and expensive is certainly difficult. Fortunately I have a good amount of supercomputing resources at my disposal! :-)

michaeldeistler · 2024-03-11T09:27:28Z

Maintainer

> Can you expand a bit more on why active learning is not recommended when there are many observations?

You can definitely try this (and it might help), but to my knowledge there are no papers showing that it is beneficial. In addition, an active learning scheme makes the whole project more difficult, because the simulations from later rounds are targeted at one particular observation, so one cannot simply reuse "presimulated" data many times. I would therefore suggest using active learning only if really necessary.

> which I believe is the normal case;

I don't think there is a "normal" case, but if there were one, it would rather be the case where one uses a single inference network (without active learning).

> ...some that are shared but should vary smoothly as a function of the parameters that are unique to each observation.

I don't think there are pre-built methods for this, but you can probably use SNLE (sequential neural likelihood estimation) to learn the likelihood and then use any Bayesian inference method you want to infer the posterior.
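
To make the idea concrete, here is a minimal NumPy sketch of how a per-observation likelihood can be combined into a joint posterior over unique and shared parameters (cases 2a and 2b only; the smooth-variation case 2c is not covered). The function `log_lik` is a hypothetical Gaussian stand-in for a likelihood learned with SNLE, and the priors, noise level, and random-walk Metropolis sampler are assumptions chosen for illustration, not part of sbi.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.1  # assumed observation noise of the toy model


def log_lik(x, theta_unique, theta_shared):
    """Hypothetical stand-in for a learned log p(x_i | theta_i, phi).

    Toy model: x_i ~ Normal(theta_i + phi, SIGMA^2).
    """
    resid = x - (theta_unique + theta_shared)
    return -0.5 * (resid / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2 * np.pi))


def log_prior(theta_unique, theta_shared):
    # Standard normal priors on all parameters, up to an additive constant.
    return -0.5 * np.sum(theta_unique**2) - 0.5 * theta_shared**2


def joint_log_posterior(params, observations):
    # Layout: one unique parameter per observation, then one shared parameter.
    theta_unique, theta_shared = params[:-1], params[-1]
    # Observations are independent given the parameters, so log-likelihoods add.
    total = np.sum(log_lik(observations, theta_unique, theta_shared))
    return total + log_prior(theta_unique, theta_shared)


def metropolis(log_post, init, n_steps, step, rng):
    """Plain random-walk Metropolis; any other sampler could be used instead."""
    current = np.asarray(init, dtype=float)
    current_lp = log_post(current)
    chain = np.empty((n_steps, current.size))
    for i in range(n_steps):
        proposal = current + step * rng.standard_normal(current.size)
        proposal_lp = log_post(proposal)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain


# Synthetic data for three objects (made-up values).
true_unique = np.array([0.5, -0.3, 1.0])
true_shared = 0.2
observations = true_unique + true_shared + SIGMA * rng.standard_normal(3)

chain = metropolis(
    lambda p: joint_log_posterior(p, observations),
    init=np.zeros(4),
    n_steps=5000,
    step=0.05,
    rng=rng,
)
```

With a trained likelihood estimator, `log_lik` would be replaced by the network's log-probability evaluated at each observation. The dimensionality of the sampled space grows with the number of objects, which is the cost raised earlier in the thread.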

Michael

0 replies

earlbellinger · 2024-03-14T15:01:15Z

Author

Thank you Michael! This has been very helpful. I will mark this as closed now. I may bug you again some more in the future :-)
