Restrictions and Troubleshooting.

The math is all OK; computation is the nightmare.

Introduction

Although the theory of model fitting is quite straightforward, the implementation can be cumbersome. This is mostly due to the fact that we have limited-precision computers and limited amounts of time. Even though all the computation is done in 8-byte floats, limited resources and sometimes the obnoxious shape of the likelihood landscape (or χ2-landscape) can make life difficult for the fitter. Another common obstacle is (near-)degeneracy of the model with respect to the data. Fitting sometimes looks more like an art or a craft than like science.

For diagnostic (and debugging) purposes the iterative fitters have a verbose option, which prints more information at higher values.

It is also a good idea to visually inspect the results of the fit. All fitters have a method

fitter.plotResult( xdata, ydata, model )

It is more easily called by turning on the plot option in the fit method:

fitter.fit( ydata, plot=True )

Or for a NestedSampler called ns:

ns.sample( plot=True )
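As a concrete starting point, here is a minimal end-to-end sketch of the pattern used throughout this page. It assumes the BayesicFitting classes PolynomialModel and Fitter and invents its own toy data (xdata, ydata); adapt names and numbers to your problem.

```python
import numpy as np
from BayesicFitting import PolynomialModel, Fitter

# toy data: a straight line with some noise (replace with your own data)
xdata = np.linspace( -1.0, 1.0, 101 )
ydata = 0.3 + 1.2 * xdata + 0.1 * np.random.randn( 101 )

model = PolynomialModel( 1 )                  # straight line: 2 parameters
fitter = Fitter( xdata, model )

# fit and inspect the result visually
params = fitter.fit( ydata, plot=True )
print( params, fitter.chisq )
```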

Here are some guidelines that might help to get useful results.

Restrictions on Data

Constraints on the independent variable(s)

The independent variable(s) (x) should be "nice" numbers, i.e. roughly of order 1. Mostly the fitter obtains solutions by manipulating a matrix consisting of a direct product of the partial derivatives of the model to each of its parameters. If the elements of this matrix vary wildly in size, precision is quickly lost.
E.g. a polynomial model of order 3 with an independent variable whose values run from 1 to 100 will have a matrix with values ranging between 1 and 10^12. Mathematically this is all OK; computationally it is a nightmare.
The software cannot and does not scale its inputs in any way. It takes them all at face value. It is up to the user to present the software with usable data.
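A sketch of such a rescaling, done by the user before fitting (my own illustration, not functionality of the package): shift and scale x to roughly [-1, 1], and remember the transformation when interpreting the parameters.

```python
import numpy as np
from BayesicFitting import PolynomialModel, Fitter

xraw = np.linspace( 1.0, 100.0, 100 )         # "bad" x: runs from 1 to 100
ydata = 1.0 + 0.01 * xraw + 0.5 * np.random.randn( 100 )

# rescale x to order 1; keep offset and scale to interpret the parameters later
xoff = 0.5 * ( xraw.max() + xraw.min() )
xscl = 0.5 * ( xraw.max() - xraw.min() )
xdata = ( xraw - xoff ) / xscl                # now in [-1, 1]

model = PolynomialModel( 3 )
fitter = Fitter( xdata, model )
params = fitter.fit( ydata )                  # parameters refer to the scaled x
```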

Constraints on the dependent variable

The dependent variable (y) has fewer constraints. Still, there is a silent assumption in the algorithms that the amount of noise in the data is of order 1. This only matters for the stopping of iterative fitters. An iterative fitter stops when the change in the cost function from one iteration to the next is less than the tolerance. When the cost function is small to begin with, due to small y values, the fitter might stop too early; or too late when the y values are large. There is no way to do it right in all imaginable cases.
To redress this condition you can use weights, set the tolerance attribute to adapt the stopping criterion to your problem, or scale the dependent variable such that the noise level attains a more useful value.
Check whether χ2 is of the order of the number of datapoints.
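For instance, if you have an estimate of the noise level, weights of 1/σ² bring χ2 to the order of the number of points. A quick check after the fit might look like this (my own sketch; sigma is an assumed known noise level):

```python
import numpy as np
from BayesicFitting import PolynomialModel, Fitter

sigma = 0.05                                  # assumed known noise level
xdata = np.linspace( -1.0, 1.0, 51 )
ydata = 2.0 - 0.5 * xdata + sigma * np.random.randn( 51 )

fitter = Fitter( xdata, PolynomialModel( 1 ) )

# weights of 1/sigma^2 bring the cost function to the order of len(ydata)
wgts = np.full( 51, 1.0 / ( sigma * sigma ) )
params = fitter.fit( ydata, weights=wgts )
print( fitter.chisq, len( ydata ) )           # chisq should be of order 51
```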

Model Degeneracy

Sometimes the model is degenerate, meaning that 2 (or more) of its parameters are essentially measuring the same thing. Trying to fit such a model to data with Fitter results in a singular matrix.

It can also happen that a model that is non-degenerate in principle becomes (almost) degenerate because the data do not cover the full range of possibilities the model has to offer. The model is then (almost) degenerate under this particular data set; for other data sets it could be OK.
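A toy illustration in plain numpy (not BayesicFitting itself) of why this breaks the linear solution: if two parameters multiply the same basis function, two columns of the design matrix coincide and the normal matrix cannot be inverted.

```python
import numpy as np

x = np.linspace( 0.0, 1.0, 10 )

# design matrix of a degenerate "model" y = a*x + b*x : identical columns
A = np.stack( [ x, x ], axis=1 )
H = A.T @ A                                   # normal (Hessian-like) matrix

print( np.linalg.matrix_rank( H ) )           # prints 1, not 2: singular
# np.linalg.inv( H )                          # would raise "Singular matrix"
```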

Although NestedSampler has no problems with degenerate models, it is in general better to use simpler models.

Errors and warnings

Runtime Warning.

It can happen that a runtime warning is thrown during the run of a fitter (or of NestedSampler). In most cases the results are still OK: the program sailed past the obstacle undamaged. Although some extra care should be taken with the results, the warnings can almost always be ignored.

Division by zero.

Sometimes a nonlinear fitter produces a division-by-zero warning. When the fitter just continues and produces sensible results, the warning can be ignored. It is caused by a noise scale equal to 0 when calculating the loglikelihood. As +inf is larger than any other number, the fitter cannot have a minimum there.

Matrix is degenerate

The model is (almost) degenerate under the data it is presented with. In general this signals that the model is not the best one for the data at hand. Use a simpler model.

Fitters.

(Nonlinear)Fitter does not find the minimum.

When a nonlinear fitter searches for a minimum, it might happen that for almost all values of the parameters χ2 has no gradient: when the parameters move away from the sought minimum, χ2 hardly changes, if at all. This is mostly the case with models of periodic functions or when you are far from the global minimum. Only in a small area of parameter space around the minimum is there a gradient to move along.
A similar, and sometimes simultaneous, condition occurs when the landscape is multimodal, i.e. it has more than one minimum. Of course only one minimum is the lowest, the global minimum, and that is the one we want to find. In general fitters do not have a global view of the χ2 landscape; they fall for the first minimum they encounter.
If you can localise in some other way the area where your parameters should be found, feed them to the system as initial parameters (see the sketch below). Otherwise you have to use the AmoebaFitter in annealing mode or even do an exhaustive search. Both strategies take a lot more time.
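One way to feed such initial parameters to a nonlinear fitter is sketched below. It assumes, as I understand BayesicFitting to work, that nonlinear fitters start from the model's current parameters attribute; check this against your version.

```python
import numpy as np
from BayesicFitting import GaussModel, LevenbergMarquardtFitter

xdata = np.linspace( -5.0, 5.0, 201 )
ydata = 3.0 * np.exp( -0.5 * ( ( xdata - 1.0 ) / 0.4 ) ** 2 ) + 0.1 * np.random.randn( 201 )

model = GaussModel()                          # amplitude, center, width
model.parameters = [ 2.0, 1.2, 0.5 ]          # rough guess near the expected minimum
                                              # (assumed to be used as the starting point)
fitter = LevenbergMarquardtFitter( xdata, model )
params = fitter.fit( ydata )
```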

AmoebaFitter does not start.

The size of the simplex in the AmoebaFitter is 1 by default. When your x-data are very much larger (or smaller) than 1 and you set your startValues accordingly, you might want to adapt the size of the simplex too. In extreme cases the 1 vanishes in the precision of the startValues and the simplex is frozen in some dimension(s). Use AmoebaFitter.fit( data, size=simplexsize ), as in the sketch below.
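In code, under the same assumption as the previous sketch (nonlinear fitters start from the model's parameters attribute), that might look like this; size=10.0 is just an illustrative value comparable to the start values:

```python
import numpy as np
from BayesicFitting import GaussModel, AmoebaFitter

xdata = np.linspace( 900.0, 1100.0, 201 )     # x and start values of order 1000
ydata = 5.0 * np.exp( -0.5 * ( ( xdata - 1000.0 ) / 20.0 ) ** 2 ) + 0.2 * np.random.randn( 201 )

model = GaussModel()
model.parameters = [ 4.0, 990.0, 30.0 ]       # start values (assumed starting point)

fitter = AmoebaFitter( xdata, model )
params = fitter.fit( ydata, size=10.0 )       # simplex steps of order 10 instead of 1
```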

χ2 is zero.

This can only happen when the data fit the model perfectly. And as true perfection is not given to us, it is often the result of something else, e.g. digitization of the data: the data are only perfect after they have been digitized. Digitization gives each data point a noise of 1/sqrt(12), the standard deviation of a uniform distribution, since the actual position of each datapoint can be anywhere between two digitization levels (presumably at distance 1). This noise can be added to the internal χ2 with setChiSquared( N/12 ).
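Written out (just restating the arithmetic above): the variance of a uniform distribution over one digitization step is 1/12, so N such points contribute about N/12 to χ2.

$$ \sigma_{dig} = \frac{1}{\sqrt{12}} \approx 0.29, \qquad \chi^2_{dig} \approx N\,\sigma_{dig}^2 = \frac{N}{12} $$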

Standard deviations seem too large.

When you think that the standard deviations on the parameters are ridiculously large, calculate the confidence region of the model fit (use fitter.monteCarloError()). This 1-σ confidence interval is obtained directly from the standard deviations and the covariance matrix. Due to strong covariance between the parameters, the standard deviations can be large while the confidence region is still acceptable.
The confidence region is actually a better indicator of how well the model performs than the standard deviations on the parameters.

In a linear fit, the standard deviations are calculated for the point where xdata is zero. When all your xdata values are actually quite large, the stdevs refer to a position way outside the actual range. However, due to the covariance between the parameters, the confidence region within the range of the xdata is acceptable, as the sketch below illustrates.
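A sketch of that comparison, using only calls mentioned on this page plus the stdevs attribute and Model.result; I assume monteCarloError accepts the x values at which to evaluate the band.

```python
import numpy as np
from BayesicFitting import PolynomialModel, Fitter

xdata = np.linspace( 100.0, 110.0, 50 )       # x-range far from zero
ydata = 0.5 * xdata + np.random.randn( 50 )

model = PolynomialModel( 1 )
fitter = Fitter( xdata, model )
params = fitter.fit( ydata )

print( fitter.stdevs )                        # can look alarmingly large
yfit = model.result( xdata, params )          # the fitted curve
err = fitter.monteCarloError( xdata )         # 1-sigma band of the fit itself
print( np.mean( err ) )                       # usually much more reassuring
```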

Parameters at the edge

The standard deviations of the parameters are calculated as the square root of chisq multiplied by the diagonal elements of the inverse Hessian matrix at the chisq minimum, divided by the degrees of freedom. The chisq landscape at that point is a stationary minimum, curving up in all directions. The shape of the minimum is approximated by the inverse Hessian (also known as the covariance matrix), up to the 2nd power in the Taylor expansion of the chisq function.
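In formula form (my rendering of the recipe above), with H the Hessian at the minimum and the degrees of freedom taken as N − P, the number of data points minus the number of parameters:

$$ \sigma_k = \sqrt{ \frac{\chi^2_{min}}{N - P}\,\left( H^{-1} \right)_{kk} } $$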
When you have set limits on the parameters and the unconstrained minimum of chisq would be outside those limits, the constrained minimum of chisq lies at one of the limits. This constrained minimum is no longer a stationary point, so the assumptions above about a neat valley in the chisq landscape, curving up on all sides, are not met. What the resulting calculations will give is anyone's guess.

NestedSampler

The NestedSampler has many of the restrictions and problems that the Fitters have. Runtime warnings can commonly be ignored when the final results seem to be OK.

One important difference is that in Fitters the fit is independent of the noise scale; in NS it is not. NestedSampler takes the provided noise scale seriously. By default scale=1 and unfortunately that is wrong in almost all cases.

It is much better to fit the noise scale along with the model. For that, the noise scale needs a prior. By default it is a JeffreysPrior, which needs limits to convert into a valid prior, integrating to 1.
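A sketch of how that might look, assuming (as I believe is the case in BayesicFitting, but verify for your version) that the limits keyword of the NestedSampler constructor sets the range of the JeffreysPrior on the noise scale, turning the scale into a fitted hyperparameter:

```python
import numpy as np
from BayesicFitting import PolynomialModel, NestedSampler

xdata = np.linspace( -1.0, 1.0, 101 )
ydata = 1.0 + 2.0 * xdata + 0.03 * np.random.randn( 101 )

model = PolynomialModel( 1 )

# limits turn the default JeffreysPrior on the noise scale into a proper prior,
# so the scale is fitted along with the model parameters (assumed behaviour)
ns = NestedSampler( xdata, model, ydata, limits=[0.001, 10.0] )
evidence = ns.sample( plot=True )
print( ns.parameters, ns.stdevs )
```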

Standard Deviations.

Standard deviations are proportional to the noise scale divided by the square root of the number of points. NestedSampler does that too. However, if the noise scale is fixed during the run of NestedSampler, the standard deviations on the parameters are off by as much as the true noise scale differs from the provided one.

Even worse, if you set the scale of the error distribution to a fixed value while the residuals are significantly smaller, all points contribute almost equally (and equally little) to the likelihood, which then loses its power to discriminate between good and bad fits.

In those cases it is better to add the scale factor as an extra (hyper)parameter to the problem. Actually, in almost all cases it is better to do so.

Starting Problems.

Sometimes it looks like NestedSampler cannot find the maximum in the likelihood landscape: the path to the likelihood mountain is lost in the n-dimensional space spanned by the parameters. A tighter prior might help, or setting the attribute minimumIterations to something large. By default it is 100; enlarging it gives NestedSampler more opportunity to explore the parameter space.
Ensuring that the prior space contains the expected posterior parameter values can also help.

It may help to restart NestedSampler with another seed for the random number generator.
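Put together, and with the caveat that setLimits on the model and the seed keyword are my reading of the package rather than something stated on this page, such a restart might look like:

```python
import numpy as np
from BayesicFitting import PolynomialModel, NestedSampler

xdata = np.linspace( 0.0, 1.0, 50 )
ydata = 0.2 + 3.0 * xdata + 0.05 * np.random.randn( 50 )
model = PolynomialModel( 1 )

# tighter priors: restrict the parameters to where the posterior is expected
model.setLimits( lowLimits=[-10.0, -10.0], highLimits=[10.0, 10.0] )

ns = NestedSampler( xdata, model, ydata, limits=[0.001, 1.0], seed=12345 )
ns.minimumIterations = 2000                   # default is 100; explore longer
evidence = ns.sample()
```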

Premature Freezing

Premature freezing happens when NestedSampler stops while it obviously has not reached its goal: the parameters are not well optimized and got stuck somewhere along the way. This can happen when the changes needed in the model only affect a small part of the data, while the rest of the data is comfortable with the status quo. It is then easier to adapt the noise scale to encompass the deviating points than to find the path to the top.

For NS all data points are independent. Even when we see clear systematic behaviour in the residuals, fitters, including NS, do not have that overview.

The keyword bestBoost was introduced to address this issue. When set to True, every valid step is checked to see whether it is better than any of the previous steps residing in the PhantomCollection. If so, a Fitter is run, starting at that best position, to improve the best even further. As this happens inside the PhantomCollection it does not affect the evidence calculation, but it opens a corridor to the top.
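Assuming bestBoost is a keyword of the NestedSampler constructor (the text above does not say where it is given, so check your version), the usage would look roughly like:

```python
import numpy as np
from BayesicFitting import GaussModel, NestedSampler

xdata = np.linspace( -3.0, 3.0, 200 )
ydata = np.exp( -0.5 * ( xdata / 0.3 ) ** 2 ) + 0.05 * np.random.randn( 200 )

model = GaussModel()

# bestBoost=True: occasionally run a Fitter from the best point found so far,
# inside the PhantomCollection, without affecting the evidence calculation
ns = NestedSampler( xdata, model, ydata, limits=[0.001, 1.0], bestBoost=True )
evidence = ns.sample()
```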