---
title: "Refactoring the `nls()` function in R"
author:
- John C. Nash \thanks{ retired professor, Telfer School of Management, University of Ottawa}
- Arkajyoti Bhattacharjee \thanks{Department of Mathematics and Statistics, Indian Institute of Technology, Kanpur}
date: "2021-8-20"
output:
pdf_document:
keep_tex: false
toc: true
bibliography: ImproveNLS.bib
link-citations: yes
linkcolor: red
urlcolor: blue
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
## require(bookdown) # language engine to display text - does not seem necessary
```
# Abstract
This article reports the particular activities of our Google Summer of Code
project "Improvements to `nls()`" that relate to the R code for that function, which
is intended for the estimation of models written as a formula that has at least
one parameter that is "nonlinear", that is, not estimable via solving a set of
linear equations. A companion document **Variety in Nonlinear Least Squares Codes**
presents an overview of methods for the problem which takes a much wider view of the
problem of minimizing a function that can be written as a sum of squared
terms.
Our work has not fully addressed all the issues that we would like to see
resolved, but we believe we have made sufficient progress to demonstrate that
there are worthwhile improvements that can be made to the R function `nls()`.
An important overall consideration in our work has been the maintainability of
the code base that supports the `nls()` functionality, as we believe that the
existing code makes maintenance and improvement very difficult.
# The existing `nls()` function: strengths and shortcomings
`nls()` is the tool in base R (the software distributed from [CRAN](https://cran.r-project.org))
for estimating nonlinear statistical models. The function was developed mainly in the 1980s and
1990s by Doug Bates et al., initially for S (see https://en.wikipedia.org/wiki/S_%28programming_language%29).
The ideas spring primarily from the book by @bateswatts.
The `nls()` function has a remarkable and quite comprehensive set of capabilities for estimating
nonlinear models that are expressed as formulas. In particular, we note that it
- handles formulas that include R functions
- allows data to be subset
- permits parameters to be indexed over a set of related data
- produces measures of variability (i.e., standard error estimates) for the estimated parameters
- has related profiling capabilities for exploring the likelihood surface as parameters are changed
With such a range of features and a long history, it is not surprising that the code has become untidy
and overly patched. It is, to
our mind, essentially unmaintainable. Moreover, its underlying methods can and should be improved. Let
us review some of the issues. We will then propose corrective actions, some of which we have carried out.
## Feature: Convergence and termination tests
Within the standard documentation (**man** or ".Rd" file) `nls()` warns
> **The default settings of nls generally fail on artificial “zero-residual” data problems.**
> The nls function uses a relative-offset convergence criterion that compares the numerical imprecision at
the current parameter estimates to the residual sum-of-squares. This performs well on data of the form $$y = f(x, \theta) + \epsilon$$
> (with $\mathrm{var}(\epsilon) > 0$). It fails to indicate convergence on data of the form $$y = f(x, \theta)$$
> because the criterion amounts to comparing two components of the round-off error. To avoid a zero-divide in
computing the convergence testing value, a positive constant `scaleOffset` should be added to the denominator
sum-of-squares; it is set in control; this does not yet apply to algorithm = "port".
It turns out that this issue can be quite easily resolved. The key "convergence test" -- more properly
a "termination test" for the **program** rather than testing for convergence of the underlying **algorithm** --
is the Relative Offset Convergence Criterion (see @BatesWatts81). This works by projecting the proposed
step in the parameter vector on the gradient and estimating how much the sum of squares loss function
will decrease. To avoid scale issues, we use the current size of the loss function as a measure and divide
by it. When we have "converged", the estimated
decrease is very small, as usually is its ratio to the sum of squares. However, when the model can
fit the data (almost) exactly, the sum of squares is (almost) zero and the test risks a
zero-divide failure.
The issue is easily resolved by adding a small positive quantity to the divisor of the test. To preserve legacy behaviour,
in 2021, one of us (J. Nash) proposed that `nls.control()` have an additional parameter `scaleOffset`
with a default value of zero for legacy behaviour. Setting it to a small number -- 1.0 is
a reasonable choice --
allows small-residual problems (i.e., near-exact fits) to be dealt with easily. We call this the
**safeguarded relative offset convergence criterion**.
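In schematic form (a sketch of the idea only, not the exact computation inside `nls()`):
```
## Sketch: 'pred_decrease' is the reduction in the sum of squares
## predicted by projecting the proposed step on the gradient; 'ss' is
## the current sum of squares; 'ctol' is the convergence tolerance.
## scaleOffset > 0 guards the denominator for (near-)zero-residual fits.
criterion <- pred_decrease / (scaleOffset + ss)
converged <- criterion < ctol
```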
We are pleased to report that this improvement is now in the R distributed code.
### Example of a small-residual problem
```
rm(list=ls())
t <- -10:10
y <- 100/(1+.1*exp(-0.51*t))
lform<-y~a/(1+b*exp(-c*t))
ldata<-data.frame(t=t, y=y)
plot(t,y)
lstartbad<-c(a=1, b=1, c=1) # a deliberately poor start (not used in the output below)
lstart2<-c(a=100, b=10, c=1)
nlsr::nlxb(lform, data=ldata, start=lstart2)
nls(lform, data=ldata, start=lstart2, trace=TRUE)
# Fix with scaleOffset
nls(lform, data=ldata, start=lstart2, trace=TRUE, control=list(scaleOffset=1.0))
sessionInfo()
```
Edited output from running this script follows:
```
> rm(list=ls())
> t <- -10:10
> y <- 100/(1+.1*exp(-0.51*t))
> lform<-y~a/(1+b*exp(-c*t))
> ldata<-data.frame(t=t, y=y)
> plot(t,y)
> lstart2<-c(a=100, b=10, c=1)
> nlsr::nlxb(lform, data=ldata, start=lstart2)
nlsr object: x
residual sumsquares = 1.007e-19 on 21 observations
after 13 Jacobian and 19 function evaluations
name coeff SE tstat pval gradient JSingval
a 100 2.679e-11 3.732e+12 1.863e-216 -6.425e-11 626.6
b 0.1 3.78e-13 2.646e+11 9.125e-196 -3.393e-08 112.3
c 0.51 6.9e-13 7.391e+11 8.494e-204 1.503e-08 2.791
# Note that this has succeeded. The test in nlsr recognizes small residual problems.
> nls(lform, data=ldata, start=lstart2, trace=TRUE)
40346. (1.08e+00): par = (100 10 1)
11622. (2.93e+00): par = (101.47 0.49449 0.71685)
5638.0 (1.08e+01): par = (102.23 0.38062 0.52792)
642.08 (1.04e+01): par = (102.16 0.22422 0.41935)
97.712 (1.79e+01): par = (100.7 0.14774 0.45239)
22.250 (1.78e+02): par = (99.803 0.093868 0.50492)
0.025789 (1.33e+03): par = (100.01 0.10017 0.50916)
6.0571e-08 (7.96e+05): par = (100 0.1 0.51)
4.7017e-19 (1.86e+04): par = (100 0.1 0.51)
1.2440e-27 (5.71e-01): par = (100 0.1 0.51)
  ... (this line repeats unchanged until the iteration limit is reached) ...
Error in nls(lform, data = ldata, start = lstart2, trace = TRUE) :
number of iterations exceeded maximum of 50
> nls(lform, data=ldata, start=lstart2, trace=TRUE, control=list(scaleOffset=1.0))
40346. (1.08e+00): par = (100 10 1)
11622. (2.91e+00): par = (101.47 0.49449 0.71685)
5638.0 (9.23e+00): par = (102.23 0.38062 0.52792)
642.08 (5.17e+00): par = (102.16 0.22422 0.41935)
97.712 (2.31e+00): par = (100.7 0.14774 0.45239)
22.250 (1.11e+00): par = (99.803 0.093868 0.50492)
0.025789 (3.79e-02): par = (100.01 0.10017 0.50916)
6.0571e-08 (5.80e-05): par = (100 0.1 0.51)
4.7017e-19 (1.62e-10): par = (100 0.1 0.51)
Nonlinear regression model
model: y ~ a/(1 + b * exp(-c * t))
data: ldata
a b c
100.00 0.10 0.51
residual sum-of-squares: 4.7e-19
Number of iterations to convergence: 8
Achieved convergence tolerance: 1.62e-10
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.2
```
### More general termination tests
The single convergence criterion of `nls()` leaves out some possibilities that could
be useful for some problems. The package `nlsr` (@nlsr-manual) already offers both
the safeguarded relative offset test (**roffset**) as well as a **small sum of
squares** test (**smallsstest**) that compares the latest evaluated sum of squared
(weighted) residuals to a very small multiple of the initial sum of squares. The
multiple is based on a control setting `offset`, which defaults to 100.0: we compute
the fourth power of this offset times the machine epsilon, i.e., `(offset * .Machine$double.eps)^4`.
```{r meps4, echo=TRUE}
epstol<-100*.Machine$double.eps
e4 <- epstol^4
e4
```
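Schematically (our reading of the `nlsr` behaviour), the small sum of squares test
then terminates when the latest weighted sum of squares falls below this tiny
fraction of the initial one:
```
## Sketch only: ss_current and ss_initial are the latest and initial
## (weighted) sums of squared residuals; e4 is computed as above.
converged <- ss_current <= e4 * ss_initial
```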
We do note that `nls()` stops after `maxiter` "iterations". However, for almost all
iterative algorithms, the meaning of "iteration" requires careful examination of the
code. Instead, we prefer to record the number of times the residuals or the Jacobian have
been computed and put upper limits on these. Our codes exit (terminate) when these limits
are reached. Generally we prefer larger limits than the default `maxiter=50` of `nls()`,
but that may simply reflect our history of dealing with more difficult problems as we are
the tool-makers users consult when things go wrong.
## Feature: Failure when Jacobian is computationally singular
This is the infamous "singular gradient" termination. A Google search of
```
R nls "singular gradient"
```
gets over 4000 hits that are spread over the years.
In some cases this is due
to the failure of the simple finite difference approximation of the Jacobian in the
`numericDeriv()` function that is a part of `nls()`. `nlsr` can use analytic
derivatives, and we can import this functionality to the `nls()` code as an
improvement. See below in the section **Jacobian computation**.
However, the more common source of the issue is that the Jacobian is very close
to singular for some values of the model parameters. In such cases we need to find
an alternative algorithm to the Gauss-Newton iteration of `nls()`. The most common
work-around is the Levenberg-Marquardt stabilization (see @Marquardt1963, @Levenberg1944,
@jn77ima). Versions of this have been implemented in packages `minpack.lm` and `nlsr`,
and we have preliminary versions of an `nls` replacement that can incorporate a
version of the Levenberg-Marquardt stabilization. (There are some issues of
integration with other code structures and of complexity of the computations that
suggest we should use a simplified LM stabilization.)
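To make the idea concrete, here is a minimal sketch of one Levenberg-Marquardt
step; it is illustrative only and not the code of any of the packages named
above. `rfun` and `jfun` are assumed to return the residual vector and the
Jacobian, and `lambda` is the stabilization parameter.
```
lm_step <- function(rfun, jfun, theta, lambda) {
  r <- rfun(theta)                     # residual vector
  J <- jfun(theta)                     # Jacobian matrix
  JtJ <- crossprod(J)                  # J'J of the Gauss-Newton equations
  A <- JtJ + lambda * diag(diag(JtJ))  # Marquardt's diagonal augmentation
  delta <- solve(A, -crossprod(J, r))  # stabilized step
  theta + as.numeric(delta)
}
```
As `lambda` grows, the step shrinks toward a scaled steepest-descent direction,
which is what rescues the iteration when the Jacobian is near-singular.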
## Feature: Jacobian computation
`nls()`, with the `numericDeriv()` function, computes the Jacobian as the "gradient"
attribute of the residual vector. This is implemented as a mix of R and C code, but
we have created a rather more compact version entirely in R in this Google Summer of
Code project. See the document **DerivsNLS.pdf**.
```{r code=xfun::read_utf8('Tests/Examples/badJlogmod.R')}
```
It should be noted that the selfStart models in the `./src/library/stats/R/zzModels.R` file
provide the Jacobian in the "gradient" attribute of the "one-sided" formula that defines each
model, and these Jacobians are the analytic forms. The `nls()` function, after computing the
"right hand side" or `rhs` of the residual, then checks to see if the "gradient" attribute is
defined, and, if not, uses `numericDeriv` to compute a Jacobian into that attribute. This code
is within the `nlsModel()` or `nlsModel.plinear()` functions.
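A small self-contained illustration of `numericDeriv()` (freely adapted from its manual page):
```
## numericDeriv() evaluates an expression in an environment and attaches
## a finite-difference Jacobian as the "gradient" attribute.
myenv <- new.env()
myenv$a <- 1; myenv$b <- 0.5; myenv$x <- 1:5
nd <- numericDeriv(quote(a * exp(-b * x)), c("a", "b"), myenv)
attr(nd, "gradient")   # 5 x 2 approximate Jacobian
```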
## Feature: Subsetting
`nls()` accepts an argument `subset`. Unfortunately, this acts through the mediation of
`model.frame` and its operation is not obvious in the source code files `/src/library/stats/R/nls.R` and
`/src/library/stats/src/nls.c`.
While the implementation of subset at the level of the call to `nls()` has a certain
attractiveness, it does mean that the programmer of the solver needs to be aware of the
source (and value) of objects such as the data, residuals and Jacobian. By preference,
we would implement subsetting by means of zero-value weights, with observation counts
(and degrees of freedom) computed via the non-zero weights.
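A sketch of that approach, with hypothetical objects `form`, `dat` and `st`
(formula, data, start); a solver implementing it would base observation counts
on the nonzero weights:
```
## Drop observations 3 and 7 via zero weights rather than subset=.
w <- rep(1, nrow(dat)); w[c(3, 7)] <- 0
fit <- nls(form, data = dat, start = st, weights = w)
## Degrees of freedom should then use sum(w > 0), not nrow(dat).
```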
## Feature: na.action
`na.action` is an argument to the `nls()` function, but it does not appear obviously in the
source code, being handled largely behind the scenes via the `na.action` option.
A useful, but possibly dated, description is given in:
https://stats.idre.ucla.edu/r/faq/how-does-r-handle-missing-values/
The typical default action, which can be seen by using the command `getOption("na.action")`
is `na.omit`. This option omits from the computations every observation containing a
missing value (i.e., any row of a data frame with an NA).
`na.exclude` does much the same for computations, but keeps the rows with NA elements so
that predictions are in the correct row position. We recommend that workers actually test
output to verify behaviour is as wanted.
A succinct answer is given in: https://stats.stackexchange.com/questions/492955/should-i-use-na-omit-or-na-exclude-in-a-linear-model-in-r
> The only benefit of `na.exclude` over `na.omit` is that the former will retain the
original number of rows in the data. This may be useful where you need to retain the
original size of the dataset - for example it is useful when you want to compare
predicted values to original values. With `na.omit` you will end up with fewer
rows so you won't as easily be able to compare.
`na.pass` simply passes on data "as is", while `na.fail` will essentially stop if any missing
values are present.
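A small illustration of the difference, using `lm()` for simplicity and made-up data:
```
df <- data.frame(x = 1:6, y = c(1.1, NA, 2.9, 4.2, NA, 6.1))
f1 <- lm(y ~ x, data = df, na.action = na.omit)
f2 <- lm(y ~ x, data = df, na.action = na.exclude)
length(fitted(f1))  # 4 : rows with NA dropped entirely
length(fitted(f2))  # 6 : NAs kept so predictions stay in position
```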
## Feature: model frame
`model` is an argument to the `nls()` function, which is documented
> **model** logical. If true, the model frame is returned as part of the object. Default is FALSE.
Indeed, the argument only appears when `nls()` is about to return its result object, and the
element `model` is NULL unless the calling argument `model` is TRUE. (Using the same name could
be confusing.) However, the model frame is used within the function code in the form of an object
`mf`.
## Feature: sources of data
`nls()` can be called without specifying the `data` argument. In this case, it will
search in the available environments (i.e., workspaces) for suitable data objects.
We do NOT like this approach, but it is "the R way". R allows users to leave many
objects in the default (.GlobalEnv) workspace. Moreover, users have to actively
suppress saving this workspace (`.RData`) on exit, and any such file in the path
when R is launched will be loaded. The overwhelming proportion of R users in our
acquaintance avoid saving the workspace because of the danger of lurking data and
functions which may cause unwanted results.
Nevertheless, to provide compatible behaviour with `nls()`, we will need to ensure
that equivalent behaviour is guaranteed. Furthermore, we need to test that the
operation is correct.
## Feature: missing start vector and self-starting models
Nonlinear estimation algorithms are almost all iterative and need a set of starting
parameters. `nls()` offers a special class of modeling functions called **selfStart**
models. There are a number of these in base R (see list below) and others in R packages
such as CRAN package `nlraa` (@MiguezNLRAA2021), as well as the now-archived package
`NRAIA`. Unfortunately, the structure of the programming of
these is such that the methods by which initial parameters are computed are entangled
with the particularities of the `nls()` code. Though there is a `getInitial()` function,
this is not easy to use to simply compute the initial parameter estimates, in part
because it may call `nls()`.
In the example below, we show how the `SSlogis` selfStart function can generate a set
of initial parameters for a 3-parameter logistic curve. The form used by `SSlogis`
is $$ y \sim \mathrm{Asym}/(1+\exp((\mathrm{xmid}-t)/\mathrm{scal})) $$
The example shows how these starting parameters can be transformed to those of
another form of the model, namely, $$ y \sim b_1/(1 + b_2 \exp(-b_3 t)) $$
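Matching the two forms term by term gives $b_1 = \mathrm{Asym}$, $b_2 = \exp(\mathrm{xmid}/\mathrm{scal})$
and $b_3 = 1/\mathrm{scal}$; this is the transformation applied to `s0` in the example chunk below.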
Let us look at the actual code for `SSlogis()` in `R-devel/src/library/stats/R/zzModels.R`:
```
SSlogis <- # selfStart(~ Asym/(1 + exp((xmid - input)/scal)),
    selfStart(
        function(input, Asym, xmid, scal)
        {
            .expr1 <- xmid - input
            .expr3 <- exp(.e2 <- .expr1/scal)
            .expr4 <- 1 + .expr3
            .value <- Asym/.expr4
            .actualArgs <- as.list(match.call()[c("Asym", "xmid", "scal")])
            if(all(vapply(.actualArgs, is.name, NA)))
            {
                .expr10 <- .expr4^2
                .grad <- array(0, c(length(.value), 3L),
                               list(NULL, c("Asym", "xmid", "scal")))
                .grad[, "Asym"] <- 1/.expr4
                .grad[, "xmid"] <- - (xm <- Asym * .expr3/scal/.expr10)
                .grad[, "scal"] <- xm * .e2
                dimnames(.grad) <- list(NULL, .actualArgs)
                attr(.value, "gradient") <- .grad
            }
            .value
        },
        initial = function(mCall, data, LHS, ...) {
            xy <- sortedXyData(mCall[["input"]], LHS, data)
            if(nrow(xy) < 4) {
                stop("too few distinct input values to fit a logistic model")
            }
            z <- xy[["y"]]
            ## transform to proportion, i.e. in (0,1) :
            rng <- range(z); dz <- diff(rng)
            z <- (z - rng[1L] + 0.05 * dz)/(1.1 * dz)
            xy[["z"]] <- log(z/(1 - z))   # logit transformation
            aux <- coef(lm(x ~ z, xy))
            pars <- coef(nls(y ~ 1/(1 + exp((xmid - x)/scal)),
                             data = xy,
                             start = list(xmid = aux[[1L]], scal = aux[[2L]]),
                             algorithm = "plinear", ...))
            setNames(pars[c(".lin", "xmid", "scal")],
                     mCall[c("Asym", "xmid", "scal")])
        },
        parameters = c("Asym", "xmid", "scal"))
```
We note that the function includes analytic expressions for the Jacobian ("gradient").
These could possibly be useful to R users, especially if documented. Moreover, we
wonder why the programmers have chosen to save so many quantities in "hidden"
variables, i.e., with names preceded by ".". These are then not displayed by the `ls()`
command, making them difficult to access.
In the event that a selfStart model is not available, `nls()` sets all the starting parameters
to 1. This is, in our view, tolerable, but could be improved by using a set of values
that are slightly different: e.g., for a model $$ y \sim a \exp(-b x) + c \exp(-d x) $$
it would be useful to have $b$ and $d$ values that differ so the Jacobian is not singular. Thus,
some sort of sequence like 1.0, 1.1, 1.2, 1.3 for the four parameters might be better, and it
can be provided quite simply instead of all 1's, as in the sketch below.
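A sketch of such a staggered default, assuming `pnames` holds the parameter
names extracted from the formula:
```
## Staggered default start instead of all ones:
start <- setNames(1 + 0.1 * (seq_along(pnames) - 1), pnames)
## e.g., for four parameters: 1.0 1.1 1.2 1.3
```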
```{r hobbsSSlogis}
weed <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972)
tt <- 1:12
NLSformula0 <- y ~ b1/(1+b2*exp(-b3*tt))
NLSformula <- y ~ SSlogis(tt, Asym, xmid, scal)
NLSformulax <- y ~ Asym/(1+exp((xmid-tt)/scal))
NLStestdata <- data.frame(y=weed, tt=tt) # should we use standard name?
s0 <- getInitial(NLSformula, NLStestdata)
print(s0)
s1<-list(b1=s0[1], b2=exp(s0[2]/s0[3]), b3=1/s0[3])
print(as.numeric(s1))
# No actual improvement because nls() has been already used to get the starting values,
# but we do get SEs
hobblog<-nls(NLSformula0, data=NLStestdata, start=s1)
summary(hobblog)
deviance(hobblog)
# nls fails without selfStart -- singular gradient
try(hobblogx<-nls(NLSformulax, data=NLStestdata))
# But Marquardt is able to get a solution easily
library(nlsr)
hobblogxx<-nlxb(NLSformulax, data=NLStestdata, start=c(Asym=1, xmid=1, scal=1))
hobblogxx
```
### selfStart models in base R
The following models are provided (in file ./src/library/stats/R/zzModels.R)
```
SSasymp - asymptotic regression model
SSasympOff - alternate formulation of asymptotic regression model with offset
SSasympOrig - exponential curve through the origin to an asymptote
SSbiexp - y ~ A1 * exp(-exp(lrc1)*input) + A2 * exp(-exp(lrc2) * input)
SSfol - y ~ Dose * (exp(lKe + lKa - lCl) * (exp(-exp(lKe) * input) -
exp(-exp(lKa) * input))/(exp(lKa) - exp(lKe)))
SSfpl - four parameter logistic model
SSlogis - three parameter logistic model
SSmicmen - Michaelis-Menten model for enzyme kinetics
SSgompertz - Gompertz model for growth curve data
SSweibull - Weibull model for growth curve data
```
### Strategic issues in selfStart models
Because the Gauss-Newton algorithm is rather unreliable from many starting sets
of parameters, selfStart models are not merely an accessory to `nls()` but a part
of the infrastructure. However, creating such functions is a lot of work, and their
documentation (file `./src/library/stats/man/selfStart.Rd`) is quite complicated.
We believe that the focus would better be placed on getting good initial parameters,
possibly with some interactive tools. That is, the emphasis should be on the
`getInitial()` function, though avoiding the current calls back to `nls()`.
## Issue: documentation of the results of running nls()
The output of `nls()` is an object of class "nls" which has the following structure:
### nls() result output according to the documentation
```
A list of:
m an nlsModel object incorporating the model.
data the expression that was passed to nls as the data argument. The actual data values
are present in the environment of the m components, e.g., environment(m$conv).
call the matched call with several components, notably algorithm.
na.action the "na.action" attribute (if any) of the model frame.
dataClasses the "dataClasses" attribute (if any) of the "terms" attribute of the model frame.
model if model = TRUE, the model frame.
weights if weights is supplied, the weights.
convInfo a list with convergence information.
control the control list used, see the control argument.
There are also two deprecated items, returned only for an algorithm = "port" fit: these are
convergence (a code, 0 indicating convergence) and message. The same information is available from convInfo.
```
### Example output
To illustrate, let us run the Croucher example.
```{r nlsoutx}
# Croucher example
xdata <- c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9)
ydata <- c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001)
p1<- 1; p2<-0.2; NLSstart<-list(p1=p1,p2=p2)
NLSformula <- ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata)
NLSdata<-data.frame(xdata, ydata)
# Try full output version of nls
library(nlspkg) # use the packaged version of nls()
result<-nls(NLSformula, data=NLSdata, start=NLSstart, model=TRUE)
# str(result) -- displays large amount of material - suppressed here
# as it is too wide for the page
result
ls(result) # to list the elements of the output
ls(result$m) # and in particular the "m" object
```
### Concerns with content of the nls result object
The nls object contains some elements that are awkward for other algorithms to produce.
Moreover, some information that would be useful is not presented in an obvious way.
In the following, we use `result` as the returned object from `nls()`.
The `data` return element is an R symbol. To actually access the data from this
element, we need to use syntax
```
eval(parse(text=result$data))
```
However, if the call is made with `model=TRUE`, then there is a returned element
`model` which contains the data, and we can see its contents using
```
ls(result$model)
```
and if there is an element called `xdata`, it can be accessed as `result$model$xdata`.
### Information that is NOT in the nls result object
`nlsr::nlxb()` solves ostensibly the same problem as `nls()` but only claims to return
```
coefficients A named vector giving the parameter values at the supposed solution.
ssquares The sum of squared residuals at this set of parameters.
resid The residual vector at the returned parameters.
jacobian The jacobian matrix (partial derivatives of residuals w.r.t. the
parameters) at the returned parameters.
feval The number of residual evaluations (sum of squares computations) used.
jeval The number of Jacobian evaluations used.
```
However, actually looking at the structure of a returned result gives a list of 11
items:
```
$ resid : num [1:12] 0.0119 -0.0328 0.092 0.2088 0.3926 ...
..- attr(*, "gradient")= num [1:12, 1:3] 0.0271 0.0367 0.0496 0.0666 0.089 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:3] "Asym" "xmid" "scal"
$ jacobian : num [1:12, 1:3] 0.0271 0.0367 0.0496 0.0666 0.089 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "Asym" "xmid" "scal"
$ feval : num 31
$ jeval : num 23
$ coefficients: Named num [1:3] 196.19 12.42 3.19
..- attr(*, "names")= chr [1:3] "Asym" "xmid" "scal"
$ ssquares : num 2.59
$ lower : num [1:3] -Inf -Inf -Inf
$ upper : num [1:3] Inf Inf Inf
$ maskidx : int(0)
$ weights : NULL
$ formula :Class 'formula' language y ~ Asym/(1 + exp((xmid - tt)/scal))
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
- attr(*, "class")= chr "nlsr"
```
This is still a smaller result object than the one `nls()` returns. Moreover, `nlxb`
explicitly returns the sum of squares as well as the residual vector and Jacobian. The
counts of evaluations are also returned. Working on this project showed several potential
updates to the `nlsr` documentation.
### Weights in returned functions from nls()
The functions `resid()` (an alias for `residuals()`), `fitted()` and `lhs()` return UNWEIGHTED quantities.
But if we return `ans` from `nls()` or `minpack.lm::nlsLM` or our new `nlsj` (interim package),
then `ans$m$resid()` is WEIGHTED.
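For example, continuing the Croucher example above with hypothetical weights
(in our reading of the code, `m$resid()` scales each residual by the square
root of its weight):
```
wfit <- nls(NLSformula, data = NLSdata, start = NLSstart,
            weights = rep(0.5, nrow(NLSdata)))
head(resid(wfit))     # unweighted residuals
head(wfit$m$resid())  # weighted: sqrt(0.5) times the unweighted values
```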
### Interim output from the "port" algorithm
As the `nls()` **man** page states, when the "port" algorithm is used with the `trace` argument
TRUE, the iterations display the objective function value which is 1/2 the sum of squares (or
deviance). It is likely that the trace display is embedded in the Fortran of the `nlminb`
routine that is called to execute the "port" algorithm, but the discrepancy is nonetheless
unfortunate for users.
### Failure to return best result achieved
If `nls()` reaches a point where it cannot continue but has not found a point where the
relative offset convergence criterion is met, it may simply exit, especially if a
"singular gradient" (singular Jacobian) is found. However, this may occur AFTER the
function has made considerable progress in reducing the sum of squared residuals.
An example is to be found in the `Tetra_1.R` example from the `nlsCompare` package.
Here is an abbreviated version of that problem and the `nls()` output:
```{r tetrarun}
time=c( 1, 2, 3, 4, 6 , 8, 10, 12, 16)
conc = c( 0.7, 1.2, 1.4, 1.4, 1.1, 0.8, 0.6, 0.5, 0.3)
NLSdata <- data.frame(time,conc)
NLSstart <-c(lrc1=-2,lrc2=0.25,A1=150,A2=50) # a starting vector (named!)
NLSformula <-conc ~ A1*exp(-exp(lrc1)*time)+A2*exp(-exp(lrc2)*time)
tryit <- try(nls(NLSformula, data=NLSdata, start=NLSstart, trace=TRUE))
print(tryit)
```
Note that the sum of squares has been reduced from 61216 to 1.6211, but
unless `trace` is invoked, the user will not get any information about this.
Returning the best parameters found would be an almost trivial change to the `nls()` function and could be
useful to R users.
## Feature: partially linear models and their specification
Specifying a model to a solver should, ideally, use the same syntax across
solver tools. Unfortunately, R allows multiple approaches.
One obvious case is that nonlinear modeling tools are a superset of linear ones.
Yet the explicit model
```
y ~ a*x + b
```
does not work with the linear modeling function `lm()`, which requires
this model to be specified as
```
y ~ x
```
However, even within `nls()`, we see annoying inconsistencies. Consider the
following FOUR different calling sequences for the same problem, though the
second is to illustrate how one intuitive choice will not work. In this failed
attempt, putting the `Asym` parameter in the model causes the `plinear` algorithm
to try to add another term to the model. We believe this is unfortunate, and would
like to see a consistent syntax. At the time of writing (end of August 2021) we do
not have a resolution for this issue.
```{r log4ways, echo=TRUE}
DNase1 <- subset(DNase, Run == 1)
## using a selfStart model - do not specify the starting parameters
fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
summary(fm1DNase1)
## the coefficients only:
coef(fm1DNase1)
## including their SE, etc:
coef(summary(fm1DNase1))
## using conditional linearity - leave out the Asym parameter
fm2DNase1 <- nls(density ~ 1/(1 + exp((xmid - log(conc))/scal)),
data = DNase1,
start = list(xmid = 0, scal = 1),
algorithm = "plinear")
summary(fm2DNase1)
## using conditional linearity AND Asym does NOT work
fm2aDNase1 <- try(nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = DNase1,
start = list(Asym=3, xmid = 0, scal = 1),
algorithm = "plinear",
trace = TRUE))
summary(fm2aDNase1)
## without conditional linearity
fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = DNase1,
start = list(Asym = 3, xmid = 0, scal = 1))
summary(fm3DNase1)
## using Port's nl2sol algorithm
fm4DNase1 <- try(nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = DNase1,
start = list(Asym = 3, xmid = 0, scal = 1),
algorithm = "port"))
summary(fm4DNase1)
```
## Issue: code structure
The `nls()` code is structured in a way that inhibits both maintenance and improvement.
In particular, the iterative setup is such that the introduction of a Marquardt stabilization
is not easily accomplished.
To obtain performance, a lot of the code is in C with consequent calls and returns that
complicate the code. Over time, R has become much more efficient on modern computers, and
the need to use compiled C and Fortran is less critical. Moreover, the burden for maintenance
could be much reduced by moving code entirely to R.
## Issue: code documentation for maintenance
One example is `setPars()`, which has weaknesses that should be explained; it is used only by `profile.nls()`.
The paucity of documentation is exacerbated by the mixed R/C/Fortran code base.
Following is an email to John Nash from Doug Bates. This is NOT a criticism
of Prof. Bates work, but a reflection on how difficult it is to develop code in this
subject area and to keep it maintainable. We have experienced similar loss of understanding
for some of our own codes.
```
I'm afraid that I don't know the purpose of the recursive call either.
I know that I wrote the code to use a closure for the response, covariates, etc.,
but I don't recall anything like a recursive call being necessary.
If the R sources were in a git repository I might try to use `git blame`
to find out when and by whom that was written but they are in an SVN repository,
I think, and I haven't used it for a long, long time.
I don't think I will be of much help. My R skills have atrophied to the point
where I wouldn't even know how to start exploring what is happening in the first
call as opposed to the recursive call.
```
This was in response to the email below, sent to Doug Bates by John Nash on June 29, 2021.
```
Thanks.
https://gitlab.com/nashjc/improvenls/-/blob/master/Croucher-expandednlsnoc.R
<https://gitlab.com/nashjc/improvenls/-/blob/master/Croucher-expandednlsnoc.R>
This has the test problem and the expanded code. Around line 367 is where we are
scratching our heads. The function code (from nlsModel()) is in the commented lines below
the call. This is
# > setPars
# function(newPars) {
# setPars(newPars)
# resid <<- .swts * (lhs - (rhs <<- getRHS())) # envir = thisEnv {2 x}
# dev <<- sum(resid^2) # envir = thisEnv
# if(length(gr <- attr(rhs, "gradient")) == 1L) gr <- c(gr)
# QR <<- qr(.swts * gr) # envir = thisEnv
# (QR$rank < min(dim(QR$qr))) # to catch the singular gradient matrix
# }
I'm anticipating that we will be able to set up a (possibly inefficient) code
with documentation that will be easier to follow and test, then gradually figure
out how to make it more efficient.
The equivalent from minpack.lm is
setPars = function(newPars) {
setPars(newPars)
assign("resid", .swts * (lhs - assign("rhs", getRHS(),
envir = thisEnv)), envir = thisEnv)
assign("dev", sum(resid^2), envir = thisEnv)
assign("QR", qr(.swts * attr(rhs, "gradient")), envir = thisEnv)
return(QR$rank < min(dim(QR$qr)))
}
In both there is the recursive call, which must have a purpose I don't understand.
```
## Feature: indexed parameters
The **man** file for `nls()` includes the following example of a situation in which
parameters are indexed. It also uses the "plinear" option as an added complication.
Here we use a truncated version of the example to save display space.
```{r nlsindx1}
## The muscle dataset in MASS is from an experiment on muscle
## contraction on 21 animals. The observed variables are Strip
## (identifier of muscle), Conc (Cacl concentration) and Length
## (resulting length of muscle section).
if(! requireNamespace("MASS", quietly = TRUE)) stop("Need MASS pkg")
mm<- MASS::muscle[1:12,] # take only 1st few values of Strip
str(mm)
mm<-droplevels(mm)
str(mm)
withAutoprint({
## The non linear model considered is
## Length = alpha + beta*exp(-Conc/theta) + error
## where theta is constant but alpha and beta may vary with Strip.
with(mm, table(Strip)) # 2, 3 or 4 obs per strip
nl <- nlevels(mm$Strip)
## We first use the plinear algorithm to fit an overall model,
## ignoring that alpha and beta might vary with Strip.
musc.1 <- nls(Length ~ cbind(1, exp(-Conc/th)), mm,
start = list(th = 1), algorithm = "plinear")
summary(musc.1)
## Then we use nls' indexing feature for parameters in non-linear
## models to use the conventional algorithm to fit a model in which
## alpha and beta vary with Strip. The starting values are provided
## by the previously fitted model.
## Note that with indexed parameters, the starting values must be
## given in a list (with names):
## ?? but why use b here AND in the new formula??
b <- coef(musc.1)
musc.2 <- nls(Length ~ a[Strip] + b[Strip]*exp(-Conc/th), data=mm,
start = list(a = rep(b[2], nl), b = rep(b[3], nl), th = b[1]))
summary(musc.2)
})
```
Note that the returned parameters are NOT indexed. For example, `coef(musc.2)` is a single flat
vector of parameters: we do not see `a[1], a[2], a[3]` but `a1, a2, a3`. This is because the model
must combine all the parameters, since `th` is common across the levels of the index `Strip`.
We believe this structure is quite likely to cause confusion and error, and propose an alternative
approach below.
# Goals of our effort
Here are some of the goals we hope to accomplish.
## Code rationalization and documentation
We want
- to provide a packaged version of `nls()` (call it `nlsalt`) coded entirely in R
that matches the version in base R or what is packaged in `nlspkg` as described
in the "PkgFromRbase" document.
- to try to obtain a cleaner structure for the overall `nls()` infrastructure. By this we
mean a re-factoring of the routines so they are better suited to maintenance of both
the existing `nls()` methods and features as well as the new features we would like
to add.
- to try to explain what we do, either in comments or separate maintainer documentation.
Since we are complaining about the lack of explanatory material for the current
code, we feel it is incumbent on us to provide such material for our own work, and if
possible for the existing code.
### Rationalization of formula specifications
Below in "Consistent specifications of partially linear models" we point out that
`nls()` uses a different formula specification from the default for a problem if the
`plinear` algorithm is used. This is unfortunate, since the user cannot then simply
add `algorithm="plinear"` to the call. Moreover, we believe it makes errors more
likely. We suggest a possible approach to avoiding this issue, but have yet to
cast this into working code (2021-8-19).
<!-- ``` -->
<!-- y ~ x + z + x*z -->
<!-- y ~ a + b*x +c*z + d * (x*z) # <-- fully specified -->
<!-- y ~ a*Fn1(b, c, x, z)/Fn2() -->
<!-- ``` -->
### Rationalization of indexed models
Indexed models clearly have a place in some areas of research. However, the current
approach in `nls()` is awkward. The user must use DIFFERENT formulas depending on
whether the `plinear` algorithm is chosen. This seems to be related to the need for
that approach to use the `lm()` function or its infrastructure, thus employing
formulas that omit the parameters of models and simply give the variables or functions
thereof, such as interaction terms.
Users, we believe, are much more likely to be comfortable with fully specified formulas.
And such formulas are needed by nonlinear least squares functions such as `minpack.lm::nlsLM`
and `nlsr::nlxb`. Thus we would like the identification of the linear parameters to be,
if possible, automated. At the very least, we should be able to use a structure like
```
algorithm="plinear(parmx, thetaz)"
```
to allow for the full formula to be used with the linear parameters identified.
## Provide tests
We need suitable tests in order:
- to ensure our new `nlsalt` or related packages work properly, in particular, giving
results comparable to or better than the `nls()` in base R or `nlspkg`;
- to test individual solver functions to ensure they work across the range of calling
mechanisms, that is, different ways of supplying inputs to the solver(s);
- to pose "silly" inputs to nonlinear least squares solvers (in R) to see if
these bad inputs are caught as exceptions by the programs.
### A test runner program
?? Arkajyoti -- do you want to expand?
When we have a "new" or trial solver function, we would like to know if it gives
acceptable results on a range of sample problems of different types, starting
parameters, input conditions, constraints, subsets, weights or other settings.
Ideally we want to be able to get a summary that is easy to read and assess. For
example, one approach would be to list the names of a set of tests with a red, green
or yellow dot beside the name for FAILURE, SUCCESS, or "NOT APPLICABLE". In the last
category would be a problem with constraints that the solver is not designed to
handle.
To accomplish this, we need a suitable "runner" program that can be supplied with
the name of a solver or solvers and a list of test problem cases. Problems generally
have a base setup -- a specification of the function to fit as a formula, some data
and a default starting set of parameters. Other cases can be created by imposing
bounds or mask constraints, subsets of the data, and different starts.
How to set up this "runner" and its supporting infrastructure is non-trivial. While
the pieces are not as complicated as the inter-related parts of the solvers, especially
`nls()`, the categorization of tests, their documentation, and the structuring to make
running them straightforward and easy requires much attention to detail.
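To make the idea concrete, here is a toy sketch of such a runner, under the
assumption that each solver is a function taking a formula, data and start, and
that each test case supplies those three pieces; a real runner would also flag
"NOT APPLICABLE" cases:
```
run_tests <- function(solvers, cases) {
  grid <- expand.grid(solver = names(solvers), case = names(cases),
                      stringsAsFactors = FALSE)
  grid$status <- vapply(seq_len(nrow(grid)), function(i) {
    cs <- cases[[grid$case[i]]]
    ans <- try(solvers[[grid$solver[i]]](cs$formula, cs$data, cs$start),
               silent = TRUE)
    if (inherits(ans, "try-error")) "FAILURE" else "SUCCESS"
  }, character(1))
  grid   # one row per (solver, case) pair with a status flag
}
```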
Some considerations for our test scripts:
- Is it useful to have a "base" script for each family of test problem, with numbered particular
cases? That is, if we run the scripts in order, we can avoid some duplication of code and
data.
- While we have developed some tags to document the test problem families and cases, we believe
that such tags (essentially summary documentation) will continue to need revision as different
tools and problems are included in scope of `nlsCompare`.
- Similarly, we expect that there will be ongoing review of the structure of the result files.
# Outputs of the project
The project output is available in the Git repository https://gitlab.com/nashjc/improvenls
?? Arkajyoti -- do you want to change to Github
## Formal reports or documentation
- RefactoringNLS.Rmd: this document, which will become the main report of the project.
- TestsDoc.Rmd: a survey of testing tools in R. It has more general possibilities
and fits into the subject of regression testing, in which case a
more extensive literature review will be needed. Note that this
document reflects the work described in the "Problem sets and test infrastructure" section
below.
<!-- need to incorporate MachID and -->
<!-- the many tests. Arkajyoti: should we think of working this up into a paper for -->
<!-- JSS or the R-Journal? My -->
<!-- view is that this COULD be a long term side-interest for your academic work, but -->
<!-- that would depend on your own interests as well as opportunities. -->
## Informal reports
These are documents used to discuss particular aspects of our work; they are part
of the repository https://gitlab.com/nashjc/improvenls.
- DerivsNLS.Rmd: a document to explain different ways in which Jacobian information
is supplied to nonlinear least squares computation in R. File `ExDerivs.R` is
a DRAFT of a script to provide examples.
- ImproveNLS.bib: a consolidated BibTeX bibliography for all documents in this
project, possibly with wider application to nonlinear least squares in general
- MachineSummary.Rmd: an informal investigation of ways to report the characteristics
and identity of machines running tests. `MachID.R` offers a concise summary function
to identify a particular computational system used for tests.
Note that there remains an open issue: `get_ram()` in the `benchmarkme`
package may report the RAM size as "NA" under Windows 10. We are collaborating with Colin Gillespie,
maintainer of `benchmarkme`, to seek a workaround.
- VarietyInNonlinearLeastSquaresCodes.Rmd: a review of the different algorithms and
the many choices in their implementation for nonlinear least squares. This is still
a DRAFT as of 2021-8-20.
- PkgFromRbase.Rmd: an explanation of the construction of the `nlspkg` from the
code in R-base.
<!-- We could think of putting this in R-bloggers or similar. Or if -->
<!-- we came up with a more general template, then an R-Journal article. -->
- WorkingDocument4ImproveNLS.Rmd: essentially a way to record what we have worked on.
A project diary.
## Problem sets and test infrastructure
We have several test problems and variants thereof in the `inst/scripts/` directory of
the `nlsCompare` package available on Github (https://github.com/ArkaB-DS/nlsCompare).
We direct the reader to that package for documentation of the test infrastructure,
in particular the problems and methods files (`problems.csv` and `methods.csv`) and
the various functions invoked by `run.R` to produce an output file, also in CSV form.