---
title: "Refactoring the `nls()` function in R"
author:
- John C. Nash \thanks{ retired professor, Telfer School of Management, University of Ottawa}
- Arkajyoti Bhattacharjee \thanks{Department of Mathematics and Statistics, Indian Institute of Technology, Kanpur}
date: "2021-8-20"
output:
pdf_document:
keep_tex: false
toc: true
bibliography: ImproveNLS.bib
link-citations: yes
linkcolor: red
urlcolor: blue
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
## require(bookdown) # language engine to display text - does not seem necessary
```
# Abstract
This article reports the particular activities of our Google Summer of Code
project "Improvements to `nls()`" that relate to the R code for that function, which
is intended for the estimation of models written as a formula that has at least
one parameter that is "nonlinear", that is, not estimable via solving a set of
linear equations. A companion document **Variety in Nonlinear Least Squares Codes**
presents an overview of methods for the problem which takes a much wider view of the
problem of minimizing a function that can be written as a sum of squared
terms.
Our work has not fully addressed all the issues that we would like to see
resolved, but we believe we have made sufficient progress to demonstrate that
there are worthwhile improvements that can be made to the R function `nls()`.
An important overall consideration in our work has been the maintainability of
the code base that supports the `nls()` functionality, as we believe that the
existing code makes maintenance and improvement very difficult.
# The existing `nls()` function: strengths and shortcomings
`nls()` is the tool in base R (the software distributed from [CRAN](https://cran.r-project.org))
for estimating nonlinear statistical models. The function was developed mainly in the 1980s and
1990s by Doug Bates et al., initially for S (see https://en.wikipedia.org/wiki/S_%28programming_language%29).
The ideas spring primarily from the book by @bateswatts.
The `nls()` function has a remarkable and quite comprehensive set of capabilities for estimating
nonlinear models that are expressed as formulas. In particular, we note that it
- handles formulas that include R functions
- allows data to be subset
- permits parameters to be indexed over a set of related data
- produces measures of variability (i.e., standard error estimates) for the estimated parameters
- has related profiling capabilities for exploring the likelihood surface as parameters are changed
With such a range of features and a long history, it is not surprising that the code has become untidy
and overly patched. It is, to
our mind, essentially unmaintainable. Moreover, its underlying methods can and should be improved. Let
us review some of the issues. We will then propose corrective actions, some of which we have carried out.
## Feature: Convergence and termination tests
Within the standard documentation (**man** or ".Rd" file) `nls()` warns
> **The default settings of nls generally fail on artificial “zero-residual” data problems.**
> The nls function uses a relative-offset convergence criterion that compares the numerical imprecision at
the current parameter estimates to the residual sum-of-squares. This performs well on data of the form $$y = f(x, \theta) + \epsilon$$
> (with $\mathrm{var}(\epsilon) > 0$). It fails to indicate convergence on data of the form $$y = f(x, \theta)$$
> because the criterion amounts to comparing two components of the round-off error. To avoid a zero-divide in
computing the convergence testing value, a positive constant `scaleOffset` should be added to the denominator
sum-of-squares; it is set in control; this does not yet apply to algorithm = "port".
It turns out that this issue can be quite easily resolved. The key "convergence test" -- more properly
a "termination test" for the **program** rather than testing for convergence of the underlying **algorithm** --
is the Relative Offset Convergence Criterion (see @BatesWatts81). This works by projecting the proposed
step in the parameter vector on the gradient and estimating how much the sum of squares loss function
will decrease. To avoid scale issues, we use the current size of the loss function as a measure and divide
by it. When we have "converged", the estimated
decrease is very small, as usually is its ratio to the sum of squares. However, when the model can
fit the data (almost) exactly, the sum of squares is (almost) zero and the test risks a
zero-divide failure.
The issue is easily resolved by adding a small positive quantity to the divisor of the test. To preserve legacy behaviour,
in 2021, one of us (J. Nash) proposed that `nls.control()` have an additional parameter `scaleOffset`
with a default value of zero for legacy behaviour. Setting it to a small number -- 1.0 is
a reasonable choice --
allows small-residual problems (i.e., near-exact fits) to be dealt with easily. We call this the
**safeguarded relative offset convergence criterion**.
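In schematic form (a sketch of the idea only, not the exact computation inside `nls()`):
```
## Sketch: 'pred_decrease' is the reduction in the sum of squares
## predicted by projecting the proposed step on the gradient; 'ss' is
## the current sum of squares; 'ctol' is the convergence tolerance.
## scaleOffset > 0 guards the denominator for (near-)zero-residual fits.
criterion <- pred_decrease / (scaleOffset + ss)
converged <- criterion < ctol
```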
We are pleased to report that this improvement is now in the R distributed code.
### Example of a small-residual problem
```
rm(list=ls())
t <- -10:10
y <- 100/(1+.1*exp(-0.51*t))
lform<-y~a/(1+b*exp(-c*t))
ldata<-data.frame(t=t, y=y)
plot(t,y)
lstartbad<-c(a=1, b=1, c=1) # a deliberately poor start (not used in the output below)
lstart2<-c(a=100, b=10, c=1)
nlsr::nlxb(lform, data=ldata, start=lstart2)
nls(lform, data=ldata, start=lstart2, trace=TRUE)
# Fix with scaleOffset
nls(lform, data=ldata, start=lstart2, trace=TRUE, control=list(scaleOffset=1.0))
sessionInfo()
```
Edited output from running this script follows:
```
> rm(list=ls())
> t <- -10:10
> y <- 100/(1+.1*exp(-0.51*t))
> lform<-y~a/(1+b*exp(-c*t))
> ldata<-data.frame(t=t, y=y)
> plot(t,y)
> lstart2<-c(a=100, b=10, c=1)
> nlsr::nlxb(lform, data=ldata, start=lstart2)
nlsr object: x
residual sumsquares = 1.007e-19 on 21 observations
after 13 Jacobian and 19 function evaluations
name coeff SE tstat pval gradient JSingval
a 100 2.679e-11 3.732e+12 1.863e-216 -6.425e-11 626.6
b 0.1 3.78e-13 2.646e+11 9.125e-196 -3.393e-08 112.3
c 0.51 6.9e-13 7.391e+11 8.494e-204 1.503e-08 2.791
# Note that this has succeeded. The test in nlsr recognizes small residual problems.
> nls(lform, data=ldata, start=lstart2, trace=TRUE)
40346. (1.08e+00): par = (100 10 1)
11622. (2.93e+00): par = (101.47 0.49449 0.71685)
5638.0 (1.08e+01): par = (102.23 0.38062 0.52792)
642.08 (1.04e+01): par = (102.16 0.22422 0.41935)
97.712 (1.79e+01): par = (100.7 0.14774 0.45239)
22.250 (1.78e+02): par = (99.803 0.093868 0.50492)
0.025789 (1.33e+03): par = (100.01 0.10017 0.50916)
6.0571e-08 (7.96e+05): par = (100 0.1 0.51)
4.7017e-19 (1.86e+04): par = (100 0.1 0.51)
1.2440e-27 (5.71e-01): par = (100 0.1 0.51)
  ... (this line repeats unchanged until the iteration limit is reached) ...
Error in nls(lform, data = ldata, start = lstart2, trace = TRUE) :
number of iterations exceeded maximum of 50
> nls(lform, data=ldata, start=lstart2, trace=TRUE, control=list(scaleOffset=1.0))
40346. (1.08e+00): par = (100 10 1)
11622. (2.91e+00): par = (101.47 0.49449 0.71685)
5638.0 (9.23e+00): par = (102.23 0.38062 0.52792)
642.08 (5.17e+00): par = (102.16 0.22422 0.41935)
97.712 (2.31e+00): par = (100.7 0.14774 0.45239)
22.250 (1.11e+00): par = (99.803 0.093868 0.50492)
0.025789 (3.79e-02): par = (100.01 0.10017 0.50916)
6.0571e-08 (5.80e-05): par = (100 0.1 0.51)
4.7017e-19 (1.62e-10): par = (100 0.1 0.51)
Nonlinear regression model
model: y ~ a/(1 + b * exp(-c * t))
data: ldata
a b c
100.00 0.10 0.51
residual sum-of-squares: 4.7e-19
Number of iterations to convergence: 8
Achieved convergence tolerance: 1.62e-10
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.2
```
### More general termination tests
The single convergence criterion of `nls()` leaves out some possibilities that could
be useful for some problems. The package `nlsr` (@nlsr-manual) already offers both
the safeguarded relative offset test (**roffset**) as well as a **small sum of
squares** test (**smallsstest**) that compares the latest evaluated sum of squared
(weighted) residuals to a very small multiple of the initial sum of squares. The
multiple is based on a control setting `offset`, which defaults to 100.0: we compute
the fourth power of this offset times the machine epsilon, i.e., `(offset * .Machine$double.eps)^4`.
```{r meps4, echo=TRUE}
epstol<-100*.Machine$double.eps
e4 <- epstol^4
e4
```
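Schematically (our reading of the `nlsr` behaviour), the small sum of squares test
then terminates when the latest weighted sum of squares falls below this tiny
fraction of the initial one:
```
## Sketch only: ss_current and ss_initial are the latest and initial
## (weighted) sums of squared residuals; e4 is computed as above.
converged <- ss_current <= e4 * ss_initial
```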
We do note that `nls()` stops after `maxiter` "iterations". However, for almost all
iterative algorithms, the meaning of "iteration" requires careful examination of the
code. Instead, we prefer to record the number of times the residuals or the Jacobian have
been computed and put upper limits on these. Our codes exit (terminate) when these limits
are reached. Generally we prefer larger limits than the default `maxiter=50` of `nls()`,
but that may simply reflect our history of dealing with more difficult problems as we are
the tool-makers users consult when things go wrong.
## Feature: Failure when Jacobian is computationally singular
This is the infamous "singular gradient" termination. A Google search of
```
R nls "singular gradient"
```
gets over 4000 hits that are spread over the years.
In some cases this is due
to the failure of the simple finite difference approximation of the Jacobian in the
`numericDeriv()` function that is a part of `nls()`. `nlsr` can use analytic
derivatives, and we can import this functionality to the `nls()` code as an
improvement. See below in the section **Jacobian computation**.
However, the more common source of the issue is that the Jacobian is very close
to singular for some values of the model parameters. In such cases we need to find
an alternative algorithm to the Gauss-Newton iteration of `nls()`. The most common
work-around is the Levenberg-Marquardt stabilization (see @Marquardt1963, @Levenberg1944,
@jn77ima). Versions of this have been implemented in packages `minpack.lm` and `nlsr`,
and we have preliminary versions of an `nls` replacement that can incorporate a
version of the Levenberg-Marquardt stabilization. (There are some issues of
integration with other code structures and of complexity of the computations that
suggest we should use a simplified LM stabilization.)
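To make the idea concrete, here is a minimal sketch of one Levenberg-Marquardt
step; it is illustrative only and not the code of any of the packages named
above. `rfun` and `jfun` are assumed to return the residual vector and the
Jacobian, and `lambda` is the stabilization parameter.
```
lm_step <- function(rfun, jfun, theta, lambda) {
  r <- rfun(theta)                     # residual vector
  J <- jfun(theta)                     # Jacobian matrix
  JtJ <- crossprod(J)                  # J'J of the Gauss-Newton equations
  A <- JtJ + lambda * diag(diag(JtJ))  # Marquardt's diagonal augmentation
  delta <- solve(A, -crossprod(J, r))  # stabilized step
  theta + as.numeric(delta)
}
```
As `lambda` grows, the step shrinks toward a scaled steepest-descent direction,
which is what rescues the iteration when the Jacobian is near-singular.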
## Feature: Jacobian computation
`nls()`, with the `numericDeriv()` function, computes the Jacobian as the "gradient"
attribute of the residual vector. This is implemented as a mix of R and C code, but
we have created a rather more compact version entirely in R in this Google Summer of
Code project. See the document **DerivsNLS.pdf**.
```{r code=xfun::read_utf8('Tests/Examples/badJlogmod.R')}
```
It should be noted that the selfStart models in the `./src/library/stats/R/zzModels.R` file
provide the Jacobian in the "gradient" attribute of the "one-sided" formula that defines each
model, and these Jacobians are the analytic forms. The `nls()` function, after computing the
"right hand side" or `rhs` of the residual, then checks to see if the "gradient" attribute is
defined, and, if not, uses `numericDeriv` to compute a Jacobian into that attribute. This code
is within the `nlsModel()` or `nlsModel.plinear()` functions.
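A small self-contained illustration of `numericDeriv()` (freely adapted from its manual page):
```
## numericDeriv() evaluates an expression in an environment and attaches
## a finite-difference Jacobian as the "gradient" attribute.
myenv <- new.env()
myenv$a <- 1; myenv$b <- 0.5; myenv$x <- 1:5
nd <- numericDeriv(quote(a * exp(-b * x)), c("a", "b"), myenv)
attr(nd, "gradient")   # 5 x 2 approximate Jacobian
```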
## Feature: Subsetting
`nls()` accepts an argument `subset`. Unfortunately, this acts through the mediation of
`model.frame` and its operation is not obvious in the source code files `/src/library/stats/R/nls.R` and
`/src/library/stats/src/nls.c`.
While the implementation of subset at the level of the call to `nls()` has a certain
attractiveness, it does mean that the programmer of the solver needs to be aware of the
source (and value) of objects such as the data, residuals and Jacobian. By preference,
we would implement subsetting by means of zero-value weights, with observation counts
(and degrees of freedom) computed via the non-zero weights.
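A sketch of that approach, with hypothetical objects `form`, `dat` and `st`
(formula, data, start); a solver implementing it would base observation counts
on the nonzero weights:
```
## Drop observations 3 and 7 via zero weights rather than subset=.
w <- rep(1, nrow(dat)); w[c(3, 7)] <- 0
fit <- nls(form, data = dat, start = st, weights = w)
## Degrees of freedom should then use sum(w > 0), not nrow(dat).
```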
## Feature: na.action
`na.action` is an argument to the `nls()` function, but it does not appear obviously in the
source code, being handled largely behind the scenes via the `na.action` option.
A useful, but possibly dated, description is given in:
https://stats.idre.ucla.edu/r/faq/how-does-r-handle-missing-values/
The typical default action, which can be seen by using the command `getOption("na.action")`
is `na.omit`. This option omits from the computations every observation containing a
missing value (i.e., any row of a data frame with an NA).
`na.exclude` does much the same for computations, but keeps the rows with NA elements so
that predictions are in the correct row position. We recommend that workers actually test
output to verify behaviour is as wanted.
A succinct answer is given in: https://stats.stackexchange.com/questions/492955/should-i-use-na-omit-or-na-exclude-in-a-linear-model-in-r
> The only benefit of `na.exclude` over `na.omit` is that the former will retain the
original number of rows in the data. This may be useful where you need to retain the
original size of the dataset - for example it is useful when you want to compare
predicted values to original values. With `na.omit` you will end up with fewer
rows so you won't as easily be able to compare.
`na.pass` simply passes on data "as is", while `na.fail` will essentially stop if any missing
values are present.
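A small illustration of the difference, using `lm()` for simplicity and made-up data:
```
df <- data.frame(x = 1:6, y = c(1.1, NA, 2.9, 4.2, NA, 6.1))
f1 <- lm(y ~ x, data = df, na.action = na.omit)
f2 <- lm(y ~ x, data = df, na.action = na.exclude)
length(fitted(f1))  # 4 : rows with NA dropped entirely
length(fitted(f2))  # 6 : NAs kept so predictions stay in position
```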
## Feature: model frame
`model` is an argument to the `nls()` function, which is documented
> **model** logical. If true, the model frame is returned as part of the object. Default is FALSE.
Indeed, the argument only appears when `nls()` is about to return its result object, and the
element `model` is NULL unless the calling argument `model` is TRUE. (Using the same name could
be confusing.) However, the model frame is used within the function code in the form of an object
`mf`.
## Feature: sources of data
`nls()` can be called without specifying the `data` argument. In this case, it will
search in the available environments (i.e., workspaces) for suitable data objects.
We do NOT like this approach, but it is "the R way". R allows users to leave many
objects in the default (.GlobalEnv) workspace. Moreover, users have to actively
suppress saving this workspace (`.RData`) on exit, and any such file in the path
when R is launched will be loaded. The overwhelming proportion of R users in our
acquaintance avoid saving the workspace because of the danger of lurking data and
functions which may cause unwanted results.
Nevertheless, to provide compatible behaviour with `nls()`, we will need to ensure
that equivalent behaviour is guaranteed. Furthermore, we need to test that the
operation is correct.
## Feature: missing start vector and self-starting models
Nonlinear estimation algorithms are almost all iterative and need a set of starting
parameters. `nls()` offers a special class of modeling functions called **selfStart**
models. There are a number of these in base R (see list below) and others in R packages
such as CRAN package `nlraa` (@MiguezNLRAA2021), as well as the now-archived package
`NRAIA`. Unfortunately, the structure of the programming of
these is such that the methods by which initial parameters are computed are entangled
with the particularities of the `nls()` code. Though there is a `getInitial()` function,
this is not easy to use to simply compute the initial parameter estimates, in part
because it may call `nls()`.
In the example below, we show how the `SSlogis` selfStart function can generate a set
of initial parameters for a 3-parameter logistic curve. The form used by `SSlogis`
is $$ y \sim \mathrm{Asym}/(1+\exp((\mathrm{xmid}-t)/\mathrm{scal})) $$
The example shows how these starting parameters can be transformed to those of
another form of the model, namely, $$ y \sim b_1/(1 + b_2 \exp(-b_3 t)) $$
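Matching the two forms term by term gives $b_1 = \mathrm{Asym}$, $b_2 = \exp(\mathrm{xmid}/\mathrm{scal})$
and $b_3 = 1/\mathrm{scal}$; this is the transformation applied to `s0` in the example chunk below.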
Let us look at the actual code for `SSlogis()` in `R-devel/src/library/stats/R/zzModels.R`:
```
SSlogis <- # selfStart(~ Asym/(1 + exp((xmid - input)/scal)),
    selfStart(
        function(input, Asym, xmid, scal)
        {
            .expr1 <- xmid - input
            .expr3 <- exp(.e2 <- .expr1/scal)
            .expr4 <- 1 + .expr3
            .value <- Asym/.expr4
            .actualArgs <- as.list(match.call()[c("Asym", "xmid", "scal")])
            if(all(vapply(.actualArgs, is.name, NA)))
            {
                .expr10 <- .expr4^2
                .grad <- array(0, c(length(.value), 3L),
                               list(NULL, c("Asym", "xmid", "scal")))
                .grad[, "Asym"] <- 1/.expr4
                .grad[, "xmid"] <- - (xm <- Asym * .expr3/scal/.expr10)
                .grad[, "scal"] <- xm * .e2
                dimnames(.grad) <- list(NULL, .actualArgs)
                attr(.value, "gradient") <- .grad
            }
            .value
        },
        initial = function(mCall, data, LHS, ...) {
            xy <- sortedXyData(mCall[["input"]], LHS, data)
            if(nrow(xy) < 4) {
                stop("too few distinct input values to fit a logistic model")
            }
            z <- xy[["y"]]
            ## transform to proportion, i.e. in (0,1) :
            rng <- range(z); dz <- diff(rng)
            z <- (z - rng[1L] + 0.05 * dz)/(1.1 * dz)
            xy[["z"]] <- log(z/(1 - z))   # logit transformation
            aux <- coef(lm(x ~ z, xy))
            pars <- coef(nls(y ~ 1/(1 + exp((xmid - x)/scal)),
                             data = xy,
                             start = list(xmid = aux[[1L]], scal = aux[[2L]]),
                             algorithm = "plinear", ...))
            setNames(pars[c(".lin", "xmid", "scal")],
                     mCall[c("Asym", "xmid", "scal")])
        },
        parameters = c("Asym", "xmid", "scal"))
```
We note that the function includes analytic expressions for the Jacobian ("gradient").
These could possibly be useful to R users, especially if documented. Moreover, we
wonder why the programmers have chosen to save so many quantities in "hidden"
variables, i.e., with names preceded by ".". These are then not displayed by the `ls()`
command, making them difficult to access.
In the event that a selfStart model is not available, `nls()` sets all the starting parameters
to 1. This is, in our view, tolerable, but could be improved by using a set of values
that are slightly different: e.g., for a model $$ y \sim a \exp(-b x) + c \exp(-d x) $$
it would be useful to have $b$ and $d$ values that differ so the Jacobian is not singular. Thus,
some sort of sequence like 1.0, 1.1, 1.2, 1.3 for the four parameters might be better, and it
can be provided quite simply instead of all 1's, as in the sketch below.
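A sketch of such a staggered default, assuming `pnames` holds the parameter
names extracted from the formula:
```
## Staggered default start instead of all ones:
start <- setNames(1 + 0.1 * (seq_along(pnames) - 1), pnames)
## e.g., for four parameters: 1.0 1.1 1.2 1.3
```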
```{r hobbsSSlogis}
weed <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972)
tt <- 1:12
NLSformula0 <- y ~ b1/(1+b2*exp(-b3*tt))
NLSformula <- y ~ SSlogis(tt, Asym, xmid, scal)
NLSformulax <- y ~ Asym/(1+exp((xmid-tt)/scal))
NLStestdata <- data.frame(y=weed, tt=tt) # should we use standard name?
s0 <- getInitial(NLSformula, NLStestdata)
print(s0)
s1<-list(b1=s0[1], b2=exp(s0[2]/s0[3]), b3=1/s0[3])
print(as.numeric(s1))
# No actual improvement because nls() has been already used to get the starting values,
# but we do get SEs
hobblog<-nls(NLSformula0, data=NLStestdata, start=s1)
summary(hobblog)
deviance(hobblog)
# nls fails without selfStart -- singular gradient
try(hobblogx<-nls(NLSformulax, data=NLStestdata))
# But Marquardt is able to get a solution easily
library(nlsr)
hobblogxx<-nlxb(NLSformulax, data=NLStestdata, start=c(Asym=1, xmid=1, scal=1))
hobblogxx
```
### selfStart models in base R
The following models are provided (in file ./src/library/stats/R/zzModels.R)
```
SSasymp - asymptotic regression model
SSasympOff - alternate formulation of asymptotic regression model with offset
SSasympOrig - exponential curve through the origin to an asymptote
SSbiexp - y ~ A1 * exp(-exp(lrc1)*input) + A2 * exp(-exp(lrc2) * input)
SSfol - y ~ Dose * (exp(lKe + lKa - lCl) * (exp(-exp(lKe) * input) -
exp(-exp(lKa) * input))/(exp(lKa) - exp(lKe)))
SSfpl - four parameter logistic model
SSlogis - three parameter logistic model
SSmicmen - Michaelis-Menten model for enzyme kinetics
SSgompertz - Gompertz model for growth curve data
SSweibull - Weibull model for growth curve data
```
### Strategic issues in selfStart models
Because the Gauss-Newton algorithm is rather unreliable from many starting sets
of parameters, selfStart models are not merely an accessory to `nls()` but a part
of the infrastructure. However, creating such functions is a lot of work, and their
documentation (file `./src/library/stats/man/selfStart.Rd`) is quite complicated.
We believe that the focus would better be placed on getting good initial parameters,
possibly with some interactive tools. That is, the emphasis should be on the
`getInitial()` function, though avoiding the current calls back to `nls()`.
## Issue: documentation of the results of running nls()
The output of `nls()` is an object of class "nls" which has the following structure:
### nls() result output according to the documentation
```
A list of:
m an nlsModel object incorporating the model.
data the expression that was passed to nls as the data argument. The actual data values
are present in the environment of the m components, e.g., environment(m$conv).
call the matched call with several components, notably algorithm.
na.action the "na.action" attribute (if any) of the model frame.
dataClasses the "dataClasses" attribute (if any) of the "terms" attribute of the model frame.
model if model = TRUE, the model frame.
weights if weights is supplied, the weights.
convInfo a list with convergence information.
control the control list used, see the control argument.
There are also two deprecated items, returned only for an algorithm = "port" fit: these are
convergence (a code, 0 indicating convergence) and message. The same information is available from convInfo.
```
### Example output
To illustrate, let us run the Croucher example.
```{r nlsoutx}
# Croucher example
xdata <- c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9)
ydata <- c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001)
p1<- 1; p2<-0.2; NLSstart<-list(p1=p1,p2=p2)
NLSformula <- ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata)
NLSdata<-data.frame(xdata, ydata)
# Try full output version of nls
library(nlspkg) # use the packaged version of nls()
result<-nls(NLSformula, data=NLSdata, start=NLSstart, model=TRUE)
# str(result) -- displays large amount of material - suppressed here
# as it is too wide for the page
result
ls(result) # to list the elements of the output
ls(result$m) # and in particular the "m" object
```
### Concerns with content of the nls result object
The nls object contains some elements that are awkward for other algorithms to produce.
Moreover, some information that would be useful is not presented in an obvious way.
In the following, we use `result` as the returned object from `nls()`.
The `data` return element is an R symbol. To actually access the data from this
element, we need to use syntax
```
eval(parse(text=result$data))
```
However, if the call is made with `model=TRUE`, then there is a returned element
`model` which contains the data, and we can see its contents using
```
ls(result$model)
```
and if there is an element called `xdata`, it can be accessed as `result$model$xdata`.
### Information that is NOT in the nls result object
`nlsr::nlxb()` solves ostensibly the same problem as `nls()` but only claims to return
```
coefficients A named vector giving the parameter values at the supposed solution.
ssquares The sum of squared residuals at this set of parameters.
resid The residual vector at the returned parameters.
jacobian The jacobian matrix (partial derivatives of residuals w.r.t. the
parameters) at the returned parameters.
feval The number of residual evaluations (sum of squares computations) used.
jeval The number of Jacobian evaluations used.
```
However, actually looking at the structure of a returned result gives a list of 11
items:
```
$ resid : num [1:12] 0.0119 -0.0328 0.092 0.2088 0.3926 ...
..- attr(*, "gradient")= num [1:12, 1:3] 0.0271 0.0367 0.0496 0.0666 0.089 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:3] "Asym" "xmid" "scal"
$ jacobian : num [1:12, 1:3] 0.0271 0.0367 0.0496 0.0666 0.089 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "Asym" "xmid" "scal"
$ feval : num 31
$ jeval : num 23
$ coefficients: Named num [1:3] 196.19 12.42 3.19
..- attr(*, "names")= chr [1:3] "Asym" "xmid" "scal"
$ ssquares : num 2.59
$ lower : num [1:3] -Inf -Inf -Inf
$ upper : num [1:3] Inf Inf Inf
$ maskidx : int(0)
$ weights : NULL
$ formula :Class 'formula' language y ~ Asym/(1 + exp((xmid - tt)/scal))
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
- attr(*, "class")= chr "nlsr"
```
This is still a smaller result object than the one `nls()` returns. Moreover, `nlxb`
explicitly returns the sum of squares as well as the residual vector and Jacobian. The
counts of evaluations are also returned. Working on this project showed several potential
updates to the `nlsr` documentation.
### Weights in returned functions from nls()
The functions `resid()` (an alias for `residuals()`), `fitted()` and `lhs()` return UNWEIGHTED quantities.
But if we return `ans` from `nls()` or `minpack.lm::nlsLM` or our new `nlsj` (interim package),
then `ans$m$resid()` is WEIGHTED.
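For example, continuing the Croucher example above with hypothetical weights
(in our reading of the code, `m$resid()` scales each residual by the square
root of its weight):
```
wfit <- nls(NLSformula, data = NLSdata, start = NLSstart,
            weights = rep(0.5, nrow(NLSdata)))
head(resid(wfit))     # unweighted residuals
head(wfit$m$resid())  # weighted: sqrt(0.5) times the unweighted values
```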
### Interim output from the "port" algorithm
As the `nls()` **man** page states, when the "port" algorithm is used with the `trace` argument
TRUE, the iterations display the objective function value which is 1/2 the sum of squares (or
deviance). It is likely that the trace display is embedded in the Fortran of the `nlminb`
routine that is called to execute the "port" algorithm, but the discrepancy is nonetheless
unfortunate for users.
### Failure to return best result achieved
If `nls()` reaches a point where it cannot continue but has not found a point where the
relative offset convergence criterion is met, it may simply exit, especially if a
"singular gradient" (singular Jacobian) is found. However, this may occur AFTER the
function has made considerable progress in reducing the sum of squared residuals.
An example is to be found in the `Tetra_1.R` example from the `nlsCompare` package.
Here is an abbreviated version of that problem and the `nls()` output:
```{r tetrarun}
time=c( 1, 2, 3, 4, 6 , 8, 10, 12, 16)
conc = c( 0.7, 1.2, 1.4, 1.4, 1.1, 0.8, 0.6, 0.5, 0.3)
NLSdata <- data.frame(time,conc)
NLSstart <-c(lrc1=-2,lrc2=0.25,A1=150,A2=50) # a starting vector (named!)
NLSformula <-conc ~ A1*exp(-exp(lrc1)*time)+A2*exp(-exp(lrc2)*time)
tryit <- try(nls(NLSformula, data=NLSdata, start=NLSstart, trace=TRUE))
print(tryit)
```
Note that the sum of squares has been reduced from 61216 to 1.6211, but
unless `trace` is invoked, the user will not get any information about this.
Returning the best parameters found would be an almost trivial change to the `nls()` function and could be
useful to R users.
## Feature: partially linear models and their specification
Specifying a model to a solver should, ideally, use the same syntax across
solver tools. Unfortunately, R allows multiple approaches.
One obvious case is that nonlinear modeling tools are a superset of linear ones.
Yet the explicit model
```
y ~ a*x + b
```
does not work with the linear modeling function `lm()`, which requires
this model to be specified as
```
y ~ x
```
However, even within `nls()`, we see annoying inconsistencies. Consider the
following FOUR different calling sequences for the same problem, though the
second is to illustrate how one intuitive choice will not work. In this failed
attempt, putting the `Asym` parameter in the model causes the `plinear` algorithm
to try to add another term to the model. We believe this is unfortunate, and would
like to see a consistent syntax. At the time of writing (end of August 2021) we do
not have a resolution for this issue.
```{r log4ways, echo=TRUE}
DNase1 <- subset(DNase, Run == 1)
## using a selfStart model - do not specify the starting parameters
fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
summary(fm1DNase1)
## the coefficients only:
coef(fm1DNase1)
## including their SE, etc:
coef(summary(fm1DNase1))
## using conditional linearity - leave out the Asym parameter
fm2DNase1 <- nls(density ~ 1/(1 + exp((xmid - log(conc))/scal)),
data = DNase1,
start = list(xmid = 0, scal = 1),
algorithm = "plinear")
summary(fm2DNase1)
## using conditional linearity AND Asym does NOT work
fm2aDNase1 <- try(nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = DNase1,
start = list(Asym=3, xmid = 0, scal = 1),
algorithm = "plinear",
trace = TRUE))
summary(fm2aDNase1)
## without conditional linearity
fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = DNase1,
start = list(Asym = 3, xmid = 0, scal = 1))
summary(fm3DNase1)
## using Port's nl2sol algorithm
fm4DNase1 <- try(nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = DNase1,
start = list(Asym = 3, xmid = 0, scal = 1),
algorithm = "port"))
summary(fm4DNase1)
```
## Issue: code structure
The `nls()` code is structured in a way that inhibits both maintenance and improvement.
In particular, the iterative setup is such that the introduction of a Marquardt stabilization
is not easily accomplished.
To obtain performance, a lot of the code is in C with consequent calls and returns that
complicate the code. Over time, R has become much more efficient on modern computers, and
the need to use compiled C and Fortran is less critical. Moreover, the burden for maintenance
could be much reduced by moving code entirely to R.
## Issue: code documentation for maintenance
One example is `setPars()`, which has weaknesses that should be explained; it is used only by `profile.nls()`.
The paucity of documentation is exacerbated by the mixed R/C/Fortran code base.
Following is an email to John Nash from Doug Bates. This is NOT a criticism
of Prof. Bates work, but a reflection on how difficult it is to develop code in this
subject area and to keep it maintainable. We have experienced similar loss of understanding
for some of our own codes.
```
I'm afraid that I don't know the purpose of the recursive call either.
I know that I wrote the code to use a closure for the response, covariates, etc.,
but I don't recall anything like a recursive call being necessary.
If the R sources were in a git repository I might try to use `git blame`
to find out when and by whom that was written but they are in an SVN repository,
I think, and I haven't used it for a long, long time.
I don't think I will be of much help. My R skills have atrophied to the point
where I wouldn't even know how to start exploring what is happening in the first
call as opposed to the recursive call.
```
This was in response to the email below, sent to Doug Bates by John Nash on June 29, 2021.
```
Thanks.
https://gitlab.com/nashjc/improvenls/-/blob/master/Croucher-expandednlsnoc.R
<https://gitlab.com/nashjc/improvenls/-/blob/master/Croucher-expandednlsnoc.R>
This has the test problem and the expanded code. Around line 367 is where we are
scratching our heads. The function code (from nlsModel()) is in the commented lines below
the call. This is
# > setPars
# function(newPars) {
# setPars(newPars)
# resid <<- .swts * (lhs - (rhs <<- getRHS())) # envir = thisEnv {2 x}
# dev <<- sum(resid^2) # envir = thisEnv
# if(length(gr <- attr(rhs, "gradient")) == 1L) gr <- c(gr)
# QR <<- qr(.swts * gr) # envir = thisEnv
# (QR$rank < min(dim(QR$qr))) # to catch the singular gradient matrix
# }
I'm anticipating that we will be able to set up a (possibly inefficient) code
with documentation that will be easier to follow and test, then gradually figure
out how to make it more efficient.
The equivalent from minpack.lm is
setPars = function(newPars) {
setPars(newPars)
assign("resid", .swts * (lhs - assign("rhs", getRHS(),
envir = thisEnv)), envir = thisEnv)
assign("dev", sum(resid^2), envir = thisEnv)
assign("QR", qr(.swts * attr(rhs, "gradient")), envir = thisEnv)
return(QR$rank < min(dim(QR$qr)))
}
In both there is the recursive call, which must have a purpose I don't understand.
```
## Feature: indexed parameters
The **man** file for `nls()` includes the following example of a situation in which
parameters are indexed. It also uses the "plinear" option as an added complication.
Here we use a truncated version of the example to save display space.
```{r nlsindx1}
## The muscle dataset in MASS is from an experiment on muscle
## contraction on 21 animals. The observed variables are Strip
## (identifier of muscle), Conc (Cacl concentration) and Length
## (resulting length of muscle section).
if(! requireNamespace("MASS", quietly = TRUE)) stop("Need MASS pkg")
mm<- MASS::muscle[1:12,] # take only 1st few values of Strip
str(mm)
mm<-droplevels(mm)
str(mm)
withAutoprint({
## The non linear model considered is
## Length = alpha + beta*exp(-Conc/theta) + error
## where theta is constant but alpha and beta may vary with Strip.
with(mm, table(Strip)) # 2, 3 or 4 obs per strip
nl <- nlevels(mm$Strip)
## We first use the plinear algorithm to fit an overall model,
## ignoring that alpha and beta might vary with Strip.
musc.1 <- nls(Length ~ cbind(1, exp(-Conc/th)), mm,
start = list(th = 1), algorithm = "plinear")
summary(musc.1)
## Then we use nls' indexing feature for parameters in non-linear
## models to use the conventional algorithm to fit a model in which
## alpha and beta vary with Strip. The starting values are provided
## by the previously fitted model.
## Note that with indexed parameters, the starting values must be
## given in a list (with names):
## ?? but why use b here AND in the new formula??
b <- coef(musc.1)
musc.2 <- nls(Length ~ a[Strip] + b[Strip]*exp(-Conc/th), data=mm,
start = list(a = rep(b[2], nl), b = rep(b[3], nl), th = b[1]))
summary(musc.2)
})
```
Note that the returned parameters are NOT indexed. For example, `coef(musc.2)` is a single flat
vector of parameters: we do not see `a[1], a[2], a[3]` but `a1, a2, a3`. This is because the model
must combine all the parameters, since `th` is common across the levels of the index `Strip`.
We believe this structure is quite likely to cause confusion and error, and propose an alternative
approach below.
# Goals of our effort
Here are some of the goals we hope to accomplish.
## Code rationalization and documentation
We want
- to provide a packaged version of `nls()` (call it `nlsalt`) coded entirely in R
that matches the version in base R or what is packaged in `nlspkg` as described
in the "PkgFromRbase" document.
- to try to obtain a cleaner structure for the overall `nls()` infrastructure. By this we
mean a re-factoring of the routines so they are better suited to maintenance of both
the existing `nls()` methods and features as well as the new features we would like
to add.
- to try to explain what we do, either in comments or separate maintainer documentation.
Since we are complaining about the lack of explanatory material for the current
code, we feel it is incumbent on us to provide such material for our own work, and if
possible for the existing code.
### Rationalization of formula specifications
Below in "Consistent specifications of partially linear models" we point out that
`nls()` uses a different formula specification from the default for a problem if the
`plinear` algorithm is used. This is unfortunate, since the user cannot then simply
add `algorithm="plinear"` to the call. Moreover, we believe it makes errors more
likely. We suggest a possible approach to avoiding this issue, but have yet to
cast this into working code (2021-8-19).
<!-- ``` -->
<!-- y ~ x + z + x*z -->
<!-- y ~ a + b*x +c*z + d * (x*z) # <-- fully specified -->
<!-- y ~ a*Fn1(b, c, x, z)/Fn2() -->
<!-- ``` -->
### Rationalization of indexed models
Indexed models clearly have a place in some areas of research. However, the current
approach in `nls()` is awkward. The user must use DIFFERENT formulas depending on
whether the `plinear` algorithm is chosen. This seems to be related to the need for
that approach to use the `lm()` function or its infrastructure, thus employing
formulas that omit the parameters of models and simply give the variables or functions
thereof, such as interaction terms.
Users, we believe, are much more likely to be comfortable with fully specified formulas.
And such formulas are needed by nonlinear least squares functions such as `minpack.lm::nlsLM`
and `nlsr::nlxb`. Thus we would like the identification of the linear parameters to be,
if possible, automated. At the very least, we should be able to use a structure like
```
algorithm="plinear(parmx, thetaz)"
```
to allow for the full formula to be used with the linear parameters identified.
## Provide tests
We need suitable tests in order:
- to ensure our new `nlsalt` or related packages work properly, in particular, giving
results comparable to or better than the `nls()` in base R or `nlspkg`;
- to test individual solver functions to ensure they work across the range of calling
mechanisms, that is, different ways of supplying inputs to the solver(s);
- to pose "silly" inputs to nonlinear least squares solvers (in R) to see if
these bad inputs are caught as exceptions by the programs.
### A test runner program
?? Arkajyoti -- do you want to expand?
When we have a "new" or trial solver function, we would like to know if it gives
acceptable results on a range of sample problems of different types, starting
parameters, input conditions, constraints, subsets, weights or other settings.
Ideally we want to be able to get a summary that is easy to read and assess. For
example, one approach would be to list the names of a set of tests with a red, green
or yellow dot beside the name for FAILURE, SUCCESS, or "NOT APPLICABLE". In the last
category would be a problem with constraints that the solver is not designed to
handle.
To accomplish this, we need a suitable "runner" program that can be supplied with
the name of a solver or solvers and a list of test problem cases. Problems generally
have a base setup -- a specification of the function to fit as a formula, some data
and a default starting set of parameters. Other cases can be created by imposing
bounds or mask constraints, subsets of the data, and different starts.
How to set up this "runner" and its supporting infrastructure is non-trivial. While
the pieces are not as complicated as the inter-related parts of the solvers, especially
`nls()`, the categorization of tests, their documentation, and the structuring to make
running them straightforward and easy requires much attention to detail.
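To make the idea concrete, here is a toy sketch of such a runner, under the
assumption that each solver is a function taking a formula, data and start, and
that each test case supplies those three pieces; a real runner would also flag
"NOT APPLICABLE" cases:
```
run_tests <- function(solvers, cases) {
  grid <- expand.grid(solver = names(solvers), case = names(cases),
                      stringsAsFactors = FALSE)
  grid$status <- vapply(seq_len(nrow(grid)), function(i) {
    cs <- cases[[grid$case[i]]]
    ans <- try(solvers[[grid$solver[i]]](cs$formula, cs$data, cs$start),
               silent = TRUE)
    if (inherits(ans, "try-error")) "FAILURE" else "SUCCESS"
  }, character(1))
  grid   # one row per (solver, case) pair with a status flag
}
```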
Some considerations for our test scripts:
- Is it useful to have a "base" script for each family of test problem, with numbered particular
cases? That is, if we run the scripts in order, we can avoid some duplication of code and
data.
- While we have developed some tags to document the test problem families and cases, we believe
that such tags (essentially summary documentation) will continue to need revision as different
tools and problems are included in scope of `nlsCompare`.
- Similarly, we expect that there will be ongoing review of the structure of the result files.
# Outputs of the project
The project output is available in the Git repository https://gitlab.com/nashjc/improvenls
?? Arkajyoti -- do you want to change to Github
## Formal reports or documentation
- RefactoringNLS.Rmd: this document, which will become the main report of the project.
- TestsDoc.Rmd: a survey of testing tools in R. It has more general possibilities
and fits into the subject of regression testing, in which case a
more extensive literature review will be needed. Note that this
document reflects the work described in the "Problem sets and test infrastructure" section
below.
<!-- need to incorporate MachID and -->
<!-- the many tests. Arkajyoti: should we think of working this up into a paper for -->
<!-- JSS or the R-Journal? My -->
<!-- view is that this COULD be a long term side-interest for your academic work, but -->
<!-- that would depend on your own interests as well as opportunities. -->
## Informal reports
These are documents used to discuss particular aspects of our work; they are part
of the repository https://gitlab.com/nashjc/improvenls.
- DerivsNLS.Rmd: a document to explain different ways in which Jacobian information
is supplied to nonlinear least squares computation in R. File `ExDerivs.R` is
a DRAFT of a script to provide examples.
- ImproveNLS.bib: a consolidated BibTeX bibliography for all documents in this
project, possibly with wider application to nonlinear least squares in general
- MachineSummary.Rmd: an informal investigation of ways to report the characteristics
and identity of machines running tests. `MachID.R` offers a concise summary function
to identify a particular computational system used for tests.
Note that there remains an open issue: `get_ram()` in the `benchmarkme`
package may report the RAM size as "NA" under Windows 10. We are collaborating with Colin Gillespie,
maintainer of `benchmarkme`, to seek a workaround.
- VarietyInNonlinearLeastSquaresCodes.Rmd: a review of the different algorithms and
the many choices in their implementation for nonlinear least squares. This is still
a DRAFT as of 2021-8-20.
- PkgFromRbase.Rmd: an explanation of the construction of the `nlspkg` from the
code in R-base.
<!-- We could think of putting this in R-bloggers or similar. Or if -->
<!-- we came up with a more general template, then an R-Journal article. -->
- WorkingDocument4ImproveNLS.Rmd: essentially a way to record what we have worked on.
A project diary.
## Problem sets and test infrastructure
We have several test problems and variants thereof in the `inst/scripts/` directory of
the `nlsCompare` package available on Github (https://github.com/ArkaB-DS/nlsCompare).
We direct the reader to that package for documentation of the test infrastructure,
in particular the problems and methods files (`problems.csv` and `methods.csv`) and
the various functions invoked by `run.R` to produce an output file, also in CSV form.