QC-S4.txt

QC and S4
*********

Here are some thoughts on the current situation with the QC tools and S4
classes and methods, and what possible actions and/or improvements could
be.

The QC tools to be considered are the following:

    undoc
    codoc
    checkDocArgs
    checkDocStyle
    checkFF
    checkMethods
    checkReplaceFuns
    checkTnF

* undoc() is about finding 'things' which are missing an entry in the
  documentation system via an appropriate \alias in an Rd file, although
  we think they should (at least) have such an entry.  (Note that with
  the current understanding of "all objects should be documented", an
  \alias suffices.  We might want to ask for explicit documentation in
  some cases as well, e.g. for explicit synopses of functions exported
  in a package namespace.)

  These 'things' currently include

  ** all objects visible to the user (obtained via objects()), with the
     convention that Code objects in add-on packages with names starting
     with a dot are considered 'internal' (not user-level) by
     convention;

  ** all visible data sets;

  ** all S4 classes in the package as obtained by getClasses().

  I still need to read the recent thread on 'Methods and namespaces' to
  see whether the future might have non-exported S4 classes in store: in
  that case, I would assume that getClasses() would only report the
  exported ones.

  I have also recently added code suggested by JMC for testing for
  undocumented S4 *methods* (currently, these computations are only
  performed when running undoc() on package 'methods').  This is based
  on the understanding that 'things' should also include

  ** all S4 methods in the package as obtained by working on the
     MethodsList of each S4 generic listed by getGenerics().

  (More precisely, the inclusion should be for methods owned by the
  package, and exclude default methods created by setMethod() on
  'ordinary' functions.)

  Again, I am not sure whether there can be 'private' S4 methods: my
  current understanding is that along the lines of the Green Book, for
  an S4 generic foo(),

	? foo(x, y)

  would (and will soon) result in lookup for an \alias of the form

	foo-siglist-method

  with 'siglist' obtained from the signature of the methods that would
  be dispatched to, and it would be nice if this was guaranteed to find
  something.

  Summing up, if there is agreement that all S4 methods should be
  documented in the sense of having the appropriate \alias, then the
  corresponding test can be switched on for all packages, but I am not
  sure about the possible implications of using namespaces.

									[OPEN]

* codoc() is about verifying 'consistency of code and documentation', in
  the sense that there is agreement on the basic 'structure'.

  For functions (which are not methods), this is rather simple: either
  \usage or possibly \synopsis should contain a synopsis of the function
  being documented, and agreement means that the names of the formals of
  the function should be the same in both code and documentation.  (In
  fact, codoc() can also verify agreement on possible default values,
  but I don't think anyone is currently using this feature.  It would be
  nice if we would, but then 'strange' default values are an issue.)

  To deal with the fact that S3 methods exist as functions but should be
  documented as methods (i.e., without explicitly indicating their full
  function name), the \method{GENERIC}{CLASS} markup was introduced, and
  recently enhanced to be rendered in a more informative way, such that

	\method{GENERIC}{CLASS}(ARGLIST)

  now expands to

        ## S3 method for class 'CLASS':
	GENERIC(ARGLIST)

  (and 'default' is handled appropriately as well).

  In case S4 methods are to be documented 'explicitly' (i.e., in order
  to indicate their synopsis, e.g. to document 'surprising' arguments),
  I have recently suggested the markup \S4method (sorry, just \method
  would be more consistent with getMethod() etc, but this is already
  taken for S3 methods):

	\S4method{GENERIC}{SIGLIST}(ARGLIST)

  would be expanded to something like

	## S4 method for signature 'SIGLIST':
	GENERIC(ARGLIST)

  There are some implementation challenges here.  The actual codoc()
  test proceeds by Perl code creating a function declaration of the form

	FOO <- function(ARGLIST) NULL

  from a recognized synopsis FOO(ARGLIST): for S4 methods, I think we
  instead need something like

	setMethod("FOO", SIGSPEC, function(ARGLIST) NULL)

  (with 'where = docsEnv') and then compare these to what we find in the
  code.  One also needs to parallel what I've recently done for synopses
  of replacement functions and S3 methods for S4 replacement methods,
  e.g. by turning

	\S4method{GENERIC}{SIGLIST}(ARGLIST) <- value

  into

        setReplaceMethod("FOO", SIGSPEC, function(ARGLIST, value) NULL)

  (not tested yet).

									[DONE]

  (Actually, the above does not work because setMethod() already throws
  an error in case of a mismatch.  We now create a separate environment
  with code stubs of the S4 methods, and compare these to what we find
  in the environment obtained from the documentated usages, using
  \S4method{GENERIC}{SIGLIST} as the function name in both cases.)

  Assessing 'agreement' of code and documentation for S4 classes, or
  more generally non-function 'things', is currently non-existent.  We
  need to specify the 'structure' for which agreement is needed.
  Function promptClass() currently creates a stub with entries for
  Slots, Extends and Methods.  A minimal requirement seems to be that
  code and documentation agree on the slots.  E.g., for promptClass() on
  class "genericFunction" we get

    \section{Slots}{
      \describe{
        \item{\code{.Data}:}{Object of class \code{"function"} ~~ }
        \item{\code{generic}:}{Object of class \code{"character"} ~~ }
	...

  and provided that we can rely on the assumption that after editing the
  stub the slots will still correspond to the information represented by
  the items (modulo \code) in the describe list in section Slots, we can
  repeatedly use the recently added tools:::getRdSection() to compute
  the slots documented, and compare these to the ones actually in the
  code.

									[DONE]

  Alternatively, we could try to add Rd markup for describing the
  structure of the class: in that case, the markup would be compared to
  the code, but unless the markup is comprehensive (and e.g. for
  function documentation we still have \arguments for documenting the
  formals), we still need to verify internal consistency of the Rd file
  along the lines of checkDocArgs(), see below.  (Whether this is Rd or
  e.g. XML markup is irrelevant as far as I am concerned.)

  I assume that what is currently put in section 'Methods' is best
  regarded as dynamic information.  What about section 'Extends'?
									[OPEN]

  As already mentioned above, there are related issues for e.g. Rd files
  documenting variables (or data sets) which are data frames (as very
  recently pointed out to me by DB): prompt.data.frame() creates a stub
  in which the variables in the data frame are documented as a describe
  list inside a \format section, so what codoc() should do is to compare
  the items of that list to the names of the variables in the data
  frame.  (Unfortunately, we currently do not know whether a given Rd
  file provides documentation for a data frame: I assume we want another
  \docType{} here?)
									[DONE]

* checkDocArgs() tests for agreement between the formals shown in \usage
  (note: not \synopsis, as \usage is what gets rendered) and the items
  documented in \arguments.  This was recently enhanced to also report
  items documented in \arguments but missing from \usage.  We need to
  add support for dealing with S4 methods documented via \S4method{}{},
  but that is the only case whether I can see the \usage being used for
  documentation of S4 classes and methods.

									[DONE]

* checkDocStyle() tests for \usage sections where S3 methods are shown
  alongside the generic, or by their function name (i.e., not using the
  \method{CLASS}{GENERIC} markup).  The former was taken to be a problem
  as e.g.

	barplot(x, ...)
	\method{barplot}{default}(x, MOREARGS)

  would be rendered as

	barplot(x, ...)
	barplot(x, MOREARGS)

  making it a bit tricky to correctly document the primary argument for
  both the generic and the S3 method.  With the new rendering

	barplot(x, ...)
	## Default S3 method:
	barplot(x, MOREARGS)

  it is easy to refer to the respective methods.  Thus, I think we will
  retain the computed data on S3 methods shown alongside the generic,
  but not print it by default anymore.  On the other hand, we could now
  be more stringent about usages for S3 methods not using \method{}{}.
  (Nevertheless, there is little point worrying about getting the \usage
  right and subsequently still referring to the method by its function
  name, so I assume that checkDocArgs() will remain a 'lesser' test.)

									[DONE]

  In any case, currently documentation for S4 classes does not use
  \usage, and I see little point worrying about possible \usage entries
  for S4 methods via the function name of what is subsequently entered
  into setMethod(), so nothing related to S4 classes and methods as far
  as I am concerned.

* checkMethods() tests whether S3 methods have 'all arguments of their
  generic'.  It currently only looks at S3 methods, based on the idea
  that if a function has a call to UseMethod() in its body then it
  should be investigated further (currently, without going through the
  trouble of dealing with possibly dispatching under a different name,
  as the methods() code now does).

  My understanding is that setMethod() already verifies consistency of
  the formals via conformMethod(), so no action is required here, apart
  from possibly changing the name checkMethods() to checkS3Methods() [or
  checkS3methods()].

* checkReplaceFuns() tests whether replacement functions and S3 methods
  have their final formal called 'value' (which seems to be the only way
  to consistently handle replacement functions in both implementations
  of the S language.  It already tries to deal with S3 replacement
  methods registered with namespace renaming, e.g.

	S3method("length<-", "foo", ".myLittleFun")

  (the point being that the function name of what is registered does not
  end in '<-').

  It currently does not deal with S4 replacement methods as defined by
  calls to setReplaceMethod(), but it should not be too hard to extend
  it accordingly, modulo returning something better than a list of the
  names of replacement functions with offending final formal.

									[DONE]

  I assume that an S4 generic with name ending in '<-' is a (generic)
  replacement function?

* checkFF() tests whether the interface functions such as .C() or
  .Call() are called with argument 'PACKAGE' specified, which is highly
  recommended by R-exts.  In the case of a package installed as a save
  image, the current implementation only looks at the bodies of
  functions listed by ls(), and hence knows nothing about code inside S4
  methods.  I think it should not be too hard to also look there.

									[DONE]

* Finally, checkTnF() is a 'lesser' test for the occurrence of literals
  'T' and 'F' in package code, with the idea that these should possibly
  be replaced by 'TRUE' and 'FALSE'.  This test cannot deal with save
  images, and packages using S4 classes are typically installed as such.
  It can analyze the raw code, of course.