\documentclass{article}
\usepackage{fullpage}
\usepackage{times}
\usepackage{hyperref}
% The following are in the Omegahat docs.
\usepackage{pstricks}
\input{WebMacros}
\input{CMacros}
\input{SMacros}
\author{Duncan Temple Lang}
\title{An Alternative Mechanism for Resolving C Symbols}
\begin{document}
\begin{abstract}
This is a note on the way the C level symbols are accessed in the S
language (R and S-Plus). Luke and I propose that we introduce a
more controlled and informative way to explicitly \textit{register}
C routines that can be called from S via the \SFunction{.Call},
\SFunction{.C} and \SFunction{.Fortran} functions. For a package
that uses this registration mechanism, rather than dynamically
resolving arbitrary C routines by name, we limit the available
routines to those explicitly exported by a module. The export
information includes the external name and the address of the C
routine itself. The \SFunction{.Call}, etc. functions are called
with the alias name used to register the C routine, much like
the way the internal/primitive functions are mapped to C routines in
\file{names.c}.
This registration approach is used by Python. The benefits
include
\begin{itemize}
\item restricted access to non-exported symbols, reducing
the potential for fatal errors;
\item avoids the underscore issue;
\item link-time resolution rather than load-time;
\item allows aliasing/renaming of routines;
\item allows static routines to be exported;
\item reduces the differences between the different OSes.
\end{itemize}
Generally, there is more structure and the potential for exploiting
this is greater. Also, we are relying less on linking and loading,
and more on the more predictable C language!
In the near future, we can extend this information to optionally
include details about the number and types of arguments. This gives great
flexibility for handling external/foreign data types such as objects
from other languages (e.g. Perl, Python, Java, etc.), and result
sets from database queries, hdf5 objects, etc.
At present the author of the package would need to manually create
the tables (arrays) of routines that are to be exported. While this
is reasonably trivial, we can also simplify the use of this
mechanism for library developers by creating a tool that can
automatically generate the tables of available and called routine
symbols. This tool can potentially even generate the S-level wrapper
functions to these routines.
\end{abstract}
\section{The Basic Idea}
The basic setup is that the author of a package will explicitly
register C routines that can be called via the \SFunction{.Call},
\SFunction{.C} and \SFunction{.Fortran} functions. This is done by
putting entries for each of the symbols that the developer makes
accessible to the S user into the appropriate table. We use three
tables: one for the functions accessed via the \SFunction{.C}
function, another for those accessed via the \SFunction{.Call}, and
finally one for the \SFunction{.Fortran} function. These routines are
registered with the S engine by calling the C routine
\Croutine{R_registerRoutines}. This should be done within an
initialization routine for the library. When the library is loaded,
we look for such a routine by the name \Escape{R_init_}\textit{library
name} and invoke it if it is available. In this way, the library
need only explicitly export one symbol for access via \Croutine{dlsym}
or its equivalents on other platforms.
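
As a rough illustration of the naming convention just described, the
following sketch derives the name of the initialization routine from the
path of a library file. The helper \texttt{init\_routine\_name} is
hypothetical and is not part of R; it simply assumes that the library name
is the file name with its directory and extension removed.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of R: builds "R_init_<name>" from a
   library path such as "/usr/lib/R/foo.so".
   Returns 0 on success, -1 if the result does not fit in buf. */
int
init_routine_name(const char *path, char *buf, size_t size)
{
    const char *base, *dot;
    size_t len;
    int n;

    assert(path != NULL && buf != NULL);

    /* Drop any leading directory components. */
    base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;

    /* Drop the extension (".so", ".dll", ...), if there is one. */
    dot = strrchr(base, '.');
    len = (dot != NULL) ? (size_t) (dot - base) : strlen(base);

    n = snprintf(buf, size, "R_init_%.*s", (int) len, base);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
\end{verbatim}

The resulting string is what one would then hand to \Croutine{dlsym} (or
its equivalent) to locate the single exported symbol.
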
The \Croutine{R_registerRoutines} routine processes the symbol
entries in each of the three tables and stores them away in internal
structures. The internal code for \SFunction{.Call}, \SFunction{.C}
and \SFunction{.Fortran} calls \Croutine{R_FindSymbol} to resolve the
native symbol. It is here that we see if the library/libraries in
which we search for the symbol have this explicitly exported
information. If so, we only look in those tables. We search for an
entry in the tables whose public name matches the specified symbol
being sought. If found, we return the associated routine pointer. If
no such entry is found in this library, but it has registered
routines, we do not look further in this library. However, if there
are no registered routines for this library, we use the current
dynamic lookup mechanism using \Croutine{dlsym}.
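
The lookup rule described above can be sketched as follows. This is a
minimal sketch and not the actual implementation: the types
\texttt{SymbolEntry} and \texttt{LibraryInfo}, the routine
\texttt{find\_symbol} and the demo data are all hypothetical stand-ins, a
single table is used instead of three, and the \texttt{dynamicLookup}
field merely plays the role of \Croutine{dlsym}.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*NativeRoutine)(void);

/* Hypothetical stand-ins for the proposal's structures. */
typedef struct {
    const char   *name;   /* public name used from S */
    NativeRoutine fun;    /* address of the C routine */
    int           nargs;  /* number of arguments (unused here) */
} SymbolEntry;

typedef struct {
    const SymbolEntry *entries;  /* registered routines, or NULL */
    int                count;    /* 0 means nothing was registered */
    NativeRoutine    (*dynamicLookup)(const char *name); /* "dlsym" */
} LibraryInfo;

NativeRoutine
find_symbol(const LibraryInfo *lib, const char *name)
{
    int i;

    assert(lib != NULL && name != NULL);

    if (lib->count > 0) {
        /* Registered library: linear search of its table only. */
        for (i = 0; i < lib->count; i++) {
            if (strcmp(lib->entries[i].name, name) == 0)
                return lib->entries[i].fun;
        }
        return NULL;  /* no fallback to the dynamic lookup */
    }
    /* Nothing registered: use the existing dynamic mechanism. */
    return (lib->dynamicLookup != NULL) ? lib->dynamicLookup(name) : NULL;
}

/* Demo data, used only to exercise find_symbol(). */
void demo_mean(void) {}
void demo_hidden(void) {}

NativeRoutine
demo_dlsym(const char *name)
{
    return (strcmp(name, "hidden") == 0) ? demo_hidden : NULL;
}

const SymbolEntry demo_entries[] = { {"myMean", demo_mean, 2} };
const LibraryInfo registered_lib = { demo_entries, 1, demo_dlsym };
const LibraryInfo legacy_lib     = { NULL, 0, demo_dlsym };
\end{verbatim}

With these definitions, looking up \texttt{hidden} in
\texttt{registered\_lib} fails even though the dynamic lookup would have
found it, while the same request against \texttt{legacy\_lib} succeeds.
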
This approach preserves backward compatibility and the new mechanism
is merely an extension. The overhead should be minimal if it is not
used. The current implementation of the lookup uses a linear search.
As the mechanism is adopted more widely, we can move to a hash table.
\section{Example}
The following is an example of how one uses this registration
facility. One needs to include \file{Rforeign.h} to obtain the
definition of the \CStruct{R_CMethodDef} and \CStruct{R_CallMethodDef}
structures. Then, we specify the tables of \{name, routine, number of
arguments\} tuples. (Currently the number of arguments is not used. It
will be used in another extension Robert, John and I are discussing.)
Finally, we register the entries in the tables. We do this by calling
\Croutine{R_registerRoutines}. It is easiest to do this by creating
an initialization routine for the library that R will call
automatically when it is loaded. In our case, the library is named
\file{foo.so}. Therefore, the name of this initialization routine for
which R will search is \Croutine{R_init_foo}. This takes a single
argument of type \CStruct{DllInfo *}. This is passed back to
\Croutine{R_registerRoutines} to identify for which DLL the symbols
are being registered. If one chooses not to use an initialization
routine, one can get the \CStruct{DllInfo} reference to pass to
\Croutine{R_registerRoutines} via a call to \Croutine{R_getDllInfo}
with the fully qualified path of the library file (i.e. including the
full directory and name).
\begin{verbatim}
#include "Rinternals.h"
#include "R_ext/Rforeign.h"

/* R passes integer vectors to .C routines as int *, so the
   length arguments are declared int * rather than long *. */
static void R_localMean(double *vals, int *len);
static void foo(double *vals, int *len, double *val);
static SEXP myCall(SEXP arg);

/* Routines callable via .C: {name, routine, number of arguments}. */
static R_CMethodDef R_CDef[] = {
    {"foo", &foo, 3},
    {"myMean", &R_localMean, 2}
};

/* Routines callable via .Call. */
static R_CallMethodDef R_CallDef[] = {
    {"myCall", myCall, 1}
};

/* Called automatically by R when foo.so is loaded. */
void
R_init_foo(DllInfo *info)
{
    R_registerRoutines(info,
                       R_CDef, sizeof(R_CDef)/sizeof(R_CDef[0]),
                       R_CallDef, sizeof(R_CallDef)/sizeof(R_CallDef[0]),
                       NULL, 0);  /* no .Fortran routines */
}

/* Registered as "myMean": overwrites vals[0] with the mean. */
static void
R_localMean(double *vals, int *len)
{
    double mean = 0;
    foo(vals, len, &mean);
    vals[0] = mean;
    return;
}

/* Naive way to compute the mean. */
static void
foo(double *vals, int *len, double *val)
{
    double mean = 0;
    int i;
    for(i = 0; i < *len; i++) {
        mean += vals[i];
    }
    *val = mean/(*len);
    return;
}

/* Returns the length of its argument as an integer vector. */
static SEXP
myCall(SEXP arg)
{
    SEXP ans;
    PROTECT(ans = allocVector(INTSXP, 1));
    INTEGER(ans)[0] = Rf_length(arg);
    UNPROTECT(1);
    return(ans);
}
\end{verbatim}
\section{Comments}
Python uses \CNull{} to identify the end of the array so that the user
doesn't have to specify the number of elements being registered. This
doesn't seem appropriate for our use, and is easy to add later.
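
For comparison, a sentinel-terminated table of the kind Python uses might
look like the following sketch. The \texttt{SymbolEntry} type, the
\texttt{count\_entries} helper and the demo table are hypothetical and
only illustrate the convention; they are not part of the proposal.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

typedef void (*NativeRoutine)(void);

/* Hypothetical entry type, for illustration only. */
typedef struct {
    const char   *name;
    NativeRoutine fun;
    int           nargs;
} SymbolEntry;

/* Count the entries that precede the sentinel, i.e. the first
   entry whose name is NULL. */
int
count_entries(const SymbolEntry *table)
{
    int n = 0;

    assert(table != NULL);
    while (table[n].name != NULL)
        n++;
    return n;
}

void demo_foo(void) {}
void demo_my_mean(void) {}

/* The final {NULL, NULL, 0} entry marks the end of the table. */
const SymbolEntry demo_table[] = {
    {"foo",    demo_foo,     3},
    {"myMean", demo_my_mean, 2},
    {NULL,     NULL,         0}
};
\end{verbatim}

The registration routine would then need only the table itself, at the
cost of requiring every table to carry the extra terminating entry.
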
We choose to have three separate tables for the routines that are to
be accessed via the \SFunction{.C}, \SFunction{.Call} and
\SFunction{.Fortran} functions. At present, the information is
exactly the same for all of them. However, in the near future we will
support additional fields which specify information about the argument
types (convert routines, type constraints, etc.).
\section{Necessary Code Changes}
I have made the changes for UNIX. Nothing is conceptually different
for Windows or the Macintosh versions. If anything, this mechanism
reduces the differences as the symbol resolution does not use system
level tools. Rather than adding support for each of the three
\file{dynload.c} files, I believe it is best to consolidate these
three different versions that currently contain large blocks of
duplicated code into a single file that is in \dir{main} of the R
base. Then I will abstract the platform specific differences using
function pointers (or different OS-specific versions of the same
symbols that are bound at link time). This consolidation of code is
necessary if we are to go further with the extensions to the
\SFunction{.C} and \SFunction{.Call} functions that Robert, John, Luke
and I have been discussing and of which I will provide a description
shortly.
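
One possible shape for the function-pointer abstraction is sketched
below. All of the names (\texttt{OSDynamicOps},
\texttt{load\_and\_resolve} and the \texttt{fake\_} routines) are
hypothetical. The in-memory backend merely stands in for a real one, such
as \texttt{dlopen}/\texttt{dlsym} on Unix or
\texttt{LoadLibrary}/\texttt{GetProcAddress} on Windows, so that the
sketch can be compiled on its own.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*NativeRoutine)(void);

/* Hypothetical table of platform-specific operations. */
typedef struct {
    void         *(*openLibrary)(const char *path);
    NativeRoutine (*findSymbol)(void *handle, const char *name);
    void          (*closeLibrary)(void *handle);
} OSDynamicOps;

/* Platform-independent code: it only goes through the table. */
NativeRoutine
load_and_resolve(const OSDynamicOps *ops, const char *path,
                 const char *name)
{
    void *handle;
    NativeRoutine fun;

    assert(ops != NULL && path != NULL && name != NULL);

    handle = ops->openLibrary(path);
    if (handle == NULL)
        return NULL;
    fun = ops->findSymbol(handle, name);
    if (fun == NULL)
        ops->closeLibrary(handle);  /* nothing found: release it */
    return fun;
}

/* A fake backend that knows one library with one routine. */
void fake_routine(void) {}
int fake_open_count = 0;   /* number of currently open handles */

void *
fake_open(const char *path)
{
    if (strcmp(path, "foo.so") != 0)
        return NULL;
    fake_open_count++;
    return &fake_open_count;   /* any non-NULL token will do */
}

NativeRoutine
fake_find(void *handle, const char *name)
{
    (void) handle;
    return (strcmp(name, "foo") == 0) ? fake_routine : NULL;
}

void
fake_close(void *handle)
{
    (void) handle;
    fake_open_count--;
}

const OSDynamicOps fake_ops = { fake_open, fake_find, fake_close };
\end{verbatim}

Under this arrangement each platform would supply its own
\texttt{OSDynamicOps} instance, and the shared code in \dir{main} would
never call the system-level loader directly.
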
The additions to support this facility in a backward compatible
manner involve the following.
\begin{description}
\item[\Croutine{AddDLL}] In \Croutine{AddDLL}, we look for a routine by the name
\Croutine{R_init_<library name>}. If we find it, we invoke it. This
is expected to register the exported routines via a call to
\Croutine{R_registerRoutines}.
\item[\CStruct{DllInfo}]
We make an explicit structure to hold the \CStruct{DllInfo}
elements. This structure is extended to have
fields for the arrays and lengths of the arrays of
the different types of symbols.
\item[\Croutine{R_dlsym}]
Rather than passing the handle returned from \Croutine{dlopen}, we
pass the \CStruct{DllInfo} reference for the library in
which the symbol is being sought. This routine now checks
whether there are any symbols in the export tables
and, if so, looks through those.
If this search fails to find the symbol in the tables,
we signal an error.
If there are no explicitly exported entries,
we use the regular
\Croutine{dlsym} mechanism to find the symbol in the shared library.
\item[\file{Rforeign.h}] We add a file named \file{Rforeign.h} in the
\dir{include/R_ext} directory. This contains declarations for the
different types the user needs to create the different tables and
call \Croutine{R_registerRoutines}.
\end{description}
We can make this faster by populating a hash table when we register the
routines for each type (\SFunction{.Fortran}, \SFunction{.C},
\SFunction{.Call}).
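
A minimal sketch of such a table follows, assuming a fixed-size,
open-addressing design with linear probing. The names
\texttt{SymbolHash}, \texttt{hash\_register} and \texttt{hash\_lookup}
are hypothetical, and one such table would be kept for each of the three
interfaces.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64   /* a power of two, assumed to be ample */

typedef void (*NativeRoutine)(void);

typedef struct {
    const char   *name;   /* NULL marks an empty slot */
    NativeRoutine fun;
} HashSlot;

/* Must start out zeroed, e.g. as a static object. */
typedef struct {
    HashSlot slots[TABLE_SIZE];
    int      used;
} SymbolHash;

/* A simple multiplicative string hash. */
static unsigned long
hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Called once per routine at registration time.
   Returns 0 on success, -1 if the table is full. */
int
hash_register(SymbolHash *tbl, const char *name, NativeRoutine fun)
{
    unsigned long i;

    assert(tbl != NULL && name != NULL);
    /* Keep one slot free so that lookups always terminate. */
    if (tbl->used >= TABLE_SIZE - 1)
        return -1;
    i = hash_name(name) & (TABLE_SIZE - 1);
    while (tbl->slots[i].name != NULL) {
        if (strcmp(tbl->slots[i].name, name) == 0) {
            tbl->slots[i].fun = fun;   /* re-registration */
            return 0;
        }
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    tbl->slots[i].name = name;
    tbl->slots[i].fun = fun;
    tbl->used++;
    return 0;
}

/* Called when resolving a symbol; NULL means "not registered". */
NativeRoutine
hash_lookup(const SymbolHash *tbl, const char *name)
{
    unsigned long i;

    assert(tbl != NULL && name != NULL);
    i = hash_name(name) & (TABLE_SIZE - 1);
    while (tbl->slots[i].name != NULL) {
        if (strcmp(tbl->slots[i].name, name) == 0)
            return tbl->slots[i].fun;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return NULL;
}

/* Demo data, used only to exercise the table. */
void demo_a(void) {}
void demo_b(void) {}
SymbolHash demo_hash;   /* zero-initialized */
\end{verbatim}

The cost of hashing is paid once, at registration, and each later lookup
avoids the linear scan of the table.
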
\section{Questions}
Here are some questions to which I would like a response.
\begin{description}
\item Shall I commit this for the Unix version? Should I wait to
have a Windows and Mac version also?
I suggest no and that I consolidate the code for the three versions
into a single common/shared version that is parameterized with
function pointers.
\item Under what circumstances is \CppMacro{CACHE_DLL_SYM} activated?
\item The dynload code for the Mac looks very similar to that for
Unix but does not share structure definitions, etc.
The Windows dynload code is less similar, but it does
share the same structure. Should we move to function pointers for
implementing the differences and centralize the common facilities?
This will become more important if we decide to do more with
these symbols such as checking arguments, calling user-level
converters, etc.
\item Should \SFunction{is.loaded}, \SFunction{symbol.C} and
\SFunction{symbol.For} take a \SArg{PACKAGE} argument?
Similarly,
\SFunction{.Call} is not documented as having one.
Should \SFunction{symbol.C} and \SFunction{symbol.For} also return
the package in which they were found, if they were found?
\end{description}
\section{Tests}
We compile the two shared libraries \SharedLibrary{foo} and
\SharedLibrary{bar} from the tar file \href{http://www.omegahat.org/Dynload.tar.gz}{dynload
test}.
\begin{verbatim}
> dyn.load("bar.so")
> .Call("foo")
[1] TRUE
> .Call("foo", PACKAGE="bar")
[1] TRUE
> dyn.load("foo.so")
> .Call("foo", PACKAGE="bar")
[1] TRUE
> .C("foo", as.numeric(1:10), as.integer(10), x=numeric(1))$x
[1] 5.5
> .C("foo", as.numeric(1:10), as.integer(10), x=numeric(1), PACKAGE="foo")$x
[1] 5.5
\end{verbatim}
\end{document}
typedef enum {REAL, INTEGER, LOGICAL, LIST} R_CArgType;
\begin{description}
\item[library.dynam]
Add an argument to represent a new-style package. This would be the
name of a C routine to call or alternatively a logical value
indicating whether to call the default name,
`\Croutine{R_init}\textit{package name}'. This routine is passed a
single argument which is a reference to the structure representing
the shared library. This is an opaque structure and is intended to
be passed back to some routines to manage how R sees the symbols.
Specifically, it is used in the call to
\Croutine{R_registerRoutines}.
Any other initialization can be done at this time, such as computing
constants, etc.
While this can all be done using compiler and linker facilities for
registering a C routine that is to be called when the library is
loaded, these are not necessarily easily portable across platforms
or linkers.
\item[\SFunction{.C} \& \SFunction{.Call}] When invoking a C routine,
we now have to include this lookup facility. If the package is
specified, we look up the package structure and check if it has C
symbols. If it does, we look in that list. Otherwise, we signal an
error. If no package was specified, we do the lookup in the current
manner.
\end{description}