Ufuncs and reductions #318

kohr-h · 2016-12-28T19:27:31Z

So here's version 1 of the ufuncs and reductions. Everything works as far as I can see; that is, all ufuncs and reductions with all dtypes except float16.

In general, I have added a bunch of TODO notes in the code where I didn't quite know if there's a better way to implement something. Here are some more high-level TODOs:

Remaining open questions / TODOs:

~~Add ufuncs as ndgpuarray methods?~~ Probably not, for this we have the __array_ufunc__ interface.
Expose ufuncs in the top-level pygpu namespace?
~~Pre-compile kernels for ufuncs and reductions and a selection of dtypes and ndims? That would make them much faster when called for the first time.~~ This would be a new issue, too much for this PR. Related to Disk cache #335.
The tests are currently not very exhaustive: they just use random positive arrays and check results against Numpy. Where it makes sense, the functions should also be tested with mixed-sign arrays and arrays containing Inf and NaN. The comparison methods should also get a test that passes the same array as both inputs.
Which style should the docstrings adhere to? I used numpydoc style.
How to handle ufuncs with multiple outputs? ~~Their value is probably marginal so I didn't bother for now.~~ This was easier than expected.
Make a comprehensive info dictionary per ufunc for things like default output dtype (e.g. for logical), reorderable, known failures etc.
Remove doc assignment from numpy
Remove debug print of kernel source

Some notes on implementation details, problems and solutions.

Functions follow Numpy signature

All functions have the same signature as their Numpy counterparts, which is

ufunc(a[, out]) for unary ufuncs,
ufunc(a, b[, out]) for binary ufuncs,
reduction(a, axis=None, dtype=None, out=None, keepdims=False) for sum, prod and
reduction(a, axis=None, out=None, keepdims=False) for amin, amax
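For reference, the NumPy calls that these signatures are modeled on look roughly like this. The array `a` is made up for illustration, and the snippet uses plain NumPy rather than pygpu:

```python
import numpy

a = numpy.arange(6, dtype='float32').reshape(2, 3)
out = numpy.empty_like(a)

# Unary and binary ufuncs: ufunc(a[, out]) and ufunc(a, b[, out])
numpy.sqrt(a, out)
numpy.add(a, a, out)  # overwrites out with a + a

# Reductions with the keyword arguments listed above
col_sums = numpy.sum(a, axis=0, dtype='float64', keepdims=True)
row_max = numpy.amax(a, axis=1)
```

With `keepdims=True` the reduced axis is kept with length 1, so `col_sums` has shape `(1, 3)`.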

Array-like input

Problem 1:
All array arguments are allowed to be array-like. As a consequence, in the extreme case where none of the arguments is a GpuArray instance, there is no object from which to get a context.
Current solution: A default context must be defined in that case.

Problem 2: Which class should be chosen for the output?
Current solution: ndgpuarray

Ufunc signatures are matched with those of Numpy ufuncs

The ufunc_dtypes helper looks at <ufunc>.types and tries to find an input-output pair of dtypes that is adequate for the given input dtypes. Arrays are converted to those types if necessary. This is a bit tricky for ldexp, which needs a float and an integer array to work. The case where the second argument is actually a scalar and not an array is handled in a somewhat hacky way.
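The matching idea can be sketched with plain NumPy. This is not the PR's `ufunc_dtypes` helper; `find_ufunc_signature` is a hypothetical name, and the rule used here (take the first entry of `ufunc.types` to which all inputs can be cast safely) is an assumption about how such a lookup could work:

```python
import numpy


def find_ufunc_signature(ufunc, *in_dtypes):
    """Return (input dtypes, output dtypes) of the first matching loop."""
    in_dtypes = [numpy.dtype(dt) for dt in in_dtypes]
    for sig in ufunc.types:  # entries look like 'ff->f' or 'dd->d'
        in_chars, out_chars = sig.split('->')
        if len(in_chars) != len(in_dtypes):
            continue
        # Accept the loop only if every input can be cast safely to it
        if all(numpy.can_cast(dt, numpy.dtype(c))
               for dt, c in zip(in_dtypes, in_chars)):
            return (tuple(numpy.dtype(c) for c in in_chars),
                    tuple(numpy.dtype(c) for c in out_chars))
    raise TypeError('no matching signature for {}'.format(ufunc.__name__))


sin_in, sin_out = find_ufunc_signature(numpy.sin, 'int32')
add_in, add_out = find_ufunc_signature(numpy.add, 'int8', 'int16')
```

Here `int32` input to `sin` selects the float64 loop, because int32 cannot be cast safely to float16 or float32.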

Current failures

Some ufuncs have a different upcasting behavior from Numpy's when called with a negative scalar. For example, logaddexp called with arguments -2 and a 'uint16' array results in a float64 for our code and in float32 in Numpy. This is because I use numpy.result_type(a, b) to determine the result dtype of both arguments, which yields numpy.int64 in our case, which in turn triggers upcasting to numpy.float64.
Not sure how to handle this.
Division by zero is not handled at all in this code, while Numpy does some checking and other magic. Maybe that's not very important.
Some other cases, everything documented in the test code.
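The upcasting described for logaddexp can be reproduced from dtypes alone. This sketch assumes the scalar -2 has already been given the dtype int64, as stated above; it does not model NumPy's own scalar handling:

```python
import numpy

scalar_dtype = numpy.dtype('int64')   # dtype assumed for the scalar -2
array_dtype = numpy.dtype('uint16')

# Combining both dtypes first yields int64 ...
combined = numpy.result_type(scalar_dtype, array_dtype)

# ... which no longer fits into float32, so float64 gets selected,
# although the uint16 array on its own would fit into float32.
combined_fits_float32 = numpy.can_cast(combined, numpy.float32)
array_fits_float32 = numpy.can_cast(array_dtype, numpy.float32)
```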

Random notes

How do you handle contributed code?
The commit history is not cleaned up but rather a port from development out-of-tree here. The "honest" history is there. (I did that mostly due to better test error reporting, see below).

Edit: Question on license and remark on commit history added.
Edit2: Removed long-ish output comparison between nose and pytest, and removed the license question again.
Edit3: Section on problems with some ufuncs removed, that issue is solved.
Edit4: Test failures documented.
Edit5: TODO about test scope added.
Edit6: TODO about info dict added, otherwise removed some stuff.
Edit7: Updated ndgpuarray TODO

abergeron · 2017-04-18T17:45:23Z

This looks very interesting, but I won't have time to review in the coming week. I'm sorry for the delay.

kohr-h · 2017-04-18T19:03:06Z

> This looks very interesting, but I won't have time to review in the coming week. I'm sorry for the delay.

Don't worry, I'm still finalizing the last part, but in the meanwhile some issues have also gone away magically :-)
I'll bump this for a review sometime soon.

kohr-h · 2017-04-19T11:48:57Z

Bump already.

This is actually in quite good shape now, only a few ufuncs missing and everything works apart from some documented cases and potentially corner cases that aren't tested for yet. I'll leave it at that for now and wait for a review.

kohr-h · 2017-05-16T22:43:13Z

Maybe somebody can look into this at a point in the not-too-distant future?

I'd also like to add that NumPy now exposes the __array_ufunc__ interface, which means that a class that implements __array_ufunc__ gets to decide what to do when a NumPy ufunc is called on it. That's very good since somebody calling np.sin on a GPU array will then actually call the super-fast GPU elementwise kernel instead of casting to a NumPy array and then calling the NumPy implementation. (In another way that's bad since users may forget who's the workhorse :-) ).

Anyway, that would be a natural follow-up of this PR.
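A minimal sketch of the `__array_ufunc__` mechanism, assuming NumPy 1.13 or later. `FakeGpuArray` is a hypothetical stand-in that only wraps a NumPy array; a real implementation would launch a GPU kernel where the comment indicates:

```python
import numpy as np

calls = []  # records which ufuncs were intercepted


class FakeGpuArray(object):
    """Hypothetical stand-in for a GPU array; wraps a NumPy array."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != '__call__' or kwargs:
            # reduce, accumulate etc. and keyword arguments are
            # deliberately left out of this sketch
            return NotImplemented
        calls.append(ufunc.__name__)
        unwrapped = [x.data if isinstance(x, FakeGpuArray) else x
                     for x in inputs]
        # A real implementation would run the GPU kernel here
        return FakeGpuArray(ufunc(*unwrapped))


result = np.sin(FakeGpuArray([0.0, 1.0]))
```

The call to `np.sin` never converts the object to a NumPy array itself; it hands control to the class, which decides what to run.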

nouiz · 2017-05-17T13:29:06Z

Not this week. It is nips deadline and Arnaud isn't here, but maybe/probably next week.

abergeron

I've noted a couple of minor problems, but I can't help but feel the approach is somewhat off.

This mimics the very surface of the functionality that ufuncs offer. It does not support accumulate, reduce and others. It does not allow the user to define their own ufuncs. It doesn't support the where and order arguments. Most importantly it doesn't unify the interface.

I understand that your time is limited and you might not have the time to do all those things right now, but the current structure would make those very hard to add. If at least all the ufuncs were instances of a class, then we could modify that class later to provide those things.

abergeron · 2017-05-18T14:03:01Z

pygpu/ufuncs.py

+def make_unary_ufunc(name, doc):
+    def wrapper(a, out=None):
+        return unary_ufunc(a, name, out)
+    wrapper.__qualname__ = wrapper.__name__ = name


__qualname__ doesn't exist in python 2
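One way to keep the attribute Python 3 only, mirroring the `PY3` guard that appears later in this thread. The wrapper body is a placeholder, not the PR's code:

```python
import sys

PY3 = sys.version_info[0] >= 3


def make_unary_ufunc(name):
    def wrapper(a, out=None):
        # Placeholder body; the real wrapper dispatches to unary_ufunc
        raise NotImplementedError(name)

    wrapper.__name__ = name
    if PY3:
        # __qualname__ only has a meaning on Python 3
        wrapper.__qualname__ = name
    return wrapper


sin = make_unary_ufunc('sin')
```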

abergeron · 2017-05-18T14:04:24Z

pygpu/ufuncs.py

+# Add the ufuncs to the module dictionary
+for ufunc_name in UNARY_UFUNCS:
+    npy_ufunc = getattr(numpy, ufunc_name)
+    descr = npy_ufunc.__doc__.splitlines()[2]


I'm not sure I like using numpy's documentation like this here. I think leaving these undocumented for now is probably better.

abergeron · 2017-05-18T14:05:09Z

pygpu/ufuncs.py

+See Also
+--------
+numpy.{}
+""".format(ufunc_name)


What if there is already a See Also section?
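If the docstrings were kept, one possible way to handle an existing section is to extend it rather than append a second one. `add_see_also` is a hypothetical helper that assumes numpydoc-style headers underlined with dashes:

```python
def add_see_also(doc, entry):
    """Add `entry` to the See Also section of `doc`, creating it if absent."""
    lines = doc.rstrip().splitlines()
    for i, line in enumerate(lines[:-1]):
        underline = lines[i + 1].strip()
        if (line.strip() == 'See Also' and underline and
                set(underline) == {'-'}):
            # Reuse the header's indentation for the new entry
            indent = line[:len(line) - len(line.lstrip())]
            lines.insert(i + 2, indent + entry)
            return '\n'.join(lines) + '\n'
    # No section found: append a new one
    return '\n'.join(lines + ['', 'See Also', '--------', entry]) + '\n'


without_section = add_see_also('Sine.\n', 'numpy.sin')
with_section = add_see_also('Sine.\n\nSee Also\n--------\ncos\n', 'numpy.sin')
```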

abergeron · 2017-05-18T17:20:07Z

src/gpuarray_buffer_cuda.c

@@ -1025,6 +1025,8 @@ static int call_compiler(cuda_context *ctx, strb *src, strb *ptx, strb *log) {
    , "-G", "-lineinfo"
  };
  nvrtcResult err;
+  // DEBUG: print the kernel source
+  // printf("%s\n", src);


Please remove this

abergeron · 2017-05-18T17:48:24Z

pygpu/ufuncs.py

+    ('rad2deg', 'radians'),
+    ('true_divide', 'divide'),
+    ('maximum', 'fmax'),  # TODO: differ in NaN propagation in numpy, doable?
+    ('minimum', 'fmin'),  # TODO: differ in NaN propagation in numpy, doable?


Adjusting NaN propagation is doable with a bit of C code. If you don't want to do it, I'd rather we don't provide fmax/fmin than provide them with the wrong behavior.
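For reference, the difference in NaN handling between NumPy's maximum and fmax can be modeled on scalars. This is a pure-Python sketch of the semantics, not the C code a kernel would need; the function names are made up:

```python
import math


def maximum_scalar(a, b):
    """Like numpy.maximum: NaN in either argument propagates."""
    if math.isnan(a) or math.isnan(b):
        return float('nan')
    return a if a > b else b


def fmax_scalar(a, b):
    """Like numpy.fmax: NaN is ignored unless both arguments are NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b


nan = float('nan')
propagated = maximum_scalar(1.0, nan)
ignored = fmax_scalar(1.0, nan)
```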

kohr-h · 2017-05-18T22:10:30Z

> I've noted a couple of minor problems, but I can't help but feel the approach is somewhat off.
>
> This mimics the very surface of the functionality that ufuncs offer. It does not support accumulate, reduce and others. It does not allow the user to define their own ufuncs. It doesn't support the where and order arguments. Most importantly it doesn't unify the interface.
>
> I understand that your time is limited and you might not have the time to do all those things right now, but the current structure would make those very hard to add. If at least all the ufuncs were instances of a class, then we could modify that class later to provide those things.

Thanks for the quick review. I agree that some flexibility wouldn't hurt. Making the ufuncs class instances that implement __call__ shouldn't be a big deal.
On the other hand, I don't see how this keeps users from defining their own ufuncs. It's actually easier when they don't have to adhere to a specific class interface. In some sense, the elemwise helpers provide exactly that.

Anyway, my approach here was to get the basic stuff working first and not let the amount of code get too much out of hand.
Certainly ufunc.reduce seems relatively easily doable, maybe also outer, but for the rest (accumulate, at, reduceat and the where argument) I don't really have a good idea. The last 3 require index arrays with broadcasting and all, which is not trivial as far as I can tell. The accumulate function may be doable with some mako templating, but I assume it would be rather slow due to bad parallelism.

Another way of playing this would be to just call into NumPy's function (e.g. at) in case there is no native implementation, instead of leaving it unimplemented. An upside of that would be that people can write code that works, and which is slow now but becomes fast when someone sits down and implements the feature. In the other scenario, people would use the NumPy ufunc and never learn about the added native feature.

I'll do the minimal fix and make ufuncs classes in any case.

kohr-h · 2017-06-06T22:50:24Z

I made ufuncs class instances now, with a rather lame solution for ufuncs with different signatures (they are different classes; if you know a good solution for creating a callable class with __call__ signature changing per instance, please let me know.)

I also added some native ufunc.reduce methods. For all other methods (like accumulate) the code simply calls into the corresponding Numpy function. What's currently missing is a fallback for at, I'll add that.
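One possible answer to the per-instance signature question, on Python 3 only: `inspect.signature` honors a `__signature__` attribute set on the instance. The `Ufunc` class below is a hypothetical sketch and not the class added in this PR:

```python
import inspect


class Ufunc(object):
    """Hypothetical callable whose reported signature is set per instance."""

    def __init__(self, name, nin):
        self.name = name
        kind = inspect.Parameter.POSITIONAL_OR_KEYWORD
        params = [inspect.Parameter(n, kind) for n in ['a', 'b'][:nin]]
        params.append(inspect.Parameter('out', kind, default=None))
        # inspect.signature() picks this attribute up for the instance
        self.__signature__ = inspect.Signature(params)

    def __call__(self, *args, **kwargs):
        # bind() raises TypeError if the arguments don't fit the signature
        bound = self.__signature__.bind(*args, **kwargs)
        bound.apply_defaults()
        return self.name, dict(bound.arguments)  # placeholder result


sin = Ufunc('sin', 1)
add = Ufunc('add', 2)
```

The actual `__call__` still takes `*args, **kwargs`; only the signature reported to introspection tools and the argument check differ per instance.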

abergeron · 2017-06-07T14:44:28Z

pygpu/ufuncs.py

+
+    if need_context:
+        ctx = get_default_context()
+        cls = ndgpuarray  # TODO: sensible choice as default?


This is ok for now.

abergeron · 2017-06-07T14:45:01Z

pygpu/ufuncs.py

+        ctx = get_default_context()
+        cls = ndgpuarray  # TODO: sensible choice as default?
+
+    # TODO: can CPU memory handed directly to kernels?


No it can't. Unless it's a scalar.

abergeron · 2017-06-07T14:46:24Z

pygpu/ufuncs.py

+        if a.flags.f_contiguous and not a.flags.c_contiguous:
+            order = 'F'
+        else:
+            order = 'C'


You can pass a directly to array(). It will handle all the array-like things

abergeron · 2017-06-07T14:48:43Z

pygpu/ufuncs.py

+            need_context = False
+
+    if need_context:
+        ctx = get_default_context()


Instead of doing this, it would be better to add an additional argument to the signature to specify the context.

You can make it a keyword argument to avoid major conflicts with numpy.

If no context is specified, the default context (if any) should automatically be used.
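The lookup order suggested here could be sketched as follows. All names are hypothetical stand-ins, and the sketch assumes that GPU array arguments expose a `context` attribute; how to treat an explicit context that conflicts with an array's context is left open:

```python
_default_context = None  # hypothetical module-level default


def set_default_context(ctx):
    global _default_context
    _default_context = ctx


def resolve_context(args, context=None):
    """Pick a context: explicit keyword, then array arguments, then default."""
    if context is not None:
        return context
    for arg in args:
        ctx = getattr(arg, 'context', None)
        if ctx is not None:
            return ctx
    if _default_context is None:
        raise ValueError('no context given and no default context set')
    return _default_context
```

A ufunc wrapper would then accept `context=None` as a keyword argument and call `resolve_context((a, b), context)` before converting array-like inputs.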

abergeron · 2017-06-07T15:06:28Z

pygpu/ufuncs.py

+        neutral = 'INFINITY'
+    elif numpy.issubsctype(a.dtype, numpy.complexfloating):
+        raise ValueError('array dtype {!r} not comparable'
+                         ''.format(a.dtype.name))


Just the else should be enough.

abergeron · 2017-06-07T15:08:01Z

pygpu/ufuncs.py

+
+
+# This dictionary is derived from Numpy's C99_FUNCS list, see
+# https://github.com/numpy/numpy/search?q=C99_FUNCS


Take care with this because the code you are generating is not C99, but CUDA or OpenCL. I'm pretty sure at least some of these will not be available, like cbrt.

So far I haven't found any that don't work, but that will probably depend on the CUDA version and may also be different for OpenCL.

abergeron · 2017-06-07T16:16:14Z

In numpy the ufuncs are in C and they use the fact that python can't know the signature of C functions to trick the interpreter into thinking that there is a fixed number of arguments.

In reality the functions accept any number of arguments and perform a check on them to limit the number according to the instance.

In python it would be possible to "patch in" a call to have differing signatures, but I don't think this is worth the effort.

Also, if we don't have certain methods because they aren't available on the GPU (like at(), accumulate(), ...) then I'd rather they stay unimplemented for now rather than fall back to the CPU. This way it can at least attract attention to the fact that it needs more work if somebody needs this.
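The two points of this comment, a generic argument list checked against the instance and methods that stay unimplemented instead of falling back to the CPU, could look roughly like this. The class is a hypothetical sketch with placeholder return values:

```python
class Ufunc(object):
    """Hypothetical ufunc class that checks its argument count per instance."""

    def __init__(self, name, nin, nout=1):
        self.name = name
        self.nin = nin
        self.nout = nout

    def __call__(self, *args):
        # Accept any number of arguments, then check against the instance
        if not self.nin <= len(args) <= self.nin + self.nout:
            raise TypeError(
                '{}() takes from {} to {} positional arguments but {} '
                'were given'.format(self.name, self.nin,
                                    self.nin + self.nout, len(args)))
        inputs, outputs = args[:self.nin], args[self.nin:]
        return inputs, outputs  # placeholder for the kernel call

    def accumulate(self, *args, **kwargs):
        # No CPU fallback: make the missing feature visible
        raise NotImplementedError(
            '{}.accumulate is not implemented'.format(self.name))


add = Ufunc('add', 2)
```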

abergeron · 2017-06-12T15:47:18Z

jenkins test this please

kohr-h · 2017-06-12T15:52:00Z

Circular dependency, needs fix

abergeron · 2017-06-12T15:41:40Z

pygpu/ufuncs.py

-        self.accumulate.__name__ = self.accumulate.__qualname__ = 'accumulate'
+        self.accumulate.__name__ = 'accumulate'
+        if PY3:
+            self.accumulate.__qualname__ = 'accumulate'


Why is this done differently from the others?

Not intentionally, copy-paste error.

abergeron · 2017-06-12T16:22:05Z

jenkins test this please

abergeron · 2017-06-12T16:25:04Z

jenkins test this please

abergeron · 2017-06-12T16:57:50Z

It seems your code is not ready for python2. Make sure to test it on a python2 install somewhere.

six can help with portability issues.

kohr-h · 2017-06-12T17:01:23Z

There were some other issues. There is still a failure with ldexp (doesn't raise where it should), will look into it.

kohr-h · 2017-06-12T22:32:42Z

Py2 failed due to some string issue, but there was also an actual issue with deg2rad: the constant was missing precision. I also added an optional context argument to everything and simplified the ufunc wrapper a bit. Still a bit of remaining work, but we're getting closer.

Numpy does some strange things with bools in ufuncs, we just rely on C behavior. The thus failing tests are filtered out. Other tests for corner cases fail due to non-obvious handling of zero division and upcasting in Numpy, they're filtered out, too.
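Two examples of NumPy's special treatment of bools, shown only to illustrate the kind of behavior meant. They assume NumPy 1.13 or later; which cases actually differ in this PR is documented in the test code, not here:

```python
import numpy

# Adding two bools in NumPy acts as a logical OR and stays bool
bool_sum = numpy.add(numpy.True_, numpy.True_)

# Subtracting bools is rejected outright in NumPy >= 1.13
try:
    numpy.subtract(numpy.True_, numpy.False_)
    subtract_raises = False
except TypeError:
    subtract_raises = True
```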


Holger Kohr added 10 commits July 6, 2017 11:44

ENH: add numpy-style ufuncs and reductions

4730095

TST: add tests for ufuncs and reductions

baf0d9b

TST: add test and fixes for binary ufuncs with scalars

159133f

TEMP: insert debug print into call_compiler to see kernel source

66d69b9

TST: rewrite tests in yield style

f03bf65

MAINT: various minor fixes

988adf8

ENH: add all and any reductions

3579188

BUG: fix ufuncs for bool dtype

696751f

BUG: fix power function rounding issue

55018f1

TST: fix or filter out tests of ufunc corner cases

1dbe9cd

Holger Kohr added 19 commits July 6, 2017 11:44

ENH: use preamble for more complex ufuncs

be93694

ENH: add some more ufuncs

a39d64a

ENH: implement two-out ufuncs and fix spacing

7d6fe84

BUG: fix NaN comparison in ufuncs

994eaeb

BUG: fix binary ufunc for non-array second op

fa04da3

ENH: make ufuncs classes

61b75cf

TST: add tests for ufunc.reduce

d1c5977

ENH: use new reductions on ndgpuarray

fccfeea

MAINT: remove Numpy fallbacks in ufuncs

fed41eb

ENH: add __array_ufunc__ interface and more ndgpuarray methods

0a8c136

MAINT: minor fixes

2c99f57

BUG: fix circular imports

bc14f52

BUG: fix qualname of accumulate

627a448

BUG: remove imports from builtin

b6a8901

MAINT: raise TypeError for deprecated boolean ops in numpy >= 1.13

48906b7

TST: avoid failing test due to not implemented reduction

0bf6043

BUG: fix missing precision in deg2rad constant

9f0a578

MAINT: add optional context parameter to ufuncs and friends

dd88c9a

MAINT: remove debug print statement from gpuarray_buffer_cuda.c

efea028

kohr-h mentioned this pull request Aug 21, 2017

slow maximum function odlgroup/odlcuda#24

Open
