Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-84016: Implement missing PyObject_CopyToObject in PEP3118 #124747

Closed
wants to merge 21 commits into from

Conversation

rruuaanng
Copy link
Contributor

@rruuaanng rruuaanng commented Sep 29, 2024

https://peps.python.org/pep-3118/
PEP3118 specifies the following API requirements

Py_ssize_t PyBuffer_SizeFromFormat(const char *)
PyObject * PyMemoryView_GetContiguous(PyObject *obj,  int buffertype,
                                      char fortran)
int PyObject_CopyToObject(PyObject *obj, void *buf, Py_ssize_t len,
                          char fortran)
int PyObject_CopyData(PyObject *dest, PyObject *src)
int PyBuffer_IsContiguous(Py_buffer *view, char fortran)
void PyBuffer_FillContiguousStrides(int ndim, Py_ssize_t *shape,
                                    Py_ssize_t *strides, Py_ssize_t itemsize,
                                    char fortran)
int PyBuffer_FillInfo(Py_buffer *view, void *buf,
                      Py_ssize_t len, int readonly, int infoflags)

However, in the existing implementation, there is no PyObject_CopyToObject. So, the significance of this PR is this.

@rruuaanng rruuaanng marked this pull request as draft September 29, 2024 06:48
@rruuaanng rruuaanng changed the title Implement PyObject_CopyToObject in PEP3118 gh-84016: Implement PyObject_CopyToObject in PEP3118 Sep 29, 2024
@rruuaanng rruuaanng changed the title gh-84016: Implement PyObject_CopyToObject in PEP3118 gh-84016: Implement PyObject_CopyToObject in **PEP3118** Sep 29, 2024
@rruuaanng rruuaanng changed the title gh-84016: Implement PyObject_CopyToObject in **PEP3118** gh-84016: Implement PyObject_CopyToObject in PEP3118 Sep 29, 2024
@rruuaanng rruuaanng changed the title gh-84016: Implement PyObject_CopyToObject in PEP3118 gh-84016: Implement the missing PyObject_CopyToObject in PEP3118 Sep 29, 2024
@rruuaanng rruuaanng changed the title gh-84016: Implement the missing PyObject_CopyToObject in PEP3118 gh-84016: Implement missing PyObject_CopyToObject in PEP3118 Sep 29, 2024
@rruuaanng rruuaanng marked this pull request as ready for review September 29, 2024 09:35
Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are required for this functionality. You can have a look at how tests are implemented in the Modules/_testcapi folder.

Objects/abstract.c Outdated Show resolved Hide resolved
Objects/abstract.c Outdated Show resolved Hide resolved
Doc/c-api/buffer.rst Outdated Show resolved Hide resolved
Objects/abstract.c Outdated Show resolved Hide resolved
Objects/abstract.c Outdated Show resolved Hide resolved
Objects/abstract.c Outdated Show resolved Hide resolved
@picnixz
Copy link
Contributor

picnixz commented Sep 29, 2024

Please @rruuaanng, before marking a review comment as resolved, either add an emote to indicate whether you made the changes or not, or if you did not make the changes, write a comment to explain why and let the reviewer close the discussion once they think it's resolved.

@rruuaanng
Copy link
Contributor Author

Please @rruuaanng, before marking a review comment as resolved, either add an emote to indicate whether you made the changes or not, or if you did not make the changes, write a comment to explain why and let the reviewer close the discussion once they think it's resolved.

Oh sorry, the incidents I marked resolved have been submitted, but there is no reply.

@rruuaanng
Copy link
Contributor Author

rruuaanng commented Sep 29, 2024

As I commented on the issue, I think we need to get some more evidence that this is actually needed.

I propose an application scenario. If you need to copy the data of a C array to an object with high performance, this seems to be the most direct way, and it also meets the operation interface of PEP3118.

Edit: The second reason is that PEP3118 does not implement this interface, making the standard incomplete. In fact, this makes the coverage of the standard smaller.

@rruuaanng rruuaanng marked this pull request as draft October 1, 2024 03:53
@encukou
Copy link
Member

encukou commented Oct 1, 2024

PEP-3118 is an enhancement proposal, not a standard. If this wasn't done together with the original change in 3.0, it should be treated as a new change.

@rruuaanng
Copy link
Contributor Author

PEP-3118 is an enhancement proposal, not a standard. If this wasn't done together with the original change in 3.0, it should be treated as a new change.

Yes, I think you are right, but this commit I think is very meaningful, because it makes it possible to quickly copy C memory into objects. And I added corresponding documentation, stable instructions and corresponding tests for this (currently being revised).

@rruuaanng rruuaanng marked this pull request as ready for review October 2, 2024 05:56
Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General remarks:

  • Several core developers have expressed concern about the usefulness of this API. Even if it is part of a PEP, it would be good to know why it was not implemented with the rest of the API.
  • Is it actually possible to simply use PyBuffer_FromContiguous instead? namely doing everything using a Py_buffer object and then using it? If so, I don't think it's worth the addition of that API.
  • Tests are insufficient IMO. They should cover more cases, e.g., when the buffer is not filled until the end.

Following those observations, I'd like not to be requested for a review until we are sure to add this interface.

Comment on lines +762 to +767
/* just copy it directly through memcpy */
if (fort == 'C' || fort == 'F') {
memcpy(view_obj.buf, buf, len);
PyBuffer_Release(&view_obj);
return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is still incorrect; memcpy should only be used if the destination is one-dimensional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the target is multi-dimensional, their memory is still continuous. Refer to PyObject_CopyData for details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry but maybe there is some misunderstanding. I suggest reading this post to understand the difference between a C and a F-style ordering (namely row vs column-major order): https://manik.cc/2021/02/25/memory-order.html.

Note that PyObject_CopyData only uses memcpy when the memory is contiguous, and this is checked via PyBuffer_IsContiguous.

@rruuaanng
Copy link
Contributor Author

rruuaanng commented Oct 3, 2024

General remarks:

* Several core developers have expressed concern about the usefulness of this API. Even if it is part of a PEP, it would be good to know why it was not implemented with the rest of the API.

* Is it actually possible to simply use `PyBuffer_FromContiguous` instead? namely doing everything using a `Py_buffer` object and then using it? If so, I don't think it's worth the addition of that API.

* Tests are insufficient IMO. They should cover more cases, e.g., when the buffer is not filled until the end.

Following those observations, I'd like not to be requested for a review until we are sure to add this interface.

I agree with picnixz, because I noticed that there is already a PyBuffer_FromContiguous. So I think my PR may be able to change from a new function to an alias of PyBuffer_FromContiguous, so as to fully implement PEP3118 and reuse its functionality.

Edit: My reasoning for this is that PyBuffer_FromContiguous is a weird name for PyObject_CopyToObject, which is not that intuitive (it’s important that users can directly know from the pep3118 homepage that there is such an API).

Like this

int
PyObject_CopyToObject(PyObject *obj, const void *buf, Py_ssize_t len, char fort)
{
    Py_buffer view_obj;

    if (PyObject_GetBuffer(obj, &view_obj, PyBUF_FULL)) {
        return -1;
    }
    return PyBuffer_FromContiguous(&view_obj, buf, len, fort);
}

Edit: And I personally think that although it is difficult to find the implementation author at present, it seems that the name of PyObject_CopyToObject in the existing implementation is PyBuffer_FromContiguous. I think it may be possible to change this name or use the alias I suggested above.

@rruuaanng
Copy link
Contributor Author

cc @vstinner

@vstinner
Copy link
Member

vstinner commented Oct 3, 2024

@JelleZijlstra:

As I commented on the issue, I think we need to get some more evidence that this is actually needed.

I concur with Jelle. I'm not convinced that we need this feature.


Copy data from *buf* to *obj* buffer. Can convert between C-style and
or Fortran-style buffers (C-style if *fort* is ``'C'`` or Fortran-style
if *fort* is ``'F'``, ``'A'`` or other for defalut quick copy).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if *fort* is ``'F'``, ``'A'`` or other for defalut quick copy).
if *fort* is ``'F'``, ``'A'`` or other for default quick copy).

or Fortran-style buffers (C-style if *fort* is ``'C'`` or Fortran-style
if *fort* is ``'F'``, ``'A'`` or other for defalut quick copy).

``0`` is returned on success, ``-1`` on error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``0`` is returned on success, ``-1`` on error.
Return ``0`` on success, or return ``-1`` on error.

s1 = copy_to_object(bytearray(3), 'abc', b'C')
s2 = copy_to_object(bytearray(3), 'abc', b'F')
s3 = copy_to_object(bytearray(3), 'abc', b'A')
self.assertEqual(s1, s2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to compare it to a known bytearray, such as: bytearray(b'abc').

int result;
char *buf, fort;

if (!PyArg_ParseTuple(args, "Os#c", &obj, &buf, &len, &fort)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not expect Unicode, but raw bytes:

Suggested change
if (!PyArg_ParseTuple(args, "Os#c", &obj, &buf, &len, &fort)) {
if (!PyArg_ParseTuple(args, "Oy#c", &obj, &buf, &len, &fort)) {

@@ -2283,6 +2283,8 @@
added = '3.11'
[function.PyObject_CopyData]
added = '3.11'
[function.PyObject_CopyToObject]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add this function directly to the stable ABI: remove this entry.

self.assertRaises(BufferError, copy_to_object, bytearray(2), 'abc', b'A')
self.assertRaises(TypeError, copy_to_object, list(), 'abc', b'C')
self.assertRaises(TypeError, copy_to_object, list(), 'abc', b'F')
self.assertRaises(TypeError, copy_to_object, list(), 'abc', b'A')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should test read-only buffer.

@vstinner
Copy link
Member

vstinner commented Oct 3, 2024

@rruuaanng: I suggest you to put the implementation on hold until we decide if this function is really needed. Instead, you can answer to @JelleZijlstra questions. See also his question on the issue.

@rruuaanng
Copy link
Contributor Author

I have already given the answer to this question above. And wait for the results of the core discussion.

@vstinner
Copy link
Member

vstinner commented Oct 3, 2024

@rruuaanng:

I have already given the answer to this question above. And wait for the results of the core discussion.

I'm thinking at this question of @JelleZijlstra:

If you'd like this function implemented, could you provide examples of existing code (e.g., in existing C extensions) that would be made better by this function?

I didn't see any convincing example so far.

@rruuaanng
Copy link
Contributor Author

rruuaanng commented Oct 3, 2024

@rruuaanng:

I have already given the answer to this question above. And wait for the results of the core discussion.

I'm thinking at this question of @JelleZijlstra:

If you'd like this function implemented, could you provide examples of existing code (e.g., in existing C extensions) that would be made better by this function?

I didn't see any convincing example so far.

It seems that I didn’t see this issue. However, that’s fine. My response to the issue is that if the user needs to directly write to an already encapsulated PyObject, rather than obtaining a buffer through PyObject_GetBuffer and performing other operations, they can directly choose the API implemented in this PR. It provides an intuitive and fast way to write the desired data into a PyObject.
(I would like to reiterate that this PR is a supplement to the missing API in PEP 3118. Although there is a function PyBuffer_FromContiguous for operating on Py_buffer, it still requires using PyObject_GetBuffer rather than the quick and direct operation I mentioned earlier).

I'm actually planning to modify my PR to make it an alias, which would fully implement PEP 3118 while reusing existing functionality to make the API more expressive.

int
PyObject_CopyToObject(PyObject *obj, const void *buf, Py_ssize_t len, char fort)
{
    Py_buffer view_obj;

    if (PyObject_GetBuffer(obj, &view_obj, PyBUF_FULL)) {
        return -1;
    }
    return PyBuffer_FromContiguous(&view_obj, buf, len, fort);
}

Edit: I think this makes sense for users, because intuitively, most programmers will choose XxxToXxx for writing xxx to xxx, rather than implementing a combination of multiple operations.

Edit:Please give me some time if you can. I'm looking for a stronger proof.

@rruuaanng
Copy link
Contributor Author

rruuaanng commented Oct 4, 2024

In pygl/src/buffers/buffer.c

static PyObject *buf_read(PyBuffer *self, PyObject *args, PyObject *kwargs)
{
    static char *kwNames[] = {"out", "size", "offset", NULL};

    PyObject *result = NULL;

    Py_buffer outBuffer = {0};
    GLsizeiptr size = 0;
    GLintptr offset = 0;
    if (!PyArg_ParseTupleAndKeywords(
            args, kwargs, "w*L|L", kwNames,
            &outBuffer, &size, &offset))
        goto end;

    if (!utils_check_buffer_contiguous(&outBuffer))
        goto end;

    THROW_IF_GOTO(
        size + offset > self->size,
        PyExc_ValueError,
        "Requested size and offset exceed buffer size.",
        end);

    if (size > outBuffer.len)
    {
        PyErr_Format(PyExc_ValueError, "Buffer that was passed in is smaller than requested size. (buffer size: %zu, requested: %zu)", outBuffer.len, size);
        goto end;
    }

    if (self->flags & GL_DYNAMIC_STORAGE_BIT)
        glGetNamedBufferSubData(self->id, offset, size, outBuffer.buf);
    else
    {
        THROW_IF_GOTO(
            !(self->flags & GL_MAP_READ_BIT),
            PyExc_RuntimeError,
            "Mappable buffer has to have flag MAP_READ_BIT set to allow reading.",
            end);

        THROW_IF_GOTO(
            self->dataPtr == NULL,
            PyExc_RuntimeError,
            "Mappable buffer has to be mapped before attempting to read data. Map buffer or use MAP_PERSISTENT_BIT.",
            end);

        int result = PyBuffer_FromContiguous(&outBuffer, (char *)self->dataPtr + offset, size, 'C');
        THROW_IF_GOTO(
            result == -1,
            PyExc_RuntimeError,
            "Couldn't read data from buffer.",
            end);
    }

    result = Py_NewRef(Py_None);

end:
    if (outBuffer.buf != NULL)
        PyBuffer_Release(&outBuffer);

    return result;
}

The following it's my change

static PyObject *buf_read(PyBuffer *self, PyObject *args, PyObject *kwargs)
{
    ...
    if (!PyArg_ParseTupleAndKeywords(
            args, kwargs, "OL|L", kwNames,
            &obj, &size, &offset))
        goto end;
    
    ...

    PyObject_GetBuffer(&obj, &view_obj, PyBUF_FULL);
    if (self->flags & GL_DYNAMIC_STORAGE_BIT) {
        glGetNamedBufferSubData(self->id, offset, size, view_obj.buf);
    }
    PyObject_CopyToObject(obj, (char *)self->dataPtr + offset, size, 'C');
    ...
}

Hmmm, this example doesn't seem to be the most appropriate, but I did look into buffer operations in libraries such as NumPy and SciPy. It seems that they use their own implemented APIs instead of the API provided by CPython. For the modified function I provided (although the code is really not good, please ignore the other parts I did not change and only focus on the function return parameters), I believe this simplifies a lot of redundant checks (such as continuity and buffer size). I want to emphasize that users do not need to write multiple function combinations; they can just use one function. In addition, the buffer it returns is PyObject instead of Py_buffer, which makes the code more general (it seems a bit far-fetched, but I'm sorry, I really can't find other examples)

@erlend-aasland
Copy link
Contributor

@rruuaanng, instead of prematurely opening a PR, wait for consensus on Discourse or on the issue before proposing a change. I'm closing this as of the comments from several core devs and triagers. Feel free to re-open this PR on your on your own fork. You can reopen it here if there is consensus on implementing this feature.

Thanks!

@rruuaanng rruuaanng deleted the gh84016 branch October 15, 2024 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants