-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-84016: Implement missing PyObject_CopyToObject
in PEP3118
#124747
Conversation
PyObject_CopyToObject
in PEP3118PyObject_CopyToObject
in PEP3118
PyObject_CopyToObject
in PEP3118PyObject_CopyToObject
in **PEP3118**
PyObject_CopyToObject
in **PEP3118**PyObject_CopyToObject
in PEP3118
PyObject_CopyToObject
in PEP3118
PyObject_CopyToObject
in PEP3118
PyObject_CopyToObject
in PEP3118PyObject_CopyToObject
in PEP3118
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are required for this functionality. You can have a look at how tests are implemented in the Modules/_testcapi
folder.
Misc/NEWS.d/next/C_API/2024-09-29-14-34-53.gh-issue-84016.f1ThMN.rst
Outdated
Show resolved
Hide resolved
Please @rruuaanng, before marking a review comment as resolved, either add an emote to indicate whether you made the changes or not, or if you did not make the changes, write a comment to explain why and let the reviewer close the discussion once they think it's resolved. |
Oh sorry, the incidents I marked resolved have been submitted, but there is no reply. |
I propose an application scenario. If you need to copy the data of a C array to an object with high performance, this seems to be the most direct way, and it also meets the operation interface of PEP3118. Edit: The second reason is that PEP3118 does not implement this interface, making the standard incomplete. In fact, this makes the coverage of the standard smaller. |
PEP-3118 is an enhancement proposal, not a standard. If this wasn't done together with the original change in 3.0, it should be treated as a new change. |
Yes, I think you are right, but this commit I think is very meaningful, because it makes it possible to quickly copy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General remarks:
- Several core developers have expressed concern about the usefulness of this API. Even if it is part of a PEP, it would be good to know why it was not implemented with the rest of the API.
- Is it actually possible to simply use
PyBuffer_FromContiguous
instead? namely doing everything using aPy_buffer
object and then using it? If so, I don't think it's worth the addition of that API. - Tests are insufficient IMO. They should cover more cases, e.g., when the buffer is not filled until the end.
Following those observations, I'd like not to be requested for a review until we are sure to add this interface.
/* just copy it directly through memcpy */ | ||
if (fort == 'C' || fort == 'F') { | ||
memcpy(view_obj.buf, buf, len); | ||
PyBuffer_Release(&view_obj); | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is still incorrect; memcpy
should only be used if the destination is one-dimensional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the target is multi-dimensional, their memory is still continuous. Refer to PyObject_CopyData for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry but maybe there is some misunderstanding. I suggest reading this post to understand the difference between a C and a F-style ordering (namely row vs column-major order): https://manik.cc/2021/02/25/memory-order.html.
Note that PyObject_CopyData only uses memcpy when the memory is contiguous, and this is checked via PyBuffer_IsContiguous
.
I agree with picnixz, because I noticed that there is already a Edit: My reasoning for this is that Like this
Edit: And I personally think that although it is difficult to find the implementation author at present, it seems that the name of |
cc @vstinner |
I concur with Jelle. I'm not convinced that we need this feature. |
|
||
Copy data from *buf* to *obj* buffer. Can convert between C-style and | ||
or Fortran-style buffers (C-style if *fort* is ``'C'`` or Fortran-style | ||
if *fort* is ``'F'``, ``'A'`` or other for defalut quick copy). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if *fort* is ``'F'``, ``'A'`` or other for defalut quick copy). | |
if *fort* is ``'F'``, ``'A'`` or other for default quick copy). |
or Fortran-style buffers (C-style if *fort* is ``'C'`` or Fortran-style | ||
if *fort* is ``'F'``, ``'A'`` or other for defalut quick copy). | ||
|
||
``0`` is returned on success, ``-1`` on error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
``0`` is returned on success, ``-1`` on error. | |
Return ``0`` on success, or return ``-1`` on error. |
s1 = copy_to_object(bytearray(3), 'abc', b'C') | ||
s2 = copy_to_object(bytearray(3), 'abc', b'F') | ||
s3 = copy_to_object(bytearray(3), 'abc', b'A') | ||
self.assertEqual(s1, s2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to compare it to a known bytearray, such as: bytearray(b'abc')
.
int result; | ||
char *buf, fort; | ||
|
||
if (!PyArg_ParseTuple(args, "Os#c", &obj, &buf, &len, &fort)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not expect Unicode, but raw bytes:
if (!PyArg_ParseTuple(args, "Os#c", &obj, &buf, &len, &fort)) { | |
if (!PyArg_ParseTuple(args, "Oy#c", &obj, &buf, &len, &fort)) { |
@@ -2283,6 +2283,8 @@ | |||
added = '3.11' | |||
[function.PyObject_CopyData] | |||
added = '3.11' | |||
[function.PyObject_CopyToObject] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't add this function directly to the stable ABI: remove this entry.
self.assertRaises(BufferError, copy_to_object, bytearray(2), 'abc', b'A') | ||
self.assertRaises(TypeError, copy_to_object, list(), 'abc', b'C') | ||
self.assertRaises(TypeError, copy_to_object, list(), 'abc', b'F') | ||
self.assertRaises(TypeError, copy_to_object, list(), 'abc', b'A') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should test read-only buffer.
@rruuaanng: I suggest you to put the implementation on hold until we decide if this function is really needed. Instead, you can answer to @JelleZijlstra questions. See also his question on the issue. |
I have already given the answer to this question above. And wait for the results of the core discussion. |
I'm thinking at this question of @JelleZijlstra:
I didn't see any convincing example so far. |
It seems that I didn’t see this issue. However, that’s fine. My response to the issue is that if the user needs to directly write to an already encapsulated PyObject, rather than obtaining a buffer through I'm actually planning to modify my PR to make it an alias, which would fully implement PEP 3118 while reusing existing functionality to make the API more expressive. int
PyObject_CopyToObject(PyObject *obj, const void *buf, Py_ssize_t len, char fort)
{
Py_buffer view_obj;
if (PyObject_GetBuffer(obj, &view_obj, PyBUF_FULL)) {
return -1;
}
return PyBuffer_FromContiguous(&view_obj, buf, len, fort);
} Edit: I think this makes sense for users, because intuitively, most programmers will choose XxxToXxx for writing Edit:Please give me some time if you can. I'm looking for a stronger proof. |
In
The following it's my change
Hmmm, this example doesn't seem to be the most appropriate, but I did look into buffer operations in libraries such as NumPy and SciPy. It seems that they use their own implemented APIs instead of the API provided by CPython. For the modified function I provided (although the code is really not good, please ignore the other parts I did not change and only focus on the function return parameters), I believe this simplifies a lot of redundant checks (such as continuity and buffer size). I want to emphasize that users do not need to write multiple function combinations; they can just use one function. In addition, the buffer it returns is PyObject instead of Py_buffer, which makes the code more general (it seems a bit far-fetched, but I'm sorry, I really can't find other examples) |
@rruuaanng, instead of prematurely opening a PR, wait for consensus on Discourse or on the issue before proposing a change. I'm closing this as of the comments from several core devs and triagers. Feel free to re-open this PR on your on your own fork. You can reopen it here if there is consensus on implementing this feature. Thanks! |
https://peps.python.org/pep-3118/
PEP3118 specifies the following API requirements
However, in the existing implementation, there is no
PyObject_CopyToObject
. So, the significance of this PR is this.