-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] gh-123358: Update LOAD_DEREF to use stackref and atomic incref #124463
base: main
Are you sure you want to change the base?
Conversation
@Fidget-Spinner I am still working on :) I will ping you once I am ready! |
{ | ||
PyObject *value; | ||
#ifdef Py_GIL_DISABLED | ||
value = _Py_atomic_load_ptr(&cell->ob_ref); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colesbury Let's extract the utility function with separated PRs.
ec7594a
to
f573fa2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fidget-Spinner
Ken, Would you like to take a look at overall approach for both stack effect and stackref side?
I am not sure that I do correctly so before checking why Windows free-threaded is failed, I would like to get rough review.
Thank you for understanding.
Include/internal/pycore_cell.h
Outdated
FT_ATOMIC_STORE_PTR_RELEASE(old_value, cell->ob_ref); | ||
FT_ATOMIC_STORE_PTR_RELEASE(cell->ob_ref, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little suspicious. Why do we need the atomics when there's the critical section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check,, yeah we may don't need this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SUMMARY: ThreadSanitizer: data race /home/runner/work/cpython/cpython/./Include/internal/pycore_cell.h:23:18 in PyCell_SwapTakeRef
Maybe TSAN thought that they could be memory ordering issue.
@colesbury, Do you think that this would be false alarm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the second atomic write here (to cell->ob_ref
), but not the first one. In other words, something like:
old_value = cell->ob_ref;
FT_ATOMIC_STORE_PTR_RELEASE(cell->ob_ref, value);
Although the write happens in the critical section, the read happens (with this PR) outside of the lock. This is the same pattern we use in dictionaries :
Lines 278 to 279 in da5855e
#define STORE_KEY(ep, key) FT_ATOMIC_STORE_PTR_RELEASE(ep->me_key, key) | |
#define STORE_VALUE(ep, value) FT_ATOMIC_STORE_PTR_RELEASE(ep->me_value, value) |
I suspect that there would be a deadlock issue at the Windows build. I am under investigation. |
Don't think so. |
@@ -41,6 +41,8 @@ race_top:tstate_is_freed | |||
race_top:type_modified_unlocked | |||
race_top:write_thread_id | |||
race_top:PyThreadState_Clear | |||
# see: https://github.com/python/cpython/issues/117721 | |||
race_top:lock_PyThread_release_lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colesbury found that there is a possible issue from
cpython/Lib/test/lock_tests.py
Lines 117 to 121 in 5e7eba0
def wait_phase(self, phase, expected): | |
for _ in support.sleeping_retry(support.SHORT_TIMEOUT): | |
if len(phase) >= expected: | |
break | |
self.assertEqual(len(phase), expected) |
5f4e42c
to
d527e0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to use the extra indirection of passing a pointer to a stack ref
Include/internal/pycore_cell.h
Outdated
@@ -42,6 +43,31 @@ PyCell_GetRef(PyCellObject *cell) | |||
return res; | |||
} | |||
|
|||
static inline void | |||
_PyCell_GetStackRef(PyCellObject *cell, _PyStackRef *value_addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_PyCell_GetStackRef(PyCellObject *cell, _PyStackRef *value_addr) | |
_PyStackRef _PyCell_GetStackRef(PyCellObject *cell) |
I think that this can return the new _PyStackRef
directly instead of via pointer, since there can be no garbage collection in this function.
Python/bytecodes.c
Outdated
@@ -1647,14 +1647,13 @@ dummy_func( | |||
value = PyStackRef_FromPyObjectSteal(value_o); | |||
} | |||
|
|||
inst(LOAD_DEREF, ( -- value)) { | |||
inst(LOAD_DEREF, ( -- value[1])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be left as scalar, for the reason given above.
@@ -16,7 +16,6 @@ extern "C" { | |||
#include "pycore_pystate.h" // _PyInterpreterState_GET() | |||
#include "pycore_typeid.h" // _PyType_IncrefSlow | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if in _Py_TryIncRefShared
we could just do an atomic add to the ob_ref_shared
if the object is already marked as weakref'd? E.g:
Py_ssize_t shared = _Py_atomic_load_ssize_relaxed(&op->ob_ref_shared);
for (;;) {
if ((shared & _Py_REF_SHARED_FLAG_MASK) == _Py_REF_MAYBE_WEAKREF) {
_Py_atomic_add_ssize(&op->ob_ref_shared, (1 << _Py_REF_SHARED_SHIFT));
return 1;
}
}
That might avoid the threads spinning on updating the value when the value is in heavy contention and instead all of the atomic operations should just slowly proceed.
d56c823
to
25d3f9e
Compare
cell
objects (LOAD_DEREF
) do not scale well #123358