Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C-API for signalling monitoring events #111997

Closed
scoder opened this issue Nov 12, 2023 · 25 comments
Closed

C-API for signalling monitoring events #111997

scoder opened this issue Nov 12, 2023 · 25 comments
Assignees
Labels
3.13 bugs and security fixes deferred-blocker topic-C-API type-feature A feature request or enhancement

Comments

@scoder
Copy link
Contributor

scoder commented Nov 12, 2023

Feature or enhancement

Proposal:

Language implementations for the CPython runtime (Cython, JIT compilers, regular expression engines, template engines, etc.) need a way to signal PEP-669 monitoring events to the registered listeners.

  1. We need a way to create events and inject them into the monitoring system. Since events have more than one signature, we might end up needing more than one C-API function for this.

  2. We need a way to map 3D source code positions (file name, line, character) to 1D integer offsets. Code objects help, but branches might cross source file boundaries, so that's more tricky. For many use cases, a mapping between (line, character) positions and an integer offset would probably suffice, although templating languages usually provide some kind of include commands, as does Cython. There should be some help in the C-API for building up such a mapping.

The reason why I think we need CPython's help for mapping indices is that both sides, listeners and even producers, need to agree on the same mapping. Sadly, the events don't include line/character positions directly but only a single integer. So event producers need a way to produce a number that event listeners like coverage analysers can map back to a source code position.

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/61?u=scoder

Linked PRs

@scoder scoder added the type-feature A feature request or enhancement label Nov 12, 2023
@scoder
Copy link
Contributor Author

scoder commented Nov 12, 2023

@vstinner
Copy link
Member

Do you want to propose an API for that?

@drdavella
Copy link

PEP-669 seems to discuss a mechanism to modify the bytestream for the purpose of implementing monitoring. Would access to this mechanism also be provided as part of this API? I haven't seen too much mention of this feature elsewhere.

Source: https://peps.python.org/pep-0669/#rationale

The quickening mechanism provided by PEP 659 provides a way to dynamically modify executing Python bytecode. These modifications have little cost beyond the parts of the code that are modified and a relatively low cost to those parts that are modified. We can leverage this to provide an efficient mechanism for monitoring that was not possible in 3.10 or earlier.

IMO it would be very useful for an API of this nature to expose low-level interpreter state to monitoring tools. For example, if we are able to see events like INSTRUCTION, it would also be useful to have access to the operands involved.

@scoder
Copy link
Contributor Author

scoder commented Nov 21, 2023

Do you want to propose an API for that?

In order to map source/line/column information to an offset, we need a way to build a co_linetable for a code object: https://github.com/python/cpython/blob/main/Objects/locations.md
That's probably the most complex thing to do here. IIUC, CPython implements this in C here:

cpython/Python/assemble.c

Lines 190 to 444 in d67f947

/* Code location emitting code. See locations.md for a description of the format. */
#define MSB 0x80
static void
write_location_byte(struct assembler* a, int val)
{
PyBytes_AS_STRING(a->a_linetable)[a->a_location_off] = val&255;
a->a_location_off++;
}
static uint8_t *
location_pointer(struct assembler* a)
{
return (uint8_t *)PyBytes_AS_STRING(a->a_linetable) +
a->a_location_off;
}
static void
write_location_first_byte(struct assembler* a, int code, int length)
{
a->a_location_off += write_location_entry_start(
location_pointer(a), code, length);
}
static void
write_location_varint(struct assembler* a, unsigned int val)
{
uint8_t *ptr = location_pointer(a);
a->a_location_off += write_varint(ptr, val);
}
static void
write_location_signed_varint(struct assembler* a, int val)
{
uint8_t *ptr = location_pointer(a);
a->a_location_off += write_signed_varint(ptr, val);
}
static void
write_location_info_short_form(struct assembler* a, int length, int column, int end_column)
{
assert(length > 0 && length <= 8);
int column_low_bits = column & 7;
int column_group = column >> 3;
assert(column < 80);
assert(end_column >= column);
assert(end_column - column < 16);
write_location_first_byte(a, PY_CODE_LOCATION_INFO_SHORT0 + column_group, length);
write_location_byte(a, (column_low_bits << 4) | (end_column - column));
}
static void
write_location_info_oneline_form(struct assembler* a, int length, int line_delta, int column, int end_column)
{
assert(length > 0 && length <= 8);
assert(line_delta >= 0 && line_delta < 3);
assert(column < 128);
assert(end_column < 128);
write_location_first_byte(a, PY_CODE_LOCATION_INFO_ONE_LINE0 + line_delta, length);
write_location_byte(a, column);
write_location_byte(a, end_column);
}
static void
write_location_info_long_form(struct assembler* a, location loc, int length)
{
assert(length > 0 && length <= 8);
write_location_first_byte(a, PY_CODE_LOCATION_INFO_LONG, length);
write_location_signed_varint(a, loc.lineno - a->a_lineno);
assert(loc.end_lineno >= loc.lineno);
write_location_varint(a, loc.end_lineno - loc.lineno);
write_location_varint(a, loc.col_offset + 1);
write_location_varint(a, loc.end_col_offset + 1);
}
static void
write_location_info_none(struct assembler* a, int length)
{
write_location_first_byte(a, PY_CODE_LOCATION_INFO_NONE, length);
}
static void
write_location_info_no_column(struct assembler* a, int length, int line_delta)
{
write_location_first_byte(a, PY_CODE_LOCATION_INFO_NO_COLUMNS, length);
write_location_signed_varint(a, line_delta);
}
#define THEORETICAL_MAX_ENTRY_SIZE 25 /* 1 + 6 + 6 + 6 + 6 */
static int
write_location_info_entry(struct assembler* a, location loc, int isize)
{
Py_ssize_t len = PyBytes_GET_SIZE(a->a_linetable);
if (a->a_location_off + THEORETICAL_MAX_ENTRY_SIZE >= len) {
assert(len > THEORETICAL_MAX_ENTRY_SIZE);
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_linetable, len*2));
}
if (loc.lineno < 0) {
write_location_info_none(a, isize);
return SUCCESS;
}
int line_delta = loc.lineno - a->a_lineno;
int column = loc.col_offset;
int end_column = loc.end_col_offset;
assert(column >= -1);
assert(end_column >= -1);
if (column < 0 || end_column < 0) {
if (loc.end_lineno == loc.lineno || loc.end_lineno == -1) {
write_location_info_no_column(a, isize, line_delta);
a->a_lineno = loc.lineno;
return SUCCESS;
}
}
else if (loc.end_lineno == loc.lineno) {
if (line_delta == 0 && column < 80 && end_column - column < 16 && end_column >= column) {
write_location_info_short_form(a, isize, column, end_column);
return SUCCESS;
}
if (line_delta >= 0 && line_delta < 3 && column < 128 && end_column < 128) {
write_location_info_oneline_form(a, isize, line_delta, column, end_column);
a->a_lineno = loc.lineno;
return SUCCESS;
}
}
write_location_info_long_form(a, loc, isize);
a->a_lineno = loc.lineno;
return SUCCESS;
}
static int
assemble_emit_location(struct assembler* a, location loc, int isize)
{
if (isize == 0) {
return SUCCESS;
}
while (isize > 8) {
RETURN_IF_ERROR(write_location_info_entry(a, loc, 8));
isize -= 8;
}
return write_location_info_entry(a, loc, isize);
}
static int
assemble_location_info(struct assembler *a, instr_sequence *instrs,
int firstlineno)
{
a->a_lineno = firstlineno;
location loc = NO_LOCATION;
int size = 0;
for (int i = 0; i < instrs->s_used; i++) {
instruction *instr = &instrs->s_instrs[i];
if (!same_location(loc, instr->i_loc)) {
RETURN_IF_ERROR(assemble_emit_location(a, loc, size));
loc = instr->i_loc;
size = 0;
}
size += instr_size(instr);
}
RETURN_IF_ERROR(assemble_emit_location(a, loc, size));
return SUCCESS;
}
static void
write_instr(_Py_CODEUNIT *codestr, instruction *instr, int ilen)
{
int opcode = instr->i_opcode;
assert(!IS_PSEUDO_INSTR(opcode));
int oparg = instr->i_oparg;
assert(OPCODE_HAS_ARG(opcode) || oparg == 0);
int caches = _PyOpcode_Caches[opcode];
switch (ilen - caches) {
case 4:
codestr->op.code = EXTENDED_ARG;
codestr->op.arg = (oparg >> 24) & 0xFF;
codestr++;
/* fall through */
case 3:
codestr->op.code = EXTENDED_ARG;
codestr->op.arg = (oparg >> 16) & 0xFF;
codestr++;
/* fall through */
case 2:
codestr->op.code = EXTENDED_ARG;
codestr->op.arg = (oparg >> 8) & 0xFF;
codestr++;
/* fall through */
case 1:
codestr->op.code = opcode;
codestr->op.arg = oparg & 0xFF;
codestr++;
break;
default:
Py_UNREACHABLE();
}
while (caches--) {
codestr->op.code = CACHE;
codestr->op.arg = 0;
codestr++;
}
}
/* assemble_emit_instr()
Extend the bytecode with a new instruction.
Update lnotab if necessary.
*/
static int
assemble_emit_instr(struct assembler *a, instruction *instr)
{
Py_ssize_t len = PyBytes_GET_SIZE(a->a_bytecode);
_Py_CODEUNIT *code;
int size = instr_size(instr);
if (a->a_offset + size >= len / (int)sizeof(_Py_CODEUNIT)) {
if (len > PY_SSIZE_T_MAX / 2) {
return ERROR;
}
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_bytecode, len * 2));
}
code = (_Py_CODEUNIT *)PyBytes_AS_STRING(a->a_bytecode) + a->a_offset;
a->a_offset += size;
write_instr(code, instr, size);
return SUCCESS;
}
static int
assemble_emit(struct assembler *a, instr_sequence *instrs,
int first_lineno, PyObject *const_cache)
{
RETURN_IF_ERROR(assemble_init(a, first_lineno));
for (int i = 0; i < instrs->s_used; i++) {
instruction *instr = &instrs->s_instrs[i];
RETURN_IF_ERROR(assemble_emit_instr(a, instr));
}
RETURN_IF_ERROR(assemble_location_info(a, instrs, a->a_lineno));
RETURN_IF_ERROR(assemble_exception_table(a, instrs));
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_except_table, a->a_except_table_off));
RETURN_IF_ERROR(_PyCompile_ConstCacheMergeOne(const_cache, &a->a_except_table));
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_linetable, a->a_location_off));
RETURN_IF_ERROR(_PyCompile_ConstCacheMergeOne(const_cache, &a->a_linetable));
RETURN_IF_ERROR(_PyBytes_Resize(&a->a_bytecode, a->a_offset * sizeof(_Py_CODEUNIT)));
RETURN_IF_ERROR(_PyCompile_ConstCacheMergeOne(const_cache, &a->a_bytecode));
return SUCCESS;
}

That's purely internal code. I don't think we need C-API support for this, but a Python implementation would be nice. Or a static method on code objects to build the byte string from a Python list of positions. Given the complexity of the format and the risk of future changes, I think it's in CPython's responsibility to generate this format from something user friendly.

compile.h defines the location like this:

/* source location information */
typedef struct {
    int lineno;
    int end_lineno;
    int col_offset;
    int end_col_offset;
} _PyCompilerSrcLocation;

This looks like a function could help that accepts a sequence of tuples of these four values and maps it to a bytes object with the corresponding line table in it.

We'd probably still have to reimplement it in Cython in order to generate the line table string in "older" Python versions. But we could at least rely on it from CPython 3.13 on.

@scoder
Copy link
Contributor Author

scoder commented Nov 22, 2023

@markshannon, could you please comment whether you consider this the right approach? To me, a line table seems required to generate source level monitoring events.

@scoder
Copy link
Contributor Author

scoder commented Dec 14, 2023

@markshannon any comments? Given that Python 3.12 broke an entire C-API feature, I consider it a blocker for 3.13 to fix it.

@markshannon
Copy link
Member

What C-API was broken?
AFAICT there was no API to call the tstate->c_tracefunc and tstate->c_profilefunc functions.
How were you calling those functions, and why does it no longer work?

@markshannon
Copy link
Member

C extensions can already register callbacks and set events using the Python API, so I assume we are talking about adding an API for C extensions to fire events.

There will be two parts to such an API:

  1. Specifying the mapping from "code objects" and offsets to full locations.
  2. Firing events from C extension code.

Handling locations

Apart from LINE events, all event callback functions include code: CodeType, offset: int. Let's not force C extensions to build a full code objects, so we should relax code to be a "code like" object. We define exactly what "code like" means later.

We also need a specification for the table mapping offsets to locations. The existing table format is quite fiddly, and can only be searched linearly, so designing a new table format might be a good idea.

Firing events from C extension code

Any potential API has to be more complex than for sys.settrace as we need to handle disabling of event locations and multiple tools.

We will need 16 bits of data for each event location, to track active tools and which tool/location pairs are disabled.

We will need to initialize this data, and to specify which event location maps to which event.

With that in mind, and using the PY_START event as an example, an API could look something like this:

typedef struct _PyMonitoringState {
    uint8_t active;
    uint8_t opaque;
} PyMonitoringState;

typedef struct _PyCodeLikeObject PyCodeLikeObject;

/* Returns a version, which should be stored and passed to the next call of _PyMonitioringEnterScope.
   Arguments:
       previous_version: Points to a per code-like object value. The value must be set to 0 before first call per code-like object.
       state_array: The array of all PyMonitoringStates for this code-like object. 
                    Should be initialized to all zeroes before first call.
       event_types: Array of the event types describing the event type for each state in state array.
       length: The length of the state and event arrays, which must be the same.
*/
void _PyMonitoringEnterScope(uint64_t *previous_version, PyMonitoringState *state_array, uint8_t event_types, uint32_t length);

int _PyMonitoring_FirePyStartEvent(PyCodeLikeObject *codelike, uint32_t offset, PyMonitoringState *state);

If performance is going to be an issue, then the above could be implemented as inline functions.
We would lose ABI compatibility, but would still retain API compatibility across CPython versions.

Example usage.

Suppose we have a code-like object that that should fire two events when called, PY_START then PY_RETURN.

We would need to describe the events:

static const uint8_t EVENTS = { PY_START, PY_RETURN };

And the code-like object would need an array for the state and a version number.

struct MyCodeLike {
    PyObject_HEAD
    PyMonitoringState monitoring_state[2];
    uint64_t version;
    /* Other fields */
};

The C code for the function:

void init_code(MyCodeLike *code)
{
     code->monitoring_state = {0, 0};
     code->version = 0;
}

PyObject *call_code(MyCodeLike *code)
{
    _PyMonitoringEnterScope(&code->version, code->monitoring_state, EVENTS, 2);
    ...
    _PyMonitoring_FirePyStartEvent(code, 0, &code->monitoring_state[0]);
    ...
    _PyMonitoring_FirePyReturnEvent(code, 1, &code->monitoring_state[1]);
    ...
}

@encukou
Copy link
Member

encukou commented Feb 6, 2024

@scoder, would such API work for you?

@scoder
Copy link
Contributor Author

scoder commented Feb 22, 2024

Hmm, looks like I didn't receive any email notifications for this ticket although I created it, am subscribed to it, and got explicitly mentioned. No idea why. Sorry. I've re-subscribed, just in case.

@markshannon, I've done some guesswork to fill in the gaps of your proposal. Please correct me if I misunderstood something.

  • The version is essentially a global counter of listener registrations which lead to different event types being handled (or not) over time.
  • Each monitoring state refers to a specific "code position" that can produce events. The corresponding event type is given by the EVENTS array at the same index as the entry in the monitoring state array.
  • The EnterScope call will look at the version, and if it was updated since the last call with these states, re-initialises the monitoring state array, marking currently audited events as active based on their event type. Otherwise, returns quickly.
  • Each Fire call then receives the code-like and essentially an integer ID as offset for a given "code position", which refers to a specific entry in the monitoring state array.
  • The Fire function looks if the monitoring state is marked as active and if so, creates and pushes an event with that offset ID to the listeners.

Comments from me:

  • I fail to see why you need both the pointer to the array entry and its offset in the Fire calls.
  • The code-like struct would need a file path field as well, to map events to source files (if applicable).
  • Lines are missing from the API so far. They could be stored as a separate array in MyCodeLike, preferably as a pointer since they won't always be used (not every such code-like would refer to a physically existing file).
  • How will listeners know the file and line number from the notification?
  • It looks like MyCodeLike could just be zeroed out with memset() for initialisation. This could be done by a C-API function, given the number of states in the struct. Basically, CPython could provide your init_code() function as PyMonitoring_CreateCodeLike(int states, const int** line_numbers. PyObject *source_file_path). Maybe with a simpler "no file nor lines" variant that only takes the number of states.

@markshannon
Copy link
Member

The version is essentially a global counter of listener registrations which lead to different event types being handled (or not) over time.

More or less, yes. It is changed when the set of events being monitored changes. Listener registrations do not change it.

Each Fire call then receives the code-like and essentially an integer ID as offset for a given "code position", which refers to a specific entry in the monitoring state array.

The ID also serves as an index into the locations table.

The Fire function looks if the monitoring state is marked as active and if so, creates and pushes an event with that offset ID to the listeners.

Broadly, yes. There are up to 8 possible listeners for events, so if the state is non-zero it represents a bit vector of active listeners.

I fail to see why you need both the pointer to the array entry and its offset in the Fire calls.

I'm assuming that the layout of the code-like struct is opaque to the VM, it is up to Cython/pybind11/mypyc, etc. how to lay it out, so we can't deduce the offset from the pointer to the monitoring state. Thus we need both.

How will listeners know the file and line number from the notification?"

From the (as yet unspecified) location table and the offset ID.

The code-like struct would need a file path field as well, to map events to source files (if applicable).

Would one location per code-like object be enough?

It looks like MyCodeLike could just be zeroed out with memset() for initialisation.

MyCodeLike would include location information, but we can design the monitoring state struct so that it should be initialized to zero.

@scoder
Copy link
Contributor Author

scoder commented Feb 26, 2024

Would one location per code-like object be enough?

Probably not. If the API requires a "start scope", "work in scope", "end scope" kind of state keeping, then Cython's include statement would get in the way, which allows injecting source code from other files. And include features are probably common in templating languages etc.

While it's probably possible for user code to juggle with different code-like objects for a single code scope, it would be nicer if the API kept code scope and file position independent from each other.

@iritkatriel
Copy link
Member

typedef struct _PyCodeLikeObject PyCodeLikeObject;

What is _PyCodeLikeObject and where is it defined? I'm confused by this and the later discussions on MyCodeLike (which is user-defined).

@iritkatriel iritkatriel self-assigned this Mar 4, 2024
@markshannon
Copy link
Member

What is _PyCodeLikeObject and where is it defined? I'm confused by this and the later discussions on MyCodeLike (which is user-defined).

It is defined by the third-party code (generator).
It can be any object, but will need to support certain attributes and methods so that it looks like a Code object to coverage, profile, inspect, etc.

@iritkatriel
Copy link
Member

iritkatriel commented Mar 4, 2024

  • The EnterScope call will look at the version, and if it was updated since the last call with these states, re-initialises the monitoring state array, marking currently audited events as active based on their event type. Otherwise, returns quickly.

How does it determine that the version changed since last time (where does it store the previous version)?

(I got an answer offline)

@gvanrossum
Copy link
Member

@iritkatriel Is this still a (deferred) release blocker?

@scoder
Copy link
Contributor Author

scoder commented May 19, 2024

Regarding the interaction with the existing sys.set_trace() API, I tried running coverage 7.5.1 which still uses the old tracing API. I sent a PR to fix the listener arguments when generating of LINE events in #119179

After that, I got assertion failures in CPython when generating LINE events in sys_trace_line_func, file legacy_tracing.c:

    assert(args[0] == (PyObject *)_PyFrame_GetCode(frame->f_frame));

The old API depended on frames to provide line information, the new one doesn't. The effect is that event creators now need to know whether there are listeners of the old API (which they probably can't know), or they always need to set up frames just in case, which is quite an annoyance given the new shiny monitoring API.

Any idea what we can do about this?

@scoder scoder reopened this May 19, 2024
@markshannon
Copy link
Member

We can remove the assert, nothing will crash.
However, there might be an issue with the frame passed to the sys.settrace callback function, as it will be the frame of the closest Python function, not the Cython function.

How did Cython work with profiling prior to 3.12? Did you create a fake frame?

@scoder
Copy link
Contributor Author

scoder commented May 25, 2024

How did Cython work with profiling prior to 3.12? Did you create a fake frame?

Yes. We could keep doing that, but a) frames aren't "officially" exposed from the C-API and b) as I understand it, the goal of creating the monitoring C-API was to allow non-Python non-frame code to interact with the tracing infrastructure, so requiring frames seems counter-productive.

I now found that coverage.py was actually updated to work directly with sys.monitoring, so I tried that and ran into some more issues:

  File "/home/stefan/source/Python/cython/cython-git/venv/py314/lib/python3.14/site-packages/coverage/sysmon.py", line 337, in sysmon_py_start
    sys_monitoring.set_local_events(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self.myid,
        ^^^^^^^^^^
    ...<7 lines>...
        # | events.JUMP
        ^^^^^^^^^^^^^^^
    )
    ^
SystemError: cannot instrument shim code object 'func1'

The exception originates here and seems to trigger on generally all code objects that do not have byte code:

if (code->_co_firsttraceable >= Py_SIZE(code)) {
PyErr_Format(PyExc_SystemError, "cannot instrument shim code object '%U'", code->co_name);
return -1;
}

set_local_events() also has this code, requiring the "code-like object" to be an actual code object:

if (!PyCode_Check(code)) {
PyErr_Format(
PyExc_TypeError,
"code must be a code object"
);
return NULL;
}

I'm afraid that we might still run into a lot of similar assumptions about what is traceable code.

@scoder
Copy link
Contributor Author

scoder commented May 25, 2024

Changing the code from
if (code->_co_firsttraceable >= Py_SIZE(code)) {
to
if (code->_co_firsttraceable && code->_co_firsttraceable >= Py_SIZE(code)) {
lets me get past the check and continue.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 26, 2024
…f argument counts. (pythonGH-119179)

(cherry picked from commit 70b07aa)

Co-authored-by: scoder <stefan_ml@behnel.de>
scoder added a commit that referenced this issue May 26, 2024
…of argument counts. (GH-119179) (GH-119575)

gh-111997: Fix argument count for LINE event and clarify type of argument counts. (GH-119179)
(cherry picked from commit 70b07aa)

Co-authored-by: scoder <stefan_ml@behnel.de>
@iritkatriel
Copy link
Member

Should we make sys_trace_line_func and other functions in legacy_tracing.c create a fake frame if frame passed in is NULL?

@iritkatriel
Copy link
Member

What are the open issues here currently?

estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
@iritkatriel
Copy link
Member

I'm closing this as complete. Let's create new issues for any followup work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes deferred-blocker topic-C-API type-feature A feature request or enhancement
Projects
Development

No branches or pull requests

10 participants