Fix `USE_DS` for MIPS ##emu #23307

vobst · 2024-09-11T17:11:12Z

Mark this if you consider it ready to merge
I've added tests (optional)
I wrote some lines in the book (optional)

Description

This PR is mostly a question on how one should handle branch delay slots when emulating ESIL.

For context, I've been working on adding MIPS support to radius2, a symbolic execution engine for ESIL. One problem was handling of branch delay slots, at least until I noticed that everything I needed from r2 was already there. It's just hidden behind the USE_DS #define. With this define active the delay-slot behavior is explicit in the ESIL of an instruction via the (SETD and SETJT directives). Given this information the emulation is pretty straight forward.

The included patches fix the compilation with USE_DS=1 and correct a few errors where instructions without a delay slot (compact instructions) were treated incorrectly as having a delay slot. There shouldn't be any change for USE_DS=0 builds. The patch is incomplete in that "likely" instructions, where the delay slot is only executed if the branch is taken, are still incorrect. Fixing this by encoding the behavior in the ESIL string would be pretty straight-forward.

However, if I understand it correctly the USE_DS way of embedding delay slot behavior in the ESIL string is legacy (code doesn't even compile + all the errors in instruction semantics).

This brings me to my question: What is the intended way for external tools that want to work with ESIL to learn about and handle delay slots? With the information that I currently get via pdj after analyzing a function I cannot handle them properly. I am at least missing information about: how many delay slot has this instruction?, are the delay slots always executed or is there some condition (like the branch being taken for "likely" instructions)?.

[If there is no "intended" way to get this information, it would be OK for me to rebuild r2 with the define enabled as part of my build process, but then it would be nice to have a way to enable USE_DS via the build system.]

Cheers

trufae

from what i get here USE_DS makes the expressions more complex to trap when an instruction is executed if the delay slot is going on and the instruction cant happen there. i assume this can only happen on invalid genereated code by compilers or linearly emulating the code. i think the logic was commented out because of that. but ideally this feature should be not a compile-time feature

trufae · 2024-09-12T13:35:05Z

libr/arch/p/mips/plugin_cs.c

+			// in register $31.
+			r_strbuf_appendf (&op->esil, ES_TRAP_DS () "" ES_CALL_ND ("%s"), ARG (0));
+#if USE_DS
+			r_strbuf_replacef (&op->esil, "$$", "0x%"PFMT64x, addr);


$$ is a keyword that should be removed because we have PC and op->addr already, so there's no need to have an extra keyword for that. also its better to build the string right away instead of using replacef

So would you suggest to change the macro like this to avoid $$:

#if USE_DS #define ES_TRAP_DS(addr) "$ds,!,!,?{," addr ",1,TRAP,BREAK,}," #else #define ES_TRAP_DS(addr) "" #endif

With that we can remove the #if USE_DS -> r_strbuf_replacef stuff and build the string in-place by rewriting the r_strbuf_appendf call to:

#if USE_DS r_strbuf_appendf (&op->esil, ES_TRAP_DS ("0x%"PFMT64x) "" ES_CALL_DR ("%s", "%s"), addr, ARG (0), ARG (1)); #else r_strbuf_appendf (&op->esil, "" ES_CALL_DR ("%s", "%s"), ARG (0), ARG (1)); #endif

where addr is op->addr or insn->address (I guess that's the same). The downside I see is that by unconditionally passing addr to the invocation we must duplicate the code. Alternatively, one could to a bad hack like:

r_strbuf_appendf (&op->esil, ES_TRAP_DS ("0x%3$"PFMT64x) "" ES_CALL_DR ("%s", "%s"), ARG (0), ARG (1), addr);

But that could, depending on compiler settings, complain about the wrong number of args in the format string.

The problem with using PC is that AFAIK it points to the first byte after instruction, and we would need to subtract its size, which may actually be 4 or 2 bytes.

from what i get here USE_DS makes the expressions more complex to trap when an instruction is executed if the delay slot is going on and the instruction cant happen there. i assume this can only happen on invalid genereated code by compilers or linearly emulating the code.

FYI, the manual has the following to say about putting a branch/jump into the branch delay slot: "If a branch or jump instruction is placed in the branch delay slot, the operation of both instructions is undefined." So I guess what happens depends on the impl of the CPU u are running on. To me, it seems like trapping on that is reasonable, but given that behavior is undefined, it would also be OK to throw this part out completely and just let the emulator decide.

what mainly worries me here is the fact that there are two compile time ways to have this esil expression. so i would prefer to have only one and not depend on compile time options

Okay, so completely throw away the USE_DS define and always emit the version with the trap code and SETD/SETJT directives?

well. if the code is trapping, the linear emulation code should be ignoring those stop points.. did you had a look at the failing tests?

i mean in theory the change looks good to me, but will need some refinements and checks to see if the behaviour keeps correct for the current usecases. thanks for taking the time!

Sorry for the delay - was on vacation. Then I'll now work on a version with only one compile time version of the ESIL. Once that one is ready, I'll certainly need some help since I have no idea where those changes might break something (besides the tests of course) in the larger r2 picture.

Count on me for that help, don't hesitate to make questions or join discord/telegram/matrix channel for a more fluent chat. Thanks!

trufae · 2024-09-12T13:47:00Z

libr/arch/p/mips/plugin_cs.c

@@ -241,6 +244,7 @@ static const char *arg(csh *handle, cs_insn *insn, char *buf, size_t buf_sz, int
 static int analop_esil(RArchSession *as, RAnalOp *op, csh *handle, cs_insn *insn) {
 	char str[8][32] = {{0}};
 	int i;
+	u_int64_t addr = insn->address;


we use ut64 in r2land, its an alias to uint64_t but just good for consistency sugar

Sure, I also hate writing out u_int64_t ^^

trufae · 2024-09-25T15:51:06Z

ping

vobst · 2024-10-02T13:27:55Z

I made the discussed changes (only to the Capstone backend for now). In short:

There is now only one compile time version of the ESIL for each instruction. Its the one that you would have gotten with USE_DS = 1 before. I removed USE_DS.
We no longer use $$ and then replace, but rather construct the full string in one go using insn->address.
I fixed the semantics for the "likely" versions of instructions, e.g., MIPS_INS_BNEL.
I fixed the semantics for some conditional branch instructions, e.g., MIPS_INS_BLEZC.

Before I apply the same changes to the GNU backend I think it would make sense if we check whether they work out nicely with the rest of the code. Will start by looking at the tests I guess.

trufae · 2024-10-08T18:15:26Z

Sorry I was afk busy in a conference and didnt had time to do a proper review.

Yes, take a look at the failing tests. all the changes you did looks good to me, but im unsure about the behaviour change, so better check in detail to not break any test before merging.

trufae · 2024-12-03T20:52:58Z

ping

vobst · 2024-12-05T22:42:04Z

ping

Yea, I got side-tracked with other stuff, so nothing happened here ... but still got this one on my list of things to do :)

trufae · 2024-12-07T09:14:05Z

Good! We are in abi breaking season now and i want @condret to be aware of this pr so your work dont overlap

libr/mips: fix USE_DS

631c249

vobst mentioned this pull request Sep 12, 2024

Add MIPS support aemmitt-ns/radius2#25

Draft

trufae requested changes Sep 12, 2024

View reviewed changes

squash: address FB

fe4e6ae

vobst force-pushed the mips_fix_USE_DS branch from 56a2986 to fe4e6ae Compare October 2, 2024 13:13

radare force-pushed the master branch from 7cb9ee0 to c54a652 Compare November 6, 2024 23:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `USE_DS` for MIPS ##emu #23307

Fix `USE_DS` for MIPS ##emu #23307

vobst commented Sep 11, 2024 •

edited

Loading

trufae left a comment

trufae Sep 12, 2024

vobst Sep 13, 2024

vobst Sep 13, 2024

trufae Sep 13, 2024

vobst Sep 13, 2024

trufae Sep 16, 2024

trufae Sep 16, 2024

vobst Sep 27, 2024

trufae Sep 27, 2024

trufae Sep 12, 2024

vobst Sep 13, 2024

trufae commented Sep 25, 2024

vobst commented Oct 2, 2024

trufae commented Oct 8, 2024

trufae commented Dec 3, 2024

vobst commented Dec 5, 2024

trufae commented Dec 7, 2024

Fix USE_DS for MIPS ##emu #23307

Are you sure you want to change the base?

Fix USE_DS for MIPS ##emu #23307

Conversation

vobst commented Sep 11, 2024 • edited Loading

trufae left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

trufae commented Sep 25, 2024

vobst commented Oct 2, 2024

trufae commented Oct 8, 2024

trufae commented Dec 3, 2024

vobst commented Dec 5, 2024

trufae commented Dec 7, 2024

Fix `USE_DS` for MIPS ##emu #23307

Fix `USE_DS` for MIPS ##emu #23307

vobst commented Sep 11, 2024 •

edited

Loading