Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SleighContextBuild set_image with &[u8] slice without copy #7

Closed
chf0x opened this issue Aug 22, 2024 · 14 comments · Fixed by #16
Closed

SleighContextBuild set_image with &[u8] slice without copy #7

chf0x opened this issue Aug 22, 2024 · 14 comments · Fixed by #16
Labels
enhancement New feature or request jingle_sleigh Involves the sleigh FFI

Comments

@chf0x
Copy link

chf0x commented Aug 22, 2024

Hi!

I was wondering if there's a specific reason why Image works with Vec instead of using &[u8] directly. In my use case, I provide multiple slices to the lifter to obtain the corresponding pcode, but since Image is copying slice values to vector, I'm experiencing a significant performance hit.

Additionally, would it be possible for functions like pub fn set_image(mut self, img: Image) -> Self to take &mut self instead?

I suppose my solution would be to fork and implement set_image directly on &[u8], but I wanted to ask first if there's a particular reason for the current implementation.

Thank you!

@toolCHAINZ
Copy link
Owner

Hey! I assume you're referring to the From<Vec<u8>> impl for Image? I'd have to re-inspect the code to verify, but I suspect the reason was just to avoid having another lifetime argument flying around: I envisioned Image as a structure that gets made in Rust and then moved in to C++-land, given to sleigh, and forgotten about. So that's the reason for it owning the data internally.

Other than the ergonomics of dealing with more lifetimes, there's no reason why Image NEEDS to own the data I think. Would have to try it out to be sure. Looking at my C++ code, I think there's additional allocations happening in C++ too for the move constructor... I'll think about this, but it will be a couple weeks before I have time to consider implementing larger changes.

On the set_image mut thing, absolutely. This is the first time I've tried the "builder" pattern and naively chose the wrong form of it when I wrote that. It should probably get changed to &mut as you suggest.

Regarding the new impl, you could implement for &[u8] as you suggest, but that would still involve a clone/allocation to produce the owned data Image requires.

The best performance fix would be to have Image and the C++ DummyLoadImage hold a reference instead of the data itself and slap a lifetime onto SleighContext. A middleground might be to pass a reference to C++ but still have it use a copy constructor to make a C++-owned version. That would at least eliminate one layer of extra allocations.

@chf0x
Copy link
Author

chf0x commented Aug 22, 2024

Yes, I meant this https://github.com/toolCHAINZ/jingle/blob/main/jingle_sleigh/src/ffi/image.rs#L12, would love to have instead

    #[derive(Debug, Clone)]
    pub struct ImageSection {
        pub(crate) data: &[u8],
        pub(crate) base_address: usize,
        pub(crate) perms: Perms,
    }

and pass a reference to C++ code

P.S: I think there is similar issue to set_image with build: https://github.com/toolCHAINZ/jingle/blob/main/jingle_sleigh/src/context/builder/mod.rs#L31

@toolCHAINZ
Copy link
Owner

toolCHAINZ commented Aug 22, 2024

Haha, yeah, and there's the fun, because this is not legal rust:

    #[derive(Debug, Clone)]
    pub struct ImageSection {
        pub(crate) data: &[u8],
        pub(crate) base_address: usize,
        pub(crate) perms: Perms,
    }

It would have to be:

    #[derive(Debug, Clone)]
    pub struct ImageSection<'a> {
        pub(crate) data: &'a [u8],
        pub(crate) base_address: usize,
        pub(crate) perms: Perms,
    }

Which would then mean that SleighContext becomes SleighContext<'a> (and might even require messing around with Pin) which could get fun. I'm not opposed to the idea of sending references to slay instead of values, it would definitely be more efficient, but I want to think through the best way to do it first.

It's possible there's another source of the performance hit you're getting too: for every new Image you're making, you're having to make a new SleighContext too. This involves sleigh parsing the architecture definition fresh every time because I'm not caching any of that.

I've also internally gated the sleigh initialization function behind a mutex:

match CTX_BUILD_MUTEX.lock() {
Ok(make_context) => {
let ctx = make_context(path_str, image.clone()).map_err(|_| SleighInitError)?;
let mut spaces: Vec<SpaceInfo> = Vec::with_capacity(ctx.getNumSpaces() as usize);
for idx in 0..ctx.getNumSpaces() {
spaces.push(SpaceInfo::from(ctx.getSpaceByIndex(idx)));
}
Ok(Self { image, ctx, spaces })
}
Err(_) => Err(SleighInitError),
}

This is necessary because the routine that parses the sleigh definition is VERY not thread safe. So if you're trying to make lots of sleigh contexts within threads in the same process, you're going to get bottlenecked severely during context creation.

Maybe in some future iteration of this we allow re-initializing an existing context with a new image, and pass images by reference, but that will probably require me understanding how sleigh uses LoadImages a bit better. I took the current approach because it was very obviously sound even if inefficient.

@toolCHAINZ
Copy link
Owner

Oh yeah, and definitely agreed that the build issue should be fixed too. If you're feeling trying your hand at doing some of these changes, I'm happy to review PRs.

@chf0x
Copy link
Author

chf0x commented Aug 22, 2024

Sure, there should definitely be a lifetime specified. The code I provided was more of a 'pseudo-Rust' example, just to convey the concept. I’ll be happy to submit a PR if I get something working. The only question is when - hopefully within a week or two, depends on time availability

@chf0x
Copy link
Author

chf0x commented Aug 22, 2024

...you're having to make a new SleighContext too. This involves sleigh parsing the architecture definition...

I kind of solved it by cloning. Not an optimal solution, but better than parse every time:

         ctx_builder: SleighContextBuilder::load_folder(settings.ghidra_sla_folder).unwrap(),

And after I just do:

        let ctx = self
            .ctx_builder.clone()
            .set_image(Image::from(bytes))
            .build(&self.arch)
            .unwrap();

@toolCHAINZ
Copy link
Owner

So, yes there's the parsing occuring there but I'm actually referring to parsing on the C++ side. We do all that rust-based parsing to enumerate what architectures are available, how to configure them, and to derive the path for the .sla file. The .sla file is the actual machine-generated specification that sleigh reads. All our parsing is just so we know what path to send to makeContext.

makeContext, which is over in C++, makes a sleigh engine and tells it to parse the provided .sla file to initialize itself.
It's this parsing, internal to sleigh, that takes a while. Decompressed, those files have multiple megabytes of text to parse, so it understandably takes a bit.

So in the future, we can maybe better serve your use-case by enabling mutation of a sleigh context and its Image to avoid that very expensive initialization step. I suspect that it's likely to be responsible for your performance issues.

@toolCHAINZ toolCHAINZ added enhancement New feature or request jingle_sleigh Involves the sleigh FFI labels Aug 22, 2024
@toolCHAINZ
Copy link
Owner

I did some closer reading of the sleigh code and think I have a better understanding of what sleigh will easily support and not support.

The Sleigh constructor takes as input the binary (or LoadImage in sleigh's terminology) and then subsequently takes in the sla file in an initialize call. This is basically backwards of the pattern one might expect where the engine is initialized for an architecture and then given binaries to process.

I believe this initialize call is where the expensive sleigh parsing actually happens. As far as I can tell, if you make a sleigh context, and then reset it for a new image and then reinitialize it (reset(<new image>) followed by initialize(<sla file>)), it should actually ignore the provided sla file (I guess with the assumption that the new binary is of the same architecture) and reuse that parsed data. So it should be pretty straightforward to expose that flow in rust and that would avoid the expensive engine-reinitialization.

As far as Image goes, I don't think it would be feasible to do what I suggested with a mutable image; sleigh caches things internally and it would likely lead to a buggy nightmare.

A constant reference however feels possible. The main difficulty there is wanting to avoid a lifetime argument on the sleigh context over in Rust. Maybe the context object could be split up and it could return a SleighImage structure with a lifetime on it or something.

I'll definitely be making the first change to avoid reloading soon, and then might also think about making the reference change.

@chf0x
Copy link
Author

chf0x commented Sep 20, 2024

That sounds great! With const ref, we can eliminate multiple copies. Sorry, I haven't had time to work on this yet

@toolCHAINZ
Copy link
Owner

toolCHAINZ commented Oct 14, 2024

So I haven't made the change necessary to just send a slice over (and after working with sleigh more, I'm less optimistic that it would work correctly).

BUT I've made some changes that should greatly speed up usage of jingle_sleigh with multiple images and also make the code more ergonomic.

See this unit test as an example:

    #[test]
    fn test_two_images() {
        let mov_eax_0: [u8; 4] = [0x0f, 0x05, 0x0f, 0x05];
        let nops: [u8; 9] = [0x90, 0x90, 0x90, 0x90, 0x0f, 0x05, 0x0f, 0x05, 0x0f];
        let ctx_builder =
            SleighContextBuilder::load_ghidra_installation("/Applications/ghidra").unwrap();
        let sleigh = ctx_builder.build(SLEIGH_ARCH).unwrap();
        let mut sleigh = sleigh.initialize_with_image(mov_eax_0.as_slice()).unwrap();
        let instr1 = sleigh.instruction_at(0);
        sleigh.set_image(nops.as_slice()).unwrap();
        let instr2 = sleigh.instruction_at(0);
        assert_ne!(instr1, instr2);
        assert_ne!(instr1, None);
        let instr2 = sleigh.instruction_at(4);
        assert_ne!(instr1, instr2);
        assert_ne!(instr2, None);
        let instr3 = sleigh.instruction_at(8);
        assert_eq!(instr3, None);
    }

A few things to note here:

        let sleigh = ctx_builder.build(SLEIGH_ARCH).unwrap();
        let mut sleigh = sleigh.initialize_with_image(mov_eax_0.as_slice()).unwrap();

This first line creates a sleigh context without requiring that an image be loaded, which is nice for the modeling work in jingle. The second line consumes this "unloaded" sleigh context to produce a LoadedSleighContext, which then allows for disassembling from the image.

This LoadedSleighContext can then have its image reset as it is done here:

        sleigh.set_image(nops.as_slice()).unwrap();

This is just done with a mutable reference, so you can reuse the same context for multiple images. It will not re-parse the sleigh definitions; in my very quick performance testing, the reset seems to take on the order of a millisecond.

Also, note that slices are passed directly into set image. These slices are still being cloned internally, but all the gross conversion to an Image is at least not littering consuming code now.

@chf0x
Copy link
Author

chf0x commented Oct 14, 2024

Thank you for the update, this is great! However, I'm a bit confused. Why do you think it wouldn't be possible to provide just a reference to the bytes to lift, given that we're simply wrapping around the C++ API?

@toolCHAINZ
Copy link
Owner

Err, sorry I was conflating a couple things.

The idea of a mutable slice isn't going to fly because of how sleigh caches disassembly.

The difficulty with a slice is that my preference would be to also allow images with owned data. The most direct way I can think of to do this would be to make Image a trait and expose generic functions over the FFI. However cxx doesn't support that. As far as I can tell from the cxx docs and issue tracker, this might require some dyn Trait craziness that I'm lacking the motivation to tackle.

If you (or someone else) finds a way to do allow this without adding a lifetime into the context itself and still allowing owned data, I'd be happy to review/merge.

I'll leave this open, but am not actively working on this for now (unless a good approach comes to me, or I decide to just suck it up and add a lifetime).

@toolCHAINZ
Copy link
Owner

I had a thought on another approach that might work, which I've detailed here: #15

I'll try this out when I get the chance.

@toolCHAINZ toolCHAINZ linked a pull request Oct 16, 2024 that will close this issue
@toolCHAINZ
Copy link
Owner

Done in #16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request jingle_sleigh Involves the sleigh FFI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants