SIMD in the lexer #2285
Replies: 5 comments 11 replies
Options discussed:
1. Enable SIMD only on Rust Nightly with a feature flag, and use portable-simd. (From Joe, React team lead: https://twitter.com/en_JS/status/1676484656575946753)
2. Implement from scratch inside OXC.
3. Implement the SIMD primitives from scratch in an external crate (which would export generic Vector types), and then build OXC-specific functions around those types within OXC.
4. Attempt to collaborate with maintainers of memchr to add what we need to memchr.
-
Some parts of the lexer could likely be accelerated significantly by using SIMD instructions, particularly those which chew through a series of bytes searching for an end character.
In all of these cases, the algorithms themselves in a SIMD implementation would not be very complex: use either cmpeq or pshufb + movemask + trailing_zeros() to locate the first byte in a batch matching a certain pattern. It's only a few lines, and there are some good examples to follow, like memchr and simd-json.
However, there are practical difficulties. I'm opening this issue to discuss options and their feasibility.
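To make the shape of the algorithm concrete, here's a minimal sketch of the cmpeq + movemask + trailing_zeros pattern using stable SSE2 intrinsics (function name is mine, not OXC's; a real lexer would loop over chunks and handle the tail):

```rust
// Sketch (not OXC's actual code): find the index of the first occurrence of
// `needle` in a 16-byte chunk using SSE2 cmpeq + movemask + trailing_zeros.
#[cfg(target_arch = "x86_64")]
fn find_byte_sse2(chunk: &[u8; 16], needle: u8) -> Option<usize> {
    use std::arch::x86_64::*;
    // Safety: SSE2 is part of the x86_64 baseline, so these intrinsics
    // are always available on this target.
    unsafe {
        // Load 16 bytes and broadcast the needle into every lane.
        let data = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
        let pattern = _mm_set1_epi8(needle as i8);
        // Lane-wise compare: matching lanes become 0xFF, others 0x00.
        let eq = _mm_cmpeq_epi8(data, pattern);
        // Collapse the high bit of each lane into a 16-bit mask...
        let mask = _mm_movemask_epi8(eq) as u32;
        if mask == 0 {
            None
        } else {
            // ...and the index of the lowest set bit is the first match.
            Some(mask.trailing_zeros() as usize)
        }
    }
}

// Scalar fallback so the sketch compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn find_byte_sse2(chunk: &[u8; 16], needle: u8) -> Option<usize> {
    chunk.iter().position(|&b| b == needle)
}

fn main() {
    let chunk = *b"let x = foo(bar)";
    assert_eq!(find_byte_sse2(&chunk, b'('), Some(11));
    assert_eq!(find_byte_sse2(&chunk, b'#'), None);
}
```

The whole trick is in the last two steps: movemask turns 16 lane-compares into a single integer, and trailing_zeros turns that integer into an index, both in a couple of cycles.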
Compilation targets
The biggest headache is supporting multiple compilation targets (x86 with SSE2, SSE4, AVX, AVX2, AVX-512 / aarch64 NEON / WASM / fallback for other architectures).
portable-simd is designed to tackle exactly this problem, but it requires Rust Nightly, so is not an option for OXC at present. By the looks of things, portable-simd's API is still undergoing some churn, and judging by Rust's usual anxiety about covering every possibility before making a feature stable, I'm guessing it could take several years to stabilize.
I cannot find any crates which offer what we need prebuilt and cover both x86 and aarch64. I assume both are essential, as servers tend to be x86, while many developers will be running OXC on MacBooks with ARM M-series chips. WASM support would also be nice.
memchr does cover all these targets, but can only match against a maximum of 3 different byte values, which is insufficient for our needs. It's also optimized for finding uncommon needles in large haystacks, whereas in OXC you'd often expect a match in the first batch (almost all JS identifiers are less than 32 characters long, for example).
Some possible approaches we could take are the four options listed at the top of this issue.
If using portable-simd on nightly only
This is by far the easiest route to getting something working.
How would some of OXC's notable consumers (e.g. Rolldown) feel about this approach?
@Boshen Would you be comfortable with the NodeJS NAPI build and WASM build using nightly?
If implementing from scratch
Implementing from scratch would be challenging. memchr's code to switch between different implementations depending on architecture is quite forbidding.
We would also need to decide whether to select which implementation to use at compile time or at run time:
Compile-time selection is simpler to implement, but requires compiling with -C target-cpu=native to use the fastest implementation available on a particular machine, or building multiple binaries for SSE2, AVX, AVX2 etc.
Run-time selection allows a single binary to switch to the implementation which uses the fastest instructions and widest vector width available on the machine but, judging from the code memchr uses to do this, is pretty gnarly to do well.
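For reference, a bare-bones sketch of run-time selection (function names are hypothetical, not OXC's API; memchr's real machinery additionally caches the chosen implementation in an atomic function pointer to avoid re-checking on every call):

```rust
// Sketch of run-time implementation selection. SSE2 is baseline on x86_64,
// so the scalar fallback below is only reached on other architectures or
// when AVX2 is absent.
fn find_quote(haystack: &[u8]) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        // `is_x86_feature_detected!` runs CPUID once and caches the result.
        if is_x86_feature_detected!("avx2") {
            // Safety: we just verified AVX2 is available on this CPU.
            return unsafe { find_quote_avx2(haystack) };
        }
    }
    find_quote_scalar(haystack)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn find_quote_avx2(haystack: &[u8]) -> Option<usize> {
    // A real implementation would process 32-byte chunks with AVX2
    // intrinsics; delegate to the scalar version to keep the sketch short.
    find_quote_scalar(haystack)
}

fn find_quote_scalar(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'"')
}

fn main() {
    assert_eq!(find_quote(b"let s = \"hi\";"), Some(8));
    assert_eq!(find_quote(b"no quotes here"), None);
}
```

The gnarly parts memchr deals with, which this sketch skips, are caching the detection result, layering several fallbacks (AVX2 → SSE2 → scalar), and keeping the unsafe boundaries sound.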
Testing infrastructure
At present, OXC's tests run in CI only on a single flavour of x86_64, and on Mac. Benchmarks and the conformance suite only run on x86_64.
Introducing SIMD (regardless of which approach we take) would make the implementations on different platforms quite different. A bug or performance regression could occur on one architecture without manifesting on another.
So we'd need to run tests, conformance, and benchmarks on a variety of different platforms.
Covering Mac is easy - just run the benchmarks and conformance suite on Mac as well as Linux.
But I don't have any idea how to cover different x86 instruction sets (SSE2, SSE4, AVX, AVX2, AVX-512). Presumably GitHub runners don't offer that range of options, and I don't know whether they guarantee all runners support all of these instruction sets, even if most do.
Is it possible to get code to compile for a specific instruction set on CI, and then run tests and benchmarks for e.g. the SSE2 version on an AVX512-equipped runner? And if so, would that be a good enough proxy for the "real deal"?
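If I understand rustc's codegen flags correctly (this is an assumption on my part, not something I've tried in OXC's CI), target features can be masked off at compile time, which would let an AVX-512 runner build and exercise the baseline code path:

```shell
# Hypothetical CI recipe: force the baseline SSE2 code path even on an
# AVX-512-capable runner by disabling the newer feature sets at compile time.
RUSTFLAGS="-C target-feature=-avx512f,-avx2,-avx,-sse4.2,-sse4.1" cargo test

# And the opposite: opt in to everything the runner's CPU supports.
RUSTFLAGS="-C target-cpu=native" cargo bench
```

One caveat: this only constrains compile-time selection. Code using run-time dispatch (is_x86_feature_detected!) would still take whichever path the actual CPU supports, so that would need its own override hook to test the slower paths.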
Other notes
Some SIMD is possible without using any explicit SIMD primitives: the compiler will automatically vectorize standard Rust code which is written in a certain way. I had hoped we might be able to do everything we need by coaxing the compiler into generating efficient SIMD code itself. But, from asking on the Rust lang forum, it seems it's not possible to generate movemask instructions with this method, which as far as I can see is essential to a fast implementation.
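To illustrate what I mean (my understanding, which may be wrong): branch-free reductions like counting auto-vectorize well, but finding the *position* of the first match needs an early exit, and plain Rust has no way to express "collapse the lane compares into a bitmask and take trailing_zeros":

```rust
// Counting matches is branch-free, so the optimizer can vectorize the
// comparisons and the reduction without any explicit SIMD.
fn count_quotes(chunk: &[u8]) -> usize {
    chunk.iter().filter(|&&b| b == b'"').count()
}

// Finding the first match requires an early return, which (absent a
// movemask) tends to compile to a byte-at-a-time loop.
fn first_quote(chunk: &[u8]) -> Option<usize> {
    for (i, &b) in chunk.iter().enumerate() {
        if b == b'"' {
            return Some(i);
        }
    }
    None
}

fn main() {
    let src = b"say \"hi\" again";
    assert_eq!(count_quotes(src), 2);
    assert_eq!(first_quote(src), Some(4));
}
```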
Just to say, I'm not very knowledgeable about SIMD at all - I've only done a bit of research over the past few weeks. Apologies if some of the above is complete nonsense!
Are there really no crates?
I'm quite surprised to find there don't seem to be any crates available which help with this. memchr and the like contain what's required, but their code is custom for exactly their use cases, and the primitives are not exposed, or reusable.
Have I just not found one yet? Anyone aware of something suitable which already exists?