Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature for switching between regex dependencies #106

Closed
wants to merge 2 commits into from

Conversation

Degubi
Copy link

@Degubi Degubi commented Aug 8, 2024

Implementation for request in issue #105

  • Made the dependency on 'regex' optional
  • Added feature flag for 'regex' (added it to default-features, making it the default to avoid any behaviour changes)
  • Added feature flag for 'regex-lite'
  • Updated the 'static_regex' macros
  • Update regex usage in test files

@Degubi
Copy link
Author

Degubi commented Aug 8, 2024

Note: There seem to be 19 failed test cases when using the regex-lite crate with these messages:
'Unicode character classes are not supported'
This is to be expected as the regex-lite crate only has basic support for unicode characters.

@jmcnamara
Copy link
Owner

Thanks. I'll take a look at the PR once I'm back online on Monday. Overall it looks good.

This is to be expected as the regex-lite crate only has basic support for unicode characters.

That may be a deal breaker. I'll need to look at that in more detail.

@Degubi Degubi force-pushed the feature-regex-lite branch from 03ea59f to 28686e0 Compare August 8, 2024 18:00
@Degubi
Copy link
Author

Degubi commented Aug 8, 2024

Great thank you!

Also ran cargo fmt and force-pushed because I forgot

@jmcnamara
Copy link
Owner

Thanks. Could you add a GitHub action with regex-lite as well so I can see the failures. Use one of the others actions like the Polars test as an example.

@Degubi
Copy link
Author

Degubi commented Aug 8, 2024

I added it!
But I noticed something:
Running 'cargo test' yielded only 19 error messages
Running 'cargo test --test integration' made all tests fail: panicked at src/utility.rs:445:23
It seems it doesn't like the regex on that line (and on the next one)

By removing the \p{Emoji} parts from those 2 regexes only a single test fails: quote_name11::test_quote_name11

@jmcnamara
Copy link
Owner

That is probably fixable. Python re regexes don't support \p{Emoji} either but I was able to work around it.

Leave it with me until I'm back online next week.

@jmcnamara
Copy link
Owner

jmcnamara commented Aug 11, 2024

I had a look at this in more detail and I don't think it is feasible to add regex-lite due to the lack of Unicode support.

Apart from the failing integration test there are a number of tests that fail with basic Unicode support:

https://github.com/jmcnamara/rust_xlsxwriter/blob/main/src/utility/tests.rs#L175-L181

So any sheet name in non-English/ASCII will end up being quoted unnecessarily. This is generally a safe failure since Excel will accept an erroneously quoted sheet name in place of an optimally quoted sheetname. However, there may be cases where this causes an Excel read/load error.

So I would need to enable the regex-lite feature with a number of warnings/caveats that would confuse the average end user. It also adds a fair amount of fragility and paints me into a corner when adding other regexes in the future. So, overall and from a maintenance point of view I feel that this isn't worth the effort. I hope you can accept that.

I'll leave this open to see if other people add +1s for the use case and I may add it at some stage with a "Developer Beware" warning.

@jmcnamara jmcnamara self-assigned this Aug 11, 2024
@jmcnamara jmcnamara added the wontfix This will not be worked on label Aug 11, 2024
@jmcnamara
Copy link
Owner

@Degubi, in case this discussion comes up again in the future could you add some example size differences between the Regex and Regex-Lite versions, as a reference. Thanks.

@Degubi
Copy link
Author

Degubi commented Aug 12, 2024

Sure thing @jmcnamara !
Will do later today.
Thank you for taking a look!

@Degubi
Copy link
Author

Degubi commented Aug 12, 2024

@jmcnamara Here are some of my results:

WindowsLinux
  regex regex-lite
Debug 6066 KB 2963 KB
Release 2817 KB 1580 KB
  regex regex-lite
Debug 36.3 MB 22.5 MB
Release 2.9 MB 1.6 MB

Code: https://github.com/jmcnamara/rust_xlsxwriter?tab=readme-ov-file#example

Cargo.toml:

[package]
name = "excel-size-diff"
version = "1.0.0"
edition = "2021"

[dependencies]
# rust_xlsxwriter = { git = "https://github.com/Degubi/rust_xlsxwriter.git", branch = "feature-regex-lite", default-features = false, features = [ "regex" ] }
rust_xlsxwriter = { git = "https://github.com/Degubi/rust_xlsxwriter.git", branch = "feature-regex-lite", default-features = false, features = [ "regex-lite" ] }

[profile.release]
strip = "symbols"
lto = true
codegen-units = 1

@jmcnamara
Copy link
Owner

@Degubi In case you miss it, we are currently working to remove regex altogether and we making some progress: #108.

@Degubi
Copy link
Author

Degubi commented Aug 30, 2024

Looking at the conversation in #108 and the work done on the #no_regex branch I'm gonna assume that this PR can be closed.
Thank you for your great work!

@Degubi Degubi closed this Aug 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants