-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add RE2 regexp engine support #2012
base: v3/master
Are you sure you want to change the base?
Conversation
I've just force-pushed updated PR.
|
Since my changes have been essentially merged into |
d2f3df2
to
b5cf105
Compare
Ping. |
Just in case, even though it's marked WIP, it's more than ready for the first round of review. |
moves Pcre to backends/pcre.cc moves PegexMatch to regex_match.h
This enables unit tests to compare the matching groups as well, not just binary match-no match.
Previously, searchAll would stop search when it encountered an empty matching group in any position. This means that, for example, regular expression "(a)(b?)(c)" would match string "ac", but the resulting group list would be ["ac", "a"]. After this change, the resulting list for the aforementioned regular expression becomes ["ac", "a", "", "c"] like it should've been. Additionally, this also changes behaviour for multiple matches. For example, when "aaa00bbb" is matched by "[a-z]*", previously only "aaa" would be returned. Now the matching list is ["aaa", "", "", "bbb", ""]. The old behaviour was confusing and almost certainly a bug. The new behaviour is the same as in Python's re.findall. For reference, though, Go does it somewhat differently: empty matches at the end of non-empty matches are ignored, so in Go above example is ["aaa", "", "bbb"] instead.
RE2 doesn't support certain features, like negative lookaround, so when a regular expression cannot be compiled with RE2, it's compiled with libpcre instead. This has some runtime cost, as this fallback is implemented with an extra heap object and virtual function calls. When RE2 is not enabled, however, everything works as it did before.
Ubuntu 14.04 doesn't have RE2 package altogether, and Ubuntu 16.04 RE2 package is too old. Ubuntu 18.04 RE2 package might work, but this Ubuntu verison it's not supported by Travis yet. So build RE2 from source.
Rebased on top of the current master. |
Very interesting commit. We recently ran into performance issues with modsecurity. Any news on how much improvement we might see? |
Hi @eddie4, That is not the single front that we have on the performance side. Check the branch v3/dev/3.1-experimental. We have had great progress there. This is still on the Queue, to be merged once other pieces of v3.1 are complete. |
@eddie4 in my experience, it doesn't really make it much faster in average, however, it eliminates ReDoS worst cases. |
@zimmerle please ping me when it's time to rebase this on top |
See #1996.
Every commit has detailed description.
I've implemented the fallback approach so far: if RE2 support is enabled, every regular expression is compiled with RE2, and if that fails, regular expression is compiled with libpcre instead. Please see #1996 for more detailed discussion why this approach is necessary.
There's also lots of refactoring of regexp related code.
Open questions:
--with-re
is supplied?