Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add RE2 regexp engine support #2012

Open
wants to merge 12 commits into
base: v3/master
Choose a base branch
from

Conversation

WGH-
Copy link
Contributor

@WGH- WGH- commented Jan 30, 2019

See #1996.

Every commit has detailed description.

I've implemented the fallback approach so far: if RE2 support is enabled, every regular expression is compiled with RE2, and if that fails, regular expression is compiled with libpcre instead. Please see #1996 for more detailed discussion why this approach is necessary.

There's also lots of refactoring of regexp related code.

Open questions:

  • Perhaps, since RE2 support is considered experimental, configure script should not even attempt to build it unless explicit --with-re is supplied?
  • There's no easy way to know which regular expression uses which engine. It might feel like kind of implicit magic, which isn't really good.
  • There's no error handling when regular expression is invalid. This isn't a regression introduced by this PR, but now might be the time to do something about it.
  • PR needs code review, obviously.
  • Changelog entry
  • Documentation
  • Maybe something else I missed.

@WGH- WGH- changed the title WIP: Add RE2 regexp engine support support WIP: Add RE2 regexp engine support Jan 30, 2019
@victorhora victorhora requested a review from zimmerle February 2, 2019 01:59
@victorhora victorhora added enhancement 3.x Related to ModSecurity version 3.x new feature This is a new feature labels Feb 2, 2019
@WGH-
Copy link
Contributor Author

WGH- commented Feb 5, 2019

I've just force-pushed updated PR.

  • New commit with fixed error handling for the rx operator.
  • get_pattern was renamed to getPattern to adhere to naming convention used elsewhere in ModSecurity.
  • verifyCC conversion to the new classes, which was done very sloppily earlier, has been re-done properly.

@WGH- WGH- closed this Feb 6, 2019
@WGH- WGH- reopened this Feb 6, 2019
@WGH- WGH- changed the base branch from v3/dev/re2 to v3/master February 6, 2019 19:14
@WGH-
Copy link
Contributor Author

WGH- commented Feb 6, 2019

Since my changes have been essentially merged into v3/dev/re2, I updated the PR to target v3/master branch now.

@zimmerle
Copy link
Contributor

zimmerle commented Feb 6, 2019

Overall, the performance for the libpcre users was not impacted by this pull request:

screenshot_20190206_164323

The chart was generated using the code merged into v3/dev/re2.

@WGH- WGH- force-pushed the v3/dev/re2 branch 2 times, most recently from d2f3df2 to b5cf105 Compare February 14, 2019 01:52
@WGH-
Copy link
Contributor Author

WGH- commented Nov 8, 2019

Ping.

@WGH-
Copy link
Contributor Author

WGH- commented Jan 14, 2020

Just in case, even though it's marked WIP, it's more than ready for the first round of review.

zimmerle and others added 8 commits September 4, 2020 22:48
moves Pcre to backends/pcre.cc
moves PegexMatch to regex_match.h
This enables unit tests to compare the matching groups as well,
not just binary match-no match.
Previously, searchAll would stop search when it encountered an empty
matching group in any position. This means that, for example,
regular expression "(a)(b?)(c)" would match string "ac", but the
resulting group list would be ["ac", "a"].

After this change, the resulting list for the aforementioned regular
expression becomes ["ac", "a", "", "c"] like it should've been.

Additionally, this also changes behaviour for multiple matches. For
example, when "aaa00bbb" is matched by "[a-z]*", previously only "aaa"
would be returned. Now the matching list is ["aaa", "", "", "bbb", ""].

The old behaviour was confusing and almost certainly a bug. The new
behaviour is the same as in Python's re.findall.

For reference, though, Go does it somewhat differently: empty matches
at the end of non-empty matches are ignored, so in Go above example is
["aaa", "", "bbb"] instead.
WGH- added 3 commits September 5, 2020 17:23
RE2 doesn't support certain features, like negative lookaround,
so when a regular expression cannot be compiled with RE2, it's
compiled with libpcre instead.

This has some runtime cost, as this fallback is implemented
with an extra heap object and virtual function calls.

When RE2 is not enabled, however, everything works as it did before.
Ubuntu 14.04 doesn't have RE2 package altogether,
and Ubuntu 16.04 RE2 package is too old.

Ubuntu 18.04 RE2 package might work, but this Ubuntu verison it's not
supported by Travis yet.

So build RE2 from source.
@WGH-
Copy link
Contributor Author

WGH- commented Sep 5, 2020

Rebased on top of the current master.

@eddie4
Copy link

eddie4 commented Sep 28, 2020

Very interesting commit. We recently ran into performance issues with modsecurity. Any news on how much improvement we might see?

@zimmerle
Copy link
Contributor

Very interesting commit. We recently ran into performance issues with modsecurity. Any news on how much improvement we might see?

Hi @eddie4,

That is not the single front that we have on the performance side. Check the branch v3/dev/3.1-experimental. We have had great progress there. This is still on the Queue, to be merged once other pieces of v3.1 are complete.

@WGH-
Copy link
Contributor Author

WGH- commented Sep 28, 2020

Very interesting commit. We recently ran into performance issues with modsecurity. Any news on how much improvement we might see?

@eddie4 in my experience, it doesn't really make it much faster in average, however, it eliminates ReDoS worst cases.

@WGH-
Copy link
Contributor Author

WGH- commented Oct 29, 2020

@zimmerle please ping me when it's time to rebase this on top v3/dev/3.1-experimental.

@zimmerle
Copy link
Contributor

@zimmerle please ping me when it's time to rebase this on top v3/dev/3.1-experimental.

That is definitively something that I want to see happening. Lets have 3.1-experimental stable first. Working on the new issues that you have reported on #2376

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.x Related to ModSecurity version 3.x enhancement new feature This is a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants