[WIP] Swap over to yara-x; improve performance and readability #734
After hacking around with yara-x locally last night, I found the performance gains over YARA definitely noticeable (usually at least 2x faster).
I wanted to get this rather large refactor up to test the CI experience again, since we have to build the API from scratch. Locally, refreshing the sample data was down to about 24-25 seconds, with the integration tests taking about 27 seconds (on my M1 Pro MBP).
Edit: we're about 3-4x faster in GHA with 8-core runners; this is usually around 170s.
That said, I plan on using 16-core runners to help with the yara-x build.
This PR also cleans up `recursiveScan`, fixes the behavior of `longestUnique`, and splits out path-related functions into a new `path.go` file. Additionally, `findFilesRecursively` will locate files concurrently to help with large numbers of files. Outside of the `longestUnique` changes, output is essentially 1:1 with the current implementation.