[.NET] Improved parsing time #336

obligaron · 2024-12-15T17:05:27Z

🤔 What's changed?

I have optimised the gherkin parser for better performance. This was done by avoiding string culture sensitive comparisons, (string) allocations and other optimisations (see commits for details).

For the very_long.feature this adds up to 3x to 7x improved parsing speed (depending on the .NET runtime used).

Also added a BenchmarkDotNet project to measure the performance improvements (for now and in the future).

Setup:

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4602/23H2/2023Update/SunValley3)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.101
  [Host]     : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  Job-ZJBIYF : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  Job-TTIYKJ : .NET Framework 4.8.1 (4.8.9282.0), X64 RyuJIT VectorSize=256

Results before:

Method	Runtime	FeatureFile	Mean	Error	StdDev	Gen0	Gen1	Gen2	Allocated
Parser	.NET 8.0	tags.feature	527.66 μs	9.133 μs	8.543 μs	35.1563	35.1563	35.1563	555.57 KB
ParserReuse	.NET 8.0	tags.feature	97.01 μs	0.873 μs	0.817 μs	6.3477	0.2441	-	104.91 KB
Parser	.NET Framework 4.8.1	tags.feature	958.57 μs	4.861 μs	4.547 μs	106.4453	35.1563	35.1563	593.81 KB
ParserReuse	.NET Framework 4.8.1	tags.feature	72.33 μs	0.771 μs	0.721 μs	19.6533	0.8545	-	121.48 KB
Parser	.NET 8.0	very_long.feature	4,792.91 μs	42.469 μs	37.648 μs	281.2500	140.6250	31.2500	4720.05 KB
ParserReuse	.NET 8.0	very_long.feature	4,256.20 μs	23.611 μs	22.086 μs	257.8125	164.0625	-	4257.06 KB
Parser	.NET Framework 4.8.1	very_long.feature	4,011.65 μs	24.134 μs	21.394 μs	835.9375	289.0625	39.0625	5290.22 KB
ParserReuse	.NET Framework 4.8.1	very_long.feature	2,838.23 μs	11.272 μs	9.992 μs	777.3438	277.3438	-	4782.37 KB

Results after:

Method	Runtime	FeatureFile	Mean	Error	StdDev	Gen0	Gen1	Gen2	Allocated
Parser	.NET 8.0	tags.feature	410.61 μs	8.183 μs	10.349 μs	35.1563	35.1563	35.1563	490.98 KB
ParserReuse	.NET 8.0	tags.feature	15.34 μs	0.182 μs	0.162 μs	2.5177	0.0916	-	41.18 KB
Parser	.NET Framework 4.8.1	tags.feature	876.67 μs	15.740 μs	14.724 μs	71.2891	35.1563	35.1563	512.97 KB
ParserReuse	.NET Framework 4.8.1	tags.feature	23.80 μs	0.108 μs	0.084 μs	7.3853	0.2747	-	45.45 KB
Parser	.NET 8.0	very_long.feature	989.34 μs	7.098 μs	6.292 μs	140.6250	70.3125	35.1563	2012.58 KB
ParserReuse	.NET 8.0	very_long.feature	565.29 μs	5.020 μs	4.695 μs	94.7266	47.8516	-	1558.77 KB
Parser	.NET Framework 4.8.1	very_long.feature	1,901.60 μs	11.795 μs	10.456 μs	320.3125	142.5781	35.1563	2071.23 KB
ParserReuse	.NET Framework 4.8.1	very_long.feature	905.75 μs	2.283 μs	1.906 μs	259.7656	100.5859	-	1596.55 KB

⚡️ What's your motivation?

Investigated SourceCodeGenerator for .NET and gherkin parser speed was one part where much of the time was spent. Also gherkin parser speed is relevant for user interactions (IDE editing of future files) and general compilation speed in .NET.

I hope this helps to make gherkin even better. 🙂

🏷️ What kind of change is this?

🏦 Refactoring/debt/DX (improvement to code design, tooling, etc. without changing behaviour)
💥 Breaking change (incompatible changes to the API)

♻️ Anything particular you want feedback on?

I changed some parts of the parsing to string ordinal comparison. string ordinal comparison was already used in a lot of places. So I don't think this is a hard breaking change. But I wanted to highlight it anyway (here and in the changelog).
Another thing is that I removed a Mono workaround that was probably no longer needed (regarding `string.StartsWith') that showed up during profiling.

📋 Checklist:

I agree to respect and uphold the Cucumber Community Code of Conduct
I've changed the behaviour of the code
- I have added/updated tests to cover my changes.
Users should know about my change
- I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

…nal everywhere

…lections

…or for System.Text.Json

gasparnagy

Looks good in general.
Just one question: is the optimization you had both in the parser and the ast builder that flattens the error handling wrapper method to try-catch everywhere counts a lot in perf? I understand that crating and invoking a delegate is more costly, but copy-pasting the same error handling code everywhere is pretty error-prone. That's the only change that somehow hurts me more. Would the numbers be very different without that one?

obligaron · 2024-12-16T11:37:16Z

No problem. 🙂
Here are the results without the delegate optimization (second commit reverted):

Method	Runtime	FeatureFile	Mean	Error	StdDev	Gen0	Gen1	Gen2	Allocated
Parser	.NET 8.0	tags.feature	425.46 μs	8.187 μs	10.645 μs	35.1563	35.1563	35.1563	531.46 KB
ParserReuse	.NET 8.0	tags.feature	23.71 μs	0.143 μs	0.127 μs	4.9744	0.1831	-	81.66 KB
Parser	.NET Framework 4.8.1	tags.feature	839.09 μs	1.712 μs	1.602 μs	76.1719	40.0391	35.1563	553.49 KB
ParserReuse	.NET Framework 4.8.1	tags.feature	30.64 μs	0.350 μs	0.292 μs	13.9771	0.4883	-	86.04 KB
Parser	.NET 8.0	very_long.feature	1,245.31 μs	16.312 μs	13.621 μs	210.9375	175.7813	35.1563	3428.37 KB
ParserReuse	.NET 8.0	very_long.feature	831.35 μs	16.616 μs	17.063 μs	181.6406	90.8203	-	2973.31 KB
Parser	.NET Framework 4.8.1	very_long.feature	2,217.68 μs	43.897 μs	41.061 μs	550.7813	250.0000	35.1563	3493.88 KB
ParserReuse	.NET Framework 4.8.1	very_long.feature	1,223.84 μs	8.885 μs	6.937 μs	490.2344	187.5000	-	3015.27 KB

Without it, very_long.feature with reuse decreases by 47% for .NET 8 and 35% for .NET Framework 4.8. Allocations almost double.

Regarding error handling:
Most methods are generated, the changes in the gherking-csharp.razor file adds only 22 lines.

gasparnagy · 2024-12-18T07:54:43Z

@obligaron Thx for the measure. Yes, this is significant enough to keep.

obligaron added 9 commits December 15, 2024 17:46

[.NET] Add benchmarks

b0e598a

[.NET] Gherkin.Parser: avoid delegate creation

b0e449b

[.NET] GherkinLine.SplitCells: Avoid string allocations

3b6382c

[.NET] Optimize AstNode.subItems handling

ec6feef

[.NET] Remove old Mono workaround and use string.StartsWith with ordi…

6e9a59a

…nal everywhere

[.NET] Gherkinline.GetTableCells: avoid string concatenation

d3b06c5

[.NET] AstBuilder.GetCells: avoid unnesscary enumerator and array col…

81d5822

…lections

[.NET] GherkinDialectProvider.ParseJsonContent: Use SourceCodeGenerat…

91380af

…or for System.Text.Json

[.NET] GherkinLine.GetTags: avoid calling RegEx

4b25f1c

gasparnagy reviewed Dec 16, 2024

View reviewed changes

gasparnagy approved these changes Dec 18, 2024

View reviewed changes

gasparnagy merged commit be363bc into cucumber:main Dec 18, 2024
2 checks passed

obligaron deleted the perf branch December 18, 2024 19:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[.NET] Improved parsing time #336

[.NET] Improved parsing time #336

obligaron commented Dec 15, 2024

gasparnagy left a comment

obligaron commented Dec 16, 2024

gasparnagy commented Dec 18, 2024

[.NET] Improved parsing time #336

[.NET] Improved parsing time #336

Conversation

obligaron commented Dec 15, 2024

🤔 What's changed?

⚡️ What's your motivation?

🏷️ What kind of change is this?

♻️ Anything particular you want feedback on?

📋 Checklist:

gasparnagy left a comment

Choose a reason for hiding this comment

obligaron commented Dec 16, 2024

gasparnagy commented Dec 18, 2024