Releases: neon-sunset/RangeExtensions
2.1.1
Changes
🚀 Enhancements
- Further speed up vectorized array/span initialization (improves `ToList()` too) (9a667be)
Loop body comparison:
| Old | New | New Aarch64 |
|-------------------------------|-------------------------------|-----------------------------------|
| M02_L01: | M02_L01: | G_M000_IG04: |
| vmovd xmm1,r9d | vmovupd [rax],ymm2 | str q18, [x1] |
| vpbroadcastd ymm1,xmm1 | vmovupd [rax+20],ymm0 | str q16, [x1, #0x10] |
| vpaddd ymm1,ymm1,ymm0 | vpaddd ymm2,ymm2,ymm1 | add v18.4s, v18.4s, v17.4s |
| add r9d,r11d | vpaddd ymm0,ymm0,ymm1 | add v16.4s, v16.4s, v17.4s |
| vmovd xmm2,r9d | add rax,40 | add x1, x1, #32 |
| vpbroadcastd ymm2,xmm2 | add r9d,10 | add w5, w5, #8 |
| vpaddd ymm2,ymm2,ymm0 | cmp r11d,r9d | cmp w4, w5 |
| add r9d,r11d | jg short M02_L01 | bgt G_M000_IG04 |
| movsxd r10,esi | | |
| vmovupd [rax+r10*4],ymm1 | | |
| lea r10d,[rsi+8] | | |
| movsxd r10,r10d | | |
| vmovupd [rax+r10*4],ymm2 | | |
| add esi,10 | | |
| cmp edi,esi | | |
| jg short M02_L01 | | |
|-------------------------------|-------------------------------|-----------------------------------|
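The difference between the two loop bodies can be sketched in C#. In the old loop the counter is re-broadcast into a vector on every iteration, while the new loop keeps two vector accumulators and adds a constant step to each. The helper below is a minimal sketch of the second shape using `System.Numerics.Vector<int>`; the name `FillSequence` and the exact structure are assumptions for illustration, not the library's actual code.

```csharp
using System;
using System.Numerics;

static class RangeFillSketch
{
    // Hypothetical helper: writes start, start + 1, start + 2, ... into dest.
    public static void FillSequence(Span<int> dest, int start)
    {
        int i = 0;
        int width = Vector<int>.Count;

        if (Vector.IsHardwareAccelerated && dest.Length >= width * 2)
        {
            // Seed vector: [start, start + 1, ..., start + width - 1].
            Span<int> seed = stackalloc int[width];
            for (int k = 0; k < width; k++) seed[k] = start + k;

            var lo = new Vector<int>(seed);
            var hi = lo + new Vector<int>(width);
            var step = new Vector<int>(width * 2);

            // Two stores and two adds per iteration, no per-iteration broadcast.
            for (; i <= dest.Length - width * 2; i += width * 2)
            {
                lo.CopyTo(dest.Slice(i));
                hi.CopyTo(dest.Slice(i + width));
                lo += step;
                hi += step;
            }
        }

        // Scalar tail for the remaining elements (or the whole span if it is short).
        for (; i < dest.Length; i++) dest[i] = start + i;
    }
}
```

Calling `RangeFillSketch.FillSequence(buffer, 5)` on a span of any length should leave `buffer[n] == 5 + n` for every index.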
🗒️ Notes
If you are interested, make sure to check the past two releases for more details!
https://github.com/neon-sunset/RangeExtensions/releases/tag/2.0.0
https://github.com/neon-sunset/RangeExtensions/releases/tag/2.1.0
Full Changelog: 2.1.0...2.1.1
Published with dotnet-releaser
2.1.0
Changes
🚀 Enhancements
- SIMDify array/span initialization, which improves performance by approx. 2-4x depending on the supported vector width (PR #16)
- Improve `.ToList()` performance by 2x by passing `RangeEnumerable` directly to the `List<int>` constructor, which subsequently calls `ICollection<int>.CopyTo(...)`
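The `List<T>(IEnumerable<T>)` constructor checks whether its argument implements `ICollection<T>` and, if so, sizes its backing array from `Count` and fills it with a single `CopyTo` call instead of enumerating. The sketch below shows that path being taken. `DemoRange` is a hypothetical stand-in written for this example, not the library's `RangeEnumerable`.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for a range type; not the library's RangeEnumerable.
sealed class DemoRange : ICollection<int>
{
    private readonly int _start, _end;

    // Records whether List<int> took the CopyTo fast path.
    public bool CopyToCalled { get; private set; }

    public DemoRange(int start, int end) { _start = start; _end = end; }

    public int Count => _end - _start;
    public bool IsReadOnly => true;

    public void CopyTo(int[] array, int arrayIndex)
    {
        CopyToCalled = true;
        for (int i = 0; i < Count; i++) array[arrayIndex + i] = _start + i;
    }

    public bool Contains(int item) => item >= _start && item < _end;

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = _start; i < _end; i++) yield return i;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Remove(int item) => throw new NotSupportedException();
}
```

With `var range = new DemoRange(0, 5); var list = new List<int>(range);`, the list is filled through `CopyTo`, so any speed-up in the collection's bulk copy carries over to list construction.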
BenchmarkDotNet=v0.13.3, OS=macOS 13.1 (22C65) [Darwin 22.2.0]
AMD Ryzen 7 5800X 3.80GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=8.0.100-alpha.1.23056.11
[Host] : .NET 8.0.0 (8.0.23.5503), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.0 (8.0.23.5503), X64 RyuJIT AVX2
Method | Length | Mean | Error | Ratio | Allocated |
---|---|---|---|---|---|
RangeToArray | 10 | 8.249 ns | 0.1493 ns | 1.00 | 64 B |
EnumerableToArray | 10 | 15.416 ns | 0.1819 ns | 1.87 | 104 B |
RangeToList | 10 | 18.367 ns | 0.1284 ns | 2.23 | 120 B |
EnumerableToList | 10 | 22.675 ns | 0.4197 ns | 2.75 | 136 B |
RangeSelectToArray | 10 | 14.934 ns | 0.1617 ns | 1.81 | 96 B |
EnumerableSelectToArray | 10 | 27.146 ns | 0.2635 ns | 3.30 | 152 B |
RangeToArray | 100 | 19.584 ns | 0.1200 ns | 1.00 | 424 B |
EnumerableToArray | 100 | 44.881 ns | 0.2353 ns | 2.29 | 464 B |
RangeToList | 100 | 29.633 ns | 0.2484 ns | 1.51 | 480 B |
EnumerableToList | 100 | 97.329 ns | 0.7041 ns | 4.97 | 496 B |
RangeSelectToArray | 100 | 62.172 ns | 0.6928 ns | 3.18 | 456 B |
EnumerableSelectToArray | 100 | 75.454 ns | 0.8087 ns | 3.85 | 512 B |
RangeToArray | 10000 | 690.475 ns | 10.5986 ns | 1.00 | 40024 B |
EnumerableToArray | 10000 | 3,285.207 ns | 22.7170 ns | 4.76 | 40064 B |
RangeToList | 10000 | 1,231.378 ns | 11.5101 ns | 1.78 | 40080 B |
EnumerableToList | 10000 | 7,269.712 ns | 75.8317 ns | 10.52 | 40096 B |
RangeSelectToArray | 10000 | 5,024.135 ns | 49.6868 ns | 7.27 | 40056 B |
EnumerableSelectToArray | 10000 | 5,044.309 ns | 67.8153 ns | 7.30 | 40112 B |
📦 Dependencies
- Bump BenchmarkDotNet from 0.13.2 to 0.13.3 (PR #15) by @dependabot (bot)
🧰 Misc
Full Changelog: 2.0.0...2.1.0
Published with dotnet-releaser
2.0.0
Changes
✨ New Features
`Select` and `Where`
This release expands the list of `Range` extensions with several convenience methods that bring it much closer to its counterparts in Rust, Python and others.
```csharp
var floats = (0..100).Select(i => (float)i);
var odd = (0..100).Where(i => i % 2 != 0);
var randomNumbers = (0..1000)
    .Select(_ => Random.Shared.Next())
    .ToArray();
```
Iterating through these directly is significantly faster than using `Enumerable.Range`.
In fact, with DynamicPGO enabled on `osx-arm64`, the loop `foreach (var i in (0..1000).Select(i => i * 2))` gets within a 1.3-1.5x ratio of a plain `for` loop. Delegate devirtualization has gotten so good!
`IList` on `RangeEnumerable` and `SelectRange`
Both of these types now also implement `IList`.
While `RangeEnumerable` is effectively a materialized sequence of numbers, `SelectRange` serves as a "Select View" over that sequence, with the exact `T` values materialized on access, similar to a plain `IEnumerable<T>`.
While this does not conform 100% to existing semantics, it was a necessary change to take advantage of `IEnumerable` internals for methods that lack bespoke implementations in this library, or when `RangeExtensions` enumerators are boxed.
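The "Select View" idea can be sketched as a list-like type that stores only the bounds and the selector, and computes each element when it is requested. `SelectView<T>` below is a hypothetical type for illustration only. To stay short it implements `IReadOnlyList<T>` rather than the full `IList<T>` surface, which is an assumption of this sketch and not how the library declares its types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type for illustration; not the library's SelectRange.
sealed class SelectView<T> : IReadOnlyList<T>
{
    private readonly int _start, _end;
    private readonly Func<int, T> _selector;

    public SelectView(int start, int end, Func<int, T> selector)
    {
        if (start < 0) throw new ArgumentOutOfRangeException(nameof(start));
        if (end < start) throw new ArgumentOutOfRangeException(nameof(end));
        _start = start;
        _end = end;
        _selector = selector ?? throw new ArgumentNullException(nameof(selector));
    }

    // Known up front, so no enumeration is needed to answer Count.
    public int Count => _end - _start;

    // The value is produced on access; nothing is stored per element.
    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Count) throw new ArgumentOutOfRangeException(nameof(index));
            return _selector(_start + index);
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = _start; i < _end; i++) yield return _selector(i);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the selector runs on every access, reading the same index twice invokes it twice, which is the departure from ordinary list semantics mentioned above.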
Bespoke `Aggregate`, `First`, `Last` and more
The library now exposes additional bespoke LINQ method implementations for shorthand usage together with `Range`, which further improve its usability in functional scenarios.
```csharp
var digits = (0..10)
    .Aggregate(new StringBuilder(), (sb, i) => sb.Append(i))
    .ToString();

Assert.Equal("0123456789", digits);

// None of these allocate, and all of them are O(1)
var fifteenthDecimal = (0..1337)
    .Select(i => (decimal)i)
    .Take(10)
    .ElementAt(4); // or even just [4]

// Efficient scan from the end of the range
var lastEven = (0..1337)
    .Where(i => i % 2 is 0)
    .Last();
```
Further optimizations on `RangeEnumerable`
Previously, `Range` was nested inside of `RangeEnumerable` and only unwrapped when creating an enumerator.
This complicated the JIT's job and prevented it from seeing that several conditions had already been verified: that the range is valid and neither of its indexes counts from the end, that its start value is greater than or equal to zero, and so on (depending on the particular loop used with the range).
In addition, this approach caused further issues when `RangeEnumerable` was boxed or nested in either `RangeSelect` or `RangeWhere`, which made the initial prototype perform barely better, and sometimes worse, than the private implementations of the `Select` and `Where` iterators in the BCL.
Therefore, the best solution turned out to be the most straightforward and likely intellectually disappointing to people who read release notes till the end.
First and foremost, reinterpret-casting the `Index`es nested inside of `Range` into `int`s allowed for better constant propagation and dead branch elimination, reducing the size and improving the quality of codegen for precondition checks on enumerable/enumerator creation.
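One way to picture the reinterpret cast: `System.Index` wraps a single `int`, and an index counted from the end (such as `^1`) is stored as the bitwise complement of its value, which makes it negative. Reading both indexes as raw `int`s therefore lets one sign check cover the "from end" case for both bounds. The helper below is a minimal sketch under that layout assumption. `Unpack` is a hypothetical name, and the library's own checks may differ.

```csharp
using System;
using System.Runtime.CompilerServices;

static class RangeUnpackSketch
{
    // Hypothetical helper: returns the raw start/end of a range whose
    // indexes both count from the start. Relies on Index being a single
    // int field, which is an internal layout detail.
    public static (int Start, int End) Unpack(Range range)
    {
        Index startIndex = range.Start;
        Index endIndex = range.End;

        int start = Unsafe.As<Index, int>(ref startIndex);
        int end = Unsafe.As<Index, int>(ref endIndex);

        // From-end indexes are negative in their raw form, so OR-ing the
        // two values and testing the sign rejects either of them at once.
        if ((start | end) < 0)
            throw new ArgumentOutOfRangeException(nameof(range));

        return (start, end);
    }
}
```

For example, `RangeUnpackSketch.Unpack(2..10)` yields `(2, 10)`, while a range such as `0..^1` throws because its end counts from the end.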
Next, by flattening all nested structs into their parent holders, i.e. storing `int _start; int _end;` instead of `Range`, and copying the `RangeEnumerable` implementation bits to `RangeSelect` and `RangeWhere` instead of nesting `RangeEnumerable`, it was possible to achieve pretty good enregistration of struct fields, effectively removing the abstraction and turning them into locals.
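A flattened layout of this kind can be sketched as a pair of structs that hold plain `int` fields and expose the pattern-based `GetEnumerator` that `foreach` binds to. The types `FlatRange` and `FlatRangeEnumerator` are hypothetical names for illustration and are not copied from the library.

```csharp
using System;

// Hypothetical flattened range: two ints, no nested Range or Index.
readonly struct FlatRange
{
    private readonly int _start;
    private readonly int _end;

    public FlatRange(int start, int end)
    {
        if (start < 0 || end < start) throw new ArgumentOutOfRangeException(nameof(end));
        _start = start;
        _end = end;
    }

    // foreach binds to this method by pattern; no interface or boxing involved.
    public FlatRangeEnumerator GetEnumerator() => new FlatRangeEnumerator(_start, _end);
}

// Hypothetical enumerator: its fields are simple enough to live in registers.
struct FlatRangeEnumerator
{
    private int _current;
    private readonly int _end;

    public FlatRangeEnumerator(int start, int end)
    {
        _current = start - 1;
        _end = end;
    }

    public int Current => _current;

    public bool MoveNext() => ++_current < _end;
}
```

A loop such as `foreach (var i in new FlatRange(0, 100)) sum += i;` then touches only two integers, which is the shape that lets the JIT treat the struct fields like ordinary locals.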
This, in combination with delegate inlining (requires .NET 7 and DynamicPGO), brought the performance of direct `foreach (var i in (0..Length).Select(i => i * 2))` loops to within 30% of a plain `for` loop without any delegates. This also applies to `(0..100).Aggregate(...)` and any other method that uses LINQ to produce a value functional-style.
However, I decided to postpone (or maybe it was small brain cope?) correctly abstracting away the duplicate code in `.SpeedOpt.cs` via either generics or templating/code generation in order to deliver the update as is. If you are interested, I will be happy to review and accept a PR that closes #14.
📦 Dependencies
- Bump Microsoft.NET.Test.Sdk from 17.3.1 to 17.3.2 (PR #9) by @dependabot (bot)
- Bump coverlet.collector from 3.1.2 to 3.2.0 (PR #11) by @dependabot (bot)
- Bump Microsoft.NET.Test.Sdk from 17.3.2 to 17.4.0 (PR #12) by @dependabot (bot)
- Bump Microsoft.NET.Test.Sdk from 17.4.0 to 17.4.1 (PR #13) by @dependabot (bot)
Full Changelog: 1.2.2...2.0.0
Published with dotnet-releaser
1.2.2
Changes
🐛 Bug Fixes
- Fix a typo in ThrowHelpers.EmptyRange() message. (89df618)
📦 Dependencies
- Bump MinVer from 4.1.0 to 4.2.0 (PR #6) by @dependabot (bot)
- Bump BenchmarkDotNet from 0.13.1 to 0.13.2 (PR #7) by @dependabot (bot)
- Bump Microsoft.NET.Test.Sdk from 17.3.0 to 17.3.1 (PR #8) by @dependabot (bot)
Full Changelog: 1.2.1...1.2.2
Published with dotnet-releaser
1.2.1
Changes
🐛 Bug Fixes
📦 Dependencies
- Bump Microsoft.NET.Test.Sdk from 17.2.0 to 17.3.0 (PR #4) by @dependabot (bot)
🧰 Misc
- Re-enable Coveralls publishing because it has been fixed (2248bda)
- Complete coverage to 100% (8d6f42b)
- Update description (f453133)
Full Changelog: 1.2.0...1.2.1
1.2.0
Changes
✨ New Features
- Add Span overloads for CopyTo/TryCopyTo for fun (6af9719)
- Implement ICollection on RangeEnumerable for performance (4318e63)
🐛 Bug Fixes
- Fix ICollection.CopyTo implementation and complete test coverage (c71c076)
📦 Dependencies
- Bump xunit from 2.4.1 to 2.4.2 (PR #3) by @dependabot (bot)
🧰 Misc
- Make tests compatible with XUnit 2.4.2 (76970d0)
- Update README, slightly improve Sum() codegen and replace Count() with Count where appropriate (2b3efb6)
Full Changelog: 1.1.0...1.2.0
1.1.0
Changes
✨ New Features
- Add `rangeEnumerable.Sum()`
🧰 Misc
- Further optimizations, cleanup and additional tests (482e7b5)
Full Changelog: 1.0.0...1.1.0
1.0.0
✨Initial Release!
Full Changelog: 6ad37f19b58b2776f800017425ea51f58122e2a4...1.0.0