[Feature]: New higher-performance Filter() operators #758

Open
JakenVeina opened this issue Nov 18, 2023 · 17 comments · May be fixed by #941
@JakenVeina
Collaborator

JakenVeina commented Nov 18, 2023

Describe the functionality desired 🐞

Consider the following new filtering operators, which should provide performance improvements over existing operators:

public static IObservable<IChangeSet<TObject, TKey>> Filter<TObject, TKey, TState>(
    this    IObservable<IChangeSet<TObject, TKey>>  source,
            IObservable<TState>                     state,
            Func<TState, TObject, bool>             predicate,
            bool                                    suppressEmptyChangeSets = true);

public static IObservable<IChangeSet<TItem>> Filter<TItem, TState>(
    this    IObservable<IChangeSet<TItem>>  source,
            IObservable<TState>             state,
            Func<TState, TItem, bool>       predicate,
            bool                            suppressEmptyChangeSets = true);

public static IObservable<IChangeSet<TObject, TKey>> FilterValues<TObject, TKey>(
    this    IObservable<IChangeSet<TObject, TKey>>  source,
            Func<TObject, bool>                     predicate,
            bool                                    suppressEmptyChangeSets = true);

I'm not set on the name for the third one, but it stems from a conversation with @RolandPheasant and @dwcullop in Slack. Alternatives would maybe be .FilterSlim(), .FilterDeterministic(), or .FilterWithoutCaching(). Whatever the name, the goal would be to keep it consistent with other operators (like .Transform()) that normally keep an internal items cache, but could have versions that don't.

The steps the functionality will provide

For the first two operators, utilizing an IObservable<TState> parameter, the primary improvement is to reduce the memory allocations required by existing operators that support dynamic filtering.

The third operator would improve performance for static-filtering scenarios, where the filter predicate is relatively fast and is known to be deterministic, i.e. collection items are immutable, at least from the perspective of filtering. Under this scenario, .Filter() can be implemented without the need for an internal cache of items, becoming allocation-free for all single-change ChangeSets.

This operator would not be applicable as an ObservableList operator, as deterministic filtering does not eliminate the need for an internal items cache there. An internal cache is still needed to track item indexes, so that downstream indexes can be adjusted for items filtered out of the source.

Considerations

  • Consumers can currently use .Filter() with the IObservable<Func<TObject, bool>> or IObservable<Func<TObject, TKey, bool>> parameters to dynamically control filtering, but this requires either a new Func<> allocation for each change in the filtering rules, or a fair amount of tedious code to reuse and re-emit a shared Func<> instance each time while mutating its closure (see the sketch after this list).
  • Consumers can currently use .FilterOnObservable(), but this requires multiple object allocations per added item, to create an IObservable<bool> that controls filtering of that item and to subscribe to that stream. This approach also doesn't really support immutable collection items at all, as replace operations force a teardown and reconstruction of the filtering stream for that item.
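
To illustrate the allocation difference described in the first consideration, here is a minimal sketch. The Person type and the minAge subject are illustrative assumptions, and the second call uses the state-based overload proposed by this issue, which does not exist in the library yet:

using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using DynamicData;

record Person(int Id, string Name, int Age);

static class FilterAllocationSketch
{
    public static void Demo()
    {
        var cache = new SourceCache<Person, int>(p => p.Id);
        var minAge = new BehaviorSubject<int>(18);

        // Today: every change of the minimum age allocates a fresh Func<Person, bool> (plus its closure).
        var existing = cache.Connect()
            .Filter(minAge.Select(age => new Func<Person, bool>(p => p.Age >= age)))
            .Subscribe();

        // Proposed by this issue (not yet in the library): a single static delegate,
        // with the changing value flowing in through the state observable.
        var proposed = cache.Connect()
            .Filter(minAge, static (age, p) => p.Age >= age)
            .Subscribe();
    }
}
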
@RolandPheasant
Collaborator

I think the filter-with-state option is inspired, and now that you have suggested it, I am surprised that no one, myself included, has ever thought to introduce it. I think it's a no-brainer that we should go with it 😃

Regarding the stateless filtering, I see a pitfall:

  1. An item is added and matches the filter.
  2. An update arrives which no longer matches the filter.
  3. This means that the original item should be removed.

We could not deterministically decide that step 3 should be a remove without knowing that the item was originally a match. Alternatively, we could always fire a remove, but that would be a load of misleading notifications. The same would be true when we receive an update after an add. I think most operators could handle this, but I think it would take a lot of looking to be sure.

@JakenVeina
Collaborator Author

JakenVeina commented Nov 20, 2023

We could not deterministically decide that step 3 should be a remove without knowing that the item was originally a match.

Sure we can: we re-run the predicate against .Previous, because one of the assumptions consumers are expected to fulfill is that they don't mutate items. That's where the other assumption, that "the filter predicate runs relatively fast", comes from. Or is .Previous not always populated for replacement changes?

@RolandPheasant
Collaborator

Of course. You're right. The third operator must assume that the object does not mutate and that the filter is a pure function.
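
To make the agreed approach concrete, here is a minimal sketch of how a stateless .FilterValues() might map a single cache change set without an internal item cache, assuming immutable items and a pure predicate. The names and structure are illustrative assumptions, not the actual implementation:

using System;
using DynamicData;

static class FilterValuesSketch
{
    // Maps one upstream change set to the filtered downstream change set,
    // re-running the predicate against .Previous for Update changes instead of
    // consulting an internal cache.
    public static ChangeSet<TObject, TKey> FilterChanges<TObject, TKey>(
            IChangeSet<TObject, TKey> changes,
            Func<TObject, bool> predicate)
        where TObject : notnull
        where TKey : notnull
    {
        var result = new ChangeSet<TObject, TKey>();

        foreach (var change in changes)
        {
            switch (change.Reason)
            {
                // Adds, Removes, and Refreshes pass through only when the item matches.
                case ChangeReason.Add when predicate(change.Current):
                case ChangeReason.Remove when predicate(change.Current):
                case ChangeReason.Refresh when predicate(change.Current):
                    result.Add(change);
                    break;

                case ChangeReason.Update:
                    var wasIncluded = predicate(change.Previous.Value);
                    var isIncluded = predicate(change.Current);

                    if (wasIncluded && isIncluded)
                        result.Add(change);
                    else if (wasIncluded)
                        result.Add(new Change<TObject, TKey>(ChangeReason.Remove, change.Key, change.Previous.Value));
                    else if (isIncluded)
                        result.Add(new Change<TObject, TKey>(ChangeReason.Add, change.Key, change.Current));
                    break;
            }
        }

        return result;
    }
}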

@dwcullop
Member

For clarity, I want to make sure I understand the use-cases, which might look something like this?

class Person
{
    public int Id { get; init; }
    public string Name { get; set; }
    public int Age { get; set; }
}

var cache = new SourceCache<Person, int>(p => p.Id);
var ageObservable = viewModel.WhenAnyValue(vm => vm.MinAgeFilter);  // Connected to some UI, for example

var filtered = cache.Connect()
    .Filter(ageObservable, static (age, p) => p.Age >= age)
    .Bind(out this.filteredByAgeCollection)
    .Subscribe();

And that would yield filteredByAgeCollection containing the subset of people who meet the minimum age?
And this is an improvement because the current options require the ageObservable to be transformed into the actual predicate, which could require extra closures and allocations, etc.?

Am I understanding everything correctly? If so, I like it.

@RolandPheasant
Collaborator

That looks about right

@dwcullop
Member

We could not deterministically decide that step 3 should be a remove without knowing that the item was originally a match. Alternatively, we could always fire a remove, but that would be a load of misleading notifications. The same would be true when we receive an update after an add. I think most operators could handle this, but I think it would take a lot of looking to be sure.

I think we should avoid emitting "misleading notifications" (the first event observed for a given Key should always be Add, the last should always be Remove, there shouldn't be another Add unless there has been a Remove, and Update and Refresh should appear between the two, etc.). We could even modify the Aggregator to validate via testing that operators always have this behavior.

Operators should also be written to work around or ignore misleading notifications as much as possible. These two approaches together should avoid bugs. But consumers would likely be surprised to receive a Remove for something for which there was never an Add. I know I would be.

We could not deterministically decide that step 3 should be a remove without knowing that the item was originally a match.

Sure we can: we re-run the predicate against .Previous, because one of the assumptions consumers are expected to fulfill is that they don't mutate items. That's where the other assumption, that "the filter predicate runs relatively fast", comes from. Or is .Previous not always populated for replacement changes?

I understand how applying the predicate to the Previous will enable emitting the correct Add/Remove in response to an Update. I hadn't considered that and I think that's a great idea. However, what about the following case?

  1. Add Event fires
  2. Predicate is applied with the latest from the State observable and returns false
  3. Add Event is omitted from the downstream events
  4. State observable fires with an updated value that would make the Predicate return true
  5. ????

How do we find the item that was just dropped so that an Add event can now be emitted, unless there's a local cache of all of the items? You have the same issue with any State observable change. I feel like you need the local cache of all items so that the filter can be re-applied... Or am I missing something?

@JakenVeina
Collaborator Author

State observable fires with an updated value that would make the Predicate return true.

The approach of applying the predicate to .Previous instead of keeping a cache of all items only works when the predicate is stateless/static, for specifically this reason. I.e. of the proposed new extensions, only .FilterValues() avoids keeping an internal cache, because it specifically doesn't support a changing predicate or predicate state.

@dwcullop
Member

The approach of applying the predicate to .Previous instead of keeping a cache of all items only works when the predicate is stateless/static, for specifically this reason. I.e. of the proposed new extensions, only .FilterValues() avoids keeping an internal cache, because it specifically doesn't support a changing predicate or predicate state.

But it isn't stateless. The other observable is even named IObservable<TState> state. Even if the filter is deterministic with regard to the Cache's value, when a new value comes in from the state observable, doesn't that mean all of the values have to be re-evaluated? At least, that's what it seems like to me because the predicate is Func<TState, TObject, bool> predicate. Am I missing something?

@JakenVeina
Collaborator Author

Am I missing something?

Yes, you're missing the fact that that overload won't be using the "just apply the predicate to .Previous instead of keeping a cache of all items" approach. The only one that will is the overload without a state parameter: .FilterValues().
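
For completeness, here is a rough sketch of what the state-driven overloads would do instead: keep the full upstream set so the predicate can be re-applied and diffed whenever the state fires. This is purely illustrative; the names and structure are assumptions, not the actual implementation:

using System;
using System.Collections.Generic;
using DynamicData;

sealed class FilterWithStateSketch<TObject, TKey, TState>
    where TObject : notnull
    where TKey : notnull
{
    // Every upstream item, plus which keys are currently included downstream.
    // (Population of _allItems from source change sets is omitted for brevity.)
    private readonly Dictionary<TKey, TObject> _allItems = new();
    private readonly HashSet<TKey> _includedKeys = new();
    private readonly Func<TState, TObject, bool> _predicate;

    public FilterWithStateSketch(Func<TState, TObject, bool> predicate)
        => _predicate = predicate;

    // Invoked when the IObservable<TState> produces a new value: re-evaluate every
    // cached item and emit only the Adds/Removes whose inclusion actually changed.
    public ChangeSet<TObject, TKey> OnStateChanged(TState newState)
    {
        var changes = new ChangeSet<TObject, TKey>();

        foreach (var (key, item) in _allItems)
        {
            var wasIncluded = _includedKeys.Contains(key);
            var isIncluded = _predicate(newState, item);

            if (isIncluded && !wasIncluded)
            {
                _includedKeys.Add(key);
                changes.Add(new Change<TObject, TKey>(ChangeReason.Add, key, item));
            }
            else if (!isIncluded && wasIncluded)
            {
                _includedKeys.Remove(key);
                changes.Add(new Change<TObject, TKey>(ChangeReason.Remove, key, item));
            }
        }

        return changes;
    }
}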

@dwcullop
Member

Got it, thanks!

@JakenVeina
Collaborator Author

Still need to implement the two TState variations.

@dwcullop
Member

dwcullop commented Mar 3, 2024

Still need to implement the two TState variations.

What does that look like?

@JakenVeina
Collaborator Author

Exactly what they look like in the original issue description.

@dwcullop
Member

dwcullop commented Mar 9, 2024

Exactly what they look like in the original issue description.

Are you going to implement those? I might be able to do it.

@JakenVeina
Collaborator Author

I've got an implementation in progress, yeah, no worries. The cache version is done, and I was hoping for a little feedback from @RolandPheasant in Slack, with regard to the list version.

@JakenVeina
Collaborator Author

JakenVeina commented Apr 8, 2024

Got a fun result here.

| Method                           | FilterPolicy    | Mean      | Error    | StdDev   | Ratio | RatioSD | Gen0     | Gen1     | Allocated | Alloc Ratio |
|--------------------------------- |---------------- |----------:|---------:|---------:|------:|--------:|---------:|---------:|----------:|------------:|
| RandomizedEditsAndStateChanges_1 | ClearAndReplace |  39.27 ms | 0.770 ms | 0.917 ms |  1.00 |    0.00 | 923.0769 |  76.9231 |   3.83 MB |        1.00 |
| RandomizedEditsAndStateChanges_2 | ClearAndReplace |  25.55 ms | 0.509 ms | 0.865 ms |  0.66 |    0.03 | 468.7500 |  62.5000 |   1.88 MB |        0.49 |
| RandomizedEditsAndStateChanges_3 | ClearAndReplace |  39.71 ms | 0.783 ms | 2.063 ms |  1.03 |    0.04 | 500.0000 | 142.8571 |   2.01 MB |        0.52 |
|                                  |                 |           |          |          |       |         |          |          |           |             |
| RandomizedEditsAndStateChanges_1 | CalculateDiff   | 122.78 ms | 2.274 ms | 3.261 ms |  1.00 |    0.00 | 750.0000 |        - |   3.87 MB |        1.00 |
| RandomizedEditsAndStateChanges_2 | CalculateDiff   |  72.88 ms | 1.357 ms | 2.481 ms |  0.60 |    0.02 | 857.1429 | 142.8571 |   3.56 MB |        0.92 |
| RandomizedEditsAndStateChanges_3 | CalculateDiff   |  40.89 ms | 0.803 ms | 1.604 ms |  0.33 |    0.01 | 909.0909 |  90.9091 |   3.63 MB |        0.94 |

I worked out 3 different implementations for the List version of this operator, and the one that seems to be the clear winner for performance (Version 3) is actually a FUNCTIONAL improvement as well: it preserves item ordering through the filter. E.g. if the unfiltered items are [A, B, C, D, E] and B and D get filtered out, the downstream is guaranteed to be [A, C, E].

Version 1 is the closest implementation to the current List .Filter() operators: it keeps 2 lists internally, an unfiltered one and a filtered one. Items are always added to the end of the filtered list, so item ordering is deterministic, but not preserved relative to the unfiltered list. When removals need to be processed, the filtered list is linearly searched to find the item.

Version 2 attempts to improve on Version 1 by keeping only one list internally, with each item stored alongside its filtered index, if any. Removes can now be processed without a linear search for the item, but they instead require a full iteration of the list to update the "filtered index" values in accordance with the removal.

Version 3 is similar to Version 2, with additional work put in for Add changes to preserve order, which then allows a variety of small optimizations when processing other change types, based on the assumption that "filtered index" values are in order.
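
As a rough illustration of the data structure described for Versions 2 and 3, here is a minimal sketch of the per-item record and of how a removal might be processed. The names are assumptions rather than the actual code, and emitting the corresponding downstream Remove change is omitted:

using System.Collections.Generic;

static class ListFilterSketch
{
    // Each unfiltered item carries its index within the filtered output,
    // or null if the item is currently filtered out.
    private readonly record struct ItemState<T>(T Item, int? FilteredIndex);

    // A removal needs no linear search to locate the item, but (as described for Version 2)
    // every included item after it must have its filtered index shifted down by one.
    private static void HandleRemove<T>(List<ItemState<T>> items, int unfilteredIndex)
    {
        var removed = items[unfilteredIndex];
        items.RemoveAt(unfilteredIndex);

        if (removed.FilteredIndex is not int removedFilteredIndex)
            return; // the removed item was filtered out, so downstream indexes are unaffected

        for (var i = 0; i < items.Count; i++)
        {
            if (items[i].FilteredIndex is int filteredIndex && filteredIndex > removedFilteredIndex)
                items[i] = items[i] with { FilteredIndex = filteredIndex - 1 };
        }
    }
}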

For extra fun, if we go with Version 3, I think we could theoretically re-implement all the existing List .Filter() operators to use this one under the hood.

@JakenVeina
Collaborator Author

JakenVeina commented Apr 8, 2024

Regarding the Cache version of the operator...

| Method                           | Mean     | Error   | StdDev   | Ratio | RatioSD | Gen0      | Allocated | Alloc Ratio |
|--------------------------------- |---------:|--------:|---------:|------:|--------:|----------:|----------:|------------:|
| RandomizedEditsAndStateChanges_1 | 324.9 ms | 6.37 ms |  5.96 ms |  1.00 |    0.00 | 2000.0000 |   8.57 MB |        1.00 |
| RandomizedEditsAndStateChanges_2 | 325.6 ms | 6.38 ms |  6.55 ms |  1.00 |    0.02 | 2000.0000 |   8.58 MB |        1.00 |
| RandomizedEditsAndStateChanges_3 | 356.8 ms | 6.94 ms | 10.17 ms |  1.10 |    0.03 | 2000.0000 |   8.57 MB |        1.00 |
| RandomizedEditsAndStateChanges_4 | 327.3 ms | 6.48 ms |  8.43 ms |  1.01 |    0.03 | 2000.0000 |   8.57 MB |        1.00 |
| RandomizedEditsAndStateChanges_5 | 325.5 ms | 6.20 ms |  7.84 ms |  1.00 |    0.03 | 2000.0000 |   8.57 MB |        1.00 |

All the slightly-different implementations I've tried have essentially the same performance.

The different things I've tried include:

  • Using only native operators (Version 2)
  • Using dual-locking, like the List version of the operator (upstream and downstream are locked separately, but with overlap, to preserve ordering of changesets; see the sketch after this list). (Versions 3 and 5)
  • Having all the code on a dedicated Subscription class, to promote readability, as there's quite a lot of code. (Versions 4 and 5)
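
Here is a rough sketch of the dual-locking idea referenced in the list above, as I understand the description; the hand-over-hand structure and all names are assumptions, not the actual implementation:

using System;
using System.Threading;
using DynamicData;

sealed class DualLockSketch<TObject, TKey>
    where TObject : notnull
    where TKey : notnull
{
    private readonly object _upstreamGate = new();
    private readonly object _downstreamGate = new();
    private readonly IObserver<IChangeSet<TObject, TKey>> _downstream;

    public DualLockSketch(IObserver<IChangeSet<TObject, TKey>> downstream)
        => _downstream = downstream;

    public void OnSourceChanges(IChangeSet<TObject, TKey> upstreamChanges)
    {
        IChangeSet<TObject, TKey> downstreamChanges;

        Monitor.Enter(_upstreamGate);
        try
        {
            downstreamChanges = Process(upstreamChanges);

            // The "overlap": the downstream gate is acquired before the upstream gate is
            // released, so two change sets processed back-to-back cannot swap order on
            // the way out, even though emission happens outside the upstream lock.
            Monitor.Enter(_downstreamGate);
        }
        finally
        {
            Monitor.Exit(_upstreamGate);
        }

        try
        {
            _downstream.OnNext(downstreamChanges);
        }
        finally
        {
            Monitor.Exit(_downstreamGate);
        }
    }

    private static IChangeSet<TObject, TKey> Process(IChangeSet<TObject, TKey> changes)
        => changes; // placeholder for the actual filtering work
}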

I'm planning on going with Version 5.

@JakenVeina JakenVeina linked a pull request Sep 8, 2024 that will close this issue