Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement deepCopy as copying, instead of merge #742

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

osa1
Copy link
Member

@osa1 osa1 commented Aug 25, 2022

Currently when we copy a message we create an empty message, and then merge the
copied message to the empty message.

This is not optimal as merging is a more complicated operation then copying.

This implement deepCopy as simple deep copy and halves deepCopy benchmark
results.

GeneratedMessage.deepCopy also gets a optional freeze argument for
optionally generating frozen messages. Previously we would have to copy and
then make another pass to mark the message as frozen.

Changes:

  • deepCopy method added to PbList, PbMap, _ExtensionFieldSet,
    UnknownFieldSet, _FieldSet

  • GeneratedMessage.deepCopy now calls _FieldSet.deepCopy, doesn't use
    merging

(I also removed generating clone() overrides, but I will revert it for now
and maybe do it in a separate PR)

Benchmarks:

  • Before: protobuf_deep_copy(RunTimeRaw): 635.2863492063492 us.
  • After: protobuf_deep_copy(RunTimeRaw): 307.04386503067485 us.
  • Diff: -328.23 us, -51.66%

cl/470187112


final copy = _FieldSet._(
message,
null, // TODO: event plugin
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave a LDAP or an issue number in todos.

Suggested change
null, // TODO: event plugin
null, // TODO(omersa): event plugin

values,
null,
_unknownFields?.deepCopy(freeze: freeze),
freeze, // TODO: Can we use `_frozenState` here?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here


PbList<E> deepCopy({bool freeze = false}) {
// TODO: We know the capacity, is it possible to allocate a list with the right size?
final wrappedList = <E>[];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could possibly use List.generate to fill in the right values from the start, thereby avoiding the re-allocation.

@osa1
Copy link
Member Author

osa1 commented Sep 6, 2022

This causes a lot of breakage internally. I think one of the reasons is probably that I'm assuming PbList for repeated fields and PbMap for map fields, but map and list allocation is done via GeneratedMessage.createRepeatedField (returns List<T>) and GeneratedMessage.createMapField (returns Map<K, V>). I should check the field type (instead of field value type), and use Map and List API similar to how we merge map and list fields.

Since we can't have nested lists and maps, cloning of nested maps and lists will have to go through GeneratedMessage.deepCopy, which will again look at field types and use List/Map API to clone maps and lists. So we won't need deepCopy in PbMap and PbList types.

Interestingly, we assume PbList and PbMap in _FieldSet._markReadOnly:

void _markReadOnly() {
  if (_isReadOnly) return;
  _frozenState = true;
  for (var field in _meta.sortedByTag) {
    if (field.isRepeated) {
      PbList? list = _values[field.index!];
      if (list == null) continue;
      list.freeze();
    } else if (field.isMapField) {
      PbMap? map = _values[field.index!];
      if (map == null) continue;
      map.freeze();
    } else if (field.isGroupOrMessage) {
      final entry = _values[field.index!];
      if (entry != null) {
        (entry as GeneratedMessage).freeze();
      }
    }
  }

  _extensions?._markReadOnly();
  _unknownFields?._markReadOnly();
}

If I'm right that the breakage here caused by users overriding list and map field types, then I'm not sure how _markReadOnly works. Maybe users that override map and list types never freeze their messages?

We should just rewrite this library..

@osa1
Copy link
Member Author

osa1 commented Sep 8, 2022

I managed to fix at least some of the downstream breakage by removing the assumptions in the implementation about repeated fields being PbList and map fields being PbMap.

The fact that we csan't assume this means freezing messages is buggy, for the reasons described in my previous comment above.

Performance is still much better than merge-based implementation. I will update the PR description, and refactor the code a little bit (document, revert some of the changes).

@osa1
Copy link
Member Author

osa1 commented Sep 9, 2022

I managed to fix at least some of the downstream breakage by removing the assumptions in the implementation about repeated fields being PbList and map fields being PbMap.

This version fixed some of the previous failures, but introduced new ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants