Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unknown Fields #864

Open
cranst0n opened this issue Aug 3, 2023 · 12 comments
Open

Unknown Fields #864

cranst0n opened this issue Aug 3, 2023 · 12 comments

Comments

@cranst0n
Copy link

cranst0n commented Aug 3, 2023

I've run into a situation where I'm trying to decode a buffer created in Python that includes unknown fields.

The Python data looks suspicious because it seems to have duplicate tag numbers. Here's the protoscope output from two examples:

I can parse this one:

1: 100
2: {
  1: {"identifier"}
  2: {
    14:SGROUP
    13: 101
    13:EGROUP
    13:EGROUP
    14:SGROUP
  }
  3: {"rpo"}
  4: {
    "https://www.example.com"
    "1234567890"
  }
  5: {"checksum"}
  6: 1
  7: 241532
}

This one fails:

1: 100
2: {
  1: {"identifier"}
  2: {
    14:SGROUP
    13: 101
    13:EGROUP
    13:EGROUP
    14:SGROUP
  }
  3: {"rpo"}
  4: {
    "https://www.example.com"
    "1234567890"
  }
  5: {"checksum"}
  6: 1
  7: 257372
}
1: {"enum_identifier"}     # <-- This is an UnknownField on the Python side

In the second example, you'll notice the duplicate tag number of 1. Based on the .proto definition, tag 1 should be an enum, which 100 properly resolved to. The second 1 however is a mystery to me. It appears to be an object with a string that matches an enum value, but it should be an integer. The dart parser goes off the rails when it encounters this message.

If I call DiscardUnknownFields on the Python side I can read that serialized message, as the duplicate tag is removed.

Ultimately, I'm unsure if this is an issue with how Dart handles this unknown field or how Python serializes it's unknown field. Unfortunately, I don't have access to the Python side which generates the data so I'm in a tough spot.

@cranst0n
Copy link
Author

cranst0n commented Aug 4, 2023

Following up here I can avoid this issue by changing some code in coded_beffer.dart to check if the _FieldSet already has the given tag number set to a value. Not saying this is a proper way to handle this, just that it solves my issue.

-    if (fi == null || !_wireTypeMatches(fi.type, wireType)) {
+    if (fi == null ||
+        !_wireTypeMatches(fi.type, wireType) ||
+        fs._hasField(tagNumber)) {
      if (!fs._ensureUnknownFields().mergeFieldFromBuffer(tag, input)) {
        return;
      }
      continue;
    }

Would greatly appreciate any feedback.

@cranst0n cranst0n closed this as completed Aug 4, 2023
@cranst0n cranst0n reopened this Aug 4, 2023
@cranst0n
Copy link
Author

cranst0n commented Aug 4, 2023

A more specific approach that only changes optional enum field handling.

      case PbFieldType._OPTIONAL_ENUM:
+        if (fs._hasField(tagNumber)) {
+          fs._ensureUnknownFields().mergeFieldFromBuffer(tag, input);
+        } else {
          final rawValue = input.readEnum();
          final value = meta._decodeEnum(tagNumber, registry, rawValue);
          if (value == null) {
            final unknown = fs._ensureUnknownFields();
            unknown.mergeVarintField(tagNumber, Int64(rawValue));
          } else {
            fs._setFieldUnchecked(meta, fi, value);
          }
+        }
        break;

@osa1
Copy link
Member

osa1 commented Oct 10, 2023

Could you share the encoded message, version of the Dart protobuf you're using, and the error message?

I tried to come up with a repro myself but failed. Unknown fields are already assumed to be repeated so you can have a tag for an unknown field multiple times in a message it's handled fine. If you could share your repro that would make debugging easier.

@cranst0n
Copy link
Author

I'm using protobuf 3.1.0.

The error I'm seeing is:

try {
  ei.ConfigResponse.fromBuffer(bytes);
} catch (e) {
  print('error: $e'); // InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
}

Zip file with raw message and proto definition: message.zip

The field in question starts at byte 32737, if that's helpful at all. For the time being, I manually change the byte to give it a different tag value.

@osa1 osa1 added the bug label Oct 12, 2023
@osa1
Copy link
Member

osa1 commented Oct 12, 2023

Thanks. I can reproduce the bug with the zip file, but the input is huge so it's still not easy to debug.

What's the name of the message (in the proto definitions you shared in the zip file) you use in the text protos in your original message?

Your fixes are not correct, they break existing tests.

Since you seem to have a minimal reproducer already (as you shared in your original message) if you could share it with us with all the involved files this should take a few minutes to debug.

@osa1
Copy link
Member

osa1 commented Oct 12, 2023

The text protos you show in your original message seem to be malformed, they have end groups without matching starts:

  2: {
    14:SGROUP
    13: 101
    13:EGROUP <--------- here
    13:EGROUP <--------- and here
    14:SGROUP
  }

So perhaps this is not a bug in this library.

@cranst0n
Copy link
Author

Yes apologies for the size of the message, but it's the only example I have unfortunately. Not surprised my "fixes" break other things, was just hoping they may provide some context on what the underlying issue is.

The name of the message I use to decode the message is ConfigResponse from the provided proto definition. I referenced it in this comment.

@osa1
Copy link
Member

osa1 commented Oct 12, 2023

The name of the message I use to decode the message is ConfigResponse from the provided proto definition

By "the message" do you mean the text proto messages in your first message, or the message in the zip?

The text messages in your first message are not valid (as explained in my previous comment) so it's not surprising that they don't decode.

Are you able to decode your message with any protobuf implementation other than this library? (maybe in an implementation in another language, e.g. C++ or Java implementations in https://github.com/protocolbuffers/protobuf/)

@cranst0n
Copy link
Author

Sorry, re-reading my previous message and realizing it's super confusing.

By message, I mean the binary file I included in the zip file. The protobuf type is ConfigResponse from the included proto definition file.

The examples in my first comment were excerpts from the protoscope output generated from this message. I understand what you're saying though about how they don't appear to be valid. Which is a bit confusing to me since I am able to decode the message when I manually change the appropriate byte to change the tag number of the field in question.

Hopefully this is a little clearer.

The only other language I've tried is Python, which not surprisingly succeeds probably because it was used to generate the message. I can give it a go in scala or java but don't have time at the moment.

@osa1 osa1 removed the bug label Oct 12, 2023
@njovy
Copy link

njovy commented Nov 17, 2023

@osa1 I am using grpc java and protocol buffer can't decode generated message from java as well. At first I got DATA_LOSS which never happend. I tested a bunch and server did return a correct response as protocol buffer code was generated by official plugin.

@osa1
Copy link
Member

osa1 commented Nov 17, 2023

@njovy please share your proto files and the binary message that Dart protobuf library fails to decode.

@njovy
Copy link

njovy commented Nov 18, 2023

@osa1 I apologize for the inconvenience caused by the outdated messages in the serialized protocol buffer. This was a misunderstanding on my part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants