From 875de4f5a5322281a6e8e720f7663abf83c854c5 Mon Sep 17 00:00:00 2001
From: Tim Bray
Date: Thu, 21 Sep 2023 09:28:01 -0700
Subject: [PATCH] character-repertoire issues

Signed-off-by: Tim Bray
---
 draft-ietf-jsonpath-base.md | 13 ++++++++++---
 lib                         |  2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/draft-ietf-jsonpath-base.md b/draft-ietf-jsonpath-base.md
index d6e3df48..01a362c0 100644
--- a/draft-ietf-jsonpath-base.md
+++ b/draft-ietf-jsonpath-base.md
@@ -175,7 +175,7 @@ companion to, JSON Pointer {{RFC6901}}. See {{json-pointer}}.
 
 The grammatical rules in this document are to be interpreted as ABNF, as
 described in {{-abnf}}.
-ABNF terminal values in this document define Unicode code points rather than
+ABNF terminal values in this document define Unicode scalar values rather than
 their UTF-8 encoding. For example, the Unicode PLACE OF INTEREST SIGN (U+2318)
 would be defined in ABNF as `%x2318`.
 
@@ -526,7 +526,7 @@ these nodes as a nodelist.
 A query MUST be encoded using UTF-8.
 The grammar for queries given in this document assumes that
 its UTF-8 form is first decoded into
-Unicode code points as described
+Unicode scalar values as described
 in {{RFC3629}}; implementation approaches that lead to an equivalent
 result are possible.
 
@@ -831,6 +831,12 @@ sequences of Unicode scalar values.
 In other words, normalization operations MUST NOT be applied to either the
 member name string `M` from the JSONPath or to the member name strings
 in the JSON prior to comparison.
+Note that the "\u" Unicode character escape in JSONPath differs from the
+escape of the same form in JSON. In JSON, Unicode scalar values greater than
+U+FFFF, when escaped, are given as two \u escapes, each carrying one UTF-16
+surrogate code point. In JSON, U+1F609 WINKING FACE would be represented
+as \uD83D\uDE09, but in JSONPath it would just be \u1F609.
+
 
 #### Examples
 
@@ -1808,7 +1814,8 @@ bracketed-selection = "[" S selector *(S "," S selector) S "]"
 member-name-shorthand = name-first *name-char
 name-first = ALPHA /
              "_" /
-             %x80-10FFFF ; any non-ASCII Unicode character
+             %x80-D7FF / ; skip surrogate code points
+             %xE000-10FFFF
 name-char = DIGIT / name-first
 
 DIGIT = %x30-39 ; 0-9
diff --git a/lib b/lib
index 146e2ac1..5f0e0491 160000
--- a/lib
+++ b/lib
@@ -1 +1 @@
-Subproject commit 146e2ac1396f7d1bbafbeb6f6a46b51ad92e2a63
+Subproject commit 5f0e04913591c0d8fae44a36b3f1ce2177985f93
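The patch rests on two character-level facts. The first is that JSON escapes a scalar value above U+FFFF as a UTF-16 surrogate pair, so U+1F609 becomes \uD83D\uDE09. The second is that `name-first` should admit every non-ASCII scalar value (from U+0080 up) while excluding the surrogate block U+D800-U+DFFF. The minimal Python sketch below checks both. The function names `surrogate_pair`, `json_escape`, and `is_name_first` are hypothetical and are not taken from the draft or any JSONPath implementation. The range check follows the ABNF comment "skip surrogate code points" rather than any normative code.

```python
import json


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a scalar value above U+FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} does not need a surrogate pair")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return high, low


def json_escape(cp: int) -> str:
    """Render one scalar value as JSON-style \\u escape(s), uppercase hex."""
    if cp > 0xFFFF:
        return "".join(f"\\u{unit:04X}" for unit in surrogate_pair(cp))
    return f"\\u{cp:04X}"


def is_name_first(ch: str) -> bool:
    """Hypothetical check mirroring the name-first rule (one character)."""
    cp = ord(ch)
    return (
        (ch.isascii() and ch.isalpha())  # ALPHA: A-Z / a-z
        or ch == "_"
        or 0x80 <= cp <= 0xD7FF          # non-ASCII, below the surrogates
        or 0xE000 <= cp <= 0x10FFFF      # above the surrogates
    )


if __name__ == "__main__":
    wink = 0x1F609  # U+1F609 WINKING FACE
    print(json_escape(wink))  # \uD83D\uDE09
    # Cross-check against the standard JSON encoder, which emits lowercase hex.
    assert json.dumps(chr(wink)) == '"' + json_escape(wink).lower() + '"'
    print(is_name_first("~"), is_name_first("\u00e9"))  # False True
```

The last line shows why the lower bound matters. "~" (U+007E) is ASCII punctuation, so it falls outside a range that starts at U+0080, while "é" (U+00E9) falls inside it.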