Support embedded expressions/braces in double quoted strings #26
I noticed that embedded braces do not support scoped identifiers. For example the following test case will fail:
Not sure if this should be addressed in this PR or in a separate one, since this PR already has quite a few changes.
    choice.rep($._string_character, $._escape_sequence, $.embedded_expression, $.embedded_brace_expression),
    '"',
  ),

_string_character: $ => choice(token(/([^"\\])/), token(prec(1, choice('#', '//', '/*')))),
Since we last talked, did you try putting the repeated $._string_character in its own rule, so we can have a single $.string_body rule that encompasses a full run of consecutive string characters?
string_body: $ => rep1($._string_character)
or
string_body: $ => rep1(choice(token(/([^"\\])/), token(prec(1, choice('#', '//', '/*')))))
Also, this is a nit, but I don't think you need token around /([^"\\])/. That rule also doesn't need the regex group () anymore, since there's only one option.
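As a quick sanity check on the nit above, here is a minimal sketch in plain JavaScript (outside tree-sitter) comparing the grouped and ungrouped forms of the character class. The helper name sameAcceptance and the sample characters are made up for illustration; this only shows the two regexes accept the same single characters, and says nothing about how tree-sitter compiles them.

```javascript
// Grouped form as written in the diff, and the ungrouped form suggested in review.
const grouped = /([^"\\])/;
const ungrouped = /[^"\\]/;

// Hypothetical helper: true if both patterns agree on every sample character.
function sameAcceptance(samples) {
  return samples.every(ch => grouped.test(ch) === ungrouped.test(ch));
}

// A few characters relevant to string parsing, including the two excluded ones.
const samples = ['a', '$', '{', '#', '/', ' ', '"', '\\'];
console.log(sameAcceptance(samples));
```

The capture group only changes what a regex engine records, not which characters match, so dropping it should be safe here.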
When I try that it creates a conflict. Any suggestions for resolving it?
Unresolved conflict for symbol sequence:
'"' string_body_repeat1 • '_string_character_token1' …
Possible interpretations:
1: '"' (string_body string_body_repeat1) • '_string_character_token1' …
2: '"' (string_body_repeat1 string_body_repeat1 • string_body_repeat1)
Possible resolutions:
1: Specify a left or right associativity in `string_body`
2: Add a conflict for these rules: `string_body`
I suspect that adding this rule as is might lead to some problems. Tree-sitter prioritizes matches based on length. Say, for example, we have:
"sometext$var"
I think the $ will be captured as part of the string body instead of starting an embedded expression, so the whole thing will be parsed as a string body. We might need to make some modifications to accommodate this.
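To illustrate the concern, here is a rough analogy using plain JavaScript regexes rather than tree-sitter itself. Both patterns and the bodyPrefix helper are hypothetical stand-ins: greedyBody mimics a body built from /[^"\\]/ repeated, and guardedBody additionally stops at $ and {. Tree-sitter's real lexing and conflict resolution may behave differently.

```javascript
// Stand-in for a string body built by repeating /[^"\\]/ (assumption, not the real rule).
const greedyBody = /^[^"\\]+/;
// Stand-in for a body that also refuses `$` and `{` so embedded expressions can start.
const guardedBody = /^[^"\\${]+/;

// Hypothetical helper: the prefix of `contents` that the pattern would consume.
function bodyPrefix(pattern, contents) {
  const m = contents.match(pattern);
  return m ? m[0] : "";
}

const contents = "sometext$var"; // the text between the double quotes
console.log(bodyPrefix(greedyBody, contents));  // consumes the `$var` part too
console.log(bodyPrefix(guardedBody, contents)); // stops right before `$`
```

If the grammar behaves like the first pattern, nothing is left over for $.embedded_expression to match.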
I think you'd want right associativity for $.string_body, since that'll capture the largest node. Left associativity would close the $.string_body on the first match, so it'd end up capturing individual characters, just like it does without the repeat1.
If we run into problems with $ and {, we could do something like we do for $.xhp_comment, where we exclude problematic token characters and only allow them in specific sequences. Something like,
string_body: $ =>
  repeat1(
    choice(
      seq(opt('\\'), choice(/[^"\\${]/, token(prec(1, choice('#', '//', '/*'))))),
      seq(
        // $ is allowed only if not followed by an identifier character
        repeat1('$'),
        opt('\\'),
        choice(/[^"\\${a-zA-Z_\x80-\xff]/, token(prec(1, choice('#', '//', '/*')))),
      ),
      seq(
        // { is allowed only if not followed by $
        repeat1('{'),
        opt('\\'),
        choice(/[^"\\${]/, token(prec(1, choice('#', '//', '/*')))),
      ),
    ),
  ),
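To get a feel for what the rule above would and wouldn't accept, here is a rough approximation of its three alternatives as a single plain JavaScript regex. This is only a sketch: stringBodyPrefix and the sample inputs are hypothetical, the '#', '//', '/*' tokens are folded into the character classes (which already admit # and /), and a backtracking regex is not the same thing as tree-sitter's lexer.

```javascript
// Each alternative mirrors one seq(...) from the proposed string_body rule.
const plainChar = String.raw`\\?[^"\\\${]`;                     // optional backslash, then an ordinary character
const dollarRun = String.raw`\$+\\?[^"\\\${a-zA-Z_\x80-\xff]`;  // `$` run not followed by an identifier start
const braceRun = String.raw`\{+\\?[^"\\\${]`;                   // `{` run not followed by `$`

const stringBody = new RegExp(`^(?:${plainChar}|${dollarRun}|${braceRun})+`);

// Hypothetical helper: the prefix of `contents` that the approximated rule would consume.
function stringBodyPrefix(contents) {
  const m = contents.match(stringBody);
  return m ? m[0] : "";
}

for (const sample of ["sometext$var", "cost: $5", "a{b", "a{$b}"]) {
  console.log(JSON.stringify(sample), "->", JSON.stringify(stringBodyPrefix(sample)));
}
```

Under these assumptions, a $ followed by a digit or a { followed by a letter stays in the body, while $var and {$b} are left for the embedded-expression rules to pick up.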
I checked out your branch and tried adding right associativity to the repeated $._string_character, and it doesn't seem to conflict with $.variable (honestly surprised it doesn't cause issues, at least not immediate ones).
This is good progress, but there are some extended test cases that seem to break it. Also, we have some inconsistency with the way we nest expression items. Consider the following test case/output:
In the case of the double quoted string, we have the variable in the level between the two selection expressions. This is incorrect, as the selection of the variable isn't against the value of
In case it could be helpful, I've implemented string parsing for PHP in the tree-sitter-php repository. Please use whatever is useful to you: tree-sitter/tree-sitter-php#72
Started looking into this and you're right that the inconsistency comes from
Reusing existing call/subscript/selection definitions
Lines 154 to 164 in 8ac0c52
Replacing Scanner hack
Fixing custom call/subscript/selecting definitions
Summary
This PR adds support for embedded expressions and embedded braces in double quoted strings. Note that this PR addresses a similar issue to PR #25. Notably, this PR also adds support for embedded expressions, and the implementation is done entirely in grammar.json (not scanner.cc).
Here are some examples of the constructs that are now supported:
I also added support for escape character sequences so the following examples should parse correctly:
Initially there were some issues with the parser incorrectly interpreting instances of #, //, and /* in the string as a comment, but this should not be a problem anymore!
Requirements (place an x in each [ ])