Preliminary work to parse a "pythonic" language and translate it to C #176
Replies: 6 comments 2 replies
Some comments:
struct expression : lexy::expression_production {
    static constexpr auto whitespace = dsl::ascii::space;

    struct nested_expr : lexy::transparent_production {
        static constexpr auto rule = dsl::recurse<struct expression>;
    };

    struct paren_expr {
        // corresponds to "paren_or_tuple" that was suggested
        static constexpr auto rule =
            dsl::parenthesized.list(dsl::p<nested_expr>, dsl::sep(dsl::comma));
    };

    struct expected_operand { static constexpr auto name = "expected operand"; };

    // We need to specify the atomic part of an expression.
    static constexpr auto atom = [] {
        // shouldn't use dsl::p<expression> instead dsl::p<nested_expr>
        auto var_or_call = dsl::p<identifier> >> dsl::if_(dsl::p<paren_expr>);
        return
            dsl::p<paren_expr>
            | var_or_call
            | dsl::p<string_literal>
            | dsl::p<number>
            | dsl::error<expected_operand>;
    }();
Or something else?
So apparently I don't even need dsl::recurse... well okay!
> But you're using lexy::parse_as_tree which does not produce any values of rules.
Oh, now I understand: I could let lexy extract typed values directly instead... I guess I would have to change my parser then.
On Tue, Sep 26, 2023 at 7:50 PM Jonathan Müller ***@***.***> wrote:
> > Or something else?
> You can remove expression::whitespace and expression::nested_expr entirely. It's only in the example because it changes the whitespace rules when there is a pending parenthesis.
> > so to answer, yes, I care about the value.
> But you're using lexy::parse_as_tree which does not produce any values of rules.
> > anything with lexy is not fast to build
> It is designed to be isolated to single translation units for that reason ;)
is_last_node() is quite useful for generating JSON, but the JSON becomes invalid once "noise" (whitespace and the like) is removed: if a noise token is the last leaf, a faulty trailing comma is inserted, unless I check the previous leaf and remove it as well, which doesn't seem easy. That leads me to think I should use lexy to generate C code directly, not just to emit a parse tree. I wanted to use Python since it feels more comfortable for a proof of concept. Looking at the parse tree object, it doesn't seem that I can iterate over it and remove the nodes/leaves I don't want, so that is_last_node() gives the "actual" answer afterwards. I don't know whether I could rewrite the parse tree while filtering out what I don't want.
I fixed it by just iterating through the whole tree and filtering things I did not want. Current todo list:
I will update it there:
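The filtering approach described above can be sketched in standard C++ roughly as follows. This is not the actual code from the project: `Node`, `strip_noise` and `to_json` are hypothetical names, and `Node` is a hand-rolled stand-in, not lexy's parse tree type. The idea is to drop noise nodes first and then write the comma before every child except the first, so no "is this the last node" check is needed. It assumes token text contains nothing that needs JSON escaping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parse tree node (NOT lexy's parse_tree API).
struct Node {
    std::string kind;            // production or token name
    std::string text;            // token text, empty for inner nodes
    std::vector<Node> children;  // empty for leaves
    bool noise = false;          // whitespace, punctuation, ...
};

// Return a copy of the tree with every noise node removed.
Node strip_noise(const Node& n) {
    Node out{n.kind, n.text, {}, false};
    for (const Node& c : n.children)
        if (!c.noise) out.children.push_back(strip_noise(c));
    return out;
}

// Emit JSON. Assumption: kind/text contain no quotes or backslashes,
// so no escaping is done here.
std::string to_json(const Node& n) {
    std::string s = "{\"kind\":\"" + n.kind + "\"";
    if (!n.text.empty()) s += ",\"text\":\"" + n.text + "\"";
    if (!n.children.empty()) {
        s += ",\"children\":[";
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            if (i > 0) s += ",";  // separator goes before, never after
            s += to_json(n.children[i]);
        }
        s += "]";
    }
    return s + "}";
}
```

Calling `to_json(strip_noise(root))` on a tree whose last leaf is whitespace then yields a children array with no trailing comma, because the comma placement depends only on the filtered list.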
I wrote a longer article: https://jokoon.github.io/How_to_parse_a_C_like_language_with_lexy.html It covers parsing things like tuples, floating-point numbers and integers, and indent/dedent tokens, and links to a parse tree walker.
My goal is a language that gets translated to C, to benefit from existing C compilers and libraries.
That would be some sort of better C, with Python-style indentation, strings, lists, dicts, and tuples-as-structs. I'm new to parsing, so this is a training project.
I used lexy to write a first draft of a parser. It might have flaws, but I'm thankful to foonathan for the help :)
The first step was to tokenize Python-style indents, which I did with a Python script:
Of course this code sample makes no sense, but I'm planning to rely on C errors.
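The original script and sample are not reproduced here. As a rough illustration of the indent tokenizing idea only, here is a minimal sketch in C++ (not the Python the step was actually written in). The function name `mark_indents` and the marker spellings `INDENT`/`DEDENT` are made up for this example, and it assumes indentation uses spaces only.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Rewrite leading indentation as explicit INDENT / DEDENT marker lines.
// Assumptions: spaces only (no tabs), blank lines are skipped.
std::vector<std::string> mark_indents(const std::string& source) {
    std::vector<std::string> out;
    std::vector<std::size_t> levels{0};  // stack of open indentation widths
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t width = line.find_first_not_of(' ');
        if (width == std::string::npos) continue;  // empty or all-space line
        if (width > levels.back()) {               // deeper: open one block
            levels.push_back(width);
            out.push_back("INDENT");
        }
        while (width < levels.back()) {            // shallower: close blocks
            levels.pop_back();
            out.push_back("DEDENT");
        }
        if (width != levels.back())
            throw std::runtime_error("inconsistent dedent");
        out.push_back(line.substr(width));         // line without indentation
    }
    // Close every block still open at end of input.
    for (; levels.size() > 1; levels.pop_back()) out.push_back("DEDENT");
    return out;
}
```

The stack of widths is what lets a single shallower line close several nested blocks at once, which a parser can then treat like ordinary opening and closing brace tokens.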
The parser is about 300 lines of lexy rules:
Here is the pretty tree output. It's possible to remove the whitespace noise with some CTRL+F:
And the JSON (I haven't tested it, but it looks like valid JSON):
My current goal is to iterate over the parse tree and generate C. I have literally no idea how difficult that will be, but I'm still excited to try!
Don't hesitate to let me know what you think!