🤩 Let's play with pest3 0.0.0-prealpha0! 😍
#1016
Typed tree API is amazing!
regarding pest-parser/pest3#4 (comment) @TheVeryDarkness not sure if a feature or a bug. One difference I see is that in the saved CSV file, I have a newline ( Intuitively, one may expect
This is an astonishing effort! I will only raise a few issues off the top of my head:
The early alpha prototype of pest3, the next major revision of the pest parser generator library, was published on crates.io! Read more for details on how you can experiment with this prototype.
What is pest3?
Many years ago, the "original vision" for pest3 was two-fold:
- to have an improved, simplified grammar. This new grammar should be easier to use, avoiding some pitfalls learned from pest2, as well as easier to optimize and analyze, thus allowing potentially faster code execution and more comprehensive error messages.
- to have an alternative API for the parser output that would better leverage Rust's type system (unlike the existing Pairs API), thus reducing boilerplate and avoiding unnecessary errors arising from processing the Pairs API output, without the need for a third-party crate such as `pest-ast`.
Over different issue threads and discussions, many other cool ideas for pest3 appeared. You can read their latest summary in this discussion: #885
This early alpha prototype of pest3 does not incorporate all those amazing ideas but more-or-less implements the scope of the "original vision" for pest3. In that sense, it's in a state where you can try it, experiment with it, hack on it, and give feedback on it.
Let's get into it then!
How to use this early alpha prototype?
The following steps assume you are already familiar with the current pest, so they will roughly follow this example to show the differences in pest3.
Setup
Start by initializing a new project using Cargo:
Add the `pest3` and `pest3_derive` crates to the dependencies section in Cargo.toml:
Note that, for now, the versions in this early alpha prototype need to be fixed with the `=` symbol (otherwise the version resolution fails).
Writing the parser
As before, you can create a file `csv.pest` with the following content:
You can see there are two differences from the current pest grammar syntax:
With regards to namespaces, the early prototype supports a syntax to import rules (`use some_module as something`) defined in other grammars, e.g. see this example in tests. This is likely subject to change, but feel free to experiment with it.
As before, the grammar file needs to be specified using annotations so that Rust can compile it:
You can see one notable difference here from the current pest API: the rule to use to start parsing is not passed in an enum as a runtime function parameter, but declared as a static type parameter.
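To make that difference concrete, here is a minimal std-only sketch. It does not use the pest or pest3 APIs: `Rule`, `StartRule`, `Field`, `parse_runtime` and `parse_static` are hypothetical names made up for illustration, and the field character set (digits, `.` and `-`) is an assumption borrowed from the CSV example.

```rust
// Hypothetical stand-ins only; this is neither pest's nor pest3's API.

/// Returns the longest non-empty prefix of `input` whose chars satisfy `keep`.
fn prefix(input: &str, keep: impl Fn(char) -> bool) -> Option<&str> {
    let end = input.find(|c: char| !keep(c)).unwrap_or(input.len());
    if end == 0 { None } else { Some(&input[..end]) }
}

/// Runtime style: the start rule is an enum value passed as a parameter.
#[derive(Clone, Copy, Debug)]
enum Rule {
    Field,
    Digits,
}

fn parse_runtime(rule: Rule, input: &str) -> Option<&str> {
    match rule {
        Rule::Field => prefix(input, |c| c.is_ascii_digit() || c == '.' || c == '-'),
        Rule::Digits => prefix(input, |c| c.is_ascii_digit()),
    }
}

/// Static style: the start rule is a type, chosen at compile time.
trait StartRule<'i>: Sized {
    fn try_parse(input: &'i str) -> Option<Self>;
}

#[derive(Debug, PartialEq)]
struct Field<'i>(&'i str);

impl<'i> StartRule<'i> for Field<'i> {
    fn try_parse(input: &'i str) -> Option<Self> {
        prefix(input, |c| c.is_ascii_digit() || c == '.' || c == '-').map(Field)
    }
}

fn parse_static<'i, R: StartRule<'i>>(input: &'i str) -> Option<R> {
    R::try_parse(input)
}

fn main() {
    // The rule is a runtime value; the result is an untyped slice.
    println!("{:?}", parse_runtime(Rule::Field, "-273.15"));
    println!("{:?}", parse_runtime(Rule::Digits, "42abc"));

    // The rule is a type parameter; the result type names the rule.
    if let Some(Field(text)) = parse_static::<Field<'_>>("-273.15") {
        println!("typed field: {text}");
    }
}
```

In the static form, the result type already says which rule matched, and asking for a rule that does not exist fails at compile time instead of in a runtime branch.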
We can then test it out:
You can see the successful parse output is a lot different from the current pest. The early prototype includes accessor APIs for that raw-typed parse tree output as well as a wrapper API that resembles the existing Pairs API. Those are, however, not improvements over the existing pest, so we will not show them here (you can see them e.g. in these tests). We will focus on the typed accessor API, but it is best illustrated if we continue with that CSV example.
First, let's complete the grammar:
One main thing to notice here is that pest3 has three explicit sequencing operators, and for the ones that involve trivia, that trivia can be overridden (to handle whitespace and comments):
- `~` for optional (0 or more) trivia
- `-` for no trivia
- `^` for mandatory (1 or more) trivia
We could, for instance, rewrite that grammar as follows:
Note that the operators can also appear in postfix positions, so ideally, it should perhaps be possible to rewrite that part as `(record)^*`, but a bug (or a feature?) in this early prototype appears to prevent it from working as intended in this example.
In any case, this is one major difference from the current pest, where trivia is defined using two special built-in rules and handled implicitly through rule "atomicity".
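As a rough illustration of what the three operators mean, here is a small std-only model. It is not how pest3 implements them: `Trivia` and `seq` are hypothetical names, the operands are plain string literals, and trivia is assumed to be spaces and tabs only (in pest3 it can be overridden).

```rust
/// The three sequencing modes described above (hypothetical model).
#[derive(Clone, Copy)]
enum Trivia {
    Optional,  // `~`: 0 or more trivia between the operands
    Never,     // `-`: no trivia allowed between the operands
    Mandatory, // `^`: 1 or more trivia between the operands
}

/// Matches `first`, then trivia according to `mode`, then `second`.
/// Returns the remaining input on success.
fn seq<'i>(input: &'i str, first: &str, mode: Trivia, second: &str) -> Option<&'i str> {
    let rest = input.strip_prefix(first)?;
    // Assumption: trivia is just spaces and tabs in this sketch.
    let after = rest.trim_start_matches(|c: char| c == ' ' || c == '\t');
    let skipped = rest.len() - after.len();
    let rest = match mode {
        Trivia::Optional => after,
        Trivia::Never => rest,
        Trivia::Mandatory if skipped > 0 => after,
        Trivia::Mandatory => return None,
    };
    rest.strip_prefix(second)
}

fn main() {
    for input in ["a,b", "a ,b"] {
        println!("{input:?} with ~ : {:?}", seq(input, "a", Trivia::Optional, ","));
        println!("{input:?} with - : {:?}", seq(input, "a", Trivia::Never, ","));
        println!("{input:?} with ^ : {:?}", seq(input, "a", Trivia::Mandatory, ","));
    }
}
```

The point of the model is that the choice of trivia handling is visible at each sequencing site, rather than being a property of the enclosing rule.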
Let's continue with our example. With that complete grammar, we can change the Rust program as follows:
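A hand-written, std-only sketch of the shape such a program works with is given here. It is only a stand-in and not pest3's generated code or API: the `rule` module, `try_parse`, `record()`, the `text` member and `sum_fields` are hypothetical, and the field character set (digits, `.` and `-`) is assumed from the CSV example. Only the names `file`, `record`, `field`, the `field()` getter, `head_field` and `tail_fields` follow the description in this post.

```rust
// Hand-written stand-in for a typed parse tree; NOT generated by pest3.
#[allow(non_camel_case_types)]
mod rule {
    /// One numeric cell, e.g. `-273.15`.
    pub struct field {
        pub text: String,
    }

    /// One line: a first field followed by zero or more comma-separated fields.
    pub struct record {
        head: field,
        tail: Vec<field>,
    }

    impl record {
        /// One head plus a list, mirroring "one field, then zero or more fields".
        pub fn field(&self) -> (&field, Vec<&field>) {
            (&self.head, self.tail.iter().collect())
        }
    }

    /// The whole input: zero or more records.
    pub struct file {
        records: Vec<record>,
    }

    impl file {
        pub fn record(&self) -> Vec<&record> {
            self.records.iter().collect()
        }

        /// Minimal hand parser standing in for generated code.
        pub fn try_parse(input: &str) -> Option<file> {
            let mut records = Vec::new();
            for line in input.lines() {
                let mut cells = line.split(',').map(parse_field);
                let head = cells.next()??;
                let tail = cells.collect::<Option<Vec<_>>>()?;
                records.push(record { head, tail });
            }
            Some(file { records })
        }
    }

    fn parse_field(cell: &str) -> Option<field> {
        let ok = !cell.is_empty()
            && cell.chars().all(|c| c.is_ascii_digit() || c == '.' || c == '-');
        ok.then(|| field { text: cell.to_string() })
    }
}

/// Sums all fields and counts records; `None` on any parse failure.
fn sum_fields(input: &str) -> Option<(f64, usize)> {
    let file = rule::file::try_parse(input)?;
    let mut sum = 0.0;
    let mut records = 0;
    for record in file.record() {
        records += 1;
        // No `match` on a rule enum: the getter's type fixes the shape.
        let (head_field, tail_fields) = record.field();
        sum += head_field.text.parse::<f64>().ok()?;
        for field in tail_fields {
            sum += field.text.parse::<f64>().ok()?;
        }
    }
    Some((sum, records))
}

fn main() {
    println!("{:?}", sum_fields("1,2,3\n4,5\n"));
}
```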
As you can see, the parse tree output directly corresponds to the names of the rules we defined (`file`, `field`, and `record`). Besides that, the iterator types are statically defined, so we don't need a `match` statement on the rule type at runtime. Finally, the outputs also correspond to the postfix operator defined in the rules: the `record` rule has one `field` followed by zero or more `field`s separated by a comma, so the `field()` getter on the `record` returns a tuple `(&rule::field, Vec<&rule::field>)` (whose elements are assigned to `head_field` and `tail_fields` in the example main loop).
Done
In summary, you should be able to try out the following features in the early pest3 prototype:
Enjoy!
(And of course, if you have any questions or comments, feel free to write them in this discussion below.
Naturally, if you want to dip your toe in its development, you are more than welcome to open issues or pull requests directly on its repo. As an early prototype, the codebase is not well-documented, so it's primarily for more adventurous minds at this stage.)
What are the next steps?
Besides the general improvements and bug fixes in the prototype, other major work is pending for pest3. A lot of it may be obvious, but it is worth summarizing it in one place.
Most likely nearer-term work 🧪
The voluntary work naturally occurs organically ad hoc whenever someone has time and mood for it (so please do help if you feel like it!), and it is hard to tell exactly what the next steps are.
From my perspective and given the "experimentation" period of pest3, I foresee two main areas of work:
Major refactoring 🏭
While not critical (nor perhaps that exciting), two changes could make things nicer and more flexible:
`pest` and `pest_derive`). With some internal code reshuffling, it should hopefully be possible to have just one crate as a dependency, with the `pest_derive` functionality as its feature, similar to how it is done in e.g. `serde`.
Open for more experimentation ⚗️
This thread contains a lot of cool ideas: #885. Naturally, it is most likely infeasible to explore all of them deeply, but a few that appeared in the original pest3 discussions, such as "meta-rules" (i.e. grammar rules with parameters), are worth exploring. Having said that, if any of those ideas seem interesting to you, don't hesitate to open an issue on https://github.com/pest-parser/pest3/issues/new to discuss them and how they can be added to the pest3 prototype.
Main missing bits and pieces of work 🚧
Once the prototype stabilizes to some extent (it is hard to define an explicit cut-off for that, but I assume it's when no major work is pending), here's a checklist of items needed before pest3 can see some real-world usage:
Semi-formal semantics
With the new, simpler grammar, it should be easier to define what the parser should do in its operations. This is mainly to have a compact summary (for documentation and tests) of what came out of the feature experimentation. For example, regarding namespaces and sequencing operators, we should clearly define their scopes (I assume operators should only be applied within the module they are defined in, but maybe we'll find something else that makes sense after more experimentation). In any case, having the pest3 semantics compactly written down somewhere is most desirable.
WASM / pest_vm
pest's online web editor is an amazing way for people to get started and play with their grammar code. We'll need to figure out how to preserve it and address pest3's new features in that setting: e.g. the current prototype namespace system is too tied to the filesystem and may not port well to the `pest_vm` setting.
Bootstrapping
The pest3 prototype is using the current pest to process its meta-grammar. At some point, the big test will be to rewrite the pest3 meta-grammar in pest3 and generate its parser code.
Handling of pest2 grammars
Given the large ecosystem of existing pest grammars, the existing pest meta-grammar will likely need to be rewritten in pest3 for processing. Given the differences and other potential future breaking changes, it is hard to tell whether those grammars (or a subset of them) could easily map to pest3's parser generator process. In any case, having the previous version's meta-grammar parser would help implement tools that can help in the transition.
Tools and docs & incorporation into the main codebase
Besides updating the book and IDE tools to support pest3, some transition-related docs or tools may be needed (e.g. automatic grammar conversion if that's possible). At that point, pest3 will grow out of being a separate prototype, and we will see how to best incorporate it into the existing `pest` codebase (I assume they will live side-by-side and we'll have several "pest 3.0.0-something" releases to try it out).
Big thanks and a call for more help 🙌
Massive thanks go to @TheVeryDarkness for integrating the amazing `pest_typed` work into the pest3 prototype. And a huge thank you to everyone who commented on the pest3 restoration effort discussion and to @dragostis for his input on the original pest3 vision.
As mentioned a few times here, anyone's help in pest3 development would be much appreciated, so feel free to head to https://github.com/pest-parser/pest3/ and happy hacking! (But again, be aware that the prototype codebase is not in the best shape at the moment, so it's primarily for more adventurous minds at this stage!)