Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: SLR parse table generation #27

Draft
wants to merge 3 commits into
base: develop/jsglr2
Choose a base branch
from

Conversation

mpsijm
Copy link
Contributor

@mpsijm mpsijm commented Jul 8, 2019

@udesou Just to get some feedback, and also to check whether this will not conflict with your upcoming refactorings.

I've implemented SLR parse table generation in sdf2table based on http://compilers.iecc.com/comparch/article/01-04-079, as was linked in the FIXME comment. Some remarks:

  • The First and Follow sets are calculated per symbol (rather than per production) and stored as a character class.
    • For SLR, this is precise enough. In order to know which symbols can follow another symbol, you need to check all productions for that symbol anyway.
    • For LR(1), there will probably be more benefit in calculating the First set per production. This should not be difficult when using the symbolProductionsMapping of the grammar.
  • The refactoring of character classes of Refactor CharacterClass: use ICharacterClass where possible #26 is not used yet, as experimenting with SLR parse table generation was my inspiration for this refactoring. (EDIT: done!)
  • In the end, we will probably want to allow the end-user to switch between different types of parse table generation.
  • I have no idea what contextual symbols are, but I could work around them (see 1c2518b#diff-bbb27858751954065a9ac33b038c11e3R146). Did I do this correctly?

Thanks in advance for your feedback! 🙂

@udesou
Copy link
Contributor

udesou commented Jul 9, 2019

Great stuff!
Don't worry about my refactorings, atm I'm just shifting interfaces around to remove the dependency between JSGLR (1 and 2) and sdf2table.
I think a good project (not necessarily for you) would be benchmarking all cases (LR(0), SLR and LR(1)) with respect to parse table generation time, but also parse time. The original sdf2table does SLR if I'm not wrong, but I remember reading some paper saying that the difference between all of those was minimal in terms of parse time, but they were considering GLR, not SGLR.
Finally, contextual symbols are like traditional non-terminals but with context tokens indicating priority and associativity conflicts. For data-dependent parsing, the contexts are used to solve conflicts at parse time, so they should be the same as the original non-terminal.
For disambiguation by grammar transformation, you might need to calculate the sets for the contextual symbols themselves, since they get new productions (I believe that in most cases, the first/follow sets for the contextual symbols are actually smaller than the original non-terminal).
Section 3.1 of http://delivery.acm.org/10.1145/3140000/3136020/sle17-sle17main12.pdf gives a short overview on contextual grammars, but if you wanna know more, chapter 2 of my thesis has a proper description of what those symbols are.

@mpsijm mpsijm force-pushed the slr-generation branch 3 times, most recently from 3f93359 to a1809ec Compare July 18, 2019 12:08
mpsijm added a commit to mpsijm/sdf that referenced this pull request Nov 4, 2019
**The Problem:** In `NormGrammarReader.processSymbol`, a cache is
maintained for symbols that are already processed. However, this cache
did not take into account that Symbols with and without a Label should
be treated as equal (e.g. `tail:Stm` and `Stm`, see
org.spoofax.jsglr2.integration/src/main/resources/grammars/layout-sensitive.sdf3).
These labels are only used to reference the symbols from the layout
constraints, but do not actually contribute to the meaning of a
production.

**Why is this a problem?** For SLR parse table generation (see metaborg#27), all
grammar symbols are assumed to be unique. However, because Symbols with
or without a Label were not treated as equal, the "graph" used to
calculate the First- and Follow-sets became disconnected. This in turn
caused some symbols to have empty First-/Follow-sets. The productions
belonging to these symbols would not get any reduce actions in the parse
table, because if the Follow-set is empty, no lookahead is allowed.

**Current solution:** When processing symbols in the `NormGrammarReader`,
any Label constructor is removed from the symbol term.

**Discussion:** I have tried looking around where these labels are used
(I am assuming layout constraints, but I couldn't find how they are
processed in NormGrammarReader).
Depending on their usage, it might be a good idea to apply this filter
in Stratego already, e.g. in the strategy `module-to-normal-form`.
@udesou any feedback is appreciated, as always 🙂
@mpsijm mpsijm force-pushed the slr-generation branch 2 times, most recently from 7d49367 to 582dcc1 Compare February 25, 2020 14:42
@mpsijm mpsijm force-pushed the slr-generation branch 2 times, most recently from f9083b1 to 084742e Compare November 24, 2020 12:28
mpsijm and others added 3 commits June 2, 2021 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants