Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Buffer class #239

Open
jaapio opened this issue Mar 10, 2023 · 3 comments
Open

Improve Buffer class #239

jaapio opened this issue Mar 10, 2023 · 3 comments

Comments

@jaapio
Copy link
Member

jaapio commented Mar 10, 2023

Right now we have a number of locations using the buffer class to collect lines from the documents to parse them as blocks. These blocks are always indented. When consuming this buffer we need to remove the initial indentation of the block to be able to process the lines are normal lines.

We want the buffer to be responsible for this un-indenting part. To centralize the way we are doing this. The complexity here is that the indentation of a block may defer per situation. and can be something between 2 or PHP_MAX_INT spaces.

Locations where to do this improvement:

This method removes the indenting. Based on the first line of a block.
\phpDocumentor\Guides\RestructuredText\Parser\Productions\BlockQuoteRule::normalizeLines

Here I do something simular for lists
packages/guides-restructured-text/src/RestructuredText/Parser/Productions/ListRule.php:107
packages/guides-restructured-text/src/RestructuredText/Parser/Productions/EnumeratedListRule.php:109
Definition lists: packages/guides-restructured-text/src/RestructuredText/Parser/Productions/DefinitionListRule.php:99

See #225 for the original discussion.

@greg0ire
Copy link
Contributor

greg0ire commented Jul 3, 2023

So if I understand well, you would like getLines() to invert the current situation:

  • have getLines() return something unindented
  • when needing to preserve the indention, call another method (like getLinesWithOriginalIndent()

Would that work for you? Oh and also, I think there is a need for detecting the ident, right? So that the caller does not have to provide it?

@jaapio
Copy link
Member Author

jaapio commented Jul 3, 2023

The detection of indentation might be hard to wrap in a method as it is depending on the context where you are in the parser.

But what you see a lot in RST is that indentation marks the beginning of a block. So when the parser detects a line starting with a whitespace it will start collecting a block. Which ends when the next line is less Indented.

The content of each block is passed to a separate parser pipeline to make it easier to work with, we decided to remove the initial indentation of a block before passing it to the new pipeline. So for example a listitem will always start with a dash. If we would keep the indentation of a block. The list item detection should be aware of this indentation. Right now it doesn't have to know about it.

So yes.. remove indentation would be the default. Unless we want to do something special, then we would need the raw content.

@greg0ire
Copy link
Contributor

greg0ire commented Jul 3, 2023

Okay… I have started working on something, please let me know if you think it is the right direction: #442

I have managed to reduce code in BlockQuoteRule and DirectiveRule thanks to it, but for the list rule, it is not so easy, I'm not 100% sure I can leverage what I did in the buffer class for that particular piece of code. I think I'm a bit confused as to whether you're expecting me to leverage it for deindenting the list globally, or rather to deindent each item, or both.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants