Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLParser: Introduce LIMIT clause #544

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

tomschw
Copy link
Contributor

@tomschw tomschw commented Jun 13, 2023

New feature:

  • Allow the use of LIMIT clause in SQL statements to only return the first n rows of the result frame.

Changes:

  • Adjust the ANTLR parser to detect the LIMIT clause
  • Apply the value of the LIMIT clause to the final sql result using the SliceRowOp in the SQL Visitor
  • Introduce test case for LIMIT clause

@pdamme pdamme self-requested a review June 17, 2023 13:45
Copy link
Collaborator

@pdamme pdamme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tomschw, thanks for this contribution. It's great to see that the features of the SQL parser are being extended!

Question: Was is a conscious choice to allow only integer literals in the LIMIT clause, i.e., is it intended that expressions are not allowed (like LIMIT 1+1)? I think it would be more flexible by allowing expressions.

Please fix the following points before we can merge it:

  1. Providing a negative value for the limit currently results in unexpected behavior (on my system it prints rubbish data before it segfaults). Please address this problem. In case we support expressions for the limit, please keep in mind that the value may not be known at SQL parse-time. It may become known after constant propagation. But it may not even be known until runtime.
  2. Please don't reimplement how an integer literal is parsed. Currently, you use stoi(), but it would be better to call the SQL parser code that parses integer literals to guarantee exactly the same behavior. (For example, think of different formats we may support at some point, like decimal, hexadecimal, etc.)

JFYI: Given an instance of FrameType, you can derive another instance with the same column types, but all other properties reset to unknown, by FrameType::withSameColumnTypes(). I simplified the code accordingly.

@pdamme pdamme added this to the v0.3 milestone Jun 21, 2023
in the Kernel (introduce lower bounds, upper bounds and sanity check).
the function from the SQL parser.
Also allows the use of not just INTEGER_LITERALS.
Add corresponding test.
@tomschw
Copy link
Contributor Author

tomschw commented Jun 27, 2023

Hi @pdamme, as discussed on Friday the following changes should mitigate your main issues:

  1. As the negative value is cast to a size_t, it will lead to a huge positive number, which might explain your behaviour. I did not fix this by including any checks in the Visitor (due to your mentioned problems) but instead fixed it in the called Kernel, sliceRowOp. Due to the nature of size_t I did not include a check for a lower limit, instead just checking if the lower limit is smaller than the higher limit and that the number of rows in either the Matrix or Frame is higher than the higher limit. This might be also be helpful for other use cases of this Kernel.
  2. The existing parsing is now used by simply handling the value as a literal.

Regarding the decision of not allowing expressions: We did have a quick talk yesterday and would rather only allow literals at the moment. If there is a use case in the future which would necessitate the handling of expressions, we would of course revisit our decision.

@corepointer corepointer modified the milestones: v0.3, v0.4 Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants