ACM-12279 | feat: Add a sql parser feature #55

Merged
merged 1 commit into from
Jun 20, 2024

Conversation

@ziccardi ziccardi commented Jun 14, 2024

This PR adds a few utilities to the ocm-common package:

  • SQLParser: an object that parses a SQL string and validates that its syntax conforms to a given grammar. Once the string is validated, it extracts all the hardcoded values, replaces them with ? placeholders, and returns an array of the extracted values.
    Currently the provided grammar can parse only WHERE clauses, but it can be easily extended.
    See pkg/utils/parser/sql_parser/README.md for details.
    For examples of supported strings and validations, look at the tests.
    This object provides a SQLGrammar and a SQLScanner and depends on the StringParser.
  • StringParser: with the help of a Scanner and a Grammar, parses a string and ensures it conforms to the provided grammar.
    This object depends on the Scanner and the StateMachine objects (see pkg/utils/parser/string_parser/README.md for details).
  • Scanner: an interface for a scanner object. A Scanner must be implemented to scan the string as required. The package provides a StringScanner as an example that treats each character as a token (see pkg/utils/parser/string_scanner/README.md for details).
  • StateMachine: a generic state-machine object that can be used whenever a state machine is needed. To build the state machine, describe each state with an acceptor function that defines which values are acceptable for that state, then define all the transitions for each state. The state machine can then decide autonomously what the next state is, based on the received input (see pkg/utils/parser/state_machine/README.md for details).
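The acceptor-and-transition idea behind the StateMachine can be sketched in a few lines of Go. This is a minimal illustration, not the code in this PR: the State and Machine types, the Move and Validate functions, and the tiny four-state grammar are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// State is one node of the machine (hypothetical type). Accept reports
// whether a value is valid for this state; Next lists the reachable states.
type State struct {
	Name   string
	Accept func(value string) bool
	Next   []*State
}

// Machine tracks the current state while input is fed to it.
type Machine struct {
	current *State
}

// Move selects the first reachable state whose acceptor accepts the value.
func (m *Machine) Move(value string) error {
	for _, candidate := range m.current.Next {
		if candidate.Accept(value) {
			m.current = candidate
			return nil
		}
	}
	return fmt.Errorf("unexpected value %q after state %s", value, m.current.Name)
}

func isOperator(v string) bool { return v == "=" || v == "<" || v == ">" }
func isLiteral(v string) bool {
	return len(v) >= 2 && v[0] == '\'' && v[len(v)-1] == '\''
}
func isLogic(v string) bool {
	upper := strings.ToUpper(v)
	return upper == "AND" || upper == "OR"
}
func isColumn(v string) bool {
	return v != "" && !isOperator(v) && !isLiteral(v) && !isLogic(v)
}

// newFilterMachine wires a toy grammar: COLUMN OPERATOR LITERAL [LOGIC COLUMN ...].
func newFilterMachine() *Machine {
	start := &State{Name: "START", Accept: func(string) bool { return false }}
	column := &State{Name: "COLUMN", Accept: isColumn}
	operator := &State{Name: "OPERATOR", Accept: isOperator}
	literal := &State{Name: "LITERAL", Accept: isLiteral}
	logic := &State{Name: "LOGIC", Accept: isLogic}

	start.Next = []*State{column}
	column.Next = []*State{operator}
	operator.Next = []*State{literal}
	literal.Next = []*State{logic}
	logic.Next = []*State{column}
	return &Machine{current: start}
}

// Validate feeds every token to the machine and requires it to stop on a literal.
func Validate(tokens []string) error {
	m := newFilterMachine()
	for _, token := range tokens {
		if err := m.Move(token); err != nil {
			return err
		}
	}
	if m.current.Name != "LITERAL" {
		return fmt.Errorf("incomplete expression, stopped at %s", m.current.Name)
	}
	return nil
}

func main() {
	fmt.Println(Validate([]string{"name", "=", "'mickey'", "and", "surname", "=", "'mouse'"}))
	fmt.Println(Validate([]string{"name", "=", "and"}))
}
```

Because each state only knows its acceptor and its outgoing transitions, extending the toy grammar means adding a state and linking it, which mirrors the extensibility claim made for the real grammar.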

SQL Parser Features

Supported tokens

The SQL Parser uses the String Parser, which in turn takes a Grammar and a Scanner to parse and validate a string.
The SQL Parser thus provides a SQLGrammar and a SQLScanner to the StringParser. Thanks to this, adding new tokens is just a matter of updating the SQL Grammar.

The Grammar provided in this PR supports the following tokens: COLUMN_NAME, LITERAL, OPEN_BRACE, CLOSED_BRACE, '=', '>', '<', '>=', '<=', '<>', 'LIKE', 'ILIKE', 'IN', 'AND', 'OR', 'NOT', '->', '@>' and all the valid transitions between these tokens.
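To make the handling of LITERAL tokens concrete, here is a rough sketch of the placeholder substitution mentioned in the description: quoted values are pulled out of the filter and replaced with ?. ExtractLiterals is a hypothetical helper, not the function in this PR. It assumes only single-quoted string literals (with '' as an escaped quote) and performs no grammar validation.

```go
package main

import (
	"fmt"
	"strings"
)

// ExtractLiterals (hypothetical helper) replaces every single-quoted
// literal in filter with a "?" placeholder and returns the rewritten
// filter together with the extracted values, in order of appearance.
func ExtractLiterals(filter string) (string, []string, error) {
	var out strings.Builder
	var values []string
	runes := []rune(filter)

	for i := 0; i < len(runes); i++ {
		if runes[i] != '\'' {
			out.WriteRune(runes[i])
			continue
		}
		// An opening quote was found: read up to the closing quote.
		var value strings.Builder
		closed := false
		for i++; i < len(runes); i++ {
			if runes[i] == '\'' {
				// A doubled quote is an escaped quote inside the literal.
				if i+1 < len(runes) && runes[i+1] == '\'' {
					value.WriteRune('\'')
					i++
					continue
				}
				closed = true
				break
			}
			value.WriteRune(runes[i])
		}
		if !closed {
			return "", nil, fmt.Errorf("unterminated literal in %q", filter)
		}
		values = append(values, value.String())
		out.WriteString("?")
	}
	return out.String(), values, nil
}

func main() {
	query, values, err := ExtractLiterals("name = 'mickey' and surname = 'mouse'")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(query)  // name = ? and surname = ?
	fmt.Println(values) // [mickey mouse]
}
```

Returning the values separately lets the caller pass them as bind parameters instead of concatenating them into the SQL text.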

Security features

When instantiating a SQLParser, you can limit the maximum query complexity (the maximum number of logic operators, which defaults to 10) and restrict the column names that are allowed to appear in the query.

Here are a few examples:

accept any column:

parser := NewSQLParser()
_, _, err := parser.Parse("name = 'mickey' and surname = 'mouse'")
if err != nil {
	// handle the parse error
}

accept only the surname column:

parser := NewSQLParser(WithValidColumns("surname"))
_, _, err := parser.Parse("name = 'mickey' and surname = 'mouse'")
if err != nil {
	fmt.Println(err)
}

In this case, the parser will return an error: [1] error parsing the filter: invalid column name: 'name', valid values are: [surname]

limit both columns and complexity:

	parser := NewSQLParser(
		WithValidColumns("surname", "name", "age"),
		WithMaximumComplexity(2),
	)
	_, _, err := parser.Parse("(name = 'mickey' or name = 'minnie') and surname = 'mouse' and age > 20")
	fmt.Println(err)

In this case we will get an error because the complexity is too high: "[60] error parsing the filter: maximum number of permitted joins (2) exceeded"
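The two limits shown in these examples can be approximated with a standalone check. The sketch below is only an illustration of the idea and is not how the parser in this PR is implemented: Limits and Check are hypothetical names, the filter is split on whitespace (so literals containing spaces are not supported), only AND and OR count toward complexity, a column is taken to be the token just before a symbolic comparison operator, and the error messages omit the position prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// Limits (hypothetical type) holds the two restrictions. An empty
// ValidColumns map means that any column is accepted.
type Limits struct {
	MaxComplexity int
	ValidColumns  map[string]bool
}

// Check counts the AND/OR joins and verifies each column name found
// immediately before a comparison operator.
func Check(filter string, l Limits) error {
	// Pad the braces so they become separate tokens, then split on whitespace.
	spaced := strings.NewReplacer("(", " ( ", ")", " ) ").Replace(filter)
	tokens := strings.Fields(spaced)

	joins := 0
	for i, tok := range tokens {
		switch strings.ToUpper(tok) {
		case "AND", "OR":
			joins++
			if joins > l.MaxComplexity {
				return fmt.Errorf("maximum number of permitted joins (%d) exceeded", l.MaxComplexity)
			}
		case "=", "<", ">", "<=", ">=", "<>":
			if i == 0 {
				return fmt.Errorf("operator %q has no column", tok)
			}
			column := tokens[i-1]
			if len(l.ValidColumns) > 0 && !l.ValidColumns[column] {
				return fmt.Errorf("invalid column name: '%s'", column)
			}
		}
	}
	return nil
}

func main() {
	onlySurname := Limits{MaxComplexity: 10, ValidColumns: map[string]bool{"surname": true}}
	fmt.Println(Check("name = 'mickey' and surname = 'mouse'", onlySurname))

	strict := Limits{
		MaxComplexity: 2,
		ValidColumns:  map[string]bool{"surname": true, "name": true, "age": true},
	}
	fmt.Println(Check("(name = 'mickey' or name = 'minnie') and surname = 'mouse' and age > 20", strict))
}
```

Failing as soon as the join count passes the limit keeps the cost of rejecting an oversized filter proportional to the allowed complexity rather than to the full input.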

@clyang82

Thanks for your quick response. I have some comments:

  1. I think it is worth having an individual package to contain your code instead of placing it into utils, maybe parser.
  2. StateMachine is a generic state machine. I am OK with putting it into parser since there are no other uses right now. Or you may want to put it into an individual package. I am fine with both.
  3. Can you support JSONB function queries? For example: resources.payload -> 'data' -> 'manifests' @> '[{"metadata":{"labels":{"foo":"bar"}}}]'

@ziccardi

ziccardi commented Jun 17, 2024

@clyang82

  1. can you support jsonb function query? for example: resources.payload -> 'data' -> 'manifests' @> '[{"metadata":{"labels":{"foo":"bar"}}}]'

It does. Please look here for some JSONB examples.

And here for the specific query you asked for

If you want to look at how the grammar is defined, you can find the list of supported tokens here and the list of valid transitions here

I moved all the packages into a parser folder. I kept it inside utils however.
WDYT?

@clyang82

/ok-to-test

@clyang82 clyang82 left a comment

LGTM

@ziccardi

/ok-to-test

@clyang82

/assign @ciaranRoche

I have discussed this with @ziccardi: the SQL parser should be a common tool so that it can be shared by fleet manager, maestro, and other cluster services. Thanks @ziccardi for contributing it. Hi @ciaranRoche, are you fine with having this SQL parser in this repo? Thanks.

@ziccardi ziccardi changed the title feat: add the sql parser ACM-12279 | feat: Add a sql parser feature Jun 19, 2024
Adds sql_parser, state_machine, string_parser, string_scanner utilities.
They are all needed for the SQLParser.
@ciaranRoche ciaranRoche merged commit 2ecfa6e into openshift-online:main Jun 20, 2024
5 checks passed