internals

Alexey Borzov edited this page Nov 29, 2021 · 2 revisions

Parser implementation

The SQL string is first processed by the Lexer, which converts it to a TokenStream object aggregating Token instances. The Parser then walks that stream and builds an Abstract Syntax Tree of Node objects.

Lexer class

This class is based on the flex lexer defined in the src/backend/parser/scan.l file of the Postgres sources.

Token class

This class represents a token and knows its type, value, and position in the input string. It implements a matches() method that checks whether the token's type and/or value match the given values.
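The matching behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: the SketchToken class, its string type constants, and the exact matches() signature are assumptions made for the example.

```php
<?php
// Hypothetical stand-in for illustration; not pg_builder's actual Token class.
final class SketchToken
{
    public const TYPE_KEYWORD    = 'keyword';
    public const TYPE_IDENTIFIER = 'identifier';

    private string $type;
    private string $value;
    private int $position;

    public function __construct(string $type, string $value, int $position)
    {
        $this->type     = $type;
        $this->value    = $value;
        $this->position = $position;
    }

    /**
     * Checks the token's type and, if $values is given, also its value.
     * $values may be a single string or a list of acceptable strings.
     */
    public function matches(string $type, $values = null): bool
    {
        if ($type !== $this->type) {
            return false;
        }
        if (null === $values) {
            return true;
        }
        return in_array($this->value, (array)$values, true);
    }

    public function __toString(): string
    {
        return sprintf("%s '%s' at position %d", $this->type, $this->value, $this->position);
    }
}

$token = new SketchToken(SketchToken::TYPE_KEYWORD, 'select', 0);

$typeOnly     = $token->matches(SketchToken::TYPE_KEYWORD);                       // type check only
$typeAndValue = $token->matches(SketchToken::TYPE_KEYWORD, 'select');             // type and value
$otherValues  = $token->matches(SketchToken::TYPE_KEYWORD, ['insert', 'update']); // value not in list
```

Accepting either a single value or a list keeps call sites short when a grammar rule allows several alternative keywords at the same position.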

TokenStream

This is a stream of Tokens. The class allows forward movement with next() and skip(), lookahead with look(), and matching of the current token(s) with matches() and expect(). There are also slightly optimized wrappers for the most common matches() cases: matchesKeyword(), matchesSpecialChar(), matchesAnyType(), matchesKeywordSequence().
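To make the roles of these methods concrete, here is a small self-contained sketch of such a stream. The method names follow the text, but the SketchTokenStream class, its signatures, and the [type, value] array representation of tokens are assumptions for illustration and may differ from the real TokenStream.

```php
<?php
// Hypothetical sketch; method names mirror the text, signatures are assumptions.
final class SketchTokenStream
{
    /** @var array<int, array{0: string, 1: string}> list of [type, value] pairs */
    private array $tokens;
    private int $current = 0;

    public function __construct(array $tokens)
    {
        // An explicit end-of-input marker keeps lookahead from running off the list.
        $tokens[]     = ['eof', ''];
        $this->tokens = $tokens;
    }

    /** Returns the current token and moves to the next one. */
    public function next(): array
    {
        $token = $this->tokens[$this->current];
        if ($this->current < count($this->tokens) - 1) {
            $this->current++;
        }
        return $token;
    }

    /** Moves forward by $number tokens without returning them. */
    public function skip(int $number): void
    {
        $this->current = min($this->current + $number, count($this->tokens) - 1);
    }

    /** Peeks $number tokens past the current one without consuming anything. */
    public function look(int $number = 1): array
    {
        return $this->tokens[min($this->current + $number, count($this->tokens) - 1)];
    }

    /** Checks the current token's type and, optionally, its value. */
    public function matches(string $type, ?string $value = null): bool
    {
        [$currentType, $currentValue] = $this->tokens[$this->current];
        return $type === $currentType && (null === $value || $value === $currentValue);
    }

    /** Like matches(), but consumes the token on success and throws on failure. */
    public function expect(string $type, ?string $value = null): array
    {
        if (!$this->matches($type, $value)) {
            throw new RuntimeException("Unexpected token '{$this->tokens[$this->current][1]}'");
        }
        return $this->next();
    }
}

$stream = new SketchTokenStream([
    ['keyword', 'select'],
    ['special', '*'],
    ['keyword', 'from'],
    ['identifier', 'some_table'],
]);

$stream->expect('keyword', 'select');       // consumes "select"
$isStar = $stream->matches('special', '*'); // current token is "*", nothing consumed
$ahead  = $stream->look(1);                 // the token after "*", nothing consumed
$stream->skip(2);                           // jump over "*" and "from"
$table  = $stream->next();                  // consumes the identifier
```

The split between matches() (a non-consuming test) and expect() (test, consume, or fail) is what lets a parser try alternatives cheaply and report an error only once it has committed to a rule.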

Token and TokenStream implement the magic __toString() method, which allows easy debug output:

use sad_spirit\pg_builder\Lexer;

$lexer = new Lexer();
echo $lexer->tokenize('select * from some_table');

yields

keyword 'select' at position 0
special character '*' at position 7
keyword 'from' at position 9
identifier 'some_table' at position 14
end of input

Parser class

This is an LL(*) recursive descent parser. It tries to closely follow the bison grammar defined in the src/backend/parser/gram.y file of the Postgres sources, but the implementation is completely independent.
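For readers unfamiliar with the technique, the following toy example shows the general shape of a recursive descent parser: one method per grammar rule, each consuming tokens and returning a tree fragment. The three-rule grammar, the SketchParser class, and the array-based tree are invented for illustration and are far simpler than the actual Parser, which works on a TokenStream and builds Node objects.

```php
<?php
// Toy recursive descent parser; the grammar and class are hypothetical, not pg_builder's.
final class SketchParser
{
    private const RESERVED = ['select', 'from'];

    /** @var string[] */
    private array $tokens;
    private int $pos = 0;

    public function __construct(array $tokens)
    {
        $this->tokens = $tokens;
    }

    // select_stmt := 'select' target_list 'from' identifier
    public function parseSelect(): array
    {
        $this->expect('select');
        $targets = $this->parseTargetList();
        $this->expect('from');
        $table = $this->parseIdentifier();
        if (null !== $this->peek()) {
            throw new RuntimeException("Unexpected trailing token '{$this->peek()}'");
        }
        return ['type' => 'select', 'targets' => $targets, 'from' => $table];
    }

    // target_list := target (',' target)*
    private function parseTargetList(): array
    {
        $targets = [$this->parseTarget()];
        while (',' === $this->peek()) {
            $this->pos++;
            $targets[] = $this->parseTarget();
        }
        return $targets;
    }

    // target := '*' | identifier
    private function parseTarget(): string
    {
        if ('*' === $this->peek()) {
            $this->pos++;
            return '*';
        }
        return $this->parseIdentifier();
    }

    private function parseIdentifier(): string
    {
        $token = $this->peek();
        if (
            null === $token
            || in_array($token, self::RESERVED, true)
            || !preg_match('/^[a-z_][a-z0-9_]*$/', $token)
        ) {
            throw new RuntimeException('Expected identifier, got ' . ($token ?? 'end of input'));
        }
        $this->pos++;
        return $token;
    }

    /** One-token lookahead; returns null at end of input. */
    private function peek(): ?string
    {
        return $this->tokens[$this->pos] ?? null;
    }

    private function expect(string $value): void
    {
        if ($value !== $this->peek()) {
            throw new RuntimeException("Expected '$value', got " . ($this->peek() ?? 'end of input'));
        }
        $this->pos++;
    }
}

$ast = (new SketchParser(['select', 'a', ',', 'b', 'from', 'some_table']))->parseSelect();
```

This toy grammar needs only one token of lookahead. An LL(*) parser may look arbitrarily far ahead before choosing a rule, which is where a lookahead method such as look() on the token stream comes in.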

Differences from the Postgres parser: the following constructs are not supported:

  • TABLE name as a shorthand for SELECT * FROM name
  • SELECT INTO
  • WHERE CURRENT OF cursor for UPDATE and DELETE queries
  • Undocumented TREAT() function