State of the code:
There are some big issues:
1. the dialect handling for parsing is very limited
2. schema support and general catalog operations are limited
3. typechecking of query exprs is limited, and typechecking of other
dml and ddl is non-existent
4. there are lots of postgres specific things baked into the system
5. the typeconversion code, which handles overloaded function
resolution, implicit casting, most of the typing of literals, and
most of determining the precision, scale and nullability of
expressions, is a complete nightmare
6. typechecking for the ansi, sql server and oracle dialects is very
limited, and the sql server and oracle typechecking is very incorrect
7. the tests are very limited
8. a lot of the parsing code is very crufty and/or suspect
9. the syntax design is a bit poor
10. some of the code is too coupled (especially between modules)
Immediate tasks (in progress, or provisionally about to be started)
new catalog/ new typeid
includes a bunch of feature updates:
proper schemas
proper unquoted id handling
character sets and collations
more checking on catalog updates/ddl + nice error messages
(will be linked with ddl syntax eventually)
restrict and cascade behaviour
roles and ownership (will be extended to permissions eventually)
integrate new catalog
better tests, complete catalogs for ansi and sql server
better tests for matching, nullability, precision, literals
copy syntax and parsing from simple-sql-parser
Long term tasks
0. minor issues
put the special operators like between and substring in a separate
namespace in the catalog and typechecking so they can co-exist with
functions of the same name
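A possible shape for this, as a rough sketch (the names here are
hypothetical, not the existing catalog types):

import Data.Text (Text)

-- sketch: key catalog lookups on the kind as well as the name, so a
-- user-defined function called substring can coexist with the special
-- form substring(x from n for m)
data FnKind = OrdinaryFn | SpecialForm
  deriving (Eq, Ord, Show)

type FnKey = (FnKind, Text)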
the implicit cast uses the Type type. The syntax should not depend on
this; maybe replace it with something specific to implicit casts
the full dialect type is used all the way through the internals. it is
nice to have a single dialect type for the user, but to reduce coupling
internally there should be minimal dialect types for each of lexing,
parsing, pretty printing and typechecking. Maybe lexing, parsing and
pretty printing will share the same type, but we don't want the parser
to depend on all the code that the typechecking part of the dialect
depends on
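A rough sketch of the kind of split meant here (record and field names
are made up for illustration; they are not the current Dialect type):

import Data.Text (Text)

-- sketch: the user-facing Dialect just bundles per-phase values, so the
-- parser never has to import typechecking code
data SyntaxDialect = SyntaxDialect
  { diKeywords   :: [Text]  -- reserved words for lexing/parsing
  , diOdbcSyntax :: Bool    -- whether {fn ...}, {d '...'} etc. are accepted
  }

data TypeCheckDialect = TypeCheckDialect
  { diDefaultCatalogName :: Text  -- which built-in catalog to start from
  , diFoldIdsToLower     :: Bool  -- unquoted identifier case handling
  }

data Dialect = Dialect
  { diSyntax    :: SyntaxDialect  -- shared by lexer, parser, pretty printer
  , diTypeCheck :: TypeCheckDialect
  }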
the annotation type pulls a bunch of stuff into the syntax. can the
tree be parameterized on the annotation type? can the annotation type
at least be implemented in a separate module? it uses some ag stuff at
the moment
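A sketch of the parameterization idea on a toy type (illustration only;
the real syntax is generated from .ag files, so an actual change would
go through uuagc rather than plain Haskell):

{-# LANGUAGE DeriveFunctor #-}

type Name = String  -- placeholder for the real Name type

data ScalarExpr a
  = NumLit a String
  | Iden a Name
  | BinaryOp a Name (ScalarExpr a) (ScalarExpr a)
  deriving (Eq, Show, Functor)

-- the parser could then produce ScalarExpr SourcePosition and the
-- typechecker ScalarExpr TypeAnnotation, with those annotation types
-- defined in their own modules instead of being baked into the syntax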
rename some of the internal modules, types, ctors, functions,
etc. to be clearer and more consistent.
fix the catalogs and tests for oracle and sql server so they are not
based on postgresql, which makes them very wrong
the dialects used in the tests are a bit confused because of legacy
issues. The tests need to be more methodical anyway.
the odbc catalog does not attach to the current dialect properly. the
typechecker should access the odbc catalog via the dialect and not
directly. odbc support, both for parsing and typechecking, should be a
flag on the dialects or something (and disabled by default). the odbc
typechecking should work in different dialects.
literals don't directly type as unknown. This should be fixed at least
for postgres, and probably is a good idea for at least some of the
other dialects.
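A small sketch of what "literals type as unknown" means here
(placeholder types, not the real hssqlppp ones):

-- sketch: in a postgres-like dialect a quoted literal starts out as the
-- pseudo-type "unknown", and only the context it is used in (overload
-- resolution, insert target column, explicit cast, ...) pins it down
data Lit = StringLit String | NumLit String
data Ty  = UnknownType | ScalarType String deriving (Eq, Show)

typeOfLit :: Bool -> Lit -> Ty  -- Bool: dialect types string literals as unknown
typeOfLit True  (StringLit _) = UnknownType
typeOfLit False (StringLit _) = ScalarType "varchar"
typeOfLit _     (NumLit s)
  | '.' `elem` s              = ScalarType "numeric"
  | otherwise                 = ScalarType "int"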
try to do code coverage
try to set up continuous integration somewhere - mainly want to catch
build failures with older ghcs
can also do packdeps
also: much better testing. big weak points:
we don't test nullability, precision and scale much, or the typing of literals
want some more tests on unusual dialects (e.g. use numeric and have no
int types, a dialect which only has one text type and it isn't char or
varchar, a dialect which doesn't have a decimal type (this affects
number literals for instance))
the parser tests are much less comprehensive than in the
simple-sql-parser project
there aren't many anomaly tests
for instance, bad typing
using sql which doesn't work in the current dialect/catalog
there are also a bunch of places where the error messages can be made much nicer
for instance:
when a function call doesn't match, it can suggest similarly named
functions, and if you got the name right, it can explain exactly
what the overloads are, what your argument types are, and why it
didn't match any of them (e.g. none are valid for the types, or it is
ambiguous)
similar-name suggestions could also be used for identifiers
can the overload matching/implicit casting be tested by generating
lots of test cases using another dbms like postgres and sql server?
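One possible way to do the generation, as a sketch (this assumes the
postgresql-simple library and a live postgres connection, neither of
which hssqlppp currently depends on; pg_typeof is a standard postgres
function):

import Database.PostgreSQL.Simple
import Data.String (fromString)
import Control.Monad (forM)

-- sketch: ask a live postgres for the result type of each candidate
-- expression, and turn the answers into expected-result-type test cases
-- for the hssqlppp overload resolution / implicit cast code
expectedTypes :: Connection -> [String] -> IO [(String, String)]
expectedTypes conn exprs =
  forM exprs $ \e -> do
    [Only t] <- query_ conn (fromString ("select pg_typeof(" ++ e ++ ")::text"))
    return (e, t)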
add more qualified imports and explicit import lists
get rid of the remaining calls to error
expose internals only via modules called ...Internal. There are a few
internal functions exposed for utils or testing. These should be kept
more separate
more work with strings
1. data types
(ansi has clear restrictions/differences between char/varchar/clob
and nchar/nvarchar/nclob), postgres more or less just has text only
(I think char and varchar in postgres are now more or less sugar
around text).
2. character sets (we will treat a character set as an encoding)
3. collations
every string will carry a character set and optional collation with it
(collations have the default/implicit/explicit tag)
a dialect can specify that the character set of char, varchar and clob
must come from one subset of the available character sets, and that
nchar, nvarchar and nclob must come from another subset, or it can
allow any character set for any string data type.
you set the default character set, plus you can set the character set
for a table, plus you can set the character set per column, and you
can convert text from one character set to another. We will
consider character sets as encodings, character repertoires will
not have explicit representation in hssqlppp (you can either
convert between two character sets or you can't, and the
availability of a conversion is always explicit and direct).
use the ansi rules for collations: each character set has a default
collation. Not sure if this can be changed, or changed per schema
for instance. Use the default/implicit/explicit collation rules for
expressions. table columns can have an explicit collation, and you
can use the collate operator in expressions (including group by and
order by).
information about:
character sets
datatype/character set compatibility
maybe each datatype can have All | Whitelist [characterset] |
Blacklist [characterset] (see the sketch after this list)
collations + which character set a collation is for
default collation for a character set
character sets and collations for table columns and for view
expression types
anything else (like domains, ...)
will go in the catalog
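A rough sketch of the catalog-side records described above (the names
are hypothetical, not existing hssqlppp types):

import Data.Text (Text)

type CharSetName   = Text
type CollationName = Text

-- which character sets a given string data type may use in a dialect
data CharSetRestriction
  = AllCharSets
  | Whitelist [CharSetName]
  | Blacklist [CharSetName]

data CharSetInfo = CharSetInfo
  { csName             :: CharSetName
  , csDefaultCollation :: CollationName
  }

data CollationInfo = CollationInfo
  { collName    :: CollationName
  , collCharSet :: CharSetName   -- the character set this collation is for
  }

-- per the ansi rules, a string value/column then carries its character
-- set plus a collation tagged with how it was determined
data CollationStrength = DefaultColl | ImplicitColl | ExplicitColl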
make example checking automated
add record syntax to createtable, check for other likely victims for
this treatment
review replace field in createtable, createtableas, move to correct
place (and check for other areas like this)
change the catalog arg in typecheck functions to be maybe, then
nothing means use the default catalog for the dialect
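Roughly (with placeholder types; the real signatures differ):

import Data.Maybe (fromMaybe)

data Catalog   = Catalog
data Dialect   = Dialect { diDefaultCatalog :: Catalog }
data QueryExpr = QueryExpr

-- Nothing means "use the dialect's built-in default catalog"
typeCheckQueryExpr :: Dialect -> Maybe Catalog -> QueryExpr -> QueryExpr
typeCheckQueryExpr d mcat qe = typeCheckWith cat qe
  where
    cat = fromMaybe (diDefaultCatalog d) mcat
    typeCheckWith _ = id   -- stand-in for the real typechecker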
1. rewrite the catalog code
support schemas and schema search path properly
support some more object types
help fix the confusion with the type Type
support dependencies and cascade/restrict
make sure the canonicalization of names, and case handling is done
properly
correctness checking of ddl operations
support drop and alter directly
good test coverage directly on the catalog api
understand some of the dialect issues wrt catalog stuff better
handle the odbc option better - this needs some interaction with the
general catalog stuff
later want to handle permissions to some extent. not sure exactly what
to do here or if the code will be in hssqlppp or sqream. We need at
least parsing and catalog support for registering permissions here,
even if the authorization code itself might not be here
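A very rough sketch of the schema / search path lookup side of the
above (names are made up for illustration, this is not the planned
api):

{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Map as M

type SchemaName = Text
type ObjName    = Text

data Catalog = Catalog
  { catSearchPath :: [SchemaName]   -- e.g. ["myschema","public"]
  , catTables     :: M.Map (SchemaName, ObjName) TableInfo
  }

data TableInfo = TableInfo { tiColumns :: [(ObjName, Text)] }

-- unqualified names are resolved against the search path in order;
-- qualified names bypass it; names are assumed already canonicalized
lookupTable :: Catalog -> Maybe SchemaName -> ObjName -> Either Text TableInfo
lookupTable cat (Just s) t =
  maybe (Left ("no such table: " <> s <> "." <> t)) Right
        (M.lookup (s, t) (catTables cat))
lookupTable cat Nothing t =
  case [ti | s <- catSearchPath cat
           , Just ti <- [M.lookup (s, t) (catTables cat)]] of
    (ti:_) -> Right ti
    []     -> Left ("no such table: " <> t)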
2. rewrite the typechecking code from scratch
it is a real mess
lots of things missing
it needs to be much more literate because it is weird, so that
programmers who don't know uuagc stand a chance of working with this
code
there should be a much clearer way of dealing with nullability,
precision and scale
more complete typechecking for query exprs
complete typechecking for other dml and for ddl should be added
there are some tree rewrites which can be done during
typechecking. These should be refactored into separate passes instead
of being tied into the typechecking
the typechecking should be better as a compiler front end. The key
addition is tracking where identifiers are defined, so e.g. we can
easily tell which schema a table has matched to, or if a subquery is
correlated.
the typechecker could also check that asts don't contain syntax which
isn't supported in the current dialect
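A sketch of the extra annotation information for the "front end" use
case (hypothetical names): record where each identifier resolved to, so
callers can tell which schema/table a column came from and whether a
subquery is correlated.

import Data.Text (Text)

data BindingSource
  = BoundToTableColumn Text Text Text  -- schema, table, column
  | BoundToQueryAlias Text Text        -- query-level alias, output column
  | BoundInOuterQuery Text             -- correlated reference to an outer scope
  deriving (Eq, Show)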
3. rewrite the typeconversion code and the general nullability,
precision and scale stuff
this is an even worse mess
get rid of the hacks and special cases for e.g. datepart
I think the current code tries to be too rule based. For simple
recurring patterns, we can use some simple rules. Other than that, I
think passing in functions to e.g. determine the output precision is
better. The default rules and the special case functions should be
connected to the catalog functions in the dialect files right next to
the catalog definitions, instead of being embedded in the type
conversion code itself.
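A sketch of what "passing in functions" could look like (hypothetical
names; the decimal addition rule below is only an approximation of the
sql server behaviour, used for illustration):

data Ty = ScalarType String
        | NumericType Int Int   -- precision, scale
  deriving (Eq, Show)

-- each catalog function entry carries a function computing its result
-- type (including precision/scale) from the argument types, defined in
-- the dialect file right next to the catalog entry
data FnEntry = FnEntry
  { feName       :: String
  , feResultType :: [Ty] -> Either String Ty
  }

decimalAdd :: FnEntry
decimalAdd = FnEntry "+" f
  where
    f [NumericType p1 s1, NumericType p2 s2] =
      Right (NumericType (min 38 (max (p1 - s1) (p2 - s2) + max s1 s2 + 1))
                         (max s1 s2))
    f ts = Left ("decimalAdd: unexpected argument types: " ++ show ts)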
4. simple-sql-parser
there is a bunch of improved syntax and parsing code (and pretty
printing) in the simple-sql-parser project. there can be some copy
pasting into hssqlppp for now, and eventually they should be
synchronized and kept the same (somehow - we can't use
simple-sql-parser directly or use its source code, since we must use
.ag in hssqlppp and I definitely don't want to use .ag in
simple-sql-parser).
5. improve the project documentation and examples
= later tasks
think about demo code to convert between dialects (especially things
like types and functions which need to be desugared)
try to get the old chaos sql preprocessor working again
improve the quasiquotation system: maybe also switch from text back to
strings (apart from the input to parsing, and the output from pretty
printing)
typesafe access and composing sql expressions from haskell
work on ansi procedural sql + get the typechecking for procedural sql
working again. Probably the most important dialect here is TSQL, but
maybe plsql is also important (the plpgsql work should be revived also
since the syntax is already there).
synchronization with simple-sql-parser
the syntaxes should be the same. Maybe there can be a mechanical
conversion from the simple-sql-parser haskell syntax to the uuagc in
hssqlppp
simple-sql-parser should have dialect support and annotations for this
to happen.
can't share the code or use simple-sql-parser from hssqlppp, because
the syntax in hssqlppp must be written in ag, and simple-sql-parser
should not use ag so that it stays much more accessible.
maybe replace parsec with megaparsec
maybe replace it with uu-parsinglib (has incremental parsing which is
really nice, and also maybe better error messages, and also a better
way to handle left factoring - I think it just does it automatically
more or less)
use wl-pprint-text or something - maybe this will be better for the
complex and indentation-heavy syntax?
do a proper solution to operator precedence parsing. would like to do
an ast pass approach if possible. This is also needed for from clauses
and set operators, both of which almost certainly get the fixity wrong
at the moment.
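A sketch of the ast pass idea on a toy expression type: parse every
binary operator at one precedence level, left associatively, then
rotate the tree using a fixity table in a separate pass (the real
version would also need associativity and would cover from clauses and
set operators):

data Expr = Num Integer | BinOp String Expr Expr deriving (Eq, Show)

prec :: String -> Int
prec "*" = 7
prec "+" = 6
prec _   = 0

-- rewrite (a `lo` b) `hi` c into a `lo` (b `hi` c) when hi binds tighter
fixFixity :: Expr -> Expr
fixFixity (BinOp op l r) = rotate op (fixFixity l) (fixFixity r)
  where
    rotate o (BinOp o' a b) c
      | prec o > prec o' = BinOp o' a (rotate o b c)
    rotate o a c = BinOp o a c
fixFixity e = e

-- e.g. the naive left-associative parse of 1 + 2 * 3,
-- BinOp "*" (BinOp "+" (Num 1) (Num 2)) (Num 3),
-- becomes BinOp "+" (Num 1) (BinOp "*" (Num 2) (Num 3))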
syntax extensibility?
lint for sql with plugins
parameterized annotation
------------------------
old notes
== typechecking
param query type fn
rough todo/checklist for exprs:
combinequeryexpr
values
withqueryexpr, withquery
jointref variations
join onexprs
funtrefs
table aliases
aggregates, windows
+ add checks for aggregates/group by
liftapp
inlist, inqueryexpr?
go through old typecheck tests and reinsert most of them?
-> want to get much better coverage of typechecking
start looking at getting the type errors back up to the old level
to serve effectively as a front end, the parser should produce nice
error messages, and the typechecking of crud should be very
comprehensive
what will usable non-ascii character set support take? Maybe this
works fine already?
== small fixes
see if the error messages from lex/parse in qq can be fixed: they all
come out with the string passed through show, so the error message is
hard to read
better approach to parsing selects - the into stuff is a mess
alter the postprocess for astinternal.hs: want to add haddock line for
the data type and each ctor for all the data defs in the file, but
not recordize them
review examples: add new examples, clean up code, and example page on
website. Add simple parsing and typechecking example to index page
rename examples
review docs, esp. haddock which contains all sorts of shockingly bad
typos, spellings, inconsistencies, etc.
junk tests to get working: extensions, roundtrip?
want to be ready to do comprehensive review of pg syntax support for
0.7.0, so can work through and get a reasonably comprehensive list
of what is missing
documentation:
easy way to put in some sql, see if it parses, view the resultant
ast
same with typechecking: show writing a catalog by hand, and examples
to generate one from a postgresql database and from ddl sql
== misc small bits
idea for :: cast
-> parse typenames as scalar expressions by embedding them in
numberlits (hacky but should be unambiguous) - or antictors?
then transform to cast after parsing. This can then use the proper
precedence in buildExpressionParser to parse ::. Also produce parse
errors if after parsing, you try to cast to something which isn't a
typename
could do something similar for other operators: '.', [], in?
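A sketch of the post-parse fixup on a toy syntax (the real hssqlppp
syntax types differ):

data SE = NumLit String
        | Iden String
        | BinOp String SE SE
        | Cast SE String
  deriving (Eq, Show)

-- after buildExpressionParser has parsed "::" as an ordinary binary
-- operator with the right precedence, rewrite it into Cast, and reject
-- right hand sides which can't be read as a type name
fixCasts :: SE -> Either String SE
fixCasts (BinOp "::" e t) = do
    e' <- fixCasts e
    case t of
      Iden tn -> Right (Cast e' tn)
      _       -> Left ("not a type name after '::': " ++ show t)
fixCasts (BinOp op a b) = BinOp op <$> fixCasts a <*> fixCasts b
fixCasts (Cast e tn)    = (`Cast` tn) <$> fixCasts e
fixCasts e              = Right e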
add support for enum types
== website nav
== examples
add compilation of examples to automated tests, also add tests in the
documentation
== report generator
the idea is to have the following for experimentation: evaluate how
well hssqlppp supports some existing sql, support sql as it is being
developed (possibly with syntax extensions), and generate
documentation:
take source sql:
standard postgresql sql in text files
sql taken from postgresql dump from live db
syntax extended sql in text files
do some or all of the following:
parse and type check - report problems
parse, pretty print, reparse and check
generate documentation, catalog
load into postgresql and double check catalog from typechecker
load and dump postgresql, reparse and typecheck for changes
== documentation generator for sql codebases