Which column identifier should we use for DDL operation #1081
Replies: 6 comments 5 replies
-
Thanks for making this discussion. The diagrams you've made above match with my understanding, so we can use those as a reference (as far as I'm concerned). For me, the example you give is a sign that the SQLAlchemy function in question should be wrapped. Further, it's obvious from looking at the function that it will fail, and the solution (get the names from the attnums again) is also obvious. I realize it's just an example, but the point is that using immutable identifiers makes such problems and their solutions easy to reason about. In particular, if you were calling your example function, but modified to use names to identify columns, you'd have to either know from the function name or docs that it would modify column names in order to know that you need to re-request the column names for subsequent calls. This is why my solution to your example would be to wrap the SQLAlchemy function so that it can also take Taking views as an example of where stable IDs would really be helpful: If you change the name of a column underlying a view, the view will still see the column (because it's referenced by attnum under the hood). However, the view column name doesn't change. In fact, the "human readable" view definition will change appropriately. Any function which deals with the underlying columns of views then needs to keep the mapping between the underlying columns and the view columns sorted out. While you still need to know the mapping in the case of using attnums, the mapping would be stable until the view is dropped (or the underlying column is dropped with a cascade). This can of course be arbitrarily horrible, since views can source columns from other views. Maybe we should write a decorator that would morph a function that takes a column name or names as input into one that takes attnum(s). |
Beta Was this translation helpful? Give feedback.
-
There is also another problem, the tables have to be refreshed again, in order to get the latest column name, so it is not limited to column names alone.
By wrapping SqlAlchemy function, I am assuming you mean wrapping the
Just to be on the same page, it would be nice if you list the functions that would affect the column names, I am not sure if it is really that much of a problem to have these complexities in place when this problem can be solved in a simpler way. I am looking at this as:
If we go with (1) We would face the cons I mentioned above We could rather go with (2) and then have the function that changes the column name, have def safe_function(column_names):
alter_column_type(column_names)
some_sqlalchemy_function(column_names)
def parent_fun(attnums):
fun(attnums)
column_names = column_attnum_to_names()
alter_column_type(column_names)
def fun(attnums):
column_names = column_attnum_to_names()
some_sqlalchemy_function(column_names)
# No refresh needed since the above function does not use attnums
alter_column_name(attnums)
refresh_tables()
column_names = column_attnum_to_names()
alter_column_type(column_names)
some_sqlalchemy_function(column_names) |
Beta Was this translation helpful? Give feedback.
-
Would this mean we have to create a decorator function for each sqlalchemy function used in our codebase? I would go with a generalised decorator as I find it to be a much better option if we plan to use attnums.
But how many of those change the column name?
I am not against using an immutable identifier, Since the column names are already derived using attnums in the first place, I just find using it as an accessor every time to be redundant as it comes with an additional cost like having to decorate functions, additional queries. The only issue is that there are very few functions which could alter the column names in between function calls, so I think it much more efficient to take a note of those functions and use conventions around them. I find the existing |
Beta Was this translation helpful? Give feedback.
-
@mathemancer @silentninja I'll try to summarize the discussion. Please point out what I get wrong. SQL not supporting immutable identifiers is awkward and we can't really get around that without shifting that awkwardness somewhere. The options for where to shift it, as per your discussion, are:
Both 1 and 2i sound good to me; 2ii not so much. 2i is slower, query-wise, but I have a feeling that we'd find good ways to optimize a hot-path with this setup. 1 has a developer keeping track of more things, which might be a burden for reviewing and onboarding new contributors. It's faster, query-wise, but it's not clear if that's worth-it. I'd say let's do the slower, simpler 2i. As for 2ii, as I said above, let's not overengineer this. |
Beta Was this translation helpful? Give feedback.
-
I think @dmos62 has done a good job of summarizing the discussion so far and I agree with his conclusions:
Let's go with option 2i. If we end up finding that we need to optimize the number of queries we're making for our operations, we can tackle that problem separately at a later date. |
Beta Was this translation helpful? Give feedback.
-
This discussion was resolved in our weekly call, please see notes here. |
Beta Was this translation helpful? Give feedback.
-
Based on the Matrix discussion
We are currently using various identifiers to access a column
columns array index
- This was the preferred way to access a column earlier but we ran into problems due to it being mutable, as in any column name change or dropping intermediate columns would change theindex
position of the column.attnum
- Usingcolumn_array_index
was deprecated due to the reasons stated above and was replaced byattnum
based column reference at theservice layer
in this PR. It is not used directly to reference a column, rather thecolumn name
referred by theattnum
is fetched during runtime and is then used for various operations.column_name
- Most of the database operations are performed using a column name. So it doesn't matter which identifier we go withcolumn_name
will be the last and the ultimate identifier used for database operations.Our current code base uses Django
Column model
which is backed byattnums
on the service layer and a mix ofcolumn_aray_index
andcolumn_name
on thedb
layer. Theattnum
based Django Column model fetches the column name usingattnum
during runtime based on which it constructs a column object or fetch the column_array_index to be used by the db layer. Since we need to replace the deprecatedcolumn_array_index
on the db layer, there is a question of what to replace it with?Current codebase
The solution proposed by the issue
The new solution proposed
Pros of the solution proposed by the issue
Cons:
db
function call will have to call anattnum
tocolumn_name
converter as most of the operations are done using column names. This also increases the number of queriesattnum
and reflecting table is the only way to guarantee the reference, which seems to be infeasible, as any subsequent SQL alchemy function call using column names fetched at the start of the function after a column name alter function would still fail. For example:Since most of the calls are synchronous, do we need this complexity?
Pros of using column name:
db
operations, so we can avoid the need for any conversion layer.Cons:
My proposal is that
Service Layer should fetch column names using attnums and pass those column_names to any db function that will be called
db layer should be using column names
Impure function calls should be handled by the caller, just like any other impure function. It will have to fetch the attnum of the column name that will be changed, call the impure function and then reflect the column name again if it is being used by any subsequent calls. These types of functions are very rare(non-existent from what I have seen) in our codebase
Beta Was this translation helpful? Give feedback.
All reactions