Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add table prefixes to to_sql method #60409

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

Diadochokinetic
Copy link

@Diadochokinetic Diadochokinetic commented Nov 24, 2024

This PR adds a prefixes parameter to the to_sql method and passes it down to the underlying sqlalchemy.Table class. It takes into account that temporary tables are not stored in a database's meta data and adds temporary table specific methods for _exists_temporary and _drop_temporary_table.

Unfortunately, temporary tables don't work with null pooling, so I had to create additional sqlalchemy fixtures with the default pooling.

TODO:

  • case sensitivity check always warns: SQLDatabase.check_case_sensitive checks the databases's meta data. It will never find a temporary table and always throw a Warning. - I disabled case sensitivity checks for temporary tables.

@Diadochokinetic Diadochokinetic changed the title ENH: Add table prefixes ENH: Add table prefixes to to_sql method Nov 24, 2024
@Diadochokinetic Diadochokinetic marked this pull request as ready for review November 26, 2024 15:16
@WillAyd
Copy link
Member

WillAyd commented Nov 27, 2024

Thanks for the PR, but this seems to only work with SQLAlchemy, meaning we won't get the same behavior for our ADBC drivers. I don't think this is worth pursuing unless there is a way to avoid fragmenting features across those two different areas

@Diadochokinetic
Copy link
Author

I took a look into the ADBC implementation. to_sql uses adbc_ingest, which supports an (experimental) parameter temporary. I'll give it a shot. Although, this would only support the use case of temporary tables and not table prefixes in general. A more general solution could be to insert the prefixes into table_name. I will report, when I did some experimenting with it.

@Diadochokinetic
Copy link
Author

I did some testing and was able to implement the feature partially for adbc drivers. I used the temporary parameter of the adbc_ingest method. On the one hand, I don't like that it doesn't implement the full feature capacity of the paramater prefixes as its done in sqlalchemy, on the other hand it enables creating and appending temporary tables with the to_sql method for sqlachemy and adbc drivers, which is the main reason why I started working on this feature in the first place. Furthermore, I assume temporary tables will be like >99% of the use cases for prefixes anyway.

Now the question is:

  • Should the paramater prefixes stay as proposed? Then the documentation needs to be expanded and explicitly state only ["TEMPORARY"] will have an effect for adbc drivers. Maybe even throw a NotImplementedError for other values.
  • Or reduce the parameter to a boolean temporary so it works exactly the same for both drivers, but loses functionality for sqlalchemy.

I personally prefer the first option, because I connect to db2 databases via sqlalchemy and sometimes need to pass prefixes=["GLOBAL", "TEMPORARY"].

@WillAyd
Copy link
Member

WillAyd commented Dec 2, 2024

Can you check if there is an open issue for this upstream in the arrow-adbc repository? I think it may be of interest to them to handle more than just the temporary case.

ADBC releases are pretty quick, so if that is implemented upstream it benefits the entire ecosystem, and likely wouldn't be a long turnaround to get into pandas

@Diadochokinetic
Copy link
Author

Diadochokinetic commented Dec 2, 2024

That is a very good point. It wold be very beneficial, if both sqlalchemy and adbc support prefixes. So far, there is no open issue for that, so I opened one: apache/arrow-adbc#2343. Let's see what the adbc team thinks about this.

@Diadochokinetic
Copy link
Author

It looks like the adbc team is hesitant to implement an abstract parameter prefixes as it is available in sqlalchemy. So the option to wait for feature parity in both back ends seems to be unavailable. I prefer to implement prefixes in to_sql() to:

  • have the full feature for sqlalchemy back-ends
  • only use the TEMPORARY keyword for adbc back-ends and throw a NotImplentedError for other keywords

@WillAyd What are your thoughts on this?

@WillAyd
Copy link
Member

WillAyd commented Dec 16, 2024

I don't think its worth adding to pandas if the various backends cannot both fully support it; we may have to document instead how you could do it with more low-level SQLAlchemy operations, but to have the pandas API only offer this for SQLAlchemy but not ADBC makes the abstraction of the pandas API more complicated

@Diadochokinetic
Copy link
Author

And what about a reduced parameter temporary, that is supported by both back ends? That wouldn't solve all my use cases, but at least ~ 90% of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Add table prefixes to to_sql method
2 participants