-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ObjectStoragePath for universal_pathlib>=v0.2.1 #37524
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
|
Also, if a core developer can point me to the easiest way to run the relevant parts of the test suite on an M2 mac that would be wonderful ❤️ |
Cool.
This should be rather easy : Contributing docs: https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst You have two options: A) Using your favourite IDE and (recommended)
You can also manage your env in whatever way you are used to and run We use B) using 1-1 reproducible CI dockerized environment:
|
(I approved/run) the tests BTW. |
Also installing |
…tached filesystems
- uses MemoryFileSystems instead of mocked 'fake filesystems' - tests intention for `.stat()` (two added keys) - tests local filesystems with absolute file urls, because fsspec interprets `file://bucket/key` as `file://{os.getcwd()}/bucket/key`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this! I just started looking at what was required and you beat me right to it.
The PR introduces some subtle behavioral changes (improvements definitely!). Particularly around what a path
is and I agree with @uranusjr that those require strengthening of the tests and possibly improvements to the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thanks for this @ap--
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
Oops, I think I might have missed a failing test / regression. I'll take a look |
There are lots of conflicts while cherrypicking this to 2.8.2. I will be moving it to 2.9.0 except anyone want to try the cherry-picking? cc @bolkedebruin @potiuk |
Why? That seems weird @ephraimbuddy ? (I.e. the conflicts). It's quite isolated on its own and no I think this should be in 2.8.2. |
Fantastic :) |
Because there are other committs that will need to be cherry-picked before that one - those that introduced the backend and we have not cherry-picked them, |
Ah yes. Well up to you. Seems nice to have this in 2.8.2 but its fine in 2.9 |
Those are
|
I think we should skip this one. I will do 2.9 next month instead of 2.8.3 |
Great! Even better because I think there is a regression now @ap-- . The modules are eagerly loaded now ie:
Will now load |
This design was intentional, because fsspec filesystems can implement their own storage_option parsing and protocol strip methods, and correct parsing behaviour can only be ensured if the filesystem class is importable. I will see if I can make changes to universal_pathlib to delay that as long as possible, or offer another way of handling these cases. What operations on an I'll open an issue in universal_pathlib for further discussion. |
closes: #37067
related: #36546
This PR adjusts the
airflow.io.path.ObjectStoragePath
implementation to support the newuniversal_pathlib>=0.2.1
.Functionality should be identical to the previous implementation. A few notes:
ObjectStoragePath._accessor._store
was keeping anObjectStore
instance reference just for retrieving theconn_id
. The current refactor does not keep the store instance referenced, since the conn_id is available in the storage_options of the UPath instance.__hash__
method was previously decorated withlru_cache
, which is bad, because it keeps a reference to each ObjectStoragePath instance in the cache dict (stored on the class), preventing cleanup._bucket
,_key
, etc.. that seemed to be leftovers from a previous implementation.Tagging: @potiuk @bolkedebruin
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.