Is there a way to use object_store with google cloud storage / gcs local emulators or custom gcs endpoint? #5263
Comments
The GitHub CI contains an example of using an emulator. We run a fork of fake-gcs-server that supports the XML APIs; after that, it is just a case of overriding the endpoint URL. |
Thanks, the override of the endpoint URL works well, but the emulator gives me a 404 any time I try to read it. These are the requests that I do (this is the write from fsspec):
Reading directly with fsspec works fine as well:
But the "read" using object_store/polars gives me a 404. It issues a HEAD request, since we want to do more using object_store/polars, whereas fsspec just downloads the file directly. I don't understand why this is failing either:
|
You need to run a container with the changes in fsouza/fake-gcs-server#1164, for example |
Yes thank you, I am using the image with the changes in the example above:
Perhaps it is some sort of problem with the way polars is implementing object_store |
The URLs in your requests refer to different files
It could be a red herring, depending on what the logging is actually printing, but is it possible you are constructing the object store path in such a way that it is URL-encoding the dot twice? |
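One way to check for the double encoding suspected above is to percent-decode a logged request path repeatedly and count how many passes change it. The following is a minimal Python sketch using only the standard library; `decode_depth` and `same_object` are hypothetical helper names, and any paths you feed it would come from your own logs.

```python
from urllib.parse import unquote


def decode_depth(path: str, max_rounds: int = 5) -> int:
    """Return how many rounds of percent-decoding change `path`.

    0 means no percent-escapes, 1 means encoded once,
    2 or more suggests the path was encoded more than once.
    """
    rounds = 0
    current = path
    while rounds < max_rounds:
        decoded = unquote(current)
        if decoded == current:
            break
        current = decoded
        rounds += 1
    return rounds


def fully_decode(path: str, max_rounds: int = 5) -> str:
    """Percent-decode `path` until it stops changing (or max_rounds is hit)."""
    for _ in range(max_rounds):
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded
    return path


def same_object(path_a: str, path_b: str) -> bool:
    """Compare two request paths after fully percent-decoding both."""
    return fully_decode(path_a) == fully_decode(path_b)
```

A depth of 2 or more for the path sent by one client, against 0 or 1 for the other, would point at the path construction rather than the emulator. Note that an object name containing a literal `%` would be over-decoded by this check.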
Just to be sure, I tried it with just "test" as well (after reuploading the file as "test" of course), unfortunately with the same result
|
Can you run |
Within the container I get:
INFO[0028] 172.17.0.1 - - [11/Jan/2024:13:10:12 +0000] "HEAD /bla/test HTTP/1.1" 404 10
The image I'm using is just the one I got from:
|
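When comparing what different clients actually sent, it can help to pull the method, path, and status out of access-log lines like the one quoted above. This is a rough Python sketch; the regular expression and the `parse_access_line` name are assumptions based on that single sample line, not a documented fake-gcs-server log format.

```python
import re
from typing import NamedTuple, Optional

# Matches the quoted request line followed by status code and response size,
# e.g. "HEAD /bla/test HTTP/1.1" 404 10
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)


class AccessEntry(NamedTuple):
    method: str
    path: str
    status: int
    size: int


def parse_access_line(line: str) -> Optional[AccessEntry]:
    """Extract method, path, status and size from one log line, or None."""
    match = ACCESS_RE.search(line)
    if match is None:
        return None
    return AccessEntry(
        method=match["method"],
        path=match["path"],
        status=int(match["status"]),
        size=int(match["size"]),
    )
```

Filtering the parsed entries on `status == 404` and printing `method` and `path` side by side for the fsspec and object_store requests makes path or bucket mismatches easier to spot.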
Based on my experience with For example, I test with fake-gcs-server via this docker-compose file:
services:
  fake-gcs-server:
    image: fsouza/fake-gcs-server:${FAKE_GCS_SERVER_VERSION:-1.47.7}
    container_name: fake-gcs-server
    ports:
      - "${MAP_HOST_FAKE_GCS_SERVER:-127.0.0.1}:4443:4443"
    volumes:
      - fake_gcs_server_data:/data/sample-bucket <------------ look here!
    command: -scheme http
volumes:
  fake_gcs_server_data:
|
My only guess is that the bucket doesn't exist. The CI creates it via a curl request, but mounting a path may also work, as @Xuanwo suggests. I don't know why fsspec would work, but possibly it buffers to the local filesystem or automatically creates buckets. The only other option I can think of is that the container isn't running the image we think it is, but that is a bit of a stretch. |
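For anyone who prefers to create the bucket from test code rather than with curl, the request can be sketched in Python as below. This assumes the emulator accepts the GCS JSON API bucket-insert route (`POST /storage/v1/b` with a JSON body naming the bucket) without a project parameter; the helper names, base URL, and bucket name are placeholders, not values taken from the CI configuration.

```python
import json
import urllib.request


def build_create_bucket_request(base_url: str, bucket: str) -> urllib.request.Request:
    """Build (but do not send) a bucket-creation request for the emulator."""
    # Assumption: JSON API bucket-insert route exposed by the emulator.
    url = base_url.rstrip("/") + "/storage/v1/b"
    body = json.dumps({"name": bucket}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def create_bucket(base_url: str, bucket: str, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code.

    Raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = build_create_bucket_request(base_url, bucket)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `create_bucket("http://localhost:4443", "test-bucket")` before the first read would rule out a missing bucket as the cause of the 404, provided the emulator is listening on that address.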
Tried out the explicit mount @Xuanwo mentioned, but no difference (I was creating the bucket explicitly before, so I don't think that was the issue, but I wanted to try everything). Perhaps I'm somehow still not using the right image... however, I went to the CI of this project, picked out the hash of the last CI run and used that, still with the same result. Naively trying, I can't really get any of the XML API endpoints (i.e. |
Could you perhaps provide the logs of the container? Another option is that the requests are actually going to a different instance of fake-gcs-server. |
Sure, thanks for taking so much time trying to help on this
btw. this is the docker container running:
|
Can you instead start the server with this exact command, same as in the CI configuration
And communicate with it over 4443 within your application. When I run the image using the arguments you provide, they do not appear to be having the intended effect |
It's working now!! For anyone else that might be having this problem: The For anyone who is doing this using polars and fsspec: the Thank you for the patience and time in particular on this issue but also for your work on arrow-rs in general! |
Which part is this question about
object_store, gcs interface
Describe your question
Does object_store support the STORAGE_EMULATOR_HOST environment variable for google cloud (or any other way of setting the google cloud endpoint for emulation support)?
Additional context
I am using object_store within the polars library and would like to use the google cloud storage emulator endpoint so that I can run tests on our code.
I am testing applications that use GCS with fake-gcs-server (https://github.com/fsouza/fake-gcs-server). However, I can't find a way to use it with object_store.
I was previously using Python's fsspec (https://filesystem-spec.readthedocs.io/en/latest/) and the gcs client directly, both of which support this. Now that polars uses object_store exclusively as its cloud access layer, this is no longer possible.
Is there any recommended solution for this? Or is this just missing functionality in object_store?
The previous workflow is to run:
docker run -d --name fake-gcs-server -p 127.0.0.1:9090:9090 -v ${PWD}/examples/data:/data fsouza/fake-gcs-server -scheme http -port 9090 -external-url http://127.0.0.1:9090
and then either set the environment variable STORAGE_EMULATOR_HOST or just configure the python fsspec client to use this emulator url.
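The environment-variable convention described above amounts to something like the following Python sketch: use STORAGE_EMULATOR_HOST as the endpoint when it is set, otherwise fall back to the public endpoint. `resolve_gcs_endpoint` is a hypothetical helper, and prepending `http://` when no scheme is given is my own assumption; this is not how object_store or fsspec implement it.

```python
import os
from typing import Mapping, Optional

DEFAULT_ENDPOINT = "https://storage.googleapis.com"


def resolve_gcs_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the emulator URL from STORAGE_EMULATOR_HOST, else the default."""
    env = os.environ if env is None else env
    host = env.get("STORAGE_EMULATOR_HOST", "").strip()
    if not host:
        return DEFAULT_ENDPOINT
    # Assumption: a bare host:port refers to a plain-HTTP local emulator.
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")
```

With the docker command above, setting `STORAGE_EMULATOR_HOST=http://127.0.0.1:9090` would make this resolve to the emulator, and the result could then be passed to whichever client accepts an endpoint override.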
Here is the related polars issue where it was suggested I come here: pola-rs/polars#13085 (comment)