The general 'wire-up' to the data issue ... #26

Open
2 tasks
ddkapan opened this issue Mar 23, 2022 · 8 comments
Labels
config Configuration management

Comments

@ddkapan
Collaborator

ddkapan commented Mar 23, 2022

We need to work out some best practices and then implement them to wire our code to the data.

@matt-har-vey
Collaborator

Service account as a way of accessing Google Drive is one alternative to consider.

@ddkapan
Collaborator Author

ddkapan commented Mar 26, 2022

Thanks @matth79. Does a service account cost money?

@matt-har-vey
Collaborator

I believe a service account costs nothing if you create it in a cloud project for which billing is disabled, though I'm not 100% sure of that.

However, now that you have personal user accounts working for everyone, sharing a common credentials file doesn't seem necessary or best. It was also harder to understand than the user OAuth token method.

If we end up with automated processes that need to access Drive, service accounts would be the right tool for that.

I did create one and verified that it works for our scripts by adding a parameter to drive_auth:

drive_auth(path='service.json')

matt-har-vey added the config (Configuration management) label Mar 26, 2022
@matt-har-vey
Collaborator

Specific cases include

#54 #23

@matt-har-vey
Collaborator

I tried using symlinks for data dependencies in this commit.

It works, but it made a mess of .gitignore.

If we want the symlinks, we should consider dropping the .gitignore lines that end in "input/", except for those covering "original" human-created data like aru2point.csv, latlong.csv, etc. Once a pattern ending in a slash ignores a whole directory, Git cannot re-include anything underneath it.

A good convention would be that chunk C only ever reads input from C/input. If that input is the output of another chunk B, then the commit that adds the dependent code to C also needs to add a symlink from C/input into B/output, similar to the commit linked above.

If people adhere to that and we don't .gitignore input/, then "git status" will remind them which symlinks they need to add, if they have already created them locally.
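One way to make that convention easy to audit is to list the symlinks a chunk's input directory contains and where each one points. The following is a minimal Python sketch, not code from this repository: the function name `list_input_links` and the layout `<chunk>/input` are assumptions taken from the description above.

```python
import os
from pathlib import Path


def list_input_links(chunk_dir):
    """Map each symlink in <chunk_dir>/input to the raw target it points at.

    Regular files (for example hand-made CSVs) are skipped, so the result
    shows only the inputs that come from somewhere else.
    """
    input_dir = Path(chunk_dir) / "input"
    return {
        entry.name: os.readlink(entry)
        for entry in sorted(input_dir.iterdir())
        if entry.is_symlink()
    }
```

Calling `list_input_links("C")` from the repository root would then show, for each linked input of a hypothetical chunk C, the path it was linked to, which should land in some other chunk's output directory if the convention is being followed.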

@ddkapan
Collaborator Author

ddkapan commented Aug 17, 2022

I actually ignored Matt's suggestion: I pushed a .gitignore on a branch and merged the pull request. This is because Mark Schulist confirmed (elsewhere) that the symlinks need to be generated separately for each user, so synchronizing them through GitHub wasn't going to work anyway. Is that correct, @mschulist? What do you think, @matt-har-vey?

@mschulist
Collaborator

mschulist commented Aug 17, 2022

> Symlinks need to be generated especially for each user so synchronizing them with GitHub wasn't going to work anyway.

Yes, this is true. We can include a line of code that will (re)create them. We could also just reference the outputs from another script, but that makes it more difficult to see what the inputs are for a given script.
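As a rough illustration of code that (re)creates the links, here is a minimal Python sketch. It is not the project's actual script: the function name `recreate_symlinks`, the mapping format, and the example paths are all hypothetical. It assumes each user runs it against their own checkout, so the absolute targets are computed per user rather than stored in Git.

```python
from pathlib import Path


def recreate_symlinks(repo_root, links):
    """(Re)create each link -> target pair and return the link paths written.

    `links` maps a link path to a target path, both relative to repo_root.
    Targets are made absolute from this user's checkout location.
    """
    root = Path(repo_root).resolve()
    written = []
    for link_rel, target_rel in links.items():
        link = root / link_rel
        target = root / target_rel
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # replace a stale or dangling link
        elif link.exists():
            # Never overwrite real data that happens to sit at the link path.
            raise FileExistsError(f"{link} exists and is not a symlink")
        link.symlink_to(target, target_is_directory=target.is_dir())
        written.append(link)
    return written
```

For example, `recreate_symlinks(".", {"C/input/b.csv": "B/output/b.csv"})` would link a hypothetical chunk C's input to chunk B's output. Keeping the mapping in one place is also one way to keep the inputs of each script visible. Running it twice is harmless, since existing links are simply replaced.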

@ddkapan
Collaborator Author

ddkapan commented Aug 23, 2022

  • Go over symlink creation code for each occurrence of an /input/ directory in the codebase to finish this
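To get a checklist for this task, something like the following Python sketch could enumerate every directory named input in a checkout. The function name `find_input_dirs` is hypothetical, and skipping `.git` is an assumption about what should be excluded.

```python
from pathlib import Path


def find_input_dirs(repo_root):
    """Return every directory named 'input' under repo_root, as relative paths."""
    root = Path(repo_root)
    return sorted(
        p.relative_to(root)
        for p in root.rglob("input")
        # Keep directories only, and ignore anything inside Git's own metadata.
        if p.is_dir() and ".git" not in p.relative_to(root).parts
    )
```

Each returned path is one place where the symlink creation code needs to be reviewed.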

3 participants