Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to retrieve example dataset paths #1763

Closed
wants to merge 27 commits into from

Conversation

echedey-ls
Copy link
Contributor

@echedey-ls echedey-ls commented Jun 7, 2023

Done:

I'll wait for milestone assignment:

  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

Adds a get_example_dataset_path() function to pvlib.tools that returns a filepath to a dataset bundled with pvlib. See linked issue above.

Here is the documentation link [UPDATED 2023-06-18].

@echedey-ls echedey-ls marked this pull request as ready for review June 7, 2023 11:17
@echedey-ls
Copy link
Contributor Author

This issue is related to #1056. However, I haven't addressed any of it since in first place I don't know which purpose each file has.
Tweaking this new function would leverage work put into changing the structure.

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @echedey-ls :)

there isn't a tools.rst file, so I believe it doesn't need referencing.

It's true that we don't usually document the functions in pvlib.tools, but in this case I think we probably should so that curious users can see what the function does and doesn't do. How about a new entry list at the bottom of the iotools page, something like this:

Functions for locating the example data files included in pvlib.

.. autosummary::
   :toctree: generated/

   tools.locate_example_dataset

If we do that, the function will need a complete docstring (parameter descriptions, etc).

pvlib/tests/test_tools.py Outdated Show resolved Hide resolved
pvlib/tools.py Outdated Show resolved Hide resolved
pvlib/tools.py Outdated Show resolved Hide resolved
docs/examples/soiling/plot_greensboro_kimber_soiling.py Outdated Show resolved Hide resolved
echedey-ls and others added 6 commits June 8, 2023 23:24
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
@echedey-ls
Copy link
Contributor Author

echedey-ls commented Jun 9, 2023

Here is the documentation link [UPDATED].

@echedey-ls echedey-ls changed the title Add dataset function Add function to get dataset paths Jun 9, 2023
@echedey-ls echedey-ls changed the title Add function to get dataset paths Add function to retrieve example dataset paths Jun 9, 2023
@kandersolar kandersolar added this to the 0.10.0 milestone Jun 9, 2023
@kandersolar kandersolar mentioned this pull request Jun 9, 2023
@echedey-ls
Copy link
Contributor Author

echedey-ls commented Jun 10, 2023

This should be ok now.
Please have a look at the docs. I've created a table to leverage work on the other issue. Maybe it's better to position it on another paragraph.

Edit: oh, and I ended up adding the parameter, since it won't really impact the use, someone more experienced may be able to organize the data folder.

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! @echedey-ls can you resolve the merge conflict? After that I think this will be ready to go.

Maybe @wholmgren would like to review as well.

@wholmgren
Copy link
Member

There's an importlib-resources backport for older python. We already use this approach for importlib-metadata for python < 3.8. This realpython example might be helpful too.

I think your original list of pros to a function are still valid. There the focus is on a function for example code and maybe user onboarding that separates public resources from package structure. If the user-facing function only does that then it might be a local win. On the other hand, it can also be good to propagate standard python patterns in example code so it's less clear that it's a global win. I think the internal code should keep it simple and stick to importlib-resources.

@echedey-ls
Copy link
Contributor Author

I understand the issue. We can limit the scope of this function until #1056 is addressed.
I agree with @wholmgren criteria on that this PR should only satisfy user needs. I will revert unrelated changes.

I suggested including test in the function name in an attempt to prevent confusing it for something like an iotools.get_ function

@kandersolar what is your opinion on renaming it to get_example_dataset_path (or get_pvlib_example_dataset_path, get_bundled_dataset_path, maybe consider other synonym of 'bundled')?

I'd like to bring attention to two points:

  • Importlib imports the whole module with importlib.resources.files('pvlib'). I don't know how much overhead that introduces compared to using __path__[0] or __file__, but I expect it to be a little higher. Not much, but just to consider the micro optimization.
  • Tests are being shipped with pvlib when it's installed. Is that on purpose?

echedey-ls and others added 6 commits June 18, 2023 19:35
Co-Authored-By: Kevin Anderson <57452607+kandersolar@users.noreply.github.com>
Co-Authored-By: Will Holmgren <william.holmgren@gmail.com>
@echedey-ls
Copy link
Contributor Author

This should be fine to go. Renamed to get_example_dataset_path and use cases are only in the examples.

@kandersolar
Copy link
Member

Importlib imports the whole module with importlib.resources.files('pvlib'). I don't know how much overhead that introduces compared to using __path__[0] or __file__, but I expect it to be a little higher. Not much, but just to consider the micro optimization.

It seems to use the same sys.modules cache as import does, so as long as we are assuming pvlib is already imported, or will be imported later, then I don't think there is any significant penalty.

Tests are being shipped with pvlib when it's installed. Is that on purpose?

Yes, see #473. There is a non-pvlib discussion here on whether including tests in sdists and wheels is a good idea in general (tl;dr opinions vary).

@echedey-ls
Copy link
Contributor Author

That's good to know @kandersolar !

@kandersolar kandersolar modified the milestones: 0.10.0, 0.10.1 Jun 28, 2023
@kandersolar kandersolar modified the milestones: 0.10.1, v0.10.2 Jul 6, 2023
@echedey-ls echedey-ls mentioned this pull request Aug 3, 2023
20 tasks
@kandersolar kandersolar modified the milestones: v0.10.2, v0.10.3 Sep 21, 2023
@kandersolar kandersolar modified the milestones: v0.10.3, v0.10.4 Dec 20, 2023
@kandersolar kandersolar modified the milestones: v0.10.4, Someday Mar 18, 2024
@echedey-ls echedey-ls marked this pull request as draft June 20, 2024 13:51
@echedey-ls
Copy link
Contributor Author

Closing this due to inactivity. If there's interest in the future, feel free to ping me.

@echedey-ls echedey-ls closed this Jul 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add function to retrieve example datasets
4 participants