Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support incremental appending #81

Closed
wants to merge 2 commits into from

Conversation

davidbrochart
Copy link
Contributor

@codecov-io
Copy link

codecov-io commented Mar 6, 2021

Codecov Report

Merging #81 (8cf3f23) into master (e2188f7) will increase coverage by 1.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #81      +/-   ##
==========================================
+ Coverage   91.36%   92.39%   +1.02%     
==========================================
  Files           5        5              
  Lines         359      368       +9     
  Branches       49       51       +2     
==========================================
+ Hits          328      340      +12     
+ Misses         26       22       -4     
- Partials        5        6       +1     
Impacted Files Coverage Δ
pangeo_forge/recipe.py 91.73% <100.00%> (+1.56%) ⬆️
pangeo_forge/storage.py 89.70% <0.00%> (+0.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2188f7...29d5fd0. Read the comment docs.

self.processed_input_nb = len(self.processed_input_urls)
input_pattern = ExplicitURLSequence(self.processed_input_urls + self.input_urls)
self._inputs = {
k: {"url": v, "processed": k < self.processed_input_nb} for k, v in input_pattern
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of ExplicitURLSequence leaks here, as k would not necessarily be an incrementing index. We could instead have:

"processed": v in self.processed_input_urls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is not true, since self.input_urls are the new URLs (they don't contain the already processed URLs), which is actually different than for the NetCDFtoZarrMultiVarSequentialRecipe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could change self.input_urls so that it also includes the already processed URLs, this would make NetCDFtoZarrSequentialRecipe similar to NetCDFtoZarrMultiVarSequentialRecipe.

self._inputs = {k: v for k, v in self.input_pattern}
self.processed_input_nb = len(self.processed_input_urls) // len(self._variables)
self._inputs = {
k: {"url": v, "processed": v in self.processed_input_urls}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the input_pattern generates the input file URLs including the already processed ones (which is different than for the NetCDFtoZarrSequentialRecipe). This allows to tag the already processed URLs.
If the input_pattern didn't generate the already processed URLs, the pattern would have to know about the (number of) already processed URLs, which is not great.

@davidbrochart davidbrochart marked this pull request as ready for review March 6, 2021 21:54
@rabernat
Copy link
Contributor

Hi @davidbrochart - thanks so much for this PR! We really appreciate it! Sorry it has taken me so long to reply--I had a crazy week last week and wasn't able to spend much time on Pangeo Forge.

I'm going to try to review this in detail in the next few days. I'm a bit concerned about the intersection with #78, but we will see.

@rabernat
Copy link
Contributor

David thanks for working on this. We really appreciate your efforts.

Closed this because the code base has evolved quite a bit since this PR started. I plan to work on this feature in January and will use this PR as a basis for that work.

Appreciate your patience with this high-churn code base. It hasn't been trivial to get the base abstractions right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants