Higher tmax / tmin for new driver data #41

jzwart · 2020-05-11T20:34:13Z

Copying issue from delaware-water-temp repo.
The new driver data (2017 - 2019) that was appended (# onto old drivers (1980-2016) seems to be shifted slightly higher for both tmin and tmax.

Less obvious is there is an issue with the precipitation. Elevated annual precipitation in 2018 but not shifted up like tmin / tmax

jzwart · 2020-05-12T01:31:24Z

Might be an anomaly though as most segments seem to have fairly consistent drivers, except for a few segments (e.g. 2045 - 2047 decrease in tmax_air during 2017-2019)

seg_id_nat == 2047 below

jzwart · 2020-05-12T01:33:32Z

Similar patterns for tmin_air

hcorson-dosch-usgs · 2020-05-27T17:29:11Z

Here's what the data look like spatially, if the differences between new and old values are averaged over 2015-2016 (the period for which we have overlap between the driver data pulled previously and the driver data I pulled recently)

limnoliver · 2020-05-27T18:01:48Z

So I think this is telling us it doesn't have to do with differences basin weights, right? Since we'd expect uniformity in pattern across temp and precip? It also suggests there's not just a few bad segments where something got scrambled.

jzwart · 2020-05-27T18:43:40Z

I think that's right, Sam, re: weights. The clustering of differences (high or low) are a bit odd and it almost seems like it's matching data from somewhere else

aappling-usgs · 2020-05-27T19:27:44Z

Could there be some confusion between seg_id_nat and seg_id somewhere in the whole process? (Fairly wild guess - you three are much deeper in to the sleuthing than I am)

jzwart · 2020-05-27T20:35:55Z

We're looking into the indexing in the code, which could be similar to a seg_id_nat vs seg_id mismatch.

hcorson-dosch-usgs · 2020-05-27T21:38:11Z

Ok new figure showing 2015-2016 average June-August tmax and tmin throughout the basin, based on both the previously pulled driver data (left) and the newly pulled driver data (right). Based on this, Jake and I are thinking it is an indexing issue, likely due to a work-around I put in place since we were missing one csv file with ids. I'm going to email Rich McDonald to see if we can get that file and try re-running the code.

hcorson-dosch-usgs · 2020-05-29T18:53:15Z

Ok - quick update. Rich McDonald pointed me to his new repo, with cleaned-up code and a revised conversion script. I pulled data for 2015 using the new repo and code, and we're still seeing disagreement between the original driver data and the newest data (which I'm referring to as gridmetetl, after the name of the repo, to distinguish it from the data I pulled earlier this spring [and plotted up above] using the code in the onhm-fetcher-parser repo).

Unfortunately, the newly pulled 2015 gridmetetl driver data (here denoted as 'new) also does not match the original 2015 driver data (denoted as 'old):

Here's a comparison of 2015 gridmetetl tmin data and original tmin data for segment 2047:

The gridmetetl driver data show a nearly identical spatial pattern to the onhm-fetcher-parser driver data (see above). It seems like there is still an indexing issue.

Interestingly, the gridmetl precipitation data and onhm-fetcher-parser precipitation data are identical, while the gridmetetl tmin and tmax data are consistently 0.33 - 0.39 degrees celsius higher (0.33-0.35 on average) than the onhm-fetcher-parcer tmin and tmax data.

For a given segment, the difference between gridmetl and onhm tmin does not appear to be related to the difference between gridmetetl and onhm tmax:

Next steps: Reach out to Rich to report on how implementing the new code went (well), and how the new data looks (same issues as before)

hcorson-dosch-usgs · 2020-05-29T22:25:11Z

Did a spot check with observed weather data, per Sam's suggestion. Here are plots for Segment 2047, in Wilmington, DE:

Precipitation isn't very well matched by either driver data set, but the old driver data is clearly much closer to actual observed temperatures that the recently pulled driver data, at least for this segment

jzwart · 2020-06-01T15:49:55Z

From Rich:

I just checked, and the gridmet driver hru id is saved in the netcdf file ouput as hruid so you should be able to read that and incorporate it. They are ordered in the nectcdf file from 5308 - 7252. There is a total of 764 hrus. This is based on the shapefile that I was given for the Delaware.

That order is not the same as the .cbh HRU order which is based on model index. Will try reordering to model index and compare again.

hcorson-dosch-usgs · 2020-06-01T22:48:59Z

Ok - so it indeed comes down to a difference in how the .cbh files are indexed. There does not appear to be any indexing error in how the current set of scripts downloads netCDF files or in how those files are converted to .cbh. The issue we were seeing when mapping data to the segment level was that the 'new' driver data used a different set of indices than the old driver data, so our script to map values to segments based on the hru values, when supplied with the new driver data, was computing area weighted averages for the wrong hrus (thinking they were indexed based on model_idx [like the old driver data], when they were actually index based on hru-ids]. We didn't catch this before because the .cbh files actually don't have any header information, so there is no way to know how they are indexed without digging into their generation. We assumed that the old and new data were indexed the same way, when they actually weren't.

Note: Accounting for the indexing issues solves the majority of our problem, but there still appear to be some discrepancies between the old and new driver data (at least for the date shown: 2015-01-01). I'll dig more into that tomorrow. I can also regenerate the segment-level plots tomorrow

jordansread · 2020-06-02T00:08:40Z

🕵️‍♀️ Nice work!

hcorson-dosch-usgs · 2020-06-04T20:55:36Z

Ok - so I modified the ncf2cbh script to take the data imported from netCDF, re-indexed it to model_idx, and then export it as .cbh files.

At the hru-level, the re-indexed data matches the gridmetetl data I first downloaded (i.e., reindexing worked correctly), and now the 'old' (original 1980-2016 driver data) and 'new' (re-indexed driver data pulled with the gridmetetl script) driver data are both using model_idx to index the hrus. As I mentioned in my last post, there are clearly still some discrepancies between the old and new driver data -- the plots below dig into this more:
Here - January 1st, 2015

Differences notable here, on June 1st, 2015:

I then used the re-indexed .cbh files as input to Jake's script to derive segment-level driver data. Here are a series of plots comparing the 'new' driver data to the 'old' driver data . Still some weird things happening @jzwart any thoughts?:

But these time-series plot for segments look much better than before (though there are clearly discrepancies):
2047 - tmin

1830 - tmax

1634 - tmin

Here are the segment level plots - Differences between 2015-2016 year-round averages for each segment. For themajority of segments the data is different between the two datasets, but the magnitude of the differences is much smaller:

2015-2016 June-August average for each segment. Looks much better than before, in that the new driver data now shows the N-S gradient. However note that the ranges of temperatures differ somewhat:

Then I re-did the weather data plots for segment 2047. These look weird (esp. tmin), so I'm going to pull some weather data for segments for which driver data differs substantially between the 'old' and 'new' data sets (based on the segment plots, above), and plot that up as well.
Tmin almost looks shifted by a day, but that isn't reflected in the other two plots....

jzwart · 2020-06-04T21:11:13Z

Nice job on this, @hcorson-dosch ! It would be good to know how different the HRU driver data is too, in addition to the segment driver data, just to rule out any issues with the function that grabs driver data for each segment. A quick comparison by reading in the old and new .cbh files and then plotting column 8 (for example) from each .cbh file against each other would be a quick test.

Maximum temperature looks pretty good, but it's strange that tmin seems adjusted one-two days before the other data. Is that the case for every segment?

Glad that the mean differences are much better now, that's encouraging 👍

rmcd-mscb · 2020-06-04T23:15:42Z

@hcorson-dosch I agree with Jake, The direct link between the Gridmet data as pulled and the Delaware model is the HRU,

Also there have been some changes to the raw Gridmet data that could potentially result in differences between what currently pulled and the original data: http://www.climatologylab.org/gridmet.html

Look at the Updates tab

hcorson-dosch-usgs · 2020-06-05T13:15:40Z

Thanks for the link re: gridmet updates @rmcd-mscb! I'll make some hru-level plots today

hcorson-dosch-usgs · 2020-06-09T23:42:57Z

Alright - big update here. Unfortunately, it's mostly more confusing plots.

I dug into hru-level comparisons, and it turned up some clear discrepancies between the 'old' (original 1980-2016 driver data) and 'new' (re-indexed driver data pulled with the gridmetetl script) driver data.

Some scatter plots - data for all 765 hrus:

In some months, data is relatively similar:

Particularly weird patterns in other months, esp. Feb 2016

If we plot data for all of the days in a given month (random colors), for all of the hrus, the oddities really show up:

I took a look at the difference between the datasets over time (for all hrus - layered with partial transparency), and it's striking how large the differences are on certain days (even though they averaged out to differences of +/- 1 degree over 2015-2016, as seen above). Note that Feb 2016 is particularly strange:

Here's a close up of spring 2016:

I plotted the differences between the maximum temperatures in each set of driver data for each hru spatially (sorry, I didn't note that it's tmax in the plot titles) - there doesn't seem to be a consistent pattern:

Even when we look at the absolute value of the differences in tmax between the two driver data sets:

So then I plotted the maximum temperature data side by side spatially for a few days:

Interestingly, the spatial pattern matched better if the new data was compared to one day ahead in the old data:

So then I went back and updated some of those plots to shift the reindexed gridmetel data by +1 day. Things look a little tighter, but still different, and something strange is still happening in Feb 2016:

SO clearly some odd things happening: maybe some shifted data, and then really big differences in Feb 2016.

Jake suggested that I do some more comparisons to weather data, and Sam suggested that I also compare to raw gridmet data.

So I did both, picking 3 hru/segment locations. I chose data from Jan-Mar 2016 for two, and data from May 2015 for the other one:

First- Jan-Mar 2016 for segment 1493 and hru model_idx 282, near Trenton, NJ:
Original Tmax -Reindexed Tmax, with hru highlighted:

Comparison of raw, original, reindexed, and shifted reindexed driver data:

Comparison to weather data:

Second- Jan-Mar 2016 for segment 1722 and hru model_idx 575, near Allentown, PA:
Original Tmax -Reindexed Tmax, with hru highlighted:

Comparison of raw, original, reindexed, and shifted reindexed driver data:

Comparison to weather data:

Lastly - May 2015 for segment 2041 and hru model_idx 748, near Wilmington, DE:
Original Tmax -Reindexed Tmax, with hru highlighted:

Comparison of raw, original, reindexed, and shifted reindexed driver data:

Comparison to weather data:

So -- a lot to think over. From all of this, I think the biggest takeaway is that the raw gridmet data matches the reindexed gridmetal data (👍 b/c that means the pull script is working correctly), BUT that means that the raw gridmet data also does not match weather data in February 2016 (👎 b/c I have no idea what could be messing up all the data for that whole month). Also - in some places the reindexed driver data look like they would be better if shifted by one day (e.g. tmin in May 2015 for Wilmington, DE and tmin and tmax in March 2016 for Trenton and Allentown), while in other spots (e.g. tmin and tmax in January 2016 for Trenton and Allentown), the raw gridmet data and therefore reindexed driver data appear to match weather data quite well. It doesn't make sense to me that only some of the data in some locations would need to be shifted. And I have no idea what could be happening in February 2016. 2016 was a leap year, but I don't see why that would throw the whole month of data off.

Thoughts??

Let me know if there's anything else I should plot. At this point I think I need to pause and get some input/other people's ideas before I do any more digging, since I'm just getting more and more confused by the data.

hcorson-dosch-usgs · 2020-06-09T23:45:43Z

Meant to mention I pulled the raw data in netCDF format from the gridmet site using the Aggregated THREDDS Catalog (OPENDAP)

jzwart · 2020-06-10T11:01:17Z

WOW @hcorson-dosch! This is a ton of work, and thanks for pulling together all these comparisons.
I agree with your summary paragraph, and it also seems like the original driver data (given to us with original PRMS-SNTemp cutout) is not gridmet data, right? Since it doesn't match to the re-indexed gridmetetl or gridmet pulled directly from THREDDS.

The shifted +1 day in some locations and times is really strange to me, while other time periods it actually matches really well. I guess I'm naive about whether this is an issue with gridded met data in general or something specific to gridmet. I know there are some biases in NLDAS data when compared to weather data, for example, but those seemed to be a consistent offset rather than the messiness that is happening here.

I think it's encouraging that the original driver data matches the weather data fairly well. It would be good to know how that was created and the source of the driver data.
These are my initial thoughts and it'd be good to hear from others

limnoliver · 2020-06-10T12:45:20Z

Oooof. On one hand, bummer that there are multiple issues, but I think you're narrowing in! I'm not sure what to do/think about the Gridmet data -- and it would be difficult to (comprehensively) know which segments or what time periods to shift by a day. Maybe as one last check of the raw data, download directly from Gridmet instead of THREDDS?

Hayley - one way to view error might be to look at cumulative error (either from weather data or from old driver data) to get a sense for how bad it is across reaches. So for your three reach examples, sum absolute error every day (of reindexed and offset data), and then plot cumulative error through time? This might help identify the time periods that are offset, and should ID February as a true outlier.

@jzwart how are you feeling about this now? Where do we go from here if the Gridmet data is "bad"?

Edited - oops, sorry Jake, I had not refreshed before I commented! Thanks for your input. I will ask Steve about the origins of the driver data.

limnoliver · 2020-06-10T12:59:33Z

This pub from 2018 about NHM says it was configured with Daymet.

jordansread · 2020-06-10T13:21:34Z

Only because it seems oddly reminiscent of a similar offset issue we had in the past that was based on how the metadata was used on OPeNDAP, sharing this: https://github.com/USGS-CIDA/stream_metab_usa/issues/68

hcorson-dosch-usgs · 2020-06-10T13:57:52Z

Thanks all. @jread-usgs that's really interesting that you and Dave saw a similar issue with OPENDAP and reading data in from .netcdfs. Something similar could be happening here.

@limnoliver I did download some data directly from gridmet, but couldn't get it to read in correctly using two different netcdf packages in Python that I've been using. Sometimes all the temp values were NaN, and in other cases the dates were 1 day off, and then in others the dates were way off. So that might all be tied back to the issue Jordan pointed to. Perhaps I should try again, but read it in using a different package, or try reading it in in R? That's also a good idea to look at cumulative error.

I'll plan to dig back into this tomorrow.

rmcd-mscb · 2020-06-10T14:03:13Z

I think it's important to compare the HRU data rather than the segment data, because Gridmet and Daymet are specifically mapped to the HRU.

Also, you should be able to read that raw gridmet data directly into xarray.

I'll also double check the gridmetetl code to make sure there are no indexing errors.

hcorson-dosch-usgs · 2020-06-10T14:19:31Z

Hi @rmcd-mscb thanks for double-checking the gridmetetl code.

All of the plots in my latest comment plot the HRU data, not the segment-level data except for nine of the plots near the end that compare weather data to driver data derived at the segment level (as well as raw gridmet data at the grid cell [lat, long] level).

I was using xarray to read in the netCDF data. It worked great for the netCDF files generated by your script and for the netCDF files I pulled from THREDDS, but for some reason when xarray read in raw netCDFs directly from gridmet I was running into errors. For the individual day netCDFs, the dates were incorrect:

And for the annual netCDFs I tried, xarray could not read in the temperature variable correctly, giving an error that it had an unsigned attribute but was not of integer type:

I also tried the Dataset module from the netCDF package, but was getting null temperature values from that as well. Since I could read in the netCDF data from THREDDS just fine with xarray, I used that data and didn't really dig into those issues, but I'll now try to troubleshoot why the raw gridmet data wasn't being read correctly by xarray.

rmcd-mscb · 2020-06-10T19:27:42Z

Hi Hayley,
That is a lot of work! And to be honest it makes my head spin a little :) I think the xarray to_dataframe() is maybe giving you issues with dates. If you were to just look at your xarray dataset, for the annual data ( that is all I've ever used in my testing) you still get that use_cftime error but I think it's just a matter of setting the use_cftime parm. And I don't see in the attributes what calendar there are using sp not sure how to set that.

Anyway look at this notebook here: https://nbviewer.jupyter.org/github/nhm-usgs/gridmetetl/blob/delaware/Examples/Delaware_xarray.ipynb

you should be able to extract the data. If you know the lat/lon of your weather station you can grab it from xarray with something like this: ds.sel(lat=-35.28, lon=149.13, method='nearest'). If you now what hru the weather station lies in, you could plot weather station data against the climate data for that HRU rather than the segment. It won't be perfect remember GM is 4km resolution

HOWEVER - I think I found the offset error. If you look at the attributes you will see the note below which I've never noticed before - Days are in MST! Also note it says approximately

note5 :
Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day)

Finally, if the original forcing data is Daymet, Maybe Parker could pull you a new bandit version with updated daymet data. 2019 Daymet data just became available.

hcorson-dosch-usgs · 2020-06-10T19:38:17Z

@rmcd-mscb Thanks for all your work digging into this! I'll take a look at your notebook tomorrow, and will try addressing the offset you pointed out to see if that fixes our issues. I'll also add the weather data to the hru-level plots that compare the hru-level driver data to the raw gridmet data I had extracted from the grid at the location of the weather data.

hcorson-dosch-usgs · 2020-06-15T23:13:47Z

Ok - I was able to read in the raw gridmet data - It had read in correctly before, but appeared to be null until I specified a particular date and/or grid cell for which to view data. Not sure why. I've updated the HRU-level plots to include the raw gridmet data alongside the THREDDS gridmet data and the 'old' (original 1980-2016 driver data) and 'new' (re-indexed driver data pulled with the gridmetetl script) driver data, as well as the 'new' driver data shifted by +1 day. I've included plots for three locations (Wilmington, DE; Trenton, NJ; Allentown, PA) for three periods (June 2015; January 15 - March 15, 2016; October 2016). Note that the driver-data values for each HRU are generated by weighting the gridded data for all cells that overlap each HRU, so may not match the raw gridmet data exactly.

Tmax

Overall, there is very good agreement between the raw gridmet data, the reindexed driver data, and weather data for all of the periods and locations, except for February 2016, when the raw gridmet data and reindexed driver data do not track the weather data at all.

June 1-30, 2015
Wilmington, DE

Trenton, NJ

Allentown, PA

Jan 15 - Mar 15, 2016
Wilmington, DE

Trenton, NJ

Allentown, PA

October 1-31, 2016
Wilmington, DE

Trenton, NJ

Allentown, PA

Tmin

With these plots that are temporally zoomed in (relative to my past plots), it appears that the raw gridmet data and the reindexed driver data are pretty consistently offset from the weather data for all of the periods and locations. It looks like the offset is ~1 day. As for Tmax, the raw gridmet data and reindexed driver data do not track the weather data at all in February 2016.

June 1-30, 2015
Wilmington, DE

Trenton, NJ

Allentown, PA

Jan 15 - Mar 15, 2016
Wilmington, DE

Trenton, NJ

Allentown, PA

October 1-31, 2016
Wilmington, DE

Trenton, NJ

Allentown, PA

Precipitation

Overall, the raw gridmet data and the reindexed driver data are pretty consistently offset from the weather data for all of the periods and locations. It looks like the offset is ~1 day. As for Tmax and Tmin, the raw gridmet data and reindexed driver data do not track the weather data at all in February 2016.

June 1-30, 2015
Wilmington, DE

Trenton, NJ

Allentown, PA

Jan 15 - Mar 15, 2016
Wilmington, DE

Trenton, NJ

Allentown, PA

October 1-31, 2016
Wilmington, DE

Trenton, NJ

Allentown, PA

Overall, it appears that the actual raw gridmet data match the gridmet data pulled from THREDDS and the reindexed driver data pulled using Rich's gridmetetl repo (👍 ) but that suggests that the actual raw gridmet data are incorrect in February 2016 (👎 👎 👎 ). Also - the tmin and prcp data appear to be consistently offset from weather data by approximately 1 day (👎 ❓ ). Interestingly, the tmax data do not appear to be offset (😕 ❔).

I'm still pretty mystified by this offset. According to the documentation,

gridMET nominally considers a "day" to be midnight-to-midnight

And as @rmcd-mscb noticed, it also states in the metadata that:

Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day)

Here's how I am thinking about it, which may be incorrect. If so, please let me know!

The gridded data, driver data, and weather data are all daily data
The gridmet data and the driver data come in with a timestamp of 00:00:00
The weather data is imported with only a datestamp, but when converted to datetime by pandas in Python, the default time stamp is 00:00:00
For a given day, such as 1/1/2015, the gridmet data is for 12:00:00 AM MST to 11:59:59 PM MST.
This is equivalent to the following period in EST: 2:00:00 AM EST on 1/1/2015 to 1:59:59 AM EST on 1/2/2015
That still places the bulk of the day (22 hours) on the first day (here, 1/1/2015), so the gridmet data should still represent conditions that first day (e.g., 1/1/2015 rather than 1/2/2015)
Therefore, the gridmet data should not need to be shifted by a full day
The offset that is visible appears to be greater than a 2-hour difference between time zones could explain:

Does that thinking seem correct? Or am I missing something obvious?

If you are curious, here are descriptions of the metadata for the netCDF files on the Northwest Knowledge network: tmax, tmin, prcp. According to this documentation, they all have the same datetime format.

It seems odd that the tmax data is not offset, yet the tmin and prcp data do seem to be offset. Maybe there is some error in how datetime is being encoded for the tmin and prcp data?

Thoughts about the offset or what could be happening in February 2016?

jzwart · 2020-06-16T20:41:06Z

Thanks @hcorson-dosch! I agree with your assessment. The Feb 2016 issue is definitely strange and the offset that is dependent on data product / location / time of year is very confusing.
I'm also getting confused if the offset is in the right direction, and I think I've convinced myself it is correct -> the weather happening in the DRB today is given a timestamp of yesterday (e.g. MST) so the adjustment should be +1 and not -1 (I'm mostly writing this down for myself in case I get confused again :) ).
This doesn't help explain all the issues you point out such as only being 2 hours behind and not 24 hours behind, or why tmin and tmax don't both have a consistent offset. I really don't know what would cause this and I wonder if it's worth checking with gridmet folks to see if this issue has come up before

rmcd-mscb · 2020-06-19T15:38:02Z

Hi @hcorson-dosch and @jzwart

I've pointed this thread to Steve M, Parker N and Jacob LF, and it's on our list to look at early next week. They have more experience with this kind of data and correlations to weather station data, in the context of PRMS. In anticipation of showing this to the gridMET folks, can I suggest redoing the plots above by relabeling the data as:

Raw Gridmet data -> gridMET point interpolation
Original driver data -> Daymet HRU
Reindexed driver data -> gridMET HRU
Keep the weather station data
remove Thredds Gridmet data and the shifted data

Also note that the validation of gridMET data was made for stations in the Western US: https://webpages.uidaho.edu/jabatzoglou/PDF/IJOC_Abatzoglou_2012.pdf

I think it would be good to hear what the NHM group thinks first but then I'm happy to help and be involved in a conversation with the gridMET folks in anyway that would be helpful to you.

hcorson-dosch-usgs · 2020-06-19T18:16:17Z

Hi @rmcd-mscb thanks for looping in Steve, Parker, and Jacob.

After talking everything over with our stream temperature modeling team yesterday, we decided to reach out to Gridmet, so I sent John Abatzoglou and Katherine Hegewisch an email yesterday. I included some cleaned up figures that only showed the weather data and the raw Gridmet (gridMET point interpolation) data. The figures were selected to show:

An offset of ~1 day in the tmin and prcp gridded data, and
Deviations from observed weather data in February 2016 in the tmin, tmax, and prcp data.

Here are the three figures I sent, along with the text I included to explain what we were seeing:

The first two plots (‘AllentownPA_2015_06_Prcp’ and ‘WilmingtonDE_2016_10_Tmin’) illustrate the offset that we have observed in the 2015-2016 prcp and tmin data. In the prcp plot, the offset seems very consistent (~1 day), but in the tmin plot it appears that the offset may be less consistent. This offset is evident throughout 2015 and 2016 for the tmin and prcp data at all three sites. In general, it appears to be a fairly consistent offset of ~1 day (for both prcp and tmin). Interestingly, the 2015-2016 tmax data is not offset at any of the sites.

The third plot (‘TrentonNJ_2016_02_Tmax’) illustrates how the Gridmet tmax data deviates from weather data in February 2016. The deviations in February 2016 are observed at all three sites and for all three gridded products (tmin, tmax, and prcp). Note that the Gridmet tmax data from 1/1–1/31 and from 3/2–3/15 matches the weather data very well (without any offset).

Both John Abatzoglou and Katherine Hegwisch responded today, with some helpful information. John dig some digging into the Wunderground precipitation data to which I had compared the Gridmet data. By inspecting the hourly Wunderground data, he noticed that Wunderground does not accurately report daily precipitation totals, which was unknown to both him and to our team:

The precipitation offset piece is a bit more puzzling. So I looked at the Wunderground site since I’ve pulled data there as well. Keying in on the Lehigh airport on two days : June 20-21 2015. Their daily summary shows what you have, basically an inch on the 21st and nada on the 20th. But diving into the hourly reveals something a bit different
June 20
https://www.wunderground.com/history/daily/us/pa/allentown/KABE/date/2015-6-20
and then June 21st 2015
https://www.wunderground.com/history/daily/us/pa/allentown/KABE/date/2015-6-21
Looking closer on those day’s you’ll see they have daily summaries (from the previous 24 hours ending at weird times like 11am on one day, 5am the other day). Bottom line is that I’m not sure their function is given you a true day. I certainly had no idea before looking in. Quite odd.

He also had some thoughts about the tmin data offset:

The tmin shift might be a bit of an artifact of a ‘day’ as well. The ASOS stations are reporting a true day, ending midnight local, we use a 7Z local which would be a bit closer to 2am. Should the daily low happen midnight plus or minus a couple hours this could cause a difference. Most of the time daily lows are closer to sunrise.

And noted that he and his team would look into the February 2016 data:

There does seem to be something weird w/the February 2016 data that we’ll look into.

Katherine then chimed in, and provided some very helpful information about a previous user's experience reading the Gridmet NetCDF files using xarray:

Recently, a user communicated with me about a similar 1-day shift issue with our netcdf files which they were seeing using xarray and Pandas in python to extract/translate times from the netcdf files. After going back and forth with them, they discovered that pandas was throwing a warning:
RuntimeWarning: Converting a CFTimeIndex with dates from a non-standard calendar, 'julian', to a pandas.DatetimeIndex, which uses dates from the standard calendar. This may lead to subtle errors in operations that depend on the length of time between dates.
t = timeseries.indexes['time'].to_datetimeindex()
and they concluded that the 1-day shift likely was do with xarray->pandas. When I ran the same extraction in MATLAB or in python (using netcdf4 instead), my dates were different than theirs by 1 day and I did not see a 1-day shift w.r.t. PRISM.
Note that I can do your same comparison of the gridmet data to PRISM using app.climateengine.org and we see no shift in ppt/tmin.

I've downloaded new weather data (from NOAA), which I will use for updated comparisons. I am also going to read in the Gridmet data using netcdf4 instead of xarray, to see if that addresses the shift in tmin and prcp that we had observed. What's odd to me, however, is that I saw no shift in the tmax data, even though it was read in with xarray using exactly the same method as I used to read the other netCDF files.

I'll add new plots here as soon as I have them put together.

hcorson-dosch-usgs · 2020-06-22T18:41:09Z

Just a quick note to keep things up-to-date here. John Abatzoglou fixed the February 2016 data, so that issue has been resolved, which is great 👍. John's hunch about the apparent offset in prcp data being due to the faulty Wunderground data was correct. The NOAA weather data I downloaded aligns with the 2015 gridMET prcp data 👍. However, we are still seeing some temporal offsets in the updated 2016 data 👎. I've sent some updated plots to John and Katherine with some additional discussion and questions for them. Hopefully we can get to the bottom of the offset issue shortly.

limnoliver · 2020-06-22T19:08:17Z

Awesome work, @hcorson-dosch -- seems like we're getting close!

hcorson-dosch-usgs · 2020-07-23T15:28:05Z

Another quick update: since I last reported here John Abatzoglou fixed the temporal offsets in the updated 2016 tmax and prcp data. Following that fix, I downloaded the raw gridMET netCDF files for each year between 1980 and 2019. I also downloaded NOAA weather data. Comparisons of the gridMET and NOAA data show that the gridMET tmax and tmin data in February 2017 (and possibly in part of February 2019) deviate from recorded weather data -- similar to the deviations we'd observed in February 2016. I reached out to John Abatzoglou about these discrepancies. He is in the midst of transitioning their systems, but will fix those variables when the systems are stable again - likely later this month or in August. Once that data is updated we'll be ready to pull and process the 1980-2019 gridMET data and use it as a driver for a new SNTemp run.

Higher tmax / tmin for new driver data #41

Higher tmax / tmin for new driver data #41

Comments

jzwart commented May 11, 2020 • edited Loading

jzwart commented May 12, 2020 • edited Loading

jzwart commented May 12, 2020 • edited Loading

hcorson-dosch-usgs commented May 27, 2020

limnoliver commented May 27, 2020

jzwart commented May 27, 2020

aappling-usgs commented May 27, 2020

jzwart commented May 27, 2020

hcorson-dosch-usgs commented May 27, 2020

hcorson-dosch-usgs commented May 29, 2020 • edited Loading

hcorson-dosch-usgs commented May 29, 2020

jzwart commented Jun 1, 2020

hcorson-dosch-usgs commented Jun 1, 2020

jordansread commented Jun 2, 2020

hcorson-dosch-usgs commented Jun 4, 2020 • edited Loading

jzwart commented Jun 4, 2020

rmcd-mscb commented Jun 4, 2020

hcorson-dosch-usgs commented Jun 5, 2020

hcorson-dosch-usgs commented Jun 9, 2020

hcorson-dosch-usgs commented Jun 9, 2020

jzwart commented Jun 10, 2020

limnoliver commented Jun 10, 2020 • edited Loading

limnoliver commented Jun 10, 2020

jordansread commented Jun 10, 2020

hcorson-dosch-usgs commented Jun 10, 2020

rmcd-mscb commented Jun 10, 2020

hcorson-dosch-usgs commented Jun 10, 2020 • edited Loading

rmcd-mscb commented Jun 10, 2020

hcorson-dosch-usgs commented Jun 10, 2020

hcorson-dosch-usgs commented Jun 15, 2020 • edited Loading

Tmax

Tmin

Precipitation

jzwart commented Jun 16, 2020

rmcd-mscb commented Jun 19, 2020

hcorson-dosch-usgs commented Jun 19, 2020

hcorson-dosch-usgs commented Jun 22, 2020

limnoliver commented Jun 22, 2020

hcorson-dosch-usgs commented Jul 23, 2020 • edited Loading

jzwart commented May 11, 2020 •

edited

Loading

jzwart commented May 12, 2020 •

edited

Loading

jzwart commented May 12, 2020 •

edited

Loading

hcorson-dosch-usgs commented May 29, 2020 •

edited

Loading

hcorson-dosch-usgs commented Jun 4, 2020 •

edited

Loading

limnoliver commented Jun 10, 2020 •

edited

Loading

hcorson-dosch-usgs commented Jun 10, 2020 •

edited

Loading

hcorson-dosch-usgs commented Jun 15, 2020 •

edited

Loading

hcorson-dosch-usgs commented Jul 23, 2020 •

edited

Loading