Skip to content
This repository has been archived by the owner on Jun 1, 2023. It is now read-only.

Higher tmax / tmin for new driver data #41

Open
jzwart opened this issue May 11, 2020 · 35 comments
Open

Higher tmax / tmin for new driver data #41

jzwart opened this issue May 11, 2020 · 35 comments

Comments

@jzwart
Copy link
Member

jzwart commented May 11, 2020

Copying issue from delaware-water-temp repo.
The new driver data (2017 - 2019) that was appended (# onto old drivers (1980-2016) seems to be shifted slightly higher for both tmin and tmax.
image
image
Less obvious is there is an issue with the precipitation. Elevated annual precipitation in 2018 but not shifted up like tmin / tmax
image

@jzwart
Copy link
Member Author

jzwart commented May 12, 2020

Might be an anomaly though as most segments seem to have fairly consistent drivers, except for a few segments (e.g. 2045 - 2047 decrease in tmax_air during 2017-2019)
image

seg_id_nat == 2047 below

image

@jzwart
Copy link
Member Author

jzwart commented May 12, 2020

Similar patterns for tmin_air

image

@hcorson-dosch-usgs
Copy link
Contributor

Here's what the data look like spatially, if the differences between new and old values are averaged over 2015-2016 (the period for which we have overlap between the driver data pulled previously and the driver data I pulled recently)
2015_2016_DriverDataDiffs

@limnoliver
Copy link
Member

So I think this is telling us it doesn't have to do with differences basin weights, right? Since we'd expect uniformity in pattern across temp and precip? It also suggests there's not just a few bad segments where something got scrambled.

@jzwart
Copy link
Member Author

jzwart commented May 27, 2020

I think that's right, Sam, re: weights. The clustering of differences (high or low) are a bit odd and it almost seems like it's matching data from somewhere else

@aappling-usgs
Copy link
Member

Could there be some confusion between seg_id_nat and seg_id somewhere in the whole process? (Fairly wild guess - you three are much deeper in to the sleuthing than I am)

@jzwart
Copy link
Member Author

jzwart commented May 27, 2020

We're looking into the indexing in the code, which could be similar to a seg_id_nat vs seg_id mismatch.

@hcorson-dosch-usgs
Copy link
Contributor

Ok new figure showing 2015-2016 average June-August tmax and tmin throughout the basin, based on both the previously pulled driver data (left) and the newly pulled driver data (right). Based on this, Jake and I are thinking it is an indexing issue, likely due to a work-around I put in place since we were missing one csv file with ids. I'm going to email Rich McDonald to see if we can get that file and try re-running the code.

2015_2016_Jun-Aug_TmaxTmin

@hcorson-dosch-usgs
Copy link
Contributor

hcorson-dosch-usgs commented May 29, 2020

Ok - quick update. Rich McDonald pointed me to his new repo, with cleaned-up code and a revised conversion script. I pulled data for 2015 using the new repo and code, and we're still seeing disagreement between the original driver data and the newest data (which I'm referring to as gridmetetl, after the name of the repo, to distinguish it from the data I pulled earlier this spring [and plotted up above] using the code in the onhm-fetcher-parser repo).

Unfortunately, the newly pulled 2015 gridmetetl driver data (here denoted as 'new) also does not match the original 2015 driver data (denoted as 'old):
GridmetetlTmax_v_TmaxOld
GridmetetlTmin_v_TminOld
GridmetetlPrcp_v_PrcpOld

Here's a comparison of 2015 gridmetetl tmin data and original tmin data for segment 2047:
Seg2047_OriginalGridmetetl_Tmin

The gridmetetl driver data show a nearly identical spatial pattern to the onhm-fetcher-parser driver data (see above). It seems like there is still an indexing issue.
Gridmetetl_2015_DriverDataDiffs
Gridmetetl_2015_Jun-Aug_TmaxTmin
Interestingly, the gridmetl precipitation data and onhm-fetcher-parser precipitation data are identical, while the gridmetetl tmin and tmax data are consistently 0.33 - 0.39 degrees celsius higher (0.33-0.35 on average) than the onhm-fetcher-parcer tmin and tmax data.
Gridmetetl-Ohnm_2015
For a given segment, the difference between gridmetl and onhm tmin does not appear to be related to the difference between gridmetetl and onhm tmax:
Gridmetetl-Ohnm_Tmax_v_Tmin

Next steps: Reach out to Rich to report on how implementing the new code went (well), and how the new data looks (same issues as before)

@hcorson-dosch-usgs
Copy link
Contributor

Did a spot check with observed weather data, per Sam's suggestion. Here are plots for Segment 2047, in Wilmington, DE:
2047_Tmin_Comparison
2047_Tmax_Comparison
2047_Prcp_Comparison
Precipitation isn't very well matched by either driver data set, but the old driver data is clearly much closer to actual observed temperatures that the recently pulled driver data, at least for this segment

@jzwart
Copy link
Member Author

jzwart commented Jun 1, 2020

From Rich:

I just checked, and the gridmet driver hru id is saved in the netcdf file ouput as hruid so you should be able to read that and incorporate it. They are ordered in the nectcdf file from 5308 - 7252. There is a total of 764 hrus. This is based on the shapefile that I was given for the Delaware.

That order is not the same as the .cbh HRU order which is based on model index. Will try reordering to model index and compare again.

@hcorson-dosch-usgs
Copy link
Contributor

Ok - so it indeed comes down to a difference in how the .cbh files are indexed. There does not appear to be any indexing error in how the current set of scripts downloads netCDF files or in how those files are converted to .cbh. The issue we were seeing when mapping data to the segment level was that the 'new' driver data used a different set of indices than the old driver data, so our script to map values to segments based on the hru values, when supplied with the new driver data, was computing area weighted averages for the wrong hrus (thinking they were indexed based on model_idx [like the old driver data], when they were actually index based on hru-ids]. We didn't catch this before because the .cbh files actually don't have any header information, so there is no way to know how they are indexed without digging into their generation. We assumed that the old and new data were indexed the same way, when they actually weren't.

Driver_Data_indexing

Note: Accounting for the indexing issues solves the majority of our problem, but there still appear to be some discrepancies between the old and new driver data (at least for the date shown: 2015-01-01). I'll dig more into that tomorrow. I can also regenerate the segment-level plots tomorrow

@jordansread
Copy link
Member

🕵️‍♀️ Nice work!

@hcorson-dosch-usgs
Copy link
Contributor

hcorson-dosch-usgs commented Jun 4, 2020

Ok - so I modified the ncf2cbh script to take the data imported from netCDF, re-indexed it to model_idx, and then export it as .cbh files.

At the hru-level, the re-indexed data matches the gridmetetl data I first downloaded (i.e., reindexing worked correctly), and now the 'old' (original 1980-2016 driver data) and 'new' (re-indexed driver data pulled with the gridmetetl script) driver data are both using model_idx to index the hrus. As I mentioned in my last post, there are clearly still some discrepancies between the old and new driver data -- the plots below dig into this more:
Here - January 1st, 2015
Driver_Data_reindexed

Differences notable here, on June 1st, 2015:
20150601_Driver_Data_reindexed

I then used the re-indexed .cbh files as input to Jake's script to derive segment-level driver data. Here are a series of plots comparing the 'new' driver data to the 'old' driver data . Still some weird things happening @jzwart any thoughts?:
image
image
image

But these time-series plot for segments look much better than before (though there are clearly discrepancies):
2047 - tmin
image
1830 - tmax
image
1634 - tmin
image

Here are the segment level plots - Differences between 2015-2016 year-round averages for each segment. For themajority of segments the data is different between the two datasets, but the magnitude of the differences is much smaller:
Gridmetetl_Reindexed_DriverDataDiffs

2015-2016 June-August average for each segment. Looks much better than before, in that the new driver data now shows the N-S gradient. However note that the ranges of temperatures differ somewhat:
Gridmetetl_Reindexed_Jun-Aug_TmaxTmin

Then I re-did the weather data plots for segment 2047. These look weird (esp. tmin), so I'm going to pull some weather data for segments for which driver data differs substantially between the 'old' and 'new' data sets (based on the segment plots, above), and plot that up as well.
Tmin almost looks shifted by a day, but that isn't reflected in the other two plots....
Reindexed_2047_Tmin_Comparison
Reindexed_2047_Tmax_Comparison
Reindexed_2047_Prcp_Comparison

@jzwart
Copy link
Member Author

jzwart commented Jun 4, 2020

Nice job on this, @hcorson-dosch ! It would be good to know how different the HRU driver data is too, in addition to the segment driver data, just to rule out any issues with the function that grabs driver data for each segment. A quick comparison by reading in the old and new .cbh files and then plotting column 8 (for example) from each .cbh file against each other would be a quick test.

Maximum temperature looks pretty good, but it's strange that tmin seems adjusted one-two days before the other data. Is that the case for every segment?

Glad that the mean differences are much better now, that's encouraging 👍

@rmcd-mscb
Copy link

@hcorson-dosch I agree with Jake, The direct link between the Gridmet data as pulled and the Delaware model is the HRU,

Also there have been some changes to the raw Gridmet data that could potentially result in differences between what currently pulled and the original data: http://www.climatologylab.org/gridmet.html

Look at the Updates tab

@hcorson-dosch-usgs
Copy link
Contributor

Thanks for the link re: gridmet updates @rmcd-mscb! I'll make some hru-level plots today

@hcorson-dosch-usgs
Copy link
Contributor

Alright - big update here. Unfortunately, it's mostly more confusing plots.

I dug into hru-level comparisons, and it turned up some clear discrepancies between the 'old' (original 1980-2016 driver data) and 'new' (re-indexed driver data pulled with the gridmetetl script) driver data.

Some scatter plots - data for all 765 hrus:
Hru-level_Reindexed_vs_Original
In some months, data is relatively similar:
Reindexed_vs_Original_20150901
Particularly weird patterns in other months, esp. Feb 2016
Reindexed_vs_Original_20160215
If we plot data for all of the days in a given month (random colors), for all of the hrus, the oddities really show up:
AllHRUs_Reindexed_vs_Original_June2015_June2016
AllHRUs_Reindexed_vs_Original_Feb2015_Feb2016
I took a look at the difference between the datasets over time (for all hrus - layered with partial transparency), and it's striking how large the differences are on certain days (even though they averaged out to differences of +/- 1 degree over 2015-2016, as seen above). Note that Feb 2016 is particularly strange:
20152016_O-GMdiff_allhrus
Here's a close up of spring 2016:
spring2016_O-GMdiff_allhrus

I plotted the differences between the maximum temperatures in each set of driver data for each hru spatially (sorry, I didn't note that it's tmax in the plot titles) - there doesn't seem to be a consistent pattern:
HRUmap_Orig-Reindexed_20150901
HRUmap_Orig-Reindexed_20160115
HRUmap_Orig-Reindexed_20160201
HRUmap_Orig-Reindexed_20160315
Even when we look at the absolute value of the differences in tmax between the two driver data sets:
absvalue_Orig-Reindexed_June2015
absvalue_Orig-Reindexed_Feb2016
So then I plotted the maximum temperature data side by side spatially for a few days:
HRUmap_20160901_Reindexed_vs_Original
HRUmap_20160201_Reindexed_vs_Original
Interestingly, the spatial pattern matched better if the new data was compared to one day ahead in the old data:
HRUmap_20160901_+1DayReindexed_vs_Original
HRUmap_20160201_+1DayReindexed_vs_Original
So then I went back and updated some of those plots to shift the reindexed gridmetel data by +1 day. Things look a little tighter, but still different, and something strange is still happening in Feb 2016:
Hru-level_SHIFTEDReindexed_vs_Original
AllHRUs_SHIFTEDReindexed_vs_Original_June2015_June2016
AllHRUs_SHIFTEDReindexed_vs_Original_Feb2015_Feb2016
spring2016_O-shiftedGMdiff_allhrus

SO clearly some odd things happening: maybe some shifted data, and then really big differences in Feb 2016.

Jake suggested that I do some more comparisons to weather data, and Sam suggested that I also compare to raw gridmet data.

So I did both, picking 3 hru/segment locations. I chose data from Jan-Mar 2016 for two, and data from May 2015 for the other one:

First- Jan-Mar 2016 for segment 1493 and hru model_idx 282, near Trenton, NJ:
Original Tmax -Reindexed Tmax, with hru highlighted:
spring2016_O-GMdiff_allhrus_282highlighted
Comparison of raw, original, reindexed, and shifted reindexed driver data:
282_2016_Tmax
Comparison to weather data:
Reindexed_1493_Tmin_Comparison
Reindexed_1493_Tmax_Comparison
Reindexed_1493_Prcp_Comparison

Second- Jan-Mar 2016 for segment 1722 and hru model_idx 575, near Allentown, PA:
Original Tmax -Reindexed Tmax, with hru highlighted:
spring2016_O-GMdiff_allhrus_575highlighted
Comparison of raw, original, reindexed, and shifted reindexed driver data:
575_2016_Tmax
Comparison to weather data:
Reindexed_1722_Tmin_Comparison
Reindexed_1722_Tmax_Comparison
Reindexed_1722_Prcp_Comparison

Lastly - May 2015 for segment 2041 and hru model_idx 748, near Wilmington, DE:
Original Tmax -Reindexed Tmax, with hru highlighted:
spring2016_O-GMdiff_allhrus_748highlighted
Comparison of raw, original, reindexed, and shifted reindexed driver data:
748_2016_Tmax
Comparison to weather data:
Reindexed_2041_Tmin_Comparison
Reindexed_2041_Tmax_Comparison
Reindexed_2041_Prcp_Comparison

So -- a lot to think over. From all of this, I think the biggest takeaway is that the raw gridmet data matches the reindexed gridmetal data (👍 b/c that means the pull script is working correctly), BUT that means that the raw gridmet data also does not match weather data in February 2016 (👎 b/c I have no idea what could be messing up all the data for that whole month). Also - in some places the reindexed driver data look like they would be better if shifted by one day (e.g. tmin in May 2015 for Wilmington, DE and tmin and tmax in March 2016 for Trenton and Allentown), while in other spots (e.g. tmin and tmax in January 2016 for Trenton and Allentown), the raw gridmet data and therefore reindexed driver data appear to match weather data quite well. It doesn't make sense to me that only some of the data in some locations would need to be shifted. And I have no idea what could be happening in February 2016. 2016 was a leap year, but I don't see why that would throw the whole month of data off.

Thoughts??

Let me know if there's anything else I should plot. At this point I think I need to pause and get some input/other people's ideas before I do any more digging, since I'm just getting more and more confused by the data.

@hcorson-dosch-usgs
Copy link
Contributor

Meant to mention I pulled the raw data in netCDF format from the gridmet site using the Aggregated THREDDS Catalog (OPENDAP)

@jzwart
Copy link
Member Author

jzwart commented Jun 10, 2020

WOW @hcorson-dosch! This is a ton of work, and thanks for pulling together all these comparisons.
I agree with your summary paragraph, and it also seems like the original driver data (given to us with original PRMS-SNTemp cutout) is not gridmet data, right? Since it doesn't match to the re-indexed gridmetetl or gridmet pulled directly from THREDDS.

The shifted +1 day in some locations and times is really strange to me, while other time periods it actually matches really well. I guess I'm naive about whether this is an issue with gridded met data in general or something specific to gridmet. I know there are some biases in NLDAS data when compared to weather data, for example, but those seemed to be a consistent offset rather than the messiness that is happening here.

I think it's encouraging that the original driver data matches the weather data fairly well. It would be good to know how that was created and the source of the driver data.
These are my initial thoughts and it'd be good to hear from others

@limnoliver
Copy link
Member

limnoliver commented Jun 10, 2020

Oooof. On one hand, bummer that there are multiple issues, but I think you're narrowing in! I'm not sure what to do/think about the Gridmet data -- and it would be difficult to (comprehensively) know which segments or what time periods to shift by a day. Maybe as one last check of the raw data, download directly from Gridmet instead of THREDDS?

Hayley - one way to view error might be to look at cumulative error (either from weather data or from old driver data) to get a sense for how bad it is across reaches. So for your three reach examples, sum absolute error every day (of reindexed and offset data), and then plot cumulative error through time? This might help identify the time periods that are offset, and should ID February as a true outlier.

@jzwart how are you feeling about this now? Where do we go from here if the Gridmet data is "bad"?

Edited - oops, sorry Jake, I had not refreshed before I commented! Thanks for your input. I will ask Steve about the origins of the driver data.

@limnoliver
Copy link
Member

This pub from 2018 about NHM says it was configured with Daymet.

@jordansread
Copy link
Member

Only because it seems oddly reminiscent of a similar offset issue we had in the past that was based on how the metadata was used on OPeNDAP, sharing this: https://github.com/USGS-CIDA/stream_metab_usa/issues/68

@hcorson-dosch-usgs
Copy link
Contributor

Thanks all. @jread-usgs that's really interesting that you and Dave saw a similar issue with OPENDAP and reading data in from .netcdfs. Something similar could be happening here.

@limnoliver I did download some data directly from gridmet, but couldn't get it to read in correctly using two different netcdf packages in Python that I've been using. Sometimes all the temp values were NaN, and in other cases the dates were 1 day off, and then in others the dates were way off. So that might all be tied back to the issue Jordan pointed to. Perhaps I should try again, but read it in using a different package, or try reading it in in R? That's also a good idea to look at cumulative error.

I'll plan to dig back into this tomorrow.

@rmcd-mscb
Copy link

I think it's important to compare the HRU data rather than the segment data, because Gridmet and Daymet are specifically mapped to the HRU.

Also, you should be able to read that raw gridmet data directly into xarray.

I'll also double check the gridmetetl code to make sure there are no indexing errors.

@hcorson-dosch-usgs
Copy link
Contributor

hcorson-dosch-usgs commented Jun 10, 2020

Hi @rmcd-mscb thanks for double-checking the gridmetetl code.

All of the plots in my latest comment plot the HRU data, not the segment-level data except for nine of the plots near the end that compare weather data to driver data derived at the segment level (as well as raw gridmet data at the grid cell [lat, long] level).

I was using xarray to read in the netCDF data. It worked great for the netCDF files generated by your script and for the netCDF files I pulled from THREDDS, but for some reason when xarray read in raw netCDFs directly from gridmet I was running into errors. For the individual day netCDFs, the dates were incorrect:
image

And for the annual netCDFs I tried, xarray could not read in the temperature variable correctly, giving an error that it had an unsigned attribute but was not of integer type:
image

I also tried the Dataset module from the netCDF package, but was getting null temperature values from that as well. Since I could read in the netCDF data from THREDDS just fine with xarray, I used that data and didn't really dig into those issues, but I'll now try to troubleshoot why the raw gridmet data wasn't being read correctly by xarray.

@rmcd-mscb
Copy link

Hi Hayley,
That is a lot of work! And to be honest it makes my head spin a little :) I think the xarray to_dataframe() is maybe giving you issues with dates. If you were to just look at your xarray dataset, for the annual data ( that is all I've ever used in my testing) you still get that use_cftime error but I think it's just a matter of setting the use_cftime parm. And I don't see in the attributes what calendar there are using sp not sure how to set that.

Anyway look at this notebook here: https://nbviewer.jupyter.org/github/nhm-usgs/gridmetetl/blob/delaware/Examples/Delaware_xarray.ipynb

you should be able to extract the data. If you know the lat/lon of your weather station you can grab it from xarray with something like this: ds.sel(lat=-35.28, lon=149.13, method='nearest'). If you now what hru the weather station lies in, you could plot weather station data against the climate data for that HRU rather than the segment. It won't be perfect remember GM is 4km resolution

HOWEVER - I think I found the offset error. If you look at the attributes you will see the note below which I've never noticed before - Days are in MST! Also note it says approximately

note5 :
Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day)

Finally, if the original forcing data is Daymet, Maybe Parker could pull you a new bandit version with updated daymet data. 2019 Daymet data just became available.

@hcorson-dosch-usgs
Copy link
Contributor

@rmcd-mscb Thanks for all your work digging into this! I'll take a look at your notebook tomorrow, and will try addressing the offset you pointed out to see if that fixes our issues. I'll also add the weather data to the hru-level plots that compare the hru-level driver data to the raw gridmet data I had extracted from the grid at the location of the weather data.

@hcorson-dosch-usgs
Copy link
Contributor

hcorson-dosch-usgs commented Jun 15, 2020

Ok - I was able to read in the raw gridmet data - It had read in correctly before, but appeared to be null until I specified a particular date and/or grid cell for which to view data. Not sure why. I've updated the HRU-level plots to include the raw gridmet data alongside the THREDDS gridmet data and the 'old' (original 1980-2016 driver data) and 'new' (re-indexed driver data pulled with the gridmetetl script) driver data, as well as the 'new' driver data shifted by +1 day. I've included plots for three locations (Wilmington, DE; Trenton, NJ; Allentown, PA) for three periods (June 2015; January 15 - March 15, 2016; October 2016). Note that the driver-data values for each HRU are generated by weighting the gridded data for all cells that overlap each HRU, so may not match the raw gridmet data exactly.

Tmax

Overall, there is very good agreement between the raw gridmet data, the reindexed driver data, and weather data for all of the periods and locations, except for February 2016, when the raw gridmet data and reindexed driver data do not track the weather data at all.

June 1-30, 2015
Wilmington, DE
748_2015_06_Tmax
Trenton, NJ
282_2015_06_Tmax
Allentown, PA
575_2015_06_Tmax
Jan 15 - Mar 15, 2016
Wilmington, DE
748_2016_02_Tmax
Trenton, NJ
282_2016_02_Tmax
Allentown, PA
575_2016_02_Tmax
October 1-31, 2016
Wilmington, DE
748_2016_10_Tmax
Trenton, NJ
282_2016_10_Tmax
Allentown, PA
575_2016_10_Tmax

Tmin

With these plots that are temporally zoomed in (relative to my past plots), it appears that the raw gridmet data and the reindexed driver data are pretty consistently offset from the weather data for all of the periods and locations. It looks like the offset is ~1 day. As for Tmax, the raw gridmet data and reindexed driver data do not track the weather data at all in February 2016.

June 1-30, 2015
Wilmington, DE
748_2015_06_Tmin
Trenton, NJ
282_2015_06_Tmin
Allentown, PA
575_2015_06_Tmin
Jan 15 - Mar 15, 2016
Wilmington, DE
748_2016_02_Tmin
Trenton, NJ
282_2016_02_Tmin
Allentown, PA
575_2016_02_Tmin
October 1-31, 2016
Wilmington, DE
748_2016_10_Tmin
Trenton, NJ
282_2016_10_Tmin
Allentown, PA
575_2016_10_Tmin

Precipitation

Overall, the raw gridmet data and the reindexed driver data are pretty consistently offset from the weather data for all of the periods and locations. It looks like the offset is ~1 day. As for Tmax and Tmin, the raw gridmet data and reindexed driver data do not track the weather data at all in February 2016.

June 1-30, 2015
Wilmington, DE
748_2015_06_Prcp
Trenton, NJ
282_2015_06_Prcp
Allentown, PA
575_2015_06_Prcp
Jan 15 - Mar 15, 2016
Wilmington, DE
748_2016_02_Prcp
Trenton, NJ
282_2016_02_Prcp
Allentown, PA
575_2016_02_Prcp
October 1-31, 2016
Wilmington, DE
748_2016_10_Prcp
Trenton, NJ
282_2016_10_Prcp
Allentown, PA
575_2016_10_Prcp

Overall, it appears that the actual raw gridmet data match the gridmet data pulled from THREDDS and the reindexed driver data pulled using Rich's gridmetetl repo (👍 ) but that suggests that the actual raw gridmet data are incorrect in February 2016 (👎 👎 👎 ). Also - the tmin and prcp data appear to be consistently offset from weather data by approximately 1 day (👎 ❓ ). Interestingly, the tmax data do not appear to be offset (😕 ❔).

I'm still pretty mystified by this offset. According to the documentation,

gridMET nominally considers a "day" to be midnight-to-midnight

And as @rmcd-mscb noticed, it also states in the metadata that:

Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day)

Here's how I am thinking about it, which may be incorrect. If so, please let me know!

  • The gridded data, driver data, and weather data are all daily data
  • The gridmet data and the driver data come in with a timestamp of 00:00:00
  • The weather data is imported with only a datestamp, but when converted to datetime by pandas in Python, the default time stamp is 00:00:00
  • For a given day, such as 1/1/2015, the gridmet data is for 12:00:00 AM MST to 11:59:59 PM MST.
  • This is equivalent to the following period in EST: 2:00:00 AM EST on 1/1/2015 to 1:59:59 AM EST on 1/2/2015
  • That still places the bulk of the day (22 hours) on the first day (here, 1/1/2015), so the gridmet data should still represent conditions that first day (e.g., 1/1/2015 rather than 1/2/2015)
  • Therefore, the gridmet data should not need to be shifted by a full day
  • The offset that is visible appears to be greater than a 2-hour difference between time zones could explain:
    748_2016_06_w12_Tmin

Does that thinking seem correct? Or am I missing something obvious?

If you are curious, here are descriptions of the metadata for the netCDF files on the Northwest Knowledge network: tmax, tmin, prcp. According to this documentation, they all have the same datetime format.

It seems odd that the tmax data is not offset, yet the tmin and prcp data do seem to be offset. Maybe there is some error in how datetime is being encoded for the tmin and prcp data?

Thoughts about the offset or what could be happening in February 2016?

@jzwart
Copy link
Member Author

jzwart commented Jun 16, 2020

Thanks @hcorson-dosch! I agree with your assessment. The Feb 2016 issue is definitely strange and the offset that is dependent on data product / location / time of year is very confusing.
I'm also getting confused if the offset is in the right direction, and I think I've convinced myself it is correct -> the weather happening in the DRB today is given a timestamp of yesterday (e.g. MST) so the adjustment should be +1 and not -1 (I'm mostly writing this down for myself in case I get confused again :) ).
This doesn't help explain all the issues you point out such as only being 2 hours behind and not 24 hours behind, or why tmin and tmax don't both have a consistent offset. I really don't know what would cause this and I wonder if it's worth checking with gridmet folks to see if this issue has come up before

@rmcd-mscb
Copy link

Hi @hcorson-dosch and @jzwart

I've pointed this thread to Steve M, Parker N and Jacob LF, and it's on our list to look at early next week. They have more experience with this kind of data and correlations to weather station data, in the context of PRMS. In anticipation of showing this to the gridMET folks, can I suggest redoing the plots above by relabeling the data as:

Raw Gridmet data -> gridMET point interpolation
Original driver data -> Daymet HRU
Reindexed driver data -> gridMET HRU
Keep the weather station data
remove Thredds Gridmet data and the shifted data

Also note that the validation of gridMET data was made for stations in the Western US: https://webpages.uidaho.edu/jabatzoglou/PDF/IJOC_Abatzoglou_2012.pdf

I think it would be good to hear what the NHM group thinks first but then I'm happy to help and be involved in a conversation with the gridMET folks in anyway that would be helpful to you.

@hcorson-dosch-usgs
Copy link
Contributor

Hi @rmcd-mscb thanks for looping in Steve, Parker, and Jacob.

After talking everything over with our stream temperature modeling team yesterday, we decided to reach out to Gridmet, so I sent John Abatzoglou and Katherine Hegewisch an email yesterday. I included some cleaned up figures that only showed the weather data and the raw Gridmet (gridMET point interpolation) data. The figures were selected to show:

  1. An offset of ~1 day in the tmin and prcp gridded data, and
  2. Deviations from observed weather data in February 2016 in the tmin, tmax, and prcp data.

Here are the three figures I sent, along with the text I included to explain what we were seeing:

The first two plots (‘AllentownPA_2015_06_Prcp’ and ‘WilmingtonDE_2016_10_Tmin’) illustrate the offset that we have observed in the 2015-2016 prcp and tmin data. In the prcp plot, the offset seems very consistent (~1 day), but in the tmin plot it appears that the offset may be less consistent. This offset is evident throughout 2015 and 2016 for the tmin and prcp data at all three sites. In general, it appears to be a fairly consistent offset of ~1 day (for both prcp and tmin). Interestingly, the 2015-2016 tmax data is not offset at any of the sites.
AllentownPA_2015_06_Prcp
WilmingtonDE_2016_10_Tmin
The third plot (‘TrentonNJ_2016_02_Tmax’) illustrates how the Gridmet tmax data deviates from weather data in February 2016. The deviations in February 2016 are observed at all three sites and for all three gridded products (tmin, tmax, and prcp). Note that the Gridmet tmax data from 1/1–1/31 and from 3/2–3/15 matches the weather data very well (without any offset).
TrentonNJ_2016_02_Tmax

Both John Abatzoglou and Katherine Hegwisch responded today, with some helpful information. John dig some digging into the Wunderground precipitation data to which I had compared the Gridmet data. By inspecting the hourly Wunderground data, he noticed that Wunderground does not accurately report daily precipitation totals, which was unknown to both him and to our team:

The precipitation offset piece is a bit more puzzling. So I looked at the Wunderground site since I’ve pulled data there as well. Keying in on the Lehigh airport on two days : June 20-21 2015. Their daily summary shows what you have, basically an inch on the 21st and nada on the 20th. But diving into the hourly reveals something a bit different
June 20
https://www.wunderground.com/history/daily/us/pa/allentown/KABE/date/2015-6-20
and then June 21st 2015
https://www.wunderground.com/history/daily/us/pa/allentown/KABE/date/2015-6-21
Looking closer on those day’s you’ll see they have daily summaries (from the previous 24 hours ending at weird times like 11am on one day, 5am the other day). Bottom line is that I’m not sure their function is given you a true day. I certainly had no idea before looking in. Quite odd.

He also had some thoughts about the tmin data offset:

The tmin shift might be a bit of an artifact of a ‘day’ as well. The ASOS stations are reporting a true day, ending midnight local, we use a 7Z local which would be a bit closer to 2am. Should the daily low happen midnight plus or minus a couple hours this could cause a difference. Most of the time daily lows are closer to sunrise.

And noted that he and his team would look into the February 2016 data:

There does seem to be something weird w/the February 2016 data that we’ll look into.

Katherine then chimed in, and provided some very helpful information about a previous user's experience reading the Gridmet NetCDF files using xarray:

Recently, a user communicated with me about a similar 1-day shift issue with our netcdf files which they were seeing using xarray and Pandas in python to extract/translate times from the netcdf files. After going back and forth with them, they discovered that pandas was throwing a warning:
RuntimeWarning: Converting a CFTimeIndex with dates from a non-standard calendar, 'julian', to a pandas.DatetimeIndex, which uses dates from the standard calendar. This may lead to subtle errors in operations that depend on the length of time between dates.
t = timeseries.indexes['time'].to_datetimeindex()
and they concluded that the 1-day shift likely was do with xarray->pandas. When I ran the same extraction in MATLAB or in python (using netcdf4 instead), my dates were different than theirs by 1 day and I did not see a 1-day shift w.r.t. PRISM.
Note that I can do your same comparison of the gridmet data to PRISM using app.climateengine.org and we see no shift in ppt/tmin.

I've downloaded new weather data (from NOAA), which I will use for updated comparisons. I am also going to read in the Gridmet data using netcdf4 instead of xarray, to see if that addresses the shift in tmin and prcp that we had observed. What's odd to me, however, is that I saw no shift in the tmax data, even though it was read in with xarray using exactly the same method as I used to read the other netCDF files.

I'll add new plots here as soon as I have them put together.

@hcorson-dosch-usgs
Copy link
Contributor

Just a quick note to keep things up-to-date here. John Abatzoglou fixed the February 2016 data, so that issue has been resolved, which is great 👍. John's hunch about the apparent offset in prcp data being due to the faulty Wunderground data was correct. The NOAA weather data I downloaded aligns with the 2015 gridMET prcp data 👍. However, we are still seeing some temporal offsets in the updated 2016 data 👎. I've sent some updated plots to John and Katherine with some additional discussion and questions for them. Hopefully we can get to the bottom of the offset issue shortly.

@limnoliver
Copy link
Member

Awesome work, @hcorson-dosch -- seems like we're getting close!

@hcorson-dosch-usgs
Copy link
Contributor

hcorson-dosch-usgs commented Jul 23, 2020

Another quick update: since I last reported here John Abatzoglou fixed the temporal offsets in the updated 2016 tmax and prcp data. Following that fix, I downloaded the raw gridMET netCDF files for each year between 1980 and 2019. I also downloaded NOAA weather data. Comparisons of the gridMET and NOAA data show that the gridMET tmax and tmin data in February 2017 (and possibly in part of February 2019) deviate from recorded weather data -- similar to the deviations we'd observed in February 2016. I reached out to John Abatzoglou about these discrepancies. He is in the midst of transitioning their systems, but will fix those variables when the systems are stable again - likely later this month or in August. Once that data is updated we'll be ready to pull and process the 1980-2019 gridMET data and use it as a driver for a new SNTemp run.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants