Intake catalog referencing the official International Satellite Cloud Climatology Project (ISCCP) dataset on the NOAA S3 bucket s3://noaa-cdr-cloud-properties-isccp-pds (DOI: 10.7289/V5QZ281S).
The datasets on the S3 bucket are stored as individual netCDF files at 3-hourly or monthly resolution, depending on the ISCCP product. Accessing or querying the entire ISCCP time series from July 1983 to June 2017 would therefore require downloading the individual files and concatenating them locally. Because the dataset is hosted on cloud object storage, requests can be made more efficient by loading only the chunks of data needed for a given computation. To make this possible and to expose the entire ISCCP dataset lazily, this repository provides so-called reference files that virtually merge the individual netCDF files into a single dataset.
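To illustrate what a reference file does, here is a minimal sketch that assumes the kerchunk-style (version 1) JSON reference format; the source does not name the tool used, and read_reference, the file names and the tiny cldamt array are hypothetical. Each Zarr key either holds inline metadata or points to a byte range inside an unmodified original file, so several files can appear as one array without copying data.

```python
import json
import os
import tempfile


def read_reference(refs: dict, key: str) -> bytes:
    """Resolve one key of a kerchunk-style (version 1) reference set to raw bytes."""
    target = refs["refs"][key]
    if isinstance(target, str):
        # Inline value, e.g. small Zarr metadata stored directly as JSON text
        return target.encode()
    if len(target) == 1:
        # [url]: the whole target file is the value
        with open(target[0], "rb") as f:
            return f.read()
    # [url, offset, length]: read only this byte range of the original file
    url, offset, length = target
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Two stand-in "netCDF files", one per timestep: an 8-byte header followed by
# a 4-byte data chunk. Real files would live on S3 and be read via fsspec/s3fs.
tmpdir = tempfile.mkdtemp()
paths = []
for i, payload in enumerate([b"AAAA", b"BBBB"]):
    path = os.path.join(tmpdir, f"step{i}.bin")
    with open(path, "wb") as f:
        f.write(b"HEADER__" + payload)
    paths.append(path)

# Hypothetical reference set: time chunk 0 lives in file 0 and time chunk 1 in
# file 1, so the two files appear as one array without copying any data.
refs = {
    "version": 1,
    "refs": {
        # Abbreviated metadata; a real .zarray also lists dtype, compressor, etc.
        "cldamt/.zarray": json.dumps({"shape": [2, 1, 1], "chunks": [1, 1, 1]}),
        "cldamt/0.0.0": [paths[0], 8, 4],
        "cldamt/1.0.0": [paths[1], 8, 4],
    },
}

for key in ("cldamt/0.0.0", "cldamt/1.0.0"):
    print(key, read_reference(refs, key))
```

In practice this lookup is done by fsspec's reference filesystem, which Zarr and xarray read through; the sketch only shows that each chunk becomes a single byte-range request into an original file.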
Warning
This is not an official repository. Because this catalog only references the original dataset, post-processing issues are limited but may still exist, particularly in the form of missing timesteps and metadata inconsistencies. Attributing this work is encouraged, but the original data provider should always be acknowledged and its citation policy followed.
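Since missing timesteps are one of the possible issues named above, it can be worth checking the time axis before an analysis. This is a minimal sketch with hypothetical helper names (find_gaps, count_missing), assuming a regular time step such as the 3 hours of the ISCCP_BASIC_HGG product; it does not apply to monthly products with uneven month lengths.

```python
from datetime import datetime, timedelta


def find_gaps(times, step):
    """Return (before, after) pairs of consecutive times spaced more than `step` apart."""
    ordered = sorted(times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]


def count_missing(times, step):
    """Number of timesteps absent from the regular grid implied by `step`."""
    # timedelta // timedelta gives an int, e.g. a 9 h jump at 3 h spacing -> 2 missing
    return sum((b - a) // step - 1 for a, b in find_gaps(times, step))


# Synthetic 3-hourly axis with two timesteps (12:00 and 15:00) removed
start = datetime(1983, 7, 1)
times = [start + timedelta(hours=3 * i) for i in range(10) if i not in (4, 5)]

step = timedelta(hours=3)
for before, after in find_gaps(times, step):
    print(f"gap between {before} and {after}")
print("missing timesteps:", count_missing(times, step))
```

For the real dataset, the times could be obtained with something like ds.time.values.astype("datetime64[s]").tolist(), which should yield Python datetime objects.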
pip install "intake<2.0.0" xarray intake-xarray zarr s3fs requests
>>> import intake
>>>
>>> # Load catalog
>>> cat = intake.open_catalog("https://raw.githubusercontent.com/ISSI-CONSTRAIN/isccp/main/catalog.yaml")
>>>
>>> # List catalog entries
>>> list(cat)
['ISCCP_BASIC_HGH', 'ISCCP_BASIC_HGG', 'ISCCP_BASIC_HGM']
>>>
>>> # Load the dataset lazily as an xarray Dataset
>>> ds = cat['ISCCP_BASIC_HGG'].to_dask()
>>> ds
<xarray.Dataset> Size: 9TB
Dimensions:         (time: 99352, lat: 180, lon: 360, cloud_irtype: 3,
                     cloud_type: 18, edge: 2, levpc: 7, levtau: 6,
                     satpos: 12)
Coordinates:
  * lat             (lat) float32 720B -89.5 -88.5 -87.5 ... 87.5 88.5 89.5
  * levpc           (levpc) float32 28B 95.0 245.0 375.0 ... 740.0 912.5
  * levtau          (levtau) float32 24B 0.5 2.3 6.0 14.5 34.74 109.8
  * lon             (lon) float32 1kB 0.5 1.5 2.5 3.5 ... 357.5 358.5 359.5
  * time            (time) datetime64[ns] 795kB 1983-07-01 ... 2017-06-30...
Dimensions without coordinates: cloud_irtype, cloud_type, edge, satpos
Data variables: (12/43)
    cell_origin     (time, lat, lon) float32 26GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    cldamt          (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    cldamt_ir       (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    cldamt_irmarg   (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    cldamt_irtypes  (time, cloud_irtype, lat, lon) float64 155GB dask.array<chunksize=(1, 3, 180, 360), meta=np.ndarray>
    cldamt_types    (time, cloud_type, lat, lon) float64 927GB dask.array<chunksize=(1, 18, 180, 360), meta=np.ndarray>
    ...              ...
    tc_pcdist       (time, levpc, lat, lon) float64 361GB dask.array<chunksize=(1, 7, 180, 360), meta=np.ndarray>
    tc_type         (time, cloud_type, lat, lon) float64 927GB dask.array<chunksize=(1, 18, 180, 360), meta=np.ndarray>
    time_bounds     (time, edge) datetime64[ns] 2MB dask.array<chunksize=(1, 2), meta=np.ndarray>
    wp              (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    wp_ir           (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    wp_type         (time, cloud_type, lat, lon) float64 927GB dask.array<chunksize=(1, 18, 180, 360), meta=np.ndarray>
Attributes: (12/67)
    Conventions:          CF-1.6, ACDD-1.3
    NCO:                  4.4.4
    acknowledgement:      This project received funding s...
    cdm_data_type:        Grid
    comment:              ---------- TO RE-MAP EQUAL-AREA...
    contributor_name:     William B. Rossow, Alison Walke...
    ...                   ...
>>> # Plot the tau variable at a single timestep (xarray plotting requires matplotlib)
>>> ds.tau.sel(time='2017-05-01 00:00:00').plot()
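The chunk sizes in the repr explain why lazy access pays off: a selection only fetches the chunks it overlaps. The sketch below does that arithmetic for cldamt, assuming the stored chunks match the dask chunksize (1, 180, 360) shown and ignoring compression, so the byte counts are uncompressed upper bounds. The helper names are hypothetical.

```python
import math


def chunks_touched(chunks, index_ranges):
    """Count the chunks overlapped by a selection of (start, stop) index ranges.

    Ranges are half-open and assumed non-empty, one per dimension.
    """
    n = 1
    for chunk, (start, stop) in zip(chunks, index_ranges):
        first = start // chunk
        last = (stop - 1) // chunk
        n *= last - first + 1
    return n


def bytes_read(chunks, index_ranges, itemsize):
    """Uncompressed bytes fetched when every touched chunk is read in full."""
    return chunks_touched(chunks, index_ranges) * math.prod(chunks) * itemsize


shape = (99352, 180, 360)   # cldamt dimensions (time, lat, lon) from the repr
chunks = (1, 180, 360)      # chunksize shown in the repr: one timestep per chunk
itemsize = 8                # float64

whole_variable = math.prod(shape) * itemsize
one_timestep = bytes_read(chunks, [(0, 1), (0, 180), (0, 360)], itemsize)
# A small box over one day (8 three-hourly steps) still reads 8 full chunks
one_day_box = bytes_read(chunks, [(0, 8), (90, 120), (0, 30)], itemsize)

print(f"whole variable: {whole_variable / 1e9:.1f} GB")
print(f"one timestep:   {one_timestep / 1e6:.2f} MB")
print(f"one day, box:   {one_day_box / 1e6:.2f} MB")
```

The whole-variable figure matches the 52GB reported by xarray, while a single timestep needs roughly half a megabyte. Because each chunk spans the full globe, spatial subsetting does not reduce the bytes read; only restricting the time range does.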
DVC was used to track the workflow that creates the reference files. The individual commands are therefore listed in the dvc.yaml file and can be rerun with dvc repro.
This step should only be necessary if the ISCCP dataset on the NOAA S3 bucket changes and errors occur.
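One way to judge whether regenerating the references is needed is to compare the bucket's current file listing with a listing recorded when the references were built. This is a minimal sketch assuming s3fs with anonymous access to the public bucket; diff_listings and list_bucket are hypothetical helpers, the example keys are made up, and how the recorded listing is stored is left open. Listing the whole bucket may take a while.

```python
def diff_listings(previous, current):
    """Return (added, removed) keys between two bucket listings, sorted."""
    prev, curr = set(previous), set(current)
    return sorted(curr - prev), sorted(prev - curr)


def list_bucket(bucket="noaa-cdr-cloud-properties-isccp-pds"):
    """List every object key in the public NOAA bucket (needs s3fs and network)."""
    import s3fs  # imported lazily so diff_listings works without s3fs installed

    fs = s3fs.S3FileSystem(anon=True)  # public open-data bucket: anonymous reads
    return fs.find(bucket)


# Hypothetical listings: one recorded at build time, one fetched today
recorded = ["isccp/a.nc", "isccp/b.nc"]
current = ["isccp/b.nc", "isccp/c.nc"]
added, removed = diff_listings(recorded, current)
print("added:", added)
print("removed:", removed)
```

An empty result for both lists suggests the existing references still match the bucket; any added or removed keys are a hint to rerun the pipeline.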
- https://github.com/pangeo-forge/noaa-atmosphere-climate-cloud-properties-isccp-hgg-basic-feedstock
- https://github.com/pangeo-forge/noaa-atmosphere-climate-cloud-properties-isccp-hgh-basic-feedstock
The datasets produced by these feedstocks seem to be unavailable.