You can run this notebook in Binder, in Colab, in Deepnote or in Kaggle.

[ ]:
!pip install --quiet climetlab

Creating a shared dataset of GRIBs

[1]:
import climetlab as cml

Download data to the climetlab cache

[ ]:
for month in range(1, 13):  # This takes a few minutes.
    cml.load_source(
        "mars",
        param=["2t"],
        levtype="sfc",
        area=[50, -50, 20, 50],
        grid=[1, 1],
        date=f"2012-{month}",
    )
[ ]:
cml.load_source(
    "mars",
    param="msl",
    levtype="sfc",
    area=[50, -50, 20, 50],
    grid=[1, 1],
    date="2012-12-01",
);

Export the data to a shared directory

This step is optional: you can keep working from the cache if you are the only user of the data and do not mind downloading it again later. Other people should not use your cache, however:

- The climetlab cache eventually fills up, and data in it may be deleted automatically.
- You would need to deal with permission issues.
- It makes the data difficult to share with other people.

Let us export the data to a shared directory, shared-data/temperature-for-analysis.

[4]:
# Some housekeeping
!rm -rf shared-data/temperature-for-analysis
!mkdir -p shared-data/temperature-for-analysis
[5]:
# Export all cache entries that come from mars and are not older than 1 hour
!climetlab export_cache shared-data/temperature-for-analysis --newer 1h --match mars
Copying cache entries matching 'mars' and newer than '2023-03-11 13:29:29' to shared-data/temperature-for-analysis.
100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 367.98it/s]
Copied 13 cache entries to shared-data/temperature-for-analysis.
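
Before indexing, it can be worth confirming that the export directory actually contains files. The following is a minimal sketch using only the Python standard library; `summarise_directory` is a hypothetical helper, not part of climetlab, and the number of files on disk need not equal the number of cache entries reported above.

```python
from pathlib import Path


def summarise_directory(root):
    """Return (number of files, total size in bytes) under `root`, recursively."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Path taken from the export command above; the check is skipped if it does not exist.
shared = Path("shared-data/temperature-for-analysis")
if shared.is_dir():
    count, size = summarise_directory(shared)
    print(f"{count} files, {size / 1e6:.1f} MB in {shared}")
```

An empty directory, or a total size close to zero, would suggest that the `--newer` or `--match` filters excluded the entries you expected.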

Create indexes to speed up data access when using it. (Optional)

[ ]:
!climetlab index_directory shared-data/temperature-for-analysis
[ ]:
!climetlab availability shared-data/temperature-for-analysis

Using the data

[18]:
DATA = "shared-data/temperature-for-analysis"
[19]:
source = cml.load_source("indexed-directory", DATA)
[20]:
source.availability
[20]:

class=od, domain=g, expver=0001, levtype=sfc, md5_grid_section=ce1bd075c48ae7a5bf34f4e47166e942, step=0, stream=oper, time=1200, type=an
   date=20120101/to/20121231, param=2t
   date=20121201, param=msl

This is a good time to check the data: is all the data here? Are there missing dates? Missing parameters?
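
A check for missing dates could be sketched as follows. The availability summary only shows the first and last date of a range, so this example assumes you have already collected the dates that are actually present into a plain Python list; how to extract them from the source is not shown here, and `missing_dates` is a hypothetical helper, not a climetlab function.

```python
from datetime import date, timedelta


def missing_dates(present, start, end):
    """Return the dates in [start, end] (inclusive) that are absent from `present`."""
    have = set(present)
    n_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in expected if d not in have]


# Illustrative data only: a full year of daily dates with one day deliberately removed.
start, end = date(2012, 1, 1), date(2012, 12, 31)
present = [start + timedelta(days=i) for i in range(366)]
present.remove(date(2012, 2, 29))

gaps = missing_dates(present, start, end)
print(gaps)  # [datetime.date(2012, 2, 29)]
```

An empty list means every day in the requested period is present.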

The data is ready to be used as a NumPy, TensorFlow or xarray object.

[11]:
source.sel(param="msl").to_numpy().mean()
[11]:
101725.47522756307
[22]:
cml.load_source("indexed-directory", DATA, param="msl").to_numpy().mean()
[22]:
101725.47522756307
[23]:
temp = source.sel(param="2t").order_by("date")
temp.to_tfdataset()
[23]:
<PrefetchDataset element_spec=TensorSpec(shape=<unknown>, dtype=tf.float32, name=None)>
[24]:
temp.to_xarray()
[24]:
<xarray.Dataset>
Dimensions:     (number: 1, time: 366, step: 1, surface: 1, latitude: 31,
                 longitude: 101)
Coordinates:
  * number      (number) int64 0
  * time        (time) datetime64[ns] 2012-01-01T12:00:00 ... 2012-12-31T12:0...
  * step        (step) timedelta64[ns] 00:00:00
  * surface     (surface) float64 0.0
  * latitude    (latitude) float64 50.0 49.0 48.0 47.0 ... 23.0 22.0 21.0 20.0
  * longitude   (longitude) float64 -50.0 -49.0 -48.0 -47.0 ... 48.0 49.0 50.0
    valid_time  (time, step) datetime64[ns] 2012-01-01T12:00:00 ... 2012-12-3...
Data variables:
    t2m         (number, time, step, surface, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2023-03-11T14:35 GRIB to CDM+CF via cfgrib-0.9.1...
[25]:
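# Note: this output is wrong. The availability of a selection is not implemented yet,
# so it still describes the whole source (including msl) rather than only 2t.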
temp.availability
[25]:

class=od, domain=g, expver=0001, levtype=sfc, md5_grid_section=ce1bd075c48ae7a5bf34f4e47166e942, step=0, stream=oper, time=1200, type=an
   date=20120101/to/20121231, param=2t
   date=20121201, param=msl
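
Until the availability of a selection is implemented, a similar summary can be computed by hand. The sketch below is not climetlab code: it assumes the metadata of each field has already been gathered into hypothetical dictionaries with `date` and `param` keys, and it reports only the first and last date per parameter, without checking for gaps in between.

```python
from collections import defaultdict


def summarise(records):
    """Return one 'date=..., param=...' line per parameter found in `records`."""
    by_param = defaultdict(list)
    for record in records:
        by_param[record["param"]].append(record["date"])
    lines = []
    for param in sorted(by_param):
        dates = sorted(by_param[param])
        span = dates[0] if dates[0] == dates[-1] else f"{dates[0]}/to/{dates[-1]}"
        lines.append(f"date={span}, param={param}")
    return lines


# Illustrative records only; real values would come from the GRIB metadata.
records = [
    {"date": "20120101", "param": "2t"},
    {"date": "20120102", "param": "2t"},
    {"date": "20120103", "param": "2t"},
    {"date": "20121201", "param": "msl"},
]
selection = [r for r in records if r["param"] == "2t"]
print(summarise(selection))  # ['date=20120101/to/20120103, param=2t']
```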
