Caching

Warning

This part of CliMetLab is still a work in progress. Documentation and code behaviour will change.

Purpose

CliMetLab caches most remote data in a local cache. Calling cml.load_dataset or cml.load_source again uses the cached data instead of downloading it a second time. When the cache is full, cached data is deleted according to the cache policy (i.e. the oldest data is deleted first). The cache configuration is managed through the CliMetLab Settings.
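The "oldest data is deleted first" policy can be illustrated with a small, self-contained sketch. This is not CliMetLab's implementation: the CacheEntry class, the evict_oldest_first function and the timestamp field are hypothetical names used only for illustration, and which timestamp CliMetLab actually uses to decide what is "oldest" is not specified here.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    path: str         # hypothetical name of a cached file
    size: int         # size in bytes
    timestamp: float  # age marker; smaller means older


def evict_oldest_first(entries, max_size):
    """Split entries into (kept, deleted) so that kept fits in max_size bytes.

    Entries with the smallest timestamp are deleted first.
    """
    kept = sorted(entries, key=lambda e: e.timestamp)  # oldest first
    deleted = []
    total = sum(e.size for e in kept)
    while kept and total > max_size:
        victim = kept.pop(0)  # remove the oldest remaining entry
        total -= victim.size
        deleted.append(victim)
    return kept, deleted


entries = [
    CacheEntry("a.grib", 400, timestamp=1.0),
    CacheEntry("b.nc", 300, timestamp=3.0),
    CacheEntry("c.grib", 500, timestamp=2.0),
]
kept, deleted = evict_oldest_first(entries, max_size=800)
print([e.path for e in deleted])  # ['a.grib']
print([e.path for e in kept])     # ['c.grib', 'b.nc']
```

Here the three entries total 1200 bytes, so only the oldest one has to go to get back under the 800-byte limit.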

Warning

The CliMetLab cache is intended to be used by a single user. Sharing a cache between multiple users is not recommended. Downloading a local copy of the data to a shared disk so that several users can work with it is a different use case, which should be supported through mirrors. Feedback and feature requests are welcome.

Cache location

The cache location is defined by the cache-directory setting. Its default value depends on your system:

  • /tmp/climetlab-$USER for Linux,

  • C:\Users\$USER\AppData\Local\Temp\climetlab-$USER for Windows,

  • /tmp/.../climetlab-$USER for macOS.
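All three defaults follow the same pattern: a climetlab-$USER directory inside the system temporary directory. A minimal sketch of how such a path can be computed with the Python standard library is given below. The function name default_cache_directory is hypothetical, and whether CliMetLab builds its default in exactly this way is an assumption.

```python
import getpass
import os
import tempfile


def default_cache_directory(user=None, tmpdir=None):
    """Build a 'climetlab-<user>' path inside the temporary directory.

    Both arguments are optional; by default the current user name and the
    system temporary directory are used.
    """
    user = getpass.getuser() if user is None else user
    tmpdir = tempfile.gettempdir() if tmpdir is None else tmpdir
    return os.path.join(tmpdir, f"climetlab-{user}")


# Explicit values, to show the shape of the result
print(default_cache_directory(user="alice", tmpdir="/tmp"))
```

Calling default_cache_directory() without arguments gives the value for the current user and system, which is convenient for comparing against the output of the cache-directory setting.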

The cache location can be read and modified either with a shell command or from Python.

Note

It is recommended to restart your Jupyter kernels after changing the cache location.

From a shell with the climetlab command:

# Find the current cache directory
$ climetlab settings cache-directory
/tmp/climetlab-$USER

# Change the value of the setting
$ climetlab settings cache-directory /big-disk/climetlab-cache

# Cache directory has been modified
$ climetlab settings cache-directory
/big-disk/climetlab-cache

From a Python notebook or Python script:

>>> import climetlab as cml
>>> cml.settings.get("cache-directory") # Find the current cache directory
/tmp/climetlab-$USER
>>> # Change the value of the setting
>>> cml.settings.set("cache-directory", "/big-disk/climetlab-cache")

# Python kernel restarted

>>> import climetlab as cml
>>> cml.settings.get("cache-directory") # Cache directory has been modified
/big-disk/climetlab-cache

More generally, the CliMetLab settings can be read, modified, or reset to their default values using the climetlab command or from Python; see the Settings documentation.

Cache limits

Maximum-cache-size

The maximum-cache-size setting ensures that CliMetLab does not use too much disk space. Its value sets the maximum disk space used by the CliMetLab cache. When the cache disk usage goes above this limit, CliMetLab triggers its cache cleaning mechanism before downloading additional data. The value of maximum-cache-size is absolute (such as “10G”, “10M”, “1K”).
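To make the meaning of an absolute value concrete, the sketch below converts strings such as "10G" into a number of bytes and compares the result with the current cache size. The helpers parse_size and cache_needs_cleaning are hypothetical, not part of CliMetLab, and the sketch assumes binary multiples (1K = 1024 bytes); how CliMetLab itself interprets the suffixes is not stated here.

```python
import re

# Assumption: binary multiples (1K = 1024 bytes)
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '10G', '10M' or '1K' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", text.upper())
    if match is None:
        raise ValueError(f"Cannot parse size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def cache_needs_cleaning(cache_bytes, maximum_cache_size):
    """Return True when the cache uses more bytes than the limit allows.

    A limit of None means that no absolute limit is set.
    """
    if maximum_cache_size is None:
        return False
    return cache_bytes > parse_size(maximum_cache_size)


print(parse_size("1K"))                           # 1024
print(cache_needs_cleaning(11 * 1024**3, "10G"))  # True
print(cache_needs_cleaning(11 * 1024**3, None))   # False
```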

Maximum-cache-disk-usage

The maximum-cache-disk-usage setting ensures that CliMetLab does not fill your disk. Its value sets the maximum disk usage of the filesystem containing the cache directory. When the disk usage goes above this limit, CliMetLab triggers its cache cleaning mechanism before downloading additional data. The value of maximum-cache-disk-usage is relative (such as “90%” or “100%”).
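The relative limit applies to the whole filesystem, not only to the cache. The following sketch shows one way to measure the usage of the filesystem containing a directory with shutil.disk_usage and to compare it with a threshold such as "90%". The three helper functions are hypothetical and only illustrate the check; CliMetLab may compute disk usage differently.

```python
import shutil


def disk_usage_percent(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # guard against unusual virtual filesystems
        return 0.0
    return 100.0 * usage.used / usage.total


def parse_percent(text):
    """Convert a value such as '90%' into the number 90.0."""
    return float(text.strip().rstrip("%"))


def disk_needs_cleaning(used_percent, maximum_cache_disk_usage):
    """Return True when disk usage is above the relative limit.

    A limit of None means that no relative limit is set.
    """
    if maximum_cache_disk_usage is None:
        return False
    return used_percent > parse_percent(maximum_cache_disk_usage)


print(disk_needs_cleaning(95.0, "90%"))  # True
print(disk_needs_cleaning(50.0, "90%"))  # False
```

In practice the first argument would come from disk_usage_percent(cache_directory), where cache_directory is the value of the cache-directory setting.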

Warning

If your disk is filled by another application, CliMetLab will happily delete its cached data to make room for the other application as soon as it has a chance.

Note

When tweaking the cache settings, it is recommended to set maximum-cache-size to a value below the user disk quota (if applicable) and maximum-cache-disk-usage to None.
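This recommendation can be applied from Python with two calls to the set method shown earlier. The sketch below wraps them in a hypothetical helper, apply_quota_friendly_limits, which accepts any object with a set(name, value) method. FakeSettings is a stand-in used so that the example runs without CliMetLab installed, and the "40G" value is only an illustration.

```python
def apply_quota_friendly_limits(settings, maximum_cache_size):
    """Limit the cache by absolute size only, as the note recommends."""
    settings.set("maximum-cache-size", maximum_cache_size)
    settings.set("maximum-cache-disk-usage", None)


class FakeSettings:
    """Stand-in for a settings object; records the values it is given."""

    def __init__(self):
        self.values = {}

    def set(self, name, value):
        self.values[name] = value


fake = FakeSettings()
apply_quota_friendly_limits(fake, "40G")
print(fake.values)
# {'maximum-cache-size': '40G', 'maximum-cache-disk-usage': None}
```

With CliMetLab installed, passing cml.settings instead of the stand-in should have the same effect, assuming cml.settings.set accepts None for maximum-cache-disk-usage as the note implies.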

Caching settings default values

| Name | Default | Description |
|------|---------|-------------|
| cache-directory | '/tmp/climetlab-docs' | Directory where the downloaded files are cached; ${USER} is the user id. See Caching for more information. |
| maximum-cache-disk-usage | '90%' | Disk usage threshold above which CliMetLab expires older cached entries (% of the full disk capacity). See Caching for more information. |
| maximum-cache-size | None | Maximum disk space used by the CliMetLab cache (e.g. 100G or 2T). |

Other CliMetLab settings are listed in the Settings documentation.