Dataset plugins

A Dataset is a Python class that provides a curated set of data with specific helper functions. CliMetLab has built-in example datasets for demo purposes. See usage details in Dataset (User guide) and implementation in Dataset (Dev guide). Datasets are added with a pip plugin or with yaml files.

Simple datasets using yaml files

Simple datasets rely on an existing built-in data source and cannot be parametrised by users. An example is a single file downloadable from a URL.

---
dataset:
  source: url
  args:
    url: http://download.ecmwf.int/test-data/metview/gallery/temp.bufr

  metadata:
    documentation: Sample BUFR file containing TEMP messages
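As a rough illustration of what such a definition amounts to, the sketch below writes the same content as a Python dictionary and extracts the source name and its arguments. The helper names `source_call` and `load` are hypothetical, and passing the `args` mapping as keyword arguments to `climetlab.load_source` is an assumption about how the yaml is interpreted, not a description of CliMetLab internals.

```python
# The YAML definition above, written out as the equivalent Python dictionary.
spec = {
    "dataset": {
        "source": "url",
        "args": {
            "url": "http://download.ecmwf.int/test-data/metview/gallery/temp.bufr"
        },
        "metadata": {
            "documentation": "Sample BUFR file containing TEMP messages"
        },
    }
}


def source_call(spec):
    """Return the (source name, keyword arguments) described by a definition."""
    dataset = spec["dataset"]
    return dataset["source"], dict(dataset.get("args", {}))


def load(spec):
    """Load the data with CliMetLab (needs climetlab installed and network access)."""
    import climetlab as cml  # imported lazily so the rest runs without it

    name, kwargs = source_call(spec)
    # Assumption: the "args" mapping is passed on as keyword arguments.
    return cml.load_source(name, **kwargs)


name, kwargs = source_call(spec)
print(name, kwargs)
```

Calling `load(spec)` would then be roughly the hand-written counterpart of the yaml file, which is why no user-facing parameters are available for this kind of dataset.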

Complex datasets using a pip plugin

See https://github.com/ecmwf/climetlab-demo-dataset

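  import setuptools

  setuptools.setup(
      name="climetlab-demo-dataset",
      version="0.0.1",
      description="Example climetlab external dataset plugin",
      # Register the class under the "climetlab.datasets" entry point group
      # so that CliMetLab can find it by the name "demo-dataset".
      entry_points={
          "climetlab.datasets": [
              "demo-dataset = climetlab_demo_dataset:DemoDataset",
          ]
      },
  )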
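To make the entry point declaration more concrete, here is a minimal sketch using only the standard library. It splits the specification string from the `setup()` call into its parts and lists the dataset plugins registered in the current environment. The helper names `parse_entry_point` and `installed_dataset_plugins` are hypothetical; how CliMetLab itself performs discovery is not shown here.

```python
from importlib.metadata import entry_points

GROUP = "climetlab.datasets"


def parse_entry_point(spec):
    """Split 'name = module:attribute' into (name, module, attribute)."""
    name, _, target = (part.strip() for part in spec.partition("="))
    module, _, attr = target.partition(":")
    return name, module, attr


def installed_dataset_plugins():
    """Return the sorted names registered in the climetlab.datasets group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later expose a selectable collection.
        selected = eps.select(group=GROUP)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(GROUP, [])
    return sorted(ep.name for ep in selected)


parsed = parse_entry_point("demo-dataset = climetlab_demo_dataset:DemoDataset")
print(parsed)
print(installed_dataset_plugins())
```

The first element is the name users would refer to, while the module and attribute tell Python where the class lives. Once the package is installed with pip, the dataset should be loadable by that name, for example with `climetlab.load_dataset("demo-dataset")`.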

See CliMetLab plugin mechanism.

See an example notebook using an external plugin.

See also the Python documentation on plugins.

Automatic generation of a pip package

To make this easier, a cookiecutter template is available for Dataset plugins. For a simple dataset, you can instead use a yaml file and rely only on the code provided by CliMetLab or other plugins.

pip install cookiecutter
cookiecutter https://github.com/ecmwf-lab/climetlab-cookiecutter/dataset