Data Manipulation¶
Todo
This section is a draft.
Methods provided by CliMetLab data objects¶
A CliMetLab data object (such as a Dataset, a data Source or a Reader) provides methods to access and use its data.
Depending on the data, some of these methods may not be available.
>>> source.to_xarray() # for gridded data
>>> source.to_pandas() # for non-gridded data
>>> source.to_numpy() # when the data is an n-dimensional array
>>> source.to_tfrecord() # Experimental
Todo
Explain fields.to_xarray() and obs.to_pandas().
Explain data[0]
Add here more details about the .to_… methods.
Iterating¶
When a CliMetLab data source or dataset provides a list of fields, it can be iterated over to access each field (in a given order, see below).
Let us get a source of fields from the Climate Data Store (CDS) and iterate through the list; each element is a field.
>>> import climetlab as cml
>>> ds = cml.load_source(
        "cds",
        "reanalysis-era5-single-levels",
        param=["2t", "msl"],
        product_type="reanalysis",
        grid='5/5',
        date=["2012-12-12", "2012-12-13"],
        time=[600, 1200, 1800],
    )
>>> len(ds)
12
>>> for f in ds: print(f)
GribField(2t,None,20121212,600,0,0)
GribField(msl,None,20121212,600,0,0)
GribField(2t,None,20121212,1200,0,0)
GribField(msl,None,20121212,1200,0,0)
GribField(2t,None,20121212,1800,0,0)
GribField(msl,None,20121212,1800,0,0)
GribField(2t,None,20121213,600,0,0)
GribField(msl,None,20121213,600,0,0)
GribField(2t,None,20121213,1200,0,0)
GribField(msl,None,20121213,1200,0,0)
GribField(2t,None,20121213,1800,0,0)
GribField(msl,None,20121213,1800,0,0)
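The listing above contains 12 fields, one per combination of 2 parameters, 2 dates and 3 times. The following sketch reproduces that count and ordering with plain Python only; it makes no CliMetLab call, the tuple layout and variable names are illustrative assumptions, and the order actually returned by a source depends on the data.

```python
from itertools import product

params = ["2t", "msl"]
dates = ["2012-12-12", "2012-12-13"]
times = [600, 1200, 1800]

# Dates vary slowest and parameters fastest, matching the listing above.
expected = [
    (param, date.replace("-", ""), time)
    for date, time, param in product(dates, times, params)
]

print(len(expected))
for item in expected[:3]:
    print(item)
```

The first three tuples correspond to the first three `GribField` lines of the listing.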
Selection with [...]¶
When a CliMetLab data source or dataset provides a list of fields, individual fields and sublists can be accessed by position.
A subset of the list can be created using the standard Python list interface, relying on brackets and slices.
>>> import climetlab as cml
>>> ds = cml.load_source(
        "cds",
        "reanalysis-era5-single-levels",
        param=["2t", "msl"],
        product_type="reanalysis",
        grid='5/5',
        date=["2012-12-12", "2012-12-13"],
        time=[600, 1200, 1800],
    )
>>> len(ds)
12
>>> print(ds[0])
GribField(2t,None,20121212,600,0,0)
>>> for f in ds[0:3]: print(f)
GribField(2t,None,20121212,600,0,0)
GribField(msl,None,20121212,600,0,0)
GribField(2t,None,20121212,1200,0,0)
>>> for f in ds[0:5:2]: print(f)
GribField(2t,None,20121212,600,0,0)
GribField(2t,None,20121212,1200,0,0)
GribField(2t,None,20121212,1800,0,0)
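Slices follow the usual Python `start:stop:step` rules. As a minimal sketch, a plain list of labels can stand in for the field list; the labels below are hypothetical placeholders, not `GribField` objects.

```python
# Stand-in for the first six fields of the listing, in the same order.
fields = ["2t@600", "msl@600", "2t@1200", "msl@1200", "2t@1800", "msl@1800"]

first_three = fields[0:3]    # positions 0, 1, 2
every_other = fields[0:5:2]  # positions 0, 2, 4

print(first_three)
print(every_other)
```

Because the parameters alternate in the list, a step of 2 starting at position 0 picks only the `2t` entries, which is why the last example above prints three `2t` fields.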
Selection with .sel()¶
When a CliMetLab data source or dataset provides a list of fields, the method .sel() filters this list to select a subset of the fields.
The following examples show how to select various subsets from a list of fields.
Once the required fields have been selected, the data of this subset is available through the methods .to_numpy(), .to_pytorch(), .to_xarray(), etc.
The list of fields can be filtered to extract only the fields corresponding to the 2m-temperature parameter with .sel(param="2t"):
>>> import climetlab as cml
>>> ds = cml.load_source(
        "cds",
        "reanalysis-era5-single-levels",
        param=["2t", "msl"],
        product_type="reanalysis",
        grid='5/5',
        date=["2012-12-12", "2012-12-13"],
        time=[600, 1200, 1800],
    )
>>> len(ds)
12
>>> subset = ds.sel(param="2t")
>>> len(subset)
6
>>> for f in subset: print(f)
GribField(2t,None,20121212,600,0,0)
GribField(2t,None,20121212,1200,0,0)
GribField(2t,None,20121212,1800,0,0)
GribField(2t,None,20121213,600,0,0)
GribField(2t,None,20121213,1200,0,0)
GribField(2t,None,20121213,1800,0,0)
The list of fields can be filtered to extract only the fields corresponding to the 12:00 time with .sel(time=1200):
>>> import climetlab as cml
>>> ds = cml.load_source(
        "cds",
        "reanalysis-era5-single-levels",
        param=["2t", "msl"],
        product_type="reanalysis",
        grid='5/5',
        date=["2012-12-12", "2012-12-13"],
        time=[600, 1200, 1800],
    )
>>> len(ds)
12
>>> subset = ds.sel(time=1200)
>>> len(subset)
4
>>> for f in subset: print(f)
GribField(2t,None,20121212,1200,0,0)
GribField(msl,None,20121212,1200,0,0)
GribField(2t,None,20121213,1200,0,0)
GribField(msl,None,20121213,1200,0,0)
Both filters can also be applied simultaneously with .sel(param="2t", time=1200):
>>> import climetlab as cml
>>> ds = cml.load_source(
        "cds",
        "reanalysis-era5-single-levels",
        param=["2t", "msl"],
        product_type="reanalysis",
        grid='5/5',
        date=["2012-12-12", "2012-12-13"],
        time=[600, 1200, 1800],
    )
>>> len(ds)
12
>>> subset = ds.sel(param="2t", time=1200)
>>> len(subset)
2
>>> for f in subset: print(f)
GribField(2t,None,20121212,1200,0,0)
GribField(2t,None,20121213,1200,0,0)
Filtering on multiple values is also possible by providing a list of values, as in .sel(param="2t", time=[600, 1200]):
>>> import climetlab as cml
>>> ds = cml.load_source(
        "cds",
        "reanalysis-era5-single-levels",
        param=["2t", "msl"],
        product_type="reanalysis",
        grid='5/5',
        date=["2012-12-12", "2012-12-13"],
        time=[600, 1200, 1800],
    )
>>> len(ds)
12
>>> subset = ds.sel(param="2t", time=[600, 1200])
>>> len(subset)
4
>>> for f in subset: print(f)
GribField(2t,None,20121212,600,0,0)
GribField(2t,None,20121212,1200,0,0)
GribField(2t,None,20121213,600,0,0)
GribField(2t,None,20121213,1200,0,0)
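The filtering rule used in these examples (every keyword must match, and a list means "any of these values") can be sketched in plain Python. This is an illustrative stand-in, not CliMetLab's implementation: the `sel` function below and the dict-based fields are assumptions made for the example.

```python
def sel(fields, **criteria):
    """Keep the fields whose metadata match every criterion.

    Illustrative stand-in, not CliMetLab's implementation: each field is a
    plain dict, and a criterion is either a single value or a list of values.
    """
    def matches(field):
        for key, wanted in criteria.items():
            allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
            if field[key] not in allowed:
                return False
        return True

    return [f for f in fields if matches(f)]


# Same 12 combinations as the request above, as plain dicts.
fields = [
    {"param": p, "date": d, "time": t}
    for d in (20121212, 20121213)
    for t in (600, 1200, 1800)
    for p in ("2t", "msl")
]

print(len(sel(fields, param="2t")))                    # 6
print(len(sel(fields, time=1200)))                     # 4
print(len(sel(fields, param="2t", time=1200)))         # 2
print(len(sel(fields, param="2t", time=[600, 1200])))  # 4
```

The four counts match the subset lengths shown in the examples above.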
Ordering with .order_by()¶
Merging Data sources¶
Warning
The merger functionality is experimental; the API may change.
Todo
add documentation on merging. merge=concat(). merge=merge().
import climetlab as cml
import xarray as xr


class MyMerger:
    """Custom merger combining several files into one dataset."""

    def __init__(self, *args, **kwargs):
        pass

    def merge(self, paths, **kwargs):
        # paths: the files to combine into a single xarray dataset
        return xr.open_mfdataset(paths)


data = cml.load_source(
    "url-pattern",
    "https://www.example.com/data-{foo}-{bar}-{qux}.csv",
    foo=[1, 2, 3],
    bar=["a", "b"],
    qux="unique",
    merger=MyMerger(),
)
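To see why a merger is needed here, it helps to count the files involved. Assuming the "url-pattern" source builds one URL per combination of the listed values, the request above would cover 3 × 2 × 1 = 6 files. The sketch below computes those combinations with the standard library only; `expand` is a hypothetical helper written for this illustration, not part of CliMetLab.

```python
from itertools import product


def expand(pattern, **values):
    """Return one formatted string per combination of the given values."""
    # Treat a scalar as a one-element list so it takes part in the product.
    lists = {
        key: value if isinstance(value, (list, tuple)) else [value]
        for key, value in values.items()
    }
    keys = list(lists)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(lists[key] for key in keys))
    ]


urls = expand(
    "https://www.example.com/data-{foo}-{bar}-{qux}.csv",
    foo=[1, 2, 3],
    bar=["a", "b"],
    qux="unique",
)

print(len(urls))
print(urls[0])
```

Under that assumption, the `merge` method of the merger would be handed the files for these six URLs and is responsible for combining them into a single object.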