virtualizarr.backend.open_virtual_dataset

virtualizarr.backend.open_virtual_dataset(filepath: str, *, filetype: FileType | str | None = None, group: str | None = None, drop_variables: Iterable[str] | None = None, loadable_variables: Iterable[str] | None = None, decode_times: bool | None = None, cftime_variables: Iterable[str] | None = None, indexes: Mapping[str, Index] | None = None, virtual_backend_kwargs: dict | None = None, reader_options: dict | None = None, backend: type[VirtualBackend] | None = None) → Dataset

Open a file or store as an xarray.Dataset wrapping virtualized zarr arrays.

Some variables can be opened as loadable lazy numpy arrays instead of virtual arrays; the loadable_variables keyword argument controls which ones. By default these are the variables that xarray.open_dataset would create indexes for, i.e. one-dimensional coordinate variables whose name matches the name of their only dimension (also known as “dimension coordinates”). Pandas indexes are also created by default for these loadable variables; pass a value for the indexes keyword argument to control this, or pass indexes={} to avoid creating any xarray indexes.
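The default selection rule described above can be sketched in plain Python. This is only an illustration of the "dimension coordinate" rule, not VirtualiZarr's implementation; the helper name default_loadable_variables and the example file layout are hypothetical.

```python
from typing import Dict, List, Tuple


def default_loadable_variables(var_dims: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return the names of dimension coordinates.

    A dimension coordinate is a one-dimensional variable whose name
    matches the name of its only dimension.
    """
    return [
        name
        for name, dims in var_dims.items()
        if len(dims) == 1 and dims[0] == name
    ]


# Hypothetical file layout: variable name -> tuple of dimension names.
example = {
    "time": ("time",),
    "lat": ("lat",),
    "lon": ("lon",),
    "air": ("time", "lat", "lon"),  # multi-dimensional data variable
    "station_id": ("station",),     # 1D, but not named after its dimension
}

selected = default_loadable_variables(example)
print(selected)  # ['time', 'lat', 'lon']
```

Under this rule, air and station_id would stay virtual unless they are named explicitly in loadable_variables.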

Parameters:
  • filepath – File path to open as a set of virtualized zarr arrays.

  • filetype – Type of file to be opened. Used to determine which kerchunk file format backend to use. Can be one of {‘netCDF3’, ‘netCDF4’, ‘HDF’, ‘TIFF’, ‘GRIB’, ‘FITS’, ‘dmrpp’, ‘kerchunk’}. If not provided, the filetype is inferred automatically from the file’s header bytes.

  • group – Path to the HDF5/netCDF4 group in the given file to open. Given as a str, supported by filetypes “netcdf4”, “hdf5”, and “dmrpp”.

  • drop_variables – Variables in the file to drop before returning.

  • loadable_variables – Variables in the file to open as lazy numpy/dask arrays instead of instances of ManifestArray. Default is to open all variables as virtual variables (i.e. as ManifestArrays).

  • decode_times – Boolean passed through to xarray’s open_dataset. Controls whether time variables are decoded into datetime objects.

  • indexes – Indexes to use on the returned xarray Dataset. Default is None, which will read any 1D coordinate data to create in-memory Pandas indexes. To avoid creating any indexes, pass indexes={}.

  • virtual_backend_kwargs – Dictionary of keyword arguments passed down to this reader. Allows passing arguments specific to certain readers.

  • reader_options – Dict passed into Kerchunk file readers, to allow reading from remote filesystems. Note: Each Kerchunk file reader has distinct arguments, so ensure reader_options match selected Kerchunk reader arguments.
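As a rough illustration of how the keyword arguments above combine, the sketch below assembles them into a dictionary and forwards them to open_virtual_dataset. The helper names build_open_kwargs and open_virtual, the example values, and the file path in the comment are hypothetical and not part of VirtualiZarr. The {"storage_options": ...} layout for reader_options is an assumption about fsspec-based readers and should be checked against the reader actually in use. The import is deferred so that the kwargs helper can be exercised without VirtualiZarr installed.

```python
from typing import Any, Dict, Iterable, Optional


def build_open_kwargs(
    loadable_variables: Optional[Iterable[str]] = None,
    drop_variables: Optional[Iterable[str]] = None,
    create_indexes: bool = True,
    storage_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for open_virtual_dataset (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    if loadable_variables is not None:
        kwargs["loadable_variables"] = list(loadable_variables)
    if drop_variables is not None:
        kwargs["drop_variables"] = list(drop_variables)
    if not create_indexes:
        # An empty mapping asks for no xarray indexes at all.
        kwargs["indexes"] = {}
    if storage_options is not None:
        # Assumption: the reader accepts fsspec-style storage options here.
        kwargs["reader_options"] = {"storage_options": dict(storage_options)}
    return kwargs


def open_virtual(filepath: str, **kwargs: Any):
    """Forward to open_virtual_dataset, importing VirtualiZarr only when called."""
    from virtualizarr import open_virtual_dataset

    return open_virtual_dataset(filepath, **kwargs)


kwargs = build_open_kwargs(
    loadable_variables=["time"],
    create_indexes=False,
    storage_options={"anon": True},
)
print(kwargs)

# With VirtualiZarr installed, a call might look like (path is hypothetical):
# vds = open_virtual("s3://example-bucket/data.nc", **kwargs)
```

Omitting an argument in build_open_kwargs leaves the corresponding key out entirely, so the defaults of open_virtual_dataset described in the parameter list apply.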

Returns:

An xarray Dataset containing a ManifestArray for each virtual variable, and a normal lazily indexed array for each variable in loadable_variables.

Return type:

xarray.Dataset