Release notes#

v1.2.1 (unreleased)#

New Features#

  • Added a .nbytes accessor method which displays the bytes needed to hold the virtual references in memory. (GH167, PR227) By Tom Nicholas.

  • Sync with Icechunk v0.1.0a8 (PR368) By Matthew Iannucci <https://github.com/mpiannucci>. This also lets the to_icechunk method add timestamps as checksums when writing virtual references to an icechunk store. This helps ensure that virtual references are not stale when read back from the store, which can happen if the underlying data has changed since the references were written.
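
The idea behind timestamp checksums can be illustrated with a minimal standard-library sketch. This is not Icechunk's implementation: the helper names `make_reference` and `is_stale` and the `last_updated_at_ns` field are hypothetical, and the sketch assumes local files whose modification time is a reasonable proxy for "the data changed".

```python
import os


def make_reference(path: str, offset: int, length: int) -> dict:
    """Build a byte-range reference and record the file's mtime at write time.

    Hypothetical helper for illustration only.
    """
    return {
        "path": path,
        "offset": offset,
        "length": length,
        # Modification time (nanoseconds) observed when the reference was written.
        "last_updated_at_ns": os.stat(path).st_mtime_ns,
    }


def is_stale(ref: dict) -> bool:
    """Return True if the file was modified after the reference was written."""
    return os.stat(ref["path"]).st_mtime_ns > ref["last_updated_at_ns"]
```

A reader built this way would refuse to serve a chunk whose source file reports a newer modification time than the one stored with the reference, rather than silently returning bytes from a changed file.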

Breaking changes#

  • Passing group=None (the default) to open_virtual_dataset for a file with multiple groups no longer raises an error; it now returns the root group instead. This new behaviour is more consistent with xarray.open_dataset. (GH336, PR338) By Tom Nicholas.

  • Indexes are now created by default for any loadable one-dimensional coordinate variables. A warning is also no longer raised when indexes=None is passed to open_virtual_dataset, and the recommendations in the docs have been updated to match. This also means that xarray.combine_by_coords will now work when the necessary dimension coordinates are specified in loadable_variables. (GH18, PR357, PR358) By Tom Nicholas.

Deprecations#

Bug fixes#

  • Fix bug preventing generating references for the root group of a file when a subgroup exists. (GH336, PR338) By Tom Nicholas.

  • Fix bug in dmrpp reader so _FillValue is included in variables’ encodings. (PR369) By Aimee Barciauskas.

  • Fix bug passing arguments to FITS reader, and test it on Hubble Space Telescope data. (PR363) By Tom Nicholas.

Documentation#

  • Change intro text in readme and docs landing page to be clearer, less about the relationship to Kerchunk, and more about why you would want virtual datasets in the first place. (PR337) By Tom Nicholas.

Internal Changes#

v1.2.0 (5th Dec 2024)#

This release brings a stricter internal model for manifest paths, support for appending to existing icechunk stores, an experimental non-kerchunk-based HDF5 reader, handling of nested groups in DMR++ files, as well as many other bugfixes and documentation improvements.

New Features#

  • Add a virtual_backend_kwargs keyword argument to file readers and to open_virtual_dataset, to allow reader-specific options to be passed down. (PR315) By Tom Nicholas.

  • Added append functionality to to_icechunk (PR272) By Aimee Barciauskas.

Breaking changes#

  • Minimum required version of Xarray is now v2024.10.0. (PR284) By Tom Nicholas.

  • Opening kerchunk-formatted references from disk which contain relative paths now requires passing the fs_root keyword argument via virtual_backend_kwargs. (PR243) By Tom Nicholas.

Deprecations#

Bug fixes#

  • Handle root and nested groups with dmrpp backend (PR265) By Ayush Nag.

  • Fixed bug with writing of dimension_names into zarr metadata. (PR286) By Tom Nicholas.

  • Fixed bug causing CF-compliant variables not to be identified as coordinates (PR191) By Ayush Nag.

Documentation#

  • FAQ answers on Icechunk compatibility, converting from existing Kerchunk references to Icechunk, and how to add a new reader for a custom file format. (PR266) By Tom Nicholas.

  • Clarify which readers actually currently work in FAQ, and temporarily remove tiff from the auto-detection. (GH291, PR296) By Tom Nicholas.

  • Minor improvements to the Contributing Guide. (PR298) By Tom Nicholas.

  • More minor improvements to the Contributing Guide. (PR304) By Doug Latornell.

  • Correct some links to the API. (PR325) By Tom Nicholas.

  • Added links to recorded presentations on VirtualiZarr. (PR313) By Tom Nicholas.

  • Added links to existing example notebooks. (GH329, PR331) By Tom Nicholas.

Internal Changes#

  • Added experimental new HDF file reader which doesn’t use kerchunk, accessible by importing virtualizarr.readers.hdf.HDFVirtualBackend. (PR87) By Sean Harkins.

  • Support downstream type checking by adding py.typed marker file. (PR306) By Max Jones.

  • File paths in chunk manifests are now always stored as absolute URIs. (PR243) By Tom Nicholas.
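
To make the absolute-URI rule and the fs_root requirement for relative paths concrete, here is a rough sketch of one way such normalization could work. It is not VirtualiZarr's actual code: the function name `to_absolute_uri` is hypothetical, and the sketch assumes POSIX-style local paths and a local fs_root.

```python
import posixpath
from typing import Optional
from urllib.parse import quote, urlparse


def to_absolute_uri(path: str, fs_root: Optional[str] = None) -> str:
    """Normalize a path to an absolute URI (hypothetical helper).

    - Paths that already carry a scheme (e.g. s3://, https://) are kept as-is.
    - Absolute local paths become file:// URIs.
    - Relative paths are resolved against fs_root, which must then be given.
    """
    if urlparse(path).scheme:
        return path

    if not posixpath.isabs(path):
        if fs_root is None:
            raise ValueError(
                f"cannot resolve relative path {path!r} without an fs_root"
            )
        path = posixpath.join(fs_root, path)
        if not posixpath.isabs(path):
            raise ValueError(f"fs_root {fs_root!r} must itself be absolute")

    # Collapse "." and ".." segments, then percent-encode unsafe characters.
    return "file://" + quote(posixpath.normpath(path))
```

Storing only absolute URIs means a manifest stays valid regardless of the working directory or the location of the reference file, which is why a relative path with no root to resolve against is treated as an error here.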

v1.1.0 (22nd Oct 2024)#

New Features#

Breaking changes#

  • Serialize valid ZarrV3 metadata and require full compressor numcodecs config (for PR193) By Gustavo Hidalgo.

  • VirtualiZarr’s ZArray, ChunkEntry, and Codec no longer subclass pydantic.BaseModel (PR210)

  • ZArray’s __init__ signature has changed to match zarr.Array’s (PR210)

Deprecations#

  • Deprecates cftime_variables in open_virtual_dataset in favor of decode_times. (PR232) By Raphael Hagen.

Bug fixes#

  • Exclude empty chunks during ChunkDict construction. (PR198) By Gustavo Hidalgo.

  • Fixed regression in fill_value handling for datetime dtypes that made virtual Zarr stores unreadable. (PR206) By Timothy Hodson.

Documentation#

Internal Changes#

  • Refactored internal structure significantly to split up everything to do with reading references from that to do with writing references. (GH229) (PR231) By Tom Nicholas.

  • Refactored readers to consider every filetype as a separate reader, all standardized to present the same open_virtual_dataset interface internally. (PR261) By Tom Nicholas.

v1.0.0 (9th July 2024)#

This release marks VirtualiZarr as mostly feature-complete, in the sense of achieving feature parity with kerchunk’s logic for combining datasets, providing an easier way to manipulate kerchunk references in memory and generate kerchunk reference files on disk.

Future VirtualiZarr development will focus on generalizing and upstreaming useful concepts into the Zarr specification, the Zarr-Python library, Xarray, and possibly some new packages. See the roadmap in the documentation for details.

New Features#

  • Now successfully opens both tiff and FITS files. (GH160, PR162) By Tom Nicholas.

  • Added a .rename_paths convenience method to rename paths in a manifest according to a function. (PR152) By Tom Nicholas.

  • New cftime_variables option on open_virtual_dataset enables encoding/decoding time. (PR122) By Julia Signell.
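
Renaming paths according to a function can be pictured with a small sketch that uses a plain dictionary as a stand-in for a manifest. The dictionary layout (chunk key mapped to path, offset and length) and the helper name `rename_manifest_paths` are assumptions for illustration; this is not the library's `.rename_paths` method.

```python
from typing import Callable, Dict, Union

# Stand-in types: chunk key -> {"path": ..., "offset": ..., "length": ...}
ChunkEntry = Dict[str, Union[str, int]]
Manifest = Dict[str, ChunkEntry]


def rename_manifest_paths(manifest: Manifest, new: Callable[[str], str]) -> Manifest:
    """Return a copy of the manifest with `new` applied to every path.

    Offsets and lengths are left untouched; the input is not mutated.
    """
    return {
        key: {**entry, "path": new(str(entry["path"]))}
        for key, entry in manifest.items()
    }


def local_to_s3(old_path: str) -> str:
    """Example renaming function: point a local file at a (made-up) bucket."""
    filename = old_path.rsplit("/", 1)[-1]
    return f"s3://example-bucket/{filename}"
```

A typical use is to generate references against local copies of files and then repoint them at the cloud location where the same files will eventually live.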

Breaking changes#

Deprecations#

Bug fixes#

  • Ensure that _ARRAY_DIMENSIONS are dropped from variable .attrs. (GH150, PR152) By Tom Nicholas.

  • Ensure that .attrs on coordinate variables are preserved during round-tripping. (GH155, PR154) By Tom Nicholas.

  • Ensure that non-dimension coordinate variables described via the CF conventions are preserved during round-tripping. (GH105, PR156) By Tom Nicholas.

Documentation#

  • Added example of using cftime_variables to usage docs. (GH169, PR174) By Tom Nicholas.

  • Updated the development roadmap in preparation for v1.0. (PR164) By Tom Nicholas.

  • Warn if user passes indexes=None to open_virtual_dataset to indicate that this is not yet fully supported. (PR170) By Tom Nicholas.

  • Clarify that virtual datasets cannot be treated like normal xarray datasets. (GH173) By Tom Nicholas.

Internal Changes#

  • Refactor ChunkManifest class to store chunk references internally using numpy arrays. (PR107) By Tom Nicholas.

  • Mark tests which require network access so that they are only run when --run-network-tests is passed as a command-line argument to pytest. (PR144) By Tom Nicholas.

  • Determine file format from magic bytes rather than name suffix (PR143) By Scott Henderson.
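
Detecting a format from magic bytes rather than the file suffix might look roughly like the sketch below. The signatures are the published leading bytes for each format, but the function names and the set of formats covered are assumptions and do not reflect the library's actual detection code.

```python
from typing import Optional

# Leading byte signatures for a few common scientific formats.
MAGIC_SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "hdf5"),  # HDF5, which also underlies netCDF4
    (b"CDF", "netcdf3"),             # classic netCDF
    (b"II*\x00", "tiff"),            # little-endian TIFF
    (b"MM\x00*", "tiff"),            # big-endian TIFF
    (b"SIMPLE", "fits"),             # FITS primary header keyword
]


def sniff_format(header: bytes) -> Optional[str]:
    """Return a format name based on the first bytes of a file, or None."""
    for signature, name in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the first 8 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

Because the check reads the file contents, a netCDF4 file saved with an unusual or missing extension is still recognized. One caveat of this simplified version is that it only looks at offset 0, whereas the HDF5 format also permits its signature at later offsets when a user block is present.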

v0.1 (17th June 2024)#

v0.1 is the first release of VirtualiZarr!! It contains functionality for using kerchunk to find byte ranges in netCDF files, constructing an xarray.Dataset containing ManifestArray objects, then writing out such a dataset to kerchunk references as either json or parquet.

New Features#

Breaking changes#

Deprecations#

Bug fixes#

Documentation#

Internal Changes#