Icechunk

virtualizarr.parsers.IcechunkParser

Create a ManifestStore from an icechunk repository.

There are two entry points:

  • __call__(url, registry) matches VirtualiZarr's Parser protocol, so this class works with open_virtual_dataset. It uses the registry's obstore to identify the icechunk Storage config, opens its own Repository and a readonly Session, and parses that session.
  • parse_session(session, registry) is the escape hatch — pass an already-open icechunk.Session and skip the URL/Storage round trip. Useful when you already have a session in hand.

Parameters:

  • branch (str | None, default: None ) –

    Branch name to open in __call__. Mutually exclusive with tag / snapshot_id; when none of the three is given, "main" is opened. Ignored by parse_session because the session is already pinned.

  • tag (str | None, default: None ) –

    Tag to open in __call__. Mutually exclusive with branch / snapshot_id.

  • snapshot_id (str | None, default: None ) –

    Snapshot id to open in __call__. Mutually exclusive with branch / tag.

  • group (str | None, default: None ) –

    Optional sub-group path within the icechunk store to use as the root.

  • skip_variables (Iterable[str] | None, default: None ) –

    Names of arrays in the group to exclude.

  • batch_size (int, default: _DEFAULT_BATCH_SIZE ) –

    Per-batch chunk count for the underlying iterator. Default 100,000.
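
The mutual-exclusion rule for branch, tag, and snapshot_id can be sketched as a small helper. This is only an illustration under assumptions: resolve_ref is a hypothetical name that is not part of VirtualiZarr, and the real parser may validate its arguments differently or raise a different error.

```python
from __future__ import annotations


def resolve_ref(
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
) -> dict[str, str]:
    """Hypothetical helper: select exactly one of branch / tag / snapshot_id.

    Returns a single-entry dict naming the chosen reference. Falls back to
    branch "main" when nothing is given, mirroring the documented default.
    """
    # Keep only the references the caller actually supplied.
    given = {
        name: value
        for name, value in (
            ("branch", branch),
            ("tag", tag),
            ("snapshot_id", snapshot_id),
        )
        if value is not None
    }
    if len(given) > 1:
        raise ValueError(
            "branch, tag and snapshot_id are mutually exclusive; "
            f"got {sorted(given)}"
        )
    # An empty dict is falsy, so this yields the "main" default.
    return given or {"branch": "main"}
```

Assuming the session is opened with keyword arguments of the same names, the result could be forwarded as repo.readonly_session(**resolve_ref(tag="v1")).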

Examples:

>>> import icechunk
>>> from virtualizarr import open_virtual_dataset
>>> from virtualizarr.parsers import IcechunkParser
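>>>
>>> # `registry` is assumed to be an existing ObjectStoreRegistry that maps
>>> # the repo URL to an obstore store.
>>> #
>>> # Protocol-conformant path: native chunks are rendered as
>>> # ``f"{url}/chunks/{id}"`` automatically.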
>>> vds = open_virtual_dataset(
...     url="s3://my-bucket/my-repo",
...     registry=registry,
...     parser=IcechunkParser(),
... )
>>>
>>> # Escape hatch — already have a Session. Native-chunks prefix must be
>>> # supplied here since there's no URL.
>>> repo = icechunk.Repository.open(storage=...)
>>> session = repo.readonly_session(branch="dev")
>>> ms = IcechunkParser().parse_session(
...     session,
...     registry=registry,
...     native_chunks_prefix="s3://my-bucket/my-repo/chunks",
... )

__call__

__call__(url: str, registry: 'ObjectStoreRegistry') -> ManifestStore

Protocol-conformant entry point: open the icechunk repo from a URL.

Resolves url against registry to find an obstore, translates that obstore into an icechunk.Storage (currently supports S3, local filesystem, and HTTP backends), opens the repository at the configured branch/tag/snapshot, and parses. Native chunk paths are rendered as f"{url}/chunks/{chunk_id}", where chunks is icechunk's format-constant chunk directory for the repo at that URL.
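
As a minimal sketch of the path rendering described above, the function below reproduces the documented f-string. The name native_chunk_url and the chunk id "EXAMPLE_ID" are hypothetical and only serve the illustration; the parser's internal code may be organised differently.

```python
def native_chunk_url(url: str, chunk_id: str) -> str:
    """Hypothetical helper: location of a native chunk for the repo at `url`.

    Mirrors the documented form f"{url}/chunks/{chunk_id}".
    """
    return f"{url}/chunks/{chunk_id}"


# "EXAMPLE_ID" is a made-up placeholder, not a real icechunk chunk id.
example = native_chunk_url("s3://my-bucket/my-repo", "EXAMPLE_ID")
```

Under this reading, calling the parser with a URL behaves like parse_session with native_chunks_prefix set to f"{url}/chunks".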

parse_session

parse_session(
    session: "icechunk.Session",
    registry: "ObjectStoreRegistry",
    *,
    native_chunks_prefix: str,
) -> ManifestStore

Escape hatch: parse an already-open icechunk Session directly.

Bypasses the URL/Storage translation in __call__. The session's snapshot is used as-is — the parser's branch/tag/snapshot_id constructor args do not apply on this path.

Parameters:

  • session ('icechunk.Session') –

    The open icechunk session to parse.

  • registry ('ObjectStoreRegistry') –

    ObjectStoreRegistry the resulting ManifestStore will use to read chunk data.

  • native_chunks_prefix (str) –

    URL prefix to render icechunk's native (managed) chunk paths under. Native chunks become f"{native_chunks_prefix}/{chunk_id}". Required here — there's no URL for the parser to derive a default from. A single trailing slash is tolerated.
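
The prefix handling can be sketched as follows. This assumes that "a single trailing slash is tolerated" means one trailing "/" is dropped before joining; render_native_chunk is a hypothetical name and "EXAMPLE_ID" a made-up chunk id, neither taken from the library.

```python
def render_native_chunk(native_chunks_prefix: str, chunk_id: str) -> str:
    """Hypothetical helper: join a chunks prefix and a chunk id.

    Drops at most one trailing slash from the prefix, then renders
    f"{native_chunks_prefix}/{chunk_id}" as documented.
    """
    prefix = native_chunks_prefix
    if prefix.endswith("/"):
        prefix = prefix[:-1]  # remove exactly one trailing slash
    return f"{prefix}/{chunk_id}"


# Both spellings of the prefix render the same chunk location.
without_slash = render_native_chunk("s3://my-bucket/my-repo/chunks", "EXAMPLE_ID")
with_slash = render_native_chunk("s3://my-bucket/my-repo/chunks/", "EXAMPLE_ID")
```

Normalising the prefix once, before any chunk ids are joined, keeps every rendered path free of double slashes regardless of how the caller wrote the prefix.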