Icechunk
virtualizarr.parsers.IcechunkParser
Create a ManifestStore from an icechunk repository.
There are two entry points:
- __call__(url, registry) matches VirtualiZarr's Parser protocol, so this class works with open_virtual_dataset. It uses the registry's obstore to identify the icechunk Storage config, opens its own Repository and a read-only Session, and parses.
- parse_session(session, registry) is the escape hatch: pass an already-open icechunk.Session and skip the URL/Storage round trip. Useful when you already have a session in hand.
Parameters:
- branch (str | None, default: None) – Branch name to open in __call__ (default "main"). Ignored by parse_session because the session is already pinned.
- tag (str | None, default: None) – Tag to open in __call__. Mutually exclusive with branch/snapshot_id.
- snapshot_id (str | None, default: None) – Snapshot id to open in __call__. Mutually exclusive with branch/tag.
- group (str | None, default: None) – Optional sub-group path within the icechunk store to use as the root.
- skip_variables (Iterable[str] | None, default: None) – Names of arrays in the group to exclude.
- batch_size (int, default: _DEFAULT_BATCH_SIZE) – Per-batch chunk count for the underlying iterator. Default 100,000.
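The branch/tag/snapshot_id rules above can be sketched as a small resolver. This is only an illustration of the documented behaviour, not the parser's actual validation code: resolve_ref is a hypothetical helper, and the error message is an assumption.

```python
from typing import Optional, Tuple


def resolve_ref(
    branch: Optional[str] = None,
    tag: Optional[str] = None,
    snapshot_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: pick the single ref that __call__ would open."""
    # Keep only the arguments the caller actually supplied.
    given = {
        name: value
        for name, value in (
            ("branch", branch),
            ("tag", tag),
            ("snapshot_id", snapshot_id),
        )
        if value is not None
    }
    # The three options are documented as mutually exclusive.
    if len(given) > 1:
        raise ValueError(
            f"only one of branch/tag/snapshot_id may be set, got {sorted(given)}"
        )
    # With nothing supplied, the documented default is the "main" branch.
    if not given:
        return ("branch", "main")
    return next(iter(given.items()))
```

For instance, resolve_ref(tag="v1") yields ("tag", "v1"), while passing both branch and tag raises ValueError.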
Examples:
>>> import icechunk
>>> from virtualizarr import open_virtual_dataset
>>> from virtualizarr.parsers import IcechunkParser
>>>
>>> # ``registry`` is an ObjectStoreRegistry built beforehand that covers the URL below.
>>> # Protocol-conformant path: native chunks are rendered as
>>> # ``f"{url}/chunks/{id}"`` automatically.
>>> vds = open_virtual_dataset(
... url="s3://my-bucket/my-repo",
... registry=registry,
... parser=IcechunkParser(),
... )
>>>
>>> # Escape hatch — already have a Session. Native-chunks prefix must be
>>> # supplied here since there's no URL.
>>> repo = icechunk.Repository.open(storage=...)
>>> session = repo.readonly_session(branch="dev")
>>> ms = IcechunkParser().parse_session(
... session,
... registry=registry,
... native_chunks_prefix="s3://my-bucket/my-repo/chunks",
... )
__call__
__call__(url: str, registry: 'ObjectStoreRegistry') -> ManifestStore
Protocol-conformant entry point: open the icechunk repo from a URL.
Resolves url against registry to find an obstore, translates
that obstore into an icechunk.Storage (currently supports
S3, local filesystem, and HTTP backends), opens the repository at
the configured branch/tag/snapshot, and parses. Native chunk paths
are rendered as f"{url}/chunks/{chunk_id}", using icechunk's
format-constant chunks directory for the repo at that URL.
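As a minimal sketch of the path rendering described above: native_chunk_url is a hypothetical helper written for illustration, and the chunk id used below is made up. It only reproduces the documented f-string, not the parser's internals.

```python
def native_chunk_url(url: str, chunk_id: str) -> str:
    """Hypothetical helper mirroring the documented f"{url}/chunks/{chunk_id}"."""
    # "chunks" is the fixed directory name the docs describe for native chunks.
    return f"{url}/chunks/{chunk_id}"


# Illustrative values only; "ABC123" is not a real chunk id.
example = native_chunk_url("s3://my-bucket/my-repo", "ABC123")
```

Under these assumptions, example is "s3://my-bucket/my-repo/chunks/ABC123", which is the same location a caller would get on the parse_session path by passing native_chunks_prefix="s3://my-bucket/my-repo/chunks".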
parse_session
parse_session(
session: "icechunk.Session",
registry: "ObjectStoreRegistry",
*,
native_chunks_prefix: str,
) -> ManifestStore
Escape hatch: parse an already-open icechunk Session directly.
Bypasses the URL/Storage translation in __call__. The session's
snapshot is used as-is — the parser's branch/tag/snapshot_id
constructor args do not apply on this path.
Parameters:
- session ('icechunk.Session') – The open icechunk session to parse.
- registry ('ObjectStoreRegistry') – ObjectStoreRegistry the resulting ManifestStore will use to read chunk data.
- native_chunks_prefix (str) – URL prefix to render icechunk's native (managed) chunk paths under. Native chunks become f"{native_chunks_prefix}/{chunk_id}". Required here because there is no URL for the parser to derive a default from. A single trailing slash is tolerated.
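One way the trailing-slash tolerance could work is sketched below. render_native_chunk is a hypothetical name, and dropping exactly one trailing slash with str.removesuffix (Python 3.9+) is an assumption about the behaviour, not the library's confirmed implementation.

```python
def render_native_chunk(native_chunks_prefix: str, chunk_id: str) -> str:
    """Hypothetical helper: join a prefix and a chunk id with a single slash."""
    # Drop at most one trailing slash so both prefix spellings give the same path.
    prefix = native_chunks_prefix.removesuffix("/")
    return f"{prefix}/{chunk_id}"


# Illustrative values only; "ABC123" is not a real chunk id.
without_slash = render_native_chunk("s3://my-bucket/my-repo/chunks", "ABC123")
with_slash = render_native_chunk("s3://my-bucket/my-repo/chunks/", "ABC123")
```

In this sketch both calls produce the same path, so callers need not normalise the prefix themselves.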