Lance Directory Namespace Implementation Spec¶
This document describes how the Lance Directory Namespace implements the Lance Namespace client spec.
Background¶
The Lance Directory Namespace is a catalog that stores tables in a directory structure on any local or remote storage system. For details on the catalog design including V1 (directory listing), V2 (manifest), and compatibility mode, see the Directory Namespace Catalog Spec.
Namespace Implementation Configuration Properties¶
The Lance directory namespace implementation accepts the following configuration properties:
The root property is required and specifies the root directory of the namespace where tables are stored. This can be a local path like /my/dir or a cloud storage URI like s3://bucket/prefix.
The manifest_enabled property controls whether the manifest table is used for tracking tables and namespaces (V2). Defaults to true.
The dir_listing_enabled property controls whether directory scanning is used for table discovery (V1). Defaults to true.
By default, both properties are enabled, which means the implementation operates in Compatibility Mode.
Properties with the storage. prefix are passed directly to the underlying Lance ObjectStore after removing the prefix. For example, storage.region becomes region when passed to the storage layer.
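The prefix handling described above can be sketched as follows. This is a minimal illustration, not the implementation's actual code: the function name split_storage_options and the sample region value are hypothetical.

```python
def split_storage_options(properties: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Separate namespace-level properties from storage.* properties.

    Hypothetical helper: keys starting with "storage." are returned with the
    prefix removed, all other keys are returned unchanged.
    """
    prefix = "storage."
    namespace_props: dict[str, str] = {}
    storage_options: dict[str, str] = {}
    for key, value in properties.items():
        if key.startswith(prefix):
            storage_options[key[len(prefix):]] = value
        else:
            namespace_props[key] = value
    return namespace_props, storage_options


props = {
    "root": "s3://bucket/prefix",
    "manifest_enabled": "true",
    "storage.region": "us-west-2",  # example value, not from the spec
}
namespace_props, storage_options = split_storage_options(props)
```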
Object Mapping¶
Namespace¶
The root namespace is the root directory specified by the root configuration property. This is the base path where all tables are stored.
A child namespace is a logical container tracked in the manifest table. Child namespaces are only supported in V2; V1 treats the root directory as a flat namespace containing only tables. Child namespaces do not correspond to physical subdirectories.
The namespace identifier is a list of strings representing the namespace path. For example, a namespace ["prod", "analytics"] is serialized to prod$analytics when stored in the manifest table's object_id column.
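As an illustration of this serialization, the sketch below joins and splits identifiers on the $ delimiter. The helper names are hypothetical, and rejecting empty components or components that contain $ is an assumption of this sketch rather than a rule stated in this document.

```python
DELIMITER = "$"


def to_object_id(identifier: list[str]) -> str:
    """Serialize an identifier such as ["prod", "analytics"] to "prod$analytics"."""
    for part in identifier:
        # Assumption: components must be non-empty and must not contain "$",
        # otherwise the serialized form could not be split back unambiguously.
        if not part or DELIMITER in part:
            raise ValueError(f"invalid identifier component: {part!r}")
    return DELIMITER.join(identifier)


def from_object_id(object_id: str) -> list[str]:
    """Parse an object_id back into its list of components."""
    return object_id.split(DELIMITER) if object_id else []


namespace_object_id = to_object_id(["prod", "analytics"])
```

The same helpers apply to table identifiers, since a table identifier is the namespace path followed by the table name.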
Namespace properties are stored as JSON in the metadata column of the manifest table. This is only available in V2.
Table¶
A table is a subdirectory containing Lance table data. The directory must contain valid Lance format files including the _versions/ directory with version manifests.
The table identifier is a list of strings representing the namespace path followed by the table name. For example, a table ["prod", "analytics", "users"] represents a table named users in namespace ["prod", "analytics"]. This is serialized to prod$analytics$users when stored in the manifest table's object_id column.
The table location depends on the mode and namespace level:
- In V1 (root namespace only), tables are stored as <table_name>.lance directories
- In V2 with dir_listing_enabled=true and an empty namespace (root level), tables use the <table_name>.lance naming convention for backward compatibility
- In V2 for child namespaces, or when dir_listing_enabled=false, tables are stored as <hash>_<object_id> directories where hash provides entropy for object store throughput
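The location rules above can be sketched as a small decision function. This document does not specify how the hash is computed, so the 8-character SHA-256 prefix used here is purely a placeholder assumption, and the function name table_dir_name is hypothetical.

```python
import hashlib


def table_dir_name(table_id: list[str],
                   dir_listing_enabled: bool = True,
                   manifest_enabled: bool = True) -> str:
    """Return the directory name (relative to root) for a table identifier."""
    name = table_id[-1]
    at_root = len(table_id) == 1
    if not manifest_enabled:
        # V1 only: flat namespace, single-level identifiers.
        if not at_root:
            raise ValueError("V1 supports only single-level table identifiers")
        return f"{name}.lance"
    if dir_listing_enabled and at_root:
        # V2 at root level keeps the .lance convention for backward compatibility.
        return f"{name}.lance"
    object_id = "$".join(table_id)
    # Placeholder hash (assumption): any stable digest would add prefix entropy.
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()[:8]
    return f"{digest}_{object_id}"


root_table = table_dir_name(["users"])
child_table = table_dir_name(["prod", "analytics", "users"])
```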
Table properties are stored in Lance table metadata and can be accessed via the Lance SDK.
Lance Table Identification¶
In a Directory Namespace, a Lance table is identified differently depending on the mode:
In V1, a Lance table is any directory with the .lance suffix (e.g., users.lance/). The directory must contain valid Lance table data to be usable. Only single-level table identifiers (e.g., ["users"]) are supported in this mode.
In V2, a Lance table is identified by a row in the manifest table with object_type="table". The row's location field points to the Lance table directory. Multi-level table identifiers (e.g., ["prod", "analytics", "users"]) are supported.
A valid Lance table directory must be non-empty.
Basic Operations¶
CreateNamespace¶
This operation is only supported in V2. V1 does not support explicit namespace creation since it uses a flat directory structure.
The implementation creates a new namespace using a merge-insert operation on the manifest table:
- Validate the parent namespace exists (if not creating at root level)
- Merge-insert a new row into the manifest table with:
  - object_id set to the namespace identifier (e.g., prod$analytics)
  - object_type set to "namespace"
  - metadata containing the namespace properties as JSON
  - created_at set to the current timestamp
Primary-key deduplication on object_id ensures no duplicate rows are inserted. If a namespace with the same identifier already exists, the operation fails.
Error Handling:
If a namespace with the same identifier already exists, return error code 2 (NamespaceAlreadyExists).
If the parent namespace does not exist (for nested namespaces), return error code 1 (NamespaceNotFound).
If the identifier format is invalid, return error code 13 (InvalidInput).
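The flow and error mapping above can be sketched with an in-memory dictionary standing in for the manifest table. Everything here is illustrative: the NamespaceError class, the create_namespace function, and the identifier validity rule are assumptions, and a real implementation would perform a merge-insert on a Lance table rather than a dictionary update.

```python
import json
import time


class NamespaceError(Exception):
    """Hypothetical error type carrying the numeric error code."""

    def __init__(self, code: int, name: str):
        super().__init__(f"{name} ({code})")
        self.code = code


def create_namespace(manifest: dict[str, dict], identifier: list[str],
                     properties: dict[str, str]) -> None:
    # Assumed validity rule: non-empty components without the "$" delimiter.
    if not identifier or any(not p or "$" in p for p in identifier):
        raise NamespaceError(13, "InvalidInput")
    parent = identifier[:-1]
    if parent:
        row = manifest.get("$".join(parent))
        if row is None or row["object_type"] != "namespace":
            raise NamespaceError(1, "NamespaceNotFound")
    object_id = "$".join(identifier)
    if object_id in manifest:
        # Stands in for primary-key deduplication on object_id.
        raise NamespaceError(2, "NamespaceAlreadyExists")
    manifest[object_id] = {
        "object_type": "namespace",
        "metadata": json.dumps(properties),
        "created_at": time.time(),
    }


manifest: dict[str, dict] = {}
create_namespace(manifest, ["prod"], {"owner": "data-team"})
create_namespace(manifest, ["prod", "analytics"], {})
```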
ListNamespaces¶
This operation lists child namespaces within a parent namespace.
In V1, this operation returns an empty list since namespaces are not supported.
In V2, the implementation queries the manifest table:
- Query for rows where object_type = "namespace"
- Filter to rows where object_id starts with the parent namespace prefix
- Further filter to rows where object_id has exactly one more level than the parent
- Return the list of namespace names (the last component of each identifier)
Error Handling:
If the parent namespace does not exist (V2 only), return error code 1 (NamespaceNotFound).
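The V2 filtering steps can be sketched over a list of row dictionaries. The function name and row layout are hypothetical, and the sketch assumes the parent's existence has already been checked. It appends the $ delimiter to the prefix so that a parent named prod does not match a sibling such as production; sorting the result is a choice made for this example only.

```python
def list_child_namespaces(rows: list[dict], parent: list[str]) -> list[str]:
    """Return names of namespaces exactly one level below parent."""
    prefix = "$".join(parent) + "$" if parent else ""
    depth = len(parent) + 1
    names = []
    for row in rows:
        if row["object_type"] != "namespace":
            continue
        object_id = row["object_id"]
        if not object_id.startswith(prefix):
            continue
        parts = object_id.split("$")
        if len(parts) == depth:
            names.append(parts[-1])  # last component is the namespace name
    return sorted(names)


rows = [
    {"object_id": "prod", "object_type": "namespace"},
    {"object_id": "production", "object_type": "namespace"},
    {"object_id": "prod$analytics", "object_type": "namespace"},
    {"object_id": "prod$analytics$raw", "object_type": "namespace"},
    {"object_id": "prod$users", "object_type": "table"},
]
children_of_prod = list_child_namespaces(rows, ["prod"])
children_of_root = list_child_namespaces(rows, [])
```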
DescribeNamespace¶
This operation is only supported in V2 and returns namespace metadata.
The implementation:
- Query the manifest table for the row with the matching object_id
- Parse the metadata column as JSON
- Return the namespace name and properties
Error Handling:
If the namespace does not exist, return error code 1 (NamespaceNotFound).
DropNamespace¶
This operation is only supported in V2 and removes a namespace.
The implementation:
- Check that the namespace exists in the manifest table
- Query for any child namespaces or tables with identifiers starting with this namespace's prefix
- If any children exist, the operation fails
- Delete the namespace row from the manifest table using the object_id primary key
Error Handling:
If the namespace does not exist, return error code 1 (NamespaceNotFound).
If the namespace contains tables or child namespaces, return error code 3 (NamespaceNotEmpty).
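A minimal sketch of the emptiness check, again using a dictionary (object_id to object_type) in place of the manifest table. The DropError class and drop_namespace function are hypothetical names for illustration.

```python
class DropError(Exception):
    """Hypothetical error type carrying the numeric error code."""

    def __init__(self, code: int, name: str):
        super().__init__(f"{name} ({code})")
        self.code = code


def drop_namespace(manifest: dict[str, str], identifier: list[str]) -> None:
    object_id = "$".join(identifier)
    if manifest.get(object_id) != "namespace":
        raise DropError(1, "NamespaceNotFound")
    child_prefix = object_id + "$"
    # Any row whose object_id starts with "<namespace>$" is a child
    # namespace or table, so the namespace is not empty.
    if any(key.startswith(child_prefix) for key in manifest):
        raise DropError(3, "NamespaceNotEmpty")
    del manifest[object_id]


manifest_rows = {
    "prod": "namespace",
    "prod$users": "table",
    "staging": "namespace",
}
drop_namespace(manifest_rows, ["staging"])
```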
DeclareTable¶
This operation declares a new Lance table, reserving the table name and location without creating actual data files.
The implementation:
- Validate the parent namespace exists (in V2)
- Determine the table location:
  - In V1: <root>/<table_name>.lance
  - In V2 with dir_listing_enabled=true at root level: <root>/<table_name>.lance
  - In V2 for child namespaces or with dir_listing_enabled=false: <root>/<hash>_<object_id>/
- Create a .lance-reserved file at the location to mark the table's existence
- In V2, merge-insert a row into the manifest table with:
  - object_id set to the table identifier
  - object_type set to "table"
  - location set to the table directory path
Primary-key deduplication on object_id ensures no duplicate rows are inserted. If a table with the same identifier already exists, the operation fails.
Error Handling:
If the parent namespace does not exist, return error code 1 (NamespaceNotFound).
If a table with the same identifier already exists, return error code 5 (TableAlreadyExists).
If there is a concurrent creation attempt, return error code 14 (ConcurrentModification).
ListTables¶
This operation lists tables within a namespace.
In V1:
- List all entries in the root directory
- Filter to directories matching the *.lance pattern
- Return the table names (directory names without the .lance suffix)
In V2:
- Query the manifest table for rows where object_type = "table"
- Filter to rows where object_id starts with the namespace prefix
- Further filter to rows where object_id has exactly one more level than the namespace
- Return the list of table names
When both V1 and V2 are enabled (the default Compatibility Mode), the implementation performs both queries and merges results, with manifest entries taking precedence when duplicates exist.
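The merge in Compatibility Mode can be sketched as below for the root namespace. The function name is hypothetical, and this sketch returns only table names, so "precedence" reduces to keeping the manifest entry and skipping a directory entry with the same name.

```python
def merge_table_listings(manifest_tables: list[str],
                         directory_entries: list[str]) -> list[str]:
    """Combine V2 manifest results with V1 directory-listing results."""
    suffix = ".lance"
    merged = list(dict.fromkeys(manifest_tables))  # de-duplicate, keep order
    seen = set(merged)
    for entry in directory_entries:
        if not entry.endswith(suffix):
            continue  # not a Lance table directory
        name = entry[: -len(suffix)]
        if name not in seen:
            merged.append(name)
            seen.add(name)
    return merged


tables = merge_table_listings(
    ["users", "orders"],
    ["users.lance", "events.lance", "README.md"],
)
```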
Error Handling:
If the namespace does not exist (V2 only), return error code 1 (NamespaceNotFound).
DescribeTable¶
This operation returns table metadata including schema, version, and properties.
The implementation:
- Locate the table:
  - In V1, check for the <table_name>.lance directory
  - In V2, query the manifest table for the table location
  - When both V1 and V2 are enabled (the default Compatibility Mode), first check the manifest table, then fall back to checking the .lance directory
- Open the Lance table using the Lance SDK
- Read the table metadata and return:
  - name: The table name
  - schema: The Arrow schema of the table
  - version: The current version number
  - location: The table directory path
Error Handling:
If the parent namespace does not exist, return error code 1 (NamespaceNotFound).
If the table does not exist, return error code 4 (TableNotFound).
If a specific version is requested and does not exist, return error code 11 (TableVersionNotFound).
DeregisterTable¶
This operation deregisters a table from the namespace while preserving its data on storage. The table files remain at their storage location and can be re-registered later using RegisterTable.
In V1:
- Locate the table by checking for the <table_name>.lance directory
- Verify the table exists and is not already deregistered
- Create a .lance-deregistered marker file inside the table directory
- Return the table location for reference
The marker file approach ensures that:
- Table data remains intact at its original location
- The table is excluded from ListTables results
- The table returns TableNotFound for DescribeTable and TableExists operations
- The table can be re-registered by removing the marker file and calling RegisterTable
- DropTable still works on deregistered tables (removes both data and marker file)
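The V1 marker-file approach can be sketched on a local filesystem with pathlib. This is an illustration only: the function names are hypothetical, the directories created here are empty placeholders rather than valid Lance tables, and a real implementation would go through the object store abstraction instead of local paths.

```python
import tempfile
from pathlib import Path

MARKER = ".lance-deregistered"


def deregister_table_v1(root: Path, table_name: str) -> Path:
    """Mark a table as deregistered without touching its data."""
    table_dir = root / f"{table_name}.lance"
    marker = table_dir / MARKER
    if not table_dir.is_dir() or marker.exists():
        # Missing or already deregistered: both map to TableNotFound (4).
        raise FileNotFoundError(f"TableNotFound (4): {table_name}")
    marker.touch()
    return table_dir


def list_tables_v1(root: Path) -> list[str]:
    """List *.lance directories, skipping those that carry the marker file."""
    return sorted(
        p.name[: -len(".lance")]
        for p in root.iterdir()
        if p.is_dir() and p.name.endswith(".lance") and not (p / MARKER).exists()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "users.lance").mkdir()
    (root / "orders.lance").mkdir()
    before = list_tables_v1(root)
    location = deregister_table_v1(root, "users")
    after = list_tables_v1(root)
    data_kept = location.is_dir()
```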
In V2:
- Locate the table by querying the manifest table for the table location
- Remove the table row from the manifest table using the object_id primary key
- Keep the table files at the storage location
- Return the table location and properties for reference
When both V1 and V2 are enabled (the default Compatibility Mode), first check the manifest table, then fall back to checking the .lance directory. If found in manifest, follow V2 behavior; otherwise follow V1 behavior.
Error Handling:
If the parent namespace does not exist, return error code 1 (NamespaceNotFound).
If the table does not exist or is already deregistered, return error code 4 (TableNotFound).
Additional Operations¶
DropTable¶
This operation removes a table and its data.
In V1:
- Locate the table by checking for the <table_name>.lance directory
- Delete the table directory and all its contents from storage
- If deletion fails midway (directory is still non-empty), the drop has failed and should be retried
In V2:
- Locate the table by querying the manifest table for the table location
- Remove the table row from the manifest table using the object_id primary key
- Delete the table directory and all its contents from storage (failure here does not affect the success of the drop since the table is no longer reachable)
When both V1 and V2 are enabled (the default Compatibility Mode), first check the manifest table, then fall back to checking the .lance directory. If found in manifest, follow V2 behavior; otherwise follow V1 behavior.
Error Handling:
If the parent namespace does not exist, return error code 1 (NamespaceNotFound).
If the table does not exist, return error code 4 (TableNotFound).
If there is a file system permission error, return error code 15 (PermissionDenied).
If there is an unexpected I/O error, return error code 18 (Internal).
CreateTableVersion¶
This operation creates a new version entry for a table. It supports put_if_not_exists semantics.
When table version management is not enabled:
- Resolve the table location
- Parse the staging manifest path from the request
- Determine the final manifest path based on the naming scheme (V1 or V2)
- Copy the staging manifest to the final path in the _versions/ directory using put_if_not_exists semantics
- Delete the staging manifest file
- Return the created version info including the final manifest path
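The copy-then-delete steps above can be sketched on a local filesystem, where exclusive file creation (O_CREAT | O_EXCL) serves as a stand-in for put_if_not_exists. Object stores typically provide this through conditional writes instead, so treat this only as an analogue. The function names are hypothetical, and the final file name {version}.manifest is a simplification that ignores the difference between the V1 and V2 naming schemes.

```python
import os
import tempfile
import uuid
from pathlib import Path


def put_if_not_exists(src: Path, dst: Path) -> None:
    """Write src's bytes to dst, failing with FileExistsError if dst exists."""
    data = src.read_bytes()
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    fd = os.open(dst, flags)  # exclusive create: raises if dst already exists
    with os.fdopen(fd, "wb") as f:
        f.write(data)


def finalize_manifest(staging: Path, versions_dir: Path, version: int) -> Path:
    final_path = versions_dir / f"{version}.manifest"
    put_if_not_exists(staging, final_path)
    staging.unlink()  # the staging manifest is deleted only after a successful copy
    return final_path


with tempfile.TemporaryDirectory() as tmp:
    versions_dir = Path(tmp) / "_versions"
    versions_dir.mkdir()
    staging = versions_dir / f"1.manifest-{uuid.uuid4()}"
    staging.write_bytes(b"manifest-bytes")
    final_path = finalize_manifest(staging, versions_dir, 1)
    final_bytes = final_path.read_bytes()
    staging_removed = not staging.exists()

    # A second writer racing on the same version loses.
    other = versions_dir / f"1.manifest-{uuid.uuid4()}"
    other.write_bytes(b"other-bytes")
    try:
        finalize_manifest(other, versions_dir, 1)
        conflict = False
    except FileExistsError:
        conflict = True
```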
When table version management is enabled (V2 with table_version_management=true in __manifest metadata), the directory namespace acts as an external manifest store. The commit process follows these steps:
1. Stage manifest in object storage: The caller writes the new manifest to a staging path (e.g., {table_location}/_versions/{version}.manifest-{uuid}). This staged manifest is not yet visible to readers.
2. Atomically commit to manifest table: Merge-insert a new row into the __manifest table with:
   - object_id set to <table_id>$<version> (e.g., users$1 or ns1$users$1)
   - object_type set to "table_version"
   - metadata containing the JSON-encoded version metadata including the staging manifest path
   Primary-key deduplication on object_id ensures no duplicate rows are inserted. The commit is effectively complete after this step. If this fails, another writer has already committed that version.
3. Finalize in object storage: Copy the staged manifest to the standard location ({table_location}/_versions/{version}.manifest). This makes it discoverable by readers that do not use the manifest table.
4. Update manifest table pointer: Update the metadata in the manifest table row to point to the finalized manifest path, synchronizing both systems.
Error Handling:
If the table does not exist, return error code 4 (TableNotFound).
If the version already exists, return error code 12 (TableVersionAlreadyExists).
If there is a concurrent creation attempt, return error code 14 (ConcurrentModification).
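For the case where table version management is enabled, the commit steps after staging can be sketched with two dictionaries: one standing in for the __manifest table and one for object storage. The function and class names are hypothetical, and the manifest_path key inside the metadata JSON is an assumed layout, not one defined in this document.

```python
import json


class VersionExists(Exception):
    """Hypothetical stand-in for error code 12 (TableVersionAlreadyExists)."""
    code = 12


def commit_table_version(manifest_rows: dict[str, dict], store: dict[str, bytes],
                         table_id: str, table_location: str,
                         version: int, staging_path: str) -> str:
    object_id = f"{table_id}${version}"
    # Step 2: commit point. The membership check stands in for primary-key
    # deduplication during merge-insert.
    if object_id in manifest_rows:
        raise VersionExists(object_id)
    manifest_rows[object_id] = {
        "object_type": "table_version",
        "metadata": json.dumps({"manifest_path": staging_path}),
    }
    # Step 3: copy the staged manifest to the standard location (idempotent).
    final_path = f"{table_location}/_versions/{version}.manifest"
    store.setdefault(final_path, store[staging_path])
    # Step 4: point the manifest table row at the finalized path.
    manifest_rows[object_id]["metadata"] = json.dumps({"manifest_path": final_path})
    return final_path


manifest_rows: dict[str, dict] = {}
store: dict[str, bytes] = {}
# Step 1 is done by the caller: write the manifest to a staging path.
staging_path = "/data/users.lance/_versions/1.manifest-1234"
store[staging_path] = b"manifest-bytes"
final_path = commit_table_version(
    manifest_rows, store, "users", "/data/users.lance", 1, staging_path
)
```

If a writer stops between steps 2 and 4, the row still points at the staging path; the read path described under DescribeTableVersion completes the synchronization.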
BatchCreateTableVersions¶
This operation creates version entries for multiple tables. The batch is atomic only when table version management is enabled.
When table version management is not enabled, this operation iterates through each entry and calls CreateTableVersion for each one. Atomicity is not guaranteed.
When table version management is enabled, the batch commit process follows these steps:
1. Stage manifests in object storage: For each entry, the caller writes the new manifest to a staging path (e.g., {table_location}/_versions/{version}.manifest-{uuid}).
2. Atomically commit to manifest table: Merge-insert all version rows into the __manifest table in a single atomic commit, each with:
   - object_id set to <table_id>$<version>
   - object_type set to "table_version"
   - metadata containing the JSON-encoded version metadata including the staging manifest path
   Primary-key deduplication on object_id ensures no duplicate rows are inserted. The commit is effectively complete after this step. If any version already exists, the entire batch fails.
3. Finalize in object storage: For each entry, copy the staged manifest to the standard location.
4. Update manifest table pointers: Update the metadata in each manifest table row to point to the finalized manifest paths.
Error Handling:
If any table does not exist, return error code 4 (TableNotFound).
If any version already exists, return error code 12 (TableVersionAlreadyExists).
If there is a concurrent modification, return error code 14 (ConcurrentModification).
ListTableVersions¶
This operation lists version entries for a table.
When table version management is not enabled:
- Resolve the table location
- List all files in the _versions/ directory
- Parse version numbers from manifest filenames (handling both V1 and V2 naming schemes)
- Extract metadata from file attributes (size, e_tag, last_modified timestamp)
- Sort results by version number (descending if descending=true)
- Apply pagination using page_token and limit
When table version management is enabled:
- Query the manifest table for rows where:
  - object_type = "table_version"
  - object_id starts with <table_id>$
- Parse the version number from each object_id
- Parse the metadata column as JSON to extract version details
- Sort results by version number (descending if descending=true)
- Apply pagination using page_token and limit
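The manifest-table path above can be sketched as follows. The function name and row layout are hypothetical, and the page token format is not defined in this document: this sketch assumes the token is simply the stringified offset of the next item.

```python
from typing import Optional


def list_table_versions(rows: list[dict], table_id: str, descending: bool = False,
                        page_token: Optional[str] = None,
                        limit: Optional[int] = None) -> tuple[list[int], Optional[str]]:
    prefix = table_id + "$"
    versions = []
    for row in rows:
        if row["object_type"] != "table_version":
            continue
        object_id = row["object_id"]
        if not object_id.startswith(prefix):
            continue
        suffix = object_id[len(prefix):]
        if suffix.isdigit():  # the remainder after "<table_id>$" is the version
            versions.append(int(suffix))
    versions.sort(reverse=descending)
    start = int(page_token) if page_token else 0  # assumed token format
    end = len(versions) if limit is None else start + limit
    next_token = str(end) if end < len(versions) else None
    return versions[start:end], next_token


rows = [
    {"object_id": "users", "object_type": "table"},
    {"object_id": "users$1", "object_type": "table_version"},
    {"object_id": "users$2", "object_type": "table_version"},
    {"object_id": "users$3", "object_type": "table_version"},
    {"object_id": "users2$1", "object_type": "table_version"},
]
first_page, token = list_table_versions(rows, "users", descending=True, limit=2)
second_page, last_token = list_table_versions(
    rows, "users", descending=True, page_token=token, limit=2
)
```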
Error Handling:
If the table does not exist, return error code 4 (TableNotFound).
DescribeTableVersion¶
This operation retrieves details for a specific table version.
When table version management is not enabled:
- Resolve the table location
- Open the Lance dataset at the specified version
- Read the manifest file to extract version metadata
- Return the version information including manifest_path, manifest_size, e_tag, timestamp_millis, and metadata
When table version management is enabled, the read process validates and synchronizes the manifest:
- Query manifest table: Retrieve the manifest path for the requested version from the row with object_id = <table_id>$<version>. If the path matches the expected path based on the naming scheme, synchronization is complete.
- Synchronize to object storage: If the manifest path does not match the expected path based on the naming scheme (i.e., it is a staging path), copy the staged manifest to its final location ({table_location}/_versions/{version}.manifest). This is an idempotent operation.
- Update manifest table: Update the metadata in the manifest table row to reflect the finalized path for future readers.
- Return version information: Return the version information with the finalized manifest path, or return an error if synchronization fails.
Error Handling:
If the table does not exist, return error code 4 (TableNotFound).
If the version does not exist, return error code 11 (TableVersionNotFound).
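The read-side synchronization for the enabled case can be sketched with dictionaries standing in for the manifest table and object storage. The names are hypothetical, and the manifest_path key in the metadata JSON is an assumed layout. The example simulates a writer that committed the manifest table row but stopped before finalizing.

```python
import json


def describe_table_version(manifest_rows: dict[str, dict], store: dict[str, bytes],
                           table_id: str, table_location: str, version: int) -> dict:
    row = manifest_rows.get(f"{table_id}${version}")
    if row is None:
        raise KeyError("TableVersionNotFound (11)")
    meta = json.loads(row["metadata"])
    expected = f"{table_location}/_versions/{version}.manifest"
    if meta["manifest_path"] != expected:
        # The row still points at a staging path: finish the writer's work.
        # setdefault keeps an existing final copy, so repeating this is harmless.
        store.setdefault(expected, store[meta["manifest_path"]])
        meta["manifest_path"] = expected
        row["metadata"] = json.dumps(meta)
    return meta


staged = "/data/users.lance/_versions/2.manifest-5678"
store = {staged: b"manifest-bytes"}
manifest_rows = {
    "users$2": {
        "object_type": "table_version",
        "metadata": json.dumps({"manifest_path": staged}),
    }
}
info = describe_table_version(manifest_rows, store, "users", "/data/users.lance", 2)
```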
BatchDeleteTableVersions¶
This operation deletes multiple version entries for a table.
When table version management is not enabled:
- Resolve the table location
- Delete the manifest files in the _versions/ directory for each specified version
- Return the count of deleted versions
When table version management is enabled:
- Delete the manifest files in the _versions/ directory for each specified version
- Delete rows from the manifest table using the object_id primary key for each specified version
- Return the count of deleted versions
Error Handling:
If the table does not exist, return error code 4 (TableNotFound).
If any specified version does not exist, the operation may either skip it silently or return error code 11 (TableVersionNotFound), depending on the ignore_missing parameter.
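The effect of ignore_missing can be sketched over a set of existing version numbers. The function name, the default value of ignore_missing, and the choice to check for missing versions before deleting anything are all assumptions of this sketch.

```python
def batch_delete_versions(existing: set[int], versions: list[int],
                          ignore_missing: bool = True) -> int:
    """Delete the given versions from existing and return the deleted count."""
    missing = [v for v in versions if v not in existing]
    if missing and not ignore_missing:
        # Assumption: fail before deleting anything when a version is missing.
        raise LookupError(f"TableVersionNotFound (11): {missing}")
    deleted = 0
    for v in versions:
        if v in existing:
            existing.discard(v)
            deleted += 1
    return deleted


existing_versions = {1, 2, 3}
deleted_count = batch_delete_versions(existing_versions, [1, 2, 7])
```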