Skip to content

Branch and Tag Specification

Overview

Lance supports branching and tagging for managing multiple independent version histories and creating named references to specific versions. Branches enable parallel development workflows, while tags provide stable named references for important versions.

Branching

Branch Name

Branch names must follow these validation rules:

  1. Cannot be empty
  2. Cannot start or end with /
  3. Cannot contain consecutive //
  4. Cannot contain .. or \
  5. Segments must contain only alphanumeric characters, ., -, _
  6. Cannot end with .lock
  7. Cannot be named main (reserved for main branch)

Branch Metadata Path

Branch metadata is stored at _refs/branches/{branch-name}.json in the dataset root. Since branch names support hierarchical naming with / characters, the / is URL-encoded as %2F in the filename to distinguish it from directory separators (e.g., bugfix/issue-123 becomes bugfix%2Fissue-123.json):

{dataset_root}/
    _refs/
        branches/
            feature-a.json
            bugfix%2Fissue-123.json  # Note: '/' encoded as '%2F'

Branch Metadata File Format

Each branch metadata file is a JSON file with the following fields:

JSON Key Type Optional Description
parent_branch string Yes Name of the branch this was created from. null indicates branched from main.
parent_version number Version number of the parent branch at the time this branch was created.
create_at number Unix timestamp (seconds since epoch) when the branch was created.
manifest_size number Size of the initial manifest file in bytes.

Branch Dataset Layout

Each branch dataset is technically a shallow clone of the source dataset. Branch datasets are organized using the tree/ directory at the dataset root:

{dataset_root}/
    tree/
        {branch_name}/
            _versions/
                *.manifest
            _transactions/
                *.txn
            _deletions/
                *.arrow
                *.bin
            _indices/
                {UUID}/
                    index.idx

Named branches store their version-specific files under tree/{branch_name}/, resembling the GitHub branch path convention. It uses the branch name as is to form the path, which means / would create a logical subdirectory (e.g., bugfix/issue-123, feature/user-auth):

{dataset_root}/
    tree/
        feature-a/
            _versions/
                1.manifest
                2.manifest
        bugfix/
            issue-123/
                _versions/
                    1.manifest

Tagging

Tag Name

Tag names must follow these validation rules:

  1. Cannot be empty
  2. Must contain only alphanumeric characters, ., -, _
  3. Cannot start or end with .
  4. Cannot end with .lock
  5. Cannot contain consecutive ..

Note that tag names do not support / characters, unlike branch names.

Tag Storage

Tags are stored as JSON files under _refs/tags/ at the dataset root:

{dataset_root}/
    _refs/
        tags/
            v1.0.0.json
            v1.1.0.json
            production.json

Tags are always stored at the root dataset level, regardless of which branch they reference.

Tag File Format

Each tag file is a JSON file with the following fields:

JSON Key Type Optional Description
branch string Yes Branch name being tagged. null or absent indicates main branch.
version number Version number being tagged within that branch.
manifest_size number Size of the manifest file in bytes. Used for efficient manifest loading.