CREATE INDEX¶
Creates a scalar index on a Lance table to accelerate queries.
Spark Extension Required
This feature requires the Lance Spark SQL extension to be enabled. See Spark SQL Extensions for configuration details.
Overview¶
The CREATE INDEX command builds an index on one or more columns of a Lance table. Indexing can improve the performance of queries that filter on the indexed columns. Depending on the index method, Lance Spark either uses a fragment-parallel build path or a driver-coordinated commit flow after parallel executor builds.
Basic Usage¶
The command uses the ALTER TABLE syntax to add an index.
Index Methods¶
The following index methods are supported:
| Method | Description |
|---|---|
zonemap |
Lightweight min/max index for fragment pruning on a scalar column. |
btree |
B-tree index for efficient range queries and point lookups on scalar columns. |
fts |
Full-text search (inverted) index for text search on string columns. |
Options¶
The CREATE INDEX command supports options via the WITH clause to control index creation. These options are specific to the chosen index method.
ZoneMap Options¶
For the zonemap method, the following options are supported:
| Option | Type | Description |
|---|---|---|
rows_per_zone |
Long | The approximate number of rows per zonemap zone. |
num_segments |
Integer | Target number of index segments (upper bound; clamped to fragment count when larger). Each segment covers a batch of fragments. Defaults to min(fragment_count, spark.default.parallelism). |
BTree Options¶
For the btree method, the following options are supported:
| Option | Type | Description |
|---|---|---|
zone_size |
Long | The number of rows per zone in the B-tree index. |
build_mode |
String | Index building mode: 'fragment' builds indexes in parallel by fragment; 'range' sorts data by indexed columns first, then partitions and builds indexes in parallel by partition. Default is 'fragment'. |
rows_per_range |
Long | The number of rows per range when built using range mode. Default is 1000000. |
FTS Options¶
For the fts method, the following options are required:
| Option | Type | Description |
|---|---|---|
base_tokenizer |
String | Tokenizer type: "simple" (whitespace/punctuation) or "ngram". |
language |
String | Language for text processing (e.g., "English"). |
max_token_length |
Integer | Maximum token length (e.g., 40). |
lower_case |
Boolean | Convert text to lowercase. |
stem |
Boolean | Enable stemming to reduce words to root form. |
remove_stop_words |
Boolean | Remove common stop words from index. |
ascii_folding |
Boolean | Normalize accented characters (e.g., 'é' to 'e'). |
with_position |
Boolean | Enable phrase queries. Increases index size. |
For advanced tokenizer configuration, refer to the Lance FTS documentation.
FTS Format Version¶
Lance FTS index format v2 is selected by the Lance runtime environment variable LANCE_FTS_FORMAT_VERSION=2. Configure it on both the Spark driver and executors before creating the index.
Spark SQL currently does not expose a per-index fts_version option. Use USING fts with the normal FTS options shown above; Spark records the index details and version returned by Lance.
Examples¶
Basic Index Creation¶
Create a simple B-tree index on a single column:
Indexing Multiple Columns¶
Create a composite index on multiple columns.
Lightweight Fragment Pruning¶
Create a zonemap index when you want lightweight min/max-based fragment pruning:
Indexing with Options¶
Create an index and specify the zone_size for the B-tree:
Zonemap with Options¶
Create a zonemap index and specify the approximate number of rows per zone:
Full-Text Search Index¶
Create an FTS index on a text column:
Query the indexed column with the SEARCH table function.
Output¶
The CREATE INDEX command returns the following information about the operation:
| Column | Type | Description |
|---|---|---|
fragments_indexed |
Long | The number of fragments that were indexed. |
index_name |
String | The name of the created index. |
When to Use an Index¶
Consider creating an index when:
- You frequently filter a large table on a specific column.
- You want lightweight fragment pruning based on per-zone min/max statistics.
- Your queries involve point lookups or small range scans.
How It Works¶
The CREATE INDEX command operates as follows:
- Index Build Execution: Lance Spark chooses an execution path based on the index method. Methods such as
btree,fts, andzonemapcan build physical index segments in parallel across fragments.zonemappublishes those segments directly as one logical index. Range-modebtreeuses Spark repartitioning and sorted preprocessed data. - Metadata Finalization: Lance Spark merges or commits the resulting index metadata on the driver so the new logical index becomes visible atomically.
- Transactional Commit: A new table version is committed with the new index information. The operation is atomic and ensures that concurrent reads are not affected.
Notes and Limitations¶
- Index Methods: The
zonemap,btree, andftsmethods are supported for scalar index creation. - Zonemap Column Count: Zonemap indexes currently support a single column only. The generic
CREATE INDEXgrammar accepts a column list, but Lance rejects multi-column zonemap creation. - Index Replacement: If you create an index with the same name as an existing one, the old index will be replaced by the new one.