Skip to content

Lance Extension

The Lance extension provides support for managing Lance tables and querying them in SQL, via DuckDB.

Installation

To install and load the extension, run these commands in your DuckDB CLI:

INSTALL lance;
LOAD lance;

If you already have a version of the extension installed locally, run the following command to update it to the latest version:

UPDATE EXTENSIONS;

Examples

All examples below reuse a small dataset with three rows (duck, horse, dragon) and a vector column with representative values. In the real world, you'd have a high-dimensional array generated by an embedding model, and a much larger Lance dataset.

Write to a Lance dataset

Use DuckDB's COPY ... TO ... to materialize query results as a Lance dataset.

COPY (
  SELECT *
  FROM (
    VALUES
      ('duck', 'quack', [0.9, 0.7, 0.1]::FLOAT[]),
      ('horse', 'neigh', [0.3, 0.1, 0.5]::FLOAT[]),
      ('dragon', 'roar', [0.5, 0.2, 0.7]::FLOAT[])
  ) AS t(animal, noise, vector)
) TO './lance_duck.lance' (FORMAT lance, mode 'overwrite');

Query a Lance dataset in DuckDB

Query the data using SQL in DuckDB.

SELECT *
  FROM './lance_duck.lance'
  LIMIT 5;

This returns:

┌─────────┬─────────┬─────────────────┐
│ animal  │  noise  │     vector      │
│ varchar │ varchar │     float[]     │
├─────────┼─────────┼─────────────────┤
│ duck    │ quack   │ [0.9, 0.7, 0.1] │
│ horse   │ neigh   │ [0.3, 0.1, 0.5] │
│ dragon  │ roar    │ [0.5, 0.2, 0.7] │
└─────────┴─────────┴─────────────────┘

Note: The Lance dataset doesn't have to be on local disk. You can query S3 paths directly as follows. To access object store URIs (such as s3://...), configure a TYPE LANCE secret.

CREATE SECRET (
  TYPE LANCE,
  PROVIDER credential_chain,
  SCOPE 's3://bucket/'
);

SELECT *
  FROM 's3://bucket/path/to/lance_duck.lance'
  LIMIT 5;

The extension lets you run fast vector search on the Lance table. This returns the _distance (smaller is closer, so sort in ascending order for nearest neighbors). The example vector here is similar to the query "duck".

SELECT animal, noise, vector, _distance
  FROM lance_vector_search(
    './lance_duck.lance',
    'vector',
    [0.8, 0.7, 0.2]::FLOAT[],
    k = 1,
    prefilter = true
  )
  ORDER BY _distance ASC;
┌─────────┬─────────┬─────────────────┐
│ animal  │  noise  │     vector      │
│ varchar │ varchar │     float[]     │
├─────────┼─────────┼─────────────────┤
│ duck    │ quack   │ [0.9, 0.7, 0.1] │
└─────────┴─────────┴─────────────────┘

From a performance standpoint, the extension pushes down projections, filters and more to the Lance level where possible. This ensures that you get all the performance benefits of using Lance as the vector store.

Run keyword-based BM25 search as shown below. This returns a _score, which is sorted in descending order to get the most relevant results.

SELECT animal, noise, vector, _score
  FROM lance_fts(
    './lance_duck.lance',
    'animal',
    'the brave knight faced the dragon',
    k = 1,
    prefilter = true
  )
  ORDER BY _score DESC;
┌─────────┬─────────┬─────────────────┐
│ animal  │  noise  │     vector      │
│ varchar │ varchar │     float[]     │
├─────────┼─────────┼─────────────────┤
│ dragon  │ roar    │ [0.5, 0.2, 0.7] │
└─────────┴─────────┴─────────────────┘

Just like in the case with vector search, the search operations are happening at the Lance level, although the instructions came in from DuckDB via SQL.

Hybrid search combines vector and FTS scores, returning a _hybrid_score in addition to _distance / _score. To get the most relevant results, sort in descending order.

SELECT animal, noise, vector, _hybrid_score, _distance, _score
  FROM lance_hybrid_search(
    './lance_duck.lance',
    'vector',
    [0.8, 0.7, 0.2]::FLOAT[],
    'animal',
    'the duck surprised the dragon',
    k = 2,
    prefilter = false,
    alpha = 0.5,
    oversample_factor = 4
  )
  ORDER BY _hybrid_score DESC;
┌─────────┬─────────┬─────────────────┐
│ animal  │  noise  │     vector      │
│ varchar │ varchar │     float[]     │
├─────────┼─────────┼─────────────────┤
│ duck    │ quack   │ [0.9, 0.7, 0.1] │
│ dragon  │ roar    │ [0.5, 0.2, 0.7] │
└─────────┴─────────┴─────────────────┘

Organize tables via Lance namespaces

A namespace is an open specification that describes access to and operations on a collection of Lance tables. When you ATTACH a local directory (containing a Lance dataset) as a Lance namespace, you can create new datasets using CREATE TABLE or CREATE TABLE AS SELECT. The dataset is written to <namespace_root>/<table_name>.lance.

ATTACH './lance_duck' AS lance_ns (TYPE LANCE);

CREATE TABLE lance_ns.main.duck_animals AS
  SELECT *
  FROM (
    VALUES
      ('duck', 'quack', [0.9, 0.7, 0.1]::FLOAT[]),
      ('horse', 'neigh', [0.3, 0.1, 0.5]::FLOAT[]),
      ('dragon', 'roar', [0.5, 0.2, 0.7]::FLOAT[])
  ) AS t(animal, noise, vector);

You can then query the namespace as follows:

SELECT count(*) FROM lance_ns.main.duck_animals;
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│      3       │
└──────────────┘

As your table management needs grow, namespaces help keep things organized, by naming tables for easier access.

More DuckDB documentation

Refer to the following pages for more details on each relevant section.

  • SQL Reference: supported SQL operations, syntax, and extension behavior.
  • REST Namespace: connecting DuckDB to a Lance REST Namespace server.
  • Cloud Storage: configuring object-store credentials and client settings.