Lance Extension¶
The Lance extension provides support for managing Lance tables and querying them in SQL, via DuckDB.
Installation¶
To install and load the extension, run these commands in your DuckDB CLI:
If you already have a version of the extension installed locally, run the following command to update it to the latest version:
Examples¶
All examples below reuse a small dataset with three rows (duck, horse, dragon)
and a vector column with representative values. In the real world, you'd have
a high-dimensional array generated by an embedding model, and a much larger Lance dataset.
Write to a Lance dataset¶
Use DuckDB's COPY ... TO ... to materialize query results as a Lance dataset.
COPY (
SELECT *
FROM (
VALUES
('duck', 'quack', [0.9, 0.7, 0.1]::FLOAT[]),
('horse', 'neigh', [0.3, 0.1, 0.5]::FLOAT[]),
('dragon', 'roar', [0.5, 0.2, 0.7]::FLOAT[])
) AS t(animal, noise, vector)
) TO './lance_duck.lance' (FORMAT lance, mode 'overwrite');
Query a Lance dataset in DuckDB¶
Query the data using SQL in DuckDB.
This returns:
┌─────────┬─────────┬─────────────────┐
│ animal │ noise │ vector │
│ varchar │ varchar │ float[] │
├─────────┼─────────┼─────────────────┤
│ duck │ quack │ [0.9, 0.7, 0.1] │
│ horse │ neigh │ [0.3, 0.1, 0.5] │
│ dragon │ roar │ [0.5, 0.2, 0.7] │
└─────────┴─────────┴─────────────────┘
Note: The Lance dataset doesn't have to be on local disk. You can query S3 paths directly as follows. To access object store URIs (such as
s3://...), configure aTYPE LANCEsecret.
CREATE SECRET (
TYPE LANCE,
PROVIDER credential_chain,
SCOPE 's3://bucket/'
);
SELECT *
FROM 's3://bucket/path/to/lance_duck.lance'
LIMIT 5;
Vector search¶
The extension lets you run fast vector search on the Lance table. This returns the _distance (smaller is closer, so sort in ascending order for nearest neighbors). The example vector here is similar to the query "duck".
SELECT animal, noise, vector, _distance
FROM lance_vector_search(
'./lance_duck.lance',
'vector',
[0.8, 0.7, 0.2]::FLOAT[],
k = 1,
prefilter = true
)
ORDER BY _distance ASC;
┌─────────┬─────────┬─────────────────┐
│ animal │ noise │ vector │
│ varchar │ varchar │ float[] │
├─────────┼─────────┼─────────────────┤
│ duck │ quack │ [0.9, 0.7, 0.1] │
└─────────┴─────────┴─────────────────┘
From a performance standpoint, the extension pushes down projections, filters and more to the Lance level where possible. This ensures that you get all the performance benefits of using Lance as the vector store.
Full-text search¶
Run keyword-based BM25 search as shown below. This returns a _score, which
is sorted in descending order to get the most relevant results.
SELECT animal, noise, vector, _score
FROM lance_fts(
'./lance_duck.lance',
'animal',
'the brave knight faced the dragon',
k = 1,
prefilter = true
)
ORDER BY _score DESC;
┌─────────┬─────────┬─────────────────┐
│ animal │ noise │ vector │
│ varchar │ varchar │ float[] │
├─────────┼─────────┼─────────────────┤
│ dragon │ roar │ [0.5, 0.2, 0.7] │
└─────────┴─────────┴─────────────────┘
Just like in the case with vector search, the search operations are happening at the Lance level, although the instructions came in from DuckDB via SQL.
Hybrid search¶
Hybrid search combines vector and FTS scores, returning a _hybrid_score in addition
to _distance / _score. To get the most relevant results, sort in descending order.
SELECT animal, noise, vector, _hybrid_score, _distance, _score
FROM lance_hybrid_search(
'./lance_duck.lance',
'vector',
[0.8, 0.7, 0.2]::FLOAT[],
'animal',
'the duck surprised the dragon',
k = 2,
prefilter = false,
alpha = 0.5,
oversample_factor = 4
)
ORDER BY _hybrid_score DESC;
┌─────────┬─────────┬─────────────────┐
│ animal │ noise │ vector │
│ varchar │ varchar │ float[] │
├─────────┼─────────┼─────────────────┤
│ duck │ quack │ [0.9, 0.7, 0.1] │
│ dragon │ roar │ [0.5, 0.2, 0.7] │
└─────────┴─────────┴─────────────────┘
Organize tables via Lance namespaces¶
A namespace is an open specification that describes access to and operations on a collection of Lance tables.
When you ATTACH a local directory (containing a Lance dataset) as a Lance namespace, you can create new datasets
using CREATE TABLE or CREATE TABLE AS SELECT. The dataset is written to
<namespace_root>/<table_name>.lance.
ATTACH './lance_duck' AS lance_ns (TYPE LANCE);
CREATE TABLE lance_ns.main.duck_animals AS
SELECT *
FROM (
VALUES
('duck', 'quack', [0.9, 0.7, 0.1]::FLOAT[]),
('horse', 'neigh', [0.3, 0.1, 0.5]::FLOAT[]),
('dragon', 'roar', [0.5, 0.2, 0.7]::FLOAT[])
) AS t(animal, noise, vector);
You can then query the namespace as follows:
As your table management needs grow, namespaces help keep things organized, by naming tables for easier access.
More DuckDB documentation¶
Refer to the following pages for more details on each relevant section.
- SQL Reference: supported SQL operations, syntax, and extension behavior.
- REST Namespace: connecting DuckDB to a Lance REST Namespace server.
- Cloud Storage: configuring object-store credentials and client settings.