UPDATE COLUMNS FROM¶
Lance supports updating existing columns in a dataset using data from a source,
matching rows by _rowaddr. Only matching rows are updated; unmatched rows in source are ignored.
This is useful for large-scale batch updates where both target and source tables are very large.
Unlike MERGE INTO, this operation uses Lance's fragment.updateColumns() API,
which preserves _rowaddr and _fragid (no row rewrite) and has lower write amplification.
Spark Extension Required
This feature requires the Lance Spark SQL extension to be enabled. See Spark SQL Extensions for configuration details.
Syntax¶
Usage¶
First, create a temporary view containing the updated column values along with _rowaddr and _fragid:
CREATE TEMPORARY VIEW update_source AS
SELECT _rowaddr, _fragid, 'new_value' as name
FROM users
WHERE id = 2;
ALTER TABLE users UPDATE COLUMNS name FROM update_source;
Update multiple columns at once:
CREATE TEMPORARY VIEW update_source AS
SELECT _rowaddr, _fragid, 200 as value, 'updated' as name
FROM users
WHERE id = 2;
ALTER TABLE users UPDATE COLUMNS value, name FROM update_source;
Note
The source must contain _rowaddr and _fragid columns to address the target dataset's rows.
The columns specified in the UPDATE COLUMNS clause must already exist in the target table.
Comparison with Other Operations¶
| Feature | UPDATE SET | UPDATE COLUMNS FROM | MERGE INTO |
|---|---|---|---|
| Matching | WHERE clause | _rowaddr |
JOIN condition |
| Scope | All matching rows | Selected columns only | Full row operations |
| Write amplification | Rewrites affected rows | Column-level update only | Rewrites affected rows |
Preserves _rowaddr |
No | Yes | No |
| Batch update support | No | Yes | No |