INSERT INTO¶
Add data to existing Lance tables using SQL or DataFrames.
Basic Insert¶
```python
# Create new data
new_data = [
    (8, "Henry", "henry@example.com"),
    (9, "Ivy", "ivy@example.com")
]
new_df = spark.createDataFrame(new_data, ["id", "name", "email"])

# Append to existing table
new_df.writeTo("users").append()

# Alternative: use traditional write API with mode
new_df.write.mode("append").saveAsTable("users")
```
```scala
// Create new data
import spark.implicits._  // needed for .toDF on local collections

val newData = Seq(
  (8, "Henry", "henry@example.com"),
  (9, "Ivy", "ivy@example.com")
)
val newDF = newData.toDF("id", "name", "email")

// Append to existing table
newDF.writeTo("users").append()

// Alternative: use traditional write API with mode
newDF.write.mode("append").saveAsTable("users")
```
```java
// Create new data (schema matches the users table: id BIGINT, name STRING, email STRING)
StructType schema = new StructType()
    .add("id", DataTypes.LongType)
    .add("name", DataTypes.StringType)
    .add("email", DataTypes.StringType);
List<Row> newData = Arrays.asList(
    RowFactory.create(8L, "Henry", "henry@example.com"),
    RowFactory.create(9L, "Ivy", "ivy@example.com")
);
Dataset<Row> newDF = spark.createDataFrame(newData, schema);

// Append to existing table
newDF.writeTo("users").append();

// Alternative: use traditional write API with mode
newDF.write().mode("append").saveAsTable("users");
```
Insert with Column Specification¶
Insert from SELECT¶
Insert with Complex Data Types¶
Writing Vector Data¶
If you created a table with the arrow.fixed-size-list.size property (see CREATE TABLE),
subsequent writes will automatically use FixedSizeList. No additional configuration is needed:
```python
# Table was created with: 'embeddings.arrow.fixed-size-list.size' = '128'
# Subsequent writes automatically use FixedSizeList encoding
# Plain schema WITHOUT metadata - it just works!
data = [(i, [float(j) for j in range(128)]) for i in range(10, 20)]
df = spark.createDataFrame(data, ["id", "embeddings"])
df.writeTo("embeddings_table").append()
```
```java
// Table was created with: 'embeddings.arrow.fixed-size-list.size' = '128'
// Subsequent writes automatically use FixedSizeList encoding
Random random = new Random();
List<Row> rows = new ArrayList<>();
for (int i = 10; i < 20; i++) {
    float[] vector = new float[128];
    for (int j = 0; j < 128; j++) {
        vector[j] = random.nextFloat();
    }
    rows.add(RowFactory.create(i, vector));
}
// schema: id INT, embeddings ARRAY<FLOAT> (as defined at table creation)
Dataset<Row> df = spark.createDataFrame(rows, schema);
df.writeTo("embeddings_table").append();
```
Writing Blob Data¶
If you created a table with the lance.encoding property (see CREATE TABLE),
subsequent writes will automatically use blob encoding. No additional configuration is needed:
```python
# Table was created with: 'content.lance.encoding' = 'blob'
# Subsequent writes automatically use blob encoding
# Plain schema WITHOUT metadata - it just works!
data = [(i, f"Document {i}", bytearray(b"Large content..." * 10000)) for i in range(10, 20)]
df = spark.createDataFrame(data, ["id", "title", "content"])
df.writeTo("documents").append()
```
```java
// Table was created with: 'content.lance.encoding' = 'blob'
// Subsequent writes automatically use blob encoding
List<Row> data = new ArrayList<>();
for (int i = 10; i < 20; i++) {
    byte[] content = new byte[100000];
    Arrays.fill(content, (byte) 0x42);
    data.add(RowFactory.create(i, "Document " + i, content));
}
// schema: id INT, title STRING, content BINARY (as defined at table creation)
Dataset<Row> df = spark.createDataFrame(data, schema);
df.writeTo("documents").append();
```
Writing Large String Data¶
If you created a table with the arrow.large_var_char property (see CREATE TABLE),
subsequent writes will automatically use LargeVarCharVector. No additional configuration is needed:
```java
// Table was created with: 'content.arrow.large_var_char' = 'true'
// Subsequent writes automatically use large string encoding
String longContent = String.join("", Collections.nCopies(100000, "Long content..."));
List<Row> data = new ArrayList<>();
for (int i = 10; i < 20; i++) {
    data.add(RowFactory.create(i, "Article " + i, longContent));
}
// schema: id INT, title STRING, content STRING (as defined at table creation)
Dataset<Row> df = spark.createDataFrame(data, schema);
df.writeTo("articles").append();
```
Write Options¶
These options control how data is written to Lance datasets. They can be set using the .option() method when writing data.
| Option | Type | Default | Description |
|---|---|---|---|
| `write_mode` | String | `append` | Write mode: `append` to add to existing data, `overwrite` to replace existing data. |
| `max_row_per_file` | Integer | - | Maximum number of rows per Lance file. |
| `max_rows_per_group` | Integer | - | Maximum number of rows per row group within a file. |
| `max_bytes_per_file` | Long | - | Maximum size in bytes per Lance file. |
| `data_storage_version` | String | - | Lance file format version: `LEGACY` or `STABLE`. |
| `batch_size` | Integer | 512 | Number of rows per batch during writing. |
| `use_queued_write_buffer` | Boolean | false | Use a pipelined write buffer for improved throughput. |
| `queue_depth` | Integer | 8 | Queue depth for pipelined writes (only used when `use_queued_write_buffer` is `true`). |