Lance: The Open Lakehouse Format for Multimodal AI

What is Lance?

Lance is a modern, open source lakehouse format for multimodal AI. It comprises a file format, a table format, and a catalog spec, which together let you build a complete open lakehouse on top of object storage to power your AI workflows. Lance brings high-performance vector search, full-text search, random access, and feature engineering capabilities to the lakehouse while keeping the existing lakehouse benefits: SQL analytics, ACID transactions, time travel, and integrations with open engines (Apache Spark, Ray, PyTorch, Trino, DuckDB, etc.) and open catalogs (Apache Polaris, Unity Catalog, Apache Gravitino, Hive Metastore, etc.).

Learn more about Lance's technical details by reading our research paper published at VLDB 2025.

Expressive Hybrid Search

Lance enables powerful hybrid search that combines vector similarity, full-text search, and SQL analytics on the same dataset. Each query type is accelerated by a corresponding secondary index, and these indexes are part of the Lance specification.

Run semantic search on embeddings, BM25 search on keywords, and complex SQL predicates, all on a single table through a unified interface.
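To make the idea concrete, here is a toy sketch of hybrid search over a single in-memory table. It is not the Lance API and does not use Lance indexes: the row layout, the `hybrid_search` helper, the simple term-count keyword score (a stand-in for BM25), and the choice of reciprocal rank fusion to merge the two rankings are all assumptions made for illustration.

```python
import math

# Toy in-memory "table": each row carries text, an embedding, and a scalar column.
ROWS = [
    {"id": 0, "text": "red running shoes", "vec": [1.0, 0.0], "price": 40},
    {"id": 1, "text": "blue running jacket", "vec": [0.8, 0.6], "price": 90},
    {"id": 2, "text": "red leather boots", "vec": [0.0, 1.0], "price": 120},
    {"id": 3, "text": "trail running shoes", "vec": [0.9, 0.1], "price": 150},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(text, query):
    """Count query-term occurrences (a simplified stand-in for BM25)."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def hybrid_search(rows, query_vec, query_text, predicate, k=2, rrf_k=60):
    """Filter by predicate, rank by vector and keyword, fuse with RRF."""
    candidates = [r for r in rows if predicate(r)]
    by_vec = sorted(candidates, key=lambda r: cosine(r["vec"], query_vec), reverse=True)
    by_kw = sorted(candidates, key=lambda r: keyword_score(r["text"], query_text), reverse=True)
    fused = {}
    for ranking in (by_vec, by_kw):
        for rank, row in enumerate(ranking, start=1):
            fused[row["id"]] = fused.get(row["id"], 0.0) + 1.0 / (rrf_k + rank)
    # Highest fused score first; break ties by id for determinism.
    return sorted(fused, key=lambda i: (-fused[i], i))[:k]


# Semantic query + keyword query + a SQL-like predicate (price < 130).
top = hybrid_search(ROWS, [1.0, 0.0], "running shoes", lambda r: r["price"] < 130)
```

In a real system each of the three steps would be served by an index rather than a full scan; the sketch only shows how the three query types compose over one table.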

Lightning-fast Random Access

Lance delivers 100x faster random access compared to Parquet or Iceberg. Unlike traditional formats, Lance maintains high performance even when randomly accessing scattered rows across your entire dataset.

With a highly optimized file format plus efficient row addressing and secondary indexes at the table level, you can access individual records across multiple files instantly, making it perfect for real-time ML serving, random sampling, and interactive applications.
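As a rough illustration of why row addressing helps random access, the sketch below resolves scattered rows directly to a file and an offset instead of scanning. It is a toy model, not Lance's implementation: the 64-bit layout (fragment number in the upper 32 bits, row offset in the lower 32 bits), the in-memory `fragments` dictionary, and the helper names are assumptions chosen for this example.

```python
FRAGMENT_BITS = 32  # assumed split for this sketch: upper bits = fragment, lower bits = offset
OFFSET_MASK = (1 << FRAGMENT_BITS) - 1


def make_address(fragment_id, offset):
    """Pack a fragment number and a row offset into one integer address."""
    return (fragment_id << FRAGMENT_BITS) | offset


def split_address(address):
    """Recover (fragment_id, offset) from a packed address."""
    return address >> FRAGMENT_BITS, address & OFFSET_MASK


# Each "fragment" stands in for one data file holding a contiguous run of rows.
fragments = {
    0: ["a0", "a1", "a2"],
    1: ["b0", "b1"],
    2: ["c0", "c1", "c2", "c3"],
}


def take(addresses):
    """Fetch rows by address; each lookup goes straight to one file and one offset."""
    rows = []
    for address in addresses:
        fragment_id, offset = split_address(address)
        rows.append(fragments[fragment_id][offset])
    return rows


# Rows scattered across three different files, requested in arbitrary order.
scattered = [make_address(2, 3), make_address(0, 1), make_address(1, 0)]
picked = take(scattered)
```

The cost of each lookup here depends on the number of rows requested, not on the size of the table, which is the property that matters for sampling and serving workloads.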

Native Multimodal Data Support

Store images, videos, audio, text, and embeddings alongside your traditional tabular data in a single unified format. Lance's blob encoding efficiently handles large binary objects with lazy loading, while optimized vector storage accelerates similarity search.

Perfect for AI/ML workloads where you need to store raw data, ML features, generated captions and embeddings all together for multimodal retrieval and genAI workflows.
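The lazy loading of large binary objects mentioned above can be sketched as follows. This is a conceptual model only, not Lance's blob encoding: the `LazyBlob` class, the (offset, size) descriptors, and the use of `io.BytesIO` as a stand-in for a data file on object storage are all assumptions for illustration.

```python
import io


class LazyBlob:
    """A handle that records where a blob lives and reads it only on demand."""

    def __init__(self, source, offset, size):
        self._source = source
        self.offset = offset
        self.size = size

    def read(self):
        """Seek to the blob and read exactly its bytes."""
        self._source.seek(self.offset)
        return self._source.read(self.size)


# Write three payloads back to back into one "file" and remember their positions.
payloads = [b"<jpeg bytes>", b"<mp4 bytes, much longer>", b"<wav bytes>"]
storage = io.BytesIO()
descriptors = []
for payload in payloads:
    descriptors.append((storage.tell(), len(payload)))
    storage.write(payload)

# Building the handles touches only the small descriptors, not the payload bytes.
blobs = [LazyBlob(storage, offset, size) for offset, size in descriptors]

# Metadata such as size is available without reading any blob content.
total_size = sum(blob.size for blob in blobs)

# Only the requested blob is actually read.
second = blobs[1].read()
```

A scan over the tabular columns can carry these lightweight handles along and defer the expensive reads until a specific image, video, or audio clip is needed.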

Data Evolution > Schema Evolution

Schema evolution in most open table formats is a fast, metadata-only operation. But backfilling column values for existing rows typically requires a full table rewrite. Lance supports data evolution (efficient schema evolution with backfill), making it perfect for ML feature engineering and for managing embeddings and media content.

Adding a new column with data is as simple as writing new Lance files to the Lance table - no need to rewrite your entire dataset.
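The following toy model shows the general shape of adding a column by writing new files rather than rewriting old ones. It is a sketch under stated assumptions, not Lance's table format: the manifest dictionary, the file names, and the `add_column` helper are hypothetical and exist only for this example.

```python
from copy import deepcopy

# A toy "manifest": each fragment lists the data files that hold its columns.
manifest_v1 = {
    "version": 1,
    "fragments": [
        {"id": 0, "files": [{"path": "frag0-base.file", "columns": ["id", "caption"]}]},
        {"id": 1, "files": [{"path": "frag1-base.file", "columns": ["id", "caption"]}]},
    ],
}


def add_column(manifest, column, path_for_fragment):
    """Return a new manifest version with one extra file per fragment.

    Existing file entries are carried over unchanged; only new entries are added.
    """
    new_manifest = deepcopy(manifest)
    new_manifest["version"] = manifest["version"] + 1
    for fragment in new_manifest["fragments"]:
        fragment["files"].append(
            {"path": path_for_fragment(fragment["id"]), "columns": [column]}
        )
    return new_manifest


def columns_of(manifest):
    """List the columns visible in the first fragment of a manifest."""
    columns = []
    for data_file in manifest["fragments"][0]["files"]:
        columns.extend(data_file["columns"])
    return columns


# Backfill an "embedding" column: one new file per fragment, no rewrite of base files.
manifest_v2 = add_column(manifest_v1, "embedding", lambda fid: f"frag{fid}-embedding.file")
base_paths_v2 = [fragment["files"][0]["path"] for fragment in manifest_v2["fragments"]]
```

Because the earlier manifest is left untouched, the older version of the table remains readable, which is also how this kind of design can coexist with time travel.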

Rich Ecosystem Integrations

As an open format, Lance integrates seamlessly with the Python data ecosystem and modern data platforms. Work with your favorite tools including Pandas, Polars, Ray and PyTorch for data processing and machine learning.

Connect with leading query engines like Apache DataFusion, DuckDB, Apache Spark, Trino, and Apache Flink/Fluss to run SQL analytics and distributed processing on your Lance datasets.
