Iceberg Schema Evolution: Drop-Then-Add Is Not a Rename

When Apache Iceberg moved to Adopt on the April 2026 Thoughtworks Technology Radar, the line that gets repeated about it is that schema evolution is "safe" and "in place". Having read the spec end-to-end, I think that claim is true in a precise sense and dangerously misleading in another. Iceberg tracks every column by a numeric id, and that id, not the column name, is the contract between metadata and data files. Once I sat with that, a whole class of "rename" operations stopped looking like renames at all.

This post is the result of digging into the Iceberg evolution docs and writing a small Kotlin program against a local Hadoop catalog. I wanted to see which schema changes the catalog accepts silently, which it rejects, and which it accepts but which then break a downstream reader. The single takeaway I want to leave you with: a DROP COLUMN x followed by ADD COLUMN x is never a rename, even when the new column has the same name and type as the old one. Treating it as one is the most common foot-gun I found.

Why the column id matters more than the column name

Iceberg assigns a unique integer id to every field at creation time and stores that id in both the table metadata and the underlying Parquet (or ORC, or Avro) file metadata. Readers match values to schema by id, not by ordinal position and not by name. The official evolution doc names this directly: "Iceberg uses unique IDs to track each column in a table." That single design choice is the entire safety story.

Two consequences fall out of this design and they are worth stating in plain language.

A rename is a metadata-only operation. The id stays put and the name changes. Every existing data file remains valid, and any query that uses the new name reads the same physical bytes the old name did. Renames in Iceberg cost milliseconds because only a new table metadata file is written; neither manifests nor data files are rewritten.

A drop is also metadata-only, but it cuts the link instead of preserving it. The id is removed from the current schema and is never reassigned, so any future column called x gets a fresh id. Old data files still carry the bytes for the old x. The current schema no longer maps them, though, so they are invisible to query engines reading through the current schema.
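A toy model makes the two consequences concrete. This is plain Kotlin, not Iceberg's API. A data file is assumed to be a map from field id to value, and a reader projects it through whatever schema it holds. Field, DataFile, readById, rename, and drop are hypothetical names invented for this sketch.

```kotlin
// Toy model of id-based column resolution (hypothetical, not the Iceberg API).
// A field is identified by its id; the name is only a label in a schema.
data class Field(val id: Int, val name: String)

// A "data file" stores values keyed by the field id that was live when it was written.
typealias DataFile = Map<Int, Any?>

// Read a file through a schema: every field looks up its value by id, never by name.
// Ids absent from the file come back as null.
fun readById(schema: List<Field>, file: DataFile): Map<String, Any?> =
    schema.associate { field -> field.name to file[field.id] }

// Renaming keeps the id and changes only the label.
fun rename(schema: List<Field>, from: String, to: String): List<Field> =
    schema.map { if (it.name == from) it.copy(name = to) else it }

// Dropping removes the field from the schema; the file itself is not modified.
fun drop(schema: List<Field>, name: String): List<Field> =
    schema.filterNot { it.name == name }
```

Take the schema [Field(1, "event_id"), Field(2, "payload")] and a file mapOf(1 to 42L, 2 to "hello"). Reading through rename(schema, "payload", "body") returns body=hello. Reading through drop(schema, "payload") returns only event_id, yet the "hello" value is still sitting in the file under id 2.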

Put those two together and you get the trap.

The drop-then-add trap

Picture an events table with a column payload of type string, written to for six months. A pipeline owner decides the column should be binary instead of string. There is no widening path from string to binary in the type-promotion rules. Iceberg permits int → long, float → double, and decimal precision increases at a fixed scale (format v3 adds a few more, such as date → timestamp), but nothing that crosses string/binary. So the pipeline owner does the obvious thing in a SQL console:

```sql
ALTER TABLE events DROP COLUMN payload;
ALTER TABLE events ADD COLUMN payload binary;
```

The DDL succeeds. Each statement commits a new table metadata version with a new schema. Neither statement creates a snapshot, and nothing rewrites the underlying data. From that moment forward:

Six months of historical payload values exist in old data files under the original column id.
The current schema has a payload column under a brand-new id.
Every query that reads through the current schema sees NULL for payload in every old file. The new id is not present in those files, and Iceberg fills an absent column with its initial default, which is null unless the new column was created with an initial-default (a format v3 feature).
Time-travel queries that target a snapshot committed before the drop can still see the old payload, because each snapshot records the id of the schema that was current when it was committed.

A query such as SELECT count(*) FROM events WHERE payload IS NOT NULL now returns a number that has suddenly cratered. The data was not lost. The pointer to it was. This is the column-id design working as intended: it prevents the worse outcome of silently reading the old string bytes through the new binary column. But "no silent miscast" is not the same as "no silent regression".
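The count(*) regression can be reproduced in a few lines under toy assumptions. ToyTable, Col, and countNotNull are hypothetical, not Iceberg classes. The model assumes three things:

- Ids only move forward.
- Writers translate names to ids at write time.
- A read resolves names through either the current schema or a schema captured earlier. The captured schema stands in for a time-travel read.

```kotlin
// Hypothetical toy table: schemas are lists of (id, name); files map id -> value.
data class Col(val id: Int, val name: String)

class ToyTable(initial: List<Col>) {
    // Highest id ever assigned; it only increases, so dropped ids are never reused.
    private var lastColumnId = initial.maxOf { it.id }
    val schemas = mutableListOf(initial)          // every schema version committed
    val files = mutableListOf<Map<Int, Any?>>()   // "data files", keyed by field id

    val current: List<Col> get() = schemas.last()

    // Writers resolve names to ids using the schema that is live at write time.
    fun write(row: Map<String, Any?>) {
        files.add(current.associate { col -> col.id to row[col.name] })
    }

    // Metadata-only: a new schema version without the column; files are untouched.
    fun dropColumn(name: String) {
        schemas.add(current.filterNot { it.name == name })
    }

    // A new column always gets a never-used id, even if the name was used before.
    fun addColumn(name: String) {
        lastColumnId += 1
        schemas.add(current + Col(lastColumnId, name))
    }

    // count(*) WHERE <name> IS NOT NULL, resolving the name through the given schema.
    fun countNotNull(name: String, schema: List<Col> = current): Int {
        val col = schema.firstOrNull { it.name == name } ?: return 0
        return files.count { file -> file[col.id] != null }
    }
}
```

After three writes, calling dropColumn("payload") and then addColumn("payload") leaves the re-added column at id 3. countNotNull("payload") then drops from 3 to 0 against the current schema. It still returns 3 when passed the schema captured before the drop.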

What I actually verified, in code

I wrote a single Kotlin file against iceberg-core 1.6.x and a local Hadoop catalog. I re-ran the same program against iceberg-core 1.10.1 in May 2026 to be sure. The column-id behavior described below is unchanged, and the v3 spec preview that landed in March 2026 keeps the same id semantics. The program creates a table, drops a column, adds it back with the same name, and prints the schema after each step. The whole thing is well under 80 lines and runs on a JVM with the Iceberg jars on the classpath.

```kotlin
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.Schema
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.hadoop.HadoopCatalog
import org.apache.iceberg.types.Types
import java.nio.file.Files

fun main() {
    // Throwaway warehouse on the local filesystem; a Hadoop catalog needs no metastore.
    val warehouse = Files.createTempDirectory("iceberg-warehouse").toString()
    val catalog = HadoopCatalog(Configuration(), warehouse)
    val id = TableIdentifier.of("demo", "events")

    // Two columns: event_id (long) and payload, initially stored as a string.
    val initial = Schema(
        Types.NestedField.required(1, "event_id", Types.LongType.get()),
        Types.NestedField.optional(2, "payload", Types.StringType.get())
    )
    val table = catalog.createTable(id, initial, PartitionSpec.unpartitioned())

    // Print each live column together with the field id Iceberg tracks it by.
    fun dump(label: String) {
        println("== $label ==")
        table.refresh()
        table.schema().columns().forEach { f ->
            println("  id=${f.fieldId()}  name=${f.name()}  type=${f.type()}")
        }
    }

    dump("after create")

    // Metadata-only commit: payload's id leaves the current schema; no data is rewritten.
    table.updateSchema()
        .deleteColumn("payload")
        .commit()
    dump("after drop payload")

    // Same name, different type: Iceberg allocates a new field id for this column.
    table.updateSchema()
        .addColumn("payload", Types.BinaryType.get())
        .commit()
    dump("after re-add payload as binary")

    catalog.close()
}
```

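Save it as iceberg_demo.kt and compile it as a regular source file. Run as a .kts script, the fun main() declaration would be defined but never called. Compile with kotlinc -classpath "iceberg-core-1.6.0.jar:iceberg-api-1.6.0.jar:hadoop-common-3.3.6.jar:..." iceberg_demo.kt -d out. Then run it with kotlin -classpath "out:iceberg-core-1.6.0.jar:iceberg-api-1.6.0.jar:hadoop-common-3.3.6.jar:..." Iceberg_demoKt.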

The output makes the trap concrete:

```text
== after create ==
  id=1  name=event_id  type=long
  id=2  name=payload  type=string
== after drop payload ==
  id=1  name=event_id  type=long
== after re-add payload as binary ==
  id=1  name=event_id  type=long
  id=3  name=payload  type=binary
```

The new payload is id 3, not id 2. Any data file written before the drop carries payload under id 2 and is now orphaned from the live schema. A renameColumn("payload", "payload_v2") followed by addColumn("payload", BinaryType.get()) would have produced the same fresh id for the new column but kept id 2 queryable under the new name — a different, deliberate outcome.
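The difference between the two sequences comes down to id allocation. Iceberg table metadata records a last-column-id, the highest id ever assigned. The sketch below assumes a new column always takes the next id after it, which matches the output above. SchemaState and Column are hypothetical illustration types, not Iceberg's UpdateSchema.

```kotlin
// Hypothetical sketch of field-id allocation, not Iceberg's implementation.
data class Column(val id: Int, val name: String)

data class SchemaState(val columns: List<Column>, val lastColumnId: Int) {
    // New columns always take a never-used id, whatever their name.
    fun add(name: String): SchemaState {
        val next = lastColumnId + 1
        return SchemaState(columns + Column(next, name), next)
    }

    // Removes the column from the schema; lastColumnId is left as is, so the id is never reused.
    fun drop(name: String): SchemaState =
        copy(columns = columns.filterNot { it.name == name })

    // Keeps the id and changes only the name.
    fun rename(from: String, to: String): SchemaState =
        copy(columns = columns.map { if (it.name == from) it.copy(name = to) else it })

    // Ids readable through this schema; any other ids in old files are invisible.
    fun liveIds(): Set<Int> = columns.map { it.id }.toSet()
}
```

Start from event_id=1, payload=2, lastColumnId=2. Calling drop("payload").add("payload") leaves live ids {1, 3}. Calling rename("payload", "payload_v2").add("payload") leaves {1, 2, 3}, with id 2 still readable as payload_v2.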

What is genuinely free, and what is not

The Iceberg evolution doc lists the operations the format guarantees as side-effect-free metadata changes: add, drop, rename, update (widen type), and reorder. The guarantees behind those words are precise: added columns never read existing values from another column; dropping a column never changes any other column; updating a column never changes any other column; reordering never changes the values associated with a name. Each statement is a property of the column-id mapping, not a promise about what application code does with the result.

Three things sit outside that safety net and bit me in my own testing.

The first is type promotion. Widening is allowed but bounded: the spec lists the legal moves and forbids the rest. Anything that loses precision (long → int, double → float, decimal narrowing) is rejected. Changing a nullable column to required is also a breaking change, because old files may carry nulls. The Java UpdateSchema API refuses it unless allowIncompatibleChanges() is called first. Read the type-promotion table in the spec before proposing a change; it is shorter than expected.
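Here is a minimal sketch of the promotion check. It assumes only the format v2 moves (int → long, float → double, decimal precision increase at the same scale) and ignores any v3 additions. ColType, canPromote, and canRequire are hypothetical names; the authoritative rules live in the spec's type-promotion table.

```kotlin
// Hypothetical type model covering only the types this post discusses.
sealed interface ColType {
    object Int32 : ColType
    object Int64 : ColType
    object Float32 : ColType
    object Float64 : ColType
    object Str : ColType
    object Bin : ColType
    data class Decimal(val precision: Int, val scale: Int) : ColType
}

// Widening allowed by this sketch: int -> long, float -> double,
// decimal(P, S) -> decimal(P', S) with P' > P. Everything else is rejected.
fun canPromote(from: ColType, to: ColType): Boolean = when {
    from == to -> true // no change at all
    from == ColType.Int32 && to == ColType.Int64 -> true
    from == ColType.Float32 && to == ColType.Float64 -> true
    from is ColType.Decimal && to is ColType.Decimal ->
        to.scale == from.scale && to.precision > from.precision
    else -> false
}

// Optional -> required is a tightening, not a widening: older files may hold nulls,
// so this sketch only allows it when incompatible changes are explicitly permitted.
fun canRequire(currentlyOptional: Boolean, allowIncompatibleChanges: Boolean): Boolean =
    !currentlyOptional || allowIncompatibleChanges
```

canPromote(ColType.Decimal(10, 2), ColType.Decimal(18, 2)) is true. Changing the scale, narrowing, or crossing string/binary all return false.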

The second is partition spec evolution. Iceberg lets the partition spec change without rewriting old data. Queries use "split planning": each historical partition layout is planned separately, using the filter derived for that layout's spec. That is a real feature, but it interacts badly with column drops. Iceberg issue #10487 documents a case where adding a column with the same name as a previously dropped partition key fails on some versions. Issue #5676 records a v2-table NPE on every subsequent operation after dropping an old partition column. Treat columns that have ever participated in a partition spec as a separate, more careful category.
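Split planning can be sketched with two layouts of the same table: files written under an original day(ts) spec, and files written after the spec evolved to hour(ts). This is an illustration under simplifying assumptions, not Iceberg's planner. Timestamps are whole hours since the epoch, transforms are plain functions, and Spec, FileEntry, and plan are hypothetical names. Each spec projects the same ts range into its own partition space and prunes only its own files.

```kotlin
// Hypothetical model: a spec has an id and a transform from ts (in hours) to a partition value.
data class Spec(val id: Int, val transform: (Long) -> Long)

// A data file remembers which spec it was written under and its partition value.
data class FileEntry(val path: String, val specId: Int, val partition: Long)

// Plan each spec separately: project the inclusive range [lo, hi] into that spec's
// partition space, then keep only that spec's files whose partition falls inside.
// Projecting the endpoints is valid here only because both transforms are monotonic.
fun plan(files: List<FileEntry>, specs: List<Spec>, lo: Long, hi: Long): List<String> =
    specs.flatMap { spec ->
        val pLo = spec.transform(lo)
        val pHi = spec.transform(hi)
        files.filter { it.specId == spec.id && it.partition in pLo..pHi }.map { it.path }
    }
```

Take a query over hours 30 through 55. The day spec projects the range to days 1..2 and keeps the day-1 file. The hour spec keeps the hour-50 file and prunes hour 70. Pruning only chooses candidate files; rows are still filtered against the original predicate.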

The third is downstream blast radius. Inside Iceberg the rename is free. Outside Iceberg, every consumer that hard-codes the old column name — a Trino view, a Flink job, a dbt model, a Python notebook — breaks at the first query after the rename. A CDC sink that reads Debezium events keyed by name and applies them to Iceberg by name will silently drop fields whose case or spelling no longer matches. The id-based safety lives in the engine. The name-based contract lives in every job and dashboard around it.

A short rubric I now use

When I look at a proposed Iceberg schema change, I ask four questions in order.

Is the change in the allowed list (add, drop, rename, widen, reorder)? If not, expect a rejection or plan a table-level migration.
Does the column appear in the current or any historical partition spec? If yes, do not drop it without re-reading the open issues for the version in use.
Does any external consumer reference the column by name? If yes, the rename is not free for them; coordinate the deploy.
Am I tempted to drop and re-add a column to "change its type"? If yes, stop. Add the new column under a new name, backfill, then drop the old one in a later commit.
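The four questions can also be written down as a checklist function. Everything here is hypothetical: ChangeKind, ProposedChange, and review are illustrative names. The answers to each question are assumed to be supplied as booleans rather than discovered from the catalog.

```kotlin
// Hypothetical encoding of the four-question rubric.
enum class ChangeKind { ADD, DROP, RENAME, WIDEN, REORDER, OTHER }

data class ProposedChange(
    val kind: ChangeKind,
    val everPartitioned: Boolean,        // in the current or any historical partition spec
    val externalNameConsumers: Boolean,  // views, jobs, models, notebooks that use the name
    val readdsDroppedName: Boolean       // a drop-then-add of the same name to "change its type"
)

// Returns one note per rubric question that raises a concern; empty means no flags.
fun review(change: ProposedChange): List<String> {
    val notes = mutableListOf<String>()
    if (change.kind == ChangeKind.OTHER)
        notes.add("not an allowed evolution: expect rejection or plan a table-level migration")
    if (change.kind == ChangeKind.DROP && change.everPartitioned)
        notes.add("partition column: re-read the open issues for this Iceberg version first")
    if (change.kind == ChangeKind.RENAME && change.externalNameConsumers)
        notes.add("external consumers use the name: coordinate the deploy")
    if (change.readdsDroppedName)
        notes.add("drop-then-add is not a rename: add a new column, backfill, drop the old one later")
    return notes
}
```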

The Iceberg model is stronger than name-based or position-based formats — the column id closes the worst silent-corruption holes. The trap is in applying that strength to the wrong mental model and assuming "in-place" means "history-preserving". It does not. It means "the live schema can move without rewriting files", which is a different and weaker guarantee.

When in doubt, treat schema changes the way one would treat a database migration: write the new column alongside the old one, dual-write for a window, switch readers, then retire. Iceberg makes each of those steps cheap. It does not make any of them automatic.

When to lean on Iceberg's evolution rules

Adding optional columns with a sensible default.
Renaming columns when every downstream consumer lives in the same repo and can ship in the same commit.
Widening numeric types within the documented promotion table.
Reordering columns for ergonomic reasons inside a struct.

When to reach for a column-by-column migration instead

Any type change outside the promotion table, including string ↔ binary and any narrowing.
Touching a column that is currently or was ever part of a partition spec.
Renames that reach external systems by name (CDC sinks, BI tools, foreign warehouses).
Anything that "just" looks like a drop-and-add of the same name.
