Schema Evolution¶
Schema evolution lets smelt automatically migrate incremental tables when your model's output schema changes. Instead of dropping and recreating a table from scratch, smelt compares the deployed schema against the new inferred schema and generates the minimal set of DDL statements to bring the table up to date.
How it works¶
When an incremental model runs, smelt:
- Compares schemas -- parses both the deployed and inferred column types into structured representations and diffs them recursively.
- Classifies changes -- each change is categorized (column added, type widened, struct field added, etc.) and checked for safety.
- Plans operations -- safe changes produce ALTER TABLE statements; unsafe changes trigger a full refresh.
- Executes DDL -- the backend-specific DDL is generated and run. DuckDB, Spark+Delta, and Spark+Parquet each have their own code paths.
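For example, if a run adds a nullable column and widens a scalar type, the planned DDL on DuckDB might look like the following sketch (the table and column names are illustrative, not smelt's exact output):

```sql
-- new nullable column in the inferred schema
ALTER TABLE my_model ADD COLUMN email VARCHAR;
-- INTEGER widened to BIGINT
ALTER TABLE my_model ALTER COLUMN amount TYPE BIGINT;
```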
Configuration¶
Schema evolution is configured in the model's frontmatter or in smelt.yml.
Frontmatter example¶
---
materialization: incremental
schema_evolution:
  strategy: alter_and_backfill
  columns:
    status:
      default: "'pending'"
    metadata:
      default: "STRUCT_PACK(version := 1, active := TRUE)"
---
SELECT
  id,
  status,
  metadata
FROM smelt.ref('upstream_model')
smelt.yml example¶
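The same settings can be declared in smelt.yml. A sketch, assuming they sit under the per-model models key shown elsewhere on this page (the model name my_model is illustrative):

```yaml
models:
  my_model:
    schema_evolution:
      strategy: alter_and_backfill
      columns:
        status:
          default: "'pending'"
```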
Configuration fields¶
| Field | Type | Default | Description |
|---|---|---|---|
| schema_evolution.strategy | string | alter_and_backfill | How to handle schema changes. alter_and_backfill uses ALTER TABLE when possible; full_refresh always drops and recreates. |
Per-column fields¶
Per-column metadata is declared under the columns key:
| Field | Type | Description |
|---|---|---|
| default | string | SQL expression used as the DEFAULT value when adding a NOT NULL column via ALTER TABLE. Examples: "0", "'unknown'", "NULL", "STRUCT_PACK(a := 0)". |
| backfill | string | SQL expression used in an UPDATE statement to backfill existing rows after a column is added. Example: "CASE WHEN status IS NULL THEN 'unknown' ELSE status END". |
Note
The default value is a raw SQL expression, not a YAML value. To set a string default, wrap it in SQL quotes: default: "'pending'". For numeric defaults, just use the number: default: "0".
Safe vs unsafe changes¶
smelt classifies every schema change as safe (can be handled with ALTER TABLE) or unsafe (requires a full table refresh).
Safe changes¶
| Change | What happens |
|---|---|
| Add a nullable column | ALTER TABLE ADD COLUMN |
| Add a NOT NULL column with default | ALTER TABLE ADD COLUMN ... DEFAULT expr |
| Widen a scalar type (e.g., INTEGER to BIGINT) | ALTER TABLE ALTER COLUMN TYPE |
| Add a nullable field to a struct | Backend-specific DDL (see backend matrix) |
| Widen a type inside a struct | ALTER TABLE ALTER COLUMN TYPE with the full struct type |
| Widen array element type (e.g., INTEGER[] to BIGINT[]) | ALTER TABLE ALTER COLUMN TYPE |
| Widen map value type | ALTER TABLE ALTER COLUMN TYPE |
| Change nullability (NOT NULL to nullable) | ALTER TABLE ALTER COLUMN DROP NOT NULL |
Unsafe changes (require full refresh)¶
| Change | Why |
|---|---|
| Remove a column | Data loss. Allowed with --allow-column-removal. |
| Narrow a type (e.g., BIGINT to INTEGER) | Data truncation. |
| Change a map's key type | No safe migration path. |
| Reorder struct fields | Positional storage mismatch. |
| Change between incompatible types (e.g., struct to scalar) | No meaningful migration. |
| Add a NOT NULL column without default | Existing rows would violate the constraint. |
When an unsafe change is detected and the strategy is alter_and_backfill, smelt blocks execution and reports an error explaining the change. Use --allow-full-refresh to permit smelt to drop and recreate the table.
Complex type examples¶
Adding a field to a struct column¶
Your model previously produced:
-- v1: STRUCT(name VARCHAR, age INTEGER)
SELECT struct_pack(name := name, age := age) AS profile FROM ...
You add a new field:
-- v2: STRUCT(name VARCHAR, age INTEGER, email VARCHAR)
SELECT struct_pack(name := name, age := age, email := email) AS profile FROM ...
smelt detects the StructFieldAdded change and runs:
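On DuckDB, the generated DDL might look like this sketch (the table name my_model is illustrative):

```sql
-- add the new nested field to the existing struct column
ALTER TABLE my_model ADD COLUMN profile.email VARCHAR;
```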
Existing rows get NULL for the new field.
Widening a type inside a struct¶
Change a field from INTEGER to BIGINT:
DuckDB handles the INTEGER to BIGINT cast automatically inside the struct.
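A DuckDB-flavored sketch of such a statement, restating the full struct type (table name illustrative):

```sql
ALTER TABLE my_model ALTER COLUMN profile TYPE STRUCT(name VARCHAR, age BIGINT);
```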
Spark cannot ALTER COLUMN TYPE with USING expressions, so smelt performs a table rewrite:
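One plausible shape of such a rewrite in Spark SQL (names illustrative, not smelt's exact output):

```sql
-- rebuild the table, casting the widened field inside the struct
CREATE OR REPLACE TABLE my_model AS
SELECT
  id,
  named_struct('name', profile.name, 'age', CAST(profile.age AS BIGINT)) AS profile
FROM my_model;
```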
Parquet files contain the original types and cannot be rewritten in place. This requires --allow-full-refresh.
Adding a field to an array-of-structs¶
Your column is STRUCT(id INTEGER, name VARCHAR)[] and you add a score field:
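On DuckDB this is expressed by restating the full column type, per the backend matrix below. A sketch, assuming the new score field is a DOUBLE (table and column names illustrative):

```sql
ALTER TABLE my_model ALTER COLUMN items
  TYPE STRUCT(id INTEGER, name VARCHAR, score DOUBLE)[];
```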
Map value evolution¶
Change map value type from INTEGER to BIGINT:
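A DuckDB-flavored sketch, assuming VARCHAR keys (table and column names illustrative):

```sql
ALTER TABLE my_model ALTER COLUMN counts TYPE MAP(VARCHAR, BIGINT);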
Specifying defaults for complex types¶
Use SQL expressions for complex type defaults:
columns:
  metadata:
    default: "STRUCT_PACK(status := 'unknown', count := 0)"
  tags:
    default: "[]::VARCHAR[]"
  settings:
    default: "MAP {}"
  scores:
    default: "ARRAY[1, 2, 3]"
Multi-step evolution¶
Multiple changes to the same struct are combined into a single ALTER:
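For example, if a column evolves from STRUCT(a INTEGER) to STRUCT(a BIGINT, c VARCHAR), a single DuckDB-style statement can cover both changes (names illustrative):

```sql
ALTER TABLE my_model ALTER COLUMN payload TYPE STRUCT(a BIGINT, c VARCHAR);
```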
smelt detects both the type widening (a: INTEGER -> BIGINT) and the field addition (c) and handles them in one operation.
Backend capability matrix¶
Not all backends support the same schema evolution operations. The table below shows what each backend can handle natively vs. what requires a fallback.
| Operation | DuckDB | Spark + Delta | Spark + Parquet |
|---|---|---|---|
| Add nullable column | ALTER TABLE | ALTER TABLE | ALTER TABLE |
| Add NOT NULL column (with default) | ALTER TABLE | ALTER TABLE | Full refresh |
| Remove column | ALTER TABLE | ALTER TABLE (column mapping) | Full refresh |
| Widen scalar type | ALTER COLUMN TYPE | ALTER COLUMN TYPE (safe widenings) | Full refresh (most types) |
| Add struct field (nullable) | ADD COLUMN col.field | ADD COLUMNS (col.field) | ADD COLUMNS (col.field) (metastore) |
| Remove struct field | DROP COLUMN col.field | DROP COLUMN (column mapping) | Full refresh |
| Widen type in struct | ALTER COLUMN TYPE (full struct) | Table rewrite | Full refresh |
| Widen array element type | ALTER COLUMN TYPE | Table rewrite | Full refresh |
| Add field to array-of-structs | ALTER COLUMN TYPE (full type) | mergeSchema write | mergeSchema write |
| Widen map value type | ALTER COLUMN TYPE | Table rewrite | Full refresh |
| Change map key type | Full refresh | Full refresh | Full refresh |
| Backfill expression (UPDATE) | UPDATE statement | UPDATE statement | Full refresh |
Recommendations¶
- DuckDB has the most complete schema evolution support. All safe changes can be handled with ALTER TABLE.
- Spark + Delta supports most operations. Use Delta when you need struct field removal (requires column mapping) or type widening inside nested types (via table rewrite).
- Spark + Parquet is the most limited. Consider switching to Delta format if you need frequent schema changes. Many operations that are safe on Delta require --allow-full-refresh on Parquet.
Table format configuration¶
Spark targets default to Delta format. You can override per-target or per-model:
# Per-target
targets:
  spark_prod:
    type: spark
    connect_url: sc://localhost:15002
    schema: main
    format: delta  # or "parquet"

# Per-model override
models:
  legacy_table:
    format: parquet
See Project Configuration for full details.
The --allow-full-refresh flag¶
When smelt detects a schema change that cannot be handled with ALTER TABLE, it blocks execution by default:
Error: Schema evolution requires full refresh for model 'my_model':
Parquet format does not support nested type widening.
Consider using Delta format or run with --allow-full-refresh.
Pass --allow-full-refresh to permit smelt to drop and recreate the table:
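For example (the exact subcommand shape is illustrative; only the flag itself is documented here):

```shell
smelt run my_model --allow-full-refresh
```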
Warning
Full refresh reprocesses the entire table from scratch. For large tables, this can be expensive. The flag exists to make this an intentional choice rather than a silent surprise.
Further reading¶
- Incremental Models for how incremental processing works
- Targets & Backends for backend configuration
- Project Configuration for the smelt.yml reference