# Project Configuration (`smelt.yml`)

The `smelt.yml` file is the main configuration file for a smelt project. It must be located at the root of your project directory.
## Top-Level Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `name` | string | yes | | Project name |
| `version` | integer | yes | | Configuration version (currently `1`) |
| `model_paths` | string[] | no | `["models"]` | Directories to scan for SQL and Python model files |
| `seed_paths` | string[] | no | `["seeds"]` | Directories to scan for CSV seed files |
| `targets` | map | yes | | Named execution environments (see Targets) |
| `default_materialization` | string | no | `"view"` | Default materialization for all models |
| `models` | map | no | `{}` | Per-model configuration overrides (see Model Configuration) |
| `python` | string | no | | Path to the Python interpreter. Can also be set via the `SMELT_PYTHON` environment variable, which takes precedence over this field. |
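A minimal project file needs only the required fields. The names below (`my_project`, the `dev` target's database path) are illustrative:

```yaml
# Minimal smelt.yml: only the required top-level fields
name: my_project
version: 1              # configuration version; currently 1
targets:
  dev:                  # "dev" is the default target name selected by the CLI
    type: duckdb
    database: target/dev.duckdb
    schema: main
```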
## Targets

Targets define execution environments. Each target has a name (the map key) and specifies a backend type with its connection details.

You can define multiple targets and select one at runtime with the `--target` CLI flag (default: `dev`).
### DuckDB Target

| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | Must be `duckdb` |
| `database` | string | yes | Path to the DuckDB database file (relative to project root) |
| `schema` | string | yes | Database schema to use |
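For example, a local-development DuckDB target (mirroring the complete example at the end of this page):

```yaml
targets:
  dev:
    type: duckdb
    database: target/dev.duckdb   # relative to the project root
    schema: main
```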
### Spark Target

| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | Must be `spark` |
| `connect_url` | string | yes | Spark Connect URL (e.g., `sc://localhost:15002`) |
| `catalog` | string | no | Spark catalog name |
| `schema` | string | yes | Database schema to use |
| `format` | string | no | Table format: `delta` (default) or `parquet`. Affects schema evolution capabilities. See Schema Evolution. |
```yaml
targets:
  spark_prod:
    type: spark
    connect_url: sc://localhost:15002
    catalog: spark_catalog
    schema: main
    format: delta  # default; can also be "parquet"
```
## Materialization Types

The `default_materialization` field and the per-model `materialization` field accept these values:

| Value | Description |
|---|---|
| `table` | Persisted as a physical table. Required for incremental models. |
| `view` | Created as a database view. Re-computed on each query. |
| `ephemeral` | Not materialized at all. Inlined as a CTE into downstream models. Cannot have incremental config or target overrides. |
| `materialized_view` | Backend-managed persistent view (e.g., PostgreSQL, Databricks). Refreshed atomically. |
Precedence for materialization resolution (highest wins):

1. SQL file frontmatter (`materialization:` in the model file)
2. `smelt.yml` per-model config (`models.<name>.materialization`)
3. `smelt.yml` top-level `default_materialization`
## Model Configuration

Per-model configuration is specified under the `models` key, using the model name (filename without extension) as the key.

```yaml
models:
  <model_name>:
    materialization: <type>
    tags: [<tag>, ...]
    target: <target_name>
    incremental:
      # incremental fields...
```
### Model Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `materialization` | string | no | (project default) | Materialization type for this model |
| `tags` | string[] | no | `[]` | Tags for model selection (used with `--select tag:X`) |
| `target` | string | no | (CLI default) | Override which target to execute this model on |
| `incremental` | object | no | | Incremental materialization configuration (see below) |
| `schema_evolution` | object | no | | Schema evolution configuration (see Schema Evolution) |
| `format` | string | no | (from target) | Override the table format for this model: `delta` or `parquet`. Only relevant for Spark targets. |
| `columns` | map | no | `{}` | Per-column metadata: `default`, `backfill`, `description`, `tests` |
Target precedence: SQL file frontmatter > `smelt.yml` model config > CLI `--target` flag.

Tags from `smelt.yml` and SQL frontmatter are merged (union, deduplicated).
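For instance, tags declared in `smelt.yml` (the tag names here are illustrative) drive model selection at run time:

```yaml
models:
  daily_revenue:
    tags: [finance, daily]
  cube_metrics:
    tags: [finance]
```

Both models then match `--select tag:finance`; only `daily_revenue` matches `--select tag:daily`.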
## Incremental Configuration

Incremental materialization processes only new or changed data instead of rebuilding the entire table. It is only valid for models with `materialization: table`.

```yaml
models:
  daily_revenue:
    materialization: table
    incremental:
      enabled: true
      event_time_column: transaction_timestamp
      partition_column: revenue_date
      granularity: day
      unique_key:
        - transaction_id
      safety_overrides:
        allow_window_functions: false
```
### Incremental Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `enabled` | bool | no | `true` | Whether incremental processing is active |
| `event_time_column` | string | yes | | Column in the source data to filter on (used in the injected `WHERE` clause) |
| `partition_column` | string | yes | | Column in the output table to delete by (for the DELETE+INSERT strategy) |
| `granularity` | string/object | yes | | Partition granularity (see Granularity) |
| `unique_key` | string[] | no | `[]` | Columns that uniquely identify a row. When present, the backend may choose a MERGE strategy instead of DELETE+INSERT. |
| `safety_overrides` | object | no | (all false) | Override safety checks for patterns that may produce different results on partial data (see Safety Overrides) |
### Granularity

The `granularity` field controls the size of each partition window. It accepts the following values:

| Value | Description |
|---|---|
| `hour` | Hourly partitions |
| `day` | Daily partitions |
| `week` | Weekly partitions (requires the `week_start` subfield) |
| `month` | Monthly partitions |
| `quarter` | Quarterly partitions |
| `year` | Yearly partitions |
For weekly granularity, the value is an object that must also specify the start day via the `week_start` subfield. Valid `week_start` values: `monday`, `tuesday`, `wednesday`, `thursday`, `friday`, `saturday`, `sunday`. All other granularities are simple strings (e.g., `granularity: day`).
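The object form is not shown inline in this reference; a plausible sketch, assuming `week_start` nests under the `week` key, would be:

```yaml
incremental:
  event_time_column: transaction_timestamp
  partition_column: revenue_date
  granularity:
    week:
      week_start: monday   # assumed nesting; week_start is the documented subfield
```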
### Safety Overrides

Smelt validates incremental models to ensure they produce the same results whether run on the full dataset or on individual partitions. Certain SQL patterns can violate this guarantee. Safety overrides let you acknowledge and allow these patterns when you know they are safe for your use case.

| Field | Type | Default | Description |
|---|---|---|---|
| `allow_window_functions` | bool | `false` | Allow window functions (e.g., `ROW_NUMBER()`, `LAG()`), which may produce different results on partial data |
| `allow_having` | bool | `false` | Allow `HAVING` clauses, which filter on aggregates that may differ per partition |
| `allow_limit` | bool | `false` | Allow `LIMIT`, which produces non-deterministic results on partial data |
| `allow_subqueries` | bool | `false` | Allow subqueries, which may reference data outside the current partition |
| `allow_nondeterministic` | bool | `false` | Allow nondeterministic functions (e.g., `RANDOM()`, `NOW()`) |
| `allow_distinct` | bool | `false` | Allow `DISTINCT`, which may produce different results when data is split across partitions |
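For example, a model whose window function only ever reads rows within a single partition can acknowledge the check and opt out:

```yaml
models:
  daily_revenue:
    materialization: table
    incremental:
      # ...other incremental fields...
      safety_overrides:
        allow_window_functions: true   # acknowledged: the window is partition-local here
```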
## Schema Evolution Configuration

Schema evolution controls how smelt handles changes to an incremental model's output schema. See the Schema Evolution guide for detailed examples.

```yaml
models:
  my_model:
    materialization: table
    schema_evolution:
      strategy: alter_and_backfill
    columns:
      status:
        default: "'pending'"
        backfill: "CASE WHEN status IS NULL THEN 'pending' ELSE status END"
```
### Schema Evolution Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `strategy` | string | `alter_and_backfill` | `alter_and_backfill`: use `ALTER TABLE` when possible. `full_refresh`: always drop and recreate on any schema change. |
### Column Fields (for schema evolution)

| Field | Type | Default | Description |
|---|---|---|---|
| `default` | string | | SQL expression used as the `DEFAULT` value when adding a column via `ALTER TABLE`. Must be a valid SQL expression (e.g., `"0"`, `"'unknown'"`, `"NULL"`, `"STRUCT_PACK(a := 0)"`). |
| `backfill` | string | | SQL expression used to backfill after a column is added, as: `UPDATE table SET column = <backfill_expr>`. |
## Validation Rules

Smelt validates model configurations and reports errors or warnings.

**Errors (block execution):**

- Ephemeral models cannot have incremental configuration
- Ephemeral models cannot have a target override

**Warnings (printed to stderr):**

- View models with incremental config (incremental only applies to tables)
- Materialized view models with incremental config (materialized views are refreshed atomically)
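As an illustration, the following configuration (with a hypothetical model name) is rejected with an error, since ephemeral models cannot carry incremental configuration:

```yaml
models:
  order_items_cte:            # hypothetical ephemeral model
    materialization: ephemeral
    incremental:              # error: not allowed on ephemeral models
      enabled: true
```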
## Complete Example

The following is a fully annotated `smelt.yml` based on the timeseries example project:

```yaml
# Project identity
name: smelt_examples
version: 1

# Where to find model files (default: ["models"])
model_paths:
  - models

# Where to find CSV seed files (default: ["seeds"])
seed_paths:
  - seeds

# Execution environments
targets:
  # Local development with DuckDB (default target)
  dev:
    type: duckdb
    database: target/dev.duckdb
    schema: main
  # Remote Spark cluster
  spark:
    type: spark
    connect_url: sc://localhost:15002
    catalog: spark_catalog
    schema: main

# Default materialization for models not explicitly configured
default_materialization: view

# Per-model configuration
models:
  # Simple table materialization
  users:
    materialization: table
  events:
    materialization: table
  user_activity:
    materialization: table
  transactions:
    materialization: table
  # Incremental model with full configuration
  daily_revenue:
    materialization: table
    incremental:
      enabled: true
      event_time_column: transaction_timestamp  # Column in source data (WHERE filter)
      partition_column: revenue_date            # Column in output (DELETE target)
      granularity: day
  cube_metrics:
    materialization: table
```