smelt Functions¶
smelt functions let you define reusable SQL fragments that are checked by the compiler. They remove copy-paste duplication, add type safety at call sites, and expose structured metadata to the planner for optimizations like filter pushdown and join elimination.
Defining a function¶
Functions live in a functions/ directory alongside your models/ directory. Each .sql file can contain one or more smelt.define declarations.
Basic syntax¶
smelt.define function_name(param1, param2) AS (
-- SQL body using parameter names
param1 / NULLIF(param2, 0)
)
Typed parameters (recommended)¶
Annotating parameters with types lets smelt check that callers pass compatible values and check the body in isolation:
smelt.define safe_divide(
numerator: Expr<Numeric>,
denominator: Expr<Numeric>
) -> Expr<Double> AS (
CASE WHEN denominator = 0 OR denominator IS NULL
THEN NULL
ELSE CAST(numerator AS DOUBLE) / CAST(denominator AS DOUBLE)
END
)
Default parameter values¶
Parameters can have default values, making them optional at call sites:
smelt.define sessionize(
source: TableExpr,
user_col: Expr<Text>,
ts_col: Expr<Timestamp>,
gap: Expr<Interval> = INTERVAL '30 minutes'
) -> TableExpr AS (
SELECT
source.*,
SUM(CASE WHEN ts_col - LAG(ts_col) OVER (PARTITION BY user_col ORDER BY ts_col) > gap
THEN 1 ELSE 0 END)
OVER (PARTITION BY user_col ORDER BY ts_col) AS session_id
FROM source
)
Calling a function¶
Use smelt.functions.<name>() to call a user-defined function:
A function's call path is derived from the workspace-relative directory of the file it is declared in, joined with the declared name. The filename stem itself is not part of the call path. The mapping is enforced — calling a function under the wrong path is an UnknownSmeltFn diagnostic.
| Filesystem location | Declared name | Call path |
|---|---|---|
functions/safe_divide.sql |
safe_divide |
smelt.functions.safe_divide(...) |
functions/status.sql |
is_shipped |
smelt.functions.is_shipped(...) |
functions/patterns/x.sql |
session_rollup |
smelt.functions.patterns.session_rollup(...) |
utils/math.sql |
safe_divide |
smelt.utils.safe_divide(...) |
Renaming a function or moving its file changes the call path, the same way moving a model does.
Verifying function calls¶
Before doing a full smelt build, confirm that a function call expands correctly using --show-plan:
The ExpandedCall node in the plan output shows the inlined function body with argument substitution already applied. This is faster than a full build and catches wrong-path errors (such as UnknownSmeltFn) without touching the database.
Note that --show-plan requires a positional model file path — there is no project-wide show-plan mode. See smelt build in the CLI reference for details.
Declared return types and model schemas¶
For typed functions (those with a -> ReturnType annotation), smelt uses the declared return type as the column type in downstream models. smelt table <model> reflects this — a column fed by a -> Expr<Double> call shows as DOUBLE. Downstream aggregates also use the declared type: SUM over a -> Expr<Double> call infers as DOUBLE, not BIGINT.
NULL semantics in function bodies¶
smelt function bodies are SQL expressions — they inherit standard SQL NULL propagation. A body that compares a nullable column with = returns NULL (which is falsy) when the argument is NULL, not an error.
is_shipped('shipped')→trueis_shipped('cancelled')→falseis_shipped(NULL)→NULL(falsy) —NULL = 'shipped'isNULLin SQL
This matters in LEFT JOIN contexts: when the right side has no matching row, nullable columns arrive as NULL. is_shipped(NULL) returns falsy, so CASE WHEN smelt.functions.is_shipped(o.status) THEN o.amount END naturally produces NULL for non-matching rows — no explicit NULL guard is needed at the call site. Wrap the whole SUM(...) in COALESCE(..., 0) if you need 0 instead of NULL for customers with no shipped orders.
If your function body needs to distinguish NULL from false, use COALESCE inside the body:
smelt.define is_shipped_safe(status: Expr<Text>) -> Expr<Boolean> AS (
COALESCE(status = 'shipped', false)
)
Calling in boolean positions¶
A function whose declared return type is Expr<Boolean> can be used in any boolean position the SQL grammar accepts: WHERE, HAVING, JOIN ON, QUALIFY, CASE WHEN, and as a SELECT-list expression.
-- functions/status.sql
smelt.define is_shipped(status: Expr<Text>) -> Expr<Boolean> AS (
status = 'shipped' OR status = 'delivered'
)
Named arguments¶
Pass arguments by name to improve readability or skip over defaulted parameters:
SELECT *
FROM smelt.functions.sessionize(
smelt.events,
user_col => user_id,
ts_col => event_time,
gap => INTERVAL '1 hour'
)
Named arguments are bound to parameters by name, not by position. You can supply them in any order — the compiler maps each param => value to the matching parameter slot before running the body substitution pass.
Multi-argument calls with default parameters¶
When a function declares a default value for a parameter, you can omit it at the call site. Named arguments make it clear which parameters you're supplying and which rely on defaults:
-- functions/sessionize.sql
smelt.define sessionize(
source: TableExpr,
partition_col: Expr<Integer>,
ts_col: Expr<Date>,
platform_col: Expr<Text>,
gap: Expr<BigInt> = 30 * 60 * 1000000 -- 30 minutes in μs
) -> TableExpr AS (
-- ... window-function body ...
)
-- models/silver/sessions.sql
-- Supply the four required args by name; `gap` uses the declared default.
-- Arg order differs from the declaration order to show order-independence.
WITH sessionized AS (
SELECT *
FROM smelt.functions.sessionize(
source => smelt.silver.events_parsed,
platform_col => platform,
partition_col => device_id,
ts_col => event_ts
)
),
-- ... downstream aggregation ...
The compiler resolves each param => value pair against the declaration, then applies the default 30 * 60 * 1000000 for the omitted gap. The result is identical to writing all five arguments positionally.
The alias after the closing ) is optional — the compiler synthesises a unique alias for the derived table when none is supplied, so the target engine receives valid SQL regardless. A user-supplied alias (e.g. FROM smelt.functions.sessionize(...) AS s) overrides the synthesised one.
Projecting struct-returning function outputs¶
A function whose return type is Expr<Struct<{f₁: T₁, …, fₙ: Tₙ}>> can be projected directly in a SELECT list using the .* suffix. The compiler lowers the call to n separate aliased columns, one per struct field.
Declaration¶
-- functions/parse_event_payload.sql
smelt.define parse_event_payload(
payload_json: Expr<Text>
) -> Expr<Struct<{event_name: Text, platform: Text, url: Text}>> AS (
{
json_extract_string(payload_json, '$.event_name') AS event_name,
json_extract_string(payload_json, '$.platform') AS platform,
json_extract_string(payload_json, '$.url') AS url
}
)
Call site with .* projection¶
-- models/silver/events_parsed.sql
SELECT
event_id,
device_id,
user_id,
CAST(event_date AS DATE) + to_seconds(seconds_in_day) AS event_ts,
CAST(event_date AS DATE) AS event_date,
smelt.functions.parse_event_payload(payload).*
FROM smelt.bronze.raw_events
The compiler expands smelt.functions.parse_event_payload(payload).* into:
json_extract_string(payload, '$.event_name') AS event_name,
json_extract_string(payload, '$.platform') AS platform,
json_extract_string(payload, '$.url') AS url
The resulting model exposes event_name, platform, and url as top-level columns — identical to writing the three calls inline, but defined once in the function and reused across any model that calls it.
The declared return type is authoritative: downstream models that reference smelt.silver.events_parsed.event_name see Text, regardless of what DuckDB would infer from the raw json_extract_string call.
Three tiers of annotation¶
smelt uses gradual typing — you choose how much annotation to add.
Tier 1 — unannotated (quick and personal)¶
- No type annotations on parameters or return type.
- The body is expanded at each call site with the caller's concrete types substituted in.
- Type errors only surface at call sites where the concrete types cause a problem.
- Good for quick personal utilities or exploratory work.
Tier 2 — parameters annotated (production code)¶
- Every parameter is annotated.
- The body is type-checked once at definition time against the declared parameter types.
- Errors are reported against the function body, not each call site.
- Callers whose argument types don't satisfy the constraints get an
ArgTypeMismatchdiagnostic immediately.
Tier 3 — fully annotated (library quality)¶
smelt.define safe_divide(
numerator: Expr<Numeric>,
denominator: Expr<Numeric>
) -> Expr<Double> AS (
CASE WHEN denominator = 0 OR denominator IS NULL
THEN NULL
ELSE CAST(numerator AS DOUBLE) / CAST(denominator AS DOUBLE)
END
)
- Both parameters and return type are annotated.
- The return type is verified against the body — if the body evaluates to a different type, you get a
ReturnTypeMismatchdiagnostic at the function declaration. - LSP hover on a call site shows the declared return type directly.
See docs/smelt-functions-upgrade-story.md for how to migrate from Tier 1 to Tier 2 without breaking existing callers.
Type constraints¶
The type language for parameter and return annotations:
| Annotation | Meaning |
|---|---|
Expr<Integer> |
Any integer expression |
Expr<Numeric> |
Any numeric expression (Integer, BigInt, Float, Double, Decimal) |
Expr<Double> |
A double-precision float expression |
Expr<Text> |
A text/varchar expression |
Expr<Boolean> |
A boolean expression |
Expr<Timestamp> |
A timestamp expression |
Expr<Date> |
A date expression |
Expr<Interval> |
An interval expression |
Expr<Any> |
Any scalar expression type |
TableExpr |
A table-valued argument (bare row polymorphism) |
TableExpr<{col: Type, ..r}> |
A table with at least the listed columns |
AggExpr<T> |
An aggregate expression |
WindowExpr<T> |
A window (analytic) expression |
SelectItems<K> |
A SELECT-list fragment with kind ceiling K (Scalar, Agg, Window) |
SelectItems<K, ctx> |
As above, but columns must belong to context ctx |
Fragment parameters — TableExpr and SelectItems¶
Fragment sorts are the key to composable pipelines. They let you pass table-valued arguments and SELECT-list fragments to functions.
TableExpr — table-valued parameters¶
TableExpr parameters accept a table reference (smelt.<name>, smelt.sources.<name>, a CTE, or a subquery):
smelt.define add_margin(
source: TableExpr<{revenue: Numeric, cost: Numeric}>
) -> TableExpr AS (
SELECT source.*, revenue - cost AS margin
FROM source
)
Call it with any table that has at least those columns:
The row-requirement annotation TableExpr<{revenue: Numeric, cost: Numeric}> is checked at each call site — the compiler reports a RowRequirementMissing diagnostic if the supplied table is missing a required column.
SelectItems — SELECT-list fragments¶
SelectItems<Agg> parameters accept aggregate expressions, passed inline or via a PASSING clause:
smelt.define session_rollup(
source: TableExpr,
user_col: Expr<Text>,
ts_col: Expr<Timestamp>,
gap: Expr<Interval> = INTERVAL '30 minutes',
metrics: SelectItems<Agg, sessionized> = ()
) -> TableExpr AS (
WITH sessionized AS (
SELECT * FROM smelt.functions.sessionize(source, user_col, ts_col, gap)
)
SELECT
user_col, session_id,
MIN(ts_col) AS session_start, MAX(ts_col) AS session_end,
COUNT(*) AS event_count,
metrics
FROM sessionized
GROUP BY user_col, session_id
)
The SelectItems<Agg, sessionized> annotation means:
- The fragment must contain only aggregate expressions (Agg ceiling).
- Column references inside the fragment must be columns of the sessionized CTE (the context).
PASSING clauses — block syntax for fragment parameters¶
Instead of passing SelectItems arguments inline, use a trailing PASSING clause for ergonomic multi-line fragments:
-- Inline style
SELECT * FROM smelt.functions.session_rollup(
smelt.sources.events,
user_col => user_id,
ts_col => event_time,
metrics => (COUNT(*) AS event_count, SUM(amount) AS total_amount)
)
-- Block style with PASSING
SELECT *
FROM smelt.functions.session_rollup(
smelt.sources.events,
user_col => user_id,
ts_col => event_time
) PASSING metrics AS (
COUNT(*) AS event_count,
SUM(amount) AS total_amount
)
Multiple PASSING clauses are allowed (one per fragment parameter):
FROM smelt.functions.session_rollup(source, user_id, ts)
PASSING metrics AS (COUNT(*), SUM(revenue))
PASSING filters AS (amount > 0)
PASSING clauses are type-checked identically to inline arguments — the compiler verifies column references against the declared context.
External functions — smelt.extern¶
Declare a backend-native function that has no smelt body. This gives the compiler a typed signature for call-site checking without requiring an implementation:
The function is then callable as smelt.functions.regex_match(col, 'pattern') with full type checking at call sites. Unlike smelt.define, there is no AS (...) body.
Note
smelt.extern only accepts scalar (Expr<T>) and table (TableExpr) parameter types. Fragment-sort parameters (SelectItems, AggExpr, WindowExpr) are not supported on extern declarations.
Struct packing — smelt.as_struct()¶
smelt.as_struct(alias [EXCEPT col1, col2, ...]) converts a table reference into a struct value, optionally excluding columns. This is useful for passing all columns from a join source into a single struct field:
smelt.define enrich_order_with_as_struct(
orders: TableExpr<{order_id: BigInt, customer_id: Text, total: Numeric}>,
customers: TableExpr<{customer_id: Text, customer_name: Text, customer_tier: Text}>
) -> TableExpr AS (
SELECT
smelt.as_struct(o EXCEPT customer_id) AS order_data,
smelt.as_struct(c EXCEPT customer_id) AS customer_data
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
)
The emitted SQL uses backend-specific struct literal syntax:
- DuckDB: {'order_id': o.order_id, 'total': o.total}
- Spark: struct(o.order_id AS order_id, o.total AS total)
- Postgres: row constructor syntax
Warning
smelt.as_struct requires a backend that supports struct literals. Declare the function's backends: frontmatter to restrict it to compatible backends (see Backends frontmatter below), or ensure your smelt.yml only targets backends that support struct literals.
Frontmatter for functions¶
Functions can have per-declaration YAML frontmatter. Place it immediately before the smelt.define or smelt.extern declaration it applies to.
backends:¶
Restrict a function to one or more backends:
---
backends: [duckdb]
---
smelt.define safe_divide(n: Expr<Numeric>, d: Expr<Numeric>) -> Expr<Double> AS (
CASE WHEN d = 0 OR d IS NULL THEN NULL
ELSE CAST(n AS DOUBLE) / CAST(d AS DOUBLE)
END
)
When backends: is declared, the compiler emits a BackendsMismatch diagnostic if the function is called from a model targeting a different backend.
deterministic: true / false¶
Marks whether the function returns the same result for the same inputs (default: unknown). The planner uses this to decide whether filters can be pushed across a function call boundary.
---
deterministic: true
---
smelt.define event_hour(ts: Expr<Timestamp>) -> Expr<Integer> AS (
EXTRACT(HOUR FROM ts)
)
provenance: and joins:¶
Declares column-level lineage and join relationships for the planner. Requires unstable_schema: true in smelt.yml.
---
deterministic: true
provenance: { margin: [source.revenue, source.cost] }
---
smelt.define add_margin_with_provenance(
source: TableExpr<{revenue: Numeric, cost: Numeric}>
) -> TableExpr AS (
SELECT source.*, revenue - cost AS margin FROM source
)
Diagnostic reference¶
| Code | Meaning | Fix |
|---|---|---|
ArgTypeMismatch |
Argument type doesn't satisfy the parameter constraint | CAST the argument or widen the annotation |
MissingArgument |
A required parameter was not supplied | Provide the argument or add a default |
UnknownSmeltFn |
smelt.functions.name not found in any function file |
Check the name and ensure the file is in functions/ |
FunctionBodyTypeMismatch |
A subexpression in the body has an unexpected type | Fix the body expression |
ReturnTypeMismatch |
Body evaluates to a type incompatible with the -> Expr<T> return annotation |
Adjust the body or the declared return type |
RowRequirementMissing |
A TableExpr argument is missing a required column |
Ensure the table has the column, or relax the row-requirement annotation |
ParameterShadowsColumn |
A parameter name matches a bare column in the TableExpr schema |
Rename the parameter or qualify the column reference |
DuplicateFunctionDefinition |
Two smelt.define declarations share a name |
Rename one of them |
BackendsMismatch |
Function's backends: is incompatible with the call site's target |
Ensure the function supports the target backend |
ExternFragmentParamUnsupported |
smelt.extern declares a SelectItems / fragment-sort parameter |
Remove the fragment parameter; extern functions are scalar/table-only |
FragmentColumnMissing |
Column referenced in a PASSING body isn't in the declared context |
Correct the column name or the context annotation |
FunctionCallCycle |
A function directly or indirectly calls itself | Restructure to eliminate the cycle |
Minimal end-to-end example¶
A complete self-contained project showing the full file → define → call cycle:
my-project/
smelt.yml
seeds/
raw_orders.csv # order_id, customer_id, status, amount
functions/
revenue.sql
status.sql
models/
stg_orders.sql
mart_revenue.sql
-- functions/revenue.sql
smelt.define safe_revenue(amount: Expr<Double>) -> Expr<Double> AS (
COALESCE(amount, 0.0)
)
-- functions/status.sql
smelt.define is_shipped(status: Expr<Text>) -> Expr<Boolean> AS (
status = 'shipped'
)
-- models/stg_orders.sql
---
name: stg_orders
materialization: table
---
SELECT
order_id,
customer_id,
smelt.functions.safe_revenue(CAST(amount AS DOUBLE)) AS amount,
status
FROM smelt.raw_orders
-- models/mart_revenue.sql
---
name: mart_revenue
materialization: table
---
SELECT
customer_id,
CAST(COUNT(*) AS INTEGER) AS order_count,
SUM(amount) AS total_revenue
FROM smelt.stg_orders
WHERE smelt.functions.is_shipped(status)
GROUP BY customer_id
Key rules demonstrated:
- functions/ is auto-discovered — no smelt.yml change needed.
- Call path = smelt.functions.<declared_name> — the filename stem is not included.
- Expr<Boolean> works directly in WHERE with no extra wrapping.
- Arguments may be positional or named (param => value); the compiler binds by name.