Skip to content

smelt Functions

smelt functions let you define reusable SQL fragments that are checked by the compiler. They remove copy-paste duplication, add type safety at call sites, and expose structured metadata to the planner for optimizations like filter pushdown and join elimination.

Defining a function

Functions live in a functions/ directory alongside your models/ directory. Each .sql file can contain one or more smelt.define declarations.

my_project/
  models/
    orders.sql
  functions/
    safe_divide.sql
    sessionize.sql
  smelt.yml

Basic syntax

smelt.define function_name(param1, param2) AS (
  -- SQL body using parameter names
  param1 / NULLIF(param2, 0)
)

Annotating parameters with types lets smelt check that callers pass compatible values and check the body in isolation:

smelt.define safe_divide(
  numerator: Expr<Numeric>,
  denominator: Expr<Numeric>
) -> Expr<Double> AS (
  CASE WHEN denominator = 0 OR denominator IS NULL
    THEN NULL
    ELSE CAST(numerator AS DOUBLE) / CAST(denominator AS DOUBLE)
  END
)

Default parameter values

Parameters can have default values, making them optional at call sites:

smelt.define sessionize(
  source: TableExpr,
  user_col: Expr<Text>,
  ts_col: Expr<Timestamp>,
  gap: Expr<Interval> = INTERVAL '30 minutes'
) -> TableExpr AS (
  SELECT
    source.*,
    SUM(CASE WHEN ts_col - LAG(ts_col) OVER (PARTITION BY user_col ORDER BY ts_col) > gap
             THEN 1 ELSE 0 END)
      OVER (PARTITION BY user_col ORDER BY ts_col) AS session_id
  FROM source
)

Calling a function

Use smelt.functions.<name>() to call a user-defined function:

-- In a model file
SELECT
  smelt.functions.safe_divide(revenue, cost) AS margin
FROM smelt.orders

A function's call path is derived from the workspace-relative directory of the file it is declared in, joined with the declared name. The filename stem itself is not part of the call path. The mapping is enforced — calling a function under the wrong path is an UnknownSmeltFn diagnostic.

Filesystem location Declared name Call path
functions/safe_divide.sql safe_divide smelt.functions.safe_divide(...)
functions/status.sql is_shipped smelt.functions.is_shipped(...)
functions/patterns/x.sql session_rollup smelt.functions.patterns.session_rollup(...)
utils/math.sql safe_divide smelt.utils.safe_divide(...)

Renaming a function or moving its file changes the call path, the same way moving a model does.

Verifying function calls

Before doing a full smelt build, confirm that a function call expands correctly using --show-plan:

smelt build --show-plan models/<model>.sql

The ExpandedCall node in the plan output shows the inlined function body with argument substitution already applied. This is faster than a full build and catches wrong-path errors (such as UnknownSmeltFn) without touching the database.

Note that --show-plan requires a positional model file path — there is no project-wide show-plan mode. See smelt build in the CLI reference for details.

Declared return types and model schemas

For typed functions (those with a -> ReturnType annotation), smelt uses the declared return type as the column type in downstream models. smelt table <model> reflects this — a column fed by a -> Expr<Double> call shows as DOUBLE. Downstream aggregates also use the declared type: SUM over a -> Expr<Double> call infers as DOUBLE, not BIGINT.

NULL semantics in function bodies

smelt function bodies are SQL expressions — they inherit standard SQL NULL propagation. A body that compares a nullable column with = returns NULL (which is falsy) when the argument is NULL, not an error.

smelt.define is_shipped(status: Expr<Text>) -> Expr<Boolean> AS (
  status = 'shipped'
)
  • is_shipped('shipped')true
  • is_shipped('cancelled')false
  • is_shipped(NULL)NULL (falsy) — NULL = 'shipped' is NULL in SQL

This matters in LEFT JOIN contexts: when the right side has no matching row, nullable columns arrive as NULL. is_shipped(NULL) returns falsy, so CASE WHEN smelt.functions.is_shipped(o.status) THEN o.amount END naturally produces NULL for non-matching rows — no explicit NULL guard is needed at the call site. Wrap the whole SUM(...) in COALESCE(..., 0) if you need 0 instead of NULL for customers with no shipped orders.

If your function body needs to distinguish NULL from false, use COALESCE inside the body:

smelt.define is_shipped_safe(status: Expr<Text>) -> Expr<Boolean> AS (
  COALESCE(status = 'shipped', false)
)

Calling in boolean positions

A function whose declared return type is Expr<Boolean> can be used in any boolean position the SQL grammar accepts: WHERE, HAVING, JOIN ON, QUALIFY, CASE WHEN, and as a SELECT-list expression.

-- functions/status.sql
smelt.define is_shipped(status: Expr<Text>) -> Expr<Boolean> AS (
  status = 'shipped' OR status = 'delivered'
)
-- models/shipped_orders.sql
SELECT *
FROM smelt.orders
WHERE smelt.functions.is_shipped(status)

Named arguments

Pass arguments by name to improve readability or skip over defaulted parameters:

SELECT *
FROM smelt.functions.sessionize(
  smelt.events,
  user_col => user_id,
  ts_col   => event_time,
  gap      => INTERVAL '1 hour'
)

Named arguments are bound to parameters by name, not by position. You can supply them in any order — the compiler maps each param => value to the matching parameter slot before running the body substitution pass.

Multi-argument calls with default parameters

When a function declares a default value for a parameter, you can omit it at the call site. Named arguments make it clear which parameters you're supplying and which rely on defaults:

-- functions/sessionize.sql
smelt.define sessionize(
    source:       TableExpr,
    partition_col: Expr<Integer>,
    ts_col:       Expr<Date>,
    platform_col: Expr<Text>,
    gap:          Expr<BigInt> = 30 * 60 * 1000000   -- 30 minutes in μs
) -> TableExpr AS (
    -- ... window-function body ...
)
-- models/silver/sessions.sql
-- Supply the four required args by name; `gap` uses the declared default.
-- Arg order differs from the declaration order to show order-independence.
WITH sessionized AS (
    SELECT *
    FROM smelt.functions.sessionize(
        source        => smelt.silver.events_parsed,
        platform_col  => platform,
        partition_col => device_id,
        ts_col        => event_ts
    )
),
-- ... downstream aggregation ...

The compiler resolves each param => value pair against the declaration, then applies the default 30 * 60 * 1000000 for the omitted gap. The result is identical to writing all five arguments positionally.

The alias after the closing ) is optional — the compiler synthesises a unique alias for the derived table when none is supplied, so the target engine receives valid SQL regardless. A user-supplied alias (e.g. FROM smelt.functions.sessionize(...) AS s) overrides the synthesised one.

Projecting struct-returning function outputs

A function whose return type is Expr<Struct<{f₁: T₁, …, fₙ: Tₙ}>> can be projected directly in a SELECT list using the .* suffix. The compiler lowers the call to n separate aliased columns, one per struct field.

Declaration

-- functions/parse_event_payload.sql
smelt.define parse_event_payload(
    payload_json: Expr<Text>
) -> Expr<Struct<{event_name: Text, platform: Text, url: Text}>> AS (
    {
        json_extract_string(payload_json, '$.event_name') AS event_name,
        json_extract_string(payload_json, '$.platform') AS platform,
        json_extract_string(payload_json, '$.url') AS url
    }
)

Call site with .* projection

-- models/silver/events_parsed.sql
SELECT
    event_id,
    device_id,
    user_id,
    CAST(event_date AS DATE) + to_seconds(seconds_in_day) AS event_ts,
    CAST(event_date AS DATE) AS event_date,
    smelt.functions.parse_event_payload(payload).*
FROM smelt.bronze.raw_events

The compiler expands smelt.functions.parse_event_payload(payload).* into:

json_extract_string(payload, '$.event_name') AS event_name,
json_extract_string(payload, '$.platform') AS platform,
json_extract_string(payload, '$.url') AS url

The resulting model exposes event_name, platform, and url as top-level columns — identical to writing the three calls inline, but defined once in the function and reused across any model that calls it.

The declared return type is authoritative: downstream models that reference smelt.silver.events_parsed.event_name see Text, regardless of what DuckDB would infer from the raw json_extract_string call.

Three tiers of annotation

smelt uses gradual typing — you choose how much annotation to add.

Tier 1 — unannotated (quick and personal)

smelt.define add_one(x) AS (x + 1)
  • No type annotations on parameters or return type.
  • The body is expanded at each call site with the caller's concrete types substituted in.
  • Type errors only surface at call sites where the concrete types cause a problem.
  • Good for quick personal utilities or exploratory work.

Tier 2 — parameters annotated (production code)

smelt.define add_margin(
  revenue: Expr<Numeric>,
  cost: Expr<Numeric>
) AS (revenue - cost)
  • Every parameter is annotated.
  • The body is type-checked once at definition time against the declared parameter types.
  • Errors are reported against the function body, not each call site.
  • Callers whose argument types don't satisfy the constraints get an ArgTypeMismatch diagnostic immediately.

Tier 3 — fully annotated (library quality)

smelt.define safe_divide(
  numerator: Expr<Numeric>,
  denominator: Expr<Numeric>
) -> Expr<Double> AS (
  CASE WHEN denominator = 0 OR denominator IS NULL
    THEN NULL
    ELSE CAST(numerator AS DOUBLE) / CAST(denominator AS DOUBLE)
  END
)
  • Both parameters and return type are annotated.
  • The return type is verified against the body — if the body evaluates to a different type, you get a ReturnTypeMismatch diagnostic at the function declaration.
  • LSP hover on a call site shows the declared return type directly.

See docs/smelt-functions-upgrade-story.md for how to migrate from Tier 1 to Tier 2 without breaking existing callers.

Type constraints

The type language for parameter and return annotations:

Annotation Meaning
Expr<Integer> Any integer expression
Expr<Numeric> Any numeric expression (Integer, BigInt, Float, Double, Decimal)
Expr<Double> A double-precision float expression
Expr<Text> A text/varchar expression
Expr<Boolean> A boolean expression
Expr<Timestamp> A timestamp expression
Expr<Date> A date expression
Expr<Interval> An interval expression
Expr<Any> Any scalar expression type
TableExpr A table-valued argument (bare row polymorphism)
TableExpr<{col: Type, ..r}> A table with at least the listed columns
AggExpr<T> An aggregate expression
WindowExpr<T> A window (analytic) expression
SelectItems<K> A SELECT-list fragment with kind ceiling K (Scalar, Agg, Window)
SelectItems<K, ctx> As above, but columns must belong to context ctx

Fragment parameters — TableExpr and SelectItems

Fragment sorts are the key to composable pipelines. They let you pass table-valued arguments and SELECT-list fragments to functions.

TableExpr — table-valued parameters

TableExpr parameters accept a table reference (smelt.<name>, smelt.sources.<name>, a CTE, or a subquery):

smelt.define add_margin(
  source: TableExpr<{revenue: Numeric, cost: Numeric}>
) -> TableExpr AS (
  SELECT source.*, revenue - cost AS margin
  FROM source
)

Call it with any table that has at least those columns:

SELECT * FROM smelt.functions.add_margin(smelt.orders)

The row-requirement annotation TableExpr<{revenue: Numeric, cost: Numeric}> is checked at each call site — the compiler reports a RowRequirementMissing diagnostic if the supplied table is missing a required column.

SelectItems — SELECT-list fragments

SelectItems<Agg> parameters accept aggregate expressions, passed inline or via a PASSING clause:

smelt.define session_rollup(
  source: TableExpr,
  user_col: Expr<Text>,
  ts_col: Expr<Timestamp>,
  gap: Expr<Interval> = INTERVAL '30 minutes',
  metrics: SelectItems<Agg, sessionized> = ()
) -> TableExpr AS (
  WITH sessionized AS (
    SELECT * FROM smelt.functions.sessionize(source, user_col, ts_col, gap)
  )
  SELECT
    user_col, session_id,
    MIN(ts_col) AS session_start, MAX(ts_col) AS session_end,
    COUNT(*) AS event_count,
    metrics
  FROM sessionized
  GROUP BY user_col, session_id
)

The SelectItems<Agg, sessionized> annotation means: - The fragment must contain only aggregate expressions (Agg ceiling). - Column references inside the fragment must be columns of the sessionized CTE (the context).

PASSING clauses — block syntax for fragment parameters

Instead of passing SelectItems arguments inline, use a trailing PASSING clause for ergonomic multi-line fragments:

-- Inline style
SELECT * FROM smelt.functions.session_rollup(
  smelt.sources.events,
  user_col => user_id,
  ts_col   => event_time,
  metrics  => (COUNT(*) AS event_count, SUM(amount) AS total_amount)
)

-- Block style with PASSING
SELECT *
FROM smelt.functions.session_rollup(
  smelt.sources.events,
  user_col => user_id,
  ts_col   => event_time
) PASSING metrics AS (
  COUNT(*) AS event_count,
  SUM(amount) AS total_amount
)

Multiple PASSING clauses are allowed (one per fragment parameter):

FROM smelt.functions.session_rollup(source, user_id, ts)
PASSING metrics AS (COUNT(*), SUM(revenue))
PASSING filters AS (amount > 0)

PASSING clauses are type-checked identically to inline arguments — the compiler verifies column references against the declared context.

External functions — smelt.extern

Declare a backend-native function that has no smelt body. This gives the compiler a typed signature for call-site checking without requiring an implementation:

smelt.extern regex_match(
  text: Expr<Text>,
  pattern: Expr<Text>
) -> Expr<Boolean>

The function is then callable as smelt.functions.regex_match(col, 'pattern') with full type checking at call sites. Unlike smelt.define, there is no AS (...) body.

Note

smelt.extern only accepts scalar (Expr<T>) and table (TableExpr) parameter types. Fragment-sort parameters (SelectItems, AggExpr, WindowExpr) are not supported on extern declarations.

Struct packing — smelt.as_struct()

smelt.as_struct(alias [EXCEPT col1, col2, ...]) converts a table reference into a struct value, optionally excluding columns. This is useful for passing all columns from a join source into a single struct field:

smelt.define enrich_order_with_as_struct(
  orders: TableExpr<{order_id: BigInt, customer_id: Text, total: Numeric}>,
  customers: TableExpr<{customer_id: Text, customer_name: Text, customer_tier: Text}>
) -> TableExpr AS (
  SELECT
    smelt.as_struct(o EXCEPT customer_id) AS order_data,
    smelt.as_struct(c EXCEPT customer_id) AS customer_data
  FROM orders AS o
  JOIN customers AS c ON o.customer_id = c.customer_id
)

The emitted SQL uses backend-specific struct literal syntax: - DuckDB: {'order_id': o.order_id, 'total': o.total} - Spark: struct(o.order_id AS order_id, o.total AS total) - Postgres: row constructor syntax

Warning

smelt.as_struct requires a backend that supports struct literals. Declare the function's backends: frontmatter to restrict it to compatible backends (see Backends frontmatter below), or ensure your smelt.yml only targets backends that support struct literals.

Frontmatter for functions

Functions can have per-declaration YAML frontmatter. Place it immediately before the smelt.define or smelt.extern declaration it applies to.

backends:

Restrict a function to one or more backends:

---
backends: [duckdb]
---
smelt.define safe_divide(n: Expr<Numeric>, d: Expr<Numeric>) -> Expr<Double> AS (
  CASE WHEN d = 0 OR d IS NULL THEN NULL
       ELSE CAST(n AS DOUBLE) / CAST(d AS DOUBLE)
  END
)

When backends: is declared, the compiler emits a BackendsMismatch diagnostic if the function is called from a model targeting a different backend.

deterministic: true / false

Marks whether the function returns the same result for the same inputs (default: unknown). The planner uses this to decide whether filters can be pushed across a function call boundary.

---
deterministic: true
---
smelt.define event_hour(ts: Expr<Timestamp>) -> Expr<Integer> AS (
  EXTRACT(HOUR FROM ts)
)

provenance: and joins:

Declares column-level lineage and join relationships for the planner. Requires unstable_schema: true in smelt.yml.

---
deterministic: true
provenance: { margin: [source.revenue, source.cost] }
---
smelt.define add_margin_with_provenance(
  source: TableExpr<{revenue: Numeric, cost: Numeric}>
) -> TableExpr AS (
  SELECT source.*, revenue - cost AS margin FROM source
)

Diagnostic reference

Code Meaning Fix
ArgTypeMismatch Argument type doesn't satisfy the parameter constraint CAST the argument or widen the annotation
MissingArgument A required parameter was not supplied Provide the argument or add a default
UnknownSmeltFn smelt.functions.name not found in any function file Check the name and ensure the file is in functions/
FunctionBodyTypeMismatch A subexpression in the body has an unexpected type Fix the body expression
ReturnTypeMismatch Body evaluates to a type incompatible with the -> Expr<T> return annotation Adjust the body or the declared return type
RowRequirementMissing A TableExpr argument is missing a required column Ensure the table has the column, or relax the row-requirement annotation
ParameterShadowsColumn A parameter name matches a bare column in the TableExpr schema Rename the parameter or qualify the column reference
DuplicateFunctionDefinition Two smelt.define declarations share a name Rename one of them
BackendsMismatch Function's backends: is incompatible with the call site's target Ensure the function supports the target backend
ExternFragmentParamUnsupported smelt.extern declares a SelectItems / fragment-sort parameter Remove the fragment parameter; extern functions are scalar/table-only
FragmentColumnMissing Column referenced in a PASSING body isn't in the declared context Correct the column name or the context annotation
FunctionCallCycle A function directly or indirectly calls itself Restructure to eliminate the cycle

Minimal end-to-end example

A complete self-contained project showing the full file → define → call cycle:

my-project/
  smelt.yml
  seeds/
    raw_orders.csv        # order_id, customer_id, status, amount
  functions/
    revenue.sql
    status.sql
  models/
    stg_orders.sql
    mart_revenue.sql
-- functions/revenue.sql
smelt.define safe_revenue(amount: Expr<Double>) -> Expr<Double> AS (
  COALESCE(amount, 0.0)
)
-- functions/status.sql
smelt.define is_shipped(status: Expr<Text>) -> Expr<Boolean> AS (
  status = 'shipped'
)
-- models/stg_orders.sql
---
name: stg_orders
materialization: table
---
SELECT
  order_id,
  customer_id,
  smelt.functions.safe_revenue(CAST(amount AS DOUBLE)) AS amount,
  status
FROM smelt.raw_orders
-- models/mart_revenue.sql
---
name: mart_revenue
materialization: table
---
SELECT
  customer_id,
  CAST(COUNT(*) AS INTEGER) AS order_count,
  SUM(amount) AS total_revenue
FROM smelt.stg_orders
WHERE smelt.functions.is_shipped(status)
GROUP BY customer_id

Key rules demonstrated: - functions/ is auto-discovered — no smelt.yml change needed. - Call path = smelt.functions.<declared_name> — the filename stem is not included. - Expr<Boolean> works directly in WHERE with no extra wrapping. - Arguments may be positional or named (param => value); the compiler binds by name.