Targets and Backends¶
Targets are named execution environments defined in `smelt.yml`. Each target specifies a backend type (DuckDB or Spark) and connection details. You can define multiple targets and switch between them at runtime.
Defining targets¶
Targets are listed under the `targets` key in `smelt.yml`:
```yaml
targets:
  dev:
    type: duckdb
    database: target/dev.duckdb
    schema: main
  spark:
    type: spark
    connect_url: sc://localhost:15002
    catalog: spark_catalog
    schema: main
```
The first target listed is not automatically the default -- smelt defaults to a target named `dev` unless you specify otherwise with `--target`.
Backends¶
DuckDB¶
DuckDB is an embedded analytical database. smelt bundles a DuckDB binary, so no separate installation is required.
| Field | Required | Description |
|---|---|---|
| `type` | Yes | Must be `duckdb`. |
| `database` | Yes | Path to the DuckDB database file. Created automatically if it does not exist. |
| `schema` | Yes | Default schema for created tables and views. |
DuckDB is the recommended backend for local development and testing. The database file is portable and can be inspected with the DuckDB CLI or any tool that supports DuckDB.
Spark¶
Spark is supported via Spark Connect for distributed execution.
```yaml
targets:
  spark_prod:
    type: spark
    connect_url: sc://spark-cluster:15002
    catalog: spark_catalog
    schema: production
```
| Field | Required | Description |
|---|---|---|
| `type` | Yes | Must be `spark`. |
| `connect_url` | Yes | Spark Connect URL (e.g., `sc://host:15002`). |
| `catalog` | No | Spark catalog name. |
| `schema` | Yes | Default schema for created tables and views. |
Switching targets¶
Use the `--target` flag on any command:
```bash
# Run against DuckDB (default)
smelt run

# Run against Spark
smelt run --target spark

# Build with a specific target
smelt build --target spark_prod

# Seed into a specific target
smelt seed --target dev
```
Per-model target overrides¶
Individual models can be pinned to a specific target, regardless of the `--target` flag. This is useful in multi-engine setups where some models must run on a particular backend.
In smelt.yml:
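A sketch of the shape this takes -- the `target:` key is the same one used in the multi-target example later on this page, and the model name is illustrative:

```yaml
models:
  large_aggregation:
    target: spark          # always run on this target
    materialization: table
```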
In YAML frontmatter:
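This section does not show smelt's exact frontmatter delimiters, so the `---` fences below are an assumption; the `target:` key mirrors the `smelt.yml` form:

```sql
-- models/large_aggregation.sql
---
target: spark
materialization: table
---
SELECT category, SUM(amount) AS total
FROM smelt.source('events')
GROUP BY 1
```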
Target precedence (highest to lowest):
1. YAML frontmatter in the SQL file
2. `models:` section in `smelt.yml`
3. `--target` CLI flag (defaults to `dev`)
Multi-target setup example¶
A typical project uses DuckDB for development and Spark for production:
```yaml
name: my_project
version: 1

targets:
  dev:
    type: duckdb
    database: target/dev.duckdb
    schema: main
  spark:
    type: spark
    connect_url: sc://localhost:15002
    catalog: spark_catalog
    schema: main

models:
  # Most models use whatever target is passed via --target
  daily_revenue:
    materialization: table

  # This model always runs on Spark, even during dev
  large_aggregation:
    target: spark
    materialization: table
```
```bash
# Development: everything runs on DuckDB (except large_aggregation)
smelt build

# Production: everything runs on Spark
smelt build --target spark
```
Spark requirements¶
The Spark backend communicates via PySpark over Spark Connect. You need:
- Python with PySpark installed (`pip install pyspark`)
- A Spark Connect server running on the configured URL
- For Databricks: use `pip install databricks-connect` instead of `pyspark`
- For EMR/Dataproc: ensure Spark Connect is enabled on the cluster
smelt uses PyO3 to call PySpark from Rust. Data is exchanged via Arrow (zero-copy), so there is no serialization overhead for query results.
Cross-engine data exchange¶
When models on different backends reference each other, smelt automatically handles data transfer via Parquet files.
How it works:
- A Spark model writes its output as Parquet files in the warehouse directory
- A DuckDB model references the Spark model with
smelt.ref('spark_model') - smelt resolves the cross-engine reference and emits a
read_parquet()call pointing to the Spark model's output files - DuckDB natively reads the Parquet files -- no explicit copy step
Example:
```yaml
# smelt.yml
targets:
  local:
    type: duckdb
    database: target/dev.duckdb
    schema: main
  spark:
    type: spark
    connect_url: sc://localhost:15002
    schema: analytics

models:
  # Runs on Spark
  heavy_transform:
    target: spark
    materialization: table

  # Runs on DuckDB, reads from Spark output
  reporting_summary:
    materialization: table
```

```sql
-- models/reporting_summary.sql
-- This ref resolves to read_parquet('warehouse/analytics/heavy_transform/**/*.parquet')
SELECT category, SUM(amount) AS total
FROM smelt.ref('heavy_transform')
GROUP BY 1
```
Note
Cross-engine exchange currently uses the local filesystem. Cloud storage (S3, GCS, ADLS) is not yet supported.
Cross-engine SQL compilation¶
smelt compiles SQL to the target's dialect automatically. You write standard SQL with `smelt.ref()` and `smelt.source()`, and smelt translates function calls, types, and syntax to match the target backend.
Note
Not all SQL features are available on all backends. If you use a backend-specific function, smelt will report an error when targeting a backend that does not support it.
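smelt's actual translation layer is not shown in this section. As a toy illustration of what dialect translation involves, the sketch below renames a couple of functions that genuinely differ between DuckDB and Spark SQL; a real compiler works on a parsed AST and also rewrites types, argument order, and format strings, not just names:

```python
import re

# Two real naming differences between DuckDB and Spark SQL (illustrative
# subset). Note a real translator must also rewrite strftime's format
# specifiers ('%Y-%m' vs 'yyyy-MM'), which this toy does not attempt.
DUCKDB_TO_SPARK = {
    "strftime": "date_format",
    "list_contains": "array_contains",
}

def translate_functions(sql: str, mapping: dict[str, str]) -> str:
    """Rename function calls found in `mapping`. Regex-based for brevity;
    an AST-based rewrite would avoid touching strings and identifiers."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\s*\(")
    return pattern.sub(lambda m: mapping[m.group(1)] + "(", sql)

duck_sql = "SELECT strftime(order_ts, '%Y-%m') AS month FROM orders"
print(translate_functions(duck_sql, DUCKDB_TO_SPARK))
# SELECT date_format(order_ts, '%Y-%m') AS month FROM orders
```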
Further reading¶
- Materializations for how tables and views are created in each target
- Incremental Models for time-partitioned processing across backends