Project Structure

Directory Layout

A typical smelt project looks like this:

my-project/
├── smelt.yml            # Project configuration
├── sources.yml          # External source definitions (optional)
├── models/              # SQL and Python model files
│   ├── staging/         # Raw data cleanup
│   ├── intermediate/    # Business logic
│   └── marts/           # Final analytical tables
├── tests/               # Model test files
│   ├── test_user_activity.sql
│   └── test_cohort_sizes.sql
├── seeds/               # CSV files loaded as tables
│   └── raw/             # Subdirectories map to schemas
├── .smelt/              # Deployed schema state (gitignored)
│   └── schemas/         # Last-deployed column schemas per model
└── target/              # Generated artifacts (gitignored)
    └── dev.duckdb       # DuckDB database file

smelt.yml

The project configuration file defines your project name, model paths, seed paths, target environments, and default materialization strategy.

name: my-project
version: 1

model_paths:
  - models

seed_paths:
  - seeds

targets:
  dev:
    type: duckdb
    database: target/dev.duckdb
    schema: main
  prod:
    type: spark
    connect_url: sc://spark-cluster.internal:15002
    catalog: spark_catalog
    schema: analytics

default_materialization: view

For full configuration options, see the project configuration reference.

sources.yml

Source definitions declare external tables that your models read from. Defining sources enables schema validation and LSP support (type checking, autocompletion).

version: 1

sources:
  raw:
    tables:
      page_views:
        description: Raw page view events
        columns:
          - name: user_id
            type: INTEGER
          - name: url
            type: VARCHAR
          - name: viewed_at
            type: TIMESTAMP

Models reference sources with smelt.source('raw.page_views').
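For instance, a staging model could read from the source declared above. This sketch uses the column names from the sources.yml definition; the NULL filter is purely illustrative:

```sql
SELECT
    user_id,
    url,
    viewed_at
FROM smelt.source('raw.page_views')
WHERE viewed_at IS NOT NULL
```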

For full configuration options, see the sources configuration reference.

models/ Directory

Each .sql file in models/ defines one model. Models can include optional YAML frontmatter for configuration:

---
materialization: table
tags:
  - finance
  - daily
---
SELECT
    order_id,
    customer_id,
    amount
FROM smelt.ref('staging.orders')
WHERE status = 'completed'

Subdirectories are purely organizational -- they have no semantic meaning. Use whatever structure makes sense for your team. A common convention is:

  • staging/ -- Light transformations that clean raw source data (renaming columns, casting types, filtering invalid rows).
  • intermediate/ -- Business logic that combines staging models. Not typically exposed to end users.
  • marts/ -- Final analytical tables consumed by dashboards and reports.

Python models (.py files with a @model decorator) are also supported in the models directory. See the Python models guide for details.
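As a rough sketch only: the section above confirms that Python models carry a @model decorator, but the import path, decorator arguments, and function signature below are assumptions for illustration, not the documented API -- consult the Python models guide for the real interface.

```python
# Hypothetical sketch: only the @model decorator itself is confirmed
# by this page; everything else is an illustrative assumption.
from smelt import model

@model(materialization="table")
def daily_order_totals(orders):
    # 'orders' assumed to be a DataFrame for an upstream model;
    # aggregate order amounts per day.
    return orders.groupby("order_date").agg({"amount": "sum"})
```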

seeds/ Directory

CSV files in seeds/ are loaded into the database as tables. This is useful for small reference data like country codes, category mappings, or test fixtures.
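For example, a country-codes seed could be a plain CSV with a header row (the columns here are illustrative, not taken from this project):

```csv
code,name
US,United States
DE,Germany
JP,Japan
```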

seeds/
├── raw/
│   ├── country_codes.csv
│   └── currency_rates.csv
└── lookups/
    └── status_mapping.csv

Subdirectory names become schema names in the database:

File path                           Database table
seeds/raw/country_codes.csv         raw.country_codes
seeds/lookups/status_mapping.csv    lookups.status_mapping

Models reference seeds the same way they reference other models: smelt.ref('raw.country_codes').
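For example, a model can join a staging model against a seed table in a single query (the join keys and column names are hypothetical):

```sql
SELECT
    o.order_id,
    o.amount,
    c.name AS country_name
FROM smelt.ref('staging.orders') AS o
JOIN smelt.ref('raw.country_codes') AS c
    ON o.country_code = c.code
```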

See the seeds guide for more details.

tests/ Directory

Test files are .sql files with materialization: test in YAML frontmatter. By convention, they live in a tests/ directory, which must be listed in model_paths in your smelt.yml:

model_paths:
  - models
  - tests

tests/
├── test_user_activity.sql
├── test_cohort_sizes.sql
└── test_customer_quantiles.sql

Each test defines mock inputs and expected outputs for a model (or a specific CTE within a model). Run tests with smelt test.

Tests can also be co-located as additional sections in model files -- see the Testing guide for details.

target/ Directory

The target/ directory contains generated artifacts and should be added to .gitignore.

For DuckDB targets, this is where the database file lives. smelt also stores run history and interval tracking state here for incremental models.

target/
├── dev.duckdb           # DuckDB database
├── run_history.json     # Execution log
└── intervals/           # Incremental model state
    └── staging.events.json

Note

The target/ directory is fully managed by smelt. You can safely delete it to start fresh -- smelt will recreate everything on the next run (incremental models will do a full refresh).

Example Projects

The smelt repository includes complete example projects you can use as references:

examples/timeseries/
A user and event analytics pipeline with 12 SQL models. Demonstrates incremental materialization, seed data, and interval-based processing.
examples/retail_analytics/
A TPC-DS-based retail analytics pipeline with 25 models organized in staging, intermediate, and marts layers. A good reference for larger project structure.
examples/multi_engine/
A multi-backend project demonstrating Spark and DuckDB models in the same pipeline with cross-engine Parquet data exchange.

To try an example locally:

cd examples/timeseries
smelt run