Skip to content

Validating column data

Sometimes a single warn/fail expression isn’t enough — you want to assert facts about the columns a query returns. The validate block does that declaratively, one rule at a time.

check "users_valid" {
query = "select id, email, age, status from users"
validate {
column "id" { not_null = true unique = true }
column "email" { matches = ".+@.+" }
column "age" { type = "int" range = { min = 0, max = 130 } }
column "status" { allowed = ["active", "suspended", "closed"] }
}
}

validate is mutually exclusive with warn and fail on the same check. Use one style or the other, not both. For expression-based checks, see Writing checks.

How outcomes are decided

  • Any rule violation makes the check FAIL.
  • A column named in validate that is missing from the result, or a rule that can’t be evaluated, makes the check ERROR — groundtruth never reports a fake PASS.
  • Up to 10 offending sample rows are attached to the result (terminal output and the JSON sample array).
  • An empty result set is a PASS — there’s nothing to violate.
  • Rules skip NULL values, except not_null and null_rate, which are specifically about NULLs.

The validators

Each rule below is a single attribute inside a column "name" { ... } block.

type

column "age" { type = "int" }

Non-null values must be the named type. timestamp accepts RFC3339, YYYY-MM-DDTHH:MM:SS, or YYYY-MM-DD HH:MM:SS.

not_null

column "id" { not_null = true }

Fail if any NULL is present.

null_rate

column "middle_name" { null_rate = 0.2 }

Fail if the fraction of NULLs exceeds the given number (0–1).

allowed

column "status" { allowed = ["active", "suspended", "closed"] }

Non-null values must all be in this set.

matches

column "email" { matches = ".+@.+" }

Non-null string values must match the regular expression.

range

column "score" { range = { min = 0, max = 100 } }

Non-null numeric values must fall in [min, max] inclusive; a non-numeric value fails. Setting min > max is a config error.

unique

column "id" { unique = true }

Non-null values must be unique.

outliers

column "latency_ms" { outliers = "iqr" }

Flag statistical outliers. iqr needs at least 4 values; zscore flags values with |z| > 3 and needs at least 3 values.

distribution

column "measurement" { distribution = "normal" }

Runs a Jarque-Bera normality test. Needs at least 8 non-null values; it’s a violation when p < 0.05.

Validator reference

RuleValueMeaning
typeint | float | string | bool | timestampNon-null values must be this type. timestamp accepts RFC3339, YYYY-MM-DDTHH:MM:SS, or YYYY-MM-DD HH:MM:SS.
not_nulltrueFail if any NULL present.
null_ratenumber (0–1)Fail if the fraction of NULLs exceeds this.
allowedlist of stringsNon-null values must be in this set.
matchesregex stringNon-null string values must match.
range{ min = .., max = .. }Non-null numeric values in [min, max] inclusive; non-numeric fails. min > max is a config error.
uniquetrueNon-null values must be unique.
outliers"iqr" | "zscore"Flag outliers. iqr needs ≥4 values; zscore flags `
distribution"normal"Jarque-Bera normality test; needs ≥8 non-null values; violation if p < 0.05.