bv.var

Sample variance via Welford's online algorithm.

Signature

bv.var(
    field: str,
    *,
    window: str,
    where: bv.Col | None = None,
) -> AggDescriptor

Previously called bv.variance. Renamed to var per ADR-002 for Polars-convention consistency. The old name remains as a deprecation alias in v0.0.x and is removed in v0.1.

Description

bv.var returns the sample variance (Bessel-corrected, divisor n-1) of a numeric field across events that match the optional where= predicate. State is updated via Welford's online algorithm — numerically stable across many orders of magnitude and across long-running streams. Per-entity state is (count, mean, M2) (three f64 slots) regardless of stream length.
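The update rule described above can be sketched in plain Python. This is a minimal illustration of Welford's algorithm under the stated (count, mean, M2) layout, not beava's implementation; the class name WelfordState and its methods are hypothetical, and count is kept as a Python int for readability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WelfordState:
    """Hypothetical sketch of the (count, mean, M2) triple; not beava's code."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Welford step: shift the mean, then accumulate the product of the
        # deviation measured before the shift and the one measured after it.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def sample_variance(self) -> Optional[float]:
        # Bessel-corrected (divisor n-1); undefined, so None, for n < 2.
        if self.count < 2:
            return None
        return self.m2 / (self.count - 1)


state = WelfordState()
for amount in (10.0, 30.0, 50.0):
    state.update(amount)
print(state.sample_variance())  # 400.0
```

Because each update touches only the three stored values, memory stays constant no matter how many events arrive.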

Use bv.var("amount", window="24h") for "amount-variance over the last day" or pair with bv.std for the standard-deviation form. Variance is the bedrock for outlier detection (bv.outlier_count uses sigma = sqrt(var)) and for entity-specific z-scores (bv.z_score).
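For illustration, the textbook z-score derived from a variance is sketched below. It assumes the entity's mean is available alongside the variance. This page does not specify how bv.z_score and bv.outlier_count obtain the mean or what threshold bv.outlier_count applies, so both helper names and the k=3.0 cutoff are hypothetical.

```python
import math
from typing import Optional


def z_score(x: float, mean: float, var: Optional[float]) -> Optional[float]:
    """Hypothetical helper: textbook (x - mean) / sigma with sigma = sqrt(var)."""
    # Mirror bv.var's null result: no variance (n < 2) means no z-score.
    # A zero variance would divide by zero, so it is treated as undefined too.
    if var is None or var <= 0.0:
        return None
    return (x - mean) / math.sqrt(var)


def is_outlier(x: float, mean: float, var: Optional[float], k: float = 3.0) -> bool:
    """Hypothetical rule: flag x when it lies more than k sigmas from the mean."""
    z = z_score(x, mean, var)
    return z is not None and abs(z) > k


print(z_score(70.0, mean=30.0, var=400.0))      # 2.0
print(is_outlier(130.0, mean=30.0, var=400.0))  # True (z = 5.0)
```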

bv.var belongs to the core family and has Tier 1 cost (~12 ns algorithm floor, ~32 ns measured; five FP ops per update). Both field and window are required, and field must be i64 or f64; any other type is rejected at register time.

Parameters

Name    Type    Required  Default  Description
field   str     Yes       -        Name of the numeric field (i64 or f64).
window  str     Yes       -        Duration string matching \d+(ms|s|m|h|d), or "forever".
where   bv.Col  No        None     Boolean expression on event fields; only matching events contribute.
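The window grammar in the table can be checked client-side with a regular expression. The sketch below is a hypothetical helper, not part of the bv SDK; returning milliseconds and mapping "forever" to None are illustrative choices.

```python
import re
from typing import Optional

# Milliseconds per unit suffix in the documented grammar \d+(ms|s|m|h|d).
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_WINDOW_RE = re.compile(r"(\d+)(ms|s|m|h|d)")


def parse_window(window: str) -> Optional[int]:
    """Hypothetical helper: window length in milliseconds, or None for "forever"."""
    if window == "forever":
        return None
    match = _WINDOW_RE.fullmatch(window)
    if match is None:
        raise ValueError(f"invalid window {window!r}; expected e.g. '1h' or 'forever'")
    count, unit = match.groups()
    return int(count) * _UNIT_MS[unit]


print(parse_window("1h"))   # 3600000
print(parse_window("24h"))  # 86400000
```

Using fullmatch rather than match ensures trailing junk such as "1hx" is rejected.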

Returns

A single f64. When the entity has seen fewer than two matching events, the result is null (Python None) — sample variance is undefined for n<2.

Complexity

Resource                          Bound
CPU per event                     Tier 1 (~12 ns algorithm floor / ~32 ns measured; Welford 5-FP-op step); see cost-class.md
Memory per entity                 O(1): (count, mean, M2) per bucket (≤64 buckets)
Lifetime mode (window="forever")  Allowed; O(1) footprint per Phase 12.8 V0-MEM-GOV-02
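The memory row implies one (count, mean, M2) triple per bucket. This page does not say how buckets are combined at query time. One standard option is the pairwise merge of Chan, Golub and LeVeque, sketched below as an assumption rather than a description of beava's internals; merge and variance_of are hypothetical helpers.

```python
from typing import Iterable, Optional, Tuple

# (count, mean, M2) for one bucket, as in the memory row above.
State = Tuple[int, float, float]


def merge(a: State, b: State) -> State:
    """Hypothetical: combine two Welford states as if their samples were pooled."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Cross term corrects for the two buckets having different means.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def variance_of(buckets: Iterable[State]) -> Optional[float]:
    """Fold all live buckets, then apply the n-1 divisor; None for n < 2."""
    total: State = (0, 0.0, 0.0)
    for bucket in buckets:
        total = merge(total, bucket)
    n, _, m2 = total
    return m2 / (n - 1) if n >= 2 else None


# Bucket 1 saw 10.0 and 30.0 (mean 20, M2 200); bucket 2 saw 50.0.
print(variance_of([(2, 20.0, 200.0), (1, 50.0, 0.0)]))  # 400.0
```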

Examples

Example 1: Per-user transaction-amount variance, hourly

import beava as bv

@bv.event
class Txn:
    user_id: str
    amount: float

@bv.table(key="user_id")
def TxnSpread(txns) -> bv.Table:
    return (
        txns.group_by("user_id")
            .agg(amount_var_1h=bv.var("amount", window="1h"))
    )

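# Push events (`app` is assumed to be an existing app handle with TxnSpread registered; its setup is not shown here)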
app.push("Txn", {"user_id": "alice", "amount": 10.0})
app.push("Txn", {"user_id": "alice", "amount": 30.0})
app.push("Txn", {"user_id": "alice", "amount": 50.0})

# Query
result = app.get("TxnSpread", "alice")
# result == {"amount_var_1h": 400.0}  # sample variance: ((10-30)^2 + (30-30)^2 + (50-30)^2) / 2
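The expected value can be cross-checked offline with Python's standard library, since statistics.variance also computes the Bessel-corrected sample variance. One difference: it raises statistics.StatisticsError for fewer than two data points, whereas bv.var returns null.

```python
import statistics

amounts = [10.0, 30.0, 50.0]
print(statistics.variance(amounts))  # 400.0, matching amount_var_1h above

try:
    statistics.variance([10.0])
except statistics.StatisticsError:
    print("fewer than two points: no sample variance")
```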

Example 2: Latency-variance for successful payments only

@bv.table(key="user_id")
def LatencyDispersion(payments) -> bv.Table:
    return (
        payments.group_by("user_id")
                .agg(latency_var_ok=bv.var("latency_ms",
                                             window="24h",
                                             where=bv.col("status") == "ok"))
    )
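Conceptually, where= filters each event before the variance update: non-matching events leave the state untouched. The sketch below models that in plain Python over dict events. The event shape and the helper name are hypothetical, and it pools every event rather than modeling the 24h window.

```python
import statistics
from typing import Callable, Iterable, Optional


def filtered_sample_variance(
    events: Iterable[dict],
    field: str,
    predicate: Callable[[dict], bool],
) -> Optional[float]:
    """Hypothetical: sample variance of `field` over matching events; None if n < 2."""
    values = [event[field] for event in events if predicate(event)]
    return statistics.variance(values) if len(values) >= 2 else None


payments = [
    {"user_id": "bob", "latency_ms": 100.0, "status": "ok"},
    {"user_id": "bob", "latency_ms": 5000.0, "status": "timeout"},  # excluded
    {"user_id": "bob", "latency_ms": 120.0, "status": "ok"},
    {"user_id": "bob", "latency_ms": 140.0, "status": "ok"},
]
ok_only = lambda e: e["status"] == "ok"
print(filtered_sample_variance(payments, "latency_ms", ok_only))  # 400.0
```

Without the filter, the single 5000 ms timeout would dominate the result, which is why conditioning on status is useful here.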

Wire

JSON wire form in a register payload:

{
  "kind": "derivation",
  "name": "TxnSpread",
  "output_kind": "table",
  "key": ["user_id"],
  "agg": {
    "amount_var_1h": {
      "op": "var",
      "params": {
        "field": "amount",
        "window": "1h"
      }
    }
  }
}

See examples/wire/register-fraud-team.request.json for a full payload example.
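If you assemble register payloads by hand, the shape above maps onto plain dictionaries. The var_agg helper below is hypothetical (not an SDK function) and reproduces only the fields shown in this example; it omits where=, whose wire encoding this page does not document.

```python
import json


def var_agg(field: str, window: str) -> dict:
    """Hypothetical builder for one "var" entry in the "agg" map."""
    return {"op": "var", "params": {"field": field, "window": window}}


payload = {
    "kind": "derivation",
    "name": "TxnSpread",
    "output_kind": "table",
    "key": ["user_id"],
    "agg": {"amount_var_1h": var_agg("amount", "1h")},
}
print(json.dumps(payload, indent=2))
```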

Edge cases

  • Empty stream / cold-start: result is null — no matching events ⇒ no variance.
  • Single matching event (n=1): result is null — sample variance requires at least two observations (Bessel correction divides by n-1).
  • Non-numeric field: rejected at register time with schema_mismatch.
  • NaN inputs: propagate per IEEE-754 — a single NaN poisons the Welford M2 term. Filter with where=~bv.col("amount").isnull() if your source can emit NaN.
  • Missing window=: ValueError at SDK-helper-call time. Use window="forever" for explicit lifetime variance.
  • Lifetime mode (window="forever"): explicitly allowed — O(1) per entity. Welford is numerically stable across millions of events.
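The NaN behaviour follows directly from the Welford step. In the plain-Python sketch below (a minimal illustration, not beava's code), one NaN input turns the mean and M2 into NaN for every later update, while dropping NaN before the update keeps the result finite. Here math.isnan stands in for whatever predicate you pass to where=.

```python
import math
from typing import Iterable, Optional


def welford_variance(values: Iterable[float]) -> Optional[float]:
    """Sketch: sample variance via Welford updates; None for fewer than two values."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count       # NaN here makes mean NaN from now on
        m2 += delta * (x - mean)    # ...and M2 NaN as well
    return m2 / (count - 1) if count >= 2 else None


stream = [10.0, float("nan"), 30.0, 50.0]
print(welford_variance(stream))                                  # nan
print(welford_variance(x for x in stream if not math.isnan(x)))  # 400.0
```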

See also