Pipeline DSL Expressions (`bv.col`)
Status: Authoritative for v0. Documents the post-13.5 target Python expression DSL. The current
python/beava/_col.pyalready implements every surface in this doc — Phase 13.5 polishes naming + cross-language parity. Last reviewed: 2026-05-03 (Phase 13.0).
Overview
bv.col(name) constructs a column-reference expression — a leaf in an
operator-overloaded AST. Composing it with arithmetic, comparison, boolean,
and method calls yields more nodes; calling .to_expr_string() (the SDK
does this implicitly at register time) emits a canonical parenthesised
string the server's expression evaluator parses back into a predicate.
The grammar is locked — the server-side parser depends on the canonical shape; SDK ports MUST produce the same string for the same Python source.
Grammar (canonical)
expr := field | literal | bin_op | unary_op | call
field := identifier | identifier "." identifier # e.g. x, Stream.x
literal := number | "'" string "'" | "true" | "false" | "null"
bin_op := "(" expr <space> op <space> expr ")" # EVERY binary op is parenthesized
op := "+" | "-" | "*" | "/" | ">" | ">=" | "<" | "<=" | "==" | "!=" | "and" | "or"
unary_op := "(" "not" <space> expr ")"
call := ident "(" expr ("," <space> expr)* ")"
String literal escaping: \\ becomes \\\\; ' becomes \\'. This is the
single mitigation point for predicate-string injection from user-supplied
strings (T-03-02-01).
bv.col(name) — column reference
import beava as bv
amount = bv.col("amount") # Field('amount')
amount.to_expr_string() # "amount"
# Qualified field reference (used when ops cross multiple upstream sources):
foreign = bv.col("Txn.amount")
foreign.to_expr_string() # "Txn.amount"
Args:
name— non-empty string.TypeErrorif absent / empty.
Returns: an _ExprAST leaf node, ready to compose with operators.
Arithmetic: + - * /
The four binary arithmetic operators are overloaded on _ExprAST and accept
either another expression or a Python scalar (auto-wrapped via bv.lit(...)):
(bv.col("a") + bv.col("b")).to_expr_string() # "(a + b)"
(bv.col("amount") * 2).to_expr_string() # "(amount * 2)"
(bv.col("amount") / 100).to_expr_string() # "(amount / 100)"
(5 - bv.col("balance")).to_expr_string() # "(5 - balance)"
Both forms (left-operand or right-operand scalar) compile correctly because
_ExprAST implements both __add__ and __radd__ (and the same for -,
*, /).
Type rules (server-side, applied at register time during schema propagation):
- Both operands must be numeric (
i64orf64);boolis NOT numeric in arithmetic context. Cast first via.cast("int"). - Division (
/) always widens tof64to avoid integer-truncation surprises. - Otherwise,
f64 + i64 → f64;i64 + i64 → i64.
Comparison: > >= < <= == !=
(bv.col("amount") > 100).to_expr_string() # "(amount > 100)"
(bv.col("amount") >= 100).to_expr_string() # "(amount >= 100)"
(bv.col("status") == "ok").to_expr_string() # "(status == 'ok')"
(bv.col("status") != "ok").to_expr_string() # "(status != 'ok')"
All comparison ops return bool regardless of operand types. String
literals on the right-hand side are auto-quoted and backslash-escaped per
the grammar.
Boolean combinators: & | ~
Python's keyword and / or / not cannot be operator-overloaded; Beava uses
& / | / ~ instead and emits the keywords in the canonical grammar:
left = bv.col("amount") > 100
right = bv.col("merchant") == "amazon"
(left & right).to_expr_string() # "((amount > 100) and (merchant == 'amazon'))"
(left | right).to_expr_string() # "((amount > 100) or (merchant == 'amazon'))"
(~left).to_expr_string() # "(not (amount > 100))"
Both operands MUST be boolean — the server-side type inference rejects
bool & i64 etc. with schema_mismatch. Use .cast("bool") if you have a
0/1 column you want to combine; or use (col != 0) to coerce first.
Operator precedence: Python's & and | bind tighter than > /
==. To get the obvious "either of two predicates" reading, parenthesise:
# WRONG — Python parses this as `bv.col("a") > (100 & bv.col("b")) > 0`
bv.col("a") > 100 & bv.col("b") > 0
# RIGHT
(bv.col("a") > 100) & (bv.col("b") > 0)
The SDK does NOT detect or rewrite the wrong form; it produces a schema-mismatch error at register time. Always parenthesise your sub-predicates.
.isnull() — null-check
bv.col("amount").isnull().to_expr_string() # "(amount == null)"
Shorthand for the (col == null) form. Emitted as a _BinOp so it composes
with boolean combinators:
non_null_pos = (~bv.col("amount").isnull()) & (bv.col("amount") > 0)
non_null_pos.to_expr_string()
# "((not (amount == null)) and (amount > 0))"
Use .isnull() rather than (col == None) for clarity — both compile to
the same wire form, but .isnull() reads better in chains.
.cast(type_name) — type coercion
bv.col("flag_str").cast("bool").to_expr_string() # "cast(flag_str, bool)"
bv.col("amount").cast("int").to_expr_string() # "cast(amount, int)"
Args:
type_name— one of"str","int","float","bool". Other strings are rejected at decoration time withValueError.
.cast() is a _Call node (not a _BinOp); the canonical form is
cast(<expr>, <type>). The cast target name renders as a bare
identifier (NOT a quoted string), so the server's parser can dispatch on
the type without a string-strip.
Use case 1 — within with_columns(...) to derive a new column:
@bv.event
def TxnWithFlagInt(txn: Txn) -> bv.Event:
return txn.with_columns(is_fraud_int=bv.col("is_fraud").cast("int"))
This produces a new is_fraud_int column on the downstream derivation that
aggregations can then sum. This is the recommended boolean-sum pattern —
see compilation-rules.md § Boolean-sum trick.
Use case 2 — within where= predicates:
@bv.table(key="user_id")
def UserStats(txn) -> bv.Table:
return (
txn.group_by("user_id")
.agg(big_count=bv.count(window="1h",
where=bv.col("amount").cast("int") > 100))
)
.alias(name) — rename in derivation context
Status: Implemented post-13.5; current code uses
**kwargsnaming onwith_columns(name=expr)instead. SDK porters in 13.6 may add.alias()as a convenience method. Documented here for forward compatibility.
expensive = (bv.col("amount") > 100).alias("is_expensive")
In the v0 API the recommended form is the kwarg-as-name shape:
@bv.event
def TxnAnnotated(txn: Txn) -> bv.Event:
return txn.with_columns(is_expensive=bv.col("amount") > 100)
The kwarg name (is_expensive) becomes the new column name on the wire;
.alias(...) exists in the spec for completeness with Polars conventions
but is not required to author v0 pipelines.
bv.lit(value) — literal-value expression
Status: Optional — most SDK code paths auto-wrap scalars via the internal
_wrap()helper (e.g.,bv.col("amount") > 100works without an explicitbv.lit(100)).bv.lit(...)is exposed for clarity in auto-generated / programmatic pipelines.
import beava as bv
bv.lit(100).to_expr_string() # "100"
bv.lit(3.14).to_expr_string() # "3.14"
bv.lit(True).to_expr_string() # "true"
bv.lit(None).to_expr_string() # "null"
bv.lit("amazon").to_expr_string() # "'amazon'"
Supported value types: int, float, bool, str, None. Other types
raise TypeError at decoration time. Strings are quoted with single quotes
and backslash-escape \\ and ' per the grammar.
You almost never need bv.lit(...) in user code — the operator overloading
auto-wraps Python scalars on either side of binary ops. Use it when you
need an explicit literal at a position the SDK cannot infer (e.g., as a
positional argument to a future call expression).
Per ADR-003, bv.lit is the canonical public surface for literal construction across all three SDKs (Python bv.lit(value), TypeScript bv.lit(value), Go beava.Lit(value)). The implicit operator-overloading coercion path keeps working in Python; bv.lit is exposed for explicit cases (constant columns via events.with_columns(source=bv.lit("web")), type-coercion patterns, and cross-language parity with TS/Go SDKs that lack Python's flexible operator overloading).
Compilation: every node knows how to emit
Each _ExprAST subclass implements to_expr_string(). The SDK calls this
at register time when serialising:
EventDerivation.filter(expr)→{"op": "filter", "expr": expr.to_expr_string()}EventDerivation.with_columns(name=expr, ...)→{"op": "with_columns", "exprs": {name: expr.to_expr_string(), ...}}bv.<agg>(field, where=expr, ...)(e.g.bv.count(where=bv.col("status") == "ok")) →{"op": "<agg>", "params": {"where": expr.to_expr_string(), ...}}
The expression string is the canonical contract between SDK and server.
SDK porters MUST produce the same string for semantically equivalent
expressions; round-trip tests on every fixture in
examples/wire/ verify this.
Validation at register-time
When the server parses the register payload (per
wire-spec OP_REGISTER) it runs Phase 4
expression validation on every expr string:
- Parse error (malformed grammar, unbalanced parens) →
RegistrationError(code="invalid_expression"). - Field reference unknown (e.g.,
bv.col("typo_amount")referencing a field not in the upstream schema) →RegistrationError(code="unknown_field_reference"). - Type mismatch (e.g., arithmetic on a
boolfield, boolean op on a numeric field) →RegistrationError(code="schema_mismatch"). - Cast target invalid (e.g.,
cast(x, "complex64")) →RegistrationError(code="invalid_cast_target").
The server emits all errors in a fail-soft batch — a single register call returns the full list of validation failures, not just the first.
Cross-references
- Pipeline DSL Overview — how expressions fit in the larger pipeline-authoring story.
- Pipeline DSL Compilation Rules — per-method H3
worked examples; expressions are referenced from
.filter,.with_columns,.cast, and aggregationwhere=kwargs. - Wire spec — canonical JSON contract.
- Error codes —
invalid_expression,unknown_field_reference,schema_mismatch,invalid_cast_target.