bv.histogram
Fixed-bucket count histogram of a numeric field.
bucketsis a required register-time kwarg per V0-MEM-GOV-02.
Signature
bv.histogram(
field: str,
*,
buckets: list[float], # REQUIRED — register-time kwarg
where: bv.Col | None = None,
) -> AggDescriptor
Description
bv.histogram returns a count per fixed numeric bucket of field across
events that match the optional where= predicate. buckets is a strictly
increasing list of split points; the cells are (-inf, b[0]), [b[0], b[1]),
…, [b[n-1], +inf). For buckets=[10, 20, 50] you get four cells with
labels "<10", "10-20", "20-50", ">=50". Use it for "transaction
amount distribution by tier" or "request size in p50 / p90 / p99 buckets" —
features where you want a coarse shape, not a quantile estimate.
buckets is a required keyword argument per
V0-MEM-GOV-02: the lifetime-aggregation
memory contract requires every unbounded-by-default operator to declare a
finite per-entity ceiling at register time. bv.histogram's ceiling is
exactly len(buckets) + 1 u64 counters per entity. The register-time
JSON-prelude shim (pre_check_unbounded_op_in_lifetime_mode) rejects any
histogram payload missing buckets (or with an empty buckets array)
with the structured error code unbounded_op_in_lifetime_mode. There is
no fallback default — picking the bucket edges is a deliberate
capacity-planning + signal-design step.
bv.histogram belongs to the bounded-buffer family. Per-event update
is a linear scan over buckets (≤ ~20 in practice) plus a saturating
counter increment — Tier 1 floor (~10 ns / ~30 ns measured). The query
path materializes a BTreeMap of labelled counts and is therefore listed
under Tier 3 in cost-class.md; the apply-path cost
remains Tier 1. There is no window= kwarg in v0 — bv.histogram is
lifetime-only. For "amount histogram for the last 24 h", compose with
@bv.event(cold_after=...) or use bv.quantile
for a quantile sketch instead.
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
field |
str |
Yes | — | Name of the numeric field to bucket (f64 / i64). |
buckets |
list[float] |
Yes | — | Strictly increasing list of split points. n values produce n + 1 cells. Caps per-entity memory at n + 1 u64 counters per V0-MEM-GOV-02 BoundedByRequiredKwarg("buckets"). |
where |
bv.Col |
No | None |
Boolean expression on event fields; only matching events update the counters. |
Returns
A dict[str, int] keyed by bucket label (e.g. "<10", "10-20",
">=50") with i64 count values. Wire form is Value::Map with
BTreeMap-sorted iteration. When the entity has seen zero matching
events, the result is the dict with all counters at 0.
Complexity
| Resource | Bound |
|---|---|
| CPU per event | Tier 1 (~10 ns floor / ~30 ns measured — linear scan over ≤ ~20 buckets + saturating add) — see cost-class.md |
| Query | Tier 3 allocates a BTreeMap of len(buckets) + 1 entries on each app.get(...) — see cost-class.md — apply-thread cost is Tier 1; query-time cost is the asymmetry to flag when profiling |
| Memory per entity | BoundedByRequiredKwarg("buckets") — (len(buckets) + 1) × 8 bytes per Phase 12.8 V0-MEM-GOV-02 |
| Lifetime mode | Required — bv.histogram has no window= kwarg in v0; lifetime is the only mode |
Examples
Example 1: Transaction-amount histogram per user
import beava as bv
@bv.event
class Txn:
user_id: str
amount: float
@bv.table(key="user_id")
def UserAmountHistogram(txn) -> bv.Table:
return (
txn.group_by("user_id")
.agg(amount_hist=bv.histogram("amount",
buckets=[10.0, 50.0, 100.0, 500.0]))
)
# Push events
for amt in [5.0, 12.0, 25.0, 80.0, 200.0, 750.0]:
app.push("Txn", {"user_id": "alice", "amount": amt})
# Query
result = app.get("UserAmountHistogram", "alice")
# result == {"amount_hist": {"<10": 1, "10-50": 2, "50-100": 1, "100-500": 1, ">=500": 1}}
Example 2: Successful-only request size histogram
@bv.table(key="endpoint")
def EndpointReqSizeHist(reqs) -> bv.Table:
return (
reqs.group_by("endpoint")
.agg(req_size_hist=bv.histogram("size_bytes",
buckets=[1024, 65536, 1048576],
where=bv.col("status") == 200))
)
Wire
JSON wire form in a register payload:
{
"kind": "derivation",
"name": "UserAmountHistogram",
"output_kind": "table",
"key": ["user_id"],
"agg": {
"amount_hist": {
"op": "histogram",
"params": {
"field": "amount",
"buckets": [10.0, 50.0, 100.0, 500.0]
}
}
}
}
See examples/wire/register-fraud-team.request.json for a full payload example.
Edge cases
bucketsmissing or empty at register time: rejected with structured error codeunbounded_op_in_lifetime_modeper V0-MEM-GOV-02. The JSON-prelude shim catches this before any state is allocated.bucketsnot strictly increasing: rejected withaggregation_invalid_paramat register time.- Empty stream / cold-start: all counters are
0; the result dict is the full label set with zero values. window=kwarg attempted: raisesTypeErrorat SDK-helper-call time. v0 has no windowed histogram; use@bv.event(cold_after=...)to scope the lifetime via cold-entity eviction, orbv.quantilefor a windowed shape estimate.- Non-numeric source field (
Value::Str,Value::Bool): event silently dropped (not bucketed). For categorical histograms usebv.event_type_mix. - NaN inputs: dropped —
numeric_from_rowreturnsNoneon non-F64/I64payloads. For cleaner semantics filter withwhere=~bv.col(field).isnull(). - Bucket-label format: integer-valued edges render without trailing
.0("<10"not"<10.0"); fractional edges keep their decimal form. Stable across versions — Python SDKs can dict-key on the labels. - Counter overflow: the per-bucket
u64saturates at2^64 - 1(impossible in practice for a single entity). - Out-of-order event-time: does not matter. beava is processing-time-only per
project_redis_shaped_no_event_time_ever; buckets are populated in arrival order. - Lifetime mode: the only mode. Per-entity ceiling is
(len(buckets) + 1) × 8bytes per V0-MEM-GOV-02 BoundedByRequiredKwarg("buckets").
See also
- cost-class.md — performance tier (Tier 1 update / Tier 3 query)
- bv.quantile — quantile-estimate companion (when you want p50/p90/p99 instead of fixed buckets)
- bv.event_type_mix — categorical-distribution companion (proportions per category, not counts per bucket)
- bv.hour_of_day_histogram — fixed 24-bin time-of-day histogram (no
buckets=needed) - bv.dow_hour_histogram — fixed 168-bin day-of-week × hour histogram
- V0-MEM-GOV-02 — BoundedByRequiredKwarg memory governance contract
- pipeline-dsl/compilation-rules.md — chain compilation rules