bv.distance_from_home
Distance (km) of the current event from the running centroid of the entity's last
samples matching events. samples is a soft-defaulted register-time kwarg per V0-MEM-GOV-02: BoundedByConfig("samples", 100).
Signature
bv.distance_from_home(
    *,
    lat: str,             # REQUIRED: name of the latitude field on the event
    lon: str,             # REQUIRED: name of the longitude field on the event
    samples: int = 100,   # SOFT default: BoundedByConfig per V0-MEM-GOV-02
    where: bv.Col | None = None,
) -> AggDescriptor
Description
bv.distance_from_home returns the haversine distance (in km) from
the current event's (lat, lon) to the running centroid of the
entity's last samples matching events. State is a circular buffer
of capacity samples storing recent (lat, lon) pairs plus the
last-observed point. On every accepted event the buffer is overwritten
at the head index and the head advances modulo samples. The query
computes the arithmetic-mean centroid of the buffer's current contents,
then returns haversine(last, mean_centroid). Use it for "how far
is this transaction from where this user usually transacts?" — a
classical home-anomaly fraud signal that scores the current event
against the entity's recent geographic baseline.
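The update and query paths described above can be modelled in a few lines of plain Python. This is a minimal sketch, not beava's implementation: the class name RingCentroidModel, its method names, and the Earth radius constant are assumptions made for illustration. Only the ring overwrite at the head index, the arithmetic-mean centroid, and the final haversine(last, centroid) step come from the description.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius; the engine's constant may differ


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


class RingCentroidModel:
    """Hypothetical model of the per-entity state: a ring of (lat, lon) plus `last`."""

    def __init__(self, samples=100):
        self.capacity = max(samples, 1)  # mirrors the documented clamp to >= 1
        self.ring = []                   # grows until it holds `capacity` points
        self.head = 0                    # slot that the next update writes
        self.last = None                 # most recent accepted point

    def update(self, lat, lon):
        if len(self.ring) < self.capacity:
            self.ring.append((lat, lon))       # partial ring: extend
        else:
            self.ring[self.head] = (lat, lon)  # full ring: overwrite the oldest slot
        self.head = (self.head + 1) % self.capacity
        self.last = (lat, lon)

    def query(self):
        if self.last is None:
            return None  # cold start: no matching events yet
        mean_lat = sum(p[0] for p in self.ring) / len(self.ring)
        mean_lon = sum(p[1] for p in self.ring) / len(self.ring)
        return haversine_km(self.last[0], self.last[1], mean_lat, mean_lon)


model = RingCentroidModel(samples=3)
print(model.query())         # None (cold start)
model.update(42.36, -71.06)
print(model.query())         # 0.0 (a single point is its own centroid)
```

Because the current event is written into the ring before the query runs, the centroid always includes the point being scored. An outlier therefore measures slightly less than its distance to the previous centroid, and the effect shrinks as samples grows.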
samples is optional with a default of 100 per
V0-MEM-GOV-02: the
lifetime-aggregation memory contract requires every unbounded-by-default
operator to declare a finite per-entity ceiling at register time, but
for distance_from_home the ceiling is soft-defaulted
(BoundedByConfig("samples", 100)) — the kwarg is OPTIONAL, but the
per-entity ring is always sized at samples × 16 bytes (two f64s per
slot) regardless of caller behavior. Per register_validate.rs line
482, the JSON-prelude shim's pre_check_unbounded_op_in_lifetime_mode
treats absent-samples as the default rather than a register-time
rejection. The default of 100 fits the "centroid of recent home
locations" use case at ~1.6 KB/entity. Pick a larger value (e.g.
samples=500) for entities with sparse activity where the last 100
events poorly characterise their home; pick smaller (e.g.
samples=50) for memory-sensitive deployments at the cost of more
volatile centroids.
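The sizing arithmetic above is easy to check for a planned deployment. The helper below is a hypothetical back-of-envelope calculator: the function name and the one-million-entity fleet size are assumptions. It uses only the documented figure of 16 bytes per slot (two f64 values) and ignores the last-point memo and any allocator overhead.

```python
BYTES_PER_SLOT = 16  # two f64 values (lat, lon) per ring slot, per the text above


def ring_bytes(samples: int = 100, entities: int = 1) -> int:
    """Ring-buffer bytes for `entities` keys; samples is clamped to >= 1 as documented."""
    return max(samples, 1) * BYTES_PER_SLOT * entities


for samples in (50, 100, 500):
    per_entity = ring_bytes(samples)
    fleet_gb = ring_bytes(samples, entities=1_000_000) / 1e9  # hypothetical fleet size
    print(f"samples={samples}: {per_entity} B/entity, {fleet_gb:.1f} GB per million entities")
```

Memory grows linearly in samples, so moving from the default of 100 to 500 multiplies the ring footprint by five for every entity, active or not.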
bv.distance_from_home belongs to the bounded-buffer + geo family.
Per-event UPDATE is Tier 3 (~12 ns floor / ~32 ns measured —
write to ring buffer at head index, O(1)) per
cost-class.md;
QUERY iterates the ring for the centroid (O(samples) — at most 100
points by default, cold-path on app.get(...)). Update cost is
effectively Tier 1; the cost class lists this op in Tier 3 because the
query path can dominate in query-heavy pipelines. State is behind a
Box for the AggOp::DistanceFromHome variant per Phase 12.9 boxing
(the variant fits the 80-byte AggOp enum cap; see
crates/beava-core/src/agg_op.rs line 489 and
Phase 12.9 SUMMARY).
There is no window= kwarg in v0 — bv.distance_from_home is
lifetime-only. The "home" is implicitly bounded by the ring's
samples capacity (newest event displaces oldest after ring fills);
for a time-bounded "home", compose with @bv.event(cold_after=...)
per V0-MEM-GOV-01.
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| lat | str | Yes | n/a | Name of the latitude field on the event (NOT a literal coordinate). Field value must be f64 or i64 decimal degrees in [-90, 90]. Resolved to a column index at register time per Plan 19.2-06 D-01 fast-path. |
| lon | str | Yes | n/a | Name of the longitude field on the event. Field value must be f64 or i64 decimal degrees in [-180, 180]. |
| samples | int | No | 100 | Capacity of the circular buffer of recent locations whose mean defines "home". Soft-defaulted per V0-MEM-GOV-02 BoundedByConfig("samples", 100). Clamped to ≥ 1 at state construction. Per-entity memory: samples × 16 bytes. |
| where | bv.Col | No | None | Boolean expression on event fields; only matching events update the ring + last. |
Returns
A scalar float: the haversine distance in km from the current
(latest matching) event to the centroid of the buffer's current
contents. Cold-start (zero matching events): returns null. The
first matching event seeds the buffer with a single point and sets
last to that same point, so the centroid equals the latest
observation and the first query returns 0.0.
Complexity
| Resource | Bound |
|---|---|
| CPU per event (UPDATE) | Tier 3 (~12 ns floor / ~32 ns measured — ring-buffer write at head index, O(1)) — see cost-class.md. Update cost is effectively Tier 1; classified Tier 3 because of the query-time iteration cost (see next row). |
| CPU per query | O(samples) centroid iteration — at most samples × (2 f64 reads + 1 add) per app.get(...). ~100 ns at samples=100. Cold-path; doesn't dominate apply-thread budget but flag if your pipeline is query-heavy. |
| Memory per entity | BoundedByConfig("samples", 100) — samples × 16 bytes per Phase 12.8 V0-MEM-GOV-02. At default samples=100: 1600 bytes for the ring + ~16 bytes for the last-point memo. Boxed inside AggOp per Phase 12.9 (crates/beava-core/src/agg_op.rs line 489). |
| Lifetime mode | Required — bv.distance_from_home has no window= kwarg in v0; lifetime is the only mode. |
Examples
Example 1: Per-card distance from recent transaction centroid
import beava as bv

@bv.event
class Txn:
    card_id: str
    latitude: float
    longitude: float

@bv.table(key="card_id")
def CardDistanceFromHome(txns) -> bv.Table:
    return (
        txns.group_by("card_id")
        .agg(km_from_home=bv.distance_from_home(
            lat="latitude",
            lon="longitude",
            samples=100,
        ))
    )

# After 100+ Boston-area transactions, then a Las Vegas swipe
result = app.get("CardDistanceFromHome", "card_xyz")
# result is approximately {"km_from_home": 3800.0}
# (illustrative value; Boston to Las Vegas is roughly 3,800 km great-circle)
Example 2: Smaller ring for memory-sensitive deployments
@bv.table(key="user_id")
def UserDistanceFromHomeSmall(events) -> bv.Table:
    return (
        events.group_by("user_id")
        .agg(km_from_home=bv.distance_from_home(
            lat="lat",
            lon="lon",
            samples=50,  # 800 bytes/entity; more volatile centroid
        ))
    )
Wire
JSON wire form in a register payload:
{
  "kind": "derivation",
  "name": "CardDistanceFromHome",
  "output_kind": "table",
  "key": ["card_id"],
  "agg": {
    "km_from_home": {
      "op": "distance_from_home",
      "params": {
        "lat": "latitude",
        "lon": "longitude",
        "samples": 100
      }
    }
  }
}
See examples/wire/register-fraud-team.request.json for a full payload example.
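When register payloads are generated programmatically, the same structure can be built as a plain dictionary and serialized. The sketch below uses only the standard json module and the field names shown above. The helper name distance_from_home_agg is hypothetical and not part of the beava SDK. It leaves samples out of params when the caller does not pass it, relying on the documented soft default of 100.

```python
import json


def distance_from_home_agg(lat, lon, samples=None):
    """Hypothetical helper: build the agg entry for a distance_from_home op."""
    params = {"lat": lat, "lon": lon}
    if samples is not None:
        params["samples"] = samples  # omitted means the documented default of 100 applies
    return {"op": "distance_from_home", "params": params}


payload = {
    "kind": "derivation",
    "name": "CardDistanceFromHome",
    "output_kind": "table",
    "key": ["card_id"],
    "agg": {"km_from_home": distance_from_home_agg("latitude", "longitude", samples=100)},
}

print(json.dumps(payload, indent=2))
```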
Edge cases
- Cold-start / zero matching events: result is null. last is None, so there is no centroid to compare against.
- First matching event: result is 0.0. last and the single buffer point coincide; haversine is zero.
- Buffer not yet full (< samples events seen): centroid is the mean of the partial buffer (e.g. mean of 5 points after 5 events when samples=100). Newer events extend the buffer until it fills; subsequent events overwrite at the head index.
- samples=0 or negative samples: clamped to 1 at state construction (samples.max(1)). The ring degenerates to "distance from the previous matching event" (single-slot buffer).
- samples missing at register time: defaults to 100 per BoundedByConfig("samples", 100); does NOT trigger unbounded_op_in_lifetime_mode. The ring is still bounded, just at the soft default.
- lat or lon missing on the event: event is silently dropped (no buffer write, no last update). For stricter behavior, gate with where=~bv.col("lat").isnull() & ~bv.col("lon").isnull().
- Non-numeric lat / lon (Value::Str, Value::Bool, Value::Null): event is silently dropped (read_lat_lon returns None).
- All matching events at the same point: centroid coincides with the latest observation; result is 0.0. The home is wherever the entity has been.
- Antimeridian crossings: the arithmetic-mean centroid does NOT handle longitude wrap correctly (e.g. mean of lon=179 and lon=-179 is 0, not ±180). Workaround: shift longitudes into a continuous range upstream when the entity straddles the antimeridian.
- Polar latitudes (|lat| > ~85°): haversine itself stays accurate; the arithmetic-mean centroid in degree-space is a reasonable approximation away from the poles. Sub-arctic deployments should validate.
- window= kwarg attempted: raises TypeError at SDK-helper-call time. v0's geo ops are lifetime-only. For "distance from home over the last 30 days", compose with @bv.event(cold_after="30d").
- Snapshot reload: the ring buffer + head index serialize/deserialize cleanly via serde. WAL replay reconstructs the same buffer contents in the same head order, so the centroid is replay-deterministic.
- Out-of-order event-time: does not matter. beava is processing-time-only per project_redis_shaped_no_event_time_ever; the ring is populated in arrival order. The "last" event for the query is whichever matching event arrived most recently.
- Lifetime mode: the only mode. Per-entity ceiling is samples × 16 bytes per V0-MEM-GOV-02 BoundedByConfig("samples", 100).
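The antimeridian case in the list above is easy to reproduce, and one possible upstream workaround is to map longitudes into [0, 360) before they reach the op. This is a sketch under assumptions: shift_lon is a hypothetical preprocessing helper, not a beava function, and this page does not state whether the op accepts longitudes outside [-180, 180], so verify that before relying on it. The shift also moves the discontinuity to the prime meridian, so it only suits entities that stay near the antimeridian.

```python
def mean(values):
    """Arithmetic mean, as used for the degree-space centroid."""
    return sum(values) / len(values)


def shift_lon(lon):
    """Hypothetical upstream transform: map [-180, 180] onto [0, 360)."""
    return lon % 360.0


raw = [179.0, -179.0]                   # two points 2 degrees apart across the antimeridian
print(mean(raw))                        # 0.0, on the opposite side of the globe

shifted = [shift_lon(x) for x in raw]   # [179.0, 181.0]
print(mean(shifted))                    # 180.0, between the two points
```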
See also
- cost-class.md: performance tier (Tier 3, query-iteration class; update is Tier 1)
- bv.geo_velocity: max-implied-km/h sibling (consecutive-event speed, not centroid distance)
- bv.geo_distance: total-path-length sibling (cumulative km, not centroid distance)
- bv.geo_spread: RMS-dispersion sibling (how spread out vs centroid distance; both express "geographic baseline")
- bv.most_recent_n: generic last-N-values sibling (this op is the geo-specific specialisation that computes a centroid + distance)
- V0-MEM-GOV-01: cold-entity eviction (@bv.event(cold_after=...)) for time-bounded "home"
- V0-MEM-GOV-02: BoundedByConfig lifetime-aggregation contract
- Phase 12.9 SUMMARY: AggOp::DistanceFromHome boxing context
- pipeline-dsl/compilation-rules.md: chain compilation rules