router/v1 — Intent Classification + Sink Routing¶
Status: Draft (design-locked, ready for first implementation) · Stability: v1 will be frozen with the first reference router (router-v1) · Implementations: in-tree only until v1 is frozen.
The router/v1 surface sits between asr/v1 (which produces transcribed
text in partially-filled IntentEnvelopes) and the orchestrator (which fans
envelopes out to sinks per sink/v1). The router does two jobs:
- Classify the envelope's intent (Intent.Kind, Intent.Confidence, optional Intent.Reasoning).
- Route the envelope to sinks (Routing.PrimarySink, Routing.AlsoTo, Routing.Suppress).
Optionally, the router also derives envelopes — splitting compound utterances, rewriting transcripts via coreference, emitting session summaries, and dropping noise.
This document is the contract. Routers that conform to it can be loaded
by any version of the Vox core that supports router/v1.
Scope¶
router/v1 covers:
- The router's input contract (partially-filled IntentEnvelope)
- The classifier chain (pluggable: rules / local-model / LLM)
- Per-stream history (shallow context window)
- Routing rule format (declarative table + per-source overrides)
- Suppress patterns (kill-switch list)
- Bundled routing presets
- Derived envelope rules (splitting, coreference, summaries)
- Drop rules
- Provenance stamping for derived envelopes
- Router interface lifecycle
- Error semantics
- Audit hook contract
- Versioning and stability rules
router/v1 does not cover:
- Audio capture or transcription (capture/v1, asr/v1)
- Speaker diarization (segment/v1)
- Sink delivery (sink/v1)
- The orchestrator that drives the pipeline (internal)
- Predicate-based dynamic routing rules (deferred to v1.x; additive)
- RBAC / authz checks on routing decisions (Enterprise: authz/v1)
Input — what the router consumes¶
The router receives a partially-filled IntentEnvelope (full schema in
sink-v1.md). When asr/v1 finishes transcribing a speech
segment, the envelope has:
- Populated: Identity (EnvelopeID, SessionID, StreamID, ParentID), Time span (StartedAt, EndedAt, Duration), Content (Transcript, Language, Confidence), Speaker (Label, SourceKind, optional Embedding), Provenance (ASRBackend, SegmenterImpl, CapturedAt, Pipeline), optional AudioRef.
- Empty: Intent block, Routing block.
The router fills Intent and Routing, then emits the completed envelope.
Per-stream history¶
The router maintains a per-stream sliding window of recent envelopes for coreference and short-term context.
router:
  history_depth: 1   # default: just the previous envelope
                     # range: 0 (stateless) to 20
- Per-stream, not per-session — different streams (your mic vs system audio) keep independent histories.
- Read-only from the classifier's perspective. The router never mutates past envelopes; if it needs to derive new content from past context, it emits a new envelope with ParentID set.
- Stateless mode (history_depth: 0) supported for deterministic-replay and diagnostic use cases.
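The window behaviour described above can be sketched as a bounded queue per stream. This is a minimal illustration, not the reference implementation; the class name `StreamHistory` and its methods are hypothetical.

```python
from collections import defaultdict, deque


class StreamHistory:
    """Per-stream sliding window of recent envelopes (hypothetical helper)."""

    def __init__(self, depth: int = 1):
        # The contract allows history_depth between 0 (stateless) and 20.
        if not 0 <= depth <= 20:
            raise ValueError("history_depth must be in 0..20")
        self.depth = depth
        # One bounded deque per StreamID; old entries fall off automatically.
        self._windows = defaultdict(lambda: deque(maxlen=depth))

    def recent(self, stream_id):
        # Return an immutable snapshot (oldest first) so classifiers
        # cannot mutate past envelopes.
        return tuple(self._windows[stream_id])

    def push(self, stream_id, envelope):
        # With depth 0 the deque has maxlen 0 and silently keeps nothing.
        self._windows[stream_id].append(envelope)
```

With `depth=0` every call to `recent` returns an empty tuple, which is what makes replay deterministic.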
Classifier Chain¶
The router doesn't classify directly. It dispatches to classifiers in a
configurable chain. Each classifier returns {intent_kind, confidence,
reasoning} or "abstain".
Three classifier families (v1)¶
| Classifier | Cost | Latency budget | Determinism | Default in chain? |
|---|---|---|---|---|
| rules | Free | 10 ms | Yes | Yes |
| local-model | Free | 500 ms | Mostly | Yes |
| llm | Tokens | 5 s | No | No (opt-in) |
Chain semantics¶
- Classifiers run in declared order.
- Each classifier returns confidence ∈ [0.0, 1.0] or abstains.
- First classifier with confidence ≥ min_confidence_threshold wins; chain stops.
- If all classifiers abstain or fall below threshold → Intent.Kind = unclassified → routes per routes.unclassified (fallback).
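A minimal sketch of these chain semantics, assuming each classifier is a plain callable that returns `None` to abstain. The names `Result` and `run_chain` are hypothetical and not part of the contract.

```python
from typing import Callable, NamedTuple, Optional


class Result(NamedTuple):
    intent_kind: str
    confidence: float
    reasoning: str = ""


# A classifier takes the transcript and returns a Result, or None to abstain.
Classifier = Callable[[str], Optional[Result]]


def run_chain(transcript, chain, min_confidence_threshold=0.7):
    """Run classifiers in declared order; the first confident answer wins."""
    for classify in chain:
        result = classify(transcript)
        if result is not None and result.confidence >= min_confidence_threshold:
            return result  # chain stops here
    # Everyone abstained or fell below the threshold.
    return Result("unclassified", 0.0)
```

A result below the threshold is treated the same as an abstain: the chain simply moves on to the next classifier.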
Configuration¶
router:
  history_depth: 1
  min_confidence_threshold: 0.7
  classifier_chain:
    - type: rules
      patterns_file: ~/.vox/router/rules.yaml
    - type: local-model
      model_path: ~/.vox/router/intent-classifier.onnx
    # llm classifier omitted by default (opt-in)
    # - type: llm
    #   sink: llm-anthropic   # reuse an existing LLM sink in "classify" mode
    #   system_prompt_template: ~/.vox/router/classify.tmpl
rules classifier¶
Lightweight regex / keyword / verb-pattern matching. The file format:
rules:
  - intent: command
    patterns:
      - "^(create|make|add|file|open)\\s+(a |an )?(bd |beads |task |issue |ticket)"
      - "^(remind me to|todo|action item:)"
    confidence: 0.85
  - intent: question
    patterns:
      - "\\?$"
      - "^(what|why|how|when|where|who|can you|could you|would you|do you|does)"
    confidence: 0.75
  - intent: prompt
    patterns:
      - "^(write|generate|draft|compose|tell me)"
    confidence: 0.7
  - intent: note
    patterns:
      - "^(note:|fyi:|just noting|btw)"
    confidence: 0.8
Bundled defaults ship with Vox; users override per repo or per user.
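As an illustration of how such a rules file might be evaluated, the sketch below hard-codes two of the rules above. It assumes case-insensitive matching on the trimmed transcript and that the first matching rule wins; neither detail is stated by this contract, and `classify_rules` is a hypothetical name.

```python
import re

# Two rules copied from the file format above (YAML escapes removed).
RULES = [
    {
        "intent": "command",
        "confidence": 0.85,
        "patterns": [
            r"^(create|make|add|file|open)\s+(a |an )?(bd |beads |task |issue |ticket)",
            r"^(remind me to|todo|action item:)",
        ],
    },
    {
        "intent": "question",
        "confidence": 0.75,
        "patterns": [
            r"\?$",
            r"^(what|why|how|when|where|who|can you|could you|would you|do you|does)",
        ],
    },
]


def classify_rules(transcript, rules=RULES):
    """Return (intent, confidence) for the first matching rule, else None."""
    text = transcript.strip()
    for rule in rules:
        # Assumption: patterns are matched case-insensitively.
        if any(re.search(p, text, re.IGNORECASE) for p in rule["patterns"]):
            return rule["intent"], rule["confidence"]
    return None  # abstain; the next classifier in the chain gets a turn
```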
local-model classifier¶
A small ONNX text classifier (DistilBERT-class), fetchable via
vox model download intent-classifier-v1. Not bundled (license + size).
Custom models welcome with the same input/output signature:
- Input: UTF-8 transcript text
- Output: one of the 9 IntentKind values + a confidence float
llm classifier¶
Reuses an existing LLM sink in "classify" mode. The router sends a
structured prompt; the LLM returns JSON with intent_kind + confidence +
reasoning. Reasoning is captured in Intent.Reasoning for audit.
Off by default because every utterance becomes an LLM call → cost + latency + offline-story breakage. Users with budget and quality requirements enable it as the chain's terminal classifier.
Latency budgets¶
| Classifier | Default budget | On overrun |
|---|---|---|
| rules | 10 ms | Hard cap; abort classifier |
| local-model | 500 ms | Skipped if budget exhausted |
| llm | 5 s | User-tuned; configurable |
| End-to-end router | 600 ms with rules+local-model | router.latency_exceeded telemetry event |
Routing Rule Format¶
The v1 routing table¶
Declarative intent_kind → {primary, also_to} with optional by_source
override.
router:
  routes:
    prompt:
      primary: llm-anthropic
      also_to: [s3]
    command:
      primary: llm-anthropic
      also_to: [bd, s3]
    todo:
      primary: bd
      also_to: [s3, ox-ledger]
    note:
      primary: ox-ledger
      also_to: [s3]
    question:
      primary: llm-anthropic
      also_to: [s3]
    summary:
      primary: email-smtp
      also_to: [s3, ox-ledger]
    raw_transcript:
      primary: s3
      also_to: [ox-ledger]
    llm_response:
      primary: s3
      also_to: [email-smtp]
    unclassified:
      primary: local-file        # fallback when classifier gives up
      also_to: []

  # Per-source-kind overrides (sparse: only entries that differ from default)
  by_source:
    online:
      summary:
        primary: email-smtp
        also_to: [s3, ox-ledger, bd]   # meetings extract action items into bd
      todo:
        primary: bd
        also_to: [s3, ox-ledger, email-smtp]
    self:
      prompt:
        primary: llm-anthropic
        also_to: []                    # don't archive every dictation prompt
    file:
      raw_transcript:
        primary: local-file
        also_to: []
Lookup order¶
- by_source[envelope.Speaker.SourceKind][envelope.Intent.Kind]: most specific
- routes[envelope.Intent.Kind]: default for that intent
- routes.unclassified: terminal fallback
Result populates Routing.PrimarySink and Routing.AlsoTo. Sinks then
apply their own filter blocks (locked in sink/v1) — both layers must
pass for actual delivery.
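The lookup order can be sketched as a three-candidate search over plain dictionaries. This is an illustration under the assumption that the tables have already been parsed from YAML; `resolve_route` is a hypothetical name.

```python
def resolve_route(routes, by_source, source_kind, intent_kind):
    """Return (primary, also_to) using the most specific matching entry."""
    candidates = (
        by_source.get(source_kind, {}).get(intent_kind),  # per-source override
        routes.get(intent_kind),                          # default for intent
        routes.get("unclassified"),                       # terminal fallback
    )
    for route in candidates:
        if route is not None:
            return route["primary"], list(route.get("also_to", []))
    raise KeyError("routes.unclassified is missing from the routing table")


routes = {
    "prompt": {"primary": "llm-anthropic", "also_to": ["s3"]},
    "unclassified": {"primary": "local-file", "also_to": []},
}
by_source = {"self": {"prompt": {"primary": "llm-anthropic", "also_to": []}}}
```

With these tables, a `prompt` from the `self` source resolves to the override (no archive), a `prompt` from any other source uses the default route, and an intent with no entry falls through to `local-file`.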
Suppress Patterns — The Safety Net¶
Pattern-based opt-outs that go straight into Routing.Suppress. Additive to
any Suppress entries from upstream stages.
router:
  suppress:
    - name: private-marker
      pattern: "(?i)\\bprivate\\b"
      sinks: [s3, ox-ledger, email-smtp]   # mentioned "private": keep off shared surfaces
    - name: secret-detection
      pattern: "(?i)\\b(password|secret|api[_\\s]?key|token|bearer\\s)\\b"
      sinks: [s3, ox-ledger, email-smtp, llm-anthropic, llm-openai, llm-google]
      # secrets stay local-only
    - name: pii-email-fallback
      pattern: "[\\w.+-]+@[\\w-]+\\.[\\w.-]+"
      sinks: [llm-google]                  # example: never send emails to a specific LLM
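A sketch of how suppress patterns might be applied, assuming the patterns are evaluated with a standard regex search over the transcript and merged into any upstream Suppress list without duplicates. `apply_suppress` is a hypothetical helper, and the two rules are copied from the example above.

```python
import re

SUPPRESS = [
    {
        "name": "private-marker",
        "pattern": r"(?i)\bprivate\b",
        "sinks": ["s3", "ox-ledger", "email-smtp"],
    },
    {
        "name": "secret-detection",
        "pattern": r"(?i)\b(password|secret|api[_\s]?key|token|bearer\s)\b",
        "sinks": ["s3", "ox-ledger", "email-smtp",
                  "llm-anthropic", "llm-openai", "llm-google"],
    },
]


def apply_suppress(transcript, upstream_suppress, rules=SUPPRESS):
    """Return the upstream Suppress list plus sinks from every matching rule."""
    suppressed = list(upstream_suppress)  # additive: never drop upstream entries
    for rule in rules:
        if re.search(rule["pattern"], transcript):
            for sink in rule["sinks"]:
                if sink not in suppressed:
                    suppressed.append(sink)
    return suppressed
```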
Defaults¶
Two patterns ship enabled by default — table-stakes safety:
- private-marker: "private" keyword → off shared surfaces
- secret-detection: common secret patterns → no network sinks
Disable via router.suppress.enabled: false (not recommended).
Bundled Presets¶
Most users won't write a routing table from scratch. They pick a preset and tweak.
router:
  preset: meetings-and-dictation   # bundled preset name
  overrides:                       # applied on top of the preset
    summary:
      also_to: [s3, ox-ledger, my-custom-sink]
The six bundled presets¶
| Preset | Primary use case | Key routing characteristic |
|---|---|---|
| meetings-and-dictation (default) | Mixed: meetings + dictation + calls | Balanced; all intent_kinds routed sensibly |
| dictation-only | Voice memos, prompt composition | Self source emphasized; commands → bd; prompts → LLM; minimal archive |
| meeting-capture | Recording meetings (in-person + online) | online + in-person emphasized; summary → email; action items → bd; full archive |
| archive-everything | Compliance / over-record posture | Every envelope to s3 + ox-ledger regardless of intent |
| local-only | No-network / offline / paranoid | All sinks are local-file or bd; no LLM, no email, no S3, no ox |
| llm-heavy | Power user with budget | LLM in classification chain enabled; LLM primary for most intents; full archive |
Presets live at internal/router/presets/<name>.yaml. vox router presets
lists them; vox router preset show <name> prints the resolved config.
Copy and edit:
vox router preset show meetings-and-dictation > ~/.vox/router.yaml
Derived Envelopes¶
The router can emit zero, one, or many envelopes per input. Four derivation modes:
(a) Splitting compound utterances — opt-in¶
Compound utterances ("send Sarah an email and create a bd issue") become multiple envelopes. The LLM classifier handles segmentation; rules and local-model can't reliably detect compounds.
router:
  splitting:
    enabled: false                 # opt-in; off by default
    classifier: llm                # LLM is required for reliable splitting
    min_segment_confidence: 0.75
When enabled, each output segment becomes its own envelope:
- New EnvelopeID for each segment
- ParentID = <original envelope's EnvelopeID> (audit linkage)
- Same SessionID, StreamID, StartedAt, EndedAt, Speaker, Provenance as the parent
- Transcript = <segment text>
- Intent.Kind = <segment kind>
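The field rules above can be sketched with a cut-down envelope type. The `Envelope` dataclass below is a hypothetical stand-in carrying only a few of the real IntentEnvelope fields, and `split_envelope` assumes the splitting classifier has already produced `(text, intent_kind)` pairs.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Envelope:
    # Hypothetical subset of IntentEnvelope, for illustration only.
    envelope_id: str
    parent_id: str
    session_id: str
    stream_id: str
    transcript: str
    intent_kind: str = ""


def split_envelope(parent, segments):
    """Derive one child envelope per (text, intent_kind) segment."""
    return [
        replace(
            parent,                              # inherit session, stream, etc.
            envelope_id=str(uuid.uuid4()),       # fresh ID per segment
            parent_id=parent.envelope_id,        # audit linkage to the original
            transcript=text,
            intent_kind=kind,
        )
        for text, kind in segments
    ]
```

Because the parent is frozen and `replace` copies it, the original envelope is never mutated, which matches the read-only history rule.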
(b) Coreference resolution¶
If the current transcript contains a pronominal reference (it, that,
this, one, the same, do so, them) AND the previous envelope is
recent enough, the router rewrites the transcript.
router:
  coreference:
    enabled: true              # default ON; basic mode is cheap and useful
    mode: prepend-previous     # prepend-previous | rewrite-llm | off
    pronouns: [it, that, this, one, the same, "do so", "them"]
    max_context_chars: 200
    require_short_gap: 30s
prepend-previous (default): rewrite by appending the previous
transcript as parenthetical context.
Original: "do that for next Tuesday"
Previous: "remind me to email Sarah"
Rewritten: "do that for next Tuesday (referring to: 'remind me to email Sarah')"
Dumb but reliable. Transparent (downstream can see what was done). Zero LLM cost.
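A sketch of the prepend-previous rewrite, reproducing the example above. It assumes whole-word, case-insensitive pronoun matching and that max_context_chars truncates the previous transcript; both are guesses at details this section leaves open, and `prepend_previous` is a hypothetical name.

```python
import re

PRONOUNS = ["it", "that", "this", "one", "the same", "do so", "them"]


def prepend_previous(current, previous, gap_seconds,
                     max_context_chars=200, max_gap_seconds=30.0):
    """Return (transcript, applied) after the basic coreference rewrite."""
    # No history (stateless mode) or the previous envelope is too old.
    if not previous or gap_seconds > max_gap_seconds:
        return current, False
    pattern = r"\b(" + "|".join(re.escape(p) for p in PRONOUNS) + r")\b"
    if not re.search(pattern, current, re.IGNORECASE):
        return current, False
    # Assumption: the context is cut to max_context_chars characters.
    context = previous[:max_context_chars]
    return f"{current} (referring to: '{context}')", True
```

The boolean result corresponds to the Custom["router.coreference_applied"] stamp.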
rewrite-llm (opt-in): sends current + previous to an LLM with a
"rewrite to be self-contained" prompt. More accurate, costs tokens.
When coreference applies, Custom["router.coreference_applied"] = true
is stamped on the envelope.
(c) Summary triggers — opt-in¶
The router accumulates envelopes per session and emits derived summary
envelopes on:
| Trigger | Default | Configurable |
|---|---|---|
| Session end (orchestrator emits session.ended event) | Always when summaries enabled | triggers.on_session_end: bool |
| Idle timeout | 10 minutes | triggers.idle_timeout: <duration> |
| Envelope count threshold | Off | triggers.envelope_threshold: <int> |
| Explicit command (vox session summarize) | Always available | n/a |
Two summary content modes:
| Mode | Behavior | Use case |
|---|---|---|
| concatenate (default) | Transcript = newline-joined envelopes in time order, with speaker labels prefixed | Cheap; works offline; raw transcript IS the summary |
| llm | Route the concatenated transcript through a designated LLM sink in summarize mode; LLM response becomes the summary envelope's transcript | Higher quality; costs tokens; needs LLM sink configured |
router:
  summaries:
    enabled: true
    mode: concatenate            # concatenate | llm
    llm_sink: llm-anthropic      # required if mode: llm
    llm_prompt_template: ~/.vox/router/summarize.tmpl
    triggers:
      on_session_end: true
      idle_timeout: 10m
      envelope_threshold: 0
The emitted summary envelope:
- Intent.Kind = summary
- ParentID = "" (derived from many envelopes; use Custom["router.source_envelope_ids"] to list them)
- SessionID matches the session being summarized
- StreamID = "" (multi-stream summary)
- StartedAt / EndedAt span the entire session
- Provenance.RouterImpl = "router-v1"
- Custom["router.summary_mode"] = "concatenate" or "llm"
Routes through the normal routing table (default summary route:
email-smtp + s3 + ox-ledger).
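A sketch of the concatenate mode, using plain dictionaries with hypothetical lowercase keys in place of real envelopes. The `speaker: text` line format is an assumption, and the time span is taken from the accumulated envelopes rather than from session metadata.

```python
def concatenate_summary(envelopes):
    """Build a summary envelope from one session's accumulated envelopes."""
    ordered = sorted(envelopes, key=lambda e: e["started_at"])  # time order
    return {
        "intent_kind": "summary",
        "parent_id": "",                      # many-to-one: no single parent
        "stream_id": "",                      # multi-stream summary
        "session_id": ordered[0]["session_id"],
        "started_at": ordered[0]["started_at"],
        "ended_at": max(e["ended_at"] for e in ordered),
        # Assumption: each line is "<speaker label>: <transcript>".
        "transcript": "\n".join(
            f'{e["speaker"]}: {e["transcript"]}' for e in ordered
        ),
        "custom": {
            "router.derivation": "summary",
            "router.summary_mode": "concatenate",
            "router.source_envelope_ids": ",".join(
                e["envelope_id"] for e in ordered
            ),
        },
    }
```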
(d) Drop semantics¶
Configurable drop_rules — envelopes matching are dropped entirely, not
routed anywhere.
router:
  drop_rules:
    - intent_kind: unclassified
      max_confidence: 0.3        # drop ONLY if classifier was REALLY uncertain
    - intent_kind: raw_transcript
      always: true               # drop all raw transcripts (rare config)
    - max_transcript_chars: 5    # drop anything 5 chars or less
  log_dropped: true              # log at DEBUG; counter always increments
Default drop rules:
- unclassified with confidence < 0.3: filler ("uh", "hmm")
- Any transcript ≤ 5 chars
Dropped envelopes increment router.dropped counter and emit an
audit/v1 event when audit is loaded.
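A sketch of drop-rule evaluation over the two default rules. It assumes that the conditions inside one rule are ANDed, that any matching rule drops the envelope, and that the confidence bound is exclusive (following the "< 0.3" wording of the defaults) while the length bound is inclusive. `should_drop` is a hypothetical name.

```python
DEFAULT_DROP_RULES = [
    {"intent_kind": "unclassified", "max_confidence": 0.3},
    {"max_transcript_chars": 5},
]


def should_drop(intent_kind, confidence, transcript, rules=DEFAULT_DROP_RULES):
    """Return True if any rule matches; a rule matches when all its keys do."""
    for rule in rules:
        if "intent_kind" in rule and rule["intent_kind"] != intent_kind:
            continue
        # Assumption: exclusive bound, per "confidence < 0.3" in the defaults.
        if "max_confidence" in rule and confidence >= rule["max_confidence"]:
            continue
        # Inclusive bound: "drop anything 5 chars or less".
        if ("max_transcript_chars" in rule
                and len(transcript) > rule["max_transcript_chars"]):
            continue
        # A rule with `always: true` adds no further condition.
        return True
    return False
```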
Derived Envelope Provenance¶
Every router-produced envelope (split, coreference-rewritten, summary) carries uniform provenance stamping:
| Field | Value |
|---|---|
| EnvelopeID | new UUID |
| ParentID | source envelope's EnvelopeID, or "" for many-to-one (summary) |
| Custom["router.source_envelope_ids"] | for summaries: comma-separated list of contributing envelope IDs |
| Custom["router.derivation"] | "split" / "coreference" / "summary" |
| Provenance.RouterImpl | the router's identifier (e.g., "router-v1") |
| Custom["router.summary_mode"] | populated for summary envelopes |
| Custom["router.coreference_applied"] | true if coreference rewrote the transcript |
An auditor can reconstruct: "this summary covers these 47 envelopes from session X; this envelope had its transcript rewritten by coreference against envelope Y."
Router Interface¶
Router {
    # Identity
    Name() -> string
    Capabilities() -> Capabilities

    # Lifecycle
    Open(config) -> Error
    Close() -> Error                  # drains internal state with timeout

    # Hot path
    Route(ctx, envelope) -> []IntentEnvelope | RouterError
    # Returns zero or more output envelopes:
    #   - zero → envelope dropped (drop_rules matched)
    #   - one  → standard 1:1 classification
    #   - many → splitting fired (compound utterance)

    # Asynchronous emission (summaries, late LLM classifications)
    Emissions() -> <-chan IntentEnvelope

    # Session lifecycle awareness (for summary triggers)
    OnSessionEvent(event)             # "session.started" | "session.ended" | "session.idle"

    # Diagnostics
    Stats() -> Stats                  # routed_in, routed_out, classifier_hits,
                                      # latency_p50/p99, drops, errors
    Health() -> Health
}

Capabilities {
    SupportedClassifiers []string     # "rules" | "local-model" | "llm"
    SupportsSplitting    bool
    SupportsCoreference  bool
    SupportsSummaries    bool
    MaxHistoryDepth      uint32
}
Orchestrator loop (informative)¶
- envelope = asr.NextEnvelope()
- outputs, err = router.Route(ctx, envelope)
- For each output in outputs: orchestrator sends to sinks per output.Routing
- Concurrently, orchestrator drains router.Emissions() for async derived envelopes (summaries that arrive after a session-end trigger)
- Orchestrator forwards session lifecycle events (OnSessionEvent) so the router can fire summary triggers
One router per pipeline¶
router/v1 does NOT support a chain of routers. Composition happens
inside the router via the classifier chain. One router instance,
configured via classifier chain + routing table + summary settings.
Reasoning:
- Chain of routers would duplicate speaker resolution, history bookkeeping, summary accumulation
- A single router with internal composition is simpler to debug
- Users with exotic needs can write a wrapper router that delegates internally
Error Model¶
Typed errors, mirroring capture/v1 + sink/v1:
RouterError {
    Kind    RouterErrorKind
    Stage   string            # "classifier:rules" | "classifier:llm" |
                              # "split" | "coreference" | "summary"
    Message string
    Cause   Error?
}

RouterErrorKind {
    ErrClassifierUnavailable  # specific classifier down
    ErrAllClassifiersFailed   # entire chain abstained or errored
    ErrInvalidEnvelope        # input envelope failed validation
    ErrInternal               # bug
    ErrBudgetExceeded         # latency budget hit
}
Failure handling¶
| Failure | Action |
|---|---|
| Classifier crashes / panics | Router-side recover; mark classifier unhealthy; take out of rotation for health_recovery_interval (60s default); continue with remaining classifiers |
| All classifiers abstain or below threshold | Intent.Kind = unclassified, Intent.Confidence = 0, route per routes.unclassified (default fallback: local-file) |
| Classifier returns invalid IntentKind | Treat as abstain; log structured warning; classifier stays in rotation; counter increments |
| Splitting LLM call fails | Emit the original envelope unchanged (single output); log warning |
| Coreference LLM rewrite fails | Fall through to prepend-previous (or skip if that's also failing); envelope continues |
| Summary generation (llm mode) fails | Fall back to concatenate mode and emit; orchestrator gets a summary envelope regardless |
| Router itself crashes | Orchestrator-side recover; mid-route envelopes dropped (logged + audit event); router restarted |
| Route() exceeds budget | Return whatever the chain produced so far; emit router.latency_exceeded event; count budget violation |
Key principle: the router never blocks the pipeline. Worst case is
Intent.Kind = unclassified routing to fallback — never a hung pipeline.
Route() returns output envelopes AND optionally a RouterError for
telemetry. The error is informational, never fatal — for stats and
audit only.
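The "classifier crashes" row of the failure table can be sketched as a wrapper that turns an exception into an abstain and keeps the classifier out of rotation for the recovery interval. `GuardedClassifier` is a hypothetical name; the injectable `clock` exists only to make the behaviour easy to exercise.

```python
import time


class GuardedClassifier:
    """Wrap a classifier so a crash never propagates into the chain."""

    def __init__(self, classify, recovery_interval=60.0, clock=time.monotonic):
        self._classify = classify
        self._interval = recovery_interval   # health_recovery_interval
        self._clock = clock
        self._unhealthy_until = float("-inf")

    def __call__(self, transcript):
        if self._clock() < self._unhealthy_until:
            return None  # out of rotation: behaves like an abstain
        try:
            return self._classify(transcript)
        except Exception:
            # Mark unhealthy; remaining classifiers in the chain still run.
            self._unhealthy_until = self._clock() + self._interval
            return None
```

Since the wrapper only ever returns a result or `None`, the chain keeps moving and the worst case remains an unclassified envelope routed to the fallback.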
Audit Hook¶
When audit/v1 is loaded, the router MUST emit a RouterDecisionEvent
for every routing decision:
RouterDecisionEvent {
    Timestamp        Timestamp
    EnvelopeID       string
    ParentID         string?
    InputTranscript  string    # transcript as classified (may differ from output if coreference rewrote)
    OutputTranscript string    # final transcript on the output envelope
    Intent {
        Kind       IntentKind
        Confidence float
        Reasoning  string?
    }
    Routing {
        PrimarySink string
        AlsoTo      []string
        Suppress    []string
    }
    ClassifierChain []ClassifierStep   # which classifiers fired, in order
    Derivation      string?            # "split" | "coreference" | "summary" | null
    LatencyMS       uint32
}

ClassifierStep {
    Name       string    # "rules" | "local-model" | "llm-anthropic"
    Confidence float
    Abstained  bool
    LatencyMS  uint32
    Reasoning  string?
}
Routing decisions are the most consequential trust boundary in Vox — they determine where your voice content ends up. An auditor reconstructing "why did my password show up in an S3 bucket" needs exactly this data.
When audit/v1 is NOT loaded, the router emits the same data as
structured logs at DEBUG level (configurable to INFO via
router.log_decisions_at: info).
Versioning and Stability¶
router/v1 is the contract above. Once frozen:
- Non-breaking changes (allowed in v1.x): adding optional fields to Capabilities / Stats / configuration; adding new classifier types; adding new presets; adding new drop_rules options; adding the deferred predicate-based extra_rules (with CEL or similar) as additive opt-in.
- Breaking changes (require v2): changing the routing-table lookup semantics; changing the Route() signature; changing how splitting / coreference / summary derivations work in a non-additive way; removing or repurposing any existing field.
The core supports one vN of router/ at a time, with overlap during
migrations.
v1.x additive features (shipped)¶
Filler-word removal¶
After the classifier sets Intent, the router optionally strips common verbal
noise tokens ("um", "uh", "like" as discourse marker, "you know", etc.) from
the transcript before it reaches sinks. Classification always runs on the raw
transcript; filler removal is a post-classification pass.
router:
  filler_removal_enabled: true   # default: true; set false to disable
  filler_removal_words: []       # override the default list (empty = use built-in list)
Default filler list: um, uh, er, ah, hmm, like, you know, sort of, kind of,
basically, literally, actually, well, so, right, okay.
When fillers are stripped, three provenance keys are stamped on the envelope:
- router.filler_removed = true
- router.filler_count = <int> — number of distinct filler tokens removed
- router.original_transcript = <string> — the raw pre-removal transcript
If the entire utterance consists of fillers (fewer than 3 non-whitespace characters would remain), removal is skipped and the original is preserved.
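A deliberately naive sketch of the filler pass. It uses only an unambiguous subset of the default list, because telling "like" the discourse marker from "like" the verb needs more than a regex. The trailing-punctuation handling is an assumption, and `remove_fillers` is a hypothetical name.

```python
import re

# Unambiguous subset of the default list; multi-word entries come first.
FILLERS = ["you know", "um", "uh", "er", "ah", "hmm"]


def remove_fillers(transcript, fillers=FILLERS):
    """Return (transcript, provenance) after stripping whole-word fillers."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(f) for f in fillers) + r")\b[,.]?\s*",
        re.IGNORECASE,
    )
    removed = [m.group(1).lower() for m in pattern.finditer(transcript)]
    cleaned = pattern.sub("", transcript).strip()
    # Skip when nothing matched, or when fewer than 3 non-whitespace
    # characters would remain (the utterance was all filler).
    if not removed or len(re.sub(r"\s", "", cleaned)) < 3:
        return transcript, {}
    return cleaned, {
        "router.filler_removed": True,
        "router.filler_count": len(set(removed)),  # distinct tokens removed
        "router.original_transcript": transcript,
    }
```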
Snippet templates (voice text-expander)¶
A pre-classification stage replaces a matched trigger phrase with a configured expansion, then lets normal classification run on the expanded text.
router:
  snippets:
    "vox slack ack": "Got it, will follow up shortly."
    "vox standup status": "Yesterday: shipped X. Today: working on Y. Blockers: none."
    "vox sign off email": "Best, Jay"
Matching is case- and punctuation-insensitive. When a snippet fires, three
provenance keys are stamped:
- router.snippet_expanded = true
- router.snippet_trigger = <original transcript> — what the user said
- router.snippet_match = <normalized key> — the matched trigger
Snippets default to empty (disabled). Configure via router.snippets in the
config map.
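A sketch of snippet matching, assuming the trigger must match the whole utterance after lowercasing, stripping punctuation, and collapsing whitespace. The contract says only that matching is case- and punctuation-insensitive, so the whole-utterance rule and the names `normalize` and `expand_snippet` are assumptions.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, collapse runs of whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def expand_snippet(transcript, snippets):
    """Return (transcript, provenance); expands only on a full-utterance match."""
    table = {normalize(trigger): text for trigger, text in snippets.items()}
    key = normalize(transcript)
    if key not in table:
        return transcript, {}
    return table[key], {
        "router.snippet_expanded": True,
        "router.snippet_trigger": transcript,  # what the user said
        "router.snippet_match": key,           # the normalized trigger
    }
```

The expanded text then goes through normal classification, so an expansion such as "Got it, will follow up shortly." is routed like any other utterance.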
Predicate-based extra_rules (deferred to v1.x)¶
router:
  extra_rules:                   # opt-in; evaluated AFTER routes + by_source
    - when: "Intent.Kind == 'command' && contains(Transcript, 'urgent')"
      add_to: [email-smtp]       # additive, not replacement
    - when: "Speaker.Label == 'cto'"
      add_to: [s3-cto-archive]
Why not in v1:
- Requires a sandboxed expression engine (likely CEL)
- Adds a debugging dimension ("why didn't my rule fire?") that hurts
predictability
- Most use cases are already covered by by_source overrides + suppress
patterns
Path to v1.x:
- Add CEL or equivalent as the expression engine
- Gate behind router.extra_rules.enabled flag
- Document the expression DSL with examples
- Ship as additive — no breaking change to v1's table-based config
Reference Implementation¶
One built-in router ships in v1: router-v1 (aka default). It
implements everything above.
Users can swap it for a custom router via router.implementation: <name>,
but in practice the built-in covers the entire design space described in this document.
Enterprise: router-rbac-audit (planned)¶
Same router/v1 contract, adds:
- RBAC checks before allowing certain routes (e.g., "envelopes from speakers without compliance:read cannot route to LLM sinks")
- Immutable audit storage integration (authz/v1 + enterprise audit store)
- Per-team / per-user routing policy overrides
Lives in the enterprise repo. Loaded via the same registry mechanism; swap-in is a config change.
Project Principle: Opinionated Defaults, Every Default Configurable¶
This contract continues the principle from capture/v1 and sink/v1.
Every behavior with a defensible default (min_confidence_threshold: 0.7,
history_depth: 1, coreference.mode: prepend-previous,
summaries.mode: concatenate, the meetings-and-dictation preset, etc.)
is exposed as a config knob. Defaults reflect a considered recommendation
for the typical voice-to-LLM use case; the knobs exist so specialized
workflows can tune them.