sink/v1 — Intent Envelope Output Contract¶
Status: Draft (design-locked, ready for first implementation) · Stability: v1 will be frozen with the first reference sinks (local-file, llm-anthropic) · Implementations: in-tree only until v1 is frozen.
The sink/v1 surface is the exit of every Vox pipeline. A sink is
anywhere a structured intent envelope ends up: a Large Language Model with
the user's own API key, an S3-compatible bucket, an email summary, a bd
task, a local JSONL file. Sinks share a single base contract; each sink
type layers domain-specific behavior on top.
This is the contract that captures the entire purpose of the system. The
intent envelope schema and the sink interface together define what
everything upstream (capture → segment → asr → router) must produce.
Scope¶
sink/v1 covers:
- The intent envelope schema — the unit of data flowing into sinks
- The base sink interface — lifecycle, write path, completion reporting, error model
- The multi-sink orchestration model — fan-out semantics, routing, and failure isolation
- Reference contracts for the six built-in sink families: LLM (BYOK), S3-compatible storage, email (SMTP + transactional), local file, ox-ledger, bd
- The sink registry — registration mechanism shared between open and enterprise sinks
sink/v1 does not cover:
- How envelopes are produced (segment/v1, asr/v1, router/v1)
- Tool execution after an LLM tool-call response (separate tools/v1 surface)
- Identity / RBAC (enterprise: authz/v1)
- Audit storage details (audit/v1; sinks emit audit events but don't manage the audit store)
The Intent Envelope¶
The intent envelope is the load-bearing data structure of Vox. Every upstream stage produces them; every sink consumes them.
IntentEnvelope {
# Identity
EnvelopeID string # uuid4
SessionID string # matches the capture-side SessionID; groups envelopes by session
StreamID string # the capture stream this envelope came from
ParentID string? # optional; for envelopes derived from another (LLM responses, summaries, re-routes)
# Time span
StartedAt Timestamp # wall-clock of the first audio sample this envelope covers
EndedAt Timestamp # wall-clock of the last sample
Duration Duration # convenience; derived
# Content
Transcript string # the words; UTF-8
Language string? # BCP-47 (e.g., "en-US"); optional if not detected
Confidence float # 0.0–1.0; ASR-side confidence
# Speaker
Speaker {
Label string # e.g., "self", "remote-1", "lapel-jane"; opaque to sinks
SourceKind SourceKind # "self" | "in-person" | "online" | "file"
Embedding []float? # optional; only present if diarization includes embeddings
}
# Intent classification (set by router)
Intent {
Kind IntentKind # "prompt" | "command" | "todo" | "note" | "question" |
# "summary" | "raw_transcript" | "llm_response" | "unclassified"
Confidence float
Reasoning string? # optional; classifier's rationale (for audit / debug)
}
# Routing metadata (set by router; consumed by orchestrator + sinks)
Routing {
PrimarySink string # which sink is this primarily for? (e.g., "llm-anthropic")
AlsoTo []string # fan-out sinks (e.g., ["s3", "bd"])
Suppress []string # explicit "do NOT send to" list (kill-switch for sensitive content)
}
# Provenance
Provenance {
ASRBackend string # e.g., "whisper-cpp", "deepgram", or "llm:anthropic:claude-opus-4-7" for derived envelopes
ASRVersion string? # backend version when known
SegmenterImpl string # which segment impl produced this span
RouterImpl string # which router classified it
CapturedAt Timestamp # original audio capture wall-clock (== StartedAt for non-derived envelopes)
Pipeline string # e.g., "vox/0.1.0"
}
# Optional payloads
AudioRef AudioRef? # optional; pointer to the audio for this span
Custom map<string, any> # adapter / sink-specific extensions; sinks ignore unknown keys
}
AudioRef {
Location string # "s3://bucket/key", "file:///path", "memory://session/stream/span"
Encoding Encoding # "f32" | "i16" | "wav" | "opus" | "flac"
SampleRate uint32?
Channels uint8?
Bytes []byte? # populated for "memory://" refs only
}
Design choices baked in¶
- Envelope, not "message" or "event" — these things ARE the unit of intent flowing through the system; the name reflects that. They carry transcript + intent + routing + provenance.
- Routing is part of the envelope. The router stamps routing decisions onto the envelope; the orchestrator and sinks read them. This is the alternative to a pub/sub topology where the router does fan-out itself. The envelope-carries-routing model lets sinks filter ("I only handle intent: todo") without the router needing to know every sink.
- Provenance is mandatory. When something goes wrong (bad transcript, miscategorized intent), the envelope itself carries who produced what. Critical for debugging and for the "tamper-evident audit log" enterprise feature.
- AudioRef is optional and decoupled. Sinks that don't need audio (LLM, email summary) never load it. Sinks that do (S3 archive) follow the ref. Keeps envelope size small in the hot path.
- Custom is the "additive growth" hatch. Sink-specific extensions that don't deserve top-level fields live here. Convention: namespace by sink name (anthropic.*, s3.*) to prevent collisions. Sinks ignore unknown keys.
IntentKind enum¶
| Kind | Meaning |
|---|---|
| prompt | A direct request to an LLM ("write me a haiku about cats") |
| command | An imperative addressed to the system or a tool ("create a bd issue for this") |
| todo | A self-noted action item ("remember to email Sarah about the deck") |
| note | An observation or thought worth capturing but not actionable |
| question | A question the user wants answered (possibly by an LLM) |
| summary | A summary span generated by an upstream stage |
| raw_transcript | An unclassified transcript chunk; the catch-all when the router has no high-confidence label |
| llm_response | A derived envelope created by an LLM sink containing the response to a prior envelope |
| unclassified | Router couldn't classify with sufficient confidence |
Schema versioning is implicit via the contract version. The sink/v1
contract defines this envelope shape. Future schema changes ship as v1.x
(additive — new optional fields, new IntentKind values, new Encoding values)
or v2 (breaking — removed or repurposed fields).
Base Sink Interface¶
Every sink — LLM, S3, email, local file, bd, and any community / enterprise sink — implements this:
Sink {
# Identity
Name() -> string
Capabilities() -> Capabilities
# Lifecycle
Open(config) -> Error
Close() -> Error # drains internal queue with timeout, then closes
# Hot path
Write(ctx, envelope) -> WriteResult
WriteBatch(ctx, envelopes) -> []WriteResult # optional; only if SupportsBatch
# Async results
Completions() -> <-chan Completion
# Diagnostics
Stats() -> Stats # accepted, rejected, delivered, retried, dead-lettered, queue_depth
Health() -> Health # ok | degraded | unhealthy + reason
}
Capabilities {
SupportsBatch bool
AcceptsIntentKinds []IntentKind # empty = all
RequiresAudio bool # true for sinks that need AudioRef populated
RequiresLLMResponse bool # true for sinks that only process derived llm_response envelopes
MaxQueueDepth uint32 # internal queue capacity (default 1000)
}
WriteResult {
Accepted bool # sink accepted the envelope into its internal pipeline
Completed bool # true when a sync sink finished the write inside Write; no Completion event follows
EnvelopeID string # for correlating completion events
RejectReason string? # if !Accepted ("queue_full" | "invalid_envelope" | "sink_down")
}
Completion {
EnvelopeID string
State CompletionState # delivered | retrying | permanently_failed | discarded
Attempt uint32 # 1-based; how many tries it took
CompletedAt Timestamp
Error Error? # populated when State != delivered
Detail map<string, any> # e.g., LLM response ID, S3 object key, message ID
}
Caller-side sync, sink-internal async¶
Write returns fast (typically microseconds). It only confirms the sink
accepted the envelope into its internal queue.
- Sync sinks (in-memory, local file, bd): set Completed: true in WriteResult and never emit a Completion event. Caller code that doesn't drain Completions() still works.
- Async sinks (LLM, S3, email): queue the envelope; do network work in background workers; emit a Completion event when final state is known.
Per-sink isolation is mandatory. A failing LLM sink MUST NOT prevent the S3 sink from receiving the same envelope. Each sink runs its own internal worker pool.
Backpressure¶
Each sink has a bounded internal queue (default 1000 envelopes — much higher than capture buffer because envelopes are infrequent at human-speech rate).
When the queue is full:
- Write returns Accepted: false, RejectReason: "queue_full".
- Orchestrator's per-sink on_accept_failure policy decides what's next:
skip (default — log and move on), dead_letter (route to dead-letter
sink), or halt (stop the orchestrator and alert).
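One common way to get this non-blocking behavior in Go is a buffered channel with a `select`/`default` send. The sketch below assumes that approach; `queueSink` and `newQueueSink` are hypothetical names, and the background workers that would drain the queue are omitted.

```go
package main

import "fmt"

type Envelope struct{ EnvelopeID string }

type WriteResult struct {
	Accepted     bool
	EnvelopeID   string
	RejectReason string
}

// queueSink uses a bounded buffered channel as its internal queue.
type queueSink struct{ queue chan Envelope }

func newQueueSink(depth int) *queueSink {
	return &queueSink{queue: make(chan Envelope, depth)}
}

// Write never blocks: a full queue yields a rejection instead of waiting.
func (s *queueSink) Write(env Envelope) WriteResult {
	select {
	case s.queue <- env:
		return WriteResult{Accepted: true, EnvelopeID: env.EnvelopeID}
	default:
		return WriteResult{EnvelopeID: env.EnvelopeID, RejectReason: "queue_full"}
	}
}

func main() {
	s := newQueueSink(2) // tiny depth so the rejection is easy to see
	for _, id := range []string{"a", "b", "c"} {
		fmt.Printf("%+v\n", s.Write(Envelope{EnvelopeID: id}))
	}
}
```

With a depth of 2 and no worker draining the channel, the third write is rejected with "queue_full"; the orchestrator's on_accept_failure policy would then take over.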
Error model¶
Same typed-error scheme as capture/v1:
Error {
Kind ErrorKind # see below
Sink string # sink name
Op string # which method was being called
Message string # human-readable, no PII
Cause Error? # optional wrapped cause
}
ErrorKind {
ErrInvalidConfig
ErrAuthFailed
ErrQuotaExceeded # rate-limit / quota
ErrInvalidEnvelope # envelope didn't pass sink-side validation
ErrSinkUnavailable # provider down, network unreachable
ErrPermissionDenied # sink-specific permission issue
ErrPersistent # logically permanent; no retry will help
ErrTransient # retry may help
ErrInternal
}
Retry policy defaults¶
The orchestrator and sinks together implement a typed retry policy. All values configurable per sink.
| Failure type | Default behavior |
|---|---|
| Transient (timeout, 5xx, connection reset) | Retry 5 times with exponential backoff (1s, 2s, 4s, 8s, 16s) |
| Auth (401, 403) | 1 retry (in case of transient auth glitch), then permanent fail |
| Quota / rate-limit (429) | Retry with Retry-After honored, up to 10 attempts |
| Permanent (400, 404, malformed envelope) | No retry; dead-letter immediately |
| Unknown | Treat as transient (5 retries with backoff) |
Dead-letter destinations¶
| Destination | Default? | Notes |
|---|---|---|
| log (structured WARN line) | Always on | Cheap; envelope summary, not full body |
| file://<path> | Off (configurable) | Append-only JSONL of failed envelopes; replayable |
| Re-queue to another sink | Off (configurable) | Powerful but easy to misconfigure |
| audit/v1 event | On when audit/v1 is loaded | Compliance hook |
Multi-Sink Orchestration¶
The orchestrator owns the loop that delivers each envelope to its target sinks. Its job is small but precise.
Routing — both-layer model¶
Two layers, both must pass for a sink to receive an envelope.
Layer 1 (positive intent): the envelope's Routing block:
- PrimarySink — the canonical destination (e.g., "llm-anthropic")
- AlsoTo — fan-out destinations (e.g., ["s3", "bd"])
- Suppress — explicit kill-switch list (e.g., ["s3"] to keep this
envelope out of archive)
Layer 2 (negative filter): each sink's declarative filter block in
its config:
filter:
  intent_kinds: [prompt, todo, command]   # include only these
  source_kinds: [self, online]            # not in-person
  min_confidence: 0.7                     # require ASR confidence ≥ 0.7
  include_derived: true                   # include llm_response envelopes
Final delivery rule: a sink receives an envelope iff
sink_name ∈ (PrimarySink ∪ AlsoTo)
AND sink_name ∉ Suppress
AND sink.filter accepts envelope
This gives the router product-level intent ("send prompts to LLMs, commands
to bd, summaries to email") while letting individual sinks defend themselves
("I don't care if the router says to send me everything, my filter says no
raw_transcript").
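The delivery rule translates fairly directly into a predicate. The following is a minimal sketch under a few assumptions: an empty filter list accepts everything, the `include_derived` flag is left out for brevity, and all type and function names are illustrative.

```go
package main

import "fmt"

type Routing struct {
	PrimarySink string
	AlsoTo      []string
	Suppress    []string
}

type Envelope struct {
	IntentKind string
	SourceKind string
	Confidence float64 // ASR-side confidence
	Routing    Routing
}

// Filter mirrors the declarative filter block.
// Assumption: an empty list means "accept all".
type Filter struct {
	IntentKinds   []string
	SourceKinds   []string
	MinConfidence float64
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// accepts is layer 2: the sink's own filter.
func (f Filter) accepts(e Envelope) bool {
	if len(f.IntentKinds) > 0 && !contains(f.IntentKinds, e.IntentKind) {
		return false
	}
	if len(f.SourceKinds) > 0 && !contains(f.SourceKinds, e.SourceKind) {
		return false
	}
	return e.Confidence >= f.MinConfidence
}

// shouldDeliver combines layer 1 (the Routing block) with layer 2 (the filter).
func shouldDeliver(sink string, f Filter, e Envelope) bool {
	targeted := e.Routing.PrimarySink == sink || contains(e.Routing.AlsoTo, sink)
	return targeted && !contains(e.Routing.Suppress, sink) && f.accepts(e)
}

func main() {
	e := Envelope{
		IntentKind: "todo", SourceKind: "self", Confidence: 0.9,
		Routing: Routing{PrimarySink: "llm-anthropic", AlsoTo: []string{"s3", "bd"}, Suppress: []string{"s3"}},
	}
	f := Filter{IntentKinds: []string{"todo", "command"}, MinConfidence: 0.7}
	fmt.Println(shouldDeliver("bd", f, e), shouldDeliver("s3", Filter{}, e))
}
```

Here bd receives the envelope, while s3 is skipped because Suppress overrides AlsoTo.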
Delivery order — parallel by default¶
Sinks must not depend on each other for their work (per-sink isolation, see above). Dependencies between sinks are expressed as data flow, not delivery order:
- An LLM sink that produces a derived llm_response envelope can route that envelope to other sinks (email, S3, bd) via standard routing.
- The LLM response is a first-class envelope, not a callback or side effect.
This keeps the orchestrator simple and prevents subtle "sink A succeeded but sink B failed, now what?" coupling.
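A hedged sketch of parallel, isolated fan-out: each sink's write runs in its own goroutine, and a panic or rejection in one sink is recorded without affecting the others. `writeFunc` stands in for a sink's Write method, and the result map is an illustrative choice rather than anything the contract prescribes.

```go
package main

import (
	"fmt"
	"sync"
)

// writeFunc stands in for Sink.Write; it reports whether the sink accepted.
type writeFunc func(envelopeID string) bool

// fanOut calls every sink concurrently and isolates failures:
// a panic or rejection in one sink does not affect the others.
func fanOut(sinks map[string]writeFunc, envelopeID string) map[string]bool {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]bool, len(sinks))
	)
	for name, write := range sinks {
		wg.Add(1)
		go func(name string, write writeFunc) {
			defer wg.Done()
			accepted := false
			defer func() {
				_ = recover() // a crashing sink counts as a rejection
				mu.Lock()
				results[name] = accepted
				mu.Unlock()
			}()
			accepted = write(envelopeID)
		}(name, write)
	}
	wg.Wait()
	return results
}

func main() {
	sinks := map[string]writeFunc{
		"s3":            func(string) bool { return true },
		"llm-anthropic": func(string) bool { panic("provider client crashed") },
	}
	res := fanOut(sinks, "e-1")
	fmt.Println(res["s3"], res["llm-anthropic"])
}
```

In this run the s3 write still succeeds even though the LLM sink panics.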
Sink discovery¶
v1: static config. Sinks are declared in the user's config file at startup. The orchestrator validates each declared sink against the registry, opens it, then starts routing. Adding or removing sinks requires a restart.
Hot-reload of sink config is out of scope for v1; planned for v2 or the enterprise edition where multi-tenant runtime config matters.
Orchestrator error handling¶
| Failure | Orchestrator action |
|---|---|
| Single sink rejects at Write | Log, increment counter, continue with other sinks |
| Sink's Health() returns unhealthy | Take out of rotation; periodically probe; auto-restore when healthy; emit audit event |
| All sinks for an envelope reject | Dead-letter the envelope (configurable destination); emit ERROR log; emit audit event |
| Completions() reports permanent failure | Per-sink dead-letter policy applies; envelope-level retry-via-different-sink is opt-in (off by default) |
Built-in Sinks¶
LLM (BYOK)¶
The most product-defining open-core sink. When this sink is used, the open-core code calls the configured provider's API directly with the user's BYOK credential — no Vox proxy in the open-core code path.
Bundled-LLM — a Vox-subscription-token-driven sink that proxies
through Vox's cloud service to provider APIs with aggregate-volume
credentials — is implemented in the enterprise repo as a separate
llm-bundled sink type. It implements this same sink/v1 LLM interface;
the open-core pipeline doesn't distinguish between the two. The
proprietary piece is the Vox cloud service that issues subscription tokens
and proxies provider traffic.
LLMSink : Sink {
Provider() -> ProviderInfo
Models() -> []ModelInfo
Frame(envelope) -> LLMRequest # default framing; user can override via templates
SubscribeStream(envelopeID) -> <-chan StreamChunk | error
}
ProviderInfo {
Name string # "anthropic" | "openai" | "google" | "mistral" | "groq" |
# "ollama" | "llamacpp" | "azure-openai" | "bedrock"
Endpoint string
AuthType AuthType # "api_key" | "oauth" | "iam" | "none"
SupportsStreaming bool
SupportsTools bool
SupportsVision bool
}
LLMRequest {
Model string
SystemPrompt string
Messages []Message
Tools []ToolDef?
MaxTokens uint32
Temperature float
Stream bool
# Provenance — passed through but not sent to provider
EnvelopeID string
SessionID string
Custom map<string, any> # provider-specific knobs
}
LLMResponse {
EnvelopeID string
ResponseText string
ToolCalls []ToolCall?
UsageTokens UsageInfo
CompletedAt Timestamp
StreamChunks []StreamChunk?
}
Design choices:
- One LLMSink instance = one provider + one credential. Two Anthropic accounts = two llm-anthropic sinks. Provider-switching is config-time, not runtime.
- Provider-agnostic at the contract layer, provider-native at the wire layer. Common fields (model, messages, tools, max_tokens, temperature, stream) translate to each provider's native API. Provider-specific knobs go in LLMRequest.Custom: sinks pass them through, and sinks that don't care ignore them.
- BYOK is in the sink config, NOT in the envelope. Envelopes are credential-less. Credentials live in the sink's startup config (env var, OS keychain, etc.). An envelope can fan out to N LLM sinks with N different providers and credentials without re-stamping.
- Framing is separable from sending. Sink.Frame(envelope) -> LLMRequest is exposed so users can override framing per sink via templates ("for intent: prompt, system prompt is X; for intent: command, it's Y"). Auditable separately from credential / endpoint.
- Response capture as derived envelopes. When the LLM responds, the sink emits a Completion AND optionally creates a derived IntentEnvelope with ParentID = original envelope ID, Intent.Kind = "llm_response", Transcript = ResponseText, Provenance.ASRBackend = "llm:anthropic:claude-opus-4-7". This derived envelope routes through the normal pipeline (append to bd, archive to S3, email summary).
v1.x additive: per-intent system prompts¶
The llm-anthropic sink (and by convention all LLM sinks) supports a
system_prompts map that selects a different system prompt based on the
envelope's Intent.Kind. This lets you tune LLM style per intent without
duplicating sink configuration.
sinks:
  - name: my-anthropic
    type: llm-anthropic
    system_prompts:
      prompt: "You are a helpful assistant answering a dictated request. Reply directly."
      question: "You are a helpful assistant. Reply directly and concisely."
      command: "You are receiving an imperative command. Interpret it as an action request."
      todo: "Elaborate this todo into a clear action item with concrete next steps."
      note: "Polish this note into clean meeting-note style. Don't expand; just clean it."
      default: "You are a helpful assistant called by Blackrim Vox."
    # Legacy single-prompt form still works when system_prompts is absent:
    # system_prompt: "..."
Lookup order at Write time:
1. system_prompts[envelope.Intent.Kind] — per-intent override
2. system_prompts["default"] — catch-all override
3. system_prompt field — old-style single prompt
4. Compiled-in default
Backward compatible: existing configs using only system_prompt continue to
work unchanged.
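The four-step lookup can be sketched as a single function. The `compiledDefault` text is a placeholder, and treating an empty string as "not set" is an assumption of this sketch rather than something the contract states.

```go
package main

import "fmt"

// compiledDefault stands in for the sink's built-in prompt (placeholder text).
const compiledDefault = "You are a helpful assistant."

// resolveSystemPrompt walks the four-step lookup order.
// Assumption: an empty string counts as "not set".
func resolveSystemPrompt(perIntent map[string]string, legacy, intentKind string) string {
	if p := perIntent[intentKind]; p != "" {
		return p // 1. per-intent override
	}
	if p := perIntent["default"]; p != "" {
		return p // 2. catch-all override
	}
	if legacy != "" {
		return legacy // 3. old-style system_prompt field
	}
	return compiledDefault // 4. compiled-in default
}

func main() {
	prompts := map[string]string{
		"todo":    "Elaborate this todo into a clear action item with concrete next steps.",
		"default": "You are a helpful assistant called by Blackrim Vox.",
	}
	fmt.Println(resolveSystemPrompt(prompts, "", "todo"))
	fmt.Println(resolveSystemPrompt(prompts, "", "note"))
	fmt.Println(resolveSystemPrompt(nil, "legacy prompt", "note"))
}
```

A config that sets only system_prompt passes a nil map and falls through to step 3, which is the backward-compatible path.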
Authentication precedence¶
Credential lookup order at Open():
1. Explicit env var (ANTHROPIC_API_KEY, etc.)
2. OS keychain (macOS Keychain, Windows Credential Manager, libsecret) — default
3. Config file (~/.vox/credentials.yaml) — deprecated; warns if used
4. External secrets manager (Vault, AWS Secrets Manager, 1Password CLI) via
a future secrets/v1 extension
vox auth set anthropic is the user-facing onboarding command — prompts for
the key, stores in OS keychain, confirms. Easy path = secure path.
Streaming¶
- Default: stream for prompt and question intent kinds; non-stream for summary and note. Configurable per sink + per intent.
- Stream chunks emitted on a per-envelope subscribable channel via Sink.SubscribeStream(envelopeID).
- If the caller doesn't subscribe, chunks are buffered until completion and returned in LLMResponse.StreamChunks.
- Stream chunks are NOT re-emitted as separate envelopes; only the assembled final response becomes a derived envelope. Partial streams are a UI affordance, not a pipeline data type.
Tool definitions¶
For envelopes with Intent.Kind = "command", the sink MAY pass tool
definitions to the provider. Tools are configured per sink, not per
envelope.
sinks:
  - name: anthropic-with-tools
    type: llm-anthropic
    model: claude-opus-4-7
    tools:
      - name: bd_create
        description: "Create a task in beads"
        # tool definition follows JSON Schema
This keeps the envelope simple (carries intent only) and tools auditable (you know what each LLM sink CAN do without reading every envelope).
Tool execution is out of scope for sink/v1 — tools are declared to the
LLM here, but the execution layer that handles a tool-call response and
dispatches the action is a separate concern (planned tools/v1).
Tier-1 providers (ship with v1)¶
| Provider | Notes |
|---|---|
| anthropic | Claude Sonnet / Opus |
| openai | GPT-4o / o1 / o3 |
| google | Gemini Pro / Flash |
| ollama | Local-first; the "everything local" path |
Tier-2 providers (community-contributable, same contract): mistral, groq, llamacpp, azure-openai, bedrock.
S3-compatible storage¶
Archive sink for envelopes + audio. Tier-1 across AWS S3, Cloudflare R2, Backblaze B2, Wasabi, MinIO (self-hosted).
Object key schema¶
{prefix}/sessions/{session_id}/envelopes/{envelope_id}.json
{prefix}/sessions/{session_id}/audio/{envelope_id}.{ext}
{prefix}/sessions/{session_id}/manifest.json
- {prefix}: user-configured base path (e.g., vox/prod/)
- {session_id}, {envelope_id}: from the envelope
- Audio extension depends on encoding (opus, flac, wav)
Session-rooted by default, so cross-midnight sessions stay together. The date-prefixed path strategy is opt-in:
s3:
  path_strategy: session-rooted   # session-rooted | date-prefixed
The date-prefixed strategy yields {prefix}/{YYYY}/{MM}/{DD}/sessions/{session_id}/...
which is convenient for lifecycle rules.
What gets written¶
Two objects per envelope: envelope JSON + audio (when applicable). Per-envelope PUTs — cost is negligible at speech-rate envelope volumes (≈ $0.007 per hour-long meeting at AWS S3 pricing).
A manifest.json is built incrementally per session, flushed every 30s by
default, finalized at session close. Lets a consumer pull a single object
to enumerate an entire session without listing under the prefix.
Audio encoding¶
| Format | 1hr 16kHz mono | Lossless | Speech-tuned |
|---|---|---|---|
| Opus (default) | 7-15 MB | No | Yes |
| FLAC | 50-70 MB | Yes | No |
| WAV | 115 MB | Effectively | No |
| none | 0 | n/a | n/a |
Default opus_bitrate: 24000 (24 kbps — near-transparent for speech).
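The Opus and WAV figures in the table follow from bitrate × duration ÷ 8. The sketch below checks two of them; the 16-bit sample depth for WAV is an assumption (it is what reproduces the 115 MB figure), and container overhead is ignored.

```go
package main

import "fmt"

// sizeMB converts a constant bitrate and a duration into megabytes (10^6 bytes).
func sizeMB(bitsPerSecond, seconds float64) float64 {
	return bitsPerSecond * seconds / 8 / 1e6
}

func main() {
	const hour = 3600.0
	// Opus at the default 24 kbps.
	fmt.Printf("opus: %.1f MB\n", sizeMB(24000, hour))
	// WAV: 16 kHz mono, assuming 16-bit samples (256 kbps).
	fmt.Printf("wav:  %.1f MB\n", sizeMB(16000*16, hour))
}
```

This gives roughly 10.8 MB for Opus, inside the table's 7-15 MB range, and about 115.2 MB for WAV.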
Object metadata¶
Every object stamped with metadata keys for lifecycle filtering without parsing content:
| Metadata key | Value |
|---|---|
| x-amz-meta-vox-session-id | UUID |
| x-amz-meta-vox-stream-id | UUID |
| x-amz-meta-vox-source-kind | self / in-person / online / file |
| x-amz-meta-vox-intent-kind | prompt / command / etc. |
| x-amz-meta-vox-captured-at | ISO 8601 |
| x-amz-meta-vox-retention-policy | default / compliance / pii / custom |
| x-amz-meta-vox-schema-version | v1 |
Authentication¶
AWS SDK is the underlying client. Same credential precedence as LLM sink
(env → keychain → profile file → external secrets manager). Provider-agnostic
config — endpoint URL specifies the destination provider:
sinks:
  - type: s3
    endpoint: https://s3.amazonaws.com   # or r2.cloudflarestorage.com, s3.wasabisys.com, etc.
    region: us-east-1
    bucket: vox-archive
    prefix: vox/prod/
    auth:
      method: keychain
      credential_name: vox-s3-aws
    path_strategy: session-rooted
    audio:
      encoding: opus
      opus_bitrate: 24000
    manifest:
      flush_interval: 30s
    encryption:
      sse: aws:s3                        # aws:s3 (default) | aws:kms | none
      kms_key_id: ""
iam-role auth method uses AWS SDK's automatic IMDS lookup for
EC2 / ECS / Lambda contexts.
Server-side encryption¶
- aws:s3 (default): provider-managed SSE-S3 (AES-256)
- aws:kms: SSE-KMS with user's key (compliance scenarios)
- none: for self-hosted MinIO without encryption
Client-side encryption is out of scope for open-core sink/v1 —
compliance-tier feature that belongs in the enterprise repo where
audit / key-management infrastructure can support it properly.
Email¶
Tier-1 across SMTP and the major transactional providers.
Transport¶
Pluggable transports. Each ships as a separate sink registration with the same envelope-handling contract:
| Sink name | Transport |
|---|---|
| email-smtp | SMTP (any relay: Gmail, ProtonMail, self-hosted Postfix, etc.) |
| email-sendgrid | SendGrid HTTP API |
| email-postmark | Postmark HTTP API |
| email-mailgun | Mailgun HTTP API |
| email-resend | Resend HTTP API |
| email-ses | AWS SES |
Credential precedence is the standard chain (env → keychain → config → secrets manager).
EmailSink : Sink {
Transport() -> string
TestSend(ctx) -> Error # send a test message; validates transport config
}
Triggering modes¶
| Mode | When email is sent | Use case |
|---|---|---|
| per-envelope | One email per envelope received | Dictation → email; immediate forwarding |
| per-session (default for online/in-person) | Accumulate envelopes; send one summary at session close | Meeting summaries, the canonical case |
| scheduled | Daily / weekly digest of envelopes matching a filter | "Friday 5pm summary of what I dictated this week" |
Configurable per sink:
sinks:
  - name: meeting-summary
    type: email-smtp
    trigger: per-session
    flush_idle_after: 5m      # send after N minutes of no new envelopes
    flush_max_wait: 4h        # hard cap
  - name: dictation-forward
    type: email-smtp
    trigger: per-envelope
    filter:
      intent_kinds: [prompt, note]
      source_kinds: [self]
  - name: weekly-digest
    type: email-smtp
    trigger: scheduled
    schedule: "0 17 * * FRI"  # cron
    digest_window: 7d
Templates¶
Go templates (text/template + html/template). Three bundled defaults:
- default-summary.html.tmpl: per-session meeting summary card
- default-envelope.html.tmpl: per-envelope single transcript
- default-digest.html.tmpl: scheduled digest list
Users override per sink:
template:
  subject: "Meeting: {{ .Session.Title }}"
  html_path: ~/.vox/templates/team-summary.html.tmpl
  text_path: ~/.vox/templates/team-summary.txt.tmpl
Template context exposes .Session, .Envelopes, .LLMResponses, .Stats.
Recipient determination¶
Three-level precedence:
- Envelope override: envelope.Custom.email.to: [...] wins (highest)
- Session participants: orchestrator-supplied; sink uses when configured
- Sink-config recipients: static fallback (lowest)
recipients:
  to: ["[email protected]"]
  cc: []
  bcc: ["[email protected]"]
  use_session_participants: true
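The three-level precedence for the "to" list can be sketched as follows. `resolveRecipients` is a hypothetical helper, the addresses use the reserved example.com domain, and falling through to the next level whenever a higher level is empty is an assumption.

```go
package main

import "fmt"

// resolveRecipients applies the three-level precedence for the "to" list.
// Assumption: an empty higher-precedence list falls through to the next level.
func resolveRecipients(override, participants, static []string, useParticipants bool) []string {
	if len(override) > 0 {
		return override // envelope.Custom.email.to wins
	}
	if useParticipants && len(participants) > 0 {
		return participants // orchestrator-supplied session participants
	}
	return static // sink-config fallback
}

func main() {
	static := []string{"team@example.com"}
	participants := []string{"ana@example.com", "raj@example.com"}

	fmt.Println(resolveRecipients(nil, participants, static, true))
	fmt.Println(resolveRecipients(nil, participants, static, false))
	fmt.Println(resolveRecipients([]string{"boss@example.com"}, participants, static, true))
}
```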
Threading¶
Stable Message-ID derived from SessionID (for per-session) or
digest_window_start (for scheduled).
- Per-session first send: <{session_id}@vox.local>
- Per-session re-flush (long session): <{session_id}-{flush_n}@vox.local> with In-Reply-To: <{session_id}@vox.local>
- Scheduled: <digest-{date}@vox.local> with In-Reply-To: <digest-{previous_date}@vox.local> for chained digests
Hostname (vox.local) is configurable; defaults to the configured SMTP /
API domain.
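For the per-session case, the header scheme above could be produced by a helper like the one below. `sessionMessageID` is a hypothetical name, and numbering the first send as flush 0 with re-flushes starting at 1 is an assumption.

```go
package main

import "fmt"

// sessionMessageID returns the Message-ID and In-Reply-To headers for a
// per-session send. Assumption: flushN 0 is the first send; re-flushes start at 1.
func sessionMessageID(sessionID string, flushN int, host string) (messageID, inReplyTo string) {
	root := fmt.Sprintf("<%s@%s>", sessionID, host)
	if flushN == 0 {
		return root, "" // first send starts the thread
	}
	// Re-flushes reply to the first message so clients group them in one thread.
	return fmt.Sprintf("<%s-%d@%s>", sessionID, flushN, host), root
}

func main() {
	id, ref := sessionMessageID("sess-42", 0, "vox.local")
	fmt.Println(id, ref)
	id, ref = sessionMessageID("sess-42", 2, "vox.local")
	fmt.Println(id, ref)
}
```

Because the IDs are derived from SessionID rather than generated randomly, a retried send reuses the same Message-ID.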
Attachments¶
| Content | Behavior |
|---|---|
| Transcript text/markdown | Attached when ≤ 100 KB; inline in body otherwise |
| Audio bytes | Never attached. Linked via S3 URL in body if S3 sink also fired |
| HTML summary | Inline (multipart/alternative with text fallback) |
| PDF summary | Out of scope for v1 (add later via separate pdf-render sink → email) |
Filtering¶
Standard sink filter block:
filter:
  intent_kinds: [prompt, todo, command]
  source_kinds: [self, online]
  min_confidence: 0.7
  include_derived: true
Local file¶
Simplest sink. Useful as a no-cloud fallback, dev/test substrate, and the default sink for self-hosters who want zero network.
Mirrors the S3 sink's key schema so users can run both side-by-side or migrate between them without rethinking layout.
sinks:
  - type: local-file
    base_dir: ~/.vox/archive
    format: jsonl                 # jsonl (default) | json-array | sqlite
    path_strategy: per-session    # per-session (default) | per-day | single-file
    audio: sidecar                # sidecar (default) | embed-base64 | none
    rotation: none                # none | daily | size:100MB (for single-file mode)
    compress: none                # none | gzip
    fsync_every: ""               # paranoid mode: fsync after N envelopes or duration
Default layout:
{base_dir}/sessions/{session_id}.jsonl
{base_dir}/sessions/{session_id}/audio/{envelope_id}.opus
JSONL is append-friendly, line-oriented, streamable, and grep-able — the
standard for envelope-style streams. json-array and sqlite are
available for users who want different ergonomics.
ox-ledger (SageOx team-context ledger)¶
Writes envelopes as murmurs into a SageOx (ox)
ledger directory. Murmurs are git-tracked JSON files in
data/murmurs/YYYY-MM-DD/HH/<id>.json that ox uses to share team context
across humans and AI coding agents.
This sink turns voice-captured intent into team-shared context that any
ox-integrated AI coworker automatically loads via ox agent prime. Full
integration design + upstream-coordination tracker:
docs/integrations/ox.md.
sinks:
  - name: team-ledger
    type: ox-ledger
    ledger_dir: ~/.sageox/ledger        # auto-detected from ~/.sageox/config.yaml when blank
    agent_id_template: "vox-{{ .InstanceID }}"
    agent_type: vox                     # appears in ox UI / queries
    topic_template: "voice/{{ .Envelope.Speaker.SourceKind }}/{{ .Envelope.Intent.Kind }}"
    importance_template: "{{ if gt .Envelope.Intent.Confidence 0.8 }}normal{{ else }}ambient{{ end }}"
    scope: team                         # "team" | "ledger"
    schema_version: "1"                 # ox murmur schema version
    git_commit_interval: 30s            # batch commits to avoid repo bloat
    git_auto_push: false                # user / ox daemon owns push
    prefer_daemon_ipc: true             # use ox daemon when reachable; direct write otherwise
    filter:
      intent_kinds: [prompt, command, todo, note, summary, llm_response]
      source_kinds: [self, in-person, online]
      min_confidence: 0.6
Envelope → murmur mapping (abridged; full table in
docs/integrations/ox.md):
| ox murmur field | Vox envelope source |
|---|---|
| id | EnvelopeID |
| timestamp | StartedAt |
| agent_id / agent_type | sink config (vox-{instance} / vox) |
| principal_id / principal_type | derived from Speaker.Label / Speaker.SourceKind |
| topic / importance | rendered from topic_template / importance_template |
| content | Transcript |
| metadata | namespaced vox.* keys with session_id / stream_id / intent_kind / confidence / audio_ref / etc. |
| tags | [vox, source:<kind>, intent:<kind>] plus user-supplied |
| scope | sink config (team default) |
Performance: file writes are immediate; git commits are batched (default every 30s, configurable). Never auto-pushes — that's the user's or ox daemon's responsibility.
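The topic_template and importance_template values from the config above are ordinary Go templates, so they can be rendered with the standard text/template package. The sketch below assumes a template context shaped like the field paths used in those templates; the `Envelope` and `Context` structs are trimmed illustrations, not the sink's real types.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Trimmed stand-ins for the fields the two templates reference.
type Envelope struct {
	Speaker struct{ SourceKind string }
	Intent  struct {
		Kind       string
		Confidence float64
	}
}

type Context struct{ Envelope Envelope }

const (
	topicTemplate      = `voice/{{ .Envelope.Speaker.SourceKind }}/{{ .Envelope.Intent.Kind }}`
	importanceTemplate = `{{ if gt .Envelope.Intent.Confidence 0.8 }}normal{{ else }}ambient{{ end }}`
)

// render parses and executes one template string against the context.
func render(tmpl string, ctx Context) (string, error) {
	t, err := template.New("murmur").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, ctx); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	var ctx Context
	ctx.Envelope.Speaker.SourceKind = "self"
	ctx.Envelope.Intent.Kind = "todo"
	ctx.Envelope.Intent.Confidence = 0.9

	topic, _ := render(topicTemplate, ctx)
	importance, _ := render(importanceTemplate, ctx)
	fmt.Println(topic, importance)
}
```

A real sink would likely parse each template once at Open() rather than on every envelope.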
Two integration modes:
- Direct write (default fallback): Vox writes JSON files into the ledger directory and runs git add + git commit on the batch interval. Works when ox daemon isn't running.
- Daemon IPC (preferred when ox daemon is reachable): Vox sends murmurs to the ox daemon via the adapter-protocol IPC; the daemon handles file I/O + commit serialization. Requires the ox-adapter-vox binary (ships in v1.1) to be installed.
Detection: ledger_dir auto-detects from
- OX_LEDGER_DIR env var
- ~/.sageox/config.yaml (when ox is installed)
- explicit ledger_dir field in sink config (overrides both)
If none resolve, the sink's Open() returns ErrInvalidConfig with a
human-readable pointer to the ox setup docs.
bd (Beads task tracker)¶
Envelopes with Intent.Kind = "todo" or "command" naturally become bd
issues when bd is available in the host project. The bd sink wraps
bd create and bd update.
sinks:
  - type: bd
    filter:
      intent_kinds: [todo, command]
    title_template: "{{ truncate 80 (firstSentence .Envelope.Transcript) }}"
    description_template: ""    # empty = bundled default
    default_type: task          # bd issue type
    default_priority: p2
    auto_claim: false
    include_s3_link: true       # auto-detect S3 sink output, link in description
Idempotency: envelope_id is the dedup key. On retry, the sink looks up existing issues by description-prefix containing the envelope_id; if found, updates instead of creating.
bd not present: Open() returns ErrUnsupported; orchestrator marks
the sink unhealthy; other sinks continue.
bd remote sync: the sink does NOT push. The host project's normal bd
workflow (bd dolt push) handles that. Keeps the sink in its lane.
Sink Registration¶
Sinks register via a single, documented entry point:
RegisterSink(name: string, factory: (config) -> Sink)
Registration is package-init in Go, equivalent in other languages. The core maintains a single registry; duplicate names panic at startup (intentional).
Enterprise plugins register against the same registry — the core does not distinguish open vs. enterprise sinks at the loader level.
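A minimal sketch of what such a registry might look like in Go, with package-init registration and a panic on duplicate names. The `Factory` signature, the `map[string]string` config shape, and `nullSink` are assumptions for illustration.

```go
package main

import "fmt"

// Sink is reduced to one method for this sketch.
type Sink interface{ Name() string }

// Factory builds a sink from its config (config shape is an assumption).
type Factory func(config map[string]string) Sink

// registry is the single shared table of sink factories.
var registry = map[string]Factory{}

// RegisterSink adds a factory; a duplicate name panics at startup by design.
func RegisterSink(name string, factory Factory) {
	if _, exists := registry[name]; exists {
		panic("sink already registered: " + name)
	}
	registry[name] = factory
}

// nullSink is a hypothetical sink used only to exercise the registry.
type nullSink struct{}

func (nullSink) Name() string { return "null" }

// Package-init registration: runs before main, as the contract describes for Go.
func init() {
	RegisterSink("null", func(map[string]string) Sink { return nullSink{} })
}

func main() {
	s := registry["null"](nil)
	fmt.Println(s.Name())
}
```

Because open and enterprise sinks share one table, a name collision between them surfaces immediately at startup instead of one silently shadowing the other.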
Built-in v1 sinks¶
| Name | Family | Tier | Notes |
|---|---|---|---|
| local-file | local | 1 | Reference impl; simplest possible sink |
| bd | bd | 1 | Task tracker integration |
| ox-ledger | integration | 1 | SageOx team-context ledger writer; see docs/integrations/ox.md |
| llm-anthropic | llm | 1 | Primary LLM; flagship provider |
| llm-openai | llm | 1 | GPT-4o / o1 / o3 |
| llm-google | llm | 1 | Gemini Pro / Flash |
| llm-ollama | llm | 1 | Local-first LLM path |
| llm-mistral | llm | 2 | Mistral Large / Codestral |
| llm-groq | llm | 2 | Llama 3 / Mixtral fast inference |
| llm-llamacpp | llm | 2 | Embedded; reference Whisper.cpp pattern |
| llm-azure-openai | llm | 2 | Enterprise-Azure path |
| llm-bedrock | llm | 2 | Enterprise-AWS path |
| s3 | s3 | 1 | AWS S3 + Cloudflare R2 + B2 + Wasabi + MinIO |
| email-smtp | email | 1 | Universal; works with any SMTP relay |
| email-sendgrid | email | 1 | SendGrid transactional |
| email-postmark | email | 1 | Postmark transactional |
| email-mailgun | email | 2 | Mailgun transactional |
| email-resend | email | 2 | Resend transactional |
| email-ses | email | 2 | AWS SES |
Tier 1 = ships with first stable release. Tier 2 = community-contributable adapter slots that follow the same contract.
Configuration Schema¶
Top-level configuration for the entire sink layer:
sinks:
  - name: my-anthropic            # unique sink instance name
    type: llm-anthropic           # registered factory name
    # ... type-specific config
  - name: my-archive
    type: s3
    # ...
orchestrator:
  on_accept_failure: skip         # skip (default) | dead_letter | halt
  dead_letter:
    log: true                     # always on; structured WARN
    file: ~/.vox/dead-letter.jsonl  # optional; off by default
    audit: true                   # on when audit/v1 loaded
  envelope_retry_via_different_sink: false  # opt-in chaining on permanent fail
Each sink config inherits the standard filter: block and the
sink-type-specific fields documented in its section above.
Versioning and Stability¶
sink/v1 is the contract above. Once frozen:
- Non-breaking changes (allowed in v1.x): adding optional fields with sensible defaults to IntentEnvelope, WriteResult, Completion, Capabilities, or Stats; adding new IntentKind values; adding new ErrorKind values; adding new built-in sinks; adding new providers under existing sink families.
- Breaking changes (require v2): removing or renaming any existing field or method; changing the meaning of an existing field; changing the envelope schema in any non-additive way; changing the routing semantics.
The core supports one vN of sink/ at a time, with overlap during
migrations. Sinks declare which version they target via their Name()
return value or a parallel SupportedVersions() method (TBD before freeze).
Reference Implementations (Build Order)¶
| Order | Sink | Family | Why this order |
|---|---|---|---|
| 1 | local-file | local | First. Zero dependencies; validates the base interface + envelope schema end-to-end; testable in CI without any network |
| 2 | llm-anthropic | llm | First BYOK LLM; proves the auth precedence + streaming + derived-envelope flow |
| 3 | bd | bd | Validates idempotency + intent-kind filtering; tightly bounded scope |
| 4 | ox-ledger | integration | Validates git-batching + templated murmur emission; unblocks the SageOx integration (the highest-leverage product partnership) |
| 5 | email-smtp | email | Hardest of the triggering modes (per-session); proves the orchestrator's batching path |
| 6 | s3 | s3 | Most config surface; lifecycle / encryption / metadata to validate |
| 7+ | other LLM providers, transactional email | various | Same contract, different wire protocols |
Build the sinks in this order; build the orchestrator with local-file
alone first; add sinks one at a time. The orchestrator and base interface
stabilize before any network-bound sink lands.
Project Principle: Opinionated Defaults, Every Default Configurable¶
This contract continues the principle established in capture/v1. Every
behavior with a defensible default (buffer_frames: 1000,
retry_max_attempts: 5, audio.encoding: opus,
triggering.flush_idle_after: 5m, etc.) is exposed as a config knob. The
defaults reflect a considered recommendation for the typical voice-to-LLM
+ archive + summary use case; the knobs exist so specialized workflows can
tune them.