Architecture
Ingest pipeline, cost engine, alert engine, schema validator, storage, and how it all fits together.
TokenJam is a small set of components arranged as a pipeline. Spans come in via three paths, get normalized, get evaluated against cost, alerts, and schema rules, then land in DuckDB. Everything after the pipeline (CLI, web UI, MCP, Prometheus) is a read on DuckDB.
Component diagram
[Component diagram: @watch + patch_*, @tokenjam/sdk, TjSpanExporter, POST /api/v1/spans, pricing.toml, tj CLI, :7391, :7391/metrics]
Ingest paths
Three ways spans enter the system, normalized into the same shape:
- OTLP: for agents that already emit OpenTelemetry (Claude Code, Codex, OpenAI Agents SDK, LlamaIndex, etc.).
- Python SDK: `@watch` plus provider/framework patches send spans via `TjSpanExporter`.
- TypeScript SDK: explicit `SpanBuilder` → HTTP POST (a direct POST is sketched after this list).
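Whatever the path, spans are normalized into the same shape. As a hedged sketch, the snippet below posts one span directly to the `POST /api/v1/spans` endpoint from the component diagram; the payload's top-level keys are assumptions, not the documented wire format, while the attribute names are standard OTel GenAI SemConv.

```python
# Hedged sketch: post one span to the HTTP ingest path.
# The endpoint and :7391 port come from the component diagram; the
# payload's top-level keys are assumptions, not the documented format.
import json
import urllib.request

span = {
    "session.id": "demo-session",   # omit to exercise session continuity
    "agent.id": "demo-agent",       # hypothetical field name
    "attributes": {
        # standard OTel GenAI SemConv attribute names
        "gen_ai.system": "openai",
        "gen_ai.request.model": "gpt-4o-mini",
        "gen_ai.usage.input_tokens": 812,
        "gen_ai.usage.output_tokens": 64,
    },
}

req = urllib.request.Request(
    "http://localhost:7391/api/v1/spans",
    data=json.dumps(span).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```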
IngestPipeline
Every incoming span flows through three stages:
- Sanitize. Strip content per `[capture]` config, normalize timestamps, validate required fields.
- Session continuity. Spans missing `session.id` are attributed to the most recent open session for that agent. Long-running sessions get auto-rotated on idle (sketched after this list).
- Extract. Pull GenAI SemConv attributes into typed columns for cheap querying. Raw attributes remain available.
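Session continuity is the least obvious stage, so here is a minimal sketch of its logic. Everything in it (function and field names, the 30-minute idle cutoff) is an illustrative assumption, not TokenJam's actual internals.

```python
# Illustrative logic for stage 2 (session continuity). All names and the
# idle threshold are assumptions, not TokenJam internals.
import uuid
from datetime import datetime, timedelta, timezone

IDLE_ROTATE_AFTER = timedelta(minutes=30)  # hypothetical idle cutoff

def attribute_session(span: dict, open_sessions: dict) -> str:
    """open_sessions maps agent id -> (session_id, last_seen_utc)."""
    if span.get("session.id"):
        return span["session.id"]      # explicit id wins
    entry = open_sessions.get(span.get("agent.id"))
    now = datetime.now(timezone.utc)
    if entry and now - entry[1] <= IDLE_ROTATE_AFTER:
        return entry[0]                # attach to most recent open session
    return str(uuid.uuid4())           # idle too long (or none open): rotate
```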
CostEngine
`pricing.toml` maps (provider, model) pairs to per-token prices and is updated when providers change pricing. The engine prices every LLM call at ingest. Costs are stored alongside spans, not computed at query time.
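For a sense of the shape, a `pricing.toml` entry might look like the sketch below; the key names and the numbers are illustrative assumptions, not the shipped file. A call's cost is then input tokens × input price plus output tokens × output price, computed once at ingest.

```toml
# Illustrative shape only; real key names and prices may differ.
[openai."gpt-4o-mini"]
input  = 0.00000015   # USD per input token
output = 0.00000060   # USD per output token
```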
AlertEngine
Evaluates 13 alert types against each span and across windowed batches. Cooldowns prevent alert storms. Channels dispatch asynchronously (ntfy, Discord, Telegram, webhook, stdout, log). See Alerts.
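The cooldown mechanic is simple enough to sketch. The per-(alert, agent) keying and the 300-second window below are assumptions for illustration, not TokenJam's actual defaults.

```python
# Hedged sketch of alert cooldowns; keying and window are assumptions.
import time

COOLDOWN_SECONDS = 300
_last_fired: dict[tuple[str, str], float] = {}

def should_fire(alert_type: str, agent_id: str) -> bool:
    key = (alert_type, agent_id)
    now = time.monotonic()
    if now - _last_fired.get(key, float("-inf")) < COOLDOWN_SECONDS:
        return False   # within cooldown: suppress to prevent a storm
    _last_fired[key] = now
    return True
```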
SchemaValidator
For tool calls, you either:
- Declare a JSON Schema in your config under `[agents.<id>.tools.<name>.schema]`, or
- Let the validator infer a schema from the first N successful calls.
Subsequent violations emit a `schema_violation` alert.
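A declared schema might look like the sketch below. Only the `[agents.<id>.tools.<name>.schema]` section path comes from the docs; the agent and tool names, and this TOML encoding of a JSON Schema, are assumptions.

```toml
# Hypothetical declaration; only the section path pattern is documented.
[agents.support-bot.tools.lookup_order.schema]
type = "object"
required = ["order_id"]

[agents.support-bot.tools.lookup_order.schema.properties.order_id]
type = "string"
pattern = "^ORD-[0-9]+$"
```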
Storage
DuckDB. One file, embedded, no separate server. The read-write handle is held by `tj serve`; the MCP server and CLI open the file read-only so they can coexist. Retention pruning runs on a daily schedule.
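Read-only coexistence is plain DuckDB behavior: a second process opens the same file with `read_only=True`, as in the sketch below. The file path and the `spans`/`cost_usd` names are assumptions about the schema, not documented.

```python
# Sketch: a second process (like the MCP server or CLI) opens the same
# DuckDB file read-only while tj serve holds the read-write handle.
# The path and the table/column names are assumptions about the schema.
import duckdb

con = duckdb.connect("tokenjam.duckdb", read_only=True)
rows = con.execute(
    "SELECT date_trunc('day', start_time) AS day, sum(cost_usd) AS usd "
    "FROM spans GROUP BY 1 ORDER BY 1"
).fetchall()
for day, usd in rows:
    print(day, usd)
```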
Why DuckDB?
Three reasons:
- Columnar. Span queries are aggregations across millions of rows, exactly what DuckDB optimizes for.
- Embedded. No daemon to install, no port to open, no auth to manage.
- SQL. Existing OTel queries port directly; advanced users can drop to SQL via `tj export --format json` plus their own tools.
Full design notes and contribution guide: AGENTS.md in the OSS repo.