Architecture
Ingest pipeline, cost engine, alert engine, schema validator, storage, and how it all fits together.
TokenJam is a small set of components arranged as a pipeline. Spans come in via three paths, get normalized, get evaluated against cost, alerts, and schema rules, then land in DuckDB. Everything after the pipeline (CLI, web UI, MCP, Prometheus) is a read on DuckDB.
Component diagram
[Component diagram: @watch + patch_*, @tokenjam/sdk, TjSpanExporter, POST /api/v1/spans, pricing.toml, tj CLI, :7391, :7391/metrics]
Ingest paths
Three ways spans enter the system, normalized into the same shape:
- OTLP: for agents that already emit OpenTelemetry (Claude Code, Codex, OpenAI Agents SDK, LlamaIndex, etc.).
- Python SDK: `@watch` plus provider/framework patches send spans via `TjSpanExporter`.
- TypeScript SDK: explicit `SpanBuilder` → HTTP POST (a direct POST is sketched after this list).
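Whatever the path, spans are normalized into the same shape. As a hedged sketch, the snippet below posts one span directly to the `POST /api/v1/spans` endpoint from the component diagram; the payload's top-level keys are assumptions, not the documented wire format, while the attribute names are standard OTel GenAI SemConv.

```python
# Hedged sketch: post one span to the HTTP ingest path.
# The endpoint and :7391 port come from the component diagram; the
# payload's top-level keys are assumptions, not the documented format.
import json
import urllib.request

span = {
    "session.id": "demo-session",   # omit to exercise session continuity
    "agent.id": "demo-agent",       # hypothetical field name
    "attributes": {
        # standard OTel GenAI SemConv attribute names
        "gen_ai.system": "openai",
        "gen_ai.request.model": "gpt-4o-mini",
        "gen_ai.usage.input_tokens": 812,
        "gen_ai.usage.output_tokens": 64,
    },
}

req = urllib.request.Request(
    "http://localhost:7391/api/v1/spans",
    data=json.dumps(span).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```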
IngestPipeline
Every incoming span flows through three stages:
- Sanitize. Strip content per `[capture]` config, normalize timestamps, validate required fields.
- Session continuity. Spans missing `session.id` are attributed to the most recent open session for that agent. Long-running sessions get auto-rotated on idle (sketched after this list).
- Extract. Pull GenAI SemConv attributes into typed columns for cheap querying. Raw attributes remain available.
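Session continuity is the least obvious stage, so here is a minimal sketch of its logic. Everything in it (function and field names, the 30-minute idle cutoff) is an illustrative assumption, not TokenJam's actual internals.

```python
# Illustrative logic for stage 2 (session continuity). All names and the
# idle threshold are assumptions, not TokenJam internals.
import uuid
from datetime import datetime, timedelta, timezone

IDLE_ROTATE_AFTER = timedelta(minutes=30)  # hypothetical idle cutoff

def attribute_session(span: dict, open_sessions: dict) -> str:
    """open_sessions maps agent id -> (session_id, last_seen_utc)."""
    if span.get("session.id"):
        return span["session.id"]      # explicit id wins
    entry = open_sessions.get(span.get("agent.id"))
    now = datetime.now(timezone.utc)
    if entry and now - entry[1] <= IDLE_ROTATE_AFTER:
        return entry[0]                # attach to most recent open session
    return str(uuid.uuid4())           # idle too long (or none open): rotate
```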
CostEngine
`pricing.toml` maps (provider, model) pairs to per-token prices and is updated when providers change pricing. The engine prices every LLM call at ingest. Costs are stored alongside spans, not computed at query time.
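For a sense of the shape, a `pricing.toml` entry might look like the sketch below; the key names and the numbers are illustrative assumptions, not the shipped file. A call's cost is then input tokens × input price plus output tokens × output price, computed once at ingest.

```toml
# Illustrative shape only; real key names and prices may differ.
[openai."gpt-4o-mini"]
input  = 0.00000015   # USD per input token
output = 0.00000060   # USD per output token
```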
AlertEngine
Evaluates 13 alert types against each span and across windowed batches. Cooldowns prevent alert storms. Channels dispatch asynchronously (ntfy, Discord, Telegram, webhook, stdout, log). See Alerts.
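The cooldown mechanic is simple enough to sketch. The per-(alert, agent) keying and the 300-second window below are assumptions for illustration, not TokenJam's actual defaults.

```python
# Hedged sketch of alert cooldowns; keying and window are assumptions.
import time

COOLDOWN_SECONDS = 300
_last_fired: dict[tuple[str, str], float] = {}

def should_fire(alert_type: str, agent_id: str) -> bool:
    key = (alert_type, agent_id)
    now = time.monotonic()
    if now - _last_fired.get(key, float("-inf")) < COOLDOWN_SECONDS:
        return False   # within cooldown: suppress to prevent a storm
    _last_fired[key] = now
    return True
```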
SchemaValidator
For tool calls, you either:
- Declare a JSON Schema in your config under `[agents.<id>.tools.<name>.schema]`, or
- Let the validator infer a schema from the first N successful calls.
Subsequent violations emit a `schema_violation` alert.
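A declared schema might look like the sketch below. Only the `[agents.<id>.tools.<name>.schema]` section path comes from the docs; the agent and tool names, and this TOML encoding of a JSON Schema, are assumptions.

```toml
# Hypothetical declaration; only the section path pattern is documented.
[agents.support-bot.tools.lookup_order.schema]
type = "object"
required = ["order_id"]

[agents.support-bot.tools.lookup_order.schema.properties.order_id]
type = "string"
pattern = "^ORD-[0-9]+$"
```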
Storage
DuckDB. One file, embedded, no separate server. The read-write handle is held by `tj serve`; the MCP server and CLI open the file read-only so they can coexist. Retention pruning runs on a daily schedule.
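Read-only coexistence is plain DuckDB behavior: a second process opens the same file with `read_only=True`, as in the sketch below. The file path and the `spans`/`cost_usd` names are assumptions about the schema, not documented.

```python
# Sketch: a second process (like the MCP server or CLI) opens the same
# DuckDB file read-only while tj serve holds the read-write handle.
# The path and the table/column names are assumptions about the schema.
import duckdb

con = duckdb.connect("tokenjam.duckdb", read_only=True)
rows = con.execute(
    "SELECT date_trunc('day', start_time) AS day, sum(cost_usd) AS usd "
    "FROM spans GROUP BY 1 ORDER BY 1"
).fetchall()
for day, usd in rows:
    print(day, usd)
```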
Why DuckDB?
Three reasons:
- Columnar. Span queries are aggregations across millions of rows, exactly what DuckDB optimizes for.
- Embedded. No daemon to install, no port to open, no auth to manage.
- SQL. Existing OTel queries port directly; advanced users can drop to SQL via `tj export --format json` plus their own tools.
Full design notes and contribution guide: AGENTS.md in the OSS repo.