Open-source · Runs locally · No signup, no proxy, no SaaS account

Token efficiency
for AI agents.

Help your AI agents make the most of their tokens.

WORKS WITH

Anthropic OpenAI Google Gemini AWS Bedrock LangChain LangGraph CrewAI AutoGen LlamaIndex OpenAI Agents SDK OpenClaw NemoClaw
Quick install
pip: pip install tokenjam
npm: npm install @tokenjam/sdk

Your agent ran.
How much did it cost?
What did it do?

Every observability tool is built for LLM developers. TokenJam is built for people whose agents have real-world side effects — and real-world bills.

No visibility into what agents do while you sleep

Coding agents and autonomous workflows run for hours unattended. They edit files, send emails, hit APIs. Without observability, you find out what happened when something breaks — or when the bill arrives.

Surprise bills, no obvious fix

A Claude Code session can rack up $45 in an hour. Most calls don't need the most expensive model — but without per-task cost attribution, you can't tell which ones do.

Every tool requires a SaaS account

Behavioral drift, sensitive-action alerts, eval-to-production correlation — they all require API keys, hosted backends, and credit cards. TokenJam runs on your machine.

What TokenJam gives you

See where your tokens go — Real-time USD cost per LLM call, per agent, per task. Per-session, per-day, and per-prompt breakdowns. Works for Claude Code, Codex CLI, OpenAI Agents SDK, LangChain, CrewAI, or any OTel-native agent.
Find where they're wasted — tj optimize analyzes your real sessions, flags model-downgrade candidates, and projects your monthly budget per provider. Shipped.
Keep agents in line — Sensitive-action alerts (email sends, file writes, payment actions, form submits). Behavioral drift detection. Cost budget alerts with optional enforcement.
Connect evals to production — Import results from Inspect AI, DeepEval, Promptfoo, and others. Correlate eval failures with matching production sessions. Coming soon.
OTel-native, no vendor lock-in — Full GenAI Semantic Conventions compliance. Exportable to Grafana, Jaeger, Datadog, or any OTel backend.
Local-first, CLI-first — Single pip install. No signup, no proxy, no SaaS account. Full-featured CLI (tj status / traces / cost / drift) with JSON output. Local REST API + Prometheus /metrics.

Built for individuals.
Architected for teams.

Autonomous agent safety alerts

The only observability tool built for agents with real-world side effects. Configurable alerts fire on email sends, file writes, form submissions, and payment actions.

unique to TokenJam
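A sensitive-action alert policy could be declared in a small config file. The section and key names below are illustrative assumptions, not TokenJam's documented schema:

```toml
# Hypothetical alert config — key names are illustrative,
# not TokenJam's documented schema.
[alerts.sensitive_actions]
email_send  = "alert"
file_write  = "alert"
payment     = "block"
form_submit = "alert"

[alerts.budget]
daily_usd   = 10.00
session_usd = 2.50
```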
Token & cost tracking

Real-time USD cost per LLM call, attributed to the agent and tool that triggered it. Configurable daily/session/per-agent budget alerts fire before you get the bill.

per-model pricing TOML
Local behavioral drift detection

Deterministic, no-cloud drift detection. Automatically baselines token usage, tool call sequences, output schema, and session duration — alerts when agents deviate.

no API key required
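Deterministic drift detection of this kind reduces to baselining a metric and flagging outliers. A minimal sketch, assuming a simple mean/standard-deviation baseline (TokenJam's actual statistics and thresholds are not specified here):

```python
from statistics import mean, stdev

def drift_alerts(baseline: dict, current: dict, threshold: float = 3.0) -> list:
    """Flag metrics whose current value deviates more than
    `threshold` standard deviations from the baseline history."""
    alerts = []
    for metric, history in baseline.items():
        mu, sigma = mean(history), stdev(history)
        if sigma and abs(current[metric] - mu) / sigma > threshold:
            alerts.append(metric)
    return alerts

baseline = {
    "tokens_per_session": [9800, 10100, 9950, 10050],
    "session_seconds":    [61, 58, 63, 60],
}
current = {"tokens_per_session": 48_000, "session_seconds": 59}
print(drift_alerts(baseline, current))  # ['tokens_per_session']
```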
OTel-native telemetry

Full GenAI Semantic Conventions compliance from day one. Agent spans, tool calls, token metrics — exportable to Grafana, Jaeger, Datadog, or any OTel backend without transformation.

OTel SemConv v1.37+
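An LLM-call span under the GenAI Semantic Conventions carries attributes like these. The attribute names follow the published OTel GenAI SemConv; exactly which ones TokenJam records per span is an assumption here:

```python
# Sketch of GenAI SemConv attributes on an LLM-call span.
# Attribute names are from the OTel GenAI Semantic Conventions;
# the exact set TokenJam emits is an assumption.
def llm_span_attributes(provider: str, model: str,
                        input_tokens: int, output_tokens: int) -> dict:
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.system": provider,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = llm_span_attributes("anthropic", "claude-sonnet-4", 12_000, 3_000)
print(attrs["gen_ai.usage.input_tokens"])  # 12000
```

Because the attributes already conform to the conventions, any OTel backend can aggregate them without a translation layer.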
Output schema validation

JSON Schema validation for tool outputs and agent responses. Declare schemas per-agent/tool in config, or use inference mode to auto-derive from observed sessions.

JSON Schema draft-07
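Per-tool output validation looks roughly like this. A real setup would use a full draft-07 validator (e.g. the `jsonschema` package); this hand-rolled checker handles only the `type` and `required` keywords, and the `send_email` schema is an invented example:

```python
import json

# Hypothetical per-tool schema declarations (draft-07 style).
TOOL_SCHEMAS = {
    "send_email": {
        "type": "object",
        "required": ["to", "status"],
        "properties": {"to": {"type": "string"}, "status": {"type": "string"}},
    }
}

TYPES = {"object": dict, "string": str, "number": (int, float)}

def validate(tool: str, output: str) -> list:
    """Return a list of validation errors for a tool's JSON output."""
    schema, data, errors = TOOL_SCHEMAS[tool], json.loads(output), []
    if not isinstance(data, TYPES[schema["type"]]):
        return [f"expected {schema['type']}"]
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, sub in schema.get("properties", {}).items():
        if key in data and not isinstance(data[key], TYPES[sub["type"]]):
            errors.append(f"{key}: expected {sub['type']}")
    return errors

print(validate("send_email", '{"to": "a@b.com"}'))  # ['missing required field: status']
```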
CLI + local REST API

A full-featured CLI (tj status / traces / cost / drift) with JSON output on every command. Local API at localhost with Prometheus /metrics endpoint, OpenAPI spec included.

pipe-friendly · scriptable
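The /metrics endpoint serves the standard Prometheus text exposition format, which any script can consume. A minimal parser sketch — the metric name and labels below are assumptions, not TokenJam's actual metric names:

```python
# Sample Prometheus text-format output such as a local /metrics
# endpoint might serve. Metric names here are assumptions.
SAMPLE = """\
# HELP tokenjam_cost_usd_total Cumulative LLM spend in USD
# TYPE tokenjam_cost_usd_total counter
tokenjam_cost_usd_total{agent="claude-code"} 12.41
tokenjam_cost_usd_total{agent="crewai"} 3.07
"""

def parse_metrics(text: str) -> dict:
    """Map each sample line (name + labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            name, value = line.rsplit(" ", 1)
            samples[name] = float(value)
    return samples

metrics = parse_metrics(SAMPLE)
print(metrics['tokenjam_cost_usd_total{agent="claude-code"}'])  # 12.41
```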
Trace-driven cost optimization

Most coding-agent calls don't need the most expensive model. tj optimize analyzes your real sessions and flags model-downgrade candidates plus per-provider monthly budget projection — driven by your actual traces, not generic heuristics. Every recommendation ships with a quality-equivalence caveat, so you decide what to apply.

tj optimize · shipped
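One way such a recommender can work: flag traced calls that used a premium model but stayed small on both input and output. The cheaper-model mapping, trace fields, and token threshold below are illustrative assumptions, not TokenJam's actual heuristics:

```python
# Hypothetical premium-to-cheaper model mapping.
CHEAPER = {"claude-opus-4": "claude-sonnet-4", "gpt-4o": "gpt-4o-mini"}

def downgrade_candidates(calls: list, max_tokens: int = 500) -> list:
    """Flag small calls on premium models as downgrade candidates."""
    return [
        {**c, "suggested": CHEAPER[c["model"]]}
        for c in calls
        if c["model"] in CHEAPER
        and c["input_tokens"] + c["output_tokens"] <= max_tokens
    ]

calls = [
    {"model": "claude-opus-4", "input_tokens": 120, "output_tokens": 40},
    {"model": "claude-opus-4", "input_tokens": 9000, "output_tokens": 2000},
]
print([c["suggested"] for c in downgrade_candidates(calls)])  # ['claude-sonnet-4']
```

Because the candidates come from real traces, each one can carry the quality-equivalence caveat the section mentions: the user decides whether to apply it.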

Works with every major agent runtime

OpenClaw
LangChain
LangGraph
LlamaIndex
CrewAI
AutoGen
OpenAI Agents SDK
Anthropic (direct)
Google Gemini
AWS Bedrock
NemoClaw
Custom agents

Other tools tell you the agent ran.
TokenJam tells you what it did,
what it cost, and what to fix.

The tools your team already uses are built for LLM developers. TokenJam fills the gap they all leave open.

Feature TokenJam Langfuse LangSmith Helicone Guardrails AI
Observability
OTel GenAI SemConv native (compliant from day one) ~ ~
LLM call tracing
Token & cost tracking
Framework agnostic
Autonomous agent safety
Sensitive action alerts (email, file write, payment, form submit)
Cost budget alerts (daily / session / per-agent)
NemoClaw sandbox events
Retry loop detection
Runtime verification
Behavioral drift detection
Output schema validation
Token economics
Trace-driven cost recommendations (model-downgrade candidates + per-provider budget projection) ~
Eval-to-production correlation (import Inspect / DeepEval / Promptfoo) ~ ~
MCP server for agent self-introspection (13 MCP tools shipped)
Multi-agent fleet aggregation (cloud.tokenjam.dev — coming soon) cloud ~
Developer experience
Fully local, no signup ~
CLI interface
OTLP export to any backend (Grafana, Jaeger, Datadog…)
Open source / self-hostable
Supported · ~ Partial or roadmap · Not available

Eval-to-production correlation.

Connect what graded your agent offline with what it's doing in production. In active development; the OSS roadmap is public.

tj eval import · Q2 2026

Ingest results from Inspect AI, DeepEval, Promptfoo, HUD, and Coval. Correlate failed eval cases with matching production sessions. The only OSS layer that connects what graded your agent offline with what it's doing in production.

5 importers planned
Discussions & roadmap on GitHub
Got a complementary product? partner@metabldr.com
Interested in joining our mission? join@metabldr.com
Looking for advice on how to use AI & Agents? consult@metabldr.com
Inspired by what you see and want to invest? invest@metabldr.com