Documentation

Get Intencion running in a minute.

Patch your model client once, wrap your handler in a run, and every call your agent makes shows up grouped by intent and outcome, so you can debug which runs fail, at which step, and why.

Overview

Intencion is how you debug AI agents at the product level, whether that agent answers users in a chat, runs in the background, or works through a queue of tasks. It captures every run (the goal behind it, the steps your agent took, and whether it worked) and groups those runs by intent so you can see which intents fail, at which step, and why, then fix the biggest one.

Each run is marked success or failure against the outcome you define and grouped by what the user wanted, so you can see your agent the way your users do, and know exactly what to ship next.

There are two pieces to wire up:

1. Instrument your model client so every call is captured automatically.
2. Wrap your handler in a run so steps and outcomes attach to a user goal.

You can ship step one alone and still get model-level analytics. Runs add the intent grouping.

Quickstart

Install the SDK and set your API key, available on the Install screen.

npm i @intencion/sdk

Patch your client. Every call from here on is captured.

agent

import { Intencion } from "@intencion/sdk";
import Anthropic from "@anthropic-ai/sdk";

const ix = new Intencion({ apiKey: process.env.INTENCION_API_KEY });
const anthropic = ix.instrumentAnthropic(new Anthropic());

// every call captured: model, tokens, latency, outcome.

Using OpenAI instead? Same one line: const openai = ix.instrumentOpenAI(new OpenAI()) (TypeScript) or client = intencion.instrument_openai(OpenAI()) (Python).

In a short-lived script or serverless function, call ix.flush() (TS) / intencion.flush() (Python) before exit (or pass ix.flush() to your platform's waitUntil) so queued runs are sent. Long-lived servers flush in the background automatically. flush() returns a { sent, dropped, queued } result so you can confirm runs actually landed, and a rejected API key warns loudly once instead of dropping data silently.
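
As a minimal sketch of acting on that result in a script or CI job: the FlushResult type below mirrors the { sent, dropped, queued } shape described above, but treating each field as a count is an assumption, and describeFlush is a hypothetical helper, not part of the SDK.

```typescript
// Assumed shape: flush() is documented as returning { sent, dropped, queued };
// treating each field as a numeric count is an assumption.
type FlushResult = { sent: number; dropped: number; queued: number };

// Hypothetical helper: turn a flush result into a pass/fail check.
function describeFlush(result: FlushResult): { ok: boolean; message: string } {
  if (result.dropped > 0) {
    return { ok: false, message: `${result.dropped} run(s) dropped, ${result.sent} sent` };
  }
  if (result.queued > 0) {
    return { ok: false, message: `${result.queued} run(s) still queued, ${result.sent} sent` };
  }
  return { ok: true, message: `${result.sent} run(s) sent` };
}

// Usage in a short-lived script:
//   const status = describeFlush(await ix.flush());
//   if (!status.ok) console.warn(status.message);
```

Checking dropped before queued means data loss is reported first, since that is the case a later flush cannot recover.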

Core concepts

Run

One user goal, captured from the first call to the final outcome, with every tool call in between on a single timeline. A run is the unit you're billed on and the unit the dashboard groups.

Step

One action inside a run: a single model call or tool call, timed and placed on the run's timeline. The run is the goal; its steps are the moves it took to get there. Auto-instrumented model calls and run.tool() calls both land here as steps.

Intent

What the user wanted, captured as a named business intent on each run (like refund_request). Declare it when you know it, or let Intencion infer one per run from the input. Either way every run carries a real, human-readable business label that names the user's actual goal. New intents are named by a small model and surface under Emerging.

Outcome

How the run ended on the goal axis: success or failure. Returning is success and throwing is failure; override with run.fail() / run.ok(). A caught tool error is recorded on the step and surfaced as a reliability signal, not mixed into this verdict. Outcome is a business fact you define (did the refund actually happen), not an answer-quality score. There is no judge model, no eval harness, and no per-run scoring cost.

How they fit together

Two of these nest, and two group. A run contains its steps: the run is one goal, each step is one move inside it. A session and a trace are two independent ways to group runs. A session gathers the runs of one conversation by user and time. A trace gathers the runs of one task into a causal tree, a parent run with its sub-agent runs underneath. They sit on separate axes, so a single run carries a session and a trace at the same time, and one session can hold many traces. For a single request handled in one unit of work, one run and its steps are the whole picture; sessions and traces start to earn their place once a conversation runs across turns or a task fans out into sub-agent runs.
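
The two grouping axes can be sketched as plain data. This is an illustration of the model described above, not the SDK's internal representation: session_id and intent_label appear in the ingest payload, while trace_id and parent_run_id are assumed field names.

```typescript
// Illustrative record; trace_id and parent_run_id are assumed names.
interface RunRecord {
  id: string;
  intent_label: string;
  session_id?: string;    // groups runs of one conversation or unit of work
  trace_id?: string;      // assumed: groups runs of one task
  parent_run_id?: string; // assumed: links a sub-agent run to its parent
}

interface RunNode {
  run: RunRecord;
  children: RunNode[];
}

// Trace axis: build the causal tree for one trace from a flat list of runs.
function buildTraceTree(runs: RunRecord[], traceId: string): RunNode[] {
  const nodes = new Map<string, RunNode>();
  runs.forEach((run) => {
    if (run.trace_id === traceId) nodes.set(run.id, { run, children: [] });
  });
  const roots: RunNode[] = [];
  nodes.forEach((node) => {
    const parentId = node.run.parent_run_id;
    const parent = parentId ? nodes.get(parentId) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node);
  });
  return roots;
}

// Session axis: bucket the same flat list by session id.
function groupBySession(runs: RunRecord[]): Map<string, RunRecord[]> {
  const sessions = new Map<string, RunRecord[]>();
  runs.forEach((run) => {
    if (!run.session_id) return;
    const bucket = sessions.get(run.session_id) ?? [];
    bucket.push(run);
    sessions.set(run.session_id, bucket);
  });
  return sessions;
}
```

Because each run carries both ids, the same list can be read either way: one session may hold several trace trees, and a trace tree never depends on the session it sits in.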

Install the SDK

Intencion publishes @intencion/sdk for TypeScript and intencion for Python. Read INTENCION_API_KEY from your environment; never hard-code it.

Record a run

Wrap the handler for a user request in ix.run(). It returns whatever your handler returns. You don't have to name the intent: leave it off and Intencion infers a real, per-run label from the input. Record tool calls with run.tool(): it times the call, records the step, and captures the error message if the tool throws, so tool failures stay visible on the run timeline.

tools

await ix.run({ input, user }, async (run) => {       // intent inferred per run
  // run.tool times the call, records the step, and marks it errored on throw
  const order = await run.tool("lookup_order", "orders-db", () => lookupOrder(id));
  return await issueRefund(order);                         // returns → success
});

Sessions & users

A session groups related runs that belong to one larger piece of work over time, whatever shape it takes: a multi-turn conversation, a background job that fans out into many steps, or a scheduled batch working through a queue. Wrap them in a session so every run (and every auto-instrumented call) inherits the same session id and user, with no plumbing. Pass user to attribute runs to a person or system, and session to group the work. For a task where one agent calls another, reach for Traces, which captures the parent and child runs as a tree.

session

// group every run for the same unit of work under one session id:
// a chat (one run per turn), a batch (one run per item), a workflow (one run per task)
await ix.session({ session: sessionId, user: actorId }, () =>
  ix.run({ input: task }, async (run) => {       // intent inferred per run
    // ...your agent loop; record tool calls with run.tool(...)
  })
);

Each run stays its own unit, one goal and one outcome, and the session ties them together in the dashboard. A chat conversation is simply the case where each user turn is a run.

Traces

A trace groups the runs of one task into a causal tree, so a multi-agent task reads as a parent run with its sub-agent runs nested underneath and a failure pins to the sub-agent that caused it. Where a session groups a conversation by time and user, a trace groups a task by cause and effect. The two are independent: a trace can sit inside a session, and either works on its own.

Nesting is automatic. A run opened inside another becomes its child, sharing a trace id and carrying the parent's id, so you write the agent the way you already would. Auto-instrumented model calls inside a run stay steps on that run; only an explicit nested run becomes a child run.

trace

// a supervisor that calls sub-agents becomes a run tree:
await ix.run({ intent: "research_task" }, async () => {   // the task: a trace root
  await ix.run({ intent: "search" },    async () => { /* sub-agent run */ });
  await ix.run({ intent: "summarize" }, async () => { /* sub-agent run */ });
});

// group siblings that aren't lexically nested:
await ix.trace(async () => {
  await ix.run({ intent: "plan" }, async () => { /* ... */ });
  await ix.run({ intent: "act"  }, async () => { /* ... */ });
});

OpenTelemetry exporters get the same tree automatically: a trace whose spans carry agent or chain boundaries (an intencion.intent, an OpenInference AGENT/CHAIN span, or intencion.run_boundary) splits into the matching run tree. Traces show up under Traces in the dashboard.

Intents

By default you don't pass an intent at all: Intencion infers one per run from the input. Pass one explicitly only when you want deterministic grouping under a label you control. Don't hard-code a single constant like "chat-turn" on every run, since that buckets distinct work under one label; leaving it inferred gives a real intent per run. Either way, runs are grouped so you see the success rate per goal, not a flat log.

Inferred (default). We read the input and assign the closest existing intent, or name a new one. The result is a stable label per run, not a post-hoc bucket.
Declared. Pass a stable key like "refund_request" when you already know the goal and want to group under it deterministically.
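
To see why a single constant label hides problems, here is a small sketch of the per-intent success rate the grouping makes possible. It works on plain objects using the intent_label and outcome fields from the ingest payload; successRateByIntent is a hypothetical helper, not an SDK function.

```typescript
type Outcome = "success" | "failure";

interface RunSummary {
  intent_label: string;
  outcome: Outcome;
}

// Hypothetical helper: fraction of successful runs per intent label.
function successRateByIntent(runs: RunSummary[]): Map<string, number> {
  const totals = new Map<string, { ok: number; all: number }>();
  runs.forEach((run) => {
    const tally = totals.get(run.intent_label) ?? { ok: 0, all: 0 };
    tally.all += 1;
    if (run.outcome === "success") tally.ok += 1;
    totals.set(run.intent_label, tally);
  });
  const rates = new Map<string, number>();
  totals.forEach((tally, intent) => rates.set(intent, tally.ok / tally.all));
  return rates;
}
```

With distinct labels, a failing refund_request stands out next to a healthy order_status. With every run labeled "chat-turn", the same data collapses into one blended rate.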

Outcomes

An outcome is one of success or failure on the goal axis. By default, returning is success and throwing is failure. A caught tool error is recorded on the step and surfaced as a separate reliability signal; it does not change the run outcome on its own. When a run returned but didn't actually help, set the outcome yourself with run.fail(reason) or run.ok(), or a confirmOutcome resolver:

run.fail("no inventory match"); // also: run.ok()

A resolver can also label the failure as it classifies it: return { outcome: "failure", reason } from confirmOutcome (or classifyOutcome) and the reason becomes the run's failure_reason, so failures group by mode on the dashboard. For the common cases, drop in the built-in deterministic heuristics: by default they flag empty answers (empty_output) and zero-result lookups (no_results) with no judge model. Since not every run is a conversational answer, refusals (refused) are opt-in and you can scope the checks to specific intents.

import { Intencion, confirmOutcomeFromHeuristics } from "@intencion/sdk";

const ix = new Intencion({
  apiKey: process.env.INTENCION_API_KEY,
  confirmOutcome: confirmOutcomeFromHeuristics(),
});
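
If the built-in heuristics don't fit, a hand-written resolver might look like the sketch below. Only the { outcome: "failure", reason } return shape comes from this guide; the FinishedRun input shape, the success return value, and the classifyRun name are assumptions for illustration, so check the SDK's types before wiring it in as confirmOutcome.

```typescript
// Return shape for failures follows the guide; the success variant is assumed.
type Verdict = { outcome: "success" } | { outcome: "failure"; reason: string };

// Assumed input: what a resolver might be handed about a finished run.
interface FinishedRun {
  outputText?: string;
  resultCount?: number;
}

// Deterministic checks only: no judge model, no extra model call.
function classifyRun(run: FinishedRun): Verdict {
  if (!run.outputText || run.outputText.trim() === "") {
    return { outcome: "failure", reason: "empty_output" };
  }
  if (run.resultCount === 0) {
    return { outcome: "failure", reason: "no_results" };
  }
  return { outcome: "success" };
}
```

Returning a short, stable reason string rather than free text is what lets failures group by mode.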

Capturing content

By default Intencion captures metadata only — intents, steps, model, tokens, latency, and outcome — never your message or tool content. Opt in with captureContent (TypeScript) / capture_content (Python):

const ix = new Intencion({ apiKey: process.env.INTENCION_API_KEY, captureContent: true });

With it on, auto-instrumented model calls fold their reply text onto the run as output_text, and run.tool() records the tool's return value as the step's output. Streamed responses still capture metadata only.

Redaction

Redaction is on by default (redact: true) and strips emails, credit cards, US SSNs, and phone numbers from any captured text before it leaves your process. Because model output and tool returns can carry other sensitive data, pass a redactor (replaces the built-in patterns) or redactPatterns / redact_patterns (extra rules) when you enable content capture. Preview exactly what would be scrubbed before sending real traffic:

ix.previewRedaction("contact jane@example.com");
// { redacted: "contact <EMAIL>", matches: [{ value: "jane@example.com", replacement: "<EMAIL>" }] }
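
To reason about extra rules before adding them, it can help to prototype the pattern logic on its own. The sketch below is not the SDK's redactor: the Rule shape, the email regex, and the ORD-123456 order-id format are all assumptions, and only the { redacted, matches } result shape is borrowed from the preview output above.

```typescript
interface RedactionMatch {
  value: string;
  replacement: string;
}

interface RedactionPreview {
  redacted: string;
  matches: RedactionMatch[];
}

// Hypothetical rule shape: a global regex plus the token that replaces each hit.
interface Rule {
  pattern: RegExp;
  replacement: string;
}

// Illustrative rules; the SDK's built-in patterns are not reproduced here.
const rules: Rule[] = [
  { pattern: /[\w.+-]+@[\w-]+(\.[\w-]+)+/g, replacement: "<EMAIL>" },
  { pattern: /\bORD-\d{6}\b/g, replacement: "<ORDER_ID>" }, // assumed internal id format
];

// Apply each rule in order, recording what was replaced.
function previewRules(text: string, ruleSet: Rule[]): RedactionPreview {
  const matches: RedactionMatch[] = [];
  let redacted = text;
  ruleSet.forEach((rule) => {
    redacted = redacted.replace(rule.pattern, (value: string) => {
      matches.push({ value, replacement: rule.replacement });
      return rule.replacement;
    });
  });
  return { redacted, matches };
}
```

Running candidate patterns against a sample of real tool output this way shows both what they catch and what they miss before any content capture is enabled.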

Backfill existing logs

Already have OpenAI or Anthropic logs on disk or in a warehouse? Import them as runs without touching your app, so you see value before changing any code. Each log becomes one run — input, model, tokens, and tool steps are extracted with the same parser live capture uses, so imported and live runs agree field-for-field.

// request is the body you sent; response is what you got back.
ix.importOpenAI([{ request, response, intent: "support", id: log.id, user, session }]);
ix.importAnthropic({ request, response, id });

// provider-agnostic (CSV / JSONL / your own shape):
ix.importRuns([{ intent: "checkout", input, model: "gpt-4o", outcome: "success", steps: [] }]);

await ix.flush();

Pass a stable id per record (the provider's chatcmpl-… id or your own) and re-importing is idempotent: the id is the server's dedupe key. Import bypasses sampling — an explicit backfill is always captured — and is redacted like any other run.
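
For logs stored as JSONL, the records can be prepared with a few lines before handing them to the import call. This is a sketch under assumptions: the one-object-per-line layout with id, request, and response keys is a hypothetical log format, and parseJsonl is not an SDK function.

```typescript
interface ImportRecord {
  id: string;
  request: unknown;
  response: unknown;
}

// Assumed log layout: one JSON object per line with id, request, response.
function parseJsonl(jsonl: string): ImportRecord[] {
  const records: ImportRecord[] = [];
  const seen = new Set<string>();
  jsonl.split("\n").forEach((line) => {
    const trimmed = line.trim();
    if (trimmed === "") return;
    const row = JSON.parse(trimmed) as { id?: unknown; request?: unknown; response?: unknown };
    // Skip rows without a stable id, and rows already seen in this file.
    if (typeof row.id !== "string" || seen.has(row.id)) return;
    seen.add(row.id);
    records.push({ id: row.id, request: row.request, response: row.response });
  });
  return records;
}

// Then, with a configured client:
//   ix.importOpenAI(parseJsonl(fileContents));
//   await ix.flush();
```

Dropping rows without an id is a design choice for this sketch: without a stable id, a repeated import has nothing to dedupe on.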

Verify it's working

Captured runs appear in your dashboard within a few seconds of a flush. To assert on what you captured in a test, with no network, inspect the ingest payload: a POST of { events: [run, ...] } where each run carries intent_label, session_id, user_ref, steps (each with status / error), outcome, tokens_in / tokens_out, and latency_ms.

test

// inject a fetch that records the batch instead of sending it
const sent = [];
const ix = new Intencion({
  apiKey: "test",
  fetch: async (_url, init) => {
    sent.push(...JSON.parse(init.body).events);
    return new Response("{}", { status: 200 });
  },
});
// ...run your agent...
await ix.flush();
console.log(sent[0].outcome, sent[0].steps);
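
Once the batch is recorded, a small helper makes assertions easier to read than indexing into the payload by hand. The field names below follow the ingest payload described above; the optional typing, the rule that a non-empty error marks a step as errored, and the erroredSteps helper itself are assumptions.

```typescript
// Field names follow the ingest payload; optional/nullable typing is assumed.
interface CapturedStep {
  name?: string;
  status?: string;
  error?: string | null;
}

interface CapturedRun {
  intent_label?: string;
  outcome?: string;
  steps?: CapturedStep[];
}

// Hypothetical helper: every step that recorded an error, tagged with its run's intent.
function erroredSteps(events: CapturedRun[]): { intent: string; error: string }[] {
  const found: { intent: string; error: string }[] = [];
  events.forEach((run) => {
    (run.steps ?? []).forEach((step) => {
      if (step.error) {
        found.push({ intent: run.intent_label ?? "(none)", error: step.error });
      }
    });
  });
  return found;
}

// In a test: expect erroredSteps(sent) to be empty for a healthy run.
```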

Debug from your editor

Once a run looks wrong, you don't have to leave your editor to find out why. @intencion/mcp is a read-only MCP server that lets your coding agent (Claude Code, Cursor) query your captured runs and walk you to the root cause, with the failing step and its exact error quoted as evidence. Ask "why did my agent fail?" and it chains the lookups for you, no run ids to copy by hand.

Add it with your API key from the Install screen. In Claude Code it's one command:

Claude Code

claude mcp add intencion --env INTENCION_API_KEY=in_pk_... -- npx -y @intencion/mcp

In Cursor, or any client that takes an mcpServers config:

mcp.json

{
  "mcpServers": {
    "intencion": {
      "command": "npx",
      "args": ["-y", "@intencion/mcp"],
      "env": { "INTENCION_API_KEY": "in_pk_..." }
    }
  }
}

Then just ask. Behind one question the agent triages the success rate, finds the failing intent and its worst step, pulls a concrete failing run, and diagnoses it, quoting the literal error rather than guessing:

in your editor

why does my order-status agent keep failing?

The server only ever reads, through your key-scoped API — no database access, no writes. It surfaces whatever you've captured to the model in your editor: with content capture on that includes prompt and tool text, already redacted. Keep it metadata-only with captureContent: false.

Coverage

Auto-instrumentation patches the official OpenAI and Anthropic client classes. Because it patches at the class level, it also captures calls that frameworks make through those clients (LangChain, the OpenAI Agents SDK, LlamaIndex) with no extra setup.

Auto-captured: any code path that calls the official openai or @anthropic-ai/sdk client, directly or through a framework.
Capture manually: stacks that bypass those clients (e.g. the Vercel AI SDK, raw HTTP, other providers). Wrap them in ix.run() and record steps with run.tool().
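
For a stack that bypasses the official clients, manual capture might look like the sketch below. RunHandle is a structural stand-in that assumes only the run.tool(name, target, fn) call shown under Record a run; the completeViaHttp function, the "model_call" step name, and the "custom-provider" target are hypothetical.

```typescript
// Structural stand-in for the run handle passed to the ix.run() callback.
interface RunHandle {
  tool<T>(name: string, target: string, fn: () => Promise<T>): Promise<T>;
}

// Hypothetical: a model call made through your own HTTP code, recorded as a step by hand.
async function completeViaHttp(
  run: RunHandle,
  send: (prompt: string) => Promise<string>, // your own request function
  prompt: string,
): Promise<string> {
  // run.tool times the call and records it on the run's timeline.
  return run.tool("model_call", "custom-provider", () => send(prompt));
}

// Usage inside a run:
//   await ix.run({ input, user }, (run) => completeViaHttp(run, callMyProvider, input));
```

Typing against a small interface rather than the SDK class keeps the function easy to exercise in tests with a fake run.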

Privacy

Emails, credit-card numbers, US SSNs, and phone numbers are stripped before anything is stored, so those values never reach our database. Enterprise adds SSO, audit logs, data residency, and a self-hosted option (see pricing).

Redaction runs in the SDK, before the payload leaves your process.