Caps and the Three-Way Decision Model in Cycles

Most budget systems make a binary decision: allow or deny.

Cycles adds a third option.

When a reservation or decide request is evaluated, the server returns one of three decisions:

ALLOW — sufficient budget exists, proceed normally
ALLOW_WITH_CAPS — sufficient budget exists, but soft constraints apply
DENY — insufficient budget or policy block

The middle option — ALLOW_WITH_CAPS — is what makes Cycles more useful than a simple gate.

Why binary decisions are not enough

In practice, many autonomous actions can still produce value in a constrained mode.

A model call does not have to use the maximum token budget. An agent does not have to invoke every available tool. A workflow does not have to run at full concurrency.

If the only options are "full speed" or "stop," teams end up choosing one extreme:

budgets are set too loose (to avoid breaking production)
or budgets are set too tight (and useful work gets denied unnecessarily)

Caps provide a middle ground.

What caps are

Caps are server-imposed constraints that the client should respect when the decision is ALLOW_WITH_CAPS.

The protocol defines five cap fields:

max_tokens

The maximum number of tokens the action should consume.

This lets the server say: "you may proceed, but limit output length."

max_steps_remaining

The maximum number of remaining steps for an agent or loop.

This lets the server say: "you may proceed, but wrap up soon."

tool_allowlist

A list of tools the action is allowed to use.

If present, only these tools may be invoked. All others are implicitly denied.

tool_denylist

A list of tools the action is not allowed to use.

If present, all tools except these may be invoked.

The allowlist takes precedence: if tool_allowlist is non-empty, the denylist is ignored.

cooldown_ms

A delay in milliseconds before the next action should begin.

This lets the server throttle execution without denying it outright.

How caps flow through the system

Caps are returned by the server in two places:

Reservation responses — when the decision on POST /v1/reservations is ALLOW_WITH_CAPS
Decide responses — when the decision on POST /v1/decide is ALLOW_WITH_CAPS

Caps are only present when the decision is ALLOW_WITH_CAPS. They are absent for ALLOW and DENY.

The client is responsible for respecting caps. The server does not enforce them at commit time. They are guidance, not hard blocks.

When caps appear

The server may return caps based on:

budget pressure at one or more scopes
policy rules that apply to the action class
operator-configured constraints
approaching budget limits that do not yet warrant denial

For example, a scope may have enough budget for the reservation, but the server may still want to signal: "budget is getting tight — reduce token usage" or "avoid expensive tools for the rest of this run."

Using caps in practice

Model calls

When max_tokens is returned, the client should pass it to the model provider as a generation limit.

This reduces the cost of the call without denying it entirely.

Agent loops

When max_steps_remaining is returned, the agent should plan to finish within that many steps.

This creates a bounded wind-down instead of an abrupt stop.

Tool selection

When tool_allowlist or tool_denylist is returned, the agent should filter its available tools accordingly.

This narrows the action surface under budget pressure without eliminating all capability.

Pacing

When cooldown_ms is returned, the client should wait that long before making the next action.

This reduces execution velocity without stopping the workflow.

Caps in client code

Python

Inside a @cycles-decorated function, access caps through get_cycles_context():

python

from runcycles import cycles, get_cycles_context

@cycles(estimate=1000)
def process(prompt: str) -> str:
    ctx = get_cycles_context()
    if ctx.has_caps():
        if ctx.caps.max_tokens:
            # limit generation length
            pass
        if not ctx.caps.is_tool_allowed("web.search"):
            # skip web search tool
            pass
    return call_llm(prompt)

Java (Spring Boot)

The Spring Boot client makes caps available through the CyclesReservationContext:

java

CyclesReservationContext ctx = CyclesContextHolder.get();
if (ctx.hasCaps()) {
    Caps caps = ctx.getCaps();
    if (caps.getMaxTokens() != null) {
        // limit generation length
    }
    if (!caps.isToolAllowed("web.search")) {
        // skip web search tool
    }
}

The @Cycles annotation logs a warning when ALLOW_WITH_CAPS is returned, so teams can see when caps are being applied even without explicit handling.

Caps are advisory

Caps are not enforced by the server at commit time.

If the client ignores max_tokens and generates more output, the commit will still succeed (subject to the overage policy).

But respecting caps is important for two reasons:

It keeps budget consumption aligned with server expectations
It prevents the system from escalating into harder denials on subsequent requests

Caps are a signal to self-regulate before enforcement becomes necessary.

Tool list precedence

The tool filtering logic follows a clear precedence:

If tool_allowlist is non-empty, only those tools are allowed (denylist is ignored)
If tool_allowlist is empty but tool_denylist is non-empty, all tools except those in the denylist are allowed
If both are empty or absent, no tool restrictions apply

Tool names are case-sensitive and match the Action.name field exactly.

How caps relate to degradation paths

Caps are the protocol-level mechanism behind the degradation strategies described in the Cycles documentation:

Downgrade — reduce model size or generation length → max_tokens
Disable — remove expensive tools → tool_allowlist / tool_denylist
Defer — slow down execution → cooldown_ms
Limit scope — bound remaining steps → max_steps_remaining

Caps make these degradation paths concrete and server-driven rather than hardcoded in the client.

Summary

The three-way decision model — ALLOW, ALLOW_WITH_CAPS, DENY — gives Cycles a richer control surface than binary allow/deny.

Caps provide server-driven guidance that lets clients:

reduce token consumption
limit remaining steps
restrict tool usage
pace execution

This creates a smoother degradation curve between full execution and hard denial.

That is how budget pressure becomes a signal for adjustment, not just a gate.

Next steps

To explore the Cycles stack:

Read the Cycles Protocol
Run the Cycles Server
Manage budgets with Cycles Admin
Integrate with Python using the Python Client
Integrate with TypeScript using the TypeScript Client
Integrate with Spring AI using the Spring Client

Caps and the Three-Way Decision Model in Cycles ​

Why binary decisions are not enough ​

What caps are ​

max_tokens ​

max_steps_remaining ​

tool_allowlist ​

tool_denylist ​

cooldown_ms ​

How caps flow through the system ​

When caps appear ​

Using caps in practice ​

Model calls ​

Agent loops ​

Tool selection ​

Pacing ​

Caps in client code ​

Python ​

Java (Spring Boot) ​

Caps are advisory ​

Tool list precedence ​

How caps relate to degradation paths ​

Summary ​

Next steps ​