Skip to content

Caps and the Three-Way Decision Model in Cycles

Most budget systems make a binary decision: allow or deny.

Cycles adds a third option.

When a reservation or decide request is evaluated, the server returns one of three decisions:

  • ALLOW — sufficient budget exists, proceed normally
  • ALLOW_WITH_CAPS — sufficient budget exists, but soft constraints apply
  • DENY — insufficient budget or policy block

The middle option — ALLOW_WITH_CAPS — is what makes Cycles more useful than a simple gate.

Why binary decisions are not enough

In practice, many autonomous actions can still produce value in a constrained mode.

A model call does not have to use the maximum token budget. An agent does not have to invoke every available tool. A workflow does not have to run at full concurrency.

If the only options are "full speed" or "stop," teams end up choosing one extreme:

  • budgets are set too loose (to avoid breaking production)
  • or budgets are set too tight (and useful work gets denied unnecessarily)

Caps provide a middle ground.

What caps are

Caps are server-imposed constraints that the client should respect when the decision is ALLOW_WITH_CAPS.

The protocol defines five cap fields:

max_tokens

The maximum number of tokens the action should consume.

This lets the server say: "you may proceed, but limit output length."

max_steps_remaining

The maximum number of remaining steps for an agent or loop.

This lets the server say: "you may proceed, but wrap up soon."

tool_allowlist

A list of tools the action is allowed to use.

If present, only these tools may be invoked. All others are implicitly denied.

tool_denylist

A list of tools the action is not allowed to use.

If present, all tools except these may be invoked.

The allowlist takes precedence: if tool_allowlist is non-empty, the denylist is ignored.

cooldown_ms

A delay in milliseconds before the next action should begin.

This lets the server throttle execution without denying it outright.

How caps flow through the system

Caps are returned by the server in two places:

  1. Reservation responses — when the decision on POST /v1/reservations is ALLOW_WITH_CAPS
  2. Decide responses — when the decision on POST /v1/decide is ALLOW_WITH_CAPS

Caps are only present when the decision is ALLOW_WITH_CAPS. They are absent for ALLOW and DENY.

The client is responsible for respecting caps. The server does not enforce them at commit time. They are guidance, not hard blocks.

When caps appear

The server may return caps based on:

  • budget pressure at one or more scopes
  • policy rules that apply to the action class
  • operator-configured constraints
  • approaching budget limits that do not yet warrant denial

For example, a scope may have enough budget for the reservation, but the server may still want to signal: "budget is getting tight — reduce token usage" or "avoid expensive tools for the rest of this run."

Using caps in practice

Model calls

When max_tokens is returned, the client should pass it to the model provider as a generation limit.

This reduces the cost of the call without denying it entirely.

Agent loops

When max_steps_remaining is returned, the agent should plan to finish within that many steps.

This creates a bounded wind-down instead of an abrupt stop.

Tool selection

When tool_allowlist or tool_denylist is returned, the agent should filter its available tools accordingly.

This narrows the action surface under budget pressure without eliminating all capability.

Pacing

When cooldown_ms is returned, the client should wait that long before making the next action.

This reduces execution velocity without stopping the workflow.

Caps in client code

Python

Inside a @cycles-decorated function, access caps through get_cycles_context():

python
from runcycles import cycles, get_cycles_context

@cycles(estimate=1000)
def process(prompt: str) -> str:
    ctx = get_cycles_context()
    if ctx.has_caps():
        if ctx.caps.max_tokens:
            # limit generation length
            pass
        if not ctx.caps.is_tool_allowed("web.search"):
            # skip web search tool
            pass
    return call_llm(prompt)

Java (Spring Boot)

The Spring Boot client makes caps available through the CyclesReservationContext:

java
CyclesReservationContext ctx = CyclesContextHolder.get();
if (ctx.hasCaps()) {
    Caps caps = ctx.getCaps();
    if (caps.getMaxTokens() != null) {
        // limit generation length
    }
    if (!caps.isToolAllowed("web.search")) {
        // skip web search tool
    }
}

The @Cycles annotation logs a warning when ALLOW_WITH_CAPS is returned, so teams can see when caps are being applied even without explicit handling.

Caps are advisory

Caps are not enforced by the server at commit time.

If the client ignores max_tokens and generates more output, the commit will still succeed (subject to the overage policy).

But respecting caps is important for two reasons:

  1. It keeps budget consumption aligned with server expectations
  2. It prevents the system from escalating into harder denials on subsequent requests

Caps are a signal to self-regulate before enforcement becomes necessary.

Tool list precedence

The tool filtering logic follows a clear precedence:

  1. If tool_allowlist is non-empty, only those tools are allowed (denylist is ignored)
  2. If tool_allowlist is empty but tool_denylist is non-empty, all tools except those in the denylist are allowed
  3. If both are empty or absent, no tool restrictions apply

Tool names are case-sensitive and match the Action.name field exactly.

How caps relate to degradation paths

Caps are the protocol-level mechanism behind the degradation strategies described in the Cycles documentation:

  • Downgrade — reduce model size or generation length → max_tokens
  • Disable — remove expensive tools → tool_allowlist / tool_denylist
  • Defer — slow down execution → cooldown_ms
  • Limit scope — bound remaining steps → max_steps_remaining

Caps make these degradation paths concrete and server-driven rather than hardcoded in the client.

Summary

The three-way decision model — ALLOW, ALLOW_WITH_CAPS, DENY — gives Cycles a richer control surface than binary allow/deny.

Caps provide server-driven guidance that lets clients:

  • reduce token consumption
  • limit remaining steps
  • restrict tool usage
  • pace execution

This creates a smoother degradation curve between full execution and hard denial.

That is how budget pressure becomes a signal for adjustment, not just a gate.

Next steps

To explore the Cycles stack: