Caps and the Three-Way Decision Model in Cycles
Most budget systems make a binary decision: allow or deny.
Cycles adds a third option.
When a reservation or decide request is evaluated, the server returns one of three decisions:
- ALLOW — sufficient budget exists, proceed normally
- ALLOW_WITH_CAPS — sufficient budget exists, but soft constraints apply
- DENY — insufficient budget or policy block
The middle option — ALLOW_WITH_CAPS — is what makes Cycles more useful than a simple gate.
Why binary decisions are not enough
In practice, many autonomous actions can still produce value in a constrained mode.
A model call does not have to use the maximum token budget. An agent does not have to invoke every available tool. A workflow does not have to run at full concurrency.
If the only options are "full speed" or "stop," teams end up choosing one extreme:
- budgets are set too loose (to avoid breaking production)
- or budgets are set too tight (and useful work gets denied unnecessarily)
Caps provide a middle ground.
What caps are
Caps are server-imposed constraints that the client should respect when the decision is ALLOW_WITH_CAPS.
The protocol defines five cap fields:
max_tokens
The maximum number of tokens the action should consume.
This lets the server say: "you may proceed, but limit output length."
max_steps_remaining
The maximum number of remaining steps for an agent or loop.
This lets the server say: "you may proceed, but wrap up soon."
tool_allowlist
A list of tools the action is allowed to use.
If present, only these tools may be invoked. All others are implicitly denied.
tool_denylist
A list of tools the action is not allowed to use.
If present, all tools except these may be invoked.
The allowlist takes precedence: if tool_allowlist is non-empty, the denylist is ignored.
cooldown_ms
A delay in milliseconds before the next action should begin.
This lets the server throttle execution without denying it outright.
How caps flow through the system
Caps are returned by the server in two places:
- Reservation responses — when the decision on
POST /v1/reservationsis ALLOW_WITH_CAPS - Decide responses — when the decision on
POST /v1/decideis ALLOW_WITH_CAPS
Caps are only present when the decision is ALLOW_WITH_CAPS. They are absent for ALLOW and DENY.
The client is responsible for respecting caps. The server does not enforce them at commit time. They are guidance, not hard blocks.
When caps appear
The server may return caps based on:
- budget pressure at one or more scopes
- policy rules that apply to the action class
- operator-configured constraints
- approaching budget limits that do not yet warrant denial
For example, a scope may have enough budget for the reservation, but the server may still want to signal: "budget is getting tight — reduce token usage" or "avoid expensive tools for the rest of this run."
Using caps in practice
Model calls
When max_tokens is returned, the client should pass it to the model provider as a generation limit.
This reduces the cost of the call without denying it entirely.
Agent loops
When max_steps_remaining is returned, the agent should plan to finish within that many steps.
This creates a bounded wind-down instead of an abrupt stop.
Tool selection
When tool_allowlist or tool_denylist is returned, the agent should filter its available tools accordingly.
This narrows the action surface under budget pressure without eliminating all capability.
Pacing
When cooldown_ms is returned, the client should wait that long before making the next action.
This reduces execution velocity without stopping the workflow.
Caps in client code
Python
Inside a @cycles-decorated function, access caps through get_cycles_context():
from runcycles import cycles, get_cycles_context
@cycles(estimate=1000)
def process(prompt: str) -> str:
ctx = get_cycles_context()
if ctx.has_caps():
if ctx.caps.max_tokens:
# limit generation length
pass
if not ctx.caps.is_tool_allowed("web.search"):
# skip web search tool
pass
return call_llm(prompt)Java (Spring Boot)
The Spring Boot client makes caps available through the CyclesReservationContext:
CyclesReservationContext ctx = CyclesContextHolder.get();
if (ctx.hasCaps()) {
Caps caps = ctx.getCaps();
if (caps.getMaxTokens() != null) {
// limit generation length
}
if (!caps.isToolAllowed("web.search")) {
// skip web search tool
}
}The @Cycles annotation logs a warning when ALLOW_WITH_CAPS is returned, so teams can see when caps are being applied even without explicit handling.
Caps are advisory
Caps are not enforced by the server at commit time.
If the client ignores max_tokens and generates more output, the commit will still succeed (subject to the overage policy).
But respecting caps is important for two reasons:
- It keeps budget consumption aligned with server expectations
- It prevents the system from escalating into harder denials on subsequent requests
Caps are a signal to self-regulate before enforcement becomes necessary.
Tool list precedence
The tool filtering logic follows a clear precedence:
- If
tool_allowlistis non-empty, only those tools are allowed (denylist is ignored) - If
tool_allowlistis empty buttool_denylistis non-empty, all tools except those in the denylist are allowed - If both are empty or absent, no tool restrictions apply
Tool names are case-sensitive and match the Action.name field exactly.
How caps relate to degradation paths
Caps are the protocol-level mechanism behind the degradation strategies described in the Cycles documentation:
- Downgrade — reduce model size or generation length →
max_tokens - Disable — remove expensive tools →
tool_allowlist/tool_denylist - Defer — slow down execution →
cooldown_ms - Limit scope — bound remaining steps →
max_steps_remaining
Caps make these degradation paths concrete and server-driven rather than hardcoded in the client.
Summary
The three-way decision model — ALLOW, ALLOW_WITH_CAPS, DENY — gives Cycles a richer control surface than binary allow/deny.
Caps provide server-driven guidance that lets clients:
- reduce token consumption
- limit remaining steps
- restrict tool usage
- pace execution
This creates a smoother degradation curve between full execution and hard denial.
That is how budget pressure becomes a signal for adjustment, not just a gate.
Next steps
To explore the Cycles stack:
- Read the Cycles Protocol
- Run the Cycles Server
- Manage budgets with Cycles Admin
- Integrate with Python using the Python Client
- Integrate with TypeScript using the TypeScript Client
- Integrate with Spring AI using the Spring Client
