Safety screen - HermesCo

NemoClaw

NemoClaw is the safe-runtime screen that runs on every spend. It combines a deterministic policy layer (the inviolable part) with an NVIDIA Nemotron classification (the judgement part). The deterministic layer is decisive on its own, so a spend can never slip through just because the model was slow or unavailable. The implementation lives in src/lib/hermesco/safety.ts.

Order of evaluation

screenSpend(amount, text, budget)
  1. prohibitedVerdict(text)     -> blocked if it matches a prohibited-use pattern
  2. capVerdict(amount, budget)  -> blocked if amount > per-action cap
  3. classifyWithNemotron(...)   -> Nemotron risk classification
  4. capVerdict(amount, budget)  -> caps re-checked as the final word

The hard cap is checked before and after the model. The model can tighten the outcome but it can never loosen a cap.
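The ordering above can be sketched roughly as follows. This is an illustrative sketch, not the contents of safety.ts: the injected `classify` parameter, the `stricter` helper, the `SEVERITY` table, and the placeholder `PROHIBITED` pattern are all assumptions made so the example runs on its own.

```typescript
type Risk = "safe" | "review" | "blocked";

interface SafetyVerdict {
  risk: Risk;
  reason: string;
}

interface Budget {
  maxSpendPerActionUsd: number;
  autoApproveUnderUsd: number;
}

// Hypothetical classifier shape: resolves to null when the model has no answer.
type Classifier = (amountUsd: number, text: string) => Promise<SafetyVerdict | null>;

// Assumed severity order, used to keep the stricter of two verdicts.
const SEVERITY: Record<Risk, number> = { safe: 0, review: 1, blocked: 2 };

function stricter(a: SafetyVerdict, b: SafetyVerdict): SafetyVerdict {
  return SEVERITY[b.risk] > SEVERITY[a.risk] ? b : a;
}

// Placeholder pattern only; the real prohibited-use patterns are not shown here.
const PROHIBITED: RegExp[] = [/\bweapons?\b/i];

function prohibitedVerdict(text: string): SafetyVerdict | null {
  return PROHIBITED.some((p) => p.test(text))
    ? { risk: "blocked", reason: "Matches a prohibited-use pattern." }
    : null;
}

function capVerdict(amountUsd: number, budget: Budget): SafetyVerdict {
  if (amountUsd > budget.maxSpendPerActionUsd) {
    return { risk: "blocked", reason: `Exceeds the per-action hard cap ($${budget.maxSpendPerActionUsd}).` };
  }
  if (amountUsd >= budget.autoApproveUnderUsd) {
    return { risk: "review", reason: `$${amountUsd} is at or above the auto-approve threshold.` };
  }
  return { risk: "safe", reason: "Within the auto-approve band and all hard caps." };
}

async function screenSpend(
  amountUsd: number,
  text: string,
  budget: Budget,
  classify: Classifier,
): Promise<SafetyVerdict> {
  // Steps 1 and 2: deterministic blocks short-circuit before any model call.
  const prohibited = prohibitedVerdict(text);
  if (prohibited) return prohibited;
  const cap = capVerdict(amountUsd, budget);
  if (cap.risk === "blocked") return cap;

  // Step 3: a failed model call is treated as "no opinion".
  const model = await classify(amountUsd, text).catch(() => null);

  // Step 4: caps are re-checked; the model may only tighten the result.
  const finalCap = capVerdict(amountUsd, budget);
  return model ? stricter(finalCap, model) : finalCap;
}
```

Because the final step keeps whichever verdict is stricter, a model that answers "safe" for a spend in the review band leaves it at review, while a model that answers "blocked" for a small spend does block it.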

The deterministic gate

export function capVerdict(amountUsd: number, budget: Budget): SafetyVerdict {
  if (amountUsd > budget.maxSpendPerActionUsd) {
    return {
      risk: "blocked",
      reason: `Exceeds the per-action hard cap ($${budget.maxSpendPerActionUsd}). NemoClaw refuses.`,
    };
  }
  if (amountUsd >= budget.autoApproveUnderUsd) {
    return {
      risk: "review",
      reason: `$${amountUsd} is at or above the auto-approve threshold ($${budget.autoApproveUnderUsd}). Needs a human tap.`,
    };
  }
  return { risk: "safe", reason: "Within the auto-approve band and all hard caps." };
}

A verdict has three levels:

Risk	Meaning	Outcome
`blocked`	Over a hard cap or prohibited	Refused in code, no money moves
`review`	Allowed but not trivial	Held for a human decision
`safe`	Small and within all caps	Auto-approved, then caps re-checked at execution
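One way to keep the table above honest in code is an exhaustive switch over the risk level, so that adding a fourth level fails to compile until it is handled. The `Outcome` names and the `outcomeFor` function below are hypothetical and only illustrate the mapping.

```typescript
type Risk = "blocked" | "review" | "safe";

// Hypothetical outcome labels mirroring the "Outcome" column.
type Outcome = "refuse" | "hold_for_human" | "auto_approve";

function outcomeFor(risk: Risk): Outcome {
  switch (risk) {
    case "blocked":
      return "refuse"; // refused in code, no money moves
    case "review":
      return "hold_for_human"; // held for a human decision
    case "safe":
      return "auto_approve"; // caps are still re-checked at execution
    default: {
      // Compile-time exhaustiveness check: this branch is unreachable today.
      const unreachable: never = risk;
      throw new Error(`Unhandled risk level: ${String(unreachable)}`);
    }
  }
}
```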

Nemotron classification

When the deterministic layer does not already block, NVIDIA Nemotron classifies the spend. The screen calls NVIDIA’s own API (integrate.api.nvidia.com) when an NVIDIA_API_KEY is configured, and otherwise reaches the same Nemotron model (nvidia/nemotron-3-ultra-550b-a55b) through OpenRouter:

classifyWithNemotron({ ... })  ->  chatComplete({ provider, modelId: SAFETY_MODEL_ID, ... })

Nemotron is a reasoning model, so the prompt opens with /no_think and the call allows a wider token budget; together these make it return the verdict JSON directly instead of spending tokens reasoning out loud. If the model is unavailable or times out, the function returns null and the deterministic verdict stands. The safety outcome therefore never depends on model availability.
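The provider choice and the null-on-failure behaviour described above might look like the sketch below. It is a sketch under stated assumptions: `ChatFn` stands in for the real chatComplete transport (whose signature is not shown on this page), and `pickProvider`, `parseVerdict`, `withTimeout`, `classifyWithFallback`, the prompt wording, and the 5000 ms default are all hypothetical.

```typescript
type Risk = "safe" | "review" | "blocked";

interface SafetyVerdict {
  risk: Risk;
  reason: string;
}

type Provider = "nvidia" | "openrouter";

// Hypothetical transport: sends a prompt and resolves to the raw model text.
type ChatFn = (provider: Provider, prompt: string) => Promise<string>;

// Prefer NVIDIA's own API when a key is configured, otherwise OpenRouter.
function pickProvider(env: Record<string, string | undefined>): Provider {
  return env.NVIDIA_API_KEY ? "nvidia" : "openrouter";
}

function isRisk(value: unknown): value is Risk {
  return value === "safe" || value === "review" || value === "blocked";
}

// Anything that is not well-formed verdict JSON counts as "no answer".
function parseVerdict(raw: string): SafetyVerdict | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const { risk, reason } = data as { risk?: unknown; reason?: unknown };
    if (!isRisk(risk) || typeof reason !== "string") return null;
    return { risk, reason };
  } catch {
    return null;
  }
}

// Resolves to null if the work rejects or takes longer than `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | null> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(null); },
    );
  });
}

async function classifyWithFallback(
  chat: ChatFn,
  env: Record<string, string | undefined>,
  spend: string,
  timeoutMs = 5000,
): Promise<SafetyVerdict | null> {
  // The prompt opens with /no_think; the remaining wording is illustrative.
  const prompt = `/no_think\nClassify this spend. Reply with JSON {"risk", "reason"} only.\n${spend}`;
  const raw = await withTimeout(chat(pickProvider(env), prompt), timeoutMs);
  return raw === null ? null : parseVerdict(raw);
}
```

Injecting the transport keeps the fallback logic testable without network access: a stub that never resolves exercises the timeout path, and one that returns malformed text exercises the parse path.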

What the agent sees

The agent is told the verdict in plain terms so it behaves correctly without retry loops:

Awaiting approval: it pauses and explains what it needs and why.
Refused by the Treasury: it does not retry the spend.
Failed a hard cap: it does not retry.

This keeps the agent honest about money it does not have and spends it cannot make.
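The three agent-facing states above could be encoded as a small lookup so that every state carries an explicit instruction. The status names, the `AgentGuidance` shape, and `guidanceFor` are hypothetical and only illustrate the idea.

```typescript
// Hypothetical status names for the three states listed above.
type SpendStatus = "awaiting_approval" | "refused_by_treasury" | "failed_hard_cap";

interface AgentGuidance {
  action: "pause_and_explain" | "do_not_retry";
  instruction: string;
}

// Record<SpendStatus, ...> forces an entry for every status at compile time.
const GUIDANCE: Record<SpendStatus, AgentGuidance> = {
  awaiting_approval: {
    action: "pause_and_explain",
    instruction: "Pause and explain what you need and why.",
  },
  refused_by_treasury: {
    action: "do_not_retry",
    instruction: "The Treasury refused this spend. Do not retry it.",
  },
  failed_hard_cap: {
    action: "do_not_retry",
    instruction: "This spend failed a hard cap. Do not retry it.",
  },
};

function guidanceFor(status: SpendStatus): AgentGuidance {
  return GUIDANCE[status];
}
```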
