Lesson #1453

External CLI agent caveats: gemini, codex, or-agent quirks

ID: 1453
Author: ai
Agent: agent-claude
Reviewed: ✓ Yes
Source authority: 75 / 100
Source: Lessons from 2026-05-04 fan-out: which external CLI works for what, what fails how
Source issue: —
Created at: 2026-05-12T10:00:22.870777+00:00
Valid until: —
Deprecated at: —
Supersedes: —
Obsidian path: /root/.claude/projects/-nvmetank1-projects/memory/feedback_external_cli_caveats.md
Obsidian hash: 16ad1509ba4e567f38a354403eefc173
Tags: claude-memory,feedback

Content

**Context:** Discovered while orchestrating fix(hero) + fix(buttons) yoga
work via external CLIs on 2026-05-04. Each CLI has hard limits worth
remembering before delegating.

## or-agent (npm @openrouter/agent + sdk)

**Strength:** Full agent loop with read_file/write_file/list_dir/run_bash/rag_search tools. Streaming output.

**Failure mode (free models):** With `qwen/qwen3-coder` (free) the agent loop **stalls after ~5-7 turns** on multi-file investigation tasks. Output is truncated mid-plan ("I'll start by examining... Now let me check..."), and the run then exits with a very low output token count (~70 tokens) despite a 25-turn budget. Suspected cause: the combination of free-tier streaming and tool calling.

**Mitigation:**
1. Use **paid `qwen/qwen3-coder-plus`** (~$0.20/M input). More robust under tool-calling pressure.
2. Tighten the prompt to an **explicit cap of fewer than 10 turns**, with numbered steps ("Step 1: read X, Step 2: edit Y").
3. Cap file operations in the prompt, e.g. "Max 3 reads, 1 write".
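
The prompt-tightening advice above can be sketched as a small helper. This is a minimal illustration, not part of or-agent: the function name `build_step_prompt` and its parameters are hypothetical, and the exact wording of the limits is an assumption.

```python
def build_step_prompt(steps, max_turns=8, max_reads=3, max_writes=1):
    """Build a tightly scoped prompt with numbered steps and explicit caps.

    Hypothetical helper: it only formats text and knows nothing about or-agent.
    """
    if not steps:
        raise ValueError("at least one step is required")
    if max_turns >= 10:
        raise ValueError("keep the explicit turn cap below 10")
    lines = [
        f"Hard limits: at most {max_turns} turns, "
        f"max {max_reads} reads, max {max_writes} write(s).",
        "Follow these steps in order and stop when done:",
    ]
    # Number the steps so the model has no room to improvise a plan.
    lines += [f"Step {i}: {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_step_prompt(["read X", "edit Y"]))
```

Rejecting `max_turns >= 10` in code keeps the cap from drifting upward when prompts are copied between tasks.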

## gemini-cli (headless `-p` mode)

**Strength:** 1M context window, fast. Read-only `--approval-mode plan` is safe.

**Failure modes:**
1. **`run_shell_command` is NOT available in headless mode.** Available tools are `read_file`, `replace`, `update_topic`, `grep_search`, `list_directory`. So no `git log`, no `cat`, no shell.
2. **Workspace path-restriction:** when invoked from a worktree (e.g. `/foo/.claude/worktrees/A1/`), absolute paths to the parent repo (`/foo/templates/...`) are **rejected** as "outside allowed workspace". Use ONLY relative paths.
3. **Hallucinated output:** When tool-calls fail repeatedly, gemini may end with text like "I have created the file..." but the file does NOT exist. Always verify with `ls` after.
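
One way to avoid the path-restriction failure is to normalize every path against the worktree before it goes into a prompt. The sketch below assumes the workspace root is simply the directory gemini is invoked from; `to_workspace_relative` is a hypothetical helper, not a gemini-cli feature.

```python
import os


def to_workspace_relative(path, workspace):
    """Return `path` relative to `workspace`, or raise if it lies outside it.

    Assumption: the allowed workspace is exactly the invocation directory.
    """
    workspace = os.path.abspath(workspace)
    # os.path.join keeps `path` unchanged when it is already absolute.
    target = os.path.abspath(os.path.join(workspace, path))
    if os.path.commonpath([workspace, target]) != workspace:
        raise ValueError(f"{path!r} is outside workspace {workspace!r}")
    return os.path.relpath(target, workspace)


if __name__ == "__main__":
    root = "/foo/.claude/worktrees/A1"
    print(to_workspace_relative(f"{root}/templates/base.html", root))
```

Raising early on parent-repo paths such as `/foo/templates/...` surfaces the problem before a headless run is spent on rejected tool calls.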

**Mitigation:**
- Use gemini for **read-and-analyze tasks on a small file set**, with relative paths.
- Don't ask gemini to invoke shell commands.
- Verify outputs after gemini exits.
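
The "verify outputs" step can be automated instead of relying on a manual `ls`. A minimal sketch, assuming the caller knows which files the run was supposed to produce; `missing_outputs` is a hypothetical name, and treating empty files as missing is an added assumption.

```python
from pathlib import Path


def missing_outputs(expected_paths, workspace="."):
    """Return the expected files that are absent or empty after a run."""
    root = Path(workspace)
    missing = []
    for rel in expected_paths:
        candidate = root / rel
        # A claimed "I have created the file..." counts only if the file
        # exists on disk and has content.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Hypothetical file name, purely for illustration.
    print(missing_outputs(["notes/review.md"]))
```

An empty return value means every claimed file is really there; anything else should be treated as a failed run regardless of what the final message says.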

## codex-cli (ChatGPT Plus)

**Strength:** Strong at code-review, pytest, small coding tasks. Free via subscription.

**Hard constraints:**
- The constitution sets `agent-codex.path_prefix_allow = ["/nvmetank1/projects/glug"]`, so only that prefix is writable.
  - Cannot write to: yoga, rag-stack, pokemon, /docker/yoga, /etc.
- The trust list in `/root/.codex/config.toml` is per-project. Yoga and rag-stack ARE trusted as of 2026-05-04 (visible in the config), but the constitution policy still wins for path prefixes.
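
Before delegating a write to codex, the path-prefix rule can be checked locally. This sketch assumes the constitution matches whole path components (so `/nvmetank1/projects/glug2` would not match); how the real policy compares prefixes is not stated here, and `codex_can_write` is a hypothetical helper.

```python
import os

# Copied from the constitution line above.
ALLOWED_PREFIXES = ["/nvmetank1/projects/glug"]


def codex_can_write(path, allowed_prefixes=ALLOWED_PREFIXES):
    """Return True if `path` sits under one of the allowed prefixes.

    Assumption: matching is component-wise, not a raw string prefix.
    """
    target = os.path.abspath(path)
    for prefix in allowed_prefixes:
        prefix = os.path.abspath(prefix)
        if os.path.commonpath([prefix, target]) == prefix:
            return True
    return False


if __name__ == "__main__":
    print(codex_can_write("/nvmetank1/projects/glug/tests/test_api.py"))
    print(codex_can_write("/nvmetank1/projects/yoga/static/hero.css"))
```

The trust list in `config.toml` is deliberately ignored here, since the constitution policy is the one that decides.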

**When to use:** glug-only tasks, especially pytest-write and code-review.

## qwen-local (Ollama, qwen3-coder:30b)

**Strength:** No quota, no rate-limit, runs locally on host GPU.

**Constraint:** No agent-loop wrapper out of the box. Use as one-shot prompt → response. Not for multi-turn read-edit-write.

**When to use:** Bulk classify/format/summarize, single-pass code-snippets.
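
A one-shot call against a local Ollama server might look like the sketch below. It assumes Ollama is listening on its default port (11434) and that the model tag is `qwen3-coder:30b` as named in the heading; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed default local endpoint; adjust if Ollama runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="qwen3-coder:30b"):
    """Build a non-streaming, single-turn generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}


def one_shot(prompt, model="qwen3-coder:30b", url=OLLAMA_URL, timeout=300):
    """Send one prompt and return the full response text (no agent loop)."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    print(one_shot("Summarize in one line: or-agent stalls on free models."))
```

Setting `stream` to false fits the one-shot use: the caller gets a single JSON object back and never has to manage turns or tool calls.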

## Decision matrix

| Task | First pick | Fallback |
|---|---|---|
| Yoga CSS fix (write) | or-agent paid `qwen3-coder-plus` | Claude (last resort) |
| Glug pytest write | codex `gpt-5` | sonnet via subagent |
| Long-context review (any repo) | gemini `-p --approval-mode plan` | claude opus |
| Bulk refactor (e.g. plugin renames) | or-agent + qwen3-coder | qwen-local |
| Code snippet, single function | or-chat `coder` shortname | sub-agent code-snippet |
| Issue classification → JSON | sub-agent classify (free) | or-agent gpt-oss-20b |
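
The matrix can also be kept as a lookup table so that scripts pick the same agents as humans do. The values below are copied from the table; the task keys and the `pick_agent` function are hypothetical labels introduced for illustration.

```python
# (first pick, fallback) pairs copied from the decision matrix.
# The dictionary keys are hypothetical short labels, not real task types.
DECISION_MATRIX = {
    "yoga_css_fix": ("or-agent paid qwen3-coder-plus", "Claude (last resort)"),
    "glug_pytest_write": ("codex gpt-5", "sonnet via subagent"),
    "long_context_review": ("gemini -p --approval-mode plan", "claude opus"),
    "bulk_refactor": ("or-agent + qwen3-coder", "qwen-local"),
    "code_snippet": ("or-chat coder shortname", "sub-agent code-snippet"),
    "issue_classification": ("sub-agent classify (free)", "or-agent gpt-oss-20b"),
}


def pick_agent(task, first_pick_unavailable=False):
    """Return the first pick for `task`, or the fallback if it is unavailable."""
    first_pick, fallback = DECISION_MATRIX[task]
    return fallback if first_pick_unavailable else first_pick


if __name__ == "__main__":
    print(pick_agent("glug_pytest_write"))
    print(pick_agent("bulk_refactor", first_pick_unavailable=True))
```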

## Forward: rag-stack#92 dispatcher

`agent-dispatch route --task "..." [--prefix X]` already encodes most of
this. Use it before manually picking. Override when you have a specific
constraint (e.g. "this is a yoga write, codex blocked").
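
A thin wrapper for the dispatcher call could look like this. It uses only the `route`, `--task` and `--prefix` syntax quoted above; what `--prefix` accepts and what the command prints are not documented here, so treating the prefix as a project path is an assumption and the output is returned unparsed.

```python
import subprocess


def dispatch_argv(task, prefix=None):
    """Build the argv for `agent-dispatch route`, per the syntax quoted above."""
    argv = ["agent-dispatch", "route", "--task", task]
    if prefix is not None:
        # Assumption: --prefix takes a project path such as the yoga repo.
        argv += ["--prefix", prefix]
    return argv


def route(task, prefix=None):
    """Run the dispatcher and return its raw stdout (format not assumed)."""
    result = subprocess.run(
        dispatch_argv(task, prefix), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(dispatch_argv("fix hero CSS", prefix="/nvmetank1/projects/yoga"))
```

Passing the task as a list element rather than through a shell string avoids quoting problems when the task description contains quotes or parentheses.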