#012: Building my own coding agent: Human-in-the-loop
One of the first things I wanted to do after setting up the foundations for my coding agent, agx, was to add human-in-the-loop (HITL) controls to it. These controls allow the user to:
- Require explicit approval before running tool calls (creating/editing files, running bash commands, etc.)
- Reject tool calls (with optional feedback to alter the approach taken)
- Interrupt the conversation at any time
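In code, these three controls boil down to a small set of user decisions. A minimal sketch of how they might be modeled (the type is illustrative, not agx’s actual API):

```rust
/// Hypothetical model of the three HITL outcomes; agx's real types may differ.
enum HitlDecision {
    /// Run the tool call as proposed.
    Approve,
    /// Skip the tool call, optionally telling the LLM why.
    Reject { feedback: Option<String> },
    /// Abort the current turn entirely (e.g., the user pressed Ctrl+c).
    Interrupt,
}
```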
Controls like these are non-negotiable for any coding agent that can perform
destructive actions on the user’s machine. Without them, a single hallucinated
rm -rf or an overzealous file edit could cause real damage. However,
implementing HITL controls adds complexity to the agentic loop — state
management becomes trickier, and edge cases multiply. This post describes how I
added such controls to agx and how it led to a fundamental change in its
architecture.
A basic prototype
As described in the last post, I’m using the rig crate to abstract away the
low-level details of interacting with LLM provider APIs. The first
implementation of agx relied on rig’s “multi-turn” functionality to run the
agentic loop, which takes care of calling tools and sending requests to the
provider.
I started by adding basic controls: approve/reject tool calls by typing
<enter>/n, alongside the ability to provide feedback to instruct the LLM to
take a different approach. I also added support for interrupting the
conversation when the user presses Ctrl+c, which involved properly handling
any in-progress tool call invocations.
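In an async runtime, interrupt handling is naturally a race between the in-flight work and the signal. A minimal sketch of the pattern, assuming tokio (with the signal and macros features); the types and the execute_tool stub are placeholders, not agx’s real code:

```rust
use tokio::signal;

// Minimal stand-ins for the real types.
struct ToolCall { id: String }
enum ToolOutcome { Completed(String), Interrupted(String) }

// Placeholder for actually running the tool.
async fn execute_tool(call: &ToolCall) -> String {
    format!("ran tool call {}", call.id)
}

// Race the in-flight tool call against Ctrl+c; the losing branch's
// future is dropped, so an interrupt cancels the in-progress invocation
// and can be recorded in the chat history instead.
async fn run_with_interrupt(call: ToolCall) -> ToolOutcome {
    let id = call.id.clone();
    tokio::select! {
        result = execute_tool(&call) => ToolOutcome::Completed(result),
        _ = signal::ctrl_c() => ToolOutcome::Interrupted(id),
    }
}
```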
rig exposes a “hooks” mechanism where certain functions can be called before a
tool call is fired. Getting a prototype of the HITL controls working with this
mechanism wasn’t too difficult, but it felt architecturally wrong. The hook mechanism
introduced indirection that made state harder to track, and forced me to work
with raw JSON values instead of the concrete types I’d defined for the tools. If
I needed to show additional context for a tool call (e.g., a code diff), I had to
parse and validate it, something that rig’s internal mechanism would be doing
again after the tool call was approved. It felt like I was fighting the
abstraction rather than using it.
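To illustrate the friction, here’s a stand-in for the shape of the problem (not rig’s actual hook signature): the hook only sees raw JSON, so rendering context for the user means re-parsing a value the framework will deserialize again after approval:

```rust
use serde::Deserialize;
use serde_json::Value;

// Illustrative tool arguments for a file-edit tool.
#[derive(Deserialize)]
struct EditFileArgs {
    path: String,
    new_content: String,
}

// A pre-tool-call hook receives the arguments as a raw JSON value, so
// showing the user a diff means parsing and validating something the
// framework will deserialize again itself once the call is approved.
fn before_tool_call(tool_name: &str, raw_args: &Value) -> Result<(), String> {
    if tool_name == "edit_file" {
        let args: EditFileArgs = serde_json::from_value(raw_args.clone())
            .map_err(|e| format!("invalid edit_file args: {e}"))?;
        show_diff(&args.path, &args.new_content);
    }
    Ok(())
}

fn show_diff(path: &str, new_content: &str) {
    println!("would edit {path} ({} new bytes)", new_content.len());
}
```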
The HITL prototype worked, but tool call rejections and conversation
interruptions would sometimes leave the chat in a broken state. Given how large
the payloads sent to LLMs can become, relying on tracing to debug this quickly
became tedious. So,
I decided to take a tangent — something I am guilty of doing quite a lot — to
solve this issue: add a debug UI to agx that would clearly show me its
internal state at each turn of the conversation.
Opening up the Black Box
The debug UI is written in Gleam, using the Elm-inspired framework Lustre. It’s powered by a debug server that runs concurrently alongside the agent. As various events occur in the conversation, this server forwards them to the UI using server-sent events (SSE). The UI renders these events in a vertical layout, with each event getting a distinct color. It also includes a minimap to help understand the agentic loop from a higher vantage point, and to allow easy jumping to a specific event.
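The wire format is simple enough to sketch. Assuming a serde-serializable event type (the names here are illustrative, not agx’s real event set), each conversation event becomes one SSE frame:

```rust
use serde::Serialize;

// Illustrative event type; the real set of events is richer.
#[derive(Serialize)]
#[serde(tag = "kind", rename_all = "snake_case")]
enum DebugEvent {
    UserPrompt { text: String },
    AssistantText { text: String },
    ToolCall { name: String, args: String },
    ToolResult { name: String, output: String },
}

// An SSE frame is just "data: <payload>\n\n" written to a long-lived
// HTTP response; the Lustre UI decodes the JSON payload on the other end.
fn to_sse_frame(event: &DebugEvent) -> String {
    format!("data: {}\n\n", serde_json::to_string(event).unwrap())
}
```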
This debug UI helped a lot in understanding the lower-level details of the
abstractions provided by rig. It also helped me understand why the HITL
prototype was failing on rejections and interruptions: I wasn’t always managing
the chat history correctly when these happened. Most LLM provider APIs mandate
that each tool call be paired with a tool result, a requirement I was failing
to meet in some edge cases.
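The fix follows directly from that invariant: a call that never ran still needs a result in the history. A sketch of the repair step, with deliberately simplified message types:

```rust
// Simplified history types; real provider message schemas vary.
enum Message {
    Assistant { tool_call_ids: Vec<String> },
    ToolResult { call_id: String, content: String },
}

// Enforce the invariant: every tool call id must be answered by a tool
// result, even for calls rejected or interrupted before they ran.
fn close_dangling_tool_calls(history: &mut Vec<Message>) {
    let call_ids: Vec<String> = history
        .iter()
        .flat_map(|m| match m {
            Message::Assistant { tool_call_ids } => tool_call_ids.clone(),
            _ => Vec::new(),
        })
        .collect();
    for id in call_ids {
        let answered = history.iter().any(|m| {
            matches!(m, Message::ToolResult { call_id, .. } if *call_id == id)
        });
        if !answered {
            history.push(Message::ToolResult {
                call_id: id,
                content: "Tool call was rejected or interrupted by the user.".into(),
            });
        }
    }
}
```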
Besides helping fix the HITL issues, the debug UI surfaced a problem I hadn’t
noticed before: rig, at the time of writing, doesn’t include assistant text in
the chat history when it precedes tool calls, which means that in subsequent
turns the LLM loses context about what it previously said. This led me to
discover that rig manages its own internal history for a multi-turn session,
one that diverged from the history I was maintaining manually.
Armed with the debug UI, I was able to properly manage the chat history in the case of rejections and interruptions. It worked but didn’t feel elegant in the multi-turn setup. This, alongside the fact that building HITL controls via the hook mechanism felt brittle, led me to decide that I should be managing turns manually.
Manual Control
Managing turns manually meant taking control of every part of the agentic loop (a sketch in code follows the list):
- Send prompt and stream the response
- Capture assistant text, reasoning, and tool calls
- Parse and validate tool calls
- Pause for user approval, if required (HITL)
- Execute tools (or handle rejection)
- Update chat history and repeat
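Here’s that sketch, with placeholder types and stubs standing in for agx’s real ones (streaming and interrupts elided):

```rust
// Placeholder types and stubs; the real loop streams tokens and handles
// reasoning, but the control flow has this overall shape.
struct Turn { text: String, tool_calls: Vec<ToolCall> }
struct ToolCall { id: String, name: String }
enum HitlDecision { Approve, Reject { feedback: Option<String> } }

async fn send_and_stream(_history: &[String]) -> Turn {
    Turn { text: "done".into(), tool_calls: Vec::new() } // provider call goes here
}
fn ask_user(_call: &ToolCall) -> HitlDecision { HitlDecision::Approve }
async fn execute_tool(call: &ToolCall) -> String { format!("ran {}", call.name) }

async fn agentic_loop(mut history: Vec<String>) {
    loop {
        // 1. Send the history and stream the model's response.
        let turn = send_and_stream(&history).await;
        // 2. Capture the assistant text into the one shared history.
        history.push(format!("assistant: {}", turn.text));
        // A turn with no tool calls ends the loop.
        if turn.tool_calls.is_empty() { break; }
        for call in turn.tool_calls {
            // 3-4. The call is already parsed into a concrete type;
            // pause here for user approval (HITL).
            let result = match ask_user(&call) {
                // 5a. Execute the approved tool call.
                HitlDecision::Approve => execute_tool(&call).await,
                // 5b. Record the rejection (and any feedback) as the result.
                HitlDecision::Reject { feedback } => {
                    feedback.unwrap_or_else(|| "Rejected by user.".into())
                }
            };
            // 6. Every tool call gets a result before the next request.
            history.push(format!("tool {} -> {}", call.id, result));
        }
    }
}
```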
This change required a big refactor to agx, but opened up the door to several
improvements:
- HITL fits nicely in the agentic loop. I get manual control over parsing tool call requests, validating them, displaying context for them to the user, and executing them
- Tool call rejections/cancellations are easier to handle
- There’s one chat history to maintain
I’ve implemented a simple “approval system”: the user can approve certain tool
calls for the whole session (like creating/editing files), while others (like
running bash commands) must be approved on every invocation. The user’s choices
for the latter kind are stored in the directory where agx is run, and can be
picked up on subsequent runs.
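One plausible shape for persisting those choices (the filename and on-disk format here are my invention; the only grounded detail is per-directory storage):

```rust
use std::collections::HashMap;
use std::fs;

// Hypothetical persistence layer using simple "tool=scope" lines.
struct ApprovalStore {
    scopes: HashMap<String, String>, // tool name -> "session" | "always"
}

impl ApprovalStore {
    const FILE: &'static str = ".agx_approvals"; // illustrative filename

    // Read any previously saved choices from the working directory.
    fn load() -> Self {
        let scopes = fs::read_to_string(Self::FILE)
            .unwrap_or_default()
            .lines()
            .filter_map(|line| line.split_once('='))
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        Self { scopes }
    }

    fn is_pre_approved(&self, tool: &str) -> bool {
        self.scopes.get(tool).is_some_and(|s| s.as_str() == "always")
    }

    // Record a choice and write the whole store back to disk.
    fn remember(&mut self, tool: &str, scope: &str) {
        self.scopes.insert(tool.to_string(), scope.to_string());
        let body: String = self
            .scopes
            .iter()
            .map(|(k, v)| format!("{k}={v}\n"))
            .collect();
        let _ = fs::write(Self::FILE, body); // best-effort persistence
    }
}
```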
Having the debug UI available during this refactor was quite helpful — it allowed me to verify exactly what state was being tracked at various steps. I expect it to continue being helpful as I add more features.
I haven’t yet seen how feature-rich agent-building toolkits implement HITL, but
with a lower-level abstraction like rig, implementing it meant confronting the
architecture of modern agentic loops directly. That’s exactly the kind of
understanding I was hoping to gain by building agx in Rust.
Next up: support for configuring multiple agents, and maybe a more ergonomic UI.