dhruv's devlog

#011: Building my own coding agent: The Foundations


Command-line agentic AI tools have taken the software engineering world by storm. It’s fascinating how much progress has been made in this field this year (Anthropic released the research preview of Claude Code on Feb 24, 2025). Since then, a lot of other contenders have emerged: opencode, codex-cli, ampcode, cursor-cli, gemini-cli, copilot-cli, etc. I’ve been interested in how these agentic tools work for a while now, so I’ve decided to build one of my own.

The idea isn’t to build something that gets widely used, but to understand the principles and technology that go into building agentic developer tools. I’ll document the process in a series of posts here, hence the suffix to this post: “The Foundations”. I might also build agents in more than one tech stack to understand their pros and cons. For the first one, I’ve chosen Rust as the programming language, and started work on agx (short for “agentic executor”).

I prefer command-line agents because they let me stay in the terminal (it’ll take a LOT more innovation in the AI IDE space to get me out of the terminal). I also like that they’re a thin abstraction on top of the underlying LLM, as opposed to being bolted onto an already bloated piece of software. And since they can be run in a command-line-only mode (as opposed to a TUI/console mode), they compose like regular UNIX tools, which I highly value.

Humble Beginnings

My first milestone for agx was to have a simple agentic loop in place that can perform common software engineering tasks, even if not in the most efficient manner. Having already built a primitive agent earlier this year following Thorsten Ball’s excellent blog post How to Build an Agent, I knew this wasn’t crazy complicated: the agent needs tools to read/edit/list files and run shell commands, and models trained for tool calling take it from there. The effectiveness of this simple setup shocked me when I first saw it in action, and it still does. agx is now equipped with these capabilities. Here it is in action, modifying its own source code.
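The loop itself is worth spelling out. Here’s a toy sketch with the model call and tools stubbed out (none of this is agx’s actual code; the names and the canned replies are purely illustrative):

// Toy sketch of an agentic loop. The model and tools are stubs;
// a real agent would call an LLM API and dispatch real tools.

enum ModelReply {
    // The model answered in plain text: the turn is over.
    Text(String),
    // The model wants a tool executed before it continues.
    ToolCall { name: String, args: String },
}

// Stand-in for a provider call over the accumulated history.
fn call_model(history: &[String]) -> ModelReply {
    if history.iter().any(|m| m.starts_with("tool ")) {
        ModelReply::Text("src/ contains main.rs and lib.rs".to_string())
    } else {
        ModelReply::ToolCall {
            name: "list_files".to_string(),
            args: "src/".to_string(),
        }
    }
}

// Stand-in for tool dispatch (read/edit/list files, run shell commands).
fn run_tool(name: &str, args: &str) -> String {
    format!("{name}({args}) -> main.rs, lib.rs")
}

fn main() {
    let mut history = vec!["user: list the files in src/".to_string()];
    loop {
        match call_model(&history) {
            ModelReply::Text(text) => {
                println!("{text}");
                break;
            }
            ModelReply::ToolCall { name, args } => {
                // Execute the tool and feed its output back to the model.
                let result = run_tool(&name, &args);
                history.push(format!("tool {name}: {result}"));
            }
        }
    }
}

That’s really all there is to it: call the model, execute whatever tools it asks for, append the results to the history, and repeat until it answers in plain text.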

The Experience so Far

Adding support for every LLM provider manually is a lot of work. To abstract over communication with provider APIs, I’m using the Rust crate rig. Out of the box, rig provides a clean API for prompting LLMs and deserializing their varied responses into a common interface.
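As a taste of that API, here’s roughly what a minimal prompt round-trip looks like with rig (adapted from rig’s documented examples; the provider and model name are just placeholders):

use rig::{completion::Prompt, providers::openai};

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Reads the API key (OPENAI_API_KEY) from the environment.
    let client = openai::Client::from_env();

    // Build an agent on top of a specific model.
    let agent = client
        .agent("gpt-4o")
        .preamble("You are a concise coding assistant.")
        .build();

    let answer = agent.prompt("What does the borrow checker do?").await?;
    println!("{answer}");
    Ok(())
}

Swapping providers means changing the client construction; the agent-building and prompting code stays the same.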

It remains to be seen how well this approach will hold up once I want to support switching models mid-conversation. What I definitely like, though, is rig’s approach to tool definition and invocation. The Tool trait provides a simple, ergonomic way to define tools, and Rust’s type system really shines here. Here’s what a typical tool definition looks like.

use rig::{completion::ToolDefinition, tool::Tool};

#[derive(Debug)]
pub struct SampleTool;

#[derive(Debug, serde::Deserialize)]
pub struct SampleToolArgs {
    pub prop: String,
}

#[derive(Debug, serde::Serialize)]
pub struct SampleToolOutput {
    pub result: String,
}

#[derive(Debug, thiserror::Error)]
pub enum SampleToolError {
    #[error("invalid input provided: {0}")]
    InvalidInput(String),
    #[error("an unexpected error occurred: {0}")]
    Unexpected(anyhow::Error),
}

impl Tool for SampleTool {
    const NAME: &'static str = "sample_tool";
    type Error = SampleToolError;
    type Args = SampleToolArgs;
    type Output = SampleToolOutput;

    async fn definition(&self, _prompt: String) -> ToolDefinition {
        todo!()
    }

    async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
        todo!()
    }
}
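To make the sample concrete, here’s one hypothetical way the two todo!()s could be filled in (the field names on ToolDefinition match rig’s struct; the echo behavior and the schema are made up for illustration, and serde_json is assumed as a dependency):

impl Tool for SampleTool {
    const NAME: &'static str = "sample_tool";
    type Error = SampleToolError;
    type Args = SampleToolArgs;
    type Output = SampleToolOutput;

    async fn definition(&self, _prompt: String) -> ToolDefinition {
        ToolDefinition {
            name: Self::NAME.to_string(),
            description: "Echoes the provided property back to the caller.".to_string(),
            // JSON Schema describing SampleToolArgs; the LLM uses this
            // to construct valid arguments for the tool call.
            parameters: serde_json::json!({
                "type": "object",
                "properties": {
                    "prop": { "type": "string", "description": "Any string value." }
                },
                "required": ["prop"]
            }),
        }
    }

    async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
        if args.prop.is_empty() {
            return Err(SampleToolError::InvalidInput("prop must not be empty".into()));
        }
        Ok(SampleToolOutput { result: args.prop })
    }
}

Registering the tool is then a single call on rig’s agent builder (.tool(SampleTool)), after which rig advertises the definition to the model and routes matching tool calls back to call.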

Beyond tool support, rig has good support for streaming LLM responses, which is critical for making the agent feel responsive. Each item in the stream can be one of the following variants.

#[derive(Deserialize, Serialize, Debug, Clone)]
#[serde(tag = "type", rename_all = "camelCase")]
#[non_exhaustive]
pub enum MultiTurnStreamItem<R> {
    /// A streamed assistant content item.
    StreamAssistantItem(StreamedAssistantContent<R>),
    /// A streamed user content item (mostly for tool results).
    StreamUserItem(StreamedUserContent),
    /// The final result from the stream.
    FinalResponse(FinalResponse),
}

The stream yields “assistant” content as it arrives (text, tool call requests, reasoning), “user” content when tools return results, and a final response when the agent completes its turn.

“assistant” content can be one of the following variants.

#[derive(Clone, Debug, Deserialize, Serialize, PartialEq)]
#[serde(untagged)]
pub enum StreamedAssistantContent<R> {
    Text(Text),
    ToolCall(ToolCall),
    ToolCallDelta {
        id: String,
        delta: String,
    },
    Reasoning(Reasoning),
    ReasoningDelta {
        id: Option<String>,
        reasoning: String,
    },
    Final(R),
}

These types make stream handling quite ergonomic. In the demo above, as agx explores the codebase and makes changes, all of that is streamed incrementally, making the agent feel responsive rather than frozen while waiting for the LLM.
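For a sense of what that handling looks like, here’s a rough sketch of a stream-draining loop over these types (the stream construction, imports, and error type are simplified; this isn’t agx’s exact code):

use futures::{Stream, StreamExt};

// Assumes MultiTurnStreamItem and StreamedAssistantContent, as shown
// above, are in scope; `R` is the provider-specific final-response type.
async fn drain_stream<R>(
    mut stream: impl Stream<Item = anyhow::Result<MultiTurnStreamItem<R>>> + Unpin,
) -> anyhow::Result<()> {
    while let Some(item) = stream.next().await {
        match item? {
            MultiTurnStreamItem::StreamAssistantItem(content) => match content {
                // Print text as it arrives so the user sees progress.
                StreamedAssistantContent::Text(text) => print!("{}", text.text),
                StreamedAssistantContent::ToolCall(_call) => {
                    eprintln!("[running tool...]");
                }
                // Deltas and reasoning can be accumulated or displayed.
                _ => {}
            },
            MultiTurnStreamItem::StreamUserItem(_result) => {
                // A tool finished; its output goes back into the history.
                eprintln!("[tool result appended]");
            }
            MultiTurnStreamItem::FinalResponse(_response) => break,
            // MultiTurnStreamItem is #[non_exhaustive], so a catch-all is required.
            _ => {}
        }
    }
    Ok(())
}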

Context management is quite naive at the moment. I’m simply maintaining a list of messages exchanged between the user and the LLM, appending messages to this list while handling the output stream. This means that as conversations grow, the entire history gets sent with every request, which quickly becomes expensive and eventually hits context window limits. This is one of the first things I’ll need to improve in the subsequent iterations.
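In code, the naive scheme is essentially the following (a toy sketch with a simplified message type; agx actually stores rig’s message types):

// Toy sketch of naive context management: the full history rides
// along on every request, so cost grows with conversation length.
struct Message {
    role: &'static str, // "user", "assistant", or "tool"
    content: String,
}

#[derive(Default)]
struct Conversation {
    history: Vec<Message>,
}

impl Conversation {
    fn push(&mut self, role: &'static str, content: String) {
        self.history.push(Message { role, content });
    }

    // Every request sends everything accumulated so far.
    fn request_payload(&self) -> &[Message] {
        &self.history
    }
}

fn main() {
    let mut convo = Conversation::default();
    convo.push("user", "list the files in src/".to_string());
    convo.push("assistant", "src/ contains main.rs".to_string());
    // The next request would carry both messages again.
    assert_eq!(convo.request_payload().len(), 2);
}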

Next Steps

I want to make this agent usable for myself, which will require several more features; I’ll cover those in upcoming posts in this series.

I chose Rust for the first implementation because I like to know how things work at a lower level before reaching for higher-level abstractions. Having said that, I’d also like to investigate more feature-rich agent-building toolkits like LangChain, Google’s ADK, the Vercel AI SDK, and others to see what they offer.