dhruv's devlog

#003: Jules v. GitHub Copilot coding agent for a simple task


A user opened an issue on hours asking for a keymap to save time logs without requiring a comment. After implementing it, I realized punchout could benefit from the same feature since both tools have similar time tracking functionality.

This task seemed like a good candidate for testing asynchronous coding agents: it’s quite simple, has a reference implementation, and doesn’t need much new “thought”. So, I gave the task to GitHub Copilot’s coding agent and Google Jules with the same prompt, listed below.

Take a look at this change that I've made in another tool which has similar time
tracking functionality as punchout. This change adds a keymap that allows the
user to save a time log without a comment. punchout can make use of this keymap
too. The only caveat is that this keymap should only be available if a fallback
comment is configured. An error message can be shown in that case.

Ensure that the time being tracked is greater than a minute (as is being done in
the change).

Here the change:
https://github.com/dhth/hours/commit/fb3601c20154867e4773b17e5eb721a2bb7477d3

Copilot

Copilot’s agent veered off course almost immediately. Despite GitHub MCP being available on the agent VM by default, it didn’t use it to examine the reference commit. Instead, it went off and implemented features I didn’t ask for, missing the core functionality entirely.

The first review required 9 comments from me pointing out what was wrong. Even after that feedback, it still didn’t leverage the MCP to properly understand the reference implementation. The second attempt was better than the first, but still far from what I needed.

Jules

Jules performed noticeably better. While its UI didn’t explicitly show which tools it used behind the scenes, it clearly accessed the reference commit, since its implementation closely mirrored the original. The code structure and approach were on target, requiring only minor adjustments to be complete. In the end, I implemented the change myself, but could’ve very easily continued where Jules left off.

Summary

Since Copilot is native to GitHub, the experience of providing feedback directly on specific PR lines feels natural. But the agent’s performance was disappointing — maybe explicitly instructing it to use GitHub MCP would’ve helped (though you’d think that would be the default behavior).

Jules performed better at understanding the task and reference material, but its UI feels rough around the edges at the moment. Since it’s an external tool, I am not sure how Google will support back-and-forth communication from the PR itself, or how it will read the output of CI runs for feedback.

For this particular task, it would’ve been much faster if I had implemented the changes myself. I’ll still keep an open mind going forward.