Version control for everything

2026-02-12

AI-assisted agentic coding has reached escape velocity, but non-programming use cases haven’t seen the same degree of adoption. I believe that the main reason for this is the lack of version control.

Imagine using claude-code outside of a git repository. Even for small things like refactors, using AI would be very stressful and error-prone (a sketch of the guardrails you’d be giving up follows the list):

  • It would be near-impossible to track the changes that the LLM made. Auditing LLM-generated code is useful in the moment, to ensure that changes are reasonable before moving on to another task, and in the future, when you want to understand why some code was written.
  • The LLM could put the codebase in a bad state and you’d have no way of reverting. This is true even if the LLM is incredibly smart and didn’t make any “mistakes” - the human prompter forgetting to mention a design constraint could be bad enough.
  • There’s no split between the “development branch” and “prod”
  • You can’t parallelize development by having multiple LLMs work on different branches
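Here’s a minimal sketch of those guardrails as a thin Python wrapper around plain git commands; run_agent is a hypothetical stand-in for claude-code or whatever tool actually edits the files:

```python
import subprocess

def git(*args: str) -> str:
    """Run a git command in the current repository and return its stdout."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout

def run_agent(task: str) -> None:
    """Hypothetical stand-in for claude-code or any tool that edits the working tree."""
    ...

def agent_edit(task: str, branch: str) -> None:
    # Work on an isolated branch: the "development branch" never touches main.
    git("checkout", "-b", branch)
    run_agent(task)

    # Audit exactly what the agent changed before deciding to keep it.
    print(git("diff"))

    if input("keep these changes? [y/N] ").strip().lower() == "y":
        git("add", "-A")
        git("commit", "-m", f"agent: {task}")
    else:
        # Revert: discard tracked edits and new files, then drop the branch.
        git("checkout", "--", ".")
        git("clean", "-fd")
        git("checkout", "main")
        git("branch", "-D", branch)
```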

Even when I pay Claude to work on a small script, I always create a new git repo just to make my life easier. But outside of coding, it’s nearly impossible to find tooling that has the same guardrails and affordances.

Case study: software development outer loop

Managing the software development process is hard. We use issue trackers and pull requests to manage work; people write documentation and communicate over email, instant messaging, and in meetings. Keeping all of the information in these channels synchronized and up to date is a full-time job. In this scenario, let’s say we’re concerned with the following systems:

  1. GitHub issues (read/write)
  2. pull requests (read/write)
  3. Google Calendar (read/write)
  4. Google Docs (read/write)
  5. Gmail (read)
  6. Slack (read)

and you’re interested in using an LLM to find places where some information hasn’t made its way from one service to another, e.g. updating an issue with new information after an email conversation. This task is hard in isolation (though I think that today’s LLMs could do it), but the biggest issue is that none of these services has a built-in mechanism that would let the LLM propose an action to be reviewed by a human.

Option 1: a proxy layer

Without changing any of the underlying services, you could imagine building a proxy that adds a “pull requests” layer: changes spanning multiple underlying services are staged for review before being published. An agent would act through this proxy, which would hold the mutations until someone could review, approve, and publish them.
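As a rough sketch of that layer (the service names and callables are assumptions, not any real API), the core data structure might be a change set of staged mutations:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Mutation:
    service: str                        # e.g. "github-issues", "google-docs"
    describe: str                       # human-readable summary for the reviewer
    apply: Callable[[], None]           # publishes the change to the real service
    revert: Optional[Callable[[], None]] = None  # not every service can undo

@dataclass
class ChangeSet:
    """A 'pull request' spanning multiple services."""
    title: str
    mutations: list[Mutation] = field(default_factory=list)
    approved: bool = False

    def propose(self, mutation: Mutation) -> None:
        # The agent only ever calls this; nothing touches the real services yet.
        self.mutations.append(mutation)

    def publish(self) -> None:
        if not self.approved:
            raise RuntimeError("a human must review and approve first")
        for m in self.mutations:
            m.apply()  # no atomicity: a failure here leaves earlier mutations live
```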

This is challenging for a few reasons:

  1. Building one-off systems like this is time-consuming: it’s tied to the specific workflow, and complexity grows as you need to integrate more services.
  2. “Revert” would probably be out of reach. Underlying services might not provide the functionality necessary to implement “undo”, especially when reverting a change that has already had other changes stacked on top of it.
  3. Detecting and resolving merge conflicts in the underlying services isn’t always possible.
  4. There’s no atomicity. If someone clicks the “publish” button and one service rejects the change for any reason, all the previous changes to other services are already out in the world. This is especially troubling if the already-published changes can’t be reverted (see the sketch after this list).
  5. The layer on top of the underlying services prevents you from seeing what the whole state of the world would look like if you were to merge the change. Imagine having to do code review where you could only look at the diff, instead of being given the diff plus the ability to see the entire contents of the codebase before and after the diff is applied.
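Continuing the sketch above, the best a publish step can do is roll back on failure, and points 2 and 4 are exactly where that falls apart:

```python
def publish_with_best_effort_rollback(changeset: ChangeSet) -> None:
    """Publish every staged mutation; on failure, try to undo the ones already live."""
    published: list[Mutation] = []
    try:
        for m in changeset.mutations:
            m.apply()
            published.append(m)
    except Exception:
        # Roll back in reverse order - but some services simply can't undo,
        # so the world can be left partially published with no way back.
        for m in reversed(published):
            if m.revert is None:
                raise RuntimeError(
                    f"cannot roll back {m.describe!r}; state is now inconsistent"
                )
            m.revert()
        raise
```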

Option 2: put everything else in git

If you’re OK with leaving GitHub, Google Docs, etc., then you could move this functionality into git. Jane Street famously does code review by embedding review comments directly in the source files as code comments, and that workflow decision makes it trivial to involve LLMs in code review because everything in the process is tracked with version control. Why not put issues alongside the codebase? Pull request and code-review metadata in source control? Design docs moved from Google Docs into checked-in markdown? Ideally, all of these would live in the same repository as the code itself, so that changes to code, issues, and docs could be made in a single atomic update instead of being coordinated across services.
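To make the atomicity point concrete, here’s a sketch that assumes issues and design docs are checked in as markdown next to the code (the issues/ and docs/ layout is made up for illustration):

```python
import subprocess
from pathlib import Path

def git(*args: str) -> None:
    subprocess.run(["git", *args], check=True)

def close_issue_with_fix(issue_id: str, code_changes: dict[str, str], note: str) -> None:
    """Update code, the issue file, and the design doc in one revertible commit."""
    for path, contents in code_changes.items():
        Path(path).write_text(contents)

    issue = Path("issues") / f"{issue_id}.md"
    issue.write_text(issue.read_text() + f"\n\nStatus: closed. {note}\n")

    doc = Path("docs") / "design.md"
    doc.write_text(doc.read_text() + f"\n\n(Updated while closing {issue_id}: {note})\n")

    # One atomic, reviewable, branchable, revertible change across all three artifacts.
    git("add", "-A")
    git("commit", "-m", f"close {issue_id}: {note}")
```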

In my view, the main obstacle here is that without serious dedication, the user experience for humans would be a major downgrade. This isn’t insurmountable, but it would be a lot of work.

A better world for LLMs is a better world for me

Although I’ve framed this blog post as “things that would make LLMs more useful outside of programming”, you could just as easily replace “LLM” with “junior developer” or “senior developer” and all of the points would hold. It’s not just agent-style LLMs that would benefit from this integration: I would be more productive if all of my tools had branches, version history, and atomic changes.

It can be hard to get management to invest in developer productivity tooling, but for the next few years I think it’d be easier to justify spending on “AI Infrastructure” that happens to be a better experience for devs as well. Maybe you could use this to your advantage :D

Extra reading

Here are some links that you might find interesting:

  • The Local-First Software movement has been championing merge-based and conflict-free techniques for synchronizing data and documents.
  • Irmin is an OCaml library for building git-like databases with branches, merges, etc.