# Agentic coding - a review of Claude Code
I wrote previously about some thoughts on AI coding, and today I'd like to expand on my recent experience. Earlier this month, I took the plunge and bought a Pro subscription to Claude.ai.
Why Claude? No real reason, other than recommendations from a few friends at work. While ChatGPT seems to be all the rage these days, Claude performed better for me at coding tasks. Since I do like to write a lot of code, it felt like a good move. What held me back initially was the cost, but it turns out that even the $30/month Claude subscription includes access to Claude Code.
**Warning**
I would like to stress that I am not endorsing Claude as a product. If you found this article useful, then good on you! Please don't come complaining to me when your solutions crash and burn. That's on you! More about that in the rest of the article.
## About Claude Code
To understand my experience with Claude Code, let me first explain my prior approach. My previous AI coding revolved around asking ChatGPT very specific questions: it produced some Python code for me, and I manually copied and pasted that code into my project. This gave me a level of control, quality, and assurance that what I was doing followed my required architectural patterns. The downside of this pattern is the time spent asking questions, copying and pasting, and reformatting code because the AI didn't get it quite right.
Claude Code, on the other hand, was a godsend. As a tool, it is command-line based. You install it on your machine, run claude in your git repo, activate it with your subscription, and away you go. Ask it to update your application, review some code, whatever you can think of. It will read the repo and start making code updates, right within your repo. It was both amazing and frightening at the same time to see what this thing could do.
With this foundation in place, I decided to put Claude Code to the test on my personal projects.
## How Claude solved (and ruined) my side projects
Like many of you, I have private GitHub repos with a collection of side projects. An idea, a project, that multi-million-dollar business plan that will solve all my financial problems, yet I never have the time to work on them. Then came Claude. During an early-morning coding session, I decided to test Claude's capabilities. I pointed it at one of my repos and asked it: review the code, and update the function to do x, y and z. It went off, read the repo, built an understanding of the codebase and what it does, and like magic it started updating my code. To my amazement, it worked. Within a few minutes it was able to do what had taken me weeks (if not months) of painstaking back-and-forth and experimenting to achieve.
One of my other projects was a bit more interesting. I've always had a vision of creating a tool to help businesses - think something like Salesforce, combined with Atlassian and SAP. All-encompassing workflow and business-management processes, rolled into one easy-to-use, intuitive, secure application. There is a level of complexity to the design: the tool is a code generator - you provide it with your business logic, and it goes off and generates the Python code for a Django application.
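The post doesn't show the tool's internals, but the template-driven approach it describes can be sketched in a few lines. Everything below (the template, the field-type mapping, the function names) is hypothetical illustration, not the actual project:

```python
from string import Template

# Hypothetical template for a generated Django model. The real tool's
# templates would be far richer; this only illustrates the split between
# the generator's templates and the code the generator emits.
MODEL_TEMPLATE = Template('''\
from django.db import models

class $model_name(models.Model):
$fields
''')

def render_field(name: str, kind: str) -> str:
    """Map an assumed business-logic field type to a Django field declaration."""
    django_types = {
        "text": "CharField(max_length=255)",
        "number": "IntegerField()",
    }
    return f"    {name} = models.{django_types[kind]}"

def generate_model(model_name: str, fields: dict[str, str]) -> str:
    """Turn a business-logic description into Django model source code."""
    body = "\n".join(render_field(n, k) for n, k in fields.items())
    return MODEL_TEMPLATE.substitute(model_name=model_name, fields=body)

print(generate_model("Invoice", {"customer": "text", "amount": "number"}))
```

Even this toy version hints at the confusion described later: the template file and the generated file both look like Django code, which is exactly the distinction Claude struggled with.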
Claude did an excellent job with the ask. It did such a good job, in fact, that I eventually rejected the entire project, because it was not what I wanted. This led to an unexpected realization: something I had thought for years would be a big win turned out to be a disaster. The reality was that I had never considered the user experience - the person who has to build their business application on my tool. While that was a bit of a blow for me personally, using Claude as a prototyping tool to spin something up quickly turned out to be a big win.
After reflecting on this experience, it occurred to me that Claude may be much faster at coding than I am, so I decided to try a few things. Back on my side gig, I took a different approach: this time, I let Claude do everything. Not a single line of code from me, only instructions. I started crafting my claude.md file with very specific instructions. I told Claude, just as I would tell a junior developer, exactly what my expectations were and what I expected the app to do.
Something else I picked up: you can't throw too much at it at once. You have to break the ask up into smaller chunks, so in my claude.md file I split the work into specific tasks (or sprints). A few takeaways from this:
- Claude generally completed the task.
- It didn't get it all right on the first try. Sometimes it got confused, and I had to explicitly tell it what to do.
- It made assumptions - it would decide to add functionality that I never asked for.
- You can update the claude.md file to instruct it not to go off on its own, but it doesn't always follow the rules.
- It is able to do troubleshooting. It can run commands and test the output, and confirm that things worked as designed. But please, DO NOT DO THIS IN PRODUCTION!!!
- It got confused between the templates used by the code generator and the code the generator produced. It would end up fixing bugs in temporary generated files rather than in the source. That's likely on me - my project is a bit complex and abstract.
- Even with explicit rules in the claude.md file, it would sometimes not follow my instructions. Even when I explicitly told it to keep the requirements.txt file up to date, or to keep README.md current with usage instructions, it would not do so. I suspect it runs out of context tokens and can't remember to do it all.
- There were times when it was "too dumb" to understand the problem. It would keep "fixing" something, yet never produce a different result.
- The code quality was something else - convoluted, complex, wrapped in so much abstraction that no one could really understand what this thing was thinking when it designed it.
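To make the claude.md approach concrete: the file is free-form instructions, not a fixed schema. A hypothetical sketch of the rules-plus-sprints layout described above might look like this (file names and sprint contents are invented for illustration):

```markdown
# Project instructions for Claude

## Rules
- Do not add functionality that was not explicitly requested.
- Keep requirements.txt up to date whenever a dependency changes.
- Keep README.md usage instructions current after every change.
- Never run commands against production systems.

## Sprint 1: scaffolding
- Create the Django project layout and a health-check endpoint.

## Sprint 2: core models
- Implement the business-logic models, one model per task.
```

Working through one sprint at a time, rather than handing over the whole backlog, is what kept the tasks small enough for Claude to complete.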
## Critical Analysis of AI Coding Risks
These experiences, while educational, highlighted some concerning patterns in AI-assisted development that warrant deeper discussion.
I've tried both extremes. I've used Claude as an assistant, where I write the code and ask it to contribute a function or a module. I've also gone the other way and let Claude do everything. What struck me is that Claude is terrible at design and architecture. It focuses on your ask, and when it runs out of context, it dumbs the solution down to that one ask only; it struggles to hold the context of the entire solution. My better results came when I retained control of the architecture: reviewing the code, validating that I understood what it did, and confirming that the code it produced actually fit within my overall design. When you let go of the design and architecture controls, that's when things go wrong.
Let's look at a few examples of where AI coding has gone wrong in recent weeks.
### Exploding costs
There's a story about a bloke who used an AI tool like Claude to develop a SaaS product. He was very impressed with himself, and created a one-man business. The problem was that the tool created such a convoluted architecture that his AWS costs tanked the business before it could even get started. If you ask an AI to build you a highly available AWS architecture, it will do exactly that - you need to make sure you actually have the income to cover those infrastructure costs.
### Unmaintainable code
Some organisations are now adopting the mindset that AI will replace all their developers, so they're reducing their development teams significantly. While I agree that AI coding will certainly move the needle, it will never fully replace developers. AI generates code that may make sense to an AI, but without proper QA and control, that code will be unmaintainable when things go wrong.
### Deleting production data
There was a story recently about an AI agent that accidentally deleted production data. Somehow the AI "panicked" and recreated around 4,000 fake records. Why you'd give an AI agent access to a production database (let alone admin access) is a bit bizarre.
### No security
A recent one: the "Tea" application, an AI-generated app that lets people take selfies and share them with each other. It turns out the architecture relied on an AWS S3 bucket with a public bucket policy, so the entire bucket, with all its user data, was simply left open. Was the AI ever asked to secure the bucket? Who knows. The reality is that the app was not "hacked" - the door was left wide open.
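To illustrate the failure mode (this is not the Tea app's actual configuration - the policy below is a made-up example), a bucket policy statement that combines Effect "Allow", Principal "*", and s3:GetObject on every object makes the whole bucket world-readable, and it can be flagged with a few lines of Python:

```python
import json

# Hypothetical bucket policy of the kind that leaves a bucket wide open:
# an Allow statement for the anonymous principal "*" on every object.
PUBLIC_POLICY = json.loads('''{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::example-user-uploads/*"
  }]
}''')

def public_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant access to an anonymous principal."""
    flagged = []
    for stmt in policy.get("Statement", []):
        principal = stmt.get("Principal")
        # S3 treats both "*" and {"AWS": "*"} as the anonymous principal.
        anonymous = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") == "*"
        )
        if stmt.get("Effect") == "Allow" and anonymous:
            flagged.append(stmt)
    return flagged

print(f"public statements found: {len(public_statements(PUBLIC_POLICY))}")
```

A check like this takes minutes to write, which is rather the point: nobody, human or AI, appears to have run even a basic review before user data went live.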
## Key Takeaways and Future Considerations
AI coding is the next revolution - no doubt about it. But as with any new technology, we need to be careful. You cannot fully outsource everything in your IT shop to AI. Sure, some things can be done better by AI, but just as you don't trust a junior on your team to make executive decisions, we should not allow AI near production code until that code has been reviewed and QA'd - and preferably not by another AI.