At some point, or at multiple points, anyone working with AI tools has an "oh shit" moment where the tool does something magical or amazing that makes them stop and think for a minute. Here’s one I had recently.
The Backstory
We’ve a microservices setup where a few different services run together, talking to each other on a Kubernetes cluster. Standard fare for a lot of software companies these days. Recently we’ve been adding a new one which is deployed to production but hidden behind feature flags, so it’s internal use only for now. It is actively being developed, so there’s lots of churn and the full pipelines and checks haven’t been set up yet. Some things are still to be done but we’ll get there.
We also run this locally for local dev. There’s not many services, not yet anyway, so it’s not too much of a burden to run the three or four services on the local machine. All this could and should be improved but it is what it is for now. We’ll get there.
The Problem
Anyway, this story begins when someone deployed a change which broke the service without anyone noticing for a day or two, because those checks are not done yet. Again, as I said, this is a new thing, so that’s nothing unexpected at this stage of the project. Some initial debugging and looking at logs showed nothing: the pod just failed immediately after a deploy and the deploy was rolled back. The service still worked but the change wasn’t being deployed.
So we added some logging to the Node.js app on startup and this gave us a log like this:
{"message":"Failed to start HTTP server: listen EACCES: permission denied tcp://10.2.3.4:3000","severity":"ERROR", et cetera dot dot dot
Strange, but the port is correct and that’s an internal Kubernetes IP, so nothing crazy yet.
Enter Claude
When I asked about this, it correctly and quickly diagnosed that the tcp://10.2.3.4:3000 is in fact incorrect. That should be the port only, so 3000. Where’s the IP coming from then? It was the IP of the service the deployment is behind, but how did it end up in there?
First it started telling me that we were setting the wrong port, but the yaml proved otherwise. By now the deployment had been rolled back so nothing existed on the cluster. I had grabbed the yaml during the deploy, before it disappeared from the cluster, so I gave that to Claude too.
The Issue
After churning for a bit, looking at the code, the yaml and whatever else it does, it came back with this. As part of our aforementioned local setup, because we need to run all the services on different ports, we set the port something like this:
const port = process.env.APPNAME_PORT || process.env.PORT || 3000;
So running locally we can set each app port differently, while on kubernetes we can use the standard PORT environment variable to tell it where to run. Makes it easy for us and for developers.
Anyone with proper Kubernetes experience can probably tell immediately where the issue is here, but not me. Claude correctly identified it. APPNAME_PORT is one of the special environment variables which Kubernetes automatically injects into containers. For each Service it injects SERVICENAME_PORT, which clashes with ours because our APPNAME is the same as our SERVICENAME. We expected the port to be 3000, which is the value of PORT, but instead it was the value of the automatically injected APPNAME_PORT, which is tcp://10.2.3.4:3000.
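To make the clash concrete, here is a small sketch of that same fallback chain with the environment passed in as a plain object. The resolvePort helper and the example values are mine, for illustration only; the real app reads process.env directly.

```javascript
// Same fallback chain as in the app, but taking the environment as a
// parameter so both situations can be shown side by side.
// resolvePort is a hypothetical helper, not the app's actual code.
function resolvePort(env) {
  return env.APPNAME_PORT || env.PORT || 3000;
}

// Local dev: we set APPNAME_PORT ourselves.
const local = resolvePort({ APPNAME_PORT: '3001' });

// On the cluster: we only set PORT, but Kubernetes has also injected
// APPNAME_PORT for the Service, and that one wins the || chain.
const cluster = resolvePort({
  PORT: '3000',
  APPNAME_PORT: 'tcp://10.2.3.4:3000', // injected, not ours
});

console.log(local);
console.log(cluster);
```

The second call returns the tcp:// string rather than 3000, which is the value that ended up in our error log.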
The Solution
When Claude first suggested it I wasn’t sure if it was another hallucination or a real thing. It fit the problem too perfectly and seemed too easy to be the answer. And how did I not know that these variables get injected? I asked for the references and here’s the link: https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables
Once we found this, the solution was easy: just change our env var to something else, and it immediately worked.
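For illustration, the fix could look something like the sketch below. APPNAME_LISTEN_PORT is a made-up replacement name, not the one we actually picked, and the validation step is an extra I'm adding here so that a bad value fails loudly at startup instead of reaching the HTTP server.

```javascript
// Hypothetical fix: read a differently named variable, then check that the
// result really is a port number before using it.
// APPNAME_LISTEN_PORT is an assumed name for illustration only.
function resolveListenPort(env) {
  const raw = env.APPNAME_LISTEN_PORT || env.PORT || '3000';
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port value: ${raw}`);
  }
  return port;
}

// The injected APPNAME_PORT is still in the environment, but nothing reads it.
console.log(
  resolveListenPort({ PORT: '3000', APPNAME_PORT: 'tcp://10.2.3.4:3000' })
);
```

With a check like this, a value such as tcp://10.2.3.4:3000 would have produced an explicit "Invalid port value" error rather than a confusing EACCES.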
TIL
Kubernetes injects environment variables into containers, so you’d better make sure you don’t clash with those. The full list is here: https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables
Claude Code is good. I mean sure, sometimes it still does stupid stuff, so you have to steer it back onto the right track, but then it does something like this in minutes which would have taken me a lot of time and a lot of headache to figure out. Scary where it’s going, but interesting.
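As a rough picture of the naming scheme described on that page, the sketch below builds the set of variables for a single Service with one TCP port. It is only an illustration of the pattern (name upper-cased, dashes turned into underscores), not Kubernetes code, and serviceEnvVars is a helper I made up for this post.

```javascript
// Illustration of the variable names Kubernetes generates for a Service.
// serviceEnvVars is a hypothetical helper; it handles one port only.
function serviceEnvVars(serviceName, clusterIP, port, proto = 'tcp') {
  // Service name is upper-cased and dashes become underscores.
  const prefix = serviceName.toUpperCase().replace(/-/g, '_');
  const url = `${proto}://${clusterIP}:${port}`;
  const portKey = `${prefix}_PORT_${port}_${proto.toUpperCase()}`;
  return {
    [`${prefix}_SERVICE_HOST`]: clusterIP,
    [`${prefix}_SERVICE_PORT`]: String(port),
    [`${prefix}_PORT`]: url, // the one that bit us
    [portKey]: url,
    [`${portKey}_PROTO`]: proto,
    [`${portKey}_PORT`]: String(port),
    [`${portKey}_ADDR`]: clusterIP,
  };
}

console.log(serviceEnvVars('appname', '10.2.3.4', 3000));
```

Any env var of your own that matches one of these names for a Service in the same namespace is at risk of being shadowed or of shadowing something.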