I kept hitting Claude Code rate limits way earlier than made sense.
Not because I was doing anything at scale. I wasn’t running massive jobs or processing huge datasets. I was just using it — normally — and somehow burning through tokens faster than I expected.
Turns out most of the inefficiency wasn’t from what I was doing. It was how I was doing it. Specifically: unnecessary context. Claude was doing a lot of work that had nothing to do with the actual task.
Spent a few weeks paying closer attention and cleaning things up. Sharing what I found in case it saves someone else the same frustration.
The quick wins
These are the ones that took maybe five minutes to change and had immediate impact.
Edit your original prompt instead of stacking follow-ups. This was a habit I didn’t realize I had. When Claude gave a mediocre answer, I’d just… reply. Try to redirect. Add more context. The problem is every reply extends the chain, and Claude has to reason over all of it — the good parts, the bad parts, the failed attempts. Going back and editing the original prompt is almost always cleaner. You get a better answer and don’t carry the weight of a failed exchange into the next one.
Reset conversations periodically. Around every 10–20 turns, I’d ask Claude for a summary of where we were, then start a fresh session with that summary as the opening context. Sounds like extra work but it’s really not. You keep almost everything that matters, and you drop all the cruft. Same quality of continuity, fraction of the tokens.
Match the model to the task. This sounds obvious. I still wasn’t doing it. I was defaulting to more powerful models far more than necessary — Opus for things that Sonnet handles easily, Sonnet for things Haiku would knock out. Once I started being intentional about it (Haiku for simple, scoped tasks, Sonnet for most work, Opus only when I genuinely need deep reasoning), the usage dropped noticeably.
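If you launch sessions from the terminal, you can bake the choice into how you start them rather than deciding mid-session. A sketch, assuming the `claude` CLI's `--model` flag and these alias names (the exact aliases available depend on your version — check `claude --help`):

```shell
# Quick, scoped task: a rename, a one-off script, a small fix
claude --model haiku "Rename getUser to fetchUser in src/api/users.ts"

# Default for most day-to-day work
claude --model sonnet

# Reserve the heavyweight model for genuinely hard reasoning
claude --model opus "Debug the race condition in the payment retry logic"
```

A couple of shell aliases (`ch`, `cs`, `co`) make reaching for the cheap model the path of least resistance.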
The ones that actually made a dent
Be specific about files. “Fix this repo” is an expensive instruction. Claude has to orient itself, scan around, figure out what’s relevant before it can do anything. “Look at /src/payments/checkout.ts” skips all of that. The more precisely you point it at what matters, the less searching happens — and searching is where a lot of tokens quietly disappear.
Keep a simple repo map. This is just a short reference document — a few lines — that tells Claude where things live. Auth lives here, payments there, utils somewhere else. You paste it in at the start of a session and Claude navigates directly instead of exploring. It’s a small thing to set up and it changes how sessions feel entirely.
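For illustration, here's what mine roughly looks like — a hypothetical project, made-up paths, just enough to orient the model:

```markdown
# Repo map — paste at session start
- src/auth/      — login, sessions, token refresh
- src/payments/  — checkout flow, provider integration
- src/utils/     — shared helpers (dates, currency, logging)
- tests/         — mirrors the src/ layout
- scripts/       — one-off maintenance scripts; usually irrelevant
```

Noting what's *irrelevant* helps as much as noting what's where — it's an explicit "don't explore here."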
Use /compact to control context growth. Honestly this was my biggest single inefficiency. I wasn’t managing context at all. I just let it grow indefinitely, assuming more context was always better. Running /compact periodically trims it down to what’s actually relevant. Combined with a soft mental limit on session length before I compact or reset, this made a real difference.
Add a .claudeignore. Think of it as a .gitignore for tokens. node_modules, build folders, large generated files — none of that needs to be in Claude’s context. Setting this up once means it’s never a problem again.
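Assuming it follows .gitignore-style glob patterns, a minimal starting point might look like this (adjust for your stack):

```gitignore
# Dependencies and build output — huge, never worth context
node_modules/
dist/
build/

# Generated and minified files
*.min.js
*.map
coverage/

# Large lockfiles
package-lock.json
```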
If you want to go deeper — look at rtk. I came across this repo a few weeks in and found it genuinely useful. The idea is that when Claude Code runs commands, rtk sits in between and strips out tokens that are useful for humans but wasteful for agents — extra whitespace, unnecessary newlines, that kind of thing. Stuff we need to read code comfortably, but Claude doesn’t. It’s more involved to set up, but if you’re using Claude Code heavily it’s worth knowing about.
None of this is individually groundbreaking. But it compounds. The cumulative effect of cleaner prompts, shorter sessions, the right models, and less noise in context — it adds up faster than you’d expect.
I’m still looking to tighten things further. Curious what others have found that works.