Token limits have been the number one problem of vibecoding. Aside from simply heavy usage, two key causes contribute to this problem: vague prompting and inefficient context management. Tokens are counted from two things: the input the agent receives (your prompt plus any context sent with it) and the agent’s output. You can have the best prompt in the world, but if your AI pukes out a Shakespearean essay for every response, you will hit your token limit faster. Similarly, if your prompt is too long, your token usage will skyrocket.
Prompting
Prompting seems easy enough, but the reality is that quality output requires input of similar quality. Here are some tips for writing better, more token-efficient prompts.
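Stick to English only
Some languages use more tokens than others to express the same thing. Switching between languages mid-conversation is not a good idea either. Sure, you can control the language of your input prompt, but the agent may reply in that language too, and the output may then use more tokens.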
Extremely short input
Do not use ‘please’ or ‘thank you’. Get rid of filler words. Straight to the point. Talk like a caveman: ‘I want to make a fire by heating up wood, thanks.’ is bad; ‘Make fire with wood.’ is good.
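To get a feel for the difference, here is a rough Python sketch. It does not use a real tokenizer; it assumes the common rule of thumb of about four characters per token for English text, so treat the numbers as ballpark estimates only. The `estimate_tokens` helper and the `CHARS_PER_TOKEN` constant are hypothetical names made up for this example.

```python
import math

# Rule of thumb (approximation only): ~4 characters per token for English text.
# Real counts depend on the specific model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of English text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


polite = "I want to make a fire by heating up wood, thanks."
caveman = "Make fire with wood."

for label, prompt in [("polite", polite), ("caveman", caveman)]:
    print(f"{label}: {len(prompt)} chars, ~{estimate_tokens(prompt)} tokens")
```

On this estimate, the caveman prompt costs well under half the tokens of the polite one, and the saving repeats on every message you send.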
Extremely short output
You caveman. Is agent caveman too? Install the caveman extension to keep the agent’s output as terse as your input. Installing it is easy and well documented, so go check it out.
Do not refactor outputs
Why ask your agent to refactor code it just gave you? Every refactor request is another full round of input and output tokens. Instead, make your prompt clear enough that the output code doesn’t need refactoring afterwards.
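If you want to check how many tokens your prompts use, paste them into this tokenizer website. Keep in mind that different models use different tokenizers, so treat the count as an estimate for your agent.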
Context Management
Set reference points
Reference the files (e.g. @file_path/file) and functions you want the agent to work with. Do not paste code directly into the prompt; mentioning it is better. Code output that builds on existing code generally uses fewer tokens than completely new code.
Long context is expensive
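Start a new session for each new task, and clear the conversation once you’re done, because the conversation history is typically sent back to the model along with every new prompt. Regularly review context usage with the /context command.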
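Why does a long session hurt so much? If every turn resends the whole history, each new prompt pays again for everything said before it. The Python sketch below assumes exactly that, with no prompt caching or automatic compaction (which some tools do use), and uses made-up, fixed sizes for every prompt and reply. The `session_input_tokens` helper is hypothetical.

```python
def session_input_tokens(turns: int, prompt_tokens: int, reply_tokens: int) -> int:
    """Total input tokens over one session, assuming every turn resends the full
    history (all earlier prompts and replies) plus the new prompt, with no
    caching or compaction. Sizes are illustrative, not measured."""
    total = 0
    history = 0  # tokens already sitting in the conversation
    for _ in range(turns):
        total += history + prompt_tokens         # input sent on this turn
        history += prompt_tokens + reply_tokens  # prompt and reply join the history
    return total


# Same 20 turns of work: one long session vs. four fresh 5-turn sessions.
one_long_session = session_input_tokens(20, prompt_tokens=100, reply_tokens=400)
four_short_sessions = 4 * session_input_tokens(5, prompt_tokens=100, reply_tokens=400)
print(one_long_session, four_short_sessions)  # 97000 22000
```

Under these assumptions, input tokens grow roughly quadratically with the number of turns, which is why fresh sessions for separate tasks pay off. This only holds when the tasks really are independent; if a task needs earlier context, you pay for it either way.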
Use context management engines
This one is a bit open-ended; there is no one-size-fits-all solution here. Some people use a custom-made skill, some use extensions, and some use standalone software. It’s completely up to you. The key is to give your agent just enough context: enough for a good response, but no more, so token usage stays efficient. One tool that I use is Vexp.
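To make the "just enough context" idea concrete, here is a toy Python sketch of what a context-selection step could look like: score each candidate snippet by how many words it shares with the task, then add the most relevant ones until a token budget runs out. This is a hypothetical illustration, not how Vexp or any other specific tool works; real tools likely use much smarter relevance signals. The helper names, the example snippets, and the ~4 characters per token estimate are all assumptions.

```python
import math
import re

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def words(text: str) -> set[str]:
    """Lowercase word set; underscores are kept so identifiers stay whole."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def select_context(task: str, snippets: dict[str, str], budget: int) -> list[str]:
    """Hypothetical selector: most task-relevant snippets that fit the token budget."""
    task_words = words(task)
    scores = {name: len(task_words & words(text)) for name, text in snippets.items()}
    # Most relevant first; ties keep their original order (sort is stable).
    ranked = sorted(snippets, key=lambda name: scores[name], reverse=True)
    chosen, used = [], 0
    for name in ranked:
        if scores[name] == 0:
            break  # the rest share nothing with the task
        cost = estimate_tokens(snippets[name])
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen


snippets = {
    "auth.py": "def login(user, password): check password hash for user",
    "billing.py": "def charge(card, amount): send amount to payment gateway",
    "session.py": "def create_session(user): store login session token for user",
}
task = "fix login bug for user password"
print(select_context(task, snippets, budget=20))
print(select_context(task, snippets, budget=40))
```

The irrelevant billing snippet is never sent, and a tighter budget keeps only the single most relevant file.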
TLDR
Why use many words when few do tricks? Embrace caveman conversation to use fewer tokens. Hope this post helps with your vibecoding sessions.