Here's how almost everyone uses AI: you type a request, read the answer, decide it's not quite right, and type a slightly better request. You are the loop. You're the one holding the standard, spotting the misses, and deciding when it's good enough. The model just answers and forgets.
A loop moves that job into the system. Instead of one ask, you set up a small machine that asks, checks its own work, and asks again — until it's good or it runs out of tries. You stop babysitting; the loop carries the standard.
A loop is three things
- Objective — what "good" actually means, in one sentence. Not "write a good post" — "teach one useful idea in under 200 words."
- Metric — how the system itself can tell a better attempt from a worse one, without you reading every word. Usually a short list of yes/no checks. "Does the first line earn the click? Is there exactly one idea? Would a smart reader learn something new?"
- Boundary — how far it runs on its own before it stops or checks back. "Try up to 10 times, then show me the best 3."
Get those three right and the model can grade its own drafts, throw out the weak ones, and keep the strong ones — on its own.
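It can help to write the three parts down as data before building anything. Here's a minimal sketch in Python. The class name `LoopSpec` and its field names are made up for illustration, and the defaults simply mirror the "try up to 10 times, then show me the best 3" example above.

```python
from dataclasses import dataclass


@dataclass
class LoopSpec:
    """The three parts of a loop as plain data (all names are illustrative)."""

    objective: str          # what "good" means, in one sentence
    checks: list[str]       # the metric: yes/no questions about a single draft
    max_attempts: int = 10  # the boundary: how many tries before stopping
    keep_best: int = 3      # the boundary: how many drafts to show at the end

    def validate(self) -> None:
        # Without an objective or checks, the loop has no standard to carry.
        if not self.objective.strip():
            raise ValueError("objective must not be empty")
        if not self.checks:
            raise ValueError("metric needs at least one yes/no check")
        if self.max_attempts < 1 or self.keep_best < 1:
            raise ValueError("boundary values must be at least 1")


spec = LoopSpec(
    objective="Teach one useful idea in under 200 words.",
    checks=[
        "Does the first line earn the click?",
        "Is there exactly one idea?",
        "Would a smart reader learn something new?",
    ],
)
spec.validate()  # raises ValueError if any of the three parts is missing
```

If you can't fill in one of the fields, that's usually the part of the loop you haven't decided yet.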
flowchart LR
O([Objective]) --> W[Write a draft]
W --> J{Judge it against the metric}
J -- "below the bar" --> I[Critique & rewrite]
I --> J
J -- "clears the bar" --> D([Keep it])
J -. "hit the boundary" .-> D
Why it beats asking once
One-shot prompting gives the model no feedback: it answers into the void and you do the quality control by hand. A loop closes that gap. Every attempt gets scored, and the score steers the next attempt, so quality compounds instead of depending on you catching every flaw. The same idea runs under much of the impressive "agentic" AI you've seen. Often the difference isn't a bigger model; it's a model in a loop.
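The draft, judge, rewrite cycle can be sketched in a few lines of Python. This is one possible shape, not a definitive implementation: `run_loop` and its parameters are hypothetical names, and the toy writer, judge, and rewriter below are stand-ins so the sketch runs without any model. In practice each would be a model call.

```python
from typing import Callable


def run_loop(
    write: Callable[[], str],
    judge: Callable[[str], tuple[float, str]],
    rewrite: Callable[[str, str], str],
    bar: float,
    max_attempts: int,
) -> tuple[str, float, int]:
    """Draft, judge, and rewrite until a draft clears the bar or tries run out.

    Returns (best_draft, best_score, attempts_used).
    """
    draft = write()
    best_draft, best_score = draft, float("-inf")
    for attempt in range(1, max_attempts + 1):
        score, critique = judge(draft)
        if score > best_score:
            best_draft, best_score = draft, score
        if score >= bar:
            return best_draft, best_score, attempt  # clears the bar
        if attempt < max_attempts:
            draft = rewrite(draft, critique)  # the critique steers the next try
    return best_draft, best_score, max_attempts  # hit the boundary


# Toy stand-ins (assumptions for the demo): the judge rewards shorter drafts
# and the rewriter drops the last word.
def toy_write() -> str:
    return "loops let a model check and improve its own drafts automatically"


def toy_judge(draft: str) -> tuple[float, str]:
    words = len(draft.split())
    return 10 - words, "too long" if words > 6 else "ok"


def toy_rewrite(draft: str, critique: str) -> str:
    return " ".join(draft.split()[:-1])


best, score, used = run_loop(toy_write, toy_judge, toy_rewrite, bar=4, max_attempts=10)
print(best, score, used)
```

Notice that the loop keeps the best draft seen so far, so hitting the boundary still returns something useful instead of just the last attempt.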
The catch: the metric is the hard part
This is where loops live or die. A vague check like "is it engaging?" gives the judge nothing solid to grade, so it returns mushy 7s on everything and the loop stops discriminating. The fix is to make the metric specific and observable — a question you could answer yes/no about a single draft. Writing a good metric forces you to actually decide what "good" means, which is most of the work and most of the value.
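To make "specific and observable" concrete, here's a small sketch where every check is a yes/no question about one draft. Only the 200-word limit comes from the example objective earlier; the other two checks are invented for illustration. Checks like "would a smart reader learn something new?" can't be written as a function and would need a judge model instead.

```python
from typing import Callable


def first_line(draft: str) -> str:
    lines = draft.strip().splitlines()
    return lines[0] if lines else ""


# Each entry pairs a yes/no question with a function that answers it.
CHECKS: dict[str, Callable[[str], bool]] = {
    "Is it under 200 words?": lambda d: len(d.split()) < 200,
    "Is the first line under 15 words?": lambda d: len(first_line(d).split()) < 15,
    "Does it avoid the filler phrase 'in today's world'?":
        lambda d: "in today's world" not in d.lower(),
}


def grade(draft: str) -> tuple[int, list[str]]:
    """Return (number of checks passed, questions the draft failed)."""
    failed = [question for question, passes in CHECKS.items() if not passes(draft)]
    return len(CHECKS) - len(failed), failed


passed, failed = grade("In today's world, loops matter.\nA loop checks its own work.")
print(passed, failed)
```

The list of failed questions is what makes this more useful than a single number: it tells the next rewrite exactly what to fix.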
One more upgrade: have a different model do the judging than the one that wrote the draft. A model grading its own work flatters itself; a fresh judge with a strict rubric is far harder to fool.
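Wiring in a separate judge might look like the sketch below. Everything named here is an assumption: `call_model(model_name, prompt)` is a hypothetical stand-in for whatever client you use, not a real library call, and `"judge-model"` is a placeholder name. The canned reply only exists so the example runs offline.

```python
from typing import Callable

# Hypothetical signature: call_model(model_name, prompt) -> response text.
ModelCall = Callable[[str, str], str]


def make_judge(call_model: ModelCall, judge_model: str, checks: list[str]):
    """Build a judge that asks a separate model to answer each check YES or NO."""

    def judge(draft: str) -> tuple[int, list[str]]:
        questions = "\n".join(f"{i}. {q}" for i, q in enumerate(checks, start=1))
        prompt = (
            "You are a strict reviewer. You did not write this draft.\n"
            "Answer each question with exactly YES or NO, one per line.\n\n"
            f"Questions:\n{questions}\n\nDraft:\n{draft}"
        )
        reply = call_model(judge_model, prompt)
        answers = [line.strip().upper() for line in reply.splitlines() if line.strip()]
        # Anything other than a clear YES counts as a failure,
        # including answers the judge left out.
        failed = [
            q for i, q in enumerate(checks)
            if i >= len(answers) or not answers[i].startswith("YES")
        ]
        return len(checks) - len(failed), failed

    return judge


def fake_call_model(model_name: str, prompt: str) -> str:
    # Canned reply so the sketch runs without a real model.
    return "YES\nNO\nYES"


judge = make_judge(
    fake_call_model,
    "judge-model",
    ["Is there exactly one idea?", "Is it under 200 words?", "Is it new to the reader?"],
)
passed, failed = judge("some draft")
print(passed, failed)
```

Treating unclear or missing answers as failures keeps the judge strict: a draft has to earn each YES.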
Design one right now
Fill in the three parts below and watch a real, runnable loop prompt assemble itself. Change the objective to your task, sharpen the checks, set the bar — then copy it into any chat model, or run it for real.
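If you'd rather assemble the prompt in code, a minimal sketch could look like this. The function name `build_loop_prompt` and the prompt wording are my own assumptions, not the exact text this page's builder produces.

```python
def build_loop_prompt(objective: str, checks: list[str], bar: int, max_attempts: int) -> str:
    """Assemble a self-checking loop prompt from objective, metric, and boundary."""
    if not 1 <= bar <= len(checks):
        raise ValueError("bar must be between 1 and the number of checks")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(checks, start=1))
    return (
        f"Objective: {objective}\n\n"
        "Work in a loop:\n"
        "1. Write a draft.\n"
        "2. Answer each check below with YES or NO for that draft.\n"
        f"3. If fewer than {bar} checks are YES, critique the draft and rewrite it.\n"
        f"4. Stop when a draft passes at least {bar} checks, "
        f"or after {max_attempts} attempts.\n\n"
        f"Checks:\n{numbered}\n\n"
        "When you stop, show the best draft and its YES/NO answers."
    )


prompt = build_loop_prompt(
    objective="Teach one useful idea in under 200 words.",
    checks=[
        "Does the first line earn the click?",
        "Is there exactly one idea?",
        "Would a smart reader learn something new?",
    ],
    bar=3,
    max_attempts=10,
)
print(prompt)
```

Paste the printed text into any chat model to try it. Asking for the YES/NO answers alongside the final draft lets you see whether the model actually graded itself.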
This isn't theory — this site runs on one
Every news post and playbook here is written by one model, then scored 1–10 by a separate model against a rubric, and rewritten if it falls short of the bar before it's allowed to publish. That loop is why the site can run itself without a human reading every line. The metric is the rubric; the boundary is the score threshold. It's the exact shape described above, in production.
Once you've felt how much better a loop is than one-shot, you'll want to build your own — pick a task, write the objective, sharpen the metric, set the boundary, and let it run.