Vibe coding is broken. Could controlled natural language (CNL) save it?

Six months ago, “vibe coding” was supposed to change everything. Tell the AI what you want, sit back, and watch it generate working software. Andrej Karpathy, formerly of Tesla and OpenAI, hyped it as the future: forget syntax, just describe your intent. The demos were intoxicating. Startups bragged about entire codebases written by GPT-like copilots. Prototypes spun up in hours. The dream: software without software engineers.

Reality arrived quickly. The code worked—for a week. Then it broke. One fintech startup discovered their AI-generated authentication module was checking passwords after granting access—a reversed if-statement the model had happily produced from the vague requirement “verify user credentials.” The problem isn’t the AI—it’s how we talk to it.
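
To make the failure concrete, here is a hypothetical sketch of that class of bug (not the startup’s actual code): the session exists before the password check ever runs.

SESSIONS: set[str] = set()

def check_password(user: str, password: str) -> bool:
    # Stand-in for a real credential lookup.
    return (user, password) == ("alice", "hunter2")

def login(user: str, password: str) -> None:
    SESSIONS.add(user)                      # access granted first...
    if not check_password(user, password):  # ...credentials checked after,
        SESSIONS.discard(user)              # leaving a window with a live session

login("mallory", "wrong")  # briefly authenticated before the check runs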

Fix the English, not the model

A growing group of researchers thinks the solution isn’t more GPU firepower but more discipline in how humans describe requirements. Enter controlled natural language (CNL): a restricted form of English with strict grammar and vocabulary designed to remove ambiguity.

Think of it as English with training wheels: restricted vocabulary, predictable syntax, zero ambiguity. Instead of writing—

“Handle overdrafts properly with notifications.”

you write—

Rule: If a customer’s balance falls below 0,
then send an overdraft notification email
within 60 minutes.

Both are English. But only the controlled version is specific enough to be parsed, validated, and—crucially—understood consistently by an LLM. The counterintuitive pitch: to make AI code useful, developers may need to give up some linguistic freedom. Not better AI, but better English.
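
In practice, “parsed and validated” can be as simple as a grammar check that runs before the prompt ever reaches the model. A minimal sketch in Python, assuming a toy grammar for rules of the shape above (the pattern and field names are illustrative):

import re

# Toy grammar: "Rule: If <condition>, then <action> within <N> minutes."
RULE = re.compile(
    r"Rule: If (?P<condition>.+?),\s*"
    r"then (?P<action>.+?)\s*"
    r"within (?P<minutes>\d+) minutes\.?",
    re.DOTALL,
)

def parse_rule(text: str) -> dict:
    match = RULE.search(text)
    if match is None:
        raise ValueError("not a well-formed rule; rejected before reaching the model")
    return match.groupdict()

parse_rule("Rule: If a customer's balance falls below 0, "
           "then send an overdraft notification email within 60 minutes.")
# {'condition': "a customer's balance falls below 0",
#  'action': 'send an overdraft notification email', 'minutes': '60'}

The vague version (“Handle overdrafts properly with notifications”) fails that check outright, which is exactly the point.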

Controlled prompts in practice

If this sounds familiar, it is. Behavior-Driven Development (BDD) has used structured “Given/When/Then” English for over a decade. Test frameworks like Cucumber force teams to write requirements in Gherkin’s constrained syntax so they’re executable.
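
For readers who haven’t seen it, here is a hypothetical scenario in that style, reusing the overdraft example from above:

Scenario: Overdraft triggers a notification
  Given a customer with a balance of 10 dollars
  When a charge of 25 dollars is posted
  Then an overdraft notification email is sent within 60 minutes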

Controlled prompting extends that idea upstream. Instead of leaving AI prompts as one-off improvisations, developers define templates. One prototype, CNL-P (Controlled Natural Language for Prompting), looks like structured frontmatter:

Feature: Calculate loan payments
Context: User provides amount, term, interest rate
Requirements:
1. Use amortization formula
2. VIP users get 0.5% discount
Output: Python code + usage example

Feed this to a coding model, and the odds of getting something usable rise compared to a casual “write a loan calculator” prompt.

Why would developers embrace stricter English when they already groan at YAML? Because prompts are already YAML—just invisible, inconsistent YAML that lives in your head. CNL makes that structure explicit and shareable. One engineer’s carefully tuned loan-calculator request becomes a team asset instead of Slack ephemera.
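
As a sketch of what “explicit and shareable” could look like, imagine the CNL-P block above checked into the repo as a template rather than typed fresh each time. The field names mirror the example; everything else here is illustrative.

from string import Template

# A CNL-P-style template, version-controlled instead of living in one
# engineer's chat history.
CNLP_PROMPT = Template("""\
Feature: $feature
Context: $context
Requirements:
$requirements
Output: $output""")

prompt = CNLP_PROMPT.substitute(
    feature="Calculate loan payments",
    context="User provides amount, term, interest rate",
    requirements="1. Use amortization formula\n2. VIP users get 0.5% discount",
    output="Python code + usage example",
)
print(prompt)  # identical text every run, reviewable like any other artifact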

Ambiguity as feature, not bug

Before diving further, it’s worth admitting the strongest critique: ambiguity isn’t always a problem. Sometimes it’s the point.

Loose prompts let models propose solutions humans wouldn’t have specified. Constrain too tightly, and you choke off creativity. Some developers already treat AI assistants like chaotic brainstorming partners—valuable precisely because they color outside the lines.

That’s why most researchers don’t propose controlled language as a replacement for vibe coding, but as a complement. Use free-form prompts for prototyping and exploration. Bring in CNL once the stakes rise: production systems, regulatory logic, multi-team projects. In other words, keep the vibes for early sketches, add discipline for the blueprints.

Context engineering: teaching the AI what matters

Prompt engineering gets the headlines, but context engineering may be just as important. Context is everything the model “sees” before generating code: prior conversation, snippets of source, system instructions, retrieved documents.

This is where controlled specs shine. A CNL requirement isn’t just a better prompt; it’s a retrievable artifact. Because the language is consistent and structured, you can index specs, fetch them with vector search, and feed only the relevant pieces into the model.
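
A minimal sketch of that retrieval step, using TF-IDF similarity from scikit-learn as a stand-in for a real vector database (the specs are illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative CNL specs standing in for an indexed spec repository.
specs = [
    "Rule: If a customer's balance falls below 0, "
    "then send an overdraft notification email within 60 minutes.",
    "Rule: If a login fails 5 times, then lock the account for 30 minutes.",
    "Rule: If a VIP user requests a loan, then apply a 0.5% discount.",
]

vectorizer = TfidfVectorizer()
spec_vectors = vectorizer.fit_transform(specs)

query = "generate the overdraft email handler"
scores = cosine_similarity(vectorizer.transform([query]), spec_vectors)[0]
best_spec = specs[scores.argmax()]  # feed only this into the model's context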

Instead of dumping a 20-page wiki into a single prompt, you could decompose a CNL spec requirement by requirement. Rule 1: generate function A. Rule 2: generate function B. Each piece is modular, traceable, and testable.
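
Under the same assumptions, the decomposition is mechanical: split the spec on its numbered requirements and prompt for one function at a time. Here generate_code is a placeholder for a real model call.

import re

def generate_code(prompt: str) -> str:
    return f"# code generated for: {prompt!r}"  # placeholder for a model call

requirements = """\
1. Use amortization formula
2. VIP users get 0.5% discount"""

# One model call per numbered rule, so each result maps back to its spec line.
rules = re.findall(r"^\d+\.\s*(.+)$", requirements, re.MULTILINE)
for number, rule in enumerate(rules, start=1):
    print(generate_code(f"Implement one function satisfying rule {number}: {rule}"))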

That’s not theoretical. Researchers working on CABERNET and RuleCNL have shown how structured requirements can be broken down and executed step by step. In vibe coding, that translates into an assistant that doesn’t just hallucinate an entire system but assembles it piece by piece against a spec.

Early evidence

Controlled languages aren’t new. Airbus manuals are written in ASD Simplified Technical English, a restricted dialect, so mechanics in Toulouse and Tokyo read the same meaning. Business Rule Engines (BREs) like Corticon let managers write “If X then Y” rules that compile directly into code.

What’s new is the marriage with LLMs. A recent paper, “When Prompt Engineering Meets Software Engineering: CNL-P as Natural and Robust ‘APIs’ for Human-AI Interaction,” tested structured prompt formats on common coding tasks. The results were thin but suggestive: controlled prompts reduced syntax errors by roughly a third and required fewer iterative corrections compared to free-form equivalents. In a handful of student projects, teams using structured prompts reported faster convergence to working solutions. This isn’t enterprise-grade data. It’s lab-scale, with prototypes. But the direction is clear: precision helps.

Why this might fail (and why that’s OK)

Tooling is the first wall. Without IDE support—linters, autocomplete, CI checks—CNL will feel like bureaucratic overhead. The research prototypes are barely past toy status. No one wants to hand-roll specs in Markdown that drift out of sync with code.

The social dynamics are tricky, too. “This is just JIRA tickets with delusions of grandeur,” said one senior engineer who asked not to be named. “Now PMs can write English and expect working code? That’s not empowerment, that’s fantasy.” Another developer who experimented with structured prompts told Ars: “Half the team loved the consistency. The other half felt like we’d replaced coding with filling out forms.”

And don’t forget the models themselves. LLMs are probabilistic text generators. They complete prompts in ways that seem plausible, not necessarily in ways that follow rules. Even when you give them a structured input, they sometimes ignore it. Until models are fine-tuned on controlled languages—or strong validators sit between input and output—compliance will remain inconsistent.
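
Such a validator doesn’t need to be exotic. A minimal sketch using Python’s standard-library ast module checks that the output at least parses and defines the function the spec asked for (the required name here is illustrative):

import ast

def validate_output(generated: str, required_function: str) -> list[str]:
    """Reject model output that breaks basic, checkable promises."""
    try:
        tree = ast.parse(generated)
    except SyntaxError as exc:
        return [f"output is not valid Python: {exc}"]
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    if required_function not in defined:
        return [f"missing required function: {required_function}"]
    return []

print(validate_output("def monthly_payment(p, r, n): ...", "monthly_payment"))  # []
print(validate_output("def pay(): ...", "monthly_payment"))  # non-compliant, flagged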

Finally, there’s the creativity trade-off. Imprecision sometimes yields elegant solutions. Over-constraining may stop the model from surprising us. That may be fine for compliance systems, but less welcome in creative coding.

The future: who’s betting on this?

Work is underway. Researchers are refining CNL-P and building converters that take messy English and suggest controlled rephrasings. Academic teams behind CABERNET and RuleCNL continue publishing prototypes for business rules. A handful of open-source projects on GitHub experiment with “prompt linters” that flag vague terms like fast or secure.
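
The core of such a linter fits in a few lines; the vague-term list below is illustrative:

# Minimal prompt linter: flag vague terms that a controlled vocabulary
# would force the author to replace with something measurable.
VAGUE_TERMS = {"fast", "secure", "properly", "robust", "scalable", "user-friendly"}

def lint_prompt(prompt: str) -> list[str]:
    words = {word.strip(".,;:!?").lower() for word in prompt.split()}
    return sorted(VAGUE_TERMS & words)

print(lint_prompt("Handle overdrafts properly with fast, secure notifications."))
# ['fast', 'properly', 'secure']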

But the ecosystem is thin. For this to matter, controlled prompts need IDE plugins, version-controlled specs, and CI integration. They need to be first-class artifacts, not research curiosities.

The likely path is incremental. Teams may start by templating just one class of prompts—API endpoints, data validators—before rolling out controlled patterns more broadly. Hybrid workflows are plausible: use vibes for prototyping, then lock down specs in CNL once features stabilize.

Bottom line

Vibe coding isn’t dead, but it has a maintainability crisis. Controlled natural language is one proposed fix: less improvisation, more structure. Even skeptics might accept this trade: stricter English for code that doesn’t rot.

The evidence is early, the tooling immature, and the social resistance real. But the question is provocative: what if the future of programming isn’t just better AI models, but better English?

The future of programming might not be written in Python or Rust. It might be written in a version of English disciplined enough to treat like code—and casual enough that humans can still argue about it over lunch.
