Menu Planning: How to Scale Without Burning Your Budget

Most restaurants aim to get a first dish on the table within five to ten minutes of the order being placed. Not because salads are the most important part of the meal, but because the kitchen needs breathing room. If every table expects their entrée immediately, the line gets slammed, tickets back up, and quality collapses. You serve the salad first so the sauté station has time to properly fire the steak.

This sequencing is called menu flow, and it’s the difference between a kitchen that hums and one that burns. The same principle applies to AI. Most companies are building what I call “God Prompts,” massive blocks of instructions sent to expensive models, expecting complex results in a single shot. Like a table asking for all of the dishes to be served at the same time, the table has no space to hold all the plates and the kitchen gets backed-up. The system gets overwhelmed, and you pay premium prices for inconsistent results.

The solution is prompt chaining: breaking complex tasks into sequential steps where the output of one model becomes part of the input for the next. Each step handles what it’s designed for. Nothing gets overwhelmed. And critically, you place your core instruction right before asking for outputs, not at the beginning just to become buried under layers of context. Just like a chef checks the ticket one final time before the plate leaves the pass.

The God Prompt Problem

I recently worked on a consulting project involving a high-volume search platform. The goal was to take raw, often ambiguous user queries and break them down into four key areas:

Core User Intent: What are they actually trying to do?
Main Keywords: What are the essential “ingredients” of the search?
Areas of Ambiguity: Where is the query or unclear?
Potential Next Searches: What is the next course in the user’s journey?

Early versions of this tool used a single, top-tier reasoning model to do everything. One prompt. One model. One shot. The results were okay, but the cost was astronomical. The system was rushing the work, trying to be fast without being smooth, and burning through budget on parts of the task that didn’t require that level of horsepower.

The prompt itself was a monster. Thousands of tokens of instructions, examples, and context crammed into a single request. Every query, no matter how simple, triggered the full machinery. Someone searching for “pizza near me” got the same expensive treatment as someone typing an ambiguous multi-part research question. We were using a sledgehammer to hang picture frames.

Worse, the model was trying to do everything at once. Understand intent, extract keywords, identify ambiguity, and suggest next steps, all in a single cognitive pass. Like an overwhelmed cook trying to work every station at once, a guarantee for burnt food. In AI, it’s a recipe for inconsistent output and ballooning costs.

We decided to rebuild the workflow using menu planning.

The Brigade System

A successful restaurant survives on its food cost, which usually needs to hover around 20-30%. If you use high-cost ingredients for low-value side dishes, you lose your margin before you sell your first entrée. AI pricing works the same way. Every token has a cost, and every model has a price tier.

The brigade system is how professional kitchens have organized labor for over a century. Each person owns a specific role. When we rebuilt our search tool, we mapped this structure onto our AI workflow with three tiers:

The Head Chef (Reasoning Model): Highest cost, highest judgment. The Head Chef doesn’t cook. They read the ticket, interpret any modifications, and create the plan. They understand that table four wants their steak medium-rare with sauce on the side and the allergy note means no butter on the vegetables. They translate the customer’s intent into clear instructions for the line and determine the firing order so dishes arrive together without overwhelming any single station. You pay premium prices for this interpretation and planning work.

The Cooks (General Generation Models): Mid-range cost. These are your workhorses. Once the Head Chef has read the ticket and called the order, the cooks execute. They handle the bulk of the actual production: extracting keywords, drafting responses, doing the reliable station work that makes up most of any workflow. They don’t need to interpret the customer’s intent. That decision was already made. They just need to execute their station consistently.

The Runners (Summarization Models): Low cost, high speed. Runners don’t cook and they don’t plan. They move output from the kitchen to the table. In AI terms, they clean up final formatting, summarize results, and package everything for delivery. Fast, cheap, and focused on presentation rather than production.

If you don’t match the task to the right role, you aren’t being thorough. You’re being wasteful.

Rebuilding the Workflow

Here’s how we applied the brigade system to our search platform. The critical insight was that each of our four components, intent, keywords, ambiguity, and next searches, became a separate output request. Each step’s output informed the inputs for the steps that followed. The chain built on itself.

Phase 1: Reading the Ticket

We used a reasoning model strictly for interpretation and planning. This model didn’t produce any of the final deliverables. It acted as the Head Chef reading the ticket.

Given a raw user query, it analyzed what the user was actually hungry for. Information? A transaction? A specific location? It identified any “modifications to the order,” the nuances and constraints that would change how downstream steps should execute. And it produced a plan: which components needed the most attention, what context was most relevant, and how the pieces should flow together.

This step required genuine judgment. The difference between “apple” as a fruit and “Apple” as a company changes everything downstream. So we paid the Head Chef price for this interpretation work. But only for this interpretation work.

Phase 2: Setting the Station

Instead of cramming everything into a single call, we used split prompting. This is the digital version of mise en place. Before service, a cook doesn’t just grab ingredients when they need them. They prep everything in advance. Sauces are reduced. Proteins are portioned. Garnishes are cut and waiting. When the ticket fires, the cook isn’t thinking about preparation. They’re thinking about execution.

We did the same thing with our context window. We added turns with instructions, output format requirements, and the planning context from the thinking model. We prepped the station so that by the time the model had to fire the response, all the ingredients were already in place. No scrambling. No confusion. Just execution.

This separation matters more than it might seem. When you dump everything into one prompt, the model has to simultaneously understand what you want, absorb the context, and generate the output. That’s cognitive overload. By splitting the prep from the cooking, we let the model focus on one thing at a time.

Phase 3: The Line Work

Once the Head Chef’s plan was in place and context was loaded, we ran the four components as a sequence of separate calls. Keyword extraction came first, and its output became part of the input for intent classification. Intent informed the ambiguity analysis. All three fed into the next-search suggestions.

Each call used a mid-tier cook model. These models didn’t need to understand the full picture. They received clear instructions from the planning phase and executed their specific station. The chain meant that early precision compounded. Good keyword extraction made intent classification easier, which made ambiguity detection sharper, which made next-search suggestions more relevant.

Phase 4: Running the Plates

Finally, a fast runner model packaged everything for delivery. It took the outputs from each station and formatted them into clean, consistent deliverables. No deep thinking required. Just presentation.

The Results: By breaking the dish into components and using the brigade system, the models replicated human reasoning at a rate of over 90% across all four key areas. We didn’t lose quality by using cheaper models. We gained precision by giving each model a specific role they were designed to master.

The math was simple but striking. Our reasoning model, the Head Chef, now handled maybe 15% of the total workload, all of it interpretation and planning. The rest was distributed across cheaper, faster models that didn’t need to think deeply. They just needed to execute consistently. Total cost per query dropped by nearly 60% while accuracy actually improved. Specialization beats generalization when you design the workflow correctly.

Separating Thinking from Output

There’s another kitchen concept that proved essential: the difference between what happens in the kitchen and what arrives at the table.

When the Head Chef reads a ticket and plans the firing order, they’re thinking through dependencies, timing, and potential problems. The cooks don’t need to hear all of that reasoning. They need clear instructions: “Fire two ribeyes medium-rare, hold the butter on the veg for table four.” The thinking informs the instruction, but the instruction is what gets executed.

We built this same separation into our AI workflow. The reasoning model’s thinking process, its full chain of interpretation and planning, was captured separately from its output instructions. The cook models received only the clean instructions they needed to execute their station.

But here’s why the separation matters beyond efficiency: human reviewers need that thinking. When something goes wrong, when a query gets misclassified or keywords come back wrong, the thinking trace is how you diagnose the problem. Was the intent interpretation off? Did the plan make sense but the execution fail? Without visibility into the Head Chef’s reasoning, you’re debugging blind.

So we kept two channels: the thinking channel for human review and debugging, and the output channel for downstream model consumption. The kitchen gets clean tickets. The manager gets the full picture.

Future Possibilities

One pattern we didn’t use on this project but is worth exploring: running a smaller model in parallel with a more complex process to improve overall quality. Imagine a prep cook who watches the line and flags potential problems before they become mistakes. A cheap monitoring model that reviews outputs in real-time and kicks edge cases back to the Head Chef for re-evaluation. The cost is minimal. The quality improvement could be significant. It’s the digital equivalent of having a sous chef taste every sauce before it leaves the station.

Stabbing the Ticket

There’s one more technique worth mentioning, because it solved a problem that almost derailed the whole project.

There is a documented phenomenon in AI called “lost in the middle.” If you give a model a massive amount of context, it often forgets the original instruction by the time it reaches the end of the prompt. The model gets distracted by all the data you’ve fed it and loses sight of what you actually asked for.

A Google study confirmed that restating the user’s query right before you ask for the final output dramatically improves results.

In the kitchen, we call this stabbing the ticket. A Chef reads the order when it first hits the kitchen to get the firing order started. But right before the plate leaves the pass, the Chef looks at the ticket one more time. They verify the steak is medium-rare. They confirm the sauce is on the side. They refocus on the intent right at the moment of delivery and if something is wrong they get a chance to fix it before the guest ever realizes there was a mistake.

This is why prompt structure matters so much. Your core instruction belongs at the beginning and the end, right before you ask for output. The model should be reminded of your intent at the moment it matters most.

In our workflow, we implemented this ticket recall at the end of every chain. We restated the core user intent just before the final summary. It’s a small addition. Maybe one hundred extra tokens. But it was the difference between inconsistent output and reliable precision.

The Digital Chef’s Playbook

Scaling an AI system isn’t a challenge of raw power. It’s a challenge of discipline and orchestration. If you want to move from being a line cook of syntax to a Head Chef of automation, you have to stop looking for magic and start planning your menu.

Serve the salad first. Don’t try to fire every station at once. Sequence your workflow so each step has room to execute properly. The thinking model plans. The cooks execute. The runners deliver. Nobody gets overwhelmed.

Chain your outputs. Each step should inform the next. Keyword extraction improves intent classification improves ambiguity detection. Let precision compound through the workflow.

Separate thinking from instructions. Your reasoning model’s full thought process is valuable for debugging, but your execution models need clean, focused instructions. Keep both channels, but don’t cross the streams.

Stab the ticket. Place your core instruction right before asking for output. Refocusing attention at the end of the process is the difference between a mess and a masterpiece.

AI is changing how we work, but the ancient rules of the kitchen still apply. Get your station ready. Plan your menu. And never serve a dish you haven’t tasted yourself.

I’ll see you on the line.