What Belongs In A Sales Agent Runbook Before It Touches Leads

Turn ICP rules, disqualifiers, approval gates, and tone rules into a safer outbound system — written before the agent is built, not after the cleanup meeting.

By Faizan Muhammad · Ink Persuasion AI Outbound Systems · 13 min read

Most "AI sales agent" projects fail for the same reason. The team builds the agent first, plugs it into a lead list second, and writes the operating rules third — or never. Then the agent sends a $50k enterprise pitch to a solo freelancer, or promises a feature the product does not have, or fires 800 personalized emails into a list of 200 because the dedup logic was not in the runbook.

The result is a damaged reputation, an angry sales team, and a long meeting about "what went wrong." The fix is to write the runbook before the agent is built. Here is what every safe outbound agent needs in place on day one.

If you are replacing enrichment tools with agentic workflows, this runbook is the control layer that keeps the system useful. It pairs with the Clay-to-Codex replacement workflow: first define what the agent is allowed to do, then let the agent move faster inside those limits.

The four guardrails every agent needs

Every sales agent, no matter how simple, needs four documents written and signed off before it touches a real lead:

The ICP rules. Who is in and who is out.
The disqualifiers. The signals that mean "do not contact, even if they match ICP."
The approval gates. The moments where a human must look before the agent acts.
The tone rules. How the agent speaks, what it is allowed to claim, and what is banned.

If any of these is missing, the agent will invent one. It will pick its own ICP. It will approve its own sends. It will write in the most generic SaaS voice on earth. None of those defaults will match your business.

ICP rules: be ruthless about exclusions

The biggest mistake in ICP rules is including too much. "Marketing leaders at B2B SaaS companies with 50 to 5,000 employees in North America" sounds precise. It is not. It includes the CMO at a 50-person bootstrapped company who cannot afford your enterprise tier, the content marketing intern at a 5,000-person company who has no budget authority, and every marketing leader who joined last week and is still learning their own stack.

A good ICP rule has both inclusion and exclusion. For example:

Include: VP of Sales or above at B2B SaaS companies, 200 to 1,000 employees, US or UK, currently using a competing tool.
Exclude: companies less than 2 years old, companies in regulated industries (healthcare, finance, government), companies with fewer than 20 employees, anyone whose title contains "assistant" or "coordinator."

The exclusion list is what stops the agent from spamming people who will never buy. Write at least as many exclusion rules as inclusion rules. The exclusion rules are doing more work — the inclusion list is just the targeting hint on top.
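
One way to make the include and exclude lists above checkable by code is to store them as data and evaluate exclusions first. This is a minimal sketch under stated assumptions: the `Lead` fields, the helper names `exclusion_reasons` and `matches_icp`, and the keyword reading of "VP of Sales or above" are all hypothetical, not part of any particular CRM or enrichment schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical lead record; field names are assumptions for illustration.
@dataclass
class Lead:
    title: str
    industry: str
    employees: int
    country: str
    company_age_years: float
    uses_competing_tool: bool

REGULATED = {"healthcare", "finance", "government"}
BANNED_TITLE_WORDS = ("assistant", "coordinator")
# Assumed interpretation of "VP of Sales or above"; adjust to your own titles.
SENIOR_TITLE_WORDS = ("vp of sales", "chief revenue officer")

def exclusion_reasons(lead: Lead) -> List[str]:
    """Return every exclusion rule the lead trips (empty list means none)."""
    reasons = []
    if lead.company_age_years < 2:
        reasons.append("company younger than 2 years")
    if lead.industry.lower() in REGULATED:
        reasons.append("regulated industry")
    if lead.employees < 20:
        reasons.append("fewer than 20 employees")
    title = lead.title.lower()
    if any(word in title for word in BANNED_TITLE_WORDS):
        reasons.append("junior title")
    return reasons

def matches_icp(lead: Lead) -> bool:
    """Exclusions are checked first and always win over inclusions."""
    if exclusion_reasons(lead):
        return False
    title = lead.title.lower()
    return (
        any(word in title for word in SENIOR_TITLE_WORDS)
        and lead.industry.lower() == "b2b saas"
        and 200 <= lead.employees <= 1000
        and lead.country in {"US", "UK"}
        and lead.uses_competing_tool
    )
```

Returning the reasons rather than a bare boolean keeps a record of why each lead was skipped, which is useful when the exclusion list is later tuned.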

Disqualifiers: signals that override ICP

A lead can match your ICP perfectly and still be the wrong person to email. The disqualifier list catches these:

Unsubscribe or hard bounce history. If they previously asked to be removed, or if their address has bounced, the agent must not contact them under any circumstances.
Existing customer. Do not pitch a current customer the same product they already pay for. Route them to expansion instead.
Active opportunity in pipeline. If a human is already working this account, the agent steps aside.
Competitor employees. Do not email them through the standard outbound flow. Different channel, different message, often human-only.
Bad-fit signals in their public footprint. If their LinkedIn shows they are leaving the role next month, do not start a 6-week nurture sequence. If their company just announced a hiring freeze, do not pitch expansion.

These are not "nice to have." Each one is a known failure mode from past campaigns. The disqualifier list is the institutional memory of "do not do this again."
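
The list-based disqualifiers above can be sketched as a single lookup that runs before any message is drafted. Everything here is an assumption for illustration: the `SuppressionData` container, the `disqualify` helper, and the idea that suppression lists arrive as sets of lowercase emails and domains. The public-footprint signals are left out because they need a research step rather than a lookup.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class SuppressionData:
    """Hypothetical snapshot of the lists checked before contacting anyone.

    All values are assumed to be stored in lowercase.
    """
    unsubscribed: Set[str] = field(default_factory=set)
    hard_bounced: Set[str] = field(default_factory=set)
    customer_domains: Set[str] = field(default_factory=set)
    open_opportunity_domains: Set[str] = field(default_factory=set)
    competitor_domains: Set[str] = field(default_factory=set)

def disqualify(email: str, data: SuppressionData) -> Optional[str]:
    """Return the first disqualifier that applies, or None if the lead is clear."""
    email = email.strip().lower()
    domain = email.rsplit("@", 1)[-1]
    if email in data.unsubscribed:
        return "unsubscribed"
    if email in data.hard_bounced:
        return "hard_bounce"
    if domain in data.customer_domains:
        return "existing_customer"  # route to expansion instead
    if domain in data.open_opportunity_domains:
        return "active_opportunity"  # a human already owns this account
    if domain in data.competitor_domains:
        return "competitor_employee"
    return None
```

The check runs independently of the ICP match, so a perfect-fit lead is still blocked when any list contains it.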

Approval gates: where a human looks

Approval gates are the brakes on the agent. Every agent should have at least three:

First-send gate. Before the agent sends its first message to a new lead, a human reviews the message and the lead. This catches the most embarrassing errors (wrong name, wrong company, tone mismatch) before they go out. Once the human has signed off on 50 to 100 leads, the gate can downgrade to a 10 percent sample rate. The gate should not stay at 100 percent indefinitely, because that defeats the point of automation.
High-value lead gate. Any lead matching a "whale" pattern — a company with 1,000+ employees, or a C-level title at a target account — must get human review before send. The cost of a wrong message here is too high to automate fully.
Reply gate. When a prospect replies, the agent does not respond autonomously unless the reply is one of a small set of pre-approved responses (out of office, unsubscribe confirmation, simple FAQ). For everything else, the agent pauses and routes the conversation to a human. This is the most important gate. Auto-responding to replies is where most AI agents embarrass themselves.

The "we will review later" approach does not work. Once 500 messages have gone out, no one is going to review anything. The review has to happen before, on a sample, with teeth.
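
The three gates can be expressed as two small routing decisions, one for outgoing sends and one for replies. This is a sketch, not a prescribed implementation: the function names, the `warmup` threshold of 100 approved leads (the upper end of the 50 to 100 range above), the token-based C-level detection, and the reply type labels are all assumptions.

```python
import random

# Reply categories the agent may answer on its own (labels are hypothetical).
AUTO_REPLY_TYPES = {"out_of_office", "unsubscribe_confirmation", "simple_faq"}
C_LEVEL_TOKENS = {"ceo", "cfo", "cto", "coo", "cmo", "cro", "chief"}

def is_whale(employees: int, title: str, is_target_account: bool) -> bool:
    """High-value pattern: 1,000+ employees, or a C-level title at a target account."""
    tokens = set(title.lower().replace(",", " ").split())
    c_level = bool(tokens & C_LEVEL_TOKENS)
    return employees >= 1000 or (c_level and is_target_account)

def send_needs_human(
    approved_so_far: int,
    employees: int,
    title: str,
    is_target_account: bool,
    rng: random.Random,
    warmup: int = 100,
    sample_rate: float = 0.10,
) -> bool:
    """Decide whether a human must review this send before it goes out."""
    if is_whale(employees, title, is_target_account):
        return True  # the high-value gate never downgrades
    if approved_so_far < warmup:
        return True  # first-send gate at 100 percent during warm-up
    return rng.random() < sample_rate  # afterwards, review a random sample

def reply_needs_human(reply_type: str) -> bool:
    """Everything outside the pre-approved reply set pauses for a human."""
    return reply_type not in AUTO_REPLY_TYPES
```

Passing the random generator in as an argument keeps the sampling reproducible in tests.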

Tone rules: what the agent is allowed to say

Tone rules are the longest section of the runbook and the one most often skipped. The default voice of an LLM is generic SaaS marketing copy. If you let that voice run, your outbound will sound like every other pitch in the prospect's inbox.

A useful tone rules document has:

A voice reference. Two or three example emails a real human at your company has actually sent and gotten replies on. The agent matches the voice of those emails, not "best practices for cold email."
Banned phrases. The exact words and constructions that are not allowed: "I hope this email finds you well," "I wanted to reach out," "just circling back," "touch base," "synergy," "leverage." Add every phrase that makes you flinch when you read it in someone else's pitch.
Banned claims. Anything the agent is not allowed to promise about the product. If your product does not have SOC 2, the agent cannot say "enterprise-grade security." This list is built from real product truth, not marketing copy.
Length limits. Cold emails under 90 words. Follow-ups under 50. Anything longer, the agent has to justify in plain English to a human.
Personalization depth. At least one specific, verifiable fact about the lead — recent post, company news, shared connection, role-specific observation. Generic "I noticed your company is in fintech" does not count.

If the agent's first draft fails any of these checks, the draft is rejected and regenerated. This is not a guideline. It is a hard rule, enforced in the prompt and verified on output.
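
Verifying drafts on output can start with a plain string check. The sketch below uses the banned phrases and length limits listed above. The function name `tone_violations`, the `kind` labels, and the personalization check (a crude proxy that looks for one known fact string inside the draft) are assumptions for illustration.

```python
from typing import List

BANNED_PHRASES = [
    "i hope this email finds you well",
    "i wanted to reach out",
    "just circling back",
    "touch base",
    "synergy",
    "leverage",
]
# Example banned claim, taken from the SOC 2 case above.
BANNED_CLAIMS = ["enterprise-grade security"]
# "Under 90" and "under 50" words, so a draft at the limit already fails.
WORD_LIMITS = {"cold": 90, "follow_up": 50}

def tone_violations(text: str, kind: str, lead_facts: List[str]) -> List[str]:
    """Return every tone rule the draft breaks (empty list means it passes)."""
    lowered = text.lower()
    violations = []
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            violations.append(f"banned phrase: {phrase}")
    for claim in BANNED_CLAIMS:
        if claim in lowered:
            violations.append(f"banned claim: {claim}")
    word_count = len(text.split())
    limit = WORD_LIMITS[kind]
    if word_count >= limit:
        violations.append(f"too long: {word_count} words (limit is under {limit})")
    # Crude proxy: at least one researched fact must appear in the draft.
    if not any(fact.lower() in lowered for fact in lead_facts):
        violations.append("no verifiable lead-specific fact")
    return violations
```

A draft is regenerated whenever the returned list is non-empty. Substring matching will also flag variants such as "leveraged", which errs on the strict side.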

How the runbook is enforced

Writing the runbook is the easy part. Enforcing it is the work. Three enforcement patterns that work:

The runbook as a skill file. Save the ICP rules, disqualifiers, approval gates, and tone rules as a structured file (YAML or markdown) that the agent reads before every action. The prompt explicitly says: read this file, follow it, do not improvise. Cheapest enforcement — and the easiest to bypass if the agent is clever, so pair it with one of the others.
A second agent as reviewer. Run the first agent's output through a second, cheaper agent whose only job is to check the output against the runbook. If the reviewer says no, the message is blocked. It costs extra API tokens per send, but it catches errors the first agent rationalizes away.
Daily batch review. Once a day, a human reviews every message the agent sent in the last 24 hours. This is the lightest-touch enforcement and the least reliable — it catches patterns but does not stop a bad message before it goes out. Use it as a feedback loop, not as the primary check.

The right answer is usually the first two together: skill file for guidance, reviewer agent for enforcement. Daily review is the audit trail that catches drift over time.
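
The writer-plus-reviewer pattern can be sketched as a small loop that blocks anything the reviewer rejects. No real model API is used here: `draft_fn` and `review_fn` are hypothetical callables standing in for the first agent and the reviewer agent, and the `max_attempts` default is an assumption.

```python
from typing import Callable, Optional

def generate_with_review(
    draft_fn: Callable[[str], str],
    review_fn: Callable[[str], list],
    lead_id: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return an approved draft, or None if every attempt was blocked.

    draft_fn  : stand-in for the writing agent (lead id -> draft text)
    review_fn : stand-in for the reviewer agent (draft -> list of violations)
    """
    for _ in range(max_attempts):
        draft = draft_fn(lead_id)
        if not review_fn(draft):  # empty list means no violations found
            return draft
    return None  # blocked: nothing is sent, a human can inspect the lead
```

Bounding the number of attempts matters because an unbounded rewrite loop is exactly the kind of behavior that burns through a daily API budget.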

Daily credit and API limits

Sales agents do not optimize for cost by default. They do whatever the API lets them do, which is often "as much as possible." Every runbook needs explicit limits:

Daily send cap per mailbox. Hard ceiling. If the lead list has more than that, the agent stops and waits for tomorrow.
API spend cap per day. Hard ceiling on total token spend. If the agent's review-and-rewrite loop is eating credits, this cap forces it to stop before it burns $300 in a day chasing a single lead.
Personalization budget per lead. Maximum number of API calls the agent can make to research one lead before it has to either send or skip. Without this, the agent will spend 40 API calls enriching one lead.

These caps are not about saving money in the abstract. They are about preventing the agent from optimizing itself into a corner where it is spending $20 per email on a $50 per month customer.
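
Hard caps are easiest to trust when they raise an error instead of logging a warning. The sketch below tracks the three limits described above. The class names, the `CapExceeded` exception, and the choice of US dollars as the spend unit are assumptions for illustration, and resetting the counters each day is left to the caller.

```python
from dataclasses import dataclass
from typing import Dict

class CapExceeded(Exception):
    """Raised when an action would cross a hard daily limit."""

@dataclass
class DailyCaps:
    max_sends_per_mailbox: int
    max_spend_usd: float
    max_research_calls_per_lead: int

class BudgetTracker:
    """Counts usage for one day; create a fresh tracker each morning."""

    def __init__(self, caps: DailyCaps) -> None:
        self.caps = caps
        self.sends: Dict[str, int] = {}           # mailbox -> sends today
        self.research_calls: Dict[str, int] = {}  # lead id -> research calls
        self.spend_usd = 0.0

    def record_send(self, mailbox: str) -> None:
        if self.sends.get(mailbox, 0) >= self.caps.max_sends_per_mailbox:
            raise CapExceeded(f"send cap reached for {mailbox}")
        self.sends[mailbox] = self.sends.get(mailbox, 0) + 1

    def record_spend(self, usd: float) -> None:
        if self.spend_usd + usd > self.caps.max_spend_usd:
            raise CapExceeded("daily API spend cap reached")
        self.spend_usd += usd

    def record_research_call(self, lead_id: str) -> None:
        used = self.research_calls.get(lead_id, 0)
        if used >= self.caps.max_research_calls_per_lead:
            raise CapExceeded(f"research budget spent for {lead_id}")
        self.research_calls[lead_id] = used + 1
```

Each method checks the limit before recording, so a rejected action never changes the counters.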

How to test the runbook before going live

Before the agent touches real leads, run it against a test set:

Build a fixture of 50 to 100 known leads. Half match ICP, half do not. Include at least 5 clear disqualifiers — competitor employees, existing customers, unsubscribes.
Run the agent on the fixture. Capture every message it produces, every decision it makes, every API call it spends.
Check three things by hand. Did the agent contact anyone on the disqualifier list? Did it generate any message that violates the tone rules? Did its API spend stay under the cap?
If any check fails, fix the runbook, not the agent. The agent is doing what it was told. If the output is wrong, the runbook is wrong. This is the part most teams skip, because rewriting the runbook is harder than rewriting the agent prompt.
Re-run until the fixture passes clean. Then run it on a fresh fixture from a different lead source. If it passes that too, the runbook is robust enough for production.

This whole loop should take a day or two. It will save you weeks of cleanup later.
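
The fixture loop above can be partly automated so the hand check starts from a report rather than raw logs. This is a sketch under assumptions: `FixtureLead`, `AgentResult`, and `run_fixture` are hypothetical names, the agent is any callable that returns a message (or `None` for a skip) plus an API call count, and the spend cap is approximated as a total call count.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class FixtureLead:
    lead_id: str
    disqualified: bool  # ground truth, labeled by a human when building the fixture

@dataclass
class AgentResult:
    message: Optional[str]  # None means the agent chose to skip the lead
    api_calls: int

def run_fixture(
    leads: List[FixtureLead],
    agent_fn: Callable[[FixtureLead], AgentResult],
    tone_check: Callable[[str], list],
    max_total_api_calls: int,
) -> dict:
    """Run the agent over the fixture and collect the three checks."""
    contacted_disqualified: List[str] = []
    tone_failures: List[str] = []
    total_calls = 0
    for lead in leads:
        result = agent_fn(lead)
        total_calls += result.api_calls
        if result.message is None:
            continue  # skipped leads cannot violate send or tone rules
        if lead.disqualified:
            contacted_disqualified.append(lead.lead_id)
        if tone_check(result.message):
            tone_failures.append(lead.lead_id)
    passed = (
        not contacted_disqualified
        and not tone_failures
        and total_calls <= max_total_api_calls
    )
    return {
        "contacted_disqualified": contacted_disqualified,
        "tone_failures": tone_failures,
        "total_api_calls": total_calls,
        "passed": bool(passed),
    }
```

The report lists lead IDs rather than counts, so each failure can be traced back to the runbook rule that should have caught it.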

What the agent is allowed to do without asking

A clean runbook also has a positive list, not just a negative one:

Send the first message to a lead that matches ICP, clears all disqualifiers, and is not in a high-value segment.
Send up to 3 follow-ups at pre-approved intervals — commonly day 3, day 7, day 14.
Detect a reply and route it to the right human or auto-respond with the pre-approved FAQ answers.
Update lead status in the CRM: contacted, replied, bounced, unsubscribed.
Pause a lead when the signals say stop: no opens in 30 days, a detected role change, or company news that indicates a bad fit.

Anything outside this list is not in scope without a human deciding. The agent is not authorized to negotiate pricing, promise custom features, commit to timelines, or escalate to legal. Each of those is a separate gate.
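
The positive list works well as an explicit allowlist that the agent's tool layer checks before executing anything. The sketch below is illustrative only: the `Action` names are hypothetical labels for the items above, and reading "day 3, day 7, day 14" as days counted from the first send is an assumption.

```python
from datetime import date, timedelta
from enum import Enum
from typing import List

class Action(Enum):
    """Actions the agent may take without asking (labels are hypothetical)."""
    SEND_FIRST_MESSAGE = "send_first_message"
    SEND_FOLLOW_UP = "send_follow_up"
    ROUTE_REPLY = "route_reply"
    AUTO_RESPOND_FAQ = "auto_respond_faq"
    UPDATE_CRM_STATUS = "update_crm_status"
    PAUSE_LEAD = "pause_lead"

ALLOWED = {action.value for action in Action}
FOLLOW_UP_DAYS = (3, 7, 14)  # at most three follow-ups

def is_allowed(action_name: str) -> bool:
    """Anything not on the allowlist needs a human decision."""
    return action_name in ALLOWED

def follow_up_dates(first_send: date) -> List[date]:
    """Scheduled follow-up dates, counted in days after the first send."""
    return [first_send + timedelta(days=d) for d in FOLLOW_UP_DAYS]
```

With this shape, a request such as `is_allowed("negotiate_pricing")` fails closed, because unknown actions are rejected by default.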

What to build this week

Write the ICP rules. Both inclusions and exclusions. Put them in a file the agent can read.
Write the disqualifier list. Every known "do not contact" signal from past campaigns. Same file.
Define the three approval gates. First-send, high-value, and reply. Decide who owns each review.
Write the tone rules. Voice examples, banned phrases, banned claims, length limits, personalization depth.
Build the test fixture. 50 to 100 leads, half bad fits, at least 5 disqualifiers. Run the agent against it. Fix the runbook until the fixture passes.
Set the daily caps. Sends per mailbox, API spend, enrichment spend per lead. Make them hard, not soft.

The team that writes this runbook before launching the agent looks slow for the first week and unstoppable for the next year. The team that skips it looks fast for the first week and spends the next year cleaning up.

Questions operators actually ask

What is an AI sales agent runbook?

It is the operating file that tells the agent who to contact, who to skip, what it can say, when a human must approve the action, and which daily caps stop the system. Without it, the agent is just a prompt with access to your lead list.

Where do I start on day one of a sales agent project?

Write the runbook first. ICP rules, disqualifiers, approval gates, tone rules, daily caps, test fixture. If any of those six documents does not exist in version-controlled storage by the time the agent has its first prompt, the agent will invent its own version of that document on day two. The invention is where the damage comes from.

What is the single highest-impact ICP rule I can add this week?

The exclusion list. Inclusion ICP is a hint to the model; exclusion ICP is a hard check. Without it, a model told to target "VP of Sales at a B2B SaaS company" can interpret that as "anyone with the word sales in their LinkedIn bio." With a negative-ICP list of twenty or fewer items (no students, no agencies, no free-email domains, no competitors, no job titles containing "assistant"), the model has the boundaries it needs to stop guessing.

What guardrails does an AI sales agent need?

At minimum: ICP exclusions, disqualifiers, first-send review, high-value account review, reply routing, banned claims, banned phrases, and hard daily caps. The safest agents have both a written runbook and a second reviewer agent checking outputs against it.

Which approval gate do I delete first if my team is small?

Do not delete any of them. Downgrade the first-send gate from a 100 percent review to a 10 percent sample after the first 50 to 100 sends. The high-value lead gate cannot be downgraded. The reply gate is the one you cannot skip even on day one, because auto-replying to a live conversation is where the worst outbound failures happen.

How many follow-ups is too many?

More than three. Send them on day 3, day 7, and day 14. Past day 14, continued silence is a sign that the message is wrong, not that the timing is wrong.

What API budget per lead is reasonable?

A personalization budget of three to five API calls per lead for research plus one for generation. If you do not cap it, a curious agent will spend forty calls enriching one lead that was never going to convert. The cap is a hard ceiling, not a soft target.

What is the cheapest enforcement upgrade after I have the runbook as a skill file?

A second, cheaper agent whose only job is to review the first agent's output against the runbook. Tone rules, banned phrases, ICP match, disqualifier match, approval-gate match. The reviewer agent costs a few hundred tokens per send. It catches the errors the first agent rationalizes away.

How do you build, train, and deploy an AI outbound agent safely?

Build the runbook first, train against a fixed test set, block every output that violates the runbook, then deploy with low volume and human review. Do not connect the agent to live sending until the fixture passes clean and the reply gate is already wired to a human.

Does the runbook apply even if my agent only emails a few hundred leads a week?

Yes. The damage does not scale with volume; it scales with failures per thousand. A hundred leads a week with a two percent bad-fit rate is two damage events per week, which over a quarter is still a reputation problem. The runbook exists to drive that percentage toward zero, independent of volume.
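
The arithmetic in that answer can be written out as a one-line estimate. The helper name is hypothetical, and the 13-week quarter used in the note below is an assumption.

```python
def expected_bad_contacts(leads_per_week: int, bad_fit_rate: float, weeks: int) -> float:
    """Expected number of bad-fit contacts over a period of whole weeks."""
    return leads_per_week * bad_fit_rate * weeks
```

With 100 leads a week and a two percent bad-fit rate, this gives about 2 bad contacts per week and about 26 over a 13-week quarter. Halving the rate halves the count at any volume.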
