
How To Replace Your Clay Subscription With Claude Code or Codex

Clay is a hosted waterfall builder — six steps under the hood. You can own those steps with one orchestrator agent, N parallel workers, and a QA agent, and pay a flat fee instead of watching a credit counter.

By Faizan Muhammad · Ink Persuasion Outbound Operations · 16 min read

Most teams adopt Clay because it looks like the fastest way to build a waterfall. You import a list, you stack a few providers, you run Claygent over the rows, you push the result into Google Sheets or your CRM, and you watch the leads light up.

Then the second month hits. The limits show up. The credit math shows up. The queue time shows up. And the questions start to sound the same:

Why does this row stay at "partial"?
Why is this run still spinning after an hour?
Why am I paying for actions I never asked for?
What if I could swap the SaaS for an agent I already own, pay a flat monthly fee, and never see the credit counter again?

You can. That is what this guide is for.

This is not a Clay-hate post. Clay is a real product. This is a build guide for when you have decided the next doubling of cost — or the next ceiling on rows per table — is not worth it, and you would rather own the workflow.

What Clay actually does

Before you replace anything, you have to understand what is being replaced. Clay is, at heart, a hosted waterfall builder. You upload a list, Clay runs that list through a chain of providers and AI steps, and Clay writes the result back to a table or to your CRM. The pieces that make that waterfall work:

HTTP fetch. Clay calls out to enrichment providers, scraping tools, and any external API you connect.
HTML to text. Clay cleans each fetched page, strips the noise, and prepares the content for the model.
AI prompt over the cleaned text. Clay runs the cleaned page through a model with your prompt. The prompt is what most people call a "Claygent."
JSON shaping. Clay parses the model output, validates it against your schema, and writes the result into the table.
Push to destination. Clay syncs to Google Sheets, HubSpot, Salesforce, or whatever sequencer you use downstream.
Optional sequencer trigger. Clay can hand off to its own emailer or to Instantly, Smartlead, or wherever you send.

That is the waterfall. Everything else is a UI around those six steps. Most teams never see it that way, because the Clay UI hides it inside a row-level editor. But that is what is happening under the hood. Every "column" in a Clay table is one of those steps.
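The second step in that list, HTML to text, is the easiest one to underestimate. Here is a minimal sketch using only Python's standard library, assuming you only need the visible text and can ignore layout. The name html_to_text is a hypothetical helper for illustration, not something Clay exposes.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips script, style, and noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

A real pipeline would likely also drop navigation and footer text before handing the page to a model, which this sketch does not attempt.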

What Clay costs in 2026

As of June 30, 2026, Clay publishes four tiers. The exact display changes by billing view, so always check the live pricing page before buying; the useful operator-level version is:

Free. 500 actions per month, 6,000 per year, 1.2K data credits per year, 100 per month. Unlimited seats and tables. Multi-provider waterfalls. Up to 200 rows per table. Bring your own API key. Claygent enrichment.
Launch — $185/mo monthly, or $167/mo on annual billing. Starts at 15,000 actions per month and 180,000 per year. Clay's FAQ describes 2.5K Data Credits per month on Launch, while the annual plan display packages credits by year. Up to 50,000 rows per table. Adds phone enrichment, signals, and email campaign integrations.
Growth — $495/mo monthly, or $446/mo on annual billing. Starts at 40,000 actions per month and 480,000 per year. Clay's FAQ describes 6K Data Credits per month on Growth. Adds CRM auto-sync, unlimited Audiences search and 250K imports, custom HTTP integrations, webhook automations, web intent signals, priority support.
Enterprise — custom. Custom actions and data credits, SSO, role-based access, unlimited ad audiences, unlimited Audiences search.

Data Credits start at $0.05 each and get more cost-effective as you grow. Actions start at less than $0.01 each. For AI runs specifically, Clay now has two pricing modes:

Fixed-price. About 80% of models cost a flat number of Data Credits per task. These are the native Clay models — ideal for short classification, summaries, templated content, and simple lookups.
Variable. Token-intensive frontier models are billed at the actual token cost Clay pays, with no markup. Best for multi-step research and detailed account-level work. The column shows an estimated cost with a tilde before you run; Clay says most runs come in under the estimate.

You can also bring your own API key on any plan, including Free. When you use your own key, the run still counts as one Action, but no Data Credit is consumed for the AI tokens. Clay says AI runs are about 2× faster on Clay's keys because of negotiated rate limits — so BYOK trades speed for cost.

Why the SaaS waterfall starts to fail

Once you understand what Clay does and what it costs, the failure modes become obvious. They are not Clay-specific. They are the failure modes of any single-agent hosted waterfall over a list with more than a few hundred rows.

Failure 1: context window loss

Most users run Claygent prompts that ask the model to read a fetched page, parse it against a schema, and return structured JSON. That works fine for a single row. But when you queue 5,000 rows through the same prompt, the model is not actually reading each page from scratch. Its internal context is shaped by the prompts and outputs that came before it. For long runs, that drift causes three real problems:

The model "remembers" the schema from the first few rows and starts pattern-matching instead of actually re-reading the page.
The model drops rare fields it had not seen recently, so a row that should include a pricing tier comes back without it.
The model invents a plausible answer when the fetched page is ambiguous, because the run is "almost done" and any structured JSON passes the next step's validator.

The user sees this as "Clay randomly blanks out on a few rows." It is not random. It is the model running on a degraded context.

Failure 2: serial speed

A hosted waterfall is, by definition, a queue. Clay batches rows internally, but the batch is one prompt over a list of inputs, not a fleet of independent agents. When the model is the bottleneck, no amount of UI improvement makes the run finish faster. You are paying for the queue, not the work.

For a 5,000-row waterfall that takes 45 minutes in Clay, the same waterfall on a fleet of Claude Code or Codex sub-agents running in parallel can finish in 8 to 12 minutes, because each agent handles a slice of the rows at once.

Failure 3: cost opacity

Data Credits look cheap at $0.05, but they stack. A 5,000-row run with one variable-price Claygent over a frontier model can consume meaningful usage before any enrichment credits are counted. For a team running multiple waterfalls a week, the subscription is not always the expensive line. The usage is.
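To make the stacking concrete, here is a back-of-envelope sketch. The $0.05 credit price is the starting price quoted in the pricing section; credits_per_row is a hypothetical input you would read off your own Clay columns, and the function ignores volume discounts and Actions.

```python
def run_cost_usd(rows: int, credits_per_row: float, usd_per_credit: float = 0.05) -> float:
    """Estimated Data Credit spend for one run (no volume discount, Actions excluded)."""
    return rows * credits_per_row * usd_per_credit


# Hypothetical example: 5,000 rows at an assumed 2 credits per row
print(f"${run_cost_usd(5_000, 2):,.2f}")
```

Under those assumed numbers, one run costs more than two months of the Launch subscription, which is the sense in which usage, not the plan, becomes the expensive line.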

Failure 4: queue + scope

Clay caps rows per table at 200 on Free, 50,000 on Launch and Growth. Fine for most campaigns. But when you start doing account-level research across 100,000 accounts, the cap forces you to split the run into multiple tables and re-merge. The merge step is the part that breaks most often, because the row IDs do not align cleanly across tables and the schemas drift between runs.
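One way to keep row IDs aligned across split runs is to derive the ID from the data instead of the table position. This is a minimal sketch, assuming every row carries a website field. stable_row_id and merge_tables are hypothetical helpers, not part of any Clay export format.

```python
import hashlib


def stable_row_id(row: dict) -> str:
    """Derive an ID from the normalized website so it survives table splits."""
    website = (row.get("website") or "").strip().lower().rstrip("/")
    if not website:
        raise ValueError("row has no website to key on")
    return hashlib.sha1(website.encode("utf-8")).hexdigest()[:12]


def merge_tables(tables: list[list[dict]]) -> list[dict]:
    """Merge rows from several tables; the first non-empty value per field wins."""
    merged: dict[str, dict] = {}
    for table in tables:
        for row in table:
            target = merged.setdefault(stable_row_id(row), {})
            for field, value in row.items():
                if value not in (None, "") and field not in target:
                    target[field] = value
    return list(merged.values())
```

Because the key comes from the lead itself, schema drift between runs only adds fields; it cannot misalign rows.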

What Claude Code and Codex actually are

Two products, same idea, different vendors:

Claude Code. Anthropic's CLI for its coding model. Install via npm, log in with an eligible Claude plan or API setup, point it at a directory. The CLI can read files, write code, run shell commands, use tools, and operate on a git repo autonomously.
Codex. OpenAI's coding agent and CLI. Same shape: log in with an eligible ChatGPT plan or API setup, point it at a directory, and let the agent read, write, run commands, and orchestrate work across files. The exact model and usage limits change by plan, so check the current Codex docs before designing the cost model.

Both CLIs can run as a long-lived agent that takes a high-level task, decomposes it, calls external tools, and writes artifacts to disk. They can also spawn subagents that work in parallel on the same workspace.

The two-tier agent pattern

This is the pattern that replaces Clay. It is not new. It is just rarely drawn for outbound teams. There are three roles:

Orchestrator. One Claude Code or Codex session that owns the task end to end. It reads the input list, partitions the rows into batches, spawns worker agents, collects their output, runs the QA agent, and writes the final artifact to Google Sheets or your CRM.
Worker agents. N independent subagents, each running on a slice of rows. Each worker fetches each row's source URL, runs HTML-to-text, and writes a structured per-row record. Workers do not see the full list — only their slice and the row-level prompt.
QA agent. One session that reviews every per-row record against the same schema and ICP principles. Its only job is to grade, drop, or escalate each row.

The orchestrator runs the workers in parallel, then runs the QA agent over the union output. The QA agent can also be parallelized if you want more throughput. This pattern fixes the first three failure modes:

Context window loss. Each worker is its own context. No worker sees more than its slice. No worker drifts based on what came 4,000 rows earlier.
Serial speed. Workers run in parallel. With a higher-usage Claude Code or Codex setup, you can run more workers; with lower-tier plans, run fewer workers and keep the batches smaller.
Cost opacity. You pay a flat subscription plus any external API keys you choose to plug into the workers. No credit counter. No per-row markup.
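The three roles can be sketched in plain Python to show the control flow. This is an illustration under stated assumptions, not how Claude Code or Codex spawn subagents internally: partition and run_pipeline are hypothetical names, and the worker and qa callables stand in for whatever agent or model call you wire up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows: list, n_workers: int) -> list[list]:
    """Split rows into at most n_workers contiguous slices of near-equal size."""
    size, extra = divmod(len(rows), n_workers)
    slices, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            slices.append(rows[start:end])
        start = end
    return slices


def run_pipeline(rows: list[dict], worker, qa, n_workers: int = 4) -> list[dict]:
    """Orchestrator: slice the list, run workers in parallel, then QA every record."""
    slices = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker sees only its own slice; pool.map keeps slice order
        chunks = pool.map(lambda batch: [worker(row) for row in batch], slices)
        extracted = [record for chunk in chunks for record in chunk]
    # QA runs over the union output and adds its verdict to each record
    return [{**record, **qa(record)} for record in extracted]


if __name__ == "__main__":
    leads = [{"id": i} for i in range(10)]
    stub_worker = lambda row: {**row, "niche": "none"}  # placeholder extraction
    stub_qa = lambda rec: {"fit_score": "reject", "fit_reason": "stub"}  # placeholder verdict
    print(len(run_pipeline(leads, stub_worker, stub_qa, n_workers=3)))
```

Threads are enough here because the real work is waiting on network and model calls, not CPU.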

How to build it in five steps

Step 1 — Define the row schema and the ICP principles

The first mistake teams make is jumping to the code. The first move is to write down two artifacts on paper. First, the output schema — what fields does each row need? For a typical outbound waterfall, ours is:

Output schema

first_name, last_name, company, role, public_email,
linkedin_url, website, recent_post_url,
niche, audience_signal, product_idea, fit_score, fit_reason

Second, the ICP principles — what makes a lead a fit, a partial, or a reject? Five to ten plain-language rules are usually enough. For example:

Reject if the audience is under 50K and not in a clear commerce lane.
Reject if the recent content has no commerce signal in the last 30 days.
Reject if the role is "Founder / CEO" of a service business with no product lane.
Partial if there is a product lane but no recent commerce signal.
Fit if there is a product lane and a recent commerce signal that matches one of our five niches.

These two artifacts become the system prompt for the QA agent. The worker agent uses a simpler prompt that focuses on extraction, not judgment.
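The output schema can also be enforced in code before anything reaches a sheet. This is a minimal sketch: the field names and the fit / partial / reject values come from this guide, while schema_errors is a hypothetical helper name.

```python
REQUIRED_FIELDS = (
    "first_name", "last_name", "company", "role", "public_email",
    "linkedin_url", "website", "recent_post_url",
    "niche", "audience_signal", "product_idea", "fit_score", "fit_reason",
)
ALLOWED_FIT_SCORES = {"fit", "partial", "reject"}


def schema_errors(row: dict) -> list[str]:
    """Return a list of problems with one final row; an empty list means it passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in row]
    score = row.get("fit_score")
    if score is not None and score not in ALLOWED_FIT_SCORES:
        errors.append(f"invalid fit_score: {score!r}")
    return errors
```

Returning a list of errors instead of raising makes it easy to log every bad row in one pass and still ship the good ones.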

Step 2 — Build the worker prompt

The worker prompt is small on purpose. Its only job is to extract structured data from one row:

prompts/worker.md

You are extracting structured fields from one lead.

You will receive:
- A JSON object with first_name, last_name, company,
  role, website, and any URLs.

Your job:
1. Fetch the website URL.
2. Fetch the recent_post_url if present.
3. Run HTML to text on each fetched page.
4. From the cleaned text, extract:
   - niche (lifestyle, fitness, beauty, fashion,
     athletes, jewelry, haircare, accessories,
     streetwear, music, none)
   - audience_signal (one sentence on commerce
     behavior, or "no signal")
   - product_idea (one product line that fits, or "none")
   - recent_signal (one sentence on the latest post)

Return ONLY a JSON object matching this shape:
{
  "niche": "...",
  "audience_signal": "...",
  "product_idea": "...",
  "recent_signal": "..."
}

Note what the worker does not do. It does not score. It does not decide fit. It does not reject rows. It extracts. This is the architectural choice that fixes context drift: the worker is a pure extractor, the QA agent is the judge.
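Models do not always honor "return ONLY a JSON object", so it helps to parse the worker's reply defensively. This is a minimal sketch, assuming the reply contains a single JSON object, possibly wrapped in extra prose. The keys and niche values come from the worker prompt above; parse_worker_output is a hypothetical helper name.

```python
import json

NICHES = {
    "lifestyle", "fitness", "beauty", "fashion", "athletes", "jewelry",
    "haircare", "accessories", "streetwear", "music", "none",
}
WORKER_KEYS = {"niche", "audience_signal", "product_idea", "recent_signal"}


def parse_worker_output(raw: str) -> dict:
    """Extract and validate the worker's JSON object; raise ValueError if it is unusable."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in worker output")
    # json.JSONDecodeError is a subclass of ValueError, so callers catch one type
    record = json.loads(raw[start:end + 1])
    if set(record) != WORKER_KEYS:
        raise ValueError(f"unexpected keys: {sorted(record)}")
    if record["niche"] not in NICHES:
        raise ValueError(f"unknown niche: {record['niche']!r}")
    return record
```

Rows that raise here are good candidates for a single retry before being handed to QA as failures.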

Step 3 — Build the QA agent prompt

The QA agent is the one that knows the ICP principles. Its system prompt includes the five to ten rules and the schema from Step 1. On every row it reads the input fields and the worker's extraction, decides fit / partial / reject, and gives a one-sentence reason.

prompts/qa.md

You are reviewing one lead against these ICP rules:
[ICP_RULES]

You will receive:
- The original lead fields.
- The worker agent's extracted fields
  (niche, audience_signal, product_idea, recent_signal).

Decide fit and return ONLY this JSON:
{
  "fit_score": "fit" | "partial" | "reject",
  "fit_reason": "one short sentence"
}

That is the entire QA prompt. No enrichment. No fetching. Just judgment.
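The [ICP_RULES] placeholder has to be filled before the prompt is sent. One way to do that is sketched below, assuming the rules are kept as a plain list of strings; render_qa_prompt is a hypothetical helper name.

```python
def render_qa_prompt(template: str, icp_rules: list[str]) -> str:
    """Replace the [ICP_RULES] placeholder with a numbered list of rules."""
    if "[ICP_RULES]" not in template:
        raise ValueError("template has no [ICP_RULES] placeholder")
    if not icp_rules:
        raise ValueError("at least one ICP rule is required")
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(icp_rules, start=1))
    return template.replace("[ICP_RULES]", numbered)
```

Failing loudly on a missing placeholder or an empty rule list avoids a QA agent that silently grades against nothing.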

Step 4 — Wire the orchestrator

The orchestrator is a Claude Code or Codex session that owns three things: slice generation (partition the list into N slices), worker spawn (one subagent per slice, each writing a JSON file per row to a shared directory), and the QA pass (spawn the QA agent over the union output once all workers finish). It can be a single long-running command, or a small shell loop — most teams start with the loop because it is easier to debug.

orchestrator.sh — adapt to your CLI

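#!/bin/bash
set -euo pipefail

INPUT=leads.csv
WORKERS=10
OUT=work
QA_OUT=qa

# Create the output directories up front; split and the workers expect them
mkdir -p "$OUT/rows" "$QA_OUT"

# Slice the list into $WORKERS files without breaking lines (GNU split).
# Note: only the first slice keeps the CSV header row.
split -n "l/${WORKERS}" -d --additional-suffix=.csv \
  "$INPUT" "$OUT/slice_"

# Spawn one worker per slice, in parallel
for slice in "$OUT"/slice_*.csv; do
  codex exec --sandbox danger-full-access \
    "Read the list at $slice. For each row, run the \
     worker prompt in prompts/worker.md. Write one \
     JSON file per row to $OUT/rows/." &
done
# A bare wait returns 0 even if a worker failed, so check $OUT/rows afterwards
wait

# QA pass + push
python3 merge_qa.py "$OUT/rows/" "$QA_OUT/"
python3 push_sheets.py "$QA_OUT/"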

The actual prompts live in prompts/worker.md and prompts/qa.md. The orchestrator just calls them.
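The guide does not show merge_qa.py, so the following is only a sketch of what its merge half might look like, assuming each worker wrote one JSON object per file. merge_rows and the JSONL output format are assumptions for illustration; the QA call itself is left out.

```python
import json
from pathlib import Path


def merge_rows(rows_dir: str, out_path: str) -> tuple[int, list[str]]:
    """Combine per-row JSON files into one JSONL file; report files that were skipped."""
    merged, skipped = [], []
    for path in sorted(Path(rows_dir).glob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            skipped.append(path.name)  # worker wrote something that is not JSON
            continue
        if not isinstance(record, dict):
            skipped.append(path.name)  # valid JSON, but not a row object
            continue
        merged.append(record)
    with open(out_path, "w", encoding="utf-8") as out:
        for record in merged:
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(merged), skipped
```

Reporting skipped files matters because the shell loop does not notice a worker that failed partway through its slice.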

Step 5 — Connect the cheap model API

This is where you cut cost without losing quality. The worker prompt is small — it does extraction, not judgment. The QA prompt is small — it does grading, not synthesis. For both, a smaller, cheaper model usually performs as well as the frontier model. The orchestrator itself is the only place where you need a stronger model, and it runs once per campaign, not once per row.

Two low-cost options teams commonly test for the worker and QA slots:

DeepSeek V3. Strong extraction and short-form grading, low cost. Good for the worker and QA prompts when the schema is simple.
MiniMax M3. Comparable shape, useful as a fallback if DeepSeek rate limits bite.

Wire these into your workflow as the low-cost model tier. The orchestrator can stay on Claude or OpenAI; the workers and QA agents run on the cheaper model. The result: your variable cost per row is roughly the cheap model's input + output token cost. For a 500-token extraction prompt and a 200-token output, this can land under $0.001 per row before retries, fetch costs, and any paid data provider calls.
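The per-row arithmetic is simple enough to check yourself. In the sketch below, the 500 and 200 token counts come from the paragraph above, while the per-million-token prices are placeholder assumptions, not any vendor's published rates. Take real figures from the provider pricing pages listed under Sources.

```python
def cost_per_row_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Model cost for one row, given prices in USD per million tokens."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


# Placeholder prices (assumptions): $0.30 per 1M input tokens, $1.20 per 1M output tokens
per_row = cost_per_row_usd(500, 200, usd_per_m_input=0.30, usd_per_m_output=1.20)
print(f"per row: ${per_row:.5f}, per 5,000 rows: ${per_row * 5_000:.2f}")
```

Note that the fetched page text also counts as input tokens, so real rows will usually cost more than the bare prompt suggests.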

What to use it for

The same workflows you would build in Clay work fine in this pattern. Three concrete examples:

Example 1 — ICP-fit waterfall

Input: 5,000 founders and operators with name, role, company, website. The orchestrator slices into 10 worker batches of 500; each worker fetches the website, runs HTML-to-text, extracts niche / audience_signal / product_idea / recent_signal; the QA agent reviews every row against ICP rules and returns fit_score and fit_reason; the orchestrator writes fit leads to Google Sheets. Throughput on a $200/mo subscription: about 10,000 to 30,000 rows per day — well above what most teams need.

Example 2 — Personalization waterfall

Input: 1,000 ICP-fit leads from the previous run. The orchestrator spawns one personalization subagent per row; each reads the lead's website, recent post, and LinkedIn URL, then drafts a 2-sentence first message using the personalization skill file; the QA agent reviews every message against brand-voice rules, drops any with a forbidden phrase, and flags ambiguous cases. The skill file is the key — it defines brand voice, forbidden phrases, the angle library, and the soft CTA. Write it once and every worker fills the same template with the same rules.
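The forbidden-phrase check is deterministic, so it can run in code before the QA agent spends tokens on a message. This is a minimal sketch: the helper names and the example phrases are hypothetical, and the sentence counter is a rough heuristic that will miscount text containing URLs or abbreviations.

```python
import re


def find_forbidden(message: str, forbidden_phrases: list[str]) -> list[str]:
    """Return the forbidden phrases that appear in the message, case-insensitively."""
    normalized = " ".join(message.lower().split())
    return [phrase for phrase in forbidden_phrases if phrase.lower() in normalized]


def sentence_count(message: str) -> int:
    """Rough sentence count: splits on runs of '.', '!' or '?'."""
    return len([part for part in re.split(r"[.!?]+", message) if part.strip()])


if __name__ == "__main__":
    draft = "Loved your post on ring sizing. Quick question about your merch plans?"
    print(find_forbidden(draft, ["quick question", "synergy"]), sentence_count(draft))
```

Messages that pass both checks still go to the QA agent for the brand-voice judgment that code cannot make.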

Example 3 — ABM account research

Input: 100 target accounts with domain and employee count. The orchestrator spawns one subagent per account; each fetches the main site, pricing page, careers page, recent press, and last 10 blog posts, then produces a structured account memo (ICP fit, current GTM motion, recent hires, recent launches, recommended first-touch angle); the QA agent reviews every memo against the account-research rubric and returns a confidence score. This is the pattern most agencies use to replace the manual research layer Claygent does not automate well.

What this is not

A few things to be honest about, so you do not over-promise to a client or to yourself:

It is not zero code. You need a shell script, two markdown prompt files, and two small Python scripts (the merge and QA step, and the Sheets push). That is the entire codebase.
It is not free. The subscription is the floor. The cheap-model API is the variable. The total is usually lower than the equivalent Clay tier, but it is not zero.
It is not instant to set up. Plan a focused day to wire the worker prompt, the QA prompt, the orchestrator shell, and the Sheets push. After that, every new campaign is a config change.
It is not magic on hard problems. If your ICP rules are bad, no agent pattern fixes them. The pattern amplifies a clear ICP — it does not invent one.

When to keep Clay

There are real reasons to stay. Use this guide to decide, not to feel forced into a switch:

You run one waterfall a week and the cost is fine. The orchestrator pattern pays off when you run waterfalls often enough that per-row cost matters.
Your team is non-technical and the Clay UI is the reason they can do the work. The orchestrator pattern needs someone comfortable in a terminal.
You need the Clay UI for collaboration, audit trails, or the signals product. Those are real features the pattern does not replicate.
You are not running at a scale where the 50,000-row table cap is binding.

When to switch

You switch when one of these is true:

You are paying for Data Credits on top of a Clay subscription and the variable cost is becoming the expensive line.
You are hitting the rows-per-table cap and the merge step is breaking.
You want to ship a personalization waterfall with skill files that match your brand voice, not Claygent prompts.
You want to package outbound research as a service and need a workflow you own end to end.

How to package it for clients

If you are an agency, this pattern is also a service offering. Three packages that work:

Implementation. You set up the worker prompt, QA prompt, orchestrator shell, and Sheets push for one client, plus the skill file for their brand voice. One-time fee, scoped to their ICP and personalization angle.
Maintenance. You run their waterfalls on a recurring cadence. Per-row or per-campaign pricing. Lower margin than Clay, but you control the cost stack.
Skill authoring. You sell the brand-voice skill file and the ICP-rules file as standalone artifacts. Once those exist, the client can run the pattern without you.

Each package is a clean service line that does not depend on Clay's roadmap, pricing changes, or row caps. That is the long-term advantage of owning the workflow.

The short version

Clay is a hosted waterfall builder. The waterfall is six steps. You can replace it with one orchestrator agent plus N worker agents plus one QA agent, all running on Claude Code or Codex, with a cheap model API behind the workers. That pattern fixes the three real failure modes of hosted waterfalls over big lists: context window loss, serial speed, and cost opacity.

It is not free and it is not zero code. It is a day of setup and a flat subscription. After that, every campaign is a config change and your variable cost is the cheap model's API token bill. For most outbound teams running more than a few waterfalls a week, the trade is worth it.

Frequently asked questions

What is the best Clay alternative for lead enrichment?

It depends on what you mean by "alternative." If you want a polished UI, use another GTM SaaS. If you want to own the waterfall, Claude Code or Codex is the stronger alternative — it can fetch sites, clean pages, call APIs, run model prompts, score leads, and write the final output back to Google Sheets.

Can Claude Code replace Clay?

Yes, for many lead-scoring and personalization workflows. Claude Code can act as the orchestrator, while worker agents handle website research, HTML-to-text extraction, API calls, and row-level scoring. You still need to define the ICP rules, the output schema, and the QA agent prompt.

Can Codex replace Clay?

Yes. Codex can run the same orchestrator pattern: split the list, assign slices to worker agents, merge outputs, run a QA pass, and push the final rows into Google Sheets. The value is not a nicer UI than Clay — it is that Codex can own the full workflow as code.

What does Clay do that an agent workflow has to rebuild?

Clay gives you six things: data fetch, enrichment providers, AI prompts over rows, structured output, table storage, and integrations. A Claude Code or Codex workflow has to rebuild those pieces with scripts, API keys, prompts, and Google Sheets or CRM sync.

Is Clay still worth it?

Yes — if the UI is the reason your team can run the workflow, if your volume is low, or if you need Clay's built-in data marketplace and signals. Clay becomes easier to replace when you are paying heavily for Data Credits, running repeated large waterfalls, or turning outbound research into a client-facing service.

How much does Clay cost?

As of June 2026, Clay lists Free, Launch at $185/mo ($167/mo on annual billing), Growth at $495/mo ($446/mo on annual billing), and Enterprise as custom. Data Credits start at $0.05 each, and Actions start below $0.01 each. The real cost depends on how many rows you run and whether you use fixed-price or variable AI models.

Does Clay allow bring-your-own API keys?

Yes. Clay's current pricing page says customers can bring their own API keys for data enrichment or AI. Each run still counts as one Action, but no Data Credit is used for the model tokens. Clay also says its own keys can run faster because of higher negotiated rate limits.

What is a Claygent alternative?

A prompt-driven worker agent that researches one row, extracts structured fields, and returns JSON. The difference is that in Claude Code or Codex you can separate extraction from judgment: workers extract, a QA agent scores.

How do you stop AI agents from losing context on big lead lists?

Do not run one giant prompt over the full list. Split the list into slices, give each slice to a worker agent, and keep each worker's context small. Then run a separate QA agent over the merged output. That is the main reason the orchestrator pattern works.

Can AI agents score leads against an ICP?

Yes, if the ICP rules are written clearly. The agent should not invent the ICP. You give it five to ten fit / reject / partial rules. The QA agent then scores every row as fit, partial, or reject and writes a one-sentence reason.

Is this cheaper than Clay?

Usually yes for repeated high-volume waterfalls, because the subscription becomes the fixed floor and the worker-model API becomes a small variable cost. It is not automatically cheaper for low-volume teams — you still need setup time, API keys, and someone who can maintain the workflow.

Is this better than Apollo, Instantly, or Smartlead?

It is not the same category. Apollo is mainly a database. Instantly and Smartlead are mainly sequencers. Clay is a workflow and enrichment layer. Claude Code or Codex replaces the workflow layer, not the sending layer — you can still push final leads into Instantly or Smartlead.

How many leads can this process per day?

For simple waterfalls, a well-parallelized Claude Code or Codex setup with external API keys can often process thousands of rows per day if the worker tasks are small. The real limit is not only the model — it is website fetch speed, rate limits, retry handling, and QA strictness.

Written by Faizan Muhammad, founder of Ink Persuasion, where we build cold email, LinkedIn content, AI-agent, and outbound operations systems for B2B teams.

Sources

Clay pricing page, retrieved 2026-06-30: clay.com/pricing
Claude Code overview, retrieved 2026-06-30: docs.anthropic.com/en/docs/claude-code/overview
Claude subscription pricing, retrieved 2026-06-30: anthropic.com/pricing
OpenAI Codex CLI docs, retrieved 2026-06-30: developers.openai.com/codex/cli
DeepSeek API pricing, retrieved 2026-06-30: api-docs.deepseek.com/quick_start/pricing
MiniMax API pricing, retrieved 2026-06-30: platform.minimax.io/docs/guides/pricing-paygo
