How to Automate Facebook Ad Library Lead Enrichment (And Load Verified Contacts to Instantly)

Facebook Ad Library lead enrichment automation works in five stages: an Apify actor scrapes daily ad snapshots from the Meta Ad Library (formerly Facebook Ad Library), a Node.js worker normalizes and filters the candidates, a waterfall of six enrichment providers looks for a verified decision-maker email, gpt-5-mini drafts a short on-brand pitch, and only verified contacts are loaded into Instantly campaigns. In production it raised our enrichment hit rate from 26 percent to about 95 percent on our target niche.

Facebook Ad Library lead automation is a repeatable pipeline that turns public ad signals into verified contacts and ready-to-send campaigns, without manual research.

If you are a lead-generation team hunting local service businesses that already run paid ads, this post shows the exact architecture, the code patterns, where we tripped up, and what changed after we shipped it.

The problem it solves

A person can scroll the Facebook Ad Library, copy business names, guess domains, hunt for a decision-maker, try an email-finder, verify the address, and paste it into a sender. That takes minutes per lead, it is inconsistent across staff, and enrichment stalls when the first tool returns nothing. Sequences go cold while you research.

Answer first: the automation replaces manual ad-to-contact research with a daily, hands-off pipeline that pulls ads, verifies emails across six sources, and loads only deliverable contacts into Instantly.

Manual prospecting	Automated pipeline
Search Ad Library, copy names by hand	Apify actor pulls daily snapshots to a dataset
One enrichment tool, stop on a miss	Six-layer waterfall until a verified email exists
Personal guesswork on ICP	Deterministic filters: niche, geo, size, block list
Paste text into sender	Draft + load to Instantly via API, no copy-paste
Duplicates and rework common	Shared state prevents repeats across runs

According to Harvard Business Review, companies that tried to contact potential customers within an hour were nearly seven times as likely to qualify the lead as those that waited longer than an hour (source: https://hbr.org/2011/03/the-short-life-of-online-sales-leads). Faster sourcing and enrichment is what makes those responses possible.

The free manual path with the Meta Ad Library

Before building anything, know that the Meta Ad Library (formerly Facebook Ad Library) is free and public, and for a small list you may not need automation at all. There are two no-cost paths.

The first is the Ad Library search interface at facebook.com/ads/library. You pick a country and category, search by keyword or advertiser, and read every active ad a business is running, including how long each ad has been live. For a handful of target businesses, you can browse this by hand, copy the advertiser name, and start your own research from there.

The second is the Meta Ad Library API, Meta's official programmatic access to the same data. It requires an approved developer app and an access token, and it is rate-limited and gated behind Meta's terms, but it returns structured ad records without any scraping. For political and issue ads the API is broad; for commercial ads the fields are narrower, which is one reason a scraping actor is sometimes used to reach the fields a lead pipeline needs.

Both of these are the honest starting point: if you only need a few dozen leads a month, the search UI or the API is enough. The pipeline below exists for the case where you want hundreds of verified contacts a day, hands-off, which is where manual browsing and the raw API both run out of road.

How the automation works

Answer first: the worker ingests daily ad snapshots, filters to your ICP, enriches emails with a six-tool waterfall, drafts a short pitch, and pushes verified prospects into the right Instantly campaign with idempotent state.

Apify Facebook Ad Library actor: Scrapes active ads for US local service businesses and writes a snapshot to a dataset. We map the nested fields to a flat schema: name, domain guess, page URL, ad text, last seen, geo.
Node.js worker on Railway: Normalizes the snapshot, applies ICP filters and block lists, dedupes against prior sends, and orchestrates enrichment and drafting.
Six-layer enrichment waterfall: Apollo first, then website scrape for a contact page, then Snov, Hunter, Prospeo, and Tomba until a verified address appears. Only verified contacts pass.
AI drafting engine: gpt-5-mini writes a 3 to 5 sentence, link-free cold open that references the prospect's own ad copy as social proof. We keep tone and claims consistent.
Instantly campaign loader: Creates or updates the contact in a specific campaign, sets custom fields, and attaches the draft for review or immediate scheduling per your policy.

Step-by-step: how to build it

Step 1: Pull daily ad snapshots from Apify

Answer first: use Apify to run a daily Facebook Ad Library actor and pull dataset items into your worker with proper mapping, since the output is nested under snapshot fields.

// packages: node-fetch
import fetch from "node-fetch";
 
const APIFY_TOKEN = process.env.APIFY_TOKEN!;
const DATASET_ID = process.env.APIFY_DATASET_ID!; // actor writes here daily
 
export async function fetchAds(limit = 2000) {
  // Send the token in the Authorization header so it stays out of URLs and request logs.
  const url = `https://api.apify.com/v2/datasets/${DATASET_ID}/items?clean=true&format=json&limit=${limit}`;
  const res = await fetch(url, { headers: { Authorization: `Bearer ${APIFY_TOKEN}` } });
  if (!res.ok) throw new Error(`Apify dataset fetch failed: ${res.status}`);
  const rows = (await res.json()) as any[];
  // Map nested snapshot → flat record
  return rows.map((r: any) => ({
    pageName: r.page?.name || "",
    pageUrl: r.page?.url || "",
    adText: r.snapshot?.text?.slice(0, 600) || "",
    lastSeen: r.snapshot?.lastSeen || r.lastSeen || null,
    geo: r.snapshot?.countries || [],
    domainGuess: guessDomain(r.page?.url),
    adId: r.id,
  }));
}
 
function guessDomain(pageUrl?: string) {
  try {
    if (!pageUrl) return "";
    const u = new URL(pageUrl);
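    // Caveat: a Facebook page URL yields "facebook.com" here, which the Step 2 block list drops.
    // The business's own site often sits behind the page's About link, so we resolve it later via a scrape fallback.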
    return u.hostname.replace(/^www\./, "");
  } catch {
    return "";
  }
}

Gotcha: the actor output is nested, not flat. Map under snapshot fields and keep adId for de-dup keys.
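
One way to keep a Facebook page URL from being mistaken for the business's own domain is to try several candidate URLs and skip social hosts. The sketch below is a minimal illustration, not the production mapping: `pickBusinessDomain`, `hostnameOf`, and the `SOCIAL_HOSTS` list are hypothetical names, and which actor fields hold the candidate URLs is an assumption you would need to confirm against your own dataset.

```typescript
// Hosts that identify a social profile rather than the business's own site.
// Illustrative assumption: extend this list from your own logs.
const SOCIAL_HOSTS = ["facebook.com", "fb.com", "fb.me", "instagram.com"];

// Extract a bare lowercase hostname from a URL-like string, without "www.".
// Returns "" when nothing that looks like a dotted hostname is found.
function hostnameOf(raw: string): string {
  const m = raw
    .trim()
    .toLowerCase()
    .match(/^(?:[a-z][a-z0-9+.-]*:\/\/)?(?:[^@\/?#]*@)?([^\/?#:]+)/);
  if (!m) return "";
  const host = (m[1] ?? "").replace(/^www\./, "");
  // Require at least one dot so stray words are not treated as domains.
  return host.includes(".") ? host : "";
}

// True for a social host itself or any of its subdomains (e.g. m.facebook.com).
function isSocialHost(host: string): boolean {
  return SOCIAL_HOSTS.some(s => host === s || host.endsWith("." + s));
}

// Return the first candidate that looks like the business's own domain, else "".
function pickBusinessDomain(candidates: Array<string | undefined>): string {
  for (const c of candidates) {
    if (!c) continue;
    const host = hostnameOf(c);
    if (host && !isSocialHost(host)) return host;
  }
  return "";
}
```

In the mapper you might call it with the ad's outbound link first and the page URL second. Records that come back empty are the ones that need the scrape fallback before filtering.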

Step 2: Apply ICP filters, block lists, and dedupe

Answer first: normalize to your niche, filter by geo, exclude big brands by domain or name, and drop anything you have seen before with a single shared state store.

// Shared state can be a Sheet, SQLite, or Postgres. Here: SQLite sketch.
import Database from "better-sqlite3";
const db = new Database("state.sqlite");
db.exec(`CREATE TABLE IF NOT EXISTS seen (k TEXT PRIMARY KEY, ts INTEGER)`);
 
const BLOCKED = ["facebook.com", "amazon.com", "walmart.com"]; // expand from logs
 
export function filterAndDedupe(rows: any[]) {
  const stmtHas = db.prepare("SELECT 1 FROM seen WHERE k = ?");
  const stmtPut = db.prepare("INSERT OR IGNORE INTO seen (k, ts) VALUES (?, ?)");
 
  return rows.filter(r => {
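    const d = r.domainGuess;
    // Match the blocked domain itself or its subdomains, so "notamazon.com" is not caught by "amazon.com".
    if (!d || BLOCKED.some(b => d === b || d.endsWith(`.${b}`))) return false;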
    if (!inTargetGeo(r.geo)) return false;
    const k = `${r.domainGuess}:${r.adId}`;
    const seen = stmtHas.get(k);
    if (!seen) stmtPut.run(k, Date.now());
    return !seen;
  });
}
 
function inTargetGeo(countries: string[]) {
  // Example: US only
  return Array.isArray(countries) && countries.includes("US");
}

Gotcha: the block list is what stops big brands from leaking through when ad-text heuristics fail.
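
A block list that checks both the domain and the advertiser name might look like the sketch below. It is an illustration under assumptions: `isBlockedAdvertiser`, `BLOCKED_DOMAINS`, and `BLOCKED_NAME_TOKENS` are hypothetical names, and the entries are examples rather than our production list. Domains match on the exact domain or a subdomain, and names match on whole word tokens so a local business whose name merely contains a brand string is not dropped.

```typescript
// Illustrative lists; the entries are assumptions, not a production block list.
const BLOCKED_DOMAINS = ["facebook.com", "amazon.com", "walmart.com"];
const BLOCKED_NAME_TOKENS = ["amazon", "walmart"];

// True when `domain` is the blocked domain itself or one of its subdomains.
function matchesDomain(domain: string, blocked: string): boolean {
  const d = domain.trim().toLowerCase().replace(/^www\./, "");
  return d === blocked || d.endsWith("." + blocked);
}

// Lowercase a page name and split it into alphanumeric word tokens.
function nameTokens(pageName: string): string[] {
  return pageName
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(t => t.length > 0);
}

// Block on either signal: a blocked domain, or a blocked brand word in the name.
function isBlockedAdvertiser(domain: string, pageName: string): boolean {
  if (BLOCKED_DOMAINS.some(b => matchesDomain(domain, b))) return true;
  const tokens = nameTokens(pageName);
  return BLOCKED_NAME_TOKENS.some(b => tokens.includes(b));
}
```

Whole-token matching is a deliberate trade-off: it misses creative spellings of a brand, but it avoids false positives that would silently remove valid local businesses.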

Step 3: Run the six-layer email enrichment waterfall

Answer first: call each provider in order and stop on the first verified result. Apollo first, then contact-page scrape, then Snov, Hunter, Prospeo, Tomba.

export type Enriched = { email: string; source: string; confidence: number } | null;
 
export async function enrich(domain: string, company: string): Promise<Enriched> {
  const steps = [apolloLookup, contactPageScrape, snovLookup, hunterLookup, prospeoLookup, tombaLookup];
  for (const step of steps) {
    const out = await step(domain, company);
    if (out && isVerified(out)) return out;
  }
  return null;
}
 
function isVerified(e: Enriched) {
  if (!e) return false;
  // Word boundaries keep "invalid" and "unverified" from counting as verified.
  return e.confidence >= 0.9 || /\b(valid|verified)\b/i.test(e.source);
}
 
async function apolloLookup(domain: string, company: string) {
  // Placeholder: follow Apollo People API with domain + seniority filters
  // return { email, source: "apollo:verified", confidence: 0.97 } on success
  return null;
}
 
async function contactPageScrape(domain: string) {
  // Fetch /contact, /about, parse mailto: links, run quick SMTP verify
  return null;
}
 
async function snovLookup(domain: string) { return null; }
async function hunterLookup(domain: string) { return null; }
async function prospeoLookup(domain: string) { return null; }
async function tombaLookup(domain: string) { return null; }

Gotcha: do not push unverified guesses. Verification gating is what protects deliverability.

Step 4: Draft the first-touch email with gpt-5-mini

Answer first: keep drafts short, reference the business's own ad copy, and set OpenAI parameters correctly for a reasoning model: use max_completion_tokens and a low reasoning_effort.

import fetch from "node-fetch";
 
const OPENAI_KEY = process.env.OPENAI_API_KEY!;
 
export async function draftEmail(company: string, adText: string) {
  // Chat Completions request shape: messages, max_completion_tokens, reasoning_effort.
  const body = {
    model: "gpt-5-mini",
    reasoning_effort: "low",
    max_completion_tokens: 220,
    messages: [
      { role: "system", content: "You write concise, link-free cold emails. 3-5 sentences. Cite the prospect's own ad copy as social proof. No claims you cannot back up. Plain English." },
      { role: "user", content: `Company: ${company}\nAd copy:\n${adText}\n\nDraft a brief opener offering to improve lead handling without changing their ad budget.` }
    ]
  };
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: { "Authorization": `Bearer ${OPENAI_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });
  if (!res.ok) throw new Error(`OpenAI error: ${res.status}`);
  const json = (await res.json()) as any;
  // The draft text lives on the first choice's message; an empty string signals a failed draft.
  return json.choices?.[0]?.message?.content?.trim() || "";
}

Gotcha: max_tokens is not a valid parameter on gpt-5-mini. Use max_completion_tokens, and keep reasoning effort low so hidden reasoning does not use up the token budget and leave the output empty.
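
Because an empty or off-spec draft should never reach a campaign, it can help to check the model output before loading it. The sketch below is one possible guard, assuming the 3 to 5 sentence, link-free rule described above. `checkDraft`, `countSentences`, and `LINK_PATTERN` are hypothetical names, and both the sentence splitter and the link pattern are rough heuristics rather than a full parser.

```typescript
type DraftCheck = { ok: boolean; problems: string[] };

// Heuristic link detector: explicit URLs, "www." prefixes, and bare common-TLD domains.
const LINK_PATTERN = /https?:\/\/|\bwww\.|\b[a-z0-9-]+\.(?:com|net|org|io)\b/i;

// Count sentences by splitting on terminal punctuation followed by whitespace or end of text.
function countSentences(text: string): number {
  return text
    .trim()
    .split(/[.!?]+(?:\s+|$)/)
    .filter(part => part.trim().length > 0).length;
}

// Validate a draft against the house rules before it is attached to a contact.
function checkDraft(text: string, minSentences = 3, maxSentences = 5): DraftCheck {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    // Empty output is the failure mode to watch for with reasoning models.
    return { ok: false, problems: ["empty draft"] };
  }
  const problems: string[] = [];
  const n = countSentences(trimmed);
  if (n < minSentences || n > maxSentences) {
    problems.push(`sentence count ${n} outside ${minSentences}-${maxSentences}`);
  }
  if (LINK_PATTERN.test(trimmed)) problems.push("contains a link");
  return { ok: problems.length === 0, problems };
}
```

A failed check is a reasonable trigger for one regeneration attempt, after which the record can be journaled as FAIL rather than pushed.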

Step 5: Load verified prospects and drafts into Instantly

Answer first: create or upsert the contact with custom fields, attach the draft, and drop into the right campaign. Keep this idempotent.

const INSTANTLY_KEY = process.env.INSTANTLY_API_KEY!;
 
async function upsertInstantlyContact(campaignId: string, name: string, email: string, company: string, draft: string) {
  const payload = {
    campaignId,
    contacts: [{
      email,
      firstName: name.split(" ")[0] || "",
      lastName: name.split(" ").slice(1).join(" ") || "",
      company,
      customFields: { ai_draft: draft }
    }],
    upsert: true
  };
  const res = await fetch("https://api.instantly.ai/api/v1/contacts/import", {
    method: "POST",
    headers: { "X-API-KEY": INSTANTLY_KEY, "Content-Type": "application/json" },
    body: JSON.stringify(payload)
  });
  if (!res.ok) throw new Error(`Instantly import failed: ${res.status}`);
  return res.json();
}

Gotcha: keep a single dedup key across your system. We use domain:adId and also rely on Instantly upserts to avoid accidental double-loads.
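
A single key only works if every component builds it the same way. The sketch below shows one way to centralize that, assuming the `domain:adId` format described above. `dedupKey` and `SeenStore` are hypothetical names, and the in-memory `Set` is a stand-in for the shared state table, not a replacement for it.

```typescript
// Build the shared dedup key. Normalizing here keeps "WWW.Acme.com" and
// "acme.com" from producing two different keys for the same ad.
function dedupKey(domain: string, adId: string | number): string {
  const d = domain.trim().toLowerCase().replace(/^www\./, "");
  return `${d}:${String(adId).trim()}`;
}

// Minimal in-memory stand-in for the shared state table.
class SeenStore {
  private keys = new Set<string>();

  // Returns true the first time a key is claimed and false on every replay.
  claim(key: string): boolean {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}
```

Calling `claim(dedupKey(domain, adId))` right before the Instantly push means a retried batch skips records that were already loaded, independent of whatever the sender does on its side.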

Step 6: Add quotas, logs, and a cost ceiling

Answer first: cap daily actor reads, enforce per-provider rate limits, and log each step so you can reconstruct why a record did or did not enter a campaign.

type Journal = {
  key: string; // domain:adId
  step: string; // fetch, filter, apollo, snov, draft, instantly
  status: "OK" | "SKIP" | "FAIL";
  note?: string;
  ts: number;
};
 
const logs: Journal[] = [];
function log(key: string, step: string, status: Journal["status"], note?: string) {
  logs.push({ key, step, status, note, ts: Date.now() });
}
 
// Example ceiling
const DAILY_LIMIT = Number(process.env.DAILY_LEAD_CAP || 150);
let emitted = 0;
function canEmit() { return emitted < DAILY_LIMIT; }

Gotcha: a hard daily ceiling keeps third-party costs predictable. Add backoff and pacing on provider APIs to avoid bans.
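
Backoff, pacing, and the daily ceiling can be sketched as three small pieces. This is an illustration under assumptions: `backoffDelayMs`, `pacingIntervalMs`, and `DailyCap` are hypothetical names, and the default delays are placeholder values, not limits published by any provider.

```typescript
// Exponential backoff delay in milliseconds, capped at maxMs.
// attempt is 0-based: 0 -> baseMs, 1 -> 2 * baseMs, 2 -> 4 * baseMs, and so on.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  const exp = baseMs * Math.pow(2, Math.max(0, attempt));
  return Math.min(maxMs, exp);
}

// Minimum spacing between calls so a provider sees at most `perMinute` requests per minute.
function pacingIntervalMs(perMinute: number): number {
  if (perMinute <= 0) throw new Error("perMinute must be positive");
  return Math.ceil(60000 / perMinute);
}

// Daily ceiling with an explicit counter, enforced at the point where a lead is emitted.
class DailyCap {
  private emitted = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Reserves one slot and returns true, or returns false once the cap is reached.
  tryEmit(): boolean {
    if (this.emitted >= this.limit) return false;
    this.emitted += 1;
    return true;
  }

  remaining(): number {
    return this.limit - this.emitted;
  }
}
```

Folding the increment into `tryEmit` avoids the easy mistake of checking the cap without ever advancing the counter. In a real worker the counter would live in the shared state store so that it survives restarts.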

Where it gets complicated

Apify output shape is nested. The actor writes fields under snapshot and page. A flat-map pass is mandatory or your filter logic will miss key fields like lastSeen and geo.

ICP filters leak big brands without block lists. Generic heuristics fail on well-known names. A maintained domain and name block list is the practical fix.

Single-source enrichment caps at about a quarter. Apollo alone reached roughly 26 percent enrichment on our target slice. The six-layer waterfall brought verified emails to about 95 percent. The waterfall is the difference.

Reasoning-model params matter. gpt-5-mini does not accept max_tokens and can consume its token budget on hidden reasoning. Use max_completion_tokens and a low reasoning effort, or outputs can come back empty.

Idempotency across tools. The worker, the state store, and Instantly all need the same dedup key. We use domain:adId everywhere. That is what prevents doubles across retries.

Provider errors can be 200 OK. Some enrichment APIs and scrapers return HTTP 200 with an empty body on quota exhaustion. Treat zero-length and obviously malformed payloads as hard errors and retry later.
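
A small classifier makes that rule explicit before any provider payload is trusted. The sketch below assumes providers return a JSON object on success. `classifyProviderResponse` and the `ProviderOutcome` type are hypothetical names, and treating 429 and 5xx as retryable is a common convention rather than something any specific provider guarantees.

```typescript
type ProviderOutcome =
  | { kind: "ok"; data: Record<string, unknown> }
  | { kind: "retry"; reason: string }
  | { kind: "fail"; reason: string };

// Decide what to do with a provider response from its status code and raw body text.
function classifyProviderResponse(httpStatus: number, bodyText: string): ProviderOutcome {
  if (httpStatus === 429 || httpStatus >= 500) {
    return { kind: "retry", reason: `http ${httpStatus}` };
  }
  if (httpStatus < 200 || httpStatus >= 300) {
    return { kind: "fail", reason: `http ${httpStatus}` };
  }
  const trimmed = bodyText.trim();
  // A 2xx with nothing in it is treated as quota exhaustion, not as "no result".
  if (trimmed.length === 0) return { kind: "retry", reason: "empty body on 2xx" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    return { kind: "retry", reason: "malformed JSON on 2xx" };
  }
  // Assumption: a usable payload is a JSON object, not an array or a primitive.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { kind: "retry", reason: "unexpected payload shape" };
  }
  return { kind: "ok", data: parsed as Record<string, unknown> };
}
```

Keeping "retry" separate from a genuine miss matters for the waterfall: a miss moves on to the next provider, while a retryable error should leave the record eligible for a later run.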

Real-world results

We built this for a lead-generation agency targeting local service businesses that already run ads. The first iteration used Apollo plus Hunter and hit about a 26 percent enrichment rate across a small batch. After we added website contact-page scrapes and expanded to a six-layer waterfall, verified contacts reached about 95 percent on the same niche. The first batch of ten verified contacts loaded to Instantly and began sending.

The gain is structural. Ad Library signals intent. A waterfall avoids single-source blind spots. Verification gating protects deliverability. The pipeline runs daily, so new ads enter the queue without anyone opening a browser.

Meta's Ad Library publicly indexes active ads and is designed for transparency on ad content and advertisers (source: https://www.facebook.com/ads/library/). We use it strictly as a lead signal, then verify contacts through third parties before any send.

Frequently asked questions

Does Facebook Ad Library have an official API?

Yes. Meta exposes an Ad Library API with rate limits and policy requirements, and there is also the public web interface. We used an Apify actor against the public interface because it provided the fields and cadence we needed. The choice is pragmatic: use the path that gives you reliable data with clear compliance.

How do you prevent duplicates across runs and tools?

We key every record as domain:adId and store it in a shared state table. The worker checks that key before enrichment. Instantly imports run with upsert: true so retried pushes do not create duplicates. That three-part approach keeps retries and batch replays safe.

Can this run in real time or only daily?

We run daily batches. Hourly is possible, but you will need stricter rate limits and a smaller per-batch cap to stay within provider quotas. For most teams, a daily pull is the right balance of freshness and cost.

What does this cost to run monthly?

Costs are usage based: Apify actor runs, enrichment API credits, Instantly, and a small LLM spend for drafting. We add a hard daily lead cap and per-provider pacing so spend is predictable. Infrastructure on Railway or a similar host is minimal compared to the third-party credits.

Can a non-technical team set this up without a developer?

Not end to end. Scraping, provider authentication, rate limits, and idempotency are real engineering. Once built, non-technical operators can adjust filters, block lists, and campaigns through config without touching code.

Why not just use Apollo or Hunter alone?

Single-source enrichment stalls when a domain is small, new, or privacy-first. The waterfall covers those gaps. We start with Apollo for breadth, then scrape contact pages, and only then fan through Snov, Hunter, Prospeo, and Tomba. Verified-only gating preserves deliverability.

If you want this exact pipeline tuned to your niche, we already built and shipped it. See our take on why outreach breaks in the first place in Why Most Cold Outreach Fails, and if you are ready to put a daily ad-to-contact engine in place, read our AI sales outreach service and book a 15-minute call. We will tell you in the first five minutes whether your setup maps to this pattern.
