AI Sourcing Agent for Staffing-Tech | BroutonLab Case Study

An AI sourcing agent we built for a US staffing firm with 200+ recruiters — production-grade automated candidate sourcing that finds the right candidate among millions from a single natural-language brief, with semantic disambiguation and automatic relaxation for narrow searches.

The problem

The client is a US staffing firm with 200+ recruiters. Their team spent 60–70% of every workday on manual candidate sourcing — building boolean LinkedIn queries, scrolling through irrelevant results, and reading resumes that didn’t actually fit the brief. Worse: the briefs hiring managers actually wrote weren’t boolean-ready.

A real recruiter brief looks like this:

“Looking for a C++ developer from New York who worked at Series A startups, has 15 years of experience, and is currently at a Series C company.”

Filling that out as a structured search form would take 10–15 minutes — and even then, the form would likely return the wrong people, because hiring intent doesn’t map cleanly to checkbox fields. The recruiter would tweak filters, retry, get a different bad set of results, and tweak again.

So in practice the team did it manually: read the brief, mentally translate it into search terms, run LinkedIn queries, and adjust by hand when results were off. The process was slow and inconsistent across recruiters. Most painfully, senior recruiters spent the bulk of their day on this process work instead of the judgment calls only they could make: candidate qualification, hiring-manager calibration, closing.

The founder wanted real candidate sourcing automation. Not a “boolean query generator” with an AI sticker on it. Not another GPT wrapper that hallucinates candidates. Something that could take a brief the way a recruiter would speak it and consistently surface qualified people.

Our approach

We built three coordinated layers — each chosen for what it does best:

AI agent (OpenAI Agents SDK): parses unstructured natural-language briefs and translates them into structured search intent. It disambiguates the messy parts: “Series A → C” describes a career trajectory (past Series A experience, currently at a Series C company), not a single company-stage filter; “15 years of experience” applies to total work history, not just the current role; “C++ from NYC” might also include strong Boost or embedded-systems candidates whose profiles never use that exact phrasing.
Semantic search over canonical taxonomies: titles, skills, industries, company tags. When the agent extracts “C++ developer”, semantic search expands it into canonical roles (“Senior Software Engineer”, “Staff Engineer”, “Founding Engineer with C++”) and related industries (HFT, embedded systems, game engines, ML infra). These are the kinds of expansions a senior recruiter makes in their head.
Elasticsearch lexical queries: once the agent has the structured intent (industries + skills + seniority + stage history + location), it generates a precise ES query against an indexed pool of millions of candidates. Relaxation is automatic: when a strict query returns too few results, the agent tries progressively weaker variants on its own, without dumping the recruiter back into the manual loop.

The architectural bet: right tool for each layer. LLM-based agent where reasoning over fuzzy intent matters. Semantic search where domain knowledge encoded in taxonomies matters. Lexical Elasticsearch where speed and precision over a large candidate pool matter.
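To make "structured search intent" concrete, here is a minimal sketch of what the agent's output for the example brief might look like. The `SearchIntent` class and all of its field names are illustrative assumptions, not the client's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchIntent:
    """Hypothetical structured form of a recruiter brief (field names assumed)."""
    skills: list[str] = field(default_factory=list)
    titles: list[str] = field(default_factory=list)
    industries: list[str] = field(default_factory=list)
    location: Optional[str] = None
    min_total_years: Optional[int] = None         # total work history, not current role
    current_company_stage: Optional[str] = None   # where the candidate works now
    past_company_stages: list[str] = field(default_factory=list)  # career trajectory


# Hand-written intent for the example brief; in the real system the agent fills this in.
intent = SearchIntent(
    skills=["C++"],
    location="New York",
    min_total_years=15,
    current_company_stage="Series C",
    past_company_stages=["Series A"],
)
print(intent)
```

Keeping the current stage and the past stages in separate fields is what lets the trajectory reading of the brief survive into the final query.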

What we built

After discovery and a technical spike, we shipped the first production version with the client’s recruiting team as design partners. We then iterated on the system based on real recruiter usage — adjusting agent behavior, expanding taxonomies, and refining the query templates.

The AI agent

Built on the OpenAI Agents SDK, which provides declarative tool-calling, structured outputs, and concurrency primitives out of the box. The agent has access to five tools:

search_titles — semantic search over canonical job titles
search_skills — same over the skill taxonomy
search_industries — same over industries and company-stage signals
query_elasticsearch — final lexical query against the candidate index
relax_query — progressive relaxation when strict matches return too few candidates

The agent’s job: read the unstructured brief, plan a sequence of tool calls, build the structured intent, run the Elasticsearch query, and optionally relax-and-retry. Tool use was critical here — versus a pure “LLM generates final query in one shot” approach — because each search call gives the agent grounded information (this skill exists in the taxonomy, that one doesn’t) before committing to the final query. Without grounding, the LLM would happily produce syntactically valid queries referencing skills or industries that weren’t actually indexed.
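The grounding step can be illustrated without any agent framework. The sketch below is an assumption-laden stand-in: `SKILL_TAXONOMY` is a made-up toy set, and `ground_terms` is a hypothetical helper, not one of the production tools. It only shows the idea of checking proposed terms against what is actually indexed before they reach the final query.

```python
# Toy taxonomy standing in for the indexed skill list (contents are made up).
SKILL_TAXONOMY = {"c++", "boost", "python", "pytorch", "embedded systems"}


def ground_terms(proposed: list[str], taxonomy: set[str]) -> tuple[list[str], list[str]]:
    """Split LLM-proposed terms into those present in the taxonomy and those absent."""
    grounded, rejected = [], []
    for term in proposed:
        (grounded if term.lower() in taxonomy else rejected).append(term)
    return grounded, rejected


grounded, rejected = ground_terms(["C++", "Boost", "QuantumCpp"], SKILL_TAXONOMY)
print(grounded)  # ['C++', 'Boost']
print(rejected)  # ['QuantumCpp']
```

In the production system this check is a semantic lookup rather than an exact string match, so a near-miss term would be mapped to its closest canonical entry instead of being rejected outright.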

Semantic search layer

Recruiter mental models don’t map 1:1 to a clean taxonomy. A recruiter mentions “Series A startup background” and means: founding-engineer-level seniority, multi-functional ownership, exposure to small-team chaos. None of that is directly searchable as a discrete field.

We built embedding indexes over canonical fields (titles, skills, industries, company tags) and let the agent query them. When the agent’s intent extraction returns something like “experienced founding engineer at small AI startups”, it fires three semantic searches in parallel:

titles → ["Senior Software Engineer", "Founding Engineer", "Staff Engineer", "Lead AI Engineer"]
industries → ["AI/ML", "early-stage SaaS", "ML infrastructure"]
skills → ["Python", "PyTorch", "production ML"]

Now the agent has a set of grounded canonical terms — the kind a senior recruiter would arrive at after a couple of mental jumps. These go into the Elasticsearch query.
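The parallel fan-out described above can be sketched as follows. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the three index lists are made up, and a thread pool stands in for whatever concurrency mechanism the production agent uses.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag of lowercase words."""
    return Counter(text.lower().replace("/", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(index: list[str], query: str, top_k: int = 2) -> list[str]:
    """Return the top_k canonical terms most similar to the query."""
    q = embed(query)
    return sorted(index, key=lambda term: cosine(q, embed(term)), reverse=True)[:top_k]


# Made-up canonical indexes for illustration.
INDEXES = {
    "titles": ["Founding Engineer", "Staff Engineer", "Account Executive"],
    "industries": ["AI/ML", "early-stage SaaS", "Retail banking"],
    "skills": ["Python", "PyTorch", "production ML"],
}


def expand(query: str) -> dict[str, list[str]]:
    """Run one semantic search per taxonomy concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=len(INDEXES)) as pool:
        futures = {name: pool.submit(search, index, query) for name, index in INDEXES.items()}
        return {name: future.result() for name, future in futures.items()}


result = expand("experienced founding engineer at small AI startups")
print(result["titles"])
```

A word-overlap embedding cannot link "AI startups" to "PyTorch"; a real embedding model is what makes that kind of jump possible, which is why the toy is only useful for showing the fan-out shape.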

Elasticsearch query generation

Final layer: the agent constructs an ES query combining the disambiguated fields. Filters for the strict constraints (location, years of experience, current company stage), should-clauses for the multiple title and skill variants, and boosts where some signals matter more than others (e.g., recent experience at the requested seniority outweighs older similar roles).

ES gives us two things the agent layer can’t: speed (sub-second responses over millions of candidates) and precision on hard constraints. A recruiter asking for “candidates in New York” wants exactly NYC, not “candidates near NYC ranked by embedding similarity”. Lexical filters deliver that.
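As an illustration of how filters, should-clauses, and boosts fit together, here is a minimal sketch that assembles an Elasticsearch bool query as a plain dictionary. The `build_query` helper and every field name (`location.keyword`, `total_years_experience`, `current_company.stage`, `current_title`, `past_titles`, `skills`) are assumptions for the example; the real index mapping is not described in this case study.

```python
import json


def build_query(location, min_years, current_stage, titles, skills):
    """Assemble an Elasticsearch bool query; all field names are hypothetical."""
    return {
        "query": {
            "bool": {
                # Hard constraints: must match, and do not contribute to the score.
                "filter": [
                    {"term": {"location.keyword": location}},
                    {"range": {"total_years_experience": {"gte": min_years}}},
                    {"term": {"current_company.stage": current_stage}},
                ],
                # Soft signals: any variant may match; the boost favors the current role.
                "should": (
                    [{"match": {"current_title": {"query": t, "boost": 2.0}}} for t in titles]
                    + [{"match": {"past_titles": {"query": t}}} for t in titles]
                    + [{"match": {"skills": {"query": s}}} for s in skills]
                ),
                # Require at least one title or skill variant to match.
                "minimum_should_match": 1,
            }
        }
    }


q = build_query("New York", 15, "Series C", ["Staff Engineer", "Founding Engineer"], ["C++", "Boost"])
print(json.dumps(q, indent=2))
```

Putting location, experience, and current stage in `filter` rather than `should` is what keeps them strict: a candidate outside New York is excluded, not merely ranked lower.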

Query relaxation

This was a meaningful product feature, not engineering polish. Narrow briefs run dry fast: a strict query returns 0–3 candidates in total, and the recruiter is back to manually loosening constraints, which is the exact work we were trying to remove.

The relaxation logic lets the agent automatically:

Drop the rarest constraint first (e.g., “Series A history” before “C++”)
Widen radius (NY → tri-state area)
Soften experience bounds (15 years → 12+ years)

Each relaxation step is logged and surfaced to the recruiter so they see exactly what was loosened. Keeps the human in the loop on judgment calls without forcing them through the manual tweak cycle.
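The relaxation loop above can be sketched roughly as follows. Several things here are assumptions for illustration: the `Constraints` fields, the fixed ordering of steps (the real system picks the rarest constraint rather than hard-coding it), the three-year softening, the `min_results` threshold of 5, and the `fake_backend` that stands in for Elasticsearch.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Constraints:
    skill: str
    location: str
    min_years: int
    past_stage: Optional[str] = None


# Each step returns a loosened copy; the label is what the recruiter would see.
RELAXATIONS = [
    ("dropped company-stage history", lambda c: replace(c, past_stage=None)),
    ("widened location to tri-state area", lambda c: replace(c, location="tri-state area")),
    ("softened experience by 3 years", lambda c: replace(c, min_years=max(0, c.min_years - 3))),
]


def search_with_relaxation(
    constraints: Constraints,
    run_query: Callable[[Constraints], list],
    min_results: int = 5,
) -> tuple[list, list[str]]:
    """Run the strict query, then loosen one step at a time until enough results return."""
    log: list[str] = []
    results = run_query(constraints)
    for label, step in RELAXATIONS:
        if len(results) >= min_results:
            break
        constraints = step(constraints)
        log.append(label)
        results = run_query(constraints)
    return results, log


def fake_backend(c: Constraints) -> list[str]:
    """Stand-in for Elasticsearch with canned result counts."""
    if c.past_stage is not None:
        return []
    return ["candidate"] * (6 if c.location == "tri-state area" else 2)


strict = Constraints(skill="C++", location="New York", min_years=15, past_stage="Series A")
results, log = search_with_relaxation(strict, fake_backend)
print(len(results), log)
```

Returning the log alongside the results is the part that matters for the product: the recruiter gets the candidates and the list of what was loosened in the same response.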

Observability and eval

We instrumented the agent with LangFuse — every tool call, every prompt/response, every relaxation step is traced and recorded. This was non-negotiable for production reliability.

When a recruiter said “the bot returned wrong people on this brief”, we could pull up the exact trace: which tools the agent called, what the semantic searches returned, where the disambiguation went wrong, which ES query was generated. Without that, iteration would have been guesswork. With it, we could move from a vague complaint to a concrete improvement (taxonomy gap, prompt tweak, relaxation threshold) within a single sprint.
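To show the kind of record that makes this debugging possible, here is a minimal in-memory sketch of per-tool-call tracing. It deliberately does not use the LangFuse API: `TRACE` and the `traced` decorator are hypothetical stand-ins, and `search_skills` returns a canned result.

```python
import functools
import time

TRACE: list[dict] = []  # in-memory stand-in for a tracing backend


def traced(tool):
    """Record each tool call's name, inputs, output, and duration."""
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = tool(*args, **kwargs)
        TRACE.append({
            "tool": tool.__name__,
            "args": args,
            "kwargs": kwargs,
            "output": output,
            "seconds": time.perf_counter() - start,
        })
        return output
    return wrapper


@traced
def search_skills(query: str) -> list[str]:
    return ["C++", "Boost"]  # canned result for illustration


search_skills("C++ developer")
for event in TRACE:
    print(event["tool"], event["args"], "->", event["output"])
```

With one such record per tool call, a complaint about a specific brief can be replayed step by step instead of being reconstructed from memory.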

Performance

The final system is optimized to handle approximately 1,000 requests per second on a 16-core server — enough headroom for the full recruiter team running concurrent sourcing across multiple hiring-manager briefs.

Results

The system is in production at the client and powers candidate sourcing for the recruiting team:

Recruiters trigger a sourcing request with a natural-language brief and receive a ranked candidate list in seconds — no more manual translation into boolean queries.
Query relaxation automates what was previously the second round of manual tweaking.
LangFuse observability gives the engineering team full visibility into the agent’s reasoning, which keeps iteration speed high.
Performance optimized for ~1,000 req/sec on a 16-core server.

Engineering trade-offs

A few decisions worth flagging — each one came directly from production iteration:

Why an AI agent (not just LLM-as-query-generator). Early experiments tried the simpler pattern: the LLM reads the brief and outputs an Elasticsearch query in one shot. It worked in demos and failed in production. The LLM had no grounded knowledge of which taxonomy terms were actually indexed, so it produced queries that were syntactically valid but referenced skills and industries that didn’t exist in the database. The agent-with-tools pattern forces the system to ground itself in real canonical data before committing to a final query. That means more tool calls and more latency, but a vastly more reliable result.

Why semantic + lexical (not pure semantic search). Pure embedding search would have been architecturally simpler, but it loses precision on hard constraints. “Candidates in NYC” should not return “candidates near NYC ranked by relevance” — it should return NYC. Lexical filters in Elasticsearch give that. Semantic search handles the fuzzy parts (intent expansion, taxonomy traversal); Elasticsearch handles the strict parts (location, years, current stage). The two together give a result set that’s both relevant and correct.

Why query relaxation became a product feature. We initially thought relaxation was a nice engineering safety net. In practice it turned out to be one of the most-used features — narrow briefs are common, and the difference between “0 results, recruiter manually loosens constraints” and “5 results after one auto-relaxation step, recruiter sees exactly what was loosened” is the entire point of the product. We surfaced relaxation as a first-class part of the workflow rather than hiding it.

When this architecture applies

The same pattern (an agent that grounds itself in canonical taxonomies, then generates structured search queries with progressive relaxation) applies to any domain where:

Buyer intent is fuzzy and natural-language (“looking for a C++ engineer from a Series A in NYC”)
Underlying data is structured and indexed (Elasticsearch, Postgres, vector DB, whatever)
Strict matching alone returns too few results, so some form of intelligent relaxation is needed

Common applications beyond staffing-tech: B2B prospecting databases, real estate listings, technical literature search, e-commerce SKU discovery, M&A target screening. Anywhere a structured database meets an unstructured human question.
