Reveal the Real

The AI Reading Your Employer Brand Is Guessing & That's A Real Problem For You

More content won't fix what AI says about your company

Everyone in the AI bubble is telling you the same thing - "publish more." More content, more often, because the AI is scraping everything and you need to feed it. I've learnt that that's completely backwards.

I recently wrote that your EVP and employer brand aren't being read anymore. They're being scraped. An AI hoovers up whatever it can find about you - job ads, Glassdoor, your careers site, a stray Reddit thread - and compresses the lot into three sentences for a candidate who'll never see your award-winning brand film.

Since then I've spent the last few weeks building the thing that does the scraping. It's called TalentTell (coming soon [hopefully!]), and when it launches it'll do two things for you. It'll rate how specific and distinctive your talent attraction communication actually is. And it'll hand you a brand character, based on the well-known brand archetypes methodology - the framework that says every brand has a personality that steers both who it is on the inside and how it talks on the outside. Nike is the Hero: all about mastery. Harley-Davidson is the Outlaw, or Rebel: all about liberation. Apple is the Creator: all about innovation. If that's new to you, it's worth two minutes on Google before you read on - it'll help the rest of this land.

Building TalentTell has taught me something the entire Generative Engine Optimisation (GEO) conversation is quietly skating over. The AI isn't just reading you. A lot of the time it's guessing. And it doesn't guess the same way twice.

Let me show you what I mean, because once you've seen it you can't unsee it.

Scrape it twice and you get two different companies

Early on, I ran one well-known tech brand, a company whose name you'd probably recognise, through the engine four times. Same website. Same content. Same day.

Run one, the model decided their dominant character was the Innocent: warm, inclusive, everyone belongs. Run two, Innocent again. Run three, it came back as the Rebel: disruptive, forging a new path. Run four, the Explorer: restless, always chasing the next frontier.

Three different identities from one unchanged careers website.

The underlying scores barely moved - their position on the map shifted by a point or two. But the headline label, the one-word answer to "who are you as an employer", flipped run to run. And it flipped because the company genuinely says contradictory things about itself. It has the warm, inclusive language and the swaggering, trail-blazing language sitting right next to each other. So every time the model read it, it weighted those signals slightly differently and reached a different verdict.
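A toy model shows why a shift of a point or two is enough to flip the headline. This is a minimal sketch, not how TalentTell works: the scores are invented, "add a little random wobble to every score and take the highest" is my assumption standing in for whatever a real model does internally, and the function names are hypothetical.

```python
import random

# Invented scores for a brand whose signals contradict each other:
# three archetypes sit within a couple of points of one another.
CONTESTED = {"Innocent": 71.0, "Rebel": 70.0, "Explorer": 69.5, "Sage": 55.0}
# Invented scores for a brand that says one clear thing.
CLEAR = {"Innocent": 80.0, "Rebel": 62.0, "Explorer": 60.0, "Sage": 55.0}


def dominant_archetype(scores, wobble, rng):
    """Nudge every score by up to +/- wobble points, then return the top label."""
    nudged = {name: score + rng.uniform(-wobble, wobble) for name, score in scores.items()}
    return max(nudged, key=nudged.get)


def tally(scores, wobble=2.0, runs=1000, seed=0):
    """Count how often each label comes out on top across repeated reads."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        label = dominant_archetype(scores, wobble, rng)
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    print("contested:", tally(CONTESTED))
    print("clear:    ", tally(CLEAR))
```

Run it and the contested brand's wins are spread across the close archetypes, while the brand with one clear signal comes out as the Innocent every single time. The wobble is identical in both cases; what changes is how much room the leader has.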

That's the part nobody mentions. These models are probabilistic, not deterministic. They don't look something up and return a fixed answer. They generate one, and there's a roll of the dice in there every single time. I tried to control for it. When I started, the engine ran on Claude Opus 4.6, where I could set what's called the temperature - the dial that decides how much the model improvises - to zero. We're now on Opus 4.8, and you can't do that anymore. The setting has gone. My guess, and it's only a guess, is that the model is now sophisticated enough to handle that itself - it does it automatically and has decided we don't need the manual control. Either way, even back when I could set it to zero, the honest truth is that identical output was never actually guaranteed.
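For anyone curious what that dial actually does, here's the textbook version: a softmax with temperature turns raw scores into probabilities, and the model then samples from them. This is a generic sketch with invented scores and hypothetical function names, not Claude's internals. The temperature-zero branch follows the usual convention of always taking the top option; as I said, in real systems even that setting never guaranteed identical output.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the peak."""
    if temperature <= 0:
        # Conventional reading of temperature 0: always take the top option (greedy).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(labels, logits, temperature, rng):
    """Draw one label according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(labels, weights=probs, k=1)[0]


if __name__ == "__main__":
    labels = ["Innocent", "Rebel", "Explorer", "Sage"]
    logits = [2.0, 1.8, 1.7, 0.5]  # invented scores: three near-ties
    rng = random.Random(42)
    for t in (0, 0.2, 1.0):
        print(t, [sample(labels, logits, t, rng) for _ in range(10)])
```

At temperature 1 the three close labels all turn up regularly; at 0.2 the top one dominates but the runners-up can still appear; at 0 this toy gives the same answer every time.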

And that's just the model's end. The scrape itself drifts too. Run it on Tuesday and again on Thursday and the raw material changes underneath you. The featured employee testimonial rotates. The "posted three days ago" stamp ticks over. The "100+ engineers" counter updates. Some sites serve a different hero headline to different visitors as a personalisation feature or an A/B test, or different copy depending on what country they think you're sitting in.
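If you want to check whether two scrapes differ in substance or only in noise, one approach is to mask the bits you expect to drift before comparing them. A minimal sketch using Python's standard difflib; the patterns are hypothetical examples modelled on the drift described above, and a real site will need its own.

```python
import difflib
import re

# Hypothetical patterns for text that drifts between scrapes even when the brand hasn't changed.
VOLATILE = [
    # "Posted three days ago", "posted 2 weeks ago", ...
    (re.compile(r"posted \w+ (?:hours?|days?|weeks?) ago", re.IGNORECASE), "posted <when>"),
    # Counters such as "100+ engineers" or "1,200 employees".
    (re.compile(r"\b\d[\d,]*\+?"), "<n>"),
]


def normalise(text):
    """Mask volatile fragments so only substantive changes remain."""
    for pattern, replacement in VOLATILE:
        text = pattern.sub(replacement, text)
    return text


def substantive_changes(first_scrape, second_scrape):
    """Lines removed ('- ') or added ('+ ') between two scrapes, after masking."""
    a = normalise(first_scrape).splitlines()
    b = normalise(second_scrape).splitlines()
    return [line for line in difflib.ndiff(a, b) if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    tuesday = "Join 100+ engineers\nPosted three days ago\nMeet Priya, our data lead"
    thursday = "Join 120+ engineers\nPosted 5 days ago\nMeet Sam, our design lead"
    for line in substantive_changes(tuesday, thursday):
        print(line)
```

Here the rotating testimonial shows up as a real change, while the ticking date stamp and the updated headcount are masked out.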

So you can be scraped twice and come back as two subtly different companies before the model has even started guessing. And if your own signals already contradict each other, the two versions that come back can look like two completely different companies.


It often can't reach the most important part of you

The next problem cost me weeks, and it's the one I least expected.

Arguably the two highest-volume employer brand signals you produce are your job adverts and your careers site - and a careers site can be anything from a single page to dozens. Your live roles alone can run to dozens of documents, refreshing constantly, describing the actual work. If an AI is learning who you are as an employer, your job adverts and your careers site are a big part of that textbook.

But your job adverts are also the part most likely to be completely invisible to the scrapers.

A huge share of employers post their roles through an Applicant Tracking System - Workday, Greenhouse, Lever, Eightfold, Phenom People, and a dozen others. And they do not all behave the same way when something automated comes knocking.

Greenhouse and Lever are gracious hosts - they hand over clean job data through a public door. Workday hides the same data behind per-tenant configuration most scrapers can't open. Some of the biggest platforms - the ones running careers sites for household-name employers - render their job descriptions entirely in client-side JavaScript. To a human in a browser it looks perfect. To a machine, one of them returned over 10,000 words of stylesheet and navigation code and not a single line of the actual job. Another handed back the job title, the location, a tidy list of "similar jobs", and then nothing where the description should be. Job boards sitting behind a reCAPTCHA gave up only their cookie-consent text.
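You can get a rough feel for what a non-rendering scraper sees with nothing but Python's standard-library HTML parser. It reads a page exactly as the server sends it, with no JavaScript executed, and sorts the words into readable text versus script and style code. Both sample pages are invented stand-ins: one that builds the job in the browser, one that ships it in the HTML. In practice you'd fetch the real page first (for example with `urllib.request.urlopen`) and feed it in the same way; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

CODE_TAGS = ("script", "style")


class TextVsCode(HTMLParser):
    """Sort the words in raw HTML into readable text and script/style code."""

    def __init__(self):
        super().__init__()
        self._code_depth = 0  # greater than 0 while inside <script> or <style>
        self.text_words = []
        self.code_words = []

    def handle_starttag(self, tag, attrs):
        if tag in CODE_TAGS:
            self._code_depth += 1

    def handle_endtag(self, tag):
        if tag in CODE_TAGS and self._code_depth > 0:
            self._code_depth -= 1

    def handle_data(self, data):
        target = self.code_words if self._code_depth else self.text_words
        target.extend(data.split())


def word_counts(raw_html):
    """Return (readable_words, code_words) for a page as served, without running JS."""
    parser = TextVsCode()
    parser.feed(raw_html)
    parser.close()
    return len(parser.text_words), len(parser.code_words)


# Invented example: the job description only exists once JavaScript runs.
JS_ONLY = """<html><head><style>.hero{color:#111} .nav{display:flex}</style></head>
<body><div id="app"></div>
<script>renderJob({ id: 'job-1234', target: '#app' });</script></body></html>"""

# Invented example: the job description is in the HTML the server sends.
SERVER_RENDERED = ("<html><body><h1>Senior Data Engineer</h1>"
                   "<p>You will own our pricing pipeline.</p>"
                   "<script>track()</script></body></html>")

if __name__ == "__main__":
    print("JS-only page:        ", word_counts(JS_ONLY))
    print("server-rendered page:", word_counts(SERVER_RENDERED))
```

The JS-only page comes back with zero readable words and a pile of code, a miniature version of the stylesheet-and-navigation result above. It's only a proxy: some crawlers run JavaScript and some don't, but it shows the gap between what you see in a browser and what the raw page actually contains.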

So picture two companies with identically brilliant cultures and identically well-written roles. One bought Greenhouse, the other bought a JavaScript-only platform. To the AI, the first is articulate and the second is a ghost. That gap has nothing to do with their employer brand and everything to do with a procurement decision made by someone in HR Ops three years ago who never imagined a language model would one day be the reader.

My rule whilst building was simple: never penalise a company for my scraper's limitations. If a human can see it, I have to find a way to extract it. But the LLMs out in the wild, answering candidate questions all day, are not being that careful. If they can't reach it, as far as they're concerned it doesn't exist.

It can't tell your employer voice from your furniture

This is the one that broke my brain, and it's the heart of why this is so much harder than the GEO crowd makes out.

When a model lands on your careers page, it does not arrive knowing which words are your employer brand. It just sees text. And a careers page is never only employer brand. It's employer brand wrapped in product marketing, event banners, a cookie-consent notice, legal boilerplate, navigation menus, tracking scripts, and - my personal favourite - the bits of template the agency forgot to fill in. I have genuinely watched pages serve up "Your engaging subtitle goes here" and "Widget title goes here" as though it were carefully chosen messaging.

The machine doesn't instinctively know that your cookie banner isn't how you talk about yourself as an employer. If you don't make the distinction obvious, it might decide that the loudest, most repeated text on the page - which is sometimes the legal and navigational furniture - is your voice. It could read your cookie notice and quietly file it under "how this company describes working here".

And it scores on proportion, the same way a person skim-reading would. If 70% of what it can grab is generic boilerplate and 10% is the genuinely brilliant, specific, human stuff buried three clicks down, then as far as the machine is concerned you're a 70%-boilerplate company. The brilliant bit doesn't rescue the average. It just drowns in it.
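To make the proportion point concrete, here's a deliberately crude sketch. It tags each block of scraped text as "furniture" if it matches a few hypothetical patterns (placeholder copy, cookie and legal wording, very short menu labels), then reports what share of the words are furniture. The patterns, threshold and page are my illustrative assumptions, not a real classifier; the point is the arithmetic.

```python
import re

# Hypothetical signs of "furniture": unfilled template copy and cookie/legal wording.
FURNITURE_PATTERNS = [
    re.compile(r"\bgoes here\b", re.IGNORECASE),
    re.compile(r"\bcookies?\b", re.IGNORECASE),
    re.compile(r"\b(?:privacy policy|terms of use|all rights reserved)\b", re.IGNORECASE),
]
MENU_LABEL_MAX_WORDS = 3  # crude assumption: very short blocks are usually nav labels


def is_furniture(block):
    """Guess whether a block of page text is furniture rather than employer voice."""
    if len(block.split()) <= MENU_LABEL_MAX_WORDS:
        return True
    return any(pattern.search(block) for pattern in FURNITURE_PATTERNS)


def furniture_share(blocks):
    """Share of all words (0.0 to 1.0) that sit in furniture blocks."""
    total = sum(len(block.split()) for block in blocks)
    if total == 0:
        return 0.0
    furniture = sum(len(block.split()) for block in blocks if is_furniture(block))
    return furniture / total


# Invented careers page, one entry per block of text a scraper might pull out.
PAGE = [
    "Home",
    "Careers",
    "We use cookies to improve your experience. By continuing you accept our cookie policy.",
    "Your engaging subtitle goes here",
    "Our on-call rota is four people, and nobody is paged twice in one week.",
]

if __name__ == "__main__":
    print(f"{furniture_share(PAGE):.0%} of the words on this page are furniture")
```

On this made-up page the one genuinely specific sentence is 14 of 35 words, so 60% of what the machine can grab is furniture. Add another specific paragraph and the share falls; add another banner and it rises.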

So why "publish more content" is exactly backwards

This is where I part company with most of the GEO advice doing the rounds. The standard prescription is: "feed the machines, publish more, flood the zone with employer content."

But look at what we've just covered. The AI is regularly guessing, it often can't reach your most important content, and it can't reliably tell your real voice from your furniture. Pour more content into a house that disordered and all you've done is give a probabilistic, half-blind reader more ways to misquote you.

Volume isn't the lever. It's barely even secondary.

Getting your house in order comes first. And that's far less glamorous than commissioning a campaign, which is precisely why most people will skip it. Five things actually matter, roughly in this order:

1. Make sure a machine can actually reach it. Open your careers site and your job ads with JavaScript switched off, or look at the raw page source. Whatever survives is roughly what the AI sees. If your roles vanish, they're locked in a client-side system and you're invisible exactly where it counts. Sort that - render content server-side, or host job descriptions somewhere crawlable - before you write another word.

2. Signpost what's employer content and what isn't. Give your culture, values, and life-here content clear, stable web addresses and real headings - /life-at, /values, /benefits. Use proper page structure, not a wall of prettily styled boxes. Don't trap your best material inside a PDF, an image, or a video with no transcript. You're trying to help a machine tell your employer voice apart from your product copy and your privacy policy. Make it easy for it.

3. Clear out the furniture. Hunt down the placeholder text, the abandoned pages, the duplicated boilerplate, the cookie copy that reads like a contract. Every scrap of generic noise dilutes the signal, because the machine weighs proportion. Less, cleaner, and unmistakably about working for you beats more.

4. Say the same thing everywhere. Your careers page, your job ads, your LinkedIn, your Glassdoor responses - they should tell one coherent story. Consistency is what lets you compress into something sharp instead of fragmenting into the contradictory mush that makes the machine guess differently each time. The brand that says one clear thing in ten places survives the squeeze. The brand that says ten different things gets averaged into nothing.

5. Be specific, and be distinctive. This is the real fix for the guessing. A probabilistic reader needs something solid to grab. Vague, generic, sounds-like-everyone-else language gives it nothing, so the read drifts and you blur into the sector. Concrete, ownable, true language - the stuff only you could have written - is what gets read the same way twice. It's also, not coincidentally, what makes a human candidate choose you. This is why "more content" sits at the bottom of the list. More generic content makes the guessing worse, not better.
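Item 3 is the easiest one to start on straight away. Here's a minimal sketch, assuming you've already got the text of each page (copied out or scraped) in a dictionary keyed by its web address. It flags placeholder copy like the "goes here" examples earlier, plus paragraphs repeated word-for-word across pages, which is where boilerplate usually hides. The patterns, addresses and page text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical markers of unfilled template copy; extend with whatever your CMS uses.
PLACEHOLDER = re.compile(r"goes here|lorem ipsum|\[insert", re.IGNORECASE)


def paragraphs(text):
    """Split page text on blank lines and normalise internal whitespace."""
    return [" ".join(p.split()) for p in re.split(r"\n\s*\n", text) if p.strip()]


def find_placeholders(pages):
    """Return (address, paragraph) pairs that look like unfilled template text."""
    return [(url, p) for url, text in pages.items()
            for p in paragraphs(text) if PLACEHOLDER.search(p)]


def find_repeated(pages, min_pages=2):
    """Return paragraphs that appear word-for-word on at least min_pages pages."""
    counts = Counter()
    for text in pages.values():
        counts.update(set(paragraphs(text)))  # count each paragraph once per page
    return sorted(p for p, n in counts.items() if n >= min_pages)


# Invented pages keyed by hypothetical addresses.
PAGES = {
    "/life-at": "Your engaging subtitle goes here\n\nWe ship on Thursdays and demo every Friday."
                "\n\nWe use cookies to give you the best experience.",
    "/values": "Widget title goes here\n\nWe use cookies to give you the best experience.",
    "/benefits": "Four-day weeks in August.\n\nWe use cookies to give you the best experience.",
}

if __name__ == "__main__":
    print("Placeholders:", find_placeholders(PAGES))
    print("Repeated:", find_repeated(PAGES))
```

Anything in the first list is a straight delete. The second needs a judgement call: some repetition (a legal line, say) has to stay, but every repeated block adds weight to the furniture side of the scales.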

None of this is exciting. It's plumbing, not a brand film. There's no awards ceremony for a well-structured careers site or a job advert that says something true. But it's the difference between an AI describing your company accurately and consistently to a candidate, and an AI improvising you from your cookie banner.

In that first post I bet most of you didn't have a Scooby what AI was learning about your organisation. I've now spent weeks building the machine that goes and finds out. The first thing it taught me is that the machine is only ever as good as the house you let it walk into.

So go and tidy the house. The content can wait.

