How to spot AI-generated content in 2026: the em-dash and 8 other tells
The em-dash is the canonical tell, but it is one of nine. Here is the full diagnostic: eight more vocabulary, structure, and rhythm signs of AI writing, two common false positives, and the byline-removal test that catches the rest.
If you want to spot AI-generated content quickly, look for the em-dash first. Two or three em-dashes in a 200-word post is a strong signal that the writing was either drafted by an LLM or heavily edited by one. The em-dash is the most famous AI tell in 2026 because the major models punctuate with them at a rate that exceeds almost every human writer who is not specifically a long-form essayist. But the em-dash is one of nine tells, not the whole list. Spotting AI content reliably means reading for the cluster, not just the punctuation. This piece is the full diagnostic: the em-dash plus eight other patterns that, taken together, reveal AI writing in roughly 30 seconds of reading. It also covers the two false positives that trip up lazy detection, and the byline-removal test that catches the rest.
Why this matters now. The base rate of AI-assisted content on every major platform crossed a threshold in 2026. The question is no longer whether a given post might be AI. The question is whether you can tell which ones are, and whether your own writing has drifted into the same patterns without you noticing. The tells below are useful in both directions: as a reader-side filter and as a writer-side audit.
Tell 1: em-dash density
The em-dash is the canonical AI tell because the major LLMs deploy it as a default rhythm tool. The result is em-dash density that no normal writer matches. Two or three em-dashes in a single short post. Em-dashes mid-clause, mid-sentence, mid-list. Em-dashes used where a comma, colon, or period would do the same work with less drama. The pattern shows up across platforms (X, LinkedIn, blog, email) regardless of the surface topic, because it is a model-level habit rather than a writer-level choice.
How to use it diagnostically: count em-dashes in any single paragraph. If the count is two or more in a paragraph under 100 words, the probability the writing is AI-assisted goes up sharply. If the count is three or more, treat it as near-certain unless you know the writer is a long-form essayist whose voice naturally runs em-dash-heavy.
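The count is mechanical enough to script. Here is a minimal sketch in Python, assuming plain-text posts with blank-line paragraph breaks; the function name and report shape are illustrative, but the thresholds are the ones above.

```python
import re

EM_DASH = "\u2014"  # U+2014, the em-dash character

def em_dash_flags(text: str) -> list[dict]:
    """Apply the thresholds above: two or more em-dashes in a
    sub-100-word paragraph is a strong signal; three or more
    anywhere is near-certain."""
    flags = []
    for para in re.split(r"\n\s*\n", text.strip()):
        words = len(para.split())
        dashes = para.count(EM_DASH)
        if dashes >= 3 or (dashes >= 2 and words < 100):
            flags.append({
                "words": words,
                "em_dashes": dashes,
                "signal": "near-certain" if dashes >= 3 else "strong",
            })
    return flags
```

Remember the essayist exception: a flag from this function is a reason to keep reading, not a verdict.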
Tell 2: the AI vocabulary cluster
Certain words appear in AI-drafted writing at frequencies that are visibly elevated. The cluster shifts model-to-model and quarter-to-quarter, but a stable subset has held through 2026: leverage (as a verb), delve, unlock, navigate (the metaphorical sense), elevate, foster, harness, robust, seamless, comprehensive, holistic, in the realm of, in the world of. None of these words is bad in isolation. The tell is the cluster: when three or four of them show up in a single short post, the probability the writing was at least partly drafted by an LLM is high. The longer companion piece on this is the words AI overuses (and how to ban them from your writing forever), which gives the full list and the substitution table.
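The cluster test also reduces to a short script. A sketch, seeded with the stable 2026 subset named above; the word list needs re-checking quarter to quarter, and the whole-word matching is deliberately crude.

```python
import re

# The stable 2026 cluster listed above; multi-word phrases match whole.
AI_CLUSTER = [
    "leverage", "delve", "unlock", "navigate", "elevate", "foster",
    "harness", "robust", "seamless", "comprehensive", "holistic",
    "in the realm of", "in the world of",
]

def cluster_hits(text: str) -> list[str]:
    """Return the cluster entries present, case-insensitive, whole-word."""
    lowered = text.lower()
    return [w for w in AI_CLUSTER
            if re.search(r"\b" + re.escape(w) + r"\b", lowered)]

def cluster_signal(text: str) -> bool:
    """Tell 2 fires at three or more cluster hits in a single post."""
    return len(cluster_hits(text)) >= 3
```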
Tell 3: the symmetric hook template
AI-drafted hooks pattern toward symmetric two-clause openings. "Most people think X. The reality is Y." "It is not about X. It is about Y." "Forget X. Focus on Y." The structure is fine in moderation. The tell is the frequency: when a feed shows a creator opening four out of five posts with the same two-clause symmetric pattern, the underlying drafting tool is shaping the output. The mechanical explainer for why this happens (model defaults, training-data weighting toward the same templates) is in why AI drafts sound the same.
Tell 4: the "not just X, but Y" frame
A specific construction worth its own line: "It is not just X. It is Y." Or its sibling: "This is not about X. It is about Y." The frame is overused in AI output to the point that it functions as a signature. Models reach for it as a way to add apparent depth to a flat statement. Used once, it is a fine rhetorical move. Used three times in one post, it is a fingerprint.
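Both template tells (the symmetric hook and the not-just-X-but-Y frame) are literal enough to pattern-match. A rough sketch with deliberately loose regexes; these catch the verbatim templates above and miss paraphrases, so treat the counts as a floor.

```python
import re

# Loose patterns for Tell 3 (symmetric hook) and Tell 4
# (not-just-X-but-Y). They match the literal templates only.
TEMPLATES = {
    "symmetric_hook": re.compile(
        r"most people think .+?\. the reality is"
        r"|(?:it is|it's) not about .+?\. (?:it is|it's) about"
        r"|forget .+?\. focus on",
        re.IGNORECASE,
    ),
    "not_just_x_but_y": re.compile(
        r"(?:it is|it's|this is) not (?:just|about) .+?[.,] (?:it is|it's) ",
        re.IGNORECASE,
    ),
}

def template_hits(text: str) -> dict[str, int]:
    """Count template matches; three not-just hits in one post is the fingerprint."""
    return {name: len(pat.findall(text)) for name, pat in TEMPLATES.items()}
```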
Tell 5: the beige bullet middle
AI-drafted long posts tend to collapse into a bulleted middle section where every bullet is the same length, every bullet starts with a similar grammatical structure, and every bullet says something true but unspecific. "Build trust with your audience. Show up consistently. Provide value with every post. Engage authentically." Each line passes a shallow read. The cluster is the tell: four or five evenly weighted bullets that could appear in any business-content post on any platform. Real human bullet lists tend to be uneven (one bullet long, one short, one a fragment, one a complete thought) and tend to include at least one bullet that is unmistakably the writer's own.
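The evenness itself can be measured. A sketch, assuming bullets marked with -, *, or • at the start of a line; the 0.25 cutoff is an illustrative assumption, not a calibrated threshold. Low coefficient of variation means evenly weighted bullets; real human lists tend to score visibly higher.

```python
import statistics

def bullet_evenness(post: str) -> float | None:
    """Coefficient of variation of bullet word counts. Values below
    roughly 0.25 (an assumed, uncalibrated cutoff) suggest the beige
    bullet middle. Returns None with fewer than four bullets."""
    bullets = [line.lstrip("-*\u2022 ").strip()
               for line in post.splitlines()
               if line.lstrip().startswith(("-", "*", "\u2022"))]
    bullets = [b for b in bullets if b]
    if len(bullets) < 4:
        return None
    lengths = [len(b.split()) for b in bullets]
    return statistics.stdev(lengths) / statistics.mean(lengths)
```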
Tell 6: the closing CTA register
AI-drafted posts default to a specific register at the close. "What's your take?" "Drop your thoughts below." "Curious to hear your experience." "Let me know in the replies." The closing line tends to be a generic engagement-bait sentence rather than a specific question or a sharp observation. The tell is the genericness, not the presence of a CTA. A specific question ("who else has shipped a feature on a Friday and regretted it") reads as a writer's voice; the generic CTA reads as model default.
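The closing-register check is the simplest of the lot: compare the final line against a phrase list. A sketch seeded from the examples above; any real list would need to grow with the platform's current engagement-bait register.

```python
# Generic engagement-bait closers, seeded from the examples above.
GENERIC_CTAS = (
    "what's your take", "drop your thoughts below",
    "curious to hear your experience", "let me know in the replies",
    "let me know in the comments",
)

def generic_closing(post: str) -> bool:
    """True when the final non-empty line is a known generic CTA."""
    lines = [l.strip() for l in post.splitlines() if l.strip()]
    if not lines:
        return False
    closing = lines[-1].lower().rstrip(".!?")
    return any(cta in closing for cta in GENERIC_CTAS)
```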
Tell 7: paragraph-rhythm symmetry
Human writers have uneven paragraph rhythm. Three short sentences. Then one long meandering one with an aside. Then a fragment. AI drafts tend toward visually uniform paragraphs of similar lengths, with similar sentence-count patterns, and few fragments. The macro-rhythm reads as suspiciously even. This is harder to spot in a single post but obvious across a creator's feed: when every post has the same shape, the underlying drafting tool is normalizing the rhythm.
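Rhythm symmetry is also measurable, at least crudely. A sketch, assuming blank-line paragraph breaks and treating any sentence of three words or fewer as a fragment (an assumed proxy, not a linguistic definition): low variance in sentence counts plus zero fragments is the suspiciously even shape.

```python
import re
import statistics

def rhythm_profile(text: str) -> dict:
    """Sentence counts per paragraph plus a crude fragment tally.
    Uniform counts and zero fragments is the suspicious macro-rhythm."""
    paras = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    sentence_counts, fragments = [], 0
    for p in paras:
        sentences = [s.strip() for s in re.split(r"[.!?]+", p) if s.strip()]
        sentence_counts.append(len(sentences))
        # Assumed proxy: a "fragment" is any sentence of <= 3 words.
        fragments += sum(1 for s in sentences if len(s.split()) <= 3)
    return {
        "paragraphs": len(paras),
        "sentence_count_stdev": (statistics.stdev(sentence_counts)
                                 if len(sentence_counts) > 1 else 0.0),
        "fragments": fragments,
    }
```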
Tell 8: voice-flat coherence
The hardest tell to articulate, but the most reliable once you tune to it. AI-drafted posts read as fluent and coherent and forgettable. The grammar is right, the structure is right, the cadence is right, and there is no specific detail you would remember 24 hours later. Voice-flat is the named version of this pattern in the AI slop essay. The full mechanical case for what produces it is in the nine dimensions of voice. When a post passes the substitution test (you could swap the writer's name for any other creator's name in the same niche and the post would still read the same), that is voice-flat coherence in action.
Tell 9: the missing taboo
Real writers have taboos: words they refuse, framings they avoid, opinions they will not soften. The taboos are part of what makes the writing recognizable. AI-drafted writing has no taboos by default. It uses every safe word, includes every conventional framing, and softens every sharp edge into a balanced both-sides hedge. The absence of taboos is itself a tell. If a post reads as carefully balanced and inoffensive on every dimension, the chance it was drafted by an LLM is high.
Two false positives worth knowing
Lazy detection generates two common false positives. Both are worth knowing, because misidentifying real human writing as AI is its own credibility problem.
False positive 1: the long-form essayist. Some writers (essayists, novelists, poets, certain academics, some screenwriters) use em-dashes at rates that look AI-shaped if you only count the em-dashes. The cluster test resolves this: if the writer uses em-dashes heavily but their vocabulary, hooks, and bullet patterns are all unmistakably their own, the em-dashes are voice, not AI tell. Emily Dickinson famously used dashes constantly. She was not an LLM.
False positive 2: writers who use AI for editing only. A creator who drafts in their own voice and then runs the draft through an LLM for grammar and tightening can end up with em-dashes added by the editor pass. The post is mostly their voice with an AI surface polish. This is a category that deserves its own diagnostic, not the same alarm as fully AI-drafted output. The tell here is mixed: some signals fire (em-dash, vocabulary), others do not (taboos present, specific details preserved, hooks idiosyncratic).
The byline-removal test
The single most reliable way to test whether a post is AI-drafted: remove the byline and ask whether you would still know who wrote it. If yes, the writing carries voice signals strong enough that no LLM in the current generation could have produced it without a voice-trained workflow. If no, the post is either AI-drafted or written in a register so generic that the AI-versus-human distinction barely matters. The byline-removal test is the highest-leverage one because it captures the underlying question, which is not really "was this AI" but "does this writing belong to a specific person."
What to do about it as a writer
If you are a writer who uses AI in any part of your workflow, the tells above are your audit checklist. Run them on your last 10 posts. Count em-dashes. Scan for the AI vocabulary cluster. Read your hooks for the symmetric two-clause pattern. Look at your bullets for evenness. Check whether your closing lines all read as the same generic CTA. The audit takes 10 minutes and reveals where the AI surface polish has flattened your voice. The tool-side counterpart to this human-eye diagnostic is AI detection tools tested: what Originality.ai, GPTZero, ZeroGPT, Copyleaks, and Winston AI actually catch in 2026, a skeptical-honest read on what the classifiers catch, where the false-positive problem hits long-form essayists and non-native English writers, and why no consequential decision should be made on tool output alone. The two pieces are complements: this diagnostic for the human-eye read, the tool-tested piece for the machine-classifier read.
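For writers who would rather script the audit than eyeball it, the sketches above chain into a ten-post pass. This driver assumes the illustrative functions defined earlier in this piece are in scope; the output format is arbitrary.

```python
def audit(posts: list[str]) -> None:
    """Run the mechanical tells over a batch of posts.
    Requires the sketch functions defined earlier in this article."""
    for i, post in enumerate(posts, 1):
        print(f"--- post {i} ---")
        print("em-dash flags:  ", em_dash_flags(post))
        print("vocab cluster:  ", cluster_hits(post))
        print("template hits:  ", template_hits(post))
        print("bullet evenness:", bullet_evenness(post))
        print("generic CTA:    ", generic_closing(post))
        print("rhythm profile: ", rhythm_profile(post))
```

The script flags candidates; the voice-level tells (8 and 9) and the byline-removal test still need a human read.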
The fix is not to stop using AI. The fix is to use AI in a workflow where the voice is preserved as a constraint. The mechanical case for what that looks like is in AI tweet writing without losing voice, and the structural failure mode that creeps in over time even when you are careful is in voice drift. The founder-essay prescription for stepping out of the AI-tells pattern entirely (the four operational requirements that produce non-AI-shaped output) is in why all AI-written tweets sound the same (and how to actually fix it), the prescription companion to this diagnostic. The audience-perception companion is can your audience tell you're using AI? an honest 2026 analysis, which asks what fraction of an audience actually detects these tells, whether they care when they do, and why the detection-rate question matters less than the audience-quality question; this diagnostic plus that perception read make the full writer-plus-audience picture. Finally, how to avoid the AI tells: a writer's checklist for 2026 converts the nine tells into nine avoidance practices with before/after examples per tell: use this piece to audit shipped writing and that one when drafting new writing.
Where Auden fits
Auden, the brain inside VoiceMoat, is the inverse of the AI tells listed above. Auden is trained on a creator's full profile (100 to 200 posts, replies, threads, and images across the nine dimensions of voice) so the output preserves the writer's specific patterns rather than collapsing into the model defaults. Auden refuses words like leverage and delve as taboos at the model level. The voice match score is the inverse diagnostic: instead of asking "does this read as AI," it asks "does this read as me," and a score above 90 says the post would pass the byline-removal test in your own voice. The deeper case for why voice itself is the moat that survives the AI-fluency floor is in authenticity as a moat.
Quick checklist
- Em-dash density. Two or more in a sub-100-word paragraph. Strong signal.
- Vocabulary cluster. Three or more of leverage, delve, unlock, navigate, elevate, foster, harness, robust, seamless, comprehensive, holistic. Strong signal.
- Symmetric hook template. "Most people think X. The reality is Y." repeated across the feed.
- The not-just-X-but-Y frame. Fine once. Fingerprint when used three times in one post.
- Beige bullet middle. Four or five evenly weighted bullets that could appear in any business post.
- Generic closing CTA. "What's your take?" rather than a specific question.
- Symmetric paragraph rhythm. No fragments. No uneven sentence lengths.
- Voice-flat coherence. Fluent, forgettable, swappable byline.
- Missing taboos. No refusals, no sharp edges, careful both-sides hedge on everything.
- Run the byline-removal test as the final check.