The X algorithm in May 2026

VoiceMoat vs Tweet Hunter, Typefully, Hypefury: which one writes for the 2026 algorithm

Three of the major AI tweet writers were built when the X ranker was a heuristic-plus-GBDT pipeline scoring each post on a single weighted scalar. Their templates, viral hooks, and engagement maximizers are calibrated to that world. In May 2026, X.AI open-sourced a Grok-derived transformer ranker that scores 19 separate engagement heads, applies negative-signal weights of equal magnitude, isolates candidates from each other in the attention mask, and conditions every score on the viewer's history sequence. The new moat is voice consistency across that history sequence, because consistency compounds per-creator across all 19 heads while drift triggers mute and not-interested with asymmetric penalty. This article compares Tweet Hunter, Typefully, Hypefury, and VoiceMoat against six 2026-algorithm criteria. The conclusion is structural, not feature-list: only one of the four is built on the voice-fidelity assumption the new algorithm rewards.

May 28, 2026 · 13 min read · VoiceMoat team

Pick a writing tool and you are picking a bet about the algorithm. The templates a tool ships, the cadence it encourages, and the metrics it surfaces all reflect a model of what the ranker rewards. Three of the major AI writing tools on X today (Tweet Hunter, Typefully, Hypefury) were architected when the X ranker was a heuristic-plus-GBDT pipeline scoring each post on a single weighted scalar. The 2026 ranker is a Grok-derived transformer scoring 19 separate engagement heads with a candidate-isolation mask, conditioned on viewer history sequences, followed by exponential author-diversity decay and DPP reranking. Two different worlds, two different bets. This article scores four writing tools, VoiceMoat included, against six criteria that drop out of the 2026 ranker mechanics. The conclusion is structural, not a feature-list comparison. This article closes the ten-article series; the prior nine pieces build the technical foundation it rests on.

What changed between 2023 and 2026

Quick recap, with linkbacks for the full mechanics.

The 2023 stack was a hand-engineered feature pipeline plus a MaskNet heavy-ranker. Six thousand engineered features fed the ranker; weights on ten visible engagement signals were disclosed in the public source. SimClusters, TwHIN, and RealGraph supplied separate embedding and graph-affinity signals. A separate light ranker and source-specific rerankers handled the candidate-pipeline assembly.

The 2026 stack is a wholesale rewrite. A1 walks the change in full: one Grok-derived transformer named Phoenix replaces the entire ranker layer; a two-tower retrieval model handles out-of-network candidates; SimClusters, TwHIN, and RealGraph have been removed; hand-engineered features have been eliminated; nineteen action heads replace the ten visible weights; the candidate-isolation mask isolates posts from each other during scoring; the author-diversity scorer multiplies repeated-author scores by exponential decay; the DPP reranker enforces top-of-feed diversity.
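To make the "nineteen action heads" idea concrete, here is a minimal sketch of how per-head predicted probabilities might be folded into one ranking score. The head names and weights below are hypothetical placeholders chosen for illustration; they are not copied from the xai-org/x-algorithm repository. Only the shape follows the description above: many heads, some positive, some negative.

```python
from typing import Dict

# Hypothetical head weights, for illustration only. Positive heads add to
# the score; negative heads (mute, not-interested) subtract from it.
HEAD_WEIGHTS: Dict[str, float] = {
    "favorite": 1.0,
    "reply": 1.0,
    "profile_click": 1.0,
    "dwell": 1.0,
    "mute": -1.0,
    "not_interested": -1.0,
}


def combined_score(head_probs: Dict[str, float]) -> float:
    """Weighted sum of per-head predicted probabilities.

    Heads missing from head_probs contribute zero; heads that are not in
    HEAD_WEIGHTS are ignored.
    """
    return sum(
        weight * head_probs.get(head, 0.0)
        for head, weight in HEAD_WEIGHTS.items()
    )


# A post with a 50% predicted like and a 50% predicted mute nets out to zero
# under these illustrative equal-magnitude weights.
print(combined_score({"favorite": 0.5, "mute": 0.5}))  # 0.0
```

The point of the sketch is that a post is no longer ranked on one visible-engagement scalar: a negative head can cancel a positive one, so output that earns likes while also earning mutes gains nothing.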

Tools optimised for the 2023 stack optimise for things the 2026 stack does not score directly. Tools optimised for the 2026 stack optimise for things the 2023 stack did not have. The shift is large enough that the same writing tool cannot be simultaneously aligned with both architectures.

Two architectural shifts adjacent to the ranker also matter for tool comparison. The recency window walked in A9 replaced the 2023 recency multiplier; cadence strategy that worked under a continuous decay function does not work under a binary 48-hour window. The feed-slot composition walked in A4 shifted the organic share of the visible top-10 downward; tools that encourage volume against a smaller organic slot pool fight a tighter competition than they did in 2023. Both shifts compound the ranker rewrite in the same direction: per-post quality matters more, high-volume saturation matters less.
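The difference between a continuous decay and a binary window is easier to see side by side. The sketch below is an assumption-laden toy: the 48-hour window comes from the text, but the half-life value, the inclusive boundary, and both function names are hypothetical and do not come from the source repository.

```python
WINDOW_HOURS = 48.0     # the binary window described in the text
HALF_LIFE_HOURS = 6.0   # hypothetical half-life for a 2023-style decay


def continuous_recency(age_hours: float) -> float:
    """2023-style sketch: a multiplier that halves every HALF_LIFE_HOURS."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)


def binary_recency(age_hours: float) -> float:
    """2026-style sketch: fully eligible inside the window, ineligible after.

    Treating the boundary as inclusive is an assumption of this sketch.
    """
    return 1.0 if age_hours <= WINDOW_HOURS else 0.0


for age in (0.0, 6.0, 47.0, 49.0):
    print(age, continuous_recency(age), binary_recency(age))
```

Under the continuous function, posting again a few hours later recovers a decayed multiplier, which rewards frequent posting. Under the binary function, a post inside the window loses nothing with age, so reposting early buys no recency advantage.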

The six 2026-algorithm criteria

Six criteria that fall out of the prior articles, used as the scoring axes for the tool comparison.

The six 2026-algorithm criteria, with the article in this series that walks each

Source: X Algorithm series

| Criterion | What it measures | Series article |
| --- | --- | --- |
| Multi-head scoring | Does the tool's output fire multiple engagement heads (reply, dwell, profile click, share-via-DM) rather than just visible-engagement heads? | A2 |
| OON retrieval similarity | Does the output sit in a sparse embedding region (high top-K survival rate against matched viewers) or in the dense engagement-centre cluster? | A3 |
| Negative-signal exposure | Does the tool gate output against voice drift before publish, or ship whatever the creator writes regardless of drift? | A5 |
| Dwell-firing structure | Does the output earn continuous-dwell time (second-line payoff, specificity) or deliver in the first line and lose the dwell? | A6 |
| Cadence saturation discipline | Does the tool encourage saturating the per-author cap and diversity decay, or recommend cadence inside the architectural ceiling? | A7 |
| Voice-embedding fidelity | Does the tool train on the creator's full corpus to anchor the embedding cluster, or fit output to a category-default template? | A8 |

Each criterion is binary-ish (the tool is aligned or it is not); the combination determines architectural fit. Note the criteria do not include "does the tool produce good copy" or "is the UX clean." Editorial quality and UX are valid axes for tool evaluation, but they are not 2026-algorithm criteria. A tool can score well on UX while scoring poorly on the algorithmic criteria above.

Tweet Hunter, scored

Tweet Hunter ships a large library of viral-post templates organised by category and tone. The product model is template-fit-plus-tone-preset: the creator picks a template (a viral hook structure, a thread template, a reply template), pastes in their topic, and the system adapts the template to fit. The product surface is heavily viral-metrics-driven, with engagement-shaping features (call-to-action templates, hook libraries) as the central value proposition.

Architectural fit against the six criteria:

  • Multi-head scoring: weak. Templates are calibrated to the visible engagement heads (favorite, retweet) that the 2023 ranker rewarded directly. The nine invisible heads (profile click, photo expand, click, three shares, two dwell variants, follow-author) are not the focus.
  • OON retrieval similarity: weak. Template output sits in the dense engagement-centre cluster of the embedding space. Templates are by design pattern-matchable, which means many creators using the same templates produce outputs that cluster near each other, driving down per-viewer retrieval similarity for any individual creator.
  • Negative-signal exposure: weak. No voice-distance gate. Output that deviates from a creator's historical voice ships without warning.
  • Dwell-firing structure: mixed. Some templates encourage multi-line structure that can fire dwell. Many templates deliver the payload in the first line, optimising for like-fire rather than dwell-fire.
  • Cadence saturation discipline: weak. The product surface encourages high posting volume and includes scheduling tooling built around volume cadence.
  • Voice-embedding fidelity: weak by construction. Templates are shared across creators; output from many creators using the same template clusters at the template centroid, not at any individual creator's cluster.
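The retrieval point in the list above can be illustrated with a toy top-K selection. This is a sketch under stated assumptions, not the two-tower model itself: the two-dimensional embeddings, the post identifiers, and the `top_k` helper are all hypothetical, and the numbers are chosen by hand to show near-identical candidates competing for the same slots.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(viewer: Sequence[float],
          candidates: Dict[str, Sequence[float]],
          k: int) -> List[str]:
    """Return the ids of the k candidates most similar to the viewer.

    Ties are broken alphabetically by id so the result is deterministic.
    """
    scored = [(dot(viewer, emb), post_id) for post_id, emb in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [post_id for _, post_id in scored[:k]]


viewer = [1.0, 0.0]
candidates = {
    "template_a": [0.6, 0.4],   # three posts sharing one template shape
    "template_b": [0.6, 0.4],
    "template_c": [0.6, 0.4],
    "distinct": [0.8, 0.2],     # hand-picked to match this viewer better
}
print(top_k(viewer, candidates, k=2))  # ['distinct', 'template_a']
```

With three interchangeable template posts and two slots, only one template post survives after the distinct post, and which one survives is decided by the tie-break, not by anything the creator wrote.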

Honest take: Tweet Hunter is a strong product for creators whose priority is short-term visible-engagement growth and who are comfortable with template-shaped output. The product's calibration is to the 2023 ranker. It pre-dates the 2026 rewrite, and the rewrite moved the relative scoring of multi-head, dwell, and voice-distinctive content in a direction the templates do not optimise for.

Typefully, scored

Typefully's core value proposition is a clean composer for threads and single posts, with scheduling and analytics layered on. The product focus is on writing experience and content-organisation, not on template generation. AI features exist but are not the central proposition; the product is positioned as a high-quality composer for serious writers.

Architectural fit against the six criteria:

  • Multi-head scoring: neutral. The product does not bias toward any particular head; the creator writes what they write. Output fidelity depends entirely on the creator, not on the tool.
  • OON retrieval similarity: neutral. Same as above.
  • Negative-signal exposure: weak. No voice-distance gate. The product trusts the creator to ship intentional work.
  • Dwell-firing structure: neutral. Composer features support multi-line structure but do not specifically encourage dwell-firing patterns.
  • Cadence saturation discipline: mixed. Scheduling tooling exists but the product brand leans toward thoughtful publishing rather than high-volume automation.
  • Voice-embedding fidelity: weak. No voice-training layer. AI features are general-LLM-flavoured rather than per-creator voice-trained.

Honest take: Typefully is a strong product for creators who write their own copy and want a clean composer-plus-scheduler. The product does not actively misalign with the 2026 ranker, but it also does not actively help against the criteria that require voice-training or embedding-distinctness. It is architecturally neutral, which is a respectable position.

Hypefury, scored

Hypefury's core value proposition is automation at scale: scheduled posting, cross-platform syndication, auto-DMs, evergreen rotation of prior posts. The product target is high-volume creators running posting cadences in the dozens of posts per week. AI writing features exist but are not the central differentiator; the differentiation is operational scale.

Architectural fit against the six criteria:

  • Multi-head scoring: weak. Volume-first output bias optimises for visible-engagement statistics (the metrics the product surfaces), which maps to a small subset of the head pool.
  • OON retrieval similarity: weak. Evergreen-rotation features surface old posts back into the feed, which compounds the embedding-centre clustering problem because the same posts surface to overlapping viewer histories.
  • Negative-signal exposure: weak. No voice gate. High-volume output increases the probability that off-voice drift slips through.
  • Dwell-firing structure: weak. Automated cadence reduces per-post editorial care; dwell-firing patterns require deliberate writing that high-volume automation discourages.
  • Cadence saturation discipline: very weak. The product encourages cadences that hit the per-author cap and diversity decay hard. The 2026 architectural ceiling is several times below the cadence Hypefury is built for.
  • Voice-embedding fidelity: weak. AI features are not voice-trained.
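The cadence-saturation point in the list above rests on the author-diversity scorer, which multiplies repeated-author scores by an exponential decay. A minimal sketch of that mechanic follows. The decay factor of 0.5 and the function name are hypothetical; the real factor is not quoted in this article.

```python
from typing import Dict, List, Tuple

DECAY = 0.5  # hypothetical per-repeat decay factor, for illustration only


def apply_author_decay(ranked_posts: List[Tuple[str, float]],
                       decay: float = DECAY) -> List[Tuple[str, float]]:
    """Scale each author's n-th appearance (0-indexed) by decay ** n.

    ranked_posts is a list of (author, score) pairs in ranked order.
    """
    seen: Dict[str, int] = {}
    adjusted = []
    for author, score in ranked_posts:
        n = seen.get(author, 0)
        adjusted.append((author, score * decay ** n))
        seen[author] = n + 1
    return adjusted


feed = [("a", 1.0), ("a", 1.0), ("b", 1.0), ("a", 1.0)]
print(apply_author_decay(feed))
# [('a', 1.0), ('a', 0.5), ('b', 1.0), ('a', 0.25)]
```

Under any decay factor below one, the third post from the same author in one feed assembly is worth a fraction of the first, which is why volume-first cadence hits diminishing returns quickly.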

Honest take: Hypefury is a strong product for creators whose strategy specifically depends on operational scale (multi-platform agencies, high-volume affiliate operations, content-marketing teams running many accounts). The product's architectural assumptions are misaligned with the 2026 X ranker specifically, but the misalignment may be acceptable for use cases where X is one platform of many and absolute X-reach optimisation is not the primary goal.

VoiceMoat, scored

VoiceMoat trains a per-user model named Auden on the creator's full profile across 10 signals (tone, vocabulary, hook style, pacing, formatting, quirks, persona, authority, topic surface, register). The training corpus is the creator's 100 to 200 content pieces (posts, replies, threads, attached images). The output scoring layer surfaces a voice-fidelity score per draft. The product roadmap includes voice-distance gating and dwell-affordance checks per draft.

Architectural fit against the six criteria:

  • Multi-head scoring: strong by design. Voice-trained output fires the heads the creator's audience historically responds to, which includes the invisible heads (profile click, dwell, share-via-DM) that visible-metric tools systematically under-optimise for.
  • OON retrieval similarity: strong. Voice-distinctive output sits in the sparser regions of the embedding space, increasing top-K survival rate for matched viewers. The full mechanism is in A3.
  • Negative-signal exposure: strong. The voice-fidelity score is the explicit gate. Drafts that drift outside the creator's historical envelope surface a warning before publish, reducing the predicted-mute probability described in A5.
  • Dwell-firing structure: strong for trained creators. The training corpus includes the creator's dwell-winning historical posts; the model learns those patterns and produces output in the same shape. Generic dwell heuristics are not the mechanism; the creator's specific dwell-firing patterns are.
  • Cadence saturation discipline: strong. The product framing is fewer, higher-quality posts, which aligns architecturally with the diversity decay walked in A7.
  • Voice-embedding fidelity: strong. The 100-to-200-piece corpus is the smallest sample size at which the embedding cluster geometry stabilises, walked in A8.
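A voice-distance gate of the kind described in the list above can be sketched as a cosine distance between a draft's embedding and the centroid of the creator's historical embeddings. This is an illustrative assumption, not VoiceMoat's implementation: the embeddings, the threshold of 0.3, and every function name here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; raises ValueError on a zero-length vector."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def voice_gate(draft: Sequence[float],
               history: Sequence[Sequence[float]],
               threshold: float = 0.3) -> Tuple[float, bool]:
    """Return (distance, passes). threshold is a hypothetical cut-off."""
    distance = cosine_distance(draft, centroid(history))
    return distance, distance <= threshold


history = [[1.0, 0.0], [1.0, 0.0]]
print(voice_gate([1.0, 0.0], history))  # on-voice draft: passes
print(voice_gate([0.0, 1.0], history))  # orthogonal draft: flagged
```

A real gate would need embeddings from an actual model and a threshold calibrated per creator; the sketch only shows where the warning would come from.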

Honest take: VoiceMoat is architecturally aligned with the 2026 X ranker by construction; the product was built on the voice-as-moat assumption that the rewrite operationalised. The product is not strong on operational-scale features (does not automate cadence, does not handle cross-platform syndication at depth). For creators whose goal is X-specific voice-fidelity growth, the architectural alignment is the relevant differentiator.

The scorecard

Four tools scored against the six 2026-algorithm criteria

Source: qualitative assessment based on the prior nine articles in this series and the tools' public product documentation

| Criterion | Tweet Hunter | Typefully | Hypefury | VoiceMoat |
| --- | --- | --- | --- | --- |
| Multi-head scoring | weak | neutral | weak | strong |
| OON retrieval similarity | weak | neutral | weak | strong |
| Negative-signal exposure | weak | weak | weak | strong |
| Dwell-firing structure | mixed | neutral | weak | strong (trained) |
| Cadence saturation discipline | weak | mixed | very weak | strong |
| Voice-embedding fidelity | weak | weak | weak | strong |

The table is meant to be useful, not flattering. Several cells on the right are "strong" by architectural design choice, not by feature-list completeness; VoiceMoat is a younger product than Tweet Hunter or Hypefury and does not match those products on operational-scale features. The criteria above are 2026-algorithm criteria specifically. For different criteria (cross-platform reach, scheduling at scale, affiliate management) the rankings would invert.

A simulated trajectory

The chart below sketches the qualitative shape of voice-fidelity trajectory over a creator's first 90 days of use for each tool. Numbers are illustrative; specific values depend on the creator's starting voice consistency and the audience composition.

Simulated voice-fidelity trajectory across 90 days of consistent use

Source: illustrative simulation, exact values vary by creator

[Chart: Tweet Hunter (templates), Typefully (clean composer), Hypefury (volume automation), and VoiceMoat (voice-trained) compared on two measures: voice-distance from creator's historical pattern, and predicted negative-signal contribution. Vertical axis: relative drift / predicted negative contribution, 0 to 100.]

The shape that drops out of the simulation: tools that compose output from templates or volume-automate pull the creator's recent posts away from their historical cluster. Tools that respect the creator's existing voice (Typefully, by being editorially neutral) or actively train on it (VoiceMoat) keep voice-distance lower. Predicted negative-signal contribution tracks voice-distance with a slight lag.

The structural difference

There are two ways to build an AI writing tool that scales to many creators. The first is to learn what works across all creators, codify the patterns, and ship a tool that pattern-fits the codified style to each new creator's topic. Template-driven tools fall in this category. The output is consistent and predictable; the creator-specific voice is the variable the tool does not capture.

The second is to learn what works specifically for one creator, build a per-user model, and ship a tool that produces output in that specific creator's style. Voice-trained tools fall in this category. The output is voice-specific and unpredictable across creators; the common-pattern baseline is the variable the tool does not lean on.

The 2026 X ranker's architecture rewards the second approach for structural reasons walked across the prior nine articles. The candidate-isolation mask makes each post compete on its own merits against the viewer's per-creator history pattern. The history sequence anchors the model's expectation of how that specific creator writes. Voice consistency keeps the prior strong; voice drift weakens it. The mechanism is not a brand-marketing argument. It is in the source.
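The candidate-isolation idea can be sketched as an attention mask. The layout below (history tokens first, candidates after, history attending to all of history) is an assumption made for illustration; the function name is hypothetical and the real mask in the source may differ in detail, for example by being causal over the history.

```python
from typing import List


def isolation_mask(num_history: int, num_candidates: int) -> List[List[bool]]:
    """Build a square mask where mask[q][k] is True if q may attend to k.

    Positions 0..num_history-1 are the viewer's history; the remaining
    positions are candidate posts. Every position may attend to history,
    each candidate may attend to itself, and no position may attend to
    a different candidate.
    """
    size = num_history + num_candidates
    mask = [[False] * size for _ in range(size)]
    for q in range(size):
        for k in range(size):
            if k < num_history:
                mask[q][k] = True   # history is visible to everyone
            elif q == k:
                mask[q][k] = True   # a candidate sees itself
    return mask


for row in isolation_mask(num_history=2, num_candidates=2):
    print(["x" if allowed else "." for allowed in row])
```

Because no candidate row has a True entry in another candidate's column, a post's score cannot depend on which other posts happen to be in the batch. It depends only on the post and the viewer's history, which is the property the paragraph above relies on.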

Tools built on the first approach are increasingly misaligned with this architecture. Tools built on the second approach are aligned with it. The product category is small (voice-trained per-user models require substantially more engineering work than template libraries), and the alignment matters more in 2026 than it did in 2023.

When to use which tool

An honest closing. The four tools have different sweet spots; none of them is universally right or wrong.

Tweet Hunter is the best fit for creators whose primary goal is visible-engagement growth, who are comfortable with template-shaped output, and who are willing to trade voice consistency for cadence volume. It works well for affiliate-marketing accounts, growth-hacker audiences, and creators in the early-engagement-velocity phase of account growth.

Typefully is the best fit for creators who write their own copy deliberately, want a clean composer-plus-scheduler, and prefer editorial control over automation. Thread-heavy writers, essay-style creators, and people who use X as a thinking surface get value from the product's writing-focused UX.

Hypefury is the best fit for creators running multi-platform content strategies at scale, where X is one of several platforms and operational efficiency matters more than X-specific voice fidelity. Content marketers, multi-account operators, and agencies running many accounts simultaneously get value from the scale features.

VoiceMoat is the best fit for creators whose audience is X-specific and reputation-led, whose value compounds on voice consistency, and whose strategy depends on the 2026-algorithm criteria above. Founders, operators, ghostwriters, and individual builders running high-trust audiences get value from the voice-training depth.

Pick the one whose architectural fit matches your strategy. The 2026 ranker rewards different things than the 2023 ranker did, and the right tool depends on what you are actually trying to achieve. If your goal is voice-aligned X growth, the architectural argument walked in this series points to one specific category. If your goal is something else, the other tools have their own merits.

The cornerstone of this series (A1) opened with the claim that voice is now an algorithm-level moat, not a brand-marketing line. The nine articles between then and now walked the mechanism. This article scores the field. The conclusion, for any creator whose strategy is X-specific voice-aligned growth: the tool that matches the ranker's architecture is the one that compounds.

AI disclosure

This article was drafted with AI assistance and human-edited by the VoiceMoat team. All technical claims are sourced to the xai-org/x-algorithm repository; file paths are cited inline.