The May 2026 X algorithm: why voice wins when the ranker becomes a transformer

In May 2026, xAI quietly pushed three commits to a public repository called xai-org/x-algorithm and walked away. No blog post, no launch thread, no developer call. Just a clean drop: 214 KB of source, a 3 GB Git LFS checkpoint, issues disabled. The repo now has 25,000 stars, and almost no one writing about it has read it carefully.

We did. The conclusion that came out of that reading is the one this entire series is built around: X's ranker is no longer a feature engineering problem. It is a sequence model. And every piece of "X algorithm advice" that survives from 2020 to 2023 is now wrong in the same specific way.

What actually changed

The 2023 Twitter release (twitter/the-algorithm, twitter/the-algorithm-ml) described a stack recognisable from anything labelled "modern" in machine-learning recommender systems through about 2022. There was a Scala home-mixer orchestrating a candidate pipeline. There was a MaskNet heavy-ranker scoring a few thousand candidates against around 6,000 hand-engineered features. There were three embedding systems running in parallel: SimClusters for community-level signals, TwHIN for graph-aware representations, RealGraph for follow-affinity scores. There was a light ranker, a heavy ranker, candidate-source-specific rerankers, and a serving layer called Navi. Roughly half the team's effort sat in feature engineering for that pipeline.

That entire stack is gone. The May 2026 repo is a wholesale rewrite, and the README does not soften it: "We have eliminated every single hand engineered feature and most heuristics from the system." In place of the 2023 layer cake, X now runs one Grok-derived transformer named Phoenix (phoenix/recsys_model.py) for ranking and a two-tower retrieval model (phoenix/recsys_retrieval_model.py) for out-of-network candidates. The orchestration around them, the home-mixer pipeline (home-mixer/), is now written in Rust.

The 2023 stack and what replaced it in May 2026

Source: xai-org/x-algorithm/README.md + repo inventory

Layer	2023 release	May 2026 release
Orchestration	Scala home-mixer	Rust home-mixer with candidate-pipeline traits
Ranker	MaskNet heavy-ranker on ~6,000 hand-engineered features	Phoenix transformer with no hand-engineered features
Light ranker	Separate model	Folded into Phoenix
Community signals	SimClusters embeddings	Removed; learned implicitly
Graph signals	TwHIN embeddings + RealGraph follow-affinity	Removed; one boolean in-network flag plus mutual follow
In-network store	Earlybird search index	Thunder in-memory store
Out-of-network retrieval	Mixed candidate sources, source-specific rerankers	Two-tower retrieval (PhoenixRetrieval) plus MoE and topic variants
Trust and safety	Visibility filtering inside home-mixer	Split: vf_filter post-selection plus Grox content service
Engagement signals	10 weights disclosed in the public code	19 action heads, weights redacted from the params module
Inference serving	Navi	Python pipeline (run_pipeline.py) plus value-model gRPC

Read that table once more and notice what is conspicuously absent from the right-hand column. No similarity clusters. No graph embeddings. No follow-affinity score. No hand-engineered features. The transformer is expected to learn all of that implicitly from the user's history sequence. That is the shift this article is about. Everything downstream of it, the 19 action heads, the new retrieval architecture, the negative-signal economy, the diversity reranker, falls out of that one design move.

Phoenix in one picture

The diagram below collapses the May 2026 ranker into a single picture. It omits a great deal, but it preserves the four properties that matter for anyone writing on X.

The Phoenix ranker, end to end. Inputs on top, outputs at the bottom.

Four properties to lock in before we go further. First, the viewer's history is a sequence, not a summary. The transformer sees a chronologically ordered tape of what the viewer has seen and how the viewer reacted, and self-attention does the rest. Second, the model does not produce one "engagement score." It produces 19 separate per-candidate probabilities, one per supported action. Third, those 19 probabilities feed a weighted linear combination, defined in home-mixer/scorers/weighted_scorer.rs, that becomes the final score. Fourth, and most important for everything below, the attention mask isolates candidates from each other. When Phoenix scores your post, it cannot see the other posts in the same scoring batch. The score is a function only of your post and the viewer's history.
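The fourth property is easier to see as a mask than as a sentence. The block below is a minimal sketch, not code from the repo. It assumes a layout where history tokens come first and candidate tokens follow, and it assumes history tokens attend causally to each other. The function name and both assumptions are ours, chosen only to illustrate candidate isolation.

```python
def build_attention_mask(num_history: int, num_candidates: int) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    Assumed layout (hypothetical): history tokens first, candidate tokens after.
    """
    total = num_history + num_candidates
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_history:
                # History keys: history attends causally (our assumption),
                # and every candidate sees the whole history.
                mask[q][k] = k <= q
            else:
                # Candidate keys: visible only to that same candidate.
                mask[q][k] = q == k
    return mask


mask = build_attention_mask(num_history=3, num_candidates=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x....
# xx...
# xxx..
# xxxx.
# xxx.x
```

The last two rows are the two candidates. Each one sees all three history positions and itself, and neither sees the other, which is why a candidate's score cannot depend on what else is in the batch.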

The 19 action heads and the new scoring math

The 2023 ranker disclosed ten weight constants in its public code. The 2026 ranker exposes nineteen action heads but redacts every numeric weight (the params module is excluded from the open release). What we have are the names of the weight constants, one per head: FAVORITE_WEIGHT, REPLY_WEIGHT, QUOTE_WEIGHT, QUOTED_CLICK_WEIGHT, RETWEET_WEIGHT, PHOTO_EXPAND_WEIGHT, CLICK_WEIGHT, PROFILE_CLICK_WEIGHT, VQV_WEIGHT (video quality view), SHARE_WEIGHT, SHARE_VIA_DM_WEIGHT, SHARE_VIA_COPY_LINK_WEIGHT, DWELL_WEIGHT, CONT_DWELL_TIME_WEIGHT, FOLLOW_AUTHOR_WEIGHT, NOT_INTERESTED_WEIGHT, BLOCK_AUTHOR_WEIGHT, MUTE_AUTHOR_WEIGHT, REPORT_WEIGHT. There is also an implicit not_dwelled_score, referenced in home-mixer/scorers/vm_ranker.rs but not in the simpler weighted scorer, which is its own minor mystery.

The scoring formula, reproduced from home-mixer/scorers/weighted_scorer.rs, looks like this:

combined = Σ_i (P(action_i) × WEIGHT_i)
final = offset_score(combined)

offset_score treats positive and negative scores asymmetrically. Net-positive scores pass through roughly linearly. Net-negative scores, where predicted mutes and not-interested actions outweigh predicted positives, route to a separate branch that rescales them as (combined + NEGATIVE_WEIGHTS_SUM) / WEIGHTS_SUM × NEGATIVE_SCORES_OFFSET. The full negative-signal economy (A5 covers it in detail) is the single biggest scoring change creators should internalize from this rewrite. The weights are redacted, but if the negative weights are large relative to the positive ones, as the dedicated branch suggests, one predicted mute can sink dozens of predicted likes. Every one of those nineteen heads has its own breakdown in A2 of this series, where we split them into the six verifiable from the run_pipeline.py index constants and the thirteen we can only infer from the proto field names.
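The two-branch scoring described above can be sketched in a few lines. This is an illustration under stated assumptions, not the repo's code. Every numeric weight is invented, because the real ones are redacted. We also assume WEIGHTS_SUM is the sum of absolute weights, NEGATIVE_WEIGHTS_SUM is the sum of absolute negative weights, and the positive branch adds NEGATIVE_SCORES_OFFSET so that net-negative posts always land below net-positive ones. Only the negative-branch expression comes from the text above.

```python
# Hypothetical weights: the real values are redacted from the open release.
WEIGHTS = {
    "favorite": 1.0,
    "reply": 10.0,
    "profile_click": 5.0,
    "not_interested": -20.0,
    "mute_author": -40.0,
}
# Assumed definitions of the three constants (not confirmed by the repo).
WEIGHTS_SUM = sum(abs(w) for w in WEIGHTS.values())
NEGATIVE_WEIGHTS_SUM = sum(-w for w in WEIGHTS.values() if w < 0)
NEGATIVE_SCORES_OFFSET = 1.0


def combined_score(probs: dict[str, float]) -> float:
    """Weighted linear combination of per-action probabilities."""
    return sum(probs.get(action, 0.0) * weight for action, weight in WEIGHTS.items())


def offset_score(combined: float) -> float:
    """Map the combined score onto the final score with separate branches."""
    if combined < 0.0:
        # Negative branch, as described in the article text.
        return (combined + NEGATIVE_WEIGHTS_SUM) / WEIGHTS_SUM * NEGATIVE_SCORES_OFFSET
    # Positive branch: our assumption, a linear pass-through shifted by the offset.
    return combined + NEGATIVE_SCORES_OFFSET


liked = {"favorite": 0.30, "reply": 0.02}
liked_but_muted = {"favorite": 0.30, "reply": 0.02, "mute_author": 0.02}

print(offset_score(combined_score(liked)))
print(offset_score(combined_score(liked_but_muted)))
```

With these invented numbers, a 2% predicted mute probability is enough to flip the second post onto the negative branch, where its final score falls below the offset that every net-positive post starts from.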

The implication of nineteen heads, not one, is the next compounding property. If reply, profile-click, and continuous-dwell carry higher weights than favorite (the weights are redacted, so this is an inference), a post that fires all three can rank above a post that fires only favorite at three times the volume. The graph below sketches that effect schematically. It is not from production telemetry; the numbers are illustrative.

Figure: Same number of impressions, very different scoring shape under the two rankers. Left panel: heuristic ranker (2023 shape). Right panel: Phoenix multi-head (2026). Source: illustrative simulation.

The single-signal shape on the left rewards volume on the highest weighted signal. The multi-head shape on the right rewards posts that fire a wider mix of actions. That is the structural property anything optimised for "how do I get more likes" is now misaligned with.
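The same contrast can be worked through as arithmetic. The sketch below is illustrative only. The weights and probabilities are invented, the single-signal scorer is a deliberate simplification of the 2023 shape, and the action names are shorthand for the redacted weight constants.

```python
# All numbers are hypothetical; the real weights are redacted.
SINGLE_SIGNAL_WEIGHTS = {"favorite": 1.0}
MULTI_HEAD_WEIGHTS = {
    "favorite": 1.0,
    "reply": 10.0,
    "profile_click": 5.0,
    "cont_dwell_time": 2.0,
}


def weighted_score(probs: dict[str, float], weights: dict[str, float]) -> float:
    """Sum of predicted action probabilities times their weights."""
    return sum(probs.get(action, 0.0) * weight for action, weight in weights.items())


# Post A: three times the favorite probability, nothing else.
volume_post = {"favorite": 0.30}
# Post B: a third of the favorites, but a mix of other actions.
mixed_post = {
    "favorite": 0.10,
    "reply": 0.03,
    "profile_click": 0.04,
    "cont_dwell_time": 0.10,
}

for name, weights in [("single-signal", SINGLE_SIGNAL_WEIGHTS), ("multi-head", MULTI_HEAD_WEIGHTS)]:
    a = weighted_score(volume_post, weights)
    b = weighted_score(mixed_post, weights)
    print(f"{name}: volume_post={a:.2f} mixed_post={b:.2f}")
```

Under the single-signal scorer the volume post wins. Under the multi-head scorer the ordering flips, because small probabilities on several heavier heads add up to more than a large probability on one light head. How far this holds in production depends entirely on the redacted weights.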

Out of network is a different model entirely

The handful of accounts a viewer follows are not the For You feed anymore. They are one of twelve candidate sources. Three of the twelve are dedicated to out-of-network retrieval, meaning posts from accounts the viewer does not follow (home-mixer/sources/phoenix_source.rs, home-mixer/sources/phoenix_moe_source.rs, home-mixer/sources/phoenix_topics_source.rs). The rest include ads, who-to-follow modules, prompts, push-to-home heroes, cached posts, and a tweet-mixer source.

Those three OON sources all call a two-tower retrieval model. One tower embeds the viewer's history sequence. The other embeds candidate posts plus author signals. The retrieval step takes the dot product of the two embeddings and returns the top K candidates. Then a single multiplier, OON_WEIGHT_FACTOR in home-mixer/scorers/oon_scorer.rs, rescales those candidates against the in-network ones before final ranking.
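The retrieval step is simple enough to sketch directly. The block below is a toy version under stated assumptions: the embeddings are three-dimensional vectors we made up, the value of OON_WEIGHT_FACTOR is invented (the repo names the constant, not a value we can quote), and the helper names retrieve_top_k and rescale_oon are ours.

```python
import heapq

OON_WEIGHT_FACTOR = 0.75  # hypothetical value


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    viewer_embedding: list[float],
    candidate_embeddings: dict[str, list[float]],
    k: int,
) -> list[tuple[str, float]]:
    """Score every candidate by dot product with the viewer tower, keep the best k."""
    scored = (
        (post_id, dot(viewer_embedding, embedding))
        for post_id, embedding in candidate_embeddings.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def rescale_oon(ranker_score: float, in_network: bool) -> float:
    """Apply the out-of-network multiplier before final ranking."""
    return ranker_score if in_network else ranker_score * OON_WEIGHT_FACTOR


viewer = [0.9, 0.1, 0.0]
candidates = {
    "generic_bait": [0.3, 0.3, 0.3],
    "distinctive": [0.9, 0.2, 0.0],
    "off_topic": [0.0, 0.1, 0.9],
}
for post_id, similarity in retrieve_top_k(viewer, candidates, k=2):
    print(post_id, round(similarity, 2))
```

Nothing in retrieve_top_k looks at like counts or follower counts. The only inputs are the two embeddings, which is the point the next paragraph builds on.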

The piece worth saying out loud: getting into a stranger's feed in 2026 is a question about your post's embedding, not your engagement metrics. The two-tower retrieval does not see your like count when it picks you. It sees how close your candidate embedding sits to the viewer's history embedding. Generic engagement bait sits in the dense centre of the embedding space, alongside everything else trained on the same internet. Voice-distinctive copy lives further out. The OON retrieval is where that distance pays off, and it is the largest single growth surface most creators have on X right now. A3 of this series walks the two-tower math and the playbook for getting embedding-distinctive copy past the dot product.

Why "algorithm hacks" stopped working

The 2023 ranker was a feature-engineering problem. Every "X algorithm hack" you ever read, the question-as-hook, the 280-char limit gambit, the reply-magnet first line, the dwell-time bait, the screenshot-of-text move, worked because each one mapped onto a specific feature the heavy ranker consumed. Optimize for the feature, win the slot.

The 2026 ranker has no features in that sense. Every hand-engineered feature was eliminated. Phoenix learns its own representation of what a "reply-magnet first line" looks like from the millions of viewer histories in training data. That representation is then conditioned on the specific viewer, because every score is conditioned on their history sequence. The hack that worked in 2023 because it tickled a single feature now competes against a per-viewer learned model of "what kind of post does this person actually engage with." If that person's engagement pattern is biased toward thoughtful long-form replies, your screenshot-of-text move scores worse for them, even if it scored well for a generic 2023 audience.

The diversity layer compounds this. After Phoenix ranks the candidates, home-mixer/scorers/author_diversity_scorer.rs walks the list and multiplies repeated-author scores by an exponential decay factor. Your second post in a feed gets multiplied by decay. Your third by decay². By the seventh post in a session, the multiplier is near the diversity floor, regardless of how good Phoenix said your individual score was. On top of that, the Determinantal Point Process reranker in home-mixer/scorers/vm_ranker.rs layers an additional diversity-aware selection for the highest-prominence slots. The volume playbook ("post 10 times a day, optimize for impressions") is now structurally penalised. A7 quantifies the decay curve and gives the cadence playbook that respects it.
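The decay walk can be sketched as follows. This is a minimal illustration, not the repo's implementation. The DECAY and FLOOR values are invented, and how the floor combines with the decay (here a simple max) is our assumption. The DPP reranker is not modelled at all.

```python
from collections import defaultdict

DECAY = 0.5   # hypothetical decay factor
FLOOR = 0.05  # hypothetical diversity floor


def diversity_multiplier(prior_posts_by_author: int) -> float:
    """Multiplier for a post when the same author already has N posts above it."""
    # Assumption: the floor is applied as a lower bound on decay ** N.
    return max(FLOOR, DECAY ** prior_posts_by_author)


def apply_author_diversity(
    ranked: list[tuple[str, str, float]],
) -> list[tuple[str, str, float]]:
    """ranked holds (post_id, author_id, score), best first. Returns the re-sorted list."""
    seen: dict[str, int] = defaultdict(int)
    adjusted = []
    for post_id, author_id, score in ranked:
        adjusted.append((post_id, author_id, score * diversity_multiplier(seen[author_id])))
        seen[author_id] += 1
    return sorted(adjusted, key=lambda item: item[2], reverse=True)


feed = [
    ("a1", "alice", 1.0),
    ("a2", "alice", 0.9),
    ("a3", "alice", 0.8),
    ("b1", "bob", 0.6),
]
for post_id, author_id, score in apply_author_diversity(feed):
    print(post_id, author_id, round(score, 3))
```

In this toy feed, bob's single post starts last and finishes second, ahead of alice's second and third posts, even though Phoenix scored both of those higher before the decay was applied.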

The summary version: hacks worked because the system rewarded surface patterns. The system now rewards a learned model of "what this specific viewer responds to," diversified across the top of feed. Surface patterns get crushed by both layers.

Voice as a history embedding

Here is the property of Phoenix that most algorithm coverage has missed. The viewer's history sequence is a chronologically ordered tape of past posts and the action the viewer took on each. Every creator the viewer has historically engaged with appears in that sequence multiple times: their posts as token embeddings, paired with the actions the viewer took.

The transformer learns, implicitly, what posts from you look like in that viewer's history. Not your username (that is just one hashed token in a 1M-author vocabulary). Your distribution. The pattern of opinions, sentence structures, framing moves, recurring themes. The model's attention heads can latch onto that pattern across the sequence. When a new candidate from you arrives, the model has a per-viewer prior on "this is one of those posts."

Voice consistency directly increases the strength of that prior. Voice drift weakens it. The weakening is per-viewer (every follower has a different history), but it is consistent in direction: the more your recent posts look unlike your historical posts in that viewer's history, the less the model can leverage the prior. The score gets pulled toward the cold-start baseline for that viewer.
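One way to build intuition for that prior is a hand-made proxy: compare a new post's embedding with the centroid of the author's posts already in a viewer's history. To be clear about what this is, Phoenix does not compute this quantity anywhere we can point to. It learns the pattern implicitly through attention. The function names and the two-dimensional vectors below are ours, invented for illustration.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def voice_match(author_posts_in_history: list[list[float]], candidate: list[float]) -> float:
    """Toy proxy: similarity between a new post and the author's past pattern."""
    return cosine(centroid(author_posts_in_history), candidate)


# Hypothetical embeddings of one author's posts in one viewer's history.
history = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]]
on_voice = [0.85, 0.15]
off_voice = [0.1, 0.9]

print(round(voice_match(history, on_voice), 3))
print(round(voice_match(history, off_voice), 3))
```

The on-voice candidate sits almost on top of the historical centroid and the off-voice one does not. In the real model the comparison is learned rather than computed like this, and it differs for every viewer because every viewer's history contains a different sample of your posts.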

The candidate-isolation mask compounds the effect. Because candidates cannot attend to each other, one viral post in your recent history does not rescue an off-voice post that ships afterward. Each candidate is scored on its own merits against the viewer's history pattern. A streak of three on-voice posts followed by one off-voice post does not average out. The off-voice post is scored, in isolation, against a viewer who remembers your historical pattern.

This is the structural argument for voice as a moat, and it is now an algorithm-level argument. Not a brand-level one, not a "your audience will notice" one, but a "the ranker assigns it lower probability because it does not match the embedding pattern in viewer history" one. A8 of this series walks the embedding math in depth and gives the mechanical definition of voice fidelity in Phoenix-native terms.

What this means for tools built on the old assumption

Three of the major AI writers in this category, Tweet Hunter, Typefully, Hypefury, were built when the ranker was a feature-engineering problem. Their templates, viral hooks, scheduled-thread patterns, and engagement optimizers are calibrated to that world. Tweet Hunter ships viral-tweet templates because in 2023 viral templates fired the retweet weight and won slots. Hypefury automates posting cadence because volume mapped to impressions. Typefully optimizes thread composition for in-network engagement because in-network was the dominant retrieval lane.

None of those bets survives the rewrite cleanly. Viral templates produce generic-engagement embeddings that lose the OON dot product to voice-distinctive copy. Posting cadence runs straight into the diversity decay. In-network optimization ignores three of twelve candidate sources.

VoiceMoat was built on the assumption that the ranker would move in this direction (we wrote the product roadmap against learned representations, not against feature engineering). Auden, the writing-pattern model inside VoiceMoat, trains per user on 100 to 200 content pieces (posts, replies, threads, attached images) across 10 signals: tone, vocabulary, hook style, pacing, formatting, quirks, persona, authority, topic surface, and register. The 10-signal stack is the input to a per-user voice fingerprint that scores every draft against that user's actual distribution. That output is the operational definition of "does this look like one of your posts in your followers' history sequences." The full competitor breakdown, scored against six 2026-algorithm criteria, lives in A10 of this series.

The nine articles after this one

This piece is the cornerstone. Nine more articles unpack the specific mechanics. They publish over the next several weeks.

A2: How Phoenix ranks every post. The 19 action heads, six verified from run_pipeline.py, thirteen inferred from the PhoenixScores proto, with the weighted-scorer math walked end to end.
A3: Out of network is the new in network. The two-tower retrieval model, the three OON sources, OON_WEIGHT_FACTOR, and what embedding-distinctive copy looks like next to engagement-bait copy.
A4: The five ways X steals your feed slots. Ads, who-to-follow, prompts, push-to-home, cached posts. The real organic top-10 is usually top-6.
A5: The negative-signal economy. Mute, block, not-interested, not-dwelled, plus the muted-keyword and author-socialgraph filters. Why net-negative posts get rescored on a different branch.
A6: Dwell time is the new like. The four dwell heads and the writing patterns that fire them.
A7: Why X feeds reject your third post of the day. The author-diversity exponential decay, the DPP reranker, and the cadence playbook that survives both.
A8: Your voice is an embedding. The deep-technical companion to this article. How Phoenix represents creators, what voice drift looks like in embedding space, and why 20 posts is not enough corpus to anchor a voice.
A9: 48 hours in Thunder. The recency-boost myth and the recency-window reality. Thunder's 2-day retention, the 3-layer impression bloom, and the OON escape hatch for old posts.
A10: VoiceMoat vs Tweet Hunter, Typefully, Hypefury. The four tools scored against six 2026-algorithm criteria.

Every article in the series cites the repo file paths each claim rests on, labels inferred claims as inferred, and treats the weights disclosed in the 2023 release as directional reference only. If a claim is "widely believed but not in the repo," we say so. The point of writing all of this is not to be the loudest voice in algorithm coverage. It is to be the most accurate one, because the strategic conclusion ("voice is the moat") is only as strong as the technical foundation underneath it. The technical foundation is in the repo. We have read it. The next nine pieces unpack what is in there.

Resources