In May 2026, X.AI quietly pushed three commits to a public repository called
xai-org/x-algorithm and walked
away. No blog post, no launch thread, no developer call. Just a clean drop:
214 KB of source, a 3 GB Git LFS checkpoint, issues disabled. The repo now
has 25,000 stars and almost no one writing about it has read it carefully.
We did. The conclusion that came out of that reading is the one this entire series is built around: X's ranker is no longer a feature engineering problem. It is a sequence model. And every piece of "X algorithm advice" that survives from 2020 to 2023 is now wrong in the same specific way.
What actually changed
The 2023 Twitter release (twitter/the-algorithm, twitter/the-algorithm-ml)
described a stack we will recognise from anything labelled "modern" in
machine-learning recommender systems through about 2022. There was a
Scala home-mixer orchestrating a candidate pipeline. There was a MaskNet
heavy-ranker scoring a few thousand candidates against around 6,000
hand-engineered features. There were three embedding systems running in
parallel: SimClusters for community-level signals, TwHIN for graph-aware
representations, RealGraph for follow-affinity scores. There was a light
ranker, a heavy ranker, candidate-source-specific rerankers, and a serving
layer called Navi. Roughly half the team's effort sat in feature
engineering for that pipeline.
That entire stack is gone. The May 2026 repo is a wholesale rewrite, and
the README does not soften it: "We have eliminated every single
hand-engineered feature and most heuristics from the system." In place of the
2023 layer cake, X now runs one Grok-derived transformer named Phoenix
(phoenix/recsys_model.py) for ranking and a two-tower
retrieval model (phoenix/recsys_retrieval_model.py)
for out-of-network candidates. The orchestration around them, the
home-mixer pipeline (home-mixer/), is now in Rust.
Source: xai-org/x-algorithm/README.md + repo inventory
| Layer | 2023 release | May 2026 release |
|---|---|---|
| Orchestration | Scala home-mixer | Rust home-mixer with candidate-pipeline traits |
| Ranker | MaskNet heavy-ranker on ~6,000 hand-engineered features | Phoenix transformer with no hand-engineered features |
| Light ranker | Separate model | Folded into Phoenix |
| Community signals | SimClusters embeddings | Removed; learned implicitly |
| Graph signals | TwHIN embeddings + RealGraph follow-affinity | Removed; one boolean in-network flag plus mutual follow |
| In-network store | Earlybird search index | Thunder in-memory store |
| Out-of-network retrieval | Mixed candidate sources, source-specific rerankers | Two-tower retrieval (PhoenixRetrieval) plus MoE and topic variants |
| Trust and safety | Visibility filtering inside home-mixer | Split: vf_filter post-selection plus Grox content service |
| Engagement signals | 10 weights disclosed in the public code | 19 action heads, weights redacted from the params module |
| Inference serving | Navi | Python pipeline (run_pipeline.py) plus value-model gRPC |
Read that table once more and notice what is conspicuously absent from the right-hand column. No similarity clusters. No graph embeddings. No follow-affinity score. No hand-engineered features. The transformer is expected to learn all of that implicitly from the user's history sequence. That is the shift this article is about. Everything downstream of it, the 19 action heads, the new retrieval architecture, the negative-signal economy, the diversity reranker, falls out of that one design move.
Phoenix in one picture
The diagram below collapses the May 2026 ranker into a single picture. It omits a great deal, but it preserves the four properties that matter for anyone writing on X.
The Phoenix ranker, end to end. Inputs on top, outputs at the bottom.
Four properties to lock in before we go further. First, the viewer's
history is a sequence, not a summary. The transformer sees a chronologically
ordered tape of what the viewer has seen and how the viewer reacted, and
self-attention does the rest. Second, the model does not produce one
"engagement score." It produces 19 separate per-candidate probabilities,
one per supported action. Third, those 19 probabilities feed a weighted
linear combination defined in
home-mixer/scorers/weighted_scorer.rs that turns into
the final score. Fourth, and most important for everything below, the
attention mask isolates candidates from each other. When Phoenix scores
your post, it cannot see the other posts in the same scoring batch. The
score is a function only of you and the viewer.
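The fourth property is easier to see as a mask than as a sentence. What follows is a minimal sketch, not code from the repo: the function name, the layout (history tokens first, candidates after), and the causal ordering inside the history are all assumptions made for illustration. The only part taken from the description above is that candidates can see the viewer's history and themselves, and never each other.

```python
def candidate_isolation_mask(history_len: int, num_candidates: int) -> list[list[bool]]:
    """Build a mask where mask[q][k] is True when position q may attend to position k.

    Assumed layout: history tokens occupy positions [0, history_len),
    candidate tokens occupy the positions after that.
    """
    total = history_len + num_candidates
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < history_len:
                # History keys: visible to every candidate, and to history
                # positions at or after them (causal order is an assumption).
                mask[q][k] = k <= q
            else:
                # Candidate keys: visible only to the same candidate.
                mask[q][k] = q == k
    return mask


if __name__ == "__main__":
    for row in candidate_isolation_mask(history_len=3, num_candidates=2):
        print("".join("x" if allowed else "." for allowed in row))
    # x....
    # xx...
    # xxx..
    # xxxx.
    # xxx.x
```

The last two rows are the two candidates. Each one attends to all three history positions and to itself, and the off-diagonal cells between them stay empty, which is why a candidate's score cannot depend on what else is in the batch.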
The 19 action heads and the new scoring math
The 2023 ranker disclosed ten weight constants in its public code. The
2026 ranker exposes nineteen action heads but redacts every numeric
weight (the params module is excluded from the open release). What we
have are the head names: FAVORITE_WEIGHT, REPLY_WEIGHT, QUOTE_WEIGHT,
QUOTED_CLICK_WEIGHT, RETWEET_WEIGHT, PHOTO_EXPAND_WEIGHT,
CLICK_WEIGHT, PROFILE_CLICK_WEIGHT, VQV_WEIGHT (video quality view),
SHARE_WEIGHT, SHARE_VIA_DM_WEIGHT, SHARE_VIA_COPY_LINK_WEIGHT,
DWELL_WEIGHT, CONT_DWELL_TIME_WEIGHT, FOLLOW_AUTHOR_WEIGHT,
NOT_INTERESTED_WEIGHT, BLOCK_AUTHOR_WEIGHT, MUTE_AUTHOR_WEIGHT,
REPORT_WEIGHT. There is also an implicit not_dwelled_score, referenced in
home-mixer/scorers/vm_ranker.rs but not in the simpler
weighted scorer, which is its own minor mystery.
The scoring formula, reproduced from
home-mixer/scorers/weighted_scorer.rs, looks like
this:
combined = Σ_i (P(action_i) × WEIGHT_i)
final = offset_score(combined)
offset_score is an asymmetric piecewise transform. Net positive scores pass through
roughly linearly. Net negative scores, where predicted mutes and
not-interested signals overwhelm predicted positives, route to a separate branch that
rescales by (combined + NEGATIVE_WEIGHTS_SUM) / WEIGHTS_SUM × NEGATIVE_SCORES_OFFSET. The full negative-signal economy
(A5 covers it in detail) is the single biggest scoring change creators
should internalize from this rewrite. One predicted mute can sink dozens
of predicted likes. Every one of those nineteen heads has its own
breakdown in A2 of this series, where we split them into the six verifiable from
run_pipeline.py index constants and the thirteen we can only infer from
the proto field names.
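To make the two branches concrete, here is a hedged sketch of the scoring math. Every number in it is a placeholder, because the real weights are redacted from the open release. The five-action weight table, the definitions of WEIGHTS_SUM and NEGATIVE_WEIGHTS_SUM as sums of absolute values, and the positive branch written as a plain shift are all assumptions. Only the weighted sum and the shape of the negative branch come from the formula above.

```python
# Placeholder weights for illustration only; the real values are not public.
WEIGHTS = {
    "favorite": 1.0,
    "reply": 3.0,
    "profile_click": 2.0,
    "not_interested": -4.0,
    "mute_author": -20.0,
}

# Assumed definitions: sums of absolute weight values.
WEIGHTS_SUM = sum(abs(w) for w in WEIGHTS.values())
NEGATIVE_WEIGHTS_SUM = sum(-w for w in WEIGHTS.values() if w < 0)
NEGATIVE_SCORES_OFFSET = 0.1  # placeholder constant


def combined_score(probs: dict[str, float]) -> float:
    """Weighted sum of per-action probabilities: sum_i P(action_i) * WEIGHT_i."""
    return sum(probs.get(action, 0.0) * weight for action, weight in WEIGHTS.items())


def offset_score(combined: float) -> float:
    """Route net-negative scores to the rescaling branch described above."""
    if combined < 0:
        return (combined + NEGATIVE_WEIGHTS_SUM) / WEIGHTS_SUM * NEGATIVE_SCORES_OFFSET
    # Assumption: the "roughly linear" positive branch is modelled as a shift,
    # so every net-positive post still outranks every net-negative one.
    return combined + NEGATIVE_SCORES_OFFSET


if __name__ == "__main__":
    clean = {"favorite": 0.30, "reply": 0.02, "profile_click": 0.01}
    risky = {"favorite": 0.30, "mute_author": 0.02}
    print(round(offset_score(combined_score(clean)), 3))
    print(round(offset_score(combined_score(risky)), 3))
```

With these placeholder values, a 2% predicted mute probability more than cancels a 30% predicted favorite probability. The risky post lands on the negative branch and finishes at roughly 0.08, below the clean post's roughly 0.48. The exact figures mean nothing; the asymmetry is the point.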
The implication of nineteen heads, not one, is the next compounding property. A post that fires three of the high-weight heads (reply, profile-click, continuous-dwell) ranks above a post that fires one (favorite) with three times the volume. The graph below sketches that effect schematically. It is not from production telemetry; the numbers are illustrative.
Source: illustrative simulation
The single-signal shape on the left rewards volume on the highest weighted signal. The multi-head shape on the right rewards posts that fire a wider mix of actions. That is the structural property anything optimised for "how do I get more likes" is now misaligned with.
Out of network is a different model entirely
The handful of accounts a viewer follows are not the For You feed anymore.
They are one of twelve candidate sources, alongside three dedicated
out-of-network retrieval sources (home-mixer/sources/phoenix_source.rs,
home-mixer/sources/phoenix_moe_source.rs,
home-mixer/sources/phoenix_topics_source.rs),
ads, who-to-follow modules, prompts, push-to-home heroes, cached posts,
and a tweet-mixer source. Three of the twelve sources are dedicated to
posts from accounts the viewer does not follow.
Those three out-of-network (OON) sources all call a two-tower retrieval model. One tower
embeds the viewer's history sequence. The other embeds candidate posts
plus author signals. The retrieval step takes the dot product and returns the
top K. Then a single multiplier, OON_WEIGHT_FACTOR in
home-mixer/scorers/oon_scorer.rs, rescales those
candidates against the in-network ones before final ranking.
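The retrieval step is small enough to sketch in a few lines. This is an illustration under stated assumptions, not the repo's implementation: the function names, the three-dimensional toy embeddings, and the 0.5 value for OON_WEIGHT_FACTOR are all made up. The parts taken from the description above are the dot product between the two towers' outputs, the top-K cut, and a single multiplier applied to out-of-network candidates.

```python
import heapq

OON_WEIGHT_FACTOR = 0.5  # placeholder; the real value is not disclosed


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(viewer_embedding, candidate_embeddings, k):
    """Return the k (similarity, post_id) pairs with the highest dot product."""
    scored = (
        (dot(viewer_embedding, embedding), post_id)
        for post_id, embedding in candidate_embeddings.items()
    )
    return heapq.nlargest(k, scored)


def apply_oon_weight(score: float, in_network: bool) -> float:
    """Rescale a ranking score only when the candidate is out of network."""
    return score if in_network else score * OON_WEIGHT_FACTOR


if __name__ == "__main__":
    viewer = [0.9, 0.1, 0.0]  # toy output of the viewer-history tower
    candidates = {            # toy outputs of the candidate tower
        "post_a": [0.8, 0.2, 0.0],
        "post_b": [0.1, 0.9, 0.0],
        "post_c": [0.7, 0.0, 0.3],
    }
    for similarity, post_id in retrieve_top_k(viewer, candidates, k=2):
        print(post_id, round(similarity, 2))
```

Nothing in the retrieval call takes an engagement count as input. The only quantity that decides whether post_a beats post_b is how well its embedding aligns with the viewer's.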
The piece worth saying out loud: getting into a stranger's feed in 2026 is a question about your post's embedding, not your engagement metrics. The two-tower retrieval does not see your like count when it picks you. It sees how close your candidate embedding sits to the viewer's history embedding. Generic engagement bait sits in the dense centre of the embedding space, alongside everything else trained on the same internet. Voice-distinctive copy lives further out. The OON retrieval is where that distance pays off, and it is the largest single growth surface most creators have on X right now. A3 of this series walks the two-tower math and the playbook for getting embedded-distinctive copy past the dot product.
Why "algorithm hacks" stopped working
The 2023 ranker was a feature-engineering problem. Every "X algorithm hack" you ever read, the question-as-hook, the 280-char limit gambit, the reply-magnet first line, the dwell-time bait, the screenshot-of-text move, worked because each one mapped onto a specific feature the heavy ranker consumed. Optimize for the feature, win the slot.
The 2026 ranker has no features in that sense. Every hand-engineered feature was eliminated. Phoenix learns its own representation of what a "reply-magnet first line" looks like from the millions of viewer histories in training data. That representation is then conditioned on the specific viewer, because every score is conditioned on their history sequence. The hack that worked in 2023 because it tickled a single feature representation now competes against a per-viewer learned model of "what kind of post does this person actually engage with." If that person's engagement pattern is biased toward thoughtful long-form replies, your screenshot-of-text move scores worse for them, even if it scored well for a generic 2023 audience.
The diversity layer compounds this. After Phoenix ranks the candidates,
home-mixer/scorers/author_diversity_scorer.rs walks
the list and multiplies repeated-author scores by an exponential decay
factor. Your second post in a feed gets multiplied by decay. Your third
by decay². By the seventh post in a session, the multiplier is near the
diversity floor, regardless of how good Phoenix said your individual
score was. home-mixer/scorers/vm_ranker.rs's
Determinantal Point Process reranker layers an additional diversity-aware
selection on top of that for the highest-prominence slots. The volume
playbook ("post 10 times a day, optimize for impressions") is now structurally
penalised. A7 quantifies the decay curve and gives the cadence playbook that
respects it.
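The decay walk can be sketched as follows. The constants are placeholders, and two details are assumptions rather than facts from the repo: that the floor is applied as a simple lower bound on the multiplier, and that the list is re-sorted after the multipliers are applied. The part taken from the description above is the per-author exponent: the first post keeps its score, the second is multiplied by decay, the third by decay squared.

```python
from collections import defaultdict

DECAY = 0.5   # placeholder decay factor
FLOOR = 0.05  # placeholder diversity floor


def apply_author_diversity(ranked):
    """Penalise repeated authors in a list sorted by score, best first.

    ranked: list of (post_id, author_id, score) tuples.
    Returns the same tuples with decayed scores, re-sorted best first.
    """
    seen = defaultdict(int)  # how many posts by each author came earlier
    rescored = []
    for post_id, author_id, score in ranked:
        # 1 for the first post, DECAY for the second, DECAY**2 for the third,
        # and never lower than FLOOR (the lower bound is an assumption).
        multiplier = max(FLOOR, DECAY ** seen[author_id])
        rescored.append((post_id, author_id, score * multiplier))
        seen[author_id] += 1
    return sorted(rescored, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    feed = [
        ("p1", "alice", 1.0),
        ("p2", "alice", 0.9),
        ("p3", "bob", 0.6),
    ]
    for post_id, author_id, score in apply_author_diversity(feed):
        print(post_id, author_id, round(score, 2))
```

In this toy feed, alice's second post drops from 0.9 to 0.45 and falls behind bob's 0.6, even though Phoenix scored it higher on its own. With these placeholder constants the multiplier reaches the floor by the sixth repeat.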
The summary version: hacks worked because the system rewarded surface patterns. The system now rewards a learned model of "what this specific viewer responds to," diversified across the top of feed. Surface patterns get crushed by both layers.
Voice as a history embedding
Here is the property of Phoenix that most algorithm coverage has missed. The viewer's history sequence is a chronologically ordered tape of past posts and the action the viewer took on each. Every creator the viewer has historically engaged with appears in that sequence multiple times: their posts as token embeddings, paired with the actions the viewer took.
The transformer learns, implicitly, what posts from you look like in that viewer's history. Not your username (that is just one hashed token in a 1M-author vocabulary). Your distribution. The pattern of opinions, sentence structures, framing moves, recurring themes. The viewer's attention head can latch onto that pattern across the sequence. When a new candidate from you arrives, the model has a per-viewer prior on "this is one of those posts."
Voice consistency directly increases the strength of that prior. Voice drift weakens it. The collapse is per-viewer (every follower has a different history), but it is consistent in direction: the more your recent posts look unlike your historical posts in that viewer's history, the less the model can leverage the prior. The score gets pulled toward the cold-start baseline for that viewer.
The candidate-isolation mask compounds the effect. Because candidates cannot attend to each other, one viral post in your recent history does not rescue an off-voice post that ships afterward. Each candidate is scored on its own merits against the viewer's history pattern. A streak of three on-voice posts followed by one off-voice post does not average out. The off-voice post is scored, in isolation, against a viewer who remembers your historical pattern.
This is the structural argument for voice as a moat, and it is now an algorithm-level argument. Not a brand-level one, not a "your audience will notice" one, but a "the ranker assigns it lower probability because it does not match the embedding pattern in viewer history" one. A8 of this series walks the embedding math in depth and gives the mechanical definition of voice fidelity in Phoenix-native terms.
What this means for tools built on the old assumption
Three of the major AI writers in this category, Tweet Hunter, Typefully, Hypefury, were built when the ranker was a feature-engineering problem. Their templates, viral hooks, scheduled-thread patterns, and engagement optimizers are calibrated to that world. Tweet Hunter ships viral-tweet templates because in 2023 viral templates fired the retweet weight and won slots. Hypefury automates posting cadence because volume mapped to impressions. Typefully optimizes thread composition for in-network engagement because in-network was the dominant retrieval lane.
None of those bets survives the rewrite cleanly. Viral templates produce generic-engagement embeddings that lose the OON dot product to voice-distinctive copy. Posting cadence runs straight into the diversity decay. In-network optimization ignores three of twelve candidate sources.
VoiceMoat was built on the assumption that the ranker would move in this direction (we wrote the product roadmap against learned representations, not against feature engineering). Auden, the writing-pattern model inside VoiceMoat, trains per user on 100 to 200 content pieces (posts, replies, threads, attached images) across 10 signals: tone, vocabulary, hook style, pacing, formatting, quirks, persona, authority, topic surface, and register. The 10-signal stack is the input to a per-user voice fingerprint that scores every draft against that user's actual distribution. That output is the operational definition of "does this look like one of your posts in your followers' history sequences." The full competitor breakdown, scored against six 2026-algorithm criteria, lives in A10 of this series.
The nine articles after this one
This piece is the cornerstone. Nine more articles unpack the specific mechanics. They publish over the next several weeks.
- A2: How Phoenix ranks every post. The 19 action heads, six verified from run_pipeline.py, thirteen inferred from the PhoenixScores proto, with the weighted-scorer math walked end to end.
- A3: Out of network is the new in network. The two-tower retrieval model, the three OON sources, OON_WEIGHT_FACTOR, and what embedding-distinctive copy looks like next to engagement-bait copy.
- A4: The five ways X steals your feed slots. Ads, who-to-follow, prompts, push-to-home, cached posts. The real organic top-10 is usually top-6.
- A5: The negative-signal economy. Mute, block, not-interested, not-dwelled, plus the muted-keyword and author-socialgraph filters. Why net-negative posts get rescored on a different branch.
- A6: Dwell time is the new like. The four dwell heads and the writing patterns that fire them.
- A7: Why X feeds reject your third post of the day. The author-diversity exponential decay, the DPP reranker, and the cadence playbook that survives both.
- A8: Your voice is an embedding. The deep-technical companion to this article. How Phoenix represents creators, what voice drift looks like in embedding space, and why 20 posts is not enough corpus to anchor a voice.
- A9: 48 hours in Thunder. The recency-boost myth and the recency-window reality. Thunder's 2-day retention, the 3-layer impression bloom, and the OON escape hatch for old posts.
- A10: VoiceMoat vs Tweet Hunter, Typefully, Hypefury. The four tools scored against six 2026-algorithm criteria.
Every article in the series cites the repo file paths each claim rests on, labels inferred claims as inferred, and treats the weights disclosed in the 2023 release as directional reference only. If a claim is "widely believed but not in the repo," we say so. The point of writing all of this is not to be the loudest voice in algorithm coverage. It is to be the most accurate one, because the strategic conclusion ("voice is the moat") is only as strong as the technical foundation underneath it. The technical foundation is in the repo. We have read it. The next nine pieces unpack what is in there.