oon_weight_factor in oon_scorer.rs: how X surfaces unfollowed creators

A creator with 5,000 followers and a viewer with 20,000 follows do not share many feed slots in the obvious way. The viewer sees a tiny fraction of any one creator's posts even when they actively follow them. The interesting question is how a viewer who does not follow the creator ever sees that creator's posts at all. The 2026 X ranker answers that question with three dedicated out-of-network sources, all running a two-tower retrieval model, scored against in-network candidates through a single redacted multiplier. The arithmetic of "how do unfollowed people see my work" runs through that retrieval lane. This article walks the 12 candidate sources in home-mixer/sources/, the three OON sources specifically, the two-tower architecture they share, and the playbook for writing copy that survives the dot product. Companion to the cornerstone A1 and the embedding-mechanics article A8.

The twelve candidate sources

The 2023 home-mixer worked with a small set of candidate sources, most of them tightly coupled to in-network retrieval and a handful of candidate-source-specific rerankers. The 2026 home-mixer has 12 candidate sources wired into the candidate-pipeline framework. Three of them are dedicated out-of-network ML retrieval.

Ten of the twelve candidate sources in home-mixer/sources/. Two additional sources are wired in but not enumerated here.

Source: home-mixer/sources/ inventory

Source file	Status	Type	Stage
thunder_source.rs	documented	in-network organic	core
phoenix_source.rs	documented	out-of-network ML retrieval (general)	core
phoenix_moe_source.rs	documented	out-of-network ML retrieval (mixture of experts)	core
phoenix_topics_source.rs	documented	out-of-network ML retrieval (topic-keyed)	core
ads_source.rs	documented	paid	non-organic
who_to_follow_source.rs	documented	follow recommendation module	non-organic
prompts_source.rs	documented	platform prompts (full-cover, half-cover, inline, relevance)	non-organic
push_to_home_source.rs	documented	notification-arrived hero injection	non-organic
tweet_mixer_source.rs	documented	auxiliary mixing pool	support
cached_posts_source.rs	documented	session-continuation cache	support

Three of those (rows 2 through 4) form the out-of-network ML retrieval lane. Posts that arrive in a viewer's feed without the viewer following the author flow through one of these three sources. The remaining organic source, thunder_source.rs, is the in-network store maintained by the Thunder service (the 48-hour retention window walked in A9); it covers posts from accounts the viewer follows.

The non-organic sources cover ads, follow recommendations, prompts, and notification-driven heroes. The detail on slot-competition between organic and non-organic candidates lives in A4; this article focuses on the organic OON lane.

Two-tower retrieval, sketched

All three OON sources call a shared two-tower retrieval model defined in phoenix/recsys_retrieval_model.py. The two-tower architecture is the standard recommendation-system pattern: one tower encodes the user (the viewer, via their history sequence), another tower encodes the candidate (the post plus author), and the similarity score between the two embeddings determines retrieval order.

The OON retrieval flow. Twelve candidate sources feed the home-mixer pipeline. Three of them call the two-tower retrieval model. The dot product against the viewer's history embedding decides which OON candidates surface.

The retrieval mechanism is the same one Spotify, Netflix, Pinterest, and most modern recommendation systems use. The X-specific properties are the inputs to the two towers: the user tower consumes the history-sequence-plus-action-token format described in A8; the candidate tower consumes the post-tokens-plus-author-hash format. Both towers project into the same learned embedding space.

Retrieval of this kind is typically exact nearest-neighbour at small scale and approximate nearest-neighbour at production scale, but which index X actually runs is an implementation detail that is not in the open repo. What is in the open repo is the input format and the output shape: a ranked list of candidate IDs from the OON pool, scored by similarity to the viewer's history embedding.
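A minimal sketch of the exact, small-scale version of that ranking may help fix the idea. It is not the code in phoenix/recsys_retrieval_model.py: the function names, the three-dimensional embeddings, and the post IDs are all hypothetical, and a production system would presumably swap the full scan for an approximate index.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Similarity between a viewer embedding and a candidate embedding."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    viewer_embedding: Sequence[float],
    candidate_embeddings: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Exact nearest-neighbour retrieval: score every candidate, keep the best k."""
    scored = (
        (post_id, dot(viewer_embedding, embedding))
        for post_id, embedding in candidate_embeddings.items()
    )
    # nlargest returns the k highest-scoring pairs, best first.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical embeddings; real towers emit far wider vectors.
viewer = [0.9, 0.1, 0.0]
oon_pool = {
    "post_a": [0.8, 0.2, 0.0],
    "post_b": [0.1, 0.9, 0.1],
    "post_c": [0.5, 0.5, 0.5],
}
ranked = retrieve_top_k(viewer, oon_pool, k=2)
```

Here post_a and post_c survive the cut and post_b does not, because post_b points in a direction the viewer's history embedding barely covers.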

The new-user retrieval split

Phoenix retrieval has two variant configurations gated on viewer history length. The threshold parameter is referenced in the code as PhoenixRetrievalNewUserHistoryThreshold. Viewers whose history sequence is shorter than the threshold use a new-user retrieval path; viewers above the threshold use the power-user retrieval path. The distinction matters because the user tower cannot embed a coherent representation from a sequence that is too short.

The new-user retrieval path likely falls back to a more content-popularity-weighted retrieval, since the user-tower embedding is unreliable. The power-user retrieval path runs the standard two-tower similarity. The boundary value is not in the public source.
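The gate itself can be pictured as a single length check. The sketch below is an assumption-laden illustration: the threshold value of 50 is a placeholder (the real boundary is not public), the function name is hypothetical, and treating a history exactly at the threshold as power-user is a guess.

```python
from typing import Sequence

# Placeholder only: the real value behind
# PhoenixRetrievalNewUserHistoryThreshold is not in the public source.
HYPOTHETICAL_NEW_USER_HISTORY_THRESHOLD = 50


def choose_retrieval_path(
    history: Sequence[str],
    threshold: int = HYPOTHETICAL_NEW_USER_HISTORY_THRESHOLD,
) -> str:
    """Pick a retrieval configuration from the viewer's history length."""
    if len(history) < threshold:
        # Too little history for a reliable user-tower embedding.
        return "new_user"
    # Assumption: a history exactly at the threshold counts as long enough.
    return "power_user"


recent_joiner = ["like:post_1", "reply:post_2", "like:post_3"]
long_term_viewer = ["like:post_%d" % i for i in range(400)]

paths = (
    choose_retrieval_path(recent_joiner),
    choose_retrieval_path(long_term_viewer),
)
```

The two viewers land on different paths even if they are shown the same candidate pool upstream, which is the split described above.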

For creators, the practical implication: viewers who recently joined the platform see a substantially different OON candidate pool than long-term active viewers do. Posts whose voice-trained embeddings distinguish them on the power-user path may not surface on the new-user path, and vice versa. The two retrieval paths are different audiences with different filtering, and a creator's OON reach is a weighted mixture of both.

MoE: a second OON retrieval in parallel

phoenix_moe_source.rs is the mixture-of-experts variant of OON retrieval. The MoE architecture, in this context, means the retrieval model is split into multiple expert sub-models, with a gating mechanism that routes each query (the viewer's history embedding) to one or more appropriate experts.

The probable structure: different experts specialise in different content distributions (technical content, entertainment content, political content, news, lifestyle, etc.), and the gate selects the expert most relevant to the viewer's history pattern. The output is combined with the general-retrieval output for the final OON candidate pool.

The MoE variant is a second OON retrieval running in parallel to the general one. Both contribute candidates to the post-pool that gets scored by Phoenix. The split between them is not surfaced in the public source; production may weight one heavier than the other.

A note on the per-creator implication. The MoE setup matters because it means the answer to "which expert does my content get routed through?" depends on the viewer's history embedding, not on the post itself. A single post from a creator may surface to one set of viewers via the general retrieval and a different set via the MoE retrieval, because the gating decision is per-viewer. Creators whose audience clusters share a strong specialised expert (technical readers, news consumers, entertainment-first viewers) tend to benefit more from MoE retrieval because the routing concentrates their post in front of the expert that surfaces it most reliably. Creators whose audience is more generalist tend to benefit more from the general retrieval lane.
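To see why routing is a property of the viewer and not of the post, here is a toy gate. Everything in it is hypothetical: the expert names, the two-dimensional embeddings, and the softmax-over-linear-gate design are a common MoE pattern, not something confirmed for phoenix_moe_source.rs.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over gate logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_to_experts(
    viewer_embedding: Sequence[float],
    gate_weights: Dict[str, Sequence[float]],
    top_n: int = 1,
) -> List[str]:
    """Return the top_n experts the gate selects for this viewer."""
    names = list(gate_weights)
    logits = [
        sum(v * w for v, w in zip(viewer_embedding, gate_weights[name]))
        for name in names
    ]
    probs = softmax(logits)
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Hypothetical gate: one weight vector per expert.
gate = {
    "technical": [1.0, 0.0],
    "entertainment": [0.0, 1.0],
    "news": [0.5, 0.5],
}
technical_viewer = [2.0, 0.1]
entertainment_viewer = [0.1, 2.0]

routes = (
    route_to_experts(technical_viewer, gate),
    route_to_experts(entertainment_viewer, gate),
)
```

The post never appears as an argument to the gate. The same post can therefore be retrieved through different experts for different viewers.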

Topic retrieval and the topic-ID layer

The third OON source, phoenix_topics_source.rs, runs retrieval keyed on topic IDs. The topic-ID layer comes from the Grox content classification service (grox/), which assigns each post a topic ID based on its content. Viewers also have topic affinities inferred from their engagement history.

Topic retrieval is the lane that makes "topic-following" work mechanically. A viewer who has consistently engaged with posts in a specific topic class gets OON candidates from that topic surfaced via this lane, regardless of whether they follow the specific accounts posting in that topic.

For creators, topic retrieval is the most controllable OON lane: it is the one most directly responsive to "what is this post actually about?" Posts with clear topical anchors (specific subject matter, named entities, focused arguments) get cleaner Grox topic-ID classifications and surface more reliably in this lane. Posts with vague topical content drift in the topic-classification step and underperform on this lane.
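A rough sketch of topic-keyed lookup, under stated assumptions: each post carries exactly one topic ID, viewer affinities are plain scores between 0 and 1, and a fixed cutoff decides which topics count. The topic IDs, the cutoff, and both function names are invented for illustration; none of them come from grox/ or phoenix_topics_source.rs.

```python
from collections import defaultdict
from typing import Dict, List


def build_topic_index(post_topics: Dict[str, int]) -> Dict[int, List[str]]:
    """Invert post -> topic into topic -> posts."""
    index: Dict[int, List[str]] = defaultdict(list)
    for post_id, topic_id in post_topics.items():
        index[topic_id].append(post_id)
    return dict(index)


def topic_candidates(
    viewer_affinities: Dict[int, float],
    index: Dict[int, List[str]],
    min_affinity: float = 0.5,  # hypothetical cutoff
) -> List[str]:
    """Collect posts from the viewer's strong topics, strongest topic first."""
    strong_topics = sorted(
        (t for t, a in viewer_affinities.items() if a >= min_affinity),
        key=lambda t: viewer_affinities[t],
        reverse=True,
    )
    return [post_id for t in strong_topics for post_id in index.get(t, [])]


# Hypothetical topic assignments and viewer affinities.
post_topics = {"p1": 101, "p2": 202, "p3": 101, "p4": 303}
affinities = {101: 0.9, 202: 0.2, 303: 0.6}

candidates = topic_candidates(affinities, build_topic_index(post_topics))
```

Nothing in the lookup asks who wrote the post or whether the viewer follows them. A post that drifts into the wrong topic ID at classification time simply never appears under the key the right viewers are querying.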

OON_WEIGHT_FACTOR: the one knob that decides how much OON

After Phoenix scores all candidates (in-network plus OON), home-mixer/scorers/oon_scorer.rs applies a single multiplier (OON_WEIGHT_FACTOR) to the out-of-network candidates' final scores. The factor's existence and its role as a multiplier are documented; its numeric magnitude is in the redacted params module and not in the open release.

The factor is a system-level lever: values above 1.0 upweight OON candidates relative to in-network; values below 1.0 downweight them. The value is presumably tuned to hit a target ratio of in-network vs out-of-network posts in a typical viewer's feed. Whatever that target is, it is a global parameter, not a per-creator one.

Two consequences for creators. First, the OON lane is not infinite-ROI for any specific post. The lane competes against in-network candidates in the final blend, with a fixed multiplier that does not adjust to post quality. A great OON-eligible post is still throttled by the multiplier. Second, the multiplier is the same across creators, so the relative performance of two creators in OON depends entirely on the similarity score before the multiplier applies. The retrieval-tower similarity is where the differentiation lives.
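Both consequences show up in a few lines of arithmetic. The sketch below is Python, not the Rust in oon_scorer.rs, and the 0.75 factor, the Candidate type, and the scores are placeholders chosen only to illustrate a downweighting case; the real magnitude is redacted.

```python
from dataclasses import dataclass
from typing import List

# Placeholder: the real OON_WEIGHT_FACTOR value is not in the open release.
HYPOTHETICAL_OON_WEIGHT_FACTOR = 0.75


@dataclass
class Candidate:
    post_id: str
    score: float
    in_network: bool


def apply_oon_weight(candidates: List[Candidate], factor: float) -> List[Candidate]:
    """Scale out-of-network scores by one global factor, then rank the blend."""
    weighted = [
        Candidate(c.post_id, c.score if c.in_network else c.score * factor, c.in_network)
        for c in candidates
    ]
    return sorted(weighted, key=lambda c: c.score, reverse=True)


pool = [
    Candidate("in_1", 0.60, in_network=True),
    Candidate("oon_1", 0.70, in_network=False),
    Candidate("oon_2", 0.50, in_network=False),
]

blended = [c.post_id for c in apply_oon_weight(pool, HYPOTHETICAL_OON_WEIGHT_FACTOR)]
unweighted = [c.post_id for c in apply_oon_weight(pool, 1.0)]
```

With the placeholder factor, oon_1 drops behind in_1 despite its higher raw score, yet oon_1 still outranks oon_2. The multiplier shifts the OON lane as a whole and leaves the order inside the lane untouched.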

The three OON sources side by side

Source: home-mixer/sources/phoenix_*.rs

Source	Status	Specialisation	Gating	Strongest for
phoenix_source.rs	documented	general OON retrieval	viewer history threshold	creators with broad distributional reach
phoenix_moe_source.rs	documented	mixture-of-experts retrieval	expert gate on history embedding	creators with deep specialisation in a content class
phoenix_topics_source.rs	documented	topic-keyed retrieval	Grox topic-ID match	creators with clear topical anchors per post

Embedding-distinctive vs engagement-bait copy

The retrieval similarity score is the variable creators can move through writing. The mechanic is the embedding-space geometry described in A8. The user tower's history embedding sits somewhere in the embedding space, anchored by the viewer's specific engagement pattern. Candidates that sit close to that anchor get high similarity scores.

The trap most creators fall into: writing "broadly engaging" content, which by definition occupies the dense centre of the embedding space. That dense centre is where every other generic-engagement post also sits. Hundreds of thousands of posts compete for top-K positions against any given viewer history embedding from that location. The top-K selection process is the dot product picking the closest few; the generic centre is far from most specific viewer anchors.

Voice-distinctive copy occupies sparser regions of the embedding space. There are fewer competing candidates, and the geometrically relevant subset of viewer histories (the people whose anchors are near that region) gets served the candidate at higher similarity. The trade-off is range for density: voice-distinctive content surfaces to fewer viewers, but for each of those viewers the similarity score is higher and the dwell-and-reply heads fire more reliably.

Retrieval-similarity distribution for two copy archetypes

Source: illustrative simulation

Legend: Engagement-bait copy (dense embedding centre); Voice-distinctive copy (sparser region)

The chart sketches the qualitative split. Engagement-bait copy distributes its similarity scores broadly but rarely cracks the top-K for any specific viewer because the local competition (other generic posts) is intense. Voice-distinctive copy concentrates its similarity scores on the matching audience and survives top-K for a meaningful fraction of them.

The competitor positioning falls out of this analysis directly. Tweet Hunter's template library, by design, produces output in the dense engagement-centre cluster. The output is recognisable as viral-shaped, which fires the favorite head from the audience whose anchors are near the engagement-centre, but the same recognisability puts the output in dense competition for retrieval slots. Voice-trained output, by contrast, sits where the competition is sparser, and gains disproportionately on retrieval. The full four-tool comparison is in A10.

A concrete scenario

To make the geometry concrete, here is a practical scenario. A creator writes about applied machine learning, posts roughly five times a week, and has built up a corpus of approximately 250 posts over the last twelve months. Their audience is a mix of practitioners, researchers, and adjacent technical operators. The creator's embedding cluster sits in a sparse region of the embedding space because the specific combination of "applied ML voice plus operator perspective" sits well away from the broad-internet pretraining centre.

A typical post for them: "Three months running a model in production will teach you more about model evaluation than any paper. Here is the specific thing I learned from the failure modes."

That post, in the OON retrieval space, surfaces with high similarity to viewers whose history sequences contain ML-practitioner content, build-in-public engineering posts, and applied technical writing. The viewer pool with anchors near that region is small (a few hundred thousand viewers across X), but among those viewers the retrieval similarity is high, the top-K survival rate is high, and the downstream Phoenix scoring lands the post with strong predicted positives.

Compare to a generic-engagement version of the same idea: "Production ML is hard. What's the most valuable thing you learned the hard way?"

That post sits in the dense engagement-centre cluster. It surfaces to millions of viewers nominally, but the retrieval similarity is low for any specific viewer because the post is indistinguishable from many other generic-engagement posts competing for those slots. Top-K survival is low, the post often does not make the candidate pool that reaches Phoenix scoring, and even when it does the downstream predictions are weaker.

The aggregate score for the voice-distinctive post is higher than the aggregate score for the generic post, despite the generic post having greater nominal reach. The non-linear gain from retrieval similarity dominates the linear loss from narrower distribution.
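The comparison can be written out as back-of-envelope arithmetic. Every number below is invented for illustration, not measured, and the model itself (nominal reach times top-K survival rate times per-impression score) is a simplifying assumption about how the pieces combine.

```python
def aggregate_score(nominal_reach: int, top_k_survival: float, per_impression_score: float) -> float:
    """Viewers who could see the post, times the share for whom it
    survives retrieval, times the score earned per surviving impression."""
    return nominal_reach * top_k_survival * per_impression_score


# Hypothetical inputs, chosen only to show the shape of the trade-off.
voice_distinctive = aggregate_score(
    nominal_reach=300_000,      # small pool of viewers anchored near the niche
    top_k_survival=0.20,        # high similarity, so the post often makes top-K
    per_impression_score=0.08,  # strong predicted positives downstream
)
generic = aggregate_score(
    nominal_reach=5_000_000,    # broad nominal audience
    top_k_survival=0.005,       # crowded centre, so it rarely makes top-K
    per_impression_score=0.02,  # weaker downstream predictions
)
```

Under these made-up inputs the narrow post comes out ahead even though its nominal reach is more than ten times smaller. Change the assumed survival rates and the ordering can flip, so the sketch illustrates the mechanism without proving the outcome.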

What to do with this

Three operational moves for creators thinking about OON growth.

Write content that wears its specificity on its sleeve. Topic clarity (specific subject matter, named entities) helps the Grox-keyed topic-retrieval lane. Style clarity (consistent voice, identifiable phrasing) helps the general and MoE retrieval lanes.

Accept that distribution width and per-impression score trade off. Engagement-bait copy reaches more viewers and earns lower per-viewer score. Voice-distinctive copy reaches fewer viewers and earns higher per-viewer score. Total weighted score across an audience is usually higher for the voice-distinctive path because the per-impression score gain is non-linear in similarity; reach-without-similarity contributes little.

Do not chase OON_WEIGHT_FACTOR. It is a global multiplier outside any creator's control. The lever is retrieval similarity, which is upstream of the multiplier. Focus there.

The OON retrieval lane is the largest growth surface most creators have on X in 2026 because it is the lane that puts creators in front of viewers who do not follow them. The lane is dense, competitive, and embedding-driven. Voice consistency is the single most controllable variable that affects retrieval similarity. The cornerstone of this series opened with the structural argument; this article closes the loop on how that argument cashes out in the specific lane where unfollowed-viewer growth happens.
