negative_scores_offset: how one mute outweighs 50 likes on X (2026)

Algorithm advice on X is almost entirely additive. "Do this, ship that, post at this time, fire this signal." Read the actual scoring code and the picture inverts. The single most score-moving event the 2026 X ranker handles is not a like, a retweet, or a reply. It is a mute. And the math that processes it routes through a different branch of the scoring function entirely. This article walks the negative-signal economy: the four explicit negative weights, the implicit not-dwelled score, the asymmetric offset_score branch, the four hard-kill filters that sit outside the scorer (three before scoring, one after selection), and the off-voice-drift pipeline that turns voice inconsistency into predicted mutes. It is a companion to A1 and A2, which establish the architecture and the head inventory this piece builds on.

The offset branch: a different scoring formula for net-negative posts

Recall the weighted scoring formula from home-mixer/scorers/weighted_scorer.rs:

combined = Σ_i (P(action_i) × WEIGHT_i)
final = offset_score(combined)

The first line is the standard linear combination. The second is the load-bearing bit creators rarely hear about. offset_score is not the identity function. When the combined score is positive, the value passes through largely unchanged. When the combined score is negative, the formula rescales it:

final = (combined + NEGATIVE_WEIGHTS_SUM) / WEIGHTS_SUM × NEGATIVE_SCORES_OFFSET

The intuition: a post predicted to net-fire more negatives than positives is not just placed at a lower position. It is routed to a different scoring regime entirely. The numerator adds the sum of negative weights (a large negative number) to the already-negative combined score, then the denominator divides by the sum of all weights to normalise, then a final offset multiplier scales the result into a separate range. Production values for these constants are redacted (the params module is excluded from the public release), but the structural property is clear: the same combined score, sitting in net-positive vs net-negative territory, is treated as living in two different scoring universes.

The single practical implication: a post that nets a handful of likes but triggers a non-trivial predicted-mute probability does not just score lower than an unambiguously good post. It scores in a different regime, behind a non-linear transformation. The asymmetry is the mechanical reason "reach loss" feels sudden rather than gradual to creators whose posts cross from net-positive into net-negative.
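To make the two regimes concrete, here is a minimal Python sketch of the formula as described above. It is not the production code: the constant values are notional placeholders (the real ones are redacted), the sign convention for NEGATIVE_WEIGHTS_SUM follows the description in this section, and the positive branch is simplified to a pass-through. The names OffsetParams and combined_score are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetParams:
    # Notional placeholders only: production values are redacted.
    negative_weights_sum: float = -300.0
    weights_sum: float = 400.0
    negative_scores_offset: float = 1.0


def combined_score(probs: dict, weights: dict) -> float:
    """Linear combination: sum over heads of P(action) * WEIGHT."""
    return sum(p * weights[head] for head, p in probs.items())


def offset_score(combined: float, params: OffsetParams = OffsetParams()) -> float:
    """Two-regime transform: positives pass through, negatives are rescaled."""
    if combined >= 0.0:
        # Simplifying assumption: "largely unchanged" is modelled as identity.
        return combined
    # Negative branch, term for term as written in the section above.
    return (
        (combined + params.negative_weights_sum)
        / params.weights_sum
        * params.negative_scores_offset
    )


# Two candidates sitting just either side of zero land far apart.
just_positive = offset_score(0.01)
just_negative = offset_score(-0.01)
print(just_positive, just_negative)
```

With these placeholder constants, a combined score just below zero comes out far from one just above zero, which is the discontinuity the text attributes to the offset branch. Different constants change the size of the gap, not the existence of two regimes.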

The four explicit negative heads, side by side

The four explicit negative heads in weighted_scorer.rs, with 2023 leaked weights as directional reference

Source: home-mixer/scorers/weighted_scorer.rs + 2023 twitter/the-algorithm-ml SrcEngagementWeights

Head	UI gesture	2023 leaked weight	2026 status	Effect
Not interested (documented)	long-press, "Not interested in this post"	minus 74 (2023 ref)	redacted	soft de-weight via offset branch
Mute author (documented)	post menu, mute author	roughly minus 100 (2023 ref)	redacted	soft de-weight plus deterministic filter on next requests
Block author (documented)	post menu, block author	roughly minus 100 (2023 ref)	redacted	soft de-weight plus deterministic filter, bidirectional
Report (documented)	post menu, report for policy violation	minus 369 (2023 ref)	redacted	soft de-weight plus policy escalation

A note on the leaked 2023 magnitudes. The favorite weight in 2023 was 0.5. The mute weight was roughly minus 100. The ratio is 200 to 1. Translated into expected score: a Phoenix prediction of even 0.5 percent mute probability, applied to a weight of minus 100, contributes minus 0.5 on a single head. That contribution alone cancels the contribution of one full favorite at a hundred percent probability. A predicted mute probability of 5 percent, contributing minus 5, cancels the contribution of ten guaranteed favorites. None of these numbers carry over to 2026 unchanged, but the order-of-magnitude property (negatives outweigh positives per action by a factor in the dozens to hundreds) is structural to the formula's design and survives any specific weight adjustment.
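The arithmetic in the note above is easy to check. The short Python sketch below uses only the two 2023 magnitudes quoted in the text (favorite 0.5, mute roughly minus 100); the helper names are hypothetical and the weights are directional references, not 2026 values.

```python
FAVORITE_WEIGHT = 0.5   # 2023 value quoted in the text
MUTE_WEIGHT = -100.0    # "roughly minus 100" per the text


def contribution(probability: float, weight: float) -> float:
    """Expected score contribution of one head: P(action) * WEIGHT."""
    return probability * weight


def favorites_cancelled(mute_probability: float) -> float:
    """How many certain (p = 1.0) favorites the mute penalty offsets."""
    penalty = abs(contribution(mute_probability, MUTE_WEIGHT))
    return penalty / contribution(1.0, FAVORITE_WEIGHT)


for p_mute in (0.005, 0.05):
    print(p_mute, contribution(p_mute, MUTE_WEIGHT), favorites_cancelled(p_mute))
```

A 0.5 percent mute probability offsets one certain favorite, and 5 percent offsets ten, matching the figures in the paragraph.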

Not-dwelled: the negative signal nobody opted into

There is a fifth negative signal that does not appear as a WEIGHT constant in home-mixer/scorers/weighted_scorer.rs. It lives in the PhoenixScores proto referenced by home-mixer/scorers/vm_ranker.rs, a parallel scoring path that uses a remote value model. The field name is not_dwelled_score, and its presence in the proto but absence in the simpler weighted scorer suggests one of two things. Either home-mixer/scorers/vm_ranker.rs is an experimental path running alongside the canonical scorer with at least one extra signal, or it is in fact the production path, in which case the simpler weighted scorer in the public release omits a signal that production uses.

not_dwelled_score works like the discrete dwell head, inverted. A viewer scrolling past your post without pausing fires the binary "not dwelled" condition. Unlike the four explicit negatives, this one requires no menu interaction, no decision, and no friction from the viewer. Every impression that does not earn a dwell counts. For a creator, the practical implication is sobering: the negative-signal volume is dominated not by mutes and reports, which are rare events, but by the continuous stream of viewers scrolling past without engaging. For most off-target candidates, the predicted not-dwelled probability is the term that moves the score.

This is the silent kingmaker of the negative side of the ledger. A2 walked the positive dwell heads; the absence of dwell is the corresponding negative signal, and it operates at scale because every non-dwelled impression contributes.
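A small sketch shows why the implicit signal dominates by volume. Everything here is an assumption for illustration: the Impression type, the 2000 ms threshold (the real dwell cutoff is not public), and the sample data.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the production dwell threshold is not public.
DWELL_THRESHOLD_MS = 2000


@dataclass
class Impression:
    dwell_ms: int
    muted: bool = False


def not_dwelled(imp: Impression) -> bool:
    """Binary label: the viewer scrolled past without pausing long enough."""
    return imp.dwell_ms < DWELL_THRESHOLD_MS


def negative_event_counts(impressions: list) -> dict:
    """Count implicit (not-dwelled) and explicit (mute) negatives."""
    return {
        "not_dwelled": sum(not_dwelled(i) for i in impressions),
        "muted": sum(i.muted for i in impressions),
    }


sample = [
    Impression(300),
    Impression(2500),
    Impression(150, muted=True),
    Impression(900),
]
print(negative_event_counts(sample))
```

Even in this toy sample the not-dwelled count outnumbers the explicit mute, because every impression can fire the implicit label while a mute takes a deliberate action.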

The hard-kill filter layer

Around the scorer, four filter files in home-mixer/filters/ remove candidates deterministically. Three run before scoring, so the candidates they drop never reach Phoenix; the fourth, vf_filter.rs, runs after selection. These are not probabilistic predictions. They are boolean conditions: either the filter triggers and the candidate is dropped, or it does not.

The four post-selection and pre-scoring hard-kill filters

Source: home-mixer/filters/ + home-mixer/scorers/vm_ranker.rs context

Filter	What it removes	Stage	Severity
muted_keyword_filter.rs (documented)	candidates whose tokens match the viewer's muted-keyword list	pre-scoring	hard kill
author_socialgraph_filter.rs (documented)	candidates from authors the viewer has blocked, muted, or otherwise excluded via social-graph relationships	pre-scoring	hard kill
vf_filter.rs (documented)	candidates flagged by visibility filtering: deleted posts, spam, violence, gore, policy violations	post-selection	hard kill
topic_ids_filter.rs (documented)	candidates whose Grox-classified topic IDs match the viewer's topic exclusions	pre-scoring	hard kill

The split is meaningful. The probabilistic layer (the four negative heads in the weighted scorer) handles "the viewer will probably want to mute this if served." The deterministic layer (the four filters above) handles "the viewer has already muted this author, or this content class." Both layers feed the same "less of this" intent, but they operate at different stages and on different signals.
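The ordering of the two layers can be sketched in a few lines of Python. This is a simplified model of the stages in the table, not a port of the Rust filters: the Candidate and Viewer types, the whitespace tokenisation, and the vf_flagged field are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    post_id: str
    author_id: str
    text: str
    topic_ids: set = field(default_factory=set)
    vf_flagged: bool = False  # stand-in for a visibility-filtering verdict
    score: float = 0.0


@dataclass
class Viewer:
    muted_keywords: set = field(default_factory=set)    # assumed lowercase
    excluded_authors: set = field(default_factory=set)  # blocked or muted
    excluded_topics: set = field(default_factory=set)


def passes_pre_scoring(c: Candidate, v: Viewer) -> bool:
    """Boolean hard kills that run before the scorer sees the candidate."""
    tokens = set(c.text.lower().split())
    if tokens & v.muted_keywords:
        return False
    if c.author_id in v.excluded_authors:
        return False
    if c.topic_ids & v.excluded_topics:
        return False
    return True


def rank(candidates, viewer, scorer, top_k):
    """Pre-scoring filters, then scoring and selection, then the post-selection filter."""
    survivors = [c for c in candidates if passes_pre_scoring(c, viewer)]
    for c in survivors:
        c.score = scorer(c)
    selected = sorted(survivors, key=lambda c: c.score, reverse=True)[:top_k]
    return [c for c in selected if not c.vf_flagged]
```

Note that a candidate removed by a pre-scoring filter never receives a score at all, which is why a mute is unrecoverable from the creator's side: there is no score to improve.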

Two consequences for creators. First, once a viewer mutes your account, that viewer drops out of your impression base entirely. The deterministic filter triggers on every subsequent request from them and your post is dropped before scoring. There is no recovery short of the viewer reversing the mute. Second, mutes are not isolated events. They contribute to Phoenix's training signal across the follower base. A pattern of mutes from a creator's audience adjusts the predicted-mute probability for similar viewers next time.

The voice-drift to mute pipeline

The data flow that connects voice drift to actual mutes runs through five steps, each observable in the repo or directly in the model behaviour.

Step one. A creator's audience formed around a specific voice. The follow decision encoded "I want more of posts like the ones I already saw from this account." Every follower's history sequence now contains several to many of the creator's prior posts paired with the actions that follower took (favorite, dwell, reply, profile click). Phoenix has learned a per-creator distribution across the engagement heads for that follower.

Step two. The creator ships a post that deviates from their historical distribution. The deviation can be tonal (suddenly more formal or more casual), structural (a thread when the audience expects single posts), topical (a topic outside the historical surface area), or stylistic (template-shaped engagement copy when the audience expects unedited voice).

Step three. Phoenix scores the candidate against the viewer's history sequence. The candidate-isolation mask (explained in A1) means the candidate cannot rely on other posts in the same batch for support. The score depends on the candidate's own representation and how well it matches the viewer's per-creator history pattern. An off-voice candidate matches the pattern less well, and the predicted-positive probabilities (favorite, dwell, reply, profile click) shift downward.

Step four. Simultaneously, the predicted-negative probabilities shift upward. The model has implicitly learned that off-distribution posts from creators are the ones followers historically reacted to with mute or "not interested." A predicted-mute probability of 1.5 percent, up from 0.2 percent for an on-voice post, multiplied by the large negative weight, can be enough to shift the combined score across the positive-to-negative boundary and into the offset_score rescaling branch.

Step five. The post serves to a smaller fraction of the audience. Of that smaller fraction, a few actually mute. Those mutes feed the training signal for the next pass. The creator's follower base now contains a slightly stronger prior for off-voice content triggering mute, which raises predicted-mute probability for similar future posts. Repeated drift accelerates the effect.

The pipeline is a compounding mechanism, not a single-event one. The voice-fidelity question is not "did this one post survive?" but "what did this post do to the predicted-mute prior across my follower base for the next ninety days?"
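Steps three and four can be put into numbers. The sketch below takes the 0.2 percent and 1.5 percent mute probabilities from step four and the 2023 directional weights quoted earlier; the favorite probabilities (0.6 on-voice, 0.4 off-voice) are invented for illustration, and only two heads are modelled.

```python
# Directional 2023 weights quoted in this article; not 2026 values.
WEIGHTS = {"favorite": 0.5, "mute": -100.0}


def combined(probs: dict) -> float:
    """Sum over heads of P(action) * WEIGHT."""
    return sum(p * WEIGHTS[head] for head, p in probs.items())


# Favorite probabilities are notional; mute probabilities come from step four.
on_voice = {"favorite": 0.6, "mute": 0.002}
off_voice = {"favorite": 0.4, "mute": 0.015}

for label, probs in (("on-voice", on_voice), ("off-voice", off_voice)):
    score = combined(probs)
    branch = "offset (net-negative)" if score < 0 else "standard (net-positive)"
    print(f"{label}: combined={score:+.2f} -> {branch} branch")
```

Under these assumptions the on-voice post stays slightly net-positive while the off-voice post lands well below zero. The drop in favorite probability accounts for only a small part of the swing; the mute term does most of the work.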

Voice drift score vs predicted mute rate

The relationship between voice-distance from a creator's historical distribution and predicted-mute rate is non-linear. Drift inside the historical envelope has near-zero impact. Drift outside it ramps fast. The chart below sketches the shape with illustrative numbers; production telemetry is not available.

[Figure: Voice-distance bucket vs predicted negative-signal contribution. Series: predicted-positive contribution; predicted-negative contribution (negated for comparison). Source: illustrative simulation, weights and counts notional.]

Two readings of this chart matter. First, the on-voice bucket is not risk-free; there is a baseline negative contribution that comes from viewers who simply did not engage with the post. Second, the crossover where the predicted-negative bar overtakes the predicted-positive bar happens before the post would feel obviously off to the creator. The model is sensitive enough that drift well short of "this reads like a different person wrote it" is already shifting the predicted-action mix.
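One way to picture the shape described in this section is a flat baseline inside the historical envelope and a quadratic ramp outside it. The function below is purely illustrative: the functional form and every constant (baseline, envelope, gain) are assumptions chosen to show the shape, not fitted to any telemetry.

```python
def predicted_mute_rate(
    voice_distance: float,
    baseline: float = 0.002,  # notional on-voice mute probability
    envelope: float = 0.3,    # notional edge of the historical distribution
    gain: float = 0.5,        # notional steepness outside the envelope
) -> float:
    """Flat inside the envelope, quadratic ramp outside, capped at 1.0."""
    excess = max(0.0, voice_distance - envelope)
    return min(1.0, baseline + gain * excess ** 2)


for d in (0.0, 0.2, 0.3, 0.4, 0.5, 0.7):
    print(f"distance={d:.1f} -> predicted mute rate {predicted_mute_rate(d):.4f}")
```

The non-zero baseline corresponds to the first reading above (on-voice is not risk-free), and the ramp starting at the envelope edge corresponds to the second.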

The shadowban myth, gently retired

A short detour worth making. "Shadowban" is the most common word used for what creators experience when negative signals catch up with them. The word implies a hidden flag, set by a human or a moderator, that silently throttles a specific account. That model does not match what the public 2026 source shows.

What the source shows is two layers operating in parallel. One is the probabilistic scorer described above, which produces a score that depends on the predicted-action mix per viewer. The other is the deterministic filter layer, which removes candidates based on viewer-set preferences (muted keywords, blocked or muted authors, excluded topics) or platform-enforced classifications (visibility filtering from home-mixer/filters/vf_filter.rs for policy violations). Neither layer references a hidden per-account flag. The probabilistic layer responds to model predictions that update from audience reactions. The deterministic layer responds to user-set or policy-set rules. Both are auditable in the open repo.

The lived experience of "shadowban" is the rate-of-change in those two layers. A surge of mutes from your audience pushes predicted-mute probability up across viewer histories. A keyword that lands in many viewers' muted-keyword lists pulls posts containing that keyword out of their candidate pool. A wave of reports that survive vf_filter review raises the policy classification on the creator's content class. None of these are "shadowban" in the hidden-flag sense. All of them produce the experience that is labelled shadowban. The mechanism is in the source, not in a moderator's spreadsheet.

The practical implication: chasing rumours about hidden flags is a waste. The right operational question is "what is my predicted-action mix doing across my audience?" Voice-fidelity is the strongest direct lever on that mix. Topic surface and policy-class drift are the next two. Everything else is downstream.

What this means for tools that do not measure voice

Three categories of writing tools fail the negative-signal test by construction. Template-based generators (most engagement-optimised products) fit voice to a category-default template; the output reads fluent but off-distribution for any specific creator. Scheduler-shaped tools (volume-first) ship whatever the creator writes, with no fidelity gate. Generic-LLM writing assistants (general-purpose models prompted with light context) produce helpful-assistant register that the audience pattern-matches as not-the-creator within a few posts.

All three categories have positive use cases. None of them protects against the negative-signal economy described above. The protection requires a voice-fidelity check between draft and publish that specifically scores the draft against the creator's historical distribution. That score is mechanically related to predicted-mute probability through the pipeline above, even though no model is yet shipped that maps the two scores directly. A10 of this series compares the four major writing tools against six 2026-algorithm criteria, with the negative-signal exposure one of them.

What to do with this

Two operational moves come out of the negative-signal economy mechanically.

Use voice as the gate, not the goal. The voice-distance score on a draft does not tell you the draft is good. It tells you whether it sits inside or outside your historical distribution. Inside is the necessary condition. Whether the draft is sharp, insightful, or useful is a separate question that your editorial judgment owns. The score gates risk; it does not promise upside.
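A gate of this kind can be sketched with plain cosine distance between a draft's embedding and the centroid of the creator's past posts. This is one possible construction, not a description of any shipped tool: the embeddings are assumed to come from some text-embedding model of your choice, and the 0.3 threshold is a placeholder you would have to calibrate.

```python
import math


def centroid(vectors: list) -> list:
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: list, b: list) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def inside_envelope(draft_vec, history_vecs, max_distance: float = 0.3) -> bool:
    """Gate only: True means 'not off-voice', not 'good'."""
    return cosine_distance(draft_vec, centroid(history_vecs)) <= max_distance


# Toy 2-dimensional embeddings standing in for real ones.
history = [[1.0, 0.0], [0.9, 0.1]]
print(inside_envelope([1.0, 0.05], history))  # close to the historical centroid
print(inside_envelope([0.0, 1.0], history))   # nearly orthogonal to it
```

The boolean return value is deliberate: the check answers the inside-or-outside question and leaves quality to editorial judgment.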

Watch the velocity of mutes, not the absolute count. A creator with 50,000 followers should expect some baseline mute rate per week. The signal that matters is the rate of change. If the weekly mute count is steady, the predicted-mute prior in Phoenix is steady. If the weekly mute count is climbing, the prior is climbing too, which means future posts will score lower by default. Mute counts are not surfaced in X's own analytics; the proxy is reach loss on otherwise comparable content. A steady week-over-week reach drop on on-topic posts is the lagging indicator of the prior moving against you.
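Since reach is the available proxy, the velocity check reduces to week-over-week change on comparable posts. The sketch below flags a sustained decline; the three-week window and the 5 percent threshold are arbitrary assumptions, and the input is assumed to be positive weekly reach figures that you collect yourself.

```python
def week_over_week_change(weekly_reach: list) -> list:
    """Relative change between consecutive weeks (assumes positive reach values)."""
    return [
        (current - previous) / previous
        for previous, current in zip(weekly_reach, weekly_reach[1:])
    ]


def sustained_drop(weekly_reach: list, min_weeks: int = 3, threshold: float = -0.05) -> bool:
    """True if each of the last `min_weeks` changes is a drop of at least |threshold|."""
    recent = week_over_week_change(weekly_reach)[-min_weeks:]
    return len(recent) == min_weeks and all(change <= threshold for change in recent)


print(sustained_drop([1000, 900, 800, 700]))    # three consecutive drops
print(sustained_drop([1000, 1010, 990, 1005]))  # noise around a steady level
```

A single bad week does not trip the flag; only a run of consecutive drops does, which matches the point that the rate of change matters more than any one count.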

The negative-signal economy is the structural reason "voice as a moat" is not a marketing line. It is mechanically the only positive lever a creator has against a non-linear penalty that scales with audience size. The next article in this series, A8 on voice as an embedding, walks the embedding-level mechanism. The piece after that, on dwell, returns to the positive side of the ledger with the four dwell heads that on-voice content fires most reliably.
