Algorithm advice on X is almost entirely additive. "Do this, ship that,
post at this time, fire this signal." Read the actual scoring code and
the picture inverts. The single most score-moving event the 2026 X
ranker handles is not a like, a retweet, or a reply. It is a mute. And
the math that processes it routes through a different branch of the
scoring function entirely. This article walks the negative-signal
economy: the four explicit negative weights, the implicit not-dwelled
score, the asymmetric offset_score branch, the four hard-kill filters
that drop candidates outright, and the off-voice-drift pipeline that
turns voice inconsistency into predicted mutes. Companion to A1 and A2,
both of which establish the architecture and the head inventory this
piece builds on.
The offset branch: a different scoring formula for net-negative posts
Recall the weighted scoring formula from
home-mixer/scorers/weighted_scorer.rs:
combined = Σ_i (P(action_i) × WEIGHT_i)
final = offset_score(combined)
The first line is the standard linear combination. The second is the
load-bearing bit creators rarely hear about. offset_score is not the
identity function. When the combined score is positive, the value
passes through largely unchanged. When the combined score is negative,
the formula rescales it:
final = (combined + NEGATIVE_WEIGHTS_SUM) / WEIGHTS_SUM × NEGATIVE_SCORES_OFFSET
The intuition: a post predicted to fire more negative signal than
positive, on net, is not just placed at a lower position. It is routed
to a different scoring regime entirely. The numerator adds the sum of
negative weights (a large negative number) to the already-negative
combined score. The denominator divides by the sum of all weights to
normalise. A final offset multiplier then scales the result into a
separate range.
Production values for these constants are redacted (the params module
is excluded from the public release), but the structural property is
clear: the same combined score, sitting in net-positive vs net-negative
territory, is treated as living in two different scoring universes.
The single practical implication: a post that nets a handful of likes but triggers a non-trivial predicted-mute probability does not just score lower than an unambiguously good post. It scores in a different regime, behind a non-linear transformation. The asymmetry is the mechanical reason "reach loss" feels sudden rather than gradual to creators whose posts cross from net-positive into net-negative.
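A minimal Python sketch of the two-branch shape described above. Every constant value here is a hypothetical placeholder, since the production values are redacted. The sign of NEGATIVE_WEIGHTS_SUM follows this article's reading, and the positive branch is modelled as a plain pass-through, which is an assumption rather than a confirmed detail of weighted_scorer.rs.

```python
# Hypothetical placeholder constants. The real values live in the
# redacted params module and are NOT known.
NEGATIVE_WEIGHTS_SUM = -200.0   # assumed: sum of the negative weights
WEIGHTS_SUM = 250.0             # assumed: normalising sum of all weights
NEGATIVE_SCORES_OFFSET = 0.1    # assumed: final scale for the negative regime


def offset_score(combined: float) -> float:
    """Two-branch rescaling as described in the text (sketch only)."""
    if combined >= 0.0:
        # Assumption: non-negative scores pass through unchanged.
        return combined
    # Negative regime: shift, normalise, then scale into a separate range.
    return (combined + NEGATIVE_WEIGHTS_SUM) / WEIGHTS_SUM * NEGATIVE_SCORES_OFFSET


if __name__ == "__main__":
    for combined in (0.3, 0.01, -0.01, -0.3):
        print(f"{combined:+.2f} -> {offset_score(combined):+.5f}")
```

With these placeholder values, a combined score just above zero keeps its value while one just below zero lands near minus 0.08, and every negative input is squeezed into a narrow band around that figure. The size of the jump is an artefact of the made-up constants; the point is only that the two sides of zero go through different arithmetic.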
The four explicit negative heads, side by side
Source: home-mixer/scorers/weighted_scorer.rs + 2023 twitter/the-algorithm-ml SrcEngagementWeights
| Head | UI gesture | 2023 leaked weight | 2026 status | Effect |
|---|---|---|---|---|
| Not interested (documented) | long-press, "Not interested in this post" | minus 74 (2023 ref) | redacted | soft de-weight via offset branch |
| Mute author (documented) | post menu, mute author | roughly minus 100 (2023 ref) | redacted | soft de-weight plus deterministic filter on next requests |
| Block author (documented) | post menu, block author | roughly minus 100 (2023 ref) | redacted | soft de-weight plus deterministic filter, bidirectional |
| Report (documented) | post menu, report for policy violation | not enumerated in leaks | redacted | soft de-weight plus policy escalation |
A note on the leaked 2023 magnitudes. The favorite weight in 2023 was 0.5. The mute weight was roughly minus 100. The ratio is 200 to 1. Translated into expected score: a Phoenix prediction of even 0.5 percent mute probability, applied to a weight of minus 100, contributes minus 0.5 on a single head. That contribution alone cancels the contribution of one full favorite at a hundred percent probability. A predicted mute probability of 5 percent, contributing minus 5, cancels the contribution of ten guaranteed favorites. None of these numbers carry over to 2026 unchanged, but the order-of-magnitude property (negatives outweigh positives per action by a factor in the dozens to hundreds) is structural to the formula's design and survives any specific weight adjustment.
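The arithmetic in the note above can be checked in a few lines of Python. The two weights are the 2023 reference magnitudes quoted in the text, not 2026 production values, and the helper names are made up for this illustration.

```python
# 2023 reference magnitudes quoted in the text; not 2026 production values.
FAVORITE_WEIGHT = 0.5
MUTE_WEIGHT = -100.0  # "roughly minus 100" per the text


def contribution(probability: float, weight: float) -> float:
    """Expected contribution of one head: P(action) x WEIGHT."""
    return probability * weight


def favorites_cancelled(p_mute: float) -> float:
    """How many guaranteed (P = 1.0) favorites one mute prediction offsets."""
    return abs(contribution(p_mute, MUTE_WEIGHT)) / contribution(1.0, FAVORITE_WEIGHT)


if __name__ == "__main__":
    for p_mute in (0.005, 0.05):
        print(p_mute, contribution(p_mute, MUTE_WEIGHT), favorites_cancelled(p_mute))
```

Swapping in any other pair of weights changes the counts but not the shape of the comparison: the ratio of the two weight magnitudes sets how many guaranteed positives a single small negative probability erases.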
Not-dwelled: the negative signal nobody opted into
There is a fifth negative signal that does not appear as a WEIGHT
constant in home-mixer/scorers/weighted_scorer.rs.
It lives in the PhoenixScores proto referenced by
home-mixer/scorers/vm_ranker.rs, a parallel scoring
path that uses a remote value model. The field name is
not_dwelled_score, and its presence in the proto but absence in the
simpler weighted scorer suggests one of two things. Either
home-mixer/scorers/vm_ranker.rs is an experimental
path running alongside the canonical scorer with at least one extra
signal, or it is in fact the production path, in which case the simpler
weighted scorer in the public release omits a signal that production
uses.
The mechanism of not_dwelled_score is the same as the discrete dwell
head, inverted. A viewer scrolling past your post without pausing fires
the binary "not dwelled" condition. Unlike the four explicit negatives,
this one requires no menu interaction, no decision, no friction from the
viewer. Every impression that does not earn a dwell counts. For a
creator, the practical implication is sobering: the negative-signal
volume is dominated not by mutes and reports, which are rare events, but
by the continuous stream of viewers scrolling past without engaging. For
most off-target candidates, the predicted not-dwelled probability is the
term that moves the score.
This is the silent kingmaker of the negative side of the ledger. A2 walked the positive dwell heads; the absence of dwell is the corresponding negative signal, and it operates at scale because every non-dwelled impression contributes.
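A small Python sketch of why the volume argument holds. The Impression record and the two-second threshold are hypothetical; the source does not state how long a pause has to be before it counts as a dwell.

```python
from dataclasses import dataclass

# Hypothetical threshold: the text does not say how long a pause counts as a dwell.
DWELL_THRESHOLD_SECONDS = 2.0


@dataclass
class Impression:
    dwell_seconds: float      # time the post stayed on screen for this viewer
    explicit_negative: bool   # mute, block, report, or "not interested"


def not_dwelled(impression: Impression) -> bool:
    """Binary inverse of the dwell condition: True when the viewer scrolled past."""
    return impression.dwell_seconds < DWELL_THRESHOLD_SECONDS


def negative_signal_counts(impressions: list[Impression]) -> dict[str, int]:
    """Count implicit (not-dwelled) and explicit negatives in one batch."""
    return {
        "not_dwelled": sum(not_dwelled(i) for i in impressions),
        "explicit": sum(i.explicit_negative for i in impressions),
    }
```

Run over any realistic impression log, the not_dwelled count will dwarf the explicit count, simply because scrolling past requires no action from the viewer.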
The hard-kill filter layer
Four filter files in home-mixer/filters/ remove candidates
deterministically. Three run before scoring, so the candidates they drop
never reach Phoenix; vf_filter.rs runs after selection. These are not
probabilistic predictions. They are boolean conditions:
either the filter triggers and the candidate is dropped, or it does not.
Source: home-mixer/filters/ + home-mixer/scorers/vm_ranker.rs context
| Filter | What it removes | Stage | Severity |
|---|---|---|---|
| muted_keyword_filter.rs (documented) | candidates whose tokens match the viewer's muted-keyword list | pre-scoring | hard kill |
| author_socialgraph_filter.rs (documented) | candidates from authors the viewer has blocked, muted, or otherwise excluded via social-graph relationships | pre-scoring | hard kill |
| vf_filter.rs (documented) | candidates flagged by visibility filtering: deleted posts, spam, violence, gore, policy violations | post-selection | hard kill |
| topic_ids_filter.rs (documented) | candidates whose Grox-classified topic IDs match the viewer's topic exclusions | pre-scoring | hard kill |
The split is meaningful. The probabilistic layer (the four negative heads in the weighted scorer) handles "the viewer will probably want to mute this if served." The deterministic layer (the four filters above) handles "the viewer has already muted this author, or this content class." Both layers feed the same "less of this" intent, but they operate at different stages and on different signals.
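The split described above can be sketched in Python as a boolean gate in front of a scoring function. The Viewer and Candidate types, their field names, and the rank helper are all hypothetical stand-ins; they mirror the rows of the filter table, not the actual Rust structs.

```python
from dataclasses import dataclass, field


@dataclass
class Viewer:
    muted_keywords: set[str] = field(default_factory=set)     # assumed lowercase
    excluded_authors: set[str] = field(default_factory=set)   # blocked or muted
    excluded_topic_ids: set[int] = field(default_factory=set)


@dataclass
class Candidate:
    author: str
    tokens: list[str]
    topic_ids: set[int]
    visibility_flagged: bool = False   # stand-in for a visibility-filtering verdict


def passes_pre_scoring_filters(c: Candidate, v: Viewer) -> bool:
    """Boolean checks mirroring the three pre-scoring rows of the table."""
    if v.muted_keywords & {t.lower() for t in c.tokens}:
        return False
    if c.author in v.excluded_authors:
        return False
    if v.excluded_topic_ids & c.topic_ids:
        return False
    return True


def rank(candidates, viewer, score_fn):
    """Filter deterministically, score, then apply the post-selection check."""
    survivors = [c for c in candidates if passes_pre_scoring_filters(c, viewer)]
    ranked = sorted(survivors, key=score_fn, reverse=True)
    return [c for c in ranked if not c.visibility_flagged]
```

The design point is that score_fn is never called on a candidate the pre-scoring filters reject. No predicted probability, however favourable, can bring a muted author back for that viewer.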
Two consequences for creators. First, once a viewer mutes your account, that viewer leaves your impression base entirely. The deterministic filter triggers on every subsequent request from them and your post is dropped before scoring. There is no recovery short of the viewer reversing the mute themselves. Second, mutes are not isolated events. They contribute to Phoenix's training signal across the follower base. A pattern of mutes from a creator's audience raises the predicted-mute probability for similar viewers next time.
The voice-drift to mute pipeline
The data flow that connects voice drift to actual mutes runs through five steps, each observable in the repo or directly in the model behaviour.
Step one. A creator's audience formed around a specific voice. The follow decision encoded "I want more of posts like the ones I already saw from this account." Every follower's history sequence now contains several to many of the creator's prior posts paired with the actions that follower took (favorite, dwell, reply, profile click). Phoenix has learned a per-creator distribution across the engagement heads for that follower.
Step two. The creator ships a post that deviates from their historical distribution. The deviation can be tonal (suddenly more formal or more casual), structural (a thread when the audience expects single posts), topical (a topic outside the historical surface area), or stylistic (template-shaped engagement copy when the audience expects unedited voice).
Step three. Phoenix scores the candidate against the viewer's history sequence. The candidate-isolation mask (explained in A1) means the candidate cannot rely on other posts in the same batch for support. The score depends on the candidate's own representation and how well it matches the viewer's per-creator history pattern. An off-voice candidate matches the pattern less well, and the predicted-positive probabilities (favorite, dwell, reply, profile click) shift downward.
Step four. Simultaneously, the predicted-negative probabilities
shift upward. The model has implicitly learned that off-distribution
posts from creators are the ones followers historically reacted to with
mute or "not interested." A 1.5-percent predicted-mute probability that
would have been 0.2 percent on an on-voice post, multiplied by the
large negative weight, is enough to shift the combined score across the
positive-to-negative boundary and into the offset_score rescaling
branch.
Step five. The post serves to a smaller fraction of the audience. Of that smaller fraction, a few actually mute. Those mutes feed the training signal for the next pass. The creator's follower base now contains a slightly stronger prior for off-voice content triggering mute, which raises predicted-mute probability for similar future posts. Repeated drift accelerates the effect.
The pipeline is a compounding mechanism, not a single-event one. The voice-fidelity question is not "did this one post survive?" but "what did this post do to the predicted-mute prior across my follower base for the next ninety days?"
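Steps three and four can be made concrete with a two-head toy version of the linear combination. The weights are the 2023 reference magnitudes quoted earlier in this article, the mute probabilities are the 0.2 and 1.5 percent figures from step four, and the favorite probabilities are hypothetical numbers chosen only to show the sign flip.

```python
# 2023 reference weights quoted earlier in this article (not 2026 values).
WEIGHTS = {"favorite": 0.5, "mute": -100.0}


def combined_score(probabilities: dict[str, float]) -> float:
    """Linear combination over the heads: sum of P(action) x WEIGHT."""
    return sum(probabilities[head] * WEIGHTS[head] for head in WEIGHTS)


# Favorite probabilities are hypothetical. The mute probabilities are the
# 0.2 percent and 1.5 percent figures from step four.
on_voice = {"favorite": 0.60, "mute": 0.002}
off_voice = {"favorite": 0.40, "mute": 0.015}

if __name__ == "__main__":
    print("on-voice :", combined_score(on_voice))
    print("off-voice:", combined_score(off_voice))
```

A real candidate is scored over many more heads than two. The sketch only shows that a modest drop in predicted favorites plus a small rise in predicted mutes is enough to move the sum from positive to negative.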
Voice drift score vs predicted mute rate
The relationship between voice-distance from a creator's historical distribution and predicted-mute rate is non-linear. Drift inside the historical envelope has near-zero impact. Drift outside it ramps fast. The chart below sketches the shape with illustrative numbers; production telemetry is not available.
Source: illustrative simulation, weights and counts notional
Two readings of this chart matter. First, the on-voice bucket is not risk-free; there is a baseline negative contribution that comes from viewers who simply did not engage with the post. Second, the crossover where the right-column bar overtakes the left-column bar happens before the post would feel obviously off to the creator. The model is sensitive enough that drift well short of "this reads like a different person wrote it" is already shifting the predicted-action mix.
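The shape described here (a non-zero baseline, near-flat inside the historical envelope, a fast ramp outside it) can be sketched with a logistic curve. This is purely illustrative: the function, the drift scale where 1.0 marks the edge of the envelope, and every parameter value are assumptions, not anything read from the source or from telemetry.

```python
import math


def predicted_mute_rate(
    drift: float,
    baseline: float = 0.002,   # hypothetical floor for on-voice posts
    ceiling: float = 0.05,     # hypothetical saturation level
    midpoint: float = 1.5,     # hypothetical drift at which the ramp is halfway
    steepness: float = 6.0,    # hypothetical sharpness of the ramp
) -> float:
    """Illustrative logistic ramp from baseline to ceiling as drift grows."""
    ramp = 1.0 / (1.0 + math.exp(-steepness * (drift - midpoint)))
    return baseline + (ceiling - baseline) * ramp


if __name__ == "__main__":
    for drift in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(f"drift={drift:.1f} rate={predicted_mute_rate(drift):.4f}")
```

Under these made-up parameters, moving from drift 0 to drift 1 barely changes the rate, while moving from 1 to 2 covers most of the distance to the ceiling. That is the non-linearity the section describes.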
The shadowban myth, gently retired
A short detour worth making. "Shadowban" is the most common word used for what creators experience when negative signals catch up with them. The word implies a hidden flag, set by a human or a moderator, that silently throttles a specific account. That model does not match what the public 2026 source shows.
What the source shows is two layers operating in parallel. One is the
probabilistic scorer described above, which produces a score that
depends on the predicted-action mix per viewer. The other is the
deterministic filter layer, which removes candidates based on viewer-set
preferences (muted keywords, blocked or muted authors, excluded topics)
or platform-enforced classifications (visibility filtering from
home-mixer/filters/vf_filter.rs for policy
violations). Neither layer references a hidden per-account flag. The
probabilistic layer responds to model predictions that update from
audience reactions. The deterministic layer responds to user-set or
policy-set rules. Both are auditable in the open repo.
The lived experience of "shadowban" is the rate-of-change in those two layers. A surge of mutes from your audience pushes predicted-mute probability up across viewer histories. A keyword that lands in many viewers' muted-keyword lists pulls posts containing that keyword out of their candidate pool. A wave of reports that survive vf_filter review raises the policy classification on the creator's content class. None of these are "shadowban" in the hidden-flag sense. All of them produce the experience that is labelled shadowban. The mechanism is in the source, not in a moderator's spreadsheet.
The practical implication: chasing rumours about hidden flags is a waste. The right operational question is "what is my predicted-action mix doing across my audience?" Voice-fidelity is the strongest direct lever on that mix. Topic surface and policy-class drift are the next two. Everything else is downstream.
What this means for tools that do not measure voice
Three categories of writing tools fail the negative-signal test by construction. Template-based generators (most engagement-optimised products) fit voice to a category-default template; the output reads fluent but off-distribution for any specific creator. Scheduler-shaped tools (volume-first) ship whatever the creator writes, with no fidelity gate. Generic-LLM writing assistants (general-purpose models prompted with light context) produce helpful-assistant register that the audience pattern-matches as not-the-creator within a few posts.
All three categories have positive use cases. None of them protects against the negative-signal economy described above. The protection requires a voice-fidelity check between draft and publish that specifically scores the draft against the creator's historical distribution. That score is mechanically related to predicted-mute probability through the pipeline above, even though no model is yet shipped that maps the two scores directly. A10 of this series compares the four major writing tools against six 2026-algorithm criteria, negative-signal exposure among them.
What to do with this
Two operational moves follow directly from the negative-signal economy.
Use voice as the gate, not the goal. The voice-distance score on a draft does not tell you the draft is good. It tells you it sits inside or outside your historical distribution. Inside is the necessary condition. Whether the draft is sharp, insightful, useful, that is a separate question your editorial judgment owns. The score gates risk; it does not promise upside.
Watch the velocity of mutes, not the absolute count. A creator with 50,000 followers expects some baseline mute rate per week. The signal that matters is the rate-of-change. If the weekly mute count is steady, the predicted-mute prior in Phoenix is steady. If the weekly mute count is climbing, the prior is climbing too, which means future posts will score lower by default. Public mute counts are not surfaced in X's own analytics; the proxy is reach loss on otherwise comparable content. A steady week-over-week reach drop on on-topic posts is the lagging indicator of the prior moving against you.
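One way to track that proxy is to compute week-over-week change on reach for comparable posts and flag a sustained decline. The sketch below is a hypothetical helper, not an X analytics API; the three-week window and the 5 percent threshold are arbitrary defaults to tune against your own data.

```python
def week_over_week_change(weekly_reach: list[float]) -> list[float]:
    """Fractional change between consecutive weeks; negative means reach fell.

    Weeks that follow a zero-reach week are skipped to avoid dividing by zero.
    """
    return [
        (current - previous) / previous
        for previous, current in zip(weekly_reach, weekly_reach[1:])
        if previous > 0
    ]


def sustained_drop(weekly_reach: list[float], weeks: int = 3, threshold: float = -0.05) -> bool:
    """True when each of the last `weeks` changes is at or below `threshold`."""
    recent = week_over_week_change(weekly_reach)[-weeks:]
    return len(recent) == weeks and all(change <= threshold for change in recent)


if __name__ == "__main__":
    print(sustained_drop([1000, 940, 880, 820]))   # three consecutive declines
    print(sustained_drop([1000, 990, 1005, 1000])) # ordinary week-to-week noise
```

Feed it only posts that are on-topic and similar in format. Mixing in outliers (a viral post, a reply-heavy thread) will swamp the slow trend the helper is meant to catch.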
The negative-signal economy is the structural reason "voice as a moat" is not a marketing line. It is mechanically the only positive lever a creator has against a non-linear penalty that scales with audience size. The next article in this series, A8 on voice as an embedding, walks the embedding-level mechanism. The piece after that, on dwell, returns to the positive side of the ledger with the four dwell heads that on-voice content fires most reliably.