Alt-text on X: the AEO move most creators skip, done in voice

Alt-text on X serves two audiences: visually impaired readers and AI assistants indexing the post. Most creators skip it. A small minority keyword-stuffs it. Here's the voice-first version that serves both audiences without doing either job badly.

May 11, 2026 · 6 min read

Alt-text on X images is one of those features almost nobody uses well. There are two visible failure modes: most accounts skip it entirely, and the few who use it for SEO cram in keywords in ways that read as obviously gamed (and defeat the accessibility intent). The voice-first version threads the needle: alt-text written in your voice that describes what's actually in the image and includes the relevant keyword only if it fits naturally, never crammed.

This piece is short. Six sections, one workflow.

Two reasons to ship alt-text

Accessibility floor. Roughly 2 billion people worldwide have some form of vision impairment. For those who rely on screen readers, alt-text is how they read your image-bearing posts. That number alone is reason enough; the platform-algorithm signal that engagement-optimizers cite is secondary.
AEO substrate. AI assistants (ChatGPT, Claude, Perplexity) and search engines (Google Image Search) read alt-text to understand visual content. A post with no alt-text is invisible to image-based retrieval. A post with thoughtful alt-text is citable.

The voice-first alt-text formula

Three elements, in this order:

Describe what's actually in the image. The accessibility-first description. 'Two people at a kitchen table, one pointing at a laptop screen.' If a blind reader couldn't form the picture from your description, the description isn't doing the accessibility job.
Context. 'During the second cohort kickoff.' Or 'on the morning the launch shipped.' The context layer is what the AI assistants index and what a sighted reader who hovers gets value from.
Natural keyword if it fits. If your post is about onboarding flows and the image shows your onboarding setup, the keyword 'onboarding flow' is fair game. If the image is unrelated to the keyword, don't force it. Keyword-stuffing in alt-text reads as gaming to both algorithms and humans.

The voice element: write the alt-text in your voice, not in robotic SEO syntax. 'Two cofounders staring at a regression chart that wasn't supposed to look like that' beats 'Image of two people looking at chart.' Both describe; the first one adds the voice signature without crossing into joke-as-description.
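For anyone who drafts posts in a script or a notes tool, the formula above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an X API call: the helper names compose_alt_text and mentions_keyword are made up for this example, and the 1,000-character cap is an assumed limit you should check against X's current documentation. The keyword helper only reports whether the phrase is already there; it never injects one.

```python
# Hypothetical helpers for illustration; nothing here talks to X.
ALT_TEXT_LIMIT = 1000  # assumed character cap for alt-text on X


def _as_sentence(fragment: str) -> str:
    """Trim whitespace and make sure the fragment ends with punctuation."""
    fragment = fragment.strip()
    if not fragment:
        return ""
    return fragment if fragment[-1] in ".!?" else fragment + "."


def compose_alt_text(description: str, context: str = "") -> str:
    """Build alt-text as description first, then optional context."""
    described = _as_sentence(description)
    if not described:
        raise ValueError("describe what is in the image before anything else")
    text = " ".join(part for part in (described, _as_sentence(context)) if part)
    if len(text) > ALT_TEXT_LIMIT:
        raise ValueError(f"alt-text is {len(text)} characters; limit is {ALT_TEXT_LIMIT}")
    return text


def mentions_keyword(alt_text: str, keyword: str) -> bool:
    """Report whether the keyword already appears; never force it in."""
    return keyword.lower() in alt_text.lower()


alt = compose_alt_text(
    "Two people at a kitchen table, one pointing at a laptop screen",
    "During the second cohort kickoff",
)
print(alt)
print(mentions_keyword(alt, "onboarding flow"))
```

The description is required and always comes first, which mirrors the order of the three elements. The voice still has to come from you; the code only enforces the shape.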

What not to do

Don't keyword-stuff. 'AI writing tool voice cloning Twitter growth marketing creator economy' as the alt-text for an unrelated photo. Reads as spam to both AI and humans.
Don't generic-describe. 'Image' or 'photo' or 'screenshot' alone. Useless for accessibility and useless for AEO.
Don't joke without description. 'When you realize you forgot the API key' on its own (without the actual content of the image) fails accessibility. A blind reader gets nothing.
Don't auto-generate alt-text with a generic AI model and ship it unreviewed. Auto-generated descriptions are usually correct in shape but miss the specific detail that makes the alt-text useful for AEO.
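Some of these failure modes can be caught mechanically before you post. The sketch below is a rough heuristic, not a validated classifier: the function name lint_alt_text, the word lists, and the thresholds are all assumptions chosen for illustration. It flags empty, generic, and very short alt-text, and treats a run of six or more words with no function words ('a', 'the', 'at') as possible keyword-stuffing. It cannot tell a joke from a description, so that check and the review of auto-generated text stay manual.

```python
import re

# Assumed word lists; tune them to your own writing.
GENERIC_WORDS = {"image", "photo", "picture", "screenshot"}
FUNCTION_WORDS = {
    "a", "an", "the", "at", "on", "in", "of", "to", "and",
    "that", "with", "is", "was", "during", "from", "by",
}


def lint_alt_text(alt_text: str) -> list[str]:
    """Return a list of problem codes; an empty list means no flags."""
    words = re.findall(r"[a-z0-9']+", alt_text.lower())
    if not words:
        return ["empty"]

    problems = []
    if set(words) <= GENERIC_WORDS:
        # Only words like 'image' or 'screenshot': nothing is described.
        problems.append("generic")
    elif len(words) < 4:
        # Too few words to carry a description plus context.
        problems.append("too_short")

    if len(words) >= 6 and not FUNCTION_WORDS.intersection(words):
        # A long string of nouns with no connective words reads like a tag list.
        problems.append("keyword_stuffing")
    return problems


for sample in (
    "Image",
    "AI writing tool voice cloning Twitter growth marketing creator economy",
    "Two cofounders staring at a regression chart that wasn't supposed to look like that",
):
    print(lint_alt_text(sample), "<-", sample)
```

A clean result only means the text cleared the cheap checks. 'When you realize you forgot the API key' also comes back with no flags, which is why the final read-through is still yours.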

30-second per-image rule

The right time budget for alt-text is 30 seconds per image. Over budget produces over-engineered text. Under budget produces skipped or generic alt-text. 30 seconds is the natural amount of time to write one specific sentence and verify it covers the description + context + maybe-keyword pattern.

At 3 to 5 images per week (the typical cadence for a voice-first creator who uses images sparingly), that's roughly 1.5 to 2.5 minutes a week. It's the lowest-cost AEO investment available.
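The weekly cost is plain arithmetic, and you can plug in your own cadence. The function name weekly_alt_text_minutes is made up for this example; the 30-second default comes from the rule above.

```python
def weekly_alt_text_minutes(images_per_week: int, seconds_per_image: int = 30) -> float:
    """Minutes per week spent on alt-text at a fixed per-image budget."""
    return images_per_week * seconds_per_image / 60


for images in (3, 5):
    print(images, "images:", weekly_alt_text_minutes(images), "minutes")
```

At 3 images that is 1.5 minutes and at 5 images it is 2.5 minutes, consistent with the weekly figure above.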

For visual creators specifically

Photographers, designers, and anyone whose work is primarily image-driven get unusually high leverage from alt-text. The AI assistants surface their work via the alt-text layer because the image itself is opaque to retrieval. The voice-first photographer playbook covers caption craft for image-bearing posts; alt-text is the layer underneath the caption that does the AEO work the caption doesn't.

For broader AEO context, answer engine optimization in 2026 covers the full stack. Alt-text is one of the cheaper layers in the stack and one of the more-skipped ones.

Voice tool fit

Drafting alt-text in your voice is a small enough task that it usually doesn't need tooling. If you're using Auden for the post itself, drafting the alt-text in the same composer is roughly free; the voice match score on alt-text is much less critical than on the main post because alt-text is short and rarely re-read. For the upstream audience-growth question (where alt-text is one of the small AEO-level moves that compounds for voice-first creators), the audience-quality vs audience-size math covers what to optimize for instead of follower count. For the broader accessibility floor that sits under alt-text (contrast, video captions, screenshot legibility, image-as-entire-message decisions), the voice-first reading of accessible images on X covers the 6-layer floor.
