We Rated 700+ X Articles with AI: What We Found

XDigestly

We built a system that rates X articles the way a newsroom editor would: not just “is this interesting?” but “is this actually good?”

Four AI agents score every article independently on credibility, originality, depth, and reader value. Each gets a 0-10 score. The overall rating uses a weighted average with a few adjustments to prevent everything from clustering in the middle.
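
The exact weights and adjustments aren't something we publish, but the shape of the computation is simple. Here's a minimal sketch in Python; the weights, midpoint, and stretch factor below are illustrative stand-ins, not our production values:

```python
# Illustrative weights only -- not the production values.
WEIGHTS = {
    "credibility": 0.30,
    "originality": 0.25,
    "depth": 0.25,
    "reader_value": 0.20,
}

def overall_rating(scores: dict[str, float]) -> float:
    """Weighted average of the four agent scores, with a gentle
    push away from the midpoint so ratings don't all pile up at 6."""
    base = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    # Hypothetical spread adjustment: stretch distance from the midpoint.
    adjusted = 5.0 + (base - 5.0) * 1.15
    return round(min(max(adjusted, 0.0), 10.0), 1)
```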

After rating more than 700 articles across dozens of topics, the numbers tell a clear story.

The headline number: most X articles are mid

The average score across all rated articles is 6.1 out of 10. The median is 6.3. That puts the typical X article squarely in the “decent” tier: not great, not terrible, just okay.

Here’s how the distribution breaks down:

Tier        Score Range   % of Articles
Skip        0-3           1.8%
Filler      3-5           16.0%
Decent      5-7           55.9%
Worth It    7-9           26.3%
Must Read   9-10          0.0%

Over half of everything published lands in the 5-7 range. Readable, sure. But not the kind of content that teaches you something new or changes how you think about a topic.

The “Worth It” tier (7-9) captures about a quarter of articles. These are the ones with genuine insight, real data, or a perspective you haven’t encountered before.

And “Must Read”? Zero articles out of 700+ have hit a 9.0. That’s not a bug in the system. It’s a reflection of how rare genuinely exceptional writing is, on any platform.
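
Mapping a score to a tier is just a boundary check. In the sketch below each tier's upper bound is treated as exclusive (so a 9.0 would count as “Must Read”); whether our production cutoffs work exactly this way is an implementation detail, so treat the boundaries as illustrative:

```python
# Tier boundaries from the table above; upper bounds assumed exclusive.
TIERS = [(3.0, "Skip"), (5.0, "Filler"), (7.0, "Decent"),
         (9.0, "Worth It"), (10.0, "Must Read")]

def tier_for(score: float) -> str:
    for upper, name in TIERS:
        if score < upper:
            return name
    return "Must Read"  # a perfect 10.0

assert tier_for(6.1) == "Decent"     # the average article
assert tier_for(9.0) == "Must Read"  # nobody's hit this yet
```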

Originality is the weakest dimension

Of the four scoring dimensions, originality consistently scores the lowest, averaging 0.46 points below the overall score. That means even articles that are well-written, credible, and useful are often saying things that have been said before.
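
That gap is just the mean signed difference between each article's originality score and its overall score. A minimal sketch of the computation, assuming ratings are stored as flat records (the field names here are hypothetical):

```python
from statistics import mean

# Hypothetical record shape -- field names are illustrative.
ratings = [
    {"overall": 6.3, "credibility": 6.5, "originality": 5.8,
     "depth": 6.1, "reader_value": 6.9},
    # ...one record per rated article
]

def dimension_gap(records, dim):
    """Mean signed gap between a dimension and the overall score.
    Negative means the dimension tends to drag below the overall."""
    return round(mean(r[dim] - r["overall"] for r in records), 2)

for dim in ("credibility", "originality", "depth", "reader_value"):
    print(dim, dimension_gap(ratings, dim))
```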

The most common pattern: someone takes a trending topic (AI agents, prompt engineering, crypto market analysis), writes a competent summary of the current state of things, adds a few personal takes, and publishes. It’s not bad writing. It’s just not new.

Reader value scores the highest on average, which makes sense. Even derivative content can be useful if it’s well-organized and saves someone time. But the gap between “useful” and “original” is where the real quality difference lives.

AI slop is real, but it’s not the majority

About 1.8% of articles scored below 3.0 (the “Skip” tier). These are the obvious cases: content that reads like it was generated by AI with minimal editing, articles with fabricated-sounding statistics, or pieces that say nothing in 2,000 words.

The bigger problem isn’t the 1.8% that’s clearly bad. It’s the 55.9% that’s clearly mediocre. The “decent” tier is full of articles that are technically fine but don’t earn the time you’d spend reading them. They’re the content equivalent of fast food: it fills the gap but doesn’t nourish.

The topics that score highest (and lowest)

After tagging every article with topics, some patterns emerge. Articles about specific technical implementations (how someone built something, with code and real numbers) tend to score well on depth and credibility.

Articles about broad industry trends (“the future of AI”, “where crypto is headed”) tend to score lower on originality, because everyone is writing the same trend piece with the same data points.

The best-performing articles share a few traits:

  • First-hand experience: someone writing about what they actually did, not what they think about what someone else did
  • Specific numbers: real revenue figures, actual user counts, concrete timelines
  • Strong opinions backed by evidence: not hedging every claim with “it depends”
  • A single clear thesis: not trying to cover everything about a topic

What the credibility agent catches

The credibility agent (Dr. Vera Chen, in our system’s persona) evaluates whether claims are backed up and whether the author signals what they know versus what they’re guessing.

The most common credibility issues:

  1. Unsourced statistics: “Studies show that 80% of…” without naming the study
  2. Confidence without evidence: strong claims presented as fact with no supporting data
  3. AI-generated feel: suspiciously comprehensive coverage of a topic with no personal experience or specific examples
  4. Outdated information: citing data from 2-3 years ago as if it’s current
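
The agent itself is an LLM working from a rubric, not a pattern matcher, but as a toy illustration of what catching issue #1 might look like as a standalone heuristic, here's a sketch (the regexes are illustrative, not anything we actually run):

```python
import re

# Toy heuristics only -- the real agent judges claims in context.
UNSOURCED_STAT = re.compile(
    r"(studies show|research shows|experts say)[^.]*?\d+\s*%",
    re.IGNORECASE,
)
HAS_SOURCE = re.compile(r"https?://|according to [A-Z]|\(\d{4}\)")

def flag_unsourced_stats(text: str) -> list[str]:
    """Flag sentences that cite a percentage to 'studies' without naming one."""
    flags = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if UNSOURCED_STAT.search(sentence) and not HAS_SOURCE.search(sentence):
            flags.append(sentence.strip())
    return flags

print(flag_unsourced_stats("Studies show that 80% of readers never finish."))
```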

Articles that score 8+ on credibility almost always have one thing in common: the author clearly has direct experience with the thing they’re writing about. They name specific tools, reference actual conversations, and acknowledge what they don’t know.

The depth problem

The depth agent (Prof. Kapoor) is the second-harshest scorer. Most articles establish a thesis in the first two paragraphs and then spend the rest of the word count restating it in different ways.

Real depth means:

  • Exploring counterarguments
  • Explaining why, not just what
  • Acknowledging tradeoffs and edge cases
  • Building on the thesis rather than repeating it

The articles that score highest on depth tend to be longer, but length alone doesn’t help. A 3,000-word article that says the same thing five ways scores lower than a 1,000-word article that explores three distinct angles on a topic.

So what’s actually worth reading on X?

Based on 700+ ratings, the honest answer: about 1 in 4 articles. The 26.3% that score “Worth It” or above are the ones where the author genuinely thought about what they were writing, had something real to contribute, and put in the effort to make their case.

The rest isn’t necessarily bad. It’s just not worth the 8-15 minutes you’d spend reading it when you could be reading something from that top quartile instead.

That’s exactly why we built XDigestly. Not to summarize articles (though it does that), but to answer the question that matters: is this worth your time?


Try it yourself. Paste any X article URL at xdigestly.app/rate and see where it lands. Or browse the trending page to see the highest-rated articles from the past week.

Get the best X articles delivered weekly

Every Friday, the top-rated articles from X, scored by AI. No slop.