How Importance Scoring Works

Not every article matters equally. A #1 story that holds its position across a dozen snapshots is more important than a #1 story that appears once and slides. An article discussed by journalists who consistently appear on the biggest stories carries more weight than one retweeted by random accounts. A story that attracts ten thematically related articles is a nexus point; a story semantically distant from everything else on the same day is a novel signal. Importance scoring combines these intuitions into a single number — one that powers every ranking on the Pulse API and every sort order across the platform.

The Problem with Simple Ranking

The naive approach is to sort by position: #1 is the most important story, #2 is second, and so on. This fails for three reasons.

First, position is a snapshot. A story's position on an aggregation page changes throughout the day. An article might debut at #15, rise to #1 as more outlets cover it, then fade to #30 by evening. A single position reading captures none of this trajectory. Was it at #1 for twelve hours or twelve minutes?

Second, position ignores social signal. Two articles at the same position can have vastly different social footprints. One gets discussed by three accounts; the other triggers a hundred-tweet debate among the industry's most prominent voices. Position doesn't know the difference.

Third, position is day-relative. Being #1 on a quiet Sunday means something different than being #1 on a day when three major companies report earnings simultaneously. Importance needs to account for the context an article competes in.

Five Signals

The importance score combines five independent signals, each capturing a different dimension of "this story mattered." Every signal produces a 0-to-1 value. They're weighted and combined into a final score.

1. Position AUC (weight: 35%)

We scrape the homepage 12-24 times per day. Each scrape records every article's position. Instead of using a single position snapshot, we compute the area under the curve of an article's prominence over time — the same AUC technique used in clinical trials and machine learning evaluation.

An article that holds #1 for ten hours scores higher than one that spikes to #1 for thirty minutes. An article that rises from #20 to #3 and stays there scores differently than one that starts at #3 and decays. The AUC captures the full trajectory, not a single frame.

Articles that remain on the page for longer also receive a mild longevity boost — sustained presence is a signal of lasting relevance.
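As a rough illustration, the trapezoidal integration described above might look like the following sketch. The linear `prominence` mapping, the `max_position` of 50, and the 24-hour normalisation window are assumptions made for the example, not details confirmed by the scoring pipeline, and the longevity boost is left out.

```python
from typing import List, Tuple


def prominence(position: int, max_position: int = 50) -> float:
    """Map a 1-based page position to a 0-to-1 prominence value.

    Assumption: a simple linear falloff; the real mapping is not specified.
    """
    if position < 1 or position > max_position:
        return 0.0
    return 1.0 - (position - 1) / max_position


def position_auc(snapshots: List[Tuple[float, int]], window_hours: float = 24.0) -> float:
    """Trapezoidal area under the prominence curve, normalised to 0-to-1.

    snapshots: (hours_since_midnight, position) pairs, one per scrape.
    """
    points = sorted(snapshots)
    area = 0.0
    # Sum the trapezoid between each pair of consecutive scrapes.
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        area += (t1 - t0) * (prominence(p0) + prominence(p1)) / 2.0
    return min(area / window_hours, 1.0)


held = [(0.0, 1), (5.0, 1), (10.0, 1)]   # #1 for ten hours
spike = [(0.0, 1), (0.5, 1)]             # #1 for thirty minutes
print(position_auc(held) > position_auc(spike))
```

Under these assumptions the ten-hour article scores far higher than the thirty-minute spike, which is the behaviour the section describes.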

2. Gravity (weight: 20%)

Every article has a 1,536-dimensional embedding — a vector that captures its semantic meaning. On any given day, we can measure how many other articles are semantically close to a given article. We call this gravitational pull.

An article about Microsoft's earnings report will have high gravity because dozens of related articles — analyst reactions, competitor comparisons, segment breakdowns — cluster nearby in vector space. An article about a niche regulatory filing will have low gravity because nothing else that day covers similar ground.

High gravity means a story attracted a constellation of coverage. It became a nexus point that other reporting oriented around. This is different from related coverage count (which measures linked articles) — gravity captures thematic clustering among articles that may not link to each other at all.
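A minimal sketch of the neighbour count follows, using the 0.45 cosine-distance radius and the cap of 15 listed in the Technical Summary. Dividing by the cap to reach a 0-to-1 value is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def gravity(target: Sequence[float],
            other_same_day: List[Sequence[float]],
            radius: float = 0.45,
            cap: int = 15) -> float:
    """Count same-day articles within `radius` of the target, capped and scaled.

    `other_same_day` should exclude the target article itself.
    Assumption: the capped count is divided by the cap to give a 0-to-1 score.
    """
    neighbours = sum(
        1 for other in other_same_day
        if cosine_distance(target, other) <= radius
    )
    return min(neighbours, cap) / cap


# Toy 2-dimensional embeddings standing in for the real 1,536-dimensional ones.
earnings = [1.0, 0.0]
others = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
print(gravity(earnings, others))  # two of the three articles are nearby
```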

3. Link Power (weight: 20%)

When multiple outlets cover the same story, each additional source is evidence that the story matters. An article with 40 related sources attracted attention across the industry. One with 2 related sources is a niche report.

Link power uses logarithmic scaling — the jump from 0 to 5 sources matters more than the jump from 40 to 45. It also factors in source authority: coverage by outlets that historically land in top positions carries more weight than coverage by outlets that rarely break through. (Explore any outlet's coverage history on its domain page.)
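The diminishing returns can be sketched with a logarithm. The `saturation` value of 50 sources and the way the authority adjustment is passed in are assumptions for illustration; only the log scaling and the ±8% bound come from this article.

```python
import math


def link_power(source_count: int,
               authority_adjustment: float = 0.0,
               saturation: int = 50) -> float:
    """Log-scaled related-coverage score in the 0-to-1 range.

    authority_adjustment: hypothetical input in [-0.08, 0.08], standing in
    for the source authority boost; how it is derived is not shown here.
    saturation: assumed source count at which the score reaches 1.0.
    """
    base = math.log1p(max(source_count, 0)) / math.log1p(saturation)
    adjustment = max(-0.08, min(0.08, authority_adjustment))
    return max(0.0, min(1.0, base * (1.0 + adjustment)))


early_jump = link_power(5) - link_power(0)
late_jump = link_power(45) - link_power(40)
print(early_jump > late_jump)
```

With this shape, the first five sources move the score much more than sources 41 through 45 do.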

4. Social Authority (weight: 15%)

The count of social discussion posts about an article is useful but crude. Ten posts from random accounts mean less than three posts from people who consistently discuss the most important stories.

We compute social authority using a PageRank-style iterative propagation. The insight from Google's original algorithm applies directly: a link from an important page is worth more than a link from an unimportant one. In our case: a discussion from an authoritative handle is worth more than one from an unknown account.

The loop works like this:

Seed each article with a base importance score from the other four signals.
Compute handle authority: each handle's authority is the average importance of the articles it discusses. Handles that consistently appear on important stories earn high authority.
Recompute each article's social score as a blend of raw discussion count and the authority of the handles discussing it.
Repeat until scores converge (typically 3 iterations, with convergence to less than 0.01% change).

This creates a recursive quality signal: articles discussed by authoritative handles score higher, and handles that discuss high-scoring articles earn more authority. The system discovers which voices matter without any manually curated influencer list. Every handle's authority is visible on its handle page.
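The loop above can be sketched roughly as follows. The 50/50 `blend`, the `max_posts` cap, and the way base and social scores are recombined (0.85 and 0.15, mirroring the signal weights) are all assumptions for the example, and no damping factor is modelled.

```python
from typing import Dict, List, Tuple


def social_authority(base: Dict[str, float],
                     discussions: Dict[str, List[str]],
                     blend: float = 0.5,
                     max_posts: int = 20,
                     tol: float = 1e-4,
                     max_iter: int = 20) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (article social scores, handle authorities).

    base: article id -> 0-to-1 score from the other four signals.
    discussions: article id -> handles that posted about it.
    """
    if not base:
        return {}, {}
    # Raw discussion count, capped and scaled to 0-to-1 (assumed cap).
    raw = {a: min(len(discussions.get(a, [])), max_posts) / max_posts for a in base}
    importance = dict(base)
    social = dict(raw)
    authority: Dict[str, float] = {}
    for _ in range(max_iter):
        # Handle authority: mean importance of the articles it discusses.
        totals: Dict[str, float] = {}
        counts: Dict[str, int] = {}
        for a in base:
            for h in discussions.get(a, []):
                totals[h] = totals.get(h, 0.0) + importance[a]
                counts[h] = counts.get(h, 0) + 1
        authority = {h: totals[h] / counts[h] for h in totals}
        # Article social score: blend of raw count and mean handle authority.
        for a in base:
            handles = discussions.get(a, [])
            mean_auth = sum(authority[h] for h in handles) / len(handles) if handles else 0.0
            social[a] = blend * raw[a] + (1.0 - blend) * mean_auth
        updated = {a: 0.85 * base[a] + 0.15 * social[a] for a in base}
        delta = max(abs(updated[a] - importance[a]) for a in base)
        importance = updated
        if delta < tol:  # stop once scores stop moving
            break
    return social, authority


base_scores = {"a": 0.9, "b": 0.2, "c": 0.9}
posts = {"a": ["alice", "bob"], "b": ["carol", "dave"], "c": ["alice"]}
social, authority = social_authority(base_scores, posts)
print(social["a"] > social["b"], authority["alice"] > authority["carol"])
```

Articles "a" and "b" have the same number of posts, but "a" ends up with the higher social score because its handles also appear on other high-scoring stories.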

5. Semantic Novelty (weight: 10%)

Each day, we compute a centroid — the average embedding of all articles published that day (the same centroid that powers semantic drift measurement). An article's novelty is its cosine distance from this centroid. Articles far from the center cover genuinely different ground from the day's dominant themes; articles near the center are part of the pack.

Novelty is the counterpart to gravity. A big earnings story has high gravity (many nearby articles) but low novelty (it's close to the centroid because it is the day's theme). A surprising scoop about an unrelated company has low gravity but high novelty: it stands alone.

This is where the scoring system captures scoops, contrarian reports, and genuinely new information that didn't fit the day's narrative.
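A small sketch of the centroid-distance calculation follows. The `p5` and `p95` calibration bounds are placeholder numbers chosen for the toy data, not the empirical values the pipeline uses, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of the day's embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def novelty(embedding: Sequence[float],
            day_embeddings: List[Sequence[float]],
            p5: float = 0.05,
            p95: float = 0.40) -> float:
    """Cosine distance from the daily centroid, rescaled to 0-to-1.

    p5 / p95: assumed 5th and 95th percentile distances across all days.
    """
    distance = cosine_distance(embedding, centroid(day_embeddings))
    return min(max((distance - p5) / (p95 - p5), 0.0), 1.0)


# Three articles on the day's main theme and one outlier.
day = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
print(novelty([0.0, 1.0], day) > novelty([1.0, 0.0], day))
```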

Cross-Day Comparability

A critical design choice: all five signals use absolute values, not day-relative rankings. Position AUC measures absolute prominence. Link power counts real sources. Social authority uses real handle weights. Novelty uses raw cosine distance (calibrated to the empirical distribution across all days). Gravity counts real nearby articles with a fixed cap.

This means scores are comparable across days. An article scoring 0.85 on a Tuesday is meaningfully more important than one scoring 0.60 on a Friday. This enables day-level aggregation — summing or averaging article scores to measure how important a day was, not just which article won each day. Browse any day's scored coverage in the daily chronicles.

A day-density adjustment provides a mild boost on busier days: standing out semantically among 90 articles is harder than standing out among 30, and the scoring reflects this.

What the Score Means

Typical coverage at each tier, from the highest scores to the lowest:

Earnings reports from trillion-dollar companies, landmark product launches, industry-shifting announcements

Major product updates, large funding rounds, regulatory actions, executive changes at top companies

Industry reports, mid-tier company news, partnership announcements, feature launches

Standard coverage, minor product updates, personnel moves, regional stories

Low-position articles with limited social discussion and minimal related coverage

What's Not in the Score

The system deliberately excludes several things:

Sentiment. Importance and sentiment are orthogonal. A devastating security breach and a triumphant product launch can both score 0.85. The score measures how much the story mattered, not whether it was good or bad news.

Prediction. The score doesn't predict future impact. It measures observed importance at the time of coverage — position persistence, social discussion, coverage breadth. An article that quietly plants a seed for a future trend scores based on what it received, not what it caused. (For tracking how narratives evolve over time, see entity momentum.)

Editorial judgment. No human overrides individual scores or adjusts the weights story by story. The only hand-tuned parameters are structural: the weight ratios between signals, the diminishing-returns caps, and the PageRank damping factor. These were calibrated empirically against score distributions and validated against known major stories.

Technical Summary

Signal	Weight	Method	Source
Position AUC	35%	Trapezoidal integration of prominence over 12-24 daily snapshots	article_positions table
Gravity	20%	Count of same-day articles within cosine distance 0.45, capped at 15	Embedding similarity
Link Power	20%	Log-scaled related coverage count with source authority boost	related_coverage array
Social Authority	15%	PageRank propagation over handle-article bipartite graph (3 iterations)	discussion_posts array
Novelty	10%	Raw cosine distance from daily centroid, calibrated to empirical p5-p95	Embedding distance

Source authority applies a multiplicative adjustment of up to ±8% based on a source's historical top-5 rate. The final score is clamped to [0, 1].
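Putting the table together, the final combination might be sketched like this. The weights and the clamp come from the summary above; the dictionary keys and function name are hypothetical, and the day-density adjustment is omitted because its formula is not given here.

```python
from typing import Dict

# Weights from the Technical Summary table.
WEIGHTS: Dict[str, float] = {
    "position_auc": 0.35,
    "gravity": 0.20,
    "link_power": 0.20,
    "social_authority": 0.15,
    "novelty": 0.10,
}


def importance_score(signals: Dict[str, float]) -> float:
    """Weighted sum of the five 0-to-1 signals, clamped to [0, 1].

    Missing signals are treated as 0.0 (an assumption for this sketch).
    """
    total = sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())
    return min(max(total, 0.0), 1.0)


example = {
    "position_auc": 0.8,
    "gravity": 0.5,
    "link_power": 0.6,
    "social_authority": 0.4,
    "novelty": 0.2,
}
print(round(importance_score(example), 2))  # 0.58
```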

The scoring runs automatically as part of the daily pipeline. Every new article is scored within hours of ingestion, using all available signals at that point. Scores can be recomputed as more social discussion and related coverage accumulate. Access importance scores programmatically through the Pulse API (documented in the API reference) or the CLI.

See also: How TEXXR Works, Entity Momentum, Semantic Drift, Knowledge Graphs. Build on this data with the Tech News API.