
What's being measured. And what isn't.

01 // The two corpora

DriftMetrics computes a single number — drift score — from the gap between two text corpora that both claim to represent "what the internet thinks about X." The gap is what the algorithmic sort suppresses. The drift score is how big that gap is.

The Enclosure

The first ten organic results from a mainstream commercial SERP. By default this is Bing (with Yahoo as fallback when Bing rate-limits). These are the URLs and snippets a typical user sees when they type the keyword into a search engine — heavily ranked, heavily filtered for commercial viability, optimized for clickthrough.
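The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual fetcher: `RateLimited`, `fetch_enclosure`, and the `primary` / `fallback` callables are hypothetical names, and the real scrapers for Bing and Yahoo are assumed to be supplied by the caller.

```python
from typing import Callable, List


class RateLimited(Exception):
    """Hypothetical signal that the primary SERP refused the request."""


def fetch_enclosure(
    keyword: str,
    primary: Callable[[str], List[str]],
    fallback: Callable[[str], List[str]],
    top_n: int = 10,
) -> List[str]:
    """Return the first `top_n` organic results for `keyword`.

    `primary` stands in for a Bing fetcher and `fallback` for a Yahoo
    fetcher; both are assumptions injected by the caller.
    """
    try:
        results = primary(keyword)
    except RateLimited:
        # Only a rate limit triggers the fallback; other errors propagate.
        results = fallback(keyword)
    return results[:top_n]
```

Injecting the two fetchers keeps the fallback rule testable without touching the network.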

The Baseline

Two sources fused: html.duckduckgo.com/html/ (the legacy DDG endpoint, more keyword-literal and less aggressively re-ranked) and the Wikipedia OpenSearch + REST summary API (organic encyclopedic intent, ranked by reader behavior, not advertiser bidding).
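As a rough sketch of how the two Baseline sources might be addressed and fused: the code below only builds request URLs and merges already-fetched snippets. The `q` query parameter, the choice of English Wikipedia, the `limit` default, and the helper names `baseline_urls`, `summary_url`, and `fuse` are all assumptions for illustration; the actual fusion rule is not specified in the text above.

```python
from typing import Dict, List
from urllib.parse import quote, urlencode

DDG_HTML = "https://html.duckduckgo.com/html/"
# Assumption: English-language Wikipedia endpoints.
WIKI_API = "https://en.wikipedia.org/w/api.php"
WIKI_SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/"


def baseline_urls(keyword: str, limit: int = 5) -> Dict[str, str]:
    """Build the DDG and Wikipedia OpenSearch request URLs for a keyword."""
    opensearch = {
        "action": "opensearch",
        "search": keyword,
        "limit": limit,
        "format": "json",
    }
    return {
        "ddg": DDG_HTML + "?" + urlencode({"q": keyword}),
        "wiki_opensearch": WIKI_API + "?" + urlencode(opensearch),
    }


def summary_url(title: str) -> str:
    """REST summary URL for one article title returned by OpenSearch."""
    return WIKI_SUMMARY + quote(title.replace(" ", "_"), safe="")


def fuse(ddg_snippets: List[str], wiki_summaries: List[str]) -> List[str]:
    """Assumed fusion rule: concatenate, dropping exact duplicates in order."""
    seen = set()
    fused = []
    for text in ddg_snippets + wiki_summaries:
        if text not in seen:
            seen.add(text)
            fused.append(text)
    return fused
```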

02 // The math

Both corpora get tokenized with a tight stopword filter. Each corpus becomes a TF-IDF vector over the union vocabulary. From those two vectors come the components that feed the composite score: SEM, LEX, OVL, and COM.

The composite drift score is a weighted blend: 0.55·SEM + 0.20·LEX + 0.15·(1−OVL) + 0.10·COM. Weighted toward semantic delta because that's the signal that maps to "the algorithm is showing you a different reality."
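A minimal sketch of the pipeline, under stated assumptions. The weights and the blend formula come from the text above; everything else is illustrative. The stopword list, the smoothed IDF, and the use of cosine distance as a stand-in for a vector-based component are assumptions, since the individual component definitions are not given here. Component inputs to `composite_drift` are assumed to lie in [0, 1].

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: a tiny illustrative stopword list, not the project's actual filter.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(
    enclosure: List[str], baseline: List[str]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """One TF-IDF vector per corpus, both keyed by the union vocabulary.

    Assumption: IDF is computed over every snippet from both corpora,
    with add-one smoothing.
    """
    enc = [tokenize(d) for d in enclosure]
    base = [tokenize(d) for d in baseline]
    docs = enc + base
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}

    def vec(corpus: List[List[str]]) -> Dict[str, float]:
        tf = Counter(t for doc in corpus for t in doc)
        total = sum(tf.values()) or 1
        return {t: (tf[t] / total) * idf[t] for t in idf}

    return vec(enc), vec(base)


def cosine_distance(u: Dict[str, float], v: Dict[str, float]) -> float:
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def composite_drift(sem: float, lex: float, ovl: float, com: float) -> float:
    """Weighted blend from the text: 0.55*SEM + 0.20*LEX + 0.15*(1-OVL) + 0.10*COM."""
    return 0.55 * sem + 0.20 * lex + 0.15 * (1.0 - ovl) + 0.10 * com
```

The four weights sum to 1, so with every component in [0, 1] the composite also stays in [0, 1]. With made-up inputs SEM = 0.6, LEX = 0.4, OVL = 0.5, COM = 0.2, the blend works out to about 0.505.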

03 // What this isn't

Drift score is not bias detection. It does not say which side is "right." A high drift score just means the two corpora disagree heavily on which words matter. Both could be lying, neither could be lying; the gap is the data.

Drift score is not a search engine quality measure. Bing and DDG aren't being graded against each other. They're being used as proxies for two different sampling mechanisms — commercial ranking (Enclosure) and lexical / organic-encyclopedic surfacing (Baseline). The score describes the distance between those mechanisms applied to one keyword, not which mechanism is better.

Drift score is not stable across time. The same keyword scanned a week apart will produce different scores because the internet underneath both corpora moved. The number is a snapshot, not a constant.

04 // Honest limitations

05 // Why this exists

Most "search transparency" tools are built by ad-tech companies measuring how to optimize ad spend. DriftMetrics is the inverse: built by a small studio (Asleepius Games) to give everyone — not just paying customers — a way to look at the gap between algorithmic surfaces and lexical surfaces, freely, with the same view everyone else gets.

The architecture enforces that. There is no admin tier, no enterprise SSO, no internal dashboard with "real" numbers. The numbers in the index are the numbers we have. If you used a paid code to scan a keyword, anyone hitting the cache in the next hour sees what you saw. That's the trade.
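One way to picture the shared-cache trade is a time-limited cache keyed only by keyword, with no notion of who paid for the scan. This is a sketch under assumptions, not the site's implementation: `SharedScanCache`, the exact 3600-second lifetime, and the keyword normalization are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 3600  # assumption: "the next hour" read as a 60-minute TTL


class SharedScanCache:
    """Keyword-keyed cache; entries carry no user identity, so every reader
    gets the same stored result until it expires."""

    def __init__(
        self,
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def _key(keyword: str) -> str:
        # Assumed normalization: trim whitespace and lowercase.
        return keyword.strip().lower()

    def put(self, keyword: str, result: Any) -> None:
        self._entries[self._key(keyword)] = (self._clock(), result)

    def get(self, keyword: str) -> Optional[Any]:
        key = self._key(keyword)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: the next scan must be fresh
            return None
        return result
```

Because the key is the keyword alone, there is no code path that could return a different number to a different visitor.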