How benchmark.ai works
A precise account of the data, scoring rubric, and inference process behind every report — and an honest map of where the method is strong and where it is not.
Data sources and what they measure
Live web search (Tavily)
Tavily is a search API built for language models; it retrieves and ranks recent web content on demand. At report time benchmark.ai queries it per company and collects up to 20 signals — news, earnings commentary, official announcements, and analyst coverage — from the trailing 12 months. It supplies the live, company-specific evidence that the static sector benchmarks cannot.
Limitation: Coverage skews toward English-language, digitally visible organisations, and retrieval reflects how a company is discussed publicly rather than what it does internally.
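The collection rule described above (at most 20 signals, trailing 12 months) can be illustrated with a minimal Python sketch. It does not call Tavily and does not reflect Tavily's response schema: the `Signal` record, the `relevance` field, and the 365-day approximation of "12 months" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_SIGNALS = 20   # cap stated in the text
WINDOW_DAYS = 365  # "trailing 12 months", approximated as 365 days (assumption)


@dataclass
class Signal:
    # Hypothetical record shape, not the actual Tavily response schema.
    title: str
    url: str
    published: date
    relevance: float  # assumed ranking score; higher means more relevant


def select_signals(signals, today):
    """Keep signals inside the trailing window, best-ranked first, capped at MAX_SIGNALS."""
    cutoff = today - timedelta(days=WINDOW_DAYS)
    recent = [s for s in signals if cutoff <= s.published <= today]
    recent.sort(key=lambda s: s.relevance, reverse=True)
    return recent[:MAX_SIGNALS]
```

Filtering before capping matters here: applying the cap first could spend some of the 20 slots on items that are later discarded as out of window.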
IMD Future Readiness Indicator (2026)
Published annually by IMD Business School, this index scores 67 economies across four pillars — technology, future readiness, connectivity, and knowledge — derived from enterprise survey data and public investment metrics. benchmark.ai reads its sector-level readiness aggregates. Selected for its rigorous multi-pillar construction and consistent year-on-year methodology.
Limitation: Scores reflect national sector averages and cannot be disaggregated to individual company performance.
Stanford HAI AI Index (2025)
Stanford’s Institute for Human-Centered AI publishes this annual report synthesising AI investment, model performance, adoption, and labour-demand data from academic, governmental, and commercial datasets. benchmark.ai uses its sector-level investment and adoption indicators. Selected for the breadth and independence of its underlying data.
Limitation: It measures aggregate ecosystem and sector trends, not the maturity of any individual firm.
McKinsey State of AI (2025)
McKinsey’s annual global executive survey reports AI and generative-AI adoption rates and self-assessed value capture by function and industry. It contributes a practitioner view of where AI is deployed and where it is reported to create value. Selected for its large recurring sample and functional granularity.
Limitation: Responses are self-reported and subject to selection and response bias; results describe industries, not named companies.
OECD AI and Work (2024)
The OECD’s work on AI and labour markets analyses adoption, automation exposure, and workforce impact across member economies using official statistics and employer surveys. It anchors the workforce-transformation dimension at sector level. Selected for its methodological independence and labour-market focus.
Limitation: Updated less frequently than the other sources, and reports occupational and sectoral exposure rather than firm-level workforce change.
Scoring dimensions and behavioural anchors
Each company is scored 0–10 on three dimensions. The anchors below define what distinguishes one band from the next, so a 6 and a 7 reflect materially different evidence rather than impression.
AI Integration Depth
Quantified Business Impact
Workforce Transformation
From signals to scores
For each company the model receives up to 20 retrieved web signals alongside the relevant sector benchmark figures. It assigns a 0–10 score on each dimension by matching the available evidence to the behavioural anchors above, and is required to cite the specific signals that justify each score.
Scores reflect the weight and specificity of evidence, not its volume alone: a company with three corroborating earnings-call references scores higher than one supported by a single press release. The model is instructed to score conservatively where evidence is thin, defaulting to 3 or below when a dimension is unsupported, rather than inflate sparse signals. These scores are not validated against any ground truth; they are a structured, traceable reading of public information, not a measurement.
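The conservative default can be pictured as a guard applied to each dimension score. The source only says the model is instructed to behave this way; the post-hoc check below, the `DimensionScore` type, and the reading of "unsupported" as "no cited signals" are assumptions for illustration, not a description of benchmark.ai's pipeline.

```python
from dataclasses import dataclass, field

UNSUPPORTED_CAP = 3  # "defaulting to 3 or below when a dimension is unsupported"


@dataclass
class DimensionScore:
    # Hypothetical container for one scored dimension.
    dimension: str
    score: int
    cited_signal_ids: list = field(default_factory=list)


def enforce_conservative_default(ds):
    """Cap a score at UNSUPPORTED_CAP when it cites no supporting signals."""
    if not 0 <= ds.score <= 10:
        raise ValueError(f"score out of range 0-10: {ds.score}")
    if not ds.cited_signal_ids and ds.score > UNSUPPORTED_CAP:
        # No evidence cited: fall back to the conservative ceiling.
        return DimensionScore(ds.dimension, UNSUPPORTED_CAP, [])
    return ds
```

Scores that already sit at or below the cap, and scores backed by at least one citation, pass through unchanged.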
Reading the strategic positioning chart
Each company is plotted on two fixed axes — AI Integration Depth (horizontal) and Quantified Business Impact (vertical) — with dot size encoding the third dimension, Workforce Transformation. The quadrants are diagnostic rather than evaluative: a position describes a posture, not a verdict.
Leaders
Deep integration converting into measurable business impact.
Optimisers
Measurable impact from focused, less extensive deployment.
Experimenters
Broad AI activity not yet translated into measurable returns.
Laggards
Limited integration and little quantified impact to date.
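The four quadrants above amount to a two-threshold classification on the horizontal and vertical axes. The sketch below assumes the boundary sits at the midpoint of the 0–10 scale (5.0) and that boundary values count as the higher band; the source does not state where the quadrant lines fall, so both choices are hypothetical.

```python
MIDPOINT = 5.0  # assumed quadrant boundary; the source does not specify it


def quadrant(integration_depth, business_impact, midpoint=MIDPOINT):
    """Map AI Integration Depth (x) and Quantified Business Impact (y) to a quadrant label."""
    deep = integration_depth >= midpoint
    high_impact = business_impact >= midpoint
    if deep and high_impact:
        return "Leaders"
    if high_impact:
        return "Optimisers"     # impact from focused, less extensive deployment
    if deep:
        return "Experimenters"  # broad activity, returns not yet measurable
    return "Laggards"
```

Workforce Transformation does not affect the label; in the chart it only sets the dot size.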
Evidence weighting and confidence thresholds
Every company carries a confidence rating that signals how much weight its scores should bear.
Limitations and interpretive caveats
1. Benchmark granularity: All four institutional benchmarks publish sector-level data only; individual company scores are not available from them. Company positioning within a sector is inferred from live signal analysis, not directly measured.
2. Evidence asymmetry: Listed companies with active investor-relations programmes generate more signals than private firms or those with limited English-language digital presence, creating a systematic bias toward larger, more communicative organisations.
3. Temporal inconsistency: Live signals reflect the past 12 months while the benchmark sources span 2024 to 2026, so cross-source comparisons should account for this gap.
4. LLM non-determinism: Model outputs are probabilistic, so two reports on identical inputs may differ by 1–2 points. Reports should not be used to draw fine-grained distinctions between companies with similar scores.
5. Signal-volume constraint: Each report draws on at most 20 web signals per company, which is sufficient for orientation and hypothesis generation but not for a comprehensive audit.
6. Self-reported data: Many signals originate from press releases, earnings calls, and newsrooms, which carry promotional bias. The model weights third-party and analyst sources more heavily but cannot fully eliminate that bias.
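The LLM non-determinism caveat in the list above has a practical consequence when comparing two companies: a gap no larger than the run-to-run variation should not be read as a real difference. A minimal sketch of that rule follows, assuming the upper end of the stated 1–2 point range as the tolerance; the function name and the choice of 2 as the default are illustrative, not part of benchmark.ai.

```python
NOISE_BAND = 2  # upper end of the 1–2 point run-to-run variation stated above


def meaningfully_different(score_a, score_b, noise_band=NOISE_BAND):
    """Return True only when the gap between two scores exceeds the noise band."""
    return abs(score_a - score_b) > noise_band
```

Under this rule a 6 and a 7 on the same dimension are treated as indistinguishable, while a 4 and an 8 are not.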
Reports are generated by Claude Sonnet 4.6 (Anthropic, 2025), a large language model with a knowledge cutoff of early 2025. Web signals are retrieved in real time via the Tavily Search API. Sector benchmark data is sourced from static datasets updated annually. benchmark.ai v1. The authors recommend treating outputs as structured research orientation rather than definitive competitive intelligence. All findings should be independently verified before informing strategic decisions.