Methodology
Each card is produced against a held-out evaluation set drawn from the RAID-style multilingual corpus, augmented with an arXiv academic-prose subset for English and translation-equivalent subsets for the other nine languages. The operating threshold is fixed at a 5% false-positive rate (FPR) on the calibration window. A score that falls in the abstain band between the two thresholds returns a manual-review verdict rather than a forced classification.
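The thresholding described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production calibration code: it assumes higher scores mean "more likely machine-generated", that the calibration window supplies scores for known human-written samples, and that the abstain band is bounded by a second, lower threshold. The function names, the verdict labels, and the example value 0.50 for the lower threshold are all hypothetical.

```python
import math


def threshold_at_fpr(negative_scores, target_fpr=0.05):
    """Return a cutoff t such that at most target_fpr of the negatives score above t."""
    if not negative_scores:
        raise ValueError("need at least one calibration score")
    ordered = sorted(negative_scores, reverse=True)
    # Number of negative (human-written) samples we may flag by mistake.
    allowed = math.floor(target_fpr * len(ordered))
    if allowed >= len(ordered):
        return float("-inf")  # every sample may be flagged
    # Flagging only scores strictly above this value flags at most `allowed` negatives.
    return ordered[allowed]


def verdict(score, lower, upper):
    """Map a score to a verdict. Labels and the lower threshold are hypothetical."""
    if score > upper:
        return "flagged"
    if score <= lower:
        return "not_flagged"
    return "manual_review"  # abstain band: no forced classification


# Made-up calibration scores for human-written text, for illustration only.
calibration = [i / 100 for i in range(100)]
upper = threshold_at_fpr(calibration, target_fpr=0.05)
lower = 0.50  # hypothetical lower edge of the abstain band

print(upper, verdict(0.70, lower, upper))
```

Using a strict comparison against the returned cutoff keeps the empirical false-positive rate at or below the target even when calibration scores tie at the cutoff.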
We publish the calibration card before the verdict. A scan returns a receipt that references the card SHA used to produce it. An auditor can fetch the card, confirm the model SHA, and reproduce the verdict against the published thresholds. The samples figure refers to the held-out evaluation set, not the full training corpus.
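An auditor's check could look something like the sketch below. It is only a sketch: the card and receipt formats are not specified here, so the JSON layout, every field name (`card_sha`, `model_sha`, `score`, `verdict`, `thresholds`), the use of SHA-256, and the verdict labels are assumptions made for illustration. It also assumes the receipt records the raw score, which is what makes the verdict reproducible from the published thresholds alone.

```python
import hashlib
import json


def audit_receipt(card_bytes, receipt):
    """Return True if the receipt is consistent with the fetched card (hypothetical schema)."""
    # 1. The fetched card must hash to the card SHA the receipt references.
    if hashlib.sha256(card_bytes).hexdigest() != receipt["card_sha"]:
        return False

    card = json.loads(card_bytes)

    # 2. The model SHA on the card must match the one recorded on the receipt.
    if card["model_sha"] != receipt["model_sha"]:
        return False

    # 3. Re-derive the verdict from the score and the published thresholds.
    lower = card["thresholds"]["lower"]
    upper = card["thresholds"]["upper"]
    score = receipt["score"]
    if score > upper:
        expected = "flagged"
    elif score <= lower:
        expected = "not_flagged"
    else:
        expected = "manual_review"
    return expected == receipt["verdict"]


# Made-up card and receipt, for illustration only.
card = {"model_sha": "abc123", "thresholds": {"lower": 0.5, "upper": 0.94}}
card_bytes = json.dumps(card, sort_keys=True).encode("utf-8")
receipt = {
    "card_sha": hashlib.sha256(card_bytes).hexdigest(),
    "model_sha": "abc123",
    "score": 0.7,
    "verdict": "manual_review",
}

print(audit_receipt(card_bytes, receipt))
```

Hashing the exact bytes of the fetched card, rather than a re-serialized copy, avoids false mismatches caused by key ordering or whitespace differences.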