ArXiv has moved to tighten oversight of papers that rely on AI text generators, warning that clear evidence that authors failed to verify model outputs could lead to suspension. The change, announced this week by the head of arXiv’s computer science section, aims to protect the repository’s role as a trusted early-distribution channel for research.
Many computer science and mathematics studies appear on arXiv long before peer review, shaping how ideas spread and how datasets used to train models are constructed. That influence has raised the stakes as more authors experiment with large language models to draft content, sometimes without sufficient checks.
What the new approach says
The policy update, conveyed by the computer science section chair, draws a hard line: if reviewers or moderators find incontrovertible signs that authors did not validate material produced by an LLM, the submission will be treated as unreliable. Telltale signs include invented or unverifiable references and unedited model prompts or replies left in the submitted text.
In those cases, the repository will apply a one-year ban on future postings from the authors. After that period, any new submissions must first have been accepted by an established, peer-reviewed venue before being posted on arXiv.
This is not a blanket prohibition on using AI tools. The platform’s leaders emphasize that researchers remain fully responsible for their manuscripts’ accuracy and integrity, no matter how the content was created. Authors who copy unchecked model output are accountable for whatever it contains, whether plagiarism, biased statements, factual errors, or fabricated citations.
How enforcement will work
Moderators will flag suspicious submissions, and section chairs must confirm the evidence before any sanction is imposed. The rule is intended as a single-strike penalty, but authors who contest a finding will be able to appeal.
ArXiv has already introduced other safeguards against low-quality, AI-driven contributions, such as requiring first-time posters to obtain an endorsement from an established author. The organization is also transitioning away from long-term hosting by Cornell to operate as an independent nonprofit, a move that could expand its capacity to fund moderation and platform improvements.
- Why this matters now: arXiv is a primary distribution point for preprints and a common source for datasets used in AI research. Contaminated entries distort scientific records and downstream models.
- Impact on authors: Mistakes tied to unvetted AI output can result in temporary exclusion and a more onerous pathway back onto the site.
- Broader consequences: Increasing reports of made-up citations in peer-reviewed fields, notably biomedicine, suggest the problem reaches beyond preprints and threatens trust in published work.
| Trigger | Likely outcome | Author options |
|---|---|---|
| Clear signs of unverified LLM output (e.g., fabricated references) | One-year posting ban; future submissions require prior peer-reviewed acceptance | Appeal the decision; correct and resubmit through peer-reviewed channels |
| Use of LLMs with proper verification and attribution | No automatic penalty; standard moderation applies | Disclose AI use and document checks performed |
For researchers, the practical takeaway is straightforward: if you use an AI model to draft or assist with a manuscript, verify every factual claim, confirm every citation, and remove any visible model prompts or raw outputs before posting. Failing to do so risks not only temporary exclusion from a major research archive but also damage to professional reputations.
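As a rough illustration of the last of those checks, the sketch below scans a manuscript's text for phrases that often survive when chatbot output is pasted in unedited. The phrase list and the function name `find_llm_residue` are assumptions made for this example, not arXiv's actual criteria or tooling, and a clean result says nothing about whether the citations or claims are accurate.

```python
import re

# Hypothetical phrase list chosen for illustration only; it is not
# taken from arXiv's policy and will miss many kinds of residue.
TELLTALE_PATTERNS = [
    r"as an ai language model",
    r"as a large language model",
    r"i hope this helps",
    r"regenerate response",
]


def find_llm_residue(text):
    """Return (line_number, line) pairs that match any telltale pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in TELLTALE_PATTERNS]
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in compiled):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "Introduction\n"
        "As an AI language model, I cannot access the data.\n"
        "We prove Theorem 1.\n"
    )
    for number, line in find_llm_residue(sample):
        print(f"line {number}: {line}")
```

A scan like this can only catch the most visible leftovers. Confirming that each reference exists and supports the claim attached to it still has to be done by the authors.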
As preprint servers and journals grapple with similar issues, expect tighter screening processes and more explicit disclosure requirements across the research ecosystem. That may slow some submissions but could improve the long-term reliability of scientific communication, a trade-off many in the community view as necessary.












