About
Behind glio.ai.
A small, focused tool that pattern-matches proposed glioblastoma trial designs against a hand-curated database of 31 historical trials, spanning both Phase 2 and Phase 3 readouts. Built to learn from past failures rather than repeat them.
Why this exists
Roughly 100,000 people die from glioblastoma every year worldwide. The standard treatment, the Stupp protocol, hasn't fundamentally changed since 2005. Despite billions of dollars and hundreds of trials, no new drug has improved overall survival in the two decades since.
The reason isn't lack of trying. The field keeps running large, expensive trials that repeat the same patterns of failure: trials that improve a scan-based measure but not survival, trials that advance on borderline Phase 2 evidence, trials that test drugs that never reach the brain at therapeutic levels.
glio.ai is one small attempt to map those patterns and surface them when a new trial is being designed. Not as a regulator. Not as a biostatistician. As a structured second opinion grounded in what has already been tried and how it landed.
How AI is used
The pre-mortem analysis is generated by Anthropic's Claude (Sonnet 4.6 for the trial-design analysis, Haiku 4.5 for individual trial deep-dives). The AI receives the full curated database plus the user's structured input and is instructed to ground every claim in the data provided.
The AI's job is pattern matching against the database, not novel statistical prediction. Specifically, the AI does not:
- Generate probabilities of regulatory approval
- Run any kind of statistical model
- Simulate trial outcomes
- Replace biostatistical or regulatory consultation
AI output is non-deterministic. The same input may produce slightly varied verdicts. Users should read the AI's reasoning, not just its risk rating. Always verify specific claims (trial names, hazard ratios, p-values) against the cited primary sources.
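The grounding step described above can be sketched as assembling a single prompt that carries the entire curated database alongside the user's structured input. This is an illustrative sketch only: the function name, prompt wording, and field names below are assumptions, not the actual glio.ai implementation.

```python
import json

def build_premortem_prompt(trial_database, user_design):
    """Assemble a grounded prompt: the full curated database plus the
    user's structured trial design. Illustrative sketch -- the real
    prompt text and schema are not published."""
    system = (
        "You are reviewing a proposed glioblastoma trial design. "
        "Ground every claim in the trial database below. Do not estimate "
        "approval probabilities, run statistics, or simulate outcomes; "
        "match the design against documented failure patterns.\n\n"
        "DATABASE:\n" + json.dumps(trial_database, indent=2)
    )
    messages = [{
        "role": "user",
        "content": "Proposed trial design:\n" + json.dumps(user_design, indent=2),
    }]
    # In the Anthropic Messages API, the system string is passed as its own
    # `system` parameter alongside `messages`; no network call is made here.
    return system, messages
```

Because every claim must trace back to the database embedded in the prompt, hallucinated trial names are easier to catch: if a cited trial isn't in the payload, it isn't grounded.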
How the database was built
The 31 trials in the database were curated manually from the following sources:
- Primary publications in peer-reviewed journals: Lancet Oncology, NEJM, JAMA Oncology, Journal of Clinical Oncology, Neuro-Oncology, Clinical Cancer Research, Nature Medicine.
- ClinicalTrials.gov for trial design details (NCT numbers, sample sizes, primary endpoints, randomization).
- Sponsor press releases and investor announcements for each sponsor's stated explanation of the outcome.
- PubMed for citation verification (PMIDs included where available).
Inclusion criteria
- Phase 3 GBM trials with public readouts (21 entries).
- Phase 2 GBM trials that are well-documented, mechanistically diverse, and either pivotal predecessors to a Phase 3 trial or important standalone readouts (10 entries).
- Years 2005 to 2024.
- Glioblastoma as the primary indication (excludes anaplastic astrocytoma except where noted).
- Adult populations.
- English-language primary publications.
Fields extracted per trial
For each trial, the following fields were extracted and verified against the primary source:
- Trial name and NCT number
- Drug, dose, and schedule
- Mechanism of action
- Target class (anti-angiogenic, checkpoint inhibitor, etc.)
- Setting (newly diagnosed vs. recurrent)
- Biomarker enrichment (MGMT methylation status, EGFR amplification, etc.)
- Sample size
- Primary endpoint and trial hypothesis
- Key results: median overall survival, hazard ratio, p-value, whether the OS endpoint was met, whether the PFS endpoint was met
- Sponsor's stated reason for outcome
- Independent analysis (which failure pattern, if any)
- Source citations (PMIDs)
The 9 failure patterns
The pattern library was derived inductively by looking across the curated trials and identifying common modes of failure. Each pattern is named, defined, and tied to specific examples in the database.
- Phase 2-to-Phase 3 effect-size collapse: Phase 2 signals don't survive Phase 3 confirmation. Early-phase trials often compare patient outcomes against historical data rather than a randomized control. When the same drug is tested in a larger, randomized Phase 3, the apparent benefit shrinks dramatically or disappears entirely. The original signal was an artifact of the comparison method, not a true treatment effect.
- Pseudoresponse (anti-angiogenic class): imaging response without survival benefit. Certain drugs reduce the contrast-enhancement signal that brain tumors give off on MRI without slowing the underlying cancer. Scans show smaller tumors and progression-free survival appears to improve, but overall survival is unchanged. Most common with anti-angiogenic agents, which alter blood-brain barrier permeability rather than tumor biology itself.
- Cold-tumor immune therapy failure: immune-cold tumors resist checkpoint blockade. Glioblastoma has a low tumor mutational burden, sparse infiltrating T cells, and frequent steroid-induced immunosuppression. Drugs that release the brakes on the immune system (PD-1 inhibitors like Keytruda or Opdivo) have failed across every glioblastoma subgroup tested. The immune system cannot attack what it cannot recognize.
- EGFR target frequency-vs-dependency mismatch: mutation frequency does not equal driver dependency. Roughly half of glioblastomas carry mutations or amplifications of the EGFR gene. Despite this prevalence, drugs that block EGFR have not improved outcomes. The mutation is common but not load-bearing for tumor survival; the cancer does not depend on EGFR to grow.
- Standard-of-care drift: standard-of-care evolution outpaces the trial. Phase 3 trials often run for five or more years. A drug that looked promising against historical control outcomes may, by the time the trial reads out, be compared against a control arm that has improved through better surgery, patient selection, or supportive care. The benchmark shifts under the trial's feet.
- Active-comparator complications: active comparators introduce their own biases. When a trial uses another active drug instead of placebo as the comparison arm, that drug's own effects (especially imaging-level effects, like bevacizumab's pseudoresponse) can complicate interpretation. The comparison no longer cleanly measures the new drug's value.
- Methodological corruption: mid-trial design changes undermine the results. Examples include changing the primary endpoint after enrollment has begun, allowing universal crossover from control to investigational arm at progression, or comparing against external (non-randomized) controls. Each of these compromises the inferential clarity of the trial.
- PK or dose-schedule inadequacy: drug exposure is inadequate at the chosen dose or schedule. The drug is administered too rarely, at too low a concentration, or via a route that does not reach the brain in therapeutic amounts. Whatever the trial measures, it isn't truly a test of the drug's mechanism, only a test of an under-dosed regimen.
- Drug-class repeat failure: the drug class has repeatedly failed in this disease. When multiple drugs operating through the same mechanism have already failed in glioblastoma, repeating the approach without a meaningful change in patient selection, dose, or trial design rarely produces a different result.
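The simplest of these patterns, drug-class repeat failure, can be sketched as a direct lookup over the database. This is a minimal illustration of the pattern-matching idea, not glio.ai's actual logic; the field names (`target_class`, `os_endpoint_met`, `name`) and the threshold of two prior failures are assumptions.

```python
def flag_class_repeat_failure(proposed_target_class, database, threshold=2):
    """Flag a proposed trial whose drug class has already failed repeatedly.
    Illustrative only: field names and the failure threshold are assumptions,
    and the real analysis is done by the model, not a hard-coded rule.

    Returns (flagged, prior_failure_names)."""
    prior_failures = [
        t["name"] for t in database
        if t["target_class"] == proposed_target_class
        and not t["os_endpoint_met"]          # OS endpoint missed = failure here
    ]
    return len(prior_failures) >= threshold, prior_failures
```

Returning the matching trial names, not just a boolean, mirrors the tool's emphasis on reasoning over ratings: the user can check each cited precedent against its primary source.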
Limitations
Honest about what this tool can and can't do:
- Small dataset. 31 trials is a fraction of the full GBM drug development landscape. Citeline's enterprise database has 28,000+ oncology trials. Pattern recognition from 31 cases is suggestive, not statistically powered prediction.
- One-disease scope. The tool is designed for glioblastoma only. Findings do not transfer to other cancer types.
- One curator. The database reflects one person's choices about what counts as a failure, what fields matter, and which trials to include. Subjective bias is real.
- Phase 2 and Phase 3 only. Phase 1 dose-escalation trials, expanded access, compassionate use, and most pediatric trials are excluded (with one Phase 1 exception, PVSRIPO, included as the field's most-cited oncolytic virus reference).
- Not peer-reviewed. The methodology has not been formally peer-reviewed or published.
- English-language only. Trials reported only in non-English journals are missing.
- 2005-2024 window. Older trials and trials still ongoing are not represented.
What's next
Planned: validation study (running the tool against known historical trials retrospectively), expansion to additional Phase 2 trials, and a methodology preprint on medRxiv.
Acknowledgments
- PubMed and the National Library of Medicine for free public access to clinical trial publications.
- ClinicalTrials.gov for trial registration data.
- Anthropic for the Claude API that powers the analysis.
- Vercel for free hosting.
- The clinical investigators, patients, and families whose participation in failed trials produced the lessons this tool draws on.