
OpenEvidence review: a brilliant clinical answer engine with a business-model problem

May 22, 2026 · 10 min read · By CompareScribes Team

OpenEvidence is the most successful physician software launch of the decade so far, whether measured by adoption, by growth rate, or by valuation. It is also becoming the most consequential. When a single tool informs the clinical reasoning of roughly two-thirds of America's doctors, "is it any good?" stops being a consumer question and becomes a public-health one. This is our editorial assessment.

A word on what this is, because on this site the distinction is load-bearing. This is an editorial assessment, not a hands-on test. CompareScribes has not put OpenEvidence through clinical use — the product withdrew from the EU and UK in April 2026, which makes first-hand testing from our side genuinely hard. What follows is our read of the public record: peer-reviewed and preprint evaluations, named clinician accounts, trade reporting and the company's own disclosures — the same evidence-led method we apply to any tool we score from public data. When we are able to complete a hands-on review, we will update this page and label it clearly.

One more clarification we repeat whenever OpenEvidence comes up: it is not an AI medical scribe. It does not transcribe encounters or draft your notes. It is a clinical answer engine — a cited medical-evidence tool clinicians query at the point of care. We cover it because it is the most important product adjacent to the scribe market, and because, in 2026, it is starting to appear inside scribe and EHR workflows.

The verdict

OpenEvidence is the best clinical answer engine available to physicians today, and it is not especially close. It is fast, free at the point of use, and — unlike a general-purpose chatbot — genuinely citation-grounded: independent evaluations repeatedly find it avoids fabricated references, the single most dangerous failure mode of medical AI.

Two things hold it back from an unqualified recommendation. First, it is strongest exactly where medicine is most straightforward — structured, guideline-driven questions — and weakest where clinical judgement matters most: complex, multi-morbid and subspecialty cases. Second, and more seriously, OpenEvidence is an advertising business. It is free because pharmaceutical and device manufacturers pay to reach prescribers at the moment of decision. The company states that advertising does not influence its answers; our editorial position is that a tool with this much reach warrants treating that as a claim to be independently audited, not a reassurance to be taken on faith.

Our bottom line: use it — but use it as a fast, well-sourced literature search, not as an oracle, and not without knowing who is paying for the room.

What OpenEvidence is, in brief

Founded in 2022 by Daniel Nadler and Zachary Ziegler and incubated through the Mayo Clinic Platform accelerator, OpenEvidence answers free-text clinical questions conversationally, synthesising the literature with inline citations to primary sources. It is free for clinicians whose identity is verified (in the US, via the National Provider Identifier), and it carries multi-year content agreements with the NEJM Group and the JAMA Network. By January 2026 it had raised a Series D at a $12 billion valuation; in May it was named to the CNBC Disruptor 50. The full origin-and-Europe story is in our companion explainer — here we focus on the harder question: how good is it, really?

What it genuinely does well

The growth is not purely a financing story. There is real clinical utility underneath it.

It rarely fabricates citations. This is the headline strength, and it deserves to be. Multiple independent evaluations through 2025–26 found OpenEvidence generated evidence-supported answers without inventing references — the failure mode that makes a general chatbot unsafe at the point of care. It gets there by training on peer-reviewed medical literature, using an ensemble of specialised models rather than one, and exposing every source so the clinician can click through. A scribe that hallucinates a symptom is a documented danger; an answer engine that hallucinates a citation is the same danger wearing a lab coat. OpenEvidence has, by the available evidence, largely solved that.

It is fast, and it is free. A conversational, cited answer in seconds, at no cost to a verified clinician, is a genuine workflow change — particularly against UpToDate's subscription pricing, which has long kept the gold standard out of reach for students, trainees and resource-constrained settings.

The corpus is serious. NEJM and JAMA content agreements, plus reported relationships with the AMA and NCCN, mean the literature it draws on is not a scrape of the open web.

It is being embedded where doctors already work. In February 2026 Sutter Health put OpenEvidence inside Epic for its physicians; in March, Mount Sinai deployed it across seven hospitals to physicians, nurses and pharmacists. The company reported one million clinical consultations in a single day on 10 March 2026. Its DeepConsult agent extends the product from single-question search toward multi-step literature review.

And the clinicians who rate it, rate it highly. Dr. Paul Sax, an infectious-disease physician at Brigham and Women's Hospital, has said the tool's answers often "border on miraculous." That is not nothing: Sax is a careful, widely read clinical writer, not an easy mark.

Where it's weak — and where it's risky

Accuracy is good — but not uniform, and it dips where the stakes rise. The same evaluations that praise its citation discipline are clear-eyed about its limits. A 2025 pilot study of complex medical subspecialty scenarios found that accuracy and repeatability varied; performance was strongest on structured, guideline-based questions and shakier outside them. Clinicians have noted it can draw conclusions that are too strong for the small studies behind them. Reviewers also describe a subtler problem: it tends to reinforce a clinician's existing plan rather than challenge it, and can make interpretive errors even when the citations themselves are correct. None of this makes it unsafe; all of it means it is a research assistant, not a decision-maker.

That caution is shared by clinicians themselves. In one survey of physicians, accuracy and the risk of misinformation was the single most-cited concern (44%), ahead of the lack of oversight or explainability (19%) and legal liability (16%).

It is built for American medicine. The corpus leans US — it will cite AHA guidance and FDA labelling, which can diverge from NICE or the BNF. Verification is built around the US NPI, and the product has withdrawn from the EU and UK altogether. For a non-US clinician this is not a tuning quirk; it is a reason the answer may not match your national guideline.

It does not replace expert synthesis. A peer-reviewed journal article describes a research finding; it does not tell you how to treat the patient in front of you. That translation — distilling evidence into graded, caveated treatment guidance — is what UpToDate's editorial apparatus does and what an answer engine, however fluent, does not. It is why a clinical-governance committee will still accept "per UpToDate" as the basis for a protocol and will not yet accept "the AI said so."

The business-model problem

Here is the part the $12 billion headlines tend to skip — and the part we think every clinician using OpenEvidence should sit with for a minute.

OpenEvidence is free because it is an advertising platform. Pharmaceutical and medical-device companies pay to place sponsored content in front of prescribers, and they pay a lot: trade reporting puts the cost at CPMs (cost per thousand impressions) of $70–150 and up, against an audience of roughly 600,000 US prescribers, for a reported revenue run-rate near $100 million by early 2026. The ads are served at the most valuable moment imaginable for a drug marketer: while a clinician waits for the answer to a clinical question. The platform also records how clinicians use it (which topics they engage with, on which devices), and that signal can be used to target what each user sees.
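
To give those figures some scale, here is a rough back-of-envelope sketch. It rests on an assumption that is ours, not the company's: that the entire reported run-rate comes from impression-priced advertising at a flat CPM (cost per thousand impressions), which is unlikely to be exactly true. The function and constant names are hypothetical; only the dollar and audience figures come from the trade reporting cited above.

```python
def implied_impressions(annual_revenue_usd: float, cpm_usd: float) -> float:
    """Ad impressions needed to earn annual_revenue_usd at a flat CPM.

    CPM is the price per 1,000 impressions, so
    impressions = revenue / CPM * 1000.
    """
    return annual_revenue_usd / cpm_usd * 1000


# Figures from the trade reporting quoted in the text.
REVENUE_RUN_RATE_USD = 100_000_000  # reported annual run-rate
PRESCRIBERS = 600_000               # approximate US audience
CPM_LOW_USD, CPM_HIGH_USD = 70, 150

# Assumption (ours): all revenue is CPM-priced and the CPM is flat.
for cpm in (CPM_LOW_USD, CPM_HIGH_USD):
    total = implied_impressions(REVENUE_RUN_RATE_USD, cpm)
    per_prescriber = total / PRESCRIBERS
    print(
        f"CPM ${cpm}: {total / 1e6:,.0f}M impressions/yr, "
        f"~{per_prescriber:,.0f} per prescriber per year"
    )
```

Under that simplifying assumption, the run-rate implies somewhere between roughly 0.7 and 1.4 billion sponsored impressions a year, or on the order of 1,100 to 2,400 per prescriber: about three to seven a day. Treat that as an order-of-magnitude illustration, not a measurement.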

Step back and the structure is stark. The drug industry already spends around $14 billion a year influencing the small population of people licensed to prescribe. OpenEvidence has built the most precisely targeted channel into that population that has ever existed, and wired it directly into the moment of decision.

The cautionary precedent is not hypothetical. Practice Fusion, an EHR vendor, took pharmaceutical money to embed clinical decision-support alerts that nudged doctors toward a specific manufacturer's opioid — alerts that fired hundreds of millions of times before the scheme produced a $145 million federal settlement in 2020. The point is not that OpenEvidence is doing this. The point is that "a clinical-software company is paid by pharma to influence what clinicians see at the point of care" is a sentence with a documented, recent and ugly history.

To its credit, OpenEvidence has the right policy on paper: it states that advertising "shall not be considered an endorsement" and that "advertisers cannot influence answers." We have seen no evidence that policy is being broken. But a policy is a promise, not a control. Which topics trigger which sponsored content — and whether the answer and the ad stay as cleanly separated in practice as in the terms of service — is not something an outside party can currently audit. A site like this one exists precisely because vendor assurances in healthcare need independent checking, and on OpenEvidence that check does not yet exist. Until it does, the honest editorial position is: trust the answers, open the citations, and stay actively aware that the page is also a very expensive billboard.

The disclosure gap

There is a second, quieter issue. Most patients have no idea any of this is happening. OpenEvidence is used across tens of millions of clinical encounters a month, and there is no general requirement that a clinician disclose having consulted it — nor that the AI-shaped reasoning be recorded in the chart. The chart shows the decision, not the input behind it. US transparency rules for AI in certified EHRs (the ONC's HTI-1 rule) may not even reach a standalone tool used alongside, rather than inside, the record. This is not OpenEvidence's failing in particular — it is a regulatory gap the whole category sits in — but a buyer should know it is there.

OpenEvidence vs UpToDate

For most clinicians the practical question is not "OpenEvidence or nothing" but "OpenEvidence or UpToDate" — and the honest answer is that they do different jobs.

Criterion	OpenEvidence	UpToDate
Format	AI answer engine over primary literature	Expert-synthesised topic reviews
Speed	Seconds; conversational	Slower; you read a topic
Cost	Free for verified clinicians	Paid subscription
Best at	Specific questions, recent papers, raw speed	Graded, actionable treatment guidance
Governance trust	Not yet accepted as a basis of record	Widely cited in protocols and committees
Funded by	Pharmaceutical advertising	Subscriptions

Treat OpenEvidence as the fast first look — a specific question, a recent paper, a quick orientation on an unfamiliar topic — and UpToDate (or your national guideline) as the place you confirm a complex or high-stakes decision. Most clinicians who use both describe exactly that division of labour.

Who should use it — and how

Use it if you want a fast, citation-grounded literature search at the point of care, a second angle on a question, or a way into the recent evidence on an unfamiliar topic — and especially if UpToDate's price has kept you out.

Be cautious if you practise outside the US (the guidance may not match your national standard), or if you are tempted to let it settle a complex, multi-morbid or subspecialty decision on its own.

However you use it: click through to the cited source rather than trusting the summary; treat a confident answer built on a small study with the same scepticism you would apply to the study itself; and remember that sponsored content shares the page with the evidence.

What this means for scribe buyers

OpenEvidence matters to this site for a concrete reason. In March 2026 it launched Coding Intelligence, surfacing ICD-10, E/M and CPT suggestions inline as clinicians document — and it has begun appearing inside ambient workflows, including Microsoft Dragon Copilot. The moment an evidence tool starts suggesting codes and impressions, it moves from "search" toward the territory a scribe occupies — and, in Europe, toward the Medical Device Regulation's Rule 11, which pulls decision-informing software into Class IIa and above. It is the same regulatory fault line we treat as first-class in our methodology and our compliance buyer's checklist. If you are choosing a scribe, assume the evidence layer and the documentation layer are converging — and ask every vendor where their tool sits on that line.

Bottom line

OpenEvidence has earned its adoption. It is the strongest clinical answer engine available, it has largely solved the fabricated-citation problem that makes general AI unsafe in medicine, and it is free at a point where the incumbent is expensive. Used as a fast, well-sourced literature search — with the citations actually opened, and with judgement reserved for the clinician — it is a genuine asset.

But it is not an oracle, it is weakest on exactly the hard cases where doctors most want help, and it is, structurally, an advertising business pointed at prescribers. The hype treats the $12 billion valuation as proof the product is beyond question. We would put it the other way around: a tool this powerful, this fast-growing and this entangled with pharmaceutical money is one that deserves exactly the kind of independent, sceptical scrutiny it has so far mostly escaped.

That is our read from the public record. When we can put OpenEvidence through hands-on clinical use, we will test these conclusions directly — and update this review, in full view, with whatever we find.
